Skip to content
This repository has been archived by the owner on Dec 7, 2021. It is now read-only.

fix: test asset distribution to include all tags on test/train split #823

Merged

Conversation

hermanho
Copy link
Contributor

@hermanho hermanho commented Jun 9, 2019

The test asset may not included all tags when export with test/train split option and uneven distribution of tags in current venison (2.1.0). Here is an example:

Image 1-8 are tagged with Tag-1.
Image 9-10 are tagged with Tag-2.
Export provider: PascalVOC
Asset State: Only tagged Assets
Test / Train Split is 80%.

Output
tag-1_val.txt

asset-9.jpg -1
asset-10.jpg -1

tag-2_val.txt

asset-9.jpg 1
asset-10.jpg 1

In current venison, there is no images of tag "Tag-1" in the text file "tag-1_val.txt" because the split method is based on the whole image set so only image 9 & 10 are put in test asset. The split method is changed to extract the test assets by each tag and combine them at the end. Image 7,8,10 will put in test asset after the change applied.

Output
tag-1_val.txt

asset-7.jpg 1
asset-8.jpg 1
asset-10.jpg -1

tag-2_val.txt

asset-7.jpg -1
asset-8.jpg -1
asset-10.jpg 1

@tbarlow12 tbarlow12 closed this Aug 20, 2019
@tbarlow12 tbarlow12 reopened this Aug 20, 2019
@tbarlow12
Copy link
Contributor

@hermanho please re-target your PR to the develop branch

The test asset may not included all tags when export with test/train split option in current venison (2.1.0).
@hermanho hermanho force-pushed the fix-test-train-split-sampling branch from 52f528a to c609dc6 Compare August 21, 2019 01:43
@codecov
Copy link

codecov bot commented Aug 21, 2019

Codecov Report

Merging #823 into master will decrease coverage by 0.13%.
The diff coverage is 80.76%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #823      +/-   ##
==========================================
- Coverage    87.9%   87.76%   -0.14%     
==========================================
  Files         129      129              
  Lines        4845     4880      +35     
  Branches      922      929       +7     
==========================================
+ Hits         4259     4283      +24     
- Misses        584      595      +11     
  Partials        2        2
Impacted Files Coverage Δ
src/providers/export/pascalVOC.ts 80.29% <75.67%> (-3.92%) ⬇️
src/providers/export/cntk.ts 94.44% <93.33%> (-3.18%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 745e854...c609dc6. Read the comment docs.

@hermanho hermanho changed the base branch from master to develop August 21, 2019 01:44
@hermanho
Copy link
Contributor Author

@tbarlow12 changed

Copy link
Contributor

@wbreza wbreza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job on this - especially the shared logic used between Pascal VOC & CNTK. @JacopoMangiavacchi Can you please take a pass on these changes.

@@ -3,6 +3,7 @@ import { ExportProvider, IExportResults } from "./exportProvider";
import { IAssetMetadata, IExportProviderOptions, IProject } from "../../models/applicationState";
import HtmlFileReader from "../../common/htmlFileReader";
import Guard from "../../common/guard";
import {splitTestAsset} from "./testAssetsSplitHelper";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Spacing around { in import statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bfce1ec

export function splitTestAsset(allAssets: IAssetMetadata[], tags: ITag[], testSplitRatio: number): string[] {
const testAssets: string[] = [];

if (testSplitRatio > 0 && testSplitRatio <= 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider inverting if statement to reduce code nesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 74a5a81

Copy link
Contributor

@JacopoMangiavacchi JacopoMangiavacchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Yes kudos for the splitTestAsset helper

@JacopoMangiavacchi JacopoMangiavacchi merged commit 9d64f4a into microsoft:develop Aug 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants