Class Balances #34
Comments
@andreasveit Andreas, can you please take a quick look?
@rkrasin @andreasveit I'm getting similar results too. The histogram of labels does not match what is presented in the README.
@rkrasin @andreasveit Same here.
According to this query in BigQuery, there are 2532 categories with at least 1000 images.
@theostrauss @tiborr @atqamar @wawamanhunt may I see the code you use to build the histogram? I am pretty sure the histogram in the README is close to the truth, given the confirmation from BigQuery, but I have no doubt that your results are based on something too, so I want to reproduce them to understand the difference.
Actually, given the new update posted (https://research.googleblog.com/2017/07/an-update-to-open-images-now-with.html), all histograms need to be reevaluated. :)
Sorry for not seeing this thread sooner. I wonder if the distribution from Andreas was for human annotations whereas you are looking at the machine annotations? 83M is from the machine annotations.
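One way to check the human-versus-machine explanation is to build a separate label histogram per annotation source and compare each against the README. The snippet below is only a sketch: it assumes the annotations have already been loaded as `(source, label)` pairs, and both the function name `label_histograms_by_source` and the source strings `"human"` and `"machine"` are illustrative assumptions, not the dataset's confirmed schema.

```python
from collections import Counter, defaultdict


def label_histograms_by_source(rows):
    """Count label occurrences separately for each annotation source.

    `rows` is any iterable of (source, label) pairs. The pair layout is an
    assumption made for this sketch; adapt it to the real annotation files.
    """
    histograms = defaultdict(Counter)
    for source, label in rows:
        histograms[source][label] += 1
    return histograms


# Tiny made-up example, only to show the shape of the result.
example_rows = [
    ("human", "cat"),
    ("machine", "cat"),
    ("machine", "cat"),
    ("machine", "dog"),
]
example_histograms = label_histograms_by_source(example_rows)
```

If the per-source histograms differ a lot, that would explain why a histogram built from all 83M labels does not look like one built from human annotations alone.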
Hi,
Big fan of Google Research datasets; I have been hoping to use this dataset to train a model.
For my model, I am looking for 2000 to 3000 balanced classes with 1000 or more observations each. To test for class balance, I examined the aggregate training and validation annotations, taking up to 3000 observations from each class. The original annotations had 83 million labels, which this operation trimmed to 1.7 million rows. I expected my distribution to be a bit flatter at the upper bound but otherwise the same as the graphs provided in the repository.
The graphs provided in the repository show around 2500 classes with 1000 occurrences each. I re-downloaded the annotations and checked and rewrote my scripts a bunch of times, but I still keep getting this result. I was wondering if anyone else is having this issue, or if I am missing something.
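For anyone trying to reproduce the check described above, here is a minimal sketch of the two steps: cap each class at a maximum number of observations, then list the classes that still have at least a minimum count. It is not the script used in this issue. The function names `cap_per_class` and `classes_with_at_least` are hypothetical, and the input is assumed to be a plain sequence of class labels, one per annotation row.

```python
from collections import Counter


def cap_per_class(labels, cap=3000):
    """Keep at most `cap` rows per class, preserving input order."""
    seen = Counter()
    kept = []
    for label in labels:
        if seen[label] < cap:
            seen[label] += 1
            kept.append(label)
    return kept


def classes_with_at_least(labels, minimum=1000):
    """Return the sorted class names that occur at least `minimum` times."""
    counts = Counter(labels)
    return sorted(name for name, n in counts.items() if n >= minimum)


# Tiny made-up example with small thresholds, only to show the behaviour.
example_labels = ["a"] * 5 + ["b"] * 2 + ["c"]
example_capped = cap_per_class(example_labels, cap=3)
example_frequent = classes_with_at_least(example_capped, minimum=2)
```

With the real annotations, `len(classes_with_at_least(capped, minimum=1000))` would be the number to compare against the roughly 2500 classes shown in the repository graphs.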