Distinct labels with same description in dict.csv #31

Open
chcomin opened this issue Jun 29, 2017 · 4 comments

@chcomin

chcomin commented Jun 29, 2017

Hi, I noticed that some labels have the same description in the file dict.csv.
Is that expected? Should these cases be treated as distinct entities or is it
better to merge them into a single label?

The list of repeated descriptions is:

/m/018w8, /m/0frqg3, basketball
/m/0449p, /m/0h5wslk, jaguar
/m/07ptj3n, /m/08gqpm, cup
/m/03_mn6, /m/0j67plv, lotus
/m/02g0fy, /m/0f5mx3, fiat 500
/m/019v__, /m/0j6n39d, tvr
/g/121sl9wl, /m/02jcyv, jaguar s-type
/m/01tv9, /m/08p92x, cream
/m/02m57w, /m/0ds4250, groom
/m/080cpdy, /m/0fq0fnf, plank
/m/01l849, /m/025rs2z, gold
/m/04bct1, /m/0dzdr, chest
/m/01txr2, /m/01ty0n, spring
/m/05b1swx, /m/0633h, polar bear
/m/03c_kl, /m/0l_yv, snowshoe
/m/028ygt, /m/02jwq3, punch
/m/01d380, /m/02hhhb, drill
/m/015zzv, /m/03bxt6z, runway
/m/018xm, /m/0dpm1v, ball
/m/0jqjp, /m/0lxkm, iris
/m/0319l, /m/04lmyz, horn
/m/025rw19, /m/03cld36, iron
/m/07bg4p, /m/095_n, heart
/m/017cc, /m/04n0b__, brain
/m/091410, /m/09141t, collar
/m/01z9v6, /m/054fyh, pitcher
/m/01443y, /m/0cphhk, headgear
/m/02tcwp, /m/031vtq, /m/03hqlh, trunk
/m/0879r3, /m/09gys, squid
/m/02g387, /m/033cnk, egg
/m/011_f4, /m/0d8lm, string instrument
/m/01fpbm, /m/04c38s, daisy
/m/01j_h3, /m/0h5wwjv, subaru
/m/02g7g2, /m/0cjs7, asparagus
/m/04mtl, /m/0h5x4j3, lamborghini
/m/09xp_, /m/09xqv, cricket
/m/03xr7y, /m/04gth, lavender
/m/0266skk, /m/0m775, tilapia
/m/01qk4t, /m/0l14v3, conch
/m/020lf, /m/04rmv, mouse
/m/05h2v35, /m/0by3w, jumping
/m/06wrt, /m/0cc6_9k, sailing
/m/03qsdpk, /m/05npqn, theatre
/m/02519, /m/07s6bqg, cable car
/m/031n9j, /m/04tdh, marble
/m/02qsq_1, /m/03bx3wh, corn on the cob
/m/01cjsf, /m/02zt3, kite
/m/013y0j, /m/013y1f, organ
/m/06g1w2, /m/0hwky, pattern
/m/0cyhj_, /m/0jc_p, orange
/m/0gqbt, /m/0j3gthp, shrub
/m/06ff5p, /m/0b209p, rolls-royce corniche
/m/0fsg8, /m/0m150, harrier
/m/01lbxg, /m/039hvj, nut
/m/026y54h, /m/02823g9, /m/0bzfym, alfa romeo giulietta
/m/04f6rz, /m/0fgkh, turquoise
/m/027y004, /m/0cqdf, sponge
/m/01226z, /m/02vx4, football
/m/01m0p1, /m/0jwr9, cardinal
/m/07_l0f, /m/0gzznm, powder
/m/03clckp, /m/083vt, wood
/m/04d01f, /m/0pbc, amber
/m/07pbfj, /m/0ch_cf, fish
/m/0gccln, /m/0gccmf, ford model a
/m/01b7b, /m/027k49j, bishop
/m/0151b0, /m/07jx7, triangle
/m/01brf, /m/04_10ss, bronze
/m/014sg5, /m/07_l6, viola
/m/08g_yr, /m/0cx45, temple
/m/01c43w, /m/01v50j, crane
/m/03r18y, /m/0dj6p, peach
/m/03wfhdl, /m/0y8r, armored car
/m/04ffcj, /m/0k354, lilac
/m/06s7q8, /m/0k2jq, sabre
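
A list like the one above can be reproduced with a short script. This is only a minimal sketch, assuming dict.csv is a headerless two-column CSV (label id, description). The function name `find_duplicate_descriptions`, the lowercase normalization, and the inline sample rows (including the made-up id `/m/example_unique`) are illustrative assumptions, not part of the dataset tooling.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_descriptions(csv_file):
    """Map each description shared by 2+ label ids to the sorted list of those ids."""
    ids_by_description = defaultdict(set)
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        label_id = row[0].strip()
        # Assumption: descriptions are compared case-insensitively.
        description = row[1].strip().lower()
        ids_by_description[description].add(label_id)
    return {
        description: sorted(ids)
        for description, ids in ids_by_description.items()
        if len(ids) > 1
    }


# Inline sample built from ids in the list above, plus one made-up unique row.
# A real run would pass open("dict.csv", newline="") instead.
sample = io.StringIO(
    "/m/020lf,mouse\n"
    "/m/04rmv,mouse\n"
    "/m/02tcwp,trunk\n"
    "/m/031vtq,trunk\n"
    "/m/03hqlh,trunk\n"
    "/m/example_unique,not repeated\n"
)
duplicates = find_duplicate_descriptions(sample)
for description, ids in sorted(duplicates.items()):
    print(", ".join(ids + [description]))
```

The script only detects identical descriptions; it cannot tell whether the ids behind a shared description refer to the same concept.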

@chcomin
Author

chcomin commented Jun 29, 2017

Actually, it seems that some of the descriptions (e.g., 'basketball') are indeed related to distinct concepts while others (e.g., 'alfa romeo giulietta') seem to describe the same thing.

@rkrasin
Contributor

rkrasin commented Jun 29, 2017

Hi @chcomin,

Thank you for the find. You're correct, there are two classes of errors:

  1. Different entities, same description, e.g. /m/020lf, /m/04rmv, mouse.
    In one case it's a computer mouse, and in the other it's an animal.

    For these kinds of collisions, I would propose making the descriptions more verbose, e.g. "mouse" -> "computer mouse" and "mouse" -> "mouse (animal)". Feel free to make a pull request, and I will try to advocate for its acceptance.

  2. Same real entity, same description, different ids (like "alfa romeo giulietta"). A short-term fix would be to modify the labels so that these entities also have the same images attached. Eventually a winner should be chosen, but I don't have enough information to give informed advice here.
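
For the first class of collisions, the proposed rename could be applied with a small script. This is a rough sketch, assuming dict.csv is a headerless two-column CSV (label id, description). The `OVERRIDES` table and the `apply_overrides` helper are hypothetical, and which of the two "mouse" ids is the computer mouse is an unverified guess that would need checking against the images.

```python
import csv
import io

# Hypothetical overrides keyed by label id. The thread does not state which id
# is the computer mouse and which is the animal, so this assignment is a
# placeholder to verify before use.
OVERRIDES = {
    "/m/020lf": "computer mouse",
    "/m/04rmv": "mouse (animal)",
}


def apply_overrides(csv_in, csv_out, overrides):
    """Copy rows from csv_in to csv_out, replacing descriptions for overridden ids.

    Returns the number of rows whose description was replaced.
    """
    writer = csv.writer(csv_out, lineterminator="\n")
    changed = 0
    for row in csv.reader(csv_in):
        if len(row) >= 2 and row[0] in overrides:
            row[1] = overrides[row[0]]
            changed += 1
        writer.writerow(row)
    return changed


# Inline sample; a real run would read dict.csv and write a new file.
source = io.StringIO("/m/020lf,mouse\n/m/04rmv,mouse\n/m/0449p,jaguar\n")
result = io.StringIO()
changed = apply_overrides(source, result, OVERRIDES)
print(result.getvalue(), end="")
```

Keying the overrides by id rather than by description keeps the rename unambiguous, since the description is exactly the part that collides.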

@SlipknotTN

Checking the test images here http://openimages.oldjpg.com/, I see that sometimes the duplicate classes are actually the same (e.g. egg), but other times they are not. For example "mouse", as already said, but also "fish" (one is the animal and the other is food).

Please note that the 3 "alfa romeo giulietta" labels are:

  • New Alfa Romeo Giulietta
  • Old Alfa Romeo Giulietta
  • A mix of the two

So resolving all the duplicates would be useful work, but we have to check every class; a simple merge could be wrong.
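
One cautious way to handle this is to apply the short-term fix (attaching the same images to ids that describe the same thing) only to groups that someone has confirmed by hand. The following is a sketch under assumptions: image labels are modeled as a hypothetical dict from image id to a set of label ids, `share_labels` is an illustrative helper, and the "egg" pair is used as the confirmed group only because it is reported above as looking identical.

```python
def share_labels(image_labels, confirmed_groups):
    """Return a copy of image_labels in which any image tagged with one id of a
    confirmed group is tagged with every id in that group."""
    group_of = {}
    for group in confirmed_groups:
        for label_id in group:
            group_of[label_id] = frozenset(group)

    result = {}
    for image_id, labels in image_labels.items():
        expanded = set(labels)
        for label_id in labels:
            # Ids outside any confirmed group contribute nothing extra.
            expanded |= group_of.get(label_id, frozenset())
        result[image_id] = expanded
    return result


# Manually reviewed groups only; "mouse" is deliberately left out because the
# two ids are known to describe different things.
confirmed_groups = [{"/m/02g387", "/m/033cnk"}]  # egg

# Made-up image ids for illustration.
image_labels = {
    "img_a": {"/m/02g387"},
    "img_b": {"/m/020lf"},
}
shared = share_labels(image_labels, confirmed_groups)
print(shared)
```

Unreviewed or ambiguous groups, such as the three "alfa romeo giulietta" ids, are simply not listed and stay untouched.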

@rkrasin
Contributor

rkrasin commented Jun 30, 2017

I agree. Let me check with the Google team whether this is a good time to do it.
