Label dataset #2

Open
marco-c opened this issue Jan 29, 2018 · 21 comments

Comments

@marco-c
Owner

marco-c commented Jan 29, 2018

The labeling can be performed using the label.py script.

This script shows you a pair of screenshots. Press 'y' to label them as compatible, 'd' to label them as compatible with content differences (e.g. on a news site, two screenshots can be compatible even though they show different news, simply because the news shown depends on when the screenshot was taken and not on the browser), 'n' to label them as not compatible, 'RETURN' to skip them (in case you are not sure yet), or 'ESCAPE' to terminate the current labeling session and store the current results.

More details about the three-label system are in the documentation at https://github.com/marco-c/autowebcompat#labeling.
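
The key bindings described above can be sketched as a small lookup table. This is only an illustration, not the code of label.py: the `Action` enum, the key names and the `run_session` helper are assumptions, and a real GUI toolkit would deliver key codes rather than strings.

```python
from enum import Enum


class Action(Enum):
    COMPATIBLE = "y"
    COMPATIBLE_WITH_DIFFERENCES = "d"
    NOT_COMPATIBLE = "n"
    SKIP = "skip"
    QUIT = "quit"


# Hypothetical key names; the real script may use toolkit-specific key codes.
KEY_BINDINGS = {
    "y": Action.COMPATIBLE,
    "d": Action.COMPATIBLE_WITH_DIFFERENCES,
    "n": Action.NOT_COMPATIBLE,
    "RETURN": Action.SKIP,
    "ESCAPE": Action.QUIT,
}


def run_session(pairs, keys):
    """Apply a sequence of key presses to a sequence of screenshot pairs.

    Returns the labels collected before the session ended.
    """
    labels = {}
    key_iter = iter(keys)
    for pair in pairs:
        action = None
        for key in key_iter:
            action = KEY_BINDINGS.get(key)
            if action is not None:
                break  # unknown keys are ignored
        if action is None or action is Action.QUIT:
            break  # no more input, or the labeler pressed ESCAPE
        if action is Action.SKIP:
            continue  # leave this pair unlabeled
        labels[pair] = action.value
    return labels
```

Skipped pairs are simply absent from the result, so they can be shown again in a later session.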

@iamvc7
Contributor

iamvc7 commented Feb 7, 2018

@marco-c A CNN mostly learns patterns in the image (edges, corners and their correlations). From example 2 it is evident that it will be difficult for a NN to cope with such adversarial cases and classify both screenshots as compatible.

To better separate Y+D from N, or even Y from D+N, I think we can focus on finding ROIs (attention-based) and feeding those patches to the NN. This can be our fallback if nothing works very well after the training you suggested.
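
As a rough illustration of the "find ROIs and feed patches" idea, here is a sketch that ranks fixed-size patches by pixel difference. Note that this swaps in a plain pixel-difference heuristic for the attention-based approach mentioned above; the function name, the patch size and the grayscale list-of-lists image format are all assumptions.

```python
def top_difference_patches(img_a, img_b, patch=2, k=1):
    """Return the top-left (row, col) of the k patches that differ most.

    img_a and img_b are equally sized 2D lists of grayscale values.
    Patches with no difference at all are never returned.
    """
    height, width = len(img_a), len(img_a[0])
    scores = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            total = 0
            for r in range(top, min(top + patch, height)):
                for c in range(left, min(left + patch, width)):
                    total += abs(img_a[r][c] - img_b[r][c])
            scores.append((total, (top, left)))
    # Highest difference first; ties broken by position for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [pos for score, pos in scores[:k] if score > 0]
```

The returned coordinates could then be used to crop the same region from both screenshots before passing the crops to the network.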

@nok

nok commented Feb 12, 2018

At the beginning I would start with screenshots based on equal page sources (same content), so only Y vs D+N. Furthermore, I would try to normalise the device settings to bring the rendered Firefox version closer to the rendered Chrome version. And maybe we could remove the system look-and-feel elements by injecting a small script before the screenshot is taken.
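
A minimal sketch of what such a normalisation step might look like with Selenium. It relies only on `set_window_size` and `execute_script` from the WebDriver API; the CSS rule, the default sizes and the `prepare_for_screenshot` helper are hypothetical and would need tuning against real pages.

```python
# Hypothetical CSS: asks the browser to drop native (OS-themed) styling
# on form controls so both browsers render them from the page's own CSS.
NORMALIZE_CSS = (
    "input, button, select, textarea {"
    " -webkit-appearance: none; -moz-appearance: none; appearance: none; }"
)

# Small script that appends a <style> element holding the CSS passed in.
INJECT_STYLE_JS = """
var style = document.createElement('style');
style.textContent = arguments[0];
document.head.appendChild(style);
"""


def prepare_for_screenshot(driver, width=1280, height=800):
    """Give the browser a fixed window size and inject normalising CSS."""
    driver.set_window_size(width, height)
    driver.execute_script(INJECT_STYLE_JS, NORMALIZE_CSS)
```

Calling this on both the Firefox and the Chrome driver right before taking each screenshot keeps the two setups aligned. The window size is not the same as the viewport size, so the browsers may still differ by the height of their own UI.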

Repository owner deleted a comment from sagarvijaygupta Feb 20, 2018
Repository owner deleted a comment from marxmit7 Feb 20, 2018
Repository owner deleted a comment from marxmit7 Feb 20, 2018
Repository owner deleted a comment from sagarvijaygupta Feb 20, 2018
Repository owner deleted a comment from nok Feb 20, 2018
@Shashi456
Contributor

@marco-c I'd like to label parts of our dataset. How do you suggest I go about doing that? As far as I've seen, there is no script that merges labels from the label_persons directory into the actual labels directory.

@sagarvijaygupta
Collaborator

@Shashi456 I think you are talking about generate_labels.py.

@Shashi456
Contributor

Shashi456 commented May 1, 2018

@sagarvijaygupta Oh, I thought it wasn't updated for the new files :P. Regardless, shouldn't we spend some time labeling the dataset? We may need it this summer.

@marco-c
Owner Author

marco-c commented May 1, 2018

> @marco-c I'd like to label parts of our dataset. How do you suggest I go about doing that? As far as I've seen, there is no script that merges labels from the label_persons directory into the actual labels directory.

The script hasn't been updated yet to deal with bounding boxes, but you can already start labeling and pushing your labels file to the repo. Then, once we have the script done, we will actually combine the labeling done by you and the labeling done by other persons.
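
Combining labels from several people could look roughly like the sketch below. This is not the project's generate_labels.py: the data layout (a dict of per-person dicts), the function name `merge_labels` and the tie handling are assumptions, and bounding boxes are ignored entirely.

```python
from collections import Counter


def merge_labels(per_person):
    """Merge {person: {image: label}} into {image: label} by vote.

    An image is kept only if one label has strictly more votes than any
    other; tied images are left out so they can be reviewed by hand.
    """
    votes = {}
    for labels in per_person.values():
        for image, label in labels.items():
            votes.setdefault(image, Counter())[label] += 1

    merged = {}
    for image, counter in votes.items():
        ranked = counter.most_common(2)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            merged[image] = ranked[0][0]
    return merged
```

Leaving ties out rather than picking a winner makes disagreements between labelers visible instead of silently resolving them.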

@sdv4
Collaborator

sdv4 commented Jun 12, 2018

I am running label.py on my Mac, and I am finding that it is slow or unresponsive on non-y images. For instance, it takes a long time from when I try to draw a bounding box to when it shows up, and for the 'T', resizing arrow, and movement arrow to show up. Clicking on any of them causes everything to disappear until I release my mouse, plus a couple of seconds.

Is this a problem that anyone else has come up against?

@marco-c
Owner Author

marco-c commented Jun 13, 2018

It could be a Mac issue; I think nobody has tested it on a Mac yet. Could you try in a Linux VM?

@sdv4
Collaborator

sdv4 commented Jun 27, 2018

@marco-c I am not having that problem on the Linux VM, so I can label a lot faster now. A couple of questions:

  • Applying labels: suppose two images seem to differ only in the position the page has been scrolled to (e.g. image 1 looks like image 2, except that image 2 has been scrolled down and thus exposes more of the page content). Would these be considered compatible, not compatible, or compatible but different?

  • Getting my labels into the main repo: Should I open a PR for a new branch off of my forked master that is the same as the upstream master, except that it includes my new labels?

@sdv4
Collaborator

sdv4 commented Jun 27, 2018

Also, how would you label a pair of images when they show the same page except that one is in English and the other in Italian?

@sagarvijaygupta
Collaborator

@sdv4 You can refer to #220 until it is merged. Those screenshots were marked by @marco-c.
For the last one, you should mark them as incompatible and draw a bounding box on the Italian side.

@marco-c
Owner Author

marco-c commented Jun 30, 2018

> Getting my labels into the main repo: Should I open a PR for a new branch off of my forked master that is the same as the upstream master, except that it includes my new labels?

Yes! You can open a PR that says "Add some labels from Shane Sims".

@marco-c
Owner Author

marco-c commented Jun 30, 2018

Are the other two questions answered by #220?

@sagarvijaygupta
Collaborator

sagarvijaygupta commented Jun 30, 2018

@marco-c For the scroll case, we have marked them as incompatible in the screenshots, and for the Italian case we draw bounding boxes on the Italian side and mark them as incompatible in #220.

@marco-c
Owner Author

marco-c commented Jun 30, 2018

> For the scroll case, we have marked them as incompatible in the screenshots

IIRC I've marked them as compatible, didn't I?

@marco-c
Owner Author

marco-c commented Jun 30, 2018

No, maybe not; they should be incompatible (e.g. if clicking on a button causes a scroll in one browser, it should cause a scroll in the other browser too).

@sagarvijaygupta
Collaborator

driver.execute_script('arguments[0].scrollIntoView();', elem)

And if this script behaves differently in the two browsers, should that also count as an incompatibility?

@marco-c
Owner Author

marco-c commented Jun 30, 2018

> And if this script behaves differently in the two browsers, should that also count as an incompatibility?

It shouldn't, but it's hard to tell whether it was this script that failed or something else.
Maybe we should just assume this always works.

@sagarvijaygupta
Collaborator

Okay!

@Shashi456
Contributor

Shashi456 commented Jul 5, 2018

@marco-c @sagarvijaygupta While I was labeling the dataset, one of the major themes that popped up was that Chrome had a scrollbar. Almost all image pairs that have a scrollbar are very similar, but the scrollbar adds a shift that makes the overlay look incompatible.

Should we update the crawler options for Chrome to remove the scrollbar, or add a note about it to the labeling guide?

@sagarvijaygupta
Copy link
Collaborator

@Shashi456 It has already been removed in the crawler.

@marco-c marco-c added this to the 3a. Accuracy improvements milestone Dec 7, 2018
@marco-c marco-c added this to To do in Increasing dataset size via automation Dec 7, 2018
marxmit7 pushed a commit to marxmit7/autowebcompat that referenced this issue Jan 18, 2019