Add repetition of steps in different browsers without element ids. #41

sagarvijaygupta · 2018-02-07T18:57:45Z

Solves #15

sagarvijaygupta · 2018-02-08T10:36:24Z

@marco-c please review it.

marco-c · 2018-02-08T14:56:19Z

collect.py

    else:
+        elem_id = elem_attributes['id']
        elem = driver.find_element_by_id(elem_id)


This is still only using the ID. It means we will find many more possible elements in the first browser, but then won't be able to reproduce the same steps in the second browser.

@marco-c We are able to reproduce the same steps in the second browser: on line 80 I check whether the id is present in elem_attributes. In most cases it is not, so the code enters the if branch and finds the element using its attributes. The images in #15 demonstrate how the code reproduces the same steps in the second browser.

Oh you're right, sorry I hadn't noticed.
I would refactor this a bit so that the first branch is taken when we are looking for an element to choose, and the second branch when we already know the element we want to use (either with the attributes or the ID).

@marco-c I didn't understand what you mean by refactoring it. Should I create two functions?

I mean something like this:

    if elem.attributes is None:
        CODE THAT CHOOSES AN ELEMENT FROM THE PAGE
    else:
        FIND AN ALREADY CHOSEN ELEMENT BY ID OR USING ALL THE ATTRIBUTES

So the first branch implements the "choose random element" part, the second branch implements the "I already know what element I want to use, let's find it on the page" part.
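The two branches described above could be sketched roughly as follows. To keep the example runnable on its own, it works on plain dictionaries rather than Selenium elements; the `page` list, the `rng` parameter, and the `find_element` helper are hypothetical names used only for illustration, not the code in collect.py.

```python
import random


def find_element(page, elem_attributes):
    """Second branch: locate an already chosen element on the page."""
    # Prefer the ID when the recorded attributes contain one.
    if 'id' in elem_attributes:
        for elem in page:
            if elem.get('id') == elem_attributes['id']:
                return elem
        return None

    # Otherwise require every recorded attribute to match exactly.
    for elem in page:
        if elem == elem_attributes:
            return elem
    return None


def do_something(page, elem_attributes=None, rng=random):
    if elem_attributes is None:
        # First branch: choose a random element from the page.
        if not page:
            return None
        return rng.choice(page)

    # Second branch: we already know which element we want.
    return find_element(page, elem_attributes)
```

With this shape, the first browser calls `do_something(page)` and records the attributes of whatever was chosen, while the second browser calls `do_something(page, recorded_attributes)`.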

@marco-c I made the required change. Please check.

marco-c · 2018-02-08T15:05:46Z

I'm thinking that to avoid confusion, maybe we can actually only have two possible formats:
WEBCOMPAT-ID_BROWSER.png for the first load screenshots;
WEBCOMPAT-ID_SEQUENCE-NUM_BROWSER.png for the screenshots generated after interacting with an element (and a single WEBCOMPAT-ID_SEQUENCE.txt file containing the attributes of the elements interacted with).

And we have to convert the already existing data to this format.
To do it, we should write a script that renames the WEBCOMPAT-ID_ELEMENT-ID_SEQUENCE-NUM_BROWSER.png files to WEBCOMPAT-ID_SEQUENCE-NUM_BROWSER.png, storing the ELEMENT-IDs in a WEBCOMPAT-ID_SEQUENCE.txt file.
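As a minimal sketch of the two proposed screenshot names, assuming the WEBCOMPAT-ID and SEQUENCE-NUM are integers and the browser is a short string such as `firefox` or `chrome` (the helper names are hypothetical):

```python
def first_load_name(webcompat_id, browser):
    # WEBCOMPAT-ID_BROWSER.png, taken right after the page loads.
    return '%d_%s.png' % (webcompat_id, browser)


def interaction_name(webcompat_id, sequence_num, browser):
    # WEBCOMPAT-ID_SEQUENCE-NUM_BROWSER.png, taken after interacting
    # with the element at position sequence_num in the sequence.
    return '%d_%d_%s.png' % (webcompat_id, sequence_num, browser)
```

For example, `interaction_name(1234, 0, 'chrome')` gives `1234_0_chrome.png`, with no element ID in the name.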

marco-c · 2018-02-08T20:38:00Z

collect.py

-            if elem_id == '':
-                continue
+            # Get all the attributes of the child.
+            child_attributes = driver.execute_script('var elem_attribute = {}; for (i = 0; i < arguments[0].attributes.length; i++) { elem_attribute[arguments[0].attributes[i].name] = arguments[0].attributes[i].value }; return elem_attribute;', child)


Can you make this script more readable by using a multiline string? Like in wait_loaded.

It would be great if we could use the Selenium API to do the same, but for now it's fine.

@marco-c I will try to do it with the Selenium API.

marco-c · 2018-02-08T20:39:15Z

collect.py

+            for child in children:
+
+                # Get all the attributes of the child.
+                child_attributes = driver.execute_script('var elem_attribute = {}; for (i = 0; i < arguments[0].attributes.length; i++) { elem_attribute[arguments[0].attributes[i].name] = arguments[0].attributes[i].value }; return elem_attribute;', child)


Since this is repeated, it should be in a separate function.
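One possible shape for such a helper, with the script written as a multiline string, is sketched below. It assumes `driver` is a Selenium WebDriver (whose `execute_script` passes extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on) and `child` is a WebElement; the constant name `ATTRIBUTES_SCRIPT` is hypothetical.

```python
ATTRIBUTES_SCRIPT = """
    let elem_attribute = {};

    for (let i = 0; i < arguments[0].attributes.length; i++) {
      elem_attribute[arguments[0].attributes[i].name] = arguments[0].attributes[i].value;
    }

    return elem_attribute;
"""


def get_all_attributes(driver, child):
    # The element is handed to the script as arguments[0]; the returned
    # JavaScript object comes back to Python as a dict of name -> value.
    return driver.execute_script(ATTRIBUTES_SCRIPT, child)
```

Both call sites can then use `get_all_attributes(driver, child)` instead of repeating the one-line script.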

marco-c · 2018-02-08T20:39:52Z

collect.py


            # If the element is not displayed or is disabled, the user can't interact with it. Skip
            # non-displayed/disabled elements, since we're trying to mimic a real user.
            if not child.is_displayed() or not child.is_enabled():
                continue
-


Don't remove this newline

marco-c · 2018-02-08T20:40:30Z

collect.py

+                # Get all the attributes of the child.
+                child_attributes = driver.execute_script('var elem_attribute = {}; for (i = 0; i < arguments[0].attributes.length; i++) { elem_attribute[arguments[0].attributes[i].name] = arguments[0].attributes[i].value }; return elem_attribute;', child)
+
+                # If the element is not displayed or is disabled, the user can't interact with it. Skip


I think we can ignore this part, as we already know that the element was displayed and enabled in the first browser.

@marco-c Because the two browsers can render the page differently, one browser might display the element while the other doesn't. Should I remove it?

If that happens, it's a compatibility issue and we want to detect it, so it's useful to take a screenshot even if the element is not displayed in the second browser.
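The rule discussed here can be sketched as a small predicate. The function name `should_skip` and the `replaying` flag are hypothetical; the only calls assumed on the element are `is_displayed()` and `is_enabled()`, which Selenium WebElements provide.

```python
def should_skip(elem, replaying):
    """Return True if the element should be skipped."""
    if replaying:
        # Second browser: never skip, so that a screenshot is still taken.
        # A hidden or disabled element here is itself a compatibility signal.
        return False

    # First browser: mimic a real user, who can't use hidden or disabled elements.
    return not elem.is_displayed() or not elem.is_enabled()
```

In other words, the displayed/enabled check belongs only to the branch that chooses a new element.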

Oh ok! I get it.

marco-c · 2018-02-08T20:42:05Z

collect.py

@@ -197,6 +220,10 @@ def run_tests(firefox_driver, chrome_driver):
               not os.path.exists('data/%d_chrome.png' % bug['id']):
                sequence = run_test(bug, 'firefox', firefox_driver)
                run_test(bug, 'chrome', chrome_driver, sequence)
+
+                with open("data/" + str(bug['id']) + ".txt", 'w') as f:
+                    f.write(json.dumps(sequence))


Let's write one object of the sequence per line, instead of a single JSON object for the whole sequence. This way it's easier to see diffs.
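A minimal sketch of the one-object-per-line format, assuming `sequence` is a list of attribute dictionaries. The function names are hypothetical, and `sort_keys=True` is an addition not requested in the review; it only keeps the key order stable so that diffs stay small.

```python
import json


def write_sequence(path, sequence):
    # One JSON object per line, so a change to one step touches one line.
    with open(path, 'w') as f:
        for elem_attributes in sequence:
            f.write(json.dumps(elem_attributes, sort_keys=True) + '\n')


def read_sequence(path):
    # Parse each non-empty line back into a dictionary.
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

Reading the file back line by line with `read_sequence` returns the original list of dictionaries.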

marco-c · 2018-02-08T20:42:40Z

collect.py

-            print('  - Using %s' % elem_id)
+            print('  - Using %s' % elem_attributes)
+            image_file = str(bug['id']) + '_' + str(i) + '_' + browser
+            screenshot(driver, 'data/%s.png' % (image_file))


Since this is breaking compatibility with the old way of doing things, before merging this we will also need a script to convert the current data to the new format.

Updated collect.py

marco-c · 2018-02-09T10:50:08Z

collect.py

-def do_something(driver, elem_id=None):
+def get_all_attributes(driver, child):
+    child_attributes = driver.execute_script("""
+        var elem_attribute = {};


Use let instead of var.

marco-c · 2018-02-09T10:50:17Z

collect.py

+    child_attributes = driver.execute_script("""
+        var elem_attribute = {};
+
+        for (i = 0; i < arguments[0].attributes.length; i++) {


Use let i

marco-c · 2018-02-09T10:52:31Z

collect.py

@@ -197,6 +235,11 @@ def run_tests(firefox_driver, chrome_driver):
               not os.path.exists('data/%d_chrome.png' % bug['id']):
                sequence = run_test(bug, 'firefox', firefox_driver)
                run_test(bug, 'chrome', chrome_driver, sequence)
+
+                with open("data/" + str(bug['id']) + ".txt", 'w') as f:


Nit: Use % instead of + as it is more readable

'data/%d.txt' % bug['id']

marco-c · 2018-02-09T10:53:17Z

collect.py

+            children = buttons + links + inputs
+
+            for child in children:
+


Nit: no newline here

This is still here 😄

marco-c · 2018-02-09T10:55:05Z

collect.py

@@ -92,12 +107,9 @@ def do_something(driver, elem_id=None):
        random.shuffle(children)

        for child in children:
-            elem_id = child.get_attribute('id')



Nit: no newline here

This is still here 😄

marco-c · 2018-02-09T11:09:17Z

rename_images.py

@@ -0,0 +1,19 @@
+from os import listdir, rename


The script should also generate the .txt files accompanying the images.
I suggest you use split('_') to get the components from the name. If there are two components, it's an image in the format WEBCOMPAT-ID_BROWSER.png and you can skip it. If there are more than two components, it's an image in the format WEBCOMPAT-ID_ELEMENT-ID_SEQUENCE-NUM_BROWSER.png: you have to rename it and put its element ID in the .txt file, in order of SEQUENCE-NUM. You can use https://github.com/marco-c/autowebcompat/blob/master/data_inconsistencies.py#L12 as inspiration.
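The conversion described above could be planned along these lines. This is a sketch under assumptions, not the rename_images.py from this PR: `plan_conversion` is a hypothetical name, it only computes what to rename and what to write without touching the disk, it assumes element IDs may themselves contain underscores, and it skips any name with fewer than four components (so the two-component first-load images and already converted names are both left alone).

```python
import os
from collections import defaultdict


def plan_conversion(file_names):
    """Return (renames, sequences) for old-format screenshot names."""
    renames = {}
    steps = defaultdict(dict)  # webcompat_id -> {sequence_num: element_id}

    for name in file_names:
        stem, ext = os.path.splitext(name)
        if ext != '.png':
            continue

        parts = stem.split('_')
        if len(parts) < 4:
            # Not WEBCOMPAT-ID_ELEMENT-ID_SEQUENCE-NUM_BROWSER: nothing to do.
            continue

        webcompat_id, browser = parts[0], parts[-1]
        sequence_num = int(parts[-2])
        # Everything between the first part and the last two is the element ID.
        element_id = '_'.join(parts[1:-2])

        renames[name] = '%s_%d_%s.png' % (webcompat_id, sequence_num, browser)
        steps[webcompat_id][sequence_num] = element_id

    # Order each bug's element IDs by sequence number.
    sequences = {
        webcompat_id: [by_num[n] for n in sorted(by_num)]
        for webcompat_id, by_num in steps.items()
    }
    return renames, sequences
```

The caller would then apply `renames` with `os.rename` and write each list in `sequences` to the matching .txt file, one entry per line.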

marco-c · 2018-02-09T17:16:57Z

collect.py

        for child in children:
-            elem_id = child.get_attribute('id')



Nit: remove newline here

marco-c · 2018-02-09T17:17:16Z

collect.py

+            inputs = body.find_elements_by_tag_name('input')
+            children = buttons + links + inputs
+            for child in children:
+


Nit: remove newline here

marco-c · 2018-02-09T17:17:30Z

collect.py

+                # If the element is not displayed or is disabled, the user can't interact with it. Skip
+                # non-displayed/disabled elements, since we're trying to mimic a real user.
+                if not child.is_displayed() or not child.is_enabled():
+                    continue


As discussed, this part should be removed.

marco-c · 2018-02-09T17:19:24Z

rename_images.py

+    parts = os.path.splitext(f)[0].split('_')
+    if len(parts) <= 2:
+        continue
+    if parts[0] not in image_info.keys():


Assign names to the parts before this line, otherwise the rest of the code is hard to read.

marco-c · 2018-02-09T17:27:25Z

rename_images.py

+
+for key, attributes in image_info.items():
+    with open("./data/%s.txt" % key, "w") as text_file:
+        attributes = sorted(attributes.items())


Isn't this going to sort by element ID rather than sequence number?

@marco-c No, actually, because attributes is a dictionary of { seq_no : elem_id }, so it sorts by sequence number.
I renamed the variables to make this clearer.

marco-c · 2018-02-10T00:46:55Z

collect.py

+
+        for (let i = 0; i < arguments[0].attributes.length; i++) {
+          elem_attribute[arguments[0].attributes[i].name] = arguments[0].attributes[i].value;
+          }


Nit: This is not aligned correctly.

marco-c · 2018-02-10T00:47:22Z

collect.py

+          }
+
+        return elem_attribute;
+        """, child)


Nit: align this with child_attributes.

marco-c · 2018-02-10T00:47:49Z

collect.py

@@ -90,14 +104,10 @@ def do_something(driver, elem_id=None):
        children = buttons + links + inputs

        random.shuffle(children)
-


Nit: don't remove newline here 😄

@marco-c Is there any way to know where to put a newline and where not to? 😆

Don't remove newlines unless they are related to your changes;

No newlines after a if, for, def, etc.;

Newline to separate two logic blocks.

@marco-c I will make sure no newline issues come up in my future PRs.

Don't worry, they are minor annoyances.

marco-c · 2018-02-10T00:48:34Z

collect.py

+            links = body.find_elements_by_tag_name('a')
+            inputs = body.find_elements_by_tag_name('input')
+            children = buttons + links + inputs
+            for child in children:


Nit: newline before this for

marco-c · 2018-02-10T00:50:22Z

rename_images.py

+        seq_no_and_elem_id = sorted(seq_no_and_elem_id.items())
+        for value in seq_no_and_elem_id:
+            sequence_no = value[0]
+            elem_id = value[1][0]


This is going to use only the first part of the element ID (e.g. for an_element_id, this variable would be an). I would replace line 12 with '_'.join(parts[1:-2]) and this line with value[1].

@marco-c Done!

marco-c · 2018-02-10T12:35:39Z

rename_images.py

+        for value in seq_no_and_elem_id:
+            sequence_no = value[0]
+            elem_id = value[1]
+            text_file.write("%s %s\n" % (sequence_no, elem_id))


Note that here you're writing text, but in collect.py you are writing json objects.
The format of the files should be the same.

@marco-c To keep the format the same, rename_images.py now writes one {"id" : "elem_id"} object per line, ordered by sequence number.

It would be better to do json.dumps({'id': elem_id})

It doesn't matter though, the end result is the same.
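To illustrate why the end result is the same for ordinary IDs, here is a small comparison. `manual_line` is a hypothetical stand-in for building the line by string formatting; it is an assumption about how such a line might be produced, not the code from the PR.

```python
import json


def manual_line(elem_id):
    # Hand-built line, as a string-formatting approach might produce it.
    return '{"id": "%s"}' % elem_id


def dumped_line(elem_id):
    # Let the json module handle quoting and escaping.
    return json.dumps({'id': elem_id})
```

For an ID such as `login` both functions return the same text. They differ only when the ID contains characters that JSON must escape, such as a double quote, where the `json.dumps` version still parses back correctly.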

Can you run the script on your machine to rename the files? Then zip the resulting data directory and upload it on Dropbox or some other service. Before merging this PR, I will have to update get_dependencies.py to use the new data.zip file.

@marco-c I will try to upload it today. Got a better Internet connection 😄

@marco-c
https://www.dropbox.com/s/rsdq581uteu3mtc/data.zip?dl=0
this is the link with the renamed files.

sagarvijaygupta · 2018-02-13T12:16:30Z

@marco-c I just realized the assert statement in get_images in utils.py called in labels.py will fail. Should I keep the txt files in another folder?

marco-c · 2018-02-13T14:38:13Z

> @marco-c I just realized the assert statement in get_images in utils.py called in labels.py will fail. Should I keep the txt files in another folder?

We should update get_all_images in utils.py to only return png files, and update prepare_images to use get_all_images instead of os.listdir.

The rename_images.py script should also update the labels files.
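Restricting the listing to PNG files, as suggested above, might look like the following sketch. The name `list_png_files` and its `data_dir` parameter are hypothetical; the actual signature of get_all_images in utils.py is not shown in this thread.

```python
import os


def list_png_files(data_dir):
    """Return the sorted names of the .png files directly inside data_dir."""
    return sorted(
        name for name in os.listdir(data_dir)
        if os.path.splitext(name)[1].lower() == '.png'
    )
```

With a filter like this, the new .txt sequence files can stay in the same directory as the screenshots without being picked up as images.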

marco-c · 2018-02-13T14:46:45Z

> We should update get_all_images in utils.py to only return png files, and update prepare_images to use get_all_images instead of os.listdir.

I've done these changes myself.

The only thing missing now is rewriting the label files with the new image names. Once that's done, I will replace the current data.zip file with the one you uploaded, then you can remove the rename_images.py script from the PR and we can finally merge it!

sagarvijaygupta · 2018-02-13T15:48:31Z

@marco-c I made the required changes (removed rename_images.py and updated labels.csv and the other CSV files).

marco-c · 2018-02-13T18:22:13Z

Thanks for the hard work! I'm verifying the new data.zip file you've uploaded and then I'll merge this.

Add repetition of steps in different browsers without element ids.

4cde9b6

Solves marco-c#15

marco-c reviewed Feb 8, 2018

Update to save a WEBCOMPAT-ID_SEQUENCE.txt file.

b435e05

sagarvijaygupta force-pushed the id_less branch from ced0fca to b435e05 Compare February 8, 2018 16:48

sagarvijaygupta added 2 commits February 9, 2018 00:40

Refactoring collect.py.

e99e344

Merge branch 'master' of github.com:marco-c/autowebcompat into id_less

8bec55d

marco-c reviewed Feb 8, 2018

Add script to change file names according to new convention.

86612cd

Updated collect.py

marco-c reviewed Feb 9, 2018

Updated rename_images.py to write to file with sequence number.

97f4bd3

marco-c reviewed Feb 9, 2018

Update with new variables for better understanding.

72735fa

marco-c reviewed Feb 10, 2018

Merge branch 'master' of github.com:marco-c/autowebcompat into id_less

5d4c0bc

sagarvijaygupta force-pushed the id_less branch from 914efd8 to 5d4c0bc Compare February 10, 2018 12:06

marco-c reviewed Feb 10, 2018

sagarvijaygupta force-pushed the id_less branch 2 times, most recently from fa61e6e to 3ff0cc0 Compare February 13, 2018 05:06

Updated rename_images.py to work same as collect.py

9e5f074

sagarvijaygupta force-pushed the id_less branch from 3ff0cc0 to 9e5f074 Compare February 13, 2018 05:06

Update labels.csv according to new naming convention.

eb7ef6c

Merge branch 'master' into id_less

598efa2

marco-c approved these changes Feb 14, 2018

marco-c merged commit 5bb234e into marco-c:master Feb 14, 2018

sagarvijaygupta deleted the id_less branch May 29, 2018 18:19
