Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add repetition of steps in different browsers without element ids. #41

Merged
merged 11 commits into from
Feb 14, 2018

Conversation

sagarvijaygupta
Copy link
Collaborator

Solves #15

@sagarvijaygupta
Copy link
Collaborator Author

@marco-c please review it.

collect.py Outdated
else:
elem_id = elem_attributes['id']
elem = driver.find_element_by_id(elem_id)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still only using the ID. It means we will find many more possible elements in the first browser, but then won't be able to reproduce the same steps in the second browser.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c We are able to reproduce same steps in second browser as in line 80 I do check if the id is present in elem_attributes. In most cases it is not found so the code enters the if condition and finds the element on the basis of attributes. Images in #15 demonstrate how the code reproduces the same steps for the second browser.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're right, sorry I hadn't noticed.
I would refactor this a bit so that the first branch is taken when we are looking for an element to choose, and the second branch when we already know the element we want to use (either with the attributes or the ID).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c I didn't understand what do you mean by refactoring it? Should I create two functions?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean something like this:

if elem.attributes is None:
    CODE THAT CHOOSES AN ELEMENT FROM THE PAGE
else:
    FIND AN ALREADY CHOSEN ELEMENT BY ID OR USING ALL THE ATTRIBUTES

So the first branch implements the "choose random element" part, the second branch implements the "I already know what element I want to use, let's find it on the page" part.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c I made the required change. Please check.

@marco-c
Copy link
Owner

marco-c commented Feb 8, 2018

I'm thinking that to avoid confusion, maybe we can actually only have two possible formats:
WEBCOMPAT-ID_BROWSER.png for the first load screenshots;
WEBCOMPAT-ID_SEQUENCE-NUM_BROWSER.png for the screenshots generated after interacting with an element (and a single WEBCOMPAT-ID_SEQUENCE.txt file containing the attributes of the elements interacted with).

And we have to convert the already existing data to this format.
To do it, we should write a script that renames the WEBCOMPAT-ID_ELEMENT-ID_SEQUENCE-NUM_BROWSER.png files to WEBCOMPAT-ID_SEQUENCE-NUM_BROWSER.png, storing the ELEMENT-IDs in a WEBCOMPAT-ID_SEQUENCE.txt file.

collect.py Outdated
if elem_id == '':
continue
# Get all the attributes of the child.
child_attributes = driver.execute_script('var elem_attribute = {}; for (i = 0; i < arguments[0].attributes.length; i++) { elem_attribute[arguments[0].attributes[i].name] = arguments[0].attributes[i].value }; return elem_attribute;', child)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this script more readable by using a multiline string? Like in wait_loaded.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if we could use the Selenium API to do the same, but for now it's fine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c I will try to do it with Selenium API.

collect.py Outdated
for child in children:

# Get all the attributes of the child.
child_attributes = driver.execute_script('var elem_attribute = {}; for (i = 0; i < arguments[0].attributes.length; i++) { elem_attribute[arguments[0].attributes[i].name] = arguments[0].attributes[i].value }; return elem_attribute;', child)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is repeated, it should be in a separate function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

collect.py Outdated

# If the element is not displayed or is disabled, the user can't interact with it. Skip
# non-displayed/disabled elements, since we're trying to mimic a real user.
if not child.is_displayed() or not child.is_enabled():
continue

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't remove this newline

collect.py Outdated
# Get all the attributes of the child.
child_attributes = driver.execute_script('var elem_attribute = {}; for (i = 0; i < arguments[0].attributes.length; i++) { elem_attribute[arguments[0].attributes[i].name] = arguments[0].attributes[i].value }; return elem_attribute;', child)

# If the element is not displayed or is disabled, the user can't interact with it. Skip
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can ignore this part, as we already know that the element was displayed and not enabled in the first browser.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c Because of two browsers can display in different way it might be that one browser displays the element while other doesn't. Should I remove it?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that happens, it's a compatibility issue and we want to detect it, so it's useful to take a screenshot even if the element is not displayed in the second browser.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok! I get it.

collect.py Outdated
@@ -197,6 +220,10 @@ def run_tests(firefox_driver, chrome_driver):
not os.path.exists('data/%d_chrome.png' % bug['id']):
sequence = run_test(bug, 'firefox', firefox_driver)
run_test(bug, 'chrome', chrome_driver, sequence)

with open("data/" + str(bug['id']) + ".txt", 'w') as f:
f.write(json.dumps(sequence))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's write an object of the sequence per line, instead of a full json object. This way it's easier to see diffs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok!

print(' - Using %s' % elem_id)
print(' - Using %s' % elem_attributes)
image_file = str(bug['id']) + '_' + str(i) + '_' + browser
screenshot(driver, 'data/%s.png' % (image_file))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is breaking compatibility with the old way of doing things, before merging this we will also need a script to convert the current data to the new format.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

collect.py Outdated
def do_something(driver, elem_id=None):
def get_all_attributes(driver, child):
child_attributes = driver.execute_script("""
var elem_attribute = {};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use let instead of var.

collect.py Outdated
child_attributes = driver.execute_script("""
var elem_attribute = {};

for (i = 0; i < arguments[0].attributes.length; i++) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use let i

collect.py Outdated
@@ -197,6 +235,11 @@ def run_tests(firefox_driver, chrome_driver):
not os.path.exists('data/%d_chrome.png' % bug['id']):
sequence = run_test(bug, 'firefox', firefox_driver)
run_test(bug, 'chrome', chrome_driver, sequence)

with open("data/" + str(bug['id']) + ".txt", 'w') as f:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Use % instead of + as it is more readable

'data/%d.txt' % bug['id']

collect.py Outdated
children = buttons + links + inputs

for child in children:

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no newline here

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still here 😄

collect.py Outdated
@@ -92,12 +107,9 @@ def do_something(driver, elem_id=None):
random.shuffle(children)

for child in children:
elem_id = child.get_attribute('id')

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no newline here

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still here 😄

rename_images.py Outdated
@@ -0,0 +1,19 @@
from os import listdir, rename
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script should also generate the .txt files accompanying the images.
I suggest you use split('_') to get the components from the name (if there are two components, it's an image in the format WEBCOMPAT-ID_BROWSER.png and you can skip it; if there are more than two components, it's an image in the format WEBCOMPAT-ID_ELEMENT-ID_SEQUENCE-NUM_BROWSER.png and you have to rename it and put in the .txt file in order of SEQUENCE-NUM). You can use this https://github.com/marco-c/autowebcompat/blob/master/data_inconsistencies.py#L12 as inspiration

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

collect.py Outdated
for child in children:
elem_id = child.get_attribute('id')

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove newline here

collect.py Outdated
inputs = body.find_elements_by_tag_name('input')
children = buttons + links + inputs
for child in children:

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove newline here

collect.py Outdated
# If the element is not displayed or is disabled, the user can't interact with it. Skip
# non-displayed/disabled elements, since we're trying to mimic a real user.
if not child.is_displayed() or not child.is_enabled():
continue
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, this part should be removed.

rename_images.py Outdated
parts = os.path.splitext(f)[0].split('_')
if len(parts) <= 2:
continue
if parts[0] not in image_info.keys():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assign names to the parts before this line, otherwise the rest of the code is hard to read.

rename_images.py Outdated

for key, attributes in image_info.items():
with open("./data/%s.txt" % key, "w") as text_file:
attributes = sorted(attributes.items())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this going to sort by element ID rather than sequence number?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c No actually because attributes is a dictionary of { seq_no : elem_id }
I changed the variables for better understanding.

collect.py Outdated

for (let i = 0; i < arguments[0].attributes.length; i++) {
elem_attribute[arguments[0].attributes[i].name] = arguments[0].attributes[i].value;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This is not aligned correctly.

collect.py Outdated
}

return elem_attribute;
""", child)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: align this with child_attributes.

collect.py Outdated
@@ -90,14 +104,10 @@ def do_something(driver, elem_id=None):
children = buttons + links + inputs

random.shuffle(children)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: don't remove newline here 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c Is there any way to know where to put newline and where not? 😆

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Don't remove newlines unless they are related to your changes;
  2. No newlines after a if, for, def, etc.;
  3. Newline to separate two logic blocks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c I will make sure nothing related to newline comes in my future PRs

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry, they are minor annoyances.

collect.py Outdated
links = body.find_elements_by_tag_name('a')
inputs = body.find_elements_by_tag_name('input')
children = buttons + links + inputs
for child in children:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: newline before this for

rename_images.py Outdated
seq_no_and_elem_id = sorted(seq_no_and_elem_id.items())
for value in seq_no_and_elem_id:
sequence_no = value[0]
elem_id = value[1][0]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to use only the first part of the element ID (e.g. for an_element_id, this variable would be an). I would replace line 12 with '_'.join(parts[1:-2]) and this line with value[1].

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c Done!

rename_images.py Outdated
for value in seq_no_and_elem_id:
sequence_no = value[0]
elem_id = value[1]
text_file.write("%s %s\n" % (sequence_no, elem_id))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that here you're writing text, but in collect.py you are writing json objects.
The format of the files should be the same.

Copy link
Collaborator Author

@sagarvijaygupta sagarvijaygupta Feb 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c For format to be same I made {"id" : "elem_id"} per line according to sequence number in rename_images.py.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to do json.dumps({'id': elem_id})

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter though, the end result is the same.

Can you run the script on your machine to rename the files? Then zip the resulting data directory and upload it on Dropbox or some other service. Before merging this PR, I will have to update get_dependencies.py to use the new data.zip file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c I will try to upload it today. Got a better Internet connection 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-c
https://www.dropbox.com/s/rsdq581uteu3mtc/data.zip?dl=0
this is the link with the renamed files.

@sagarvijaygupta
Copy link
Collaborator Author

sagarvijaygupta commented Feb 13, 2018

@marco-c I just realized the assert statement in get_images in utils.py called in labels.py will fail. Should I keep the txt files in another folder?

@marco-c
Copy link
Owner

marco-c commented Feb 13, 2018

@marco-c I just realized the assert statement in get_images in utils.py called in labels.py will fail. Should I keep the txt files in another folder?

We should update get_all_images in utils.py to only return png files, and update prepare_images to use get_all_images instead of os.listdir.

The rename_images.py script should also update the labels files.

@marco-c
Copy link
Owner

marco-c commented Feb 13, 2018

We should update get_all_images in utils.py to only return png files, and update prepare_images to use get_all_images instead of os.listdir.

I've done these changes myself.

The only thing missing now is rewriting the label files with the new image names. Once that's done, I will replace the current data.zip file with the one you uploaded, then you can remove the rename_images.py script from the PR and we can finally merge it!

@sagarvijaygupta
Copy link
Collaborator Author

@marco-c I made the required changes. (removed rename_images.py and updated the labels.csv and other csv)

@marco-c
Copy link
Owner

marco-c commented Feb 13, 2018

Thanks for the hard work! I'm verifying the new data.zip file you've uploaded and then I'll merge this.

@marco-c marco-c merged commit 5bb234e into marco-c:master Feb 14, 2018
@propr
Copy link

propr bot commented Feb 14, 2018

Please provide your feedback on this pull request here.

Privacy statement: We don't store any personal information such as your email address or name. We ask for GitHub authentication as an anonymous identifier to account for duplicate feedback entries and to see people specific preferences.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants