Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Fixed bug FormRequest.from_response() clickdata ignores input[type=image] #3153

Merged
merged 6 commits into from Mar 21, 2018

Conversation

ViralMehtaSWE
Copy link
Contributor

@ViralMehtaSWE ViralMehtaSWE commented Mar 3, 2018

BugFix for issue #3139

@codecov
Copy link

codecov bot commented Mar 3, 2018

Codecov Report

Merging #3153 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3153      +/-   ##
==========================================
+ Coverage   82.11%   82.12%   +<.01%     
==========================================
  Files         228      228              
  Lines        9588     9593       +5     
  Branches     1385     1385              
==========================================
+ Hits         7873     7878       +5     
  Misses       1456     1456              
  Partials      259      259
Impacted Files Coverage Δ
scrapy/http/request/form.py 95.9% <ø> (ø) ⬆️
scrapy/core/downloader/tls.py 90% <0%> (+1.42%) ⬆️

@ViralMehtaSWE
Copy link
Contributor Author

@cathalgarvey , please review the PR. Thanks :)

@cathalgarvey
Copy link
Contributor

Thanks @virmht I'll give it a look :)

@cathalgarvey
Copy link
Contributor

Looking at this, I think it's correct. And it includes a test-case.

I'm not sure though whether the XPath should be reformatted to use a single regex pattern, matching ^(submit|image)$ for example, or whether we should accept that the list of OR predicates may grow, and reformat the Xpath expression to keep one OR option per line?

E.g., the difference between a DRYer single-pattern match for images and submits:

form.xpath(
    'descendant::*[(self::input or self::button)'
    ' and re:test(@type, "^(submit|image)$", "i")]'
   '|descendant::button[not(@type)]',
    namespaces={"re": "http://exslt.org/regular-expressions"})

..and with one-option-per-line, keeping options clearly separate to assist debug..

form.xpath(
    'descendant::*[(self::input or self::button) and re:test(@type, "^submit$", "i")]'
    '|descendant::*[(self::input or self::button) and re:test(@type, "^image$", "i")]'
    '|descendant::button[not(@type)]',
    namespaces={"re": "http://exslt.org/regular-expressions"})

@kmike - I know you care about DRY, but Xpath expressions will be missed by linters and other automated tools, so maybe clarity in each case is a useful way to keep it clean? What do you think?

@virmht - What's your opinion? :)

@cathalgarvey
Copy link
Contributor

(With or without changes, though, consider this MRG+1 from me - I'll change the title to reflect)

@cathalgarvey cathalgarvey changed the title Fixed bug FormRequest.from_response() clickdata ignores input[type=image] [MRG+1] Fixed bug FormRequest.from_response() clickdata ignores input[type=image] Mar 6, 2018
@ViralMehtaSWE
Copy link
Contributor Author

ViralMehtaSWE commented Mar 6, 2018

@cathalgarvey, I got your point. Which pattern to follow depends on the expected length of the list of OR predicates. If the list size doesn't increase by more than 3-4 then, maybe, we will be just fine with using a single regex pattern, otherwise, using the one-predicate-per-line pattern(used by me) will be a better choice. So, how long, do you think, can the list get ? What's your opinion ? :)

'descendant::*[(self::input or self::button)'
' and re:test(@type, "^submit$", "i")]'
'|descendant::button[not(@type)]',
namespaces={"re": "http://exslt.org/regular-expressions"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it me or these lines are changed from 4 spaces indentation to 5 spaces?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR it would be much easier to review if you keep the indentation at 4 spaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will make the necessary changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dangra @cathalgarvey Resolved this review. Please see the new changes.

@@ -172,6 +172,8 @@ def _get_clickable(clickdata, form):
el for el in form.xpath(
'descendant::*[(self::input or self::button)'
' and re:test(@type, "^submit$", "i")]'
'|descendant::*[(self::input or self::button)'
' and re:test(@type, "^image$", "i")]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering button[type=image] isn't valid, I would go for:

descendant::input[re.test(@type, "^(submit|image)$", "i")]
|descendant::button[not(@type) or re.test(@type, "^submit$", "i")]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! @dangra, please see the new changes.

@ViralMehtaSWE
Copy link
Contributor Author

@dangra, @cathalgarvey, - suggestions are welcome :)

@dangra dangra merged commit 6c3970e into scrapy:master Mar 21, 2018
@dangra
Copy link
Member

dangra commented Mar 21, 2018

thanks @virmht !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants