Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Various FormRequest tests+fixes #1597

Merged
merged 7 commits into from
Dec 4, 2015

Conversation

Digenis
Copy link
Member

@Digenis Digenis commented Nov 13, 2015

#1595 FormRequest should consider input type values case-insensitive
#1596 FormRequest doesn't handle input elements without type attribute

Plus: I noticed a maybe-unicode xpath wasn't encoded
when raising an exception whose str message is formatting it with %s.

@codecov-io
Copy link

Current coverage is 82.76%

Merging #1597 into master will increase coverage by +0.03% as of 9a30a3d

Powered by Codecov. Updated on successful CI builds.

@kmike kmike changed the title Various FormRequest tests+fixes [MRG+1] Various FormRequest tests+fixes Nov 13, 2015
@kmike
Copy link
Member

kmike commented Nov 13, 2015

Awesome, thanks!

'|descendant::input[@type!="submit" and @type!="image" and @type!="reset"'
'and ((@type!="checkbox" and @type!="radio") or @checked)]')
'|descendant::input[not(@type) or @type['
' translate(., "SUBMIT", "submit") != "submit"'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lxml has support for EXSLT regex extensions:
https://stackoverflow.com/questions/2755950/how-to-use-regular-expression-in-lxml-xpath
http://exslt.org/regexp/functions/test/index.html

boolean regexp:test(string, string, 'i')

I think it would be easier to read

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re:test(@type, "^(submit|image|reset)$", "i") only for the first 3
and separately the checkbox and the radio.
Alternatively it's a regexp per value.
I didn't test it but I think there should be a performance penalty
invoking python's re module instead of leaving it to libxml.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

" there should be a performance penalty"
probably. I'd be curious to see a comparison.

and I'd still go for the regexp version for readability

@Digenis
Copy link
Member Author

Digenis commented Nov 19, 2015

I changed it according to @redapple's suggestion.
Is there anything else left for this PR?

@kmike
Copy link
Member

kmike commented Nov 23, 2015

@redapple what do you think about the new implementation?

@redapple
Copy link
Contributor

@Digenis , @kmike ,
looks alright to me,
although the ../@checked part feels a bit odd.
But more correct now than before (I mean before the PR)
so LGTM

FWIW, I was thinking of something like

inputs = form.xpath(r'''descendant::textarea |
                        descendant::select |
                        descendant::input[not(
                            @type and
                            (
                                re:test(@type, "^(submit|image|reset)$", "i") or
                                (not(@checked) and re:test(@type, "^(checkbox|radio)$", "i"))
                            )
                        )]''',
                    namespaces={"re": "http://exslt.org/regular-expressions"})

@Digenis
Copy link
Member Author

Digenis commented Dec 2, 2015

Is there something left to do for this PR?

dangra added a commit that referenced this pull request Dec 4, 2015
[MRG+1] Various FormRequest tests+fixes
@dangra dangra merged commit dc65026 into scrapy:master Dec 4, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants