Skip to content

Conversation

sortafreel
Copy link
Contributor

@sortafreel sortafreel commented Jun 15, 2019

CSS immediate children pseudo-element (from CSS :scope > div to XPath child::div or ./div). Works only at the start of a selector. https://developer.mozilla.org/en-US/docs/Web/CSS/:scope

Needed to get immediate children of a processed selector in Scrapy:

for product in response.css('.product'):
    description = product.css(':scope > div::text').get()

HTML example:

<body>
    <div class="product">
        <p class="price">
            <div>$99.99</div>
        </p>
        <div>
            Some description text
        </div>
        <p>
            <div>Shipping info</div>
        </p>
    </div>
    <div class="product">
        <p class="price">
            <div>$199.99</div>
        </p>
        <div>
            Another description text
        </div>
        <p>
            <div>Another shipping info</div>
        </p>
    </div>
</body>

@sortafreel
Copy link
Contributor Author

Testing, adding custom tests.

@codecov
Copy link

codecov bot commented Jun 15, 2019

Codecov Report

Merging #87 into master will decrease coverage by 0.1%.
The diff coverage is 80%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #87      +/-   ##
==========================================
- Coverage   95.25%   95.14%   -0.11%     
==========================================
  Files           2        2              
  Lines         695      700       +5     
  Branches      113      115       +2     
==========================================
+ Hits          662      666       +4     
  Misses         19       19              
- Partials       14       15       +1
Impacted Files Coverage Δ
cssselect/xpath.py 94.86% <100%> (+0.03%) ⬆️
cssselect/parser.py 95.34% <66.66%> (-0.22%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb7a7e2...4b96685. Read the comment docs.

Copy link
Member

@dangra dangra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside of some nitpicks I think it is a good addition as it is. thanks

@sortafreel sortafreel changed the title [WIP] CSS immediate children CSS immediate children Jun 15, 2019
@alexander-matsievsky
Copy link

@sortafreel Great job, Alexander!

One thought though. The Selectors Level 4 defines the :scope pseudo-class. From https://developer.mozilla.org/en-US/docs/Web/CSS/:scope:

When used from a DOM API such as querySelector(), querySelectorAll(), matches(), or Element.closest(), :scope matches the element you called the method on.

Could we use it instead of < (or other non-standard alternatives like @, &, etc.)? The reasoning is to avoid extending CSS in an ad-hoc manner and to save on the refactoring when cssselect supports CSS4 selectors eventually.

@sortafreel sortafreel changed the title CSS immediate children CSS immediate children (:scope selector) Jun 16, 2019
@sortafreel sortafreel changed the title CSS immediate children (:scope selector) CSS immediate children (:scope selector) Jun 16, 2019
@sortafreel sortafreel force-pushed the css-immediate-children branch from ed529ae to 5a794fa Compare June 16, 2019 15:31
@sortafreel sortafreel force-pushed the css-immediate-children branch 6 times, most recently from 8f08371 to ce928d2 Compare June 17, 2019 13:29
@sortafreel sortafreel force-pushed the css-immediate-children branch from ce928d2 to 270f118 Compare June 17, 2019 13:32
Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love these changes! The only thing I think is missing is to update https://cssselect.readthedocs.io/en/latest/#supported-selectors to mention :scope.

@sortafreel
Copy link
Contributor Author

sortafreel commented Jun 18, 2019

I love these changes! The only thing I think is missing is to update https://cssselect.readthedocs.io/en/latest/#supported-selectors to mention :scope.

@Gallaecio Done :)

@dangra
Copy link
Member

dangra commented Jul 3, 2019

@sortafreel seems we are very close, only rest covering the case codecov is complaining with a test.

@Gallaecio
Copy link
Member

@dangra The tests cover all changes as far as @sortafreel and me can tell. @sortafreel tried different changes to see if they had any effect on codecov, but neither @sortafreel nor me managed to find a new path of code that is not hit by at least one test.

@Gallaecio
Copy link
Member

Gallaecio commented Jul 3, 2019

I believe we are being hit by https://bugs.python.org/issue2506

If I do what https://bitbucket.org/ned/coveragepy/issues/594/problems-with-branch-identification-and#comment-38845397 suggests:

            if stream.peek() != ('DELIM', '('):
                # …
                if result.__repr__() == 'Pseudo[Element[*]:scope]':
                    # …
                # 2 lines added here:
                else:
                    a = 1  # https://bitbucket.org/ned/coveragepy/issues/594/problems-with-branch-identification-and#comment-38845397
                continue

Then the coverage data from tox -e py36 (after adding show_missing = True under [report] at .coveragerc to show lines missing in the coverage) no longer shows those lines as not covered.

@sortafreel
Copy link
Contributor Author

@Gallaecio thanks for the support :)

@dangra
Copy link
Member

dangra commented Jul 3, 2019

Nice catch @Gallaecio ! all good then.

@dangra dangra merged commit 99bc54c into scrapy:master Jul 3, 2019
@dangra
Copy link
Member

dangra commented Jul 3, 2019

@sortafreel thanks for all the good work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants