Prep for 1.9.1 release (#142)
* Prep for 1.9.1 release
Fix changelog typo. Add F.A.Q. detailing iframe behavior. Bump version.

* Fix spelling

* Add test for :nth-child 'of' case sensitivity
facelessuser committed Apr 13, 2019
1 parent c7929c7 commit ce5a65a
Showing 6 changed files with 52 additions and 2 deletions.
1 change: 1 addition & 0 deletions docs/src/dictionary/en-custom.txt
@@ -49,6 +49,7 @@ multiline
namespace
namespaces
newline
parser's
parsers
pre
prerelease
2 changes: 1 addition & 1 deletion docs/src/markdown/about/changelog.md
@@ -7,7 +7,7 @@
for which the element under consideration applies.
- **FIX**: HTML pseudo-classes will check that all key elements checked are in the XHTML namespace (HTML parsers that do
not provide namespaces will assume the XHTML namespace).
- **FIX**: Ensure that all pseudo-classes names are case insensitive and allow CSS escapes.
- **FIX**: Ensure that all pseudo-class names are case insensitive and allow CSS escapes.

## 1.9.0

41 changes: 41 additions & 0 deletions docs/src/markdown/faq.md
@@ -0,0 +1,41 @@
# Frequently Asked Questions

## Why do selectors not work the same in Beautiful Soup 4.7+?

Soup Sieve is the official CSS selector library in Beautiful Soup 4.7+, and with this change, Soup Sieve introduces a
number of changes that break some of the expected behaviors that existed in versions prior to 4.7.

In short, Soup Sieve follows the CSS specifications fairly closely, and this broke a number of non-standard behaviors.
These non-standard behaviors were not allowed according to the CSS specifications. Soup Sieve has no intention of
bringing back these behaviors.

For more details on specific changes, and the reasoning for why each change is either considered an improvement or is
simply a feature that Soup Sieve cannot or will not support, see [Beautiful Soup Differences](./differences.md).

## How does `iframe` handling work?

In web browsers, CSS selectors do not usually select content inside an `iframe` element if the selector is called on an
element outside of the `iframe`. Each HTML document is usually encapsulated and CSS selector leakage across this
`iframe` boundary is usually prevented.

In its current iteration, Soup Sieve is not aware of the origin of the documents in the `iframe`, and Soup Sieve will
not prevent selectors from crossing these boundaries. Soup Sieve is not used to style documents, but to scrape
documents. For this reason, it seems to be more helpful to allow selector combinators to cross these boundaries.

Soup Sieve isn't entirely unaware of `iframe` elements though. In Soup Sieve 1.9.1, it was noticed that some
pseudo-classes behaved in unexpected ways because they had no awareness of `iframes`; this was fixed in 1.9.1.
Pseudo-classes such as [`:default`](./selectors.md#:default), [`:indeterminate`](./selectors.md#:indeterminate),
[`:dir()`](./selectors.md#:dir), [`:lang()`](./selectors.md#:lang), [`:root`](./selectors.md#:root), and [`:contains()`](
./selectors.md#:contains) were given awareness of `iframes` to ensure they behave properly and return the expected
elements. This doesn't mean that `select` won't return elements in `iframes`, but it won't allow something like
`:default` to select a `button` in an `iframe` whose parent `form` is outside the `iframe`. Put another way, a default
`button` will be evaluated in the context of the document it is in.
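
As a rough illustration of that last point, the sketch below walks up from a `button` looking for its parent `form` and
stops at an `iframe` boundary. It is not Soup Sieve's actual implementation; the `Node` class and the
`find_parent_form` helper are hypothetical names used only for this example.

```python
class Node:
    """Minimal stand-in for a parsed element (hypothetical, for illustration only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def find_parent_form(element):
    """Return the closest ancestor `form` without leaving the element's own document."""
    node = element.parent
    while node is not None:
        if node.name == "form":
            return node
        if node.name == "iframe":
            # Going past this ancestor would leave the embedded document.
            return None
        node = node.parent
    return None


# Outer document: a form that contains an iframe.
outer_form = Node("form")
iframe = Node("iframe", parent=outer_form)

# Embedded document: one button outside any form, one button inside a form.
inner_html = Node("html", parent=iframe)
inner_button = Node("button", parent=inner_html)
inner_form = Node("form", parent=inner_html)
nested_button = Node("button", parent=inner_form)
```

Under these assumptions, `find_parent_form(inner_button)` returns `None` even though a `form` exists further up the
tree, while `find_parent_form(nested_button)` returns `inner_form`.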

With all of this said, if your selectors have issues with `iframes`, it is most likely because `iframes` are handled
differently by different parsers. `html.parser` will usually parse `iframe` elements as it sees them. The `lxml` parser
will often remove the `html` and `body` tags of an `iframe` HTML document. `lxml-xml` will simply ignore the content in
an XHTML document. And `html5lib` will HTML-escape the content of an `iframe`, making traversal impossible.

In short, Soup Sieve will return elements from all documents, even those in `iframes`, though certain pseudo-classes
may take into consideration the context of the document they are in. Even so, a parser's handling of `iframes` may make
their content difficult to work with if the parser doesn't parse it as HTML elements or alters its structure.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -27,6 +27,7 @@ nav:
  - Soup Sieve: index.md
  - API: api.md
  - CSS Selectors: selectors.md
  - F.A.Q.: faq.md
  - Beautiful Soup Differences: differences.md
  - About:
    - Contributing & Support: about/contributing.md
2 changes: 1 addition & 1 deletion soupsieve/__meta__.py
@@ -186,5 +186,5 @@ def parse_version(ver, pre=False):
    return Version(major, minor, micro, release, pre, post, dev)


__version_info__ = Version(1, 9, 1, ".dev")
__version_info__ = Version(1, 9, 1, "final")
__version__ = __version_info__._get_canonical()
7 changes: 7 additions & 0 deletions tests/test_level4/test_nth_child.py
@@ -41,6 +41,13 @@ def test_nth_child_of_s_complex(self):
            flags=util.HTML
        )

        self.assert_selector(
            self.MARKUP,
            ":nth-child(2n + 1 OF :is(p, span).test)",
            ['2', '6', '10'],
            flags=util.HTML
        )


class TestNthChildQuirks(TestNthChild):
    """Test `nth` child selectors with quirks."""
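
The added test uses an uppercase `OF` keyword inside `:nth-child()`. A minimal sketch of why case handling matters when
splitting that argument, assuming a regular-expression based split: the `NTH_OF` pattern and the `split_nth_argument`
helper are hypothetical and are not Soup Sieve's actual parser.

```python
import re

# Hypothetical pattern: separates the `an+b` part of an `:nth-child()` argument
# from the optional `of S` part. `re.IGNORECASE` lets `of`, `OF`, and `Of` all match.
NTH_OF = re.compile(
    r"^\s*(?P<nth>.+?)(?:\s+of\s+(?P<selector>.+?))?\s*$",
    re.IGNORECASE,
)


def split_nth_argument(argument):
    """Return `(nth, selector)`; `selector` is `None` when there is no `of S` part."""
    match = NTH_OF.match(argument)
    if match is None:
        raise ValueError("empty :nth-child() argument")
    return match.group("nth"), match.group("selector")


upper = split_nth_argument("2n + 1 OF :is(p, span).test")
lower = split_nth_argument("2n + 1 of :is(p, span).test")
plain = split_nth_argument("odd")
```

Here `upper` and `lower` are the same pair, so both spellings of the keyword select against `:is(p, span).test`.
Without the case-insensitive flag, the uppercase form would be read as having no `of S` part at all.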