From 9a818b7787f2622542f3a02fb28b98b728fb7f96 Mon Sep 17 00:00:00 2001 From: facelessuser Date: Fri, 14 Dec 2018 19:21:28 -0700 Subject: [PATCH] Update docs --- docs/src/markdown/about/development.md | 36 ++++++++++++------------ docs/src/markdown/api.md | 38 +++++++++++++++++--------- docs/src/markdown/index.md | 13 ++++++--- setup.py | 2 +- soupsieve/__meta__.py | 2 +- 5 files changed, 53 insertions(+), 38 deletions(-) diff --git a/docs/src/markdown/about/development.md b/docs/src/markdown/about/development.md index 37abd2eb..03aef76d 100644 --- a/docs/src/markdown/about/development.md +++ b/docs/src/markdown/about/development.md @@ -24,7 +24,7 @@ Directory | Description ## Coding Standards -When writing code, the code should roughly conform to PEP8 and PEP257 suggestions. The project utilizes the Flake8 linter (with some additional plugins) to ensure code conforms (give or take some of the rules). When in doubt, follow the formatting hints of existing code when adding or modifying files. existing files. Listed below are the modules used: +When writing code, the code should roughly conform to PEP8 and PEP257 suggestions. The project utilizes the Flake8 linter (with some additional plugins) to ensure code conforms (give or take some of the rules). When in doubt, follow the formatting hints of existing code when adding files or modifying existing files. Listed below are the modules used: - @gitlab:pycqa/flake8 - @gitlab:pycqa/flake8-docstrings @@ -55,7 +55,7 @@ Spell checking is performed via @facelessuser/pyspelling. During validation we build the docs and spell check various files in the project. [Aspell][aspell] must be installed and in the path. Currently this project uses one of the more recent versions of Aspell. It is not expected that everyone will install and run Aspell locally, but it will be run in CI tests for pull requests. 
-In order to perform the spell check, it is expected you are setup to build the documents, and that you have Aspell installed in your system path (if needed you can use the `--binary` option to point to the location of your Aspell binary). It is also expected that you have the `en` dictionary installed as well. To initiate the spell check, run the following command from the root of the project. +In order to perform the spell check locally, it is expected you are setup to build the documents, and that you have Aspell installed in your system path (if needed you can use the `--binary` option to point to the location of your Aspell binary). It is also expected that you have the `en` dictionary installed as well. To initiate the spell check, run the following command from the root of the project. You will need to make sure the documents are built first: @@ -63,7 +63,7 @@ You will need to make sure the documents are built first: mkdocs build --clean ``` -And then run the spell checker. Using `python -m` from the project root will load your checked out version of PySpelling instead of your system installed version: +And then run the spell checker. ``` pyspelling @@ -73,7 +73,7 @@ It should print out the files with the misspelled words if any are found. If yo ## Validation Tests -In order to preserve good code health, a test suite has been put together with pytest (@pytest-dev/pytest). There are currently two kinds of tests: syntax and targeted. To run these tests, you can use the following command: +In order to preserve good code health, a test suite has been put together with pytest (@pytest-dev/pytest). To run these tests, you can use the following command: ``` py.test @@ -95,21 +95,19 @@ By running Tox, it will walk through all the environments and create them (assum tox ``` -If you don't have all the Python versions needed to test all the environments, those entries will fail. You can ignore those. 
Spelling will also fail if you don't have the correct version of Aspell. - -As most people will not have all the Python versions on their machine, it makes more sense to target specific environments. To target a specific environment to test, you use the `-e` option to select the environment of interest. To select lint: +If you don't have all the Python versions needed to test all the environments, those entries will fail. To run the tests for specific versions of Python, you specify the environment with `-e PXY` where `X` is the major version and `Y` is the minor version. ``` -tox -e lint +tox -e py37 ``` -To select Python 3.7 unit tests (or other versions -- change accordingly): +To target linting: ``` -tox -e py37 +tox -e lint ``` -To select spelling and document building: +To select spell checking and document building: ``` tox -e documents @@ -123,13 +121,13 @@ When running the validation tests through Tox, it is setup to track code coverag coverage erase ``` -Then run each unit test environment to and coverage will be calculated. All the data from each run is merged together. HTML is output for each file in `.tox/pyXX/tmp`. You can use these to see areas that are not covered/exercised yet with testing. +Then run each unit test environment to generate coverage data. All the data from each run is merged together. HTML is output for each file in `.tox/pyXX/tmp`. You can use these to see areas that are not covered/exercised yet with testing. You can checkout `tox.ini` to see how this is accomplished. ## Code Documentation -Soup Sieve is laid out in the following structure: +The Soup Sieve module is laid out in the following structure: ``` soupseive @@ -156,7 +154,7 @@ When a CSS selector string is given to Soup Sieve, it is run through the `CSSPar A `SelectorList` represents a list of compound selectors. So if you had the selector `#!css div > p`, you would get a `SelectorList` object containing one `Selector` object. 
If you had `#!css div, p`, you would get a `SelectorList` with two `Selector` objects as this is a selector list of two compound selectors. -A compound selector gets parsed into pieces. Each part of a specific compound selector is usually assigned to an attribute in a single `Selector` object. The attributes of the `Selector` object may be as simple as a boolean or a string, but they can also be a tuple of of `SelectorList` objects. In the case of `#!css *:not(p, div)`, `#!css *` will be a `SelectorList` with one `Selector`. The `#!css :not(p, div)` selector list will be a tuple containing one `SelectorList` of two `Selectors` (one for `p` and one for `div`) under the `selectors` attribute of the `#!css *` `Selector`. +A compound selector gets parsed into pieces. Each part of a specific compound selector is usually assigned to an attribute in a single `Selector` object. The attributes of the `Selector` object may be as simple as a boolean or a string, but they can also be a tuple of more `SelectorList` objects. In the case of `#!css *:not(p, div)`, `#!css *` will be a `SelectorList` with one `Selector`. The `#!css :not(p, div)` selector list will be a tuple containing one `SelectorList` of two `Selectors` (one for `p` and one for `div`) under the `selectors` attribute of the `#!css *` `Selector`. In short, `Selectors` are always contained within a `SelectorList`, and a compound selector is a single `Selector` object that may chain other `SelectorLists` objects depending on the complexity of the compound selector. If you provide a selector list, then you will get multiple `Selector` objects (one for each compound selector in the list) which in turn may chain other `Selector` objects. @@ -170,7 +168,7 @@ class SelectorList: """Initialize.""" ``` -`SelectorList` | Description +Attribute | Description -------------- | ----------- `selectors` | A list of `Selector` objects. `is_not` | Are the selectors in the selector list from a `:not()`. 
@@ -185,7 +183,7 @@ class Selector: """Initialize.""" ``` -`Selector` | Description +Attribute | Description ------------ | ----------- `tag` | Contains a single [`SelectorTag`](#selectortag) object, or `None`. `id` | Contains a tuple of ids to match. Usually if multiple conflicting ids are present, it simply won't match a tag, but it allows multiple to handle the syntax `tag#1#2` even if it is invalid. @@ -208,7 +206,7 @@ class SelectorTag: """Initialize.""" ``` -`SelectorTag` | Description +Attribute | Description ------------- | ----------- `name` | `name` contains the tag name to match. `prefix` | `prefix` contains the namespace prefix to match. `prefix` can also be `None`. @@ -224,7 +222,7 @@ class SelectorAttribute: """Initialize.""" ``` -`SelectorAttribute` | Description +Attribute | Description ------------------- | ----------- `attribute` | Contains the attribute name to match. `prefix` | Contains the attribute namespace prefix to match if any. @@ -241,7 +239,7 @@ class SelectorNth: """Initialize.""" ``` -`SelectorNth` | Description +Attribute | Description ------------- | ----------- `a` | The `a` value in the formula `an+b` specifying an index. `n` | `True` if the provided formula has included a literal `n` which signifies the formula is not a static index. diff --git a/docs/src/markdown/api.md b/docs/src/markdown/api.md index e7c254ce..95d89e8e 100644 --- a/docs/src/markdown/api.md +++ b/docs/src/markdown/api.md @@ -1,12 +1,24 @@ # API -Soup Sieve will detect the document type being used from the Beautiful Soup object that is given to it. For all HTML document types, it will treat tag names and attribute names without case sensitivity like most browsers do (even with XHTML). For HTML5, XHTML and XML, it will consider namespaces per the document's support (provided by the parser). To get namespaces support in HTML5, it is recommended to use `html5lib` as the parser. 
Some additional configuration is required when using namespaces, see [Namespace](#namespaces) for more information. +Soup Sieve uses a subset of the CSS4 selector specification to detect and filter elements. To learn more about which specific selectors are implemented, see [CSS Selectors](./selectors.md). -While attribute values are always generally treated as case sensitive, HTML5, XHTML, and HTML treat the `type` attribute special. The `type` attribute's value is always case insensitive. This is generally how most browsers treat `type`. If you need `type` to be sensitive, you can use the `s` flag: `#!css [type="submit" s]`. +Soup Sieve will detect the document type being used from the Beautiful Soup object that is given to it, and depending on the document type, its behavior may be slightly different: + +- All HTML document types (HTML, HTML5, and XHTML) will have their tag names and attribute names treated without case sensitivity, like most browsers do. Though XHTML is XML, which traditionally is case sensitive, it will still be treated like HTML in this respect. + +- XML document types will have their tag names and attribute names treated with case sensitivity. + +- HTML5, XHTML and XML document types will have namespaces evaluated per the document's support (provided via the parser). + + `html5lib` provides proper namespaces for HTML5, but `lxml` will not. If you need namespace support for HTML5, consider using `html5lib`. + + For XML, `lxml` will provide proper namespaces. It is generally suggested that `lxml` is used to parse XHTML documents. Some additional configuration is required when using namespaces, see [Namespace](#namespaces) for more information. + +- While attribute values are generally treated as case sensitive, HTML5, XHTML, and HTML treat the `type` attribute special. The `type` attribute's value is always case insensitive. This is generally how most browsers treat `type`. 
If you need `type` to be sensitive, you can use the `s` flag: `#!css [type="submit" s]`. ## Flags -There are no flags at this time, but the parameter is provided for potential future use. +Early in development, flags were used to specify document type, but as of 1.0.0, there are no flags used at this time, but the parameter is provided for potential future use. ## `soupsieve.select()` @@ -15,9 +27,9 @@ def select(select, node, namespaces=None, limit=0, flags=0): """Select the specified tags.""" ``` -`select` given a tag, will select all tags that match the provided CSS selector string. You can give `limit` a positive integer to return a specific number tags (0 means to return all tags). +`select` will return all tags under the given a tag, that match the given CSS selectors provided. You can also limit the number of tags returned by providing a positive integer via the `limit` parameter (0 means to return all tags). -`select` accepts a CSS selector string, a `node` or element, an optional [namespace](#namespaces) dictionary, a `limit`, and `flags`. +`select` accepts a CSS selector string, a `node`/element, an optional [namespace](#namespaces) dictionary, a `limit`, and `flags`. ```pycon3 >>> import soupsieve as sv @@ -41,9 +53,9 @@ def match(select, node, namespaces=None, flags=0): """Match node.""" ``` -`match` matches a given node/element with a given CSS selector. +The `match` function matches a given `node`/element with a given CSS selector. -`match` accepts a CSS selector string, a `node` or element, an optional [namespace](#namespaces) dictionary, and flags. +`match` accepts a CSS selector string, a `node`/element, an optional [namespace](#namespaces) dictionary, and flags. ```pycon3 >>> nodes = sv.select('p:is(.a, .b, .c)', soup) @@ -60,7 +72,7 @@ def filter(select, nodes, namespaces=None, flags=0): """Filter list of nodes.""" ``` -`filter` takes an iterable containing HTML nodes and will filter them based on the provided CSS selector string. 
If given a Beautiful Soup tag, it will iterate the children that are tags. +`filter` takes an iterable containing HTML `nodes`/elements and will filter them based on the provided CSS selector string. If given a Beautiful Soup tag, it will iterate the direct children that are tags. `filter` accepts a CSS selector string, an iterable containing tags, an optional [namespace](#namespaces) dictionary, and flags. @@ -76,9 +88,9 @@ def comments(node, limit=0, flags=0): """Get comments only.""" ``` -`comments` if useful to extract all comments from a document or document tag. It will extract from the given tag down through all of its children. You can limit how many comments are returned with `limit`. +The `comments` function can be used to extract all comments from a document or document tag. It will extract from the given tag down through all of its children. You can limit how many comments are returned with `limit`. -`comments` accepts a `node` or element, a `limit`, and flags. +`comments` accepts a `node`/element, a `limit`, and flags. ## `soupsieve.icomments()` @@ -123,12 +135,12 @@ class SoupSieve: ## `soupsieve.purge()` -Soup Sieve caches compiled patterns for performance. If for whatever reason you need to purge the cache, simply call `purge`. +Soup Sieve caches compiled patterns for performance. If for whatever reason, you need to purge the cache, simply call `purge`. ## Namespaces -Many of Soup Sieve's selector functions take an optional namespaces dictionary. Namespaces, just like CSS, must be defined for Soup Sieve to evaluate `ns|tag` type selectors. This is analogous to CSS's namespace at-rule: +Many of Soup Sieve's selector functions take an optional namespace dictionary. Namespaces, just like CSS, must be defined for Soup Sieve to evaluate `ns|tag` type selectors. 
This is analogous to CSS's namespace at-rule: ```css @namespace url("http://www.w3.org/1999/xhtml"); @@ -145,7 +157,7 @@ namespace = { } ``` -Tags do not necessarily have to have a prefix for Soup Sieve to recognize them. For instance, in HTML5, SVG *should* automatically get the SVG namespace. Depending how namespaces were defined in the documentation, tags may inherit namespaces in some conditions. Namespace assignment is mainly handled by the parser and exposed through the Beautiful Soup API. Soup Sieve uses the Beautiful Soup API to then compare namespaces when the appropriate document that supports namespaces is set. +Tags do not necessarily have to have a prefix for Soup Sieve to recognize them. For instance, in HTML5, SVG *should* automatically get the SVG namespace. Depending how namespaces were defined in the documentation, tags may inherit namespaces in some conditions. Namespace assignment is mainly handled by the parser and exposed through the Beautiful Soup API. Soup Sieve uses the Beautiful Soup API to then compare namespaces for supported documents. --8<-- refs.txt diff --git a/docs/src/markdown/index.md b/docs/src/markdown/index.md index 3302fcb2..5423adb5 100644 --- a/docs/src/markdown/index.md +++ b/docs/src/markdown/index.md @@ -2,11 +2,10 @@ ## Overview -Soup Sieve is a CSS4 selector library designed to be used with [Beautiful Soup 4][bs4]. It aims to provide selecting, matching, and filtering with using modern CSS selectors. +Soup Sieve is a CSS selector library designed to be used with [Beautiful Soup 4][bs4]. It aims to provide selecting, matching, and filtering using modern CSS selectors. Soup Sieve currently provides selectors from a subset of the CSS4 specification. While Beautiful Soup comes with a builtin CSS selection API, it is not without issues. In addition, it also lacks support for some more modern CSS features. -Soup Sieve supports a subset of CSS4 selectors which allows for filtering of tags in a Beautiful Soup object. 
Soup Sieve does not attempt to support all CSS4 selectors as many don't make sense in a non-browser environment. Some of the supported selectors are: - `#!css .classes` @@ -35,7 +34,7 @@ If you want to manually install it, run `#!bash python setup.py build` and `#!ba ## Usage -Using Soup Sieve is easy. Simply create a Beautiful Soup object: +To use Soup Sieve, you must create a Beautiful Soup object: ```pycon3 >>> import bs4 @@ -83,7 +82,7 @@ Or even just extracting comments: [' These are animals '] ``` -If you've ever used Python's Re library for regular expression, you may know that it is often useful to pre-compile a regular expression pattern, especially if you plan to use it more than once. The same is true for Soup Sieve's matchers. If you have a pattern that you want to use more than once, it may be wise to pre-compile it early on: +If you've ever used Python's Re library for regular expressions, you may know that it is often useful to pre-compile a regular expression pattern, especially if you plan to use it more than once. The same is true for Soup Sieve's matchers, though is not required. If you have a pattern that you want to use more than once, it may be wise to pre-compile it early on: ```pycon3 >>> selector = sv.compile('p:is(.a, .b, .c)') @@ -93,6 +92,12 @@ If you've ever used Python's Re library for regular expression, you may know tha A compiled object has all the same methods, though the parameters will be slightly different as they don't need things like the pattern or flags once compiled. See [API](./api.md) documentation for more info. +Compiled patterns are cached, so if for any reason you need to clear the cache, simply issue the `purge` command. 
+ +```pycon3 +>>> sv.purge() +``` + --8<-- refs.txt --8<-- diff --git a/setup.py b/setup.py index a449753f..3a82daa2 100644 --- a/setup.py +++ b/setup.py @@ -45,7 +45,7 @@ def get_description(): name='soupsieve', version=VER, python_requires=">=3.4", - keywords='CSS HTML selector filter', + keywords='CSS HTML XML selector filter query soup', description='A CSS4 selector implementation for Beautiful Soup.', long_description=get_description(), long_description_content_type='text/markdown', diff --git a/soupsieve/__meta__.py b/soupsieve/__meta__.py index 15f43345..c8ee9880 100644 --- a/soupsieve/__meta__.py +++ b/soupsieve/__meta__.py @@ -186,5 +186,5 @@ def parse_version(ver, pre=False): return Version(major, minor, micro, release, pre, post, dev) -__version_info__ = Version(1, 0, 0, "beta", 2) +__version_info__ = Version(1, 0, 0, "final") __version__ = __version_info__._get_canonical()
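
Reviewer note on the `development.md` hunk: the prose describing how `#!css *:not(p, div)` is represented may be easier to follow with a concrete picture. The following is a minimal sketch, not Soup Sieve's actual classes. The dataclasses are hypothetical stand-ins that keep only the attributes the docs mention (`selectors`, `is_not`, `tag`, `name`, `prefix`), and representing `*` as a tag named `"*"` is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SelectorTag:
    """Hypothetical stand-in: a tag name plus an optional namespace prefix."""
    name: str
    prefix: Optional[str] = None


@dataclass(frozen=True)
class SelectorList:
    """Hypothetical stand-in: a list of compound selectors."""
    selectors: Tuple["Selector", ...] = ()
    is_not: bool = False  # True when the list came from a `:not()`


@dataclass(frozen=True)
class Selector:
    """Hypothetical stand-in: one compound selector."""
    tag: Optional[SelectorTag] = None
    # Nested selector lists, e.g. the contents of `:not(...)`.
    selectors: Tuple[SelectorList, ...] = ()


# Hand-built model of `*:not(p, div)` (no parsing is performed here).
not_list = SelectorList(
    selectors=(Selector(tag=SelectorTag("p")), Selector(tag=SelectorTag("div"))),
    is_not=True,
)
star = Selector(tag=SelectorTag("*"), selectors=(not_list,))  # "*" name is assumed
parsed = SelectorList(selectors=(star,))

# One outer Selector, chaining one SelectorList that holds two Selectors.
nested = parsed.selectors[0].selectors[0]
nested_names = [s.tag.name for s in nested.selectors]
```

The outer `SelectorList` holds a single `Selector` for `*`, whose `selectors` attribute is a tuple containing one `SelectorList` with two `Selector` objects, matching the description in the hunk.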