Allow Python scraper to keep empty spans with ids #2082
Merged
Fixes #1081
Source has 3 IDs
Scraped page has 2
Turns out that two filters are involved:
core/clean_text
devdocs/lib/docs/filters/core/clean_text.rb
Line 10 in 887c879
sphinx/clean_html, which does some ID manipulation and removes some empty spans
devdocs/lib/docs/filters/sphinx/clean_html.rb
Lines 39 to 42 in 887c879
Code changes done
Added a sphinx_keep_empty_ids parameter to bypass the sphinx/clean_html processing, and set the Python scraper to options[:sphinx_keep_empty_ids] = true
Set options[:clean_text] = false to bypass core/clean_text
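As a rough illustration of how a flag like sphinx_keep_empty_ids can gate the removal of empty spans, here is a minimal sketch in plain Ruby. It is not the actual DevDocs filter code: the Node struct and the clean_spans helper are hypothetical stand-ins (the real filters operate on Nokogiri nodes), and only the option name comes from this PR.

```ruby
# Hypothetical stand-in for an HTML node; the real filters work on Nokogiri nodes.
Node = Struct.new(:name, :id, :text)

# Hypothetical helper: drops empty <span> nodes, except that when
# options[:sphinx_keep_empty_ids] is set, empty spans carrying an id survive
# so that anchors pointing at them keep working.
def clean_spans(nodes, options = {})
  nodes.reject do |node|
    empty_span = node.name == 'span' && node.text.to_s.strip.empty?
    next false unless empty_span

    keep = options[:sphinx_keep_empty_ids] && !node.id.to_s.empty?
    !keep
  end
end

nodes = [
  Node.new('span', 'doctest-options', ''), # empty span used only as an anchor
  Node.new('span', nil, ''),               # empty span with no id
  Node.new('p', nil, 'Option flags')
]

# Ids that survive with the default behaviour and with the flag enabled.
default_ids = clean_spans(nodes).map(&:id).compact
kept_ids = clean_spans(nodes, sphinx_keep_empty_ids: true).map(&:id).compact
```

With the flag off, both empty spans are dropped and the anchor id is lost. With it on, only the id-less empty span is removed.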
Other notes
In testing, this solves the issue mentioned and likely other scenarios (possibly up to 54 in total).
Some other affected anchors that are fixed in my local setup:
option flags
https://devdocs.io/python~3.7/library/doctest#doctest-options
doctest directives
https://devdocs.io/python~3.7/library/doctest#doctest-directives