strip-hidden-form-values
CLI tool for stripping hidden form values from an HTML document
Why would you need this? Imagine you're running a Git scraper against a website that includes hidden form fields (such as __VIEWSTATE fields) that change on every request. You can pipe the HTML through this tool to strip those hidden form values so that a change is only recorded if the rest of the page is modified in some way.
scrape-ca-wildlife-rules is an example of a repository that uses this tool for this purpose; see the scrape.yml workflow there for details.
Install this tool using pip:
pip install strip-hidden-form-values
You can pipe HTML into this tool:
curl http://... | strip-hidden-form-values > output.html
Or pass it a filename:
strip-hidden-form-values input.html > output.html
The tool will replace the value= attribute of any hidden form fields with a blank string, so the following:
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="p8nVm4PgVPA" />
Will be replaced with:
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="" />
All other HTML will remain unchanged.
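As a rough illustration of the transformation described above, here is a minimal Python sketch built on regular expressions. It is not the tool's actual implementation: the function name blank_hidden_values and the three patterns are hypothetical, and the sketch assumes quoted value attributes and input tags that contain no ">" character inside attribute values.

```python
import re

# Hypothetical patterns for illustration only; real-world HTML can be messier.
INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)
HIDDEN_TYPE = re.compile(r"""\btype\s*=\s*["']?hidden["']?""", re.IGNORECASE)
VALUE_ATTR = re.compile(r"""(\bvalue\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE | re.DOTALL)


def blank_hidden_values(html: str) -> str:
    """Blank the value attribute of hidden inputs, leaving everything else as-is."""

    def fix_tag(match: re.Match) -> str:
        tag = match.group(0)
        if not HIDDEN_TYPE.search(tag):
            # Not a hidden field: return the tag untouched.
            return tag
        # Keep the attribute name and quote style, drop only the value itself.
        return VALUE_ATTR.sub(lambda m: m.group(1) + m.group(2) * 2, tag)

    return INPUT_TAG.sub(fix_tag, html)


if __name__ == "__main__":
    sample = '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="p8nVm4PgVPA" />'
    print(blank_hidden_values(sample))
    # prints: <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="" />
```

Because the substitution only touches text inside matching input tags, the rest of the document passes through byte-for-byte, which is what keeps Git diffs limited to real content changes.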
To contribute to this tool, first check out the code. Then create a new virtual environment:
cd strip-hidden-form-values
python -m venv venv
source venv/bin/activate
Or if you are using pipenv:
pipenv shell
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
pytest