Combines Screaming Frog's Internal HTML export (with text extraction) and a shingling algorithm to detect duplicated content across the pages of a crawled site.
Latest commit 2b6bccc Feb 20, 2019
Name               Latest commit message  Commit time
LICENSE            Initial commit         Feb 20, 2019
README.md          Update README.md       Feb 20, 2019
requirements.txt   Add files via upload   Feb 20, 2019
sf_extraction.png  Add files via upload   Feb 20, 2019
sf_shingling.py    Add files via upload   Feb 20, 2019

screaming-frog-shingling

Example Usage

  1. Install the dependencies: pip install -r requirements.txt

  2. Run Screaming Frog and use Custom Extraction to pull the content out of a specific DOM element (see the screenshot sf_extraction.png).

  3. Export the internal HTML to a CSV file.

  4. Run the script using the following arguments.

    Arguments:
    -i : Input filename (the CSV exported from Screaming Frog)
    -o : Output filename
    -c : Column from Screaming Frog that contains your extracted content.

    Example invocation:
    python sf_shingling.py -i internal_html_ap.csv -o output_html_ap.csv -c "BodyContent 1"
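
For readers unfamiliar with shingling, here is a minimal sketch of the general technique. It is not the code in sf_shingling.py. It assumes word-level shingles, Jaccard similarity as the comparison metric, and a URL column named "Address" in the export; the function names, the shingle size `k`, and the sample data are all hypothetical and chosen only for illustration.

```python
import csv
import io
import itertools
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield one shingle, or none if empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def compare_pages(rows, content_col, url_col="Address", k=5):
    """Yield (url_a, url_b, similarity) for every pair of pages.

    url_col="Address" is an assumption about the export's URL column name.
    """
    sets = [(r[url_col], shingles(r.get(content_col) or "", k)) for r in rows]
    for (url_a, set_a), (url_b, set_b) in itertools.combinations(sets, 2):
        yield url_a, url_b, jaccard(set_a, set_b)


# Tiny in-memory stand-in for an exported CSV (made-up data).
sample = io.StringIO(
    "Address,BodyContent 1\n"
    'https://example.com/a,"the quick brown fox jumps over the lazy dog"\n'
    'https://example.com/b,"the quick brown fox jumps over the sleepy cat"\n'
    'https://example.com/c,"completely different text about something else entirely"\n'
)
rows = list(csv.DictReader(sample))
results = list(compare_pages(rows, "BodyContent 1", k=3))
for url_a, url_b, score in results:
    print(f"{url_a}\t{url_b}\t{score:.2f}")
```

Comparing every pair of pages is quadratic in the number of URLs, so a sketch like this suits small crawls; larger sites typically need an approximation such as MinHash.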