South Park Wiki Scraper
Scrapes the South Park Wiki and converts it into a format that Neo4j 3.5 can import. The accompanying blog post for this code can be found on my website.
- Python 3.7 or later
- Inflect Engine (pip install inflect)
- BeautifulSoup4 HTML Parser (pip install beautifulsoup4)
How to run
- Run scraper.py using Python 3.7 or later.
- Resulting output is written to the /output/ folder. This may take about 15 minutes.
- Import the nodes and relationships into Neo4j:
./bin/neo4j-admin import --nodes nodes.csv --relationships edges.csv
- You're done!
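For reference, the import step above expects CSV files whose header row tells neo4j-admin which columns hold IDs, labels, and relationship types. The following is a minimal sketch of how such files could be written with the Python standard library. It is not the code in scraper.py: the function names, the property column (name), and the sample rows are assumptions made for illustration. Only the header fields :ID, :LABEL, :START_ID, :END_ID, and :TYPE come from the Neo4j import format.

```python
import csv
import io


def write_nodes(rows, fh):
    """Write (id, name, label) tuples as a Neo4j-importable nodes CSV."""
    writer = csv.writer(fh, lineterminator="\n")
    # ":ID" marks the unique node identifier, ":LABEL" the node label.
    writer.writerow(["id:ID", "name", ":LABEL"])
    for node_id, name, label in rows:
        writer.writerow([node_id, name, label])


def write_edges(rows, fh):
    """Write (start_id, end_id, type) tuples as a relationships CSV."""
    writer = csv.writer(fh, lineterminator="\n")
    # ":START_ID" and ":END_ID" must match values from the nodes' ":ID" column.
    writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for start_id, end_id, rel_type in rows:
        writer.writerow([start_id, end_id, rel_type])


# Hypothetical sample data, not taken from the actual scraper output.
sample_nodes = [
    ("c1", "Eric Cartman", "Character"),
    ("e1", "Volcano", "Episode"),
]
sample_edges = [("c1", "e1", "APPEARS_IN")]

# Write to in-memory buffers here; a real run would open files instead.
nodes_buf = io.StringIO()
edges_buf = io.StringIO()
write_nodes(sample_nodes, nodes_buf)
write_edges(sample_edges, edges_buf)
print(nodes_buf.getvalue())
print(edges_buf.getvalue())
```

If your own data contains commas or quotes inside names, the csv module quotes those fields automatically, so the header layout shown here stays valid.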