This project takes us from BBC live text commentary for Champions League 2014/2015 matches to a Neo4j graph containing the events of each match.
- Install the latest version of Neo4j from http://neo4j.com/download/
- Windows users: Install the desktop application and then click the 'start' button
- Mac/Linux: Unpack the tarball and then run
./bin/neo4j start
- Download import.cql to your machine
Import the data into Neo4j (the directory name below assumes version 2.2.2; adjust it to match the version you downloaded):
cd neo4j-community-2.2.2
./bin/neo4j-shell --file import.cql
Open http://localhost:7474 in your browser and you’re good to go.
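If the page does not load, it can help to check whether anything is answering on that port at all. The snippet below is a minimal sketch using only the Python standard library; the function name `neo4j_is_up` is made up for this example and is not part of the project. It only confirms that an HTTP server responds at the URL, not that the import succeeded.

```python
import urllib.error
import urllib.request


def neo4j_is_up(url="http://localhost:7474", timeout=5):
    """Return True if an HTTP server answers successfully at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Neo4j reachable:", neo4j_is_up())
```

A result of `False` usually means the server has not been started yet or is listening on a different port.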
If you want to play with the raw data you’ll first need to set up a Python environment.
Install virtualenv and create a sandbox for this project:
virtualenv bbc
source bbc/bin/activate
Install the appropriate libraries:
pip install -r requirements.txt
Download all the matches:
python find_all_matches.py | xargs wget -P data/raw
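On systems without xargs or wget, the download step could be approximated in Python. This is only a sketch: it assumes find_all_matches.py prints one URL per line (which the pipe into xargs suggests), and the helper names `target_path` and `download_all` are hypothetical, not part of the project.

```python
import os
import urllib.request
from urllib.parse import urlparse


def target_path(url, directory="data/raw"):
    """Build the local file path for a URL from its last path component."""
    name = os.path.basename(urlparse(url).path.rstrip("/")) or "index.html"
    return os.path.join(directory, name)


def download_all(urls, directory="data/raw"):
    """Download every non-empty URL in `urls` into `directory`."""
    os.makedirs(directory, exist_ok=True)
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        urllib.request.urlretrieve(url, target_path(url, directory))
```

To use it in place of the pipeline above, save it to a file, add `import sys` and a final `download_all(sys.stdin)` call, and pipe the output of find_all_matches.py into it.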
Generate the CSV files that we import into Neo4j:
python extract_players.py
# players are written to data/players.csv
python extract_events.py
# the remaining CSV files are written to data/
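Before importing, it can be worth glancing at what the scripts produced. The following is a small sketch that reads the header and the first few rows of a CSV file; `preview_csv` is a name invented for this example, and it makes no assumption about which columns the files contain beyond the first row being a header.

```python
import csv


def preview_csv(path, limit=5):
    """Return the header row and up to `limit` data rows from a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = [row for _, row in zip(range(limit), reader)]
    return header, rows
```

For example, `header, rows = preview_csv("data/players.csv")` returns the column names and the first five players once extract_players.py has run.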