This repository contains the following directories:
data: contains the raw, processed & graph data, i.e. the data loaded into the graph database Neo4j.
model: contains the original and the evolved graph model representation.
scripts: contains all the scripts used in the project, including the exercise problem scripts.
submission_scripts: contains the exercise problem scripts. They have been added here for convenience in verification.
The data present in data/raw_data was generated by executing the Python files in scripts/api_scripts. It also includes synthetic and additional data sources necessary for the problem statement. This data is then processed using the scripts present in scripts/processing_scripts.
These Python files (for instance author_processing.py) can be run from the scripts/processing_scripts directory as:
cd scripts/processing_scripts
python author_processing.py
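To run every processing script in one go, a loop over the directory is one option. The following is a minimal sketch and not part of the repository: the helper name `run_processing_scripts` is hypothetical, and it assumes the scripts are independent of one another and can safely run in alphabetical order, which the repository does not confirm.

```python
import subprocess
import sys
from pathlib import Path


def run_processing_scripts(directory="scripts/processing_scripts"):
    """Run every *.py file in `directory`, one after another.

    Hypothetical helper: assumes alphabetical order is acceptable.
    Returns the names of the scripts that were executed.
    """
    executed = []
    for script in sorted(Path(directory).glob("*.py")):
        # cwd=directory mirrors the `cd scripts/processing_scripts` step;
        # check=True stops at the first script that exits with an error.
        subprocess.run([sys.executable, script.name], cwd=directory, check=True)
        executed.append(script.name)
    return executed
```

Calling `run_processing_scripts()` from the repository root would then execute each file in turn. If some scripts depend on the output of others, run them individually in the required order instead.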
The final processed data files are stored in data/processed_data. The files required for loading into the graph database are stored in data/graph_data.
The data present in the graph_data folder can now be moved to Neo4j's import directory. To load the data, we use the scripts in the scripts/loading_scripts directory. Execute them in order, i.e. PartA.2, PartA.3, PartB, PartC & PartD, as per the problem statement. Make sure to add your configuration details in scripts/loading_scripts/config.ini before running them:
cd scripts/loading_scripts
python PartA.2_IslamGupta.py
python PartA.3_IslamGupta.py
python PartB_IslamGupta.py
python PartC_IslamGupta.py
python PartD_IslamGupta.py
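The two preparatory steps described above, moving the graph data into Neo4j's import directory and filling in config.ini, can be sketched in Python as follows. This is an illustrative sketch only: both function names are hypothetical, and the `[neo4j]` section with its `uri`, `user` and `password` keys is an assumed layout, since the actual contents of config.ini are not described here.

```python
import configparser
import shutil
from pathlib import Path


def copy_graph_data(source_dir, import_dir):
    """Copy every file from `source_dir` into Neo4j's import directory.

    Hypothetical helper; subdirectories are skipped.
    Returns the names of the files that were copied.
    """
    import_path = Path(import_dir)
    import_path.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(Path(source_dir).iterdir()):
        if item.is_file():
            shutil.copy2(item, import_path / item.name)
            copied.append(item.name)
    return copied


def read_connection_settings(config_path):
    """Read connection details from an INI file.

    The [neo4j] section and its uri/user/password keys are ASSUMED
    names for illustration; use the ones config.ini actually defines.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse.
    if not parser.read(config_path):
        raise FileNotFoundError(f"could not read {config_path}")
    section = parser["neo4j"]
    return {key: section[key] for key in ("uri", "user", "password")}
```

The source would typically be data/graph_data. The location of the import directory depends on how Neo4j was installed, so it is left as a parameter.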