Deploy CCC


  • Install dependencies with pip install -r requirements.txt
  • In ccc/scripts, put Blazegraph.jar (Direct link to download)


Run BEE

  • Replace the content of bee/ with the new bee/ content
  • Replace the content of ccc/ with the new ccc/ content
  • Run (from ccc/scripts): python3 -m script.ccc.run_bee. This creates a folder called test inside scripts/.
  • OUTPUT JSON: scripts/test/share/ref/todo
  • ERRORS: scripts/test/index/ref/issue
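After a run, you can quickly check how many documents were produced and how many failed by counting the files in the two folders above. A minimal sketch (the todo/issue paths come from the layout above; the helper name is mine):

```python
from pathlib import Path

def bee_run_summary(test_dir):
    # Count the output JSON files and the error files of a BEE run.
    # Layout mirrors the paths above: share/ref/todo holds outputs,
    # index/ref/issue holds errors.
    test = Path(test_dir)
    outputs = len(list((test / "share" / "ref" / "todo").glob("*.json")))
    errors = len(list((test / "index" / "ref" / "issue").glob("*")))
    return outputs, errors
```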

Rerun BEE

  • Empty/remove the folder test/
  • Run: python3 -m script.ccc.run_bee
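The first step above (emptying or removing test/) can be scripted; a small sketch, assuming the function name and the idempotent behaviour are mine:

```python
import shutil
from pathlib import Path

def reset_test_folder(test_dir):
    # Remove the test/ folder entirely (if present) so that the next
    # run of script.ccc.run_bee starts from a clean state.
    path = Path(test_dir)
    if path.exists():
        shutil.rmtree(path)
```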

Run BEE on a single XML file

  • Put the XML file in the folder script/ccc/
  • Uncomment lines 39 and 40 of script/ccc/
  • In script/ccc/, change the name of the file to be parsed
  • Run: python3 -m script.ccc.test_bee


Run SPACIN

  • INPUT JSON: scripts/test/share/ref/todo
  • OUTPUT RDF (dump): scripts/ccc/
  • Run Blazegraph: java -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -server -Xmx1g -Djetty.port=9999 -jar blazegraph.jar
  • Run: python3 -m script.ccc.run_spacin
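SPACIN needs the Blazegraph instance to be up before it runs. A small sketch that checks whether the endpoint answers before launching the second command (the port matches the -Djetty.port=9999 flag above; the helper name, default URL path, and timeout are my assumptions):

```python
import urllib.request
import urllib.error

def blazegraph_ready(url="http://127.0.0.1:9999/blazegraph", timeout=2):
    # Return True if the Blazegraph HTTP endpoint answers, False otherwise.
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except (urllib.error.URLError, OSError):
        return False
```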


Rerun SPACIN

  • Empty scripts/ccc/ BUT do not remove scripts/ccc/context.json
  • Remove scripts/ccc.jnl (quit the Blazegraph .jar first!)
  • To rerun SPACIN on the same JSON files, move the content of scripts/test/share/ref/done into scripts/test/share/ref/todo
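The three cleanup steps above can be sketched in one helper, assuming the folder layout described in this document (the function name is mine):

```python
import shutil
from pathlib import Path

def prepare_spacin_rerun(scripts_dir):
    scripts = Path(scripts_dir)
    # Empty scripts/ccc/ but keep context.json.
    for entry in (scripts / "ccc").iterdir():
        if entry.name != "context.json":
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    # Remove the Blazegraph journal (the .jar must not be running).
    jnl = scripts / "ccc.jnl"
    if jnl.exists():
        jnl.unlink()
    # Move the processed JSON files from done/ back into todo/.
    done = scripts / "test" / "share" / "ref" / "done"
    todo = scripts / "test" / "share" / "ref" / "todo"
    todo.mkdir(parents=True, exist_ok=True)
    for f in done.glob("*"):
        shutil.move(str(f), str(todo / f.name))
```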

Other notes:

  • Do not change the config file script/ccc/
  • Do not delete context.json in scripts/ccc/ when rerunning SPACIN

Exploiting local indexes

BEE and SPACIN have been enhanced to exploit, respectively, a CSV dataset generated with the europe-pubmed-central-dataset tool and an index generated with the papendex tool.

  • (BEE) in scripts/script/bee/ the following settings are available:

    • PARALLEL_PROCESSING: set to True to enable the local-index improvement
    • dataset_reference: absolute path to the generated CSV
    • article_path_reference: absolute path to the directory where all the XML articles are stored
    • n_process: the number of processes that will be spawned
    • doc_for_process: the CSV will be split into one chunk per process, each containing the number of docs specified here
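The chunking described by n_process and doc_for_process can be sketched as follows (the function name is mine; only the doc_for_process parameter comes from the config above):

```python
def chunk_rows(rows, doc_for_process):
    # Split the CSV rows into chunks of doc_for_process docs each;
    # one chunk is then handed to each spawned process.
    return [rows[i:i + doc_for_process]
            for i in range(0, len(rows), doc_for_process)]
```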
  • (SPACIN) in script/ccc/ the following settings are available:

    • crossref_query_interface_type: set to 'local' to exploit the local index, otherwise 'remote'
    • orcid_query_interface_type: set to 'local' to exploit the local index, otherwise 'remote'
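A hypothetical excerpt of what these two settings look like in the SPACIN config module (the values shown are examples, not the shipped defaults):

```python
# Hypothetical config excerpt for script/ccc/ — example values only.
crossref_query_interface_type = 'local'   # 'local' = use the papendex index, 'remote' = query the remote service
orcid_query_interface_type = 'remote'     # 'local' = use the local index, 'remote' = query the remote service
```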