Steps to reproduce the results presented in the ISWC 17 paper "The Odyssey Approach for Optimizing Federated SPARQL Queries"
- Modify the file `scripts/configFile` and replace `MYFOLDER` with the path of the folder where you have this repository
- Modify the file `scripts/configFile` and replace `"HOST1" "HOST2" "HOST3" "HOST4" "HOST5" "HOST6" "HOST7" "HOST8" "HOST9"` with the addresses of the machines that host the endpoints; if all the endpoints are hosted at the same machine, include the address of the machine only once, e.g., `"localhost"`
- Execute the script `scripts/addEndpoints.sh` to replace the address and port of the endpoints in the files that describe the federation for the different approaches
  - If the endpoints are hosted by different machines, each endpoint is expected to be available at port 8891
  - If the endpoints are hosted at the same machine, they are expected to be available at ports 8891-8899
  - Ports can be modified in the file `scripts/addEndpoints.sh`
- Make sure that `numRums` is set to `10` and `timeoutValue` is set to `1800`
- Execute the script `scripts/getLib.sh` to make sure that the libraries and engines are available
  - This script will download Apache Jena 2.13 and httpcomponents-client-4.5.3
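The placeholder substitutions above can be sketched with `sed`. The file contents below are a hypothetical stand-in for `scripts/configFile` (only the placeholder names come from this README), and `/opt/odyssey` is an example path:

```shell
# Stand-in for scripts/configFile with hypothetical contents; only the
# placeholder names (MYFOLDER, "HOST1" ...) come from the README.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
folder=MYFOLDER
endpointHosts=("HOST1" "HOST2" "HOST3" "HOST4" "HOST5" "HOST6" "HOST7" "HOST8" "HOST9")
EOF
# Replace the repository path placeholder with an example path
sed -i 's|MYFOLDER|/opt/odyssey|' "$cfg"
# All endpoints on one machine: keep the machine's address only once
sed -i 's|"HOST1" "HOST2" "HOST3" "HOST4" "HOST5" "HOST6" "HOST7" "HOST8" "HOST9"|"localhost"|' "$cfg"
cat "$cfg"
```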
  - It will also download some engines used in the empirical evaluation from their GitHub repositories and enhance them with some modified files available at `engines` to include time measurements used in the empirical evaluation
- Execute the script `scripts/compileAll.sh` to compile the Java files (requires Java 8)
- Execute the script `scripts/setPathFederationFiles.sh` to include absolute paths in some of the files used by the engines
- Make sure that the statistics for Odyssey are available
  - Check the section about Generating Odyssey's statistics
- Execute the systems you are interested in using the appropriate scripts
  - To have the engines access the endpoints directly (without using proxies), use the scripts (available in the folder `scripts`):

    ```
    ./executeQueriesOdyssey.sh
    ./executeQueriesFedX.sh
    ./executeQueriesHibiscus.sh
    ./executeQueriesSPLENDID.sh
    ./executeQueriesSemaGrow.sh
    ```
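A simple wrapper loop (not part of the repository) can run several of these scripts in turn and keep each output under a dated name similar to the files in `results`; stub scripts stand in for the real ones here, since the real ones need the endpoints running:

```shell
# Scratch directory with stub engine scripts; the real executeQueries*.sh
# scripts require the endpoints to be up, so stand-ins are used here.
cd "$(mktemp -d)"
for s in executeQueriesOdyssey executeQueriesFedX; do
  printf '#!/bin/sh\necho "%s finished"\n' "$s" > "$s.sh"
  chmod +x "$s.sh"
done
# Run each script and keep its output under a dated file name,
# similar to results/outputExecuteQueriesOdysseyWithProxies20170718
for s in executeQueriesOdyssey executeQueriesFedX; do
  "./$s.sh" > "output${s}$(date +%Y%m%d)" 2>&1
done
ls output*
```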
  - To have the engines access the endpoints through proxies and count the number of intermediate results and requests, execute the script `scripts/changeToProxies.sh`
    - This will replace the ports used by the endpoints (8891-8899) with the ports used by the proxies (3030-3038) in the relevant files used by the engines
    - Changing back the ports used by the proxies to the ports used by the endpoints can be done using the script `scripts/changeToEndpoints.sh`
    - Then execute the engines using the relevant scripts:

      ```
      ./executeQueriesOdysseyWithProxies.sh
      ./executeQueriesFedXWithProxies.sh
      ./executeQueriesHibiscusWithProxies.sh
      ./executeQueriesSPLENDIDWithProxies.sh
      ./executeQueriesSemaGrowWithProxies.sh
      ```
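The port rewrite performed by `changeToProxies.sh` can be sketched as follows; the sample federation file and its contents are made up, but the port mapping (8891-8899 to 3030-3038) is the one stated above:

```shell
# Stand-in federation file referencing two endpoint ports (made-up contents)
fed=$(mktemp)
cat > "$fed" <<'EOF'
endpoint1 http://localhost:8891/sparql
endpoint9 http://localhost:8899/sparql
EOF
# Map endpoint ports 8891-8899 to proxy ports 3030-3038 (fixed offset)
for i in 0 1 2 3 4 5 6 7 8; do
  sed -i "s|:$((8891 + i))/|:$((3030 + i))/|g" "$fed"
done
cat "$fed"
```

Running `changeToEndpoints.sh` would apply the inverse mapping.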
  - To use Odyssey's plans with FedX's optimizations for join order, you can use the scripts `executeOurPlansFedXOrder.sh` or `executeOurPlansFedXOrderWithProxies.sh`
  - To use FedX's source selection with Odyssey's decomposition and join order, you can use the scripts `executeFedXSelectionOurDecompositionAndOrder.sh` or `executeFedXSelectionOurDecompositionAndOrderWithProxies.sh`
- The scripts will write to standard output the measurements taken during the evaluation; some example outputs are available at `results`, e.g., `results/outputExecuteQueriesOdysseyWithProxies20170718`
  - The output includes one line for each execution of each query following the format: query, number of selected sources, number of subqueries, optimization time, execution time, number of tuples sent from the endpoints to the engine, number of requests, number of results
  - Examples:

    ```
    CD7 6 5 157 4743 638 549 2
    LS3 7 5 6849 26390 19760 3103 9054
    ```
- Our code has been tested using Java 8 and Python 2.7.12
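Each output line can be split into its eight whitespace-separated fields, e.g. with `awk`; the field labels follow the format described above:

```shell
# Split one measurement line into its eight labeled fields
line="CD7 6 5 157 4743 638 549 2"
echo "$line" | awk '{ printf "query=%s selectedSources=%s subqueries=%s optimizationTime=%s executionTime=%s tuplesSent=%s requests=%s results=%s\n", $1, $2, $3, $4, $5, $6, $7, $8 }'
```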
Generating Odyssey's statistics

- The core statistics used by Odyssey are available at `fedbench/statistics*`
  - This includes the information about local characteristic sets and pairs at each endpoint (`fedbench/statistics*_css_*` and `fedbench/statistics*_cps_*`) and the entity summaries that represent subjects and objects at each endpoint (`statistics*_rtree_*`)
- These core statistics were generated using the script `scripts/generateStatisticsIndividualSources.sh`
  - The current code to generate core statistics requires that each dataset is available in one N-Triples file with triples sorted by subject
    - Scripts to sort N-Triples files by subject are available at `scripts/sortDatasets.sh` and `scripts/sortDatasetsBoundMemory.sh` (used to sort large datasets)
    - The script `script/preprocessFolders.sh` can be used to collect multiple RDF files (in different formats) provided in one folder (`${dumpFolder}/files/${federation}Data/$endpoint`) into one N-Triples file (`${dumpFolder}/${federation}Data/${endpoint}/${endpoint}Data.nt`)
      - (the `dumpFolder` variable is set in the file `scripts/configFile` and the `federation` variable is set in the file `scripts/setFederation`)
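Because the subject is the first token of every N-Triples line, a plain lexical sort groups triples by subject; the sorting scripts above presumably do something along these lines (the sample data is made up):

```shell
# Tiny made-up N-Triples file, not yet sorted by subject
nt=$(mktemp)
cat > "$nt" <<'EOF'
<http://ex/b> <http://ex/p> "2" .
<http://ex/a> <http://ex/p> "1" .
<http://ex/a> <http://ex/q> "3" .
EOF
# Byte-wise sort; lines that share a subject become adjacent
LC_ALL=C sort "$nt" -o "$nt"
cat "$nt"
```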
- Use the core statistics to compute federated statistics based on the entity summaries using the script `scripts/generateStatisticsFederated.sh`
- You can set up a federation using Docker and dump files with the federation data using some of the scripts available at `scripts`
  - Create the containers for the endpoints using the script `createDockers.sh`
    - It is assumed that the data for the federation is available in the folder `${dumpFolder}/${federation}Data/${endpoint}` and a volume for the container can be created at `${folder}/endpoints/${federation}${endpoint}`
      - The `dumpFolder` and `folder` variables are set in the file `scripts/configFile` and the `federation` and `names` variables are set in the file `scripts/setFederation` (`names` includes the names of the datasets in the federation)
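Container creation in `createDockers.sh` might look roughly like this; the image name (`tenforce/virtuoso`), the port mapping, and the variable values are assumptions, so the commands are only printed here rather than executed:

```shell
# Hypothetical values for the variables read from configFile / setFederation
folder=/opt/odyssey
federation=fedbench
names="DBpedia GeoNames"   # hypothetical dataset names
out=$(mktemp)
port=8891
for endpoint in $names; do
  # Print (rather than run) one docker command per endpoint; the image
  # tenforce/virtuoso and the 8890 container port are assumptions
  echo "docker run -d --name ${federation}${endpoint}" \
       "-p ${port}:8890" \
       "-v ${folder}/endpoints/${federation}${endpoint}:/data" \
       "tenforce/virtuoso" >> "$out"
  port=$((port + 1))
done
cat "$out"
```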
  - You can uncompress the federation data using the script `uncompressData.sh`
    - The script assumes that the data is available in the `${dumpFolder}/${federation}Data/` folder with one folder per endpoint
    - The `dumpFolder` variable is set in the file `scripts/configFile` and the `federation` and `names` variables are set in the file `scripts/setFederation` (`names` includes the names of the datasets in the federation)
  - Use the script `createLoadFiles.sh` to generate isql files to load the data into Virtuoso endpoints
  - Use the script `loadDataDockers.sh` to use the generated files and load the data into the endpoints
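A generated isql load file could plausibly use Virtuoso's bulk loader (`ld_dir`, `rdf_loader_run`); the data path and graph IRI below are hypothetical, not values taken from the repository:

```shell
# Sample isql load file using Virtuoso's bulk-loader commands; the
# /data path and the graph IRI are hypothetical
load=$(mktemp)
cat > "$load" <<'EOF'
ld_dir('/data', '*.nt', 'http://example.org/fedbench/DBpedia');
rdf_loader_run();
checkpoint;
EOF
cat "$load"
```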
- You can start the federation using the script `scripts/restartDockers.sh`
  - The script uses the information set in the files `configFile` and `setFederation`
- You can stop the federation using the script `scripts/stopDockers.sh`
  - The script uses the information set in the files `configFile` and `setFederation`
- VoID statistics were generated using the script `generateVoidStatistics.sh`

This code is available in this repository.