This project includes two main phases:
- Prepare Phase — processes raw input data and generates optimized intermediate data.
- Run Phase — runs the final query pipeline on the processed data and produces results.
Install the necessary dependencies before running any scripts:

pip install -r optimized/requirements.txt

Before running the scripts, place your query results in optimized/inputs.py, using the same format as the provided baseline example.
Make sure your data directory follows the structure expected by the scripts.
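
A quick pre-flight check can catch a missing file before the prepare phase starts. The sketch below is not part of the project: `find_missing` and `REQUIRED_FILES` are hypothetical names, and it only verifies that the files named in this README exist and that the data directory is present and non-empty. It does not validate the layout inside the data directory, since that layout is defined by the scripts themselves.

```python
from pathlib import Path

# Files named in this README, relative to the repository root.
REQUIRED_FILES = (
    "optimized/requirements.txt",
    "optimized/inputs.py",
    "optimized/prepare.py",
    "optimized/run.py",
)


def find_missing(repo_root, data_dir):
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    root = Path(repo_root)

    # Check each script or config file the README refers to.
    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")

    # The raw data directory must exist and contain at least one entry.
    data = Path(data_dir)
    if not data.is_dir():
        problems.append(f"data directory not found: {data}")
    elif not any(data.iterdir()):
        problems.append(f"data directory is empty: {data}")

    return problems
```

Calling `find_missing(".", "/path/to/your/data")` from the repository root returns an empty list when everything is in place, and otherwise one message per problem.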
Run the prepare phase, replacing the --data-dir path with the location of your own raw data:
python3 optimized/prepare.py \
--data-dir /Users/wuhaodong/Downloads/data \
--out-dir optimized-data

This process may take around 10 minutes to complete.
Once the preparation phase finishes, run the final query model:
python3 optimized/run.py \
--data-dir optimized-data \
--out-dir final-results

Results will be saved in the final-results/ directory.
- optimized-data/ → intermediate data files from the prepare phase
- final-results/ → final output files after the run phase
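
If you run both phases often, the two commands can be chained from a small driver. This is a minimal sketch under stated assumptions, not a script shipped with the project: `build_commands` and `run_pipeline` are hypothetical helper names, and the only flags used are the `--data-dir` and `--out-dir` options shown in the commands in this README. The run phase starts only if the prepare phase exits successfully.

```python
import subprocess


def build_commands(raw_data_dir, prepared_dir="optimized-data", results_dir="final-results"):
    """Build the argument lists for the prepare and run phases, in order."""
    prepare = [
        "python3", "optimized/prepare.py",
        "--data-dir", str(raw_data_dir),
        "--out-dir", str(prepared_dir),
    ]
    run = [
        "python3", "optimized/run.py",
        "--data-dir", str(prepared_dir),  # the run phase reads the prepare output
        "--out-dir", str(results_dir),
    ]
    return [prepare, run]


def run_pipeline(raw_data_dir, prepared_dir="optimized-data",
                 results_dir="final-results", runner=subprocess.run):
    """Run both phases in sequence; a failing phase stops the pipeline.

    `runner` defaults to subprocess.run and exists so the function can be
    exercised without launching real processes.
    """
    for cmd in build_commands(raw_data_dir, prepared_dir, results_dir):
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(cmd, check=True)
```

From the repository root, `run_pipeline("/path/to/your/data")` would then execute the prepare phase followed by the run phase with the default output directories.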