Framework for RAG and RAGEC.
- Create a virtual environment using venv.
python3.10 -m venv ./venv310
- Activate the virtual environment.
source ./venv310/bin/activate
- Install the dependencies.
pip install -e .
- Set up the OpenAI key. Look at .envrc.example and either rename it to .envrc or add a new .envrc file.
- Download and preprocess Dragonball and CLAPnq. Note that the shell script contains python calls. Edit the shell script if necessary.
bash ./download_data.sh
- Run Dragonball.
python -m scripts.dragonball_run
- Run CLAPnq.
python -m scripts.clapnq_run
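Before launching either run, it can help to confirm that the key from .envrc is actually visible to Python. The check below is a minimal sketch and not part of this repository; it assumes the key is exported under the name OPENAI_API_KEY, so compare that against .envrc.example. The function name openai_key_is_set is made up for illustration.

```python
import os


def openai_key_is_set(env=None, var_name="OPENAI_API_KEY"):
    """Return True if the key variable is present and non-empty.

    var_name is an assumption; use whatever name .envrc.example exports.
    """
    env = os.environ if env is None else env
    return bool(env.get(var_name, "").strip())


if __name__ == "__main__":
    if openai_key_is_set():
        print("OpenAI key found in the environment.")
    else:
        print("OpenAI key not found; check your .envrc file.")
```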
The config is specified in ./conf/. We use hydra to parse the config. See here for a description of the config.
The code is decomposed into components. Each component takes artifacts as inputs and produces artifacts as outputs. Artifacts are data or results that can be saved to or loaded from disk. An artifact holds the underlying data together with the path where it is saved or loaded.
One can "dry run" a component, i.e. prepare the output artifact(s) of that component without running the component itself. This makes it easy to resume a sequence of components partway through.
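The artifact and dry-run idea can be sketched as follows. This is an illustration only and not the framework's actual classes: the names Artifact and Component, the JSON storage format, and the choice to load an existing file during a dry run are all assumptions made for the example.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Optional


@dataclass
class Artifact:
    """Hypothetical artifact: the data plus the path it is saved to or loaded from."""
    path: Path
    data: Optional[Any] = None

    def save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.data))

    def load(self) -> None:
        self.data = json.loads(self.path.read_text())


@dataclass
class Component:
    """Hypothetical component: maps one input artifact to one output artifact."""
    fn: Callable[[Any], Any]
    output_path: Path

    def run(self, inp: Artifact, dry_run: bool = False) -> Artifact:
        out = Artifact(path=self.output_path)
        if dry_run:
            # Prepare the output artifact without executing fn.
            # Assumption: reuse a previously saved result if one is on disk.
            if out.path.exists():
                out.load()
            return out
        out.data = self.fn(inp.data)
        out.save()
        return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        step = Component(fn=lambda xs: [x * 2 for x in xs],
                         output_path=Path(tmp) / "doubled.json")
        source = Artifact(path=Path(tmp) / "source.json", data=[1, 2, 3])
        first = step.run(source)                   # executes fn and saves the result
        resumed = step.run(source, dry_run=True)   # skips fn, reuses the saved file
        print(first.data == resumed.data)
```

In this sketch a later component can consume the artifact returned by a dry run exactly as if the earlier component had just been executed, which is what makes resuming a sequence cheap.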
For each run, the log is saved at ./outputs/{RUN_DATE}/{RUN_TIME}/. That directory also contains the .hydra folder, which records the config used for that run. Artifacts are shared across runs. The artifact path is specified as artifact_path in the main config.
The default artifact path is ./outputs/{DATA}/{RUN_NAME}/. It will be changed to include the dataset names later.
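Since run logs and shared artifacts both live under ./outputs/, a small helper can pick out the most recent run directory. The sketch below is not part of the repository; the function name latest_run_dir is hypothetical, and it assumes that {RUN_DATE} and {RUN_TIME} use formats that sort lexicographically (as hydra's default YYYY-MM-DD and HH-MM-SS do). Run directories are recognized by their .hydra folder, so artifact directories are skipped.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(outputs_root: Path) -> Optional[Path]:
    """Return the newest {RUN_DATE}/{RUN_TIME} directory that contains .hydra.

    Assumes date and time folder names sort lexicographically.
    """
    candidates = [p.parent for p in outputs_root.glob("*/*/.hydra") if p.is_dir()]
    return max(candidates, key=lambda p: (p.parent.name, p.name), default=None)


if __name__ == "__main__":
    run_dir = latest_run_dir(Path("./outputs"))
    if run_dir is None:
        print("No runs found under ./outputs")
    else:
        print("Latest run:", run_dir)
        # hydra stores the composed config for the run in this file.
        print("Saved config:", run_dir / ".hydra" / "config.yaml")
```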