This codebase provides the scripts used in the paper. The pipeline directory contains one folder for each main step of the pipeline:
- Data Translation and Instruction Formatting
- Language Adaptation and Multimodal Adaptation
- Evaluation
- Conversation Training and Evaluation
The data, models and reports folders are intended to hold processed data, trained models and evaluation results, respectively.
Each folder contains a README markdown file that explains what was done and how to use the code.
Please note that our experiments used Singularity for containerization and improved reproducibility, so most scripts are launched through it. We recommend installing Singularity on the system where you want to test the codebase.
Finally, git-lfs was used to download data, so we recommend installing it as well.
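Before launching any script, it can help to confirm that both tools are reachable from your shell. The snippet below is a minimal sketch and not part of the codebase: the function name `find_missing_tools` is hypothetical, and the executable names `singularity` and `git-lfs` are assumptions that may differ on your system (for example, some installations expose the container runtime under another name).

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ("singularity", "git-lfs")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Run it once on the target machine; if a tool is reported missing, install it or update the name in `REQUIRED_TOOLS` before following the per-folder READMEs.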
If you are interested in the data and models, you can find them in this collection.