CLI pipeline for generating and uploading the COLD French Law dataset.
The COLD French Law dataset is a collection of currently applicable law articles filtered from France's LEGI dataset.
English translations are available for ~800K articles. These translations were generated by OpenAI's GPT-4 and provided by Casetext, part of Thomson Reuters.
⚠️ This process is transformative. Although the data is sourced from France's LEGI dataset:
- The accuracy of the data going in and out of this pipeline cannot be guaranteed.
- This pipeline and the resulting dataset are unofficial and experimental.
This pipeline requires Python 3.11+ and Python Poetry.
Pulling data from and pushing data to HuggingFace may require the HuggingFace CLI and valid authentication.
git clone https://github.com/harvard-lil/cold-french-law-pipeline.git
poetry install
Will generate a CSV under data/cold_csv.
# See: build.py --help for a list of available options
poetry run python build.py
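Once the build has finished, a quick sanity check of the output can be useful. The following is a minimal sketch, not part of the pipeline itself: it assumes the build wrote one or more `.csv` files directly under data/cold_csv, and the helper names `list_csv_files` and `preview_csv` are hypothetical. It makes no assumption about column names and simply prints whatever header it finds.

```python
import csv
from pathlib import Path


def list_csv_files(directory="data/cold_csv"):
    """Return the CSV files found in the output directory, sorted by name.

    The default path mirrors the output folder named in this README;
    the exact file names inside it are not assumed.
    """
    return sorted(Path(directory).glob("*.csv"))


def preview_csv(csv_path, max_rows=3):
    """Return the header row and up to max_rows data rows of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows


if __name__ == "__main__":
    # Prints nothing if the directory is missing or holds no CSV files.
    for path in list_csv_files():
        header, rows = preview_csv(path)
        print(path.name, header)
        for row in rows:
            print(row)
```

If very long article texts trigger a "field larger than field limit" error, the standard library's `csv.field_size_limit` can be called with a larger value before reading.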
Will attempt to upload the resulting CSV file to harvard-lil/cold-french-law on HuggingFace.
poetry run python upload.py