This script downloads a corpus from the Pangloss Collection and, soon, from any OLAC repository.

For now, you must first download the Pangloss metadata using the script that can be found here.

You can then download data from a subset of languages using the following command:

python olac_grabber.py --metadata "metadata_pangloss.xml" --languages "Lazé" --exceptspeakers "Anonyme"
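To illustrate what the --languages and --exceptspeakers options express, here is a minimal sketch of selecting records by language while excluding some speakers. It is not the code of olac_grabber.py: the function select_records, the SAMPLE document, and the assumption that languages appear in dc:subject and speakers in dc:contributor elements with an olac:code="speaker" attribute are all illustrative, and the real metadata_pangloss.xml may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Namespaces of Dublin Core and OLAC metadata elements.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "olac": "http://www.language-archives.org/OLAC/1.1/",
}
OLAC_CODE = "{http://www.language-archives.org/OLAC/1.1/}code"

# Hypothetical, simplified sample; the real metadata file may differ.
SAMPLE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:olac="http://www.language-archives.org/OLAC/1.1/">
  <olac:olac>
    <dc:identifier>rec-1</dc:identifier>
    <dc:subject>Lazé</dc:subject>
    <dc:contributor olac:code="speaker">Anonyme</dc:contributor>
  </olac:olac>
  <olac:olac>
    <dc:identifier>rec-2</dc:identifier>
    <dc:subject>Lazé</dc:subject>
    <dc:contributor olac:code="speaker">Speaker A</dc:contributor>
  </olac:olac>
  <olac:olac>
    <dc:identifier>rec-3</dc:identifier>
    <dc:subject>Na</dc:subject>
    <dc:contributor olac:code="speaker">Speaker B</dc:contributor>
  </olac:olac>
</records>"""


def select_records(xml_text, languages, except_speakers=()):
    """Return identifiers of records in one of `languages` whose speakers
    do not include any name listed in `except_speakers`."""
    wanted = set(languages)
    excluded = set(except_speakers)
    selected = []
    root = ET.fromstring(xml_text)
    for record in root.iterfind(".//olac:olac", NS):
        subjects = {s.text for s in record.iterfind("dc:subject", NS)}
        speakers = {
            c.text
            for c in record.iterfind("dc:contributor", NS)
            if c.get(OLAC_CODE) == "speaker"
        }
        # Keep the record if the language matches and no speaker is excluded.
        if subjects & wanted and not speakers & excluded:
            selected.append(record.findtext("dc:identifier", namespaces=NS))
    return selected
```

With this sample, select_records(SAMPLE, ["Lazé"]) keeps both Lazé records, while select_records(SAMPLE, ["Lazé"], ["Anonyme"]) drops the one whose speaker is "Anonyme". Languages and speakers are passed as lists, not as single strings.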

The tests were run on the Lazé language, using the following commands:

* No speakers excluded: python olac_grabber.py --metadata "/home/mfily/Documents/diagnoSTIC_XP/03_make_corpus/metadata_pangloss.xml" --languages "Lazé"

* With excluded speakers: python olac_grabber.py --metadata "/home/mfily/Documents/diagnoSTIC_XP/03_make_corpus/metadata_pangloss.xml" --languages "Lazé" --exceptspeakers "Anonyme"

The difference can be seen in the files downloaded_data_lazé_no_exception.csv and downloaded_data_lazé_with_exception.csv.
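One way to inspect that difference is to list the rows present in the first file but absent from the second. The sketch below makes no assumption about the column layout of the CSV files and compares whole rows; the helper names read_rows and rows_only_in_first are illustrative and not part of the repository.

```python
import csv


def read_rows(path):
    """Read a CSV file and return its rows as a list of tuples."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [tuple(row) for row in csv.reader(handle)]


def rows_only_in_first(first_path, second_path):
    """Return the rows of the first file that do not appear in the second,
    in their original order."""
    second = set(read_rows(second_path))
    return [row for row in read_rows(first_path) if row not in second]
```

Calling rows_only_in_first("downloaded_data_lazé_no_exception.csv", "downloaded_data_lazé_with_exception.csv") would then return the rows that were dropped when speakers were excluded, assuming both files otherwise contain identical rows.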

This script was developed as part of the DiagnoSTIC project.

This work was partly funded by Agence de l’Innovation de Défense (grant 2022 65 0079).