Alignment-Based Approach for automatic modernization of French texts from the 16th to the 18th century

johnseazer/aba


ABA

Alignment-Based Approach for automatic modernization of French texts from the 17th to the 18th century

Online demo at https://igm.univ-mlv.fr/~gambette/text-processing/aba/

Install

Install Packages

  • With make
make
  • Without make

To use the evaluation metrics, first add a line containing ASR_metrics at the end of requirements.txt.

pip install -r requirements.txt

Generate Data

Download, Align and Analyze PARALLEL17

  1. Download PARALLEL17 and put it into the download folder, or run the download script
python -m aba.download_git 'https://github.com/PhilippeGambette/PARALLEL17.git'
  2. Align PARALLEL17 by words
python -m aba.align_words
  3. Extract dictionaries from PARALLEL17
python -m aba.analyze
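
To illustrate what aligning a parallel corpus "by words" means, here is a minimal sketch built on the standard-library difflib module. It is not the algorithm used by aba.align_words, whose internals this README does not describe; the function name align_words and the sample sentences are assumptions made for the example.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def align_words(old_sentence: str, new_sentence: str) -> list[tuple[str, str]]:
    """Pair the words of an old-spelling sentence with its modernized version.

    Hypothetical helper for illustration only, not the aba implementation.
    """
    old_words = old_sentence.split()
    new_words = new_sentence.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical words are paired one to one.
            pairs.extend(zip(old_words[i1:i2], new_words[j1:j2]))
        else:
            # Replaced, inserted, or deleted spans are kept as one chunk,
            # so a word split such as "tresbien" -> "très bien" stays together.
            pairs.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return pairs


if __name__ == "__main__":
    for old, new in align_words("il estoit une fois", "il était une fois"):
        print(f"{old}\t{new}")
```

Pairs whose two sides differ, such as estoit and était, are the kind of entries an old-to-modern dictionary could be built from.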

Extract Morphalou Dictionary

  1. Download Morphalou
  2. Copy morphalou/4/Morphalou3.1_formatCSV_toutEnUn/Morphalou3.1_CSV.csv to the download folder
  3. Run the script
python -m aba.extract_dic_morphalou

Extract Wikisource Dictionary

Extract an old French → modern French dictionary from Wikisource.

python -m aba.extract_dic_wikisource

Extract Name Dictionary

Extract a dictionary from the .dic files located in the resources folder.

python -m aba.extract_dic_resources

Main Scripts

Modernize Corpus

python -m aba.modernize_corpus

Modernize Text

Modernize a text in old French. [1]

python -m aba.modernize [-h] text_old_path

Modernize Text and Evaluate It

Modernize a text in old French and evaluate the result by comparing it with a reference version stored in the file TEXT_NEW_PATH.

python -m aba.modernize_and_evaluate [-h] -n TEXT_NEW_PATH text_old_path
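
As an illustration of comparing a modernized text with a reference, the sketch below computes a word error rate (word-level edit distance divided by the number of reference words). This is one common metric for the task. Whether aba.modernize_and_evaluate reports this exact figure is an assumption, and the code does not use or reproduce the ASR_metrics package API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the
    number of words in the reference.

    Hypothetical helper for illustration only, not the aba implementation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    reference = "il était une fois"
    modernized = "il estoit une fois"  # one word left unmodernized
    print(word_error_rate(reference, modernized))
```

A score of 0.0 means the modernized text matches the reference word for word; higher values mean more words differ.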

Tools

Rules Chart

Opens a labeled dictionary and displays an interactive plotly pie chart showing the frequency of each modernization rule. A copy of the chart is saved in data/rules_chart.html.

python -m aba.rules_chart
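
The following sketch shows how such a chart could be produced. It assumes that each entry of the labeled dictionary is an (old, new, rule) triple; the README does not specify the real format, and the function names are hypothetical. The plotly calls used (plotly.express.pie, Figure.write_html, Figure.show) are part of plotly's public API.

```python
from collections import Counter


def rule_frequencies(labeled_entries):
    """Return the share of entries covered by each rule label.

    Assumes (old, new, rule) triples; this layout is an assumption.
    """
    counts = Counter(rule for _old, _new, rule in labeled_entries)
    total = sum(counts.values())
    return {rule: count / total for rule, count in counts.most_common()}


def save_pie_chart(frequencies, output_path="data/rules_chart.html"):
    """Draw the frequencies as an interactive pie chart and save a copy."""
    # Imported here so the counting step works without plotly installed.
    import plotly.express as px

    fig = px.pie(
        names=list(frequencies),
        values=list(frequencies.values()),
        title="Frequency of modernization rules",
    )
    fig.write_html(output_path)  # the target directory must already exist
    fig.show()


if __name__ == "__main__":
    # Made-up entries and rule labels, for illustration only.
    entries = [
        ("estoit", "était", "rule A"),
        ("avoit", "avait", "rule B"),
        ("estre", "être", "rule A"),
    ]
    print(rule_frequencies(entries))
```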

Find Strings

Search the two-column .tsv files in a given directory for a pair of corresponding strings, old and new. Prints the files, rows and lines where both strings appear.

python -m aba.find_strings [-h] [-d DIRECTORY] old new
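
The search described above can be sketched with the standard library as follows. This is an illustration, not the code of aba.find_strings: the substring matching, the non-recursive directory scan, and the UTF-8 encoding are all assumptions.

```python
import csv
from pathlib import Path


def find_strings(directory, old, new):
    """Return (file name, row number, old cell, new cell) for every row whose
    first column contains `old` and whose second column contains `new`.

    Hypothetical helper for illustration only, not the aba implementation.
    """
    hits = []
    for path in sorted(Path(directory).glob("*.tsv")):
        with path.open(encoding="utf-8", newline="") as handle:
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row_number, row in enumerate(reader, start=1):
                # Skip blank or malformed rows with fewer than two columns.
                if len(row) >= 2 and old in row[0] and new in row[1]:
                    hits.append((path.name, row_number, row[0], row[1]))
    return hits


if __name__ == "__main__":
    # "data" is a placeholder directory name.
    for name, row_number, old_cell, new_cell in find_strings("data", "estoit", "était"):
        print(f"{name}:{row_number}\t{old_cell}\t{new_cell}")
```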

Run Tests

py.test

Footnotes

  1. Paths must be written with forward slashes (/).
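
If a path was copied from a Windows file manager, it can be converted before being passed to the scripts. A minimal sketch using the standard-library pathlib module follows; the helper name and the sample path are placeholders.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path: str) -> str:
    """Rewrite a path so that it uses forward slashes only."""
    # PureWindowsPath accepts both separators and does not touch the disk.
    return PureWindowsPath(path).as_posix()


if __name__ == "__main__":
    print(to_forward_slashes(r"texts\old\sample.txt"))
```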
