Skip to content

jorge-martinez-gil/oruga

Repository files navigation

ORUGA: Optimizing Readability Using Genetic Algorithms

This repository contains code for reproducing the experiments reported in the paper

Also available is a collection of three articles on Medium that are written for a general audience.

🌍 Overview

ORUGA is a method that seeks to optimize the readability of any text automatically. It is based on genetic algorithms, so it is unsupervised and requires no training.

Example

🛠️ Install

pip install -r requirements.txt

(Important!! Please be very careful with the incompabilities of packages Readability and readability-lxml. The import Readability belongs to py-readability-metrics. Also, be sure to use the package pygad==2.1.0, otherwise, you will find errors.)

📊 Dataset

texts.txt Ten texts extracted from Wikipedia with different lengths, topics and readability levels.

🚀 Use

python oruga_wordnet.py Run the basic ORUGA program looking to minimize the FKGL score using the WordNet synonym library.

python oruga_word2vec.py Run the basic ORUGA program looking to minimize the FKGL score using the word2vec synonym method. (Slow)

python oruga_webscraping.py Run the basic ORUGA program looking to minimize the FKGL score using webscraping. (Please be responsible)

python oruga2_nsga2.py Run the second version of ORUGA using NSGA-II to minimize FKGL score and number of words to be replaced simultaneously.

python oruga2_gde3.py Run the second version of ORUGA using GDE3 to minimize FKGL score and number of words to be replaced simultaneously.

python oruga2_nsga2_wmd.py Run the third version of ORUGA using NSGA-II to minimize FKGL score, the number of words to be replaced simultaneously, and WMD for semantic distance.

python oruga2_gde3_wmd.py Run the third version of ORUGA using GDE3 to minimize FKGL score and number of words to be replaced simultaneously, and WMD for semantic distance.

📚 Citation

If you use ORUGA, please cite:

@article{martinez2024oruga,
	author = {Jorge Martinez-Gil},
	title = {Optimizing readability using genetic algorithms},
	journal = {Knowledge-Based Systems},
	volume = {284},
	pages = {111273},
	year = {2024},
	issn = {0950-7051},
	doi = {https://doi.org/10.1016/j.knosys.2023.111273}	
}

📄 License

MIT