
# Text Similarity as an Evaluation Measure of Text Generation

Code style - black

## ❓ Context

Natural Language Generation (NLG) is the process of generating human-like language by machines. One of the key challenges in evaluating the quality of generated text is to compare it with 'gold standard' references.

However, obtaining human annotations for evaluation is an expensive and time-consuming process, making it impractical for large-scale experiments. As a result, researchers have explored alternative methods for evaluating the quality of generated text.

Two families of metrics have been proposed: trained and untrained metrics. While trained metrics may not generalize well to new data, untrained metrics, such as word- or character-based metrics and embedding-based metrics, offer a more flexible and cost-effective solution. To assess how well an evaluation metric performs, its scores are correlated with human judgments using Pearson, Spearman, or Kendall tests, either at the text level or at the system level.
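As an illustration of this meta-evaluation step, the sketch below computes the three correlation coefficients between a metric's text-level scores and human ratings with `scipy`; the score lists are made-up placeholders, not data from this project:

```python
# Minimal sketch of metric meta-evaluation at the text (segment) level.
# `metric_scores` and `human_scores` are hypothetical parallel lists:
# one automatic score and one human rating per generated text.
from scipy.stats import kendalltau, pearsonr, spearmanr

metric_scores = [0.42, 0.77, 0.31, 0.65]
human_scores = [2.0, 4.5, 1.5, 4.0]

print("Pearson: ", pearsonr(metric_scores, human_scores)[0])
print("Spearman:", spearmanr(metric_scores, human_scores)[0])
print("Kendall: ", kendalltau(metric_scores, human_scores)[0])
```

At the system level, the same tests are applied to one aggregate score per system rather than one score per text.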

## 🎯 Objective

This project aims to benchmark the correlation of existing metrics with human scores on three generation tasks: machine translation, data-to-text generation, and story generation.
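For instance, an untrained metric such as TER (one of the metrics tested below) can be scored against references with the `sacrebleu` library. This is only an illustrative sketch of the scoring step, not necessarily the implementation used in this repository:

```python
# Illustrative sketch: computing TER with sacrebleu (an assumed dependency,
# not necessarily the implementation used in this project).
from sacrebleu.metrics import TER

ter = TER()
hypotheses = ["the cat sat on the mat", "there is a dog in the room"]
references = [["the cat is on the mat", "a dog is in the room"]]  # one reference stream

# System-level score over the whole corpus
print(ter.corpus_score(hypotheses, references))

# Text-level score for each hypothesis
for hyp, ref in zip(hypotheses, references[0]):
    print(ter.sentence_score(hyp, [ref]).score)
```

Note that TER is an error rate (lower is better), so its correlation with human quality judgments is expected to be negative.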

## 🚀 How to use the project

1. First, clone the repository and cd into it:

   ```bash
   git clone https://github.com/lidamsoukaina/NLG_Evaluation_Metrics.git
   cd NLG_Evaluation_Metrics
   ```

2. Then, create a virtual environment and activate it:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install all the requirements:

   ```bash
   pip install -r requirements.txt
   ```

4. [Optional] If you are using this repository in development mode, set up the git hook scripts:

   ```bash
   pre-commit install
   ```

5. You can now run the Python scripts in a given folder, for example:

   ```bash
   cd cluster
   python3 [file name]
   ```

To test the project, you can run the `test.ipynb` notebook.

## 📝 Results

TODO: List the tested metrics

TODO: Describe the criteria used to evaluate the results

| Metric     | criterion1 | criterion2 | criterion3 |
|------------|------------|------------|------------|
| TER        | XX         | XX         | XX         |
| DepthScore | XX         | XX         | XX         |

TODO: Describe and analyse the results

## 🤔 What's next?

TODO: List the next steps

## 📚 References

TODO: List the references

## ✏️ Authors

- LETAIEF Maram
- LIDAM Soukaina
