SignWriting Evaluation

The lack of automatic SignWriting evaluation metrics is a major obstacle in the development of SignWriting transcription and translation [1] models.

Goals

The primary objective of this repository is to house a suite of automatic evaluation metrics specifically tailored for SignWriting. This includes standard metrics like BLEU [2], chrF [3], and CLIPScore [4], as well as custom-developed metrics unique to our approach. We recognize the distinct challenges in evaluating single signs versus continuous signing, and our methods reflect this differentiation.

To qualitatively demonstrate the efficacy of these evaluation metrics, we implement a nearest-neighbor search for selected signs from the SignBank corpus. The rationale is straightforward: the closer the sign is to its nearest neighbor in the corpus, the more effective the evaluation metric is in capturing the nuances of sign language transcription and translation.

Evaluation Metrics

  • Tokenized BLEU - BLEU score for tokenized SignWriting FSW strings.
  • chrF - chrF score for untokenized SignWriting FSW strings.
  • CLIPScore - CLIPScore between SignWriting images (using the original CLIP model).
  • Similarity - symbol distance score for SignWriting FSW strings (README).
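
As a rough illustration of how the string-based metrics can be computed, the sketch below scores a hypothesis FSW string against a reference using sacrebleu. The FSW tokenizer and the example strings are illustrative assumptions rather than this repository's own code; CLIPScore and the symbol-distance similarity are omitted here because they depend on the implementations in this repository.

```python
# Sketch: Tokenized BLEU and chrF over FSW strings with sacrebleu.
# The tokenizer below (split into box markers, symbol IDs, and coordinates)
# is an assumption for illustration, not necessarily the repository's tokenizer.
import re
from sacrebleu.metrics import BLEU, CHRF

def tokenize_fsw(fsw: str) -> str:
    """Split an FSW string into whitespace-separated tokens."""
    tokens = re.findall(r"[ABLMR]|S[0-9a-f]{3}[0-5][0-9a-f]|\d{3}x\d{3}", fsw)
    return " ".join(tokens)

hypothesis = "M518x529S14c20481x471S27106503x489"  # illustrative FSW strings
reference = "M518x529S14c20481x471S27102503x489"

# Tokenized BLEU compares the token sequences.
bleu = BLEU(tokenize="none")
print(bleu.corpus_score([tokenize_fsw(hypothesis)], [[tokenize_fsw(reference)]]))

# chrF works directly on the untokenized character strings.
chrf = CHRF()
print(chrf.corpus_score([hypothesis], [[reference]]))
```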

Qualitative Evaluation

Distribution of Scores

Using a sample of the corpus, we compute all pairwise (any-to-any) scores for each metric. Intuitively, a good metric should assign a low score to any two random signs, since most signs are unrelated. This should be reflected in the distribution of scores, which should be skewed towards lower values.
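
A rough sketch of this procedure, using chrF as a stand-in for any of the metrics above and a handful of placeholder FSW strings in place of a real corpus sample:

```python
# Sketch: score every pair in a sample and bucket the scores into a histogram.
# The FSW strings are illustrative placeholders; in practice the sample would be
# a few hundred random signs drawn from SignBank.
from itertools import combinations
from collections import Counter
from sacrebleu.metrics import CHRF

corpus_sample = [
    "M518x529S14c20481x471S27106503x489",
    "M518x533S1f720487x492S26500508x469",
    "M524x515S11541498x485S11549476x485",
    "M510x518S2ff00490x483",
]

chrf = CHRF()
scores = [chrf.sentence_score(a, [b]).score / 100 for a, b in combinations(corpus_sample, 2)]

# A useful metric should concentrate most of the mass near zero,
# since two random signs are almost always unrelated.
histogram = Counter(min(int(score * 10), 9) for score in scores)
for bin_index in range(10):
    print(f"{bin_index / 10:.1f}-{(bin_index + 1) / 10:.1f}: {histogram[bin_index]}")
```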

(Figure: distribution of scores for each metric.)

Nearest Neighbor Search

It is well known that the SignBank corpus contains many forms of the sign for "hello". We carefully select some of these signs to evaluate our metrics by searching for their closest matches in the corpus, which contains around 230k single signs.

Comparing the top-10 nearest neighbors for each sign reveals the weaknesses of each metric: for every sign and metric, either the first match is incorrect, or a more correct match appears further down the list.
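
A minimal sketch of this search, again with chrF standing in for any of the metrics and placeholder FSW strings rather than the roughly 230k signs of SignBank:

```python
# Sketch: rank every corpus sign by its score against a query and keep the top k.
from sacrebleu.metrics import CHRF

def nearest_neighbors(query: str, corpus: list[str], k: int = 10) -> list[tuple[float, str]]:
    """Return the k corpus signs with the highest metric score against the query."""
    chrf = CHRF()
    scored = [(chrf.sentence_score(candidate, [query]).score, candidate) for candidate in corpus]
    return sorted(scored, reverse=True)[:k]

# Illustrative usage with placeholder FSW strings.
query = "M518x529S14c20481x471S27106503x489"
corpus = [
    "M518x529S14c20481x471S27102503x489",
    "M518x533S1f720487x492S26500508x469",
    "M510x518S2ff00490x483",
]
for score, sign in nearest_neighbors(query, corpus, k=3):
    print(f"{score:6.2f}  {sign}")
```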

(Table: for each of the selected signs, the top-10 nearest neighbors under CLIPScore, Symbols Distances, Tokenized BLEU, and chrF; images omitted.)

References


  1. Amit Moryossef, Zifan Jiang. 2023. SignBank+: Preparing a Multilingual Sign Language Dataset for Machine Translation Using Large Language Models.

  2. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.

  3. Maja Popović. 2015. chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 392–395, Lisbon, Portugal. Association for Computational Linguistics.

  4. Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. 2021. CLIPScore: A Reference-free Evaluation Metric for Image Captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7514–7528, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
