RO-animacy

Background

This repository contains the code for a classifier that distinguishes between two classes of Romanian nouns: human and non-human. The classifier is built on pre-trained word embeddings and three machine learning algorithms: Random Forest, Multi-layer Perceptron, and k-nearest neighbors.

The corresponding paper is: Tepei, M., & Bloem, J. (in press). Automatic Animacy Classification for Romanian Nouns. In Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation 2024 (LREC-COLING 2024).

Usage

Install the required libraries: nltk, gensim, numpy, pandas, scikit-learn, spacy.

Download the necessary data:

pre-trained word embeddings for Romanian (corola.300.20.vec): http://89.38.230.23/word_embeddings/
spaCy's Romanian model

Execute the code in the notebook (animacy-code.ipynb) to preprocess the data, train the classifiers, and evaluate performance.
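The training step can be pictured with a minimal sketch like the one below. It is not the notebook's code: the helper names `build_matrix` and `train_classifiers`, the hyperparameters, and the toy vectors are all assumptions made for illustration. Any object that supports `word in vectors` and `vectors[word]` should work as the embedding lookup, which includes a plain dict and, presumably, the object returned by gensim's `KeyedVectors.load_word2vec_format("corola.300.20.vec", binary=False)`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier


def build_matrix(labelled_nouns, vectors):
    """Turn (noun, label) pairs into a feature matrix, skipping nouns without a vector."""
    rows, labels = [], []
    for noun, label in labelled_nouns:
        if noun in vectors:  # works for a dict and for gensim KeyedVectors
            rows.append(np.asarray(vectors[noun], dtype=float))
            labels.append(label)
    return np.vstack(rows), np.array(labels)


def train_classifiers(X, y, seed=0):
    """Fit the three classifier families named above (hyperparameters are placeholders)."""
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "mlp": MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=3),
    }
    for model in models.values():
        model.fit(X, y)
    return models


# Toy stand-in for the real embeddings: two well-separated clusters in 5 dimensions.
# These vectors and noun names are invented; the real vectors are 300-dimensional.
rng = np.random.default_rng(0)
toy_vectors, toy_nouns = {}, []
for i in range(20):
    toy_vectors[f"human_{i}"] = rng.normal(loc=2.0, scale=0.5, size=5)
    toy_vectors[f"thing_{i}"] = rng.normal(loc=-2.0, scale=0.5, size=5)
    toy_nouns += [(f"human_{i}", "human"), (f"thing_{i}", "non-human")]
toy_nouns.append(("missing", "human"))  # has no vector, so it is skipped

X, y = build_matrix(toy_nouns, toy_vectors)
models = train_classifiers(X, y)
for name, model in models.items():
    print(name, model.predict(np.full((1, 5), 2.0))[0])
```

Skipping out-of-vocabulary nouns is one possible design choice; the notebook may handle them differently.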

Included files

evaluation.txt: Input text file for evaluation (Wiki articles).
noun_prediction_annotation.csv: Manually annotated labels and predicted labels.
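Since noun_prediction_annotation.csv pairs manual labels with predicted labels, agreement between the two can be computed with a few lines of standard-library Python. The sketch below is an illustration only: the column names `annotation` and `prediction` and the sample rows are invented, so check the file's actual header before reusing it.

```python
import csv
import io
from collections import Counter


def agreement(rows, gold_col="annotation", pred_col="prediction"):
    """Return (accuracy, confusion counts) for manual vs. predicted labels.

    The default column names are hypothetical, not taken from the real file.
    """
    confusion = Counter((row[gold_col], row[pred_col]) for row in rows)
    total = sum(confusion.values())
    correct = sum(n for (gold, pred), n in confusion.items() if gold == pred)
    return (correct / total if total else 0.0), confusion


# Invented sample rows standing in for the real CSV.
sample = io.StringIO(
    "noun,annotation,prediction\n"
    "profesor,human,human\n"
    "masă,non-human,non-human\n"
    "copil,human,non-human\n"
    "carte,non-human,non-human\n"
)
accuracy, confusion = agreement(csv.DictReader(sample))
print(f"accuracy = {accuracy:.2f}")  # prints "accuracy = 0.75"
```

For the real file, open it with `open("noun_prediction_annotation.csv", newline="", encoding="utf-8")`, pass `csv.DictReader(f)` to `agreement`, and set `gold_col` and `pred_col` to the names found in its header.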
