Predicting age, gender and topic from Italian texts

labadier/EVALITA_TAG_it

EVALITA_TAG_it

This project is a neural network model built for the TAG-it Author Profiling task at EVALITA 2020. The task is to predict the age and gender of blog users from their posts, as well as the topic they wrote about. The model combines representations learned by RNNs at the word and sentence levels, a Transformer network (specifically the BERT architecture), and hand-crafted stylistic features. All these representations are combined and fed into a fully connected layer of a feed-forward neural network to make predictions for each subtask.

The model description is available here.

To run this code you need:

  • Python 3.8
  • TensorFlow 2.0
  • Keras 2.4.3
  • FreeLing 4.1 and its Python API
  • Italian word embeddings, available here

Steps for using the model

Training models of the ensemble

The code of the models for each task is located in the Ensemble folder. That folder also contains train.py, which, when run, saves the weights learned from the provided training data. The first step in using this classifier is therefore to run on the command line:

 python ./Ensemble/train.py

The training files are located in the data folder and are the ones provided by the contest organizers. To use a different training file, change the source variable in train.py:

source = "./data/training.txt"

Making Predictions

To make predictions, run:

 python main.py

Provide the test files with the -dp option. The test_data folder contains the test data provided by the organizers.
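The argument handling inside main.py is not shown here. As a rough illustration only, an option such as -dp could be read with argparse as in the sketch below. The destination name data_path, the help text, and the assumption that the option takes a path are hypothetical and may not match the real script.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the "-dp" flag name comes from the README.
    parser = argparse.ArgumentParser(
        description="Sketch of a prediction entry point (not the real main.py)."
    )
    parser.add_argument(
        "-dp",
        dest="data_path",  # assumed name, chosen for illustration
        required=True,
        help="path to the test data (assumed meaning of -dp)",
    )
    return parser


# Simulate a command line such as: python main.py -dp ./test_data
args = build_parser().parse_args(["-dp", "./test_data"])
print(args.data_path)
```

Check main.py itself for the options it actually accepts before relying on this.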

Data Format

The datasets consist of texts written by multiple users, with possibly multiple posts per user. The data is distributed as one XML-like file per genre, with one sample per doc element and attributes specifying an id, the topic, the gender (male|female), and the age range (0-19, 20-29, 30-39, 40-49, 50-100). This is a sample:

<doc id="3046" topic="orologi" age="30-39" gender="male" >
 <post>
   Per quale motivo oggi, il mondo dell'orologeria è così importante per voi? 
 </post>
 <post>
   Cosa vi ha spinto a rendervi appassionati così bramosi?
 </post>
</doc>
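A file in this format could be read with a small parser such as the sketch below. It is not the loader used by this repository. It assumes that each sample is a doc element with double-quoted attributes and post children, as in the sample above, and it uses regular expressions instead of an XML library because the files are only XML-like and may not be well formed. The function name parse_docs is hypothetical.

```python
import re

# One <doc ...> ... </doc> block: capture the attribute string and the body.
DOC_RE = re.compile(r"<doc\s+([^>]*)>(.*?)</doc>", re.DOTALL)
# name="value" pairs inside the opening tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
# The text of each <post> element.
POST_RE = re.compile(r"<post>(.*?)</post>", re.DOTALL)


def parse_docs(text):
    """Return one dict per doc element: its attributes plus a list of posts."""
    docs = []
    for attrs, body in DOC_RE.findall(text):
        record = dict(ATTR_RE.findall(attrs))
        record["posts"] = [post.strip() for post in POST_RE.findall(body)]
        docs.append(record)
    return docs


sample = """<doc id="3046" topic="orologi" age="30-39" gender="male" >
 <post>
   Per quale motivo oggi, il mondo dell'orologeria è così importante per voi?
 </post>
 <post>
   Cosa vi ha spinto a rendervi appassionati così bramosi?
 </post>
</doc>"""

docs = parse_docs(sample)
for doc in docs:
    print(doc["id"], doc["topic"], doc["age"], doc["gender"], len(doc["posts"]))
```

Each record keeps the attribute values as strings, so the age range stays in its original form (for example "30-39") and can be mapped to a class label later.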