Skip to content

hazemalsaied/IdenSys

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

62 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A simple transition-based system for the identification of the verbal multword expressions.

This system has participated in the competition of MWE Shared task - EACL 2017 (Valencia, Spain).

Corpora: Data sets proposed by MWE Shared task - EACL 2017 (Valencia, Spain).

Results: results of experiments on 18 languages with multiple reports reflecting the progress of the identification process.

SourceCode: The folder Src

Overview of the source code:

The identifier runs using identify function (identifier.py).

The input of the system is a set of configurations:

1- the language you want to test and its feature group (This should be represented as a json file in "Experiments/xp").

NB configuration files with our preferred feature group are available in 'Experiments/Langs'.

2- The mode of evaluation:

A- Debug mode: used during development for faster execution time;

B- Test Mode: using the full version of Shared task train and test data sets;

C- Cross validation mode: A 5-fold cross validation evaluation over the Shared task train data (No shuffle used).

The Output is list of F-scores for the identification and the categorization (over each category) of VMWEs.

The actual version is very light (No reports, no analysis and no code for additional experiments).

A good start to understand the code is to start with the identifier script and to follow the code.

**NB:**The folder script contain multiple secondary scripts used for 1- baseline.py: creating a baseline system for identifying Sharedtask MWE using string matching techniques 2- marmot.py and mateParsing.py which were used to generate the automatic annotation (POS tags and syntaxic annotations) for languages whose annotations where manulally annotated (FR, CS, HU, and PL).

Citation:

Al Saied, H., Constant, M., & Candito, M. (2017). The atilf-llf system for parseme shared task: a transition-based verbal multiword expression tagger. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) (pp. 127-132).