Dependency parser implementation used by the KParse team in the CoNLL 2018 shared task. The model we implemented is described in our paper, Tree-stack LSTM in Transition Based Dependency Parsing.
We use text files tokenized by UDPipe; please make sure that you have installed it from its official repository. All of this code works with Julia 0.6.2; current Julia versions are not supported yet.
Clone the repository to install the parser and dependencies:
git clone https://github.com/kirnap/ku-dependency-parser2.git && cd ku-dependency-parser2
We used our pre-trained language model from the CoNLL17 shared task; the code for it is given under the LM section of our CoNLL17 repository.
Since this is a research repository, the code structure is a bit messy; let's walk through it. As we explained in the paper, we use morphological features only for some languages. The following command prints the dictionary in which a true value indicates that morphological features are used for that language:
cat use_feats.jl
For example, if we want to train en_lines, here are the steps:

cat use_feats.jl | grep en_lines

This prints true, so we train with the following command:
julia train_feats3.jl --lmfile your/path/to/english_chmodel.jld --datafiles /your-path-to/ud-treebanks-v2.2/UD_English-LinES/en_lines-ud-train.conllu /your/path/to/ud-treebanks-v2.2/UD_English-LinES/en_lines-ud-dev.conllu --bestfile your_model_file.jld
Suppose instead we want to train hu_szeged, which does not use morphological features; then we need the following command:
julia train_nofeats.jl --lmfile your/path/to/hu_szeged.jld --datafiles /your-path-to/hu_szeged.train.conllu /your-path-to/hu_szeged.dev.conllu --bestfile your_model_file.jld
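The script-selection logic above can be sketched as follows. This is an illustration only (in Python, not repository code): the USE_FEATS dictionary here is an assumed excerpt built from the two examples above; the real per-language values come from use_feats.jl.

```python
# Illustrative sketch, not repository code: choose the training script
# from a feature-usage table like the one printed by use_feats.jl.
# USE_FEATS is a hypothetical excerpt; real values live in use_feats.jl.
USE_FEATS = {"en_lines": True, "hu_szeged": False}

def training_script(lang_code):
    """Return the training script to run for a given treebank code."""
    return "train_feats3.jl" if USE_FEATS[lang_code] else "train_nofeats.jl"
```

For example, `training_script("en_lines")` returns `"train_feats3.jl"`, matching the command shown above.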
Let's dive into testing. Suppose we want to test the performance of the en_lines model that we trained in the previous section:
julia train_feats3.jl --datafiles your-path-to/ud-treebanks-v2.2/UD_English-LinES/en_lines-ud-dev.conllu --loadfile your-path-to/en_lines.jld --epochs 0 --output your_testfile.conllu
Similarly, if you want to test a model trained without morphological features (e.g. hu_szeged):
julia train_nofeats.jl --datafiles your-path-to/ud-treebanks-v2.2/UD_Hungarian/hu_szeged.conllu --loadfile your-path-to/hu_szeged.jld --epochs 0 --output your_testfile.conllu
Please note that these commands create .conllu-formatted files with the predicted 'head' and 'deprel' columns filled in.
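Once you have a predicted .conllu file, you can score it against the gold file. The repository does not ship an evaluator as far as this README goes, so here is a minimal illustrative sketch (in Python) of computing unlabeled and labeled attachment scores; column indices follow the CoNLL-U specification (HEAD and DEPREL are the 7th and 8th columns).

```python
# Illustrative sketch (not repository code): UAS/LAS from CoNLL-U text.
def read_arcs(conllu_text):
    """Return one (head, deprel) pair per word line of a CoNLL-U file."""
    arcs = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank sentence separators
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        arcs.append((cols[6], cols[7]))  # HEAD, DEPREL columns
    return arcs

def uas_las(gold_text, pred_text):
    """Unlabeled / labeled attachment scores over aligned word lines."""
    gold, pred = read_arcs(gold_text), read_arcs(pred_text)
    assert len(gold) == len(pred), "files must align token-for-token"
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las
```

For shared-task-official numbers, use the CoNLL 2018 evaluation script instead; this sketch only illustrates what the two scores measure.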
To understand the code structure, here is a brief explanation of some model files under the src/ directory:

- src/_model_feat3_1.jl: the most current version of our model, using morphological features as well.
- src/model_nofeat1.jl: the most current version of our model, not using morphological features.
- src/model_nofeat_dyn.jl: the model that does not use morphological features and is trained with the dynamic-oracle training explained in our paper.

To better understand the code, start from the src/header.jl file. Please note that you have to provide a .conllu-formatted file to our system.
You may download the parser models from here
You may download the language models from here
You may find converted versions of the language models here. (If you cannot find your model, please refer to the next section of this document.)
The conversion requires two steps:
- On Julia 0.6:
using JLD, Knet;include("src/header.jl")
language_model = "/kuacc/users/okirnap/ud-treebanks-v2.2/chmodel_converted/english_chmodel.jld"
d = load(language_model);
word_vocab2 = Dict{String, Int64}();
for (k,v) in d["word_vocab"]; word_vocab2[k]=v;end;
# we have a character conversion inconvenience :( ; to work around it,
# dump the char vocab to a .txt file and reload it from Julia 1
open("english_chars.txt", "w") do f; for (k,v) in d["char_vocab"]; k1=string(k); write(f, "$k1,$v\n");end;end;
new_d2 = Dict{String, Any}();for (k,v) in d; (k =="word_vocab") ? new_d2[k]=word_vocab2 : new_d2[k] =v;end;
using JLD2
JLD2.@save "english_chmodel.jld2" new_d2
- On Julia 1.0 (please make sure that you are on the julia1 branch):
using JLD2,Knet;include("src/header.jl")
JLD2.@load "english_chmodel.jld2" new_d2; # now you have it!
char_vocab = Dict{Char, Int}() # use this char_vocab instead of the one coming from new_d2
for line in eachline("english_chars.txt"); s1, s2 = split(line, ","); isempty(s1) && continue; char_vocab[s1[1]] = parse(Int, s2);end;
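The char-vocab detour above serializes each (character, id) pair as a "char,id" line and skips entries whose character part comes back empty (e.g. the newline character itself). The same round trip can be sketched in Python for illustration (not repository code):

```python
# Illustrative sketch of the char-vocab round trip used above:
# write "char,id" lines, then re-read them, skipping entries whose
# character part is empty, exactly as the Julia loop does.
def dump_vocab(vocab):
    """Serialize a {char: id} dict, one 'char,id' pair per line."""
    return "".join(f"{ch},{idx}\n" for ch, idx in vocab.items())

def load_vocab(text):
    """Rebuild the {char: id} dict, skipping unrepresentable entries."""
    vocab = {}
    for line in text.splitlines():
        ch, _, idx = line.partition(",")
        if not ch:
            continue  # e.g. the newline char serializes to an empty field
        vocab[ch[0]] = int(idx)
    return vocab
```

Note that, as in the Julia version, a few characters (newline, comma) cannot survive this text format; that is the "conversion inconvenience" the comment in step one refers to.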
For more help, you are welcome to open an issue, or directly contact okirnap@ku.edu.tr.