Dependency parser implementation used by the KParse team in the CoNLL 2018 shared task. The model we implemented is described in our paper, Tree-stack LSTM in Transition Based Dependency Parsing.
We use text files tokenized by UDPipe; please make sure that you have installed it from its official repository. All of this code works with Julia 0.6.2; current Julia versions are not supported yet.
Clone the repository to install the parser and dependencies:
git clone https://github.com/kirnap/ku-dependency-parser2.git && cd ku-dependency-parser2
We used our pre-trained language model from the CoNLL17 shared task; the code for it is given under the LM section of our CoNLL17 repository.
Since this is a research repository, the code structure is a bit messy; let's walk through it. As we explained in the paper, we use morphological features only for some languages. The following command prints the dictionary in which a true value indicates that morphological features are used for that language:
cat use_feats.jl
For example, if we want to train en_lines, here are the steps:

cat use_feats.jl | grep en_lines

This prints true, so we train with the following command:
julia train_feats3.jl --lmfile your/path/to/english_chmodel.jld --datafiles /your-path-to/ud-treebanks-v2.2/UD_English-LinES/en_lines-ud-train.conllu /your/path/to/ud-treebanks-v2.2/UD_English-LinES/en_lines-ud-dev.conllu --bestfile your_model_file.jld
Suppose instead we want to train hu_szeged, which does not use morphological features; then we need the following command:
julia train_nofeats.jl --lmfile your/path/to/hu_szeged.jld --datafiles /your-path-to/hu_szeged.train.conllu /your-path-to/hu_szeged.dev.conllu --bestfile your_model_file.jld
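The script-selection logic above can be sketched as follows. This is an illustration only (in Python, not repository code): the USE_FEATS dictionary here is an assumed excerpt built from the two examples above; the real per-language values come from use_feats.jl.

```python
# Illustrative sketch, not repository code: choose the training script
# from a feature-usage table like the one printed by use_feats.jl.
# USE_FEATS is a hypothetical excerpt; real values live in use_feats.jl.
USE_FEATS = {"en_lines": True, "hu_szeged": False}

def training_script(lang_code):
    """Return the training script to run for a given treebank code."""
    return "train_feats3.jl" if USE_FEATS[lang_code] else "train_nofeats.jl"
```

For example, `training_script("en_lines")` returns `"train_feats3.jl"`, matching the command shown above.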
Let's dive into testing. Suppose we want to test the performance of the en_lines model that we trained in the previous section:
julia train_feats3.jl --datafiles your-path-to/ud-treebanks-v2.2/UD_English-LinES/en_lines-ud-dev.conllu --loadfile your-path-to/en_lines.jld --epochs 0 --output your_testfile.conllu
Similarly, if you want to test a model trained without morphological features (e.g. hu_szeged):
julia train_nofeats.jl --datafiles your-path-to/ud-treebanks-v2.2/UD_Hungarian/hu_szeged.conllu --loadfile your-path-to/hu_szeged.jld --epochs 0 --output your_testfile.conllu
Please note that these commands create .conllu-formatted files with the predicted 'head' and 'deprel' columns filled in.
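Once you have a predicted .conllu file, you can score it against the gold file. The repository does not ship an evaluator as far as this README goes, so here is a minimal illustrative sketch (in Python) of computing unlabeled and labeled attachment scores; column indices follow the CoNLL-U specification (HEAD and DEPREL are the 7th and 8th columns).

```python
# Illustrative sketch (not repository code): UAS/LAS from CoNLL-U text.
def read_arcs(conllu_text):
    """Return one (head, deprel) pair per word line of a CoNLL-U file."""
    arcs = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank sentence separators
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        arcs.append((cols[6], cols[7]))  # HEAD, DEPREL columns
    return arcs

def uas_las(gold_text, pred_text):
    """Unlabeled / labeled attachment scores over aligned word lines."""
    gold, pred = read_arcs(gold_text), read_arcs(pred_text)
    assert len(gold) == len(pred), "files must align token-for-token"
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las
```

For shared-task-official numbers, use the CoNLL 2018 evaluation script instead; this sketch only illustrates what the two scores measure.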
To understand the code structure, here is a brief explanation of some model files under the src/ directory:

- src/_model_feat3_1.jl: the most current version of our model, using morphological features as well.
- src/model_nofeat1.jl: the most current version of our model, not using morphological features.
- src/model_nofeat_dyn.jl: the model that does not use morphological features and is trained with the dynamic-oracle training explained in our paper.

To better understand the code, start from the src/header.jl file. Please note that you have to provide a .conllu-formatted file to our system.
You may download the parser models from here
You may download the language models from here
You may find converted versions of the language models here. (If you cannot find your model, please refer to the next section of this document.)
The conversion requires two steps:
- On Julia 0.6:
using JLD, Knet;include("src/header.jl")
language_model = "/kuacc/users/okirnap/ud-treebanks-v2.2/chmodel_converted/english_chmodel.jld"
d = load(language_model);
word_vocab2 = Dict{String, Int64}();
for (k,v) in d["word_vocab"]; word_vocab2[k]=v;end;
# we have a character conversion inconvenience :( ; to work around it,
# dump the char vocab to a .txt file and reload it from Julia 1
open("english_chars.txt", "w") do f; for (k,v) in d["char_vocab"]; k1=string(k); write(f, "$k1,$v\n");end;end;
new_d2 = Dict{String, Any}();for (k,v) in d; (k =="word_vocab") ? new_d2[k]=word_vocab2 : new_d2[k] =v;end;
using JLD2
JLD2.@save "english_chmodel.jld2" new_d2
- On Julia 1.0 (please make sure that you are on the julia1 branch):
using JLD2,Knet;include("src/header.jl")
JLD2.@load "english_chmodel.jld2" new_d2; # now you have it!
char_vocab = Dict{Char, Int}() # use this char_vocab instead of the one coming from new_d2
for line in eachline("english_chars.txt"); s1, s2 = split(line, ","); isempty(s1) && continue; char_vocab[s1[1]] = parse(Int, s2);end;
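The char-vocab detour above serializes each (character, id) pair as a "char,id" line and skips entries whose character part comes back empty (e.g. the newline character itself). The same round trip can be sketched in Python for illustration (not repository code):

```python
# Illustrative sketch of the char-vocab round trip used above:
# write "char,id" lines, then re-read them, skipping entries whose
# character part is empty, exactly as the Julia loop does.
def dump_vocab(vocab):
    """Serialize a {char: id} dict, one 'char,id' pair per line."""
    return "".join(f"{ch},{idx}\n" for ch, idx in vocab.items())

def load_vocab(text):
    """Rebuild the {char: id} dict, skipping unrepresentable entries."""
    vocab = {}
    for line in text.splitlines():
        ch, _, idx = line.partition(",")
        if not ch:
            continue  # e.g. the newline char serializes to an empty field
        vocab[ch[0]] = int(idx)
    return vocab
```

Note that, as in the Julia version, a few characters (newline, comma) cannot survive this text format; that is the "conversion inconvenience" the comment in step one refers to.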
For more help, you are welcome to open an issue, or directly contact okirnap@ku.edu.tr.