Koç University Dependency Parser

Dependency parser implementation used by the KParse team in the CoNLL 2018 shared task. The model we implemented is explained in our paper titled Tree-stack LSTM in Transition Based Dependency Parsing.

Prerequisites

We use text files tokenized by UDPipe; please make sure that you have installed it from its official repository. All of this code works with julia 0.6.2; current julia versions are not supported yet.

Installing

Clone the repository to install the parser and dependencies:

git clone https://github.com/kirnap/ku-dependency-parser2.git && cd ku-dependency-parser2

Code Structure

Bi-LSTM Language Model

We used our pre-trained language model from the CoNLL17 shared task; the code for it is given in the LM section of our CoNLL17 repository.

Parser Model Files

Since this is a research repository, the code structure is a bit messy, so let's walk through it. As explained in the paper, we use morphological features only for some languages. The following command prints the dictionary in which a true value indicates that morphological features are used for that language:

cat use_feats.jl

Training

For example, if we want to train en_lines, here are the steps:

    1. cat use_feats.jl | grep en_lines prints true, so we need to train with the following command:
julia train_feats3.jl --lmfile your/path/to/english_chmodel.jld --datafiles /your-path-to/ud-treebanks-v2.2/UD_English-LinES/en_lines-ud-train.conllu  /your/path/to/ud-treebanks-v2.2/UD_English-LinES/en_lines-ud-dev.conllu --bestfile your_model_file.jld
    2. Suppose we want to train hu_szeged, which does not use morphological features; then we need the following command:
julia train_nofeats.jl --lmfile your/path/to/hu_szeged.jld --datafiles /your-path-to/hu_szeged.train.conllu  /your-path-to/hu_szeged.dev.conllu --bestfile your_model_file.jld
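The per-language dispatch implied by use_feats.jl can be sketched as follows. This is a minimal illustration in Python, not the actual code; USE_FEATS below is a hypothetical stand-in holding only the two entries mentioned in this README.

```python
# Hypothetical sketch of the dispatch implied by use_feats.jl. The real
# dictionary lives in use_feats.jl; these two entries come from this README.
USE_FEATS = {
    "en_lines": True,    # morphological features used
    "hu_szeged": False,  # morphological features not used
}

def training_script(treebank: str) -> str:
    """Pick the training script based on the morphological-feature flag."""
    return "train_feats3.jl" if USE_FEATS[treebank] else "train_nofeats.jl"

print(training_script("en_lines"))   # train_feats3.jl
print(training_script("hu_szeged"))  # train_nofeats.jl
```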
Testing

Let's dive into the testing case. Suppose we want to test the performance of the en_lines model that we trained in the previous section:

julia train_feats3.jl --datafiles your-path-to/ud-treebanks-v2.2/UD_English-LinES/en_lines-ud-dev.conllu --loadfile your-path-to/en_lines.jld --epochs 0 --output your_testfile.conllu

Similarly, if you want to test a model trained without morphological features (e.g. hu_szeged):

julia train_nofeats.jl --datafiles your-path-to/ud-treebanks-v2.2/UD_Hungarian/hu_szeged.conllu --loadfile your-path-to/hu_szeged.jld --epochs 0 --output your_testfile.conllu

Please note that these commands create .conllu-formatted files with predicted 'head' and 'deprel' columns.
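For reference, CoNLL-U is a 10-column tab-separated format, and the predicted HEAD and DEPREL values land in columns 7 and 8. A minimal Python sketch of reading those columns from an output line (the example token is illustrative):

```python
# Minimal sketch: extract the predicted HEAD and DEPREL columns from a
# CoNLL-U line. CoNLL-U has 10 tab-separated columns per token:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
line = "1\tThe\tthe\tDET\tDT\tDefinite=Def\t2\tdet\t_\t_"
cols = line.split("\t")
head, deprel = cols[6], cols[7]  # 0-based indices of columns 7 and 8
print(head, deprel)  # 2 det
```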

Code details

To help you understand the code structure, here is a brief explanation of some model files under the src/ directory:

  1. src/_model_feat3_1.jl : contains the most current version of our model that uses morphological features.
  2. src/model_nofeat1.jl : contains the most current version of our model that does not use morphological features.
  3. src/model_nofeat_dyn.jl : contains the model that does not use morphological features and is trained with the dynamic-oracle training explained in our paper.

To better understand the code, start from the src/header.jl file. Please note that you have to provide .conllu-formatted files to our system.

Pre-trained models

You may download the parser models from here

You may download the language models from here

You may find converted versions of the language models here (if you cannot find your model, please refer to the next section of this document)

Loading Language Models on julia 1.0.3

This requires two steps:

  1. On julia 0.6:

   using JLD, Knet; include("src/header.jl")
   language_model = "/kuacc/users/okirnap/ud-treebanks-v2.2/chmodel_converted/english_chmodel.jld"
   d = load(language_model)

   # Rebuild the word vocabulary with concrete types
   word_vocab2 = Dict{String, Int64}()
   for (k, v) in d["word_vocab"]
       word_vocab2[k] = v
   end

   # We have a character conversion inconvenience :( — to work around it,
   # store the char vocab in a .txt file and reload it from julia 1
   open("english_chars.txt", "w") do f
       for (k, v) in d["char_vocab"]
           k1 = string(k)
           write(f, "$k1,$v\n")
       end
   end

   new_d2 = Dict{String, Any}()
   for (k, v) in d
       new_d2[k] = (k == "word_vocab") ? word_vocab2 : v
   end

   using JLD2
   JLD2.@save "english_chmodel.jld2" new_d2
  2. On julia 1.0, please make sure that you are on branch julia1:

   using JLD2, Knet; include("src/header.jl")
   JLD2.@load "english_chmodel.jld2" new_d2  # now you have it!

   # Use this char_vocab instead of the one coming from new_d2
   char_vocab = Dict{Char, Int}()
   for line in eachline("english_chars.txt")
       s1, s2 = split(line, ",")
       isempty(s1) && continue
       char_vocab[s1[1]] = parse(Int, s2)
   end
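The character-vocab round-trip above can also be illustrated language-independently. Below is a Python sketch of the same write-then-reload logic (an illustration only, not part of the repository): each (char, index) pair is written as "char,index" on its own line, then reloaded by taking the first character before the comma.

```python
# Python illustration of the char-vocab round-trip: serialize each
# (char, index) pair as "char,index", then rebuild the dict from the lines.
char_vocab = {"a": 1, "b": 2, "ç": 3}

lines = [f"{ch},{idx}" for ch, idx in char_vocab.items()]

reloaded = {}
for line in lines:
    key, _, val = line.partition(",")
    if not key:  # mirrors the `isempty(s1) && continue` guard in the julia code
        continue
    reloaded[key[0]] = int(val)

print(reloaded == char_vocab)  # True
```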

Additional help

For more help, you are welcome to open an issue, or directly contact okirnap@ku.edu.tr.
