Koç University Dependency Parser

Dependency parser implementation used by the Koç University team in the CoNLL 2017 shared task. Our team ranked 7th according to the posted results.

Getting started

This document guides you through getting a working copy of the dependency parser on your machine. The system has two parts: language modelling and dependency parsing. The most up-to-date version of the source can be found in the official repo.


We use text files tokenized by UDPipe, so please make sure that you have installed it from its official repository.

Our entire software runs on Julia, so it must be installed on your system as well. Julia can be installed from its official download page. Once these requirements are met, follow the installation instructions below.


Clone the repository to install the parser and dependencies:

git clone && cd ku-dependency-parser
julia installer.jl

Code Structure

Code related to the dependency parser is under the parser folder, and code related to the language model is under lm.


To train the parser on a specific language, you first need a pre-trained language model, which is used to generate context and word embeddings for that language. Here are the steps to train a language model:

Bi-LSTM based Language Model

Switch to the language model directory:

cd lm

If you do not have a raw text version of your .conllu file, run the following to obtain tokenized raw text:

udpipe --output=horizontal none --outfile texts/{}.txt *.conllu

Create a vocabulary file from the text file tokenized by UDPipe (provided by the CoNLL 2017 task organizers). Note that the output file contains word-frequency information for the supplied text file:

julia wordcount.jl --textfile 'input text file' --output 'vocabulary-file'
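To make the expected shape of the vocabulary file concrete, the sketch below counts words with standard shell tools as a stand-in for wordcount.jl. It assumes one word per line with the frequency in the first column and the most frequent words first, which is what the awk and head steps that follow imply; the actual output of wordcount.jl may differ in detail. The file names are hypothetical.

```shell
# Toy stand-in for wordcount.jl, NOT the script itself.
# Assumed output format: "frequency word", most frequent first.
workdir=$(mktemp -d)
printf 'the cat sat\nthe dog sat\nthe end\n' > "$workdir/sample.txt"

# One token per line, drop empty lines, count duplicates,
# then sort by count (descending) and by word to break ties.
tr -s ' \t' '\n' < "$workdir/sample.txt" \
  | sed '/^$/d' \
  | sort \
  | uniq -c \
  | sort -k1,1nr -k2,2 \
  | awk '{print $1, $2}' > "$workdir/sample.vocab"

cat "$workdir/sample.vocab"
```

On this toy input the first line of the result is `3 the`, followed by the less frequent words.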

Language model training expects the vocabulary file not to contain any frequency information, so remove it with standard Linux tools:

awk '{$1="";print $0}' path/to/vocabulary-file > path/to/words-file

Create a file that contains the top N (e.g. 10000) words to be used during training:

head -n10000 path/to/words-file > path/to/vocab-file
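The two preparation steps above can be tried on a toy file before running them on real data. This sketch assumes the frequency is the first whitespace-separated column and that the file is already sorted by descending frequency, since head simply keeps the first N lines. It prints the second column instead of blanking the first, which avoids the leading space that `$1=""` leaves on every line; whether lm_train.jl is sensitive to that space is not stated here, so treat the variant as optional.

```shell
# Toy vocabulary file in the assumed "frequency word" format.
workdir=$(mktemp -d)
printf '3 the\n2 sat\n1 cat\n1 dog\n1 end\n' > "$workdir/vocabulary-file"

# Drop the frequency column, keeping only the word (no leading space).
awk '{print $2}' "$workdir/vocabulary-file" > "$workdir/words-file"

# Keep the N most frequent words; N=3 here, 10000 in the text above.
N=3
head -n "$N" "$workdir/words-file" > "$workdir/vocab-file"

cat "$workdir/vocab-file"
```

If the vocabulary file is not sorted by frequency, sort it first (for example with `sort -k1,1nr`) so that head keeps the most frequent words.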

To train the language model, you need to run the following command:

julia lm_train.jl --trainfile 'udpipe-output.txt' --vocabfile 'path/to/vocab-file' --wordsfile 'path/to/words-file' --savefile model.jld

Warning: Please be aware that language model training takes approximately 24 hours on a Tesla K80 GPU.
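Because a run takes this long, it can be worth checking the input files before launching it. The helper below is a hypothetical sketch and not part of this repository: `check_inputs` only verifies that each file exists and is non-empty, and the toy files stand in for the real training, vocab and words files.

```shell
# Hypothetical pre-flight check: fail if any input is missing or empty.
check_inputs() {
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Toy files standing in for the real inputs.
workdir=$(mktemp -d)
printf 'a tokenized sentence .\n' > "$workdir/udpipe-output.txt"
printf 'a\n' > "$workdir/vocab-file"
: > "$workdir/words-file"   # left empty on purpose to trigger the check

if check_inputs "$workdir/udpipe-output.txt" "$workdir/vocab-file" "$workdir/words-file"; then
  echo "inputs look fine"
else
  echo "fix the inputs before training"
fi
```

In practice the same function could be called with the real paths, and lm_train.jl started only when it succeeds.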

Dependency parser

Go to the parent directory and run the following command to train the parser:

julia main.jl --load '/path/to/pre-trained language model' --datafiles 'path/to/train_file.conllu' 'path/to/dev_file.conllu' --otrain 'number of epochs'
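When training parsers for several languages, it can help to generate the commands first and inspect them before running anything. The sketch below is a dry run that only prints commands. The `parser_cmd` helper, the language codes and all paths are hypothetical; only the `--load`, `--datafiles` and `--otrain` flags come from the command above.

```shell
# Hypothetical helper: print (do not run) the parser training command.
parser_cmd() {
  model=$1; train=$2; dev=$3; epochs=$4
  printf "julia main.jl --load '%s' --datafiles '%s' '%s' --otrain %s\n" \
    "$model" "$train" "$dev" "$epochs"
}

# Illustrative language codes and paths; adjust to your own layout.
for lang in en tr; do
  parser_cmd "lm/$lang.jld" "data/$lang-train.conllu" "data/$lang-dev.conllu" 10
done
```

Once the printed commands look right, they can be pasted into a shell or written to a script and executed.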

For more detailed options, run:

julia main.jl --help

Additional help

For more help, you are welcome to open an issue, or directly contact

