Skip to content
Turkish Morphological Analysis library
Java
Branch: master
Clone or download
Latest commit 46807bf Jul 14, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
src/main no message Jun 30, 2019
.gitattributes Initial commit May 31, 2018
.gitignore no message Jun 6, 2018
README.md Update README.md Jul 14, 2019
pom.xml Added generation of all parses less than some length. Apr 11, 2019

README.md

MorphologicalAnalysis

Morphology

Turkish is one of the morphologically rich languages due to its agglutinative nature. Morphological Analysis repository provides a two-level morphological analyzer for Turkish which consists of finite state transducer, rule engine for suffixation, and lexicon.

For Developers

Requirements

Java

To check if you have a compatible version of Java installed, use the following command:

java -version

If you don't have a compatible version, you can download either Oracle JDK or OpenJDK

Maven

To check if you have Maven installed, use the following command:

mvn --version

To install Maven, you can follow the instructions here.

Git

Install the latest version of Git.

Download Code

In order to work on code, create a fork from GitHub page. Use Git for cloning the code to your local or below line for Ubuntu:

git clone <your-fork-git-link>

A directory called MorphologicalAnalysis will be created. Or you can use below link for exploring the code:

git clone https://github.com/olcaytaner/MorphologicalAnalysis.git

Open project with IntelliJ IDEA

Steps for opening the cloned project:

  • Start IDE
  • Select File | Open from main menu
  • Choose MorphologicalAnalysis/pom.xml file
  • Select open as project option
  • Couple of seconds, dependencies with Maven will be downloaded.

Compile

From IDE

After being done with the downloading and Maven indexing, select Build Project option from Build menu. After compilation process, user can run Morphological Analysis.

From Console

Go to MorphologicalAnalysis directory and compile with

 mvn compile 

Generating jar files

From IDE

Use package of 'Lifecycle' from maven window on the right and from MorphologicalAnalysis root module.

From Console

Use below line to generate jar file:

 mvn install

Morphological Analysis

Maven Usage

<groupId>NlpToolkit</groupId>
<artifactId>MorphologicalAnalysis</artifactId>
<version>1.0.1</version>
<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>
<repositories>
    <repository>
        <id>NlpToolkit</id>
        <url>http://haydut.isikun.edu.tr:8081/artifactory/NlpToolkit</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>NlpToolkit</groupId>
        <artifactId>Corpus</artifactId>
        <version>1.0.0</version>
    </dependency>
    <dependency>
        <groupId>NlpToolkit</groupId>
        <artifactId>DataStructure</artifactId>
        <version>1.0.0</version>
    </dependency>
    <dependency>
        <groupId>NlpToolkit</groupId>
        <artifactId>Dictionary</artifactId>
        <version>1.0.1</version>
    </dependency>
</dependencies>

Creating FsmMorphologicalAnalyzer

FsmMorphologicalAnalyzer provides Turkish morphological analysis. This class can be created as follows:

FsmMorphologicalAnalyzer fsm = new FsmMorphologicalAnalyzer();

This generates a new TxtDictionary type dictionary from turkish_dictionary.txt with fixed cache size 100000 and by using turkish_finite_state_machine.xml.

Creating a morphological analyzer with different cache size, dictionary or finite state machine is also possible.

  • With different cache size,

      FsmMorphologicalAnalyzer fsm = new FsmMorphologicalAnalyzer(50000)   
    
  • Using a different dictionary,

      FsmMorphologicalAnalyzer fsm = new FsmMorphologicalAnalyzer("my_turkish_dictionary.txt")    
    
  • Specifying both finite state machine and dictionary,

      FsmMorphologicalAnalyzer fsm = new FsmMorphologicalAnalyzer("fsm.xml", "my_turkish_dictionary.txt")       
    
  • Giving finite state machine and cache size with creating TxtDictionary object,

      TxtDictionary dictionary = new TxtDictionary("my_turkish_dictionary.txt", new TurkishWordComparator())
      FsmMorphologicalAnalyzer fsm = new FsmMorphologicalAnalyzer("fsm.xml", dictionary, 50000) 
    
  • With different finite state machine and creating TxtDictionary object,

      TxtDictionary dictionary = new TxtDictionary("my_turkish_dictionary.txt", new TurkishWordComparator())
      FsmMorphologicalAnalyzer fsm = new FsmMorphologicalAnalyzer("fsm.xml", dictionary)
    

Word level morphological analysis

For morphological analysis, morphologicalAnalysis(String word) method of FsmMorphologicalAnalyzer is used. This returns FsmParseList object.

FsmMorphologicalAnalyzer fsm = new FsmMorphologicalAnalyzer();
String word = "yarına";
FsmParseList fsmParseList = fsm.morphologicalAnalysis(word);
for (int i = 0; i < fsmParseList.size(); i++){
  System.out.println(fsmParseList.getFsmParse(i).transitionList();
} 

Output

yar+NOUN+A3SG+P2SG+DAT
yar+NOUN+A3SG+P3SG+DAT
yarı+NOUN+A3SG+P2SG+DAT
yarın+NOUN+A3SG+PNON+DAT

From FsmParseList, a single FsmParse can be obtained as follows:

FsmParse parse = fsmParseList.getFsmParse(0);
System.out.println(parse.transitionList();   

Output

yar+NOUN+A3SG+P2SG+DAT

Sentence level morphological analysis

morphologicalAnalysis(Sentence sentence, Boolean debug) method of FsmMorphologicalAnalyzer is used. This returns FsmParseList[] object.

FsmMorphologicalAnalyzer fsm = new FsmMorphologicalAnalyzer();
Sentence sentence = new Sentence("Yarın doktora gidecekler");
FsmParseList[] parseLists = fsm.morphologicalAnalysis(sentence, false);
for(int i = 0; i < parseLists.length; i++){
    for(int j = 0; j < parseLists[i].size(); j++){
        FsmParse parse = parseLists[i].getFsmParse(j);
        System.out.println(parse.transitionList());
    }
    System.out.println("-----------------");
}

Output

-----------------
yar+NOUN+A3SG+P2SG+NOM
yar+NOUN+A3SG+PNON+GEN
yar+VERB+POS+IMP+A2PL
yarı+NOUN+A3SG+P2SG+NOM
yarın+NOUN+A3SG+PNON+NOM
-----------------
doktor+NOUN+A3SG+PNON+DAT
doktora+NOUN+A3SG+PNON+NOM
-----------------
git+VERB+POS+FUT+A3PL
git+VERB+POS^DB+NOUN+FUTPART+A3PL+PNON+NOM
You can’t perform that action at this time.