A WordNet is a graph data structure where the nodes are word senses with their associated lemmas (and collocations, in the case of multiword expressions (MWEs)) and the edges are semantic relations between pairs of senses. Usually, the multiple senses corresponding to a single lemma are enumerated and are referenced by number. For example, the triple (w5, s2), r1, (w7, s3)
represents an edge in the WordNet graph and corresponds to a semantic relation r1 between the second sense of the lemma w5 and the third sense of the lemma w7. The direction of the relation is usually implicit in the ordering of the elements of the triple. Synonymy is symmetric, so the ordering does not matter. For hypernymy, by convention, the first sense is a hyponym of the second.
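The triple representation described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the data structure used by this library: the class and method names (`SenseGraphSketch`, `Sense`, `Edge`, `addDirected`, `addSymmetric`, `targets`) and the relation labels are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: NOT the WordNet library's API.
public class SenseGraphSketch {

    /** Node: one enumerated sense of a lemma, e.g. the second sense of w5. */
    static final class Sense {
        final String lemma;
        final int number;

        Sense(String lemma, int number) {
            this.lemma = lemma;
            this.number = number;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Sense)) return false;
            Sense other = (Sense) o;
            return lemma.equals(other.lemma) && number == other.number;
        }

        @Override
        public int hashCode() {
            return Objects.hash(lemma, number);
        }

        @Override
        public String toString() {
            return lemma + "#" + number;
        }
    }

    /** Edge: the triple (source sense, relation, target sense). */
    static final class Edge {
        final Sense source;
        final String relation;
        final Sense target;

        Edge(Sense source, String relation, Sense target) {
            this.source = source;
            this.relation = relation;
            this.target = target;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    /** Adds a directed edge; the direction follows the order of the triple. */
    void addDirected(Sense source, String relation, Sense target) {
        edges.add(new Edge(source, relation, target));
    }

    /** Adds a symmetric relation (such as synonymy) as two directed edges. */
    void addSymmetric(Sense a, String relation, Sense b) {
        addDirected(a, relation, b);
        addDirected(b, relation, a);
    }

    /** Returns the senses reached from source through the given relation. */
    List<Sense> targets(Sense source, String relation) {
        List<Sense> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.source.equals(source) && e.relation.equals(relation)) {
                result.add(e.target);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SenseGraphSketch graph = new SenseGraphSketch();
        Sense w5s2 = new Sense("w5", 2);
        Sense w7s3 = new Sense("w7", 3);
        // By the convention in the text, w5#2 is a hyponym of w7#3.
        graph.addDirected(w5s2, "hypernymy", w7s3);
        System.out.println(graph.targets(w5s2, "hypernymy"));
        System.out.println(graph.targets(w7s3, "hypernymy"));
    }
}
```

Storing a symmetric relation as two directed edges keeps the lookup code identical for both kinds of relation, at the cost of one extra edge per pair.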
The main lexical source for KeNet is the Contemporary Dictionary of Turkish (CDT) (Güncel Türkçe Sözlük), published online and in print by the Turkish Language Institute (TLI) (Türk Dil Kurumu), a government organization. Among other literary and academic works, the TLI publishes specialized and comprehensive dictionaries, which other dictionaries often take as an authoritative reference. The online version of the CDT contains 65,944 lemmas. Although the TLI publishes a separate dictionary of idioms and proverbs, the CDT still contains some MWE entries that have idiomatic senses.
The structure of a sample synset is as follows:
<SYNSET>
  <ID>TUR10-0038510</ID>
  <LITERAL>anne<SENSE>2</SENSE></LITERAL>
  <POS>n</POS>
  <DEF>...</DEF>
  <EXAMPLE>...</EXAMPLE>
</SYNSET>
Each entry in the dictionary is enclosed by `<SYNSET>` and `</SYNSET>` tags. Synset members are represented as literals (`<LITERAL>`) together with their sense numbers (`<SENSE>`). `<ID>` holds the unique identifier given to the synset. The `<POS>` and `<DEF>` tags denote the part of speech and the definition, respectively. The `<EXAMPLE>` tag gives a sample sentence for the synset.
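To make the layout concrete, the following sketch reads one such entry with the XML parser that ships with the JDK (`javax.xml.parsers`). It is a minimal illustration, not how this library loads its data: the class `SynsetXmlSketch` and its helper methods are hypothetical, and the `...` placeholders of the sample are kept as they appear above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: NOT the WordNet library's own loader.
public class SynsetXmlSketch {

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return builder.parse(new ByteArrayInputStream(bytes)).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse synset XML", e);
        }
    }

    /** Text of the first descendant element with the given tag, or null. */
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Direct text of an element, skipping nested tags such as SENSE. */
    static String ownText(Element element) {
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString().trim();
    }

    /** Summarises a synset as "ID POS lemma#sense" for its first literal. */
    static String describe(String xml) {
        Element synset = parseRoot(xml);
        Element literal = (Element) synset.getElementsByTagName("LITERAL").item(0);
        return firstText(synset, "ID") + " " + firstText(synset, "POS") + " "
                + ownText(literal) + "#" + firstText(literal, "SENSE");
    }

    public static void main(String[] args) {
        String xml = "<SYNSET><ID>TUR10-0038510</ID>"
                + "<LITERAL>anne<SENSE>2</SENSE> </LITERAL>"
                + "<POS>n</POS><DEF>...</DEF><EXAMPLE>...</EXAMPLE></SYNSET>";
        System.out.println(describe(xml)); // prints "TUR10-0038510 n anne#2"
    }
}
```

The literal needs separate handling because its sense number is nested inside it: reading the whole text content of `<LITERAL>` would glue the lemma and the number together.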
To check if you have a compatible version of Java installed, use the following command:
java -version
To check if you have Maven installed, use the following command:
mvn --version
To install Maven, you can follow the instructions here.
Install the latest version of Git.
To work on the code, create a fork from the GitHub page. Then clone your fork to your local machine with Git; on Ubuntu, use the line below:
git clone <your-fork-git-link>
A directory called WordNet will be created. Alternatively, to explore the code without forking, clone the original repository:
git clone https://github.com/olcaytaner/WordNet.git
Open project with IntelliJ IDEA
Steps for opening the cloned project:
- Start the IDE
- Select File | Open from the main menu
- Select the Open as Project option
- After a couple of seconds, Maven will download the dependencies.
Once downloading and Maven indexing are complete, select the Build Project option from the Build menu. After compilation, you can run WordNet.
Alternatively, from the command line, go to the WordNet directory and compile with:
mvn compile
Generating jar files
From the IDE, run package under 'Lifecycle' in the Maven window on the right, from the WordNet root module.
Alternatively, use the line below to generate the jar file from the command line:
mvn install
The Maven coordinates, compiler settings, repository, and dependencies of the project are as follows:
<groupId>NlpToolkit</groupId>
<artifactId>WordNet</artifactId>
<version>1.0.11</version>
<properties>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>
<repositories>
  <repository>
    <id>NlpToolkit</id>
    <url>http://haydut.isikun.edu.tr:8081/artifactory/NlpToolkit</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>NlpToolkit</groupId>
    <artifactId>DataStructure</artifactId>
    <version>1.0.0</version>
  </dependency>
  <dependency>
    <groupId>NlpToolkit</groupId>
    <artifactId>Dictionary</artifactId>
    <version>1.0.2</version>
  </dependency>
  <dependency>
    <groupId>NlpToolkit</groupId>
    <artifactId>MorphologicalAnalysis</artifactId>
    <version>1.0.2</version>
  </dependency>
</dependencies>