NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser
Latest commit: f22c26f, Jan 30, 2017, by @urieli ("Up the version to 3.0.5")


Talismane is a natural language processing framework with a sentence detector, tokeniser, pos-tagger and dependency syntax parser. Currently available language packs include French and English.

Sample input:

Les amoureux qui se bécotent sur les bancs publics ont des petites gueules bien sympathiques.

Sample output: a syntax tree, shown below in CoNLL-X format, also available as a Java object for manipulation in code.

1   Les           les          DET     DET     n=p|           2   det      2   det
2   amoureux      amoureux     NC      NC      g=m|           10  suj      10  suj
3   qui           qui          PROREL  PROREL  n=s|           5   suj      5   suj
4   se            se           CLR     CLR     n=p|p=3|       5   aff      5   aff
5   bécotent      bécoter      V       V       n=p|t=PS|p=3|  2   mod_rel  2   mod_rel
6   sur           sur          P       P                      5   mod      5   mod
7   les           les          DET     DET     n=p|           8   det      8   det
8   bancs         banc         NC      NC      n=p|g=m|       6   prep     6   prep
9   publics       public       ADJ     ADJ     n=p|g=m|       8   mod      8   mod
10  ont           avoir        V       V       n=p|t=P|p=3|   0   root     0   root
11  des           des          DET     DET     n=p|           13  det      13  det
12  petites       petit        ADJ     ADJ     n=p|g=f|       13  mod      13  mod
13  gueules       gueule       NC      NC      n=p|           10  obj      10  obj
14  bien          bien         ADV     ADV                    15  mod      15  mod
15  sympathiques  sympathique  ADJ     ADJ     n=p|           13  mod      13  mod
16  .             .            PONCT   PONCT                  15  ponct    15  ponct
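
Reading this output back into a program: the ten columns above follow the CoNLL-X layout (token id, form, lemma, coarse and fine pos-tag, morphological features, head id, dependency label, then projective head and label). The sketch below is one possible way to load such lines in Java. It is not part of the Talismane API: the `ConllxReader` class, its `Token` type and the `dependentsOf` helper are hypothetical names introduced for illustration, and the code assumes the output file is tab-separated, as CoNLL-X prescribes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative reader for CoNLL-X lines; a hypothetical helper, not a Talismane class. */
final class ConllxReader {

    /** One token row, keeping the columns needed to walk the dependency tree. */
    static final class Token {
        final int id;
        final String form;
        final String lemma;
        final String posTag;
        final String features;
        final int head;
        final String depRel;

        Token(int id, String form, String lemma, String posTag,
              String features, int head, String depRel) {
            this.id = id;
            this.form = form;
            this.lemma = lemma;
            this.posTag = posTag;
            this.features = features;
            this.head = head;
            this.depRel = depRel;
        }
    }

    /** Parses one line, assuming tab-separated columns (an assumption about the file). */
    static Token parseLine(String line) {
        // Limit -1 so that trailing empty columns are not dropped by split.
        String[] cols = line.split("\t", -1);
        if (cols.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 columns: " + line);
        }
        return new Token(Integer.parseInt(cols[0]), cols[1], cols[2], cols[4],
                cols[5], Integer.parseInt(cols[6]), cols[7]);
    }

    /** Returns the forms of all tokens whose head is the given token id. */
    static List<String> dependentsOf(List<Token> sentence, int headId) {
        List<String> forms = new ArrayList<>();
        for (Token token : sentence) {
            if (token.head == headId) {
                forms.add(token.form);
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        // A few rows from the sample output above, with tabs written as \t.
        List<Token> sentence = new ArrayList<>();
        sentence.add(parseLine("10\tont\tavoir\tV\tV\tn=p|t=P|p=3|\t0\troot\t0\troot"));
        sentence.add(parseLine("2\tamoureux\tamoureux\tNC\tNC\tg=m|\t10\tsuj\t10\tsuj"));
        sentence.add(parseLine("13\tgueules\tgueule\tNC\tNC\tn=p|\t10\tobj\t10\tobj"));
        sentence.add(parseLine("6\tsur\tsur\tP\tP\t\t5\tmod\t5\tmod"));

        // Direct dependents of the root verb "ont" (token 10).
        System.out.println(dependentsOf(sentence, 10)); // prints [amoureux, gueules]
    }
}
```

A head id of 0 marks the root of the sentence, so the same helper called with 0 returns the root token's form.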

Downloads: The latest release and language packs can be downloaded from the releases page.

Wiki: Simple instructions for use can be found on the Talismane wiki.

Command-line usage: follow the setup instructions, and then run a command similar to the following:

java -Xmx1G -Dconfig.file=talismane-fr-X.X.X.conf -jar talismane-core-X.X.X.jar encoding=UTF8 inFile=data/frTest.txt outFile=data/frTest.tal
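
If the command needs to be launched from another Java program rather than a shell, it can be assembled with the standard `ProcessBuilder` class. The sketch below only reuses the arguments shown in the command above; the `TalismaneCommand` class and its `build` and `run` methods are hypothetical names, and the `X.X.X` placeholders are kept as-is and must be replaced with the versions you actually downloaded.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative launcher for the command-line call; a hypothetical helper class. */
final class TalismaneCommand {

    /** Builds the argument list, mirroring the command line shown above. */
    static List<String> build(String configFile, String jarFile,
                              String inFile, String outFile) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-Xmx1G");
        command.add("-Dconfig.file=" + configFile);
        command.add("-jar");
        command.add(jarFile);
        command.add("encoding=UTF8");
        command.add("inFile=" + inFile);
        command.add("outFile=" + outFile);
        return command;
    }

    /** Starts the process, forwards its console output, and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        List<String> command = build(
                "talismane-fr-X.X.X.conf",   // placeholder version, as in the README
                "talismane-core-X.X.X.jar",  // placeholder version, as in the README
                "data/frTest.txt",
                "data/frTest.tal");
        System.out.println(String.join(" ", command));
        // Call run(command) once the jar, the configuration file and the input
        // file are present in the working directory.
    }
}
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.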

Calling from Java: For syntax analysis within Java code via the API, see this Java code example.

JavaDoc API: You may also consult the full JavaDoc API online.

User's manual: A full user's manual can be found on the GitHub Talismane project page.

Additional information on the project can be found on the CLLE-ERSS laboratory Talismane project home page.

Language pack usage

  • The French language pack can be used for research purposes provided that you have a license for the French Treebank. The included model is not optimised: it is a Maximum Entropy model (which requires only about 1 GB of RAM) rather than a Linear SVM model (which requires about 24 GB of RAM). If you would like the more optimised Linear SVM model, please contact Assaf Urieli.

  • The English language pack can be used for research purposes provided that you have a license for the Penn Treebank. WARNING: the English model is only an initial version, with no attempts at optimisation.