NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser



Talismane is a natural language processing framework with a sentence detector, tokeniser, pos-tagger and dependency syntax parser. Language packs are currently available for French and English.

Sample input:

Les amoureux qui se bécotent sur les bancs publics ont des petites gueules bien sympathiques.

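Sample output: a syntax tree, shown below in CoNLL-X format, also available as a Java object for manipulation in code. The columns are tab-separated: index, token, lemma, coarse pos-tag, pos-tag, morphological features, governor index and dependency label, followed by the projective governor and label (identical to the previous two columns in this example).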

1	Les	les	DET	DET	n=p|	2	det	2	det
2	amoureux	amoureux	NC	NC	g=m|	10	suj	10	suj
3	qui	qui	PROREL	PROREL	n=s|	5	suj	5	suj
4	se	se	CLR	CLR	n=p|p=3|	5	aff	5	aff
5	bécotent	bécoter	V	V	n=p|t=PS|p=3|	2	mod_rel	2	mod_rel
6	sur	sur	P	P		5	mod	5	mod
7	les	les	DET	DET	n=p|	8	det	8	det
8	bancs	banc	NC	NC	n=p|g=m|	6	prep	6	prep
9	publics	public	ADJ	ADJ	n=p|g=m|	8	mod	8	mod
10	ont	avoir	V	V	n=p|t=P|p=3|	0	root	0	root
11	des	des	DET	DET	n=p|	13	det	13	det
12	petites	petit	ADJ	ADJ	n=p|g=f|	13	mod	13	mod
13	gueules	gueule	NC	NC	n=p|	10	obj	10	obj
14	bien	bien	ADV	ADV		15	mod	15	mod
15	sympathiques	sympathique	ADJ	ADJ	n=p|	13	mod	13	mod
16	.	.	PONCT	PONCT		15	ponct	15	ponct
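To work with output like the above outside Talismane, the rows can be read back with a few lines of plain Java. The following is a minimal sketch, not part of the Talismane API: the class name `ConllSketch`, its nested `Token` type and both helper methods are hypothetical, and it assumes each row has at least eight tab-separated columns, as in the sample. It keeps empty columns (such as the empty features field on row 6) so that column positions stay aligned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a Talismane class.
final class ConllSketch {
    /** One CoNLL-X row; only the columns used in this sketch are kept. */
    static final class Token {
        final int id;        // column 1: 1-based position in the sentence
        final String form;   // column 2: surface form
        final String lemma;  // column 3: lemma
        final String posTag; // column 4: coarse pos-tag
        final int head;      // column 7: index of the governor, 0 for the root
        final String label;  // column 8: dependency label

        Token(int id, String form, String lemma, String posTag, int head, String label) {
            this.id = id;
            this.form = form;
            this.lemma = lemma;
            this.posTag = posTag;
            this.head = head;
            this.label = label;
        }
    }

    /** Parses the rows of one sentence, skipping blank lines. */
    static List<Token> parse(List<String> lines) {
        List<Token> tokens = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            // A limit of -1 keeps empty columns, e.g. an empty features field.
            String[] c = line.split("\t", -1);
            if (c.length < 8) {
                throw new IllegalArgumentException("Expected at least 8 columns: " + line);
            }
            tokens.add(new Token(Integer.parseInt(c[0]), c[1], c[2], c[3],
                    Integer.parseInt(c[6]), c[7]));
        }
        return tokens;
    }

    /** Returns the surface forms of all tokens governed by the given index. */
    static List<String> dependentsOf(List<Token> tokens, int headId) {
        List<String> forms = new ArrayList<>();
        for (Token t : tokens) {
            if (t.head == headId) {
                forms.add(t.form);
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        // Four rows copied from the sample output above.
        List<Token> tokens = parse(Arrays.asList(
                "2\tamoureux\tamoureux\tNC\tNC\tg=m|\t10\tsuj\t10\tsuj",
                "6\tsur\tsur\tP\tP\t\t5\tmod\t5\tmod",
                "10\tont\tavoir\tV\tV\tn=p|t=P|p=3|\t0\troot\t0\troot",
                "13\tgueules\tgueule\tNC\tNC\tn=p|\t10\tobj\t10\tobj"));
        for (Token t : tokens) {
            if (t.head == 0) {
                System.out.println("root: " + t.form + " (" + t.lemma + ")");
                System.out.println("dependents: " + dependentsOf(tokens, t.id));
            }
        }
    }
}
```

For anything beyond simple post-processing, the Java object returned by the API is likely the better starting point, since it avoids re-parsing text.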

Downloads: The latest release and language packs can be downloaded from the releases page.

Wiki: Simple instructions for use can be found on the Talismane wiki.

Command-line usage: follow the setup instructions, and then run a command similar to the following:

java -Xmx1G -Dconfig.file=talismane-fr-X.X.X.conf -jar talismane-core-X.X.X.jar --analyse --sessionId=fr --encoding=UTF8 --inFile=data/frTest.txt --outFile=data/frTest.tal
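If you need to launch this command from a Java program (for example in a batch script) rather than through the API, one option is to assemble the same arguments and hand them to `ProcessBuilder`. The sketch below is an illustration under assumptions: the class `TalismaneLauncher`, its methods and the `--run` switch are hypothetical names invented here, and only the flags shown in the command above are used. The `X.X.X` placeholders are left as they are and must be replaced with the version you downloaded.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper for illustration only; not shipped with Talismane.
final class TalismaneLauncher {
    /** Builds the argument list for the command shown above; nothing is executed here. */
    static List<String> buildCommand(String confFile, String jarFile, String sessionId,
                                     String inFile, String outFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx1G");
        cmd.add("-Dconfig.file=" + confFile);
        cmd.add("-jar");
        cmd.add(jarFile);
        cmd.add("--analyse");
        cmd.add("--sessionId=" + sessionId);
        cmd.add("--encoding=UTF8");
        cmd.add("--inFile=" + inFile);
        cmd.add("--outFile=" + outFile);
        return cmd;
    }

    /** Runs the command, forwarding its console output, and returns the exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand(
                "talismane-fr-X.X.X.conf",   // replace X.X.X with your version
                "talismane-core-X.X.X.jar",  // replace X.X.X with your version
                "fr",
                "data/frTest.txt",
                "data/frTest.tal");
        // By default only print the command; pass --run (a switch of this sketch) to execute it.
        System.out.println(String.join(" ", cmd));
        if (args.length > 0 && args[0].equals("--run")) {
            System.exit(run(cmd));
        }
    }
}
```

Passing the arguments as a list, rather than as one concatenated string, avoids quoting problems when file paths contain spaces.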

Calling from Java: For syntax analysis within Java code via the API, see this Java code example.

JavaDoc API: You may also consult the full JavaDoc API online.

User's manual: An out-of-date user's manual can be found on the GitHub Talismane project page. For up-to-date documentation, you're far better off consulting the wiki or the JavaDoc API.

Additional information on the project can be found on the CLLE-ERSS laboratory Talismane project home page.

Language pack usage

  • The French language pack can be used for research purposes provided that you have a license for the French Treebank. The model included is not optimised, as it uses a Maximum Entropy model (which requires only about 1 GB of RAM) rather than a Linear SVM model (which requires about 24 GB of RAM). If you would like the more optimised Linear SVM model, please contact Assaf Urieli.

  • The English language pack can be used for research purposes provided that you have a license for the Penn Treebank. WARNING: the English model is only an initial version, with no attempt at optimisation.