NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser



Talismane is a natural language processing framework with a sentence detector, tokeniser, pos-tagger and dependency syntax parser. Currently available language packs include French (standard and Universal Dependencies) and English.

Sample input:

Les amoureux qui se bécotent sur les bancs publics ont des petites gueules bien sympathiques.

Sample output: a syntax tree, shown below in CoNLL-X format, also available as a Java object for manipulation in code.

1	Les	les	DET	DET	n=p|	2	det	2	det
2	amoureux	amoureux	NC	NC	g=m|	10	suj	10	suj
3	qui	qui	PROREL	PROREL	n=s|	5	suj	5	suj
4	se	se	CLR	CLR	n=p|p=3|	5	aff	5	aff
5	bécotent	bécoter	V	V	n=p|t=PS|p=3|	2	mod_rel	2	mod_rel
6	sur	sur	P	P		5	mod	5	mod
7	les	les	DET	DET	n=p|	8	det	8	det
8	bancs	banc	NC	NC	n=p|g=m|	6	prep	6	prep
9	publics	public	ADJ	ADJ	n=p|g=m|	8	mod	8	mod
10	ont	avoir	V	V	n=p|t=P|p=3|	0	root	0	root
11	des	des	DET	DET	n=p|	13	det	13	det
12	petites	petit	ADJ	ADJ	n=p|g=f|	13	mod	13	mod
13	gueules	gueule	NC	NC	n=p|	10	obj	10	obj
14	bien	bien	ADV	ADV		15	mod	15	mod
15	sympathiques	sympathique	ADJ	ADJ	n=p|	13	mod	13	mod
16	.	.	PONCT	PONCT		15	ponct	15	ponct
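If you only need to read the text output, without the Talismane API, the lines above can be parsed by hand. The following is a minimal sketch, assuming the columns are tab-separated and follow the standard CoNLL-X order (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). The class `ConllXReader` and its `Token` type are hypothetical names used for illustration; they are not part of Talismane.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Talismane: reads CoNLL-X token lines.
public class ConllXReader {

    /** One token line; only the first eight CoNLL-X columns are kept. */
    public static final class Token {
        public final int id;
        public final String form;
        public final String lemma;
        public final String cpos;
        public final String pos;
        public final String feats;
        public final int head;      // 0 means the token is the root
        public final String deprel;

        Token(int id, String form, String lemma, String cpos, String pos,
              String feats, int head, String deprel) {
            this.id = id;
            this.form = form;
            this.lemma = lemma;
            this.cpos = cpos;
            this.pos = pos;
            this.feats = feats;
            this.head = head;
            this.deprel = deprel;
        }
    }

    /** Parses a single tab-separated token line. */
    public static Token parseLine(String line) {
        // A limit of -1 keeps empty columns, such as an empty FEATS field.
        String[] cols = line.split("\t", -1);
        if (cols.length < 8) {
            throw new IllegalArgumentException(
                "Expected at least 8 tab-separated columns: " + line);
        }
        return new Token(Integer.parseInt(cols[0]), cols[1], cols[2], cols[3],
                cols[4], cols[5], Integer.parseInt(cols[6]), cols[7]);
    }

    /** Parses the token lines of one sentence, skipping blank lines. */
    public static List<Token> parseSentence(List<String> lines) {
        List<Token> tokens = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                tokens.add(parseLine(line));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Three lines taken from the sample output above.
        List<String> lines = Arrays.asList(
            "1\tLes\tles\tDET\tDET\tn=p|\t2\tdet\t2\tdet",
            "2\tamoureux\tamoureux\tNC\tNC\tg=m|\t10\tsuj\t10\tsuj",
            "10\tont\tavoir\tV\tV\tn=p|t=P|p=3|\t0\troot\t0\troot");
        for (Token t : parseSentence(lines)) {
            String role = t.head == 0 ? "root" : t.deprel + " of token " + t.head;
            System.out.println(t.form + " -> " + role);
        }
    }
}
```

Each token's `head` is the ID of its governor in the same sentence, so the dependency tree can be rebuilt by indexing the tokens by `id`.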

Downloads: The latest release and language packs can be downloaded from the releases page.

Wiki: Simple instructions for use can be found on the Talismane wiki.

Command-line usage: follow the setup instructions, and then run a command similar to the following:

java -Xmx1G -Dconfig.file=talismane-fr-X.X.X.conf -jar talismane-core-X.X.X.jar --analyse --sessionId=fr --encoding=UTF8 --inFile=data/frTest.txt --outFile=data/frTest.tal
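If you want to launch the same command from a Java program (for example in a batch script), the argument list can be assembled as below. This is only a sketch: the class `TalismaneCommand` and its `build` method are hypothetical names, the options are exactly those of the command above, and the `X.X.X` version placeholders must still be replaced with the versions you downloaded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Talismane: assembles the command line shown above.
public class TalismaneCommand {

    /** Builds the argument list for an --analyse run with the French session. */
    public static List<String> build(String confFile, String jarFile,
                                     String inFile, String outFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx1G");                       // heap size from the example above
        cmd.add("-Dconfig.file=" + confFile);    // language pack configuration
        cmd.add("-jar");
        cmd.add(jarFile);
        cmd.add("--analyse");
        cmd.add("--sessionId=fr");
        cmd.add("--encoding=UTF8");
        cmd.add("--inFile=" + inFile);
        cmd.add("--outFile=" + outFile);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("talismane-fr-X.X.X.conf", "talismane-core-X.X.X.jar",
                "data/frTest.txt", "data/frTest.tal");
        // Print the command; to actually run it, pass the list to
        // new ProcessBuilder(cmd).inheritIO().start() once the files exist.
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the arguments as a list rather than a single string avoids quoting problems when file paths contain spaces.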

Calling from Java: For syntax analysis within Java code via the API, see this Java code example.

JavaDoc API: You may also consult the full JavaDoc API online.

User's manual: An out-of-date user's manual can be found on the GitHub Talismane project page. For up-to-date documentation, you're far better off consulting the wiki or the JavaDoc API.

Additional information on the project can be found on the CLLE-ERSS laboratory Talismane project home page.

Language pack usage

  • The French language pack can be used for research purposes provided that you have a license for the French Treebank. The included model is not optimised: it is a Maximum Entropy model (which only requires about 1G of RAM) rather than a Linear SVM model (which requires about 24G of RAM). If you would like the more optimised Linear SVM model, please contact Assaf Urieli.

  • The English language pack can be used for research purposes provided that you have a license for the Penn Treebank. WARNING: the English model is only an initial version, with no attempt at optimisation.