Training and evaluating all modules using the example corpus
For now, this page simply lists the commands for training and evaluating all modules using the example corpus. In each command, replace X.X.X with the version number of your talismane-core jar.
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-no-lex.conf -jar talismane-core-X.X.X.jar --train --sessionId=fr --module=sentenceDetector --inFile="examples/french/corpus/frWikiDisc_v1.1-sentence-train.txt" --logConfigFile=examples/conf/logback.xml --sentenceModel="output/models/sentenceTest1.zip"
Note: in this case, a sentence file with one sentence per line was available. If no such file is available, sentences can be reconstructed from a CoNLL file or equivalent by adding the following settings to the configuration file:
sentence-detector {
  train {
    corpus-reader = com.joliciel.talismane.tokeniser.TokenRegexBasedCorpusReader
    input-pattern = ${input-pattern}
    ...
  }
}
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-no-lex.conf -jar talismane-core-X.X.X.jar --evaluate --sessionId=fr --module=sentenceDetector --sentenceModel="output/models/sentenceTest1.zip" --inFile=examples/french/corpus/frWikiDisc_v1.1-sentence-test.txt --encoding=UTF8 --logConfigFile=examples/conf/logback.xml --outDir=output/eval/sentence
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-no-lex.conf -jar talismane-core-X.X.X.jar --train --sessionId=fr --module=tokeniser --inFile="examples/french/corpus/frWikiDisc_v1.1-train.conll" --logConfigFile=examples/conf/logback.xml --tokeniserModel="output/models/tokeniserTest1.zip"
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-no-lex.conf -jar talismane-core-X.X.X.jar --evaluate --sessionId=fr --module=tokeniser --tokeniserModel="output/models/tokeniserTest1.zip" --inFile=examples/french/corpus/frWikiDisc_v1.1-test.conll --encoding=UTF8 --logConfigFile=examples/conf/logback.xml --outDir=output/eval/tokeniser
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-no-lex.conf -jar talismane-core-X.X.X.jar --train --sessionId=fr --module=posTagger --posTaggerModel=output/models/frPosTagger1.zip --inFile=examples/french/corpus/frWikiDisc_v1.1-train.conll --encoding=UTF8 --logConfigFile=examples/conf/logback.xml
java -Xmx1G -Dconfig.file=examples/french/conf/fr-serialize-lexicon.conf -jar talismane-core-X.X.X.jar --serializeLexicon --sessionId=fr --lexiconProps=examples/french/lexicons/lexicons_fr.txt --outFile=output/lexicons/lexicons_fr.zip
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-with-lex.conf -jar talismane-core-X.X.X.jar --testLexicon --sessionId=fr --words=à,dommage,drainer,dites,que
Note that the configuration file fr-train-eval-with-lex.conf looks for the lexicon at output/lexicons/lexicons_fr.zip, as indicated by the following key:
lexicons = [ "output/lexicons/lexicons_fr.zip" ]
If you serialized into a different directory, you need to change this configuration value. If your configuration file has no lexicons key, or if you want to override the location given in the configuration file, you can use the command-line option --lexicon, as follows:
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-with-lex.conf -jar talismane-core-X.X.X.jar --testLexicon --sessionId=fr --lexicon=other-location/lexicons_fr.zip --words=à,dommage,drainer,dites,que
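The choice between relying on the lexicons key and overriding it with --lexicon can be wrapped in a small shell function. This is a minimal sketch only: test_lexicon, LEXICON, DEFAULT_LEXICON and DRY_RUN are hypothetical names introduced here, not Talismane options, and X.X.X stands for your Talismane version. With DRY_RUN=1 the function prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: run --testLexicon, passing --lexicon only when the lexicon is not at
# the location configured by the "lexicons" key in fr-train-eval-with-lex.conf.
# test_lexicon, LEXICON, DEFAULT_LEXICON and DRY_RUN are hypothetical names.
DEFAULT_LEXICON=output/lexicons/lexicons_fr.zip

test_lexicon() {
  words="$1"
  lexicon="${LEXICON:-$DEFAULT_LEXICON}"
  set -- java -Xmx1G \
    -Dconfig.file=examples/french/conf/fr-train-eval-with-lex.conf \
    -jar talismane-core-X.X.X.jar --testLexicon --sessionId=fr
  # Override the configured location only if the lexicon lives elsewhere.
  if [ "$lexicon" != "$DEFAULT_LEXICON" ]; then
    set -- "$@" "--lexicon=$lexicon"
  fi
  set -- "$@" "--words=$words"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 LEXICON=other-location/lexicons_fr.zip test_lexicon à,dommage,drainer,dites,que prints the --lexicon command shown above, while leaving LEXICON unset reproduces the plain --testLexicon command.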
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-with-lex.conf -jar talismane-core-X.X.X.jar --train --sessionId=fr --module=posTagger --posTaggerModel=output/models/frPosTaggerLex1.zip --inFile=examples/french/corpus/frWikiDisc_v1.1-train.conll --encoding=UTF8 --lexicon=output/lexicons/lexicons_fr.zip --logConfigFile=examples/conf/logback.xml
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-with-lex.conf -jar talismane-core-X.X.X.jar --evaluate --sessionId=fr --module=posTagger --posTaggerModel=output/models/frPosTaggerLex1.zip --inFile=examples/french/corpus/frWikiDisc_v1.1-test.conll --encoding=UTF8 --lexicon=output/lexicons/lexicons_fr.zip --logConfigFile=examples/conf/logback.xml --outDir=output/eval/posTagger --suffix=_lex1
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-with-lex.conf -jar talismane-core-X.X.X.jar --train --sessionId=fr --module=parser --parserModel=output/models/frParserLex1.zip --inFile=examples/french/corpus/frWikiDisc_v1.1-train.conll --encoding=UTF8 --lexicon=output/lexicons/lexicons_fr.zip --logConfigFile=examples/conf/logback.xml
java -Xmx1G -Dconfig.file=examples/french/conf/fr-train-eval-with-lex.conf -jar talismane-core-X.X.X.jar --evaluate --sessionId=fr --module=parser --parserModel=output/models/frParserLex1.zip --inFile=examples/french/corpus/frWikiDisc_v1.1-test.conll --encoding=UTF8 --logConfigFile=examples/conf/logback.xml --outDir=output/eval/parser --suffix=_lex1 --lexicon=output/lexicons/lexicons_fr.zip
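The commands above share most of their options, so they can be chained in one script. The sketch below runs them in the order given on this page: sentence detector, tokeniser, lexicon serialization, then the lexicon-based pos-tagger and parser. It omits the no-lexicon pos-tagger (frPosTagger1.zip) and the --testLexicon check. run_talismane, run_pipeline, TALISMANE_JAR and DRY_RUN are hypothetical names introduced for illustration, X.X.X again stands for your Talismane version, and the mkdir -p call is only a precaution, since this page does not say whether Talismane creates missing output directories.

```shell
#!/bin/sh
# Sketch only: chains the commands on this page. TALISMANE_JAR, DRY_RUN,
# run_talismane and run_pipeline are hypothetical names, not Talismane options.
TALISMANE_JAR="${TALISMANE_JAR:-talismane-core-X.X.X.jar}"
CONF_DIR=examples/french/conf
CORPUS=examples/french/corpus
LOG=--logConfigFile=examples/conf/logback.xml
LEX=--lexicon=output/lexicons/lexicons_fr.zip

# run_talismane CONF ACTION [OPTIONS...]
# Adds the java/-jar prefix and --sessionId=fr; with DRY_RUN=1 it prints instead of running.
run_talismane() {
  conf="$1"; action="$2"; shift 2
  set -- java -Xmx1G "-Dconfig.file=$CONF_DIR/$conf.conf" -jar "$TALISMANE_JAR" \
    "$action" --sessionId=fr "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Runs every step in order and stops at the first one that fails.
run_pipeline() {
  [ "${DRY_RUN:-0}" = 1 ] || mkdir -p output/models output/lexicons || return
  # Sentence detector and tokeniser: no lexicon needed.
  run_talismane fr-train-eval-no-lex --train --module=sentenceDetector \
    "--inFile=$CORPUS/frWikiDisc_v1.1-sentence-train.txt" "$LOG" \
    --sentenceModel=output/models/sentenceTest1.zip || return
  run_talismane fr-train-eval-no-lex --evaluate --module=sentenceDetector \
    --sentenceModel=output/models/sentenceTest1.zip \
    "--inFile=$CORPUS/frWikiDisc_v1.1-sentence-test.txt" --encoding=UTF8 "$LOG" \
    --outDir=output/eval/sentence || return
  run_talismane fr-train-eval-no-lex --train --module=tokeniser \
    "--inFile=$CORPUS/frWikiDisc_v1.1-train.conll" "$LOG" \
    --tokeniserModel=output/models/tokeniserTest1.zip || return
  run_talismane fr-train-eval-no-lex --evaluate --module=tokeniser \
    --tokeniserModel=output/models/tokeniserTest1.zip \
    "--inFile=$CORPUS/frWikiDisc_v1.1-test.conll" --encoding=UTF8 "$LOG" \
    --outDir=output/eval/tokeniser || return
  # Lexicon: serialize once, then reuse it for the pos-tagger and the parser.
  run_talismane fr-serialize-lexicon --serializeLexicon \
    --lexiconProps=examples/french/lexicons/lexicons_fr.txt \
    --outFile=output/lexicons/lexicons_fr.zip || return
  run_talismane fr-train-eval-with-lex --train --module=posTagger \
    --posTaggerModel=output/models/frPosTaggerLex1.zip \
    "--inFile=$CORPUS/frWikiDisc_v1.1-train.conll" --encoding=UTF8 "$LEX" "$LOG" || return
  run_talismane fr-train-eval-with-lex --evaluate --module=posTagger \
    --posTaggerModel=output/models/frPosTaggerLex1.zip \
    "--inFile=$CORPUS/frWikiDisc_v1.1-test.conll" --encoding=UTF8 "$LEX" "$LOG" \
    --outDir=output/eval/posTagger --suffix=_lex1 || return
  run_talismane fr-train-eval-with-lex --train --module=parser \
    --parserModel=output/models/frParserLex1.zip \
    "--inFile=$CORPUS/frWikiDisc_v1.1-train.conll" --encoding=UTF8 "$LEX" "$LOG" || return
  run_talismane fr-train-eval-with-lex --evaluate --module=parser \
    --parserModel=output/models/frParserLex1.zip \
    "--inFile=$CORPUS/frWikiDisc_v1.1-test.conll" --encoding=UTF8 "$LOG" \
    --outDir=output/eval/parser --suffix=_lex1 "$LEX" || return
}
```

Running DRY_RUN=1 run_pipeline prints the nine commands without executing them, which makes it easy to check paths before a long training run; with DRY_RUN unset, run_pipeline executes them and stops at the first failing step.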