MaxEntClassificationEDA

Roberto Zanoli edited this page Dec 22, 2014 · 1 revision

MaxEntClassificationEDA is an Entailment Decision Algorithm (EDA) based on a prototype system called TIE (Textual Inference Engine), which is developed and maintained by Rui Wang and his colleagues in the Language Technology (LT) lab of DFKI GmbH.

Technically, running MaxEntClassificationEDA requires no installation or build steps beyond setting up the EOP. Among the knowledge resources that users must install manually (explained here in the manual), we highly recommend installing TreeTagger, which is needed to use most of the components described below. Other knowledge resources required by each configuration are described below.

Here are the configurations you can set up for MaxEntClassificationEDA.

Configuration File

A set of predefined configuration files can be found at /config and in the eop-resources archive at eop-resources/configuration-file/. Most values in a configuration file can stay exactly as provided. The details below cover the values you may wish (or need) to change.
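
As an illustration of changing one value while leaving the rest as provided, here is a minimal Python sketch. It assumes the configuration is an XML file in which sections are `section` elements and properties are `property` elements, each identified by a `name` attribute. That layout, the sample text, and the helper name `set_property` are assumptions made for this example and should be checked against the files shipped under /config.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element and attribute names are assumed,
# not taken from this page.
SAMPLE = """<configuration>
  <section name="PlatformConfiguration">
    <property name="activatedEDA">eu.excitementproject.eop.core.MaxEntClassificationEDA</property>
    <property name="language">EN</property>
  </section>
</configuration>"""


def set_property(xml_text, section, prop, value):
    """Return xml_text with one property value replaced; other values are kept."""
    root = ET.fromstring(xml_text)
    for sec in root.iter("section"):
        if sec.get("name") != section:
            continue
        for p in sec.iter("property"):
            if p.get("name") == prop:
                p.text = value
                return ET.tostring(root, encoding="unicode")
    raise KeyError(f"{section}/{prop} not found")


# Switch the language flag from EN to DE and leave everything else alone.
updated = set_property(SAMPLE, "PlatformConfiguration", "language", "DE")
```

Raising `KeyError` for an unknown section or property is a design choice of this sketch: a misspelled name fails loudly instead of silently leaving the file unchanged.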

Common settings

| Section | Property | Value | Requirement |
| --- | --- | --- | --- |
| PlatformConfiguration | activatedEDA | The common setting for selecting the EDA. The default value is eu.excitementproject.eop.core.MaxEntClassificationEDA. | N/A |
| PlatformConfiguration | language | MaxEntClassificationEDA currently supports English (EN), German (DE), and Italian (IT). In principle, the EDA is language-independent. The default value is EN. | N/A |
| PlatformConfiguration | activatedLAP | The linguistic analysis pipeline needed for the EDA. The default value is eu.excitementproject.eop.lap.dkpro.MaltParserEN, where EN is the language flag. | N/A |
| eu.excitementproject.eop.core.MaxEntClassificationEDA | modelFile | The location where the trained model is stored. The default location is under ./src/test/resources/model/. By convention, a model name consists of the EDA name, the settings, and the language flag. For instance, MaxEntClassificationEDAModel_Base+TS_DE means a German model using the bag-of-words similarity, the bag-of-lemmas similarity and the tree skeleton similarity. The default value is usually the same as the configuration file name. | For training, the model file should NOT exist; for testing, the path to the model file should be updated correctly. |
| eu.excitementproject.eop.core.MaxEntClassificationEDA | trainDir | The directory containing the training data. The data should be (linguistically) preprocessed and serialized into xmi files. The default value is ./target/EN/dev/, where EN is the language flag. | The directory should exist. |
| eu.excitementproject.eop.core.MaxEntClassificationEDA | testDir | The directory containing the testing data. The data should be (linguistically) preprocessed and serialized into xmi files. The default value is ./target/EN/test/, where EN is the language flag. | The directory should exist. |
| eu.excitementproject.eop.core.MaxEntClassificationEDA | classifier | The setting for the maximum entropy classifier. Two parameters are currently supported, the maximum number of iterations and the cutoff threshold, separated by a comma. The default value is 10000,1. | N/A |
| eu.excitementproject.eop.core.MaxEntClassificationEDA | Components | The comma-separated list of components used in the EDA. Each component needs its own section in the configuration file; otherwise, a ConfigurationException is thrown. | N/A |
| BagOfWordsScoring | N/A | The bag-of-words scoring component. No further settings are supported. | The LAP should include a tokenizer, e.g., OpenNLPTaggerEN. |
| BagOfLemmasScoring | N/A | The bag-of-lemmas scoring component. No further settings are supported. | The LAP should include a tokenizer and a lemmatizer, e.g., TreeTaggerEN. |
| BagOfDepsScoring | N/A | The bag-of-dependencies (without POS tags) scoring component. No further settings are supported. | The LAP should include syntactic analysis, e.g., MaltParserEN. |
| BagOfDepsPosScoring | N/A | The bag-of-dependencies (with POS tags) scoring component. No further settings are supported. | The LAP should include syntactic analysis, e.g., MaltParserEN. |
| TreeSkeletonScoring | N/A | The tree skeleton scoring component. No further settings are supported. | The LAP should include syntactic analysis, e.g., MaltParserEN. |
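
Several of the requirements above can be checked before starting a run. The following Python sketch is one way to do that outside the EOP itself. The function names are hypothetical, the values are passed in as plain strings rather than read through any EOP API, and treating both classifier parameters as integers is an assumption based only on the default value 10000,1.

```python
from pathlib import Path


def parse_classifier(value):
    """Split a classifier setting such as '10000,1' into (max_iterations, cutoff).

    Assumption: both parameters are integers, as in the default value.
    """
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected 'iterations,cutoff', got {value!r}")
    return int(parts[0]), int(parts[1])


def missing_component_sections(components_value, section_names):
    """Return the components listed in Components that have no section of their own."""
    listed = [c.strip() for c in components_value.split(",") if c.strip()]
    return [c for c in listed if c not in section_names]


def preflight(model_file, data_dir, training):
    """Collect path problems before a training (training=True) or testing run."""
    problems = []
    model_exists = Path(model_file).exists()
    if training and model_exists:
        problems.append(f"model file already exists: {model_file}")
    if not training and not model_exists:
        problems.append(f"model file not found: {model_file}")
    if not Path(data_dir).is_dir():
        problems.append(f"data directory not found: {data_dir}")
    return problems
```

For example, `missing_component_sections("BagOfWordsScoring,TreeSkeletonScoring", {"BagOfWordsScoring"})` reports `TreeSkeletonScoring`, the situation in which the table above says a ConfigurationException would occur.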

Specific settings for English

Note that the English lexical resources WordNet and VerbOcean must be properly installed in order to run the corresponding configurations below.

| Section | Property | Value | Requirement |
| --- | --- | --- | --- |
| BagOfLexesScoring | WordnetLexicalResource | Enables the use of WordNet. The value is the comma-separated list of relations used. The default value is the relations related to entailment, i.e., HYPERNYM, SYNONYM, PART_HOLONYM. There is a separate section for further settings. | N/A |
| WordnetLexicalResource | wordNetFilesPath | The path to the location of WordNet. The default value is /ontologies/EnglishWordNet-dict/. | The path needs to be updated. |
| WordnetLexicalResource | isCollapsed | Whether to query WordNet with all the selected relations together or separately. The default value is true. | N/A |
| WordnetLexicalResource | useFirstSenseOnlyLeft | Whether to query WordNet with only the first sense on the left-hand side of the relation. The default value is false. | N/A |
| WordnetLexicalResource | useFirstSenseOnlyRight | Whether to query WordNet with only the first sense on the right-hand side of the relation. The default value is false. | N/A |
| BagOfLexesScoring | VerbOceanLexicalResource | Enables the use of VerbOcean. The value is the comma-separated list of relations used. The default value is the relations related to entailment, i.e., StrongerThan, CanResultIn, Similar. There is a separate section for further settings. | N/A |
| VerbOceanLexicalResource | verbOceanFilePath | The path to the location of VerbOcean. The default value is /VerbOcean/verbocean.unrefined.2004-05-20.txt. | The path needs to be updated. |
| VerbOceanLexicalResource | isCollapsed | Whether to query VerbOcean with all the selected relations together or separately. The default value is true. | N/A |
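
Because both resource paths must be updated by hand, a quick check that each enabled resource has its own section and a path that exists can catch mistakes early. The sketch below is only an illustration: it assumes the configuration has already been loaded into a plain dictionary mapping section names to their properties, and the function name `check_english_resources` is hypothetical.

```python
from pathlib import Path

# Path property of each English resource section, as named in the table above.
PATH_PROPERTY = {
    "WordnetLexicalResource": "wordNetFilesPath",
    "VerbOceanLexicalResource": "verbOceanFilePath",
}


def check_english_resources(config):
    """config maps section name -> {property: value}; returns a list of problems."""
    problems = []
    scoring = config.get("BagOfLexesScoring", {})
    for resource, path_prop in PATH_PROPERTY.items():
        if resource not in scoring:
            continue  # this resource is not enabled in BagOfLexesScoring
        relations = [r.strip() for r in scoring[resource].split(",") if r.strip()]
        if not relations:
            problems.append(f"{resource}: no relations listed")
        path = config.get(resource, {}).get(path_prop)
        if path is None:
            problems.append(f"{resource}: section or {path_prop} missing")
        elif not Path(path).exists():
            problems.append(f"{resource}: path does not exist: {path}")
    return problems
```

A resource that is not named in BagOfLexesScoring is skipped, so a configuration that uses only WordNet is not penalized for a missing VerbOcean path.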

Specific settings for German

Note that the German lexical resources GermaNet, DistSim, and DerivBase must be properly installed in order to run the corresponding configurations below. In particular, GermaNet is not delivered with the EOP resources package. Further settings of the lexical resources can be found here.

| Section | Property | Value | Requirement |
| --- | --- | --- | --- |
| BagOfLexesScoring | withPOS | Whether the bag-of-lexes scoring component includes POS in the queries to the lexical resources. The default value is false. | N/A |
| BagOfLexesScoring | DerivBaseResource | Enables the use of the German derivational resource. Further settings can be found here. | It is only triggered when withPOS is turned on. |
| BagOfLexesScoring | GermanDistSim | Also called DewakDistributional. Enables the use of the German distributional similarity resource. Further information can be found here. | N/A |
| BagOfLexesScoring | GermaNetWrapper | Enables the use of GermaNet. The value is the comma-separated list of relations used. The default value is the relations related to entailment, i.e., Causes, Entails, Has_Hypernym, Has_Synonym. Further settings can be found here. | GermaNet should be properly installed and the path should be correctly specified. Can be used with withPOS turned on or off. |
| BagOfLexesScoring | GermanTransDmResource | Enables the use of the German transDM resource, a distributional similarity resource on a translated, syntactic space. Further information can be found here. | Can be used with withPOS turned on or off. |
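
The interaction between withPOS and the German resources can be summarized in a few lines of Python. This sketch only restates the requirements in the table above: the function name `effective_resources` is hypothetical, and the BagOfLexesScoring section is assumed to be available as a plain dictionary of property names to string values.

```python
# German resources that can be named in the BagOfLexesScoring section.
GERMAN_RESOURCES = [
    "DerivBaseResource",
    "GermanDistSim",
    "GermaNetWrapper",
    "GermanTransDmResource",
]

# Resources that the table says are only triggered when withPOS is on.
NEEDS_POS = {"DerivBaseResource"}


def effective_resources(scoring):
    """Split the configured resources into (active, ignored) given withPOS."""
    with_pos = str(scoring.get("withPOS", "false")).strip().lower() == "true"
    active, ignored = [], []
    for name in GERMAN_RESOURCES:
        if name not in scoring:
            continue  # resource not configured at all
        if name in NEEDS_POS and not with_pos:
            ignored.append(name)  # configured, but not triggered without POS
        else:
            active.append(name)
    return active, ignored
```

With `{"withPOS": "false", "DerivBaseResource": "", "GermaNetWrapper": "Causes,Entails"}` the sketch reports GermaNetWrapper as active and DerivBaseResource as ignored, which is a useful warning that the derivational resource will have no effect until withPOS is set to true.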