NCSR Demokritos submission to PAN 2016.

This work is based on last year's submission for PAN15.

Installation:

Dataset:

In order to run the examples you will need to download the corpus for the author profiling task from the PAN website:

http://pan.webis.de/clef16/pan16-web/author-profiling.html

Requirements:

Install the requirements

pip install -r requirements.txt

Module:

You can also install the module if you would like to check it out from IPython:

git clone this project
cd projectfolder
pip install --user .

The package consists of a Python module and scripts for:

crossvalidating
training
testing models on the PAN 2016 dataset.

Example usage:

python tesst.py -i pan16-author-profiling-training-dataset/pan16-author-profiling-training-dataset-english/ -s 0.2 # for train/test splitting
python cross.py -i pan16-author-profiling-training-dataset/pan16-author-profiling-training-dataset-english/
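As a rough illustration of what a train/test split with `-s 0.2` could mean, the sketch below holds out a fraction of the items for testing. It assumes that `-s` is the held-out test fraction, which the text above does not confirm, and `split_holdout` is a hypothetical helper that is not part of this package.

```python
import random


def split_holdout(items, test_fraction, seed=0):
    """Hypothetical helper: shuffle a copy of items and hold out a fraction for testing."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled items become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Made-up author ids, used only for illustration.
authors = ["author%d" % i for i in range(10)]
train, test = split_holdout(authors, 0.2)
print(len(train), len(test))
```

With ten items and a fraction of 0.2, two items are held out and eight remain for training.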

Configuration:

Configuration follows the same conventions as the PAN15 submission. The config folder contains a toy configuration for pangram, written in YAML.

Settings currently configurable are:

PAN dataset settings for each language
Feature groupings, preprocessing for each feature group, and classifier settings

In config/languages there is one file per language. It specifies where each attribute to be predicted is located in the truth file, which contains the labels for the training set. For each of these attributes, you can set a file that contains the feature grouping and preprocessing settings. In the example provided the mapping is the same for each language, but this need not be the case.
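To make the idea of "where each attribute is in the truth file" concrete, here is a minimal sketch of reading one attribute by column position. The `:::` separator, the column order, and the sample rows are assumptions chosen for illustration, and `read_truth` is a hypothetical helper, not a function of this package; the real positions are whatever the files in config/languages specify.

```python
def read_truth(lines, column, sep=":::"):
    """Hypothetical helper: map author id (column 0) to the label found in `column`."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        labels[fields[0]] = fields[column]
    return labels


# Made-up rows in an assumed "id:::gender:::age" layout.
sample = ["user1:::MALE:::25-34", "user2:::FEMALE:::18-24"]
gender = read_truth(sample, column=1)
age = read_truth(sample, column=2)
print(gender)
print(age)
```

Keeping the column index in configuration rather than in code is what allows the mapping to differ between languages.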

In config/features the settings for each feature group can be found. The format is:

label of feature group
    feature extractor 1
    feature extractor 2
    ..
    preprocessing:
        label: label this so that it doesn't get computed twice if it has been defined elsewhere
        pipe:
            - method 1
            - method 2
            - ...

In the above snippet, feature extractor names are expected to be defined in pan/features.py. Similarly, the above methods are expected to be defined in pan/preprocess.py and to process a mutable iterable in place (in our case a list of texts).
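The in-place convention for preprocessing methods can be sketched as follows. The functions `lowercase`, `strip_urls` and `run_pipe` are hypothetical names invented for this example and are not taken from pan/preprocess.py; the sketch only shows the general shape of a method that mutates a list of texts, and of a pipe that applies such methods in order.

```python
import re


def lowercase(texts):
    """Hypothetical preprocessing method: lowercase every text in place."""
    for i, text in enumerate(texts):
        texts[i] = text.lower()


def strip_urls(texts):
    """Hypothetical preprocessing method: remove http(s) URLs in place."""
    for i, text in enumerate(texts):
        texts[i] = re.sub(r"https?://\S+", "", text).strip()


def run_pipe(texts, pipe):
    """Apply each method of the pipe in order; the list is mutated, nothing is returned."""
    for method in pipe:
        method(texts)


texts = ["Check THIS out http://example.com", "Hello World"]
run_pipe(texts, [strip_urls, lowercase])
print(texts)
```

Because each method writes back into the same list, the order of the entries under `pipe` matters, and the caller sees the processed texts without any return value.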

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
