Design and Implementation of a Methodology for the Automatic Identification of the User Geographic Idiom in a Social Media Text Corpus

Abstract

The rapid spread of social media raises a growing number of problems for the scientific community to investigate. The sheer volume of information is itself a management challenge. Organizing that information by topic, author, age, gender, or geographical origin is one example of a problem still seeking a solution.

The purpose of this project is to develop a methodology for automatically recognizing an author's regional idiom from a corpus of social media text. We first review the fields of text classification, knowledge extraction from text, and author recognition. We then collect data from social media, namely text written by users whose geographic origin is known. Once collected, the text is preprocessed and annotated so that features can be extracted. Feature extraction is based on general linguistic elements as well as on idioms that betray the author's geographical origin. Finally, we run classification experiments with several classification algorithms and compare and evaluate the results.
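The feature extraction step described above can be illustrated with a minimal sketch. It is not the code under /dev. The marker lexicon, the region labels, and the function names are all invented for illustration, and it uses only the standard library rather than NLTK so that it runs without extra downloads.

```python
# -*- coding: utf-8 -*-
from __future__ import division, unicode_literals

import re
from collections import Counter

# Hypothetical lexicon: region label -> idiom words that hint at that region.
# These entries are invented placeholders, not data from the thesis.
REGION_MARKERS = {
    "north": {"aye", "wee", "lass"},
    "south": {"yall", "fixin", "reckon"},
}

TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def extract_features(text):
    """Return a dict of simple linguistic and idiom-based features."""
    tokens = tokenize(text)
    n = len(tokens)
    counts = Counter(tokens)
    features = {
        # General linguistic features.
        "num_tokens": n,
        "avg_token_length": sum(len(t) for t in tokens) / n if n else 0.0,
        "type_token_ratio": len(counts) / n if n else 0.0,
    }
    # Idiom features: share of tokens that are markers of each region.
    for region, markers in sorted(REGION_MARKERS.items()):
        hits = sum(counts[m] for m in markers)
        features["markers_" + region] = hits / n if n else 0.0
    return features
```

Calling `extract_features` on each collected post yields one feature dictionary per text, which can then be paired with the known origin of its author to form a labelled training instance.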

Tools/Technologies Used

Python 2.7
NLTK (Natural Language Toolkit)
Weka
Facepager
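Features are extracted in Python and the classifiers run in Weka, so one plausible hand-off between the two is an ARFF file, Weka's native input format. The README does not state how the data was actually passed to Weka, so the sketch below is an assumption: the function name `to_arff`, the attribute names, and the class attribute `region` are hypothetical.

```python
def to_arff(relation, feature_names, rows, class_labels):
    """Serialize labelled feature dicts as an ARFF string for Weka.

    rows is a list of (features_dict, label) pairs; every dict must
    contain a numeric value for each name in feature_names.
    """
    lines = ["@RELATION " + relation, ""]
    # One numeric attribute per feature.
    for name in feature_names:
        lines.append("@ATTRIBUTE {0} NUMERIC".format(name))
    # Nominal class attribute listing the possible regions.
    lines.append("@ATTRIBUTE region {{{0}}}".format(",".join(class_labels)))
    lines.append("")
    lines.append("@DATA")
    # One comma-separated line per instance, class label last.
    for features, label in rows:
        values = ["{0:.4f}".format(features[name]) for name in feature_names]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file with an `.arff` extension gives a dataset that can be opened in Weka, where different classifiers can be run and compared on the same input.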

Notes

The text corpus is not available in this repository. Full details are available in thesis-report.pdf (in Greek), and the implementation (the feature extraction code) is under the /dev subdirectory.

Author

Simakis Panagiotis simakis@autistici.org

Licence

GNU GENERAL PUBLIC LICENSE

Version 3, 29 June 2007

