
CLEF 2019 Author Profiling Using a Stylometric Approach


pan-webis-de/ashraf19


Results on the PAN CLEF 2019 Test Datasets

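| Dataset | Language | Type accuracy (bot/human) | Gender accuracy |
|---|---|---|---|
| 1 | es | 0.8611 | 0.7556 |
| 1 | en | 0.9280 | 0.7652 |
| 2 | es | 0.8839 | 0.7261 |
| 2 | en | 0.9227 | 0.7583 |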

PAN Author Profiling (Bots and Gender Profiling)

Profile the author of a text (bot vs. human, and male vs. female) on the basis of stylometric features of their writing style.

Installation

Use the package manager pip to install the dependencies:

pip install -r requirments.txt

Usage

To train the model:

python train.py -i 'trainingdatapath'

For example:

python train.py -i '/input/train/data/'

To test the model:

python test.py -i 'testdatapath' -o 'outputpath'

For example:

python test.py -i '/input/test/data/' -o '/output/'
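The scripts themselves are not reproduced here. As a minimal sketch, assuming the standard-library `argparse` module, this is how a script could accept the `-i` (input directory) and `-o` (output directory) flags shown above. The long option names `--input`/`--output` and the function `parse_args` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the -i/-o flags used in the commands above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="PAN 2019 bot and gender profiling")
    # Directory holding the training or test data.
    parser.add_argument("-i", "--input", required=True, help="input data directory")
    # Directory for predictions; only the test command passes it, so it is optional here.
    parser.add_argument("-o", "--output", default=None, help="output directory")
    return parser.parse_args(argv)
```

Called as `parse_args(["-i", "/input/test/data/", "-o", "/output/"])`, it returns a namespace with `input` and `output` set; omitting `-i` makes `argparse` exit with a usage error.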

Selected features:

1. emoji_count -> Count all kinds of emojis
2. face_smiling -> Count 😀😃😄😁😆😅🤣😂🙂🙃😉😊😇
3. face_affection -> Count 🥰😍🤩😘😗☺😚😙
4. face_tongue -> Count 😋😛😜🤪😝🤑
5. face_hand -> Count 🤗🤭🤫🤔
6. face_neutral_skeptical -> Count 🤐🤨😐😑😶😏😒🙄😬🤥
7. face_concerned -> Count 😕😟🙁☹😮😯😲😳🥺😦😧😨😰😥😢😭😱😖😣😞
8. monkey_face -> Count 🙈🙉🙊
9. emotions -> Count 💋💌💘💝💖💗💓💞💕💟❣💔❤🧡💛💚💙💜🤎🖤
10. url_count -> Count all kinds of links/URLs
11. space_count -> Count spaces
12. capital_count -> Count capital letters
13. text_length -> Total length of the message
14. curly_brackets_count -> Count { }
15. round_brackets_count -> Count ( )
16. underscore_count -> Count _
17. question_mark_count -> Count ?
18. exclamation_mark_count -> Count !
19. dollar_mark_count -> Count $
20. ampersand_mark_count -> Count &
21. hash_count -> Count #
22. tag_count -> Count @
23. slashes_count -> Count slashes // / \
24. operator_count -> Count operators + - * / % < > ^ |
25. punc_count -> Count punctuation ' " , . : ; `
26. line_count -> Count newlines \n
27. word_count -> Count words made of A-Za-z
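The features above are simple character and pattern counts. The emoji category names appear to follow Unicode emoji subgroup names (face-smiling, monkey-face, and so on). Below is a minimal sketch of how a subset of them could be computed for a single message. The URL regex is a simplified assumption, only two emoji categories are included for brevity, and how the repository aggregates counts per author is not documented here.

```python
import re

# Two of the emoji categories, copied from the list above.
FACE_SMILING = set("😀😃😄😁😆😅🤣😂🙂🙃😉😊😇")
MONKEY_FACE = set("🙈🙉🙊")

# Simplified URL pattern (an assumption; the repository's pattern is not shown).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Words are runs of ASCII letters, matching the A-Za-z description.
WORD_RE = re.compile(r"[A-Za-z]+")


def stylometric_features(text):
    """Compute a subset of the listed stylometric counts for one message."""
    return {
        "face_smiling": sum(ch in FACE_SMILING for ch in text),
        "monkey_face": sum(ch in MONKEY_FACE for ch in text),
        "url_count": len(URL_RE.findall(text)),
        "space_count": text.count(" "),
        "capital_count": sum(ch.isupper() for ch in text),
        "text_length": len(text),
        "curly_brackets_count": text.count("{") + text.count("}"),
        "round_brackets_count": text.count("(") + text.count(")"),
        "underscore_count": text.count("_"),
        "question_mark_count": text.count("?"),
        "exclamation_mark_count": text.count("!"),
        "hash_count": text.count("#"),
        "tag_count": text.count("@"),
        "line_count": text.count("\n"),
        "word_count": len(WORD_RE.findall(text)),
    }
```

For example, `stylometric_features("Hi @bob! See https://t.co/x 😂🙈\nOK?")` counts one smiling face, one monkey face, one URL, four spaces and eight words (the URL contributes `https`, `t`, `co` and `x`).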

Results on the English train/test split:


Predict Bot / Human

| Classifier | Accuracy |
|---|---|
| LogisticRegression | 0.9158576051779935 |
| RandomForestClassifier | 0.9757281553398058 |
| LinearSVC | 0.8770226537216829 |
| BernoulliNB | 0.9239482200647249 |
| MultinomialNB | 0.8236245954692557 |
| SVC | 0.5056634304207119 |

Best model: RandomForestClassifier

| Author | precision | recall | f1-score | support |
|---|---|---|---|---|
| bot | 0.98 | 0.97 | 0.98 | 622 |
| human | 0.97 | 0.98 | 0.98 | 614 |
| micro avg | 0.98 | 0.98 | 0.98 | 1236 |
| macro avg | 0.98 | 0.98 | 0.98 | 1236 |
| weighted avg | 0.98 | 0.98 | 0.98 | 1236 |
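The training code is not shown here. As a minimal sketch assuming scikit-learn with mostly default hyperparameters, a comparison like the one above could be produced by fitting each listed classifier on the same split and keeping the most accurate one. The function name `compare_classifiers`, the 30% test size, the random seed and `max_iter=1000` are all assumptions, so the scores it yields will not match the table exactly.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.svm import SVC, LinearSVC


def compare_classifiers(X, y, test_size=0.3, seed=0):
    """Fit each classifier on one stratified split; return scores, best name, report."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForestClassifier": RandomForestClassifier(random_state=seed),
        "LinearSVC": LinearSVC(),
        "BernoulliNB": BernoulliNB(),
        # MultinomialNB needs non-negative inputs; raw counts satisfy that.
        "MultinomialNB": MultinomialNB(),
        "SVC": SVC(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    # Pick the highest-accuracy model; it is already fitted on X_train.
    best = max(scores, key=scores.get)
    report = classification_report(y_test, models[best].predict(X_test))
    return scores, best, report
```

Unscaled count features can make `LinearSVC` and `SVC` perform poorly (and may trigger convergence warnings), which is one plausible reason the kernel SVC scores near chance in these tables; scaling the features would be a natural thing to try.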

Predict Male / Female

| Classifier | Accuracy |
|---|---|
| LogisticRegression | 0.7265372168284789 |
| RandomForestClassifier | 0.8106796116504854 |
| LinearSVC | 0.6019417475728155 |
| BernoulliNB | 0.616504854368932 |
| MultinomialNB | 0.616504854368932 |
| SVC | 0.4967637540453074 |

Best model: RandomForestClassifier

| Gender | precision | recall | f1-score | support |
|---|---|---|---|---|
| female | 0.79 | 0.85 | 0.82 | 311 |
| male | 0.83 | 0.77 | 0.80 | 307 |
| micro avg | 0.81 | 0.81 | 0.81 | 618 |
| macro avg | 0.81 | 0.81 | 0.81 | 618 |
| weighted avg | 0.81 | 0.81 | 0.81 | 618 |
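The "macro avg" row is the unweighted mean of the per-class scores, while "weighted avg" weights each class by its support. A small worked check using the per-class precision values from the English gender report above: macro is (0.79 + 0.83) / 2 = 0.81, and weighted is (0.79 × 311 + 0.83 × 307) / 618 ≈ 0.8099, which rounds to 0.81. Because the report's per-class values are already rounded to two decimals, recomputing averages from them can occasionally differ from the printed average by 0.01.

```python
def macro_and_weighted(values, supports):
    """Return the unweighted (macro) and support-weighted means of per-class scores."""
    macro = sum(values) / len(values)
    weighted = sum(v * s for v, s in zip(values, supports)) / sum(supports)
    return macro, weighted


# Per-class precision and support from the English gender report.
macro, weighted = macro_and_weighted([0.79, 0.83], [311, 307])
print(round(macro, 2), round(weighted, 2))  # prints 0.81 0.81
```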

Results on the Spanish train/test split:


Predict Bot / Human

| Classifier | Accuracy |
|---|---|
| LogisticRegression | 0.8433333333333334 |
| RandomForestClassifier | 0.9288888888888889 |
| LinearSVC | 0.7488888888888889 |
| BernoulliNB | 0.8188888888888889 |
| MultinomialNB | 0.7644444444444445 |
| SVC | 0.4888888888888889 |

Best model: RandomForestClassifier

| Author | precision | recall | f1-score | support |
|---|---|---|---|---|
| bot | 0.93 | 0.93 | 0.93 | 440 |
| human | 0.93 | 0.93 | 0.93 | 460 |
| micro avg | 0.93 | 0.93 | 0.93 | 900 |
| macro avg | 0.93 | 0.93 | 0.93 | 900 |
| weighted avg | 0.93 | 0.93 | 0.93 | 900 |
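In these reports the "micro avg" row matches the overall accuracy (for example 0.93 here versus 0.9289 in the Spanish bot/human table). For single-label classification every misclassified sample counts as one false positive for the predicted class and one false negative for the true class, so micro-averaged precision, recall and F1 all reduce to accuracy; more recent scikit-learn versions print this row as `accuracy`. A small check on hypothetical toy labels (not data from this repository):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy labels for illustration only; 4 of the 6 predictions are correct.
y_true = ["bot", "bot", "human", "human", "human", "bot"]
y_pred = ["bot", "human", "human", "human", "bot", "bot"]

acc = accuracy_score(y_true, y_pred)
micro_p = precision_score(y_true, y_pred, average="micro")
micro_r = recall_score(y_true, y_pred, average="micro")
micro_f1 = f1_score(y_true, y_pred, average="micro")
# All four values equal 4/6.
print(acc, micro_p, micro_r, micro_f1)
```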

Predict Male / Female

| Classifier | Accuracy |
|---|---|
| LogisticRegression | 0.6844444444444444 |
| RandomForestClassifier | 0.7844444444444445 |
| LinearSVC | 0.5666666666666667 |
| BernoulliNB | 0.6066666666666667 |
| MultinomialNB | 0.6355555555555555 |
| SVC | 0.48444444444444446 |

Best model: RandomForestClassifier

| Gender | precision | recall | f1-score | support |
|---|---|---|---|---|
| female | 0.77 | 0.83 | 0.80 | 232 |
| male | 0.80 | 0.74 | 0.77 | 218 |
| micro avg | 0.78 | 0.78 | 0.78 | 450 |
| macro avg | 0.79 | 0.78 | 0.78 | 450 |
| weighted avg | 0.79 | 0.78 | 0.78 | 450 |
