pan-webis-de/ashraf19

Results on PAN CLEF19 Test Datasets

Dataset lang type gender
1 es 0.8611 0.7556
1 en 0.9280 0.7652
2 es 0.8839 0.7261
2 en 0.9227 0.7583

PAN Author Identification (Bots and Gender Profiling)

Profile the author of a text (bot or human, male or female) on the basis of stylometric features and writing style.

Installation

Use the package manager pip to install the dependencies.

pip install -r requirments.txt

Usage

To train the model

python train.py -i 'trainingdatapath'

python train.py -i '/input/train/data/'

To test the model

python test.py -i 'testdatapath' -o 'outputpath'

python test.py -i '/input/test/data/' -o '/output/'

Features Selected:

1. emoji_count -> Count all kinds of emojis
2. face_smiling -> Count 😀😃😄😁😆😅🤣😂🙂🙃😉😊😇
3. face_affection -> Count 🥰😍🤩😘😗☺😚😙
4. face_tongue -> Count 😋😛😜🤪😝🤑
5. face_hand -> Count 🤗🤭🤫🤔
6. face_neutral_skeptical -> Count 🤐🤨😐😑😶😏😒🙄😬🤥
7. face_concerned -> Count 😕😟🙁☹😮😯😲😳🥺😦😧😨😰😥😢😭😱😖😣😞
8. monkey_face -> Count 🙈🙉🙊
9. emotions -> Count 💋💌💘💝💖💗💓💞💕💟❣💔❤🧡💛💚💙💜🤎🖤
10. url_count -> Count all kinds of links/URLs
11. space_count -> Count spaces
12. capital_count -> Count capital letters
13. text_length -> Total length of message
14. curly_brackets_count -> Count { }
15. round_brackets_count -> Count ( )
16. underscore_count -> Count _
17. question_mark_count -> Count ?
18. exclamation_mark_count -> Count !
19. dollar_mark_count -> Count $
20. ampersand_mark_count -> Count &
21. hash_count -> Count #
22. tag_count -> Count @
23. slashes_count -> Count Slashes // / \
24. operator_count -> Count Operators +-*/%<>^|
25. punc_count -> Count punctuation '",.:;`
26. line_count -> Count newlines \n
27. word_count -> Count words (A-Za-z)
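
A few of the features listed above can be sketched in plain Python as follows. This is only an illustrative sketch, not the repository's own extraction code: the helper name extract_features, the URL pattern, and the choice to count emojis one code point at a time are all assumptions.

```python
import re

# Assumed patterns: the repository may detect URLs and words differently.
URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[A-Za-z]+")

# Emoji groups copied from the feature list (two groups shown for brevity).
FACE_SMILING = set("😀😃😄😁😆😅🤣😂🙂🙃😉😊😇")
MONKEY_FACE = set("🙈🙉🙊")


def extract_features(text):
    """Return a dict with a subset of the stylometric counts for one message."""
    return {
        "face_smiling": sum(1 for ch in text if ch in FACE_SMILING),
        "monkey_face": sum(1 for ch in text if ch in MONKEY_FACE),
        "url_count": len(URL_RE.findall(text)),
        "space_count": text.count(" "),
        "capital_count": sum(1 for ch in text if ch.isupper()),
        "text_length": len(text),
        "question_mark_count": text.count("?"),
        "exclamation_mark_count": text.count("!"),
        "hash_count": text.count("#"),
        "tag_count": text.count("@"),
        "line_count": text.count("\n"),
        "word_count": len(WORD_RE.findall(text)),
    }


if __name__ == "__main__":
    sample = "Hello @bob! Check https://example.com #fun 🙈🙈\nBye?"
    for name, value in extract_features(sample).items():
        print(name, value)
```

Each message becomes one fixed-length vector of counts, which is what lets the same features be reused for both the bot/human and the male/female task.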

Results for English Train Test Split Dataset:


Predict Bot / Human

Classifier Accuracy
'LogisticRegression' 0.9158576051779935
'RandomForestClassifier' 0.9757281553398058
'LinearSVC' 0.8770226537216829
'BernoulliNB' 0.9239482200647249
'MultinomialNB' 0.8236245954692557
'SVC' 0.5056634304207119

Best Model RandomForestClassifier
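
The classifier names in the table above match scikit-learn estimator classes, so a comparison of this kind could be sketched as below. This is a hedged sketch and not the repository's train.py: the synthetic data from make_classification, the MinMaxScaler step, the 70/30 split, and every hyperparameter are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the 27 stylometric features (assumption: the real
# feature matrix would come from the repository's own extraction step).
X, y = make_classification(
    n_samples=400, n_features=27, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# MultinomialNB requires non-negative inputs, so scale features into [0, 1].
# The scaler is fitted on the training split only; clip=True keeps test
# values inside the same range.
scaler = MinMaxScaler(clip=True).fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "LinearSVC": LinearSVC(),
    "BernoulliNB": BernoulliNB(),
    "MultinomialNB": MultinomialNB(),
    "SVC": SVC(),
}

# Fit every classifier on the same split and record its test accuracy.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(name, score)
print("Best Model", best)
```

The accuracies printed for synthetic data will not resemble the figures reported here; the sketch only shows the shape of the comparison loop.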

Author precision recall f1-score support
bot 0.98 0.97 0.98 622
human 0.97 0.98 0.98 614
micro avg 0.98 0.98 0.98 1236
macro avg 0.98 0.98 0.98 1236
weighted avg 0.98 0.98 0.98 1236

Predict Male / Female

Classifier Accuracy
'LogisticRegression' 0.7265372168284789
'RandomForestClassifier' 0.8106796116504854
'LinearSVC' 0.6019417475728155
'BernoulliNB' 0.616504854368932
'MultinomialNB' 0.616504854368932
'SVC' 0.4967637540453074

Best Model RandomForestClassifier

Gender precision recall f1-score support
female 0.79 0.85 0.82 311
male 0.83 0.77 0.80 307
micro avg 0.81 0.81 0.81 618
macro avg 0.81 0.81 0.81 618
weighted avg 0.81 0.81 0.81 618
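
The precision, recall, f1-score and support columns in the reports above follow the usual definitions, which can be worked through on a toy example. The function name per_class_metrics and the eight toy labels below are hypothetical; they are not taken from the PAN data.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    support = tp + fn  # number of true instances of this class
    return precision, recall, f1, support


# Toy labels (hypothetical): one bot is misclassified as human.
y_true = ["bot", "bot", "bot", "bot", "human", "human", "human", "human"]
y_pred = ["bot", "bot", "bot", "human", "human", "human", "human", "human"]

bot = per_class_metrics(y_true, y_pred, "bot")
human = per_class_metrics(y_true, y_pred, "human")

# Macro average: unweighted mean of the per-class f1-scores.
macro_f1 = (bot[2] + human[2]) / 2

# Accuracy: fraction of all predictions that are correct.
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

print("bot", bot)
print("human", human)
print("macro f1", macro_f1)
print("accuracy", accuracy)
```

When every sample has exactly one label, the micro average equals plain accuracy, which is why the "micro avg" rows repeat the accuracy of the best model.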

Results for Spanish Train Test Split Dataset:


Predict Bot / Human

Classifier Accuracy
'LogisticRegression' 0.8433333333333334
'RandomForestClassifier' 0.9288888888888889
'LinearSVC' 0.7488888888888889
'BernoulliNB' 0.8188888888888889
'MultinomialNB' 0.7644444444444445
'SVC' 0.4888888888888889

Best Model RandomForestClassifier

Author precision recall f1-score support
bot 0.93 0.93 0.93 440
human 0.93 0.93 0.93 460
micro avg 0.93 0.93 0.93 900
macro avg 0.93 0.93 0.93 900
weighted avg 0.93 0.93 0.93 900

Predict Male / Female

Classifier Accuracy
'LogisticRegression' 0.6844444444444444
'RandomForestClassifier' 0.7844444444444445
'LinearSVC' 0.5666666666666667
'BernoulliNB' 0.6066666666666667
'MultinomialNB' 0.6355555555555555
'SVC' 0.48444444444444446

Best Model RandomForestClassifier

Gender precision recall f1-score support
female 0.77 0.83 0.80 232
male 0.80 0.74 0.77 218
micro avg 0.78 0.78 0.78 450
macro avg 0.79 0.78 0.78 450
weighted avg 0.79 0.78 0.78 450

About

CLEF 19 Author Profiling Using Stylometry approach
