GitHub - pazzini/svm: filtragem de conteudo utilizando svm

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
users		users
.gitignore		.gitignore
Makefile		Makefile
README		README
README_SVM		README_SVM
get_bests_results.py		get_bests_results.py
libsvm.so.2		libsvm.so.2
media_w.py		media_w.py
methods.py		methods.py
open_all_medias.py		open_all_medias.py
propotion.txt		propotion.txt
propotion_finder.py		propotion_finder.py
run_media.py		run_media.py
run_svm.py		run_svm.py
stop_words.txt		stop_words.txt
svm.py		svm.py
svm_class.py		svm_class.py
svm_codes.py		svm_codes.py
svmutil.py		svmutil.py
tweet.py		tweet.py
tweet_list.py		tweet_list.py

Repository files navigation

-f: features to be done
-p: proportion of data division. Example: -p 50, half of data will
be used on learning and half on tests
-v: cross validation. Create v parts of data of same size and use -v 1
on train and test on last one, all parts are tested one time. -v max use
the size of data, training with data -1 and testing on the last one.
-t: number of times the process will be done
-r: the change of w on each time of process
-w1: the initial value of w1
-w-1: the initial value of w-1
-d: delete all files of input user
-fm: run the fast mode of cross-validation
-ft: load the first db as training base and the second db as test base,
if that method is being used the cross-validation method can't be used.

It can be runned by command line, like this:
python svm_codes.py users/user2.xml -f 0,2,7,8,12,13,14,17,18,22,25,27,29,30,31 -d -w1 98.2 -w-1 1.8 -r 0.1 -t 1 -v max -fm
python svm_codes.py users/user1.xml -f 0,2,7,8,12,13,14,17,18,22,25,27,29,30,31 
python svm_codes.py users/user2.xml -ft users/user4.xml -f 0,2,7,8,12,13,14,17,18,22,25,27,29,30,31 -d -w1 98.2 -w-1 1.8 -r 0.1 -t 1

The run_svm.py is used to run several times the svm_codes.py

"text", #0
"status_created_at", #1
"source", #2
"in_reply_to_twitter_status_id", #3
"in_reply_to_twitter_user_id", #4
"in_reply_to_twitter_user_screen_name", #5
"retweeted_twitter_status_id", #6
"retweet_count", #7
"retweeted", #8
"geo", #9
"contributors", #10
"name", #11
"screen_name", #12
"location", #13
"description", #14
"url",15
"protected", #16
"followers_count", #17
"friends_count", #18
"favourites_count", #19
"user_created_at", #20
"utc_offset", #21
"time_zone", #22
"geo_enabled", #23
"statuses_count", #24
"lang", #25
"contributors_enabled", #26
"listed_count", #27
"is_translator", #28
"favorited", #29
"entity_user_mention", #30
"entity_hashtag", #31
"entity_url", #32
"manual_classification" #33