Multi-Level Feature Distribution Detection

In this project I try to find the distribution that best describes most linguistic features in social media at different levels of analysis: county, user, and message. To do that, I take two approaches:

Unsupervised
Supervised

In the "unsupervised" section, we use statistical tests to find the distribution that most confidently describes the empirical distribution of each feature.
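
As a rough illustration of this idea (not the code from the notebooks), the sketch below fits a few candidate distributions with SciPy and ranks them by the Kolmogorov-Smirnov statistic. The function name `rank_distributions`, the candidate list, and the synthetic sample are assumptions made for the example. Note that KS p-values tend to be optimistic when the parameters are estimated from the same data, so the statistic is used here only for ranking.

```python
import numpy as np
from scipy import stats


def rank_distributions(values, candidates=("norm", "lognorm", "expon", "gamma")):
    """Fit each candidate by maximum likelihood and score it with a KS test.

    Returns (name, ks_statistic, p_value) tuples, best fit (smallest statistic) first.
    The candidate list is illustrative, not the set used in the project.
    """
    results = []
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(values)  # maximum-likelihood estimates of shape, loc, scale
        statistic, p_value = stats.kstest(values, name, args=params)
        results.append((name, float(statistic), float(p_value)))
    return sorted(results, key=lambda r: r[1])


# Synthetic stand-in for one feature's values (hypothetical data).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=0.5, size=2000)

ranking = rank_distributions(sample)
for name, statistic, p_value in ranking:
    print(f"{name:8s} KS={statistic:.4f} p={p_value:.3g}")
```

The same loop could be run once per feature and per level (county, user, message) to see whether one family wins consistently.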

In the "supervised" section, we use different distributions as the assumed feature distribution in a Naive Bayes classifier to predict a label such as gender or sentiment, and we compare which of these distributions gives the best accuracy.
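
A minimal sketch of this comparison, assuming each feature is modeled independently per class with a SciPy continuous distribution. The class `DistributionNaiveBayes`, the two candidate families, and the synthetic data are hypothetical and only illustrate the approach; they are not taken from the notebooks.

```python
import numpy as np
from scipy import stats


class DistributionNaiveBayes:
    """Naive Bayes whose per-feature class-conditional density is a chosen SciPy distribution."""

    def __init__(self, dist_name="norm"):
        self.dist = getattr(stats, dist_name)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors estimated from label frequencies.
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        # One fitted parameter tuple per (class, feature) pair.
        self.params_ = [
            [self.dist.fit(X[y == c, j]) for j in range(X.shape[1])]
            for c in self.classes_
        ]
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.empty((X.shape[0], len(self.classes_)))
        for k, feature_params in enumerate(self.params_):
            # Independence assumption: sum the per-feature log densities.
            log_lik = sum(
                self.dist.logpdf(X[:, j], *p) for j, p in enumerate(feature_params)
            )
            scores[:, k] = self.log_priors_[k] + log_lik
        return self.classes_[np.argmax(scores, axis=1)]


# Synthetic two-class data standing in for real features and labels.
rng = np.random.default_rng(0)
n = 500
X = np.vstack([rng.normal(0.0, 1.0, size=(n, 2)), rng.normal(2.0, 1.0, size=(n, 2))])
y = np.array([0] * n + [1] * n)
order = rng.permutation(len(y))
train, test = order[:700], order[700:]

accuracies = {}
for name in ("norm", "laplace"):
    model = DistributionNaiveBayes(name).fit(X[train], y[train])
    accuracies[name] = float(np.mean(model.predict(X[test]) == y[test]))
print(accuracies)
```

Distributions with bounded support can assign zero density to unseen test values, so a real comparison would likely need smoothing or a fallback for those cases.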


fataltes/feature_distribution
