Data Analysis for OSF

This repository contains the data analysis for OSF, hosted on GitHub.

Prerequisite

To run the scripts here, you need the following packages installed. conda is the recommended package manager.

Packages (updating):

Python: 3.6
jupyter notebook: latest
matplotlib: latest
tensorflow: 1.*

An environment.yml is also provided in the root directory. To set up the environment quickly, run:

conda env create -f environment.yml
source activate osf

Analysis

Gender Analysis

gender_classifier is used for calculating the gender distribution of the active users in OSF.
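As a rough illustration of the final step, the sketch below turns a list of per-user gender labels into a distribution. The function name `gender_distribution` and the sample labels are hypothetical; how the notebook actually produces the labels is not shown here.

```python
from collections import Counter


def gender_distribution(labels):
    """Return each label's share of the total as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Hypothetical predicted labels for a handful of active users.
predicted = ["female", "male", "male", "unknown", "female", "male"]
for label, share in sorted(gender_distribution(predicted).items()):
    print(f"{label}: {share:.1%}")
```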

Taxonomy Classification

I built a CNN for text classification based on Kim's Convolutional Neural Networks for Sentence Classification and dennybritz's work.

Major changes include support for multiclass classification, static word embeddings as discussed in Kim's paper, and an "unknown" choice for classifying noisy data, among others.
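One common way to offer an "unknown" choice in a multiclass classifier is to reject predictions whose top softmax probability falls below a threshold. The sketch below shows that idea only; it is an assumption, not necessarily how the model in cnn does it, and `predict_with_unknown`, the class names, and the 0.5 threshold are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_unknown(logits, class_names, threshold=0.5):
    """Return the most likely class, or "unknown" if the model is not confident."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown"
    return class_names[best]


# Hypothetical category names and scores.
names = ["Law", "Medicine", "Engineering"]
print(predict_with_unknown([5.0, 0.0, 0.0], names))
print(predict_with_unknown([0.1, 0.0, 0.0], names))
```

A higher threshold sends more noisy inputs to "unknown" at the cost of rejecting some correct predictions.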

The training data consists of more than 60,000 paper abstracts in 10 categories, based on the taxonomy and data crawled from Digital Commons Network. The pre-trained word embeddings are GoogleNews-vectors-negative300 from word2vec.

The crawler is available here. It uses lxml to parse the page source and can resume crawling after being halted without losing any data.
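The resume behaviour can be sketched as a loop that saves its results after every page and skips pages already saved. This is a minimal illustration under assumed names (`crawl`, `load_results`, `save_results`, and a caller-supplied `fetch` function), not the crawler's actual code, and it leaves out the page parsing step.

```python
import json
import os


def load_results(path):
    """Load previously saved results, or start empty if there are none."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def save_results(path, results):
    """Write to a temporary file first so a crash cannot corrupt saved data."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(results, f)
    os.replace(tmp_path, path)


def crawl(urls, fetch, state_path):
    """Fetch every URL not already saved, checkpointing after each one."""
    results = load_results(state_path)
    for url in urls:
        if url in results:
            continue  # already crawled in an earlier run
        results[url] = fetch(url)
        save_results(state_path, results)
    return results
```

Calling `crawl` again with the same `state_path` after an interruption continues from the first URL that was not saved.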

Consistency check

consistency_check measures how consistent a contributor is across the categories of the projects they contribute to.
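One simple way to quantify this is the share of a contributor's projects that fall into their most frequent category. The sketch below assumes that metric for illustration; the notebook may define consistency differently, and `contributor_consistency` and the sample categories are hypothetical.

```python
from collections import Counter


def contributor_consistency(categories):
    """Return the fraction of projects that share the most common category."""
    if not categories:
        return 0.0
    top_count = Counter(categories).most_common(1)[0][1]
    return top_count / len(categories)


# Hypothetical categories of the projects one contributor worked on.
projects = ["Biology", "Biology", "Law", "Biology"]
print(contributor_consistency(projects))
```

A value of 1.0 means every project is in one category; values near 1/n for n projects mean the contributor is spread across many categories.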

