shadowgamefly/osf_analysis

Data Analysis for OSF

This is the GitHub repository for data analysis of OSF.

Prerequisites

To run the scripts here you will need the following packages installed. conda is the recommended package manager.

Packages (updating):

  • Python: 3.6
  • jupyter notebook: latest
  • matplotlib: latest
  • tensorflow: 1.*

environment.yml is also provided in the root directory. To quickly set up the environment, run:

conda env create -f environment.yml
source activate osf

Analysis

Gender Analysis

gender_classifier calculates the gender distribution of active users on OSF.

Taxonomy Classification

I built a CNN for text classification based on Kim's Convolutional Neural Networks for Sentence Classification and dennybritz's work.

Major changes include support for multiclass classification, the use of static word embeddings as discussed in Kim's paper, and an "unknown" choice for classification to handle noisy data.
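The README does not say how the "unknown" choice is implemented. One common approach, shown here purely as an assumption, is to apply a softmax to the network's output scores and fall back to "unknown" when the top probability is below a threshold. The function names, the labels, and the 0.5 threshold are all hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def predict_with_unknown(logits, labels, threshold=0.5):
    """Return one label per row of `logits`, or "unknown" when the
    most likely class has probability below `threshold`.

    Hypothetical helper: the threshold rule is an assumption, not
    necessarily what this repository does.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    best = np.argmax(probs, axis=-1)
    confident = probs[np.arange(len(best)), best] >= threshold
    return [labels[i] if ok else "unknown" for i, ok in zip(best, confident)]
```

With three classes, a row of scores such as `[5.0, 0.0, 0.0]` is assigned its top class, while a nearly flat row such as `[0.1, 0.0, 0.05]` falls below the threshold and is reported as "unknown".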

The training data consists of more than 60,000 paper abstracts in 10 categories, based on the taxonomy and data crawled from the Digital Commons Network. The pre-trained word embeddings are GoogleNews-vectors-negative300 from word2vec.

The crawler is available here. It uses lxml to parse the page source and can resume crawling after being halted without losing any data.
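The resume behaviour can be sketched as a crawl loop that saves its results after every page and skips already-saved pages on restart. This is a minimal illustration under assumptions, not the repository's crawler: `crawl`, `fetch`, and `state_path` are hypothetical names, and the fetching and parsing step is passed in as a function so the sketch stays independent of any parsing library.

```python
import json
import os


def crawl(urls, fetch, state_path):
    """Fetch every URL in `urls`, saving progress to `state_path`.

    If a previous run was halted, results already on disk are loaded
    and those URLs are skipped, so no completed work is lost.
    """
    results = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            results = json.load(f)

    for url in urls:
        if url in results:
            continue  # already crawled in an earlier run
        results[url] = fetch(url)

        # Write to a temporary file first, then replace the state file
        # in one step, so a crash mid-write cannot corrupt saved results.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(results, f)
        os.replace(tmp_path, state_path)

    return results
```

Calling `crawl` again with the same `state_path` after an interruption continues from the first URL that has no saved result.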

Consistency check

consistency_check measures how consistent a contributor is across the categories of the projects they contribute to.
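The README does not define the consistency measure. As one possible reading, stated here as an assumption, consistency could be the share of a contributor's projects that fall in their most frequent category. The function name and the example categories below are hypothetical.

```python
from collections import Counter


def contributor_consistency(categories):
    """Fraction of projects in the contributor's most common category.

    `categories` holds one category label per project. Returns a value
    in (0, 1], or 0.0 for a contributor with no projects. This
    definition is an assumption for illustration only.
    """
    if not categories:
        return 0.0
    counts = Counter(categories)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(categories)
```

Under this definition, a contributor with projects labelled `["biology", "biology", "law", "biology"]` scores 0.75, and one whose projects all share a category scores 1.0.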
