sourabhrohilla/ds-masterclass-hands-on

ds-masterclass-hands-on

The repository contains two folders:

  1. session-1: code for the session on intrusion detection, split into R and Python code folders.
  2. session-2: code for the session on text classification, split into R and Python code folders.

Public Dropbox folder containing the data, problem descriptions, and data dictionaries:

https://goo.gl/PEug5P

For participants who will be using Python

  1. Anaconda distribution for Python: Go to https://www.continuum.io/downloads and download the latest Anaconda distribution. Please use the Python 2.7 installer.

  2. Run conda install -c anaconda seaborn

  3. Run conda install -c glemaitre imbalanced-learn

  4. Install the libraries listed below using pip.

Steps to install a library in Python

  1. Go to terminal/command-prompt.
  2. Run pip install <library name>
  3. For instance, to install numpy, you’d run pip install numpy
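
The same steps can also be run from inside Python, which helps when several interpreters are installed and it is unclear which one the `pip` on your PATH belongs to. The following is a minimal sketch, not part of the course material: the helper names `build_pip_command` and `install` are hypothetical, and the sketch assumes pip is available for the interpreter that runs it.

```python
import subprocess
import sys


def build_pip_command(package):
    """Return the pip command (as an argument list) that installs `package`."""
    # Calling pip through the current interpreter installs the library for
    # the same Python that will later import it.
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Install `package` with pip; raises CalledProcessError if pip fails."""
    subprocess.check_call(build_pip_command(package))


# Example (commented out so that nothing is installed by accident):
# install("numpy")  # same effect as running: pip install numpy
```

Calling `install` once per entry in the library lists covers the pip-installed libraries. Keep in mind that a few pip package names differ from the import names (for example, `sklearn` is installed as `scikit-learn`).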

List of libraries used in the hands-on session
Session 1 : Intrusion detection

  1. numpy
  2. pandas
  3. matplotlib
  4. seaborn
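  5. sklearn (pip package name: scikit-learn)
  6. imblearn (package name: imbalanced-learn, installed by the conda command above)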
  7. xgboost
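
Before the session, it can be useful to confirm that every library in the list above actually imports. The sketch below is one way to do that. The function name `check_imports` and the constant `SESSION_1_MODULES` are hypothetical names chosen for illustration; the module names are the import names from the list.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and return a dict {name: True or False}."""
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # The module is not installed for this interpreter.
            status[name] = False
    return status


# Import names taken from the Session 1 list above.
SESSION_1_MODULES = ["numpy", "pandas", "matplotlib", "seaborn",
                     "sklearn", "imblearn", "xgboost"]
```

Running `check_imports(SESSION_1_MODULES)` in the Anaconda Python prompt returns one entry per library; any entry marked `False` still needs to be installed.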

Session 2 : News articles recommender

  1. numpy
  2. pandas
  3. sklearn
  4. nltk 3.2.4
  5. Install the nltk corpora and models:
    > import nltk
    > nltk.download('stopwords')
    > nltk.download('punkt')
    > nltk.download('maxent_ne_chunker')
    > nltk.download('averaged_perceptron_tagger')
    > nltk.download('words')   
    
  6. gensim 0.12.4
    conda install -c anaconda gensim
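
The five `nltk.download` calls above can also be run as a loop that reports which resources failed. This is a minimal sketch under stated assumptions: `download_resources` and its `downloader` parameter are hypothetical (the parameter exists only so the function can be tried without network access), and the sketch relies on `nltk.download` returning `True` on success and `False` on failure.

```python
# Resource identifiers copied from the nltk.download(...) calls above.
NLTK_RESOURCES = [
    "stopwords",
    "punkt",
    "maxent_ne_chunker",
    "averaged_perceptron_tagger",
    "words",
]


def download_resources(resources, downloader=None):
    """Download each resource and return the list of ids that failed."""
    if downloader is None:
        # Imported lazily so the function can be defined without nltk.
        import nltk
        downloader = nltk.download
    failed = []
    for resource in resources:
        # A falsy return value is treated as a failed download.
        if not downloader(resource):
            failed.append(resource)
    return failed
```

Calling `download_resources(NLTK_RESOURCES)` with nltk installed should return an empty list when every download succeeds.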
    

For participants who will be using R

  1. Set up R: Go to https://cran.rstudio.com/ and download R for your OS. Please download R version 3.4.1 or later.
  2. Set up RStudio: Go to https://www.rstudio.com/products/rstudio/ and download the open source version of RStudio Desktop.
  3. Install the libraries listed below.

Steps to install a library in RStudio

  1. Open RStudio.
  2. In the console, run install.packages("<library name>")
  3. For instance, to install ggplot2, you’d run install.packages("ggplot2")

List of libraries used in the hands-on session
Session 1 : Intrusion detection

  1. ggplot2
  2. randomForest
  3. caret
  4. rpart
  5. plyr
  6. gbm
  7. rpart.plot
  8. reshape2
  9. naivebayes
  10. corrplot
  11. e1071

Session 2 : News articles recommender

  1. tm
  2. topicmodels
  3. lda
  4. MASS
  5. devtools
  6. NLP
  7. R.utils
  8. stringdist
  9. dplyr
  10. openNLP
  11. rJava
  12. RWeka
  13. qdap
  14. magrittr
  15. openNLPmodels.en
  16. data.table
  17. text2vec

Note: If you run into issues with rJava, make sure that both a JDK and a JRE are installed on your system.

For Windows: http://docs.oracle.com/javase/7/docs/webnotes/install/windows/jdk-installation-windows.html

For Linux: https://github.com/hannarud/r-best-practices/wiki/Installing-RJava-(Ubuntu)

If you are not able to set up your machine, please send an email to sourabh@tatrasdata.com

About

Contains code for masterclass hands-on session
