nuitrcs/Predictive-Modeling-with-Scikit-learn

Northwestern Research Computing Predictive Modeling with Scikit-learn Workshop

General Info

General information about RCS Python Workshops can be found in the Python Workshops Repository. This includes information about software installations and general Python resources.

Prep

Please install Anaconda, which includes everything we need for this workshop. We will work in Jupyter Notebook.

Python Installation Instructions
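
After installing, a quick check along the following lines can confirm that the environment is ready. This is only a sketch: the package list is an assumption about what the notebooks import, not something stated in this README, so adjust it as needed.

```python
import importlib.util

# Assumed package list (not taken from the workshop materials).
# Note that scikit-learn is imported under the name "sklearn".
packages = ["numpy", "pandas", "matplotlib", "sklearn", "notebook"]

# find_spec returns None when a top-level package cannot be located.
missing = [name for name in packages if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All checked packages are installed.")
```

If anything is reported as missing, reinstalling or updating Anaconda is usually the simplest fix.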

Downloading Files

Recommended: Entire directory

You can download all of the files by clicking the green button above and choosing "Download ZIP."

Individual Files

If you download individual files from the links above, click through to the raw version of each notebook and download that. Files saved directly from the links won't open, because they are web pages rather than the raw notebook files.

On a Mac, to open the files in Jupyter Notebook, start Jupyter Notebook from the folder where you saved the files. On Windows, navigate to the directory within Jupyter Notebook.

Predictive Modeling with Scikit-learn Workshop Overview

Objective of the workshop

Learn how to use the main algorithms needed for predictive modeling with Python and scikit-learn.

Learning outcomes

  • How do I use scikit-learn to choose which attributes of my data to include in the model?
  • How can scikit-learn help me choose which model to use?
  • How do I optimize this model for best performance?
  • How do I ensure that I'm building a model that will generalize to unseen data?
  • Can I estimate how well my model is likely to perform on unseen data?
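
As a rough illustration of how these questions map onto scikit-learn, the sketch below combines feature selection, model choice, hyperparameter tuning, and a held-out test set in one pipeline. The dataset, the candidate models, and the parameter grid are assumptions chosen for brevity. They are not taken from the workshop notebooks.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Example dataset bundled with scikit-learn (an assumption for this sketch).
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set that is never used during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling, feature selection, and the model live in one pipeline, so every
# step is refit on the training folds only during cross-validation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Two candidate models; each grid also searches the number of kept features.
param_grid = [
    {
        "select__k": [5, 10, 20],
        "model": [LogisticRegression(max_iter=1000)],
        "model__C": [0.1, 1.0, 10.0],
    },
    {
        "select__k": [5, 10, 20],
        "model": [RandomForestClassifier(random_state=0)],
    },
]

# 5-fold cross-validation on the training data picks the best combination.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# The held-out score estimates performance on unseen data.
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))
print("Held-out test accuracy:", round(test_accuracy, 3))
```

Keeping preprocessing inside the pipeline is the design choice that matters most here: it prevents information from the validation folds leaking into the scaler or the feature selector.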

Schedule

To achieve the objective and the outcomes above, we divided the material into 8 sections. In each section we introduce the concepts, explain how to use them in scikit-learn, and practice what we learned.

On the first day, the following notebooks are used:

On the second day, the following notebooks are used:

Resources

General scikit-learn resources and more specific tutorials that cover multiple topics can be found on the scikit-learn website.

Additional Predictive Modeling-specific resources include:

Data Science Central - A great online group of data science enthusiasts where you can find everything related to machine learning, predictive modeling, data science and more.

KDnuggets - A great source of news on anything ML and data science.

Coursera, edX, and Udacity courses. I strongly recommend Andrew Ng's machine learning courses.

Kaggle should be your home for data science. It is best known for its regularly organized data science competitions.

To prepare the notebooks for this workshop, I drew on the official scikit-learn tutorials and workshops and on the following resources:

For preparing notebooks in this workshop I used notebooks from Scikit-Learn official Tutorials and workshops and from the following resources:

https://stats.stackexchange.com/questions/10289/whats-the-difference-between-normalization-and-standardization

https://github.com/ogrisel/scipy-2018-sklearn

https://scikit-learn.org/stable/modules/impute.html

https://www.geeksforgeeks.org/regression-classification-supervised-machine-learning/

https://ekababisong.org/gcp-ml-seminar/scikit-learn/

https://www.analyticsvidhya.com/blog/2016/07/practical-guide-data-preprocessing-python-scikit-learn/

http://www.biostat.washington.edu/~dwitten/

https://www.quora.com/Which-machine-algorithms-require-data-scaling-normalization

Domain Specific Contents

While looking into the particular domains that participants asked about, I found the following interesting blogs and am posting some links here:
