This is a repository for practicing with public datasets. It is part of the Machine Learning Meet Up.

machine-learning-study-group/machine_learning_notes

 
 

machine_learning_notes

This repository is the work of the Machine Learning Study Group from the recworks meet-a-mentor community, which works through Kaggle ML projects.

We use Jupyter Notebook, Python and some libraries (Pandas, NumPy, Matplotlib, and Scikit-learn) to solve ML problems on public datasets. We have started with the following dataset from Kaggle:

https://www.kaggle.com/c/house-prices-advanced-regression-techniques

This project is the joint effort of a study group of enthusiastic people from different professional backgrounds. Anyone is welcome to join and to contribute to the discussions.

The best way to join the project is to attend our regular sessions. Also, feel free to fork the repository and check what we have done so far. We are very happy to receive any comments and suggestions for improvement.

There are two notebooks contained in the repository:

1) machine_learning_notes.ipynb:

This notebook contains:

  • 1 - Define the problem

  • 2 - Load data and display info

  • 3 - Prepare Data

    • [Identify features]
      • Separate numerical from categorical features
      • Separate nominal and ordinal (from categorical features)
    • [Clean data]
      • Remove numerical features with missing values
      • Remove categorical features with missing values
      • Drop outliers in numerical values # WIP
    • [Transform]
      • Transform categorical values # TODO
  • 4 - Feature selection

    • [Select features using random forest classifier]
  • 5 - Spot Check Algorithms

    • [split dataset]
    • [train on multiple algorithms]
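
The steps outlined above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data frame is synthetic, the column names (`area`, `rooms`, `noise`, `lot_frontage`, `zone`, `price`) are hypothetical, and the choice of models and of keeping the top two features is an assumption. The outline mentions a random forest classifier; because the target in this sketch is a continuous price, it swaps in `RandomForestRegressor` instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the Kaggle training data (hypothetical columns).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.normal(1500, 300, n),
    "rooms": rng.integers(2, 8, n).astype(float),
    "noise": rng.normal(0, 1, n),
    "lot_frontage": rng.normal(70, 10, n),
    "zone": rng.choice(["A", "B", "C"], n),
})
df["price"] = 100 * df["area"] + 5000 * df["rooms"] + rng.normal(0, 5000, n)
df.loc[::10, "lot_frontage"] = np.nan  # simulate missing values

target = "price"
features = df.drop(columns=[target])

# Identify features: separate numerical from categorical columns.
numerical = features.select_dtypes(include="number").columns.tolist()
categorical = features.select_dtypes(exclude="number").columns.tolist()

# Clean data: keep only the columns that have no missing values.
complete = features.columns[features.notna().all()].tolist()
numerical = [c for c in numerical if c in complete]
categorical = [c for c in categorical if c in complete]

# Categorical encoding is still a TODO in the outline, so this sketch
# continues with the numerical columns only.
X = df[numerical]
y = df[target]

# Feature selection: rank columns by random forest importance.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = pd.Series(forest.feature_importances_, index=numerical)
selected = importances.sort_values(ascending=False).index[:2].tolist()

# Split the dataset, then spot check several algorithms with cross-validation.
X_train, X_test, y_train, y_test = train_test_split(
    X[selected], y, test_size=0.2, random_state=0
)
models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
    scores[name] = cv_r2.mean()
    print(f"{name}: mean R^2 = {scores[name]:.3f}")
```

The held-out `X_test` and `y_test` are left untouched here so that whichever model looks best in the spot check can still be evaluated on data it has not seen.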

2) plot_outlier_detection.ipynb

This notebook is a reference for some techniques to detect and remove outlier samples from the dataset.
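
As an illustration of what detecting and removing outlier samples can look like, here is a small sketch using two common techniques: scikit-learn's `IsolationForest` and a simple interquartile-range (IQR) rule. These are not necessarily the techniques used in the notebook; the synthetic data, the `contamination` value, and the 1.5 × IQR threshold are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: 200 ordinary samples plus 5 obvious outliers at the end.
rng = np.random.default_rng(42)
inliers = rng.normal(0, 1, size=(200, 2))
outliers = rng.uniform(8, 10, size=(5, 2))
X = np.vstack([inliers, outliers])

# Technique 1: Isolation Forest. fit_predict returns 1 for inliers
# and -1 for samples flagged as outliers.
detector = IsolationForest(contamination=0.05, random_state=42)
labels = detector.fit_predict(X)
X_clean = X[labels == 1]
print(f"Isolation Forest kept {len(X_clean)} of {len(X)} samples")

# Technique 2: IQR rule on a single column. Values further than
# 1.5 * IQR from the quartiles are treated as outliers.
q1, q3 = np.percentile(X[:, 0], [25, 75])
iqr = q3 - q1
within_range = (X[:, 0] >= q1 - 1.5 * iqr) & (X[:, 0] <= q3 + 1.5 * iqr)
X_iqr = X[within_range]
print(f"IQR rule kept {len(X_iqr)} of {len(X)} samples")
```

The IQR rule works on one feature at a time and is easy to explain, while Isolation Forest considers all features together but needs a guess at the share of outliers through `contamination`.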
