
Practice projects applying statistical machine learning techniques to various datasets (with notes and from-scratch implementations).


idleyui/MachineLearningPractice

 
 


Machine Learning Practice

Practice projects applying statistical machine learning techniques to various datasets.

For more details and examples on deep learning, check out my Deep Learning repository.

Environment

  • Using Python 3

(Most relative-path links are relative to the repository root.)

Dependencies

  • numpy: For low-level math operations
  • pandas: For data manipulation
  • sklearn (scikit-learn): For evaluation metrics and some data preprocessing

For comparison purposes

  • sklearn: For machine learning models
  • cvxopt: For convex optimization problems (used for SVM)

NLP related

  • gensim: Topic Modelling
  • hmmlearn: Hidden Markov Models in Python, with scikit-learn like API
  • jieba: Chinese text segmentation library
  • pyHanLP: Chinese NLP library (Python API)
  • nltk: Natural Language Toolkit

Projects

| Subject | Technique / Task | Dataset | Solution | Notes |
| --- | --- | --- | --- | --- |
| Letter Recognition | kNN / Classification | Letter Recognition Datasets (File) | kNN From Scratch, kNN Scikit Learn | Notes |
| Page Blocks Classification | Decision Tree / Classification | Page Blocks Classification Data Set (File) | Decision Tree (CART) From Scratch, Decision Tree Scikit Learn | Notes |
| CSM | Linear Regression / Regression | CSM Dataset (2014 and 2015) (File) | Linear Regression From Scratch, Linear Regression Scikit Learn | Notes |
| Nursery | Naive Bayes / Classification | Nursery Data Set (File) | Gaussian Naive Bayes From Scratch, Gaussian Naive Bayes Scikit Learn | Notes |
| Post-Operative Patient | SVM (cvxopt) / Binary Classification | Post-Operative Patient Data Set (File, Simplified) | SVM From Scratch (using cvxopt and simplified dataset), SVM Scikit Learn | Notes |
| Student Performance | AdaBoost / Classification | Student Performance Data Set (File) | AdaBoost From Scratch, AdaBoost Scikit Learn | Notes |
| Sales Transactions | k-Means / Clustering | Sales Transactions Dataset Weekly (File) | k-Means From Scratch, k-Means Scikit Learn | Notes |
| Frequent Itemset Mining | FP-Growth / Frequent Itemsets Mining | Retail Market Basket Data Set (File) | FP-Growth From Scratch | Notes |
| Automobile | PCA / Dimensionality Reduction | Automobile Data Set (File) | PCA From Scratch, PCA Scikit Learn | Notes |
| Anonymous Microsoft Web Data | SVD / Recommendation System | Anonymous Microsoft Web Data Data Set (File, Ratings Matrix (by R)) | SVD From Scratch, R Notebook - IBCF Recommender System | Notes |
| Handwriting Digit | SVM (SMO) / Binary & Multi-class Classification | MNIST (File) | Binary SVM From Scratch, Multi-class (OVR) SVM From Scratch | Notes |
| Chinese Text Segmentation | HMM (EM) / Text Segmentation & POS Tagging | File | HMM From Scratch, HMM hmmlearn, Compare with Jieba and HanLP | - |
| Document Similarity and LSI | VSM, SVD / LSI | Corpus of the People's Daily (File) | VSM From Scratch, VSM Gensim, SVD/LSI Gensim | Notes |
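
To give a feel for what a "From Scratch" solution in the table above might look like, here is a minimal kNN classification sketch. It is not the repository's implementation: the function name `knn_predict`, the toy data, and the choice of Euclidean distance with a plain majority vote are all assumptions made for illustration. It uses only the Python standard library so it runs without installing the dependencies listed earlier.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest neighbours (sorted by distance only)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration (not taken from the Letter Recognition dataset)
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]

prediction_near_a = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
prediction_near_b = knn_predict(train_X, train_y, (5.1, 5.0), k=3)
```

With well-separated toy clusters like these, each query takes the label of the cluster it sits in. On real data, `k` and the distance metric would need to be tuned, and features usually need scaling first.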

Machine Learning Categories

Consider the learning task

  • Supervised Learning
    • Classification - Discrete
    • Regression - Continuous
  • Unsupervised Learning
    • Clustering - Discrete
    • Dimensionality Reduction - Continuous
    • Association Rule Learning
  • Semi-supervised Learning
  • Reinforcement Learning

Consider the desired output of an ML system

Ensemble Method (Meta-algorithm)

  • Bagging
    • Random Forests
  • Boosting

Others

Heuristic Algorithm

General Case

Categorized

Specific Field

Machine Learning Mathematics

Topic

Categories

  • Linear Algebra
    • Orthogonality
    • Eigenvalues
    • Hessian Matrix
    • Quadratic Form
    • Markov Chain - HMM
  • Calculus
    • Multivariable Derivatives
      • Quadratic Approximations
      • Lagrange Multipliers and Constrained Optimization - SVM SMO
      • Lagrange Duality
  • Probability and Statistics
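
As a sketch of how the Lagrange multiplier and duality topics above connect to SVM, the following writes out the standard hard-margin formulation. The notation is assumed here rather than taken from the repository: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weight vector $w$, bias $b$, and multipliers $\alpha_i$.

```latex
% Primal problem: maximize the margin subject to correct classification
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2
\quad \text{s.t.} \quad y_i \left( w^\top x_i + b \right) \ge 1, \quad i = 1, \dots, n

% Lagrangian with multipliers \alpha_i \ge 0
L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2
  - \sum_{i=1}^{n} \alpha_i \left[ y_i \left( w^\top x_i + b \right) - 1 \right]

% Stationarity conditions
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0

% Dual problem obtained by substituting the conditions back into L
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
```

The dual is a quadratic program in $\alpha$, which is the kind of problem a convex optimization solver or an SMO-style procedure works on.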

Basics

  • Algebra
  • Trigonometry

Application

(from A to Z)

  • Decision Tree
    • Entropy
  • Naive Bayes
    • Bayes' Theorem
  • PCA
    • Orthogonal Transformations
    • Eigenvalues
  • SVD
    • Eigenvalues
  • SVM
    • Convex Optimization
    • Constrained Optimization
    • Lagrange Multipliers
    • Kernel
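
To illustrate the first pairing in the list above (Decision Tree and Entropy), here is a small sketch of Shannon entropy and the information gain of a candidate split. The function names and the toy labels are hypothetical, and this is not claimed to be the split criterion used by the repository's decision tree code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum of p * log2(1 / p) over the observed classes
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def information_gain(parent, children):
    """Entropy reduction obtained by splitting `parent` into the lists in `children`."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy labels, made up for illustration
parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A split that separates the classes completely removes all of the parent's entropy, while a split that leaves each child as mixed as the parent gains nothing. A tree builder would pick the split with the largest gain at each node.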

Book Recommendations

Machine Learning

Mathematics

  • Linear Algebra with Applications (Steven Leon)
  • Convex Optimization (Stephen Boyd & Lieven Vandenberghe)
  • Numerical Linear Algebra (L. Trefethen & D. Bau III)

Resources

Tutorial

Videos

Documentation

Interactive Learning

MOOC

Github

Datasets

Machine Learning Platform

Machine Learning Tool

Languages

  • Python 98.3%
  • R 1.7%