Data Tools

Pandas

  • A tabular data manipulation library.
  • Content in the pandas/ directory
    • examples.py contains some things you can do in pandas
    • exercise.py contains some practice problems exercising techniques from examples.py
    • solutions.py contains solutions to the problems in exercise.py

Numpy

  • A numerical array library.
  • General Reference
  • Reference for generating statistical distributions
  • Content in the numpy/ directory
    • examples.py contains some things you can do in numpy
    • exercise.py contains some practice problems exercising techniques from examples.py
    • solutions.py contains solutions to the problems in exercise.py

Matplotlib

  • A plotting library.
  • Pyplot reference.
  • Content in the matplotlib/ directory
    • examples.py shows some plots you can make with matplotlib
    • exercise.py contains a difficult matplotlib plot
    • solutions.py contains an example implementation for the exercise

Scikit-Learn (and some Scipy)

  • A machine learning library.
  • We will learn about clustering algorithms, in particular the K-means algorithm and the Gaussian Mixture Model (GMM) algorithm.
  • Content in the scikit_learn/ directory
    • examples.py contains some unsupervised clustering algorithm examples
  • Another resource containing the same examples.
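
As a rough illustration of the two clustering algorithms named above, the following is a minimal sketch and not the contents of examples.py. The synthetic two-blob dataset and all variable names are assumptions made for this example. It fits both models to the same points and shows the main practical difference: K-means assigns each point to exactly one cluster, while a Gaussian mixture also gives per-cluster membership probabilities.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Synthetic data (an assumption for illustration): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))
points = np.vstack([blob_a, blob_b])

# K-means: hard assignment of each point to the nearest cluster centre.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(points)

# Gaussian mixture: fits one Gaussian per component and yields
# soft (probabilistic) assignments in addition to hard labels.
gmm = GaussianMixture(n_components=2, random_state=0)
gmm_labels = gmm.fit_predict(points)
gmm_probs = gmm.predict_proba(points)  # shape: (n_points, n_components)

print("K-means centres:\n", kmeans.cluster_centers_)
print("GMM means:\n", gmm.means_)
```

Cluster label numbers are arbitrary, so label 0 from K-means need not correspond to label 0 from the mixture model.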

Data Cleaning Project

  • An example exercise for data cleanup and normalization. This is a very common problem.
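
To give a feel for what such an exercise can involve, here is a small sketch using pandas. It is not the project itself: the toy table, the column names, and the choice of min-max scaling as the meaning of "normalization" are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical messy input: inconsistent whitespace and case,
# numbers stored as text, a missing name, and a non-numeric placeholder.
raw = pd.DataFrame({
    "name": ["  Alice", "BOB ", "alice", None],
    "height_cm": ["170", "182", "170", "n/a"],
})

cleaned = raw.copy()

# Cleanup: make text fields consistent and convert numeric text to numbers.
cleaned["name"] = cleaned["name"].str.strip().str.lower()
cleaned["height_cm"] = pd.to_numeric(cleaned["height_cm"], errors="coerce")

# Drop rows with missing values, then rows that became exact duplicates.
cleaned = cleaned.dropna().drop_duplicates().reset_index(drop=True)

# Normalization (assumed here to mean min-max scaling to the range [0, 1]).
col = cleaned["height_cm"]
cleaned["height_scaled"] = (col - col.min()) / (col.max() - col.min())

print(cleaned)
```

Whether to drop or fill missing values depends on the dataset; dropping is used here only to keep the sketch short.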