# Maths references

## Linear algebra

Linear algebra comes up everywhere in ML. The following topics are needed to understand the optimization methods used in ML:
* Principal Component Analysis (PCA)
* Singular Value Decomposition (SVD)
* Eigendecomposition of a matrix
* LU Decomposition
* QR Decomposition/Factorization
* Symmetric Matrices
* Orthogonalization & Orthonormalization
* Matrix Operations
* Projections
* Eigenvalues & Eigenvectors
* Vector Spaces
* Norms

Links:
* [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/) by Prof. Gilbert Strang, offered through MIT OpenCourseWare
* https://www.khanacademy.org/math/linear-algebra
* [Coding the Matrix: Linear Algebra through Computer Science Applications](http://codingthematrix.com/) by Philip Klein, Brown University
* [Linear Algebra — Foundations to Frontiers](https://www.edx.org/course/linear-algebra-foundations-to-frontiers) by Robert van de Geijn, University of Texas
* Applications of Linear Algebra, [Part 1](https://www.edx.org/course/applications-of-linear-algebra-part-1) and [Part 2](https://www.edx.org/course/applications-of-linear-algebra-part-2). A newer course by Tim Chartier, Davidson College

### [Sparse matrix](https://en.wikipedia.org/wiki/Sparse_matrix)
In numerical analysis and scientific computing, a **sparse matrix** or sparse array is a matrix in which most of the elements are zero. By contrast, if most of the elements are nonzero, then the matrix is considered **dense**. The number of zero-valued elements divided by the total number of elements (e.g., m × n for an m × n matrix) is called the sparsity of the matrix (which is equal to 1 minus the density of the matrix). Using those definitions, a matrix is sparse when its sparsity is greater than 0.5.
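
The sparsity definition above can be checked with a few lines of plain Python. This is only a minimal sketch: the matrix `A` and the helper names `sparsity` and `to_dok` are made up for illustration, and the dictionary-of-keys layout is just one of several ways to store only the nonzero entries.

```python
def sparsity(matrix):
    """Fraction of zero-valued entries in a dense list-of-lists matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


def to_dok(matrix):
    """Keep only the nonzero entries as a {(row, col): value} dictionary."""
    return {
        (i, j): value
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    }


# Hypothetical 3 x 4 example matrix with 9 zeros out of 12 entries.
A = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 2],
]

s = sparsity(A)        # 9 / 12 = 0.75
density = 1 - s        # 0.25
is_sparse = s > 0.5    # True under the definition above
nonzeros = to_dok(A)   # {(0, 2): 3, (2, 0): 1, (2, 3): 2}
```

Storing three entries instead of twelve is the kind of saving that sparse formats exploit; in practice a library such as SciPy would normally be used instead of a hand-rolled dictionary.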


## Probability theory & statistics

Machine Learning and Statistics aren’t very different fields. Actually, someone recently defined Machine Learning as ‘doing statistics on a Mac’. Some of the fundamental statistics and probability theory topics needed for ML are:
* Combinatorics
* Probability Rules & Axioms
* Bayes’ Theorem
* Random Variables
* Variance and Expectation
* Conditional and Joint Distributions
* Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian)
* Moment Generating Functions
* Maximum Likelihood Estimation (MLE)
* Prior and Posterior
* Maximum a Posteriori Estimation (MAP)
* Sampling Methods

Links:
* https://www.khanacademy.org/math/probability
* [Statistics 110: Probability](https://projects.iq.harvard.edu/stat110/youtube) by Joe Blitzstein
* [All of Statistics: A Concise Course in Statistical Inference](http://read.pudn.com/downloads158/ebook/702714/Larry%20Wasserman_ALL%20OF%20Statistics.pdf)
* [Udacity’s Introduction to Statistics](https://www.udacity.com/course/intro-to-statistics--st101)

## Multivariate calculus

Some of the necessary topics include:
* Differential and Integral Calculus
* Partial Derivatives
* Vector-Valued Functions
* Directional Derivatives and Gradients
* Hessian
* Jacobian
* Laplacian
* Lagrangian (Lagrange multipliers)

Links:
* https://www.khanacademy.org/math/multivariable-calculus

## Algorithms & complexity

This is important for understanding the computational efficiency and scalability of our machine learning algorithms and for exploiting sparsity in our datasets. Knowledge of the following is needed:
* Data structures (Binary Trees, Hashing, Heap, Stack, etc.)
* Dynamic Programming
* Randomized & Sublinear Algorithms
* Graphs
* Gradient and Stochastic Gradient Descent
* Primal-Dual methods

Links:
* https://www.khanacademy.org/math/ap-calculus-ab/ab-diff-analytical-applications-new/ab-5-11/e/optimization

## Other topics

This comprises other math topics not covered in the four major areas described above. They include Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits, Cauchy Kernel, Fourier Transforms), Information Theory (Entropy, Information Gain), Function Spaces and Manifolds.

Links:
* [Boyd and Vandenberghe’s course on Convex optimization](http://stanford.edu/~boyd/cvxbook/) from Stanford

## Misc notes

**The real prerequisite for machine learning isn’t math, it’s data analysis** [link](https://www.r-bloggers.com/the-real-prerequisite-for-machine-learning-isnt-math-its-data-analysis/) 
* “Off the shelf” tools take care of the math for you; 
* Most data scientists don’t do much math; 
* 80% of your work will be data preparation, EDA, and visualization;
* For beginning practitioners, data hacking beats math

# Ideas

## Normalizing Flows

Links:
* http://akosiorek.github.io/ml/2018/04/03/norm_flows.html

Machine learning is all about probability. To train a model, we typically tune its parameters to maximise the probability of the training dataset under the model. To do so, we have to assume some probability distribution as the output of our model. The two distributions most commonly used are [Categorical](https://en.wikipedia.org/wiki/Categorical_distribution) for classification and [Gaussian](https://en.wikipedia.org/wiki/Normal_distribution) for regression. The latter case can be problematic, as the true probability density function (pdf) of real data is often far from Gaussian. If we use the Gaussian as likelihood for image-generation models, we end up with blurry reconstructions. We can circumvent this issue by adversarial training, which is an example of likelihood-free inference, but this approach has its own issues.
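
As a rough illustration of "tuning parameters to maximise the probability of the training dataset", the sketch below fits a one-dimensional Gaussian by maximum likelihood. The dataset and the function names are hypothetical, and the closed-form estimates stand in for the gradient-based training a real model would use.

```python
import math


def gaussian_log_likelihood(data, mean, std):
    """Sum of log N(x | mean, std^2) over all points in data."""
    return sum(
        -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)
        for x in data
    )


def fit_gaussian_mle(data):
    """Closed-form maximum-likelihood estimates of mean and standard deviation."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n  # MLE divides by n, not n - 1
    return mean, math.sqrt(variance)


# Hypothetical 1-D training set.
data = [1.0, 2.0, 3.0, 4.0, 5.0]

mean, std = fit_gaussian_mle(data)  # mean = 3.0, variance = 2.0
best = gaussian_log_likelihood(data, mean, std)
worse = gaussian_log_likelihood(data, mean + 1.0, std)  # a shifted mean fits worse
```

With a fixed `std`, maximising this log-likelihood over `mean` is the same as minimising the squared error, which is one way to see why a Gaussian likelihood tends to average over plausible outputs and produce blurry reconstructions.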

# Scikit-learn


Scikit-learn is a free software machine learning *library* for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

[github link](https://github.com/scikit-learn/scikit-learn)


# ML

## ML Pipelines

A **Pipeline** is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage.

There are two basic types of pipeline stages:
* Transformer - A Transformer takes a dataset as input and produces an augmented dataset as output. E.g., a tokenizer is a Transformer that transforms a dataset with text into a dataset with tokenized words ![](https://spark.apache.org/docs/latest/img/ml-PipelineModel.png)
* Estimator - An Estimator must first be fit on the input dataset to produce a model, which is a Transformer that transforms the input dataset. E.g., logistic regression is an Estimator that trains on a dataset with labels and features and produces a logistic regression model ![](https://spark.apache.org/docs/latest/img/ml-Pipeline.png)
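
The Transformer/Estimator split described above can be sketched in plain Python. This is a toy illustration, not the Spark ML or scikit-learn API: every class name here is hypothetical, datasets are plain lists of dictionaries rather than DataFrames, and a trivial majority-label learner stands in for logistic regression.

```python
class Tokenizer:
    """Transformer: adds a 'words' field by splitting the 'text' field."""

    def transform(self, rows):
        return [{**row, "words": row["text"].lower().split()} for row in rows]


class MajorityLabelModel:
    """Transformer produced by fitting: adds a constant 'prediction' field."""

    def __init__(self, label):
        self.label = label

    def transform(self, rows):
        return [{**row, "prediction": self.label} for row in rows]


class MajorityLabelEstimator:
    """Estimator: learns the most frequent label and returns a model."""

    def fit(self, rows):
        labels = [row["label"] for row in rows]
        return MajorityLabelModel(max(set(labels), key=labels.count))


class PipelineModel:
    """Fitted pipeline: a sequence of Transformers only."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Runs stages in order; fitting turns every Estimator into a Transformer."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):     # Estimator: fit it to get a model
                stage = stage.fit(rows)
            rows = stage.transform(rows)  # pass the transformed data onward
            fitted.append(stage)
        return PipelineModel(fitted)


# Hypothetical training data.
train = [
    {"text": "Spark is fast", "label": 1},
    {"text": "pipelines chain stages", "label": 1},
    {"text": "slow job", "label": 0},
]
model = Pipeline([Tokenizer(), MajorityLabelEstimator()]).fit(train)
result = model.transform([{"text": "Hello Pipeline"}])
```

Calling `fit` returns a `PipelineModel` that contains only Transformers, so the same sequence of steps is replayed unchanged on new data.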

