# Machine Learning

*Yuriy Sverchkov*

**BMI 773 Clinical Research Informatics**

March 23, 2020

## Lecture Goals

* Be able to define Machine Learning
* List the most common tasks that machine learning is used to solve
* Be able to distinguish between a supervised and unsupervised task
* Describe how some widely-used machine learning methods work
* Describe how machine learning methods are trained and evaluated

## What is Machine Learning

## Machine Learning Tasks
* Classification
* Regression
* Clustering

## The ML workflow for supervised classification/regression

1. Have a question about data of the form "can some set of measurements/observations predict an outcome?"
2. Turn your measurements and observations into numbers (some are numbers already, categorical variables should usually be 1-hot)
3. Turn your outome into a number (this is where it becomes clear if we're doing classification or regression)
4. Fit a mathematical model on some known outcomes
5. Test if it works
6. You now have a way of predicting the outcome from your measurements

In [None]:
# Where does training and testing vs fitting fit?

## Logistic Regression

Most of you are probably familiar with logistic regression in one context or another.

### Examples
(todo)

In [10]:
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression


data = load_breast_cancer()

print(data.DESCR)




.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, f

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [20]:
lr1 = LogisticRegression(penalty='l1', solver='liblinear', max_iter = 1000).fit(data.data, data.target)

NameError: name 'np' is not defined

In [34]:
import numpy as np
order = (-np.abs(lr.coef_)).argsort()
top2f = order[0,0:2]

In [37]:
lr = LogisticRegression(penalty='none').fit(data.data[:,top2f], data.target)
lr.intercept_

array([18.69375933])

### Mathematical notation

The model that is learned in logistic regression has the form:

$logit(p) = \beta x$

Note the simple structure of the model: features are multiplied by weights and summed, then, a function is applied to transform the result.

Let's look at the logit function more closely:

In [2]:
# Plot logit here

This function has an unbounded domain (goes all the way fron $-\infty$ to $+\infty$) and a range from 0 to 1.

Note that it is also *monotonically increasing*, that is, if $a > b$ then $logit(a) > logit(b)$.

This property makes it particularly easy to interpret what the weights mean: higher positive weights mean that x correlates with y, weights near 0 mean that x does not affect y much, and hight negative weights 

### LR as optimization
The task of learning a logistic regression can be viewed as an *optimization problem*.

**Optimization** is the act? study? (TODO:look up) of finding the maximum or minimum of some function.

In LR, we want to find $\beta$ that minimizes
$$ todo formula here $$
This is also called the *cross-entropy*.

Most machine learning model learning can be viewed as an optimization problem where the task is to find some parameters that minimize a cost function.

In [1]:
# 3d plot of logistic regression on 2 variables --
# different angles show (a) the shape of the logit function
# (b) the linear separator
# (c) the x-beta line.

## Artificial Neural Networks

Artificial neural networks have seen much success in recent years.

[find examples]

### The structure of a neural network

[image]

### LR as a neural network

In logistic regression we have our input vector $x$, the inputs are multiplied by weights $\beta$, the result is transformed by a nonlinear function (the expit).
This can be expressed as a neural network with no hidden layers:

[image]

### ANN as optimization

Like with LR, learning an ANN is an optimization task:

* cost function

### A second look at examples?

### (optional) Network structure tailored to the task

* Hierarchical networks
* CNN
* RNN (very maybe)

## Decision trees

[image of a decision tree]

What is the optimization problem?

A greedy optimisation strategy (greedy strategies are not always perfect)



## ML pitfalls

- examples from recent lecture