# Introduction

Here we present a brief introduction to __machine learning__.

To start, we give a definition of machine learning:

> __Well-posed Learning Problem__: A computer program is said to _learn_ from experience $E$ with respect to some task $T$ and some performance measure $P$, if its performance on $T$, as measured by $P$, improves with experience $E$.

In other words, we say a computer learns a task if it gets better at performing it as it accumulates experience.

Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task $T$ in this setting?

* Classifying emails as spam or not spam.
* Watching you label emails as spam or not spam.
* The number (or fraction) of emails correctly classified as spam/not spam.
* None of the above—this is not a machine learning problem.
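
The three ingredients of the definition ($E$, $T$, $P$) can be made concrete with a toy sketch. Everything below is an illustrative assumption: the emails are invented, and the keyword-based `train`, `classify` and `accuracy` helpers are hypothetical stand-ins, not how a real spam filter works. The point is only that the performance measured on the task improves as more labelled emails are seen.

```python
def train(labeled_emails):
    """Experience E: collect words that appear only in emails marked as spam."""
    spam_words, ham_words = set(), set()
    for text, is_spam in labeled_emails:
        (spam_words if is_spam else ham_words).update(text.lower().split())
    return spam_words - ham_words


def classify(text, spam_words):
    """Task T: label an email as spam (True) or not spam (False)."""
    return any(word in spam_words for word in text.lower().split())


def accuracy(test_emails, spam_words):
    """Performance P: fraction of emails classified correctly."""
    correct = sum(classify(text, spam_words) == is_spam
                  for text, is_spam in test_emails)
    return correct / len(test_emails)


# Hypothetical emails the user has already marked (text, is_spam).
experience = [
    ("win money now", True),
    ("meeting at noon", False),
    ("cheap pills online", True),
    ("lunch at noon tomorrow", False),
]

# Hypothetical unseen emails used to measure performance.
test_emails = [
    ("win a free prize", True),
    ("cheap watches", True),
    ("project meeting tomorrow", False),
    ("see you at lunch", False),
]

# Performance after seeing 0, 2 and 4 labelled emails.
accuracies = [accuracy(test_emails, train(experience[:n])) for n in (0, 2, 4)]
print(accuracies)  # [0.5, 0.75, 1.0]
```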

## Machine Learning algorithms

Machine learning algorithms can be split into two broad classes:

1. Supervised learning
2. Unsupervised learning

Other classes exist (recommender systems, reinforcement learning, etc.), but they play only a marginal role in our discussion.

### Supervised Learning

__Example__: Housing price prediction

![title](https://www.researchgate.net/profile/Ahmad_Al_Musawi/publication/323108787/figure/fig4/AS:592692094455810@1518320215663/Housing-price-prediction.png)



We want to predict how much a house of a given size, for instance $750$ ft$^2$, will cost.

We have at our disposal the __right answers__ (_i.e._ the prices) for several data points. From these we perform a so-called _regression_ (_i.e._ the prediction of continuous outputs).
The presence of __right answers__, or __labels__, in our dataset is what makes this a supervised learning problem.
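
As a minimal sketch of such a regression, the snippet below fits a straight line to a handful of labelled points by ordinary least squares and uses it to price a $750$ ft$^2$ house. The sizes and prices are invented for illustration (they are not the data in the figure), and `fit_line` and `predict_price` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled data: sizes in ft^2 and known prices in $1000s.
sizes = [500.0, 1000.0, 1500.0, 2000.0, 2500.0]
prices = [150.0, 200.0, 250.0, 300.0, 350.0]

slope, intercept = fit_line(sizes, prices)


def predict_price(size_ft2):
    """Predicted price (in $1000s) for a house of the given size."""
    return slope * size_ft2 + intercept


predicted = predict_price(750.0)
print(f"Predicted price for 750 ft^2: {predicted:.1f} thousand dollars")
```

A straight line is only the simplest choice of model; the same labelled data could equally be fitted with a curve.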

### Unsupervised Learning

![title](https://lakshaysuri.files.wordpress.com/2017/03/sup-vs-unsup.png?w=648)

The main difference between supervised and unsupervised learning is the presence (or absence) of labels in the dataset.
The most important examples of unsupervised learning algorithms are the clustering algorithms, which try to group unlabelled data into clusters of similar points.
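
To give a flavour of clustering, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data. The points and the initial centroids are invented for illustration, and `k_means_1d` is a hypothetical helper; note that no labels are passed in anywhere.

```python
def k_means_1d(points, initial_centroids, n_iterations=10):
    """Group 1-D points around centroids by alternating two steps."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(n_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical unlabelled data with two visible groups.
points = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centroids, clusters = k_means_1d(points, initial_centroids=[0.0, 5.0])

print(centroids)  # [1.5, 9.0]
print(clusters)   # [[1.0, 1.5, 2.0], [8.0, 9.0, 10.0]]
```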

Of the following examples, which would you address using an unsupervised learning algorithm?

1. Given email labeled as spam/not spam, learn a spam filter.
2. Given a set of news articles found on the web, group them into sets of articles about the same story.
3. Given a database of customer data, automatically discover market segments and group customers into different market segments.
4. Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.

In [2]:
from IPython.display import HTML

# prezi
HTML('<iframe width="990" height="720" src="https://prezi.com/view/P0IKLSeFVsCTMMkmhMZk/embed" frameborder="0" allowfullscreen></iframe>')


# A Few Concepts of Statistics

The aim of this section (with no surprise) is to introduce the basic concepts of statistics.
In particular, we first go very rapidly through the basics (random variables, distributions, etc.) and apply this (hopefully already familiar) material to some exercises; then we move on to the theoretical basis of machine learning, _i.e._ statistical inference.

Hence, we can summarise the content of this section as follows:

* Basics
 - Probability measure
 - Random variables
 - Distributions
 - Mean, median, variance
* Relations between quantities
 - Correlation
 - Linear regression
 - General laws of regression
* Statistical inference
 - Bayesian inference
 - Frequentist inference

## Basics

To start, we give a definition of __statistics__.

> Statistics is the discipline that concerns the collection, organization, displaying, analysis, interpretation and presentation of data.

In applying statistics to some context, it is conventional to begin with a so-called _statistical population_ or a _statistical model_ to be studied.

_E.g._, we may want to study how many of the people in this class are blonde; here the people in the class are the statistical population.

Two main statistical methods are used in data analysis: __descriptive statistics__, which summarise data from a sample using indexes such as the _mean_ or _standard deviation_, and __inferential statistics__, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). We will focus on the latter; however, we are going to give a brief description of a few useful concepts of the former.

Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.
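
As a small illustration of central tendency and dispersion, the sketch below summarises an invented sample with Python's standard `statistics` module. The numbers are hypothetical and chosen only so that the results are easy to check by hand.

```python
import statistics

# Hypothetical sample of eight measurements.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Central tendency: where the typical value lies.
mean = statistics.mean(sample)
median = statistics.median(sample)

# Dispersion: how far the values spread around the centre.
population_std = statistics.pstdev(sample)  # treats the data as the whole population
sample_std = statistics.stdev(sample)       # treats the data as a sample (divides by n - 1)

print(f"mean = {mean}, median = {median}")
print(f"population std = {population_std:.3f}, sample std = {sample_std:.3f}")
```

The two standard deviations differ because the sample version divides by $n - 1$ instead of $n$, which matters most for small samples.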

In [5]:
# prezi
HTML('<iframe id="iframe_container" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen="" allow="autoplay; fullscreen" width="990" height="720" src="https://prezi.com/embed/xxw5tb6fueqc/?bgcolor=ffffff&amp;lock_to_path=0&amp;autoplay=0&amp;autohide_ctrls=0&amp;landing_data=bHVZZmNaNDBIWnNjdEVENDRhZDFNZGNIUE43MHdLNWpsdFJLb2ZHanI0aTBIeXpGSk1KTXRDSzFJaEZYRE00Sm1BPT0&amp;landing_sign=ms2nKGTS6hwcZtfKj95YmDIqIYsvddwELYxwjm03060"></iframe>')


## Statistical Inference

Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

Statistical inference makes propositions about a population, using data drawn from the population with some form of sampling. Given a hypothesis about a population, for which we wish to draw inferences, statistical inference consists of (first) selecting a statistical model of the process that generates the data and (second) deducing propositions from the model.

### Example

Suppose a new drug has been tested on a small number of people. With inferential statistics you take the data from that sample and try to determine whether they can predict if the drug will work for everyone (i.e. the population). There are various ways you can do this, from calculating a z-score (a z-score tells you how many standard deviations a value lies from the mean, and hence where it would fall in a normal distribution) to post-hoc (advanced) testing.
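
The z-score mentioned above is simply $z = (x - \mu) / \sigma$. The sketch below computes it for an invented measurement and then uses the standard normal distribution to find the fraction of values expected to lie below it. The mean, standard deviation and measurement are hypothetical, and the result is only meaningful if the data are roughly normally distributed.

```python
from statistics import NormalDist


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / std_dev


# Hypothetical values: population mean 100, standard deviation 15, measurement 130.
z = z_score(130.0, mean=100.0, std_dev=15.0)

# Fraction of a standard normal distribution lying below this z-score.
fraction_below = NormalDist().cdf(z)

print(f"z = {z:.1f}, fraction below = {fraction_below:.3f}")
```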

Let's say we want to know the cost of a house. We can make use of inferential statistics in this way: we collect the prices listed by estate agencies and calculate the average cost per square meter in the area of our interest. Then, knowing the size of the house we want, we can get an estimate of its cost.
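
The house-price estimate just described can be sketched in a few lines. The listings below are invented, as is the 90 m$^2$ target size; the standard error is included only to show that an estimate drawn from a sample comes with some uncertainty.

```python
import math
import statistics

# Hypothetical listings from estate agencies: (size in m^2, asking price).
listings = [
    (50.0, 100_000.0),
    (80.0, 168_000.0),
    (100.0, 190_000.0),
    (120.0, 252_000.0),
]

# Price per square meter of each listing in the sample.
price_per_m2 = [price / size for size, price in listings]

# Sample mean, used as the estimate of the area's average price per square meter.
mean_price_per_m2 = statistics.mean(price_per_m2)

# Standard error of that mean: sample standard deviation divided by sqrt(n).
standard_error = statistics.stdev(price_per_m2) / math.sqrt(len(price_per_m2))

# Estimated cost of a house of the size we want.
wanted_size_m2 = 90.0
estimated_cost = mean_price_per_m2 * wanted_size_m2

print(f"Average price per m^2: {mean_price_per_m2:.0f} (+/- {standard_error:.0f})")
print(f"Estimated cost of a {wanted_size_m2:.0f} m^2 house: {estimated_cost:.0f}")
```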

![title](https://www.statcrunch.com/grabimageforreport.php?reportid=5647&image_id=386301)

#### Further reading

[This article](https://statisticsbyjim.com/basics/descriptive-inferential-statistics/) contains instructive examples of inferential and descriptive statistics.