Machine Learning: Artificial Intelligence x Data Science

AI -> Cognitive Systems (thinking like humans) vs Machine Learning

#### Conundrums in AI:
1. Intelligent agents have limited resources (computational speed, memory) -> But many problems are computationally intractable.
2. Computation is local, but problems have global constraints.
3. Logic is deductive, but many problems are not (they are abductive or inductive).
4. The world is dynamic, but knowledge is limited. An AI agent always begins with what it knows -> How does it address new problems?
5. Problem solving, reasoning and learning are complex, but explanation and justification are even more complex.

#### Characteristics of AI Problems:
1. Knowledge often arrives incrementally.
2. Problems exhibit recurring patterns.
3. Problems have multiple levels of granularity.
4. Many problems are computationally intractable.
5. The world is dynamic, but knowledge of the world is static.
6. The world is open-ended, but knowledge is limited.

(From [Knowledge-Based AI](https://www.udacity.com/course/knowledge-based-ai-cognitive-systems--ud409?_ga=1.192741295.463903328.1463823313))

#### AI As Uncertainty Management
AI = what to do when you don't know what to do

Reasons for uncertainty:
- Sensor limits
- Adversaries
- Stochastic environments (rolling dice)
- Laziness (could compute what the situation is, but too lazy to do it)
- Ignorance (could know something, but just don't care)

(From Udacity, Sebastian Thrun)

e.g.: Watson (answering Jeopardy questions)

Process:
- Read clue (understand natural language sentences)
- Search through knowledge base
- Decide on answer
- Phrase answer

Specifics:
- Know of the potential answers (e.g. Michael Phelps, Hey Jude) and know information pertaining to the potential answers
- Understand the statement: Interpret words in context. May need to interpret puns.
- Know the format of the answer
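
The four process steps above can be sketched as a toy pipeline. This is only an illustration of the read -> search -> decide -> phrase flow, not how Watson actually works: the two-entry `KNOWLEDGE_BASE`, the keyword-overlap scoring and all function names are assumptions made up for this example.

```python
# Hypothetical knowledge base: each potential answer with a few facts about it.
KNOWLEDGE_BASE = {
    "Michael Phelps": {"kind": "person",
                       "facts": {"swimmer", "olympic", "gold", "medals"}},
    "Hey Jude": {"kind": "thing",
                 "facts": {"beatles", "song", "1968", "single"}},
}


def read_clue(clue):
    """Read clue: reduce the sentence to a set of lowercase words."""
    return {word.strip(".,!?").lower() for word in clue.split()}


def search(words):
    """Search the knowledge base: count fact words shared with the clue."""
    return {name: len(words & entry["facts"])
            for name, entry in KNOWLEDGE_BASE.items()}


def decide(scores):
    """Decide on the answer: pick the best-scoring candidate."""
    return max(scores, key=scores.get)


def phrase(answer):
    """Phrase the answer in the expected format (a question)."""
    pronoun = "Who" if KNOWLEDGE_BASE[answer]["kind"] == "person" else "What"
    return f"{pronoun} is {answer}?"


def answer_clue(clue):
    return phrase(decide(search(read_clue(clue))))


print(answer_clue("This swimmer won eight gold medals at one Olympic games."))
```

A real system needs far more than word overlap (context, puns, confidence estimates), which is exactly what the 'Specifics' list points at.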

Core **deliberation processes**:
1. Reasoning (read and generate natural language sentences)
2. Learning (make decisions and see if those decisions are correct or not -> Change)
3. Memory (Store knowledge and what we learn)

[img](images/intro-1.png)

#### Four schools of thought of AI
[Four quadrants (schools of thought) of AI](images/intro-2.png)

Thinking vs acting,
Optimally vs like humans.

Knowledge-based AI: interested in agents that think like humans.
Examples:
[Examples of applications in each school of thought of AI](images/intro-3.png)

E.g. autonomous vehicle: acts (and thinks?) optimally.

Patterns of knowledge-based data: AI behaviour 

[Categorising four examples](images/intro-4.png)

### Bayes' Rule

$$P(A|B) = \frac{P(B|A)*P(A)}{P(B)}$$

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Marginal likelihood}}$$

Likelihood: If we knew the cause (A), what would be the probability of the evidence (B) we just observed? That gives P(B|A), but we want P(A|B). To correct for the inversion, we multiply by the prior (and divide by the marginal likelihood).

$$P(B) = \sum_aP(B|A=a)P(A=a)$$

(Total probability)
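
A minimal sketch of the two formulas above for a binary cause A, where the sum over a has just two terms (A and not A). The function name `posterior` and the numbers are assumptions chosen for illustration.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(A|B) for a binary cause A.

    prior:                P(A)
    likelihood:           P(B|A)
    likelihood_given_not: P(B|not A)
    """
    # Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    marginal = likelihood * prior + likelihood_given_not * (1 - prior)
    # Bayes' rule: posterior = likelihood * prior / marginal likelihood
    return likelihood * prior / marginal


# Hypothetical numbers: a rare cause (1%) and imperfect evidence.
p = posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.2)
print(round(p, 4))
```

With these made-up numbers the posterior stays small (about 4%) even though the likelihood is 0.9, because the prior is so low.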

#### Bayes Network
[Bayes Network](images/intro-5.png)

Number of parameters in this Bayes network: 3, namely P(A), P(B|A) and P(B|not A).
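
As a sketch of why three numbers are enough, the code below rebuilds the full joint distribution of a two-node network A -> B from exactly those three parameters. It assumes both variables are binary; the function name `joint_table` and the parameter values are made up for illustration.

```python
from itertools import product


def joint_table(p_a, p_b_given_a, p_b_given_not_a):
    """Return {(a, b): P(A=a, B=b)} for the two-node network A -> B."""
    table = {}
    for a, b in product([True, False], repeat=2):
        pa = p_a if a else 1 - p_a
        # P(B=true | A=a) depends on which value A took.
        pb_true = p_b_given_a if a else p_b_given_not_a
        pb = pb_true if b else 1 - pb_true
        table[(a, b)] = pa * pb
    return table


# Hypothetical parameter values.
joint = joint_table(p_a=0.3, p_b_given_a=0.8, p_b_given_not_a=0.1)

# Marginal of the evidence, then the posterior of the unseen cause.
p_b = joint[(True, True)] + joint[(False, True)]
p_a_given_b = joint[(True, True)] / p_b
print(p_b, p_a_given_b)
```

The four joint entries are all determined by the three parameters, since each pair of complementary probabilities must sum to 1.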

Much of working with data is about discerning the unseen cause of the data that we can see.

## Data Science

[What is a data scientist?](images/intro-ds1.png)

'Substantive Expertise':
- Know which questions to ask
- Can interpret the data well
- Understands structure of the data

But data scientists often work in teams so they can complement each other's strengths and weaknesses.

[Data Science Process](images/intro-ds2.png)



## Machine Learning

What is ML?

Philosophy of ML:
- Theoretical (Michael) vs Practical (Charles)


Theoretical: ML is computational statistics that is about proving theorems.
Practical: ML is the broader notion of building computational artifacts that learn over time based on experience. Applied stats.

(They are hilarious.)

Supervised learning:
- Taking labelled datasets and gleaning information from them so you can label new datasets.
- Function approximation
- Approximate function induction
-> Make assumptions about the world, e.g. that there is a well-behaved function that fits the data and generalises.

Supervised learning is about **inductive bias**. Specifics -> Generalities.

Vs deduction: Generalities -> Specifics.
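
A minimal sketch of supervised learning as function approximation: induce a general rule (a straight line) from specific labelled examples, then use it on an unseen input. The choice of a line is itself the inductive bias here. The data points and the names `fit_line` and `predict` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Specifics: a handful of labelled examples (hypothetical data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Generality: the induced rule.
slope, intercept = fit_line(xs, ys)


def predict(x):
    return slope * x + intercept


print(predict(5.0))  # label for an input that was never seen
```

Nothing guarantees the prediction is right: it is only as good as the assumption that the underlying function is roughly linear.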

### Induction, deduction and abduction

[ida](images/intro-ida.png)

Deduction: Given the rule and the cause, deduce the effect. (Truth-preserving)

[d](images/intro-d.png)

Induction: Given a cause and an effect, induce a rule. (Correctness not guaranteed.)

[i](images/intro-i.png)

Abduction: Given a rule and an effect, abduce a cause. (Correctness not guaranteed.)

[a](images/intro-a.png)
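
The three modes of inference can be sketched over a toy rule that maps each cause to its effect. The `RULE` table and the function names are assumptions invented for this example.

```python
# Toy world: a rule maps a cause to its effect.
RULE = {"rain": "wet grass", "sprinkler": "wet grass", "sun": "dry grass"}


def deduce(rule, cause):
    """Rule + cause -> effect. Truth-preserving if the rule is correct."""
    return rule[cause]


def induce(observations):
    """(cause, effect) pairs -> rule. Only as good as the examples seen."""
    return dict(observations)


def abduce(rule, effect):
    """Rule + effect -> candidate causes. Several may fit; none is guaranteed."""
    return sorted(cause for cause, eff in rule.items() if eff == effect)


print(deduce(RULE, "rain"))
print(induce([("rain", "wet grass"), ("sun", "dry grass")]))
print(abduce(RULE, "wet grass"))
```

Abduction returns two candidate causes for wet grass, which shows why its correctness is not guaranteed: the evidence alone cannot tell them apart.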

ML is about **inducing a rule**. The rule doesn't have to be causal - correlations are useful too.

E.g. apply the induced rule abductively to figure out where insider trading has occurred.

## Unsupervised Learning

**Description or summarisation** (vs supervised learning -> Approximation).
Just have input, no given labels. Derive structure from input.

Differences with supervised learning:
- All ways of dividing up the world are in a way equally good (absent other signals telling you something is good or not good).
- Unsupervised learning can be helpful in supervised learning.

[unsup](images/intro-unsup.png)
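
A minimal sketch of deriving structure from unlabelled input, using k-means on one-dimensional points. K-means is just one possible technique, swapped in here for illustration. The data, the starting centres and the name `kmeans_1d` are assumptions.

```python
def kmeans_1d(points, centres, iterations=10):
    """Alternate between assigning points to the nearest centre
    and moving each centre to the mean of its points."""
    centres = list(centres)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centre.
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters


# Input only, no labels (hypothetical data).
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centres, clusters = kmeans_1d(points, centres=[0.0, 6.0])
print(centres)
print(clusters)
```

The algorithm summarises six points as two groups, but nothing in the data says that two groups is the 'right' description; that choice came from outside.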

## Reinforcement Learning

Learning from delayed reward, vs supervised learning's 'here's what you should do'.

E.g. Playing tic-tac-toe -> lost -> learn which moves were important (bad).

Reinforcement learning is in a sense harder than supervised learning because you're not told what to do.
Like playing a game without knowing any of the rules but being told once in a while that you've won or you've lost.
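
A minimal sketch of learning from delayed reward: the learner only sees win or loss at the end of a game, and credits every move in that game with the final outcome as a running average. This is one simple credit-assignment scheme, not the only one. The game logs, move names and `update_values` are made up for illustration.

```python
from collections import defaultdict


def update_values(values, counts, episode, reward):
    """Credit every move in the episode with the final reward,
    keeping a running average per move."""
    for move in episode:
        counts[move] += 1
        values[move] += (reward - values[move]) / counts[move]


values = defaultdict(float)
counts = defaultdict(int)

# Hypothetical game logs: the moves made, then +1 for a win or -1 for a loss.
games = [
    (["centre", "corner"], +1),
    (["edge", "corner"], -1),
    (["centre", "edge"], +1),
    (["edge", "edge"], -1),
]
for moves, outcome in games:
    update_values(values, counts, moves, outcome)

# Moves that tended to appear in wins end up with higher values.
print(dict(values))
```

No one ever tells the learner which individual move was bad; it has to infer that from which moves keep showing up in lost games.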

## Comparison of three parts of ML

Supervised: Labels. 
Unsupervised: Don't know if one cluster is better than another.
-> But there is an assumed set of labels because you're clustering.

- In many cases you can formulate these problems as some sort of optimisation.
    - SL: Labels data well
    - RL: Behaviour scores well
    - UL: Cluster scores well
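
The three objectives above can be sketched as three score functions to maximise. The specific scores (accuracy, total reward, negative within-cluster spread) and the function names are assumptions picked for illustration; many other choices are possible.

```python
def sl_score(predicted, actual):
    """Supervised: fraction of labels predicted correctly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def rl_score(rewards):
    """Reinforcement: total reward collected by a behaviour."""
    return sum(rewards)


def ul_score(clusters):
    """Unsupervised: negative within-cluster squared spread,
    so tighter clusters score higher."""
    total = 0.0
    for cluster in clusters:
        centre = sum(cluster) / len(cluster)
        total += sum((x - centre) ** 2 for x in cluster)
    return -total


print(sl_score(["a", "b", "a"], ["a", "b", "b"]))
print(rl_score([0, 0, 1]))
print(ul_score([[1.0, 3.0], [10.0]]))
```

In each case learning amounts to searching for the labelling function, behaviour or clustering that makes its score as high as possible.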

One view:
Computer scientists think in terms of algorithms and theorems, vs ML, where data is central. Or the two are co-equal.

