<img src='images/gdd-logo.png' width='300px' align='right' style="padding: 15px">

# Advanced Data Science with Python

Welcome to GoDataDriven's **Taster** for the Advanced Data Science with Python course!

In this course we cover different aspects of advanced machine learning that can help you improve your model development, and we also look at some unsupervised learning.

For this taster we will look at one topic from the course: **Feature Engineering**. 

### Pre-requisites:

Since this is an advanced course, it's good to be very comfortable with the following:

- Building a simple model with `scikit-learn`
- scikit-learn pipelines
- Using pipelines in pandas (or, at the very least, pandas method chaining)
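As a quick refresher on these prerequisites, here is a minimal sketch that chains a pandas step with `DataFrame.pipe()` and fits a scikit-learn `Pipeline`. The `add_petal_area` helper is hypothetical, and the iris dataset and `LogisticRegression` are illustrative choices, not course material.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def add_petal_area(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: adds petal length times petal width as a new column
    return df.assign(petal_area=df["petal length (cm)"] * df["petal width (cm)"])


iris = load_iris(as_frame=True)
X = iris.data.pipe(add_petal_area)  # pandas chaining via .pipe()
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# scikit-learn pipeline: scale the features, then fit a classifier
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```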

	



_**There will be exercises!** Please visit_ [gdd.li/taster_adwsp](https://gdd.li/taster_adwsp) _to do the exercises & code along!_

---
<img src='images/ml.jpeg' width='300px' align='right' style="padding: 20px">

## Machine Learning Refresher

Machine Learning has been a **buzzword** for years, driven by the large volumes of data produced by applications, growing computing power and the development of better algorithms. Machine Learning is used everywhere from automating mundane tasks to offering intelligent insights.

There are many examples of ML in use:

- **Prediction:** Using past data to make predictions about the future.
- **Image recognition:** Identifying potential fraudsters, checking identities at border control or logging in to personal devices.
- **Speech recognition:** Translating spoken words into text, voice search, voice dialing and more.
- **Medical diagnoses:** Machines can be trained to recognize cancerous tissue in imagery.
- **Financial industry and trading:** Companies use ML in fraud investigations and credit checks.

## A Quick History of Machine Learning

ENIAC (Electronic Numerical Integrator and Computer), one of the first general-purpose electronic computers, was built in the 1940s. At the time, the word “computer” referred to a person who carried out intensive numerical calculations, so ENIAC was called a numerical computing machine. You may think this has nothing to do with learning, but from the beginning the idea was to build a machine able to emulate human thinking and learning.

<img src='images/ml-history.jpeg' width=400px>

In the 1950s, Arthur Samuel wrote one of the first computer game programs that could learn: a checkers player that got better the more it played. His 1959 paper on this program helped popularize the term “machine learning”.

<img src='images/ml-timeline.png'>

Machine learning became widely popular in the 1990s, largely thanks to statistics. The intersection of computer science and statistics gave birth to probabilistic approaches in AI, which shifted the field further toward data-driven methods. With large-scale data available, scientists started to build intelligent systems that could analyze and learn from large amounts of data.

---

## What is Machine Learning?

Machine Learning algorithms enable computers to learn from data, and even improve with experience, without being explicitly programmed.

Machine learning algorithms can be classified into three types: supervised, unsupervised and reinforcement learning.

![](images/ml-robot.jpeg)

Some also consider ***semi-supervised*** learning to be a fourth type. Here some of the data is **labeled** but most of it is **unlabeled**, and a **mixture of supervised and unsupervised** techniques can be used.
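A minimal semi-supervised sketch, assuming scikit-learn's `LabelSpreading`: we hide most of the iris labels (scikit-learn marks unlabeled samples with `-1`) and let the model spread the few known labels to the rest. The 80% unlabeled fraction and `n_neighbors=7` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Hide roughly 80% of the labels; -1 means "unlabeled" to scikit-learn
rng = np.random.RandomState(42)
unlabeled = rng.rand(len(y)) < 0.8
y_partial = y.copy()
y_partial[unlabeled] = -1

# Spread the known labels to nearby unlabeled points
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)

# transduction_ holds the label the model assigned to every training sample
inferred = model.transduction_[unlabeled]
accuracy = (inferred == y[unlabeled]).mean()
print(f"Accuracy on the originally unlabeled samples: {accuracy:.2f}")
```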

---

## Overview of Supervised Learning Algorithms
Supervised learning is the machine learning task of inferring a function from **labeled training data**. 

<img src='images/reg-v-class.png' width='300px' align='right' style="padding: 30px">

### Steps to success

- Training data with labeled examples
- Find optimal model parameters
- Predict unknown labels
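These three steps map directly onto scikit-learn's `fit`/`predict` API. A minimal sketch, assuming synthetic data from `make_classification` and a shallow decision tree as the model:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 1: training data with labeled examples (synthetic, for illustration)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 2: find model parameters by fitting on the labeled examples
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Step 3: predict labels for examples the model has not seen
predictions = model.predict(X_new)
accuracy = (predictions == y_new).mean()
print(predictions[:10])
print(f"Accuracy on held-out data: {accuracy:.2f}")
```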

### Types of Supervised Learning:

- **Classification:** A classification problem is when the output variable is a category, such as “red” and “blue” or “disease” and “no disease”.
- **Regression:** A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
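To make the difference concrete, here is a sketch that uses the same input for both tasks. The house-size data is synthetic and hypothetical: `LinearRegression` predicts a real-valued price, while `LogisticRegression` predicts a category.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical input: house size in square metres
size = rng.uniform(50, 200, size=(200, 1))
# Regression target: a real-valued price (thousands of dollars)
price = 3.0 * size[:, 0] + rng.normal(0, 20, size=200)
# Classification target: a category derived from the price
category = np.where(price > 350, "expensive", "affordable")

reg = LinearRegression().fit(size, price)
clf = LogisticRegression(max_iter=1000).fit(size, category)

new_houses = np.array([[60.0], [180.0]])
print(reg.predict(new_houses))  # real numbers
print(clf.predict(new_houses))  # category labels
```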

---

## Overview of Unsupervised Learning Algorithms
Unsupervised learning is the machine learning task of inferring a function to describe **hidden structures** from **unlabeled data**.

### Steps to success

<img src='images/unsupervised.png' width='600px' align='right' style="padding: 30px">

- Only have input data with no corresponding output
- Model the underlying structure or distribution
- No correct answer/model - no teacher
- Algorithms are left on their own to discover the structure
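A minimal clustering sketch with scikit-learn's `KMeans`, assuming synthetic data from `make_blobs`. We discard the true group labels so the model only sees inputs. Note that we have to choose the number of clusters ourselves; `n_clusters=3` is an assumption here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Input data only: make_blobs also returns the true groups, which we throw away
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# KMeans discovers groupings without any labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # one centre per discovered group
print(labels[:10])              # cluster index assigned to each sample
```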

### Types of Unsupervised learning

- **Clustering:** A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
- **Association:** An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
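Association rules are commonly scored with *support* (how often X and Y appear together) and *confidence* (how often Y appears in baskets that contain X). A hand-rolled sketch on hypothetical shopping baskets, using plain pandas rather than a dedicated association-rule library:

```python
import pandas as pd

# Hypothetical baskets: one row per transaction, True if the item was bought
baskets = pd.DataFrame(
    {
        "bread": [True, True, False, True, True],
        "butter": [True, True, False, False, True],
        "milk": [False, True, True, True, False],
    }
)


def rule_stats(df: pd.DataFrame, x: str, y: str) -> dict:
    """Support and confidence for the rule 'people who buy x also buy y'."""
    both = (df[x] & df[y]).mean()  # fraction of baskets containing x and y
    support_x = df[x].mean()       # fraction of baskets containing x
    return {"support": both, "confidence": both / support_x}


# Here support = 3/5 = 0.6 and confidence = 0.6 / 0.8 = 0.75 (up to float rounding)
stats = rule_stats(baskets, "bread", "butter")
print(stats)
```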

---
 
## Overview of Reinforcement Learning
A reinforcement learning algorithm, or agent, learns by interacting with its environment. The agent receives rewards for performing correctly and penalties for performing incorrectly. The agent learns without intervention from a human by maximizing its reward and minimizing its penalty. 
<img src='images/reinforcement.png' width='400px' align='right' style="padding: 30px">
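To illustrate this reward-driven loop, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm, on a hypothetical five-cell corridor. The environment, rewards and hyperparameters (`alpha`, `gamma`, `epsilon`) are assumptions made for illustration; systems like AlphaGo use far more sophisticated methods.

```python
import numpy as np

# Hypothetical environment: a corridor of 5 cells, agent starts in cell 0.
# Reaching the last cell gives +1 and ends the episode; every other step costs -0.01.
N_STATES, ACTIONS = 5, (-1, +1)  # actions: move left, move right
GOAL = N_STATES - 1


def step(state: int, action: int) -> tuple[int, float, bool]:
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, -0.01, False


rng = np.random.default_rng(0)
q = np.zeros((N_STATES, len(ACTIONS)))  # estimated value of each (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate

for _ in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: usually take the best-known action, sometimes explore
        if rng.random() < epsilon:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(q[state]))
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move the estimate toward reward + discounted best future value
        target = reward if done else reward + gamma * q[next_state].max()
        q[state, a] += alpha * (target - q[state, a])
        state = next_state

policy = ["left" if np.argmax(q[s]) == 0 else "right" for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should move right in every cell, since that is the quickest way to collect the reward and avoid step penalties.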

#### Examples of Reinforcement Learning

- Chess
- Go - AlphaGo
- Dota 2 - OpenAI Five

## Summary of algorithms

|Algorithm|Short Summary|
|:---:|:---:|
|**Supervised**|All data is **labeled** and the algorithm **learns to predict** the output from the data input|
|**Unsupervised**|All data is **unlabeled** and the algorithm **learns the inherent structure** from the input data|
|**Reinforcement Learning**|An agent **maximizes the level of reward** by interacting with an environment with **no intervention from a human**.|

<img src='images/choosing.jpeg' width='300px' align='right' style="padding: 30px">

## So what do we focus on in Advanced Machine Learning?

**MACHINE LEARNING**
- **Choosing the right model**
- **Feature Engineering**
- Feature Selection
- Unsupervised Learning: Clustering

**ADVANCED PYTHON**
- Object Oriented Programming (OOP) & OOP in Scikit-learn
- **Functions, Decorators & Pandas Pipelines**
- Coding best practices
- ML in Production & Streamlit
- Hackathon: Refactoring ML Project

The items in bold are covered (or touched upon) in today's taster.

Let's take a look at our main topic of this taster: [Feature Engineering](01-feature-engineering.ipynb)