# End-to-End Machine Learning Project

In this project, we will go through an example project end to end, pretending to be a recently hired data scientist at a real estate company. Here are the main steps we will go through:
<ul>
<li>Look at the big picture.</li>
<li>Get the data.</li>
<li>Discover and visualize the data to gain insights.</li>
<li>Prepare the data for Machine Learning algorithms.</li>
<li>Select a model and train it.</li>
<li>Fine-tune your model.</li>
<li>Present your solution.</li>
<li>Launch, monitor, and maintain your system.</li>
</ul>

These notes are based on the excellent book [Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow](https://learning.oreilly.com/library/view/hands-on-machine-learning/9781492032632/) by Aurélien Géron.

Aurélien offers a checklist of eight main steps for any Machine Learning project, which can be adapted as needed. Following it helps streamline the project and avoid rework. Let's put the steps in a special format so we can use them as a flash card.

<div class="alert alert-block alert-warning" id='anki_front'>
What are the eight main steps of a machine learning project?
</div>
<div class="alert alert-block alert-success" id='anki_back'>
<ol>
<li>Frame the problem and understand the big picture.</li>
<li>Get the data.</li>
<li>Explore the data to gain initial insights.</li>
<li>Prepare and clean the data to better expose underlying patterns to Machine Learning algorithms.</li>
<li>Explore the different models/approaches and shortlist the most promising ones.</li>
<li>Fine-tune the models and combine them if possible for a better solution.</li>
<li>Present your solution.</li>
<li>Launch, monitor and maintain your solution.</li>
</ol>
</div>

In this project, we are going to follow these steps more or less closely: as this is just the beginning, we might not have all the context, and some steps might be trivial. Let's say that at your job, you are given California census data and asked to predict the median housing price for each block group in California (block groups, or "districts", typically have a population of 600 to 3,000 people). The data contains metrics such as population, median income, and median housing price for each district. Let's follow the steps above to solve this problem.
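To make this framing concrete, each district can be treated as one labeled example: a few input features plus the median housing price we want to predict. Below is a minimal sketch in plain Python; the `District` class, its field names, and the numbers are hypothetical and made up for illustration, not taken from the real census dataset.

```python
from dataclasses import dataclass


@dataclass
class District:
    """One block group ("district"). Hypothetical fields, for illustration only."""
    population: int
    median_income: float
    median_house_value: float  # the label we want to predict


def split_features_labels(districts):
    """Separate the input features (X) from the target label (y)."""
    X = [(d.population, d.median_income) for d in districts]
    y = [d.median_house_value for d in districts]
    return X, y


# Made-up districts; real data would be loaded from the census dataset.
districts = [
    District(population=1200, median_income=3.5, median_house_value=180_000.0),
    District(population=2500, median_income=5.1, median_house_value=260_000.0),
]
X, y = split_features_labels(districts)
print(X)  # [(1200, 3.5), (2500, 5.1)]
print(y)  # [180000.0, 260000.0]
```

Keeping features and labels separate like this mirrors how most supervised learning libraries, including Scikit-Learn, expect their inputs: a feature matrix `X` and a target vector `y`.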

## 1. Frame the problem and understand the big picture

Before starting a Machine Learning project, it pays to ask the right questions and look at the big picture. There are many questions you can ask, and their answers will help you make better choices down the line.

<div class="alert alert-block alert-warning" id='anki_front'>
What questions should be asked for understanding the big picture of your project?
</div>
<div class="alert alert-block alert-success" id='anki_back'>
<ol>
    <li>What is the business objective of this project?</li>
    <li>What are the current solutions around?</li>
    <li>How will the solution be used?</li>
    <li>What kind of Machine Learning task is it? Supervised or unsupervised? Online or batch?</li>
    <li>How should the performance be measured?</li>
    <li>What is the minimum performance needed to reach business objective?</li>
    <li>What existing problems are similar to this one? Can we reuse tools, models, or experience?</li>
    <li>Is human expertise available?</li>
    <li>What assumptions are we making? Can we verify them?</li>
    <li>How would you solve the problem manually?</li>
</ol>
</div>

Let's answer as many questions as possible for our problem.

**Business Objective:** The end goal of the project determines how we plan, build, and evaluate the model, so it should be very clear why we are building the solution. Let's say our model's output will be fed to another ML system, which combines it with many other signals to determine whether it is worth investing in a given area. A sequence of components like this, each feeding its output to the next, is a typical machine learning pipeline.

**Current Solutions:** What if state-of-the-art solutions are already available for this problem? In that case, why reinvent the wheel? It is a good idea to gauge the performance and architecture of existing systems, as they give you a reference point for approaching the problem. Your boss tells you the current process is manual and has an error of around 20%. In such a case, it is worth getting your hands dirty and building a better solution.
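The 20% figure is only useful as a benchmark if our model's error is measured the same way. The text does not say how the manual error was computed; one common option for a relative error is the mean absolute percentage error (MAPE), sketched below under that assumption. The function name and all prices are made up for illustration.

```python
def mean_absolute_percentage_error(actual, estimated):
    """Average of |estimate - actual| / actual, returned as a fraction.

    Assumes every actual value is nonzero.
    """
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - a) / a for a, e in zip(actual, estimated))
    return total / len(actual)


# Made-up numbers purely for illustration, not real census data.
actual_prices = [200_000, 150_000, 300_000]
manual_estimates = [240_000, 120_000, 360_000]
model_estimates = [210_000, 160_000, 285_000]

manual_error = mean_absolute_percentage_error(actual_prices, manual_estimates)
model_error = mean_absolute_percentage_error(actual_prices, model_estimates)
print(f"manual: {manual_error:.1%}, model: {model_error:.1%}")
# manual: 20.0%, model: 5.6%
```

Whatever metric is chosen, the key point is to compute it identically for the manual process and for the model, so the comparison is fair.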

**Describing the Task:** Now we can start designing the system.
* Is it a supervised, unsupervised, or reinforcement learning task? Supervised, as we have labeled training examples (each district comes with its median housing price).
* Is it a classification task, a regression task, or something else? It is a regression task, as we are going to predict a numeric value.
* Should we use online or batch learning? There is no continuous flow of data coming into the system, there is no particular need to adjust to changing data rapidly, and the data is small, so plain batch learning should do.
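Batch learning here means training on the full dataset in one go, rather than updating the model incrementally as new data arrives. As a minimal sketch, assuming a single feature (median income) and a straight-line model, ordinary least squares can be computed in closed form over all examples at once. The function name and data are made up for illustration; a real project would more likely use a library such as Scikit-Learn.

```python
def fit_batch(xs, ys):
    """Fit y = w * x + b by ordinary least squares, using every example at once."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized; the n cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# Made-up toy data: median income (x) vs. median house value (y).
incomes = [2.0, 3.0, 4.0, 5.0]
values = [150_000.0, 190_000.0, 230_000.0, 270_000.0]

# Batch learning: one pass over the whole dataset produces the final model.
w, b = fit_batch(incomes, values)
print(w, b)         # 40000.0 70000.0
print(w * 6.0 + b)  # 310000.0
```

An online learner would instead nudge `w` and `b` after each new example (for instance with stochastic gradient descent). Since our data is a small, static snapshot, retraining from scratch whenever new data appears is the simpler choice.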
   
