# 1.1 Learn Python Machine Learning The Wrong Way #

**When starting to learn machine learning:**

**Refrain from:**
- Getting really good at Python programming and Python syntax.
- Deeply studying the underlying theory and parameters for machine learning algorithms in scikit-learn.
- Avoiding or lightly touching on all of the other tasks needed to complete a real project.

**Focus on:**
- The process of building predictive machine learning models in Python that you can actually use to make predictions.

# 1.2 Machine Learning in Python #

- Predictive modeling is a sub-field of machine learning.
- Unlike statistics, where models are used to understand data, predictive modeling is laser-focused on developing models that make the most accurate predictions, at the expense of explaining why those predictions are made.
- Unlike the broader field of machine learning that could feasibly be used with data in any format, predictive modeling is primarily focused on tabular data (e.g. tables of numbers like in a spreadsheet).
- The 3 themes of the book:
    - **Lessons:** Learn how the sub-tasks of a machine learning project map onto Python and the best practice way of working through each task.
    - **Projects:** Tie together all of the knowledge from the lessons by working through case study predictive modeling problems.
    - **Recipes:** Apply machine learning with a catalog of standalone recipes in Python that you can copy-and-paste as a starting point for new projects.

## 1.2.1 Lessons ##

A predictive modeling machine learning project can be broken down into 6 top-level tasks:
- **Define Problem:** Investigate and characterize the problem in order to better understand the goals of the project.
- **Analyze Data:** Use descriptive statistics and visualization to better understand the data you have available.
- **Prepare Data:** Use data transforms in order to better expose the structure of the prediction problem to modeling algorithms.
    - Pre-process data.
    - Feature selection.
- **Evaluate Algorithms:** Design a test harness to evaluate a number of standard algorithms on the data and select the top few to investigate further.
    - Resampling methods.
    - Algorithm evaluation metrics.
    - Spot-check algorithms.
    - Model selection.
    - Pipelines.
- **Improve Results:** Use algorithm tuning and ensemble methods to get the most out of well-performing algorithms on your data.
- **Present Results:** Finalize the model, make predictions and present results.
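
The six tasks above can be sketched end to end in a few dozen lines. This is a minimal illustration, not code from the book: it assumes scikit-learn is installed, and the dataset (`load_wine`), the three candidate algorithms, the 80/20 split, and the `n_neighbors` grid are all arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# 1. Define Problem: predict the wine class from its chemical measurements.
#    (Dataset choice is an assumption for illustration.)
X, y = load_wine(return_X_y=True)

# 2. Analyze Data: descriptive statistics and class balance.
feature_means = X.mean(axis=0)
class_counts = np.bincount(y)
print("rows, columns:", X.shape)
print("class counts:", class_counts)

# Hold back a validation set that plays no part in model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7)

# 3 + 4. Prepare Data and Evaluate Algorithms: wrapping the scaler and the
# model in a Pipeline means the scaler is re-fit inside each training fold.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
candidates = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
}
scores = {}
for name, model in candidates.items():
    pipeline = Pipeline([("scale", StandardScaler()), ("model", model)])
    fold_scores = cross_val_score(pipeline, X_train, y_train,
                                  cv=kfold, scoring="accuracy")
    scores[name] = fold_scores.mean()
print("spot-check accuracy:", scores)

# 5. Improve Results: tune one algorithm with a small grid search.
grid = GridSearchCV(
    Pipeline([("scale", StandardScaler()), ("model", KNeighborsClassifier())]),
    param_grid={"model__n_neighbors": [1, 3, 5, 7, 9]},
    cv=kfold, scoring="accuracy")
grid.fit(X_train, y_train)

# 6. Present Results: finalize the model and score it on the held-back data.
final_model = grid.best_estimator_
test_accuracy = final_model.score(X_test, y_test)
print("best parameters:", grid.best_params_)
print("held-back accuracy:", round(test_accuracy, 3))
```

Keeping the held-back split outside the cross-validation loop is a design choice: the final accuracy is then estimated on rows that neither the spot-check nor the tuning step has seen.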

## 1.2.2 Projects ##

[UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/) datasets are excellent for practicing applied machine learning because:
- **They are small**, meaning they fit into memory and algorithms can model them in reasonable time.
- **They are well behaved**, meaning you often don’t need to do a lot of feature engineering to get a good result.
- **They are benchmarks**, meaning that many people have used them before and you can get ideas of good algorithms to try and accuracy levels you should expect.

The 3 projects:
- **Hello World Project (Iris flowers dataset):** This is a quick pass through the project steps without much tuning or optimizing on a dataset that is widely used as the hello world of machine learning.
- **Regression (Boston House Price dataset):** Work through each step of the project process with a regression problem.
- **Binary Classification (Sonar dataset):** Work through each step of the project process using all of the methods on a binary classification problem.
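
To get a feel for how small and well behaved these datasets are, the Iris data can be inspected in a few lines. This sketch assumes scikit-learn is installed and uses its bundled copy of Iris (`load_iris`) rather than a file downloaded from the UCI repository; that substitution is an assumption made for convenience.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load scikit-learn's bundled copy of the Iris flowers dataset.
iris = load_iris()
X, y = iris.data, iris.target

# Small: 150 rows of 4 numeric measurements fit easily in memory.
print(X.shape)    # (150, 4)
print(X.nbytes)   # 4800 bytes (150 * 4 float64 values)

# Well behaved: purely numeric features and perfectly balanced classes.
class_counts = {str(name): int(count)
                for name, count in zip(iris.target_names, np.bincount(y))}
print(iris.feature_names)
print(class_counts)
```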

## 1.2.3 Recipes ##

Recipes make the difference between a beginner who is having trouble and a fast learner capable of making accurate predictions quickly on any new project. A catalog of recipes provides a repertoire of skills that you can draw from when starting a new project. More formally, recipes are defined as follows:
- Recipes are code snippets, not tutorials.
- Recipes provide just enough code to work.
- Recipes are demonstrative, not exhaustive.
- Recipes run as-is and produce a result.
- Recipes assume that required libraries are installed.
- Recipes use built-in datasets or datasets provided in specific libraries.

You can also build upon this recipe catalog as you discover new techniques.
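
As an illustration of what a recipe in this sense might look like, here is a short standalone snippet that standardizes a built-in dataset. It is a sketch written for these notes, not a recipe taken from the book; it assumes NumPy and scikit-learn are installed, and the choice of `StandardScaler` on Iris is arbitrary.

```python
# Recipe: standardize features to zero mean and unit variance.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Use a built-in dataset so the recipe runs as-is.
X, _ = load_iris(return_X_y=True)

# Fit the scaler on the data, then apply the transform.
scaler = StandardScaler().fit(X)
rescaled = scaler.transform(X)

# Produce a result: show the first five rescaled rows.
np.set_printoptions(precision=3, suppress=True)
print(rescaled[:5])
```

The snippet is deliberately minimal: no train/test split and no model, just enough code to demonstrate one technique that can be pasted into a larger project.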

## 1.2.4 Your Outcomes From Reading This Book ##

You will know:
- How to work through a small-to-medium-sized dataset end-to-end.
- How to deliver a model that can make accurate predictions on new unseen data.
- How to complete all subtasks of a predictive modeling problem with Python.
- How to learn new and different techniques in Python and SciPy.
- How to get help with Python machine learning.

From here you can start to dive into the specifics of the functions, techniques and algorithms used, with the goal of learning how to use them better in order to deliver more accurate predictive models, more reliably and in less time.

# 1.3 What This Book is Not #

- **This is not a machine learning textbook.** We will not be getting into the basic theory of machine learning (e.g. induction, bias-variance trade-off, etc.). You are expected to have some familiarity with machine learning basics, or be able to pick them up yourself.
- **This is not an algorithm book.** We will not be working through the details of how specific machine learning algorithms work (e.g. Random Forests). You are expected to have some basic knowledge of machine learning algorithms, or to know how to pick up this knowledge yourself.
- **This is not a Python programming book.** We will not be spending a lot of time on Python syntax and programming (e.g. basic programming tasks in Python). You are expected to be a developer who can pick up a new C-like language relatively quickly.

# [scikit-learn algorithm cheat-sheet](http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html) #

