Introduction to Machine Learning

Notes on the free Machine Learning course from Pluralsight, found here.

To view the Python scripts, you first need to have Python 3 and pip installed.

Then install Jupyter and start the Jupyter notebook server:

pip install jupyter
jupyter notebook

Definition

Machine learning: Building a model from example inputs to make data-driven predictions vs. following strictly static program instructions.

Machine learning logic

Instead of hand-written "if", "case", "while" and "until" logic, machine learning uses data parsed into a format we can work with. We pass this formatted data to an algorithm that analyses it (data analysis) and creates a model that solves the problem based on the data.
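
As a rough illustration of the difference, the sketch below contrasts a hard-coded rule with a cutoff learned from labeled examples. The data, the cutoff of 100 and the function names are made up for this example and do not come from the course.

```python
# Hand-written rule: the programmer picks the cutoff.
def rule_based_is_high(value):
    return value > 100  # hard-coded, hypothetical cutoff


def train_threshold(examples):
    """Learn a cutoff from (value, label) pairs.

    Tries the midpoint between each pair of neighbouring values and keeps
    the one that classifies the most training examples correctly.
    """
    values = sorted(v for v, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def correct(cutoff):
        return sum((v > cutoff) == label for v, label in examples)

    return max(candidates, key=correct)


# Made-up training data: (measurement, label)
examples = [(85, False), (90, False), (110, False),
            (130, True), (150, True), (160, True)]

cutoff = train_threshold(examples)  # the "model" is just this number


def learned_is_high(value):
    return value > cutoff
```

Here the "model" is a single number derived from the data. Changing the examples changes the cutoff without touching the code, which is the point of the approach.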

Ways machines learn from data

Supervised: Data is labeled and has features, and we know the result we want to obtain for that data.
Unsupervised: Searches unlabeled data for clusters, finding groups of data that share the same traits.

Supervised	Unsupervised
Value prediction	Identify clusters of like data
Needs training data containing value being predicted	Data does not contain cluster membership
Trained model predicts value in new data	Model provides access to data by cluster
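
A minimal sketch of the two approaches, using only the standard library. The nearest-neighbour predictor and the two-cluster k-means below are stand-ins chosen for brevity, not the algorithms used in the course, and all data is made up.

```python
def predict_nearest(labeled, x):
    """Supervised: return the label of the closest training example."""
    _, nearest_label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two clusters (basic k-means, k=2).

    Assumes the values are not all identical.
    """
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return near_low, near_high


# Supervised: the training data contains the value being predicted.
labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
prediction = predict_nearest(labeled, 7.2)

# Unsupervised: the data carries no cluster membership.
unlabeled = [1.0, 1.5, 2.0, 8.0, 9.0, 8.5]
clusters = two_means(unlabeled)
```

The supervised function needs the labels to make a prediction, while the clustering function only groups values and leaves their interpretation to the reader.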

Machine Learning Workflow

An orchestrated and repeatable pattern that systematically transforms and processes information to create prediction solutions.

1. Asking the right question
2. Preparing data
3. Selecting the algorithm
4. Training the model
5. Testing the model --> If something went wrong, iterate from (2)

Guidelines for machine learning workflow

Early steps are most important. Each step depends on previous steps.
Expect to go backwards. Later knowledge affects previous steps.
Data is never as you need it. Data will have to be altered.
More data is better. More data => better results.
Don't pursue a bad solution. Reevaluate, fix or quit.

Asking the right question

Define end goal, starting point and how to achieve goal.

Predict if a person will develop diabetes

This statement is a starting point, but it can be made more specific.

Solution statement goals:

Define scope (including data sources)
Define target performance
Define context for usage
Define how solution will be created

Detailed:

Scope and data sources:
- Understand the features in data
- Identify critical features
- Focus on at-risk population
- Select data source -> Pima Indian Diabetes study is a good source

Using Pima Indian Diabetes data, predict which people will develop diabetes.

Performance targets:
- Binary result (True or False)
- We want more accuracy than just a coin flip (>50%)
- Genetic differences are a factor
- 70% accuracy is a common target.

Using Pima Indian Diabetes data, predict with 70% or greater accuracy which people will develop diabetes.
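
To make the target concrete, accuracy for a binary result can be computed as the fraction of predictions that match the known outcomes. The sketch below uses made-up labels, not values from the Pima Indian Diabetes data, and the names `accuracy` and `TARGET` are chosen for this example only.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the known outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


TARGET = 0.70  # the 70% target from the solution statement

# Made-up outcomes and predictions, for illustration only
actual = [True, False, False, True, False, True, False, False, True, False]
predicted = [True, False, True, True, False, False, False, False, True, False]

score = accuracy(predicted, actual)
meets_target = score >= TARGET
```

With these ten made-up cases, eight predictions match, so the score is 0.8 and the target is met.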

Context
- Disease prediction
- Medical research practices
- Unknown variations between people
- Likelihood is used

Using Pima Indian Diabetes data, predict with 70% or greater accuracy which people are likely to develop diabetes.

Solution creation
- Usually we will use the Machine Learning Workflow to develop the solution.
  - Process Pima Indian Data.
  - Transform data as required.

Use the Machine Learning Workflow to process and transform Pima Indian Diabetes data to create a prediction model. This model must predict which people are likely to develop diabetes with 70% or greater accuracy.

Preparing data

Find the data we need
Inspect and clean data
Explore data and modify if necessary
Mold the data into tidy data
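
As a small sketch of the inspect-and-clean step, the example below counts empty cells per column and then keeps only complete rows. It uses the standard `csv` module on a made-up sample. The column names are hypothetical, and dropping incomplete rows is only one of several possible cleaning choices.

```python
import csv
import io

# Made-up sample in CSV form; the column names are hypothetical.
raw = """age,glucose,has_diabetes
50,148,True
31,,False
32,183,True
21,89,
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Inspect: count empty cells per column.
missing = {name: sum(1 for row in rows if row[name] == "") for name in rows[0]}

# Clean: keep only rows with no empty cells.
complete = [row for row in rows if all(value != "" for value in row.values())]
```

Counting the gaps first shows how much data a cleaning rule would discard before it is applied.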

Tidy data

Tidy datasets are easy to manipulate, model and visualize, and have a specific structure:

Each variable is a column
Each observation is a row
Each type of observational unit is a table
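
The rules above can be illustrated by reshaping a "wide" table, where the years are spread across columns, into one row per observation. This is a minimal standard-library sketch. The data and the helper name `to_tidy` are invented for the example.

```python
# Untidy: one row per person, one column per year
# (the years are values of a variable, not variables themselves).
wide = [
    {"name": "Ana", "2019": 5.1, "2020": 5.4},
    {"name": "Ben", "2019": 6.0, "2020": 6.3},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Turn every non-id column into its own (variable, value) row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy


tidy = to_tidy(wide, "name", "year", "measurement")
```

After reshaping, each variable (name, year, measurement) is a column and each observation is a row.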

50-80% of a ML project is spent getting, cleaning, and organizing data.

Getting data

Where to get it:

Google
Government databases
Professional or company data sources
Your company
All of the above

Data Rule #1: The closer the data is to what you are predicting, the better.

Data Rule #2: Data will never be in the format you need.
