# Introduction to ML


### Agenda

- Course Introduction
- Significance of Data
- What is Machine Learning (ML)?
- Practical Use cases
- Concepts and Terms
- Tools/Platforms for ML
- Machine Learning End to End Pipeline
- Who Are Data Engineers and Data Scientists?
- What does a Data Scientist do?

### Did you install Anaconda?

- Install Anaconda (https://www.anaconda.com/distribution/)

> Make sure to install the latest (Python 3.7) version.

<img src="../images/anaconda.png" width="600" height="600" align="center"/>

- We will go over the Python programming language in the next session.

![ML.png](attachment:ML.png)

[source](https://augnitive.com/introduction-to-machine-learning/)

## Course Introduction

### What is Machine Learning (ML)?

Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.
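
To make "use historical data as input to predict new output values" concrete, here is a minimal sketch in plain Python. The data values and the names `fit_line` and `predict` are assumptions made for illustration, and a straight-line fit by ordinary least squares stands in for the many algorithms used in practice.

```python
# Hypothetical historical data: advertising spend (x) and resulting sales (y).
history_x = [1.0, 2.0, 3.0, 4.0, 5.0]
history_y = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# "Training": learn the parameters from historical data only.
model = fit_line(history_x, history_y)

# "Prediction": estimate the output for an input the model has not seen.
prediction = predict(model, 6.0)
print(prediction)
```

Nothing in the code states the rule "sales = 2 x spend + 1"; the slope and intercept are recovered from the examples, which is the sense in which the program is not explicitly programmed.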

##### Why is machine learning important?
Machine learning is important because it gives enterprises a view of trends in customer behavior and business operational patterns, and it supports the development of new products. Many of today's leading companies, such as Facebook, Google and Uber, make machine learning a central part of their operations. Machine learning has become a significant competitive differentiator for many companies.

### Practical Use cases

**Voice Assistant**
Voice assistants are ubiquitous right now. Popular voice assistants such as Apple’s Siri, Google Assistant, and Amazon’s Alexa are becoming part of people’s everyday conversations. Machine learning algorithms work behind all these voice assistants to recognize speech using Natural Language Processing (NLP). The assistant then converts the speech into a numerical representation using machine learning and formulates a response accordingly.

 
**Personalised Marketing**
Technology is gaining ground in marketing. Using machine learning, the marketing industry segments customers based on behavioural and characteristic data. Digital advertising platforms allow marketers to focus on the audience segments for which a product is most relevant. Marketers can then understand customer requirements and promote products more effectively.

 

**Fraud Detection**
Banks and other large companies involved in financial transactions use machine learning for fraud detection, which helps them keep consumers safe. Machine learning can also be valuable to companies that handle credit card transactions. The model is trained to flag transactions that appear fraudulent based on criteria defined by the company’s rules. Detecting such transactions early helps companies avoid large losses.
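
As a rough illustration of flagging transactions against learned criteria, the sketch below derives a threshold from a customer's past amounts and flags new amounts above it. The data, the function names, and the "mean plus three standard deviations" rule are all assumptions chosen for simplicity; production fraud systems use far richer features and models.

```python
from statistics import mean, stdev

# Hypothetical past transaction amounts for one customer (assumed legitimate).
past_amounts = [20.0, 35.0, 25.0, 30.0, 40.0, 22.0, 28.0]


def learn_threshold(amounts, k=3.0):
    """Learn a flagging threshold: mean plus k sample standard deviations."""
    return mean(amounts) + k * stdev(amounts)


def flag_suspicious(amounts, threshold):
    """Return the amounts that exceed the learned threshold."""
    return [amount for amount in amounts if amount > threshold]


# The threshold comes from the data rather than being hard-coded.
threshold = learn_threshold(past_amounts)

# Hypothetical new transactions to screen.
new_amounts = [27.0, 950.0, 33.0]
flagged = flag_suspicious(new_amounts, threshold)
print(flagged)
```

Only the 950.0 transaction is flagged here, because it lies far outside the range this customer's history would suggest.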

 

**Self-Driving Cars**
Self-driving cars are one of the most fascinating technologies in which machine learning is used extensively. All three main types of machine learning, namely supervised, unsupervised, and reinforcement learning, are used throughout the car’s design. Smart cars use machine learning for tasks such as detecting objects around the car, estimating the distance to the car in front, locating the pavement and traffic signals, evaluating the condition of the driver, and classifying scenes.
 

**Predicting Behaviour**
Organisations can use machine learning models to predict customers’ behaviour based on their past data. Companies look at what people are talking about on social media and then identify those who are searching for a given product or service. For example, Zappos uses analytics and machine learning to help provide personalized sizing and search results for customers, as well as predictive behavior models.

 

**Healthcare**
The value of machine learning in healthcare is its ability to process huge datasets beyond the scope of human capability and then reliably convert analysis of that data into clinical insights that aid physicians. Machine learning helps in planning and providing care, ultimately leading to better outcomes, lower costs of care, and increased patient satisfaction. Computer-assisted diagnosis (CAD), an application of machine learning, can also be used to review mammography scans to help predict cancer.


**Chatbots**
Machine learning is helping customer support through chatbots that give relevant replies to consumers’ queries. Using concepts from Natural Language Processing (NLP) and sentiment analysis, machine learning algorithms are able to understand a customer’s need and the tone in which it is expressed. The system then redirects the query to the appropriate customer support person.

### Tools/Platforms for ML

![download.png](attachment:download.png)

### Machine Learning End to End Pipeline

![1_rKOicpPWdEsomGFIgMiEwg.png](attachment:1_rKOicpPWdEsomGFIgMiEwg.png)

The **CRoss Industry Standard Process for Data Mining** (`CRISP-DM`) is a process model with six phases that naturally describes the data science life cycle. It’s like a set of guardrails to help you plan, organize, and implement your data science (or machine learning) project.

- Business understanding – What does the business need?
- Data understanding – What data do we have / need? Is it clean?
- Data preparation – How do we organize the data for modeling?
- Modeling – What modeling techniques should we apply?
- Evaluation – Which model best meets the business objectives?
- Deployment – How do stakeholders access the results?

Published in 1999 to standardize data mining processes across industries, it has since become the most common methodology for data mining, analytics, and data science projects.

Data science teams that combine a loose implementation of CRISP-DM with overarching team-based agile project management approaches will likely see the best results. Even teams that don’t explicitly follow CRISP-DM can still use the framework diagram to explain the differences between data science and software projects.
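
The six phases can also be written down as a small data structure, which is one way to see the process as a life cycle rather than a one-way checklist. This is only a sketch: `run_project` and its `evaluation_passes` argument are hypothetical, and sending a failed evaluation back to the first phase is a simplifying assumption used here to illustrate iteration.

```python
# The six CRISP-DM phases and their guiding questions, in order.
PHASES = [
    ("Business understanding", "What does the business need?"),
    ("Data understanding", "What data do we have / need? Is it clean?"),
    ("Data preparation", "How do we organize the data for modeling?"),
    ("Modeling", "What modeling techniques should we apply?"),
    ("Evaluation", "Which model best meets the business objectives?"),
    ("Deployment", "How do stakeholders access the results?"),
]


def run_project(evaluation_passes):
    """Walk the phases in order and return the names of the phases visited.

    evaluation_passes is a hypothetical list of booleans, one per Evaluation
    attempt. A failed evaluation restarts the cycle (an assumption made for
    this sketch); once the list is exhausted, evaluations are treated as passed.
    """
    visited = []
    attempts = iter(evaluation_passes)
    index = 0
    while index < len(PHASES):
        name, _question = PHASES[index]
        visited.append(name)
        if name == "Evaluation" and not next(attempts, True):
            index = 0  # Objectives not met: begin another iteration.
        else:
            index += 1
    return visited


# One failed evaluation followed by a successful one.
visited = run_project([False, True])
print(len(visited), visited[-1])
```

With one failed evaluation the project passes through the first five phases twice before it reaches Deployment.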

### Who Are Data Engineers and Data Scientists?

![Data-Engg-tools.jpg](attachment:Data-Engg-tools.jpg)

#### What Does a Data Engineer Do?
A data engineer is a data professional who prepares the data infrastructure for analysis. They are focused on the production readiness of raw data and elements such as formats, resilience, scaling, data storage, and security. Data engineers are tasked with designing, building, testing, integrating, managing, and optimizing data from a variety of sources. They also build the infrastructure and architectures that enable data generation.

Their primary focus is to build free-flowing data pipelines by combining a variety of big data technologies that enable real-time analytics. Data engineers also write complex queries to ensure that data is easily accessible.



#### What Are the Requirements To Become a Data Engineer?
Data engineers usually hail from a software engineering background and are proficient in programming languages like Java, Python, SQL, and Scala. Alternatively, they might have a degree in mathematics or statistics that helps them apply different analytical approaches to solve business problems.

Most companies hiring data engineers look for candidates with a bachelor’s degree in computer science, applied math, or information technology. Candidates may also be required to have a few data engineering certifications, like Google’s Professional Data Engineer or IBM Certified Data Engineer. It also helps if they are experienced in building big data warehouses that can run Extract, Transform, and Load (ETL) processes on top of big data sets.
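
For readers new to the term, the following is a toy ETL sketch using only the Python standard library. The CSV contents, the `orders` table, and the function names are made up for illustration, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, normalize names, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # Skip incomplete records.
        cleaned.append(
            (
                int(row["order_id"]),
                row["customer"].strip().lower(),
                float(row["amount"]),
            )
        )
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into a SQL table."""
    connection.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)

# Once loaded, the data can be queried with ordinary SQL.
total = connection.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)
```

Keeping the three steps as separate functions mirrors how larger pipelines are usually organized, so each stage can be tested or replaced on its own.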

### What does a Data Scientist do?

Data scientists examine which questions need answering and where to find the related data. They have business acumen and analytical skills as well as the ability to mine, clean, and present data. Businesses use data scientists to source, manage, and analyze large amounts of unstructured data. Results are then synthesized and communicated to key stakeholders to drive strategic decision-making in the organization.

Specific tasks include:

- Identifying the data-analytics problems that offer the greatest opportunities to the organization
- Determining the correct data sets and variables
- Collecting large sets of structured and unstructured data from disparate sources
- Cleaning and validating the data to ensure accuracy, completeness, and uniformity
- Devising and applying models and algorithms to mine the stores of big data
- Analyzing the data to identify patterns and trends
- Interpreting the data to discover solutions and opportunities
- Communicating findings to stakeholders using visualization and other means
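
Two of the tasks above, cleaning and validating data and then analyzing it for patterns, can be sketched in a few lines. The records, the validation rules, and the function names below are assumptions made for illustration; real projects typically rely on libraries such as pandas for this work.

```python
from statistics import mean

# Hypothetical raw records collected from disparate sources.
raw_records = [
    {"city": "Austin", "age": "34", "spend": "120.0"},
    {"city": "austin ", "age": "34", "spend": "120.0"},  # duplicate once normalized
    {"city": "Boston", "age": "", "spend": "80.0"},      # missing age
    {"city": "Boston", "age": "29", "spend": "95.5"},
    {"city": "Austin", "age": "41", "spend": "-5"},      # invalid spend
    {"city": "Austin", "age": "52", "spend": "150.0"},
]


def clean(records):
    """Normalize formats, then drop incomplete, invalid, and duplicate rows."""
    seen = set()
    cleaned = []
    for record in records:
        if not all(value.strip() for value in record.values()):
            continue  # Completeness: every field must be filled in.
        # Uniformity: consistent city spelling and numeric types.
        row = (
            record["city"].strip().title(),
            int(record["age"]),
            float(record["spend"]),
        )
        if row[2] < 0:
            continue  # Accuracy: spend cannot be negative.
        if row in seen:
            continue  # Keep each identical row only once.
        seen.add(row)
        cleaned.append(row)
    return cleaned


def average_spend_by_city(rows):
    """Summarize one simple pattern: mean spend per city."""
    by_city = {}
    for city, _age, spend in rows:
        by_city.setdefault(city, []).append(spend)
    return {city: mean(values) for city, values in by_city.items()}


rows = clean(raw_records)
summary = average_spend_by_city(rows)
print(summary)
```

Of the six raw records, three survive cleaning, and the per-city averages are the kind of summary that would then be communicated to stakeholders.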

![1235543088.png](attachment:1235543088.png)