
***Tools & Languages in AI & Machine Learning***

**1. Python**

Python is the most widely used programming language in AI. It’s simple, versatile, and backed by thousands of libraries.
Why it matters: readable syntax, a massive community, and an extensive ecosystem of ML/AI resources.


**2. NumPy & Pandas**

Before building models, you need to clean and understand your data. These libraries make that much easier.

NumPy: Fast matrix computations

Pandas: Smart data manipulation and analysis
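A minimal sketch of what each library is typically used for, with made-up numbers: NumPy multiplies two small matrices in one vectorized operation, and Pandas groups a tiny hypothetical sales table by region and sums the revenue.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized matrix arithmetic without explicit Python loops
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
product = A @ B  # matrix multiplication
print(product)

# Pandas: hold tabular data in a DataFrame and summarize it
# (the regions and revenue figures are hypothetical)
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "revenue": [100, 80, 120, 90],
})
totals = sales.groupby("region")["revenue"].sum()  # total revenue per region
print(totals)
```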


**3. Scikit-learn**

Want to build a model to predict house prices or classify emails as spam? Scikit-learn is perfect for regression, classification, clustering, and more.


**4. TensorFlow & PyTorch – Deep Learning Giants**

These are the two leading frameworks used for building neural networks, CNNs, RNNs, LLMs, and more.

TensorFlow: Backed by Google, highly scalable

PyTorch: Preferred in research for its flexibility and Pythonic style


**5. Keras – The Friendly Deep Learning API**

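Originally built on top of TensorFlow (and bundled with it as `tf.keras`), Keras allows quick prototyping of deep learning models with minimal code. Keras 3 can also run on JAX or PyTorch as its backend.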


**6. OpenCV – For Computer Vision**

Want to build face recognition or object detection apps? OpenCV is your go-to for processing images and video.


**7. NLTK & spaCy – NLP Toolkits**

These tools help machines understand human language. You’ll use them to build chatbots, summarize text, or analyze sentiment.


**8. Jupyter Notebook**

Interactive notebooks where you can write code, visualize data, and explain logic in one place. Great for experimentation and demos.


**9. Google Colab – Free GPU-Powered Coding**

Run your AI code on cloud GPUs for free (within usage limits), which makes it ideal for training ML models without any local setup.


**10. Hugging Face – Pre-trained AI Models**

Use pre-trained models such as BERT, GPT-2, and many more with just a few lines of code. No need to train everything from scratch!

***Understanding Data & Datasets***

In AI and Machine Learning, data isn’t just important — it’s everything.

Without clean, relevant data, even the best algorithms fail. Today, we’ll cover how to find, clean, and prepare data for powerful AI models.


1. What Is a Dataset?

A dataset is a structured collection of data — like an Excel file or a table — used to train and test ML models.

Examples:

- Image datasets (e.g., CIFAR-10, MNIST)
- Text datasets (e.g., IMDB reviews, news articles)
- Tabular datasets (e.g., Titanic dataset, sales reports)


2. Types of Data

Structured Data: Rows and columns (e.g., Excel, SQL)

Unstructured Data: Images, audio, video, text

Semi-structured Data: JSON, XML
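To make the structured vs. semi-structured distinction concrete, here is a small sketch. The JSON records and field names are hypothetical; `pd.json_normalize` flattens the nested `address` object into ordinary table columns with dotted names such as `address.city`, turning semi-structured data into structured rows and columns.

```python
import json

import pandas as pd

# Semi-structured data: JSON records with a nested object (hypothetical example)
raw = """
[
  {"id": 1, "name": "Asha", "address": {"city": "Pune", "zip": "411001"}},
  {"id": 2, "name": "Ben", "address": {"city": "Leeds", "zip": "LS1"}}
]
"""
records = json.loads(raw)

# Flatten nested keys into columns to obtain structured (tabular) data
df = pd.json_normalize(records)
print(df)
```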



3. Sources of Datasets

You can find free datasets on:

- Kaggle
- UCI Machine Learning Repository
- Google Dataset Search
- Hugging Face Datasets


4. Steps in Data Preparation

a. Data Cleaning:

- Handle missing (null) values, either by removing them or by filling them in (imputation)

- Handle duplicates

- Fix inconsistent formatting
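A minimal Pandas sketch of these cleaning steps on a made-up table. Dropping nulls is only one option; imputation (for example with `fillna`) is often preferable when data is scarce. The order matters here: fixing the formatting first lets `drop_duplicates` catch rows such as `" Delhi"` and `"delhi"` that differ only in spacing and case.

```python
import pandas as pd

# Hypothetical raw data: inconsistent text, a hidden duplicate, and a missing value
raw = pd.DataFrame({
    "city": [" Delhi", "delhi", "Mumbai", None],
    "sales": [250.0, 250.0, 150.0, 400.0],
})

clean = raw.copy()
# Fix inconsistent formatting: trim whitespace and normalize capitalization
clean["city"] = clean["city"].str.strip().str.title()
# Remove rows with null values (imputation with fillna is a common alternative)
clean = clean.dropna()
# Handle duplicates: drop rows that are now exact repeats
clean = clean.drop_duplicates().reset_index(drop=True)
print(clean)
```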


b. Data Transformation:

- Scale numerical features: normalize them to a fixed range (e.g., 0 to 1) or standardize them (mean 0, standard deviation 1)

- Encode categorical variables (e.g., one-hot encoding)
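A short sketch of these transformations using scikit-learn's `StandardScaler` and `MinMaxScaler` and Pandas' `get_dummies`, on a hypothetical housing table:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: one numerical and one categorical column
df = pd.DataFrame({
    "size_sqft": [500, 1000, 1500, 2000],
    "city": ["Pune", "Delhi", "Pune", "Chennai"],
})

# Standardize: rescale to mean 0 and (population) standard deviation 1
standardized = StandardScaler().fit_transform(df[["size_sqft"]])

# Normalize: rescale to the [0, 1] range
normalized = MinMaxScaler().fit_transform(df[["size_sqft"]])

# Encode the categorical column as one indicator column per city
encoded = pd.get_dummies(df["city"], prefix="city")

print(standardized.ravel())
print(normalized.ravel())
print(encoded)
```

In a real project, fit the scalers on the training set only and then apply them to the validation and test sets, so that information from held-out data does not leak into training.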


c. Splitting Data:

- Train set (e.g., 80%)

- Test set (e.g., 20%)

- Optionally: a validation set (for tuning hyperparameters)
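One common way to produce these splits is scikit-learn's `train_test_split`, called twice. This is sketched on a synthetic array; the resulting 60/20/20 proportions are an illustrative choice, not a rule.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic dataset: 100 samples, 3 features, alternating binary labels
X = np.arange(300).reshape(100, 3)
y = np.array([0, 1] * 50)

# Hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Optionally carve a validation set: 25% of the remaining 80% = 20% overall
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```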


5. Why Data Quality Matters

Bad data = bad predictions.

Your model learns patterns from data — if the data is incorrect, incomplete, or biased, your AI will reflect that.


**Introduction to Machine Learning (ML)**

***Machine Learning*** is at the core of modern Artificial Intelligence, enabling systems to learn from data and make decisions with minimal human intervention.


1. What Is Machine Learning?

Machine Learning is a subset of AI that allows machines to learn from data, identify patterns, and make predictions or decisions without being explicitly programmed.


2. Three Main Types of ML

a. Supervised Learning

- Data is labeled

- The model learns from input-output pairs


Examples:

- Predicting house prices

- Classifying emails as spam or not


b. Unsupervised Learning

- Data is not labeled

- The model finds hidden patterns or groupings

Examples:

- Customer segmentation

- Market basket analysis
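Customer segmentation is a classic clustering task. A minimal sketch with scikit-learn's `KMeans`, assuming a made-up table of six customers described by annual spend and monthly visits. No labels are given; the algorithm groups the customers by similarity on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual spend (thousands), visits per month], unlabeled
customers = np.array([
    [2.0, 1.0], [3.0, 2.0], [2.5, 1.5],        # low spend, infrequent visits
    [20.0, 10.0], [22.0, 12.0], [21.0, 11.0],  # high spend, frequent visits
])

# Ask K-Means for 2 groups; n_init and random_state make the run repeatable
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)

print(labels)                   # cluster index assigned to each customer
print(kmeans.cluster_centers_)  # the "average customer" of each segment
```

The cluster numbers themselves (0 or 1) are arbitrary; what matters is which customers end up in the same group.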


c. Reinforcement Learning

- The model learns by trial and error

- Receives rewards or penalties

- Used in gaming, robotics, and autonomous systems


3. Popular ML Algorithms

- Linear Regression – For prediction (Supervised)

- K-Means Clustering – For grouping (Unsupervised)

- Decision Trees – For classification and regression (Supervised)

- Q-Learning – For reinforcement learning
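As an example of one of these algorithms, here is a sketch of a decision tree classifier on the Iris dataset that ships with scikit-learn. The `max_depth=3` limit and the 80/20 split are illustrative choices, not tuned values; `export_text` prints the learned if/else rules so you can see how the tree makes its decisions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Iris: 150 flowers, 4 measurements each, 3 species labels
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# A shallow tree: each internal node asks a yes/no question about one feature
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

accuracy = tree.score(X_test, y_test)  # fraction of correctly classified flowers
print(f"Test accuracy: {accuracy:.2f}")

# Human-readable view of the learned rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```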


4. Workflow of a Typical ML Project

1. Data Collection
2. Data Preprocessing
3. Model Selection
4. Training the Model
5. Model Evaluation
6. Model Deployment
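The six workflow steps can be sketched end to end with scikit-learn. This is a simplified illustration under several assumptions: the built-in breast cancer dataset stands in for data collection, a scaler plus logistic regression is just one reasonable model choice, and "deployment" is reduced to serializing the fitted model with Python's `pickle`.

```python
import pickle

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data collection: a built-in dataset stands in for real collected data
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set before fitting anything, so evaluation stays honest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 2-3. Preprocessing and model selection, bundled into a single pipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 4. Training (the scaler is fitted on the training data only)
model.fit(X_train, y_train)

# 5. Evaluation on data the model has never seen
acc = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {acc:.3f}")

# 6. Deployment (simplified): serialize the fitted pipeline so a service can load it
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

Only unpickle data from sources you trust, because loading a pickle can execute arbitrary code.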


5. Real-World Examples of ML

- Netflix recommends what you’ll watch next

- Credit card companies detect fraud

- Amazon suggests what you might want to buy

- Self-driving cars recognize stop signs and pedestrians


**Supervised learning**

Supervised learning, as the name suggests, involves a supervisor acting as a teacher. We train the machine using well-labeled data, meaning each example is already tagged with the correct answer. The supervised learning algorithm analyzes these training examples and learns a mapping from inputs to outputs, so that when it is later given a new set of examples, it can produce correct outcomes for them.

Supervised learning problems fall into two categories of algorithms:

•	**Classification**: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”.

•	**Regression**: A regression problem is when the output variable is a real value, such as an amount in “dollars” or a “weight”.


**Unsupervised learning**

**Unsupervised learning** is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the machine’s task is to group unsorted information according to similarities, patterns, and differences, without any prior training on labeled data.

Unsupervised learning problems fall into two main categories of algorithms:

•	**Clustering**: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.

•	**Association**: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people who buy X also tend to buy Y.
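Association rules are usually scored with *support* (the share of all baskets that contain both X and Y) and *confidence* (the share of baskets containing X that also contain Y). Below is a plain-Python sketch over a handful of hypothetical shopping baskets; real association rule miners such as Apriori search over many item combinations efficiently, which this toy version does not attempt.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

n = len(baskets)
# How many baskets contain each single item, and each pair of items
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))


def rule_stats(x, y):
    """Support and confidence of the rule 'people who buy x also buy y'."""
    both = pair_counts[tuple(sorted((x, y)))]
    support = both / n                  # share of all baskets with x and y
    confidence = both / item_counts[x]  # share of x-baskets that also have y
    return support, confidence


support, confidence = rule_stats("bread", "butter")
print(f"bread -> butter: support={support:.2f}, confidence={confidence:.2f}")
```

For these baskets, bread and butter appear together in 3 of 5 baskets (support 0.60), and 3 of the 4 baskets containing bread also contain butter (confidence 0.75).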


**Reinforcement learning**

Reinforcement learning is a type of learning problem that has gained a great deal of traction in recent years. In reinforcement learning, we do not provide the machine with examples of correct input-output pairs; instead, we give it a way to quantify its performance in the form of a reward signal. Reinforcement learning methods resemble how humans and animals learn: the machine tries different actions and is rewarded when it does something well.
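To see the reward signal at work, here is a sketch of tabular Q-learning (one classic reinforcement learning algorithm) on a hypothetical five-cell corridor: the agent starts at the left end and receives a reward of 1 only when it reaches the rightmost cell. The environment, the hyperparameters (`ALPHA`, `GAMMA`, `EPSILON`), and the number of episodes are illustrative assumptions.

```python
import random

N_STATES = 5                           # corridor cells 0..4
GOAL = N_STATES - 1                    # cell 4 is the goal
ACTIONS = (-1, +1)                     # index 0 = move left, index 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state, action):
    """Hypothetical environment: reward 1 for reaching the goal, else 0."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q_row):
    """Epsilon-greedy: usually exploit the best-known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    best = max(q_row)
    # Break ties randomly so the untrained agent does not always pick the same move
    return random.choice([a for a, q in enumerate(q_row) if q == best])


random.seed(0)
# Q[s][a] estimates the discounted future reward of taking action a in state s
Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]

for episode in range(200):
    state, done = 0, False
    while not done:
        a = choose_action(Q[state])
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning update: nudge Q toward reward + discounted best next value
        target = reward if done else reward + GAMMA * max(Q[next_state])
        Q[state][a] += ALPHA * (target - Q[state][a])
        state = next_state

# Greedy policy learned for each non-goal cell
policy = ["right" if Q[s][1] > Q[s][0] else "left" for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should be to move right in every cell. Nobody told the agent that; it inferred it from the reward alone.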


**Linear Regression – Your First AI Algorithm!**

Linear Regression is one of the most basic yet powerful algorithms in machine learning. It models the relationship between a dependent variable (Y) and one or more independent variables (X) by fitting a straight line (a plane or hyperplane when there are several inputs).

Real-life analogy:
Imagine you want to predict someone’s weight based on their height. Linear regression draws a line that best fits all the (height, weight) data points to make such predictions.

Mathematical Form:
Y = mX + b

Where:

- Y is the predicted value
- m is the slope
- X is the input variable
- b is the intercept
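Fitting the "best" line usually means ordinary least squares: choosing m and b to minimize the sum of squared differences between predicted and actual Y values, which is what scikit-learn's `LinearRegression` does. For a single input there is a closed form, sketched below with NumPy on the same experience/salary numbers used in the scikit-learn example in this notebook.

```python
import numpy as np

# Toy data: years of experience (X) vs. salary (Y)
X = np.array([1, 2, 3, 4], dtype=float)
Y = np.array([30000, 35000, 40000, 45000], dtype=float)

# Ordinary least squares for a single input:
#   m = sum((X - mean_X) * (Y - mean_Y)) / sum((X - mean_X) ** 2)
#   b = mean_Y - m * mean_X
x_mean, y_mean = X.mean(), Y.mean()
m = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)
b = y_mean - m * x_mean

print(m, b)       # slope and intercept: 5000.0 25000.0
print(m * 5 + b)  # prediction for X = 5: 50000.0
```

The line Y = 5000X + 25000 passes exactly through all four points, so each extra year of experience adds 5000 to the predicted salary.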


Use cases:

- Predicting house prices
- Forecasting sales
- Estimating stock trends
- Any problem involving continuous numerical prediction


Tools used:
Python Libraries: Scikit-learn, Pandas, Matplotlib

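```python
from sklearn.linear_model import LinearRegression
import pandas as pd

# Sample data: years of experience and the corresponding salary
data = {'Experience': [1, 2, 3, 4], 'Salary': [30000, 35000, 40000, 45000]}
df = pd.DataFrame(data)

# Splitting variables: X must be 2-D (a DataFrame), y is 1-D (a Series)
X = df[['Experience']]
y = df['Salary']

# Model training: fit the best straight line through the points
model = LinearRegression()
model.fit(X, y)

# Predicting: pass a DataFrame with the same column name used during fitting,
# which avoids scikit-learn's "X does not have valid feature names" warning
X_new = pd.DataFrame({'Experience': [5]})
print(model.predict(X_new))  # Predict salary for 5 years of experience
```

Output: `[50000.]`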


