
## Summary of First Session - Machine Learning Zoomcamp

### 1. **Introduction to Machine Learning with Cars Data**
We start with data about cars, including characteristics (features) and prices (target).  
A Machine Learning (ML) model learns patterns from cars whose prices are already known and uses those patterns to predict the price of a car from its characteristics.
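Here is a minimal sketch of this idea. It assumes a linear relationship between features and price and fits a model to a few hypothetical cars, then predicts the price of a car it has not seen. The horsepower, age, and price values are invented for illustration. Linear least squares with NumPy is only one possible choice of model.

```python
import numpy as np

# Hypothetical features (horsepower, age in years) for four cars with known prices.
X = np.array([
    [100, 2],
    [150, 5],
    [200, 1],
    [120, 8],
], dtype=float)
# Hypothetical target: the known prices of those cars.
y = np.array([18000, 20000, 29000, 14000], dtype=float)

# Add a column of ones so the model can learn an intercept (base price).
X_b = np.column_stack([np.ones(len(X)), X])

# "Training": find weights w that minimize the squared error ||X_b @ w - y||^2.
w, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# "Prediction": apply the learned weights to a car whose price is unknown
# (leading 1.0 matches the intercept column).
new_car = np.array([1.0, 180, 3])
predicted_price = new_car @ w
print(f"Predicted price: {predicted_price:.0f}")
```

The invented prices were generated from an exact linear formula (10000 + 100 * hp - 1000 * age), so the fit recovers it exactly and the script prints `Predicted price: 25000`. Real car prices would be noisier.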

---

### 2. **Rules-Based Systems vs. Machine Learning**
- **Rules-Based Systems:**  
  Rules are written by hand in a programming language and applied to data to produce outcomes.  
  As the number of cases grows, extracting and maintaining these rules manually becomes complex and hard to scale.

- **Machine Learning:**  
  Instead of manually coding rules, ML models automatically extract patterns from data using **Mathematics and Statistics**.
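The toy sketch below makes the contrast concrete. The emails, the "number of links" feature, and both thresholds are hypothetical. In the rule-based version, a person picked the threshold by hand. In the ML version, the threshold is learned from labeled examples by choosing the value with the fewest mistakes, which makes it a one-feature decision stump.

```python
# Hypothetical labeled data: (number of links in an email, is_spam)
emails = [(0, False), (1, False), (2, False), (3, True),
          (5, True), (7, True), (1, False), (4, True)]

def rule_based_is_spam(num_links):
    # Rule-based system: a human chose this cutoff by hand.
    return num_links > 5

def count_errors(predict, data):
    # Number of examples where the prediction disagrees with the label.
    return sum(predict(x) != label for x, label in data)

def learn_threshold(data):
    # ML approach (decision stump): try each observed value as a cutoff
    # and keep the one with the fewest errors on the labeled data.
    best_t, best_err = None, len(data) + 1
    for t in sorted({x for x, _ in data}):
        err = count_errors(lambda x: x > t, data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

t = learn_threshold(emails)
print("hand-written rule errors:", count_errors(rule_based_is_spam, emails))  # 3
print("learned threshold:", t, "errors:", count_errors(lambda x: x > t, emails))  # 2, 0
```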

---

### 3. **Supervised Machine Learning**
In supervised learning, models learn from labeled data (with known outcomes) to make predictions on unseen data.  
There are three main types:

- **Classification:** target is a class (e.g., spam or not spam)  
- **Regression:** target is a number (e.g., price)  
- **Ranking:** output is a list of items ordered by a relevance score (e.g., search results or recommendations)
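Many models output a score, and each task type uses that score differently. The sketch below is purely illustrative. The email names, scores, 0.5 cutoff, and price formula are all hypothetical.

```python
# Hypothetical scores that some trained model assigned to four emails.
scores = {"email_a": 0.91, "email_b": 0.12, "email_c": 0.55, "email_d": 0.78}

# Classification: map each score to a class using a cutoff (0.5 is an assumption).
labels = {name: ("spam" if s >= 0.5 else "not spam") for name, s in scores.items()}

# Regression: the output itself is the prediction, e.g. a price.
def predict_price(horsepower):
    # Hypothetical linear price model.
    return 10_000 + 100 * horsepower

# Ranking: order the items by score, highest first.
ranking = sorted(scores, key=scores.get, reverse=True)

print(labels)              # only email_b is "not spam"
print(predict_price(150))  # 25000
print(ranking)             # ['email_a', 'email_d', 'email_c', 'email_b']
```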

---

### 4. **CRISP-DM (Cross Industry Standard Process for Data Mining)**
A structured methodology from the 1990s for organizing ML projects, consisting of the following iterative steps:

1. **Business Understanding** — identify the problem, understand the requirements, define how success will be measured, and decide whether ML is needed at all.  
2. **Data Understanding** — analyze available data and determine quality, reliability, and sufficiency.  
3. **Data Preparation** — transform and clean data, build pipelines, and convert it into tabular form for ML models.  
4. **Modeling** — choose and train models, adjust features, and fix data issues to improve performance.  
5. **Evaluation** — assess if the model meets business goals and performance metrics.  
6. **Deployment** — roll out the model to users, monitor its performance, and ensure scalability and maintainability.

> This process is **iterative**, allowing for continuous improvement and feedback.

---

### 5. **Model Selection**
Split data into **training**, **validation**, and **test** sets.  
Train multiple models, validate performance, select the best one, and finally test it on unseen data to ensure **generalization**.

> *Note:* The validation set does not have to be thrown away. After choosing the best model, combine the **training** and **validation** datasets, retrain the chosen model on them, and only then evaluate it on the **test set**.
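Below is a minimal NumPy sketch of this workflow. It makes several illustrative assumptions: the data is synthetic, the split is 60/20/20, and the "multiple models" are ridge regressions with different regularization strengths. None of these choices is required.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (assumption for illustration): y = 3*x0 - 2*x1 + small noise.
n = 100
X = rng.normal(size=(n, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Shuffle the row indices, then split 60% / 20% / 20%.
idx = rng.permutation(n)
n_val = n_test = int(0.2 * n)
n_train = n - n_val - n_test
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

def fit_ridge(X, y, alpha):
    # Closed-form ridge regression with a bias column.
    # For simplicity the bias is regularized too.
    Xb = np.column_stack([np.ones(len(X)), X])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    return w[0] + X @ w[1:]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Train one candidate per regularization strength and score it on validation data.
candidates = [0.0, 0.1, 1.0, 10.0]
val_scores = {}
for alpha in candidates:
    w = fit_ridge(X[train_idx], y[train_idx], alpha)
    val_scores[alpha] = rmse(y[val_idx], predict(w, X[val_idx]))
best_alpha = min(val_scores, key=val_scores.get)

# Retrain on train + validation, then evaluate once on the untouched test set.
full_idx = np.concatenate([train_idx, val_idx])
w_final = fit_ridge(X[full_idx], y[full_idx], best_alpha)
test_rmse = rmse(y[test_idx], predict(w_final, X[test_idx]))
print(f"best alpha: {best_alpha}, test RMSE: {test_rmse:.3f}")
```

The test set is touched only once, at the very end. This keeps the final RMSE an honest estimate of how the chosen model generalizes.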

---

### 6. **Setting Up the Environment**
Install the following tools and libraries:

- Python  
- NumPy  
- Pandas  
- Matplotlib  
- Scikit-learn  

> The easiest setup option is using **Anaconda**.  
> Optionally, you can use **AWS** for cloud-based experimentation and resources.
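After installing, you can run a quick check to confirm the libraries are importable. This sketch uses only the standard library's `importlib.util.find_spec`. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util
import sys

# Display name -> import name (scikit-learn is imported as "sklearn").
required = {"NumPy": "numpy", "Pandas": "pandas",
            "Matplotlib": "matplotlib", "Scikit-learn": "sklearn"}

def check_environment(packages):
    # Returns {display name: True/False} depending on whether each module can be found.
    return {name: importlib.util.find_spec(module) is not None
            for name, module in packages.items()}

status = check_environment(required)
print(f"Python {sys.version.split()[0]}")
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'MISSING'}")
```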

---

### 7. **Introduction to NumPy**
NumPy is essential for numerical computations, providing efficient operations on arrays, matrices, and linear algebra functions.
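A few of these operations are shown below, using hypothetical price values. Arithmetic on a NumPy array applies to every element at once (vectorization), so no explicit Python loop is needed.

```python
import numpy as np

# Hypothetical car prices stored as a 1-D array.
prices = np.array([18000, 20000, 29000, 14000])

discounted = prices * 0.9       # element-wise multiplication, no Python loop
log_prices = np.log1p(prices)   # vectorized math function: log(1 + x) per element
mean_price = prices.mean()      # aggregate over the whole array

# A 2-D array (matrix) and aggregation along an axis.
M = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
col_means = M.mean(axis=0)      # mean of each column: [1.5, 2.5, 3.5]

# A boolean mask selects the elements that satisfy a condition.
expensive = prices[prices > 19000]  # [20000, 29000]

print(discounted, mean_price, col_means, expensive)
```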

---

### 8. **Linear Algebra**
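Covers the main kinds of multiplication: **vector-vector** (dot product), **matrix-vector**, and **matrix-matrix**.  
Example: creating an **identity matrix** with `np.eye`:

```python
import numpy as np
I = np.eye(3)  # 3x3 identity matrix: ones on the diagonal, zeros elsewhere
```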
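The sketch below writes each type of multiplication as plain Python loops and checks the results against NumPy's built-in `@` operator. The example matrices are arbitrary, and the helper names (`vector_vector`, `matrix_vector`, `matrix_matrix`) are hypothetical.

```python
import numpy as np

def vector_vector(u, v):
    # Dot product: sum of the element-wise products of two equal-length vectors.
    if u.shape[0] != v.shape[0]:
        raise ValueError("vectors must have the same length")
    return sum(u[i] * v[i] for i in range(u.shape[0]))

def matrix_vector(U, v):
    # Entry i is the dot product of row i of U with v.
    return np.array([vector_vector(U[i], v) for i in range(U.shape[0])])

def matrix_matrix(U, V):
    # Column j is U multiplied by column j of V.
    return np.column_stack([matrix_vector(U, V[:, j]) for j in range(V.shape[1])])

u = np.array([2, 4, 5, 6])
v = np.array([1, 0, 0, 2])
U = np.array([[2, 4, 5, 6],
              [1, 2, 1, 2],
              [3, 1, 2, 1]])    # shape (3, 4)
V = np.array([[1, 1, 2],
              [0, 0.5, 1],
              [0, 2, 1],
              [2, 1, 0]])       # shape (4, 3)

print(vector_vector(u, v), u @ v)                # both are 14
print(matrix_vector(U, v), U @ v)                # both are [14, 5, 5]
print(np.allclose(matrix_matrix(U, V), U @ V))   # True

# The identity matrix leaves a matrix unchanged: I @ U == U.
I = np.eye(3)
print(np.allclose(I @ U, U))  # True
```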

---

### 9. **Introduction to Pandas**

**Pandas** is a Python library used for processing, analyzing, and manipulating **tabular data** efficiently.
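The example below shows some typical Pandas operations on tabular data. The DataFrame is hypothetical: its column names (`make`, `year`, `engine_hp`, `msrp`) and values are invented for illustration.

```python
import pandas as pd

# Hypothetical cars data (illustrative values only).
df = pd.DataFrame({
    "make": ["Nissan", "Nissan", "BMW", "Toyota", "BMW"],
    "year": [2015, 2018, 2019, 2017, 2016],
    "engine_hp": [150, 180, 300, 170, 250],
    "msrp": [20000, 24000, 55000, 22000, 41000],
})

# Filter rows with a boolean condition, then select a subset of columns.
recent = df[df["year"] >= 2017][["make", "msrp"]]

# Aggregate: mean price per make.
mean_price_by_make = df.groupby("make")["msrp"].mean()

# Add a derived column computed from existing ones.
df["price_per_hp"] = df["msrp"] / df["engine_hp"]

print(recent)
print(mean_price_by_make)  # BMW 48000.0, Nissan 22000.0, Toyota 22000.0
```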

---

*This summary provides a quick recap of the foundational topics covered in the first session of the ML Zoomcamp.*
