
## 📌 Understanding Pandas: A Powerful Tool for Data Science & Analysis

### 🔹 What is Pandas?

**Pandas** is an open-source Python library used for **data manipulation, analysis, and cleaning**. It provides **fast, flexible, and powerful** tools for handling structured data (tables, spreadsheets, databases).

If you have ever used **Excel**, **Google Sheets**, or **SQL databases**, Pandas gives you **even more control and automation** over data. It is built on top of **NumPy**, making it efficient for numerical computations.

### 🔹 Why is Pandas Important in Data Science & Analysis?

Data Science and Data Analysis **often involve working with large, messy datasets**. Pandas makes this easier by:

✅ **Loading** data from CSV, Excel, JSON, databases, or APIs.  
✅ **Cleaning** messy data by removing missing values, duplicates, and errors.  
✅ **Filtering & sorting** data to find patterns and trends.  
✅ **Performing statistical analysis** like mean, median, and standard deviation.  
✅ **Visualizing** data by integrating with libraries like Matplotlib and Seaborn.  

> **📌 Think of Pandas as a super-powered Excel but with Python, making it more flexible and scalable!**
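Here is a minimal sketch of how several of these steps chain together: it loads, cleans, sorts and summarizes a tiny dataset. The CSV text and its column names are made up for illustration. They are read from an in-memory string with `io.StringIO` so the code runs without a file. With real data you would pass a file path or URL to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV data: one duplicated row and one missing score
csv_text = """name,department,score
Alice,Sales,88
Bob,Support,
Alice,Sales,88
Carol,Marketing,92
"""

# Loading: read_csv accepts a path, a URL, or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Cleaning: drop the exact duplicate row, then the row with a missing score
clean = df.drop_duplicates().dropna()

# Sorting: highest score first
top = clean.sort_values("score", ascending=False)
print(top)

# Statistical analysis on the cleaned column
print("Mean score:", clean["score"].mean())
```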

---

## 🔹 Pandas Series: A Single Column of Data

A **Pandas Series** is like a **one-dimensional labeled list**. It is useful when working with **a single variable** (e.g., temperature, stock prices, sales).

### 🎯 Example 1: Tracking Daily Stock Prices

```python
import pandas as pd  

# Creating a Pandas Series
stock_prices = pd.Series([180, 185, 190, 200, 195, 210, 220],
                         index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])

print(stock_prices)
```

### 🖥️ Output:
```
Mon    180
Tue    185
Wed    190
Thu    200
Fri    195
Sat    210
Sun    220
dtype: int64
```

### 🔹 Key Features of a Series

- **Acts like a dictionary**: The index (Mon, Tue, etc.) represents labels.
- **Holds numerical or text data**: Numbers, dates, or even words.
- **Fast and efficient**: Uses NumPy for calculations.
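You can check the dictionary-like behaviour and the NumPy backing directly. This sketch rebuilds the `stock_prices` Series from Example 1. Note that `in` tests the index labels, not the values.

```python
import numpy as np
import pandas as pd

stock_prices = pd.Series([180, 185, 190, 200, 195, 210, 220],
                         index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])

# Dictionary-style lookup by label
print(stock_prices["Wed"])                    # 190

# Membership checks the index labels, like keys in a dict
print("Fri" in stock_prices)                  # True
print(180 in stock_prices)                    # False: 180 is a value, not a label
print(180 in stock_prices.to_numpy())         # True: searches the values instead

# .loc selects by label, .iloc by integer position
print(stock_prices.loc["Thu"], stock_prices.iloc[0])  # 200 180

# The data is stored in a NumPy array
print(isinstance(stock_prices.to_numpy(), np.ndarray))  # True
```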

### ✅ Common Operations on Pandas Series

#### 📌 Finding the Highest and Lowest Stock Prices

```python
max_price = stock_prices.max()
min_price = stock_prices.min()
print("Highest Stock Price:", max_price)
print("Lowest Stock Price:", min_price)
```
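`max()` and `min()` return the prices themselves. To find *which day* they occurred on, `idxmax()` and `idxmin()` return the matching index label. If several days tie, they return the first one.

```python
import pandas as pd

stock_prices = pd.Series([180, 185, 190, 200, 195, 210, 220],
                         index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])

# idxmax()/idxmin() return the index label of the highest/lowest value
best_day = stock_prices.idxmax()
worst_day = stock_prices.idxmin()

print(f"Highest price {stock_prices.max()} on {best_day}")  # Highest price 220 on Sun
print(f"Lowest price {stock_prices.min()} on {worst_day}")  # Lowest price 180 on Mon
```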

#### 📌 Find Days When Stock Price Was Above 190

```python
high_prices = stock_prices[stock_prices > 190]
print(high_prices)
```
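The expression `stock_prices > 190` is itself a Series of `True`/`False` values with the same index. Indexing with it keeps only the `True` rows. Because `True` counts as 1, the same mask can also count the matching days or give their share. This is a small sketch on the Example 1 data.

```python
import pandas as pd

stock_prices = pd.Series([180, 185, 190, 200, 195, 210, 220],
                         index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])

# A boolean Series: True where the price is above 190
mask = stock_prices > 190
print(mask)

# True counts as 1, so sum() counts the days and mean() gives the fraction
print("Days above 190:", mask.sum())                      # 4
print("Share of days above 190:", round(mask.mean(), 2))  # 0.57
```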

#### 📌 Calculating the Average Stock Price

```python
average_price = stock_prices.mean()
print("Average Stock Price:", average_price)
```
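The other statistics mentioned at the start of this notebook (median, standard deviation) work the same way. `describe()` prints several of them at once. Note that pandas' `std()` computes the *sample* standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to the population version (`ddof=0`).

```python
import pandas as pd

stock_prices = pd.Series([180, 185, 190, 200, 195, 210, 220],
                         index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])

print("Median:", stock_prices.median())          # Median: 195.0
print("Std dev:", round(stock_prices.std(), 2))  # sample std (ddof=1), about 14.1

# count, mean, std, min, quartiles and max in one call
print(stock_prices.describe())
```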

#### 📌 Applying Mathematical Operations

```python
adjusted_prices = stock_prices * 1.05  # Increase by 5%
print(adjusted_prices)
```

### 📌 Real-World Use Case:
Stock market analysts **track stock trends** to make **investment decisions**.
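Two common ways to look at a trend rather than single prices are the day-over-day percentage change (`pct_change()`) and a moving average (`rolling(...).mean()`). The 3-day window below is an arbitrary choice for illustration.

```python
import pandas as pd

stock_prices = pd.Series([180, 185, 190, 200, 195, 210, 220],
                         index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])

# Percent change from the previous day; Mon has no previous day, so it is NaN
daily_change = stock_prices.pct_change() * 100
print(daily_change.round(2))

# 3-day moving average smooths daily noise; the first two days are NaN
moving_avg = stock_prices.rolling(window=3).mean()
print(moving_avg.round(2))
```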

---

## 🔹 Pandas DataFrame: A Table for Structured Data

A **Pandas DataFrame** is a **2D table** (like an Excel sheet). It can store multiple **columns** of data, where each column is a **Series**.

### 🎯 Example 2: Employee Salary Data

```python
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "Job Role": ["Chef", "Waiter", "Manager", "Cashier"],
    "Salary": [50000, 30000, 70000, 40000]
}

df = pd.DataFrame(data)
print(df)
```

### 🖥️ Output:
```
      Name  Age  Job Role  Salary
0    Alice   25      Chef   50000
1      Bob   30    Waiter   30000
2  Charlie   22   Manager   70000
3    David   28   Cashier   40000
```
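To confirm that each column really is a Series, and to inspect the table's overall shape and column types, you could run something like this on the same data.

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "Job Role": ["Chef", "Waiter", "Manager", "Cashier"],
    "Salary": [50000, 30000, 70000, 40000]
}
df = pd.DataFrame(data)

# Selecting a single column returns a Series
salary = df["Salary"]
print(type(salary))          # <class 'pandas.core.series.Series'>

print(df.shape)              # (4, 4): 4 rows, 4 columns
print(df.columns.tolist())   # ['Name', 'Age', 'Job Role', 'Salary']
print(df.dtypes)             # the data type of each column
```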

### ✅ Common Operations on Pandas DataFrame

#### 📌 Sorting Employees by Salary

```python
df_sorted = df.sort_values(by="Salary", ascending=False)
print(df_sorted)
```
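If you only need the top few rows, `nlargest()` expresses "sort descending and take the first n" in one call. This sketch picks the two highest salaries.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "Job Role": ["Chef", "Waiter", "Manager", "Cashier"],
    "Salary": [50000, 30000, 70000, 40000]
})

# Same rows as df.sort_values(by="Salary", ascending=False).head(2)
top_two = df.nlargest(2, "Salary")
print(top_two[["Name", "Salary"]])
```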

#### 📌 Filtering Employees with Salary Above 40,000

```python
high_salary = df[df["Salary"] > 40000]
print(high_salary)
```
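To filter on several conditions at once, combine boolean masks with `&` (and) or `|` (or). Wrap each condition in parentheses, because `&` binds more tightly than `>` and `<`. `DataFrame.query()` accepts the same filter as a string. Column names containing spaces, such as `Job Role`, must be wrapped in backticks inside a query. The age cut-off here is just an example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "Job Role": ["Chef", "Waiter", "Manager", "Cashier"],
    "Salary": [50000, 30000, 70000, 40000]
})

# Salary above 40,000 AND younger than 25; parentheses are required
young_high_earners = df[(df["Salary"] > 40000) & (df["Age"] < 25)]
print(young_high_earners)

# The same filter written as a query string
same_result = df.query("Salary > 40000 and Age < 25")
print(same_result)
```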

#### 📌 Calculating Average Salary

```python
average_salary = df["Salary"].mean()
print("Average Salary:", average_salary)
```
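When you need several summary numbers for the same column, `agg()` with a list of function names returns them together as a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Salary": [50000, 30000, 70000, 40000]
})

# One Series indexed by statistic name: mean, median, min, max
summary = df["Salary"].agg(["mean", "median", "min", "max"])
print(summary)
```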

#### 📌 Adding a New Column for Bonus (10% of Salary)

```python
df["Bonus"] = df["Salary"] * 0.10
print(df)
```
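Assigning with `df["Bonus"] = ...` modifies `df` in place. If you want to keep the original table untouched, `assign()` returns a new DataFrame instead. The "Total Pay" column below is an illustrative addition.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Salary": [50000, 30000, 70000, 40000]
})

# assign() returns a copy with the new column; df itself is unchanged
with_bonus = df.assign(Bonus=df["Salary"] * 0.10)
with_bonus["Total Pay"] = with_bonus["Salary"] + with_bonus["Bonus"]
print(with_bonus)

print("Bonus" in df.columns)  # False: the original df was not modified
```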

---

## 🔹 Advanced Real-World Examples with Analysis

### 1️⃣ Climate Change Data Analysis 🌍

```python
import pandas as pd

climate_data = pd.DataFrame({
    "Year": [2000, 2005, 2010, 2015, 2020],
    "Avg Temperature": [14.3, 14.5, 14.8, 15.1, 15.5]
})

# Calculate the temperature trend
climate_data['Temp Change'] = climate_data["Avg Temperature"].diff()
print(climate_data)
```

**Analysis:**

- The data shows a **steady increase** in global average temperature over the years.
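- The `diff()` function subtracts each row's value from the one before it, so `Temp Change` holds the change over each five-year interval (the first row is `NaN` because it has no predecessor). In this sample data the increments grow from about 0.2 to 0.4, which suggests the warming is accelerating.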
- This analysis helps policymakers and environmentalists make decisions on climate action strategies.
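One caveat when checking the trend in code: subtracting decimal values such as 14.5 - 14.3 produces floating-point results like `0.1999999...`, so comparing raw `diff()` output can be misleading. This sketch rounds to the precision of the input data (one decimal place, an assumption based on the sample values) before checking whether the increments ever shrink.

```python
import pandas as pd

climate_data = pd.DataFrame({
    "Year": [2000, 2005, 2010, 2015, 2020],
    "Avg Temperature": [14.3, 14.5, 14.8, 15.1, 15.5]
})

# Round away floating-point noise left by the subtraction
temp_change = climate_data["Avg Temperature"].diff().round(1)

# Drop the leading NaN (no earlier year to compare with)
increments = temp_change.dropna()
print(increments.tolist())  # [0.2, 0.3, 0.3, 0.4]

# True if each increment is at least as large as the previous one
print("Increments never shrink:", increments.is_monotonic_increasing)  # True
```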

---

### 2️⃣ Sales Performance Analysis 🛒

```python
sales_data = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet", "Monitor"],
    "Sales": [500, 1200, 300, 700],
    "Revenue": [500000, 900000, 150000, 350000]
})

# Find the product with the highest revenue (revenue, not profit: costs are not in the data)
most_profitable = sales_data[sales_data["Revenue"] == sales_data["Revenue"].max()]
print(most_profitable)
```

**Analysis:**

- The `max()` function helps determine which product generates the highest revenue.
- This insight allows businesses to **prioritize high-revenue products** for marketing and inventory management.
- A similar approach can be used to identify **low-performing products** that may need discounts or promotions.
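The low-performer check can be sketched with `idxmin()` and `nsmallest()`. Unlike the boolean-mask comparison above, which returns every product tied for the maximum, `idxmin()`/`idxmax()` return only the first matching row. The revenue-per-unit column assumes that `Sales` counts units sold, which the original table does not state.

```python
import pandas as pd

sales_data = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet", "Monitor"],
    "Sales": [500, 1200, 300, 700],
    "Revenue": [500000, 900000, 150000, 350000]
})

# Row label of the lowest revenue, then fetch that row with .loc
lowest = sales_data.loc[sales_data["Revenue"].idxmin()]
print("Lowest revenue:", lowest["Product"])  # Lowest revenue: Tablet

# The two lowest-revenue products, e.g. candidates for a promotion
bottom_two = sales_data.nsmallest(2, "Revenue")
print(bottom_two)

# Assumption: "Sales" is units sold, so this is average revenue per unit
sales_data["Revenue per Unit"] = sales_data["Revenue"] / sales_data["Sales"]
print(sales_data[["Product", "Revenue per Unit"]])
```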

---

### 3️⃣ Healthcare Data Analysis 🏥

```python
hospital_data = pd.DataFrame({
    "Hospital": ["A", "B", "C", "D"],
    "Patients": [150, 200, 170, 220],
    "Doctors": [20, 30, 25, 35]
})

# Calculate the patient-to-doctor ratio (patients per doctor)
hospital_data["Patients per Doctor"] = hospital_data["Patients"] / hospital_data["Doctors"]
print(hospital_data)
```

**Analysis:**

- The **patient-to-doctor ratio** (patients per doctor) helps hospitals evaluate their staff allocation.
- A high ratio may indicate **overworked doctors**, leading to potential patient care issues.
- This analysis helps **healthcare administrators** optimize staffing levels.
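To turn the ratio into an actionable list, you can compare it with a staffing threshold. The limit of 7 patients per doctor below is a hypothetical value chosen only for illustration, not a real healthcare standard.

```python
import pandas as pd

hospital_data = pd.DataFrame({
    "Hospital": ["A", "B", "C", "D"],
    "Patients": [150, 200, 170, 220],
    "Doctors": [20, 30, 25, 35]
})

MAX_PATIENTS_PER_DOCTOR = 7  # hypothetical threshold, for illustration only

hospital_data["Patients per Doctor"] = (
    hospital_data["Patients"] / hospital_data["Doctors"]
).round(2)

# Boolean column: True where the ratio exceeds the threshold
hospital_data["Understaffed"] = hospital_data["Patients per Doctor"] > MAX_PATIENTS_PER_DOCTOR
print(hospital_data)

flagged = hospital_data.loc[hospital_data["Understaffed"], "Hospital"].tolist()
print("Hospitals to review:", flagged)  # Hospitals to review: ['A']
```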

---

### 4️⃣ Education System Analysis 🎓

```python
education_data = pd.DataFrame({
    "School": ["X", "Y", "Z", "W"],
    "Students": [500, 750, 600, 800],
    "Teachers": [40, 50, 45, 55]
})

# Calculate student-to-teacher ratio
education_data["Student-Teacher Ratio"] = education_data["Students"] / education_data["Teachers"]
print(education_data)
```

**Analysis:**

- A **high student-to-teacher ratio** may indicate **overcrowded classrooms**.
- This helps **educational planners** determine where to hire more teachers.
- A lower ratio is often associated with **better student performance**, since each student can receive more teacher attention.
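As a sketch of how a planner might estimate "where to hire more teachers", the code below assumes a hypothetical target of at most 13 students per teacher. It computes how many additional teachers each school would need, rounded up to whole teachers and never negative, and then ranks schools from most to least crowded.

```python
import numpy as np
import pandas as pd

education_data = pd.DataFrame({
    "School": ["X", "Y", "Z", "W"],
    "Students": [500, 750, 600, 800],
    "Teachers": [40, 50, 45, 55]
})

TARGET_RATIO = 13  # hypothetical target: at most 13 students per teacher

education_data["Student-Teacher Ratio"] = education_data["Students"] / education_data["Teachers"]

# Teachers required to meet the target, rounded up to whole people
required = np.ceil(education_data["Students"] / TARGET_RATIO).astype(int)

# Extra teachers needed; clip at 0 for schools already under the target
education_data["Teachers Needed"] = (required - education_data["Teachers"]).clip(lower=0)

# Most crowded schools first
ranked = education_data.sort_values("Student-Teacher Ratio", ascending=False)
print(ranked)
```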

---

### 5️⃣ Transportation Efficiency Study 🚆

```python
transport_data = pd.DataFrame({
    "City": ["New York", "London", "Tokyo", "Paris"],
    "Avg Commute Time (min)": [45, 40, 35, 50],
    "Public Transport Usage (%)": [70, 80, 85, 75]
})

# Find the city with the shortest commute time
best_commute = transport_data[transport_data["Avg Commute Time (min)"] == transport_data["Avg Commute Time (min)"].min()]
print(best_commute)
```

**Analysis:**

- The city with the shortest commute time can serve as a **benchmark for efficiency**.
- High **public transport usage** is often linked to **reduced traffic congestion** and **lower carbon emissions**.
- City planners can use this data to **improve public transport policies** and **invest in infrastructure**.
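Two small follow-ups on this table: `idxmin()` is a shorter way to locate the shortest commute, and `Series.corr()` (Pearson correlation by default) measures how commute time and public transport usage move together. With only four cities this is an illustration of the API, not evidence of a real relationship, and correlation alone does not show cause and effect.

```python
import pandas as pd

transport_data = pd.DataFrame({
    "City": ["New York", "London", "Tokyo", "Paris"],
    "Avg Commute Time (min)": [45, 40, 35, 50],
    "Public Transport Usage (%)": [70, 80, 85, 75]
})

# Row label of the smallest commute time, then read the City column
best = transport_data.loc[transport_data["Avg Commute Time (min)"].idxmin(), "City"]
print("Shortest commute:", best)  # Shortest commute: Tokyo

# Pearson correlation between the two columns (negative: more transit use, shorter commutes)
corr = transport_data["Avg Commute Time (min)"].corr(
    transport_data["Public Transport Usage (%)"]
)
print("Correlation:", round(corr, 2))  # Correlation: -0.8
```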

---

## 🔹 Conclusion

Pandas is an essential tool for **data-driven decision-making** across industries. By leveraging Pandas:

- **Businesses** can optimize sales strategies.
- **Hospitals** can allocate resources efficiently.
- **Schools** can improve student-teacher ratios.
- **Governments** can plan better transportation systems.
- **Scientists** can analyze climate trends to combat global warming.

Mastering Pandas **empowers professionals** to uncover insights and make data-driven decisions that lead to real-world improvements! 🚀
