# Introduction to Pandas

## 1. What is Pandas?

Pandas is a Python library for data analysis. Built on top of NumPy, it provides labeled data structures, chiefly the `Series` and the `DataFrame`, along with functions for loading, cleaning, reshaping, and aggregating data.

```python
import numpy as np
import pandas as pd
```

### **Exercise:** Install pandas with `pip install pandas` if it is not already installed, then import pandas and print its version.


### **AI Prompt: Understanding Pandas**
- Explain Pandas as if you're teaching a 10-year-old.
- What are the key differences between Pandas and NumPy?
- Why do we need Pandas in data analysis? Provide an example.

## 2. Pandas Objects - Series

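A `Series` is a one-dimensional array of values paired with an index of labels. You can build one from a list (with optional `index` labels) or from a dict, whose keys become the index.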

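```python
# From a list of values plus an explicit list of index labels
series1 = pd.Series([1, 2], index=['a', 'b'])
# From a dict: the keys become the index labels
series2 = pd.Series({"a": 1, "b": 2})
print(series1)
print(series2)
```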
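Because every value in a Series carries a label, operations between two Series line up by label rather than by position. The short sketch below (the names `s_a`, `s_b`, and `total` are illustrative) shows the index, the underlying NumPy values, and label alignment, where a label present in only one Series produces `NaN`.

```python
import pandas as pd

# Two Series whose index labels only partially overlap
s_a = pd.Series([1, 2, 3], index=["a", "b", "c"])
s_b = pd.Series([10, 20, 30], index=["b", "c", "d"])

print(s_a.index)       # the labels
print(s_a.to_numpy())  # the underlying values as a NumPy array

# Arithmetic aligns on labels, not on position:
# 'b' and 'c' exist in both Series, while 'a' and 'd' become NaN
total = s_a + s_b
print(total)
```

The result's index is the union of both indexes, which is why `total` has four entries even though each input has three.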

### **Exercise:** Create a Series with five city names as values and the corresponding country names as index labels.


### **AI Prompt: Exploring Pandas Series**
- How does a Pandas Series differ from a Python list?
- Can you provide a real-world example where using a Series is beneficial?
- Generate additional exercises that explore Series operations.

## 3. Pandas Objects - DataFrame

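A `DataFrame` is a two-dimensional table with labeled rows (the index) and labeled columns. Each column is a `Series`, and different columns can hold different data types.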

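```python
# From a dict of columns: the keys become column names
df1 = pd.DataFrame({'state': ['Ohio', 'California'], 'year': [2000, 2010]})
print(df1)

# From a 3x2 NumPy array of random floats in [0, 1), with explicit column names
df2 = pd.DataFrame(np.random.rand(3, 2), columns=['foo', 'bar'])
print(df2)
```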
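Beyond construction, a few attributes and operations come up constantly. This sketch reuses the `state`/`year` data from above; the `after_2005` column is a made-up example of deriving a new column from an existing one.

```python
import pandas as pd

df = pd.DataFrame({"state": ["Ohio", "California"], "year": [2000, 2010]})

# Each column can have its own dtype
print(df.dtypes)

# Selecting one column returns a Series that shares the DataFrame's index
years = df["year"]
print(type(years))

# Assigning to a new column name adds a column (here a boolean comparison)
df["after_2005"] = df["year"] > 2005
print(df)
print(df.shape)  # (number of rows, number of columns)
```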

### **Exercise:** Create a DataFrame with student names, their subjects, and corresponding grades.


### **AI Prompt: Working with DataFrames**
- Compare a DataFrame with an SQL table. How are they similar and different?
- What are some common operations performed on a DataFrame?
- Describe a scenario where a DataFrame is more useful than a Series.

## 4. Indexing and Selection

Pandas offers two main accessors for selecting data: `.loc[]` selects by index label, while `.iloc[]` selects by integer position (0 for the first element), regardless of the labels.

```python
s1 = pd.Series(range(10, 14), index=list("abcd"))  # values 10..13, labels a..d
print(s1.loc['b'])   # by label 'b' -> 11
print(s1.iloc[1])    # by position 1 (the second element) -> 11
```
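The two accessors differ in more than labels versus positions: `.loc` slices include the end label, `.iloc` slices exclude the end position (like Python lists), and an integer index makes the distinction easy to trip over. A small sketch, where the Series `s2` is a hypothetical example:

```python
import pandas as pd

s1 = pd.Series(range(10, 14), index=list("abcd"))

# Label slices with .loc include BOTH endpoints: a, b, c
print(s1.loc["a":"c"])

# Position slices with .iloc exclude the end position: a, b
print(s1.iloc[0:2])

# A boolean mask selects by condition: c and d (values 12 and 13)
print(s1.loc[s1 > 11])

# With an integer index, labels and positions no longer coincide
s2 = pd.Series(["x", "y", "z"], index=[2, 1, 0])
print(s2.loc[0])   # label 0 -> 'z'
print(s2.iloc[0])  # first position -> 'x'
```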

### **Exercise:** Given a Series of five countries and their populations, retrieve the population of a specific country using `.loc[]`.


### **AI Prompt: Mastering Indexing**
- What is the difference between `.loc[]` and `.iloc[]` in Pandas?
- How can incorrect indexing lead to errors in data analysis?
- Generate a challenging indexing problem and explain how to solve it.

## 5. Handling Missing Data

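Pandas marks missing values as `NaN` (a `None` in a numeric column is converted to `NaN`). `fillna()` replaces them with a given value, and `dropna()` removes the rows (or columns) that contain them.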

```python
# None in a numeric column is stored as NaN
df_missing = pd.DataFrame({'A': [1, 2, None], 'B': [None, 3, 4]})
df_filled = df_missing.fillna(0)  # replace every NaN with 0; returns a new DataFrame
print(df_filled)
```
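`fillna(0)` is only one option. The sketch below, built on the same `df_missing` frame, counts missing cells with `isna()`, removes incomplete rows with `dropna()`, and passes a dict to `fillna()` to use a different fill value per column (the values `-1` and `0` are arbitrary choices for illustration).

```python
import pandas as pd

df_missing = pd.DataFrame({"A": [1, 2, None], "B": [None, 3, 4]})

# isna() flags each missing cell; summing counts them per column
print(df_missing.isna().sum())

# dropna() drops every row that contains at least one NaN (only row 1 survives)
dropped = df_missing.dropna()
print(dropped)

# fillna() also accepts a dict mapping column names to fill values
filled = df_missing.fillna({"A": -1, "B": 0})
print(filled)
```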

### **Exercise:** Create a DataFrame with missing values and replace them with the column mean.


### **AI Prompt: Handling Missing Data**
- Why is handling missing data important in real-world datasets?
- Explain the differences between `.fillna()`, `.dropna()`, and `.interpolate()`.
- What are some strategies for handling missing categorical values?

## 6. MultiIndex (Hierarchical Indexing)

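A `MultiIndex` gives an axis several index levels, so each row is identified by a tuple of keys such as (state, year). This lets a two-dimensional DataFrame hold higher-dimensional data and be selected by any level.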

```python
# Each tuple is one (State, Year) key
index = [('California', 2000), ('California', 2010), ('New York', 2000)]
populations = [33871648, 37253956, 18976457]
ind = pd.MultiIndex.from_tuples(index, names=('State', 'Year'))  # two named levels
df_multi = pd.DataFrame({'Population': populations}, index=ind)
print(df_multi)
```
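Once the MultiIndex exists, rows can be selected by the outer level, by a full key tuple, or by an inner level through `xs()`, and `unstack()` pivots a level into columns. A sketch using the population data above (the names `california`, `year_2000`, and `wide` are illustrative):

```python
import pandas as pd

index = [("California", 2000), ("California", 2010), ("New York", 2000)]
populations = [33871648, 37253956, 18976457]
ind = pd.MultiIndex.from_tuples(index, names=("State", "Year"))
df_multi = pd.DataFrame({"Population": populations}, index=ind)

# Selecting an outer-level label returns that state's rows, indexed by Year
california = df_multi.loc["California"]
print(california)

# A full (State, Year) key plus a column name selects a single value
print(df_multi.loc[("California", 2010), "Population"])

# xs() selects on an inner level by name: every state in 2000
year_2000 = df_multi.xs(2000, level="Year")
print(year_2000)

# unstack() moves the Year level into columns; the missing New York 2010 becomes NaN
wide = df_multi["Population"].unstack("Year")
print(wide)
```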

### **Exercise:** Create a MultiIndex DataFrame for sales data across multiple years and retrieve sales for a specific year.


### **AI Prompt: Exploring MultiIndex**
- When should you use MultiIndex instead of a simple index?
- How does MultiIndex impact performance in large datasets?
- Can you create a dataset where MultiIndexing is necessary?

## Final Comprehensive Exercise

- Create a dataset of 10 employees with **Name, Age, Department, Salary, and Year of Joining**.
- Retrieve employees in a specific department.
- Fill missing salaries with the department average.
- Create a MultiIndex grouping employees by **Department and Year of Joining**.
- Display the department with the highest average salary.
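The steps that fill salaries by department and find the highest average salary rely on `groupby`, which this notebook does not otherwise cover. One possible approach is sketched below; the four-employee table is made-up sample data rather than the ten-employee dataset the exercise asks for, and the column names simply follow the exercise wording.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for illustration only
employees = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cal", "Dee"],
    "Department": ["Sales", "Sales", "IT", "IT"],
    "Salary": [50000, np.nan, 70000, 90000],
    "Year of Joining": [2019, 2020, 2019, 2021],
})

# transform("mean") returns each row's department mean (NaN is skipped),
# aligned to the original rows, so it can feed fillna() directly
dept_mean = employees.groupby("Department")["Salary"].transform("mean")
employees["Salary"] = employees["Salary"].fillna(dept_mean)

# A MultiIndex on Department and Year of Joining, sorted for fast lookups
by_dept_year = employees.set_index(["Department", "Year of Joining"]).sort_index()
print(by_dept_year)

# Average salary per department, then the label with the largest average
avg_salary = employees.groupby("Department")["Salary"].mean()
top_department = avg_salary.idxmax()
print(avg_salary)
print(top_department)
```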