# Class 9: Introduction to Pandas

Welcome to the ninth class of our Python course! Today, we'll explore Pandas, a powerful library for data manipulation and analysis in Python. Pandas provides data structures like Series and DataFrames that make it easier to work with structured data. Let's dive in!

## 1. Pandas Basics

### 1.1. Introduction to Pandas Series and DataFrames

A **Pandas Series** is a one-dimensional labeled array that can hold any data type. You can think of it as a single column in a spreadsheet or database table, where every value has an index label.

A **Pandas DataFrame** is a two-dimensional labeled data structure whose columns can hold different data types. It is similar to a table in a relational database or an Excel spreadsheet.
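To make the relationship between the two structures concrete, here is a small sketch. It builds a DataFrame from two Series that share the same index labels. The names `ages`, `cities`, and `people` are illustrative only. Each Series becomes a column, and selecting a column gives a Series back.

```python
import pandas as pd

# Two Series that share the same index labels
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'])
cities = pd.Series(['New York', 'Los Angeles', 'Chicago'],
                   index=['Alice', 'Bob', 'Charlie'])

# A DataFrame built from a dict of Series: each Series becomes a column,
# and rows are aligned by the shared index labels
people = pd.DataFrame({'Age': ages, 'City': cities})
print(people)

# Selecting a single column returns a Series again
print(type(people['Age']))
```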

**Importing Pandas:**

In [None]:
import pandas as pd

### 1.2. Creating Series and DataFrames

**Creating a Pandas Series:**

In [None]:
# Creating a Series from a list
data = [1, 2, 3, 4, 5]
ser = pd.Series(data)
print(ser)

# Creating a Series with custom indices
ser = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
print(ser)
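A Series can also be created from a dictionary. The keys become the index labels. The sketch below uses made-up fruit prices as an illustration. It also shows that a value can be read either by its label (`.loc`) or by its position (`.iloc`).

```python
import pandas as pd

# Creating a Series from a dictionary: keys become the index labels
prices = pd.Series({'apple': 0.5, 'banana': 0.25, 'cherry': 3.0})
print(prices)

# Access by label with .loc and by position with .iloc
print(prices.loc['banana'])  # 0.25
print(prices.iloc[0])        # 0.5 (the first element)
```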

**Creating a Pandas DataFrame:**

In [None]:
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

# Creating a DataFrame from a list of lists
data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
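A third common option is a list of dictionaries, where each dictionary is one row. The sketch below reuses the sample data from above. It deliberately leaves out `'City'` for one row to show that pandas fills the gap with a missing value (`NaN`). Section 3 covers missing values in more detail.

```python
import pandas as pd

# Creating a DataFrame from a list of dictionaries: each dict is one row,
# and the keys become column names
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35},  # no 'City' key for this row
]
df_records = pd.DataFrame(records)
print(df_records)
print(df_records.shape)  # (3, 3): 3 rows, 3 columns
```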

## 2. DataFrame Operations

### 2.1. Indexing and Selecting Data

Pandas provides various ways to index and select data in a DataFrame.

**Selecting a column:**

In [None]:
# Selecting a single column
print(df['Name'])

# Selecting multiple columns
print(df[['Name', 'City']])
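The number of brackets matters. The sketch below uses a small two-column DataFrame. It shows that a single label returns a Series, while a list of labels returns a DataFrame, even when that list holds only one column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Single brackets with one label return a Series
name_series = df['Name']
# Double brackets (a list of labels) return a DataFrame, even for one column
name_frame = df[['Name']]

print(type(name_series).__name__)  # Series
print(type(name_frame).__name__)   # DataFrame
```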

**Selecting rows using `.loc` and `.iloc`:**

In [None]:
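# Selecting a row by index label using .loc
print(df.loc[0])  # Row labeled 0 (the first row with the default index)

# Selecting a row by integer position using .iloc
print(df.iloc[1])  # Second row (position 1)

# Selecting a subset of rows and columns by label
# (.loc slices include the end label, so 0:1 returns two rows)
print(df.loc[0:1, ['Name', 'City']])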
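Label slicing and positional slicing treat the stop value differently. The sketch below selects the same two rows and two columns both ways. With `.iloc`, the stop position is excluded, just as with Python lists. With `.loc`, the stop label is included. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Label-based: stop label 1 is included -> rows 0 and 1
by_label = df.loc[0:1, ['Name', 'City']]

# Position-based: stop position 2 is excluded -> rows 0 and 1;
# columns are chosen by position too (0 = 'Name', 2 = 'City')
by_position = df.iloc[0:2, [0, 2]]

print(by_label)
print(by_position)
print(by_label.equals(by_position))  # True
```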

### 2.2. Adding and Removing Columns

You can easily add or remove columns in a DataFrame.

**Adding a new column:**

In [None]:
# Adding a new column
df['Salary'] = [70000, 80000, 90000]
print(df)
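When you assign a list, it must contain exactly one value per row. Otherwise pandas raises a `ValueError`. New columns can also be computed from existing ones or set to a single value. The column names `'Age in 5 Years'` and `'Country'` below are illustrative examples.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# A column computed from an existing one; the arithmetic
# is applied element-wise to every row at once
df['Age in 5 Years'] = df['Age'] + 5

# A single scalar value is broadcast to every row
df['Country'] = 'USA'
print(df)
```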

**Removing a column:**

In [None]:
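# Removing a column (axis=1 means columns; axis=0 would mean rows).
# drop() returns a new DataFrame, so the result is assigned back to df.
df = df.drop('Salary', axis=1)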
print(df)
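`drop()` also accepts a `columns=` keyword as a more explicit alternative to `axis=1`. It can take a list to remove several columns at once. The sketch below uses a four-column sample DataFrame for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
    'Salary': [70000, 80000, 90000],
})

# columns= names the columns to remove; a list drops several at once
slim = df.drop(columns=['City', 'Salary'])
print(slim)
```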

## 3. Handling Missing Data

### 3.1. Identifying Missing Data

Missing values are common in real-world data. Pandas typically represents them as `NaN` (or `None` in text columns) and provides functions such as `isnull()` to detect them.

In [None]:
# Creating a DataFrame with missing values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, None, 40],
    'City': ['New York', None, 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
print(df)

# Checking for missing values
print(df.isnull())

# Counting missing values in each column
print(df.isnull().sum())
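The sketch below uses the same sample data to make a few points concrete. In a numeric column, `None` is stored as `NaN`, and the column becomes floating point. `isna()` is an alias of `isnull()`. Chaining two `sum()` calls counts every missing cell in the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, None, 40],
    'City': ['New York', None, 'Chicago', 'Houston'],
})

# In a numeric column, None is stored as NaN and the column becomes float64
print(df['Age'].dtype)  # float64

# isna() is an alias of isnull() and gives the same result
print(df.isna().equals(df.isnull()))  # True

# First sum() counts per column, second sum() totals those counts
print(df.isna().sum().sum())  # 2
```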

### 3.2. Handling Missing Data (Drop, Fill)

Pandas provides several options for handling missing data, including dropping or filling missing values.

**Dropping missing data:**

In [None]:
# Dropping rows with missing values
df_dropped = df.dropna()
print(df_dropped)

# Dropping columns with missing values
df_dropped_col = df.dropna(axis=1)
print(df_dropped_col)
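Dropping every row with any missing value can discard a lot of data. `dropna()` offers finer control. The `subset` parameter limits the check to certain columns. The `thresh` parameter keeps rows that have at least that many non-missing values. The sketch below applies both options to the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, None, 40],
    'City': ['New York', None, 'Chicago', 'Houston'],
})

# Drop only rows where 'Age' is missing; a missing 'City' is tolerated
age_known = df.dropna(subset=['Age'])
print(age_known)

# Keep rows with at least 3 non-missing values (here: the complete rows)
mostly_complete = df.dropna(thresh=3)
print(mostly_complete)
```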

**Filling missing data:**

In [None]:
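# Filling all missing values with a specific value.
# Note: this also writes the string 'Unknown' into the numeric 'Age' column,
# which changes that column's dtype from float64 to object.
df_filled = df.fillna('Unknown')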
print(df_filled)

# Filling missing values with the mean of the column
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)
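`fillna()` also accepts a dictionary that maps column names to fill values. This lets each column use a strategy suited to its type. The sketch below uses the column median for `'Age'` and a placeholder string for `'City'`. Both are illustrative choices, not the only sensible ones.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, None, 40],
    'City': ['New York', None, 'Chicago', 'Houston'],
})

# One fill value per column: numeric 'Age' stays numeric,
# text 'City' gets a placeholder string
filled = df.fillna({'Age': df['Age'].median(), 'City': 'Unknown'})
print(filled)
```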

## 4. Exercises

Now it's time to practice what you've learned! Try to solve the following exercises.

### Exercise 1: Create a DataFrame

Create a DataFrame representing a small dataset with information about three products (Product Name, Price, Quantity). Add a new column that calculates the total value of each product (Price * Quantity).

### Exercise 2: Indexing and Selecting Data

Given the following DataFrame:

In [None]:
data = {
    'Student': ['John', 'Jane', 'Jim', 'Jake'],
    'Math': [85, 92, 78, 88],
    'Science': [90, 85, 80, 70]
}
df = pd.DataFrame(data)

Select the 'Math' scores for the first three students.

### Exercise 3: Handling Missing Data

Create a DataFrame with some missing values. Use different methods to handle the missing data (e.g., drop rows, fill with a specific value, fill with the mean).

### Exercise 4: Add and Remove Columns

Create a DataFrame and add a new column with calculated values. Then, remove one of the existing columns.

### Exercise 5: Summary Statistics

Create a DataFrame and calculate the mean, median, and standard deviation of its numerical columns.

Feel free to experiment with different Pandas functions and explore the vast capabilities of this library. Understanding Pandas is essential for data manipulation and analysis in Python. Happy coding!