# Python for Data Analysis - Week 2
## Practice Exercises: Pandas Fundamentals I (Part 2)

### Overview
This notebook continues the practice exercises for Week 2's Pandas Fundamentals session. Please complete Part 1 before starting this notebook, since the exercises here build on the same dataset, which the Setup cell below reloads.

### Instructions
1. Read each exercise carefully
2. Write your code in the provided cells
3. Run your code to check your solution
4. Compare your approach with the provided solution
5. If you're stuck, review the lecture materials or ask for help

## Setup

First, let's import the necessary libraries and reload the dataset we used in Part 1.

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# For plotting in the notebook
%matplotlib inline

# Set display options
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.max_rows', 15)       # Limit number of rows shown
pd.set_option('display.width', 1000)        # Set width of display

# Load the numeric_data.csv file
numeric_df = pd.read_csv('../Data/numeric_data.csv')

print("Libraries imported and data loaded successfully!")
print(numeric_df.head())
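
The solutions in this notebook refer to the columns `value1`, `value2`, `value3`, and `category`. As an optional sanity check after loading, the sketch below reports which of those columns are absent. The helper name `find_missing_columns` and the `stand_in` DataFrame are hypothetical, introduced only for illustration; they are not part of the course materials.

```python
import pandas as pd

# Columns that the solutions in this notebook refer to
REQUIRED_COLUMNS = ["value1", "value2", "value3", "category"]

def find_missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Hypothetical stand-in frame that deliberately lacks 'value3'
stand_in = pd.DataFrame({"value1": [1.0], "value2": [2.0], "category": ["A"]})

absent = find_missing_columns(stand_in)
print(absent)
```

In the notebook you would call `find_missing_columns(numeric_df)` instead; an empty list means every column used below is present.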

## Section 2: Column Selection and Basic Operations

In this section, we'll practice selecting columns and performing basic operations on DataFrames.

### Exercise 2.1: Selecting Columns

Using the `numeric_df` DataFrame, perform the following column selections:

1. Select a single column using bracket notation
2. Select a single column using dot notation
3. Select multiple columns using a list of column names
4. Select the first three columns using position

In [None]:
# Your code here


### Solution 2.1

In [None]:
# 1. Select a single column using bracket notation
print("Single column using bracket notation:")
print(numeric_df['value1'])

# 2. Select a single column using dot notation
print("\nSingle column using dot notation:")
print(numeric_df.value2)

# 3. Select multiple columns using a list of column names
print("\nMultiple columns using a list:")
print(numeric_df[['value1', 'value3']])

# 4. Select the first three columns using position
print("\nFirst three columns using position:")
print(numeric_df.iloc[:, 0:3])
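
Dot notation is convenient, but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The sketch below illustrates this on a small toy DataFrame; the `demo` frame and its column names are assumptions made up for the example, not columns of `numeric_df`.

```python
import pandas as pd

# Hypothetical toy frame; the column names are chosen to show the pitfalls
demo = pd.DataFrame({
    "value1": [1, 2, 3],
    "count": [10, 20, 30],          # same name as the DataFrame.count method
    "unit price": [0.5, 1.5, 2.5],  # contains a space
})

via_dot = demo.value1         # works: valid identifier, no name clash
via_bracket = demo["count"]   # bracket notation always returns the column
dot_count = demo.count        # this is the bound method, not the column

print(via_dot.equals(demo["value1"]))
print(callable(dot_count))
print(demo["unit price"].tolist())  # a name with a space needs brackets
```

For this reason bracket notation is the safer default, especially when column names come from an external file.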

### Exercise 2.2: Creating New Columns

Using the `numeric_df` DataFrame, perform the following operations:

1. Create a new column called `total` that is the row-wise sum of `value1`, `value2`, and `value3`
2. Create a new column called `average` that is the row-wise average of `value1`, `value2`, and `value3`
3. Create a new column called `above_average` that is True if `value1` is greater than the mean of the `value1` column, and False otherwise
4. Create a new column called `category_code` that maps the `category` values 'A' to 1, 'B' to 2, and 'C' to 3

In [None]:
# Your code here


### Solution 2.2

In [None]:
# 1. Create a total column
numeric_df['total'] = numeric_df['value1'] + numeric_df['value2'] + numeric_df['value3']

# 2. Create an average column
numeric_df['average'] = numeric_df[['value1', 'value2', 'value3']].mean(axis=1)

# 3. Create a boolean column based on value1
value1_mean = numeric_df['value1'].mean()
numeric_df['above_average'] = numeric_df['value1'] > value1_mean

# 4. Create a category_code column
category_mapping = {'A': 1, 'B': 2, 'C': 3}
numeric_df['category_code'] = numeric_df['category'].map(category_mapping)

# Display the updated DataFrame
numeric_df
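
One behavior of `Series.map` worth knowing: any value that is not a key in the mapping dictionary becomes NaN, and the result is stored as floats. The sketch below shows this on a toy Series; the extra category 'D' is a hypothetical value added for illustration and is not assumed to occur in the course dataset.

```python
import pandas as pd

# Toy categories; 'D' is hypothetical and has no entry in the mapping
categories = pd.Series(["A", "B", "C", "D"])
category_mapping = {"A": 1, "B": 2, "C": 3}

codes = categories.map(category_mapping)

# Values missing from the mapping turn into NaN, so the dtype is float64
print(codes)

# List the original values that could not be mapped
unmapped = categories[codes.isna()]
print(unmapped.tolist())
```

Checking `codes.isna().sum()` after a mapping is a quick way to catch categories you forgot to include.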

### Exercise 2.3: Handling Missing Values

Create a copy of `numeric_df` called `missing_df` and introduce some missing values (NaN) into it. Then:

1. Count the number of missing values in each column
2. Drop rows with any missing values and save as a new DataFrame
3. Fill missing values with the mean of their respective columns and save as a new DataFrame
4. Fill missing values with different strategies for each column: forward fill for value1, 0 for value2, and the column mean for value3

In [None]:
# Your code here


### Solution 2.3

In [None]:
# Create a copy of numeric_df
missing_df = numeric_df.copy()

# Introduce missing values
missing_df.loc[1, 'value1'] = np.nan
missing_df.loc[3, 'value2'] = np.nan
missing_df.loc[5, 'value3'] = np.nan
missing_df.loc[7, ['value1', 'value2']] = np.nan

print("DataFrame with missing values:")
print(missing_df)

# 1. Count missing values
print("\nMissing values in each column:")
print(missing_df.isna().sum())

# 2. Drop rows with missing values
df_dropped = missing_df.dropna()
print("\nDataFrame after dropping rows with missing values:")
print(df_dropped)

# 3. Fill missing values with column means
# numeric_only=True skips the text 'category' column, which has no mean
df_mean_filled = missing_df.fillna(missing_df.mean(numeric_only=True))
print("\nDataFrame after filling missing values with column means:")
print(df_mean_filled[['value1', 'value2', 'value3']])

# 4. Fill missing values with different strategies
df_custom_filled = missing_df.copy()
df_custom_filled['value1'] = df_custom_filled['value1'].ffill()  # Forward fill
df_custom_filled['value2'] = df_custom_filled['value2'].fillna(0)  # Fill with 0
df_custom_filled['value3'] = df_custom_filled['value3'].fillna(df_custom_filled['value3'].mean())  # Fill with mean

print("\nDataFrame after filling missing values with custom strategies:")
print(df_custom_filled[['value1', 'value2', 'value3']])
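
As an alternative to filling columns one at a time, `fillna` also accepts a dictionary that maps column names to fill values. The sketch below combines that with a forward fill on a toy DataFrame, and also shows that a forward fill cannot fill a missing value in the first row, because there is no earlier value to carry forward. The `toy` frame and its numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in every column
toy = pd.DataFrame({
    "value1": [np.nan, 2.0, np.nan, 4.0],
    "value2": [1.0, np.nan, 3.0, np.nan],
    "value3": [10.0, 20.0, np.nan, 30.0],
})

filled = toy.copy()

# Forward fill value1: each gap takes the last valid value above it
filled["value1"] = filled["value1"].ffill()

# Fill the other columns in one call using a column-to-value dictionary
filled = filled.fillna({"value2": 0, "value3": toy["value3"].mean()})

# The first row of value1 is still NaN: nothing precedes it to carry forward
remaining = filled.isna().sum()
print(filled)
print(remaining)
```

If leading gaps matter for your analysis, follow the forward fill with a second strategy, such as a backward fill or a constant.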