# Python for Data Analysis - Week 2
## Practice Exercises: Pandas Fundamentals I (Part 3)

### Overview
This notebook is a continuation of the practice exercises for Week 2's Pandas Fundamentals session. Please complete Parts 1 and 2 before starting this notebook.

### Instructions
1. Read each exercise carefully
2. Write your code in the provided cells
3. Run your code to check your solution
4. Compare your approach with the provided solution
5. If you're stuck, review the lecture materials or ask for help

## Setup

First, let's import the necessary libraries and reload the dataset used in the previous parts.

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# For plotting in the notebook
%matplotlib inline

# Set display options
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.max_rows', 15)       # Limit number of rows shown
pd.set_option('display.width', 1000)        # Set width of display

# Load the numeric_data.csv file
numeric_df = pd.read_csv('../Data/numeric_data.csv')

print("Libraries imported and data loaded successfully!")
print(numeric_df.head())

## Section 3: Row Selection and Filtering

In this section, we'll practice filtering rows based on various conditions.
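
Every filter in this section follows the same pattern: a comparison produces a boolean Series (a "mask"), and indexing the DataFrame with that mask keeps the rows marked `True`. A minimal sketch of the pattern on a small made-up DataFrame (`toy` and its values are illustrative assumptions, not the contents of `numeric_data.csv`):

```python
import pandas as pd

# Hypothetical toy data; the real numeric_df will have different values.
toy = pd.DataFrame({
    "category": ["A", "B", "A", "C"],
    "value1": [5.0, 7.5, 6.5, 4.0],
})

# A comparison yields one True/False per row, aligned on the index.
mask = toy["value1"] > 6
print(mask.tolist())  # [False, True, True, False]

# Indexing with the mask keeps only the rows marked True.
filtered = toy[mask]
print(filtered)
```

Storing the mask in a variable first, as above, can make longer conditions easier to read and debug.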

### Exercise 3.1: Basic Filtering

Using the `numeric_df` DataFrame, perform the following row selections:

1. Select rows where value1 is greater than 6
2. Select rows where category is 'A'
3. Select rows where value2 is between 10 and 12 (inclusive)
4. Select the first 3 rows of the DataFrame

In [None]:
# Your code here


### Solution 3.1

In [None]:
# 1. Select rows where value1 is greater than 6
print("Rows where value1 > 6:")
print(numeric_df[numeric_df['value1'] > 6])

# 2. Select rows where category is 'A'
print("\nRows where category is 'A':")
print(numeric_df[numeric_df['category'] == 'A'])

# 3. Select rows where value2 is between 10 and 12 (inclusive)
print("\nRows where 10 <= value2 <= 12:")
print(numeric_df[(numeric_df['value2'] >= 10) & (numeric_df['value2'] <= 12)])

# 4. Select the first 3 rows
print("\nFirst 3 rows:")
print(numeric_df.iloc[0:3])
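
As an alternative to the solution above, an inclusive range check can also be written with `Series.between`, and the first rows can be taken with `head`. The sketch below uses a made-up `toy` DataFrame (an assumption for illustration) and checks that both spellings agree:

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({"value2": [9.5, 10.0, 11.2, 12.0, 12.1]})

# between() includes both endpoints by default.
in_range = toy[toy["value2"].between(10, 12)]

# Same filter written as two comparisons joined with &.
explicit = toy[(toy["value2"] >= 10) & (toy["value2"] <= 12)]

print(in_range["value2"].tolist())  # [10.0, 11.2, 12.0]
print(in_range.equals(explicit))    # True

# head(3) and iloc[0:3] both return the first three rows.
first_three = toy.head(3)
print(first_three.equals(toy.iloc[0:3]))  # True
```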

### Exercise 3.2: Advanced Filtering

Using the `numeric_df` DataFrame, perform the following advanced row selections:

1. Select rows where category is 'A' or 'B'
2. Select rows where value1 is greater than 6 AND value2 is less than 13
3. Select rows where category is 'C' OR value3 is greater than 20
4. Select rows where (category is 'A' AND value1 > 5) OR (category is 'B' AND value1 < 6)

In [None]:
# Your code here


### Solution 3.2

In [None]:
# 1. Select rows where category is 'A' or 'B'
print("Rows where category is 'A' or 'B':")
print(numeric_df[numeric_df['category'].isin(['A', 'B'])])
# Alternative: numeric_df[(numeric_df['category'] == 'A') | (numeric_df['category'] == 'B')]

# 2. Select rows where value1 > 6 AND value2 < 13
print("\nRows where value1 > 6 AND value2 < 13:")
print(numeric_df[(numeric_df['value1'] > 6) & (numeric_df['value2'] < 13)])

# 3. Select rows where category is 'C' OR value3 > 20
print("\nRows where category is 'C' OR value3 > 20:")
print(numeric_df[(numeric_df['category'] == 'C') | (numeric_df['value3'] > 20)])

# 4. Select rows where (category is 'A' AND value1 > 5) OR (category is 'B' AND value1 < 6)
print("\nRows where (category is 'A' AND value1 > 5) OR (category is 'B' AND value1 < 6):")
condition1 = (numeric_df['category'] == 'A') & (numeric_df['value1'] > 5)
condition2 = (numeric_df['category'] == 'B') & (numeric_df['value1'] < 6)
print(numeric_df[condition1 | condition2])
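
Each comparison in the solution is wrapped in parentheses because `&` and `|` bind more tightly than `==`, `>`, and `<` in Python. If the parentheses become hard to read, `DataFrame.query` is one possible alternative: it accepts the condition as a string and allows `and` / `or`. A sketch on made-up data (`toy` is an assumption, not the course dataset):

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "category": ["A", "A", "B", "B", "C"],
    "value1": [4.0, 6.0, 5.0, 7.0, 8.0],
})

# Boolean-mask version, as in the solution above.
cond_a = (toy["category"] == "A") & (toy["value1"] > 5)
cond_b = (toy["category"] == "B") & (toy["value1"] < 6)
with_masks = toy[cond_a | cond_b]

# Same filter expressed as a query string.
with_query = toy.query(
    "(category == 'A' and value1 > 5) or (category == 'B' and value1 < 6)"
)

print(with_query)
print(with_masks.equals(with_query))
```

Both versions select the same rows here; which one reads better is largely a matter of taste.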

### Exercise 3.3: Using loc and iloc

Using the `numeric_df` DataFrame, perform the following selections using `loc` and `iloc`:

1. Use `loc` to select rows where category is 'A' and display only the value1 and value2 columns
2. Use `iloc` to select the first 3 rows and the columns at positions 1 through 3 (zero-based, so the second through fourth columns)
3. Use `loc` to select rows where value1 is greater than the mean value1 and display all columns
4. Use `iloc` to select every other row and every other column starting from the first

In [None]:
# Your code here


### Solution 3.3

In [None]:
# 1. Use loc to select rows where category is 'A' and display value1 and value2
print("Rows where category is 'A', showing value1 and value2:")
print(numeric_df.loc[numeric_df['category'] == 'A', ['value1', 'value2']])

# 2. Use iloc to select the first 3 rows and columns 1 through 3
print("\nFirst 3 rows, columns 1-3:")
print(numeric_df.iloc[0:3, 1:4])

# 3. Use loc to select rows where value1 > mean value1
value1_mean = numeric_df['value1'].mean()
print(f"\nRows where value1 > {value1_mean:.2f} (mean):")
print(numeric_df.loc[numeric_df['value1'] > value1_mean])

# 4. Use iloc to select every other row and every other column
print("\nEvery other row and every other column:")
print(numeric_df.iloc[::2, ::2])
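
One detail that often causes off-by-one surprises: slices in `loc` are label-based and include the end label, while slices in `iloc` are position-based and exclude the end position. A small sketch on made-up data with the default integer index (`toy` is an illustrative assumption):

```python
import pandas as pd

# Hypothetical toy data with the default index 0..3.
toy = pd.DataFrame({
    "category": ["A", "B", "C", "A"],
    "value1": [1.0, 2.0, 3.0, 4.0],
    "value2": [10, 20, 30, 40],
})

# loc: labels 0 through 2 inclusive, columns 'category' through 'value1' inclusive.
by_label = toy.loc[0:2, "category":"value1"]
print(by_label.shape)  # (3, 2)

# iloc: positions 0 and 1 only, column position 0 only.
by_position = toy.iloc[0:2, 0:1]
print(by_position.shape)  # (2, 1)
```

This is why the solution uses `iloc[0:3, 1:4]` to reach positions 1 through 3.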

## Section 4: SQL to Pandas Translation

In this section, we'll practice translating SQL queries to their pandas equivalents.
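
One way to check a translation is to run the actual SQL against the same data and compare results. The sketch below is an optional aid, not part of the exercises: it copies a made-up DataFrame into an in-memory SQLite database using Python's built-in `sqlite3` module. The table name `toy` and the data are assumptions; the exercises use `table` as a placeholder, which is a reserved word in SQL and would need quoting in a real query.

```python
import sqlite3

import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "category": ["A", "B", "A", "C"],
    "value1": [5.0, 7.5, 6.5, 4.0],
})

# Copy the DataFrame into an in-memory SQLite table named 'toy'.
conn = sqlite3.connect(":memory:")
toy.to_sql("toy", conn, index=False)

# Run the SQL version of the filter.
from_sql = pd.read_sql_query("SELECT * FROM toy WHERE category = 'A'", conn)
conn.close()

# Run the pandas version and reset the index so rows line up.
from_pandas = toy[toy["category"] == "A"].reset_index(drop=True)

# Compare the two results row by row.
print(from_sql.to_dict("records") == from_pandas.to_dict("records"))
```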

### Exercise 4.1: Basic SQL Translations

For each of the following SQL queries, write the equivalent pandas code using the `numeric_df` DataFrame:

1. `SELECT * FROM table WHERE category = 'A'`
2. `SELECT value1, value2 FROM table ORDER BY value1 DESC`
3. `SELECT * FROM table WHERE value1 > 6 AND value2 < 13`
4. `SELECT * FROM table LIMIT 5`

In [None]:
# Your code here


### Solution 4.1

In [None]:
# 1. SELECT * FROM table WHERE category = 'A'
query1 = numeric_df[numeric_df['category'] == 'A']
print("Query 1: SELECT * FROM table WHERE category = 'A'")
print(query1)

# 2. SELECT value1, value2 FROM table ORDER BY value1 DESC
query2 = numeric_df[['value1', 'value2']].sort_values('value1', ascending=False)
print("\nQuery 2: SELECT value1, value2 FROM table ORDER BY value1 DESC")
print(query2)

# 3. SELECT * FROM table WHERE value1 > 6 AND value2 < 13
query3 = numeric_df[(numeric_df['value1'] > 6) & (numeric_df['value2'] < 13)]
print("\nQuery 3: SELECT * FROM table WHERE value1 > 6 AND value2 < 13")
print(query3)

# 4. SELECT * FROM table LIMIT 5
query4 = numeric_df.head(5)
print("\nQuery 4: SELECT * FROM table LIMIT 5")
print(query4)
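
Queries 2 and 4 are often combined in practice as `ORDER BY ... DESC LIMIT n`. In pandas that can be sketched either by chaining `sort_values` and `head`, or with `nlargest`. The example uses made-up data without ties (`toy` is an assumption for illustration):

```python
import pandas as pd

# Hypothetical toy data for illustration only (no tied values).
toy = pd.DataFrame({
    "category": ["A", "B", "C", "D"],
    "value1": [5.0, 7.5, 6.5, 4.0],
})

# SELECT * FROM toy ORDER BY value1 DESC LIMIT 2
top2_sorted = toy.sort_values("value1", ascending=False).head(2)

# nlargest expresses the same idea in a single call.
top2_nlargest = toy.nlargest(2, "value1")

print(top2_sorted)
print(top2_sorted.equals(top2_nlargest))  # True
```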

### Exercise 4.2: Intermediate SQL Translations

For each of the following SQL queries, write the equivalent pandas code using the `numeric_df` DataFrame:

1. `SELECT category, COUNT(*) FROM table GROUP BY category`
2. `SELECT category, AVG(value1) as avg_value1 FROM table GROUP BY category ORDER BY avg_value1 DESC`
3. `SELECT * FROM table WHERE value1 IN (5.5, 6.3, 7.2)`
4. `SELECT category, SUM(value1) as total_value1 FROM table GROUP BY category HAVING SUM(value1) > 15`

In [None]:
# Your code here


### Solution 4.2

In [None]:
# 1. SELECT category, COUNT(*) FROM table GROUP BY category
query5 = numeric_df.groupby('category').size().reset_index(name='count')
print("Query 5: SELECT category, COUNT(*) FROM table GROUP BY category")
print(query5)

# 2. SELECT category, AVG(value1) as avg_value1 FROM table GROUP BY category ORDER BY avg_value1 DESC
query6 = (
    numeric_df.groupby('category')['value1']
    .mean()
    .reset_index(name='avg_value1')
    .sort_values('avg_value1', ascending=False)
)
print("\nQuery 6: SELECT category, AVG(value1) as avg_value1 FROM table GROUP BY category ORDER BY avg_value1 DESC")
print(query6)

# 3. SELECT * FROM table WHERE value1 IN (5.5, 6.3, 7.2)
query7 = numeric_df[numeric_df['value1'].isin([5.5, 6.3, 7.2])]
print("\nQuery 7: SELECT * FROM table WHERE value1 IN (5.5, 6.3, 7.2)")
print(query7)

# 4. SELECT category, SUM(value1) as total_value1 FROM table GROUP BY category HAVING SUM(value1) > 15
# First group by category and sum value1
grouped = numeric_df.groupby('category')['value1'].sum().reset_index(name='total_value1')
# Then filter for total_value1 > 15
query8 = grouped[grouped['total_value1'] > 15]
print("\nQuery 8: SELECT category, SUM(value1) as total_value1 FROM table GROUP BY category HAVING SUM(value1) > 15")
print(query8)
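
The two-step `HAVING` translation above can also be written as a single chain. The sketch below uses named aggregation to create the `total_value1` column and `query` to apply the `HAVING` condition. It is one possible style rather than the required answer, and `toy` is made-up data:

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "category": ["A", "A", "B", "B", "C"],
    "value1": [8.0, 9.0, 5.0, 6.0, 20.0],
})

# GROUP BY category, SUM(value1) as total_value1, HAVING SUM(value1) > 15
having = (
    toy.groupby("category", as_index=False)
    .agg(total_value1=("value1", "sum"))  # named aggregation sets the column name
    .query("total_value1 > 15")           # HAVING runs after the aggregation
)

print(having)
```

With this data, categories A (total 17.0) and C (total 20.0) pass the filter, while B (total 11.0) is dropped.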

## Conclusion

Congratulations! You've completed the practice exercises for Pandas Fundamentals I. Across Parts 1 through 3, these exercises have covered:

1. Creating and exploring DataFrames
2. Column selection and basic operations
3. Row selection and filtering
4. SQL to Pandas translation

These skills form the foundation of data analysis with pandas. As you become more comfortable with these operations, you'll be able to manipulate and analyze data more efficiently.

### Next Steps

- Review any exercises you found challenging
- Experiment with different ways to accomplish the same tasks
- Practice applying these concepts to other datasets
- Explore the pandas documentation for additional functionality

In upcoming sessions, we'll build on these fundamentals to explore more advanced pandas operations and data visualization techniques.