# Pandas Practice Notebook
### Dataset: AusApparalSales4thQrt2020.csv (Australian Apparel Sales Q4 2020)

**Columns:** Date, Time, State, Group, Unit, Sales

**Instructions:**
- Each question is in a markdown cell
- Write your solution in the empty code cell below each question
- Run cells with `Shift + Enter`
- Refer to `pandas_basics.md` for syntax help

In [None]:
# Setup - Run this cell first!
import pandas as pd
import numpy as np

df = pd.read_csv('../AusApparalSales4thQrt2020.csv')
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%Y')
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].str.strip()

print(f"Dataset loaded: {df.shape[0]} rows, {df.shape[1]} columns")
print(f"Columns: {list(df.columns)}")
df.head()

---
## Section 1: Loading & Exploring Data

**Q1.** Load the CSV file into a DataFrame. Display the first 10 rows and last 5 rows.

In [None]:
# Q1: Your solution here


**Q2.** Print the shape of the DataFrame. How many rows and columns are there?

In [None]:
# Q2: Your solution here


**Q3.** Use `.info()` to check column data types and non-null counts. Are there any missing values?

In [None]:
# Q3: Your solution here


**Q4.** Use `.describe()` to get summary statistics for numeric columns. What is the mean, min, and max of Sales?

In [None]:
# Q4: Your solution here


**Q5.** Print all unique values in the `State`, `Group`, and `Time` columns. How many unique values does each have?

In [None]:
# Q5: Your solution here


**Q6.** Display 10 random rows from the DataFrame using `.sample()`.

In [None]:
# Q6: Your solution here


**Q7.** Check the data types of each column. Is the `Date` column a datetime type? If not, convert it.

In [None]:
# Q7: Your solution here


---
## Section 2: Selecting & Filtering Data

**Q8.** Select only the `State` and `Sales` columns. Display the first 10 rows.

In [None]:
# Q8: Your solution here


**Q9.** Select rows 100 to 110 using `.iloc[]`.

In [None]:
# Q9: Your solution here


**Q10.** Select all rows where `State` is `'WA'`.

In [None]:
# Q10: Your solution here


**Q11.** Select all rows where `Sales` is greater than 30000.

In [None]:
# Q11: Your solution here


**Q12.** Select all rows where `Group` is `'Women'` AND `Time` is `'Morning'`.

In [None]:
# Q12: Your solution here


**Q13.** Select all rows where `State` is `'WA'`, `'NT'`, or `'SA'`, using `.isin()`.

In [None]:
# Q13: Your solution here


**Q14.** Select all rows where `Sales` is between 15000 and 35000 (inclusive).

In [None]:
# Q14: Your solution here


**Q15.** Use `.query()` method to find all rows where `Unit > 15` and `Group == 'Kids'`.

In [None]:
# Q15: Your solution here


**Q16.** Find all rows where the `Group` column contains the letter `'e'` (using `.str.contains()`).

In [None]:
# Q16: Your solution here


---
## Section 3: Adding, Modifying & Removing Columns

**Q17.** Add a new column `Revenue_Per_Unit` which is `Sales / Unit`.

In [None]:
# Q17: Your solution here


**Q18.** Add a new column `Sales_Category` that labels each row as:
- `'Low'` if Sales < 15000
- `'Medium'` if Sales is between 15000 and 30000 (inclusive)
- `'High'` if Sales > 30000

In [None]:
# Q18: Your solution here

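One possible approach, sketched on a few hypothetical `Sales` values chosen to sit on the boundaries (they stand in for the real data). `np.select` checks its conditions in order and uses the first one that is true, so 15000 and 30000 both end up as `'Medium'`.

```python
import numpy as np
import pandas as pd

# Hypothetical values on and around the 15000/30000 boundaries
df = pd.DataFrame({'Sales': [10000, 15000, 30000, 30001, 45000]})

# Conditions are evaluated in order; the first True one wins
conditions = [
    df['Sales'] < 15000,   # Low
    df['Sales'] <= 30000,  # Medium: 15000 to 30000 inclusive
]
df['Sales_Category'] = np.select(conditions, ['Low', 'Medium'], default='High')
print(df)
```

`pd.cut` can do the same job, but its intervals are closed on one side only, so you have to decide which bin each boundary value belongs to.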

**Q19.** Add a new column `Month` extracted from the `Date` column (after converting to datetime).

In [None]:
# Q19: Your solution here


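**Q20.** Rename the column `Unit` to `Units_Sold` and `Sales` to `Total_Sales`. Assign the result to a new variable instead of overwriting `df`, because later questions still use the original column names.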

In [None]:
# Q20: Your solution here


**Q21.** Drop the `Revenue_Per_Unit` column you created in Q17.

In [None]:
# Q21: Your solution here


**Q22.** Reorder the columns so that `Date` is first, followed by `State`, `Group`, `Time`, `Unit`, `Sales`.

In [None]:
# Q22: Your solution here


---
## Section 4: Sorting

**Q23.** Sort the DataFrame by `Sales` in descending order. Display the top 10 highest sales.

In [None]:
# Q23: Your solution here


**Q24.** Sort by `State` (ascending) and then by `Sales` (descending).

In [None]:
# Q24: Your solution here


**Q25.** Find the top 5 rows with the highest `Unit` values using `.nlargest()`.

In [None]:
# Q25: Your solution here


**Q26.** Find the bottom 5 rows with the lowest `Sales` values using `.nsmallest()`.

In [None]:
# Q26: Your solution here


**Q27.** Sort by `Date` (chronological order). You may need to convert `Date` to datetime first.

In [None]:
# Q27: Your solution here


---
## Section 5: Grouping & Aggregation

**Q28.** Find the total sales for each `State`. Which state has the highest total sales?

In [None]:
# Q28: Your solution here


**Q29.** Find the average units sold per `Group` (Kids, Men, Women, Seniors).

In [None]:
# Q29: Your solution here


**Q30.** Find the total sales for each `Time` period (Morning, Afternoon, Evening).

In [None]:
# Q30: Your solution here


**Q31.** Group by `State` and `Group`, then find the mean sales for each combination.

In [None]:
# Q31: Your solution here


**Q32.** Group by `State` and calculate multiple aggregations on `Sales`: mean, sum, count, min, max.

In [None]:
# Q32: Your solution here


**Q33.** Find the total units sold per state per month. (Hint: extract the month from `Date` first.)

In [None]:
# Q33: Your solution here


**Q34.** Use `.transform()` to add a column showing each state's average sales alongside each row.

In [None]:
# Q34: Your solution here

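A minimal sketch on a small hypothetical frame: `transform('mean')` computes each state's mean and writes it back onto every row of that state, so the result has the same index as `df` and can be assigned directly as a column.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real data
df = pd.DataFrame({
    'State': ['WA', 'WA', 'NT', 'NT', 'NT'],
    'Sales': [10000, 20000, 5000, 10000, 15000],
})

# transform returns one value per original row (agg would return one per state)
df['State_Avg_Sales'] = df.groupby('State')['Sales'].transform('mean')
print(df)
```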

**Q35.** For each `Group`, find the state with the highest average sales.

In [None]:
# Q35: Your solution here

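One way to approach this, sketched on made-up rows: average over every `Group`/`State` pair first, then use `idxmax` within each `Group` to pick the row holding its highest average.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real data
df = pd.DataFrame({
    'Group': ['Kids', 'Kids', 'Kids', 'Men', 'Men', 'Men'],
    'State': ['WA', 'NT', 'NT', 'WA', 'WA', 'NT'],
    'Sales': [10000, 30000, 20000, 40000, 20000, 5000],
})

# Mean sales for every Group/State combination, kept as ordinary columns
avg = df.groupby(['Group', 'State'], as_index=False)['Sales'].mean()

# For each Group, idxmax returns the row label of its highest mean
best = avg.loc[avg.groupby('Group')['Sales'].idxmax()]
print(best)
```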

**Q36.** Calculate the percentage contribution of each `State` to total sales.

In [None]:
# Q36: Your solution here

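A short sketch, using hypothetical totals: divide each state's total by the grand total and scale by 100. The percentages should add up to 100, which is a quick sanity check.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real data
df = pd.DataFrame({
    'State': ['WA', 'WA', 'NT', 'SA'],
    'Sales': [30000, 30000, 20000, 20000],
})

state_totals = df.groupby('State')['Sales'].sum()
# Each state's share of the grand total, as a percentage
pct = (state_totals / state_totals.sum() * 100).round(2)
print(pct.sort_values(ascending=False))
```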

---
## Section 6: Handling Missing Data

**Q37.** Check if there are any missing values in the DataFrame. Show the count per column.

In [None]:
# Q37: Your solution here


**Q38.** Make a copy of the DataFrame and set 50 randomly chosen `Sales` values in the copy to `NaN`, so that `df` stays complete for later sections. Then:
- Count the total NaN values
- Fill NaN with the mean of `Sales`
- Alternatively, fill NaN with forward fill (`ffill`)

In [None]:
# Q38: Your solution here

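A possible sketch, assuming a hypothetical 100-row frame in place of the real data (the name `df_nan` is only an illustration). Integer columns cannot hold `NaN`, so the copy is cast to float first; `sample(..., random_state=...)` makes the choice of rows repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 100 rows of integer sales
df = pd.DataFrame({'Sales': np.arange(1, 101) * 100})

df_nan = df.copy()                                # df itself stays untouched
df_nan['Sales'] = df_nan['Sales'].astype(float)   # int64 cannot hold NaN

# Pick 50 distinct rows reproducibly and blank out their Sales
nan_idx = df_nan.sample(n=50, random_state=42).index
df_nan.loc[nan_idx, 'Sales'] = np.nan

print('NaN count:', df_nan['Sales'].isna().sum())
filled_mean = df_nan['Sales'].fillna(df_nan['Sales'].mean())
filled_ffill = df_nan['Sales'].ffill()  # leading NaNs have nothing to copy from
```

Note that forward fill cannot fill `NaN` values at the very start of the column, so `filled_ffill` may still contain a few.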

**Q39.** From the modified DataFrame (with NaN), drop all rows that have any missing values. How many rows remain?

In [None]:
# Q39: Your solution here


**Q40.** Fill missing `Sales` values with the mean sales of their respective `State` group.

In [None]:
# Q40: Your solution here

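A minimal sketch on a hypothetical frame that already contains `NaN` values: `transform('mean')` gives every row its state's mean (ignoring `NaN`), and `fillna` accepts that aligned Series as the fill values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows with missing Sales
df_nan = pd.DataFrame({
    'State': ['WA', 'WA', 'WA', 'NT', 'NT'],
    'Sales': [10000.0, np.nan, 20000.0, 5000.0, np.nan],
})

# Per-row state mean, aligned on df_nan's index; NaN is skipped in the mean
state_means = df_nan.groupby('State')['Sales'].transform('mean')
df_nan['Sales'] = df_nan['Sales'].fillna(state_means)
print(df_nan)
```

A state whose `Sales` values are all missing has no mean, so its rows would stay `NaN`.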

---
## Section 7: Pivot Tables & Cross Tabs

**Q41.** Create a pivot table showing average `Sales` for each `State` (rows) and `Time` (columns).

In [None]:
# Q41: Your solution here


**Q42.** Create a pivot table showing total `Unit` sold for each `Group` (rows) and `State` (columns).

In [None]:
# Q42: Your solution here


**Q43.** Create a cross-tabulation of `State` and `Group` (count of occurrences).

In [None]:
# Q43: Your solution here


**Q44.** Create a pivot table showing average `Sales` by `Month` (rows) and `Group` (columns).

In [None]:
# Q44: Your solution here


**Q45.** From the pivot table in Q41, which State-Time combination has the highest average sales?

In [None]:
# Q45: Your solution here


---
## Section 8: Merging & Concatenating

**Q46.** Split the DataFrame into two: `df_morning` (Time == Morning) and `df_evening` (Time == Evening). Concatenate them back together vertically.

In [None]:
# Q46: Your solution here


**Q47.** Split the DataFrame into `df_wa` (State == WA) and `df_nt` (State == NT). Concatenate them and reset the index.

In [None]:
# Q47: Your solution here


**Q48.** Create a small DataFrame with State abbreviations and full names:
```python
state_names = pd.DataFrame({
    'State': ['WA', 'NT', 'SA', 'TAS'],
    'Full_Name': ['Western Australia', 'Northern Territory', 'South Australia', 'Tasmania']
})
```
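Merge this with the main DataFrame to add full state names. The lookup table covers only four states, so compare how rows from any other state are handled with `how='left'` versus `how='inner'`.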

In [None]:
# Q48: Your solution here

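A sketch of the difference between the two join types, on hypothetical rows where `'VIC'` has no entry in the lookup table: a left merge keeps the row with `NaN` in `Full_Name`, while an inner merge drops it.

```python
import pandas as pd

# Hypothetical sales rows; 'VIC' is not in the lookup table
df = pd.DataFrame({'State': ['WA', 'VIC', 'NT'], 'Sales': [100, 200, 300]})
state_names = pd.DataFrame({
    'State': ['WA', 'NT', 'SA', 'TAS'],
    'Full_Name': ['Western Australia', 'Northern Territory', 'South Australia', 'Tasmania'],
})

# how='left' keeps every sales row; unmatched states get NaN in Full_Name
merged = df.merge(state_names, on='State', how='left')
# how='inner' keeps only rows whose State appears in both tables
inner = df.merge(state_names, on='State', how='inner')
print(merged)
```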

**Q49.** Split the data by month into 3 separate DataFrames (Oct, Nov, Dec). Then concatenate them back and verify the shape matches the original.

In [None]:
# Q49: Your solution here


---
## Section 9: String Operations & Data Cleaning

**Q50.** Check if there are any leading/trailing spaces in the `State` or `Group` columns. If yes, strip them.

In [None]:
# Q50: Your solution here


**Q51.** Convert all values in the `Group` column to uppercase.

In [None]:
# Q51: Your solution here


**Q52.** Create a new column `State_Group` that combines `State` and `Group` with a hyphen (e.g., `'WA-Kids'`).

In [None]:
# Q52: Your solution here


**Q53.** Find all rows where `Time` starts with `'M'`.

In [None]:
# Q53: Your solution here


**Q54.** Replace `'WA'` with `'Western Australia'` in the `State` column.

In [None]:
# Q54: Your solution here


---
## Section 10: Date & Time Operations

**Q55.** Convert the `Date` column to datetime format. Extract `Year`, `Month`, `Day`, and `Day_Name` into separate columns.

In [None]:
# Q55: Your solution here


**Q56.** Find the total sales for each day of the week (Monday, Tuesday, etc.). Which day has the highest sales?

In [None]:
# Q56: Your solution here


**Q57.** Find the total sales per ISO week number.

In [None]:
# Q57: Your solution here

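A minimal sketch with hypothetical dates: `.dt.isocalendar()` returns `year`, `week` and `day` columns, with ISO weeks starting on Monday. (The older `.dt.week` accessor was removed in pandas 2.0.) Because all the data here is from 2020, grouping by week alone is enough.

```python
import pandas as pd

# Hypothetical rows: 1-2 Oct 2020 (Thu/Fri) and 5-6 Oct 2020 (Mon/Tue)
df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-10-01', '2020-10-02', '2020-10-05', '2020-10-06']),
    'Sales': [100, 200, 300, 400],
})

# ISO week number; cast from the nullable UInt32 dtype to a plain int
df['Week'] = df['Date'].dt.isocalendar().week.astype(int)
weekly = df.groupby('Week')['Sales'].sum()
print(weekly)
```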

**Q58.** Filter data for only the month of November 2020.

In [None]:
# Q58: Your solution here


**Q59.** Find the date with the single highest total daily sales (sum of all sales on that date).

In [None]:
# Q59: Your solution here


**Q60.** Calculate the 7-day rolling average of daily total sales.

In [None]:
# Q60: Your solution here

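One possible sketch, assuming hypothetical data with two rows per day: first reduce to one total per date, then take a 7-row rolling mean. The first six values are `NaN` because a full window is not yet available.

```python
import numpy as np
import pandas as pd

# Hypothetical: 10 consecutive days, two rows per day; day i totals 200 * i
dates = pd.date_range('2020-10-01', periods=10, freq='D')
df = pd.DataFrame({
    'Date': dates.repeat(2),
    'Sales': np.repeat(np.arange(1, 11) * 100, 2),
})

daily = df.groupby('Date')['Sales'].sum()   # one total per date, sorted by date
rolling_7 = daily.rolling(window=7).mean()  # NaN until 7 values are available
print(rolling_7)
```

If some dates are missing from the data, `daily.rolling('7D').mean()` uses a window of 7 calendar days on the date index instead of 7 rows.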

---
## Section 11: Apply & Map

**Q61.** Use `.apply()` with a lambda function to create a column that doubles the `Sales` value.

In [None]:
# Q61: Your solution here


**Q62.** Use `.apply()` to categorize `Unit` as:
- `'Low Demand'` if Unit <= 5
- `'Medium Demand'` if Unit is between 6 and 15 (inclusive)
- `'High Demand'` if Unit > 15

In [None]:
# Q62: Your solution here


**Q63.** Use `.map()` to replace `Time` values: `Morning → AM`, `Afternoon → PM`, `Evening → EVE`.

In [None]:
# Q63: Your solution here


**Q64.** Write a custom function that takes a row and returns `Sales * Unit`. Apply it row-wise using `.apply(axis=1)`.

In [None]:
# Q64: Your solution here


---
## Section 12: Advanced Analysis

**Q65.** Find the top 3 states by total sales for each month.

In [None]:
# Q65: Your solution here

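A sketch of one approach, assuming a `Month` column like the one from Q19 (the rows below are hypothetical): total per month and state, sort so the largest totals come first within each month, then keep the first three rows of each month with `groupby(...).head(3)`.

```python
import pandas as pd

# Hypothetical rows standing in for the real data
df = pd.DataFrame({
    'Month': [10, 10, 10, 10, 10, 11, 11, 11, 11],
    'State': ['WA', 'WA', 'NT', 'SA', 'VIC', 'WA', 'NT', 'SA', 'VIC'],
    'Sales': [250, 150, 300, 200, 100, 100, 400, 300, 200],
})

monthly = df.groupby(['Month', 'State'], as_index=False)['Sales'].sum()
top3 = (
    monthly.sort_values(['Month', 'Sales'], ascending=[True, False])
           .groupby('Month')
           .head(3)  # first 3 rows of each month, in the sorted order
)
print(top3)
```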

**Q66.** Calculate the month-over-month sales growth percentage for each state.

In [None]:
# Q66: Your solution here

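A minimal sketch on hypothetical monthly figures: after totalling by state and month, `pct_change` is applied within each state, so each value is compared with the same state's previous month. The first month of every state has no predecessor and comes out as `NaN`.

```python
import pandas as pd

# Hypothetical rows standing in for the real data
df = pd.DataFrame({
    'State': ['WA', 'WA', 'WA', 'NT', 'NT', 'NT'],
    'Month': [10, 11, 12, 10, 11, 12],
    'Sales': [100, 150, 120, 200, 100, 300],
})

monthly = df.groupby(['State', 'Month'])['Sales'].sum()
# Grouping by State stops one state's last month being compared
# with the next state's first month
growth = monthly.groupby(level='State').pct_change() * 100
print(growth.round(1))
```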

**Q67.** For each state, compute its daily sales (the total `Sales` for that state on each date). Then find the state with the largest difference between its maximum and minimum daily sales.

In [None]:
# Q67: Your solution here

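A possible sketch on hypothetical rows, treating "daily sales" as the total `Sales` per state per date: aggregate to that level first, then take the max and min within each state.

```python
import pandas as pd

# Hypothetical rows standing in for the real data
df = pd.DataFrame({
    'State': ['WA', 'WA', 'WA', 'NT', 'NT'],
    'Date': pd.to_datetime(['2020-10-01', '2020-10-01', '2020-10-02',
                            '2020-10-01', '2020-10-02']),
    'Sales': [100, 100, 500, 50, 1000],
})

# Daily sales = total Sales for one state on one date
daily = df.groupby(['State', 'Date'])['Sales'].sum()
stats = daily.groupby(level='State').agg(['max', 'min'])
stats['Range'] = stats['max'] - stats['min']
print(stats['Range'].idxmax())
```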

**Q68.** Rank all states by their total sales using `.rank()`.

In [None]:
# Q68: Your solution here


**Q69.** Create a new column showing the cumulative sales for each state (sorted by date).

In [None]:
# Q69: Your solution here

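A minimal sketch on hypothetical rows that are deliberately out of date order: `cumsum` accumulates in row order, so the frame is sorted by state and date before the grouped running total is taken.

```python
import pandas as pd

# Hypothetical rows, not in date order
df = pd.DataFrame({
    'State': ['WA', 'NT', 'WA', 'NT'],
    'Date': pd.to_datetime(['2020-10-02', '2020-10-01', '2020-10-01', '2020-10-03']),
    'Sales': [200, 50, 100, 70],
})

# Sort first: cumsum follows row order, not date order
df = df.sort_values(['State', 'Date'])
df['Cumulative_Sales'] = df.groupby('State')['Sales'].cumsum()
print(df)
```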

**Q70.** Use `pd.cut()` to bin `Sales` into 4 categories: Low, Medium, High, Premium. Show the count per bin.

In [None]:
# Q70: Your solution here

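One reading of the task, sketched on hypothetical values: `bins=4` asks `pd.cut` for four equal-width intervals over the range of `Sales` (closed on the right), and `labels` names them in order. If you want roughly equal counts per bin instead, `pd.qcut(df['Sales'], q=4, labels=labels)` bins by quantiles.

```python
import pandas as pd

# Hypothetical values spanning 0 to 100
df = pd.DataFrame({'Sales': [0, 10, 20, 30, 40, 50, 60, 70, 80, 100]})

labels = ['Low', 'Medium', 'High', 'Premium']
# bins=4: four equal-width, right-inclusive intervals over min..max
df['Sales_Bin'] = pd.cut(df['Sales'], bins=4, labels=labels)
counts = df['Sales_Bin'].value_counts().sort_index()  # in label order
print(counts)
```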

---
## Congratulations!
You've completed all 70 Pandas practice questions.

**Next steps:**
- Review any questions you found difficult
- Move on to the **Matplotlib** practice notebook
- Refer to `pandas_basics.md` for any syntax you want to revisit