# NYC Population: Master Practice Notebook

This notebook combines all previous exercises into one master file. It also includes a **Cheat Sheet** of the concepts you found tricky.

## 🧠 Pandas Cheat Sheet (The "Gotchas")

### 1. Filtering vs. Sorting
*   **Filtering (Selecting Rows):** Use `df[ condition ]`.
    *   *Example:* `df[df["Borough"] == "Queens"]`
*   **Sorting:** Use `df.sort_values()`. Do **not** put this inside `[]`.
    *   *Example:* `df.sort_values(by="Population", ascending=False)`
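
The difference is easier to see on a small table. The sketch below uses a made-up DataFrame named `toy` (hypothetical values, not the real NYC figures) to show that filtering keeps a subset of rows while sorting returns every row in a new order.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not the real NYC figures)
toy = pd.DataFrame({
    "Borough": ["Queens", "Bronx", "Queens", "Brooklyn"],
    "Population": [120, 300, 250, 180],
})

# Filtering: a boolean condition inside [] keeps only the matching rows
queens = toy[toy["Borough"] == "Queens"]

# Sorting: a method call that returns a new, reordered DataFrame
by_pop = toy.sort_values(by="Population", ascending=False)

print(queens)
print(by_pop)
```

Neither operation changes `toy` itself; assign the result to a variable if you want to keep it.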

### 2. Finding the "Row with the Max Value"
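*   **`.max()`**: Called on a column, gives you the highest *number* (e.g., 1,000,000). It does NOT give you the district name.
*   **`.idxmax()`**: Called on a column, gives you the *index label* of the row with the highest value. This matches the row number only when the dataframe has the default 0, 1, 2, ... index.
*   **`df.loc[ ... ]`**: Uses that label to get the actual row.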
    *   *Pattern:* `df.loc[df["Column"].idxmax()]`
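
A minimal sketch of the three steps, assuming a made-up DataFrame `toy` whose index deliberately does not start at 0, so the difference between a value and an index label is visible. The column names `District` and `Population` are placeholders.

```python
import pandas as pd

# Hypothetical toy data; the index is 10, 11, 12 on purpose
toy = pd.DataFrame(
    {"District": ["A", "B", "C"], "Population": [120, 300, 250]},
    index=[10, 11, 12],
)

highest = toy["Population"].max()      # the largest value in the column
label = toy["Population"].idxmax()     # the index label of the row holding it
row = toy.loc[label]                   # the whole row, as a Series
name = toy.loc[label, "District"]      # just one cell from that row

print(highest, label, name)
```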

### 3. String Methods
*   Columns don't have `.len()` or `.startswith()`. You must use the **`.str`** accessor.
    *   *Length:* `df["Name"].str.len()`
    *   *Contains:* `df["Name"].str.contains("Park")`
    *   *Starts With:* `df["Name"].str.startswith("W")`
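
The three methods above can be tried on a small Series first. The names below are sample strings chosen for illustration and are not taken from the dataset.

```python
import pandas as pd

# Sample names for illustration only (not read from the CSV)
names = pd.Series(["Washington Heights", "Park Slope", "Williamsburg", "Astoria"])

lengths = names.str.len()                          # number of characters per name
has_park = names.str.contains("Park", regex=False) # True where "Park" appears anywhere
starts_w = names.str.startswith("W")               # True where the name begins with "W"

# A boolean Series can be used directly as a filter
print(names[has_park])
print(names[starts_w])
```

By default `.str.contains()` treats its pattern as a regular expression; passing `regex=False` makes it a plain substring match, which is usually what you want for a simple word.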

### 4. Combining Conditions
*   Use `&` for AND, `|` for OR.
*   **CRITICAL:** You MUST wrap each condition in parentheses `( )`, because `&` and `|` bind more tightly than comparisons like `>` and `==`.
    *   *Correct:* `(df["Pop"] > 100) & (df["Borough"] == "Bronx")`
    *   *Wrong:* `df["Pop"] > 100 & df["Borough"] == "Bronx"`
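
Here is a small sketch of both operators on a made-up DataFrame `toy` (hypothetical values). Building the mask in its own variable first can make long conditions easier to read.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Pop": [50, 150, 200, 300],
    "Borough": ["Bronx", "Bronx", "Queens", "Bronx"],
})

# AND: both conditions must be True for a row to be kept
mask_and = (toy["Pop"] > 100) & (toy["Borough"] == "Bronx")
both = toy[mask_and]

# OR: a row is kept if at least one condition is True
mask_or = (toy["Pop"] > 250) | (toy["Borough"] == "Queens")
either = toy[mask_or]

print(both)
print(either)
```

Without the parentheses, Python evaluates `100 & toy["Borough"]` first, which is not the comparison you meant and typically fails with an error.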

### 5. `.loc` Syntax
*   `df.loc[ ROWS , COLUMNS ]`
*   If you want specific columns for your filtered rows, put the row condition before the comma and the column name (or a list of names) after it.
    *   *Example:* `df.loc[df["Borough"]=="Queens", "CD Name"]`
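
A minimal sketch on a made-up DataFrame `toy` (hypothetical names and values) showing that one column name after the comma returns a Series, while a list of names returns a DataFrame.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Borough": ["Queens", "Bronx", "Queens"],
    "CD Name": ["District A", "District B", "District C"],
    "Pop": [120, 300, 250],
})

is_queens = toy["Borough"] == "Queens"

# Rows before the comma, a single column after it: result is a Series
names = toy.loc[is_queens, "CD Name"]

# A list of columns after the comma: result is a DataFrame
subset = toy.loc[is_queens, ["CD Name", "Pop"]]

print(names)
print(subset)
```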

---
## Setup
Run this cell to load and clean the data.


```python
import pandas as pd

# Read the raw CSV of population by community district
df = pd.read_csv('2_New_York_City_Population_By_Community_Districts_20251114 (1).csv')

# Strip thousands separators (e.g. "1,234") and convert each population column to int
pop_cols = ['1970 Population', '1980 Population', '1990 Population', '2000 Population', '2010 Population']
for col in pop_cols:
    df[col] = df[col].astype(str).str.replace(',', '').astype(int)

# Re-create the Growth columns: raw change, then change as a percentage of the 2000 value
df['Growth_2000_2010'] = df['2010 Population'] - df['2000 Population']
df['Growth_Rate'] = (df['Growth_2000_2010'] / df['2000 Population']) * 100

# Preview the first five rows
df.head()
```
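
To see what the cleaning step does, here is a small sketch on a made-up Series `raw` (hypothetical values, not read from the CSV). It assumes the numbers arrive as text with comma thousands separators, which is what the setup code is written to handle.

```python
import pandas as pd

# Hypothetical raw values as they might appear in a CSV: text with commas
raw = pd.Series(["1,234", "56,789", "900"])

# Remove the commas, then convert the remaining digits to integers
clean = raw.astype(str).str.replace(",", "", regex=False).astype(int)

print(clean)
print(clean.sum())  # arithmetic works once the values are real integers
```

Until the conversion happens, operations like `.sum()` or `.max()` would treat the values as text rather than numbers.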

# Part 1: Basic Exercises

## Question 1
How many Community Districts (rows) are in the dataframe?

## Question 2
What was the total population of New York City in 2010?

## Question 3
What is the highest population recorded for any single district in 2010?

## Question 4
Which Community Districts had a 2010 population greater than 200,000?

## Question 5
Which Community District has the longest 'CD Name'?

## Question 6
Which specific Community District had the highest population in 1970?

## Question 7
What is the total population of the 'Bronx' borough in 2010?

## Question 8
Which Community District had the lowest population in 1990?

## Question 9
Create a new column 'Growth_2000_2010' representing the raw population change (2010 Population - 2000 Population).

## Question 10
What is the maximum value in the 'Growth_2000_2010' column?

## Question 11
Which Community District experienced this maximum growth?

## Question 12
Create a new column 'Growth_Rate' which is (Growth_2000_2010 / 2000 Population) * 100.

## Question 13
Of the districts with a Growth Rate greater than 10%, which one had the highest 2010 Population?

## Question 14
How many districts in 'Brooklyn' saw a population decrease (negative growth) between 2000 and 2010?

## Question 15
What is the 2010 population of the district with the shortest 'CD Name'?

# Part 2: Advanced Practice (Fixing Common Mistakes)

## 1. String vs. Column Logic
Filter the dataframe to show only rows where the 'Borough' is 'Brooklyn'.
*(Remember: Compare the column `df['Borough']`, not the string 'Borough' itself)*

## 2. The `.str` Accessor
Create a new column called `Name_Length` that contains the number of characters in the `CD Name` column.
*(Hint: Calling `len()` on a column returns the number of rows, not the length of each name!)*

## 3. Row Selection vs. Aggregation
Which **specific district** (row) had the lowest population in 1980?
*(Hint: `df.min()` gives you the lowest number, but `df.sort_values()` or `.idxmin()` helps you find the district associated with it.)*

## 4. Using `.loc` with a Comma
Use `.loc[row, col]` to select the `CD Name` and `2010 Population` columns for all districts in 'Queens'.
*(Try to do this without chaining `[][]`)*

## 5. Multiple Conditions
Find all districts where the `2010 Population` is greater than 150,000 **AND** the `Borough` is 'Bronx'.
*(Hint: Use `&` and wrap each condition in parentheses `()`)*

## 6. String Searching
Filter for all districts that have the word "Park" anywhere in their `CD Name`.
*(Hint: Look for a method like `.str.contains()`)*

## 7. Sorting to find Top Items
Find the top 3 districts with the highest `Growth_Rate`. Display their names and rates.
*(Hint: Use `.sort_values()`)*
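
The general "top N" pattern can be sketched on unrelated, made-up data: sort in descending order, keep the first few rows with `.head()`, then pick the columns to display. The DataFrame `toy` and its `Item` and `Score` columns are placeholders.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Item": ["A", "B", "C", "D", "E"],
    "Score": [3.5, 9.1, -2.0, 7.4, 0.8],
})

# Sort from highest to lowest, keep the first 3 rows, show two columns
top3 = toy.sort_values(by="Score", ascending=False).head(3)[["Item", "Score"]]

# pandas also offers nlargest(), which does the sort-and-trim in one call
same = toy.nlargest(3, "Score")[["Item", "Score"]]

print(top3)
```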

## 8. Starts With
Create a filter to find all districts where the `CD Name` starts with the letter 'W'.

## 9. The Aggregation Trap
Calculate the average `2010 Population` for all districts in 'Manhattan'.
*(Hint: Filter first, then aggregate)*
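
The trap is easiest to see on unrelated, made-up data: aggregating the whole column gives one number, while filtering first gives the number for just one group. The DataFrame `toy` and its `Group` and `Value` columns are placeholders.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Group": ["X", "Y", "X", "Y"],
    "Value": [10, 20, 30, 60],
})

# Aggregating without a filter averages every row
overall_mean = toy["Value"].mean()

# Filter first (rows where Group is "X"), then aggregate only those rows
x_mean = toy.loc[toy["Group"] == "X", "Value"].mean()

print(overall_mean, x_mean)
```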

## 10. Complex Selection
Find the `CD Name` of the district with the highest `2000 Population`.
*(Try to do this using `.loc` and `.idxmax()` for the most efficient solution)*