# Week 2 - In-Class Exercise: Python & Pandas Fundamentals

**Dataset:** Estados Financieros (Financial Statements) from datos.gov.co

**Time:** ~30 minutes

**Objective:** Practice loading, exploring, and filtering data with pandas.

---

## Part 1: Setup and Data Loading (5 min)

In [None]:
# Import pandas
import pandas as pd

# Verify version (should be 1.x or 2.x)
print(f"pandas version: {pd.__version__}")

In [None]:
# Load the financial statements dataset
# URL from datos.gov.co - Estados Financieros
url = "https://www.datos.gov.co/api/views/m7mk-8d9w/rows.csv?accessType=DOWNLOAD"

df = pd.read_csv(url)

print("Data loaded successfully!")
print(f"Shape: {df.shape[0]} rows, {df.shape[1]} columns")
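
If the download is slow or unavailable in class, the same `pd.read_csv()` call can be tried on any file-like object. The sketch below is a minimal illustration using a small made-up CSV string; the column names `empresa`, `categoria`, and `valor` are hypothetical and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical sample data; these are NOT the real datos.gov.co columns
csv_text = """empresa,categoria,valor
A,Activo,1500000000
B,Pasivo,200000000
C,Activo,
"""

# read_csv accepts a URL, a file path, or a file-like object such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))

print(f"Shape: {sample.shape[0]} rows, {sample.shape[1]} columns")
```

The empty last field in row `C` is read as a missing value, which is worth keeping in mind for Exercise 2.4.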

## Part 2: Basic Exploration (10 min)

Use the "Big 3" methods (`head()`, `info()`, and `describe()`) to understand the data.

### Exercise 2.1: View the first 10 rows

Use `head()` to see what the data looks like.

In [None]:
# YOUR CODE HERE: Display first 10 rows
df.head(10)

### Exercise 2.2: Check the data types

Use `info()` to see column names and data types.

In [None]:
# YOUR CODE HERE: Display DataFrame info
df.info()

**Question:** What are the column names? Are there any numeric columns?
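
If `info()` reports a column you expected to be numeric as text, it may need converting before you can sum or filter it. The sketch below shows one possible approach on a toy DataFrame; the column name `valor` and the comma thousands separator are assumptions for illustration, not facts about the real dataset.

```python
import pandas as pd

# Hypothetical text column holding numbers with thousands separators
toy = pd.DataFrame({"valor": ["1,500", "200", "n/a"]})

# Remove the separators, then convert; unparseable entries become NaN
cleaned = toy["valor"].str.replace(",", "", regex=False)
toy["valor_num"] = pd.to_numeric(cleaned, errors="coerce")

print(toy)
```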

### Exercise 2.3: Get summary statistics

Use `describe()` to see statistics for numeric columns.

In [None]:
# YOUR CODE HERE: Display summary statistics
df.describe()
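
By default, `describe()` summarizes only the numeric columns. The sketch below illustrates this on a toy DataFrame with hypothetical column names (`categoria`, `valor`), and shows how `include="all"` also brings in the text columns.

```python
import pandas as pd

toy = pd.DataFrame({
    "categoria": ["Activo", "Pasivo", "Activo"],  # hypothetical text column
    "valor": [10.0, 20.0, 60.0],                  # hypothetical numeric column
})

# Default: statistics for numeric columns only
numeric_summary = toy.describe()

# include="all" adds count/unique/top/freq for non-numeric columns
full_summary = toy.describe(include="all")

# Pull out just the rows needed to answer a min/max question
print(numeric_summary.loc[["min", "max"]])
```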

**Question:** What are the minimum and maximum values of each numeric column?

### Exercise 2.4: Check for missing values

In [None]:
# Count missing values per column
df.isnull().sum()
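
Raw counts are easier to interpret next to the share of rows they represent. The following sketch, on a toy DataFrame with hypothetical columns, computes both the count and the percentage of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries
toy = pd.DataFrame({
    "categoria": ["Activo", None, "Pasivo", "Activo"],
    "valor": [100.0, 250.0, np.nan, np.nan],
})

# isnull() gives a True/False table; sum() counts the True values per column
missing_count = toy.isnull().sum()

# mean() of True/False values is the fraction missing; scale to a percentage
missing_pct = toy.isnull().mean() * 100

print(missing_count)
print(missing_pct)
```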

## Part 3: Answering Business Questions (10 min)

### Exercise 3.1: How many unique categories are there?

Hint: Look for a column that contains category information, then use `.unique()` to list the distinct values or `.nunique()` to count them.

In [None]:
# First, let's see all column names
print("Column names:")
print(df.columns.tolist())

In [None]:
# YOUR CODE HERE: Find unique categories
# Replace 'COLUMN_NAME' with the actual category column name

# Option 1: Get all unique values
# df['COLUMN_NAME'].unique()

# Option 2: Count unique values
# df['COLUMN_NAME'].nunique()
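
The two options can give different totals when a column has missing values: `unique()` lists a missing value as one of its entries, while `nunique()` skips missing values by default. A small sketch with a hypothetical `categoria` column:

```python
import pandas as pd

# Hypothetical category column containing one missing value
toy = pd.DataFrame({"categoria": ["Activo", "Pasivo", "Activo", None, "Patrimonio"]})

# unique() returns the distinct values, including the missing one
categories = toy["categoria"].unique()

# nunique() counts distinct values and ignores missing ones by default
n_categories = toy["categoria"].nunique()

print(categories)
print(n_categories)
```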

### Exercise 3.2: Find the top 5 categories by total value

Steps:
1. Group by the category column
2. Sum the value column
3. Sort descending
4. Take top 5

In [None]:
# YOUR CODE HERE: Top 5 categories by total value
# Hint: df.groupby('category_column')['value_column'].sum().sort_values(ascending=False).head(5)
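
The four steps above can be sketched on a toy DataFrame as follows. The column names `categoria` and `valor` are placeholders; substitute the real names you found with `df.columns`.

```python
import pandas as pd

# Hypothetical data: six categories with one or two rows each
toy = pd.DataFrame({
    "categoria": ["A", "B", "A", "C", "B", "D", "E", "F"],
    "valor": [10, 5, 20, 7, 1, 50, 3, 2],
})

top5 = (
    toy.groupby("categoria")["valor"]  # 1. group by the category column
    .sum()                             # 2. sum the value column
    .sort_values(ascending=False)      # 3. sort descending
    .head(5)                           # 4. take the top 5
)

print(top5)
```

Wrapping the chain in parentheses lets each step sit on its own line, which makes it easier to comment out one step at a time while debugging.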

### Exercise 3.3: Filter the data

Find all rows where the value is greater than 1,000,000,000 (one billion).

In [None]:
# YOUR CODE HERE: Filter for values > 1 billion
# high_value = df[df['value_column'] > 1_000_000_000]
# high_value

**Question:** How many rows have values greater than 1 billion?
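
One way to answer this is to build the boolean condition first and then count its `True` values. The sketch below uses a toy DataFrame with hypothetical `empresa` and `valor` columns; note that `>` excludes rows equal to exactly one billion.

```python
import pandas as pd

# Hypothetical values around the one-billion threshold
toy = pd.DataFrame({
    "empresa": ["A", "B", "C", "D"],
    "valor": [2_500_000_000, 900_000_000, 1_000_000_000, 1_000_000_001],
})

# The condition produces one True/False per row
mask = toy["valor"] > 1_000_000_000

# Indexing with the mask keeps only the True rows
high_value = toy[mask]

# Summing the mask counts the True values; len(high_value) gives the same number
n_high = int(mask.sum())

print(high_value)
print(n_high)
```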

## Part 4: Quick Visualization (5 min)

Create a simple bar chart showing the number of records in each of the 10 most frequent categories.

In [None]:
# Import matplotlib
import matplotlib.pyplot as plt

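# Count rows per category and plot the 10 most frequent
# Replace 'category_column' with the actual category column name, then uncomment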
# df['category_column'].value_counts().head(10).plot(kind='bar', figsize=(10, 6))
# plt.title('Number of Records by Category')
# plt.xlabel('Category')
# plt.ylabel('Count')
# plt.xticks(rotation=45, ha='right')
# plt.tight_layout()
# plt.show()
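
The bar heights in this chart come from `value_counts()`, which counts rows per category and sorts the result from most to least frequent. The sketch below shows the table that would be plotted, using a hypothetical `categoria` column; the plotting call itself is left out so the example runs without a display.

```python
import pandas as pd

# Hypothetical category column
toy = pd.DataFrame({"categoria": ["A", "B", "A", "C", "A", "B"]})

# Count rows per category (sorted descending) and keep at most 10
counts = toy["categoria"].value_counts().head(10)

print(counts)
```

Inspecting `counts` before calling `.plot(kind='bar')` is a quick check that the chart will show what you expect.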

---

## Summary

In this exercise, you practiced:

- Loading data from a URL with `pd.read_csv()`
- Exploring data with `head()`, `info()`, and `describe()`
- Checking for missing values with `isnull().sum()`
- Finding unique values with `unique()` and `nunique()`
- Aggregating with `groupby()` and `sum()`
- Filtering with boolean conditions
- Creating a simple bar chart

**Next:** Complete the workshop for more practice!