# Getting Started with pycancensus

This notebook walks through listing census vectors, navigating their hierarchy, searching them, and retrieving data with pycancensus, using **clear hierarchy examples** throughout.

## Key Features Demonstrated:
- 📊 **list_census_vectors()** - Browse all available data variables
- 🌳 **Vector Hierarchies** - Navigate parent-child relationships
- 🔍 **find_census_vectors()** - Keyword search with results ranked by relevance
- 📈 **Real Data Retrieval** - Get actual census data

> **Note**: This notebook includes executed outputs showing real API results.

In [1]:
import pycancensus
from pycancensus import (
    list_census_datasets, 
    list_census_vectors, 
    get_census,
    parent_census_vectors,
    child_census_vectors,
    find_census_vectors
)
import pandas as pd
import os

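# Set API key (assumes the key is the first KEY=value entry in ~/.Renviron)
with open(os.path.expanduser('~/.Renviron')) as f:
    entries = [line for line in f if '=' in line and not line.lstrip().startswith('#')]
api_key = entries[0].split('=', 1)[1].strip().strip('"\'')
pycancensus.set_api_key(api_key)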
print("✅ pycancensus imported successfully!")

API key set for current session.
✅ pycancensus imported successfully!

## 1. Exploring Census Vectors with list_census_vectors()

The `list_census_vectors()` function shows all available data variables:

In [2]:
# List all vectors for 2021 Census
vectors_ca21 = list_census_vectors('CA21')
print(f"📊 CA21 Census has {len(vectors_ca21):,} vectors available")
print(f"📋 Columns: {list(vectors_ca21.columns)}")

# Show how many vectors have parent relationships
with_parents = vectors_ca21[vectors_ca21['parent_vector'].notna()]
print(f"🔗 Vectors with parent relationships: {len(with_parents):,} out of {len(vectors_ca21):,}")
print("\nSample hierarchy examples:")
with_parents[['vector', 'parent_vector', 'label']].head()

Reading vectors from cache...
📊 CA21 Census has 7,709 vectors available
📋 Columns: ['vector', 'label', 'type', 'units', 'aggregation', 'parent_vector', 'details']

🔗 Vectors with parent relationships: 7,448 out of 7,709

Sample hierarchy examples:

|    | vector | parent_vector | label |
|----|--------|---------------|-------|
| 4  | v_CA21_5 | v_CA21_4 | Private dwellings occupied by usual residents |
| 10 | v_CA21_11 | v_CA21_8 | 0 to 14 years |
| 11 | v_CA21_14 | v_CA21_11 | 0 to 4 years |
| 12 | v_CA21_17 | v_CA21_14 | Under 1 year |
| 13 | v_CA21_20 | v_CA21_14 | 1 year |
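
The `parent_vector` column is what encodes the hierarchy: each row points at the vector one level above it. As a minimal sketch in plain Python (independent of pycancensus, using only the five sample rows shown above), grouping rows by parent gives a parent-to-children lookup. The names `rows` and `children_of` are illustrative and not part of the library.

```python
from collections import defaultdict

# (vector, parent_vector, label) rows copied from the sample output above
rows = [
    ("v_CA21_5", "v_CA21_4", "Private dwellings occupied by usual residents"),
    ("v_CA21_11", "v_CA21_8", "0 to 14 years"),
    ("v_CA21_14", "v_CA21_11", "0 to 4 years"),
    ("v_CA21_17", "v_CA21_14", "Under 1 year"),
    ("v_CA21_20", "v_CA21_14", "1 year"),
]

# Group child vectors under their parent to get an adjacency list
children_of = defaultdict(list)
for vector, parent, _label in rows:
    children_of[parent].append(vector)

for parent, kids in children_of.items():
    print(f"{parent} -> {kids}")
```

Only the sample rows are included here, so each parent lists just the children that happen to appear in the sample.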


## 2. Vector Hierarchy Navigation (NEW!)

The previous version of this notebook had only limited hierarchy examples. This section demonstrates **clear parent-child relationships** by drilling down through the age demographics, one level at a time:

In [3]:
# Find the age total vector (this is our ROOT)
age_total = vectors_ca21[vectors_ca21['label'] == 'Total - Age'].iloc[0]
print(f"🌳 Age Demographics Hierarchy\n")
print(f"📊 ROOT: {age_total['vector']} - {age_total['label']}")
print(f"\n📊 LEVEL 1 - Major Age Groups:")

# Get its direct children (major age groups)
age_children = child_census_vectors(age_total['vector'], 'CA21')
age_children[['vector', 'label', 'parent_vector']]

🌳 Age Demographics Hierarchy

📊 ROOT: v_CA21_8 - Total - Age

📊 LEVEL 1 - Major Age Groups:

|   | vector | label | parent_vector |
|---|--------|-------|---------------|
| 0 | v_CA21_11 | 0 to 14 years | v_CA21_8 |
| 1 | v_CA21_68 | 15 to 64 years | v_CA21_8 |
| 2 | v_CA21_251 | 65 years and over | v_CA21_8 |


In [4]:
# Drill down into 0-14 age group
child_ages = child_census_vectors('v_CA21_11', 'CA21')
print(f"📊 LEVEL 2 - Detailed breakdown of '0 to 14 years' (v_CA21_11):")
child_ages[['vector', 'label', 'parent_vector']]

📊 LEVEL 2 - Detailed breakdown of '0 to 14 years' (v_CA21_11):

|   | vector | label | parent_vector |
|---|--------|-------|---------------|
| 0 | v_CA21_14 | 0 to 4 years | v_CA21_11 |
| 1 | v_CA21_32 | 5 to 9 years | v_CA21_11 |
| 2 | v_CA21_50 | 10 to 14 years | v_CA21_11 |


In [5]:
# Even more detailed: individual years
detailed_ages = child_census_vectors('v_CA21_14', 'CA21')
print(f"📊 LEVEL 3 - Individual years for '0 to 4 years' (v_CA21_14):")
detailed_ages[['vector', 'label', 'parent_vector']]

📊 LEVEL 3 - Individual years for '0 to 4 years' (v_CA21_14):

|   | vector | label | parent_vector |
|---|--------|-------|---------------|
| 0 | v_CA21_17 | Under 1 year | v_CA21_14 |
| 1 | v_CA21_20 | 1 year | v_CA21_14 |
| 2 | v_CA21_23 | 2 years | v_CA21_14 |
| 3 | v_CA21_26 | 3 years | v_CA21_14 |
| 4 | v_CA21_29 | 4 years | v_CA21_14 |
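
The three drill-down steps above can be combined into a single depth-first walk. The following is a rough sketch in plain Python over the rows from the three outputs above. It does not call pycancensus, and `vectors` and `render_tree` are hypothetical names chosen for illustration. Children of `v_CA21_32`, `v_CA21_50`, `v_CA21_68`, and `v_CA21_251` were not retrieved in this notebook, so they appear as leaves here.

```python
# vector -> (label, parent_vector), copied from the outputs above
vectors = {
    "v_CA21_8": ("Total - Age", None),
    "v_CA21_11": ("0 to 14 years", "v_CA21_8"),
    "v_CA21_68": ("15 to 64 years", "v_CA21_8"),
    "v_CA21_251": ("65 years and over", "v_CA21_8"),
    "v_CA21_14": ("0 to 4 years", "v_CA21_11"),
    "v_CA21_32": ("5 to 9 years", "v_CA21_11"),
    "v_CA21_50": ("10 to 14 years", "v_CA21_11"),
    "v_CA21_17": ("Under 1 year", "v_CA21_14"),
    "v_CA21_20": ("1 year", "v_CA21_14"),
    "v_CA21_23": ("2 years", "v_CA21_14"),
    "v_CA21_26": ("3 years", "v_CA21_14"),
    "v_CA21_29": ("4 years", "v_CA21_14"),
}


def render_tree(root, vectors, depth=0):
    """Return indented lines for `root` and all its descendants, depth-first."""
    label, _parent = vectors[root]
    lines = [f"{'  ' * depth}{root}: {label}"]
    for vec, (_label, parent) in vectors.items():
        if parent == root:
            lines.extend(render_tree(vec, vectors, depth + 1))
    return lines


tree_lines = render_tree("v_CA21_8", vectors)
print("\n".join(tree_lines))
```

Scanning the whole dictionary for each node is fine for a dozen rows. For the full vector list, a precomputed parent-to-children lookup would avoid the repeated scans.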


### Finding Parent Vectors

You can also navigate **upward** in the hierarchy:

In [6]:
# Find parent of a specific vector
parent = parent_census_vectors('v_CA21_17', 'CA21')  # Under 1 year
print(f"⬆️  Finding parent of 'Under 1 year' (v_CA21_17):")
parent[['vector', 'label', 'parent_vector']]

⬆️  Finding parent of 'Under 1 year' (v_CA21_17):

|   | vector | label | parent_vector |
|---|--------|-------|---------------|
| 0 | v_CA21_14 | 0 to 4 years | v_CA21_11 |
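
The result above is the immediate parent. To reach the root, the same step can be repeated until no parent is left. Below is a minimal plain-Python sketch of that loop, using only the child-to-parent links that appear in this notebook's outputs. The `ancestors` helper is hypothetical and not a pycancensus function, and it treats `v_CA21_8` as the root, as this notebook does.

```python
# child -> parent links taken from the outputs in this notebook
parent_of = {
    "v_CA21_17": "v_CA21_14",
    "v_CA21_14": "v_CA21_11",
    "v_CA21_11": "v_CA21_8",
}


def ancestors(vector, parent_of):
    """Follow parent links upward until reaching a vector with no recorded parent."""
    chain = []
    while vector in parent_of:
        vector = parent_of[vector]
        chain.append(vector)
    return chain


chain = ancestors("v_CA21_17", parent_of)
print(chain)
```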


## 3. Enhanced Vector Search

The `find_census_vectors()` function searches the vector list for a query term and returns the matches with a `relevance_score` column, sorted from most to least relevant:

In [7]:
# Search for income-related vectors
income_vectors = find_census_vectors('CA21', 'income')
print(f"🔍 Found {len(income_vectors)} income-related vectors")
print(f"\nTop income vectors (sorted by relevance):")
income_vectors[['vector', 'label', 'relevance_score']].head(3)

🔍 Found 649 income-related vectors

Top income vectors (sorted by relevance):

|   | vector | label | relevance_score |
|---|--------|-------|-----------------|
| 0 | v_CA21_563 | Total - Total income groups in 2020 for the p... | 15.0 |
| 1 | v_CA21_906 | Total - Total income groups in 2020 for the p... | 15.0 |
| 2 | v_CA21_1249 | Total - Total income groups in 2020 for the p... | 15.0 |
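
This notebook does not show how pycancensus computes `relevance_score`. To give a feel for what ranking by relevance means, here is a toy scorer in plain Python. It is a hypothetical illustration, not the library's scoring rule: it counts occurrences of the query in a label and adds one point when the label starts with the query. The labels are made up for the example and are not taken from the CA21 vector list.

```python
def relevance(label: str, query: str) -> int:
    """Toy score: occurrences of the query, plus 1 if the label starts with it."""
    text, q = label.lower(), query.lower()
    score = text.count(q)
    if text.startswith(q):
        score += 1
    return score


# Hypothetical labels, for illustration only
labels = [
    "Median total income",
    "Total - Age",
    "Income statistics for income recipients",
]

# Score every label, drop non-matches, and sort best match first
scored = [(label, relevance(label, "income")) for label in labels]
ranked = sorted((p for p in scored if p[1] > 0), key=lambda p: p[1], reverse=True)

for label, score in ranked:
    print(f"{score}  {label}")
```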


## 4. Real Data Retrieval

Finally, let's get actual census data using our hierarchy vectors:

In [8]:
# Get real data for Toronto CMA using our hierarchy vectors
toronto_data = get_census(
    dataset='CA21',
    regions={'cma': '535'},  # Toronto CMA
    vectors=['v_CA21_8', 'v_CA21_11', 'v_CA21_68', 'v_CA21_251'],
    level='cma',
    use_cache=False
)

print(f"📈 Toronto CMA Age Demographics:")
print(f"\nAge Distribution:")
total_pop = toronto_data['v_CA21_8'].iloc[0]
age_0_14 = toronto_data['v_CA21_11'].iloc[0]
age_15_64 = toronto_data['v_CA21_68'].iloc[0] 
age_65_plus = toronto_data['v_CA21_251'].iloc[0]

print(f"• 0-14 years: {age_0_14:,} ({age_0_14/total_pop*100:.1f}%)")
print(f"• 15-64 years: {age_15_64:,} ({age_15_64/total_pop*100:.1f}%)")
print(f"• 65+ years: {age_65_plus:,} ({age_65_plus/total_pop*100:.1f}%)")
print(f"• TOTAL: {total_pop:,}")

📈 Toronto CMA Age Demographics:

Age Distribution:
• 0-14 years: 919,815 (14.8%)
• 15-64 years: 4,197,590 (67.7%)
• 65+ years: 1,084,820 (17.5%)
• TOTAL: 6,202,225
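
Because the three age groups are the direct children of `v_CA21_8`, their counts should add up to the total. A quick sanity check in plain Python, using the figures printed above, confirms this and reproduces the percentages:

```python
# Counts copied from the Toronto CMA output above
total = 6_202_225
groups = {
    "0-14 years": 919_815,
    "15-64 years": 4_197_590,
    "65+ years": 1_084_820,
}

# The children of a total vector should sum to the total itself
children_sum = sum(groups.values())
print(f"Sum of age groups: {children_sum:,} (total: {total:,})")

# Share of each group, rounded to one decimal place
shares = {name: round(count / total * 100, 1) for name, count in groups.items()}
print(shares)
```

The same check is a useful guard when pulling other hierarchies, since a mismatch usually means a child vector was left out of the request.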

## Summary

✅ **This notebook demonstrates the enhanced pycancensus capabilities:**

1. **list_census_vectors()** - Browse 7,709 available variables with explicit parent-child relationships
2. **Hierarchy Navigation** - Navigate through age demographics from broad categories to individual years
3. **parent_census_vectors()** & **child_census_vectors()** - Navigate up and down the hierarchy
4. **find_census_vectors()** - Smart search with relevance scoring (649 income variables found)
5. **Real Data** - Actual Toronto CMA demographics retrieved and analyzed

🎯 **Key Improvement**: Unlike previous versions, the hierarchy functions now work with **clear, well-defined parent-child relationships**: 7,448 of the 7,709 CA21 vectors carry a `parent_vector` value.

### Next Steps:
- Explore other hierarchies (income, education, housing)
- Try different geographic levels (province, census division, etc.)
- Use `geo_format='geopandas'` for spatial analysis