# Tri-Agency Research Grants Analysis

This notebook analyzes grant data from Canada's three main research funding agencies:
- NSERC (Natural Sciences and Engineering Research Council)
- CIHR (Canadian Institutes of Health Research)
- SSHRC (Social Sciences and Humanities Research Council)

## Guide to Using the Tri-Agency Data Fetcher

The sections below show how to fetch grant data with the Tri-Agency Data Fetcher, run its built-in analysis, and write custom queries against the resulting DataFrame.

### Basic Setup and Import

```python
import pandas as pd
import numpy as np
from datetime import datetime
from IPython.display import display
from data.fetcher import Fetcher, FetcherConfig

# Initialize fetcher with default settings (minimal output)
fetcher = Fetcher()

# Or initialize with verbose output to see progress
fetcher_verbose = Fetcher(FetcherConfig(verbose=True))
```

### Fetching Data

#### Basic Usage (Default Settings)
```python
# By default, only the latest amendment of each grant is kept
grants_df = fetcher.fetch_all_orgs(
    year="2019",
    verify_ssl=False  # Only needed if you get SSL verification errors
)
```

#### Alternative Amendment Handling
```python
# Keep all amendments
grants_all = fetcher.fetch_all_orgs(
    year="2019",
    verify_ssl=False,
    handle_amendments='all'
)

# Keep only original grants (no amendments)
grants_original = fetcher.fetch_all_orgs(
    year="2019",
    verify_ssl=False,
    handle_amendments='none'
)
```
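
For intuition, the three `handle_amendments` modes can be approximated in plain pandas. The sketch below is not the fetcher's actual implementation: `filter_amendments` is a hypothetical helper, the toy values are made up, and it assumes `amendment_number` is numeric with `0` marking the original agreement. It also reads `'none'` as "the original version of each grant", which is only one possible interpretation.

```python
import pandas as pd

# Toy data: column names follow the fetcher's output, values are made up.
toy = pd.DataFrame({
    "ref_number": ["A-1", "A-1", "A-1", "B-2"],
    "amendment_number": [0, 1, 2, 0],
    "agreement_value": [1000.0, 1500.0, 1200.0, 5000.0],
})

def filter_amendments(df: pd.DataFrame, mode: str = "latest") -> pd.DataFrame:
    """Hypothetical helper mirroring the three handle_amendments modes."""
    if mode == "all":
        # Keep every version of every grant.
        return df.copy()
    if mode == "none":
        # Assumption: amendment_number == 0 marks the original agreement.
        return df[df["amendment_number"] == 0].copy()
    if mode == "latest":
        # Keep the row with the highest amendment_number per ref_number.
        ordered = df.sort_values(["ref_number", "amendment_number"])
        return ordered.drop_duplicates("ref_number", keep="last")
    raise ValueError(f"unknown mode: {mode!r}")

latest = filter_amendments(toy, "latest")
original = filter_amendments(toy, "none")
```

Comparing `len(filter_amendments(df, "all"))` with `len(filter_amendments(df, "latest"))` on real data gives a quick sense of how many rows are amendments.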

### Analyzing Data

#### Built-in Analysis
```python
if not grants_df.empty:
    analysis_results = fetcher.analyze_grants(grants_df)
```

This prints four tables:
- Summary by agency (count, total, mean, and median grant value)
- Provincial distribution
- Top 10 recipients
- Funding range distribution

#### Custom Analysis Examples

1. Total funding by organization:
```python
org_totals = grants_df.groupby('org')['agreement_value'].agg(['sum', 'count'])
display(org_totals)
```

2. Average grant value by province:
```python
province_avg = grants_df.groupby('recipient_province')['agreement_value'].mean()
display(province_avg)
```

3. Number of grants starting in each month, by organization:
```python
grants_df['month'] = pd.to_datetime(grants_df['agreement_start_date']).dt.month
monthly_dist = grants_df.groupby(['org', 'month'])['agreement_value'].count()
display(monthly_dist)
```
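
With its default settings, `pd.to_datetime` raises an error if a value cannot be parsed as a date. A hedged variant of the month-based example, run here on made-up data, coerces unparseable dates to `NaT`, drops them, and pivots the counts into an organization-by-month table:

```python
import pandas as pd

# Made-up rows; the last start date is deliberately invalid.
toy = pd.DataFrame({
    "org": ["NSERC", "NSERC", "CIHR", "SSHRC"],
    "agreement_start_date": ["2019-04-01", "2019-04-15", "2019-09-01", "not a date"],
    "agreement_value": [10_000, 20_000, 300_000, 40_000],
})

# errors="coerce" turns unparseable values into NaT instead of raising.
start = pd.to_datetime(toy["agreement_start_date"], errors="coerce")
toy["month"] = start.dt.month  # NaN where the date could not be parsed

monthly_table = (
    toy.dropna(subset=["month"])      # drop rows without a usable start date
       .astype({"month": int})        # month is float because of the NaN
       .groupby(["org", "month"])
       .size()                        # number of grants per (org, month)
       .unstack(fill_value=0)         # months become columns, gaps become 0
)
```

Using `.size()` rather than counting `agreement_value` means grants with a missing value are still counted.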

### Common Issues & Solutions

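1. SSL Verification Errors
   - Pass `verify_ssl=False` to `fetch_all_orgs()`. This skips SSL certificate verification, so use it only when the default call fails with an SSL error.

2. Progress Monitoring
   - Pass a `FetcherConfig` with `verbose=True` when creating the fetcher to see progress
   - Example: `fetcher = Fetcher(FetcherConfig(verbose=True))`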

3. Amendment Handling
   - `'latest'`: Only most recent version of each grant (default)
   - `'all'`: All versions including amendments
   - `'none'`: Only original grants, no amendments

### Available Fields

The returned DataFrame includes these main fields:
- `ref_number`: Unique reference number for each grant
- `agreement_start_date`: Start date of the grant
- `agreement_end_date`: End date of the grant
- `agreement_value`: Dollar value of the grant
- `amendment_number`: Amendment version (if any)
- `recipient_legal_name`: Name of recipient
- `recipient_province`: Province of recipient
- `org`: Funding organization (NSERC, SSHRC, CIHR)
- `year`: Year of the grant
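
Before running custom analysis, it can help to confirm that these fields are present and typed as expected. The following is a sketch only: `missing_columns` and `coerce_types` are hypothetical helpers, not part of the fetcher, and the example row is made up. It assumes the value and date columns may arrive as strings.

```python
import pandas as pd

# Column names taken from the field list above.
EXPECTED_COLUMNS = [
    "ref_number", "agreement_start_date", "agreement_end_date",
    "agreement_value", "amendment_number", "recipient_legal_name",
    "recipient_province", "org", "year",
]

def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

def coerce_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric values and parsed dates (bad entries become NaN/NaT)."""
    out = df.copy()
    out["agreement_value"] = pd.to_numeric(out["agreement_value"], errors="coerce")
    for col in ("agreement_start_date", "agreement_end_date"):
        out[col] = pd.to_datetime(out[col], errors="coerce")
    return out

# Made-up row for illustration only.
toy = pd.DataFrame([{
    "ref_number": "X-001",
    "agreement_start_date": "2019-04-01",
    "agreement_end_date": "2020-03-31",
    "agreement_value": "25000",
    "amendment_number": 0,
    "recipient_legal_name": "Example University",
    "recipient_province": "ON",
    "org": "NSERC",
    "year": "2019",
}])

typed = coerce_types(toy)
```

Calling `missing_columns(grants_df)` on a fetched DataFrame should return an empty list if all of the fields listed above are present.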

In [1]:
from data.fetcher import Fetcher, FetcherConfig

# Initialize fetcher with optional verbose mode
fetcher = Fetcher(FetcherConfig(verbose=True))

# Fetch 2019 data
grants_df = fetcher.fetch_all_orgs(
    year="2019",
    verify_ssl=False,
    handle_amendments='latest'
)

🚚 Starting tri-agency data fetch for 2019... 
  🔍 Fetching NSERC data... ✓ (12,080 records found)
  🔍 Fetching SSHRC data... ✓ (5,454 records found)
  🔍 Fetching CIHR data... ✓ (3,283 records found)

🔄️ Combining datasets... ✓
💾 Saved dataset to /u1/a9dutta/cs348/rgap/data/processed/tri_agency_grants_2019.csv

Dataset Summary
 - Total records: 20,817
 - Unique reference numbers: 20,817

Records per organization:
org
NSERC    12080
SSHRC     5454
CIHR      3283
Name: count, dtype: int64


In [2]:
# Analyze the results
if not grants_df.empty:
    analysis_results = fetcher.analyze_grants(grants_df)

🗃️ Performing grant analysis... 
  [1/4] Calculating summary by organization... ✓
  [2/4] Calculating provincial distribution... ✓
  [3/4] Identifying top recipients... ✓
  [4/4] Analyzing funding ranges... ✓

Analysis Results

Summary by Organization:


| org | agreement_value (count) | agreement_value (sum) | agreement_value (mean) | agreement_value (median) | recipient_province (count) |
|---|---|---|---|---|---|
| CIHR | 3283 | $1,048,362,664.00 | $319,330.69 | $100,000.00 | 3211 |
| NSERC | 12080 | $1,107,476,253.22 | $91,678.50 | $24,835.00 | 12066 |
| SSHRC | 5454 | $1,091,804,708.96 | $200,184.22 | $40,000.00 | 5394 |



Grants by Province:


| recipient_province | CIHR | NSERC | SSHRC |
|---|---|---|---|
| AB | 300 | 1119 | 427 |
| BC | 416 | 1654 | 729 |
| CA | 0 | 34 | 11 |
| CO | 0 | 1 | 0 |
| CT | 0 | 3 | 1 |
| FL | 0 | 1 | 0 |
| GA | 0 | 3 | 0 |
| HI | 0 | 0 | 1 |
| IL | 0 | 4 | 3 |
| IN | 0 | 1 | 2 |



Top 10 Recipients by Funding:


| recipient_legal_name | agreement_value (count) | agreement_value (sum) | org (first) |
|---|---|---|---|
| University of Toronto - University of Toronto | 1 | $46,525,808.00 | SSHRC |
| Chertkow, Howard M | 1 | $31,625,000.00 | CIHR |
| The University of British Columbia - The University of British Columbia | 1 | $28,384,162.00 | SSHRC |
| McGill University - Université McGill | 1 | $27,180,649.00 | SSHRC |
| Anis, Aslam H | 1 | $22,850,000.00 | CIHR |
| University of Alberta - University of Alberta | 1 | $17,872,608.00 | SSHRC |
| Université de Montréal - Université de Montréal | 1 | $17,858,583.00 | SSHRC |
| Université Laval - Université Laval | 1 | $14,124,963.00 | SSHRC |
| University of Ottawa - Université d'Ottawa | 1 | $14,095,015.00 | SSHRC |
| McMaster University - McMaster University | 1 | $13,900,412.00 | SSHRC |



Funding Range Distribution:


| org | 0-10K | 10K-50K | 50K-100K | 100K-500K | 500K+ |
|---|---|---|---|---|---|
| CIHR | 755 | 723 | 220 | 886 | 692 |
| NSERC | 4481 | 3189 | 921 | 3219 | 257 |
| SSHRC | 215 | 2732 | 1083 | 1216 | 208 |
