# Interstate Commerce Network Analysis

**Research Questions:**
1. Which states are most influential in our network of interstate flows?
2. What are the most significant flows between those states?

**Data Source:** U.S. Census Bureau CFS 2017 Public Use File

**Note:** This notebook downloads official Census data and uses validated analysis scripts to reproduce research findings.

## Step 1: Setup - Download CFS Data from Census Bureau

Download the official CFS 2017 Public Use File directly from the U.S. Census Bureau (~140 MB; the download typically takes a few minutes, depending on connection speed).

In [None]:
# Download CFS 2017 data from Census Bureau
import os
import shutil
import urllib.request
import zipfile

# Census Bureau official data URL
data_url = "https://www2.census.gov/programs-surveys/cfs/datasets/2017/puf/cfs_2017_puf_csv.zip"
csv_path = 'cfs_2017_puf.csv'

# Download if not already present
if not os.path.exists(csv_path):
    print("Downloading CFS 2017 data from Census Bureau...")
    urllib.request.urlretrieve(data_url, 'cfs_data.zip')

    print("Extracting data...")
    with zipfile.ZipFile('cfs_data.zip', 'r') as zip_ref:
        # The CSV inside the archive may not be named cfs_2017_puf.csv,
        # so copy the first .csv member to the path the later steps expect
        csv_members = [m for m in zip_ref.namelist() if m.lower().endswith('.csv')]
        with zip_ref.open(csv_members[0]) as src, open(csv_path, 'wb') as dst:
            shutil.copyfileobj(src, dst)

    # Clean up zip file
    os.remove('cfs_data.zip')
    print("Data downloaded successfully!")
else:
    print("CFS data already present.")
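The extracted file is large, and a state-to-state flow analysis needs only a handful of columns. A minimal sketch of loading just those columns with pandas, assuming the PUF names them `ORIG_STATE`, `DEST_STATE`, `SHIPMT_VALUE`, `SHIPMT_WGHT`, and `WGT_FACTOR` (check the file header before relying on this). The `load_cfs` helper is hypothetical, and the two in-memory rows are made up, not Census data:

```python
import io

import pandas as pd

# Column names assumed from the CFS 2017 PUF layout; verify against the header
USECOLS = ["ORIG_STATE", "DEST_STATE", "SHIPMT_VALUE", "SHIPMT_WGHT", "WGT_FACTOR"]


def load_cfs(path_or_buffer) -> pd.DataFrame:
    """Hypothetical helper: load only the columns used for flow analysis."""
    return pd.read_csv(path_or_buffer, usecols=USECOLS)


# Tiny stand-in for cfs_2017_puf.csv (made-up rows, not Census data)
sample = io.StringIO(
    "SHIPMT_ID,ORIG_STATE,DEST_STATE,SCTG,SHIPMT_VALUE,SHIPMT_WGHT,WGT_FACTOR\n"
    "1,1,2,01,1000,4000,10.0\n"
    "2,2,1,35,500,200,4.5\n"
)
df = load_cfs(sample)
print(df.columns.tolist())
```

On the real file the call would be `load_cfs('cfs_2017_puf.csv')`. Dropping unused columns at read time keeps memory use down without changing any values.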

## Step 2: Download Analysis Scripts from GitHub

Download our validated analysis scripts from the GitHub repository. The scripts are fetched from the repository's `main` branch, so they reflect its latest version.

In [None]:
# Download analysis scripts from GitHub
import os
import urllib.request

# GitHub raw file URLs
base_url = "https://raw.githubusercontent.com/rsthornton/cfs-network-analysis/main/analysis/"

scripts = [
    "centrality_analysis.py",
    "flow_extraction.py"
]

for script in scripts:
    if not os.path.exists(script):
        print(f"Downloading {script}...")
        urllib.request.urlretrieve(base_url + script, script)
        print(f"  ✓ {script} downloaded")
    else:
        print(f"  ✓ {script} already present")

print("\nAnalysis scripts ready!")

## Step 3: Install Required Libraries

Install the necessary Python packages for the analysis.

In [None]:
# Install required packages
!pip install pandas networkx matplotlib seaborn -q

print("Required libraries installed.")

## Step 4: Run Three-Level Centrality Analysis

Identify the most influential states using our three-level framework:
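- **MACRO**: Regional bridging power (betweenness centrality: how often a state lies on the shortest paths between other states)
- **MESO**: Influence networks (eigenvector centrality: a state scores highly when it is linked to other high-scoring states)
- **MICRO**: Distribution power (weighted out-degree: the total weight of a state's outbound flows)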
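A minimal sketch of the three measures on a made-up four-node network, using NetworkX. How `centrality_analysis.py` actually weights edges is not shown in this notebook, so two choices here are assumptions: edges carry a shipment-value attribute, and betweenness uses the inverse of that value as a distance, because NetworkX treats edge weights as path lengths (a heavy flow should count as a short, strong link). Also note that NetworkX's eigenvector centrality on a directed graph scores nodes by their incoming links.

```python
import networkx as nx

# Toy directed network: (origin, destination, shipment value).
# Node names and values are made up for illustration; they are NOT CFS data.
edges = [
    ("A", "B", 50.0), ("B", "A", 40.0),
    ("B", "C", 30.0), ("C", "B", 35.0),
    ("C", "D", 10.0), ("D", "C", 12.0),
    ("A", "C", 5.0),  ("D", "A", 8.0),
]

G = nx.DiGraph()
for origin, dest, value in edges:
    # Store 1/value as a distance so that large flows become short paths
    G.add_edge(origin, dest, value=value, distance=1.0 / value)

# MACRO: betweenness centrality over inverse-value distances
macro = nx.betweenness_centrality(G, weight="distance")
# MESO: eigenvector centrality weighted by value (in-edges for a DiGraph)
meso = nx.eigenvector_centrality(G, weight="value", max_iter=1000)
# MICRO: weighted out-degree, i.e. total outbound value per node
micro = dict(G.out_degree(weight="value"))

for name, scores in [("MACRO", macro), ("MESO", meso), ("MICRO", micro)]:
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    print(name, ranked)
```

In this toy graph, B and C sit on every indirect route between A and D, so they share the betweenness, while B ships the most value and leads on out-degree. The three levels can therefore rank the same nodes differently.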

In [None]:
# Run centrality analysis
!python centrality_analysis.py --data cfs_2017_puf.csv --output results --top-n 10

print("\nCentrality analysis complete!")

## Step 5: Load and Display Centrality Results

Load the results and create a simple summary table showing the most influential states.

In [None]:
import glob
import json
import os

import pandas as pd

# Load the most recent centrality results file (reruns may leave several)
json_files = glob.glob('results/centrality_analysis*.json')
if not json_files:
    raise FileNotFoundError("No centrality_analysis*.json in results/; rerun Step 4.")
json_file = max(json_files, key=os.path.getmtime)

with open(json_file, 'r') as f:
    results = json.load(f)

# Create summary table of top 5 states at each level
print("=" * 60)
print("MOST INFLUENTIAL STATES BY LEVEL (Top 5)")
print("=" * 60)

levels = [
    ('MACRO - Regional Bridging', 'macro_level'),
    ('MESO - Influence Networks', 'meso_level'),
    ('MICRO - Distribution Power', 'micro_level')
]

for level_name, level_key in levels:
    print(f"\n{level_name}:")
    for i, state in enumerate(results['three_level_analysis'][level_key]['top_states'][:5], 1):
        print(f"  {i}. {state['state_code']}: {state['score']:.3f}")

# Show multi-level leaders
print("\n" + "=" * 60)
print("MULTI-LEVEL LEADERS (States appearing in multiple rankings)")
print("=" * 60)
for state in results['three_level_analysis']['multi_level_leaders'][:5]:
    print(f"  {state['state']}: {state['levels']} levels - Score: {state['score']:.2f}")
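The `multi_level_leaders` list, including its `score`, is computed inside `centrality_analysis.py`, and its exact formula is not shown here. As a hypothetical illustration of the "appearing in multiple rankings" idea only, the sketch below counts how many top-5 lists each state appears in; the letters and rankings are made up:

```python
from collections import Counter

# Hypothetical top-5 lists per level; letters stand in for state codes
# and are NOT results from the CFS analysis
top_by_level = {
    "macro_level": ["A", "B", "C", "D", "E"],
    "meso_level": ["B", "A", "F", "C", "G"],
    "micro_level": ["B", "H", "A", "I", "C"],
}

# Count the number of level rankings each state appears in
appearances = Counter(s for states in top_by_level.values() for s in states)

# Keep states that appear at more than one level
multi_level = {s: n for s, n in appearances.items() if n > 1}
print(sorted(multi_level.items(), key=lambda kv: (-kv[1], kv[0])))
```

A simple count like this ignores how high a state ranks within each list, which is one reason the script's own score may order states differently.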

## Step 6: Extract Bilateral Flows Between Top States

Pool the top five states from each level into one set, then extract the 20 largest bilateral commodity flows among them.

In [None]:
# Get top states from centrality analysis
top_states = set()
for level in ['macro_level', 'meso_level', 'micro_level']:
    for state in results['three_level_analysis'][level]['top_states'][:5]:
        top_states.add(state['state_code'])

states_list = ','.join(sorted(top_states))
print(f"Analyzing flows between: {states_list}")

# Run flow extraction for top states
!python flow_extraction.py --data cfs_2017_puf.csv --states {states_list} --top-n 20 --output flow_results
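What `flow_extraction.py` does internally is not shown here. A minimal sketch of one way to build survey-weighted bilateral flows with pandas, assuming the PUF columns `ORIG_STATE`, `DEST_STATE`, `SHIPMT_VALUE` (dollars), `SHIPMT_WGHT` (pounds), and `WGT_FACTOR` (the shipment's survey weight). The `bilateral_flows` helper is hypothetical, and the toy records and placeholder state codes 1-3 are made up:

```python
import pandas as pd


def bilateral_flows(df: pd.DataFrame, states: set) -> pd.DataFrame:
    """Hypothetical helper: survey-weighted flows between pairs of `states`."""
    # Keep interstate shipments whose origin AND destination are in `states`
    mask = (
        df["ORIG_STATE"].isin(states)
        & df["DEST_STATE"].isin(states)
        & (df["ORIG_STATE"] != df["DEST_STATE"])
    )
    sub = df.loc[mask].copy()
    # Scale each sampled shipment up to a population estimate with its weight
    sub["total_value"] = sub["SHIPMT_VALUE"] * sub["WGT_FACTOR"]          # dollars
    sub["total_tons"] = sub["SHIPMT_WGHT"] * sub["WGT_FACTOR"] / 2000.0   # lb -> short tons
    return (
        sub.groupby(["ORIG_STATE", "DEST_STATE"], as_index=False)[["total_value", "total_tons"]]
        .sum()
        .rename(columns={"ORIG_STATE": "origin_state", "DEST_STATE": "dest_state"})
        .sort_values("total_value", ascending=False, ignore_index=True)
    )


# Made-up shipment records; state codes 1-3 are placeholders, not real data
toy = pd.DataFrame({
    "ORIG_STATE":   [1, 1, 2, 1, 3],
    "DEST_STATE":   [2, 2, 1, 1, 2],
    "SHIPMT_VALUE": [1000, 500, 300, 9999, 700],
    "SHIPMT_WGHT":  [4000, 2000, 1000, 50, 800],
    "WGT_FACTOR":   [10.0, 4.0, 20.0, 3.0, 5.0],
})
print(bilateral_flows(toy, {1, 2}))
```

The intrastate row (1 → 1) and the row from state 3 are dropped, and the two 1 → 2 shipments are summed into one pair. Multiplying by `WGT_FACTOR` before summing is what turns sampled shipments into survey-weighted totals.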

## Step 7: Display Flow Analysis Results

Show the most significant bilateral flows between influential states.

In [None]:
import glob
import os

import pandas as pd

# Load the most recent bilateral flow file written by flow_extraction.py
csv_files = glob.glob('flow_results/bilateral_flows*.csv')
if not csv_files:
    raise FileNotFoundError("No bilateral_flows*.csv in flow_results/; rerun Step 6.")
csv_file = max(csv_files, key=os.path.getmtime)

# Sort by value so the printed rank matches flow size
flows_df = (pd.read_csv(csv_file)
              .sort_values('total_value', ascending=False)
              .reset_index(drop=True))

# Display top flows
print("=" * 60)
print("TOP 10 BILATERAL FLOWS BETWEEN INFLUENTIAL STATES")
print("=" * 60)
print("\nRank | Origin → Destination | Value ($B) | Weight (M tons)")
print("-" * 60)

for rank, row in enumerate(flows_df.head(10).itertuples(index=False), 1):
    print(f"{rank:4d} | {row.origin_state:^6} → {row.dest_state:^6} | "
          f"${row.total_value/1e9:8.2f} | {row.total_tons/1e6:8.2f}")

# Summary statistics (over the state pairs in this file only)
print("\n" + "=" * 60)
print("SUMMARY STATISTICS")
print("=" * 60)
print(f"Total value across these state pairs: ${flows_df['total_value'].sum()/1e9:.1f} billion")
print(f"Number of state pairs: {len(flows_df)}")
print(f"Average value per state pair: ${flows_df['total_value'].mean()/1e9:.2f} billion")
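The ranked list treats A → B and B → A as separate rows. A minimal sketch of pivoting such a table into an origin-destination matrix (in $B) and a net-flow matrix, assuming the `origin_state`, `dest_state`, and `total_value` columns used in this step; the letters and values below are made up, not CFS results:

```python
import pandas as pd

# Made-up flows in the same shape as flows_df (values in dollars)
flows = pd.DataFrame({
    "origin_state": ["A", "B", "A", "C"],
    "dest_state":   ["B", "A", "C", "A"],
    "total_value":  [12e9, 6e9, 3e9, 1e9],
})

# Origin-destination matrix in $B; pairs with no flow become 0
od = flows.pivot_table(index="origin_state", columns="dest_state",
                       values="total_value", aggfunc="sum", fill_value=0) / 1e9

# Make the matrix square so that every state is both a row and a column
labels = sorted(set(flows["origin_state"]) | set(flows["dest_state"]))
od = od.reindex(index=labels, columns=labels, fill_value=0)

# Net flow from the row state to the column state: outbound minus inbound
net = od - od.T
print(od)
print(net)
```

A positive entry in `net` means the row state ships more to the column state than it receives from it, which summarizes the direction of each bilateral relationship.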

## Step 8: Create Simple Visualization

Plot the multi-level influence scores of the leading states (up to 10) as a bar chart.

In [None]:
import matplotlib.pyplot as plt

# Create bar chart of multi-level leaders
fig, ax = plt.subplots(figsize=(10, 6))

leaders = results['three_level_analysis']['multi_level_leaders'][:10]
# Cast codes to str so numeric state codes are plotted as categories, not positions
states = [str(leader['state']) for leader in leaders]
scores = [leader['score'] for leader in leaders]

ax.bar(states, scores, color='steelblue')
ax.set_xlabel('State', fontsize=12)
ax.set_ylabel('Multi-Level Influence Score', fontsize=12)
ax.set_title('Most Influential States in Interstate Commerce Network', fontsize=14)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("Analysis complete. Outputs:")
print("1. Most influential states ranked at the MACRO, MESO, and MICRO levels (results/)")
print("2. Bilateral flows among those states (flow_results/)")
print("3. Bar chart of multi-level influence scores")

## Citation Information

**Data Source:**
U.S. Census Bureau. (2017). Commodity Flow Survey Public Use File. 
Retrieved from https://www.census.gov/programs-surveys/cfs/data/datasets.html

**Analysis Code:**
Available at: https://github.com/rsthornton/cfs-network-analysis

**Method:**
Three-level network centrality analysis using NetworkX, with survey-weighted interstate commodity flows.