# NYC Data Analysis: Dataset Join Feasibility Assessment

This notebook assesses whether NYC 311 service request data can be joined with capital projects data, along three dimensions:
1. **Temporal dimension** - checking for time overlap
2. **Geographic dimension** - checking for common geography
3. **Possible join keys** - identifying common fields for data linking

## Data Structure
- `311-service-requests-from-2010-to-present.csv` - citizen service requests
- `capital-project-schedules-and-budgets.csv` - capital construction projects
- `311-web-content-services.csv` - web service content
- Data dictionaries and metadata files

In [9]:
# Import necessary libraries
import pandas as pd
import numpy as np
import json
import os
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 20)
plt.style.use('default')

# Path to data folder
data_path = './data/'
print("Available files in data folder:")
for file in os.listdir(data_path):
    print(f"- {file}")

Available files in data folder:
- SCA_Capital_Project_Schedules_and_Budgets_Data_Dictionary.xlsx
- socrata_metadata_311-service-requests-from-2010-to-present.json
- 311-web-content-services.csv
- 311-service-requests-from-2010-to-present.csv
- socrata_metadata.json
- capital-project-schedules-and-budgets.csv
- 311_SR_Data_Dictionary_2018.xlsx
- socrata_metadata_311-web-content-services.json


## 1. Data Files Structure Overview

First, let's load the main datasets and examine their structure:

In [None]:
# Load main datasets
print("=== DATA LOADING ===\n")

# 1. 311 Service Requests (main dataset)
print("1. Loading 311-service-requests...")
try:
    # Load only the first 100,000 rows for a quick look.
    # This is not a random sample, so it may cover a narrow date range.
    df_311 = pd.read_csv(data_path + '311-service-requests-from-2010-to-present.csv',
                         nrows=100000, low_memory=False)
    print(f"   Size: {df_311.shape[0]:,} rows, {df_311.shape[1]} columns")
    print(f"   Columns: {list(df_311.columns[:10])}{'...' if len(df_311.columns) > 10 else ''}")
except Exception as e:
    print(f"   Error: {e}")

print()

# 2. Capital Projects
print("2. Loading capital-project-schedules...")
try:
    df_capital = pd.read_csv(data_path + 'capital-project-schedules-and-budgets.csv', low_memory=False)
    print(f"   Size: {df_capital.shape[0]:,} rows, {df_capital.shape[1]} columns")
    print(f"   Columns: {list(df_capital.columns[:10])}{'...' if len(df_capital.columns) > 10 else ''}")
except Exception as e:
    print(f"   Error: {e}")
    
print()

# 3. Web Content Services
print("3. Loading 311-web-content-services...")
try:
    df_web = pd.read_csv(data_path + '311-web-content-services.csv', low_memory=False)
    print(f"   Size: {df_web.shape[0]:,} rows, {df_web.shape[1]} columns")
    print(f"   Columns: {list(df_web.columns)}")
except Exception as e:
    print(f"   Error: {e}")

=== DATA LOADING ===

1. Loading 311-service-requests...
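
Because `nrows=100000` reads only the first rows of the file, the sample may span a short date window. One way to learn the true date range without loading every column is to scan the file in chunks. This is a minimal sketch: `scan_date_range` is a hypothetical helper, and the in-memory CSV is an illustrative stand-in for the real 311 file.

```python
import io
import pandas as pd

def scan_date_range(source, column, chunksize=50_000):
    """Return (min, max) timestamps of `column`, reading the CSV in chunks."""
    lo, hi = pd.NaT, pd.NaT
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        # Unparseable values become NaT and are dropped
        dates = pd.to_datetime(chunk[column], errors="coerce").dropna()
        if dates.empty:
            continue
        lo = dates.min() if pd.isna(lo) else min(lo, dates.min())
        hi = dates.max() if pd.isna(hi) else max(hi, dates.max())
    return lo, hi

# Tiny in-memory stand-in for the real CSV (illustrative values only)
demo_csv = io.StringIO(
    "Unique Key,Created Date\n"
    "1,2019-11-12 16:48:09\n"
    "2,not a date\n"
    "3,2010-01-01 00:00:00\n"
    "4,2019-12-01 02:04:01\n"
)
date_min, date_max = scan_date_range(demo_csv, "Created Date", chunksize=2)
print(date_min, date_max)
```

For the real file, `source` would be the CSV path; only one column is held in memory per chunk, so the scan stays cheap even for a multi-gigabyte file.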


In [None]:
# Detailed structure overview of each dataset
print("=== DETAILED STRUCTURE OVERVIEW ===\n")

print("📊 1. 311 SERVICE REQUESTS DATASET:")
print("="*50)
print("Main columns:", df_311.columns.tolist())
print("\nData information:")
df_311.info()  # info() prints its report itself and returns None

print("\n" + "="*70)
print("📊 2. CAPITAL PROJECTS DATASET:")
print("="*50)
print("Main columns:", df_capital.columns.tolist())
print("\nData information:")
df_capital.info()  # info() prints its report itself and returns None

print("\n" + "="*70)
print("📊 3. WEB CONTENT SERVICES DATASET:")
print("="*50)
print("Main columns:", df_web.columns.tolist())
print(f"\nFirst 3 rows:")
print(df_web.head(3))

=== DETAILED STRUCTURE OVERVIEW ===

📊 1. 311 SERVICE REQUESTS DATASET:
Main columns: ['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name', 'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip', 'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2', 'Intersection Street 1', 'Intersection Street 2', 'Address Type', 'City', 'Landmark', 'Facility Type', 'Status', 'Due Date', 'Resolution Description', 'Resolution Action Updated Date', 'Community Board', 'BBL', 'Borough', 'X Coordinate (State Plane)', 'Y Coordinate (State Plane)', 'Open Data Channel Type', 'Park Facility Name', 'Park Borough', 'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location', 'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment', 'Latitude', 'Longitude', 'Location', 'Zip Codes', 'Community Districts', 'Borough Boundaries', 'City Council Districts', 'Police Precincts']

Data information:
<class 'pandas.core.frame.DataFrame'>
Rang

## 2. Temporal Dimension Analysis (Time Overlap Analysis)

Let's check if there's time overlap between 311 service requests and capital projects:

In [None]:
# Analysis of temporal columns in 311 and Capital Projects datasets
print("=== TEMPORAL DATA ANALYSIS ===\n")

# 1. Analysis of 311 dataset
print("📅 1. 311 SERVICE REQUESTS DATASET - Temporal columns:")
print("-" * 50)

# Find all date columns
date_columns_311 = [col for col in df_311.columns if 'date' in col.lower() or 'time' in col.lower()]
print(f"Date columns: {date_columns_311}")

# Convert dates and analyze periods
df_311['Created Date'] = pd.to_datetime(df_311['Created Date'], errors='coerce')
df_311['Closed Date'] = pd.to_datetime(df_311['Closed Date'], errors='coerce')

print(f"\n311 data period:")
print(f"  Earliest creation date: {df_311['Created Date'].min()}")
print(f"  Latest creation date: {df_311['Created Date'].max()}")
print(f"  Earliest closure date: {df_311['Closed Date'].min()}")
print(f"  Latest closure date: {df_311['Closed Date'].max()}")

print("\n" + "="*70)

# 2. Analysis of Capital Projects dataset  
print("📅 2. CAPITAL PROJECTS DATASET - Temporal columns:")
print("-" * 50)

# Find all date columns
date_columns_capital = [col for col in df_capital.columns if 'date' in col.lower()]
print(f"Date columns: {date_columns_capital}")

# Convert dates
for date_col in date_columns_capital:
    df_capital[date_col] = pd.to_datetime(df_capital[date_col], errors='coerce')
    print(f"\n{date_col}:")
    print(f"  Min: {df_capital[date_col].min()}")
    print(f"  Max: {df_capital[date_col].max()}")
    print(f"  Number of non-null values: {df_capital[date_col].notna().sum()}/{len(df_capital)}")

=== TEMPORAL DATA ANALYSIS ===

📅 1. 311 SERVICE REQUESTS DATASET - Temporal columns:
--------------------------------------------------
Date columns: ['Created Date', 'Closed Date', 'Due Date', 'Resolution Action Updated Date']

311 data period:
  Earliest creation date: 2019-11-12 16:48:09
  Latest creation date: 2019-12-01 02:04:01
  Earliest closure date: 2019-09-27 14:25:00
  Latest closure date: 2019-12-01 01:59:50

📅 2. CAPITAL PROJECTS DATASET - Temporal columns:
--------------------------------------------------
Date columns: ['Project Phase Actual Start Date', 'Project Phase Planned End Date', 'Project Phase Actual End Date']

Project Phase Actual Start Date:
  Min: 2003-09-12 00:00:00
  Max: 2020-12-31 00:00:00
  Number of non-null values: 5502/12136

Project Phase Planned End Date:
  Min: 2003-09-12 00:00:00
  Max: 2023-09-03 00:00:00
  Number of non-null values: 3877/12136

Project Phase Actual End Date:
  Min: 2003-09-12 00:00:00
  Max: 2020-12-31 00:0

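
When `pd.to_datetime` is called without a `format`, pandas has to guess one, and mixed or unexpected values can slow parsing or trigger warnings. Passing an explicit format makes the conversion predictable. The sketch below assumes dates are stored as `MM/DD/YYYY`; that is an assumption about the capital projects file, not something the notebook confirms, so a few raw values should be inspected first.

```python
import pandas as pd

# Illustrative raw values; the real column contents may differ
raw = pd.Series(["09/12/2003", "12/31/2020", "", "n/a", None])

# ASSUMPTION: dates are stored as MM/DD/YYYY.
# With errors="coerce", anything that does not match becomes NaT.
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

print(parsed.min(), parsed.max())
print(f"Parsed {parsed.notna().sum()}/{len(parsed)} values")
```

Comparing the parsed count with the raw non-null count is a quick way to spot a wrong format string: a correct format should leave only genuinely missing or placeholder values as `NaT`.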


In [None]:
# Analysis of temporal period overlap
print("\n=== PERIOD OVERLAP ANALYSIS ===")

# Define periods for each dataset
print("\n🔍 Temporal period comparison:")
print("-" * 40)

# 311 period (from our sample)
period_311_start = df_311['Created Date'].min()
period_311_end = df_311['Created Date'].max()
print(f"311 Service Requests (sample): {period_311_start.date()} - {period_311_end.date()}")

# Capital Projects period
period_capital_start = df_capital['Project Phase Actual Start Date'].min()
period_capital_end = df_capital['Project Phase Actual Start Date'].max()
print(f"Capital Projects (start dates): {period_capital_start.date()} - {period_capital_end.date()}")

# Check for overlap
overlap_start = max(period_311_start, period_capital_start)
overlap_end = min(period_311_end, period_capital_end)

print(f"\n✅ OVERLAP ANALYSIS RESULT:")
if overlap_start <= overlap_end:
    print(f"🎯 OVERLAP EXISTS! Period: {overlap_start.date()} - {overlap_end.date()}")
    overlap_days = (overlap_end - overlap_start).days
    print(f"📊 Overlap duration: {overlap_days} days")
    
    # Count records in overlap period
    count_311_overlap = df_311[
        (df_311['Created Date'] >= overlap_start) & 
        (df_311['Created Date'] <= overlap_end)
    ].shape[0]
    
    count_capital_overlap = df_capital[
        (df_capital['Project Phase Actual Start Date'] >= overlap_start) & 
        (df_capital['Project Phase Actual Start Date'] <= overlap_end)
    ].shape[0]
    
    print(f"📈 311 requests in overlap period: {count_311_overlap:,}")
    print(f"📈 Capital projects (start) in period: {count_capital_overlap:,}")
else:
    print("❌ NO OVERLAP")

# Also check with all capital project dates
print(f"\n🔄 Additional analysis with all capital project dates:")
capital_all_dates = pd.concat([
    df_capital['Project Phase Actual Start Date'].dropna(),
    df_capital['Project Phase Planned End Date'].dropna(),
    df_capital['Project Phase Actual End Date'].dropna()
])

capital_min_all = capital_all_dates.min()
capital_max_all = capital_all_dates.max()
print(f"Full capital projects period: {capital_min_all.date()} - {capital_max_all.date()}")

overlap_start_all = max(period_311_start, capital_min_all)
overlap_end_all = min(period_311_end, capital_max_all)

if overlap_start_all <= overlap_end_all:
    print(f"✅ Overlap with all dates: {overlap_start_all.date()} - {overlap_end_all.date()}")
    print(f"📊 Duration: {(overlap_end_all - overlap_start_all).days} days")
else:
    print("❌ No overlap")


=== PERIOD OVERLAP ANALYSIS ===

🔍 Temporal period comparison:
----------------------------------------
311 Service Requests (sample): 2019-11-12 - 2019-12-01
Capital Projects (start dates): 2003-09-12 - 2020-12-31

✅ OVERLAP ANALYSIS RESULT:
🎯 OVERLAP EXISTS! Period: 2019-11-12 - 2019-12-01
📊 Overlap duration: 18 days
📈 311 requests in overlap period: 100,000
📈 Capital projects (start) in period: 74

🔄 Additional analysis with all capital project dates:
Full capital projects period: 2003-09-12 - 2023-09-03
✅ Overlap with all dates: 2019-11-12 - 2019-12-01
📊 Duration: 18 days
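
The check above compares only the minimum and maximum dates, and the project count includes only phases that started inside the 311 window. A phase that started earlier and was still running during the window also overlaps it. A minimal sketch of an interval-based test follows; `active_in_window` is a hypothetical helper, the sample rows are illustrative, and treating a missing end date as "still ongoing" is an assumption that may not hold for this dataset.

```python
import pandas as pd

def active_in_window(df, start_col, end_col, window_start, window_end):
    """Rows whose [start, end] interval intersects [window_start, window_end].

    ASSUMPTION: a missing end date means the phase is still ongoing.
    Rows without a start date are excluded (NaT comparisons are False).
    """
    started_before_window_ends = df[start_col] <= window_end
    ended_after_window_starts = df[end_col].isna() | (df[end_col] >= window_start)
    return df[started_before_window_ends & ended_after_window_starts]

# Illustrative data, not taken from the real file
phases = pd.DataFrame({
    "Project Phase Actual Start Date": pd.to_datetime(
        ["2019-01-15", "2019-11-20", "2018-03-01", "2020-02-01", None]),
    "Project Phase Actual End Date": pd.to_datetime(
        ["2020-06-30", None, "2019-05-01", "2020-09-01", "2019-11-30"]),
})
window_start = pd.Timestamp("2019-11-12")
window_end = pd.Timestamp("2019-12-01")

active = active_in_window(
    phases,
    "Project Phase Actual Start Date",
    "Project Phase Actual End Date",
    window_start,
    window_end,
)
print(f"Phases active during the window: {len(active)}")
```

Counting active phases this way would usually give a larger number than counting start dates alone, which matters when the question is whether construction was under way while requests were filed.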


## 3. Geographic Data Analysis (Spatial Analysis)

Let's check if there are common geographic identifiers for spatial joining:

In [None]:
# Analysis of geographic columns in datasets
print("=== GEOGRAPHIC DATA ANALYSIS ===\n")

# 1. Analysis of geographic columns in 311 dataset
print("📍 1. 311 SERVICE REQUESTS DATASET - Geographic columns:")
print("-" * 60)

# Find columns with geographic data
geo_keywords = ['location', 'address', 'borough', 'zip', 'latitude', 'longitude', 'district', 'community']
geo_columns_311 = [col for col in df_311.columns 
                   if any(keyword in col.lower() for keyword in geo_keywords)]

print(f"Geographic columns: {geo_columns_311}")

# Analyze key geographic fields
key_geo_fields_311 = ['Borough', 'Incident Zip', 'Latitude', 'Longitude', 'Community Board']
for field in key_geo_fields_311:
    if field in df_311.columns:
        unique_count = df_311[field].nunique()
        null_count = df_311[field].isnull().sum()
        print(f"\n{field}:")
        print(f"  Unique values: {unique_count}")
        print(f"  Missing values: {null_count}/{len(df_311)} ({null_count/len(df_311)*100:.1f}%)")
        if unique_count < 20:  # Show values if not too many
            print(f"  Values: {sorted(df_311[field].dropna().unique())}")

print("\n" + "="*70)

# 2. Analysis of geographic columns in Capital Projects dataset
print("📍 2. CAPITAL PROJECTS DATASET - Geographic columns:")
print("-" * 60)

geo_columns_capital = [col for col in df_capital.columns 
                      if any(keyword in col.lower() for keyword in geo_keywords)]
print(f"Geographic columns: {geo_columns_capital}")

# Analyze key geographic fields
key_geo_fields_capital = ['Project Geographic District ', 'Project School Name']
for field in key_geo_fields_capital:
    if field in df_capital.columns:
        unique_count = df_capital[field].nunique()
        null_count = df_capital[field].isnull().sum()
        print(f"\n{field}:")
        print(f"  Unique values: {unique_count}")
        print(f"  Missing values: {null_count}/{len(df_capital)} ({null_count/len(df_capital)*100:.1f}%)")
        if unique_count < 30:  # Show values if not too many
            sample_values = df_capital[field].dropna().unique()[:10]  # First 10 values
            print(f"  Sample values: {list(sample_values)}")

# Specific analysis of Geographic District
if 'Project Geographic District ' in df_capital.columns:
    print(f"\n🔍 Detailed analysis of Project Geographic District:")
    district_counts = df_capital['Project Geographic District '].value_counts()
    print(f"Top 10 districts by project count:")
    print(district_counts.head(10))

=== GEOGRAPHIC DATA ANALYSIS ===

📍 1. 311 SERVICE REQUESTS DATASET - Geographic columns:
------------------------------------------------------------
Geographic columns: ['Location Type', 'Incident Zip', 'Incident Address', 'Address Type', 'Community Board', 'Borough', 'Park Borough', 'Taxi Company Borough', 'Taxi Pick Up Location', 'Latitude', 'Longitude', 'Location', 'Zip Codes', 'Community Districts', 'Borough Boundaries', 'City Council Districts']

Borough:
  Unique values: 6
  Missing values: 0/100000 (0.0%)
  Values: ['BRONX', 'BROOKLYN', 'MANHATTAN', 'QUEENS', 'STATEN ISLAND', 'Unspecified']

Incident Zip:
  Unique values: 209
  Missing values: 2749/100000 (2.7%)

Latitude:
  Unique values: 50039
  Missing values: 3304/100000 (3.3%)

Longitude:
  Unique values: 50038
  Missing values: 3304/100000 (3.3%)

Community Board:
  Unique values: 76
  Missing values: 0/100000 (0.0%)

📍 2. CAPITAL PROJECTS DATASET - Geographic columns:
-
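
The capital projects column `'Project Geographic District '` carries a trailing space, which is easy to mistype in later lookups and joins. A small sketch of normalizing headers once, right after loading, is shown below; the frame and its values are illustrative stand-ins for `df_capital`.

```python
import pandas as pd

# Illustrative frame reproducing the trailing-space header seen in this notebook
df = pd.DataFrame({
    "Project Geographic District ": [1, 2],
    "Project School Name": ["A", "B"],
})

# Strip leading/trailing whitespace from every column name
df.columns = df.columns.str.strip()

print(df.columns.tolist())
```

If this step is adopted, every later reference to `'Project Geographic District '` in the notebook would need the trailing space removed as well.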

## 4. Join Keys Identification

Let's analyze possible ways to join the datasets:

In [None]:
# Search for possible keys to join datasets
print("=== JOIN KEYS SEARCH ===\n")

# 1. Compare all columns to find common ones
print("🔍 1. COLUMN COMPARISON BETWEEN DATASETS:")
print("-" * 50)

columns_311 = set(df_311.columns)
columns_capital = set(df_capital.columns)

# Search for exact matches
exact_matches = columns_311.intersection(columns_capital)
print(f"Exact column matches: {list(exact_matches) if exact_matches else 'None'}")

# Search for similar columns (by name)
similar_pairs = []
for col_311 in columns_311:
    for col_capital in columns_capital:
        # Check similarity by keywords
        keywords_common = ['district', 'borough', 'location', 'address', 'zip', 'community']
        col_311_lower = col_311.lower()
        col_capital_lower = col_capital.lower()
        
        for keyword in keywords_common:
            if keyword in col_311_lower and keyword in col_capital_lower:
                similar_pairs.append((col_311, col_capital, keyword))

print(f"\nSimilar columns by keywords:")
for pair in similar_pairs:
    print(f"  311: '{pair[0]}' <-> Capital: '{pair[1]}' (common: {pair[2]})")

print("\n" + "="*70)

# 2. Analysis of spatial join possibilities
print("🗺️ 2. SPATIAL JOIN POSSIBILITIES:")
print("-" * 50)

# Check Borough in 311 and Geographic District in Capital
if 'Borough' in df_311.columns and 'Project Geographic District ' in df_capital.columns:
    
    # Unique boroughs in 311
    boroughs_311 = set(df_311['Borough'].dropna().unique())
    print(f"Boroughs in 311 dataset ({len(boroughs_311)}): {sorted(boroughs_311)}")
    
    # Unique districts in Capital Projects
    districts_capital = set(df_capital['Project Geographic District '].dropna().unique())
    print(f"\nDistricts in Capital dataset ({len(districts_capital)}):")
    print(f"First 10: {sorted(list(districts_capital))[:10]}")
    
    # Try to find correspondences between Borough and District
    print(f"\n🔄 Search for Borough <-> District correspondences:")
    
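    # The five NYC boroughs (excludes the 'Unspecified' placeholder).
    # Note: no borough-to-district mapping is built here; this only counts requests per borough.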
    nyc_boroughs = ['MANHATTAN', 'BROOKLYN', 'QUEENS', 'BRONX', 'STATEN ISLAND']
    
    for borough in boroughs_311:
        if borough and borough.upper() in nyc_boroughs:
            # Count records for this borough
            count_311 = df_311[df_311['Borough'] == borough].shape[0]
            print(f"  {borough}: {count_311:,} 311 requests")

print("\n" + "="*70)

# 3. Community Board analysis as possible key
print("🏘️ 3. COMMUNITY BOARD ANALYSIS:")
print("-" * 50)

if 'Community Board' in df_311.columns:
    cb_311 = df_311['Community Board'].dropna().unique()
    print(f"Community Board in 311 ({len(cb_311)} unique):")
    print(f"Examples: {sorted(cb_311)[:10]}")
    
    # Check if there are similar fields in Capital
    cb_like_fields = [col for col in df_capital.columns if 'board' in col.lower() or 'community' in col.lower()]
    print(f"\nSimilar fields in Capital: {cb_like_fields}")

print("\n" + "="*70)

# 4. Coordinates analysis
print("📍 4. COORDINATES ANALYSIS:")
print("-" * 50)

if 'Latitude' in df_311.columns and 'Longitude' in df_311.columns:
    lat_count = df_311['Latitude'].notna().sum()
    lon_count = df_311['Longitude'].notna().sum()
    print(f"311 dataset: {lat_count:,} records with latitude coordinates, {lon_count:,} with longitude")
    
    print(f"311 coordinate ranges:")
    print(f"  Latitude: {df_311['Latitude'].min():.4f} - {df_311['Latitude'].max():.4f}")
    print(f"  Longitude: {df_311['Longitude'].min():.4f} - {df_311['Longitude'].max():.4f}")

# Check if there are coordinates in Capital
coord_fields_capital = [col for col in df_capital.columns if any(word in col.lower() for word in ['lat', 'lon', 'coord'])]
print(f"\nCoordinate fields in Capital: {coord_fields_capital if coord_fields_capital else 'No explicit coordinate fields'}")

print("\n" + "="*70)

# 5. Summary of possible join strategies
print("💡 5. POSSIBLE JOIN STRATEGIES:")
print("-" * 50)

strategies = [
    {
        'name': 'Geographic join via Borough/District',
        'feasible': bool(boroughs_311 and districts_capital),
        'description': 'Mapping Borough (311) -> Geographic District (Capital)',
        'challenge': 'Need additional mapping between Borough and District numbers'
    },
    {
        'name': 'Spatial join via coordinates',
        # Directly feasible only if both datasets carry coordinates
        'feasible': lat_count > 0 and len(coord_fields_capital) > 0,
        'description': 'Using 311 coordinates to determine proximity to Capital projects',
        'challenge': 'Capital projects lack coordinates - need geocoding'
    },
    {
        'name': 'Temporal-geographic join',
        'feasible': True,
        'description': 'Combining temporal overlap + geographic proximity',
        'challenge': 'Need additional geographic reference'
    },
    {
        'name': 'Join via external sources',
        'feasible': True,
        'description': 'Using additional NYC geographic reference data',
        'challenge': 'Need external data for mapping'
    }
]

for i, strategy in enumerate(strategies, 1):
    status = "✅ Possible" if strategy['feasible'] else "❌ Difficult"
    print(f"{i}. {strategy['name']} - {status}")
    print(f"   Description: {strategy['description']}")
    print(f"   Challenge: {strategy['challenge']}")
    print()

=== JOIN KEYS SEARCH ===

🔍 1. COLUMN COMPARISON BETWEEN DATASETS:
--------------------------------------------------
Exact column matches: None

Similar columns by keywords:
  311: 'Community Districts' <-> Capital: 'Project Geographic District ' (common: district)
  311: 'City Council Districts' <-> Capital: 'Project Geographic District ' (common: district)

🗺️ 2. SPATIAL JOIN POSSIBILITIES:
--------------------------------------------------
Boroughs in 311 dataset (6): ['BRONX', 'BROOKLYN', 'MANHATTAN', 'QUEENS', 'STATEN ISLAND', 'Unspecified']

Districts in Capital dataset (33):
First 10: [np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5), np.int64(6), np.int64(7), np.int64(8), np.int64(9), np.int64(10)]

🔄 Search for Borough <-> District correspondences:
  BRONX: 19,729 311 requests
  BROOKLYN: 29,799 311 requests
  STATEN ISLAND: 4,311 311 requests
  MANHATTAN: 21,946 311 requests
  QUEENS: 23,766 311 requests

🏘️ 3. COMMUNITY BOARD ANALYSIS:
------------------
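
The keyword matches above pair district columns by name only; they do not show that the values share a coding scheme. One possible bridge is a district-to-borough lookup. The sketch below assumes that `Project Geographic District ` holds NYC community school district numbers, with the usual borough assignment (Manhattan 1-6, Bronx 7-12, Brooklyn 13-23 and 32, Queens 24-30, Staten Island 31). The notebook does not confirm this, so it should be checked against the data dictionary; `DISTRICT_TO_BOROUGH` and the sample rows are hypothetical.

```python
import pandas as pd

# ASSUMPTION: district numbers are NYC community school district numbers.
DISTRICT_RANGES = {
    "MANHATTAN": list(range(1, 7)),
    "BRONX": list(range(7, 13)),
    "BROOKLYN": list(range(13, 24)) + [32],
    "QUEENS": list(range(24, 31)),
    "STATEN ISLAND": [31],
}
# Invert to a flat {district_number: borough} lookup
DISTRICT_TO_BOROUGH = {
    district: borough
    for borough, districts in DISTRICT_RANGES.items()
    for district in districts
}

# Illustrative rows; 75 stands for any code outside the 1-32 range
capital = pd.DataFrame({"Project Geographic District ": [1, 10, 32, 31, 75]})
capital["Borough"] = capital["Project Geographic District "].map(DISTRICT_TO_BOROUGH)

print(capital)
print("Unmapped rows:", capital["Borough"].isna().sum())
```

Codes outside the lookup stay as `NaN` instead of being forced into a borough, so the share of unmapped rows gives a quick test of whether the assumption fits the data. Borough names are upper-cased to match the `Borough` values in the 311 sample.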

## 5. Conclusions and Recommendations

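The conclusions below summarize the analysis. They rest on the first 100,000 rows of the 311 file, which cover only 2019-11-12 to 2019-12-01, so the overlap figures describe that sample and not the full 2010-to-present dataset.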

In [None]:
# Final conclusions and recommendations
print("=" * 80)
print("📋 FINAL CONCLUSIONS AND RECOMMENDATIONS")
print("=" * 80)

print("\n✅ 1. TEMPORAL DIMENSION (TIME OVERLAP)")
print("-" * 50)
print("RESULT: Significant temporal overlap exists!")
print(f"• Overlap period: 18 days (2019-11-12 to 2019-12-01)")
print(f"• 311 requests in period: 100,000 records")
print(f"• Capital projects: 74 projects started in this period")
print("• Overall Capital Projects period: 2003-2023 (covers all possible 311 periods)")

print("\n✅ 2. GEOGRAPHIC DIMENSION")
print("-" * 50)
print("RESULT: Geographic joining possibilities exist!")
print("• 311 dataset has Borough (5 NYC boroughs) + coordinates (96,696 records)")
print("• Capital Projects has Geographic District (33 districts)")
print("• Community Board in 311 (76 unique districts)")
print("• Coordinates exist only in 311, Capital Projects lacks them")

print("\n💡 3. RECOMMENDED JOIN STRATEGIES")
print("-" * 50)

strategies = [
    {
        "priority": "High",
        "name": "Borough → District Mapping",
        "description": "Create mapping between Borough (311) and Geographic District (Capital)",
        "implementation": "Use NYC School Districts or Community Districts reference",
        "pros": "Direct relationship, high accuracy",
        "cons": "Requires additional reference data"
    },
    {
        "priority": "Medium", 
        "name": "Temporal-geographic join",
        "description": "Combine temporal overlap + geographic proximity",
        "implementation": "Filter by time + group by Borough/District",
        "pros": "Enables analysis of construction impact on requests",
        "cons": "More complex logic, requires validation"
    },
    {
        "priority": "Low",
        "name": "Geocoding + spatial join", 
        "description": "Add coordinates to Capital Projects via geocoding",
        "implementation": "Geocode project addresses, use search radius",
        "pros": "Most accurate spatial joining",
        "cons": "Requires geocoding, computationally intensive"
    }
]

for i, strategy in enumerate(strategies, 1):
    print(f"\n{i}. {strategy['name']} (Priority: {strategy['priority']})")
    print(f"   📝 Description: {strategy['description']}")
    print(f"   🔨 Implementation: {strategy['implementation']}")
    print(f"   ✅ Advantages: {strategy['pros']}")
    print(f"   ⚠️  Disadvantages: {strategy['cons']}")

print(f"\n🎯 4. BEST APPROACH FOR ANALYSIS")
print("-" * 50)
print("Recommended combined strategy:")
print("1️⃣ Temporal filtering: select overlap period")
print("2️⃣ Geographic grouping: Borough (311) + additional mapping to District")
print("3️⃣ Correlation analysis: requests before/during/after projects")
print("4️⃣ Visualization: maps with overlaid zones and time series")

print(f"\n📊 5. EXPECTED ANALYSIS RESULTS")
print("-" * 50)
research_questions = [
    "Do 311 requests increase during active construction projects?",
    "What types of requests are most commonly related to construction work?", 
    "In which districts does construction most impact citizen requests?",
    "How long does the impact of construction projects on request volume last?",
    "Are there seasonal or weekly patterns in the relationship between projects and requests?"
]

for i, question in enumerate(research_questions, 1):
    print(f"{i}. {question}")

print(f"\n📋 6. NEXT STEPS")
print("-" * 50)
next_steps = [
    "Load complete 311 dataset (not just 100K records)",
    "Find or create Borough → Geographic District mapping",
    "Implement temporal-geographic join", 
    "Conduct exploratory analysis of joined data",
    "Create visualizations to test hypotheses",
    "Statistically verify correlations between projects and requests"
]

for i, step in enumerate(next_steps, 1):
    print(f"{i}. {step}")

print("\n" + "=" * 80)

📋 FINAL CONCLUSIONS AND RECOMMENDATIONS

✅ 1. TEMPORAL DIMENSION (TIME OVERLAP)
--------------------------------------------------
RESULT: Significant temporal overlap exists!
• Overlap period: 18 days (2019-11-12 to 2019-12-01)
• 311 requests in period: 100,000 records
• Capital projects: 74 projects started in this period
• Overall Capital Projects period: 2003-2023 (covers all possible 311 periods)

✅ 2. GEOGRAPHIC DIMENSION
--------------------------------------------------
RESULT: Geographic joining possibilities exist!
• 311 dataset has Borough (5 NYC boroughs) + coordinates (96,696 records)
• Capital Projects has Geographic District (33 districts)
• Community Board in 311 (76 unique districts)
• Coordinates exist only in 311, Capital Projects lacks them

💡 3. RECOMMENDED JOIN STRATEGIES
--------------------------------------------------

1. Borough → District Mapping (Priority: High)
   📝 Description: Create mapping between Borough (311) and Geographic District (Capital)
   🔨 Implementation: Use
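
The combined strategy (temporal filtering followed by geographic grouping) can be sketched as a borough-level aggregation. This is an illustration under stated assumptions, not the notebook's implementation: the input rows are made up, and the `Borough` column on the projects side is hypothetical, since the capital projects file has no such column and it would have to be derived first.

```python
import pandas as pd

# Illustrative inputs, not real records
requests = pd.DataFrame({
    "Created Date": pd.to_datetime(
        ["2019-11-13", "2019-11-14", "2019-11-20", "2019-12-05"]),
    "Borough": ["BRONX", "BRONX", "QUEENS", "QUEENS"],
})
projects = pd.DataFrame({
    # ASSUMPTION: Borough was derived beforehand; it is not in the source file
    "Borough": ["BRONX", "QUEENS", "QUEENS"],
    "Project Phase Actual Start Date": pd.to_datetime(
        ["2019-11-15", "2019-11-18", "2018-01-01"]),
})
window_start = pd.Timestamp("2019-11-12")
window_end = pd.Timestamp("2019-12-01")

def in_window(dates):
    """Boolean mask for dates inside the window (both ends inclusive)."""
    return dates.between(window_start, window_end)

# Step 1: temporal filtering. Step 2: grouping by borough.
req_counts = (requests[in_window(requests["Created Date"])]
              .groupby("Borough").size().rename("requests_311"))
proj_counts = (projects[in_window(projects["Project Phase Actual Start Date"])]
               .groupby("Borough").size().rename("projects_started"))

# Outer join keeps boroughs that appear on only one side
summary = (req_counts.to_frame()
           .join(proj_counts, how="outer")
           .fillna(0)
           .astype(int))
print(summary)
```

A borough-level table like this is coarse, with at most five rows per time window, so it can show whether the two datasets line up but is unlikely to support claims about the effect of individual projects.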