# Notebook Title

## Setup Python and R environment
You can ignore this section.

In [46]:
%load_ext rpy2.ipython
%load_ext autoreload
%autoreload 2

%matplotlib inline  
from matplotlib import rcParams
rcParams['figure.figsize'] = (16, 100)

import warnings
from rpy2.rinterface import RRuntimeWarning
warnings.filterwarnings("ignore") # Ignore all warnings
# warnings.filterwarnings("ignore", category=RRuntimeWarning) # Show some warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, HTML



In [47]:
%%javascript
// Disable auto-scrolling
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

<IPython.core.display.Javascript object>

In [48]:
%%R

# My commonly used R imports

require('tidyverse')

## 👉 download your data

You can write code here to download your dataset. If you already have the file, leave its source URL in a comment and load it into a pandas or R dataframe (or both).
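A minimal sketch of the download step, assuming the file is published at a plain HTTP(S) URL. The function name `download_if_missing` and the commented-out `DATA_URL` are hypothetical; this notebook does not state where the CSV came from.

```python
from pathlib import Path
from urllib.request import urlopen


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists; return the Path."""
    dest = Path(dest)
    if dest.exists():
        return dest  # reuse the local copy instead of downloading again
    with urlopen(url, timeout=60) as response:
        dest.write_bytes(response.read())
    return dest


# Hypothetical usage: the real source URL is not given in this notebook.
# DATA_URL = "https://example.org/mental-health-directory-2024.csv"
# download_if_missing(DATA_URL, "mental-health-directory-2024.csv")
```

Skipping the download when the file is already present keeps re-runs of the notebook fast and avoids hitting the data provider repeatedly.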

In [2]:
import pandas as pd

df = pd.read_csv("mental-health-directory-2024.csv")  
df.head()


Unnamed: 0,name1,name2,street1,street2,city,state,zip,phone,intake1,intake2,intake1a,intake2a,service_code_info
0,SpectraCare Health Systems,Henry County Clinic,219 Dothan Road,,Abbeville,AL,36310,800-951-4357,,,,,SA MH SUMH * OP * CMHC * CHLOR FLUPH HALOP PER...
1,SpectraCare Health Systems,Henry County Day Treatment,1242 U.S. Highway 431 South,,Abbeville,AL,36310,800-951-4357,334-951-4357,,,,MH SUMH * OP * CMHC * CHLOR FLUPH HALOP ARIPI ...
2,South Central Alabama MHC,Covington County Mental Health Center,19815 Bay Branch Road,,Andalusia,AL,36420,334-222-2523,877-530-0002,,,,SA MH SUMH * OP PHDT * CMHC * CHLOR FLUPH HALO...
3,South Central Alabama MHC,Montezuma Complex,205 Academy Drive,,Andalusia,AL,36420,334-428-5050,877-530-0002,,,,SA MH SUMH * PHDT RES * MSMH * CHLOR FLUPH HAL...
4,RMC Anniston,,400 East 10th Street,,Anniston,AL,36207,256-235-5121,256-235-5482,256-741-6484,,,MH * HI * IPSY * CHLOR FLUPH HALOP THIOR ARIPI...


## 👉 convert addresses --> lat/long 

See the [census-examples](https://github.com/data4news/census-examples) repository for examples. If you need help, ask in the class Slack channel. Chances are someone else in the class is struggling with the same problem, so we might as well all learn together there.

In [3]:
# !pip install requests-cache

In [4]:
# pip install "urllib3<2.0"


In [5]:
import pandas as pd

df = pd.read_csv("mental-health-directory-2024.csv")

df['street1'] = df['street1'].fillna("").str.strip().str.title()
df['city'] = df['city'].fillna("").str.strip().str.title()
df['state'] = df['state'].fillna("").str.strip().str.upper()
df['zip'] = df['zip'].astype(str).str.zfill(5)

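# Keep rows whose city is a borough name. This matches on city alone, so
# same-named places in other states (e.g. Brooklyn, MD) also pass the filter.
nyc_boroughs = ['New York', 'Brooklyn', 'Bronx', 'Queens', 'Staten Island']
df = df[df['city'].isin(nyc_boroughs)].copy()  # .copy() so the new column is not set on a view

# Build one geocodable string per row; the state is hard-coded as NY
df['full_address'] = df['street1'] + ', ' + df['city'] + ', NY ' + df['zip']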

df.to_csv("nyc_cleaned_addresses.csv", index=False)
print("✅ Cleaned NYC addresses saved to nyc_cleaned_addresses.csv")


✅ Cleaned NYC addresses saved to nyc_cleaned_addresses.csv
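The cell above filters on city name only, which is why a Brooklyn, MD facility (ZIP 21225) shows up later in this notebook with "NY" in its address and no coordinates. One possible fix is to require the state as well. The sketch below uses plain dictionaries so it runs on its own; `is_nyc_facility` and `NYC_BOROUGH_CITIES` are hypothetical names, not part of the original notebook.

```python
NYC_BOROUGH_CITIES = {"New York", "Brooklyn", "Bronx", "Queens", "Staten Island"}


def is_nyc_facility(city, state):
    """True only when the city is a borough name AND the state is NY."""
    city_ok = city.strip().title() in NYC_BOROUGH_CITIES
    state_ok = state.strip().upper() == "NY"
    return city_ok and state_ok


rows = [
    {"city": "Brooklyn", "state": "MD"},  # same city name, different state
    {"city": "Bronx", "state": "NY"},
]
kept = [r for r in rows if is_nyc_facility(r["city"], r["state"])]
print(kept)
```

In pandas, the equivalent would be to combine both conditions in the mask, for example `df[df['city'].isin(nyc_boroughs) & (df['state'] == 'NY')]`.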


In [6]:
pip install --upgrade geopy urllib3


Note: you may need to restart the kernel to use updated packages.


In [17]:
import pandas as pd
from geopy.geocoders import Nominatim
from time import sleep
from tqdm import tqdm  # <-- progress bar

# Load your data
df = pd.read_csv("nyc_cleaned_addresses.csv")

geolocator = Nominatim(user_agent="nyc-mental-health")

def safe_geocode(address):
    """Geocode one address; return None instead of raising on failure."""
    try:
        return geolocator.geocode(address, timeout=10)
    except Exception as e:
        print(f"Error: {address} -> {e}")
        return None
    finally:
        sleep(1)  # respect Nominatim's 1 request/second limit, even after errors

locations = []
for address in tqdm(df['full_address'], desc="Geocoding addresses"):
    loc = safe_geocode(address)
    locations.append(loc)

df['location'] = locations
df['latitude'] = df['location'].apply(lambda loc: loc.latitude if loc else None)
df['longitude'] = df['location'].apply(lambda loc: loc.longitude if loc else None)

df.to_csv("nyc_geocoded_addresses.csv", index=False)
print("✅ Done! File saved as 'nyc_geocoded_addresses.csv'")


Geocoding addresses: 100%|████████████████████| 125/125 [03:04<00:00,  1.48s/it]

✅ Done! File saved as 'nyc_geocoded_addresses.csv'
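If several facilities share an address, each one currently triggers its own request. A small sketch of geocoding each distinct address only once is shown below. `geocode_unique` is a hypothetical helper, and `fake_geocode` is a stand-in used purely so the example runs without network access; it is not a real geocoding service.

```python
import time


def geocode_unique(addresses, geocode_fn, delay_seconds=1.0):
    """Geocode each distinct address once; return {address: result_or_None}."""
    results = {}
    for address in addresses:
        if address in results:
            continue  # already looked up, so no extra request
        try:
            results[address] = geocode_fn(address)
        except Exception as exc:
            print(f"Error: {address} -> {exc}")
            results[address] = None
        time.sleep(delay_seconds)  # pause after every real lookup
    return results


def fake_geocode(address):
    """Stand-in geocoder for demonstration only."""
    return (40.0, -74.0) if "Bronx" in address else None


lookup = geocode_unique(
    [
        "516 East Tremont Avenue, Bronx, NY 10457",
        "516 East Tremont Avenue, Bronx, NY 10457",  # duplicate, looked up once
        "- - -, Staten Island, NY 10305",
    ],
    fake_geocode,
    delay_seconds=0,  # no need to wait for the fake geocoder
)
print(len(lookup))
```

With the notebook's own objects, one could pass `lambda a: geolocator.geocode(a, timeout=10)` as `geocode_fn` and map the results back with `df['full_address'].map(lookup)`.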





## 👉 convert lat/long to census geography codes 

(like 'GEOID', 'STATE', 'COUNTY', 'TRACT', 'BLOCK', etc...)

As above, see the [census-examples](https://github.com/data4news/census-examples) repository for examples, or ask in the class Slack channel if you get stuck.
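For orientation, a census block GEOID is a 15-digit string made of the state (2 digits), county (3), tract (6) and block (4) codes, and the first 11 digits identify the tract. The sketch below splits one apart. `split_block_geoid` is a hypothetical helper, and the example GEOID is assembled from the component codes that appear in this notebook's final output, assuming their leading zeros are restored.

```python
def split_block_geoid(geoid):
    """Split a 15-digit census block GEOID into its component codes."""
    geoid = str(geoid)
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected 15 digits, got {geoid!r}")
    return {
        "STATE": geoid[0:2],
        "COUNTY": geoid[2:5],
        "TRACT": geoid[5:11],
        "BLOCK": geoid[11:15],
        "TRACT_GEOID": geoid[:11],  # state + county + tract
    }


parts = split_block_geoid("360050395004001")
print(parts)
```

Keeping these codes as strings matters because the leading zeros are part of the identifier.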

In [10]:
pip install censusgeocode


Note: you may need to restart the kernel to use updated packages.


In [15]:
import pandas as pd
df = pd.read_csv('nyc_geocoded_addresses.csv')
df

Unnamed: 0,name1,name2,street1,street2,city,state,zip,phone,intake1,intake2,intake1a,intake2a,service_code_info,full_address,location,latitude,longitude
0,Medstar Harbor Hospital,Behavioral Health,3001 South Hanover Street,Suite 164,Brooklyn,MD,21225,410-350-7550,,,,,MH * OP PHDT * PSY * ARIPI CLOZA OLANZ OLANZF ...,"3001 South Hanover Street, Brooklyn, NY 21225",,,
1,Astor Servs for Children and Families,Astor Day Treatment Program,516 East Tremont Avenue,,Bronx,NY,10457,347-978-2450,929-285-3917 x1096,,,,MH SUMH * PHDT * PH * CHLOR HALOP PERPH ANTPYC...,"516 East Tremont Avenue, Bronx, NY 10457","516, East Tremont Avenue, East Tremont, The Br...",40.846700,-73.896822
2,Astor Servs for Children and Families,Highbridge Clinic,1419 Shakespeare Avenue,1st Floor,Bronx,NY,10452,718-231-3400,718-732-7080 x0,,,,SA MH SUMH * OP * OMH * NSC ANTPYCH * CBT CFT ...,"1419 Shakespeare Avenue, Bronx, NY 10452","1419, Shakespeare Avenue, High Bridge, The Bro...",40.842533,-73.921238
3,Astor Servs for Children and Families,Lawrence F Hickey Center,4010 Dyre Avenue,,Bronx,NY,10466,845-515-3000,718-515-3000,,,,MH * PHDT * PH * ANTPYCH * CBT CFT GT IPT TELE...,"4010 Dyre Avenue, Bronx, NY 10466","Public School 15, 4010, Dyre Avenue, Parkside,...",40.890946,-73.830730
4,Astor Servs for Children and Families,Tilden Clinic,750 Tilden Street,,Bronx,NY,10467,718-231-3400,,,,,SA MH SUMH * OP * OMH * ANTPYCH * CBT CFT DBT ...,"750 Tilden Street, Bronx, NY 10467","750, Tilden Street, Williams Bridge, The Bronx...",40.876680,-73.862771
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120,South Beach Psychiatric Center,Ocean View Lodge Clinic,777 Seaview Avenue,"Building 10, 2nd Floor",Staten Island,NY,10305,718-667-2536,718-667-2463,,,,SA MH SUMH * OP * OMH * CHLOR FLUPH HALOP LOXA...,"777 Seaview Avenue, Staten Island, NY 10305",Staten Island Children and Youth Day Treatment...,40.582628,-74.080394
121,South Beach Psychiatric Center,South Richmond ACT Team,- - -,,Staten Island,NY,10305,718-668-8050,,,,,SA MH SUMH * OP * OMH * CHLOR FLUPH HALOP LOXA...,"- - -, Staten Island, NY 10305","Staten Island, Richmond County, City of New Yo...",40.583456,-74.149605
122,Staten Island Mental Health,A Div of Richmond University Med Ctr,- - -,,Staten Island,NY,10301,718-818-4440,718-818-6700,,,,MH SUMH * OP * CMHC * CHLOR PERPH ARIPI LURAS ...,"- - -, Staten Island, NY 10301","Staten Island, Richmond County, City of New Yo...",40.583456,-74.149605
123,Staten Island Mental Health A Division,A Div of Richmond University Med Ctr,669 Castleton Avenue,,Staten Island,NY,10301,718-818-6690,718-818-6700 x86700,,,,MH SUMH * OP * OMH * ARIPI QUETI RISPE NRT ANT...,"669 Castleton Avenue, Staten Island, NY 10301","669, Castleton Avenue, West New Brighton, Stat...",40.635285,-74.103755
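Rows 121 and 122 above have `- - -` as the street and both resolved to the same point (40.583456, -74.149605), with a `location` that starts with "Staten Island, Richmond County", which suggests the geocoder matched the borough rather than a building. A minimal sketch for flagging such rows before trusting their coordinates follows; `is_placeholder_street` is a hypothetical helper.

```python
def is_placeholder_street(street):
    """True when the street field is missing or has no letters or digits."""
    if street is None or street != street:  # None, or NaN (NaN never equals itself)
        return True
    return not any(ch.isalnum() for ch in str(street))


for example in ["- - -", "669 Castleton Avenue", "", float("nan")]:
    print(repr(example), is_placeholder_street(example))
```

In the notebook this could be applied as `df['street1'].apply(is_placeholder_street)` to build a boolean column, so that flagged rows can be excluded or reviewed by hand.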


In [16]:
import pandas as pd
import requests
import time
from tqdm.notebook import tqdm

df = pd.read_csv("nyc_geocoded_addresses.csv")


df['GEOID'] = None
df['STATE'] = None
df['COUNTY'] = None
df['TRACT'] = None
df['BLOCK'] = None

CENSUS_URL = "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"

for idx, row in tqdm(df.iterrows(), total=len(df)):
    # Skip rows that were not geocoded in the previous step
    if pd.isna(row['latitude']) or pd.isna(row['longitude']):
        continue

    try:
        params = {
            "x": row['longitude'],  # x is longitude, y is latitude
            "y": row['latitude'],
            "benchmark": "Public_AR_Current",
            "vintage": "Current_Current",
            "format": "json",
        }
        # A timeout keeps one stalled request from hanging the whole loop
        response = requests.get(CENSUS_URL, params=params, timeout=30)

        if response.status_code == 200:
            geographies = response.json().get('result', {}).get('geographies', {})
            blocks = geographies.get('2020 Census Blocks', [])

            if blocks:
                block = blocks[0]
                # Copy the census codes for this block into the row
                for col in ['GEOID', 'STATE', 'COUNTY', 'TRACT', 'BLOCK']:
                    df.at[idx, col] = block.get(col)

        time.sleep(0.5)  # pause between requests to the Census API

    except Exception as e:
        print(f"Error processing row {idx}: {e}")


df.to_csv("mental_health_facilities_nyc_with_census.csv", index=False)

ImportError: cannot import name 'appengine' from 'urllib3.contrib' (/Users/somaiyah/.pyenv/versions/3.12.7/lib/python3.12/site-packages/urllib3/contrib/__init__.py)
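The lookup above has two separable parts: building the request URL and pulling the block codes out of the JSON response. The sketch below isolates both so they can be checked without a network call. The function names are hypothetical, and `sample` is an illustrative payload shaped the way the loop above expects, not a recorded API response.

```python
from urllib.parse import urlencode

CENSUS_COORDS_URL = "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"


def build_census_url(longitude, latitude):
    """Build the coordinates lookup URL (x is longitude, y is latitude)."""
    query = urlencode({
        "x": longitude,
        "y": latitude,
        "benchmark": "Public_AR_Current",
        "vintage": "Current_Current",
        "format": "json",
    })
    return f"{CENSUS_COORDS_URL}?{query}"


def extract_block_codes(payload, layer="2020 Census Blocks"):
    """Return the first block's codes from a response dict, or None if absent."""
    blocks = payload.get("result", {}).get("geographies", {}).get(layer, [])
    if not blocks:
        return None
    fields = ("GEOID", "STATE", "COUNTY", "TRACT", "BLOCK")
    return {name: blocks[0].get(name) for name in fields}


# Illustrative payload only; values are assumed, not fetched.
sample = {"result": {"geographies": {"2020 Census Blocks": [
    {"GEOID": "360050395004001", "STATE": "36", "COUNTY": "005",
     "TRACT": "039500", "BLOCK": "4001"}
]}}}

print(build_census_url(-73.896822, 40.8467))
print(extract_block_codes(sample))
print(extract_block_codes({"result": {"geographies": {}}}))
```

Returning `None` when the layer is missing lets the caller leave the census columns empty for that row instead of raising.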

## 👉 Output Data

Output your dataframe containing your data and the Census connector codes (like tract, block, etc...).

In [13]:
import pandas as pd

df = pd.read_csv("mental_health_facilities_nyc_with_census.csv")  
df.head()


Unnamed: 0,name1,name2,street1,street2,city,state,zip,phone,intake1,intake2,...,service_code_info,full_address,location,latitude,longitude,GEOID,STATE,COUNTY,TRACT,BLOCK
0,Medstar Harbor Hospital,Behavioral Health,3001 South Hanover Street,Suite 164,Brooklyn,MD,21225,410-350-7550,,,...,MH * OP PHDT * PSY * ARIPI CLOZA OLANZ OLANZF ...,"3001 South Hanover Street, Brooklyn, NY 21225",,,,,,,,
1,Astor Servs for Children and Families,Astor Day Treatment Program,516 East Tremont Avenue,,Bronx,NY,10457,347-978-2450,929-285-3917 x1096,,...,MH SUMH * PHDT * PH * CHLOR HALOP PERPH ANTPYC...,"516 East Tremont Avenue, Bronx, NY 10457","516, East Tremont Avenue, East Tremont, The Br...",40.8467,-73.896822,360050400000000.0,36.0,5.0,39500.0,4001.0
2,Astor Servs for Children and Families,Highbridge Clinic,1419 Shakespeare Avenue,1st Floor,Bronx,NY,10452,718-231-3400,718-732-7080 x0,,...,SA MH SUMH * OP * OMH * NSC ANTPYCH * CBT CFT ...,"1419 Shakespeare Avenue, Bronx, NY 10452","1419, Shakespeare Avenue, High Bridge, The Bro...",40.842533,-73.921238,360050200000000.0,36.0,5.0,21302.0,3000.0
3,Astor Servs for Children and Families,Lawrence F Hickey Center,4010 Dyre Avenue,,Bronx,NY,10466,845-515-3000,718-515-3000,,...,MH * PHDT * PH * ANTPYCH * CBT CFT GT IPT TELE...,"4010 Dyre Avenue, Bronx, NY 10466","Public School 15, 4010, Dyre Avenue, Parkside,...",40.890946,-73.83073,360050500000000.0,36.0,5.0,45600.0,2002.0
4,Astor Servs for Children and Families,Tilden Clinic,750 Tilden Street,,Bronx,NY,10467,718-231-3400,,,...,SA MH SUMH * OP * OMH * ANTPYCH * CBT CFT DBT ...,"750 Tilden Street, Bronx, NY 10467","750, Tilden Street, Williams Bridge, The Bronx...",40.87668,-73.862771,360050400000000.0,36.0,5.0,38000.0,4006.0
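In the output above the census codes have been read back as floats, so `COUNTY` shows as `5.0` rather than `005` and `TRACT` as `39500.0` rather than `039500`. The rounded-looking `GEOID` values are probably a display effect, but the lost leading zeros are real. A minimal sketch of rebuilding the 15-digit block GEOID from the component columns follows; `pad_code` and `rebuild_block_geoid` are hypothetical helpers.

```python
def pad_code(value, width):
    """Turn a code parsed as a float (e.g. 5.0) back into a zero-padded string."""
    if value is None or value != value:  # None or NaN: nothing to rebuild
        return None
    return str(int(float(value))).zfill(width)


def rebuild_block_geoid(state, county, tract, block):
    """Concatenate state (2), county (3), tract (6) and block (4) digits."""
    parts = [pad_code(state, 2), pad_code(county, 3),
             pad_code(tract, 6), pad_code(block, 4)]
    if any(p is None for p in parts):
        return None
    return "".join(parts)


# Values taken from the first Bronx row in the output above
print(rebuild_block_geoid(36.0, 5.0, 39500.0, 4001.0))  # -> 360050395004001
```

A simpler alternative is to avoid the float conversion in the first place by passing `dtype=str` for these columns to `pd.read_csv`, for example `dtype={'GEOID': str, 'STATE': str, 'COUNTY': str, 'TRACT': str, 'BLOCK': str}`.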
