# Computable Phenotype - Homelessness
Jack Rossi and Anu Sharma

## 1. Define Population for Training Data

This project aims to identify the characteristics in electronic medical records that indicate a person is experiencing homelessness. The goal is to define, understand, and support interventions for this population, especially homeless patients who are very sick and/or heavy users of inpatient services.

We will identify the subset of patients who have 3 or more inpatient admissions in any 1-year period within the study period. A chart review will then be completed for these patients to identify which of them are experiencing homelessness.

In [1]:
import pandas as pd

In [2]:
adm = pd.read_csv('lmh admissions.csv')
adm["synth_date"] = pd.to_datetime(adm.loc[:,"synth_date"]).dt.date
adm = adm.sort_values(by = 'synth_date')

In [3]:
adm.head()

Unnamed: 0,synth_id,synth_date
44834,171538957,2014-01-05
35905,1008244996,2014-01-10
37561,1896715568,2014-01-15
37601,631697669,2014-01-15
33123,956115776,2014-01-16


#### Find all patients with 3+ admits in a year-long span

In [4]:
window = pd.to_timedelta(365, 'd')
ids = adm.loc[:,'synth_id'].tolist()
synth_date = adm.loc[:,'synth_date'].tolist()

patients = []
#loop through each admission date, defining the year beginning on that day
for d in list(set(synth_date)):
    #get list of admissions between date d and date d + window; count number of each ID
    numAdmByPat = adm.loc[((adm['synth_date'] < d + window) & (adm['synth_date'] >= d)),'synth_id'].value_counts()
    patsThreePlus = list(numAdmByPat.index[numAdmByPat >= 3])
    patients.extend(patsThreePlus)
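
The loop above rescans the whole table once per distinct admission date. As a cross-check, the same rule (3 or more admissions inside a window that opens on an admission date and closes 365 days later, exclusive) can be expressed per patient: sort each patient's admissions and test whether the admission two rows later falls within 365 days. This is a sketch, not the notebook's method. The function name `patients_with_k_admits` and the toy data are hypothetical; only the column names `synth_id` and `synth_date` come from the data shown above.

```python
import pandas as pd

def patients_with_k_admits(adm, k=3, window_days=365,
                           id_col="synth_id", date_col="synth_date"):
    """Return IDs with at least k admissions inside any window of window_days."""
    df = adm[[id_col, date_col]].copy()
    df[date_col] = pd.to_datetime(df[date_col])
    df = df.sort_values([id_col, date_col])
    # For each admission, look up the same patient's admission k-1 rows later.
    later = df.groupby(id_col)[date_col].shift(-(k - 1))
    # Rows with no such later admission give NaT, which compares as False.
    hit = (later - df[date_col]) < pd.Timedelta(days=window_days)
    return set(df.loc[hit, id_col].tolist())

# Hypothetical toy data: patient 1 has 3 admissions within 212 days
# (straddling a year boundary); patient 2's three admissions span 516 days.
toy = pd.DataFrame({
    "synth_id": [1, 1, 1, 2, 2, 2, 3],
    "synth_date": ["2014-11-01", "2015-02-01", "2015-06-01",
                   "2014-01-01", "2014-12-31", "2015-06-01",
                   "2014-03-01"],
})
result = patients_with_k_admits(toy)
print(result)  # {1}
```

On the real table, `patients_with_k_admits(adm)` should return the same IDs as `set(patients)`, since a patient has three admissions in some window `[d, d + 365 days)` exactly when their first and third admission in that window are fewer than 365 days apart.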

In [9]:
patient_set = set(patients)
print('There are {} patients who were admitted 3 or more times in a year span.'.format(len(patient_set)))

There are 913 patients who were admitted 3 or more times in a year span.


#### Validation - patients admitted 3+ times in a calendar year.

In [6]:
#create a column for the year of the admission
adm['Year'] = adm['synth_date'].map(lambda x: x.year)
#count the number of admission dates for each year and patient
admByYear = adm.groupby(by = ['Year','synth_id']).count()

In [11]:
#find patients with 3 or more admissions in a single calendar year
admByYear1 = admByYear.loc[admByYear['synth_date'] >= 3].reset_index()
patient_set1 = set(admByYear1.loc[:,'synth_id'])
print('There are {} patients who were admitted 3+ times in a calendar year.'.format(len(patient_set1)))

There are 631 patients who were admitted 3+ times in a calendar year.


#### Validation - patients who were admitted 3+ times in a year span, but not in a calendar year.

In [14]:
patient_set.difference(patient_set1)

{25209670,
 25448643,
 30708806,
 31569564,
 34549340,
 57638971,
 63302070,
 64002924,
 71538267,
 79642394,
 83898215,
 91638118,
 91954940,
 98227342,
 108971290,
 111441205,
 112460299,
 116592038,
 122049363,
 127879980,
 154554168,
 162749199,
 164333826,
 165158584,
 167856983,
 194736040,
 200384996,
 201415040,
 206405851,
 218372888,
 230831747,
 238297392,
 240030363,
 240837192,
 241352037,
 260388794,
 261913852,
 274209750,
 277395607,
 299395574,
 300749248,
 302434324,
 305546191,
 307982246,
 315284609,
 325368638,
 332732535,
 348254844,
 350038258,
 353951241,
 357371922,
 357580279,
 358420792,
 365162658,
 389893288,
 390859504,
 399750255,
 414498035,
 421426315,
 421555621,
 422705517,
 425142292,
 428421773,
 435675520,
 456469066,
 458706285,
 471392823,
 475713699,
 479226567,
 495297011,
 498245766,
 503775766,
 514724741,
 522476968,
 534842521,
 537429615,
 549705040,
 550979882,
 564824727,
 574588395,
 574694407,
 577486715,
 578152965,
 591967595,
 59273
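
One way to spot-check the IDs in this difference is to tabulate their admissions per calendar year: a patient who belongs here should have fewer than 3 admissions in every single year, with the qualifying admissions straddling a year boundary. The helper `yearly_counts` and the toy data below are hypothetical and only illustrate the check.

```python
import pandas as pd

def yearly_counts(adm, ids, id_col="synth_id", date_col="synth_date"):
    """Admissions per calendar year (columns) for the selected patient IDs (rows)."""
    sub = adm.loc[adm[id_col].isin(ids), [id_col, date_col]].copy()
    sub["Year"] = pd.to_datetime(sub[date_col]).dt.year
    return sub.groupby([id_col, "Year"]).size().unstack(fill_value=0)

# Hypothetical toy data: patient 1's admissions straddle 2014/2015,
# patient 2's admissions all fall in 2015.
toy = pd.DataFrame({
    "synth_id": [1, 1, 1, 2, 2, 2],
    "synth_date": ["2014-11-01", "2015-02-01", "2015-06-01",
                   "2015-01-10", "2015-03-10", "2015-09-10"],
})
table = yearly_counts(toy, ids={1})
print(table)
```

In the notebook the call would be `yearly_counts(adm, patient_set.difference(patient_set1))`, and every row maximum in the result should be below 3.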

## Construction of Homeless Address Indicators

"The homeless address indicators were based on six sources: (1) a comprehensive directory of shelter and single-site supportive housing programs provided by housing program staff from Hennepin and Ramsey Counties in response to an open-ended inquiry by electronic mail. We added several other address types noted by local homeless experts to this directory, including (2) the General Delivery Address (GDA)— a free service offered by the U.S. Postal Service for an individual’s mail to be held at the post office; (3) addresses of local homeless service centers collecting mail for homeless clients; (4) free text responses (e.g., “homeless”) recorded in the mailing address section of Medicaid enrollment records synonymous with homelessness; and (5) addresses of institutions commonly used by homeless individuals, including hotels, places of worship, and hospitals (Zech et al. 2015). Finally, (6) within the data, we observed frequent use of the addresses of county administrative offices and added these locations to the directory." (Vickery, 2018)

| Category | Includes |
|----------|----------|
| Keyword  | Addresses that include a variant of keywords “homeless” and “undomiciled” |
| Hospital | Addresses of health care facilities participating in Healthix |
| Shelter  | Addresses of 270 shelters in New York City and Long Island |
| Worship  | Addresses of 9677 places of worship in New York City and Long Island |

(Zech, 2015)
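
The categories quoted above suggest a simple flagging pass over patient addresses: check for keywords first, then for General Delivery, then look the normalized address up in a directory of known shelter, hospital, and worship addresses. The sketch below shows the idea under loud assumptions: the function names, the normalization rule, and every address in it are hypothetical placeholders, not the matching procedure used by Vickery (2018) or Zech (2015).

```python
import re

def normalize(addr):
    """Uppercase, replace punctuation with spaces, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", str(addr))
    return re.sub(r"\s+", " ", no_punct).strip().upper()

def flag_address(addr, directory):
    """Return a category label for an address, or None if nothing matches."""
    norm = normalize(addr)
    if re.search(r"HOMELESS|UNDOMICILED", norm):
        return "Keyword"
    if "GENERAL DELIVERY" in norm:
        return "General delivery"
    # Exact match against a directory keyed by normalized address (assumption).
    return directory.get(norm)

# Hypothetical directory: normalized address -> category.
directory = {
    normalize("123 Example St"): "Shelter",
    normalize("1 Placeholder Plaza"): "Hospital",
}

addresses = [
    "HOMELESS",
    "General Delivery, New York NY",
    "123 example st.",
    "456 Sample Ave Apt 2",
]
flags = [flag_address(a, directory) for a in addresses]
print(flags)  # ['Keyword', 'General delivery', 'Shelter', None]
```

Exact matching after normalization will miss abbreviations and unit numbers, so a real directory match would likely need address standardization or geocoding before the lookup.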

In [19]:
# Count dataset rows per borough. Each row is a community-district record with
# per-type shelter counts (see the head below), so this is a row count, not a count of shelters.
shelters = pd.read_json('https://data.cityofnewyork.us/resource/3qem-6v3v.json')
print('Shelters by borough:')
print(shelters.reset_index().groupby('borough').count()['index'])
print('Total shelters {}'.format(shelters.shape[0]))
shelters.head()

Shelters by borough:
borough
Bronx            204
Brooklyn         272
Manhattan        187
Queens           204
Staten Island     17
Name: index, dtype: int64
Total shelters 884


Unnamed: 0,report_date,borough,community_district,adult_family_shelter,adult_shelter,fwc_cluster,fwc_comm_hotel,fwc_shelter,adult_shelter_comm_hotel,adult_family_comm_hotel
0,2020-01-31T00:00:00.000,Bronx,201,1.0,2.0,4.0,1.0,7.0,,
1,2020-01-31T00:00:00.000,Staten Island,501,,,,,1.0,,
2,2020-01-31T00:00:00.000,Queens,414,1.0,,,1.0,1.0,,
3,2020-01-31T00:00:00.000,Queens,413,,,,2.0,2.0,5.0,
4,2020-01-31T00:00:00.000,Queens,412,1.0,4.0,,3.0,6.0,7.0,
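
Because each row above is a community-district record rather than a single shelter, the per-borough totals printed earlier count rows. If the numeric columns are counts of sites of each type in that district (an assumption based on the column names, not confirmed here), a closer estimate sums those columns for a single report date. The sketch below rebuilds the five rows shown in the head output so it runs without a network call.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The five rows displayed by shelters.head() above.
rows = pd.DataFrame({
    "report_date": ["2020-01-31T00:00:00.000"] * 5,
    "borough": ["Bronx", "Staten Island", "Queens", "Queens", "Queens"],
    "community_district": [201, 501, 414, 413, 412],
    "adult_family_shelter": [1.0, nan, 1.0, nan, 1.0],
    "adult_shelter": [2.0, nan, nan, nan, 4.0],
    "fwc_cluster": [4.0, nan, nan, nan, nan],
    "fwc_comm_hotel": [1.0, nan, 1.0, 2.0, 3.0],
    "fwc_shelter": [7.0, 1.0, 1.0, 2.0, 6.0],
    "adult_shelter_comm_hotel": [nan, nan, nan, 5.0, 7.0],
    "adult_family_comm_hotel": [nan] * 5,
})

# Keep one report date so the same sites are not counted once per report.
latest = rows[rows["report_date"] == rows["report_date"].max()]
id_cols = ("report_date", "borough", "community_district")
count_cols = [c for c in latest.columns if c not in id_cols]

# Sum each site-type column within a borough (NaN is skipped), then across types.
sites_by_borough = latest.groupby("borough")[count_cols].sum().sum(axis=1)
print(sites_by_borough)
```

Applied to the full `shelters` frame, the same three steps would give per-borough site totals for the most recent report date only.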


In [15]:
# family shelter performance
shelPerf = pd.read_json('https://data.cityofnewyork.us/resource/y7z5-rhh5.json')
shelPerf

Unnamed: 0,year_quarter,facility_name,provider_agency,performance_tier
0,2012 Q4,Clinton Family Inn,Homes for the Homeless,1st Performance Tier
1,2012 Q4,CRF House East,Children's Rescue Fund,1st Performance Tier
2,2012 Q4,HB - LaGuardia Family Center,"Housing Bridge, Inc.",1st Performance Tier
3,2012 Q4,HB - New Broadway,"Housing Bridge, Inc.",1st Performance Tier
4,2012 Q4,HELP- Bronx Morris,HELP U.S.A,1st Performance Tier
...,...,...,...,...
134,2013 Q1,Providence House 7,"Providence House, Inc.",6th Performance Tier
135,2013 Q1,St. John's Place Family Residence,Settlement Housing Fund,6th Performance Tier
136,2013 Q1,Stockholm Family Residence,SCO,6th Performance Tier
137,2013 Q1,Theresa Haven,Family Support System,6th Performance Tier


In [18]:
# NYC Health + Hospitals locations, counted by borough
nychealth = pd.read_json('https://data.cityofnewyork.us/resource/ymhw-9cz9.json')
print('NYC Health + Hospitals by borough')
print(nychealth.reset_index().groupby('borough').count()['index'])

NYC Health + Hospitals by borough
borough
Bronx            14
Brooklyn         26
Manhattan        24
Queens           11
Staten Island     3
Name: index, dtype: int64


In [27]:
short_list_data = pd.read_json('https://data.cityofnewyork.us/resource/bmxf-3rd4.json')
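
A caveat for the `read_json` calls in this section: these URLs are Socrata (SODA) endpoints, which return only a limited page of rows unless paging parameters are supplied (the SODA 2.x documentation gives 1,000 rows as the default). The sketch below builds a URL with the `$limit` and `$offset` query parameters. The helper name `soda_url` and the chosen limit are hypothetical, and whether any of the datasets used here exceeds the default page size has not been checked.

```python
from urllib.parse import urlencode

BASE = "https://data.cityofnewyork.us/resource"

def soda_url(resource_id, limit=5000, offset=0):
    """Build a SODA JSON URL with explicit paging parameters."""
    # safe="$" keeps the literal "$" prefix that SODA parameter names use.
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"{BASE}/{resource_id}.json?{query}"

url = soda_url("bmxf-3rd4")
print(url)
# The result can then be passed to pandas, e.g. pd.read_json(url).
```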