# Preparing the Mental Health Care Facilities Data

#### By Irene Casado Sanchez

The purpose of this notebook is to combine the four mental health care facility datasets listed below into a single, standardized table for our further analysis.

Mental Health Care Datasets:
   * **Licensed and Certified Healthcare Facility Listing** – Source: https://data.chhs.ca.gov/dataset/healthcare-facility-locations
   * **Licensed Mental Health Rehabilitation Centers (MHRC) and Psychiatric Health Facilities (PHF)** – Source: https://data.chhs.ca.gov/dataset/licensed-mental-health-rehabilitation-centers-mhrc-and-psychiatric-health-facilities-phf
   * **Licensed Narcotic Treatment Programs** – Source: https://data.chhs.ca.gov/dataset/licensed-narcotic-treatment-programs
   * **SUD Recovery Treatment Facilities** – Source: https://data.chhs.ca.gov/dataset/sud-recovery-treatment-facilities

In [3]:
import pandas as pd
import numpy as np
import datetime
import altair as alt
import os

Let's read the data:

In [4]:
DATA_DIR = os.environ['DATA_DIR']

In [5]:
narcotics = pd.read_csv(DATA_DIR + "/raw/edited_clean_narcotics_facilities.csv") 

In [6]:
mental_health = pd.read_csv(DATA_DIR + "/raw/edited_from_full_to_mental_health_facilities.csv") 

In [7]:
psych = pd.read_csv(DATA_DIR + "/raw/edited_mental_health_psych_cleaned_geo.csv")

In [8]:
substance_abuse = pd.read_csv(DATA_DIR + "/raw/edited_substance_use_cleaned.csv") 
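Before concatenating, it can help to confirm that the four files share the same columns. With its default outer join, `pd.concat` aligns columns by name, so a column present in only one file would show up as NaN for the rows coming from the others. The sketch below is a hypothetical helper, not part of the original notebook. It reuses the file names from the cells above and assumes the same `DATA_DIR/raw` layout.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping of dataset name -> raw file name (file names taken from the cells above).
RAW_FILES = {
    "narcotics": "edited_clean_narcotics_facilities.csv",
    "mental_health": "edited_from_full_to_mental_health_facilities.csv",
    "psych": "edited_mental_health_psych_cleaned_geo.csv",
    "substance_abuse": "edited_substance_use_cleaned.csv",
}


def load_raw_frames(data_dir):
    """Read every raw CSV under <data_dir>/raw into a dict of DataFrames."""
    raw_dir = Path(data_dir) / "raw"
    return {name: pd.read_csv(raw_dir / filename) for name, filename in RAW_FILES.items()}


def mismatched_columns(frames):
    """Return the names of frames whose column set differs from the first frame's."""
    names = list(frames)
    reference = set(frames[names[0]].columns)
    return [name for name in names[1:] if set(frames[name].columns) != reference]


# Usage, with DATA_DIR set as in the cell above:
# raw = load_raw_frames(DATA_DIR)
# assert not mismatched_columns(raw), mismatched_columns(raw)
```

Comparing column *sets* mirrors how `pd.concat` matches columns by name rather than by position.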

Let's put the data together:

In [9]:
frames = [narcotics, mental_health, substance_abuse, psych]  # The four facility datasets to combine

In [10]:
all_frames = pd.concat(frames) # Concatenating the dataframes
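By default, `pd.concat` keeps each input frame's own row labels. This is consistent with the `info()` output further down, which reports `Int64Index: 2132 entries, 0 to 63`: the labels restart at 0 for every source file, so they are not unique. A toy example with two hypothetical frames shows the difference `ignore_index=True` makes.

```python
import pandas as pd

# Two small hypothetical frames standing in for the facility datasets.
a = pd.DataFrame({"facility_name": ["A1", "A2"], "capacity": [10, 20]})
b = pd.DataFrame({"facility_name": ["B1"], "capacity": [30]})

# Default: each frame keeps its own 0-based labels, so label 0 appears twice.
stacked = pd.concat([a, b])
print(stacked.index.tolist())  # [0, 1, 0]

# ignore_index=True builds a fresh 0..n-1 index instead.
renumbered = pd.concat([a, b], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# With duplicate labels, .loc[0] returns every row labelled 0, not a single row.
print(len(stacked.loc[0]))  # 2
```

Duplicate labels are harmless for column-wise operations like the ones below, but they make label-based row lookups ambiguous.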

In [11]:
all_frames.head(5) # Checking the data

| | facility_name | facility_type | address | city | county | state | zip | capacity | latitude | longitude | phone_number | other_details |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | California Forensic Medical Group, Inc | narcotics | 5325 Broder Boulevard | Dublin | Alameda County | CA | 94568 | 120 | 37.715926 | -121.884151 | (925) 551-6700 | |
| 1 | Humanistic Alternatives to Addiction Research ... | narcotics | 20094 Mission Boulevard | Hayward | Alameda County | CA | 94541 | 400 | 37.687185 | -122.102244 | (510) 727-9755 | |
| 2 | Addiction Research and Treatment, Inc. | narcotics | 1124 International Boulevard | Oakland | Alameda County | CA | 94606 | 650 | 37.791022 | -122.248455 | (510) 533-0800 | |
| 3 | BAART Behavioral Health Services, Inc. | narcotics | 1124 International Boulevard | Oakland | Alameda County | CA | 94606 | 50 | 37.791022 | -122.248455 | (510) 533-0800 | |
| 4 | Successful Alternatives For Addiction & Counse... | narcotics | 795 Fletcher Lane | Hayward | Alameda County | CA | 94544 | 365 | 37.665521 | -122.080495 | (510) 247-8300 | |


In [13]:
# Here, we standardize the facility_type and county columns to upper case,
# so that values differing only in capitalization match

all_frames['facility_type'] = all_frames['facility_type'].str.upper()
all_frames['county'] = all_frames['county'].str.upper()
all_frames.info()  # Check column types and non-null counts

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2132 entries, 0 to 63
Data columns (total 12 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   facility_name  2132 non-null   object 
 1   facility_type  2132 non-null   object 
 2   address        2132 non-null   object 
 3   city           2132 non-null   object 
 4   county         2132 non-null   object 
 5   state          2132 non-null   object 
 6   zip            2132 non-null   object 
 7   capacity       2132 non-null   int64  
 8   latitude       2132 non-null   float64
 9   longitude      2132 non-null   float64
 10  phone_number   2122 non-null   object 
 11  other_details  1811 non-null   object 
dtypes: float64(2), int64(1), object(9)
memory usage: 216.5+ KB
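Upper-casing matters because pandas treats values that differ only in case as distinct categories. The toy series below is hypothetical. It also shows that `.str.upper()` leaves missing values missing instead of raising an error.

```python
import pandas as pd

# Hypothetical facility types with inconsistent capitalization and one missing value.
types = pd.Series(["narcotics", "Narcotics", "psych", None])

# Before standardizing, "narcotics" and "Narcotics" count as two distinct values.
print(types.nunique())  # 3

upper = types.str.upper()
print(upper.nunique())  # 2

# Missing entries stay missing after .str.upper().
print(upper.isna().sum())  # 1
```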


In [15]:
# Here, we create a new column with the full address of each facility

all_frames["full_address"] = (
    all_frames["address"] + ", "
    + all_frames["city"] + ", "
    + all_frames["county"] + ", "
    + all_frames["state"] + ", "
    + all_frames["zip"].astype(str)  # zip is stored as object; cast to be safe
)
all_frames.head(3) # Checking the data

| | facility_name | facility_type | address | city | county | state | zip | capacity | latitude | longitude | phone_number | other_details | full_address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | California Forensic Medical Group, Inc | NARCOTICS | 5325 Broder Boulevard | Dublin | ALAMEDA COUNTY | CA | 94568 | 120 | 37.715926 | -121.884151 | (925) 551-6700 | | 5325 Broder Boulevard, Dublin, ALAMEDA COUNTY,... |
| 1 | Humanistic Alternatives to Addiction Research ... | NARCOTICS | 20094 Mission Boulevard | Hayward | ALAMEDA COUNTY | CA | 94541 | 400 | 37.687185 | -122.102244 | (510) 727-9755 | | 20094 Mission Boulevard, Hayward, ALAMEDA COUN... |
| 2 | Addiction Research and Treatment, Inc. | NARCOTICS | 1124 International Boulevard | Oakland | ALAMEDA COUNTY | CA | 94606 | 650 | 37.791022 | -122.248455 | (510) 533-0800 | | 1124 International Boulevard, Oakland, ALAMEDA... |
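Building the address with `+` works here because, according to the `info()` output above, `address`, `city`, `county`, `state` and `zip` have no missing values. If a part were missing, `+` would make the whole address NaN. The sketch below uses hypothetical rows to contrast that with `Series.str.cat`, a different technique from the one the notebook uses, where `na_rep` substitutes a placeholder for missing parts.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second one has no street address.
df = pd.DataFrame({
    "address": ["5325 Broder Boulevard", np.nan],
    "city": ["Dublin", "Hayward"],
    "state": ["CA", "CA"],
    "zip": [94568, 94541],
})

# String "+" propagates missing values: one NaN part makes the whole result NaN.
plus = df["address"] + ", " + df["city"] + ", " + df["state"] + ", " + df["zip"].astype(str)
print(plus.tolist())
# ['5325 Broder Boulevard, Dublin, CA, 94568', nan]

# Series.str.cat joins with a separator; na_rep="" replaces missing parts with an
# empty string, so the rest of the address survives.
cat = df["address"].str.cat(
    [df["city"], df["state"], df["zip"].astype(str)], sep=", ", na_rep=""
)
print(cat.tolist())
# ['5325 Broder Boulevard, Dublin, CA, 94568', ', Hayward, CA, 94541']
```

Which behavior is preferable depends on the downstream use: a NaN address is easy to filter out, while a partial one may still be useful for display.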


The combined dataset is now ready for the next steps, so let's export it to a .csv file:

In [16]:
all_frames.to_csv('all_facilities.csv', index=False)  # Write all facilities to one CSV, without the non-unique row index
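Whether `to_csv` writes the row index changes what a later `read_csv` sees. With the default `index=True`, the index becomes an unnamed first column that pandas reads back as `Unnamed: 0`. A toy round-trip through an in-memory buffer, using a hypothetical two-row frame, shows the options.

```python
import io

import pandas as pd

df = pd.DataFrame({"facility_name": ["A", "B"], "capacity": [10, 20]})

# Default to_csv writes the index as an unnamed first column...
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())
# ['Unnamed: 0', 'facility_name', 'capacity']

# ...which read_csv can turn back into the index with index_col=0.
with_index.seek(0)
print(pd.read_csv(with_index, index_col=0).columns.tolist())
# ['facility_name', 'capacity']

# index=False omits the index column entirely.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
print(pd.read_csv(without_index).columns.tolist())
# ['facility_name', 'capacity']
```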