# Adults in CA Who Reported Adverse Childhood Experiences (ACEs)

https://data.ca.gov/dataset/respondents-indicating-at-least-1-type-of-adverse-childhood-experience-lghc-indicator/resource/d0219ee0-0921-483f-9b19-f9eddae5fd8b

This notebook covers initial data exploration (cleaning and inspecting the data set) of adults in CA who reported at least 1 type of adverse childhood experience.

* According to the website, "The Adverse Childhood Experiences (ACEs) module of the Behavioral Risk Factors Surveillance System (BRFSS) asks respondents questions about eight different traumatic childhood experiences that occurred before the age of 18.

* These include verbal/emotional abuse, physical abuse, sexual abuse, and negative household situations including the incarceration of an adult, alcohol or drug abuse by an adult, violence between adults, mental illness of a household member, and parental divorce or separation.

* This indicator shows the prevalence of adults who reported having 1 or more ACEs."

After cleaning this data set and making some initial observations, I will move into data analysis in a different notebook within the data_analysis folder.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import math

In [2]:
# Read in the data set:

adverse_df = pd.read_csv('../data/Raw/adult_ACEs.csv')

In [3]:
adverse_df

Unnamed: 0,LGHC Indicator Name,Geography,Year,Strata,Strata Name,Rate,Lower 95% CI,Upper 95% CI,Standard Error,LGHC Indicator ID,LGHC Target Rate
0,Respondents Indicating at Least 1 Type of Adve...,California,2015,Total population,Total population,63.5,61.4,65.7,1.1,4,45
1,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,18 to 24 years,70.2,63.1,77.3,3.6,4,45
2,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,25 to 34 years,62.8,57.0,68.6,3.0,4,45
3,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,35 to 44 years,64.2,58.9,69.5,2.7,4,45
4,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,45 to 54 years,66.6,61.7,71.5,2.5,4,45
5,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,55 to 64 years,66.2,61.9,70.6,2.2,4,45
6,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,65 years and above,51.7,48.2,55.2,1.8,4,45
7,Respondents Indicating at Least 1 Type of Adve...,California,2015,Education,Less than high school,63.7,57.3,70.0,3.2,4,45
8,Respondents Indicating at Least 1 Type of Adve...,California,2015,Education,High school graduate,65.4,60.3,70.5,2.6,4,45
9,Respondents Indicating at Least 1 Type of Adve...,California,2015,Education,Some college,70.3,66.3,74.3,2.0,4,45


In [4]:
adverse_df.shape

(51, 11)

In [5]:
# Look through the columns:

adverse_df.columns

Index(['LGHC Indicator Name', 'Geography', 'Year', 'Strata', 'Strata Name',
       'Rate', 'Lower 95% CI', 'Upper 95% CI', 'Standard Error',
       'LGHC Indicator ID', 'LGHC Target Rate'],
      dtype='object')

Observations:

* There are 51 rows and 11 columns. Each row represents one group within a category such as demographics (age, sex), education level, or type of health insurance.
* The 'LGHC Indicator Name' seems to be the same in every row, so I can drop it to get the data set I want.
* The age groups I'm interested in are all listed under the 'Age' category, so I will be using those rows.
* 'Rate' is the percentage of adults in each category (row) who reported at least 1 type of ACE.
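
The observations above can also be checked in code rather than by eye. The following is a minimal sketch, assuming the raw column names shown earlier. `constant_columns` is a hypothetical helper, and `sample` is a small stand-in built from three of the displayed rows so that the snippet runs on its own; in the notebook, `adverse_df` would be passed instead.

```python
import pandas as pd

def constant_columns(df):
    """Return the names of columns that hold a single repeated value."""
    return [col for col in df.columns if df[col].nunique(dropna=False) == 1]

# Hypothetical stand-in for adverse_df, built from three displayed rows.
sample = pd.DataFrame({
    'Geography': ['California', 'California', 'California'],
    'Strata': ['Total population', 'Age', 'Education'],
    'Strata Name': ['Total population', '18 to 24 years', 'Some college'],
    'Rate': [63.5, 70.2, 70.3],
    'LGHC Indicator ID': [4, 4, 4],
})

# Columns with one repeated value carry no information and are safe to drop.
print(constant_columns(sample))

# Number of rows that belong to each category.
print(sample['Strata'].value_counts())
```

Run on the full data set, the first call would show whether 'LGHC Indicator Name' really is constant, and the second would list every category along with its row count.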

In [6]:
# Renaming the columns to fit my preferences and make them easier to work with:

cname_dict = {
    'Geography' : 'state',
    'Year' : 'year',
    'Strata' : 'category',
    'Strata Name' : 'category_name',
    'Rate' : 'rate',
    'Lower 95% CI' : 'lower_cl',
    'Upper 95% CI' : 'upper_cl'
}

adverse_df = adverse_df.rename(columns=cname_dict)
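
One thing to be aware of: by default, `rename` silently skips any dictionary key that does not match an existing column, so a typo in `cname_dict` would go unnoticed. The sketch below shows the optional `errors='raise'` argument on a tiny stand-in frame (`sample` is hypothetical, and the misspelled key `'Yaer'` is there only to trigger the error).

```python
import pandas as pd

# Hypothetical stand-in with three of the raw column names.
sample = pd.DataFrame({'Geography': ['California'], 'Year': [2015], 'Rate': [63.5]})

# errors='raise' makes rename fail loudly when a key is not an existing column.
renamed = sample.rename(
    columns={'Geography': 'state', 'Year': 'year', 'Rate': 'rate'},
    errors='raise',
)
print(list(renamed.columns))

# A misspelled key now raises KeyError instead of being ignored.
try:
    sample.rename(columns={'Yaer': 'year'}, errors='raise')
except KeyError as exc:
    print('rename failed:', exc)
```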

In [7]:
adverse_df

Unnamed: 0,LGHC Indicator Name,state,year,category,category_name,rate,lower_cl,upper_cl,Standard Error,LGHC Indicator ID,LGHC Target Rate
0,Respondents Indicating at Least 1 Type of Adve...,California,2015,Total population,Total population,63.5,61.4,65.7,1.1,4,45
1,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,18 to 24 years,70.2,63.1,77.3,3.6,4,45
2,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,25 to 34 years,62.8,57.0,68.6,3.0,4,45
3,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,35 to 44 years,64.2,58.9,69.5,2.7,4,45
4,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,45 to 54 years,66.6,61.7,71.5,2.5,4,45
5,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,55 to 64 years,66.2,61.9,70.6,2.2,4,45
6,Respondents Indicating at Least 1 Type of Adve...,California,2015,Age,65 years and above,51.7,48.2,55.2,1.8,4,45
7,Respondents Indicating at Least 1 Type of Adve...,California,2015,Education,Less than high school,63.7,57.3,70.0,3.2,4,45
8,Respondents Indicating at Least 1 Type of Adve...,California,2015,Education,High school graduate,65.4,60.3,70.5,2.6,4,45
9,Respondents Indicating at Least 1 Type of Adve...,California,2015,Education,Some college,70.3,66.3,74.3,2.0,4,45


The column names are now shorter and more convenient to work with. Next, I will create a new data set, adverse_df1, with only the columns I need for my data exploration and analysis:

In [8]:
# Choosing specific columns I want to work with:

cols_to_use = [
    'year',
    'category',
    'category_name',
    'rate',
    'lower_cl',
    'upper_cl'
]

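# .copy() makes adverse_df1 independent of adverse_df, so later edits don't raise SettingWithCopyWarning:
adverse_df1 = adverse_df[cols_to_use].copy()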

In [9]:
adverse_df1

Unnamed: 0,year,category,category_name,rate,lower_cl,upper_cl
0,2015,Total population,Total population,63.5,61.4,65.7
1,2015,Age,18 to 24 years,70.2,63.1,77.3
2,2015,Age,25 to 34 years,62.8,57.0,68.6
3,2015,Age,35 to 44 years,64.2,58.9,69.5
4,2015,Age,45 to 54 years,66.6,61.7,71.5
5,2015,Age,55 to 64 years,66.2,61.9,70.6
6,2015,Age,65 years and above,51.7,48.2,55.2
7,2015,Education,Less than high school,63.7,57.3,70.0
8,2015,Education,High school graduate,65.4,60.3,70.5
9,2015,Education,Some college,70.3,66.3,74.3


Great! Now I have a much simpler data set with only the columns I'm most interested in. I will save this cleaned data set so that I can use it directly in my data analysis:

In [10]:
# Saving cleaned data to my folder:

adverse_df1.to_csv('../data/Cleaned/ACEs_CLEANED.csv', index=False)
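
A quick way to confirm that nothing changes when the file is written is to read it back and compare. The sketch below is only an illustration: `sample` is a hypothetical stand-in built from three of the displayed rows, and it writes to an in-memory buffer rather than to `../data/Cleaned/`.

```python
import io
import pandas as pd

# Hypothetical stand-in for adverse_df1: three rows from the displayed output.
sample = pd.DataFrame({
    'year': [2015, 2015, 2015],
    'category': ['Total population', 'Age', 'Age'],
    'category_name': ['Total population', '18 to 24 years', '25 to 34 years'],
    'rate': [63.5, 70.2, 62.8],
    'lower_cl': [61.4, 63.1, 57.0],
    'upper_cl': [65.7, 77.3, 68.6],
})

# Write to an in-memory buffer instead of disk, then read it back.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Raises AssertionError if columns, dtypes, or values changed in the round trip.
pd.testing.assert_frame_equal(sample, reloaded)
print(reloaded.shape)
```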

Observations:

* The data covers only the odd years 2011, 2013, and 2015, which I will have to keep in mind when comparing it to the other data sets in my folder.
* Each of these 3 years has a "Total population" row.
* I am most interested in how ACE rates differ among age groups across the years, so that is what I will focus on for this data set in the data_analysis folder.
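
The first two observations can be verified with a couple of short checks. This is a minimal sketch, assuming the renamed columns `year` and `category`. `total_rows_per_year` is a hypothetical helper, and `sample` contains only 2015 rows because those are the only rows displayed above; on the full cleaned data the same calls would be expected to show all three years.

```python
import pandas as pd

def total_rows_per_year(df):
    """Count the 'Total population' rows for each year."""
    totals = df[df['category'] == 'Total population']
    return totals.groupby('year').size()

# Hypothetical stand-in for adverse_df1, built from three displayed rows.
sample = pd.DataFrame({
    'year': [2015, 2015, 2015],
    'category': ['Total population', 'Age', 'Age'],
    'category_name': ['Total population', '18 to 24 years', '25 to 34 years'],
    'rate': [63.5, 70.2, 62.8],
})

# Distinct years present in the data.
print(sorted(sample['year'].unique()))

# One 'Total population' row per year is expected.
print(total_rows_per_year(sample))
```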