# COMP 4151 Project

### Overview

The goal of this project is to apply the analytics techniques learned in this course to study a real-world dataset and, if possible, build a model to make predictions.

The main dataset provided is BCHI-dataset_2019-03-04.csv. More information about the dataset can be found here: https://www.bigcitieshealth.org/city-data




### I. Basic Understanding of the Dataset

In this phase, make sure you understand the dataset as thoroughly as possible.

Here are some questions that you should answer. Write each answer in English and, where applicable, include the Python code that shows how you obtained it. Readers should be able to follow your answer without having to guess how you got it.

1. What is the Indicator attribute?
2. How many categories of Indicator are there?
3. Explain the "Value" value of row 26382 in this dataset.
4. Explain the "Value" value of row 7833.
5. Explain the "Value" value of row 10682.  What does it mean that the "Sex" value is "Both"?
6. Explain the "Value" value of row 26701.
7. Specifically, which factors does the indicator category 'Social and Economic Factors' consist of?
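
As an illustration of the expected answer style, the sketch below pairs each kind of question with a pandas call that could answer it. It uses a tiny hand-made frame (`toy`, a hypothetical name) with some of the same column names as the BCHI file; the rows and values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the BCHI file: same column names, invented rows.
toy = pd.DataFrame({
    "Indicator Category": ["Chronic Disease", "Chronic Disease", "HIV/AIDS"],
    "Indicator": [
        "Percent of Adults Who Are Obese",
        "Percent of High School Students Who Are Obese",
        "HIV Diagnoses Rate (Per 100,000 people)",
    ],
    "Year": [2010, 2011, 2010],
    "Sex": ["Both", "Both", "Male"],
    "Value": [20.3, 14.3, 17.3],
})

# Question 2 style: how many distinct indicator categories are there?
n_categories = toy["Indicator Category"].nunique()

# Question 7 style: which indicators belong to one category?
in_category = toy["Indicator Category"] == "Chronic Disease"
chronic = sorted(toy.loc[in_category, "Indicator"].unique())

# Questions 3-6 style: look up one row by its index label and read it in context.
row = toy.loc[1, ["Indicator", "Year", "Sex", "Value"]]

print(n_categories)
print(chronic)
print(row.to_dict())
```

Note that `.loc` selects by index label while `.iloc` selects by position, so an answer should state which one "row 26382" refers to.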


### II. Data Exploration

Study obesity in both adults and high school students in this data set.

Such a study may involve a number of steps:

* Quick exploration of the data on the subject of obesity.
    + This requires that you ask basic questions. Many of these questions can be answered using pandas.
    + This may also require that you draw figures to get a better understanding of the data.  seaborn should be helpful here.

* Establishing a number of hypotheses about obesity (cause, effect, etc.)

* Analyzing and reporting on your hypotheses.
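
The quick-exploration step above can be sketched as follows. This is a minimal example on invented long-format rows (the places and values are hypothetical, not BCHI data): it reshapes the two obesity indicators into one column each and computes their correlation across places.

```python
import pandas as pd

ADULT = "Percent of Adults Who Are Obese"
HIGH_SCHOOL = "Percent of High School Students Who Are Obese"

# Invented rows that mimic the dataset's long layout (values are not real).
toy = pd.DataFrame({
    "Indicator": [ADULT] * 3 + [HIGH_SCHOOL] * 3,
    "Place": ["City A", "City B", "City C"] * 2,
    "Year": [2015] * 6,
    "Value": [20.0, 30.0, 25.0, 10.0, 16.0, 13.0],
})

# One row per (Place, Year), one column per obesity indicator.
wide = toy.pivot_table(index=["Place", "Year"], columns="Indicator", values="Value")
wide = wide.rename(columns={ADULT: "adult", HIGH_SCHOOL: "high_school"})

# A first question: do places with more adult obesity also have more student obesity?
corr = wide["adult"].corr(wide["high_school"])

print(wide)
print(round(corr, 3))
```

A scatter plot of the two columns would be a natural next step before stating any hypothesis.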

An important goal of the project is for you to communicate effectively. Clarity in writing (English and Python) is extremely important. Readers do not want to guess how you arrived at your findings.




### III. Limitations of the Data

In this part, I would like you to think about realistic limitations of this dataset. Which attributes/factors are not included in the dataset, but can be important in understanding or affecting a health factor?

This part requires you to think deeply and formulate broader hypotheses.

It also requires your team to collect additional data not included in this dataset.

Lastly, it requires you to perform analyses to prove or disprove your hypotheses.  This part should require pandas, sklearn, etc.
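
One possible shape for such an analysis is sketched below. Everything in it is an assumption made for illustration: the external table `income`, its `MedianIncome` column, the place names, and all values are invented. It only shows the mechanics of joining outside data on `Place` and `Year` and running a first descriptive check.

```python
import pandas as pd

# Invented obesity values per place and year (not real BCHI numbers).
obesity = pd.DataFrame({
    "Place": ["City A", "City B", "City C", "City D"],
    "Year": [2015, 2015, 2015, 2015],
    "Value": [32.0, 28.0, 24.0, 20.0],
})

# Hypothetical external table collected by the team; the column name is an assumption.
income = pd.DataFrame({
    "Place": ["City A", "City B", "City C", "City D"],
    "Year": [2015, 2015, 2015, 2015],
    "MedianIncome": [40000, 50000, 60000, 70000],
})

# Join on the shared keys; validate raises if either side has duplicate keys.
combined = obesity.merge(income, on=["Place", "Year"], how="inner", validate="one_to_one")

# Pearson correlation as a first, descriptive check of the hypothesis.
r = combined["Value"].corr(combined["MedianIncome"])

print(combined)
print(round(r, 3))
```

A correlation like this describes an association only; a fitted model and a discussion of confounders would be needed before drawing conclusions.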

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
data = pd.read_csv("./dataset/BCHI-dataset_2019-03-04.csv")

### IV. Project Report

The report should be submitted in stages. You should turn in properly named notebooks, for example, COMP4151_Project_Team01_StageI.ipynb.

In [3]:
place_to_exclude = "U.S. Total, U.S. Total"
sex_to_include = "Both"

In [4]:
# Replace missing values with 0. Note: filled entries then look like real measurements of 0.
data['Value'] = data['Value'].fillna(0)
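
Filling missing `Value` entries with 0 makes them indistinguishable from a genuine measurement of 0, so rows that show 0.0 elsewhere in this notebook may be filled entries rather than real observations. A minimal sketch with invented numbers shows how the choice changes a summary statistic:

```python
import numpy as np
import pandas as pd

# Invented values; NaN stands for "not reported".
values = pd.Series([14.0, np.nan, 16.0], name="Value")

# Series.mean() skips NaN by default.
mean_skipping_nan = values.mean()

# After zero-filling, the missing entry counts as a real 0 and pulls the mean down.
mean_after_zero_fill = values.fillna(0).mean()

print(mean_skipping_nan, mean_after_zero_fill)
```

Keeping the NaN values, or dropping those rows explicitly with `dropna`, avoids this distortion.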

In [5]:
percent_adult_obese = data[data['Indicator'] == 'Percent of Adults Who Are Obese']
percent_hs_obese = data[data['Indicator'] == 'Percent of High School Students Who Are Obese']
# Positions (within data['Indicator'].unique()) of indicators whose name mentions each group.
adult_indicators = [j for j, name in enumerate(data['Indicator'].unique()) if 'Adult' in name]
high_school_indicators = [j for j, name in enumerate(data['Indicator'].unique()) if 'High School' in name]

In [44]:
exclude_place = ["U.S. Total, U.S. Total"]
include_sex = "Both"
include_race = "All"

filtered_data = data[~(data['Place'].isin(exclude_place) ) & (data['Race/Ethnicity'] == include_race) & (data['Sex'] == include_sex)]

array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2018, 2017])

In [10]:
data[data['Indicator'] == data['Indicator'].unique()[2]]

Unnamed: 0,Indicator Category,Indicator,Year,Sex,Race/Ethnicity,Value,Place,BCHC Requested Methodology,Source,Methods,Notes,90% Confidence Level - Low,90% Confidence Level - High,95% Confidence Level - Low,95% Confidence Level - High
1167,Behavioral Health/Substance Abuse,Percent of High School Students Who Binge Drank,2010,Both,All,19.0,"Seattle, WA",YRBS/YRBSS (or similar). Consuming five or mor...,Healthy Youth Survey,,,,,16.0,22.0
1168,Behavioral Health/Substance Abuse,Percent of High School Students Who Binge Drank,2010,Both,All,19.7,"Chicago, Il",YRBS/YRBSS (or similar). Consuming five or mor...,,,,,,,
1169,Behavioral Health/Substance Abuse,Percent of High School Students Who Binge Drank,2010,Both,Asian/PI,12.0,"Seattle, WA","YRBS/YRBSS (or similar). ""During the past 30 d...",Healthy Youth Survey,,Does not include Pacific Islanders as we repor...,,,10.0,14.0
1170,Behavioral Health/Substance Abuse,Percent of High School Students Who Binge Drank,2010,Both,Black,13.0,"Seattle, WA",YRBS/YRBSS (or similar). Consuming five or mor...,Healthy Youth Survey,,,,,10.0,17.0
1171,Behavioral Health/Substance Abuse,Percent of High School Students Who Binge Drank,2010,Both,Black,13.4,"Chicago, Il",YRBS/YRBSS (or similar). Consuming five or mor...,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1460,Behavioral Health/Substance Abuse,Percent of High School Students Who Binge Drank,2015,Male,All,14.6,"Fort Worth (Tarrant County), TX",YRBS/YRBSS (or similar). Consuming five or mor...,"Fort Worth Independent School District, 2015 Y...",,Fort Worth Independent School District (not al...,,,12.4,17.1
1461,Behavioral Health/Substance Abuse,Percent of High School Students Who Binge Drank,2015,Male,All,15.6,"Portland (Multnomah County), OR",YRBS/YRBSS (or similar). Consuming five or mor...,Oregon Healthy Teens,weighted % of 11th graders who binge drank at ...,"2015 Oregon Healthy Teens, 11th graders",,,13.0,18.2
1462,Behavioral Health/Substance Abuse,Percent of High School Students Who Binge Drank,2015,Male,All,15.9,"Charlotte, NC",YRBS/YRBSS (or similar). Consuming five or mor...,,,,,,12.7,19.6
1463,Behavioral Health/Substance Abuse,Percent of High School Students Who Binge Drank,2015,Male,All,16.4,"Miami (Miami-Dade County), FL",YRBS/YRBSS (or similar). Consuming five or mor...,Data source Youth Risk Behavior Surveillance S...,,,,,,


In [6]:
high_school_indicators

[2, 12, 13, 14, 57]

In [11]:
data['Indicator'].unique()

array(['Opioid-Related Unintentional Drug Overdose Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Percent of Adults Who Binge Drank',
       'Percent of High School Students Who Binge Drank',
       'All Types of Cancer Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Female Breast Cancer Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Lung Cancer Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Asthma Emergency Department Visit Rate (Age-Adjusted; Per 10,000)',
       'Diabetes Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Heart Disease Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Percent of Adults Who Are Obese',
       'Percent of Adults Who Currently Smoke',
       'Percent of Adults Who Meet CDC-Recommended Physical Activity Levels',
       'Percent of High School Students Who Are Obese',
       'Percent of High School Students Who Currently Smoke',
       'Percent of High School Students Who Meet CDC-Reco

In [19]:
rate_related_indicator = [x for x in data['Indicator'].unique() if "Rate" in x]

In [22]:
rate_related_indicator

['Opioid-Related Unintentional Drug Overdose Mortality Rate (Age-Adjusted; Per 100,000 people)',
 'All Types of Cancer Mortality Rate (Age-Adjusted; Per 100,000 people)',
 'Female Breast Cancer Mortality Rate (Age-Adjusted; Per 100,000 people)',
 'Lung Cancer Mortality Rate (Age-Adjusted; Per 100,000 people)',
 'Asthma Emergency Department Visit Rate (Age-Adjusted; Per 10,000)',
 'Diabetes Mortality Rate (Age-Adjusted; Per 100,000 people)',
 'Heart Disease Mortality Rate (Age-Adjusted; Per 100,000 people)',
 'Rate of Laboratory Confirmed Infections Caused by Salmonella (Per 100,000 people)',
 'Rate of Laboratory Confirmed Infections Caused by Shiga Toxin-Producing E-Coli (Per 100,000 people)',
 'AIDS Diagnoses Rate (Per 100,000 people)',
 'HIV Diagnoses Rate (Per 100,000 people)',
 'HIV-Related Mortality Rate (Age-Adjusted; Per 100,000 people)',
 'Persons Living with HIV/AIDS Rate (Per 100,000 people)',
 'Pneumonia and Influenza Mortality Rate (Age-Adjusted; Per 100,000 people)',
 'Tub

In [54]:
filtered_data['Indicator'].unique()

array(['Opioid-Related Unintentional Drug Overdose Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Percent of Adults Who Binge Drank',
       'Percent of High School Students Who Binge Drank',
       'All Types of Cancer Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Lung Cancer Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Asthma Emergency Department Visit Rate (Age-Adjusted; Per 10,000)',
       'Diabetes Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Heart Disease Mortality Rate (Age-Adjusted; Per 100,000 people)',
       'Percent of Adults Who Are Obese',
       'Percent of Adults Who Currently Smoke',
       'Percent of Adults Who Meet CDC-Recommended Physical Activity Levels',
       'Percent of High School Students Who Are Obese',
       'Percent of High School Students Who Currently Smoke',
       'Percent of High School Students Who Meet CDC-Recommended Physical Activity Levels',
       'Percent Foreign Born', 'Percent of Pop

In [55]:
obese = ['Percent of Adults Who Are Obese', 'Percent of High School Students Who Are Obese']

In [58]:
filtered_data[(filtered_data['Indicator'].isin(obese)) & (filtered_data['Place'] == 'Boston, MA')]

Unnamed: 0,Indicator Category,Indicator,Year,Sex,Race/Ethnicity,Value,Place,BCHC Requested Methodology,Source,Methods,Notes,90% Confidence Level - Low,90% Confidence Level - High,95% Confidence Level - Low,95% Confidence Level - High
6503,Chronic Disease,Percent of Adults Who Are Obese,2010,Both,All,20.3,"Boston, MA",BRFSS (or similar survey). Percent of populati...,"Boston Behavioral Risk Factor Survey, Boston P...",,This survey is not conducted annually.,,,18.1,22.5
6809,Chronic Disease,Percent of Adults Who Are Obese,2013,Both,All,21.7,"Boston, MA",BRFSS (or similar survey). Percent of populati...,"Boston Behavioral Risk Factor Survey, Boston P...",,This survey is not conducted annually.,,,20.0,23.4
7046,Chronic Disease,Percent of Adults Who Are Obese,2015,Both,All,21.9,"Boston, MA",BRFSS (or similar survey). Percent of populati...,"Boston Behavioral Risk Factor Survey, 2015, Bo...",,This survey is conducted every other year.,,,19.9,24.0
8177,Chronic Disease,Percent of High School Students Who Are Obese,2011,Both,All,14.3,"Boston, MA",YRBS/YRBSS (or similar summary). Percent of hi...,"Youth Risk Behavior Survey, Centers for Diseas...",,,,,11.7,16.9
8255,Chronic Disease,Percent of High School Students Who Are Obese,2013,Both,All,13.8,"Boston, MA",YRBS/YRBSS (or similar summary). Percent of hi...,"Youth Risk Behavior Survey, Centers for Diseas...",,,,,11.4,16.2
8374,Chronic Disease,Percent of High School Students Who Are Obese,2015,Both,All,14.6,"Boston, MA",YRBS/YRBSS (or similar summary). Percent of hi...,"Youth Risk Behavior Survey, Centers for Diseas...",,,,,12.4,16.7


In [51]:
hiv_data = filtered_data[filtered_data['Indicator'] == "HIV Diagnoses Rate (Per 100,000 people)"]

In [60]:
obese_data = filtered_data[filtered_data['Indicator'].isin(obese)]

In [61]:
hiv_data

Unnamed: 0,Indicator Category,Indicator,Year,Sex,Race/Ethnicity,Value,Place,BCHC Requested Methodology,Source,Methods,Notes,90% Confidence Level - Low,90% Confidence Level - High,95% Confidence Level - Low,95% Confidence Level - High
15138,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2010,Both,All,7.7,"Phoenix, AZ",HIV cases diagnosed in a given year; report cr...,Arizona Department of Health Services (ADHS) a...,,Maricopa County level data,,,,
15140,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2010,Both,All,17.3,"Las Vegas (Clark County), NV",HIV cases diagnosed in a given year; report cr...,,,,,,15.5,19.2
15141,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2010,Both,All,17.4,"San Antonio, TX",HIV cases diagnosed in a given year; report cr...,"Texas DSHS, Texas Health Data Center for Healt...",,Bexar County level data,,,15.4,19.4
15142,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2010,Both,All,18.3,"San Diego County, CA",HIV cases diagnosed in a given year; report cr...,"Source: County of San Diego, Health & Human Se...",All HIV/AIDS data was pulled from the HIV/AIDS...,,,,,
15143,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2010,Both,All,18.5,"Kansas City, MO",HIV cases diagnosed in a given year; report cr...,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16074,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2016,Both,All,24.0,"Las Vegas (Clark County), NV",HIV cases diagnosed in a given year; report cr...,,,,,,21.9,26.2
16075,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2016,Both,All,30.7,"Oakland (Alameda County), CA",HIV cases diagnosed in a given year; report cr...,"Alameda County eHARS, Q2 2017",,,,,25.2,36.2
16076,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2016,Both,All,31.5,"Philadelphia, PA","HIV cases diagnosed in 2012, 2013, 2014 (as av...",AACO team,,,,,,
16077,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2016,Both,All,34.4,"Dallas, TX",HIV cases diagnosed in a given year; report cr...,Data Sources: Dallas County Department of Heal...,,All data are at the Dallas County level.,,,,


In [64]:
merged_temp = pd.merge(hiv_data,obese_data, on=['Year','Sex','Race/Ethnicity','Place'],suffixes=('_hiv','_obese'))
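
Because `obese_data` holds two indicators, a place and year that report both will appear twice in the merged frame, each time repeating the same HIV value. The sketch below, on invented rows with hypothetical place names, contrasts that long merge with a wide alternative that keeps one row per place and year.

```python
import pandas as pd

ADULT = "Percent of Adults Who Are Obese"
HIGH_SCHOOL = "Percent of High School Students Who Are Obese"

# Invented rows in a simplified layout of hiv_data and obese_data.
hiv = pd.DataFrame({
    "Place": ["City A", "City B"],
    "Year": [2016, 2016],
    "Value": [30.7, 24.0],
})
obese = pd.DataFrame({
    "Place": ["City A", "City A", "City B"],
    "Year": [2016, 2016, 2016],
    "Indicator": [ADULT, HIGH_SCHOOL, ADULT],
    "Value": [26.9, 12.0, 18.5],
})

# Long merge: City A appears twice because it has two obesity indicators.
long_merge = hiv.merge(obese, on=["Place", "Year"], suffixes=("_hiv", "_obese"))

# Wide alternative: one obesity column per indicator, then a one-to-one merge.
obese_wide = obese.pivot(index=["Place", "Year"], columns="Indicator", values="Value").reset_index()
wide_merge = hiv.merge(obese_wide, on=["Place", "Year"], validate="one_to_one")

print(len(long_merge), len(wide_merge))
```

In the wide form, a place with no high school figure simply has a missing value in that column, which keeps later statistics on the HIV column from double-counting places.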

In [75]:
import numpy as np

In [87]:
merged_temp[merged_temp['Place'] == 'Baltimore, MD']

Unnamed: 0,Indicator Category_hiv,Indicator_hiv,Year,Sex,Race/Ethnicity,Value_hiv,Place,BCHC Requested Methodology_hiv,Source_hiv,Methods_hiv,...,Indicator_obese,Value_obese,BCHC Requested Methodology_obese,Source_obese,Methods_obese,Notes_obese,90% Confidence Level - Low_obese,90% Confidence Level - High_obese,95% Confidence Level - Low_obese,95% Confidence Level - High_obese
9,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2010,Both,All,77.6,"Baltimore, MD",HIV cases diagnosed in a given year; report cr...,Baltimore City Annual HIV Epidemiological prof...,"For 2010 data, the denominators are from the 2...",...,Percent of Adults Who Are Obese,29.2,BRFSS (or similar survey). Percent of populati...,CDC BRFSS,The three most recent years of available data ...,"Due to changes in BRFSS sampling methodology, ...",,,,
32,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2011,Both,All,81.4,"Baltimore, MD",HIV cases diagnosed in a given year; report cr...,Baltimore City Annual HIV Epidemiological prof...,"For 2010 data, the denominators are from the 2...",...,Percent of Adults Who Are Obese,37.3,BRFSS (or similar survey). Percent of populati...,CDC BRFSS,The three most recent years of available data ...,"Due to changes in BRFSS sampling methodology, ...",,,,
49,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2012,Both,All,89.9,"Baltimore, MD",HIV cases diagnosed in a given year; report cr...,Baltimore City Annual HIV Epidemiological prof...,"For 2010 data, the denominators are from the 2...",...,Percent of Adults Who Are Obese,30.7,BRFSS (or similar survey). Percent of populati...,CDC BRFSS,The three most recent years of available data ...,"Due to changes in BRFSS sampling methodology, ...",,,,


In [89]:
merged_temp.groupby(['Indicator_obese','Year','Sex','Race/Ethnicity','Place'])['Value_obese'].sum().reset_index()

Unnamed: 0,Indicator_obese,Year,Sex,Race/Ethnicity,Place,Value_obese
0,Percent of Adults Who Are Obese,2010,Both,All,"Baltimore, MD",29.2
1,Percent of Adults Who Are Obese,2010,Both,All,"Charlotte, NC",26.7
2,Percent of Adults Who Are Obese,2010,Both,All,"Houston, TX",30.6
3,Percent of Adults Who Are Obese,2010,Both,All,"Miami (Miami-Dade County), FL",29.3
4,Percent of Adults Who Are Obese,2010,Both,All,"San Antonio, TX",32.4
...,...,...,...,...,...,...
116,Percent of High School Students Who Are Obese,2015,Both,All,"Las Vegas (Clark County), NV",11.4
117,Percent of High School Students Who Are Obese,2015,Both,All,"Oakland (Alameda County), CA",0.0
118,Percent of High School Students Who Are Obese,2015,Both,All,"Philadelphia, PA",13.7
119,Percent of High School Students Who Are Obese,2015,Both,All,"Portland (Multnomah County), OR",13.6


In [71]:
merged_temp.sort_values(by=['Year','Value_hiv'],ascending=[False,False])

Unnamed: 0,Indicator Category_hiv,Indicator_hiv,Year,Sex,Race/Ethnicity,Value_hiv,Place,BCHC Requested Methodology_hiv,Source_hiv,Methods_hiv,...,Indicator_obese,Value_obese,BCHC Requested Methodology_obese,Source_obese,Methods_obese,Notes_obese,90% Confidence Level - Low_obese,90% Confidence Level - High_obese,95% Confidence Level - Low_obese,95% Confidence Level - High_obese
120,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2016,Both,All,31.5,"Philadelphia, PA","HIV cases diagnosed in 2012, 2013, 2014 (as av...",AACO team,,...,Percent of Adults Who Are Obese,28.0,BRFSS (or similar survey). Percent of populati...,PA Eddie-->BRFSS,,,,,24.0,33.0
118,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2016,Both,All,30.7,"Oakland (Alameda County), CA",HIV cases diagnosed in a given year; report cr...,"Alameda County eHARS, Q2 2017",,...,Percent of Adults Who Are Obese,26.9,BRFSS (or similar survey). Percent of populati...,California Health Interview Survey (AskCHIS),California Health Interview Survey. Percent of...,Data is for Alameda County.,,,12.4,41.5
119,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2016,Both,All,30.7,"Oakland (Alameda County), CA",HIV cases diagnosed in a given year; report cr...,"Alameda County eHARS, Q2 2017",,...,Percent of High School Students Who Are Obese,0.0,YRBS/YRBSS (or similar summary). Percent of hi...,California Health Interview Survey (AskCHIS),Obese (BMI highest 5th percentile) Teen only,Data is for Alameda County; Records where the ...,,,,
117,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2016,Both,All,24.0,"Denver, CO",HIV cases diagnosed in a given year; report cr...,,,...,Percent of Adults Who Are Obese,18.5,BRFSS (or similar survey). Percent of populati...,Colorado BRFSS,Colorado BRFSS,,,,15.8,21.1
116,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2016,Both,All,20.2,"Kansas City, MO",HIV cases diagnosed in a given year; report cr...,,,...,Percent of Adults Who Are Obese,31.6,BRFSS (or similar survey). Percent of populati...,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2010,Both,All,34.6,"Los Angeles, CA",HIV cases diagnosed in a given year; report cr...,eHARS,The selection of LA City residents is based on...,...,Percent of High School Students Who Are Obese,15.2,YRBS/YRBSS (or similar summary). Percent of hi...,YRBS,,Adolescents,,,,
3,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2010,Both,All,32.8,"Houston, TX",HIV cases diagnosed in a given year; report cr...,,,...,Percent of Adults Who Are Obese,30.6,BRFSS (or similar survey). Percent of populati...,Source: Texas Behavioral Risk Factor Surveilla...,crude rate of BMI>=30 in Houston-Baytown-Sugar...,"Data for the Houston-Baytown-Sugarland MSA, no...",,,27.9,33.5
2,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2010,Both,All,18.3,"San Diego County, CA",HIV cases diagnosed in a given year; report cr...,"Source: County of San Diego, Health & Human Se...",All HIV/AIDS data was pulled from the HIV/AIDS...,...,Percent of Adults Who Are Obese,26.1,BRFSS (or similar survey). Percent of populati...,Centers for Disease Control and Prevention (CD...,,,,,,
1,HIV/AIDS,"HIV Diagnoses Rate (Per 100,000 people)",2010,Both,All,17.4,"San Antonio, TX",HIV cases diagnosed in a given year; report cr...,"Texas DSHS, Texas Health Data Center for Healt...",,...,Percent of Adults Who Are Obese,32.4,BRFSS (or similar survey). Percent of populati...,Data is pulled from the Texas State Health & H...,,Bexar County level data,,,28.4,36.5


In [70]:
filtered_data[(filtered_data['Indicator'] == 'Percent of High School Students Who Are Obese') &
              (filtered_data['Place'] == 'Oakland (Alameda County), CA')]

Unnamed: 0,Indicator Category,Indicator,Year,Sex,Race/Ethnicity,Value,Place,BCHC Requested Methodology,Source,Methods,Notes,90% Confidence Level - Low,90% Confidence Level - High,95% Confidence Level - Low,95% Confidence Level - High
8180,Chronic Disease,Percent of High School Students Who Are Obese,2011,Both,All,0.0,"Oakland (Alameda County), CA",YRBS/YRBSS (or similar summary). Percent of hi...,California Health Interview Survey (AskCHIS),California Health Interview Survey. Percent of...,Data is for Alameda County; Records where the ...,,,,
8233,Chronic Disease,Percent of High School Students Who Are Obese,2012,Both,All,0.0,"Oakland (Alameda County), CA",YRBS/YRBSS (or similar summary). Percent of hi...,California Health Interview Survey (AskCHIS),Obese (BMI highest 5th percentile) Teen only,Data is for Alameda County; Records where the ...,,,,
8262,Chronic Disease,Percent of High School Students Who Are Obese,2013,Both,All,0.0,"Oakland (Alameda County), CA",YRBS/YRBSS (or similar summary). Percent of hi...,California Health Interview Survey (AskCHIS),Obese (BMI highest 5th percentile) Teen only,Data is for Alameda County; Records where the ...,,,,
8349,Chronic Disease,Percent of High School Students Who Are Obese,2014,Both,All,0.0,"Oakland (Alameda County), CA",YRBS/YRBSS (or similar summary). Percent of hi...,California Health Interview Survey (AskCHIS),Obese (BMI highest 5th percentile) Teen only,Data is for Alameda County; Records where the ...,,,,
8377,Chronic Disease,Percent of High School Students Who Are Obese,2015,Both,All,0.0,"Oakland (Alameda County), CA",YRBS/YRBSS (or similar summary). Percent of hi...,California Health Interview Survey (AskCHIS),Obese (BMI highest 5th percentile) Teen only,Data is for Alameda County; Records where the ...,,,,
8442,Chronic Disease,Percent of High School Students Who Are Obese,2016,Both,All,0.0,"Oakland (Alameda County), CA",YRBS/YRBSS (or similar summary). Percent of hi...,California Health Interview Survey (AskCHIS),Obese (BMI highest 5th percentile) Teen only,Data is for Alameda County; Records where the ...,,,,


In [50]:
filtered_data['Place'].unique()

array(['Washington, DC', 'Fort Worth (Tarrant County), TX',
       'Oakland (Alameda County), CA', 'San Antonio, TX',
       'San Diego County, CA', 'Kansas City, MO', 'Denver, CO',
       'Las Vegas (Clark County), NV', 'Columbus, OH', 'Phoenix, AZ',
       'Boston, MA', 'Houston, TX', 'Minneapolis, MN', 'Los Angeles, CA',
       'Miami (Miami-Dade County), FL', 'Portland (Multnomah County), OR',
       'Charlotte, NC', 'Baltimore, MD', 'Philadelphia, PA',
       'Seattle, WA', 'Detroit, MI', 'New York City, NY',
       'Long Beach, CA', 'Chicago, Il',
       'Indianapolis (Marion County), IN', 'San Jose, CA',
       'San Francisco, CA', 'Cleveland, OH', 'Dallas, TX', 'Austin, TX'],
      dtype=object)
