# Understanding Homelessness Rates

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

## Introduction
**TODO: add information about the data sources and about common assumptions regarding homelessness, including personal versus macro-level causes.**

### Data Background
(borrowed from: https://www.kaggle.com/bltxr9/eda-of-total-homeless-population)
This dataset was generated by the Continuums of Care (CoCs) and provided to the U.S. Department of Housing and Urban Development (HUD). Note: HUD did not conduct a full data quality review on the data submitted by each CoC. For more information about this dataset, see the links below:

[Continuum of Care (CoC) Program](https://www.hudexchange.info/programs/coc/)

[PIT and HIC Data Since 2007](https://www.hudexchange.info/resource/3031/pit-and-hic-data-since-2007/)

[CoC Homeless Populations and Subpopulations Reports](https://www.hudexchange.info/programs/coc/coc-homeless-populations-and-subpopulations-reports/)

## Data Wrangling

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

### Inspect Databases
#### Homelessness

In [3]:
df_home = pd.read_csv('2007-2016-Homelessness-USA.csv')
df_home.head()

| | Year | State | CoC Number | CoC Name | Measures | Count |
|---|---|---|---|---|---|---|
| 0 | 1/1/2007 | AK | AK-500 | Anchorage CoC | Chronically Homeless Individuals | 224 |
| 1 | 1/1/2007 | AK | AK-500 | Anchorage CoC | Homeless Individuals | 696 |
| 2 | 1/1/2007 | AK | AK-500 | Anchorage CoC | Homeless People in Families | 278 |
| 3 | 1/1/2007 | AK | AK-500 | Anchorage CoC | Sheltered Chronically Homeless Individuals | 187 |
| 4 | 1/1/2007 | AK | AK-500 | Anchorage CoC | Sheltered Homeless | 842 |


**Check Data Types and Missing Data**

In [4]:
df_home.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86529 entries, 0 to 86528
Data columns (total 6 columns):
Year          86529 non-null object
State         86529 non-null object
CoC Number    86529 non-null object
CoC Name      86529 non-null object
Measures      86529 non-null object
Count         86529 non-null object
dtypes: object(6)
memory usage: 4.0+ MB


There is no missing data (every column has 86,529 non-null entries), but `Count` is stored as `object` and needs to be converted to an integer type.
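A minimal sketch of that conversion follows. It assumes the reason `Count` was read as `object` is thousands separators such as `"1,234"`; the raw file is not inspected here, so treat that as a guess. The helper name `count_to_int` is hypothetical.

```python
import pandas as pd

def count_to_int(counts: pd.Series) -> pd.Series:
    """Convert a string column of counts to integers.

    Assumes the only non-digit characters are thousands separators (",")
    and surrounding whitespace; any other value raises a ValueError.
    """
    cleaned = counts.astype(str).str.replace(",", "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="raise").astype("int64")

# Toy values for illustration only, not taken from the dataset
example = pd.Series(["224", "1,696", " 278 "])
converted = count_to_int(example)
```

On the real frame this would be applied as `df_home['Count'] = count_to_int(df_home['Count'])`. Using `errors="raise"` is a deliberate choice: if the column holds something other than digits and commas, the error surfaces the problem instead of silently producing missing values.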

Check unique values

In [7]:
df_home.nunique()

Year            10
State           54
CoC Number     414
CoC Name       414
Measures        42
Count         3608
dtype: int64

Why are there 54 states?

In [8]:
df_home['State'].unique()

array(['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA',
       'GU', 'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD',
       'ME', 'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ',
       'NM', 'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'PR', 'RI', 'SC', 'SD',
       'TN', 'TX', 'UT', 'VA', 'VI', 'VT', 'WA', 'WI', 'WV', 'WY'],
      dtype=object)

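The 54 codes are the 50 states plus the District of Columbia (DC) and three territories: Guam (GU), Puerto Rico (PR), and the U.S. Virgin Islands (VI). I am less familiar with state abbreviations, so I need to get the full state names.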
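As a starting point, the four non-state codes can be labeled with a small lookup. This is only a sketch: the dictionary `NON_STATE_CODES` and the helper `label_non_states` are hypothetical names, and a complete mapping for the 50 states would still need to come from a reference table.

```python
import pandas as pd

# Names for the four non-state codes that appear in the State column
NON_STATE_CODES = {
    "DC": "District of Columbia",
    "GU": "Guam",
    "PR": "Puerto Rico",
    "VI": "U.S. Virgin Islands",
}

def label_non_states(states: pd.Series) -> pd.Series:
    """Replace the four non-state codes with full names; leave other codes as-is."""
    return states.map(NON_STATE_CODES).fillna(states)

# A handful of codes taken from the unique values listed above
codes = pd.Series(["AK", "DC", "GU", "TX", "PR", "VI"])
labeled = label_non_states(codes)
```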

Check the year info

In [9]:
df_home['Year'].unique()

array(['1/1/2007', '1/1/2008', '1/1/2009', '1/1/2010', '1/1/2011',
       '1/1/2012', '1/1/2013', '1/1/2014', '1/1/2015', '1/1/2016'],
      dtype=object)

Every value is January 1 of its year, so there is exactly one date per year. The month and day carry no information and could be dropped, keeping only the year.
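One way to keep only the year is to parse the strings and take the `.dt.year` accessor. The sketch below assumes the `month/day/year` layout seen in the unique values holds for every row.

```python
import pandas as pd

# Three of the date strings listed above
dates = pd.Series(["1/1/2007", "1/1/2008", "1/1/2016"])

# An explicit format avoids ambiguity between month-first and day-first parsing
years = pd.to_datetime(dates, format="%m/%d/%Y").dt.year
```

On the real frame this would be `df_home['Year'] = pd.to_datetime(df_home['Year'], format='%m/%d/%Y').dt.year`.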

Check what measures are counted.

In [6]:
df_home['Measures'].unique()

array(['Chronically Homeless Individuals', 'Homeless Individuals',
       'Homeless People in Families',
       'Sheltered Chronically Homeless Individuals', 'Sheltered Homeless',
       'Sheltered Homeless Individuals',
       'Sheltered Homeless People in Families', 'Total Homeless',
       'Unsheltered Chronically Homeless Individuals',
       'Unsheltered Homeless', 'Unsheltered Homeless Individuals',
       'Unsheltered Homeless People in Families', 'Chronically Homeless',
       'Chronically Homeless People in Families', 'Homeless Veterans',
       'Sheltered Chronically Homeless',
       'Sheltered Chronically Homeless People in Families',
       'Sheltered Homeless Veterans', 'Unsheltered Chronically Homeless',
       'Unsheltered Chronically Homeless People in Families',
       'Unsheltered Homeless Veterans', 'Children of Parenting Youth',
       'Homeless Unaccompanied Children (Under 18)',
       'Homeless Unaccompanied Young Adults (Age 18-24)',
       'Homeless Unaccompan

There are a number of different types of categorizations here:

**Location of Homelessness**
- Sheltered homeless
- Unsheltered homeless

**Type of Homelessness**
- Chronically homeless
- Homeless

**Family Status**
- Individuals
- In Families
- Unaccompanied

**Age**
- Youth (Under 25)
- Young Adults (18 - 24)
- Children (Under 18)

**Others**
- Veterans
- Parenting youth

It is quite likely that these categorizations are not mutually exclusive. They will need to be compared against each other and against the totals before they can be analyzed appropriately.
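One way to test whether a set of measures partitions a total is to pivot the long table so each measure becomes a column, then compare sums per CoC and year. The sketch below assumes `Count` has already been converted to a numeric type; the function name `compare_to_total` is hypothetical, and the toy numbers are made up for illustration.

```python
import pandas as pd

def compare_to_total(df, parts, total="Total Homeless"):
    """For each year and CoC, compare the sum of the `parts` measures with `total`."""
    # Long to wide: one row per (Year, CoC Number), one column per measure
    wide = df.pivot_table(
        index=["Year", "CoC Number"],
        columns="Measures",
        values="Count",
        aggfunc="sum",
    )
    out = pd.DataFrame({
        "parts_sum": wide[parts].sum(axis=1),
        "total": wide[total],
    })
    out["difference"] = out["parts_sum"] - out["total"]
    return out

# Made-up numbers for illustration only, not taken from the dataset
toy = pd.DataFrame({
    "Year": ["1/1/2007"] * 3,
    "CoC Number": ["XX-000"] * 3,
    "Measures": ["Sheltered Homeless", "Unsheltered Homeless", "Total Homeless"],
    "Count": [80, 20, 100],
})
check = compare_to_total(toy, ["Sheltered Homeless", "Unsheltered Homeless"])
```

A `difference` of zero in every row would suggest the chosen measures split the total cleanly; nonzero rows would point to overlapping categories or reporting gaps.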