<a href="https://colab.research.google.com/github/ksuaray/LAEP_S24/blob/Homelessness/Homelessness_Assignment_1_NB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**National Estimates of Homelessness**

#**Importing Necessary Python Modules**

Python incorporates a variety of open-source add-ins called **modules** that extend what the language can do. The cell below imports the modules used in this notebook.

In [2]:
import pandas as pd
import numpy as np
import plotly.express as px
from IPython.display import Image


#**Context**

The Continuum of Care (CoC) Program is designed to promote communitywide commitment to the goal of ending homelessness; provide funding for efforts by nonprofit providers, and State and local governments to quickly rehouse homeless individuals and families while minimizing the trauma and dislocation caused to homeless individuals, families, and communities by homelessness; promote access to and effect utilization of mainstream programs by homeless individuals and families; and optimize self-sufficiency among individuals and families experiencing homelessness. For more information on the Program, please visit https://www.hudexchange.info/programs/coc/

The U.S. Department of Housing and Urban Development (HUD) provides Point-in-Time (PIT) count reports of sheltered and unsheltered persons experiencing homelessness, by household type and subpopulation. This data is available at the national and state level, and for each CoC. HUD also provides Housing Inventory Count (HIC) reports, which provide a snapshot of a CoC’s inventory of beds and units available on the night designated for the count by program type, and include beds dedicated to serve persons who are homeless as well as persons in Permanent Supportive Housing.
This raw data set contains PIT estimates of homelessness and the accompanying HIC data from 2021.

Attribution: Adapted from U.S. Department of Housing and Urban Development

#**About the Dataset**

This dataset contains 100 rows corresponding to a random sample of localities (typically counties or similar large regions) that received Continuum of Care funding from HUD. A total of 21 variables are provided, as listed below.

**Variables**

| Variable Name(s) | Description |
| --- | --- |
| CoC Number<br>CoC Name | CoC locality identifier |
| CoC Category | Community setting designation: Rural, Urban, etc. |
| Type of Count | Description of whether fully or partially sheltered individuals (or both) are included |
| Overall Homeless, 2021 | Total homeless count in all facilities indicated by the “Type of Count” variable |
| HMIS Participation Rate for Year-Round Beds (ES, TH, SH) | Percentage of facilities in the locality that participate in the Homeless Management Information System (HMIS) |
| Total Year-Round Beds (ES, TH, SH)<br>Total Year-Round Beds (ES)<br>Total Year-Round Beds (TH)<br>Total Year-Round Beds (SH) | (all types combined)<br>ES = Emergency Shelter<br>TH = Transitional Housing<br>SH = Safe Haven |
| Total Units for Households with Children (ES, TH, SH)<br>Total Beds for Households with Children (ES, TH, SH) | Number of units and beds designated for households with children |
| Sheltered ES Homeless, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility at the time of the study |
| Sheltered ES Homeless - Age 18 to 24, 2021 | Estimate of the number of individuals aged 18 to 24 sheltered in an Emergency Facility |
| Sheltered ES Homeless - Female, 2021<br>Sheltered ES Homeless - Male, 2021<br>Sheltered ES Homeless - Trans+, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by gender |
| Sheltered ES Homeless - Hispanic/Latino, 2021<br>Sheltered ES Homeless - White, 2021<br>Sheltered ES Homeless - Black or African American, 2021<br>Sheltered ES Homeless - Asian or Pacific Islander<br>Sheltered ES Homeless - American Indian or Alaska Native, 2021<br>Sheltered ES Homeless - Multiple Races, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by race/ethnicity |


*Attribution: Adapted from U.S. Department of Housing and Urban Development*

We can view a snippet of the data by first importing it directly from the [URL below](https://raw.githubusercontent.com/ksuaray/LAEP_S24/Homelessness/Homelessness_Data.csv).

**Data**

In [8]:
file_url = "https://raw.githubusercontent.com/ksuaray/LAEP_S24/Homelessness/Homelessness_Data.csv"
df = pd.read_csv(file_url)


Next, we can display the data by typing the name of the DataFrame. To ensure we can see all columns, we'll first call the *pd.set_option* function.

In [9]:
# Set display options to show all columns
pd.set_option('display.max_columns', None)
df

Unnamed: 0,CoC Number,CoC Name,CoC Category,Type of Count,"Overall Homeless, 2021","Total Year-Round Beds (ES, TH, SH)","HMIS Participation Rate for Year-Round Beds (ES, TH, SH)",Total Year-Round Beds (ES),Total Year-Round Beds (TH),Total Year-Round Beds (SH),"Total Units for Households with Children (ES, TH, SH)","Total Beds for Households with Children (ES, TH, SH)","Sheltered ES Homeless, 2021","Sheltered ES Homeless - Age 18 to 24, 2021","Sheltered ES Homeless - Female, 2021","Sheltered ES Homeless - Male, 2021","Sheltered ES Homeless - Trans+, 2021","Sheltered ES Homeless - Hispanic/Latino, 2021","Sheltered ES Homeless - White, 2021","Sheltered ES Homeless - Black or African American, 2021",Sheltered ES Homeless - Asian or Pacific Islander,"Sheltered ES Homeless - American Indian or Alaska Native, 2021","Sheltered ES Homeless - Multiple Races, 2021"
0,LA-505,Monroe/Northeast Louisiana CoC,Largely Rural CoC,Sheltered-Only Count,78,133,0.233,69,64,0,18,52,40,4,21,19,0,0,17,19,0,4,0
1,MN-502,Rochester/Southeast Minnesota CoC,Largely Rural CoC,Sheltered-Only Count,419,599,0.604,378,221,0,114,372,261,18,110,150,1,42,186,51,9,3,12
2,PA-508,Scranton/Lackawanna County CoC,Largely Suburban CoC,Sheltered and full unsheltered,165,155,0.839,64,80,11,25,71,81,5,34,46,1,7,61,17,2,0,1
3,OK-503,Oklahoma Balance of State CoC,Largely Rural CoC,Sheltered-Only Count,125,149,0.295,135,14,0,5,34,125,24,49,75,1,52,88,25,1,8,3
4,OH-504,Youngstown/Mahoning County CoC,Other Largely Urban CoC,Sheltered-Only Count,62,198,0.192,187,11,0,39,115,56,6,41,13,2,21,25,19,1,0,11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,KS-507,Kansas Balance of State CoC,Largely Rural CoC,Sheltered-Only Count,783,1170,0.162,703,452,15,170,572,430,41,153,276,1,48,307,60,7,23,33
96,CA-507,Marin County CoC,Largely Suburban CoC,Sheltered-Only Count,225,293,0.727,163,130,0,46,142,105,6,41,63,1,28,83,16,1,1,4
97,TN-512,"Morristown/Blount, Sevier, Campbell, Cocke Cou...",Largely Rural CoC,Sheltered and full unsheltered,472,133,0.316,117,16,0,28,65,104,5,61,43,0,5,99,1,1,1,2
98,MA-515,Fall River CoC,Largely Suburban CoC,Sheltered and full unsheltered,324,236,0.877,212,24,0,67,215,223,15,122,101,0,72,135,80,1,2,5
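
Before starting the assignment, it can help to confirm the size of the table and which columns hold text versus numbers. The sketch below is only an illustration: `sample` is a hypothetical stand-in built from three columns of the first five rows displayed above, so that the cell runs without downloading anything. In the notebook itself, the same calls would be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for `df`: three columns from the first five rows shown above.
sample = pd.DataFrame({
    "CoC Number": ["LA-505", "MN-502", "PA-508", "OK-503", "OH-504"],
    "CoC Category": [
        "Largely Rural CoC", "Largely Rural CoC", "Largely Suburban CoC",
        "Largely Rural CoC", "Other Largely Urban CoC",
    ],
    "Overall Homeless, 2021": [78, 419, 165, 125, 62],
})

# Number of rows and columns
n_rows, n_cols = sample.shape
print(f"{n_rows} rows x {n_cols} columns")

# Flag each column as numeric or not, and count missing values per column
is_numeric = {col: pd.api.types.is_numeric_dtype(sample[col]) for col in sample.columns}
missing = sample.isna().sum()
print(is_numeric)
print(missing)
```

Running `df.shape` on the full data set is a quick way to check the row and column counts stated in the dataset description.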


#**ASSIGNMENT 1 - Descriptive Statistics: Graphical and Numerical Summary**

**INSTRUCTIONS**

Use Python to analyze the data set and complete each of the following. As appropriate, copy the Python output and paste it in the correct part below. For problems that require a written response, type the answer below.

##**QUESTION 1**

Determine whether the three variables below are qualitative or quantitative. If they are quantitative, specify whether they are continuous or discrete.

| Variable         | Qual or Quant | Dis, Con, or Neither |
|------------------|---------------|----------------------|
| **CoC Category**              | Qual or Quant  | Dis, Con, or Neither              |
| **HMIS Participation Rate for Year-Round Beds (ES, TH, SH)**           | Qual or Quant   | Dis, Con, or Neither              |
| **Sheltered ES Homeless - Age 18 to 24, 2021**           | Qual or Quant  | Dis, Con, or Neither           |


##**QUESTION 2**

Construct a frequency table, relative frequency table, and bar chart to describe the distribution of one of the following variables, chosen according to your last name. State any fact that jumps out to you.

| Last Name Variable | Description |
| --- | --- |
| A-M | CoC Category |
| N-Z | Type of Count |


In [18]:
# Frequency table (Series.value_counts replaces the deprecated top-level pd.value_counts)
freq_table = df['CoC Category'].value_counts()
freq_table

Largely Suburban CoC       43
Largely Rural CoC          33
Other Largely Urban CoC    13
Major City CoC             11
Name: CoC Category, dtype: int64

In [19]:
#Relative frequency table
freq_table/len(df)

Largely Suburban CoC       0.43
Largely Rural CoC          0.33
Other Largely Urban CoC    0.13
Major City CoC             0.11
Name: CoC Category, dtype: float64
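
As an alternative to dividing by `len(df)`, pandas can compute relative frequencies directly with `value_counts(normalize=True)`. The sketch below rebuilds a stand-in for `df['CoC Category']` from the counts in the frequency table above (the name `category` is hypothetical) and places both tables side by side.

```python
import pandas as pd

# Hypothetical stand-in for df['CoC Category'], rebuilt from the frequency table above.
counts = {
    "Largely Suburban CoC": 43,
    "Largely Rural CoC": 33,
    "Other Largely Urban CoC": 13,
    "Major City CoC": 11,
}
category = pd.Series(
    [name for name, n in counts.items() for _ in range(n)],
    name="CoC Category",
)

# normalize=True returns proportions that sum to 1 instead of raw counts
rel_freq = category.value_counts(normalize=True)

# Combine counts and proportions into one table
summary = pd.DataFrame({
    "Frequency": category.value_counts(),
    "Relative Frequency": rel_freq,
})
print(summary)
```

In the notebook, replacing `category` with `df['CoC Category']` (or `df['Type of Count']`) should give the same kind of table for the real data.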

In [21]:
# Bar chart of category frequencies, built directly from the frequency table
fig = px.bar(x=freq_table.index, y=freq_table.values,
             labels={'x': 'CoC Category', 'y': 'Frequency'},
             title='Frequency Distribution Bar Chart')
fig.show()

## **QUESTIONS 3-6**

For questions 3-6, find your variable based on your last name and use that variable for all four questions.

| Last Name | Variable |
| --- | --- |
| A-F | Sheltered ES Homeless - Hispanic/Latino, 2021 |
| G-M | Sheltered ES Homeless - White, 2021 |
| N-S | Sheltered ES Homeless - Black or African American, 2021 |
| T-Z | Sheltered ES Homeless - Asian or Pacific Islander |



###**QUESTION 3**

Construct a histogram for your variable. Use Number of Intervals = 12.

In [24]:
# Histogram of the chosen variable, requesting 12 bins
fig = px.histogram(x=df['Sheltered ES Homeless - Hispanic/Latino, 2021'], nbins=12)
fig.show()
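
Plotly treats `nbins` as a request and may draw fewer bars in order to get round bin edges. If exactly 12 equal-width intervals are needed, one option is to compute them with NumPy. The sketch below is an assumption-laden illustration: `values` is a hypothetical stand-in holding only the nine Hispanic/Latino counts visible in the data preview above, not the full column.

```python
import numpy as np

# Hypothetical stand-in: the nine Hispanic/Latino values visible in the data preview.
values = np.array([0, 42, 7, 52, 21, 48, 28, 5, 72])

# bins=12 gives exactly 12 equal-width intervals between the min and max
counts, edges = np.histogram(values, bins=12)

# Print each interval with its count; only the last interval includes its right edge
for i, (left, right, n) in enumerate(zip(edges[:-1], edges[1:], counts)):
    closing = "]" if i == len(counts) - 1 else ")"
    print(f"[{left:.1f}, {right:.1f}{closing}: {n}")
```

In the notebook, `values` would be replaced by the chosen column of `df`, and the resulting `counts` could be drawn with a bar chart if a plot with exactly 12 intervals is required.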

###**QUESTION 4**

Construct a boxplot for your variable.  

In [23]:
px.box(x=df['Sheltered ES Homeless - Hispanic/Latino, 2021'])

###**QUESTION 5**

Calculate the following summary statistics for your variable: minimum, maximum, mean, median, standard deviation, Q1, and Q3. Paste the output below.
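
One way to produce these statistics is sketched below. It assumes pandas defaults: quartiles use linear interpolation and `std()` is the sample standard deviation (divisor n - 1), so results may differ slightly from other software. The Series `values` is a hypothetical stand-in containing only the nine Hispanic/Latino counts visible in the data preview. In the notebook it would be replaced by the chosen column of `df`.

```python
import pandas as pd

# Hypothetical stand-in: the nine Hispanic/Latino values visible in the data preview.
values = pd.Series(
    [0, 42, 7, 52, 21, 48, 28, 5, 72],
    name="Sheltered ES Homeless - Hispanic/Latino, 2021",
)

# Collect the requested statistics in one labeled Series
summary = pd.Series({
    "minimum": values.min(),
    "Q1": values.quantile(0.25),         # first quartile (linear interpolation)
    "median": values.median(),
    "mean": values.mean(),
    "Q3": values.quantile(0.75),         # third quartile (linear interpolation)
    "maximum": values.max(),
    "standard deviation": values.std(),  # sample SD, ddof=1
})
print(summary.round(2))
```

Calling `values.describe()` returns most of the same quantities in a single step, which can serve as a cross-check.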

###**QUESTION 6**

Use information from questions #3, #4, and #5 to describe your variable in terms of shape, center, spread, and outliers. Interpret your findings.
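
When describing outliers and shape, a common convention (not the only one) is to flag values more than 1.5 × IQR beyond the quartiles, and to compare the mean with the median as a rough indication of skew. The sketch below applies that convention to a hypothetical stand-in, `values`, containing only the nine Hispanic/Latino counts visible in the data preview. Conclusions for the assignment should come from the full column of `df`.

```python
import pandas as pd

# Hypothetical stand-in: the nine Hispanic/Latino values visible in the data preview.
values = pd.Series([0, 42, 7, 52, 21, 48, 28, 5, 72])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1                      # interquartile range, a measure of spread

# 1.5 * IQR fences: points outside them are flagged as potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

# A mean above the median hints at right skew; below it hints at left skew
mean_minus_median = values.mean() - values.median()

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"Flagged values: {outliers.tolist()}")
print(f"Mean - median = {mean_minus_median:.2f}")
```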

##**QUESTION 7**

Calculate and state the unemployment rate for graduates, the median salary for graduates, the unemployment rate for non-graduates, and the median salary for non-graduates for your major or intended major. Compare the results.

##**QUESTION 8**

Generate a paragraph of at least 100 words to address one of the following questions:

### **QUESTION 8a**

Discuss how analyzing your chosen data set using statistical methods could help you become better prepared for future courses in your major.

### **QUESTION 8b**

Discuss how analyzing your chosen data set using statistical methods could help you become better prepared for your future career.