<a href="https://colab.research.google.com/github/kemoj/Ireland-Homelessness-Analysis/blob/main/Copy_of_Homelessness_Ireland.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.
import kagglehub
dafrsp_homelessness_in_ireland_2019_october_2021_path = kagglehub.dataset_download('dafrsp/homelessness-in-ireland-2019-october-2021')

print('Data source import complete.')


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline

In [None]:
from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

In [None]:
import cufflinks as cf

In [None]:
init_notebook_mode(connected=True)

In [None]:
cf.go_offline()

Importing the CSV file to be analyzed.

In [None]:
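# Outside Kaggle the '../input' folder does not exist, so read from the folder returned by kagglehub in the first cell.
hmls = pd.read_csv(f'{dafrsp_homelessness_in_ireland_2019_october_2021_path}/Homeless_Ireland_2019_to_2021.csv')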

Checking the first five rows of the dataframe to make sure everything is all right.

In [None]:
hmls.head(5)

Using a heatmap to find null cells inside of the dataframe.

In [None]:
sns.heatmap(hmls.isnull(),yticklabels = False, cbar = False, cmap = 'viridis')

In [None]:
hmls.columns

So, most of the null cells are in the column for the number of child dependants in a family, with some data also missing from 'Number of Adults in Families', 'Number of Single-Parent families', and 'Number of Dependants in Families'. These values were not provided in the original data source.
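
The impression from the heatmap can be cross-checked with per-column counts of missing cells. The sketch below uses a small made-up frame named `toy` (its values and shortened column list are assumptions for illustration, not the real dataset); on the notebook's data the equivalent call would be `hmls.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `hmls`; all values are invented for illustration.
toy = pd.DataFrame({
    'Region': ['Dublin', 'West', 'Dublin', 'West'],
    'Total Adults': [4315, 300, 4515, 310],
    'Number of Families': [1200, np.nan, 1100, 60],
    'Number of Dependants in Families': [np.nan, np.nan, 2500, 120],
})

# Count the missing cells in each column, largest count first.
null_counts = toy.isnull().sum().sort_values(ascending=False)
print(null_counts)
```

A table like this gives the exact number of gaps per column, which the heatmap only shows approximately.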

To make later analysis of homelessness in the different regions of Ireland easier, a separate dataframe can be made for each region.

In [None]:
dub = hmls[hmls['Region'] == 'Dublin']
mide = hmls[hmls['Region'] == 'Mid-East']
midl = hmls[hmls['Region'] == 'Midlands']
midw = hmls[hmls['Region'] == 'Mid-West']
ne = hmls[hmls['Region'] == 'North-East']
nw = hmls[hmls['Region'] == 'North-West']
se = hmls[hmls['Region'] == 'South-East']
sw = hmls[hmls['Region'] == 'South-West']
w = hmls[hmls['Region'] == 'West']

However, since each region now has its own dataframe, the 'Region' column is no longer needed in these dataframes.

In [None]:
# Reassign instead of using inplace=True: the regional frames are slices of hmls,
# and dropping in place on a slice triggers pandas' SettingWithCopyWarning.
dub = dub.drop(columns = 'Region')
mide = mide.drop(columns = 'Region')
midl = midl.drop(columns = 'Region')
midw = midw.drop(columns = 'Region')
ne = ne.drop(columns = 'Region')
nw = nw.drop(columns = 'Region')
se = se.drop(columns = 'Region')
sw = sw.drop(columns = 'Region')
w = w.drop(columns = 'Region')
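
As an alternative to the nine filters and nine drops above, the regional frames could be built in one pass with `groupby`. This is only a sketch on a made-up frame (`toy` and its values are assumptions); the dictionary name `by_region` is hypothetical and not used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for `hmls`; all values are invented for illustration.
toy = pd.DataFrame({
    'Region': ['Dublin', 'Dublin', 'West', 'Mid-East'],
    'Month': ['January', 'February', 'January', 'January'],
    'Year': [2019, 2019, 2019, 2019],
    'Total Adults': [4000, 4100, 300, 250],
})

# One dataframe per region, keyed by region name, with the 'Region' column removed.
by_region = {
    region: frame.drop(columns='Region')
    for region, frame in toy.groupby('Region')
}

print(sorted(by_region))
print(by_region['Dublin'])
```

With this layout, `by_region['Dublin']` would play the role of `dub`, and a loop over `by_region.items()` could replace the repeated `describe()` cells.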

Next, let's look at some statistics for each region individually.

In [None]:
#Descriptive statistics of Dublin
dub.describe()

In [None]:
#Descriptive statistics of the Mid-East of Ireland
mide.describe()

In [None]:
#Descriptive statistics of the Midlands of Ireland
midl.describe()

In [None]:
#Descriptive statistics of the Mid-West of Ireland
midw.describe()

In [None]:
#Descriptive Statistics of the North East of Ireland
ne.describe()

In [None]:
#Descriptive statistics of the North West of Ireland
nw.describe()

In [None]:
#Descriptive Statistics of the South East of Ireland
se.describe()

In [None]:
#Descriptive Statistics of the South West of Ireland
sw.describe()

In [None]:
#Descriptive statistics on the west of Ireland
w.describe()

In [None]:
#Descriptive Statistics of Ireland as a whole
hmls.describe()

In [None]:
#Descriptive statistics can be computed without creating new dataframes; however, the individual dataframes were created
#for the ease of later analyses.
hmls[hmls['Region'] == 'West'].describe()

Grouping the months of each year together in order to analyze 2019, 2020, and 2021 more easily and to make the figures easier to read.

In [None]:
#Sum each of the listed columns for every region and year
hmls_year = hmls.groupby(['Region','Year'])[['Total Adults', 'Male Adults', 'Female Adults',
       'Adults Aged 18-24', 'Adults Aged 25-44', 'Adults Aged 45-64',
       'Adults Aged 65+',
       'Number of people who accessed Private Emergency Accommodation',
       'Number of people who accessed Supported Temporary Accommodation',
       'Number of people who accessed Temporary Emergency Accommodation',
       'Number of people who accessed Other Accommodation',
       'Number of Families', 'Number of Adults in Families',
       'Number of Single-Parent families', 'Number of Dependants in Families']].sum().reset_index()

Checking to make sure that the table came out clearly

In [None]:
hmls_year.head(5)
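
One detail worth keeping in mind when summing columns that contain null cells: by default pandas skips NaN values, so a group with no data at all is reported as 0 rather than as missing. The sketch below, on a made-up frame (`toy` and its values are assumptions), shows the difference that the `min_count` argument of `sum` makes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `hmls`; all values are invented for illustration.
toy = pd.DataFrame({
    'Region': ['Dublin', 'Dublin', 'West', 'West'],
    'Year': [2021, 2021, 2021, 2021],
    'Number of Families': [1000.0, 950.0, np.nan, np.nan],
})

grouped = toy.groupby(['Region', 'Year'])['Number of Families']

# Default behaviour: NaN is skipped, so an all-NaN group sums to 0.
default_sum = grouped.sum()

# With min_count=1 a group needs at least one real value, otherwise it stays NaN.
strict_sum = grouped.sum(min_count=1)

print(default_sum)
print(strict_sum)
```

Whether 0 or NaN is the better choice depends on the figure; NaN makes it harder to mistake missing data for a real count of zero.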

To start with the figures, let's look at the total number of homeless adults in Ireland for each year, keeping in mind that data for four months of 2021 is missing. To get just the total for each year, we'll group the 'hmls_year' dataframe by year and sum each column. The result should be a new dataframe with one row per year and the total of each column.

In [None]:
#numeric_only=True keeps the text column 'Region' out of the sum
hmls_totals = hmls_year.groupby(['Year']).sum(numeric_only = True).reset_index()

Now, let's see the totals for each year, then graph the total number of homeless adults per year:

In [None]:
hmls_totals

In [None]:
plt.figure(figsize = (24,12))
sns.barplot(x = 'Year', y = 'Total Adults', data = hmls_totals)

In [None]:
fig1 = px.bar(hmls_totals, x = 'Year', y = 'Total Adults', height = 750)
fig1

Now, using a seaborn bar plot to compare the number of homeless adults in each region. Note that seaborn does not add the months together: by default, sns.barplot plots the mean of the monthly values for each region, with an error bar.

In [None]:
plt.figure(figsize=(24,12))

sns.barplot(x = 'Region', y = 'Total Adults', data = hmls)
plt.xticks(plt.xticks()[0], rotation=90)
plt.tight_layout()
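
To make the difference between an average and a total concrete, the sketch below computes both per region on a made-up frame (`toy` and its values are assumptions). The commented line shows how the seaborn plot could be switched to totals through its `estimator` parameter; it is left as a comment so the example runs with pandas alone.

```python
import pandas as pd

# Hypothetical stand-in for `hmls`; all values are invented for illustration.
toy = pd.DataFrame({
    'Region': ['Dublin', 'Dublin', 'Dublin', 'West', 'West', 'West'],
    'Total Adults': [4000, 4200, 4400, 300, 310, 320],
})

# Mean (what sns.barplot draws by default) next to the sum of the monthly rows.
per_region = toy.groupby('Region')['Total Adults'].agg(['mean', 'sum'])
print(per_region)

# To plot totals instead of means with seaborn, one option would be:
# sns.barplot(x='Region', y='Total Adults', data=toy, estimator=sum)
```

The ranking of regions is the same either way here, but the scale of the y-axis is not, which matters when quoting numbers from the figure.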

Using Seaborn to see the average monthly number of homeless adults in each region for each of the past three years.

In [None]:
plt.figure(figsize=(24,12))

sns.barplot(x = 'Region', y = 'Total Adults', data = hmls, hue = 'Year')
plt.xticks(plt.xticks()[0], rotation=90)
plt.tight_layout()

One important thing to remember when looking at this figure is that 2021 is missing data for four months: March, July, November, and December. Any yearly total for 2021 therefore covers eight months rather than twelve, which is a lot of missing information.


Now, using both iplot and plotly express to create interactive figures. These worked better with the dataframe 'hmls_year', but each column is still separated by year.

In [None]:
hmls_year.iplot(kind = 'bar', x = 'Region', y = 'Total Adults', sortbars = True, barmode = 'group')

In [None]:
fig2 = px.bar(hmls_year, x = 'Region', y = 'Total Adults', color = 'Year', height = 500)
fig2.update_layout(barmode='group')
fig2

Using plotly express, this can be made a bit clearer for each year and each region. As can be seen, homeless adults in Ireland are heavily concentrated in Dublin, which has by far the highest total, while the next highest is the South-West (Cork and Kerry counties).

In [None]:
fig3 = px.bar(hmls_year, x = 'Year', y = 'Total Adults', color = 'Region', height = 500)
fig3.update_layout(barmode='group')
fig3

As can be seen, the Dublin totals for 2019 and 2020 were around 53k and 51k (rounded), respectively, while in 2021 the total dropped to 33k. These are sums of the monthly counts of homeless adults, not counts of distinct people. Let's plot a monthly chart for Dublin to see how many homeless adults there were in March, July, November, and December of 2019 and 2020.

In [None]:
fig4 = px.bar(dub, x = 'Month', y = 'Total Adults', color = 'Year')
fig4.update_layout(barmode = 'group')
fig4
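
If the months in a chart like this do not come out in calendar order, one option is to turn 'Month' into an ordered categorical before plotting. The sketch below assumes the months are stored as full English names (as in the filters used in this notebook) and works on a made-up frame; `toy`, `month_order`, and the values are hypothetical.

```python
import pandas as pd

# Calendar order for the month names.
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# Hypothetical stand-in for `dub`; all values are invented for illustration.
toy = pd.DataFrame({
    'Month': ['March', 'January', 'February'],
    'Year': [2019, 2019, 2019],
    'Total Adults': [4315, 4200, 4250],
})

# An ordered categorical sorts by calendar position instead of alphabetically.
toy['Month'] = pd.Categorical(toy['Month'], categories=month_order, ordered=True)
toy_sorted = toy.sort_values(['Year', 'Month'])
print(toy_sorted)
```

Passing the sorted frame to the plotting call should then keep the bars in calendar order.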

So, in March there were 4315 homeless adults in Dublin in 2019 and 4515 in 2020; in July, 4300 in 2019 and 4188 in 2020; in November, 4509 in 2019 and 4243 in 2020; and in December, 4509 in 2019 and 4158 in 2020. Let's get the average for each of these months.

In [None]:
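#numeric_only=True skips the text column 'Month', which newer pandas versions refuse to average
dub_av_mar = dub[dub['Month'] == 'March'].mean(numeric_only = True)
dub_av_july = dub[dub['Month'] == 'July'].mean(numeric_only = True)
dub_av_nov = dub[dub['Month'] == 'November'].mean(numeric_only = True)
dub_av_dec = dub[dub['Month'] == 'December'].mean(numeric_only = True)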

In [None]:
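#Select the value by column label rather than by position
print('March')
print(dub_av_mar['Total Adults'])
print('July')
print(dub_av_july['Total Adults'])
print('November')
print(dub_av_nov['Total Adults'])
print('December')
print(dub_av_dec['Total Adults'])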

In [None]:
av_miss = dub_av_mar['Total Adults'] + dub_av_july['Total Adults'] + dub_av_nov['Total Adults'] + dub_av_dec['Total Adults']
av_miss

So, the average number of homeless adults in Dublin is 4,415 in March, 4,244 in July, 4,376 in November, and 4,333.5 in December, for a total of about 17,369. Adding that to the 33,453 counted so far in 2021 would give approximately 50,822. However, this is completely speculative, and not strong enough evidence that the number of homeless adults in Dublin has continued to decrease since 2019, though it may have, as that was the trend over the previous two years. There is not enough data, across the years or within this year, to make a proper estimate. The missing months can be seen clearly in the plot below, which shows the total number of homeless adults in Dublin by month for each year.
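
The arithmetic behind this estimate can be reproduced directly from the monthly counts quoted earlier. The sketch below uses only those quoted figures, taking the last pair of numbers as the December values; the variable names are hypothetical, and the result is the same rough projection, not a forecast.

```python
# Dublin monthly counts (2019, 2020) as quoted in the text.
counts = {
    'March': (4315, 4515),
    'July': (4300, 4188),
    'November': (4509, 4243),
    'December': (4509, 4158),
}

# Average of the two available years for each month missing from 2021.
averages = {month: sum(values) / len(values) for month, values in counts.items()}

# Sum of the four averages: a stand-in for the missing 2021 months.
missing_estimate = sum(averages.values())

# Total counted in Dublin for 2021 so far, as quoted in the text.
observed_2021 = 33453
projected_2021 = observed_2021 + missing_estimate

print(averages)
print(missing_estimate, projected_2021)
```

Rounded half up, the two printed totals correspond to the 17,369 and 50,822 quoted above.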

In [None]:
fig5 = px.bar(dub, x = 'Year', y = 'Total Adults', color = 'Month', height = 500)
fig5.update_layout(barmode = 'group')
fig5

Now, let's have a look at the different types of accommodation available to homeless people. To start, let's compare how much each type of accommodation has been used in total.

In [None]:
accom_year = hmls_year.groupby(['Year'])[['Number of people who accessed Private Emergency Accommodation',
       'Number of people who accessed Supported Temporary Accommodation',
       'Number of people who accessed Temporary Emergency Accommodation',
       'Number of people who accessed Other Accommodation']].sum()

In [None]:
accom_sum = accom_year[['Number of people who accessed Private Emergency Accommodation',
       'Number of people who accessed Supported Temporary Accommodation',
       'Number of people who accessed Temporary Emergency Accommodation',
       'Number of people who accessed Other Accommodation']].sum()

In [None]:
accom_sums = accom_sum.to_frame().reset_index()

In [None]:
accom_sums

In [None]:
accom_sums.rename(columns = {'index':'Type of Accommodation', 0:'Total'}, inplace = True)

In [None]:
accom_sums
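
The sum, `to_frame`, `reset_index`, and `rename` steps above can also be written as a single chain. This is a sketch on a made-up frame (`toy`, `accom_cols`, `totals`, and the values are assumptions), not a replacement for the cells in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for `hmls_year`; all values are invented for illustration.
toy = pd.DataFrame({
    'Year': [2019, 2020],
    'Number of people who accessed Private Emergency Accommodation': [100, 90],
    'Number of people who accessed Supported Temporary Accommodation': [80, 70],
})

# Pick the accommodation columns by their shared prefix.
accom_cols = [c for c in toy.columns if c.startswith('Number of people who accessed')]

# Sum each column, name the index, and turn the result into a two-column frame.
totals = (
    toy[accom_cols]
    .sum()
    .rename_axis('Type of Accommodation')
    .reset_index(name='Total')
)
print(totals)
```

The resulting frame has the same two columns, 'Type of Accommodation' and 'Total', that the bar chart expects.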

In [None]:
fig6 = px.bar(accom_sums, x = 'Type of Accommodation', y = 'Total', height = 1000)
fig6

So, the most heavily used types of accommodation for homeless adults in Ireland over the past three years are Private Emergency Accommodation and Supported Temporary Accommodation. How has this changed each year, though?

In [None]:
accom_year['Total'] = accom_year.sum(axis = 1)

In [None]:
accom_year

In [None]:
accom_years = accom_year.reset_index()

In [None]:
accom_years

In [None]:
fig7 = px.bar(accom_years, x = 'Year', y = 'Total', height = 1000)
fig7

In [None]:
#Recalling fig1, the total number of homeless adults, to compare with fig7, the total accommodation accessed
fig1

So, while the total number of people accessing accommodation has dropped each year, so has the total number of homeless adults in Ireland. Still, there are more ways to look at this data, which has been organized in a few different ways to make later analyses easier. Next is a look at the totals for the different types of accommodation used in Ireland.

In [None]:
accom = hmls[['Number of people who accessed Private Emergency Accommodation',
       'Number of people who accessed Supported Temporary Accommodation',
       'Number of people who accessed Temporary Emergency Accommodation',
       'Number of people who accessed Other Accommodation']].sum().reset_index()

In [None]:
#Give the two columns readable names
accom.rename(columns = {'index':'Accommodation',0:'Total'}, inplace = True)

In [None]:
accom

In [None]:
#Bar plot of the total for each type of accommodation
plt.figure(figsize = (24,12))
sns.barplot(x = 'Accommodation', y = 'Total', data = accom)
plt.xticks(plt.xticks()[0],rotation = 90)
plt.tight_layout()

In [None]:
fig8 = px.bar(accom, x = 'Accommodation', y = 'Total', height = 1000)
fig8

It seems that Private Emergency Accommodation was used the most, with almost 105k homeless people accessing this type of accommodation, followed by Supported Temporary Accommodation, with nearly 94k homeless people across Ireland accessing this type of accommodation. Now, let's have a closer look at each type of accommodation over years and regions.

In [None]:
plt.figure(figsize = (24,12))
sns.barplot(x = 'Region', y = 'Number of people who accessed Private Emergency Accommodation', hue = 'Year', data = hmls)
plt.xticks(plt.xticks()[0], rotation=90)
plt.tight_layout()

Here's an interactive figure of Private Emergency Accommodation

In [None]:
fig9 = px.bar(hmls_year, x = 'Year', y = 'Number of people who accessed Private Emergency Accommodation', color = 'Region', height = 500)
fig9.update_layout(barmode = 'group')
fig9

So, in Dublin, the number rose slightly between 2019 and 2020. The same trend was seen in the Mid-East and the North-West of Ireland. If the Irish Independent is correct, then this is a step in the right direction. Let's look further at the different types of accommodation, and how the numbers compare to Private Emergency Accommodation.

In [None]:
fig10 = px.bar(hmls_year, x = 'Year', y = 'Number of people who accessed Supported Temporary Accommodation', color = 'Region', height = 500)
fig10.update_layout(barmode = 'group')
fig10

Supported Temporary Accommodation generally means places like hostels, intended for people who are generally unable to support themselves (thus, supported housing).

In [None]:
fig11 = px.bar(hmls_year, x = 'Year', y = 'Number of people who accessed Temporary Emergency Accommodation', color = 'Region')
fig11.update_layout(barmode = 'group')
fig11

In [None]:
plt.figure(figsize=(24,12))

sns.barplot(x = 'Region', y = 'Number of people who accessed Supported Temporary Accommodation', data = hmls, hue = 'Year')
plt.xticks(plt.xticks()[0], rotation=90)
plt.tight_layout()

In [None]:
fig12 = px.bar(hmls_year, x = 'Year', y = 'Number of people who accessed Supported Temporary Accommodation', color = 'Region')
fig12.update_layout(barmode = 'group')
fig12

In [None]:
plt.figure(figsize = (24,12))

sns.barplot(x = 'Region', y = 'Number of people who accessed Temporary Emergency Accommodation', data = hmls, hue = 'Year')
plt.xticks(plt.xticks()[0], rotation = 90)
plt.tight_layout()



In [None]:
fig13 = px.bar(hmls_year, x = 'Year', y = 'Number of people who accessed Temporary Emergency Accommodation', color = 'Region')
fig13.update_layout(barmode = 'group')
fig13

In [None]:
fig14 = px.bar(hmls_year, x = 'Year', y = 'Number of people who accessed Other Accommodation', color = 'Region', height = 500)
fig14.update_layout(barmode = 'group')
fig14

It seems that, overall, the use of accommodation for the homeless has dropped, in line with the drop in the yearly totals of homeless adults noted earlier; the incomplete 2021 data makes any firmer conclusion difficult.

What about looking at the averages across different regions? Let's start with Dublin, since we've already had a look at that region. First, we have to get the average of each category across all months and years, then turn that into a dataframe, reset the index, and give the columns proper names.

In [None]:
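#Drop the row labelled 0, then average the numeric columns only (the text column 'Month' cannot be averaged)
dubav = dub.drop([0]).mean(numeric_only = True)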

In [None]:
dubav = dubav.to_frame()

In [None]:
dubav

In [None]:
dubav.reset_index(inplace = True)
dubav.rename(columns = {'index':'Homelessness Dublin',0: 'Average'}, inplace = True)
dubav

In [None]:
#iplot draws a plotly figure, so no matplotlib figure is needed here
dubav.iplot(kind = 'bar', x = 'Homelessness Dublin', y = 'Average')

In [None]:
fig15 = px.bar(dubav, x = 'Homelessness Dublin', y = 'Average', height=750)
fig15
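
The same averages can be computed for every region at once with `groupby`, rather than one regional dataframe at a time. The sketch below uses a made-up frame (`toy`, `region_av`, and the values are assumptions); 'Year' is dropped first so that it is not averaged along with the counts.

```python
import pandas as pd

# Hypothetical stand-in for `hmls`; all values are invented for illustration.
toy = pd.DataFrame({
    'Region': ['Dublin', 'Dublin', 'West', 'West'],
    'Month': ['January', 'February', 'January', 'February'],
    'Year': [2019, 2019, 2019, 2019],
    'Total Adults': [4000, 4200, 300, 320],
    'Number of Families': [1000, 1100, 50, 70],
})

# Average of each numeric category per region; the text column 'Month' is skipped.
region_av = toy.drop(columns=['Year']).groupby('Region').mean(numeric_only=True)
print(region_av)
```

Calling `region_av.reset_index()` would give a frame that can be passed straight to a bar chart with 'Region' on the x-axis.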

So, these are some analyses that can be done, and some simple insights that can be drawn from this data. A few key takeaways are:
1) There were more homeless adults in Ireland in 2019 than in either 2020 or 2021 (though it should be remembered that four months are missing from 2021).

2) The highest number of homeless adults is in the Dublin region, while the second highest (which is significantly lower) is in the South-West region (Cork and Kerry counties).

3) The number of people using accommodation, or of places available (the website does not make clear which), dropped between 2019 and 2020, but it is still higher than the number of homeless adults in Ireland. It should be noted, however, that it is not clear from the website whether these figures include repeat users of the accommodation, or what percentage of homeless adults used the accommodation; therefore, any clear interpretation of these findings would be totally speculative.

4) Private Emergency Accommodation makes up the largest share of accommodation, with Supported Temporary Accommodation second. As pointed out several times, many politicians believe that Private Accommodation should be significantly reduced, and that better measures should take its place.