# Establishing the link between Crime and Homeless Shelters
According to <a href="https://crim.sas.upenn.edu/sites/default/files/Ridgeway_Effect%20of%20Emergency%20Shelters-v5_1.2.2018.pdf">one study</a>, the presence of a homeless shelter appears to cause crime to increase by 56% within 100 m of that shelter, with thefts from vehicles, other thefts, and vandalism driving the increase. The Vancouver Open Data Catalogue provides a list of homeless shelters within city limits, along with their exact coordinates. By plotting the shelters against crime in their vicinity, we will try to establish for ourselves whether this claim holds, at least for the city of Vancouver.
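The "within 100 m" radius in the cited figure can be made concrete with a great-circle distance between a shelter and a crime location. The sketch below is not part of this notebook's analysis, which joins on hundred blocks instead; the function names `haversine_m` and `within_radius` and the Earth radius constant are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(shelter, crime, radius_m=100):
    """True if the crime location lies within radius_m metres of the shelter."""
    return haversine_m(*shelter, *crime) <= radius_m


# Coordinates taken from the shelter and crime tables in this notebook
covenant_house = (49.2754042223039, -123.126322529911)
pender_crime = (49.280454355702865, -123.10100566349294)
print(within_radius(covenant_house, pender_crime))
```

A distance test like this avoids depending on address strings, at the cost of comparing every shelter with every crime unless the data is spatially indexed first.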

In [3]:
import pandas as pd
import reverse_geocoder as rg
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
from geocodio import GeocodioClient
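import os
# Read the Geocodio API key from the environment instead of hard-coding it
# (the variable name GEOCODIO_API_KEY is an assumption)
API_KEY = os.environ['GEOCODIO_API_KEY']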

In [4]:
#Create Spark Session and context
spark = SparkSession\
    .builder\
    .appName("example code")\
    .config("spark.driver.extraClassPath","/home/jim/spark-2.4.0-bin-hadoop2.7/jars/mysql-connector-java-5.1.49.jar")\
    .getOrCreate()
spark.sparkContext.setLogLevel('WARN')
sc = spark.sparkContext

Load the Homelessness Dataset

In [6]:
homeless = spark.read.format("csv").option("header", "true").load("../Data/Homeless_shelters.csv")
homeless.show(10,truncate=True)
print('The homeless shelters dataset has {} rows'.format(homeless.count()))

+--------------------+----------------+-----------------+--------------------+------------+-----+----+-----+
|            FACILITY|             LAT|             LONG|            CATEGORY|       PHONE|MEALS|PETS|CARTS|
+--------------------+----------------+-----------------+--------------------+------------+-----+----+-----+
|      Covenant House|49.2754042223039|-123.126322529911| Youth (all genders)|604-685-7474|  yes|  no|   no|
|  Aboriginal Shelter|49.2715743534663|-123.099346861579|Adults (all genders)|604-347-0299|  yes| yes|  yes|
|      Anchor of Hope|49.2821327941377|-123.101324923635|Adults (all genders)|604-646-6899|  yes|  no|   no|
|       Yukon Shelter|49.2668908859729| -123.11236726194|Adults (all genders)|604-264-1680|   no| yes|   no|
|           Crosswalk|49.2818050804002|-123.107672478832|Adults (all genders)|604-669-4349|   no|  no|   no|
|Dusk to Dawn-Dire...|49.2795671693397|-123.127792927078| Youth (all genders)|604-633-1472|  yes|  no|   no|
|        New Founta

#### Here, we have lat/long pairs in the dataset, but this is not enough to join it to another dataset based on location. The problem is that lat/long pairs rarely match exactly: at 13 decimal places, a coordinate pair identifies a single point, so a shelter and a nearby crime will almost never share the same values.
We, on the other hand, are considering crime levels by area, so we need a way to generate a 'HUNDRED_BLOCK' field from each lat/long pair. The Geocodio reverse-geocoding API can be used for this.
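The masking step itself can be sketched independently of the API. The helper below is a minimal illustration, not the notebook's own code: the name `to_hundred_block` is hypothetical, and the masking rule (replace the last two digits with `XX`, or the last digit with `X` for numbers below 100) is inferred from sample rows in the crime dataset shown further down, such as `1XX E PENDER ST` and `6X E 52ND AVE`.

```python
def to_hundred_block(number: str, street: str) -> str:
    """Mask a house number into a hundred-block label, e.g. '1280' -> '12XX'.

    Assumed rule: numbers with three or more digits lose their last two digits,
    shorter numbers lose only their last digit.
    """
    digits = "".join(ch for ch in number if ch.isdigit())
    if not digits:
        raise ValueError("house number contains no digits: {!r}".format(number))
    if len(digits) >= 3:
        masked = digits[:-2] + "XX"
    else:
        masked = digits[:-1] + "X"
    return "{} {}".format(masked, street.upper())


# House numbers here are made up for illustration
print(to_hundred_block("1280", "Seymour St"))
print(to_hundred_block("326", "Blood Alley Sq"))
print(to_hundred_block("68", "E 52nd Ave"))
```

Masking from the right rather than keeping a fixed number of leading digits keeps three-digit addresses in the correct block.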

In [8]:
homeless = homeless.select('FACILITY','LAT','LONG')
latitude_list = homeless.select("LAT").rdd.flatMap(lambda x: x).collect()
longitude_list = homeless.select("LONG").rdd.flatMap(lambda x: x).collect()
neighbourhood_list = []
client = GeocodioClient(API_KEY)

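for lat, lng in zip(latitude_list, longitude_list):
    # Reverse-geocode each shelter and use the first result returned
    components = client.reverse((lat, lng))['results'][0]['address_components']
    # Build a HUNDRED_BLOCK string such as '12XX SEYMOUR ST'.
    # Note: number[:2] keeps the first two digits, so the label only matches the
    # crime dataset's format when the house number has four digits.
    neighbourhood_list.append(components['number'][:2] + 'XX ' + components['formatted_street'].upper())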

temp_df = homeless.toPandas()
temp_df['HUNDRED_BLOCK'] = neighbourhood_list
homeless = spark.createDataFrame(temp_df)
homeless.show(15,truncate=False) 

+------------------------------------+----------------+-----------------+-------------------+
|FACILITY                            |LAT             |LONG             |HUNDRED_BLOCK      |
+------------------------------------+----------------+-----------------+-------------------+
|Covenant House                      |49.2754042223039|-123.126322529911|12XX SEYMOUR ST    |
|Aboriginal Shelter                  |49.2715743534663|-123.099346861579|24XX NORTHERN ST   |
|Anchor of Hope                      |49.2821327941377|-123.101324923635|13XX E CORDOVA ST  |
|Yukon Shelter                       |49.2668908859729|-123.11236726194 |20XX YUKON ST      |
|Crosswalk                           |49.2818050804002|-123.107672478832|10XX W HASTINGS ST |
|Dusk to Dawn-Directions Youth Centre|49.2795671693397|-123.127792927078|11XX BURRARD ST    |
|New Fountain                        |49.2828738118386|-123.105225398756|36XX BLOOD ALLEY SQ|
|Catholic Charities Men's Hostel     |49.2777650064068|-123.

### Now we load the crime dataset, our main source of crime data

In [27]:
crime_df = spark.read.format("csv").option("header", "true").load("../Data/crime/crime_all_years_latlong.csv")
# Keep only the columns we need
crime_df = crime_df.select(['TYPE','HUNDRED_BLOCK','LATITUDE','LONGITUDE'])
crime_df = crime_df.dropna(how='any')
crime_df.show(10,truncate=True)
print("Crime Dataset has {} rows".format(crime_df.count()))

+--------------------+------------------+------------------+-------------------+
|                TYPE|     HUNDRED_BLOCK|          LATITUDE|          LONGITUDE|
+--------------------+------------------+------------------+-------------------+
|            Mischief|     6X E 52ND AVE| 49.22285547453633|-123.10457767461014|
|    Theft of Vehicle|   71XX NANAIMO ST| 49.21942208176436|-123.05928356709362|
|Break and Enter C...|   1XX E PENDER ST|49.280454355702865|-123.10100566349294|
|            Mischief|     9XX CHILCO ST| 49.29261448054877|-123.13962081805273|
|            Mischief|     9XX CHILCO ST| 49.29260865723727|-123.13945233120421|
|            Mischief|24XX E HASTINGS ST|49.281126361961825| -123.0554729922974|
|  Theft from Vehicle| 8X W BROADWAY AVE|49.263002922167225|-123.10655743565438|
|            Mischief|24XX E HASTINGS ST| 49.28112610578195|-123.05525671257254|
|  Theft from Vehicle|   29XX W 14TH AVE| 49.25958751890934| -123.1707943860336|
|  Theft from Vehicle|   29X

#### Let's check whether the two datasets share any HUNDRED_BLOCK values to join on; if not, we will need to join on another field

In [26]:
print("Common values are:")
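# Collect the distinct hundred blocks from each dataset and intersect them
crime_blocks = set(crime_df.select("HUNDRED_BLOCK").distinct().rdd.flatMap(lambda x: x).collect())
shelter_blocks = set(homeless.select("HUNDRED_BLOCK").rdd.flatMap(lambda x: x).collect())
crime_blocks.intersection(shelter_blocks)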

Common values are:


{'10XX W HASTINGS ST',
 '11XX BURRARD ST',
 '11XX W 10TH AVE',
 '12XX SEYMOUR ST',
 '15XX ROBSON ST',
 '16XX E 1ST AVE',
 '18XX E 1ST AVE',
 '20XX YUKON ST',
 '32XX E HASTINGS ST'}

We're in luck: some crimes were recorded in the same hundred blocks as homeless shelters.
#### Now let us join the datasets. We will retain the crime dataset completely because we want to see whether homeless shelters are located in areas where crime concentration is relatively high.

In [55]:
# Create temp views for Spark SQL
homeless.createOrReplaceTempView("DF1")
crime_df.createOrReplaceTempView("DF2")

# Right join: keep every crime row, with FACILITY left null where no shelter shares the block
joined_df = spark.sql("""SELECT DF1.FACILITY, DF2.* 
                      FROM DF1 RIGHT JOIN DF2 ON DF1.HUNDRED_BLOCK = DF2.HUNDRED_BLOCK""")
joined_df.show(15,truncate=True)
print("The new Dataset has {} rows".format(joined_df.count()))

+--------+--------------------+--------------------+-----------------+-------------------+
|FACILITY|                TYPE|       HUNDRED_BLOCK|         LATITUDE|          LONGITUDE|
+--------+--------------------+--------------------+-----------------+-------------------+
|    null|Vehicle Collision...|10XX BLOCK HAMILT...|49.27679130655303|-123.11977504146066|
|    null|  Theft from Vehicle|     10XX E 17TH AVE|49.25543143653735|-123.08315357881993|
|    null|Break and Enter R...|     10XX E 17TH AVE|49.25543110184606|-123.08311867268819|
|    null|            Mischief|     10XX E 17TH AVE|49.25535289926625|-123.08354015601581|
|    null|  Theft from Vehicle|     10XX E 17TH AVE|49.25535289926625|-123.08354015601581|
|    null|  Theft from Vehicle|     10XX E 17TH AVE|49.25535289926625|-123.08354015601581|
|    null|  Theft from Vehicle|     10XX E 17TH AVE|49.25542775484542|-123.08276892427928|
|    null|Break and Enter R...|     10XX E 17TH AVE|49.25542942852742|-123.08294372976631|

After the join, the FACILITY column is null when no homeless shelter is in the same hundred block as a crime, and holds the shelter name otherwise. For visualization purposes, it is better to convert this column to a flag: 0 means no shelter in the vicinity, and 1 means there is one.

In [51]:
# Flag each crime: 0 if no shelter shares its hundred block, 1 otherwise
joined_df = joined_df.withColumn('FACILITY', when(col('FACILITY').isNull(), lit(0)).otherwise(lit(1)))
joined_df.show(15)

+--------+--------------------+--------------------+-----------------+-------------------+
|FACILITY|                TYPE|       HUNDRED_BLOCK|         LATITUDE|          LONGITUDE|
+--------+--------------------+--------------------+-----------------+-------------------+
|       0|Vehicle Collision...|10XX BLOCK HAMILT...|49.27679130655303|-123.11977504146066|
|       0|  Theft from Vehicle|     10XX E 17TH AVE|49.25543143653735|-123.08315357881993|
|       0|Break and Enter R...|     10XX E 17TH AVE|49.25543110184606|-123.08311867268819|
|       0|            Mischief|     10XX E 17TH AVE|49.25535289926625|-123.08354015601581|
|       0|  Theft from Vehicle|     10XX E 17TH AVE|49.25535289926625|-123.08354015601581|
|       0|  Theft from Vehicle|     10XX E 17TH AVE|49.25535289926625|-123.08354015601581|
|       0|  Theft from Vehicle|     10XX E 17TH AVE|49.25542775484542|-123.08276892427928|
|       0|Break and Enter R...|     10XX E 17TH AVE|49.25542942852742|-123.08294372976631|

In [56]:
# Write a single CSV part file (Spark creates a directory named Homeless.csv)
joined_df.repartition(1).write.format("csv").option("header", "true").save("Homeless.csv")
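Before the file goes to Tableau, the flag can also be summarized numerically by comparing the average number of crimes per hundred block with and without a shelter. The sketch below uses plain Python on a handful of made-up rows (the block names come from the tables above, the counts do not); `mean_crimes_per_block` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter


def mean_crimes_per_block(rows):
    """rows: iterable of (facility_flag, hundred_block) tuples, one per crime.

    Returns {flag: average crimes per block}, or None for a flag with no blocks.
    """
    crimes_per_block = Counter()
    flag_of_block = {}
    for flag, block in rows:
        crimes_per_block[block] += 1
        flag_of_block[block] = flag

    totals = {0: 0, 1: 0}
    blocks = {0: 0, 1: 0}
    for block, n_crimes in crimes_per_block.items():
        flag = flag_of_block[block]
        totals[flag] += n_crimes
        blocks[flag] += 1
    return {flag: (totals[flag] / blocks[flag] if blocks[flag] else None) for flag in (0, 1)}


# Illustrative rows only: these counts are invented, not taken from the data
rows = [
    (1, "12XX SEYMOUR ST"), (1, "12XX SEYMOUR ST"), (1, "12XX SEYMOUR ST"),
    (0, "10XX E 17TH AVE"), (0, "10XX E 17TH AVE"),
    (0, "9XX CHILCO ST"),
]
result = mean_crimes_per_block(rows)
print(result)
```

Averaging per block rather than counting raw rows matters because blocks without shelters vastly outnumber blocks with them.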

## Here is the Tableau visualization:
The locations with homeless shelters are plotted on the map as large red markers. The size of each marker represents the number of homeless shelters, whereas the intensity of the blue colouring depicts crime intensity. The Tableau Public dashboard can be found at <a href="https://public.tableau.com/views/Crime_vs_Graffiti/Dashboard1?:language=en&:display_count=y&publish=yes&:origin=viz_share_link">https://public.tableau.com/views/Crime_vs_Graffiti/Dashboard1?:language=en&:display_count=y&publish=yes&:origin=viz_share_link</a><br>
<img src="../Visualisation/Raw/Homeless.png">