<h2> Regions and Districts of Uganda Known for Homicides, Economic, Sex-Related, and Child-Related Crimes in 2021</h2>

In this lab, you will prepare the data for analysis using the geopy, geocoder, and geopandas Python packages, then show on a map of Uganda the locations of the police regions and districts where these crimes were reported. Include pointers and markers for the regions where crime is most prevalent.

<b> By Michael Kumakech</b>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv("crime_uganda.csv", encoding="latin1")
df.head()

Unnamed: 0,REGIONS_DISTRICTS,HOMICIDES,ECONOMIC_CRIMES,SEX_RELATED_OFFENCES,CHILD_RELATED_OFFENCES
0,C.I.D Headquarters,3,491,13,4.0
1,S.I.D,2,49,1,
2,Aviation Police,1,47,7,
3,C.I.D,5,588,21,4.0
4,Jinja Road,20,320,116,9.0


In [3]:
df.tail()

Unnamed: 0,REGIONS_DISTRICTS,HOMICIDES,ECONOMIC_CRIMES,SEX_RELATED_OFFENCES,CHILD_RELATED_OFFENCES
176,BUSOGA EAST,228,363,773,513.0
177,Jinja,55,269,138,362.0
178,Buwenge,20,58,84,31.0
179,Kakira,27,21,46,48.0
180,KIIRA,102,348,268,442.0


In [4]:
df.shape

(181, 5)

In [5]:
df.dtypes

REGIONS_DISTRICTS          object
HOMICIDES                   int64
ECONOMIC_CRIMES             int64
SEX_RELATED_OFFENCES        int64
CHILD_RELATED_OFFENCES    float64
dtype: object

<h2> Clean the data</h2>

<b>1. Check the missing data</b>

In [6]:
import numpy as np

# replace empty strings with NaN
df.replace("", np.nan, inplace = True)
df.head(5)

Unnamed: 0,REGIONS_DISTRICTS,HOMICIDES,ECONOMIC_CRIMES,SEX_RELATED_OFFENCES,CHILD_RELATED_OFFENCES
0,C.I.D Headquarters,3,491,13,4.0
1,S.I.D,2,49,1,
2,Aviation Police,1,47,7,
3,C.I.D,5,588,21,4.0
4,Jinja Road,20,320,116,9.0


<b> Evaluation for missing data </b>

In [7]:
missing_data = df.isnull()
missing_data.head(5)

Unnamed: 0,REGIONS_DISTRICTS,HOMICIDES,ECONOMIC_CRIMES,SEX_RELATED_OFFENCES,CHILD_RELATED_OFFENCES
0,False,False,False,False,False
1,False,False,False,False,True
2,False,False,False,False,True
3,False,False,False,False,False
4,False,False,False,False,False


In [8]:
missing_data = df.isnull()
missing_data.tail(5)

Unnamed: 0,REGIONS_DISTRICTS,HOMICIDES,ECONOMIC_CRIMES,SEX_RELATED_OFFENCES,CHILD_RELATED_OFFENCES
176,False,False,False,False,False
177,False,False,False,False,False
178,False,False,False,False,False
179,False,False,False,False,False
180,False,False,False,False,False


<b> Count missing values in each column</b>

In [9]:
for column in missing_data.columns:
    print(column)
    print(missing_data[column].value_counts())
    print("")

REGIONS_DISTRICTS
False    181
Name: REGIONS_DISTRICTS, dtype: int64

HOMICIDES
False    181
Name: HOMICIDES, dtype: int64

ECONOMIC_CRIMES
False    181
Name: ECONOMIC_CRIMES, dtype: int64

SEX_RELATED_OFFENCES
False    181
Name: SEX_RELATED_OFFENCES, dtype: int64

CHILD_RELATED_OFFENCES
False    177
True       4
Name: CHILD_RELATED_OFFENCES, dtype: int64
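The per-column loop above can also be written as a single call. The following is a minimal sketch, assuming a small stand-in frame (`demo` is hypothetical and only mimics two of the columns) rather than the full crime dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the crime table, with two missing values
demo = pd.DataFrame({
    "HOMICIDES": [3, 2, 1, 5],
    "CHILD_RELATED_OFFENCES": [4.0, np.nan, np.nan, 4.0],
})

# isnull() marks missing cells as True; sum() counts the True values per column
missing_counts = demo.isnull().sum()
print(missing_counts)
```

On the real data, `df.isnull().sum()` should report the same counts as the `True` rows printed by the loop.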



<b> Deal with missing data</b> Replace the missing values with the column mean

In [10]:
# Calculate the average of the column
avg_child_loss = df["CHILD_RELATED_OFFENCES"].astype("float").mean(axis=0)
print("Average of CHILD_RELATED_OFFENCES:","{:.0f}".format(avg_child_loss))

Average of CHILD_RELATED_OFFENCES: 131


<b> Replace "NaN" by mean value in "CHILD_RELATED_OFFENCES" column</b>

In [11]:

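# The mean is inserted as a formatted string ("131"), so the column becomes object dtype until it is cast to int below.
# Assigning the result back avoids relying on an in-place replace through a column selection.
df["CHILD_RELATED_OFFENCES"] = df["CHILD_RELATED_OFFENCES"].replace(np.nan, "{:.0f}".format(avg_child_loss))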
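An alternative that keeps the column numeric throughout is to fill with a rounded number instead of a formatted string. This is only a sketch under that assumption, shown on a hypothetical five-row frame (`demo`), not the notebook's own method:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CHILD_RELATED_OFFENCES column
demo = pd.DataFrame({"CHILD_RELATED_OFFENCES": [4.0, np.nan, np.nan, 4.0, 9.0]})

# mean() skips NaN by default; round to a whole number of offences
mean_value = int(round(demo["CHILD_RELATED_OFFENCES"].mean()))

# fillna with a number keeps the dtype numeric, so one cast to int is enough
demo["CHILD_RELATED_OFFENCES"] = (
    demo["CHILD_RELATED_OFFENCES"].fillna(mean_value).astype("int64")
)
print(demo)
```

With this variant the column never passes through the object dtype, so no separate string-to-integer conversion is needed.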

In [12]:
df.head()

Unnamed: 0,REGIONS_DISTRICTS,HOMICIDES,ECONOMIC_CRIMES,SEX_RELATED_OFFENCES,CHILD_RELATED_OFFENCES
0,C.I.D Headquarters,3,491,13,4.0
1,S.I.D,2,49,1,131.0
2,Aviation Police,1,47,7,131.0
3,C.I.D,5,588,21,4.0
4,Jinja Road,20,320,116,9.0


<b> Good! The dataset now has no missing values.</b>

In [13]:
df.dtypes

REGIONS_DISTRICTS         object
HOMICIDES                  int64
ECONOMIC_CRIMES            int64
SEX_RELATED_OFFENCES       int64
CHILD_RELATED_OFFENCES    object
dtype: object

In [13]:
df[["CHILD_RELATED_OFFENCES"]] = df[["CHILD_RELATED_OFFENCES"]].astype("int")

In [14]:
df.dtypes

REGIONS_DISTRICTS         object
HOMICIDES                  int64
ECONOMIC_CRIMES            int64
SEX_RELATED_OFFENCES       int64
CHILD_RELATED_OFFENCES     int32
dtype: object

In [15]:
df.describe()

Unnamed: 0,HOMICIDES,ECONOMIC_CRIMES,SEX_RELATED_OFFENCES,CHILD_RELATED_OFFENCES
count,181.0,181.0,181.0,181.0
mean,49.696133,166.790055,193.558011,130.983425
std,61.687217,224.796328,242.120799,181.264217
min,1.0,3.0,1.0,1.0
25%,17.0,36.0,70.0,23.0
50%,27.0,85.0,112.0,66.0
75%,54.0,181.0,192.0,143.0
max,368.0,1414.0,1343.0,1190.0


In [16]:
# numeric_only=True (pandas 1.5 or later) skips the text column REGIONS_DISTRICTS
df.corr(numeric_only=True)

Unnamed: 0,HOMICIDES,ECONOMIC_CRIMES,SEX_RELATED_OFFENCES,CHILD_RELATED_OFFENCES
HOMICIDES,1.0,0.68306,0.83842,0.781763
ECONOMIC_CRIMES,0.68306,1.0,0.736904,0.66881
SEX_RELATED_OFFENCES,0.83842,0.736904,1.0,0.853476
CHILD_RELATED_OFFENCES,0.781763,0.66881,0.853476,1.0


In [17]:
!pip install geopy

from geopy.geocoders import Nominatim # library to convert an address to latitude and longitude

!pip install geocoder
import geocoder



In [18]:
!pip install geopandas



<b> Get the latitude and longitude for each row of the dataframe.</b>

In [19]:
import geocoder
def get_latlng(place_name, max_retries=5):
    """Return [latitude, longitude] for a place in Uganda via ArcGIS, or [None, None] on failure."""
    for _ in range(max_retries):  # bounded retries, so a name that never resolves cannot loop forever
        g = geocoder.arcgis('{}, Uganda, Africa'.format(place_name))
        if g.latlng is not None:
            return g.latlng
    return [None, None]  # keeps the two-column shape expected when building the dataframe

<b> Get the latitude and longitude for each entry in REGIONS_DISTRICTS</b>

In [20]:
regions_dist_code = df['REGIONS_DISTRICTS']
# Use a separate loop variable so the Series name is not shadowed
coordinates = [get_latlng(name) for name in regions_dist_code.tolist()]
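Each geocoding call is a network request, so it may help to look up every distinct name only once and reuse the answer for repeats. The sketch below assumes a hypothetical helper `geocode_unique` and an offline stand-in `fake_lookup` (neither is part of geocoder or geopy); the coordinates in the stand-in are copied from this notebook's own output.

```python
from typing import Callable, Dict, Iterable, List, Optional

Coords = Optional[List[float]]


def geocode_unique(names: Iterable[str], lookup: Callable[[str], Coords]) -> List[Coords]:
    """Geocode each distinct name once and reuse the result for repeated names."""
    names = list(names)
    cache: Dict[str, Coords] = {}
    for name in names:
        if name not in cache:
            cache[name] = lookup(name)  # the only place a lookup is made
    return [cache[name] for name in names]


# Offline stand-in for a real geocoder, so the sketch runs without network access
calls = []


def fake_lookup(name: str) -> Coords:
    calls.append(name)
    known = {"Jinja": [0.43118, 33.21013], "Kakira": [0.5282, 33.28291]}
    return known.get(name)


result = geocode_unique(["Jinja", "Kakira", "Jinja"], fake_lookup)
print(result)
print(calls)  # "Jinja" is looked up only once
```

In the notebook, the same helper could be called as `geocode_unique(df['REGIONS_DISTRICTS'], get_latlng)`.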

<b> Put the Latitude and Longitude columns in the dataframe and print the first 12 rows.</b>

In [21]:
df_loc = df.copy()  # work on a copy so the cleaned dataframe keeps its original columns

df_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])

df_loc['Latitude'] = df_coordinates['Latitude']

df_loc['Longitude'] = df_coordinates['Longitude']

df_loc.head(12)

Unnamed: 0,REGIONS_DISTRICTS,HOMICIDES,ECONOMIC_CRIMES,SEX_RELATED_OFFENCES,CHILD_RELATED_OFFENCES,Latitude,Longitude
0,C.I.D Headquarters,3,491,13,4,53.32941,53.54489
1,S.I.D,2,49,1,131,0.35128,32.53793
2,Aviation Police,1,47,7,131,53.32941,53.54489
3,C.I.D,5,588,21,4,0.35128,32.53793
4,Jinja Road,20,320,116,9,0.32545,32.60655
5,Kira Division,24,178,105,22,0.05128,32.47446
6,Kira Road,30,171,94,46,0.40568,32.64193
7,Mukono,36,283,176,41,0.35825,32.75137
8,Nagalama,19,54,76,10,0.51444,32.75528
9,KMP EAST,129,1007,567,127,0.35128,32.53793
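Some of the returned coordinates, such as 53.32941, 53.54489 for C.I.D Headquarters, fall far outside Uganda, which suggests the geocoder did not recognise the name. A minimal sanity check is sketched below; the bounding-box values are rough approximations chosen for illustration, and the three sample rows are copied from the table above.

```python
import pandas as pd

# Approximate bounding box for Uganda in decimal degrees (an assumption, padded slightly)
LAT_MIN, LAT_MAX = -1.6, 4.4
LON_MIN, LON_MAX = 29.4, 35.1

# Three rows copied from the geocoded output above
sample = pd.DataFrame({
    "REGIONS_DISTRICTS": ["C.I.D Headquarters", "S.I.D", "Jinja Road"],
    "Latitude": [53.32941, 0.35128, 0.32545],
    "Longitude": [53.54489, 32.53793, 32.60655],
})

# True where both coordinates fall inside the box
in_uganda = (
    sample["Latitude"].between(LAT_MIN, LAT_MAX)
    & sample["Longitude"].between(LON_MIN, LON_MAX)
)

# Names whose coordinates need a manual check or a more specific address
suspect = sample.loc[~in_uganda, "REGIONS_DISTRICTS"].tolist()
print(suspect)
```

A box test like this only catches gross failures. A point that lands inside Uganda but in the wrong district would still pass, so the flagged rows are a starting point rather than a complete audit.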


In [22]:
df_loc.tail()

Unnamed: 0,REGIONS_DISTRICTS,HOMICIDES,ECONOMIC_CRIMES,SEX_RELATED_OFFENCES,CHILD_RELATED_OFFENCES,Latitude,Longitude
176,BUSOGA EAST,228,363,773,513,-0.09991,31.75632
177,Jinja,55,269,138,362,0.43118,33.21013
178,Buwenge,20,58,84,31,0.63927,33.17785
179,Kakira,27,21,46,48,0.5282,33.28291
180,KIIRA,102,348,268,442,0.35128,32.53793


<b> Importing the needed libraries</b>

In [23]:
import matplotlib.cm as cm
import matplotlib.colors as colors

import numpy as np

import json # library to handle JSON files
from pandas import json_normalize # transform JSON data into a pandas dataframe (the pandas.io.json import path is deprecated)

from sklearn.cluster import KMeans

!pip -q install folium
print('folium installed...')
import folium # library for map rendering
print('folium imported...')
print('Done')

folium installed...
folium imported...
Done


<b> Using the geopy library to get the latitude and longitude values of Uganda</b>

In [24]:
from geopy.geocoders import Nominatim

address = 'Uganda, Africa'

geolocator = Nominatim(user_agent="ln_explorer")

location = geolocator.geocode(address)

latitude = location.latitude

longitude = location.longitude

print('The geographical coordinates of Uganda are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Uganda are 2.0842555, 33.7644179.


<b> Creating the map of Uganda with folium </b>

In [25]:
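# zoom_start=7 shows roughly the whole country; 12 would zoom in to a few kilometres around the centre point
map_uganda = folium.Map(location = [latitude, longitude], zoom_start=7)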

map_uganda

<b> Activities </b>

1. Add a single marker at the mean location on this map. Let the color of the marker be the default blue.

In [36]:
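# Centre the map on the mean of the geocoded coordinates.
# The map is named ug_map so that Python's built-in map() is not shadowed.
mean_location = [df_loc.Latitude.mean(), df_loc.Longitude.mean()]
ug_map = folium.Map(location=mean_location,
                    zoom_start=4,
                    control_scale=True)

In [37]:
# add_to() returns the marker, not the map, so keep the two names separate
mean_marker = folium.Marker(location=mean_location).add_to(ug_map)

In [38]:
ug_map  # display the map itself rather than the Marker object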

2. Change the color of the marker drawn in question 1 to Red

3. Change the icon on the marker on the Folium map from question 2

4. Add multiple markers to the Folium map

5. Control Folium Marker Colour by Category of the crimes

<b> Michael Kumakech is an AI, IoT and Data Science Instructor at Makerere University Kampala, Uganda.</b>