# CRIME RATE IN SAN FRANCISCO

In [1]:
import numpy as np  # useful for scientific computing in Python
import pandas as pd # primary data structure library

Folium is a powerful Python library that helps you create several types of Leaflet maps. Because Folium maps are interactive, the library is very useful for building dashboards.

Information from the official Folium documentation page:

> Folium builds on the data marshaling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it on a leaflet map through Folium.

> Folium makes it easy to visualize Python-manipulated data on an interactive leaflet map. It allows both data binding to the map for choropleth rendering as well as passing Vincent/Vega visualizations as markers on the map.

> The library has several built-in tile sets from OpenStreetMap, Mapbox, and Stamen, and supports custom tile sets with Mapbox or Cloudmade API keys. Folium supports both GeoJSON and TopoJSON overlays, as well as data binding with those overlays to create choropleth maps with color-brewer color schemes.

**Folium** is not available by default, so we have to install it before importing it.

In [2]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Folium installed and imported!')

Folium installed and imported!


Creating world maps is easy in **Folium**. You simply create a **Folium** _Map_ object and then display it. The beauty of **Folium** maps is that they are interactive, so you can zoom in on any area of interest regardless of the initial zoom level.

In [3]:
# define the world map
world_map = folium.Map()

# display world map
world_map

You can customize the default definition of the world map by specifying the map center and initial zoom level.

Every location on a map is defined by its _latitude_ and _longitude_ values, so you can create a map and pass in a center given as a **[latitude, longitude]** pair, for example **[0, 0]**.

For a given center, you can also define the initial zoom level at which the map is displayed. **The higher the zoom level, the more the map is zoomed in on the center.**

Let's create a map centered on Canada and experiment with the zoom level to see how it affects the rendered map.

In [4]:
# define the world map centered around Canada with a low zoom level
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4)

# display world map
world_map

In [5]:
# define the world map centered around Canada with a higher zoom level
world_map = folium.Map(location=[56.130, -106.35], zoom_start=8)

# display world map
world_map
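
To get a feel for why zoom level 8 shows so much less area than zoom level 4, here is a small sketch of the arithmetic behind tiled web maps. It assumes the common slippy-map convention (used by OpenStreetMap tiles) in which the world is split into 2^zoom by 2^zoom tiles; the function names are made up for this illustration and are not part of Folium.

```python
def tiles_per_axis(zoom: int) -> int:
    """Number of tiles along each axis at a given zoom level (assumed 2**zoom)."""
    return 2 ** zoom


def longitude_degrees_per_tile(zoom: int) -> float:
    """Degrees of longitude covered by one tile at a given zoom level."""
    return 360.0 / tiles_per_axis(zoom)


# compare the two zoom levels used for the Canada maps
for zoom in (4, 8):
    print(
        f"zoom {zoom}: {tiles_per_axis(zoom)} tiles per axis, "
        f"{longitude_degrees_per_tile(zoom)} degrees of longitude per tile"
    )
```

Under this assumption each extra zoom level halves the width covered by a tile, so going from zoom 4 to zoom 8 narrows the view by a factor of 16.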

In [11]:
mexico_latitude = 23.6345
mexico_longitude = -102.5528

# define a map centered on Mexico with a low zoom level
mexico_map = folium.Map(location=[mexico_latitude, mexico_longitude], zoom_start=4)

# display the map
mexico_map

### A. Stamen Toner Map

These are high-contrast B+W (black and white) maps. They are well suited for data mashups and for exploring river meanders and coastal zones.

In [7]:
# create a Stamen Toner map of the world centered around Canada
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4, tiles='Stamen Toner')

# display map
world_map

### B. Stamen Terrain map

These are maps that feature hill shading and natural vegetation colors. They showcase advanced labeling and linework generalization of dual-carriageway roads.

Let's create a Stamen Terrain map of Canada with zoom level 4.


In [8]:
# create a Stamen Terrain map of the world centered around Canada
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4, tiles='Stamen Terrain')

# display map
world_map

In [15]:
mexico_map = folium.Map(location=[mexico_latitude, mexico_longitude], zoom_start=4, tiles='Stamen Terrain')
mexico_map

# CRIME RATE IN SAN FRANCISCO

In [None]:
df_incidents = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Police_Department_Incidents_-_Previous_Year__2016_.csv')

print('Dataset downloaded and read into a pandas dataframe!')

In [None]:
df_incidents.head()

Each row contains 13 properties:

> 1. **IncidntNum**: Incident number
> 2. **Category**: Category of the crime or incident
> 3. **Descript**: Description of the crime or incident
> 4. **DayOfWeek**: The day of the week on which the incident occurred
> 5. **Date**: The date on which the incident occurred
> 6. **Time**: The time of day at which the incident occurred
> 7. **PdDistrict**: The police department district
> 8. **Resolution**: How the incident was resolved, i.e. whether the perpetrator was arrested or not
> 9. **Address**: The closest address to where the incident occurred
> 10. **X**: The longitude value of the crime location
> 11. **Y**: The latitude value of the crime location
> 12. **Location**: A pair of the latitude and longitude values
> 13. **PdId**: The police department ID
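
One detail in this list is easy to trip over: **X** holds the longitude and **Y** the latitude, while Folium expects locations as `[latitude, longitude]`. The helper below is a minimal sketch of a sanity check for that ordering. The function name `to_folium_location` is hypothetical and not part of Folium or the dataset.

```python
def to_folium_location(x: float, y: float) -> list:
    """Turn the dataset's (X=longitude, Y=latitude) pair into [latitude, longitude]."""
    lat, lng = y, x
    # valid latitudes lie in [-90, 90]; valid longitudes lie in [-180, 180]
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude {lat} is out of range; were X and Y swapped?")
    if not -180 <= lng <= 180:
        raise ValueError(f"longitude {lng} is out of range")
    return [lat, lng]


# approximate coordinates of San Francisco, given in the dataset's X, Y order
print(to_folium_location(-122.42, 37.77))
```

For San Francisco a swapped pair is caught immediately, because a longitude of about -122 is not a valid latitude.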

In [None]:
df_incidents.shape

In [None]:
# get the first 100 crimes in the df_incidents dataframe
limit = 100
df_incidents = df_incidents.iloc[0:limit, :]

In [None]:
df_incidents.shape

In [None]:
# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

In [None]:
# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map of San Francisco
sanfran_map

In [None]:
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add incidents to map
sanfran_map.add_child(incidents)

You can also add pop-up text that is displayed when you click a marker. Let's make each marker show the crime category when clicked.

In [None]:
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add pop-up text to each marker on the map
latitudes = list(df_incidents.Y)
longitudes = list(df_incidents.X)
labels = list(df_incidents.Category)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(sanfran_map)    
    
# add incidents to map
sanfran_map.add_child(incidents)

Much more informative, isn't it? Now you can see what category of crime occurred at each marker.

If you find the map too congested with all these markers, there are two remedies. The simpler one is to remove the location markers and attach the pop-up text to the circle markers themselves, like so:

In [None]:
# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# loop through the 100 crimes and add each to the map
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(sanfran_map)

# show map
sanfran_map

The other, more thorough fix is to group the markers into clusters. Each cluster is then labeled with the number of crimes it contains. These clusters can be thought of as pockets of San Francisco that you can analyze separately.

To do this, we start by instantiating a _MarkerCluster_ object and adding all the data points in the data frame to this object.

In [None]:
from folium import plugins

# let's start again with a clean copy of the map of San Francisco
sanfran_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(sanfran_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
sanfran_map

Notice that when you zoom out all the way, all markers are grouped into a single cluster, the _global cluster_, of 100 markers or crimes, which is the total number of crimes in our dataframe. Once you start zooming in, the _global cluster_ begins to break up into smaller clusters. Zooming in all the way reveals the individual markers.
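
The way one big cluster breaks up into smaller ones can be imitated with a simple grid. This is only a conceptual sketch: the marker cluster plugin uses its own algorithm, and the function `grid_clusters`, the cell sizes, and the four sample points are all made up for illustration.

```python
import math
from collections import Counter


def grid_clusters(points, cell_size_deg):
    """Count how many (lat, lng) points fall into each square grid cell."""
    counts = Counter()
    for lat, lng in points:
        cell = (math.floor(lat / cell_size_deg), math.floor(lng / cell_size_deg))
        counts[cell] += 1
    return counts


# four hypothetical incident locations around the San Francisco Bay
points = [(37.77, -122.42), (37.78, -122.41), (37.76, -122.43), (37.85, -122.27)]

coarse = grid_clusters(points, cell_size_deg=1.0)  # zoomed out: large cells
fine = grid_clusters(points, cell_size_deg=0.1)    # zoomed in: small cells

print("zoomed out:", sorted(coarse.values(), reverse=True))
print("zoomed in: ", sorted(fine.values(), reverse=True))
```

With large cells all four points share one cell, which mirrors the single global cluster. With small cells the distant point separates from the other three, just as clusters split when you zoom in.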

# Choropleth map <a id="8"></a>

A `Choropleth` map is a thematic map in which areas are shaded or patterned in proportion to the measurement of a statistical variable displayed on the map, such as population density or per-capita income. A choropleth map provides an easy way to visualize how a measurement varies across a geographic area, or to show the level of variability within a region. Below is a `Choropleth` map of the United States depicting the population per square mile for each state.

<img src = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/labs/Module%205/images/2000_census_population_density_map_by_state.png" width = 600> 


Now, create a `Choropleth` map of the world showing immigration to Canada.

First, load the Canadian immigration dataset using the _pandas_ `read_excel()` method. Normally, before we can do that, we need to install a module that _pandas_ requires to read Excel files. This module is **xlrd**. For your convenience, we have pre-installed this module, so you do not have to worry about that. Otherwise, you would need to run the following line of code to install the **xlrd** module:

```
!conda install -c anaconda xlrd --yes
```

In [None]:
df_can = pd.read_excel('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Canada.xlsx',
                     sheet_name='Canada by Citizenship',
                     skiprows=range(20),
                     skipfooter=2)

print('Data downloaded and read into a dataframe!')

In [None]:
df_can.head()

In [None]:
# print the dimensions of the dataframe
print(df_can.shape)

Clean up the data. We will make some modifications to the original dataset to make it easier to create our visualizations. Refer to *Introduction to Matplotlib and Line Plots* and *Area Plots, Histograms, and Bar Plots* for a detailed description of this preprocessing.

In [None]:
# clean up the dataset to remove unnecessary columns (e.g. REG)
df_can.drop(['AREA','REG','DEV','Type','Coverage'], axis=1, inplace=True)

# let's rename the columns so that they make sense
df_can.rename(columns={'OdName':'Country', 'AreaName':'Continent','RegName':'Region'}, inplace=True)

# for sake of consistency, let's also make all column labels of type string
df_can.columns = list(map(str, df_can.columns))

# years that we will be using in this lesson - useful for plotting later on
years = list(map(str, range(1980, 2014)))

# add total column: sum only the yearly columns so that text columns are ignored
df_can['Total'] = df_can[years].sum(axis=1)
print('data dimensions:', df_can.shape)

In [None]:
df_can.head()

To create a `Choropleth` map, we need a GeoJSON file that defines the area or borders for each state, county or country that we are interested in. In this case, since we are trying to create a map of the world, we want a GeoJSON that defines the boundaries of all the countries of the world. For your convenience, we will provide you with this file, so go ahead and download it. Let's name it **world_countries.json**.

In [None]:
# download countries geojson file
!wget --quiet https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/world_countries.json
    
print('GeoJSON file downloaded!')

Now that we have the GeoJSON file, let's create a world map, centered on the _latitude_ and _longitude_ values **[0, 0]** with an initial zoom level of 2.

In [None]:
world_geo = r'world_countries.json' # geojson file

# create a plain world map
world_map = folium.Map(location=[0, 0], zoom_start=2)

To create a `Choropleth` map, we will use the _choropleth_ method with the following main parameters:

1. `geo_data`, which is the GeoJSON file.
2. `data`, which is the dataframe containing the data.
3. `columns`, which represents the columns in the dataframe that will be used to create the `Choropleth` map.
4. `key_on`, which is the key or variable in the GeoJSON file that contains the name of the variable of interest. To determine it, open the GeoJSON file in any text editor and note the name of the key or variable that contains the names of the countries, since the countries are our variable of interest. In this case, **name** is the key in the GeoJSON file that contains the names of the countries. Note that this key is case sensitive, so you need to pass it exactly as it appears in the GeoJSON file.
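
Instead of opening the file in a text editor, the key can also be found programmatically. The sketch below is only an illustration: `sample_geojson` is a tiny made-up stand-in for **world_countries.json**, and `data_countries` is a hypothetical list standing in for the `Country` column of the dataframe.

```python
import json

# a tiny, made-up GeoJSON document standing in for world_countries.json
sample_geojson = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "id": "CAN", "properties": {"name": "Canada"}, "geometry": null},
   {"type": "Feature", "id": "MEX", "properties": {"name": "Mexico"}, "geometry": null}
 ]}
"""
geo = json.loads(sample_geojson)

# collect every property key used by the features; these are candidates for key_on
property_keys = set()
for feature in geo["features"]:
    property_keys.update(feature["properties"].keys())
print("property keys:", property_keys)

# check which country names in the data have no matching feature in the GeoJSON
geo_names = {feature["properties"]["name"] for feature in geo["features"]}
data_countries = ["Canada", "Mexico", "Atlantis"]  # hypothetical sample values
unmatched = [country for country in data_countries if country not in geo_names]
print("unmatched countries:", unmatched)
```

To run the same check on the real file, replace `json.loads(sample_geojson)` with a `json.load` call on the opened **world_countries.json** file. Countries that show up as unmatched would be left without data on the map.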

According to our `Choropleth` map legend, the darker the color of a country and the closer that color is to red, the higher the number of immigrants from that country. Accordingly, over the period from 1980 to 2013, the highest numbers of immigrants came from China, India, and the Philippines, followed by Poland, Pakistan, and, interestingly, the United States.

In [None]:
world_geo = r'world_countries.json'

# create a numpy array of 6 linearly spaced values from the minimum to the maximum total immigration
threshold_scale = np.linspace(df_can['Total'].min(),
                              df_can['Total'].max(),
                              6, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 1 # make sure that the last value of the list is greater than the maximum immigration

# create the map using the custom threshold scale defined above
world_map = folium.Map(location=[0, 0], zoom_start=2)
world_map.choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    threshold_scale=threshold_scale,
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Immigration to Canada',
    reset=True
)
world_map

It's much better now! Feel free to dig through the data and perhaps create a `Choropleth` map of years, or maybe decades, and compare them to the entire period from 1980 to 2013.
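
One detail of the threshold scale deserves a short note: the last value is increased by 1 so that it is strictly greater than the maximum. The sketch below shows why, using a simplified model of threshold binning with half-open bins. This model and the function `bin_index` are assumptions made for illustration, and Folium's own color-scale code may differ in detail. The threshold values are made up.

```python
from bisect import bisect_right


def bin_index(value, thresholds):
    """Index of the half-open bin [thresholds[i], thresholds[i + 1]) holding value."""
    return bisect_right(thresholds, value) - 1


thresholds = [0, 20, 40, 60, 80, 100]  # 6 hypothetical thresholds define 5 bins
last_valid_bin = len(thresholds) - 2   # bins are numbered 0 to 4

# the maximum value sits exactly on the last threshold and falls outside every bin
print(bin_index(100, thresholds), "vs last valid bin", last_valid_bin)

# after increasing the last threshold by 1, the maximum lands in the last bin
bumped = thresholds[:-1] + [thresholds[-1] + 1]
print(bin_index(100, bumped), "vs last valid bin", last_valid_bin)
```

In this model a country whose total equals the maximum would get no color unless the upper bound is pushed just past it.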