# Data Visualization

## Survey Data Science Interest

A survey was conducted to gauge an audience's interest in different data science topics, namely:

- Big Data (Spark / Hadoop)
- Data Analysis / Statistics
- Data Journalism
- Data Visualization
- Deep Learning
- Machine Learning

The participants had three options for each topic: Very interested, Somewhat interested, and Not interested. A total of 2,233 respondents completed the survey.

If you examine the CSV file, you will find that the first column holds the data science topics and the first row holds the choices for each topic.

Use the pandas `read_csv` method to read the CSV file into a pandas dataframe, with the first column as the index.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# Load the topic survey dataset, using the first column (topics) as the index
url = "Topic_Survey_Assignment.csv"
df = pd.read_csv(url, index_col=0)
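
As a quick illustration of what `index_col=0` does, the sketch below reads a tiny in-memory CSV instead of the real file. The names `sample_csv` and `df_demo` are made up for this example, and the two rows are copied from the survey counts; it is only meant to show how the first column becomes the row index.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt laid out like the survey file:
# first column = topic, first row = answer choices.
sample_csv = io.StringIO(
    ",Very interested,Somewhat interested,Not interested\n"
    "Data Analysis / Statistics,1688,444,60\n"
    "Machine Learning,1629,477,74\n"
)

# index_col=0 turns the first column into the row index,
# so topics can be looked up by name with .loc
df_demo = pd.read_csv(sample_csv, index_col=0)

print(df_demo.loc["Machine Learning", "Very interested"])
```

With the topics as the index, every remaining column is numeric, which is what makes the later division and plotting steps work on the whole dataframe at once.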

Use the artist layer of Matplotlib to replicate the bar chart below to visualize the percentage of the respondents' interest in the different data science topics surveyed.

In [3]:
df.sort_values(['Very interested'], ascending=False, axis=0, inplace=True)
df.head()

Unnamed: 0,Very interested,Somewhat interested,Not interested
Data Analysis / Statistics,1688,444,60
Machine Learning,1629,477,74
Data Visualization,1340,734,102
Big Data (Spark / Hadoop),1332,729,127
Deep Learning,1263,770,136


In [4]:
# Convert counts to percentages of the 2,233 respondents
df_percentages = (df/2233*100)
df_percentages = df_percentages.round(2)

In [5]:
df_percentages.head()

Unnamed: 0,Very interested,Somewhat interested,Not interested
Data Analysis / Statistics,75.59,19.88,2.69
Machine Learning,72.95,21.36,3.31
Data Visualization,60.01,32.87,4.57
Big Data (Spark / Hadoop),59.65,32.65,5.69
Deep Learning,56.56,34.48,6.09
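
The choice of denominator matters here, because the answers for a topic do not add up to the number of respondents. The sketch below, which assumes the 2,233 respondents stated in the survey description and uses two rows of counts from the table above, compares dividing by the total number of respondents with dividing by each row's own sum. The names `TOTAL_RESPONDENTS`, `pct_of_respondents`, and `pct_of_answers` are illustrative only.

```python
import pandas as pd

TOTAL_RESPONDENTS = 2233  # taken from the survey description (assumption of this sketch)

counts = pd.DataFrame(
    {
        "Very interested": [1688, 1629],
        "Somewhat interested": [444, 477],
        "Not interested": [60, 74],
    },
    index=["Data Analysis / Statistics", "Machine Learning"],
)

# Share of all respondents who picked each answer
pct_of_respondents = (counts / TOTAL_RESPONDENTS * 100).round(2)

# Share of the answers actually given for each topic (rows sum to about 100)
pct_of_answers = (counts.div(counts.sum(axis=1), axis=0) * 100).round(2)

print(counts.sum(axis=1))   # answers per topic, fewer than TOTAL_RESPONDENTS
print(pct_of_respondents)
print(pct_of_answers)
```

The first version leaves a gap for respondents who skipped a topic; the second hides that gap, so the two should not be mixed in one chart.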


In [6]:
colors_list = ['#5cb85c','#5bc0de','#d9534f']

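# Plot the percentages of respondents as fractions (0-1) so they can be
# labeled with the '%' format below; dividing by the row sums instead would
# show shares of the answers given, not shares of all respondents
ax = (df_percentages / 100).plot(kind='bar', figsize=(20, 8), width=0.8, color=colors_list, edgecolor=None)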
ax.legend(prop={'size': 14})
ax.tick_params(axis='both',labelsize=14)
ax.set_title("Audience interest in different Data Science Topics",fontsize=16)


# Annotate each bar with its percentage, placed just above the top of the bar
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    ax.annotate('{:.0%}'.format(height), (p.get_x() + .15 * width, p.get_y() + height + 0.01))
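
For comparison, here is a minimal sketch of the same kind of chart built directly on Matplotlib's artist layer, without `DataFrame.plot`. It assumes the 2,233 respondents from the survey description, uses only two topics to stay short, and draws on the non-interactive Agg canvas so it runs without a display. The variable names and the output file name are illustrative, not part of the assignment.

```python
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

TOTAL_RESPONDENTS = 2233  # from the survey description (assumption of this sketch)
topics = ["Data Analysis / Statistics", "Machine Learning"]
counts = {
    "Very interested": [1688, 1629],
    "Somewhat interested": [444, 477],
    "Not interested": [60, 74],
}
colors = ["#5cb85c", "#5bc0de", "#d9534f"]

# Build the figure and axes explicitly instead of going through pyplot
fig = Figure(figsize=(10, 4))
FigureCanvasAgg(fig)
ax = fig.add_subplot(111)

bar_width = 0.25
labels = []
for i, (choice, values) in enumerate(counts.items()):
    heights = [v / TOTAL_RESPONDENTS * 100 for v in values]
    # Shift each answer's bars so the three bars of a topic sit side by side
    positions = [x + (i - 1) * bar_width for x in range(len(topics))]
    bars = ax.bar(positions, heights, width=bar_width, color=colors[i], label=choice)
    for bar in bars:
        text = f"{bar.get_height():.2f}%"
        # Center the label horizontally on the bar, just above its top
        ax.annotate(text, (bar.get_x() + bar.get_width() / 2, bar.get_height()),
                    ha="center", va="bottom", fontsize=9)
        labels.append(text)

ax.set_xticks(list(range(len(topics))))
ax.set_xticklabels(topics)
ax.legend()
ax.set_title("Audience interest in different Data Science Topics")

fig.savefig("topic_interest_sketch.png")  # hypothetical output file name
```

Centering each label with `ha="center"` avoids the hand-tuned horizontal offset, at the cost of a few more lines than the pandas one-liner.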

## Crime rate in San Francisco

In the final lab, we created a map with markers to explore the crime rate in San Francisco, California. In this question, you are required to create a Choropleth map to visualize crime in San Francisco.

In [7]:
url = "Police_Department_Incidents_-_Previous_Year__2016_.csv"
police = pd.read_csv(url, index_col=0)

In [8]:
police.head()

IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId
120058272,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212120
120058272,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212168
141059263,WARRANTS,WARRANT ARREST,Monday,04/25/2016 12:00:00 AM,14:59,BAYVIEW,"ARREST, BOOKED",KEITH ST / SHAFTER AV,-122.388856,37.729981,"(37.7299809672996, -122.388856204292)",14105926363010
160013662,NON-CRIMINAL,LOST PROPERTY,Tuesday,01/05/2016 12:00:00 AM,23:50,TENDERLOIN,NONE,JONES ST / OFARRELL ST,-122.412971,37.785788,"(37.7857883766888, -122.412970537591)",16001366271000
160002740,NON-CRIMINAL,LOST PROPERTY,Friday,01/01/2016 12:00:00 AM,00:30,MISSION,NONE,16TH ST / MISSION ST,-122.419672,37.76505,"(37.7650501214668, -122.419671780296)",16000274071000


Before building the map, restructure the data into the format the Choropleth map expects: a dataframe that lists each neighborhood in San Francisco along with its total number of crimes.

Convert the San Francisco dataset into a dataframe that gives the total number of crimes in each neighborhood.

In [9]:
# Count the incidents recorded in each police district
police1 = police.groupby("PdDistrict").size().reset_index(name='counts')

In [10]:
# Rename the columns to match the names used for the Choropleth map
police1.columns = ["Neighborhood", "Count"]

In [11]:
police1

Unnamed: 0,Neighborhood,Count
0,BAYVIEW,14303
1,CENTRAL,17666
2,INGLESIDE,11594
3,MISSION,19503
4,NORTHERN,20100
5,PARK,8699
6,RICHMOND,8922
7,SOUTHERN,28445
8,TARAVAL,11325
9,TENDERLOIN,9942
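
To make the counting step concrete, the sketch below applies the same `groupby(...).size()` pattern to a five-row toy frame whose `PdDistrict` values are copied from the `police.head()` output. The names `incidents`, `by_size`, and `by_value_counts` are illustrative; `value_counts` is shown only as an equivalent alternative, not as the method the notebook uses.

```python
import pandas as pd

# Toy stand-in for the police dataframe: one row per incident
incidents = pd.DataFrame(
    {"PdDistrict": ["SOUTHERN", "SOUTHERN", "BAYVIEW", "TENDERLOIN", "MISSION"]}
)

# size() counts the rows in each group; reset_index turns the result
# back into a two-column dataframe
by_size = (
    incidents.groupby("PdDistrict")
    .size()
    .reset_index(name="Count")
    .rename(columns={"PdDistrict": "Neighborhood"})
)

# Equivalent counts via value_counts, sorted by district name for comparison
by_value_counts = incidents["PdDistrict"].value_counts().sort_index()

print(by_size)
```

Both approaches count rows, so a district's total includes every incident record, even when several records share one incident number.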


The San Francisco crime dataset divides the city into 10 police districts (`PdDistrict`), which are treated here as its 10 main neighborhoods.

Now you should be ready to proceed with creating the Choropleth map.

As you learned in the Choropleth maps lab, you will need a GeoJSON file that marks the boundaries of the different neighborhoods in San Francisco. 
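
A Choropleth map can only color a region whose name in the GeoJSON matches a name in the dataframe, so it can help to compare the two sets of names first. The sketch below is a hedged illustration in plain Python: the helper `unmatched_keys` and the tiny `toy_geojson` dictionary are hypothetical, and the property name `DISTRICT` is assumed to be the one the real file uses for district names.

```python
def unmatched_keys(geojson, property_name, names):
    """Return (names missing from the GeoJSON, GeoJSON names missing from the data)."""
    geo_names = {
        feature["properties"][property_name] for feature in geojson["features"]
    }
    data_names = set(names)
    return sorted(data_names - geo_names), sorted(geo_names - data_names)


# Hypothetical, geometry-free stand-in for the real GeoJSON file
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"DISTRICT": "BAYVIEW"}, "geometry": None},
        {"type": "Feature", "properties": {"DISTRICT": "CENTRAL"}, "geometry": None},
        {"type": "Feature", "properties": {"DISTRICT": "PARK"}, "geometry": None},
    ],
}

missing_in_geo, missing_in_data = unmatched_keys(
    toy_geojson, "DISTRICT", ["BAYVIEW", "CENTRAL", "Mission"]
)
print(missing_in_geo)   # names in the data with no matching region
print(missing_in_data)  # regions with no matching data row
```

For the real file, the dictionary would come from `json.load` on the opened GeoJSON file; the comparison is case-sensitive, so `"Mission"` and `"MISSION"` count as different names.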

In [15]:
import folium
world_geo = r'san-francisco.geojson' # geojson file

In [16]:
# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42
# create a plain map centered on San Francisco
world_map = folium.Map(location=[latitude, longitude], zoom_start=12)

In [18]:
# generate choropleth map using the total number of crimes in each San Francisco neighborhood
folium.Choropleth(
    geo_data=world_geo,
    data=police1,
    columns=['Neighborhood', 'Count'],
    key_on='feature.properties.DISTRICT',
    fill_color="YlOrRd",
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Crime Rate in San Francisco'
    ).add_to(world_map)

# display map
world_map