# Making a Choropleth Map #  

Let's take a look at the history of internet usage across the world from the [World Bank Data Portal](https://data.worldbank.org/indicator/IT.NET.USER.ZS?end=2018&start=1960&type=shaded&view=chart). We'll plot this data on a map that will allow users to interact with it in a web page. We'll also explore different ways to customize our map.

The dataset is available in the repository in CSV form.  
  
[Step 1: Checking out Our Data](#Step-1:-Checking-out-Our-Data)  
[Step 2: Importing Libraries](#Step-2:-Importing-Libraries)  
[Step 3: Gathering and Cleaning Data](#Step-3:-Gathering-and-Cleaning-Data)  
[Step 4: Making a Map](#Step-4:-Making-a-Map)    

Try launching on [mybinder.org](https://mybinder.org/)!  

### Exercise Notes: ###
**For each step, we will**:  
- present an explanation which will include an example of the syntax. 
- allow you to fill in and complete code yourself based on this example.

## Step 1: Checking out Our Data ##  
Before we start writing anything, let's open up the file we have to see what we're working with. Ultimately, we'll want to make sure that we have the necessary information to plot our data.  
![internet-data.png](internet-data.png)  
In this dataset, you can see that each entry has a corresponding country code. That will be enough to plot it on a map. However, not every entry is a country (some are regions or other groupings), which could cause errors if we tried to plot the entire set. In addition, not every year contains data for every country. Moving forward, we'll have to keep these features in mind.  

## Step 2: Importing Libraries ##
This is typically done in the first lines of a script. This way, you won't waste time running parts of your script before finding that a library was not available.  

In this workshop we'll be using:  
- [pandas](https://pandas.pydata.org/pandas-docs/stable/) for data handling and manipulation
- [folium](https://python-visualization.github.io/folium/) for data visualization with Leaflet maps
- [JSON](https://docs.python.org/3/library/json.html) - JavaScript Object Notation encoder/decoder for reading in geographical data  

In [3]:
import pandas as pd
import folium
import json

## Step 3: Gathering and Cleaning Data  
Now that we've finished our imports, let's collect the data we want into a pandas dataframe. In the picture above, you can see that the actual data does not start until line 5 where the columns are named. 

We can use the following function to read our csv:  
df = pd.read_csv('*file_path*', skiprows=val)  

For example, if we wanted to read our data while skipping the first 10 rows:
```
df = pd.read_csv("Internet.csv", skiprows=10)
```
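
To see what `skiprows` does without touching the real file, here is a small sketch that reads an in-memory CSV. The text, column names, and numbers below are made up for illustration; only `pd.read_csv` and its `skiprows` parameter come from the example above.

```python
import io
import pandas as pd

# Hypothetical file contents: two metadata lines, then the real header row.
raw = (
    "Data Source,Example\n"
    "Last Updated,2000-01-01\n"
    "Country Name,Country Code,2017\n"
    "Canada,CAN,50.0\n"
    "Mexico,MEX,60.0\n"
)

# skiprows=2 discards the two metadata lines, so the third line becomes the header.
demo = pd.read_csv(io.StringIO(raw), skiprows=2)
print(demo.columns.tolist())
```

Note that the year column name is read as the string `'2017'`, not the number 2017.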

For our purposes, we'll want to **skip the first 4 rows**.

In [None]:
internet = 
print(internet)

We also have a file with geographical boundaries for countries with their corresponding codes. We can compare the data in this file with our internet dataframe in order to filter what we want to plot.  

In order to **read in this file**, we can use the following syntax:  
```
with open('filename', 'r') as file:
    contents = json.load(file)
``` 
The name of the file we want is '**country-codes.json**'
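
As a self-contained illustration of the same pattern, the sketch below writes a tiny JSON file to a temporary directory and reads it back. The file name `sample.json` and its contents are invented for the demo; the real file holds far more information.

```python
import json
import os
import tempfile

# Hypothetical stand-in for a much larger boundaries file.
sample = {"name": "data", "features": [{"id": "CAN"}, {"id": "MEX"}]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    # Write the sample to disk so there is something to read.
    with open(path, "w") as file:
        json.dump(sample, file)
    # Read it back with the same pattern shown above.
    with open(path, "r") as file:
        contents = json.load(file)

# json.load turns a JSON object into a Python dictionary.
print(type(contents).__name__, len(contents["features"]))
```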

In [None]:
country_codes = 
print(country_codes)

Now that we have all of our data in a usable format, we'll need to clean it up. Let's filter our dataframe to include only the entries whose country code is also in the country_codes data.   

Accessing data from a certain column in our internet dataframe is easy. To do so, we need only specify the column name in square brackets after the variable name. For example, if we wanted the 'Country Name' column, we could type  
```
names = internet['Country Name']
```  
Try printing out the column with the country codes. You don't have to save it into a variable in order to do that.

In [None]:
print()

To access the **codes in our country_codes variable**, we'll have to do a little more work.  

The data in this variable is structured as a python dictionary. A very simplified version might look like this:  

```
diction = {'name' : 'data', 'features' : [{'id' : 'CAN', 'name' : 'Canada', 'area' : 9999}]}
```

The data of course contains more than just information for Canada. We'll need each id in the list of features in order to match our internet data to the available geographies. To **access these features**, we can use  

```
features = diction['features']
```

From this list of features, **we need each id**, which would involve **iterating through with a loop**. That might look like this  

```
lOfIds = []
for i in features:
    lOfIds.append(i['id'])
```

We could also combine these past two steps into one line using a **list comprehension** which would look like this example below.

```
lOfIds = [country['id'] for country in diction['features']]
```
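
A quick way to convince yourself that the loop and the list comprehension agree is to run both on a small dictionary. The dictionary below is a made-up, valid-Python version of the simplified structure, not the real contents of the file.

```python
# Hypothetical miniature of the structure: a list of features, each with an id.
diction = {
    'name': 'data',
    'features': [
        {'id': 'CAN', 'area': 9999},
        {'id': 'MEX', 'area': 1234},
    ],
}

# Version 1: an explicit loop that appends each id.
ids_from_loop = []
for feature in diction['features']:
    ids_from_loop.append(feature['id'])

# Version 2: the same logic as a list comprehension.
ids_from_comprehension = [feature['id'] for feature in diction['features']]

print(ids_from_loop == ids_from_comprehension)
```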

Try getting all the country ids in the country_codes variable by following the examples above.

In [None]:
lOfIds = []

With a list of all the ids in country_codes, we can **filter the internet data** to include only **data for countries** and **not for regions**.  Luckily, pandas has a built-in method for checking if values in a dataframe exist in some other variable. It's called **isin()** and can be used to filter a dataframe in the following form:

```
df = pd.DataFrame({'Town' : ['Boston', 'Salem', 'Worcester', 'Ludlow', 'Lowell']
              ,'Temp' : [65, 63, 67, 58, 55]
              ,'Forecast': ['Rain', 'Rain', 'Cloudy', 'Sunny', 'Windy']}
              )
df = df[df['Forecast'].isin(['Rain', 'Cloudy'])]
```

The last line may look a little dense with all the df's, so let's parse through it. 
```
df['Forecast'].isin(['Rain', 'Cloudy'])
```
This section simply returns a series of True or False values depending on whether the Forecast is in that list of Rain and Cloudy. It would look like
``` 
0     True
1     True
2     True
3    False
4    False
```

We then use this series above to filter out the rows we don't want. We're setting df to be equal to itself where the value in the Forecast column is either Rain or Cloudy. That leaves 

```
        Town  Temp Forecast
0     Boston    65     Rain
1      Salem    63     Rain
2  Worcester    67   Cloudy
```
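
The same pattern carries over to filtering by a list of codes. The sketch below uses a made-up three-row frame and a made-up list of ids, so the names `toy` and `known_ids` and their values are placeholders rather than the workshop data.

```python
import pandas as pd

# Hypothetical rows: two countries and one aggregate entry.
toy = pd.DataFrame({
    'Country Name': ['Canada', 'World', 'Mexico'],
    'Country Code': ['CAN', 'WLD', 'MEX'],
})
known_ids = ['CAN', 'MEX']  # hypothetical list of country ids

# Boolean series: True where the code appears in known_ids.
mask = toy['Country Code'].isin(known_ids)

# Keep only the rows where the mask is True.
countries_only = toy[mask]
print(countries_only)
```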

Now, filter the internet data!

In [None]:
internet = 

Finally, we have all the data ready that we'll need to make a map. Actually, we have enough to make more than one. But before we begin, let's create a list of bins that our data will fall into.

In [None]:
bins = [float(num) for num in range(0,110,10)] 

Because not every year has data collected for each country, let's try defining a function to make a map for the years **2005, 2010, and 2017**. We'll only need a few folium functions and methods to do so:  


This line creates a **basemap**. 
```
m1 = folium.Map(location = [25,0], zoom_start = 1.5)
```

The **Choropleth** function will create a separate choropleth layer.
```
c1 = folium.Choropleth(geo_data = country_codes
            ,data = internet2017
            ,columns = ['Country Code', '2017']
            ,key_on = 'feature.id'
            ,fill_color = 'Spectral'
            ,fill_opacity = 0.7
            ,line_opacity = 0.4
            ,legend_name = 'People Using Internet in 2017 (%)'
            ,name = '2017'
            ,bins = bins
            )
```

The ```add_to()``` method adds a layer, like the Choropleth layer, to a basemap.
```
c1.add_to(m1)
```

To customize your maps, try either looking at the folium docs or typing a function name followed by a question mark e.g. ```folium.Map?``` in order to see more parameters. [Folium docs](https://python-visualization.github.io/folium/) will also show additional functions for adding more features to your map!

Try combining these different functions to create a map for a certain year or make a function that creates several! 
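
One piece such a function needs is the data for a single year. As a rough sketch, the hypothetical helper `year_slice` below keeps the code column and one year column, and drops countries with no value for that year. The helper name, the toy frame, and its numbers are all assumptions for illustration. Its result could then be passed as `data` to `folium.Choropleth`, with `columns` set to `['Country Code', year]`.

```python
import pandas as pd

def year_slice(frame, year):
    """Return the code column and one year column, without missing values."""
    subset = frame[['Country Code', year]]
    return subset.dropna(subset=[year])

# Made-up values; None marks a year with no data for that country.
toy = pd.DataFrame({
    'Country Code': ['CAN', 'MEX', 'USA'],
    '2005': [10.0, None, 30.0],
    '2017': [40.0, 50.0, None],
})

# Each year keeps only the countries that actually have a value.
for year in ['2005', '2017']:
    print(year, year_slice(toy, year)['Country Code'].tolist())
```

Dropping missing rows first is a design choice rather than a requirement; it simply makes it easier to see which countries will be shaded for a given year.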

In [9]:
folium.Map?
folium.Choropleth?