## Information Visualization

This notebook contains the InfoVis project proposed by Professor Renaud Blanch. The dataset used in this project is available here: https://gricad-gitlab.univ-grenoble-alpes.fr/blanchr/2020-carbon

The goal of this project is to implement the visualization proposed in the previous step (Design visualizations) in order to answer the questions we asked about the datasets. In my case, I proposed a Dorling cartogram that represents the number of trips (missions) per mode of transport for each house. In addition, a colormap should help identify which house is the largest CO2 emitter. The visualization should also support the following interactions:
    - Action: The mouse pointer passes over a circle representing a house
        -> Interaction: A table appears with information about the CO2 emissions of this house

    - Action: Select a region
        -> Interaction: Only the houses that belong to that region appear on the map. The colors of the houses are also rescaled according to the houses shown for that region (this helps us identify the largest CO2 emitter per region).

    - Action: Select a mode of transport
        -> Interaction: All examples consider only the information about the selected mode of transport. With this interaction, the size of each example may change according to the number of missions made with that mode of transport (this helps us see how the mode of transport is distributed across the houses).
          
In addition, the professor's feedback was that I should find a way to encode the X and Y axes, because they are the most powerful visual variables we have. I therefore chose to draw the circles on a geospatial dataset that contains spatial information about the Game of Thrones map. With this approach, X and Y identify the position of each house in the Game of Thrones world. We can also see which houses belong to the same region and how the CO2 emissions and the number of missions are distributed among them.

To conclude, it is important to note that I set a maximum size (100) and a minimum size (10) for each circle in the visualization. The initial idea was to keep the circles strictly proportional. However, that made the smallest examples hard to see, so these bounds were introduced to improve the visualization.
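
The effect of these bounds can be illustrated with a minimal sketch. The helper `clamped_size` and the mission counts below are hypothetical and only mirror the idea described above; they are not the notebook's data. The sketch divides by the largest value itself, whereas the notebook's `proportional_size` divides by the largest value plus one.

```python
def clamped_size(value, biggest_value, max_size=100, min_size=10):
    """Scale value proportionally to max_size, but never below min_size."""
    proportional = max_size * value / biggest_value
    return max(proportional, min_size)

# Hypothetical mission counts for three houses
missions = {"A": 400, "B": 120, "C": 3}
biggest = max(missions.values())

for house, count in missions.items():
    print(house, clamped_size(count, biggest))
```

Without the lower bound, house C would get a size below 1 and would be practically invisible on the map.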

Here are all the libraries used in this notebook. Note that I use mpld3 with a few plugins to create the interactions.

In [1]:
import csv
import pandas as pd
import os
import collections
import geopandas 
import numpy as np
from descartes import PolygonPatch
from shapely.geometry import LineString, MultiLineString
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider
import mpld3
from mpld3.utils import get_id
from mpld3 import plugins
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
## The following call renders the figures inside this notebook. However, it sometimes introduces a few bugs in
## the visualization.
mpld3.enable_notebook()


In [2]:
##Uncomment these lines to install libs that we use in this notebook

##pip install mpld3
##pip install ipympl 

## Preparing the datasets
First, I import the three datasets provided by the professor and then apply a few operations to them to create our first dataset, which will help us develop the proposed visualization.

In [3]:
##Importing the datasets
data_dir = '2020-carbon-master/data/'
mission_dataset = pd.read_csv(data_dir + "missions.tsv", sep = "\t")
user_dataset = pd.read_csv(data_dir + "users.tsv", sep = "\t")
place_dataset = pd.read_csv(data_dir + "places.tsv", sep = "\t")

##Rename the columns
place_dataset = place_dataset.rename(columns={ "#place_id": "place_id"}   )
user_dataset = user_dataset.rename(columns={ "#user_id": "user_id"}   )
mission_dataset = mission_dataset.rename(columns={ "#mission_id": "mission_id"}   )

##Merging the datasets using joins and selecting the columns that will help us
merged_dataset = mission_dataset.merge(place_dataset, how = "inner" , on = 'place_id')
merged_dataset = merged_dataset.merge(user_dataset, how = "inner" , on = 'user_id')
merged_dataset =  merged_dataset.loc[:,["name","user_id","mode","house","region","mission_id","co2","distance"]]

##Adding a new column (co2 * distance) to the created dataset
merged_dataset['real_co2'] = merged_dataset['co2'] * merged_dataset['distance']
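
The join logic above can be checked on a tiny example. The three toy tables below are made up for illustration: only the column names follow the notebook, and which table each column lives in is an assumption. They show how the two inner joins attach a place and a user to every mission before `real_co2` is computed.

```python
import pandas as pd

# Toy tables (hypothetical values; only the column names follow the notebook)
missions = pd.DataFrame({"mission_id": [1, 2, 3],
                         "user_id": [10, 10, 11],
                         "place_id": [100, 101, 100],
                         "mode": ["car", "plane", "train"],
                         "co2": [0.2, 0.3, 0.05]})
users = pd.DataFrame({"user_id": [10, 11],
                      "house": ["Stark", "Martell"],
                      "region": ["North", "Dorne"]})
places = pd.DataFrame({"place_id": [100, 101],
                       "name": ["P1", "P2"],
                       "distance": [50.0, 1000.0]})

# Two inner joins: missions -> places, then -> users
merged = (missions
          .merge(places, how="inner", on="place_id")
          .merge(users, how="inner", on="user_id"))

# Emission of each mission = co2 value * distance
merged["real_co2"] = merged["co2"] * merged["distance"]
print(merged[["house", "mode", "real_co2"]])
```

An inner join silently drops missions whose `place_id` or `user_id` has no match, so comparing `len(merged)` with the number of missions is a cheap sanity check.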

### Now, we are creating our second dataset
The first dataset was created to relate our initial tables. Here, the idea is to extract information from it. To begin, I create `Co2_By_House_dataset`, which computes the CO2 emitted by each house for each mode of transport. Then, I create another dataset (`number_missions_per_house`), which computes the number of missions per house and per transport mode.

In [4]:
##Using the created dataset to extract more information about the problem
Co2_By_House_dataset = merged_dataset.loc[:,["mode","house","real_co2"]]

#Here I am doing a few operations to aggregate the data and put them in a suitable structure
Co2_By_House_dataset = Co2_By_House_dataset.groupby(['house','mode']).sum().reset_index()
Co2_By_House_dataset = pd.pivot_table(Co2_By_House_dataset, values="real_co2", columns = ["mode"], index=['house'], )
Co2_By_House_dataset = Co2_By_House_dataset.fillna(0)

## I am creating a new column with the total amount of Co2
Co2_By_House_dataset['total'] = (Co2_By_House_dataset['car'] + Co2_By_House_dataset['other'] + Co2_By_House_dataset['plane']
                                + Co2_By_House_dataset['public'] + Co2_By_House_dataset['train'])


## A new column is being created to relate each house with its CO2 values. The pivot table moved the house
## names into the index, so I copy them into a regular 'name' column to avoid relying on the index
## (and on a hard-coded number of houses).
Co2_By_House_dataset['name'] = Co2_By_House_dataset.index
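
As a small illustration of this reshaping step, the sketch below pivots a made-up long table into one row per house and one column per mode. Using `aggfunc="sum"` directly, summing with `sum(axis=1)` and copying the index into a column are shortcuts assumed here for brevity; they avoid listing the modes or hard-coding the number of houses.

```python
import pandas as pd

# Hypothetical per-mission emissions in long format
long_df = pd.DataFrame({"house": ["Stark", "Stark", "Stark", "Martell"],
                        "mode": ["car", "car", "plane", "train"],
                        "real_co2": [10.0, 5.0, 300.0, 2.5]})

# One row per house, one column per mode, missing combinations filled with 0
wide = pd.pivot_table(long_df, values="real_co2", index="house",
                      columns="mode", aggfunc="sum", fill_value=0)

wide["total"] = wide.sum(axis=1)   # total over all modes (computed before adding 'name')
wide["name"] = wide.index          # house names as a regular column
print(wide)
```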

### Creating the Third Dataset
The idea here is to create another dataset, one that relates the houses to their number of missions. In addition, I reorder the list of houses so that the best-known houses come first, to keep the visualization as "real" as possible.

In [5]:
## A helper that moves the given house to the front of the list of houses
def put_house_first(name, list_of_houses):
    list_of_houses.remove(name)
    list_of_houses.insert(0,name)
    return list_of_houses

## Creating another dataset. This one is based on the number of missions rather than the amount of CO2
number_missions_per_house = merged_dataset.loc[:,["mode","house","region"]]

df = pd.DataFrame( columns = ['house','car','train','plane','public','other'] ).fillna(0)
list_housesCO2 = merged_dataset['house'].unique().tolist()

## Organizing the list of houses to put the best known first.
put_house_first("Arryn",list_housesCO2)
put_house_first("Lannister",list_housesCO2)
put_house_first("Tyrell",list_housesCO2)
put_house_first("Martell",list_housesCO2)
put_house_first("Stark",list_housesCO2)

## Counting, for each house, the number of missions made with each mode of transport
rows = []
for house in list_housesCO2:
    house_modes = number_missions_per_house.loc[number_missions_per_house['house'] == house, 'mode']
    rows.append({'house': house,
                 'car': (house_modes == "car").sum(),
                 'train': (house_modes == "train").sum(),
                 'plane': (house_modes == "plane").sum(),
                 'public': (house_modes == "public").sum(),
                 'other': (house_modes == "other").sum()})
number_missions_per_house = pd.DataFrame(rows, columns = ['house','car','train','plane','public','other'])

##Creating another column to compute the total number of missions per house
number_missions_per_house['total'] = (number_missions_per_house['car'] + number_missions_per_house['train'] + number_missions_per_house['plane']
                                    + number_missions_per_house['public'] + number_missions_per_house['other'])
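
The same kind of count can also be expressed with a `groupby`. The following is only a sketch on made-up rows, assuming the `house` and `mode` columns used above; `reindex` plays the role of `put_house_first` by imposing a chosen house order and also adds the modes that never occur.

```python
import pandas as pd

# Hypothetical missions: one row per mission
toy = pd.DataFrame({"house": ["Stark", "Stark", "Martell", "Stark", "Tyrell"],
                    "mode": ["car", "plane", "car", "car", "train"]})

modes = ["car", "train", "plane", "public", "other"]
house_order = ["Stark", "Martell", "Tyrell"]   # best-known houses first

# Count the rows of each (house, mode) pair and spread the modes over columns
counts = (toy.groupby(["house", "mode"]).size()
             .unstack(fill_value=0)
             .reindex(index=house_order, columns=modes, fill_value=0))
counts["total"] = counts.sum(axis=1)
print(counts)
```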


#print(merged_dataset)
#print(Co2_By_House_dataset.iloc[0])
#print(Co2_By_House_dataset)

## Developing the Visualization
Here, I start by importing the geospatial dataset of the Game of Thrones map (this is why I had to order the list of houses: to put the houses in their specific places) and by creating a new dataset with the cities and castles available in the data.

Note: I am not a big fan of Game of Thrones. For this reason, I placed only a few houses at their respective places and assigned the others to randomly chosen places (many of the houses in our dataset were unknown to me). The places were chosen among the cities and castles available in this geospatial dataset.

In [6]:
## Importing the geospatial Dataset
got_continents = geopandas.GeoDataFrame.from_file('Westeros_Essos_shp/GoTRelease/continents.shp')
got_rivers = geopandas.GeoDataFrame.from_file('Westeros_Essos_shp/GoTRelease/rivers.shp')
got_locations = geopandas.GeoDataFrame.from_file('Westeros_Essos_shp/GoTRelease/locations.shp')
got_cities = got_locations.loc[got_locations['type']=='City' ]
got_islands = geopandas.GeoDataFrame.from_file('Westeros_Essos_shp/GoTRelease/islands.shp')


xlims = np.array(got_continents.bounds[['minx', 'maxx']]).min(), np.array(got_continents.bounds[['minx', 'maxx']]).max()
ylims = np.array(got_continents.bounds[['miny', 'maxy']]).min(), np.array(got_continents.bounds[['miny', 'maxy']]).max()

##New dataframe with cities and castles
got_cities_and_castles = pd.concat([got_cities, got_locations.loc[got_locations['type']=='Castle']])

Here we create our main dataset. The idea is to keep the order of places and houses aligned. I select only 39 places (the number of houses in our dataset) so that each house can be related to one place and plotted on the GoT map.

In [7]:
## Helper that selects the rows of a place by its name
def place_named(name):
    return got_cities_and_castles.loc[got_cities_and_castles['name'] == name]

## The order matters: the first five places host the five best-known houses
new_df = pd.concat([
    place_named("Winterfell"),
    place_named("Sunspear"),
    place_named("Highgarden"),
    place_named("Storm's End"),
    place_named("The Eyrie"),
    got_cities_and_castles.loc[got_cities_and_castles['type'] == "City"],
    place_named("Dragonstone"),
    place_named("Pyke"),
    place_named("Castle Black"),
    place_named("Riverrun"),
    got_locations.loc[got_locations['type'] == "Other"],
])
new_df = new_df.iloc[:-60,:]
new_df.index = np.arange(len(new_df))

##Adding the CO2 emissions and the name of each house. At this point all of them are in their
##respective place. For instance: Winterfell => Stark
co2 = []
new_df['houses'] = list_housesCO2
for house in list_housesCO2:
    co2.append(merged_dataset['real_co2'][merged_dataset['house'] == house ].sum())
new_df['real_CO2'] = co2

##Adding a region column for each house of the map. The region is the one given by the first dataset.
##Hence, there may be a few inconsistencies in the placement (two houses of the same region can be far apart on the map
##because their places were chosen randomly)
list_house_regions = []
for house in new_df.iloc[:,6] :
        for row in merged_dataset.iterrows() :
            if house == row[1][3] :
                list_house_regions.append(row[1][4])
                break
new_df['region'] = list_house_regions
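
The region lookup can be sketched on toy data as follows. This is a hypothetical alternative that assumes every house belongs to exactly one region: it builds a house-to-region mapping once and applies it with `map`, instead of scanning the merged table for each house.

```python
import pandas as pd

# Hypothetical merged rows (one per mission) and places table
toy_merged = pd.DataFrame({"house": ["Stark", "Stark", "Martell", "Tyrell"],
                           "region": ["North", "North", "Dorne", "Reach"]})
toy_places = pd.DataFrame({"houses": ["Martell", "Stark", "Tyrell"]})

# Keep the first region seen for each house and index it by house name
house_to_region = (toy_merged.drop_duplicates("house")
                             .set_index("house")["region"])

# Look up the region of every house in the places table
toy_places["region"] = toy_places["houses"].map(house_to_region)
print(toy_places)
```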

Here we are still improving our dataset with pertinent information. We are adding the Co2 emissions by house and by mode of transport.

In [8]:
## Reordering the CO2 table so that its rows follow the order of list_housesCO2
df = Co2_By_House_dataset.loc[list_housesCO2, ['name','car','train','plane','public','other','total']]
df = df.reset_index(drop=True)
df.columns.name = None

new_df['car'] = df.iloc[:,[1]]
new_df['train'] = df.iloc[:,[2]]
new_df['plane'] = df.iloc[:,[3]]
new_df['public'] = df.iloc[:,[4]]
new_df['other'] = df.iloc[:,[5]]

##Also, we are adding the number of missions of each house with each mode of transport
new_df['car_missions'] = number_missions_per_house.iloc[:,[1]]
new_df['train_missions'] = number_missions_per_house.iloc[:,[2]]
new_df['plane_missions'] = number_missions_per_house.iloc[:,[3]]
new_df['public_missions'] = number_missions_per_house.iloc[:,[4]]
new_df['other_missions'] = number_missions_per_house.iloc[:,[5]]
new_df['total_missions'] = number_missions_per_house.iloc[:,[6]]

### Adding a CSS Style to show our data
Adding a style to the information that appears with our first interaction (mouse pointer). Without this CSS style, the information would be hard to read because of the place names that I chose to draw on the map.

In [9]:
css = """
table
{
  border-collapse: collapse;
}
th
{
  color: #ffffff;
  background-color: #000000;
}
td
{
  background-color: #cccccc;
}
table, th, td
{
  font-family:Arial, Helvetica, sans-serif;
  border: 1px solid black;
  text-align: right;
}
"""


### Functions
In the following cells I define a few functions that help me create my visualization.

In [10]:
##Defining the function that computes the size of each circle: proportional to the largest value,
##but never smaller than min_size
def proportional_size (bigger_value, actual_value, max_size):
    min_size = 10
    actual_size = (max_size*actual_value)/(bigger_value+1)
    return max(actual_size, min_size)


In [11]:
## The function that is called at each interaction to update the circle sizes with respect
## to the selected mode of transport ('real_CO2' means all modes, so the total number of missions is used)
def define_size(dataset,column = 'real_CO2'):
    if column == "real_CO2":
        column = 'total_missions'
    else:
        column = column + "_missions"

    biggest_value = dataset[column].max()
    size_circle = [proportional_size(biggest_value, value, 100) for value in dataset[column]]

    dataset['size'] = size_circle

In [12]:
## A function to update the map each time an interaction occurs. The idea of this function is to redraw the circles
## and adapt them to the selected region and mode of transport
def interaction_handle(fig,ax,dataset, transport,region,labels,label_reg) :

    plot_df = dataset
    transport = transport.lower()
    if transport == 'all':
        transport = 'real_CO2'
    if region != 'All' :
        ## Copying the filtered rows so that the 'size' column can be added without a SettingWithCopyWarning
        plot_df = dataset.loc[dataset['region'] == region].copy()
        
    list_houses = plot_df['houses'].unique().tolist()
    labels = labels.loc[ labels['name'].isin(list_houses) ]
    label_reg = label_reg.loc[ label_reg['house'].isin(list_houses) ]
    label_reg = label_reg.rename( columns = {'house': "name"})
    subtitle = []
    for i in range(len(labels)):
        label = labels.iloc[[i], :].T
        labelreg = label_reg.iloc[[i], :].T
        label.columns = ['House {0}'.format(i)]
        labelreg.columns = ['# Missions {0}'.format(i)]
        label = pd.concat([label,labelreg], axis=1)
        label = label.fillna('')
        subtitle.append(str(label.to_html()))
        
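    define_size(plot_df,transport)
    ## The values are negated so that, with the gist_heat colormap, the largest emitters get the darkest colors
    point = ax.scatter(plot_df.geometry.x, plot_df.geometry.y, c= -plot_df[transport],
                        s=plot_df['size']*20, alpha=1,cmap = plt.cm.gist_heat)

    ## Attaching the HTML tables as tooltips shown when the mouse pointer hovers over a circle
    tooltip = mpld3.plugins.PointHTMLTooltip(point, subtitle,voffset=10, hoffset=10, css=css)
    mpld3.plugins.connect(fig, tooltip)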
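
To make the tooltip construction more concrete, here is a minimal sketch of how one HTML table is built for a single house. The numbers are invented; only `DataFrame.to_html` and the transpose-then-concatenate pattern come from the function above.

```python
import pandas as pd

# Hypothetical one-row tables for a single house
emissions = pd.DataFrame({"name": ["Stark"], "car": [15.0], "plane": [300.0]})
mission_counts = pd.DataFrame({"name": ["Stark"], "car": [2], "plane": [1]})

# Transpose so that each attribute becomes a row
left = emissions.iloc[[0], :].T
right = mission_counts.iloc[[0], :].T
left.columns = ["House 0"]
right.columns = ["# Missions 0"]

# Put emissions and mission counts side by side, then render as HTML
table = pd.concat([left, right], axis=1).fillna("")
html = table.to_html()
print(html)
```

Each string produced this way is what the tooltip plugin displays for the corresponding circle, styled by the CSS defined earlier.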



        

In [13]:
##Function to create the continents,islands,cities,rivers and the circles representing the houses.
def show_map (Regions,Transports):
    fig, ax = plt.subplots(1, figsize = (16, 10))
    fig.subplots_adjust(left=0, bottom=0)
    for i, geo in enumerate(got_continents.geometry): # Adding continents
        ax.add_patch(PolygonPatch(geo, color='burlywood', ec='gray', lw=1))
        ax.text(geo.centroid.xy[0][0], geo.centroid.xy[1][0], s=got_continents.iloc[i]['name'], fontsize=10, color='k')
    for geo in got_islands.geometry: # Adding islands
        ax.add_patch(PolygonPatch(geo, color='burlywood', ec='k', lw=1))
    for i, geo in enumerate(new_df.geometry): # Adding cities
        if (new_df.loc[i]['type'] == 'Castle'):
            ax.text(geo.xy[0][0]+1, geo.xy[1][0]+1, s=new_df.iloc[i]['name'],fontsize=5, color='k')
            ax.plot(geo.xy[0], geo.xy[1], marker='*', color='maroon')
        if (new_df.loc[i]['type'] == 'City'):   
            ax.text(geo.xy[0][0]+1, geo.xy[1][0]+1, s=new_df.iloc[i]['name'],fontsize=5, color='k')
            ax.plot(geo.xy[0], geo.xy[1], marker='.', color='maroon')
        if (new_df.loc[i]['type'] == 'Other'):   
            ax.plot(geo.xy[0], geo.xy[1], marker='+', color='maroon')
            #ax.text(geo.xy[0][0]+1, geo.xy[1][0]+1, s=new_df.iloc[i]['name'],fontsize=5, color='k')
    for geo in got_rivers.geometry: # Adding rivers
        if isinstance(geo, LineString): # Rivers defined as a single line
            ax.plot(geo.xy[0], geo.xy[1], color='cornflowerblue')
        elif isinstance(geo, MultiLineString): # Rivers made of several lines
            for lj in geo.geoms:
                ax.plot(lj.xy[0], lj.xy[1], color='cornflowerblue')

    interaction_handle(fig,ax,new_df,Transports,Regions,df,number_missions_per_house.iloc[:,[0,1,2,3,4,5,6]])

## Finally, our map!

With our map we can answer the two questions that I proposed in the previous step:

       -> How is the mode of transport distributed across the houses?
       -> Which house emits the most CO2?
       
First, we can get information by analyzing the size of each circle. Each circle on the map represents a house: the bigger the circle, the more missions the house has carried out. If you want to analyze the number of missions per mode of transport for each house, select a mode and the size of the circles (representing the number of missions of each house) changes according to the selected mode. Equally important, if you choose to analyze by region, the size of the circles also changes according to the houses that belong to that region.


In addition, you can analyze the houses based on their CO2 emissions (represented by the color of each circle). When you change the transport mode, the colors change, because the visualization then shows the emissions for the selected mode. You can thus analyze and compare the emissions of each house. The same happens when you change the region (the visualization adapts to the examples shown on the map), which allows a better comparison between the selected houses.

Thus, the first question is answered in two ways. The first is by reading the labels shown when the mouse pointer passes over a house (analyzing the distribution of the modes of transport used by a specific house), and the second is by comparing the size of the circles (a comparison between houses). To answer the second question we use the circle colors: the CO2 emissions are encoded into a color scale. In other words, the darker a circle is, the more CO2 the house emits. In addition, you can compare the CO2 emissions of each mode of transport for a specific house through the mouse pointer interaction.
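
The color encoding relies on a sign flip: the plotting code passes the negated emissions to a colormap whose low end is dark. The sketch below reproduces, in plain Python and with invented values, the linear min-max normalization that matplotlib applies by default, to show that the largest emitter ends up at position 0 of the colormap (the dark end of `gist_heat`). The helper `normalize` is hypothetical and only stands in for that default behaviour.

```python
def normalize(values):
    """Linearly rescale values to [0, 1], as a default colormap normalization does."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]

# Hypothetical emissions per house
emissions = {"A": 900.0, "B": 300.0, "C": 100.0}

# Negating the values reverses the order: the largest emitter becomes the minimum
negated = [-v for v in emissions.values()]
positions = dict(zip(emissions, normalize(negated)))
print(positions)
```

Because the normalization is recomputed on whatever houses are plotted, filtering by region rescales the colors to the houses of that region.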

Finally, it is important to note that there is a small difference between the proposed visualization and its actual implementation: this visualization does not have sliders. On reflection, I realized that they were unnecessary since I was not selecting from a range of numbers. For this reason, I chose dropdown menus instead.

In [14]:
import warnings
warnings.filterwarnings('ignore')      
##The interact function of ipywidgets creates the dropdown menus and calls show_map again at each interaction
interact(show_map, Transports=['All', 'Car', 'Train', 'Plane', 'Public', 'Other'],
         Regions=['All', 'North', 'Dorne', 'Reach', 'Riverlands', 'Westerlands', 'Vale', 'Crownlands'])
