# Importing the required modules

Let's start by installing or upgrading the modules we will need in this course.

`pip` is Python's package manager. The `--upgrade` option upgrades each package to its most recent version. The `--user` option installs at the user-account level, so no administrator rights are required.
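In a notebook, `!pip` may resolve to a different Python than the one running the kernel. A minimal sketch, using only the standard-library `sys` and `site` modules, builds the same command against the kernel's own interpreter and shows where `--user` packages end up. The package list is the one used in this course; the command is only printed here, not run.

```python
import site
import sys

packages = ["nxviz", "numpy", "pandas", "matplotlib", "seaborn",
            "plotly", "cufflinks", "dash"]

# Use the interpreter that runs this kernel, so the packages land where
# the notebook can import them.
command = [sys.executable, "-m", "pip", "install", "--upgrade", "--user"] + packages
print(" ".join(command))

# Directory that receives packages installed with --user
user_site = site.getusersitepackages()
print(user_site)
```

To actually run the installation, `command` could be passed to `subprocess.check_call`.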

In [1]:
!pip install nxviz numpy pandas matplotlib seaborn plotly cufflinks dash --upgrade --user

Requirement already up-to-date: nxviz in c:\users\pierr\appdata\roaming\python\python36\site-packages (0.3.7)
Requirement already up-to-date: numpy in c:\users\pierr\appdata\roaming\python\python36\site-packages (1.14.5)
Requirement already up-to-date: pandas in c:\users\pierr\appdata\roaming\python\python36\site-packages (0.23.1)
Requirement already up-to-date: matplotlib in c:\users\pierr\anaconda3\lib\site-packages (2.2.2)
Requirement already up-to-date: seaborn in c:\users\pierr\anaconda3\lib\site-packages (0.8.1)
Requirement already up-to-date: plotly in c:\users\pierr\anaconda3\lib\site-packages (2.7.0)
Requirement already up-to-date: cufflinks in c:\users\pierr\anaconda3\lib\site-packages (0.12.1)
Requirement already up-to-date: dash in c:\users\pierr\anaconda3\lib\site-packages (0.21.1)
Requirement not upgraded as not directly required: networkx==2.1 in c:\users\pierr\anaconda3\lib\site-packages (from nxviz) (2.1)
Requirement not upgraded as not directly required: setuptools==3

nxviz 0.3.7 has requirement numpy==1.14.3, but you'll have numpy 1.14.5 which is incompatible.
nxviz 0.3.7 has requirement pandas==0.22.0, but you'll have pandas 0.23.1 which is incompatible.


In [2]:
import cufflinks as cf

import numpy as np

import pandas as pd

import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

init_notebook_mode(connected=True)

# 1. DataFrame columns as lines

plotly offers many graph objects (lines, bars, etc.). Let's start with a few points joined by lines.

The input data can be lists of numbers, numpy arrays, or pandas DataFrames.

In [3]:
# First build each element of the plot (here, points joined by lines)
trace1 = go.Scatter(
    x=[1,2,3],
    y=[4,5,6],
    marker={
        'color': 'red',
        'symbol': 104,
        'size': 10
    },
    mode="markers+lines",
    text=["one", "two", "three"],
    name='1st Trace'
)

# Add them to the figure's data (a plain list of traces)
data = [trace1]

# Define the figure layout (plot title, axis titles)
layout=go.Layout(title="First Plot",
                 xaxis={
                     'title':'x1'
                 },
                 yaxis={
                     'title':'x2'
                 }
)

# Create the Figure object
figure = go.Figure(data=data, layout=layout)

# Display the figure in the notebook
iplot(figure, filename='pyguide_1')
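A plotly figure is essentially a nested structure with two keys: `data` (a list of traces) and `layout`. As a hedged sketch, the figure above can be written with plain Python dictionaries and no plotly import. The assumption is that `iplot` accepts such a dictionary, which is how the Mapbox example later in this notebook passes its figure. The name `figure_spec` is chosen for this illustration.

```python
import json

trace1 = {
    "type": "scatter",
    "x": [1, 2, 3],
    "y": [4, 5, 6],
    "mode": "markers+lines",
    "marker": {"color": "red", "symbol": 104, "size": 10},
    "text": ["one", "two", "three"],
    "name": "1st Trace",
}

figure_spec = {
    "data": [trace1],
    "layout": {
        "title": "First Plot",
        "xaxis": {"title": "x1"},
        "yaxis": {"title": "x2"},
    },
}

# The whole description is JSON-serializable; print the beginning of it
print(json.dumps(figure_spec, indent=2)[:200])
```

Under that assumption, `iplot(figure_spec)` would display the same plot as the graph-object version.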

In [4]:
# Build a dummy DataFrame (here, cumulative sums of uniform draws, i.e. a random walk with positive steps).
df = pd.DataFrame(np.random.random(size=(15, 3)), columns=['Pierre', 'Paul', 'Jacques']).cumsum()

df.head(3)

|   | Pierre | Paul | Jacques |
|---|---|---|---|
| 0 | 0.411392 | 0.886754 | 0.694656 |
| 1 | 1.333084 | 1.347745 | 1.033688 |
| 2 | 1.778165 | 2.240335 | 1.47937 |
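Because `np.random.random` draws from [0, 1), every step above is positive and the curves can only go up. Here is a sketch of a walk that can move in both directions, assuming normally distributed steps and a fixed seed (both are choices made for this illustration, not part of the course material):

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)  # fixed seed so the result is reproducible

# Steps centered on zero: each series can go up or down
steps = rng.normal(loc=0.0, scale=1.0, size=(15, 3))
walk = pd.DataFrame(steps, columns=['Pierre', 'Paul', 'Jacques']).cumsum()

print(walk.head(3))
```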


In [5]:
# Plot it: one line per column
en_x = df.index

traces = []
for col in df:
    traces.append(
        go.Scatter(
            x=en_x,
            y=df[col],
            name=col  # label the line in the legend
        )
    )

figure = go.Figure(
    data=traces,  # a plain list of traces is enough
    layout=go.Layout(title="A more complex plot")
)

iplot(figure)

# 2. And now, bars!

Let's start by loading some data.

In [6]:
df = pd.read_csv('titanic.csv')

df.info()

df.sample(3)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
survived       891 non-null int64
pclass         891 non-null int64
sex            891 non-null object
age            714 non-null float64
sibsp          891 non-null int64
parch          891 non-null int64
fare           891 non-null float64
embarked       889 non-null object
class          891 non-null object
who            891 non-null object
adult_male     891 non-null bool
deck           203 non-null object
embark_town    889 non-null object
alive          891 non-null object
alone          891 non-null bool
dtypes: bool(2), float64(2), int64(4), object(7)
memory usage: 92.3+ KB


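|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59 | 0 | 3 | male | 11.0 | 5 | 2 | 46.9 | S | Third | child | False |  | Southampton | no | False |
| 328 | 1 | 3 | female | 31.0 | 1 | 1 | 20.525 | S | Third | woman | False |  | Southampton | yes | False |
| 568 | 0 | 3 | male |  | 0 | 0 | 7.2292 | C | Third | man | True |  | Cherbourg | no | True |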


We will look at the survival rate by passenger class.

This is easy to compute with `pandas`, using either `groupby` or `pivot_table`. We prefer the latter here because it is more general: it can also spread a second variable across the columns.
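To make the comparison concrete, here is a minimal sketch on a tiny made-up table (the eight rows below are hypothetical, not Titanic data; only the column names match). It computes the same mean with `groupby` and with `pivot_table`, then shows how each approach spreads a second variable across columns.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    'class': ['First', 'First', 'Second', 'Second', 'Third', 'Third', 'Third', 'Third'],
    'sex': ['female', 'male', 'female', 'male', 'female', 'male', 'male', 'female'],
    'survived': [1, 0, 1, 0, 1, 0, 0, 0],
})

# One grouping variable: both approaches give the same numbers
by_groupby = toy.groupby('class')['survived'].mean()
by_pivot = toy.pivot_table(index='class', values='survived', aggfunc='mean')['survived']

# Two variables: pivot_table takes a `columns` argument,
# while groupby needs an extra unstack step
wide_groupby = toy.groupby(['class', 'sex'])['survived'].mean().unstack('sex')
wide_pivot = toy.pivot_table(index='class', columns='sex', values='survived', aggfunc='mean')

print(by_pivot)
print(wide_pivot)
```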

In [7]:
dft = df.pivot_table(index='class', values='survived', aggfunc='mean')

dft

| class | survived |
|---|---|
| First | 0.62963 |
| Second | 0.472826 |
| Third | 0.242363 |


This is what our table looks like. Now let's turn it into a bar chart.

I deliberately introduce a few customizations (color, etc.) because they may be useful to you later.

In [8]:
trace = go.Bar(
    x=dft.index,
    y=dft['survived'],
    marker={
        'color': 'rgba(210, 105, 30, 0.5)'
    }
)

layout = go.Layout(
    title="What are your chances of surviving the Titanic, depending on your class?",
    yaxis={
        'title': "survival rate",
        'ticklen': 0.1,
        'tickformat': '.0%',
        'range': [0, 1]
    },
    xaxis={
        'title': "passenger class"
    }
)

figure = go.Figure(
    data=[trace],
    layout=layout
)

iplot(figure)
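The `tickformat` value `'.0%'` follows the d3-format mini-language, which is modeled on Python's own format specification. As a quick check, assuming the two behave the same for this simple specifier, the survival rates from the table above can be formatted in plain Python to preview the axis labels:

```python
survival_rates = {'First': 0.62963, 'Second': 0.472826, 'Third': 0.242363}

# '.0%' multiplies by 100, rounds to 0 decimals and appends a percent sign
labels = {cls: '{:.0%}'.format(rate) for cls, rate in survival_rates.items()}
print(labels)  # {'First': '63%', 'Second': '47%', 'Third': '24%'}
```

Changing the specifier to `'.1%'` would keep one decimal place instead.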

What if we wanted to display several bars?

For example, within each class, the survival rate by sex.

In [9]:
dft = df.pivot_table(index='class', columns='sex', values='survived', aggfunc='mean')

dft

| class / sex | female | male |
|---|---|---|
| First | 0.968085 | 0.368852 |
| Second | 0.921053 | 0.157407 |
| Third | 0.5 | 0.135447 |


In [10]:
colors = {
    'female': 'rgba(204, 204, 255, 0.5)',
    'male': 'rgba(72, 137, 151, 0.5)'
}

traces = []
for col in dft:  # here col takes each value of sex ('female', 'male')
    traces.append(
        go.Bar(
            x=dft.index,
            y=dft[col],
            marker={
                'color': colors[col]
            },
            name=col
        )
    )

layout = go.Layout(
    title="What are your chances of surviving the Titanic, depending on your class and sex?",
    yaxis={
        'title': "survival rate",
        'ticklen': 0.1,
        'tickformat': '.0%',
        'range': [0, 1]
    },
    xaxis={
        'title': "passenger class"
    }
)

figure = go.Figure(
    data=traces,
    layout=layout
)

iplot(figure)

What if we wanted to stack the bars?

Let's display the number of passengers by class and by port of embarkation.

In [11]:
dft = df.pivot_table(index='class', columns='embark_town', values='survived', aggfunc='count')

dft

| class / embark_town | Cherbourg | Queenstown | Southampton |
|---|---|---|---|
| First | 85 | 2 | 127 |
| Second | 17 | 3 | 164 |
| Third | 66 | 72 | 353 |


In [12]:
traces = []
for col in dft:  # here col takes each value of embark_town
    traces.append(
        go.Bar(
            x=dft.index,
            y=dft[col],
            name=col
        )
    )

layout = go.Layout(
    title="Where did the Titanic's passengers embark?",
    yaxis={
        'title': "number of passengers"
    },
    xaxis={
        'title': "passenger class"
    },
    barmode='stack'
)

figure = go.Figure(
    data=traces,
    layout=layout
)

iplot(figure)
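With `barmode='stack'` the bar heights are raw counts, so the much larger third class dominates the chart. A sketch of one possible variation, a 100% stacked chart, divides each row by its total before plotting. The counts are copied from the table above; the variable names `counts` and `shares` are chosen for this example.

```python
import pandas as pd

counts = pd.DataFrame(
    {'Cherbourg': [85, 17, 66],
     'Queenstown': [2, 3, 72],
     'Southampton': [127, 164, 353]},
    index=pd.Index(['First', 'Second', 'Third'], name='class'),
)

# Divide each row by its total so that every class sums to 1
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(3))
```

Passing `shares[col]` instead of `dft[col]` as `y`, together with `'tickformat': '.0%'` on the y axis, would give bars of equal height that show only the mix of ports.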

# 3. How about some maps?

In [13]:
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_ebola.csv')
df.head()

cases = []
colors = ['rgb(239,243,255)','rgb(189,215,231)','rgb(107,174,214)','rgb(33,113,181)']
months = {6:'June',7:'July',8:'Aug',9:'Sept'}

for i in range(6,10)[::-1]:
    cases.append(go.Scattergeo(
            lon = df[ df['Month'] == i ]['Lon'],
            lat = df[ df['Month'] == i ]['Lat'],
            text = df[ df['Month'] == i ]['Value'],
            name = months[i],
            marker = dict(
                size = df[ df['Month'] == i ]['Value']/50,
                color = colors[i-6],
                line = dict(width = 0)
            )
        )
    )

cases[0]['text'] = df[ df['Month'] == 9 ]['Value'].map('{:.0f}'.format).astype(str)+' '+\
    df[ df['Month'] == 9 ]['Country']
cases[0]['mode'] = 'markers+text'
cases[0]['textposition'] = 'bottom center'

inset = [
    go.Choropleth(
        locationmode = 'country names',
        locations = df[ df['Month'] == 9 ]['Country'],
        z = df[ df['Month'] == 9 ]['Value'],
        text = df[ df['Month'] == 9 ]['Country'],
        colorscale = [[0,'rgb(0, 0, 0)'],[1,'rgb(0, 0, 0)']],
        autocolorscale = False,
        showscale = False,
        geo = 'geo2'
    ),
    go.Scattergeo(
        lon = [21.0936],
        lat = [7.1881],
        text = ['Africa'],
        mode = 'text',
        showlegend = False,
        geo = 'geo2'
    )
]

layout = go.Layout(
    title = 'Ebola cases reported by month in West Africa 2014<br> \
Source: <a href="https://data.hdx.rwlabs.org/dataset/rowca-ebola-cases">\
HDX</a>',
    geo = dict(
        resolution = 50,
        scope = 'africa',
        showframe = False,
        showcoastlines = True,
        showland = True,
        landcolor = "rgb(229, 229, 229)",
        countrycolor = "rgb(255, 255, 255)" ,
        coastlinecolor = "rgb(255, 255, 255)",
        projection = dict(
            type = 'mercator'  # projection names are lowercase
        ),
        lonaxis = dict( range= [ -15.0, -5.0 ] ),
        lataxis = dict( range= [ 0.0, 12.0 ] ),
        domain = dict(
            x = [ 0, 1 ],
            y = [ 0, 1 ]
        )
    ),
    geo2 = dict(
        scope = 'africa',
        showframe = False,
        showland = True,
        landcolor = "rgb(229, 229, 229)",
        showcountries = False,
        domain = dict(
            x = [ 0, 0.6 ],
            y = [ 0, 0.6 ]
        ),
        bgcolor = 'rgba(255, 255, 255, 0.0)',
    ),
    legend = dict(
           traceorder = 'reversed'
    )
)

fig = go.Figure(layout=layout, data=cases+inset)

iplot(fig, validate=False, filename='West Africa Ebola cases 2014' )
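The loop above re-evaluates the filter `df['Month'] == i` for every attribute of every trace. A hedged sketch of an alternative splits the table once with `groupby`. The small DataFrame below is made up for illustration (its values are not from the Ebola dataset); only the column names `Month`, `Value`, `Lat` and `Lon` mirror the ones used above, and the traces are written as plain dictionaries to keep the example free of plotly imports.

```python
import pandas as pd

# Hypothetical stand-in for the Ebola table
toy = pd.DataFrame({
    'Month': [6, 6, 7, 8, 9, 9],
    'Value': [100.0, 50.0, 200.0, 400.0, 800.0, 300.0],
    'Lat': [9.5, 8.5, 9.5, 8.5, 9.5, 8.5],
    'Lon': [-13.7, -11.8, -13.7, -11.8, -13.7, -11.8],
})

months = {6: 'June', 7: 'July', 8: 'Aug', 9: 'Sept'}

# One dict-style trace per month, latest month first as in the original loop
traces = []
for month, group in sorted(toy.groupby('Month'), key=lambda kv: kv[0], reverse=True):
    traces.append({
        'type': 'scattergeo',
        'lon': group['Lon'].tolist(),
        'lat': group['Lat'].tolist(),
        'text': group['Value'].tolist(),
        'name': months[month],
        'marker': {'size': (group['Value'] / 50).tolist()},
    })

print([t['name'] for t in traces])
```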

In [14]:
df = pd.DataFrame({
    'ville': ['Nantes', 'Paris', 'Marseille'],
    'lat': [47.216667, 48.866667, 43.300000],
    'lon': [-1.550000, 2.333333, 5.400000],
    'code': ['44109', '75100', '13200'],
    'population': [934165, 2206488, 861635]
})

df

|   | ville | lat | lon | code | population |
|---|---|---|---|---|---|
| 0 | Nantes | 47.216667 | -1.55 | 44109 | 934165 |
| 1 | Paris | 48.866667 | 2.333333 | 75100 | 2206488 |
| 2 | Marseille | 43.3 | 5.4 | 13200 | 861635 |


In [15]:
trace = go.Scattergeo(
    lon=df['lon'],
    lat=df['lat'],
    text=df['ville'],
    marker={
        'size': 50. * df['population'] / df['population'].max()
    }
)

layout = go.Layout(
    title="Main cities of France",
    geo={
        'scope': 'europe',
        'resolution': 50,
        'showland': True,
        'landcolor': "rgb(229, 229, 229)",
        'countrycolor': "rgb(255, 255, 255)",
        'projection': {'type': 'mercator'},
        'lonaxis': {
            'range': [-5.0, 8.0]
        },
        'lataxis': {
            'range': [42.0, 51.0]
        }
    }
)

figure = go.Figure(layout=layout, data=[trace])
iplot(figure)
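Scaling the marker `size` linearly with population makes the marker diameter proportional to population, so the disc area grows with its square and large cities look disproportionately big. A sketch of an alternative, assuming we want areas proportional to population, takes the square root before scaling. The populations are those of the table above, and `max_size` is the same 50-pixel cap.

```python
import math

populations = {'Nantes': 934165, 'Paris': 2206488, 'Marseille': 861635}
max_size = 50.0
max_pop = max(populations.values())

# Diameter proportional to population (what the cell above does)
linear_sizes = {city: max_size * pop / max_pop for city, pop in populations.items()}

# Area proportional to population: scale the diameter by the square root
area_sizes = {city: max_size * math.sqrt(pop / max_pop) for city, pop in populations.items()}

for city in populations:
    print(city, round(linear_sizes[city], 1), round(area_sizes[city], 1))
```

plotly markers also have a `sizemode` attribute that can be set to `'area'` (combined with `sizeref`) for the same purpose; the manual square root is shown here only because it makes the reasoning explicit.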

`Scattergeo` is handy for maps covering large areas, but if you want a map with cities, streets, etc., you will quickly hit its limits.

`Plotly` can rely on [Mapbox](https://www.mapbox.com/) for this.

You need a Mapbox account and an access token to use it. Here is a small function that reads the token from a file.

In [16]:
def get_mapbox_access_token(folderpath='.', filename="mapbox.txt"):
    import os

    with open(os.path.join(folderpath, filename), 'r') as file:
        # Strip the trailing newline that text editors usually add
        s = file.read().strip()

    return s
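A missing token file produces a fairly terse error, and a token saved from a text editor usually ends with a newline. The sketch below is one hedged variant of the reader that reports a clearer message and strips surrounding whitespace. The function name `read_token` and the error wording are inventions for this example, and the temporary file with its fake token only stands in for a real `mapbox.txt`.

```python
import os
import tempfile


def read_token(folderpath='.', filename='mapbox.txt'):
    """Return the token stored in the file, without surrounding whitespace."""
    path = os.path.join(folderpath, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError('No token file found at {}'.format(path))
    with open(path, 'r') as file:
        return file.read().strip()


# Demonstration with a throwaway file and a fake token
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'mapbox.txt'), 'w') as file:
        file.write('pk.fake-token-for-demo\n')
    token = read_token(folder)

print(repr(token))
```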

In [18]:
mapbox_access_token = get_mapbox_access_token()

data = [
    go.Scattermapbox(
        lat=['38.91427','38.91538','38.91458',
             '38.92239','38.93222','38.90842',
             '38.91931','38.93260','38.91368',
             '38.88516','38.921894','38.93206',
             '38.91275'],
        lon=['-77.02827','-77.02013','-77.03155',
             '-77.04227','-77.02854','-77.02419',
             '-77.02518','-77.03304','-77.04509',
             '-76.99656','-77.042438','-77.02821',
             '-77.01239'],
        mode='markers',
        marker=dict(
            size=9
        ),
        text=["The coffee bar","Bistro Bohem","Black Cat",
             "Snap","Columbia Heights Coffee","Azi's Cafe",
             "Blind Dog Cafe","Le Caprice","Filter",
             "Peregrine","Tryst","The Coupe",
             "Big Bear Cafe"],
    )
]

layout = go.Layout(
    autosize=True,
    hovermode='closest',
    mapbox=dict(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=dict(
            lat=38.92,
            lon=-77.07
        ),
        pitch=0,
        zoom=10
    ),
)

fig = dict(data=data, layout=layout)
iplot(fig, filename='Multiple Mapbox')
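The map center above is hard-coded. As a small sketch, assuming the center should simply be the mean of the plotted points, it can be derived from the coordinates themselves. The coordinate strings are the ones from the cell above, converted to floats; the name `center` is chosen for this example.

```python
lat = ['38.91427', '38.91538', '38.91458', '38.92239', '38.93222',
       '38.90842', '38.91931', '38.93260', '38.91368', '38.88516',
       '38.921894', '38.93206', '38.91275']
lon = ['-77.02827', '-77.02013', '-77.03155', '-77.04227', '-77.02854',
       '-77.02419', '-77.02518', '-77.03304', '-77.04509', '-76.99656',
       '-77.042438', '-77.02821', '-77.01239']

# Convert the strings to numbers before averaging
lat_values = [float(v) for v in lat]
lon_values = [float(v) for v in lon]

center = {
    'lat': sum(lat_values) / len(lat_values),
    'lon': sum(lon_values) / len(lon_values),
}
print(center)
```

The resulting dictionary could then be passed as `center=center` inside the `mapbox` settings of the layout.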

# What if I want more chart types?

Then head over to the [plotly Python documentation](https://plot.ly/python/).