In [1]:
#!pip install plotly kaleido 

In [2]:
import pandas as pd
import plotly.express as px

In [12]:
data = pd.read_csv("wdi.csv")
data.head()

Unnamed: 0,iso3,country,year,continent,region,population,gdp,gdp_capita,life_expectancy,inflation,fertility,maternal_death,infant_mortality_per_1000,suicides_per_100k
0,AFG,Afghanistan,1990,Asia,South Asia,10694796.0,,,45.967,,7.565,,120.9,
1,ALB,Albania,1990,Europe,Europe & Central Asia,3286542.0,8379850000.0,2549.746801,73.144,-0.431369,2.9,,35.4,
2,DZA,Algeria,1990,Africa,Middle East & North Africa,25518074.0,177965000000.0,6974.076379,67.416,30.259599,4.556,,43.6,
3,ASM,American Samoa,1990,Oceania,East Asia & Pacific,47818.0,,,,,,,,
4,AND,Andorra,1990,Europe,Europe & Central Asia,53569.0,,,,7.326244,,,9.1,


In [13]:
df = data[(data.continent.isin(["Europe","Africa"])) & (data.year==2020)].reset_index(drop=True)
df.head()

Unnamed: 0,iso3,country,year,continent,region,population,gdp,gdp_capita,life_expectancy,inflation,fertility,maternal_death,infant_mortality_per_1000,suicides_per_100k
0,ALB,Albania,2020,Europe,Europe & Central Asia,2837849.0,39911620000.0,14064.038615,76.989,0.696542,1.4,0.011492,8.4,
1,DZA,Algeria,2020,Africa,Middle East & North Africa,43451666.0,497618300000.0,11452.226624,74.453,-5.025876,2.942,0.241703,19.6,
2,AND,Andorra,2020,Europe,Europe & Central Asia,77700.0,,,,1.113786,,,2.7,
3,AGO,Angola,2020,Africa,Sub-Saharan Africa,33428486.0,212853800000.0,6367.437317,62.261,10.763105,5.371,1.263877,48.7,
4,AUT,Austria,2020,Europe,Europe & Central Asia,8916864.0,510568000000.0,57258.690227,81.192683,2.558602,1.44,0.007284,3.0,


# 1. Plot Types

- Plotly Express supports a wide variety of plot types (scatter, line, bar, pie, choropleth, etc.)
- In the following, we focus mostly on scatter plots, but most of what we learn here applies to other plot types as well.
- [Full list of plot types](https://plotly.com/python-api-reference/plotly.express.html)
- Gallery of [basic charts](https://plotly.com/python/basic-charts/), [statistical charts](https://plotly.com/python/statistical-charts/) and [maps](https://plotly.com/python/maps/)

In [4]:
px.scatter(data_frame=df, x='gdp_capita', y='life_expectancy')

In [None]:
px.bar(df, x='iso3', y='gdp_capita')

In [18]:
px.line(df, x='iso3', y='gdp_capita')

In [19]:
px.choropleth(df, locations='iso3', color='life_expectancy')

# 2. Data encoding


- When we create a scatterplot, we must specify which data columns are encoded as the `x` and `y` axes. 
- Similarly, we can encode additional data columns using other visual properties such as `color`, `size` or `symbol`.
- Categorical variables are often encoded using `color` or `symbol`, while continuous variables can be encoded using `color` or `size`.

| **Visual Property** | **Description**                                                        |
| ------------------- | ---------------------------------------------------------------------- |
| `x`                 | Position on the x-axis                                                 |
| `y`                 | Position on the y-axis                                                 |
| `size`              | Size of the marker (area or diameter)                                  |
| `symbol`            | Shape of the marker (circle, square, diamond, ...)                     |
| `color`             | Marker or trace color                                                  |
| `text`              | Text labels shown next to or on the markers                            |
| `hover_name`        | Name that appears when hovering over a point                           |
| `hover_data`        | List of additional columns shown when hovering over a point            |
| `facet_col`         | Subplots arranged in columns (based on a categorical variable)         |
| `facet_row`         | Subplots arranged in rows (based on a categorical variable)            |
| `animation_frame`   | Determines the animation frames (e.g., one frame per year or category) |


In [16]:
# Try out different encodings for the same data column.
# A notebook cell only displays its last expression, so call .show() to render every figure:

px.scatter(df, x='life_expectancy', y='gdp_capita', color='continent', hover_name="country").show()
px.scatter(df, x='life_expectancy', y='gdp_capita', symbol='continent', hover_name="country").show()
px.scatter(df, x='life_expectancy', y='gdp_capita', facet_col='continent', hover_name="country").show()
px.scatter(df, x='life_expectancy', y='gdp_capita', color='continent', symbol='continent', hover_name="country").show()

In [7]:
px.scatter(df, x='gdp_capita', y='life_expectancy', color='fertility', hover_name='country')

# 3. Basic Labelling

- Plotly Express exposes several parameters for labelling the plot (`title`, `subtitle`, and axis labels via `labels`).
- More detailed customization is possible with the `update_layout()` and `update_traces()` methods (see the following notebook).

In [8]:
fig = px.scatter(df, 
           x='gdp_capita', 
           y='life_expectancy', 
           color='continent', 
           title='Do rich people live longer?',
           subtitle='GDP per capita vs life expectancy in 2020',
           labels={'gdp_capita': 'GDP per Capita (in US $)',
                   'life_expectancy': 'Life Expectancy (in years)',
                   'continent': 'Continent'})

fig

# 4. Exporting

- Exporting Plotly figures to static images (png, jpg, svg, pdf) requires the `kaleido` package.
- By default, Plotly figures fill the full available width of the Jupyter Notebook cell. To control the figure size, set the `width` and `height` parameters (in pixels), either when creating the figure or when exporting it.
- **Raster formats (png, jpg)** can look pixelated. To avoid this, increase the `scale` parameter (e.g. to 2 or 3).
- **Vector formats (svg, pdf)** are resolution-independent and always look sharp.
- **HTML** output also looks sharp and preserves interactivity (zooming, hovering, etc.).

In [9]:
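import os

os.makedirs("figures", exist_ok=True)  # make sure the output folder exists before exporting
fig.write_image("figures/plot.png", width=800, height=500, scale=1)
fig.write_image("figures/plot.svg", width=800, height=500)
fig.write_html("figures/plot.html", default_width=800, default_height=500)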

# 5. Pandas Plotting Backend

In [10]:
pd.options.plotting.backend = "plotly"  # matplotlib is default

In [11]:
df.plot.scatter(x='gdp_capita', y='life_expectancy', color='continent')