# The Plotly library

This tutorial demonstrates the basic capabilities of `plotly`, a popular Python library for creating interactive data visualizations.

Some exercises may require further tools for a better or simpler solution. Feel free to check the official documentation, ask for help, search for hints, or even ask a generative AI to help solve the problem.

It may be necessary to install or upgrade `plotly` to the newest version, by running the following cell. Note that this tutorial notebook was tested using `plotly-6.5.0`.

In [None]:
%pip install -U plotly

Plotly Express is the high-level interface for creating charts in `plotly`, and is imported as `px`. There is also a low-level interface called Plotly Graph Objects for better customization.

In [None]:
# needed for some setup tasks and exercises:
import numpy as np
import pandas as pd

In [None]:
import plotly.express as px
import plotly.graph_objects as go # for more customization

Subplots can also be created in `plotly`.

In [None]:
from plotly.subplots import make_subplots # for subplots

# Datasets used

In [None]:
# for setting up datasets
import pandas as pd
import plotly.express as px

The following datasets are used in this tutorial.

*   Gapminder (for most charts)
*   Hungarian counties
*   Hungarian municipalities

The Gapminder dataset is a built-in dataset in `plotly`. This dataset provides country-level data on life expectancy at birth, population size, and GDP per capita, available in five-year intervals from 1952 to 2007.

In [None]:
df = px.data.gapminder()

df.head()

Hungarian counties and populations are available as two pieces. First, a GeoJSON file is needed, which contains the geometry of the county borders. It can either be downloaded from the original site by code, or opened as a local file. In Google Colab, it should be uploaded first, named `counties.geojson`. Either option can be used.

In [None]:
# Option (1): downloading counties.geojson by code
import json
import urllib.request
url = "https://raw.githubusercontent.com/wuerdo/geoHungary/master/counties.geojson"
with urllib.request.urlopen(url) as r:
    hu_geo = json.load(r)

In [None]:
# Option (2): reading counties.geojson as a file (after manually provided)
import json
with open("counties.geojson", "r", encoding="utf-8") as f:
    hu_geo = json.load(f)

Second, the names of the Hungarian counties (as appearing in the GeoJSON) and their populations are defined below.

In [None]:
df_hungarian_counties = {
    "id": [
        "Budapest",
        "Bács-Kiskun",
        "Baranya",
        "Békés",
        "Borsod-Abaúj-Zemplén",
        "Csongrád",  # "Csongrád-Csanád" today, but the geojson is older
        "Fejér",
        "Győr-Moson-Sopron",
        "Hajdú-Bihar",
        "Heves",
        "Jász-Nagykun-Szolnok",
        "Komárom-Esztergom",
        "Nógrád",
        "Pest",
        "Somogy",
        "Szabolcs-Szatmár-Bereg",
        "Tolna",
        "Vas",
        "Veszprém",
        "Zala"
    ],
    "population": [
        1685209,
        488547,
        351158,
        307112,
        610927,
        388106,
        418562,
        471648,
        520129,
        282490,
        349726,
        299262,
        178815,
        1336134,
        290245,
        520551,
        204567,
        245598,
        333345,
        257371
    ]
}
# Hungarian counties and populations, on 2025-01-01
# (source: KSH, https://www.ksh.hu/stadat_files/nep/hu/nep0034.html, 2025-11-16)

The Hungarian municipalities dataset is available as a single, merged table that contains population, area, latitude, and longitude values, in a single file called `hungarian_municipalities_merged.csv`.

In [None]:
df_hungarian_municipalities = pd.read_csv("hungarian_municipalities_merged.csv", sep=',')

df_hungarian_municipalities.head()

# Basic charting

In the following example, `scatter()` is used to create a scatter chart comparing GDP per capita with life expectancy for all countries and years, also displaying the continent and population data. Notice the parameters responsible for formatting and interactivity.

In [None]:
fig = px.scatter(
    df,
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    color="continent",
    symbol="continent",
    hover_name="country",
    category_orders={
        "continent": sorted(df.continent.unique()),
        # "year": sorted(df.year.unique()), # default is sorted
    },
    hover_data={"year": None, "pop":":,", "lifeExp":":.1f"},
    log_x=True,
    animation_frame="year",
    animation_group="country",
    title="Gapminder: GDP vs Life Expectancy",
    labels={
        "gdpPercap": "GDP per Capita (USD)",
        "lifeExp": "Life Expectancy (years)",
    },
    # template="plotly_dark",
    color_discrete_sequence=px.colors.qualitative.Set1, # only if color= is categorical
    # color_continuous_scale="Viridis", # only if color= is numeric
)

fig.show()

`x=` and `y=` are the main numeric columns used for the chart. Further dimensions can be encoded with `size=`, `color=`, and `symbol=`. Note that these parameters expect per-row data, i.e., a column name or an array-like object such as a pandas Series, not a single constant value.

The `category_orders=` parameter can be used to customize the order of category fields. The default ordering rule is increasing for numeric and first appearance in the data frame for category fields.

Bubbles in this scatter chart have **tooltips**, which appear when hovering over them. `hover_name=` tells which column is used for the title of the tooltip. `hover_data=` can be used to hide, show, and format specific columns in the tooltip.

The `animation_frame=` parameter tells `plotly` to create the slider at the bottom. A given year can be selected, or the years can be played as an animation. The `animation_group=` parameter tells `plotly` which bubbles denote the same entity during the animation. `animation_group="country"` means that bubbles belonging to the same country are connected with a transition while the animation is playing. Without this setting, the transitions **might still work** (`plotly` can interpolate matching data points based on row order), but this is not guaranteed.

The `color_discrete_sequence=` parameter can be used to provide a custom color set if the `color=` parameter refers to a categorical column. If the `color=` refers to a numeric column, `color_continuous_scale=` can be used instead.

Note that, unlike for `matplotlib` and `seaborn`, the preferred way of displaying `plotly` figures is `fig.show()`, and not `plt.show()`.

The following example is a line chart created using `line()`, which shows life expectancy over years for six selected countries.

In [None]:
countries = ["Hungary", "Germany", "France", "United Kingdom", "Singapore", "United States"]
df_filtered = df[df["country"].isin(countries)]

fig = px.line(
    df_filtered,
    x="year",
    y="lifeExp",
    color="country",
    line_dash="continent",
    hover_name="country",
    markers=True,
    title="Life Expectancy over Time",
    category_orders={
        "continent": ["Europe", "Asia", "Americas"],
        "year": sorted(df.year.unique()),
    },
    labels={
        "lifeExp": "Life Expectancy (years)",
        "year": "Year",
        "country": "Country"
    },
    color_discrete_sequence=px.colors.qualitative.Set2
)

fig.update_traces(
    line=dict(width=3),
    marker=dict(size=8),
)

fig.update_layout(
    template="plotly_white",
    hovermode="x unified", # single hover box for all lines
    legend_title_text="country",
    title_font=dict(size=22),
    xaxis=dict(dtick=5, showgrid=False),
    yaxis=dict(showgrid=True, gridcolor="lightgray"),
)

fig.show()

In this example, `x=` and `y=` are the main axes, while `color=` and `line_dash=` are used to add additional dimensions. Note that `color=` tells which points should be connected by lines.

Further customization can be applied using `fig.update_traces()` for data traces (lines, markers, shapes) and `fig.update_layout()` for chart-level settings.

Notably, `hovermode=` in `fig.update_layout()` can be used to customize which data points are shown in tooltips. "closest" is the default, "x" and "y" can be used to display tooltips for all data points with equal "x" or "y" values, and "x unified" and "y unified" can be used to show a single combined tooltip for such data points.

Note: chart settings in `plotly` are translated to JSON, so many configuration settings are passed as dictionaries. The `dict()` syntax is preferred for providing such settings. However, sometimes the `{}` syntax is used, e.g. for `labels=`, because keys are not always valid Python identifiers.

The following example demonstrates faceting using `box()`.

In [None]:
continents = ["Europe", "Asia", "Africa"]
df_filtered = df[df["continent"].isin(continents) & (df["year"] == 2007)]

fig = px.box(
    df_filtered,
    # x="continent",
    y="gdpPercap",
    color="continent",
    facet_col="continent",
    # facet_col_wrap=2,
    facet_col_spacing=0.05,
    hover_name="country",
    points="all",
    title="GDP per Capita Comparison: Europe, Asia, and Africa (2007)",
    labels={
        "gdpPercap": "GDP per Capita (USD)",
        "continent": "Continent"
    },
    color_discrete_sequence=px.colors.qualitative.Pastel,
)

fig.update_layout(
    template="plotly_white",
    showlegend=False,
    title_font=dict(size=20),
    # yaxis=dict(type="log"),
)
# fig.update_yaxes(matches='y')

for axis_name in fig.layout:
    if axis_name.startswith("yaxis"):
        fig.layout[axis_name].type = "log"
        fig.layout[axis_name].matches = "y"

fig.show()

The `facet_col=` parameter is used to create multiple charts by a categorical column. `facet_col_wrap=` could be used to limit the number of charts per row, and `facet_col_spacing=` adds formatting. There are analogous options for row faceting called `facet_row=` and `facet_row_spacing=`, but there is no `facet_row_wrap=`.

Note: faceting has some limitations when applying formatting to the individual facets. In the example, the synchronized, logarithmic scale is set "manually" for each axis, and not using the built-in functions.

### Exercise (Basic charting) 1.

Create a vertical strip plot (using `strip()`) displaying life expectancies of countries, animated over years, categorized by continent, with country names and GDP per capita in the tooltips.

### Exercise (Basic charting) 2.

Create an area chart (using `area()`) displaying total population of India, China, the United States, Indonesia, and Pakistan (in that order), over all years (1952-2007), with a single tooltip for the same year that also shows GDP per capita.

# Map charts

`plotly` has two classes of map chart types: projection-based and tile-based.

The following example shows the `choropleth()` map type, which is projection-based. In choropleth maps, regions are filled according to a quantity. Notice the interactive behavior of the map.

In [None]:
fig = px.choropleth(
    df,
    locations="iso_alpha",
    color="lifeExp",
    scope="world", # could be "europe", "africa", etc.
    hover_name="country",
    animation_frame="year",
    animation_group="country",
    color_continuous_scale="Viridis",
    projection="natural earth", # could be "equirectangular", "mercator", etc.
    hover_data={
        "iso_alpha": None,
    },
    labels={
        "lifeExp": "Life Expectancy (years)",
        "year": "Year",
        "country": "Country",
    },
    title="Life Expectancy by Country",
)

fig.update_geos(
    showcountries=True,
    showcoastlines=True,
    coastlinecolor="gray",
    showocean=True,
    oceancolor="#F0FAFF",
)

fig.show()

The data points are matched to countries using the `locations=` parameter. Currently, ISO 3166-1 alpha-3 codes are used for matching. This could be adjusted using the `locationmode=` parameter.

The `projection=` parameter supports a variety of map projections.

After a figure is created, it can be further customized using `update_geos()`. Note that this function only works for specific map types.

In contrast to `choropleth()`, the `choropleth_map()` is a tile-based map type, providing an experience which is more like a web map.

For the following example, custom GeoJSON data is used, which contains county boundaries that are needed to visualize the regions (`hu_geo`). A corresponding table, given as a dictionary of columns, matches county names to populations (`df_hungarian_counties`).

In [None]:
fig = px.choropleth_map(
    df_hungarian_counties,
    geojson=hu_geo,
    locations="id",
    featureidkey="properties.megye",
    color="population",
    map_style="open-street-map",
    #center={"lat": 47.162, "lon": 19.503},
    #zoom=5.5,
    color_continuous_scale="Hot",
    title="Hungarian counties — choropleth_map() example"
)

fig.update_layout(
    map = dict(
        center = {"lat": 47.162, "lon": 19.503},
        zoom = 5.5
    )
)
fig.show()

In the example above, note that the centering and zooming of the map must be set either via arguments to `choropleth_map()` or via `fig.update_layout()`. Otherwise, the map is centered at the default $(0, 0)$ point, in the Atlantic Ocean.
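
Since regions are matched by comparing the `locations=` values with the property named in `featureidkey=`, a name mismatch silently leaves a region uncolored. A quick consistency check can help. The sketch below is plain Python on a heavily simplified, made-up GeoJSON. The helper name `feature_ids` is hypothetical and not part of `plotly`.

```python
def feature_ids(geojson, featureidkey):
    """Collect the value found under a dotted key path in every feature."""
    ids = []
    for feature in geojson["features"]:
        value = feature
        for part in featureidkey.split("."):
            value = value[part]
        ids.append(value)
    return ids

# Hypothetical GeoJSON with two square "counties" (not real borders).
toy_geo = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"megye": "Pest"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[19, 47], [20, 47], [20, 48], [19, 48], [19, 47]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"megye": "Csongrád"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[20, 46], [21, 46], [21, 47], [20, 47], [20, 46]]],
            },
        },
    ],
}

# Made-up table; the second name uses the newer county name.
toy_table = {"id": ["Pest", "Csongrád-Csanád"], "population": [1, 2]}

known = set(feature_ids(toy_geo, "properties.megye"))
unmatched = [name for name in toy_table["id"] if name not in known]
print(unmatched)  # → ['Csongrád-Csanád']
```

Running the same check with `hu_geo` and `df_hungarian_counties` would be one way to confirm that every county name has a matching feature.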

An example for a non-choropleth-style map is `scatter_geo()`, which is useful for depicting a set of geo-coordinates. The following example uses the Hungarian municipalities dataset.

In [None]:
fig = px.scatter_geo(
    df_hungarian_municipalities,
    lat="lat",
    lon="lon",
    size="population",
    color="area",
    hover_name="name",
    hover_data={
        "id": False,
        "population": ":,",
        "area": True,
        "lat": False,
        "lon": False,
    },
    projection="mercator", # or "natural earth"
    title="Hungarian Municipalities – Population & Area",
)

fig.update_geos(
    fitbounds="locations",
    showcountries=True,
)

max_pop_hungarian_town = df_hungarian_municipalities["population"].max()
fig.update_traces(
    marker=dict(
        sizemode="area",
        sizeref=max_pop_hungarian_town / (40**2),
        sizemin=2,
        line=dict(width=0.5, color="black"),
    )
)

fig.update_layout(
    margin={"r":0, "t":40, "l":0, "b":0},
)

fig.show()

The `fitbounds="locations"` setting of `update_geos()` automatically adjusts the initial view of the map to the points displayed.

The size of bubbles is determined by population. In the example, the `update_traces()` call is used to manually adjust the minimum and maximum sizes, and to give the bubbles an outline.

A similar, but tile-based map type is `scatter_map()`. The following example displays the same dataset. The bubble sizes are computed manually instead of being adjusted with `update_traces()`, because of the lack of support for the `sizemin=` parameter.

In [None]:
sizes = df_hungarian_municipalities["population"]
sizes = (sizes - sizes.min()) / (sizes.max() - sizes.min())
sizes = sizes * 38 + 2

fig = px.scatter_map(
    df_hungarian_municipalities,
    lat="lat",
    lon="lon",
    size=sizes, # manually adjusted
    color="area",
    hover_name="name",
    hover_data={
        "id": False,
        "population": ":,",
        "area": True,
        "lat": False,
        "lon": False,
    },
    map_style="open-street-map",
    title="Hungarian Municipalities – Tile Map (Population & Area)",
)

fig.update_layout(
    map=dict(
        center={"lat": 47.162, "lon": 19.503},
        zoom=6,
    ),
    margin={"r": 0, "t": 40, "l": 0, "b": 0}
)

fig.show()
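
The manual size computation above is a min-max rescaling. The sketch below repeats it on a few made-up population values to show the resulting range.

```python
import pandas as pd

# Hypothetical population values, for illustration only.
population = pd.Series([100, 5_000, 50_000, 1_700_000])

# Step 1: rescale to the [0, 1] range.
unit = (population - population.min()) / (population.max() - population.min())

# Step 2: stretch to [2, 40], so the smallest value still gets a visible bubble.
sizes = unit * 38 + 2

print(sizes.min(), sizes.max())  # → 2.0 40.0
```

Note that this rescaling divides by zero if all values are equal, so it assumes that the column contains at least two distinct values.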

### Exercise (Map charts) 1.

Create a map showing GDP per capita data of European countries in 2007.

### Exercise (Map charts) 2.

The cells below provide a reference position, a maximum distance, and a function implementing the Haversine formula for calculating the distance between two positions (given as a reference, but it can be used as is).

Display all Hungarian municipalities that are close to the given position, with bubble sizes and colors showing area and population, and with Open Street Map in the background.

Note: the `geopy` package could also be used for this task instead of applying the formula.

In [None]:
lat0, lon0 = 47.18183048664217, 17.85553168260499
max_dist_km = 25

In [None]:
def haversine_dist_km(latA, lonA, latB, lonB):
    latA = np.radians(latA)
    lonA = np.radians(lonA)
    latB = np.radians(latB)
    lonB = np.radians(lonB)
    dlat = latB - latA
    dlon = lonB - lonA
    a = np.sin(dlat/2)**2 + np.cos(latA)*np.cos(latB)*np.sin(dlon/2)**2
    c = 2*np.arcsin(np.sqrt(a))
    R = 6371 # Earth radius in km
    return R * c
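
As a sanity check of the formula, the sketch below restates it for scalar inputs using only the standard library (the function name `haversine_km` is chosen for this check only). Two properties are tested: the distance of a point from itself is zero, and one degree of latitude corresponds to $R \cdot \pi / 180$ kilometers.

```python
import math

def haversine_km(lat_a, lon_a, lat_b, lon_b, radius_km=6371.0):
    """Scalar Haversine distance in km; same formula as above, using math."""
    lat_a, lon_a, lat_b, lon_b = map(math.radians, (lat_a, lon_a, lat_b, lon_b))
    a = (
        math.sin((lat_b - lat_a) / 2) ** 2
        + math.cos(lat_a) * math.cos(lat_b) * math.sin((lon_b - lon_a) / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))

# Arbitrary test coordinates, chosen only for the check.
same_point = haversine_km(47.0, 19.0, 47.0, 19.0)
one_degree_lat = haversine_km(47.0, 19.0, 48.0, 19.0)
expected = 6371.0 * math.pi / 180  # arc length of one degree on the sphere

print(same_point, round(one_degree_lat, 3), round(expected, 3))
```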

# Animations and Interactivity

The interactivity options and the animations on a chart can be highly customized in `plotly`, by simply altering the data structure representing the chart. However, achieving a given goal may be very cumbersome. Some applications are shown below.

The following example is a scatter chart showing GDP per capita and life expectancy data for countries, categorized by continents, and has the following key features.

*   The X and Y axes are automatically resized per frame.
*   The original slider is formatted.
*   A new slider that can change opacity is added.
*   Fast forward and jump to start buttons are added.



In [None]:
fig = px.scatter(
    df,
    x="gdpPercap",
    y="lifeExp",
    color="continent",
    symbol="continent",
    hover_name="country",
    category_orders={"continent": sorted(df.continent.unique())},
    log_x=True,
    animation_frame="year",
    animation_group="country",
    title="Gapminder: GDP vs Life Expectancy (Chart #1)",
    labels={
        "gdpPercap": "GDP per Capita (USD)",
        "lifeExp": "Life Expectancy (years)",
        "year": "Year",
        "continent": "Continent",
    },
    color_discrete_sequence=px.colors.qualitative.Set1,
)

#fig.update_xaxes(range=[
#    np.log10(df["gdpPercap"].min()*0.9),np.log10(df["gdpPercap"].max()*1.1)])
#fig.update_yaxes(range=[df["lifeExp"].min()*0.9, df["lifeExp"].max()*1.1])

fig.update_xaxes(autorange=True)
fig.update_yaxes(autorange=True)
fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["redraw"] = True

anim_slider = fig.layout.sliders[0]
anim_slider.currentvalue.prefix = "Year: "
anim_slider.currentvalue.font.size = 18
anim_slider.transition.easing = "cubic-in-out"

opacity_slider = {
    "active": 4,
    "y": -0.7,
    "x": 0.1,
    "len": 0.8,
    "currentvalue": {"prefix": "Opacity: "},
    "steps": [
        {
            "label": f"{alpha:.2f}",
            "method": "restyle",
            "args": [{"opacity": alpha}, []]
        }
        for alpha in [0.1, 0.25, 0.5, 0.75, 1.0]
    ]
}

fig.update_layout(
    sliders = list(fig.layout.sliders) + [opacity_slider]
)

jump_and_fast = {
    "type": "buttons",
    "direction": "right",
    "x": 0,
    "y": 0,
    "pad": {"r": 10, "t": 10},
    "buttons": [
        {
            "label": "⏮",
            "method": "animate",
            "args": [
                [fig.frames[0].name],
                {
                    "mode": "immediate",
                    "frame": {"duration": 0, "redraw": True},
                    "transition": {"duration": 0}
                }
            ],
        },
        {
            "label": "⏩︎",
            "method": "animate",
            "args": [
                None,
                {
                    "mode": "immediate",
                    "frame": {"duration": 80},
                    "transition": {"duration": 0}
                }
            ],
        },
    ],
}

fig.update_layout(
    updatemenus = fig.layout.updatemenus + (jump_and_fast,)
)

fig.show()

`fig.update_xaxes(autorange=True)` and `fig.update_yaxes(autorange=True)` are used to define the ranges dynamically. The line `fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["redraw"] = True` tells `plotly` to redraw the frames whenever the play button is clicked. Note that it is also possible to set the axes to fixed custom ranges, in which case no resizing is needed throughout the animation, as shown in the commented code section.

Sliders and other controls can be added to a chart, and their behavior can be defined using `fig.update_layout()`.

The original (first) slider has a custom transition easing and a custom format for its current-value label.

The new slider has a custom list of steps. Each step has its `"method"` set to `"restyle"`. The four main methods are the following.

*   `"restyle"` - Changes style settings.
*   `"relayout"` - Changes layout settings.
*   `"update"` - Changes both style and layout settings at once.
*   `"animate"` - Triggers animation.

The `updatemenus` attribute is a key for managing animation. In this example, two buttons are added. The first one jumps to the label `fig.frames[0].name`, which is the first year. The second button fast forwards, but without changing the Y-axis.

The buttons are just defined as a data structure and simply appended to `updatemenus`. Their positioning and behavior could be further customized.

The next example shows the following changes.

*   The X-axis and Y-axis limits are now fixed so that no resizing is needed throughout the animation.
*   Buttons are shown which now filter visibility of particular continents.

In [None]:
fig = px.scatter(
    df,
    x="gdpPercap",
    y="lifeExp",
    color="continent",
    symbol="continent",
    hover_name="country",
    category_orders={"continent": sorted(df.continent.unique())},
    log_x=True,
    animation_frame="year",
    animation_group="country",
    title="Gapminder: GDP vs Life Expectancy (Chart #2)",
    labels={
        "gdpPercap": "GDP per Capita (USD)",
        "lifeExp": "Life Expectancy (years)",
        "year": "Year",
        "continent": "Continent",
    },
    color_discrete_sequence=px.colors.qualitative.Set1,
)

fig.update_xaxes(range=[
    np.log10(df["gdpPercap"].min()*0.9),np.log10(df["gdpPercap"].max()*1.1)])
fig.update_yaxes(range=[df["lifeExp"].min()*0.9, df["lifeExp"].max()*1.1])

fig.update_layout(
    sliders=[
        {
            "currentvalue": {
                "prefix": "Year: ",
                "font": {"size": 18}
            },
            "transition": {"duration": 200, "easing": "cubic-in-out"},
        }
    ]
)

all_visible   = [True] * len(fig.data)
europe_only   = [tr.name == "Europe" for tr in fig.data]
asia_only     = [tr.name == "Asia"   for tr in fig.data]

new_buttons = {
    "type": "buttons",
    "direction": "right",
    "x": 0.1,
    "y": -0.7,
    "xanchor": "left",
    "yanchor": "top",
    "pad": {"r": 10, "t": 10},
    "buttons": [
        {
            "label": "Show All",
            "method": "update",
            "args": [{"visible": all_visible}],
        },
        {
            "label": "Europe Only",
            "method": "update",
            "args": [{"visible": europe_only}],
        },
        {
            "label": "Asia Only",
            "method": "update",
            "args": [{"visible": asia_only}],
        },
    ],
}

fig.update_layout(
    updatemenus = fig.layout.updatemenus + (new_buttons,)
)

fig.show()

The X-axis uses a logarithmic scale, so its limits are set accordingly.

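Note about the sliders: since no second slider is added here, the first slider can be formatted by passing a one-element `sliders=` list to `fig.update_layout()`, which updates the settings of the existing slider.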

The buttons are configured to show or hide a selected set of **traces**. It is important to note that `plotly` generates charts as a collection of traces. In this instance, each trace corresponds to a color, i.e., a particular continent. So, any formatting applied this way affects the data points of whole continents. There are also limitations on what kind of formatting can be done, e.g. changing markers may not work, since the traces are copied onto different frames for the animation.

In short: interactivity can be customized, but would require very much work for certain tasks.

It should also be noted that very simple tasks may be supported by default. For example, clicking the legend items can show and hide particular continents as well.

The last example implements a dropdown menu to perform the same selection task.

In [None]:
fig = px.scatter(
    df,
    x="gdpPercap",
    y="lifeExp",
    color="continent",
    symbol="continent",
    hover_name="country",
    category_orders={"continent": sorted(df.continent.unique())},
    log_x=True,
    animation_frame="year",
    animation_group="country",
    title="Gapminder: GDP vs Life Expectancy",
    labels={
        "gdpPercap": "GDP per Capita (USD)",
        "lifeExp": "Life Expectancy (years)",
        "year": "Year",
        "continent": "Continent",
    },
    color_discrete_sequence=px.colors.qualitative.Set1,
)

fig.update_xaxes(range=[
    np.log10(df["gdpPercap"].min()*0.9),np.log10(df["gdpPercap"].max()*1.1)])
fig.update_yaxes(range=[df["lifeExp"].min()*0.9, df["lifeExp"].max()*1.1])

fig.update_layout(
    sliders=[
        {
            "currentvalue": {
                "prefix": "Year: ",
                "font": {"size": 18}
            },
            "transition": {"duration": 200, "easing": "cubic-in-out"},
        }
    ]
)

trace_names = [tr.name for tr in fig.data]
continents = sorted(df["continent"].unique())
dropdown_options = [
    {
        "label": "Show All",
        "method": "update",
        "args": [{"visible": [True] * len(fig.data)}],
    }
]
dropdown_options += [
    {
        "label": cont,
        "method": "update",
        "args": [{"visible": [name == cont for name in trace_names]}],
    }
    for cont in continents
]
continent_dropdown = {
    "type": "dropdown",
    "direction": "down",
    "showactive": True,
    "x": 0.01,
    "y": 1,
    "xanchor": "left",
    "yanchor": "top",
    "pad": {"r": 10, "t": 10},
    "buttons": dropdown_options,
}

fig.update_layout(
    updatemenus = fig.layout.updatemenus + (continent_dropdown,)
)

fig.show()

The options in the dropdown list are added manually. The formatting is still applied to entire traces.

### Exercise (Animation and Interactivity)

Create a bar chart (using `bar()`) with the following features.

*   The GDP values (not per capita!) of the countries with the top 20 largest populations in 2007 are displayed.
*   The bars are colored according to GDP per capita.
*   The chart is animated over years.
*   The Y-axis has a fixed limit throughout the animation.
*   The bars are **sorted by GDP (descending)** in each year, resulting in the bars racing with each other as the years are played as an animation.
*   Add a jump to start button.

Hint: sort the original data in a specific order. It is not required to hard-code a sorting rule into the animation.

# Special chart types

`plotly` supports many chart types. Some additional, special types are demonstrated in this section.

It is possible to make **subplots** with `plotly` using a separate utility available in `plotly.subplots`. The following example demonstrates subplots on three new chart types.

The `sunburst()`, `treemap()`, and `icicle()` chart types are all great options to visualize hierarchical data. They also support selection. The `path=` parameter is used to define the hierarchy in each.

In [None]:
from plotly.subplots import make_subplots

df_2007 = df[df["year"] == 2007]

fig = make_subplots(
    rows=1, cols=3,
    specs=[
        [{"type": "domain"}, {"type": "domain"}, {"type": "domain"}]
    ],
    subplot_titles=("Sunburst", "Treemap", "Icicle")
)

fig.add_trace(
    px.sunburst(
        df_2007,
        path=["continent", "country"],
        values="pop").data[0],
    row=1, col=1
)

fig.add_trace(
    px.treemap(
        df_2007,
        path=["continent", "country"],
        values="pop").data[0],
    row=1, col=2
)

fig.add_trace(
    px.icicle(
        df_2007,
        path=["continent", "country"],
        values="pop").data[0],
    row=1, col=3
)

fig.update_layout(
    title="Subplots demonstration, and hierarchical chart types",
    height=500,
    width=850,
)
fig.show()

`plotly` supports 3D scatter, surface, mesh and line plot charts. The following example demonstrates `scatter_3d()`. Note the full interactivity.

In [None]:
fig = px.scatter_3d(
    df,
    x="gdpPercap",
    y="lifeExp",
    z="pop",
    color="continent",
    hover_name="country",
    animation_frame="year",
)

fig.update_traces(marker=dict(size=5))

fig.update_layout(
    title="scatter_3d() demonstration",
)

fig.show()

Note that, in the subplots example above, `.data[0]` at the end of each plot function call is used to get the first trace. For `sunburst()`, `treemap()`, and `icicle()`, the figure contains a single trace, so that is the one fetched.

**Ternary diagrams** can be used to display ternary compositions. There are several types of such diagrams. In the following example, `scatter_ternary()` is used.

Note that the Gapminder dataset does not contain ternary data. Instead, synthetic data is created by min-max normalizing the (logarithm of) population, GDP per capita, and life expectancy values to the range between 0 and 1. This gives some insight into which of the three features is relatively large for a country compared to other countries.

In [None]:
df_tern = df[df["year"] == 2007].copy()

def rate(series):
    # min-max normalization to the [0, 1] range
    lo = series.min()
    hi = series.max()
    return (series - lo) / (hi - lo)

df_tern["pop_rate"] = rate(np.log(df_tern["pop"]))
df_tern["gdpPercap_rate"] = rate(df_tern["gdpPercap"])
df_tern["lifeExp_rate"] = rate(df_tern["lifeExp"])

fig = px.scatter_ternary(
    df_tern,
    a="pop_rate",
    b="gdpPercap_rate",
    c="lifeExp_rate",
    color="continent",
    hover_name="country",
    hover_data={
        "pop_rate": True,
        "gdpPercap_rate": True,
        "lifeExp_rate": True,
        "pop": True,
        "gdpPercap": True,
        "lifeExp": True,
    },
    title="Synthetic ternary chart: Countries of the World (Gapminder, 2007)",
)

fig.show()

Parallel plots are available in `plotly` via `parallel_coordinates()` and `parallel_categories()`, as shown below.

In [None]:
df_2007 = df[df["year"] == 2007]

fig = px.parallel_coordinates(
    df_2007,
    dimensions=["lifeExp", "gdpPercap", "pop"],
    color="lifeExp",
    color_continuous_scale=px.colors.sequential.Plasma,
    title="parallel_coordinates() example",
)

fig.show()

In [None]:
df_2007_large = df[(df["year"] == 2007) & (df["pop"] > 50000000)]

fig = px.parallel_categories(
    df_2007_large,
    dimensions=["continent", "country"],
    color="lifeExp",
    color_continuous_scale="Plasma",
    title="parallel_categories() example",
)

fig.show()

Both chart types have a `dimensions=` parameter that expects a sequence of columns. The key difference is that `parallel_coordinates()` uses numeric columns, and `parallel_categories()` uses categorical columns.

Polar charts differ from ordinary charts by using polar coordinates instead of perpendicular axes. The following example demonstrates `bar_polar()`. Polar variants of other chart types also exist.

In [None]:
df_eu_2007 = df[(df["year"] == 2007) & (df["continent"] == "Europe")]

fig = px.bar_polar(
    df_eu_2007,
    r="gdpPercap",
    theta="country",
    color="pop",
    color_continuous_scale="Solar",
    title="GDP per capita and population in Europe (2007) — bar_polar() example",
)

fig.show()

Instead of `x=` and `y=` parameters representing the coordinates, there are `theta=` and `r=` parameters in polar charts.

### Exercise (Special chart types) 1.

Categorize the countries by population and GDP per capita into the following categories. Then, based on the groups formed this way, create a parallel chart matching all countries in 2007, and coloring by life expectancy.

In [None]:
pop_category_labels = [
    "Tiny (0–1M)",
    "Small (1–10M)",
    "Medium (10–50M)",
    "Large (50M+)",
]
gdp_category_labels = [
    "Low income (<1k)",
    "Lower middle (1k–4k)",
    "Upper middle (4k–12k)",
    "High (12k–25k)",
    "Very high (25k+)",
]

### Exercise (Special chart types) 2.

Display all countries' total population with hierarchical categorization by the two categories defined in the previous exercise, using any two of the `sunburst()`, `treemap()`, and `icicle()` chart types. The order of the two categories should be different in the two charts. Use subplots.

# Graph Objects

Besides Plotly Express (`px`), which is a high-level interface, there is also a low-level interface called Plotly Graph Objects, imported as `go`.

In [None]:
from plotly import graph_objects as go

With Graph Objects, the individual traces on the same chart can be defined and customized separately. This not only allows special combinations of visualized data and full customization, but is also necessary for some chart types that are only available via this approach.

The following example shows a scatter chart of countries, combined with regression lines per continent.

In [None]:
df_2007 = df[df["year"] == 2007]

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_2007["gdpPercap"],
    y=df_2007["lifeExp"],
    mode="markers",
    text=df_2007["country"],
    customdata=df_2007[["continent", "pop"]].values,
    marker=dict(
        size=(df_2007["pop"] ** 0.5) / 100,
        sizemode="area",
        color=df_2007["continent"].astype("category").cat.codes,
        colorscale="Plasma",
        showscale=False,
    ),
    name="Countries",
    hovertemplate="<b>%{text}</b> (%{customdata[0]})<br>"
        "Life Expectancy: %{y}<br>GDP per capita: %{x}<br>"
        "Population: %{customdata[1]}"
))

continents = df_2007["continent"].unique()

for cont in continents:
    sub = df_2007[df_2007["continent"] == cont]
    x = np.log10(sub["gdpPercap"])
    y = sub["lifeExp"]
    coef = np.polyfit(x, y, 1)
    poly = np.poly1d(coef)
    x_line = np.linspace(x.min(), x.max(), 50)
    y_line = poly(x_line)

    fig.add_trace(go.Scatter(
        x=10**x_line, # convert back to GDP scale (log axis will handle it)
        y=y_line,
        mode="lines",
        name=f"{cont} trend",
        line=dict(width=2),
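        # only the first literal is an f-string; the second one is a plain
        # string, so Plotly's placeholders use single braces there
        hovertemplate=f"<b>{cont}</b> regression<br>"
            "x=%{x}<br>y=%{y}<extra></extra>"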
    ))

fig.update_layout(
    title="Graph Objects: Scatter chart, with regression lines by Continent",
    xaxis=dict(title="GDP per Capita", type="log"),
    yaxis=dict(title="Life Expectancy"),
    template="plotly_white",
)

fig.show()

First, a figure is created using `go.Figure()`, then individual traces are added to it.

Both the scatter chart and the regression lines are drawn with the same `go.Scatter()` trace type, but with different `mode=` parameters (`"markers"` versus `"lines"`).

The `customdata=` parameter is used to pass custom columns which can be displayed as a tooltip using `hovertemplate=`.
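
When a hover template is built with an f-string, Plotly's `%{...}` placeholders need doubled braces, because single braces are interpreted by Python. This applies only to string literals that actually carry the `f` prefix. The following sketch compares the two forms; the variable `cont` is a hypothetical stand-in for a loop variable.

```python
cont = "Europe"  # hypothetical value, standing in for a loop variable

# In an f-string, {cont} is substituted by Python, while {{ and }} produce
# the literal braces that Plotly's %{...} placeholders need.
template_f = (
    f"<b>{cont}</b><br>x=%{{x}}<br>"
    f"Population: %{{customdata[1]:,}}<extra></extra>"
)

# In a plain string, single braces are already literal.
template_plain = (
    "<b>Europe</b><br>x=%{x}<br>"
    "Population: %{customdata[1]:,}<extra></extra>"
)

print(template_f == template_plain)  # True
```

Either string can be passed as `hovertemplate=`. The `:,` suffix is a format specifier that adds thousands separators.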

The following example produces a stacked bar chart using `go.Bar()`, plus a line chart using `go.Scatter()` again, on the same diagram.

In [None]:
df_ext = df.copy()
df_ext["gdp_total"] = df_ext["pop"] * df_ext["gdpPercap"]
df_cont = (
    df_ext.groupby(["year", "continent"])[["pop"]]
      .sum()
      .reset_index()
      .sort_values("year")
)
continents = df_cont["continent"].unique()
df_gdp = (
    df_ext.groupby("year")["gdp_total"]
      .sum()
      .reset_index()
      .sort_values("year")
)

In [None]:
fig = go.Figure()

for cont in continents:
    subset = df_cont[df_cont["continent"] == cont]
    fig.add_trace(go.Bar(
        x=subset["year"],
        y=subset["pop"],
        name=cont
    ))

fig.add_trace(go.Scatter(
    x=df_gdp["year"],
    y=df_gdp["gdp_total"],
    mode="lines+markers",
    name="Total GDP",
    yaxis="y2",
    line=dict(width=3, color="black")
))

fig.update_layout(
    title="Population by Continent (Stacked Bars) + Global Total GDP (Line)",
    barmode="stack",
    xaxis=dict(title="Year"),
    yaxis=dict(
        title="Population",
        rangemode="tozero"
    ),
    yaxis2=dict(
        title="Total GDP",
        overlaying="y",
        side="right",
        showgrid=False
    ),
    template="plotly_white",
    legend=dict(x=0.01, y=0.99)
)

fig.show()

Note that in `fig.update_layout()`, the `barmode="stack"` parameter setting is used to achieve the stacked chart. Also, there is a second Y-axis, configured using the `yaxis2=` parameter.

The next example shows the same data again, using a stacked area chart instead of stacked bars. This is again done with `go.Scatter()`, by combining `mode="lines"` with `stackgroup="one"`.

In [None]:
fig = go.Figure()

for cont in continents:
    subset = df_cont[df_cont["continent"] == cont]
    fig.add_trace(go.Scatter(
        x=subset["year"],
        y=subset["pop"],
        mode="lines",
        name=cont,
        stackgroup="one",
        line=dict(width=0.5),
        hovertemplate=f"{cont}<br>Year=%{{x}}<br>Population=%{{y:,}}<extra></extra>"
    ))

fig.add_trace(go.Scatter(
    x=df_gdp["year"],
    y=df_gdp["gdp_total"],
    mode="lines+markers",
    name="Total GDP",
    yaxis="y2",
    line=dict(width=3, color="black")
))

fig.update_layout(
    title="Population by Continent (Stacked Area) + Global Total GDP (Line)",
    xaxis=dict(title="Year"),
    yaxis=dict(
        title="Population",
        rangemode="tozero"
    ),
    yaxis2=dict(
        title="Total GDP",
        overlaying="y",
        side="right",
        showgrid=False
    ),
    template="plotly_white",
    legend=dict(x=0.01, y=0.99)
)

fig.show()

Conclusion: the `go.Scatter()` trace is very versatile, as it can be used to plot single points, line charts, and even filled areas.

The following example demonstrates the `go.Sunburst()`, `go.Treemap()`, and `go.Icicle()` chart types, in the same manner as for the `px` versions earlier in the tutorial, with one difference: only countries with a life expectancy larger than 70 are shown at the bottom level.

In [None]:
df_2007 = df[df["year"] == 2007]
df_2007_restricted = df_2007[df_2007["lifeExp"] > 70]

df_cont = df_2007.groupby("continent", as_index=False)["pop"].sum()
root = "World"
labels  = [root]
parents = [""]
values  = [df_2007["pop"].sum()] # if branchvalues="total"
# values  = [0] # if branchvalues="remainder"

for _, row in df_cont.iterrows():
    labels.append(row["continent"])
    parents.append(root)
    values.append(row["pop"]) # if branchvalues="total"
    # values.append(0) # if branchvalues="remainder"

for _, row in df_2007_restricted.iterrows():
    labels.append(row["country"])
    parents.append(row["continent"])
    values.append(row["pop"])

fig = make_subplots(
    rows=1, cols=3,
    specs=[[{"type": "domain"}, {"type": "domain"}, {"type": "domain"}]],
    subplot_titles=("Sunburst", "Treemap", "Icicle")
)

fig.add_trace(go.Sunburst(
    labels=labels,
    parents=parents,
    values=values,
    branchvalues="total",
), row=1, col=1)

fig.add_trace(go.Treemap(
    labels=labels,
    parents=parents,
    values=values,
    branchvalues="total",
), row=1, col=2)

fig.add_trace(go.Icicle(
    labels=labels,
    parents=parents,
    values=values,
    branchvalues="total",
), row=1, col=3)

fig.update_layout(
    title="Graph Objects: Sunburst, Treemap, Icicle",
    height=500,
    width=850,
)

fig.show()

Even multiple layers of hierarchy could be provided. The catch is that a root node, the labels, the parent nodes, and the values for each node must be manually defined, which can be a bit tedious. If the structure is wrong, the diagram may not be shown.

This could be made slightly simpler by setting `branchvalues=` to `"remainder"` instead of `"total"`. The effect of this is that non-leaf nodes can have zero values and their sum need not be explicitly calculated. However, this would also affect the tooltips, showing zero for parent nodes.
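
The manual bookkeeping can also be delegated to a small helper. The following sketch is one possible approach, not part of `plotly`: the hypothetical function `flatten_hierarchy()` turns a nested dictionary into the three lists and sums up the values of inner nodes, as `branchvalues="total"` expects. It assumes that all labels are unique.

```python
def flatten_hierarchy(tree, root):
    """Turn a nested dict into labels, parents, and values lists.

    Leaves are numbers; inner nodes are dicts. The value of an inner
    node is the sum of its descendants.
    """
    labels, parents, values = [], [], []

    def visit(name, node, parent):
        index = len(labels)
        labels.append(name)
        parents.append(parent)
        values.append(0)  # placeholder, filled in below
        if isinstance(node, dict):
            total = sum(visit(child, sub, name) for child, sub in node.items())
        else:
            total = node
        values[index] = total
        return total

    visit(root, tree, "")
    return labels, parents, values


# Hypothetical toy numbers, not taken from the Gapminder data.
toy_tree = {"A": {"a1": 3, "a2": 2}, "B": {"b1": 5}}
toy_labels, toy_parents, toy_values = flatten_hierarchy(toy_tree, "Root")

print(toy_labels)   # ['Root', 'A', 'a1', 'a2', 'B', 'b1']
print(toy_parents)  # ['', 'Root', 'A', 'A', 'Root', 'B']
print(toy_values)   # [10, 5, 3, 2, 5, 5]
```

The three lists can then be passed as `labels=`, `parents=`, and `values=` to `go.Sunburst()`, `go.Treemap()`, or `go.Icicle()`, together with `branchvalues="total"`.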

Leaving out countries at the bottom level would not work with the `px` variants by simply filtering the data frame, because Plotly Express computes the parent values from the rows it is given, so the continent totals would shrink accordingly. The `go` variants allow this restricted setup.

Some trace types have no high-level variant in Plotly Express and therefore need to be created using Graph Objects. A few such trace types are shown below.

The `go.Funnelarea()` chart type is shown below. Tip: try creating the chart without sorting the data first.

In [None]:
df_2007 = df[df["year"] == 2007]

df_cont = (
    df_2007.groupby("continent")["pop"]
        .sum()
        .reset_index()
        .sort_values("pop", ascending=False) # try disabling
)

fig = go.Figure()

fig.add_trace(go.Funnelarea(
    labels=df_cont["continent"],
    values=df_cont["pop"],
    textinfo="label+value+percent",
    marker=dict(colors=["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"])
))

fig.update_layout(
    title="Population Share by Continent (2007) — Funnelarea",
)

fig.show()

With `go.Table()`, interactive and customized data tables can be created. The following example demonstrates this with conditional cell formatting.

In [None]:
df_eu_2007 = df[(df["year"] == 2007) & (df["continent"] == "Europe")]

gdp_norm = (
    (df_eu_2007["gdpPercap"] - df_eu_2007["gdpPercap"].min()) /
    (df_eu_2007["gdpPercap"].max() - df_eu_2007["gdpPercap"].min())
)
colorscale = px.colors.sequential.YlOrRd
gdp_colors = [px.colors.sample_colorscale(colorscale, v)[0] for v in gdp_norm]

fig = go.Figure(go.Table(
    header=dict(
        values=["Country", "Life Expectancy", "GDP per capita"],
        align=["left", "right", "right"],
    ),
    cells=dict(
        values=[
            df_eu_2007["country"],
            df_eu_2007["lifeExp"],
            df_eu_2007["gdpPercap"],
        ],
        fill_color=[
            ["white" for _ in df_eu_2007.index],
            ["lightgreen" if v > 75 else "salmon" for v in df_eu_2007["lifeExp"]],
            gdp_colors,
        ],
        align=["left", "right", "right"],
    )
))

fig.update_layout(
    title="European countries — Gapminder (2007) — go.Table() example",
)
fig.show()

Sankey diagrams are created using the `go.Sankey()` trace type. Arbitrary nodes and flows between them can be defined, and the result is displayed as a flow diagram. Each flow refers to its source and target nodes by their index in the list of node labels.

The following example "simulates" a flow diagram: it is simply an expansion of continents into countries. Note that the layout is handled by `plotly`.

In [None]:
df_eua = df[df["continent"].isin(["Europe", "Asia"]) & (df["year"] == 2007)]
continents = df_eua["continent"].unique().tolist()
countries  = df_eua["country"].tolist()
labels = continents + countries

label_to_index = {label: i for i, label in enumerate(labels)}

fig = go.Figure(data=go.Sankey(
    node=dict(
        pad=20,
        thickness=20,
        label=labels, # crucial for colors and tooltips
    ),
    link=dict(
        source=[label_to_index[c] for c in df_eua["continent"]],
        target=[label_to_index[c] for c in df_eua["country"]],
        value=df_eua["pop"].tolist(),
    )
))

fig.update_layout(
    title="Sankey diagram example: Continent and Country populations",
    font_size=12
)

fig.show()

In `go.Sankey()`, the network structure is represented by the `link=` parameter. Under `node=`, the `label=` parameter is crucial for the labels and colors to work.

### Exercise (Graph Objects)

Construct a hierarchy as follows.

*   The root node is Europe.
*   The children of the root should be all European countries except V4 countries (shown below), plus a single V4 node.
*   The children of V4 should be the V4 countries.

Display the total population in 2007 as a Sankey diagram and a treemap, as two subplots.

In [None]:
V4_countries = ["Hungary", "Poland", "Slovak Republic", "Czech Republic"]