<a href="https://colab.research.google.com/github/mtazike/Visualization_Design_Exercise/blob/main/Week04.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Plotly and Color

In this exercise, we will explore the effects of different color maps on a visualization.

In [None]:
import pandas as pd
import plotly.express as px

In [3]:
import pandas as pd

# paste your URL here
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQkC5sLOdpoyzxkMm3ax22OZIKZ99kUBa8AuiJG2xGSCnwgX28xSkoF6fCoR2WRyE0WTz4m-kQESChv/pub?gid=1808016370&single=true&output=csv'
who_df = pd.read_csv(url)
who_df.head()

| | Country (location) | ISO code | region | income group | year | Health Exp. (% of GDP) | Health Exp. per Capita (USD) | Gov. Health Exp. (USD) | Private Health Exp. (USD) | Out-of-Pocket Exp. per Capita (USD) | Gov. Health Exp. per Capita (USD, 2022 prices) | Value | Category_highlight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Algeria | DZA | AFR | Lower-middle | 2000 | 3.214854 | 61.857853 | 103533.985 | 40261.19922 | 1485.909342 | 1022.24963 | False | other |
| 1 | Algeria | DZA | AFR | Lower-middle | 2001 | 3.536286 | 67.058594 | 123663.777 | 38492.03125 | 1646.495321 | 1146.437871 | False | other |
| 2 | Algeria | DZA | AFR | Lower-middle | 2002 | 3.441696 | 66.681633 | 126996.8608 | 41630.37109 | 1724.133123 | 1331.83535 | False | other |
| 3 | Algeria | DZA | AFR | Lower-middle | 2003 | 3.325694 | 75.951309 | 145057.4834 | 43985.0 | 1689.917331 | 1164.169817 | False | other |
| 4 | Algeria | DZA | AFR | Lower-middle | 2004 | 3.290305 | 92.68763 | 155499.6782 | 62326.91406 | 1676.443072 | 1202.531803 | False | other |


#### IMPORTANT NOTE about data in Plotly

<font color='darkred'>By *default*, **each row of data is bound to a visual element in Plotly**.</font> This applies to every visualization you'll make in this class. So, if you have 20 rows of data, a `bar` plot will have 20 different "bars" stacked on one another. If you have a particular visualization in mind, make sure that the rows of your data are defined appropriately.
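As a minimal sketch of what "one row per visual element" means in practice, the toy frame below (made-up column names and values, not the WHO data) is aggregated so that each country ends up with exactly one row, and therefore one bar:

```python
import pandas as pd

# Toy data (hypothetical values): 2 countries x 3 years = 6 rows.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "spend": [10.0, 12.0, 14.0, 20.0, 25.0, 30.0],
})

# Plotting `toy` directly as a bar chart would bind all 6 rows to 6 bar segments.
# Aggregating first leaves one row per country.
per_country = toy.groupby("country", as_index=False)["spend"].mean()
print(per_country)
```

Passing `per_country` to `px.bar(per_country, x="country", y="spend")` should then draw two bars rather than six stacked segments. The mean is only one possible aggregation; pick whichever summary matches your question.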



# Exercises

First, explore your data in Google Sheets or here in pandas.

1. Select one row of data that you find to be particularly interesting, and create a new column (in Google Sheets, or in pandas) that "highlights" that row. So, maybe the value in the column is TRUE at that row, and FALSE otherwise.
2. Select one category of data you find to be worth highlighting. Create a new column which contains one of two values: "\<that category value\>" or "Other", accordingly.
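If you prefer pandas over Google Sheets, the two highlight columns could be built roughly as follows. This is a sketch on a toy frame: the column names `Value` and `Category_highlight` mirror the ones used later in this notebook, but the countries, the chosen row, and the `"High"` income label are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the sheet; values are made up.
toy = pd.DataFrame({
    "Country (location)": ["A", "A", "B", "C"],
    "year": [2000, 2001, 2000, 2000],
    "income group": ["High", "High", "Low", "Upper-middle"],
})

# 1. Boolean flag that is True for exactly one row of interest.
toy["Value"] = (toy["Country (location)"] == "B") & (toy["year"] == 2000)

# 2. Two-valued label: the category of interest, or "other".
toy["Category_highlight"] = np.where(
    toy["income group"] == "High", "High income", "other"
)
print(toy)
```

Whatever labels you choose here must match the keys you later pass to `color_discrete_map`, including capitalization.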

Second, **skim through** the [continuous](https://plotly.com/python/builtin-colorscales/) and [discrete](https://plotly.com/python/discrete-color/) color options in Plotly. *Note: we will cover color in Plotly in-depth later on, so don't spend more than a few minutes on this!*


## EXERCISE 1

Think of a question that would be best answered using a scatterplot, and **share your question here.**

1. **Build a [scatter plot](https://plotly.com/python/line-and-scatter/)** using your data which addresses your question, and change the color of the marker for the row you highlighted. Which color did you choose, and why? *Note: You can change colormaps using the [built-in styling options](https://plotly.com/python/styling-plotly-express/) in Plotly Express.*

In [65]:
import plotly.express as px

fig1a = px.scatter(
    who_df,
    x="year",
    y="Health Exp. per Capita (USD)",
    color="Value",  # highlighted row
    hover_name="Country (location)",
    color_discrete_map={True: "red", False: "blue"},
    title="Health Expenditure per Capita Over Time (highlighted row)"
)
fig1a.show()

**Question**: How has health expenditure per capita changed over time across different countries?

I chose **red for the highlighted row** and **blue for the rest**. Red and blue are processed on different opponent channels (red–green and blue–yellow), which creates strong perceptual contrast. This makes the highlighted row stand out clearly from the rest of the data, which aligns with the principle that important information should be easy to notice at a glance.

2. Now duplicate that scatterplot, but **color your data** based on the category you selected above. What color scheme did you choose, and why?

In [64]:
fig1b = px.scatter(
    who_df,
    x="year",
    y="Health Exp. per Capita (USD)",
    color="Category_highlight",   # must match your sheet exactly
    hover_name="Country (location)",
    color_discrete_map={"High income": "green", "other": "gray"},  # match case of values
    opacity=0.6,
    title="Health Expenditure per Capita: High income vs Other"
)

fig1b.update_traces(marker=dict(size=6))
fig1b.update_yaxes(range=[0, 8000])  # restrict the y-axis to 0-8000 USD; points above 8000 fall outside the view

fig1b.show()


I used **green for High income** and **gray for Other**. Green is a vivid, saturated color that emphasizes the group of interest, while gray is low saturation and allows the background data to recede. This makes the differences easier to interpret and follows the principle of saturation contrast: focus categories should be highlighted with vivid colors, while contextual data should be shown in muted tones.

## EXERCISE 2

Think about a question (re: your data) that involves time, and share your question here. E.g., _"How does \____ change over time?"_

1. Select a continuous *and* a categorical column that are both related to this question.
2. Build a [line plot](https://plotly.com/python/line-charts/) with time on the x-axis, and the continuous variable on the y-axis.
3. Color different lines using the categorical column, and try at least 3 different color maps. Let your preferred color map be the last one.

Critique your preferred visualization. What are some notable issues with color, especially?

In [74]:
import plotly.express as px

fig2a = px.line(
    who_df,
    x="year",
    y="Health Exp. per Capita (USD)",
    color="income group",
    line_group="Country (location)",
    title="Health Expenditure per Capita Over Time by Income Group (Set1)",
    color_discrete_sequence=px.colors.qualitative.Set1
)
fig2a.show()

In [73]:
import plotly.express as px

fig2b = px.line(
    who_df,
    x="year",
    y="Health Exp. per Capita (USD)",
    color="income group",
    line_group="Country (location)",
    title="Health Expenditure per Capita Over Time by Income Group (Pastel)",
    color_discrete_sequence=px.colors.qualitative.Pastel
)
fig2b.show()

In [72]:
import plotly.express as px

fig2c = px.line(
    who_df,
    x="year",
    y="Health Exp. per Capita (USD)",
    color="income group",
    line_group="Country (location)",
    title="Health Expenditure per Capita Over Time by Income Group (Dark24)",
    color_discrete_sequence=px.colors.qualitative.Dark24
)
fig2c.show()

**Question:** How does health expenditure per capita change over time for different income groups?

I tested three colormaps: Set1, Pastel, and Dark24. I prefer Set1 because it uses vivid, distinct colors that clearly separate the four income groups. Pastel was too light, making overlapping lines hard to see. Dark24 had high contrast but included some colors that were too similar when plotted as thin lines. A limitation of Set1 is that its first four colors include both red and green, so it is not colorblind-safe. Pairing each color with a distinct line style (dash pattern) could improve accessibility.

## EXERCISE 3

Think of a question that would be best answered using a [density heatmap](https://plotly.com/python/2D-Histogram/), and share your question here.

Present a few different color map options, choose the most appropriate one in your opinion, and explain why you chose it.

In [69]:
import plotly.express as px

# Viridis colormap
fig3a = px.density_heatmap(
    who_df,
    x="year",
    y="Country (location)",
    z="Health Exp. per Capita (USD)",
    histfunc="avg",
    color_continuous_scale="Viridis",
    title="Health Expenditure per Capita by Country and Year (Viridis)"
)
fig3a.show()


In [70]:
import plotly.express as px

# Plasma colormap
fig3b = px.density_heatmap(
    who_df,
    x="year",
    y="Country (location)",
    z="Health Exp. per Capita (USD)",
    histfunc="avg",
    color_continuous_scale="Plasma",
    title="Health Expenditure per Capita by Country and Year (Plasma)"
)

fig3b.show()


In [71]:
import plotly.express as px

# Blues colormap
fig3c = px.density_heatmap(
    who_df,
    x="year",
    y="Country (location)",
    z="Health Exp. per Capita (USD)",
    histfunc="avg",
    color_continuous_scale="Blues",
    title="Health Expenditure per Capita by Country and Year (Blues)"
)
fig3c.show()


**Question**: Which countries and years have the highest health expenditure per capita?

I tested three colormaps: **Viridis, Plasma, and Blues**.

*Viridis* worked best because it is perceptually uniform and colorblind-friendly, so increases in expenditure were consistently visible across the scale.

*Plasma* looked visually appealing but had very bright yellows at the high end, which made middle values harder to compare.

*Blues* was intuitive but lacked contrast at the high end.

I chose **Viridis** because it makes it easiest to identify which countries and years had the highest expenditures, while still being accessible.
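The color scale only conveys the ranking visually, so the answer can be cross-checked numerically. As a rough sketch, the grouped mean below is approximately what `histfunc="avg"` computes when each heatmap cell covers a single country and year. The toy values are made up; only the column names follow the WHO data.

```python
import pandas as pd

# Toy data (hypothetical values).
toy = pd.DataFrame({
    "Country (location)": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "Health Exp. per Capita (USD)": [50.0, 60.0, 900.0, 950.0],
})

# Mean expenditure per (country, year) cell.
cell_means = toy.groupby(["Country (location)", "year"])[
    "Health Exp. per Capita (USD)"
].mean()

# Country-by-year grid, laid out like the heatmap.
grid = cell_means.unstack("year")

# The cell with the highest mean expenditure.
top_country, top_year = cell_means.idxmax()
print(grid)
print(top_country, top_year)
```

Running the same steps on `who_df` would name the country and year that the brightest heatmap cell should correspond to.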