<a href="https://colab.research.google.com/github/mtazike/Visualization_Design_Exercise/blob/main/Week_13_customization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lying with Data

In some cases, the default settings in Plotly visualizations may create a situation where the visualization is "lying" to the viewer/user. In this exercise, we'll explore a few different ways to customize your visualizations to keep that from happening.

<font color='darkred'>As usual, please install dash before running this notebook.</font>

In [19]:
!pip install "dash==3.2.0"





In [20]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

from dash import Dash, html, dcc, Input, Output, callback

# Exercises

First, load your data into pandas as you usually would.

<font color='darkred'>**Again, your grade for this exercise will come from the app cell at the bottom of this notebook.**</font>

In [21]:
import pandas as pd

url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQkC5sLOdpoyzxkMm3ax22OZIKZ99kUBa8AuiJG2xGSCnwgX28xSkoF6fCoR2WRyE0WTz4m-kQESChv/pub?gid=1808016370&single=true&output=csv'
who_df = pd.read_csv(url)
who_df.head()

| Unnamed: 0 | Country (location) | ISO code | region | income group | year | Health Exp. (% of GDP) | Health Exp. per Capita (USD) | Gov. Health Exp. (USD) | Private Health Exp. (USD) | Out-of-Pocket Exp. per Capita (USD) | Gov. Health Exp. per Capita (USD, 2022 prices) | Value | Category_highlight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Algeria | DZA | AFR | Lower-middle | 2000 | 3.214854 | 61.857853 | 103533.985 | 40261.19922 | 1485.909342 | 1022.24963 | False | other |
| 1 | Algeria | DZA | AFR | Lower-middle | 2001 | 3.536286 | 67.058594 | 123663.777 | 38492.03125 | 1646.495321 | 1146.437871 | False | other |
| 2 | Algeria | DZA | AFR | Lower-middle | 2002 | 3.441696 | 66.681633 | 126996.8608 | 41630.37109 | 1724.133123 | 1331.83535 | False | other |
| 3 | Algeria | DZA | AFR | Lower-middle | 2003 | 3.325694 | 75.951309 | 145057.4834 | 43985.0 | 1689.917331 | 1164.169817 | False | other |
| 4 | Algeria | DZA | AFR | Lower-middle | 2004 | 3.290305 | 92.68763 | 155499.6782 | 62326.91406 | 1676.443072 | 1202.531803 | False | other |


## EXERCISE 1

1. Ask any interesting question which can be addressed with a visualization of your data. Share the question in your web app, and format the text as you like!
2. Build a "default" visualization which attempts to address this question using **one line** of Plotly Express code. Try to keep the code as simple and "default-y" as possible. Save the visualization as a `fig_...` object, and place it in the web app. Add a label/title for this visualization which indicates it is the "Default" visualization.
3. Add some text *beneath* this visualization that points out at least 3 problems with the default visualization, and why it might be seen as "lying" to the viewer.



In [22]:
df_2020 = who_df[who_df["year"] == 2020]
fig1 = px.bar(df_2020, x="income group", y="Health Exp. per Capita (USD)")
fig1.show()



<font color='darkblue'> **Explanation for Exercise 1:**

**Question**: *Which income group had the highest Health Expenditure per Capita in 2020?*

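Because the data has one row per country, Plotly Express stacks one segment per country in each bar. The bar height is therefore the sum of the countries' per-capita values and depends on how many countries fall in each group, which distorts the comparison between income groups. The axis labels and hover text use raw column names and unformatted numbers, which makes the chart harder to interpret. There is no title or context, so the viewer cannot immediately tell what the chart is about.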
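
To see why bar heights can mislead when a default `px.bar` call receives one row per country, the sketch below compares the per-group sum (what stacked segments add up to) with the per-group mean. The small DataFrame is made-up illustration data, not values from the WHO dataset; only the column names are borrowed from the notebook, and `toy_df`, `stacked_height`, and `group_mean` are hypothetical names.

```python
import pandas as pd

col = "Health Exp. per Capita (USD)"

# Made-up illustration data (NOT from the WHO dataset): one row per country.
toy_df = pd.DataFrame({
    "Country (location)": ["A", "B", "C", "D", "E"],
    "income group": ["Low", "Low", "Low", "Low", "High"],
    col: [100.0, 100.0, 100.0, 100.0, 300.0],
})

by_group = toy_df.groupby("income group")[col]

# Stacked bar segments add up to the per-group sum, which grows with the
# number of countries in the group.
stacked_height = by_group.sum()

# The per-group mean is one possible summary of typical spending.
group_mean = by_group.mean()

print(stacked_height)
print(group_mean)
```

In this toy case the "Low" group has the taller stacked bar only because it contains more countries, while the mean ranks the "High" group first.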

## EXERCISE 2

1. Copy/Paste the visualization into a new `fig_...` object to improve on. *Note: The final web app for this exercise will have two visualizations: the default one and the improved one.*
2. Update the [tick marks](https://plotly.com/python/tick-formatting/) for your x or y axis (or both), and adjust the text formatting where appropriate. If you are using a non-axes visualization (e.g., choropleth), you can skip this step.
    - Refer to the [axes](https://plotly.com/python/axes/) documentation for Plotly, and make sure the axes in the visualization are as clear as possible. E.g., do you need a zeroline?
3. Adjust the hover text for your visualization. Try to make it informative for your question, but not too cluttered.
4. Use what you've learned so far in this class to make the necessary improvements to the visualization such that it is as clear as possible.
    - You may find yourself using Plotly documentation which was not covered in this class. That's okay!
5. Add some [shape-drawing buttons](https://plotly.com/python/configuration-options/#add-optional-shapedrawing-buttons-to-modebar) to your modebar, and remove any buttons you think are not needed.
6. Add some text beneath the new visualization which describes the visualization and discusses the takeaway message for the web app.
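
Plotly's `tickformat` and hover format strings follow d3-format, whose syntax is modeled on Python's format mini-language. As a rough illustration only, the sketch below uses Python's own formatting to preview what a specifier such as `$,.0f` is meant to produce. The helper name `preview_usd` is hypothetical, and the `$` prefix is added by hand because Python's mini-language has no currency symbol.

```python
def preview_usd(value: float) -> str:
    """Approximate the d3-format specifier "$,.0f" with Python formatting.

    Hypothetical helper for illustration; Plotly does the real formatting
    in the browser via d3-format.
    """
    # "," adds thousands separators, ".0f" rounds to zero decimal places.
    return f"${value:,.0f}"


for v in [61.857853, 1485.909342, 103533.985]:
    print(preview_usd(v))
```

Previewing a format this way can help when choosing between, say, zero and two decimal places before re-running the figure.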

In [31]:

fig2 = px.bar(
    df_2020,
    x="income group",
    y="Health Exp. per Capita (USD)",
    title="Health Expenditure per Capita by Income Group (2020)"
)

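# Start y-axis at zero to show true proportions.
# Each country is one stacked segment, so size the axis from the stacked
# group totals; the largest single country value would clip the bars.
group_totals = df_2020.groupby("income group")["Health Exp. per Capita (USD)"].sum()
fig2.update_yaxes(
    range=[0, group_totals.max() * 1.1],  # Start at 0, leave 10% headroom
    tickformat="$,.0f",
    title="Health Expenditure per Capita (USD)"
)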
fig2.update_xaxes(title="Income Group")

# HOVER TEXT
fig2.update_traces(
    hovertemplate=
    "<b>Income Group:</b> %{x}<br>" +
    "<b>Health Spending:</b> %{y:$,.0f} per person<br>" +
    "<b>Year:</b> 2020<extra></extra>"
)

fig2.update_layout(
    title_x=0.5,
    font=dict(family="Arial", size=14),
    paper_bgcolor="white",
    plot_bgcolor="#fafafa"
)

<font color='darkblue'> **Explanation for Exercise 2:**

This improved chart keeps a linear y-axis that starts at zero, so bar heights stay proportional to the values they represent. Dollar-formatted tick labels, readable axis titles, and a chart title give the viewer the context the default chart lacked. The cleaner hover text also helps highlight the key point: higher-income countries spend far more per person on health care.

## DASH APP

---

<font color='darkblue'>**The cell below will be your "app cell".**</font>

- This is the cell that will be graded for this week's exercise.
- Any time you update code, re-run the cell to render changes in the app.
- Click the icon on the upper left corner of the output, and select "View output fullscreen". *Type **Esc** to return to the notebook.*

In [38]:
# Shape toolbar settings for the improved visualization
config2 = {
    "displaylogo": False,
    "modeBarButtonsToAdd": ["drawline", "drawrect"],
    "modeBarButtonsToRemove": ["lasso2d", "select2d"]
}

app = Dash("My visualization")

app.layout = html.Div(children=[

    html.H1("Don't Lie with Visualizations!"),

    # Exercise 1 – Default Visualization
    html.H2("Default Visualization"),

    html.H3("Which income group had the highest Health Expenditure per Capita in 2020?"),

    dcc.Graph(
        id="default-fig",
        figure=fig1
    ),

    html.P([
        html.B("Why this default visualization may be misleading:"), html.Br(),
        "- Bars stack one segment per country, so bar height reflects the sum of country values, not typical spending in each group.", html.Br(),
        "- Axis labels and hover text use raw column names and unformatted numbers.", html.Br(),
        "- The figure lacks a title inside the chart, reducing clarity."
    ]),

    html.Br(), html.Hr(), html.Br(),

    # Exercise 2 – Improved Visualization
    html.H2("Improved Visualization"),

    dcc.Graph(
        id="improved-fig",
        figure=fig2,
        config=config2
    ),

    html.P([
        html.B("Takeaway: "),
        "The improved visualization makes it easier to compare income groups by using a clear linear scale and cleaner hover text. ", html.Br(),
        "It shows that high-income countries spend far more per person on health care than all other groups, ", html.Br(),
        "but without hiding the smaller differences among the lower-income groups."
    ])
])


# Do not edit below this line (except jupyter_height)
if __name__ == '__main__':
    app.run(debug=True, jupyter_mode='inline', jupyter_height=1000)



*Note: If your cell output is stuck on "Loading ..." for more than a minute, you may need to reconnect/restart your Google Colab runtime.*

---