
# **Hands-On Activity 14 | Telling the Truth with Data Visualization**





---



Name : Nadine A. Malabanan <br>
Course Code and Title : CPE 031 - Visualization and Data Analysis <br>
Date Submitted : 9/11/2025 <br>
Instructor : Engr. Maria Rizette Sayo


---



**1. Objectives:**

This activity aims to demonstrate students' ability to visualize data truthfully and ethically. Students will identify missing or biased data, correct misleading visualizations, and apply techniques to ensure integrity in data presentation.

**2. Intended Learning Outcomes (ILOs):**

By the end of this activity, students should be able to:

1. Analyze datasets to detect missing values, errors, and biases.

2. Evaluate the accuracy and fairness of different data visualization designs.

3. Create ethical and truthful charts by correcting deceptive visualizations.

**3. Discussions:**

Telling the truth with data visualization means ensuring that every visual accurately represents the data and context without distortion.
Misleading charts can manipulate interpretation through poor scaling, selective data, or biased representation.

Missing Data and Data Errors:
Missing values or outliers can lead to incorrect conclusions if ignored. Visualizations should either mark where data is missing or state how the gaps were handled, for example by interpolation or removal.
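
A minimal sketch of one way to keep a filled gap visible, assuming pandas and a small hypothetical sales table (the column name `Estimated` and the values are illustrative, not part of the activity data). The idea is to record which rows were missing before filling them, so a later chart can mark those points as estimates.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly sales with one missing reading (illustrative values).
sales = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021],
    "Sales": [170.0, 200.0, np.nan, 240.0],
})

# Record which rows were missing BEFORE filling, so a chart can flag them.
sales["Estimated"] = sales["Sales"].isna()

# Default linear interpolation fills the gap with the midpoint of its neighbours.
sales["Sales"] = sales["Sales"].interpolate()

print(sales)
```

Keeping the flag alongside the filled values lets the reader tell measured points from estimated ones, for instance by giving them a different marker or color.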

Biased Data:
Data can be biased through selection bias (only part of the population is collected) or survivorship bias (failures or dropouts are excluded). Identifying these biases prevents misleading visuals.
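
The effect of excluding failures can be sketched numerically. The example below assumes a hypothetical table of product launches with an `Active` flag (both the data and the column names are made up for illustration) and compares the average revenue of the survivors with the average over every launch.

```python
import pandas as pd

# Hypothetical product launches; "Active" marks those still on sale.
launches = pd.DataFrame({
    "Product": ["A", "B", "C", "D", "E"],
    "Revenue": [500, 450, 400, 50, 0],
    "Active": [True, True, True, False, False],
})

# Average over survivors only (what a biased report would show).
survivors_mean = launches.loc[launches["Active"], "Revenue"].mean()

# Average over every launch, including the discontinued ones.
full_mean = launches["Revenue"].mean()

print(f"Survivors only: {survivors_mean:.0f}")
print(f"All launches:   {full_mean:.0f}")
```

In this made-up case the survivors-only average is well above the average for all launches, which is the kind of gap a chart built only on surviving products would hide.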

Adjusting for Inflation:
When comparing values over time (e.g., prices, income), data should be adjusted for inflation to reflect real value changes.
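
As a rough illustration of the adjustment, the sketch below assumes year-over-year inflation factors (1.03 meaning a 3% rise over the previous year) and hypothetical nominal prices. Chaining the factors with a cumulative product gives the price level relative to the base year, and dividing by it expresses each value in base-year terms.

```python
import pandas as pd

# Hypothetical nominal prices and year-over-year inflation factors
# (1.03 means prices rose 3% relative to the previous year).
prices = pd.DataFrame({
    "Year": [2021, 2022, 2023],
    "Nominal": [100.0, 103.0, 110.0],
    "InflationFactor": [1.00, 1.03, 1.02],
})

# Cumulative price level relative to the base year (2021 = 1.0).
prices["PriceLevel"] = prices["InflationFactor"].cumprod()

# Real value expressed in base-year (2021) currency.
prices["Real"] = prices["Nominal"] / prices["PriceLevel"]

print(prices.round(2))
```

Here the 2022 price of 103 equals 100 in 2021 terms, so the nominal rise that year is entirely inflation. Only the 2023 value shows a real increase.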

Deceptive Design:
Visualization design choices such as truncated axes, dual-axis charts, or selective time frames can distort perception. Ethical visualization maintains consistent scales and transparency.
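
How much a truncated axis exaggerates a difference can be estimated with simple arithmetic. The helper below, `drawn_ratio`, is a hypothetical function written for this illustration. It compares the heights of two bars as they would be drawn when the axis starts at a given value, using the 2019 and 2024 sales figures (200 and 320) from the sample dataset in the Procedures section.

```python
def drawn_ratio(small, large, axis_start=0.0):
    """Ratio of two bar heights as drawn when the axis starts at axis_start."""
    if axis_start >= small:
        raise ValueError("axis_start must be below the smaller value")
    return (large - axis_start) / (small - axis_start)

# Axis starting at 0: bar heights are proportional to the data.
true_ratio = drawn_ratio(200, 320)

# Axis starting at 150: the same two bars look much further apart.
truncated_ratio = drawn_ratio(200, 320, axis_start=150)

print(f"Honest axis: {true_ratio:.1f}x, truncated axis: {truncated_ratio:.1f}x")
```

With the axis starting at 150, a 1.6x difference in the data is drawn as a 3.4x difference in bar height.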

**4. Procedures:**

Step 1: Import Libraries

In [11]:
!pip install pandas plotly numpy
import pandas as pd
import numpy as np
import plotly.express as px



Step 2: Create a Sample Dataset

This dataset simulates product prices, sales, and inflation across years.

In [12]:
# Sample data
years = np.arange(2015, 2025)
data = {
    "Year": years,
    "Sales": [120, 130, 150, 170, 200, np.nan, 240, 260, 290, 320],
    "Price": [50, 52, 55, 57, 60, 63, 65, 70, 75, 78],
    "InflationRate": [1.02, 1.03, 1.01, 1.05, 1.04, 1.03, 1.02, 1.03, 1.02, 1.02]
}

df = pd.DataFrame(data)
df.head()

| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |


Step 3: Identify Missing Data and Errors

In [13]:
# Count missing values in each column
print("Missing Data per Column:")
print(df.isna().sum())

# Fill the missing Sales value by linear interpolation between neighbouring years
df["Sales"] = df["Sales"].interpolate()
df

Missing Data per Column:
Year             0
Sales            1
Price            0
InflationRate    0
dtype: int64


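| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |
| 5 | 2020 | 220.0 | 63 | 1.03 |
| 6 | 2021 | 240.0 | 65 | 1.02 |
| 7 | 2022 | 260.0 | 70 | 1.03 |
| 8 | 2023 | 290.0 | 75 | 1.02 |
| 9 | 2024 | 320.0 | 78 | 1.02 |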


Step 4: Adjust for Inflation

In [14]:
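# Adjust sales for inflation. InflationRate is treated here as a year-over-year
# factor (1.02 = 2% inflation), so its cumulative product is the price level
# relative to the year before the first row.
df["Adjusted_Sales"] = df["Sales"] / df["InflationRate"].cumprod()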
fig = px.line(df, x="Year", y=["Sales", "Adjusted_Sales"],
              title="Sales Over Time (Adjusted for Inflation)",
              labels={"value": "Sales", "variable": "Metric"})
fig.show()

Step 5: Demonstrate Deceptive Design

Bad Example (Truncated Axis):

In [15]:
bad_chart = px.bar(df, x="Year", y="Sales", title="Deceptive Chart (Truncated Axis)")
bad_chart.update_yaxes(range=[150, 350])  # axis starts at 150 instead of 0, exaggerating differences
bad_chart.show()

Good Example (Honest Axis):

In [16]:
good_chart = px.bar(df, x="Year", y="Sales", title="Truthful Chart (Proper Scale)")
good_chart.update_yaxes(range=[0, 350])
good_chart.show()

**Task 1:** Handling Missing and Erroneous Data

Identify missing or inconsistent data points in your own dataset (or this one).

Apply at least one correction method (interpolation, imputation, or exclusion).

Visualize the corrected dataset.

In [17]:



import pandas as pd
import numpy as np
import plotly.express as px


years = np.arange(2015, 2025)
data = {
    "Year": years,
    "Sales": [120, 130, 150, 170, 200, np.nan, 240, 260, 290, 320],
    "Price": [50, 52, 55, 57, 60, 63, 65, 70, 75, 78],
    "InflationRate": [1.02, 1.03, 1.01, 1.05, 1.04, 1.03, 1.02, 1.03, 1.02, 1.02]
}

df = pd.DataFrame(data)


print("Missing Data per Column:")
print(df.isna().sum())


df["Sales"] = df["Sales"].interpolate()


fig = px.line(df, x="Year", y="Sales",
              title="Sales Over Time (Corrected for Missing Data)",
              labels={"Sales": "Sales"})
fig.show()


Missing Data per Column:
Year             0
Sales            1
Price            0
InflationRate    0
dtype: int64


**Task 2:** Detecting and Correcting Bias

Create or simulate a biased dataset (e.g., only showing top-performing products or regions).

1. Visualize the biased data.

2. Then, include the full dataset and create a truthful comparison chart.

3. Briefly explain how bias affected interpretation.

In [18]:


import pandas as pd
import numpy as np
import plotly.express as px

# Step 1: Build the full dataset, then derive a biased subset from it
products = ["Product A", "Product B", "Product C", "Product D", "Product E"]
sales = [500, 450, 400, 200, 150]  # Products D and E sell far less than A, B, and C
df_full = pd.DataFrame({"Product": products, "Sales": sales})

# Biased dataset: only top 3 products
df_biased = df_full.nlargest(3, "Sales")

# Step 2: Visualize the biased data
fig_biased = px.bar(df_biased, x="Product", y="Sales",
                    title="Biased Dataset: Only Top-Performing Products")
fig_biased.show()

# Step 3: Visualize the full dataset (truthful comparison)
fig_full = px.bar(df_full, x="Product", y="Sales",
                  title="Truthful Dataset: All Products Included")
fig_full.show()
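
To support the explanation asked for in item 3, the effect of the bias can also be put in numbers. The sketch below reuses the same product values as the Task 2 cell and compares simple summaries of the top-3 subset with those of the full dataset. The `summary` table and its labels are illustrative additions, not part of the original cell.

```python
import pandas as pd

# Same values as the Task 2 cell above.
df_full = pd.DataFrame({
    "Product": ["Product A", "Product B", "Product C", "Product D", "Product E"],
    "Sales": [500, 450, 400, 200, 150],
})
df_biased = df_full.nlargest(3, "Sales")

# Compare what each view suggests about typical and worst-case sales.
summary = pd.DataFrame(
    {
        "Mean sales": [df_biased["Sales"].mean(), df_full["Sales"].mean()],
        "Lowest sales": [df_biased["Sales"].min(), df_full["Sales"].min()],
    },
    index=["Top 3 only", "All products"],
)
print(summary)
```

The top-3 view reports a mean of 450 and a minimum of 400, while the full dataset has a mean of 340 and a minimum of 150. The biased chart therefore overstates typical performance and hides the weakest products.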


**Task 3:** Deceptive vs. Truthful Visualization

Create one misleading chart using axis manipulation or selective data range.

Create a corrected version that shows the same data honestly.

Explain the difference in interpretation between the two visuals.

In [19]:


import pandas as pd
import numpy as np
import plotly.express as px

# Sample dataset: Annual sales over 10 years
years = np.arange(2015, 2025)
sales = [120, 130, 150, 170, 200, 210, 240, 260, 290, 320]
df = pd.DataFrame({"Year": years, "Sales": sales})

# Step 1: Create a deceptive chart (truncated y-axis)
fig_deceptive = px.bar(df, x="Year", y="Sales",
                       title="Deceptive Chart (Truncated Axis)")
fig_deceptive.update_yaxes(range=[150, 350])  # the axis starts too high
fig_deceptive.show()

# Step 2: Create a truthful chart (honest y-axis)
fig_truthful = px.bar(df, x="Year", y="Sales",
                      title="Truthful Chart (Proper Scale)")
fig_truthful.update_yaxes(range=[0, 350])  # axis starting from 0
fig_truthful.show()




---


**5. Supplementary Activity:**

Visual Truth Challenge

Create a small project where you visualize a real-world dataset (e.g., population, income, environmental data).

1. Detect and correct at least two forms of distortion (missing data, bias, or misleading scaling).

2. Annotate your charts with titles and labels explaining your corrections.

3. Reflect on how ethical visualization improves trust and understanding.

In [20]:
# Supplementary Activity: Visual Truth Challenge

import pandas as pd
import numpy as np
import plotly.express as px

# Step 1: Create a dataset
cities = ["City A", "City B", "City C", "City D", "City E", "City F"]
population = [1200, 950, np.nan, 800, 600, 400]  # Missing data in City C
df = pd.DataFrame({"City": cities, "Population": population})

# Step 2: Detect missing data
print("Missing Data per Column:")
print(df.isna().sum())

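# Step 3: Correct missing data
# Linear interpolation averages the neighbouring rows (City B and City D), giving 875.
# Cities have no natural order, so this value is only a rough placeholder.
df["Population"] = df["Population"].interpolate()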

# Step 4: Simulate bias
df_biased = df.nlargest(3, "Population")

# Step 5: Create misleading chart
fig_misleading = px.bar(df_biased, x="City", y="Population",
                        title="Misleading Chart: Top 3 Cities Only")
fig_misleading.update_yaxes(range=[800, 1300])
fig_misleading.show()

# Step 6: Create truthful chart
fig_truthful = px.bar(df, x="City", y="Population",
                      title="Truthful Chart: All Cities Included")
fig_truthful.update_yaxes(range=[0, 1300])
fig_truthful.show()

Missing Data per Column:
City          0
Population    1
dtype: int64


Reflection:
During this activity, I realized that missing data, like the population of City C, can make the visualization incomplete and confusing. By correcting it, the data becomes more accurate and easier to understand. I also learned that showing only the top-performing cities can give a misleading impression because it hides smaller cities. Including all the data makes the chart fair and honest. Lastly, using an honest scale on the axes helps avoid exaggerating differences, so people can see the real trends. Overall, presenting data truthfully builds trust and makes it easier for others to understand what the numbers really mean.

**6. Conclusion/Learnings/Analysis:**

Through this activity, I learned how important it is to show data honestly in visualizations. Fixing missing data, like filling in the population for City C, made the charts more complete and easier to understand. I also realized that showing only some of the data, like just the top-performing cities, can give a wrong impression and be misleading. Using proper axis scales and including all the information helps people see the real trends without exaggeration. Overall, I learned that ethical visualization is not just about making charts look nice; it helps others trust the data and understand it correctly. Even small choices in designing charts can change how people interpret the information.