<a href="https://colab.research.google.com/github/jamierosereyes/CPE-031-Visualizations-and-Data-Analysis/blob/main/Hands_On_Activity_14___Telling_the_Truth_with_Data_Visualization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Hands-On Activity 14 | Telling the Truth with Data Visualization**





---



Name : Reyes, Jamie Rose Kia M. <br> Tamondong, Ivan Rex B. <br>
Course Code and Title : CPE 031 Visualization and Data Analysis<br>
Date Submitted : November 13, 2025<br>
Instructor : Engr. Rizzete M. Sayo


---



**1. Objectives:**

This activity aims to demonstrate students' ability to visualize data truthfully and ethically. Students will identify missing or biased data, correct misleading visualizations, and apply techniques to ensure integrity in data presentation.

**2. Intended Learning Outcomes (ILOs):**

By the end of this activity, students should be able to:

1. Analyze datasets to detect missing values, errors, and biases.

2. Evaluate the accuracy and fairness of different data visualization designs.

3. Create ethical and truthful charts by correcting deceptive visualizations.

**3. Discussions:**

Telling the truth with data visualization means ensuring that every visual accurately represents the data and context without distortion.
Misleading charts can manipulate interpretation through poor scaling, selective data, or biased representation.

Missing Data and Data Errors:
Missing values or outliers can lead to incorrect conclusions if ignored. Visualizations should either indicate missing data or use methods like interpolation or removal.
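As a minimal sketch of the two options named above, the snippet below compares linear interpolation with removal on a small made-up series. The values and variable names are illustrative assumptions. It also records which points were missing before filling, so that a chart can mark them as estimates rather than observations.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one gap (values are made up for illustration)
sales = pd.Series([120, 130, 150, 170, 200, np.nan, 240], name="Sales")

# Record which points are missing BEFORE filling, so a chart can flag them
was_missing = sales.isna()

# Option 1: linear interpolation between the neighbouring values
interpolated = sales.interpolate()

# Option 2: removal, which shortens the series instead of estimating a value
dropped = sales.dropna()

print(interpolated[was_missing])   # the estimated value(s) only
print(len(sales), len(dropped))    # series length before and after removal
```

Keeping `was_missing` alongside the filled series is a design choice: the estimate stays usable in a line chart, while the reader can still be told which point was not measured.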

Biased Data:
Data can be biased through selection bias (only certain data is collected) or survivor bias (excluding failures or dropouts). Identifying these biases prevents misleading visuals.
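To illustrate survivor bias, here is a small sketch with invented fund returns (all names and numbers are hypothetical). It compares the average computed only from the funds that still exist with the average over every fund that started.

```python
from statistics import mean

# Hypothetical five-year returns (%) for six funds; two of them closed down
funds = [
    {"name": "A", "return_pct": 12.0, "survived": True},
    {"name": "B", "return_pct": 8.0, "survived": True},
    {"name": "C", "return_pct": -15.0, "survived": False},
    {"name": "D", "return_pct": 10.0, "survived": True},
    {"name": "E", "return_pct": -20.0, "survived": False},
    {"name": "F", "return_pct": 5.0, "survived": True},
]

# Average over survivors only (what a chart built from "current funds" would show)
survivors_mean = mean(f["return_pct"] for f in funds if f["survived"])

# Average over every fund, including the ones that failed
full_mean = mean(f["return_pct"] for f in funds)

print(f"Survivors only: {survivors_mean:.2f}%")
print(f"All funds:      {full_mean:.2f}%")
```

In this made-up case the survivors average 8.75% while the full group averages 0%, so a chart of survivors alone would overstate typical performance.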

Adjusting for Inflation:
When comparing values over time (e.g., prices, income), data should be adjusted for inflation to reflect real value changes.
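The adjustment can be sketched as dividing each nominal value by a cumulative price index. The example below uses made-up numbers and assumes the first year is the base year (factor 1.00) and that each later factor is the price change relative to the previous year.

```python
from itertools import accumulate
from operator import mul

# Hypothetical nominal prices that never change on paper
nominal = [100.0, 100.0, 100.0]

# Hypothetical year-over-year inflation factors; the first year is the base
factors = [1.00, 1.10, 1.10]

# Cumulative price index: running product of the yearly factors
price_index = list(accumulate(factors, mul))

# Real value = nominal value / cumulative index (expressed in base-year prices)
real = [n / i for n, i in zip(nominal, price_index)]

for year, (n, r) in enumerate(zip(nominal, real)):
    print(f"year {year}: nominal={n:.2f} real={r:.2f}")
```

A price that looks flat in nominal terms falls to roughly 82.64 in base-year terms after two years of 10% inflation, which is the kind of change an unadjusted chart would hide.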

Deceptive Design:
Visualization design choices such as truncated axes, dual-axis charts, or selective time frames can distort perception. Ethical visualization maintains consistent scales and transparency.
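One way to quantify the effect of a truncated axis on a bar chart is to compare the ratio of the drawn bar heights with the ratio of the underlying values. The helper below is a hypothetical illustration (the function name and the numbers are assumptions, not part of the activity).

```python
def drawn_ratio(small: float, large: float, axis_start: float) -> float:
    """Ratio of bar heights as drawn when the value axis starts at axis_start."""
    if small <= axis_start:
        raise ValueError("the smaller bar would not be visible on this axis")
    return (large - axis_start) / (small - axis_start)


small, large = 200.0, 320.0  # hypothetical values being compared

# Axis starting at zero: bar heights are proportional to the values
true_ratio = drawn_ratio(small, large, axis_start=0.0)         # 1.6

# Axis starting at 150: only the part of each bar above 150 is drawn
truncated_ratio = drawn_ratio(small, large, axis_start=150.0)  # 3.4

# How much the truncated chart exaggerates the difference (about 2.1 times)
exaggeration = truncated_ratio / true_ratio

print(true_ratio, truncated_ratio, round(exaggeration, 3))
```

With these numbers the larger value is 1.6 times the smaller one, but the truncated chart draws it 3.4 times as tall.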

**4. Procedures:**

Step 1: Import Libraries

In [1]:
!pip install pandas plotly numpy
import pandas as pd
import numpy as np
import plotly.express as px



Step 2: Create a Sample Dataset

This dataset simulates product prices, sales, and inflation across years.

In [2]:
# Sample data
years = np.arange(2015, 2025)
data = {
    "Year": years,
    "Sales": [120, 130, 150, 170, 200, np.nan, 240, 260, 290, 320],
    "Price": [50, 52, 55, 57, 60, 63, 65, 70, 75, 78],
    "InflationRate": [1.02, 1.03, 1.01, 1.05, 1.04, 1.03, 1.02, 1.03, 1.02, 1.02]
}

df = pd.DataFrame(data)
df.head()

| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |


Step 3: Identify Missing Data and Errors

In [3]:
# Count missing values in each column
print("Missing Data per Column:")
print(df.isna().sum())

# Fill the missing Sales value by linear interpolation between neighbouring years
df["Sales"] = df["Sales"].interpolate()
df

Missing Data per Column:
Year             0
Sales            1
Price            0
InflationRate    0
dtype: int64


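| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |
| 5 | 2020 | 220.0 | 63 | 1.03 |
| 6 | 2021 | 240.0 | 65 | 1.02 |
| 7 | 2022 | 260.0 | 70 | 1.03 |
| 8 | 2023 | 290.0 | 75 | 1.02 |
| 9 | 2024 | 320.0 | 78 | 1.02 |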
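Step 3 counts missing values only. A few basic validity checks could be added along the following lines. This is a sketch: the `df_check` name and the particular rules (no duplicate or skipped years, no negative sales, prices above zero) are assumptions chosen for this dataset.

```python
import numpy as np
import pandas as pd

# Rebuild the sample columns so the snippet runs on its own
df_check = pd.DataFrame({
    "Year": np.arange(2015, 2025),
    "Sales": [120, 130, 150, 170, 200, np.nan, 240, 260, 290, 320],
    "Price": [50, 52, 55, 57, 60, 63, 65, 70, 75, 78],
})

# Count rows that break each (assumed) rule; NaN comparisons evaluate to False
issues = {
    "duplicate_years": int(df_check["Year"].duplicated().sum()),
    "year_gaps": int((df_check["Year"].diff().dropna() != 1).sum()),
    "negative_sales": int((df_check["Sales"] < 0).sum()),
    "non_positive_price": int((df_check["Price"] <= 0).sum()),
}

print(issues)
```

For the sample data every count is zero, so the single missing Sales value is the only issue that needs correcting.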


Step 4: Adjust for Inflation

In [4]:
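# Adjust sales for inflation: divide by the cumulative inflation index
# (running product of the yearly factors, assuming each factor is the change from the previous year)
df["Adjusted_Sales"] = df["Sales"] / df["InflationRate"].cumprod()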
fig = px.line(df, x="Year", y=["Sales", "Adjusted_Sales"],
              title="Sales Over Time (Adjusted for Inflation)",
              labels={"value": "Sales", "variable": "Metric"})
fig.show()
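Because the cumulative product in Step 4 already includes the 2015 factor, the adjusted 2015 value is lower than the nominal one (120 / 1.02). If the goal is to express every year in 2015 prices instead, the index can be rebased so that 2015 equals 1.0. The sketch below shows one way to do that; the variable names are assumptions and the data is copied from Steps 2 and 3.

```python
import numpy as np
import pandas as pd

years = np.arange(2015, 2025)
rates = pd.Series([1.02, 1.03, 1.01, 1.05, 1.04, 1.03, 1.02, 1.03, 1.02, 1.02],
                  index=years)
sales = pd.Series([120, 130, 150, 170, 200, 220, 240, 260, 290, 320],
                  index=years, dtype=float)

# Cumulative index as in Step 4 (relative to the year before 2015)
cumulative = rates.cumprod()

# Rebase so that the 2015 index equals exactly 1.0
index_2015 = cumulative / cumulative.loc[2015]

# Sales expressed in 2015 prices: the 2015 value is left unchanged
sales_2015_prices = sales / index_2015

print(sales_2015_prices.round(1))
```

Either base period is defensible as long as the chart title or axis label states which one is used.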

Step 5: Demonstrate Deceptive Design

Bad Example (Truncated Axis):

In [5]:
bad_chart = px.bar(df, x="Year", y="Sales", title="Deceptive Chart (Truncated Axis)")
bad_chart.update_yaxes(range=[150, 350])  # starts too high
bad_chart.show()

Good Example (Honest Axis):

In [6]:
good_chart = px.bar(df, x="Year", y="Sales", title="Truthful Chart (Proper Scale)")
good_chart.update_yaxes(range=[0, 350])
good_chart.show()
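A side effect of the truncated range in the bad example is that any bar whose value does not rise above the axis minimum has no visible height. The short check below lists the affected years, using the Sales values from Step 3 and the axis minimum of 150 from the bad example (the variable names are assumptions).

```python
# Sales values after interpolation (copied from Step 3)
sales_by_year = {
    2015: 120, 2016: 130, 2017: 150, 2018: 170, 2019: 200,
    2020: 220, 2021: 240, 2022: 260, 2023: 290, 2024: 320,
}

axis_min = 150  # lower bound used in the truncated chart

# Bars at or below the axis minimum have no visible height on that chart
hidden_years = [year for year, value in sales_by_year.items() if value <= axis_min]

print(hidden_years)  # [2015, 2016, 2017]
```

So the truncated chart not only exaggerates growth, it also makes the first three years look like they had no sales at all.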

**Task 1:** Handling Missing and Erroneous Data

Identify missing or inconsistent data points in your own dataset (or this one).

Apply at least one correction method (interpolation, imputation, or exclusion).

Visualize the corrected dataset.

In [7]:
fig_corrected_sales = px.line(df, x="Year", y="Sales", title="Corrected Sales Over Time (Missing Values Interpolated)",
              labels={"Sales": "Sales Value", "Year": "Year"})
fig_corrected_sales.show()

**Task 2:** Detecting and Correcting Bias

Create or simulate a biased dataset (e.g., only showing top-performing products or regions).

1. Visualize the biased data.

2. Then, include the full dataset and create a truthful comparison chart.

3. Briefly explain how bias affected interpretation.

In [8]:
# Simulate a biased dataset: only show years with higher sales
median_sales = df['Sales'].median()
biased_df = df[df['Sales'] >= median_sales].copy()

# 1. Visualize the biased data
fig_biased = px.bar(biased_df, x='Year', y='Sales', title='Biased Sales Data (Showing only higher sales)',
                    labels={'Sales': 'Sales Value', 'Year': 'Year'})
fig_biased.show()

# 2. Visualize the full dataset for a truthful comparison
fig_truthful = px.bar(df, x='Year', y='Sales', title='Truthful Sales Data (All Years)',
                      labels={'Sales': 'Sales Value', 'Year': 'Year'})
fig_truthful.show()
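The effect of the filter in Task 2 can also be stated in numbers. The sketch below recomputes the median split on a standalone copy of the Sales column (the `full` and `biased` names are assumptions) and compares the average of the biased subset with the average of all years.

```python
import pandas as pd

# Standalone copy of the interpolated data from Step 3
full = pd.DataFrame({
    "Year": list(range(2015, 2025)),
    "Sales": [120, 130, 150, 170, 200, 220, 240, 260, 290, 320],
})

median_sales = full["Sales"].median()            # 210.0
biased = full[full["Sales"] >= median_sales]     # same filter as in Task 2

# Years that the biased chart leaves out
dropped_years = full.loc[full["Sales"] < median_sales, "Year"].tolist()

biased_mean = biased["Sales"].mean()             # 266.0
full_mean = full["Sales"].mean()                 # 210.0
overstatement_pct = (biased_mean - full_mean) / full_mean * 100

print(dropped_years)
print(f"Biased mean: {biased_mean:.1f}, full mean: {full_mean:.1f}, "
      f"overstated by {overstatement_pct:.1f}%")
```

For this dataset the biased subset drops 2015 through 2019 and overstates average sales by about 26.7%.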

**Task 3:** Deceptive vs. Truthful Visualization

Create one misleading chart using axis manipulation or selective data range.

Create a corrected version that shows the same data honestly.

Explain the difference in interpretation between the two visuals.

In [9]:
# Create a misleading chart (truncated y-axis)
misleading_chart = px.bar(df, x='Year', y='Sales', title='Misleading Chart: Truncated Y-Axis for Sales')
misleading_chart.update_yaxes(range=[df['Sales'].min() - 50, df['Sales'].max() + 50]) # Start the axis higher than 0
misleading_chart.show()

# Create a truthful chart (proper y-axis starting from 0)
truthful_chart = px.bar(df, x='Year', y='Sales', title='Truthful Chart: Sales with Proper Y-Axis')
truthful_chart.update_yaxes(range=[0, df['Sales'].max() + 50]) # Ensure axis starts at 0
truthful_chart.show()



---


**5. Supplementary Activity:**

Visual Truth Challenge

Create a small project where you visualize a real-world dataset (e.g., population, income, environmental data).

1. Detect and correct at least two forms of distortion (missing data, bias, or misleading scaling).

2. Annotate your charts with titles and labels explaining your corrections.

3. Reflect on how ethical visualization improves trust and understanding.

In [17]:
import plotly.express as px

# 1. Create a misleading line chart for the 'Price' column with a truncated y-axis
misleading_price_chart = px.line(df, x='Year', y='Price', title='Misleading Price Chart (Truncated Y-Axis)')
misleading_price_chart.update_yaxes(range=[df['Price'].min() - 5, df['Price'].max() + 5])
misleading_price_chart.show()

# 2. Create a truthful line chart for the 'Price' column with a proper y-axis starting at zero
truthful_price_chart = px.line(df, x='Year', y='Price', title='Truthful Price Chart (Proper Y-Axis)')
truthful_price_chart.update_yaxes(range=[0, df['Price'].max() + 5])
truthful_price_chart.show()

The 'Misleading Price Chart (Truncated Y-Axis)' makes price changes look larger than they really are because the y-axis doesn't start at zero. This can make small changes seem more significant and give viewers the wrong idea about the actual trend.

The 'Truthful Price Chart (Proper Y-Axis)' fixes this by starting the y-axis at zero. This shows the full range of prices and gives an accurate view of how much prices actually changed over time. Viewers can now see the real size of the changes and won't be misled by the chart.

Altering chart axes like this can be misleading and even unethical. Even if unintentional, it can cause people to misinterpret data and make poor decisions. Charts should show data clearly and accurately, and starting the y-axis at zero is an important rule for fair and honest visualization.

In [18]:
fig_biased.show()

In [19]:
fig_truthful.show()

The 'Biased Sales Data (Showing only higher sales)' chart gives a distorted view of sales. By leaving out years with lower sales (below the median), it makes it look like sales are always high or growing quickly. This is a type of selection bias, where only certain data is chosen to make the results look better than they really are.

The 'Truthful Sales Data (All Years)' chart shows all the sales data, including the lower-performing years. This gives the full picture of how sales actually changed over time, including the lower starting point in 2015 to 2019 that the biased chart leaves out.

Selection bias can seriously affect how people interpret data. The biased chart might mislead stakeholders or investors, leading them to make decisions based on an overly positive view, like overinvesting or having unrealistic expectations. The truthful chart helps people understand the real situation and make better, more informed decisions. This shows why it's important to present complete and unbiased data: it ensures the analysis is honest and reliable, not just showing the data that looks good.

**Reflection:**<br> Fixing distortions like misleading axes or selection bias is important for building trust in data visualizations. Charts with truncated axes, such as the 'Misleading Price Chart' and 'Misleading Sales Chart,' can make changes appear larger than they really are, while starting axes at zero, as in the 'Truthful Price Chart' and 'Truthful Sales Chart,' provides an honest view of the data. Selection bias, like in the 'Biased Sales Data,' can also make results seem better than reality, and showing all the data, as in the 'Truthful Sales Data (All Years),' gives a complete and accurate picture. Correcting these issues builds trust because viewers can rely on the charts to make informed decisions. Key lessons for ethical visualization include showing the full context, reflecting numbers honestly, avoiding selection bias, making charts easy to understand, and using the influence of visuals responsibly. Ethical visualization ensures data is clear, accurate, and useful for decision-making.


**6. Conclusion/Learnings/Analysis:**

In conclusion, ethical data visualization is essential for accurately communicating information and supporting sound decision-making. By identifying missing data, correcting errors, and addressing biases, we ensure that charts reflect the true story behind the numbers. Adjusting for factors like inflation and avoiding deceptive design choices, such as truncated axes or selective time frames, helps maintain transparency and fairness. Overall, creating truthful and ethical visualizations allows viewers to interpret data correctly, promotes trust, and ensures that decisions are based on reliable and complete information.