
# **Hands-On Activity 14 | Telling the Truth with Data Visualization**





---



Name : Raposas, Alykyne S.<br>
Course Code and Title : CPE031 Visualization and Data Analysis<br>
Date Submitted : 11/13/2025<br>
Instructor : Maria Rizette Sayo


---



**1. Objectives:**

This activity aims to demonstrate students’ ability to visualize data truthfully and ethically. Students will identify missing or biased data, correct misleading visualizations, and apply techniques to ensure integrity in data presentation.

**2. Intended Learning Outcomes (ILOs):**

By the end of this activity, students should be able to:

1. Analyze datasets to detect missing values, errors, and biases.

2. Evaluate the accuracy and fairness of different data visualization designs.

3. Create ethical and truthful charts by correcting deceptive visualizations.

**3. Discussions:**

Telling the truth with data visualization means ensuring that every visual accurately represents the data and context without distortion.
Misleading charts can manipulate interpretation through poor scaling, selective data, or biased representation.

Missing Data and Data Errors:
Missing values or outliers can lead to incorrect conclusions if ignored. Visualizations should either mark missing data explicitly or handle it with a stated method such as interpolation or removal.
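
As a minimal sketch of marking estimated values, the snippet below records which rows were missing before interpolating them, so a chart can later distinguish measured points from estimated ones. The series values and the `Estimated` column name are illustrative assumptions, not part of the activity's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly series with one gap (values are illustrative only).
sales = pd.Series([120, 130, np.nan, 170],
                  index=[2015, 2016, 2017, 2018], name="Sales")

# Record which rows were missing BEFORE filling them in.
report = pd.DataFrame({"Sales": sales, "Estimated": sales.isna()})

# Default linear interpolation fills the gap from its two neighbours.
report["Sales"] = report["Sales"].interpolate()

print(report)
```

Keeping the flag next to the data means the estimate stays traceable after the gap is filled.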

Biased Data:
Data can be biased through selection bias (only certain data is collected) or survivorship bias (failures or dropouts are excluded). Identifying these biases prevents misleading visuals.

Adjusting for Inflation:
When comparing values over time (e.g., prices, income), data should be adjusted for inflation to reflect real value changes.
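
One common way to write this adjustment, assuming each year's inflation is stored as a multiplicative factor $r_i$ (for example, 1.03 for 3%) and $t_0$ is the first year in the series:

$$
\text{Real}_t = \frac{\text{Nominal}_t}{\prod_{i=t_0}^{t} r_i}
$$

Under this convention the product includes the first year's factor, so the real values are expressed in the prices of the year before $t_0$. Other base years are possible; this is only one choice.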

Deceptive Design:
Visualization design choices such as truncated axes, dual-axis charts, or selective time frames can distort perception. Ethical visualization maintains consistent scales and transparency.

**4. Procedures:**

Step 1: Import Libraries

In [None]:
!pip install pandas plotly numpy
import pandas as pd
import numpy as np
import plotly.express as px



Step 2: Create a Sample Dataset

This dataset simulates product prices, sales, and inflation across years.

In [None]:
# Sample data
years = np.arange(2015, 2025)
data = {
    "Year": years,
    "Sales": [120, 130, 150, 170, 200, np.nan, 240, 260, 290, 320],
    "Price": [50, 52, 55, 57, 60, 63, 65, 70, 75, 78],
    "InflationRate": [1.02, 1.03, 1.01, 1.05, 1.04, 1.03, 1.02, 1.03, 1.02, 1.02]
}

df = pd.DataFrame(data)
df.head()

| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |


Step 3: Identify Missing Data and Errors

In [None]:
# Count missing values in each column
print("Missing Data per Column:")
print(df.isna().sum())

# Fill or interpolate missing sales values
df["Sales"] = df["Sales"].interpolate()
df

Missing Data per Column:
Year             0
Sales            1
Price            0
InflationRate    0
dtype: int64


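| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |
| 5 | 2020 | 220.0 | 63 | 1.03 |
| 6 | 2021 | 240.0 | 65 | 1.02 |
| 7 | 2022 | 260.0 | 70 | 1.03 |
| 8 | 2023 | 290.0 | 75 | 1.02 |
| 9 | 2024 | 320.0 | 78 | 1.02 |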


Step 4: Adjust for Inflation

In [None]:
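# Adjust sales for inflation. InflationRate holds annual factors (1.02 = 2%),
# so the cumulative product acts as a price index relative to pre-2015 prices.
df["Adjusted_Sales"] = df["Sales"] / df["InflationRate"].cumprod()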
fig = px.line(df, x="Year", y=["Sales", "Adjusted_Sales"],
              title="Sales Over Time (Adjusted for Inflation)",
              labels={"value": "Sales", "variable": "Metric"})
fig.show()
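
The adjustment above divides by a cumulative product that already includes 2015's factor. The sketch below shows one way to rebase that index so a chosen year equals exactly 1.0. The names `PriceIndex`, `PriceIndex_2015`, and `base_year` are assumptions introduced for illustration.

```python
import pandas as pd

# Same annual factors as the sample dataset (1.02 means 2% inflation).
df = pd.DataFrame({
    "Year": range(2015, 2025),
    "InflationRate": [1.02, 1.03, 1.01, 1.05, 1.04, 1.03, 1.02, 1.03, 1.02, 1.02],
})

# Cumulative price index; the first row already includes 2015's inflation.
df["PriceIndex"] = df["InflationRate"].cumprod()

# Rebase so the chosen base year has an index of exactly 1.0.
base_year = 2015  # assumption: report everything in 2015 prices
base_value = df.loc[df["Year"] == base_year, "PriceIndex"].iloc[0]
df["PriceIndex_2015"] = df["PriceIndex"] / base_value

print(df[["Year", "PriceIndex", "PriceIndex_2015"]].round(4))
```

Dividing `Sales` by `PriceIndex_2015` instead of `PriceIndex` would leave the 2015 value unchanged and scale every later year relative to it. Either choice is defensible as long as the chart states the base year.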

Step 5: Demonstrate Deceptive Design

Bad Example (Truncated Axis):

In [None]:
bad_chart = px.bar(df, x="Year", y="Sales", title="Deceptive Chart (Truncated Axis)")
bad_chart.update_yaxes(range=[150, 350])  # starts too high
bad_chart.show()

Good Example (Honest Axis):

In [None]:
good_chart = px.bar(df, x="Year", y="Sales", title="Truthful Chart (Proper Scale)")
good_chart.update_yaxes(range=[0, 350])
good_chart.show()
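
One way to put a number on the distortion is Tufte's "lie factor": the relative change shown in the graphic divided by the relative change in the data. The sketch below applies it to two bars from the dataset (2018 and 2024) under the two axis baselines used above. The function name `lie_factor` and its parameters are hypothetical helpers, not part of Plotly.

```python
def lie_factor(first, second, axis_start=0.0):
    """Lie factor for two bars drawn from a given axis baseline.

    Assumes both values lie above axis_start, so both bars are drawn.
    """
    # Effect shown in the graphic: relative change in drawn bar height.
    first_height = first - axis_start
    second_height = second - axis_start
    shown = (second_height - first_height) / first_height
    # Effect in the data: relative change in the actual values.
    actual = (second - first) / first
    return shown / actual


sales_2018, sales_2024 = 170, 320
print("Truncated axis (starts at 150):", lie_factor(sales_2018, sales_2024, 150))
print("Zero-based axis:", lie_factor(sales_2018, sales_2024, 0))
```

A value near 1 means the bars grow in proportion to the data. With the axis starting at 150, the 2018 bar is only 20 units tall, so the growth looks about 8.5 times larger than it is.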

**Task 1:** Handling Missing and Erroneous Data

Identify missing or inconsistent data points in your own dataset (or this one).

Apply at least one correction method (interpolation, imputation, or exclusion).

Visualize the corrected dataset.

In [1]:
# Import Libraries
import pandas as pd
import numpy as np
import plotly.express as px

# Sample dataset
years = np.arange(2015, 2025)
data = {
    "Year": years,
    "Sales": [120, 130, 150, 170, 200, np.nan, 240, 260, 290, 320],
    "Price": [50, 52, 55, 57, 60, 63, 65, 70, 75, 78],
    "InflationRate": [1.02, 1.03, 1.01, 1.05, 1.04, 1.03, 1.02, 1.03, 1.02, 1.02]
}

df = pd.DataFrame(data)

# Check missing data
print("Missing Data per Column:")
print(df.isna().sum())

# Interpolate missing sales
df["Sales"] = df["Sales"].interpolate()

# Visualize corrected data
fig = px.line(df, x="Year", y="Sales", title="Sales After Interpolation (Corrected)")
fig.show()


Missing Data per Column:
Year             0
Sales            1
Price            0
InflationRate    0
dtype: int64


**Task 2:** Detecting and Correcting Bias

Create or simulate a biased dataset (e.g., only showing top-performing products or regions).

1. Visualize the biased data.

2. Then, include the full dataset and create a truthful comparison chart.

3. Briefly explain how bias affected interpretation.

In [2]:
# Biased dataset – only high sales
biased_df = df[df["Sales"] > 200]

# Visualize biased data
fig_biased = px.bar(biased_df, x="Year", y="Sales", title="Biased Visualization (Only High Sales Shown)")
fig_biased.show()

# Truthful visualization – full dataset
fig_truthful = px.bar(df, x="Year", y="Sales", title="Truthful Visualization (All Sales Shown)")
fig_truthful.show()

# Explanation
print("Explanation: The biased chart hides early low sales years, creating a false impression of continuous success.")


Explanation: The biased chart hides early low sales years, creating a false impression of continuous success.
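
To back the explanation with numbers, the sketch below compares the average of the filtered subset with the average of the full dataset and lists the years the filter hides. It rebuilds the interpolated `Sales` values so it runs on its own. The variable names are illustrative.

```python
import pandas as pd

# Sales values after interpolation, as in the corrected dataset.
df = pd.DataFrame({
    "Year": range(2015, 2025),
    "Sales": [120, 130, 150, 170, 200, 220, 240, 260, 290, 320],
})

# Same filter as the biased chart: keep only high-sales years.
mask = df["Sales"] > 200
biased_df = df[mask]

full_mean = df["Sales"].mean()
biased_mean = biased_df["Sales"].mean()
hidden_years = df.loc[~mask, "Year"].tolist()

print(f"Mean sales, all years:      {full_mean:.1f}")
print(f"Mean sales, filtered years: {biased_mean:.1f}")
print(f"Years hidden by the filter: {hidden_years}")
```

The filter removes half of the rows, and the average of what remains is noticeably higher than the average of the full series.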


**Task 3:** Deceptive vs. Truthful Visualization

Create one misleading chart using axis manipulation or selective data range.

Create a corrected version that shows the same data honestly.

Explain the difference in interpretation between the two visuals.

In [3]:
# Deceptive chart – truncated y-axis and selective years
fig_deceptive = px.line(df, x="Year", y="Sales", title="Deceptive Trend")
fig_deceptive.update_xaxes(range=[2020, 2025])
fig_deceptive.update_yaxes(range=[250, 320])
fig_deceptive.show()

# Truthful chart – full range
fig_truthful2 = px.line(df, x="Year", y="Sales", title="Truthful Trend (Full Data Range)")
fig_truthful2.update_yaxes(range=[0, 350])
fig_truthful2.show()

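print("Explanation: The deceptive chart shows only 2020-2025 and starts the y-axis at 250, which exaggerates growth; the truthful chart keeps all years and a zero-based y-axis, so the rise from 120 to 320 is shown in context.")


Explanation: The deceptive chart shows only 2020-2025 and starts the y-axis at 250, which exaggerates growth; the truthful chart keeps all years and a zero-based y-axis, so the rise from 120 to 320 is shown in context.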
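
A quick check of how much data an axis window leaves in view can make this comparison concrete. The sketch below counts the data points that fall inside the deceptive chart's x and y ranges. `x_range`, `y_range`, and `visible` are names introduced here for illustration.

```python
import pandas as pd

# Sales values after interpolation, as in the corrected dataset.
df = pd.DataFrame({
    "Year": range(2015, 2025),
    "Sales": [120, 130, 150, 170, 200, 220, 240, 260, 290, 320],
})

# Axis ranges used by the deceptive chart.
x_range = (2020, 2025)
y_range = (250, 320)

# Keep only the points whose coordinates fall inside both ranges
# (Series.between includes both endpoints by default).
inside_x = df["Year"].between(*x_range)
inside_y = df["Sales"].between(*y_range)
visible = df[inside_x & inside_y]

print(f"Points inside the window: {len(visible)} of {len(df)}")
print(visible)
```

Only the last three years land inside the window, so the reader sees a small part of the ten-year series.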




---


**5. Supplementary Activity:**

Visual Truth Challenge

Create a small project where you visualize a real-world dataset (e.g., population, income, environmental data).

1. Detect and correct at least two forms of distortion (missing data, bias, or misleading scaling).

2. Annotate your charts with titles and labels explaining your corrections.

3. Reflect on how ethical visualization improves trust and understanding.

**6. Conclusion/Learnings/Analysis:**

In this activity, I learned that truthful visualization is more than plotting data; it requires ethical decisions about data handling and design. Missing values, biased sampling, and design tricks such as truncated axes can distort perception. Correcting errors and maintaining consistent scales ensures clearer, more trustworthy visualizations. Ethical visuals help the target audience understand the message and make appropriate decisions.
