# CSE 512 - Perception Exercise

## Instructions
We'll be working in teams of 2-3 for this exercise. Create a fork of this notebook by clicking on the fork button above (next to the share button). You will use your group's notebook to keep track of your work. If you would like to allow multiple users to edit the notebook, you can use the share menu (accessed through the share button) to add users and give them edit access.

Next, click on the share button in your fork and select the option to save it as 'unlisted'. This will only allow people with access to the link to view the notebook.

**Note**: Make sure you submit your group's notebook *before* the end of lecture!

## Activities

In this exercise, we will be reviewing and re-designing visualizations with respect to their perceptual effectiveness. 

**Activity 1 - Evaluate Current Visualizations (10 min)**: Review the visualizations below in your groups and write down what you think about their design choices in terms of graphical perception.

**Activity 2 - Re-Design One Visualization (30 min)**: Design your own visualization to address one or more of the limitations you pointed out in the first activity.

**Activity 3 - Get Feedback (20 min)**: Share out your designs with as many fellow students as possible in class. Get "votes" on which designs seem the most effective. Among the old and new designs, which design was the "winner" in terms of perceptual effectiveness? Why do you think this was the case?

**Activity 4 - Reflection (5 min)**: Reflect on what you learned from this exercise. What are some useful takeaways?

In [2]:
import altair as alt
import pandas as pd

### Dataset Story

Here's the (fictional) story behind our dataset for this activity: Maria, Glendale, Richa, and Michael became friends in their first year of college, and are all graduating with bachelor's degrees in business and finance. In their freshman year, they made a silly bet that they could beat each other with their own hand-picked stock portfolios. They recorded the stock prices at the time (closing stock prices on November 8, 2019), and vowed to compare their results four years later. The big day has finally come!

It's our job to **visualize their results to see who won the bet!**

For the curious, here are some notes on each friend's stock selections:
* Maria went for general brand recognition (with some biases in favor of her own lifestyle)
* Glendale, being a CS double major, feels that the US economy is heavily focused on tech, so she favored tech stocks
* Richa believes the money movers are the real money makers, and picked only finance-related stocks
* Michael, a self-professed "true Washingtonian," chose to place his faith in some of the biggest companies in the great state of Washington.

### Dataset Details

This dataset has six attributes:
* **Investor**: the portfolio owner.
* **Company**: the name of the company whose stocks were "purchased".
* **Symbol**: the stock symbol for the company.
* **Shares**: how many shares of the stock were "purchased".
* **Price**: the closing price for this stock on the given date.
* **Date**: the date the stock price was recorded (Nov 8, 2019 and Nov 8, 2023).

A Google Sheets version of the data can be seen [here](https://docs.google.com/spreadsheets/d/1GpZfguw-Dx6UPsqDAAgsh8lNW5gsKLEbiwYGhZLZrTk/edit?usp=sharing).

In [3]:
data_url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vQSz2gGUW3Sgq0hrcpaKM3ToNgEGImWgwJthfp3b9OUmxiEcEN6kMZygz1HZuq1cbbPcxcWXGUicurY/pub?gid=0&single=true&output=csv"

portfolio_data = pd.read_csv(data_url)

In [4]:
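# Value of each holding on its date: shares * closing price
portfolio_data["dollar_value_invested"] = portfolio_data["Shares"] * portfolio_data["Price"]
portfolio_data["Date"] = pd.to_datetime(portfolio_data["Date"])
# Sum the numeric columns per investor per calendar year ("YE" = year-end bins, pandas >= 2.2).
# Only dollar_value_invested is meaningful here; the summed prices are not.
portfolio_data.groupby(
    [pd.Grouper(key="Date", freq="YE"), "Investor"],
    as_index=False,
).sum(numeric_only=True)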

|   | Date | Investor | Shares | Price | dollar_value_invested |
|---|------|----------|--------|-------|-----------------------|
| 0 | 2019-12-31 | Glendale | 100 | 1301.18 | 13621.57 |
| 1 | 2019-12-31 | Maria | 100 | 1135.79 | 12665.3 |
| 2 | 2019-12-31 | Michael | 100 | 1316.84 | 13168.4 |
| 3 | 2019-12-31 | Richa | 100 | 1858.22 | 16584.22 |
| 4 | 2023-12-31 | Glendale | 100 | 2463.08 | 24213.54 |
| 5 | 2023-12-31 | Maria | 100 | 1536.92 | 18103.39 |
| 6 | 2023-12-31 | Michael | 100 | 1788.2 | 17882.0 |
| 7 | 2023-12-31 | Richa | 100 | 2581.3 | 22528.2 |
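`pd.Grouper(key="Date", freq="YE")` puts dates into calendar-year bins and labels each bin with the last day of the year. That is why this table shows 2019-12-31 and 2023-12-31 instead of the November 8 recording dates. Below is a minimal sketch of an alternative that groups by the year number, so the labels read 2019 and 2023. The `rows` frame is a hypothetical stand-in that reuses two of the investor totals above. Its Nov 8 dates are assumed from the story.

```python
import pandas as pd

# Hypothetical stand-in for portfolio_data: one row per investor and date,
# reusing Glendale's and Maria's totals from the table above
rows = pd.DataFrame({
    "Date": pd.to_datetime(["2019-11-08", "2019-11-08", "2023-11-08", "2023-11-08"]),
    "Investor": ["Glendale", "Maria", "Glendale", "Maria"],
    "dollar_value_invested": [13621.57, 12665.3, 24213.54, 18103.39],
})

# Group by the calendar year itself rather than a year-end timestamp
by_year = (
    rows.assign(Year=rows["Date"].dt.year)
    .groupby(["Year", "Investor"], as_index=False)["dollar_value_invested"]
    .sum()
)
print(by_year)
```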


In [5]:
portfolio_data.groupby(
    [pd.Grouper(key="Date", freq="YE"), "Investor"],
    as_index=False,
).agg({"Shares": "sum"})

|   | Date | Investor | Shares |
|---|------|----------|--------|
| 0 | 2019-12-31 | Glendale | 100 |
| 1 | 2019-12-31 | Maria | 100 |
| 2 | 2019-12-31 | Michael | 100 |
| 3 | 2019-12-31 | Richa | 100 |
| 4 | 2023-12-31 | Glendale | 100 |
| 5 | 2023-12-31 | Maria | 100 |
| 6 | 2023-12-31 | Michael | 100 |
| 7 | 2023-12-31 | Richa | 100 |


## Activity 1 (15 mins) - Evaluate Current Visualizations

Below are some visualizations of our target dataset. Discuss as a team and write down the strong points and weak points of the designs in terms of graphical perception. Use your knowledge of the lecture materials (videos, slides, readings) to guide your evaluations.

Also, think about what it means to "win the bet." Should the winner make the most money? Or get the most value from their initial investments? You decide!

#### Definition of winning the bet:
The largest percent increase in total portfolio value (the sum of shares * price over all holdings) from 2019 to 2023.
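The two candidate definitions of "winning" can rank the friends differently. The plain-Python sketch below computes both from the 2019 and 2023 portfolio totals in the grouped table above. The `totals` dictionary and the helper functions are illustrative names and are not part of the notebook.

```python
# Portfolio totals (dollar_value_invested) per investor: (2019, 2023), from the grouped output above
totals = {
    "Glendale": (13621.57, 24213.54),
    "Maria": (12665.30, 18103.39),
    "Michael": (13168.40, 17882.00),
    "Richa": (16584.22, 22528.20),
}

def absolute_gain(start, end):
    """Dollars gained between the two dates."""
    return end - start

def percent_change(start, end):
    """Gain relative to the starting value, as a fraction (0.5 means +50%)."""
    return (end - start) / start

def rank_by(metric):
    """Investor names ordered from best to worst under the given metric."""
    return sorted(totals, key=lambda name: metric(*totals[name]), reverse=True)

for name, (start, end) in totals.items():
    print(f"{name:<9} gain ${absolute_gain(start, end):>9,.2f}  change {percent_change(start, end):.1%}")

print("By dollars gained:", rank_by(absolute_gain))
print("By percent change:", rank_by(percent_change))
```

Glendale wins under either definition. Maria and Richa swap places between the two rankings, though, and on percent change Richa and Michael differ by less than a tenth of a percentage point.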

### Visualization 1

This visualization shows the total value (shares * price) for each stock as a proportion of each investor's portfolio, separated by date.

Strengths:
* *Easy to see the distribution at a point in time for a particular person*
* *Tooltip is helpful to see more information*
* *They tried small multiples, which enable comparison across additional dimensions*

Weaknesses:
* *Too many categories share the color channel, producing similar hues; this makes it confusing to compare across investors and hard to use the legend*
* *Because the charts show proportions, change over time is confusing: Glendale's large purple slice appears to have gone down, but that holding actually went up in value; it is just a smaller proportion of her larger total portfolio*
* *Hard to compare size changes from one chart to another across time or investors.*
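A little arithmetic confirms the proportion problem. The sketch below uses made-up numbers for two hypothetical stocks, A and B, which do not come from the dataset. It shows that a holding's slice of the pie can shrink while its dollar value grows, because the whole pie grows faster.

```python
# Hypothetical two-stock portfolio: (shares, price) per stock in each year
portfolio = {
    2019: {"A": (10, 50.0), "B": (10, 50.0)},
    2023: {"A": (10, 60.0), "B": (10, 140.0)},
}

def values(year):
    """Dollar value (shares * price) of each holding in a given year."""
    return {sym: shares * price for sym, (shares, price) in portfolio[year].items()}

def portion_of_total(year):
    """Each holding's fraction of that year's total portfolio value."""
    v = values(year)
    total = sum(v.values())
    return {sym: amount / total for sym, amount in v.items()}

for year in (2019, 2023):
    print(year, values(year), {s: f"{p:.0%}" for s, p in portion_of_total(year).items()})
```

Stock A's value rises 20% (from $500 to $600), yet its slice falls from 50% to 30%. A pie chart shows only the second fact.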

In [6]:
arc_chart = alt.Chart(portfolio_data).mark_arc().encode(
    row=alt.Row('Date:T'),
    column=alt.Column('Investor:N'),
    color=alt.Color('Symbol:N'),
    theta=alt.Theta('Total Value:Q'),
    tooltip=[
        alt.Tooltip('Symbol:N'),
        alt.Tooltip('Price:Q'),
        alt.Tooltip('Shares:Q'),
        alt.Tooltip('Total Value:Q')
    ]
).transform_calculate(
    **{"Total Value": "datum.Shares * datum.Price"}
).properties(
    width=150,
    height=200
)
arc_chart

### Visualization 2

This visualization shows the sum of the total stock value (shares * price) for each investor.

Strengths:
* *Good because, in addition to the proportions within a person's investments, you can compare total values across people*
* *Using the transform_calculate function effectively to generate the Total Value*

Weaknesses:
* *Not representative of reality, because it ignores the difference between the two dates and therefore double-counts investments, adding their 2019 value to their 2023 value*
* *The tooltip shows multiple segments of a bar for the same symbol, which further shows that time is ignored and investments are double-counted*
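The totals in the grouped table above confirm the double counting. Below is a minimal pandas sketch that assumes a long-format frame of per-investor totals as a stand-in for the chart's data. Summing over both dates, as the chart's `aggregate='sum'` does when `Date` is not encoded, adds the 2019 and 2023 values together. Keeping the year as a separate dimension recovers each portfolio's real value.

```python
import pandas as pd

# Per-investor totals from the grouped output above (2019 and 2023)
totals = pd.DataFrame({
    "Year": [2019] * 4 + [2023] * 4,
    "Investor": ["Glendale", "Maria", "Michael", "Richa"] * 2,
    "Total Value": [13621.57, 12665.30, 13168.40, 16584.22,
                    24213.54, 18103.39, 17882.00, 22528.20],
})

# What the stacked bars show: one sum per investor across BOTH years
bar_heights = totals.groupby("Investor")["Total Value"].sum()

# What each portfolio was actually worth: one value per investor per year
per_year = totals.pivot(index="Investor", columns="Year", values="Total Value")

print(bar_heights.round(2))
print(per_year)
```

For example, Glendale's bar reaches about $37,835, which is more than her portfolio was worth on either date.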

In [7]:
bar_chart = alt.Chart(portfolio_data).mark_bar().encode(
    x=alt.X('Investor:N'),
    y=alt.Y('Total Value:Q', aggregate='sum'),
    color=alt.Color('Symbol:N'),
    tooltip=[
        alt.Tooltip('Symbol:N'),
        alt.Tooltip('Price:Q'),
        alt.Tooltip('Shares:Q'),
        alt.Tooltip('Total Value:Q')
    ]
).transform_calculate(
    **{"Total Value": "datum.Shares * datum.Price"}
)
bar_chart

### Visualization 3

This visualization shows the difference in the total value (shares * price) for each investor's entire portfolio over time.

Strengths:
* *This effectively shows the change in total value for each individual person; it is easy to see the difference between the heights of the two data points.*
* *The tooltip gives more detail, such as the exact date and the Total Value*

Weaknesses:
* *Hard to compare one person to another; for example, Maria and Michael look almost indistinguishable.*
* *It is not visually pleasing*
* *The grid is not needed: with only two dates involved, the values are easy to compare even without a line chart*
* *Small multiples is working against them this time, they could have plotted all of the investment changes on one axis.*

In [8]:
line_chart = alt.Chart(portfolio_data).mark_line(point=True).encode(
    x=alt.X('Date:T'),
    y=alt.Y('Total Value:Q', aggregate='sum'),
    column=alt.Column('Investor:N'),
    tooltip=[
        alt.Tooltip('Date:T'),
        alt.Tooltip('Total Value:Q', aggregate='sum')
    ]
).transform_calculate(
    **{"Total Value": "datum.Shares * datum.Price"}
).properties(
    width=190,
    height=200
)
line_chart

## Activity 2 (15 mins) - Make a *Better* Visualization!

Now, create your own visualization addressing one or more of the weak points you discussed in Activity 1. Feel free to use whatever tool(s) you like to do your re-design(s).

You are welcome to reuse any of the code above, as well as any code examples provided through the course materials or online. You are also welcome to use Tableau Desktop, or a different tool as desired.

Please describe your rationale for why this visualization is better than the visualization(s) used above for reference.

We chose to show percent change rather than aggregate sums, so that the chart answers who won the bet instead of drawing attention to other details, such as the multitude of companies each friend invested in. We also wanted an explicit numerical ranking, so we placed the percent-change chart next to bar charts comparing each investor's total portfolio value in 2019 and 2023. We kept the totals as bars because they are absolute dollar amounts. We drew the calculated percentages as dots to signal that they are proportions rather than amounts.

In [17]:
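bar_chart = alt.Chart(portfolio_data).mark_bar().encode(
    column=alt.Column('Investor:N'),
    # Show only the year; the raw timestamp is hard to read as a nominal label
    x=alt.X('year(Date):O', title='Year'),
    # One row per (Date, Investor) after the aggregate below, so no further sum is needed
    y=alt.Y('Total_Value:Q', title='Total Value ($)'),
    # Symbol, Price, and Shares no longer exist after aggregating, so encode only
    # the fields the aggregate keeps (a Symbol color or tooltip would be empty)
    tooltip=[
        alt.Tooltip('year(Date):O', title='Year'),
        alt.Tooltip('Investor:N'),
        alt.Tooltip('Total_Value:Q', title='Total Value', format='$,.2f')
    ]
).transform_calculate(
    **{"Total Value": "datum.Shares * datum.Price"}
).transform_aggregate(
    Total_Value='sum(Total Value)',
    groupby=['Date', 'Investor']
).properties(
    width=100,
    height=400
)
bar_chart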

In [18]:
portfolio_total_values = portfolio_data.groupby(
    [pd.Grouper(key="Date", freq="YE"), "Investor"],
    as_index=False,
).agg({"dollar_value_invested": "sum"})
portfolio_total_values[portfolio_total_values.Date.dt.year == 2023]["dollar_value_invested"].reset_index(drop=True)

0    24213.54
1    18103.39
2    17882.00
3    22528.20
Name: dollar_value_invested, dtype: float64

In [19]:
portfolio_total_values = portfolio_data.groupby(
    [pd.Grouper(key="Date", freq="YE"), "Investor"],
    as_index=False,
).agg({"dollar_value_invested": "sum"})
portfolio_total_values[portfolio_total_values.Date.dt.year == 2023].sort_values(by="Investor")

|   | Date | Investor | dollar_value_invested |
|---|------|----------|-----------------------|
| 4 | 2023-12-31 | Glendale | 24213.54 |
| 5 | 2023-12-31 | Maria | 18103.39 |
| 6 | 2023-12-31 | Michael | 17882.0 |
| 7 | 2023-12-31 | Richa | 22528.2 |


In [20]:
portfolio_total_values[portfolio_total_values.Date.dt.year == 2019].sort_values(by="Investor")

|   | Date | Investor | dollar_value_invested |
|---|------|----------|-----------------------|
| 0 | 2019-12-31 | Glendale | 13621.57 |
| 1 | 2019-12-31 | Maria | 12665.3 |
| 2 | 2019-12-31 | Michael | 13168.4 |
| 3 | 2019-12-31 | Richa | 16584.22 |


In [21]:
portfolio_total_values

|   | Date | Investor | dollar_value_invested |
|---|------|----------|-----------------------|
| 0 | 2019-12-31 | Glendale | 13621.57 |
| 1 | 2019-12-31 | Maria | 12665.3 |
| 2 | 2019-12-31 | Michael | 13168.4 |
| 3 | 2019-12-31 | Richa | 16584.22 |
| 4 | 2023-12-31 | Glendale | 24213.54 |
| 5 | 2023-12-31 | Maria | 18103.39 |
| 6 | 2023-12-31 | Michael | 17882.0 |
| 7 | 2023-12-31 | Richa | 22528.2 |


In [22]:
# One row per investor, in alphabetical order to match the sorted values in the next cell
investors_percent_change = pd.DataFrame(
    sorted(portfolio_total_values.Investor.unique()), columns=["Investor"]
)

In [23]:
# 2019 and 2023 totals, sorted by investor so the rows line up by position
years = portfolio_total_values.Date.dt.year
values_2019 = portfolio_total_values[years == 2019].sort_values(by="Investor")["dollar_value_invested"].reset_index(drop=True)
values_2023 = portfolio_total_values[years == 2023].sort_values(by="Investor")["dollar_value_invested"].reset_index(drop=True)
# Fractional change from 2019 to 2023 (0.5 means +50%)
investors_percent_change["Percent Change"] = (values_2023 - values_2019) / values_2019
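The cells above line up the 2019 and 2023 totals by position. That is correct only while both slices stay sorted in the same order. The sketch below is a label-based alternative that assumes the same layout as `portfolio_total_values` (columns `Date`, `Investor`, `dollar_value_invested`). Pivoting the years into columns matches each investor's two values by name. The hypothetical `totals_example` frame reuses the totals printed above so the block runs on its own.

```python
import pandas as pd

# Stand-in for portfolio_total_values, built from the totals printed above
totals_example = pd.DataFrame({
    "Date": pd.to_datetime(["2019-12-31"] * 4 + ["2023-12-31"] * 4),
    "Investor": ["Glendale", "Maria", "Michael", "Richa"] * 2,
    "dollar_value_invested": [13621.57, 12665.30, 13168.40, 16584.22,
                              24213.54, 18103.39, 17882.00, 22528.20],
})

# One row per investor, one column per year: values are matched by label, not position
wide = (
    totals_example.assign(Year=totals_example["Date"].dt.year)
    .pivot(index="Investor", columns="Year", values="dollar_value_invested")
)

# Fractional change from 2019 to 2023 (0.5 means +50%), ranked from largest to smallest
ranked = (
    ((wide[2023] - wide[2019]) / wide[2019])
    .rename("Percent Change")
    .sort_values(ascending=False)
    .reset_index()
)
print(ranked)
```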

In [24]:
percent_change = alt.Chart(investors_percent_change).mark_circle().encode(
    x=alt.X('Investor:N'),
    # Values are fractions (0.78), so format the axis as percentages (78%)
    y=alt.Y('Percent Change:Q', axis=alt.Axis(format='%')),
    tooltip=[
        alt.Tooltip('Investor:N'),
        alt.Tooltip('Percent Change:Q', format='.1%')
    ]
).properties(
    width=300,
    height=400
)
(bar_chart | percent_change)

## Activity 3 (20 mins) - Get Feedback and Iterate

Share your design(s) with another group in the class. For example, maybe the people sitting next to you or in the next row. Ask them to compare it with the original visualization from Activity 1, and share their response. What works? What could be improved? (And similarly, do the same for their design.)

Then, iterate on your design incorporating any peer feedback.

In [25]:
percent_change2 = alt.Chart(investors_percent_change).mark_circle().encode(
    x=alt.X('Investor:N'),
    # Values are fractions (0.78), so format the axis as percentages (78%)
    y=alt.Y('Percent Change:Q', axis=alt.Axis(format='%')),
    tooltip=[
        alt.Tooltip('Investor:N'),
        alt.Tooltip('Percent Change:Q', format='.1%')
    ]
).properties(
    width=300,
    height=400,
    title="Percent Change in Dollar Value Invested (2019-2023)"
)
(bar_chart | percent_change2).properties(title=alt.Title("Who Won the Bet?", fontSize=40))

* The small multiples are good. The labeling could use some work, and a title would help explain what we are seeing.

## Activity 4 (5 mins) - Reflection

Reflect on what you learned from this exercise. What are some useful takeaways?

* Taking a little time to think beforehand can inform better encodings, making it easier for viewers to perceive the message and main points you are trying to convey.