# Final Project for DTSA 5304 - Intro to Data Visualizations at the University of Colorado Masters of Data Science Degree

## This workbook will serve as an analysis of all Pokemon stats through Generation 6, which was released in 2013. 

## This analysis will be done in Python3, using the Altair library for the Visualizations

## Introduction

I have chosen a dataset of Pokemon statistics for those released in generations 1-6 of the video game franchise. 

The Pokemon (short for “Pocket Monsters”) video game series revolves around capturing and training fictional creatures called Pokemon to become skilled trainers. In the series, each Pokemon is given a base “type”, sometimes a secondary “type”, and a list of different attribute scores. These attributes are:
- HP: Hit Points, or Pokemon Health
- Attack: Used to calculate the damage dealt by physical moves.
- Defense: Used to calculate the damage taken from physical moves.
- Special Attack: Used to calculate the damage dealt by non-physical (special) moves.
- Special Defense: Used to calculate the damage taken from non-physical (special) moves.
- Speed: Used to determine the turn order in a Pokemon Battle.

Our dataset has the following fields:
- Number - each Pokemon has a specific identifying number through the series
- Name - Pokemon Name
- Type 1 - Primary type
- Type 2 - Secondary Type
- Total: Total Stat points given through the 6 Pokemon attributes
- HP, Attack, Defense, Sp. Atk, Sp. Def, Speed
- Generation - What generation of game releases the Pokemon debuted in
- Legendary - A binary flag for if the Pokemon is legendary or not. 

Here are some examples of other data visualizations made using this dataset:
- https://www.kaggle.com/code/lakshyaag/data-visualization-pokemon-data 
- https://github.com/Dave-314/Pokemon-Data-Visualizations/blob/main/Pokemon_visualization_notebook.ipynb 
- https://medium.com/analytics-vidhya/data-visualization-pok%C3%A9mon-dataset-48e57690830d 


After researching previous Pokemon data visualizations, I have found that they are generally useful for exploring the dataset, but less so for analyzing it. Radar charts are popular in the Pokemon community for displaying how an individual Pokemon’s stats are distributed, but are less useful for analyzing the stats of every Pokemon in the dataset. There are a number of strong visualizations throughout these webpages (such as scatter plots showing the differences in Pokemon stats side by side), as well as some weaker ones (line charts showing changes of average stat per type). These line charts in particular are weak because the order of the Pokemon types along the x-axis is arbitrary: any order would tell the same story, so the slope of the line between two types carries no meaning.

It is my goal to create some explanatory visualizations using this dataset without falling into the pitfalls that some of the visualizations above succumb to. After reviewing other data visualizations of this dataset, I would like to answer the following questions:
- What is the distribution of Pokemon typings across the series?
    - Which Pokemon types are most commonly used as secondary types?
    - What are the most common combinations of primary and secondary typings?
- Which types of Pokemon have the highest average of specific stats?
- How have average Pokemon stats changed across the releases of various generations of games?
- Is there a correlation between any two specific stats in Pokemon? For example, are Pokemon with high speed more likely to have high defense or offense?

In [2]:
#Ignore Altair Deprecation Warnings
#(a warnings filter silences them without overwriting warnings.warn itself)
import warnings
warnings.filterwarnings('ignore')

#Import necessary Libraries
import pandas as pd
import altair as alt
import numpy as np
import requests
from io import StringIO

url = 'https://raw.githubusercontent.com/rdadmun/Data_Science_Degree/main/DTSA_5304/Pokemon_Stats.csv'
response = requests.get(url)
pkmn_data = pd.read_csv(StringIO(response.text))
pkmn_data.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


Exploring this dataset a bit, it seems we have 13 different data fields. We will be using all but two of these.
- Number: The Pokedex number for a given Pokemon. We will not be using this field in our data visualization. 
- Name: Pokemon Name
- Type 1: Primary Pokemon typing
- Type 2: Secondary Pokemon typing
- Total: The total number of stat points distributed across the 6 Pokemon stats for the creature.
    **Note:** Pokemon have evolutions, where after a certain level is reached they change form. Example: Bulbasaur becomes Ivysaur at level 16, which becomes Venusaur at level 32. Pokemon in a higher evolutionary tier have a larger pool of base stat points to distribute.
- HP: Pokemon Stat #1, Hit Points, or Pokemon Health
- Attack: Pokemon Stat #2, Used to calculate the damage dealt by physical moves.
- Defense: Pokemon Stat #3, Used to calculate the damage taken from physical moves.
- Special Attack: Pokemon Stat #4, Used to calculate the damage dealt by non-physical (special) moves.
- Special Defense: Pokemon Stat #5, Used to calculate the damage taken from non-physical (special) moves.
- Speed: Pokemon Stat #6, Used to determine the turn order in a Pokemon Battle.
- Generation: The Generation (1-6) of Pokemon game for which the respective creature was introduced.
- Legendary: a binary flag for if the Pokemon is a legendary creature. 
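
As a quick sanity check on the "Total" field described above, here is a minimal sketch that rebuilds four rows from the `head()` output by hand and confirms that Total equals the sum of the six stats. The `sample` frame and the `total_matches` name are my own illustration and not part of the original notebook.

```python
import pandas as pd

# Rows copied by hand from the head() output shown earlier
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander"],
        "Total": [318, 405, 525, 309],
        "HP": [45, 60, 80, 39],
        "Attack": [49, 62, 82, 52],
        "Defense": [49, 63, 83, 43],
        "Sp. Atk": [65, 80, 100, 60],
        "Sp. Def": [65, 80, 100, 50],
        "Speed": [45, 60, 80, 65],
    }
)

stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# True for each row where Total matches the row-wise sum of the six stats
total_matches = sample[stat_cols].sum(axis=1) == sample["Total"]
print(total_matches.all())
```

Running the same comparison on the full `pkmn_data` frame would flag any row whose Total disagrees with its individual stats.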

Let us clean the variables a little by following standard naming conventions and replacing the ". " in our column names with an underscore. We will also drop the Mega Evolution rows (such as "VenusaurMega Venusaur" in the output above).

In [4]:
#Edit Variable Names
pkmn_data = pkmn_data.rename(columns={'Sp. Atk': 'Sp_Atk', 'Sp. Def': 'Sp_Def'})

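#Remove Mega Pokemon
#Match the literal "Mega " (with trailing space) used in names like "VenusaurMega Venusaur",
#so that ordinary Pokemon such as Meganium and Yanmega are not dropped by accident
pkmn_data = pkmn_data[~pkmn_data['Name'].str.contains('Mega ', regex=False)]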
print(pkmn_data)

       #                 Name   Type 1  Type 2  Total   HP  Attack  Defense  \
0      1            Bulbasaur    Grass  Poison    318   45      49       49   
1      2              Ivysaur    Grass  Poison    405   60      62       63   
2      3             Venusaur    Grass  Poison    525   80      82       83   
4      4           Charmander     Fire     NaN    309   39      52       43   
5      5           Charmeleon     Fire     NaN    405   58      64       58   
..   ...                  ...      ...     ...    ...  ...     ...      ...   
794  718     Zygarde50% Forme   Dragon  Ground    600  108     100      121   
795  719              Diancie     Rock   Fairy    600   50     100      150   
797  720  HoopaHoopa Confined  Psychic   Ghost    600   80     110       60   
798  720   HoopaHoopa Unbound  Psychic    Dark    680   80     160       60   
799  721            Volcanion     Fire   Water    600   80     110      120   

     Sp_Atk  Sp_Def  Speed  Generation  Legendary  
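
One thing to watch when filtering names by the substring "Mega": a case-insensitive match also catches ordinary Pokemon whose names merely contain those letters. The sketch below compares a loose and a stricter pattern on a handful of names. Meganium and Yanmega are real Pokemon from generations 2 and 4, but I am assuming (not verifying here) that they appear in the CSV under exactly these spellings.

```python
import pandas as pd

# Two names from the output above plus two non-Mega names (assumed spellings)
names = pd.Series(["Venusaur", "VenusaurMega Venusaur", "Meganium", "Yanmega"])

# Loose pattern: any "mega" substring, ignoring case
loose = names.str.contains("Mega", case=False)

# Stricter pattern: the literal "Mega " with a trailing space, case-sensitive
strict = names.str.contains("Mega ", regex=False)

removed_by_loose = names[loose].tolist()
removed_by_strict = names[strict].tolist()

print(removed_by_loose)
print(removed_by_strict)
```

Only the stricter pattern leaves Meganium and Yanmega in the data while still removing the Mega Evolution row.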

Next, we will define a color for each Pokemon type to use throughout our graphs.

In [5]:
#Defining Colors
colors = {
    "Bug": "#ABC206",
    "Dark": "#4A3A2F",
    "Dragon": "#5F21F6",
    "Electric": "#E7D711",
    "Fairy": "#EC83B7",
    "Fighting": "#C6231C",
    "Fire": "#F57C22",
    "Flying": "#A58CEB",
    "Ghost": "#684E8A",
    "Grass": "#76C945",
    "Ground": "#D6B55E",
    "Ice": "#9CE0DD",
    "Normal": "#B3B288",
    "Poison": "#B23BAF",
    "Psychic": "#FF467E",
    "Rock": "#B6A136",
    "Steel": "#C5C5D3",
    "Water": "#4C7CE2",
}

## Research Question #1
- What is the distribution of Pokemon typings across the series?
    - Which Pokemon types are most commonly used as secondary types?
    - What are the most common combinations of primary and secondary typings?

For my initial research question and first few visualizations, we will be exploring the distributions of Pokemon "types".
First, let's create a pie chart displaying what percentage of the total Pokemon dataset each primary type makes up:

In [28]:
#Primary Pokemon Type Pie Chart
pkmn_data_filtered = pkmn_data.dropna(subset=['Type 1'])

base = alt.Chart(pkmn_data_filtered).transform_aggregate(
    count='count()',
    groupby=['Type 1']
).transform_joinaggregate(
    total_count='sum(count)',
    groupby=[]
).transform_calculate(
    percentage='datum.count / datum.total_count',
).encode(
    alt.Theta('percentage:Q', stack=True),
    color=alt.Color('Type 1:N', legend=None, scale=alt.Scale(range=list(colors.values())))
)

pie = base.mark_arc(innerRadius=0)
text = base.mark_text(radius=200, size=12).encode(text='Type 1:N')
text2 = base.mark_text(radius=165, size=12).encode(text=alt.Text('percentage:Q', format='.1%'))

pie + text + text2

This is a pretty good distribution and visualization for primary typings! We display the statistics for the "Type 1" field only in this chart, and display them using percentages to assist the viewer in pinpointing exactly how much of the total each type represents. We could have used raw values, but that would not help the viewer understand the overall distribution; rather, it would require a greater mental load to compare each group against the others.
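
The percentages on the chart come from the aggregate, join-aggregate, and calculate transforms. For readers who want to check the numbers outside of Altair, a rough pandas equivalent is sketched here. The `type_1` series holds made-up values for illustration, not the real dataset.

```python
import pandas as pd

# Hypothetical primary types for eight Pokemon (illustrative values only)
type_1 = pd.Series(
    ["Grass", "Grass", "Grass", "Fire", "Water", "Water", "Bug", "Bug"],
    name="Type 1",
)

# normalize=True divides each count by the total, giving each type's share
share = type_1.value_counts(normalize=True).rename("percentage")

print(share.map("{:.1%}".format))
```

On the real data, `pkmn_data['Type 1'].value_counts(normalize=True)` should reproduce the labels shown on the pie chart.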

Let's create a second one for secondary typings as well. As not every pokemon has a secondary type, we will create a new dataset where the null secondary typings have been filtered out.

In [29]:
#Secondary Pokemon Type Pie Chart
pkmn_data_filtered = pkmn_data.dropna(subset=['Type 2'])

base = alt.Chart(pkmn_data_filtered).transform_aggregate(
    count='count()',
    groupby=['Type 2']
).transform_joinaggregate(
    total_count='sum(count)',
    groupby=[]
).transform_calculate(
    percentage='datum.count / datum.total_count',
).encode(
    alt.Theta('percentage:Q', stack=True),
    color=alt.Color('Type 2:N', legend=None, scale=alt.Scale(range=list(colors.values())))
)

pie = base.mark_arc(innerRadius=0)
text = base.mark_text(radius=200, size=12).encode(text='Type 2:N')
text2 = base.mark_text(radius=165, size=12).encode(text=alt.Text('percentage:Q', format='.1%'))

pie + text + text2

Interestingly, our pie chart above clearly demonstrates that Flying is the most frequent Type 2 in Pokemon by a large margin.

Originally, I put the raw data values on the pie charts above, but after receiving feedback from my reviewers, I changed them to percentages. Although raw data values may be simple enough to compare, the visualizations did not communicate how much of the total each type represented, and put pressure on the viewer to perform these comparisons themselves. Combined with the fact that humans struggle to compare slice sizes on a pie chart accurately (such as comparing the Normal vs Bug slices in the graph above), I believe showing percentages is the best approach for this visualization.

While this is pretty helpful for seeing how both primary and secondary types are distributed, it tells us nothing about how Type 1 and Type 2 are paired. To show this, let's create a heatmap, again ignoring any NA or null secondary typings.

In [36]:
# Replace NaN values in 'Type 1' and 'Type 2' with a placeholder (e.g., 'None')
# (assigning the result back avoids a chained in-place fillna on a single column)
pkmn_data = pkmn_data.fillna({'Type 1': 'None', 'Type 2': 'None'})

# Filter rows where 'Type 2' is not "None"
filtered_data = pkmn_data[pkmn_data['Type 2'] != 'None']

# Create a new DataFrame for counting frequencies
heatmap_data = filtered_data.groupby(['Type 1', 'Type 2']).size().reset_index(name='Frequency')

# Heatmap with frequency color bar
heatmap = alt.Chart(heatmap_data).mark_rect().encode(
    x=alt.X('Type 2:N', title='Type 2'),
    y=alt.Y('Type 1:N', title='Type 1'),
    color=alt.Color('Frequency:Q', scale=alt.Scale(scheme='viridis'), title='Frequency')
).properties(
    width=500,
    height=400,
    title='Pokemon Type 1 vs Type 2 Heatmap (Excluding No Secondary Type)'
)

heatmap

This heatmap highlights which Type 1/Type 2 combinations exist among the Pokemon released in generations 1-6. We can see that Normal/Flying is the most common type combination in the series, with Grass/Poison and Bug/Flying coming in second and third respectively. This heatmap also shows us that Flying is far and away the most likely secondary typing, as Pokemon of nearly every primary type exist with a Flying secondary typing. This supports our findings from the Type 2 pie chart above, where we learned that Flying is the most prevalent secondary typing in the series.
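
The rankings read off the heatmap can also be checked in a table. The sketch below counts type pairs and, for each secondary type, how many distinct primary types it is paired with. The `pairs` frame is a small made-up example, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical type pairs; None marks a Pokemon without a secondary type
pairs = pd.DataFrame(
    {
        "Type 1": ["Normal", "Normal", "Grass", "Bug", "Fire"],
        "Type 2": ["Flying", "Flying", "Poison", "Flying", None],
    }
)

# Keep only dual-type Pokemon
dual = pairs.dropna(subset=["Type 2"])

# Frequency of each (Type 1, Type 2) pair, most common first
pair_counts = (
    dual.groupby(["Type 1", "Type 2"])
    .size()
    .sort_values(ascending=False)
    .reset_index(name="Frequency")
)

# Number of distinct primary types each secondary type is paired with
secondary_coverage = dual.groupby("Type 2")["Type 1"].nunique()

print(pair_counts.head(3))
print(secondary_coverage)
```

Applied to the real data, `pair_counts.head(3)` would list the top three combinations, and `secondary_coverage` would show how widely each secondary type is spread across primary types.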

The heatmap and pie charts together give us a basic understanding of how Pokemon typings are distributed across the series, as well as which type combinations are most popular. However, we have yet to see which types of Pokemon were released in each generation. So let's explore that! We will do so with a stacked bar graph of Pokemon types per generation.

In [9]:
# Group by "Type 1" and calculate counts for each generation
type_counts = pkmn_data.groupby(['Generation', 'Type 1']).size().reset_index(name='Count')

# Create the stacked bar chart
Gen_bar_chart = alt.Chart(type_counts).mark_bar().encode(
    x='Generation:N',
    y='Count:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Generation:N', 'Type 1:N', 'Count:Q']
).properties(
    title='Pokémon Counts per Primary Type across Generations',
    width=600,
    height=400
)

# Show the stacked bar chart
Gen_bar_chart

Although this graph is dense, the use of tooltips greatly assists the user's understanding of the visualization. The graph itself shows how many Pokemon of each primary type were released per generation. For more in-depth statistics, users can hover over parts of the graph to display a pop-up box showing the primary type represented by that section of the bar, the generation, and the number of Pokemon with that primary type released within that generation.

This visualization went through a number of different forms before settling on a stacked bar. I initially tried a line graph to show how specific type counts changed per generation, but it was sloppy and quite difficult to read. I also created a graph where each primary type had a separate bar, and a dropdown menu enabled the user to switch generations. This was pretty good, but not terribly useful for comparing the differences between generations. After reviewing the visualization with my peers, I decided a stacked bar graph was the best way forward, even if initially it appears quite busy.



Now that we have some visualizations for the distribution of Pokemon types, let's explore Pokemon stats a bit!

Pokemon have a total of 6 stats: HP, Attack, Defense, Special Attack, Special Defense and Speed.
- HP: Hit Points, or Pokemon Health
- Attack: Used to calculate the damage dealt by physical moves.
- Defense: Used to calculate the damage taken from physical moves.
- Special Attack: Used to calculate the damage dealt by non-physical (special) moves.
- Special Defense: Used to calculate the damage taken from non-physical (special) moves.
- Speed: Used to determine the turn order in a Pokemon Battle.

First, let's create a bar chart of Pokemon types and the means of their respective stats to see how they compare against each other.

In [54]:
# Replace NaN values in 'Type 1' with a placeholder
pkmn_data = pkmn_data.fillna({'Type 1': 'None'})

# Calculate mean values
mean_values = pkmn_data.groupby('Type 1').agg({'HP': 'mean', 'Attack': 'mean', 'Defense': 'mean', 'Sp_Atk': 'mean', 'Sp_Def': 'mean', 'Speed': 'mean'}).reset_index()

# Sorting mean values by each stat
mean_values_HP = mean_values.sort_values(by=['HP'], ascending=[False])
mean_values_Atk = mean_values.sort_values(by=['Attack'], ascending=[False])
mean_values_Def = mean_values.sort_values(by=['Defense'], ascending=[False])
mean_values_SpAtk = mean_values.sort_values(by=['Sp_Atk'], ascending=[False])
mean_values_SpDef = mean_values.sort_values(by=['Sp_Def'], ascending=[False])
mean_values_Speed = mean_values.sort_values(by=['Speed'], ascending=[False])

# Selection
selection = alt.selection(type="multi", fields=["Type 1"])

# HP Chart
HP_Bar = alt.Chart(pkmn_data).mark_bar().encode(
    y=alt.Y("mean(HP):Q", title="Mean HP"),
    x=alt.X("Type 1:N", title="Primary Type", sort=mean_values_HP['Type 1'].tolist()),
    color=alt.Color("Type 1:N", legend=None, scale=alt.Scale(range=list(colors.values()))),
    tooltip=["Type 1:N", "mean(HP)"],
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(selection).properties(height=250, width=250)

# Attack Chart
Atk_Bar = alt.Chart(pkmn_data).mark_bar().encode(
    y=alt.Y("mean(Attack):Q", title="Mean Attack"),
    x=alt.X("Type 1:N", title="Primary Type", sort=mean_values_Atk['Type 1'].tolist()),
    color=alt.Color("Type 1:N", legend=None, scale=alt.Scale(range=list(colors.values()))),
    tooltip=["Type 1:N", "mean(Attack)"],
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(selection).properties(height=250, width=250)

# Defense Chart
Def_Bar = alt.Chart(pkmn_data).mark_bar().encode(
    y=alt.Y("mean(Defense):Q", title="Mean Defense"),
    x=alt.X("Type 1:N", title="Primary Type", sort=mean_values_Def['Type 1'].tolist()),
    color=alt.Color("Type 1:N", legend=None, scale=alt.Scale(range=list(colors.values()))),
    tooltip=["Type 1:N", "mean(Defense)"],
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(selection).properties(height=250, width=250)

# SpAtk Chart
SpAtk_Bar = alt.Chart(pkmn_data).mark_bar().encode(
    y=alt.Y("mean(Sp_Atk):Q", title="Mean Sp Atk"),
    x=alt.X("Type 1:N", title="Primary Type", sort=mean_values_SpAtk['Type 1'].tolist()),
    color=alt.Color("Type 1:N", legend=None, scale=alt.Scale(range=list(colors.values()))),
    tooltip=["Type 1:N", "mean(Sp_Atk)"],
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(selection).properties(height=250, width=250)

# SpDef Chart
SpDef_Bar = alt.Chart(pkmn_data).mark_bar().encode(
    y=alt.Y("mean(Sp_Def):Q", title="Mean Sp Def"),
    x=alt.X("Type 1:N", title="Primary Type", sort=mean_values_SpDef['Type 1'].tolist()),
    color=alt.Color("Type 1:N", legend=None, scale=alt.Scale(range=list(colors.values()))),
    tooltip=["Type 1:N", "mean(Sp_Def)"],
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(selection).properties(height=250, width=250)

# Speed Chart
Speed_Bar = alt.Chart(pkmn_data).mark_bar().encode(
    y=alt.Y("mean(Speed):Q", title="Mean Speed"),
    x=alt.X("Type 1:N", title="Primary Type", sort=mean_values_Speed['Type 1'].tolist()),
    color=alt.Color("Type 1:N", legend=None, scale=alt.Scale(range=list(colors.values()))),
    tooltip=["Type 1:N", "mean(Speed)"],
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(selection).properties(height=250, width=250)

# Combine charts vertically and horizontally
charts = alt.vconcat(
    alt.hconcat(HP_Bar, Atk_Bar, Def_Bar),
    alt.hconcat(SpAtk_Bar, SpDef_Bar, Speed_Bar)
)

charts

Excellent! These graphs, especially displayed side by side, do a fantastic job of showing which primary types of Pokemon excel in specific stats.

Initially, the Pokemon primary types were listed alphabetically from left to right along the x-axis. Following feedback from my reviewers, each x-axis is now sorted in descending order of the mean value. As each colored bar is labeled on the x-axis, a separate legend is unnecessary, and the shift from alphabetical order to descending order makes the charts more visually pleasing and the findings easier to read.

Although I am satisfied with how these visualizations appear, I would like to explore this same data in Box and Whisker charts. 
**Note:** Altair had some issues placing the Box and Whisker charts side by side, which is why they are displayed vertically.

In [73]:
# Replace NaN values in 'Type 1' with a placeholder (e.g., 'None')
pkmn_data = pkmn_data.fillna({'Type 1': 'None'})

# Calculate mean values
mean_values = pkmn_data.groupby('Type 1').agg({'HP': 'mean', 'Attack': 'mean', 'Defense': 'mean', 'Sp_Atk': 'mean', 'Sp_Def': 'mean', 'Speed': 'mean'}).reset_index()

# Sorting mean values by each stat
mean_values_HP = mean_values.sort_values(by=['HP'], ascending=[False])
mean_values_Atk = mean_values.sort_values(by=['Attack'], ascending=[False])
mean_values_Def = mean_values.sort_values(by=['Defense'], ascending=[False])
mean_values_SpAtk = mean_values.sort_values(by=['Sp_Atk'], ascending=[False])
mean_values_SpDef = mean_values.sort_values(by=['Sp_Def'], ascending=[False])
mean_values_Speed = mean_values.sort_values(by=['Speed'], ascending=[False])

# Selections
selection_HP = alt.selection(type="multi", fields=["Type 1"], name="selection_HP")
selection_Atk = alt.selection(type="multi", fields=["Type 1"], name="selection_Atk")
selection_Def = alt.selection(type="multi", fields=["Type 1"], name="selection_Def")
selection_SpAtk = alt.selection(type="multi", fields=["Type 1"], name="selection_SpAtk")
selection_SpDef = alt.selection(type="multi", fields=["Type 1"], name="selection_SpDef")
selection_Speed = alt.selection(type="multi", fields=["Type 1"], name="selection_Speed")

# Create individual boxplots
box_HP = alt.Chart(pkmn_data).mark_boxplot().encode(
    x=alt.X('Type 1:N', title='Primary Type', sort=mean_values_HP['Type 1'].tolist()),
    y=alt.Y('HP:Q', title='HP'),
    color=alt.Color("Type 1", legend=None, scale=alt.Scale(range=list(colors.values()))),
).add_selection(selection_HP).properties(
    title='Box and Whisker Plot of Pokemon HP by Type',
    height=250,
    width=250
)

box_Atk = alt.Chart(pkmn_data).mark_boxplot().encode(
    x=alt.X('Type 1:N', title='Primary Type', sort=mean_values_Atk['Type 1'].tolist()),
    y=alt.Y('Attack:Q', title='Attack'),
    color=alt.Color("Type 1", legend=None, scale=alt.Scale(range=list(colors.values()))),
).add_selection(selection_Atk).properties(
    title='Box and Whisker Plot of Pokemon Attack by Type',
    height=250,
    width=250
)

box_Def = alt.Chart(pkmn_data).mark_boxplot().encode(
    x=alt.X('Type 1:N', title='Primary Type', sort=mean_values_Def['Type 1'].tolist()),
    y=alt.Y('Defense:Q', title='Defense'),
    color=alt.Color("Type 1", legend=None, scale=alt.Scale(range=list(colors.values()))),
).add_selection(selection_Def).properties(
    title='Box and Whisker Plot of Pokemon Defense by Type',
    height=250,
    width=250
)

box_SpAtk = alt.Chart(pkmn_data).mark_boxplot().encode(
    x=alt.X('Type 1:N', title='Primary Type', sort=mean_values_SpAtk['Type 1'].tolist()),
    y=alt.Y('Sp_Atk:Q', title='Sp. Atk'),
    color=alt.Color("Type 1", legend=None, scale=alt.Scale(range=list(colors.values()))),
).add_selection(selection_SpAtk).properties(
    title='Box and Whisker Plot of Pokemon Sp.Atk by Type',
    height=250,
    width=250
)

box_SpDef = alt.Chart(pkmn_data).mark_boxplot().encode(
    x=alt.X('Type 1:N', title='Primary Type', sort=mean_values_SpDef['Type 1'].tolist()),
    y=alt.Y('Sp_Def:Q', title='Sp. Def'),
    color=alt.Color("Type 1", legend=None, scale=alt.Scale(range=list(colors.values()))),
).add_selection(selection_SpDef).properties(
    title='Box and Whisker Plot of Pokemon Sp.Def by Type',
    height=250,
    width=250
)

box_Speed = alt.Chart(pkmn_data).mark_boxplot().encode(
    x=alt.X('Type 1:N', title='Primary Type', sort=mean_values_Speed['Type 1'].tolist()),
    y=alt.Y('Speed:Q', title='Speed'),
    color=alt.Color("Type 1", legend=None, scale=alt.Scale(range=list(colors.values()))),
).add_selection(selection_Speed).properties(
    title='Box and Whisker Plot of Pokemon Speed by Type',
    height=250,
    width=250
)

# Display the charts
box_HP.display()
box_Atk.display()
box_Def.display()
box_SpAtk.display()
box_SpDef.display()
box_Speed.display()


These box and whisker charts are better than the bar charts for showing how each stat is distributed within each type. For example, the Speed chart shows us that Flying type Pokemon have the highest average speed with very few outliers. Compare this with Psychic types on the same graph, which have the 4th highest mean speed but many outliers.
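
To put a number on "few" versus "many" outliers, the points drawn beyond the whiskers can be counted directly. The sketch below applies the usual 1.5 × IQR rule, which I believe matches the default whisker extent of Altair's `mark_boxplot`. The `speed` frame contains made-up values, and `count_outliers` is a helper name of my own.

```python
import pandas as pd

# Hypothetical Speed values for two types (illustrative only)
speed = pd.DataFrame(
    {
        "Type 1": ["Psychic"] * 6 + ["Flying"] * 4,
        "Speed": [40, 45, 50, 55, 60, 150, 85, 90, 95, 100],
    }
)


def count_outliers(values):
    """Count values lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    outside = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    return int(outside.sum())


outliers = speed.groupby("Type 1")["Speed"].apply(count_outliers)
print(outliers)
```

In this toy example only the Psychic value of 150 falls outside the fences; the same function applied to `pkmn_data` would give the outlier count per type for any stat.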

Originally, I also had these Box and Whisker charts with the types listed alphabetically along the x-axis. After my reviewers submitted their feedback, I decided to have these match the bar charts in the order in which they displayed types.

With these graphs, we've finalized our visualizations of different stats by pokemon type. Let's now return to our analysis of the changes made across generations, by making a line graph showing how stats changed across each generation of Pokemon releases.

In [34]:
#First, we need to build a function to group by
def alt_stats_by(classifier):
    stats_names = ['HP', 'Attack', 'Defense', 'Sp_Atk', 'Sp_Def', 'Speed']
    
    # Group by classifier and calculate mean
    stats_mean = pkmn_data.groupby(classifier)[stats_names].mean().reset_index()


    # Melt the dataframe for Altair's long-form data requirement
    melted_stats_mean = stats_mean.melt(id_vars=[classifier], value_vars=stats_names, var_name='Stat', value_name='MeanValue')


    # Create our chart
    chart = alt.Chart(melted_stats_mean).mark_line(point=True).encode(
        x=alt.X(f'{classifier}:N', title=f'{classifier}'),
        y=alt.Y('MeanValue:Q', title='Mean Values', scale=alt.Scale(domain=[60,85])),
        color='Stat:N',
        tooltip=[alt.Tooltip(f'{classifier}:N', title=f'{classifier}'), 'Stat:N', 'MeanValue:Q']
    ).properties(
        title=f'Mean Trend of Stats by {classifier}',
        width=600,
        height=400
    )

    return chart

# Visualize trend of stats by Generation
alt_stats_by('Generation')

This is actually extremely interesting! It seems that, aside from a few dips, every stat received general buffs across the series. "Power creep" is a term thrown around in video game circles to describe how newer releases continuously up the ante so that their content is more attractive to fans seeking the "most powerful" Pokemon. We see this the most in the Attack stat, which climbs as it approaches generation 5 and then falls back down.
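
The `melt` step inside `alt_stats_by` is what turns one column per stat into one row per (generation, stat) pair, which is the long-form layout Altair expects when a field such as `Stat` drives the line color. Here is a tiny sketch of that reshaping. The means in `wide` are made-up numbers, not values from the dataset.

```python
import pandas as pd

# Hypothetical per-generation means in wide form: one column per stat
wide = pd.DataFrame(
    {
        "Generation": [1, 2],
        "HP": [65.0, 70.0],
        "Attack": [73.0, 68.0],
    }
)

# Long form: one row per (Generation, Stat) pair
long_form = wide.melt(
    id_vars=["Generation"],
    value_vars=["HP", "Attack"],
    var_name="Stat",
    value_name="MeanValue",
)

print(long_form)
```

Two generations and two stats give four rows, each carrying a single mean value that becomes one point on the line chart.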

Finally, I'd like to present 36 scatter plots which compare each of the Pokemon stats against each other.

In [35]:
#HP Graphs
HP_HP = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='HP:Q',
    y='HP:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'HP:Q', 'HP:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of HP vs HP Scores',
    width=200,
    height=200
)

HP_Atk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='HP:Q',
    y='Attack:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'HP:Q', 'Attack:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of HP vs Attack Scores',
    width=200,
    height=200
)

HP_Def = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='HP:Q',
    y='Defense:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'HP:Q', 'Defense:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of HP vs Defense Scores',
    width=200,
    height=200
)

HP_SpAtk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='HP:Q',
    y='Sp_Atk:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'HP:Q', 'Sp_Atk:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of HP vs Sp. Atk Scores',
    width=200,
    height=200 
)

HP_SpDef = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='HP:Q',
    y='Sp_Def:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'HP:Q', 'Sp_Def:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of HP vs Sp. Def Scores',
    width=200,
    height=200
)

HP_Speed = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='HP:Q',
    y='Speed:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'HP:Q', 'Speed:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of HP vs Speed Scores',
    width=200,
    height=200
)

#Atk Graphs
Atk_HP = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Attack:Q',
    y='HP:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Attack:Q', 'HP:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Atk vs HP Scores',
    width=200,
    height=200
)

Atk_Atk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Attack:Q',
    y='Attack:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Attack:Q', 'Attack:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Attack vs Attack Scores',
    width=200,
    height=200
)

Atk_Def = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Attack:Q',
    y='Defense:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Attack:Q', 'Defense:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Attack vs Defense Scores',
    width=200,
    height=200
)

Atk_SpAtk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Attack:Q',
    y='Sp_Atk:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Attack:Q', 'Sp_Atk:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Attack vs Sp. Atk Scores',
    width=200,
    height=200 
)

Atk_SpDef = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Attack:Q',
    y='Sp_Def:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Attack:Q', 'Sp_Def:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Attack vs Sp. Def Scores',
    width=200,
    height=200
)

Atk_Speed = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Attack:Q',
    y='Speed:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Attack:Q', 'Speed:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Attack vs Speed Scores',
    width=200,
    height=200
)

#Defense Graphs
Def_HP = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Defense:Q',
    y='HP:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Defense:Q', 'HP:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Defense vs HP Scores',
    width=200,
    height=200
)

Def_Atk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Defense:Q',
    y='Attack:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Defense:Q', 'Attack:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Defense vs Attack Scores',
    width=200,
    height=200
)

Def_Def = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Defense:Q',
    y='Defense:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Defense:Q', 'Defense:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Defense vs Defense Scores',
    width=200,
    height=200
)

Def_SpAtk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Defense:Q',
    y='Sp_Atk:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Defense:Q', 'Sp_Atk:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Defense vs Sp. Atk Scores',
    width=200,
    height=200 
)

Def_SpDef = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Defense:Q',
    y='Sp_Def:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Defense:Q', 'Sp_Def:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Defense vs Sp. Def Scores',
    width=200,
    height=200
)

Def_Speed = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Defense:Q',
    y='Speed:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Defense:Q', 'Speed:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Defense vs Speed Scores',
    width=200,
    height=200
)

# Sp. Atk graphs: Sp. Atk on the x-axis against each of the six stats
SpAtk_HP = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Atk:Q',
    y='HP:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Atk:Q', 'HP:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Sp. Atk vs HP Scores',
    width=200,
    height=200
)

SpAtk_Atk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Atk:Q',
    y='Attack:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Atk:Q', 'Attack:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Sp. Atk vs Attack Scores',
    width=200,
    height=200
)

SpAtk_Def = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Atk:Q',
    y='Defense:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Atk:Q', 'Defense:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Sp. Atk vs Defense Scores',
    width=200,
    height=200
)

SpAtk_SpAtk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Atk:Q',
    y='Sp_Atk:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Atk:Q', 'Type 1:N']  # same stat on both axes, so list it once
).properties(
    title='Scatter Plot of Sp. Atk vs Sp. Atk Scores',
    width=200,
    height=200 
)

SpAtk_SpDef = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Atk:Q',
    y='Sp_Def:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Atk:Q', 'Sp_Def:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Sp. Atk vs Sp. Def Scores',
    width=200,
    height=200
)

SpAtk_Speed = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Atk:Q',
    y='Speed:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Atk:Q', 'Speed:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Sp. Atk vs Speed Scores',
    width=200,
    height=200
)

# Sp. Def graphs: Sp. Def on the x-axis against each of the six stats
SpDef_HP = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Def:Q',
    y='HP:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Def:Q', 'HP:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Sp. Def vs HP Scores',
    width=200,
    height=200
)

SpDef_Atk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Def:Q',
    y='Attack:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Def:Q', 'Attack:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Sp. Def vs Attack Scores',
    width=200,
    height=200
)

SpDef_Def = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Def:Q',
    y='Defense:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Def:Q', 'Defense:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Sp. Def vs Defense Scores',
    width=200,
    height=200
)

SpDef_SpAtk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Def:Q',
    y='Sp_Atk:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Def:Q', 'Sp_Atk:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Sp. Def vs Sp. Atk Scores',
    width=200,
    height=200 
)

SpDef_SpDef = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Def:Q',
    y='Sp_Def:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Def:Q', 'Type 1:N']  # same stat on both axes, so list it once
).properties(
    title='Scatter Plot of Sp. Def vs Sp. Def Scores',
    width=200,
    height=200
)

SpDef_Speed = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Sp_Def:Q',
    y='Speed:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Sp_Def:Q', 'Speed:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Sp. Def vs Speed Scores',
    width=200,
    height=200
)

# Speed graphs: Speed on the x-axis against each of the six stats
Speed_HP = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Speed:Q',
    y='HP:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Speed:Q', 'HP:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Speed vs HP Scores',
    width=200,
    height=200
)

Speed_Atk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Speed:Q',
    y='Attack:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Speed:Q', 'Attack:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Speed vs Attack Scores',
    width=200,
    height=200
)

Speed_Def = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Speed:Q',
    y='Defense:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Speed:Q', 'Defense:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Speed vs Defense Scores',
    width=200,
    height=200
)

Speed_SpAtk = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Speed:Q',
    y='Sp_Atk:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Speed:Q', 'Sp_Atk:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Speed vs Sp. Atk Scores',
    width=200,
    height=200 
)

Speed_SpDef = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Speed:Q',
    y='Sp_Def:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Speed:Q', 'Sp_Def:Q', 'Type 1:N']
).properties(
    title='Scatter Plot of Speed vs Sp. Def Scores',
    width=200,
    height=200
)

Speed_Speed = alt.Chart(pkmn_data).mark_circle(size=60).encode(
    x='Speed:Q',
    y='Speed:Q',
    color=alt.Color("Type 1", scale=alt.Scale(range=list(colors.values()))),
    tooltip=['Name:N', 'Speed:Q', 'Type 1:N']  # same stat on both axes, so list it once
).properties(
    title='Scatter Plot of Speed vs Speed Scores',
    width=200,
    height=200
)


# Assemble the 6x6 grid: "|" places charts side by side, "&" stacks the rows
(
    (HP_HP | HP_Atk | HP_Def | HP_SpAtk | HP_SpDef | HP_Speed)
    & (Atk_HP | Atk_Atk | Atk_Def | Atk_SpAtk | Atk_SpDef | Atk_Speed)
    & (Def_HP | Def_Atk | Def_Def | Def_SpAtk | Def_SpDef | Def_Speed)
    & (SpAtk_HP | SpAtk_Atk | SpAtk_Def | SpAtk_SpAtk | SpAtk_SpDef | SpAtk_Speed)
    & (SpDef_HP | SpDef_Atk | SpDef_Def | SpDef_SpAtk | SpDef_SpDef | SpDef_Speed)
    & (Speed_HP | Speed_Atk | Speed_Def | Speed_SpAtk | Speed_SpDef | Speed_Speed)
)

The graphs above contain a lot of information: they plot each of the 6 Pokemon stats against every other stat, including against itself, which is why the charts on the diagonal appear as perfectly straight lines. The graphs also contain tooltips, so viewers can hover over any point on a scatter plot and see which Pokemon it represents, the values of the two stats shown on that graph's axes, and the Pokemon's primary type.

## Conclusion

### I had a lot of fun making these visualizations, and I hope you enjoyed viewing them!

To reiterate, my research questions were:
- What is the distribution of Pokemon typings across the series?
- Which types of Pokemon have the highest average of specific stats?
- Is there a correlation between any two specific stats in Pokemon? For example, are Pokemon with high speed more likely to have high defense or offense?

Following the DTSA 5304 course curriculum, I answered a set of framing questions for each of these research questions, listed below:
- 1. What is the distribution of Pokemon typings across the series?
* Why is a task pursued? (goal)
    * This task is being pursued in order to gain a better understanding of which types of Pokemon are most prevalent throughout the video game series, and which are the least common. I would also like to see which types are most common in the secondary typing category of the data. 
* How is a task conducted? (means)
    * This will be done through a 2D pie chart, with statistics posted as percentages of the total number of Pokemon rather than raw values. I also think it would be interesting to see what common primary/secondary typings exist. 
* What does a task seek to learn about the data? (characteristics)
    * This task seeks to answer the question of which types of Pokemon are the most common, and which types are most prevalent in the secondary typing field. 
* Where does the task operate? (target data)
* When is the task performed? (workflow)
    * As this is the first research question, these visualizations will be built after loading and cleaning the dataset. 
* Who is executing the task? (roles)
    * Ryan Dadmun 

- 2. Which types of Pokemon have the highest average of specific stats? Do stat averages change across the releases within the Pokemon series?
* Why is a task pursued? (goal)
    * This task is pursued to gain a better understanding of how different types of Pokemon have their stats distributed. For example, are ghost type Pokemon more likely to have high offensive or defensive stats? Additionally, have the averages of these stats changed across series releases?
* How is a task conducted? (means)
    * We will be creating bar charts displaying the mean value of a given stat per type of Pokemon, and a line chart showing overall stat averages per generation.
* What does a task seek to learn about the data? (characteristics)
    * The aim of this visualization is to show which types of Pokemon are the best in category per stat, and where certain types of Pokemon are lacking specific stats.
* Where does the task operate? (target data)
    * This task utilizes all Pokemon statistics (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed), as well as the primary Pokemon type. For our second analysis in the above questions, we will be replacing Pokemon type with “Generation”.
* When is the task performed? (workflow)
    * These visualizations will be created second, following the visualizations created for the initial research question. 
* Who is executing the task? (roles)
    * Ryan Dadmun

- 3. Is there a correlation between any two specific stats in Pokemon? For example, are Pokemon with high speed more likely to have high defense or offense?
* Why is a task pursued? (goal)
    * This task is pursued to understand whether any two statistics are correlated in the design of Pokemon.
* How is a task conducted? (means)
    * This task will be completed by creating 36 scatter plots, each placing one of the six stats on the x-axis and one on the y-axis. Six of these charts should show a straight line with a slope of 1, as they compare the same stat on both axes.
* What does a task seek to learn about the data? (characteristics)
    * This task seeks to answer the question “is there anything in the design philosophy of Pokemon that makes two stats correlate when Pokemon stats are created?”
* Where does the task operate? (target data)
    * This visualization will use the Pokemon stat fields, which are HP, Attack, Defense, Sp. Atk, Sp. Def, Speed.
* When is the task performed? (workflow)
    * After the visualizations for the other research questions have been created.
* Who is executing the task? (roles)
    * Ryan Dadmun
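As a numeric companion to the scatter plots for the third research question, a correlation table could be computed directly from the stat columns. This is a hedged sketch rather than part of the original analysis: `sample` is a three-row stand-in for `pkmn_data` (far too small to draw conclusions from), and the column names are assumed to match those used in the plotting code.

```python
import pandas as pd

# Three-row stand-in for pkmn_data so the sketch runs on its own.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "HP": [45, 39, 44],
    "Attack": [49, 52, 48],
    "Defense": [49, 43, 65],
    "Sp_Atk": [65, 60, 50],
    "Sp_Def": [65, 50, 64],
    "Speed": [45, 65, 43],
})

stats = ["HP", "Attack", "Defense", "Sp_Atk", "Sp_Def", "Speed"]

# Pearson correlation between every pair of stat columns (a 6x6 table).
# Values near +1 or -1 suggest a strong linear relationship; near 0, little.
corr = sample[stats].corr(method="pearson")

print(corr.round(2))
```

Run against the full dataset, the diagonal of this table corresponds to the straight-line panels in the scatter grid, and the off-diagonal values give a single number per pair of stats to compare against what the plots show.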


### Additionally, we were asked to find individuals, hopefully some with data visualization experience, to review our workbooks:
* The target question you want to answer:
    * Through these visualizations, were you able to get a better understanding of how frequent the different types of Pokemon are through the games? Were you able to understand which types are most commonly used as secondary types? Were you able to see which types had the highest average of each of a Pokemon’s stats?
* The people you would recruit to answer that question:
    * I have recruited an Architecture PhD student at the University of Utah; a friend who builds data visualizations for MedStar, a DC-based healthcare company; a Materials Engineering PhD student at the University of Tennessee; and my mother, a medical start-up quality assurance officer.
* The kinds of measures you would use to answer your data (e.g., insight depth, use cases, accuracy) and what these measures would tell you about the core question:
    * Primarily, I will be relying on insight depth to measure the success of my visualization. This visualization is meant to bring some of the insights within the Pokemon dataset to the user quickly, as opposed to ensuring accuracy or demanding usability metrics.
* The approach you will use to answer that question (e.g., a journaling study, a formal experiment, etc.)
    * I will be using an Exploratory Data Analysis approach in order to assist in the identification of patterns, trends and outliers in the dataset.
* How you would instantiate those methods (i.e., what would your participants do?)
    * Participants should explore my data visualizations, then return and answer the series of questions asked in the initial section of this post, and explain why they were or were not successful in answering them after being presented with the visualizations.
* What criteria would you use to indicate that your visualization was successful?
    * The visualization is successful if my users are able to read my visualizations and gather the answers to my research questions from the visualizations alone.



## I am extremely grateful to the individuals who helped me by reviewing this workbook, and it would be a lesser product without their discussion, insight and input. Through their help, I believe I have created a strong visual exploration of the Pokemon generation 1-6 dataset. 

RJD
