1. Clean the files and combine them into one final DataFrame.

This DataFrame should have the following columns:
- Hero (just the name of the hero)
- Publisher
- Gender
- Eye color
- Race
- Hair color
- Height (numeric)
- Skin color
- Alignment
- Weight (numeric)
- Plus, one-hot-encoded columns for every power that appears in the dataset. E.g.:
    - Agility
    - Flight
    - Superspeed
    - etc.
Hint: There is a space in "100 kg" or "52.5 cm"

## Loading Data from Part 1

In [1]:
## Plotly is not included in your dojo-env
!pip install plotly



In [2]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import json

## importing plotly 
import plotly.express as px

In [3]:
## Load the superhero info CSV
df = pd.read_csv('Data/superhero_info - superhero_info.csv')
df.head()

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}"
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}"
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,"{'Height': '193.0 cm', 'Weight': '122.0 kg'}"


In [58]:
## Load the superhero powers CSV
df2 = pd.read_csv('Data/superhero_powers - superhero_powers.csv')
df2.head()

Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."


## Preprocessing

- 1. We need to split the height and weight into separate columns.
- 2. We need to split the Hero and Publisher into separate columns.

In [5]:
test_meas = df.loc[1, 'Measurements']
test_meas

"{'Height': '191.0 cm', 'Weight': '65.0 kg'}"

In [6]:
type(test_meas)

str

## Fixing the String-Dictionaries

In [7]:
## REPLACE single ' with double " 
test_meas = test_meas.replace("'", '"')
test_meas

'{"Height": "191.0 cm", "Weight": "65.0 kg"}'

In [8]:
json.loads(test_meas)

{'Height': '191.0 cm', 'Weight': '65.0 kg'}

In [9]:
# viewing type after using json.loads
# NOW IT'S A DICTIONARY
type(json.loads(test_meas))

dict

In [10]:
## replace ' with " (entire column)
df['Measurements'] = df['Measurements'].str.replace("'", '"')

## apply json.loads
df['Measurements'] = df['Measurements'].apply(json.loads)
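
The quote-swapping step above works because none of the measurement strings contain an apostrophe. As a hedged alternative, the sketch below parses the same kind of string with `ast.literal_eval` from the standard library, which reads Python-literal syntax directly. The `toy` frame is an assumption for illustration; its two rows copy values shown in the outputs above.

```python
import ast
import pandas as pd

# Toy rows in the same shape as the 'Measurements' column
toy = pd.DataFrame({
    "Measurements": [
        "{'Height': '191.0 cm', 'Weight': '65.0 kg'}",
        "{'Height': '185.0 cm', 'Weight': '90.0 kg'}",
    ]
})

# literal_eval accepts single-quoted keys and values, so no replace() is needed
parsed = toy["Measurements"].apply(ast.literal_eval)

print(type(parsed.loc[0]))
print(parsed.loc[0]["Height"])
```

Either route ends with a column of real dictionaries; `json.loads` is the stricter of the two because it only accepts double-quoted JSON.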

In [11]:
## slice out a single test measurement
test_meas = df.loc[5, 'Measurements']
test_meas

{'Height': '185.0 cm', 'Weight': '88.0 kg'}

In [12]:
#FOR THE ENTIRE COLUMN, IT'S AN ACTUAL DICTIONARY
type(test_meas)

dict

## Using .apply with pd.Series to convert a dictionary column into multiple columns

In [13]:
## use .apply pd.Series to convert a dict to columns
df['Measurements'].apply(pd.Series)

Unnamed: 0,Height,Weight
0,203.0 cm,441.0 kg
1,191.0 cm,65.0 kg
2,185.0 cm,90.0 kg
3,203.0 cm,441.0 kg
4,193.0 cm,122.0 kg
...,...,...
458,183.0 cm,83.0 kg
459,165.0 cm,52.0 kg
460,66.0 cm,17.0 kg
461,170.0 cm,57.0 kg


In [14]:
## Concatenate the 2 new columns and drop the original.
df = pd.concat([df, df['Measurements'].apply(pd.Series)], axis = 1)
#drop 
df = df.drop(columns = 'Measurements')
df.head(2)

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Height,Weight
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,203.0 cm,441.0 kg
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,191.0 cm,65.0 kg
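
`.apply(pd.Series)` builds one Series per row, which can be slow on large frames. A minimal sketch of an alternative, assuming every dictionary has the same keys: build the new columns in one call from the list of dictionaries and join them back. The `toy` frame and the names `expanded` and `result` are illustrative, not part of the notebook.

```python
import pandas as pd

# Toy frame with a column of dictionaries, mirroring 'Measurements' after json.loads
toy = pd.DataFrame({
    "Hero": ["A-Bomb", "Abe Sapien"],
    "Measurements": [
        {"Height": "203.0 cm", "Weight": "441.0 kg"},
        {"Height": "191.0 cm", "Weight": "65.0 kg"},
    ],
})

# Build the new columns in one step, reusing the original index so rows stay aligned
expanded = pd.DataFrame(toy["Measurements"].tolist(), index=toy.index)

# Drop the dictionary column and attach the expanded columns
result = toy.drop(columns="Measurements").join(expanded)
print(result)
```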


In [15]:
# Split the "Hero|Publisher" column
df[['Hero', 'Publisher']] = df['Hero|Publisher'].str.split('|', expand=True)

# Print the resulting DataFrame
df.head()

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Height,Weight,Hero,Publisher
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,203.0 cm,441.0 kg,A-Bomb,Marvel Comics
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,191.0 cm,65.0 kg,Abe Sapien,Dark Horse Comics
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,185.0 cm,90.0 kg,Abin Sur,DC Comics
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,203.0 cm,441.0 kg,Abomination,Marvel Comics
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,193.0 cm,122.0 kg,Absorbing Man,Marvel Comics


In [16]:
#Remove the original column
df.drop('Hero|Publisher', axis=1, inplace=True)

# Print the resulting DataFrame
df.head()

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Height,Weight,Hero,Publisher
0,Male,Human,good,No Hair,yellow,Unknown,203.0 cm,441.0 kg,A-Bomb,Marvel Comics
1,Male,Icthyo Sapien,good,No Hair,blue,blue,191.0 cm,65.0 kg,Abe Sapien,Dark Horse Comics
2,Male,Ungaran,good,No Hair,blue,red,185.0 cm,90.0 kg,Abin Sur,DC Comics
3,Male,Human / Radiation,bad,No Hair,green,Unknown,203.0 cm,441.0 kg,Abomination,Marvel Comics
4,Male,Human,bad,No Hair,blue,Unknown,193.0 cm,122.0 kg,Absorbing Man,Marvel Comics
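
The split above assumes each value contains exactly one `|`. If a hero name or publisher ever contained an extra pipe, `expand=True` would produce more than two columns and the assignment would fail. A small sketch of a more defensive variant, using `n=1` to split at the first pipe only (the `toy` frame is illustrative):

```python
import pandas as pd

toy = pd.DataFrame({"Hero|Publisher": ["A-Bomb|Marvel Comics", "Abin Sur|DC Comics"]})

# n=1 limits the split to the first '|', so the result always has two columns
toy[["Hero", "Publisher"]] = toy["Hero|Publisher"].str.split("|", n=1, expand=True)

# The combined column is no longer needed
toy = toy.drop(columns="Hero|Publisher")
print(toy)
```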


## Height to numeric

In [17]:
# Assuming you have a DataFrame called 'df' with a column named 'Height'
df['Height'] = df['Height'].str.replace(' cm', '')  # Remove " cm"


In [18]:
df['Height'] = pd.to_numeric(df['Height'])  # Convert to numeric
# Print the updated DataFrame
df.head()

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Height,Weight,Hero,Publisher
0,Male,Human,good,No Hair,yellow,Unknown,203.0,441.0 kg,A-Bomb,Marvel Comics
1,Male,Icthyo Sapien,good,No Hair,blue,blue,191.0,65.0 kg,Abe Sapien,Dark Horse Comics
2,Male,Ungaran,good,No Hair,blue,red,185.0,90.0 kg,Abin Sur,DC Comics
3,Male,Human / Radiation,bad,No Hair,green,Unknown,203.0,441.0 kg,Abomination,Marvel Comics
4,Male,Human,bad,No Hair,blue,Unknown,193.0,122.0 kg,Absorbing Man,Marvel Comics


In [19]:
#Ensure type is numeric 
df['Height'].dtype

dtype('float64')

## Weight to numeric

In [20]:
# Assuming you have a DataFrame called 'df' with a column named 'Weight'
df['Weight'] = df['Weight'].str.replace(' kg', '')  # Remove " kg"

In [21]:
df['Weight'] = pd.to_numeric(df['Weight'])  # Convert to numeric
# Print the updated DataFrame
df.head()

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Height,Weight,Hero,Publisher
0,Male,Human,good,No Hair,yellow,Unknown,203.0,441.0,A-Bomb,Marvel Comics
1,Male,Icthyo Sapien,good,No Hair,blue,blue,191.0,65.0,Abe Sapien,Dark Horse Comics
2,Male,Ungaran,good,No Hair,blue,red,185.0,90.0,Abin Sur,DC Comics
3,Male,Human / Radiation,bad,No Hair,green,Unknown,203.0,441.0,Abomination,Marvel Comics
4,Male,Human,bad,No Hair,blue,Unknown,193.0,122.0,Absorbing Man,Marvel Comics


In [22]:
#Ensure type is numeric 
df['Weight'].dtype

dtype('float64')
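
The Height and Weight conversions follow the same pattern, so they could share one helper. The sketch below uses the hint from the task (there is a space between the number and the unit) and keeps only the text before that space. `strip_unit` is a hypothetical helper name, and `errors="coerce"` is an added safety choice that turns unparseable values into NaN instead of raising.

```python
import pandas as pd

def strip_unit(series: pd.Series) -> pd.Series:
    """Keep the text before the first space and convert it to a number."""
    number_part = series.str.split(" ").str[0]
    # errors="coerce" turns anything unparseable into NaN instead of raising
    return pd.to_numeric(number_part, errors="coerce")

# Toy frame using values shown in the outputs above
toy = pd.DataFrame({
    "Height": ["203.0 cm", "191.0 cm"],
    "Weight": ["441.0 kg", "65.0 kg"],
})

for col in ["Height", "Weight"]:
    toy[col] = strip_unit(toy[col])

print(toy.dtypes)
```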

In [23]:
df = df[['Hero', 'Publisher', 'Gender', 'Eye color', 'Race', 'Hair color', 'Height', 'Skin color', 'Alignment', 'Weight']]
df.head()

Unnamed: 0,Hero,Publisher,Gender,Eye color,Race,Hair color,Height,Skin color,Alignment,Weight
0,A-Bomb,Marvel Comics,Male,yellow,Human,No Hair,203.0,Unknown,good,441.0
1,Abe Sapien,Dark Horse Comics,Male,blue,Icthyo Sapien,No Hair,191.0,blue,good,65.0
2,Abin Sur,DC Comics,Male,blue,Ungaran,No Hair,185.0,red,good,90.0
3,Abomination,Marvel Comics,Male,green,Human / Radiation,No Hair,203.0,Unknown,bad,441.0
4,Absorbing Man,Marvel Comics,Male,blue,Human,No Hair,193.0,Unknown,bad,122.0


## Separate df2 Powers by comma

In [59]:
df2.head()

Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."


In [60]:
# Step 1: Create a list of all unique powers
all_powers = set()
for powers in df2["Powers"]:
    all_powers.update(powers.split(','))

In [61]:
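# Step 2: Create binary columns for each power
# Compare against the split list so a power only matches as a whole name,
# not as a substring of a longer power name
for power in all_powers:
    df2[power] = df2["Powers"].apply(lambda x: 1 if power in x.split(',') else 0)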


In [62]:
df2

Unnamed: 0,hero_names,Powers,Molecular Manipulation,Audio Control,Enhanced Senses,Animal Control,Sub-Mariner,Vision - Cryo,Toxin and Disease Control,Force Fields,...,Invisibility,Telekinesis,Projection,Jump,Animal Oriented Powers,Heat Resistance,Substance Secretion,Agility,Probability Manipulation,Vision - X-Ray
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super...",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du...",0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,Abin Sur,Lantern Power Ring,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt...",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
662,Yellowjacket II,"Flight,Energy Blasts,Size Changing",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
663,Ymir,"Cold Resistance,Durability,Longevity,Super Str...",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
664,Yoda,"Agility,Stealth,Danger Sense,Marksmanship,Weap...",0,0,0,0,0,0,0,1,...,0,1,0,1,0,0,0,1,0,0
665,Zatanna,"Cryokinesis,Telepathy,Magic,Fire Control,Proba...",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
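
Adding one column per loop iteration is what usually triggers pandas' fragmentation warnings on wide frames. As a hedged alternative sketch, `Series.str.get_dummies` builds all the 0/1 columns in a single call and matches whole comma-separated names. The `toy` frame reuses three rows whose full power strings appear in the outputs above; `power_dummies` and `encoded` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame({
    "hero_names": ["3-D Man", "Abin Sur", "Yellowjacket II"],
    "Powers": [
        "Agility,Super Strength,Stamina,Super Speed",
        "Lantern Power Ring",
        "Flight,Energy Blasts,Size Changing",
    ],
})

# One 0/1 column per distinct power; matching is on whole comma-separated names
power_dummies = toy["Powers"].str.get_dummies(sep=",")

# Attach the hero names back to the encoded columns
encoded = pd.concat([toy[["hero_names"]], power_dummies], axis=1)
print(encoded.loc[0, "Super Speed"])
```

Because the columns are created together, the result can be concatenated once rather than grown column by column.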


In [65]:
## showing the lists are really strings
df2.loc[2,'Powers']

'Agility,Accelerated Healing,Cold Resistance,Durability,Underwater breathing,Marksmanship,Weapons Master,Longevity,Intelligence,Super Strength,Telepathy,Stamina,Immortality,Reflexes,Enhanced Sight,Sub-Mariner'

In [67]:
df2['Powers'].value_counts()

Intelligence                                           8
Durability,Super Strength                              5
Agility,Stealth,Marksmanship,Weapons Master,Stamina    4
Marksmanship ...

In [68]:
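## attempting to explode the 'Powers' column
## NOTE: 'Powers' holds comma-separated strings, not lists, so explode()
## returns every row unchanged (compare the output with df2.head())
exploded = df2.explode('Powers')
exploded[['hero_names','Powers']].head(5)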


Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."
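
The output above is identical to `df2.head()` because `explode` only expands list-like values. A minimal sketch of the step that appears to be missing, assuming the goal is one row per hero and power: split each string on the comma first, then explode. The `toy` frame and the names `exploded_toy` and `unique_powers` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "hero_names": ["3-D Man", "Abin Sur"],
    "Powers": ["Agility,Super Strength,Stamina,Super Speed", "Lantern Power Ring"],
})

# Split each string into a list first; explode then yields one row per power
exploded_toy = toy.assign(Powers=toy["Powers"].str.split(",")).explode("Powers")

# Unique values are now individual power names rather than whole strings
unique_powers = exploded_toy["Powers"].dropna().unique()
print(len(exploded_toy), sorted(unique_powers))
```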


In [69]:
## saving the unique values from the exploded column
cols_to_make = exploded['Powers'].dropna().unique()
cols_to_make



array(['Agility,Super Strength,Stamina,Super Speed',
       'Accelerated Healing,Durability,Longevity,Super Strength,Stamina,Camouflage,Self-Sustenance',
       'Agility,Accelerated Healing,Cold Resistance,Durability,Underwater breathing,Marksmanship,Weapons Master,Longevity,Intelligence,Super Strength,Telepathy,Stamina,Immortality,Reflexes,Enhanced Sight,Sub-Mariner',
       'Lantern Power Ring',
       'Accelerated Healing,Intelligence,Super Strength,Stamina,Super Speed,Invulnerability,Animation,Super Breath',
       'Dimensional Awareness,Flight,Intelligence,Super Strength,Size Changing,Super Speed,Teleportation,Magic,Dimensional Travel,Immortality,Invulnerability,Molecular Manipulation,Energy Manipulation,Power Cosmic',
       'Cold Resistance,Durability,Energy Absorption,Super Strength,Invulnerability,Elemental Transmogrification,Fire Resistance,Natural Armor,Molecular Manipulation,Heat Resistance,Matter Absorption',
       'Accelerated Healing,Immortality,Regeneration',
       'D

In [70]:
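## NOTE: cols_to_make holds whole power strings (see the array above), so this loop
## adds one boolean column per distinct combination of powers, not one per power
for col in cols_to_make:
    df2[col] = df2['Powers'].str.contains(col)
df2.head()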




Unnamed: 0,hero_names,Powers,Molecular Manipulation,Audio Control,Enhanced Senses,Animal Control,Sub-Mariner,Vision - Cryo,Toxin and Disease Control,Force Fields,...,"Durability,Flight,Longevity,Super Strength,Energy Blasts,Size Changing,Stamina,Super Speed,Reflexes,Invulnerability,Self-Sustenance","Accelerated Healing,Durability,Flight,Marksmanship,Weapons Master,Longevity,Intelligence,Super Strength,Telepathy,Stamina,Super Speed,Animal Oriented Powers,Weapon-based Powers,Enhanced Senses,Dimensional Travel,Enhanced Memory,Reflexes,Force Fields,Fire Resistance,Enhanced Hearing,Hypnokinesis,Enhanced Smell,Vision - Telescopic,Toxin and Disease Resistance,Magic Resistance,Vision - Microscopic,Vision - Night,Vision - Infrared,Vision - X-Ray,Vision - Thermal","Agility,Accelerated Healing,Durability,Stealth,Marksmanship,Longevity,Super Strength,Stamina,Jump,Reflexes,Enhanced Hearing,Enhanced Sight,Natural Weapons,Enhanced Smell,Vision - Telescopic,Toxin and Disease Resistance,Vision - Night","Flight,Telepathy,Astral Travel,Teleportation,Telekinesis,Phasing,Astral Projection,Psionic Powers,Mind Control,Intangibility,Illusions","Size Changing,Animal Oriented Powers","Flight,Energy Blasts,Size Changing","Cold Resistance,Durability,Longevity,Super Strength,Cryokinesis,Immortality","Agility,Stealth,Danger Sense,Marksmanship,Weapons Master,Longevity,Intelligence,Telepathy,Energy Blasts,Stamina,Super Speed,Telekinesis,Jump,Reflexes,Force Fields,Empathy,Precognition,Cloaking,The Force","Cryokinesis,Telepathy,Magic,Fire Control,Probability Manipulation,Water Control,Terrakinesis,Weather Control","Super Speed,Intangibility,Time Travel,Time Manipulation"
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",0,0,0,0,0,0,0,0,...,False,False,False,False,False,False,False,False,False,False
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super...",0,0,0,0,0,0,0,0,...,False,False,False,False,False,False,False,False,False,False
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du...",0,0,0,0,1,0,0,0,...,False,False,False,False,False,False,False,False,False,False
3,Abin Sur,Lantern Power Ring,0,0,0,0,0,0,0,0,...,False,False,False,False,False,False,False,False,False,False
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt...",0,0,0,0,0,0,0,0,...,False,False,False,False,False,False,False,False,False,False


In [None]:
# drop the original 'Powers' string column now that each power has its own column
df2 = df2.drop(columns=['Powers'])
## save data for next lesson
df2.to_csv('advanced_tf_data_pt1.csv', index=False)


## Merge and Order

In [56]:
df_merged = df.merge(df2, left_on='Hero', right_on='hero_names')


# 'hero_names' duplicates 'Hero' after the merge; it could be dropped with:
# df_merged = df_merged.drop(columns=['hero_names'])

df_merged.head()

Unnamed: 0,Hero,Publisher,Gender,Eye color,Race,Hair color,Height,Skin color,Alignment,Weight,...,Invisibility,Telekinesis,Projection,Jump,Animal Oriented Powers,Heat Resistance,Substance Secretion,Agility,Probability Manipulation,Vision - X-Ray
0,A-Bomb,Marvel Comics,Male,yellow,Human,No Hair,203.0,Unknown,good,441.0,...,0,0,0,0,0,0,0,0,0,0
1,Abe Sapien,Dark Horse Comics,Male,blue,Icthyo Sapien,No Hair,191.0,blue,good,65.0,...,0,0,0,0,0,0,0,1,0,0
2,Abin Sur,DC Comics,Male,blue,Ungaran,No Hair,185.0,red,good,90.0,...,0,0,0,0,0,0,0,0,0,0
3,Abomination,Marvel Comics,Male,green,Human / Radiation,No Hair,203.0,Unknown,bad,441.0,...,0,0,0,0,0,0,0,0,0,0
4,Absorbing Man,Marvel Comics,Male,blue,Human,No Hair,193.0,Unknown,bad,122.0,...,0,0,0,0,0,1,0,0,0,0
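
`merge` defaults to an inner join, so heroes that appear in only one of the two files are silently dropped. The sketch below shows one way to check how many rows match before committing to the inner join, using `how="outer"` with `indicator=True`. The two toy frames are assumptions for illustration, and "Made-Up Hero" is an invented name with no counterpart in the powers table.

```python
import pandas as pd

info = pd.DataFrame({"Hero": ["A-Bomb", "Abin Sur", "Made-Up Hero"]})
powers = pd.DataFrame({
    "hero_names": ["3-D Man", "A-Bomb", "Abin Sur"],
    "Agility": [1, 0, 0],
})

# how="outer" keeps every row; indicator=True adds a '_merge' column naming each row's source
check = info.merge(powers, left_on="Hero", right_on="hero_names",
                   how="outer", indicator=True)
print(check["_merge"].value_counts())

# An inner merge keeps only the rows labelled 'both'
matched = check[check["_merge"] == "both"].drop(columns=["hero_names", "_merge"])
```

Rows labelled `left_only` or `right_only` are the heroes that an inner merge would leave out.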


In [57]:
# list all the columns in df_merged
df_merged.columns.tolist()

['Hero',
 'Publisher',
 'Gender',
 'Eye color',
 'Race',
 'Hair color',
 'Height',
 'Skin color',
 'Alignment',
 'Weight',
 'hero_names',
 'Molecular Manipulation',
 'Audio Control',
 'Enhanced Senses',
 'Animal Control',
 'Sub-Mariner',
 'Vision - Cryo',
 'Toxin and Disease Control',
 'Force Fields',
 'Elasticity',
 'Levitation',
 'Wallcrawling',
 'Durability',
 'Technopath/Cyberpath',
 'Summoning',
 'Radar Sense',
 'Density Control',
 'Gliding',
 'Super Strength',
 'Enhanced Memory',
 'Omnitrix',
 'Danger Sense',
 'Grim Reaping',
 'Darkforce Manipulation',
 'Banish',
 'Intuitive aptitude',
 'Weapons Master',
 'Photographic Reflexes',
 'Animal Attributes',
 'Intangibility',
 'Empathy',
 'Energy Resistance',
 'Illumination',
 'Echolocation',
 'Vision - Night',
 'Spatial Awareness',
 'Cold Resistance',
 'Power Suit',
 'Omnilingualism',
 'Insanity',
 'Molecular Combustion',
 'Power Augmentation',
 'Camouflage',
 'Natural Weapons',
 'Vision - Microscopic',
 'Energy Manipulation',
 'Biokin

## OneHotEncoder Powers

In [38]:
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder

In [31]:
# define the list of power-position columns ('Power1' ... 'Power49') to one-hot encode
power_columns = [f'Power{i}' for i in range(1, 50)]


In [32]:
# Perform one-hot encoding on the 'Power' columns
encoded_df = pd.get_dummies(df_merged, columns=power_columns)

In [33]:
encoded_df

Unnamed: 0,Hero,Publisher,Gender,Eye color,Race,Hair color,Height,Skin color,Alignment,Weight,...,Power35_Water Control,Power36_Vision - Telescopic,Power37_Magnetism,Power38_Invisibility,Power39_Vision - Microscopic,Power40_Super Breath,Power41_Vision - Night,Power42_Vision - Heat,Power43_Vision - X-Ray,Power44_Vision - Thermal
0,A-Bomb,Marvel Comics,Male,yellow,Human,No Hair,203.0,Unknown,good,441.0,...,0,0,0,0,0,0,0,0,0,0
1,Abe Sapien,Dark Horse Comics,Male,blue,Icthyo Sapien,No Hair,191.0,blue,good,65.0,...,0,0,0,0,0,0,0,0,0,0
2,Abin Sur,DC Comics,Male,blue,Ungaran,No Hair,185.0,red,good,90.0,...,0,0,0,0,0,0,0,0,0,0
3,Abomination,Marvel Comics,Male,green,Human / Radiation,No Hair,203.0,Unknown,bad,441.0,...,0,0,0,0,0,0,0,0,0,0
4,Absorbing Man,Marvel Comics,Male,blue,Human,No Hair,193.0,Unknown,bad,122.0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
458,Yellowjacket,Marvel Comics,Male,blue,Human,Blond,183.0,Unknown,good,83.0,...,0,0,0,0,0,0,0,0,0,0
459,Yellowjacket II,Marvel Comics,Female,blue,Human,Strawberry Blond,165.0,Unknown,good,52.0,...,0,0,0,0,0,0,0,0,0,0
460,Yoda,George Lucas,Male,brown,Yoda's species,White,66.0,green,good,17.0,...,0,0,0,0,0,0,0,0,0,0
461,Zatanna,DC Comics,Female,blue,Human,Black,170.0,Unknown,good,57.0,...,0,0,0,0,0,0,0,0,0,0


In [34]:
# list all the columns in encoded_df
encoded_df.columns.tolist()

['Hero',
 'Publisher',
 'Gender',
 'Eye color',
 'Race',
 'Hair color',
 'Height',
 'Skin color',
 'Alignment',
 'Weight',
 'hero_names',
 'Power1_Accelerated Healing',
 'Power1_Agility',
 'Power1_Animal Attributes',
 'Power1_Animal Oriented Powers',
 'Power1_Cold Resistance',
 'Power1_Cryokinesis',
 'Power1_Darkforce Manipulation',
 'Power1_Dimensional Awareness',
 'Power1_Duplication',
 'Power1_Durability',
 'Power1_Electrokinesis',
 'Power1_Energy Absorption',
 'Power1_Energy Blasts',
 'Power1_Enhanced Memory',
 'Power1_Fire Control',
 'Power1_Flight',
 'Power1_Intelligence',
 'Power1_Lantern Power Ring',
 'Power1_Longevity',
 'Power1_Magic',
 'Power1_Marksmanship',
 'Power1_Phasing',
 'Power1_Power Absorption',
 'Power1_Power Augmentation',
 'Power1_Projection',
 'Power1_Psionic Powers',
 'Power1_Seismic Power',
 'Power1_Shapeshifting',
 'Power1_Size Changing',
 'Power1_Stamina',
 'Power1_Stealth',
 'Power1_Super Speed',
 'Power1_Super Strength',
 'Power1_Telepathy',
 'Power1_Telep

## II. Use your combined DataFrame to answer the following questions.

### 1. Compare the average weight of superheroes who have Super Speed to those who do not.


In [35]:
df_merged.head()

Unnamed: 0,Hero,Publisher,Gender,Eye color,Race,Hair color,Height,Skin color,Alignment,Weight,...,Power40,Power41,Power42,Power43,Power44,Power45,Power46,Power47,Power48,Power49
0,A-Bomb,Marvel Comics,Male,yellow,Human,No Hair,203.0,Unknown,good,441.0,...,,,,,,,,,,
1,Abe Sapien,Dark Horse Comics,Male,blue,Icthyo Sapien,No Hair,191.0,blue,good,65.0,...,,,,,,,,,,
2,Abin Sur,DC Comics,Male,blue,Ungaran,No Hair,185.0,red,good,90.0,...,,,,,,,,,,
3,Abomination,Marvel Comics,Male,green,Human / Radiation,No Hair,203.0,Unknown,bad,441.0,...,,,,,,,,,,
4,Absorbing Man,Marvel Comics,Male,blue,Human,No Hair,193.0,Unknown,bad,122.0,...,,,,,,,,,,


In [36]:
# NOTE: only the first 14 power positions (Power1..Power14) are checked;
# this assumes Super Speed never appears in a later position
super_speed_columns = [f'Power{i}_Super Speed' for i in range(1, 15)]

# Boolean mask: True for heroes with Super Speed in any checked position
speed_mask = encoded_df[super_speed_columns].any(axis=1)
has_super_speed = encoded_df[speed_mask]
no_super_speed = encoded_df[~speed_mask]

average_weight_has_speed = has_super_speed['Weight'].mean()
average_weight_no_speed = no_super_speed['Weight'].mean()

print("Average weight of heroes who have Super Speed:", average_weight_has_speed)
print("Average weight of heroes who DO NOT have Super Speed:", average_weight_no_speed)

Average weight of heroes who have Super Speed: 129.18274111675126
Average weight of heroes who DO NOT have Super Speed: 102.04135338345864
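
When each power has its own 0/1 column, as in the merged frame from the "Merge and Order" section, the same comparison reduces to a single `groupby`. This is a sketch under the assumption that a column named `Super Speed` exists in that form; the column list shown earlier is truncated, so that is not confirmed here. The heroes, weights and flags in `toy` are invented for illustration.

```python
import pandas as pd

# Toy frame: names, weights and 0/1 flags are invented for illustration
toy = pd.DataFrame({
    "Hero": ["Hero A", "Hero B", "Hero C", "Hero D"],
    "Weight": [100.0, 80.0, 60.0, 120.0],
    "Super Speed": [1, 0, 0, 1],
})

# Group on the flag: index 0 = no Super Speed, index 1 = has Super Speed
mean_weight = toy.groupby("Super Speed")["Weight"].mean()
print(mean_weight)
```

Grouping on a single flag column avoids having to know in which position a power was listed.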


### 2. What is the average height of heroes for each publisher?

In [37]:
df.groupby('Publisher')['Height'].mean()

Publisher
DC Comics            181.923913
Dark Horse Comics    176.909091
George Lucas         159.600000
Image Comics         211.000000
Marvel Comics        191.546128
Shueisha             171.500000
Star Trek            181.500000
Team Epic TV         180.750000
Unknown              178.000000
Name: Height, dtype: float64
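
A mean alone hides how many heroes each publisher contributes, and a publisher with only one or two heroes can dominate the ranking. A minimal sketch that reports the count next to the mean and sorts by it; the publishers and heights in `toy` are invented for illustration, and `summary` is an illustrative name.

```python
import pandas as pd

# Toy frame: publishers and heights are invented for illustration
toy = pd.DataFrame({
    "Publisher": ["Pub A", "Pub A", "Pub A", "Pub B"],
    "Height": [180.0, 190.0, 200.0, 211.0],
})

# Mean height and number of heroes per publisher, tallest average first
summary = (
    toy.groupby("Publisher")["Height"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```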