# CORE: Applying Advanced Transformations

The Task
Your task is two-fold:

# I. Clean the files and combine them into one final DataFrame.

This dataframe should have the following columns:
- Hero (Just the name of the Hero)
- Publisher
- Gender
- Eye color
- Race
- Hair color
- Height (numeric)
- Skin color
- Alignment
- Weight (numeric)

Plus, one-hot-encoded columns for every power that appears in the dataset. E.g.:
- Agility
- Flight
- Superspeed
- etc.

Hint: Height and Weight are stored as text with a space between the number and the unit, as in "100 kg" or "52.5 cm".

## Imports

In [1]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Importing the OS and JSON Modules
import os, json

## Superhero Info DF

In [2]:
df1 = pd.read_csv('superhero_info - superhero_info.csv')
df1

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}"
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}"
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,"{'Height': '193.0 cm', 'Weight': '122.0 kg'}"
...,...,...,...,...,...,...,...,...
458,Yellowjacket|Marvel Comics,Male,Human,good,Blond,blue,Unknown,"{'Height': '183.0 cm', 'Weight': '83.0 kg'}"
459,Yellowjacket II|Marvel Comics,Female,Human,good,Strawberry Blond,blue,Unknown,"{'Height': '165.0 cm', 'Weight': '52.0 kg'}"
460,Yoda|George Lucas,Male,Yoda's species,good,White,brown,green,"{'Height': '66.0 cm', 'Weight': '17.0 kg'}"
461,Zatanna|DC Comics,Female,Human,good,Black,blue,Unknown,"{'Height': '170.0 cm', 'Weight': '57.0 kg'}"


### Hero|Publisher

In [3]:
df1[['Hero Name', 'Publisher']] = df1['Hero|Publisher'].str.split('|', expand = True)
df1 = df1.drop(columns = 'Hero|Publisher')
df1.sample(3)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Hero Name,Publisher
207,Female,Unknown,good,Black,blue,Unknown,"{'Height': '180.0 cm', 'Weight': '59.0 kg'}",Huntress,DC Comics
95,Male,Human,good,Black,blue,Unknown,"{'Height': '175.0 cm', 'Weight': '74.0 kg'}",Captain Marvel II,DC Comics
162,Male,Human,good,Unknown,Unknown,Unknown,"{'Height': '183.0 cm', 'Weight': '86.0 kg'}",Flash III,DC Comics


### Measurements

In [4]:
# JSON requires double quotes, so swap the single quotes in one 'Measurements' value
measurements = df1.loc[0,"Measurements"]
measurements = measurements.replace("'", '"')
# Check that the cleaned string now parses into a dictionary
fixed_measurements = json.loads(measurements)
fixed_measurements

{'Height': '203.0 cm', 'Weight': '441.0 kg'}

In [5]:
df1['Measurements'] = df1['Measurements'].str.replace("'", '"')
df1['Measurements'] = df1['Measurements'].apply(json.loads)
df1['Measurements'].sample(3)

107     {'Height': '226.0 cm', 'Weight': '70.0 kg'}
230    {'Height': '287.0 cm', 'Weight': '855.0 kg'}
364     {'Height': '178.0 cm', 'Weight': '79.0 kg'}
Name: Measurements, dtype: object
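
The quote replacement above works here because none of the measurement values contain an apostrophe. As a hedged alternative, the strings are already valid Python dictionary literals, so `ast.literal_eval` from the standard library can parse them without touching the quotes. The toy frame below is an assumption for illustration; only its two rows are copied from the output shown earlier.

```python
import ast
import pandas as pd

# Toy rows in the same shape as the raw 'Measurements' column
toy = pd.DataFrame({
    "Measurements": [
        "{'Height': '203.0 cm', 'Weight': '441.0 kg'}",
        "{'Height': '66.0 cm', 'Weight': '17.0 kg'}",
    ]
})

# literal_eval parses Python literal syntax, so single-quoted keys are accepted as-is
toy["Measurements"] = toy["Measurements"].apply(ast.literal_eval)
first = toy.loc[0, "Measurements"]
```

Either approach yields a column of dictionaries; `literal_eval` simply avoids rewriting the text first.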

In [6]:
height_weight = df1['Measurements'].apply(pd.Series)
height_weight.sample(3)

Unnamed: 0,Height,Weight
311,157.0 cm,79.0 kg
321,183.0 cm,86.0 kg
91,193.0 cm,90.0 kg
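
`apply(pd.Series)` builds one Series per row, which can be slow on larger frames. A possible alternative, sketched here on a toy frame (the name `height_weight_toy` and the index values are made up for the example), is to hand the list of dictionaries to the `pd.DataFrame` constructor and reuse the original index so a later `pd.concat(..., axis=1)` still lines up.

```python
import pandas as pd

# Toy column of already-parsed dictionaries with a non-default index
toy = pd.DataFrame(
    {
        "Measurements": [
            {"Height": "203.0 cm", "Weight": "441.0 kg"},
            {"Height": "191.0 cm", "Weight": "65.0 kg"},
        ]
    },
    index=[10, 11],
)

# Build the two columns in one step and keep the original row labels
height_weight_toy = pd.DataFrame(toy["Measurements"].tolist(), index=toy.index)
```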


In [7]:
df1 = pd.concat((df1, height_weight), axis = 1)
df1 = df1.drop(columns = 'Measurements')
df1.sample(3)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero Name,Publisher,Height,Weight
298,Female,Luphomoid,bad,No Hair,blue,blue,Nebula,Marvel Comics,185.0 cm,83.0 kg
365,Male,Unknown,good,Red,brown,Unknown,Shatterstar,Marvel Comics,191.0 cm,88.0 kg
235,Male,Bolovaxian,good,No Hair,red,pink,Kilowog,DC Comics,234.0 cm,324.0 kg


In [8]:
df1.columns

Index(['Gender', 'Race', 'Alignment', 'Hair color', 'Eye color', 'Skin color',
       'Hero Name', 'Publisher', 'Height', 'Weight'],
      dtype='object')

In [9]:
df1 = df1[['Hero Name', 'Publisher', 'Gender', 'Race', 'Height', 'Weight', 
         'Alignment', 'Hair color', 'Eye color', 'Skin color']]
df1

Unnamed: 0,Hero Name,Publisher,Gender,Race,Height,Weight,Alignment,Hair color,Eye color,Skin color
0,A-Bomb,Marvel Comics,Male,Human,203.0 cm,441.0 kg,good,No Hair,yellow,Unknown
1,Abe Sapien,Dark Horse Comics,Male,Icthyo Sapien,191.0 cm,65.0 kg,good,No Hair,blue,blue
2,Abin Sur,DC Comics,Male,Ungaran,185.0 cm,90.0 kg,good,No Hair,blue,red
3,Abomination,Marvel Comics,Male,Human / Radiation,203.0 cm,441.0 kg,bad,No Hair,green,Unknown
4,Absorbing Man,Marvel Comics,Male,Human,193.0 cm,122.0 kg,bad,No Hair,blue,Unknown
...,...,...,...,...,...,...,...,...,...,...
458,Yellowjacket,Marvel Comics,Male,Human,183.0 cm,83.0 kg,good,Blond,blue,Unknown
459,Yellowjacket II,Marvel Comics,Female,Human,165.0 cm,52.0 kg,good,Strawberry Blond,blue,Unknown
460,Yoda,George Lucas,Male,Yoda's species,66.0 cm,17.0 kg,good,White,brown,green
461,Zatanna,DC Comics,Female,Human,170.0 cm,57.0 kg,good,Black,blue,Unknown
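
The task asks for numeric Height and Weight, but at this point both columns still hold strings such as `203.0 cm`. A minimal sketch of the conversion, assuming every value follows the "number, space, unit" pattern from the hint, is shown on a toy frame below; the same loop could be applied to `df1`.

```python
import pandas as pd

# Toy values copied from the output above
toy = pd.DataFrame({
    "Height": ["203.0 cm", "66.0 cm"],
    "Weight": ["441.0 kg", "17.0 kg"],
})

for col in ["Height", "Weight"]:
    # Keep the text before the space, then convert it to a number;
    # errors="coerce" turns anything unparseable into NaN instead of raising
    toy[col] = pd.to_numeric(toy[col].str.split(" ").str[0], errors="coerce")
```

After this step the columns are floats, which the averages in Part II depend on.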


## Superhero Powers DF
### Powers

In [10]:
df2 = pd.read_csv('superhero_powers - superhero_powers.csv')
df2

Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."
...,...,...
662,Yellowjacket II,"Flight,Energy Blasts,Size Changing"
663,Ymir,"Cold Resistance,Durability,Longevity,Super Str..."
664,Yoda,"Agility,Stealth,Danger Sense,Marksmanship,Weap..."
665,Zatanna,"Cryokinesis,Telepathy,Magic,Fire Control,Proba..."


In [11]:
powers = df2.loc[2,'Powers']
powers

'Agility,Accelerated Healing,Cold Resistance,Durability,Underwater breathing,Marksmanship,Weapons Master,Longevity,Intelligence,Super Strength,Telepathy,Stamina,Immortality,Reflexes,Enhanced Sight,Sub-Mariner'

In [12]:
df2['Power split'] = df2['Powers'].str.split(',', expand = False)
df2['Power split']

0        [Agility, Super Strength, Stamina, Super Speed]
1      [Accelerated Healing, Durability, Longevity, S...
2      [Agility, Accelerated Healing, Cold Resistance...
3                                   [Lantern Power Ring]
4      [Accelerated Healing, Intelligence, Super Stre...
                             ...                        
662               [Flight, Energy Blasts, Size Changing]
663    [Cold Resistance, Durability, Longevity, Super...
664    [Agility, Stealth, Danger Sense, Marksmanship,...
665    [Cryokinesis, Telepathy, Magic, Fire Control, ...
666    [Super Speed, Intangibility, Time Travel, Time...
Name: Power split, Length: 667, dtype: object

In [13]:
exploaded = df2.explode('Power split')
exploaded

Unnamed: 0,hero_names,Powers,Power split
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",Agility
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",Super Strength
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",Stamina
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",Super Speed
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super...",Accelerated Healing
...,...,...,...
665,Zatanna,"Cryokinesis,Telepathy,Magic,Fire Control,Proba...",Weather Control
666,Zoom,"Super Speed,Intangibility,Time Travel,Time Man...",Super Speed
666,Zoom,"Super Speed,Intangibility,Time Travel,Time Man...",Intangibility
666,Zoom,"Super Speed,Intangibility,Time Travel,Time Man...",Time Travel


In [14]:
# Use dropna() so NaN does not end up in the list of power names
cols_to_make = exploaded['Power split'].dropna().unique()
cols_to_make

array(['Agility', 'Super Strength', 'Stamina', 'Super Speed',
       'Accelerated Healing', 'Durability', 'Longevity', 'Camouflage',
       'Self-Sustenance', 'Cold Resistance', 'Underwater breathing',
       'Marksmanship', 'Weapons Master', 'Intelligence', 'Telepathy',
       'Immortality', 'Reflexes', 'Enhanced Sight', 'Sub-Mariner',
       'Lantern Power Ring', 'Invulnerability', 'Animation',
       'Super Breath', 'Dimensional Awareness', 'Flight', 'Size Changing',
       'Teleportation', 'Magic', 'Dimensional Travel',
       'Molecular Manipulation', 'Energy Manipulation', 'Power Cosmic',
       'Energy Absorption', 'Elemental Transmogrification',
       'Fire Resistance', 'Natural Armor', 'Heat Resistance',
       'Matter Absorption', 'Regeneration', 'Stealth', 'Power Suit',
       'Energy Blasts', 'Energy Beams', 'Heat Generation', 'Danger Sense',
       'Phasing', 'Force Fields', 'Hypnokinesis', 'Invisibility',
       'Enhanced Senses', 'Jump', 'Shapeshifting', 'Elasticity',
 

In [15]:
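# Flag each power with a literal substring match (regex=False avoids treating the name as a pattern)
for col in cols_to_make:
    df2[col] = df2['Powers'].str.contains(col, regex = False)
df2.sample(3)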


Unnamed: 0,hero_names,Powers,Power split,Agility,Super Strength,Stamina,Super Speed,Accelerated Healing,Durability,Longevity,...,Weather Control,Omnipresent,Omniscient,Hair Manipulation,Nova Force,Odin Force,Phoenix Force,Intuitive aptitude,Melting,Changing Armor
155,Chromos,"Agility,Super Strength,Energy Blasts,Stamina,T...","[Agility, Super Strength, Energy Blasts, Stami...",True,True,True,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
392,Mandarin,"Weapons Master,Intelligence,Cryokinesis,Energy...","[Weapons Master, Intelligence, Cryokinesis, En...",False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
450,Nova,"Agility,Accelerated Healing,Durability,Energy ...","[Agility, Accelerated Healing, Durability, Ene...",True,True,True,True,True,True,False,...,False,False,False,False,True,False,False,False,False,False
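
Two caveats apply to the loop above: `str.contains` matches substrings, so a power whose name is contained in another power's name would be flagged for both, and adding many columns one at a time can trigger pandas' DataFrame fragmentation warning. A hedged alternative is `Series.str.get_dummies`, which splits on the separator and matches whole names in a single call. The toy frame and hero names below are placeholders.

```python
import pandas as pd

# Toy stand-in for df2 with a comma-separated 'Powers' column
toy = pd.DataFrame({
    "hero_names": ["Hero A", "Hero B"],
    "Powers": ["Agility,Super Speed", "Flight"],
})

# One column per distinct power; astype(bool) turns the 0/1 output into True/False
dummies = toy["Powers"].str.get_dummies(sep=",").astype(bool)

# Attach the encoded columns to the hero names in one concat
toy_encoded = pd.concat([toy[["hero_names"]], dummies], axis=1)
```

Note that `get_dummies` returns its columns in sorted order, not in order of first appearance.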


In [17]:
df2 = df2.drop(columns = ['Powers', 'Power split'])
df2.sample(3)

Unnamed: 0,hero_names,Agility,Super Strength,Stamina,Super Speed,Accelerated Healing,Durability,Longevity,Camouflage,Self-Sustenance,...,Weather Control,Omnipresent,Omniscient,Hair Manipulation,Nova Force,Odin Force,Phoenix Force,Intuitive aptitude,Melting,Changing Armor
194,Destroyer,False,True,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
342,Kevin 11,True,True,True,True,True,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
69,Beta Ray Bill,False,True,True,True,True,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


## Merging Two Dataframes

In [19]:
# Note: concat with axis = 1 aligns rows on the index labels, not on the hero's name
df = pd.concat([df1, df2], axis = 1, join = 'inner')
display(df)

Unnamed: 0,Hero Name,Publisher,Gender,Race,Height,Weight,Alignment,Hair color,Eye color,Skin color,...,Weather Control,Omnipresent,Omniscient,Hair Manipulation,Nova Force,Odin Force,Phoenix Force,Intuitive aptitude,Melting,Changing Armor
0,A-Bomb,Marvel Comics,Male,Human,203.0 cm,441.0 kg,good,No Hair,yellow,Unknown,...,False,False,False,False,False,False,False,False,False,False
1,Abe Sapien,Dark Horse Comics,Male,Icthyo Sapien,191.0 cm,65.0 kg,good,No Hair,blue,blue,...,False,False,False,False,False,False,False,False,False,False
2,Abin Sur,DC Comics,Male,Ungaran,185.0 cm,90.0 kg,good,No Hair,blue,red,...,False,False,False,False,False,False,False,False,False,False
3,Abomination,Marvel Comics,Male,Human / Radiation,203.0 cm,441.0 kg,bad,No Hair,green,Unknown,...,False,False,False,False,False,False,False,False,False,False
4,Absorbing Man,Marvel Comics,Male,Human,193.0 cm,122.0 kg,bad,No Hair,blue,Unknown,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
458,Yellowjacket,Marvel Comics,Male,Human,183.0 cm,83.0 kg,good,Blond,blue,Unknown,...,False,False,False,False,False,False,False,False,False,False
459,Yellowjacket II,Marvel Comics,Female,Human,165.0 cm,52.0 kg,good,Strawberry Blond,blue,Unknown,...,False,False,False,False,False,False,False,False,False,False
460,Yoda,George Lucas,Male,Yoda's species,66.0 cm,17.0 kg,good,White,brown,green,...,False,False,False,False,False,False,False,False,False,False
461,Zatanna,DC Comics,Female,Human,170.0 cm,57.0 kg,good,Black,blue,Unknown,...,False,False,False,False,False,False,False,False,False,False
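
Because the two files list heroes in different orders (the powers file starts with 3-D Man, the info file with A-Bomb), joining on the row index can pair one hero's attributes with another hero's powers. A sketch of a name-based join with `pd.merge` follows; the toy frames, hero names, and values are illustrative assumptions, and only the key column names `Hero Name` and `hero_names` come from the notebook.

```python
import pandas as pd

# Toy stand-ins for df1 and df2; rows are deliberately in different orders
info = pd.DataFrame({
    "Hero Name": ["Hero B", "Hero C"],
    "Publisher": ["Pub X", "Pub Y"],
})
powers = pd.DataFrame({
    "hero_names": ["Hero A", "Hero B", "Hero C"],
    "Agility": [True, False, True],
})

# Match rows by hero name; how="inner" keeps only heroes present in both frames
merged = pd.merge(info, powers, left_on="Hero Name", right_on="hero_names", how="inner")

# The right-hand key duplicates 'Hero Name', so drop it
merged = merged.drop(columns="hero_names")
```

With an inner join the row count depends on how many names appear in both files, so it may differ from the result of the index-based concat.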


# II. Use your combined DataFrame to answer the following questions.

1. Compare the average weight of superheroes who have Super Speed to those who do not.
2. What is the average height of heroes for each publisher?

In [31]:
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 463 entries, 0 to 462
Columns: 178 entries, Hero Name to Changing Armor
dtypes: bool(167), object(11)
memory usage: 115.4+ KB


Unnamed: 0,Hero Name,Publisher,Gender,Race,Height,Weight,Alignment,Hair color,Eye color,Skin color,...,Weather Control,Omnipresent,Omniscient,Hair Manipulation,Nova Force,Odin Force,Phoenix Force,Intuitive aptitude,Melting,Changing Armor
0,A-Bomb,Marvel Comics,Male,Human,203.0 cm,441.0 kg,good,No Hair,yellow,Unknown,...,False,False,False,False,False,False,False,False,False,False
1,Abe Sapien,Dark Horse Comics,Male,Icthyo Sapien,191.0 cm,65.0 kg,good,No Hair,blue,blue,...,False,False,False,False,False,False,False,False,False,False
2,Abin Sur,DC Comics,Male,Ungaran,185.0 cm,90.0 kg,good,No Hair,blue,red,...,False,False,False,False,False,False,False,False,False,False
3,Abomination,Marvel Comics,Male,Human / Radiation,203.0 cm,441.0 kg,bad,No Hair,green,Unknown,...,False,False,False,False,False,False,False,False,False,False
4,Absorbing Man,Marvel Comics,Male,Human,193.0 cm,122.0 kg,bad,No Hair,blue,Unknown,...,False,False,False,False,False,False,False,False,False,False
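
Both questions reduce to a `groupby` followed by `mean`, once Height and Weight are numeric (the `df.info()` output above still reports them as `object`). The sketch below uses a made-up toy frame, so its numbers say nothing about the real data; the publisher names are placeholders.

```python
import pandas as pd

# Toy frame with numeric Height/Weight and a boolean 'Super Speed' column
toy = pd.DataFrame({
    "Publisher": ["Pub X", "Pub X", "Pub Y", "Pub Y"],
    "Height": [180.0, 200.0, 170.0, 190.0],
    "Weight": [80.0, 100.0, 60.0, 90.0],
    "Super Speed": [True, False, True, False],
})

# Question 1: average weight for heroes with and without Super Speed
weight_by_speed = toy.groupby("Super Speed")["Weight"].mean()

# Question 2: average height per publisher
height_by_publisher = toy.groupby("Publisher")["Height"].mean()
```

Applied to the combined DataFrame, the same two lines would use `df` in place of `toy`.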
