# **Applying Advanced Transformations (Core)**

**I. Clean the files and combine them into one final DataFrame.**

This dataframe should have the following columns:
- Hero (Just the name of the Hero)
- Publisher
- Gender
- Eye color
- Race
- Hair color
- Height (numeric)
- Skin color
- Alignment
- Weight (numeric)
- Plus, one-hot-encoded columns for every power that appears in the dataset. E.g.:
    - Agility
    - Flight
    - Super Speed, etc.

Hint: There is a space between the number and the unit in values such as "100 kg".

**II. Use your combined DataFrame to answer the following questions.**

- Compare the average weight of superheroes who have Super Speed to those who do not.
- What is the average height of heroes for each publisher?

In [1]:
## Standard Imports
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)

## Importing the OS and JSON Modules
import json
import os

### **I. Clean the files and combine them into one final DataFrame.**

In [2]:
info_file = 'Data/superhero_info.csv'
powers_file = 'Data/superhero_powers.csv'
output_file = 'Data/superhero_combined.csv'

df_info = pd.read_csv(info_file)
df_powers = pd.read_csv(powers_file)

In [3]:
df_info.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 463 entries, 0 to 462
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Hero|Publisher  463 non-null    object
 1   Gender          463 non-null    object
 2   Race            463 non-null    object
 3   Alignment       463 non-null    object
 4   Hair color      463 non-null    object
 5   Eye color       463 non-null    object
 6   Skin color      463 non-null    object
 7   Measurements    463 non-null    object
dtypes: object(8)
memory usage: 29.1+ KB


In [4]:
df_info.head()

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}"
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}"
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,"{'Height': '193.0 cm', 'Weight': '122.0 kg'}"


In [5]:
df_powers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 667 entries, 0 to 666
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   hero_names  667 non-null    object
 1   Powers      667 non-null    object
dtypes: object(2)
memory usage: 10.5+ KB


In [6]:
df_powers.head()

Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."


In [7]:
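# Exploratory only: concat with axis=1 pairs rows by position, not by hero name,
# so the info and powers rows below do not describe the same hero.
df_superhero = pd.concat([df_info, df_powers], axis=1)
df_superhero.info()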

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 667 entries, 0 to 666
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Hero|Publisher  463 non-null    object
 1   Gender          463 non-null    object
 2   Race            463 non-null    object
 3   Alignment       463 non-null    object
 4   Hair color      463 non-null    object
 5   Eye color       463 non-null    object
 6   Skin color      463 non-null    object
 7   Measurements    463 non-null    object
 8   hero_names      667 non-null    object
 9   Powers          667 non-null    object
dtypes: object(10)
memory usage: 52.2+ KB


In [8]:
df_superhero.head()

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,hero_names,Powers
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}",Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",Abin Sur,Lantern Power Ring
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,"{'Height': '193.0 cm', 'Weight': '122.0 kg'}",Abomination,"Accelerated Healing,Intelligence,Super Strengt..."
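
The table above pairs A-Bomb's details with 3-D Man's powers, because `pd.concat(..., axis=1)` lines rows up by index position. The following is a minimal sketch of the difference between positional concatenation and a key-based merge, using small toy frames (the `Powers` values here are illustrative placeholders, not the real data).

```python
import pandas as pd

# Toy frames (hypothetical values): the hero lists differ in length and order
info = pd.DataFrame({"Hero": ["A-Bomb", "Abe Sapien"],
                     "Publisher": ["Marvel Comics", "Dark Horse Comics"]})
powers = pd.DataFrame({"hero_names": ["3-D Man", "A-Bomb", "Abe Sapien"],
                       "Powers": ["Agility", "Durability", "Agility"]})

# concat(axis=1) aligns on the row index, so row 0 mixes A-Bomb with 3-D Man
side_by_side = pd.concat([info, powers], axis=1)

# merge aligns on the hero name, so each hero keeps its own powers
by_name = pd.merge(info, powers, left_on="Hero", right_on="hero_names")

print(side_by_side)
print(by_name)
```

With the default inner join, heroes that appear in only one of the two frames (3-D Man in this toy case) are dropped from the merged result.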


In [9]:
# Split the combined 'Hero|Publisher' column into separate Hero and Publisher columns
df_info[['Hero','Publisher']] = df_info['Hero|Publisher'].str.split('|',expand=True)
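
As a small illustration of the split step, here is a sketch on two values taken from the `Hero|Publisher` column shown earlier. The `n=1` argument is an addition not used in the notebook; it is an assumption that a hero name never contains `|` after the first separator, and it guarantees exactly two output columns.

```python
import pandas as pd

toy = pd.DataFrame({"Hero|Publisher": ["A-Bomb|Marvel Comics",
                                       "Abin Sur|DC Comics"]})

# Split on the first "|" only and expand the pieces into two columns
parts = toy["Hero|Publisher"].str.split("|", n=1, expand=True)
toy[["Hero", "Publisher"]] = parts

print(toy[["Hero", "Publisher"]])
```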

In [10]:
# Fix Dictionary Column
df_info['Measurements'] = df_info['Measurements'].str.replace("'",'"')
df_info['Measurements'] = df_info['Measurements'].apply(json.loads)

# Split Dictionary Column into 2 columns and convert to numerics (remove 'cm' and 'kg')
measurements = df_info['Measurements'].apply(pd.Series)
measurements['Height'] = pd.to_numeric(measurements['Height'].str.replace(' cm',''))
measurements['Weight'] = pd.to_numeric(measurements['Weight'].str.replace(' kg',''))

# Add back to initial DataFrame
df_info = pd.concat([df_info, measurements], axis=1)
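
Replacing every single quote with a double quote works for this data, but it would break on any value that contains an apostrophe. One possible alternative, sketched here on two rows copied from the output above, is to parse the Python-style dictionary strings with `ast.literal_eval`. This is a swapped-in technique for illustration, not the approach the notebook uses.

```python
import ast
import pandas as pd

toy = pd.DataFrame({"Measurements": [
    "{'Height': '203.0 cm', 'Weight': '441.0 kg'}",
    "{'Height': '191.0 cm', 'Weight': '65.0 kg'}",
]})

# Safely evaluate each string as a Python literal (a dict in this case)
parsed = toy["Measurements"].apply(ast.literal_eval)

# Build one column per dictionary key, keeping the original row index
measurements = pd.DataFrame(parsed.tolist(), index=toy.index)

# Strip the unit suffix (including the leading space) and convert to numbers
for column, unit in [("Height", " cm"), ("Weight", " kg")]:
    measurements[column] = pd.to_numeric(
        measurements[column].str.replace(unit, "", regex=False)
    )

print(measurements.dtypes)
```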

In [11]:
df_info.drop(columns=['Hero|Publisher', 'Measurements'], inplace=True)
df_info.head()

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height,Weight
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0,441.0
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0,65.0
2,Male,Ungaran,good,No Hair,blue,red,Abin Sur,DC Comics,185.0,90.0
3,Male,Human / Radiation,bad,No Hair,green,Unknown,Abomination,Marvel Comics,203.0,441.0
4,Male,Human,bad,No Hair,blue,Unknown,Absorbing Man,Marvel Comics,193.0,122.0


In [12]:
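# One-hot encode the comma-separated Powers column, one column per power
powers_encoded = df_powers['Powers'].str.get_dummies(',')
df_powers = pd.concat([df_powers.drop(columns='Powers'), powers_encoded], axis=1)
df_powers.info()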

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 667 entries, 0 to 666
Columns: 168 entries, hero_names to Wind Control
dtypes: int64(167), object(1)
memory usage: 875.6+ KB
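
To make the one-hot encoding step concrete, here is a short sketch on two rows copied from `df_powers.head()` above. `Series.str.get_dummies` splits each string on the separator and produces one 0/1 column per distinct value.

```python
import pandas as pd

toy = pd.DataFrame({
    "hero_names": ["3-D Man", "Abin Sur"],
    "Powers": ["Agility,Super Strength,Stamina,Super Speed",
               "Lantern Power Ring"],
})

# One indicator column per power that appears anywhere in the column
dummies = toy["Powers"].str.get_dummies(sep=",")

# Replace the original text column with the indicator columns
encoded = pd.concat([toy.drop(columns="Powers"), dummies], axis=1)

print(encoded)
```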


In [13]:
df_powers.head()

Unnamed: 0,hero_names,Accelerated Healing,Adaptation,Agility,Animal Attributes,Animal Control,Animal Oriented Powers,Animation,Anti-Gravity,Astral Projection,Astral Travel,Audio Control,Banish,Biokinesis,Camouflage,Changing Armor,Clairvoyance,Cloaking,Cold Resistance,Cryokinesis,Danger Sense,Darkforce Manipulation,Death Touch,Density Control,Dexterity,Dimensional Awareness,Dimensional Travel,Duplication,Durability,Echolocation,Elasticity,Electrical Transport,Electrokinesis,Element Control,Elemental Transmogrification,Empathy,Energy Absorption,Energy Armor,Energy Beams,Energy Blasts,Energy Constructs,Energy Manipulation,Energy Resistance,Enhanced Hearing,Enhanced Memory,Enhanced Senses,Enhanced Sight,Enhanced Smell,Enhanced Touch,Fire Control,Fire Resistance,Flight,Force Fields,Gliding,Gravity Control,Grim Reaping,Hair Manipulation,Heat Generation,Heat Resistance,Hyperkinesis,Hypnokinesis,Illumination,Illusions,Immortality,Insanity,Intangibility,Intelligence,Intuitive aptitude,Invisibility,Invulnerability,Jump,Lantern Power Ring,Levitation,Light Control,Longevity,Magic,Magic Resistance,Magnetism,Marksmanship,Matter Absorption,Melting,Mind Blast,Mind Control,Mind Control Resistance,Molecular Combustion,Molecular Dissipation,Molecular Manipulation,Natural Armor,Natural Weapons,Nova Force,Odin Force,Omnilingualism,Omnipotent,Omnipresent,Omniscient,Omnitrix,Peak Human Condition,Phasing,Phoenix Force,Photographic Reflexes,Plant Control,Portal Creation,Possession,Power Absorption,Power Augmentation,Power Cosmic,Power Nullifier,Power Sense,Power Suit,Precognition,Probability Manipulation,Projection,Psionic Powers,Qwardian Power Ring,Radar Sense,Radiation Absorption,Radiation Control,Radiation Immunity,Reality Warping,Reflexes,Regeneration,Resurrection,Seismic Power,Self-Sustenance,Shapeshifting,Size Changing,Sonar,Sonic Scream,Spatial Awareness,Speed Force,Stamina,Stealth,Sub-Mariner,Substance Secretion,Summoning,Super Breath,Super Speed,Super Strength,Symbiote Costume,Technopath/Cyberpath,Telekinesis,Telepathy,Telepathy Resistance,Teleportation,Terrakinesis,The Force,Thirstokinesis,Time Manipulation,Time Travel,Toxin and Disease Control,Toxin and Disease Resistance,Underwater breathing,Vision - Cryo,Vision - Heat,Vision - Infrared,Vision - Microscopic,Vision - Night,Vision - Telescopic,Vision - Thermal,Vision - X-Ray,Vitakinesis,Wallcrawling,Water Control,Weapon-based Powers,Weapons Master,Weather Control,Web Creation,Wind Control
0,3-D Man,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,A-Bomb,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abe Sapien,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
3,Abin Sur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abomination,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### **Combine into 1 Data Frame**

In [14]:
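# Inner join on hero name: only heroes present in both files are kept
df_final = pd.merge(left=df_info, right=df_powers, left_on='Hero', right_on='hero_names')
# The index is written as well, so it reappears as 'Unnamed: 0' when the file is read back
df_final.to_csv(output_file)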

In [15]:
df_final = pd.read_csv(output_file)
df_final.head()

Unnamed: 0.1,Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height,Weight,hero_names,Accelerated Healing,Adaptation,Agility,Animal Attributes,Animal Control,Animal Oriented Powers,Animation,Anti-Gravity,Astral Projection,Astral Travel,Audio Control,Banish,Biokinesis,Camouflage,Changing Armor,Clairvoyance,Cloaking,Cold Resistance,Cryokinesis,Danger Sense,Darkforce Manipulation,Death Touch,Density Control,Dexterity,Dimensional Awareness,Dimensional Travel,Duplication,Durability,Echolocation,Elasticity,Electrical Transport,Electrokinesis,Element Control,Elemental Transmogrification,Empathy,Energy Absorption,Energy Armor,Energy Beams,Energy Blasts,Energy Constructs,Energy Manipulation,Energy Resistance,Enhanced Hearing,Enhanced Memory,Enhanced Senses,Enhanced Sight,Enhanced Smell,Enhanced Touch,Fire Control,Fire Resistance,Flight,Force Fields,Gliding,Gravity Control,Grim Reaping,Hair Manipulation,Heat Generation,Heat Resistance,Hyperkinesis,Hypnokinesis,Illumination,Illusions,Immortality,Insanity,Intangibility,Intelligence,Intuitive aptitude,Invisibility,Invulnerability,Jump,Lantern Power Ring,Levitation,Light Control,Longevity,Magic,Magic Resistance,Magnetism,Marksmanship,Matter Absorption,Melting,Mind Blast,Mind Control,Mind Control Resistance,Molecular Combustion,Molecular Dissipation,Molecular Manipulation,Natural Armor,Natural Weapons,Nova Force,Odin Force,Omnilingualism,Omnipotent,Omnipresent,Omniscient,Omnitrix,Peak Human Condition,Phasing,Phoenix Force,Photographic Reflexes,Plant Control,Portal Creation,Possession,Power Absorption,Power Augmentation,Power Cosmic,Power Nullifier,Power Sense,Power Suit,Precognition,Probability Manipulation,Projection,Psionic Powers,Qwardian Power Ring,Radar Sense,Radiation Absorption,Radiation Control,Radiation Immunity,Reality Warping,Reflexes,Regeneration,Resurrection,Seismic Power,Self-Sustenance,Shapeshifting,Size Changing,Sonar,Sonic Scream,Spatial Awareness,Speed Force,Stamina,Stealth,Sub-Mariner,Substance Secretion,Summoning,Super Breath,Super Speed,Super Strength,Symbiote Costume,Technopath/Cyberpath,Telekinesis,Telepathy,Telepathy Resistance,Teleportation,Terrakinesis,The Force,Thirstokinesis,Time Manipulation,Time Travel,Toxin and Disease Control,Toxin and Disease Resistance,Underwater breathing,Vision - Cryo,Vision - Heat,Vision - Infrared,Vision - Microscopic,Vision - Night,Vision - Telescopic,Vision - Thermal,Vision - X-Ray,Vitakinesis,Wallcrawling,Water Control,Weapon-based Powers,Weapons Master,Weather Control,Web Creation,Wind Control
0,0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0,441.0,A-Bomb,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0,65.0,Abe Sapien,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
2,2,Male,Ungaran,good,No Hair,blue,red,Abin Sur,DC Comics,185.0,90.0,Abin Sur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,3,Male,Human / Radiation,bad,No Hair,green,Unknown,Abomination,Marvel Comics,203.0,441.0,Abomination,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,4,Male,Human,bad,No Hair,blue,Unknown,Absorbing Man,Marvel Comics,193.0,122.0,Absorbing Man,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
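
The reloaded frame above carries two columns the assignment does not ask for: `Unnamed: 0`, which is the old index written out by `to_csv`, and `hero_names`, which duplicates `Hero` after the merge. A possible cleanup is sketched below on toy data. The frames and the in-memory `buffer` (standing in for the CSV file on disk) are assumptions for illustration only.

```python
import io
import pandas as pd

# Toy frames (hypothetical subset of columns)
info = pd.DataFrame({"Hero": ["A-Bomb", "Abin Sur"],
                     "Weight": [441.0, 90.0]})
powers = pd.DataFrame({"hero_names": ["A-Bomb", "Abin Sur"],
                       "Durability": [1, 0]})

# Merge on the hero name, then drop the redundant key column
merged = (pd.merge(info, powers, left_on="Hero", right_on="hero_names")
            .drop(columns="hero_names"))

# index=False keeps the row index out of the CSV,
# so no 'Unnamed: 0' column appears on reload
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(reloaded.columns.tolist())
```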


### **II. Use your combined DataFrame to answer the following questions.**

In [24]:
# Compare the average weight of heroes without (0) and with (1) Super Speed
average_weight = df_final.groupby('Super Speed')['Weight'].mean()
average_weight

Super Speed
0    101.773585
1    129.404040
Name: Weight, dtype: float64
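
In the output above, heroes with Super Speed are on average roughly 27.6 kg heavier (129.40 kg versus 101.77 kg). A mean alone can be pulled around by a few very heavy heroes, so it may help to look at the median and group size as well. The sketch below uses made-up toy rows (the `Super Speed` flags are hypothetical) to show one way to do that.

```python
import pandas as pd

# Toy data: the Super Speed flags are hypothetical
toy = pd.DataFrame({"Super Speed": [0, 0, 1, 1],
                    "Weight": [65.0, 90.0, 122.0, 441.0]})

# Mean, median and group size for each Super Speed value
summary = toy.groupby("Super Speed")["Weight"].agg(["mean", "median", "count"])

# Difference in mean weight: with Super Speed minus without
difference = summary.loc[1, "mean"] - summary.loc[0, "mean"]

print(summary)
print(difference)
```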

In [21]:
# Calculate average height for each publisher
average_height_by_publisher = df_final.groupby('Publisher')['Height'].mean()

print(average_height_by_publisher)

Publisher
DC Comics            181.923913
Dark Horse Comics    176.909091
George Lucas         159.600000
Image Comics         211.000000
Marvel Comics        191.546128
Shueisha             171.500000
Star Trek            181.500000
Team Epic TV         180.750000
Unknown              178.000000
Name: Height, dtype: float64
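
Some publishers in the result above may be represented by only a handful of heroes, which makes their averages less reliable. One way to check, sketched here on four rows taken from the head of the data, is to report the group size next to the mean and sort the result.

```python
import pandas as pd

toy = pd.DataFrame({
    "Publisher": ["Marvel Comics", "Marvel Comics",
                  "DC Comics", "Dark Horse Comics"],
    "Height": [203.0, 193.0, 185.0, 191.0],
})

# Mean height and number of heroes per publisher, tallest first
by_publisher = (toy.groupby("Publisher")["Height"]
                   .agg(["mean", "count"])
                   .sort_values("mean", ascending=False))

print(by_publisher)
```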
