# How healthy are our Heroes?

![Superheroes](http://images.wookmark.com/91478_dc-marvel-superheroes.jpg)
[Source](http://worldsuperheros.blogspot.com/2013/02/dc-super-heroes.html)<br>
We all enjoy the adventures of the Caped Crusader or the mystic arts of Doctor Strange. They help us relax and entertain ourselves. Thanks to [ClaudioDavi](https://www.kaggle.com/claudiodavi), we have a dataset of the attributes of various superheroes from various publishers (Marvel Comics, DC Comics, Dark Horse Comics, etc.). So, I think it is time we see how healthy our heroes are.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.offline as py
import plotly.figure_factory as ff
py.init_notebook_mode(connected=True)
print(os.listdir("../input"))
import warnings
warnings.filterwarnings('ignore')

In [None]:
hero_info = pd.read_csv('../input/heroes_information.csv')
hero_info.head(10)

Here we see how the dataset is structured. Notice that some records have a value of **-99** in the Height and Weight columns.


In [None]:
hero_info[(hero_info['Weight'] < 0)].head(10)

I picked some heroes with such values in their weights or heights and looked up these attributes on the [FANDOM](http://www.wikia.com/fandom) sites of the corresponding publishers. I found that the weights or heights of these heroes are either not recorded or variable. This is understandable, as a lot of the characters are cosmic or mystical entities and/or can modify their physical appearance.
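Rather than discarding these heroes outright (the notebook later keeps only positive weights and heights), another option is to treat the placeholder as missing so it cannot drag averages down. A minimal plain-Python sketch, assuming that any negative height or weight (including -99) is a placeholder; the heights below are made up. In pandas the analogous step would be `hero_info.replace(-99.0, np.nan)`.

```python
import math

def clean_measurements(values):
    """Replace placeholder values with NaN.

    Assumption: heights and weights can never be negative, so any negative
    entry (such as the dataset's -99) is treated as "not recorded".
    """
    return [math.nan if v < 0 else float(v) for v in values]

def mean_ignoring_missing(values):
    """Average only the recorded (non-NaN) values; NaN if nothing is recorded."""
    known = [v for v in values if not math.isnan(v)]
    return sum(known) / len(known) if known else math.nan

heights = clean_measurements([203.0, -99.0, 191.0])  # hypothetical heights in cm
print(heights)                         # [203.0, nan, 191.0]
print(mean_ignoring_missing(heights))  # 197.0
```

This matters when a hero appears more than once with one real value and one -99: a plain average would land far below the real value.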

In [None]:
hero_pow = pd.read_csv('../input/super_hero_powers.csv')
hero_pow.head()

Here we have the data on the heroes' powers. Some of these are very specific, like the [Lantern Power Ring](http://dc.wikia.com/wiki/Green_Lantern_Ring) or the [Omnitrix](https://ben10.fandom.com/wiki/Omnitrix).

Let's convert the True/False values to numbers (most of the time, numbers are easier to handle).

In [None]:
hero_pow = hero_pow*1
hero_pow.head()

In [None]:
hero_info.shape[0] == hero_pow.shape[0]

The number of heroes differs between the two files. So, let's keep only the heroes present in both files and collect their attributes.
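The important detail is that the two files have to be matched on the hero's name rather than on row position, since row *i* of one file is not necessarily the same hero as row *i* of the other. Below is a minimal plain-Python sketch of that inner join; the toy records are hypothetical stand-ins for rows of `heroes_information.csv` and `super_hero_powers.csv`. In pandas, `pd.merge(hero_info, hero_pow, left_on='name', right_on='hero_names', how='inner')` performs the join, although it keeps one row per duplicate name instead of averaging.

```python
from collections import defaultdict

def join_by_name(info_rows, power_rows):
    """Inner-join hero attributes with hero powers on the hero's name.

    info_rows:  dicts with 'name', 'Weight', 'Height' (like heroes_information.csv)
    power_rows: dicts with 'hero_names' plus 0/1 power flags (like super_hero_powers.csv)
    Names that appear several times in info_rows get their weight/height averaged.
    """
    by_name = defaultdict(list)
    for row in info_rows:
        by_name[row['name']].append(row)

    joined = []
    for powers in power_rows:
        matches = by_name.get(powers['hero_names'])  # .get does not create empty entries
        if not matches:
            continue  # hero only appears in the powers file
        n = len(matches)
        joined.append({
            'Name': powers['hero_names'],
            'Weight': sum(m['Weight'] for m in matches) / n,
            'Height': sum(m['Height'] for m in matches) / n,
            # number of powers = sum of the 0/1 flags
            'Total Abilities': sum(v for k, v in powers.items() if k != 'hero_names'),
        })
    return joined

# Hypothetical toy records: 'A' is listed twice, 'C' has no physical attributes
info = [{'name': 'A', 'Weight': 80.0, 'Height': 180.0},
        {'name': 'A', 'Weight': 90.0, 'Height': 190.0},
        {'name': 'B', 'Weight': 60.0, 'Height': 170.0}]
powers = [{'hero_names': 'A', 'Agility': 1, 'Stamina': 0},
          {'hero_names': 'C', 'Agility': 1, 'Stamina': 1},
          {'hero_names': 'B', 'Agility': 1, 'Stamina': 1}]
for row in join_by_name(info, powers):
    print(row)
```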

In [None]:
names = []
weights = []
agility = []
stamina = []
total_abilities = []
gender = []
height = []
alignments = []
publisher = []
for name, agi, sta in zip(hero_pow['hero_names'], hero_pow['Agility'], hero_pow['Stamina']):
    # Look up the hero in hero_info by name: the two files are not row-aligned,
    # so Gender/Alignment/Publisher must come from the matching row, not the same position
    info = hero_info[hero_info['name'] == name]
    if info.shape[0] == 0:
        continue  # hero has powers listed but no physical attributes
    # Number of powers: sum of the 0/1 columns of the hero's first row (all but 'hero_names')
    abilities = hero_pow[hero_pow['hero_names'] == name].iloc[0, 1:].sum()
    names.append(name)
    total_abilities.append(abilities)
    # Some names appear more than once in hero_info; average their weights and heights
    weights.append(info['Weight'].mean())
    height.append(info['Height'].mean())
    agility.append('Agile' if agi == 1 else 'Not Agile')
    stamina.append('Has Stamina' if sta == 1 else 'No Stamina')
    gender.append(info['Gender'].iloc[0])
    alignments.append(info['Alignment'].iloc[0])
    publisher.append(info['Publisher'].iloc[0])

In [None]:
weights = np.array(weights)
height = np.array(height)
# Keep only heroes with a recorded (positive) weight and height; -99 marks missing values
mask = (weights > 0) & (height > 0)
filtered = pd.DataFrame({
    'Name': np.array(names)[mask],
    'Weight': weights[mask],
    'Agility': np.array(agility)[mask],
    'Stamina': np.array(stamina)[mask],
    'Total Abilities': np.array(total_abilities)[mask],
    'Gender': np.array(gender)[mask],
    'Height': height[mask],
    'Alignment': np.array(alignments)[mask],
    'Publisher': np.array(publisher)[mask],
})

In [None]:
filtered.head()

I was curious to know whether heavy characters are agile.

In [None]:
plt.figure(figsize=(20, 8))
# Keyword arguments are required by recent seaborn versions
sns.swarmplot(x='Agility', y='Weight', hue='Stamina', data=filtered, palette='Set2', dodge=True)

Turns out, many of them are.
The heaviest character in the dataset is also agile. Let's see who that is.

In [None]:
print(filtered['Name'][filtered['Weight']==max(filtered['Weight'])])

It's [Sasquatch](http://marvel.wikia.com/wiki/Sasquatch). Better not get on his bad side.

In the above plot, it is interesting to note that characters who are not agile don't have stamina.
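A swarm plot makes this easy to misjudge, so it is worth confirming with counts. In pandas, `pd.crosstab(filtered['Agility'], filtered['Stamina'])` tabulates every Agility/Stamina combination. The same idea in plain Python is a `Counter` over label pairs; the records below are hypothetical toy data that only reuse the notebook's label strings.

```python
from collections import Counter

def crosstab(rows, row_key, col_key):
    """Count how often each (row_key value, col_key value) pair occurs."""
    return Counter((r[row_key], r[col_key]) for r in rows)

# Hypothetical toy records, not the real dataset
heroes = [
    {'Agility': 'Agile', 'Stamina': 'Has Stamina'},
    {'Agility': 'Agile', 'Stamina': 'No Stamina'},
    {'Agility': 'Not Agile', 'Stamina': 'No Stamina'},
    {'Agility': 'Agile', 'Stamina': 'Has Stamina'},
]
counts = crosstab(heroes, 'Agility', 'Stamina')
# On the real data, the claim holds only if this cell is zero
print(counts[('Not Agile', 'Has Stamina')])
```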

Now, let's take a look at the gender distribution of the characters.

In [None]:
print('Unique Genders in Dataset: {}'.format(np.unique(filtered['Gender'])))

We see that **'-'** exists in the dataset for *Gender*. Let's take a look at the characters who have **-** for their gender.  

In [None]:
filtered[filtered['Gender']=='-']

Most of these characters are male, except for [Mockingbird](http://hero.wikia.com/wiki/Mockingbird) and [Goblin Queen](https://x-men.fandom.com/wiki/Goblin_Queen). So, I'll fill in their genders manually.

In [None]:
_females = ('Mockingbird', 'Goblin Queen')

def fill_missing(x):
    # Replace the '-' placeholder: the two known female characters get 'Female', the rest 'Male'
    if x['Gender'] == '-':
        return 'Female' if x['Name'] in _females else 'Male'
    return x['Gender']

filtered['Gender'] = filtered.apply(fill_missing, axis=1)

In [None]:
print('Unique Genders in Dataset: {}'.format(np.unique(filtered['Gender'])))

Now, let's see the number of heroes in each gender in [Marvel](https://www.marvel.com/) and [DC](https://www.dcentertainment.com/) comics.

In [None]:
sns.countplot(x='Gender', data=filtered[filtered['Publisher'] == 'Marvel Comics'])
plt.title('Gender Count - Marvel Comics')

In [None]:
sns.countplot(x='Gender', data=filtered[filtered['Publisher'] == 'DC Comics'])
plt.title('Gender Count - DC Comics')

Looks like both of these publishers have many more male than female characters and can improve on gender diversity.
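Raw counts are hard to compare when one publisher has more characters overall, so the share of female characters per publisher is a fairer yardstick. In pandas, something like `filtered[filtered['Gender'] == 'Female'].groupby('Publisher').size() / filtered.groupby('Publisher').size()` would compute it. A plain-Python sketch on hypothetical toy records:

```python
from collections import Counter

def female_share(rows):
    """Fraction of characters per publisher whose Gender is 'Female'."""
    totals = Counter(r['Publisher'] for r in rows)
    females = Counter(r['Publisher'] for r in rows if r['Gender'] == 'Female')
    return {pub: females[pub] / n for pub, n in totals.items()}

# Hypothetical toy records, not the real dataset
rows = [
    {'Publisher': 'Marvel Comics', 'Gender': 'Male'},
    {'Publisher': 'Marvel Comics', 'Gender': 'Female'},
    {'Publisher': 'Marvel Comics', 'Gender': 'Male'},
    {'Publisher': 'Marvel Comics', 'Gender': 'Male'},
    {'Publisher': 'DC Comics', 'Gender': 'Male'},
    {'Publisher': 'DC Comics', 'Gender': 'Female'},
]
print(female_share(rows))
```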

Now, let's take a look at how the height of the characters varies with their weight.

In [None]:
# jointplot creates its own figure, so no plt.figure() call is needed
sns.jointplot(x='Weight', y='Height', data=filtered, kind='reg')

Height and weight appear to have an almost linear relationship.
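To put a number on "almost linear", the usual summary is the Pearson correlation coefficient, which pandas computes with `filtered['Weight'].corr(filtered['Height'])`. Values near 1 indicate a strong positive linear relationship (a few extreme heroes can still pull it around). A from-scratch sketch of the formula, run on hypothetical toy values:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values: weight in kg, height in cm
weights = [55.0, 70.0, 85.0, 100.0]
heights = [160.0, 172.0, 181.0, 195.0]
print(round(pearson(weights, heights), 3))
```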

Now, for the part you're here for.

Let's start by calculating their [BMI (Body Mass Index)](https://en.wikipedia.org/wiki/Body_mass_index).

BMI is calculated as **Weight(in kg)/Height(in m)<sup>2</sup>**

Before we start calculating, we need to make sure that the data provided here is in the units we need (kg and m). How do we do that?

In [None]:
filtered.head()

I checked the attributes for Abomination: http://marvel.wikia.com/wiki/Emil_Blonsky_(Earth-616)

Comparing his listed attributes with the dataset, I concluded that weights are in *kg* and heights in *cm*.
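Because heights are stored in cm, they need to be divided by 100 before squaring. A small helper that does the conversion, using a hypothetical 70 kg, 175 cm character as a worked example (70 / 1.75² ≈ 22.86):

```python
def bmi(weight_kg, height_cm):
    """Body Mass Index from weight in kg and height in cm."""
    height_m = height_cm / 100  # the dataset stores heights in cm
    return weight_kg / height_m ** 2

print(round(bmi(70, 175), 2))  # 22.86
```

Forgetting the conversion would shrink every BMI by a factor of 10,000, which is an easy mistake to spot but worth guarding against.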

Let's proceed to calculate the BMI.

In [None]:
# Convert height from cm to m before squaring
filtered['BMI'] = filtered['Weight'] / (filtered['Height'] / 100) ** 2

Let's see how the BMI is distributed among the good guys.

In [None]:
# Leave out extreme outliers (BMI >= 80) before plotting
good = filtered[(filtered['Alignment'] == 'good') & (filtered['BMI'] < 80)]
fig = ff.create_distplot([good['BMI'][good['Gender'] == 'Male'],
                          good['BMI'][good['Gender'] == 'Female']],
                         ['BMI - Good, Male', 'BMI - Good, Female'])
fig['layout'].update(title='Distribution of BMI - Good', xaxis=dict(title='BMI'))
py.iplot(fig, filename='Basic Distplot')

I got the table below from the [CDC](https://www.cdc.gov/healthyweight/assessing/bmi/adult_bmi/index.html).

|BMI|Weight Status|
|-|-|
|Below 18.5|Underweight|
|18.5 – 24.9|Normal or Healthy Weight|
|25.0 – 29.9|Overweight|
|30.0 and Above|Obese|
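The table translates directly into a threshold function. This sketch treats each boundary as the start of the next band (so 25.0 counts as Overweight), which matches the table's ranges and also places in-between values such as 24.95 in the Normal band. On the notebook's DataFrame it could be applied with `filtered['BMI'].apply(bmi_category)`.

```python
def bmi_category(bmi):
    """Map an adult BMI value to the CDC weight-status labels in the table above."""
    if bmi < 18.5:
        return 'Underweight'
    if bmi < 25.0:
        return 'Normal or Healthy Weight'
    if bmi < 30.0:
        return 'Overweight'
    return 'Obese'

for value in (17.0, 22.9, 27.3, 31.4):
    print(value, bmi_category(value))
```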

Based on the table, we see that most of the good characters fall into the **Normal or Healthy Weight** range, many of them only by a tiny margin.

Now, let's look at the bad guys.

In [None]:
bad = filtered[filtered['Alignment'] == 'bad']
fig = ff.create_distplot([bad['BMI'][bad['Gender'] == 'Male'],
                          bad['BMI'][bad['Gender'] == 'Female']],
                         ['BMI - Bad, Male', 'BMI - Bad, Female'])
fig['layout'].update(title='Distribution of BMI - Bad', xaxis=dict(title='BMI'))
py.iplot(fig, filename='Basic Distplot')

It's almost the same here.

Just for fun, let's see who has the 10 highest BMIs.

In [None]:
filtered.sort_values(['BMI'], ascending=False).head(10)

![Utgard-Loki](https://vignette.wikia.nocookie.net/marveldatabase/images/6/6d/Utgard-Loki_%28Earth-616%29_from_Thor_Vol_4_3_001.jpg)
The winner is [Utgard-Loki](http://marvel.wikia.com/wiki/Utgard-Loki_(Earth-616)). He is a cosmic being, so being overweight probably doesn't matter much to him.

This visualization and analysis shows that most of the characters have a healthy BMI (at least by human standards).