In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from scipy import stats
%matplotlib inline

In [2]:
data = pd.read_csv('Data/Superhero/charcters_stats.csv')

In [3]:
data.head(25)

Unnamed: 0,Name,Alignment,Intelligence,Strength,Speed,Durability,Power,Combat,Total
0,3-D Man,good,50,31,43,32,25,52,233
1,A-Bomb,good,38,100,17,80,17,64,316
2,Abe Sapien,good,88,14,35,42,35,85,299
3,Abin Sur,good,50,90,53,64,84,65,406
4,Abomination,bad,63,80,53,90,55,95,436
5,Abraxas,bad,88,100,83,99,100,56,526
6,Adam Monroe,good,63,10,12,100,71,64,320
7,Adam Strange,good,1,1,1,1,0,1,5
8,Agent 13,good,1,1,1,1,0,1,5
9,Agent Bob,good,10,8,13,5,5,20,61


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 611 entries, 0 to 610
Data columns (total 9 columns):
Name            611 non-null object
Alignment       608 non-null object
Intelligence    611 non-null int64
Strength        611 non-null int64
Speed           611 non-null int64
Durability      611 non-null int64
Power           611 non-null int64
Combat          611 non-null int64
Total           611 non-null int64
dtypes: int64(7), object(2)
memory usage: 43.0+ KB


It looks like two scales are used for the data: a binary (0/1) scale and a 0 to 100 scale. For characters whose skills were entered on the binary scale, the highest possible total is 6, so that threshold is used to split the dataset.

In [5]:
data_s100 = data[data['Total']>6]
data_binary = data[data['Total']<7]

In [6]:
data_s100.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 434 entries, 0 to 610
Data columns (total 9 columns):
Name            434 non-null object
Alignment       432 non-null object
Intelligence    434 non-null int64
Strength        434 non-null int64
Speed           434 non-null int64
Durability      434 non-null int64
Power           434 non-null int64
Combat          434 non-null int64
Total           434 non-null int64
dtypes: int64(7), object(2)
memory usage: 33.9+ KB


In [7]:
data_binary.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 177 entries, 7 to 605
Data columns (total 9 columns):
Name            177 non-null object
Alignment       176 non-null object
Intelligence    177 non-null int64
Strength        177 non-null int64
Speed           177 non-null int64
Durability      177 non-null int64
Power           177 non-null int64
Combat          177 non-null int64
Total           177 non-null int64
dtypes: int64(7), object(2)
memory usage: 13.8+ KB


The dataset splits 434 to 177, with the majority on the 100 scale.
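As a cross-check on the `Total`-based split, one could instead flag a row as binary when none of its six stats exceeds 1. The sketch below runs that check on the ten preview rows shown earlier, retyped by hand; the names `preview`, `stat_cols`, `is_binary` and `same_split` are illustrative and not part of the original notebook.

```python
import pandas as pd

stat_cols = ['Intelligence', 'Strength', 'Speed', 'Durability', 'Power', 'Combat']

# The ten preview rows from data.head(), retyped by hand (not the full dataset)
preview = pd.DataFrame(
    [
        ['3-D Man', 'good', 50, 31, 43, 32, 25, 52, 233],
        ['A-Bomb', 'good', 38, 100, 17, 80, 17, 64, 316],
        ['Abe Sapien', 'good', 88, 14, 35, 42, 35, 85, 299],
        ['Abin Sur', 'good', 50, 90, 53, 64, 84, 65, 406],
        ['Abomination', 'bad', 63, 80, 53, 90, 55, 95, 436],
        ['Abraxas', 'bad', 88, 100, 83, 99, 100, 56, 526],
        ['Adam Monroe', 'good', 63, 10, 12, 100, 71, 64, 320],
        ['Adam Strange', 'good', 1, 1, 1, 1, 0, 1, 5],
        ['Agent 13', 'good', 1, 1, 1, 1, 0, 1, 5],
        ['Agent Bob', 'good', 10, 8, 13, 5, 5, 20, 61],
    ],
    columns=['Name', 'Alignment'] + stat_cols + ['Total'],
)

# A row is treated as binary if no individual stat exceeds 1
is_binary = preview[stat_cols].max(axis=1) <= 1

# Compare with the Total-based rule used in the notebook (Total < 7)
same_split = (is_binary == (preview['Total'] < 7)).all()

print(preview.loc[is_binary, 'Name'].tolist())
print(same_split)
```

On these ten rows both rules pick out the same two characters. Whether they agree on the full 611 rows would need to be checked against the real file.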

#### After looking at the data, I raise the question: who's smarter, good or evil?

My hypothesis is that the good guys are smarter.

Therefore my null hypothesis is that there is no difference in intelligence between the good and the evil characters.

To look at this, I first have to break down the datasets by the Alignment category.

In [8]:
data_s100.Alignment.value_counts()

good       299
bad        122
neutral     11
Name: Alignment, dtype: int64

In [9]:
data_binary.Alignment.value_counts()

good    133
bad      43
Name: Alignment, dtype: int64

#### The next question is what defines "Smart"

In [10]:
# 1) Using a cutoff value of intelligence, with a simple count. 
#    Only the characters with stats on the 100 scale will be used

# 80 intelligence cutoff

c80_data = data_s100[data_s100['Intelligence']>=80]

c80_data.Alignment.value_counts()

good       44
bad        41
neutral     1
Name: Alignment, dtype: int64

Based on this cutoff, it seems the good guys are smarter, 44 to 41.

In [19]:
# what about a 90 intelligence cutoff?

c90_data = data_s100[data_s100['Intelligence']>=90]

c90_data.Alignment.value_counts()

good       16
bad        14
neutral     1
Name: Alignment, dtype: int64

Looks like the good guys still have it 16 to 14

In [11]:
# what about the smartest of all, those with a 100 intelligence level?

c100_data = data_s100[data_s100['Intelligence']>=100]

c100_data.Alignment.value_counts()

good       15
bad        12
neutral     1
Name: Alignment, dtype: int64

And the good guys still have it 15 to 12.  

So cutoff levels at intelligence scores of 80, 90, and 100 all support the hypothesis that the good guys are smarter.

But this doesn't really leave us with an indisputable answer, especially since it is unclear whether beings with 100-level intelligence are all equally intelligent or not.

Also, the dataset contains more good guys than bad guys, so it makes sense that on raw counts the good guys would likely outnumber the bad.
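One way to account for the unequal group sizes is to divide each count by the size of its alignment group. The sketch below does this using only the counts printed in the cells above; the names `totals`, `counts` and `shares` are illustrative.

```python
import pandas as pd

# Group sizes on the 100 scale, copied from the value_counts() output above
totals = pd.Series({'good': 299, 'bad': 122, 'neutral': 11})

# Characters at or above each intelligence cutoff, copied from the outputs above
counts = pd.DataFrame({
    '>=80': {'good': 44, 'bad': 41, 'neutral': 1},
    '>=90': {'good': 16, 'bad': 14, 'neutral': 1},
    '>=100': {'good': 15, 'bad': 12, 'neutral': 1},
})

# Share of each alignment group that clears each cutoff
shares = counts.div(totals, axis=0)
print(shares.round(3))
```

Seen as shares, the picture flips: at the 80 cutoff roughly 15% of good characters qualify against roughly 34% of bad characters, and the bad group keeps the larger share at 90 and 100 as well.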

In [13]:
# 2) what if the average intelligence of all good guys vs. all bad guys was the measure?
# still using the 100 scale dataset

bad = data_s100[data_s100['Alignment']=='bad']
good = data_s100[data_s100['Alignment']=='good']


In [14]:
print(bad.Intelligence.mean())
print(good.Intelligence.mean())

67.29508196721312
60.23076923076923


Measured this way, the hypothesis is refuted: the bad characters average about 67.3 and the good characters about 60.2.
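The same comparison can be written as a single `groupby`, which also makes it easy to add the median and the group size next to the mean. This is a sketch on the eight 100-scale rows from the `data.head()` preview only, so its numbers differ from the full-dataset means above; `sample` and `summary` are illustrative names.

```python
import pandas as pd

# The eight 100-scale rows from the data.head() preview (not the full dataset)
sample = pd.DataFrame({
    'Name': ['3-D Man', 'A-Bomb', 'Abe Sapien', 'Abin Sur',
             'Abomination', 'Abraxas', 'Adam Monroe', 'Agent Bob'],
    'Alignment': ['good', 'good', 'good', 'good',
                  'bad', 'bad', 'good', 'good'],
    'Intelligence': [50, 38, 88, 50, 63, 88, 63, 10],
})

# Mean, median and group size per alignment in one call
summary = sample.groupby('Alignment')['Intelligence'].agg(['mean', 'median', 'count'])
print(summary)
```

Applied to `data_s100`, the same call would also report the neutral group without a third filter.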

#### T-Test

Use a t-test to test the null hypothesis at a 0.05 significance level.

In [18]:
stats.ttest_ind(good.Intelligence, bad.Intelligence)

Ttest_indResult(statistic=-3.1584658499782137, pvalue=0.0017008432719925284)

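The p-value (about 0.0017) is below 0.05, so the difference in mean intelligence between the two groups is statistically significant. The t-statistic is negative, though, which means the good characters' mean is lower than the bad characters'. The test therefore does not support the claim that the heroes are smarter; it agrees with the averages above, where the bad characters score higher.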
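When reading `ttest_ind` output, the sign of the statistic follows the order of the arguments: it is positive when the first sample has the larger mean. The sketch below illustrates this, along with Welch's variant (`equal_var=False`), which does not assume equal variances and may suit groups of different sizes. The scores are random placeholders, NOT the superhero data, and `good_scores`, `bad_scores` and `p_one_sided` are hypothetical names.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder scores only: random integers from 1 to 100, sized like the two groups
good_scores = rng.integers(1, 101, size=299)
bad_scores = rng.integers(1, 101, size=122)

# Welch's t-test: two-sided, no equal-variance assumption
result = stats.ttest_ind(good_scores, bad_scores, equal_var=False)
t_stat, p_two_sided = result.statistic, result.pvalue

# The statistic's sign matches the sign of (first mean - second mean)
mean_diff = good_scores.mean() - bad_scores.mean()

# One-sided p-value for the alternative "good mean > bad mean",
# derived from the two-sided result
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

print(t_stat, p_two_sided, p_one_sided, mean_diff)
```

With the real `good.Intelligence` and `bad.Intelligence` columns in place of the placeholders, a negative statistic would make the one-sided p-value for "good is smarter" large rather than small.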

#### Further Data

Possible further data would be an IQ test of every character, hopefully yielding some variation among those with 100-level intelligence ratings. Since 28 characters have a 100-level intelligence, it would be nice to get some differentiation between these characters.