<img src="https://github.com/danielscarvalho/data/blob/master/img/FIAP-logo.png?raw=True" style="float:right;" width="200px">
# DATA SCIENCE & STATISTICAL COMPUTING [》](https://www.fiap.com.br/)

## DataFrame & Python

Visualizing the Titanic data

Pandas Cheat Sheet: [PDF](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)

In [13]:
import pandas as pd

In [14]:
!ls 

'ls' is not recognized as an internal or external command,
operable program or batch file.
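
The `ls` shell command is not available in the Windows command prompt, which is why the cell above fails. A minimal, portable sketch for checking which CSV files sit in the working directory is shown here; the helper name `list_csv_files` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def list_csv_files(folder="."):
    """Return the sorted names of CSV files found directly inside `folder`."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))


# List the CSV files in the current working directory
csv_files = list_csv_files()
print(csv_files)

# pd.read_csv("titanic.csv") only works if this prints True
print("titanic.csv found:", "titanic.csv" in csv_files)
```

If `titanic.csv` is not listed, the file has to be copied into the notebook's folder, or `pd.read_csv` has to be given the full path to it.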


In [15]:
titanic_df = pd.read_csv("titanic.csv")

FileNotFoundError: [Errno 2] No such file or directory: 'titanic.csv'

In [None]:
titanic_df

In [None]:
titanic_df.sample(3).T

<pre>
VARIABLE DESCRIPTIONS:
survival        Survival
                (0 = No; 1 = Yes)
pclass          Passenger Class
                (1 = 1st; 2 = 2nd; 3 = 3rd)
name            Name
sex             Sex
age             Age
sibsp           Number of Siblings/Spouses Aboard
parch           Number of Parents/Children Aboard
ticket          Ticket Number
fare            Passenger Fare
cabin           Cabin
embarked        Port of Embarkation
                (C = Cherbourg; Q = Queenstown; S = Southampton)
</pre>

<pre>
SPECIAL NOTES:
Pclass is a proxy for socio-economic status (SES)
 1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower

Age is in Years; Fractional if Age less than One (1)
 If the Age is Estimated, it is in the form xx.5

With respect to the family relation variables (i.e. sibsp and parch)
some relations were ignored.  The following are the definitions used
for sibsp and parch.

Sibling:  Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic
Spouse:   Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiances Ignored)
Parent:   Mother or Father of Passenger Aboard Titanic
Child:    Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic

Other family relatives excluded from this study include cousins,
nephews/nieces, aunts/uncles, and in-laws.  Some children travelled
only with a nanny, therefore parch=0 for them.  As well, some
travelled with very close friends or neighbors in a village, however,
the definitions do not support such relations.
</pre>
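
The note on ages above can be turned into a small check. The sketch below flags ages of the form xx.5 as estimated and ages below one as infants. The sample values are hypothetical, and treating a value such as 0.5 as a real infant age rather than an estimate is an assumption, since the notes do not settle that case.

```python
import pandas as pd

# Hypothetical sample values, not taken from titanic.csv
ages = pd.Series([22.0, 28.5, 0.75, None, 40.5, 0.5], name="Age")

# Assumption: below one year a fractional value is an actual age,
# so only ages of one or more ending in .5 count as estimated.
is_estimated = (ages >= 1) & (ages % 1 == 0.5)
is_infant = ages < 1

# Missing ages compare as False in both masks
print(pd.DataFrame({"Age": ages, "Estimated": is_estimated, "Infant": is_infant}))
```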

In [9]:

import numpy as np
import matplotlib.pyplot as plt

# Set the global default size of matplotlib figures
plt.rc('figure', figsize=(10, 5))

# Size of matplotlib figures that contain subplots
figsize_with_subplots = (10, 10)

# Size of matplotlib histogram bins
bin_size = 10


In [10]:
titanic_df.dtypes

NameError: name 'titanic_df' is not defined

In [11]:
titanic_df.info()

NameError: name 'titanic_df' is not defined

In [None]:
titanic_df.head()

In [None]:
titanic_df.tail(5)

In [None]:
titanic_df.describe()

In [None]:
titanic_df['Survived'].value_counts().plot(kind='bar', 
                                         title='Death and Survival Counts')

In [None]:
titanic_df['Pclass'].value_counts().plot(kind='bar', 
                                       title='Passenger Class Counts')

In [None]:
titanic_df['Sex'].value_counts().plot(kind='bar', 
                                    title='Gender Counts')
plt.xticks(rotation=0)

In [None]:
titanic_df['Embarked'].value_counts().plot(kind='bar', 
                                         title='Ports of Embarkation Counts')

In [3]:
titanic_df['Age'].hist()
plt.title('Age Histogram')

In [None]:
pclass_xt_df = pd.crosstab(titanic_df['Pclass'], titanic_df['Survived'])
pclass_xt_df

In [None]:
# pclass_xt_df holds passenger counts, so label the chart as counts
pclass_xt_df.plot(kind='bar', 
                   stacked=True, 
                   title='Survival Counts by Passenger Class')
plt.xlabel('Passenger Class')
plt.ylabel('Number of Passengers')

In [None]:
# Map each distinct value of 'Sex' to an integer code, in alphabetical order
sexes_df = sorted(titanic_df['Sex'].unique())
genders_mapping_df = dict(zip(sexes_df, range(len(sexes_df))))
genders_mapping_df

In [None]:
titanic_df['Sex_Val'] = titanic_df['Sex'].map(genders_mapping_df).astype(int)
titanic_df.head()

In [None]:
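# Count passengers by gender code and survival
sex_val_xt_df = pd.crosstab(titanic_df['Sex_Val'], titanic_df['Survived'])
# Divide each row by its total so the bars show proportions instead of counts
sex_val_xt_pct_df = sex_val_xt_df.div(sex_val_xt_df.sum(axis=1).astype(float), axis=0)
sex_val_xt_pct_df.plot(kind='bar', stacked=True, title='Survival Rate by Gender')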
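
The row-wise division above can also be requested from `pd.crosstab` directly through its `normalize` parameter. A minimal sketch on hypothetical toy data (the name `toy_df` and its values are illustrative, not from `titanic.csv`):

```python
import pandas as pd

# Hypothetical toy data; the notebook itself would use titanic_df
toy_df = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Survived": [1, 1, 0, 0, 1, 0],
})

# normalize="index" divides each row by its total, so every row sums to 1
rate_by_sex = pd.crosstab(toy_df["Sex"], toy_df["Survived"], normalize="index")
print(rate_by_sex)
```

Both approaches should give the same proportions; `normalize="index"` just saves the intermediate table of counts.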

In [None]:
passenger_classes_df = sorted(titanic_df['Pclass'].unique())

for p_class in passenger_classes_df:
    # Number of male and female passengers in each class
    print('M: ', p_class, len(titanic_df[(titanic_df['Sex'] == 'male') & 
                            (titanic_df['Pclass'] == p_class)]))
    print('F: ', p_class, len(titanic_df[(titanic_df['Sex'] == 'female') & 
                            (titanic_df['Pclass'] == p_class)]))

In [None]:
for pclass in passenger_classes_df:
    # 'AgeFill' is never created in this notebook, so plot the known ages instead
    titanic_df.loc[titanic_df['Pclass'] == pclass, 'Age'].dropna().plot(kind='kde')
    
plt.title('Age Density Plot by Passenger Class')
plt.xlabel('Age')
plt.legend(('1st Class', '2nd Class', '3rd Class'), loc='best')
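
Density plots need ages without gaps, and this notebook never builds a filled age column. One possible way to create one is sketched below, assuming missing ages are replaced by the median age of passengers with the same sex and class. The column name `AgeFill`, the toy data, and the choice of a group median are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data with two missing ages
toy_df = pd.DataFrame({
    "Sex": ["male", "male", "male", "female", "female", "female"],
    "Pclass": [3, 3, 3, 1, 1, 1],
    "Age": [20.0, None, 30.0, 40.0, 50.0, None],
})

# Median age of each (Sex, Pclass) group, aligned row by row with toy_df
group_median = toy_df.groupby(["Sex", "Pclass"])["Age"].transform("median")

# Keep known ages and fill the gaps with the group's median
toy_df["AgeFill"] = toy_df["Age"].fillna(group_median)
print(toy_df)
```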

In [None]:
titanic_df['FamilySize'] = titanic_df['SibSp'] + titanic_df['Parch']
titanic_df.head()

In [None]:
titanic_df['FamilySize'].hist()
plt.title('Family Size Histogram')

In [None]:
family_sizes_df = sorted(titanic_df['FamilySize'].unique())
family_size_max_df = max(family_sizes_df)

df1 = titanic_df[titanic_df['Survived'] == 0]['FamilySize']
df2 = titanic_df[titanic_df['Survived'] == 1]['FamilySize']

plt.hist([df1, df2], 
         bins=family_size_max_df + 1, 
         range=(0, family_size_max_df), 
         stacked=True)

plt.legend(('Died', 'Survived'), loc='best')
plt.title('Survivors by Family Size')

LAB: Now let's add ourselves to the Titanic data and analyze the results

https://drive.google.com/file/d/11Wtlwj4e4duVE1VR-_fhKVKdCMhCpT-W/view?usp=sharing
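
One way to add new passengers to a DataFrame is `pd.concat`. The sketch below uses a hypothetical stand-in for `titanic_df` with only a few of its columns, and every value in it is a placeholder.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df with a subset of its columns
toy_df = pd.DataFrame({
    "Name": ["Passenger, Example A", "Passenger, Example B"],
    "Sex": ["male", "female"],
    "Age": [30.0, 25.0],
    "Pclass": [3, 1],
    "SibSp": [0, 1],
    "Parch": [0, 0],
})

# One dictionary per person to add; the values are placeholders
new_rows = pd.DataFrame([
    {"Name": "Student, Example", "Sex": "female", "Age": 21.0,
     "Pclass": 2, "SibSp": 0, "Parch": 0},
])

# ignore_index=True renumbers the rows from 0 after appending
extended_df = pd.concat([toy_df, new_rows], ignore_index=True)
print(extended_df.tail(2))
```

After the new rows are in place, the earlier cells can be rerun on the extended DataFrame to see how the charts change.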


To learn more, see:

https://github.com/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb

References:

- https://pandas.pydata.org/docs/
- https://matplotlib.org/stable/index.html

Create a GitHub account and install git.

Publish the notebook on GitHub.

![titanic.jpg](attachment:92a590cb-c0e4-438f-8774-bd64ef4067ff.jpg)

<p style="background-color:brown; color:white; padding:10px;">
Students found playing Tetris or other games in the LAB during class will be invited to implement an automated Tetris "player" using Reinforcement Learning, for a grade<br><br>
    Building an AI to MASTER Tetris<br>
    https://www.youtube.com/watch?v=1yXBNKubb2o&ab_channel=GreerViau
</p>
