# Jupyter notebook

Jupyter Notebook is a browser-based application that provides an interactive Python environment.

Let's get to know it with a few common commands.

1. Try editing the text in this cell: click on it and press `ENTER`.
2. When you are done, press `Ctrl-ENTER` to render it.

The `ESC` key switches to Command Mode. Try it: the border of the selected cell turns blue.

In Command Mode you can press `H` to access the help dialog with all the keyboard shortcuts.

Cells can contain text or code. Try running the code in the next cell by pressing `Shift-ENTER`.

In [None]:
2 + 2

Great! Now practice adding, deleting and navigating cells with the following commands (in Command Mode):

- A: insert cell above
- B: insert cell below
- Arrows: navigate up/down
- D, D (press `D` twice): delete cell

Try converting a cell from text to code and vice versa using these commands:

- Y: to code
- M: to markdown (text)

# Pandas Review

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

You can find it here: http://pandas.pydata.org/

And the documentation can be found here: http://pandas.pydata.org/pandas-docs/stable/

In this notebook we review some of its functionality.

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("../data/titanic-train.csv")
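The cell above reads the data from the relative path `../data/titanic-train.csv`. If that file is not at hand, a minimal sketch like the following parses a few rows with the same column names from an in-memory string, so the commands in this notebook can still be tried. The rows are invented for illustration and are not real passenger records.

```python
import io

import pandas as pd

# Hypothetical sample: invented rows that reuse the Titanic column names.
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A/5 0001,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,PC 0002,71.28,C85,C
3,1,3,"Poe, Miss. Ann",female,,0,0,STON 0003,7.92,,S
4,0,2,"Loe, Mr. Sam",male,71,0,0,0004,13.00,,Q
"""

# read_csv accepts any file-like object, not only a path on disk.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (4, 12)
```

Empty fields, such as the missing age in the third row, are read as `NaN`.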

## Quick exploration

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.describe()
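By default `describe()` summarizes only the numeric columns. Passing `include='all'` also reports `count`, `unique`, `top` and `freq` for text columns. A small sketch on invented data:

```python
import pandas as pd

# Invented toy data, not the Titanic file.
df = pd.DataFrame({
    "Age": [22.0, 38.0, None, 71.0],
    "Sex": ["male", "female", "female", "male"],
})

numeric_summary = df.describe()            # only the numeric column 'Age'
full_summary = df.describe(include="all")  # both 'Age' and 'Sex'
print(numeric_summary.columns.tolist())    # ['Age']
print(full_summary.columns.tolist())       # ['Age', 'Sex']
```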

## Categorical data

In [None]:
df = pd.read_csv("../data/titanic-train.csv",
                 dtype={'Pclass': 'category',
                        'Sex': 'category',
                        'Embarked': 'category'}
                )

In [None]:
df.head()

In [None]:
df.info()
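According to the pandas I/O documentation, when `read_csv` is given `dtype='category'` the categories are parsed as strings. That is why a later cell compares `Pclass` with the string `'3'` rather than the integer `3`. A minimal check on an invented two-column CSV:

```python
import io

import pandas as pd

# Invented rows; only the column names come from the Titanic file.
csv_text = "PassengerId,Pclass\n1,3\n2,1\n3,3\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"Pclass": "category"})

print(df["Pclass"].dtype)                  # category
print(list(df["Pclass"].cat.categories))   # string labels such as '1' and '3'
third_class = df[df["Pclass"] == "3"]      # compare against the string label
print(len(third_class))                    # 2
```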

## Indexing

In [None]:
df.iloc[3]

In [None]:
df.loc[0:4,'Ticket']

In [None]:
df['Ticket'].head()
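`iloc` selects by integer position and `loc` by label. One difference that often surprises: a `loc` slice includes its end label, so `df.loc[0:4, 'Ticket']` returns five rows, while the positional slice `iloc[0:4]` returns four. A sketch on an invented frame with the default integer index:

```python
import pandas as pd

# Invented tickets; the default index runs from 0 to 5.
df = pd.DataFrame({"Ticket": ["T0", "T1", "T2", "T3", "T4", "T5"]})

by_label = df.loc[0:4, "Ticket"]       # labels 0 through 4, end included
by_position = df["Ticket"].iloc[0:4]   # positions 0 through 3, end excluded
print(len(by_label), len(by_position))  # 5 4
```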

## Selections

In [None]:
df[df.Age > 70]

In [None]:
age = df['Age']
age.where(age > 30).head()
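`where` keeps the original shape and replaces values that fail the condition with `NaN`, whereas a boolean mask such as `age[age > 30]` drops them. Comparing the two on invented ages:

```python
import pandas as pd

age = pd.Series([22.0, 38.0, 26.0, 71.0])  # invented ages

kept_shape = age.where(age > 30)  # 22 and 26 become NaN, length stays 4
filtered = age[age > 30]          # only 38 and 71 remain
print(len(kept_shape), len(filtered))  # 4 2
print(kept_shape.isna().sum())         # 2
```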

In [None]:
df[(df['Age'] == 11) & (df['SibSp'] == 5)]

In [None]:
df[(df.Age == 11) | (df.SibSp == 5)]
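The element-wise operators `&` and `|` are needed here because Python's `and`/`or` try to turn a whole Series into a single boolean, which pandas refuses. The parentheses matter too: `&` and `|` bind more tightly than `==`, so each comparison must be wrapped. A small demonstration on invented rows:

```python
import pandas as pd

# Invented rows for illustration.
df = pd.DataFrame({"Age": [11, 11, 30], "SibSp": [5, 0, 5]})

both = df[(df["Age"] == 11) & (df["SibSp"] == 5)]    # row 0 only
either = df[(df["Age"] == 11) | (df["SibSp"] == 5)]  # rows 0, 1 and 2

try:
    df[(df["Age"] == 11) and (df["SibSp"] == 5)]
except ValueError as err:
    # pandas refuses to reduce a whole Series to one True/False value
    print("`and` fails:", err)

print(len(both), len(either))  # 1 3
```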

In [None]:
df.query('(Age == 11) & (SibSp == 5)')
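`query` takes the same condition as a string and returns the same rows as the boolean mask. Local Python variables can be referenced inside the string with an `@` prefix. A sketch with invented rows and a hypothetical variable `min_siblings`:

```python
import pandas as pd

df = pd.DataFrame({"Age": [11, 11, 30], "SibSp": [5, 0, 5]})  # invented rows

mask_result = df[(df["Age"] == 11) & (df["SibSp"] == 5)]
query_result = df.query("(Age == 11) & (SibSp == 5)")
print(mask_result.equals(query_result))  # True

min_siblings = 5  # hypothetical local variable used inside the query string
print(df.query("SibSp >= @min_siblings").index.tolist())  # [0, 2]
```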

In [None]:
df.sort_values('Age', ascending=False).head()
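`sort_values` places missing values last by default (`na_position='last'`), even with `ascending=False`, and it accepts a list of columns to break ties. On invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [22.0, None, 71.0, 22.0],
    "Fare": [7.25, 8.05, 13.0, 71.28],
})  # invented values

by_age = df.sort_values("Age", ascending=False)
print(by_age["Age"].tolist())  # [71.0, 22.0, 22.0, nan]

# Sort by Age descending, then by Fare descending to break the tie at 22.
by_age_fare = df.sort_values(["Age", "Fare"], ascending=[False, False])
print(by_age_fare.index.tolist())  # [2, 3, 0, 1]
```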

## Distinct elements

In [None]:
df['Embarked'].unique()
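`unique()` lists each distinct value once, including a missing value if one is present, while `nunique()` counts distinct values and skips `NaN` by default. A sketch with invented port codes:

```python
import pandas as pd

embarked = pd.Series(["S", "C", "S", None, "Q"])  # invented port codes

print(embarked.unique())                # 4 entries in order of first appearance, missing value included
print(embarked.nunique())               # 3, the missing value is not counted
print(embarked.nunique(dropna=False))   # 4
```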

## Group by

Pandas supports many SQL-like operations such as GROUP BY, ORDER BY and JOIN. In pandas they are called:

- `groupby`
- `sort_values`
- `merge`

In [None]:
# Find average age of passengers that survived vs. died
df.groupby('Survived')['Age'].mean()
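`groupby` can compute several statistics at once with `agg`. Note that `count` reports non-missing values only, so it can be smaller than the group size returned by `size()`. On invented rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Age": [22.0, 38.0, None, 71.0, 26.0],
})  # invented values

stats = df.groupby("Survived")["Age"].agg(["mean", "count"])
sizes = df.groupby("Survived").size()  # rows per group, missing ages included
print(stats)
print(sizes)
```

Here group `1` has three rows but only two known ages, so its `count` is 2 while its size is 3.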

In [None]:
df.sort_values('Age', ascending=False).head()

In [None]:
pd.merge(df[['PassengerId', 'Survived']],
         df[['PassengerId', 'Age']],
         on='PassengerId').head()
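`merge` performs an inner join by default. The `how` parameter selects other SQL join types, and `indicator=True` adds a `_merge` column showing where each row came from. A sketch with two invented tables whose keys only partly overlap:

```python
import pandas as pd

survival = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})
ages = pd.DataFrame({"PassengerId": [2, 3, 4], "Age": [38.0, 26.0, 35.0]})

inner = pd.merge(survival, ages, on="PassengerId")  # only ids present in both
left = pd.merge(survival, ages, on="PassengerId", how="left", indicator=True)
print(inner["PassengerId"].tolist())  # [2, 3]
print(left["_merge"].tolist())        # ['left_only', 'both', 'both']
```

In the left join, passenger 1 has no match in `ages`, so its `Age` is `NaN`.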

## Pivot Tables

Pandas also supports Excel-like functionality such as pivot tables.

In [None]:
df.pivot_table(index='Pclass', columns='Survived', values='PassengerId', aggfunc='count')
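For plain counts, `pd.crosstab` gives the same table as a `pivot_table` with `aggfunc='count'`, but fills empty combinations with 0 instead of `NaN`. Compared on invented rows, with `Pclass` as plain integers to keep the sketch short:

```python
import pandas as pd

df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Pclass": [1, 3, 3, 2, 1],
    "Survived": [1, 0, 1, 0, 1],
})  # invented values

pivot = df.pivot_table(index="Pclass", columns="Survived",
                       values="PassengerId", aggfunc="count")
cross = pd.crosstab(df["Pclass"], df["Survived"])
print(pivot)  # NaN where a class has no passengers with that outcome
print(cross)  # 0 in the same cells
```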

In [None]:
df['Pclass'].value_counts()
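`value_counts(normalize=True)` returns proportions instead of counts, which is handy for comparing class sizes. On invented classes:

```python
import pandas as pd

pclass = pd.Series([3, 1, 3, 2, 3])  # invented class labels

counts = pclass.value_counts()                # class 3 appears 3 times
shares = pclass.value_counts(normalize=True)  # class 3 -> 3 / 5 = 0.6
print(counts.loc[3], shares.loc[3])  # 3 0.6
```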

## Exercises

- select passengers who survived
- select passengers who embarked at port S
- select male passengers
- select passengers who paid a fare of less than 40,000 and traveled in third class
- locate the name of the passenger with ID 674
- calculate the average age of passengers using the `mean()` method
- count the number of passengers who survived and the number who died
- count the number of males and females
- count the number of survivors and deaths for each sex
- calculate the average fare paid by survivors and non-survivors in each class

In [None]:
df.query('Survived == 1').head()

In [None]:
df.query('Embarked == "S"').head()

In [None]:
df[df['Sex'] == 'male'].head()

In [None]:
# Pclass was read as a category with string labels, hence '3' rather than 3
df[(df.Fare < 40000) & (df.Pclass == '3')].head()

In [None]:
df.loc[df['PassengerId'] == 674, 'Name']

In [None]:
df['Age'].mean()
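`mean()` skips missing values by default (`skipna=True`), so passengers with an unknown age simply do not contribute to the average. With invented ages:

```python
import pandas as pd

age = pd.Series([10.0, None, 30.0])  # invented ages, one missing

print(age.mean())              # 20.0, the NaN is ignored
print(age.mean(skipna=False))  # nan
```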

In [None]:
df['Survived'].value_counts()

In [None]:
df['Sex'].value_counts()

In [None]:
df.pivot_table(index='Survived', columns='Sex', values='PassengerId', aggfunc='count')

In [None]:
df.pivot_table(index='Survived', columns='Pclass', values='Fare', aggfunc='mean')
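The same table can also be built with `groupby` followed by `unstack`, which moves the inner group level (`Pclass`) into the columns. A sketch checking that the two approaches agree on invented fares:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Survived": [0, 1, 0, 1, 1],
    "Pclass": [3, 1, 3, 3, 1],
    "Fare": [7.25, 71.28, 8.05, 7.92, 53.10],
})  # invented values

pivot = df.pivot_table(index="Survived", columns="Pclass",
                       values="Fare", aggfunc="mean")
grouped = df.groupby(["Survived", "Pclass"])["Fare"].mean().unstack()

# Both tables have NaN where a (Survived, Pclass) pair has no passengers.
print(np.allclose(pivot.to_numpy(), grouped.to_numpy(), equal_nan=True))
```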

*Copyright &copy; 2017 CATALIT LLC.  All rights reserved.*