# Pandas Tutorial
https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html

## What kind of data does pandas handle?
https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html

To start using pandas, the convention is to import it `as pd`.

```python
import pandas as pd
```

Pandas data table representation:

![pandas DataFrame: a table of rows and columns](001-dataframe.png)

I want to store passenger data of the Titanic. For each passenger, I know the name (characters), age (integers), and sex (male/female).


```python
df = pd.DataFrame(
    {
        "Name": [
            "Braud, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnel, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)
```

```python
# print the DataFrame
df
```

|   | Name | Age | Sex |
|---|---|---|---|
| 0 | Braud, Mr. Owen Harris | 22 | male |
| 1 | Allen, Mr. William Henry | 35 | male |
| 2 | Bonnel, Miss. Elizabeth | 58 | female |


To manually store data in a table, create a `DataFrame`. When using a Python dictionary of lists, the dictionary keys will be used as column headers and the values in each list as columns of the `DataFrame`.
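The sketch below rebuilds the same table and inspects its column labels and shape. It is a quick check of how the dictionary maps onto columns. The variable name `data` is illustrative only.

```python
import pandas as pd

# Same Titanic passenger data as above, kept in a separate dict (illustrative name)
data = {
    "Name": [
        "Braud, Mr. Owen Harris",
        "Allen, Mr. William Henry",
        "Bonnel, Miss. Elizabeth",
    ],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
}
df = pd.DataFrame(data)

# The dictionary keys become the column labels, in insertion order
print(list(df.columns))  # ['Name', 'Age', 'Sex']

# Each list becomes one column: 3 values per list -> 3 rows, 3 keys -> 3 columns
print(df.shape)  # (3, 3)
```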

+ A `DataFrame` is a 2-dimensional data structure that can store data of different types (such as characters, integers, and floating point values) in columns. It is similar to a spreadsheet.
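One way to see that each column keeps its own type is to print `df.dtypes` or to test columns with the helpers in `pandas.api.types`. The dtype name shown for text columns depends on the pandas version: `object` in older releases, a dedicated string dtype in newer ones. For that reason, this sketch checks types with predicate functions rather than comparing dtype names.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype, is_string_dtype

df = pd.DataFrame(
    {
        "Name": ["Braud, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnel, Miss. Elizabeth"],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# One dtype per column; text columns show as object or str depending on the pandas version
print(df.dtypes)

# Age holds integers, while Name holds text (and is therefore not numeric)
age_is_integer = is_integer_dtype(df["Age"])
name_is_text = is_string_dtype(df["Name"])
name_is_numeric = is_numeric_dtype(df["Name"])
print(age_is_integer, name_is_text, name_is_numeric)  # True True False
```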

Each column in a `DataFrame` is a `Series`.

![a single DataFrame column is a Series](002-series.png)
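To confirm that every column is a `Series`, you can loop over the columns with `DataFrame.items()`, which yields `(column label, Series)` pairs. Here is a minimal sketch using the same table:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Braud, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnel, Miss. Elizabeth"],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# items() yields each column label together with that column as a Series
for label, column in df.items():
    print(label, type(column).__name__, column.name == label)
# Name Series True
# Age Series True
# Sex Series True
```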


I'm only interested in working with the data in the column `Age`. When selecting a single column of a pandas `DataFrame`, the result is a pandas `Series`.

To select the column, put the column label between square brackets `[]`.

```python
df["Age"]
```

```
0    22
1    35
2    58
Name: Age, dtype: int64
```
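The output above ends with `Name: Age`. The selected `Series` carries the column label as its `name`, and it keeps the row labels (the index) of the `DataFrame`. This short sketch checks both:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Braud, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnel, Miss. Elizabeth"],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

age = df["Age"]

# The column label becomes the Series name
print(age.name)  # Age

# The Series shares the DataFrame's row labels
print(list(age.index))  # [0, 1, 2]
print(age.index.equals(df.index))  # True
```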

You can create a `Series` from scratch as well. A pandas `Series` has no column labels, as it is just a single column of a `DataFrame`.

A `Series` does have row labels.
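By default, those row labels are the integers `0, 1, 2, ...`. You can also pass your own labels with the `index` parameter. The passenger surnames used below are only an illustrative choice of labels.

```python
import pandas as pd

# Default row labels: 0, 1, 2
ages = pd.Series([22, 35, 58], name="Age")
print(list(ages.index))  # [0, 1, 2]

# Custom row labels (illustrative: passenger surnames)
ages_by_name = pd.Series([22, 35, 58], index=["Braud", "Allen", "Bonnel"], name="Age")

# Select a value by its row label
print(ages_by_name["Allen"])  # 35
```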

```python
ages = pd.Series([22, 35, 58], name="Age")
ages
```

```
0    22
1    35
2    58
Name: Age, dtype: int64
```

## Do Something with a DataFrame or Series
I want to know the maximum age of the passengers. We can do this using the `DataFrame` or the `Series`.


```python
# Using the DataFrame
df["Age"].max()
```

```
58
```

```python
# Using the Series
ages.max()
```

```
58
```
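`df["Age"].max()` first selects the `Age` column, so the method actually runs on a `Series`. If you call `max()` on the whole `DataFrame` instead, you get one result per column. With `numeric_only=True`, the text columns `Name` and `Sex` are skipped. Here is a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Braud, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnel, Miss. Elizabeth"],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# One maximum per numeric column, returned as a Series indexed by column label
col_max = df.max(numeric_only=True)
print(col_max)

# Same value as selecting the column first
print(col_max["Age"])  # 58
```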

As illustrated by the `max()` method, you can *do* things with a `DataFrame` or a `Series`.
pandas provides a lot of functionality, and each piece of it is a *method* you can apply to a `DataFrame` or `Series`.
As methods are functions, do not forget to use parentheses `()`.
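The parentheses matter because, without them, you get the method object itself rather than its result:

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

# Without parentheses: a bound method, not a number
without_call = ages.max
print(callable(without_call))  # True

# With parentheses: the method runs and returns the maximum value
with_call = ages.max()
print(with_call)  # 58
```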

Suppose you are also interested in some basic statistics of the numerical data in the table.
The `describe()` method provides a quick overview of the numerical data in a `DataFrame`. As the `Name` and `Sex` columns contain textual data, this method ignores them by default.

```python
df.describe()
```

|   | Age |
|---|---|
| count | 3.0 |
| mean | 38.333333 |
| std | 18.230012 |
| min | 22.0 |
| 25% | 28.5 |
| 50% | 35.0 |
| 75% | 46.5 |
| max | 58.0 |
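Each row of the `describe()` output matches a method you can also call directly on the `Age` column. The sketch below recomputes several of them. It also passes `include="all"` so that the text columns are summarized too. Those columns get rows such as `unique` and `top` instead of numeric statistics.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Braud, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnel, Miss. Elizabeth"],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)
age = df["Age"]

# The same statistics describe() reports, computed one at a time
print(age.count())            # 3
print(round(age.mean(), 6))   # 38.333333
print(round(age.std(), 6))    # 18.230012 (sample standard deviation)
print(age.quantile(0.25))     # 28.5
print(age.median())           # 35.0
print(age.quantile(0.75))     # 46.5

# include="all" adds the text columns Name and Sex to the summary
summary = df.describe(include="all")
print(list(summary.columns))  # ['Name', 'Age', 'Sex']
```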
