# Data in Pandas

To start working with pandas we must import the pandas package:

In [0]:
import pandas as pd

The community-agreed alias for pandas is `pd`, so importing pandas as `pd` is standard practice throughout the pandas documentation. It's a good idea to get into the habit of using this convention!

# Pandas data table representation

![Dataframe](https://drive.google.com/uc?id=1_0mStBas65K9mTmdDwAyeM30DBO7WHKA)

Pandas stores data in a table, much like spreadsheet applications such as Excel. The data is arranged in rows and columns.

At the **top**, we have column labels.

To the **left** we have row labels.

# Store data in a Dataframe

We'll be working with data from the Titanic dataset for the next few sections.

To manually store data in a table, create a `DataFrame`. When you pass a Python dictionary of lists, the dictionary keys become the column headers and the values in each list fill that column, one value per row.

In [0]:
df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris",
                            "Allen, Mr. William Henry",
                            "Bonnell, Miss. Elizabeth"],
                  "Age": [22, 35, 58],
                  "Sex": ["male", "male", "female"]}
                  )
df

| | Name | Age | Sex |
|---|---|---|---|
| 0 | Braund, Mr. Owen Harris | 22 | male |
| 1 | Allen, Mr. William Henry | 35 | male |
| 2 | Bonnell, Miss. Elizabeth | 58 | female |


A `DataFrame` is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns.

- The table has 3 columns, each of them with a column label. The column labels are respectively *Name*, *Age* and *Sex*.
- The column *Name* holds textual data, with each value a string; the column *Age* holds numbers; and the column *Sex* holds textual data.
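The labels and column types described above can be inspected directly. The following is a minimal sketch that rebuilds the same three-row table so it runs on its own; the variable names `column_labels`, `row_labels`, `n_rows` and `n_cols` are only illustrative.

```python
import pandas as pd

# Same three-passenger table as above, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris",
             "Allen, Mr. William Henry",
             "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
})

column_labels = list(df.columns)  # labels across the top of the table
row_labels = list(df.index)       # labels down the left side of the table
n_rows, n_cols = df.shape         # (number of rows, number of columns)

print(column_labels)
print(row_labels)
print(df.dtypes)                  # one data type per column
```

The exact type name printed for the text columns depends on the pandas version, while *Age* is stored as an integer type.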




# Each column in a DataFrame is a Series

![alt text](https://drive.google.com/uc?id=1jnhd9rYBOB6yGr5d2JtZ4GOdzwAY1inn)

When selecting a single column of a pandas `DataFrame`, the result is a pandas `Series`. To select the column, put the column label between square brackets `[]`.

If you are familiar with Python dictionaries, selecting a single column is very similar to looking up a dictionary value by its key.

In [0]:
df["Age"]

0    22
1    35
2    58
Name: Age, dtype: int64

Notice that the name of the new Series is **Age**, which is also the name of the column that we selected from the DataFrame.
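To make the comparison with dictionaries concrete, here is a small sketch that uses the same key on a plain dictionary and on a `DataFrame` built from it. The names `data`, `ages_list` and `ages_series` are only illustrative.

```python
import pandas as pd

data = {
    "Name": ["Braund, Mr. Owen Harris",
             "Allen, Mr. William Henry",
             "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
}

# Plain dictionary lookup: the key gives back the list stored under it.
ages_list = data["Age"]

# DataFrame column selection: the same syntax gives back a Series.
df = pd.DataFrame(data)
ages_series = df["Age"]

print(type(ages_list).__name__)
print(type(ages_series).__name__)
```

The syntax is the same, but the column comes back as a `Series` that carries the row labels and the column name along with the values.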

We can create the same series from scratch as well:

In [0]:
ages = pd.Series([22, 35, 58], name="Age")

ages

0    22
1    35
2    58
Name: Age, dtype: int64

A pandas `Series` has no column labels because it is a single column of a `DataFrame`. A `Series` does have row labels -- 0, 1, and 2 in the above example.
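The row labels and the name of a `Series` are available as attributes, and a row label can be used to look up a single value. The sketch below shows this; the second Series with the labels `"a"`, `"b"`, `"c"` is a made-up example to show that row labels do not have to be integers.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

print(ages.name)         # the name given to the Series
print(list(ages.index))  # the default row labels
print(ages.loc[1])       # value stored under row label 1

# Hypothetical custom row labels, for illustration only.
labelled = pd.Series([22, 35, 58], index=["a", "b", "c"], name="Age")
print(labelled.loc["b"])  # value stored under row label "b"
```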

# Do something with a DataFrame or Series

What if we want to know the maximum **age** of the passengers in our dataframe?

We can do this on the DataFrame by selecting the **Age** column and applying `max()`.

In [0]:
# Get the age data from the Dataframe and apply max()
df["Age"].max()

58

In [0]:
# Get the age data from the "ages" Series we made and apply max()
ages.max()

58
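`max()` is only one of several summary methods that work the same way. As a brief sketch, here are a few others applied to the same ages; the variable names are only illustrative.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

youngest = ages.min()     # smallest value
average = ages.mean()     # arithmetic mean
middle = ages.median()    # middle value when sorted

print(youngest, average, middle)
```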

The `describe()` method provides a quick overview of the numerical data in a DataFrame. Because the *Name* and *Sex* columns are textual data, these are by default not taken into account by the `describe()` method.

In [0]:
df.describe()

| | Age |
|---|---|
| count | 3.0 |
| mean | 38.333333 |
| std | 18.230012 |
| min | 22.0 |
| 25% | 28.5 |
| 50% | 35.0 |
| 75% | 46.5 |
| max | 58.0 |
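If a summary of the text columns is wanted as well, `describe()` accepts an `include` argument. The sketch below passes `include="all"`, which adds rows such as `unique`, `top` and `freq` for the textual columns; statistics that do not apply to a column are left as missing values.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris",
             "Allen, Mr. William Henry",
             "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
})

# Summarise every column, not only the numerical ones.
summary = df.describe(include="all")
print(summary)
```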


Many pandas operations return a `DataFrame` or a `Series`. The `describe()` method is an example: called on a `DataFrame`, as above, it returns a `DataFrame`, and called on a single column it returns a `Series`.
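What kind of object comes back depends on what the method was called on. A quick way to check is `type()`, as in this small sketch; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris",
                            "Allen, Mr. William Henry",
                            "Bonnell, Miss. Elizabeth"],
                   "Age": [22, 35, 58]})

table_summary = df.describe()          # called on a DataFrame
column_summary = df["Age"].describe()  # called on a single column (a Series)
oldest = df["Age"].max()               # reduces a Series to a single value

print(type(table_summary).__name__)
print(type(column_summary).__name__)
print(oldest)
```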

# Summary

- Import the package: `import pandas as pd`
- A table of data is stored as a pandas `DataFrame`
- Each column in a `DataFrame` is a `Series`
- Apply a method such as `max()` or `describe()` to a `DataFrame` or `Series` to compute something from it

