## Pandas is table oriented

> I want to start using Pandas

In [22]:
import pandas as pd

To start working with pandas, import the package. The community-agreed alias for pandas is `pd`, so importing pandas as `pd` is assumed standard practice throughout the pandas documentation.

### Pandas data table representation

![](../schemas/01_table_dataframe.svg)

> I want to store passenger data of the Titanic. For a number of passengers, I know the name (characters), age (integers) and the cabin class (categories 1, 2 or 3) data.

In [35]:
my_dataframe = pd.DataFrame({
    'Name': ["Braund, Mr. Owen Harris", 
             "Allen, Mr. William Henry", 
             "Bonnell, Miss. Elizabeth"], 
    'Age': [22, 35, 58],
    'Pclass': pd.Categorical([3, 3, 1])},
    index = [0, 1, 2]
    )
my_dataframe

|   | Name | Age | Pclass |
|---|------|-----|--------|
| 0 | Braund, Mr. Owen Harris | 22 | 3 |
| 1 | Allen, Mr. William Henry | 35 | 3 |
| 2 | Bonnell, Miss. Elizabeth | 58 | 1 |


A `DataFrame` is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the `data.frame` in R. 

- The table has 3 columns, each of them with a column label. The column labels are respectively `Name`, `Age` and `Pclass`.
- The column `Name` consists of textual data with each value a string, the column `Age` contains numbers, and the column `Pclass` holds categorical data (each category represents a cabin class).
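
To check what is stored in each column, you can inspect the column labels and the data type pandas assigned to each of them. The snippet below is a minimal sketch that rebuilds the same small table under the illustrative name `passengers` (a name chosen here, not used elsewhere in this tutorial). The exact data type reported for text columns depends on the pandas version, so no output is shown.

```python
import pandas as pd

# Same illustrative passenger table as above, rebuilt so this snippet runs on its own.
passengers = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris",
             "Allen, Mr. William Henry",
             "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
    "Pclass": pd.Categorical([3, 3, 1]),
})

# One entry per column: the column label and the data type pandas inferred for it.
column_types = passengers.dtypes
print(column_types)

# The column labels, in order.
column_labels = list(passengers.columns)
print(column_labels)
```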

In spreadsheet software, the table representation of our data would look very similar:

![](../schemas/01_table_spreadsheet.png)


   
<div class="alert alert-info">
    
__Note:__ You probably do not want to manually input the data of a DataFrame! In most situations, data tables stored in a file format are the starting point of an analysis. The [next tutorial](2_read_write.ipynb) provides more insight into reading data from a variety of data sources.

</div>

### Each column in a `DataFrame` is a `Series`

![](../schemas/01_table_series.svg)

> I'm just interested in working with the data in the column `Age`

In [27]:
my_dataframe["Age"]

0    22
1    35
2    58
Name: Age, dtype: int64

When selecting a single column of a pandas `DataFrame`, the result is a pandas `Series`. To select the column, use the column label in between square brackets `[]`. If you are already wondering about other ways to select data, jump straight to [the tutorial on subsetting](3_subset_data.ipynb).

<div class="alert alert-info">
    
If you are familiar with Python :ref:`dictionaries <python:tut-dictionaries>`, the selection of a single column is very similar to the selection of dictionary values based on the key.

</div>
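
The comparison with dictionaries can be sketched as follows. The names `passengers`, `passenger_dict` and the column label `Fare` are illustrative assumptions introduced for this example only; `Fare` is deliberately absent from the table.

```python
import pandas as pd

names = ["Braund, Mr. Owen Harris",
         "Allen, Mr. William Henry",
         "Bonnell, Miss. Elizabeth"]

passengers = pd.DataFrame({"Name": names, "Age": [22, 35, 58]})

# Selecting one column by its label returns a Series.
age_column = passengers["Age"]
print(type(age_column))

# A plain dictionary is indexed the same way: by key, between square brackets.
passenger_dict = {"Name": names, "Age": [22, 35, 58]}
print(passenger_dict["Age"])

# As with a dictionary, asking for a label that does not exist raises a KeyError.
try:
    passengers["Fare"]
    missing_column_raised = False
except KeyError as err:
    missing_column_raised = True
    print("No such column:", err)
```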

You can create a `Series` from scratch as well:

In [36]:
ages = pd.Series([22, 35, 58], name = "Age")
ages

0    22
1    35
2    58
Name: Age, dtype: int64

A pandas `Series` has no column labels, as it is just a single column of a `DataFrame`. A Series does have row labels.
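
A short sketch of what that means in practice: a `Series` exposes its row labels through `index` and carries a single `name` rather than a set of column labels. The labels `"a"`, `"b"` and `"c"` below are hypothetical and only there to show that row labels do not have to be integers.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

# Row labels: by default a range starting at 0.
print(ages.index)

# A Series carries a single name instead of a set of column labels.
print(ages.name)

# Row labels can also be given explicitly (hypothetical labels, for illustration only).
labelled_ages = pd.Series([22, 35, 58], index=["a", "b", "c"], name="Age")
print(labelled_ages["b"])
```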

### Do something with a DataFrame or Series

> I want to know the maximum Age of the passengers

We can do this on the `DataFrame` by selecting the `Age` column and applying `max()`:

In [32]:
my_dataframe["Age"].max()

58

Or directly on the `Series`:

In [33]:
ages.max()

58

As illustrated by the `max()` method, you can _do_ things with a `DataFrame` or `Series`. Pandas provides a lot of functionality, each piece of it a _method_ you can apply to a `DataFrame` or `Series`. As methods are functions, do not forget the parentheses `()`. If you are already looking forward to more advanced summary statistics, go directly to the [tutorial on statistics](6_calculate_statistics.ipynb). If you would rather do calculations with entire columns, go straight to the [tutorial on calculating with columns](5_add_columns.ipynb).
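
The role of the parentheses can be illustrated with a small sketch. The variable names `max_method`, `oldest` and `youngest` are chosen here for illustration.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

# Without parentheses you only get a reference to the method; nothing is computed.
max_method = ages.max
print(callable(max_method))

# With parentheses the method is called and returns the result.
oldest = ages.max()
print(oldest)

# Other methods follow the same pattern.
youngest = ages.min()
print(youngest)
```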

> I'm interested in some basic statistics of the numerical data of my data table

In [34]:
my_dataframe.describe()

|       | Age |
|-------|-----|
| count | 3.0 |
| mean  | 38.333333 |
| std   | 18.230012 |
| min   | 22.0 |
| 25%   | 28.5 |
| 50%   | 35.0 |
| 75%   | 46.5 |
| max   | 58.0 |


The `describe` method provides a quick overview of the numerical data in a `DataFrame`. As the `Name` and `Pclass` columns hold textual and categorical data respectively, they are by default not taken into account by the `describe` method.

__To user guide:__ for more options on `describe`, see :ref:`basics.describe`
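
If you do want a summary of the non-numerical columns, one option is to pass `include="all"` to `describe`, or to call `describe` on a single column. The sketch below rebuilds the table under the illustrative name `passengers`. For textual and categorical columns, the summary reports entries such as the number of unique values and the most frequent value instead of the mean or quartiles.

```python
import pandas as pd

passengers = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris",
             "Allen, Mr. William Henry",
             "Bonnell, Miss. Elizabeth"],
    "Age": [22, 35, 58],
    "Pclass": pd.Categorical([3, 3, 1]),
})

# Summarize every column, not only the numerical ones.
# Statistics that do not apply to a column are filled with NaN.
summary_all = passengers.describe(include="all")
print(summary_all)

# Summarize the categorical column on its own.
pclass_summary = passengers["Pclass"].describe()
print(pclass_summary)
```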

<div class="alert alert-info">
    
__Note:__ This is just a starting point. Pandas does not only look like spreadsheet software: it also supports the data manipulations and calculations you would do in a spreadsheet. Continue reading the next tutorials to get started!

</div>

## REMEMBER

- Import the package: `import pandas as pd`
- A table of data is stored as a Pandas `DataFrame`
- Each column in a `DataFrame` is a `Series`
- You can do things by applying a method to a `DataFrame` or `Series`

__To user guide:__ A more extended introduction to `DataFrame` and `Series` is provided in :ref:`dsintro`.