# Introduction to Pandas 🐼

"Borrowed" from https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html

In [None]:
import numpy as np # Never wrong to have NumPy on your side as well
import pandas as pd

**Pandas** is the Python package most commonly used for dataframes (tabular data). It provides two core data structures: `pd.Series` for a single column of data and `pd.DataFrame` for tables consisting of one or more columns.

## Object Creation
Creating a Series by passing a list of values, letting pandas create a default integer index:

In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

(See the row index on the left side of the output)
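A small sketch of what the default index implies. The `labeled` Series and its labels are made up for illustration. The missing value `np.nan` forces a floating-point dtype, and passing `index=` replaces the default integers with your own labels.

```python
import numpy as np
import pandas as pd

# Default index 0..5; np.nan is a float, so the whole Series becomes float64
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(s.dtype)           # float64

# Hypothetical example: explicit labels instead of the default integer index
labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(labeled["b"])      # 20
```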

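Creating a DataFrame by passing a dict whose values can each be converted into a column (a Series or something Series-like). Scalar values such as `1.` and `'foo'` are broadcast to every row: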

In [None]:
df1 = pd.DataFrame(
    {
        'A': 1.,
        'B': pd.Series([1, 2, 3, 4]),
        'C': np.array([3] * 4),
        'D': pd.Categorical(["test", "train", "test", "train"]),
        'E': 'foo'
    }
)
df1
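Because `'A'` and `'E'` are scalars, pandas takes the row count from the other columns. A sketch assuming a dict of scalars only (the two-row index is an arbitrary choice for illustration): with nothing to infer a length from, pandas raises a `ValueError` unless you pass an explicit `index`.

```python
import pandas as pd

# Only scalars: pandas cannot infer how many rows to create
error_raised = False
try:
    pd.DataFrame({"A": 1.0, "E": "foo"})
except ValueError as err:
    error_raised = True
    print("ValueError:", err)

# With an explicit index, each scalar is broadcast to every row
only_scalars = pd.DataFrame({"A": 1.0, "E": "foo"}, index=[0, 1])
print(only_scalars.shape)  # (2, 2)
```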

## Viewing Data

Here is how to view the top and bottom rows of the frame:

In [None]:
df2 = pd.DataFrame(np.arange(100).reshape((25, -1)))  # 25 rows x 4 columns
df2.head()  # display the first n rows (default n=5)

In [None]:
df2.tail()
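Both methods take the number of rows as an optional argument. A quick sketch reusing the `df2` construction from above:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.arange(100).reshape((25, -1)))  # 25 rows x 4 columns
first3 = df2.head(3)  # first 3 rows
last2 = df2.tail(2)   # last 2 rows

print(df2.shape)              # (25, 4)
print(first3.index.tolist())  # [0, 1, 2]
print(last2.index.tolist())   # [23, 24]
```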

Display the index ("row names") and the column labels:

In [None]:
df1.index

In [None]:
df1.columns
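Both attributes return pandas `Index` objects, and the row index can be replaced. A sketch on a toy frame; the column names and the row labels `"r1"` and `"r2"` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3, 4]})
default_index = df.index  # RangeIndex(start=0, stop=2, step=1)
print(list(df.columns))   # ['A', 'B']

# Replace the default integer index with custom row labels
df.index = ["r1", "r2"]
print(df.loc["r2", "B"])  # 4
```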

Get a NumPy representation of the underlying data:

In [None]:
df2.head().to_numpy()  # returns a NumPy ndarray
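A NumPy array holds a single dtype, so pandas picks one that fits every column. A sketch with toy columns (names chosen for illustration) using `to_numpy()`, which the pandas documentation recommends over `.values`:

```python
import pandas as pd

nums = pd.DataFrame({"ints": [1, 2], "floats": [0.5, 1.5]})
arr = nums.to_numpy()
print(type(arr).__name__)  # ndarray
print(arr.dtype)           # float64: the ints are upcast to fit alongside the floats

mixed = pd.DataFrame({"n": [1, 2], "s": ["a", "b"]})
print(mixed.to_numpy().dtype)  # object: no numeric dtype can hold strings
```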

`describe` shows a quick statistical summary (count, mean, standard deviation, min, quartiles, max) of the numeric columns:

In [None]:
df1.describe()
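By default, non-numeric columns are left out of the summary; `include="all"` adds count, unique, top and freq for them. A sketch with a hypothetical string column `label`:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "label": ["a", "b", "a", "a"]})

summary = df.describe()             # numeric columns only, by default
print(summary.columns.tolist())     # ['x']
print(summary.loc["mean", "x"])     # 2.5

full = df.describe(include="all")   # also summarises the string column
print(full.loc["top", "label"], full.loc["freq", "label"])  # a 3
```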

Sort a DataFrame by the values of one or more columns (ties in the first column are broken by the next one):

In [None]:
df1.sort_values(["D", "B"])
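`sort_values` sorts in ascending order by default and keeps the original row labels; `sort_index` sorts by those labels instead. A sketch with hypothetical `name` and `score` columns:

```python
import pandas as pd

df = pd.DataFrame({"name": ["b", "a", "c"], "score": [2, 3, 1]})

by_score = df.sort_values("score", ascending=False)  # highest score first
print(by_score["name"].tolist())  # ['a', 'b', 'c']

restored = by_score.sort_index()  # sort by the row index to undo the reordering
print(restored.index.tolist())    # [0, 1, 2]
```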

## Selection + Slicing

Selecting a single column by name:

In [None]:
df1['A']

Note: this returns a Series!
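Whether you get a Series or a DataFrame depends on whether you pass a single name or a list. A sketch on a toy frame; it also shows attribute access (`df.A`), used in the filtering examples further down, which only works for column names that are valid Python identifiers and do not clash with DataFrame methods:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3, 4]})
as_series = df["A"]   # single name -> Series
as_frame = df[["A"]]  # list with one name -> one-column DataFrame
print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame

# Attribute access returns the same Series as df["A"]
print(df.A.equals(df["A"]))  # True
```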

Selecting multiple columns with a list of names, which yields a DataFrame:

In [None]:
df1[['A', 'D']]

Passing a slice to `[]` selects rows rather than columns:

In [None]:
df1[1:3]
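Note the asymmetry: a string inside `[]` selects a column, while a slice selects rows. A positional row slice like this matches `.iloc` with the same bounds; a sketch with a toy column `v`:

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 30, 40]})
rows = df[1:3]                     # row positions 1 and 2; the end point is excluded
print(rows["v"].tolist())          # [20, 30]
print(rows.equals(df.iloc[1:3]))   # True
```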

Selecting specific rows and columns at once with `.loc` and lists of row labels and column names, similar to NumPy fancy indexing:

In [None]:
df1.loc[[1, 2], ["B", "D"]]

### Selection by Label

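General syntax: `df.loc[row_labels, column_labels]`, where each part can be a single label, a list of labels, or a slice (`:` selects everything):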

In [None]:
df1.loc[:, ['A', 'B']]

In [None]:
df1.loc[2:4, ['A', 'B']]
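Unlike Python and NumPy slices, label slices with `.loc` include both end points, which is easy to miss when the labels are integers as in `df1`. A sketch with made-up string labels:

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3, 4], "w": [5, 6, 7, 8]}, index=["a", "b", "c", "d"])

print(df.loc["b":"c", "v"].tolist())  # [2, 3]  ("c" is included)
print(df.loc["d", "w"])               # 8  (one row label + one column -> scalar)
```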

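### Selection by Position

`df.iloc[row_positions, column_positions]` works like `.loc` but takes integer positions instead of labels: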

In [None]:
df1.iloc[3]  # the fourth row, returned as a Series

In [None]:
df1.iloc[2:4, 0:2]  # rows at positions 2 and 3, columns at positions 0 and 1 (A and B)
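`.iloc` ignores the labels entirely. A sketch on a toy frame with string labels, where positions and labels clearly differ:

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3, 4], "w": [5, 6, 7, 8]}, index=["a", "b", "c", "d"])

print(df.iloc[1:3, 0].tolist())        # [2, 3]  (positions 1 and 2; end excluded)
print(df.iloc[-1, 1])                  # 8  (negative positions count from the end)
print(df.iloc[[0, 2]].index.tolist())  # ['a', 'c']
```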

### Boolean indexing / filtering

Using a single column’s values to select data.

In [None]:
df1[df1.B > 2]
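Under the hood, the comparison produces a boolean Series with one entry per row, and `[]` keeps the rows marked `True`. A sketch with a toy column `B`:

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 2, 3, 4]})
mask = df["B"] > 2               # boolean Series aligned with the rows
print(mask.tolist())             # [False, False, True, True]
print(df[mask].index.tolist())   # [2, 3]: only rows where the mask is True remain
```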

Or combine conditions on several columns with `&` (and) or `|` (or):

In [None]:
df1[
    (df1.B > 2) &
    (df1.D == "test")
]

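Note the parentheses around each condition! `&` and `|` bind more tightly than comparison operators such as `>` and `==`, so without them Python would evaluate `2 & df1.D` first.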
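`|` works the same way as `&`, with the same need for parentheses. A sketch with toy data shaped like the `B` and `D` columns of `df1`:

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 2, 3, 4], "D": ["test", "train", "test", "train"]})

either = df[(df["B"] > 3) | (df["D"] == "test")]  # OR: at least one condition holds
print(either.index.tolist())  # [0, 2, 3]

both = df[(df["B"] > 2) & (df["D"] == "test")]    # AND: both conditions hold
print(both.index.tolist())    # [2]
```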

Using the `isin()` method to keep rows whose value appears in a given list:

In [None]:
df1[df1['B'].isin([1, 3])]
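Negating the mask with `~` selects the complement, i.e. all rows whose value is *not* in the list. A sketch with a toy column `B`:

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 2, 3, 4]})
keep = df[df["B"].isin([1, 3])]   # rows whose B is 1 or 3
drop = df[~df["B"].isin([1, 3])]  # ~ inverts the mask: all other rows
print(keep["B"].tolist())  # [1, 3]
print(drop["B"].tolist())  # [2, 4]
```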

### Changing Values in DataFrames
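This can be done by assigning to any of the indexing expressions shown above. Prefer a single `.loc` (or `.iloc`) call over chained indexing such as `df1[df1.B > 2]["C"] = 5`, which may modify a temporary copy instead of `df1`.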

In [None]:
df1.loc[df1.B > 2, "C"] = 5
df1
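The same assignment works with `.iloc` positions, and `.at` is a shortcut for setting a single value by label. A sketch on toy columns:

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 2, 3, 4], "C": [3, 3, 3, 3]})

df.loc[df["B"] > 2, "C"] = 5  # boolean/label based: rows where B > 2
df.iloc[0, 1] = 0             # position based: first row, second column (C)
df.at[1, "C"] = 9             # setter for a single value by row label and column
print(df["C"].tolist())       # [0, 9, 5, 5]
```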

Columns can be added (or overwritten) just like values in a dictionary:

In [None]:
df1["F"] = [3, 7, 3, 1]
df1
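A new column can also be computed from existing ones or set to a single scalar; a list, as above, must have exactly one entry per row. A sketch with hypothetical column names `total` and `flag`:

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 2, 3, 4], "C": [3, 3, 5, 5]})
df["total"] = df["B"] + df["C"]  # computed element-wise, row by row
df["flag"] = True                # a scalar is broadcast to every row
print(df["total"].tolist())      # [4, 5, 8, 9]
print(df.columns.tolist())       # ['B', 'C', 'total', 'flag']
```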

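Appending a row to a DataFrame is a bit more involved. `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so wrap the new row in a one-row DataFrame and use `pd.concat`:

In [None]:
new_row = pd.DataFrame([{"A": 1, "B": 2, "D": "test"}])  # a one-row DataFrame
pd.concat([df1, new_row], ignore_index=True)  # columns missing from new_row (C, E, F) become NaN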

Keep in mind that this operation does not modify `df1` in place; it returns a new DataFrame!
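Two related patterns, sketched on a toy frame: assigning to a new label with `.loc` adds a row in place ("setting with enlargement"), and when adding many rows the pandas documentation advises collecting them first and calling `pd.concat` once rather than concatenating inside a loop.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [1, 2]})

# In place: assigning to a row label that does not exist yet appends a row
df.loc[len(df)] = [3.0, 3]
print(df.shape)  # (3, 2)

# Many rows: build them first, then concatenate once
rows = [{"A": 4.0, "B": 4}, {"A": 5.0, "B": 5}]
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)
print(df.index.tolist())  # [0, 1, 2, 3, 4]
```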

## Basic operations

In [None]:
df1.mean(numeric_only=True)  # mean of each column (axis=0); skips D and E, which raise a TypeError in pandas >= 2.0

In [None]:
df1.mean(axis=1, numeric_only=True)  # mean of each row, over the numeric columns

In [None]:
df1.sum(numeric_only=True)  # sum of each numeric column

In [None]:
df1.count()
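The `axis` argument decides the direction: the default `axis=0` collapses the rows and returns one value per column, while `axis=1` returns one value per row. A sketch on a small all-numeric frame (toy columns `x` and `y`), plus a Series with a missing value to show that `count` and `sum` skip `NaN`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
print(df.mean().tolist())        # [2.0, 20.0]        one value per column
print(df.mean(axis=1).tolist())  # [5.5, 11.0, 16.5]  one value per row
print(df.sum().tolist())         # [6, 60]

s = pd.Series([1.0, np.nan, 3.0])
print(s.count(), s.sum())        # 2 4.0  (NaN is skipped)
```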

## Apply functions

In [None]:
df1.select_dtypes("number").apply(np.cumsum)  # running total down each numeric column
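`apply` passes each column (or each row, with `axis=1`) to the function, which can be any callable, including a lambda. A sketch with toy columns `x` and `y`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

running = df.apply(np.cumsum)                                  # column-wise running total
spread = df.apply(lambda col: col.max() - col.min())           # one value per column
row_total = df.apply(lambda row: row["x"] + row["y"], axis=1)  # one value per row

print(running["y"].tolist())  # [10, 30, 60]
print(spread.tolist())        # [2, 20]
print(row_total.tolist())     # [11, 22, 33]
```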

## Reading Recommendations
- Merging & Joining: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
- Grouping: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
- Reshaping: https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html