# Pandas 101

For baby pandas (a.k.a. pandas beginners), [10 Minutes to pandas](https://pandas.pydata.org/pandas-docs/version/0.23.4/10min.html#minutes-to-pandas) is extremely helpful. Below are a few operations I use often in my projects.

[This tutorial](https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm) is also very friendly to beginners.

In [4]:
import pandas as pd

## Create Series and DataFrame
See the [DataFrame](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.html) API reference for the full constructor signature.

In [6]:
# create a series by passing a list of values
s = pd.Series([1,2,3])

In [9]:
# create a DataFrame by passing a nested list (a 2-D NumPy array is accepted the same way)
df = pd.DataFrame(data = [[1,2], [3,4], [5,6]],
                  columns = ['A', 'B'],
                  index = range(3))
df

   A  B
0  1  2
1  3  4
2  5  6
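
The cell above passes a nested list. As a minimal sketch of the NumPy route, the same table can be built from an actual 2-D array; the names `arr` and `df_from_array` are made up for this illustration, and the index defaults to 0, 1, 2 when none is given.

```python
import numpy as np
import pandas as pd

# Build the same 3x2 table from a real NumPy array.
arr = np.array([[1, 2], [3, 4], [5, 6]])

# Column labels are supplied explicitly; the row index defaults to a RangeIndex.
df_from_array = pd.DataFrame(data=arr, columns=['A', 'B'])

print(df_from_array)
```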


In [11]:
# by passing a dictionary
df = pd.DataFrame(data = {'A': [1,3,5], 'B': [2,4,6]})
df

   A  B
0  1  2
1  3  4
2  5  6


## Viewing data

In [12]:
# view data types of columns
df.dtypes

A    int64
B    int64
dtype: object

In [13]:
# view columns
df.columns

Index(['A', 'B'], dtype='object')

In [14]:
# view values
df.values

array([[1, 2],
       [3, 4],
       [5, 6]])
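
In pandas 0.24 and later, `DataFrame.to_numpy()` is the documented way to get the underlying array, while `df.values` keeps working. A small sketch, assuming a pandas version new enough to have `to_numpy()`:

```python
import pandas as pd

df = pd.DataFrame(data={'A': [1, 3, 5], 'B': [2, 4, 6]})

# Both calls return a 2-D NumPy array holding the same values.
arr = df.to_numpy()
same_as_values = (arr == df.values).all()

print(arr)
print(same_as_values)
```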

In [15]:
# statistical summary of data
df.describe()

         A    B
count  3.0  3.0
mean   3.0  4.0
std    2.0  2.0
min    1.0  2.0
25%    2.0  3.0
50%    3.0  4.0
75%    4.0  5.0
max    5.0  6.0


In [16]:
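# concise summary: index range, column dtypes, non-null counts and memory usage
# note the parentheses: a bare df.info only returns the bound method without printing the summary
df.info()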

## Selecting data

In [17]:
# select a row by its index label
df.loc[df.index[0]]

A    1
B    2
Name: 0, dtype: int64

In [18]:
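# select on both axes by label; label slices include the endpoint,
# so :3 covers every label up to 3 (only 0, 1 and 2 exist here)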
df.loc[:3, ['A', 'B']]

   A  B
0  1  2
1  3  4
2  5  6
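
One thing that trips up beginners: a slice in `.loc` includes its end label, while a slice in `.iloc` excludes its end position. A quick sketch on the same data (the names `by_label` and `by_position` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame(data={'A': [1, 3, 5], 'B': [2, 4, 6]})

# .loc slices by label and includes the end label: rows labelled 0 and 1.
by_label = df.loc[0:1, ['A', 'B']]

# .iloc slices by position and excludes the end position: row 0 only.
by_position = df.iloc[0:1]

print(len(by_label), len(by_position))
```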


In [19]:
# access a scalar
# df.loc[df.index[0], 'A'] also works, but df.at is faster for a single value
df.at[df.index[0], 'A']

1
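
For comparison, here is a sketch of the three scalar accessors side by side. `.iat` is the position-based counterpart of `.at`; the variable names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(data={'A': [1, 3, 5], 'B': [2, 4, 6]})
first_label = df.index[0]

via_loc = df.loc[first_label, 'A']  # general label-based lookup
via_at = df.at[first_label, 'A']    # label-based, single value only
via_iat = df.iat[0, 0]              # position-based, single value only

print(via_loc, via_at, via_iat)
```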

In [21]:
# select by position
df.iloc[1]  # position 1, i.e. the second row

A    3
B    4
Name: 1, dtype: int64

In [22]:
# slice by position: rows 0-1 and column 0 (the end position is excluded)
df.iloc[0:2, 0:1]

   A
0  1
1  3


In [24]:
# pick specific rows and columns by lists of positions
df.iloc[[0,2], [1]]

   B
0  2
2  6


In [25]:
# select rows that satisfy a condition using boolean indexing
df[df.A > 2]

   A  B
1  3  4
2  5  6
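
Conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than the comparisons. A minimal sketch, with thresholds picked only for illustration:

```python
import pandas as pd

df = pd.DataFrame(data={'A': [1, 3, 5], 'B': [2, 4, 6]})

# Rows where both conditions hold.
both = df[(df['A'] > 2) & (df['B'] < 6)]

# Rows where at least one condition holds.
either = df[(df['A'] < 2) | (df['B'] > 5)]

print(both)
print(either)
```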


In [26]:
# use isin() for filtering
df[df.A.isin([2,3,4,5])]

   A  B
1  3  4
2  5  6
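
To keep the rows that are *not* in the list, the boolean mask can be negated with `~`. A short sketch using the same list of values (`mask`, `kept` and `dropped` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(data={'A': [1, 3, 5], 'B': [2, 4, 6]})

# True where the value in column A appears in the list.
mask = df['A'].isin([2, 3, 4, 5])

kept = df[mask]      # rows whose A value is in the list
dropped = df[~mask]  # rows whose A value is not in the list

print(kept)
print(dropped)
```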
