# Pandas Series

A Series is a one-dimensional labeled data structure provided by the pandas Python library. A pandas Series is similar to a NumPy array, except that its index can hold names or datetimes instead of only integer positions.
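As a minimal sketch of that difference, the snippet below stores the same three values in a NumPy array, a Series with a named index, and a Series with a datetime index. The dates and the variable names `named` and `dated` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

values = [10, 20, 30]

# A NumPy array is addressed by integer position only.
arr = np.array(values)

# A Series can carry string labels...
named = pd.Series(values, index=['a', 'b', 'c'])

# ...or timestamps (these dates are arbitrary, chosen for illustration).
dated = pd.Series(values, index=pd.date_range('2024-01-01', periods=3, freq='D'))

print(arr[1])                                 # lookup by position
print(named.loc['b'])                         # lookup by name
print(dated.loc[pd.Timestamp('2024-01-02')])  # lookup by timestamp
```

All three lookups retrieve the same underlying value; only the way it is addressed changes.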

In [52]:
# The usual imports
import numpy as np
import pandas as pd

In [53]:
# create two lists, a NumPy array, and a dictionary.

labels = ['a', 'b', 'c']

my_list = [10, 20, 30]

arr = np.array([10, 20, 30])

d = {'a':10, 'b':20, 'c':30}

The easiest way to create a pandas Series is to pass a plain Python list to the `pd.Series()` constructor.

In [54]:
pd.Series(my_list)

0    10
1    20
2    30
dtype: int64

In [55]:
# add an index using our list of labels
pd.Series(my_list, index=labels)

a    10
b    20
c    30
dtype: int64

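Once labels have been applied to a pandas Series, you can look up a value either by its position or by its label. Use `.iloc` for positions: plain `[0]` on a labeled Series is deprecated in recent pandas versions, where integer keys are treated as labels.

In [56]:
# named `ser` rather than `Series` to avoid confusion with the pd.Series class
ser = pd.Series(my_list, index=labels)
ser.iloc[0]


10

In [57]:
ser['a']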

10
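The distinction between label and position matters most when the labels are themselves integers. The sketch below uses a hypothetical Series whose integer labels run in reverse order, so that `.loc` (label) and `.iloc` (position) return different values; the variable name `ser` and the index values are assumptions made for illustration.

```python
import pandas as pd

# Integer labels deliberately out of step with the positions.
ser = pd.Series([10, 20, 30], index=[2, 1, 0])

by_label = ser.loc[0]      # the element whose label is 0
by_position = ser.iloc[0]  # the element stored first

print(by_label, by_position)
```

Using the explicit accessors makes the intent clear regardless of what kind of index the Series has.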

In [58]:
# You can also pass in a dict to make a pandas Series; its keys become the index
pd.Series(d)

a    10
b    20
c    30
dtype: int64
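When a dict is combined with an explicit `index`, pandas uses the index to select and order the dict's keys. The following sketch shows this, including what happens to a label that is missing from the dict; the label `'z'` is a made-up example.

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# The index picks keys from the dict in the order given.
# 'z' is not a key of d, so its value is filled with NaN.
s = pd.Series(d, index=['c', 'a', 'z'])

print(s)
```

Because NaN is a floating-point value, the resulting Series is stored as floats rather than integers.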

## Pandas flexibility

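Pandas Series are highly flexible: a Series can hold arbitrary Python objects, not just numbers. For instance, you can store three of Python's built-in functions in a Series without getting an error; pandas simply falls back to the generic `object` dtype.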

In [59]:
pd.Series([sum, print, len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object
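To make the role of the `object` dtype concrete, here is a small sketch comparing a purely numeric Series with a mixed one. The contents of `mixed` are arbitrary examples.

```python
import pandas as pd

numbers = pd.Series([10, 20, 30])
mixed = pd.Series([sum, 'text', 3.5])

print(numbers.dtype)  # an integer dtype
print(mixed.dtype)    # the generic object dtype

# Elements of an object Series are ordinary Python objects,
# so the stored function can still be called.
total = mixed.iloc[0]([1, 2, 3])
print(total)
```

The flexibility comes at a cost: operations on `object` Series cannot use the fast numeric code paths that apply to columns with a numeric dtype.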

## Pandas DataFrames

DataFrames are the most important data structure in the pandas library.

A pandas DataFrame is a two-dimensional data structure that has labels for both its rows and columns. 

In [62]:
# Use vanilla lists to demonstrate
rows = ['X', 'Y', 'Z']
cols = ['A', 'B', 'C', 'D', 'E']

Next, generate a NumPy array of random data to fill the DataFrame. Because the values are random, your numbers will differ from the ones shown here.

In [63]:
data = np.round(np.random.randn(3,5),2)

Now pass the data, the row labels, and the column labels (in that order) to the `pd.DataFrame` constructor to create a DataFrame.

In [64]:
pd.DataFrame(data, rows, cols)

      A     B     C     D     E
X  0.97  1.23  1.58  0.23  0.11
Y -1.13  0.32 -1.88  1.08  0.75
Z -0.73 -2.18 -0.89  0.23 -0.70


Each column is a pandas Series.
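The sketch below checks this claim directly. It also assumes two things that are not in the original notebook: a seeded random generator (the seed value is arbitrary) so that the data is repeatable, and keyword arguments so that it is explicit which list labels which axis. The values it produces will not match the output shown above.

```python
import numpy as np
import pandas as pd

rows = ['X', 'Y', 'Z']
cols = ['A', 'B', 'C', 'D', 'E']

# A seeded generator gives the same "random" numbers on every run.
rng = np.random.default_rng(seed=0)
data = np.round(rng.standard_normal((3, 5)), 2)

df = pd.DataFrame(data, index=rows, columns=cols)

col = df['A']
print(type(col))           # a single column is a Series
print(col.name)            # named after the column
print(col.index.tolist())  # indexed by the DataFrame's row labels
```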

## Indexing and Assignment in Pandas DataFrames

We can select a specific Series from a pandas DataFrame using square brackets and the column name, in the same way we index into a list or dictionary.

In [65]:
df = pd.DataFrame(data, rows, cols)

In [66]:
df

      A     B     C     D     E
X  0.97  1.23  1.58  0.23  0.11
Y -1.13  0.32 -1.88  1.08  0.75
Z -0.73 -2.18 -0.89  0.23 -0.70


In [67]:
df['A']

X    0.97
Y   -1.13
Z   -0.73
Name: A, dtype: float64

In [68]:
df['E']

X    0.11
Y    0.75
Z   -0.70
Name: E, dtype: float64

To pull a subset of the columns in a DataFrame, pass a list of column names. The result is itself a DataFrame.

In [69]:
col_i_want = ['A', 'E']

df[col_i_want]

      A     E
X  0.97  0.11
Y -1.13  0.75
Z -0.73 -0.70


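Chaining two bracket lookups (column first, then row label) gets you a specific 'cell' in a DataFrame. This is fine for reading a value; for assignment, a single `df.loc['Z', 'B']` lookup is the safer choice.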

In [70]:
df['B']['Z']

-2.18
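As a sketch of the alternatives to chained lookups, the snippet below rebuilds the DataFrame from the values printed earlier and reads the same cell three ways, then writes to it with a single `.loc` call. The variable name `cell` is an assumption for illustration.

```python
import pandas as pd

# Values copied from the DataFrame output shown above.
df = pd.DataFrame(
    [[0.97, 1.23, 1.58, 0.23, 0.11],
     [-1.13, 0.32, -1.88, 1.08, 0.75],
     [-0.73, -2.18, -0.89, 0.23, -0.70]],
    index=['X', 'Y', 'Z'],
    columns=['A', 'B', 'C', 'D', 'E'],
)

chained = df['B']['Z']   # column first, then row
cell = df.loc['Z', 'B']  # row label, then column label, in one lookup
scalar = df.at['Z', 'B'] # accessor intended for single values

print(chained, cell, scalar)

# Assignment through a single .loc call updates df itself.
df.loc['Z', 'B'] = 0.0
print(df.at['Z', 'B'])
```

Note the order of the labels: chained lookups go column then row, whereas `.loc` and `.at` take the row label first.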

## Create / Remove Columns from a DataFrame

Create a new column called ‘A + B’ which is the sum of columns A and B:

In [71]:
df['A + B'] = df['A'] + df['B']
df

      A     B     C     D     E  A + B
X  0.97  1.23  1.58  0.23  0.11   2.20
Y -1.13  0.32 -1.88  1.08  0.75  -0.81
Z -0.73 -2.18 -0.89  0.23 -0.70  -2.91
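Bracket assignment changes `df` in place. If you would rather get a new DataFrame and leave the original alone, `DataFrame.assign` is one option; the sketch below uses only columns A and B from the output above, and the name `with_sum` is an assumption.

```python
import pandas as pd

# Columns A and B copied from the output shown above.
df = pd.DataFrame({'A': [0.97, -1.13, -0.73], 'B': [1.23, 0.32, -2.18]},
                  index=['X', 'Y', 'Z'])

# assign() returns a new DataFrame and leaves df untouched.
# The dict is unpacked because 'A + B' is not a valid Python identifier.
with_sum = df.assign(**{'A + B': df['A'] + df['B']})

print(with_sum.columns.tolist())
print(df.columns.tolist())
```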


To remove this column, use the `DataFrame.drop` method. Without `axis=1`, pandas would look for a row labeled 'A + B' instead of a column.

In [72]:
df.drop('A + B', axis=1)

      A     B     C     D     E
X  0.97  1.23  1.58  0.23  0.11
Y -1.13  0.32 -1.88  1.08  0.75
Z -0.73 -2.18 -0.89  0.23 -0.70


Note that `drop` returns a new DataFrame; it doesn't modify the original.

In [73]:
df

      A     B     C     D     E  A + B
X  0.97  1.23  1.58  0.23  0.11   2.20
Y -1.13  0.32 -1.88  1.08  0.75  -0.81
Z -0.73 -2.18 -0.89  0.23 -0.70  -2.91


There are two ways to make the change permanent: pass `inplace=True`, or assign the result of `drop` back to `df`.

In [74]:
df.drop('A + B', axis=1, inplace=True)
df

      A     B     C     D     E
X  0.97  1.23  1.58  0.23  0.11
Y -1.13  0.32 -1.88  1.08  0.75
Z -0.73 -2.18 -0.89  0.23 -0.70


In [75]:
df['A + B'] = df['A'] + df['B']
df

      A     B     C     D     E  A + B
X  0.97  1.23  1.58  0.23  0.11   2.20
Y -1.13  0.32 -1.88  1.08  0.75  -0.81
Z -0.73 -2.18 -0.89  0.23 -0.70  -2.91


In [76]:
df = df.drop('A + B', axis=1)
df

      A     B     C     D     E
X  0.97  1.23  1.58  0.23  0.11
Y -1.13  0.32 -1.88  1.08  0.75
Z -0.73 -2.18 -0.89  0.23 -0.70


You can also drop a row by passing its label; the default `axis=0` refers to rows.

In [79]:
df.drop(['Z'])

      A     B     C     D     E
X  0.97  1.23  1.58  0.23  0.11
Y -1.13  0.32 -1.88  1.08  0.75
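Since the meaning of `axis` is easy to forget, `drop` also accepts the `columns=` and `index=` keywords, which name the axis directly. A minimal sketch, using only columns A and B from the output above (the names `no_b` and `no_z` are assumptions):

```python
import pandas as pd

df = pd.DataFrame({'A': [0.97, -1.13, -0.73], 'B': [1.23, 0.32, -2.18]},
                  index=['X', 'Y', 'Z'])

no_b = df.drop(columns='B')  # equivalent to df.drop('B', axis=1)
no_z = df.drop(index='Z')    # equivalent to df.drop('Z') or df.drop('Z', axis=0)

# drop() returned new objects; df still has every row and column.
print(df.shape, no_b.shape, no_z.shape)
```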


## Select A Row From A Pandas DataFrame

DataFrame rows can be accessed by their row label using the `loc` attribute with square brackets.

In [80]:
df.loc['X']

A    0.97
B    1.23
C    1.58
D    0.23
E    0.11
Name: X, dtype: float64

To access a row by its numerical position instead, use `iloc`.

In [82]:
df.iloc[0]

A    0.97
B    1.23
C    1.58
D    0.23
E    0.11
Name: X, dtype: float64
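What `loc` and `iloc` return depends on what you pass in. The sketch below, which assumes a two-column version of the DataFrame above, shows that a single label gives a Series while a list of labels gives a DataFrame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [0.97, -1.13, -0.73], 'B': [1.23, 0.32, -2.18]},
                  index=['X', 'Y', 'Z'])

one_row = df.loc['X']          # single label: a Series indexed by column name
two_rows = df.loc[['X', 'Z']]  # list of labels: a DataFrame
last_row = df.iloc[-1]         # negative positions count from the end

print(type(one_row).__name__, type(two_rows).__name__)
print(last_row.name)
```

The `name` of a row returned as a Series is its row label, which is why the outputs above end with `Name: X`.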

## Getting shape

In [83]:
df.shape

(3, 5)
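The `shape` attribute is a tuple of (rows, columns), so it can be unpacked directly. A short sketch, again assuming a two-column version of the DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': [0.97, -1.13, -0.73], 'B': [1.23, 0.32, -2.18]},
                  index=['X', 'Y', 'Z'])

n_rows, n_cols = df.shape  # (rows, columns)
print(n_rows, n_cols)
print(len(df))             # len() counts rows only
print(df.size)             # total number of cells
```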

## Slicing DataFrames

Select a subset of a DataFrame by specifying both the rows and the columns you want.

In [84]:
# first two rows, first two columns
df.loc[['X', 'Y'], ['A', 'B']]

      A     B
X  0.97  1.23
Y -1.13  0.32
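The same subset can also be taken with slices instead of lists. The sketch below rebuilds the DataFrame from the values printed earlier and compares a label slice with a positional slice; the names `by_label` and `by_position` are assumptions.

```python
import pandas as pd

# Values copied from the DataFrame output shown earlier.
df = pd.DataFrame(
    [[0.97, 1.23, 1.58, 0.23, 0.11],
     [-1.13, 0.32, -1.88, 1.08, 0.75],
     [-0.73, -2.18, -0.89, 0.23, -0.70]],
    index=['X', 'Y', 'Z'],
    columns=['A', 'B', 'C', 'D', 'E'],
)

by_label = df.loc['X':'Y', 'A':'B']  # label slices include both endpoints
by_position = df.iloc[:2, :2]        # positional slices exclude the stop

print(by_label.equals(by_position))
```

The difference in endpoint handling is worth remembering: `'X':'Y'` includes row Y, while `:2` stops before position 2.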
