# Pandas

The common convention is to import pandas under the alias `pd`.

In [2]:
import pandas as pd

## Common data structures

In [4]:
obj = pd.Series([1,2,3,4])
obj

0    1
1    2
2    3
3    4
dtype: int64

A Series is a one-dimensional sequence of values, each with an associated index label. You can think of it as similar to a Python dictionary that maps index labels to values.
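To make the dictionary analogy concrete, here is a minimal sketch. The fruit labels and prices are made up for illustration and do not come from the notebook. It shows label lookup and membership tests behaving like dict operations, plus one thing a dict cannot do directly: element-wise arithmetic.

```python
import pandas as pd

# A Series with explicit index labels (labels and values are illustrative).
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "plum"])

# Label-based lookup works like a dictionary key lookup.
apple_price = prices["apple"]

# `in` tests the index labels, just as it tests the keys of a dict.
has_pear = "pear" in prices
has_kiwi = "kiwi" in prices

# Unlike a dict, a Series supports vectorised arithmetic on its values.
doubled = prices * 2

print(apple_price, has_pear, has_kiwi)
print(doubled.to_dict())
```

Note that `in` checks the index, not the values, which is the same behaviour as checking keys in a dictionary.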

In [9]:
# Subset based on index
print(obj[0])

# Create new series with Python dictionary
newobj = pd.Series({'col1':1, 'col2':2})
print(newobj)

1
col1    1
col2    2
dtype: int64


In [10]:
data = {
    'col1': [1, 2, 3, 4],
    'col2': [100, 200, 300, 400],
    'col3': ['a', 'b', 'c', 'd']
}
df = pd.DataFrame(data)
df

   col1  col2 col3
0     1   100    a
1     2   200    b
2     3   300    c
3     4   400    d


A DataFrame holds tabular data with two axes: rows (axis 0, labelled by the index) and columns (axis 1, labelled by the column names).
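As a rough sketch of what the two axes mean in practice, the example below inspects the row and column labels and passes `axis` to an aggregation. The name `df_axes` is a throwaway chosen here so that it does not clash with the notebook's `df`.

```python
import pandas as pd

# A small numeric frame; the name df_axes is illustrative only.
df_axes = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [100, 200, 300, 400]})

# shape is (number of rows, number of columns).
n_rows, n_cols = df_axes.shape

# Each axis carries its own labels.
row_labels = list(df_axes.index)
col_labels = list(df_axes.columns)

# axis=0 aggregates down the rows, giving one result per column.
col_totals = df_axes.sum(axis=0)

# axis=1 aggregates across the columns, giving one result per row.
row_totals = df_axes.sum(axis=1)

print(n_rows, n_cols, row_labels, col_labels)
print(col_totals.to_dict())
print(row_totals.tolist())
```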

In [14]:
# Accessing data

# By column
df['col1']

# By row
df.loc[1]

col1      2
col2    200
col3      b
Name: 1, dtype: object

In [17]:
# Other manipulations

# Removing a column
df_copy = df.copy()
del df_copy['col1']
df_copy

# Removing a row (drop returns a new DataFrame; df_copy itself is unchanged)
df_copy.drop(2)

   col2 col3
0   100    a
1   200    b
3   400    d


In [16]:
# transpose

df_copy.T

        0    1    2    3
col2  100  200  300  400
col3    a    b    c    d


more detailed info can be found: 

## Descriptive statistics

In [19]:
df.sum()


col1      10
col2    1000
col3    abcd
dtype: object

In [20]:
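df.mean(numeric_only=True)  # skip the string column col3; recent pandas versions raise a TypeError without this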

col1      2.5
col2    250.0
dtype: float64

In [21]:
df.describe()

           col1        col2
count  4.000000    4.000000
mean   2.500000  250.000000
std    1.290994  129.099445
min    1.000000  100.000000
25%    1.750000  175.000000
50%    2.500000  250.000000
75%    3.250000  325.000000
max    4.000000  400.000000


In [23]:
obj.unique() # Distinct values, similar to building a Python set

array([1, 2, 3, 4])

In [24]:
df['col1'].value_counts()

4    1
3    1
2    1
1    1
Name: col1, dtype: int64

In [25]:
# Multiple comparison, similar to SQL's WHERE ... IN clause

df['col1'].isin([1,2])

0     True
1     True
2    False
3    False
Name: col1, dtype: bool
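The boolean Series that `isin` returns is typically used as a mask to filter rows, which completes the SQL `WHERE ... IN` analogy. A minimal sketch, rebuilding the same example data under the illustrative name `df_mask`:

```python
import pandas as pd

# Same data as the earlier example frame; df_mask is an illustrative name.
df_mask = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [100, 200, 300, 400],
    "col3": ["a", "b", "c", "d"],
})

# True where col1 is one of the listed values.
mask = df_mask["col1"].isin([1, 2])

# Indexing with the mask keeps only the rows where it is True.
selected = df_mask[mask]

# ~ negates the mask, the equivalent of NOT IN.
excluded = df_mask[~mask]

print(selected)
print(excluded)
```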

## Loading data

In [32]:
df = pd.read_csv('test.csv')
df2 = pd.read_json('test.json')
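
The files `test.csv` and `test.json` are not included here, so the cell above only runs if they exist locally. As a self-contained sketch of the same two readers, the example below feeds them in-memory text through `io.StringIO`. The sample contents are invented for illustration.

```python
import io

import pandas as pd

# Invented CSV content standing in for a file on disk.
csv_text = "col1,col2,col3\n1,100,a\n2,200,b\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Invented JSON content: a list of records, one object per row.
json_text = '[{"col1": 1, "col3": "a"}, {"col1": 2, "col3": "b"}]'
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```

Both readers accept a path or a file-like object, so swapping the `StringIO` wrapper for a filename gives the file-based form used above.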

In [33]:
# TODO: learn about HDF5

## Data Cleaning

## Joins and Data Structure Manipulation