# 10 Minutes to pandas

This is a short introduction to pandas.

In [2]:
import numpy as np
import pandas as pd

## Basic Data Structures in Pandas

Pandas provides two types of classes for handling data:

`Series`: a one-dimensional labeled array holding data of any type, such as integers, strings, or Python objects.

`DataFrame`: a two-dimensional data structure that holds data like a two-dimensional array or a table with rows and columns.
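
To make the relationship between the two classes concrete, the sketch below builds a small `DataFrame` and pulls one column out of it. The column names and values are made up for illustration.

```python
import pandas as pd

# A small table; the labels "name" and "age" are illustrative only.
people = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 45]})

# Selecting a single column returns a Series that keeps the frame's row labels.
ages = people["age"]

print(type(people).__name__)  # DataFrame
print(type(ages).__name__)    # Series
print(people.shape)           # (2, 2): two rows, two columns
```

Each column of a `DataFrame` can be thought of as a `Series`, and all columns share the same row index.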

## Object Creation

Creating a `Series` by passing a list of values, letting pandas create a default `RangeIndex`.



In [3]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
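
The `float64` dtype in the output above comes from `np.nan`, which is a floating-point value. A minimal sketch, using the same list with and without the missing value, shows the difference and checks the default index type:

```python
import numpy as np
import pandas as pd

with_nan = pd.Series([1, 3, 5, np.nan, 6, 8])
without_nan = pd.Series([1, 3, 5, 6, 8])

print(with_nan.dtype)                  # float64, because NaN is a float
print(without_nan.dtype)               # an integer dtype (typically int64)
print(type(with_nan.index).__name__)   # RangeIndex
print(with_nan.isna().sum())           # 1 missing value
```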

Creating a `DataFrame` by passing a NumPy array with a datetime index using `date_range()` and labeled columns:

In [14]:
dates = pd.date_range("20130101", periods=6)
dates

In [22]:
df = pd.DataFrame(np.random.rand(6, 4), index=dates, columns=list("ABCD"))
df

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.031456 | 0.606786 | 0.489153 | 0.608915 |
| 2013-01-02 | 0.437252 | 0.193781 | 0.698477 | 0.49723 |
| 2013-01-03 | 0.560852 | 0.689439 | 0.364026 | 0.950522 |
| 2013-01-04 | 0.43039 | 0.752366 | 0.57199 | 0.086978 |
| 2013-01-05 | 0.369941 | 0.32652 | 0.587998 | 0.773797 |
| 2013-01-06 | 0.54852 | 0.519458 | 0.591992 | 0.354832 |
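
Because `np.random.rand` draws new values on every call, the numbers in this frame will differ from run to run. One way to make the frame reproducible is sketched here with NumPy's `default_rng` and an arbitrary seed of 0. The seed and the name `df_seeded` are assumptions for illustration, not something the original notebook uses.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability
dates = pd.date_range("20130101", periods=6)

# rng.random((6, 4)) draws uniform values in [0, 1), like np.random.rand(6, 4).
df_seeded = pd.DataFrame(rng.random((6, 4)), index=dates, columns=list("ABCD"))

print(df_seeded.shape)     # (6, 4)
print(df_seeded.index[0])  # 2013-01-01 00:00:00
```

Re-running the block with the same seed should produce the same values.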


Creating a `DataFrame` by passing a dictionary of objects where the keys are the column labels and the values are the column values.

In [23]:
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
df2

| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
| 2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |


The columns of the resulting DataFrame have different dtypes:



In [24]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
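
In the dictionary passed to `pd.DataFrame` for `df2`, scalars such as `1.0` and `"foo"` are repeated to match the length of the array-like values. A reduced sketch of the same idea, with made-up column names, is:

```python
import numpy as np
import pandas as pd

small = pd.DataFrame(
    {
        "scalar": 1.0,                             # broadcast to every row
        "ints": np.array([3] * 4, dtype="int32"),  # sets the length to 4
        "label": pd.Categorical(["test", "train", "test", "train"]),
    }
)

print(len(small))                # 4
print(small["scalar"].tolist())  # [1.0, 1.0, 1.0, 1.0]
print(small["ints"].dtype)       # int32
print(small["label"].dtype)      # category
```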

## Viewing Data

Use `DataFrame.head()` and `DataFrame.tail()` to view the top and bottom rows of the frame respectively:


In [27]:
df.head(3)  # the argument is the number of rows to show

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.031456 | 0.606786 | 0.489153 | 0.608915 |
| 2013-01-02 | 0.437252 | 0.193781 | 0.698477 | 0.49723 |
| 2013-01-03 | 0.560852 | 0.689439 | 0.364026 | 0.950522 |


In [26]:
df.tail(3)

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-04 | 0.43039 | 0.752366 | 0.57199 | 0.086978 |
| 2013-01-05 | 0.369941 | 0.32652 | 0.587998 | 0.773797 |
| 2013-01-06 | 0.54852 | 0.519458 | 0.591992 | 0.354832 |
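
Both methods default to five rows when called without an argument. A minimal sketch on a ten-row frame (the column name `n` is illustrative):

```python
import pandas as pd

frame = pd.DataFrame({"n": range(10)})

print(len(frame.head()))            # 5, the default
print(len(frame.tail()))            # 5, the default
print(frame.head(3)["n"].tolist())  # [0, 1, 2]
print(frame.tail(3)["n"].tolist())  # [7, 8, 9]
```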


Display the `DataFrame.index` or `DataFrame.columns`:




In [29]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [30]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')
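
Both attributes are `Index` objects, so they support `len()`, membership tests, and conversion to a list. The sketch below rebuilds a frame shaped like `df`, filled with zeros rather than random numbers to keep it deterministic:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
frame = pd.DataFrame(np.zeros((6, 4)), index=dates, columns=list("ABCD"))

print(len(frame.index))        # 6
print(frame.columns.tolist())  # ['A', 'B', 'C', 'D']
print("B" in frame.columns)    # True
print(frame.index.min())       # 2013-01-01 00:00:00
```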

`DataFrame.to_numpy()` returns a NumPy representation of the underlying data, without the index or column labels:



In [32]:
df.to_numpy()

array([[0.03145569, 0.60678625, 0.4891535 , 0.60891543],
       [0.43725203, 0.19378103, 0.69847672, 0.49722966],
       [0.56085179, 0.68943897, 0.36402606, 0.95052216],
       [0.43039028, 0.75236567, 0.57198954, 0.08697836],
       [0.36994131, 0.32652034, 0.58799848, 0.77379692],
       [0.54852039, 0.51945754, 0.59199212, 0.35483154]])
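
`df` holds only floats, so the resulting array is `float64`. When columns have different dtypes, NumPy needs a single dtype that can hold every value, which is often `object`. A minimal sketch with illustrative column names:

```python
import pandas as pd

floats = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
mixed = pd.DataFrame({"a": [1.0, 2.0], "b": ["x", "y"]})

print(floats.to_numpy().dtype)  # float64
print(mixed.to_numpy().dtype)   # object
print(floats.to_numpy().shape)  # (2, 2); labels are not included
```

Converting a mixed-dtype frame can therefore be more expensive than converting a homogeneous one.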

`describe()` shows a quick statistical summary of your data:



In [33]:
df.describe()

| | A | B | C | D |
|---|---|---|---|---|
| count | 6.0 | 6.0 | 6.0 | 6.0 |
| mean | 0.396402 | 0.514725 | 0.550606 | 0.545379 |
| std | 0.193336 | 0.216297 | 0.113157 | 0.306304 |
| min | 0.031456 | 0.193781 | 0.364026 | 0.086978 |
| 25% | 0.385054 | 0.374755 | 0.509863 | 0.390431 |
| 50% | 0.433821 | 0.563122 | 0.579994 | 0.553073 |
| 75% | 0.520703 | 0.668776 | 0.590994 | 0.732577 |
| max | 0.560852 | 0.752366 | 0.698477 | 0.950522 |
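
To see where each row of the summary comes from, the sketch below runs `describe()` on four known values (the column name `x` is illustrative) and reads individual entries back out:

```python
import pandas as pd

frame = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
summary = frame.describe()

print(summary.loc["count", "x"])  # 4.0
print(summary.loc["mean", "x"])   # 2.5
print(summary.loc["50%", "x"])    # 2.5, the median
print(summary.loc["max", "x"])    # 4.0

# The "std" row is the sample standard deviation, as returned by Series.std().
std_gap = abs(summary.loc["std", "x"] - frame["x"].std())
```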


Transposing your data:



In [39]:
df.T

| | 2013-01-01 | 2013-01-02 | 2013-01-03 | 2013-01-04 | 2013-01-05 | 2013-01-06 |
|---|---|---|---|---|---|---|
| A | 0.031456 | 0.437252 | 0.560852 | 0.43039 | 0.369941 | 0.54852 |
| B | 0.606786 | 0.193781 | 0.689439 | 0.752366 | 0.32652 | 0.519458 |
| C | 0.489153 | 0.698477 | 0.364026 | 0.57199 | 0.587998 | 0.591992 |
| D | 0.608915 | 0.49723 | 0.950522 | 0.086978 | 0.773797 | 0.354832 |
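
Transposing swaps the row and column labels along with the values. A small sketch with illustrative labels:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["r1", "r2", "r3"])
flipped = frame.T

print(frame.shape)               # (3, 2)
print(flipped.shape)             # (2, 3)
print(flipped.index.tolist())    # ['A', 'B']
print(flipped.columns.tolist())  # ['r1', 'r2', 'r3']
print(flipped.loc["B", "r2"])    # 5, same value as frame.loc["r2", "B"]
```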


`DataFrame.sort_index()` sorts by an axis:



In [40]:
df.sort_index(axis=1, ascending=False)

| | D | C | B | A |
|---|---|---|---|---|
| 2013-01-01 | 0.608915 | 0.489153 | 0.606786 | 0.031456 |
| 2013-01-02 | 0.49723 | 0.698477 | 0.193781 | 0.437252 |
| 2013-01-03 | 0.950522 | 0.364026 | 0.689439 | 0.560852 |
| 2013-01-04 | 0.086978 | 0.57199 | 0.752366 | 0.43039 |
| 2013-01-05 | 0.773797 | 0.587998 | 0.32652 | 0.369941 |
| 2013-01-06 | 0.354832 | 0.591992 | 0.519458 | 0.54852 |
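
The `axis` argument selects which labels are sorted: `axis=0` (the default) sorts the row labels, and `axis=1` sorts the column labels. A minimal sketch with deliberately unsorted, illustrative labels:

```python
import pandas as pd

frame = pd.DataFrame(
    {"B": [1, 2, 3], "A": [4, 5, 6]},
    index=["y", "z", "x"],
)

by_rows = frame.sort_index()                              # axis=0: sort row labels
by_cols = frame.sort_index(axis=1)                        # axis=1: sort column labels
by_cols_desc = frame.sort_index(axis=1, ascending=False)  # reverse column order

print(by_rows.index.tolist())         # ['x', 'y', 'z']
print(by_cols.columns.tolist())       # ['A', 'B']
print(by_cols_desc.columns.tolist())  # ['B', 'A']
```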


`DataFrame.sort_values()` sorts by values:



In [41]:
df.sort_values(by="B")

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-02 | 0.437252 | 0.193781 | 0.698477 | 0.49723 |
| 2013-01-05 | 0.369941 | 0.32652 | 0.587998 | 0.773797 |
| 2013-01-06 | 0.54852 | 0.519458 | 0.591992 | 0.354832 |
| 2013-01-01 | 0.031456 | 0.606786 | 0.489153 | 0.608915 |
| 2013-01-03 | 0.560852 | 0.689439 | 0.364026 | 0.950522 |
| 2013-01-04 | 0.43039 | 0.752366 | 0.57199 | 0.086978 |
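
`sort_values()` also accepts `ascending=False` and a list of columns for `by`. The sketch below uses made-up values with a tie in column `B` to show how a second key breaks it:

```python
import pandas as pd

frame = pd.DataFrame({"B": [2, 1, 2], "C": [9, 8, 7]}, index=["r1", "r2", "r3"])

descending = frame.sort_values(by="B", ascending=False)
two_keys = frame.sort_values(by=["B", "C"])  # ties in B are broken by C

print(descending["B"].tolist())  # [2, 2, 1]
print(two_keys.index.tolist())   # ['r2', 'r3', 'r1']
print(frame.index.tolist())      # ['r1', 'r2', 'r3']; the original is unchanged
```

Like `sort_index()`, the method returns a new frame by default rather than modifying the one it is called on.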
