# Introduction to Pandas

This introduction is meant to give you a feel for how we will work with data in Pandas by introducing some of the basic Pandas functions.

We will cover:

*   Creating Series and DataFrames
*   Viewing and sorting data
*   Selecting columns and rows



# Imports

There are two Python packages that we need to import in order to run all of our functions.

## Pandas

Pandas is a powerful data analysis library. We can think of Pandas as a programmable replacement for Excel. It allows us to view and explore data tables (called DataFrames), manipulate data, calculate statistics, and prepare data for plotting.

## Numpy

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

Pandas uses NumPy to hold our arrays of data, so we can apply NumPy functions directly to Pandas objects to manipulate data and make life easier.

In [0]:
import numpy as np
import pandas as pd
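As a minimal sketch of what using NumPy functions on Pandas data looks like, the example below applies `np.sqrt` and `np.sum` to a small Series. The values are made up for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# A small Series with illustrative values (an assumption, not data from this notebook)
values = pd.Series([1.0, 4.0, 9.0])

# NumPy universal functions work element-wise and return a Series
roots = np.sqrt(values)

# NumPy reductions also accept Pandas objects directly
total = np.sum(values)

print(roots)
print(total)
```

The index of `values` is carried over to `roots`, so the result stays aligned with the original data.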

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [0]:
series = pd.Series([1, 3, 5, np.nan, 6, 8])
print(series)

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
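The output above shows `float64` even though most inputs were integers, because `NaN` is a floating-point value. The sketch below illustrates this, and also shows a Series built with explicit labels instead of the default integer index. The labels `'a'`, `'b'`, `'c'` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical labels chosen for illustration
labeled = pd.Series([1, 3, 5], index=['a', 'b', 'c'])

# Without a missing value, the integers keep an integer dtype
print(labeled.dtype)

# Introducing NaN forces a float dtype
with_nan = pd.Series([1, 3, np.nan])
print(with_nan.dtype)

# Look up a value by its label
print(labeled['b'])
```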


Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

In [0]:
# First create the dates index
dates = pd.date_range('20130101', periods=6)
print(dates)

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')


In [0]:
# Now create a dataframe using the dates index
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)

                   A         B         C         D
2013-01-01  1.401968 -1.127411  0.241863  0.492416
2013-01-02 -0.584790  2.007197 -0.086667 -0.266566
2013-01-03  0.782372  1.150672  1.313389 -0.670734
2013-01-04  0.034873  1.360758 -0.085461  0.254689
2013-01-05 -0.680447  1.597861 -0.485678 -0.853043
2013-01-06  0.154399 -0.300130 -0.623483  0.488045
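Because `np.random.randn` draws new random numbers on every run, your values will differ from the ones printed above. If you want repeatable output, one option is a seeded generator. This is a sketch assuming NumPy's `default_rng` API; the seed `0` and the names `df_seeded` and `df_again` are arbitrary choices.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)

# A seeded generator; the seed value 0 is an arbitrary choice
rng = np.random.default_rng(0)
df_seeded = pd.DataFrame(rng.standard_normal((6, 4)),
                         index=dates, columns=list('ABCD'))

# Re-creating the generator with the same seed reproduces the same values
rng_again = np.random.default_rng(0)
df_again = pd.DataFrame(rng_again.standard_normal((6, 4)),
                        index=dates, columns=list('ABCD'))

print(df_seeded.equals(df_again))
```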


Creating a DataFrame by passing a dict of objects that can be converted to Series-like structures:

In [0]:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo


The columns of the resulting DataFrame have different dtypes.

In [0]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
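Once you know the dtypes, you can filter or convert columns by type. The following is a minimal sketch using a cut-down frame (named `frame` here for illustration) with `select_dtypes` and `astype`.

```python
import numpy as np
import pandas as pd

# A reduced version of the frame above, with one float, one int32, and one string column
frame = pd.DataFrame({'A': 1.,
                      'D': np.array([3] * 4, dtype='int32'),
                      'F': 'foo'})

# Keep only the numeric columns
numeric = frame.select_dtypes(include='number')
print(numeric.columns.tolist())

# Convert one column to a different dtype; astype returns a new object
as_float = frame['D'].astype('float64')
print(as_float.dtype)
```

The original column is unchanged by `astype`; assign the result back if you want to keep the conversion.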

# View Data

Here is how to view the top and bottom rows of the frame:

In [0]:
# Show the first 5 rows
df.head()

                   A         B         C         D
2013-01-01  1.401968 -1.127411  0.241863  0.492416
2013-01-02 -0.584790  2.007197 -0.086667 -0.266566
2013-01-03  0.782372  1.150672  1.313389 -0.670734
2013-01-04  0.034873  1.360758 -0.085461  0.254689
2013-01-05 -0.680447  1.597861 -0.485678 -0.853043


In [0]:
# Show the last 3 rows
df.tail(3)

                   A         B         C         D
2013-01-04  0.034873  1.360758 -0.085461  0.254689
2013-01-05 -0.680447  1.597861 -0.485678 -0.853043
2013-01-06  0.154399 -0.300130 -0.623483  0.488045


Display the index and column names:

In [0]:
# Show the index
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [0]:
# Show the columns
df.columns

`DataFrame.to_numpy()` gives a NumPy representation of the underlying data.

In [0]:
df.to_numpy()

In [0]:
df2.to_numpy()
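The two calls above behave differently because a NumPy array has a single dtype for all of its elements. The sketch below, using two small made-up frames, shows how an all-float frame converts to a float array while a frame with mixed column types falls back to `object`.

```python
import numpy as np
import pandas as pd

# All columns are floats, so the array keeps a float dtype
floats = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})
print(floats.to_numpy().dtype)

# Floats and strings share no common numeric dtype, so the array uses object
mixed = pd.DataFrame({'A': [1.0, 2.0], 'F': ['foo', 'bar']})
print(mixed.to_numpy().dtype)
```

Note that the resulting array does not include the index or column labels.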

`describe()` shows a quick statistical summary of your data:

In [0]:
df.describe()
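To make the shape of the summary concrete, here is a small sketch on a made-up single-column frame. The result of `describe()` is itself a DataFrame, so individual statistics can be read out with `.loc`.

```python
import pandas as pd

# Illustrative data (an assumption, not from this notebook)
small = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0]})
summary = small.describe()

# Rows of the summary are statistics; columns match the numeric columns of the frame
print(summary.index.tolist())

# Read a single statistic by row and column label
print(summary.loc['mean', 'A'])
```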

Transposing your data:

In [0]:
df.T
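Transposing swaps rows and columns, so the column labels become the index. A minimal sketch with a placeholder frame of zeros (the name `grid` is hypothetical):

```python
import numpy as np
import pandas as pd

# A 6x4 placeholder frame; the values do not matter for this illustration
grid = pd.DataFrame(np.zeros((6, 4)), columns=list('ABCD'))
flipped = grid.T

# The shape is reversed and the old column labels are now the index
print(grid.shape, flipped.shape)
print(flipped.index.tolist())
```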

Sorting by an axis:

In [0]:
df.sort_index(axis=1, ascending=False)

Sorting by values:

In [0]:
df.sort_values(by='B')
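Since the frame above holds random values, the effect of sorting is easier to see on fixed data. The sketch below uses a small made-up frame (`scores` is a hypothetical name) to contrast sorting column labels with sorting rows by value.

```python
import pandas as pd

# Illustrative data with columns deliberately out of alphabetical order
scores = pd.DataFrame({'B': [3, 1, 2], 'A': [30, 10, 20]},
                      index=['x', 'y', 'z'])

# axis=1 sorts the column labels; ascending=False gives reverse alphabetical order
by_columns = scores.sort_index(axis=1, ascending=False)
print(by_columns.columns.tolist())

# sort_values reorders the rows by the values in column 'B'
by_values = scores.sort_values(by='B')
print(by_values.index.tolist())

# Both methods return a new DataFrame and leave the original untouched
print(scores.index.tolist())
```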

# Selecting Data

Selecting a single column, which yields a Series:

In [0]:
# First way to select a column
df['A']

2013-01-01    1.401968
2013-01-02   -0.584790
2013-01-03    0.782372
2013-01-04    0.034873
2013-01-05   -0.680447
2013-01-06    0.154399
Freq: D, Name: A, dtype: float64

In [0]:
# Second way to select a column
df.A

2013-01-01    1.401968
2013-01-02   -0.584790
2013-01-03    0.782372
2013-01-04    0.034873
2013-01-05   -0.680447
2013-01-06    0.154399
Freq: D, Name: A, dtype: float64
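The two forms give the same result here, but attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. The following sketch uses a made-up frame with hypothetical column names to show where bracket access is required.

```python
import pandas as pd

# Hypothetical column names chosen to show the limits of attribute access
table = pd.DataFrame({'A': [1, 2],
                      'unit price': [9.5, 7.0],
                      'count': [3, 4]})

# Both forms return the same Series for a simple column name
print(table['A'].equals(table.A))

# A name containing a space is only reachable through brackets
print(table['unit price'].tolist())

# 'count' is also a DataFrame method, so table.count is the method, not the column
print(callable(table.count))
print(table['count'].tolist())
```

Bracket access is the safer default when column names come from external data.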

Selecting a row:

In [0]:
# Select the row whose index label is the first date
df.loc[dates[0]]
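Rows can be selected by label with `.loc` or by integer position with `.iloc`. The sketch below contrasts the two on a small made-up frame; the name `frame` and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=3)
frame = pd.DataFrame(np.arange(6).reshape(3, 2),
                     index=dates, columns=['A', 'B'])

# Label-based: pass an index label to .loc
first_by_label = frame.loc[dates[0]]

# Position-based: pass an integer position to .iloc
first_by_position = frame.iloc[0]

# Both select the same row, returned as a Series
print(first_by_label.equals(first_by_position))

# A label slice with .loc includes both endpoints
print(len(frame.loc['2013-01-01':'2013-01-02']))
```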