# Chapter 11 Pandas Module

In [1]:
# Pandas is a Python library for working with sequential and tabular data. It
# includes many tools to manage, analyze and manipulate data in a convenient
# and efficient manner. Its data structures can be thought of as akin to database
# tables or spreadsheets.

In [2]:
# Pandas is built on top of the NumPy library and has two primary data
# structures, namely Series (1-dimensional) and DataFrame (2-dimensional). It can
# handle both homogeneous and heterogeneous data, and some of its many
# capabilities are:
# • ETL tools (Extraction, Transformation and Load tools)
# • Dealing with missing data (NaN)
# • Dealing with data files (csv, xls, db, hdf5, etc.)
# • Time-series manipulation tools
# In the Python ecosystem, Pandas is a widely used choice for retrieving,
# manipulating, analyzing and transforming financial data.

# 11.1 Pandas Installation

In [None]:
# The official documentation has a detailed explanation, spanning several
# pages, on installing Pandas. We summarize it below.

# 11.1.1 Installing with pip

In [None]:
# The simplest way to install Pandas is from PyPI.
# In a terminal window, run the following command.

## pip install pandas

# In a Jupyter notebook or IPython console, you can prefix the command with '!'
# to run it as a shell command and install pandas without leaving the session.

## !pip install pandas

# Pip is a useful tool to manage Python’s packages and it is worth investing
# some time in knowing it better.

## pip help

# 11.1.2 Installing with Conda environments

In [None]:
# Advanced users who prefer a separate Python environment for each project
# can create a new environment and install pandas in it as shown below.

## conda create -n EPAT python
## conda activate EPAT
## conda install pandas

# 11.1.3 Testing Pandas installation

In [None]:
# Pandas comes with a test suite that exercises almost all of the codebase.
# Running it is a way to check the installation and verify that everything works.

## import pandas as pd
## pd.test()
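
# As a lighter check than the full test suite, the short sketch below simply
# imports Pandas and prints its version. It is assumed here that pd.test()
# needs the optional pytest and hypothesis packages, so a version check is
# often enough to confirm that the import works.

```python
import pandas as pd

# The version string depends on the environment, so no specific value is shown.
version = pd.__version__
print(version)
```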

# 11.2 What problem does Pandas solve?

In [None]:
# Pandas works with homogeneous data series (1-dimensional) and heterogeneous
# tabular data (2-dimensional). It includes a multitude of tools to
# work with these data types, such as:

# • Indexes and labels.
# • Searching for elements.
# • Insertion, deletion and modification of elements.
# • Set-style operations, such as grouping, joining and selecting.
# • Data processing and cleaning.
# • Working with time series.
# • Statistical calculations.
# • Plotting.
# • Connectors for multiple data file formats, such as csv, xlsx and hdf5.
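
# As a small illustration of the data-cleaning tools in the list above, the
# sketch below counts, fills and drops missing values in a Series. The values
# and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing value (NaN).
readings = pd.Series([1.0, np.nan, 3.0])

n_missing = readings.isna().sum()   # number of NaN entries
filled = readings.fillna(0.0)       # replace NaN with a constant
cleaned = readings.dropna()         # drop NaN entries, keeping the original labels

print(n_missing)
print(filled)
print(cleaned)
```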

# 11.3 Pandas Series

In [1]:
# The first data structure in Pandas that we are going to see is the Series.
# A Series is a homogeneous one-dimensional object, that is, all its data are
# of the same type, and each element is implicitly labeled with an index.

In [2]:
# For example, we can have a Series of integers, real numbers, characters,
# strings, dictionaries, etc. We can conveniently manipulate these series
# by performing operations such as adding, deleting, ordering, joining,
# filtering, vectorized operations, statistical analysis, plotting, etc.
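
# To make a few of these operations concrete, here is a minimal sketch that
# filters, sorts and averages a Series. The price values and variable names
# are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical prices; the default index is 0..4.
prices = pd.Series([101.5, 99.0, 102.25, 98.75, 100.0])

above_100 = prices[prices > 100]   # boolean filtering keeps matching elements
ordered = prices.sort_values()     # ascending sort; each value keeps its label
average = prices.mean()            # a simple statistic over all elements

print(above_100)
print(ordered)
print(average)
```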

In [None]:
# Let’s see some examples of how to create and manipulate a Pandas Series:
import pandas as pd
s = pd.Series()
print(s)

Series([], dtype: object)


In [5]:
# Let’s create a Pandas Series of integers and print it:
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5, 6, 7])
print(s)

0    1
1    2
2    3
3    4
4    5
5    6
6    7
dtype: int64


In [6]:
# Let’s create a Pandas Series of characters:
import pandas as pd
s = pd.Series(['a', 'b', 'c', 'd', 'e'])
print(s)

0    a
1    b
2    c
3    d
4    e
dtype: object


In [7]:
# Let’s create a random Pandas Series of float numbers:
import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(5))
print(s)

0   -0.893950
1    1.017228
2   -1.553182
3    1.941077
4   -1.009861
dtype: float64


In [8]:
# In all these examples, we did not specify an index, so Pandas created a
# default one. It starts at 0, and we can check the
# index as:

s.index

RangeIndex(start=0, stop=5, step=1)

In [12]:
# But we can also specify the index we need, for example:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)

a    0.501520
b   -0.394926
c   -1.359427
d   -0.900551
e    0.797512
dtype: float64
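
# The random values above change on every run. If reproducible numbers are
# wanted, one option is a seeded NumPy generator, as in the sketch below. The
# seed value 0 is an arbitrary choice for this example. The sketch also reads
# the labels back from the index.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same draws on every run.
rng = np.random.default_rng(seed=0)
s = pd.Series(rng.standard_normal(5), index=['a', 'b', 'c', 'd', 'e'])

labels = s.index.tolist()   # the custom labels as a plain Python list
print(labels)
print(s)
```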


In [13]:
# Let's create a Pandas Series from a dictionary:
import pandas as pd
dictionary = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
s = pd.Series(dictionary)
print(s)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [None]:
# In this case, the Pandas Series is created with the dictionary keys as its
# index unless we specify another index.
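
# When an explicit index is passed together with a dictionary, the values are
# matched by key. The sketch below, with a made-up label 'z', shows that keys
# missing from the dictionary become NaN and keys missing from the index are
# dropped.

```python
import pandas as pd

dictionary = {'a': 1, 'b': 2, 'c': 3}

# 'a' is left out of the index, and 'z' has no entry in the dictionary.
s = pd.Series(dictionary, index=['b', 'c', 'z'])
print(s)
```

# Because NaN is a floating-point value, the resulting Series has dtype
# float64 rather than int64.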

# 11.3.1 Simple operations with Pandas Series

In [14]:
# When we have a Pandas Series, we can perform several simple operations
# on it. For example, let’s create two Series. One from a dictionary and the
# other from an array of integers:

In [16]:
import pandas as pd
dictionary = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
s1 = pd.Series(dictionary)
print(s1)

array = [1, 2, 3, 4, 5]
s2 = pd.Series(array)
print(s2)

a    1
b    2
c    3
d    4
e    5
dtype: int64
0    1
1    2
2    3
3    4
4    5
dtype: int64


In [17]:
# We can perform operations similar to NumPy arrays:
# Selecting one item from the Pandas Series by its position. We use .iloc
# here because plain s1[0] on a label-indexed Series is deprecated:

s1.iloc[0]


np.int64(1)

In [18]:
s1['a']

np.int64(1)

In [19]:
s2[0]

np.int64(1)

In [20]:
# Selecting several items from the Pandas Series by position:
s1.iloc[[1, 4]]


b    2
e    5
dtype: int64

In [21]:
s1[['b', 'e']]

b    2
e    5
dtype: int64

In [22]:
s2[[1, 4]]

1    2
4    5
dtype: int64
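
# The selections above mix access by label and access by position. A minimal
# sketch of the explicit accessors, .loc for labels and .iloc for positions,
# is given below. The Series s3 is a hypothetical example whose integer
# labels differ from its positions.

```python
import pandas as pd

s1 = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})

by_label = s1.loc['a']          # label-based access
by_position = s1.iloc[0]        # position-based access
subset = s1.iloc[[1, 4]]        # second and fifth elements

# With integer labels, the two accessors can return different elements.
s3 = pd.Series([10, 20, 30], index=[2, 1, 0])
label_zero = s3.loc[0]          # element whose label is 0
position_zero = s3.iloc[0]      # first element

print(by_label, by_position)
print(subset)
print(label_zero, position_zero)
```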

In [23]:
# Get the series starting from an element:
s1[2:]

c    3
d    4
e    5
dtype: int64

In [24]:
s2[2:]

2    3
3    4
4    5
dtype: int64

In [26]:
# Get the series up to one element:
s1[:2]

a    1
b    2
dtype: int64

In [27]:
s2[:2]

0    1
1    2
dtype: int64
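
# Slices can also be written with labels. The sketch below contrasts the two
# forms: a label slice with .loc includes its end label, while a positional
# slice with .iloc excludes its end position, as with Python lists.

```python
import pandas as pd

s1 = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})

label_slice = s1.loc['b':'d']     # includes both 'b' and 'd'
position_slice = s1.iloc[1:3]     # positions 1 and 2 only

print(label_slice)
print(position_slice)
```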

In [28]:
# We can perform operations like a dictionary:

In [42]:
# Assign a value by position:
s1.iloc[1] = 99
print(s1)

a     1
b    99
c     3
d     4
e     5
dtype: int64


In [34]:
s2[1] = 99
print(s2)

0     1
1    99
2     3
3     4
4     5
dtype: int64
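
# A few more dictionary-like operations are sketched below: adding an element
# under a new label, testing whether a label exists, and removing a label.
# The label 'f' and the variable names are made up for this example.

```python
import pandas as pd

s1 = pd.Series({'a': 1, 'b': 2, 'c': 3})

s1['f'] = 6                 # assigning to a new label appends an element
has_f = 'f' in s1           # membership tests look at the index labels
has_z = 'z' in s1
without_a = s1.drop('a')    # returns a new Series; s1 itself is unchanged

print(s1)
print(has_f, has_z)
print(without_a)
```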


In [47]:
# Get a value by index (like dictionary key):
print(s)
s.get('b')

a    1
b    2
c    3
d    4
e    5
dtype: int64


np.int64(2)
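
# As with a dictionary, get() is useful when a label may be absent. The sketch
# below uses a made-up missing label 'z' to compare get() with square-bracket
# access.

```python
import pandas as pd

s = pd.Series({'a': 1, 'b': 2, 'c': 3})

missing = s.get('z')        # returns None when the label is absent
fallback = s.get('z', 0)    # or the default value given as second argument

# Square-bracket access raises KeyError for an absent label.
raised = False
try:
    s['z']
except KeyError:
    raised = True

print(missing, fallback, raised)
```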

In [50]:
# Series also support vectorized operations that let us perform calculations quickly, for example:
# add, subtract, multiply, divide, raise to a power, and apply almost any NumPy function that accepts NumPy arrays.
## s1 + 2
## s1 - 2
## s1 * 2
## s1 / 2
## s1 ** 2
## np.exp(s1)
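
# The commented expressions above can be run as in the sketch below, which
# uses a shortened s1. It also adds a second, hypothetical Series named other
# to show that arithmetic between two Series aligns elements by label, giving
# NaN where a label is present in only one of them.

```python
import numpy as np
import pandas as pd

s1 = pd.Series({'a': 1, 'b': 2, 'c': 3})

doubled = s1 * 2            # element-wise multiplication by a scalar
squared = s1 ** 2           # element-wise power
exponential = np.exp(s1)    # NumPy functions return a Series with the same index

# Operations between two Series match elements by label, not by position.
other = pd.Series({'b': 10, 'c': 20, 'd': 30})
total = s1 + other

print(doubled)
print(squared)
print(exponential)
print(total)
```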