# NumPy Module

In [1]:
# NumPy, short for Numerical Python, is a package for efficient scientific
# computing in Python. It includes random number generation
# capabilities, functions for basic linear algebra, Fourier transforms, and
# tools for integrating Fortran and C/C++ code, along with many
# other features.

In [2]:
# NumPy is an open-source project and a successor to two earlier scientific
# Python libraries: Numeric and Numarray.

In [3]:
# It can be used as an efficient multi-dimensional container of generic
# data. This allows NumPy to integrate with a wide variety of databases
# seamlessly. It also features a collection of routines for processing
# one-dimensional and multidimensional collections of values, known as arrays.

In [None]:
# NumPy is not a part of the Python Standard Library and hence, as with
# any other such library or module, it needs to be installed on a workstation
# before it can be used. Depending on the Python distribution one uses, it can
# be installed via a command prompt, conda prompt, or terminal using the
# following command. One point to note is that if we use the Anaconda
# distribution to install Python, most of the libraries used in the scientific
# Python ecosystem (NumPy, pandas, scikit-learn, matplotlib, etc.) come pre-installed.

# # pip install numpy

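# Note: If we install NumPy from within an IPython console or a Jupyter notebook, the command
# must be preceded by the character !, that is, !pip install numpy. The ! prefix is an IPython
# feature and does not work in the plain Python console.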

In [None]:
# Once installed, we can use NumPy in our program by importing it with the import statement.
# The conventional way of importing it is shown below:

# # import numpy as np

# Here, the NumPy library is imported with an alias of np so that any functionality within it can be used with convenience.
# We will be using this form of alias for all examples in this section.
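
# A quick way to confirm that the installation worked is to import the library and
# print its version. This is only a minimal check; the version string printed will
# depend on what is installed on the workstation.

```python
# Import NumPy under its conventional alias
import numpy as np

# If the import succeeds, NumPy is installed; show which version we have
print(np.__version__)
```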

# 10.1 NumPy Arrays

# A Python list is a powerful sequential data structure with some useful
# features. For example, it can hold elements of various data types, which
# can be added, changed or removed as required. It also allows index
# subsetting and traversal. But lists lack an important feature that is needed
# for data analysis tasks. We often want to carry out operations
# over an entire collection of elements, and we expect Python to perform them
# fast. With lists, executing such operations over all elements efficiently is a
# problem. For example, let’s consider a case where we calculate the PCR (Put
# Call Ratio) for the previous 5 days. Say we have put and call option
# volumes (in lacs) stored in the lists put_vol and call_vol respectively. We then
# compute the PCR by dividing put volume by call volume, as illustrated in
# the script below:

In [4]:
# put volume in lacs
put_vol = [52.89, 45.14, 63.84, 77.1, 74.6]

# call volume in lacs
call_vol = [49.51, 50.45, 59.11, 80.49, 65.11]

# Computing Put Call Ratio (PCR)
put_vol / call_vol

TypeError: unsupported operand type(s) for /: 'list' and 'list'

In [5]:
# Unfortunately, Python threw an error while calculating the PCR values because
# the / operator is not defined for lists. We can work around this by iterating
# over the items in both lists and calculating the PCR for each day separately.
# However, doing so is inefficient and tedious. A far more elegant
# solution is to use NumPy arrays, an alternative to the regular Python list.
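
# For comparison, the list-based workaround mentioned above could look like the
# sketch below. It pairs up the two lists with zip() and divides them one day at a
# time. The name pcr_list is chosen here for illustration only.

```python
# Put and call volumes in lacs, as defined earlier
put_vol = [52.89, 45.14, 63.84, 77.1, 74.6]
call_vol = [49.51, 50.45, 59.11, 80.49, 65.11]

# Walk through both lists in parallel and divide day by day
pcr_list = [put / call for put, call in zip(put_vol, call_vol)]

# Round for display only; pcr_list keeps the full precision
print([round(value, 4) for value in pcr_list])
```

# This works, but the loop has to be written out for every operation we want to
# perform, and it runs in pure Python rather than in optimized compiled code.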

In [6]:
# The NumPy array is quite similar to the list, but has one useful feature: we
# can perform operations over entire arrays (all elements in the arrays). It is easy
# as well as very fast. Let us start by creating a NumPy array. To do this,
# we use the array() function from the NumPy package and create NumPy
# versions of the put_vol and call_vol lists.

In [8]:
# Importing NumPy library
import numpy as np

# Creating arrays
n_put_vol = np.array(put_vol)
n_call_vol = np.array(call_vol)

n_put_vol

array([52.89, 45.14, 63.84, 77.1 , 74.6 ])

In [9]:
n_call_vol

array([49.51, 50.45, 59.11, 80.49, 65.11])

In [10]:
# Here, we have two arrays, n_put_vol and n_call_vol, which hold put and
# call volumes respectively. Now, we can calculate the PCR in one line:

# Computing Put Call Ratio (PCR)
pcr = n_put_vol / n_call_vol

pcr

array([1.06826904, 0.89474727, 1.0800203 , 0.95788297, 1.14575334])

In [11]:
# This time it worked, and the calculations were performed element-wise. The
# first element in the pcr array was calculated by dividing the first element
# in n_put_vol by the first element in the n_call_vol array. The second element
# in pcr was computed using the second element in the respective arrays,
# and so on.

In [12]:
# First, when we tried to compute the PCR with regular lists, we got an error
# because Python cannot do calculations with lists the way we want it to. Then
# we converted these regular lists to NumPy arrays and the same operation
# worked without any problem. NumPy works with arrays as if they
# were scalars. But we need to pay attention here. NumPy can do this easily
# because it assumes that an array contains values of a single type only: it is
# an array of integers, of floats, of booleans and so on. If we try to create
# an array from values of different types, like the one below, the resulting
# NumPy array will still contain a single type only, strings in the case below:

np.array([1, 'Python', True])

array(['1', 'Python', 'True'], dtype='<U21')

In [None]:
# NOTE: NumPy arrays are designed to be homogeneous because of the
# mathematical operations that are performed on them. Such operations
# would not be possible in the same way with heterogeneous data.
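
# The single-type rule can be inspected through the dtype attribute of an array.
# The sketch below uses illustrative array names and checks dtype.kind, a
# one-letter code for the general type, because the exact integer and string sizes
# can differ between platforms.

```python
import numpy as np

# All integers: the array stays an integer array
int_arr = np.array([1, 2, 3])

# Integers mixed with a float: everything is upcast to float
float_arr = np.array([1, 2.5, 3])

# Integer, string and boolean mixed: everything becomes a string
str_arr = np.array([1, 'Python', True])

# dtype.kind is 'i' for integers, 'f' for floats and 'U' for Unicode strings
print(int_arr.dtype.kind)    # i
print(float_arr.dtype.kind)  # f
print(str_arr.dtype.kind)    # U
```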

In [None]:
# In the example given above, an integer and a boolean were both converted
# to strings. The NumPy array is a new type of data structure, just like the Python
# list type that we have seen before. This also means that it comes with its
# own methods and operators, which behave differently from those of other types.
# Let us apply the + operation to Python lists and NumPy arrays and see how
# they differ.

In [None]:
# Creating lists
list_1 = [1, 2, 3]
list_2 = [5, 6, 4]

# Adding two lists
list_1 + list_2

[1, 2, 3, 5, 6, 4]

In [15]:
# Creating arrays
arr_1 = np.array([1, 2, 3])
arr_2 = np.array([5, 6, 4])

# Adding two arrays
arr_1 + arr_2

array([6, 8, 7])

In [16]:
# As can be seen in the above example, when we perform the + operation on
# list_1 and list_2, the list elements are pasted together, generating a list
# with 6 elements. On the other hand, if we do this with NumPy arrays,
# the result is an element-wise sum of the arrays.
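
# The same difference shows up with other operators. The sketch below, which
# redefines the lists and arrays so that it runs on its own, shows the * operator
# and also how to get each behaviour from the other type when we need it.

```python
import numpy as np

list_1 = [1, 2, 3]
list_2 = [5, 6, 4]
arr_1 = np.array([1, 2, 3])
arr_2 = np.array([5, 6, 4])

# * repeats a list, but multiplies every element of an array
print(list_1 * 2)  # [1, 2, 3, 1, 2, 3]
print(arr_1 * 2)   # [2 4 6]

# An element-wise sum of two lists needs an explicit loop
list_sum = [a + b for a, b in zip(list_1, list_2)]
print(list_sum)    # [6, 8, 7]

# Joining two arrays end to end needs np.concatenate()
joined = np.concatenate([arr_1, arr_2])
print(joined)      # [1 2 3 5 6 4]
```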

# 10.1.1 N-dimensional arrays

In [17]:
# Until now we have worked with two arrays: n_put_vol and n_call_vol. If
# we check their type using type(), Python tells us that they are of type
# numpy.ndarray, as shown below:

# Checking array type
type(n_put_vol)

numpy.ndarray

In [18]:
# Based on the output we got, it can be inferred that they are of data type
# ndarray, which stands for n-dimensional array within NumPy. These arrays
# are one-dimensional, but NumPy also allows us to create two-dimensional,
# three-dimensional and higher-dimensional arrays. We will stick to two
# dimensions for our learning purpose in this module. We can create a 2D
# (two-dimensional) NumPy array from a regular Python list of lists. Let us
# create one array for all put and call volumes.

In [19]:
# Recalling put and call volumes lists
put_vol

[52.89, 45.14, 63.84, 77.1, 74.6]

In [20]:
call_vol

[49.51, 50.45, 59.11, 80.49, 65.11]

In [21]:
# Creating a two-dimensional array
n_2d = np.array([put_vol, call_vol])
n_2d

array([[52.89, 45.14, 63.84, 77.1 , 74.6 ],
       [49.51, 50.45, 59.11, 80.49, 65.11]])
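
# The dimensions of an array can be checked with its ndim and shape attributes.
# The sketch below rebuilds n_2d so that it runs on its own and also picks out
# single rows by index.

```python
import numpy as np

put_vol = [52.89, 45.14, 63.84, 77.1, 74.6]
call_vol = [49.51, 50.45, 59.11, 80.49, 65.11]
n_2d = np.array([put_vol, call_vol])

# Number of dimensions and the size along each dimension (rows, columns)
print(n_2d.ndim)   # 2
print(n_2d.shape)  # (2, 5)

# Each inner list became one row: row 0 holds puts, row 1 holds calls
put_row = n_2d[0]
call_row = n_2d[1]
print(put_row)
print(call_row)
```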

In [None]:
# We see that the n_2d array is a rectangular data structure. Each list
# provided to the np.array creation function corresponds to a row in the
# two-dimensional NumPy array. The NumPy rule also applies to 2D arrays: an
# array can only contain a single type. If we change one float value in the
# above array definition to a string, all the array elements will be coerced to
# strings, to end up with a homogeneous array. We can think of a 2D array as an
# advanced version of a list of lists. We can perform element-wise operations
# on 2D arrays just as we did on one-dimensional arrays.
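
# Both points can be illustrated with a short sketch. It rebuilds n_2d so that it
# runs on its own; the names doubled, pcr and mixed, as well as the 'n/a'
# placeholder value, are chosen here for illustration only.

```python
import numpy as np

put_vol = [52.89, 45.14, 63.84, 77.1, 74.6]
call_vol = [49.51, 50.45, 59.11, 80.49, 65.11]
n_2d = np.array([put_vol, call_vol])

# An operation with a scalar is applied to every element of the 2D array
doubled = n_2d * 2
print(doubled.shape)  # (2, 5)

# Rows are themselves arrays, so the PCR is still a one-line computation
pcr = n_2d[0] / n_2d[1]
print(pcr)

# Replacing a single float with a string coerces the whole array to strings
mixed = np.array([[52.89, 'n/a'], [49.51, 50.45]])
print(mixed.dtype.kind)  # U
```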