
# Linear Regression in Python 3.x
## Python Libraries for Data Science - NumPy, Matplotlib
### Anirudh Jonnalagadda, PhD
##### Shell Postdoctoral Fellow @ CDS, IISc
###### (anirudhj@iisc.ac.in)

### NUMerical Python - The fundamental package for scientific computing with Python (numpy.org)

- Essentially, NumPy lets you create and manipulate arrays within Python.
- You may wonder: what's the big deal? Can't I just use the lists that Python provides by default?

In [None]:
# creating lists (both forms work in Python 2 and Python 3)
l1 = [1, 2, 3, 4, 5]       # list literal
l = list([1, 2, 3, 4, 5])  # list() constructor (more explicit, but builds the same list)

In [None]:
# limitations of list operations: adding a number to a list raises a TypeError
l+5
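
The expression above fails because, for lists, `+` means concatenation and only accepts another list. A minimal sketch of what the list operators actually do (the variable names `raised`, `concatenated` and `repeated` are illustrative, not part of the original notebook):

```python
l = [1, 2, 3, 4, 5]

# `+` between a list and an int is not defined, so Python raises a TypeError
try:
    l + 5
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)

# `+` between two lists concatenates them; it does not add element-wise
concatenated = l + [5]
print(concatenated)

# `*` with an int repeats the list; it does not scale the elements
repeated = l * 2
print(repeated)
```

Neither operator performs arithmetic on the elements, which is why element-wise work on lists needs an explicit loop or comprehension.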

In [None]:
# Two ways to add a number to every element of a list

# Use for loops
new_list = list()
for i in range(len(l)): # len(l): length of l & range(n) iterates over the range [0, n)
    new_list.append(l[i] + 5)
print("new_list = {}".format(new_list)) # Python3 style of printing 

# Use list comprehension
new_list2 = [l[i]+5 for i in range(len(l))]
print("new_list2 = {}".format(new_list2))

In [None]:
%%timeit
new_list = list()
for i in range(len(l)):
    new_list.append(l[i] + 5)

In [None]:
%%timeit
[l[i]+5 for i in range(len(l))]

_The list comprehension is faster than the vanilla for loop!_

What about NumPy?

In [None]:
import numpy # general convention is import numpy as np

A = numpy.array(l)
A

In [None]:
A + 5

In [None]:
%%timeit
A+5

__For only 5 elements, the performance improvement isn't that great.__

But let's look at 25 data values instead of 5...

In [None]:
l1 = [i for i in range(25)]

In [None]:
%%timeit
new_list = list()
for i in range(len(l1)):
    new_list.append(l1[i] + 5)

In [None]:
%%timeit
[l1[i]+5 for i in range(len(l1))]

In [None]:
A = numpy.array(l1)

In [None]:
%%timeit
A+5

__NumPy improves your efficiency, both in terms of computational cost and in terms of how you think about the problem.__
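
The `%%timeit` cells above only work inside IPython or Jupyter. As a rough sketch, the same comparison can be run in a plain Python script with the standard-library `timeit` module. The helper names (`add_with_loop`, `add_with_comprehension`, `add_with_numpy`, `time_all`) and the chosen sizes are assumptions made for illustration, and the measured times will differ from machine to machine.

```python
import timeit

import numpy


def add_with_loop(values, k):
    """Add k to every element using an explicit for loop."""
    out = []
    for v in values:
        out.append(v + k)
    return out


def add_with_comprehension(values, k):
    """Add k to every element using a list comprehension."""
    return [v + k for v in values]


def add_with_numpy(array, k):
    """Add k to every element using NumPy's vectorized addition."""
    return array + k


def time_all(size, repeats=200):
    """Return the total time in seconds of `repeats` calls of each approach."""
    values = list(range(size))
    array = numpy.array(values)
    return {
        "loop": timeit.timeit(lambda: add_with_loop(values, 5), number=repeats),
        "comprehension": timeit.timeit(
            lambda: add_with_comprehension(values, 5), number=repeats
        ),
        "numpy": timeit.timeit(lambda: add_with_numpy(array, 5), number=repeats),
    }


for size in (5, 25, 1000):
    timings = time_all(size)
    print(size, {name: "{:.2e} s".format(t) for name, t in timings.items()})
```

For very small inputs the fixed overhead of a NumPy call can dominate, so the gap typically only becomes clear as the number of elements grows.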

### Quick Demo of how to write efficient code with NumPy

- Use __built-in__ NumPy methods as much as possible. The NumPy developers have spent years optimizing these methods.
- Make use of array slices as much as possible, and use as few for loops as possible.
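
Both bullets can be illustrated with a small sketch that computes the differences between consecutive elements in three ways: an explicit loop, array slices, and the built-in `numpy.diff`. The array `values` and the result names are made up for this example.

```python
import numpy

values = numpy.array([1.0, 4.0, 9.0, 16.0, 25.0])

# Loop version: one Python-level iteration per element
diff_loop = numpy.zeros(len(values) - 1)
for i in range(len(values) - 1):
    diff_loop[i] = values[i + 1] - values[i]

# Slice version: values[1:] drops the first element, values[:-1] drops the last,
# so the subtraction pairs every element with its left neighbour
diff_slice = values[1:] - values[:-1]

# Built-in version: numpy.diff computes the same consecutive differences
diff_builtin = numpy.diff(values)

print(diff_slice)  # [3. 5. 7. 9.]
```

All three produce the same array; the slice and built-in versions push the iteration into NumPy's compiled code instead of the Python interpreter.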

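We consider a simple numerical derivative example. We want to find the value of $\frac{d}{dx}\sin(x)$ over $[0, 2\pi]$ using the central-difference definition of the derivative, i.e. $\frac{dy}{dx} = \lim\limits_{h\to0}\frac{y(x+h/2)-y(x-h/2)}{h}$. On a grid with spacing $h$, applying this quotient to two adjacent points $x_i$ and $x_{i+1}$ approximates the derivative at their midpoint $(x_i + x_{i+1})/2$.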



In [None]:
n = 101                     # number of grid points
a = 0                       # lower bound  
b = 2*numpy.pi              # upper bound
x = numpy.linspace(a, b, n) # equally spaced grid
y = numpy.sin(x)            # function to differentiate

In [None]:
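def numpy_derivative():
    # differences between neighbouring grid points, computed with array slices
    dy = y[1:]-y[:-1]
    dx = x[1:]-x[:-1]
    dy_dx = dy/dx # derivative estimate at the midpoint of each grid interval
    return dy_dx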

def std_derivative():
    h = (b-a)/(n-1) # grid spacing: n points from linspace span n-1 intervals
    dy_dx = numpy.zeros(n-1) # array initialized with zeros
    for i in range(n-1):
        dy_dx[i] = (y[i+1]-y[i])/h
    return dy_dx
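
As a sanity check, the finite-difference estimate can be compared with the exact derivative $\cos(x)$ at the interval midpoints. The sketch below is self-contained and rebuilds the same grid; the names `midpoints`, `approx`, `exact` and `max_error` are illustrative. It relies on `numpy.linspace` including both endpoints by default, so `n` points give `n-1` intervals of width `(b-a)/(n-1)`.

```python
import numpy

n = 101                       # number of grid points
a, b = 0.0, 2 * numpy.pi      # interval bounds
x = numpy.linspace(a, b, n)   # equally spaced grid, both endpoints included
y = numpy.sin(x)

# n points span n-1 intervals, so this is the grid spacing
h = (b - a) / (n - 1)
assert numpy.allclose(numpy.diff(x), h)

# Difference quotient on each interval, compared with cos at the midpoint
midpoints = (x[1:] + x[:-1]) / 2.0
approx = (y[1:] - y[:-1]) / h
exact = numpy.cos(midpoints)

max_error = numpy.max(numpy.abs(approx - exact))
print("h = {:.4f}, max error = {:.2e}".format(h, max_error))
```

Because the quotient is centred on the midpoint, its error shrinks proportionally to $h^2$, so doubling the number of intervals should reduce the maximum error by roughly a factor of four.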

## Matplotlib

[Matplotlib](https://matplotlib.org/) is Python's plotting package. It is extensively documented and provides many [examples](https://matplotlib.org/stable/plot_types/index.html) for plotting; someone, somewhere has most likely already plotted something close to what you want (see the [gallery](https://matplotlib.org/stable/gallery/index.html) and [tutorials](https://matplotlib.org/stable/tutorials/index.html)). And there's always Google and Stack Exchange.



In [None]:
import matplotlib.pyplot as plot
def plot_figure():
    fig = plot.figure()
    ax = fig.add_subplot(111) # 1 row, 1 column of subplots; select the first (and only) one
    ax.plot(x, y, label=r"$y=\sin(x)$")
    ax.plot((x[1:]+x[:-1])/2.0, std_derivative(), '-', label= r"$\frac{dy}{dx}$ - $Standard$")
    ax.plot((x[1:]+x[:-1])/2.0, numpy_derivative(), '--', label= r"$\frac{dy}{dx}$ - $NumPy$")
    ax.set_ylabel(r"$y(x)$")
    ax.set_yticks([-1, 0, 1])
    ax.set_yticklabels([-1, 0, 1])
    ax.set_xlabel(r"$x$")
    ax.set_xticks([0, numpy.pi, 2*numpy.pi])
    ax.set_xticklabels([0, r'$\pi$', r'$2\pi$'])
    plot.legend()
    return fig, ax

In [None]:
fig, ax = plot_figure()
plot.show()

In [None]:
# Let's make some changes to make the figure prettier
import matplotlib
# Let's use LaTeX style math mode texts
# Warning: This requires a local installation of LaTeX - Windows users may have trouble
matplotlib.rcParams['text.usetex'] = True
matplotlib.rcParams['font.size'] = 16
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['xtick.labelsize'] = 12

In [None]:
fig, ax = plot_figure()
ax.tick_params(axis="y",direction="in", length=5,bottom=True, top=True, left=True, right=True)
ax.tick_params(axis="x",direction="in", length=5,bottom=True, top=True, left=True, right=True)
plot.show()

Now, let's time the two derivative implementations we made earlier.

In [None]:
%%timeit
std_derivative()

In [None]:
%%timeit
numpy_derivative()

NumPy is a highly optimized library that enables the efficient array manipulations required for numerical computations, which makes it the backbone of most scientific computing packages in Python. See the NumPy documentation for its full functionality.