# Data Analysis Using [Python](https://www.python.org) - NumPy

## Why NumPy?

### NumPy is short for "Numerical Python" (also written "Numeric Python")

### [NumPy](http://www.numpy.org/) is the fundamental package for scientific computing with Python. It contains, among other things:

* ### A powerful N-dimensional array object (ndarray) - efficiently implemented multi-dimensional arrays
* ### Array-oriented computing - sophisticated (broadcasting) functions
* ### Tools for integrating C/C++ and Fortran code
* ### Designed for scientific computation - useful linear algebra, Fourier transform, and random number capabilities
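As a rough illustration of the capabilities listed above, the sketch below touches linear algebra, Fourier transforms, and random numbers. The matrix values, the signal, and the seed are arbitrary choices made for this example and do not come from the notebook.

```python
import numpy as np

# linear algebra: solve the linear system A @ x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Fourier transform: a constant signal has all its energy in the zero-frequency bin
signal = np.ones(4)
spectrum = np.fft.fft(signal)

# random numbers: a seeded generator gives reproducible draws in [0, 1)
rng = np.random.default_rng(0)
samples = rng.random(3)

print(x)
print(spectrum)
print(samples)
```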

In [1]:
# import NumPy library

import numpy as np

In [2]:
%%timeit
temp_list = range(100000)
temp_list1 = [ x*2 for x in temp_list]

9.82 ms ± 43.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [3]:
%%timeit
temp_array = np.arange(100000)
temp_array1 = temp_array*2

201 µs ± 5.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
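Outside a notebook, where the `%%timeit` magic is not available, a similar comparison can be sketched with the standard-library `timeit` module. The helper names `double_with_list` and `double_with_numpy` are hypothetical, and the measured times depend on the machine, so no expected output is shown.

```python
import timeit

import numpy as np

n = 100_000

def double_with_list():
    # pure-Python loop: each multiplication is interpreted one element at a time
    return [x * 2 for x in range(n)]

def double_with_numpy():
    # vectorized: the loop over elements runs inside NumPy's compiled code
    return np.arange(n) * 2

list_time = timeit.timeit(double_with_list, number=20)
numpy_time = timeit.timeit(double_with_numpy, number=20)
print(f"list comprehension: {list_time:.4f} s, NumPy: {numpy_time:.4f} s")
```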


## NumPy array (ndarray)

* ### A NumPy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers

* ### **ndarray.ndim** - the number of axes (dimensions) of the array. The number of dimensions is sometimes called the rank of the array; this is unrelated to the rank of a matrix in linear algebra.

* ### **ndarray.shape** - the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension.

* ### **ndarray.size** - the total number of elements of the array. This is equal to the product of the elements of shape.

* ### **ndarray.dtype** - an object describing the type of the elements in the array.

<img src="./image/fig_numpy_axes.png" alt="NumPy axes" height="300" width="300" align="left">
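The four attributes described above can be checked on a small three-dimensional array. This is a minimal sketch; the array `cube` and its shape are made up for illustration.

```python
import numpy as np

cube = np.zeros((2, 3, 4))   # 2 blocks, each with 3 rows and 4 columns

print(cube.ndim)    # 3
print(cube.shape)   # (2, 3, 4)
print(cube.size)    # 24, the product 2 * 3 * 4
print(cube.dtype)   # float64, the default dtype of np.zeros
```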

In [4]:
# an ndarray can be created from a regular Python list or tuple
mylist = [2,5,8,15,25]
array = np.array(mylist)

In [5]:
type(array)

numpy.ndarray

In [6]:
array.shape

(5,)

In [7]:
array[0]

2

In [8]:
array[0:3]

array([2, 5, 8])

In [9]:
array.dtype

dtype('int64')

In [10]:
array.ndim

1

In [11]:
# dtype can be specified when creating an array
# (the np.float alias was removed in NumPy 1.24; use float or np.float64)
array2 = np.array(mylist, dtype=np.float64)

In [12]:
array2

array([ 2.,  5.,  8., 15., 25.])

In [13]:
array2.dtype

dtype('float64')

In [14]:
# creating a 5 x 3 two-dimensional array
marray = np.arange(15).reshape(5,3)

In [15]:
marray

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [16]:
marray.ndim

2

In [17]:
marray.shape

(5, 3)

In [18]:
marray.size

15

In [19]:
# ravel returns a flattened (1-D) version of the array
marray.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [20]:
# reshape can be used to change the shape of an array
marray.ravel().reshape(3,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [21]:
marray.shape

(5, 3)
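The cells above show that `reshape` returns a new array object and leaves the shape of the original unchanged. The sketch below adds two related details: passing `-1` lets NumPy infer one dimension, and for a contiguous array such as this one the result is a view that shares data with the original. The names `original` and `reshaped` are illustrative only.

```python
import numpy as np

original = np.arange(15).reshape(5, 3)
reshaped = original.reshape(3, -1)   # -1 asks NumPy to infer the missing dimension (5 here)

print(original.shape)   # (5, 3)
print(reshaped.shape)   # (3, 5)

# the reshaped array is a view here, so writing through it changes the original too
reshaped[0, 0] = 100
print(original[0, 0])   # 100
```

Call `.copy()` on the result when an independent array is needed.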

## Array basic operations

In [22]:
# multiplying an ndarray by a scalar
print(marray*2)

[[ 0  2  4]
 [ 6  8 10]
 [12 14 16]
 [18 20 22]
 [24 26 28]]


In [23]:
marray

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [24]:
# in-place change
# some operations, such as the augmented assignment below, modify the array in place
marray += 10

In [25]:
marray

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18],
       [19, 20, 21],
       [22, 23, 24]])

In [26]:
# Guess - what would be the result of the following
marray > 15

array([[False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
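A comparison like the one above yields a boolean array, which can then be used as a mask. The following sketch, which rebuilds the same values under the hypothetical name `values`, shows three common uses: selecting, counting, and conditionally replacing elements.

```python
import numpy as np

values = np.arange(10, 25).reshape(5, 3)   # same values as marray after += 10
mask = values > 15

selected = values[mask]                 # 1-D array of the elements where mask is True
count = mask.sum()                      # each True counts as 1
clipped = np.where(mask, 15, values)    # replace every element above 15 with 15

print(selected)   # [16 17 18 19 20 21 22 23 24]
print(count)      # 9
print(clipped)
```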

In [27]:
arr_A = np.array( [ [2,3], [4,5] ] )
arr_B = np.array( [ [1,1], [2,1] ] )

In [28]:
# * operates element-wise
arr_A * arr_B

array([[2, 3],
       [8, 5]])

In [29]:
# dot is used for matrix multiplication
# np.dot(arr_A,arr_B) also works
arr_A.dot(arr_B)

array([[ 8,  5],
       [14,  9]])
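Since Python 3.5, the `@` operator offers a third way to write the same matrix product. The sketch below reuses the two arrays defined above and also checks that swapping the operands gives a different result.

```python
import numpy as np

arr_A = np.array([[2, 3], [4, 5]])
arr_B = np.array([[1, 1], [2, 1]])

product = arr_A @ arr_B   # equivalent to arr_A.dot(arr_B) for 2-D arrays
print(product)
# [[ 8  5]
#  [14  9]]

# matrix multiplication is not commutative in general
print(np.array_equal(arr_A @ arr_B, arr_B @ arr_A))   # False
```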

## Array Slicing

In [30]:
marray[0]

array([10, 11, 12])

In [31]:
marray[0,1]

11

In [32]:
marray

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18],
       [19, 20, 21],
       [22, 23, 24]])

In [33]:
marray[:,1:3]

array([[11, 12],
       [14, 15],
       [17, 18],
       [20, 21],
       [23, 24]])

In [34]:
marray[0:3,:]

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [35]:
marray[1:3,1:]

array([[14, 15],
       [17, 18]])

## Broadcasting

* ### The term [Broadcasting](http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc) describes how numpy treats arrays with different shapes during arithmetic operations.
* ### Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
* ### Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.

In [37]:
marray

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18],
       [19, 20, 21],
       [22, 23, 24]])

In [38]:
marray + 5

array([[15, 16, 17],
       [18, 19, 20],
       [21, 22, 23],
       [24, 25, 26],
       [27, 28, 29]])

<img src="./image/fig_broadcast_visual_1.png" alt="Broadcasting" height="500" width="500" align="left">
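Broadcasting is not limited to scalars. As a minimal sketch with made-up arrays (`grid`, `row`, and `col` are illustrative names), the example below combines arrays of different shapes and shows one combination that NumPy rejects because the shapes are incompatible.

```python
import numpy as np

grid = np.arange(15).reshape(5, 3)   # shape (5, 3)
row = np.array([100, 200, 300])      # shape (3,)
col = np.arange(5).reshape(5, 1)     # shape (5, 1)

print((grid + row).shape)   # (5, 3): row is added to every row of grid
print((grid + col).shape)   # (5, 3): col is added to every column of grid
print((row + col).shape)    # (5, 3): both operands are stretched

# shapes (5, 3) and (5,) are incompatible: the trailing dimensions 3 and 5 differ
try:
    grid + np.arange(5)
except ValueError as err:
    print("broadcast failed:", err)
```

Shapes are compared from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.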

### Exercises

In [None]:
# Exercise - 1
# Construct a 3 by 3 ndarray with 5 on the diagonal and 1 everywhere else
# [[5, 1, 1], [1, 5, 1], [1, 1, 5]]
# Tip : explore np.ones and np.eye functions
# the dtype should be int

In [None]:
# Exercise - 2
# try the following array slicing
a = np.arange(0,60).reshape(6,10)[0:6,0:6]

<img src="./image/fig_numpy_indexing_q.png" alt="Array Slicing" height="300" width="300" align="left">

In [None]:
a

## NumPy functions used for performing computations

* ### np.sum
* ### np.std
* ### np.mean
* ### np.max
* ### np.min

In [39]:
# np.nan is a special floating-point value meaning "Not a Number"; it is a value, not a datatype
# (the np.NaN alias was removed in NumPy 2.0; use np.nan)
np.nan?

Type:        float
String form: nan
Docstring:   Convert a string or number to a floating point number, if possible.
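Because NaN is an ordinary float value, it can sit inside an array and affects computations on it. The sketch below uses a made-up array `data` to show how NaN behaves in comparisons and reductions, along with the NaN-aware function `np.nanmean`.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)   # False: NaN never compares equal, even to itself
print(np.isnan(data))     # [False  True False]
print(data.mean())        # nan: NaN propagates through ordinary reductions
print(np.nanmean(data))   # 2.0: the NaN-aware variant skips the missing value
```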


In [40]:
np.random.seed(0)
arr_c = np.random.random(15).reshape((5,3))
arr_c

array([[0.5488135 , 0.71518937, 0.60276338],
       [0.54488318, 0.4236548 , 0.64589411],
       [0.43758721, 0.891773  , 0.96366276],
       [0.38344152, 0.79172504, 0.52889492],
       [0.56804456, 0.92559664, 0.07103606]])

In [41]:
arr_c.sum()

9.042960048565472

In [49]:
arr_c.min()

0.07103605819788694

In [43]:
arr_c.max()

0.9636627605010293

In [44]:
arr_c.mean()

0.6028640032376982

In [45]:
arr_c.mean(axis=0)

array([0.496554  , 0.74958777, 0.56245025])

In [46]:
arr_c.mean(axis=1)

array([0.62225542, 0.53814403, 0.76434099, 0.56802049, 0.52155909])
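The `axis` argument is easier to follow on a small integer array where the results can be checked by hand. In this sketch (the array `table` is made up for illustration), `axis=0` collapses the rows and gives one value per column, while `axis=1` collapses the columns and gives one value per row.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])   # shape (2, 3)

print(table.sum(axis=0))   # [5 7 9]
print(table.sum(axis=1))   # [ 6 15]
print(table.max(axis=0))   # [4 5 6]
print(table.mean())        # 3.5, no axis given: reduce over all elements

# np.std uses the population standard deviation (ddof=0) by default
row_std = np.std(table, axis=1)
print(row_std)

# keepdims=True keeps the reduced axis with length 1, which helps with broadcasting
row_totals = table.sum(axis=1, keepdims=True)
print(row_totals.shape)    # (2, 1)
```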