# Introduction to NumPy
This notebook showcases the utility of NumPy, Python's scientific computing library, which is built around the powerful `ndarray` object and supports a wide range of linear algebra operations.

### Imports

In [11]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

### Comparing native Python objects with NumPy arrays

In [12]:
# A native list of one thousand integers, 0 to 999.
# Note that range() returns a lazy range object, not a list,
# so we pass it to the list() constructor to build the list.
test_list = list(range(1000))

In [13]:
# An equivalent NumPy array of one thousand integers.
test_array = np.arange(1000)

In [14]:
# Using magic function %timeit to time the sum() function on test_list.
%timeit sum(test_list)

7.01 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [15]:
# Using %timeit to time np.sum() on test_array.
%timeit np.sum(test_array)

3.51 µs ± 193 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


We can see that NumPy computes the sum about twice as fast as native Python here, and the gap typically widens for larger arrays, often to one or two orders of magnitude.<br>
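
To check the scaling claim on your own machine, here is a minimal sketch using the standard `timeit` module. The array size `n` is an arbitrary choice, and the timings depend on hardware and library versions, so no numbers are given as expected output.

```python
import timeit

import numpy as np

# Arbitrary size chosen for illustration; the gap tends to grow with n.
n = 1_000_000
py_list = list(range(n))
# Fix the dtype so the sum cannot overflow on platforms with a 32-bit default int.
np_array = np.arange(n, dtype=np.int64)

# Both approaches compute the same total.
assert sum(py_list) == int(np.sum(np_array))

# Average seconds per call over a few repetitions.
t_list = timeit.timeit(lambda: sum(py_list), number=5) / 5
t_numpy = timeit.timeit(lambda: np.sum(np_array), number=5) / 5
print(f"list sum: {t_list * 1e3:.2f} ms")
print(f"np.sum:   {t_numpy * 1e3:.2f} ms")
print(f"speed-up: {t_list / t_numpy:.0f}x")
```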

A Python list is essentially a dynamically resized array of pointers to objects, which may be of different types. These objects are **not allocated contiguously**, so every element access has to follow a pointer to a separate object and check its type at run time. This overhead is negligible for small lists but becomes noticeable for large data.<br>

A NumPy array is an **n-dimensional dense array** that **is contiguous in memory**, so reaching any element takes a single address computation rather than a chain of pointer lookups. Python, being dynamically typed, also pays an overhead on every expression for type and error checking. An **ndarray** avoids this because all of its elements share one **specified** type (the dtype), so the type is checked once for the whole array and the loop over the elements runs in compiled code. In effect, NumPy trades Python's dynamic typing for speed.<br>
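
A small sketch of the memory difference, assuming CPython on a 64-bit platform (the size of a Python `int` object is an implementation detail). The ndarray holds raw 8-byte integers in one buffer, while the list holds pointers to separately allocated `int` objects. The list total is approximate, since CPython caches small integers and the same object may be counted more than once.

```python
import sys

import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# The list stores 1000 pointers; each int it points to is a separate heap object.
pointer_bytes = sys.getsizeof(values)
object_bytes = sum(sys.getsizeof(v) for v in values)  # approximate, see above

# The ndarray stores 1000 raw 8-byte integers in one contiguous buffer.
print(arr.itemsize)               # bytes per element
print(arr.nbytes)                 # itemsize * size
print(arr.flags["C_CONTIGUOUS"])  # one unbroken block of memory
print(pointer_bytes + object_bytes, "bytes (roughly) for the list and its int objects")
```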

**Python lists:** may be heterogeneous; can be resized.<br>
**NumPy arrays:** homogeneous (one dtype for all elements); fixed size. Resizing is possible in principle but expensive, since it generally means allocating a new buffer and copying the data.
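
Both properties are easy to observe. The sketch below shows NumPy choosing a single dtype for mixed input, and `np.append` returning a new array rather than growing the old one in place.

```python
import numpy as np

# Mixing ints and floats: NumPy picks one common dtype for every element.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                    # float64

# "Growing" an array allocates a new buffer and copies; the original is unchanged.
base = np.arange(3)
grown = np.append(base, 3)
print(base.size, grown.size)          # 3 4
print(np.shares_memory(base, grown))  # False
```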

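Beyond the data structure, the data itself behaves differently: NumPy elements are fixed-width machine types, so integers can overflow and floats follow IEEE 754 rules, as the next cells show.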

In [16]:
# dtype is the data type; here an 8-bit signed integer.
a = np.array([-1, 0, 1, 100], dtype='int8')
a

array([ -1,   0,   1, 100], dtype=int8)

In [17]:
# Integer floor division by 0: NumPy emits a RuntimeWarning and,
# since integers have no inf or nan, returns 0 for each element.
a // 0

array([0, 0, 0, 0], dtype=int8)

In [18]:
# Python integers have arbitrary precision, so this result does not overflow.
100 ** 16

100000000000000000000000000000000

In [19]:
# 100 squared becomes 16: 10000 does not fit in 8 bits, so the result silently wraps around modulo 256 (10000 mod 256 = 16). This is integer overflow.
a ** 2

array([ 1,  0,  1, 16], dtype=int8)
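
The wrap-around can be reproduced by hand: for an 8-bit signed integer, a result $r$ is stored as $((r + 128) \bmod 256) - 128$. The sketch below checks this against NumPy and shows that casting to a wider dtype first (here `int16`, an arbitrary choice that happens to fit 10000) avoids the overflow.

```python
import numpy as np

a = np.array([-1, 0, 1, 100], dtype=np.int8)

# int8 holds -128..127; results wrap around modulo 2**8 = 256.
wrapped = a ** 2
# Reproduce the wrap-around by hand with Python's unbounded ints.
manual = [((int(x) ** 2 + 128) % 256) - 128 for x in a]
print(wrapped.tolist(), manual)                       # [1, 0, 1, 16] [1, 0, 1, 16]

# Widening the dtype before squaring avoids the overflow.
print((a.astype(np.int16) ** 2).tolist())             # [1, 0, 1, 10000]
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
```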

In [20]:
# Casting the array a to 32-bit floating point with the astype() method,
# passing the dtype as the string 'float32'.
b = a.astype('float32')
b

array([ -1.,   0.,   1., 100.], dtype=float32)

In [21]:
# True division (/) by zero follows IEEE 754 and issues a RuntimeWarning:
# negative/0 gives -inf, 0/0 gives nan, and positive/0 gives inf.
b / 0

array([-inf,  nan,  inf,  inf], dtype=float32)

In [22]:
# nan ("not a number") is often used to mark missing data. Since nan does not
# stand for any particular value, IEEE 754 defines it as unequal to everything, even itself.
np.nan == np.nan

False

In [23]:
# To test for nan, use the np.isnan() function.
# For more detail, look up how inf and nan are represented in IEEE 754 floating point.
np.isnan(np.nan)

True
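
Because of this, missing values marked with `nan` must be found with `np.isnan()` rather than `==`, and they propagate through ordinary reductions. A short sketch with made-up `readings`; `np.nansum` is one of NumPy's nan-aware reductions that skip them.

```python
import numpy as np

# Made-up readings where nan marks a missing value.
readings = np.array([1.0, np.nan, 3.0])

print(readings == np.nan)              # all False: == never matches nan
print(np.isnan(readings))              # True only at the missing position
print(np.sum(readings))                # nan: it propagates through arithmetic
print(np.nansum(readings))             # 4.0: the nan-aware variant skips it
print(readings[~np.isnan(readings)])   # keep only the present values
```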

### Understanding ndarrays

So far, we have seen the `np.array()` and `np.arange()` functions. NumPy provides several other functions for creating arrays.

In [24]:
# To generate an array of zeros
np.zeros

<function numpy.core.multiarray.zeros>

In [25]:
# To generate an array of ones
np.ones

<function numpy.core.numeric.ones(shape, dtype=None, order='C')>

In [26]:
# To generate an uninitialized array (its contents are whatever was already in memory)
np.empty

<function numpy.core.multiarray.empty>

In [27]:
# np.empty takes the shape as a tuple; the values shown are arbitrary leftovers in memory, not meaningful data.
np.empty((2, 2))

array([[5.e-324, 5.e-324],
       [5.e-324, 0.e+000]])
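
Since `np.empty` skips initialization, its output is arbitrary. When defined values are needed, the creation functions below are the usual choice; the shapes and the fill value are illustrative only.

```python
import numpy as np

zeros = np.zeros((2, 3))               # float64 zeros by default
ones = np.ones(4, dtype="int8")        # the dtype can be chosen explicitly
sevens = np.full((2, 2), 7)            # every entry set to one value
like = np.zeros_like(sevens)           # same shape and dtype as another array

print(zeros.shape, zeros.dtype)        # (2, 3) float64
print(ones)                            # [1 1 1 1]
print(sevens.tolist(), like.tolist())  # [[7, 7], [7, 7]] [[0, 0], [0, 0]]
```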

In [28]:
# Indexing works as with Python lists.
# Note how slicing returns another ndarray.
a[0], a[-1], a[:3]

(-1, 100, array([-1,  0,  1], dtype=int8))

In [29]:
# Item assignment: NumPy arrays are mutable.
a[-1] = 5
a

array([-1,  0,  1,  5], dtype=int8)

In [30]:
# Making a two-dimensional NumPy array.
# We could also pass nested lists to np.array().
# arange() gives a 1-D array, which the reshape() method turns into the desired shape.
b = np.arange(12).reshape(4, 3)
b

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [31]:
# To check the shape of our array.
b.shape

(4, 3)

In [32]:
# Accessing multidimensional arrays: to refer to 8, we write b[2, 2] rather than the conventional b[2][2], which first builds the intermediate row b[2].
b[2, 2]

8

In [33]:
# Slicing in two dimensions.
# Notice how we get a new 2-by-2 array that is a view on the original array.
b[:2, :2]

array([[0, 1],
       [3, 4]])

In [34]:
# We just collapsed one dimension while retrieving our data.
# Indexing an axis with an integer (rather than a slice) always removes that axis.
b[1:3, -1]

array([5, 8])

In [35]:
# To keep the second dimension, we could reshape the result; another approach is a length-one slice.
b[1:3, -1:]

array([[5],
       [8]])

In [36]:
b[1:3, -2:-1]

array([[4],
       [7]])
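
The shapes make the difference explicit. This sketch compares the integer index, the length-one slice, a `reshape` of the 1-D result, and indexing with `np.newaxis` (another standard way to insert an axis of length one).

```python
import numpy as np

b = np.arange(12).reshape(4, 3)

col_1d = b[1:3, -1]                    # integer index on axis 1 drops that axis
col_2d = b[1:3, -1:]                   # a length-one slice keeps it
col_reshaped = col_1d.reshape(-1, 1)   # or restore it afterwards

print(col_1d.shape, col_2d.shape, col_reshaped.shape)  # (2,) (2, 1) (2, 1)
print(np.array_equal(col_2d, col_reshaped))            # True
# np.newaxis (an alias for None) inserts a new axis of length one.
print(b[1:3, -1, np.newaxis].shape)                    # (2, 1)
```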

In [37]:
# To read from files
np.loadtxt

<function numpy.lib.npyio.loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')>

In [38]:
# Generating a 3D Array
c = np.arange(24).reshape(2,3,4)
c

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [39]:
# Extracting 17
c[1, 1, 1]

17

In [40]:
# Reducing one dimension
c[0, :, :]

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [41]:
np.shape(c[0, :, :])

(3, 4)

In [42]:
# Two integer indices and one slice collapse two dimensions.
# Trailing axes can be left out, so c[1, 0] gives the same result.
c[1, 0, :]

array([12, 13, 14, 15])

In [43]:
# 3D to 1D (flatten() always returns a copy)
c.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

### Exercise 1

In [44]:
# Creating our 2D array.
a = np.arange(25).reshape(5, 5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

**Q1) Extract the last row.**

In [45]:
a[4]

array([20, 21, 22, 23, 24])

**Q2) Extract columns 1, 3.**

In [46]:
a[:, 1::2]

array([[ 1,  3],
       [ 6,  8],
       [11, 13],
       [16, 18],
       [21, 23]])

**Q3) Extract the elements that form** $\begin{bmatrix}5 & 7\\ 15 & 17\end{bmatrix}$

In [47]:
a[1::2, :3:2]

array([[ 5,  7],
       [15, 17]])

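Why use basic indexing and slicing? They are fast and efficient because arrays support **random access**: each element's address is computed directly from its indices in O(1) time, and slices are views that copy no data.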
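
The O(1) claim follows from how an element's location is computed: its byte offset is the sum of each index times the corresponding entry of `strides`. The helper `byte_offset` below is a hypothetical illustration of that arithmetic in Python (NumPy does it in C); it assumes `int64` elements so the stride values are fixed.

```python
import numpy as np

b = np.arange(12, dtype=np.int64).reshape(4, 3)

def byte_offset(arr, index):
    """Hypothetical helper: byte offset of arr[index] from the start of its data."""
    return sum(i * s for i, s in zip(index, arr.strides))

offset = byte_offset(b, (2, 2))
print(b.strides)   # (24, 8): a row is 3 elements * 8 bytes, a column step is 8 bytes
print(offset)      # 2*24 + 2*8 = 64
# For this C-contiguous array, offset // itemsize is the position in the flat buffer.
print(b.ravel()[offset // b.itemsize], b[2, 2])   # 8 8
```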

### Fancy Indexing (second way to access data in numpy arrays)

There are two kinds of fancy indexing: in the first, we specify arbitrary (possibly non-sequential) integer positions of our data; in the second, we specify boolean masks.

In [48]:
a = np.arange(4)
a

array([0, 1, 2, 3])

In [49]:
# Passing a list of indices.
a[[0, 1, 3]]

array([0, 1, 3])

In [50]:
b

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [51]:
# Extracting 2 and 6: the index lists are paired, picking b[0, 2] and b[2, 0].
b[[0, 2],[2, 0]]

array([2, 6])
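
Note that the two index lists are paired element by element, not combined into a rows-by-columns grid. If you want every listed row crossed with every listed column, `np.ix_` builds the right index arrays; a sketch:

```python
import numpy as np

b = np.arange(12).reshape(4, 3)

# Index lists are paired element-wise: picks b[0, 2] and b[2, 0].
print(b[[0, 2], [2, 0]].tolist())            # [2, 6]

# np.ix_ builds an open mesh: rows 0 and 2, each with columns 2 and 0.
print(b[np.ix_([0, 2], [2, 0])].tolist())    # [[2, 0], [8, 6]]
```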

In [52]:
c

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [53]:
# Applying this in 3-D to extract 6 (c[0, 1, 2]) and 17 (c[1, 1, 1]).
c[[0, 1], [1, 1], [2, 1]]

array([ 6, 17])

In [54]:
# Masks are the more common form of fancy indexing.
# Which elements are greater than 16?
c > 16

array([[[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]],

       [[False, False, False, False],
        [False,  True,  True,  True],
        [ True,  True,  True,  True]]])

In [55]:
# Retrieving the data that fulfills a condition via a mask.
# Notice that the returned array is one-dimensional.
c[c > 16]

array([17, 18, 19, 20, 21, 22, 23])
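
Masks can be combined with the element-wise operators `&`, `|` and `~`; the parentheses around each comparison are required because these operators bind more tightly than `>` or `==`. A masked assignment writes into the original array. A sketch on a freshly created `c`:

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# & (and), | (or); parentheses are needed around each comparison.
even_and_big = c[(c > 16) & (c % 2 == 0)]
small_or_big = c[(c < 2) | (c > 21)]
print(even_and_big.tolist())   # [18, 20, 22]
print(small_or_big.tolist())   # [0, 1, 22, 23]

# Assigning through a mask modifies c in place.
c[c > 16] = 0
print(c.max())                 # 16
```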

In [56]:
# Fancy indexing is generally slower than basic indexing and slicing, and it always copies.
# Slicing, as here, returns a view: d shares the memory of its elements with c.
d = c[:, 1:2, 1:3]
d

array([[[ 5,  6]],

       [[17, 18]]])

In [57]:
# Facts about a NumPy array.
# Notice that the OWNDATA flag is False, telling us that this array
# does not own its data.
d.flags

  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

`d` may be a separate Python object, but it does not hold separate copies of `5, 6, 17, 18`; it refers to the data stored in `c`. This saves time and memory, because copying can be very expensive for large data sets. It also has a consequence: **changes made in `d` are reflected in `c`.**

In [58]:
d[0 ,0, 0] = 1000
d

array([[[1000,    6]],

       [[  17,   18]]])

In [59]:
# Notice that both have changed.
# This makes in-place changes easy.
c

array([[[   0,    1,    2,    3],
        [   4, 1000,    6,    7],
        [   8,    9,   10,   11]],

       [[  12,   13,   14,   15],
        [  16,   17,   18,   19],
        [  20,   21,   22,   23]]])

This is useful because it lets you manipulate subarrays in place.
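
When the sharing is not wanted, call `.copy()` explicitly. `np.shares_memory` reports whether two arrays overlap in memory, which makes the difference easy to check; a sketch on a freshly created `c`:

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

view = c[:, 1:2, 1:3]           # basic slicing: a view
copy = c[:, 1:2, 1:3].copy()    # explicit copy: independent data
print(np.shares_memory(c, view), np.shares_memory(c, copy))   # True False

view[0, 0, 0] = 1000            # writes through to c[0, 1, 1]
copy[1, 0, 0] = -1              # leaves c[1, 1, 1] untouched
print(c[0, 1, 1], c[1, 1, 1])   # 1000 17
```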

In [60]:
# Fancy indexing copies the data instead,
# as indicated by OWNDATA: True.
e = c[c > 16]
e.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

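Under the hood, NumPy stores an array's data as a single one-dimensional block of memory; the shape and strides tell it how to map an n-dimensional index onto that block (pointer arithmetic), which is how different views of the same data are produced. When possible, `ndarray.reshape()` only changes this mapping and does not move elements in memory; if the new shape cannot be expressed with strides over the existing data (for example, on some non-contiguous views), it returns a copy instead.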
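
Whether `reshape` returns a view or a copy can be checked with `np.shares_memory`. In this sketch, reshaping a contiguous array gives a view, while flattening its transpose in row-major order cannot be expressed by new strides alone, so NumPy copies.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# Contiguous data: reshape only changes shape and strides, so it returns a view.
y = x.reshape(3, 2)
print(np.shares_memory(x, y))   # True

# x.T is a non-contiguous (column-major) view; reading it in row-major
# order needs elements in a new sequence, so reshape copies.
z = x.T.reshape(6)
print(np.shares_memory(x, z))   # False
print(z.tolist())               # [0, 3, 1, 4, 2, 5]
```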

In [62]:
# This is the transpose operation, again computationally very cheap
# because only the view on the memory changes.
d.T

array([[[1000,   17]],

       [[   6,   18]]])

In [61]:
# ndarray.strides is an attribute: a tuple holding the number of bytes to step in each dimension.
# It is this tuple (together with the shape) that changes when the view changes.
c.strides

(96, 32, 8)
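
For `c`, with 8-byte integers and shape (2, 3, 4), the strides are (3·4·8, 4·8, 8) = (96, 32, 8). Transposing reverses both the shape and the strides while sharing the same buffer. A sketch, fixing the dtype to `int64` so the numbers do not depend on the platform's default integer size:

```python
import numpy as np

c = np.arange(24, dtype=np.int64).reshape(2, 3, 4)
print(c.strides)   # (96, 32, 8)

t = c.T            # shape (4, 3, 2)
print(t.shape, t.strides)                                 # (4, 3, 2) (8, 32, 96)
print(np.shares_memory(c, t), t.flags["C_CONTIGUOUS"])   # True False
```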

In [65]:
c.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

A peculiar thing to note in the above output is that `c` does not own its data either. This is because `c` was created as `np.arange(24).reshape(2, 3, 4)`: `np.arange()` allocated an array that owns the data buffer, and `reshape` returned a view of it. Every ndarray consists of two parts, the actual data and a header holding metadata such as the shape, strides, and dtype. A view like `c` has its own header, possibly with a different shape, but the data still belongs to the original array, which the view keeps referenced. Any call like `reshape` or `transpose` again generates a new header rather than new data. The data is freed only when all references to the owning array, including those held by views, are gone.
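
The `base` attribute makes this visible: for a view, it refers to the array that owns the data. A sketch recreating `c` the same way:

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

print(c.flags["OWNDATA"])                    # False: c is a view created by reshape
owner = c.base                               # the array returned by np.arange(24)
print(owner.shape, owner.flags["OWNDATA"])   # (24,) True

# A fresh copy owns its data and has no base.
d = c.copy()
print(d.flags["OWNDATA"], d.base is None)    # True True
```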

### Exercise 2

In [66]:
# Creation of the ndarray
a = np.arange(25).reshape(5, 5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

**Q1) Extract the elements just above the main diagonal: 1, 7, 13, 19.**

In [86]:
a[[0, 1, 2, 3], [1, 2, 3, 4]]

array([ 1,  7, 13, 19])
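
Two alternatives for Q1, sketched below: generating the paired indices with `np.arange` instead of typing them out, or using `np.diagonal` with `offset=1`, which reads the first diagonal above the main one. In recent NumPy versions `np.diagonal` returns a read-only view.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# Generate the paired row/column indices instead of typing them.
rows = np.arange(4)
print(a[rows, rows + 1].tolist())          # [1, 7, 13, 19]

# np.diagonal with offset=1 reads the first superdiagonal directly.
print(np.diagonal(a, offset=1).tolist())   # [1, 7, 13, 19]
```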

**Q2) Extract all the numbers divisible by 3 using a boolean mask.**

In [73]:
a[a % 3 == 0]

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24])