# Introduction

**NumPy** is a general-purpose array-processing package designed to manipulate large multi-dimensional arrays of arbitrary records efficiently, without sacrificing too much speed for small multi-dimensional arrays. It is built on the Numeric code base and adds features introduced by numarray, an extended C-API, and the ability to create arrays of arbitrary type, which also makes NumPy suitable for interfacing with general-purpose database applications.

It also provides basic facilities for discrete Fourier transforms, linear algebra, and random number generation. It is needed to process the large volumes of tabular data files that Data Science work involves.
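As a minimal sketch of the three facilities just mentioned (the signal, the matrix `M`, and the seed are made up purely for illustration):

```python
import numpy as np

# Discrete Fourier transform of a made-up 8-sample signal:
# one full cosine cycle across the 8 samples.
signal = np.cos(2 * np.pi * np.arange(8) / 8)
spectrum = np.fft.fft(signal)
# The energy sits in frequency bins 1 and 7 (magnitude 4 each).
print(np.round(np.abs(spectrum), 6))

# Basic linear algebra: solve the system M @ x = v.
M = np.array([[2.0, 1.0], [1.0, 3.0]])
v = np.array([3.0, 5.0])
x = np.linalg.solve(M, v)
print(x)

# Random number generation with a fixed (arbitrary) seed for repeatability.
np.random.seed(0)
samples = np.random.rand(3)
print(samples.shape)
```

Each of these lives in its own submodule (`np.fft`, `np.linalg`, `np.random`), so nothing beyond `import numpy as np` is needed.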

www.numpy.org


A common beginner question is how NumPy arrays really differ from Python's built-in data types. The answer is mostly performance. NumPy data structures do better in three respects:

Size - NumPy data structures take up less space.

Performance - operations on NumPy arrays are faster than the same operations on lists.

Functionality - SciPy and NumPy have optimized functions, such as linear algebra operations, built in.

## Installation

1. NumPy comes preinstalled with Anaconda and Google Colab:

*   https://www.anaconda.com
*   https://colab.research.google.com

2. Otherwise, install it with: **pip install numpy**

At the time of writing, the latest stable version of **NumPy** was 1.16.3.

## Import in a Python Program or Script

In [1]:
! pip install numpy



In [2]:
import numpy as np
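
To check which version is actually installed, a quick sketch (the variable name `installed_version` is only illustrative):

```python
import numpy as np

# np.__version__ is a plain string; its value depends on your installation.
installed_version = np.__version__
print(installed_version)
```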

## Initialization

In [3]:
a = np.zeros((2,5,3,3))
print (a)

[[[[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]

  [[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]

  [[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]

  [[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]

  [[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]]


 [[[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]

  [[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]

  [[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]

  [[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]

  [[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]]]


In [4]:
print (type(a))

<class 'numpy.ndarray'>


In [5]:
b = np.ones((2,2,3), dtype =int)
print (b)

[[[1 1 1]
  [1 1 1]]

 [[1 1 1]
  [1 1 1]]]


In [6]:
a = np.array([1,2,3,4])
print(a)

[1 2 3 4]


In [7]:
b = np.array([(1,2,3), (4,5,6), (7,11,6)])
print(b)

[[ 1  2  3]
 [ 4  5  6]
 [ 7 11  6]]


In [8]:
c = np.array( [ (1+2j, 2+6j), (3,4) ], dtype=complex )
print (c)

[[1.+2.j 2.+6.j]
 [3.+0.j 4.+0.j]]
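
The `dtype` argument used above is optional. The sketch below shows, under the assumption of default NumPy settings, how the element type is inferred when it is left out, plus a few other constructors that were not part of the original cells (`np.full`, `np.linspace`, `np.zeros_like`):

```python
import numpy as np

# dtype is inferred from the inputs when it is not given explicitly.
ints = np.array([1, 2, 3])
floats = np.array([1, 2.5, 3])          # one float promotes the whole array
complexes = np.array([1 + 2j, 3 + 4j])
print(ints.dtype.kind, floats.dtype, complexes.dtype)

# A few other constructors that follow the same (shape, dtype) pattern.
sevens = np.full((2, 3), 7)             # constant-filled array
grid = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced points, endpoints included
zeros_like_sevens = np.zeros_like(sevens)  # same shape and dtype as an existing array
print(sevens)
print(grid)
print(zeros_like_sevens.shape)
```

The integer width (`int32` or `int64`) depends on the platform, which is why only the dtype kind is printed for `ints`.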


In [9]:
d = np.array([1+2j, 3+4j, 5+6j])
print (type(d))

<class 'numpy.ndarray'>


## Generating Random Numeric Values

In [76]:
np.random.seed(700)
np.random.rand(20,10)

array([[0.15654636, 0.41686984, 0.06680689, 0.86540639, 0.30229266,
        0.74125586, 0.41041526, 0.34513583, 0.97038625, 0.85254326],
       [0.57568511, 0.74530348, 0.68513455, 0.37452814, 0.18463369,
        0.28290183, 0.76077559, 0.61127742, 0.5473914 , 0.71080255],
       [0.58580117, 0.97151746, 0.94569067, 0.5166665 , 0.6582399 ,
        0.79492955, 0.40720702, 0.76953392, 0.07670343, 0.65327686],
       [0.8583688 , 0.50493235, 0.6966257 , 0.89618378, 0.36649969,
        0.73465513, 0.62589185, 0.9430502 , 0.6142144 , 0.83581277],
       [0.28378321, 0.74103373, 0.12282519, 0.9795708 , 0.02377285,
        0.50611138, 0.98748746, 0.48673501, 0.03395986, 0.76613888],
       [0.75934091, 0.87632783, 0.63020388, 0.38634057, 0.93143231,
        0.91418726, 0.49593791, 0.60003236, 0.07850808, 0.6368584 ],
       [0.5903337 , 0.38502885, 0.25112874, 0.84811823, 0.97997383,
        0.20651039, 0.2399852 , 0.66630943, 0.63050984, 0.449665  ],
       [0.03083455, 0.9677237 , 0.1678181

In [82]:
# data = np.random.randint(1,10,(10,8))
data = np.random.randint(1,10,(8,10))
print (data)

[[8 8 7 1 4 8 5 2 2 8]
 [9 2 1 6 4 1 4 8 5 9]
 [1 5 3 4 4 3 1 9 2 6]
 [5 8 2 3 6 2 8 9 4 4]
 [9 7 9 2 5 7 9 6 5 4]
 [1 7 6 6 4 8 1 3 3 5]
 [5 9 2 1 4 5 9 4 1 3]
 [9 1 1 2 2 6 4 5 2 2]]
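
The cells above use the global `np.random.seed` state. As an alternative sketch, assuming NumPy 1.17 or later is installed, the `Generator` API keeps the random state in an object of its own (the names `rng` and `rng_again` are illustrative):

```python
import numpy as np

# A Generator carries its own state instead of relying on the global seed.
rng = np.random.default_rng(700)          # 700 mirrors the seed used above

uniform = rng.random((2, 3))              # floats in [0, 1)
ints = rng.integers(1, 10, size=(8, 10))  # integers in [1, 10), like randint(1, 10, ...)

# Re-creating the generator with the same seed reproduces the same values.
rng_again = np.random.default_rng(700)
print(np.array_equal(uniform, rng_again.random((2, 3))))   # True
print(ints.min() >= 1, ints.max() <= 9)
```

The numbers drawn this way will not match the `np.random.rand` output shown earlier, because the two APIs use different underlying generators.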


## Reshaping Arrays

In [12]:
a = np.arange(5,25)
print (a)

[ 5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]


In [13]:
# a = a.reshape(4,5)
# a = a.reshape(5,4)
a = a.reshape(5,2,2)
a

array([[[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]],

       [[13, 14],
        [15, 16]],

       [[17, 18],
        [19, 20]],

       [[21, 22],
        [23, 24]]])

In [14]:
print(a.shape)
print(a.ndim)
print(a.dtype)
print(a.itemsize)
print(a.size)

(5, 2, 2)
3
int32
4
20
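
Two details of `reshape` are worth a short sketch: a dimension given as `-1` is inferred from the total size, and for a contiguous array such as this one the result is a view onto the same data rather than a copy. The variable `m` is introduced here only for illustration.

```python
import numpy as np

a = np.arange(5, 25)                 # 20 elements, as above

# -1 asks NumPy to infer that dimension from the total size.
m = a.reshape(4, -1)
print(m.shape)                       # (4, 5)

# The product of the new shape must equal a.size, otherwise reshape fails.
try:
    a.reshape(3, 7)
except ValueError as err:
    print("reshape failed:", err)

# For a contiguous array like this one, reshape returns a view:
# writing through the view changes the original data.
m[0, 0] = -1
print(a[0])                          # -1
```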


## Basic Operations

In [15]:
a = np.array( [20,30,40,50] )
b = np.arange( 4,8 )
print (a)
print (b)
c = a-b
print (c)

[20 30 40 50]
[4 5 6 7]
[16 25 34 43]


In [16]:
b**2

array([16, 25, 36, 49], dtype=int32)

In [17]:
np.sin(a)

array([ 0.91294525, -0.98803162,  0.74511316, -0.26237485])

In [18]:
np.sin(np.array((0., 30., 45., 60., 90.)) * np.pi / 180. )

array([0.        , 0.5       , 0.70710678, 0.8660254 , 1.        ])

In [19]:
np.cos(np.array((0.,30.,45.,60.,90.))* np.pi / 180.)


array([1.00000000e+00, 8.66025404e-01, 7.07106781e-01, 5.00000000e-01,
       6.12323400e-17])

In [20]:
a[a>45]

array([50])
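
The expression `a[a>45]` is boolean masking: the comparison produces an array of `True`/`False` values, and indexing with it keeps only the `True` positions. A small sketch of the same idea, with thresholds chosen arbitrarily:

```python
import numpy as np

a = np.array([20, 30, 40, 50])

mask = a > 25                        # boolean array, one entry per element
print(mask)                          # [False  True  True  True]

# Combine conditions with & and | (the parentheses are required).
middle = a[(a > 25) & (a < 50)]
print(middle)                        # [30 40]

# np.where picks between two values element by element.
capped = np.where(a > 35, 35, a)
print(capped)                        # [20 30 35 35]
```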

In [21]:
A = np.array( [[1,1], [0,1]] )
B = np.array( [[2,0], [3,4]] )
print (A)
print (B)

# elementwise matrix multiplication
print ("elementwise")
print (A*B)

# matrix product
print ("matrixproduct")
print (A@B)

# another way of matrix product

print (A.dot(B))

[[1 1]
 [0 1]]
[[2 0]
 [3 4]]
elementwise
[[2 0]
 [0 4]]
matrixproduct
[[5 4]
 [3 4]]
[[5 4]
 [3 4]]


## Unary Operations

In [22]:
a = np.random.random((2,3))
a

array([[0.27083024, 0.35094912, 0.62686355],
       [0.63151289, 0.73910934, 0.74810479]])

In [23]:
a.shape

(2, 3)

In [24]:
a.sum()

3.36736992849458

In [25]:
a.min()

0.27083024065540606

In [26]:
a.max()

0.7481047863598169

In [27]:
b = np.arange(12).reshape(4,3)
b

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [28]:
b.shape

(4, 3)

In [29]:
b.sum(axis = 0)

array([18, 22, 26])

In [30]:
b.sum(axis=1)

array([ 3, 12, 21, 30])

In [31]:
b.min(axis = 0)

array([0, 1, 2])

In [32]:
b.cumsum(axis =0)

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15],
       [18, 22, 26]], dtype=int32)
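
The `axis` argument names the dimension that gets collapsed. The sketch below restates that on the same 4x3 array and adds `keepdims=True`, which was not used in the original cells; `row_share` is an illustrative name.

```python
import numpy as np

b = np.arange(12).reshape(4, 3)

# axis=0 collapses the rows, leaving one value per column.
col_totals = b.sum(axis=0)           # shape (3,)
# axis=1 collapses the columns, leaving one value per row.
row_totals = b.sum(axis=1)           # shape (4,)

# keepdims=True keeps the reduced axis with length 1, which is handy
# for dividing each row by its own total.
row_share = b / b.sum(axis=1, keepdims=True)
print(col_totals, row_totals)
print(row_share.sum(axis=1))         # each row now sums to 1 (up to rounding)
```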

In [33]:
a = np.array([1,2,3,4, 0.5, 0.25])

In [34]:
np.exp(a)

array([ 2.71828183,  7.3890561 , 20.08553692, 54.59815003,  1.64872127,
        1.28402542])

In [35]:
np.sqrt(a)

array([1.        , 1.41421356, 1.73205081, 2.        , 0.70710678,
       0.5       ])

In [36]:
np.add(A,B) # shapes must match or be broadcastable to a common shape

array([[3, 1],
       [3, 5]])
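
Strictly speaking, the inputs of `np.add` only need shapes that NumPy can broadcast against each other. A minimal sketch, with `row` and `col` as made-up operands:

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])       # shape (2, 2)
row = np.array([10, 20])             # shape (2,)
col = np.array([[100], [200]])       # shape (2, 1)

plus_row = np.add(A, row)            # row is added to every row of A
plus_col = A + col                   # col is added to every column of A
plus_scalar = A + 5                  # a scalar broadcasts to every element
print(plus_row)
print(plus_col)
print(plus_scalar)

# Shapes that cannot be aligned raise a ValueError.
try:
    np.add(A, np.array([1, 2, 3]))
except ValueError as err:
    print("cannot broadcast:", err)
```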

## Indexing, Slicing and Iterating

In [37]:
# Indexing and slicing work much like Python lists
a = np.arange(10)**2
a

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81], dtype=int32)

In [38]:
a[6]

36

In [39]:
a[2:5]

array([ 4,  9, 16], dtype=int32)

In [40]:
for i in a:
  print (i ** 3)

0
1
64
729
4096
15625
46656
117649
262144
531441


In [41]:
B = np.arange(5,15).reshape(2,5)
B

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [42]:
B[0,1]

6

In [43]:
B[:, 1] 

array([ 6, 11])

In [44]:
C = np.array( [[[  0,  1,  2], [ 10, 12, 13]], [[100,101,102], [110,112,113]], [[1,10,12], [110,112,113]]])
C

array([[[  0,   1,   2],
        [ 10,  12,  13]],

       [[100, 101, 102],
        [110, 112, 113]],

       [[  1,  10,  12],
        [110, 112, 113]]])

In [45]:
C.shape

(3, 2, 3)

In [46]:
C[1,...]

array([[100, 101, 102],
       [110, 112, 113]])

In [47]:
C[1:]

array([[[100, 101, 102],
        [110, 112, 113]],

       [[  1,  10,  12],
        [110, 112, 113]]])

In [48]:
C[...,2]

array([[  2,  13],
       [102, 113],
       [ 12, 113]])

In [49]:
C[1,1,...]

array([110, 112, 113])

In [50]:
for obj1 in C:
  print(obj1.shape)
  for obj2 in obj1:
    print(obj2.shape)
    print(obj2)

(2, 3)
(3,)
[0 1 2]
(3,)
[10 12 13]
(2, 3)
(3,)
[100 101 102]
(3,)
[110 112 113]
(2, 3)
(3,)
[ 1 10 12]
(3,)
[110 112 113]
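
Iterating over an array walks its first axis, which is why the nested loop above is needed to reach the rows. When every scalar is wanted regardless of the number of dimensions, `.flat` and `np.ndenumerate` are two options. The array below is a smaller stand-in, not the `C` from above.

```python
import numpy as np

C = np.arange(12).reshape(3, 2, 2)

# .flat iterates over every scalar, regardless of the number of dimensions.
total = 0
for value in C.flat:
    total += value
print(total)                          # 66

# np.ndenumerate yields (index tuple, value) pairs.
first_three = [(idx, int(val)) for idx, val in np.ndenumerate(C)][:3]
print(first_three)
```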


## Shape Manipulation

In [51]:
A = np.floor(10*np.random.random((3,4)))
A

array([[3., 0., 9., 5.],
       [4., 9., 5., 0.],
       [8., 4., 7., 4.]])

In [52]:
A.shape

(3, 4)

In [53]:
A.ravel()

array([3., 0., 9., 5., 4., 9., 5., 0., 8., 4., 7., 4.])

In [54]:
A = A.reshape(6,2)
A

array([[3., 0.],
       [9., 5.],
       [4., 9.],
       [5., 0.],
       [8., 4.],
       [7., 4.]])

In [55]:
A.T

array([[3., 9., 4., 5., 8., 7.],
       [0., 5., 9., 0., 4., 4.]])

In [56]:
A.T.shape

(2, 6)

In [57]:
A.shape

(6, 2)

In [58]:
my_arr = np.arange(1,122).reshape(11,11)
count = 0
for row in my_arr:
    row[count+1:] = 0
    count += 1
print(my_arr)

[[  1   0   0   0   0   0   0   0   0   0   0]
 [ 12  13   0   0   0   0   0   0   0   0   0]
 [ 23  24  25   0   0   0   0   0   0   0   0]
 [ 34  35  36  37   0   0   0   0   0   0   0]
 [ 45  46  47  48  49   0   0   0   0   0   0]
 [ 56  57  58  59  60  61   0   0   0   0   0]
 [ 67  68  69  70  71  72  73   0   0   0   0]
 [ 78  79  80  81  82  83  84  85   0   0   0]
 [ 89  90  91  92  93  94  95  96  97   0   0]
 [100 101 102 103 104 105 106 107 108 109   0]
 [111 112 113 114 115 116 117 118 119 120 121]]
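
The loop above zeroes everything to the right of the diagonal, which leaves the lower triangle. As a sketch of a vectorized alternative, `np.tril` (not used in the original cell) appears to give the same result without an explicit loop:

```python
import numpy as np

# Loop version from above, repeated so the sketch is self-contained.
looped = np.arange(1, 122).reshape(11, 11)
for i, row in enumerate(looped):
    row[i + 1:] = 0

# np.tril keeps the lower triangle (including the diagonal) and zeroes the rest.
vectorized = np.tril(np.arange(1, 122).reshape(11, 11))

print(np.array_equal(looped, vectorized))   # True
```

Note that the loop modifies the array in place through each `row` view, while `np.tril` returns a new array.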


In [59]:
arr = np.arange(1,122).reshape(11,11)
arr[4:9,3:8] = 0
print(arr)

[[  1   2   3   4   5   6   7   8   9  10  11]
 [ 12  13  14  15  16  17  18  19  20  21  22]
 [ 23  24  25  26  27  28  29  30  31  32  33]
 [ 34  35  36  37  38  39  40  41  42  43  44]
 [ 45  46  47   0   0   0   0   0  53  54  55]
 [ 56  57  58   0   0   0   0   0  64  65  66]
 [ 67  68  69   0   0   0   0   0  75  76  77]
 [ 78  79  80   0   0   0   0   0  86  87  88]
 [ 89  90  91   0   0   0   0   0  97  98  99]
 [100 101 102 103 104 105 106 107 108 109 110]
 [111 112 113 114 115 116 117 118 119 120 121]]


## Stacking together different arrays

In [60]:
a = np.floor(10*np.random.random((3,3)))
a

array([[2., 9., 7.],
       [8., 3., 7.],
       [0., 2., 1.]])

In [61]:
b = np.floor(10*np.random.random((2,3)))
b

array([[0., 9., 7.],
       [4., 1., 0.]])

In [62]:
np.vstack((a,b))

array([[2., 9., 7.],
       [8., 3., 7.],
       [0., 2., 1.],
       [0., 9., 7.],
       [4., 1., 0.]])

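Try `np.hstack` on your own. It joins arrays side by side, so the inputs need the same number of rows; `a` and `b` above have 3 and 2 rows, so one of them would need adjusting first.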
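
One possible starting point for that exercise, using two made-up arrays with matching row counts (`c`, `wide`, and `same` are illustrative names):

```python
import numpy as np

a = np.arange(9).reshape(3, 3)       # 3 rows
c = np.arange(6).reshape(3, 2)       # also 3 rows, so hstack works

wide = np.hstack((a, c))
print(wide.shape)                    # (3, 5)

# np.concatenate with an explicit axis is the general form of both helpers.
same = np.concatenate((a, c), axis=1)
print(np.array_equal(wide, same))    # True
```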

## Splitting one array into several

In [63]:
a = np.floor(10*np.random.random((3,12)))
a

array([[9., 0., 6., 0., 4., 3., 5., 0., 9., 5., 1., 4.],
       [5., 2., 3., 6., 9., 8., 0., 2., 8., 1., 1., 1.],
       [6., 2., 4., 7., 0., 4., 8., 6., 9., 8., 3., 0.]])

In [64]:
np.hsplit(a,3)

[array([[9., 0., 6., 0.],
        [5., 2., 3., 6.],
        [6., 2., 4., 7.]]),
 array([[4., 3., 5., 0.],
        [9., 8., 0., 2.],
        [0., 4., 8., 6.]]),
 array([[9., 5., 1., 4.],
        [8., 1., 1., 1.],
        [9., 8., 3., 0.]])]

In [65]:
np.vsplit(a,3) # homework

[array([[9., 0., 6., 0., 4., 3., 5., 0., 9., 5., 1., 4.]]),
 array([[5., 2., 3., 6., 9., 8., 0., 2., 8., 1., 1., 1.]]),
 array([[6., 2., 4., 7., 0., 4., 8., 6., 9., 8., 3., 0.]])]
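
Passing an integer to `hsplit` or `vsplit` requires an even division. Two variations, sketched here on a stand-in array, cover the other cases: a list of indices splits at specific columns, and `np.array_split` accepts uneven sections.

```python
import numpy as np

a = np.arange(36).reshape(3, 12)

# A list of indices splits *at* those columns: [:, :3], [:, 3:4], [:, 4:].
left, middle, right = np.hsplit(a, [3, 4])
print(left.shape, middle.shape, right.shape)   # (3, 3) (3, 1) (3, 8)

# np.array_split tolerates sections that do not divide evenly.
parts = np.array_split(a, 5, axis=1)
print([p.shape[1] for p in parts])             # [3, 3, 2, 2, 2]
```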

## Linear Algebra

In [66]:
a = np.array([[3.0, 2.0], [4.0, 6.5]])
print(a)

[[3.  2. ]
 [4.  6.5]]


In [67]:
a.transpose()

array([[3. , 4. ],
       [2. , 6.5]])

In [68]:
np.linalg.inv(a)  # linalg is NumPy's subpackage for linear algebra operations

array([[ 0.56521739, -0.17391304],
       [-0.34782609,  0.26086957]])

In [69]:
u = np.eye(5)  # "eye" sounds like "I", the usual symbol for the identity matrix
u

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
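
A quick way to sanity-check the inverse computed above is to multiply it back and compare against the identity. The sketch also shows `np.linalg.det` and `np.linalg.solve`, which were not in the original cells; `rhs` is a made-up right-hand side.

```python
import numpy as np

a = np.array([[3.0, 2.0], [4.0, 6.5]])
a_inv = np.linalg.inv(a)

# A matrix times its inverse should be the identity, up to rounding error.
print(np.allclose(a @ a_inv, np.eye(2)))     # True

# The determinant must be non-zero for the inverse to exist.
det = np.linalg.det(a)
print(det)                                   # approximately 11.5

# To solve a @ x = rhs, np.linalg.solve avoids forming the inverse explicitly.
rhs = np.array([1.0, 2.0])
x = np.linalg.solve(a, rhs)
print(np.allclose(a @ x, rhs))               # True
```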

## Advantages of NumPy over Python Lists

The most important benefits of using NumPy arrays are:

1. They consume less memory.

2. They are faster than Python lists.

3. They are convenient to use.


For Python lists: every new element needs another eight bytes for the reference to the new object, and the new integer object itself consumes 28 bytes. The size of a list "lst" including the size of its elements can therefore be calculated with the formula below (the exact constants vary with the Python version and platform):

64 + 8 * len(lst) + (len(lst) * 28)

![alt text](https://webcourses.ucf.edu/courses/1249560/files/64324060/download?verifier=ZSNNql7AkXjfNQuAPKVykvZTiRqAa6LJY4EvPrdr&wrap=1)


NumPy takes up less space. An integer array of length "n" with 8-byte elements needs

96 + n * 8 bytes

![alt text](https://webcourses.ucf.edu/courses/1249560/files/64324040/download?verifier=V7pAn7JWwOOMUVR8nJ2mpRIfu8MRFGTSZNwUCAlq&wrap=1)
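
The formulas above can be checked against a running interpreter. This is a rough sketch only: the exact byte counts depend on the Python version, the platform, and the array's integer width, and `list_bytes` is an illustrative name.

```python
import sys
import numpy as np

n = 1000
lst = list(range(n))
arr = np.arange(n)

# getsizeof(lst) covers the list object and its references only,
# so the integer objects are added separately.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

# nbytes is the raw data buffer; getsizeof(arr) also includes the array header.
print("list :", list_bytes)
print("array:", arr.nbytes, "data bytes,", sys.getsizeof(arr), "in total")
```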

In [70]:
import numpy as np
import time
import sys

# Create a NumPy array with 1000 elements
array = np.arange(1000)
# array.itemsize: size of one element in bytes
# array.size: number of elements in the array
print("Size of NumPy array: ", array.itemsize * array.size)

# Create a list with 1000 elements
# (named lst so that the built-in list type is not shadowed)
lst = list(range(0, 1000))
# Estimate the element storage: size of one int object times the list length
print("Size of list: ", sys.getsizeof(1) * len(lst))

Size of NumPy array:  4000
Size of list:  28000


In [71]:
import math
size_of_vec = 10000000

def pure_python_version():
    t1 = time.time()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X)) ]
    return time.time() - t1

def numpy_version():
    t1 = time.time()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.time() - t1


t1 = pure_python_version()
t2 = numpy_version()
print(t1, "\n", t2)
print("Numpy in this example is: " + str(math.floor(t1/t2)) + "x faster!")

3.3952503204345703 
 0.05900454521179199
Numpy in this example is: 57x faster!
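
A single `time.time()` measurement can be noisy. As a hedged alternative, the standard-library `timeit` module repeats the measurement. The vector size is reduced here so the sketch runs quickly, and the function names are illustrative.

```python
import timeit
import numpy as np

size_of_vec = 100_000        # smaller than above so the sketch runs quickly

def pure_python_sum():
    x = range(size_of_vec)
    y = range(size_of_vec)
    return [x[i] + y[i] for i in range(size_of_vec)]

def numpy_sum():
    x = np.arange(size_of_vec)
    y = np.arange(size_of_vec)
    return x + y

# timeit runs each function several times; the minimum is the least noisy figure.
t_python = min(timeit.repeat(pure_python_sum, number=5, repeat=3))
t_numpy = min(timeit.repeat(numpy_sum, number=5, repeat=3))
print(f"pure Python: {t_python:.4f}s, NumPy: {t_numpy:.4f}s")
```

The measured ratio will differ from machine to machine, so the 57x figure above should be read as one sample rather than a general constant.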


## Further Reading
https://docs.scipy.org/doc/numpy/reference/index.html#reference