# NumPy

Let us start with an example to understand the need for NumPy. <br>
Let's assume I am in a 3D space, so I'll have 3 coordinates for my location: x, y and z.<br><br>
How do we represent this in Python?<br>
* List?
* Set?
* Tuple?

Now let us assume we are in a 10,000-dimensional space, which is often the case with real-world data. <br>
Examples: machine learning features, images.

Will our choice of how to store such information remain the same?

## NumPy: Numerical Python
* A scientific computing library for Python
* Focuses on representing data as **N-dimensional arrays** and on **operations** over them.
* Implements efficient computations and operations on these N-dimensional arrays. For example:
    * Basic operations : +, - , * , /
    * Mathematical functions on these arrays : sin, cos, pow
    * Linear Algebra
    * Searching arrays, sorting arrays etc
* More efficient (than lists, for example) because
    * Specialized data structures that take advantage of homogeneous typing and contiguous storage
    * Many operations implemented in C
    * Some operations may run in parallel (for example, linear algebra routines, depending on the BLAS library NumPy was built against)
* Bottom line : When dealing with large data arrays, NumPy arrays are a good data structure to use due to
    * Efficiency
    * Support for various operations (for example, how would you sort an N-dimensional array?), including linear algebra operations, which saves you the time of writing all that code yourself.
    * Shape and Broadcasting semantics (covered later)

In [3]:
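# Sample program: time a Python-level loop over a list and over a NumPy array.
# The loop runs in the interpreter either way, so both timings are similar;
# NumPy's speedup comes from vectorized operations, not from looping over an array.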

c = list(range(1000000))

import time
start = time.time()

i= 0
while i <len(c):
    i+=1
    
print(time.time()-start)

import numpy as np
d = np.array(c)

i =0
start = time.time()
while i < len(d):
    i+=1
print(time.time()-start)


print(c[-1])
print(d[-1])

0.16695857048034668
0.1567387580871582
999999
999999
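
The two timings above are nearly identical because both loops only count in Python. As a minimal sketch of where NumPy does help, the comparison below sums the same million numbers with a Python loop and with a single vectorized call. The variable names are chosen for this illustration, and the timings will vary by machine, so no expected timings are shown.

```python
import time
import numpy as np

values = list(range(1_000_000))
array = np.array(values, dtype=np.int64)  # explicit int64 so the sum cannot overflow

# Pure-Python loop: every addition goes through the interpreter.
start = time.perf_counter()
loop_total = 0
for v in values:
    loop_total += v
loop_seconds = time.perf_counter() - start

# Vectorized: the whole reduction runs in compiled code.
start = time.perf_counter()
numpy_total = int(array.sum())
numpy_seconds = time.perf_counter() - start

print(loop_total == numpy_total)
print(f"loop: {loop_seconds:.4f}s, numpy: {numpy_seconds:.4f}s")
```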


### NumPy vs Lists

At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:

* NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.

* The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.

* NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
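
The three differences listed above can be seen in a few lines. This is only an illustrative sketch; the variable names are made up for the example.

```python
import numpy as np

numbers = [1, 2, 3]
numbers.append(4)              # a list grows in place

arr = np.array([1, 2, 3])
grown = np.append(arr, 4)      # returns a new array; arr is unchanged
print(arr.shape, grown.shape)  # (3,) (4,)

mixed = np.array([1, 2.5, 3])  # ints are upcast so all elements share one dtype
print(mixed.dtype)             # float64

objs = np.array([1, "two", 3.0], dtype=object)  # the exception: object arrays
print(objs.dtype)              # object

print((arr * 2).tolist())      # elementwise arithmetic: [2, 4, 6]
print([1, 2, 3] * 2)           # list repetition: [1, 2, 3, 1, 2, 3]
```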

In [4]:
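# Another example: elementwise multiplication with an explicit Python loop.
# This is the slow way to use NumPy arrays: it pays interpreter overhead per element.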

a = np.array(list(range(1000)))
b = np.array(list(range(1000)))
import time
start = time.time()
for i in range(1000):
    c = a[i]*b[i]
print(time.time()-start)

0.011240005493164062
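
For comparison, here is a sketch of the same multiplication written as one vectorized expression. It keeps every product (the loop above overwrites `c` each iteration) and checks that both approaches agree.

```python
import numpy as np

a = np.arange(1000)
b = np.arange(1000)

# Loop version, as in the cell above, but keeping every product.
loop_products = [a[i] * b[i] for i in range(1000)]

# Vectorized version: one expression, no Python-level loop.
vector_products = a * b

print(np.array_equal(vector_products, np.array(loop_products)))  # True
```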


To install NumPy, run

``pip install numpy``

inside your environment.

To import it:

``import numpy``

By convention, it is usually aliased like this:

In [5]:
import numpy as np

> **Resources** :  
> [The Official Numpy Quickstart Guide](https://numpy.org/devdocs/user/quickstart.html)  
> [A nice short tutorial on Python and Numpy](https://cs231n.github.io/python-numpy-tutorial)  
> [A cheatsheet covering most of the basic things you want to do with numpy](https://www.dataquest.io/blog/numpy-cheat-sheet/)  

### Numpy Basics

* np.array attributes : size, shape, dtype, T
* automatically creating some preset types of arrays
    * zeros, ones, array, with type, np.arange, np.linspace
    * np.random - rand, randn, randint
    
Let's have a look at a few basic commands: <br>
* Array creation: array, ones, zeros, arange, empty, linspace
* Array manipulation: ndim, shape, size, dtype, reshape
* Functions: pi, sin, cos

What if we try to print a large array?

In [39]:
a = np.ones((3,4), dtype = np.int64)
c = list(range(10))

a = np.array([1,3,6,9])
b = np.arange(9) # b = np.array(list(range(9)))

c = np.linspace(0,10,10000)  # np.linspace(start, stop, number of entries); stop is included
print(c.size)
# range(start, end, step_size)
print(c)

10000
[0.00000000e+00 1.00010001e-03 2.00020002e-03 ... 9.99799980e+00
 9.99899990e+00 1.00000000e+01]


In [8]:
a = np.arange(1000)
print(a.size)
#print(a.shape)
#print(a.ndim)
b = a.reshape(10,10,10)
print()
#print('-'*10)
#print(b.shape)
#print(b.ndim)

#shape = h x C x W  cube   
# square: row x column

print(a.size)  # a.size = product of (a.shape)
print(b.size)


1000
1000
1000


In [113]:
a = np.arange(6).reshape(2,3)
print(a.shape)
print(a)
print(a.size)

(2, 3)
[[0 1 2]
 [3 4 5]]
6


In [114]:
# np.array to create a numpy array from list

# you can create arrays with multiple types - int32, int64, float32, float64, bool etc
# interesting array attributes : size, shape, dtype, ndim, itemsize 

x = np.array([1,2,3])
#print(x, x.ndim, x.shape, x.size, x.dtype, x.itemsize)

x = np.array([1,2,3],dtype="int64")
#print(x, x.ndim, x.shape, x.size, x.dtype,x.itemsize)

x = np.array([1,2,3],dtype="float32")
print(x, x.ndim, x.shape, x.size, x.dtype, x.itemsize)

x = np.array([1,2,3],dtype="float64")
print(x, x.ndim, x.shape, x.size, x.dtype,x.itemsize)

# otherwise numpy automatically guesses the type

x = np.array([1.5,2.6,7.7])
print(x, x.ndim, x.shape, x.size, x.dtype,x.itemsize)

x = np.array([True, False])  # boolean : True or False
print(x, x.ndim, x.shape, x.size, x.dtype,x.itemsize)

# regular bool to int conversion 
x = np.array([True, False],dtype="int32")
print(x, x.ndim, x.shape, x.size, x.dtype,x.itemsize)

[1. 2. 3.] 1 (3,) 3 float32 4
[1. 2. 3.] 1 (3,) 3 float64 8
[1.5 2.6 7.7] 1 (3,) 3 float64 8
[ True False] 1 (2,) 2 bool 1
[1 0] 1 (2,) 2 int32 4


In [10]:
print(int(False))
print(int(True))
c = [1, 2, 3.0, 5]
print(c)
a = np.array(c, dtype=np.int32)
print(a.dtype)
print(a)

0
1
[1, 2, 3.0, 5]
int32
[1 2 3 5]


In [11]:
print(2 == 3)
print(4==4)

False
True


In [132]:
# Example of N-Dimensional np array.

a = np.arange(1000).reshape(10,100)
print(a.shape)
print(a.ndim)
b = a.reshape(100,2,5)
print(b.shape)

(10, 100)
2
(100, 2, 5)


In [136]:
c = [
    [1,2,3],[4,6,9]
]
print(c)
c = np.array(c)
print(c.shape)

[[1, 2, 3], [4, 6, 9]]
(2, 3)


In [135]:
c = [
    [
        [1,2,3], [4,5,6]
    ],
    [
        [8,9,0],[7,6,7]
    ]
]

print(c)
c = np.array(c)
print(c.shape)

[[[1, 2, 3], [4, 5, 6]], [[8, 9, 0], [7, 6, 7]]]
(2, 2, 3)


In [138]:
# you can create multidimensional arrays

# a 2-d array

# sub arrays shapes have to be equal
x = np.array([[0.89012943, 0.95057931, 0.73957614],
       [0.32044759, 0.11317706, 0.21559414],
       [0.64161899, 0.22493727, 0.19792863],
       [0.64161899, 0.22493727, 0.19792863]])

print(x)

print(x.ndim, x.shape, x.size, x.dtype,x.itemsize)


[[0.89012943 0.95057931 0.73957614]
 [0.32044759 0.11317706 0.21559414]
 [0.64161899 0.22493727 0.19792863]
 [0.64161899 0.22493727 0.19792863]]
2 (4, 3) 12 float64 8


In [139]:
# a 3-d array

x = np.array([[[0.95059078, 0.4920439 , 0.19623088],
        [0.35948661, 0.03963015, 0.01359072],
        [0.60065644, 0.43182609, 0.5839114 ]],

       [[0.49141504, 0.69129956, 0.53504228],
        [0.39380831, 0.90347317, 0.77581393],
        [0.77241933, 0.65354209, 0.00165317]],

       [[0.09852572, 0.48191009, 0.75421498],
        [0.73972572, 0.03630696, 0.88197898],
        [0.64376238, 0.77181998, 0.27395923]],

       [[0.86325292, 0.77815359, 0.85744606],
        [0.05330425, 0.79149102, 0.57572194],
        [0.92416049, 0.07126161, 0.15124192]]])

print(x.ndim, x.shape, x.size, x.dtype,x.itemsize)

3 (4, 3, 3) 36 float64 8


In [143]:
# numpy provides some functions to create some standard arrays
# np.zeros. np.ones, np.arange, np.linspace

#print("*"*30)
# just zeros, you can specify the dtype
x=np.zeros(shape=(1,3,4),dtype="float32")
#print(x)
#print(x.ndim, x.shape, x.size, x.dtype,x.itemsize)


#print("*"*30)
# just ones
x=np.ones((1,3,4), dtype = 'int32')
#print(x)
#print(x.ndim, x.shape, x.size, x.dtype,x.itemsize)


print("*"*30)
# a range of integer (like python range())
x=np.arange(2,10,2)
print(x)
print(x.ndim, x.shape, x.size, x.dtype,x.itemsize)

#print("*"*30)
# more generally to get equal divisions between two numbers
x=np.linspace(2.2,8.1,100)
#print(x)
#print(x.ndim, x.shape, x.size, x.dtype,x.itemsize)

******************************
[2 4 6 8]
1 (4,) 4 int64 8
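
The difference between `np.arange` (fixed step, stop excluded) and `np.linspace` (fixed number of points, stop included by default) is easy to mix up. A small sketch with values picked for illustration:

```python
import numpy as np

# arange: fixed step, the stop value is excluded
steps = np.arange(0, 1, 0.25)
print(steps.tolist())       # [0.0, 0.25, 0.5, 0.75]

# linspace: fixed number of points, the stop value is included by default
points = np.linspace(0, 1, 5)
print(points.tolist())      # [0.0, 0.25, 0.5, 0.75, 1.0]

# endpoint=False drops the stop value, matching arange here
half_open = np.linspace(0, 1, 4, endpoint=False)
print(np.allclose(steps, half_open))  # True
```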


### Numpy random submodule
#### Use?
Machine Learning weights initialization

In [40]:
# np.random is a module that gives options to create random arrays as well
# np.random.randint, np.random.uniform, 

# a 2-dimensional (3x3) array of random integers
x=np.random.randint(10,90, (3,3)) # np.random.randint(low (inclusive), high (exclusive), size/shape of output)
#print(x, x.shape, x.dtype)
#print("*"*30)

# any n-dimensional array of random numbers between 0 and 1 (uniformly distributed)
x=np.random.random_sample(size=(3,2))
#print(x, x.shape, x.dtype)
#print("*"*30)

# more generally between a and b (uniformly distributed)
x=np.random.uniform(10,10.5,size=(3,2))
print(x, x.shape, x.dtype)
print("*"*30)
x=np.random.random((3,4))
print(x, x.shape, x.dtype)
print("*"*30)

[[10.37832739 10.01460012]
 [10.32408034 10.05933534]
 [10.07093262 10.46504084]] (3, 2) float64
******************************
[[0.62749774 0.627892   0.85908209 0.09679144]
 [0.04145308 0.04995615 0.31390502 0.54546021]
 [0.84304337 0.80353633 0.32588973 0.42131974]] (3, 4) float64
******************************
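
For the weight-initialization use case, reproducibility usually matters. The sketch below assumes NumPy 1.17 or newer, where `np.random.default_rng` is available; the seed value and the scale of `0.01` are arbitrary choices for this example, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily for this sketch

weights = rng.normal(loc=0.0, scale=0.01, size=(3, 4))  # small Gaussian values
ints = rng.integers(low=10, high=90, size=(3, 3))       # high is exclusive
unit = rng.random((3, 2))                               # uniform on [0, 1)

# Re-creating the generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(seed=0)
reproduced = rng_again.normal(loc=0.0, scale=0.01, size=(3, 4))
print(np.array_equal(weights, reproduced))  # True
```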


* slicing and indexing, all indices vs few indices

In [20]:
# Difference between List and Numpy indexing

c = list(range(10))
#print(c)
#print(c[8])
c = [
    [1,2,3], [3,4,5]
]
#print(c[0])
#print(type(c[0]))
#print(c[0][1])
d = np.array(c)
#print(d)
#print(d[0,2])

x = np.array([[[0.95059078, 0.4920439 , 0.19623088],
        [0.35948661, 0.03963015, 0.01359072],
        [0.60065644, 0.43182609, 0.5839114 ]],

       [[0.49141504, 0.69129956, 0.53504228],
        [0.39380831, 0.90347317, 0.77581393],
        [0.77241933, 0.65354209, 0.00165317]],

       [[0.09852572, 0.48191009, 0.75421498],
        [0.73972572, 0.03630696, 0.88197898],
        [0.64376238, 0.77181998, 0.27395923]],

       [[0.86325292, 0.77815359, 0.85744606],
        [0.05330425, 0.79149102, 0.57572194],
        [0.92416049, 0.07126161, 0.15124192]]])

print(x[2,2,2])
print(x[2][2][2])

0.27395923
0.27395923


In [171]:
c = list(range(10))
c = np.array(c)
print(c)
print(c[:5])
print(c[5:])
print(c[:])

[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4]
[5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]


In [None]:
# Negative indices

# Complete vs incomplete slices : x[0] vs x[0,:,:]

In [21]:
# numpy indexing and slicing

# mixing indices and slices

x=np.random.uniform(low=10,high=10.5,size=(4,6,5))

#print(x.shape)
#print(x[0])
#print(x[0].shape)
#print(x[1:3].shape)
#print(x[1:3, :, :].shape)
#print(x[1:-1].shape) # 0, 1, 2, 3
#print(x[1:-1,:,:].shape)

#print(x[1:3,2:5,3:].shape)     # x[   1:3,   2:5,    3:]

print(x[1:2,2,3:].shape)
print(x[1].shape)
print(x[1,:,:].shape)

(1, 2)
(6, 5)
(6, 5)
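
Negative indices and complete versus incomplete slices, mentioned in the empty cell above, can be sketched on a small array of the same shape. `np.arange` is used here instead of random values so the results are predictable.

```python
import numpy as np

x = np.arange(4 * 6 * 5).reshape(4, 6, 5)

# Negative indices count from the end of an axis.
print(x[-1, -1, -1])          # 119, the last element
print(x[-1].shape)            # (6, 5)

# Omitted trailing axes are treated as full slices.
print(np.array_equal(x[0], x[0, :, :]))   # True

# An integer index removes its axis; a length-1 slice keeps it.
print(x[1].shape)             # (6, 5)
print(x[1:2].shape)           # (1, 6, 5)
```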


In [187]:
c = [
    [1,2,3], [3,4,5]
]
c = np.array(c)

print(c.shape)
print(c[1].shape)

(2, 3)
(3,)


* operations on arrays of same size
    * numpy mathematical functions - sin, cos, log etc
    * scalar arithmetic
    * arithmetic operations, upcasting
    * comparison operations
        * np.all, np.any
        * np.where, np.argwhere, np.nonzero

In [189]:
x=np.random.randint(low=1,high=10,size=(2,3))
print(x)

[[4 6 6]
 [3 9 1]]


In [193]:
# You can do arithmetic between numpy arrays and scalars
# These operations are element wise

print(x,end="\n\n")
#print(x+2,end="\n\n")
#print(x-2,end="\n\n")
#print(x*2,end="\n\n")
#print(x/2,end="\n\n")
#print(x**3,end="\n\n")
print(x//3,end="\n\n")
print(x%3,end="\n\n")

[[4 6 6]
 [3 9 1]]

[[1 2 2]
 [1 3 0]]

[[1 0 0]
 [0 0 1]]



In [198]:
# Elementwise comparison with a scalar (== here; <, >, <=, >= work the same way)
print(x)
print(x==6)

[[4 6 6]
 [3 9 1]]
[[False  True  True]
 [False False False]]


In [199]:
# You can use some elementwise global functions on numpy arrays
# like np.sin, cos, exp, log, sqrt

print(np.sin(x),end="\n\n")
print(np.cos(x),end="\n\n")
print(np.sin(x)**2 + np.cos(x)**2,end="\n\n")
print(np.sqrt(x),end="\n\n")

[[-0.7568025  -0.2794155  -0.2794155 ]
 [ 0.14112001  0.41211849  0.84147098]]

[[-0.65364362  0.96017029  0.96017029]
 [-0.9899925  -0.91113026  0.54030231]]

[[1. 1. 1.]
 [1. 1. 1.]]

[[2.         2.44948974 2.44948974]
 [1.73205081 3.         1.        ]]



In [200]:
x=np.random.randint(low=1,high=10,size=(2,3),dtype="int32")
print(x)
y=np.random.randint(low=1,high=10,size=(2,3),dtype="int64")
print(y)

[[1 8 6]
 [2 5 1]]
[[6 5 3]
 [3 2 9]]


In [201]:
# elementwise binary arithmetic operations on arrays of same shape
# + , - , *, /
# upcasting
# Both arrays need to have the same shape; arrays of different shapes are handled later, in Broadcasting
print(x+y,end="\n\n")
print(x-y,end="\n\n")
print(x*y,end="\n\n")
print(x/y,end="\n\n")
print(x**y,end="\n\n")

# Operations between different types result in upcasting to a type
# that can represent both : ex : int32 + float32 = float64 (float32 cannot hold every int32 exactly)
print(x.dtype)
zeros=np.zeros((2,3),dtype="float32")
print((x+zeros).dtype)

[[ 7 13  9]
 [ 5  7 10]]

[[-5  3  3]
 [-1  3 -8]]

[[ 6 40 18]
 [ 6 10  9]]

[[0.16666667 1.6        2.        ]
 [0.66666667 2.5        0.11111111]]

[[    1 32768   216]
 [    8    25     1]]

int32
float64
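
To check which type an operation will be upcast to without building arrays, `np.result_type` can be queried directly. A short sketch:

```python
import numpy as np

print(np.result_type(np.int32, np.int64))      # int64
print(np.result_type(np.int32, np.float32))    # float64
print(np.result_type(np.int32, np.float64))    # float64
print(np.result_type(np.float32, np.float64))  # float64

# True division of integer arrays always yields floats.
ints = np.array([1, 2, 3], dtype="int32")
quotient = ints / ints
print(quotient.dtype)                          # float64
```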


In [207]:
# Matrix multiplication
print(x)
print(y.T)

print(x*y)  # element wise multiplication
print(x.dot(y.T))
print(x@y.T)

# x = (2,3)   y = (2,3)  y.T = (3,2)   x.dot(y.T) = (2,2)

[[1 8 6]
 [2 5 1]]
[[6 3]
 [5 2]
 [3 9]]
[[ 6 40 18]
 [ 6 10  9]]
[[64 73]
 [40 25]]
[[64 73]
 [40 25]]
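
To make the difference from elementwise `*` concrete, the sketch below rebuilds the matrix product by hand: entry `(i, j)` of `x @ y.T` is the dot product of row `i` of `x` with row `j` of `y`. The arrays are copied from the output above.

```python
import numpy as np

x = np.array([[1, 8, 6],
              [2, 5, 1]])
y = np.array([[6, 5, 3],
              [3, 2, 9]])

# Dot product of each row of x with each row of y.
manual = np.array([[np.sum(x[i] * y[j]) for j in range(2)] for i in range(2)])

print(manual.tolist())                  # [[64, 73], [40, 25]]
print(np.array_equal(manual, x @ y.T))  # True
```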


In [208]:
# You can also compare numpy arrays with other arrays or scalars
# For now we restrict to both arrays having the same shape
# elementwise comparison operators >, <, ==, != etc
print(x)
print(x > 2)
#print(x > y)

[[1 8 6]
 [2 5 1]]
[[False  True  True]
 [False  True False]]


In [212]:
# np.all(boolean_array) is True if all of the elements in boolean_array are True
# np.any(boolean_array) is True if at least one of the elements in boolean_array is True
#print(np.all(x>0))
print(np.any(x==2))

True
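
Both functions also accept an `axis` argument, which reduces along one axis instead of over the whole array. A small sketch reusing the `x` printed earlier:

```python
import numpy as np

x = np.array([[1, 8, 6],
              [2, 5, 1]])

all_positive = np.all(x > 0)           # single bool over the whole array
any_per_column = np.any(x > 7, axis=0) # one bool per column
all_per_row = np.all(x > 1, axis=1)    # one bool per row

print(all_positive)              # True
print(any_per_column.tolist())   # [False, True, False]
print(all_per_row.tolist())      # [False, False]
```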


In [26]:
# np.where(boolean_array): Get tuple of indices where boolean array is True
print(x)
#print("\n")
#print(y)
#print("\n")
#print(np.where(x>=y))
#print("\n")
# np.where(boolean_array, A, B): If entry in boolean_array is True, choose entry from A else from B
# Thus, A,B, and boolean_array have to be of the same shape (or A or B can be scalar)
#max_xy=np.maximum(x,y)
#max_xy_w=np.where(x>y,x,y)
#print(np.all(max_xy_w==max_xy))
#print("\n")
print(np.where(x>10.25,20,-20))
print(np.where(x>10.25, x, 0))

[[[10.21086849 10.45614858 10.06085561 10.00210571 10.31289752]
  [10.15478653 10.2025137  10.4393493  10.32300117 10.29768191]
  [10.2356706  10.30681989 10.48387061 10.45258147 10.39516478]
  [10.42988102 10.47659671 10.07656252 10.30084437 10.37312421]
  [10.35056755 10.0152244  10.14337598 10.26968234 10.31034057]
  [10.49700348 10.42392979 10.38269157 10.25169307 10.47621213]]

 [[10.36442749 10.40041197 10.41781427 10.35636599 10.12678436]
  [10.23636095 10.2706737  10.46417705 10.45862992 10.43831937]
  [10.1630925  10.13923916 10.27748365 10.19774908 10.0065882 ]
  [10.389706   10.41046541 10.01731296 10.10937528 10.05984932]
  [10.14150233 10.19893291 10.08035789 10.28539832 10.26231354]
  [10.02284588 10.01804097 10.32437539 10.05270884 10.46989539]]

 [[10.16506725 10.26593591 10.0936059  10.08224438 10.16577367]
  [10.48305021 10.47240107 10.42066756 10.00573931 10.44786127]
  [10.20585045 10.47923819 10.1948667  10.20139964 10.34547801]
  [10.18061826 10.08020303 10.359415

In [32]:
# np.nonzero(boolean_array) : Get tuple of index arrays (one per axis) where boolean array is True, same as one-argument np.where
# np.argwhere(boolean_array) : An (N, ndim) array with one row of indices per True element, i.e. np.transpose(np.nonzero(...))
print(x)
print(np.nonzero(x>2))
print(np.argwhere(x>2))

[[9 6 7]
 [3 1 2]]
(array([0, 0, 0, 1]), array([0, 1, 2, 0]))
[[0 0]
 [0 1]
 [0 2]
 [1 0]]
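
The relationship between the three functions can be verified directly. The sketch below uses the same `x` as the output above; indexing with `x[rows, cols]` is integer-array indexing, used here only to show what the index tuple refers to.

```python
import numpy as np

x = np.array([[9, 6, 7],
              [3, 1, 2]])
mask = x > 2

rows, cols = np.nonzero(mask)          # one index array per axis
print(x[rows, cols].tolist())          # [9, 6, 7, 3]

same = np.where(mask)                  # one-argument where behaves like nonzero
print(all(np.array_equal(a, b) for a, b in zip(same, (rows, cols))))  # True

pairs = np.argwhere(mask)              # one (row, col) pair per True entry
print(np.array_equal(pairs, np.transpose(np.nonzero(mask))))          # True

kept = np.where(mask, x, 0)            # keep values > 2, zero out the rest
print(kept.tolist())                   # [[9, 6, 7], [3, 0, 0]]
```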


### Miscellaneous

* astype()
* sum, min, max, argmin, argmax, sort, argsort
* tolist()
* copy()
* np.concatenate

In [220]:
# arr.astype()
# either string "int32" or np.int32
x=np.random.randint(low=1,high=10,size=(2,3),dtype="int32")
print(x.dtype)
x=x.astype("float32")
print(x.dtype)
x=x.astype("bool")
print(x.dtype)
print(x)

int32
float32
bool
[[ True  True  True]
 [ True  True  True]]
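
One detail worth knowing about `astype`: converting floats to integers drops the fractional part rather than rounding. A short sketch with hand-picked values:

```python
import numpy as np

f = np.array([1.9, -1.9, 0.0, 2.5])

as_int = f.astype("int32")    # fractional part is dropped (truncation toward zero)
as_bool = f.astype("bool")    # only exact zeros become False
as_f32 = f.astype("float32")  # astype returns a new array; f keeps its dtype

print(as_int.tolist())        # [1, -1, 0, 2]
print(as_bool.tolist())       # [True, True, False, True]
print(f.dtype, as_f32.dtype)  # float64 float32
```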


In [35]:
# np.sort, np.max, np.argsort, np.argmax w/ and w/o axis parameter
x=np.random.randint(low=1,high=10,size=(4,3),dtype="int32")
print(x)
print('*'*30)
#print(x.shape)
#print(np.min(x))
#print('*'*30)
print(np.max(x,axis=0, keepdims = False))
#print('*'*30)
#print(np.max(x,axis=1))
#print('*'*30)
#print("\n"+"-"*10+"\n")
#print('*'*30)
print(np.argmax(x))
#print('*'*30)
print(np.argmax(x,axis=0))
#print('*'*30)
print(np.argmax(x,axis=1))
#print('*'*30)



[[8 7 9]
 [1 2 8]
 [3 9 8]
 [7 1 4]]
******************************
[8 9 9]
2
[0 2 0]
[2 2 1 0]


In [229]:
# np sort : default axis is -1
print(x)
#print("\n"+"-"*10+"\n")
#print(np.sort(x))
#print(np.sort(x,axis=0))
#print(np.sort(x,axis=1))

#print("\n"+"-"*10+"\n")
print(np.argsort(x))
#print(np.argsort(x,axis=0))
#print(np.argsort(x,axis=1))

[[2 9 9]
 [3 7 9]
 [3 9 1]
 [8 3 5]]
[[0 1 2]
 [0 1 2]
 [2 0 1]
 [1 2 0]]
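
`argsort` returns indices rather than values, so a natural follow-up is to apply those indices. This sketch assumes NumPy 1.15 or newer for `np.take_along_axis`; the arrays are made up for the example.

```python
import numpy as np

scores = np.array([40, 10, 30, 20])
order = np.argsort(scores)          # indices that would sort the array
print(order.tolist())               # [1, 3, 2, 0]
print(scores[order].tolist())       # [10, 20, 30, 40]

# For N-d arrays, take_along_axis applies the per-row ordering.
x = np.array([[2, 9, 9],
              [3, 9, 1]])
idx = np.argsort(x, axis=1)
sorted_rows = np.take_along_axis(x, idx, axis=1)
print(np.array_equal(sorted_rows, np.sort(x, axis=1)))  # True
```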


In [230]:
# copying a numpy array
# y = x binds a second name to the same array (same id, no copy is made);
# x.copy() creates an independent array, so changes to x do not affect z
print(id(x))
y = x
print(id(y))
z = x.copy()
print(id(z))
x[0,0] = 1000
print(x)
print(y)
print(z)

140018352573136
140018352573136
140018117385072
[[1000    9    9]
 [   3    7    9]
 [   3    9    1]
 [   8    3    5]]
[[1000    9    9]
 [   3    7    9]
 [   3    9    1]
 [   8    3    5]]
[[2 9 9]
 [3 7 9]
 [3 9 1]
 [8 3 5]]
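
Besides plain assignment, basic slicing also shares data with the original array: it returns a view, not a copy. A small sketch using `np.shares_memory` to tell the cases apart:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

alias = x            # same object
view = x[0]          # basic indexing/slicing returns a view onto x's data
copy = x.copy()      # independent data

print(alias is x)                      # True
print(np.shares_memory(x, view))       # True
print(np.shares_memory(x, copy))       # False

view[0] = 100        # writes through to x
print(x[0, 0], copy[0, 0])             # 100 0
```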


In [232]:
# arr.tolist()
# np.array(list) => np array
print(type(x))
y = x.tolist()
print(type(y))

<class 'numpy.ndarray'>
<class 'list'>
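
`np.concatenate` and `sum`/`min` are listed above but not demonstrated, so here is a minimal sketch. The arrays are arbitrary; the `axis` argument works the same way as for `np.max` earlier.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)

rows = np.concatenate([a, b], axis=0)   # join along rows    -> (4, 3)
cols = np.concatenate([a, b], axis=1)   # join along columns -> (2, 6)
print(rows.shape, cols.shape)           # (4, 3) (2, 6)

print(rows.sum())                       # 66
print(rows.sum(axis=0).tolist())        # [18, 22, 26]
print(rows.min(), rows.max())           # 0 11
```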


In [5]:
x = input('What is length?')
y = input('What is the breadth?')
x = int(x)
y = int(y)
print(x*y)

What is length?9
What is the breadth?1
9


In [6]:
5+6
'hello'+'people'
[1,2]+[3,4]


[1, 2, 3, 4]

### Broadcasting

* Can you perform ``A+B`` when ``A.shape != B.shape``?
* Yes, but only when their shapes are **compatible**

How do you decide compatibility?
* Case 1 : ndim is equal
* Case 2 : ndim is not equal


* For Case 1, for every axis where shape doesn't match, **one of the dimensions has to be 1**.
* In that case the array whose size is 1 along that axis is conceptually "tiled" or "duplicated" along it to match the larger array (NumPy does this without actually copying the data).
* For example : ``A of shape (4,1,5) + B of shape (4,5,5)``. A is repeated 5 times along axis ``1`` to match the shape of B. Then they can be added.
* Similarly, for the case ``A of shape(1,4,5) + B of shape(3,4,1)``
    * A is copied along the 0-axis 3 times 
    * B is copied along the 2-axis 5 times
    * Thereafter we can add A + B
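
The last example can be checked by doing the duplication explicitly with `np.tile` and comparing against the broadcast result. This is only a sketch to illustrate the rule; in practice the explicit tiling is unnecessary.

```python
import numpy as np

A = np.arange(20).reshape(1, 4, 5)
B = np.arange(12).reshape(3, 4, 1)

# Explicit duplication, as described in the bullets above.
A_tiled = np.tile(A, (3, 1, 1))   # (3, 4, 5): A repeated 3 times along axis 0
B_tiled = np.tile(B, (1, 1, 5))   # (3, 4, 5): B repeated 5 times along axis 2

print((A + B).shape)                              # (3, 4, 5)
print(np.array_equal(A + B, A_tiled + B_tiled))   # True
```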
    


In [9]:
# Example of Case 2 broadcasting: a has shape (4,), which is treated as (1,4) and then matched against (4,4)
import numpy as np
b = np.arange(16).reshape(4,4)
print(b.shape)
a = np.arange(4)
print(a.shape)
print(a+b)
print((a+b).shape)

(4, 4)
(4,)
[[ 0  2  4  6]
 [ 4  6  8 10]
 [ 8 10 12 14]
 [12 14 16 18]]
(4, 4)


* For Case 2, we first make ndim equal for both arrays by prepending 1s to the shape of the array with fewer dimensions.
* For example, for ``A of shape (4,4,5,5) + B of shape (5,5)``, we first treat B as having shape ``(1,1,5,5)``.
* Thereafter we apply Case 1. In this case A and B are broadcastable.
* Another example: for ``A of shape (2,5) + B of shape (2,4,5)``, we first treat A as having shape ``(1,2,5)``. Applying Case 1 then fails, because the sizes along axis 1 differ (2 vs 4) and neither of them is 1.


So which of the following combinations work for broadcasting?

* (3,4,5) vs (4,5) vs (1,4,5)
* (3,4,5) vs (5,) vs (1,5) vs (1,1,5)
* (3,4,5) vs (1,) vs (1,1) vs (1,1,1) : Scalar case!
* (3,4,5) vs (3,1,5)
* (3,4,5) vs (4,4,5) ?
* (3,4,5) vs (3,8,5) ?
* (3,4,5) vs (3,4,7) ?
    
> https://numpy.org/devdocs/user/basics.broadcasting.html
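
The two-step rule (pad the shorter shape with 1s, then compare axis by axis) can be written out as a small function. `broadcast_shape` is a hypothetical helper written for this sketch, not a NumPy function; it is checked against NumPy's own behavior on the two examples from the text.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or None if incompatible.

    Hypothetical helper for illustration; not part of NumPy.
    """
    ndim = max(len(shape_a), len(shape_b))
    # Case 2: left-pad the shorter shape with 1s so both have the same ndim.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    # Case 1: on each axis the sizes must match, or one of them must be 1.
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)
        elif size_a == 1:
            result.append(size_b)
        else:
            return None
    return tuple(result)

print(broadcast_shape((4, 4, 5, 5), (5, 5)))   # (4, 4, 5, 5)
print(broadcast_shape((2, 5), (2, 4, 5)))      # None

# Cross-check against NumPy itself on the compatible pair.
numpy_shape = (np.zeros((4, 4, 5, 5)) + np.zeros((5, 5))).shape
print(numpy_shape)                             # (4, 4, 5, 5)
```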

In [11]:
# Let's try the different cases we mentioned above and see what happens
x=np.random.randint(low=1,high=10,size=(2,3,2),dtype="int32")
y=np.random.randint(low=1,high=10,size=(2,3,2),dtype="int32")
print(x)
print(y)
print(x+y)

[[[7 5]
  [5 8]
  [4 8]]

 [[1 3]
  [4 9]
  [9 8]]]
[[[3 3]
  [2 1]
  [4 1]]

 [[5 2]
  [4 3]
  [5 3]]]
[[[10  8]
  [ 7  9]
  [ 8  9]]

 [[ 6  5]
  [ 8 12]
  [14 11]]]


In [13]:
# Case 2: y has fewer dimensions than x
x=np.random.randint(low=1,high=10,size=(2,3,2),dtype="int32")
y=np.random.randint(low=1,high=10,size=(3,2),dtype="int32") # 3,2 => 1,3,2 => broadcast 2,3,2
print(x)
print(y)
print(x+y)

[[[2 6]
  [7 7]
  [6 7]]

 [[1 5]
  [4 9]
  [7 3]]]
[[6 4]
 [2 1]
 [5 5]]
[[[ 8 10]
  [ 9  8]
  [11 12]]

 [[ 7  9]
  [ 6 10]
  [12  8]]]


In [17]:
# Scalar case: y has shape (1,), so adding it is the same as adding an array full of 5s
x=np.random.randint(low=1,high=10,size=(2,3,2),dtype="int32")
print(x.shape)
print(x.ndim)
y=np.array([5]) #y.shape = (1,) != x.shape => (1,1,1) => (2,3,2)

z = np.ones((2,3,2), dtype=np.int32)*5
#print(x)
#print(y)
print(x+y)
print(x+z)

(2, 3, 2)
3
[[[14  6]
  [14  7]
  [13 13]]

 [[ 7 14]
  [12 12]
  [ 8  9]]]
[[[14  6]
  [14  7]
  [13 13]]

 [[ 7 14]
  [12 12]
  [ 8  9]]]


### Manipulating shape

* expand_dims, squeeze, flatten,  reshape

In [233]:
x= np.random.random((5,5))
print(x)

[[0.13515076 0.62211961 0.4202402  0.63787993 0.92610762]
 [0.34536589 0.42206474 0.90750312 0.04698229 0.70801067]
 [0.27280579 0.96508233 0.17616529 0.10812028 0.69226975]
 [0.29704434 0.11295846 0.12160439 0.7224152  0.96982305]
 [0.08403466 0.57883786 0.21644387 0.99771301 0.6451038 ]]


In [234]:
# flatten()
y = x.flatten()
print(x.flatten())
print(y.shape)

[0.13515076 0.62211961 0.4202402  0.63787993 0.92610762 0.34536589
 0.42206474 0.90750312 0.04698229 0.70801067 0.27280579 0.96508233
 0.17616529 0.10812028 0.69226975 0.29704434 0.11295846 0.12160439
 0.7224152  0.96982305 0.08403466 0.57883786 0.21644387 0.99771301
 0.6451038 ]
(25,)
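
Note that `flatten()` returns a copy, so modifying the result leaves the original array untouched. A small sketch; `reshape(-1)` is shown as another way to get the same 1-d layout, where `-1` asks NumPy to infer the length.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

flat = x.flatten()      # always a copy
flat[0] = 99
print(x[0, 0])          # 0, x is unaffected

same_data = x.reshape(-1)   # -1 lets NumPy infer the length
print(np.array_equal(same_data, x.flatten()))   # True
```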


In [236]:
x = np.array(range(0,10)).reshape(2,5)
print(x.shape)
y = np.expand_dims(x, axis = 0)
print(y.shape)

(2, 5)
(1, 2, 5)


In [237]:
# expand_dims(a,axis)
# add a singleton dimension
x2=np.expand_dims(x,axis=0)
print(x2.shape)
x2=np.expand_dims(x,axis=1)
print(x2.shape)
x2=np.expand_dims(x,axis=2)
print(x2.shape)
x2=np.expand_dims(x,axis=-1)
print(x2.shape)

(1, 2, 5)
(2, 1, 5)
(2, 5, 1)
(2, 5, 1)


In [72]:
# squeeze(a,axis=1 or (1,2) or None)
# remove (some/all) singleton dimension
print(x2)
print(x2.squeeze())

[[[0]
  [1]
  [2]
  [3]
  [4]]

 [[5]
  [6]
  [7]
  [8]
  [9]]]
[[0 1 2 3 4]
 [5 6 7 8 9]]
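
The `axis` argument of `squeeze` controls which singleton dimensions are removed. The sketch below uses an arbitrary array of zeros and also shows that squeezing an axis whose size is not 1 raises a `ValueError`.

```python
import numpy as np

x = np.zeros((1, 2, 1, 5))

print(x.squeeze().shape)             # (2, 5): all singleton axes removed
print(x.squeeze(axis=0).shape)       # (2, 1, 5)
print(x.squeeze(axis=(0, 2)).shape)  # (2, 5)

try:
    x.squeeze(axis=1)                # axis 1 has size 2, so it cannot be squeezed
    squeeze_failed = False
except ValueError as err:
    squeeze_failed = True
    print("ValueError:", err)

# expand_dims adds back a singleton axis that squeeze removed
print(np.expand_dims(x.squeeze(), axis=0).shape)   # (1, 2, 5)
```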


In [240]:
a = np.arange(1000).reshape(10,20,5)
print(a.shape)

b = np.moveaxis(a, [0 ,1, 2], [1,2, 0])
print(b.shape)

(10, 20, 5)
(5, 10, 20)


In [24]:
# moveaxis(a,    [0,1,2],      [1,2,0])
# 0 ->1 , 1 -> 2 , 2 ->0  (a cycle)


print(x2.shape)
x3=np.moveaxis(x2,[0,1,2],[1,2,0])
print(x3.shape)

print(x,x.shape)
x3=np.moveaxis(x,[0,1],[1,0])
print(x3,x3.shape)

(5, 5, 1)
(1, 5, 5)
[[0.65628936 0.53580925 0.61111055 0.55807366 0.17518485]
 [0.89396368 0.51182689 0.48569536 0.08317149 0.80857819]
 [0.23172367 0.51990505 0.64523944 0.56416134 0.3608993 ]
 [0.60397349 0.30386437 0.22384056 0.23004792 0.7280163 ]
 [0.29966579 0.76987699 0.62173972 0.70876911 0.66961209]] (5, 5)
[[0.65628936 0.89396368 0.23172367 0.60397349 0.29966579]
 [0.53580925 0.51182689 0.51990505 0.30386437 0.76987699]
 [0.61111055 0.48569536 0.64523944 0.22384056 0.62173972]
 [0.55807366 0.08317149 0.56416134 0.23004792 0.70876911]
 [0.17518485 0.80857819 0.3608993  0.7280163  0.66961209]] (5, 5)


In [38]:
# reshape
# note : this method conceptually first flattens the array
# and then fills the values into the new shape in row-major (row-first) order.
# While reshape can do the work of flatten, squeeze and expand_dims,
# reshape CANNOT be used in place of moveaxis: it produces the right shape
# without raising an error, but the elements end up in different positions.
# Don't fall into this trap!

a=np.arange(1,25,1)
print(a)
print(a.reshape(4,6))
print(a.reshape(6,4))
print(a.reshape(2,3,4))

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]
 [21 22 23 24]]
[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[13 14 15 16]
  [17 18 19 20]
  [21 22 23 24]]]
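
The trap mentioned in the comments is easiest to see on a tiny array: `reshape` and `moveaxis` can produce the same shape with different contents. A minimal sketch:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

reshaped = a.reshape(3, 2)            # refills the values in row-major order
moved = np.moveaxis(a, 0, 1)          # swaps the axes (a transpose for 2-d arrays)

print(reshaped.tolist())   # [[0, 1], [2, 3], [4, 5]]
print(moved.tolist())      # [[0, 3], [1, 4], [2, 5]]
print(np.array_equal(reshaped, moved))   # False, despite equal shapes
```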


### Further Reading

* np.vstack,np.hstack,np.tile
* integer indexing and boolean indexing
* cumsum, cumprod
* serializing arrays : np.save