### Advanced NumPy

What we will touch:  
1. Difference between a Python list and an array; shape, ndim, len  
1. Ufuncs, vectorization  
1. Broadcasting, dimensions, rank-1 arrays, definitions of vectors and matrices  
1. Vector product, outer product, Hadamard product, notation  
1. Reduce, Accumulate  
1. Reshape, Squeeze  
1. Specifying axis; axis=0, axis=1, axis=None; summing  
1. Making comparisons  
1. Random seed, random state  
1. Debugging, timing your code  
1. How to read code and therefore write good code  
1. Optimizing code; Perceptron as an example  
1. Numba  
1. Using documentation and general advice
    
What we will not touch;  
1. Maths  
1. Machine Learning
1. Python programming

Try things yourself. If you can try something out and see the result, do that before asking.
    

In [4]:
import numpy as np

In [13]:
a = np.arange(15).reshape(5,3)

In [14]:
a

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [19]:
np.sum(a, axis=1).shape

(5,)
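The `axis` argument decides which dimension a reduction collapses. The following is a minimal sketch on the same 5x3 array; the names `col_sums`, `row_sums` and `total` are illustrative and not part of the original notebook.

```python
import numpy as np

a = np.arange(15).reshape(5, 3)

col_sums = np.sum(a, axis=0)   # collapse the rows: one sum per column, shape (3,)
row_sums = np.sum(a, axis=1)   # collapse the columns: one sum per row, shape (5,)
total = np.sum(a, axis=None)   # collapse every axis: a single scalar

print(col_sums)   # [30 35 40]
print(row_sums)   # [ 3 12 21 30 39]
print(total)      # 105

# keepdims=True keeps the reduced axis with length 1 instead of dropping it
print(np.sum(a, axis=1, keepdims=True).shape)   # (5, 1)
```

A handy way to remember it: the axis you name is the one that disappears from the shape.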

In [None]:
# Creating arrays
np.array([1,2,3]) # One dimensional array
np.array([[1,2,3],[4,5,6]]) # Two dimensional array
np.zeros(3) # 1D array of length 3 all values 0
np.ones((3,4)) # 3x4 array with all values 1
np.eye(5) # 5x5 array of 0 with 1 on diagonal (Identity matrix)
np.linspace(0,100,6) # Array of 6 evenly divided values from 0 to 100
np.arange(0,10,3) # Array of values from 0 to less than 10 with step 3 (eg [0,3,6,9])
np.full((2,3),8) # 2x3 array with all values 8
np.random.rand(4,5) # 4x5 array of random floats between 0–1
np.random.randn(4,5) # 4x5 array of random floats drawn from the standard normal distribution (mean 0, variance 1)
np.random.randint(5,size=(2,3)) # 2x3 array with random ints between 0–4

In [12]:
temp = np.random.randint(9, size=(10,6))
arr = np.full((10,8),[3, 1,0,0,0,0,0,0])

In [14]:
arr[:, 2:] = temp

In [17]:
temp[0, :] = 1 

In [18]:
temp

array([[1, 1, 1, 1, 1, 1],
       [0, 1, 0, 4, 7, 3],
       [2, 7, 2, 0, 0, 4],
       [5, 5, 6, 8, 4, 1],
       [4, 8, 1, 1, 7, 3],
       [6, 7, 2, 0, 3, 5],
       [4, 4, 6, 4, 4, 3],
       [4, 4, 8, 4, 3, 7],
       [5, 5, 0, 1, 5, 3],
       [0, 5, 0, 1, 2, 4]])

In [19]:
arr

array([[3, 1, 8, 1, 3, 3, 3, 7],
       [3, 1, 0, 1, 0, 4, 7, 3],
       [3, 1, 2, 7, 2, 0, 0, 4],
       [3, 1, 5, 5, 6, 8, 4, 1],
       [3, 1, 4, 8, 1, 1, 7, 3],
       [3, 1, 6, 7, 2, 0, 3, 5],
       [3, 1, 4, 4, 6, 4, 4, 3],
       [3, 1, 4, 4, 8, 4, 3, 7],
       [3, 1, 5, 5, 0, 1, 5, 3],
       [3, 1, 0, 5, 0, 1, 2, 4]])

In [16]:
print(temp)

[[8 1 3 3 3 7]
 [0 1 0 4 7 3]
 [2 7 2 0 0 4]
 [5 5 6 8 4 1]
 [4 8 1 1 7 3]
 [6 7 2 0 3 5]
 [4 4 6 4 4 3]
 [4 4 8 4 3 7]
 [5 5 0 1 5 3]
 [0 5 0 1 2 4]]


In [15]:
arr

array([[3, 1, 8, 1, 3, 3, 3, 7],
       [3, 1, 0, 1, 0, 4, 7, 3],
       [3, 1, 2, 7, 2, 0, 0, 4],
       [3, 1, 5, 5, 6, 8, 4, 1],
       [3, 1, 4, 8, 1, 1, 7, 3],
       [3, 1, 6, 7, 2, 0, 3, 5],
       [3, 1, 4, 4, 6, 4, 4, 3],
       [3, 1, 4, 4, 8, 4, 3, 7],
       [3, 1, 5, 5, 0, 1, 5, 3],
       [3, 1, 0, 5, 0, 1, 2, 4]])

In [20]:
a = np.array([1,2,3])
print(a)

[1 2 3]


In [21]:
a = np.array([1,2,3])
a

array([1, 2, 3])

In [25]:
def printer(a):
    return a

In [28]:
print(printer(a))

[1 2 3]


In [30]:
x3 = np.full((2,3), [1, 3, 0])
x3

array([[1, 3, 0],
       [1, 3, 0]])

In [31]:
# Learning about your array
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
len(x3)


x3 ndim:  2
x3 shape: (2, 3)
x3 size:  6


2

In [5]:
b = np.random.rand(5,3)
b.shape

(5, 3)

In [6]:
import numpy as np
a = np.full(b.shape, [1, 3, 0])

In [7]:
a

array([[1, 3, 0],
       [1, 3, 0],
       [1, 3, 0],
       [1, 3, 0],
       [1, 3, 0]])

In [38]:
x3.flatten()

array([1, 3, 0, 1, 3, 0])

In [None]:
#Tensors - Higher dimensions

In [40]:
import numpy as np
a = np.array([2])
a = a.astype('float32')

In [41]:
a.dtype

dtype('float32')

In [None]:
# Datatype conversion
a//2
a.astype(int)

# Indexing and Slicing
One important and extremely useful thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices are copies. An explicit copy made with `copy()`, by contrast, is independent of the original array, as the following cells show:

In [50]:
x = np.array([1,3,5,6])

In [51]:
y = x.copy()
y

array([1, 3, 5, 6])

In [52]:
y[2] = 10

In [53]:
y

array([ 1,  3, 10,  6])

In [54]:
x

array([1, 3, 5, 6])
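The cells above show that a copy leaves the original untouched. A basic slice behaves the other way around, because it is a view onto the same data. The sketch below illustrates this; the names `view`, `independent`, `lst` and `lst_slice` are made up for the example.

```python
import numpy as np

x = np.array([1, 3, 5, 6])

view = x[1:3]     # basic slice: a view onto x's data
view[0] = 99      # writing through the view changes x
print(x)          # [ 1 99  5  6]

independent = x[1:3].copy()   # explicit copy: separate data
independent[0] = -1           # x is not affected
print(x)          # [ 1 99  5  6]

print(np.shares_memory(x, view))          # True
print(np.shares_memory(x, independent))   # False

# Python list slices are copies, so the original list is unchanged
lst = [1, 3, 5, 6]
lst_slice = lst[1:3]
lst_slice[0] = 99
print(lst)        # [1, 3, 5, 6]
```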

In [None]:
x.copy()  # explicit copy: owns its own data
x[:]      # full slice of the 1D array: a view that shares data with x

In [63]:
# Reshaping
grid = np.arange(1, 10).reshape(3,-1)
print(grid) # why not just grid instead of print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [62]:
grid.shape

(3, 3)

In [66]:
# Other methods
x = np.array([1, 2, 3])

# row vector via newaxis
print(x[np.newaxis, :].shape)
# column vector via newaxis
print(x[:, np.newaxis].shape)

(1, 3)
(3, 1)
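`np.newaxis` adds a length-1 axis; `squeeze` is the inverse and removes length-1 axes. A small sketch, with `col` and `row` as illustrative names:

```python
import numpy as np

x = np.array([1, 2, 3])
col = x[:, np.newaxis]   # shape (3, 1)
row = x[np.newaxis, :]   # shape (1, 3)

print(np.squeeze(col).shape)      # (3,)  every length-1 axis is dropped
print(row.squeeze(axis=0).shape)  # (3,)  only axis 0 is dropped
print(col.reshape(-1).shape)      # (3,)  reshape(-1) also flattens to 1D
```

Passing `axis` to `squeeze` is a useful safety check: it raises an error if that axis does not have length 1.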


In [69]:
arr = np.ones((3,2))
arr

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [72]:
arr.resize((5,6), refcheck=False) # Changes arr shape to 5x6 and fills new values with 0

In [73]:
arr

array([[1., 1., 1., 1., 1., 1.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [None]:
# column vector via newaxis
x[:, np.newaxis]

In [1]:
a = [1, 2, 3, 4]

In [2]:
a * 5 

[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]

In [5]:
b = np.array([1,2,3,4])
b*5

array([ 5, 10, 15, 20])
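Multiplying an array by a scalar is the simplest case of broadcasting: NumPy stretches the smaller operand to match the larger one. The sketch below extends this to two arrays of different shapes; `col` and `grid` are illustrative names.

```python
import numpy as np

b = np.array([1, 2, 3, 4])            # shape (4,)
col = np.array([[10], [20], [30]])    # shape (3, 1)

print((b * 5).shape)   # (4,)  the scalar is stretched to every element

grid = col + b         # (3, 1) + (4,) broadcasts to (3, 4)
print(grid)
# [[11 12 13 14]
#  [21 22 23 24]
#  [31 32 33 34]]

# Shapes are compared from the last axis backwards; each pair of sizes
# must be equal or one of them must be 1. Here 2 vs 3 fails.
try:
    np.ones((3, 2)) + np.ones(3)
except ValueError as err:
    print("cannot broadcast:", err)
```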

In [1]:
### Ufuncs

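With ufuncs, we can eliminate many of the explicit Python loops we would otherwise write.

Why NumPy?
It provides an easy and flexible interface to optimized computation with arrays of data. NumPy code can be fast or slow, depending on whether the work is vectorized or done element by element in a Python loop.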

In [None]:
import numpy as np
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
        
values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

In [None]:
big_array = np.random.randint(1, 100, size=1000000)

In [None]:
%timeit compute_reciprocals(big_array)  # % marks an IPython magic

In [None]:
%timeit (1.0 / big_array)

How does vectorization work?
This vectorized approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.
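Outside IPython, the same comparison can be sketched with the standard-library `timeit` module. This is a rough illustration, not a benchmark: the array size and repeat count are arbitrary choices, and the measured times depend on the machine.

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    # Explicit Python loop: one interpreter round trip per element
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.default_rng(0)
values = rng.integers(1, 100, size=10_000)

loop_result = compute_reciprocals(values)
vector_result = 1.0 / values          # the loop runs in compiled code

print(np.allclose(loop_result, vector_result))   # True: same answer

loop_time = timeit.timeit(lambda: compute_reciprocals(values), number=5)
vector_time = timeit.timeit(lambda: 1.0 / values, number=5)
print(f"loop: {loop_time:.4f}s  vectorized: {vector_time:.4f}s")
```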

In [None]:
x = np.arange(9).reshape((3, 3))
2 ** x

In [None]:
# Ufunc wrappers
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division

In [None]:
# e.g. ufuncs: equivalent ways to square an array
x ** 2
x * x
# x ^ 2 is bitwise XOR in Python, not a power
np.square(x)
np.power(x, 2)

In [None]:
# Specifying outputs
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)

In [None]:
y = np.zeros(10)
np.power(2, x, out=y[::2])
print(y)

In [None]:
# Aggregation

In [7]:
# Reduce
x = np.arange(1, 6)
np.add.reduce(x)

15

In [9]:
x

array([1, 2, 3, 4, 5])

In [11]:
# Accumulate
np.multiply.accumulate(x)[-1]

120
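`reduce` collapses an array to a single result, while `accumulate` keeps every intermediate step. A short sketch of both, including reduction along an axis of a 2D array (the array `m` is an illustrative addition):

```python
import numpy as np

x = np.arange(1, 6)

print(np.add.reduce(x))            # 15, same as np.sum(x)
print(np.add.accumulate(x))        # [ 1  3  6 10 15], same as np.cumsum(x)
print(np.multiply.accumulate(x))   # [  1   2   6  24 120], same as np.cumprod(x)

m = np.arange(1, 7).reshape(2, 3)  # [[1 2 3], [4 5 6]]
print(np.add.reduce(m, axis=0))    # [5 7 9]   reduce down the rows
print(np.add.reduce(m, axis=1))    # [ 6 15]   reduce across the columns
```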

In [None]:
# Inner Product/ dot product
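A possible illustration of the inner (dot) product, using small vectors `u` and `v` and a matrix `A` chosen only for this example:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Three equivalent ways to compute sum(u[i] * v[i])
print(np.dot(u, v))     # 32
print(u @ v)            # 32
print(np.sum(u * v))    # 32

# With a 2D array, @ is the matrix-vector product
A = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]
print(A @ v)            # [17 62]
```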

In [None]:
# Outer Product
x = np.arange(1, 6)
np.multiply.outer(x, x)

In [None]:
# Hadamard Product: Element wise
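A minimal sketch of the Hadamard (element-wise) product next to the matrix product, with two small matrices `A` and `B` made up for the example:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

# Hadamard product: multiply matching elements, shapes must broadcast
print(A * B)
# [[ 10  40]
#  [ 90 160]]
print(np.array_equal(A * B, np.multiply(A, B)))   # True

# Matrix product, for contrast: rows of A times columns of B
print(A @ B)
# [[ 70 100]
#  [150 220]]
```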


In [8]:
# Randomness
np.random.seed(0)
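Seeding makes random draws repeatable. The sketch below shows the global seed used above, plus two alternatives that keep the random state in an object instead of a global: `np.random.default_rng` and the legacy `np.random.RandomState`. The seed values and variable names are arbitrary choices for illustration.

```python
import numpy as np

# Global seed: resetting it replays the same sequence
np.random.seed(0)
first = np.random.rand(3)
np.random.seed(0)
second = np.random.rand(3)
print(np.array_equal(first, second))   # True

# Generator objects: each carries its own state
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draw_a = rng_a.integers(0, 10, size=5)
draw_b = rng_b.integers(0, 10, size=5)
print(np.array_equal(draw_a, draw_b))  # True

# Legacy RandomState: same idea, older interface
legacy = np.random.RandomState(0)
legacy_draw = legacy.rand(3)
print(legacy_draw.shape)               # (3,)
```

Passing a generator object into functions keeps experiments reproducible without depending on hidden global state.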

In [None]:
# Documentation
np.info(np.eye) # View documentation for np.eye
# Using Shift tab
# Using ?

Numba

Basically, Numba has a chance to compile the program as a whole, whereas NumPy can only call small atomic blocks that have themselves been pre-compiled. In NumPy, each intermediate result of an expression is a temporary array that has to be allocated, operated on, and then deallocated.

In [20]:
import numpy as np
from numba import jit
nobs = 10000 

def proc_numpy(x,y,z):

    x = x*2 - ( y * 55 )      # these 4 lines represent use cases
    y = x + y*2               # where the processing time is mostly
    z = x + y + 99            # a function of, say, 50 to 200 lines
    z = z * ( z - .88 )       # of fairly simple numerical operations

    return z

@jit
def proc_numba(xx,yy,zz):
    for j in range(nobs):     # as pointed out by Llopis, this for loop 
        x, y = xx[j], yy[j]    # is not needed here.  it is here by 
                             # accident because in the original benchmarks 
        x = x*2 - ( y * 55 )   # I was doing data creation inside the function 
        y = x + y*2            # instead of passing it in as an array
        z = x + y + 99         # in any case, this redundant code seems to 
        z = z * ( z - .88 )    # have something to do with the code running
                             # faster.  without the redundant code, the 
        zz[j] = z              # numba and numpy functions are exactly the same.
    return zz

x = np.random.randn(nobs)
y = np.random.randn(nobs)
z = np.zeros(nobs)
res_numpy = proc_numpy(x,y,z)

z = np.zeros(nobs)
res_numba = proc_numba(x,y,z)

In [82]:
np.all( res_numpy == res_numba )

True

In [21]:
%timeit proc_numpy(x,y,z)

The slowest run took 16.53 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 37.1 µs per loop


In [22]:
%timeit proc_numba(x,y,z)

The slowest run took 15.02 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 4.07 µs per loop

