# The numpy.random package

## Overall purpose of the numpy.random package 

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

NumPy (Numerical Python) is an open source Python library that’s used in almost every field of science and engineering. It’s the universal standard for working with numerical data in Python, and it’s at the core of the scientific Python and PyData ecosystems. NumPy users include everyone from beginning coders to experienced researchers doing state-of-the-art scientific and industrial research and development. The NumPy API is used extensively in Pandas, SciPy, Matplotlib, scikit-learn, scikit-image and most other data science and scientific Python packages.

The NumPy library contains multidimensional array and matrix data structures. It provides ndarray, a homogeneous n-dimensional array object, with methods to efficiently operate on it. NumPy can be used to perform a wide variety of mathematical operations on arrays. It adds powerful data structures to Python that guarantee efficient calculations with arrays and matrices and it supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices.[1]

`numpy.random` is a module inside the NumPy library. A random number is not simply a number that is different every time; random means something that cannot be predicted logically. Computers run programs, and programs are definite sets of instructions, so there must be some algorithm behind every random number a computer generates.

If a program generates a random number, that number can in principle be predicted, so it is not truly random.

Random numbers produced by a generation algorithm are called pseudo-random.
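The predictability described above can be seen directly by seeding two generators with the same value. This is a minimal sketch; the seed `12345` and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed (12345 is an arbitrary choice).
rng_a = np.random.default_rng(12345)
rng_b = np.random.default_rng(12345)

first = rng_a.random(5)
second = rng_b.random(5)

# Same seed and same algorithm: the two "random" sequences are identical.
same_sequence = np.array_equal(first, second)
print(same_sequence)  # True

# A generator created without a seed takes fresh entropy from the operating
# system, so its output will almost certainly differ from the seeded one.
rng_c = np.random.default_rng()
print(np.array_equal(first, rng_c.random(5)))
```

Fixing the seed is useful when a result has to be reproduced exactly, for example in a test or a published analysis.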



## Simple random data using the NumPy package

Numpy’s random number routines produce pseudo random numbers using combinations of a BitGenerator to create sequences and a Generator to use those sequences to sample from different statistical distributions.

Generators are defined in the NumPy random package API as objects that transform sequences of random bits from a BitGenerator into sequences of numbers that follow a specific probability distribution (such as uniform, normal or binomial) within a specified interval.(6)
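As a rough illustration of a Generator turning random bits into samples from different distributions, the sketch below draws from the three distributions named above. The seed, sample size and distribution parameters are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability

# Uniform on the half-open interval [-1, 1)
uniform_draws = rng.uniform(low=-1.0, high=1.0, size=1000)
# Normal with mean 0 and standard deviation 1
normal_draws = rng.normal(loc=0.0, scale=1.0, size=1000)
# Number of successes in 10 trials, each with success probability 0.5
binomial_draws = rng.binomial(n=10, p=0.5, size=1000)

print(uniform_draws.min() >= -1.0, uniform_draws.max() < 1.0)
print(normal_draws.mean(), normal_draws.std())
print(binomial_draws.min(), binomial_draws.max())
```

The same `rng` object serves all three calls; only the method and its parameters decide which distribution the bits are mapped to.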

Since version 1.17 the numpy.random package has changed its random number generator (RNG) and the way seeds are handled. Previously the pseudo-random number generator used by the package was the Mersenne Twister (MT19937) algorithm. The new Generator instead uses bits provided by the PCG64 (Permuted Congruential Generator) algorithm.


By default, Generator uses bits provided by PCG64 which has better statistical properties than the legacy Mersenne Twister MT19937 used in RandomState.

In [106]:
from numpy.random import default_rng
rng = default_rng()
vals = rng.standard_normal(10)
more_vals = rng.standard_normal(10)

In [109]:
more_vals

array([ 0.42249179, -0.70980551, -1.57654797,  1.24277087,  0.59001889,
       -0.80223082, -0.01011866,  1.11124682,  0.05691044, -0.29876138])

In [110]:
vals

array([ 0.4683789 ,  1.57004751,  0.70475688,  2.43134302, -1.01758774,
       -1.18223012,  0.2153418 , -2.31694415,  0.07960234, -0.76906249])

### Simple random data functions in NumPy

#### Random Generator (NumPy v1.21)

The Generator provides access to a wide range of distributions and serves as a replacement for RandomState. The main difference between the two is that Generator relies on an additional BitGenerator to manage state and generate the random bits, which are then transformed into random values from useful distributions. The default BitGenerator used by Generator is PCG64. The BitGenerator can be changed by passing an instantiated BitGenerator to Generator.
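Swapping the BitGenerator might look like the sketch below, which builds one Generator on PCG64 and one on the legacy Mersenne Twister. The seed value and variable names are illustrative assumptions.

```python
from numpy.random import Generator, PCG64, MT19937

seed = 2021  # arbitrary seed for the example

rng_pcg = Generator(PCG64(seed))         # the bit generator default_rng() uses
rng_twister = Generator(MT19937(seed))   # Mersenne Twister behind the Generator API

print(type(rng_pcg.bit_generator).__name__)      # PCG64
print(type(rng_twister.bit_generator).__name__)  # MT19937

# Both objects expose the same distribution methods; only the source of bits differs.
print(rng_pcg.standard_normal(3))
print(rng_twister.standard_normal(3))
```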


numpy.random.default_rng()




In [2]:
# Efficient numerical arrays.
import numpy as np
from numpy import random


# Plotting.
import matplotlib.pyplot as plt


In [3]:
np.random.random(size=None)

0.04349653497503858

### Practice from the new Numpy Documentation[2]

https://numpy.org/doc/1.21/

In [4]:
rng = np.random.default_rng()
# This initiates a 'Generator' object

In [5]:
type(rng)

numpy.random._generator.Generator

In [6]:
rng

Generator(PCG64) at 0x26BA679E4A0

[1](https://numpy.org/doc/stable/user/absolute_beginners.html)

In [7]:
#
rng.choice(2, size=20)

array([0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0],
      dtype=int64)

In [8]:
rng.random()

0.021733967557711398

In [9]:
type(rng.random())

float

In [10]:
rng.random((5,))

array([0.86361572, 0.62168023, 0.0944921 , 0.66989053, 0.77001018])

In [11]:
5 * rng.random((3, 2)) - 5
#Three-by-two array of random numbers from [-5, 0):

array([[-2.32970758, -1.66857258],
       [-4.21517167, -0.44047435],
       [-2.3908498 , -4.30540313]])

In [12]:
rng.choice(5, 3)
##This is equivalent to rng.integers(0,5,3)
#Generate a uniform random sample from np.arange(5) of size 3:

array([1, 4, 1], dtype=int64)

In [13]:
#Generate a non-uniform random sample from np.arange(5) of size 3:
rng.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])


array([3, 3, 2], dtype=int64)

In [14]:
#Generate a uniform random sample from np.arange(5) of size 3 without replacement:
rng.choice(5, 3, replace=False)
#This is equivalent to rng.permutation(np.arange(5))[:3]

array([2, 3, 4], dtype=int64)

In [15]:
#Generate a uniform random sample from a 2-D array along the first axis (the default), without replacement:
rng.choice([[0, 1, 2], [3, 4, 5], [6, 7, 8]], 2, replace=False)

array([[3, 4, 5],
       [0, 1, 2]])

In [16]:
names= ['pooh', 'rabbit', 'piglet', 'Peppa']
rng.choice(names, 5, p=[0.5, 0.1, 0.1, 0.3])


array(['piglet', 'pooh', 'pooh', 'pooh', 'pooh'], dtype='<U6')

In [17]:
np.random.rand(3,2)


array([[0.24612165, 0.53832686],
       [0.6160684 , 0.10335535],
       [0.93354154, 0.07019093]])

In [18]:
np.random.randn()

0.9207047786788716

In [19]:
#Two-by-four array of samples from N(3, 6.25):
3 + 2.5 * np.random.randn(2, 4)

array([[1.41622618, 0.8068468 , 1.33625423, 3.82627048],
       [4.78614984, 5.07400558, 2.58709084, 2.89463332]])

In [20]:
np.random.randint(5)
#https://numpy.org/doc/1.21/reference/random/generated/numpy.random.RandomState.random_integers.html

1

In [21]:
np.random.randint(5, size=(3,2))

array([[1, 0],
       [2, 1],
       [3, 3]])

## Using NumPy random to simulate coin flips

Simulate four coin flips using the random function in NumPy [3]


In [22]:
np.random.random()
# Draw a number between 0 and 1.
# A Bernoulli trial: in the theory of probability and statistics, a Bernoulli trial (or binomial trial) is a random experiment with
# exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is conducted.

0.6827029276991625

In [23]:
np.random.seed(42)
random_numbers = np.random.random(size=4)
random_numbers


array([0.37454012, 0.95071431, 0.73199394, 0.59865848])

In [24]:
heads = random_numbers < 0.5
heads


array([ True, False, False, False])

In [25]:
np.sum(heads)

1

#### Using a for loop to repeat the four flips over and over again

We initialize the counter of all-heads trials to zero and then repeat the four-flip trial 10,000 times.

In [26]:
n_all_heads = 0
for _ in range(10000):
    heads = np.random.random(size=4) < 0.5
    n_heads = np.sum(heads)
    if n_heads == 4:
        n_all_heads += 1

n_all_heads / 10000

# Estimated probability of getting four heads in a trial of four flips

0.0619
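The same experiment can also be written without an explicit loop. The sketch below is one possible vectorized version using the newer Generator API; the seed and variable names are assumptions for illustration. It compares the estimate with the exact probability 0.5 ** 4 = 0.0625.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

n_trials = 10_000
# One row per trial, four columns per row; True means heads.
flips = rng.random((n_trials, 4)) < 0.5
# True for each trial in which all four flips came up heads.
all_heads = flips.all(axis=1)
# The mean of a boolean array is the fraction of True values.
estimate = all_heads.mean()

exact = 0.5 ** 4
print(estimate, exact)
```

With 10,000 trials the estimate should land close to 0.0625, which is consistent with the 0.0619 obtained from the loop.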

In [27]:
np.random.ranf(5)

array([0.20365333, 0.24226181, 0.25546003, 0.45571635, 0.50957319])

In [28]:
np.random.randint(10, size = 20)

array([1, 6, 5, 5, 3, 9, 8, 5, 7, 7, 3, 4, 1, 3, 5, 1, 1, 9, 6, 9])

In [29]:
np.random.randint(5, size =  2)

array([4, 4])

#### Permutation Functions in NumPy:

A permutation refers to an arrangement of elements. e.g. [3, 2, 1] is a permutation of [1, 2, 3] and vice-versa.

The NumPy Random module provides two methods for this: shuffle() and permutation().[4]
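The practical difference between the two methods is that `permutation()` returns a new array while `shuffle()` reorders the array it is given. A minimal sketch using the Generator versions of both methods, with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed

original = np.arange(10)

# permutation() returns a shuffled copy and leaves its input untouched.
shuffled_copy = rng.permutation(original)
print(original)

# shuffle() reorders the array in place and returns None.
in_place = original.copy()
result = rng.shuffle(in_place)
print(result)    # None
print(in_place)
```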


In [30]:
np.random.permutation(10)

array([7, 4, 5, 0, 1, 3, 8, 6, 9, 2])

In [31]:
np.random.permutation([1, 4, 9, 12, 15])

array([ 1, 12, 15,  4,  9])

In [32]:
arr = np.arange(9).reshape((3, 3))
np.random.permutation(arr)

array([[3, 4, 5],
       [6, 7, 8],
       [0, 1, 2]])

## Background Information on NumPy (5)

https://www.youtube.com/watch?v=ZB7BZMhfPgk Introduction to Numerical Computing with NumPy | SciPy 2019 Tutorial 1.13.00


In [33]:
a = np.array([1,2,3,4,5])

In [34]:
# What kind of object is this? It is a NumPy ndarray (n-dimensional array); this one is one-dimensional.
type(a)

numpy.ndarray

In [35]:
# What data type is this? int stands for integers (1, 2, 3, 4, ...) and 32 means 32-bit integers.
a.dtype

dtype('int32')

In [36]:
a = a.astype('int64')
# Change the array dtype to 64-bit so that it matches the 64-bit f array.

In [37]:
f = np.array([1.2,3.2,2.2,3.6,6.3])

In [38]:
# The data type here is floating point because the values have fractional parts; 64 means each value is stored in 64 bits.
f.dtype

dtype('float64')

In [39]:
a
# this array is a sequence 

array([1, 2, 3, 4, 5], dtype=int64)

In [40]:
a[0]
# we can access a number at any given position in the sequence 

1

In [41]:
a[0]=10
# we can change any number in the sequence

In [42]:
a

array([10,  2,  3,  4,  5], dtype=int64)

In [43]:
a[0]=11.5
# Try to assign 11.5. All elements of an array share one dtype, so the value is truncated and stored as 11 instead of the floating-point 11.5.

In [44]:
a

array([11,  2,  3,  4,  5], dtype=int64)
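The truncation above follows from the array's fixed dtype. A small sketch of the behaviour, and of converting to a floating-point dtype when the fractional part matters (variable names are illustrative):

```python
import numpy as np

a = np.array([10, 2, 3, 4, 5])   # integer array
a[0] = 11.5                      # the fractional part is dropped on assignment
print(a[0])                      # 11

# Converting to a floating-point dtype first keeps the fractional part.
b = a.astype(np.float64)
b[0] = 11.5
print(b[0])                      # 11.5
```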

In [45]:
a.ndim
# Number of dimensions: this is a one-dimensional array.

1

In [46]:
a.shape
# The shape is a very important attribute because it drives the behaviour of the array. The shape is always a tuple that shows the number of elements along each dimension. As can be seen
# below, the shape of a is a tuple with a single entry, 5: one dimension holding 5 elements.

(5,)

In [47]:
a.size
# size is always an integer and shows only the total number of elements; it does not show the shape.

5

In [48]:
c = np.array([2, 4, 7, 8, 9])

In [49]:
c.dtype

dtype('int32')

In [50]:
a+f

#---------------------------------------------------------------------------
#ValueError                                Traceback (most recent call last)
#<ipython-input-50-d86107eb4b16> in <module>
#----> 1 a+f

#ValueError: operands could not be broadcast together with shapes (5,) (4,) 
# In an earlier run the two arrays would not add because they had different shapes: one had 5 elements and the other had 4.
# The shapes must be compatible (here, the same number of elements) to add the arrays together.


array([12.2,  5.2,  5.2,  7.6, 11.3])

In [51]:
a+f
# after fixing the element number on the f array we can add the arrays together 

array([12.2,  5.2,  5.2,  7.6, 11.3])
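The shape error described in the comments above can be reproduced as in the sketch below. The four-element array `f_short` is a hypothetical stand-in for the earlier version of `f`, which the notebook no longer shows.

```python
import numpy as np

a = np.array([11, 2, 3, 4, 5])
f_short = np.array([1.2, 3.2, 2.2, 3.6])   # hypothetical: only four elements

# Shapes (5,) and (4,) are not compatible, so the addition raises ValueError.
try:
    a + f_short
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With five elements in both arrays the addition works element by element.
f = np.array([1.2, 3.2, 2.2, 3.6, 6.3])
total = a + f
print(total.dtype)   # float64: the integer array is promoted to floating point

# A scalar is also compatible: it is applied to every element.
print(a + 10)
```

Equal length is the simplest compatible case; scalars and dimensions of length 1 are also stretched to fit under NumPy's broadcasting rules.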

In [52]:
a/f

array([9.16666667, 0.625     , 1.36363636, 1.11111111, 0.79365079])

In [53]:
a**f

array([1.77693369e+01, 9.18958684e+00, 1.12115785e+01, 1.47033389e+02,
       2.53227593e+04])

In [54]:
f ** a
#vectorized operation

array([7.43008371e+00, 1.02400000e+01, 1.06480000e+01, 1.67961600e+02,
       9.92436543e+03])

In [55]:
a*10

array([110,  20,  30,  40,  50], dtype=int64)

In [56]:
# universal functions(ufunc)
np.sin(a)

array([-0.99999021,  0.90929743,  0.14112001, -0.7568025 , -0.95892427])

In [57]:
# Let's move on to 2-dimensional arrays
c = np.array([[1,3,5,7,9],[2,4,6,8,10]])

In [58]:
c

array([[ 1,  3,  5,  7,  9],
       [ 2,  4,  6,  8, 10]])

In [59]:
c.shape

(2, 5)

In [60]:
c.size

10

In [61]:
c.ndim

2

In [62]:
# Let's read and then change an element of the array
c[1,3]

8

In [63]:
c[1,3]= 4

In [64]:
c

array([[ 1,  3,  5,  7,  9],
       [ 2,  4,  6,  4, 10]])

In [65]:
c[1]

array([ 2,  4,  6,  4, 10])

In [66]:
# How to select an entire row or multiple elements
# slicing arrays
a


array([11,  2,  3,  4,  5], dtype=int64)

In [67]:
a[1:3]

array([2, 3], dtype=int64)

In [68]:
#negative indices also work
a[1:-2]
# -1 is always the last element of your sequence

array([2, 3], dtype=int64)

In [69]:
#negative indices 
a[-4:3]

array([2, 3], dtype=int64)

In [70]:
# omitted boundaries are assumed to be the beginning or end of the array
a[:3]
#grab the first three elements

array([11,  2,  3], dtype=int64)

In [71]:
a[-2:]
# grab the last 2 elements

array([4, 5], dtype=int64)

In [72]:
d = np.array([[1,2,3,4,5,6],[2,3,4,5,6,7],[3,6,7,9,8,9],[2,5,6,8,2,7],[2,5,9,8,7,8],[1,3,4,5,6,9]])

In [73]:
d

array([[1, 2, 3, 4, 5, 6],
       [2, 3, 4, 5, 6, 7],
       [3, 6, 7, 9, 8, 9],
       [2, 5, 6, 8, 2, 7],
       [2, 5, 9, 8, 7, 8],
       [1, 3, 4, 5, 6, 9]])

In [74]:
#slicing using a multi-dim array
d[0, 3:5]
# indexing and slicing

array([4, 5])

In [75]:
d[4:, 4:]
# throw away 4 rows and 4 columns 

array([[7, 8],
       [6, 9]])

In [76]:
d[2:, 3:]
# throw away 2 rows and three columns

array([[9, 8, 9],
       [8, 2, 7],
       [8, 7, 8],
       [5, 6, 9]])

In [77]:
d[1:, 5:]
# throw away one row and 5 columns

array([[7],
       [9],
       [7],
       [8],
       [9]])

In [78]:
# A lone colon means everything along that dimension (here, every row, taking column 2 from each)
# the result is a 1-D array
d[:, 2]

array([3, 4, 7, 6, 9, 4])

In [79]:
#strides
d[2::2, ::2]

array([[3, 7, 8],
       [2, 9, 7]])

In [80]:
e = np.arange(25).reshape(5, 5)

In [81]:
e


array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [82]:
e[:, 1::2]
# I want the 2nd and 4th columns only

array([[ 1,  3],
       [ 6,  8],
       [11, 13],
       [16, 18],
       [21, 23]])

In [83]:
# I want the last row only. Three ways to do this
e[4, :]
# choose the 5th row (index 4); the lone colon selects every column in that row

array([20, 21, 22, 23, 24])

In [84]:
e[4]

array([20, 21, 22, 23, 24])

In [85]:
e[-1, :]

array([20, 21, 22, 23, 24])

In [86]:
# Picking rows and columns at regular steps: every second row starting from row index 1

e[1::2]

array([[ 5,  6,  7,  8,  9],
       [15, 16, 17, 18, 19]])

In [87]:
e[1::2, :3:2]

array([[ 5,  7],
       [15, 17]])

In [88]:
e[1::2, :4:2]

array([[ 5,  7],
       [15, 17]])

In [89]:
# fancy indexing 
a=np.arange(0, 80, 10)


In [90]:
a


array([ 0, 10, 20, 30, 40, 50, 60, 70])

In [91]:

indices = [1,2,-3]
y = a[indices]
# position 1 = 10, position 2 = 20, position -3 = 50 (the last element is always at position -1)

In [92]:
y

array([10, 20, 50])

In [93]:
a[indices]=90
# now 90 replaces the values at all the positions listed in indices

In [94]:
a

array([ 0, 90, 90, 30, 40, 90, 60, 70])
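One point worth keeping in mind when mixing slicing and fancy indexing is that a basic slice is a view onto the original data, while fancy indexing returns a copy. The sketch below checks this with `np.shares_memory`; the variable names are illustrative.

```python
import numpy as np

a = np.arange(0, 80, 10)

sliced = a[1:4]           # basic slice: a view onto a's data
picked = a[[1, 2, -3]]    # fancy indexing: a new array holding copies

print(np.shares_memory(a, sliced))   # True
print(np.shares_memory(a, picked))   # False

sliced[0] = 999   # writing through the view changes a
picked[1] = -1    # writing to the copy leaves a untouched
print(a.tolist())  # [0, 999, 20, 30, 40, 50, 60, 70]
```

Assigning through fancy indexing on the left-hand side, as in `a[indices] = 90`, still modifies `a` itself; the copy only arises when the indexed result is stored in a new variable.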

In [95]:
a = np.array([[0,1,2,3,4,5],[10,11,12,13,14,15],[20,21,22,23,24,25],[30,31,32,33,34,35],[40,41,42,43,44,45],[50,51,52,53,54,55]])

In [96]:
a

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

![fancy indexing.PNG](attachment:4686ca20-d801-445e-ba33-ab442bad4640.PNG)

In [97]:
# yellow 
a[[0,1,2,3,4],[1,2,3,4,5]]
# The first list gives the row indices and the second list gives the column index of the element to take from each of those rows.
a[[0,1,2],[0,1,2]]
# This picks the first three elements of the main diagonal; only the result of this last expression is displayed.

array([ 0, 11, 22])

In [98]:
#blue highlighted indexing
a[3:, [0,2,5]]
# 3: selects every row from index 3 onwards (the fourth row down to the last)
# [0,2,5]- this will pick the elements in the columns 0 , 2 and 5

array([[30, 32, 35],
       [40, 42, 45],
       [50, 52, 55]])

In [99]:
#red highlighted indexing using masking
mask = np.array([1, 0, 1, 0, 0, 1], dtype=bool)
a[mask, 2]
# In the mask, 1 (True) marks the first, third and last rows, and we are only interested in column 2.

array([ 2, 22, 52])

![index problrm to solve.PNG](attachment:29f813bd-c7b7-4483-8f9b-bd7d498267df.PNG)

In [100]:
m = np.arange(25).reshape(5,5)

In [101]:
m

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [102]:
# extract elements denoted in blue 
m[[0,2,3,3],[2,3,1,4]]

array([ 2, 13, 16, 19])

In [103]:
# extract all numbers divisible by three using a boolean mask
# we need the modulo operator
m % 3
# every position in the array that is now zero holds a number divisible by three

array([[0, 1, 2, 0, 1],
       [2, 0, 1, 2, 0],
       [1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1],
       [2, 0, 1, 2, 0]], dtype=int32)

In [104]:
m % 3 == 0
# gives a boolean array that is True wherever the number is divisible by three

array([[ True, False, False,  True, False],
       [False,  True, False, False,  True],
       [False, False,  True, False, False],
       [ True, False, False,  True, False],
       [False,  True, False, False,  True]])

In [105]:
m[m % 3 == 0]

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24])
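Besides extracting values, a boolean mask can be used to count matches or to assign to the selected positions. A minimal sketch, with the replacement value `-1` chosen arbitrarily:

```python
import numpy as np

m = np.arange(25).reshape(5, 5)
mask = (m % 3 == 0)        # True where the value is divisible by three

count = mask.sum()         # True counts as 1, so this counts the matches
selected = m[mask]         # 1-D array of the matching values
print(count)               # 9

# Assigning through the mask changes only the selected positions.
m_marked = m.copy()
m_marked[mask] = -1
print(m_marked[0].tolist())   # [-1, 1, 2, -1, 4]
```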

## Generators in NumPy 

Legacy generator, Mersenne Twister (MT19937): this is the pseudo-random number generator that was previously used in the legacy RandomState.

### References:
[1](https://numpy.org/devdocs/user/whatisnumpy.html)
[2](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.shuffle.html)
[3](https://www.datacamp.com/community/tutorials/numpy-random)
[4](https://www.w3schools.com/python/numpy/numpy_random_permutation.asp)
[5](https://www.youtube.com/watch?v=ZB7BZMhfPgk)
[6](https://numpy.org/devdocs/reference/random/index.html)