
# Generating Data with NumPy

In [70]:
import numpy as np

### np.empty(), np.zeros(), np.ones(), np.full()

In [71]:
array_empty = np.empty(shape = (2,3))
array_empty

array([[1.24e-322, 1.48e-322, 1.73e-322],
       [1.98e-322, 2.22e-322, 2.47e-322]])
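
The tiny values above are simply whatever was already in memory: `np.empty` allocates the array without initializing it. A minimal sketch of the usual pattern, allocating first and overwriting every element before reading it (the name `buffer` and the fill value are only illustrative):

```python
import numpy as np

# np.empty only reserves memory; its initial contents are arbitrary.
buffer = np.empty(shape=(2, 3))

# Overwrite every element before using the array.
buffer[:] = 7.0
print(buffer)
```

If the array has to start from a known value, `np.zeros`, `np.ones`, or `np.full` is the safer choice.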

In [72]:
# zeros
array_0s = np.zeros(shape  = (2,3))
array_0s

array([[0., 0., 0.],
       [0., 0., 0.]])

In [73]:
array_0s = np.zeros(shape = (2,3), dtype = np.int8)
array_0s

array([[0, 0, 0],
       [0, 0, 0]], dtype=int8)

In [74]:
# ones
array_1s = np.ones(shape  = (2,3))
array_1s

array([[1., 1., 1.],
       [1., 1., 1.]])

In [75]:
# full
array_full = np.full(shape = (2,3), fill_value = 2) # One additional mandatory argument - fill_value -> scalar
array_full

array([[2, 2, 2],
       [2, 2, 2]])

In [76]:
array_full = np.full(shape = (2,3), fill_value = 'Three-Six-Five')
array_full

array([['Three-Six-Five', 'Three-Six-Five', 'Three-Six-Five'],
       ['Three-Six-Five', 'Three-Six-Five', 'Three-Six-Five']],
      dtype='<U14')

### "_like" functions

In [77]:
matrix_A = np.array([[1,0,9,2,2],[3,23,4,5,1],[0,2,3,4,1]])
matrix_A

array([[ 1,  0,  9,  2,  2],
       [ 3, 23,  4,  5,  1],
       [ 0,  2,  3,  4,  1]])

In [78]:
array_empty_like = np.empty_like(matrix_A)

# Shape and type are like the prototype.
# If we want to override this, we can define dtype and shape and pass different values (but why even use empty_like then).

array_empty_like

array([[      702336255,               0,        11664720,
               11654504,        11665152],
       [       11654504,        11664576,        11654504,
               11664576,        11717696],
       [       11708096,        11662568, 133681259127792,
        133681259127792,             320]])

In [79]:
array_0s_like = np.zeros_like(matrix_A)
array_0s_like

# We have corresponding functions for 1s and full as well.

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
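
As a short sketch of the remaining "_like" functions, the cell below applies `np.ones_like` and `np.full_like` to the same prototype and also overrides the dtype while keeping the shape. The variable names are illustrative and not part of the original notebook.

```python
import numpy as np

matrix_A = np.array([[1, 0, 9, 2, 2], [3, 23, 4, 5, 1], [0, 2, 3, 4, 1]])

# Same shape and integer dtype as matrix_A, filled with ones.
ones_like_A = np.ones_like(matrix_A)

# Same shape and dtype, every element set to 2.
full_like_A = np.full_like(matrix_A, 2)

# Keep the shape of the prototype but override the dtype.
float_zeros = np.zeros_like(matrix_A, dtype=np.float64)

print(ones_like_A.shape, ones_like_A.dtype)
print(full_like_A)
print(float_zeros.dtype)
```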

### np.arange()

In [80]:
#range(30)
list(range(30))

# range(30) results in a range object.
# list(range(30)) creates a list with all the values in this range.

[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29]

In [81]:
array_rng = np.arange(30)
array_rng

## Creates an ndarray with the values in this range.

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In [82]:
# array_rng = np.arange(stop =  30)
# array_rng = np.arange(start =  30)
array_rng

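# A single positional argument is interpreted as "stop"; the sequence then starts from the origin (0).
# Which of the two keyword-only forms above is accepted has differed between NumPy versions,
# so passing the value positionally, as in np.arange(30), is the safe choice.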

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In [83]:
array_rng = np.arange(start = 0, stop =  30)
array_rng

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In [84]:
array_rng = np.arange(start = 0, stop =  30, step = 2.5)
array_rng

# "step" can be a float even when start and stop are integers; the resulting array is then a float array.

array([ 0. ,  2.5,  5. ,  7.5, 10. , 12.5, 15. , 17.5, 20. , 22.5, 25. ,
       27.5])

In [85]:
array_rng = np.arange(start = 0, stop =  30, step = 2.5, dtype = np.float32)
array_rng = np.arange(start = 0, stop =  30, step = 2.5, dtype = np.int32)
array_rng

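# The first assignment is overwritten immediately; only the int32 version is displayed.
# The length (12 elements) is determined from the float step 2.5, but the values are generated
# with the step truncated to 2, so the result is not the float sequence cast element by element.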

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22], dtype=int32)
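
When `step` is a float and `dtype` is an integer type, it matters whether the conversion happens inside `np.arange` or afterwards. The sketch below compares the two; casting afterwards with `.astype` is an alternative introduced here for illustration and is not part of the original notebook.

```python
import numpy as np

# dtype passed to arange: the values advance by the truncated step,
# while the number of elements still comes from the float step 2.5.
cast_inside = np.arange(start=0, stop=30, step=2.5, dtype=np.int32)

# Casting afterwards: build the float sequence first, then truncate each value.
cast_after = np.arange(start=0, stop=30, step=2.5).astype(np.int32)

print(cast_inside)
print(cast_after)
```

Both arrays have 12 elements, but only `cast_after` follows the 0, 2.5, 5, 7.5, ... sequence (truncated to integers).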

## Random Generators

### Defining Random Generators

In [86]:
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg


## We load two classes from the numpy.random module: Generator and the PCG64 bit generator.

In [87]:
array_RG = gen(pcg())

#array_RG.normal()
#array_RG.normal(size = 5)
array_RG.normal(size = (5,5))

# RG is short for Random Generator.

array([[ 0.01629188,  1.35438466, -0.18916567, -0.34121281,  2.35174814],
       [ 1.29003142, -0.78729913, -0.30513545,  1.82829873, -1.55225011],
       [-1.35442055, -1.10979934, -0.83217481,  0.31817632, -1.43078717],
       [ 1.77595962, -0.12013822, -0.27213083, -0.69595276,  1.3755859 ],
       [-0.95652468, -0.42521105, -0.22004154,  1.24192268,  1.02174848]])

We can set a seed so that our random values don't change every time we re-run the code. We'll set the seed equal to 365.

In [88]:
array_RG = gen(pcg(seed = 365))
array_RG.normal(size = (5,5))

# Re-running this cell gives the same output, because the seed fixes the generator's starting state.

array([[-0.13640899,  0.09414431, -0.06300442,  1.05391641, -0.6866818 ],
       [-0.50922173, -0.7999526 ,  0.73041825,  0.08825439, -2.1177576 ],
       [ 0.65526774, -0.48095012, -0.5519114 , -0.58578662, -0.98257896],
       [ 1.12378166, -1.30984316, -0.04703774,  0.955272  ,  0.26071745],
       [-0.20023668, -1.50172484, -1.4929163 ,  0.96535084,  1.18694633]])

In [89]:
array_RG.normal(size = (5,5))

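# The seed only fixes the starting point. This second call continues the same stream,
# so it returns different values; re-create the generator with the seed to repeat the earlier output.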

array([[-0.76065577,  1.48158358,  0.01200258, -0.06846959,  0.25301664],
       [-0.52640788,  0.79613109,  0.28203421,  1.80238008,  0.93932117],
       [-0.53693283, -0.26317689, -1.77723035,  1.14900013, -2.20733915],
       [ 1.54116775, -0.5124932 , -2.14564563,  1.98878673,  0.32208907],
       [-1.2651495 ,  3.2714633 ,  1.78650493, -0.20233675,  0.20427467]])
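
The behaviour of the seed can be checked directly. In this minimal sketch (the names `rg_a` and `rg_b` are illustrative), two generators built from the same seed produce the same first draw, while a second draw from one generator continues its stream and differs.

```python
import numpy as np
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg

rg_a = gen(pcg(seed=365))
rg_b = gen(pcg(seed=365))

first_a = rg_a.normal(size=5)
second_a = rg_a.normal(size=5)  # continues the same stream, so the values differ
first_b = rg_b.normal(size=5)   # fresh generator with the same seed repeats first_a

print(np.array_equal(first_a, first_b))
print(np.array_equal(first_a, second_a))
```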

### Generating Integers, Probabilities and Random Choices

In [90]:
array_RG = gen(pcg(seed = 365))
#array_RG.integers(10, size = (5,5))
array_RG.integers(low = 10, high = 100, size = (5,5))

# Generates integers within a range.

array([[18, 78, 64, 78, 84],
       [66, 67, 28, 10, 69],
       [45, 15, 37, 74, 96],
       [19, 21, 89, 73, 54],
       [53, 84, 66, 51, 92]])

In [91]:
array_RG = gen(pcg(seed = 365))
array_RG.random(size = (5,5))

array([[0.75915734, 0.7662218 , 0.6291028 , 0.20336599, 0.66501486],
       [0.06559111, 0.71326309, 0.10812106, 0.87969046, 0.49405844],
       [0.82472673, 0.45652944, 0.07367232, 0.69628564, 0.36690736],
       [0.29787156, 0.4996155 , 0.4865245 , 0.62740703, 0.54952637],
       [0.64894629, 0.04411757, 0.7206516 , 0.84594003, 0.17159792]])

In [92]:
#array_RG.choice(matrix_A[0], size = (5,5))
array_RG = gen(pcg(seed = 365))
#array_RG.choice([1,2,3,4,5], size = (5,5))
array_RG.choice((1,2,3,4,5), p = [0.1,0.1,0.1,0.1,0.6],size = (5,5))

# Chooses among a given set (with possible weighted probabilities).

array([[5, 5, 5, 3, 5],
       [1, 5, 2, 5, 5],
       [5, 5, 1, 5, 4],
       [3, 5, 5, 5, 5],
       [5, 1, 5, 5, 2]])
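
To see the effect of the `p` argument more clearly than a 5×5 sample allows, the sketch below draws a larger sample and compares the observed frequencies with the requested weights. The sample size of 100,000 and the variable names are arbitrary choices made for this illustration.

```python
import numpy as np
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg

array_RG = gen(pcg(seed=365))
values = (1, 2, 3, 4, 5)
weights = [0.1, 0.1, 0.1, 0.1, 0.6]

draws = array_RG.choice(values, p=weights, size=100_000)

# Fraction of draws equal to each value; these should be close to `weights`.
observed = np.array([(draws == v).mean() for v in values])
print(observed)
```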

### Generating Arrays From Known Distributions

In [93]:
array_RG = gen(pcg(seed = 365))
array_RG.poisson(size = (5,5))

# The default Poisson distribution (lam = 1.0).

array([[2, 0, 1, 1, 2],
       [1, 1, 0, 1, 1],
       [1, 2, 1, 1, 0],
       [0, 1, 0, 2, 1],
       [0, 1, 0, 0, 2]])

In [94]:
array_RG = gen(pcg(seed = 365))
array_RG.poisson(lam = 10,size = (5,5))

# Specifying lambda.

array([[11, 12, 12, 14, 13],
       [ 9, 10, 11, 11,  8],
       [11,  8, 10,  9, 14],
       [ 7,  8,  9, 15, 15],
       [13,  8,  8,  7,  9]])

In [95]:
array_RG = gen(pcg(seed = 365))
array_RG.binomial(n = 100, p = 0.4, size = (5,5))

# A binomial distribution with p = 0.4 and 100 trials.

array([[42, 44, 30, 36, 45],
       [36, 41, 38, 42, 41],
       [35, 31, 35, 46, 29],
       [41, 41, 46, 34, 48],
       [45, 45, 45, 40, 43]])

In [96]:
array_RG = gen(pcg(seed = 365))
array_RG.logistic(loc = 9, scale = 1.2, size = (5,5))

# A logistic distribution with a location = 9 and scale = 1.2.

array([[10.37767822, 10.42451863,  9.63404367,  7.36153427,  9.82286787],
       [ 5.81223125, 10.09354231,  6.46790532, 11.38740256,  8.97147918],
       [10.85844698,  8.79081317,  5.962079  ,  9.99560681,  8.34539118],
       [ 7.97105522,  8.9981544 ,  8.93530194,  9.6253307 ,  9.23850869],
       [ 9.73729284,  5.3090678 , 10.13723528, 11.04372782,  7.11078651]])
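
A quick sanity check on these distributions is to compare sample means with the theoretical ones. The sketch below assumes a sample size of 100,000 (an arbitrary choice) and reuses the parameters from the cells above.

```python
import numpy as np
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg

array_RG = gen(pcg(seed=365))
n_samples = 100_000

# Theoretical means: lam = 10, n * p = 40, loc = 9.
poisson_mean = array_RG.poisson(lam=10, size=n_samples).mean()
binomial_mean = array_RG.binomial(n=100, p=0.4, size=n_samples).mean()
logistic_mean = array_RG.logistic(loc=9, scale=1.2, size=n_samples).mean()

print(poisson_mean, binomial_mean, logistic_mean)
```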

### Applications of Random Generators

#### Creating Tests

In [97]:
array_RG = gen(pcg(seed = 365))

array_column_1 = array_RG.normal(loc = 2, scale = 3, size = (1000))
array_column_2 = array_RG.normal(loc = 7, scale = 2, size = (1000))
array_column_3 = array_RG.logistic(loc = 11, scale = 3, size = (1000))
array_column_4  = array_RG.exponential(scale = 4, size = (1000))
array_column_5  = array_RG.geometric(p = 0.7, size = (1000))

# Create the individual columns of the dataset we're creating.

In [98]:
random_test_data = np.array([array_column_1, array_column_2, array_column_3, array_column_4, array_column_5]).transpose()
random_test_data

# Use np.array to combine the 5 arrays we created earlier into one array of shape (5, 1000).
# Use the transpose method so that each array becomes a column, giving shape (1000, 5).

array([[ 1.59077303,  6.42174295, 10.14698427,  6.91500737,  1.        ],
       [ 2.28243293,  8.57902322, 15.93309953,  6.243605  ,  1.        ],
       [ 1.81098674,  5.17270135, -0.46878789,  2.44997251,  1.        ],
       ...,
       [ 0.1973629 ,  4.3465854 ,  2.66485989,  0.80935387,  1.        ],
       [-2.21015722,  8.2176402 , 12.69328115,  0.50644607,  2.        ],
       [ 2.91161235,  7.90337695, 11.79840961,  4.86816939,  1.        ]])

In [99]:
random_test_data.shape

(1000, 5)
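
Stacking the arrays and transposing is one way to turn 1D arrays into columns. As a hedged alternative that is not used in the original notebook, `np.column_stack` builds the same layout directly. The two-column example and its names are illustrative only.

```python
import numpy as np
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg

array_RG = gen(pcg(seed=365))
col_a = array_RG.normal(loc=2, scale=3, size=1000)
col_b = array_RG.exponential(scale=4, size=1000)

# Shape (2, 1000) turned into (1000, 2) by transposing.
via_transpose = np.array([col_a, col_b]).transpose()

# Shape (1000, 2) built directly, one input array per column.
via_column_stack = np.column_stack((col_a, col_b))

print(via_transpose.shape, via_column_stack.shape)
print(np.array_equal(via_transpose, via_column_stack))
```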

In [100]:
np.savetxt("Random-Test-from-NumPy.csv", random_test_data, fmt = '%s', delimiter = ',')


# Save the array to an external file that np.savetxt creates for us.

# file name -> "Random-Test-from-NumPy.csv"
# random_test_data -> data we're exporting (saving to an external file)
# format -> strings
# delimiter ","

# We'll talk more about these in just a bit.

In [101]:
np.genfromtxt("Random-Test-from-NumPy.csv", delimiter = ',')

# Importing the data from the file we just created.

array([[ 1.59077303,  6.42174295, 10.14698427,  6.91500737,  1.        ],
       [ 2.28243293,  8.57902322, 15.93309953,  6.243605  ,  1.        ],
       [ 1.81098674,  5.17270135, -0.46878789,  2.44997251,  1.        ],
       ...,
       [ 0.1973629 ,  4.3465854 ,  2.66485989,  0.80935387,  1.        ],
       [-2.21015722,  8.2176402 , 12.69328115,  0.50644607,  2.        ],
       [ 2.91161235,  7.90337695, 11.79840961,  4.86816939,  1.        ]])

In [102]:
rand_test_data = np.genfromtxt("Random-Test-from-NumPy.csv", delimiter = ',')
print(rand_test_data)

[[ 1.59077303  6.42174295 10.14698427  6.91500737  1.        ]
 [ 2.28243293  8.57902322 15.93309953  6.243605    1.        ]
 [ 1.81098674  5.17270135 -0.46878789  2.44997251  1.        ]
 ...
 [ 0.1973629   4.3465854   2.66485989  0.80935387  1.        ]
 [-2.21015722  8.2176402  12.69328115  0.50644607  2.        ]
 [ 2.91161235  7.90337695 11.79840961  4.86816939  1.        ]]
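
The save-and-reload round trip can be verified on a small array. This sketch writes to a temporary directory so that it leaves no file behind; the file name `round-trip.csv` and the sample data are hypothetical.

```python
import os
import tempfile

import numpy as np

data = np.array([[1.5, 2.0, 3.25], [4.0, 5.5, 6.75]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "round-trip.csv")  # hypothetical file name
    # Write each value as a string, separated by commas.
    np.savetxt(path, data, fmt="%s", delimiter=",")
    # Read the file back into a float array.
    loaded = np.genfromtxt(path, delimiter=",")

print(loaded.shape)
print(np.allclose(data, loaded))
```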


**Exercises:**

**NumPy - Arrays of 0s and 1s - Exercise #1**

1. Generate an empty array of size 10.

2. Generate an array of size 10 filled with zeros.

3. Generate an array of size 10 filled with ones.

4. Generate an array of size 10 filled with twos.

5. Generate a 2×4 empty array.

6. Generate a 2×4 array filled with zeros.

7. Generate a 2×4 array filled with ones.

8. Generate a 2×4 array filled with twos.

In [103]:
#1.
print(np.empty(10))
#2.
print(np.zeros(10))
#3.
print(np.ones(10))
#4.
print(np.full(10,2))
#5.
print(np.empty((2,4)))
#6.
print(np.zeros((2,4)))
#7.
print(np.ones((2,4)))
#8.
print(np.full((2,4),2))

[0.0e+000 3.0e-323 2.5e-323 0.0e+000 4.4e-323 9.9e-324 9.9e-324 4.9e-324
 1.5e-323 4.9e-324]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[2 2 2 2 2 2 2 2 2 2]
[[3.0e-323 2.5e-323 4.4e-323 9.9e-324]
 [9.9e-324 4.9e-324 1.5e-323 4.9e-324]]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[2 2 2 2]
 [2 2 2 2]]


**"_like" functions in NumPy - Exercise #1**
Using the provided objects, apply NumPy’s _like functions to generate the following:

1. An empty array matching the shape of array_1D.

2. A 2D array of zeros with the same shape as array_2D.

3. A 3D array of ones shaped like array_3D.

4. A 3D array of twos, also matching the shape of array_3D.

In [104]:

array_1D = np.array([10,11,12,13, 14])
array_2D = np.array([[20,30,40,50,60], [43,54,65,76,87], [11,22,33,44,55]])
array_3D = np.array([[[1,2,3,4,5], [11,21,31,41,51]], [[11,12,13,14,15], [51,52,53,54,5]]])

#1.
print(np.empty_like(array_1D))
#2.
print(np.zeros_like(array_2D))
#3.
print(np.ones_like(array_3D))
#4.
print(np.full_like(array_3D, 2))

[10 11 12 13 14]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
[[[1 1 1 1 1]
  [1 1 1 1 1]]

 [[1 1 1 1 1]
  [1 1 1 1 1]]]
[[[2 2 2 2 2]
  [2 2 2 2 2]]

 [[2 2 2 2 2]
  [2 2 2 2 2]]]


**NumPy - A Non-Random Sequence of Numbers - Exercise #1**
Using the np.arange() function, generate the following sequences of numbers:

1. Integers from 0 up to (but not including) 50.

2. Integers from 1 to 50, inclusive.

3. Integers from 25 to 50, inclusive.

4. Every 5th integer from 25 to 50, inclusive.

5. Every 5th integer from 25 to 50, inclusive, represented as 32-bit floating-point numbers.

In [105]:
#1.
print(np.arange(50))
#2.
print(np.arange(1,51))
#3.
print(np.arange(25,51))
#4.
print(np.arange(25,51,5))
#5.
print(np.arange(25,51,5, dtype=np.float32))

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49]
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
 49 50]
[25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
 49 50]
[25 30 35 40 45 50]
[25. 30. 35. 40. 45. 50.]


**NumPy - Random Generators and Seeds - Exercise #1**
From the NumPy .random module, the Generator and PCG64 classes have been imported as gen and pcg, respectively.


1. Create a random generator object named array_RG using a Generator initialized with a PCG64 bit generator.

2. Display the generator object.

3. Using the .random() method, generate and display a single probability of an event occurring.

4. Using the .random() method, generate and display a 1D array of size 10 representing the probabilities of 10 events.

5. Using the .random() method, generate and display a 2D array of shape (5, 10) representing the probabilities of 50 events.

In [106]:
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg

# PCG = Permuted Congruential Generator
# 1
array_RG = gen(pcg())

# 2
print(array_RG)

# 3
print(array_RG.random())

# 4
print(array_RG.random(10))

# 5
print(array_RG.random((5,10)))

Generator(PCG64)
0.17483836501335048
[0.51484303 0.95768868 0.26399319 0.4352668  0.85320595 0.06370249
 0.69419715 0.70786597 0.8758145  0.51121264]
[[0.2778015  0.3864317  0.01330451 0.76569266 0.06748543 0.03642055
  0.39303136 0.78278085 0.28539292 0.03077869]
 [0.95610615 0.58828252 0.13212926 0.50982573 0.0386025  0.54595638
  0.58763328 0.72124772 0.63341631 0.67988193]
 [0.89474277 0.87552148 0.24664759 0.35894988 0.7868182  0.42816828
  0.20708383 0.46600454 0.69904311 0.73455297]
 [0.28362317 0.18811951 0.26646873 0.49272429 0.50807497 0.71960328
  0.33647386 0.61642971 0.39988512 0.5392473 ]
 [0.51818416 0.04666854 0.82607459 0.46940398 0.06700548 0.62279289
  0.05111235 0.17896148 0.46861908 0.65520593]]


**Basic Random Functions in NumPy - Exercise #1**
From the NumPy .random module, the Generator and PCG64 classes have been imported as gen and pcg, respectively.



1. Create a random generator object named array_RG using a Generator initialized with a PCG64 bit generator seeded with 123.

2. Generate and display a two-dimensional array of shape (5, 10) representing the probabilities of 50 events.

In [107]:
#1.
array_RG = gen(pcg(seed=123))
#2.
print(array_RG.random((5,10)))

[[0.68235186 0.05382102 0.22035987 0.18437181 0.1759059  0.81209451
  0.923345   0.2765744  0.81975456 0.88989269]
 [0.51297046 0.2449646  0.8242416  0.21376296 0.74146705 0.6299402
  0.92740726 0.23190819 0.79912513 0.51816504]
 [0.23155562 0.16590399 0.49778897 0.58272464 0.18433799 0.01489492
  0.47113323 0.72824333 0.91860049 0.62553401]
 [0.91712257 0.86469025 0.21814287 0.86612743 0.73075194 0.27786529
  0.79704355 0.86522171 0.2994379  0.52704208]
 [0.07148681 0.58323841 0.2379064  0.76496365 0.17363164 0.31274226
  0.01447448 0.03255192 0.49670184 0.46831253]]


**Basic Random Functions in NumPy - Exercise #2**
From the NumPy .random module, the Generator and PCG64 classes have been imported as gen and pcg, respectively.



1. Create a random generator object named array_RG using a Generator initialized with a PCG64 bit generator seeded with 123.

2. Using the array_RG generator, generate and display a single integer between 0 and 10.

In [108]:
#1.
array_RG = gen(pcg(123))
#2.
print(array_RG.integers(10))

0


**Basic Random Functions in NumPy - Exercise #3**
From the NumPy .random module, the Generator and PCG64 classes have been imported as gen and pcg, respectively.



1. Create a random generator object named array_RG using a Generator initialized with a PCG64 bit generator seeded with 123.

2. Using the array_RG generator, generate and display a one-dimensional array of 10 integers between 0 and 10 (exclusive).

In [109]:
#1.
array_RG = gen(pcg(123))
#2.
print(array_RG.integers(10, size=10))

[0 6 5 0 9 2 2 1 3 1]


**Basic Random Functions in NumPy - Exercise #4**
From the NumPy .random module, the Generator and PCG64 classes have been imported as gen and pcg, respectively.



1. Create a random generator object named array_RG using a Generator initialized with a PCG64 bit generator seeded with 123.

2. Using the array_RG generator, generate and display a two-dimensional array of shape (5,10) containing integers between 0 (inclusive) and 10 (exclusive).

In [110]:
#1.
array_RG = gen(pcg(123))
#2.
print(array_RG.integers(10, size = (5,10)))

[[0 6 5 0 9 2 2 1 3 1]
 [3 8 4 9 4 2 7 8 8 8]
 [0 5 2 2 2 8 7 2 4 7]
 [1 6 4 9 7 2 8 7 2 5]
 [7 2 2 1 0 4 0 5 4 1]]


**Probability Distributions in NumPy - Exercise #1**
From the NumPy .random module, the Generator and PCG64 classes have been imported as gen and pcg, respectively.



1. Create a random generator object named array_RG using a Generator initialized with a PCG64 bit generator seeded with 123. Specify the parameter name when passing the seed to the PCG64 bit generator.

2. Consult the NumPy documentation, choose a normal probability distribution, and generate a single value from it without specifying any non-mandatory arguments.

3. Consult the NumPy documentation, choose a Poisson probability distribution and generate a one-dimensional array of 10 values from it. Set the expected value parameter lam to 15.

4. Consult the NumPy documentation, choose a binomial probability distribution and generate a two-dimensional array of shape (5, 10) using 100 trials and a probability of success of 70%.
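
One possible solution sketch, following the pattern of the earlier exercise cells. The variable names are illustrative, and the exact values printed depend on the NumPy version in use.

```python
import numpy as np
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg

#1.
array_RG = gen(pcg(seed=123))
#2.
normal_value = array_RG.normal()  # defaults: loc = 0.0, scale = 1.0
print(normal_value)
#3.
poisson_values = array_RG.poisson(lam=15, size=10)
print(poisson_values)
#4.
binomial_values = array_RG.binomial(n=100, p=0.7, size=(5, 10))
print(binomial_values)
```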