> ### **Assignment 2 - Numpy Array Operations** 
>
> This assignment is part of the course ["Data Analysis with Python: Zero to Pandas"](http://zerotopandas.com). The objective of this assignment is to develop a solid understanding of Numpy array operations. In this assignment you will:
> 
> 1. Pick 5 interesting Numpy array functions by going through the documentation: https://numpy.org/doc/stable/reference/routines.html 
> 2. Run and modify this Jupyter notebook to illustrate their usage (some explanation and 3 examples for each function). Use your imagination to come up with interesting and unique examples.
> 3. Upload this notebook to your Jovian profile using `jovian.commit` and make a submission here: https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas/assignment/assignment-2-numpy-array-operations
> 4. (Optional) Share your notebook online (on Twitter, LinkedIn, Facebook) and on the community forum thread: https://jovian.ml/forum/t/assignment-2-numpy-array-operations-share-your-work/10575 . 
> 5. (Optional) Check out the notebooks [shared by other participants](https://jovian.ml/forum/t/assignment-2-numpy-array-operations-share-your-work/10575) and give feedback & appreciation.
>
> The recommended way to run this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks.
>
> Try to give your notebook a catchy title & subtitle e.g. "All about Numpy array operations", "5 Numpy functions you didn't know you needed", "A beginner's guide to broadcasting in Numpy", "Interesting ways to create Numpy arrays", "Trigonometric functions in Numpy", "How to use Python for Linear Algebra" etc.
>
> **NOTE**: Remove this block of explanation text before submitting or sharing your notebook online - to make it more presentable.


# NumPy


### NumPy Functions

NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more. This notebook covers the following five:

- np.linspace
- np.digitize 
- np.repeat
- np.squeeze 
- np.random

In [1]:
!pip install jovian --upgrade -q

In [2]:
import jovian


In [3]:
jovian.commit(project='numpy-array-operations')

[jovian] Updating notebook "singhabhishek/numpy-array-operations" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/singhabhishek/numpy-array-operations


'https://jovian.ai/singhabhishek/numpy-array-operations'

Let's begin by importing Numpy and listing out the functions covered in this notebook.

In [4]:
import numpy as np

In [5]:
# List of functions explained
function1 = np.linspace
function2 = np.digitize
function3 = np.repeat
function4 = np.squeeze
function5 = np.random  # a module; the examples use randint, choice, and binomial

## Function 1 - np.linspace 

The numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0) function returns evenly spaced numbers over the interval defined by its first two arguments, start and stop, which are required. The third argument, num, sets the number of samples; if omitted, 50 samples are generated. Keep in mind that the stop value is included in the returned array (endpoint=True by default), unlike with the built-in Python function range.

In [7]:
# Example 1 - working 
# array with 11 elements, last element included
np.linspace(0,10,11)

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

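Asking for 11 samples between 0 and 10 with the endpoint included gives a step of exactly 1, so the result holds the floats 0.0 through 10.0.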

In [8]:
# Example 2 - working
# array with 11 elements, last element not included
np.linspace(0,10,11,endpoint=False)

array([0.        , 0.90909091, 1.81818182, 2.72727273, 3.63636364,
       4.54545455, 5.45454545, 6.36363636, 7.27272727, 8.18181818,
       9.09090909])

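With endpoint=False the stop value 10 is left out, so the 11 samples divide the interval into steps of 10/11 ≈ 0.909 and the last element is about 9.09.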
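To check the spacing rather than read it off the output, `retstep=True` makes `np.linspace` return the step size along with the samples. The sketch below is a minimal illustration of that option; the variable names are made up for this example.

```python
import numpy as np

# With the endpoint included, 11 samples over [0, 10] are spaced 10 / 10 apart.
closed, closed_step = np.linspace(0, 10, 11, retstep=True)

# With the endpoint excluded, the same 11 samples are spaced 10 / 11 apart.
half_open, half_open_step = np.linspace(0, 10, 11, endpoint=False, retstep=True)

print(closed_step)     # 1.0
print(half_open_step)  # 0.9090909090909091
```

The returned step is handy when later code needs the grid spacing, for example in numerical differentiation.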

In [11]:
import numpy 

In [14]:
# Example 3 - breaking (to illustrate when it breaks)
# X-axis linspace function 111 points from 0 to 100    
x = np.linspace(0,100,111)

# Compute four mathematical functions: sine, cosine, exponential, and logarithm
functions = [np.sin(x), np.cos(x), np.exp(x), np.log(x)]
titles = ['Sine function', 'Cosine function', 'Exponential function', 'Logarithmic function']
    

# Plot the functions
plot.figure(figsize=(11,11))

for index,function in enumerate(functions):
    plot.subplot(2, 2, index+1)
    plot.plot(x,function)
    plot.title(titles[index],fontsize=16)

  functions = [np.sin(x), np.cos(x), np.exp(x), np.log(x)]


NameError: name 'plot' is not defined

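The cell raises NameError because the name plot is never defined: matplotlib.pyplot was not imported. Adding import matplotlib.pyplot as plot at the top of the cell fixes it. Note also that x starts at 0, so np.log(x) evaluates to -inf for the first sample and emits a RuntimeWarning; starting the range at a small positive value avoids that.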
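A corrected version of the plotting cell might look like the sketch below. It assumes matplotlib is installed, and the lower bound of 0.1 is an arbitrary choice made here so that the logarithm stays finite; it is not part of the original cell.

```python
import numpy as np
import matplotlib.pyplot as plot  # the import the original cell was missing

# Start just above 0 (an arbitrary choice for this sketch) so that np.log
# never receives 0, which would evaluate to -inf and emit a RuntimeWarning.
x = np.linspace(0.1, 100, 111)

functions = [np.sin(x), np.cos(x), np.exp(x), np.log(x)]
titles = ['Sine function', 'Cosine function',
          'Exponential function', 'Logarithmic function']

# Draw the four curves on a 2 x 2 grid of subplots.
fig = plot.figure(figsize=(11, 11))
for index, values in enumerate(functions):
    plot.subplot(2, 2, index + 1)
    plot.plot(x, values)
    plot.title(titles[index], fontsize=16)
```

Importing the module under the conventional alias `plt` would work equally well, as long as the same name is used in the calls.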

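np.linspace is the right choice when the number of samples matters more than the step size, for example when building an x-axis for a plot. Because it includes the stop value by default, pass endpoint=False or use np.arange when a half-open interval is needed.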

In [17]:
import jovian

In [19]:
jovian.commit('Assingment-3')

[jovian] Updating notebook "singhabhishek/numpy-array-operations" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/singhabhishek/numpy-array-operations


'https://jovian.ai/singhabhishek/numpy-array-operations'

## Function 2 - np.digitize 

You may never have heard of this function, but it is useful for discretizing continuous spaces, for example in reinforcement learning. The numpy.digitize(x, bins, right=False) function takes an input array x and an array of bin edges bins, and returns the index of the bin to which each value in x belongs. With the default right=False and increasing bins, the returned index i satisfies bins[i-1] <= x < bins[i].

In [20]:
# Example 1 - working
# Input array
x = np.array([0.5])

# Bins - the 4 edges define 5 bins (indices 0 to 4)
bins = np.array([0,1,2,3])


In [21]:
# Digitize function - 0.5 belongs to the bin 0 <= 0.5 < 1, so index 1 is returned
np.digitize(x,bins)

array([1])

In [22]:
# Example 2 - working
# The input array can contain several inputs
x = np.array([-0.5,1,3.5])

# Digitize function
np.digitize(x,bins)

array([0, 2, 4])

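Each input gets its own bin index: -0.5 lies below the first edge and maps to 0, 1 lies in the bin 1 <= x < 2 and maps to 2, and 3.5 lies at or beyond the last edge and maps to 4.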
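Which side of a bin is closed depends on the `right` parameter. The sketch below reuses the same inputs to compare the default behaviour with `right=True`; only the value that sits exactly on an edge changes bins.

```python
import numpy as np

bins = np.array([0, 1, 2, 3])
x = np.array([-0.5, 1, 3.5])

# Default: bins[i-1] <= x < bins[i], so the edge value 1 lands in bin 2.
left_closed = np.digitize(x, bins)

# right=True: bins[i-1] < x <= bins[i], so the edge value 1 lands in bin 1.
right_closed = np.digitize(x, bins, right=True)

print(left_closed)   # [0 2 4]
print(right_closed)  # [0 1 4]
```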

In [23]:
# Example 3 - breaking (to illustrate when it breaks)
def discretize(location, grid):
    return tuple(int(np.digitize(l, g)) for l, g in zip(location, grid))

# grid - bins - values below 1 fall in bin 0 and values of 4 or more fall in bin 4
grid = [np.array([1,2,3,4]),np.array([1,2,3,4])]

In [24]:
location =[2.5,1.2]
print(discretize(location,grid))

(2, 1)


In [25]:
location =[4.5,0.2]
print(discretize(location,grid)

SyntaxError: unexpected EOF while parsing (180117526.py, line 2)

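The cell raises SyntaxError because the print call is missing its closing parenthesis. With print(discretize(location, grid)) the cell runs and prints (4, 0): 4.5 lies at or beyond the last edge, so it falls in bin 4, and 0.2 lies below the first edge, so it falls in bin 0.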
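For reference, here is the failing cell with the parenthesis restored, written as a self-contained sketch. The `discretize` helper and the grid are taken from the cells above; the variable `result` is added only for this example.

```python
import numpy as np

def discretize(location, grid):
    # Map each coordinate to the index of the bin it falls into.
    return tuple(int(np.digitize(l, g)) for l, g in zip(location, grid))

# One array of bin edges per coordinate.
grid = [np.array([1, 2, 3, 4]), np.array([1, 2, 3, 4])]

location = [4.5, 0.2]
result = discretize(location, grid)
print(result)  # (4, 0)
```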

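np.digitize is useful whenever continuous values have to be mapped to discrete bin indices, such as discretizing a continuous state space or grouping measurements into ranges. Keep in mind that values below the first edge map to 0 and values at or beyond the last edge map to len(bins).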

In [26]:
jovian.commit()

[jovian] Updating notebook "singhabhishek/numpy-array-operations" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/singhabhishek/numpy-array-operations


'https://jovian.ai/singhabhishek/numpy-array-operations'

## Function 3 - np.repeat

The numpy.repeat(a, repeats, axis=None) function repeats the elements of an array. The second argument, repeats, sets the number of repetitions. With the default axis=None the input is flattened and a flat array is returned.

In [27]:
# Example 1 - working
np.repeat(3,5)

array([3, 3, 3, 3, 3])

Repeating the scalar 3 five times returns a one-dimensional array containing five 3s.

In [28]:
# Example 2 - working
np.repeat('2015',5)

array(['2015', '2015', '2015', '2015', '2015'], dtype='<U4')

np.repeat works with strings as well. The dtype '<U4' indicates Unicode strings of up to four characters.
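The `repeats` argument can also be a sequence with one count per element, and `axis` controls the direction of repetition for multi-dimensional input. The sketch below shows both; the arrays are made up for illustration.

```python
import numpy as np

# One repeat count per element: 1 once, 2 twice, 3 three times.
per_element = np.repeat([1, 2, 3], [1, 2, 3])
print(per_element)  # [1 2 2 3 3 3]

matrix = np.array([[1, 2], [3, 4]])

# axis=0 repeats whole rows.
rows_twice = np.repeat(matrix, 2, axis=0)
print(rows_twice.tolist())  # [[1, 2], [1, 2], [3, 4], [3, 4]]

# Without axis, the matrix is flattened before repeating.
flat = np.repeat(matrix, 2)
print(flat)  # [1 1 2 2 3 3 4 4]
```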

In [33]:
# Example 3 - breaking (to illustrate when it breaks)
# sales 2017
sales_2017 = pd.DataFrame([['chair',20],['sofa',24],['table',15]],columns=['product','sales_units'])

# sales 2018
sales_2018 = pd.DataFrame([['chair',25],['sofa',10],['shelf',10]],columns=['product','sales_units'])
# add year column in data frame 2017
sales_2017['year'] = np.repeat(2017,sales_2017.shape[0])

# add year column in data frame 2018
sales_2018['year'] = np.repeat(2018,sales_2018.shape[0])

sales = pd.concat([sales_2017,sales_2018], ignore_index=True)

sales

NameError: name 'pd' is not defined

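The cell raises NameError because pandas was never imported, so the name pd is undefined. Adding import pandas as pd before the cell fixes it; np.repeat then fills the year column with one value per row of each data frame.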
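With the import in place, the sales example runs as intended. The sketch below assumes pandas is installed and otherwise reuses the data from the cell above.

```python
import numpy as np
import pandas as pd  # the import the original cell was missing

sales_2017 = pd.DataFrame([['chair', 20], ['sofa', 24], ['table', 15]],
                          columns=['product', 'sales_units'])
sales_2018 = pd.DataFrame([['chair', 25], ['sofa', 10], ['shelf', 10]],
                          columns=['product', 'sales_units'])

# One year value per row, so the new column matches the frame's length.
sales_2017['year'] = np.repeat(2017, sales_2017.shape[0])
sales_2018['year'] = np.repeat(2018, sales_2018.shape[0])

# Stack both years into a single frame with a fresh 0..5 index.
sales = pd.concat([sales_2017, sales_2018], ignore_index=True)
print(sales)
```

The combined frame has six rows and three columns, with the year column marking which source frame each row came from.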

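np.repeat is handy for building constant or labelled columns and for expanding arrays element by element. When the same value goes into every row of a DataFrame column, plain scalar assignment such as sales_2017['year'] = 2017 gives the same result.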

In [26]:
jovian.commit()

[jovian] Attempting to save notebook..
[jovian] Updating notebook "aakashns/numpy-array-operations" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/aakashns/numpy-array-operations


'https://jovian.ml/aakashns/numpy-array-operations'

## Function 4 - np.squeeze

The numpy.squeeze(a, axis=None) function removes axes of length one from an array. By default every length-one axis is removed; the axis argument restricts the operation to the given axis or axes, each of which must have length one. This is useful when data arrives with extra singleton dimensions, for example a single channel or a batch of one.

In [34]:
# Example 1 - working
# numpy array of shape (1,2,1)

arr = np.array([[[1],[1]]])

arr.shape

(1, 2, 1)

The nested list builds a three-dimensional array of shape (1, 2, 1): one block containing two rows of one element each.

In [35]:
# Example 2 - working
# squeeze out axis 2
arr2 = np.squeeze(arr,axis=2)
arr2
#array([[1, 1]])
arr2.shape

(1, 2)

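Squeezing axis 2 removes the trailing length-one axis and leaves shape (1, 2). Axis 0 also has length one, but it is kept because only axis 2 was requested.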

In [36]:
# Example 3 - breaking (to illustrate when it breaks)
import torch
import numpy as np
from torchvision import datasets
import torchvision.transforms as transforms

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# load the mnist datasets
data = datasets.MNIST(root='data',download=True, transform=transform)

# create an iterable over the dataset
loader = torch.utils.data.DataLoader(data, batch_size=20)

# take the first batch from the loader
dataiter = iter(loader)
images, labels = next(dataiter)

# shape is (batch size 20, 1 channel, height 28 pixels, width 28 pixels)
images.shape

torch.Size([20, 1, 28, 28])

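This cell does not actually fail, and it never calls np.squeeze: it only loads one batch of MNIST images, whose shape is (20, 1, 28, 28). The channel axis has length one, so np.squeeze(images.numpy(), axis=1) would reduce the shape to (20, 28, 28). np.squeeze does fail when the requested axis does not have length one, in which case it raises ValueError.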
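The squeeze step itself can be tried without downloading MNIST. The sketch below uses an array of zeros as a stand-in for the image batch (an assumption made for this example, matching only the shape) and shows both the working call and the failing one.

```python
import numpy as np

# Stand-in for the MNIST batch: 20 images, 1 channel, 28 x 28 pixels.
images = np.zeros((20, 1, 28, 28), dtype=np.float32)

# The channel axis has length 1, so it can be squeezed out.
squeezed = np.squeeze(images, axis=1)
print(squeezed.shape)  # (20, 28, 28)

# The batch axis has length 20, so squeezing it raises ValueError.
raised = False
try:
    np.squeeze(images, axis=0)
except ValueError as error:
    raised = True
    print("ValueError:", error)
```

Passing `axis` explicitly is a deliberate choice here: a bare `np.squeeze(images)` would also silently drop the batch axis whenever the batch size happens to be 1.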

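Use np.squeeze to remove length-one axes, such as a single channel or batch dimension, before passing data to code that expects fewer dimensions. Passing axis explicitly makes the intent clear and raises an error if that axis does not have length one.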

In [37]:
jovian.commit()

[jovian] Updating notebook "singhabhishek/numpy-array-operations" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/singhabhishek/numpy-array-operations


'https://jovian.ai/singhabhishek/numpy-array-operations'

## Function 5 - np.random

The numpy.random.randint(low, high=None, size=None, dtype=int) function returns random integers from the half-open interval [low, high). If high is omitted (None), the numbers are drawn from [0, low). By default a single random integer is returned. To generate an array of random integers, pass the shape of the array in the size parameter.

In [38]:
# Example 1 - working-- "numpy.random.randint"
# roll a dice 10 times
np.random.randint(1,7,size=10)

array([3, 1, 5, 1, 6, 6, 2, 1, 2, 5])

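Each roll is drawn uniformly from the integers 1 to 6: low=1 is included and high=7 is excluded. The values differ on every run because no seed is set.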

In [39]:
# Example 2 - working--" numpy.random.choice"
# roll a dice 10 times
np.random.choice([1,2,3,4,5,6],size=10)

array([2, 4, 2, 5, 5, 5, 2, 5, 3, 5])

The numpy.random.choice(a, size=None, replace=True, p=None) returns a random sample from a given array. By default, a single value is returned. To return more elements, the output shape can be specified in the parameter size as we did before with the numpy.random.randint function.
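The `p` parameter of `choice` sets a probability for each entry, which makes it possible to simulate a loaded die. The sketch below uses NumPy's newer `Generator` interface (`np.random.default_rng`) rather than the legacy functions shown above, so that a seed can make the rolls repeatable; the seed value and the probabilities are arbitrary choices for illustration.

```python
import numpy as np

# A seeded generator returns the same sequence on every run.
rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily

# Fair die: integers from 1 (inclusive) to 7 (exclusive).
fair_rolls = rng.integers(1, 7, size=10)

# Loaded die: a six comes up half of the time (made-up probabilities).
faces = [1, 2, 3, 4, 5, 6]
loaded_rolls = rng.choice(faces, size=10, p=[0.1, 0.1, 0.1, 0.1, 0.1, 0.5])

print(fair_rolls)
print(loaded_rolls)
```

The probabilities in `p` must sum to 1 and there must be one per entry of the input.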

In [40]:
# Example 3 - working -- "numpy.random.binomial"
# we can approximate probabilities by simulating a large number of flips
# probability of obtaining 4 heads in 10 flips
flips = np.random.binomial(10,0.5,size=int(1e6))
(flips==4).mean()

0.204806

We can simulate a wide variety of statistical distributions with NumPy, such as the normal, beta, binomial, uniform, gamma, and Poisson distributions.
The numpy.random.binomial(n, p, size=None) function draws samples from a binomial distribution. The binomial distribution applies when each trial has two mutually exclusive outcomes; it gives the number of successes in n trials, each with success probability p.
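The simulated value can be checked against the exact binomial probability, C(n, k) · p^k · (1 − p)^(n − k). The sketch below does this for 4 heads in 10 flips; it uses a seeded `np.random.default_rng` generator (an assumption of this example, chosen for repeatability) instead of the legacy `np.random.binomial` call.

```python
import math
import numpy as np

n, p, k = 10, 0.5, 4

# Exact probability of k successes in n trials: 210 / 1024.
exact = math.comb(n, k) * p**k * (1 - p)**(n - k)
print(exact)  # 0.205078125

# Monte Carlo estimate from one million simulated experiments.
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily
flips = rng.binomial(n, p, size=1_000_000)
estimate = (flips == k).mean()
print(estimate)
```

With a million samples the estimate typically agrees with the exact value to about three decimal places, which is consistent with the 0.204806 obtained above.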

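The np.random module is useful for simulations, sampling, and generating test data. Since the results change on every run, set a seed when the output needs to be reproducible.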

In [41]:
jovian.commit()

[jovian] Updating notebook "singhabhishek/numpy-array-operations" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/singhabhishek/numpy-array-operations


'https://jovian.ai/singhabhishek/numpy-array-operations'

## Conclusion

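This notebook covered five NumPy tools: np.linspace for evenly spaced samples, np.digitize for assigning values to bins, np.repeat for repeating elements, np.squeeze for removing length-one axes, and the np.random module for random sampling. A good next step is to browse the NumPy routines reference and try further functions on your own data.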

## Reference Links
References and further reading on NumPy arrays:
* NumPy official tutorial: https://numpy.org/doc/stable/user/quickstart.html
* ...

In [None]:
jovian.commit()
