# 5 data analysis functions in Numpy


The functions below are part of the NumPy library and can be used individually or in combination to perform data analysis
in Python.

- function 1 : np.linspace()
- function 2 : np.meshgrid()
- function 3 : np.linalg.eig()
- function 4 : np.put_along_axis()
- function 5 : np.nonzero()

The recommended way to run this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks.

In [1]:
!pip install jovian --upgrade -q

In [2]:
import jovian

In [None]:
jovian.commit(project='numpy-array-operations')



Let's begin by importing Numpy and listing out the functions covered in this notebook.

In [None]:
import numpy as np

In [None]:
# List of functions explained 
'''
function1 = np.linspace()
function2 = np.meshgrid()
function3 = np.linalg.eig()
function4 = np.put_along_axis()
function5 = np.nonzero()
'''

## Function 1 - np.linspace() 

The first function on this list creates an array of evenly spaced samples between a designated start and stop point. It also takes a "num" argument, which sets the number of samples to generate.

General format: np.linspace(start, stop, num)

In [None]:
# Example 1
# We will create an array that collects 5 samples between 1 and 5 using the np.linspace() function 

arr1 = np.linspace(1,5,5)

# To create a two dimensional array we can use the following general format.
# linspace((start1,start2),(end1, end2), num)

arr2 = np.linspace((1,2),(10,20),10)

print('arr1: ')
print(arr1)
print('arr2: ')
print(arr2)

In arr2 the first column runs from start1 (1) to end1 (10) and the second column from start2 (2) to end2 (20).

The linspace function is similar to the arange function, which creates an array using a given start, stop and interval amount. The difference is that with linspace you do not provide an interval amount but the number of evenly separated samples between two precise points.
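As a small side-by-side sketch of that difference (the variable names `by_step` and `by_count` are made up for illustration), the same range can be built either by fixing the interval or by fixing the number of samples:

```python
import numpy as np

# arange: we choose the interval (1); the stop value (5) is excluded
by_step = np.arange(1, 5, 1)

# linspace: we choose the number of samples (5); the stop value is included by default
by_count = np.linspace(1, 5, 5)

print(by_step)   # [1 2 3 4]
print(by_count)  # [1. 2. 3. 4. 5.]
```

Note that arange keeps the integer type of its inputs here, while linspace returns floats.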

In [None]:
# Example 2 - working
#We will create an array that collects 5 samples between 1 and 4 but with data type set to float.

arr3 = np.linspace(1,4,5, dtype=float)

#The function has another optional parameter called endpoint, which is set to True by default.
#If we change it to False, the samples run up to but do not include the end value provided.

arr4 = np.linspace(1,4,5, endpoint=False, dtype=float)

print('arr3: ')
print(arr3)
print('arr4: ')
print(arr4)

Notice that arr3 and arr4 do not generate the same sample values as arr3 includes the endpoint and arr4 does not.
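One way to see why the values differ is to ask linspace for the interval it used. The sketch below relies on the optional `retstep` parameter, which is not used elsewhere in this notebook; the variable names are illustrative only.

```python
import numpy as np

# retstep=True makes linspace return a (samples, step) pair
with_end, step_with_end = np.linspace(1, 4, 5, retstep=True)
without_end, step_without_end = np.linspace(1, 4, 5, endpoint=False, retstep=True)

# With the endpoint, the range 1..4 is split into 4 intervals;
# without it, the same range is split into 5 intervals.
print(step_with_end)
print(step_without_end)
```

The first step should come out as 0.75 and the second as approximately 0.6.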

In [None]:
# Example 3 - breaking 

arr5 = np.linspace((1,2,3),(10,20),10)

print(arr5)

The above raises a ValueError because the start tuple has three values but the stop tuple has only two, so the two cannot be broadcast together. To fix this we would need to add a third value to the second parameter (10,20,X).
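A minimal sketch of the fix, assuming an arbitrary third end value of 30 (the original example does not say what X should be):

```python
import numpy as np

# start and stop now both have three entries, so each column gets its own range.
# The third end value (30) is an arbitrary choice for illustration.
arr5_fixed = np.linspace((1, 2, 3), (10, 20, 30), 10)

print(arr5_fixed.shape)  # (10, 3)
print(arr5_fixed)
```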

This function can be used alongside the arange function. In cases where the interval between values in an array is most important, the arange function is the better choice. When the exact endpoint and number of samples matter more, the linspace function makes more sense.


## Function 2 - np.meshgrid()

The numpy meshgrid() function will convert vectors into matrices (or grids). It does this by default using cartesian indexing with an x and y 'axis'. The function takes the data that is given and converts it to a grid with the 'x' values across the columns (x-axis) and a second grid with the 'y' values across the rows (y-axis).

In [None]:
# Example 1 - working

# two one-dimensional arrays

x = np.array([1,2,3,4])
y = np.array([10,20,30,40,50])

x_1, y_1 = np.meshgrid(x,y)

print('x_1:')
print(x_1)
print('y_1:')
print(y_1)

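Both x_1 and y_1 have the same shape, 5x4: one row for each value in y (5) and one column for each value in x (4). x_1 repeats the x values along every row, and y_1 repeats the y values down every column.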
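The shapes can be checked directly. The sketch below also shows the optional `indexing='ij'` setting, which is not used elsewhere in this notebook and swaps the grid to matrix-style (row, column) ordering; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40, 50])

# Default cartesian indexing ('xy'): rows follow y, columns follow x
gx, gy = np.meshgrid(x, y)
print(gx.shape, gy.shape)  # (5, 4) (5, 4)

# Matrix indexing ('ij'): rows follow the first input, columns the second
gi, gj = np.meshgrid(x, y, indexing='ij')
print(gi.shape, gj.shape)  # (4, 5) (4, 5)
```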

In [None]:
# Example 2 - working
# Using 3 arrays to create 3 grids

a = np.array([1,2,3])
b = np.array([4,3,7])
c = np.array([5,6,7])

xa, xb, xc = np.meshgrid(a,b,c)

print('xa:')
print(xa)
print('xb:')
print(xb)
print('xc:')
print(xc)

For n number of arrays the meshgrid() function will generate n number of corresponding grids.

In [None]:
# Example 3 - breaking 

#****THE BELOW WILL TAKE FOREVER TO RUN****

x2 = np.linspace(10,100,9999999)

y2 = np.array([10,20,30,40,50])

x_2, y_2 = np.meshgrid(x2,y2)

print('x_2:')
print(x_2)
print('y_2:')
print(y_2)

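This call is very slow and can fail with a memory error: each dense grid holds 5 x 9,999,999 (about 50 million) values, which at 8 bytes per value is roughly 400 MB per grid, and printing the grids adds further overhead. The solution is to pass the sparse parameter (sparse=True), which returns x_2 as a single row and y_2 as a single column instead of two full grids, cutting the memory needed to a fraction.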
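A hedged sketch of the sparse approach, using a much smaller x2 (1,000 samples, an arbitrary choice) so that it runs quickly:

```python
import numpy as np

x2 = np.linspace(10, 100, 1000)  # far smaller than the example above
y2 = np.array([10, 20, 30, 40, 50])

# sparse=True returns one row for x and one column for y, not two full grids
xs, ys = np.meshgrid(x2, y2, sparse=True)
print(xs.shape, ys.shape)  # (1, 1000) (5, 1)

# Broadcasting expands the sparse grids only when they are combined
total = xs + ys
print(total.shape)  # (5, 1000)
```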

Numpy's meshgrid() provides a quick and easy way to display data in a grid format with a lot of flexibility. There are many practical uses for evaluating functions on a grid and plotting the results.
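As one illustration of evaluating a function on a grid, the sketch below computes x² + y² at every grid point. The function and the 3x3 grid are arbitrary choices for the example, and plotting is left out.

```python
import numpy as np

x = np.linspace(-1, 1, 3)  # [-1.  0.  1.]
y = np.linspace(-1, 1, 3)

gx, gy = np.meshgrid(x, y)

# Element-wise arithmetic evaluates the function at every (x, y) pair at once
z = gx**2 + gy**2
print(z)
```

The centre of the grid (x = 0, y = 0) gives 0 and the four corners give 2.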


## Function 3 - linalg.eig()

The linalg.eig() function takes a square array (1x1, 2x2, 3x3, etc.) as input and computes its eigenvalues and eigenvectors.

In [None]:
# Example 1 - working

# Create a square array to use in the function

arr1 = np.array([[1,2,3],
                 [3,4,6],
                 [5,8,9]])
    
np.linalg.eig(arr1)

The function returns two arrays. The first holds the eigenvalues of the matrix (three for this 3x3 input). The second holds the eigenvectors as columns, so that column i corresponds to eigenvalue i.
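The result can be sanity-checked against the defining relation A v = λ v. This is a minimal sketch; the names `A`, `eigenvalues` and `eigenvectors` are chosen here for readability.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [3, 4, 6],
              [5, 8, 9]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column of `eigenvectors` pairs with the eigenvalue at the same position
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    # A @ v should equal the eigenvalue times v, up to floating-point error
    print(np.allclose(A @ v, eigenvalues[i] * v))
```

Each line should print True.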

In [None]:
# Example 2 - working

# Using the identity matrix to verify this

arr2 = np.array([[1,0,0],
               [0,1,0],
               [0,0,1]])

np.linalg.eig(arr2)

The identity matrix has eigenvalues of 1,1,1 and, as expected, returns the same matrix for its eigenvectors.

In [None]:
# Example 3 - breaking (to illustrate when it breaks)

arr3 = np.array([[1,2,3],
                 [2,8,14],
                 [1,5,7],
                 [3,9,27]])

np.linalg.eig(arr3)


The arr3 that we created cannot be used because it has a 4x3 shape (4 rows, 3 columns), and the eig() function only accepts square arrays, so it raises a LinAlgError. We must remove a row to compute this.
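A sketch of that fix, catching the error first and then dropping the last row. Which row to remove is an arbitrary choice here; the original example does not specify one.

```python
import numpy as np

arr3 = np.array([[1, 2, 3],
                 [2, 8, 14],
                 [1, 5, 7],
                 [3, 9, 27]])

error_message = None
try:
    np.linalg.eig(arr3)
except np.linalg.LinAlgError as err:
    # Non-square input is rejected before any computation happens
    error_message = str(err)
    print('eig failed:', error_message)

# Keep only the first three rows to get a square 3x3 array
square = arr3[:3]
values, vectors = np.linalg.eig(square)
print(values.shape, vectors.shape)  # (3,) (3, 3)
```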

Eigenvalues and eigenvectors are tedious to compute by hand, but this function greatly simplifies the process. An eigenvector is a direction that the matrix only scales, and its eigenvalue is the scale factor, which makes these values useful in many linear algebra problems for reducing a matrix operation to simple scalar multiplication.


## Function 4 - put_along_axis()

The numpy put_along_axis() function will allow you to replace values in an array at precise axis points.

General format: put_along_axis(destination_array, indices_to_replace, replace_value, axis_to_replace_from)

In [None]:
# Example 1 - working
# We will create a two-dimensional 3x3 array 'x'.
# Using the numpy put_along_axis() function, we will replace the values in x at the index values 0 for the 1st row, 
# 1 for the 2nd row and 2 for the 3rd row, all with the value 99.

import numpy as np

x = np.array([[10,20,30],[30,40,50],[50,60,70]])

print('Before:')
print(x)

np.put_along_axis(x, np.array([[0],[1],[2]]), 99, axis = 1)

print('After:')
print(x)

The (3x3) array x began without the value 99 and had it inserted on all 3 rows.

In [None]:
# Example 2 - working
# Now, we will create arrays a, index and replace. Using the same function, we will replace values of 'a' at the index values
# from 'index', with the values from 'replace'.

a = np.array([[1,2,3,4],[3,4,5,5],[1,5,6,7]])

print('Before:')
print(a)

index_arr = np.array([0,1,0], dtype=int)
replace_arr = np.array([100,99,101])

np.put_along_axis(a, index_arr[:, None], replace_arr[:, None], axis = 1)

print('After:')
print(a)

The 'replace_arr' values were inserted into the 3 rows of 'a' at the positions given by 'index_arr'.

In [None]:
# Example 3 - breaking (to illustrate when it breaks)

w = np.array([[1,1,3],[2,2,2],[3,3,3]])

print('Before:')
print(w)

index_arr = np.array([0,0,0,0])
replace_arr = np.array([100,99,101])

np.put_along_axis(w, index_arr[:, None], replace_arr[:, None], axis = 1)

print('After:')
print(w)

In this example 'index_arr' holds four indices, but 'w' has only 3 rows, so the shapes cannot be matched and the call raises an error. We would have to remove one value from 'index_arr' or add another row to 'w' (along with a fourth value in 'replace_arr').
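A minimal sketch of the first option, trimming the index array to one entry per row:

```python
import numpy as np

w = np.array([[1, 1, 3], [2, 2, 2], [3, 3, 3]])

index_arr = np.array([0, 0, 0])        # one index per row of w
replace_arr = np.array([100, 99, 101])

# Both helper arrays are reshaped to (3, 1) so they line up with the rows of w
np.put_along_axis(w, index_arr[:, None], replace_arr[:, None], axis=1)

print(w)
```

The first column of w is now 100, 99 and 101, and the other columns are unchanged.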

This function has many uses for correcting data in an array and has the flexibility to do it on a large scale.
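One common pattern, shown here as an illustrative sketch with made-up data, is to compute the indices with np.argmax() and then overwrite exactly those positions, for example to zero out the largest value in each row:

```python
import numpy as np

# Hypothetical data for illustration
scores = np.array([[1, 9, 3],
                   [7, 2, 5],
                   [4, 6, 8]])

# Position of the largest value in each row
max_idx = np.argmax(scores, axis=1)

# Replace each row's maximum with 0 (the array is modified in place)
np.put_along_axis(scores, max_idx[:, None], 0, axis=1)

print(scores)
```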


## Function 5 - nonzero()

This function simply returns the indices of the non-zero elements in an array, but it has many interesting applications.

General format: nonzero(array)

In [None]:
# Example 1 - working
# Getting indices of non-zero elements in basic 2x3 array

import numpy as np

a = np.array([[0,2,3],[4,0,6]])

np.nonzero(a)

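The function returns a tuple of two arrays: the first holds the row indices and the second holds the column indices of the non-zero values. Read together, they give the coordinates:

- (0,1) = 2
- (0,2) = 3
- (1,0) = 4
- (1,2) = 6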
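The coordinate pairs can also be produced in code. The sketch below pairs the two index arrays with zip(), and also shows np.argwhere(), a related NumPy function not covered elsewhere in this notebook that returns one (row, column) pair per line.

```python
import numpy as np

a = np.array([[0, 2, 3], [4, 0, 6]])

rows, cols = np.nonzero(a)

# Pair up the row and column indices to recover each coordinate
for r, c in zip(rows, cols):
    print((int(r), int(c)), '=', int(a[r, c]))

# argwhere returns the same coordinates as a 2-D array
coords = np.argwhere(a)
print(coords)
```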

In [None]:
# Example 2 - working
# Getting just non-zero elements from an array

b = np.array([1, np.nan, 0, 2, 0, 3, 4])
print('Original:')
print(b)

print('Non-zero:')
print(b[np.nonzero(b)])

The function returns the values that are not 0 in the array, including NaN, which does not compare equal to zero and therefore counts as non-zero.
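If NaN values should be dropped as well, one possible approach (an assumption about what the analysis needs, not part of the original example) is to pass nonzero() a boolean condition instead of the raw array:

```python
import numpy as np

b = np.array([1, np.nan, 0, 2, 0, 3, 4])

# True only where the value is neither zero nor NaN
mask = (b != 0) & ~np.isnan(b)

# nonzero() treats True as non-zero, so it returns the indices where mask holds
keep = np.nonzero(mask)

print(b[keep])  # [1. 2. 3. 4.]
```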

In [None]:
# Example 3 - breaking (to illustrate when it breaks)

c = np.array([[0,0,0,0],[1,2,3],[0,5]])
print(c)
print(c[np.nonzero(c)])

In older NumPy versions this code ran and produced an output, with only a deprecation warning about the varying lengths of the rows in the array. From NumPy 1.24 onward, creating such a ragged array raises a ValueError instead. To fix this, we would even each row out to have 4 elements.
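A sketch of evening out the rows, assuming that padding the short rows with zeros is acceptable for the data at hand:

```python
import numpy as np

ragged_rows = [[0, 0, 0, 0], [1, 2, 3], [0, 5]]

# Pad every row with zeros up to the length of the longest row
width = max(len(row) for row in ragged_rows)
padded = np.array([row + [0] * (width - len(row)) for row in ragged_rows])

print(padded)
print(padded[np.nonzero(padded)])  # [1 2 3 5]
```

Because the padding value is zero, it does not show up in the non-zero results.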

This function can be used in many different ways, including correction and analysis of values in a dataset.


## Conclusion

These functions have the potential to simplify and/or correct data for further analysis, to support statistics-based decisions, or to solve complicated problems. I encourage anyone interested in data science to become familiar with these 5 functions, as they will surely come up again in your career!

## Reference Links
References and other interesting articles about NumPy arrays:
* Numpy official tutorial : https://numpy.org/doc/stable/user/quickstart.html
* https://numpy.org/doc/stable/reference/generated/numpy.nonzero.html#numpy.nonzero
* https://docs.w3cub.com/numpy~1.17/generated/numpy.put_along_axis/
* https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html
* https://www.geeksforgeeks.org/numpy-meshgrid-function/
* https://www.sharpsightlabs.com/blog/numpy-linspace/
