# UMD FIRE Stream- Genome Computing III
## Elements of Linear Algebra

Here you will learn about working with vectors and matrices with the `numpy` module.

Linear algebra is not only essential in computational work in general; it is also important in Genome Computing.

This notebook may take a bit of time for those who are unfamiliar with vectors. I encourage you to check out the numpy documentation at https://numpy.org/doc/ for the version of NumPy you have installed.
<li>- for the 'dnalab' environment you should refer to Version 1.23 (as of Jan 2023)</li>

Here you will learn about
<li>- arrays, rows, and columns</li>
<li>- filtering data based on logical operators</li>
<li>- making a figure with a "meshgrid"</li>
<li>- and more!</li>


# `numpy`

`numpy` is another very popular package in the world of data science and machine learning. It is short for Numerical Python, and is one of the most important foundational packages for numerical computing in `python`.

Let's start off by importing the package.

In [None]:
# It is standard to import numpy as np

import numpy as np

print(np.__version__)


### `numpy`'s ndarray

The `numpy` data object is known as an `ndarray`. It is similar to a list ...

In [None]:
# You can make an array with np.array

array1 = np.array([1,2,3,4])

print(array1)
print()
print(type(array1))


... but it can have any <b>finite</b> number of dimensions.

In [None]:
array2 = np.array([[1,2],[2,1]])

print(array2)
print()

# np.shape() will give you the dimensions of the array
# array2 is a 2 by 2 array

print("array2 is a",np.shape(array2),"ndarray")


In [None]:
array3 = np.array([[[1,2],[2,1]],[[2,3],[3,2]]])

print(array3)
print()
print("array3 is a",np.shape(array3),"ndarray")


We can keep going if we want to

In [None]:
## Practice
## Make a 4 dimensional array and store it in array4

array4 = np.array(
[[[[1,2], [1,2]], [[1,2], [1,2]]], 
 [[[1,2], [1,2]], [[1,2], [1,2]]],
 [[[1,2], [1,2]], [[1,2], [1,2]]]
])
    
print(array4)
print("array4 is a",np.shape(array4),"ndarray")
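
The information printed by `np.shape` is also available directly on the array itself. The sketch below uses a small made-up array called `demo` (not one of the arrays above) to show a few attributes that are often handy when checking what an ndarray looks like.

```python
import numpy as np

# A small 2 by 3 array used only for illustration
demo = np.array([[1, 2, 3], [4, 5, 6]])

print(demo.ndim)   # number of dimensions (axes)
print(demo.shape)  # length along each axis, same as np.shape(demo)
print(demo.size)   # total number of entries
print(demo.dtype)  # data type shared by every entry
```

Unlike a list, every entry of an ndarray has the same data type, which is part of what makes the array operations later in this notebook fast.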


In [None]:
# Delete the arrays you made above (save your memory)





#### Other Array Generators

As we've seen, you can make arrays out of lists and lists of lists. There are also a number of built-in array generators.

In [None]:
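# We can make an "empty" (uninitialized) array
# np.empty(array_shape) reserves memory without setting the values,
# so the entries you see are arbitrary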

np.empty((2,2,2))


In [None]:
# We can make an array of all ones
# np.ones(array_shape)

np.ones(12)


In [None]:
#  An array of zeros
# np.zeros(array_shape)

np.zeros((4,4))


In [None]:
# a 1d array for an interval
# np.linspace(start, end, number of entries)

np.linspace(0, 1, 10)


In [None]:
# the numpy version of range allows decimal step sizes
# np.arange(start, end, step_size); like range, end itself is not included

np.arange(0, 10, 0.1)
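
`np.linspace` and `np.arange` look similar but treat the end value differently. A minimal comparison on the interval from 0 to 1 (the names `points` and `steps` are just for this example):

```python
import numpy as np

# linspace takes the number of points and includes both endpoints
points = np.linspace(0, 1, 5)
print(points)   # 0, 0.25, 0.5, 0.75, 1

# arange takes a step size and stops before the end value
steps = np.arange(0, 1, 0.25)
print(steps)    # 0, 0.25, 0.5, 0.75
```

Use `np.linspace` when you know how many points you want, and `np.arange` when you know the spacing between them.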


In [None]:
# make a 2-dimensional grid

interval = np.linspace(-2,2,100)

print(interval)

print('*'*100)

# meshgrid takes in two intervals as arguments
# it outputs two arrays that give every x,y
# pair from the two intervals you enter

xs,ys = np.meshgrid(interval, interval)

# useful for calculations

z = xs**2 + ys**2  # the ** operator is an exponent, so **2 means 'squared'

print(z)
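
With 100 points per interval the output of `np.meshgrid` is hard to read. The sketch below repeats the idea on two tiny, made-up intervals (`small_x` and `small_y`) so the structure of the two output arrays is visible.

```python
import numpy as np

small_x = np.array([0, 1, 2])
small_y = np.array([10, 20])

grid_x, grid_y = np.meshgrid(small_x, small_y)

print(grid_x)        # every row is a copy of small_x
print(grid_y)        # every column is a copy of small_y
print(grid_x.shape)  # (2, 3): len(small_y) rows, len(small_x) columns
```

Reading the two arrays at the same position gives one (x, y) pair, so together they cover every combination of the two intervals.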

In [None]:
# We will now import a second module, `matplotlib`, that will allow us to draw figures

# A sample plot using meshgrid

import matplotlib.pyplot as plt

plt.figure(figsize=(10,10))

plt.imshow(z)

# Try a different color map called plasma: comment out the plt.imshow line above and uncomment the line below
#plt.imshow(z, cmap=plt.cm.plasma)

plt.colorbar()

plt.title("Image of $x^2 + y^2$ for a grid of values")

plt.show()


In [None]:
# Task: use np.linspace to make an interval from -5 to 5 and store it in x; repeat for 30.0 to 40.0 and store it in y; then print both x and y



#### Indexing or Slicing an ndarray

Just as with lists, pandas DataFrames, or any other data structure, you may want to access particular elements or subsets of an ndarray. Let's see how to do that below.

In [None]:
# a is a random 50 by 2 array with entries between -1 and 1
# more on random arrays in a bit

a = 2*np.random.random((50,2)) - 1

a


In [None]:
# I can get an individual entry like so
# the entry in row 1, column 1 (counting starts at 0)
print(a[1,1])
print()


# the entire row with index 4 (the fifth row, since counting starts at 0)
print(a[4,:])
print()


# The entire 0th column
# note that despite being a column it is returned as a 1d array
print(a[:,0])


In [None]:
# You can get a slice of a as well, such as rows 1 through 3 (the end index 4 is not included)

a[1:4,:]
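
Because `a` is random, it can be hard to check which entries a slice returned. Here is the same kind of indexing on a small array with known entries; `b` is a made-up array for this example only.

```python
import numpy as np

# A 4 by 3 array holding 0, 1, ..., 11, filled row by row
b = np.arange(12).reshape(4, 3)
print(b)

print(b[1, 2])     # row 1, column 2 -> 5
print(b[1:3, :])   # rows 1 and 2 (the end index 3 is excluded)
print(b[:, 0])     # column 0 as a 1d array -> [0 3 6 9]
print(b[-1, :])    # negative indices count from the end: the last row
```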


In [None]:
# You can index by logical conditions

# Every entry of a that is positive, returned as a 1d array
a[ a > 0 ]

In [None]:
# Only rows where column 0 is positive
# and column 1 is negative

a[ ( a[:,0] > 0 ) & ( a[:,1] < 0 ), :]
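
It can help to look at the condition by itself before using it as an index. In this sketch, `vals` and `mask` are illustrative names: the comparison produces one True/False value per row, and indexing with that array keeps the rows marked True.

```python
import numpy as np

vals = np.array([[ 0.5, -0.2],
                 [-0.3,  0.8],
                 [ 0.9, -0.7]])

# Each comparison gives a boolean array; & combines them entry by entry.
# The parentheses are needed because & binds more tightly than > and <.
mask = (vals[:, 0] > 0) & (vals[:, 1] < 0)

print(mask)            # [ True False  True]
print(vals[mask, :])   # rows 0 and 2 only
```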


In [None]:
## Task: find all rows where column 1 is greater than 0.25 or column 0 is less than 0.25




### Functions on ndarrays

ndarrays are nice because, unlike base `python` lists and arrays, `numpy` has been designed to allow very fast elementwise operations.

In [None]:
## Practice
## What happens here

[1,2] + [2,3]


In [None]:
## What about here?

np.array([1,2]) + np.array([2,3])


In [None]:
## What happens here

2 * [3,4]


In [None]:
## What about here?

2*np.array([3,4])


In [None]:
# Okay how about

y = 3*[1,2,3] + 2


In [None]:
# and this?

y = 3*np.array([1,2,3]) + 2

y
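
To make the contrast concrete, here is one way to get the same result from a plain list and from an ndarray. The names `values`, `from_list`, and `from_array` are only for this sketch.

```python
import numpy as np

values = [1, 2, 3]

# With a plain list, the arithmetic has to be written out element by element
from_list = [3 * v + 2 for v in values]

# With an ndarray, the same expression is applied to every entry at once
from_array = 3 * np.array(values) + 2

print(from_list)    # [5, 8, 11]
print(from_array)   # [ 5  8 11]
```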


In [None]:
# There are even a number of popular built-in math functions for ndarrays, such as absolute value
# This is the x from your earlier practice block

np.abs(x)


In [None]:
# ... and square root

np.sqrt(np.abs(x))


In [None]:
# ... and the floor function

np.floor(x)

In [None]:
## Practice
## using np.exp define y to be 
## e^(x+3) + log(|x|+1)





### Pseudorandom ndarrays and Stat Functions

`numpy` is useful for generating random numbers as well. We can look at common statistics of arrays too.

In [None]:
# random generators are stored in np.random
# np.random.random() gives a uniform random number in [0,1), so 1 itself is never returned

np.random.random()

# execute this block a few times to confirm random outputs

In [None]:
# np.random.randn() gives a draw from a normal(0,1) distribution
# a single draw

print(np.random.randn())
print()

# ... and for an array of draws:

np.random.randn(10,2)
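
The draws above change every time the cell runs. When you need results that can be repeated, one option is a seeded generator from `np.random.default_rng`, available in recent NumPy versions. This is a sketch of an alternative, not something the rest of the notebook depends on; the seed value 440 and the variable names are arbitrary.

```python
import numpy as np

# A generator created with a fixed seed produces the same draws on every run
rng = np.random.default_rng(440)

uniform_draws = rng.random(3)                 # uniform on [0, 1)
normal_draws = rng.standard_normal((2, 2))    # normal(0, 1), 2 by 2 array

print(uniform_draws)
print(normal_draws)

# Rebuilding the generator with the same seed repeats the same numbers
rng_again = np.random.default_rng(440)
print(np.array_equal(uniform_draws, rng_again.random(3)))   # True
```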

In [None]:
# Task: make a 20 by 3 array of random normal draws and call it X for items in the next block



In [None]:
# We can use np.mean to get the overall array mean
print(np.mean(X))
print()

# ... the column means (axis=0 averages over the rows, giving one value per column)
print(np.mean(X,axis=0))
print()

# ... or the row means (axis=1 averages over the columns; again output as a 1d array)
print(np.mean(X,axis=1))
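
The `axis` argument is easy to mix up, so here is a check on a small array whose means can be worked out by hand. `table` is a made-up array for this example.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(np.mean(table))           # 3.5, the mean of all six entries
print(np.mean(table, axis=0))   # [2.5 3.5 4.5], one mean per column
print(np.mean(table, axis=1))   # [2. 5.], one mean per row
```

A useful way to remember it: `axis` names the dimension that gets averaged away, so `axis=0` collapses the rows and leaves one value for each column.
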

### Linear Algebra with `numpy`

A final important use for us is `numpy` as a way to perform linear algebra calculations.

In [None]:
# We can think of a 2D array as a matrix

A = np.random.binomial(n=4,p=.3,size=(2,2))

A

In [None]:
# A 1d array can be a row vector

x = np.array([1,2])
x

In [None]:
# ... or a column vector

x.reshape(-1,1)


In [None]:
# We can now calculate the matrix-vector product Ax
# matrix.dot() is used for matrix multiplication
# (the * operator would multiply entry by entry instead)

A.dot(x.reshape(-1,1))
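
Since `A` is random, here is the same calculation on a fixed matrix so the result can be checked by hand. The names `M`, `v`, and `col` are illustrative. The sketch also shows the `@` operator, which performs the same matrix multiplication as `.dot()` for 2d arrays.

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])
v = np.array([1, 2])

col = v.reshape(-1, 1)   # 2 by 1 column vector

print(M.dot(col))   # [[ 5], [11]] since 1*1 + 2*2 = 5 and 3*1 + 4*2 = 11
print(M @ col)      # same result using the @ operator
print(M @ v)        # with a 1d vector the result is 1d: [ 5 11]
```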


In [None]:
# Task: take the matrix A from above and multiply it with a column vector of 1s



In [None]:
# Dr. Robert Young, University of Maryland
# In collaboration with Matthew Osborne, Erdos Institute
# UMD FIRE Genome Computing