# Introduction to NumPy

NumPy provides an efficient and convenient way to operate on in-memory data.

For the most part, a NumPy array can be thought of as an (advanced) alternative to Python's `list`.

In [1]:
some_list = list(range(1,31))

In [None]:
def last_five(l):
    # Return the last five elements using a negative slice
    return l[-5:]

## Advantages over list

Context: A list is flexible but it comes at a price.

In [None]:
vals = ['Hello', True, 1, 0.0]
print([type(v) for v in vals])

A list has to store the type and other bookkeeping information alongside the actual value of every member, so it takes up more memory and its operations are slower.

NumPy arrays, on the other hand, are ***homogeneous*** (i.e., they store only one type) and thus need to store that information only once for all values. So they are faster and more memory-efficient, at the cost of flexibility.

*Note: Image from [Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook)*

![img](numpy_array.png)

So, compared to a list, a NumPy array:

- Consumes less memory
- More efficient and faster
- Provides convenient operations
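
The memory claim can be checked with a rough sketch like the one below. It assumes CPython, where `sys.getsizeof` reports the size of a single object, so the list total is approximated by adding the size of the list itself to the size of each integer it points to. The array length `n = 1000` is an arbitrary choice for illustration.

```python
import sys
import numpy as np

n = 1000  # arbitrary size chosen for this illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # fixed-width 8-byte integers

# A list stores pointers; every element is a separate Python object
# that carries its own type information and reference count.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values in one contiguous buffer.
array_bytes = np_arr.nbytes

print("list :", list_bytes, "bytes")
print("array:", array_bytes, "bytes")
```

The exact list figure depends on the Python version and platform, but it should come out several times larger than the array's 8 bytes per element.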

In [None]:
import numpy as np

## Creating Arrays from Python list

In [None]:
a = [1, 2, 3]
np.array(a)

Q. Create a NumPy array from a 2D list.

In [None]:
# Your Code here

## Creating arrays from scratch

In [None]:
# Create a length-10 integer array filled with zeros.
np.zeros(10, dtype=int)

In [None]:
# Create a 3x5 floating-point array filled with 1s
np.ones((3, 5), dtype=float)

In [None]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

In [None]:
# Create an array filled with a linear sequence
# Starting at 0, stopping before 20 (the end is exclusive), stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

In [None]:
# Create a 3x3 identity matrix
np.eye(3)

### Getting Random

In [None]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

In [None]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))
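
The calls above use NumPy's legacy global random state. NumPy 1.17 and later also provide a generator object through `np.random.default_rng`. The sketch below shows roughly equivalent calls with that interface; the seed value `42` is an arbitrary choice, used only to make the draws repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

uniform = rng.random((3, 3))                 # floats in [0, 1)
integers = rng.integers(0, 10, size=(3, 3))  # integers in [0, 10)

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(uniform, rng_again.random((3, 3)))
print(same)  # True
```

Either interface works for the exercises here; the generator object simply keeps its state local instead of global.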

Q. Create a 1D NumPy array that contains the first 10 multiples of 4.

In [None]:
# Your Code Here

Q. Create a 1D NumPy array with 10 dice rolls (i.e., integers from 1 to 6, both inclusive).

Extra: Make the rolls favor getting a `6`! Hint: Use `np.random.choice`. See doc.

In [None]:
# Your Code Here

## NumPy Array Attributes

In [None]:
x = np.random.randint(10, size=(5,4)) # Two-dimensional array
print("x ndim: ", x.ndim)
print("x shape: ", x.shape)
print("x size: ", x.size)
print("dtype: ", x.dtype)
print("itemsize: ", x.itemsize, "bytes")
print("nbytes: ", x.nbytes, "bytes")
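
The attributes printed above are related to each other. As a small sketch, the example below fixes the dtype explicitly (`np.int32` is just a choice for illustration; the default integer dtype returned by `np.random.randint` can differ between platforms) and checks those relationships.

```python
import numpy as np

x = np.zeros((5, 4), dtype=np.int32)  # dtype fixed explicitly for the example

print(x.ndim)      # 2   number of axes
print(x.shape)     # (5, 4)   length along each axis
print(x.size)      # 20  total number of elements
print(x.itemsize)  # 4   bytes per element
print(x.nbytes)    # 80  total bytes of data

# The attributes are tied together:
assert x.ndim == len(x.shape)
assert x.size == x.shape[0] * x.shape[1]
assert x.nbytes == x.size * x.itemsize
```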

## Array Indexing

In [None]:
x = np.random.randint(10, size=10)
x

In [None]:
x[2]

In [None]:
x[-1]

In [None]:
x = np.random.randint(10, size=(2,3))
x

In [None]:
x[1, 2]

In [None]:
# Indexing can also be used to modify a value in place
x[0,0] = 1000
x

Q. What's different and why?

In [None]:
vals = [1, 2, 3]
np_vals = np.array(vals)
vals[0] = 10.39
np_vals[0] = 10.39
print(vals)
print(np_vals)

Q. Create a `(3,3)` array of random integers between 10 and 30 and set the first element of the last row to `0`.

Extra: Make all non-diagonal entries `0`. Hint: Make use of `np.eye`.

In [None]:
# Your Code Here

### Array Slicing

In [None]:
x = np.arange(1, 11)
x

In [None]:
# First Five
x[:5]

In [None]:
# Last Five
x[-5:]

In [None]:
# Middle section
x[3:7]

In [None]:
# Every Other element
x[::2]

In [None]:
# Every other element, starting at index 1 (the value 2)
x[1::2]

In [None]:
np.random.seed(42)
x = np.random.randint(10, size=(4,4))
x

In [None]:
# First two rows, first three columns
x[:2, :3]

In [None]:
# First column (from all rows)
x[:, 0]
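
One detail worth knowing about slices like the ones above: a basic slice of a NumPy array is a view onto the same memory, not a copy (unlike slicing a Python list). The sketch below uses its own small array to show the difference; the names `view` and `copy` are just illustrative.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

view = x[:2, :2]          # a view: shares memory with x
view[0, 0] = 99           # this also changes x[0, 0]

copy = x[:2, :2].copy()   # an independent copy
copy[1, 1] = -1           # x is unaffected

print(x[0, 0])  # 99
print(x[1, 1])  # 5
```

Call `.copy()` whenever a slice needs to be modified without touching the original array.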

Q. Slice out the two columns in the middle.

In [None]:
# Your Code Here

Q. Slice out the first three rows of the second column (i.e., `[3, 9, 4]`).

Extra: Preserve the 2D structure, i.e., `[[3], [9], [4]]`.

In [None]:
# Your Code Here

### Array Reshaping

In [None]:
x = np.arange(9)
print(x)
print(x.shape)

In [None]:
y = x.reshape((3,3))
print(y)
print(y.shape)
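
Reshaping only works when the new shape holds exactly as many elements as the original. The sketch below, using a separate 12-element array, shows an explicit shape, a shape with `-1` (which asks NumPy to infer that dimension), and what happens when the sizes do not match.

```python
import numpy as np

x = np.arange(12)

a = x.reshape((3, 4))    # explicit shape: 3 * 4 == 12
b = x.reshape((2, -1))   # -1 lets NumPy infer the missing dimension
print(a.shape)  # (3, 4)
print(b.shape)  # (2, 6)

# A shape whose product differs from x.size raises ValueError.
try:
    x.reshape((5, 3))
except ValueError as err:
    print("reshape failed:", err)
```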

### Array Concatenation

In [None]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.concatenate([x, y])
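
For arrays with more than one dimension, `np.concatenate` takes an `axis` argument that selects the direction in which the arrays are joined. A minimal sketch, using an illustrative 2x3 array named `grid`:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

rows = np.concatenate([grid, grid], axis=0)  # join along rows (stack vertically)
cols = np.concatenate([grid, grid], axis=1)  # join along columns (side by side)

print(rows.shape)  # (4, 3)
print(cols.shape)  # (2, 6)
```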

## Vectorized Operations

In [None]:
x = np.arange(-2, 4)
x

In [None]:
print(x + 2)
print(np.add(x, 2))
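
"Vectorized" here means the operation is applied to every element without writing a Python loop. The sketch below compares an explicit loop with the vectorized form to show that they give the same result; on large arrays the vectorized form is typically much faster because the loop runs inside NumPy's compiled code.

```python
import numpy as np

x = np.arange(-2, 4)

# Explicit Python loop over the elements
looped = np.array([v + 2 for v in x])

# Vectorized: one expression, applied elementwise
vectorized = x + 2

print(vectorized)                          # [0 1 2 3 4 5]
print(np.array_equal(looped, vectorized))  # True
```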

In [None]:
np.abs(x)

In [None]:
np.sin(x)

In [None]:
np.exp(x)

In [None]:
print(np.min(x))
print(np.max(x))
print(np.mean(x))
print(np.sum(x))

In [None]:
x = np.array([[1, 2], [3, 4]])
x

In [None]:
np.sum(x)

In [None]:
np.sum(x, axis=0)
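
The `axis` argument names the axis that gets collapsed by the aggregation. A short sketch on the same 2x2 values, showing both directions side by side:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

col_totals = np.sum(x, axis=0)  # collapse the rows: one total per column
row_totals = np.sum(x, axis=1)  # collapse the columns: one total per row

print(col_totals)  # [4 6]
print(row_totals)  # [3 7]
```

The same argument works for other aggregations such as `np.min`, `np.max` and `np.mean`.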

## Boolean Arrays

In [None]:
x = np.arange(10)
x

In [None]:
x < 5

In [None]:
# Boolean arrays as masks
# Select elements from x whose values are less than 5
x[x < 5]
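
Masks can also be combined and counted. The sketch below uses the elementwise operators `&` (and) and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `<` and `>`.

```python
import numpy as np

x = np.arange(10)

mask = (x > 2) & (x < 7)   # parentheses around each comparison are required
print(x[mask])                   # [3 4 5 6]

print(np.count_nonzero(x < 5))   # 5  (number of True entries)
print(x[~(x < 5)])               # [5 6 7 8 9]
```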

Q. Select the even elements from `x`.

In [None]:
# Your Code Here

## Matrix/Vector Manipulation

In [None]:
a = np.array([[0, 1], [1, 0]])

In [None]:
b = np.array([[2,3], [1, 2]])

In [None]:
np.matmul(a, b)

In [None]:
np.linalg.det(b)

In [None]:
np.linalg.inv(b)
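
A quick way to sanity-check an inverse is to multiply it back with the original matrix and compare against the identity. The sketch below does this for the same `b`, using `np.allclose` because floating-point results are rarely exact. It also contrasts the matrix product (`@`, equivalent to `np.matmul` for 2D arrays) with the elementwise product (`*`).

```python
import numpy as np

b = np.array([[2, 3],
              [1, 2]])
b_inv = np.linalg.inv(b)

product = b @ b_inv                      # matrix product
print(np.allclose(product, np.eye(2)))   # True

print(b * b)   # elementwise product: [[4 9] [1 4]]
print(b @ b)   # matrix product:      [[7 12] [4 7]]
```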

## Assignments

We have marks for five people in three subjects.

Each row denotes one person, while each column is a subject.

In [None]:
names = np.array(['Alice', 'Bob', 'Chris', 'Dylan', 'Eva'])
subjects = np.array(['Maths', 'Science', 'English'])

In [None]:
marks = np.array([[30, 50, 90],
                  [78, 60, 82],
                  [38, 32, 50],
                  [79, 80, 92],
                  [94, 81, 60]])
marks

Q. Who got the highest total marks?

Extra: Use `np.argmax` to index into `names`.

Q. How many people failed in Maths? (Assume 40 is the passing mark.)

Hint: You can use `np.count_nonzero` on the boolean mask.

Q. Which is the most difficult subject? (Assume the subject with the lowest total marks is the most difficult.)

## Extra (Trailer, sort of)

In [None]:
import matplotlib.pyplot as plt

In [None]:
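plt.scatter(x=marks[:, 0], y=marks[:, 1])
# Axis limits are passed positionally; the xmin/xmax and ymin/ymax
# keywords are not accepted by recent Matplotlib releases
plt.xlim(0, 100)
plt.ylim(0, 100)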
plt.xlabel('Maths')
plt.ylabel('Science')
plt.title('Maths vs Science')

In [None]:
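plt.scatter(x=marks[:, 0], y=marks[:, 2])
# Axis limits are passed positionally; the xmin/xmax and ymin/ymax
# keywords are not accepted by recent Matplotlib releases
plt.xlim(0, 100)
plt.ylim(0, 100)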
plt.xlabel('Maths')
plt.ylabel('English')
plt.title('Maths vs English')