# Numpy

<img src="https://bids.berkeley.edu/sites/default/files/styles/400x225/public/projects/numpy_logo_project_page_banner.png?itok=jaJeRlWs" />

# What is Numpy

- Numpy is the core library for scientific computing in Python. 
- It has a high-performance multidimensional array object
- It provides tools for working with these arrays


# Sample Usage

- store matrices, solve systems of linear equations, find eigenvalues/vectors, find matrix decompositions, and solve other problems familiar from linear algebra
- store multi-dimensional measurement data. For example, an element `a[i,j]` in a 2-dimensional array might store the temperature $t_{ij}$ measured at coordinates i, j on a 2-dimensional surface.
- images and videos can be represented as NumPy arrays:
    + a gray-scale image can be represented as a two dimensional array
    + a color image can be represented as a three dimensional array, where the third dimension contains the color components red, green, and blue
    + a color video can be represented as a four dimensional array
- a 2-dimensional table might store a sequence of samples, and each sample might be divided into features. For example, we could measure the weather conditions once per day, and the conditions could include the temperature, direction and speed of wind, and the amount of rain. Then we would have one sample per day, and the features would be the temperature, wind, and rain. In the standard representation of this kind of tabular data, the rows correspond to samples and the columns correspond to features. We see more of this kind of data in the chapters on Pandas and Scikit-learn.
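
As a rough sketch of the representations listed above, the arrays below only illustrate the number of dimensions each kind of data needs. All sizes (4 by 6 pixels, 5 frames, 7 days, 4 features) are made up for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only to keep the arrays small.
gray_image = np.zeros((4, 6))         # height x width
color_image = np.zeros((4, 6, 3))     # height x width x (red, green, blue)
color_video = np.zeros((5, 4, 6, 3))  # frame x height x width x color
weather = np.zeros((7, 4))            # 7 daily samples x 4 features

print(gray_image.ndim, color_image.ndim, color_video.ndim, weather.ndim)
# prints: 2 3 4 2
```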

# Objectives

- Creation of arrays
- Array types and attributes
- Accessing arrays with indexing and slicing
- Reshaping of arrays
- Combining and splitting arrays
- Aggregations of arrays
- Matrix operations familiar from linear algebra

# Standard Import

In [None]:
import numpy as np

# Numpy Creation

- Multiple ways to create the arrays
- Think of the array as a list of lists, but much, much faster!

In [None]:
a = np.array([1,2,3])
a

In [None]:
np.array([[[1,2], [3,4]], [[5,6], [7,8]]])

# Numpy Creation

- Arrays can be created as zeros, ones, or a given value

In [None]:
np.full((2,3), fill_value=7)

In [None]:
np.ones((2,3))
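
The bullet above also mentions zeros, which the cells do not show. A minimal sketch follows; the variable names are only for illustration, and the optional `dtype` argument is used to request integers instead of the default floats.

```python
import numpy as np

zeros = np.zeros((2, 3))                # floats (float64) by default
int_ones = np.ones((2, 3), dtype=int)   # request an integer element type
sevens = np.full((2, 3), fill_value=7)  # element type is inferred from the fill value

print(zeros)
print(int_ones.dtype, sevens.dtype)
```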

# Numpy Creation

- `eye` provides the identity matrix for a given number of rows

In [None]:
np.eye(7)

# Numpy Creation

- `arange(N)` creates a 1D array of the integers from 0 to N-1 (the stop value is excluded)
- `linspace` takes a start, a stop, and a number of points, and returns that many evenly spaced values, including both endpoints

In [None]:
np.arange(5)

In [None]:
np.linspace(4, 4.12413453, 10)
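
To make the difference concrete, here is a small sketch with values picked for illustration: `arange` is driven by a step size, while `linspace` is driven by a count of points.

```python
import numpy as np

stepped = np.arange(2, 10, 3)  # start, stop (excluded), step
print(stepped)                 # [2 5 8]

# retstep=True also returns the spacing that linspace computed
points, step = np.linspace(0, 1, 5, retstep=True)  # start, stop (included), count
print(points)                  # [0.   0.25 0.5  0.75 1.  ]
print(step)                    # 0.25
```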

# Numpy Creation

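- Creating good test data is crucial for sound analysis and modeling
- NumPy can create random matrices
- `random.random` draws floats uniformly from [0, 1), `random.normal` draws from a normal distribution with the given mean and standard deviation, and `random.randint` draws integers uniformly from a half-open range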

In [None]:
np.random.random((3,4)) 

In [None]:
np.random.normal(0, 1, (3,4))

In [None]:
np.random.randint(-2, 10, (3,4))
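
Random test data is easier to debug when it is reproducible. The sketch below uses NumPy's `default_rng` generator interface as an alternative to the `np.random.*` functions above; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)     # arbitrary seed, fixed for reproducibility
uniform = rng.random((3, 4))             # floats in [0, 1)
normal = rng.normal(0, 1, (3, 4))        # mean 0, standard deviation 1
integers = rng.integers(-2, 10, (3, 4))  # integers in [-2, 10)

# A second generator with the same seed reproduces the same numbers.
again = np.random.default_rng(seed=42).random((3, 4))
print(np.array_equal(uniform, again))    # True
```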

# Attribute and Properties of an Array

- `ndim` tells the number of dimensions
- `shape` tells the size in each dimension
- `size` tells the number of elements
- `dtype` tells the element type

In [None]:
def info(a):
    print(f"Array has dim {a.ndim}, shape {a.shape}, size {a.size}, and dtype {a.dtype}")
array = np.array([1,4,6,7])
info(array)
array
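
The same attributes are more informative on a multi-dimensional array. A minimal sketch with a made-up 2 by 3 array of floats:

```python
import numpy as np

m = np.array([[1.5, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(m.ndim)   # 2
print(m.shape)  # (2, 3)
print(m.size)   # 6
print(m.dtype)  # float64
```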

# Indexing

- A one-dimensional array is indexed like a Python list, including negative indices that count from the end

In [None]:
a = np.arange(10)
print(a)
print(a[1])
print(a[-2])

# Indexing

- For a multi-dimensional array, the index is a comma-separated tuple of integers instead of a single integer

In [None]:
array = np.array([[1,2,3], [4,5,6], [7,1,1]])
print(array)
print(array[1,2])    # row index 1, column index 2
print(array[0,-1])   # row index 0, column index -1
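
Slicing, listed in the objectives, combines with the tuple index: each position in the tuple may be a `start:stop` slice instead of an integer. A short sketch reusing the same 3 by 3 array (the variable names are illustrative):

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 1, 1]])

first_row = array[0]        # a single index selects a whole row
last_column = array[:, -1]  # ":" keeps every row, -1 picks the last column
top_left = array[:2, :2]    # rows 0-1 and columns 0-1

print(first_row)    # [1 2 3]
print(last_column)  # [3 6 1]
print(top_left)
```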

# Data Modification 

- Assign a new value to an indexed element with `=`

In [None]:
array[0,0] = 10
array
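
Assignment also works on slices, which makes it possible to overwrite a whole row or column at once. The sketch below uses its own copy of the 3 by 3 array; note that values are converted to the array's element type, so a float stored in an integer array is truncated.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 1, 1]])

array[0, :] = 0          # a single value fills the whole row
array[:, 1] = [9, 8, 7]  # a sequence fills the column element by element
array[2, 2] = 3.9        # integer array: the value is stored as 3

print(array)
# [[0 9 0]
#  [4 8 6]
#  [7 7 3]]
```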

# Iris dataset

- The iris dataset contains 150 samples
    + 50 samples of each of 3 different species of iris
- Measurements: sepal length, sepal width, petal length, petal width
- Each row of the data has the format: (sepal length, sepal width, petal length, petal width)

In [None]:
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
X[:4] # First Four Rows

# Important Terminology

1. Each row is an observation (also known as: sample, example, instance, record)
2. Each column is a feature (also known as: predictor, attribute, independent variable, input, regressor, covariate)

# Column Names

- Notice that displaying the array does not show column names; they are stored separately in `feature_names`

In [None]:
iris.feature_names
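
Because the names live in a separate list, one way to select a column by name is to look up its position first. The sketch below is self-contained: `X_small` is a small stand-in array with the same four-column layout (its values are illustrative), and the names list is assumed to match what `iris.feature_names` returns.

```python
import numpy as np

# Assumed to mirror iris.feature_names; hard-coded so the sketch runs without scikit-learn.
feature_names = ["sepal length (cm)", "sepal width (cm)",
                 "petal length (cm)", "petal width (cm)"]

# Stand-in data with the same column layout as X (illustrative values).
X_small = np.array([[5.1, 3.5, 1.4, 0.2],
                    [4.9, 3.0, 1.4, 0.2],
                    [4.7, 3.2, 1.3, 0.2]])

col = feature_names.index("petal length (cm)")  # position of the column
petal_length = X_small[:, col]                  # every row, that one column

print(col)           # 2
print(petal_length)  # [1.4 1.4 1.3]
```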

# Common Array Operations

In [None]:
a = np.array(range(6), float)
a

### Reshaping Array

Gives a new shape to an array without changing its data

In [None]:
a.reshape(2,3)

### Undo Reshape


In [None]:
a.reshape(2,3).flatten()
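
Two details are worth a quick sketch. `reshape` accepts `-1` for one dimension and infers its size, and `flatten` always returns a copy, whereas a reshaped array usually shares memory with the original. The variable names below are illustrative.

```python
import numpy as np

a = np.array(range(6), float)

m = a.reshape(2, -1)     # -1 is inferred as 3, so m has shape (2, 3)
flat_copy = m.flatten()  # independent copy of the data

flat_copy[0] = 100       # changes only the copy
m[0, 0] = -1             # m is a view of a here, so a changes too

print(m.shape)                       # (2, 3)
print(a[0])                          # -1.0
print(np.shares_memory(a, m))        # True
print(np.shares_memory(m, flat_copy))  # False
```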

### Concatenation of Array

Combine multiple arrays into one

In [None]:
a = np.array([1], float)
b = np.array([3,5,7], float)
c = np.array([9, 11], float)
np.concatenate((a, b, c))

### Concatenation of Array

If there are multiple dimensions in the array you can specify the axis

In [None]:
c=np.arange(1,5).reshape(2,2)
print(f"c has shape {c.shape}:", c, sep="\n")
np.concatenate((c,c))   # concatenating 2D arrays along the default axis=0 (rows)

In [None]:
np.concatenate((c,c), axis=1)

### Concatenation of Array

If you want to concatenate arrays with different numbers of dimensions, for example to add a new column to a 2D array, you must first reshape the arrays to have the same number of dimensions:

In [None]:
a=np.arange(2)
print("New row:")
print(np.concatenate((c,a.reshape(1,2))))
print("New column:")
print(np.concatenate((c,a.reshape(2,1)), axis=1))
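
As an alternative sketch, NumPy's `vstack` and `column_stack` helpers perform this reshaping for 1D inputs, and indexing with `np.newaxis` is another way to turn a 1D array into a column. The arrays are redefined here so the example runs on its own.

```python
import numpy as np

c = np.arange(1, 5).reshape(2, 2)
a = np.arange(2)

new_row = np.vstack((c, a))        # a is treated as a (1, 2) row
new_col = np.column_stack((c, a))  # a is treated as a (2, 1) column
same_col = np.concatenate((c, a[:, np.newaxis]), axis=1)  # explicit reshape to (2, 1)

print(new_row)
print(new_col)
```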

# Stacking and Splitting Data

- create higher dimensional arrays from lower dimensional arrays using `stack`
- divide an array into parts using `split`, either into a given number of equal parts or at given indices

In [None]:
a = np.arange(2)
b = np.ones(2)
np.stack((a,b))
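
`stack` also takes an `axis` argument that decides where the new dimension goes. A small sketch with length-3 inputs, chosen here so that the two results have visibly different shapes:

```python
import numpy as np

a = np.arange(3)  # [0 1 2]
b = np.ones(3)    # [1. 1. 1.]

as_rows = np.stack((a, b))             # new axis first: each input becomes a row
as_columns = np.stack((a, b), axis=1)  # new axis last: each input becomes a column

print(as_rows.shape)     # (2, 3)
print(as_columns.shape)  # (3, 2)
```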

In [None]:
d = np.arange(12).reshape(2,6)
print("original:")
print(d)
parts = np.split(d, (2,3,5), axis=1)
for i, p in enumerate(parts):
    print(f"part {i}:")
    print(p)
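
For comparison, the sketch below shows both modes of `split` on the same array: an integer asks for that many equal parts, while a tuple of indices gives the column positions at which to cut. Concatenating the parts restores the original.

```python
import numpy as np

d = np.arange(12).reshape(2, 6)

halves = np.split(d, 2, axis=1)         # two equal parts, each of shape (2, 3)
parts = np.split(d, (2, 3, 5), axis=1)  # cut before columns 2, 3, and 5

print([h.shape for h in halves])  # [(2, 3), (2, 3)]
print([p.shape for p in parts])   # [(2, 2), (2, 1), (2, 2), (2, 1)]

restored = np.concatenate(parts, axis=1)
print(np.array_equal(restored, d))  # True
```

With an integer argument, `split` raises an error if the axis cannot be divided evenly; `np.array_split` accepts uneven divisions.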