# NumPy Introduction

- Improved by: [Dhouha Grissa](https://www.cpr.ku.dk/staff/jensen-group/?pure=en/persons/605543)

**Numpy**, which stands for Numerical Python, is the core library for scientific and numerical computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

In this tutorial, we explain the basics of NumPy, such as its environment and properties. We also discuss the various array functions, operations, types of indexing, etc.

   - Official documentation: http://docs.scipy.org/doc/
   - The NumPy User Guide: [PDF](https://numpy.org/doc/stable/numpy-user.pdf), [website](https://numpy.org/doc/stable/user)
   - [Manual](https://numpy.org/doc/stable/)

### What is NumPy?

**NumPy** is an open source project, created in 2005 by Travis Oliphant. Its main object is the multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.
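
As a small illustration of these terms, the sketch below builds a 2-D array and inspects its axes, its shared element type, and tuple-style indexing. The variable name `table` is only chosen for this example.

```python
import numpy as np

# A 2-D array: 2 rows (axis 0) by 3 columns (axis 1)
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(table.ndim)    # number of axes: 2
print(table.shape)   # (2, 3)
print(table.dtype)   # one integer type shared by all elements (exact name depends on the platform)
print(table[1, 2])   # indexed by a tuple of integers: row 1, column 2 -> 6
```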

### Why Use NumPy?

**Numpy** is at the core of nearly every scientific Python application or module since it provides a fast N-d array datatype that can be manipulated in a vectorized form.

   - Pure Python does numerical computations slowly.
   - NumPy provides fast, multidimensional arrays.
   - Example, a 1000 x 1000 matrix multiply: a Python triple loop takes > 10 min **vs** NumPy takes ~0.03 seconds.
   - Plotting tools (such as Matplotlib) work with NumPy arrays.
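
To get a feeling for the speed difference, here is a rough sketch that times a pure-Python triple loop against NumPy's `@` operator. It uses a much smaller matrix (60 x 60, an arbitrary choice) so that it finishes quickly; the helper `matmul_loops` is written only for this example, and the timings will vary from machine to machine.

```python
import time
import numpy as np

def matmul_loops(a, b):
    """Multiply two matrices (lists of lists) with a pure-Python triple loop."""
    n, m, p = len(a), len(b), len(b[0])
    result = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                result[i][j] += a[i][k] * b[k][j]
    return result

n = 60  # assumption: kept small so the loop version runs in well under a second
np.random.seed(0)
a = np.random.random((n, n))
b = np.random.random((n, n))

start = time.perf_counter()
slow = matmul_loops(a.tolist(), b.tolist())
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = a @ b  # NumPy matrix multiplication
numpy_seconds = time.perf_counter() - start

print(f"triple loop: {loop_seconds:.4f} s, NumPy: {numpy_seconds:.6f} s")
print(np.allclose(slow, fast))  # both approaches give the same result
```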

### Arrays and Vectorization

**Arrays** are sequences of same-type data points (most-often numbers).  Numpy allows us to work with the sequence without writing a for-loop, using a technique called **vectorization**.  
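
A minimal sketch of the idea: the loop below and the single vectorized expression compute the same doubled values. The variable names are only illustrative.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Loop version: handle one element at a time
doubled_loop = []
for v in values:
    doubled_loop.append(v * 2)

# Vectorized version: one expression applied to every element
doubled_vec = values * 2

print(doubled_vec)  # [2. 4. 6. 8.]
```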

Besides the **array()** function and the **ndarray** class it creates, NumPy also includes a lot of math functions, which makes analysis much easier.  Let's try some out!

## NumPy Getting started

### Installation of NumPy

If you have Python and **PIP** already installed on a system, then installation of NumPy is very easy.

Install it using this command: **pip install numpy**

### Import NumPy

Once NumPy is installed, import it in your applications by adding the import keyword:

In [9]:
import numpy

### NumPy as np

NumPy is usually imported under the **np** alias.
Create an alias with the **as** keyword while importing:

In [2]:
import numpy as np

Now the NumPy package can be referred to as **np** instead of **numpy**.

### Checking NumPy Version

The version string is stored in the **`__version__`** attribute.

In [10]:
print(np.__version__)

1.14.3


## NumPy Creating Arrays

NumPy is used to work with arrays. The array object in **NumPy** is called **ndarray**.

We can create a NumPy ndarray object by using the **array()** function.

### 1-D Arrays

An array of rank 1 is called a uni-dimensional or 1-D array.

In [65]:
# Example
arr = np.array([1, 2])

print(arr)

print(type(arr))

[1 2]
<class 'numpy.ndarray'>


- Create a 1-D array containing the values 1,2,3,4,5. Then print it out:

In [14]:
#arr = np.array([1, 2, 3, 4, 5])
#print(arr)

### 2-D Arrays

An array that has 1-D arrays as its elements is called a 2-D array.

These are often used to represent matrices or 2nd-order tensors.

- Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:

In [16]:
#arr = np.array([[1, 2, 3], [4, 5, 6]])
#print(arr)

[[1 2 3]
 [4 5 6]]


### 3-D arrays

An array that has 2-D arrays (matrices) as its elements is called a 3-D array.

These are often used to represent a 3rd order tensor.

- Create a 3-D array with two 2-D arrays, both containing two arrays with the values 1,2,3 and 4,5,6:

In [64]:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
#print(arr)

### Check Number of Dimensions

NumPy arrays provide the **ndim** attribute, an integer that tells us how many dimensions the array has.

In [3]:
a = np.array(42)

In [None]:
# print(a.ndim)

- Check how many dimensions these arrays have:

In [63]:
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

In [62]:
#print(b.ndim)
#print(c.ndim)
#print(d.ndim)

### Get the Shape of an Array

The shape of an array is the number of elements in each dimension.

NumPy arrays have an attribute called **shape** that returns a tuple with one entry per dimension; each entry is the number of elements along that dimension.

- Print the shape of a 2-D array:

In [53]:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

In [None]:
#print(arr.shape)
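
The sketch below ties **shape** and **ndim** together, reusing the arrays `b`, `c` and `d` from the previous section: the length of the shape tuple is the number of dimensions.

```python
import numpy as np

b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

for current in (b, c, d):
    # len(current.shape) always equals current.ndim
    print(current.shape, current.ndim)
```

This should print `(5,) 1`, `(2, 3) 2` and `(2, 2, 3) 3`.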

## Building Arrays Exercises

### Building Arrays

NumPy has some convenient array-building functions as well.  Some commonly used examples are **arange()**, **linspace()**, **zeros()**, and the random number generation functions in **random**.

| Function | Purpose | Example |
| -------- | ------- | ------- |
| **`np.arange()`** | Makes an array with all the integers from a start value up to, but not including, a stop value | np.arange(2, 7) |
| **`np.linspace()`** | Makes an array of a specific length, with evenly spaced values between two endpoints | np.linspace(2, 3, 10) |
| **`np.zeros()`** | Makes an array of all zeros | np.zeros(5) |
| **`np.ones()`** | Makes an array of all ones | np.ones(3) |
| **`np.random.random()`** | Makes an array of random numbers drawn uniformly from [0, 1) | np.random.random(100) |
| **`np.random.randn()`** | Makes an array of normally-distributed random numbers (mean 0, standard deviation 1) | np.random.randn(100) |
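
One detail that is easy to miss is how the endpoints are treated: `arange()` leaves out the stop value, while `linspace()` includes both ends by default. A short sketch (the variable names are only illustrative):

```python
import numpy as np

ints = np.arange(2, 7)
print(ints)               # [2 3 4 5 6]  (7 is not included)

spaced = np.linspace(2, 3, 5)
print(spaced.tolist())    # [2.0, 2.25, 2.5, 2.75, 3.0]  (both 2 and 3 are included)

grid = np.zeros((2, 3))   # passing a tuple gives a multi-dimensional array
print(grid.shape)         # (2, 3)
```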


1. Make an array containing the numbers 1 to 15.

In [5]:
#np.arange(1, 16)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

2. Make an array containing 20 zeros.

In [6]:
#np.zeros(20)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])

3. Make an array containing 20 ones!

In [7]:
#np.ones(20)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1.])

4. Generate an array of 10 random numbers from Numpy's **random** submodule, using any function you want.

In [25]:
#np.random.random((10))

array([0.19420113, 0.72116741, 0.12062238, 0.31133038, 0.83235338,
       0.1229514 , 0.2357226 , 0.76733401, 0.97485605, 0.36178444])

5. Create an array with 5 dimensions and verify that it has 5 dimensions, using **ndmin** as argument:

In [22]:
#arr = np.array([1, 2, 3, 4], ndmin=5)
#print(arr)
#print('number of dimensions :', arr.ndim)

[[[[[1 2 3 4]]]]]
number of dimensions : 5


## NumPy Array Indexing
Array indexing is the same as accessing an array element.
You can access an array element by referring to its index number.

The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the second has index 1 etc.
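
A brief sketch of indexing in practice. It also shows two things not covered above: negative indexes, which count from the end, and 2-D arrays, which take one index per dimension. The array values are made up for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
print(arr[0])      # 10, the first element
print(arr[1])      # 20, the second element
print(arr[-1])     # 40, negative indexes count from the end

grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid[1, 0])  # 4: row 1, column 0
```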

1. Get the first element from the following array:  

In [27]:
arr = np.array([1, 2, 3, 4]) 
#print(arr[0]) 

1


2. Get third and fourth elements from the following array and add them.

In [30]:
arr = np.array([1, 2, 3, 4])
#print(arr[2] + arr[3])

## NumPy Sorting Arrays
Sorting means putting elements in an ordered sequence.

An ordered sequence is any sequence that has an order corresponding to its elements, like numeric or alphabetical, ascending or descending.

NumPy has a function called **sort()** that returns a sorted copy of a specified array. (The ndarray object also has a **sort()** method, which sorts the array in place.)
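
The difference between the function `np.sort()` and the array method `sort()` can be sketched as follows: the function returns a sorted copy and leaves the original untouched, while the method reorders the array itself.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

sorted_copy = np.sort(arr)   # returns a new, sorted array
print(sorted_copy)           # [0 1 2 3]
print(arr)                   # [3 2 0 1], the original is unchanged

arr.sort()                   # the method sorts in place and returns None
print(arr)                   # [0 1 2 3]
```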

1. Sort the array:

In [None]:
arr = np.array([3, 2, 0, 1])

#print(np.sort(arr))

2. Sort the array alphabetically:

In [None]:
arr = np.array(['banana', 'cherry', 'apple'])

#print(np.sort(arr))

3. Sort a 2-D array:

In [31]:
arr = np.array([[3, 2, 4], [5, 0, 1]])

#print(np.sort(arr))

[[2 3 4]
 [0 1 5]]
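
In the output above each row is sorted on its own, because `np.sort()` sorts along the last axis by default. As a sketch, the `axis` argument changes that behaviour:

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_rows = np.sort(arr)              # default: sort along the last axis (within each row)
by_cols = np.sort(arr, axis=0)      # sort within each column
flat = np.sort(arr, axis=None)      # flatten first, then sort

print(by_rows.tolist())  # [[2, 3, 4], [0, 1, 5]]
print(by_cols.tolist())  # [[3, 0, 1], [5, 2, 4]]
print(flat.tolist())     # [0, 1, 2, 3, 4, 5]
```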


## Statistics Methods on Arrays

Arrays have many useful math methods.  For example, to get the mean of an array of numbers:

```python
data = np.random.random(100)
data.mean()
```

**Exercise**: Calculate the statistics on the following numbers:

In [45]:
data = np.arange(2, 7)
data

array([2, 3, 4, 5, 6])

1. Get the mean of the data.

In [46]:
#print(data.mean())

4.0


2. What is the sum of the data?

In [48]:
#print(data.sum())

20


3. The maximum of the data?

In [49]:
#print(data.max())

6


4. The standard deviation of the data?

In [50]:
#print(data.std())

1.4142135623730951


## Statistics Functions on Arrays

Numpy also has many useful statistics **functions**.  These take an array as an input and can be found inside the **np** library.  Sometimes, the same functionality can be found both as a Numpy function  and an array method, giving you the choice of how you'd like to use it.  


For example, the following **method**:


```python
data = np.random.random(100)
data.mean()
```

can also be used as a **function**:


```python
data = np.random.random(100)
np.mean(data)
```

**Exercise**: Calculate the statistics on the following numbers, this time using the **function** version of the previous methods.

In [33]:
data = np.arange(2, 7)
data

array([2, 3, 4, 5, 6])

1. Get the mean of the data.

In [35]:
#np.mean(data)

2. What is the sum of the data?

In [36]:
#np.sum(data)

20

3. What is the minimum of the data?

In [37]:
#np.min(data)

2

4. The standard deviation?

In [39]:
#np.std(data)

1.4142135623730951

## Arithmetic with Arrays

Arrays can also be added, subtracted, multiplied, and divided.  

For example, to add 10 to all values in an array:

```python
data = np.random.randn(5)
print(data)
print(data + 10)
```

Here is multiplying two arrays together: 

```python
print(data )
print(data * data)
```

It is also possible to use arithmetic functions such as **add(), subtract(), multiply(), and divide()**.
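
As a quick sketch, each of these functions gives the same result as the matching operator. The arrays `a` and `b` are made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(np.add(a, b).tolist())        # [11, 22, 33], same as a + b
print(np.subtract(b, a).tolist())   # [9, 18, 27], same as b - a
print(np.multiply(a, b).tolist())   # [10, 40, 90], same as a * b
print(np.divide(b, a).tolist())     # [10.0, 10.0, 10.0], same as b / a
```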

**Exercises**: Modify the following arrays using the math operators  (+, -, *, /)

In [54]:
data = np.arange(-3, 5)
data

array([-3, -2, -1,  0,  1,  2,  3,  4])

1. Multiply the data by 100

In [56]:
#print(data * 100)
#print(np.multiply(data, 100))

[-300 -200 -100    0  100  200  300  400]


2. Add 40 to each value in the array.

In [42]:
#print(data + 40)

[37 38 39 40 41 42 43 44]


3. Divide the numbers by 100

In [43]:
#print(data / 100)

[-0.03 -0.02 -0.01  0.    0.01  0.02  0.03  0.04]


4. Subtract the data from itself.

In [44]:
#print(data - data)

[0 0 0 0 0 0 0 0]


## Extra Exercises

### Exercise: Other math functions

1. Calculate the square of all the numbers from 0 to 8.

In [59]:
arr = np.arange(0, 9)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [None]:
#print(np.square(arr))

2. Calculate the square roots of all the numbers from 0 to 8.

In [60]:
#print(np.sqrt(arr))

[0.         1.         1.41421356 1.73205081 2.         2.23606798
 2.44948974 2.64575131 2.82842712]


3. Make an array of 20 values, all of them 2's.

In [79]:
#data = np.linspace(2, 2, 20)
#data

array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2.])

4. Subtract the mean of the array from each element in the array (a.k.a. "mean-centering" the values)

In [74]:
#arr = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
#mean = arr.mean()
#mean
#print(arr - mean)
#arr - mean[:, None] if mean = arr.mean(axis=1)

[[-5.5 -4.5 -3.5]
 [-2.5 -1.5 -0.5]
 [ 0.5  1.5  2.5]
 [ 3.5  4.5  5.5]]
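
The result above subtracts the mean of the whole array. The last comment in the cell hints at a variant: centering each row around its own mean. A minimal sketch of that variant, using `axis=1` and `[:, None]` so the row means line up with the rows:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

row_means = arr.mean(axis=1)          # shape (4,): one mean per row
centered = arr - row_means[:, None]   # [:, None] reshapes to (4, 1) so it broadcasts across the columns

print(centered)  # every row becomes -1, 0, 1
```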


### Translating Algorithms into Code

Calculate the standard deviation of an array's values, without using the numpy.std() function.  (Formula can be found here: http://www.mathsisfun.com/data/standard-deviation-formulas.html)

1. Work out the Mean (the simple average of the numbers)
2. Then for each number: subtract the Mean and square the result
3. Then work out the mean of those squared differences.
4. Take the square root of that and we are done!


In [91]:
import math

arr = np.arange(1, 10)
print(arr)

def std_deviation(data):
    """Return the population standard deviation (divides by N, like np.std)."""
    mean = sum(data) / len(data)   # step 1: the mean
    sd = 0.0
    for x in data:
        sd += pow(x - mean, 2)     # step 2: add up the squared differences
    var = sd / len(data)           # step 3: variance
    std = math.sqrt(var)           # step 4: square root
    return std

print("Population standard deviation is: ", std_deviation(arr))

[1 2 3 4 5 6 7 8 9]
Population standard deviation is:  2.581988897471611
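
The same four steps can also be written without an explicit loop, using vectorized array operations. This is only a sketch of one possible alternative; `np.std()` appears at the end purely as a check.

```python
import numpy as np

arr = np.arange(1, 10)

mean = arr.mean()                   # step 1: the mean
squared_diffs = (arr - mean) ** 2   # step 2: subtract the mean and square, for every element at once
variance = squared_diffs.mean()     # step 3: mean of the squared differences
std = np.sqrt(variance)             # step 4: square root

print(std)
print(np.isclose(std, np.std(arr)))  # True
```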
