# Package Expo - Group 10 - NumPy

NumPy is a Python library for working efficiently with n-dimensional arrays.

## The benefits of NumPy
- More Speed: NumPy runs its array operations in compiled C code, which is typically much faster than equivalent pure-Python loops
- Fewer Loops: NumPy lets you operate on whole arrays at once, which reduces the number of explicit loops you write
- Clearer Code: With fewer explicit loops, your code is shorter and easier to read
- Better Quality: Many contributors work hard to keep NumPy fast, user-friendly, and free of bugs
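
As a small illustration of the "fewer loops" point, the sketch below squares a list of numbers twice: once with an explicit Python loop and once with a single NumPy expression. The `values` list is made up for this example.

```python
import numpy as np

# Hypothetical sample data for illustration.
values = [1.5, 2.0, 3.5, 4.0]

# Pure-Python version: an explicit loop over every element.
squared_loop = []
for v in values:
    squared_loop.append(v * v)

# NumPy version: one expression applied to the whole array at once.
squared_np = np.array(values) ** 2

print(squared_loop)
print(squared_np)
```

Both versions produce the same numbers; the NumPy version simply moves the loop out of your code.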

# Step 1: Install NumPy

Installing with Anaconda

- run the install command: conda install numpy matplotlib

Installing with pip

- run the install command: pip install numpy matplotlib
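
To check that the install worked, a quick sketch like the following can be run in any Python session. It only confirms that NumPy imports; the version number printed depends on your environment.

```python
import numpy as np

# If this import succeeds, NumPy is installed in the active environment.
version = np.__version__
print("NumPy version:", version)
```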

# Using IPython, Notebooks, or JupyterLab

The above section provides everything needed to get started, and there are a couple more tools that you can optionally install to make working in data science more developer-friendly.

IPython is an upgraded Python read-eval-print loop (REPL) that makes editing code in a live interpreter session more straightforward and prettier. Here’s what an IPython REPL session looks like:

In [1]:
# Import NumPy
import numpy as np

In [2]:
digits = np.array([
   ...:     [1, 2, 3],
   ...:     [4, 5, 6],
   ...:     [6, 7, 9],
   ...: ])

In [3]:
digits

array([[1, 2, 3],
       [4, 5, 6],
       [6, 7, 9]])

IPython has several differences from a basic Python REPL, including its line numbers, use of colors, and quality of array visualizations. There are also a lot of user-experience bonuses that make it more pleasant to enter, re-enter, and edit code.

Installing IPython as a standalone:

- run the install command: pip install ipython

A slightly more featureful alternative to a REPL is a notebook. Notebooks are a slightly different style of writing Python than standard scripts, though. Instead of a traditional Python file, they give you a series of mini-scripts called cells that you can run and re-run in whatever order you want, all in the same Python memory session.

One neat thing about notebooks is that you can include graphs and render Markdown paragraphs between cells, so they’re really nice for writing up data analyses right inside the code!

Here’s what it looks like:

In [4]:
a = 1
b = 2

In [5]:
c = a + b
c

3

The most popular notebook offering is probably the Jupyter Notebook, but nteract is another option that wraps the Jupyter functionality and attempts to make it a bit more approachable and powerful.

However, if you’re looking at Jupyter Notebook and thinking that it needs more IDE-like qualities, then JupyterLab is another option. You can customize text editors, notebooks, terminals, and custom components, all in a browser-based interface. It will likely be more comfortable for people coming from MATLAB. It’s the youngest of the offerings, but its 1.0 release was back in 2019, so it should be stable and full-featured.

Whichever option you choose, once you have it installed, you’ll be ready to run your first lines of NumPy code.

# Array Shape and Axes

Let's create a three-dimensional array by reshaping 12 values into shape (2, 2, 3):

In [1]:
import numpy as np

temps = np.array([
    29.3, 42.1, 18.8, 16.1, 38.0, 12.5,
    12.6, 49.9, 38.6, 31.3, 9.2, 22.2
]).reshape(2,2,3)

temps

array([[[29.3, 42.1, 18.8],
        [16.1, 38. , 12.5]],

       [[12.6, 49.9, 38.6],
        [31.3,  9.2, 22.2]]])

In [2]:
temps.shape

(2, 2, 3)

Now let's swap axes 1 and 2 of the array:

In [4]:
np.swapaxes(temps, 1, 2)

array([[[29.3, 16.1],
        [42.1, 38. ],
        [18.8, 12.5]],

       [[12.6, 31.3],
        [49.9,  9.2],
        [38.6, 22.2]]])
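
To make the effect of the swap more concrete, here is a short sketch that compares the shapes before and after and follows one element through the swap. The name `swapped` is introduced only for this example.

```python
import numpy as np

temps = np.array([
    29.3, 42.1, 18.8, 16.1, 38.0, 12.5,
    12.6, 49.9, 38.6, 31.3, 9.2, 22.2
]).reshape(2, 2, 3)

swapped = np.swapaxes(temps, 1, 2)

print(temps.shape)    # (2, 2, 3)
print(swapped.shape)  # (2, 3, 2)

# The element at [i, j, k] in temps sits at [i, k, j] in swapped.
print(temps[0, 1, 2], swapped[0, 2, 1])  # 12.5 12.5
```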

# Axes

In [7]:
table = np.array([
    [5, 3, 7, 1],
    [2, 6, 7, 9],
    [1, 1, 1, 1],
    [4, 3, 2, 0],
])

table.max()

9

.max() with no arguments returns the largest value in the entire array, regardless of the number of dimensions.

Passing axis=0 takes the maximum down the rows (along axis 0), which leaves one value per column.

The line of code below gets the largest value in each column:

In [8]:
table.max(axis=0)

array([5, 6, 7, 9])

The next line of code gets the largest value in each row (axis=1):

In [9]:
table.max(axis=1)

array([7, 9, 1, 4])
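
Because `table` is square, both results above have the same length, which can hide which axis was collapsed. The sketch below uses a made-up 2 x 3 array (`small`, not part of the original example) so the difference shows up in the result shapes.

```python
import numpy as np

# Hypothetical non-square array: 2 rows, 3 columns.
small = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

col_max = small.max(axis=0)  # collapses the 2 rows: one value per column
row_max = small.max(axis=1)  # collapses the 3 columns: one value per row

print(col_max, col_max.shape)  # three values, shape (3,)
print(row_max, row_max.shape)  # two values, shape (2,)
```

A handy way to remember it: the axis you pass is the one that disappears from the shape.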

# Broadcasting


Arrays can be broadcast against each other if, for each dimension, their sizes match or one of the arrays has a size of 1.

In [10]:
arrayA = np.arange(32).reshape(4, 1, 8)
arrayA

array([[[ 0,  1,  2,  3,  4,  5,  6,  7]],

       [[ 8,  9, 10, 11, 12, 13, 14, 15]],

       [[16, 17, 18, 19, 20, 21, 22, 23]],

       [[24, 25, 26, 27, 28, 29, 30, 31]]])

In [11]:
arrayB = np.arange(48).reshape(1,6,8)
arrayB

array([[[ 0,  1,  2,  3,  4,  5,  6,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31],
        [32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47]]])

In [12]:
arrayA + arrayB

array([[[ 0,  2,  4,  6,  8, 10, 12, 14],
        [ 8, 10, 12, 14, 16, 18, 20, 22],
        [16, 18, 20, 22, 24, 26, 28, 30],
        [24, 26, 28, 30, 32, 34, 36, 38],
        [32, 34, 36, 38, 40, 42, 44, 46],
        [40, 42, 44, 46, 48, 50, 52, 54]],

       [[ 8, 10, 12, 14, 16, 18, 20, 22],
        [16, 18, 20, 22, 24, 26, 28, 30],
        [24, 26, 28, 30, 32, 34, 36, 38],
        [32, 34, 36, 38, 40, 42, 44, 46],
        [40, 42, 44, 46, 48, 50, 52, 54],
        [48, 50, 52, 54, 56, 58, 60, 62]],

       [[16, 18, 20, 22, 24, 26, 28, 30],
        [24, 26, 28, 30, 32, 34, 36, 38],
        [32, 34, 36, 38, 40, 42, 44, 46],
        [40, 42, 44, 46, 48, 50, 52, 54],
        [48, 50, 52, 54, 56, 58, 60, 62],
        [56, 58, 60, 62, 64, 66, 68, 70]],

       [[24, 26, 28, 30, 32, 34, 36, 38],
        [32, 34, 36, 38, 40, 42, 44, 46],
        [40, 42, 44, 46, 48, 50, 52, 54],
        [48, 50, 52, 54, 56, 58, 60, 62],
        [56, 58, 60, 62, 64, 66, 68, 70],
        [64, 66, 68, 70, 72, 74, 76, 78]]])
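
The broadcasting rule can also be checked through shapes alone. The following sketch reuses `arrayA` and `arrayB` from above and then tries a deliberately incompatible pair; the `(4, 3)` and `(5, 3)` arrays are made up for this example.

```python
import numpy as np

arrayA = np.arange(32).reshape(4, 1, 8)
arrayB = np.arange(48).reshape(1, 6, 8)

# Sizes 4 vs 1, 1 vs 6, and 8 vs 8 are all compatible.
result = arrayA + arrayB
print(result.shape)  # (4, 6, 8)

# Each output element is the sum of the matching stretched elements.
print(result[3, 5, 0] == arrayA[3, 0, 0] + arrayB[0, 5, 0])  # True

# Sizes 4 and 5 differ and neither is 1, so this pair cannot broadcast.
try:
    np.ones((4, 3)) + np.ones((5, 3))
    failed = False
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)
```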

# Data Science Operations: Indexing
Indexing can be done from the front or back of the array.

Colons (:) are used to specify "the rest" or "all"


In [3]:
import numpy as np

square = np.array([
    [16, 3, 2, 13],
    [5, 10, 11, 8],
    [9, 6, 7, 12],
    [4, 15, 14, 1]
])

In the pattern below, x is a row index and y is a column index:

[:x, y:]

Putting a colon before x selects all rows up to, but not including, row index x.

Putting a colon after y selects all columns from column index y onward, including column y.

In [18]:
square[:2, 3:]

array([[13],
       [ 8]])
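
Indexing from the back of the array uses negative indices. The sketch below applies them to the same `square` array; the variable names are introduced only for this example.

```python
import numpy as np

square = np.array([
    [16, 3, 2, 13],
    [5, 10, 11, 8],
    [9, 6, 7, 12],
    [4, 15, 14, 1]
])

last_row = square[-1]          # -1 counts from the end: the last row
last_value = square[-1, -1]    # last row, last column
last_two_cols = square[:, -2:] # every row, last two columns

print(last_row)
print(last_value)  # 1
print(last_two_cols)
```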

# Masking and Filtering
Masking allows filtering based on more complicated, nonuniform or nonsequential criteria. A mask is an array that has the exact same shape as your data, but instead of your values, it holds Boolean values: either True or False. Indexing an array with a mask returns all elements where the mask is True.

The following example shows how a mask is built. Every element that is divisible by 4 maps to True, and every other element maps to False.

In [20]:
import numpy as np

square = np.array([
    [16, 3, 2, 13],
    [5, 10, 11, 8],
    [9, 6, 7, 12],
    [4, 15, 14, 1]
])

mask = square % 4 == 0

mask

array([[ True, False, False, False],
       [False, False, False,  True],
       [False, False, False,  True],
       [ True, False, False, False]])
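
The mask on its own only marks positions. To actually filter the data, the mask is used as an index, as in this short sketch (`by_four` is a name chosen for this example):

```python
import numpy as np

square = np.array([
    [16, 3, 2, 13],
    [5, 10, 11, 8],
    [9, 6, 7, 12],
    [4, 15, 14, 1]
])

mask = square % 4 == 0

# Indexing with the mask keeps only the elements where mask is True,
# returned as a flat 1-D array in row-by-row order.
by_four = square[mask]
print(by_four)

# True counts as 1, so summing the mask counts the matches.
print(mask.sum())  # 4
```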

# Transposing, Sorting and Concatenating
Transposing rearranges an array by swapping its axes, so rows become columns. Use the ".T" attribute or the ".transpose()" method to transpose an array.

In [21]:
square = np.array([
    [16, 3, 2, 13],
    [5, 10, 11, 8],
    [9, 6, 7, 12],
    [4, 15, 14, 1]
])

square.T

array([[16,  5,  9,  4],
       [ 3, 10,  6, 15],
       [ 2, 11,  7, 14],
       [13,  8, 12,  1]])
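
With a square array the shape stays the same after transposing, so here is a small sketch with a made-up 2 x 3 array (`m`, not part of the original example) where the change is visible. It also checks that `.T` and `.transpose()` give the same result.

```python
import numpy as np

# Hypothetical non-square array: 2 rows, 3 columns.
m = np.arange(6).reshape(2, 3)

print(m.shape)    # (2, 3)
print(m.T.shape)  # (3, 2)

# .T is an attribute, .transpose() is a method; both swap the axes.
same = np.array_equal(m.T, m.transpose())
print(same)  # True
```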

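Sorting orders the data numerically.

This can be done with the "np.sort()" function, which returns a sorted copy and by default sorts along the last axis (each row of a 2-D array). The ".sort()" method sorts the array in place instead.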

In [22]:
np.sort(square)

array([[ 2,  3, 13, 16],
       [ 5,  8, 10, 11],
       [ 6,  7,  9, 12],
       [ 1,  4, 14, 15]])
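
The example above sorts each row. As a sketch of the other options, the code below sorts each column with `axis=0` and shows the in-place `.sort()` method on a copy, so the original `square` stays untouched. The names `by_column` and `work` are introduced only for this example.

```python
import numpy as np

square = np.array([
    [16, 3, 2, 13],
    [5, 10, 11, 8],
    [9, 6, 7, 12],
    [4, 15, 14, 1]
])

# axis=0 sorts down each column instead of across each row.
by_column = np.sort(square, axis=0)
print(by_column)

# The .sort() method changes the array itself and returns None.
work = square.copy()
returned = work.sort()
print(returned)  # None
print(np.array_equal(work, np.sort(square)))  # True
```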

Concatenation combines two or more arrays into one.

- "np.hstack" combines arrays horizontally (side by side).
- "np.vstack" combines arrays vertically (one on top of another).
- "np.concatenate" combines arrays along an existing axis, which is axis 0 (vertical) by default.

In [25]:
a = np.array([
    [4, 8],
    [6, 1]
])

b = np.array([
    [3, 5],
    [7, 9]
])

np.hstack((a, b))

array([[4, 8, 3, 5],
       [6, 1, 7, 9]])

In [26]:
np.vstack((a, b))

array([[4, 8],
       [6, 1],
       [3, 5],
       [7, 9]])

In [27]:
np.concatenate((a, b))

array([[4, 8],
       [6, 1],
       [3, 5],
       [7, 9]])
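
By default np.concatenate gives the same result as np.vstack here. Passing `axis=1` makes it join side by side instead, as this short sketch with the same `a` and `b` shows:

```python
import numpy as np

a = np.array([
    [4, 8],
    [6, 1]
])

b = np.array([
    [3, 5],
    [7, 9]
])

# axis=1 joins along the columns, matching np.hstack for 2-D arrays.
side_by_side = np.concatenate((a, b), axis=1)
print(side_by_side)
print(np.array_equal(side_by_side, np.hstack((a, b))))  # True
```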

# Aggregation 
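Aggregation is another powerful and convenient way to analyze data: it reduces an array to one or more summary values.
NumPy has a large library of functions, but the most popular aggregation ones are .sum(), .max(), .mean(), and .std().

These are available as array methods and can be applied to a whole array or along a single axis.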

In [31]:
a.std()

2.5860201081971503
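
For completeness, here is a sketch of the other aggregation methods on the same `a` array, both over the whole array and along one axis. The variable names are chosen only for this example.

```python
import numpy as np

a = np.array([
    [4, 8],
    [6, 1]
])

total = a.sum()     # 4 + 8 + 6 + 1
largest = a.max()
average = a.mean()

print(total)    # 19
print(largest)  # 8
print(average)  # 4.75

# With an axis argument, the aggregation runs per column or per row.
col_sums = a.sum(axis=0)    # one sum per column: 10 and 9
row_means = a.mean(axis=1)  # one mean per row: 6.0 and 3.5
print(col_sums)
print(row_means)
```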