<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make objects work. We call them "boxed ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.
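As a rough illustration of that storage difference, the sketch below compares the size of one boxed Python `int` with the per-element size of a NumPy array. The exact size of a Python `int` is a CPython implementation detail and varies by platform, so no specific figure is assumed; the variable names are illustrative only.

```python
import sys
import numpy as np

# A Python int is a full object: the value plus a reference count,
# a type pointer and other bookkeeping.
boxed_size = sys.getsizeof(1)

# A NumPy int64 array stores raw 8-byte integers next to each other.
arr = np.arange(1000, dtype=np.int64)
primitive_size = arr.itemsize

print(f"Boxed int: {boxed_size} bytes, NumPy int64 element: {primitive_size} bytes")
```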

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />


In [5]:
import numpy as np

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)



## Creating NumPy Arrays from Python Lists

In [4]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [6]:
[3.14, 4, 2, 3]

[3.14, 4, 2, 3]

In [8]:
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])

In [10]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

In [12]:
a1 = np.array([1, 2, 3, 4])

In [14]:
type(a1)

numpy.ndarray

In [18]:
a2 = np.array([[1, 2, 3], [4, 5, 6]])

In [20]:
type(a2)

numpy.ndarray

In [22]:
a2.shape

(2, 3)

In [24]:
a2.ndim

2

In [26]:
a2.dtype

dtype('int32')

In [28]:
a2.size

6

Unlike Python lists, NumPy arrays are constrained to hold elements of a single type. If the types do not match, NumPy will upcast if possible (in the example above, the integers are upcast to floating point).
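A minimal sketch of this behaviour, using illustrative variable names: mixing ints and floats yields a float array, while forcing an integer `dtype` on float input truncates the fractional part.

```python
import numpy as np

# Ints mixed with one float: every element is stored as a float.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)

# Requesting an integer dtype explicitly truncates the floats instead.
truncated = np.array([1.7, 2.2], dtype=int)
print(truncated)
```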

Also unlike Python lists, NumPy arrays can be explicitly **multi-dimensional**, as `a2` above shows.

## Creating Arrays from Scratch

### `zeros`, `ones`, `full`, `arange`, `linspace`

In [34]:
np.zeros([2, 4], dtype = int)

array([[0, 0, 0, 0],
       [0, 0, 0, 0]])

In [36]:
np.ones([3, 5], dtype = float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [38]:
# Create an array filled with a linear sequence
# Starting at 0, ending before 10, stepping by 2
# (this is similar to the built-in range function)
np.arange(0, 10, 2)

array([0, 2, 4, 6, 8])

In [40]:
np.full([2, 3], 3.5)

array([[3.5, 3.5, 3.5],
       [3.5, 3.5, 3.5]])

In [42]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

### `random` 

In [52]:
np.random.random((4, 4))

array([[0.86554241, 0.95871637, 0.67287409, 0.47772868],
       [0.84434922, 0.48682894, 0.03288957, 0.29223806],
       [0.37043843, 0.21386059, 0.83224866, 0.4946439 ],
       [0.00631914, 0.80261223, 0.80081251, 0.94966149]])

In [54]:
np.random.random((4, 4))

array([[0.62136927, 0.13006924, 0.27374677, 0.19046589],
       [0.48783704, 0.0412301 , 0.04088057, 0.7578944 ],
       [0.9112785 , 0.52540017, 0.09362762, 0.57501773],
       [0.23146376, 0.6502748 , 0.95707749, 0.63135128]])

In [92]:
# Seed
np.random.seed(0)
np.random.random((4, 4))

array([[0.5488135 , 0.71518937, 0.60276338, 0.54488318],
       [0.4236548 , 0.64589411, 0.43758721, 0.891773  ],
       [0.96366276, 0.38344152, 0.79172504, 0.52889492],
       [0.56804456, 0.92559664, 0.07103606, 0.0871293 ]])

In [96]:
np.random.normal(0, 1, (3, 3))

array([[ 0.44386323,  0.33367433,  1.49407907],
       [-0.20515826,  0.3130677 , -0.85409574],
       [-2.55298982,  0.6536186 ,  0.8644362 ]])

In [98]:
np.random.randint(0, 10, (4, 5))

array([[7, 2, 0, 0, 4],
       [5, 5, 6, 8, 4],
       [1, 4, 9, 8, 1],
       [1, 7, 9, 9, 3]])

In [100]:
np.random.rand(4, 4)

array([[0.65314004, 0.17090959, 0.35815217, 0.75068614],
       [0.60783067, 0.32504723, 0.03842543, 0.63427406],
       [0.95894927, 0.65279032, 0.63505887, 0.99529957],
       [0.58185033, 0.41436859, 0.4746975 , 0.6235101 ]])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## NumPy Array Attributes

In [None]:
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

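Each array has the attributes `ndim` (the number of dimensions), `shape` (the size of each dimension), `size` (the total number of elements), and `dtype` (the element type), all shown earlier for `a2`. Other attributes include:

- `itemsize`, which lists the size (in bytes) of each array element, and 
- `nbytes`, which lists the total size (in bytes) of the array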
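A short sketch of these two attributes on a three-dimensional array. The `dtype=np.int64` argument is an assumption added here so that the element size is the same on every platform; `nbytes` is always `itemsize` times `size`.

```python
import numpy as np

# Three-dimensional array with a fixed 8-byte integer type
x3 = np.random.randint(10, size=(3, 4, 5), dtype=np.int64)

print("size:    ", x3.size)       # number of elements: 3 * 4 * 5
print("itemsize:", x3.itemsize)   # bytes per element
print("nbytes:  ", x3.nbytes)     # total bytes: itemsize * size
```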

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Indexing & Slicing
### One-dimensional subarray

In [102]:
x1 = np.random.randint(20, size=6)

In [104]:
x1

array([15, 13, 16, 17,  5,  9])

In [116]:
x1[4], x1[0], x1[-1]

(5, 15, 9)

### Slicing:
`x[start:stop:step]`
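The sketch below walks through the slice syntax on a small example array (the names are illustrative). Any of `start`, `stop`, and `step` can be omitted, and a negative `step` walks backwards. It also shows that a slice is a view on the original data rather than a copy.

```python
import numpy as np

x = np.arange(10)            # array([0, 1, ..., 9])

first_five = x[:5]           # start defaults to 0
from_five = x[5:]            # stop defaults to the end of the array
every_other = x[::2]         # step of 2
reversed_x = x[::-1]         # negative step reverses the array

# Slices are views: writing through a slice changes the original array.
# Call .copy() when an independent array is needed.
y = np.arange(5)
independent = y[:3].copy()
view = y[:3]
view[0] = 99
print(y[0], independent[0])
```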

In [136]:
x1[0:3:1]

array([15, 13, 16])

### Multi-dimensional array

In [123]:
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array

In [126]:
x2

array([[3, 4, 1, 2],
       [9, 1, 4, 6],
       [8, 2, 3, 0]])

In [128]:
x2[1,2]

4

In [130]:
x2[1, 2] = 6

In [132]:
x2

array([[3, 4, 1, 2],
       [9, 1, 6, 6],
       [8, 2, 3, 0]])

In [138]:
x2[:2,:3]

array([[3, 4, 1],
       [9, 1, 6]])

In [140]:
x2[:,:2]

array([[3, 4],
       [9, 1],
       [8, 2]])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Reshaping of Arrays & Transpose

In [147]:
grid = np.arange(1, 10)
grid.shape

(9,)

In [149]:
grid.reshape(3, 3)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [151]:
x = np.array([1, 2, 3])

In [153]:
x.shape

(3,)

In [157]:
x.reshape(1, 3).shape

(1, 3)

In [159]:
x = np.array([[1., 2.], [3., 4.]])
x

array([[1., 2.],
       [3., 4.]])

In [161]:
x.T

array([[1., 3.],
       [2., 4.]])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Concatenation and Splitting

In [165]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])

In [169]:
np.concatenate((x, y)) #axis = 0 by default

array([1, 2, 3, 3, 2, 1])

In [171]:
grid = np.array([[1, 2, 3], [4, 5, 6]])
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [175]:
np.concatenate((grid, grid), axis = 1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [None]:
# vertically stack the arrays: vstack
x = np.array([1, 2, 3])
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [177]:
np.vstack((x, grid))

array([[1, 2, 3],
       [1, 2, 3],
       [4, 5, 6]])

In [179]:
#horizontally stack the arrays: hstack
y = np.array([[99], [99]])
np.hstack((y, grid))

array([[99,  1,  2,  3],
       [99,  4,  5,  6]])

### Splitting of arrays

In [185]:
x = np.array([1, 2, 3, 4, 5, 6])
x1, x2, x3 = np.split(x, [3, 5])
x1, x2, x3

(array([1, 2, 3]), array([4, 5]), array([6]))

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics
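A few common aggregations, sketched on a small example matrix that is not part of the original notebook. The `axis` argument selects the dimension that gets collapsed: `axis=0` aggregates down the columns, `axis=1` across the rows.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

total = M.sum()              # sum of all elements
col_sums = M.sum(axis=0)     # one value per column
row_means = M.mean(axis=1)   # one value per row
col_mins = M.min(axis=0)     # smallest value in each column
overall_max = M.max()        # largest value in the whole array

print(total, col_sums, row_means, col_mins, overall_max)
```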

NumPy provides aggregations such as `sum`, `mean`, `std`, `var`, `min`, and `max`, and [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) on arrays of different sizes.

![image-broadcasting](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)
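In short, shapes are compared starting from the trailing dimension, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below checks the resulting shape with `np.broadcast` and shows that incompatible shapes raise a `ValueError`. The array names are illustrative.

```python
import numpy as np

a = np.arange(3)                 # shape (3,)
b = np.ones((3, 3))              # shape (3, 3)
c = np.arange(3).reshape(3, 1)   # shape (3, 1)

# (3,) is treated as (1, 3) and stretched to (3, 3)
shape_ab = np.broadcast(a, b).shape
# (3,) and (3, 1) are both stretched to (3, 3)
shape_ac = np.broadcast(a, c).shape
print(shape_ab, shape_ac)

# Trailing dimensions 2 and 3 differ and neither is 1: not broadcastable
error_message = None
try:
    np.ones((3, 2)) + np.arange(3)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```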

In [7]:
a = np.arange(3)

In [9]:
a

array([0, 1, 2])

In [11]:
a + 5

array([5, 6, 7])

In [13]:
b = np.ones((3, 3))

In [15]:
b

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [17]:
a.shape, b.shape

((3,), (3, 3))

In [19]:
a + b

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [21]:
a*b

array([[0., 1., 2.],
       [0., 1., 2.],
       [0., 1., 2.]])

In [29]:
c = np.arange(3).reshape((3, 1))

In [31]:
c

array([[0],
       [1],
       [2]])

In [33]:
a+c

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

### Manipulating & Comparing Arrays 

In [37]:
list_number = [1, 2, 3]

In [39]:
ll = np.array(list_number)

In [41]:
ll

array([1, 2, 3])

In [43]:
sum(ll)  # Python's built-in sum()

6

In [45]:
np.sum(ll)  # NumPy's sum()

6

In [49]:
# Create a massive NumPy array
massive_array = np.random.random(10000)
massive_array[:5]
massive_array.shape

(10000,)

In [58]:
%timeit sum(massive_array)
%timeit np.sum(massive_array)

801 μs ± 32.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
5.38 μs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [60]:
np.mean(massive_array)

0.5036348940926638

In [62]:
np.max(massive_array)

0.9998032079026611

In [64]:
np.min(massive_array)

0.00014893209111543904

In [68]:
dog_height = [600, 470, 170, 430]
dog_height = np.array(dog_height)
np.std(dog_height)

156.10493265749164

In [70]:
np.var(dog_height)

24368.75

In [72]:
np.sqrt(np.var(dog_height))

156.10493265749164

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Sorting Arrays

By default, `np.sort` uses a quicksort algorithm and returns a sorted copy of the array.


In [74]:
x = np.array([3, 2, 5, 4, 1])
np.sort(x)

array([1, 2, 3, 4, 5])

In [76]:
#A related function is argsort, which instead returns the indices of the sorted elements:
np.argsort(x)

array([4, 1, 0, 3, 2], dtype=int64)

### Sorting along rows or columns
A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows or columns of a multidimensional array using the `axis` argument.

In [86]:
np.random.seed(42)
MatA = np.random.randint(0,10, size=(4,6))

In [88]:
MatA

array([[6, 3, 7, 4, 6, 9],
       [2, 6, 7, 4, 3, 7],
       [7, 2, 5, 4, 1, 7],
       [5, 1, 4, 0, 9, 5]])

In [90]:
np.sort(MatA, axis = 0)

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])

In [92]:
np.sort(MatA, axis = 1)

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])

### Partial Sorts: Partitioning
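When only the k smallest values are needed, a full sort is unnecessary. A minimal sketch with `np.partition`, using an example array that is not from the original notebook: the element at index `k` ends up in its sorted position, all smaller elements are placed before it and all larger ones after it, in no particular order within each side.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3

# Partition around index k: the first k entries are the k smallest values
part = np.partition(x, k)
smallest_three = np.sort(part[:k])   # sorted only for display

print(part)
print(smallest_three)
```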

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [96]:
A = np.array([[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]])

In [100]:
B = np.array([[6, 5],
             [4, 3],
              [2, 1]])

In [102]:
# A (3x3) dot product B (3x2)
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [104]:
1*6 + 2*4 + 3*2

20

In [106]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [108]:
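# B (3x2) cannot be multiplied by A (3x3) directly: the inner dimensions differ (2 vs 3)
B.T  # transposing B gives a (2x3) matrix, which can be multiplied by A (3x3)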

array([[6, 4, 2],
       [5, 3, 1]])

In [110]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])

### Dot Product Example

In [113]:
#Number of jars sold
np.random.seed(0)
sales_amounts = np.random.randint(20, size=(5, 3))

In [115]:
sales_amounts

array([[12, 15,  0],
       [ 3,  3,  7],
       [ 9, 19, 18],
       [ 4,  6, 12],
       [ 1,  6,  7]])

In [117]:
#Create weekly_sales DataFrame
import pandas as pd
weekly_sales = pd.DataFrame(sales_amounts, index=["Mon", "Tues", "Wed", "Thurs", "Fri"],
                           columns = ["Almond Butter", "Peanut Butter", "Cashew butter"])

In [119]:
weekly_sales

| | Almond Butter | Peanut Butter | Cashew butter |
|---|---|---|---|
| Mon | 12 | 15 | 0 |
| Tues | 3 | 3 | 7 |
| Wed | 9 | 19 | 18 |
| Thurs | 4 | 6 | 12 |
| Fri | 1 | 6 | 7 |


In [139]:
# create a price array
prices = np.array([10, 8, 12])
prices.shape

(3,)

In [133]:
butter_prices = pd.DataFrame(prices.reshape(1, 3), index = ["Price"], columns = ["Almond Butter", "Peanut Butter", "Cashew butter"])

In [135]:
butter_prices

| | Almond Butter | Peanut Butter | Cashew butter |
|---|---|---|---|
| Price | 10 | 8 | 12 |


In [145]:
weekly_sales.shape, butter_prices.shape

((5, 3), (1, 3))

In [153]:
total_prices = weekly_sales.dot(butter_prices.T)

In [155]:
total_prices

| | Price |
|---|---|
| Mon | 240 |
| Tues | 138 |
| Wed | 458 |
| Thurs | 232 |
| Fri | 142 |


In [157]:
weekly_sales["total prices"] = total_prices

In [159]:
weekly_sales

| | Almond Butter | Peanut Butter | Cashew butter | total prices |
|---|---|---|---|---|
| Mon | 12 | 15 | 0 | 240 |
| Tues | 3 | 3 | 7 | 138 |
| Wed | 9 | 19 | 18 | 458 |
| Thurs | 4 | 6 | 12 | 232 |
| Fri | 1 | 6 | 7 | 142 |
