# Coding Temple's Data Analytics Program  
---
# Advanced Python - Intro to `numpy`
---



## Part 1: Working with `numpy`


### 1.1 Importing `numpy`

We've already used the `numpy` package by importing it and assigning it the standard alias `np`. Do this again in the following cell; the more you practice typing these lines of code, the easier they will be to remember.

```python
# Import numpy and assign it the standard alias
import numpy as np
```

### 1.1 Solution - Run this cell to check your answer in 1.1. Please do not edit the values in this cell!

```python
# DO NOT EDIT THIS CELL
assert np.__name__ == 'numpy', 'Make sure that you have properly imported numpy and aliased it as np!'
```

### 1.2 Generate random numbers

Create a `(5,3)` `numpy` array of random integer values between 0 and 100.

Use the `np.random` module (for example, `np.random.randint()`) to generate these integers. Name your new variable `myarray`, and print the array to check its dimensions and values.

```python
# Generate your random numbers
np.random.seed(1)  # seed for reproducibility

# randint excludes the upper bound, so 101 allows values from 0 up to and including 100
myarray = np.random.randint(0, 101, size=(5, 3))

# Print out the array
print(myarray)
```

### 1.2 Solution - Run the following cell to check your answer.

```python
# DO NOT EDIT THIS CELL

# Verify the array was created with the correct name and has the proper shape
assert myarray.shape == (5,3), 'Make sure you create an array with the proper shape!'
```
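The exercise uses the legacy `np.random.seed` / `np.random.randint` functions. Newer NumPy code usually creates a `Generator` with `np.random.default_rng()` instead. A minimal sketch, assuming NumPy 1.17 or later; the names `rng` and `gen_array` are illustrative and chosen so they do not overwrite `myarray`:

```python
import numpy as np

# Assumption: NumPy >= 1.17, which provides the Generator API used here
rng = np.random.default_rng(seed=1)

# endpoint=True makes the upper bound inclusive, so both 0 and 100 can appear
gen_array = rng.integers(0, 100, size=(5, 3), endpoint=True)

print(gen_array)
print("shape:", gen_array.shape, "dtype:", gen_array.dtype)
```

Seeding the generator (rather than the global state) keeps reproducibility local to `rng`, so other cells drawing random numbers do not shift its sequence.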

### 1.3 Calculate BMI

Using the two lists provided, calculate the BMI (body mass index) of each individual using NumPy arrays (`ndarray`s). Save your results in a variable named `bmi`.

With weight in pounds and height in inches, BMI is defined as: $\text{BMI} = \frac{703 \times \text{weight}}{\text{height}^2}$

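```python
height = [55, 120, 90, 100]    # inches
weight = [170, 180, 190, 200]  # pounds

# Convert the lists to arrays so the arithmetic below is applied element-wise
np_height = np.array(height)
np_weight = np.array(weight)

bmi = (703 * np_weight) / (np_height ** 2)
```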

### 1.3 Solution: Run the following cell to check your answer.

```python
assert 'bmi' in dir(), 'Make sure you have saved your results to the proper variable name!'
assert type(bmi) == np.ndarray, 'Make sure that you made the calculation using an NDArray for both height and weight!'
```
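The point of using arrays here is that `*`, `/` and `**` act element-wise, so one expression computes every BMI at once. Worked by hand for the first person: 703 × 170 / 55² = 119,510 / 3,025 ≈ 39.51; for the fourth: 703 × 200 / 100² = 14.06. A small sketch comparing the vectorized form with an equivalent Python loop (the names `heights_in`, `weights_lb`, `bmi_vec` and `bmi_loop` are illustrative):

```python
import numpy as np

# Values copied from the exercise: heights in inches, weights in pounds
heights_in = np.array([55, 120, 90, 100])
weights_lb = np.array([170, 180, 190, 200])

# Vectorized: *, / and ** are applied position by position across the arrays
bmi_vec = 703 * weights_lb / heights_in ** 2

# The same calculation written as a pure-Python loop over paired values
bmi_loop = [703 * w / h ** 2 for w, h in zip(weights_lb, heights_in)]

print(np.round(bmi_vec, 2))
```

Both give the same numbers; the vectorized version is shorter and, for large arrays, much faster because the loop runs inside NumPy.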

### 1.4 Create a function

Create a function named `my_func` that takes two parameters and returns a random matrix based on them. Extra: add parameters that let the user choose the shape and data type of the matrix.

```python
def my_func(low, high, shape=(6, 3), dtype=int):
    """Return a matrix of random integers drawn from [low, high)."""
    # low/high/dtype avoid shadowing Python's built-in min, max and type
    return np.random.randint(low, high, size=shape, dtype=dtype)

print(my_func(0, 15, (3, 2)))

RA = my_func(2, 25, (10, 5))

print(RA)
```

```
[[ 6  8]
 [ 3  8]
 [ 9 12]]
[[20  5  7  9 12]
 [ 8  4 17  2  5]
 [18  9 16 23  7]
 [19  9  3 14  7]
 [16 24 17 14  4]
 [ 8 12 20 24  9]
 [ 5 22 15 21  3]
 [ 5 15  4 22  3]
 [12 11 23 21  4]
 [13 10 18 22  9]]
```
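`np.random.randint` generates integers only, so the "data type" extra is limited to integer dtypes. One way to also support floats is sketched below. `random_matrix` and its `seed` parameter are hypothetical names, not part of the exercise, and the sketch assumes NumPy 1.17+ for `default_rng`:

```python
import numpy as np

def random_matrix(low, high, shape=(6, 3), dtype=int, seed=None):
    """Hypothetical variant of my_func: random values in [low, high) with a chosen dtype."""
    rng = np.random.default_rng(seed)
    if np.issubdtype(np.dtype(dtype), np.integer):
        # Integer dtypes: draw whole numbers, upper bound excluded
        return rng.integers(low, high, size=shape, dtype=dtype)
    # Any other dtype: draw uniform floats, then cast to the requested type
    return rng.uniform(low, high, size=shape).astype(dtype)

ints = random_matrix(0, 15, shape=(3, 2), seed=42)
floats = random_matrix(0.0, 1.0, shape=(2, 2), dtype=np.float32, seed=42)

print(ints)
print(floats, floats.dtype)
```

Passing a `seed` makes each call reproducible on its own, independent of how many other random draws happened earlier in the notebook.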


### 1.5 Array practice

Time for some more practice. Complete each of the following tasks in its own code cell below:

1. Return the first row
2. Return the last column
3. Return the third column values from the 4th and 5th rows
4. Multiply every value in the array by 2
5. Divide every value by 3
6. Increase the values in the first row by 12
7. Calculate the mean of the first column
8. Calculate the median of the array _after_ removing the 2 smallest values in the array
9. Calculate the standard deviation of the first 3 rows
10. Return values greater than 25 in the second column
11. Return values less than 40 in the array

```python
# 1. Return the first row
RA[0, :]
```

```
array([20,  5,  7,  9, 12])
```

```python
# 2. Return the last column (all rows, column index -1)
RA[:, -1]
```

```
array([12,  5,  7,  7,  4,  9,  3,  3,  4,  9])
```

```python
# 3. Return the third column values from the 4th and 5th rows
RA[3:5, 2]
```

```
array([ 3, 17])
```
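In `[row, column]` indexing the row position comes first, so `RA[-1, :]` is the last row while `RA[:, -1]` is the last column. A small deterministic array, `a`, chosen here purely for illustration, makes the difference easy to see; it also shows that basic slices are views that share memory with the original array:

```python
import numpy as np

a = np.arange(1, 16).reshape(5, 3)  # [[1 2 3] [4 5 6] [7 8 9] [10 11 12] [13 14 15]]

first_row = a[0]            # same as a[0, :]              -> [1 2 3]
last_row = a[-1, :]         # the last ROW                 -> [13 14 15]
last_col = a[:, -1]         # the last COLUMN              -> [ 3  6  9 12 15]
rows_4_5_col_3 = a[3:5, 2]  # 4th and 5th rows, 3rd column -> [12 15]

# Basic slices are views: writing through one changes the original array
first_row[0] = 99
print(a[0, 0])  # 99
```

Call `.copy()` on a slice when you need an independent array.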

```python
# 4. Multiply every value in the array by 2
RA * 2
```

```
array([[40, 10, 14, 18, 24],
       [16,  8, 34,  4, 10],
       [36, 18, 32, 46, 14],
       [38, 18,  6, 28, 14],
       [32, 48, 34, 28,  8],
       [16, 24, 40, 48, 18],
       [10, 44, 30, 42,  6],
       [10, 30,  8, 44,  6],
       [24, 22, 46, 42,  8],
       [26, 20, 36, 44, 18]])
```

```python
# 5. Divide every value by 3
RA / 3
```

```
array([[6.66666667, 1.66666667, 2.33333333, 3.        , 4.        ],
       [2.66666667, 1.33333333, 5.66666667, 0.66666667, 1.66666667],
       [6.        , 3.        , 5.33333333, 7.66666667, 2.33333333],
       [6.33333333, 3.        , 1.        , 4.66666667, 2.33333333],
       [5.33333333, 8.        , 5.66666667, 4.66666667, 1.33333333],
       [2.66666667, 4.        , 6.66666667, 8.        , 3.        ],
       [1.66666667, 7.33333333, 5.        , 7.        , 1.        ],
       [1.66666667, 5.        , 1.33333333, 7.33333333, 1.        ],
       [4.        , 3.66666667, 7.66666667, 7.        , 1.33333333],
       [4.33333333, 3.33333333, 6.        , 7.33333333, 3.        ]])
```
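Note that `RA * 2` stays an integer array while `RA / 3` comes back as floats: `/` is true division and always returns `float64`. A sketch on a small illustrative array contrasting `/` with floor division `//`, and showing why in-place `/=` fails on an integer array:

```python
import numpy as np

a = np.array([[20, 5], [8, 4]])

doubled = a * 2     # int * int stays int        -> [[40 10] [16  8]]
true_div = a / 3    # `/` always returns float64
floor_div = a // 3  # `//` rounds down, keeps int -> [[6 1] [2 1]]

# In-place true division cannot store float results back into an int array
try:
    a /= 3
    inplace_ok = True
except TypeError as err:  # NumPy raises a TypeError subclass (a casting error)
    inplace_ok = False
    print("a /= 3 failed:", type(err).__name__)
```

If you need to modify an integer array in place with division, either use `//=` or convert it first with `a = a.astype(float)`.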

```python
# 6. Increase the values in the first row by 12
# `+=` modifies RA in place, so re-running this cell adds another 12 each time
RA[0, :] += 12

print(RA)
```

```
[[32 17 19 21 24]
 [ 8  4 17  2  5]
 [18  9 16 23  7]
 [19  9  3 14  7]
 [16 24 17 14  4]
 [ 8 12 20 24  9]
 [ 5 22 15 21  3]
 [ 5 15  4 22  3]
 [12 11 23 21  4]
 [13 10 18 22  9]]
```

```python
# 7. Calculate the mean of the first column
np.mean(RA[:, 0])
```

```
13.6
```
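Because `RA[0, :] += 12` changes `RA` in place, every extra run of that cell adds another 12 (running it twice adds 24 and shifts the column mean accordingly). A sketch of two re-run-safe alternatives on a small illustrative array; `original`, `shifted_row` and `updated` are names chosen for the example:

```python
import numpy as np

original = np.array([[20, 5, 7],
                     [8, 4, 17]])

# `+` builds a NEW array; `original` is left unchanged
shifted_row = original[0] + 12  # -> [32 17 19]

# Copy first, then update the copy in place: re-running this cell
# always starts again from `original`, so 12 is only ever added once
updated = original.copy()
updated[0] += 12

print(original[0], shifted_row, updated[0])
```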

```python
# 8. Calculate the median of the array after removing the 2 smallest values
# np.sort sorts ascending, so [2:] drops the two smallest values ([:-2] would drop the two largest)
np.median(np.sort(RA.flatten())[2:])
```

```
14.5
```
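Which end of a sorted array you slice off matters: after an ascending sort the smallest values are at the front. A sketch on a small illustrative array, using `axis=None` so that `np.sort` flattens and sorts in one step:

```python
import numpy as np

a = np.array([[9, 1, 7],
              [3, 8, 2]])

flat_sorted = np.sort(a, axis=None)  # flattens, then sorts ascending -> [1 2 3 7 8 9]
drop_two_smallest = flat_sorted[2:]  # -> [3 7 8 9]
drop_two_largest = flat_sorted[:-2]  # -> [1 2 3 7]

print(np.median(drop_two_smallest))  # (7 + 8) / 2 = 7.5
print(np.median(drop_two_largest))   # (2 + 3) / 2 = 2.5
```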

```python
# 9. Calculate the standard deviation of the first 3 rows
# Generate a new array to work on
np.random.seed(2)  # new seed for a new array

NA = np.random.randint(5, 65, (8, 6))

np.std(NA[:3, :])
```

```
14.421242048527658
```
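`np.std` on a slice such as `NA[:3, :]` pools every value in those rows and, by default (`ddof=0`), divides by N, which gives the population standard deviation. A sketch on a small illustrative array showing the `ddof` and `axis` options:

```python
import numpy as np

rows = np.array([[2, 4, 4, 4],
                 [5, 5, 7, 9],
                 [0, 0, 0, 0]])

first_two = rows[:2]                        # first 2 rows (rows[:3] would take 3)
pooled_pop = np.std(first_two)              # all 8 values pooled, divides by N = 8 -> 2.0
pooled_sample = np.std(first_two, ddof=1)   # sample std: divides by N - 1 = 7
per_row = np.std(first_two, axis=1)         # one std per row instead of one overall

print(pooled_pop, pooled_sample, per_row)
```

Use `ddof=1` when the values are a sample and you want the sample standard deviation; `axis` controls whether the rows are pooled or treated separately.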

```python
# 10. Return values in the second column greater than 25
NA[:, 1][NA[:, 1] > 25]
```

```
array([57, 47, 38, 51])
```

```python
# 11. Return values < 40 in the array
NA[NA < 40]
```

```
array([20, 13, 27, 23, 16, 12, 39, 36, 16, 26, 36, 31, 25,  8,  9, 38,  8,
       10, 29,  9, 11, 36, 24, 36,  7, 21, 17])
```
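A 1-D Boolean mask built from one column has one entry per row, so applying it to the whole 2-D array returns entire rows rather than that column's values; a 2-D mask, by contrast, returns a flat array of the matching elements. An illustrative array shows the difference:

```python
import numpy as np

a = np.array([[57, 90, 62],
              [12, 10, 43],
              [32, 26, 8]])

mask = a[:, 1] > 25          # one True/False per ROW: [ True False  True]

whole_rows = a[mask]         # 1-D mask on a 2-D array selects whole rows
col_values = a[mask, 1]      # only the 2nd-column values: [90 26]
same_values = a[:, 1][mask]  # equivalent: take the column first, then filter

below_40 = a[a < 40]         # 2-D mask -> flat 1-D array: [12 10 32 26  8]
```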

### Solution 1.5: Run the following cell to view the solution for each of the above tasks.

The solution cell generates a new array, so its values will differ from those in your array, but the code for each task still applies.

```python
# DO NOT EDIT THIS CELL
# SOLUTION 1.5

# SOLUTION: Make sure you have completed all of the above tasks
# Generate your random numbers (NEW ARRAY)
myarray = np.random.randint(100, size=(5, 3))
print('The generated array: \n', myarray)
print('\n')

# 1. Return the first row:
print('1. The first row: ', myarray[0])
print('\n')

# 2. Return the last column
print('2. The last column: ', myarray[:,-1])
print('\n')

# 3. Return the third column values from the 4th and 5th rows
print('3. The 3rd column, 4th & 5th rows: ', myarray[3:5,2])
print('\n')

# 4. Multiply every value in the array by 2
# (operates on the original array)
print('4. Multiply by 2: \n', myarray * 2)
print('\n')

# 5. Divide every value by 3
# (operates on the original array)
print('5. Divide by 3: \n', myarray / 3)
print('\n')

# 6. Increase the values in the first row by 12
# (operates on the original array)
print('6. Add 12 to the first row: \n', myarray[0,:] + 12)
print('\n')

# 7. Calculate the mean of the first column
print('7. The mean of the 1st column: ', myarray[:,0].mean())
print('\n')

# 8. Calculate the median of the array after removing the 2 smallest values in the array
# flatten and sort (axis=None does the flattening)
myarray = np.sort(myarray, axis=None)
# remove two smallest values
myarray = myarray[2:]
# calculate the median
print('8. The median after removing the 2 smallest values: ', np.median(myarray))
print('\n')

# 9. Calculate the standard deviation of the first 3 rows
# Generate new array first:
myarray = np.random.randint(100, size=(5, 3))
# Then calculate the std:
print('9. The standard deviation is: ', np.std(myarray[0:3,:]))
print('\n')

# 10. Return values in the second column greater than 25
# create a Boolean mask where values in the 2nd column > 25 are True
condition = myarray[:,1] > 25
# apply the mask to the rows and keep only column 1 (the 2nd column)
print('10. All values in 2nd column > 25: \n', myarray[condition, 1])
print('\n')

# 11. Return values < 40 in the array
# create another Boolean mask for values < 40
condition = myarray < 40
# apply the mask
print('11. All values < 40: \n', myarray[condition])
```

```
The generated array: 
 [[50  4 90]
 [63 79 49]
 [39 46  8]
 [50 15  8]
 [17 22 73]]


1. The first row:  [50  4 90]


2. The last column:  [90 49  8  8 73]


3. The 3rd column, 4th & 5th rows:  [ 8 73]


4. Multiply by 2: 
 [[100   8 180]
 [126 158  98]
 [ 78  92  16]
 [100  30  16]
 [ 34  44 146]]


5. Divide by 3: 
 [[16.66666667  1.33333333 30.        ]
 [21.         26.33333333 16.33333333]
 [13.         15.33333333  2.66666667]
 [16.66666667  5.          2.66666667]
 [ 5.66666667  7.33333333 24.33333333]]


6. Add 12 to the first row: 
 [ 62  16 102]


7. The mean of the 1st column:  43.8


8. The median after removing the 2 smallest values:  49.0


9. The standard deviation is:  28.801148811106582


10. All values in 2nd column > 25: 
 [90 96 26 60]


11. All values < 40: 
 [32 26  8 10 34  9]
```
