# Coding Temple's Data Analytics Program  
---
# Advanced Python - Intro to `numpy`
---



## Part 1: Working with `numpy`


### 1.1 Importing `numpy`

We've already used the `numpy` package by importing it and assigning it the standard alias `np`. Do this again in the following cell. The more you practice typing these lines of code, the easier they will be to remember.

In [2]:
# Import numpy and assign it the standard alias
# YOUR CODE HERE

import numpy as np

### 1.1 Solution - Run this cell to check your answer in 1.1. Please do not edit the values in this cell!

In [2]:
# DO NOT EDIT THIS CELL
assert np.__name__ == 'numpy', 'Make sure that you have properly imported numpy and aliased it as np!'

### 1.2 Generate random numbers

Create a `(5,3)` `numpy` array of random integer values between 0 and 100.

Use numpy's `np.random.randint()` function to generate these integers. Name your new variable `myarray`. You should also print the array to check its dimensions and values.

In [14]:
# Generate your random numbers
np.random.seed(1)  # Seed the generator for reproducibility

#YOUR CODE HERE
# randint's upper bound is exclusive, so 101 allows values up to 100
myarray = np.random.randint(101, size=(5, 3))

# Print out the array
myarray

array([[37, 12, 72],
       [ 9, 75,  5],
       [79, 64, 16],
       [ 1, 76, 71],
       [ 6, 25, 50]])
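NumPy also provides a newer Generator API through `np.random.default_rng()`. The sketch below is an optional alternative to `np.random.randint`, not something the exercise requires. Note that `integers()` excludes `high` by default, and that the same seed produces different values than the legacy `np.random.seed` functions, so `alt_array` (an illustrative name) will not match `myarray`. It also shows attributes you can check instead of reading the printed array.

```python
import numpy as np

# Create a seeded Generator (alternative to np.random.seed + np.random.randint)
rng = np.random.default_rng(1)

# high is exclusive by default, so 101 gives integers from 0 to 100 inclusive
alt_array = rng.integers(0, 101, size=(5, 3))

# Inspect the array's structure directly
print(alt_array.shape)   # (5, 3)
print(alt_array.ndim)    # 2
print(alt_array.dtype)   # integer dtype; exact width depends on the platform
print(alt_array.min() >= 0, alt_array.max() <= 100)   # True True
```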

### 1.2 Solution - Run the following cell to check your answer.

In [15]:
#DO NOT EDIT THIS CELL

#Verify the array was created with the correct name and has the proper shape
assert myarray.shape == (5,3), 'Make sure you create an array with the proper shape!'

### 1.3 Calculate BMI 

Using the two lists provided, calculate the BMI (body mass index) of each individual using NumPy arrays (`ndarray`s). Save the variable containing your results as `bmi`.

The formula for BMI in pounds and inches is: $\text{BMI} = \frac{703 \times \text{weight}}{\text{height}^2}$

In [19]:
height = [55, 120, 90, 100]
weight = [170, 180, 190, 200]

# Scratch check: arrays add element-wise (this is not the BMI formula)
np.array(height) + np.array(weight)

array([225, 300, 280, 300])
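The cell above works because `np.array` turns each list into an array, and arithmetic on arrays is element-wise. The same operators behave very differently on plain Python lists, which is why the lists need converting before computing BMI. A quick comparison (the names `h` and `w` are illustrative):

```python
import numpy as np

height = [55, 120, 90, 100]
weight = [170, 180, 190, 200]

# With plain Python lists, + concatenates and * repeats
print(height + weight)   # [55, 120, 90, 100, 170, 180, 190, 200]
print(height * 2)        # [55, 120, 90, 100, 55, 120, 90, 100]

# With arrays, the same operators work element by element
h = np.array(height)
w = np.array(weight)
print(h + w)             # [225 300 280 300]
print(h * 2)             # [110 240 180 200]
```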

In [24]:
height = [55, 120, 90, 100]
weight = [170, 180, 190, 200]

bmi = (703 * np.array(weight)) / np.array(height) ** 2
bmi

array([39.50743802,  8.7875    , 16.49012346, 14.06      ])

### 1.3 Solution: Run the following cell to check your answer.

In [25]:
assert 'bmi' in dir() , 'Make sure you have saved your results to the proper variable name!'
assert type(bmi) == np.ndarray, 'Make sure that you made the calculation using an NDArray for both height and weight!'

### 1.4 Create a function 

Create a function named `my_func` that takes two parameters and returns a random matrix based on those parameters. Extra: add parameters that let the user choose the shape and data type of the matrix.

In [26]:
np.random.rand(5,2)

array([[0.84631092, 0.31327352],
       [0.52454816, 0.44345289],
       [0.22957721, 0.53441391],
       [0.91396202, 0.45720481],
       [0.43069857, 0.93912779]])

In [27]:
def my_func(rows, cols):
    """Return a rows x cols matrix of random floats in [0, 1)."""
    random_matrix = np.random.rand(rows, cols)
    
    return random_matrix

In [28]:
my_func(3, 4)

array([[0.77838924, 0.71597052, 0.8027575 , 0.09280081],
       [0.51815255, 0.86502025, 0.82914691, 0.82960336],
       [0.27304997, 0.0592432 , 0.67052804, 0.59306552]])
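For the "Extra" part of the task, one possible approach is sketched below. The function name `my_func_extended` and its `low`, `high`, and `seed` parameters are hypothetical choices for this example, not part of the assignment. It draws integers for integer dtypes and uniform floats otherwise, using NumPy's `default_rng` Generator.

```python
import numpy as np

def my_func_extended(shape=(3, 4), dtype=float, low=0, high=100, seed=None):
    """Hypothetical helper: return a random matrix with the given shape and dtype.

    Integer dtypes draw whole numbers from [low, high); other dtypes draw
    uniform floats between low and high and cast them to the requested dtype.
    """
    rng = np.random.default_rng(seed)
    if np.issubdtype(np.dtype(dtype), np.integer):
        # integers() excludes high by default
        return rng.integers(low, high, size=shape, dtype=dtype)
    return rng.uniform(low, high, size=shape).astype(dtype)

# Usage: a 2x5 integer matrix and a 1-D float32 vector
m = my_func_extended(shape=(2, 5), dtype=np.int32, seed=0)
print(m.shape, m.dtype)   # (2, 5) int32

f = my_func_extended(shape=(4,), dtype=np.float32, seed=1)
print(f.dtype)            # float32
```

Passing `shape` as a tuple, rather than separate `rows` and `cols`, lets the same function build arrays of any dimension.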

### 1.5 Array practice

Time for some more practice. Complete each of the following tasks in its own code cell below:

1. Return the first row
2. Return the last column
3. Return the third column values from the 4th and 5th rows
4. Multiply every value in the array by 2
5. Divide every value by 3
6. Increase the values in the first row by 12
7. Calculate the mean of the first column
8. Calculate the median of the array _after_ removing the 2 smallest values in the array
9. Calculate the standard deviation of the first 3 rows
10. Return the values greater than 25 in the second column
11. Return the values less than 40 in the array

In [29]:
myarray

array([[37, 12, 72],
       [ 9, 75,  5],
       [79, 64, 16],
       [ 1, 76, 71],
       [ 6, 25, 50]])

In [38]:
# 1. Return the first row:
myarray[0, :]  # row 0, all columns

array([37, 12, 72])

In [36]:
# 2. Return the last column
myarray[:, -1]  # all rows, last column

array([72,  5, 16, 71, 50])
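Indexing with a single integer drops that dimension, while a slice keeps it. This matters when you need a column to stay 2-D (for example, to stack it with other columns). The sketch below hard-codes the same values as `myarray` above so it runs on its own; `arr` is an illustrative name.

```python
import numpy as np

# Same values as myarray above, hard-coded so this cell runs on its own
arr = np.array([[37, 12, 72],
                [ 9, 75,  5],
                [79, 64, 16],
                [ 1, 76, 71],
                [ 6, 25, 50]])

# A single integer index drops that dimension
print(arr[0, :].shape)    # (3,)
print(arr[:, -1].shape)   # (5,)

# A slice keeps the dimension, so the result stays 2-D
print(arr[0:1, :].shape)  # (1, 3)
print(arr[:, -1:].shape)  # (5, 1)

# arr[0] is shorthand for arr[0, :]
print(np.array_equal(arr[0], arr[0, :]))  # True
```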

In [39]:
# 3. Return the third column values from the 4th and 5th rows

myarray[3:5,2]

array([71, 50])
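The task counts rows from 1 ("4th and 5th"), but NumPy indexes from 0, and a slice's end index is exclusive. The sketch below (again using a hard-coded copy of the array) shows three equivalent ways to select those values.

```python
import numpy as np

arr = np.array([[37, 12, 72],
                [ 9, 75,  5],
                [79, 64, 16],
                [ 1, 76, 71],
                [ 6, 25, 50]])

# "4th and 5th rows" are indices 3 and 4; the slice end (5) is exclusive
print(arr[3:5, 2])      # [71 50]

# A list of indices selects the same rows explicitly
print(arr[[3, 4], 2])   # [71 50]

# Negative indices count from the end: the last two rows
print(arr[-2:, 2])      # [71 50]
```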

In [40]:
# 4. Multiply every value in the array by 2
myarray * 2

array([[ 74,  24, 144],
       [ 18, 150,  10],
       [158, 128,  32],
       [  2, 152, 142],
       [ 12,  50, 100]])

In [43]:
# 5. Divide every value by 3
myarray / 3

array([[12.33333333,  4.        , 24.        ],
       [ 3.        , 25.        ,  1.66666667],
       [26.33333333, 21.33333333,  5.33333333],
       [ 0.33333333, 25.33333333, 23.66666667],
       [ 2.        ,  8.33333333, 16.66666667]])
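Multiplying an integer array by an integer keeps integers, but `/` is true division and always returns floats, which is why the output above shows decimals. If you want integer results, floor division `//` keeps the integer dtype. A small sketch with the first two rows of the array:

```python
import numpy as np

arr = np.array([[37, 12, 72],
                [ 9, 75,  5]])

print(arr.dtype.kind)    # i (signed integer)
print((arr / 3).dtype)   # float64: true division returns floats
print(arr // 3)          # floor division keeps integers, rounding down
```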

In [51]:
# 6. Increase the values in the first row by 12
myarray[0, :] + 12  # returns a new array; myarray itself is unchanged

array([49, 24, 84])
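The expression above returns a new array and leaves the original untouched. If you actually want to change the stored values, use augmented assignment on the row. The sketch below works on a `.copy()` so the original data is preserved; `shifted` and `updated` are illustrative names.

```python
import numpy as np

arr = np.array([[37, 12, 72],
                [ 9, 75,  5]])

# Expression form: builds a new array, arr is unchanged
shifted = arr[0, :] + 12
print(shifted)     # [49 24 84]
print(arr[0])      # [37 12 72]

# In-place form: modify a copy so the original array is preserved
updated = arr.copy()
updated[0, :] += 12
print(updated)
```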

In [54]:
# 7. Calculate the mean of the first column
np.mean(myarray[:, 0])

26.4
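Instead of slicing out one column, you can pass `axis` to compute the statistic for every column (or row) at once. `axis=0` collapses the rows and gives one value per column; `axis=1` collapses the columns and gives one value per row. A sketch using a hard-coded copy of the array:

```python
import numpy as np

arr = np.array([[37, 12, 72],
                [ 9, 75,  5],
                [79, 64, 16],
                [ 1, 76, 71],
                [ 6, 25, 50]])

# axis=0 collapses the rows, giving one mean per column
col_means = arr.mean(axis=0)
print(col_means[0])     # 26.4 (same as np.mean(arr[:, 0]))

# axis=1 collapses the columns, giving one mean per row
row_means = arr.mean(axis=1)
print(row_means.shape)  # (5,)
```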

In [50]:
# 8. Calculate the median of the array after removing the 2 smallest values in the array
# axis=None flattens the array before sorting; [2:] drops the two smallest values
trimmed = np.sort(myarray, axis=None)[2:]
np.median(trimmed)

50.0

In [3]:
# 9. Calculate the standard deviation of the first 3 rows
# Generate a new array to work on
np.random.seed(2) # New seed for new array

array_std = np.random.randint(low=2, high=93, size=(3,3))
print(array_std)
std_d = np.std(array_std, axis=0)  # axis=0: one standard deviation per column
print(std_d)


[[42 17 74]
 [24 45 84]
 [77  9 36]]
[22.00504993 15.4344492  20.67741656]
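"The standard deviation of the first 3 rows" can mean one value per column (`axis=0`, as above) or a single value over all the selected numbers (the default `axis=None`, which the solution cell below uses). `np.std` also computes the population standard deviation by default; `ddof=1` gives the sample version. A sketch using the same values as `array_std` above:

```python
import numpy as np

# Same values as array_std printed above
a = np.array([[42, 17, 74],
              [24, 45, 84],
              [77,  9, 36]])

first_three = a[0:3, :]

# One population standard deviation per column
print(np.std(first_three, axis=0))

# A single standard deviation over all nine values (axis=None is the default)
print(np.std(first_three))

# ddof=1 divides by n - 1 instead of n, giving the sample standard deviation
print(np.std(first_three, ddof=1))
```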


In [8]:
# 10. Return values in the second column greater than 25
condition = array_std[:, 1] > 25   # Boolean mask: [False  True False]
print(array_std[:, 1][condition])  # apply the mask to the second column

[45]
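Boolean masks can combine several conditions with `&` (and) or `|` (or); each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`. Applying a mask built from one column to the whole 2-D array selects entire rows rather than single values. A sketch with the same values as `array_std` (names are illustrative):

```python
import numpy as np

a = np.array([[42, 17, 74],
              [24, 45, 84],
              [77,  9, 36]])

col = a[:, 1]                  # second column: [17 45  9]

# Combine conditions with & (and) or | (or); parenthesize each comparison
mask = (col > 5) & (col < 40)
print(col[mask])               # [17  9]

# Indexing the 2-D array with the mask keeps whole rows
print(a[col > 25])             # [[24 45 84]]
```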


In [10]:
# 11. Return values < 40 in the array
mask = array_std < 40   # Boolean mask over the whole array
print(array_std[mask])  # boolean indexing returns a 1-D array of the matches


[17 24  9 36]
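`np.where` has two forms. With one argument it returns the row and column indices where a condition holds; with three arguments it builds a new array, choosing between two values element by element, without modifying the original (assigning through the indices instead would change the array in place). A sketch with the same values as `array_std`; `replaced` is an illustrative name:

```python
import numpy as np

a = np.array([[42, 17, 74],
              [24, 45, 84],
              [77,  9, 36]])

# One-argument form: the (row, col) positions where the condition holds
rows, cols = np.where(a < 40)
print(rows, cols)                  # [0 1 2 2] [1 0 1 2]

# Three-argument form: a NEW array with values < 40 replaced by 39
replaced = np.where(a < 40, 39, a)
print(a[0, 1], replaced[0, 1])     # 17 39
```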


### Solution 1.5: Run the following cell to view the solution for each of the above tasks.

A new array will be generated to demonstrate the solution, so its values will differ from those in your array, but the code for each task still applies.

In [11]:
# DO NOT EDIT THIS CELL
# SOLUTION 1.5

# SOLUTION: Make sure you have completed all of the above tasks
# Generate your random numbers (NEW ARRAY)
myarray = np.random.randint(100, size=(5, 3))
print('The generated array: \n', myarray)
print('\n')

# 1. Return the first row:
print('1. The first row: ', myarray[0])
print('\n')

# 2. Return the last column
print('2. The last column: ', myarray[:,-1])
print('\n')

# 3. Return the third column values from the 4th and 5th rows
print('3. The 3rd column, 4th & 5th rows: ', myarray[3:5,2])
print('\n')

# 4. Multiply every value in the array by 2
# (operates on the original array)
print('4. Multiply by 2: \n', myarray * 2)
print('\n')

# 5. Divide every value by 3
# (operates on the original array)
print('5. Divide by 3: \n', myarray / 3)
print('\n')

# 6. Increase the values in the first row by 12
# (operates on the original array)
print('6. Add 12 to the first row: \n', myarray[0,:] + 12)
print('\n')

# 7. Calculate the mean of the first column
print('7. The mean of the 1st column: ', myarray[:,0].mean())
print('\n')

# 8. Calculate the median of the array after removing the 2 smallest values in the array
# flatten and sort (axis=None does the flattening)
myarray = np.sort(myarray, axis=None)
# remove two smallest values
myarray = myarray[2:]
# calculate the median
print('8. The median after removing the 2 smallest values: ', np.median(myarray))
print('\n')

# 9. Calculate the standard deviation of the first 3 rows
# Generate new array first:
myarray = np.random.randint(100, size=(5, 3))
# Then calculate the std:
print('9. The standard deviation is: ', np.std(myarray[0:3,:]))
print('\n')

# 10. Return values in the second column greater than 25
# create a Boolean mask where values in the 2nd column > 25 are True
condition = myarray[:,1] > 25
# Apply the mask
print('10. All values in 2nd column > 25: \n', myarray[:, 1][condition])
print('\n')

# 11. Return values < 40 in the array
# create another Boolean mask for values < 40
condition = myarray < 40
# apply the mask
print('11. All values < 40: \n', myarray[condition])

The generated array: 
 [[49 95 75]
 [85 47 63]
 [31 90 20]
 [37 39 67]
 [ 4 42 51]]


1. The first row:  [49 95 75]


2. The last column:  [75 63 20 67 51]


3. The 3rd column, 4th & 5th rows:  [67 51]


4. Multiply by 2: 
 [[ 98 190 150]
 [170  94 126]
 [ 62 180  40]
 [ 74  78 134]
 [  8  84 102]]


5. Divide by 3: 
 [[16.33333333 31.66666667 25.        ]
 [28.33333333 15.66666667 21.        ]
 [10.33333333 30.          6.66666667]
 [12.33333333 13.         22.33333333]
 [ 1.33333333 14.         17.        ]]


6. Add 12 to the first row: 
 [ 61 107  87]


7. The mean of the 1st column:  41.2


8. The median after removing the 2 smallest values:  51.0


9. The standard deviation is:  16.6332999331662


10. All values in 2nd column > 25: 
 [33 69 46 83 80]


11. All values < 40: 
 [38 33 31]
