# Coding Temple's Data Analytics Program  
---
# Advanced Python 1 - Intro to `numpy`
---



## Part 1: Working with `numpy`


### 1.1 Importing `numpy`

We've already used the `numpy` package by importing it and assigning it the standard alias of `np`. Do this again in the following cell - the more you practice typing these lines of code, the easier it will be to remember.

In [428]:
# Import numpy and assign it the standard alias

import numpy as np 

### 1.1 Solution - Run this cell to check your answer in 1.1. Please do not edit the values in this cell!

In [447]:
# DO NOT EDIT THIS CELL
assert np.__name__ == 'numpy', 'Make sure that you have properly imported numpy and aliased it as np!'

### 1.2 Generate random numbers

Create a `(5,3)` `numpy` array of random integer values between 0 and 100.

Use the `random` module in `numpy` (for example, `np.random.randint()`) to generate these integers. Name your new variable `myarray`. You should also print the array to check its dimensions and values.

In [465]:
import numpy as np

np.random.seed(1)  # Fix the seed so the output is reproducible
myarray = np.random.randint(0, 101, size=(5, 3))
print(myarray)


[[37 12 72]
 [ 9 75  5]
 [79 64 16]
 [ 1 76 71]
 [ 6 25 50]]
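
As a side note, newer NumPy code often draws random integers through a `Generator` object instead of the global `np.random.seed` state. The sketch below is an alternative, not the method this exercise requires. The names `rng` and `sample` are illustrative, and the values it produces will differ from the array printed above.

```python
import numpy as np

# Assumption: a local Generator seeded with 1; its stream differs from np.random.seed(1)
rng = np.random.default_rng(1)

# integers() excludes the upper bound by default, so 101 allows values 0 through 100
sample = rng.integers(0, 101, size=(5, 3))

print(sample.shape)  # (5, 3)
```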


### 1.2 Solution - Run the following cell to check your answer.

In [482]:
#DO NOT EDIT THIS CELL

#Verify the array was created with the correct name and has the proper shape
assert myarray.shape == (5,3), 'Make sure you create an array with the proper shape!'

### 1.3 Calculate BMI 

Using the two lists provided, calculate the BMI (body mass index) of each individual using NDArrays (`numpy` arrays). Save your results in a variable named `bmi`.

The formula for BMI in pounds and inches is: $BMI = \frac{703 \times weight}{height^2}$

In [498]:
import numpy as np

height = [55, 120, 90, 100]
weight = [170, 180, 190, 200]

weights_arr = np.array(weight)
heights_arr = np.array(height)
bmi = (703 * weights_arr) / (heights_arr ** 2)
print(bmi)


[39.50743802  8.7875     16.49012346 14.06      ]
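
To see why the calculation calls for NDArrays, the short sketch below compares the same arithmetic on plain lists and on arrays. It reuses the `height` and `weight` lists from the cell above; `list_power_works`, `bmi_loop`, and `bmi_vectorized` are illustrative names.

```python
import numpy as np

height = [55, 120, 90, 100]
weight = [170, 180, 190, 200]

# Plain lists do not support element-wise arithmetic: list ** int raises TypeError
try:
    height ** 2
    list_power_works = True
except TypeError:
    list_power_works = False

# With lists, the formula has to be applied one pair at a time
bmi_loop = [703 * w / h ** 2 for w, h in zip(weight, height)]

# With arrays, the same formula applies to every element at once
bmi_vectorized = 703 * np.array(weight) / np.array(height) ** 2

print(list_power_works)                       # False
print(np.allclose(bmi_loop, bmi_vectorized))  # True
```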


### 1.3 Solution: Run the following cell to check your answer.

In [513]:
# DO NOT EDIT THIS CELL
assert 'bmi' in dir() , 'Make sure you have saved your results to the proper variable name!'
assert type(bmi) == np.ndarray, 'Make sure that you made the calculation using an NDArray for both height and weight!'


### 1.4 Create a function 

Create a function named `my_func` that takes in two parameters and creates a random matrix based on those parameters. Extra: Accept additional parameters that let the user choose the shape and data type of the matrix.

In [540]:
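import numpy as np

rows = 3
columns = 5
shape = (5, 5)
dtype = float

def my_func(rows, columns, shape, dtype):
    """Return the top-left (rows, columns) block of a random matrix with the given shape and dtype."""
    # Draw uniform values in [0, 1) with the requested shape, then cast to dtype
    matrix = np.random.rand(*shape).astype(dtype)
    # Keep only the requested number of rows and columns
    return matrix[:rows, :columns]

random_matrix = my_func(rows, columns, shape, dtype)
print(random_matrix)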


[[0.50524609 0.0652865  0.42812233 0.09653092 0.12715997]
 [0.59674531 0.226012   0.10694568 0.22030621 0.34982629]
 [0.46778748 0.20174323 0.64040673 0.48306984 0.50523672]]
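
One possible way to approach the extra part of this task is to give the optional settings default values, so the function still works when only two arguments are passed. The sketch below is an assumption about what that could look like, not the expected answer; `make_random_matrix`, `low`, and `high` are hypothetical names.

```python
import numpy as np

def make_random_matrix(rows, columns, dtype=float, low=0, high=100):
    """Return a (rows, columns) matrix of random values in [low, high) cast to dtype."""
    # Uniform floats in [0, 1), rescaled to [low, high)
    values = low + (high - low) * np.random.rand(rows, columns)
    # Casting to an integer dtype truncates the fractional part
    return values.astype(dtype)

float_matrix = make_random_matrix(2, 4)
int_matrix = make_random_matrix(3, 2, dtype=int)

print(float_matrix.shape, float_matrix.dtype)  # (2, 4) float64
print(int_matrix.shape)                        # (3, 2)
```

Because `dtype`, `low`, and `high` have defaults, a caller only supplies them when the defaults do not fit.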


### 1.5 Array practice

Time for some more practice. Complete each of these tasks in its own code cell below:

1. Return the first row
2. Return the last column
3. Return the third column values from the 4th and 5th rows
4. Multiply every value in the array by 2
5. Divide every value by 3
6. Increase the values in the first row by 12
7. Calculate the mean of the first column
8. Calculate the median of the array _after_ removing the 2 smallest values in the array
9. Calculate the standard deviation of the first 3 rows
10. Return values greater than 25 in the second column
11. Return values less than 40 in the array

In [553]:
# 1. Return the first row
# Work on myarray, the (5, 3) integer array created in 1.2
first_row = myarray[0, :]
print(f'1st row: {first_row}')

1st row: [37 12 72]


In [565]:
# 2. Return the last column
# Work on myarray, the (5, 3) integer array created in 1.2
last_column = myarray[:, -1]
print(f'Last column: {last_column}')

Last column: [72  5 16 71 50]


In [576]:
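# 3. Return the third column values from the 4th and 5th rows
# random_matrix has only 3 rows, so rows 4 and 5 do not exist there;
# use myarray, the (5, 3) integer array created in 1.2
third_column_subset = myarray[3:5, 2]
print(f'Third column, 4th and 5th rows: {third_column_subset}')

Third column, 4th and 5th rows: [71 50]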


In [586]:
# 4. Multiply every value in the array by 2
# Work on myarray, the (5, 3) integer array created in 1.2
multiplied_matrix = myarray * 2
print(f'Matrix multiplied by 2: {multiplied_matrix}')

Matrix multiplied by 2: [[ 74  24 144]
 [ 18 150  10]
 [158 128  32]
 [  2 152 142]
 [ 12  50 100]]


In [595]:
# 5. Divide every value by 3
# Work on myarray, the (5, 3) integer array created in 1.2
divided_matrix = myarray / 3
print(f'Matrix divided by 3: {divided_matrix}')

Matrix divided by 3: [[12.33333333  4.         24.        ]
 [ 3.         25.          1.66666667]
 [26.33333333 21.33333333  5.33333333]
 [ 0.33333333 25.33333333 23.66666667]
 [ 2.          8.33333333 16.66666667]]


In [603]:
# 6. Increase the values in the first row by 12
# Work on a copy of myarray (the (5, 3) array from 1.2) so that
# re-running this cell does not keep adding 12 to the original
increased_matrix = myarray.copy()
increased_matrix[0, :] += 12
print(f'Matrix with first row increased by 12: {increased_matrix}')

Matrix with first row increased by 12: [[49 24 84]
 [ 9 75  5]
 [79 64 16]
 [ 1 76 71]
 [ 6 25 50]]
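
Note that `+=` on an array slice changes the array in place, so re-running such a cell adds 12 again each time, while `+` returns a new array and leaves the original untouched. A minimal sketch of the difference, using a small made-up array named `demo`:

```python
import numpy as np

demo = np.array([[1, 2, 3],
                 [4, 5, 6]])

# `+` builds a new array; demo itself is unchanged
shifted = demo[0, :] + 12
print(shifted)   # [13 14 15]
print(demo[0])   # [1 2 3]

# `+=` writes into demo; running this line twice would add 24 in total
demo[0, :] += 12
print(demo[0])   # [13 14 15]
```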


In [610]:
# 7. Calculate the mean of the first column
# Work on myarray, the (5, 3) integer array created in 1.2
first_column_mean = np.mean(myarray[:, 0])
print(f'Mean first column: {first_column_mean}')

Mean first column: 26.4


In [616]:
# 8. Calculate the median of the array after removing the 2 smallest values in the array
# Work on myarray, the (5, 3) integer array created in 1.2
# axis=None flattens the array before sorting
sorted_values = np.sort(myarray, axis=None)
# Drop the two smallest values, which now sit at the front
removed_smallest_values = sorted_values[2:]
median_after_removal = np.median(removed_smallest_values)
print(f'Median without 2 lowest values is: {median_after_removal}')

Median without 2 lowest values is: 50.0


In [621]:
# 9. Calculate the standard deviation of the first 3 rows
# Work on myarray, the (5, 3) integer array created in 1.2
# (setting a new seed alone does not create a new array, so that line is dropped)
std_first_three_rows = np.std(myarray[:3, :])
print(f'Standard deviation first 3 rows: {std_first_three_rows:.2f}')

Standard deviation first 3 rows: 29.61
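
By default `np.std` computes the population standard deviation (it divides by `n`). Passing `ddof=1` gives the sample standard deviation instead (it divides by `n - 1`). A short sketch of the difference, using a small made-up array named `values`:

```python
import numpy as np

values = np.array([1, 2, 3, 4])

# Population standard deviation: sqrt(sum((x - mean)^2) / n)
population_std = np.std(values)

# Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))
sample_std = np.std(values, ddof=1)

print(round(population_std, 4))  # 1.118
print(round(sample_std, 4))      # 1.291
```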


In [625]:
# 10. Return values in the second column greater than 25
# Work on myarray, the (5, 3) integer array created in 1.2
second_column = myarray[:, 1]
values_greater_than_25 = second_column[second_column > 25]
print(f'Values greater than 25 in the second column: {values_greater_than_25}')

Values greater than 25 in the second column: [75 64 76]


In [628]:
# 11. Return values < 40 in the array
# Work on myarray, the (5, 3) integer array created in 1.2
values_less_than_40 = myarray[myarray < 40]
print(f'Values less than 40 in the array: {values_less_than_40}')

Values less than 40 in the array: [37 12  9  5 16  1  6 25]


### 1.5 Solution: Run the following cell to view the solution for each of the above tasks.

A new array will be generated to demonstrate the solution, so the values will not match your array, but the code for each task still applies.

In [630]:
# DO NOT EDIT THIS CELL
# SOLUTION 1.5

# SOLUTION: Make sure you have completed all of the above tasks
# Generate your random numbers (NEW ARRAY)
myarray = np.random.randint(100, size=(5, 3))
print('The generated array: \n', myarray)
print('\n')

# 1. Return the first row:
print('1. The first row: ', myarray[0])
print('\n')

# 2. Return the last column
print('2. The last column: ', myarray[:,-1])
print('\n')

# 3. Return the third column values from the 4th and 5th rows
print('3. The 3rd column, 4th & 5th rows: ', myarray[3:5,2])
print('\n')

# 4. Multiply every value in the array by 2
# (operates on the original array)
print('4. Multiply by 2: \n', myarray * 2)
print('\n')

# 5. Divide every value by 3
# (operates on the original array)
print('5. Divide by 3: \n', myarray / 3)
print('\n')

# 6. Increase the values in the first row by 12
# (operates on the original array)
print('6. Add 12 to the first row: \n', myarray[0,:] + 12)
print('\n')

# 7. Calculate the mean of the first column
print('7. The mean of the 1st column: ', myarray[:,0].mean())
print('\n')

# 8. Calculate the median of the array after removing the 2 smallest values in the array
# flatten and sort (axis=None does the flattening)
myarray = np.sort(myarray, axis=None)
# remove two smallest values
myarray = myarray[2:]
# calculate the median
print('8. The median after removing the 2 smallest values: ', np.median(myarray))
print('\n')

# 9. Calculate the standard deviation of the first 3 rows
# Generate new array first:
myarray = np.random.randint(100, size=(5, 3))
# Then calculate the std:
print('9. The standard deviation is: ', np.std(myarray[0:3,:]))
print('\n')

# 10. Return values in the second column greater than 25
# create a Boolean mask where values in the 2nd column > 25 are True
condition = myarray[:,1] > 25
# Apply the mask
print('10. All values in 2nd column > 25: \n', myarray[condition])
print('\n')

# 11. Return values < 40 in the array
# create another Boolean mask for values < 40
condition = myarray < 40
# apply the mask
print('11. All values < 40: \n', myarray[condition])

The generated array: 
 [[74 72 90]
 [99 91 96]
 [16 55 21]
 [43 93 80]
 [40 70 74]]


1. The first row:  [74 72 90]


2. The last column:  [90 96 21 80 74]


3. The 3rd column, 4th & 5th rows:  [80 74]


4. Multiply by 2: 
 [[148 144 180]
 [198 182 192]
 [ 32 110  42]
 [ 86 186 160]
 [ 80 140 148]]


5. Divide by 3: 
 [[24.66666667 24.         30.        ]
 [33.         30.33333333 32.        ]
 [ 5.33333333 18.33333333  7.        ]
 [14.33333333 31.         26.66666667]
 [13.33333333 23.33333333 24.66666667]]


6. Add 12 to the first row: 
 [ 86  84 102]


7. The mean of the 1st column:  54.4


8. The median after removing the 2 smallest values:  74.0


9. The standard deviation is:  20.325134959234862


10. All values in 2nd column > 25: 
 [[37 59 17]
 [15 30 77]
 [26 39 63]
 [49 27  0]]


11. All values < 40: 
 [37 17 15 30 26 39 20 22 27  0]
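
One detail worth noticing in the output for task 10: indexing with a row mask alone, as in `myarray[condition]`, returns every column of the matching rows. To get only the second-column values, also select the column. A small sketch with a made-up array named `demo`:

```python
import numpy as np

demo = np.array([[10, 30, 5],
                 [20, 15, 7],
                 [40, 60, 9]])

# True for rows whose second-column value is greater than 25
condition = demo[:, 1] > 25

# Mask only: whole rows where the condition holds (rows 1 and 3 here)
matching_rows = demo[condition]

# Mask plus column index: just the second-column values
matching_values = demo[condition, 1]

print(matching_rows.shape)  # (2, 3)
print(matching_values)      # [30 60]
```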
