# Programming Assignment 9

## Objectives 

**The purpose of this assignment is to:**

- Practice creating an ndarray using np.random.randint() 

- Practice accessing parts of the ndarray using slice. 

- Use a DataFrame to represent the same data, but with convenient indices and column headers. 

## Assignment Instructions

We will use an array to represent the grades of students in the class, randomly generate some grades, and then do some common array manipulations.  We will suppose that we have 10 students in the class, and each student completes 4 exams, all of which are equally weighted in their final grade.  We’ll group the code into functions. 

#### **Function 1:** rand_array(low, high, num_students, num_exams) 

Write a function that takes 4 arguments: 

- low represents the lowest number that should be randomly generated for a grade 

- high represents the highest number (inclusive) that should be randomly generated for a grade 

- num_students contains the number of students in the class 

- num_exams contains the number of exams that are give in the class 

This function should then return a 2D array which has num_students number of rows, and num_exams number of columns, containing random numbers between low and high (inclusive). 

##### Function 1 Hints (click the arrow on the left to access)

**Hints (it may help to write these steps as comments in your function):**
1) Important:  The first line of your code inside the function must set the random seed to be 0.  Include this line exactly: 

    `np.random.seed(0)` 

This line ensures that your program uses the same starting value as the autograder (and all other students, too!) for the random numbers it generates.  Recall that a pseudorandom number generator will generate a random distribution of numbers, but that the numbers are based on the starting seed. 

2. Calculate the total number of exams that you need to generate (by multiplying the number of students by the number of exams in the course). 

3. Use np.random.randint() to create enough random numbers for those exams, with grades falling between low and high (inclusive).  This function will return a 1D list. 

4. Then, use .reshape() to modify list to be an array with num_student rows and num_exams columns. (Note that.reshape() does not change the original ndarray, it instead returns a new ndarray.). Store the result in a final variable and return it. 

#### **Function 2**: build_student_df(grade_array) 

Write a function that will take a 2D array (such as what is returned from the first function) and convert it to a Pandas DataFrame with custom column names and custom row names.  Specifically, the names of the columns should be “Exam 1”, “Exam 2”, “Exam 3”, and “Exam 4”.  The names of the rows should be the letters “A” through “J”, in order.  Return this DataFrame from the function.

##### Function 2 Hints (click the arrow on the left to access)

Once you have the array converted to a DataFrame, you can use the function .rename() to change the row and column names.  You can check your notes or review the lectures, or find another source on the internet, for example this page: https://www.geeksforgeeks.org/python-change-column-names-and-row-indexes-in-pandas-dataframe/  

## Let's Start Coding!

In [1]:
# get all dependencies
import numpy as np
import pandas as pd
import inspect

### Step 1: Code your Functions

#### **Function 1:** rand_array(low, high, num_students, num_exams) 

In [109]:
def rand_array(low, high, num_students, num_exams):
    ''' generate a 2D array of num_students rows and num_exams columns
        which contain random numbers between the low and high
        integer values (inclusive), returning the 2D array'''
    np.random.seed(0)
    arr = np.random.randint(low, high, size =(num_students, num_exams))
    return arr


In [94]:
# For basic testing
exam_grades = rand_array(70, 100, 10, 4)
print(exam_grades)

[[82 85 91 70]
 [73 97 73 77]
 [79 89 91 88]
 [74 93 76 94]
 [94 82 96 71]
 [76 77 93 84]
 [94 87 75 95]
 [83 78 79 90]
 [89 86 89 75]
 [85 85 70 88]]


#### **Function 2**: build_student_df(grade_array) 

In [110]:
def build_student_df(grade_array):
    ''' take a 2D ndarray and convert to a Pandas Dataframe with custom row and column labels'''
    names = [_ for _ in 'ABCDEFGHIJ']
    names1 = [_ for _ in ('Exam 1', 'Exam 2', 'Exam 3', 'Exam 4')]
    df = pd.DataFrame(grade_array, index=names, columns=names1)
    return df

In [111]:
# For basic testing
build_student_df(exam_grades) 

Unnamed: 0,Exam 1,Exam 2,Exam 3,Exam 4
A,82,85,91,70
B,73,97,73,77
C,79,89,91,88
D,74,93,76,94
E,94,82,96,71
F,76,77,93,84
G,94,87,75,95
H,83,78,79,90
I,89,86,89,75
J,85,85,70,88


### Step 2: Testing your code with the Main program

The main program has been provided for you, to use in testing.  Notice how the same data (for example, Exam 2 scores) can be retrieved from either the array or the DataFrame, but the exact methods used for retrieving the data are slightly different. 

As always, feel free to modify the code in main to further your understanding, but be sure that the two functions meet the requirements listed above. 

In [121]:
def main():
    # create exam grades for 10 students, 4 exams each, with random numbers between 70 and 100
    grades = rand_array(70, 100, 10, 4)
    # Print the result -- this is a 2D list
    print('The student grades in a 2D list are:')
    print(grades)
    # Print just the Exam 2 grades
    print('\nFrom the 2D list, we can print just Exam 2 grades:')
    print(grades[:,1])
    
    # now, convert this to a DataFrame
    student_grades_df= build_student_df(grades)
    # Print the result -- this is a DataFrame
    print('The student grades in a DataFrame are:')
    print(student_grades_df)
    # Print just the Exam 2 grades
    print('\nFrom the DataFrame, we can print just Exam 2 grades:')
    print(student_grades_df['Exam2']) # col
    print('\nWe can also print all the grades of student "A":')
    print(student_grades_df.loc['A']) # row

main()

The student grades in a 2D list are:
[[82 85 91 70]
 [73 97 73 77]
 [79 89 91 88]
 [74 93 76 94]
 [94 82 96 71]
 [76 77 93 84]
 [94 87 75 95]
 [83 78 79 90]
 [89 86 89 75]
 [85 85 70 88]]

From the 2D list, we can print just Exam 2 grades:
[85 97 89 93 82 77 87 78 86 85]
The student grades in a DataFrame are:
   Exam 1  Exam 2  Exam 3  Exam 4
A      82      85      91      70
B      73      97      73      77
C      79      89      91      88
D      74      93      76      94
E      94      82      96      71
F      76      77      93      84
G      94      87      75      95
H      83      78      79      90
I      89      86      89      75
J      85      85      70      88

From the DataFrame, we can print just Exam 2 grades:


KeyError: 'Exam2'

### Step 3: Graded Test Cases
The below cells are locked in this notebook. You do not need to modify them, but you can run them to test your code before submitting for final grading, or press the "Validate" button in your lab to automatically run these all at once.

In [120]:
### GRADED TEST CASE 1: 
'''FUNCTION 1 - A 2D List of exam grades is generated from a 4 parameter function'''
param_check = bool

try:
    grades = rand_array(70, 100, 10, 4)
    param_check = True
    data_type = type(grades)
except Exception as error:
    param_check = False
    
assert str(data_type) == "<class 'numpy.ndarray'>" and param_check == True, "Please verify that your function takes 4 arguments and has a data type of a 2D list"

In [119]:
### GRADED TEST CASE 2: 
'''FUNCTION 1 - Function includes random generation elements'''

random_check = bool

func_lines = inspect.getsource(rand_array)
code_lines = func_lines.splitlines()
for line in code_lines:
    if "np.random.seed(0)" in line or "np.random.randint" in line:
        random_check = True
assert random_check == True, "Please verify you've used np.random.seed and np.random.randint"

In [118]:
###  GRADED TEST CASE 3: 
'''FUNCTION 2 - Function delivers an output of a DataFrame type'''

student_grades_df= build_student_df(grades)
data_type = type(student_grades_df)
assert str(data_type) == "<class 'pandas.core.frame.DataFrame'>", "Please verify that your code generates a DataFrame."

In [117]:
### GRADED TEST CASE 4: 
'''FUNCTION 2 - Row names “Exam 1”, “Exam 2”, “Exam 3”, and “Exam 4”. The names of the rows should be the letters “A” through “J”, in order. '''
col_check = bool
row_check = bool

def cleanString(string):
    clean_string = string.replace(" ", "")
    clean_string = clean_string.replace("'", "")
    clean_string = clean_string.replace('"', "")
    return clean_string

df = build_student_df(exam_grades) 
exp_column_list = "['Exam 1','Exam 2','Exam 3','Exam 4']"
exp_row_list = "['A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J']"

submit_column_list = str(list(df.columns.values))
submit_rows_list = str(df.index.values)

## Sanitize columns and rows from submission 
submit_col_clean = cleanString(submit_column_list)
submit_rows_clean = cleanString(submit_rows_list)

## Sanitize columns and rows from expected solution
exp_col_clean = cleanString(exp_column_list)
exp_rows_clean = cleanString(exp_row_list)

assert submit_col_clean == exp_col_clean, "Please verify your column headers are correct. Your columns are %s" % submit_column_list
assert submit_rows_clean ==  exp_rows_clean, "Please verify your row values are correct. Your rows are %s" % submit_rows_list

### Step 4: Submit your Work for Grading
Congratulations on completing this assignment.

To receive a final score for your work, please select the "Submit Assignment" button at the top of your lab.