# Troubleshooting Exam Solution Variables (results/data) Template

#### This template shows some techniques for troubleshooting when your solution does not match the exam solution.

The troubleshooting below applies to all exam solutions, no matter what the exercise requires you to return (list, dict, dataframe, number, etc.).

## Note that these are examples of what you can use to troubleshoot your exam data.

## These examples are NOT EXHAUSTIVE, and they are meant to be REPRESENTATIVE of the techniques you can use when troubleshooting.

## These examples represent the KIND OF THINKING that you should be doing, but the code here WILL NOT catch every single error case.

#### For every exam question, you will have a markdown cell similar to the one below, describing the variables available to you for troubleshooting. These four variables are available to you after every exercise on the exam.

Here we show how these variables are formatted, and how to explore them when troubleshooting.

### Every exam exercise will have this text, describing the test variables available to you for troubleshooting.

*************************************************************************************************************************

The test cell below will check your solution against several randomly generated test cases. If your solution does not pass the test (or if you're just curious), you can look at the variables used in the latest test run. They are automatically imported for you as part of the test.

- `input_vars` - Dictionary containing all of the inputs to your function. Keys are the parameter names.
- `original_input_vars` - Dictionary containing a copy of all the inputs to your function. This is useful for debugging failures related to your solution modifying the input. Keys are the parameter names.
- `returned_output_vars` - Dictionary containing the outputs your function generated. If there are multiple outputs, the keys will match the names mentioned in the exercise instructions.
- `true_output_vars` - Dictionary containing the outputs your function **should have** generated. If there are multiple outputs, the keys will match the names mentioned in the exercise instructions.
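As a minimal sketch of how `original_input_vars` can help, the snippet below reports which parameters no longer match their original copy after the function has run. The helper name `find_modified_inputs` and the sample dictionaries are hypothetical stand-ins, not part of the exam. The comparison uses `.equals()` for pandas objects and `==` for everything else, so inputs such as NumPy arrays would need their own comparison.

```python
import copy


def find_modified_inputs(input_vars, original_input_vars):
    """Return the names of parameters whose value differs from the original copy."""
    modified = []
    for name, original in original_input_vars.items():
        current = input_vars[name]
        if hasattr(original, "equals"):
            # pandas DataFrames and Series: == is element-wise, so use .equals()
            same = original.equals(current)
        else:
            same = original == current
        if not same:
            modified.append(name)
    return modified


# Hypothetical stand-ins for the dictionaries the test cell creates.
original_input_vars = {"values": [3, 1, 2], "reverse": False}
input_vars = copy.deepcopy(original_input_vars)
input_vars["values"].sort()  # simulates a solution that sorts its input in place

print(find_modified_inputs(input_vars, original_input_vars))  # ['values']
```

An empty list means the inputs checked this way were left unchanged.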

#### The main variables that you will want to troubleshoot are the output variables, the last two listed above. 

If your code is not working, you will want to compare the result that your code returned **(returned_output_vars)** against the solution **(true_output_vars)**.

Note that these variables are returned in a dictionary, so you will need to write code to access that dictionary to see the actual variables.

## When your solution is being tested, the test is looking at 5 things
(really 4, since items 2 and 3 below are the same size check, applied according to the variable type):

1. The true and returned outputs have the same data type.
2. If the solution is longer than a single value (list/dict), the true and returned variables are the same length.
3. If your solution is a dataframe, the true and returned variables have the same shape (rows and columns).
4. If your solution is a dataframe, the column names are the same, and in the same order.
5. The actual data itself is the same, no matter the data structure (list, dict, df, etc.).
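The checks listed above can be sketched as a single helper that stops at the first difference it finds. This is an illustration of the order of checks only, not the exam's actual test code. The name `diagnose` is hypothetical, and the exam may compare values differently (for example, with a tolerance for floating-point numbers). DataFrames are detected by their attributes, so the sketch does not import pandas itself.

```python
def diagnose(returned, true):
    """Return a short message naming the first check that fails."""
    # Check 1: same data type.
    if type(returned) is not type(true):
        return (f"type mismatch: returned {type(returned).__name__}, "
                f"expected {type(true).__name__}")

    # Duck-typed test for a DataFrame-like object.
    if hasattr(true, "columns") and hasattr(true, "shape"):
        # Check 3: same shape (rows, columns).
        if returned.shape != true.shape:
            return f"shape mismatch: returned {returned.shape}, expected {true.shape}"
        # Check 4: same column names, in the same order.
        if list(returned.columns) != list(true.columns):
            return "column names or column order differ"
        # Check 5: same data (DataFrame.equals also compares dtypes and the index).
        if not returned.equals(true):
            return "values, dtypes, or index differ"
        return "no differences found"

    # Check 2: same length, for anything that has a length.
    if hasattr(true, "__len__") and len(returned) != len(true):
        return f"length mismatch: returned {len(returned)}, expected {len(true)}"
    # Check 5: same data.
    if returned != true:
        return "values differ"
    return "no differences found"


print(diagnose((1, 2, 3), [1, 2, 3]))  # type mismatch: returned tuple, expected list
print(diagnose([1, 2], [1, 2, 3]))     # length mismatch: returned 2, expected 3
print(diagnose([1, 2, 4], [1, 2, 3]))  # values differ
print(diagnose([1, 2, 3], [1, 2, 3]))  # no differences found
```

The first example is the kind of difference that is easy to miss by eye: a tuple and a list with the same contents look almost identical when printed.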

### There are two general methods to troubleshoot.

1. You can visually inspect and compare the two outputs. This is what many students do, and they fail, because a visual inspection of the data does not test all 4 scenarios above.
2. You can run some code to test the 4 testing scenarios above. Below we are providing template code to do this. You can copy and paste the below code into your exam and run it to test your outputs.

#### Use the below cell as template code to print the different variables. Uncomment the lines for the variables that you want to print. You can insert a new cell and copy in this code, below the cell that gives you the error message on your exercise.

Printing the output variables allows you to do a visual inspection for differences.

This is not necessarily the only way that you should test, but it is a good starting point, to get familiar with what the output variables look like.

In [None]:
# # ONLY UNCOMMENT IF YOU NEED IT!!!!
# # BEWARE THAT THIS COULD GENERATE VOLUMINOUS OUTPUT!!!!!


# print('input_vars')
# print(input_vars)
# print('original_input_vars')
# print(original_input_vars)


# print('returned_output_vars')
# print(returned_output_vars)
# print('true_output_vars')
# print(true_output_vars)

#### Use the below two cells as template code to run some specific tests to compare your returned output against the true output. Uncomment the lines for the variables that you want to compare. You can insert a new cell and copy in this code, below the cell that gives you the error message on your exercise, in the exam.

The output variables are always returned in the form of a dictionary. The key of the dictionary is the name of the outputted variable, while the value of the dictionary is the actual exercise solution result.

*******************************************************

If you are required to return a single result (list, dict, df, number), then the key name will be something like "output_0". As the dict has only a single key-value pair, the key name in this case is not important.
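Because the key name does not matter in the single-result case, one way to get at the value is to take the first (and only) value of each dictionary. The dictionaries below are hypothetical stand-ins for the ones the test cell creates, and the key `"output_0"` is only an example.

```python
# Hypothetical stand-ins; on the exam these come from the test cell.
returned_output_vars = {"output_0": [3, 1, 2]}
true_output_vars = {"output_0": [1, 2, 3]}

# Take the only value from each dictionary without needing the key name.
returned_value = next(iter(returned_output_vars.values()))
true_value = next(iter(true_output_vars.values()))

print(type(returned_value), len(returned_value))
print(type(true_value), len(true_value))
print(returned_value == true_value)  # False: same elements, different order
```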


********************************************************

If you are required to return multiple results (lists, dicts, dfs, numbers), then there will be multiple key-value pairs in the dict, and the key name of each pair will be the name of the variable you were asked to return.

*********************************************************

For example, if you are asked to return the mean, median, and max of a dataset, then there will be three key-value pairs in the dictionary, and the key names will be "mean", "median", and "max".
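With several outputs, it can help to compare each returned value only against the true value stored under the same key. The sketch below assumes both dictionaries use the same key names, and the sample values are made up for illustration.

```python
# Hypothetical stand-ins for a three-output exercise.
true_output_vars = {"mean": 2.0, "median": 2, "max": 3}
returned_output_vars = {"mean": 2.0, "median": 2.0, "max": 4}

report = {}
for key, true_val in true_output_vars.items():
    returned_val = returned_output_vars[key]  # assumes the same key exists
    types_match = type(returned_val) is type(true_val)
    values_match = returned_val == true_val
    report[key] = (types_match, values_match)
    print(f"{key}: types match = {types_match}, values match = {values_match}")
```

Here `median` has an equal value but a different type (`float` instead of `int`), while `max` has the right type but the wrong value.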

#### This first cell runs all of the checks on the true_output_vars variables (the solution), then runs all of the checks on the returned_output_vars variables (your code output).

In [None]:
# # template code to check the output variables on the exam
# # this code checks the true output variables (the solution your output is being compared against)
# # Checks all of the true output variables, then all of the returned output variables


##********************************************************
# # this code checks the true output variables (solution)

# for k,v in true_output_vars.items():

##*** Use these checks for returning lists, dicts, sets, values************
#     # check for datatype (list, dict)
#     print(type(v))
    
#     # check for the length of the solution (use for lists and dicts)
#     print(len(v))
    
#     # print the output for visual inspection
#     print(v)
##**********************************************************

##*** Use these checks for returning pandas dfs************
#     
#     # check for datatype (df)
#     print(type(v))

#     # check the datatypes of individual columns in a pandas df
#     print(v.dtypes)
   
#     # check the shape of a pandas df
#     rows_soln = v.shape[0]
#     cols_soln = v.shape[1]
#     print(rows_soln)
#     print(cols_soln)
    
#     # check the column names of a pandas df
#     my_list = v.columns.values.tolist()
#     print(my_list)
    
#     # check the first 5 values of a pandas df
#     true_df = v.copy()
#     print(true_df.head(5))
#
#     # use the df.info() method (it prints its report directly and returns None)
#     v.info()

##*******************************************************


##*******************************************************
# # this code checks your code's output variables
# for k,v in returned_output_vars.items():

##*** Use these checks for returning lists, dicts, sets, values************
#     # check for datatype (list, dict, df)
#     print(type(v))
    
#     # check for the length of the solution (use for lists and dicts)
#     print(len(v))
    
#     # print the output for visual inspection
#     print(v)
##**********************************************************
    
##*** Use these checks for returning pandas dfs************
#     # check for datatype (df)
#     print(type(v))

#     # check the datatypes of individual columns in a pandas df
#     print(v.dtypes)
    
#     # check the shape of a pandas df
#     rows_soln = v.shape[0]
#     cols_soln = v.shape[1]
#     print(rows_soln)
#     print(cols_soln)
    
#     # check the column names of a pandas df
#     my_ret_list = v.columns.values.tolist()
#     print(my_ret_list)
    
#     # check the first 5 values of a pandas df
#     ret_df = v.copy()
#     print(ret_df.head(5))
#
#     # use the df.info() method (it prints its report directly and returns None)
#     v.info()

##**************************************************

#### The cell below runs each of the checks individually, in a loop that checks both the true and returned variables in the same loop. Each loop performs a different set of checks, so use the loop applicable for your variable types.

In [None]:
# # template code to check the output variables on the exam
# # this code compares your returned output against the true output (the solution)
# # Checks the true output and returned output variables in a single loop

# for kr,vr in returned_output_vars.items():
#     # check for datatype (list, dict, df)
#     print('returned var type')
#     print(type(vr))
#     for kt,vt in true_output_vars.items():
#         # check for datatype (list, dict, df)
#         print('true var type')
#         print(type(vt))
#         if type(vr) == type(vt):
#             print('data types match')
#         else:
#             print('data types incorrect')
        
# for kr,vr in returned_output_vars.items():
#     # check for column datatypes (list, dict, string, object)
#     print('returned var column types')
#     # check the datatypes of individual columns in a pandas df
#     print(vr.dtypes)
#     for kt,vt in true_output_vars.items():
#         # check for column datatypes (list, dict, string, object)
#         print('true var column types')
#         # check the datatypes of individual columns in a pandas df
#         print(vt.dtypes)
        
# for kr,vr in returned_output_vars.items():
#     # check the shape of a pandas df
#     print('returned var df shape')
#     r_rows_soln = vr.shape[0]
#     r_cols_soln = vr.shape[1]
#     print(r_rows_soln)
#     print(r_cols_soln)
#     for kt,vt in true_output_vars.items():
#         # check the shape of a pandas df
#         print('true var df shape')
#         t_rows_soln = vt.shape[0]
#         t_cols_soln = vt.shape[1]
#         print(t_rows_soln)
#         print(t_cols_soln)
        
# for kr,vr in returned_output_vars.items():
#     # check the column names of a pandas df
#     print('returned var df column names')
#     r_my_list = vr.columns.values.tolist()
#     print(r_my_list)
#     for kt,vt in true_output_vars.items():
#         # check the column names of a pandas df
#         print('true var df column names')
#         t_my_list = vt.columns.values.tolist()
#         print(t_my_list)      


#### Finally, if your output is a dataframe, and you need to compare the actual df values between your output and the solution, you can use the pandas compare() function to see the actual rows that are different.

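The caveat is that this only works if you have already verified that all of the following are true. compare() requires both DataFrames to have identical row and column labels, and raises a ValueError otherwise.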

1. Your dataframe has the same shape as the true output dataframe.
2. Your dataframe's column names are the same as the true output dataframe.
3. Your dataframe column data types are the same as the true output dataframe.

********************************************************************************************

If your dataframe values are the same, the compare() function will return an empty DataFrame. The DataFrame produced by the compare() function will only have data if there are values that are different, and by default it will contain only the rows and columns where the differences occur.
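A small, self-contained illustration of this behavior, using made-up DataFrames rather than exam data: in the result, the `self` column holds the value from the DataFrame that `compare()` was called on, and `other` holds the value from the DataFrame passed in.

```python
import pandas as pd

# Hypothetical stand-ins for the true and returned DataFrames.
true_df = pd.DataFrame({"id": [1, 2, 3], "score": [90, 80, 70]})
returned_df = pd.DataFrame({"id": [1, 2, 3], "score": [90, 85, 70]})

# Only the row and column that differ appear in the result.
diff = true_df.compare(returned_df)
print(diff)

# Comparing identical DataFrames gives an empty result.
no_diff = true_df.compare(true_df.copy())
print(no_diff.empty)  # True
```

In this example only the row with index label 1 and the `score` column appear in `diff`, because that is the only cell that differs.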

#### The below code shows two ways of doing the compare. They effectively are doing the same thing, so you would only need to use one of them in your exam troubleshooting.

In [None]:
# # loop over the true output dict
# # using .items()
# for k_true,v_true in true_output_vars.items():
#     # loop over your code output dict
#     for k_returned,v_returned in returned_output_vars.items():
#         print(v_true.compare(v_returned))

# # loop over the true output dict
# # using .keys()
# for true_key in true_output_vars.keys():
#     true_val = true_output_vars[true_key]
#     # loop over your code output dict (separate name, so the outer key is not overwritten)
#     for ret_key in returned_output_vars.keys():
#         ret_val = returned_output_vars[ret_key]
#         print(true_val.compare(ret_val))