In [3]:
# Initialize Otter
import otter
grader = otter.Notebook("lec_act_1_data_structures.ipynb")

# Lecture goals

1. Understand the benefit of numpy (over lists) for operating over lists of numbers
2. Introduction to numpy-style array operations (slicing, mean)
3. Dictionaries for data encapsulation
4. Debugging strategies: Showing variables in the variable window, interpreting errors

Some "how tos" for Jupyter notebooks/autograder if you're starting here.

- Put the cursor in a cell and hit shift-return to execute the cell. You can also click on the triangle in the upper left
- Each problem has a "grading" cell. This is, essentially, the tests the auto-grader will run
- If you see triple dots ... or the word "pass" this is short-hand for "put your code here". Python will ignore these, but you should delete them as you complete problems
- You can add as many variables and cells as you want, but don't change the names of the variables given to you. The autograder is expecting those names
- When you think everything is working, hit Restart and Run All (buttons at the top). This will make sure the code you turn in is the same code Gradescope runs. We do check for this; look at the code cells - they have a number next to them. If you've done a Restart and Run All, those numbers should start from 1
- If you click on the colored bar to the left of the cell (or output) this hides the contents of the cell. Try it to hide this cell and bring it back

Lecture activity hand-ins for in-person class: Get as much done as you can before Wednesday class, but if you get stuck, just stop and bring your questions to class. There is no penalty for re-submitting after class (just get it in before the late deadline). These "late" days do not count toward your late day allotment.  

In [4]:
# Safety check - if you did not install numpy or otter-grader correctly this should do it for you
#  If you have installed everything already, it should spit out a bunch of "requirement already satisfied" messages
import sys
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install otter-grader



In [5]:
# Access all numpy functions as np.
import numpy as np

# Question 1: Stats on a list

## calculate stats on a list

TODO: Given a list of numbers (stored as a Python list):
- Calculate the mean of the negative and positive values
- Count the total number of negative/positive values
- Store the values in a dictionary

This is (mostly) just practice with a **for** loop and an **if** statement. And to see an example of the format of the problems for this class.
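
If dictionaries are new to you, here is a minimal sketch of reading, updating, and adding key-value pairs inside and after a loop. The names `demo_values` and `dict_demo`, and the keys used, are made up for illustration and are not the ones the autograder expects.

```python
# Hypothetical example data and keys (not the assignment's data)
demo_values = [4, 7, 10]
dict_demo = {"Total": 0, "Count": 0}

for value in demo_values:
    # Read the current entry, add to it, and store it back under the same key
    dict_demo["Total"] = dict_demo["Total"] + value
    # += is shorthand for the same read-add-store pattern
    dict_demo["Count"] += 1

# Assigning to a key that does not exist yet adds a new entry
dict_demo["Largest"] = max(demo_values)

print(dict_demo)
```

Either update style works; pick whichever one you find easier to read.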

In [6]:
# Data used for this assignment
#   For just this week we're going to write "in-line" code - i.e., all the code is in one long list of code. Starting
#   next week we'll break code up into functions so we can re-use it/change the data. For this assignment, I'm "hard coding"
#   the "data" in this variable

test_list_one = [-0.75, -0.25, 1.0 / 3.0, 2.0 / 3.0, 3.0 / 3.0]


In [7]:
## EXAMPLE CODE
# Cells labeled EXAMPLE CODE have code in them that you should understand before you start the problem. Usually, they'll be
#  code that you'll want to copy and edit for the TODO in the main problem, along with some explanations of what the code
#  is and how it works.

# Loop over a list and print out whether the element is positive or negative
for item in test_list_one:
    if item < 0:
        # see tutorial on strings for the syntax of f""
        print(f"Item {item} is negative")
    elif item > 0:
        print(f"Item {item} is positive")
    else:
        print(f"Item {item} is zero")


Item -0.75 is negative
Item -0.25 is negative
Item 0.3333333333333333 is positive
Item 0.6666666666666666 is positive
Item 1.0 is positive


In [8]:
# SCRATCH CELL
# Scratch cells are for you to use to try something out - usually a simpler version of the problem. These are not
#  graded, although make sure they execute properly 

# Suggestion: If doing the full positive-negative split is too complicated, try writing a for loop that just
#  loops over all of the items in test_list_one and adds them up and counts how many there are.

In [9]:
# These are the stats you will be calculating. This is more elegant/useful than creating four variables - it keeps all
#  of the values in the same place and assigns a meaningful label (key) to them
# See the tutorial on dictionaries on how to set/get key-value pairs in a dictionary
dict_save_stats = {"Mean positive": 0.0, "Mean negative": -0.0, "Count positive": 0, "Count negative": 0}

# TODO: 
#   Calculate the means and the counts of test_list_one and store them in the dictionary with the keys given above in dict_save_stats
#   You'll need a for loop to go over the list and an if statement to separate into positive and negative
# Step 1: Copy the for loop from the example code above (try writing it without looking at it first)
#  This has the code structure you'll need, but doesn't calculate any stats.
# Step 2: Use the dictionary dict_save_stats with the appropriate key to count the number of positives/negatives
#    Two options: Create a count variable and set the dictionary entry to the count variable after the for loop
#    Use the dictionary entry as the count variable (foo[""] = foo[""] + 1)
# Step 3: In the loop add the positive/negative values to the appropriate dictionary entry
# Step 4: Don't forget to divide by the count to create the mean

...


Ellipsis

### Test code for list

TODO: 
- Fill in the for loop above
- Run the cell below - it will print out if your values are incorrect

Next week we'll put the for loop code in a function so that we can run it with data other than test_list_one. In the meantime, this is an example of writing test code to check that your code is correct. In this case, you can just look at test_list_one and see what the right answer should be.
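
The tests in the next cell compare the means with `np.isclose` rather than `==`. A small sketch of why, using made-up variable names: floating point arithmetic can leave tiny rounding differences, so an exact comparison may fail even when the calculation is right.

```python
import numpy as np

# 0.1 + 0.2 is not stored exactly as 0.3 in floating point
computed = 0.1 + 0.2
expected = 0.3

# Exact comparison fails because of a tiny rounding difference
exact_match = computed == expected
# np.isclose treats values as equal if they agree within a small tolerance
close_match = np.isclose(computed, expected)

print(f"computed = {computed}")
print(f"== gives {exact_match}, np.isclose gives {close_match}")
```

The counts are integers, so the tests can safely compare those with `!=`.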

In [10]:
# Tests
b_tests_passed = True
if not np.isclose(dict_save_stats["Mean positive"], 2.0 / 3.0):
    b_tests_passed = False
    print(f"Mean positive is not correct, should be {2.0/3.0}, got {dict_save_stats['Mean positive']}")

if not np.isclose(dict_save_stats["Mean negative"], -0.5):
    b_tests_passed = False
    print(f"Mean negative is not correct, should be -0.5, got {dict_save_stats['Mean negative']}")

# != is not equals
if dict_save_stats["Count positive"] != 3:
    b_tests_passed = False
    print(f"Count positive numbers, should be 3, got {dict_save_stats['Count positive']}")

if dict_save_stats["Count negative"] != 2:
    b_tests_passed = False
    print(f"Count negative numbers, should be 2, got {dict_save_stats['Count negative']}")

if b_tests_passed:
    print("All list tests passed!")

Mean positive is not correct, should be 0.6666666666666666, got 0.0
Mean negative is not correct, should be -0.5, got -0.0
Count positive numbers, should be 3, got 0
Count negative numbers, should be 2, got 0


## A note on the autograder

As tempting as it might be to just write the number in to make the test work (instead of calculating it) you will get a zero for doing so. I.e., do not just put 2/3 into the dictionary.

In [11]:
grader.check("list")

# Question 2: Doing it again with a numpy array

TODO: Same as the previous question, but this time do it for a numpy array
- NO **if** statements or **for** loops - do this all with numpy operations

You might find "count_nonzero" useful.

As before, test code is below
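
To see what "numpy operations instead of loops" looks like in general, here is a rough sketch on made-up data (`demo_list` and `demo_array` are illustrative names, not part of the assignment). It only shows element-wise arithmetic and a whole-array mean; the positive/negative split is left for you.

```python
import numpy as np

# Hypothetical data, not the assignment's list
demo_list = [1.0, 2.0, 3.0, 4.0]
demo_array = np.array(demo_list)

# With a list, doubling every element needs a loop (here, a list comprehension)
doubled_list = [2 * item for item in demo_list]

# With a numpy array, the arithmetic applies to every element at once
doubled_array = 2 * demo_array

# Careful: 2 * demo_list does NOT double the values, it repeats the list
repeated_list = 2 * demo_list

print(doubled_list)
print(doubled_array)
print(len(repeated_list))
print(np.mean(demo_array))
```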

In [12]:
# Data for this problem - this takes the list and converts it to a numpy array
# Use the variable window (click on variables above) to see the difference between test_list_one and
#  test_nparray - they should have the same values, they're just stored differently
test_nparray = np.array(test_list_one)

In [13]:
# EXAMPLE CODE
# Numpy has built-in functions to do most of what you want to do
#   this essentially does a for loop over the array and looks for the min
example_min = np.min(test_nparray)
print(f"Min is {example_min}")

# But what if you want to do the if < 0 part? This is where boolean indexing comes in.
#   Look at this variable in the variable window - this is another numpy array, but this time the array is full of 
#   False and True - it is True where the corresponding element in test_nparray is negative
b_is_negative = test_nparray < 0

# You can use this boolean array to get just the elements in the original list that were negative
all_negative = test_nparray[b_is_negative]
print(f"Negative elements {all_negative}")

# or use another numpy method to count the number of non-zero - note, False is zero and True is non-zero
#  Notice that this time the call to np is inside of the print statement - you can always take a piece
#  of code and assign it to a variable.
#     Try doing my_count = np.count_nonzero(b_is_negative)
# If you get syntax errors, break the code up this way to find what part of the code is "broken"
print(f"Count negative {np.count_nonzero(b_is_negative)}")

Min is -0.75
Negative elements [-0.75 -0.25]
Count negative 2


In [14]:
# Calculate these stats for test_nparray. The answers should be the same as the ones above.
#   Do NOT just set the values - you must calculate them
dict_save_stats_np = {"Mean positive": 0, "Mean negative": 0, "Count positive": 0, "Count negative": 0}

# TODO: Calculate the mean for the positive and the negative values
#   Also count the number of each
#   Do NOT use a for loop - use boolean indexing (see example above)

...


Ellipsis

In [15]:
# SELF TESTS
#  Use this cell to write any additional test code for yourself. For example, you could copy the tests from the list version
#  and use them here, just check the values in dict_save_stats_np instead of dict_save_stats

In [16]:
grader.check("nparray")

# Question 3: Fix, please - Uh oh, it doesn't work

There is additional information on this problem in the Lab slides:  https://docs.google.com/presentation/d/1lVYGqoStt0ZdnRAYMfF9Km6f0NgMNkuYgINsRhXASwI/edit?usp=sharing; read those slides first

TODO: Each of the following cells has code that is "broken" - either it generates a syntax error OR it doesn't do what the comment says it does. Fix what's broken so the grader tests pass.

TODO: Read carefully through the next cell. It sets up the data you'll be using for these problems, and also has example array slicing you'll need to fix the broken bits

Reminder: If you don't understand a complex piece of code, you can always break it apart. For example:

**min_xy = np.min(test_data[0,1:3])**

Can be broken up into two lines by creating another variable to hold the slice result

**sliced_data = test_data[0,1:3]**

**min_xy = np.min(sliced_data)**

This makes it a lot easier to see if, for example, sliced_data is actually the slice you want (by printing it out or looking at it in the variable window)
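
A runnable version of the same idea, as a sketch: `demo_data` is a small made-up array standing in for `test_data`, so the numbers are easy to check by eye.

```python
import numpy as np

# Hypothetical 3 x 4 array standing in for test_data
demo_data = np.arange(12).reshape(3, 4)

# Step 1: take the slice on its own and look at it
#   Row 0, columns 1 and 2 (the stop index 3 is not included)
sliced_data = demo_data[0, 1:3]
print(f"Slice: {sliced_data}, shape {sliced_data.shape}")

# Step 2: apply the function to the slice
min_xy = np.min(sliced_data)
print(f"Min of slice: {min_xy}")
```

If the printed slice is not what you expected, the problem is in the indexing, not in the call to `np.min`.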


In [17]:
# Making a data set to practice with before lab/homework
#  This is a simplified version of the data set we'll work with
#  It consists of x, y, z data for 10 time steps for 5 samples
#  The x data is all between 0 and 1, the y data 0 and -1, z data 10-20
#  The last column is 1 if the sample is good, 0 if it is bad
#  The data is stored in a 5 x [3 * 10 + 1] array
#  Each row (one row for each sample) looks like this
#    x0 y0 z0 x1 y1 z1 .... x9 y9 z9 1 or 0

# Make space for all of the data and fill it with zeros
#   zeros takes a tuple with the data sizes - in this case we are making a 2 dimensional array
#   with 5 rows (one for each sample) and 31 columns: 10 x,y,z values (30 total) plus one extra column for the good/bad
my_test_data = np.zeros((5, 3 * 10 + 1))

# Fill in whether or not the sample is good. Every other one is good, the others are bad
# Since zero is bad - and the array is all zeros - just set every other row, last column
#   The -1 picks the last column, the 0::2 picks every other row
#   Note: The left hand side has 3 elements, the right a single number - numpy interprets this to mean
#     set all of those values to the single number
my_test_data[0::2, -1] = 1

# Fill in the x values for each sample with 10 evenly spaced values from 0 to 1.0 (0, 0.111..., 1.0)
#  np.linspace() generates uniformly-spaced samples from start to stop
#    You can assign values to specific parameters by name if you want
#    This would be the same as np.linspace(0, 1.0, 10)
# In this case, the x columns on the left hand side are 5 x 10, so we're going to use a loop to set each row
#  to those 10 values, one row at a time
# shape is the size of the array; we want the number of rows so use .shape[0]
x_data_for_one_row = np.linspace(start=0, stop=1.0, num=10)
for r in range(0, my_test_data.shape[0]):
    # loop through each row r
    # Fill in column 0 to one before the end (don't overwrite the good/bad), skipping every 3
    my_test_data[r, 0:-2:3] = x_data_for_one_row

# Fill in the y values for each sample with random values
#  np.random.uniform() generates random samples between the two values; unlike linspace, you can set the size
#   of the numpy array it returns.
# The left side is all rows (the : selects all 5) and every 3rd column starting at 1
y_data_for_all_rows = np.random.uniform(-1.0, 0.0, size=(5, 10))
my_test_data[:, 1::3] = y_data_for_all_rows

# Now the z values - notice that we start at column 2 instead of 1
my_test_data[:, 2::3] = np.random.uniform(10.0, 20.0, size=(5, 10))

## What are the dimensions of the data?

We know what the dimensions of the data are - we just made it in the cell above. Pretend for a moment that you just read **my_test_data** in from a file and you don't know how many samples there are or how many time steps. You **do** know that each sample has an x,y, and z value for each time step, and that the last column is the success/fail column

Again, for these questions you need to calculate the value from the data, not just put the number in, except where noted.
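
As a reminder of the tools involved, here is a small sketch of `.shape` and `start:stop:step` slicing on a made-up array (`demo` is an illustrative name, and its 2 x 7 size is an arbitrary choice, not the size of **my_test_data**).

```python
import numpy as np

# Hypothetical array: 2 rows, 7 columns (not the assignment's data)
demo = np.arange(14).reshape(2, 7)

# shape is a tuple with one size per dimension
n_rows = demo.shape[0]
n_cols = demo.shape[1]
print(f"{n_rows} rows, {n_cols} columns")

# start:stop:step slicing along the columns of row 0
#   Leaving stop empty means "go to the end", so this picks columns 0, 3, 6
every_third = demo[0, 0::3]
print(f"Every third column of row 0: {every_third}")
print(f"Number of values selected: {every_third.shape[0]}")
```

Printing the shape of a slice, as in the last line, is a quick way to check that it has the number of elements you expect.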

In [18]:
# Number of samples - this is correct
n_samples = my_test_data.shape[0]

# Number of dimensions for each time step - we know this is 3, so set it to 3 - this is correct
n_xyz = 3

# FIX ME: Number of time steps
n_time_steps = my_test_data.shape[1]


In [19]:
# Get out just the x values for sample 2
#   Note, this is a somewhat subtle error - look at the size of the x values - it should be n_time_steps (10). Why
#  is it not? How many columns does my_test_data actually have? What happens if you take every 3rd, starting at 0?
# FIX ME
x_values_sample_2 = my_test_data[1, 0::3]


In [20]:
grader.check("fix_broken")

## Hours and collaborators
Required for every assignment - fill out before you hand-in.

Listing names and websites helps you to document who you worked with and what internet help you received in the case of any plagiarism issues. You should list names of anyone (in class or not) who has substantially helped you with an assignment - or anyone you have *helped*. You do not need to list TAs.

Listing hours helps us track if the assignments are too long.

In [21]:

# List of names (creates a set)
worked_with_names = {"not filled out"}
# List of URLs (creates a set)
websites = {"not filled out"}
# Approximate number of hours, including lab/in-class time
hours = -1.5

In [22]:
grader.check("hours_collaborators")

### To submit

- Remove any print statements that print out a lot of stuff
- Do a restart then run all to make sure everything runs ok
- Save the file
- Submit this .ipynb file through Gradescope (lecture activity 1 data structures)

See detailed instructions here
    https://docs.google.com/presentation/d/1tYa5oycUiG4YhXUq5vHvPOpWJ4k_xUPp2rUNIL7Q9RI/edit?usp=sharing