# Notebook 12: Multiple return values, random numbers, pair programming, creating tests of code
by Rachel Langgin \\
March 2022 and July 2024 \\
Haverford College and University of Nevada, Las Vegas \\

## NOTE: you need "country_vaccinations.csv" for this to work. (Download/upload it now from the usual directory.)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Goals
* Get used to numpy.random
* Get used to multiple return values
* Start to think about truth testing (it's no new code - just a concept that you'll need as a programmer)
* Understand the difference between normally distributed and uniformly distributed random numbers

# Numpy random
This may come in handy when testing your code.  I use it a lot.
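When random numbers are used for testing, it helps if they repeat from run to run. A minimal sketch, assuming you are happy to fix the generator with `np.random.seed` (the seed value 42 is arbitrary):

```python
import numpy as np

# Seeding the generator makes the "random" numbers repeat exactly,
# which is handy when a test should give the same answer every run.
np.random.seed(42)                      # 42 is an arbitrary seed
first = np.random.uniform(0, 100, 5)

np.random.seed(42)                      # reset to the same seed
second = np.random.uniform(0, 100, 5)

# The two draws are identical because the seed was the same.
print(np.array_equal(first, second))
```

Leave the seed out when you actually want different numbers each time.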

In [None]:
# Create 100,000 random numbers that are evenly (uniformly) distributed between 0 and 100
# np.random.uniform(low=0.0, high=1.0, size=None)
x = np.random.uniform(0, 100, 100000)
plt.hist(x, bins=30);

In [None]:
x

In [None]:
# Now create 100,000 random numbers that are "normally" distributed
# This is more like how random numbers are in nature.  They're usually centered on some value.
# x = np.random.normal(center, width, number)
x = np.random.normal(20, 1, 100000)
plt.hist(x, bins=80);

# Some examples of normally distributed quantities
Height, eyeglass prescription, hemoglobin content of blood
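The difference between the two distributions can also be checked with numbers instead of histograms. This is only a sketch: the seed and the 45-to-55 window are arbitrary choices made for the example.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the results repeat
uniform = np.random.uniform(0, 100, 100000)
normal = np.random.normal(20, 1, 100000)

# Uniform: every value between 0 and 100 is equally likely,
# so roughly 10% of the numbers land in any window that is 10 wide.
frac_uniform = np.mean((uniform >= 45) & (uniform < 55))

# Normal: values cluster around the center (20),
# so roughly 68% of them land within one width (1) of it.
frac_normal = np.mean(np.abs(normal - 20) < 1)

print(frac_uniform, frac_normal)
```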

# Multiple return values
Sometimes it's useful to have your function return multiple items.  For example...

In [None]:
# A function with multiple return values
def myfunc(x):
    return x**2, x**3, x**4

If your function doesn't return anything, setting a variable equal to it just gives you `None`.
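Here is a quick illustration of what happens when a function has no return statement (the name `no_return` is made up for this example). Python hands back `None`, so there is nothing useful to store.

```python
def no_return(x):
    squared = x**2   # computed inside the function, but never returned

result = no_return(3)
print(result)        # prints None
```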

When I call myfunc I have a couple different options for how to save the different returned values.
First, I can use the assignment operator to assign the values of three variables all at once.

In [None]:
sq, cub, quart = myfunc(2)

Let's just make sure each of those variables (sq, cub, and quart) all got assigned as we expected.

In [None]:
print(sq)

In [None]:
print(cub)

In [None]:
print('The square is {}.  The cube is {}. The 4th power is {}.'.format(sq, cub, quart))

And there are other ways to write the print statement.

In [None]:
print('The square is %d.  The cube is %d. The 4th power is %d.' %(sq, cub, quart))

The second option is to just set the output equal to one object, in this case "powers".  

In [None]:
powers = myfunc(2)

`powers` is now a "tuple".

In [None]:
type(powers)

And it has three values:

In [None]:
powers

If I only want one of its values I can use indexing.

In [None]:
powers[2]
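A couple of other ways to pick out one value, sketched with the same three-value `myfunc` (repeated here so the example runs on its own). The underscore name `_` is just a convention for "I don't need this one".

```python
def myfunc(x):
    return x**2, x**3, x**4

powers = myfunc(2)

print(powers[2])    # third item (indexing starts at 0)
print(powers[-1])   # negative indices count from the end, so this is the same item

# Unpack all three, but only keep a readable name for the last one
_, _, quart = myfunc(2)
print(quart)
```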

## append
Useful for building up lists when you don't know how many of them there will be.  (You'll need this in the pair programming assignment below.)

In [None]:
x=[]  # this is a list with nothing in it
for i in range(10):
    x.append(2**i)

In [None]:
x

## Creating tests of your code.  
This is one of the mandatory components of your project.

It's also called "truth-testing".

Basically, once you write code to analyze your data for your project, you'll need to create a test that it's working the way you thought it was.
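One simple form of truth test is to run your code on a tiny input where you already know the right answer by hand. A minimal sketch, assuming a made-up helper called `daily_increase` (it is not part of this notebook's data analysis):

```python
import numpy as np

def daily_increase(totals):
    """Return the day-to-day change in a running total (hypothetical helper)."""
    totals = np.asarray(totals)
    return totals[1:] - totals[:-1]

# Truth test: a tiny input where the right answer is known by hand.
known_totals = [10, 15, 15, 30]
expected = [5, 0, 15]

# assert stops with an error if the condition is False
assert np.array_equal(daily_increase(known_totals), expected)
print("truth test passed")
```

If the function had a bug, the `assert` line would raise an `AssertionError` instead of reaching the print.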

Let's use a new CSV file.
I got this off Kaggle an hour ago.
Go to the same place you got this notebook, download country_vaccinations.csv, and then upload it.

In [None]:
from google.colab import drive
#drive.mount("/content/drive") # this line only has to be run once
df = pd.read_csv("/content/drive/MyDrive/UNLV/research!/projects/GW_Explorer_A_Beginners_Guide/Beginning_Python_Notebooks/country_vaccinations.csv")

In [None]:
df.columns

In [None]:
df

In [None]:
ustate=df.loc[df['country']=='Zimbabwe']

In [None]:
ustate

In [None]:
ustate.index = ustate.index - min(ustate.index)  # Get the row numbers to start at 0
plt.plot(ustate.date, ustate['people_fully_vaccinated_per_hundred'])  # make the plot
desind = np.linspace(min(ustate.index), max(ustate.index),6, dtype=int) # Create a list of 6 integers that span
    # the data set
plt.xticks(desind,ustate.date[desind], rotation=45);  # put the date at each of those indices

In [None]:
# pandas was already imported above as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame (note: this overwrites the vaccination DataFrame df)
df = pd.DataFrame(data, columns = ['Name', 'Age'])

In [None]:
df

In [None]:
def myfunc(x,y):
    z=x+y
    return(z)

a=7
b=1
z = myfunc(a,b)
print(z)

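Note: without the assignment `z = myfunc(a,b)`, the last line would fail with `NameError: name 'z' is not defined`, because the `z` inside `myfunc` only exists inside the function.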

I used some code from class 11.  I highly recommend using snippets of code you find from class!

If this were your data, you would want to make sure it's internally consistent.  That's a truth-test of sorts.

Invent a truth-test for this data set.  

Here's a suggestion: the total number of vaccinations should increase each day by the number of vaccinations on that particular day (if indeed, that's what the columns mean.)



In [None]:
# Truth test:
# Principle: the increase in the "total_vaccinations" column each day should
# match the "daily_vaccinations_raw" column.

In [None]:
# practicing with loc to make sure I remember how to do it
ustate.loc[23,'total_vaccinations']
# (now actually look it up in the table!! and make sure you understand it!)

In [None]:
# Truth test:
# Principle: the increase in the total_vaccinations each day
# should match the "daily_vaccinations_raw" column.
# Let's plot both of them.

# Use append
increase=[0]  # The first element is 0, because there was no date before that
for i in range(1, len(ustate)):
    increase.append(ustate.loc[i,'total_vaccinations'] -  ustate.loc[i-1,'total_vaccinations'])

# What I did above was subtract the previous day's number from the current day's number.
# FYI, that's essentially a derivative.   You're actually doing numerical differentiation.

In [None]:
plt.plot(ustate.date, increase, label='increase in total_vaccinations')
plt.plot(ustate.date, ustate['daily_vaccinations_raw'], label='daily_vaccinations_raw')
plt.legend()
plt.xticks(desind,ustate.date[desind], rotation=45);  # put the date at each of those indices

Did it pass the truth test?  Not really.  I mean, kind of.

##  So if this were my data for my project,
I would conclude that I sort of understood my data, but not really.  In general, the total number of vaccinations seems to increase by the number of daily vaccinations.  It looks like the total vaccinations column may not be updated as often as the daily vaccinations column.  We might also be looking at a problem of not updating on weekends.  (Does the blue line go up and down once a week?)
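To put a number on "kind of", one option is to count the days on which the two columns agree, skipping days where either value is missing. This is only a sketch on made-up numbers (the `toy` DataFrame is invented, with column names copied from the real table), and it uses pandas' `diff()` in place of an explicit loop.

```python
import numpy as np
import pandas as pd

# Made-up numbers, NOT real vaccination data.
toy = pd.DataFrame({
    'total_vaccinations':     [100, 130, np.nan, 190, 200],
    'daily_vaccinations_raw': [np.nan, 30, 25, 35, 12],
})

# diff() subtracts the previous row from each row.
# The result is NaN wherever either of the two rows is missing.
increase = toy['total_vaccinations'].diff()

# Only compare days where both numbers exist.
both = increase.notna() & toy['daily_vaccinations_raw'].notna()
matches = np.isclose(increase[both], toy['daily_vaccinations_raw'][both])

print(f"{matches.sum()} of {both.sum()} comparable days agree")
```

With real data, a low count of comparable days would itself be a hint that one column is updated less often than the other.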

# Invent another truth test that this data should pass.
* Feel free to use a different country.
* I'm guessing that different countries have different parts of the table filled out.  (I notice there are a lot of NaNs in the table.)  So there may be different truth tests you can do with other countries' data.

# Multiple return values, and random numbers
A function can return more than one thing.  Let's pretend that we want to test out the np.random.normal function and see if the mean and standard deviation of the numbers it creates are really the ones we put in.

a) Write a new function that creates a normal distribution of random numbers, accepting the center, width, and number of values of the distribution as its arguments, and returns the mean and standard deviation of those values.

Practice building up your function slowly.  First have it do only one really simple thing.  Make sure it can do that. Then slowly build up its capacities.

b) Create a loop that goes from N=1 to N=1000 and calculates the absolute value of the difference between the center you input into your function and the mean your function returned for N values in the distribution.  (In other words, each time through the loop you'll ask Python to create N random numbers in the distribution.)  First talk with your partner about what you expect to observe in this assignment.  Then plot the difference between the two numbers as a function of N.

c) Bonus: Do the same as b, but for standard deviation

In [None]:
# to calculate the mean and standard deviation use np.mean, np.std
y = np.arange(4)
print(np.mean(y))
print(np.std(y))

In [None]:
ustate.dropna()

In [None]:
# Stop scrolling now unless you want answers!

## Rachel's answers

In [None]:
# Part a
def calc_mean_stdev(center, width, number):
    x = np.random.normal(center,width, number)
    return(np.mean(x), np.std(x))

In [None]:
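# Part b
diffmean=[]
for n in range(1,1001):  # N = 1 to 1000 inclusive
    mean, stdev = calc_mean_stdev(0,1,n)
    diffmean.append(np.abs(mean - 0))  # distance of the sample mean from the center (0)
plt.plot(range(1,1001), diffmean)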


In [None]:
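# Part c
diffstdev=[]
for n in range(1,1001):  # N = 1 to 1000 inclusive
    mean, stdev = calc_mean_stdev(0,1,n)
    diffstdev.append(np.abs(stdev - 1))  # distance of the sample width from the input width (1)
plt.plot(range(1,1001), diffstdev)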

## The moral of the story is that the more random numbers you create, the closer the mean and standard deviation will get to the nominal mean and standard deviation.
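How fast does the mean get closer? For N normally distributed numbers, the typical distance between the sample mean and the center is about width divided by the square root of N (the "standard error of the mean"). A rough sketch to compare the two, with an arbitrary seed and arbitrary choices of N:

```python
import numpy as np

np.random.seed(1)          # arbitrary seed so the results repeat
center, width = 0, 1
sizes = [100, 10000, 1000000]

errors = []
for n in sizes:
    x = np.random.normal(center, width, n)
    errors.append(abs(np.mean(x) - center))

# Print the measured error next to the expected typical size, width / sqrt(N)
for n, err in zip(sizes, errors):
    print(n, err, width / np.sqrt(n))
```

Any single run will scatter around the expected size, so look at the overall trend: 100 times more numbers buys roughly 10 times less error.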