## Data lesson 6

Today we will finish up curve fitting and then learn how to write loops and conditional statements.

In [None]:
# Add import statements
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

#### **Finishing up curve fitting**

Let's consider fitting a noisy absorption band with a Gaussian profile.  

This is useful when we want to determine the peak absorbance and the wavelength at which it occurs, but the data are too noisy to read these values directly off the plot.

*Load the data from the file `noisy_spectrum.txt`.  Store the wavelength and absorbance as 1D arrays.  Plot it to check what the data looks like.*

In [None]:
# Add code here
data = np.genfromtxt('noisy_spectrum.txt', delimiter=' ').T
wavelength = data[0]
absorbance = data[1]

plt.plot(wavelength, absorbance)

Next we will need the function with which to fit our data.

*Complete the function for a Gaussian profile using the executable code below, where `xx` is the independent variable.  Be sure your arguments are ordered correctly to use with `curve_fit()`!*

In [None]:
# Complete the function
def gaussian(xx, amplitude, center, width):
    ygauss = amplitude*np.exp(-(xx-center)**2/(2*width**2))
    return ygauss

Now try calling `curve_fit()` using the Gaussian function and the spectral data.

Include the keyword argument `p0=[guess1,guess2,...]`.  The list contains our guesses for the free parameters in the same order as they are listed in the function, in this case: amplitude, center, width.

In [None]:
# Add code here
params, pcov = curve_fit(gaussian, wavelength, absorbance, p0=[0.5, 350, 20])
params

We can make a new array containing the best-fit model of our data.

We evaluate the function used to fit the data at each x value in the original data set, and with the best-fit parameters provided by `curve_fit()`.

In this case, the function is `gaussian()`, the x data is contained in `wavelength`, and the best-fit parameters are the elements of `params`.

In [None]:
best_model = gaussian(wavelength, params[0], params[1], params[2])

*Overplot the best-fit model on top of your measured data.*

In [None]:
# Add code here

There are other features of `curve_fit()` that can come in handy if you have challenging data to fit.  This includes setting bounds on the values of the free parameters and adding an uncertainty specific to each value.  We won't go into detail here, but check out the documentation for more information.
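
As a rough illustration of those two options, the sketch below fits synthetic data (generated only for this example, not the lesson's `noisy_spectrum.txt`) using the `bounds` and `sigma` keyword arguments of `curve_fit()`. The names `x_demo`, `y_demo`, and `y_err`, the bound values, and the constant uncertainty of 0.05 are all assumptions chosen for this made-up data set.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(xx, amplitude, center, width):
    return amplitude * np.exp(-(xx - center)**2 / (2 * width**2))

# Synthetic noisy spectrum (illustrative values only)
rng = np.random.default_rng(0)
x_demo = np.linspace(250, 450, 201)
y_demo = gaussian(x_demo, 0.8, 340, 15) + rng.normal(0, 0.05, x_demo.size)

# Assumed uncertainty on each y value (one entry per data point)
y_err = np.full(x_demo.size, 0.05)

# Lower and upper limits, in the same order as the free parameters:
# amplitude, center, width
lower = [0, 300, 1]
upper = [2, 400, 50]

demo_params, demo_pcov = curve_fit(
    gaussian, x_demo, y_demo,
    p0=[0.5, 350, 20],       # initial guesses must lie inside the bounds
    sigma=y_err,             # per-point uncertainties
    bounds=(lower, upper),   # restrict the allowed parameter values
)
print(demo_params)
```

The initial guesses in `p0` have to fall within the bounds, otherwise `curve_fit()` raises an error.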

Another important note is that our fitting will only describe the data as well as the function will allow it to!  If we use a bad model or equation, then no amount of parameter optimization will make it fit the data.

*Try fitting the `noisy_spectrum` data with your `line()` function instead.  Does it give a meaningful answer?*

In [None]:
# Add code here

#### **Try it yourself**

Try loading in the file `rate.txt` and fitting the concentration curve to a first-order rate law: $[A] = [A]_0 \exp(-kt)$

In [None]:
# Add code here

#### **Intro to loops & conditional statements**

We can add logic to our programs to make them more efficient.  Today we will cover:
* Conditions: set a criterion for whether a piece of code is executed or not
* Loops: execute the same block of code repeatedly

First, we need to learn some basics of Boolean logic.

#### **Boolean logic**

A Boolean expression is used to test the truth of some statement.  

It evaluates to either `True` or `False`.

We implement Boolean expressions using specific operators.

*Test the truth of the following statements by running each cell.*

In [None]:
7 == 2

This cell "asserts" that the first number (7) is equal to the second number (2).  A double equals sign `==` is used to indicate that we are testing whether the two numbers are equal.

The output reflects whether that assertion is `True` or `False`.

In this case, 7 does not equal 2, so the statement is declared `False`.

In [None]:
4 > 3

This cell asserts that 4 is greater than 3.  The output, `True`, reflects that this statement is true.

Here are some common comparison operators that we can use to evaluate the truth of a statement:
* `==`: equal
* `!=`: not equal
* `<=`: less than or equal
* `>=`: greater than or equal
* `<`: less than
* `>`: greater than
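
A quick sketch that runs each operator once on the same pair of numbers. The names `aa` and `bb` are placeholders chosen for this example.

```python
aa = 5
bb = 9

results = {
    "==": aa == bb,   # equal
    "!=": aa != bb,   # not equal
    "<=": aa <= bb,   # less than or equal
    ">=": aa >= bb,   # greater than or equal
    "<":  aa < bb,    # less than
    ">":  aa > bb,    # greater than
}

for operator, outcome in results.items():
    print(aa, operator, bb, "is", outcome)
```

Each comparison produces a Boolean value that can be stored in a variable, just like a number or a string.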

*Test whether 4x12 is less than or equal to 3x14*

In [None]:
# Add code here

*Test whether the square root of 4096 is equal to 8 squared*

In [None]:
# Add code here

We can make **compound comparisons** by evaluating the truth of multiple statements simultaneously.

To do so we use the logic operators:
* `and` : returns `True` only if both inputs are `True`
* `or` : returns `True` if either input is `True`

In [None]:
# and returns True if both inputs are True
10 > 4 and 20 > 10

In [None]:
# and returns False if either input is False
10 > 4 and 10 < 4

In [None]:
# or returns True if either input is True
10 > 4 or 10 < 4

In [None]:
# or returns False if neither input is True
10 > 20 or 10 > 30

We can perform Boolean operations on arrays as well as individual values.

*What do you think will happen in the cell below?*

In [None]:
array = np.linspace(0,10,11)
array > 4

*Evaluate where `array` is less than 8*

In [None]:
# Add code here

For compound comparisons with arrays we need to use special functions.  

Remember back in Data lesson 4 when we selected wavelengths within the range 14-16 um?

`idx = np.logical_and(wavelength_um>14, wavelength_um<16)`

Now you can recognize that this is a Boolean `and` operation that involves identifying the indices where both of the following statements are true:
* `wavelength_um>14`
* `wavelength_um<16`

*Use `np.logical_and()` to determine where `array` is greater than 4 and less than 8*

In [None]:
# Add code here

A summary of helpful numpy functions for Boolean operations:
* `np.logical_and()` : evaluates to `True` for each element where both inputs are `True`
* `np.logical_or()` : evaluates to `True` for each element where at least one input is `True`
* `np.where()` : returns the indices where a condition is `True`, or chooses element-by-element between two sets of values depending on the condition
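
The sketch below tries each of these functions on a small made-up array. The name `temps` and the cutoff values are assumptions for illustration only.

```python
import numpy as np

temps = np.array([12.0, 18.5, 21.0, 25.5, 30.0])  # hypothetical temperatures

# True where BOTH comparisons are True
mild = np.logical_and(temps > 15, temps < 26)

# True where AT LEAST ONE comparison is True
extreme = np.logical_or(temps < 15, temps > 26)

print(mild)
print(extreme)

# With one argument, np.where() returns the indices where the condition is True
mild_idx = np.where(mild)[0]
print(mild_idx)

# With three arguments, it takes values from the second argument where the
# condition is True and from the third argument where it is False
masked = np.where(mild, temps, 0.0)
print(masked)
```

A Boolean array such as `mild` can also be used directly as an index, for example `temps[mild]`, to keep only the elements where it is `True`.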

#### **Conditions**

What is the use of truth testing different statements with Boolean logic?

We can use this to specify **conditions** for whether a certain block of code is executed or not.

We can do so using an `if` statement.

The general format for an `if` statement is:
* `if` followed by a Boolean statement followed by a colon `:`
* Indented block of code to be executed only if the Boolean statement evaluates to `True`

In [None]:
pH = 5
if pH > 7:
    print("The solution is basic!")

In [None]:
pH = 8
if pH > 7:
    print("The solution is basic!")

The message only printed in the case when the Boolean statement was `True`.

We often wish to specify an alternative course of action for when the Boolean statement evaluates to `False`.

We can do so by adding an `else` statement.

In [None]:
pH = 5
if pH > 7:
    print("The solution is basic!")
else:
    print("The solution is not basic!")

Now you can see that although the `if` condition is `False`, the code within the `else` block is still executed.

If you want to add an additional condition you can use an `elif` statement, or "else if".

*Try changing the pH value defined below and see what happens.*

In [None]:
pH = 5
if pH > 7:
    print("The solution is basic!")
elif pH == 7:
    print("The solution is neutral!")
else:
    print("The solution is acidic!")

The `elif` block of code runs when the `if` statement is `False` and the `elif` statement is `True`.

The `else` statement runs when both the `if` and `elif` statements are `False`.

You do not need an `else` statement and could use only `if` and `elif` statements if you wish to take no action when your conditions are all `False`.
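
A minimal sketch of an `if`/`elif` chain with no `else`. The function name `check_absorbance` and the thresholds are hypothetical, chosen only to illustrate the pattern.

```python
def check_absorbance(reading):
    """Return a warning string, or None when neither condition is met."""
    warning = None
    if reading > 2.0:
        warning = "Reading is very high!"
    elif reading < 0.0:
        warning = "Reading is negative!"
    # No else: when both conditions are False, nothing happens here
    return warning

print(check_absorbance(2.5))
print(check_absorbance(-0.1))
print(check_absorbance(0.42))
```

For the last call neither condition is `True`, so `warning` keeps its starting value of `None`.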

*Construct some conditional code to `print()` a string telling you whether a variable `number` is positive, negative, or 0.*

In [None]:
# Add code here

#### **Loops**

Loops allow us to execute the same block of code repeatedly.

To do so we **iterate** over some sequence like a list or array.

A `for` loop has the general structure:

`for variable in iterable:`

        executable code

Note that the first line of the `for` loop must end with a colon and the body must be indented.



In [None]:
for x in [0,1,2]:
    print(x*2)

In the example above:
* `x` is the `variable` that is used within the executable part of the code.
* `iterable` is the sequence (list) containing all values to be looped over.
* During the `for` loop, each element in the list is assigned to the `variable` (`x`) and the code is then executed.

This is equivalent to the following, but much more compact and scalable.

In [None]:
print(0*2)
print(1*2)
print(2*2)

The original values within the `iterable` list or array are not modified directly in a `for` loop.  

Also, any variable assigned within the body of a `for` loop is overwritten on each iteration, so only the value from the final iteration remains after the loop ends.

In [None]:
integers = [1,2,3,4] # define iterable
for x in integers:   # loop over integers
    square = x**2    # execute code
    
print(integers)
print(square)

We must create a new list or array to store the values of any quantity calculated in a `for` loop.

The list method `append()` is very helpful for adding a new element to a list during each iteration of the `for` loop.

In [None]:
integers = [1,2,3,4]        # define iterable
squares = []                # define empty list to store results
for x in integers:          # loop over integers
    square = x**2           # execute code
    squares.append(square)  # add result to list of results
    
print(squares)

The body of a loop can contain many statements.

*For each element in `integers`, print both x squared and the square root of x.*

In [None]:
# Add code here

It is convenient to use the built-in function `range()` to iterate over a sequence of numbers.

With `range()` we do not need to pre-define the list to be iterated over: it produces a sequence of integers on-the-fly.

`range(stop)` provides the integers from `0` to `stop-1`.

`range(start, stop)` provides the integers from `start` to `stop-1`.

In [None]:
for n in range(5):
    print(n)

In [None]:
for n in range(3,8):
    print(n)

`range()` can be used to conveniently perform an action a specified number of times.  

In this case the values provided by `range()` do not need to be used within the executable part of the code.

In [None]:
number = 0
for n in range(4):
    number = number + 3 # n is not used within executable part of code

number

The other main type of loop is the `while` loop.  

This combines looping with a conditional statement.

A `while` loop will iterate the executable block of code as long as the condition is being met.

It uses the same format as an `if` statement.

In [None]:
count = 0
while count < 4:
    print(count)
    count += 1

The code will execute as long as the Boolean statement `count < 4` is `True`.

Each iteration, `count` is increased by an increment of 1.

Therefore, after 4 iterations, `count` is equal to 4, the Boolean condition is no longer `True`, and the loop stops executing.

**A word of caution:** it is easy to set up a `while` loop that will never terminate.  Be careful about setting up your condition and, in general, it is safer to use `for` loops when possible!
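
One possible safeguard, sketched below, is to combine the loop condition with a cap on the number of iterations. The starting value and the cap `max_steps` are assumptions made up for this example.

```python
value = 100.0      # hypothetical starting quantity
steps = 0
max_steps = 1000   # safety cap: the loop stops here even if value never drops below 1

while value > 1.0 and steps < max_steps:
    value = value / 2   # halve the value each iteration
    steps += 1

print(steps, value)
```

Here the loop ends after 7 halvings, long before the cap is reached. If the update line were mistyped so that `value` never decreased, the cap would still stop the loop.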

#### **Conditionals inside of loops**

Loops and conditional statements can become much more useful when combined!

We can define a set of values, step through each one, and perform a different action depending on some conditional criteria.

Suppose we are observing a species of frog in the wild and wish to count the number of frogs in large vs. small size bins.

*Look at the code below and see if you can identify what it is doing*

In [None]:
frog_masses = [17.1, 4.3, 5.6, 12.4, 22.7, 5.9, 14.6, 3.9]

small_frogs = []
large_frogs = []
for mass in frog_masses:
    if mass > 14:
        large_frogs.append(mass)
    else:
        small_frogs.append(mass)

print(small_frogs)
print(large_frogs)

*Create a new list containing zeroes where the original list’s values were negative and ones where the original list's values were positive.*

In [None]:
original_list = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]


Sometimes we have two related lists or arrays (like wavelength and absorbance) and we wish to use a value from one list for assessing the conditional statement, and a value from another list in the executable part of code.

In this case, we could use `range()` to iterate over the indices of the lists and perform different tasks with the same position in each list.

Calling `range(len(list))` for some `list` iterates over all indices in the list.

In the example below, we create `new_list` which contains:
* The value of `list2` for all positions where `list1` is greater than 0.5.
* Twice the value of `list2` for all positions where `list1` is less than or equal to 0.5.

In [None]:
list1 = np.random.random(10)       # define list 1
list2 = np.arange(0,10,1)          # define list 2

new_list = []                      # create list to store new values
for n in range(len(list1)):        # iterate through all indices in list1
    if list1[n] > 0.5:             
        new_list.append(list2[n]) 
    else:
        new_list.append(list2[n]*2) 

new_list
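
When both inputs are numpy arrays, the same kind of selection can also be written without a loop by using the three-argument form of `np.where()`. The sketch below compares the two approaches on small fixed arrays. The names `selector` and `numbers` are stand-ins invented for this example.

```python
import numpy as np

selector = np.array([0.9, 0.1, 0.7, 0.3])   # plays the role of list1
numbers = np.array([10, 20, 30, 40])        # plays the role of list2

# Loop version, following the pattern used above
looped = []
for n in range(len(selector)):
    if selector[n] > 0.5:
        looped.append(numbers[n])
    else:
        looped.append(numbers[n] * 2)

# Array version: take numbers where the condition is True, numbers*2 elsewhere
vectorized = np.where(selector > 0.5, numbers, numbers * 2)

print(vectorized)
print(np.array_equal(looped, vectorized))
```

The loop version is easier to extend with extra steps, while the array version is more compact.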

Try it yourself: create a new list based on the values contained in `flags` and `values`:
* All values flagged as "good" are added to the new list
* All values flagged as "bad" are replaced with a 0

In [None]:
# Add code below
flags = ["good", "good", "good", "bad", "good", "bad", "good", "good", "bad"]
values = [0.9, 16.1, 16.6, 15.2,  3.2,  7.1,  8.9, 0.4, 0.2]
