# Midterm Review


## Loops

The basic kind of loop is the foreach loop that does something with each element in the list (or other "iterable"):

In [1]:
a  = [1, 2, 100]
for item in a:
  print(item)

1
2
100


If we just want to do something a set number of times, we can use range() as the source of items to iterate over.  Remember that range(n) iterates from 0 to n-1 by default.

In [2]:
for i in range(3):
  print(f'Hooray {i}!')

Hooray 0!
Hooray 1!
Hooray 2!
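range() also accepts optional start and step arguments, which is what "by default" hints at above. A minimal sketch; the variable names are only for illustration:

```python
# range(start, stop) begins at start and stops before stop
first = list(range(2, 5))
print(first)      # [2, 3, 4]

# range(start, stop, step) moves in jumps of step
evens = list(range(0, 10, 2))
print(evens)      # [0, 2, 4, 6, 8]

# A negative step counts down
countdown = list(range(3, 0, -1))
print(countdown)  # [3, 2, 1]
```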


If we want to keep doing something until a condition is met, we can use a while loop, which repeats its body as long as its condition is true and stops the first time the condition is false.

In [3]:
a = 3
while (a < 10):
  print(a)
  a += 3

3
6
9


enumerate() is useful for producing both an item number and an item during iteration, in case you want both.

In [4]:
for i, item in enumerate([3, 6, 9]):
  print("item " + str(i) + " is " + str(item))

item 0 is 3
item 1 is 6
item 2 is 9
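If you would rather number items from 1, enumerate() takes an optional start argument. A short sketch; the list and the name `labels` are made up for illustration:

```python
# start=1 makes the numbering begin at 1 instead of 0
labels = []
for i, item in enumerate(["a", "b", "c"], start=1):
  labels.append(f"{i}: {item}")

print(labels)  # ['1: a', '2: b', '3: c']
```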


Recall also that you can generally unpack tuples as a part of list iteration.

In [5]:
rated = [("Back to the Future", 4), ("Time Bandits", 2), ("Looper", 3)]
for movie, rating in rated:
  print("My rating of " + movie + " is " + str(rating))

My rating of Back to the Future is 4
My rating of Time Bandits is 2
My rating of Looper is 3


You can iterate over dictionaries, too.

In [6]:
mydict = {
    "Back to the Future": 4,
    "Time Bandits": 2,
    "Looper": 3
}

for key in mydict:
  print("My rating of " + key + " is " + str(mydict[key]))

My rating of Back to the Future is 4
My rating of Time Bandits is 2
My rating of Looper is 3


## Dictionaries

Recall that you can use square brackets to both set values for keys and look up values.

In [7]:
newdict = {}
newdict["Marco"] = "Polo"
print(newdict["Marco"])

Polo


If you look up a key that isn't there using square brackets, Python raises a KeyError.  The get() method avoids this by returning a default value that you supply, as demonstrated below.

In [8]:
print(newdict.get("Polo", "Not Found"))

Not Found
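To make the difference concrete, here is a small sketch comparing the three kinds of lookup. It rebuilds `newdict` so that it runs on its own; the `value` and `fallback` names are just for illustration:

```python
newdict = {"Marco": "Polo"}

# Square brackets raise KeyError for a missing key
try:
  value = newdict["Polo"]
except KeyError:
  value = "KeyError raised"
print(value)

# get() without a default returns None instead of raising
print(newdict.get("Polo"))

# get() with a default returns that default
fallback = newdict.get("Polo", "Not Found")
print(fallback)
```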


The "in" keyword can also check whether something is a key in the dictionary already.

In [9]:
"Polo" in newdict

False

We mentioned earlier that you can iterate over keys, but you can also iterate over both keys and values as a tuple:

In [10]:
for movie, rating in mydict.items():
  print(movie + ": " + str(rating))

Back to the Future: 4
Time Bandits: 2
Looper: 3


## DataFrames

DataFrames, provided by the pandas library, are tables with labeled rows and columns.  They are a standard way to work with data in data science.


In [None]:
# Google colab only
from google.colab import files
import io

uploaded = files.upload() # pick starbucks_drinkMenu_expanded.csv

In [12]:
import pandas as pd

df = pd.read_csv('starbucks_drinkMenu_expanded.csv', index_col = 'Beverage')

Recall that we can pull out smaller pieces of a dataframe using ".loc[]".  We pass it the names of the rows and columns we want, or a colon if we want everything.

In [14]:
df.loc["Brewed Coffee", "Calories"]

Beverage
Brewed Coffee    3
Brewed Coffee    4
Brewed Coffee    5
Brewed Coffee    5
Name: Calories, dtype: int64

In [15]:
df.loc["Brewed Coffee", :]

| Beverage | Beverage_category | Beverage_prep | Calories | Total_Fat_g | Trans_Fat_g | Saturated_Fat_g | Sodium_mg | Total_Carbohydrates_g | Cholesterol_mg | Dietary Fibre_g | Sugars_g | Protein_g | Vitamin_A | Vitamin_C | Calcium | Iron | Caffeine_mg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Brewed Coffee | Coffee | Short | 3 | 0.1 | 0.0 | 0.0 | 0 | 5 | 0 | 0 | 0 | 0.3 | 0% | 0% | 0% | 0% | 175 |
| Brewed Coffee | Coffee | Tall | 4 | 0.1 | 0.0 | 0.0 | 0 | 10 | 0 | 0 | 0 | 0.5 | 0% | 0% | 0% | 0% | 260 |
| Brewed Coffee | Coffee | Grande | 5 | 0.1 | 0.0 | 0.0 | 0 | 10 | 0 | 0 | 0 | 1.0 | 0% | 0% | 0% | 0% | 330 |
| Brewed Coffee | Coffee | Venti | 5 | 0.1 | 0.0 | 0.0 | 0 | 10 | 0 | 0 | 0 | 1.0 | 0% | 0% | 2% | 0% | 410 |
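The same ".loc[]" patterns can be tried without the CSV file. This is a minimal sketch on a made-up dataframe; `menu` and its contents are assumptions for illustration, not the Starbucks data:

```python
import pandas as pd

# Hypothetical mini-menu, made up for illustration
menu = pd.DataFrame(
    {"Size": ["Short", "Tall", "Tall"], "Calories": [3, 4, 70]},
    index=["Brewed Coffee", "Brewed Coffee", "Latte"],
)

# One row label and one column name: "Latte" appears once, so we get one value
latte_calories = menu.loc["Latte", "Calories"]
print(latte_calories)

# A repeated row label gives back every matching row
coffee = menu.loc["Brewed Coffee", :]
print(len(coffee))  # 2

# A colon for rows and a list of column names keeps all rows, only those columns
sizes_and_calories = menu.loc[:, ["Size", "Calories"]]
print(sizes_and_calories.shape)  # (3, 2)
```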


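If we call mean() on a dataframe with no arguments, it'll find the mean of each column it can average.  Columns stored as text are left out or give meaningless results: in the output below, the Caffeine_mg figure is far too large to be a real average, which suggests that column was read in as text rather than as numbers.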

In [16]:
df.loc["Brewed Coffee", :].mean()

Calories                 4.250000e+00
Trans_Fat_g              0.000000e+00
Saturated_Fat_g          0.000000e+00
Sodium_mg                0.000000e+00
Total_Carbohydrates_g    8.750000e+00
Cholesterol_mg           0.000000e+00
Dietary Fibre_g          0.000000e+00
Sugars_g                 0.000000e+00
Protein_g                7.000000e-01
Caffeine_mg              4.381508e+10
dtype: float64
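In recent pandas versions, calling mean() on a dataframe that contains text columns may raise an error instead of skipping them. One way around that is the numeric_only parameter. A minimal sketch on a made-up dataframe (`drinks` and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns
drinks = pd.DataFrame({
    "Prep": ["Short", "Tall"],
    "Calories": [3, 5],
    "Protein_g": [0.3, 0.5],
})

# numeric_only=True averages only the numeric columns and ignores "Prep"
means = drinks.mean(numeric_only=True)
print(means["Calories"])  # 4.0
print(list(means.index))  # ['Calories', 'Protein_g']
```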

idxmax() returns the index label of the row with the greatest value in a column.

In [17]:
df["Calories"].idxmax()

'White Chocolate Mocha (Without Whipped Cream)'

It's possible to filter for rows that match particular criteria.  We nest two expressions: the inner one checks every row and evaluates to a Series of booleans, and the outer one uses that Series to index the dataframe as a whole, keeping only the rows marked True.

In [20]:
df[df["Calories"] > 500]

| Beverage | Beverage_category | Beverage_prep | Calories | Total_Fat_g | Trans_Fat_g | Saturated_Fat_g | Sodium_mg | Total_Carbohydrates_g | Cholesterol_mg | Dietary Fibre_g | Sugars_g | Protein_g | Vitamin_A | Vitamin_C | Calcium | Iron | Caffeine_mg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| White Chocolate Mocha (Without Whipped Cream) | Signature Espresso Drinks | 2% Milk | 510 | 15 | 9.0 | 0.2 | 35 | 330 | 77 | 0 | 74 | 19.0 | 20% | 4% | 60% | 2% | 150 |


It's possible to index by multiple criteria in this way, but they need to be joined with & (not "and"), and each one needs its own parentheses, because & binds more tightly than comparisons like > and ==.

In [22]:
df[(df["Calories"] > 300) & (df["Beverage_prep"] == "Venti")]

| Beverage | Beverage_category | Beverage_prep | Calories | Total_Fat_g | Trans_Fat_g | Saturated_Fat_g | Sodium_mg | Total_Carbohydrates_g | Cholesterol_mg | Dietary Fibre_g | Sugars_g | Protein_g | Vitamin_A | Vitamin_C | Calcium | Iron | Caffeine_mg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Caramel Apple Spice (Without Whipped Cream) | Signature Espresso Drinks | Venti | 360 | 0 | 0.0 | 0.0 | 0 | 25 | 89 | 0 | 83 | 0.0 | 0% | 0% | 0% | 0% | 0 |
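Criteria can also be combined with | for "or" and negated with ~ for "not", following the same parentheses rule. A minimal sketch on a made-up dataframe (`drinks` and its values are assumptions for illustration, not the Starbucks data):

```python
import pandas as pd

# Hypothetical data, made up for illustration
drinks = pd.DataFrame(
    {"Prep": ["Tall", "Venti", "Venti"], "Calories": [100, 250, 400]},
    index=["Latte", "Mocha", "Frappuccino"],
)

# | means "or": Tall drinks, or anything over 300 calories
tall_or_heavy = drinks[(drinks["Prep"] == "Tall") | (drinks["Calories"] > 300)]
print(list(tall_or_heavy.index))  # ['Latte', 'Frappuccino']

# ~ means "not": everything that is not a Venti
not_venti = drinks[~(drinks["Prep"] == "Venti")]
print(list(not_venti.index))  # ['Latte']
```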


## Objects

Classes define objects: bundles of related data and the functions associated with that data (methods).  Some methods are available on every object and can be overridden; these include \_\_init\_\_() for initialization and \_\_str\_\_() for rendering the object as a string.  (The double underscores mark special methods that Python calls for you, for example when you construct an object or print it.)



All of an object's methods need to include self as a first parameter (unless they're static).  You can then refer to fields (variables inside the object) within the object code with "self.fieldname", or from outside the object using "variablename.fieldname".

In [23]:
class Square:
  def __init__(self, side):
    self.side = side

  def __str__(self):
    out = ""
    for i in range(self.side):
      for j in range(self.side):
        out += "O"
      out += "\n"
    return out
  
  def report_side_length(self):
    print(str(self.side))

square = Square(4)
print(square) # implicitly calls __str__
square.report_side_length()

OOOO
OOOO
OOOO
OOOO

4
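Fields can also be read and changed from outside the object with "variablename.fieldname". A small sketch using a hypothetical Rectangle class that is not part of the original notes:

```python
class Rectangle:  # hypothetical class, for illustration only
  def __init__(self, width, height):
    self.width = width
    self.height = height

  def area(self):
    # Inside the class, fields are reached through self
    return self.width * self.height

rect = Rectangle(3, 2)
print(rect.width)   # 3: reading a field from outside the object
rect.width = 5      # fields can be reassigned from outside, too
print(rect.area())  # 10
```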


## Strings

Recall that split() is a handy way to turn a string into a list; its argument is the separator.

In [24]:
"cabbage,beets,lettuce".split(',')

['cabbage', 'beets', 'lettuce']

.lower() is a good way of making sure comparisons are case-insensitive.

In [25]:
"Hello".lower() == "hEllO".lower()

True

Remember that "in" is the Python way of checking whether a string contains a substring.

In [26]:
"foo" in "foobar"

True

But for more complex pattern-matching, you need to use regular expressions.

In [27]:
import re

longstring = "We saw 200 people"
pattern = r'(\d+) people'  # raw string, so \d reaches the regex engine unchanged
result = re.search(pattern, longstring)
print(result.group(1))

200
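One thing to watch for: re.search() returns None when the pattern is not found, so calling group() on the result would fail. A minimal sketch of a guarded version; the helper name `count_people` is hypothetical:

```python
import re

def count_people(text):  # hypothetical helper name
  # re.search returns None when the pattern does not occur
  match = re.search(r'(\d+) people', text)
  if match is None:
    return None
  # group(1) is the text matched by the first parenthesized group
  return int(match.group(1))

print(count_people("We saw 200 people"))  # 200
print(count_people("We saw nobody"))      # None
```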


## Recursion

Recall that a good way to program recursively is to assume the function already works for smaller problems, then make use of that to code the current case.

The recursive call should always be making progress toward your base cases, or else the recursion will never reach one.  In Python, that ends in a RecursionError once the call stack gets too deep.


In [28]:
def reverse_string(s):
  if s == "":
    return ""
  return reverse_string(s[1:]) + s[0]

reverse_string("foobar")

'raboof'

Recursion is particularly common for functions on trees, with the recursive calls acting on the children of the current node.
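As a sketch of that pattern, the function below counts the nodes in a binary tree by recursing on each child. The `Node` class and `count_nodes` name are hypothetical, made up for this example:

```python
class Node:  # hypothetical minimal tree node, for illustration
  def __init__(self, left=None, right=None):
    self.left = left
    self.right = right

def count_nodes(node):
  # Base case: an empty subtree has no nodes
  if node is None:
    return 0
  # Recursive case: this node plus everything under each child
  return 1 + count_nodes(node.left) + count_nodes(node.right)

# Root with a left leaf, and a right child that has one left leaf
tree = Node(Node(), Node(Node(), None))
print(count_nodes(tree))  # 4
```

Each call hands a smaller subtree to the recursive calls, so every path eventually reaches the None base case.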

## Other assorted reminders

* When you get an error, read the traceback: it tells you which line failed and what kind of error occurred.

* You may find it helpful to create pseudocode to structure your answers before writing the program.  Outline in comments what needs to be done.
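As an illustration of the pseudocode tip, here is one way the outline-first approach might look. The problem and the function name `longest_word` are made up for this example:

```python
# Hypothetical problem: return the longest word in a sentence.
def longest_word(sentence):
  # 1. Split the sentence into words
  words = sentence.split(' ')
  # 2. Track the longest word seen so far, starting with an empty string
  longest = ""
  # 3. Replace it whenever a strictly longer word appears
  for word in words:
    if len(word) > len(longest):
      longest = word
  # 4. Return the result
  return longest

print(longest_word("write the outline first"))  # outline
```

The numbered comments were written first as the outline, and the code was filled in under each one afterward.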

# Short sample problem 1:  Loop

Write a function that returns the next power of 2 after the argument (or the argument, if it is a power of 2).  You can assume the argument is at least 1.

In [None]:
def next_power(n):  # named n to avoid shadowing the built-in input()
  p = 0

  # Raise the exponent until 2 ** p reaches or passes n
  while 2 ** p < n:
    p += 1

  return 2 ** p

print(next_power(32))
print(next_power(33))

# Short sample problem 2:  Dictionaries

Write a function that takes a list of strings and a dictionary as input.  Return a tuple, where the first element is the number of strings in the argument that are keys in the dictionary, and the second element is a list of the found keys' values.

In [None]:
def find_strings(strings, mydict):
  found_count = 0
  found_vals = []
  for string in strings:
    if string in mydict:
      found_count += 1
      found_vals.append(mydict[string])
  return found_count, found_vals

test_dict = {
    "foo" : 1,
    "bar" : 3,
    "qux" : 2
}
find_strings(["foo", "bar", "baz"], test_dict)

# Short sample problem 3:  Dataframes

Using the Starbucks dataframe, find the mean calories of all Venti drinks.

In [None]:
venti = df[df["Beverage_prep"] == "Venti"]
venti["Calories"].mean()

# Short sample problem 4:  Objects

Define a right triangle object.  The constructor should take the lengths of the two legs adjacent to the right angle.  Define a method that returns the area, and also override the string method so that it says "Triangle of area X", where X is the area.

In [None]:
class RightTriangle():
  def __init__(self, left, right):
    self.left = left
    self.right = right
  
  def area(self):
    return self.left*self.right/2
  
  def __str__(self):
    return "Triangle of area " + str(self.area())

my_tri = RightTriangle(5,4)
print(my_tri)

# Short sample problem 5:  Recursion

Write a recursive function that flips a BinaryTree (defined below) to be its mirror image, so that the rightmost leaf is now the leftmost leaf and similarly throughout the tree.

In [None]:
class BinaryTree:
  def __init__(self,left,right,s):  # s is string data
    self.left = left
    self.right = right
    self.s = s
  
  def __str__(self):
    if (self.left):
      leftstring = str(self.left)
    else:
      leftstring = ""
    if (self.right):
      rightstring = str(self.right)
    else:
      rightstring = ""
    return leftstring + self.s + rightstring

leftleft = BinaryTree(None,None,"a")
leftright = BinaryTree(None,None,"c")
left = BinaryTree(leftleft,leftright,"b")
rightleft = BinaryTree(None,None,"e")
rightright = BinaryTree(None,None,"g")
right = BinaryTree(rightleft,rightright,"f")
root = BinaryTree(left,right,"d")

def mirror(bt):
  if bt is None:  # base case: an empty subtree is its own mirror
    return None
  temp = mirror(bt.left)
  bt.left = mirror(bt.right)
  bt.right = temp
  return bt

print(mirror(root))
