<a href="https://colab.research.google.com/github/nicsim22/DS110-Content/blob/main/Lecture18Objects_nosol.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Objects

*Finally done with her file-scanning project, Cynthia tried committing her code to SAGE's repository, hoping it would pass code review.*

*Unfortunately, her commit was rejected almost instantly.  "We're an OO shop," read the terse feedback from a senior programmer.  "Just because we're saving the world, doesn't mean we're sloppy."*

*OO?  Right - Object-Oriented.  It was stuff Cynthia had skipped in her Python education because she couldn't see the point.  But now the senior programmers wouldn't let her play ball unless she structured her code that way, so she supposed that was point enough.*



**Definitions to know:**
- A class
- An object
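- The `__init__` method (the constructor) and its parameters
- Inheritance
- The `self` parameter: refers to the object itself, and is used to access that object's attributes and methods
- Difference between a class and an object: a class defines attributes and methods, while an object (an instance of the class) holds the attribute data. The class itself doesn't hold each object's attribute data.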

# Object-oriented programming

Suppose I want to model a lot of cars in a dataset.  Up until now, a reasonable way to do this might be with a tuple for each car:  (2010, "Honda", "Fit", "blue").  Having this data travel around together, inseparably, is already an improvement over tracking the year, make, model, and color with separate variables.  Assigning a tuple to a variable like my_car will keep the relevant facts together.  Being able to bundle data like this is good.

But we could get still more organized.  Today we'll learn how to implement our own object *classes*.  If I make all my cars objects that belong to a Car class:

* I will have a way of checking whether something really is a Car instead of an unrelated four-element tuple.
* I can organize my code by having all Car-related functions (methods) in one place.
* Using inheritance (next time), I can share code between objects that are similar.
* Readers of the code have a better idea of what's happening when it talks about a Car instead of a tuple.


We'll start by defining a basic **empty Car class**.  The "pass" keyword just means "nothing interesting here."  To define a new class, we say **class [name of class]: and follow it with the body of the class**, which we can leave empty if we say "pass."

In [None]:
class Car:
    pass  # placeholder: the class body is intentionally empty

We have at least declared that there's something you can create called a "Car."  Even with this minimal definition, we can now create Car objects like so:

In [None]:
car1 = Car()
car2 = Car()
car3 = Car()

Each variable now points to a car object.  There's not much to know about these objects except the fact that they are indeed Cars, which we can check in the code:

In [None]:
print(isinstance(car1, Car))  # isinstance(object, type): is car1 a Car?
print(isinstance(5, Car))     # 5 is an int, not a Car, so this is False

True
False


The cars currently don't have any data associated with them, but we could assign data to them.  Unlike other languages, **Python doesn't require *attributes*, the variables associated with an object, to be defined ahead of time.** So we can set the attributes for these car objects like this:

For example, a NumPy array's `shape` is an attribute: you read it as `arr.shape`.

In [None]:
car1.year = 2010
car1.make = "Honda"
car1.model = "Fit"
car1.color = "blue"
car2.year = 2013
car2.make = "Toyota"
car2.model = "Camry"
car2.color = "silver"
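Because attributes only come into existence when they are assigned, an object can be missing some of them. A minimal sketch (re-creating the empty `Car` class so it runs on its own) showing that reading an attribute that was never set raises `AttributeError`, and that the built-in `vars()` returns the attributes an object currently has:

```python
class Car:
    pass

car1 = Car()
car1.year = 2010
car1.make = "Honda"

car3 = Car()  # no attributes assigned to this one

print(vars(car1))             # {'year': 2010, 'make': 'Honda'}
print(hasattr(car3, "year"))  # False

try:
    print(car3.year)
except AttributeError as err:
    # Reading an attribute that was never assigned fails at run time
    print("AttributeError:", err)
```

This flexibility is convenient, but it also means a typo like `car1.yaer = 2011` silently creates a new attribute instead of raising an error.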

We haven't done a lot with this class yet, but the code is already going to be more readable than it would be if we were using a tuple.

In [None]:
print(f"This car is a {car1.year} {car1.color} {car1.make} {car1.model}")
# Compare the less readable tuple version:
my_car = (2010, 'Honda', 'Fit', 'blue')
print(f"This car is a {my_car[0]} {my_car[3]} {my_car[1]} {my_car[2]}")

This car is a 2010 blue Honda Fit
This car is a 2010 blue Honda Fit


If we expect to do something a lot with our object, we can go back and **define a method for the object**.

*Methods* are functions that belong to a class.

To write a method for a class, define it indented under the class definition.  **You should also include a first argument of self**, in addition to any other arguments, that represents the object itself.

In [None]:
class Car:
    def print_facts(self):  # self is bound to the object the method is called on
        print(f"This car is a {self.year} {self.color} {self.make} {self.model}")  # in car1.print_facts(), self is car1

# Have to re-start these car objects since they had the old class definition
car1 = Car()
car2 = Car()

car1.year = 2010
car1.make = "Honda"
car1.model = "Fit"
car1.color = "blue"
car2.year = 2013
car2.make = "Toyota"
car2.model = "Camry"
car2.color = "silver"

car1.print_facts()  # no argument needed: car1 is passed automatically as self
car2.print_facts()
Car.print_facts(car1)  # equivalent: call through the class and pass the object as self explicitly

This car is a 2010 blue Honda Fit
This car is a 2013 silver Toyota Camry
This car is a 2010 blue Honda Fit
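Methods can also take arguments besides `self`. A short sketch with a hypothetical `repaint()` method (not part of the class above) that changes an attribute; the caller passes only the new color, and Python supplies `self`:

```python
class Car:
    def print_facts(self):
        print(f"This car is a {self.year} {self.color} {self.make} {self.model}")

    def repaint(self, new_color):
        # Hypothetical method: self is the car, new_color is the caller's argument
        self.color = new_color

car1 = Car()
car1.year = 2010
car1.make = "Honda"
car1.model = "Fit"
car1.color = "blue"

car1.repaint("red")  # same as Car.repaint(car1, "red")
car1.print_facts()   # This car is a 2010 red Honda Fit
```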


This code is still rather wordier than it needs to be - we want the *initialization* of the object, where all the attributes are set for the first time, to be a little more painless.  To this end, we introduce the *constructor* `__init__()`.  The method with this name gets called automatically when the object is created for the first time, using the arguments passed in at that time.

In [None]:
class Car:
    def __init__(self, year, make, model, color):  # two underscores before and after; runs automatically when a Car is created
        # self refers to object that we are creating
        # It's common for the constructor's arguments
        # to have similar or identical names to the attributes they set
        # (but we still have to say one should be set to the other)
        self.year = year #self.year is the year that is stored in the object
        self.make = make
        self.model = model
        self.color = color

    def print_facts(self):
        print(f"This car is a {self.year} {self.color} {self.make} {self.model}")

# Now this is nicer
car1 = Car(2010, "Honda", "Fit", "blue")  # Python creates a new Car and calls __init__ with these arguments
car2 = Car(2013, "Toyota", "Camry", "silver")

car1.print_facts()
car2.print_facts()

This car is a 2010 blue Honda Fit
This car is a 2013 silver Toyota Camry
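Since `__init__` now requires four arguments besides `self`, Python checks them when the object is created. A quick sketch (repeating the class definition so the block stands alone) showing that leaving one out raises a `TypeError` instead of producing a half-built car, and that keyword arguments also work:

```python
class Car:
    def __init__(self, year, make, model, color):
        self.year = year
        self.make = make
        self.model = model
        self.color = color

try:
    Car(2010, "Honda", "Fit")  # color is missing
except TypeError as err:
    print("TypeError:", err)

# Keyword arguments make the call self-documenting
car = Car(year=2010, make="Honda", model="Fit", color="blue")
print(car.model)  # Fit
```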


Those are the essentials to creating new classes.  We can expand the classes by adding more methods, or we can write code that makes use of the methods and attributes.

In [None]:
# Return the newest car in a list of cars
def newest_car(list_of_cars):
    if not list_of_cars:  # i.e., if the list is empty
        return None
    best_year = list_of_cars[0].year
    best_car = list_of_cars[0]
    for car in list_of_cars:
        # This warning message could prevent a bug if we try
        # to hand this function the wrong list
        if not isinstance(car, Car):
            print('Warning, list had non-car items!')
        elif car.year > best_year:
            best_year = car.year
            best_car = car
    return best_car

newest_car([car1, car2]).print_facts()

This car is a 2013 silver Toyota Camry
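The loop in `newest_car` can also be written with the built-in `max()` and its `key` argument, which compares items by whatever the key function returns. A sketch, assuming every item in the list is a `Car` (this version drops the warning for non-Car items); `newest_car_max` is a hypothetical name:

```python
class Car:
    def __init__(self, year, make, model, color):
        self.year = year
        self.make = make
        self.model = model
        self.color = color

def newest_car_max(list_of_cars):
    # Hypothetical alternative to newest_car: compare cars by their year.
    # default=None covers the empty-list case, like the original function.
    return max(list_of_cars, key=lambda car: car.year, default=None)

car1 = Car(2010, "Honda", "Fit", "blue")
car2 = Car(2013, "Toyota", "Camry", "silver")
print(newest_car_max([car1, car2]).model)  # Camry
print(newest_car_max([]))                  # None
```

Like the loop version, `max()` returns the first car with the largest year when there is a tie.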


# Review of object-oriented vocabulary so far

The ***class*** is the template that allows us to make many **object *instances***.  Each instance has its own ***attributes***, which are like variables attached to the object, and can call ***methods***, functions associated with the class.  The *constructor* (in Python, the `__init__` method) sets up each new *instance*; it runs when you call the class, as in `Car(2010, "Honda", "Fit", "blue")`.
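The same vocabulary, mapped onto the `Car` code from above (a short sketch; the comments label each part):

```python
class Car:  # the class: a template for making cars
    def __init__(self, year, make, model, color):  # the constructor
        self.year = year  # attributes, stored separately on each instance
        self.make = make
        self.model = model
        self.color = color

    def print_facts(self):  # a method
        print(f"This car is a {self.year} {self.color} {self.make} {self.model}")

car1 = Car(2010, "Honda", "Fit", "blue")       # calling the class creates an instance
car2 = Car(2013, "Toyota", "Camry", "silver")  # a second, independent instance

print(type(car1).__name__)   # Car
print(car1.year, car2.year)  # 2010 2013
car2.print_facts()           # This car is a 2013 silver Toyota Camry
```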

# Getters and Setters

It's not uncommon to see object code that looks like the following, where there seem to be **methods that don't do anything besides set attribute values (*setters*)** and **methods that don't do anything besides return attribute values (*getters*)**.  We also see the attribute itself has a name starting with an underscore; this means "please use a method instead of accessing this directly."

In [None]:
class Bill:
  """ Represents a bill at a restaurant.

  _items (list of tuples):  list of (item name, cost) tuples
  """

  def __init__(self, items):
    self._items = items

  # "Getter"
  def items(self):
    return self._items

  # "Setter"
  def set_items(self, items):
    self._items = items
    # Why self._items instead of self.items? The leading underscore is a
    # convention meaning "internal: don't access this directly." If the class
    # is rewritten later, code that goes through the getter and setter will
    # keep working. (An attribute named items would also hide the items() method.)

  def total_cost_pretax(self):
    total = 0
    for name, cost in self._items:
      total += cost
    return total

  def total_cost_with_tax(self, tax_rate):
    return round(self.total_cost_pretax() * (1 + tax_rate), 2)


In [None]:
my_lunch = [("Ham Sandwich", 9), ("Coke", 2)]
new_bill = Bill(my_lunch)
cost_with_tax = new_bill.total_cost_with_tax(0.08)
print(f"Total cost: {cost_with_tax}")

Total cost: 11.88


Writing "getters" and "setters" that avoid direct attribute access is generally considered to be a good practice in object-oriented programming, because it avoids breaking people's code when changes happen.

In [None]:
new_bill.items() # could have said new_bill._items, but we were told not to

[('Ham Sandwich', 9), ('Coke', 2)]
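Python also has a built-in alternative to writing explicit getter and setter methods: the `@property` decorator, which lets callers write `bill.items` while the class still routes every read and write through methods. A minimal sketch; `BillWithProperty` is a hypothetical name, and this is not how the `Bill` class above is written:

```python
class BillWithProperty:
    """Hypothetical variant of Bill; items is a list of (item name, cost) tuples."""

    def __init__(self, items):
        self._items = items

    @property
    def items(self):
        # Runs on read access: bill.items
        return self._items

    @items.setter
    def items(self, new_items):
        # Runs on assignment: bill.items = [...]
        self._items = new_items

    def total_cost_pretax(self):
        return sum(cost for name, cost in self._items)

bill = BillWithProperty([("Ham Sandwich", 9), ("Coke", 2)])
print(bill.items)                # [('Ham Sandwich', 9), ('Coke', 2)]
bill.items = [("Salad", 7)]      # goes through the setter
print(bill.total_cost_pretax())  # 7
```

The trade-off: callers get plain attribute syntax, and the class can still change how `_items` is stored later without breaking them.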

In [None]:
# Now suppose the Bill code gets redesigned a little - code is rearranged for some reason
# names and prices in separate lists now!
class Bill:
  """ Represents a bill at a restaurant.

  _item_names (list of strings):  list of items on bill
  _item_costs (list of ints): list of prices of items on bill
  _items is not here anymore! sorry anybody who wrote code that uses it, we warned you!
  """

  def __init__(self, items):
    self._item_names = [item[0] for item in items]
    self._item_costs = [item[1] for item in items]

  # "Getter"
  def items(self):
    # list(zip(a, b)) returns a list of tuples combining a and b
    return list(zip(self._item_names, self._item_costs))

  # "Setter"
  def set_items(self, items):
    self._item_names = [item[0] for item in items]
    self._item_costs = [item[1] for item in items]

  def total_cost_pretax(self):
    # _items no longer exists, so add up the separate list of costs instead
    total = 0
    for cost in self._item_costs:
      total += cost
    return total

  # Notice that we can call another method with this one
  def total_cost_with_tax(self, tax_rate):
    return round(self.total_cost_pretax() * (1 + tax_rate), 2)

In [None]:
my_lunch = [("Ham Sandwich", 9), ("Coke", 2)]
new_bill = Bill(my_lunch)
print(new_bill.items())  # this still works the same, but _items would have broken

[('Ham Sandwich', 9), ('Coke', 2)]
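To confirm that the redesign keeps the public behavior, we can check that the methods give the same answers as before. This sketch repeats the redesigned class (with `total_cost_pretax` summing `_item_costs`, since `_items` no longer exists) so it runs on its own:

```python
class Bill:
    """Redesigned bill: item names and costs are kept in separate lists."""

    def __init__(self, items):
        self._item_names = [item[0] for item in items]
        self._item_costs = [item[1] for item in items]

    def items(self):
        return list(zip(self._item_names, self._item_costs))

    def total_cost_pretax(self):
        return sum(self._item_costs)

    def total_cost_with_tax(self, tax_rate):
        return round(self.total_cost_pretax() * (1 + tax_rate), 2)

new_bill = Bill([("Ham Sandwich", 9), ("Coke", 2)])
print(new_bill.items())                    # [('Ham Sandwich', 9), ('Coke', 2)]
print(new_bill.total_cost_with_tax(0.08))  # 11.88, the same as the original Bill
```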


# Validation in the Constructor

One use of objects is in validating data - making sure it's the right type and range for its intended use.  This also demonstrates the use of ***raise*, or raising your own exception**.

In [None]:
class Circle:
  def __init__(self, radius):
    if radius < 0:
      raise ValueError("Can't have negative circle radius") #raise your own error
    self.radius=radius

Circle(-1)  # raises ValueError: Can't have negative circle radius
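Validation can check the type as well as the range, and code that creates objects can catch the exception instead of crashing. A sketch with a hypothetical `CheckedCircle` class that adds an `isinstance` check before the range check:

```python
class CheckedCircle:
    """Hypothetical variant of Circle that also validates the type."""

    def __init__(self, radius):
        if not isinstance(radius, (int, float)):
            raise TypeError("radius must be a number")
        if radius < 0:
            raise ValueError("Can't have negative circle radius")
        self.radius = radius

for r in [3, -1, "big"]:
    try:
        c = CheckedCircle(r)
        print("Created circle with radius", c.radius)
    except (TypeError, ValueError) as err:
        # Report the problem and move on to the next value
        print(f"{type(err).__name__}: {err}")
```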

# Default parameter values

It often makes sense to have the constructor take all the attributes of an object as arguments.  But there may be some default values that work well.
You can set default arguments to the constructor by following each argument with =value, where value is the default value.  Classes in machine learning libraries like scikit-learn do this a lot, recommending default values for their algorithms' parameters that often work well.

In [None]:
class Circle2:
  def __init__(self, radius=2):  # radius defaults to 2 if no argument is given
    self.radius = radius

Circle2().radius  # call the Circle2 constructor with no arguments, then read its radius

2
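One caution when choosing defaults: a default value is created once, when the `def` statement runs, not each time the constructor is called. With a mutable default such as a list, every object that relies on the default shares the same list. A sketch with hypothetical `SharedDefaultOrder` and `Order` classes showing the problem and the common `None` fix:

```python
class SharedDefaultOrder:
    # Hypothetical class: the [] below is one list shared by every call that uses the default
    def __init__(self, items=[]):
        self.items = items

class Order:
    # Hypothetical class: use None as the default and build a fresh list per object
    def __init__(self, items=None):
        self.items = [] if items is None else items

a = SharedDefaultOrder()
b = SharedDefaultOrder()
a.items.append("Coke")
print(b.items)  # ['Coke']: b sees the item added through a

c = Order()
d = Order()
c.items.append("Coke")
print(d.items)  # []
```

Immutable defaults such as the number `2` in `Circle2` don't have this problem.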

# Choosing what is an object

Suppose we're writing software for a gradebook.  There are students who have grades on assignments and tests.  It's for a particular class.  What among these might usefully be an object?

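* The students? Yes: each student has several pieces of data (name, grades) and functions that apply to that data.
* The gradebook itself? Possibly, depending on what you want the program to do.
* The grades? No: each grade is a single value that can be stored with a student.
* The course name? No: it's just a single attribute of the gradebook.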

In short, things that make good classes often have several pieces of data associated with them, along with functions that specifically pertain to that data.  But choosing what to treat as an object is often a question of elegance rather than a hard-and-fast rule.
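As one possible design (a sketch; the class names `GradebookStudent` and `Gradebook` and their attributes are hypothetical choices, not the only reasonable ones), students become objects because each has several pieces of data plus functions about that data, while the course name is just an attribute of the gradebook:

```python
class GradebookStudent:
    """Hypothetical student record: a name plus scores keyed by assignment."""

    def __init__(self, name):
        self.name = name
        self.grades = {}  # assignment name -> score (plain numbers, not objects)

    def add_grade(self, assignment, score):
        self.grades[assignment] = score

    def average(self):
        if not self.grades:
            return None
        return sum(self.grades.values()) / len(self.grades)

class Gradebook:
    """Hypothetical gradebook for one course."""

    def __init__(self, course_name):
        self.course_name = course_name  # a plain attribute, not its own class
        self.students = []

    def add_student(self, student):
        self.students.append(student)

book = Gradebook("DS110")
alice = GradebookStudent("Alice")
alice.add_grade("HW1", 90)
alice.add_grade("Test1", 80)
book.add_student(alice)
print(book.course_name, alice.average())  # DS110 85.0
```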

# Exercise (5 min)

Try defining an object that represents a student.  Include at least 3 attributes that describe the student, including age.  Define a constructor and a method get_older() which increases the student's age by the argument.  You don't need to define all the getters and setters.

In [None]:
# One possible solution
class Student:
  def __init__(self, age, major, year):
    self.age = age
    self.major = major
    self.year = year

  def get_older(self, amount):
    # Increase the student's age by the given number of years
    self.age += amount


bob = Student(20, "Biology", "Sophomore")
bob.get_older(2)
print(bob.age)

22


# Pass by reference and objects

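Everything in Python is an object, and variables hold references to objects: assigning one variable to another, or passing a variable to a function, copies the reference, not the object. (This is often described as pass-by-reference, or more precisely "pass by object reference.") You'll notice the effects most with mutable objects, such as the objects you create from your own classes.  If you assign from one variable to another, then change the object through one of the variables, the change shows up through the other as well, because both variables point to the same data in memory.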

In [None]:
car1 = Car(2010, "Honda", "Fit", "blue")
car2 = car1
car2.color = "black"  # car2 and car1 are the same object, so car1 is now black too
car1.print_facts()  # It's black now
car2.print_facts()

This car is a 2010 black Honda Fit
This car is a 2010 black Honda Fit
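To check whether two variables refer to the same object, rather than two objects that merely hold the same data, use the `is` operator. The same sharing happens when an object is passed to a function, as this sketch with a hypothetical `paint_black()` function shows:

```python
class Car:
    def __init__(self, year, make, model, color):
        self.year = year
        self.make = make
        self.model = model
        self.color = color

car1 = Car(2010, "Honda", "Fit", "blue")
car2 = car1                               # a second name for the same object
car3 = Car(2010, "Honda", "Fit", "blue")  # a separate object with the same data

print(car2 is car1)          # True
print(car3 is car1)          # False
print(id(car2) == id(car1))  # True: same object in memory

def paint_black(car):
    # Hypothetical helper: the parameter refers to the caller's object
    car.color = "black"

paint_black(car3)
print(car3.color)  # black
```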


If you want to alter one copy but not the other, you need to copy your object first. `copy.copy()` makes a shallow copy, which is enough when the attributes are simple values like these; use `copy.deepcopy()` if your object holds references to other mutable objects (lists, other objects) that also need copying.

In [None]:
import copy

car2 = copy.copy(car1) #standard copy here, not a deep copy
car2.color = "white"
car1.print_facts()
car2.print_facts()

This car is a 2010 black Honda Fit
This car is a 2010 white Honda Fit
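The difference between `copy.copy()` and `copy.deepcopy()` shows up when an attribute holds another mutable object. A sketch using a simplified `Car` with a hypothetical `owners` list attribute:

```python
import copy

class Car:
    def __init__(self, make, owners):
        self.make = make
        self.owners = owners  # hypothetical attribute: a list of owner names

original = Car("Honda", ["Ana"])

shallow = copy.copy(original)   # new Car object, but it shares the same owners list
deep = copy.deepcopy(original)  # new Car object with its own copy of the list

shallow.owners.append("Ben")
print(original.owners)  # ['Ana', 'Ben']: changed through the shallow copy
print(deep.owners)      # ['Ana']
```

Reassigning an attribute on the shallow copy (as with `car2.color = "white"` above) is safe; mutating a shared list or object through it is not.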


# Example with a CSV dataset

In reading a dataset, there may be entities that it makes sense to treat as objects, either because the object represents one sample, or because the object brings together data from several rows.  Here we show two classes that could be created that are related to the books.csv dataset:  a Book class, which corresponds to a row of the dataset, and a Publisher that has published multiple books, corresponding to an entry in the last column.

In [None]:
# Google colab only
from google.colab import files
uploaded = files.upload() # import books.csv

Saving books.csv to books.csv


In [None]:
import pandas as pd
df = pd.read_csv('books.csv', index_col = 'title')
df.head()

| title | bookID | authors | average_rating | isbn | isbn13 | language_code | num_pages | ratings_count | text_reviews_count | publication_date | publisher |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Harry Potter and the Half-Blood Prince (Harry Potter #6) | 1 | J.K. Rowling/Mary GrandPré | 4.57 | 0439785960 | 9780439785969 | eng | 652 | 2095690 | 27591 | 9/16/2006 | Scholastic Inc. |
| Harry Potter and the Order of the Phoenix (Harry Potter #5) | 2 | J.K. Rowling/Mary GrandPré | 4.49 | 0439358078 | 9780439358071 | eng | 870 | 2153167 | 29221 | 9/1/2004 | Scholastic Inc. |
| Harry Potter and the Chamber of Secrets (Harry Potter #2) | 4 | J.K. Rowling | 4.42 | 0439554896 | 9780439554893 | eng | 352 | 6333 | 244 | 11/1/2003 | Scholastic |
| Harry Potter and the Prisoner of Azkaban (Harry Potter #3) | 5 | J.K. Rowling/Mary GrandPré | 4.56 | 043965548X | 9780439655484 | eng | 435 | 2339585 | 36325 | 5/1/2004 | Scholastic Inc. |
| Harry Potter Boxed Set Books 1-5 (Harry Potter #1-5) | 8 | J.K. Rowling/Mary GrandPré | 4.78 | 0439682584 | 9780439682589 | eng | 2690 | 41428 | 164 | 9/13/2004 | Scholastic |


In [None]:
class Book:
    def __init__(self, title, author, average_rating):
        self.title = title
        self.author = author
        self.average_rating = average_rating
        # Could add more fields from the dataset if desired

class Publisher:
    def __init__(self, df, publisher_name):
        self.name = publisher_name
        self.books = []
        for row in df.itertuples():  # itertuples() yields one row at a time as a namedtuple
            if row.publisher == publisher_name:
                # row.Index is the title, since title is the DataFrame's index
                self.books.append(Book(row.Index, row.authors, row.average_rating))

    def average_rating(self):
        if not self.books:  # avoid dividing by zero for a publisher with no books
            return None
        total = 0
        for book in self.books:
            total += book.average_rating
        return total/len(self.books)

scholastic = Publisher(df,'Scholastic Inc.')
scholastic.average_rating()

4.05923076923077
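The same idea works without pandas, using the standard-library `csv` module. A sketch that uses three rows from the `books.csv` preview above, held in an in-memory string via `io.StringIO` so it runs without the file; only the columns the classes need are included, and `load_publisher_books` is a hypothetical helper:

```python
import csv
import io

# Three rows copied from the books.csv preview above (only the needed columns)
SAMPLE = """title,authors,average_rating,publisher
Harry Potter and the Half-Blood Prince (Harry Potter #6),J.K. Rowling/Mary GrandPré,4.57,Scholastic Inc.
Harry Potter and the Order of the Phoenix (Harry Potter #5),J.K. Rowling/Mary GrandPré,4.49,Scholastic Inc.
Harry Potter and the Chamber of Secrets (Harry Potter #2),J.K. Rowling,4.42,Scholastic
"""

class Book:
    def __init__(self, title, author, average_rating):
        self.title = title
        self.author = author
        self.average_rating = average_rating

def load_publisher_books(csv_file, publisher_name):
    # Hypothetical helper: build a Book for each row from the given publisher
    books = []
    for row in csv.DictReader(csv_file):  # each row is a dict keyed by column name
        if row["publisher"] == publisher_name:
            # csv returns strings, so convert the rating to a float
            books.append(Book(row["title"], row["authors"], float(row["average_rating"])))
    return books

books = load_publisher_books(io.StringIO(SAMPLE), "Scholastic Inc.")
print([b.title for b in books])
print(round(sum(b.average_rating for b in books) / len(books), 2))  # 4.53
```

Note that "Scholastic" and "Scholastic Inc." are different strings, so an exact-match filter like this one (or the pandas version above) treats them as separate publishers.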