# Object-oriented programming (OOP)


**Description:** This notebook describes:

* What object-oriented programming (OOP) is
* What a class is
* What an object is
* What attributes, methods and constructors are
* How to write a class
* How to create objects using a class 
* How to use OOP in a simple text analysis task

**Use Case:** For learners (detailed explanation)

**Difficulty:** Intermediate

**Completion Time:** 90 minutes

**Knowledge Required:**

* Python Basics Series ([Start Python Basics 1](./python-basics-1.ipynb))
    
**Knowledge Recommended:** None

**Data Format:** None

**Libraries Used:**
  * NLTK

**Research Pipeline:** None


# Classes and objects

Object-oriented programming is a programming paradigm that relies on the concept of **classes** and **objects**. To understand OOP, we will need to understand **classes** and **objects**.

To make the concepts of **classes** and **objects** more concrete, let's use a simple example. 


Suppose you are looking to buy a house. You keep a record of the houses you are interested in to do a comparison between them.

|                  | house1 |
|------------------|:-------|
|**bedroom**       | 3      |
|**bathroom**      | 1.5    |
|**price**         | 660000 |
|**sqft**          | 1500   |
|**price_per_sqft**|def price_per_sqft():<br>&emsp;$~~~$price_per_sqft = 660000 / 1500<br>&emsp;$~~~$print("This house costs " + str(price_per_sqft) + " dollars per square foot.")|
    
|                  | house2 |
|------------------|:-------|
|**bedroom**       | 4      |
|**bathroom**      | 2      |
|**price**         | 800000 |
|**sqft**          | 2000   |
|**price_per_sqft**|def price_per_sqft():<br>&emsp;$~~~$price_per_sqft = 800000 / 2000<br>&emsp;$~~~$print("This house costs " + str(price_per_sqft) + " dollars per square foot.")|

For each house you are interested in, you create such a record. A record holds two kinds of information: first, the **attributes** of the house; second, the **functions** that operate on that house (here, a function that prints out the price per square foot).
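
Before introducing objects, it may help to see how such records could be kept using only the Python basics. The sketch below is an illustration, not part of the original notebook: it stores each record's attributes in a dictionary and keeps the price-per-square-foot logic in a separate function. The names `house1_record`, `house2_record` and `price_per_sqft_of` are hypothetical.

```python
# Each record's attributes live in a dictionary...
house1_record = {"bedroom": 3, "bathroom": 1.5, "price": 660000, "sqft": 1500}
house2_record = {"bedroom": 4, "bathroom": 2, "price": 800000, "sqft": 2000}


# ...while the function that operates on a record lives somewhere else.
def price_per_sqft_of(record):
    """Return the price per square foot of a house record."""
    return record["price"] / record["sqft"]


for record in (house1_record, house2_record):
    print("This house costs " + str(price_per_sqft_of(record)) + " dollars per square foot.")
```

This prints 440.0 and 400.0 dollars per square foot. Nothing in this setup ties the dictionaries to the function meant to operate on them; keeping the two together is exactly what objects do.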

Now, a natural question to ask is: is there a good way to organize such a collection of information? Yes! This is where **objects** come in. 

## What is an object?

An **object** is basically a collection of attributes and functions. With such a collection of information, an **object** can be used to represent anything, e.g. a person, a dog, a school, etc. 
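
Python itself is built on this idea: even a plain string is an object that carries data (its characters) together with functions that operate on that data. A quick illustration using only built-in string methods:

```python
# A string object holds data (its characters)...
sentence = "Object-oriented programming bundles data and behavior"

# ...and carries functions (methods) that operate on that data.
print(sentence.lower())       # object-oriented programming bundles data and behavior
print(sentence.split()[:2])   # ['Object-oriented', 'programming']
print(sentence.count("o"))    # 3 (the capital "O" is not counted)
```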

Coming back to our example, we use the **object** below to represent `house1`. Depending on which properties (attributes) and functions you want to include, you could of course represent `house1` with a different set of information stored in the **object**. In our scenario, we represent `house1` with the number of bedrooms, the number of bathrooms, the price of the house, its square footage and a function that calculates and prints out the price per square foot. Let's assign this object to the variable name ```house1```.

<img src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/intermediate_python_4_house1.png" width="650" height="300" />

We have created another **object** representing `house2`. Let's assign the object to the variable name ```house2```.

<img src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/intermediate_python_4_house2.png" width="650" height ="300" />


Some terminology: the variables in an object are called **attributes**, and the functions in an object are called **methods**.
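
The difference also shows up in the syntax. A minimal sketch with a built-in complex number: attributes are read without parentheses, while methods, being functions, are called with parentheses.

```python
z = 3 + 4j

# Attributes: stored values, read without parentheses.
print(z.real)         # 3.0
print(z.imag)         # 4.0

# Method: a function attached to the object, called with parentheses.
print(z.conjugate())  # (3-4j)
```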

If you take a look at the two objects we have created to represent `house1` and `house2` respectively, you can easily see that they are quite similar. They have the same set of **attributes**, i.e. number of bedrooms, number of bathrooms, house price, number of square feet, and they have the same set of **methods**, i.e. a function that calculates and prints out the price per square foot. In other words, the two objects are written based on the same recipe. 

Now, if the objects are created from the same recipe, it would be ideal to write out that recipe once and then use it to produce as many objects of that kind as we need. In the house-buying scenario, for example, you will want as many objects as there are houses you are interested in! The question now becomes: how do we write that recipe? This is exactly where **classes** come in. 

## What is a class?

A **class** is an abstract blueprint from which we create individual instances of **objects**. 

|                  | House |
|------------------|:------|
|**bedroom**       |       |
|**bathroom**      |       |
|**price**         |       |
|**sqft**          |       |
|**price_per_sqft**|def price_per_sqft(price, sqft):<br>&emsp;$~~~$price_per_sqft = price / sqft<br>&emsp;$~~~$print("This house costs " + str(price_per_sqft) + " dollars per square foot.")|
 
 Notice that the values assigned to the four variables, i.e. bedroom, bathroom, price, sqft, are not specified in this class. This is because a **class** does not refer to any specific **object**. A **class** refers to a broad category of **objects**. Here in the house-buying scenario, our **class** refers to the category of houses. The **class** specifies what attributes the houses have and also what functions operate on these houses.
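
To make the blueprint/instance distinction concrete, here is a minimal sketch using a stripped-down `House` with only `price` and `sqft` (a simplification of the class the notebook builds later). The class itself holds no particular price; each object built from it gets its own values.

```python
class House:
    """A blueprint: it says which attributes a house has, not their values."""

    def __init__(self, price, sqft):
        self.price = price
        self.sqft = sqft


# The class itself does not refer to any specific house...
print(hasattr(House, "price"))                           # False

# ...each object built from it receives its own values.
houses = [House(660000, 1500), House(800000, 2000)]
print([house.price for house in houses])                 # [660000, 800000]
print(all(isinstance(house, House) for house in houses))  # True
print(houses[0] is houses[1])                            # False: two distinct objects
```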

## A code example

Now, let's write some code to create our `House` **class**!

In [12]:
# Create a class named 'House'
class House:
    def __init__(self, num_bedroom, num_bathroom, overall_price, num_sqft): ## constructor
        self.bedroom = num_bedroom      ## instance variable
        self.bathroom = num_bathroom    ## instance variable
        self.price = overall_price      ## instance variable
        self.sqft = num_sqft            ## instance variable
    def price_per_sqft(self): ## method
        price_per_sqft = self.price / self.sqft
        print("This house costs " + str(price_per_sqft) + " dollars per square foot.")

In [25]:
# By convention, the parameter names match the names of the instance variables
class House:
    def __init__(self, bedroom, bathroom, price, sqft): ## constructor
        self.bedroom = bedroom      ## instance variable
        self.bathroom = bathroom    ## instance variable
        self.price = price          ## instance variable
        self.sqft = sqft            ## instance variable
    def price_per_sqft(self): ## method
        price_per_sqft = self.price / self.sqft
        print("This house costs " + str(price_per_sqft) + " dollars per square foot.")
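
One practical payoff of matching parameter names to attribute names is that a constructor call can spell out which value is which by using keyword arguments. A sketch, repeating the class from the cell above so that it runs on its own; the variable name `house1_again` is hypothetical.

```python
class House:
    def __init__(self, bedroom, bathroom, price, sqft):  # constructor
        self.bedroom = bedroom      # instance variable
        self.bathroom = bathroom    # instance variable
        self.price = price          # instance variable
        self.sqft = sqft            # instance variable

    def price_per_sqft(self):  # method
        price_per_sqft = self.price / self.sqft
        print("This house costs " + str(price_per_sqft) + " dollars per square foot.")


# Positional arguments rely on the order bedroom, bathroom, price, sqft...
house1 = House(3, 1.5, 660000, 1500)

# ...whereas keyword arguments name each value, so the order no longer matters.
house1_again = House(price=660000, sqft=1500, bedroom=3, bathroom=1.5)

print(vars(house1) == vars(house1_again))  # True: same attribute values
```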

In [26]:
# Create an object house1
house1 = House(3, 1.5, 660000, 1500)

In [27]:
# Get the value of the attribute bedroom of house1
house1.bedroom

3

In [28]:
# Get the value of the attribute bathroom of house1
house1.bathroom

1.5

In [29]:
# Get the value of the attribute price of house1
house1.price

660000

In [30]:
# Get the value of the attribute sqft of house1
house1.sqft

1500

In [31]:
# Use the method price_per_sqft of house1
house1.price_per_sqft()

This house costs 440.0 dollars per square foot.
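
The notebook does not spell out what `self` is. When `house1.price_per_sqft()` runs, Python passes `house1` itself as the method's first argument, `self`. The sketch below repeats a trimmed `House` so that it is self-contained and shows two equivalent calls.

```python
class House:
    def __init__(self, price, sqft):
        self.price = price
        self.sqft = sqft

    def price_per_sqft(self):
        price_per_sqft = self.price / self.sqft
        print("This house costs " + str(price_per_sqft) + " dollars per square foot.")


house1 = House(660000, 1500)

# These two calls do exactly the same thing:
house1.price_per_sqft()       # the usual way: Python fills in self=house1
House.price_per_sqft(house1)  # explicit: you pass the object as self yourself
```

Both lines print `This house costs 440.0 dollars per square foot.`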


In [34]:
# Create an object house2
house2 = House(4, 2, 800000, 2000)

In [35]:
# Get the value of the attribute bedroom of house2
house2.bedroom

4

In [59]:
# Add a new variable to an object and assign a value to it
house1.lot_size = 500

In [60]:
# The newly added variable is specific to the object house1
house2.lot_size

AttributeError: 'House' object has no attribute 'lot_size'
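
One way to see why `house2` raises `AttributeError` is to inspect each object's own namespace with the built-in `vars()`, which returns the object's attribute dictionary. A self-contained sketch; `getattr` with a default value is one way to read an attribute that may be missing.

```python
class House:
    def __init__(self, bedroom, bathroom, price, sqft):
        self.bedroom = bedroom
        self.bathroom = bathroom
        self.price = price
        self.sqft = sqft


house1 = House(3, 1.5, 660000, 1500)
house2 = House(4, 2, 800000, 2000)
house1.lot_size = 500  # added to house1 only

print(vars(house1))  # {'bedroom': 3, 'bathroom': 1.5, 'price': 660000, 'sqft': 1500, 'lot_size': 500}
print(vars(house2))  # {'bedroom': 4, 'bathroom': 2, 'price': 800000, 'sqft': 2000}

# getattr with a default returns None instead of raising AttributeError
print(getattr(house2, "lot_size", None))  # None
```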

### Instance variables vs. class variables

An object we create using a class is an instance of that class: `house1` and `house2` are two instances of the class `House`. In the class `House`, we have defined several **instance variables**: `bedroom`, `bathroom`, `price` and `sqft`. The values assigned to these instance variables can differ from house to house, because each house has its own number of bedrooms, number of bathrooms, price and square footage.

In [32]:
# Take a look at the instance variables in the class House
class House:
    def __init__(self, bedroom, bathroom, price, sqft): ## constructor
        self.bedroom = bedroom      ## instance variable
        self.bathroom = bathroom    ## instance variable
        self.price = price          ## instance variable
        self.sqft = sqft            ## instance variable
    def price_per_sqft(self): ## method
        price_per_sqft = self.price / self.sqft
        print("This house costs " + str(price_per_sqft) + " dollars per square foot.")

In [36]:
# The values assigned to instance variables vary from object to object
house1.bathroom == house2.bathroom

False
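
Because each object stores its own instance variables, changing one house does not affect the other. A self-contained sketch; the new price of 700000 is purely hypothetical.

```python
class House:
    def __init__(self, bedroom, bathroom, price, sqft):  # constructor
        self.bedroom = bedroom      # instance variable
        self.bathroom = bathroom    # instance variable
        self.price = price          # instance variable
        self.sqft = sqft            # instance variable


house1 = House(3, 1.5, 660000, 1500)
house2 = House(4, 2, 800000, 2000)

# Hypothetical change: only house1's price is updated.
house1.price = 700000

print(house1.price)  # 700000
print(house2.price)  # 800000: house2 keeps its own value
```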

**Class variables** are variables defined on the class itself rather than on each object, so all instances of the class share the same value. Suppose the houses you are interested in all raise their prices by 5%. Now you want to add a new method to the class `House` that calculates the new house prices. 

In [51]:
# Write a new method in the class House to calculate new house prices
class House:
    perc_raise = 0.05   ## class variable
    def __init__(self, bedroom, bathroom, price, sqft): ## constructor
        self.bedroom = bedroom      ## instance variable
        self.bathroom = bathroom    ## instance variable
        self.price = price          ## instance variable
        self.sqft = sqft            ## instance variable
    def price_per_sqft(self): ## method
        price_per_sqft = self.price / self.sqft
        print("This house costs " + str(price_per_sqft) + " dollars per square foot.")
    def new_price(self): ## method
        new_price = self.price * (1 + House.perc_raise)
        print("This house now costs " + str(new_price) + " dollars.")

In [52]:
# Create an object house1 using this new class House
house1 = House(3, 1.5, 660000, 1500)

In [53]:
# Get the new price of house1 after the raise
house1.new_price()

This house now costs 693000.0 dollars.


In [56]:
# Create an object house2
house2 = House(4, 2, 800000, 2000)

In [61]:
# Access the class variable using an instance
print(house1.perc_raise)
print(house2.perc_raise)

0.05
0.05


In [62]:
# Access the class variable using the class
print(House.perc_raise)

0.05
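
Since a class variable lives on the class, changing it through the class is seen by every instance that has not overridden it. A self-contained sketch with a trimmed `House`; the 6% figure is hypothetical.

```python
class House:
    perc_raise = 0.05  # class variable, shared by all instances

    def __init__(self, price):
        self.price = price  # instance variable


house1 = House(660000)
house2 = House(800000)

House.perc_raise = 0.06  # hypothetical: the raise changes to 6% for every house

print(house1.perc_raise, house2.perc_raise)  # 0.06 0.06
```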


In [63]:
# Reading a variable through an object searches the object's namespace first,
# then the class's namespace if the name is not found there.
# Assigning through an object, however, creates (or updates) a variable on that object only.
house1.perc_raise = 0.08

In [64]:
# Check the value of perc_raise for house1
house1.perc_raise

0.08

In [65]:
# Check the value of perc_raise for house2
house2.perc_raise

0.05

In [66]:
# Check the value of perc_raise for the class House
House.perc_raise

0.05
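
A consequence worth noticing: `new_price` in the class above reads `House.perc_raise`, so the 0.08 set on `house1` is ignored when computing its new price. If per-house overrides should count, the method can read `self.perc_raise` instead, which follows the object-then-class lookup. A self-contained sketch comparing the two; the method names `new_price_class` and `new_price_self` are hypothetical, and they return the value rather than print it to keep the comparison simple.

```python
class House:
    perc_raise = 0.05  # class variable

    def __init__(self, price):
        self.price = price  # instance variable

    def new_price_class(self):
        # Always uses the class-level rate, ignoring any per-object override
        return self.price * (1 + House.perc_raise)

    def new_price_self(self):
        # Looks on the object first, then falls back to the class
        return self.price * (1 + self.perc_raise)


house1 = House(660000)
house1.perc_raise = 0.08  # per-object override, as in the cells above

print(house1.new_price_class())  # computed with 0.05
print(house1.new_price_self())   # computed with 0.08

del house1.perc_raise            # remove the override...
print(house1.perc_raise)         # 0.05: ...and the class variable shows through again
```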

<h3 style="color:red; display:inline">Coding Challenge! &lt; / &gt; </h3>

Create a class called `Employee`. In this class, create the instance variables `first_name`, `last_name`, `salary` and `email`. Also, create a method that prints out the full name of an instance of this class. Then, create two instances of this class. 

The company has just announced a pay raise: everyone will get 5% more. Add a class variable `pay_raise` to `Employee`, and then calculate the new salary of each of the two instances you just created.  

# Using OOP in text analysis
We have answered the question of how to write classes and create objects in Python. Now we need to answer the question of why. Why do we need OOP? What are the benefits of using it? In the following, you will find the answers to these questions by working through a mock text analysis project.  

It is often the case that we want to reuse functions written for one analysis in another. For example, in the preprocessing stage of a text analysis project, you may write a function that cleans the text (converts it to lowercase and removes stopwords). A few months later, when you work on another project, you want to preprocess text in exactly the same way. In this case, you want to reuse your text-cleaning function rather than rewrite it.
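
As a rough sketch of what such a reusable piece could look like, the class below bundles the cleaning settings (the stopword list) with the cleaning method. The class name `TextCleaner`, its parameters and the small stopword lists are all hypothetical, and tokenization is simplified to a regular expression. In a real project you might instead pass in NLTK's English list, `nltk.corpus.stopwords.words('english')`, which requires downloading the stopwords corpus first.

```python
import re


class TextCleaner:
    """Hypothetical reusable cleaner: lowercases text and drops stopwords."""

    def __init__(self, stopwords):
        # Store the stopwords as a lowercase set for fast membership checks
        self.stopwords = set(word.lower() for word in stopwords)

    def clean(self, text):
        """Return the lowercase word tokens of text, with stopwords removed."""
        tokens = re.findall(r"[a-z']+", text.lower())
        return [token for token in tokens if token not in self.stopwords]


# Project A: configure a cleaner once and use it.
cleaner = TextCleaner(stopwords=["the", "a", "of", "and", "is"])
print(cleaner.clean("The Origin of Species is a classic of biology."))
# ['origin', 'species', 'classic', 'biology']

# Project B, months later: reuse the same class, here with a different stopword list.
second_cleaner = TextCleaner(stopwords=["a", "of"])
print(second_cleaner.clean("A tale of two cities"))
# ['tale', 'two', 'cities']
```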

What makes OOP particularly attractive is reusability: a class bundles data together with the functions that operate on it, so you can carry it from one analysis to the next and reuse it as a whole. 

