# Object-oriented programming (OOP)


**Description:** This notebook describes

* What object-oriented programming (OOP) is
* What a class is
* What an object is
* What attributes, methods and constructors are
* How to write a class
* How to create objects using a class 
* How to use OOP in a simple text analysis task

**Use Case:** For learners (detailed explanation)

**Difficulty:** Intermediate

**Completion Time:** 90 minutes

**Knowledge Required:**

* Python Basics Series ([Start Python Basics 1](./python-basics-1.ipynb))
    
**Knowledge Recommended:** None

**Data Format:** None

**Libraries Used:**
  * NLTK

**Research Pipeline:** None


# What is object-oriented programming (OOP)?

OOP is a programming paradigm that relies on the concept of **classes** and **objects**.

To make the concepts of **classes** and **objects** more concrete, let's use a simple example. 
<!-- OOP is used to structure software program into simple, reusable pieces of code blueprints called **classes**, which are used to create individual instances of **classes** called **objects**. Python, the programming language we are using in the Constellate tutorials, is an object-oriented programming language. Essentiallly, everyting in Python is an object of a certain class, be it a number, a string, or a list.  -->

Suppose you are a high school teacher and you want to keep a record of student performance in your class. Specifically, there are two main things you want to record: first, each student's name and their grades in English, Math and History; second, the student's average score across the three subjects.

 |                |      |  
 |----------------|:-----| 
 |**name**        | Abby | 
 |**English**     |90    | 
 |**Math**        |95    | 
 |**History**     |88    | 
 |**average_score**|def average_score ( ):<br>&emsp;average = ( 90 + 95 + 88 ) / 3<br>&emsp;print ( "Abby's average score is " + str ( average ) + "." )|
    
 |                |      |  
 |----------------|:-----| 
 |**name**        | Billy| 
 |**English**     |85    | 
 |**Math**        |90    | 
 |**History**     |86    | 
 |**average_score**|def average_score ( ):<br>&emsp;average = ( 85 + 90 + 86 ) / 3<br>&emsp;print ( "Billy's average score is " + str ( average ) + "." )|

You will create such a record for each student in the class. Notice that a record holds two kinds of information: first, the **properties/attributes** of the student (in this case, name, English score, Math score and History score); second, the **functions** that operate on that student (in this case, a function that prints out the average score).
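Without objects, the two records above might be written as loose variables plus one function per student. The following is a minimal sketch of that approach; the variable and function names (`abby_name`, `abby_average_score`, and so on) are made up for illustration.

```python
# Abby's record: four separate variables and one function
abby_name = "Abby"
abby_eng = 90
abby_math = 95
abby_hist = 88

def abby_average_score():
    average = (abby_eng + abby_math + abby_hist) / 3
    print(abby_name + "'s average score is " + str(average) + ".")

# Billy's record repeats the same pattern with different values
billy_name = "Billy"
billy_eng = 85
billy_math = 90
billy_hist = 86

def billy_average_score():
    average = (billy_eng + billy_math + billy_hist) / 3
    print(billy_name + "'s average score is " + str(average) + ".")

abby_average_score()
billy_average_score()
```

Every new student needs another four variables and another nearly identical function, and nothing ties a student's variables to the function that uses them.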

Now, a natural question to ask is: is there a good way to organize such a collection of information? Yes! This is where **objects** come in. 

## What is an object?

An **object** is a collection of properties/attributes and functions. With such a collection of information, an **object** can be used to represent almost anything: a person, a dog, a school, and so on.

Coming back to our example, we are using the **object** below to represent the student Abby. Of course, depending on what properties/attributes and functions you want to include, you may represent Abby with a different set of information stored in the **object**. In our scenario, we choose to use name, English score, Math score, History score and a function that calculates and prints out the average score of the three subjects to represent Abby. Let's assign this object to the variable name ```Abby```.

<img src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/usingOOP_Abby.png" width="500" height="300" />

We have created another **object** representing Billy. Let's assign the object to the variable name ```Billy```.

<img src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/usingOOP_Billy.png" width="500" height ="300" />


A note on terminology: the variables in an object are called **attributes**, and the functions in an object are called **methods**.
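Built-in Python objects follow the same pattern, which may make the two terms easier to tell apart. As a small illustration outside the student example, a date from the standard `datetime` module has the attributes `year`, `month` and `day`, and a method `isoformat()`:

```python
from datetime import date

exam_day = date(2024, 1, 15)   # an object of the class `date`

# Attributes: values stored in the object, accessed without parentheses
print(exam_day.year)           # 2024
print(exam_day.month)          # 1
print(exam_day.day)            # 15

# Method: a function attached to the object, called with parentheses
print(exam_day.isoformat())    # 2024-01-15
```

The variable name `exam_day` and the date itself are arbitrary choices for this sketch.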

If you take a look at the two objects we have created to represent Abby and Billy respectively, you can easily see that they are quite similar. They have the same set of **attributes**, i.e. name, English score, Math score, History score, and they have the same set of **methods**, i.e. a function that calculates and prints out the average score. In other words, the two objects are written based on the same recipe. 

Now, if the objects are created from the same recipe, it would be ideal to write that recipe down once and then use it to produce as many objects of the same kind as we need. In our scenario, for example, we want as many objects as there are students in the class! The question now becomes: how do we write that recipe? This is exactly where **classes** come in.

## What is a class?

A **class** is an abstract blueprint from which we create individual **objects**. Each object created from a class is called an instance of that class.

|                |      |  
 |----------------|:-----| 
 |**name**        | | 
 |**eng**     |    | 
 |**math**        |    | 
 |**hist**     |    | 
 |**average_score**|def average_score ( ):<br>&emsp;average = ( eng + math + hist ) / 3<br>&emsp;print ( name + "'s average score is " + str ( average ) + "." )|
 
Notice that the values of the four variables (name, eng, math, hist) are not specified in this class. This is because a **class** does not refer to any specific **object** that has these values already set. A **class** refers to a broad category of **objects**. Here, our **class** refers to the category of students. The **class** specifies what attributes the students have and what functions operate on them.

## A code example

Now, let's write some code to create our student **class**!

In [None]:
class Student:
    def __init__(self, stu_name, eng_score, math_score, hist_score):  # constructor
        self.name = stu_name
        self.eng = eng_score
        self.math = math_score
        self.hist = hist_score

    def average_score(self):  # method
        average = (self.eng + self.math + self.hist) / 3
        print(self.name + "'s average score is " + str(average) + ".")

In [None]:
abby = Student("Abby", 90, 95, 88)

In [None]:
abby.average_score()

In [None]:
billy = Student("Billy", 85, 90, 86)

In [None]:
billy.name

In [None]:
billy.eng

In [None]:
billy.average_score()
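Because every student is built from the same class, one object per student can be created in a loop. The sketch below repeats the `Student` class so that it runs on its own; the `records` list and the `students` variable are illustrative names, not part of the cells above.

```python
class Student:
    def __init__(self, stu_name, eng_score, math_score, hist_score):
        self.name = stu_name
        self.eng = eng_score
        self.math = math_score
        self.hist = hist_score

    def average_score(self):
        average = (self.eng + self.math + self.hist) / 3
        print(self.name + "'s average score is " + str(average) + ".")

# One tuple per student: (name, English, Math, History)
records = [("Abby", 90, 95, 88), ("Billy", 85, 90, 86)]

# Create one Student object per record
students = []
for name, eng, math, hist in records:
    students.append(Student(name, eng, math, hist))

# Call the same method on every object
for student in students:
    student.average_score()

# Each object keeps its own attribute values:
# changing Billy's English score does not affect Abby's
students[1].eng = 87
print(students[0].eng)  # 90
print(students[1].eng)  # 87
```

Adding a third student only requires one more tuple in `records`; the class itself does not change.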

# Using OOP in text analysis

It is often the case that we want to reuse functions written for one analysis in another. For example, you may write a function that cleans text (returning lowercase text with stopwords removed) in the preprocessing stage of a text analysis project. A few months later, when you work on another project, you want to preprocess text in the same way. In this case, you can reuse the same text-cleaning method instead of rewriting it.

What makes OOP particularly attractive is this reusability: classes, and the methods defined in them, can be reused across different analyses.
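As a minimal sketch of this idea, the cleaning step could be wrapped in a class so that the same code can be reused across projects. The class name `TextCleaner`, its `clean` method, and the tiny stopword list are all hypothetical choices made for illustration; in practice the stopword list might come from NLTK's `stopwords` corpus, which needs a separate download.

```python
class TextCleaner:
    def __init__(self, stopwords):
        # Store the stopwords as a lowercase set for fast lookup
        self.stopwords = {word.lower() for word in stopwords}

    def clean(self, text):
        # Lowercase the text and split it on whitespace
        tokens = text.lower().split()
        # Keep only the tokens that are not stopwords
        kept = [token for token in tokens if token not in self.stopwords]
        return " ".join(kept)

# A very small, made-up stopword list for demonstration only
cleaner = TextCleaner(["the", "a", "of", "and", "is"])

print(cleaner.clean("The History of Math is a long story"))
# history math long story
print(cleaner.clean("Abby and Billy"))
# abby billy
```

A later project can create its own `TextCleaner` object, possibly with a different stopword list, and call `clean` without rewriting the cleaning logic. Splitting on whitespace is a simplification here; punctuation attached to a word stays attached.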

