<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Introduction to Object-Oriented Programming
_Author:_ Tim Book

# Programming Paradigms

There are several **programming paradigms** (i.e., programming patterns or styles) out there. Most languages support more than one of them, but many languages choose to specialize in one. In modern programming, two paradigms seem to dominate:

## Functional Programming
The FP paradigm involves completing all of your tasks by writing **functions** and combining them in clever ways. This sounds simple, but it is not. A hallmark of FP is frequent use of functions similar to `map` and `apply`, as well as anonymous functions (`lambda`s).
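
As a small taste of that style in Python, here is a minimal sketch that uses `map` with a `lambda`. The `numbers` list is made up for illustration.

```python
# Hypothetical data, for illustration only.
numbers = [1, 2, 3, 4]

# FP style: hand an anonymous function (a lambda) to map
# instead of writing an explicit loop.
squares = list(map(lambda x: x ** 2, numbers))

print(squares)  # [1, 4, 9, 16]
```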

**Examples of some common languages that specialize in FP:**
* R
* Scala
* Haskell
* The Lisp family (Clojure, Lisp, Scheme, etc.)

| Pros of FP | Cons of FP |
| --- | --- |
| Code is easy to read | Code is hard to write |
| Code tends to be much shorter | Steep learning curve |
| It has very strong foundations in theoretical math | It has very strong foundations in theoretical math |

## Object-Oriented Programming
The OOP paradigm involves **creating your own data types**. These data types can **maintain their own state** and **have their own methods and functions associated with them**.

**Examples of some common languages that specialize in OOP:**
* Python
* Ruby
* Scala
* Java

| Pros of OOP | Cons of OOP |
| --- | --- |
| Code is easy to write | Coding style can be awkward to get used to |
| Shallow learning curve | Code is typically much longer |

## Python is Object-Oriented!
![](imgs/gvr.jpg)

While Python _does_ support a lot of FP tools, Python is designed to work best with an OOP design. In fact, even the parts of Python that are designed to allow you to work in FP are built on top of OOP. (In most languages, it's the other way around!)
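
One way to see this for yourself: in Python, functions (including `lambda`s) are objects, and `map` is a class. The short check below is only a sketch, and the name `double` is made up for the example.

```python
# A lambda is an ordinary object with a type, like everything else in Python.
double = lambda x: x * 2
print(type(double))
print(isinstance(double, object))

# map is itself a class: calling it builds a map object.
print(isinstance(map, type))
doubled = map(double, [1, 2, 3])
print(type(doubled))
print(list(doubled))
```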

## Cool, but: Why now?
You're actually very familiar with **using** OOP tools. Objects like `DataFrame`, `StandardScaler`, and `LinearRegression` have all followed the traditional OOP pattern. If you understand how to manipulate those objects, you know OOP!

But, we don't know how to **make our own objects** yet. That's what we're going to explore today.

![](imgs/ds-def.png)

In data science, we don't make our own classes very often. But it's absolutely imperative for data scientists to be comfortable with the idea, and to recognize when making a class is a good idea. **If data science is a cross between statistics and computer science, this lesson falls more on the computer science side.** After today's lesson, a lot of the magic surrounding what we've been doing up until now should "click".

## OOP Vocab

**Covered in this lesson:**
* Class
* Instance
* Attribute
* Method
* Constructor method
* State
* "self"

**Not covered in this lesson (but some covered in supplemental material):**
* Inheritance
* Encapsulation
* Magic methods (aka "dunder methods")
* Class method
* Static method
* Public and private methods
* Getter and setter methods

## Part I: The Dog Class

In [2]:
# Instantiate a Dog named Chloe

In [32]:
# A new type of thing!

__main__.Dog

In [3]:
# This instance of Dog has attributes

'Chloe'

True

In [5]:
# Call a method on this instance of Dog

Bark bark, I'm Chloe the pug!


In [6]:
# Another method. This one changes the state of the Dog

Chloe eats...


In [7]:
# State has changed!

False

In [8]:
# Again. The state of Chloe has changed!

Chloe is not hungry!


In [9]:
# If I make a different Dog, it doesn't share state with Chloe

True

In [10]:
# We can also make a Cat class, but it's a totally separate concept from Dog.

In [12]:
# Cat doesn't magically get Dog class's methods.
# Keeping methods specific to only the classes that can use them
# is called "encapsulation" - a core tenet of OOP.
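
The cells in this part are meant to be filled in live. For reference, here is one possible `Dog` class that would produce the outputs shown. It is only a sketch: the attribute names (`name`, `breed`, `hungry`), the method names (`bark`, `eat`), and the second dog are assumptions, not the lesson's actual code.

```python
class Dog:
    # __init__ is the constructor method: it runs when a Dog is instantiated.
    def __init__(self, name, breed):
        # Attributes hold the state of this particular instance.
        self.name = name
        self.breed = breed
        self.hungry = True  # assumed: every new Dog starts out hungry

    def bark(self):
        # "self" refers to the instance the method was called on.
        print(f"Bark bark, I'm {self.name} the {self.breed}!")

    def eat(self):
        # This method changes the state of the instance.
        if self.hungry:
            print(f"{self.name} eats...")
            self.hungry = False
        else:
            print(f"{self.name} is not hungry!")


chloe = Dog("Chloe", "pug")   # instantiate a Dog named Chloe
print(type(chloe))            # a new type of thing
print(chloe.name, chloe.hungry)
chloe.bark()
chloe.eat()                   # first call: she eats, and her state changes
print(chloe.hungry)
chloe.eat()                   # second call: she is no longer hungry

fido = Dog("Fido", "beagle")  # hypothetical second Dog with its own state
print(fido.hungry)
```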

## Part II: The Car Class
Let's create a car with a make and model. This car will have the following features:
* It will keep track of its own miles
* It will keep track of its state as to whether the car is on or off
* If the car is off, it can't drive!
* It will have methods to turn the car on and off.

**(THREAD):** Build a `drive()` method that takes one argument and adds that many miles to the car's odometer.

Beep beep!


0

Car is off!


0

20
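
Here is one way the `Car` class described above might look, as a sketch that matches the outputs shown. The attribute names `miles` and `is_on` and the method name `honk` are assumptions.

```python
class Car:
    def __init__(self, make, model):
        self.make = make
        self.model = model
        self.miles = 0      # odometer starts at zero
        self.is_on = False  # assumed: a new car starts switched off

    def honk(self):
        print("Beep beep!")

    def turn_on(self):
        self.is_on = True

    def turn_off(self):
        self.is_on = False

    def drive(self, miles):
        # If the car is off, it can't drive.
        if not self.is_on:
            print("Car is off!")
            return
        self.miles += miles


mycar = Car("Chrysler", "PT Cruiser")
mycar.honk()
print(mycar.miles)
mycar.drive(20)      # car is off, so the odometer does not move
print(mycar.miles)
mycar.turn_on()
mycar.drive(20)
print(mycar.miles)
```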

### Can you see how this can quickly get complicated?
Cars are more intricate than this.

**An exercise left to the reader:** Can you modify this car class to keep track of its own gas, too? That might involve an `mpg` attribute as well as a `tank_size` attribute. When the car drives a certain number of miles, compute how much gas is consumed. You'll also probably need a `fill_tank()` method to refuel gas.

**Further, much more advanced considerations:** What if the car only has enough gas for 15 miles, but you try to drive 20 miles? Should it drive 15 miles and then stop? Should it throw an error? Should it throw the error before or after deducting the gas? If it throws an error, what kind? Maybe you'll need to create your own `EmptyTankError` exception that inherits from `Exception`.

Sounds hard to make? This is one of the pro/con tradeoffs of OOP: a class can be hard to build, but it's very easy to use. When done right, code looks like this:

```python
mycar = Car("Chrysler", "PT Cruiser", mpg=30, tank_size=11)
mycar.turn_on()
mycar.drive(30)
mycar.turn_off()
mycar.turn_on()
mycar.drive(100)
mycar.fill_tank()
mycar.drive(100)
mycar.turn_off()
```

Simple! Easy to read! Building classes is said to be a **layer of abstraction** for your code for this reason. This syntax is hiding potentially hundreds of lines of code that you don't need to worry about.

**Fun fact:** The file that defines the pandas `DataFrame` is more than 8,000 lines long! You don't need to read those 8,000+ lines to know how to use a `DataFrame`. Check out what this really looks like in the wild [here](https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py).

## Part III: Hiding Your Ugly Code to Keep Your Notebooks Clean
Have you ever wondered how to make your own "importable" things? Let's check out a basic example. Let's open up the `car.py` file in this directory.

In [18]:
from car import AdvancedCar

In [19]:
# This new advanced car can keep track of mpg and tank_size!

In [23]:
# Car still honks

Beep beep!


In [21]:
# Car now has a 'gas' attribute

11

In [26]:
# Let's turn the car on and go!

In [27]:
# Car state has changed. We've added miles...

25

In [28]:
# And used up some gas!

10.0

In [29]:
# What if we try to drive too far?
# A custom error!

InsufficientGasError: You don't have enough gas!

In [30]:
# Fill up

In [31]:
# All the gas

11.0
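
The contents of `car.py` aren't reproduced here, so the following is only a guess at what such a file could contain. The attribute and method names are assumptions, and `mpg=25` is chosen so the numbers line up with the outputs above. Only the two class definitions would belong in `car.py`; the lines below them are usage.

```python
class InsufficientGasError(Exception):
    """Raised when a trip needs more gas than is in the tank."""


class AdvancedCar:
    def __init__(self, make, model, mpg, tank_size):
        self.make = make
        self.model = model
        self.mpg = mpg
        self.tank_size = tank_size
        self.gas = tank_size  # assumed: the car starts with a full tank
        self.miles = 0
        self.is_on = False

    def honk(self):
        print("Beep beep!")

    def turn_on(self):
        self.is_on = True

    def turn_off(self):
        self.is_on = False

    def drive(self, miles):
        if not self.is_on:
            print("Car is off!")
            return
        gas_needed = miles / self.mpg
        # Check before changing any state, so a failed trip leaves the car untouched.
        if gas_needed > self.gas:
            raise InsufficientGasError("You don't have enough gas!")
        self.gas -= gas_needed
        self.miles += miles

    def fill_tank(self):
        self.gas = float(self.tank_size)


mycar = AdvancedCar("Chrysler", "PT Cruiser", mpg=25, tank_size=11)
mycar.turn_on()
mycar.drive(25)
print(mycar.miles, mycar.gas)

try:
    mycar.drive(1000)  # far more than the remaining gas allows
except InsufficientGasError as err:
    print(f"InsufficientGasError: {err}")

mycar.fill_tank()
print(mycar.gas)
```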

### When would we need this?
Luckily for us, between `pandas` and `sklearn`, most of the classes we need have already been built for us. But data scientists don't work in a vacuum! Here are some examples of times where building your own class is the right thing to do:

#### Whenever you want to bundle your code into a package.
It's true that you can define functions that can be `import`ed, but it's not very _Pythonic_. True Pythonistas will build related tools into classes that can be shared amongst coworkers. **If you set this up properly, you can even have them be `pip install`able from either a private or public Git repository!** Think about all of the different libraries we've used so far. You know this pattern to be true!

#### Whenever you want to "build once, run many times later."
Imagine a complicated task, such as connecting to a server and executing code on it. These tasks typically have a lot of rote boilerplate code that you'd want to automate. For example, check out this fantasy code you might write for connecting to a SQL server:

```python
conn = SQLServer("12.34.56.78")
conn.connect()
conn.login("tim", "p@ssw0rd1!")
conn.execute("SELECT name, age FROM users")
conn.close()
```
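
To make the idea concrete, here is a runnable sketch of the same "build once, run many times" pattern. It swaps the fantasy SQL server for Python's built-in `sqlite3` module with an in-memory database, so there is no login step. The class name `SQLiteDatabase`, its methods, and the sample row are all hypothetical.

```python
import sqlite3


class SQLiteDatabase:
    """Hypothetical wrapper that hides sqlite3 boilerplate behind a few methods."""

    def __init__(self, path):
        self.path = path
        self.connection = None  # state: not connected yet

    def connect(self):
        self.connection = sqlite3.connect(self.path)

    def execute(self, query, params=()):
        if self.connection is None:
            raise RuntimeError("Call connect() before execute().")
        cursor = self.connection.execute(query, params)
        rows = cursor.fetchall()
        self.connection.commit()
        return rows

    def close(self):
        if self.connection is not None:
            self.connection.close()
            self.connection = None


db = SQLiteDatabase(":memory:")  # in-memory database, nothing is written to disk
db.connect()
db.execute("CREATE TABLE users (name TEXT, age INTEGER)")
db.execute("INSERT INTO users VALUES (?, ?)", ("tim", 30))  # made-up row
rows = db.execute("SELECT name, age FROM users")
print(rows)
db.close()
```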

#### Unit Testing
Python's built-in `unittest` module has you write your tests as classes, where each method is an individual test.

> **Unit testing** is a type of automated testing you can do to ensure that minor changes you make to your code don't fundamentally change what your code is doing.
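
For a feel of what that looks like, here is a minimal sketch using the standard library's `unittest` module. The function under test, `add_miles`, is made up for this example.

```python
import unittest


def add_miles(odometer, miles):
    """Hypothetical function under test: returns the new odometer reading."""
    return odometer + miles


class TestAddMiles(unittest.TestCase):
    # Each method whose name starts with "test" is run as a separate test.
    def test_adds_miles(self):
        self.assertEqual(add_miles(0, 20), 20)

    def test_zero_miles_changes_nothing(self):
        self.assertEqual(add_miles(15, 0), 15)


# Build and run the suite explicitly so this also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAddMiles)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```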

#### Sometimes you literally just _need_ to.
There are actually a few data science packages that force you to build a class in order to use them properly. Specifically, these two:

![](imgs/scrapy.png)
![](imgs/pytorch.png)

* **PyTorch** - A popular deep learning library. Second only in popularity to TensorFlow/Keras and gaining.
* **Scrapy** - A heavy-duty webscraping library, much more powerful than BeautifulSoup.

## Conclusions and Takeaways
* OOP is a really cool coding paradigm that takes some getting used to.
* OOP is easy to use and write, but code can be pretty long sometimes.
* OOP can serve to really clean your code up and make it easier to read.
* We won't _need_ to build classes very often, but we should definitely do it more!
* Let us all be more OO.