<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Objectives" data-toc-modified-id="Objectives-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Objectives</a></span></li><li><span><a href="#Why-a-Data-Scientist-Should-Learn-about-OOP" data-toc-modified-id="Why-a-Data-Scientist-Should-Learn-about-OOP-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Why a Data Scientist Should Learn about OOP</a></span></li><li><span><a href="#&quot;Everything-in-Python-is-an-object&quot;" data-toc-modified-id="&quot;Everything-in-Python-is-an-object&quot;-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>"Everything in Python is an object"</a></span><ul class="toc-item"><li><span><a href="#Side-Note-about-Variables" data-toc-modified-id="Side-Note-about-Variables-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Side Note about Variables</a></span></li></ul></li><li><span><a href="#Define-Attributes,-Methods,-and-Dot-Notation" data-toc-modified-id="Define-Attributes,-Methods,-and-Dot-Notation-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Define Attributes, Methods, and Dot Notation</a></span><ul class="toc-item"><li><span><a href="#Exercise" data-toc-modified-id="Exercise-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li><li><span><a href="#Describe-the-Relationship-of-Classes-to-Objects,-and-Learn-to-Code-Classes" data-toc-modified-id="Describe-the-Relationship-of-Classes-to-Objects,-and-Learn-to-Code-Classes-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Describe the Relationship of Classes to Objects, and Learn to Code Classes</a></span><ul class="toc-item"><li><span><a href="#Classes" data-toc-modified-id="Classes-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Classes</a></span></li><li><span><a href="#Methods" data-toc-modified-id="Methods-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Methods</a></span></li></ul></li></ul></div>

![fvo](https://cdn.educba.com/academy/wp-content/uploads/2018/07/Functional-Programming-vs-OOP-1.png)

In [1]:
import pandas as pd
import inspect

# Objectives

- Explain the meaning and relevance of object orientation
- Explain the idea that "everything in Python is an object"
- Define the notions of attribute, method, and dot notation
- Describe the relationship of classes to objects, and learn to code classes
- Explain the notion of inheritance

# Why a Data Scientist Should Learn about OOP

  - By becoming familiar with the principles of OOP, you will increase your knowledge of what's possible.  Much of what you might think you need to code by hand is already built into the objects.
  - With a knowledge of classes and how objects store information, you will develop a better sense of when the learning in machine learning occurs in the code, and after that learning occurs, how to access the information gained.
  - You will become comfortable reading other people's code, which will improve your own code.
  - You will develop knowledge of the OOP family of programming languages, the strengths and weaknesses of Python, and the strengths and weaknesses of other language families.

Let's begin by taking a look at the source code for `sklearn`'s [StandardScaler](https://github.com/scikit-learn/scikit-learn/blob/fd237278e/sklearn/preprocessing/_data.py#L517).

Take a minute to peruse the source code on your own. What do you notice?

# "Everything in Python is an object"

Python is an object-oriented programming language. You'll hear people say that "everything is an object" in Python. What does this mean?

Go back to the idea of a function for a moment. A function is a kind of abstraction whereby an algorithm is made repeatable. So instead of coding:

In [2]:
print(3**2 + 10)
print(4**2 + 10)
print(5**2 + 10)

19
26
35


or even:

In [3]:
for x in range(3, 6):
    print(x**2 + 10)

19
26
35


I can write:

In [4]:
def square_and_add_ten(x):
    return x**2 + 10

Now imagine a further abstraction: Before, creating a function was about making a certain algorithm available to different inputs. Now I want to make that function available to different **objects**.

Even Python integers are objects. Consider:

In [5]:
x = 3

We can see what type of object a variable refers to with the built-in `type()` function:

In [6]:
type(x)

int

By assigning an integer to `x`, I give `x` access to every method defined on the integer class, `int`.

In [7]:
x.bit_length()

2

In [8]:
y = 4
y.bit_length()

3

In [9]:
x.__float__()

3.0
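Integers are not a special case. As a small sketch of the same idea, the function defined earlier is itself an object: it has a type, carries attributes such as `__name__`, and can be bound to a second name (`same_function` here is just an illustrative name).

```python
def square_and_add_ten(x):
    return x**2 + 10

# A function is an object with a type and attributes of its own.
func_type = type(square_and_add_ten).__name__
func_name = square_and_add_ten.__name__

# Because it is an object, it can be bound to another name and passed to map().
same_function = square_and_add_ten
results = list(map(same_function, range(3, 6)))

print(func_type, func_name, results)
```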

Python is dynamically typed, meaning you don't have to instruct it as to what type of object your variable is.  
A variable is a pointer to where an object is stored in memory.
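A minimal sketch of what dynamic typing looks like in practice: the same name can be rebound to objects of different types, and `type()` reports the type of whatever object the name currently points to. The name `value` is only for illustration.

```python
value = 3
first_type = type(value).__name__

value = 'three'            # same name, now bound to a str object
second_type = type(value).__name__

value = [3]                # and now bound to a list object
third_type = type(value).__name__

print(first_type, second_type, third_type)
```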

## Side Note about Variables

In [10]:
id(x)

4494154144

In [11]:
hex(id(x))

'0x10bdf59a0'

In [12]:
y = 3

In [13]:
hex(id(y))

'0x10bdf59a0'

In [14]:
x is y

True
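The `True` above comes from an implementation detail: CPython keeps a single shared object for each small integer, so `x` and `y` end up pointing to the same place. That behavior should not be relied on when comparing values. The sketch below, using lists with illustrative names, shows the general distinction: `==` compares contents, while `is` asks whether two names point to the very same object.

```python
first = [1, 2, 3]
second = [1, 2, 3]     # a separate list object with equal contents
alias = first          # a second name for the same object

equal_values = first == second     # compares contents
same_object = first is second      # compares identity
alias_is_same = alias is first

print(equal_values, same_object, alias_is_same)
```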

In [15]:
# Two names can point to the same list, so a change made through one shows up in the other

x_list = [1,2,3,4]
y_list = x_list

x_list.pop()
print(x_list)
print(y_list)

[1, 2, 3]
[1, 2, 3]


In [16]:
# when you use copy(), you create a shallow copy of the object

z_list = y_list.copy()

In [17]:
id(z_list)

140188800184448

In [18]:
id(y_list)

140188800231168

In [19]:
y_list.pop()
print(y_list)
print(z_list)

[1, 2]
[1, 2, 3]


In [20]:
a_list = [[1,2,3], [4,5,6]]
b_list = a_list.copy()
a_list[0][0] ='z'
b_list

[['z', 2, 3], [4, 5, 6]]

In [21]:
import copy

# deepcopy is needed when an object contains other mutable objects (here, nested lists)

a_list = [[1,2,3], [4,5,6]]
b_list = copy.deepcopy(a_list)
a_list[0][0] ='z'
b_list

[[1, 2, 3], [4, 5, 6]]

For more details on this general feature of Python, see [here](https://jakevdp.github.io/WhirlwindTourOfPython/03-semantics-variables.html).
For more on shallow and deep copying, go [here](https://docs.python.org/3/library/copy.html#copy.deepcopy).
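To make the difference concrete, here is a short sketch that checks identity directly. It assumes the same nested-list setup as the cells above. The variable names are illustrative.

```python
import copy

original = [[1, 2, 3], [4, 5, 6]]
shallow = original.copy()
deep = copy.deepcopy(original)

# A shallow copy is a new outer list whose elements are the same inner lists.
outer_is_new = shallow is not original
inner_is_shared = shallow[0] is original[0]

# A deep copy duplicates the inner lists as well.
deep_inner_is_new = deep[0] is not original[0]

# Mutating an inner list is visible through the shallow copy only.
original[0][0] = 'z'
print(shallow[0], deep[0])
```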

# Define Attributes, Methods, and Dot Notation

Dot notation is used to access both attributes and methods.

Take for example our familiar friend, the [`Pandas` DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).

In [22]:
# DataFrames are another type of object.

df = pd.DataFrame({'price': [50, 40, 30],'sqft': [1000, 950, 500]})

In [23]:
df

|   | price | sqft |
|---|-------|------|
| 0 | 50    | 1000 |
| 1 | 40    | 950  |
| 2 | 30    | 500  |


In [24]:
type(df)

pandas.core.frame.DataFrame

Instance attributes are associated with each unique object.
They describe characteristics of the object, and are accessed with dot notation like so:

In [25]:
df.shape

(3, 2)

What are some other DataFrame attributes we know?

In [26]:
# Other df attributes



A **method** is a function attached to an object:

In [27]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   price   3 non-null      int64
 1   sqft    3 non-null      int64
dtypes: int64(2)
memory usage: 176.0 bytes


In [28]:
type(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   price   3 non-null      int64
 1   sqft    3 non-null      int64
dtypes: int64(2)
memory usage: 176.0 bytes


NoneType
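The cell above shows that calling `df.info()` prints its report and returns `None`. The difference between an attribute like `df.shape` and a method like `df.info` is that a method has to be called with parentheses. As a hedged sketch that runs without pandas, the same distinction can be checked on a built-in complex number using `callable()`.

```python
z = 3 + 4j

real_part = z.real               # attribute: a stored value, no parentheses
conjugate = z.conjugate()        # method: called with parentheses

# callable() reports whether something can be called like a function.
real_is_callable = callable(z.real)
conjugate_is_callable = callable(z.conjugate)

print(real_part, conjugate, real_is_callable, conjugate_is_callable)
```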

In [29]:
# isna() is a method that comes along with the DataFrame object

df.isna()

|   | price | sqft  |
|---|-------|-------|
| 0 | False | False |
| 1 | False | False |
| 2 | False | False |


What other DataFrame methods do we know?

In [30]:
# Other df methods



## Exercise

Let's practice accessing the methods associated with the built-in `str` class.  
You are given a string below: 

In [31]:
example = '   hELL0, w0RLD?   '

Your task is to fix it so it reads `Hello, World!` using string methods. To practice chaining methods, try to do it in one line.

Use the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods), and use the `inspect` module to see the names of the available methods.

We can chain methods together because the **result of applying a method to an object is another object**.
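A small sketch of why chaining works, using a different string from the exercise so as not to give the answer away. Each call returns a new object, and the next method is looked up on that result. The variable names are illustrative.

```python
raw = '  data SCIENCE  '

# Step by step: each method returns a new str object.
step_one = raw.strip()
step_two = step_one.title()

# The same thing as a chain.
chained = raw.strip().title()

# The chain follows the type of each result: split() returns a list,
# so any method after it would have to be a list method.
words = raw.strip().title().split()

print(chained, words)
```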

In [32]:
inspect.getmembers(example)

[('__add__', <method-wrapper '__add__' of str object at 0x7f803fa82e40>),
 ('__class__', str),
 ('__contains__',
  <method-wrapper '__contains__' of str object at 0x7f803fa82e40>),
 ('__delattr__',
  <method-wrapper '__delattr__' of str object at 0x7f803fa82e40>),
 ('__dir__', <function str.__dir__()>),
 ('__doc__',
  "str(object='') -> str\nstr(bytes_or_buffer[, encoding[, errors]]) -> str\n\nCreate a new string object from the given object. If encoding or\nerrors is specified, then the object must expose a data buffer\nthat will be decoded using the given encoding and error handler.\nOtherwise, returns the result of object.__str__() (if defined)\nor repr(object).\nencoding defaults to sys.getdefaultencoding().\nerrors defaults to 'strict'."),
 ('__eq__', <method-wrapper '__eq__' of str object at 0x7f803fa82e40>),
 ('__format__', <function str.__format__(format_spec, /)>),
 ('__ge__', <method-wrapper '__ge__' of str object at 0x7f803fa82e40>),
 ...

In [33]:
# we can also use the built-in dir() function

dir(example)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',


<details>
    <summary>
        Answer here
    </summary>
<code>example.swapcase().replace('0', 'o').strip().replace('?', '!')</code>
    </details>
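
When exploring an object for an exercise like this one, the double-underscore names returned by `dir()` can get in the way. One possible approach, sketched below with illustrative variable names, is to filter them out and keep only the public, callable names.

```python
example = '   hELL0, w0RLD?   '

# Keep names that do not start with an underscore.
public_names = [name for name in dir(example) if not name.startswith('_')]

# Of those, keep the ones that are callable, i.e. the public methods.
public_methods = [name for name in public_names
                  if callable(getattr(example, name))]

print(public_methods)
```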

# Describe the Relationship of Classes to Objects, and Learn to Code Classes

Each object is an instance of a **class**, which defines a bundle of attributes and functions. Functions that belong to a class are called *methods*. The point is that **every object of that class will automatically have those attributes and methods**.

A class is like a blueprint that describes how to create a specific type of object.
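
The blueprint relationship can be checked on built-in types. In this sketch, `type()` returns the class an object was built from, `isinstance()` tests membership in a class, and calling the class itself builds a new instance. The variable names are illustrative.

```python
number = 3
text = 'hello'

# type() returns the class (the blueprint) of an object.
number_class = type(number)
text_class = type(text)

# isinstance() asks whether an object was built from a given class.
number_is_int = isinstance(number, int)
text_is_int = isinstance(text, int)

# Calling the class creates a new instance of it.
built = int('42')

print(number_class, text_class, number_is_int, text_is_int, built)
```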

![blueprint](img/blueprint.jpeg)

## Classes

We can define **new** classes of objects altogether by using the keyword `class`:

In [34]:
class Car:
    """Automotive object"""
    pass # This is called a stub.

In [35]:
# Instantiate a car object

ferrari = Car()
type(ferrari)

__main__.Car

In [36]:
# We can give the Ferrari four wheels

ferrari.wheels = 4
ferrari.wheels

4

But wouldn't it be nice not to have to do that every time? We'll just include the 4-wheels specification in the blueprint!

In [37]:
class Car:
    """Automotive object"""
    
    wheels = 4                      # This is an attribute of *every* car.

In [38]:
civic = Car()
civic.wheels

4

In [39]:
#  Then we can add more attributes
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

In [40]:
ferrari = Car()
ferrari.doors

4

In [41]:
ferrari.wheels

4

In [42]:
# Does your Ferrari have only 2 doors? 
# Assigning to the attribute on the instance overrides the class value for this car only.

ferrari.doors = 2
ferrari.doors

2
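
A brief sketch of what that assignment does, assuming the same `Car` class as above. Setting `ferrari.doors` stores a value on that one instance and leaves the class, and every other car, untouched. `vars()` shows the attributes stored on an individual instance.

```python
class Car:
    """Automotive object"""

    wheels = 4
    doors = 4

ferrari = Car()
civic = Car()

ferrari.doors = 2                     # stored on the ferrari instance only

ferrari_own = vars(ferrari)           # attributes stored on this instance
civic_own = vars(civic)               # empty: civic still uses the class values

print(ferrari.doors, civic.doors, Car.doors)
```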

## Methods

We can also write functions that are associated with each class.  
As said above, a function associated with a class is called a method.

In [43]:
# Then we can add a method
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')

In [44]:
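# Create two separate Car objects (ferrari = civic = Car() would make both names point to one object)
ferrari = Car()
civic = Car()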
ferrari.honk()
civic.honk()

Beep beep
Beep beep


In [45]:
type(ferrari.wheels)

int

In [46]:
type(ferrari.honk())

Beep beep


NoneType

Wait a second, what's that `self` doing? <br/> Every method should include `self` as its first parameter, **which refers to the individual object, i.e. to the instance of the class**.
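
A minimal sketch of what `self` does. Calling `ferrari.honk()` is equivalent to calling `Car.honk(ferrari)`: Python passes the instance in as the first argument. The `describe` method is a hypothetical addition for illustration, and both methods return strings here rather than printing, so the results are easy to compare.

```python
class Car:
    """Automotive object"""

    wheels = 4
    doors = 4

    def honk(self):
        return 'Beep beep'

    def describe(self):               # hypothetical method, for illustration only
        # self gives the method access to the attributes of the calling instance.
        return f'A car with {self.wheels} wheels and {self.doors} doors'

ferrari = Car()
ferrari.doors = 2
civic = Car()

via_instance = ferrari.honk()         # Python supplies ferrari as self
via_class = Car.honk(ferrari)         # the same call, with self passed explicitly

print(ferrari.describe())
print(civic.describe())
```

Because `self` is whichever object the method was called on, `describe` reports 2 doors for `ferrari` and 4 for `civic`.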