# Object Oriented Programming

<img src="./img/oop.jpeg" width="400" align='center'>


## Objectives 
1. Understand what OOP is
2. Be able to create a Python object including methods and attributes
3. Apply inheritance to a Python object
4. Understand how OOP is used in scikit-learn

## What is object oriented programming?

Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which can contain data, in the form of fields (often known as attributes or properties), and code, in the form of procedures (often known as methods).  Objects get their variables and functions from classes. Classes are essentially a template to create your objects.
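
As a minimal sketch of this idea, the hypothetical `Dog` class below (an illustration only, not used elsewhere in this lesson) bundles data and behavior: `name` and `tricks` are attributes, `add_trick` is a method, and each call to `Dog(...)` uses the class as a template to build a new object.

```python
class Dog:
    """A hypothetical class used only to illustrate attributes and methods."""

    def __init__(self, name):
        # Attributes: data stored on each individual object
        self.name = name
        self.tricks = []

    def add_trick(self, trick):
        # Method: a function attached to the object that can use its data
        self.tricks.append(trick)
        return self.tricks


# The class is the template; fido and rex are two separate objects built from it
fido = Dog('Fido')
rex = Dog('Rex')
fido.add_trick('roll over')

print(fido.name, fido.tricks)
print(rex.name, rex.tricks)
```

Teaching `fido` a trick leaves `rex` untouched, because each object carries its own copy of the instance attributes.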

<img src="./img/att_methods.png" width="500" align='center'>


## What are the other popular programming paradigms?

There are many other programming paradigms. You can read about them at these links:

- [Wiki - Programming Paradigms](https://en.wikipedia.org/wiki/Programming_paradigm)
- [Common Paradigms](https://cs.lmu.edu/~ray/notes/paradigms/)

## Why a data scientist should learn about OOP

![hackerman](https://media.giphy.com/media/MM0Jrc8BHKx3y/giphy.gif)

  - By becoming familiar with the principles of OOP, you will increase your knowledge of what's possible.  Much of what you might think you need to code by hand is already built into the objects.
  - With a knowledge of classes and how objects store information, you will develop a better sense of where in the code the learning in machine learning occurs and, once it has occurred, how to access the information gained.
  - You become comfortable reading other people's code, which will improve your own code.
  - You will develop knowledge of the OOP family of programming languages, the strengths and weaknesses of Python, and the strengths and weaknesses of other language families.



### Turn and Talk:

<img src="img/talking.jpeg" width="60" align='left'>

<br>
<br>
<br>

Let's begin by taking a look at the source code for [sklearn's StandardScaler](https://github.com/scikit-learn/scikit-learn/blob/fd237278e/sklearn/preprocessing/_data.py#L517).

Take a minute to examine the source code.  What are some familiar aspects of this code?  What are some things you have not seen before?

## Pandas Objects

In fact, we have already used objects!  One of our most frequently used objects is a pandas DataFrame!

We can use dot notation to access both attributes and methods related to our class object.

In [1]:
import pandas as pd

df = pd.DataFrame({'price':[50,40,30],'sqft':[1000,950,500]})

In [2]:
type(df)

pandas.core.frame.DataFrame

Instance __attributes__ are associated with each unique object.
They describe characteristics of the object, and are accessed with dot notation like so:

In [26]:
df.shape

(3, 2)

What are some other DataFrame attributes we know?

In [27]:
# answer

In [28]:
#__SOLUTION__
# Other attributes
print(df.columns)
print(df.index)
print(df.dtypes)
print(df.T)

Index(['price', 'sqft'], dtype='object')
RangeIndex(start=0, stop=3, step=1)
price    int64
sqft     int64
dtype: object
          0    1    2
price    50   40   30
sqft   1000  950  500


A **method** is what we call a function attached to an object.

In [29]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   price   3 non-null      int64
 1   sqft    3 non-null      int64
dtypes: int64(2)
memory usage: 176.0 bytes


In [30]:
# isna() is a method that comes along with the DataFrame object
df.isna()

   price   sqft
0  False  False
1  False  False
2  False  False


What other DataFrame methods do we know?

In [31]:
#__SOLUTION__
df.describe()
df.copy()
df.head()
df.tail()

   price  sqft
0     50  1000
1     40   950
2     30   500


### Practice With a Partner:

<img src="img/talking.jpeg" width="60" align='left'>

<br>
<br>
<br>

Let's practice accessing the methods associated with the built in string class.  
You are given a string below: 

In [33]:
example = '   hELL0, w0RLD?   '

Your task is to fix it so it reads `Hello, World!` using string methods.  To practice chaining methods, try to do it in one line.
Use the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods), and use the `inspect` module to see the names of the methods.

In [34]:
import inspect
inspect.getmembers(example)

[('__add__', <method-wrapper '__add__' of str object at 0x11c5b2300>),
 ('__class__', str),
 ('__contains__',
  <method-wrapper '__contains__' of str object at 0x11c5b2300>),
 ('__delattr__', <method-wrapper '__delattr__' of str object at 0x11c5b2300>),
 ('__dir__', <function str.__dir__()>),
 ('__doc__',
  "str(object='') -> str\nstr(bytes_or_buffer[, encoding[, errors]]) -> str\n\nCreate a new string object from the given object. If encoding or\nerrors is specified, then the object must expose a data buffer\nthat will be decoded using the given encoding and error handler.\nOtherwise, returns the result of object.__str__() (if defined)\nor repr(object).\nencoding defaults to sys.getdefaultencoding().\nerrors defaults to 'strict'."),
 ('__eq__', <method-wrapper '__eq__' of str object at 0x11c5b2300>),
 ('__format__', <function str.__format__(format_spec, /)>),
 ('__ge__', <method-wrapper '__ge__' of str object at 0x11c5b2300>),
 ('__getattribute__',
  <method-wrapper '__getattribute__'

In [230]:
# we can also use the built-in dir() function
dir(example)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
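
The full listing mixes double-underscore names with the methods we usually want. As a small sketch (the variable name `public_methods` is just an illustrative choice), the names can be filtered down to the public, callable ones:

```python
example = '   hELL0, w0RLD?   '

# Keep only public names (no leading underscore) that are callable, i.e. methods
public_methods = [
    name for name in dir(example)
    if not name.startswith('_') and callable(getattr(example, name))
]

print(len(public_methods))
print(public_methods[:5])
```

The exact count depends on the Python version, since new string methods are added over time.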


In [183]:
# Your code here

In [180]:
#__SOLUTION__
example.swapcase().replace('0','o').strip().replace('?','!')

'Hello, World!'
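
Chaining works because every string method used here returns a new string, which the next method is then called on. The same solution, unrolled one step at a time (the `step_` variable names are only for illustration):

```python
example = '   hELL0, w0RLD?   '

# Each call returns a new str, so the next method runs on the previous result
step_1 = example.swapcase()         # '   Hell0, W0rld?   '
step_2 = step_1.replace('0', 'o')   # '   Hello, World?   '
step_3 = step_2.strip()             # 'Hello, World?'
step_4 = step_3.replace('?', '!')   # 'Hello, World!'

# Strings are immutable: the original is unchanged
print(repr(example))
print(step_4)
```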

## Building a Class 

Here we will create a class that keeps track of a person's information and the number of parties they have attended.

In [16]:
class PartyCount:
    # guests is an attribute - it contains data
    guests = 0
    
    
    def __init__(self):
        print('A PartyCount object constructed')
        
    # party is a method of this class - it defines a procedure
    def party(self):
        self.guests = self.guests+1
        print("So far", self.guests)

[Naming conventions and terminology](https://en.wikipedia.org/wiki/Camel_case)

In [17]:
## let's create the marisa_party_info object.
marisa_party_info = PartyCount()

A PartyCount object constructed


__Your Turn__

- Make `marisa_party_info.guests` equal to 3 by calling the `party` method.


In [18]:
## your code here

In [19]:
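## __SOLUTION__

# Each call to party() increments guests by one, so call it three times
for _ in range(3):
    marisa_party_info.party()
marisa_party_info.guests

So far 1
So far 2
So far 3
3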

__On 'self' parameter__

Note that the `party` method has only one parameter, namely `self`. When `party` is called, this first parameter (named `self` by convention) points to the particular instance of the PartyCount object that `party` is called on.

In [20]:
## note that calling the method on the instance is the same as calling it
## on the class and passing the object itself as self.

PartyCount.party(marisa_party_info)

So far 4
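
To make the equivalence concrete, here is a minimal sketch using a hypothetical `Counter` class (a stand-in for `PartyCount`, kept small for illustration). It also shows that a method looked up on an instance remembers that instance in its `__self__` attribute.

```python
class Counter:
    """A hypothetical stand-in for PartyCount, kept small for illustration."""

    def __init__(self):
        self.total = 0

    def increment(self):
        self.total += 1
        return self.total


c = Counter()

# These two calls are equivalent: Python fills in `self` for the first form
c.increment()
Counter.increment(c)
print(c.total)

# A bound method remembers the instance it was looked up on
print(c.increment.__self__ is c)
```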


In [22]:
print(type(marisa_party_info.party))

<class 'method'>


__Your Turn__

- Now create a PartyCount object with the variable name your_name_party_info. The solution below also adds a `name` attribute to the class.

In [85]:
## your code here

In [25]:
#__Solution__

class PartyCount:
    # guests is an attribute - it contains data
    guests = 0
    # Add name attribute to the PartyCount object
    # Note that by default name is empty
    name = ''    
    
    def __init__(self, name):
        self.name = name
        print('{}: A PartyCount object constructed'.format(self.name))
        
    # party is a method of this class - it defines a procedure
    def party(self):
        self.guests = self.guests+1
        print("So far", self.guests)

In [27]:
marisa = PartyCount('Marisa')

Marisa: A PartyCount object constructed


In [86]:
## Note that we can access the attributes of an object.
## Also we can change them by assigning new values
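
A minimal sketch of reading and assigning attributes, using a hypothetical `Tally` class that defines `guests` at class level the way `PartyCount` does. One detail worth noticing (a general fact about Python rather than something stated above): assigning through an instance creates an attribute on that instance only, while other instances keep falling back to the class-level value.

```python
class Tally:
    # Class-level attribute: shared default for every instance
    guests = 0


a = Tally()
b = Tally()

# Reading falls back to the class attribute when the instance has none
print(a.guests, b.guests)

# Assigning through an instance creates an attribute on that instance only
a.guests = 3
print(a.guests, b.guests, Tally.guests)

# vars() shows what is stored on each instance itself
print(vars(a), vars(b))
```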

## Inheritance

Another powerful feature of object-oriented programming is the ability to create a new class by extending an existing class. When extending a class, we call the original class the parent class and the new class the child class.

<img src="./img/inheritance.png" width="300" align='center'>


In [71]:
class SuperBowl(PartyCount):
    supporting_team = ''
    winning_team = 'Eagles'

    def __init__(self, supporting_team, name):
        # Set the attributes directly rather than calling PartyCount.__init__
        self.name = name
        self.supporting_team = supporting_team

    def fun_factor(self):
        if self.supporting_team == self.winning_team:
            # guests is inherited from PartyCount
            self.guests += 2
        else:
            self.party()


In [72]:
murat = PartyCount('Murat')

Murat: A PartyCount object constructed


In [73]:
murat_super = SuperBowl('Eagles', 'Murat')

In [74]:
murat_super.fun_factor()

In [76]:
murat_super.guests

2
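
A child class can also delegate part of its work to the parent with `super()`. The sketch below uses hypothetical `Event` and `GameNight` classes (not part of the lesson's code) to show a child constructor that reuses the parent's constructor and a child method that overrides, but still calls, the parent's method.

```python
class Event:
    """Hypothetical parent class for illustration."""

    def __init__(self, name):
        self.name = name
        self.guests = 0

    def party(self):
        self.guests += 1
        return self.guests


class GameNight(Event):
    """Hypothetical child class that extends Event."""

    def __init__(self, name, game):
        # Let the parent set up name and guests, then add the child's own data
        super().__init__(name)
        self.game = game

    def party(self):
        # Override: reuse the parent's behavior, then add one extra guest
        super().party()
        self.guests += 1
        return self.guests


night = GameNight('Murat', 'chess')
night.party()
print(night.name, night.game, night.guests)
print(isinstance(night, Event))
```

An instance of the child class is also an instance of the parent class, which is why `isinstance(night, Event)` is true.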

## OOP in scikit-learn

We are becoming more and more familiar with a series of methods with names such as fit or fit_transform.

After instantiating an instance of a Standard Scaler, Linear Regression model, or One Hot Encoder, we use fit to learn about the dataset and save what is learned. What is learned is saved in the attributes.
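
To see how this fit-then-use pattern can be built from a class, here is a toy sketch. `MiniScaler` is a hypothetical class written for this illustration, not scikit-learn's implementation; it assumes a plain list of numbers and uses the population standard deviation.

```python
import statistics


class MiniScaler:
    """A toy scaler for one list of numbers; not scikit-learn's implementation."""

    def fit(self, values):
        # Learn from the data and store the results as attributes.
        # The trailing underscore mirrors scikit-learn's naming for learned attributes.
        self.mean_ = statistics.fmean(values)
        self.scale_ = statistics.pstdev(values)  # population standard deviation
        return self  # returning self allows chaining, e.g. MiniScaler().fit(x)

    def transform(self, values):
        # Reuse what fit stored; nothing new is learned here
        return [(v - self.mean_) / self.scale_ for v in values]


scaler = MiniScaler().fit([1.0, 2.0, 3.0, 4.0, 5.0])
print(scaler.mean_, scaler.scale_)
print(scaler.transform([3.0, 5.0]))
```

The learning happens once, inside `fit`; afterwards the object can transform any new values using the stored `mean_` and `scale_`.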

### Standard Scaler 

The standard scaler takes a series and, for each element, subtracts the mean of the series and divides by the standard deviation. The sign is kept, so values below the mean get negative scores.

$\Large z = \frac{x - \mu}{s}$

What attributes and methods are available for a Standard Scaler object? Let's check out the code on [GitHub](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/_data.py)!

## Attributes

### `.scale_`

In [28]:
from sklearn.preprocessing import StandardScaler
import numpy as np

# instantiate a standard scaler object
ss = StandardScaler()

# We can instantiate as many scaler objects as we want
maxs_scaler = StandardScaler()

In [29]:
# Let's create a series of normally distributed values

series_1 = np.random.normal(3,1,1000)
print(series_1.mean())
print(series_1.std())

3.0123808304157174
0.9859443719776071


When we fit the standard scaler, it studies the object passed to it and saves what is learned in its instance attributes.

In [39]:
ss.fit(series_1.reshape(-1, 1))

# standard deviation is saved in the attribute scale_
ss.scale_

array([0.98594437])

In [40]:
# mean is saved in the attribute mean_
ss.mean_

array([3.01238083])

In [41]:
# Knowledge Check

# What value should I put into the standard scaler to make the transform below return 0?

ss.transform([])

ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

In [78]:
#__SOLUTION__
ss.transform([ss.mean_])

array([[0.]])

In [79]:
# we can then use these attributes to transform objects
np.random.seed(42)
random_numbers = np.random.normal(3,1, 2)
random_numbers

array([3.49671415, 2.8617357 ])

In [80]:
ss.transform(random_numbers.reshape(-1,1))

array([[ 0.51708177],
       [-0.13592498]])

In [81]:
# We can also use a scaler on a DataFrame
series_1 = np.random.normal(3,1,1000)
series_2 = np.random.uniform(0,100, 1000)
df_2 = pd.DataFrame([series_1, series_2]).T
ss_df = StandardScaler()
ss_df.fit_transform(df_2)


array([[ 0.63918361, -1.63325007],
       [ 1.53240185,  1.50265028],
       [-0.260668  , -1.56258467],
       ...,
       [ 0.56254398, -1.61544876],
       [ 1.40620165, -1.36827099],
       [ 0.92178475, -0.56807826]])

In [82]:
ss_df.transform([[5, 50]])

array([[ 2.01911307, -0.00948621]])

## Practice With a Partner:

<img src="img/talking.jpeg" width="60" align='left'>

<br>
<br>
<br>

Another object that you will use often is OneHotEncoder from sklearn. It is recommended over pd.get_dummies() because it can be fit, with the learned information stored in the attributes of the object.
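
A short sketch of why a fitted encoder is useful, assuming scikit-learn is installed. The toy data and variable names are hypothetical; `handle_unknown='ignore'` is an optional setting chosen here so that a category unseen during fitting is encoded as all zeros instead of raising an error.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: one column of day labels
train_days = [['m'], ['t'], ['w'], ['t']]

# handle_unknown='ignore' encodes unseen categories as all zeros instead of raising
encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(train_days)

# The categories learned during fit are stored on the object
print(encoder.categories_)

# New data is encoded with the same columns, even if it contains an unseen day
new_days = [['t'], ['su']]
encoded = encoder.transform(new_days).toarray()
print(encoded)
```

Because the column layout comes from the fitted object, training and new data always produce arrays with the same columns.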

In [83]:
from sklearn.preprocessing import OneHotEncoder

In [84]:
np.random.seed(42)
# Let's create a dataframe that has days of the week and number of orders. 

days = np.random.choice(['m','t', 'w','th','f','s','su'], 1000)
orders = np.random.randint(0,1000,1000)

df = pd.DataFrame([days, orders]).T
df.columns = ['days', 'orders']
df.head()

  days orders
0   su    758
1   th    105
2    f    562
3   su     80
4    w    132


Let's interact with an important parameter that we can pass when instantiating the OneHotEncoder object: `drop`.

By dropping a column, we avoid the [dummy variable trap](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).

By passing `drop='first'`, sklearn drops the first category of each feature. Categories are sorted, so in this case that is 'f'.  But what if we want to drop 'm'?  We can pass an array-like object as the parameter to specify which category to drop.




In [85]:
# Instantiate the OHE object with a param that tells it to drop Monday
ohe = None

In [86]:
#__SOLUTION__
# Instantiate a OneHotEncoder object

ohe = OneHotEncoder(drop=['m'])

In [87]:
# Now, fit_transform the days column of the dataframe

ohe_matrix = None

In [316]:
#__SOLUTION__
ohe_matrix = ohe.fit_transform(df[['days']])

In [318]:
# look at __dict__ and checkout drop_idx_
# did it do what you wanted it to do?
ohe.__dict__

{'categories': 'auto',
 'sparse': True,
 'dtype': numpy.float64,
 'handle_unknown': 'error',
 'drop': array(['m'], dtype=object),
 'categories_': [array(['f', 'm', 's', 'su', 't', 'th', 'w'], dtype=object)],
 'drop_idx_': array([1])}

In [319]:
# check out the categories_ attribute
ohe.categories_

[array(['f', 'm', 's', 'su', 't', 'th', 'w'], dtype=object)]

In [320]:
# Check out the object itself
ohe_matrix

<1000x6 sparse matrix of type '<class 'numpy.float64'>'
	with 844 stored elements in Compressed Sparse Row format>

It is a sparse matrix, which is a matrix that is composed mostly of zeros.
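
A small sketch of what the sparse format buys us, assuming NumPy and SciPy are installed (both ship alongside scikit-learn). The toy array is hypothetical; the point is that only the non-zero entries are stored.

```python
import numpy as np
from scipy import sparse

# A small dense array that is mostly zeros (toy data)
dense = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])

# Compressed Sparse Row format stores only the non-zero entries
compact = sparse.csr_matrix(dense)
print(compact.nnz, 'stored values out of', dense.size)

# Converting back recovers the original dense array
print(np.array_equal(compact.toarray(), dense))
```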

In [321]:
# We can convert it to a DataFrame like so
oh_df = pd.DataFrame.sparse.from_spmatrix(ohe_matrix)

In [322]:
# Now, using the categories_ attribute, set the column names to the correct days of the week
# you can use drop_idx_ for this as well



In [323]:
#__SOLUTION__
ohe_columns = list(ohe.categories_[0])
# drop_idx_ holds one index per encoded feature; we encoded a single column
ohe_columns.pop(int(ohe.drop_idx_[0]))
oh_df.columns = ohe_columns
oh_df.head()

     f    s   su    t   th    w
0  0.0  0.0  1.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  1.0  0.0
2  1.0  0.0  0.0  0.0  0.0  0.0
3  0.0  0.0  1.0  0.0  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  1.0


In [326]:
# Add the onehotencoded columns to the original df, and drop the days column


In [325]:
#__SOLUTION__
# Now, add the onehotencoded columns to the original df, and drop the days column

df = df.join(oh_df).drop('days', axis=1)
df.head()

  orders    f    s   su    t   th    w
0    758  0.0  0.0  1.0  0.0  0.0  0.0
1    105  0.0  0.0  0.0  0.0  1.0  0.0
2    562  1.0  0.0  0.0  0.0  0.0  0.0
3     80  0.0  0.0  1.0  0.0  0.0  0.0
4    132  0.0  0.0  0.0  0.0  0.0  1.0
