# Sect 21: Object-Oriented Programming 

- online-ds-ft-070620
- 09/09/20

## Questions?


- [Inheritance Lab](https://learn.co/tracks/module-3-data-science-career-2-1/appendix/more-oop/inheritance-lab) 
    - Why do only some of the parameters in the animal class go here: `def __init__(self, name, weight):`?

- When to create a class vs. when to just write functions?


## Topics

- **OOP-Vocabulary**
- **Defining/Initializing Classes**
- **Inspecting classes:**
    - `help(obj)` vs `dir(obj)`
    
- **Deeper dive into Classes/Objects**
    - special methods/attributes (`__repr__()`, `__str__()`, `__call__()`, `__version__`, `__name__`)
    - Methods vs. Bound Methods vs. Static Methods

# What does it mean to be 'Object-Oriented'?

> ### ___"Everything is an object."___
- some Python sensei


In [None]:
var = 13
var

In [None]:
type(var)

In [None]:
help(int)

In [None]:
## Functions are objects too
prove_it = max
prove_it([0,11,13])

In [None]:
type(prove_it)

# OOP VOCABULARY


**VOCAB RELATED TO FUNCTIONS:**
- **Function:**  

    - Parameters: 
    - Argument: 
        - Positional Argument:
        - Keyword/default Arguments:

- "Calling" a function: 

<br><br>

**VOCAB RELATED TO CLASSES:**
- "Object": 
- **Class:** 
- Instance: 
- Attribute:
- Method:
- Private Attributes/Methods: 
- Getters/Setters:


- Object: 

- "dunders" = double underscores __ 

# Defining and Initializing Classes


- Use `class NewClassName():` like you use `def function_name():` for functions.
    - the `()` are optional for classes. (used to inherit other classes, more on that later)

#### Naming Classes
    
- Convention for naming classes = `UpperCamelCase`
- Convention for naming functions = `snake_case`

In [None]:
## Bare minimum to define a class.
class Car:
    pass

In [None]:
## Without the (), rav4 is just another name for the class itself, not an instance
rav4 = Car
rav4

In [None]:
## Instances are made by calling a class
rav4 = Car()
rav4

In [None]:
type(rav4)

In [None]:
## Instances are NOT == the Class
rav4==Car

In [None]:
## But can check if an obj is an instance of Car
isinstance(rav4,Car)

## Attributes and Methods

- Attribute:

- Method:

In [None]:
class Car:
    """Automotive object"""
    ## Attributes
    wheels = 4                     
    moving = False
    doors = 2
    
    ## Methods
    def go():                  
        print('It\'s going!')
        self.moving = True
    
    def stop():
        print('Stopped.')
        self.moving = False

In [None]:
## Make an instance and run .go(), does it work??
rav4 = Car()
rav4.go()

### Know thy `self`

- Because methods are designed to operate on the object they are attached to, Python automatically passes the instance itself (not a copy) as the first argument of every method call. By convention we name that parameter `self`.
- We have to include `self` as the first parameter of every method we define.
    - `def method(self):`
- Otherwise Python will think that the first parameter we defined is actually the instance itself (and a method defined with no parameters raises a `TypeError` when called on an instance). This causes an *existential crisis* and a corresponding error.

In [None]:
class Car:
    """Automotive object"""
    ## Attributes
    wheels = 4                     
    moving = False
    doors = 2
    
    ## Methods
    def go(self):                  
        print('It\'s going!')
        self.moving = True
    
    def stop(self):
        print('Stopped.')
        self.moving = False

In [None]:
rav4 = Car()
rav4.go()
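One way to see what `self` is: calling a method on an instance is equivalent to calling the function on the class and passing the instance explicitly. A minimal sketch, using a trimmed copy of `Car` (the `who_am_i` method is hypothetical, added only for illustration):

```python
class Car:
    """Trimmed copy of the Car class, for illustration only."""
    moving = False

    def go(self):
        print("It's going!")
        self.moving = True

    def who_am_i(self):   # hypothetical helper: hands back whatever `self` is
        return self


rav4 = Car()

# These two calls do the same thing; Python fills in `self` for the first form.
rav4.go()
Car.go(rav4)

# `self` inside a method is the very same object, not a copy.
same_object = rav4.who_am_i() is rav4
print(same_object)
```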

## Initialization 


- We create an instance by calling the class and assigning the result: `instance = ClassName()`
- This uses the template `ClassName` to create an instance of the class (which we named `instance`)

In [None]:
lamborghini = Car()
lamborghini

In [None]:
lamborghini.wheels

In [None]:
lamborghini.doors

In [None]:
lamborghini.moving

In [None]:
lamborghini.go()

In [None]:
lamborghini.moving

In [None]:
lamborghini.stop()

In [None]:
lamborghini.moving
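Note that `wheels`, `doors`, and `moving` are defined on the class, so every instance reads the same value until an instance assigns its own. A small sketch of that behavior, using a trimmed copy of `Car`:

```python
class Car:
    """Trimmed copy of the Car class, for illustration only."""
    wheels = 4
    moving = False

    def go(self):
        self.moving = True   # creates an attribute on this one instance


lamborghini = Car()
rav4 = Car()
lamborghini.go()

print(lamborghini.moving)   # instance attribute set by go()
print(rav4.moving)          # still reading the class attribute
print(Car.moving)           # the class attribute itself is unchanged

# vars() shows only what is stored on the instance itself
print(vars(lamborghini))
print(vars(rav4))
```

This is one reason `__init__` is useful: it stores values on each instance separately instead of sharing them through the class.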

### `__init__`

> - What if we don't want to set the attributes in stone for every Car but want to let the programmer determine that whenever a new Car is made?

> - When we create an instance, we `call` the class using `()`, which runs the class's `__init__()` method. If we define our own `__init__()`, it replaces the default one.

In [None]:
class Car():
    """Automotive object"""
    ## Attributes
    moving = False

    ## Methods
    def __init__(self,wheels,doors):
        self.wheels = wheels
        self.doors = doors
    
    
    def go(self):                   # These are methods we can call on *any* car.
        print('It\'s going!')
        self.moving = True
    
    def stop(self):
        print('Stopped.')
        self.moving = False

In [None]:
## We should get an error (a TypeError) about missing positional arguments
lamborghini = Car()

In [None]:
## We must provide any arguments for __init__ when we create an instance
lamborghini = Car(wheels=4,doors=2)
print(lamborghini.wheels)
lamborghini.doors

In [None]:
rav4 = Car(wheels=4,doors=2)
print(rav4.wheels)
rav4.doors

## Inheritance

- Define a Class based on another class by passing the class to inherit from as a parameter:

In [None]:
class Truck(Car):
    pass

In [None]:
f150 = Truck(4,2)
f150
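A subclass does not have to be empty. As a sketch of how a child class can add its own parameter while reusing the parent's `__init__` (the `bed_length` parameter is hypothetical, added only for illustration):

```python
class Car:
    moving = False

    def __init__(self, wheels, doors):
        self.wheels = wheels
        self.doors = doors

    def go(self):
        self.moving = True


class Truck(Car):
    def __init__(self, wheels, doors, bed_length):
        # Let the parent class store the parameters it already knows about...
        super().__init__(wheels, doors)
        # ...and store only the new one here.
        self.bed_length = bed_length


f150 = Truck(wheels=4, doors=2, bed_length=6.5)
f150.go()   # inherited from Car

print(f150.wheels, f150.doors, f150.bed_length, f150.moving)
print(isinstance(f150, Truck), isinstance(f150, Car))
```

Only the parameters listed in the child's `__init__` can be passed when creating the child; whatever it hands to `super().__init__()` is stored by the parent class.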

### What did you inherit?    

- To view all of the attributes and methods of a class, **use the `help()` function**
    -  Note: `help()` often contains ***information that you may not be able to find ANYWHERE else*** and that does not show up in the online documentation.

#### Peeking Under the Hood: `help` and `dir`

In [None]:
help(f150)

In [None]:
dir(f150)
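`dir()` returns a plain list of attribute names, which makes it easy to filter, while `help()` prints formatted documentation. A small sketch that hides the underscore-prefixed names so only the public interface remains (the list comprehension is just one way to filter, and the classes are trimmed copies with default arguments):

```python
class Car:
    """Automotive object"""
    moving = False

    def __init__(self, wheels=4, doors=4):
        self.wheels = wheels
        self.doors = doors

    def go(self):
        self.moving = True

    def stop(self):
        self.moving = False


class Truck(Car):
    pass


f150 = Truck()

# Keep only names that do not start with an underscore
public_names = [name for name in dir(f150) if not name.startswith('_')]
print(public_names)
```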

# Special Class Methods

#### Magic Methods

It is common for a class to have magic methods. These are identifiable by the "dunder" (i.e. **d**ouble **under**score) prefixes and suffixes, such as `__init__()`. These methods will get called **automatically**, as we'll see below.

For more on these "magic methods", see [here](https://www.geeksforgeeks.org/dunder-magic-methods-python/).

## Using special methods to control the output of a class

### `__repr__()` controls what is displayed when an object is the final element of a cell (or when `display()` is used)

In [None]:
class Car():
    """Automotive object"""
    ## Attributes
    moving = False

    ## Methods
    def __init__(self,wheels=4,doors=4):
        self.wheels = wheels
        self.doors = doors
    
    
    def go(self):                   # These are methods we can call on *any* car.
        print('It\'s going!')
        self.moving = True
    
    def stop(self):
        print('Stopped.')
        self.moving = False
        
    def __repr__(self):
        info = [f"- Wheels: {self.wheels}"]
        info.append(f"- Doors: {self.doors}")
        info.append(f"- Moving?: {self.moving}")
        return '\n'.join(info)

In [None]:
rav4 = Car()
rav4

In [None]:
rav4.go()

In [None]:
rav4

### `__str__()` controls what's displayed when an object is printed

In [None]:
class Car():
    """Automotive object"""
    ## Attributes
    moving = False

    ## Methods
    def __init__(self,wheels=4,doors=4):
        self.wheels = wheels
        self.doors = doors
    
    
    def go(self):                   # These are methods we can call on *any* car.
        print('It\'s going!')
        self.moving = True
    
    def stop(self):
        print('Stopped.')
        self.moving = False
        
    def __repr__(self):
        info = [f"- Wheels: {self.wheels}"]
        info.append(f"- Doors: {self.doors}")
        info.append(f"- Moving?: {self.moving}")
        return '\n'.join(info)
    
    def __str__(self):
        return f"""- This car has {self.wheels} wheels, {self.doors} doors, and moving = {self.moving}"""

In [None]:
rav4=Car()
rav4

In [None]:
print(rav4)

### `__repr__()` vs. `__str__()`

`__repr__()` and `__str__()` are both designed to return string representations of the object. But `__repr__()` focuses on minimizing ambiguity while `__str__()` focuses on readability. However, if your class has no `__str__()` method, `print()` will fall back on `__repr__()` (which always exists, because every class inherits a default one from `object`). For more on this distinction, see [this post](https://dbader.org/blog/python-repr-vs-str).
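A short sketch of the difference, first with a built-in `datetime.date` and then with a trimmed `Car` that defines only `__repr__` (the exact repr format chosen for `Car` here is an illustrative assumption):

```python
import datetime as dt

today = dt.date(2020, 9, 9)
print(repr(today))   # unambiguous: could be pasted back into code
print(str(today))    # readable


class Car:
    def __init__(self, wheels=4, doors=4):
        self.wheels = wheels
        self.doors = doors

    def __repr__(self):
        return f"Car(wheels={self.wheels}, doors={self.doors})"


rav4 = Car()
# No __str__ is defined, so str() and print() fall back on __repr__
print(str(rav4))
```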

# Scikit Learn Objects

In [2]:
# ## Getting the dataset ready
# from fsds.imports import *
# df= fs.datasets.load_mod1_proj(read_csv_kwds={'na_values':'?'})#load_iowa_prisoners()

# df.fillna(0,inplace=True)

# drop_cols= ['id','date'] #[col for col in df.columns if 'New' in col]
# # drop_cols.append('Days to Recidivism')
# df.drop(columns=drop_cols,inplace=True)
# df.head()


In [39]:
## Getting the dataset ready
from fsds.imports import *
df= fs.datasets.load_iowa_prisoners()

df.fillna('MISSING',inplace=True)

drop_cols= [col for col in df.columns if 'New' in col]
drop_cols.append('Days to Recidivism')
df.drop(columns=drop_cols,inplace=True)
df.head()


Unnamed: 0,Fiscal Year Released,Recidivism Reporting Year,Race - Ethnicity,Age At Release,Convicting Offense Classification,Convicting Offense Type,Convicting Offense Subtype,Release Type,Main Supervising District,Recidivism - Return to Prison,Part of Target Population,Recidivism Type,Sex
0,2010,2013,Black - Non-Hispanic,25-34,C Felony,Violent,Robbery,Parole,7JD,Yes,Yes,New,Male
1,2010,2013,White - Non-Hispanic,25-34,D Felony,Property,Theft,Discharged – End of Sentence,MISSING,Yes,No,Tech,Male
2,2010,2013,White - Non-Hispanic,35-44,B Felony,Drug,Trafficking,Parole,5JD,Yes,Yes,Tech,Male
3,2010,2013,White - Non-Hispanic,25-34,B Felony,Other,Other Criminal,Parole,6JD,No,Yes,No Recidivism,Male
4,2010,2013,Black - Non-Hispanic,35-44,D Felony,Violent,Assault,Discharged – End of Sentence,MISSING,Yes,No,Tech,Male


In [40]:
df.describe()

Unnamed: 0,Fiscal Year Released,Recidivism Reporting Year
count,26020.0,26020.0
mean,2012.600769,2015.600769
std,1.661028,1.661028
min,2010.0,2013.0
25%,2011.0,2014.0
50%,2013.0,2016.0
75%,2014.0,2017.0
max,2015.0,2018.0


In [41]:
# change_dtypes = {'zipcode':'object'}
# for col, dtype in change_dtypes.items():
#     df[col] = df[col].astype(dtype)
# df.info()

In [42]:
## Sklearn Classes
from sklearn.preprocessing import LabelEncoder,OneHotEncoder, StandardScaler

scaler = StandardScaler()

In [43]:
df['scaled_year_released'] = scaler.fit_transform(df[['Fiscal Year Released']])
df[['scaled_year_released','Fiscal Year Released']]

Unnamed: 0,scaled_year_released,Fiscal Year Released
0,-1.565789,2010
1,-1.565789,2010
2,-1.565789,2010
3,-1.565789,2010
4,-1.565789,2010
...,...,...
26015,1.444453,2015
26016,1.444453,2015
26017,1.444453,2015
26018,1.444453,2015


In [44]:
## Select with double brackets (2D input); .ravel() flattens the result back to 1D
scaler.inverse_transform(df[['scaled_year_released']]).ravel()

array([2010., 2010., 2010., ..., 2015., 2015., 2015.])
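The scaler is itself an instance of a class: `fit` stores what it learned as attributes, and `transform` / `inverse_transform` are methods that use them. As a simplified, hypothetical stand-in (`SimpleScaler` is not scikit-learn's implementation, just a pure-Python sketch of the same pattern on made-up data):

```python
import statistics


class SimpleScaler:
    """Hypothetical, simplified stand-in for a standard scaler."""

    def fit(self, values):
        # Learned state is saved on the instance as attributes
        self.mean_ = statistics.fmean(values)
        self.scale_ = statistics.pstdev(values)   # population standard deviation
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.scale_ for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)

    def inverse_transform(self, scaled):
        return [s * self.scale_ + self.mean_ for s in scaled]


years = [2010, 2011, 2012, 2013, 2014, 2015]   # made-up sample
simple_scaler = SimpleScaler()
scaled = simple_scaler.fit_transform(years)
restored = simple_scaler.inverse_transform(scaled)

print(simple_scaler.mean_, simple_scaler.scale_)
print([round(s, 3) for s in scaled])
print([round(r) for r in restored])
```

Because the learned mean and scale live on the instance, the same object can later undo its own transformation.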

In [45]:
cat_cols = df.select_dtypes('object').columns
cat_cols

Index(['Race - Ethnicity', 'Age At Release ',
       'Convicting Offense Classification', 'Convicting Offense Type',
       'Convicting Offense Subtype', 'Release Type',
       'Main Supervising District', 'Recidivism - Return to Prison',
       'Part of Target Population', 'Recidivism Type', 'Sex'],
      dtype='object')

In [46]:
## Initialize an encoder
## (scikit-learn 1.2 renamed `sparse` to `sparse_output`; newer versions need sparse_output=False)
encoder = OneHotEncoder(sparse=False,drop='if_binary')
encoder

OneHotEncoder(drop='if_binary', sparse=False)

In [47]:
ohe_data = encoder.fit_transform(df[cat_cols])
ohe_data

array([[0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 0., 1.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 0., 1.]])

In [48]:
pd.DataFrame(ohe_data)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,86,87,88,89,90,91,92,93,94,95
0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
4,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26015,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
26016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
26017,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
26018,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0


In [49]:
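## In scikit-learn 1.0+ this method is named get_feature_names_out (get_feature_names was removed in 1.2)
encoder.get_feature_names(cat_cols)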

array(['Race - Ethnicity_American Indian or Alaska Native - Hispanic',
       'Race - Ethnicity_American Indian or Alaska Native - Non-Hispanic',
       'Race - Ethnicity_Asian or Pacific Islander - Hispanic',
       'Race - Ethnicity_Asian or Pacific Islander - Non-Hispanic',
       'Race - Ethnicity_Black -', 'Race - Ethnicity_Black - Hispanic',
       'Race - Ethnicity_Black - Non-Hispanic',
       'Race - Ethnicity_MISSING', 'Race - Ethnicity_N/A -',
       'Race - Ethnicity_White -', 'Race - Ethnicity_White - Hispanic',
       'Race - Ethnicity_White - Non-Hispanic', 'Age At Release _25-34',
       'Age At Release _35-44', 'Age At Release _45-54',
       'Age At Release _55 and Older', 'Age At Release _MISSING',
       'Age At Release _Under 25',
       'Convicting Offense Classification_A Felony',
       'Convicting Offense Classification_Aggravated Misdemeanor',
       'Convicting Offense Classification_B Felony',
       'Convicting Offense Classification_C Felony',
       'Conv

In [50]:
df_ohe = pd.DataFrame(ohe_data,columns=encoder.get_feature_names(cat_cols))
df_ohe

Unnamed: 0,Race - Ethnicity_American Indian or Alaska Native - Hispanic,Race - Ethnicity_American Indian or Alaska Native - Non-Hispanic,Race - Ethnicity_Asian or Pacific Islander - Hispanic,Race - Ethnicity_Asian or Pacific Islander - Non-Hispanic,Race - Ethnicity_Black -,Race - Ethnicity_Black - Hispanic,Race - Ethnicity_Black - Non-Hispanic,Race - Ethnicity_MISSING,Race - Ethnicity_N/A -,Race - Ethnicity_White -,...,Main Supervising District_Interstate Compact,Main Supervising District_MISSING,Recidivism - Return to Prison_Yes,Part of Target Population_Yes,Recidivism Type_New,Recidivism Type_No Recidivism,Recidivism Type_Tech,Sex_Female,Sex_MISSING,Sex_Male
0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
4,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26015,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
26016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
26017,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
26018,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0


In [51]:
original_column = encoder.inverse_transform(df_ohe)
original_column


array([['Black - Non-Hispanic', '25-34', 'C Felony', ..., 'Yes', 'New',
        'Male'],
       ['White - Non-Hispanic', '25-34', 'D Felony', ..., 'No', 'Tech',
        'Male'],
       ['White - Non-Hispanic', '35-44', 'B Felony', ..., 'Yes', 'Tech',
        'Male'],
       ...,
       ['White - Non-Hispanic', '25-34', 'Aggravated Misdemeanor', ...,
        'No', 'No Recidivism', 'Female'],
       ['White - Non-Hispanic', '25-34', 'D Felony', ..., 'Yes',
        'No Recidivism', 'Male'],
       ['White - Non-Hispanic', '35-44', 'D Felony', ..., 'Yes', 'Tech',
        'Male']], dtype=object)

In [53]:
df_model = pd.concat([df.drop(columns=cat_cols),df_ohe,],axis=1)
df_model

Unnamed: 0,Fiscal Year Released,Recidivism Reporting Year,scaled_year_released,Race - Ethnicity_American Indian or Alaska Native - Hispanic,Race - Ethnicity_American Indian or Alaska Native - Non-Hispanic,Race - Ethnicity_Asian or Pacific Islander - Hispanic,Race - Ethnicity_Asian or Pacific Islander - Non-Hispanic,Race - Ethnicity_Black -,Race - Ethnicity_Black - Hispanic,Race - Ethnicity_Black - Non-Hispanic,...,Main Supervising District_Interstate Compact,Main Supervising District_MISSING,Recidivism - Return to Prison_Yes,Part of Target Population_Yes,Recidivism Type_New,Recidivism Type_No Recidivism,Recidivism Type_Tech,Sex_Female,Sex_MISSING,Sex_Male
0,2010,2013,-1.565789,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0
1,2010,2013,-1.565789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
2,2010,2013,-1.565789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0
3,2010,2013,-1.565789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
4,2010,2013,-1.565789,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26015,2015,2018,1.444453,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
26016,2015,2018,1.444453,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
26017,2015,2018,1.444453,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
26018,2015,2018,1.444453,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0


In [56]:
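target = 'Recidivism - Return to Prison_Yes'
y = df_model[target].copy()
## NOTE: the 'Recidivism Type_*' columns describe the same outcome as the target,
## so leaving them in X probably leaks the answer to the model.
X = df_model.drop(columns=target)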

# ACTIVITY

## Activity Option 1: Construct a Timer Class

In [None]:
import tzlocal
import datetime as dt
# tzlocal.get_localzone()

In [None]:
#dt.datetime.now()
print(dt.datetime.now())

In [None]:
# fs.quick_refs.ts_date_str_formatting()

In [64]:
class Timer:
    
    def __init__(self,fmt='%m/%d/%Y - %I:%M:%S %p',start=True,label=''):
        import tzlocal
        import datetime as dt
        
        self._tz = tzlocal.get_localzone()
        self._created_at =dt.datetime.now(self._tz)
        self._fmt = fmt
        if start:
            self.start(label=label)
        
        
    def _get_time(self):
        import datetime as dt
        return dt.datetime.now(self._tz)
        
        
    def start(self,label=''):
        self._start = self._get_time()
        self._start_label = label
        
        print(f'[i] Timer started at {self._start.strftime(self._fmt)}')
        if len(label)>0:
            print(f'\t- Process running: {label}')
        
    
    def stop(self,label=''):
        
        self._stop = self._get_time()
        elapsed = self._stop - self._start
        
        print(f'[i] Timer stopped at {self._stop.strftime(self._fmt)}')
        print(f"\t- The process {label} took {elapsed}.")
        
    def __call__(self):
        print(self._get_time())
        
    


In [65]:
timer = Timer()

[i] Timer started at 09/09/2020 - 11:47:38 AM


In [66]:
# dir(timer)
timer.start('Testing this thing')

[i] Timer started at 09/09/2020 - 11:47:38 AM
	- Process running: Testing this thing


In [67]:
timer.stop()

[i] Timer stopped at 09/09/2020 - 11:47:38 AM
	- The process  took 0:00:00.299843.


In [68]:
timer()

2020-09-09 11:47:39.033393-04:00
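`timer()` works because the class defines `__call__`, which Python runs whenever an instance is followed by parentheses. A minimal sketch with a hypothetical `Stopwatch` class (not the `Timer` class itself) that returns the elapsed seconds when called:

```python
import time


class Stopwatch:
    """Hypothetical example: calling the instance reports elapsed time."""

    def __init__(self):
        self._start = time.perf_counter()

    def __call__(self):
        # Runs when the instance itself is called: stopwatch()
        return time.perf_counter() - self._start


stopwatch = Stopwatch()
time.sleep(0.01)
elapsed = stopwatch()          # same as stopwatch.__call__()

print(callable(stopwatch))
print(f"{elapsed:.3f} seconds")
```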


### Running the Model with the Timer

In [69]:
from sklearn.model_selection import train_test_split


In [70]:
# target='Recidivism - Return to Prison'
# y = df[target].copy()
# X = df.drop(target,axis=1).copy()

X_train, X_test,y_train,y_test = train_test_split(X,y)

In [71]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier 
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')

In [73]:
tree = RandomForestClassifier()#DecisionTreeClassifier( )
params = {'max_depth':[None,4,6,10,20],  # None = no depth limit (0 is not a valid depth)
#          'min_samples_leaf':[2,4,5,9,20],
         'criterion':['entropy','gini']}
grid = GridSearchCV(tree,params)

timer =Timer(start=True,label='Training Decision Tree Classifier')

grid.fit(X_train, y_train)

y_hat_test = grid.predict(X_test)
acc = accuracy_score(y_test,y_hat_test)
timer.stop(f'- Training complete. Accuracy = {acc}')

[i] Timer started at 09/09/2020 - 11:48:24 AM
	- Process running: Training Decision Tree Classifier
[i] Timer stopped at 09/09/2020 - 11:48:45 AM
	- The process - Training complete. Accuracy = 1.0 took 0:00:20.932559.


# APPENDIX

## Dictionaries & Dictionary Methods

- Iterating through a dict:
    - `dict.items()`
    - `dict.keys()`
    - `dict.values()`
    - `**dict` vs `*dict`

- Retrieving Value:
    - `dict.get(k)` vs `dict[k]`

- Removing / Extracting Entries
    - `dict.pop(k)` vs `del dict[k]`
    - `dict.clear()`
    
- Merging Dictionaries:
    - `d1.update(d2)`
        - for every (k,v) in d2:
            - if k is NOT in d1, insert (k,v) into d1
            - if k IS in d1, updates value of k in d1
    - Use `**` operator:
        - `combined_d = {**d1,**d2}`
    
- Updating Dictionaries
    - `d1.update(key1=new_value1,new_key2=new_value2)`

- Setting Dictionary Values
    - `dict[k] = 5`
    - `dict.setdefault(k,5)`
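The dictionary methods listed in this section can be tried out on two small made-up dictionaries. This is only a sketch of the behaviors named in the list:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 30}

# dict.get(k) returns None (or a default) for a missing key; dict[k] raises KeyError
missing = d1.get('z')
fallback = d1.get('z', 0)

# *dict unpacks only the keys; **dict unpacks key/value pairs
keys_only = [*d2]
combined = {**d1, **d2}      # later values win for keys found in both

# update() changes d1 in place
d1.update(d2)

# setdefault() only inserts when the key is absent
d1.setdefault('a', 99)       # 'a' exists, so it keeps its value
d1.setdefault('d', 5)        # 'd' is new, so it is added

# pop() removes a key and hands back its value; del just removes it
popped = d1.pop('c')
del d1['d']

print(missing, fallback, keys_only)
print(combined)
print(d1, popped)
```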


## Decorators with Classes
#### Some special decorators used in classes.

1. `@staticmethod`:
    - Defines a method that does not get passed `self` when it's called, so it behaves like a plain function that lives inside the class rather than a "`bound method`".
2. `@classmethod`:
    - Defines a method that is passed the class itself (conventionally named `cls`) as its first argument instead of the instance. It is often used for alternative constructors.
3. `@property`: (see example class `EncryptedPassword` below.)
    - Specifies that a method is going to determine the value of an attribute, read as `instance.property_name` with no parentheses.
    - Essentially replaces the attribute with a getter function that determines its value.
    - Use `@property_name.setter` above another method with the same name to define it as the setter function.
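A compact sketch of all three decorators on one hypothetical `Temperature` class (the class and its method names are invented for illustration):

```python
class Temperature:
    """Hypothetical class used only to demonstrate the three decorators."""

    def __init__(self, celsius):
        self._celsius = celsius            # "private" by convention

    @staticmethod
    def c_to_f(celsius):
        # No self: works like a plain function stored on the class
        return celsius * 9 / 5 + 32

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # Receives the class as `cls`: an alternative constructor
        return cls((fahrenheit - 32) * 5 / 9)

    @property
    def celsius(self):
        # Getter: read with temp.celsius (no parentheses)
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Setter: runs on temp.celsius = value
        if value < -273.15:
            raise ValueError("Temperature below absolute zero")
        self._celsius = value


temp = Temperature.from_fahrenheit(212)
print(temp.celsius)
temp.celsius = 25
print(Temperature.c_to_f(temp.celsius))
```

Putting the validation in the setter means every assignment to `temp.celsius` is checked, while the calling code still looks like plain attribute access.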

### Vocab (completed)

- "Object": an instance of a class (a template) that currently exists in memory.
- "Calling" a function: 
    - When we use `( )` with a function we are calling it.

- **Function:**  Code that manipulates data in a useful way. 

- Parameters: the named variables that a function is defined to accept.
- Argument: the actual variable/value passed in for a parameter.
- Positional Argument:
    - The required arguments, which come first.
    - Their identity is determined by their order.
- Keyword/default Arguments:
    - Arguments that have a defined default value.
    - They must come after positional arguments.
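
To make the function vocabulary concrete, here is a short sketch with a hypothetical `describe_car` function: `name` is a positional parameter, `wheels` and `doors` are keyword parameters with defaults, and the values supplied in each call are the arguments.

```python
def describe_car(name, wheels=4, doors=2):
    """name is positional (required); wheels and doors have defaults."""
    return f"{name}: {wheels} wheels, {doors} doors"


# Positional arguments are matched to parameters by order
print(describe_car('rav4', 4, 4))

# Keyword arguments are matched by name, so order no longer matters
print(describe_car('lamborghini', doors=2, wheels=4))

# Parameters with defaults can be left out entirely
print(describe_car('f150'))
```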

<br><br>
- **Class:** Template/blueprint.
- Instance: An object built from the class blueprint.
- Attribute: A variable stored inside an object. 
- Method: A function stored inside an object.
    - Objects always pass themselves into a method, so we use `self` to account for this.
- Private Attributes/Methods: by convention their names start with `_`, signaling that they are not meant to be used directly. They can be read or updated through getter and setter methods.
- Getters/Setters:
    - Methods for retrieving or changing private attributes.

- "dunders" = double underscores __ 

## Activity Option 2: OutlierRemover

In [None]:
from fsds.imports import *
df = fs.datasets.load_mod1_proj(read_csv_kwds={'na_values':'?'})
# keep_cols = ['price','bedrooms','bathrooms','sqft_living']
drop_cols= ['id','lat','long','date']
df = df.drop(columns=drop_cols)
df.head()

In [None]:
df.describe().round()

In [None]:
from scipy import stats
def find_outliers_z(data):
    """Finds outliers using the z-score rule with a cutoff of 3.
    Returns a T/F for every row if > 3 z-scores away from the mean."""
    z_data = np.abs(stats.zscore(data))
    idx_outliers = z_data > 3
    return idx_outliers

def find_outliers_IQR(data):
    """Finds outliers using the IQR threshold removal method.
    Returns a T/F for every row if:
    - data is less than  1.5 * IQR  below Q1
    - OR data is more than  1.5 * IQR  above Q3"""
    q1 = np.quantile(data,.25)
    q3 = np.quantile(data,.75)
    IQR_threshold = (q3-q1) * 1.5
    idx_outliers = (data < q1-IQR_threshold) | (data>q3+IQR_threshold)
    return idx_outliers
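
To see the IQR rule at work on a handful of numbers, here is a small pure-Python sketch. It uses the standard library's `statistics.quantiles` with `method='inclusive'`, which is assumed here to interpolate the same way as `np.quantile` does by default; the sample data is made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # made-up sample with one extreme value

# Quartiles: the three cut points that split the sorted data into four parts
q1, _median, q3 = statistics.quantiles(data, n=4, method='inclusive')
iqr_threshold = (q3 - q1) * 1.5

lower_bound = q1 - iqr_threshold
upper_bound = q3 + iqr_threshold

# True wherever a value falls outside the two bounds
is_outlier = [(x < lower_bound) or (x > upper_bound) for x in data]
outliers = [x for x, flag in zip(data, is_outlier) if flag]

print(q1, q3, lower_bound, upper_bound)
print(outliers)
```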

In [None]:
print(find_outliers_z(df['price']).sum())
find_outliers_IQR(df['price']).sum()

In [None]:
class OutlierRemover():
    def __init__(self,method='z'):
        self.method = method
        
        # Setting the Method for Removal
        if self.method.lower() == 'z':
            outlier_func = self._find_outliers_z
            
        elif self.method.lower()=='iqr': 
            outlier_func = self._find_outliers_IQR
            
        else:
            raise ValueError(f"Unknown outlier removal method: {self.method}")
            
        self.find_outliers = outlier_func
        
    
    def _find_outliers_z(self,data):
        """Finds outliers using the z-score rule with a cutoff of 3.
        Returns a T/F for every row if > 3 z-scores away from the mean."""
        z_data = np.abs(stats.zscore(data))
        idx_outliers = z_data > 3
        return idx_outliers

    def _find_outliers_IQR(self,data):
        """Finds outliers using the IQR threshold removal method.
        Returns a T/F for every row if:
        - data is less than  1.5 * IQR  below Q1
        - OR data is more than  1.5 * IQR  above Q3"""
        q1 = np.quantile(data,.25)
        q3 = np.quantile(data,.75)
        IQR_threshold = (q3-q1) * 1.5
        idx_outliers = (data < q1-IQR_threshold) | (data>q3+IQR_threshold)
        return idx_outliers
    
    def get_outliers(self,data,columns=None):
        if columns is None:
            columns = data.columns
            
        outliers = {}
        for col in columns:
            outliers[col] = self.find_outliers(data[col])
        ## Save the per-column flags and a row-level "any outlier" mask on the instance
        self.outlier_df = pd.DataFrame(outliers, index=data.index)
        self.outlier_index = self.outlier_df.any(axis=1)
        return outliers
    


In [None]:
remover = OutlierRemover()
remover.get_outliers(df)
remover

In [None]:
help(remover)

In [None]:
df[remover.outlier_index].boxplot()

In [None]:
df[~remover.outlier_index].boxplot()

In [None]:
# class OutlierRemover():
#     def __init__(self,method='z'):
#         self.method = method
    

        
#     def _find_outliers_z(self,data=None):
#         """Finds outliers using the z-score rule with a cutoff of 3.
#         Returns a T/F for every row if > 3 z-scores away from the mean."""
#         if data is None:
#             data = self.data
#         z_data = np.abs(stats.zscore(data))
#         idx_outliers = z_data > 3
#         return idx_outliers
        
    
#     def _find_outliers_IQR(self,data=None):
#         """Finds outliers using the IQR threshold removal method.
#         Returns a T/F for every row if:
#         - data is less than  1.5 * IQR  below Q1
#         - OR data is more than  1.5 * IQR  above Q3"""
#         if data is None:
#             data = self.data
#         q1 = np.quantile(data,.25)
#         q3 = np.quantile(data,.75)
#         IQR_threshold = (q3-q1) * 1.5
#         idx_outliers = (data < q1-IQR_threshold) | (data>q3+IQR_threshold)
#         return idx_outliers
    
    
#     def fit(self,df,columns=None):
#         ## Save data
#         self.data = df.copy()
        
#         ## Columns List
#         if columns is None:
#             columns = df.columns.tolist()
            
#         ## Setting the Method for Removal
#         if self.method.lower() == 'z':
#             outlier_func = self._find_outliers_z
            
#         elif self.method.lower()=='iqr': 
#             outlier_func = self._find_outliers_IQR
#         else:
#             raise Exception(f"Unknown outlier removal method: {self.method}")

#         ## Empty dict of outliers
#         outliers = {}
#         for col in columns:
#             outliers[col] = pd.Series(outlier_func(self.data[col]),
#                                       index=df.index)
        
#         ## Calc all outliers.
#         outliers_df = pd.DataFrame(outliers,index=df.index)#,columns=columns)
#         outliers_df['total'] = outliers_df.any(axis=1)
#         self.index = outliers_df['total']
        
#         self.outliers = outliers            
#         self.outliers_df = outliers_df
        
        
#     def transform(self):
#         return self.data[~self.index]
# #     def find_outliers(self,remove_from=None):
# #         if remove_from is None:
# #             remove_from = self.columns
        
# #         if self.method.lower() == 'z':
# #             outlier_func = self._find_outliers_z
# #         else: 
# #             outlier_func = self._find_outliers_IQR

# #         idx_outliers = {}
# #         for col in remove_from:
# #             idx_outliers[col] = outlier_func(self.data[col])

# #         self.index = idx_outliers

# remover = OutlierRemover()
# remover.fit(df)
# remover#outliers['price']
# # df[remover.outliers['price']]
        

In [None]:
# df[remover.index]

In [None]:
# remover.transform()

In [None]:
remover = OutlierRemover()  # the first parameter is `method`, not the data
# remover.find_outliers()
# remover.index