## Common Errors in Exploiting Python's Object-Orientation

In [1]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from seaborn import load_dataset
import pandas as pd

In [2]:
diamonds = load_dataset('diamonds')
diamonds.head()

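| | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.2 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |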


In [3]:
X = diamonds.drop('carat', axis=1)
y = diamonds['carat']

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    random_state=42)

In [5]:
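# Float columns: depth, table, x, y, z (the integer price column is excluded)
diamond_nums = X_train.select_dtypes(float)
# 'category' is included because newer seaborn versions load cut/color/clarity as categoricals
diamond_cats = X_train.select_dtypes(include=['object', 'category'])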

**Failure to Instantiate**

In [6]:
# No parentheses: these names now refer to the classes themselves, not to instances
ss = StandardScaler
ohe = OneHotEncoder

In [7]:
ss.fit(diamond_nums)

TypeError: fit() missing 1 required positional argument: 'X'

In [8]:
ohe.fit(diamond_cats)

TypeError: fit() missing 1 required positional argument: 'X'
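
The error message mentions a missing `X` because, when `fit` is called on the class rather than on an instance, the data is bound to `self` and nothing is left over for `X`. The sketch below reproduces this with `TinyScaler`, a hypothetical stand-in class (not part of scikit-learn) whose `fit` signature is assumed to mirror the usual `fit(self, X, y=None)` pattern.

```python
class TinyScaler:
    """Hypothetical stand-in for a scikit-learn transformer."""

    def fit(self, X, y=None):
        # Learn the mean of the data and return the instance itself.
        self.mean_ = sum(X) / len(X)
        return self


data = [1.0, 2.0, 3.0]

# Calling fit on the class: `data` is bound to `self`, so `X` is missing.
try:
    TinyScaler.fit(data)
except TypeError as err:
    message = str(err)
    print(message)

# Calling fit on an instance: Python supplies `self`, and `data` becomes `X`.
scaler = TinyScaler()
scaler.fit(data)
print(scaler.mean_)  # 2.0
```

The same fix applies to the cells above: writing `StandardScaler()` and `OneHotEncoder()` with parentheses creates instances whose `fit` method can be called with the data.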

**Mistaking the Fitted Object for Something New**

In [9]:
scaler = StandardScaler()

In [10]:
diamond_nums_scaled = scaler.fit(diamond_nums)

In [11]:
diamond_nums_scaled is scaler  # fit() returns the scaler itself, not scaled data

True

In [12]:
encoder = OneHotEncoder()
diamond_cats_encoded = encoder.fit(diamond_cats)

In [13]:
diamond_cats_encoded is encoder  # fit() returns the encoder itself, not encoded data

True
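
A minimal sketch of the distinction, using a small made-up array rather than the diamonds data: `fit` hands back the same scaler object, while `transform` is the call that produces new data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small illustrative array (an assumption, not the diamonds data)
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

scaler = StandardScaler()
returned = scaler.fit(X)

# fit() returns the fitted scaler, so both names point at one object
print(returned is scaler)  # True

# transform() is what actually produces the scaled values
X_scaled = scaler.transform(X)
print(type(X_scaled).__name__, X_scaled.shape)  # ndarray (3, 2)
```

Because `fit` returns the estimator, calls can be chained as `scaler.fit(X).transform(X)`, which is what `fit_transform` does in one step.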

**Trying to Access Functionality That Doesn't Exist Pre-Fitting**

In [14]:
scaler = StandardScaler()

In [15]:
scaler.mean_

AttributeError: 'StandardScaler' object has no attribute 'mean_'

In [16]:
scaler.fit(diamond_nums)
scaler.mean_

array([61.7449586 , 57.45970337,  5.73319738,  5.73656977,  3.53968335])

In [17]:
encoder = OneHotEncoder()

In [18]:
encoder.get_feature_names()  # needs a fitted encoder; scikit-learn 1.2+ offers get_feature_names_out() instead

NotFittedError: This OneHotEncoder instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
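
One way to guard against this class of error is to test whether an estimator has been fitted before touching its learned attributes. The sketch below wraps scikit-learn's `check_is_fitted` utility in `is_fitted`, a hypothetical helper introduced only for this illustration, and runs it on a small made-up array.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Hypothetical helper: True if the estimator has been fitted."""
    try:
        check_is_fitted(estimator)
    except NotFittedError:
        return False
    return True


# Small illustrative array (an assumption, not the diamonds data)
X = np.array([[1.0], [2.0], [3.0]])

scaler = StandardScaler()
before = is_fitted(scaler)  # no trailing-underscore attributes yet
scaler.fit(X)
after = is_fitted(scaler)   # mean_, scale_, etc. now exist

print(before, after)  # False True
print(scaler.mean_)   # [2.]
```

Attributes ending in an underscore, such as `mean_`, are created by `fit`, which is why they are unavailable on a freshly constructed estimator.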

**Misunderstanding What Object Methods Return**

In [19]:
scaler = StandardScaler()
diamond_nums_scaled = scaler.fit_transform(diamond_nums)
diamond_nums_scaled.mean_  # fit_transform() returns an array; mean_ lives on scaler

AttributeError: 'numpy.ndarray' object has no attribute 'mean_'

In [20]:
diamond_nums_scaled

array([[ 2.20783668,  0.24241403, -1.58998506, -1.54444639, -1.36581585],
       [ 0.03851691, -0.65492279,  0.27356006,  0.29150568,  0.28214948],
       [-0.4513295 ,  0.24241403,  0.73721722,  0.67618135,  0.63427882],
       ...,
       [-1.01115395,  0.24241403, -1.10849493, -1.11605757, -1.18270859],
       [ 0.73829748,  0.69108244,  0.35380841,  0.25653516,  0.39483087],
       [-0.9411759 ,  0.24241403,  0.9690458 ,  0.92097496,  0.80330091]])
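
A short sketch of where each piece of information lives after `fit_transform`, again on a small made-up array: the learned statistics stay on the scaler, and the returned array only offers ordinary NumPy methods such as `mean()`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small illustrative array (an assumption, not the diamonds data)
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Learned column means are stored on the scaler object
learned_means = scaler.mean_
print(learned_means)

# The returned array is plain data; its column means are approximately zero
scaled_means = X_scaled.mean(axis=0)
print(np.allclose(scaled_means, 0.0))  # True
```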

In [21]:
encoder = OneHotEncoder()
diamond_cats_encoded = encoder.fit_transform(diamond_cats)
diamond_cats_encoded['x0_Ideal']  # a sparse matrix has no column labels to index by

IndexError: invalid index

In [22]:
diamond_cats_encoded

<40455x20 sparse matrix of type '<class 'numpy.float64'>'
	with 121365 stored elements in Compressed Sparse Row format>

In [23]:
diamond_cats_encoded.todense()['x0_Ideal']

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

In [24]:
# Column labels come from the encoder; get_feature_names() was removed in scikit-learn 1.2
pd.DataFrame(diamond_cats_encoded.todense(),
             index=diamond_cats.index,
             columns=encoder.get_feature_names())['x0_Ideal']

35965    0.0
52281    1.0
6957     0.0
9163     1.0
50598    1.0
        ... 
11284    0.0
44732    1.0
38158    0.0
860      0.0
15795    0.0
Name: x0_Ideal, Length: 40455, dtype: float64
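
The same labeled-DataFrame pattern can be sketched in a way that should run on both older and newer scikit-learn releases. The toy DataFrame and its index values are assumptions made for illustration, and the generated column names differ between versions (for example `x0_Ideal` versus `cut_Ideal`), so the code avoids hard-coding them.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy categorical data with a non-default index (illustrative assumption)
cats = pd.DataFrame({'cut': ['Ideal', 'Good', 'Ideal'],
                     'color': ['E', 'J', 'E']},
                    index=[10, 11, 12])

encoder = OneHotEncoder()
encoded = encoder.fit_transform(cats)  # sparse by default

# Prefer the newer method name, falling back for older scikit-learn releases
if hasattr(encoder, 'get_feature_names_out'):
    names = encoder.get_feature_names_out()
else:
    names = encoder.get_feature_names()

# toarray() yields a regular ndarray; the original index keeps rows aligned
encoded_df = pd.DataFrame(encoded.toarray(), index=cats.index, columns=names)
print(encoded_df)
```

Passing the original index matters when the encoded frame is later joined back to other columns from the same training split.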