Create split_scale.py to hold the functions that follow. Each scaler function should create the scaler object, fit it on train, and transform both train and test. Each should return the scaler, the scaled train dataframe, and the scaled test dataframe. Make sure the scaled dataframes keep the original indices from train and test, since those are the indices from the original dataframe. Set a random state wherever applicable so results are reproducible!
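As a rough sketch of what two of these functions could look like: the names `split_my_data` and `standard_scaler` and their call shapes come from the cells later in this notebook, but the bodies, the `train_pct` parameter name, and the default `random_state=123` are assumptions made for illustration, not the actual contents of split_scale.py.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_my_data(X, y, train_pct=0.8, random_state=123):
    """Split X and y into train and test sets (assumed signature).

    train_test_split keeps the original index labels on each piece.
    Returns X_train, X_test, y_train, y_test.
    """
    return train_test_split(X, y, train_size=train_pct, random_state=random_state)


def standard_scaler(train, test):
    """Fit a StandardScaler on train only, then transform train and test.

    Returns the fitted scaler and two DataFrames that keep the
    column names and index labels of the inputs.
    """
    scaler = StandardScaler().fit(train)
    train_scaled = pd.DataFrame(
        scaler.transform(train), columns=train.columns, index=train.index
    )
    test_scaled = pd.DataFrame(
        scaler.transform(test), columns=test.columns, index=test.index
    )
    return scaler, train_scaled, test_scaled
```

Fitting on train alone keeps information from the test rows out of the scaler's learned mean and standard deviation.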

In [1]:
from wrangle import wrangle_telco
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, QuantileTransformer, PowerTransformer, RobustScaler, MinMaxScaler

In [2]:
customers = wrangle_telco()

In [3]:
customers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1685 entries, 0 to 1694
Data columns (total 4 columns):
customer_id        1685 non-null object
monthly_charges    1685 non-null float64
tenure             1685 non-null int64
total_charges      1685 non-null float64
dtypes: float64(2), int64(1), object(1)
memory usage: 65.8+ KB


In [4]:
X = customers[["tenure", "monthly_charges"]]
y = customers["total_charges"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=123)

print('X_train:', X_train.shape)
print('X_test:', X_test.shape)
print('y_train:', y_train.shape)
print('y_test:', y_test.shape)

X_train: (1348, 2)
X_test: (337, 2)
y_train: (1348,)
y_test: (337,)


In [5]:
X_train

Unnamed: 0,tenure,monthly_charges
119,70,75.50
1424,55,20.30
385,65,109.05
1140,70,98.30
1504,71,116.25
...,...,...
1131,65,19.85
1356,63,19.55
1416,7,19.85
1399,35,69.15


Now we transform the data

**Standard_scaler**

In [6]:
# Fit on the training data only; StandardScaler ignores y, so it is not passed
scaler = StandardScaler().fit(X_train)

# Wrap the scaled arrays in DataFrames that keep the original column names and indices
X_train_scaled = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns, index=X_train.index)

X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)
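A quick way to confirm that the scaling did what the exercise asks is to check the index and the column statistics. The sketch below uses five rows copied from the X_train display above; all `*_demo` names are hypothetical and exist only for this check.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# A few rows copied from the X_train display, with their non-contiguous index labels
X_demo = pd.DataFrame(
    {
        "tenure": [70, 55, 65, 7, 35],
        "monthly_charges": [75.50, 20.30, 109.05, 19.85, 69.15],
    },
    index=[119, 1424, 385, 1416, 1399],
)

scaler_demo = StandardScaler().fit(X_demo)
X_demo_scaled = pd.DataFrame(
    scaler_demo.transform(X_demo), columns=X_demo.columns, index=X_demo.index
)

# The scaled frame should carry the same index labels as the input
index_preserved = X_demo_scaled.index.equals(X_demo.index)

# On the data the scaler was fit on, each column should have mean 0
# and population standard deviation (ddof=0) of 1
means_near_zero = np.allclose(X_demo_scaled.mean(), 0)
stds_near_one = np.allclose(X_demo_scaled.std(ddof=0), 1)

print(index_preserved, means_near_zero, stds_near_one)
```

The test set will generally not have mean 0 and standard deviation 1 after scaling, because the scaler's parameters come from train.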

In [7]:
X_train

Unnamed: 0,tenure,monthly_charges
119,70,75.50
1424,55,20.30
385,65,109.05
1140,70,98.30
1504,71,116.25
...,...,...
1131,65,19.85
1356,63,19.55
1416,7,19.85
1399,35,69.15


In [8]:
import split_scale

In [9]:
train, test, _, _ = split_scale.split_my_data(X, y, .8)

In [10]:
scaler, train_scaled, test_scaled = split_scale.standard_scaler(train, test)

In [11]:
scaler

StandardScaler(copy=True, with_mean=True, with_std=True)

In [12]:
scaler, train_scaled, test_scaled = split_scale.gaussian_scaler(train, test, False)

In [13]:
train_scaled

Unnamed: 0,tenure,monthly_charges
119,0.813319,0.562438
1424,-0.318595,-1.262577
385,0.400918,1.253310
1140,0.813319,1.048691
1504,0.900066,1.383335
...,...,...
1131,0.400918,-1.286668
1356,0.245852,-1.302920
1416,-1.954336,-1.286668
1399,-1.350400,0.410805


In [14]:
og_train, og_test = split_scale.scale_inverse(scaler, train_scaled, test_scaled)

In [17]:
og_train

Unnamed: 0,tenure,monthly_charges
119,70.0,75.50
1424,55.0,20.30
385,65.0,109.05
1140,70.0,98.30
1504,71.0,116.25
...,...,...
1131,65.0,19.85
1356,63.0,19.55
1416,7.0,19.85
1399,35.0,69.15
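One possible shape for `scale_inverse`, assuming it simply wraps the fitted scaler's `inverse_transform`. The function name and call signature match the cell above; the body and the `*_demo` data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_inverse(scaler, train_scaled, test_scaled):
    """Undo a fitted scaler's transform, keeping column names and indices."""
    train = pd.DataFrame(
        scaler.inverse_transform(train_scaled),
        columns=train_scaled.columns,
        index=train_scaled.index,
    )
    test = pd.DataFrame(
        scaler.inverse_transform(test_scaled),
        columns=test_scaled.columns,
        index=test_scaled.index,
    )
    return train, test


# Made-up rows and index labels, for illustration only
train_demo = pd.DataFrame(
    {"tenure": [70, 55, 65], "monthly_charges": [75.50, 20.30, 109.05]},
    index=[3, 8, 1],
)
test_demo = pd.DataFrame(
    {"tenure": [7, 35], "monthly_charges": [19.85, 69.15]}, index=[5, 2]
)

scaler_demo = StandardScaler().fit(train_demo)
train_scaled_demo = pd.DataFrame(
    scaler_demo.transform(train_demo), columns=train_demo.columns, index=train_demo.index
)
test_scaled_demo = pd.DataFrame(
    scaler_demo.transform(test_demo), columns=test_demo.columns, index=test_demo.index
)

og_train_demo, og_test_demo = scale_inverse(scaler_demo, train_scaled_demo, test_scaled_demo)

# The round trip should recover the original values up to floating-point error
round_trip_ok = bool(
    np.allclose(og_train_demo, train_demo) and np.allclose(og_test_demo, test_demo)
)
```

The inverse transform returns floats, which is why tenure displays as 70.0 rather than 70 in the output above.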


In [15]:
X_train, X_test, y_train, y_test = split_scale.split_my_data(X, y, .8)

In [16]:
scaler, train, test = split_scale.min_max_scaler(y_train, y_test)

ValueError: Expected 2D array, got 1D array instead:
array=[1502.25 7567.2  7049.75 1225.65  587.4  4542.35 2337.45 1042.65 7774.05
 2768.65 3623.95  106.9  6944.5  1728.2  1849.2   428.45 1035.5  2965.75
  197.4  1192.7  5717.85 3418.2   269.65 6435.25 4386.2  3846.75 6735.05
 2424.5   487.95 4908.25 3496.3  1345.85 3238.4   784.25 7856.   5468.45
 1790.8  3046.4  3983.6  6506.15  788.55 5744.35 1520.1   935.9   498.1
  305.55 1732.95 1759.55 1090.1   599.25  486.05 8078.1  7082.85 1914.5
  689.35 3512.9   613.95 1272.05 6441.85 5186.   6194.1  5776.45 5956.85
 3822.45 6223.8  3517.9  5930.05 4025.5  1436.95 7365.3  6697.2  1680.25
 2553.7   521.   4664.15 6365.35 2627.35 5487.     96.85 1319.95 7998.8
 6925.9  2554.   7015.9  5602.25 6733.   6253.   1830.05 1048.45  692.55
 1414.65  808.95 4230.25 1971.15 5625.55 7251.7  5324.5  7188.5  3486.65
 4122.65 1458.1  1389.35 6895.5   967.85  470.   5760.65 7895.15 8312.75
 8041.65 1851.45 4566.5   777.35 1415.85 7853.7  4250.1  7581.5  1873.7
  515.75 1052.35 6989.7  5607.75 1245.05 1688.9  4720.   1335.2  5950.2
 1782.05  609.1  4059.85 3166.9  1108.2  1802.55 4735.2  1672.15 5502.55
 7382.25 3423.5   602.9  1714.95 1031.7   890.5  1170.55 5550.1  5711.05
 3377.8  7987.6  7634.25 6549.45 2302.35  873.4  5999.85 1379.8   847.25
 2054.4  1750.7   369.1   373.5  1709.1  1209.25 4860.85 5705.05 6172.
 2181.75 1790.15  243.65 2879.9  6342.7  1051.9  6368.2  1024.   4759.55
 2779.5  7854.9  1710.15 6309.65 1234.6  2933.95 7576.7  6668.   4546.
 5861.75  789.55 5375.15 4895.1  1859.2  7839.85 6333.4  6501.35 2390.45
 1871.85 4479.2   494.05  630.6   124.45 1183.2   826.   5632.55 4284.2
 1161.75  520.55 7690.9  6227.5  4995.35 1401.15 2193.2  1988.05 5484.4
  224.5  3593.8  8182.85 4304.5  2345.55 4845.4  7334.05 3379.25 5963.95
  571.75 7348.8  6457.15 1022.6  5682.25 4575.35 1638.7   419.4  2006.95
 1618.2  7337.55 3027.65 1776.   8404.9  6994.6   125.   2398.4  2316.85
 1809.35 6841.3    68.8  7869.05 8065.65 6376.55 1255.1  5883.85 8399.15
 1738.9  1725.4  6998.95  553.   5265.2   943.    388.6  1734.65 6042.7
 7842.3  1412.65 5265.55 4798.4  3794.5  1389.6  3297.   7658.3  4911.35
 6474.45 1493.1   272.35 7679.65 7843.55 6563.4  6891.4  1023.95 4652.4
  521.8  4084.35 1344.5  6301.7  4868.4   264.55 2369.3  3533.6  3921.1
 4590.35 3914.05 4707.85 1298.7  1842.7  4428.45 1028.75 3948.45 1210.3
 3533.6   265.3  4415.75  116.95 6605.55 5435.   8240.85 4317.35  905.55
 1492.1  6141.65  223.15 6404.   7542.25 7737.55  978.   3139.8  5730.15
 8405.    514.75 1482.3  7993.3  4390.25 6333.8  6859.05  509.3   552.1
 7149.35 1715.15 6640.7  2339.3  8100.55 3177.25 2754.   1240.15 5129.45
 3370.2  6465.   6707.15 6144.55 7455.45  210.65 1654.6  7261.25 5728.55
  611.45 1787.35 3084.9   684.4 ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
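The error occurs because y_train and y_test are pandas Series (1D), while scikit-learn scalers expect 2D input. One way around it, sketched here under the assumption that a single-column DataFrame is acceptable downstream, is to convert the Series with `to_frame()` before scaling. The values below are the first four from the error message; the index labels and the `*_demo` names are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# First four values from the error output; index labels are made up
y_demo = pd.Series(
    [1502.25, 7567.20, 7049.75, 1225.65],
    index=[10, 11, 12, 13],
    name="total_charges",
)

# A Series is 1D; to_frame() gives a one-column DataFrame with the same index
y_frame_demo = y_demo.to_frame()

scaler_demo = MinMaxScaler().fit(y_frame_demo)
y_scaled_demo = pd.DataFrame(
    scaler_demo.transform(y_frame_demo),
    columns=y_frame_demo.columns,
    index=y_frame_demo.index,
)

print(y_scaled_demo.shape)
```

Using `to_frame()` rather than `y_demo.values.reshape(-1, 1)` keeps the index and the column name, so the result still lines up with the original rows.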