# Demo of the package
This notebook shows how to use the classes in the package to predict outputs and train models.

In [1]:
import pandas as pd
from MLhousingPrices import preprocessor, model

## Predict
This code loads the pretrained preprocessor and model used by the FastAPI app.

In [2]:
pp = preprocessor.load_preprocessor(filename='preprocessor_02.pkl')
X_test = pp.X_valid # replace this with your data to predict
pp.preprocess_test(X_test)
pp.X_test_pp.head()

| | num__longitude | num__median_income | num__latitude | cat__ocean_proximity__1H OCEAN | cat__ocean_proximity_INLAND | cat__ocean_proximity_ISLAND | cat__ocean_proximity_NEAR BAY | cat__ocean_proximity_NEAR OCEAN |
|---|---|---|---|---|---|---|---|---|
| 14740 | 1.253252 | 0.144489 | -1.425135 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 10101 | 0.794442 | 0.998204 | -0.797937 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 20566 | -1.135549 | 0.247755 | 1.415977 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2670 | 1.976375 | -0.747459 | -1.134939 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 15709 | -1.429785 | 0.591906 | 1.013447 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |


In [3]:
reg = model.load_model('xgb_tuned.pkl')
y_pred = reg.predict(pp.X_test_pp)
y_pred

array([148395.3 , 264409.84, 184801.06, ..., 178035.17, 232307.38,
       180722.17], dtype=float32)
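
To predict on your own data instead of pp.X_valid, you need a DataFrame with the raw input columns. The sketch below builds one by hand. It assumes the pretrained preprocessor was fitted with subset=True and therefore expects the four raw columns named later in this notebook; the rows themselves are made-up illustrative values, and X_new is a hypothetical name.

```python
import pandas as pd

# Hypothetical rows; the column names follow the subset listed in this notebook.
X_new = pd.DataFrame(
    {
        "longitude": [-122.23, -118.30],
        "median_income": [8.3252, 3.5000],
        "latitude": [37.88, 34.05],
        "ocean_proximity": ["NEAR BAY", "<1H OCEAN"],
    }
)

# Inspect the frame before handing it to the preprocessor.
print(X_new.dtypes)

# Assumed usage, mirroring the cells above (not run here):
# pp.preprocess_test(X_new)
# y_pred = reg.predict(pp.X_test_pp)
```

If your preprocessor was trained with subset=False or cluster=True, the expected columns will likely differ.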

## Train
This code shows how to use the classes to train the preprocessor and a chosen model, including hyperparameter tuning.  
Replace filepath with the path to your data.  
Set cluster to True to generate clusters from the latitude and longitude data and use them as a feature.  
Set subset to False to keep all columns instead of only 'longitude', 'median_income', 'latitude', 'ocean_proximity' (and the cluster, if selected).
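
As a rough illustration of what a location-based cluster feature can look like, the sketch below assigns a cluster label to each row with scikit-learn's KMeans. This is an assumption for illustration only: the notebook does not say which clustering algorithm or how many clusters the package uses, and the coordinates here are toy values.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy coordinates standing in for the housing data (illustrative values).
df = pd.DataFrame(
    {
        "latitude": [37.88, 37.86, 34.05, 34.10, 32.72, 32.75],
        "longitude": [-122.23, -122.22, -118.30, -118.25, -117.16, -117.20],
    }
)

# n_clusters=3 is an arbitrary choice for this sketch, not the package's setting.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

# Each row gets an integer label that can be used as an extra feature.
df["cluster"] = kmeans.fit_predict(df[["latitude", "longitude"]])
print(df)
```

Nearby points end up with the same label, so the model can pick up neighbourhood effects that raw latitude and longitude only express indirectly.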

In [3]:
filepath = "data/housing.csv"
data = pd.read_csv(filepath)
pp = preprocessor.Preprocessor(data, cluster=False, subset=True) 
pp.split().preprocess_train()
pp.X_train_pp.head()

| | num__longitude | num__median_income | num__latitude | cat__ocean_proximity__1H OCEAN | cat__ocean_proximity_INLAND | cat__ocean_proximity_ISLAND | cat__ocean_proximity_NEAR BAY | cat__ocean_proximity_NEAR OCEAN |
|---|---|---|---|---|---|---|---|---|
| 12069 | 1.003899 | 0.190012 | -0.840062 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 15925 | -1.434772 | 0.269311 | 0.985364 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 11162 | 0.779481 | 0.029895 | -0.840062 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4904 | 0.649818 | -1.26447 | -0.755812 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4683 | 0.599947 | -0.367016 | -0.723048 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [7]:
pp.save('03')

Choose the type of model to train (with hyperparameter tuning) by setting the modelType.  
Accepted values for modelType are 'xgb', 'rfr' and 'knn'.

In [4]:
mt = model.ModelTrainer(modelType='knn')
mt.train(X_train_pp=pp.X_train_pp, y_train=pp.y_train)

<MLhousingPrices.model.ModelTrainer at 0x2a2b4a25eb0>

In [5]:
mt.model.best_score_ # r2

np.float64(0.7666677612418966)

In [6]:
mt.model.best_params_

{'n_neighbors': 10, 'weights': 'distance'}
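
The best_score_ and best_params_ attributes above match those exposed by scikit-learn's search objects. Assuming mt.model wraps something like GridSearchCV (the notebook does not confirm this), the tuning step for 'knn' might resemble the sketch below. The synthetic data, the parameter grid and cv=5 are all illustrative choices, not the package's settings.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data standing in for pp.X_train_pp and pp.y_train.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

# Hypothetical grid over the two parameters reported in best_params_ above.
param_grid = {"n_neighbors": [3, 5, 10], "weights": ["uniform", "distance"]}

# Cross-validated grid search scored by R^2.
search = GridSearchCV(KNeighborsRegressor(), param_grid, scoring="r2", cv=5)
search.fit(X, y)

print(search.best_params_)  # best parameter combination found
print(search.best_score_)   # mean cross-validated R^2 of that combination
```

Under this reading, best_score_ is the mean cross-validated score of the best parameter combination, not a score on held-out test data.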

In [None]:
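mt.save()  # call the method to save the trained model; pass a name if it expects one, as with pp.save('03')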