# Model Inference House Prices

Name: Stanly

Batch: RMT-036  

Objective: Test the machine learning model created in `P1G4_stanly.ipynb`

# Import all libraries

In [318]:
import pandas as pd
import numpy as np
import pickle
import json

# Load the model, preprocessors, and column lists

In [319]:
# Trained model saved by the training notebook
with open('model_linear.pkl', 'rb') as model_file:
    model_linear = pickle.load(model_file)

# Fitted preprocessing objects: scaler and encoders
with open('model_scaler.pkl', 'rb') as scaler_file:
    scaler = pickle.load(scaler_file)

with open('one_hot_encoder.pkl', 'rb') as one_hot_encoder_file:
    one_hot_encoder = pickle.load(one_hot_encoder_file)

with open('ordinal_encoder.pkl', 'rb') as ordinal_encoder_file:
    ordinal_encoder = pickle.load(ordinal_encoder_file)

# Column name lists, stored as JSON text
with open('num_col.txt', 'r') as num_col_file:
    num_columns = json.load(num_col_file)

with open('cat_col.txt', 'r') as cat_col_file:
    category_columns = json.load(cat_col_file)

with open('ordinal_cat_col.txt', 'r') as ordinal_cat_col_file:
    ordinal_category_columns = json.load(ordinal_cat_col_file)
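
The column lists are read with `json.load`, which suggests the training notebook wrote them as JSON text. The following is a minimal round-trip sketch under that assumption; the column names and the temporary path are illustrative and are not taken from the real `num_col.txt`.

```python
import json
import os
import tempfile

# Illustrative column names; the real lists come from the training notebook.
example_num_columns = ["land_area", "building_area", "bedrooms"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "num_col.txt")

    # Save the list as JSON text (the assumed counterpart of the load step).
    with open(path, "w") as f:
        json.dump(example_num_columns, f)

    # Load it back the same way the inference notebook does.
    with open(path, "r") as f:
        restored_columns = json.load(f)

print(restored_columns)
```

Storing the lists next to the pickled objects keeps the column selection at inference time identical to the one used during training.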

# x. Model Inference

In [320]:
house_inf = {
  'area': 'BSD City',
  'city': 'Tangerang',
  'latitude': -6.3007333,
  'longitude': 106.586126,
  'property_type': 'rumah',
  'bedrooms': 3,
  'bathrooms': 2,
  'land_area': 181,
  'building_area': 182,
  'floors': 2,
  'maid_bedrooms': 1,
  'maid_bathrooms': 1,
  'certificate': 'shm - sertifikat hak milik',
  'voltage': 5500,
  'voltage_category': 'R-2',
  'building_age': 1,
  'year': 2021,
  'condition': 'baru',
  'garage': 1,
  'carport': 1
}

house_inf = pd.DataFrame([house_inf])

In [321]:
house_inf

| | area | city | latitude | longitude | property_type | bedrooms | bathrooms | land_area | building_area | floors | maid_bedrooms | maid_bathrooms | certificate | voltage | voltage_category | building_age | year | condition | garage | carport |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BSD City | Tangerang | -6.300733 | 106.586126 | rumah | 3 | 2 | 181 | 182 | 2 | 1 | 1 | shm - sertifikat hak milik | 5500 | R-2 | 1 | 2021 | baru | 1 | 1 |


In [322]:
house_inf_num = house_inf[num_columns]
house_inf_cat = house_inf[category_columns]
house_inf_ordinal_cat = house_inf[ordinal_category_columns]
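
Selecting columns by the saved lists raises a `KeyError` if the inference record lacks any expected column. A small check before the split can make that failure easier to read. This is only a sketch: `find_missing_columns` is a hypothetical helper, and the column lists below are stand-ins for the real contents of the saved files, which are not shown in this notebook.

```python
def find_missing_columns(record, *column_groups):
    """Return the expected column names that are absent from the record."""
    missing = []
    for group in column_groups:
        for column in group:
            if column not in record and column not in missing:
                missing.append(column)
    return missing


# Hypothetical column lists standing in for the loaded JSON files.
example_num = ["land_area", "building_area"]
example_cat = ["city", "certificate"]
example_ordinal = ["condition"]

# A record that forgot the 'certificate' field.
record = {
    "land_area": 181,
    "building_area": 182,
    "city": "Tangerang",
    "condition": "baru",
}

missing = find_missing_columns(record, example_num, example_cat, example_ordinal)
print(missing)  # ['certificate']
```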

### Scale and encode the features, then concatenate them into a single array

In [323]:
house_inf_num_scaled = scaler.transform(house_inf_num)
house_inf_cat_encoded = one_hot_encoder.transform(house_inf_cat)
house_inf_ordinal_cat_encoded = ordinal_encoder.transform(house_inf_ordinal_cat)
house_inf_final = np.concatenate([house_inf_cat_encoded, house_inf_ordinal_cat_encoded, house_inf_num_scaled], axis=1)
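
The conclusion of this notebook states that the numerical columns were scaled with MinMaxScaler. At inference time a fitted min-max scaler reuses the minimum and maximum learned from the training data, so a new value outside that range maps outside [0, 1]. The sketch below shows the arithmetic for a single column in plain Python; the training range is a made-up assumption, not the range stored in `model_scaler.pkl`.

```python
def min_max_transform(value, train_min, train_max):
    """Scale one value using the minimum and maximum seen during training."""
    return (value - train_min) / (train_max - train_min)


# Hypothetical training range for a column such as land_area.
train_min, train_max = 50.0, 450.0

# A value inside the training range lands between 0 and 1.
inside = min_max_transform(181, train_min, train_max)

# A value above the training maximum lands above 1.
outside = min_max_transform(650, train_min, train_max)

print(inside, outside)
```

Out-of-range inputs are one reason a linear model can extrapolate to extreme predictions, so comparing new records against the training range can be a useful check.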

In [324]:
result = model_linear.predict(house_inf_final)

print(f"Predicted price of a house in {house_inf['city'][0]} with a building area of {house_inf['building_area'][0]} m² is {result[0]}")

Predicted price of a house in Tangerang with a building area of 182 m² is 5.920352455444244e+22
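
A prediction on the order of 1e+22 is far outside any realistic house price, so it can help to compare each prediction against a plausible range before reporting it. The following is a minimal sketch; `is_plausible_price` is a hypothetical helper, and the bounds are placeholder assumptions that would normally come from the range of prices in the training data.

```python
def is_plausible_price(prediction, lower_bound, upper_bound):
    """Return True when the prediction falls inside the expected price range."""
    return lower_bound <= prediction <= upper_bound


# Placeholder bounds (assumptions); in practice, derive them from the
# minimum and maximum target values seen during training.
lower_bound = 1e8
upper_bound = 1e11

prediction = 5.920352455444244e+22  # value printed by the cell above

plausible = is_plausible_price(prediction, lower_bound, upper_bound)
print(plausible)  # False
```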




# xi. Conclusion

In conclusion, we built a machine learning model that predicts the price of a house from factors such as its location (city), its features (number of bedrooms and bathrooms), its condition (new, good, very good, etc.), the year it was built, its certificate, its land area, and its maid facilities (number of maid bedrooms and maid bathrooms).

For preprocessing, we used Winsorizer with the `gaussian` method to cap the columns that contain outliers. The outliers are capped rather than trimmed, as there are many of them. Missing values in the `bedrooms`, `bathrooms`, and `floors` columns are imputed with the median, because these columns are skewed. The dataset has many numerical columns, several categorical columns, and one ordinal categorical column (`condition`). Numerical columns are scaled with MinMaxScaler because their values span different orders of magnitude, the ordinal column is encoded with OrdinalEncoder, and the remaining categorical columns are encoded with OneHotEncoder.

With these steps in place, the saved model can produce a price prediction for a new house. The sample prediction in this notebook, about 5.92e+22, is far too large to be a realistic house price, so the inference inputs and pipeline should be reviewed before the result is used.
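
To illustrate the capping step, the sketch below applies a Gaussian-style cap in plain Python: values are clipped to the mean plus or minus a multiple of the standard deviation. It assumes a fold of 3 and the sample standard deviation; the actual Winsorizer settings are defined in the training notebook and are not shown here, and `gaussian_cap` is a hypothetical helper rather than part of any library.

```python
import statistics


def gaussian_cap(values, fold=3.0):
    """Clip values to mean +/- fold * sample standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    lower = mean - fold * std
    upper = mean + fold * std
    return [min(max(v, lower), upper) for v in values]


# Twenty typical values and one extreme outlier (made-up data).
values = [100.0] * 20 + [10000.0]

capped = gaussian_cap(values)

# The outlier is pulled down to the upper limit; typical values are unchanged.
print(capped[0], capped[-1])
```

Capping keeps every row in the dataset, which matches the stated reason for not trimming.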

In [326]:
float(5.920352455444244 * 10 ** 22)

5.9203524554442446e+22