# Data Mining

---


For our dataset, predicting the rent of a property from its features is a regression problem. A simple linear regression is a natural baseline, but rather than fitting a single model by hand, we'll use LazyPredict to fit and compare many regressors at once. To evaluate the models fairly, we'll split the data into training and testing sets: each model is trained on one subset and scored on a second subset it has never seen.
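Here is a minimal sketch of the held-out evaluation described above, using a single linear-regression baseline. It runs on made-up synthetic data, where rent grows linearly with size plus noise, not on the House Rent dataset, so its numbers say nothing about the real data. `LinearRegression`, `train_test_split` and `r2_score` are standard scikit-learn functions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: rent rises linearly with size, plus noise
rng = np.random.default_rng(0)
size = rng.uniform(300, 2000, size=500)              # invented sizes
rent = 15 * size + 5000 + rng.normal(0, 2000, 500)   # invented relationship
X = size.reshape(-1, 1)                              # scikit-learn expects a 2-D feature matrix

# Hold out 20% of the rows; the model never sees them while fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, rent, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
test_r2 = r2_score(y_test, model.predict(X_test))
print(f"coefficient: {model.coef_[0]:.2f}, test R^2: {test_r2:.3f}")
```

Scoring on `X_test` instead of `X_train` is the whole point: a model can fit its training rows well and still generalise poorly.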

Let's start by preparing our data. We'll one-hot encode the categorical variables, separate the features (X) from the target (y, the "Rent" column), and then split the rows into training and testing sets.

**Preparing the Data**
1. One-hot encode categorical variables.
2. Split the data into features and target.
3. Split the data into training and testing sets.
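The three steps can be sketched on a tiny, hypothetical DataFrame. The values and `City` labels are invented for illustration. The example shows what `drop_first=True` does: a categorical column with k distinct values becomes k - 1 indicator columns, and the dropped level is implied when all of them are zero.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame; values are made up for illustration
df = pd.DataFrame({
    "Size": [500, 800, 1200, 650, 900, 1500, 700, 1100, 950, 400],
    "City": ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
    "Rent": [8000, 12000, 20000, 9000, 13000, 26000, 10000, 18000, 15000, 7000],
})

# 1. One-hot encode; drop_first=True keeps k - 1 dummy columns per categorical
df_encoded = pd.get_dummies(df, drop_first=True)

# 2. Separate features (X) from the target (y)
X = df_encoded.drop("Rent", axis=1)
y = df_encoded["Rent"]

# 3. 80/20 train/test split, reproducible through random_state
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(list(X.columns))            # ['Size', 'City_B', 'City_C']
print(len(X_train), len(X_test))  # 8 2
```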

```python
import warnings

import pandas as pd
from google.colab import files

warnings.filterwarnings('ignore', category=DeprecationWarning)

# Opens a file picker in Colab; choose House_Rent_Dataset.csv
uploaded = files.upload()

# Load the uploaded CSV into a DataFrame
data = pd.read_csv('House_Rent_Dataset.csv')
```

```
Saving House_Rent_Dataset.csv to House_Rent_Dataset.csv
```


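```python
import pandas as pd
from sklearn.model_selection import train_test_split

# One-hot encode the categorical (text) columns, dropping the first level of each
df_encoded = pd.get_dummies(data, drop_first=True)

# Separate the features (X) from the target (y)
X = df_encoded.drop("Rent", axis=1)
y = df_encoded["Rent"]

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```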
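`pd.get_dummies` adds one column per distinct value of every text column, so it is worth checking `X.shape` after encoding. The sketch below uses a hypothetical frame with a 600-valued text column to show how quickly the feature count grows. That column stands in for something like a locality name or a date string. Whether the real dataset has such columns would need to be checked, for example with `data.nunique()`.

```python
import pandas as pd

# Hypothetical frame: one low-cardinality and one high-cardinality text column
n_rows = 1000
df = pd.DataFrame({
    "Size": range(n_rows),
    "City": ["A", "B", "C", "D"] * (n_rows // 4),           # 4 distinct values
    "Locality": [f"loc_{i % 600}" for i in range(n_rows)],  # 600 distinct values
})

encoded = pd.get_dummies(df, drop_first=True)

# Each categorical column with k distinct values contributes k - 1 dummy columns
expected = 1 + (4 - 1) + (600 - 1)
print(encoded.shape[1], expected)  # 603 603
```

When the number of columns approaches the number of rows, many models struggle, and some metrics stop behaving as expected.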

```
!pip install lazypredict
```

```
Collecting lazypredict
  Downloading lazypredict-0.2.12-py2.py3-none-any.whl (12 kB)
Installing collected packages: lazypredict
Successfully installed lazypredict-0.2.12
```

```python
from lazypredict.Supervised import LazyRegressor

# Fit and score every regressor LazyPredict includes;
# predictions=True also returns each model's test-set predictions
reg = LazyRegressor(predictions=True)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
```

```
100%|██████████| 42/42 [00:14<00:00,  2.88it/s]

You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 265
[LightGBM] [Info] Number of data points in the train set: 3796, number of used features: 3
[LightGBM] [Info] Start training from score 35151.516333
```
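Conceptually, a tool like `LazyRegressor` loops over a list of regressors, fits each one on the training set, and scores it on the test set. The sketch below imitates that loop for four hand-picked scikit-learn models on synthetic data. It is a simplified illustration, not LazyPredict's actual implementation, which covers far more models and handles details this sketch omits.

```python
import time

import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the rent data
X, y = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A small, hypothetical selection of regressors with default hyperparameters
candidates = {
    "DummyRegressor": DummyRegressor(),
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
}

rows = []
for name, model in candidates.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rows.append({
        "Model": name,
        "R-Squared": r2_score(y_test, pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, pred)),
        "Time Taken": time.perf_counter() - start,  # fit + predict, in seconds
    })

# One row per model, best R-Squared first
scores = pd.DataFrame(rows).set_index("Model").sort_values("R-Squared", ascending=False)
print(scores)
```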


# Evaluation

---



The `models` table reports Adjusted R-Squared, R-Squared, RMSE and Time Taken for each regressor. Every model is fitted and scored on the same split, so the table gives a quick first indication of which algorithms perform well on this dataset without much manual effort.
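R-Squared compares a model's squared errors with those of always predicting the mean. RMSE is the typical prediction error, in the same units as the target. The sketch below computes both by hand on four invented values and checks them against scikit-learn. It also shows why a constant mean prediction scores exactly 0, which is why `DummyRegressor` (which predicts the training mean by default) sits near 0 in the table, and why a negative R-Squared means doing worse than that.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Four invented rents and predictions, for illustration only
y_true = np.array([10000.0, 15000.0, 20000.0, 25000.0])
y_pred = np.array([12000.0, 14000.0, 21000.0, 23000.0])

# R-Squared = 1 - SS_res / SS_tot, where SS_tot is the error of always predicting the mean
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# RMSE = square root of the mean squared error, in the same units as the target
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

print(f"R^2 = {r2:.2f}, RMSE = {rmse:.2f}")  # R^2 = 0.92, RMSE = 1581.14

# The hand-computed values agree with scikit-learn's
assert np.isclose(r2, r2_score(y_true, y_pred))
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))

# Predicting the mean everywhere scores exactly 0
mean_pred = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, mean_pred))  # 0.0
```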

```python
print(models)
```

```
                               Adjusted R-Squared  R-Squared       RMSE  \
Model                                                                     
GaussianProcessRegressor                  4128.77   -8089.26 5678271.37   
LinearSVR                                    1.63      -0.23   70095.16   
MLPRegressor                                 1.61      -0.19   68903.19   
SVR                                          1.55      -0.08   65665.27   
NuSVR                                        1.53      -0.04   64297.38   
DummyRegressor                               1.51      -0.00   63134.82   
RANSACRegressor                              1.50       0.02   62407.40   
AdaBoostRegressor                            1.49       0.03   62066.55   
ElasticNetCV                                 1.49       0.04   61816.65   
KernelRidge                                  1.48       0.06   61285.33   
ExtraTreeRegressor                           1.45       0.11   59425.17   
DecisionTreeRegressor    
```
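Several Adjusted R-Squared values above are greater than 1, and the largest belongs to the model with the worst R-Squared. Assume LazyPredict uses the standard definition, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of test rows and p the number of features. Under that assumption, this pattern appears exactly when p > n - 1: the factor (n - 1)/(n - p - 1) turns negative, so a worse R² yields a larger adjusted value. For every model with values in the table, (1 - Adjusted R-Squared)/(1 - R-Squared) is roughly -0.5. That is consistent with having more feature columns than test rows, plausibly as a result of one-hot encoding. If so, the Adjusted R-Squared column is not meaningful here, and R-Squared and RMSE are the better guides. The hypothetical numbers below illustrate the effect.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Standard adjusted R^2 for n evaluation rows and p features."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical numbers: same R^2, very different feature counts
few_features = adjusted_r2(0.10, n=1000, p=10)     # p much smaller than n
many_features = adjusted_r2(0.10, n=1000, p=3000)  # p larger than n - 1

print(round(few_features, 4))   # 0.0909
print(round(many_features, 4))  # 1.4493
```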

# Conclusion

---


LazyPredict is a convenient tool for fitting and comparing many models on the same train/test split with very little code.

It is particularly useful early in a project, when you want a sense of which algorithms might suit your data. For a production model or a detailed analysis, however, you would take the top-performing models it identifies and go further with feature engineering, hyperparameter tuning, and other refinements.
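As one example of the deeper tuning mentioned above, the sketch below runs scikit-learn's `GridSearchCV` over a small grid for a `RandomForestRegressor`. The model, the grid values and the synthetic data are hypothetical choices for illustration. In practice you would substitute one of the top models from the LazyPredict table and the notebook's `X_train`/`y_train`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; in the notebook this would be X_train / y_train
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hypothetical model and grid, for illustration only
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,                                   # 5-fold cross-validation on the training set
    scoring="neg_root_mean_squared_error",  # higher is better, so RMSE is negated
)
search.fit(X_train, y_train)

# best_estimator_ is refit on all of X_train; .score() on a regressor returns R^2
test_r2 = search.best_estimator_.score(X_test, y_test)
print(search.best_params_)
print(f"held-out R^2: {test_r2:.3f}")
```

The search only ever sees the training rows, so the held-out test set stays untouched until the final check.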