# **Fake Real Estate Advertisement Detection with AutoGluon**

### **1. Install AutoGluon**

In [None]:
!pip install autogluon

### **2. Load and inspect the training data**

We create a **TabularDataset** object from the URL of a CSV file and store it in the **train_data** variable, which now holds our training dataset.

In [None]:
from autogluon.tabular import TabularDataset

train_data_url = 'https://nvcuong.github.io/files/autogluon_train.csv'
train_data = TabularDataset(train_data_url)

We can look at the first 5 rows in the data.

In [None]:
train_data.head()

Next, we check the size of the data. The dataset contains 23268 rows and 20 columns.

In [None]:
print(train_data.shape)

Now we list all the column names to see which features we have.

Note that the column **label** (last column) is what we need to predict (1 = fake, 0 = real) from the other columns.

In [None]:
print(train_data.columns)
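
Before training a classifier, it can help to check how the two classes are distributed, since fake advertisements may be much rarer than real ones. The sketch below uses only the standard library and a small, made-up list of labels (not values from the dataset); the helper name `class_balance` is hypothetical. In the notebook itself, `train_data['label'].value_counts(normalize=True)` should give the same information, assuming `train_data` behaves like a pandas DataFrame.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of examples that fall in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical toy labels (1 = fake, 0 = real); not taken from the dataset.
toy_labels = [0, 0, 0, 1, 0, 1, 0, 0]
print(class_balance(toy_labels))
```

A strongly skewed balance is a hint that accuracy alone may not tell the whole story later on.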

### **3. Train the predictor**

We can train a predictor using the **TabularPredictor** class. We specify the label column that we want to predict, and then call **fit(...)** with the training data.

Here we set the time limit to 300 seconds (5 minutes) for demo purposes. Removing this limit allows training to run longer, potentially with better results.

In [None]:
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(label='label').fit(train_data, time_limit=300)

### **4. Evaluate the predictor on test data**

First, we load the test data the same way we loaded the training data.

In [None]:
test_data_url = 'https://nvcuong.github.io/files/autogluon_test.csv'

test_data = TabularDataset(test_data_url)

We also look at the first 5 rows of the test data.

In [None]:
test_data.head()

We also check the size of the test data. This dataset contains 5817 rows with the same 20 columns.

In [None]:
print(test_data.shape)
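
The claim that both datasets share the same columns can be checked rather than assumed. The following is a minimal sketch with a hypothetical helper `compare_columns` and made-up column names; in the notebook one would presumably pass `list(train_data.columns)` and `list(test_data.columns)` instead.

```python
def compare_columns(train_columns, test_columns):
    """Return (columns missing from test, columns only present in test)."""
    train_set, test_set = set(train_columns), set(test_columns)
    missing = [c for c in train_columns if c not in test_set]
    extra = [c for c in test_columns if c not in train_set]
    return missing, extra


# Hypothetical column names, for illustration only.
train_cols = ["price", "area", "label"]
test_cols = ["price", "area", "label"]
missing, extra = compare_columns(train_cols, test_cols)
print("missing from test:", missing)
print("only in test:", extra)
```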

We call the **evaluate(...)** method of the predictor to compute the evaluation metrics on the test data.

The results show that the accuracy of the model is around 94%.

In [None]:
predictor.evaluate(test_data)
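
To make the reported metrics concrete, here is a small sketch of how accuracy, precision, and recall are defined for a binary task like this one (1 = fake, 0 = real). It is a plain-Python illustration with made-up labels and a hypothetical helper `binary_metrics`, not AutoGluon's own implementation.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical example: 9 real ads, 1 fake ad, and a model that
# always predicts "real". Accuracy looks high, but no fake ad is caught.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(binary_metrics(y_true, y_pred))
```

The toy example shows why a high accuracy should be read together with precision and recall when one class is rare.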

We can look at all the individual models inside the predictor and their performance on the test data.

In [None]:
predictor.leaderboard(test_data)

### **5. Interpret the model**

Finally, we can compute and look at the feature importance.

Here we only use 1000 random rows of the test data for faster computation.

In [None]:
predictor.feature_importance(test_data, subsample_size=1000)
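
AutoGluon documents its feature importance as permutation-based: a feature's values are shuffled and the resulting drop in performance is measured. The sketch below illustrates that idea in plain Python with a toy model and random data. The names `permutation_importance` and `toy_model` are hypothetical, and this is a simplified illustration rather than AutoGluon's actual implementation.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    correct = sum(1 for r, y in zip(rows, labels) if model(r) == y)
    return correct / len(labels)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        # Shuffle only column j, leaving every other feature untouched.
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(baseline - accuracy(model, shuffled, labels))
    return importances


def toy_model(row):
    # Hypothetical model that looks only at feature 0 and ignores feature 1.
    return 1 if row[0] > 0.5 else 0


data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, labels, n_features=2)
print(scores)
```

Shuffling the feature the toy model relies on lowers its accuracy, while shuffling the ignored feature changes nothing, so its importance is zero.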