# Model for inferring monthly rent prices in Lithuania

## Introduction
I'm interested in investing in real estate in Lithuania, and this project collects part of
the data needed to estimate the viability of that venture.

I have scraped monthly rent prices for flats from aruodas.lt, a real estate listings site.
I will also need to collect data on real estate purchase prices. Together, the two datasets
would give a ballpark estimate of whether buying real estate to rent it out is profitable,
and in which regions.
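
As a rough illustration of the ballpark estimate mentioned above, one common first-pass figure is the gross rental yield: yearly rent divided by purchase price. The sketch below is only an assumption about how such an estimate could look; the function name and the example numbers are hypothetical, and it ignores taxes, vacancy, and maintenance.

```python
def gross_rental_yield(monthly_rent, purchase_price):
    """Yearly rent as a fraction of the purchase price (before any costs)."""
    if purchase_price <= 0:
        raise ValueError("purchase_price must be positive")
    return 12 * monthly_rent / purchase_price


# Hypothetical flat: 400 per month in rent, bought for 80 000.
example_yield = gross_rental_yield(monthly_rent=400, purchase_price=80_000)
print(f"Gross yield: {example_yield:.1%}")
```

Comparing this figure across districts would be one way to see where renting out looks more attractive, once purchase prices are collected.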

## Workflow from A to Z

### Gathering data

In [1]:
# !pip install git+https://github.com/mutusfa/scrape_aruodas

In [12]:
import logging

logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(logging.WARNING)

In [13]:
import pandas as pd
from pathlib import Path

from scrape_aruodas.main import scrape

PROJECT_DIR = Path.cwd()

%load_ext lab_black



In [14]:
# raw = scrape(num_items=2000)
# raw.to_csv(str(PROJECT_DIR / "data/raw/rent.csv"))

### Cleaning the data

In [15]:
import src.data.make_dataset

# The result is also saved to data/intermediate/rent.csv
intermediate = src.data.make_dataset.make_intermediate()

The notebook src/data/select_final.ipynb contains a minimal exploratory data analysis, which is where I cut off outliers.
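
The exact cut-off rule lives in that notebook and is not reproduced here. As a hedged illustration of what cutting off outliers can look like, the sketch below applies the common 1.5 × IQR rule to a list of rents. The rule, the helper name `iqr_bounds`, and the sample values are all assumptions made for this example, not the notebook's actual logic.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) limits that lie k interquartile ranges outside Q1 and Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


# Hypothetical monthly rents; the last value is an obvious outlier.
rents = [300, 350, 360, 400, 420, 450, 5000]
low, high = iqr_bounds(rents)
kept = [rent for rent in rents if low <= rent <= high]
print(kept)
```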

In [16]:
final = pd.read_csv(str(PROJECT_DIR / "data/final/rent.csv"), index_col=0)
final.head()

| Unnamed: 0 | city | district | latitude | listing_url | longitude | street | floor_area_m2 | monthly_rent | number_of_rooms | floor | number_of_floors | build_year | building_type | heating_type | equipment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | vilniuje | snipiskese | 54.720888 | https://www.aruodas.lt/butu-nuoma-vilniuje-sni... | 25.278539 | juozo-balcikonio-g | 19.0 | 326.0 | 1.0 | 3.0 | 5.0 | 2020.0 | Mūrinis | Centrinis kolektorinis | Įrengtas |
| 2 | vilniuje | naujininkuose | 54.662883 | https://www.aruodas.lt/butu-nuoma-vilniuje-nau... | 25.27784 | telsiu-g | 42.0 | 399.0 | 3.0 | 2.0 | 4.0 | 2015.0 | Mūrinis | Geoterminis | Įrengtas |
| 3 | vilniuje | fabijoniskese | 54.742411 | https://www.aruodas.lt/butu-nuoma-vilniuje-fab... | 25.22911 | salomejos-neries-g | 50.0 | 360.0 | 2.0 | 11.0 | 12.0 | 2008.0 | Mūrinis | Kita | Įrengtas |
| 5 | vilniuje | senamiestyje | 54.681746 | https://www.aruodas.lt/butu-nuoma-vilniuje-sen... | 25.279369 | klaipedos-g | 105.0 | 1500.0 | 4.0 | 3.0 | 3.0 | 2013.0 | Mūrinis | Centrinis kolektorinis | Įrengtas |
| 6 | vilniuje | naujininkuose | 54.662866 | https://www.aruodas.lt/butu-nuoma-vilniuje-nau... | 25.277922 | telsiu-g | 42.0 | 350.0 | 1.0 | 1.0 | 4.0 | 2015.0 | Mūrinis | Geoterminis | Įrengtas |


### Modelling

In [17]:
import src.models

final = final.drop("listing_url", axis="columns")

model, scaler, encoder = src.models.main(final)

Score: 0.668780779673825
Mean absolute error: 128.82831971188514
Mean absolute percentage error: 0.295953694179332
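
To make the two error figures above easier to read: the mean absolute error is in the same units as `monthly_rent`, and a mean absolute percentage error of about 0.296 means predictions are off by roughly 30% of the true rent on average. The sketch below shows the standard definitions in plain Python. It assumes `src.models` uses these usual formulas, which the output alone does not confirm, and the sample values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted|, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average of |true - predicted| / |true|, as a fraction (0.3 means 30%)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted rents.
true_rents = [100.0, 200.0, 400.0]
predicted_rents = [110.0, 180.0, 400.0]

mae = mean_absolute_error(true_rents, predicted_rents)
mape = mean_absolute_percentage_error(true_rents, predicted_rents)
print(f"MAE: {mae:.2f}, MAPE: {mape:.1%}")
```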


Running src.models as a script saves the model, together with the scaler and encoder it used, to the ./models directory.

### Accessing model via api

#### Accessing the model's inference (POST interface)

https://jjuoda-ds-24.herokuapp.com/predict/

For example, using the requests library:

In [18]:
import numpy as np
import requests

features_for_inference = [
    {
        "city": "kaune",
        "district": "centre",
        "latitude": 54.889328,
        "longitude": 23.936227,
        "street": "tunelio-g",
        "floor_area_m2": 25.0,
        "number_of_rooms": 1.0,
        "floor": 1.0,
        "number_of_floors": 2.0,
        "build_year": 1939.0,
        "building_type": "Medinis",
        "heating_type": "Dujinis",
        "equipment": "Įrengtas",
    }
]
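url = "https://jjuoda-ds-24.herokuapp.com/predict/"
# The endpoint takes a JSON list of feature records.
response = requests.post(url, json=features_for_inference)
response.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error body
inferred = np.array(response.json())
inferred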

array([198.87162398])
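
Because the request body is a list of records with the fields shown above, a small client-side check can catch a missing key before anything is sent. The sketch below is an assumption-laden helper, not part of the API: `REQUIRED_FIELDS`, `missing_fields`, and `check_payload` are hypothetical names, and whether the server actually requires every one of these fields is not confirmed here (the auto-generated docs would be the place to check).

```python
# Field names copied from the example request; treating all of them as
# required is an assumption.
REQUIRED_FIELDS = {
    "city", "district", "latitude", "longitude", "street", "floor_area_m2",
    "number_of_rooms", "floor", "number_of_floors", "build_year",
    "building_type", "heating_type", "equipment",
}


def missing_fields(record):
    """Return the sorted names of expected fields absent from one record."""
    return sorted(REQUIRED_FIELDS - set(record))


def check_payload(records):
    """Raise ValueError on the first incomplete record, else return the list."""
    for position, record in enumerate(records):
        missing = missing_fields(record)
        if missing:
            raise ValueError(f"record {position} is missing: {', '.join(missing)}")
    return records


example = {
    "city": "kaune", "district": "centre", "latitude": 54.889328,
    "longitude": 23.936227, "street": "tunelio-g", "floor_area_m2": 25.0,
    "number_of_rooms": 1.0, "floor": 1.0, "number_of_floors": 2.0,
    "build_year": 1939.0, "building_type": "Medinis",
    "heating_type": "Dujinis", "equipment": "Įrengtas",
}
payload = check_payload([example])  # passes unchanged; ready for requests.post
```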

#### Accessing the most recent inferences (GET interface)

https://jjuoda-ds-24.herokuapp.com/inferences

In [19]:
url = "https://jjuoda-ds-24.herokuapp.com/inferences/"
response = requests.get(url)
last_inferences = np.array(response.json())
last_inferences

array([{'number_of_floors': 2.0, 'equipment': 'Įrengtas', 'number_of_rooms': 1.0, 'street': 'tunelio-g', 'longitude': 23.936227, 'district': 'centre', 'inferred_monthly_rent': 198.87162398292858, 'heating_type': 'Dujinis', 'build_year': 1939.0, 'floor': 1.0, 'floor_area_m2': 25.0, 'latitude': 54.889328, 'city': 'kaune', 'id': 85, 'building_type': 'Medinis'},
       {'number_of_floors': 2.0, 'equipment': 'Įrengtas', 'number_of_rooms': 1.0, 'street': 'tunelio-g', 'longitude': 23.936227, 'district': 'centre', 'inferred_monthly_rent': 198.87162398292858, 'heating_type': 'Dujinis', 'build_year': 1939.0, 'floor': 1.0, 'floor_area_m2': 25.0, 'latitude': 54.889328, 'city': 'kaune', 'id': 84, 'building_type': 'Medinis'},
       {'number_of_floors': 2.0, 'equipment': 'Įrengtas', 'number_of_rooms': 1.0, 'street': 'tunelio-g', 'longitude': 23.936227, 'district': 'centre', 'inferred_monthly_rent': 198.87162398292858, 'heating_type': 'Dujinis', 'build_year': 1939.0, 'floor': 1.0, 'floor_area_m2': 25

#### Checking the (auto-generated) documentation

https://jjuoda-ds-24.herokuapp.com/docs