# Machine Learning Project: ML_Real estate price predictions

## Predicting Real Estate Prices with Data Requested from the Idealista API

This project extracts property listings through the Idealista API, so that searches can be repeated monthly and used to build a model that estimates up-to-date property prices. The main objective is to analyze whether sale prices in **Asturias** represent favorable purchasing opportunities, so that potential buyers can make better-informed decisions in the real estate market.

The Idealista Search API lets you integrate property information published on idealista into your site or app. Access has to be requested at:

**https://developers.idealista.com/access-request**


### 0. Extracting data using the Idealista API, defining the functions, parameters, URL, and first visualization of the dataset

In [2]:
# Importing
import base64
import requests as rq
import json
import pandas as pd


In [27]:
# Defining the function to get the token
import os

def get_token():
    '''
    Request an OAuth access token from Idealista and return it.
    '''
    # Credentials are read from environment variables instead of being hard-coded in the notebook
    # (the variable names IDEALISTA_API_KEY and IDEALISTA_SECRET are an assumption)
    api_key = os.environ['IDEALISTA_API_KEY']             # API key provided by Idealista
    secret = os.environ['IDEALISTA_SECRET']               # Secret code provided by Idealista
    message = api_key + ":" + secret                      # Joining the API key and the secret with a colon
    auth = "Basic " + base64.b64encode(message.encode("ascii")).decode("ascii")   # Base64-encoding the message
    headers_1 = {"Authorization" : auth,
                   "Content-Type" : "application/x-www-form-urlencoded;charset=UTF-8"}   # Defining the headers
    data_1 = {"grant_type" : "client_credentials",      # Defining the request parameters
                  "scope" : "read"}
    r = rq.post("https://api.idealista.com/oauth/token",  # Posting headers and params to the token URL
                      headers = headers_1,
                      data = data_1)
    token = json.loads(r.text)['access_token']            # Parsing the JSON response and extracting the access token
    return token
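
The `Authorization` header built in `get_token` follows the HTTP Basic scheme: the key and the secret are joined with a colon and the result is Base64-encoded. Below is a minimal sketch of just that step. The helper name `build_basic_auth` and the credentials `demo_key` / `demo_secret` are made up for illustration and are not part of the original notebook.

```python
import base64


def build_basic_auth(api_key: str, secret: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    message = f"{api_key}:{secret}"                       # Plain "key:secret" string
    encoded = base64.b64encode(message.encode("ascii"))   # Base64-encode the bytes
    return "Basic " + encoded.decode("ascii")             # Prefix with the scheme name


# Placeholder credentials, not real ones
auth_header = build_basic_auth("demo_key", "demo_secret")

# Decoding the header again recovers the original "key:secret" string
decoded = base64.b64decode(auth_header.split(" ", 1)[1]).decode("ascii")
print(auth_header)
print(decoded)
```

Because Base64 is an encoding and not encryption, anyone who sees the header can recover the secret, which is one reason to keep real credentials out of shared notebooks.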


In [18]:
# These are the parameters that will be used to filter the search.
base_url = 'https://api.idealista.com/3.5/'     # Resource URL
country = 'es'                                  # Country (values: es (Spain), it (Italy), pt (Portugal))
language = 'es'                                 # Language (values: es, it, pt, en, ca)
max_items = '50'                                # Items per call (Idealista allows a maximum of 50)
operation = 'sale'                              # Operation type (values: sale, rent)
property_type = 'homes'                         # Type of property (values: homes, offices, premises, garages, bedrooms)
order = 'priceDown'                             # Sort field (allowed values depend on property_type; this one applies to homes)
center = '43.53573,-5.66152'                    # Geographic coordinates (latitude,longitude) -> ASTURIAS (Gijon Center)
distance = '250000'                             # Maximum distance from the center (Gijon Center), in meters
sort = 'desc'                                   # Sort direction of the found items (values: asc, desc)
bankOffer = 'false'                             # Filter on whether the owner is a bank
maxprice = '1000000'                            # Maximum price

In [19]:
# Defining the function to obtain the search URL for sales around Gijon, Asturias

def define_url(operation = 'sale', country = 'es', language = 'es', max_items = '50', property_type = 'homes', order = 'priceDown',
               center = '43.53573,-5.66152', distance = '250000', sort = 'desc', bankOffer = 'false', maxprice = '1000000'):
    '''
    Combine the search filters with the resource URL and return the search URL template.
    The returned string keeps a 'numPage=%d' placeholder, which is filled in with the page number before each request.
    Note that bankOffer is accepted as an argument but is not added to the URL below.

    Parameters (all strings):
    operation      Operation type (values: sale, rent)
    country        Country (values: es (Spain), it (Italy), pt (Portugal))
    language       Language (values: es, it, pt, en, ca)
    max_items      Items per call (Idealista allows a maximum of 50)
    property_type  Type of property (values: homes, offices, premises, garages, bedrooms)
    order          Sort field (allowed values depend on property_type)
    center         Geographic coordinates -> ASTURIAS (Gijon Center)
    distance       Maximum distance from the center (Gijon Center)
    sort           Sort direction of the found items
    bankOffer      Filter on whether the owner is a bank
    maxprice       Maximum price
    '''
    url = ('https://api.idealista.com/3.5/'  +      
           country +
           '/search?operation=' + operation +
           '&maxItems=' + max_items +
           '&order=' + order +
           '&center=' + center +
           '&distance=' + distance +
           '&propertyType=' + property_type +
           '&sort=' + sort + 
           '&numPage=%d' +
           '&maxPrice=' + maxprice +
           '&language=' + language)
    
    return url

In [20]:
define_url('sale')

'https://api.idealista.com/3.5/es/search?operation=sale&maxItems=50&order=priceDown&center=43.53573,-5.66152&distance=250000&propertyType=homes&sort=desc&numPage=%d&maxPrice=1000000&language=es'
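
The string concatenation in `define_url` can also be expressed with `urllib.parse.urlencode`, which takes care of escaping and makes it harder to forget a parameter. The sketch below is one possible variant, not the notebook's own method: the helper name `build_search_url` is hypothetical, and it sets `numPage` directly instead of leaving a `%d` placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.idealista.com/3.5/"   # Resource URL used in this notebook


def build_search_url(num_page: int, country: str = "es", **filters: str) -> str:
    """Build the search URL for one results page from keyword filters."""
    params = {
        "operation": "sale",
        "maxItems": "50",
        "order": "priceDown",
        "center": "43.53573,-5.66152",
        "distance": "250000",
        "propertyType": "homes",
        "sort": "desc",
        "maxPrice": "1000000",
        "language": "es",
    }
    params.update(filters)                 # Caller-supplied filters override the defaults
    params["numPage"] = str(num_page)      # Page number is set explicitly per request
    # safe="," keeps the comma in `center` unescaped, matching the URL shown above
    return f"{BASE_URL}{country}/search?{urlencode(params, safe=',')}"


page_2_url = build_search_url(2, operation="rent")
query = parse_qs(urlsplit(page_2_url).query)   # Parse the query string back for inspection
print(page_2_url)
```

Parsing the URL back with `parse_qs` is a quick way to check that every filter ended up in the query string with the intended value.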

In [21]:
url = define_url()

In [22]:
# Defining the API search
def search_api(url):
    '''
    Request a token, send the search request to the given URL, and return the search results as a dictionary.
    '''
    token = get_token()                         # Getting the custom token

    headers = {'Content-Type': 'multipart/form-data;',   # Defining the search headers
               'Authorization' : 'Bearer ' + token}

    content = rq.post(url, headers = headers)   # Sending the request and storing the response

    result = json.loads(content.text)           # Parsing the JSON response body into a dictionary

    return result
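
`search_api` assumes that every response body is valid JSON. A more defensive parsing step could look like the sketch below. It is written as a pure function so it can be tried without calling the API. The names `parse_search_response` and `SearchAPIError` are hypothetical, and the sample bodies are invented, since this notebook does not record what Idealista returns on failure.

```python
import json


class SearchAPIError(RuntimeError):
    """Raised when a response cannot be used as a search result (hypothetical helper)."""


def parse_search_response(status_code: int, body: str) -> dict:
    """Validate a raw HTTP response and return its JSON payload as a dict."""
    if status_code != 200:
        raise SearchAPIError(f"Unexpected HTTP status {status_code}: {body[:80]!r}")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise SearchAPIError(f"Response body is not valid JSON: {body[:80]!r}") from exc


# Invented sample bodies, for illustration only
ok = parse_search_response(200, '{"elementList": [], "totalPages": 0}')
print(ok["totalPages"])

try:
    parse_search_response(200, "")          # Empty body, as in a failed request
except SearchAPIError as err:
    failure_message = str(err)
    print(failure_message)
```

Inside `search_api`, this could be called as `parse_search_response(content.status_code, content.text)`, so that a failed request produces a message that includes the status code and the start of the body.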

In [23]:
# Performing pagination of the first query, page 1
pagination = 1
first_query_url = url %(pagination)

In [None]:
# Performing the search with the paginated URL
results = search_api(first_query_url)

In [135]:
print(results)

{'elementList': [{'propertyCode': '106100857', 'thumbnail': 'https://img4.idealista.com/blur/WEB_LISTING/0/id.pro.es.image.master/73/9d/61/1304374723.webp', 'externalReference': 'REF_31176', 'numPhotos': 36, 'price': 9500.0, 'priceInfo': {'price': {'amount': 9500.0, 'currencySuffix': '€', 'priceDropInfo': {'formerPrice': 18500.0, 'priceDropValue': 9000, 'priceDropPercentage': 49}}}, 'propertyType': 'chalet', 'operation': 'sale', 'size': 65.0, 'rooms': 2, 'bathrooms': 1, 'address': 'AL SANTA EUGENIA, 7', 'province': 'Asturias', 'municipality': 'Villaviciosa', 'district': 'Parroquias surorientales', 'country': 'es', 'latitude': 43.4678584, 'longitude': -5.359872, 'showAddress': False, 'url': 'https://www.idealista.com/inmueble/106100857/', 'distance': '25471', 'description': 'Haz tu oferta! Asesoramiento con todas las facilidades para la reforma incluido  Exclusivo ventanal orientación SUR con vistas al paraíso natural en este tranquilo pueblo tradicional con acceso recientemente rehabil

In [136]:
# Defining the total number of pages available, given that we can only extract up to 50 results per page.
total_pages = results['totalPages']

In [137]:
total_pages 

805

In [138]:
def results_to_df(results):
    '''
    Convert the 'elementList' entries of the JSON results into a DataFrame and return it.
    '''
    df = pd.DataFrame.from_dict(results['elementList'])

    return df

In [140]:
def concat_df(df, df_total):
    '''
    This function takes the primary DataFrame (df_total) and concatenates the given individual DataFrame to it, returning the updated main DataFrame.
    '''
    df_total = pd.concat([df_total, df], ignore_index = True)   # pd.concat returns a new DataFrame, so the result must be assigned

    return df_total

In [141]:
# Saving the results in a DataFrame
df = results_to_df(results)

In [None]:
df

| | propertyCode | thumbnail | externalReference | numPhotos | price | priceInfo | propertyType | operation | size | rooms | ... | hasStaging | highlight | savedAd | notes | topNewDevelopment | topPlus | floor | exterior | hasLift | parkingSpace |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 106100857 | https://img4.idealista.com/blur/WEB_LISTING/0/... | REF_31176 | 36 | 9500.0 | {'price': {'amount': 9500.0, 'currencySuffix':... | chalet | sale | 65.0 | 2 | ... | False | {'groupDescription': 'Destacado'} | {} | [] | False | False | | | | |
| 1 | 106567113 | https://img4.idealista.com/blur/WEB_LISTING/90... | Myo | 20 | 150000.0 | {'price': {'amount': 150000.0, 'currencySuffix... | countryHouse | sale | 290.0 | 1 | ... | False | | {} | [] | False | False | | | | |
| 2 | 97273645 | https://img4.idealista.com/blur/WEB_LISTING/0/... | | 8 | 60000.0 | {'price': {'amount': 60000.0, 'currencySuffix'... | flat | sale | 283.0 | 1 | ... | False | | {} | [] | False | False | en | True | False | |
| 3 | 106463395 | https://img4.idealista.com/blur/WEB_LISTING/0/... | 62659 | 4 | 35000.0 | {'price': {'amount': 35000.0, 'currencySuffix'... | chalet | sale | 537.0 | 4 | ... | False | | {} | [] | False | False | | | | |
| 4 | 100774492 | https://img4.idealista.com/blur/WEB_LISTING/0/... | 0050124471 | 1 | 64700.0 | {'price': {'amount': 64700.0, 'currencySuffix'... | flat | sale | 96.0 | 3 | ... | False | | {} | [] | False | False | | | | |
| 5 | 89087743 | https://img4.idealista.com/blur/WEB_LISTING/0/... | 2041##2041_0114_PE0001 | 4 | 9900.0 | {'price': {'amount': 9900.0, 'currencySuffix':... | flat | sale | 45.0 | 1 | ... | False | | {} | [] | False | False | | | | |
| 6 | 105322146 | https://img4.idealista.com/blur/WEB_LISTING/0/... | 08474 | 33 | 55000.0 | {'price': {'amount': 55000.0, 'currencySuffix'... | countryHouse | sale | 1051.0 | 1 | ... | False | | {} | [] | False | False | bj | | | |
| 7 | 28756787 | https://img4.idealista.com/blur/WEB_LISTING/0/... | 4237 | 13 | 39900.0 | {'price': {'amount': 39900.0, 'currencySuffix'... | flat | sale | 70.0 | 2 | ... | False | {'groupDescription': 'Destacado'} | {} | [] | False | False | 1 | True | False | {'hasParkingSpace': True, 'isParkingSpaceInclu... |
| 8 | 33499727 | https://img4.idealista.com/blur/WEB_LISTING/0/... | 0034752278 | 1 | 14900.0 | {'price': {'amount': 14900.0, 'currencySuffix'... | chalet | sale | 38.0 | 2 | ... | False | | {} | [] | False | False | | | | |
| 9 | 106110102 | https://img4.idealista.com/blur/WEB_LISTING/0/... | 0033747036 | 1 | 26100.0 | {'price': {'amount': 26100.0, 'currencySuffix'... | chalet | sale | 120.0 | 4 | ... | False | | {} | [] | False | False | | | | |
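
Several columns in this output (`priceInfo`, `highlight`, `parkingSpace`) hold nested dictionaries, which are awkward to use as model features. One way to handle them is to flatten each listing into a single level before building the DataFrame. The sketch below uses a hypothetical helper, `flatten_record`, on a record trimmed from the first listing printed above. `pandas.json_normalize` offers comparable behavior for lists of dictionaries.

```python
def flatten_record(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, full_key, sep))  # Recurse into nested dicts
        else:
            flat[full_key] = value                              # Keep scalars and lists as-is
    return flat


# Record trimmed from the first listing printed above
listing = {
    "propertyCode": "106100857",
    "price": 9500.0,
    "priceInfo": {"price": {"amount": 9500.0, "currencySuffix": "€",
                            "priceDropInfo": {"formerPrice": 18500.0,
                                              "priceDropValue": 9000,
                                              "priceDropPercentage": 49}}},
}
flat_listing = flatten_record(listing)
print(flat_listing)
```

With this helper, empty dictionaries such as the `savedAd` values simply disappear from the flattened record, which may or may not be desirable depending on the analysis.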


In [145]:
# Creating a copy of the DataFrame 'df' and storing it in the variable 'df_total'.
df_total = df.copy()

In [146]:
# This loop pages through the remaining property search results in the Idealista API, collecting all the data
# into a single DataFrame for further analysis or use.
# Page 1 is already stored in df_total, so the loop starts at page 2 and includes the last page.

url_template = define_url()                 # Same filters as the first query; keeps the 'numPage=%d' placeholder
for i in range(2, total_pages + 1):
    results = search_api(url_template % i)  # Requesting page i
    df = results_to_df(results)
    df_total = pd.concat([df_total, df], ignore_index = True)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [None]:
'''
The error shown above occurs because the API allows a maximum of 100 searches per month. Once that limit is reached,
the response body is no longer valid JSON, so json.loads raises the JSONDecodeError.
'''
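
With 805 pages available and a limit of 100 searches per month, a full download cannot be completed in a single month. The arithmetic can be sketched as follows. The helper names `months_needed` and `pages_for_month` are hypothetical, and the sketch assumes by default that each page costs one search against the limit; whether the token requests made by `get_token` also count is not known here.

```python
import math


def months_needed(total_pages: int, monthly_quota: int, requests_per_page: int = 1) -> int:
    """Number of months required to download every page under a monthly quota."""
    pages_per_month = monthly_quota // requests_per_page
    if pages_per_month == 0:
        raise ValueError("Quota too small to fetch even one page per month")
    return math.ceil(total_pages / pages_per_month)


def pages_for_month(month_index: int, total_pages: int, monthly_quota: int) -> range:
    """1-based page numbers to request in a given month (month_index starts at 0)."""
    first = month_index * monthly_quota + 1
    last = min(first + monthly_quota - 1, total_pages)
    return range(first, last + 1)


total_pages = 805      # Value reported by the API above
monthly_quota = 100    # Limit stated in the note above

print(months_needed(total_pages, monthly_quota))
print(list(pages_for_month(8, total_pages, monthly_quota)))
```

Splitting the page range by month like this would let each monthly run resume where the previous one stopped. Narrowing the search (for example with a smaller `distance`) is another way to stay within the limit.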

========================================================================================================

### 1. Saving the Dataset

In [157]:
# Saving dataset
df_total.to_csv('C:/Users/argpe/REPO_PRUEBA_2/ML_Real_estate_price_predictions/src/data_sample/idealista_astur.csv')

In [None]:
'''
Notebook_1:
The dataset 'df_total' was saved as idealista_astur.csv
'''