# A user's guide to the *idealista* public search API.  

This notebook explains how to obtain property information published on the *idealista* website using the most exciting programming language out there: the one and only [Julia](https://julialang.org) language.

Before we get started, a little background on *idealista* is due. This Spanish company was founded in 2000 and has offered real estate services across the whole of Spain since 2004. More recently, it has expanded to Italy and Portugal. For more information on its history, you can check its Wikipedia page [here](https://es.wikipedia.org/wiki/Idealista.com).

The *idealista* web portal is the most widely used real estate portal in Spain and, as a result, it has information on a wide range of listed properties across the country. This information includes characteristics of the dwellings (e.g. size, number of rooms, number of bathrooms) as well as their asking prices. Consequently, obtaining this information can be very helpful for those interested in housing and rental markets.

Luckily, the *idealista* lab offers access to their database through a public API. Anyone interested can request access [here](https://developers.idealista.com/access-request). After some weeks of waiting, you will receive an "apikey" and a "secret". These two allow you to obtain an OAuth 2.0 bearer token, which is valid for a maximum of 100 requests per month. I'll show below how to obtain that token using Julia. With the token, you can send requests to the public API using different filters on your search. For example, you can choose the type of operation (sale vs. rent), the type of property (homes, offices, premises, garages, bedrooms) and/or the location (e.g. Madrid). Once you have specified a value for each of your filters, you are ready to send the HTTP request and obtain a list of the currently listed properties that satisfy your search parameters.

**Important limitations.** So far it seems that the *idealista* API can be a great source of information for the housing and rental markets in Spain. However, not all that glitters is gold. In particular, there are two limitations that make the public API insufficient for large-scale research projects:

1. It is not possible to recover past information about the listings. Thus, it is not possible to construct a time series. 

2. There are limits on your requests. The public API only allows you to recover 50 listings at a time, because the number of items per page is capped at 50 and you have to specify the page number in each of your searches. Moreover, there is a limit of 100 requests per month. Combined, these two limits allow you to obtain at most 5,000 observations per month (50 listings × 100 requests), which you will have to distribute across the Spanish geography.
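
To make the arithmetic behind these limits explicit, here is a minimal sketch. The variable names and the list of cities are mine and purely illustrative; only the two limits (50 listings per request, 100 requests per month) come from the text above.

```julia
items_per_page     = 50    # listings returned by one request
requests_per_month = 100   # requests allowed per month

# Upper bound on the listings that can be collected in one month.
max_listings = items_per_page * requests_per_month

# Splitting the budget evenly across a hypothetical set of cities.
cities = ["Madrid", "Barcelona", "Valencia", "Sevilla"]
requests_per_city = div(requests_per_month, length(cities))
listings_per_city = requests_per_city * items_per_page

println(max_listings)        # 5000
println(listings_per_city)   # 1250
```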

This may be sufficient for some projects, e.g. those interested in the cross section and with a particular focus on a given region. In any case, it is possible to circumvent these two limitations but you will have to dig deep into your pockets. 

Bearing that in mind, let's get started!

## The Julia Toolbox.

Below, I reproduce the main functions that are necessary to recover the information that results from a **single search** in the *idealista* web portal. For more complicated searches, e.g. if you want to obtain the listings in different Spanish cities, it is convenient to wrap these functions into a module.

In [1]:
# Packages:
using JSON
using Base64
using HTTP
using DataFrames

### Step 1: Getting the authentication token

This function takes two arguments, the apikey and the secret, and exchanges them for a token. The apikey and the secret are strings that contain both letters and numbers. You will receive them once your application to the public API has been processed. The resulting token is also a string, which is used later as an input for the search.

In [2]:
function get_oauth_token(apikey::String, secret::String)

    url = "https://api.idealista.com/oauth/token"
    apikey_secret = apikey*":"*secret
    auth = Base64.base64encode(apikey_secret)

    h = ["Authorization" => "Basic "* auth,
         "Content-Type" => "application/x-www-form-urlencoded;charset=UTF-8"]
    b = HTTP.escapeuri(["grant_type" => "client_credentials",
                        "scope" => "read"])
    content = HTTP.request("POST", url, h, b)
    bearer_token = JSON.parse(String(content.body))["access_token"]

    return bearer_token
end

get_oauth_token (generic function with 1 method)
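
To see what the function places in the `Authorization` header without spending a request, the encoding step can be reproduced offline. This is only a sketch: the credentials below are made up for illustration, and nothing beyond the standard-library `Base64` module is used.

```julia
using Base64

# Made-up credentials, for illustration only.
apikey_demo = "abc"
secret_demo = "123"

# Same construction as inside get_oauth_token: "apikey:secret", Base64-encoded.
auth_demo   = base64encode(apikey_demo * ":" * secret_demo)
header_demo = "Authorization" => "Basic " * auth_demo

println(header_demo.second)                # Basic YWJjOjEyMw==
println(String(base64decode(auth_demo)))   # abc:123
```

Decoding the string recovers the original pair, which is a reminder that Base64 is an encoding and not encryption: the header should only ever travel over HTTPS.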

### Step 2: Sending a request 

This function takes a dictionary specifying the parameters of the search and returns the URL used for the search. The keys of the dictionary are the filters that you want to apply, and the values are the values assigned to each filter. Notice that the values must be passed as strings, and that their format depends on the filter itself. The possible set of filters, as well as the type of values that each of them admits, are specified in the documentation sent along with the apikey and the secret.

In [3]:
function search_url(search_params::Dict)

    s_ini = "https://api.idealista.com/3.5/es/search?"

    for key in keys(search_params)
        val = search_params[key]
        s_ini = s_ini * key * "=" * val * "&"
    end

    return  s_ini[1:end-1]
end

search_url (generic function with 1 method)
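
Because a `Dict` has no defined iteration order, the order in which the filters appear in the URL is arbitrary. A fixed order can be convenient when logging or comparing URLs. The variant below is a sketch of that idea and not part of the original toolbox: the name `search_url_sorted` is hypothetical, it sorts the filters by name, and, like `search_url`, it does not percent-encode the values.

```julia
function search_url_sorted(search_params::Dict{String,String})
    base = "https://api.idealista.com/3.5/es/search"
    # Sort the filters by name so the same Dict always gives the same URL.
    sorted_pairs = sort(collect(search_params); by = first)
    query = join((k * "=" * v for (k, v) in sorted_pairs), "&")
    return isempty(query) ? base : base * "?" * query
end

demo = Dict("operation" => "rent", "propertyType" => "homes", "maxPrice" => "3000")
println(search_url_sorted(demo))
# https://api.idealista.com/3.5/es/search?maxPrice=3000&operation=rent&propertyType=homes
```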

This function sends a request to the *idealista* API using the token from Step 1 and the URL obtained from the previous function. The response is parsed from JSON into a nested dictionary, which is not yet ready to be analyzed and needs to be processed.

In [4]:
function search_api(token::String, url::String)

    # the header value should not repeat the header name
    headers = ["Content-Type" => "multipart/form-data;",
               "Authorization" => "Bearer " * token]
    content = HTTP.request("POST", url, headers)
    results = JSON.parse(String(content.body))

    return results
end

search_api (generic function with 1 method)
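
Since every call to `search_api` counts against the monthly limit of 100 requests, it can be useful to keep track of how many are left. The small counter below is a sketch of that idea; the type `RequestBudget` and the function `spend!` are hypothetical helpers of mine, not part of the *idealista* API or of HTTP.jl.

```julia
# Hypothetical helper: tracks how many requests are left this month.
mutable struct RequestBudget
    remaining::Int
end

# Consume one request, or fail before any HTTP call is made.
function spend!(budget::RequestBudget)
    budget.remaining > 0 || error("monthly request budget exhausted")
    budget.remaining -= 1
    return budget.remaining
end

budget = RequestBudget(100)
spend!(budget)              # call this right before search_api(token, url)
println(budget.remaining)   # 99
```

The counter only lives in memory, so it resets with every new session; keeping the count across sessions would require saving it to disk.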

### Step 3: Clean and organize the results from the search 

The following function cleans the results from the search and transforms them into a ready-to-use *DataFrame*. The resulting *DataFrame* has dimensions N×M, where N is the number of listings and M is the number of variables that are present in every listing. The maximum value of N in each search is 50, and M is equal to 33 in the public version of this API.

In [5]:
function read_elementList(listing::Vector)

    # transform Dict into DataFrame
    df = DataFrame(listing[1])

    for i in 2:length(listing)
        dfTemp = DataFrame(listing[i])

        # keep only the variables that are present in each Dict
        var_names = intersect(names(df), names(dfTemp))
        select!(dfTemp, var_names)
        select!(df, var_names)

        # merge the properties in a single DataFrame
        append!(df, dfTemp)
    end

    return df
end

read_elementList (generic function with 1 method)
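
The key step in `read_elementList` is the intersection of column names: a variable survives only if it is present in every listing. The sketch below reproduces that logic on plain dictionaries with made-up listings, so it runs without DataFrames; the helper name `common_keys` is mine.

```julia
# Made-up listings: the second one lacks the "floor" field.
listing_demo = [
    Dict{String,Any}("price" => 900,  "rooms" => 1, "floor" => "3"),
    Dict{String,Any}("price" => 1200, "rooms" => 2),
]

# Keys shared by every listing, mirroring intersect(names(df), names(dfTemp)).
common_keys(listing) = sort(collect(mapreduce(d -> Set(keys(d)), intersect, listing)))

shared = common_keys(listing_demo)

# Keep only the shared fields in each listing.
trimmed = [Dict(k => d[k] for k in shared) for d in listing_demo]

println(shared)   # ["price", "rooms"]
```

Notice the side effect: `"floor"` is dropped from all listings, including those that report it.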

## An example: a single request

First things first: get the authorization token! 

In [6]:
apikey = "your_apikey"
secret = "your_secret"
token  = get_oauth_token(apikey, secret);

Now, you are good to go and you can start your search.

First, set the filters of your search. For example, imagine that I want to look for rental homes in Madrid whose maximum price is not larger than 3,000€. Then, I will have to specify the "propertyType", the "operation", the "maxPrice" as well as the location. Notice that in order to direct the search to Madrid properties I have to use its coordinates ("center") and the maximum "distance" from that point. 

A convenient way to specify those filters is to construct two arrays, one containing the names of the filters and the other their values, and then combine them into a dictionary.

In [7]:
property = ["country", "locale", "language", "operation", "propertyType", "maxPrice", "sort", "sinceDate", "center", "distance", "maxItems"]
params = ["es", "es", "es", "rent", "homes", "3000", "asc", "M", "40.416,-3.7025", "20000", "50"]
search_dict = Dict(zip(property, params))

Dict{String,String} with 11 entries:
  "propertyType" => "homes"
  "language"     => "es"
  "sinceDate"    => "M"
  "distance"     => "20000"
  "maxPrice"     => "3000"
  "operation"    => "rent"
  "maxItems"     => "50"
  "country"      => "es"
  "center"       => "40.416,-3.7025"
  "locale"       => "es"
  "sort"         => "asc"

Once you have the dictionary containing the values of the filters, you are ready to send the request to the API using the previously generated token and a URL built with the previously defined function *search_url( )*.

In [8]:
results = search_api(token, search_url(search_dict))

Dict{String,Any} with 11 entries:
  "hiddenResults"      => false
  "itemsPerPage"       => 50
  "upperRangePosition" => 50
  "totalPages"         => 221
  "paginable"          => true
  "summary"            => Any["Alquilar", "Viviendas", "barrio Sol, Madrid", "D…
  "total"              => 11004
  "lowerRangePosition" => 0
  "elementList"        => Any[Dict{String,Any}("rooms"=>1,"propertyCode"=>"9127…
  "numPaginations"     => 0
  "actualPage"         => 1

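The output shows that this search matches 11,004 listings spread over 221 pages, of which only the first one was returned. As a sketch of how further pages could be requested, the snippet below builds one parameter dictionary per page. I am assuming here that the page is selected with a filter called `numPage`; check the documentation sent with your credentials for the exact name. The `base_dict` is a shortened stand-in for `search_dict`.

```julia
# Figures taken from the output above.
total, items_per_page = 11004, 50
total_pages = cld(total, items_per_page)   # ceiling division

# Shortened stand-in for search_dict.
base_dict = Dict("operation" => "rent", "propertyType" => "homes")

# Pages 2 to 5 only: each page costs one of the 100 monthly requests.
# "numPage" is an assumed parameter name.
page_dicts = [merge(base_dict, Dict("numPage" => string(p))) for p in 2:min(5, total_pages)]

println(total_pages)          # 221
println(length(page_dicts))   # 4
```

Each of these dictionaries can then be passed to `search_url` and `search_api` in the same way as `search_dict`.
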
Finally, you need to format the results and transform them into a more manageable form. You can do that by applying the function *read_elementList( )* defined in Step 3 to the "elementList" key of the dictionary resulting from the search.

In [9]:
listing = results["elementList"]
df = read_elementList(listing)

| Row | address | bathrooms | country | detailedType |
|---|---|---|---|---|
| | String | Int64 | String | Dict… |
| 1 | Calle Mayor | 1 | es | Dict("typology"=>"flat") |
| 2 | Calle de la Sal | 1 | es | Dict("typology"=>"flat","subTypology"=>"penthouse") |
| 3 | Calle de la Sal | 2 | es | Dict("typology"=>"flat") |
| 4 | Avenida de América | 1 | es | Dict("typology"=>"flat","subTypology"=>"penthouse") |
| 5 | Calle de Zurbano | 1 | es | Dict("typology"=>"flat","subTypology"=>"studio") |
| 6 | Calle de Cervantes | 1 | es | Dict("typology"=>"flat","subTypology"=>"penthouse") |
| 7 | Calle de Atocha | 1 | es | Dict("typology"=>"flat") |
| 8 | Calle de viejas, 25 | 1 | es | Dict("typology"=>"flat") |
| 9 | CABESTREROS, 6 | 1 | es | Dict("typology"=>"flat","subTypology"=>"studio") |
| 10 | Calle lucano, 6 | 1 | es | Dict("typology"=>"flat") |
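
Some columns, such as `detailedType`, still hold dictionaries. As a sketch of how such a column could be flattened before the analysis, the snippet below works on values copied from rows 1, 2 and 5 of the table above; listings without a sub-typology get `missing`.

```julia
# Values copied from the detailedType column above (rows 1, 2 and 5).
detailed = [
    Dict("typology" => "flat"),
    Dict("typology" => "flat", "subTypology" => "penthouse"),
    Dict("typology" => "flat", "subTypology" => "studio"),
]

typology    = [d["typology"] for d in detailed]
subtypology = [get(d, "subTypology", missing) for d in detailed]

println(typology)                       # ["flat", "flat", "flat"]
println(count(ismissing, subtypology))  # 1
```

The same comprehensions should work on the real data by replacing `detailed` with `df.detailedType`.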


And voilà, you are ready to analyze the data from that single search.