# Rent in Copenhagen
## Data retrieval

This notebook retrieves data on apartments and rooms currently for rent in Copenhagen, using the undocumented API behind [Boligportalen](https://www.boligportal.dk/).

The data will be used to map the differences in rent for rooms and apartments in Copenhagen's boroughs.

### 1. Initial setup

In [16]:
import requests
import pandas as pd
import time
from random import randint

### 2. Retrieve data

Using [this tutorial](https://inspectelement.org/apis.html#tutorial), I inspected Boligportalen's API in the browser's developer tools and copied the search request as cURL.

Following the tutorial, the next steps are:
1. Converting the cURL command to Python via curlconverter.com
2. Stripping the generated Python code down to the parts the request needs

Then I put that into a function, `get_rentals`, that fetches 18 search results (rentals) starting at the given offset:

In [17]:
# Convert cURL to Python + Requests using curlconverter.com and strip it
# Make it into a function (get_rentals)

def get_rentals(offset):
    """
    Fetch one page of up to 18 rentals, starting at `offset`.

    Returns the parsed JSON response, or None if the body is not valid JSON.
    """
    headers = {
        'Content-Type': 'text/plain;charset=UTF-8',
        'Accept': '*/*',
        'Accept-Language': 'da',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15',
    }
    
    params = {
        'offset': offset,
    }

    
    data = '{"categories":{"values":["rental_apartment","rental_room"]},"city_level_1":{"values":["københavn"]},"city_level_2":{"values":null},"city_level_3":{"values":null},"rooms":null,"min_size_m2":null,"max_monthly_rent":null,"min_rental_period":null,"max_available_from":"","company_filter_key":null,"company_key":null,"street_name":{"values":null},"social_housing":null,"min_lat":null,"min_lng":null,"max_lat":null,"max_lng":null,"shareable":true,"furnished":false,"student_only":false,"pet_friendly":false,"balcony":false,"senior_friendly":false,"parking":false,"elevator":false,"electric_charging_station":null,"dishwasher":null,"dryer":null,"washing_machine":null,"newbuild":null,"include_units":true,"order":"DEFAULT"}'.encode()
    
    response = requests.post('https://www.boligportal.dk/api/search/list', params=params, headers=headers, data=data)
    
    try:
        return response.json()
    except requests.JSONDecodeError:
        print(f"Failed to decode JSON for offset {offset}. Response content: {response.content}")
        return None
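
The request body above is one long JSON string, which is hard to read and edit. A minimal sketch of an alternative is to build the body from a Python dict and serialize it with `json.dumps`. The helper name `build_search_payload` is hypothetical, and the sketch only includes a few of the keys from the copied request. Whether the API accepts a body with the remaining keys left out is an untested assumption, so keep every key from the copied request if in doubt.

```python
import json


def build_search_payload(city="københavn",
                         categories=("rental_apartment", "rental_room"),
                         shareable=True):
    """Return a subset of the search filters as UTF-8 encoded JSON bytes.

    Assumption: only a few keys from the copied request are shown here;
    add the rest (rooms, min_size_m2, ...) if the API requires them.
    """
    filters = {
        "categories": {"values": list(categories)},
        "city_level_1": {"values": [city]},
        "shareable": shareable,
        "include_units": True,
        "order": "DEFAULT",
    }
    # ensure_ascii=False keeps "ø" as a literal character,
    # as it appears in the copied request body.
    return json.dumps(filters, ensure_ascii=False).encode("utf-8")
```

The returned bytes could be passed as `data=` to `requests.post` in place of the hard-coded string.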

#### Get the rest of the rentals

Because the API returns at most 18 results per request, I define a new function, `get_all_rentals()`, that calls `get_rentals()` in a loop. Each iteration fetches one 'page' of 18 results and raises the offset by 18, until no more results come back. All results are collected in a single list.


In [18]:
def get_all_rentals():
    """
    Get all rentals by fetching pages of 18 results, pausing a random
    1-3 seconds between requests.
    """
    offset = 0
    all_rentals = []
    
    while True:
        rentals = get_rentals(offset)
        if rentals is None:
            print(f"Encountered an issue while fetching data at offset {offset}. Stopping further retrieval.")
            break
        results = rentals.get('results', [])
        if not results:
            print("No more results!")
            break
        all_rentals.extend(results)
        print(f"Fetched {offset} to {offset + 18} results")
        offset += 18
        time.sleep(randint(1,3))

    return all_rentals
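
The same loop can be sketched with the page-fetching function passed in as an argument, which makes the pagination logic testable without calling the API. The names `fetch_all`, `fake_fetch`, and `max_pages` are hypothetical and not part of the notebook above. The `max_pages` cap is an added safety net, in case the API keeps returning results for any offset.

```python
import time


def fetch_all(fetch_page, page_size=18, max_pages=1000, delay_seconds=0.0):
    """Collect results page by page until a page comes back empty.

    fetch_page(offset) must return a dict with a 'results' list, or None
    on failure (the same contract as get_rentals above).
    """
    all_results = []
    for page in range(max_pages):
        offset = page * page_size
        response = fetch_page(offset)
        if response is None:
            break  # stop on a failed request
        results = response.get("results", [])
        if not results:
            break  # an empty page means there is nothing left
        all_results.extend(results)
        if delay_seconds > 0:
            time.sleep(delay_seconds)  # be polite to the server
    return all_results


def fake_fetch(offset):
    """Stand-in for get_rentals: serves 40 fake results in pages of 18."""
    data = list(range(40))
    return {"results": data[offset:offset + 18]}
```

With the real function, the call would be `fetch_all(get_rentals, delay_seconds=2)`.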

#### Wait for it to fetch all results...

In [19]:
all_rentals = get_all_rentals()

Fetched 0 to 18 results
Fetched 18 to 36 results
Fetched 36 to 54 results
Fetched 54 to 72 results
Fetched 72 to 90 results
Fetched 90 to 108 results
Fetched 108 to 126 results
Fetched 126 to 144 results
Fetched 144 to 162 results
Fetched 162 to 180 results
Fetched 180 to 198 results
Fetched 198 to 216 results
Fetched 216 to 234 results
Fetched 234 to 252 results
Fetched 252 to 270 results
Fetched 270 to 288 results
Fetched 288 to 306 results
Fetched 306 to 324 results
Fetched 324 to 342 results
Fetched 342 to 360 results
Fetched 360 to 378 results
Fetched 378 to 396 results
Fetched 396 to 414 results
Fetched 414 to 432 results
Fetched 432 to 450 results
Fetched 450 to 468 results
Fetched 468 to 486 results
Fetched 486 to 504 results
Fetched 504 to 522 results
Fetched 522 to 540 results
Fetched 540 to 558 results
Fetched 558 to 576 results
Fetched 576 to 594 results
Fetched 594 to 612 results
Fetched 612 to 630 results
Fetched 630 to 648 results
Fetched 648 to 666 results
Fetched 666 t

### 3. Check the data

In [20]:
# Access the first element

if all_rentals:
    print(all_rentals[0])

{'is_owner': False, 'is_promoted': False, 'is_open': False, 'is_exposure': False, 'url': '/lejligheder/k%C3%B8benhavn/56m2-2-vaer-id-5184445', 'created': '2021-05-03T16:55:30.755212+00:00', 'advertised_date': '2024-07-21T14:51:34.333669+00:00', 'id': 5184445, 'rentable_id': 5184476, 'city': 'København', 'city_area': 'København Ø', 'street_name': 'Æbeløgade', 'postal_code': '2100', 'description': '', 'category': 'rental_apartment', 'title': 'Lækker, lys indflytningsklar 2-værelses lejlighed i det eftertragtede Klimakvarter på Østerbro med indflytning pr. 01/10/1024', 'rooms': 2.0, 'size_m2': 56.0, 'monthly_rent': 12400.0, 'monthly_rent_currency': 'DKK', 'monthly_rent_extra_costs': 1100.0, 'prepaid_rent': 26000.0, 'deposit': 26000.0, 'deposit_currency': 'DKK', 'location': {'lat': 55.712532, 'lng': 12.561227}, 'open_house': None, 'images': [{'url': 'https://image-lambda.boligportal.dk/30dd82b6d883712fe5a11493f63a1075', 'is_floor_plan': False}, {'url': 'https://image-lambda.boligportal.dk/

In [21]:
# Check number of results

len(all_rentals)

693
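
The count alone does not show whether every listing is unique. With offset-based paging, a listing could in principle appear on two pages if new listings are published while the loop is running. That is an assumption about how the API behaves, not something observed here. A minimal sketch of a duplicate check on the `id` key (the helper name `deduplicate_by_id` is hypothetical):

```python
def deduplicate_by_id(rentals):
    """Return the rentals with repeated 'id' values removed, keeping order."""
    seen = set()
    unique = []
    for rental in rentals:
        rental_id = rental.get("id")
        if rental_id in seen:
            continue  # already kept a listing with this id
        seen.add(rental_id)
        unique.append(rental)
    return unique
```

Comparing `len(deduplicate_by_id(all_rentals))` with `len(all_rentals)` shows whether any listing was fetched twice.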

In [22]:
# Check what keys/column names are in the data

all_rentals[0].keys()

dict_keys(['is_owner', 'is_promoted', 'is_open', 'is_exposure', 'url', 'created', 'advertised_date', 'id', 'rentable_id', 'city', 'city_area', 'street_name', 'postal_code', 'description', 'category', 'title', 'rooms', 'size_m2', 'monthly_rent', 'monthly_rent_currency', 'monthly_rent_extra_costs', 'prepaid_rent', 'deposit', 'deposit_currency', 'location', 'open_house', 'images', 'has_video', 'formatted_address', 'state', 'floor', 'rental_period', 'available_from', 'is_contactable_via_message', 'other_details', 'review_reason', 'needs_follow_up', 'city_level_1', 'city_level_2', 'city_level_3', 'upsell_url', 'video_url', 'digital_showing', 'created_draft_contract', 'energy_rating', 'locked_fields', 'hide_from_company_search', 'interactive_floor_plan_id', 'deleted', 'social_housing', 'is_contacted', 'is_newbuild', 'ad_phone_number'])

##### *Relevant keys*

Note to self: these are the keys that seem most relevant to keep for the analysis. The example below is a random entry from the `results` key, which holds a list of dictionaries:

```
'url': '/lejligheder/k%C3%B8benhavn/91m2-3-vaer-id-5447305',
'created': '2024-07-16T12:29:37.108418+00:00',
'advertised_date': '2024-07-16T12:29:37.201699+00:00',
'id': 5447305,
'rentable_id': 5530637,
'city': 'København',
'city_area': 'Glostrup',
'street_name': 'Grannålen',
'postal_code': '2600',
'description': '',
'category': 'rental_apartment',
'title': 'Helt nye lejeboliger i Glostrup! ',
'rooms': 3.0,
'size_m2': 91.0,
'monthly_rent': 13100.0,
'monthly_rent_currency': 'DKK',
'monthly_rent_extra_costs': 1000.0,
'prepaid_rent': 13100.0,
'deposit': 39300.0,
'deposit_currency': 'DKK',
'location': {'lat': 55.682051, 'lng': 12.413595},
'formatted_address': None,
'floor': 1,
'rental_period': 0,
'available_from': '2024-08-15',
'social_housing': False,
'is_newbuild': False,
```
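
A minimal sketch of how each record could be trimmed to the keys listed above before further analysis. `RELEVANT_KEYS` and `keep_relevant` are hypothetical names, and missing keys are filled with `None` rather than raising an error.

```python
RELEVANT_KEYS = [
    "url", "created", "advertised_date", "id", "rentable_id", "city",
    "city_area", "street_name", "postal_code", "description", "category",
    "title", "rooms", "size_m2", "monthly_rent", "monthly_rent_currency",
    "monthly_rent_extra_costs", "prepaid_rent", "deposit", "deposit_currency",
    "location", "formatted_address", "floor", "rental_period",
    "available_from", "social_housing", "is_newbuild",
]


def keep_relevant(record, keys=RELEVANT_KEYS):
    """Return a copy of the record containing only the listed keys."""
    return {key: record.get(key) for key in keys}
```

Applied to the whole list: `trimmed = [keep_relevant(r) for r in all_rentals]`.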

In [23]:
# More data checking

all_rentals[-10]

{'is_owner': False,
 'is_promoted': False,
 'is_open': False,
 'is_exposure': False,
 'url': '/lejligheder/k%C3%B8benhavn/98m2-3-vaer-id-5101431',
 'created': '2020-12-18T20:36:56+00:00',
 'advertised_date': '2023-10-02T14:14:35.321960+00:00',
 'id': 5101431,
 'rentable_id': 5101431,
 'city': 'København',
 'city_area': 'Frederiksberg',
 'street_name': 'Finsensvej',
 'postal_code': '2000',
 'description': '',
 'category': 'rental_apartment',
 'title': 'Dejlig nyistandsat 3-værelses stuelejlighed på Frederiksberg til leje. ',
 'rooms': 3.0,
 'size_m2': 98.0,
 'monthly_rent': 18500.0,
 'monthly_rent_currency': 'DKK',
 'monthly_rent_extra_costs': 0.0,
 'prepaid_rent': 0.0,
 'deposit': 55500.0,
 'deposit_currency': 'DKK',
 'location': {'lat': 55.680706, 'lng': 12.51919},
 'open_house': None,
 'images': [{'url': 'https://image-lambda.boligportal.dk/93b28e1c9835c52a77b7c163224c0bea',
   'is_floor_plan': False},
  {'url': 'https://image-lambda.boligportal.dk/98f64851274b1d9a20660ce7c275b775',


### 4. Turn the data into a DataFrame

THIS DATA WAS RETRIEVED FROM THE API ON **JULY 21, 2024**, BETWEEN APPROXIMATELY **14.37 AND 15.00 EDT**.

In [24]:
df = pd.DataFrame(all_rentals)

### 5. Check the DataFrame

In [25]:
df.head()

| | is_owner | is_promoted | is_open | is_exposure | url | created | advertised_date | id | rentable_id | city | ... | energy_rating | locked_fields | hide_from_company_search | interactive_floor_plan_id | deleted | social_housing | is_contacted | is_newbuild | ad_phone_number | video |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | /lejligheder/k%C3%B8benhavn/56m2-2-vaer-id-518... | 2021-05-03T16:55:30.755212+00:00 | 2024-07-21T14:51:34.333669+00:00 | 5184445 | 5184476 | København | ... | D | [] | False | | False | False | False | False | | |
| 1 | False | False | False | False | /v%C3%A6relser/k%C3%B8benhavn/15m2-1-vaer-id-5... | 2023-08-20T09:33:01.517000+00:00 | 2024-07-21T08:07:09.351245+00:00 | 5381282 | 5443346 | København | ... | | [] | False | | False | False | False | False | | |
| 2 | False | False | False | False | /lejligheder/k%C3%B8benhavn/63m2-3-vaer-id-509... | 2020-12-14T13:42:17+00:00 | 2024-07-21T07:07:35.465890+00:00 | 5095348 | 5095348 | København | ... | A2015 | [] | False | | False | False | False | False | | |
| 3 | False | False | False | False | /v%C3%A6relser/k%C3%B8benhavn/20m2-1-vaer-id-5... | 2021-02-21T12:36:14.132103+00:00 | 2024-07-20T16:13:59.164021+00:00 | 5153512 | 5153517 | København | ... | C | [] | False | | False | False | False | False | | |
| 4 | False | False | False | False | /v%C3%A6relser/k%C3%B8benhavn/12m2-1-vaer-id-5... | 2021-11-05T09:17:41.540283+00:00 | 2024-07-20T08:09:03.604601+00:00 | 5238627 | 5249729 | København | ... | F | [] | False | | False | False | False | False | | |


In [26]:
# Check columns (same as checking keys like before)

df.columns

Index(['is_owner', 'is_promoted', 'is_open', 'is_exposure', 'url', 'created',
       'advertised_date', 'id', 'rentable_id', 'city', 'city_area',
       'street_name', 'postal_code', 'description', 'category', 'title',
       'rooms', 'size_m2', 'monthly_rent', 'monthly_rent_currency',
       'monthly_rent_extra_costs', 'prepaid_rent', 'deposit',
       'deposit_currency', 'location', 'open_house', 'images', 'has_video',
       'formatted_address', 'state', 'floor', 'rental_period',
       'available_from', 'is_contactable_via_message', 'other_details',
       'review_reason', 'needs_follow_up', 'city_level_1', 'city_level_2',
       'city_level_3', 'upsell_url', 'video_url', 'digital_showing',
       'created_draft_contract', 'energy_rating', 'locked_fields',
       'hide_from_company_search', 'interactive_floor_plan_id', 'deleted',
       'social_housing', 'is_contacted', 'is_newbuild', 'ad_phone_number',
       'video'],
      dtype='object')

In [27]:
# Check what the 'location' column looks like

df['location'].head()

0    {'lat': 55.712532, 'lng': 12.561227}
1     {'lat': 55.66544, 'lng': 12.598308}
2    {'lat': 55.655677, 'lng': 12.617045}
3    {'lat': 55.717242, 'lng': 12.485363}
4    {'lat': 55.699834, 'lng': 12.523683}
Name: location, dtype: object
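
Each `location` value is a nested dict, so latitude and longitude are not directly usable as columns. A minimal sketch of one way to handle this is to flatten each record before building the DataFrame. The helper name `flatten_location` is hypothetical, and it assumes `location` is either a dict with `lat` and `lng` or missing/`None`.

```python
def flatten_location(record):
    """Return a copy of the record with 'location' split into 'lat' and 'lng'."""
    flat = dict(record)  # shallow copy, so the original record is untouched
    location = flat.pop("location", None) or {}
    flat["lat"] = location.get("lat")
    flat["lng"] = location.get("lng")
    return flat
```

The flattened records can then be loaded with `pd.DataFrame([flatten_location(r) for r in all_rentals])`. `pd.json_normalize(all_rentals)` is an alternative that yields `location.lat` and `location.lng` columns instead.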

#### Summary stats

In [28]:
df.describe()

| | id | rentable_id | rooms | size_m2 | monthly_rent | monthly_rent_extra_costs | prepaid_rent | deposit | floor | rental_period |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 693.0 | 693.0 | 693.0 | 693.0 | 693.0 | 692.0 | 690.0 | 693.0 | 664.0 | 693.0 |
| mean | 5320845.0 | 5382732.0 | 2.792208 | 79.272987 | 13349.12632 | 884.294798 | 13376.434681 | 36506.993175 | 2.53012 | 1.87013 |
| std | 383535.3 | 401820.8 | 1.144273 | 37.420042 | 5938.283079 | 724.214761 | 11711.321545 | 19553.676281 | 1.971669 | 4.927262 |
| min | 869803.0 | 869803.0 | 1.0 | 6.0 | 2800.0 | 0.0 | 0.0 | 0.0 | -1.0 | 0.0 |
| 25% | 5365630.0 | 5417309.0 | 2.0 | 60.0 | 10600.0 | 600.0 | 8300.0 | 26000.0 | 1.0 | 0.0 |
| 50% | 5436077.0 | 5514729.0 | 3.0 | 85.0 | 12500.0 | 850.0 | 12200.0 | 36450.0 | 2.0 | 0.0 |
| 75% | 5444055.0 | 5525709.0 | 4.0 | 99.0 | 14850.0 | 1050.0 | 14500.0 | 42300.0 | 4.0 | 0.0 |
| max | 5447846.0 | 5531290.0 | 8.0 | 338.0 | 50000.0 | 10000.0 | 90000.0 | 150000.0 | 12.0 | 24.0 |


### 6. Save the data to a csv

In [29]:
df.to_csv('20240721_cph_rentals_shareable.csv', index=False)

# THIS DATA WAS RETRIEVED ON JULY 21, 2024, BETWEEN APPROXIMATELY 14.37 AND 15.00 EDT
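
The date prefix in the file name is typed by hand, which makes it easy for the name and the retrieval note to drift apart. A minimal sketch of deriving the name from a timestamp instead. The helper name `build_csv_name` is hypothetical, and the date in the example is only an illustration.

```python
from datetime import datetime, timezone


def build_csv_name(retrieved_at, suffix="cph_rentals_shareable"):
    """Build a file name such as 20240721_cph_rentals_shareable.csv."""
    return f"{retrieved_at:%Y%m%d}_{suffix}.csv"


# Example with a fixed date; in practice this would be
# datetime.now(timezone.utc), captured when the retrieval starts.
retrieved_at = datetime(2024, 7, 21, tzinfo=timezone.utc)
print(build_csv_name(retrieved_at))  # 20240721_cph_rentals_shareable.csv
```

The result can be passed straight to `df.to_csv(build_csv_name(retrieved_at), index=False)`.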