# Predicting house selling prices in Denmark



## Initial overview of steps:
* Guiding research question(s)
* Scrape real estate agency websites (gathering)
* Load data and organize in tidy format (wrangling)
* Deal with data issues (wrangling)
* Exploratory analysis
* Focussed questions
* Explanatory analysis
* Prediction models

## Questions
* How can we predict home prices?


* Is it possible to predict listing prices based on characteristics of the home?
* If so, what features are most important?
* Which ones don't matter at all?

# Notes 
The CRISP-DM Process (Cross-Industry Standard Process for Data Mining)
The lessons leading up to the first project are about helping you go through CRISP-DM in practice from start to finish. Even when we get into the weeds of coding, try to take a step back and realize what part of the process you are in, and make sure that you remember the question you are trying to answer and what a solution to that question looks like.

1. Business Understanding

2. Data Understanding

3. Prepare Data

4. Data Modeling

5. Evaluate the Results

6. Deploy

In [108]:
# Importing libraries
import pandas as pd
import requests
import bs4
import time

Browsing Home, the largest real estate company in Denmark, and playing around with the browser's developer tools, I managed to find an HTTP call that seems to return the data of the listings.

In [10]:
# Using Home, the biggest real estate company in Denmark
#url = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=10&q=2200%20K%C3%B8benhavn%20N&Energimaerker=null&SearchType=0&_=1571481546474'
url = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=10&q=2200&Energimaerker=null&SearchType=0&_=1571481546474'
response = requests.get(url)
# Saving response to a dictionary
featuresDict = response.json()
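
The query string in the URL above can also be assembled from a dictionary, which makes it easier to change the zip code or the page size later on. The sketch below is one possible way to do that with the standard library. The helper name `build_search_url` is hypothetical; the parameter names and the `_` value are copied from the observed URL, and what the `_` parameter actually does is not known here.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://home.dk/umbraco/backoffice/home-api/SEARCH'

def build_search_url(query, page=0, per_page=10):
    """Assemble the search URL from its parts (hypothetical helper)."""
    params = {
        'CurrentPageNumber': page,
        'SearchResultsPerPage': per_page,
        'q': query,
        'Energimaerker': 'null',
        'SearchType': 0,
        '_': 1571481546474,  # copied as-is from the observed URL
    }
    # urlencode keeps the dictionary order and escapes special characters
    return SEARCH_ENDPOINT + '?' + urlencode(params)

url = build_search_url(2200)
print(url)
```

With the default arguments this reproduces the URL used in the cell above, so `requests.get(build_search_url(2200))` would be a drop-in replacement.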

In [12]:
# Checking out the data
featuresDict

{'redirectUrl': None,
 'inputModel': {'SortType': None,
  'SortOrder': None,
  'CurrentPageNumber': 0,
  'SearchResultsPerPage': 10,
  'q': '2200',
  'EjendomstypeV1': None,
  'EjendomstypeRH': None,
  'EjendomstypeEL': None,
  'EjendomstypeVL': None,
  'EjendomstypeAA': None,
  'EjendomstypePL': None,
  'EjendomstypeFH': None,
  'EjendomstypeLO': None,
  'EjendomstypeHG': None,
  'EjendomstypeFG': None,
  'EjendomstypeNL': None,
  'Forretningnr': None,
  'ProjectNodeId': None,
  'OnlyBrokerHome': None,
  'PriceMin': None,
  'PriceMax': None,
  'EjerudgiftPrMdrMin': None,
  'EjerudgiftPrMdrMax': None,
  'BoligydelsePrMdrMin': None,
  'BoligydelsePrMdrMax': None,
  'BoligstoerrelseMin': None,
  'BoligstoerrelseMax': None,
  'GrundstoerrelseMin': None,
  'GrundstoerrelseMax': None,
  'VaerelserMin': None,
  'VaerelserMax': None,
  'Energimaerker': ['null'],
  'ByggaarMin': None,
  'ByggaarMax': None,
  'EtageMin': None,
  'EtageMax': None,
  'PlanMin': None,
  'PlanMax': None,
  'Aabenth

What we want to extract seems to be within the searchResults key:

In [13]:
featuresDict['searchResults']

[{'sagsnummer': '1050000139',
  'lng': 12.5457172243703,
  'lat': 55.6924852361034,
  'fokusbolig': False,
  'showNewPrice': False,
  'isNew': True,
  'adresse': 'Bjelkes Allé 6B, st..',
  'postal': 2200,
  'city': 'København N',
  'price': '2.095.000 ',
  'ejendomstypePrimaerNicename': 'Ejerlejlighed',
  'pictures': [{'PicId': 2993530,
    'CaseId': 10397003,
    'CaseNumber': '1050000139',
    'MediaType': 'b',
    'MaxWidth': 3000,
    'MaxHeight': 2000,
    'URL': 'https://home.mindworking.eu/resources/shops/105/cases/1050000139/casemedia/images/7687715b8b7896b4ff855797e16a8061/customsize.jpg?deviceId=jd83hsdf3',
    'Position': 0,
    'Description': 'Stue',
    'GUID': '7687715b-8b78-96b4-ff85-5797e16a8061',
    'refGUID': '00000000-0000-0000-0000-000000000000',
    'IsVertical': False,
    'IsHorizontal': True},
   {'PicId': 2993537,
    'CaseId': 10397003,
    'CaseNumber': '1050000139',
    'MediaType': 'b',
    'MaxWidth': 3000,
    'MaxHeight': 2000,
    'URL': 'https://home.

Great! This is the data we're interested in. However, the pictures key contains a list of image metadata that we don't need, and it would break the one-row-per-listing granularity if we converted the data to a pandas DataFrame, so let's drop it.

In [14]:
# dropping the pictures key from the list of dictionaries
features = featuresDict['searchResults']
for f in features:
    del f['pictures']
features

[{'sagsnummer': '1050000139',
  'lng': 12.5457172243703,
  'lat': 55.6924852361034,
  'fokusbolig': False,
  'showNewPrice': False,
  'isNew': True,
  'adresse': 'Bjelkes Allé 6B, st..',
  'postal': 2200,
  'city': 'København N',
  'price': '2.095.000 ',
  'ejendomstypePrimaerNicename': 'Ejerlejlighed',
  'floorPlan': {'PicId': 2993542,
   'CaseId': 10397003,
   'CaseNumber': '1050000139',
   'MediaType': 'p',
   'MaxWidth': 3000,
   'MaxHeight': 2000,
   'URL': 'https://home.mindworking.eu/resources/shops/105/cases/1050000139/casemedia/images/2f0b1e7e3e1981c99f5d514ebf3f9869/customsize.jpg?deviceId=jd83hsdf3',
   'Position': 0,
   'Description': 'Plantegning',
   'GUID': '2f0b1e7e-3e19-81c9-9f5d-514ebf3f9869',
   'refGUID': '00000000-0000-0000-0000-000000000000',
   'IsVertical': False,
   'IsHorizontal': True},
  'boligOrGrundAreal': 54,
  'andenmaegler': False,
  'boligurl': 'https://home.dk/boligkatalog/koebenhavn/2200/ejerlejligheder/bjelkes_alle_6b_st_1050000139.aspx',
  'billede
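
Note that `del f['pictures']` changes the dictionaries inside `featuresDict` in place, so running that cell a second time raises a `KeyError`. A minimal sketch of a variant that builds new dictionaries instead is shown here; the helper name `without_key` and the sample records are made up for illustration.

```python
def without_key(records, key='pictures'):
    """Return copies of the records with one key left out (hypothetical helper)."""
    return [{k: v for k, v in record.items() if k != key} for record in records]

# Small made-up sample in the shape of the search results
sample = [
    {'sagsnummer': '1050000139', 'postal': 2200, 'pictures': [{'PicId': 2993530}]},
    {'sagsnummer': '1050000162', 'postal': 2200},  # no pictures key at all
]

cleaned = without_key(sample)
print(cleaned)
```

The original records are left untouched, and records without the key are passed through unchanged, so the cell can be re-run safely.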

The data seems ready to be loaded into a pandas DataFrame.

In [15]:
df = pd.DataFrame(features)
df.head()

Unnamed: 0,aabenthusNicename,aabenthusShowRegistration,adresse,andenmaegler,billedeUrl,boligKanLejes,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,...,lejePerMaaned,lng,openHouseEndDate,openHouseStartDate,overskrift2,postal,price,sagsnummer,showNewPrice,solgtBolig
0,27.10 kl. 12.00-12.30,False,"Bjelkes Allé 6B, st..",False,https://home.mindworking.eu/resources/shops/10...,0,54,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,...,,12.545717,2019-10-27T12:30,2019-10-27T12:00,,2200,2.095.000,1050000139,False,False
1,27.10 kl. 14.30-14.50,False,"Poppelgade 4, 1. th.",False,https://home.mindworking.eu/resources/shops/10...,0,105,https://home.dk/boligkatalog/koebenhavn/2200/a...,København N,Andelsbolig,...,,12.559357,2019-10-27T14:50,2019-10-27T14:30,Beliggende i baghuset,2200,1.799.000,1050000162,False,False
2,27.10 kl. 13.30-13.50,False,"Husumgade 20, 2. th.",False,https://home.mindworking.eu/resources/shops/10...,0,53,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,...,,12.5454,2019-10-27T13:50,2019-10-27T13:30,Et super godt køb!,2200,2.399.000,1050000164,False,False
3,27.10 kl. 13.30-13.50,False,"Egegade 2, 1. th.",False,https://home.mindworking.eu/resources/shops/10...,0,78,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,...,,12.559457,2019-10-27T13:50,2019-10-27T13:30,Med altan og stort badeværelse,2200,3.999.000,1050000167,False,False
4,27.10 kl. 11.00-11.20,False,"Fredensborggade 2, 1. th.",False,https://home.mindworking.eu/resources/shops/10...,0,56,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,...,,12.53988,2019-10-27T11:20,2019-10-27T11:00,Super beliggenhed på Nørrebro,2200,2.199.000,1050000137,False,False


Let's remove columns that are not of interest.

In [16]:
df.drop(inplace = True, columns=[
    'billedeUrl','lejePerMaaned','showNewPrice',
    'aabenthusNicename','floorPlan','erSolgtOgLejebolig',
    'boligKanLejes','aabenthusShowRegistration', 
    'solgtBolig','isLejebolig','fokusbolig'
])

In [17]:
df.head()

Unnamed: 0,adresse,andenmaegler,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,isNew,lat,lng,openHouseEndDate,openHouseStartDate,overskrift2,postal,price,sagsnummer
0,"Bjelkes Allé 6B, st..",False,54,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.692485,12.545717,2019-10-27T12:30,2019-10-27T12:00,,2200,2.095.000,1050000139
1,"Poppelgade 4, 1. th.",False,105,https://home.dk/boligkatalog/koebenhavn/2200/a...,København N,Andelsbolig,True,55.692049,12.559357,2019-10-27T14:50,2019-10-27T14:30,Beliggende i baghuset,2200,1.799.000,1050000162
2,"Husumgade 20, 2. th.",False,53,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.693495,12.5454,2019-10-27T13:50,2019-10-27T13:30,Et super godt køb!,2200,2.399.000,1050000164
3,"Egegade 2, 1. th.",False,78,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.690345,12.559457,2019-10-27T13:50,2019-10-27T13:30,Med altan og stort badeværelse,2200,3.999.000,1050000167
4,"Fredensborggade 2, 1. th.",False,56,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.698624,12.53988,2019-10-27T11:20,2019-10-27T11:00,Super beliggenhed på Nørrebro,2200,2.199.000,1050000137
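
The price column is still a string with dots as thousands separators and a trailing space (for example `'2.095.000 '`), so it has to be converted before it can be used as a number. The sketch below shows one way to do that. The helper name `parse_danish_int` is hypothetical, and it assumes that a price consists only of digits and dots; anything else is returned as `None`.

```python
def parse_danish_int(text):
    """Turn a string such as '2.095.000 ' into the integer 2095000."""
    digits = text.strip().replace('.', '')
    # Anything that is not purely digits (empty string, text) becomes None
    return int(digits) if digits.isdigit() else None

print(parse_danish_int('2.095.000 '))
print(parse_danish_int('1.799.000'))
```

Applied to the dataframe, this could look like `df['price'] = df['price'].map(parse_danish_int)`.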


The 'boligurl' column holds the URL of each listing's own page, so let's use that to get more features!

In [18]:
response = requests.get(df['boligurl'][0])
html = response.text

In [22]:
html

'\r\n<!DOCTYPE html>\r\n<html lang="da" class="no-js" ng-app="home" xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://ogp.me/ns/fb#">\r\n<head>\r\n    <script id="CookieConsent" src="https://policy.cookieinformation.com/uc.js" data-culture="DA" async></script>\r\n    \r\n<script>(function(H){H.className=H.className.replace(/\\bno-js\\b/,\'js\')})(document.documentElement)</script>\r\n<meta charset="utf-8">\r\n<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\r\n<meta id="viewport" name="viewport" content="width=device-width, initial-scale=1, maximum-scale=2">\r\n<meta name="format-detection" content="telephone=no">\r\n<title>Ejerlejlighed - 2200 København N - Bjelkes Allé 6B, st..</title>\r\n<meta name="title" content="Ejerlejlighed - 2200 København N - Bjelkes Allé 6B, st..">\r\n<meta name="keywords" content="" />\r\n<meta name="description" content="Ejerlejlighed til salg, København N - Førstehåndsindtrykket er rigtig godt, når I træder indenfor i entréen, for allere

The data we want is in the spans with the info-property and info-value classes.

In [39]:
soup = bs4.BeautifulSoup(html, "html.parser")
additionalFeatures = soup.find_all('span', {"class": ["info-property","info-value"]})
additionalFeatures

[<span class="info-property">Kontantpris</span>,
 <span class="info-value"><b>3.650.000  kr.</b></span>,
 <span class="info-property">Ejerudgift pr. md.</span>,
 <span class="info-value"><b>2.356  kr.</b></span>,
 <span class="info-property">Kvm. pris <i class="tipso" title="Kvm-prisen er baseret på et vægtet areal,  som er mere præcist, fordi der også tages højde for kælderarealer, loftsarealer, udhuse etc. - og ikke kun boligareal. ">?</i></span>,
 <span class="info-value"><b>40.109  kr.</b></span>,
 <span class="info-property">Udbetaling</span>,
 <span class="info-value"><b>185.000  kr.</b></span>,
 <span class="info-property">
                         Brutto/Netto
                         <i class="tipso" title="I brutto- og nettoydelsen indgår standardfinansiering. Da der er tale om en standardfinansiering, vil den i visse tilfælde ikke kunne opnås, hvorfor brutto- og nettoydelsen i så fald kan afvige.">?</i>
 <br>
                         ekskl. ejerudgift
                     </

The spans alternate between a property name and its value, so we need to pair them up as key-value pairs.

In [64]:
# Loop through each span in the list
#import json
count = 0
keys = []
values = []
for feat in additionalFeatures:
    if count % 2: # Odd number is a value
        values.append(feat.text.strip())
        #values.append(re.findall('<b>.+</b>',str(feat))[0][3:-4])
    else: # Even number is a key
        keys.append(feat.text.strip())
        #keys.append(re.findall('>.+<',str(feat))[0][1:-1])
    count +=1 
dictionary = dict(zip(keys, values))
dictionary

{'Kontantpris': '3.650.000  kr.',
 'Ejerudgift pr. md.': '2.356  kr.',
 'Kvm. pris ?': '40.109  kr.',
 'Udbetaling': '185.000  kr.',
 'Brutto/Netto\r\n                        ?\n\r\n                        ekskl. ejerudgift': '14.114  / 12.357  kr.',
 'Prisudvikling': '0%',
 'Boligareal': '91  m2',
 'Grundareal': '570  m2',
 'Antal toiletter': '1',
 'Antal rum': '3',
 'Byggeår': '1906',
 'Energimærke': 'D',
 'Sagsnr.': '1050000133',
 'Afstand til off. transport': '200  m',
 'Afstand til skole': '500  m',
 'Afstand til indkøb': '300  m',
 'Ydermur': 'Mursten',
 'Gulve': 'Plankegulve',
 'Vinduer': 'Termo',
 'El': 'HPFI-relæ',
 'Forurening': 'Jf. udskrift fra RegionH',
 'Overtagelse': 'Efter aftale',
 'Antenne': 'Kabel-tv',
 'Vaskeri': 'Ja',
 'Udlejning tilladt': 'Ja, jf. vedtægterne',
 'Tilbehør': 'Indesit opvaskemaskineGram køleskabVoss ovn',
 'Ejendomsværdi i kr.': '1.600.000',
 'Heraf grundværdi i kr.': '112.200',
 'Vurderingsår': '2018'}
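
Some of the keys above still contain line breaks and runs of spaces (see the 'Brutto/Netto' entry), which makes for awkward column names later. A minimal sketch of the same pairing with whitespace collapsed is shown below. The helper name `pair_up` is hypothetical, it works on the already extracted texts, and it assumes that names and values strictly alternate.

```python
def pair_up(texts):
    """Pair alternating name/value strings after collapsing whitespace."""
    # split() without arguments splits on any whitespace, including \r and \n
    cleaned = [' '.join(text.split()) for text in texts]
    # Even positions are names, odd positions are values
    return dict(zip(cleaned[0::2], cleaned[1::2]))

texts = [
    'Kontantpris', '3.650.000  kr.',
    'Brutto/Netto\r\n                        ?\n\r\n                        ekskl. ejerudgift',
    '14.114  / 12.357  kr.',
    'Boligareal', '91  m2',
]
pairs = pair_up(texts)
print(pairs)
```

In the notebook the input would be `[feat.text for feat in additionalFeatures]`. Note that the values get their double spaces collapsed as well.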

This should be repeated for each row in the dataframe, with the results appended as columns. Let's create a function for this.

In [153]:
def GetAdditionalFeatures(df):
    additionalFeaturesList = []
    counter = 0
    loops = df.shape[0]
    # Loop through all rows
    for i in df['boligurl']:
        # Start from empty lists for every row, so a failed request
        # cannot reuse the keys and values of the previous listing
        keys = []
        values = []
        try:
            response = requests.get(i)
            html = response.text
            soup = bs4.BeautifulSoup(html, "html.parser")
            additionalFeatures = soup.find_all('span', {"class": ["info-property","info-value"]})

            # Loop through each span in the list
            count = 0
            for feat in additionalFeatures:
                if count % 2: # Odd index is a value
                    values.append(feat.text.strip())
                else: # Even index is a key
                    keys.append(feat.text.strip())
                count += 1
        except requests.exceptions.RequestException:
            # Flag the row instead of aborting the whole run
            keys = ['Connection timed out']
            values = ['True']
        additionalFeaturesList.append(dict(zip(keys, values)))
        # Pause between requests to go easy on the server
        time.sleep(2)
        counter += 1
        # Print progress in percent
        print((float(counter)/float(loops))*100.)
    # The join matches on the index, so df is expected to have the default 0..n-1 index
    df2 = df.join(pd.DataFrame(additionalFeaturesList))
    return df2

In [86]:
df2 = GetAdditionalFeatures(df)
df2.head()

Unnamed: 0,adresse,andenmaegler,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,isNew,lat,lng,openHouseEndDate,...,Sagsnr.,Teknisk pris ?,Tilbehør,Udbetaling,Udlejning,Udlejning tilladt,Vaskeri,Vinduer,Vurderingsår,Ydermur
0,"Bjelkes Allé 6B, st..",False,54,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.692485,12.545717,2019-10-27T12:30,...,1050000139,,Gorenje komfurAEG køle/fryseskabBosch emhætteh...,105.000 kr.,,Tilladt,Ja,,2018,Mursten
1,"Poppelgade 4, 1. th.",False,105,https://home.dk/boligkatalog/koebenhavn/2200/a...,København N,Andelsbolig,True,55.692049,12.559357,2019-10-27T14:50,...,1050000162,3.879.803 kr.,Bosch køle/fryseskabAEG vaskemaskine,,,"Tilladt i kortere periode, jf. vedtægternes § ...",Ja,Termo,2018,Mursten
2,"Husumgade 20, 2. th.",False,53,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.693495,12.5454,2019-10-27T13:50,...,1050000164,,Afventer oplysninger fra sælger,120.000 kr.,Tilladt,,Fællesvaskeri,Termo,2018,Pudset mursten
3,"Egegade 2, 1. th.",False,78,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.690345,12.559457,2019-10-27T13:50,...,1050000167,,Gram køle/fryseskabSiemens komfurElectrolux va...,200.000 kr.,,Med tilladelse fra ejerforeningens bestyrelse,Nej,Termo,2018,Mursten
4,"Fredensborggade 2, 1. th.",False,56,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.698624,12.53988,2019-10-27T11:20,...,1050000137,,Køleskab fra Blomberg (A+)Komfur fra SMEG,110.000 kr.,Tilladt,,Fællesvaskeri,Termo,2018,Mursten


In [88]:
df2.columns

Index(['adresse', 'andenmaegler', 'boligOrGrundAreal', 'boligurl', 'city',
       'ejendomstypePrimaerNicename', 'isNew', 'lat', 'lng',
       'openHouseEndDate', 'openHouseStartDate', 'overskrift2', 'postal',
       'price', 'sagsnummer', 'Afstand til indkøb',
       'Afstand til off. transport', 'Afstand til skole', 'Altan',
       'Antal plan', 'Antal rum', 'Antal toiletter', 'Antenne', 'Boligareal',
       'Boligydelse pr. måned',
       'Brutto/Netto\r\n                        ?\n\r\n                        ekskl. ejerudgift',
       'Byggeår', 'Ejendomsværdi i kr.', 'Ejerudgift pr. md.', 'El',
       'Energimærke', 'Etage', 'Fibernet', 'Forurening', 'Grundareal', 'Gulve',
       'Heraf grundværdi i kr.', 'Husdyr', 'Husdyr tilladt', 'Kontantpris',
       'Kvm. pris ?', 'Købspris', 'Overtagelse', 'Prisudvikling', 'Pulterrum',
       'Sagsnr.', 'Teknisk pris ?', 'Tilbehør', 'Udbetaling', 'Udlejning',
       'Udlejning tilladt', 'Vaskeri', 'Vinduer', 'Vurderingsår', 'Ydermur'],
     

Alright, we can now do this entire process for multiple zip codes and more than 10 results per request.

Note: Through trial and error I found the maximum number of results per request to be 200. To get all the data, we can add search criteria to the URL to split our results into smaller bins.
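
One way to picture the binning is as a sequence of home-size intervals followed by one open-ended interval. The sketch below generates such intervals; the function name `size_bins` is hypothetical, and the defaults (start at 11 m2, width 10, 28 bins) mirror the values used in the scraping loop further down in this notebook. Whether a single bin can itself exceed 200 results is not checked here, and homes smaller than the first lower bound are not covered.

```python
def size_bins(first_min=11, width=10, count=28):
    """Yield (min, max) home-size bins; the last bin has max=None (no upper bound)."""
    low = first_min
    for _ in range(count):
        yield low, low + width - 1
        low += width
    # Final bin catches everything above the last closed interval
    yield low, None

bins = list(size_bins())
print(bins[0], bins[27], bins[28])
```

Each pair can then be turned into the `BoligstoerrelseMin` and `BoligstoerrelseMax` query parameters, leaving out the maximum when it is `None`.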

In [141]:
# Zip codes in Denmark
#zipCode = [2200, 9210]
zipCode = [1301,2000,2100,2200,2300,2400,2450,2500,2600,2605,2610,2625,2630,
           2635,2640,2650,2660,2665,2670,2670,2680,2690,2700,2720,2730,2740,
           2750,2760,2765,2770,2791,2800,2820,2830,2840,2850,2860,2880,2900,
           2920,2930,2942,2950,2960,2970,2980,2990,3000,3050,3060,3070,3080,
           3100,3120,3140,3150,3200,3210,3220,3230,3250,3300,3310,3320,3330,
           3360,3370,3390,3400,3460,3480,3490,3500,3520,3540,3550,3600,3630,
           3650,3660,3670,3700,3720,3730,3740,3751,3760,3770,3782,3790,4000,
           4040,4050,4060,4070,4100,4130,4140,4160,4171,4173,4174,4180,4190,
           4200,4220,4230,4241,4242,4243,4250,4261,4262,4270,4281,4291,4293,
           4295,4296,4300,4320,4330,4340,4350,4360,4370,4390,4400,4420,4440,
           4450,4460,4470,4480,4490,4500,4520,4532,4534,4540,4550,4560,4571,
           4572,4573,4581,4583,4591,4592,4593,4600,4621,4622,4623,4632,4640,
           4652,4653,4654,4660,4671,4672,4673,4681,4682,4683,4684,4690,4700,
           4720,4733,4735,4736,4750,4760,4771,4772,4773,4780,4791,4792,4793,
           4800,4840,4850,4862,4863,4871,4872,4873,4874,4880,4891,4892,4894,
           4895,4900,4912,4913,4920,4930,4941,4943,4944,4951,4952,4953,4960,
           4970,4983,4990,5000,5200,5210,5220,5230,5240,5250,5260,5270,5290,
           5300,5330,5350,5370,5380,5390,5400,5450,5462,5463,5464,5466,5471,
           5474,5485,5491,5492,5500,5540,5550,5560,5580,5591,5592,5600,5610,
           5620,5631,5642,5672,5683,5690,5700,5750,5762,5771,5772,5792,5800,
           5853,5854,5856,5863,5871,5874,5881,5882,5883,5884,5892,5900,5932,
           5935,5953,5960,5970,5985,6000,6040,6051,6052,6064,6070,6091,6092,
           6093,6094,6100,6200,6230,6240,6261,6270,6280,6300,6310,6320,6330,
           6340,6360,6372,6392,6400,6430,6440,6470,6500,6510,6520,6535,6541,
           6560,6580,6600,6621,6622,6623,6630,6640,6650,6660,6670,6682,6683,
           6690,6700,6701,6705,6710,6715,6720,6731,6740,6752,6760,6771,6780,
           6792,6800,6818,6823,6830,6840,6851,6852,6853,6854,6855,6857,6862,
           6870,6880,6893,6900,6920,6933,6940,6950,6960,6971,6973,6980,6990,
           7000,7080,7100,7120,7130,7140,7150,7160,7171,7173,7182,7183,7184,
           7190,7200,7250,7260,7270,7280,7300,7321,7323,7330,7361,7362,7400,
           7430,7441,7442,7451,7470,7480,7490,7500,7540,7550,7560,7570,7600,
           7620,7650,7660,7673,7680,7700,7730,7741,7742,7752,7755,7760,7770,
           7790,7800,7830,7840,7850,7860,7870,7884,7900,7950,7960,7970,7980,
           7990,8000,8200,8210,8220,8230,8240,8250,8260,8270,8300,8305,8310,
           8320,8330,8340,8350,8355,8361,8362,8370,8380,8381,8382,8400,8410,
           8420,8444,8450,8462,8464,8471,8472,8500,8520,8530,8541,8543,8544,
           8550,8560,8570,8581,8585,8586,8592,8600,8620,8632,8641,8643,8653,
           8654,8660,8670,8680,8700,8721,8722,8723,8732,8740,8751,8752,8762,
           8763,8765,8766,8781,8783,8800,8830,8831,8832,8840,8850,8860,8870,
           8881,8882,8883,8900,8950,8961,8963,8970,8981,8983,8990,9000,9200,
           9210,9220,9230,9240,9260,9270,9280,9293,9300,9310,9320,9330,9340,
           9352,9362,9370,9380,9381,9382,9400,9430,9440,9460,9480,9490,9492,
           9493,9500,9510,9520,9530,9541,9550,9560,9574,9575,9600,9610,9620,
           9631,9632,9640,9670,9681,9690,9700,9740,9750,9760,9800,9830,9850,
           9870,9881,9900,9940,9970,9981,9982,9990
          ]


In [142]:

# Zip codes in the larger cities, where the search is split into chunks based on size
largeCityCodes = [1301, 2000, 2100, 2200, 2300, 2400, 2450, 2500,
                  5000, 5200, 5210, 5220, 5230, 5240, 5250, 5260,
                  5270, 8000, 8200, 8210, 8220, 8230, 8240, 9000,
                  9200, 9210, 9220
                 ]

def fetchListings(code, sizeFilter=''):
    # Build the search URL; sizeFilter optionally restricts the home size
    url = ('https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=200'
           + sizeFilter + '&q=' + str(code)
           + '&Energimaerker=null&SortOrder=asc&SearchType=0&_=1571481546474')
    response = requests.get(url)
    # Saving response to a dictionary
    featuresDict = response.json()
    # Dropping the pictures key from the list of dictionaries
    features = featuresDict['searchResults']
    for f in features:
        del f['pictures']
    return features

featureList = []
# Loop through zip codes
for code in zipCode:
    if code in largeCityCodes:
        # Setting size interval to bin responses into smaller chunks
        minSize = 11
        maxSize = 20
        # Loop through sizes
        for i in range(28):
            sizeFilter = '&BoligstoerrelseMin=' + str(minSize) + '&BoligstoerrelseMax=' + str(maxSize)
            featureList.extend(fetchListings(code, sizeFilter))
            # Pausing to be polite towards the server
            time.sleep(1)

            # Count up sizes
            minSize += 10
            maxSize += 10

        # Run one additional time without the maximum boundary
        featureList.extend(fetchListings(code, '&BoligstoerrelseMin=' + str(minSize)))
    # If the zip code is not in a larger city
    else:
        featureList.extend(fetchListings(code))

len(featureList)

51207
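
The zip code list contains 2670 twice, so every listing returned for that code appears twice in `featureList`. Searches for neighbouring zip codes might return overlapping listings as well, although that is not verified here. A minimal sketch of removing such repeats is shown below; the helper name `dedupe_listings` and the sample are made up, and it assumes that 'sagsnummer' uniquely identifies a listing.

```python
def dedupe_listings(listings, key='sagsnummer'):
    """Keep the first listing seen for each case number (hypothetical helper)."""
    unique = {}
    for listing in listings:
        # setdefault only stores the listing if the key is not present yet
        unique.setdefault(listing[key], listing)
    return list(unique.values())

# Small made-up sample in the shape of featureList
sample = [
    {'sagsnummer': '1050000139', 'postal': 2200},
    {'sagsnummer': '1050000162', 'postal': 2200},
    {'sagsnummer': '1050000139', 'postal': 2200},  # repeated listing
]
unique_listings = dedupe_listings(sample)
print(len(sample), '->', len(unique_listings))
```

Once the data is in a dataframe, `df_new.drop_duplicates(subset='sagsnummer')` would do the same under the same assumption.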

In [143]:
df_new = pd.DataFrame(featureList)
df_new.drop(inplace = True, columns=[
    'billedeUrl','lejePerMaaned','showNewPrice',
    'aabenthusNicename','floorPlan','erSolgtOgLejebolig',
    'boligKanLejes','aabenthusShowRegistration', 
    'solgtBolig','isLejebolig','fokusbolig'
])
df_new.head()

Unnamed: 0,adresse,andenmaegler,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,isNew,lat,lng,openHouseEndDate,openHouseStartDate,overskrift2,postal,price,sagsnummer
0,"A.D. Jørgensens Vej 75, 2. 1.",False,35.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.680726,12.494705,2019-10-27T11:20,2019-10-27T11:00,,2000,1.350.000,1300000111
1,"Holger Danskes Vej 14, 3. th.",False,46.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.686615,12.538356,,,,2000,2.145.000,1740000062
2,"Holger Danskes Vej 12, 3 tv",True,46.0,https://www.boligsiden.dk/viderestillingekster...,Frederiksberg,Ejerlejlighed,False,55.686592,12.538423,,,,2000,2.150.000,20002433_10007
3,"Ane Katrines Vej 16, 2 4",True,45.0,https://www.boligsiden.dk/viderestillingekster...,Frederiksberg,Ejerlejlighed,False,55.691713,12.535145,,,,2000,2.099.000,303562_100910_10009
4,"Ane Katrines Vej 16, St. 1",True,45.0,https://www.boligsiden.dk/viderestillingekster...,Frederiksberg,Ejerlejlighed,False,55.691713,12.535145,,,,2000,2.295.000,100088_103402_10009
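
Rows where 'andenmaegler' is True point to boligsiden.dk rather than home.dk. Those pages presumably do not have the info-property and info-value spans (an assumption, not checked here), so it may be worth skipping them before scraping the individual pages. The sketch below tests the host of a URL; the helper name `is_home_listing` is hypothetical, and the second sample URL is truncated exactly as it appears in the output above.

```python
from urllib.parse import urlparse

def is_home_listing(url):
    """True if the URL points to a page on home.dk itself."""
    return urlparse(url).netloc in ('home.dk', 'www.home.dk')

urls = [
    'https://home.dk/boligkatalog/koebenhavn/2200/ejerlejligheder/bjelkes_alle_6b_st_1050000139.aspx',
    'https://www.boligsiden.dk/viderestillingekster...',  # truncated in the output above
]
print([is_home_listing(u) for u in urls])
```

In the dataframe the same filter could be written as `df_new[df_new['boligurl'].map(is_home_listing)]`, or more directly as `df_new[~df_new['andenmaegler']]` if that column is a clean boolean. Either way, the index should be reset with `reset_index(drop=True)` before calling `GetAdditionalFeatures`, since the function joins the new columns on the index.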


In [144]:
df_new.to_csv('baseData.csv')
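
`GetAdditionalFeatures` pauses two seconds after every request, so for the 51207 rows collected above the pauses alone add up to more than a day. The sketch below estimates the runtime and splits the rows into batches, so that intermediate results could be saved along the way. Both helper names are hypothetical, and the batch size of 1000 is an arbitrary choice.

```python
def estimate_hours(n_rows, pause_seconds=2.0, request_seconds=0.0):
    """Rough runtime in hours for one request plus one pause per row."""
    return n_rows * (pause_seconds + request_seconds) / 3600

def batches(n_rows, batch_size):
    """Yield (start, stop) row positions that cover all rows in order."""
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)

print(round(estimate_hours(51207), 1), 'hours of pauses')

# Possible use in the notebook (not run here):
# for start, stop in batches(len(df_new), 1000):
#     part = df_new.iloc[start:stop].reset_index(drop=True)
#     GetAdditionalFeatures(part).to_csv('features_%d_%d.csv' % (start, stop))
```

The `reset_index(drop=True)` matters because `GetAdditionalFeatures` joins the scraped columns on the index, which has to start at 0 for every batch.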

In [None]:
df_new2 = GetAdditionalFeatures(df_new)

0.0019528580076942606
0.0039057160153885212
0.005858574023082781
...
7.879782061046341
7.881734919054035
7.883687777061729
7.885640635069424
7.887593493077118
7.889546351084813
7.8914992090925065
7.893452067100201
7.895404925107894
7.89735778311559
7.899310641123283
7.9012634991309785
7.903216357138672
7.905169215146366
7.90712207315406
7.909074931161755
7.911027789169449
7.912980647177144
7.9149335051848375
7.916886363192532
7.918839221200226
7.920792079207921
7.922744937215614
7.9246977952233095
7.926650653231003
7.928603511238698
7.930556369246392
7.932509227254086
7.93446208526178
7.936414943269475
7.938367801277169
7.940320659284864
7.942273517292557
7.944226375300252
7.946179233307945
7.948132091315641
7.95008494932

8.74099244243951
8.742945300447204
8.744898158454898
8.746851016462593
8.748803874470287
8.750756732477981
8.752709590485676
8.75466244849337
8.756615306501065
8.758568164508757
8.760521022516453
8.762473880524146
8.764426738531842
8.766379596539535
8.76833245454723
8.770285312554924
8.772238170562618
8.774191028570312
8.776143886578007
8.778096744585701
8.780049602593396
8.782002460601088
8.783955318608784
8.785908176616477
8.787861034624173
8.789813892631866
8.79176675063956
8.793719608647255
8.795672466654949
8.797625324662643
8.799578182670338
8.801531040678032
8.803483898685727
8.805436756693421
8.807389614701115
8.809342472708808
8.811295330716504
8.813248188724197
8.815201046731893
8.817153904739586
8.81910676274728
8.821059620754975
8.823012478762669
8.824965336770363
8.826918194778058
8.828871052785752
8.830823910793447
8.83277676880114
8.834729626808835
8.836682484816528
8.838635342824224
8.840588200831917
8.842541058839611
8.844493916847306
8.846446774855
8.848399632862694
8

9.641259983986563
9.64321284199426
9.645165700001952
9.647118558009646
9.64907141601734
9.651024274025035
9.65297713203273
9.654929990040424
9.656882848048118
9.658835706055813
9.660788564063507
9.662741422071202
9.664694280078894
9.66664713808659
9.668599996094283
9.67055285410198
9.672505712109672
9.674458570117366
9.67641142812506
9.678364286132755
9.68031714414045
9.682270002148144
9.684222860155838
9.686175718163533
9.688128576171225
9.690081434178921
9.692034292186614
9.69398715019431
9.695940008202003
9.697892866209697
9.699845724217392
9.701798582225086
9.70375144023278
9.705704298240475
9.70765715624817
9.709610014255864
9.711562872263556
9.713515730271252
9.715468588278945
9.717421446286641
9.719374304294334
9.721327162302028
9.723280020309723
9.725232878317417
9.727185736325112
9.729138594332806
9.7310914523405
9.733044310348195
9.73499716835589
9.736950026363584
9.738902884371276
9.740855742378972
9.742808600386665
9.744761458394361
9.746714316402054
9.748667174409748
9.750

10.512234655418204
10.514187513425899
10.516140371433593
10.518093229441288
10.52004608744898
10.521998945456676
10.523951803464369
10.525904661472065
10.527857519479758
10.529810377487452
10.531763235495147
10.533716093502841
10.535668951510536
10.53762180951823
10.539574667525924
10.541527525533619
10.543480383541311
10.545433241549008
10.5473860995567
10.549338957564396
10.551291815572089
10.553244673579783
10.555197531587478
10.557150389595172
10.559103247602867
10.561056105610561
10.563008963618255
10.56496182162595
10.566914679633644
10.568867537641339
10.570820395649031
10.572773253656727
10.57472611166442
10.576678969672116
10.578631827679809
10.580584685687503
10.582537543695198
10.584490401702892
10.586443259710586
10.58839611771828
10.590348975725975
10.59230183373367
10.594254691741362
10.596207549749058
10.598160407756751
10.600113265764447
10.60206612377214
10.604018981779834
10.605971839787529
10.607924697795223
10.609877555802917
10.611830413810612
10.613783271818306
10

11.363680746772902
11.365633604780596
11.36758646278829
11.369539320795985
11.37149217880368
11.373445036811374
11.375397894819066
11.377350752826763
11.379303610834455
11.381256468842151
11.383209326849844
11.385162184857538
11.387115042865233
11.389067900872927
11.391020758880622
11.392973616888316
11.39492647489601
11.396879332903705
11.398832190911397
11.400785048919094
11.402737906926786
11.404690764934482
11.406643622942175
11.40859648094987
11.410549338957564
11.412502196965258
11.414455054972953
11.416407912980647
11.418360770988341
11.420313628996036
11.422266487003728
11.424219345011425
11.426172203019117
11.428125061026813
11.430077919034506
11.4320307770422
11.433983635049895
11.43593649305759
11.437889351065284
11.439842209072978
11.441795067080673
11.443747925088367
11.445700783096061
11.447653641103756
11.449606499111448
11.451559357119145
11.453512215126837
11.455465073134533
11.457417931142226
11.45937078914992
11.461323647157615
11.46327650516531
11.465229363173004
11

12.2151268381276
12.217079696135293
12.21903255414299
12.220985412150682
12.222938270158377
12.224891128166071
12.226843986173765
12.22879684418146
12.230749702189152
12.232702560196849
12.234655418204541
12.236608276212237
12.23856113421993
12.240513992227624
12.242466850235319
12.244419708243013
12.246372566250708
12.248325424258402
12.250278282266096
12.252231140273791
12.254183998281484
12.25613685628918
12.258089714296872
12.260042572304569
12.261995430312261
12.263948288319956
12.26590114632765
12.267854004335344
12.269806862343039
12.271759720350733
12.273712578358428
12.275665436366122
12.277618294373816
12.27957115238151
12.281524010389203
12.2834768683969
12.285429726404592
12.287382584412288
12.289335442419981
12.291288300427675
12.29324115843537
12.295194016443064
12.297146874450759
12.299099732458453
12.301052590466147
12.303005448473842
12.304958306481534
12.30691116448923
12.308864022496923
12.31081688050462
12.312769738512312
12.314722596520006
12.3166754545277
12.31862

13.066572929482298
13.068525787489992
13.070478645497685
13.07243150350538
13.074384361513076
13.07633721952077
13.078290077528463
13.080242935536157
13.082195793543852
13.084148651551548
13.086101509559239
13.088054367566935
13.09000722557463
13.091960083582324
13.093912941590016
13.09586579959771
13.097818657605407
13.099771515613101
13.101724373620794
13.103677231628488
13.105630089636183
13.107582947643875
13.10953580565157
13.111488663659266
13.11344152166696
13.115394379674653
13.117347237682347
13.119300095690042
13.121252953697738
13.123205811705429
13.125158669713125
13.12711152772082
13.129064385728514
13.131017243736206
13.1329701017439
13.134922959751597
13.136875817759291
13.138828675766984
13.140781533774678
13.142734391782373
13.144687249790069
13.146640107797761
13.148592965805456
13.15054582381315
13.152498681820845
13.154451539828537
13.156404397836233
13.158357255843928
13.160310113851622
13.162262971859315
13.16421582986701
13.166168687874706
13.1681215458824
13.170

13.918019020836994
13.91997187884469
13.921924736852384
13.923877594860079
13.925830452867771
13.927783310875466
13.929736168883162
13.931689026890856
13.933641884898549
13.935594742906243
13.937547600913938
13.939500458921634
13.941453316929325
13.94340617493702
13.945359032944715
13.94731189095241
13.949264748960102
13.951217606967797
13.953170464975493
13.955123322983187
13.95707618099088
13.959029038998574
13.960981897006269
13.962934755013965
13.964887613021656
13.966840471029352
13.968793329037046
13.97074618704474
13.972699045052433
13.974651903060128
13.976604761067824
13.978557619075517
13.980510477083211
13.982463335090905
13.9844161930986
13.986369051106292
13.988321909113989
13.990274767121683
13.992227625129377
13.99418048313707
13.996133341144764
13.99808619915246
14.000039057160155
14.001991915167848
14.003944773175542
14.005897631183236
14.007850489190933
14.009803347198623
14.01175620520632
14.013709063214014
14.015661921221708
14.017614779229401
14.019567637237095
14.

14.769465112191693
14.771417970199389
14.77337082820708
14.775323686214776
14.77727654422247
14.779229402230165
14.781182260237857
14.783135118245552
14.785087976253248
14.787040834260942
14.788993692268635
14.79094655027633
14.792899408284024
14.79485226629172
14.79680512429941
14.798757982307107
14.800710840314801
14.802663698322496
14.804616556330188
14.806569414337883
14.808522272345579
14.810475130353273
14.812427988360966
14.81438084636866
14.816333704376355
14.818286562384051
14.820239420391742
14.822192278399438
14.824145136407132
14.826097994414827
14.82805085242252
14.830003710430214
14.83195656843791
14.833909426445604
14.835862284453297
14.837815142460991
14.839768000468686
14.841720858476382
14.843673716484073
14.845626574491769
14.847579432499463
14.849532290507156
14.85148514851485
14.853438006522545
14.855390864530241
14.857343722537934
14.859296580545628
14.861249438553322
14.863202296561017
14.86515515456871
14.867108012576406
14.8690608705841
14.871013728591794
14.87

15.62091120354639
15.622864061554084
15.624816919561779
15.626769777569475
15.628722635577166
15.630675493584862
15.632628351592556
15.63458120960025
15.636534067607943
15.638486925615638
15.640439783623334
15.642392641631028
15.644345499638721
15.646298357646415
15.64825121565411
15.650204073661806
15.652156931669497
15.654109789677193
15.656062647684887
15.658015505692582
15.659968363700274
15.661921221707969
15.663874079715665
15.66582693772336
15.667779795731052
15.669732653738746
15.67168551174644
15.673638369754137
15.675591227761828
15.677544085769524
15.679496943777218
15.681449801784913
15.683402659792606
15.6853555178003
15.687308375807996
15.68926123381569
15.691214091823383
15.693166949831078
15.695119807838772
15.697072665846468
15.69902552385416
15.700978381861855
15.70293123986955
15.704884097877244
15.706836955884937
15.708789813892633
15.710742671900327
15.712695529908022
15.714648387915714
15.716601245923409
15.718554103931105
15.7205069619388
15.722459819946492
15.72

16.47821586892417
16.480168726931865
16.48212158493956
16.484074442947254
16.486027300954948
16.487980158962642
16.489933016970337
16.49188587497803
16.493838732985726
16.49579159099342
16.497744449001114
16.499697307008805
16.5016501650165
16.503603023024198
16.505555881031892
16.507508739039583
16.509461597047277
16.511414455054975
16.51336731306267
16.51532017107036
16.517273029078055
16.51922588708575
16.521178745093444
16.523131603101138
16.525084461108833
16.527037319116527
16.52899017712422
16.530943035131916
16.53289589313961
16.534848751147305
16.536801609155
16.538754467162693
16.540707325170388
16.542660183178082
16.544613041185777
16.546565899193467
16.548518757201165
16.55047161520886
16.552424473216554
16.554377331224245
16.55633018923194
16.558283047239637
16.560235905247332
16.562188763255023
16.564141621262717
16.56609447927041
16.56804733727811
16.5700001952858
16.571953053293495
16.57390591130119
16.575858769308883
16.577811627316578
16.579764485324272
16.58171734333

17.339426250317338
17.341379108325032
17.343331966332727
17.345284824340425
17.347237682348116
17.34919054035581
17.351143398363504
17.3530962563712
17.355049114378893
17.357001972386588
17.358954830394282
17.360907688401976
17.36286054640967
17.364813404417365
17.36676626242506
17.368719120432754
17.37067197844045
17.372624836448143
17.374577694455837
17.37653055246353
17.378483410471222
17.38043626847892
17.382389126486615
17.38434198449431
17.386294842502
17.388247700509694
17.390200558517392
17.392153416525087
17.394106274532778
17.396059132540472
17.398011990548166
17.399964848555864
17.401917706563555
17.40387056457125
17.405823422578944
17.40777628058664
17.409729138594333
17.411681996602027
17.41363485460972
17.415587712617416
17.41754057062511
17.419493428632805
17.4214462866405
17.423399144648194
17.425352002655885
17.427304860663583
17.429257718671277
17.43121057667897
17.433163434686662
17.435116292694357
17.437069150702055
17.43902200870975
17.44097486671744
17.44292772472

18.20063663171051
18.2025894897182
18.204542347725898
18.206495205733592
18.208448063741287
18.210400921748978
18.212353779756675
18.21430663776437
18.216259495772064
18.218212353779755
18.22016521178745
18.222118069795144
18.224070927802842
18.226023785810533
18.227976643818227
18.22992950182592
18.23188235983362
18.23383521784131
18.235788075849005
18.2377409338567
18.239693791864394
18.241646649872088
18.243599507879782
18.245552365887477
18.24750522389517
18.249458081902866
18.25141093991056
18.253363797918254
18.25531665592595
18.25726951393364
18.259222371941338
18.261175229949032
18.263128087956726
18.265080945964417
18.26703380397211
18.26898666197981
18.270939519987504
18.272892377995195
18.27484523600289
18.276798094010584
18.27875095201828
18.280703810025972
18.282656668033667
18.28460952604136
18.286562384049056
18.28851524205675
18.290468100064444
18.29242095807214
18.294373816079833
18.296326674087528
18.298279532095222
18.300232390102916
18.30218524811061
18.304138106118

19.06379987111137
19.065752729119065
19.06770558712676
19.069658445134454
19.07161130314215
19.073564161149843
19.075517019157537
19.077469877165232
19.079422735172926
19.08137559318062
19.083328451188315
19.08528130919601
19.087234167203704
19.089187025211395
19.091139883219093
19.093092741226787
19.09504559923448
19.096998457242172
19.098951315249867
19.100904173257565
19.10285703126526
19.10480988927295
19.106762747280644
19.10871560528834
19.110668463296037
19.112621321303727
19.114574179311422
19.116527037319116
19.11847989532681
19.120432753334505
19.1223856113422
19.124338469349894
19.12629132735759
19.128244185365283
19.130197043372977
19.13214990138067
19.134102759388366
19.136055617396057
19.138008475403755
19.13996133341145
19.141914191419144
19.143867049426834
19.14581990743453
19.147772765442227
19.14972562344992
19.151678481457612
19.153631339465306
19.155584197473
19.1575370554807
19.15948991348839
19.161442771496084
19.16339562950378
19.165348487511473
19.16730134551916

19.925010252504542
19.926963110512236
19.928915968519927
19.93086882652762
19.93282168453532
19.934774542543014
19.936727400550705
19.9386802585584
19.940633116566094
19.942585974573788
19.944538832581483
19.946491690589177
19.94844454859687
19.950397406604566
19.95235026461226
19.954303122619955
19.95625598062765
19.958208838635343
19.960161696643038
19.962114554650732
19.964067412658427
19.96602027066612
19.967973128673812
19.96992598668151
19.971878844689204
19.9738317026969
19.97578456070459
19.977737418712284
19.97969027671998
19.981643134727676
19.983595992735367
19.98554885074306
19.987501708750756
19.989454566758454
19.991407424766145
19.99336028277384
19.995313140781533
19.997265998789228
19.999218856796922
20.001171714804617
20.00312457281231
20.005077430820005
20.0070302888277
20.008983146835394
20.01093600484309
20.012888862850783
20.014841720858474
20.016794578866172
20.018747436873866
20.02070029488156
20.02265315288925
20.024606010896946
20.026558868904644
20.02851172691

20.78622063389771
20.788173491905404
20.7901263499131
20.792079207920793
20.794032065928487
20.79598492393618
20.797937781943876
20.799890639951567
20.801843497959265
20.80379635596696
20.805749213974654
20.807702071982344
20.80965492999004
20.811607787997737
20.81356064600543
20.815513504013122
20.817466362020816
20.81941922002851
20.82137207803621
20.8233249360439
20.825277794051594
20.82723065205929
20.829183510066983
20.831136368074677
20.83308922608237
20.835042084090066
20.83699494209776
20.838947800105455
20.84090065811315
20.842853516120844
20.844806374128538
20.84675923213623
20.848712090143927
20.85066494815162
20.852617806159316
20.854570664167007
20.8565235221747
20.8584763801824
20.860429238190093
20.862382096197784
20.86433495420548
20.866287812213173
20.86824067022087
20.87019352822856
20.872146386236256
20.87409924424395
20.876052102251645
20.87800496025934
20.879957818267034
20.881910676274728
20.883863534282423
20.885816392290117
20.88776925029781
20.889722108305506
2

21.647431015290877
21.64938387329857
21.651336731306266
21.653289589313964
21.655242447321655
21.65719530532935
21.659148163337044
21.661101021344738
21.663053879352432
21.665006737360127
21.66695959536782
21.668912453375516
21.67086531138321
21.672818169390904
21.6747710273986
21.676723885406293
21.678676743413984
21.680629601421682
21.682582459429376
21.68453531743707
21.68648817544476
21.688441033452456
21.690393891460154
21.69234674946785
21.69429960747554
21.696252465483234
21.698205323490928
21.700158181498626
21.702111039506317
21.70406389751401
21.706016755521706
21.7079696135294
21.709922471537094
21.71187532954479
21.713828187552483
21.715781045560178
21.717733903567872
21.719686761575566
21.72163961958326
21.723592477590955
21.725545335598646
21.727498193606344
21.72945105161404
21.731403909621733
21.733356767629424
21.735309625637118
21.737262483644816
21.73921534165251
21.7411681996602
21.743121057667896
21.74507391567559
21.747026773683288
21.74897963169098
21.75093248969

In [None]:
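# Save the extended data; index=False avoids an extra 'Unnamed: 0' column when the file is read back
df_new2.to_csv('extendedData.csv', index=False)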

In [126]:
df_new2 = GetAdditionalFeatures(df_new.head())
df_new2.head()

20.0
40.0
60.0
80.0
100.0


Unnamed: 0,adresse,andenmaegler,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,isNew,lat,lng,openHouseEndDate,...,Overtagelse,Prisudvikling,Sagsnr.,Tilbehør,Udbetaling,Udlejning tilladt,Vaskeri,Vinduer,Vurderingsår,Ydermur
0,"Dagmarsgade 36, 1 Lejl. 4",True,32,https://www.boligsiden.dk/viderestillingekster...,København N,Ejerlejlighed,True,55.6986,12.546053,,...,,,,,,,,,,
1,"Åboulevard 34D, 5 th",True,37,https://www.boligsiden.dk/viderestillingekster...,København N,Ejerlejlighed,False,55.684981,12.554957,,...,,,,,,,,,,
2,"Dagmarsgade 36, 4",True,32,https://www.boligsiden.dk/viderestillingekster...,København N,Ejerlejlighed,False,55.6986,12.546053,,...,,,,,,,,,,
3,"Søllerødgade 46, 5 tv",True,37,https://www.boligsiden.dk/viderestillingekster...,København N,Ejerlejlighed,False,55.696019,12.543893,,...,,,,,,,,,,
4,"Slejpnersgade 6, 1. 3.",False,44,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,False,55.701175,12.543234,2019-10-27T16:20,...,Efter aftale,-5%,1050000126.0,Whirlpool vaskemaskineGorenje gaskomfur - Bauk...,95.000 kr.,"Tilladt, jf. vedtægternes § 13",Vaskemaskine i lejligheden,Termo,2018.0,Mursten


In [None]:
df_new2.head()

In [75]:
df2.columns

Index(['adresse', 'andenmaegler', 'boligOrGrundAreal', 'boligurl', 'city',
       'ejendomstypePrimaerNicename', 'isNew', 'lat', 'lng',
       'openHouseEndDate', 'openHouseStartDate', 'overskrift2', 'postal',
       'price', 'sagsnummer', 'Afstand til indkøb',
       'Afstand til off. transport', 'Afstand til skole', 'Altan',
       'Antal plan', 'Antal rum', 'Antal toiletter', 'Antenne', 'Boligareal',
       'Boligydelse pr. måned',
       'Brutto/Netto\r\n                        ?\n\r\n                        ekskl. ejerudgift',
       'Byggeår', 'Ejendomsværdi i kr.', 'Ejerudgift pr. md.', 'El',
       'Energimærke', 'Etage', 'Fibernet', 'Forurening', 'Grundareal', 'Gulve',
       'Heraf grundværdi i kr.', 'Husdyr', 'Husdyr tilladt', 'Kontantpris',
       'Kvm. pris ?', 'Købspris', 'Overtagelse', 'Prisudvikling', 'Pulterrum',
       'Sagsnr.', 'Teknisk pris ?', 'Tilbehør', 'Udbetaling', 'Udlejning',
       'Udlejning tilladt', 'Vaskeri', 'Vinduer', 'Vurderingsår', 'Ydermur'],
     

In [76]:
df2.shape

(10, 55)

In [213]:
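# `dict` holds the parsed JSON from a previous search request (the name shadows the built-in type)
feats = dict['searchResults']
# Same search, restricted to a home size (BoligstoerrelseMin/Max) of 60 to 70, up to 200 results per page
url = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=200&BoligstoerrelseMin=60&BoligstoerrelseMax=70&q=2200&Energimaerker=null&SortOrder=asc&SearchType=0&_=1571481546474'
response = requests.get(url)
dict2 = response.json()
feats2 = dict2['searchResults']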

In [214]:
len(feats2)

21
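The search URL above packs every filter into the query string by hand. A minimal sketch of assembling it from a dictionary instead, using only the standard library. The helper name `build_search_url` is hypothetical, and the parameter names are simply copied from the URL above, with no claim about what else the endpoint accepts; `SortOrder` and the trailing `_` parameter are left out for brevity.

```python
from urllib.parse import urlencode, quote

BASE_URL = 'https://home.dk/umbraco/backoffice/home-api/SEARCH'

def build_search_url(query, page=0, per_page=200, size_min=None, size_max=None):
    """Hypothetical helper: assemble the search URL from keyword arguments."""
    params = {
        'CurrentPageNumber': page,
        'SearchResultsPerPage': per_page,
        'q': query,
        'Energimaerker': 'null',
        'SearchType': 0,
    }
    # Only add the size filters when they are given
    if size_min is not None:
        params['BoligstoerrelseMin'] = size_min
    if size_max is not None:
        params['BoligstoerrelseMax'] = size_max
    # quote_via=quote encodes spaces as %20, as in the URLs used in this notebook
    return BASE_URL + '?' + urlencode(params, quote_via=quote)

url = build_search_url('2200', size_min=60, size_max=70)
print(url)
```

Keeping the filters in a dictionary makes it easier to vary one of them (for example the page number) in a loop without editing a long string.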

In [188]:
for f in feats:
    del f['pictures']

TypeError: list indices must be integers or slices, not str
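The `TypeError` above says that `f` was a list when `f['pictures']` was evaluated, so at least one item in `feats` was not a dictionary at that point. A minimal sketch of a more defensive version, assuming the goal is only to strip one key from the records that are dictionaries; the helper name `drop_key` and the sample records are made up for illustration.

```python
def drop_key(records, key):
    """Remove `key` from every dict in `records`; count the items that are not dicts."""
    skipped = 0
    for record in records:
        if isinstance(record, dict):
            # pop with a default does not raise if the key is already missing
            record.pop(key, None)
        else:
            skipped += 1
    return skipped

# Made-up sample: two listing-like dicts and one stray list
sample = [
    {'adresse': 'A', 'pictures': ['1.jpg']},
    {'adresse': 'B'},
    ['not', 'a', 'dict'],
]
skipped = drop_key(sample, 'pictures')
print(sample, skipped)
```

Because `pop` is given a default, the cell can be rerun without failing on records whose key was already removed.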

In [144]:
len(feats)

200

In [165]:
feats[0]['floorPlan']

{'PicId': 2985754,
 'CaseId': 10395211,
 'CaseNumber': '1050000137',
 'MediaType': 'p',
 'MaxWidth': 3000,
 'MaxHeight': 2000,
 'URL': 'https://home.mindworking.eu/resources/shops/105/cases/1050000137/casemedia/images/d648ebc328cc9e08efd3e8f608061497/customsize.jpg?deviceId=jd83hsdf3',
 'Position': 0,
 'Description': 'Plantegning',
 'GUID': 'd648ebc3-28cc-9e08-efd3-e8f608061497',
 'refGUID': '00000000-0000-0000-0000-000000000000',
 'IsVertical': False,
 'IsHorizontal': True}

In [145]:
df = pd.DataFrame(feats)

In [179]:
# Drop columns that are not needed further on
df.drop(inplace=True, columns=[
    'billedeUrl', 'lejePerMaaned', 'showNewPrice',
    'aabenthusNicename', 'floorPlan', 'erSolgtOgLejebolig',
    'boligKanLejes', 'aabenthusShowRegistration',
    'solgtBolig', 'isLejebolig', 'fokusbolig'
])

In [149]:
df.postal.value_counts()

2200    200
Name: postal, dtype: int64

In [180]:
df.nunique()

adresse                        200
andenmaegler                     2
boligOrGrundAreal               80
city                             1
ejendomstypePrimaerNicename      3
isNew                            2
lat                            154
lng                            154
openHouseEndDate                 5
openHouseStartDate               5
overskrift2                     30
postal                           1
price                          114
sagsnummer                     200
dtype: int64
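In the output above, `city` and `postal` each have a single unique value, so they cannot help distinguish one listing from another. A minimal sketch of finding such constant fields directly on a list of records, written in plain Python rather than pandas so that it is self-contained. The helper name `constant_fields` and the sample rows (including the prices) are made up, and the sketch assumes every record has the same keys and hashable values.

```python
def constant_fields(records):
    """Return the keys whose value is identical in every record."""
    seen = {}
    for record in records:
        for key, value in record.items():
            # Collect the distinct values observed for each key
            seen.setdefault(key, set()).add(value)
    return sorted(key for key, values in seen.items() if len(values) == 1)

# Made-up sample rows for illustration only
rows = [
    {'city': 'København N', 'postal': '2200', 'price': 1000},
    {'city': 'København N', 'postal': '2200', 'price': 2000},
    {'city': 'København N', 'postal': '2200', 'price': 3000},
]
print(constant_fields(rows))
```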

In [70]:
df.nunique()

adresse                        10
andenmaegler                    1
boligOrGrundAreal              10
boligurl                       10
city                            1
ejendomstypePrimaerNicename     2
isNew                           2
lat                            10
lng                            10
openHouseEndDate                8
openHouseStartDate              8
overskrift2                    10
postal                          1
price                          10
sagsnummer                     10
dtype: int64

In [82]:
df.shape

(22, 27)

In [65]:
url = 'https://home.dk/resultatliste/?CurrentPageNumber=0&SearchResultsPerPage=15&q=2200%20K%C3%B8benhavn%20N&Energimaerker=null&SearchType=0'

In [11]:
content_div = soup.find_all('home-tile-info')
content_div

[]
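The empty list means the parsed document contains no `home-tile-info` tag. One possible explanation, which is an assumption here and not something the output confirms, is that the listing tiles are added by JavaScript after the page loads, so they are absent from HTML fetched without a browser. A minimal sketch of the difference, using the standard library's `html.parser` in place of BeautifulSoup to stay self-contained; both HTML strings are made up.

```python
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Count how many times a given start tag occurs in an HTML document."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag == self.tag:
            self.count += 1

def count_tag(html, tag):
    parser = TagCounter(tag)
    parser.feed(html)
    parser.close()
    return parser.count

# Made-up HTML: what a server might send versus what a browser might show after scripts run
static_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
rendered_html = '<div id="app"><home-tile-info></home-tile-info><home-tile-info></home-tile-info></div>'

print(count_tag(static_html, 'home-tile-info'), count_tag(rendered_html, 'home-tile-info'))
```

If that is what happens on this site, querying the JSON endpoint that the page itself calls is the more direct route to the data.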

In [21]:
urlreq = 'https://home.dk/umbraco/backoffice/home-api/BoligOrAddress/Boligdata?max=100&searchstring=2200'

In [24]:
# Import the JSON and URL libraries
import json
import urllib.request

# Request the URL and parse the response body as JSON
response = urllib.request.urlopen(urlreq)
jresponse = json.load(response)
# Write the parsed response to a file, pretty-printed
with open('asdaresp.json', 'w') as outfile:
    json.dump(jresponse, outfile, sort_keys=True, indent=4)
# json.load() has already consumed the stream, so this returns b''
response.read()

b''

In [26]:
req = urllib.request.Request(urlreq)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
print(the_page)

b'{"Successed":true,"Status":"OK","InputModel":{"SearchString":"2200","Max":100},"SuggestItems":[{"suggest":"2200 K\xc3\xb8benhavn N","count":"200","sortorder":40,"IsHeadLine":false}]}'
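The printed value is a `bytes` object holding UTF-8 encoded JSON, which is why `ø` shows up as `\xc3\xb8`. A minimal sketch of turning it into Python objects, using the exact bytes from the output above; only the variable names `data` and `first` are new.

```python
import json

# The raw bytes exactly as printed above
the_page = b'{"Successed":true,"Status":"OK","InputModel":{"SearchString":"2200","Max":100},"SuggestItems":[{"suggest":"2200 K\xc3\xb8benhavn N","count":"200","sortorder":40,"IsHeadLine":false}]}'

# Decode the bytes as UTF-8, then parse the JSON text
data = json.loads(the_page.decode('utf-8'))
first = data['SuggestItems'][0]

# The count arrives as a string, so convert it before using it as a number
print(first['suggest'], int(first['count']))
```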


In [66]:
import requests

url = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=15&q=2200%20K%C3%B8benhavn%20N&Energimaerker=null&SearchType=0&_=1571481546474'
response = requests.get(url)
# Parse the JSON body (note: the name `dict` shadows the built-in type)
dict = response.json()
dict

{'redirectUrl': None,
 'inputModel': {'SortType': None,
  'SortOrder': None,
  'CurrentPageNumber': 0,
  'SearchResultsPerPage': 15,
  'q': '2200 København N',
  'EjendomstypeV1': None,
  'EjendomstypeRH': None,
  'EjendomstypeEL': None,
  'EjendomstypeVL': None,
  'EjendomstypeAA': None,
  'EjendomstypePL': None,
  'EjendomstypeFH': None,
  'EjendomstypeLO': None,
  'EjendomstypeHG': None,
  'EjendomstypeFG': None,
  'EjendomstypeNL': None,
  'Forretningnr': None,
  'ProjectNodeId': None,
  'OnlyBrokerHome': None,
  'PriceMin': None,
  'PriceMax': None,
  'EjerudgiftPrMdrMin': None,
  'EjerudgiftPrMdrMax': None,
  'BoligydelsePrMdrMin': None,
  'BoligydelsePrMdrMax': None,
  'BoligstoerrelseMin': None,
  'BoligstoerrelseMax': None,
  'GrundstoerrelseMin': None,
  'GrundstoerrelseMax': None,
  'VaerelserMin': None,
  'VaerelserMax': None,
  'Energimaerker': ['null'],
  'ByggaarMin': None,
  'ByggaarMax': None,
  'EtageMin': None,
  'EtageMax': None,
  'PlanMin': None,
  'PlanMax': None

In [59]:
urlreq = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=15&q=2200%20K%C3%B8benhavn%20N&Energimaerker=null&SearchType=0&_=1571481546474'
response = urllib.request.urlopen(urlreq)
# Read the raw response bytes; urllib does not decode or decompress the body
response.read()

b'\xd5\xbd\xddr\x1c\xc7\x96\xa5\xf9*\x18\\\xf4\x95\x1c\x8cp\x0f\xf7\x88\xa0YY\x1bO\x89\xfa)Q?FR*\x1b+k\xa3yDx\x80\x10A\x80\x93\x99\x90\x8eN\xd9\xb9\x9c~\x86\xba\x9c\x9bj\xb3y\x83\xba\x1e\xbd\xd8|;\tGF\xa6\xa7\x8e\xe8$\x11\xec\xd0\xe9f\x91 \xc0\x05\xe4\xca\xf0\x9f\xbd\xd7^\xeb\xdfOWa\xb8X\x85~\xf3\xe3\xea\xf2\xf4\xe1\xd5\xcd\xe5\xe5g\xa7\x17Won6\xdf^\x0f\x81\x8f\xfc\xfb\xe9\xb3\xeb\xd5\xe6\xf9ooB\xfc[\xf9\xf3\xf7\xab!\xac\xe2\x07\xfe\xf9f\xb5\nW\x9b\x1f\xfcy\xf8\xee\xe6u\'\x7fQ|v\xfa,\xf8U\xff\xf2iX\xdf\\n\xd6?\x84\x95\xfc\xf5\xe9\xc3\xd2~v\xfa\x7f\x9d><\xd5\xba(N\xbe\xf9\xfd\xbf\xbap\xf5\xd2\xffru\xf2\xdd\xe9g\xa7\x8f\x7f\x0eW\xc3\xf5\xeb\xf5\x06\xb0\x9f\xca\xf8\xafO?\xfa\xf4\xabc\x1f}\xfc\xe4\xd8G\x7f:\xfa\xd1G\x8f\x8e}\xee\x0fG?\xf7\x8b\xa3hO\xbe?\xf6/|\xf5\xe5\xb1\x8f~q\xf4\xa3\xdf\xdd\xa1}q\xcd+\xb7\xb9\xba\xb8:\xbf\xba{5\x7fX]\xff\x0c\x1d\xdf\xf1\xf2\x7f=\xc4\x7f\xf4\xfb\xab\xcb\xdf\xfe\xb2\xba~\x15V_]\xbf\xbec\xe2\x87\xd5E\x1f\xbe\xbd\xb8\x8a\x9f\xf5\xf6\xcf\xfe\xaf\xf1\xcf\xbct\
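The bytes above are not readable text, which may mean the body arrived compressed. `urllib` hands back the body exactly as received, whereas `requests` undoes gzip and deflate compression automatically, which would fit with `response.json()` working on the same URL. A minimal sketch of decoding a body by hand, assuming the compression scheme is known from the `Content-Encoding` response header; the helper name `decode_body` and the sample payload are made up, and Brotli (`br`) is not handled because it is not in the standard library.

```python
import gzip
import zlib

def decode_body(body, content_encoding=''):
    """Return the response body as text, undoing gzip or deflate compression."""
    encoding = content_encoding.strip().lower()
    if encoding == 'gzip':
        body = gzip.decompress(body)
    elif encoding == 'deflate':
        try:
            # zlib-wrapped deflate stream
            body = zlib.decompress(body)
        except zlib.error:
            # fall back to a raw deflate stream without the zlib header
            body = zlib.decompress(body, -zlib.MAX_WBITS)
    # Any other value (including no encoding) is treated as plain bytes
    return body.decode('utf-8')

# Made-up payload, compressed locally to exercise the helper
payload = '{"q": "2200 København N"}'
compressed = gzip.compress(payload.encode('utf-8'))
print(decode_body(compressed, 'gzip') == payload)
```

With `urllib`, the header value could be read with `response.headers.get('Content-Encoding', '')` and passed in as the second argument.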

In [None]:
GET /resultatliste/?q=2200+K%C3%B8benhavn+N HTTP/1.1
Host: home.dk
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36
Sec-Fetch-Mode: navigate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Sec-Fetch-Site: same-origin
Referer: https://home.dk/
Accept-Encoding: gzip, deflate, br
Accept-Language: da-DK,da;q=0.9,en-US;q=0.8,en;q=0.7
Cookie: _ga=GA1.2.55867073.1571476775; _gid=GA1.2.440813985.1571476775; _gcl_au=1.1.826145108.1571476775; adv_guid=bb3bdd31-b9117e-fea1b-b23eb8-44b028e|ADV; CookieInformationConsent=%7B%22website_uuid%22%3A%22bfb17c80-64c9-4e36-bca8-739bd5bf03ee%22%2C%22timestamp%22%3A%222019-10-19T09%3A19%3A37.172Z%22%2C%22consent_url%22%3A%22https%3A%2F%2Fhome.dk%2F%22%2C%22consent_website%22%3A%22home.dk%22%2C%22consent_domain%22%3A%22home.dk%22%2C%22user_uid%22%3A%220e7584f9-4838-44f8-996a-90f14c9fc36c%22%2C%22consents_approved%22%3A%5B%22cookie_cat_necessary%22%2C%22cookie_cat_functional%22%2C%22cookie_cat_statistic%22%2C%22cookie_cat_marketing%22%2C%22cookie_cat_unclassified%22%5D%2C%22consents_denied%22%3A%5B%5D%2C%22user_agent%22%3A%22Mozilla%2F5.0%20%28Macintosh%3B%20Intel%20Mac%20OS%20X%2010_14_6%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F77.0.3865.120%20Safari%2F537.36%22%7D; ASP.NET_SessionId=eedeczunm3i0d4vnrcsipwvy


In [28]:

urlreq = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=15&q=2200%20K%C3%B8benhavn%20N&Energimaerker=null&SearchType=0&_=1571481546474'
req = urllib.request.Request(urlreq)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
print(the_page)

b'\xd5\xbd\xddr\x1c\xc7\x96\xa5\xf9*\x18\\\xf4\x95\x1c\x8cp\x0f\xf7\x88\xa0YY\x1bO\x89\xfa)Q?FR*\x1b+k\xa3yDx\x80\x10A\x80\x93\x99\x90\x8eN\xd9\xb9\x9c~\x86\xba\x9c\x9bj\xb3y\x83\xba\x1e\xbd\xd8|;\tGF\xa6\xa7\x8e\xe8$\x11\xec\xd0\xe9f\x91 \xc0\x05\xe4\xca\xf0\x9f\xbd\xd7^\xeb\xdfOWa\xb8X\x85~\xf3\xe3\xea\xf2\xf4\xe1\xd5\xcd\xe5\xe5g\xa7\x17Won6\xdf^\x0f\x81\x8f\xfc\xfb\xe9\xb3\xeb\xd5\xe6\xf9ooB\xfc[\xf9\xf3\xf7\xab!\xac\xe2\x07\xfe\xf9f\xb5\nW\x9b\x1f\xfcy\xf8\xee\xe6u\'\x7fQ|v\xfa,\xf8U\xff\xf2iX\xdf\\n\xd6?\x84\x95\xfc\xf5\xe9\xc3\xd2~v\xfa\x7f\x9d><\xd5\xba(N\xbe\xf9\xfd\xbf\xbap\xf5\xd2\xffru\xf2\xdd\xe9g\xa7\x8f\x7f\x0eW\xc3\xf5\xeb\xf5\x06\xb0\x9f\xca\xf8\xafO?\xfa\xf4\xabc\x1f}\xfc\xe4\xd8G\x7f:\xfa\xd1G\x8f\x8e}\xee\x0fG?\xf7\x8b\xa3hO\xbe?\xf6/|\xf5\xe5\xb1\x8f~q\xf4\xa3\xdf\xdd\xa1}q\xcd+\xb7\xb9\xba\xb8:\xbf\xba{5\x7fX]\xff\x0c\x1d\xdf\xf1\xf2\x7f=\xc4\x7f\xf4\xfb\xab\xcb\xdf\xfe\xb2\xba~\x15V_]\xbf\xbec\xe2\x87\xd5E\x1f\xbe\xbd\xb8\x8a\x9f\xf5\xf6\xcf\xfe\xaf\xf1\xcf\xbct\