# Predicting house selling prices in Denmark



## Initial overview of steps:
* Guiding research question(s)
* Scrape real estate agency websites (gathering)
* Load data and organize in tidy format (wrangling)
* Deal with data issues (wrangling)
* Exploratory analysis
* Focussed questions
* Explanatory analysis
* Prediction models

## Questions
* How can we predict home prices?


* Is it possible to predict listing prices based on characteristics of the home?
* If so, what features are most important?
* Which ones doesn't matter at all?

# Notes 
The CRISP-DM Process (Cross Industry Process for Data Mining)
The lessons leading up to the first project are about helping you go through CRISP-DM in practice from start to finish. Even when we get into the weeds of coding, try to take a step back and realize what part of the process you are in, and assure that you remember the question you are trying answer and what a solution to that question looks like.

1. Business Understanding

2. Data Understanding

3. Prepare Data

4. Data Modeling

5. Evaluate the Results

6. Deploy

In [1]:
# Importing libraries
import pandas as pd
import requests
import bs4
import time

Browsing Home, the largest real estate company in Denmark and playing arround with the developer tools, I managed to find HTTP call that seem to return the data of the listings.

In [10]:
# Using the home, the biggest real estate company in Denmark
#url = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=10&q=2200%20K%C3%B8benhavn%20N&Energimaerker=null&SearchType=0&_=1571481546474'
url = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=10&q=2200&Energimaerker=null&SearchType=0&_=1571481546474'
response = requests.get(url)
# Saving response to a dictionary
featuresDict = response.json()

In [12]:
# Checking our the data
featuresDict

{'redirectUrl': None,
 'inputModel': {'SortType': None,
  'SortOrder': None,
  'CurrentPageNumber': 0,
  'SearchResultsPerPage': 10,
  'q': '2200',
  'EjendomstypeV1': None,
  'EjendomstypeRH': None,
  'EjendomstypeEL': None,
  'EjendomstypeVL': None,
  'EjendomstypeAA': None,
  'EjendomstypePL': None,
  'EjendomstypeFH': None,
  'EjendomstypeLO': None,
  'EjendomstypeHG': None,
  'EjendomstypeFG': None,
  'EjendomstypeNL': None,
  'Forretningnr': None,
  'ProjectNodeId': None,
  'OnlyBrokerHome': None,
  'PriceMin': None,
  'PriceMax': None,
  'EjerudgiftPrMdrMin': None,
  'EjerudgiftPrMdrMax': None,
  'BoligydelsePrMdrMin': None,
  'BoligydelsePrMdrMax': None,
  'BoligstoerrelseMin': None,
  'BoligstoerrelseMax': None,
  'GrundstoerrelseMin': None,
  'GrundstoerrelseMax': None,
  'VaerelserMin': None,
  'VaerelserMax': None,
  'Energimaerker': ['null'],
  'ByggaarMin': None,
  'ByggaarMax': None,
  'EtageMin': None,
  'EtageMax': None,
  'PlanMin': None,
  'PlanMax': None,
  'Aabenth

What we want to extract seem to be withing the searchResult key:

In [13]:
featuresDict['searchResults']

[{'sagsnummer': '1050000139',
  'lng': 12.5457172243703,
  'lat': 55.6924852361034,
  'fokusbolig': False,
  'showNewPrice': False,
  'isNew': True,
  'adresse': 'Bjelkes Allé 6B, st..',
  'postal': 2200,
  'city': 'København N',
  'price': '2.095.000 ',
  'ejendomstypePrimaerNicename': 'Ejerlejlighed',
  'pictures': [{'PicId': 2993530,
    'CaseId': 10397003,
    'CaseNumber': '1050000139',
    'MediaType': 'b',
    'MaxWidth': 3000,
    'MaxHeight': 2000,
    'URL': 'https://home.mindworking.eu/resources/shops/105/cases/1050000139/casemedia/images/7687715b8b7896b4ff855797e16a8061/customsize.jpg?deviceId=jd83hsdf3',
    'Position': 0,
    'Description': 'Stue',
    'GUID': '7687715b-8b78-96b4-ff85-5797e16a8061',
    'refGUID': '00000000-0000-0000-0000-000000000000',
    'IsVertical': False,
    'IsHorizontal': True},
   {'PicId': 2993537,
    'CaseId': 10397003,
    'CaseNumber': '1050000139',
    'MediaType': 'b',
    'MaxWidth': 3000,
    'MaxHeight': 2000,
    'URL': 'https://home.

Great! This is the data we're interested in. However the pictures key contain a list of information, we don't need which would ruin the granularity should we convert it to a pandas Dataframe so let's drop it.

In [14]:
# dropping the pictures key from the list of dictionaries
features = featuresDict['searchResults']
for f in features:
    del f['pictures']
features

[{'sagsnummer': '1050000139',
  'lng': 12.5457172243703,
  'lat': 55.6924852361034,
  'fokusbolig': False,
  'showNewPrice': False,
  'isNew': True,
  'adresse': 'Bjelkes Allé 6B, st..',
  'postal': 2200,
  'city': 'København N',
  'price': '2.095.000 ',
  'ejendomstypePrimaerNicename': 'Ejerlejlighed',
  'floorPlan': {'PicId': 2993542,
   'CaseId': 10397003,
   'CaseNumber': '1050000139',
   'MediaType': 'p',
   'MaxWidth': 3000,
   'MaxHeight': 2000,
   'URL': 'https://home.mindworking.eu/resources/shops/105/cases/1050000139/casemedia/images/2f0b1e7e3e1981c99f5d514ebf3f9869/customsize.jpg?deviceId=jd83hsdf3',
   'Position': 0,
   'Description': 'Plantegning',
   'GUID': '2f0b1e7e-3e19-81c9-9f5d-514ebf3f9869',
   'refGUID': '00000000-0000-0000-0000-000000000000',
   'IsVertical': False,
   'IsHorizontal': True},
  'boligOrGrundAreal': 54,
  'andenmaegler': False,
  'boligurl': 'https://home.dk/boligkatalog/koebenhavn/2200/ejerlejligheder/bjelkes_alle_6b_st_1050000139.aspx',
  'billede

The data seem ready to be loaded to a pandas dataframe.

In [15]:
df = pd.DataFrame(features)
df.head()

Unnamed: 0,aabenthusNicename,aabenthusShowRegistration,adresse,andenmaegler,billedeUrl,boligKanLejes,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,...,lejePerMaaned,lng,openHouseEndDate,openHouseStartDate,overskrift2,postal,price,sagsnummer,showNewPrice,solgtBolig
0,27.10 kl. 12.00-12.30,False,"Bjelkes Allé 6B, st..",False,https://home.mindworking.eu/resources/shops/10...,0,54,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,...,,12.545717,2019-10-27T12:30,2019-10-27T12:00,,2200,2.095.000,1050000139,False,False
1,27.10 kl. 14.30-14.50,False,"Poppelgade 4, 1. th.",False,https://home.mindworking.eu/resources/shops/10...,0,105,https://home.dk/boligkatalog/koebenhavn/2200/a...,København N,Andelsbolig,...,,12.559357,2019-10-27T14:50,2019-10-27T14:30,Beliggende i baghuset,2200,1.799.000,1050000162,False,False
2,27.10 kl. 13.30-13.50,False,"Husumgade 20, 2. th.",False,https://home.mindworking.eu/resources/shops/10...,0,53,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,...,,12.5454,2019-10-27T13:50,2019-10-27T13:30,Et super godt køb!,2200,2.399.000,1050000164,False,False
3,27.10 kl. 13.30-13.50,False,"Egegade 2, 1. th.",False,https://home.mindworking.eu/resources/shops/10...,0,78,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,...,,12.559457,2019-10-27T13:50,2019-10-27T13:30,Med altan og stort badeværelse,2200,3.999.000,1050000167,False,False
4,27.10 kl. 11.00-11.20,False,"Fredensborggade 2, 1. th.",False,https://home.mindworking.eu/resources/shops/10...,0,56,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,...,,12.53988,2019-10-27T11:20,2019-10-27T11:00,Super beliggenhed på Nørrebro,2200,2.199.000,1050000137,False,False


Let's remove columns that are not of interest.

In [16]:
df.drop(inplace = True, columns=[
    'billedeUrl','lejePerMaaned','showNewPrice',
    'aabenthusNicename','floorPlan','erSolgtOgLejebolig',
    'boligKanLejes','aabenthusShowRegistration', 
    'solgtBolig','isLejebolig','fokusbolig'
])

In [17]:
df.head()

Unnamed: 0,adresse,andenmaegler,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,isNew,lat,lng,openHouseEndDate,openHouseStartDate,overskrift2,postal,price,sagsnummer
0,"Bjelkes Allé 6B, st..",False,54,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.692485,12.545717,2019-10-27T12:30,2019-10-27T12:00,,2200,2.095.000,1050000139
1,"Poppelgade 4, 1. th.",False,105,https://home.dk/boligkatalog/koebenhavn/2200/a...,København N,Andelsbolig,True,55.692049,12.559357,2019-10-27T14:50,2019-10-27T14:30,Beliggende i baghuset,2200,1.799.000,1050000162
2,"Husumgade 20, 2. th.",False,53,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.693495,12.5454,2019-10-27T13:50,2019-10-27T13:30,Et super godt køb!,2200,2.399.000,1050000164
3,"Egegade 2, 1. th.",False,78,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.690345,12.559457,2019-10-27T13:50,2019-10-27T13:30,Med altan og stort badeværelse,2200,3.999.000,1050000167
4,"Fredensborggade 2, 1. th.",False,56,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.698624,12.53988,2019-10-27T11:20,2019-10-27T11:00,Super beliggenhed på Nørrebro,2200,2.199.000,1050000137


The 'boligurl' is the URL to the site of each piece of real estate for sale, so let's use that to get more features!

In [18]:
response = requests.get(df['boligurl'][0])
html = response.text

In [22]:
html

'\r\n<!DOCTYPE html>\r\n<html lang="da" class="no-js" ng-app="home" xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://ogp.me/ns/fb#">\r\n<head>\r\n    <script id="CookieConsent" src="https://policy.cookieinformation.com/uc.js" data-culture="DA" async></script>\r\n    \r\n<script>(function(H){H.className=H.className.replace(/\\bno-js\\b/,\'js\')})(document.documentElement)</script>\r\n<meta charset="utf-8">\r\n<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\r\n<meta id="viewport" name="viewport" content="width=device-width, initial-scale=1, maximum-scale=2">\r\n<meta name="format-detection" content="telephone=no">\r\n<title>Ejerlejlighed - 2200 København N - Bjelkes Allé 6B, st..</title>\r\n<meta name="title" content="Ejerlejlighed - 2200 København N - Bjelkes Allé 6B, st..">\r\n<meta name="keywords" content="" />\r\n<meta name="description" content="Ejerlejlighed til salg, København N - Førstehåndsindtrykket er rigtig godt, når I træder indenfor i entréen, for allere

The stuff we want is in the info-property and info-value class.

In [39]:
soup = bs4.BeautifulSoup(html, "html.parser")
additionalFeatures = soup.find_all('span', {"class": ["info-property","info-value"]})

[<span class="info-property">Kontantpris</span>,
 <span class="info-value"><b>3.650.000  kr.</b></span>,
 <span class="info-property">Ejerudgift pr. md.</span>,
 <span class="info-value"><b>2.356  kr.</b></span>,
 <span class="info-property">Kvm. pris <i class="tipso" title="Kvm-prisen er baseret på et vægtet areal,  som er mere præcist, fordi der også tages højde for kælderarealer, loftsarealer, udhuse etc. - og ikke kun boligareal. ">?</i></span>,
 <span class="info-value"><b>40.109  kr.</b></span>,
 <span class="info-property">Udbetaling</span>,
 <span class="info-value"><b>185.000  kr.</b></span>,
 <span class="info-property">
                         Brutto/Netto
                         <i class="tipso" title="I brutto- og nettoydelsen indgår standardfinansiering. Da der er tale om en standardfinansiering, vil den i visse tilfælde ikke kunne opnås, hvorfor brutto- og nettoydelsen i så fald kan afvige.">?</i>
 <br>
                         ekskl. ejerudgift
                     </

They come in pairs and we need them divivded into key-value pairs.

In [64]:
# Loop through each span in the list
#import json
count = 0
keys = []
values = []
for feat in additionalFeatures:
    if count % 2: # Odd number is a value
        values.append(feat.text.strip())
        #values.append(re.findall('<b>.+</b>',str(feat))[0][3:-4])
    else: # Even number is a key
        keys.append(feat.text.strip())
        #keys.append(re.findall('>.+<',str(feat))[0][1:-1])
    count +=1 
dictionary = dict(zip(keys, values))
dictionary

{'Kontantpris': '3.650.000  kr.',
 'Ejerudgift pr. md.': '2.356  kr.',
 'Kvm. pris ?': '40.109  kr.',
 'Udbetaling': '185.000  kr.',
 'Brutto/Netto\r\n                        ?\n\r\n                        ekskl. ejerudgift': '14.114  / 12.357  kr.',
 'Prisudvikling': '0%',
 'Boligareal': '91  m2',
 'Grundareal': '570  m2',
 'Antal toiletter': '1',
 'Antal rum': '3',
 'Byggeår': '1906',
 'Energimærke': 'D',
 'Sagsnr.': '1050000133',
 'Afstand til off. transport': '200  m',
 'Afstand til skole': '500  m',
 'Afstand til indkøb': '300  m',
 'Ydermur': 'Mursten',
 'Gulve': 'Plankegulve',
 'Vinduer': 'Termo',
 'El': 'HPFI-relæ',
 'Forurening': 'Jf. udskrift fra RegionH',
 'Overtagelse': 'Efter aftale',
 'Antenne': 'Kabel-tv',
 'Vaskeri': 'Ja',
 'Udlejning tilladt': 'Ja, jf. vedtægterne',
 'Tilbehør': 'Indesit opvaskemaskineGram køleskabVoss ovn',
 'Ejendomsværdi i kr.': '1.600.000',
 'Heraf grundværdi i kr.': '112.200',
 'Vurderingsår': '2018'}

This should be repeated for each line in the dataframe and to be appended as columns. Let's create a function for this.

In [25]:
def get_additional_features(df):
    """Function for getting additional features from each of the listings. Input is the dataframe."""
    additionalFeaturesList = []
    counter = 0
    loops = df.shape[0]
    # Loop through all rows
    for i in df['boligurl']:
        try:
            response = requests.get(i)
            html = response.text
            soup = bs4.BeautifulSoup(html, "html.parser")
            additionalFeatures = soup.find_all('span', {"class": ["info-property","info-value"]})

            # Loop through each span in the list
            count = 0
            keys = []
            values = []
            for feat in additionalFeatures:
                if count % 2: # Odd number is a value
                    values.append(feat.text.strip())
                else: # Even number is a key
                    keys.append(feat.text.strip())
                count +=1 
        except:
            keys.append('Connection timed out')
            values.append('True')
            
        additionalFeaturesList.append(dict(zip(keys, values)))
        time.sleep(2)
        counter += 1
        print('Progress {}'.format((float(counter)/float(loops))*100.))
    df2 = df.join(pd.DataFrame(additionalFeaturesList))
    return df2

In [86]:
df2 = get_additional_features(df)
df2.head()

Unnamed: 0,adresse,andenmaegler,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,isNew,lat,lng,openHouseEndDate,...,Sagsnr.,Teknisk pris ?,Tilbehør,Udbetaling,Udlejning,Udlejning tilladt,Vaskeri,Vinduer,Vurderingsår,Ydermur
0,"Bjelkes Allé 6B, st..",False,54,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.692485,12.545717,2019-10-27T12:30,...,1050000139,,Gorenje komfurAEG køle/fryseskabBosch emhætteh...,105.000 kr.,,Tilladt,Ja,,2018,Mursten
1,"Poppelgade 4, 1. th.",False,105,https://home.dk/boligkatalog/koebenhavn/2200/a...,København N,Andelsbolig,True,55.692049,12.559357,2019-10-27T14:50,...,1050000162,3.879.803 kr.,Bosch køle/fryseskabAEG vaskemaskine,,,"Tilladt i kortere periode, jf. vedtægternes § ...",Ja,Termo,2018,Mursten
2,"Husumgade 20, 2. th.",False,53,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.693495,12.5454,2019-10-27T13:50,...,1050000164,,Afventer oplysninger fra sælger,120.000 kr.,Tilladt,,Fællesvaskeri,Termo,2018,Pudset mursten
3,"Egegade 2, 1. th.",False,78,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.690345,12.559457,2019-10-27T13:50,...,1050000167,,Gram køle/fryseskabSiemens komfurElectrolux va...,200.000 kr.,,Med tilladelse fra ejerforeningens bestyrelse,Nej,Termo,2018,Mursten
4,"Fredensborggade 2, 1. th.",False,56,https://home.dk/boligkatalog/koebenhavn/2200/e...,København N,Ejerlejlighed,True,55.698624,12.53988,2019-10-27T11:20,...,1050000137,,Køleskab fra Blomberg (A+)Komfur fra SMEG,110.000 kr.,Tilladt,,Fællesvaskeri,Termo,2018,Mursten


In [88]:
df2.columns

Index(['adresse', 'andenmaegler', 'boligOrGrundAreal', 'boligurl', 'city',
       'ejendomstypePrimaerNicename', 'isNew', 'lat', 'lng',
       'openHouseEndDate', 'openHouseStartDate', 'overskrift2', 'postal',
       'price', 'sagsnummer', 'Afstand til indkøb',
       'Afstand til off. transport', 'Afstand til skole', 'Altan',
       'Antal plan', 'Antal rum', 'Antal toiletter', 'Antenne', 'Boligareal',
       'Boligydelse pr. måned',
       'Brutto/Netto\r\n                        ?\n\r\n                        ekskl. ejerudgift',
       'Byggeår', 'Ejendomsværdi i kr.', 'Ejerudgift pr. md.', 'El',
       'Energimærke', 'Etage', 'Fibernet', 'Forurening', 'Grundareal', 'Gulve',
       'Heraf grundværdi i kr.', 'Husdyr', 'Husdyr tilladt', 'Kontantpris',
       'Kvm. pris ?', 'Købspris', 'Overtagelse', 'Prisudvikling', 'Pulterrum',
       'Sagsnr.', 'Teknisk pris ?', 'Tilbehør', 'Udbetaling', 'Udlejning',
       'Udlejning tilladt', 'Vaskeri', 'Vinduer', 'Vurderingsår', 'Ydermur'],
     

Alright, we can now do this entire process for multiple zip codes and more than 10 returns.

Note: Through trial and error I found the maximum number of returns to be 200 and in order to get all the data, we can use the URL to add search criteria to split our results into smaller bins.

In [141]:
# Zip codes in Denmark
#zipCode = [2200, 9210]
zipCode = [1301,2000,2100,2200,2300,2400,2450,2500,2600,2605,2610,2625,2630,
           2635,2640,2650,2660,2665,2670,2670,2680,2690,2700,2720,2730,2740,
           2750,2760,2765,2770,2791,2800,2820,2830,2840,2850,2860,2880,2900,
           2920,2930,2942,2950,2960,2970,2980,2990,3000,3050,3060,3070,3080,
           3100,3120,3140,3150,3200,3210,3220,3230,3250,3300,3310,3320,3330,
           3360,3370,3390,3400,3460,3480,3490,3500,3520,3540,3550,3600,3630,
           3650,3660,3670,3700,3720,3730,3740,3751,3760,3770,3782,3790,4000,
           4040,4050,4060,4070,4100,4130,4140,4160,4171,4173,4174,4180,4190,
           4200,4220,4230,4241,4242,4243,4250,4261,4262,4270,4281,4291,4293,
           4295,4296,4300,4320,4330,4340,4350,4360,4370,4390,4400,4420,4440,
           4450,4460,4470,4480,4490,4500,4520,4532,4534,4540,4550,4560,4571,
           4572,4573,4581,4583,4591,4592,4593,4600,4621,4622,4623,4632,4640,
           4652,4653,4654,4660,4671,4672,4673,4681,4682,4683,4684,4690,4700,
           4720,4733,4735,4736,4750,4760,4771,4772,4773,4780,4791,4792,4793,
           4800,4840,4850,4862,4863,4871,4872,4873,4874,4880,4891,4892,4894,
           4895,4900,4912,4913,4920,4930,4941,4943,4944,4951,4952,4953,4960,
           4970,4983,4990,5000,5200,5210,5220,5230,5240,5250,5260,5270,5290,
           5300,5330,5350,5370,5380,5390,5400,5450,5462,5463,5464,5466,5471,
           5474,5485,5491,5492,5500,5540,5550,5560,5580,5591,5592,5600,5610,
           5620,5631,5642,5672,5683,5690,5700,5750,5762,5771,5772,5792,5800,
           5853,5854,5856,5863,5871,5874,5881,5882,5883,5884,5892,5900,5932,
           5935,5953,5960,5970,5985,6000,6040,6051,6052,6064,6070,6091,6092,
           6093,6094,6100,6200,6230,6240,6261,6270,6280,6300,6310,6320,6330,
           6340,6360,6372,6392,6400,6430,6440,6470,6500,6510,6520,6535,6541,
           6560,6580,6600,6621,6622,6623,6630,6640,6650,6660,6670,6682,6683,
           6690,6700,6701,6705,6710,6715,6720,6731,6740,6752,6760,6771,6780,
           6792,6800,6818,6823,6830,6840,6851,6852,6853,6854,6855,6857,6862,
           6870,6880,6893,6900,6920,6933,6940,6950,6960,6971,6973,6980,6990,
           7000,7080,7100,7120,7130,7140,7150,7160,7171,7173,7182,7183,7184,
           7190,7200,7250,7260,7270,7280,7300,7321,7323,7330,7361,7362,7400,
           7430,7441,7442,7451,7470,7480,7490,7500,7540,7550,7560,7570,7600,
           7620,7650,7660,7673,7680,7700,7730,7741,7742,7752,7755,7760,7770,
           7790,7800,7830,7840,7850,7860,7870,7884,7900,7950,7960,7970,7980,
           7990,8000,8200,8210,8220,8230,8240,8250,8260,8270,8300,8305,8310,
           8320,8330,8340,8350,8355,8361,8362,8370,8380,8381,8382,8400,8410,
           8420,8444,8450,8462,8464,8471,8472,8500,8520,8530,8541,8543,8544,
           8550,8560,8570,8581,8585,8586,8592,8600,8620,8632,8641,8643,8653,
           8654,8660,8670,8680,8700,8721,8722,8723,8732,8740,8751,8752,8762,
           8763,8765,8766,8781,8783,8800,8830,8831,8832,8840,8850,8860,8870,
           8881,8882,8883,8900,8950,8961,8963,8970,8981,8983,8990,9000,9200,
           9210,9220,9230,9240,9260,9270,9280,9293,9300,9310,9320,9330,9340,
           9352,9362,9370,9380,9381,9382,9400,9430,9440,9460,9480,9490,9492,
           9493,9500,9510,9520,9530,9541,9550,9560,9574,9575,9600,9610,9620,
           9631,9632,9640,9670,9681,9690,9700,9740,9750,9760,9800,9830,9850,
           9870,9881,9900,9940,9970,9981,9982,9990
          ]


In [142]:

featureList = []
# Loop through zip codes
for code in zipCode:
    # If the zipcode is in one of the larger cities, split the search into chunks based on size
    if code in [1301, 2000, 2100, 2200, 2300, 2400, 2450, 2500,
                5000, 5200, 5210, 5220, 5230, 5240, 5250, 5260,
                5270, 8000, 8200, 8210, 8220, 8230, 8240, 9000,
                9200, 9210, 9220
               ]:
        # Setting size interval to bin responses into smaller chunks
        minSize = 11
        maxSize = 20
        # Loop through sizes
        for i in range(28):
            url = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=200&BoligstoerrelseMin=' + str(minSize) + '&BoligstoerrelseMax=' + str(maxSize) + '&q=' + str(code) + '&Energimaerker=null&SortOrder=asc&SearchType=0&_=1571481546474'

            response = requests.get(url)
            # Saving response to a dictionary
            featuresDict = response.json()
            # dropping the pictures key from the list of dictionaries
            features = featuresDict['searchResults']
            for f in features:
                del f['pictures']
            featureList.extend(features)
            # Pausing to not be a dick towards the server
            time.sleep(1)
            
            # Count up sizes
            minSize += 10
            maxSize += 10
        
        # Run one additional time with out the maximum boundry
        url = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=200&BoligstoerrelseMin=' + str(minSize) + '&q=' + str(code) + '&Energimaerker=null&SortOrder=asc&SearchType=0&_=1571481546474'

        response = requests.get(url)
        # Saving response to a dictionary
        featuresDict = response.json()
        # dropping the pictures key from the list of dictionaries
        features = featuresDict['searchResults']
        for f in features:
            del f['pictures']
        featureList.extend(features)
    # If the zipcode not in a larger city
    else:
        url = 'https://home.dk/umbraco/backoffice/home-api/SEARCH?CurrentPageNumber=0&SearchResultsPerPage=200&q=' + str(code) + '&Energimaerker=null&SortOrder=asc&SearchType=0&_=1571481546474'

        response = requests.get(url)
        # Saving response to a dictionary
        featuresDict = response.json()
        # dropping the pictures key from the list of dictionaries
        features = featuresDict['searchResults']
        for f in features:
            del f['pictures']
        featureList.extend(features)

len(featureList)

51207

In [143]:
df_new = pd.DataFrame(featureList)
df_new.drop(inplace = True, columns=[
    'billedeUrl','lejePerMaaned','showNewPrice',
    'aabenthusNicename','floorPlan','erSolgtOgLejebolig',
    'boligKanLejes','aabenthusShowRegistration', 
    'solgtBolig','isLejebolig','fokusbolig'
])
df_new.head()

Unnamed: 0,adresse,andenmaegler,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,isNew,lat,lng,openHouseEndDate,openHouseStartDate,overskrift2,postal,price,sagsnummer
0,"A.D. Jørgensens Vej 75, 2. 1.",False,35.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.680726,12.494705,2019-10-27T11:20,2019-10-27T11:00,,2000,1.350.000,1300000111
1,"Holger Danskes Vej 14, 3. th.",False,46.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.686615,12.538356,,,,2000,2.145.000,1740000062
2,"Holger Danskes Vej 12, 3 tv",True,46.0,https://www.boligsiden.dk/viderestillingekster...,Frederiksberg,Ejerlejlighed,False,55.686592,12.538423,,,,2000,2.150.000,20002433_10007
3,"Ane Katrines Vej 16, 2 4",True,45.0,https://www.boligsiden.dk/viderestillingekster...,Frederiksberg,Ejerlejlighed,False,55.691713,12.535145,,,,2000,2.099.000,303562_100910_10009
4,"Ane Katrines Vej 16, St. 1",True,45.0,https://www.boligsiden.dk/viderestillingekster...,Frederiksberg,Ejerlejlighed,False,55.691713,12.535145,,,,2000,2.295.000,100088_103402_10009


In [144]:
df_new.to_csv('baseData.csv')

In [3]:
df_new = pd.read_csv('baseData.csv',index_col = 0 )
df_new.head()

Unnamed: 0,adresse,andenmaegler,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,isNew,lat,lng,openHouseEndDate,openHouseStartDate,overskrift2,postal,price,sagsnummer
0,"A.D. Jørgensens Vej 75, 2. 1.",False,35.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.680726,12.494705,2019-10-27T11:20,2019-10-27T11:00,,2000,1.350.000,1300000111
1,"Holger Danskes Vej 14, 3. th.",False,46.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.686615,12.538356,,,,2000,2.145.000,1740000062
2,"Holger Danskes Vej 12, 3 tv",True,46.0,https://www.boligsiden.dk/viderestillingekster...,Frederiksberg,Ejerlejlighed,False,55.686592,12.538423,,,,2000,2.150.000,20002433_10007
3,"Ane Katrines Vej 16, 2 4",True,45.0,https://www.boligsiden.dk/viderestillingekster...,Frederiksberg,Ejerlejlighed,False,55.691713,12.535145,,,,2000,2.099.000,303562_100910_10009
4,"Ane Katrines Vej 16, St. 1",True,45.0,https://www.boligsiden.dk/viderestillingekster...,Frederiksberg,Ejerlejlighed,False,55.691713,12.535145,,,,2000,2.295.000,100088_103402_10009


Inspecting the data it's clear that we have some appartments from other real estate agencies, but how many? ('andenmaegler' tranlates to other real estate agency)

In [17]:
df_new.query('andenmaegler == True')['adresse'].count()

41777

Turns out it's actually the majority. It's not very likely the previously created function for getting additional features would work for the remaining URLs.

For now we can get the additional features for the Home listings.

In [18]:
df_new.query('andenmaegler == False')['adresse'].count()

9430

In [19]:
df_home = df_new.query('andenmaegler == False')
df_home.head()

Unnamed: 0,adresse,andenmaegler,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,isNew,lat,lng,openHouseEndDate,openHouseStartDate,overskrift2,postal,price,sagsnummer
0,"A.D. Jørgensens Vej 75, 2. 1.",False,35.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.680726,12.494705,2019-10-27T11:20,2019-10-27T11:00,,2000,1.350.000,1300000111
1,"Holger Danskes Vej 14, 3. th.",False,46.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.686615,12.538356,,,,2000,2.145.000,1740000062
11,"Lyøvej 5, st.. tv.",False,60.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.68294,12.524527,,,HØJ STUELEJLIGHED MED STOR ALTAN - LAV EJERUDG...,2000,2.875.000,1300000128
12,"H. Schneekloths Vej 13, 5. th.",False,56.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.679928,12.506927,,,Lys og indflytningsklar stand – perfekt delele...,2000,2.750.000,130D01015
30,"Howitzvej 61, 3. th.",False,67.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,True,55.680209,12.523998,2019-11-03T15:20,2019-11-03T15:00,Nyere bad & køkken - i kort afstand til Freder...,2000,3.195.000,1300000176


In [26]:
df_home_add = get_additional_features(df_home)

Progress 0.010604453870625662
Progress 0.021208907741251323
Progress 0.03181336161187699
Progress 0.04241781548250265
Progress 0.05302226935312832
Progress 0.06362672322375398
Progress 0.07423117709437964
Progress 0.0848356309650053
Progress 0.09544008483563096
Progress 0.10604453870625664
Progress 0.11664899257688228
Progress 0.12725344644750797
Progress 0.1378579003181336
Progress 0.14846235418875928
Progress 0.15906680805938495
Progress 0.1696712619300106
Progress 0.18027571580063625
Progress 0.19088016967126192
Progress 0.20148462354188762
Progress 0.2120890774125133
Progress 0.2226935312831389
Progress 0.23329798515376457
Progress 0.24390243902439024
Progress 0.25450689289501593
Progress 0.2651113467656416
Progress 0.2757158006362672
Progress 0.28632025450689286
Progress 0.29692470837751855
Progress 0.3075291622481442
Progress 0.3181336161187699
Progress 0.3287380699893956
Progress 0.3393425238600212
Progress 0.34994697773064687
Progress 0.3605514316012725
Progress 0.3711558854718

Progress 3.138918345705196
Progress 3.149522799575822
Progress 3.160127253446447
Progress 3.1707317073170733
Progress 3.1813361611876987
Progress 3.1919406150583245
Progress 3.2025450689289503
Progress 3.2131495227995757
Progress 3.223753976670202
Progress 3.234358430540827
Progress 3.244962884411453
Progress 3.255567338282078
Progress 3.2661717921527043
Progress 3.2767762460233296
Progress 3.2873806998939554
Progress 3.2979851537645812
Progress 3.3085896076352066
Progress 3.319194061505833
Progress 3.3297985153764578
Progress 3.340402969247084
Progress 3.3510074231177094
Progress 3.361611876988335
Progress 3.3722163308589606
Progress 3.3828207847295864
Progress 3.3934252386002126
Progress 3.4040296924708375
Progress 3.414634146341464
Progress 3.425238600212089
Progress 3.435843054082715
Progress 3.4464475079533403
Progress 3.457051961823966
Progress 3.4676564156945915
Progress 3.4782608695652173
Progress 3.4888653234358427
Progress 3.4994697773064685
Progress 3.5100742311770947
Progre

Progress 6.341463414634147
Progress 6.352067868504772
Progress 6.362672322375397
Progress 6.373276776246023
Progress 6.383881230116649
Progress 6.394485683987275
Progress 6.405090137857901
Progress 6.415694591728525
Progress 6.426299045599151
Progress 6.436903499469778
Progress 6.447507953340404
Progress 6.458112407211028
Progress 6.468716861081654
Progress 6.47932131495228
Progress 6.489925768822906
Progress 6.5005302226935315
Progress 6.511134676564156
Progress 6.521739130434782
Progress 6.5323435843054085
Progress 6.542948038176035
Progress 6.553552492046659
Progress 6.564156945917285
Progress 6.574761399787911
Progress 6.585365853658537
Progress 6.5959703075291625
Progress 6.606574761399788
Progress 6.617179215270413
Progress 6.6277836691410394
Progress 6.638388123011666
Progress 6.64899257688229
Progress 6.6595970307529155
Progress 6.670201484623542
Progress 6.680805938494168
Progress 6.691410392364794
Progress 6.702014846235419
Progress 6.712619300106044
Progress 6.72322375397667

Progress 9.575821845174975
Progress 9.586426299045598
Progress 9.597030752916226
Progress 9.607635206786851
Progress 9.618239660657476
Progress 9.628844114528102
Progress 9.639448568398727
Progress 9.650053022269352
Progress 9.66065747613998
Progress 9.671261930010605
Progress 9.68186638388123
Progress 9.692470837751856
Progress 9.703075291622481
Progress 9.713679745493108
Progress 9.724284199363732
Progress 9.734888653234359
Progress 9.745493107104984
Progress 9.75609756097561
Progress 9.766702014846237
Progress 9.77730646871686
Progress 9.787910922587487
Progress 9.798515376458113
Progress 9.809119830328738
Progress 9.819724284199363
Progress 9.830328738069989
Progress 9.840933191940614
Progress 9.851537645811241
Progress 9.862142099681867
Progress 9.872746553552492
Progress 9.883351007423117
Progress 9.893955461293743
Progress 9.90455991516437
Progress 9.915164369034994
Progress 9.92576882290562
Progress 9.936373276776246
Progress 9.946977730646871
Progress 9.957582184517497
Progres

Progress 12.725344644750795
Progress 12.735949098621422
Progress 12.746553552492045
Progress 12.757158006362673
Progress 12.767762460233298
Progress 12.778366914103923
Progress 12.78897136797455
Progress 12.799575821845174
Progress 12.810180275715801
Progress 12.820784729586427
Progress 12.83138918345705
Progress 12.841993637327679
Progress 12.852598091198303
Progress 12.86320254506893
Progress 12.873806998939555
Progress 12.884411452810179
Progress 12.895015906680808
Progress 12.905620360551431
Progress 12.916224814422057
Progress 12.926829268292684
Progress 12.937433722163307
Progress 12.948038176033934
Progress 12.95864262990456
Progress 12.969247083775185
Progress 12.979851537645812
Progress 12.990455991516436
Progress 13.001060445387063
Progress 13.011664899257688
Progress 13.022269353128312
Progress 13.032873806998941
Progress 13.043478260869565
Progress 13.054082714740192
Progress 13.064687168610817
Progress 13.07529162248144
Progress 13.08589607635207
Progress 13.09650053022269

Progress 15.853658536585366
Progress 15.864262990455991
Progress 15.874867444326618
Progress 15.885471898197242
Progress 15.896076352067867
Progress 15.906680805938494
Progress 15.91728525980912
Progress 15.927889713679747
Progress 15.93849416755037
Progress 15.949098621420996
Progress 15.959703075291623
Progress 15.970307529162248
Progress 15.980911983032875
Progress 15.991516436903499
Progress 16.002120890774123
Progress 16.01272534464475
Progress 16.023329798515377
Progress 16.033934252386004
Progress 16.044538706256628
Progress 16.05514316012725
Progress 16.06574761399788
Progress 16.076352067868505
Progress 16.08695652173913
Progress 16.097560975609756
Progress 16.10816542948038
Progress 16.118769883351007
Progress 16.129374337221634
Progress 16.139978791092258
Progress 16.150583244962885
Progress 16.16118769883351
Progress 16.171792152704136
Progress 16.182396606574763
Progress 16.193001060445386
Progress 16.203605514316013
Progress 16.214209968186637
Progress 16.224814422057264


Progress 19.003181336161187
Progress 19.013785790031815
Progress 19.024390243902438
Progress 19.034994697773065
Progress 19.045599151643692
Progress 19.056203605514316
Progress 19.06680805938494
Progress 19.077412513255567
Progress 19.088016967126194
Progress 19.09862142099682
Progress 19.109225874867445
Progress 19.119830328738068
Progress 19.130434782608695
Progress 19.141039236479322
Progress 19.15164369034995
Progress 19.162248144220573
Progress 19.172852598091197
Progress 19.183457051961824
Progress 19.19406150583245
Progress 19.204665959703078
Progress 19.215270413573702
Progress 19.225874867444325
Progress 19.236479321314953
Progress 19.247083775185576
Progress 19.257688229056203
Progress 19.26829268292683
Progress 19.278897136797454
Progress 19.28950159066808
Progress 19.300106044538705
Progress 19.310710498409332
Progress 19.32131495227996
Progress 19.331919406150583
Progress 19.34252386002121
Progress 19.353128313891833
Progress 19.36373276776246
Progress 19.374337221633088
P

Progress 22.152704135737007
Progress 22.163308589607635
Progress 22.17391304347826
Progress 22.18451749734889
Progress 22.195121951219512
Progress 22.205726405090136
Progress 22.216330858960763
Progress 22.22693531283139
Progress 22.237539766702014
Progress 22.24814422057264
Progress 22.258748674443265
Progress 22.26935312831389
Progress 22.27995758218452
Progress 22.290562036055142
Progress 22.30116648992577
Progress 22.311770943796393
Progress 22.32237539766702
Progress 22.332979851537647
Progress 22.34358430540827
Progress 22.354188759278898
Progress 22.36479321314952
Progress 22.375397667020145
Progress 22.386002120890776
Progress 22.3966065747614
Progress 22.407211028632027
Progress 22.41781548250265
Progress 22.428419936373274
Progress 22.439024390243905
Progress 22.449628844114528
Progress 22.460233297985155
Progress 22.47083775185578
Progress 22.481442205726403
Progress 22.49204665959703
Progress 22.502651113467657
Progress 22.513255567338284
Progress 22.523860021208908
Progres

Progress 25.30222693531283
Progress 25.312831389183454
Progress 25.32343584305408
Progress 25.33404029692471
Progress 25.344644750795336
Progress 25.35524920466596
Progress 25.365853658536587
Progress 25.376458112407214
Progress 25.387062566277834
Progress 25.39766702014846
Progress 25.408271474019088
Progress 25.41887592788971
Progress 25.42948038176034
Progress 25.440084835630966
Progress 25.45068928950159
Progress 25.461293743372217
Progress 25.471898197242844
Progress 25.48250265111347
Progress 25.49310710498409
Progress 25.503711558854718
Progress 25.514316012725345
Progress 25.52492046659597
Progress 25.535524920466596
Progress 25.546129374337223
Progress 25.556733828207847
Progress 25.567338282078474
Progress 25.5779427359491
Progress 25.588547189819728
Progress 25.599151643690348
Progress 25.609756097560975
Progress 25.620360551431602
Progress 25.630965005302226
Progress 25.641569459172853
Progress 25.65217391304348
Progress 25.6627783669141
Progress 25.67338282078473
Progress 

Progress 28.451749734888654
Progress 28.46235418875928
Progress 28.4729586426299
Progress 28.483563096500532
Progress 28.49416755037116
Progress 28.50477200424178
Progress 28.515376458112407
Progress 28.525980911983034
Progress 28.536585365853657
Progress 28.547189819724284
Progress 28.55779427359491
Progress 28.56839872746554
Progress 28.57900318133616
Progress 28.589607635206786
Progress 28.600212089077417
Progress 28.610816542948037
Progress 28.621420996818664
Progress 28.63202545068929
Progress 28.642629904559914
Progress 28.65323435843054
Progress 28.66383881230117
Progress 28.67444326617179
Progress 28.685047720042416
Progress 28.695652173913043
Progress 28.70625662778367
Progress 28.716861081654294
Progress 28.72746553552492
Progress 28.738069989395548
Progress 28.74867444326617
Progress 28.7592788971368
Progress 28.769883351007426
Progress 28.780487804878046
Progress 28.791092258748673
Progress 28.8016967126193
Progress 28.812301166489924
Progress 28.82290562036055
Progress 28.

Progress 31.6118769883351
Progress 31.622481442205725
Progress 31.633085896076352
Progress 31.64369034994698
Progress 31.6542948038176
Progress 31.664899257688226
Progress 31.675503711558857
Progress 31.686108165429484
Progress 31.696712619300104
Progress 31.70731707317073
Progress 31.71792152704136
Progress 31.728525980911982
Progress 31.73913043478261
Progress 31.749734888653236
Progress 31.760339342523856
Progress 31.770943796394484
Progress 31.78154825026511
Progress 31.792152704135734
Progress 31.80275715800636
Progress 31.81336161187699
Progress 31.823966065747616
Progress 31.83457051961824
Progress 31.845174973488866
Progress 31.855779427359494
Progress 31.866383881230114
Progress 31.87698833510074
Progress 31.887592788971368
Progress 31.89819724284199
Progress 31.90880169671262
Progress 31.919406150583246
Progress 31.93001060445387
Progress 31.940615058324497
Progress 31.951219512195124
Progress 31.96182396606575
Progress 31.97242841993637
Progress 31.983032873806998
Progress 3

Progress 34.803817603393426
Progress 34.814422057264046
Progress 34.82502651113467
Progress 34.83563096500531
Progress 34.84623541887593
Progress 34.856839872746555
Progress 34.86744432661718
Progress 34.8780487804878
Progress 34.88865323435843
Progress 34.899257688229056
Progress 34.90986214209968
Progress 34.920466595970304
Progress 34.93107104984093
Progress 34.94167550371156
Progress 34.952279957582185
Progress 34.96288441145281
Progress 34.97348886532344
Progress 34.98409331919406
Progress 34.994697773064686
Progress 35.005302226935314
Progress 35.015906680805934
Progress 35.02651113467656
Progress 35.03711558854719
Progress 35.047720042417815
Progress 35.05832449628844
Progress 35.06892895015907
Progress 35.079533404029696
Progress 35.09013785790032
Progress 35.100742311770944
Progress 35.11134676564157
Progress 35.12195121951219
Progress 35.13255567338282
Progress 35.143160127253445
Progress 35.15376458112407
Progress 35.1643690349947
Progress 35.17497348886533
Progress 35.18557

Progress 38.006362672322375
Progress 38.016967126193
Progress 38.02757158006363
Progress 38.03817603393425
Progress 38.048780487804876
Progress 38.0593849416755
Progress 38.06998939554613
Progress 38.08059384941676
Progress 38.091198303287385
Progress 38.101802757158005
Progress 38.11240721102863
Progress 38.12301166489926
Progress 38.13361611876988
Progress 38.144220572640506
Progress 38.15482502651113
Progress 38.16542948038176
Progress 38.17603393425239
Progress 38.186638388123015
Progress 38.19724284199364
Progress 38.20784729586426
Progress 38.21845174973489
Progress 38.229056203605516
Progress 38.239660657476136
Progress 38.25026511134676
Progress 38.26086956521739
Progress 38.27147401908802
Progress 38.282078472958645
Progress 38.29268292682927
Progress 38.3032873806999
Progress 38.31389183457052
Progress 38.324496288441146
Progress 38.335100742311774
Progress 38.345705196182394
Progress 38.35630965005302
Progress 38.36691410392365
Progress 38.37751855779427
Progress 38.38812301

Progress 41.20890774125133
Progress 41.21951219512195
Progress 41.23011664899258
Progress 41.240721102863205
Progress 41.251325556733825
Progress 41.26193001060445
Progress 41.27253446447508
Progress 41.283138918345706
Progress 41.293743372216326
Progress 41.30434782608695
Progress 41.31495227995758
Progress 41.32555673382821
Progress 41.336161187698835
Progress 41.34676564156946
Progress 41.35737009544008
Progress 41.36797454931071
Progress 41.378579003181336
Progress 41.38918345705196
Progress 41.39978791092258
Progress 41.41039236479321
Progress 41.42099681866384
Progress 41.431601272534465
Progress 41.44220572640509
Progress 41.45281018027572
Progress 41.46341463414634
Progress 41.474019088016966
Progress 41.48462354188759
Progress 41.49522799575821
Progress 41.50583244962884
Progress 41.51643690349947
Progress 41.527041357370095
Progress 41.53764581124072
Progress 41.54825026511135
Progress 41.558854718981976
Progress 41.569459172852596
Progress 41.58006362672322
Progress 41.59066

Progress 44.40084835630965
Progress 44.41145281018027
Progress 44.4220572640509
Progress 44.432661717921526
Progress 44.44326617179215
Progress 44.45387062566278
Progress 44.46447507953341
Progress 44.47507953340403
Progress 44.485683987274655
Progress 44.49628844114528
Progress 44.50689289501591
Progress 44.51749734888653
Progress 44.528101802757156
Progress 44.53870625662778
Progress 44.54931071049841
Progress 44.55991516436904
Progress 44.570519618239665
Progress 44.581124072110285
Progress 44.59172852598091
Progress 44.60233297985154
Progress 44.61293743372216
Progress 44.623541887592786
Progress 44.63414634146341
Progress 44.64475079533404
Progress 44.65535524920467
Progress 44.665959703075295
Progress 44.67656415694592
Progress 44.68716861081654
Progress 44.69777306468717
Progress 44.708377518557796
Progress 44.718981972428416
Progress 44.72958642629904
Progress 44.74019088016967
Progress 44.75079533404029
Progress 44.761399787910925
Progress 44.77200424178155
Progress 44.7826086

Progress 47.59278897136797
Progress 47.6033934252386
Progress 47.61399787910923
Progress 47.624602332979855
Progress 47.635206786850475
Progress 47.6458112407211
Progress 47.65641569459173
Progress 47.66702014846235
Progress 47.67762460233298
Progress 47.68822905620361
Progress 47.69883351007423
Progress 47.70943796394486
Progress 47.720042417815485
Progress 47.730646871686105
Progress 47.74125132555673
Progress 47.75185577942736
Progress 47.762460233297986
Progress 47.773064687168606
Progress 47.78366914103923
Progress 47.79427359490987
Progress 47.80487804878049
Progress 47.815482502651115
Progress 47.82608695652174
Progress 47.83669141039236
Progress 47.84729586426299
Progress 47.857900318133616
Progress 47.86850477200424
Progress 47.87910922587486
Progress 47.88971367974549
Progress 47.90031813361612
Progress 47.910922587486745
Progress 47.92152704135737
Progress 47.932131495228
Progress 47.94273594909862
Progress 47.953340402969246
Progress 47.96394485683987
Progress 47.9745493107

Progress 50.79533404029692
Progress 50.80593849416755
Progress 50.816542948038176
Progress 50.8271474019088
Progress 50.83775185577942
Progress 50.84835630965006
Progress 50.85896076352068
Progress 50.8695652173913
Progress 50.88016967126193
Progress 50.89077412513255
Progress 50.90137857900318
Progress 50.911983032873806
Progress 50.92258748674443
Progress 50.93319194061506
Progress 50.94379639448569
Progress 50.95440084835631
Progress 50.96500530222694
Progress 50.97560975609756
Progress 50.98621420996818
Progress 50.996818663838816
Progress 51.007423117709436
Progress 51.01802757158006
Progress 51.02863202545069
Progress 51.03923647932132
Progress 51.04984093319194
Progress 51.06044538706257
Progress 51.07104984093319
Progress 51.08165429480381
Progress 51.092258748674446
Progress 51.102863202545066
Progress 51.11346765641569
Progress 51.12407211028632
Progress 51.13467656415695
Progress 51.14528101802757
Progress 51.1558854718982
Progress 51.16648992576882
Progress 51.1770943796394

Progress 53.98727465535524
Progress 53.99787910922588
Progress 54.0084835630965
Progress 54.01908801696713
Progress 54.02969247083775
Progress 54.04029692470837
Progress 54.050901378579006
Progress 54.06150583244963
Progress 54.07211028632025
Progress 54.08271474019089
Progress 54.09331919406151
Progress 54.10392364793213
Progress 54.11452810180276
Progress 54.12513255567338
Progress 54.135737009544
Progress 54.146341463414636
Progress 54.156945917285256
Progress 54.16755037115588
Progress 54.17815482502652
Progress 54.18875927889714
Progress 54.19936373276776
Progress 54.20996818663839
Progress 54.22057264050901
Progress 54.23117709437963
Progress 54.241781548250266
Progress 54.252386002120886
Progress 54.26299045599151
Progress 54.27359490986214
Progress 54.28419936373277
Progress 54.2948038176034
Progress 54.30540827147402
Progress 54.31601272534464
Progress 54.326617179215276
Progress 54.337221633085896
Progress 54.347826086956516
Progress 54.35843054082715
Progress 54.369034994697

Progress 57.17921527041357
Progress 57.1898197242842
Progress 57.20042417815483
Progress 57.21102863202545
Progress 57.22163308589607
Progress 57.23223753976671
Progress 57.24284199363733
Progress 57.25344644750795
Progress 57.26405090137858
Progress 57.2746553552492
Progress 57.28525980911983
Progress 57.295864262990456
Progress 57.30646871686108
Progress 57.3170731707317
Progress 57.32767762460234
Progress 57.33828207847296
Progress 57.34888653234358
Progress 57.35949098621421
Progress 57.37009544008483
Progress 57.38069989395546
Progress 57.391304347826086
Progress 57.40190880169671
Progress 57.41251325556734
Progress 57.42311770943797
Progress 57.43372216330859
Progress 57.44432661717922
Progress 57.45493107104984
Progress 57.46553552492046
Progress 57.476139978791096
Progress 57.486744432661716
Progress 57.49734888653234
Progress 57.50795334040297
Progress 57.5185577942736
Progress 57.52916224814422
Progress 57.53976670201485
Progress 57.55037115588547
Progress 57.56097560975609
P

Progress 60.38176033934253
Progress 60.39236479321315
Progress 60.402969247083774
Progress 60.4135737009544
Progress 60.42417815482503
Progress 60.43478260869565
Progress 60.44538706256628
Progress 60.4559915164369
Progress 60.46659597030752
Progress 60.47720042417816
Progress 60.48780487804878
Progress 60.49840933191941
Progress 60.50901378579003
Progress 60.51961823966066
Progress 60.530222693531286
Progress 60.54082714740191
Progress 60.55143160127253
Progress 60.56203605514317
Progress 60.57264050901379
Progress 60.58324496288441
Progress 60.59384941675504
Progress 60.60445387062566
Progress 60.61505832449628
Progress 60.625662778366916
Progress 60.63626723223754
Progress 60.64687168610816
Progress 60.6574761399788
Progress 60.66808059384942
Progress 60.67868504772004
Progress 60.68928950159067
Progress 60.69989395546129
Progress 60.71049840933191
Progress 60.721102863202546
Progress 60.731707317073166
Progress 60.74231177094379
Progress 60.75291622481443
Progress 60.76352067868505

Progress 63.58430540827147
Progress 63.5949098621421
Progress 63.60551431601272
Progress 63.61611876988336
Progress 63.62672322375398
Progress 63.6373276776246
Progress 63.64793213149523
Progress 63.65853658536585
Progress 63.66914103923648
Progress 63.679745493107106
Progress 63.69034994697773
Progress 63.70095440084835
Progress 63.71155885471899
Progress 63.72216330858961
Progress 63.73276776246023
Progress 63.74337221633086
Progress 63.75397667020148
Progress 63.76458112407211
Progress 63.775185577942736
Progress 63.78579003181336
Progress 63.79639448568398
Progress 63.80699893955462
Progress 63.81760339342524
Progress 63.82820784729586
Progress 63.83881230116649
Progress 63.84941675503711
Progress 63.86002120890774
Progress 63.870625662778366
Progress 63.88123011664899
Progress 63.89183457051962
Progress 63.90243902439025
Progress 63.91304347826087
Progress 63.9236479321315
Progress 63.93425238600212
Progress 63.94485683987274
Progress 63.955461293743376
Progress 63.966065747613996

Progress 66.8186638388123
Progress 66.82926829268293
Progress 66.83987274655355
Progress 66.85047720042418
Progress 66.86108165429481
Progress 66.87168610816543
Progress 66.88229056203605
Progress 66.89289501590669
Progress 66.9034994697773
Progress 66.91410392364793
Progress 66.92470837751856
Progress 66.93531283138918
Progress 66.94591728525981
Progress 66.95652173913044
Progress 66.96712619300106
Progress 66.97773064687169
Progress 66.98833510074232
Progress 66.99893955461293
Progress 67.00954400848357
Progress 67.02014846235419
Progress 67.03075291622481
Progress 67.04135737009544
Progress 67.05196182396607
Progress 67.0625662778367
Progress 67.07317073170732
Progress 67.08377518557795
Progress 67.09437963944856
Progress 67.1049840933192
Progress 67.11558854718982
Progress 67.12619300106044
Progress 67.13679745493107
Progress 67.1474019088017
Progress 67.15800636267232
Progress 67.16861081654295
Progress 67.17921527041358
Progress 67.18981972428419
Progress 67.20042417815483
Progre

Progress 70.06362672322375
Progress 70.07423117709438
Progress 70.084835630965
Progress 70.09544008483563
Progress 70.10604453870626
Progress 70.11664899257688
Progress 70.12725344644751
Progress 70.13785790031814
Progress 70.14846235418875
Progress 70.15906680805939
Progress 70.16967126193
Progress 70.18027571580063
Progress 70.19088016967126
Progress 70.20148462354189
Progress 70.21208907741251
Progress 70.22269353128314
Progress 70.23329798515377
Progress 70.24390243902438
Progress 70.25450689289502
Progress 70.26511134676564
Progress 70.27571580063626
Progress 70.28632025450689
Progress 70.29692470837752
Progress 70.30752916224814
Progress 70.31813361611877
Progress 70.3287380699894
Progress 70.33934252386003
Progress 70.34994697773065
Progress 70.36055143160127
Progress 70.37115588547191
Progress 70.38176033934252
Progress 70.39236479321315
Progress 70.40296924708377
Progress 70.4135737009544
Progress 70.42417815482503
Progress 70.43478260869566
Progress 70.44538706256628
Progress

Progress 73.30858960763521
Progress 73.31919406150583
Progress 73.32979851537645
Progress 73.3404029692471
Progress 73.35100742311771
Progress 73.36161187698833
Progress 73.37221633085896
Progress 73.38282078472959
Progress 73.3934252386002
Progress 73.40402969247084
Progress 73.41463414634146
Progress 73.42523860021208
Progress 73.43584305408271
Progress 73.44644750795334
Progress 73.45705196182396
Progress 73.46765641569459
Progress 73.47826086956522
Progress 73.48886532343585
Progress 73.49946977730647
Progress 73.51007423117709
Progress 73.52067868504773
Progress 73.53128313891834
Progress 73.54188759278897
Progress 73.5524920466596
Progress 73.56309650053022
Progress 73.57370095440085
Progress 73.58430540827148
Progress 73.5949098621421
Progress 73.60551431601272
Progress 73.61611876988336
Progress 73.62672322375397
Progress 73.6373276776246
Progress 73.64793213149522
Progress 73.65853658536585
Progress 73.66914103923648
Progress 73.6797454931071
Progress 73.69034994697773
Progres

Progress 76.55355249204666
Progress 76.56415694591729
Progress 76.57476139978792
Progress 76.58536585365854
Progress 76.59597030752916
Progress 76.6065747613998
Progress 76.61717921527041
Progress 76.62778366914104
Progress 76.63838812301167
Progress 76.64899257688229
Progress 76.6595970307529
Progress 76.67020148462355
Progress 76.68080593849417
Progress 76.69141039236479
Progress 76.70201484623543
Progress 76.71261930010604
Progress 76.72322375397667
Progress 76.7338282078473
Progress 76.74443266171792
Progress 76.75503711558854
Progress 76.76564156945918
Progress 76.7762460233298
Progress 76.78685047720042
Progress 76.79745493107106
Progress 76.80805938494167
Progress 76.81866383881231
Progress 76.82926829268293
Progress 76.83987274655355
Progress 76.85047720042418
Progress 76.86108165429481
Progress 76.87168610816542
Progress 76.88229056203606
Progress 76.89289501590667
Progress 76.9034994697773
Progress 76.91410392364794
Progress 76.92470837751856
Progress 76.93531283138918
Progre

Progress 79.79851537645811
Progress 79.80911983032874
Progress 79.81972428419937
Progress 79.83032873807
Progress 79.84093319194061
Progress 79.85153764581125
Progress 79.86214209968186
Progress 79.87274655355249
Progress 79.88335100742312
Progress 79.89395546129374
Progress 79.90455991516437
Progress 79.915164369035
Progress 79.92576882290562
Progress 79.93637327677625
Progress 79.94697773064688
Progress 79.95758218451749
Progress 79.96818663838813
Progress 79.97879109225875
Progress 79.98939554612937
Progress 80.0
Progress 80.01060445387063
Progress 80.02120890774125
Progress 80.03181336161188
Progress 80.04241781548251
Progress 80.05302226935312
Progress 80.06362672322376
Progress 80.07423117709438
Progress 80.084835630965
Progress 80.09544008483563
Progress 80.10604453870626
Progress 80.11664899257688
Progress 80.12725344644751
Progress 80.13785790031814
Progress 80.14846235418875
Progress 80.15906680805939
Progress 80.16967126193
Progress 80.18027571580063
Progress 80.190880169671

Progress 83.04347826086956
Progress 83.05408271474019
Progress 83.06468716861082
Progress 83.07529162248144
Progress 83.08589607635207
Progress 83.0965005302227
Progress 83.10710498409331
Progress 83.11770943796395
Progress 83.12831389183457
Progress 83.13891834570519
Progress 83.14952279957582
Progress 83.16012725344645
Progress 83.17073170731707
Progress 83.1813361611877
Progress 83.19194061505833
Progress 83.20254506892894
Progress 83.21314952279958
Progress 83.2237539766702
Progress 83.23435843054082
Progress 83.24496288441145
Progress 83.25556733828208
Progress 83.2661717921527
Progress 83.27677624602333
Progress 83.28738069989396
Progress 83.29798515376459
Progress 83.30858960763521
Progress 83.31919406150583
Progress 83.32979851537647
Progress 83.34040296924708
Progress 83.35100742311771
Progress 83.36161187698833
Progress 83.37221633085896
Progress 83.38282078472959
Progress 83.39342523860022
Progress 83.40402969247084
Progress 83.41463414634146
Progress 83.4252386002121
Progre

Progress 86.28844114528101
Progress 86.29904559915164
Progress 86.30965005302227
Progress 86.3202545068929
Progress 86.33085896076352
Progress 86.34146341463415
Progress 86.35206786850476
Progress 86.3626723223754
Progress 86.37327677624602
Progress 86.38388123011664
Progress 86.39448568398727
Progress 86.4050901378579
Progress 86.41569459172854
Progress 86.42629904559915
Progress 86.43690349946978
Progress 86.4475079533404
Progress 86.45811240721103
Progress 86.46871686108165
Progress 86.47932131495229
Progress 86.4899257688229
Progress 86.50053022269353
Progress 86.51113467656415
Progress 86.52173913043478
Progress 86.53234358430541
Progress 86.54294803817604
Progress 86.55355249204666
Progress 86.56415694591728
Progress 86.57476139978792
Progress 86.58536585365853
Progress 86.59597030752916
Progress 86.60657476139978
Progress 86.61717921527041
Progress 86.62778366914104
Progress 86.63838812301167
Progress 86.64899257688229
Progress 86.65959703075292
Progress 86.67020148462355
Progre

Progress 89.53340402969248
Progress 89.5440084835631
Progress 89.55461293743372
Progress 89.56521739130436
Progress 89.57582184517497
Progress 89.5864262990456
Progress 89.59703075291623
Progress 89.60763520678685
Progress 89.61823966065748
Progress 89.6288441145281
Progress 89.63944856839873
Progress 89.65005302226935
Progress 89.66065747613999
Progress 89.6712619300106
Progress 89.68186638388123
Progress 89.69247083775186
Progress 89.70307529162248
Progress 89.7136797454931
Progress 89.72428419936374
Progress 89.73488865323435
Progress 89.74549310710498
Progress 89.75609756097562
Progress 89.76670201484623
Progress 89.77730646871687
Progress 89.78791092258749
Progress 89.79851537645811
Progress 89.80911983032874
Progress 89.81972428419937
Progress 89.83032873806998
Progress 89.84093319194062
Progress 89.85153764581125
Progress 89.86214209968186
Progress 89.8727465535525
Progress 89.88335100742312
Progress 89.89395546129374
Progress 89.90455991516437
Progress 89.915164369035
Progress 

Progress 92.77836691410393
Progress 92.78897136797455
Progress 92.79957582184517
Progress 92.81018027571581
Progress 92.82078472958642
Progress 92.83138918345705
Progress 92.84199363732768
Progress 92.8525980911983
Progress 92.86320254506893
Progress 92.87380699893956
Progress 92.88441145281018
Progress 92.89501590668081
Progress 92.90562036055144
Progress 92.91622481442205
Progress 92.92682926829269
Progress 92.9374337221633
Progress 92.94803817603393
Progress 92.95864262990456
Progress 92.96924708377519
Progress 92.97985153764581
Progress 92.99045599151644
Progress 93.00106044538707
Progress 93.01166489925768
Progress 93.02226935312832
Progress 93.03287380699894
Progress 93.04347826086956
Progress 93.05408271474019
Progress 93.06468716861082
Progress 93.07529162248144
Progress 93.08589607635207
Progress 93.0965005302227
Progress 93.10710498409331
Progress 93.11770943796395
Progress 93.12831389183457
Progress 93.1389183457052
Progress 93.14952279957582
Progress 93.16012725344645
Progr

Progress 96.02332979851538
Progress 96.033934252386
Progress 96.04453870625663
Progress 96.05514316012726
Progress 96.06574761399787
Progress 96.07635206786851
Progress 96.08695652173913
Progress 96.09756097560975
Progress 96.10816542948038
Progress 96.118769883351
Progress 96.12937433722163
Progress 96.13997879109226
Progress 96.15058324496289
Progress 96.1611876988335
Progress 96.17179215270414
Progress 96.18239660657476
Progress 96.19300106044538
Progress 96.20360551431601
Progress 96.21420996818664
Progress 96.22481442205726
Progress 96.23541887592789
Progress 96.24602332979852
Progress 96.25662778366915
Progress 96.26723223753977
Progress 96.27783669141039
Progress 96.28844114528103
Progress 96.29904559915164
Progress 96.30965005302227
Progress 96.3202545068929
Progress 96.33085896076352
Progress 96.34146341463415
Progress 96.35206786850478
Progress 96.3626723223754
Progress 96.37327677624602
Progress 96.38388123011666
Progress 96.39448568398727
Progress 96.4050901378579
Progress 

Progress 99.26829268292683
Progress 99.27889713679745
Progress 99.28950159066808
Progress 99.30010604453871
Progress 99.31071049840932
Progress 99.32131495227996
Progress 99.33191940615058
Progress 99.3425238600212
Progress 99.35312831389183
Progress 99.36373276776246
Progress 99.37433722163308
Progress 99.38494167550371
Progress 99.39554612937434
Progress 99.40615058324497
Progress 99.41675503711559
Progress 99.4273594909862
Progress 99.43796394485685
Progress 99.44856839872746
Progress 99.45917285259809
Progress 99.46977730646871
Progress 99.48038176033934
Progress 99.49098621420997
Progress 99.5015906680806
Progress 99.51219512195122
Progress 99.52279957582184
Progress 99.53340402969248
Progress 99.54400848356309
Progress 99.55461293743372
Progress 99.56521739130434
Progress 99.57582184517497
Progress 99.5864262990456
Progress 99.59703075291623
Progress 99.60763520678685
Progress 99.61823966065748
Progress 99.6288441145281
Progress 99.63944856839872
Progress 99.65005302226936
Progre

In [None]:
df_home_add.to_csv('home_data.csv')

Great, so now we got the features for the ~9500 listings. Let's see how many features we ended up with and find out how much cleaning we have to do.

In [51]:
df_home_add.head()

Unnamed: 0,adresse,andenmaegler,boligOrGrundAreal,boligurl,city,ejendomstypePrimaerNicename,isNew,lat,lng,openHouseEndDate,...,Ydre mur,Ydre murværk,Ydremur,Ydremure,Ydrevægge,_________________________,købers brug af ejendomme,varmeinstallation:,Øvrige oplysninger,Øvrige oplysninger:
0,"A.D. Jørgensens Vej 75, 2. 1.",False,35.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.680726,12.494705,2019-10-27T11:20,...,,,,,,,,,,
1,"Holger Danskes Vej 14, 3. th.",False,46.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.686615,12.538356,,...,,,,,,,,,,
11,"Lyøvej 5, st.. tv.",False,60.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.68294,12.524527,,...,,,,,,,,,,
12,"H. Schneekloths Vej 13, 5. th.",False,56.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,False,55.679928,12.506927,,...,,,,,,,,,,
30,"Howitzvej 61, 3. th.",False,67.0,https://home.dk/boligkatalog/frederiksberg/200...,Frederiksberg,Ejerlejlighed,True,55.680209,12.523998,2019-11-03T15:20,...,,,,,,,,,,


In [122]:
cols = df_home_add.columns.tolist()
cols

['adresse',
 'andenmaegler',
 'boligOrGrundAreal',
 'boligurl',
 'city',
 'ejendomstypePrimaerNicename',
 'isNew',
 'lat',
 'lng',
 'openHouseEndDate',
 'openHouseStartDate',
 'overskrift2',
 'postal',
 'price',
 'sagsnummer',
 '****hvis boligen er udlej',
 'Aconto forbrug  pr. måned',
 'Adgangsvej',
 'Adresse',
 'Afløb',
 'Afløbsforhld',
 'Afløbsforhold',
 'Afløbsforhold:',
 'Afstand indkøb',
 'Afstand til indkøb',
 'Afstand til mariager fjor',
 'Afstand til off. transport',
 'Afstand til skole',
 'Afstand til skov',
 'Afstand til strand',
 'Afstand til vand',
 'Afstande',
 'Alarm',
 'Altan',
 'Altan:',
 'Alternative energikilder',
 'Andelboligforenings hjemm',
 'Andelsboligforening',
 'Andelsforening',
 'Andet',
 'Andre bygningsændringer',
 'Anetenneforhold',
 'Anlægsarbejder/påbud',
 'Antal plan',
 'Antal rum',
 'Antal toiletter',
 'Anten.forh.',
 'Antenn.forh.',
 'Antenne',
 'Antenne & internet',
 'Antenne forh.',
 'Antenne forhold',
 'Antenne forhold:',
 'Antenne og internet',
 'A

Alright, so knowing Danish (and you just have to trust me on that one), there are some very odd features like 'Byplansvedtægt', which translate to urban plan regulation, or 'Centralstøvsuger' which means central vacuum cleaner. Furthermore we have a lot of different features that are probably the same like 'Altan' and 'Altan:' (balcony).

So the first two tasks include getting rid of the weird stuff and merging columns.

We have a lot of columns, so let's get to it. To keep an overview let's review a subset at a time.

In [364]:
dropped_cols = []

In [365]:
df_home_add[cols[0:10]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 10 columns):
adresse                        9430 non-null object
andenmaegler                   9430 non-null bool
boligOrGrundAreal              9418 non-null float64
boligurl                       9430 non-null object
city                           9429 non-null object
ejendomstypePrimaerNicename    9430 non-null object
isNew                          9430 non-null bool
lat                            9055 non-null float64
lng                            9055 non-null float64
openHouseEndDate               1504 non-null object
dtypes: bool(2), float64(3), object(5)
memory usage: 1001.5+ KB


We can drop 'andenmaegler' and boligurl now as we don't need them anymore. The open house end date is only practical information as missing values don't indicate whether there's an open house or not, so that should be dropped as well.
The rest we should keep and no need for merging.

In [366]:
dropped_cols.extend(['andenmaegler','boligurl','openHouseEndDate'])

In [367]:
df_home_add[cols[10:20]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 10 columns):
openHouseStartDate           1504 non-null object
overskrift2                  8660 non-null object
postal                       9430 non-null int64
price                        9430 non-null object
sagsnummer                   9430 non-null object
****hvis boligen er udlej    0 non-null object
Aconto forbrug  pr. måned    205 non-null object
Adgangsvej                   0 non-null object
Adresse                      0 non-null object
Afløb                        1 non-null object
dtypes: int64(1), object(9)
memory usage: 1.1+ MB


overskrift2 is the secound heading which doesn't really say anything about the listing, so that should be dropped along with the openHouseStartDate, the weird **** column that doesn't contain data anyway, the Adgangsvej (access way) and the Adresse since we already have that. Afløb (drainage) was one of the columns with multiple forms, so let's dive into that.

In [368]:
dropped_cols.extend(['openHouseStartDate','overskrift2','****hvis boligen er udlej','Adgangsvej','Adresse'])

In [369]:
df_home_add[cols[19:32]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 13 columns):
Afløb                         1 non-null object
Afløbsforhld                  0 non-null object
Afløbsforhold                 8 non-null object
Afløbsforhold:                0 non-null object
Afstand indkøb                1 non-null object
Afstand til indkøb            1771 non-null object
Afstand til mariager fjor     1 non-null object
Afstand til off. transport    1451 non-null object
Afstand til skole             1503 non-null object
Afstand til skov              138 non-null object
Afstand til strand            1 non-null object
Afstand til vand              398 non-null object
Afstande                      0 non-null object
dtypes: object(13)
memory usage: 1.3+ MB


Okay, so not very many rows have them anyway, so let's drop everything related to Afløb.

In [370]:
dropped_cols.extend(['Afløb','Afløbsforhld','Afløbsforhold','Afløbsforhold:'])

Afstand indkøb and Afstand til indkøb (distance to grocery shopping) is the same so they should be merged.

Afstand til mariager fjor (specific place in nothern Denmark), Afstand til strand (beach) and Afstande should be dropped.

In [371]:
dropped_cols.extend(['Afstand til mariager fjor','Afstand til strand','Afstande'])

In [372]:
df_home_add[cols[32:50]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 18 columns):
Alarm                        0 non-null object
Altan                        263 non-null object
Altan:                       2 non-null object
Alternative energikilder     0 non-null object
Andelboligforenings hjemm    0 non-null object
Andelsboligforening          0 non-null object
Andelsforening               1 non-null object
Andet                        1 non-null object
Andre bygningsændringer      0 non-null object
Anetenneforhold              1 non-null object
Anlægsarbejder/påbud         0 non-null object
Antal plan                   1577 non-null object
Antal rum                    1941 non-null object
Antal toiletter              1486 non-null object
Anten.forh.                  0 non-null object
Antenn.forh.                 0 non-null object
Antenne                      340 non-null object
Antenne & internet           0 non-null object
dtypes: object(18)
memory usage: 

Alarm, Alternative energikilder, Andelboligforenings hjemm, Andelsboligforening, Andelsforening, Andet, Andre bygningsændringer, Anlægsarbejder/påbud should all be dropped

In [373]:
dropped_cols.extend(['Alarm', 'Alternative energikilder', 
                     'Andelboligforenings hjemm', 'Andelsboligforening', 
                     'Andelsforening', 'Andet', 'Andre bygningsændringer', 
                     'Anlægsarbejder/påbud'])

Altan and Altan: should be merged. Along with anything related to antenna.

In [374]:
df_home_add[cols[50:80]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 30 columns):
Antenne forh.                6 non-null object
Antenne forhold              19 non-null object
Antenne forhold:             0 non-null object
Antenne og internet          4 non-null object
Antenne og it                0 non-null object
Antenne- og internetforho    6 non-null object
Antenne/bredbånd             1 non-null object
Antenne/fibernet             0 non-null object
Antenne/internet             1 non-null object
Antenne/internet:            2 non-null object
Antenne/parabol              1 non-null object
Antenne/tv                   0 non-null object
Antenne/tv/internet          1 non-null object
Antenne1                     1 non-null object
Antenne:                     31 non-null object
Antenne: (enhver udgift h    0 non-null object
Antenneforbindelse           0 non-null object
Antenneforening              0 non-null object
Antenneforh.                 8 non-null obje

The Antenne stuff needs to be investigated on it's own and to be merged into a single (or a few) column(s).

Arealer (areas) can be dropped along with Bad (bath), Badeværelse (bathroom) and Bebyggelse.

In [375]:
dropped_cols.extend(['Arealer', 'Bad', 'Badeværelse', 'Bebyggelse:'])

In [376]:
df_home_add[cols[80:100]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 20 columns):
Bebyggelsens højde                                                                     0 non-null object
Bebyggelses højde                                                                      2 non-null object
Bebyggelsesprocent                                                                     9 non-null object
Bebyggelsesprocent:                                                                    0 non-null object
Benyttelse                                                                             0 non-null object
Beplantning                                                                            0 non-null object
Bevaringskategori                                                                      4 non-null object
Bevaringsværdi                                                                         16 non-null object
Bevaringsværdi:                                    

Everything but Boligarea (estate area) and the weird looking Brutto/Netto columns can be droppped.

In [377]:
dropped_cols.extend(['Bebyggelsens højde', 'Bebyggelses højde',
                     'Bebyggelsesprocent', 'Bebyggelsesprocent:',
                     'Benyttelse', 'Beplantning', 'Bevaringskategori',
                     'Bevaringsværdi', 'Bevaringsværdi:', 'Bevaringsværdig',
                     'Boligydelse pr. måned', 'Bopælspligt:', 'Bortforpagtning:', 
                     'Brugsret til have', 'Brugsret til kælderrum', 'Brændeovn'])

In [378]:
df_home_add[cols[100:120]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 20 columns):
Bugsret til                0 non-null object
Busforbindelser            0 non-null object
Bygepligt:                 1 non-null object
Byggehøjde                 0 non-null object
Byggemodning               0 non-null object
Byggepligt                 25 non-null object
Byggepligt:                0 non-null object
Byggeår                    1945 non-null object
Byggeår / ombygning        0 non-null object
Byggeår /om- tilbygning    0 non-null object
Bylaug                     1 non-null object
Byplansvedtægt:            0 non-null object
Byplanvedtægt              0 non-null object
Bøbelsespligt              0 non-null object
Børnehave                  0 non-null object
Børnehuset bulderby        0 non-null object
Børnehuset lille ørholm    0 non-null object
Carport og udhus           0 non-null object
Centralstøvsuger           0 non-null object
Connection timed out       0 non-

Let's only keep the Byggeår (year of construction).

In [379]:
dropped_cols.extend(cols[100:107])
dropped_cols.extend(cols[108:120])

In [380]:
df_home_add[cols[120:140]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 20 columns):
Cykelkælder                 1 non-null object
Cykelparkering              17 non-null object
Dagenstution                0 non-null object
Daginstitution              10 non-null object
Dagninstitution             0 non-null object
Dagsinstitution             2 non-null object
Dagsinstution               0 non-null object
Depositum                   205 non-null object
Depot til lejligheden       1 non-null object
Depotrum                    11 non-null object
Depotrum:                   2 non-null object
Distriktsskole              1 non-null object
Diverse                     0 non-null object
Dør:                        0 non-null object
Døre indv.                  4 non-null object
Dørtelefon                  53 non-null object
Dørtelefon:                 3 non-null object
Egen parkeringsplads        0 non-null object
Ejendommen kan overtages    0 non-null object
Ejendommens 

All of these can be dropped.

In [381]:
dropped_cols.extend(cols[120:140])

In [382]:
df_home_add[cols[140:180]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 40 columns):
Ejendomsstatus               1 non-null object
Ejendomstype                 0 non-null object
Ejendomsværdi i kr.          1824 non-null object
Ejerforening                 9 non-null object
Ejerforening:                2 non-null object
Ejerforeningen               7 non-null object
Ejerforeningens hjemmesid    0 non-null object
Ejerlaug hjemmeside          0 non-null object
Ejerudgift pr. md.           1825 non-null object
Ekstra areal                 1 non-null object
El                           25 non-null object
El - tilslutningsbidrag      1 non-null object
El stik bidrag               0 non-null object
El:                          1 non-null object
Elektricitet                 0 non-null object
Elevator                     2 non-null object
Elevator i opgang:           2 non-null object
Elinstallationer             1 non-null object
Energi-optimeringer          0 non-null

Let's keep the Ejendomsværdi i kr. (value of property), Etage (floor) and Energimærke (energy rating).

In [383]:
dropped_cols.extend(cols[140:142])
dropped_cols.extend(cols[143:160])
dropped_cols.extend(cols[161:164])
dropped_cols.extend(cols[165:180])

In [384]:
df_home_add[cols[180:210]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 30 columns):
Forkøbsret/venteliste:       0 non-null object
Forpagtning                  1 non-null object
Forudbetalt leje             205 non-null object
Forurening                   11 non-null object
Forurening:                  0 non-null object
Forventet indflytning        2 non-null object
Forventet overtagelse        3 non-null object
Forældrekøb                  0 non-null object
Forældrekøb muligt           1 non-null object
Forældrekøb muligt:          0 non-null object
Forældrekøb:                 1 non-null object
Fredskov:                    1 non-null object
Fundament                    31 non-null object
Fundament:                   27 non-null object
Fælles faciliteter           1 non-null object
Fælles parkeringsplads       0 non-null object
Fælles vaskeri               0 non-null object
Fællesantenne                42 non-null object
Fællesantenne:               0 non-null 

The Forudbetalt leje means prepaid rent - maybe some rentals snuck their way into our dataset! This should be investigated. But otherwise all of the above features can be dropped.

In [385]:
dropped_cols.extend(cols[180:210])

In [386]:
df_home_add[cols[210:233]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 23 columns):
Garage/carport               0 non-null object
Garanti                      0 non-null object
Genvex                       0 non-null object
Gf.forening                  15 non-null object
Gf.forening uden medlemsp    1 non-null object
Gr. ejerforening             0 non-null object
Gr. forening                 1 non-null object
Gr.ejerforening              0 non-null object
Gr.forening                  6 non-null object
Grund                        0 non-null object
Grund-/ejerforening          0 non-null object
Grundareal                   1531 non-null object
Grundeejerforening           0 non-null object
Grundejer-/vejforening       0 non-null object
Grundejerfor.                0 non-null object
Grundejerforening            209 non-null object
Grundejerforening / vejla    0 non-null object
Grundejerforening:           42 non-null object
Grundejerforeningen          0 non-null

Let's keep the Grundareal (ground area) and drop the rest - the 

In [387]:
dropped_cols.extend(cols[210:221])
dropped_cols.extend(cols[222:233])

In [388]:
df_home_add[cols[233:300]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 67 columns):
Gulv                         37 non-null object
Gulv i køkken                0 non-null object
Gulv:                        0 non-null object
Gulvarme                     1 non-null object
Gulvbelægning                0 non-null object
Gulvbelægning :              0 non-null object
Gulvbelægninger              16 non-null object
Gulve                        793 non-null object
Gulve yderligere             1 non-null object
Gulve/lofter                 2 non-null object
Gulve/lofter:                0 non-null object
Gulve:                       51 non-null object
Gulvvarme                    2 non-null object
Gulvvarme, rum               0 non-null object
Gulvvarme, rum:              0 non-null object
Gulvvarme:                   0 non-null object
Gårdmiljø                    31 non-null object
Gårdmiljø/have               0 non-null object
Gårdsmiljø/have              1 non-null 

Anything related to Gulve (flooring) we're gonna keep (and merge) and anything related to the Internet can possibly be merged with the antenna columns, so we'll keep those as well for now. The Heraf grundværdi i kr. is also refering to the estate value so that can maybe also be merged with the previous column. Indkøb (shopping) could also be realted to earlier columns. Everything else can be dropped.

In [389]:
dropped_cols.extend(cols[249:258])
dropped_cols.extend(cols[259:278])
dropped_cols.extend(cols[279:282])
dropped_cols.extend(cols[289:300])

In [390]:
df_home_add[cols[300:350]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 50 columns):
Isolering - lofter           0 non-null object
Isolering - lofter & skun    0 non-null object
Isolering - tilbygning       0 non-null object
Isolering - udestue          0 non-null object
Isolering - vægge            0 non-null object
Isolering - vægge i opr.     0 non-null object
Isolering - vægge i tilby    0 non-null object
Isolering - vægge kælder     0 non-null object
Isolering - vægge stuepla    0 non-null object
Isolering - ydervægge        0 non-null object
Isolering gulv               0 non-null object
Isolering gulv - kælder      0 non-null object
Isolering gulve              0 non-null object
Isolering iflg. energimær    0 non-null object
Isolering iflg. sælger:      1 non-null object
Isolering iflg. sælger: l    1 non-null object
Isolering iflg. sælgers o    0 non-null object
Isolering iht. ejendommen    0 non-null object
Isolering iht. energimærk    0 non-null object

Kloak (sewers) seem like something to keep, alo the kabel-tv/internet stuff is also related to the antenna columns. The kontantpris is the listingprice, which we already have and the Kvm. pris is the price per squaremeter, which we can derive. So let's keep the sewer stuff and cable-tv/internet and drop the rest.

In [391]:
dropped_cols.extend(cols[300:328])
dropped_cols.extend(cols[340:350])

In [392]:
df_home_add[cols[350:400]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 50 columns):
Kælderareal                  439 non-null object
Kælderrum                    9 non-null object
Kælderrum / loftrum          0 non-null object
Kælderrum/loftrum            0 non-null object
Køb af 5 grunde i udstykn    0 non-null object
Køb af de 3 nederste grun    0 non-null object
Køb af de 4 første grunde    0 non-null object
Købesumsfordeling            0 non-null object
Købspris                     22 non-null object
Køkken                       14 non-null object
Køkken og inventar           0 non-null object
Køkken og invetar            0 non-null object
Landbrugspligt               0 non-null object
Landbylaug                   0 non-null object
Landsbylaug & aktivitetsf    1 non-null object
Legeplads                    1 non-null object
Leje pr. måned               205 non-null object
Lejeforhold                  38 non-null object
Lejeindtægt                  2 non-null

Let's keep the stuff about kælder (basement) and Loft (attic). The Købspris is the listing price (again) so that can be discared.

In [393]:
dropped_cols.extend(cols[352:372])
dropped_cols.extend(cols[383:400])

In [394]:
df_home_add[cols[400:450]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 50 columns):
Modrerniseringer             0 non-null object
Motorvej                     0 non-null object
Mulighed for bredbånd        0 non-null object
Mulighed for udlejning       0 non-null object
Mulighed for udlejning af    5 non-null object
Mur                          1 non-null object
Muret/isoleret anneks        0 non-null object
Muret/isoleret garage        0 non-null object
Murværk                      0 non-null object
Nedgravet olietank           1 non-null object
Nedsænket lofter             1 non-null object
Nlærmeste skole              1 non-null object
Næmeste skole                0 non-null object
Nærmeste børnehave           0 non-null object
Nærmeste folkeskole          15 non-null object
Nærmeste skole               8 non-null object
Nærmeste skole:              1 non-null object
Offentlig transport          1 non-null object
Olietank                     2 non-null objec

Overtagelse is just the acquisition date, which doesn't carry that much info. The prisudvikling is the price development, which might be interesting.

In [395]:
dropped_cols.extend(cols[400:447])
dropped_cols.extend(cols[448:450])

In [396]:
df_home_add[cols[450:550]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 100 columns):
Pulterrrum:                  1 non-null object
Pulterrum                    27 non-null object
Pulterrum:                   2 non-null object
Sagsnr.                      2052 non-null object
Samlet bebygget areal        0 non-null object
Sekundær varmekilde          0 non-null object
Sekundær varmkilde           0 non-null object
Septictank                   0 non-null object
Skole                        248 non-null object
Skole/tilhørsforhold         0 non-null object
Skole:                       9 non-null object
Skoledistrikt                11 non-null object
Skoledistrikt:               24 non-null object
Skoleforhold                 0 non-null object
Skoletihørsforhold           1 non-null object
Skoletilhørsforhold          46 non-null object
Skoletilhørsforhold:         0 non-null object
Skov                         0 non-null object
Skur                         0 non-n

Skole (school) might be related to a previous column, so let's keep that. Sagsnr. is just the id and we already have that in another column. Tilbehør (accessories) might be interesting, so let's keep that. Udbetaling is the down payment and is derived from the listing price, so let's drop everything besides the skole columns and the Tilbehør.

In [397]:
dropped_cols.extend(cols[450:458])
dropped_cols.extend(cols[467:507])
dropped_cols.extend(cols[508:550])

In [398]:
df_home_add[cols[550:600]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 50 columns):
Vand - varme  - kloak      0 non-null object
Vand - vej  - kloak        0 non-null object
Vand - vej - kloak         1 non-null object
Vand / vej / kloak         4 non-null object
Vand/vej                   11 non-null object
Vand/vej/kloak             42 non-null object
Vand/vej/kloak:            6 non-null object
Vand/vej:                  0 non-null object
Vand:                      5 non-null object
Vandforsyning              6 non-null object
Vandforsyning :            0 non-null object
Vandværk                   34 non-null object
Varemneinstallation        1 non-null object
Varme                      77 non-null object
Varme installation         1 non-null object
Varme og vand              0 non-null object
Varme, el og vand          0 non-null object
Varme/vand                 0 non-null object
Varme:                     3 non-null object
Varmeforhold               3 non-

Anything related to Varme (heat) we're gonna keep for now, but drop the rest.

In [399]:
dropped_cols.extend(cols[550:562])
dropped_cols.extend(cols[591:600])

In [400]:
df_home_add[cols[600:]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9430 entries, 0 to 51069
Data columns (total 54 columns):
Vej :                        1 non-null object
Vej:                         13 non-null object
Vejfond                      0 non-null object
Vejforening                  0 non-null object
Vejlaug                      1 non-null object
Venteliste                   0 non-null object
Ventilation                  1 non-null object
Vicevært                     75 non-null object
Vicevært/ejendomsservice     1 non-null object
Vicevært:                    6 non-null object
Villæn er tegnet af          0 non-null object
Vinder                       0 non-null object
Vindue                       0 non-null object
Vinduer                      978 non-null object
Vinduer & døre               0 non-null object
Vinduer - resterende vind    0 non-null object
Vinduer - sydvendt facade    0 non-null object
Vinduer / yderdøre           0 non-null object
Vinduer / årgang             1 non-null ob

Vinduer (windows), vurderingsår (year of value assesment) and Ydermur (outer wall/exterior) we're gonna keep, and the rest is to be dropped.

In [401]:
dropped_cols.extend(cols[600:612])
dropped_cols.extend(cols[625:628])
dropped_cols.extend(cols[629:630])
dropped_cols.extend(cols[649:])

Let's have a look at the columns we're gonna drop:

In [402]:
dropped_cols

['andenmaegler',
 'boligurl',
 'openHouseEndDate',
 'openHouseStartDate',
 'overskrift2',
 '****hvis boligen er udlej',
 'Adgangsvej',
 'Adresse',
 'Afløb',
 'Afløbsforhld',
 'Afløbsforhold',
 'Afløbsforhold:',
 'Afstand til mariager fjor',
 'Afstand til strand',
 'Afstande',
 'Alarm',
 'Alternative energikilder',
 'Andelboligforenings hjemm',
 'Andelsboligforening',
 'Andelsforening',
 'Andet',
 'Andre bygningsændringer',
 'Anlægsarbejder/påbud',
 'Arealer',
 'Bad',
 'Badeværelse',
 'Bebyggelse:',
 'Bebyggelsens højde',
 'Bebyggelses højde',
 'Bebyggelsesprocent',
 'Bebyggelsesprocent:',
 'Benyttelse',
 'Beplantning',
 'Bevaringskategori',
 'Bevaringsværdi',
 'Bevaringsværdi:',
 'Bevaringsværdig',
 'Boligydelse pr. måned',
 'Bopælspligt:',
 'Bortforpagtning:',
 'Brugsret til have',
 'Brugsret til kælderrum',
 'Brændeovn',
 'Bugsret til',
 'Busforbindelser',
 'Bygepligt:',
 'Byggehøjde',
 'Byggemodning',
 'Byggepligt',
 'Byggepligt:',
 'Byggeår / ombygning',
 'Byggeår /om- tilbygning',

In [686]:
df_clean = df_home_add.copy()

In [687]:
df_clean.drop(columns = dropped_cols, inplace = True)

In [409]:
df_clean.to_csv('home_data_clean.csv')

First of let's look at the rentals that might have snuck into the dataset and let's drop them.

In [688]:
df_clean.ejendomstypePrimaerNicename.unique()

array(['Ejerlejlighed', 'Andelsbolig', 'Rækkehus', 'Lejebolig',
       'Villalejlighed', 'Villa', 'Fritidshus', 'Helårsgrund',
       'Landejendom', 'Fritidsgrund'], dtype=object)

In [689]:
df_clean[df_clean.ejendomstypePrimaerNicename == 'Lejebolig'].head()

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Altan,Altan:,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Anten.forh.,Antenn.forh.,Antenne,Antenne & internet,Antenne forh.,Antenne forhold,Antenne forhold:,Antenne og internet,Antenne og it,Antenne- og internetforho,Antenne/bredbånd,Antenne/fibernet,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv,Antenne/tv/internet,Antenne1,Antenne:,Antenne: (enhver udgift h,Antenneforbindelse,Antenneforening,Antenneforh.,Antenneforh.:,Antenneforhold,Antenneforhold og interne,Antenneforhold.,Antenneforhold:,Antennen,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulv i køkken,Gulv:,Gulvarme,Gulvbelægning,Gulvbelægning :,Gulvbelægninger,Gulve,Gulve yderligere,Gulve/lofter,Gulve/lofter:,Gulve:,Gulvvarme,"Gulvvarme, rum","Gulvvarme, rum:",Gulvvarme:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/antenne:,Internet/tv,Internet:,Internetforbindelse,Kabel-tv,Kabel-tv/internet,Kabeltv / internet,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kloakering :,Kloaksepareret,Kloakseparering,Kloark,Kælderareal,Kælderrum,Loft,Loft / vægge / gulve,Loft:,Loftbeklædning,Loftbeklædninmg,Lofte,Lofter,Lofter :,Lofter og vægge,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole/tilhørsforhold,Skole:,Skoledistrikt,Skoledistrikt:,Skoleforhold,Skoletihørsforhold,Skoletilhørsforhold,Skoletilhørsforhold:,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme og vand,"Varme, el og vand",Varme/vand,Varme:,Varmeforhold,Varmeforsyning,Varmeforsyning :,Varmeforsyning primær,Varmeforsyning sekundær,Varmeinst.,Varmeinstalation,Varmeinstallaion,Varmeinstallation,Varmeinstallation:,Varmeinstallationer,Varmeinstalletion:,Varmeinstillation,Varmekilde,Varmekilde hus 2:,Varmekilde primær,Varmekilde primær:,Varmekilde sekundær,Varmekilde sekundær:,Varmelplan,Varmeplan,Varmeplan:,Vindue,Vinduer,Vinduer & døre,Vinduer - resterende vind,Vinduer - sydvendt facade,Vinduer / yderdøre,Vinduer / årgang,Vinduer mv.,Vinduer og døre,Vinduer og yderdøre,Vinduer/rammer,Vinduer/yderdøre,Vinduer:,Vurderingsår,Ydemur,Yder vægge,Ydermur,Ydermur :,Ydermur bygning 2,Ydermur:,Ydermure,Ydermure:,Ydermyr:,Ydervæg,Ydervæg:,Ydervægge,Ydervægge:,Ydervæggens materiale,Ydre mur,Ydre murværk,Ydremur,Ydremure,Ydrevægge
430,"Teglværksgade 27B, 3. mf.",97.0,København Ø,Lejebolig,False,55.707112,12.558923,2100,0,1770000316,1.150 kr.,,800 m,500 m,450 m,,,,,,1,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,97 m2,,,,2017,,A15,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
475,"Teglværksgade 27B, St. Tv.",102.0,København Ø,Lejebolig,False,55.70701,12.559074,2100,0,1770000309,1.050 kr.,,400 m,700 m,750 m,,,,,,1,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,101 m2,,,,2020,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
476,"Scherfigsvej 13, 2. th., Scherfigs Have",109.0,København Ø,Lejebolig,False,55.72194,12.580712,2100,0,177L02360,1.200 kr.,,800 m,500 m,450 m,,,Ja,,,1,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,103 m2,,,,2019,,A15,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
477,"Scherfigsvej 13, 4. th., Scherfigs Have",109.0,København Ø,Lejebolig,False,55.72194,12.580712,2100,0,177L02364,1.050 kr.,,400 m,700 m,750 m,,,,,,1,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,101 m2,,,,2020,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
498,"Scherfigsvej 19, st.. tv., Scherfigs Have",120.0,København Ø,Lejebolig,False,55.721898,12.581546,2100,0,177L02394,1.050 kr.,,400 m,700 m,750 m,,,,,,1,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,101 m2,,,,2020,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [690]:
df_clean.drop(df_clean[df_clean.ejendomstypePrimaerNicename == 'Lejebolig'].index, inplace = True)

Okay, let's check if we have any columns that only contain NaN and drop those.

In [691]:
df_clean.dropna(axis = 1, how = 'all', inplace = True)
df_clean.head()

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Altan,Altan:,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Yder vægge,Ydermur,Ydermur:,Ydermure,Ydervæg,Ydervæg:,Ydervægge,Ydremur
0,"A.D. Jørgensens Vej 75, 2. 1.",35.0,Frederiksberg,Ejerlejlighed,False,55.680726,12.494705,2000,1.350.000,1300000111,,,1.100 m,150 m,2.600 m,,,Nej,,,1.0,1.0,1.0,,,,,,,,,,,,,,,35 m2,,5.249 / 4.628 kr.,,1991.0,780.000,B,2.0,,,,,,,168.3,,,,,,,,,,,,,,,,,,,-7%,,,,,,Alle nuværende hårde hvidevarer i lejligheden ...,,,,,,,,,,,,,,,,2018.0,,,,,,,,
1,"Holger Danskes Vej 14, 3. th.",46.0,Frederiksberg,Ejerlejlighed,False,55.686615,12.538356,2000,2.145.000,1740000062,,,150 m,150 m,100 m,,,,,,1.0,2.0,1.0,,,,,,,,,,,,,,,46 m2,,8.313 / 7.331 kr.,,1885.0,1.100.000,D,3.0,,,,,,,145.0,,,,,,,,,,,,,,,,,,,-6%,,,,,,Intuition gaskomfur komfur - Matsui Fridge Do...,,,,,,,,,,,,,,,,2018.0,,,,,,,,
11,"Lyøvej 5, st.. tv.",60.0,Frederiksberg,Ejerlejlighed,False,55.68294,12.524527,2000,2.875.000,1300000128,,,650 m,100 m,550 m,,,Nej,,,1.0,2.0,1.0,,,,,,,,,,,,,,,77 m2,,10.446 / 9.212 kr.,,1988.0,1.600.000,C,3.0,,,,,,,259.1,,,,,,,,,,,,,,,,,,,-4%,,,,,,emhætte - bordkomfur glaskeramisk - indbygni...,,,,,,,,,,,,,,,,2018.0,,,,,,,,
12,"H. Schneekloths Vej 13, 5. th.",56.0,Frederiksberg,Ejerlejlighed,False,55.679928,12.506927,2000,2.750.000,130D01015,,,50 m,150 m,550 m,,,Ja,,,1.0,2.0,1.0,,,,,,,,,,,,,,,74 m2,,10.075 / 8.884 kr.,,1972.0,1.350.000,D,2.0,,,,,,,164.3,,,,,,,,,,,,,,,,,,,0%,,,,,,De i lejligheden hårde hvidevarer medfølger i ...,,,,,,,,,,,,,,,,2018.0,,,,,,,,
30,"Howitzvej 61, 3. th.",67.0,Frederiksberg,Ejerlejlighed,True,55.680209,12.523998,2000,3.195.000,1300000176,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


Alright time to have a look at the columns that we might want to merge (and maybe renaming some of them in English).

In [692]:
df_clean.columns.to_list()

['adresse',
 'boligOrGrundAreal',
 'city',
 'ejendomstypePrimaerNicename',
 'isNew',
 'lat',
 'lng',
 'postal',
 'price',
 'sagsnummer',
 'Aconto forbrug  pr. måned',
 'Afstand indkøb',
 'Afstand til indkøb',
 'Afstand til off. transport',
 'Afstand til skole',
 'Afstand til skov',
 'Afstand til vand',
 'Altan',
 'Altan:',
 'Anetenneforhold',
 'Antal plan',
 'Antal rum',
 'Antal toiletter',
 'Antenne',
 'Antenne forh.',
 'Antenne forhold',
 'Antenne og internet',
 'Antenne/bredbånd',
 'Antenne/internet',
 'Antenne/internet:',
 'Antenne/parabol',
 'Antenne/tv/internet',
 'Antenne:',
 'Antenneforh.',
 'Antenneforhold',
 'Antenneforhold:',
 'Antennetilslutning',
 'Boligareal',
 'Bredbånd',
 'Brutto/Netto\r\n                        ?\n\r\n                        ekskl. ejerudgift',
 'Brutto/Netto\r\n                        ekskl. ejerudgift',
 'Byggeår',
 'Ejendomsværdi i kr.',
 'Energimærke',
 'Etage',
 'Grundareal',
 'Gulv',
 'Gulvbelægninger',
 'Gulve',
 'Gulve/lofter',
 'Gulve:',
 'Her

We'll start of with the Balcony:

In [693]:
df_clean['Altan'].unique()

array(['Nej', nan, 'Ja'], dtype=object)

In [694]:
df_clean['Altan:'].unique()

array([nan, 'Ja'], dtype=object)

In [695]:
df_clean[df_clean['Altan:'] == 'Ja']

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Altan,Altan:,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Yder vægge,Ydermur,Ydermur:,Ydermure,Ydervæg,Ydervæg:,Ydervægge,Ydremur
338,"Jellingegade 7, 2. tv.",75.0,København Ø,Ejerlejlighed,False,55.70662,12.579088,2100,3.995.000,1260000021,,,,,,,,Ja,Ja,,1,3,1,,,,,,,,,,,,,,,92 m2,,12.371 / 10.829 kr.,,2010,1.850.000,B,5,,,,,,Plankegulve - Mosaikfliser på badeværelse,415.6,,,,Individuel tilslutning,,,,,,,,,,,,,,,-3%,,,,,,Indesit vaske og tørremaskine (kombi) - årgang...,,,,,,,,,,,,,,,Termoruder,2018,,,Metalplader,,,,,
339,Vangehusvej 6B,76.0,København Ø,Ejerlejlighed,False,55.720263,12.575785,2100,4.495.000,1290000019,,,,,,,,Ja,Ja,,1,3,1,,,,,,,,,,,,,,,92 m2,,13.528 / 11.842 kr.,,2007,1.800.000,B,2,,,,,,Parketgulve - Fliser på badeværelse,351.7,,,,Individuel tilslutning,,,,,,,,,,,,,,,0%,,,,,,Siemens tørretumbler - årgang 2007Siemens vask...,,,,,,,,,,,,,,,Termovinduer,2018,,,Andet materiale,,,,,


Alright the other Altan column contain data for those two instances as well, so let's drop the other column.

In [696]:
df_clean.drop(columns = 'Altan:', inplace = True)

Let's replace the Ja and Nej with Yes and No.

In [697]:
df_clean['Altan'].replace('Ja','Yes', inplace = True)
df_clean['Altan'].replace('Nej','No', inplace = True)
df_clean['Altan'].unique()

array(['No', nan, 'Yes'], dtype=object)

Finally, let's rename it:

In [698]:
df_clean.rename(columns={"Altan": "Balcony"}, inplace = True)
df_clean.head()

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Yder vægge,Ydermur,Ydermur:,Ydermure,Ydervæg,Ydervæg:,Ydervægge,Ydremur
0,"A.D. Jørgensens Vej 75, 2. 1.",35.0,Frederiksberg,Ejerlejlighed,False,55.680726,12.494705,2000,1.350.000,1300000111,,,1.100 m,150 m,2.600 m,,,No,,1.0,1.0,1.0,,,,,,,,,,,,,,,35 m2,,5.249 / 4.628 kr.,,1991.0,780.000,B,2.0,,,,,,,168.3,,,,,,,,,,,,,,,,,,,-7%,,,,,,Alle nuværende hårde hvidevarer i lejligheden ...,,,,,,,,,,,,,,,,2018.0,,,,,,,,
1,"Holger Danskes Vej 14, 3. th.",46.0,Frederiksberg,Ejerlejlighed,False,55.686615,12.538356,2000,2.145.000,1740000062,,,150 m,150 m,100 m,,,,,1.0,2.0,1.0,,,,,,,,,,,,,,,46 m2,,8.313 / 7.331 kr.,,1885.0,1.100.000,D,3.0,,,,,,,145.0,,,,,,,,,,,,,,,,,,,-6%,,,,,,Intuition gaskomfur komfur - Matsui Fridge Do...,,,,,,,,,,,,,,,,2018.0,,,,,,,,
11,"Lyøvej 5, st.. tv.",60.0,Frederiksberg,Ejerlejlighed,False,55.68294,12.524527,2000,2.875.000,1300000128,,,650 m,100 m,550 m,,,No,,1.0,2.0,1.0,,,,,,,,,,,,,,,77 m2,,10.446 / 9.212 kr.,,1988.0,1.600.000,C,3.0,,,,,,,259.1,,,,,,,,,,,,,,,,,,,-4%,,,,,,emhætte - bordkomfur glaskeramisk - indbygni...,,,,,,,,,,,,,,,,2018.0,,,,,,,,
12,"H. Schneekloths Vej 13, 5. th.",56.0,Frederiksberg,Ejerlejlighed,False,55.679928,12.506927,2000,2.750.000,130D01015,,,50 m,150 m,550 m,,,Yes,,1.0,2.0,1.0,,,,,,,,,,,,,,,74 m2,,10.075 / 8.884 kr.,,1972.0,1.350.000,D,2.0,,,,,,,164.3,,,,,,,,,,,,,,,,,,,0%,,,,,,De i lejligheden hårde hvidevarer medfølger i ...,,,,,,,,,,,,,,,,2018.0,,,,,,,,
30,"Howitzvej 61, 3. th.",67.0,Frederiksberg,Ejerlejlighed,True,55.680209,12.523998,2000,3.195.000,1300000176,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


Next let's take a look at the outer wall/exterior

In [699]:
exterior_cols = df_clean.columns.to_list()[-8:]
exterior_cols

['Yder vægge',
 'Ydermur',
 'Ydermur:',
 'Ydermure',
 'Ydervæg',
 'Ydervæg:',
 'Ydervægge',
 'Ydremur']

In [700]:
def print_unique_values(df, col_list):
    ''' Get the unique values from each column in the col_list for the dataframe'''
    for col in col_list:
        unique_values = df[col].unique()
        print('{}: {}'.format(col, unique_values))

In [701]:
print_unique_values(df_clean, exterior_cols)

Yder vægge: [nan 'Mursten']
Ydermur: [nan 'Mursten' 'Pudset' 'Mursten (tegl, kalksten, cementsten)'
 'Betonelementer' 'Træbeklædning' 'Built-up' 'mursten' 'Gasbeton'
 'Beton/Teglsten' 'Fibercement' 'Letbeton' 'Tegl/kalksandsten'
 'Pudset mursten' 'Røde mursten' 'Pudsede' 'Hvidpudsede' 'Gule pudsede'
 'Mursten, pudset' 'Træværk' 'Aqua paneler og Cedertræ' 'Træ'
 'Mursten og træ' 'Beton' 'Mursten / træ' 'Gule sten' 'gule sten'
 'Beklædningstegl i byens farver' 'Skiferbeklædning og listebeklædning'
 'Bindingsværk' 'Mudset mursten og gasbeton' 'Hvidpudset'
 'Pudset - Hvidt - Mursten' 'Pudset, malet' 'Pudset malede mursten'
 'Røde og gule mursten' 'Ydervæg i Træ (gran)' 'Hvide kalk/sandsten'
 'Pudset/bindingsværk' 'Pudsede mursten' 'Mursten, bindingsværk, gasbeton'
 'Mursten og pudsede facader' 'Mursten/letbeton' 'Malet mv.'
 'Hvide kalksandsten' 'Vandskuret og malet' 'Pudset/malet'
 'Vandskuret og malet i 2018' 'Vandskuret/malet' 'Mursten/malet'
 'Gule mursten' 'Mursten - malet' 'Kalksands

In [702]:
df_clean[exterior_cols].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8515 entries, 0 to 51069
Data columns (total 8 columns):
Yder vægge    1 non-null object
Ydermur       561 non-null object
Ydermur:      23 non-null object
Ydermure      1 non-null object
Ydervæg       11 non-null object
Ydervæg:      1 non-null object
Ydervægge     7 non-null object
Ydremur       1 non-null object
dtypes: object(8)
memory usage: 598.7+ KB


Seems like the Ydermur column is the one with all the juice.

In [None]:
def merge_columns(df, col_list, merged_col):
    '''
    Merging columns containing similar information in a dataframe into a specified column.
    col_list is the list of similar columns. merged_col is the column information should be merged into.
    '''
    # Check for conflicting information:
    for col in col_list:
        # Only check if the column is not the column to be merged into
        if col != merged_col:
            # If the column in the list is the only one containing information
            # when it's not null, then merge values into the merged_col
            if ((~df[~df_clean[col].isnull()][col_list].isna()).sum() != 0).sum() == 1:
                df[merged_col].fillna(df[col], inplace = True)
            else:
                print('The column {} has conflicts with other columns'.format(col))

In [703]:
df_clean[~df_clean['Yder vægge'].isnull()]

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Yder vægge,Ydermur,Ydermur:,Ydermure,Ydervæg,Ydervæg:,Ydervægge,Ydremur
5556,Skovvej 40A,151.0,Gentofte,Villa,False,55.766069,12.54721,2820,8.995.000,156-03839,,,1.500 m,200 m,200 m,,,,,1,6,2,,,,,,,,,,,,,,,180 m2,,10.086 / 8.736 kr.,,1986,1.700.000,C,,766 m2,"Laminat, klinker",,,,,343.6,,,,,,,,,,,,,,,,,,,0%,,,,,,Kogeplade (Voss). Emhætte (eico). Ovn (Voss). ...,,,,,,,,,,,,,,,,2017,Mursten,,,,,,,


In [704]:
df_clean.Ydermur.fillna(df_clean['Yder vægge'], inplace = True)

In [705]:
df_clean[exterior_cols].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8515 entries, 0 to 51069
Data columns (total 8 columns):
Yder vægge    1 non-null object
Ydermur       562 non-null object
Ydermur:      23 non-null object
Ydermure      1 non-null object
Ydervæg       11 non-null object
Ydervæg:      1 non-null object
Ydervægge     7 non-null object
Ydremur       1 non-null object
dtypes: object(8)
memory usage: 598.7+ KB


In [706]:
df_clean[~df_clean['Ydermur:'].isnull()]

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Yder vægge,Ydermur,Ydermur:,Ydermure,Ydervæg,Ydervæg:,Ydervægge,Ydremur
338,"Jellingegade 7, 2. tv.",75.0,København Ø,Ejerlejlighed,False,55.70662,12.579088,2100,3.995.000,1260000021,,,,,,,,Yes,,1.0,3,1,,,,,,,,,,,,,,,92 m2,,12.371 / 10.829 kr.,,2010,1.850.000,B,5.0,,,,,,Plankegulve - Mosaikfliser på badeværelse,415.6,,,,Individuel tilslutning,,,,,,,,,,,,,,,-3%,,,,,,Indesit vaske og tørremaskine (kombi) - årgang...,,,,,,,,,,,,,,,Termoruder,2018,,,Metalplader,,,,,
339,Vangehusvej 6B,76.0,København Ø,Ejerlejlighed,False,55.720263,12.575785,2100,4.495.000,1290000019,,,,,,,,Yes,,1.0,3,1,,,,,,,,,,,,,,,92 m2,,13.528 / 11.842 kr.,,2007,1.800.000,B,2.0,,,,,,Parketgulve - Fliser på badeværelse,351.7,,,,Individuel tilslutning,,,,,,,,,,,,,,,0%,,,,,,Siemens tørretumbler - årgang 2007Siemens vask...,,,,,,,,,,,,,,,Termovinduer,2018,,,Andet materiale,,,,,
1259,"Jens Otto Krags Gade 7, 1. 1.",99.0,København S,Ejerlejlighed,False,55.65907,12.568976,2300,4.245.000,1170000071,,,240 m,130 m,500 m,,,,,3.0,2,1,,,,,,,,,,,,,,,61 m2,,7.772 / 6.801 kr.,,1934,910.000,D,3.0,,,,,,"Planke, fliser, linoleum",151.8,,,,,,Individuel tilslutning,,,,,,,,,,,,,-5%,,,,,,"Køle-/fryseskab (Simens, før 2005) - Ovn (Amic...",,,,,,,,,,,,,,,Termo,2018,,,Mursten,,,,,
4233,Højmarksvej 8,146.0,Karlslunde,Villa,False,55.569109,12.254813,2690,3.595.000,2140000163,,,4.900 m,450 m,2.600 m,,,,,3.0,4,2,,,,,,,,,,,,,,,174 m2,,5.817 / 5.027 kr.,,1942,1.100.000,E,,879 m2,,,"Planke, vinyl, klinker og lamelparket.",,,178.3,,,,,,,,,,,,,16 m2,,,,,"Træ, troltex,",0%,,Hyllehøjskolen,,,,Ikea induktionskogeplader - AEG indbygningsovn...,,,,,,,,,,,,,,,Plast,2018,,,Pudset og træ,,,,,
4235,Karlslunde Strandvej 63A,144.0,Karlslunde,Villa,False,55.55891,12.259349,2690,4.450.000,2140000042,,,535 m,,1.600 m,,,,,,4,1,,,,,,,,,,,,,,,125 m2,,6.587 / 5.693 kr.,,1985,1.350.000,C,,890 m2,,,Træ - Tæpper - Klinker,,,329.6,,,,,,,,,,,,,,,,,,Listelofter,0%,,Østre Skole,,,,Gram komfurThermex emhætteAtlas køle-/fryseskab,,,,,,,,,,,,,,,Træ - Termo,2017,,,Mursten,,,,,
4236,Hørager 12,117.0,Karlslunde,Rækkehus,False,55.567107,12.249345,2690,3.195.000,2140000086,,,2.200 m,200 m,2.500 m,600 m,300 m,,,2.0,5,2,,,,,,,,,,,,,,,201 m2,,19.074 / 16.498 kr.,,2000,3.250.000,B,,757 m2,,,Parket - Klinker - Tæpper,,,584.4,,,,,,,,,,,,,,,,,,Gips,0%,,Østre Skole,,,,Miele indbygningsovnBlomberg indbygningskogepl...,,,,,,,,,,,,,,,Træ/alu - Energiruder (Velfac),2017,,,Mursten (Pudset - Malet),,,,,
4237,Søhøj 13,114.0,Karlslunde,Rækkehus,False,55.566624,12.230414,2690,2.685.000,2140000091,,,850 m,,,,,Yes,,1.0,3,1,,,,,,,,,,,,,,,120 m2,,22.395 / 19.371 kr.,,2005,2.250.000,B,4.0,,,,,,Træ - Klinker,117.4,,,,,,,,,,,,,,,,,,Gips,0%,,,,,,Bosch indbygningsovnBosch indbygningskogeplade...,,,,,,,,,,,,,,,Træ - træ/alu med 2 lags energiruder,2017,,,Mursten,,,,,
4238,Mejerivej 3,145.0,Karlslunde,Villa,False,55.570487,12.222394,2690,2.850.000,2140000040,,,550 m,,750 m,,75 m,Yes,,2.0,3,2,,,,,,,,,,,,,,,113 m2,,12.557 / 10.859 kr.,,2003,2.250.000,C,,,,,,,Egeparket - klinker,482.6,,,,,,,,,,,,,,,,,,Gips - Beton,0%,,,,,,Miele vaskemaskineVoss indbygningsovnBlomberg ...,,,,,,,,,,,,,,,Træ - Termo,2017,,,Mursten,,,,,
4239,Karlslunde Mosevej 23,181.0,Karlslunde,Villa,False,55.558446,12.250011,2690,3.795.000,2140000022,,,1.500 m,,1.700 m,10 m,200 m,,,2.0,3,2,,,,,,,,,,,,,,,127 m2,,10.635 / 9.196 kr.,,1965,1.150.000,D,,829 m2,,,"Tæppe, Parket, klinker og vinyl",,,199.3,,,,,,,,,,,,,115 m2,,,,,"Liste, træ og gipsplader",0%,,Vestre Skole,,,,Thermex emhætteVoss kogepladerGorenje indbygni...,,,,,,,,,,,,,,,Mahogni / Termo,2018,,,Mursten,,,,,
4240,Toftholmsvej 11,814.0,Karlslunde,Helårsgrund,False,55.558727,12.246224,2690,2.495.000,214M00793,,,400 m,75 m,650 m,,500 m,,,3.0,5,3,,,,,,,,,,,,,,,118 m2,,4.580 / 3.957 kr.,,1911,1.000.000,D,,198 m2,,,"Trægulv, klinker, linoleum og laminat.",,,183.1,,,,,,,,,,,,,68 m2,,,,,Træ og gips,-6%,,Østre Skole,,,,kogeplader - indbygningsovn - emhætte - køl...,,,,,,,,,,,,,,,Træ,2018,,,Mursten,,,,,


In [707]:
df_clean.Ydermur.fillna(df_clean['Ydermur:'], inplace = True)

In [708]:
df_clean[~df_clean['Ydermure'].isnull()]

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Yder vægge,Ydermur,Ydermur:,Ydermure,Ydervæg,Ydervæg:,Ydervægge,Ydremur
7592,Strandvejen 143A,180.0,Espergærde,Villa,False,55.996465,12.564137,3060,6.750.000,138-03037,,,500 m,500 m,1.700 m,,,,,2,5,1,,,,,,,,,,,,Mulighed for kabel-tv (mod egenbetaling),,,185 m2,,11.592 / 10.114 kr.,,1919,2.000.000,C,,1.034 m2,,,"Planke, klinker m.m.",,,322.6,,,,,,,,,,,,,,,,"Hvide, panelloft m.m.",,,0%,,,,,Bavnehøj Skole,emhætte - keramisk kogeplade - indbygningsov...,,,,,,,,,,,,,Termo,,,2017,,,,Mursten,,,,


In [709]:
df_clean.Ydermur.fillna(df_clean['Ydermure'], inplace = True)

In [710]:
df_clean[~df_clean['Ydervæg'].isnull()]

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Yder vægge,Ydermur,Ydermur:,Ydermure,Ydervæg,Ydervæg:,Ydervægge,Ydremur
128,"Seedorffs Vænge 8, 3. th.",97.0,Frederiksberg,Ejerlejlighed,False,55.683951,12.523416,2000,4.795.000,1300000160,,,300 m,300 m,500 m,,,,,1.0,3,1.0,Parknet Kabel-TV individuelt,,,,,,,,,,,,,,86 m2,,16.221 / 14.201 kr.,,1889,1.950.000,D,4.0,,,,Planker,,,81.400,,,,,,,,,,,,,,,,,,,-2%,,,,,,Miele vaskemaskineIkea kogepladerBosch ovnIkea...,,,,,,,,,,,,,Termo,,,2017,,,,,Mursten,,,
1436,"Hannemanns Allé 4P, Ørestad",106.0,København S,Rækkehus,False,55.62513,12.571298,2300,4.195.000,1680000081,,,500 m,50 m,1.000 m,,,,,2.0,5,,,,,,,,,,,,,,,,130 m2,,21.222 / 18.732 kr.,,1936,3.200.000,D,,842 m2,,,"Planker, linolium, klinker",,,2.353.400,,,,,,,,,,,,,75 m2,,,,,,0%,,,,,,gas komfur - køle/fryseskab - opvaskemaskin...,,,,,,,,,,,,,"Termo, enkelt lags",,,2017,,,,,Mursten,,,
3876,Parkås 56,160.0,Greve,Villa,True,55.580312,12.290787,2670,3.995.000,2140000104,,,1.400 m,300 m,1.100 m,,,,,3.0,3,,,,,,,,,,,,,,,,138 m2,,10.450 / 9.074 kr.,,1929,1.600.000,D,,738 m2,,,"Træ, Klinkegulv på beton, Teglsten på beton i ...",,,449.900,,,,,,,,,,,,,6 m2,,,"Gips lofter , listelofter",,,0%,,,,,,Whirlpool indbygningskogeplader - Gorenje indb...,,,,,,,,,,,,,tolags termoruder og energiruder,,,2017,,,,,"Tegl/kalksandsten, træ, Let pladekonstruktion",,,
3878,Søagerparken 25,118.0,Greve,Rækkehus,True,55.585393,12.295538,2670,2.595.000,2140000162,,,1.200 m,300 m,750 m,,,,,1.0,4,1.0,,,,,,,,,,,,,,,146 m2,,6.387 / 5.544 kr.,,1968,1.400.000,C,,744 m2,,,"Væg-til-væg tæppe, Laminatgulv på beton, Lamin...",,,489.300,,,,,,,,,,,,,,,,Profilbrædder,,,-5%,,,,,,Zamussi køle-/fryseskab - Voss komfur - Blombe...,,,,,,,,,,,,,Tolags energirude / Tolags termorude,,,2018,,,,,Tegl/kalksandsten,,,
3879,Knøsen 86,110.0,Greve,Villa,True,55.598345,12.31934,2670,2.795.000,214K00157,,,2.000 m,1.000 m,3.000 m,,,,,1.0,5,2.0,,,,,,,,,,,,,,,150 m2,,12.173 / 10.571 kr.,,2001,2.300.000,C,,810 m2,,,Beton/klinker/træ,,,470.900,,,,,,,,,,,,,,,,Gips/træ,,,-3%,,,,,,IKEA indbygningskogepladerWhirlpool indbygning...,,,,,,,,,,,,,Termo,,,2017,,,,,Mursten,,,
3892,Lundemosen 1,189.0,Greve,Villa,False,55.579215,12.271409,2670,3.895.000,2140000100,,,1.000 m,300 m,2.000 m,,,,,,4,1.0,,,,,,,,,,,,,,,104 m2,,6.979 / 6.058 kr.,,1983,1.200.000,A10,,154 m2,,,"klinker, tæpper mv.",,,325.600,,,,,,,,,,,,,,,,"Træ, gips mv.",,,0%,,,,,,"Vaskemaskine mrk. LG, Tørretumbler mrk. AEG. K...",,,,,,,,,,,,,Termo mv.,,,2018,,,,,Mursten,,,
3904,Grønnegården 441,69.0,Greve,Ejerlejlighed,False,55.599714,12.342404,2670,1.895.000,214M01224,,,450 m,450 m,1.100 m,,,,,,5,1.0,,,,,,,,,,,,,,,116 m2,,13.528 / 11.747 kr.,,1955,1.550.000,C,,258 m2,,,Træ/klinker,,,464.200,,,,,,,,,,,,,56 m2,,,Gips/pladeloft,,,0%,,,,,,Miele køleskab - 2003Voss ovn - 2003Bosch opva...,,,,,,,,,,,,,Træ/alu,,,2017,,,,,Murværk med puds,,,
4050,Maglekæret 5D,79.0,Solrød Strand,Ejerlejlighed,True,55.522146,12.208072,2680,1.895.000,1060000135,,,400 m,450 m,900 m,,,,,1.0,4,2.0,,,,,,,,,,,,,,,126 m2,,6.593 / 5.676 kr.,,1974,960.000,D,,772 m2,,,Laminat/klinker,,,306.500,,,,,,,,,,,,,,,,Gips,,,-6%,,,,,,Køleskab mrk. Bosch. Ovn mrk. Gorenje. Opvaske...,,,,,,,,,,,,,Termo,,,2018,,,,,Mursten,,,
4351,Håbets Allé 37,294.0,Brønshøj,Villa,False,55.702066,12.50379,2700,10.950.000,102M00323,,,2.000 m,250 m,1.500 m,,,No,,1.0,4,1.0,,,,,,,,,,,,,,,102 m2,,4.090 / 3.524 kr.,,1986,830.000,D,,,,,"Træ, klinker og tæpper",,,135.700,,,,,,,,Seperatkloakeret,,,,,,,,Træ og gips,,,0%,,,,,,De i køkkenet værende hvidevarer,,,,,,,Naturgas,,,,,,Termo,,,2017,,,,,Mursten,,,
5680,Bernhard Olsens Vej 1,133.0,Virum,Villa,False,55.787548,12.488782,2830,5.100.000,1720000069,,,1.100 m,900 m,500 m,,1.400 m,,,2.0,4,1.0,,,,,,,,,,,,Stofa,,,71 m2,,2.653 / 2.294 kr.,,1987,700.000,,,,,,,,,279.200,,,,,,,,,,,,,,,,,,,0%,,,,,,Alt indbo,,,,,,,,,,Elvarme,,,Plast,,,2017,,,,,Mursten,,,


In [711]:
df_clean.Ydermur.fillna(df_clean['Ydervæg'], inplace = True)

In [712]:
df_clean[~df_clean['Ydervæg:'].isnull()]

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Yder vægge,Ydermur,Ydermur:,Ydermure,Ydervæg,Ydervæg:,Ydervægge,Ydremur
4234,Søhøj 23,114.0,Karlslunde,Rækkehus,False,55.566249,12.231235,2690,2.835.000,2140000136,,,650 m,450 m,2.000 m,4.300 m,2.500 m,,,,3,1,,,,,,,,,,,,,,,92 m2,,4.277 / 3.695 kr.,,1974,1.050.000,D,,1.060 m2,,,,,"Linolium, parket og klinker",362.7,,,,,,,,,,,,,,,,,,Træ,0%,,Hyllehøjskolen,,,,Blomberg indbygningsovn - emhætte - Blomberg ...,,,,,,,,,,,,,,,Træ / termo,2018,,,,,,Gule mursten,,


In [713]:
df_clean.Ydermur.fillna(df_clean['Ydervæg:'], inplace = True)

In [714]:
df_clean[~df_clean['Ydervægge'].isnull()]

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Yder vægge,Ydermur,Ydermur:,Ydermure,Ydervæg,Ydervæg:,Ydervægge,Ydremur
1070,Dirchsvej 26,79.0,København S,Villa,True,55.641901,12.617531,2300,4.700.000,114U00696,,,800 m,250 m,950 m,,,,,2.0,4.0,2.0,,,,,,,,,,,,,,,84 m2,,13.528 / 11.731 kr.,,1960,1.850.000,G,,658 m2,,,,,,1.037.700,,,,,,,,,,,,,84 m2,,,,,,0%,Frydenhøjskolen,,,,,mrk. ukendt køleskabGram fryseskabWhirlpool va...,,,,,,,,,,,,,,,,2018,,,,,,,Mursten,
1072,"Hollænderdybet 30, st.. th.",77.0,København S,Ejerlejlighed,False,55.664088,12.595292,2300,3.350.000,1170000074,,,250 m,210 m,350 m,,,,,2.0,3.0,2.0,,,,,,,,,,,,,,,115 m2,,19.093 / 16.561 kr.,,1965,1.700.000,D,,1.024 m2,,,,,,426.700,,,,,,,,,,,,,112 m2,,,,,,0%,Holmegårdsskolen,,,,,"Køber er gjort bekendt med, at de tilhørende h...",,,,,,,,,,,,,,,,2018,,,,,,,Mursten,
1074,"Jens Otto Krags Gade 7, 5. 3.",80.0,København S,Ejerlejlighed,False,55.65907,12.568976,2300,3.550.000,1200000074,,,1.300 m,600 m,1.100 m,,,,,2.0,5.0,3.0,,,,,,,,,,,,,,,163 m2,,23.137 / 20.068 kr.,,2011,3.150.000,C,,814 m2,,,,,,505.600,,,,,,,,,,,,,163 m2,,,,,,0%,Engstrandskolen,,,,,Siemens amerikaner køleskabTThermex emhætteMrk...,,,,,,,,,,,,,,,,2018,,,,,,,Mursten,
1075,"Amagerbrogade 299, 3. tv.",73.0,København S,Ejerlejlighed,False,55.641319,12.6172,2300,2.395.000,1680000052,,,550 m,550 m,800 m,,,,,2.0,5.0,1.0,,,,,,,,,,,,,,,133 m2,,15.066 / 13.066 kr.,,1967,2.500.000,F,,930 m2,,,,,,1.231.300,,,,,,,,,,,,,120 m2,,,,,,0%,Dansborgskolen,,,,,Gram kummefryser - ingen garantiBosch vaskemas...,,,,,,,,,,,,,,,,2018,,,,,,,Mursten,
1077,"Hallandsgade 6C, st..",79.0,København S,Ejerlejlighed,False,55.66359,12.599646,2300,2.845.000,1170000010,,,800 m,600 m,750 m,,,,,1.0,5.0,2.0,,,,,,,,,,,,,,,151 m2,,13.141 / 11.396 kr.,,1984,2.050.000,E,,300 m2,,,,,,633.800,,,,,,,,,,,,,,,,,,,-6%,Præstemoseskolen,,,,,mrk. ukendt indbygnings ovnBosch køle/fryseska...,,,,,,,,,,,,,,,,2018,,,,,,,Mursten,
8990,Borupvej 58,200.0,Skævinge,Villa,False,55.907333,12.178038,3320,3.750.000,152-01989,,,800 m,500 m,,,,,,,,,,,,,,,,,,,,,,,50 m2,,11.626 / 9.918 kr.,,2016,1.350.000,,,210.697 m2,,,,,,0,,,,,,,,Sivedræn/Trixtank,,,,,49 m2,,,,,,0%,,,,,,,,Varmepumpe (luft til luft),,,,,,,,,,,,,,2018,,,,,,,Træ,
9177,Herlufdalsvej 10A,116.0,Hillerød,Rækkehus,False,55.919151,12.289652,3400,2.995.000,1510000079,,,2.000 m,800 m,,,,,,,6.0,2.0,,,,,,,,,,,,,,,93 m2,,5.814 / 4.958 kr.,,1977,1.350.000,,,1.518 m2,,,,,,274.400,,,,,,,,Mekanisk rensning med nedsivningsanlæg,,,,,,,,,,,0%,,,,,,Indbo medfølger undtaget sælgers personlige ef...,,Elvarme + brændeovn,,,,,,,,,,,,,,2018,,,,,,,Træbeklædning,


In [715]:
df_clean.Ydermur.fillna(df_clean['Ydervægge'], inplace = True)

In [716]:
df_clean[~df_clean['Ydremur'].isnull()]

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Yder vægge,Ydermur,Ydermur:,Ydermure,Ydervæg,Ydervæg:,Ydervægge,Ydremur
1265,"C.F. Møllers Allé 68, 1. tv., Ørestad",98.0,København S,Ejerlejlighed,False,55.63457,12.577761,2300,3.545.000,1680000040,,,400 m,500 m,450 m,,,,,,5,,,,,,,,,,,,,,,,119 m2,,15.484 / 13.555 kr.,,1924,2.075.000,D,1,,Træ og andet,,,,,731.75,,,,,,,,,,,,,82 m2,,,,,,-6%,Katrinedals skolen 450 m,,,,,"Køle-fryseskab: Samsung 2017, Ovn: Gram 2010, ...",,,,,,,,,,,,,Blandede termo og energiglas,,,2018,,,,,,,,Mursten


In [717]:
df_clean.Ydermur.fillna(df_clean['Ydremur'], inplace = True)

In [718]:
df_clean[exterior_cols].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8515 entries, 0 to 51069
Data columns (total 8 columns):
Yder vægge    1 non-null object
Ydermur       606 non-null object
Ydermur:      23 non-null object
Ydermure      1 non-null object
Ydervæg       11 non-null object
Ydervæg:      1 non-null object
Ydervægge     7 non-null object
Ydremur       1 non-null object
dtypes: object(8)
memory usage: 598.7+ KB


We can now drop all but the Ydermur column.

In [719]:
df_clean.drop(axis = 1, 
              columns = ['Yder vægge', 'Ydermur:', 
                         'Ydermure', 'Ydervæg', 
                         'Ydervæg:', 'Ydervægge','Ydremur'],
              inplace = True)
df_clean.rename(columns = {"Ydermur": "Exterior"}, inplace = True)
df_clean.head()

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Exterior
0,"A.D. Jørgensens Vej 75, 2. 1.",35.0,Frederiksberg,Ejerlejlighed,False,55.680726,12.494705,2000,1.350.000,1300000111,,,1.100 m,150 m,2.600 m,,,No,,1.0,1.0,1.0,,,,,,,,,,,,,,,35 m2,,5.249 / 4.628 kr.,,1991.0,780.000,B,2.0,,,,,,,168.3,,,,,,,,,,,,,,,,,,,-7%,,,,,,Alle nuværende hårde hvidevarer i lejligheden ...,,,,,,,,,,,,,,,,2018.0,
1,"Holger Danskes Vej 14, 3. th.",46.0,Frederiksberg,Ejerlejlighed,False,55.686615,12.538356,2000,2.145.000,1740000062,,,150 m,150 m,100 m,,,,,1.0,2.0,1.0,,,,,,,,,,,,,,,46 m2,,8.313 / 7.331 kr.,,1885.0,1.100.000,D,3.0,,,,,,,145.0,,,,,,,,,,,,,,,,,,,-6%,,,,,,Intuition gaskomfur komfur - Matsui Fridge Do...,,,,,,,,,,,,,,,,2018.0,
11,"Lyøvej 5, st.. tv.",60.0,Frederiksberg,Ejerlejlighed,False,55.68294,12.524527,2000,2.875.000,1300000128,,,650 m,100 m,550 m,,,No,,1.0,2.0,1.0,,,,,,,,,,,,,,,77 m2,,10.446 / 9.212 kr.,,1988.0,1.600.000,C,3.0,,,,,,,259.1,,,,,,,,,,,,,,,,,,,-4%,,,,,,emhætte - bordkomfur glaskeramisk - indbygni...,,,,,,,,,,,,,,,,2018.0,
12,"H. Schneekloths Vej 13, 5. th.",56.0,Frederiksberg,Ejerlejlighed,False,55.679928,12.506927,2000,2.750.000,130D01015,,,50 m,150 m,550 m,,,Yes,,1.0,2.0,1.0,,,,,,,,,,,,,,,74 m2,,10.075 / 8.884 kr.,,1972.0,1.350.000,D,2.0,,,,,,,164.3,,,,,,,,,,,,,,,,,,,0%,,,,,,De i lejligheden hårde hvidevarer medfølger i ...,,,,,,,,,,,,,,,,2018.0,
30,"Howitzvej 61, 3. th.",67.0,Frederiksberg,Ejerlejlighed,True,55.680209,12.523998,2000,3.195.000,1300000176,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


Let's look at Vinduer next.

In [720]:
window_cols = ['Vinduer', 'Vinduer og yderdøre', 'Vinduer:']
df_clean[window_cols].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8515 entries, 0 to 51069
Data columns (total 3 columns):
Vinduer                715 non-null object
Vinduer og yderdøre    2 non-null object
Vinduer:               30 non-null object
dtypes: object(3)
memory usage: 266.1+ KB


In [721]:
print_unique_values(df_clean, window_cols)

Vinduer: [nan 'Termo' 'Forsatsvinduer' 'Enkeltlags med Optoglas' 'Termovinduer'
 'Termo m. lags glas' 'Lavenergi'
 'Energiruder, træ/hvid indvendig. Udvendig alu/komposit mørk farve'
 'Termoruder' 'Termo lavenergi m. 3 lags glas'
 '3-lags, DW-Godkendte, lavenergiruder,' 'Tolags energirude.'
 'Energiglas, termo' 'Etlags glasruder, termo- og energiruder'
 'Termo- og energiruder' 'Termo-/energiruder' 'Energi- og termoruder'
 'Energiruder' 'Energi-termorude' 'Tolags energiruder'
 'Energitermoruder og enkelt lag glas' 'Trelags termoruder'
 'Tolags termoruder, tolags energiruder og trelags termoruder'
 '2-lags termorude' '3-lags energiruder' '2-lags termoruder'
 'Blandede termo og energiglas' 'Termo, enkelt lags' 'Sprossede'
 'Termovinduer og Energivinduer' 'Se energimærke'
 'Termoruder, tolags energiruder' 'Trelags energiruder'
 'Termo / enkeltlags' 'Termovinduer, dobbeltruder og enkelt'
 'Termovinduer m/energiglas' 'Enkeltlags og koblede'
 'Termoruder fra 2017/2018' '2 lags energiruder' 'T

In [722]:
df_clean[~df_clean['Vinduer og yderdøre'].isnull()]

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Exterior
8944,Bakkesvinget 18,88.0,Ølsted,Fritidshus,True,55.895063,12.056486,3310,1.395.000,1150000145,,,2.000 m,,,50 m,800 m,,,1,3,1,,,,,,,,,,,,,,,58 m2,,2.028 / 1.724 kr.,,1976,440.0,,,1.537 m2,,,150 mm singles+50 mm pladebatts Rockwoll+130 m...,,,118.4,,,,,,,,,,,,,,,,"100 mm mineraluld, tagpap og 12 mm fyrrusikbræ...",,,0%,,,,,,El-komfur og køleskab. Huset er beregnet til 4...,,,,,,,,,,,,,,Malede trærammer med termo.,,2018,
8946,Maxivej 26,37.0,Ølsted,Fritidshus,False,55.90511,12.055143,3310,749.000,1150000103,,,6.000 m,100 m,2.000 m,50 m,1.500 m,,,1,4,1,,,,,,,,,,,,,,,79 m2,,2.632 / 2.240 kr.,,1975,550.0,,,3.219 m2,,,,,,110.8,,,,,,,,,,,,,,,,,,,-12%,,,,,,"Glaskeramisk el-komfur, køle-/fryseskab, opvas...",,,,,,,,,,,,,,Hvidmalet trærammer m/termoruder,,2018,


In [723]:
df_clean.Vinduer.fillna(df_clean['Vinduer og yderdøre'], inplace = True)

In [724]:
df_clean[~df_clean['Vinduer:'].isnull()]

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Vinduer,Vinduer og yderdøre,Vinduer:,Vurderingsår,Exterior
338,"Jellingegade 7, 2. tv.",75.0,København Ø,Ejerlejlighed,False,55.70662,12.579088,2100,3.995.000,1260000021,,,,,,,,Yes,,1.0,3,1,,,,,,,,,,,,,,,92 m2,,12.371 / 10.829 kr.,,2010,1.850.000,B,5.0,,,,,,Plankegulve - Mosaikfliser på badeværelse,415.6,,,,Individuel tilslutning,,,,,,,,,,,,,,,-3%,,,,,,Indesit vaske og tørremaskine (kombi) - årgang...,,,,,,,,,,,,,,,Termoruder,2018,Metalplader
339,Vangehusvej 6B,76.0,København Ø,Ejerlejlighed,False,55.720263,12.575785,2100,4.495.000,1290000019,,,,,,,,Yes,,1.0,3,1,,,,,,,,,,,,,,,92 m2,,13.528 / 11.842 kr.,,2007,1.800.000,B,2.0,,,,,,Parketgulve - Fliser på badeværelse,351.7,,,,Individuel tilslutning,,,,,,,,,,,,,,,0%,,,,,,Siemens tørretumbler - årgang 2007Siemens vask...,,,,,,,,,,,,,,,Termovinduer,2018,Andet materiale
1259,"Jens Otto Krags Gade 7, 1. 1.",99.0,København S,Ejerlejlighed,False,55.65907,12.568976,2300,4.245.000,1170000071,,,240 m,130 m,500 m,,,,,3.0,2,1,,,,,,,,,,,,,,,61 m2,,7.772 / 6.801 kr.,,1934,910.000,D,3.0,,,,,,"Planke, fliser, linoleum",151.8,,,,,,Individuel tilslutning,,,,,,,,,,,,,-5%,,,,,,"Køle-/fryseskab (Simens, før 2005) - Ovn (Amic...",,,,,,,,,,,,,,,Termo,2018,Mursten
4233,Højmarksvej 8,146.0,Karlslunde,Villa,False,55.569109,12.254813,2690,3.595.000,2140000163,,,4.900 m,450 m,2.600 m,,,,,3.0,4,2,,,,,,,,,,,,,,,174 m2,,5.817 / 5.027 kr.,,1942,1.100.000,E,,879 m2,,,"Planke, vinyl, klinker og lamelparket.",,,178.3,,,,,,,,,,,,,16 m2,,,,,"Træ, troltex,",0%,,Hyllehøjskolen,,,,Ikea induktionskogeplader - AEG indbygningsovn...,,,,,,,,,,,,,,,Plast,2018,Pudset og træ
4234,Søhøj 23,114.0,Karlslunde,Rækkehus,False,55.566249,12.231235,2690,2.835.000,2140000136,,,650 m,450 m,2.000 m,4.300 m,2.500 m,,,,3,1,,,,,,,,,,,,,,,92 m2,,4.277 / 3.695 kr.,,1974,1.050.000,D,,1.060 m2,,,,,"Linolium, parket og klinker",362.7,,,,,,,,,,,,,,,,,,Træ,0%,,Hyllehøjskolen,,,,Blomberg indbygningsovn - emhætte - Blomberg ...,,,,,,,,,,,,,,,Træ / termo,2018,Gule mursten
4235,Karlslunde Strandvej 63A,144.0,Karlslunde,Villa,False,55.55891,12.259349,2690,4.450.000,2140000042,,,535 m,,1.600 m,,,,,,4,1,,,,,,,,,,,,,,,125 m2,,6.587 / 5.693 kr.,,1985,1.350.000,C,,890 m2,,,Træ - Tæpper - Klinker,,,329.6,,,,,,,,,,,,,,,,,,Listelofter,0%,,Østre Skole,,,,Gram komfurThermex emhætteAtlas køle-/fryseskab,,,,,,,,,,,,,,,Træ - Termo,2017,Mursten
4236,Hørager 12,117.0,Karlslunde,Rækkehus,False,55.567107,12.249345,2690,3.195.000,2140000086,,,2.200 m,200 m,2.500 m,600 m,300 m,,,2.0,5,2,,,,,,,,,,,,,,,201 m2,,19.074 / 16.498 kr.,,2000,3.250.000,B,,757 m2,,,Parket - Klinker - Tæpper,,,584.4,,,,,,,,,,,,,,,,,,Gips,0%,,Østre Skole,,,,Miele indbygningsovnBlomberg indbygningskogepl...,,,,,,,,,,,,,,,Træ/alu - Energiruder (Velfac),2017,Mursten (Pudset - Malet)
4237,Søhøj 13,114.0,Karlslunde,Rækkehus,False,55.566624,12.230414,2690,2.685.000,2140000091,,,850 m,,,,,Yes,,1.0,3,1,,,,,,,,,,,,,,,120 m2,,22.395 / 19.371 kr.,,2005,2.250.000,B,4.0,,,,,,Træ - Klinker,117.4,,,,,,,,,,,,,,,,,,Gips,0%,,,,,,Bosch indbygningsovnBosch indbygningskogeplade...,,,,,,,,,,,,,,,Træ - træ/alu med 2 lags energiruder,2017,Mursten
4238,Mejerivej 3,145.0,Karlslunde,Villa,False,55.570487,12.222394,2690,2.850.000,2140000040,,,550 m,,750 m,,75 m,Yes,,2.0,3,2,,,,,,,,,,,,,,,113 m2,,12.557 / 10.859 kr.,,2003,2.250.000,C,,,,,,,Egeparket - klinker,482.6,,,,,,,,,,,,,,,,,,Gips - Beton,0%,,,,,,Miele vaskemaskineVoss indbygningsovnBlomberg ...,,,,,,,,,,,,,,,Træ - Termo,2017,Mursten
4239,Karlslunde Mosevej 23,181.0,Karlslunde,Villa,False,55.558446,12.250011,2690,3.795.000,2140000022,,,1.500 m,,1.700 m,10 m,200 m,,,2.0,3,2,,,,,,,,,,,,,,,127 m2,,10.635 / 9.196 kr.,,1965,1.150.000,D,,829 m2,,,"Tæppe, Parket, klinker og vinyl",,,199.3,,,,,,,,,,,,,115 m2,,,,,"Liste, træ og gipsplader",0%,,Vestre Skole,,,,Thermex emhætteVoss kogepladerGorenje indbygni...,,,,,,,,,,,,,,,Mahogni / Termo,2018,Mursten


In [725]:
df_clean.Vinduer.fillna(df_clean['Vinduer:'], inplace = True)

In [726]:
df_clean.drop(axis = 1, 
              columns = ['Vinduer:', 'Vinduer og yderdøre'],
              inplace = True)
df_clean.rename(columns = {"Vinduer": "Windows"}, inplace = True)
df_clean.head()

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:,Windows,Vurderingsår,Exterior
0,"A.D. Jørgensens Vej 75, 2. 1.",35.0,Frederiksberg,Ejerlejlighed,False,55.680726,12.494705,2000,1.350.000,1300000111,,,1.100 m,150 m,2.600 m,,,No,,1.0,1.0,1.0,,,,,,,,,,,,,,,35 m2,,5.249 / 4.628 kr.,,1991.0,780.000,B,2.0,,,,,,,168.3,,,,,,,,,,,,,,,,,,,-7%,,,,,,Alle nuværende hårde hvidevarer i lejligheden ...,,,,,,,,,,,,,,2018.0,
1,"Holger Danskes Vej 14, 3. th.",46.0,Frederiksberg,Ejerlejlighed,False,55.686615,12.538356,2000,2.145.000,1740000062,,,150 m,150 m,100 m,,,,,1.0,2.0,1.0,,,,,,,,,,,,,,,46 m2,,8.313 / 7.331 kr.,,1885.0,1.100.000,D,3.0,,,,,,,145.0,,,,,,,,,,,,,,,,,,,-6%,,,,,,Intuition gaskomfur komfur - Matsui Fridge Do...,,,,,,,,,,,,,,2018.0,
11,"Lyøvej 5, st.. tv.",60.0,Frederiksberg,Ejerlejlighed,False,55.68294,12.524527,2000,2.875.000,1300000128,,,650 m,100 m,550 m,,,No,,1.0,2.0,1.0,,,,,,,,,,,,,,,77 m2,,10.446 / 9.212 kr.,,1988.0,1.600.000,C,3.0,,,,,,,259.1,,,,,,,,,,,,,,,,,,,-4%,,,,,,emhætte - bordkomfur glaskeramisk - indbygni...,,,,,,,,,,,,,,2018.0,
12,"H. Schneekloths Vej 13, 5. th.",56.0,Frederiksberg,Ejerlejlighed,False,55.679928,12.506927,2000,2.750.000,130D01015,,,50 m,150 m,550 m,,,Yes,,1.0,2.0,1.0,,,,,,,,,,,,,,,74 m2,,10.075 / 8.884 kr.,,1972.0,1.350.000,D,2.0,,,,,,,164.3,,,,,,,,,,,,,,,,,,,0%,,,,,,De i lejligheden hårde hvidevarer medfølger i ...,,,,,,,,,,,,,,2018.0,
30,"Howitzvej 61, 3. th.",67.0,Frederiksberg,Ejerlejlighed,True,55.680209,12.523998,2000,3.195.000,1300000176,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


Lets look at the varme columns now.

In [727]:
heat_cols = df_clean.columns.to_list()[-15:-3]
df_clean[heat_cols].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8515 entries, 0 to 51069
Data columns (total 12 columns):
Varemneinstallation    1 non-null object
Varme                  49 non-null object
Varme installation     1 non-null object
Varme:                 3 non-null object
Varmeforhold           2 non-null object
Varmeforsyning         9 non-null object
Varmeinstallation      325 non-null object
Varmeinstallation:     15 non-null object
Varmekilde             6 non-null object
Varmekilde primær      1 non-null object
Varmeplan              26 non-null object
Varmeplan:             2 non-null object
dtypes: object(12)
memory usage: 864.8+ KB


In [728]:
print_unique_values(df_clean, heat_cols)

Varemneinstallation: [nan 'Fjernvarme']
Varme: [nan 'Fjernvarme, gulvvarme i alle boliger'
 'Generelt gulvvarme med individuel rumregulering fra ejerforeningens egen varmecentral'
 'Fjernvarme' 'Naturgas' 'Naturgas, brændeovn' 'Naturgas og brændeovn'
 ':Fjernvarme og pejs' 'Fjernvarme og brændeovn'
 'Fjernvarme suppleret af brændeovn' 'Naturgas og pejs' 'Pillefyr'
 'Naturgas suppleret af brændeovn'
 'Naturgas suppleret med el-gulvvarme i badeværelse'
 'Oliefyr og brændeovn' 'Træpilleovne og elvarme' 'Oliefyr'
 'Oliefyr og elvarme' 'Centralvarme fra eget anlæg, et-kammer fyr'
 'Fastbrændsel/piller' 'Ingen'
 'Tilslutningsbidrag er betalt  til fjernvarme'
 'Mulighed for fjernvarme - Silkeborg Forsyning' 'Undersøes'
 'Varmepumpe (luft til luft)' 'Elvarme' 'Elvarme + brændeovn'
 'Elvarme og brændeovn']
Varme installation: [nan 'Fjernvarme']
Varme:: [nan 'Naturgas' 'Fjernvarme' 'ingen - betales af køber']
Varmeforhold: [nan 'Oliefyr' 'Pillefyr']
Varmeforsyning: [nan 'Oliefyr' 'Stokerfyr' 'Ol

In [729]:
def merge_columns(df, col_list, merged_col):
    '''
    Merging columns containing similar information in a dataframe into a specified column.
    col_list is the list of similar columns. merged_col is the column information should be merged into.
    '''
    # Check for conflicting information:
    for col in col_list:
        # Only check if the column is not the column to be merged into
        if col != merged_col:
            # If the column in the list is the only one containing information
            # when it's not null, then merge values into the merged_col
            if ((~df[~df_clean[col].isnull()][col_list].isna()).sum() != 0).sum() == 1:
                df[merged_col].fillna(df[col], inplace = True)
            else:
                print('The column {} has conflicts with other columns'.format(col))

In [730]:
merge_columns(df_clean, heat_cols, 'Varmeinstallation')

The column Varmeinstallation: has conflicts with other columns
The column Varmeplan has conflicts with other columns
The column Varmeplan: has conflicts with other columns


In [733]:
df_clean[~df_clean['Varmeinstallation:'].isnull()][heat_cols]

Unnamed: 0,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:
3890,,,,,,,,Fjernvarme,,,,
3899,,,,,,,,Fjernvarme,,,,
4338,,,,,,,,Fjernvarme,,,,
5036,,,,,,,,Elvarme,,,,
5040,,,,,,,,Elvarme,,,,
5043,,,,,,,,Jordvarme suppleret af brændeovn,,,,
5213,,,,,,,,Naturgas,,,,Naturgas
5223,,,,,,,,Varmepumpe fra 2016,,,,
5224,,,,,,,,Pillefyr,,,,
5225,,,,,,,,Fjenvarme,,,,


Okay, so the Varmeplan: is sometimes conflicting, but is either the same or not adding more information.

In [736]:
df_clean[~df_clean['Varmeplan'].isnull()][heat_cols]

Unnamed: 0,Varemneinstallation,Varme,Varme installation,Varme:,Varmeforhold,Varmeforsyning,Varmeinstallation,Varmeinstallation:,Varmekilde,Varmekilde primær,Varmeplan,Varmeplan:
3633,,,,,,,,,,,Nej,
3634,,,,,,,,,,,Nej,
3660,,,,,,,Pillefyr + pejs,,,,Individuel opvarmning,
3661,,,,,,,Fjernvarme,,,,Fjernvarme,
3704,,,,,,,Fjernvarme,,,,Fjernvarme,
3709,,,,,,,Fjernvarme,,,,Fjernvarme,
3714,,,,,,,,,,,Fjernvarme,
3723,,,,,,,Fjernvarme,,,,Ingen,
3724,,,,,,,,,,,Ingen,
3726,,,,,,,,,,,Fjernvarme,


Let's merge it in and discard the rest.

In [740]:
df_clean['Varmeinstallation'].fillna(df_clean['Varmeinstallation:'], inplace = True)
df_clean['Varmeinstallation'].fillna(df_clean['Varmeplan'], inplace = True)
df_clean['Varmeinstallation'].fillna(df_clean['Varmeplan:'], inplace = True)
df_clean[heat_cols].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8515 entries, 0 to 51069
Data columns (total 12 columns):
Varemneinstallation    1 non-null object
Varme                  49 non-null object
Varme installation     1 non-null object
Varme:                 3 non-null object
Varmeforhold           2 non-null object
Varmeforsyning         9 non-null object
Varmeinstallation      428 non-null object
Varmeinstallation:     15 non-null object
Varmekilde             6 non-null object
Varmekilde primær      1 non-null object
Varmeplan              26 non-null object
Varmeplan:             2 non-null object
dtypes: object(12)
memory usage: 864.8+ KB


In [744]:
heat_cols.remove('Varmeinstallation')

ValueError: list.remove(x): x not in list

In [748]:
heat_cols

['Varemneinstallation',
 'Varme',
 'Varme installation',
 'Varme:',
 'Varmeforhold',
 'Varmeforsyning',
 'Varmeinstallation:',
 'Varmekilde',
 'Varmekilde primær',
 'Varmeplan',
 'Varmeplan:']

In [747]:
df_clean.drop(axis = 1, columns = heat_cols, inplace = True)
df_clean.rename(columns = {"Varmeinstallation": "Heating"}, inplace = True)
df_clean.head()

KeyError: "['Varemneinstallation' 'Varme' 'Varme installation' 'Varme:'\n 'Varmeforhold' 'Varmeforsyning' 'Varmeinstallation:' 'Varmekilde'\n 'Varmekilde primær' 'Varmeplan' 'Varmeplan:'] not found in axis"

In [749]:
df_clean.head()

Unnamed: 0,adresse,boligOrGrundAreal,city,ejendomstypePrimaerNicename,isNew,lat,lng,postal,price,sagsnummer,Aconto forbrug pr. måned,Afstand indkøb,Afstand til indkøb,Afstand til off. transport,Afstand til skole,Afstand til skov,Afstand til vand,Balcony,Anetenneforhold,Antal plan,Antal rum,Antal toiletter,Antenne,Antenne forh.,Antenne forhold,Antenne og internet,Antenne/bredbånd,Antenne/internet,Antenne/internet:,Antenne/parabol,Antenne/tv/internet,Antenne:,Antenneforh.,Antenneforhold,Antenneforhold:,Antennetilslutning,Boligareal,Bredbånd,Brutto/Netto  ?  ekskl. ejerudgift,Brutto/Netto  ekskl. ejerudgift,Byggeår,Ejendomsværdi i kr.,Energimærke,Etage,Grundareal,Gulv,Gulvbelægninger,Gulve,Gulve/lofter,Gulve:,Heraf grundværdi i kr.,Indkøb,Internet,Internet-/tv-forhold,Internet/antenne,Internet/tv,Internet:,Kabel-tv,Kloak,Kloak - tilslutningsbidra,Kloak/vand/vej,Kloak:,Kloakering,Kælderareal,Kælderrum,Loft,Lofter,Lofter/gulve,Lofter:,Prisudvikling,Skole,Skole:,Skoledistrikt,Skoletihørsforhold,Skoletilhørsforhold,Tilbehør,Varmeinstallation,Windows,Vurderingsår,Exterior
0,"A.D. Jørgensens Vej 75, 2. 1.",35.0,Frederiksberg,Ejerlejlighed,False,55.680726,12.494705,2000,1.350.000,1300000111,,,1.100 m,150 m,2.600 m,,,No,,1.0,1.0,1.0,,,,,,,,,,,,,,,35 m2,,5.249 / 4.628 kr.,,1991.0,780.000,B,2.0,,,,,,,168.3,,,,,,,,,,,,,,,,,,,-7%,,,,,,Alle nuværende hårde hvidevarer i lejligheden ...,,,2018.0,
1,"Holger Danskes Vej 14, 3. th.",46.0,Frederiksberg,Ejerlejlighed,False,55.686615,12.538356,2000,2.145.000,1740000062,,,150 m,150 m,100 m,,,,,1.0,2.0,1.0,,,,,,,,,,,,,,,46 m2,,8.313 / 7.331 kr.,,1885.0,1.100.000,D,3.0,,,,,,,145.0,,,,,,,,,,,,,,,,,,,-6%,,,,,,Intuition gaskomfur komfur - Matsui Fridge Do...,,,2018.0,
11,"Lyøvej 5, st.. tv.",60.0,Frederiksberg,Ejerlejlighed,False,55.68294,12.524527,2000,2.875.000,1300000128,,,650 m,100 m,550 m,,,No,,1.0,2.0,1.0,,,,,,,,,,,,,,,77 m2,,10.446 / 9.212 kr.,,1988.0,1.600.000,C,3.0,,,,,,,259.1,,,,,,,,,,,,,,,,,,,-4%,,,,,,emhætte - bordkomfur glaskeramisk - indbygni...,,,2018.0,
12,"H. Schneekloths Vej 13, 5. th.",56.0,Frederiksberg,Ejerlejlighed,False,55.679928,12.506927,2000,2.750.000,130D01015,,,50 m,150 m,550 m,,,Yes,,1.0,2.0,1.0,,,,,,,,,,,,,,,74 m2,,10.075 / 8.884 kr.,,1972.0,1.350.000,D,2.0,,,,,,,164.3,,,,,,,,,,,,,,,,,,,0%,,,,,,De i lejligheden hårde hvidevarer medfølger i ...,,,2018.0,
30,"Howitzvej 61, 3. th.",67.0,Frederiksberg,Ejerlejlighed,True,55.680209,12.523998,2000,3.195.000,1300000176,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
