Purpose

The purpose of this data science project is to build a pricing model for the rents charged for German apartments. The project aims to predict rent from the facilities and properties of each individual listing. The model may be useful to companies or individuals looking to set the rent for their apartments.

Imports

In [289]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
from pathlib import Path
import requests
import pandas_profiling
from pandas_profiling.utils.cache import cache_file



Data Wrangling Objectives

Fundamental questions to resolve in this notebook before moving on.
Is this the right type of data that should be used to model rent prices?
    Has the required target feature been identified?
    Are there potentially useful features?
Are there fundamental issues with the data?

Loading the raw data

In [290]:
immo_data= pd.read_csv('../data/raw/immo_data.csv')

Data Definition

The features of an apartment (the entity) have to be defined in sufficient detail for further analysis. In doing so, we gain an understanding of each feature's relevance to the apartment records. We also identify our feature of interest, the target feature, which in our case is the rent charged for each apartment, and start to get a sense of how the other features relate to it. It is also important to represent the features in the right data format for further processing and to take note of features that have limited values.

We start inspecting the dataframe by reviewing summary-level information about the features and viewing the first few entries.

The pandas-profiling report gives us detailed summary-level information about the data set.

In [291]:
%load_ext autoreload
%autoreload 2



In [292]:
report = immo_data.profile_report(sort='None', html={'style':{'full_width': True}}, progress_bar=False)


The info method gives a sense of the shape and quality of the dataframe.

In [293]:
immo_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 268850 entries, 0 to 268849
Data columns (total 49 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   regio1                    268850 non-null  object 
 1   serviceCharge             261941 non-null  float64
 2   heatingType               223994 non-null  object 
 3   telekomTvOffer            236231 non-null  object 
 4   telekomHybridUploadSpeed  45020 non-null   float64
 5   newlyConst                268850 non-null  bool   
 6   balcony                   268850 non-null  bool   
 7   picturecount              268850 non-null  int64  
 8   pricetrend                267018 non-null  float64
 9   telekomUploadSpeed        235492 non-null  float64
 10  totalRent                 228333 non-null  float64
 11  yearConstructed           211805 non-null  float64
 12  scoutId                   268850 non-null  int64  
 13  noParkSpaces              93052 non-null   f

We observe that the dataframe has 49 features describing 268,850 apartment listings put up for rent. The features fall into four dtypes: 6 boolean, 18 float, 6 integer and 19 object. We need to verify that these dtypes are appropriate for our goal of building a rent pricing model. Completeness also varies widely from feature to feature, as the non-null counts show, and we will need to deal with the features that have many empty values. Finally, the dataframe uses 89.7+ MB of memory. This size is manageable on the current machine; a larger dataframe would require more memory, possibly through cloud computing.
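The dtype breakdown and the memory figure can also be checked directly. The sketch below uses a hypothetical four-column toy frame rather than immo_data. The "+" after 89.7 MB in the info() output means pandas did not measure the strings held in object columns; memory_usage(deep=True) includes them, so on the real data it would likely report a larger figure.

```python
import pandas as pd

# Hypothetical toy frame standing in for immo_data (not the real listings)
df = pd.DataFrame({
    "regio1": pd.Series(["Sachsen", "Bremen", "Sachsen"], dtype=object),
    "balcony": [True, False, True],
    "baseRent": [595.0, 800.0, 965.0],
    "picturecount": [6, 8, 19],
})

# Number of columns per dtype (the same tally as the "dtypes:" line of info())
dtype_counts = df.dtypes.astype(str).value_counts()
print(dtype_counts)

# Shallow usage is what info() reports by default; deep=True also measures
# the Python string objects held in object columns
shallow_bytes = df.memory_usage(index=True).sum()
deep_bytes = df.memory_usage(index=True, deep=True).sum()
print(shallow_bytes, deep_bytes)
```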

In [294]:
immo_data.head()

Unnamed: 0,regio1,serviceCharge,heatingType,telekomTvOffer,telekomHybridUploadSpeed,newlyConst,balcony,picturecount,pricetrend,telekomUploadSpeed,totalRent,yearConstructed,scoutId,noParkSpaces,firingTypes,hasKitchen,geo_bln,cellar,yearConstructedRange,baseRent,houseNumber,livingSpace,geo_krs,condition,interiorQual,petsAllowed,street,streetPlain,lift,baseRentRange,typeOfFlat,geo_plz,noRooms,thermalChar,floor,numberOfFloors,noRoomsRange,garden,livingSpaceRange,regio2,regio3,description,facilities,heatingCosts,energyEfficiencyClass,lastRefurbish,electricityBasePrice,electricityKwhPrice,date
0,Nordrhein_Westfalen,245.0,central_heating,ONE_YEAR_FREE,,False,False,6,4.62,10.0,840.0,1965.0,96107057,1.0,oil,False,Nordrhein_Westfalen,True,2.0,595.0,244.0,86.0,Dortmund,well_kept,normal,,Sch&uuml;ruferstra&szlig;e,Schüruferstraße,False,4,ground_floor,44269,4.0,181.4,1.0,3.0,4,True,4,Dortmund,Schüren,Die ebenerdig zu erreichende Erdgeschosswohnun...,Die Wohnung ist mit Laminat ausgelegt. Das Bad...,,,,,,May19
1,Rheinland_Pfalz,134.0,self_contained_central_heating,ONE_YEAR_FREE,,False,True,8,3.47,10.0,,1871.0,111378734,2.0,gas,False,Rheinland_Pfalz,False,1.0,800.0,,89.0,Rhein_Pfalz_Kreis,refurbished,normal,no,no_information,,False,5,ground_floor,67459,3.0,,,,3,False,4,Rhein_Pfalz_Kreis,Böhl_Iggelheim,Alles neu macht der Mai – so kann es auch für ...,,,,2019.0,,,May19
2,Sachsen,255.0,floor_heating,ONE_YEAR_FREE,10.0,True,True,8,2.72,2.4,1300.0,2019.0,113147523,1.0,,False,Sachsen,True,9.0,965.0,4.0,83.8,Dresden,first_time_use,sophisticated,,Turnerweg,Turnerweg,True,6,apartment,1097,3.0,,3.0,4.0,3,False,4,Dresden,Äußere_Neustadt_Antonstadt,Der Neubau entsteht im Herzen der Dresdner Neu...,"* 9 m² Balkon\n* Bad mit bodengleicher Dusche,...",,,,,,Oct19
3,Sachsen,58.15,district_heating,ONE_YEAR_FREE,,False,True,9,1.53,40.0,,1964.0,108890903,,district_heating,False,Sachsen,False,2.0,343.0,35.0,58.15,Mittelsachsen_Kreis,,,,Gl&uuml;ck-Auf-Stra&szlig;e,Glück-Auf-Straße,False,2,other,9599,3.0,86.0,3.0,,3,False,2,Mittelsachsen_Kreis,Freiberg,Abseits von Lärm und Abgasen in Ihre neue Wohn...,,87.23,,,,,May19
4,Bremen,138.0,self_contained_central_heating,,,False,True,19,2.46,,903.0,1950.0,114751222,,gas,False,Bremen,False,1.0,765.0,10.0,84.97,Bremen,refurbished,,,Hermann-Henrich-Meier-Allee,Hermann-Henrich-Meier-Allee,False,5,apartment,28213,3.0,188.9,1.0,,3,False,4,Bremen,Neu_Schwachhausen,Es handelt sich hier um ein saniertes Mehrfami...,Diese Wohnung wurde neu saniert und ist wie fo...,,,,,,Feb20


The head method alone shows some, but not all, of the features of the dataframe. Setting pd.options.display.max_columns to None lets us view all of them.
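Assigning pd.options.display.max_columns changes the setting for the rest of the session. A scoped alternative is pd.option_context, sketched below on a hypothetical 30-column frame: the option applies only inside the with-block and reverts automatically afterwards.

```python
import pandas as pd

# Hypothetical wide frame with 30 columns, standing in for immo_data
wide = pd.DataFrame({f"col{i}": [i, i + 1] for i in range(30)})

before = pd.get_option("display.max_columns")

# Inside the block every column is rendered; outside, the previous setting applies
with pd.option_context("display.max_columns", None):
    inside = pd.get_option("display.max_columns")
    print(wide.head())

after = pd.get_option("display.max_columns")
```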

In [295]:
pd.options.display.max_columns = None
immo_data.head()

Unnamed: 0,regio1,serviceCharge,heatingType,telekomTvOffer,telekomHybridUploadSpeed,newlyConst,balcony,picturecount,pricetrend,telekomUploadSpeed,totalRent,yearConstructed,scoutId,noParkSpaces,firingTypes,hasKitchen,geo_bln,cellar,yearConstructedRange,baseRent,houseNumber,livingSpace,geo_krs,condition,interiorQual,petsAllowed,street,streetPlain,lift,baseRentRange,typeOfFlat,geo_plz,noRooms,thermalChar,floor,numberOfFloors,noRoomsRange,garden,livingSpaceRange,regio2,regio3,description,facilities,heatingCosts,energyEfficiencyClass,lastRefurbish,electricityBasePrice,electricityKwhPrice,date
0,Nordrhein_Westfalen,245.0,central_heating,ONE_YEAR_FREE,,False,False,6,4.62,10.0,840.0,1965.0,96107057,1.0,oil,False,Nordrhein_Westfalen,True,2.0,595.0,244.0,86.0,Dortmund,well_kept,normal,,Sch&uuml;ruferstra&szlig;e,Schüruferstraße,False,4,ground_floor,44269,4.0,181.4,1.0,3.0,4,True,4,Dortmund,Schüren,Die ebenerdig zu erreichende Erdgeschosswohnun...,Die Wohnung ist mit Laminat ausgelegt. Das Bad...,,,,,,May19
1,Rheinland_Pfalz,134.0,self_contained_central_heating,ONE_YEAR_FREE,,False,True,8,3.47,10.0,,1871.0,111378734,2.0,gas,False,Rheinland_Pfalz,False,1.0,800.0,,89.0,Rhein_Pfalz_Kreis,refurbished,normal,no,no_information,,False,5,ground_floor,67459,3.0,,,,3,False,4,Rhein_Pfalz_Kreis,Böhl_Iggelheim,Alles neu macht der Mai – so kann es auch für ...,,,,2019.0,,,May19
2,Sachsen,255.0,floor_heating,ONE_YEAR_FREE,10.0,True,True,8,2.72,2.4,1300.0,2019.0,113147523,1.0,,False,Sachsen,True,9.0,965.0,4.0,83.8,Dresden,first_time_use,sophisticated,,Turnerweg,Turnerweg,True,6,apartment,1097,3.0,,3.0,4.0,3,False,4,Dresden,Äußere_Neustadt_Antonstadt,Der Neubau entsteht im Herzen der Dresdner Neu...,"* 9 m² Balkon\n* Bad mit bodengleicher Dusche,...",,,,,,Oct19
3,Sachsen,58.15,district_heating,ONE_YEAR_FREE,,False,True,9,1.53,40.0,,1964.0,108890903,,district_heating,False,Sachsen,False,2.0,343.0,35.0,58.15,Mittelsachsen_Kreis,,,,Gl&uuml;ck-Auf-Stra&szlig;e,Glück-Auf-Straße,False,2,other,9599,3.0,86.0,3.0,,3,False,2,Mittelsachsen_Kreis,Freiberg,Abseits von Lärm und Abgasen in Ihre neue Wohn...,,87.23,,,,,May19
4,Bremen,138.0,self_contained_central_heating,,,False,True,19,2.46,,903.0,1950.0,114751222,,gas,False,Bremen,False,1.0,765.0,10.0,84.97,Bremen,refurbished,,,Hermann-Henrich-Meier-Allee,Hermann-Henrich-Meier-Allee,False,5,apartment,28213,3.0,188.9,1.0,,3,False,4,Bremen,Neu_Schwachhausen,Es handelt sich hier um ein saniertes Mehrfami...,Diese Wohnung wurde neu saniert und ist wie fo...,,,,,,Feb20


We observe a variety of prices, costs and factors that may be used to determine rent. These include serviceCharge, pricetrend, totalRent, baseRent, baseRentRange, heatingCosts, electricityBasePrice and electricityKwhPrice. Regional information also appears to be repeated across several features. There is also a good amount of German text data (for example in description and facilities) that requires translation into English. Let's take note of the different features of the dataframe.

Explore the Data

First, we need a way to identify each record uniquely. We would prefer not to use the address information captured in the street and streetPlain features, since this involves two features, both of object (text) dtype, which makes processing difficult. However, the scoutId feature may work. For that, it must have a unique value for each of the 268,850 apartment records.

In [296]:
immo_data["scoutId"].nunique()

268850

Perfect! Each scoutId value is distinct and exists for all 268,850 records. 
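The same check can be written as explicit boolean tests, and set_index(..., verify_integrity=True) raises a ValueError if a key is duplicated, which makes the uniqueness assumption fail loudly if it ever breaks. The three-row frame below is a hypothetical sample, not the real data; the notebook itself keeps scoutId as a regular column rather than the index.

```python
import pandas as pd

# Hypothetical sample of listings
listings = pd.DataFrame({"scoutId": [96107057, 111378734, 113147523],
                         "baseRent": [595.0, 800.0, 965.0]})

# A column is a valid record key only if it is complete and has no repeated values
id_is_key = listings["scoutId"].notna().all() and listings["scoutId"].is_unique

# verify_integrity=True makes set_index raise on duplicate keys
indexed = listings.set_index("scoutId", verify_integrity=True)

# Demonstrate the failure mode by appending a copy of the first row
dup = pd.concat([listings, listings.iloc[[0]]])
try:
    dup.set_index("scoutId", verify_integrity=True)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```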

Let's now move scoutId to the first column as a unique identifier for each record. 

In [297]:
cols = list(immo_data) #Get a list of column names for the dataframe
#Get the list index of scoutId using list index method, pop it from the list using the list pop method, and insert the value (scoutId) as the first list item using the list insert method
cols.insert(0,cols.pop(cols.index("scoutId")))
#Use the loc indexer to call the dataframe with the new column arrangement and assign to the original dataframe
immo_data = immo_data.loc[:, cols]
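A shorter way to move one column to the front is to combine DataFrame.pop, which removes a column and returns it, with DataFrame.insert, which places it at a given position. Both modify the frame in place, so no reassignment is needed. The toy frame below is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with scoutId as the last column
df = pd.DataFrame({"regio1": ["Sachsen"],
                   "serviceCharge": [255.0],
                   "scoutId": [113147523]})

# Remove scoutId and re-insert it at position 0, all in place
df.insert(0, "scoutId", df.pop("scoutId"))
print(list(df.columns))
```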

In [298]:
pd.options.display.max_columns = 6
immo_data.head(2)

Unnamed: 0,scoutId,regio1,serviceCharge,...,electricityBasePrice,electricityKwhPrice,date
0,96107057,Nordrhein_Westfalen,245.0,...,,,May19
1,111378734,Rheinland_Pfalz,134.0,...,,,May19


scoutId is now the first column, identifying each apartment record.

In [299]:
pd.options.display.max_columns = None
immo_data.head()

Unnamed: 0,scoutId,regio1,serviceCharge,heatingType,telekomTvOffer,telekomHybridUploadSpeed,newlyConst,balcony,picturecount,pricetrend,telekomUploadSpeed,totalRent,yearConstructed,noParkSpaces,firingTypes,hasKitchen,geo_bln,cellar,yearConstructedRange,baseRent,houseNumber,livingSpace,geo_krs,condition,interiorQual,petsAllowed,street,streetPlain,lift,baseRentRange,typeOfFlat,geo_plz,noRooms,thermalChar,floor,numberOfFloors,noRoomsRange,garden,livingSpaceRange,regio2,regio3,description,facilities,heatingCosts,energyEfficiencyClass,lastRefurbish,electricityBasePrice,electricityKwhPrice,date
0,96107057,Nordrhein_Westfalen,245.0,central_heating,ONE_YEAR_FREE,,False,False,6,4.62,10.0,840.0,1965.0,1.0,oil,False,Nordrhein_Westfalen,True,2.0,595.0,244.0,86.0,Dortmund,well_kept,normal,,Sch&uuml;ruferstra&szlig;e,Schüruferstraße,False,4,ground_floor,44269,4.0,181.4,1.0,3.0,4,True,4,Dortmund,Schüren,Die ebenerdig zu erreichende Erdgeschosswohnun...,Die Wohnung ist mit Laminat ausgelegt. Das Bad...,,,,,,May19
1,111378734,Rheinland_Pfalz,134.0,self_contained_central_heating,ONE_YEAR_FREE,,False,True,8,3.47,10.0,,1871.0,2.0,gas,False,Rheinland_Pfalz,False,1.0,800.0,,89.0,Rhein_Pfalz_Kreis,refurbished,normal,no,no_information,,False,5,ground_floor,67459,3.0,,,,3,False,4,Rhein_Pfalz_Kreis,Böhl_Iggelheim,Alles neu macht der Mai – so kann es auch für ...,,,,2019.0,,,May19
2,113147523,Sachsen,255.0,floor_heating,ONE_YEAR_FREE,10.0,True,True,8,2.72,2.4,1300.0,2019.0,1.0,,False,Sachsen,True,9.0,965.0,4.0,83.8,Dresden,first_time_use,sophisticated,,Turnerweg,Turnerweg,True,6,apartment,1097,3.0,,3.0,4.0,3,False,4,Dresden,Äußere_Neustadt_Antonstadt,Der Neubau entsteht im Herzen der Dresdner Neu...,"* 9 m² Balkon\n* Bad mit bodengleicher Dusche,...",,,,,,Oct19
3,108890903,Sachsen,58.15,district_heating,ONE_YEAR_FREE,,False,True,9,1.53,40.0,,1964.0,,district_heating,False,Sachsen,False,2.0,343.0,35.0,58.15,Mittelsachsen_Kreis,,,,Gl&uuml;ck-Auf-Stra&szlig;e,Glück-Auf-Straße,False,2,other,9599,3.0,86.0,3.0,,3,False,2,Mittelsachsen_Kreis,Freiberg,Abseits von Lärm und Abgasen in Ihre neue Wohn...,,87.23,,,,,May19
4,114751222,Bremen,138.0,self_contained_central_heating,,,False,True,19,2.46,,903.0,1950.0,,gas,False,Bremen,False,1.0,765.0,10.0,84.97,Bremen,refurbished,,,Hermann-Henrich-Meier-Allee,Hermann-Henrich-Meier-Allee,False,5,apartment,28213,3.0,188.9,1.0,,3,False,4,Bremen,Neu_Schwachhausen,Es handelt sich hier um ein saniertes Mehrfami...,Diese Wohnung wurde neu saniert und ist wie fo...,,,,,,Feb20


Number of missing values per feature

In [300]:
# Build a dataframe with the count and percentage of missing values for each feature
missing = pd.concat([immo_data.isnull().sum(), 100 * immo_data.isnull().mean()], axis = 1)
missing.columns = ['count', '%']
missing.sort_values(by = 'count').T

Unnamed: 0,scoutId,regio3,regio2,livingSpaceRange,garden,noRoomsRange,noRooms,geo_plz,baseRentRange,lift,street,geo_krs,livingSpace,baseRent,cellar,geo_bln,date,newlyConst,regio1,hasKitchen,picturecount,balcony,pricetrend,serviceCharge,description,telekomTvOffer,telekomUploadSpeed,typeOfFlat,totalRent,heatingType,floor,facilities,firingTypes,yearConstructed,yearConstructedRange,condition,streetPlain,houseNumber,numberOfFloors,thermalChar,interiorQual,petsAllowed,noParkSpaces,heatingCosts,lastRefurbish,energyEfficiencyClass,electricityKwhPrice,electricityBasePrice,telekomHybridUploadSpeed
count,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1832.0,6909.0,19747.0,32619.0,33358.0,36614.0,40517.0,44856.0,51309.0,52924.0,56964.0,57045.0,57045.0,68489.0,71013.0,71018.0,97732.0,106506.0,112665.0,114573.0,175798.0,183332.0,188139.0,191063.0,222004.0,222004.0,223830.0
%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.681421,2.569834,7.344988,12.132788,12.407662,13.618747,15.070485,16.684397,19.08462,19.685326,21.188023,21.218151,21.218151,25.4748,26.413614,26.415473,36.351869,39.615399,41.906267,42.615957,65.388879,68.191185,69.979171,71.066766,82.575414,82.575414,83.254603


We notice that 27 features have missing values, 7 of which are missing more than 50% of their values. We also notice that baseRent has no missing values, whereas totalRent is missing just over 15% of its values. We need to decide whether to use baseRent rather than totalRent as our target feature.
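The features above a missing-value threshold can be listed directly from the same percentages. This is a minimal sketch on a hypothetical toy frame; the 50% cut-off mirrors the observation above and is a judgment call, and the sketch only lists candidates rather than dropping them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names borrowed from immo_data for readability
df = pd.DataFrame({
    "baseRent": [595.0, 800.0, 965.0, 343.0],
    "totalRent": [840.0, np.nan, 1300.0, np.nan],
    "heatingCosts": [np.nan, np.nan, np.nan, 87.23],
})

# Percentage of missing values per column, as in the `missing` table
missing_pct = df.isnull().mean() * 100

# Columns missing strictly more than the threshold (assumed 50%)
THRESHOLD = 50.0
mostly_missing = missing_pct[missing_pct > THRESHOLD].index.tolist()
print(mostly_missing)
```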

Categorical Features

We need to treat categorical features differently from numeric features. We begin by identifying the categorical features, starting with those of object dtype.
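Object columns are only one kind of categorical candidate. As a hedged sketch, select_dtypes can also pick up bool and pandas category columns in a single call; the toy frame below is hypothetical, and whether boolean features should be treated as categorical is a modelling choice.

```python
import pandas as pd

# Hypothetical toy frame mixing dtypes
df = pd.DataFrame({
    "regio1": pd.Series(["Sachsen", "Bremen"], dtype=object),
    "balcony": [True, False],
    "interiorQual": pd.Categorical(["normal", "luxury"]),
    "baseRent": [965.0, 765.0],
})

# Only the object columns (what immo_cat selects)
object_cols = df.select_dtypes(include="object").columns.tolist()

# A broader net: object, category and bool columns, in their original order
categorical_cols = df.select_dtypes(include=["object", "category", "bool"]).columns.tolist()
print(object_cols, categorical_cols)
```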

In [301]:
immo_cat = immo_data.select_dtypes('object')

In [302]:
immo_cat.head()

Unnamed: 0,regio1,heatingType,telekomTvOffer,firingTypes,geo_bln,houseNumber,geo_krs,condition,interiorQual,petsAllowed,street,streetPlain,typeOfFlat,regio2,regio3,description,facilities,energyEfficiencyClass,date
0,Nordrhein_Westfalen,central_heating,ONE_YEAR_FREE,oil,Nordrhein_Westfalen,244.0,Dortmund,well_kept,normal,,Sch&uuml;ruferstra&szlig;e,Schüruferstraße,ground_floor,Dortmund,Schüren,Die ebenerdig zu erreichende Erdgeschosswohnun...,Die Wohnung ist mit Laminat ausgelegt. Das Bad...,,May19
1,Rheinland_Pfalz,self_contained_central_heating,ONE_YEAR_FREE,gas,Rheinland_Pfalz,,Rhein_Pfalz_Kreis,refurbished,normal,no,no_information,,ground_floor,Rhein_Pfalz_Kreis,Böhl_Iggelheim,Alles neu macht der Mai – so kann es auch für ...,,,May19
2,Sachsen,floor_heating,ONE_YEAR_FREE,,Sachsen,4.0,Dresden,first_time_use,sophisticated,,Turnerweg,Turnerweg,apartment,Dresden,Äußere_Neustadt_Antonstadt,Der Neubau entsteht im Herzen der Dresdner Neu...,"* 9 m² Balkon\n* Bad mit bodengleicher Dusche,...",,Oct19
3,Sachsen,district_heating,ONE_YEAR_FREE,district_heating,Sachsen,35.0,Mittelsachsen_Kreis,,,,Gl&uuml;ck-Auf-Stra&szlig;e,Glück-Auf-Straße,other,Mittelsachsen_Kreis,Freiberg,Abseits von Lärm und Abgasen in Ihre neue Wohn...,,,May19
4,Bremen,self_contained_central_heating,,gas,Bremen,10.0,Bremen,refurbished,,,Hermann-Henrich-Meier-Allee,Hermann-Henrich-Meier-Allee,apartment,Bremen,Neu_Schwachhausen,Es handelt sich hier um ein saniertes Mehrfami...,Diese Wohnung wurde neu saniert und ist wie fo...,,Feb20


In [303]:
immo_cat.shape

(268850, 19)

We have 19 features of object dtype.

In [304]:
missing_cat = pd.concat([immo_cat.isnull().sum(), 100 * immo_cat.isnull().mean()], axis = 1)
missing_cat.columns = ['count', '%']
missing_cat.sort_values(by = 'count').T

Unnamed: 0,regio1,regio3,regio2,street,geo_krs,date,geo_bln,description,telekomTvOffer,typeOfFlat,heatingType,facilities,firingTypes,condition,streetPlain,houseNumber,interiorQual,petsAllowed,energyEfficiencyClass
count,0.0,0.0,0.0,0.0,0.0,0.0,0.0,19747.0,32619.0,36614.0,44856.0,52924.0,56964.0,68489.0,71013.0,71018.0,112665.0,114573.0,191063.0
%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.344988,12.132788,13.618747,16.684397,19.685326,21.188023,25.4748,26.413614,26.415473,41.906267,42.615957,71.066766


Let's examine the features with no missing values.

In [305]:
complete_cat = immo_cat.loc[:, immo_cat.isnull().sum() == 0]

In [306]:
complete_cat.head()

Unnamed: 0,regio1,geo_bln,geo_krs,street,regio2,regio3,date
0,Nordrhein_Westfalen,Nordrhein_Westfalen,Dortmund,Sch&uuml;ruferstra&szlig;e,Dortmund,Schüren,May19
1,Rheinland_Pfalz,Rheinland_Pfalz,Rhein_Pfalz_Kreis,no_information,Rhein_Pfalz_Kreis,Böhl_Iggelheim,May19
2,Sachsen,Sachsen,Dresden,Turnerweg,Dresden,Äußere_Neustadt_Antonstadt,Oct19
3,Sachsen,Sachsen,Mittelsachsen_Kreis,Gl&uuml;ck-Auf-Stra&szlig;e,Mittelsachsen_Kreis,Freiberg,May19
4,Bremen,Bremen,Bremen,Hermann-Henrich-Meier-Allee,Bremen,Neu_Schwachhausen,Feb20


The data owner noted that regio1 and geo_bln contain the same values, as do geo_krs and regio2. We can test this and drop the redundant feature in each pair.

In [307]:
(complete_cat.regio1 != complete_cat.geo_bln).value_counts() # Number of records where regio1 is not equal to geo_bln

False    268850
dtype: int64

No record has a regio1 value different from its geo_bln value, so the two features are identical. We will drop geo_bln.

In [308]:
(complete_cat.geo_krs != complete_cat.regio2).value_counts() # Number of records where geo_krs and regio2 are not equal

False    268850
dtype: int64

Likewise, regio2 and geo_krs are identical in every record. We will drop geo_krs.
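The pairwise test can be generalised to scan every pair of columns for exact duplicates. This is a sketch with a hypothetical helper, duplicate_column_pairs, run on toy data; it uses Series.equals, which treats missing values in the same position as equal (unlike the != operator). The number of pairs grows quadratically, but for 19 object columns that is only 171 comparisons.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy frame mimicking the regional columns
df = pd.DataFrame({
    "regio1": ["Sachsen", "Bremen", "Sachsen"],
    "geo_bln": ["Sachsen", "Bremen", "Sachsen"],
    "regio2": ["Dresden", "Bremen", "Mittelsachsen_Kreis"],
    "geo_krs": ["Dresden", "Bremen", "Mittelsachsen_Kreis"],
    "regio3": ["Äußere_Neustadt_Antonstadt", "Neu_Schwachhausen", "Freiberg"],
})

def duplicate_column_pairs(frame):
    """Return (a, b) pairs of columns whose values match in every row."""
    return [(a, b) for a, b in combinations(frame.columns, 2)
            if frame[a].equals(frame[b])]

pairs = duplicate_column_pairs(df)
print(pairs)
```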

In [309]:
complete_cat = complete_cat.drop(columns = ['geo_bln', 'geo_krs']) # Reassign instead of using inplace=True: complete_cat is a slice of immo_cat, so dropping in place triggers a SettingWithCopyWarning


In [310]:
complete_cat.columns #Verifying that geo_bln and geo_krs are gone from the complete_cat dataframe

Index(['regio1', 'street', 'regio2', 'regio3', 'date'], dtype='object')

In [311]:
immo_data.drop(columns = ['geo_bln', 'geo_krs'], axis = 1, inplace = True)

In [312]:
immo_data.select_dtypes("object").columns #Verifying that the geo_bln and geo_krs features are gone from the immo_data dataframe

Index(['regio1', 'heatingType', 'telekomTvOffer', 'firingTypes', 'houseNumber',
       'condition', 'interiorQual', 'petsAllowed', 'street', 'streetPlain',
       'typeOfFlat', 'regio2', 'regio3', 'description', 'facilities',
       'energyEfficiencyClass', 'date'],
      dtype='object')

In [313]:
complete_cat.head()

Unnamed: 0,regio1,street,regio2,regio3,date
0,Nordrhein_Westfalen,Sch&uuml;ruferstra&szlig;e,Dortmund,Schüren,May19
1,Rheinland_Pfalz,no_information,Rhein_Pfalz_Kreis,Böhl_Iggelheim,May19
2,Sachsen,Turnerweg,Dresden,Äußere_Neustadt_Antonstadt,Oct19
3,Sachsen,Gl&uuml;ck-Auf-Stra&szlig;e,Mittelsachsen_Kreis,Freiberg,May19
4,Bremen,Hermann-Henrich-Meier-Allee,Bremen,Neu_Schwachhausen,Feb20


Let's now examine the features with incomplete values

In [314]:
incomplete_cat = immo_cat.loc[:, immo_cat.isnull().sum() > 0]

In [315]:
incomplete_cat.columns

Index(['heatingType', 'telekomTvOffer', 'firingTypes', 'houseNumber',
       'condition', 'interiorQual', 'petsAllowed', 'streetPlain', 'typeOfFlat',
       'description', 'facilities', 'energyEfficiencyClass'],
      dtype='object')

In [316]:
incomplete_cat.head()

Unnamed: 0,heatingType,telekomTvOffer,firingTypes,houseNumber,condition,interiorQual,petsAllowed,streetPlain,typeOfFlat,description,facilities,energyEfficiencyClass
0,central_heating,ONE_YEAR_FREE,oil,244.0,well_kept,normal,,Schüruferstraße,ground_floor,Die ebenerdig zu erreichende Erdgeschosswohnun...,Die Wohnung ist mit Laminat ausgelegt. Das Bad...,
1,self_contained_central_heating,ONE_YEAR_FREE,gas,,refurbished,normal,no,,ground_floor,Alles neu macht der Mai – so kann es auch für ...,,
2,floor_heating,ONE_YEAR_FREE,,4.0,first_time_use,sophisticated,,Turnerweg,apartment,Der Neubau entsteht im Herzen der Dresdner Neu...,"* 9 m² Balkon\n* Bad mit bodengleicher Dusche,...",
3,district_heating,ONE_YEAR_FREE,district_heating,35.0,,,,Glück-Auf-Straße,other,Abseits von Lärm und Abgasen in Ihre neue Wohn...,,
4,self_contained_central_heating,,gas,10.0,refurbished,,,Hermann-Henrich-Meier-Allee,apartment,Es handelt sich hier um ein saniertes Mehrfami...,Diese Wohnung wurde neu saniert und ist wie fo...,


Let's review the number of unique values associated with the features

In [317]:
incomplete_cat.nunique()

heatingType                  13
telekomTvOffer                3
firingTypes                 132
houseNumber                5510
condition                    10
interiorQual                  4
petsAllowed                   3
streetPlain               54490
typeOfFlat                   10
description              212621
facilities               189526
energyEfficiencyClass        10
dtype: int64

Let's now get a sense of the values of the features with fewer than 15 distinct values.
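The cardinality split carried out in the next few cells can be wrapped in one function. This is a sketch with a hypothetical helper, split_by_cardinality, on toy data. Using < for one group and >= for the other guarantees that every column lands in exactly one group; nunique() ignores missing values by default.

```python
import pandas as pd

# Hypothetical toy frame: one low-cardinality and one high-cardinality column
df = pd.DataFrame({
    "interiorQual": ["normal", "luxury", "normal", None],
    "houseNumber": ["244", "4", "35", "10"],
})

def split_by_cardinality(frame, max_levels=15):
    """Split columns into (low, high) lists by their number of distinct values."""
    counts = frame.nunique()
    low = counts[counts < max_levels].index.tolist()
    high = counts[counts >= max_levels].index.tolist()
    return low, high

# A small threshold so the toy example produces both groups
low, high = split_by_cardinality(df, max_levels=3)
print(low, high)
```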

In [318]:
distinct = incomplete_cat.nunique() # Array of the number of distinct values

In [319]:
distinct_15 = distinct[distinct < 15] # Features with fewer than 15 distinct values

In [320]:
distinct_15

heatingType              13
telekomTvOffer            3
condition                10
interiorQual              4
petsAllowed               3
typeOfFlat               10
energyEfficiencyClass    10
dtype: int64

In [321]:
index_15 = distinct_15.index # Column names of the features with fewer than 15 distinct values

In [322]:
[[i, incomplete_cat[i].unique()] for i in index_15] # List that provides the feature and the names of unique values for the feature

[['heatingType',
  array(['central_heating', 'self_contained_central_heating',
         'floor_heating', 'district_heating', 'gas_heating', 'oil_heating',
         nan, 'wood_pellet_heating', 'electric_heating',
         'combined_heat_and_power_plant', 'heat_pump',
         'night_storage_heater', 'stove_heating', 'solar_heating'],
        dtype=object)],
 ['telekomTvOffer',
  array(['ONE_YEAR_FREE', nan, 'NONE', 'ON_DEMAND'], dtype=object)],
 ['condition',
  array(['well_kept', 'refurbished', 'first_time_use', nan,
         'fully_renovated', 'mint_condition',
         'first_time_use_after_refurbishment', 'modernized', 'negotiable',
         'need_of_renovation', 'ripe_for_demolition'], dtype=object)],
 ['interiorQual',
  array(['normal', 'sophisticated', nan, 'simple', 'luxury'], dtype=object)],
 ['petsAllowed', array([nan, 'no', 'negotiable', 'yes'], dtype=object)],
 ['typeOfFlat',
  array(['ground_floor', 'apartment', 'other', nan, 'roof_storey',
         'raised_ground_floor', '

Reviewing these values, we note that the missing entries are unlikely to be filled in with a single representative value. We will come back to these features later.
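One way to back up that judgment is to check how dominant the most frequent level is. If it covers only a modest share of the non-missing rows while many rows are missing, filling the gaps with the mode would distort the distribution. The series below is a hypothetical stand-in for a feature such as interiorQual.

```python
import pandas as pd

# Hypothetical stand-in for interiorQual, with missing entries
interior = pd.Series(["normal", "sophisticated", None, "normal",
                      "simple", None, "luxury"])

# Fraction of rows that are missing
missing_share = interior.isna().mean()

# value_counts drops missing values by default, so this is the share
# of the most frequent level among the non-missing rows
top_share = interior.value_counts(normalize=True).iloc[0]
top_level = interior.value_counts().idxmax()
print(top_level, top_share, missing_share)
```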

In [323]:
distinct_plus = distinct[distinct >= 15] # Features with 15 or more distinct values (the complement of distinct_15)

In [324]:
distinct_plus

firingTypes       132
houseNumber      5510
streetPlain     54490
description    212621
facilities     189526
dtype: int64

In [325]:
incomplete_cat.loc[:, distinct_plus.index].head(10)

Unnamed: 0,firingTypes,houseNumber,streetPlain,description,facilities
0,oil,244,Schüruferstraße,Die ebenerdig zu erreichende Erdgeschosswohnun...,Die Wohnung ist mit Laminat ausgelegt. Das Bad...
1,gas,,,Alles neu macht der Mai – so kann es auch für ...,
2,,4,Turnerweg,Der Neubau entsteht im Herzen der Dresdner Neu...,"* 9 m² Balkon\n* Bad mit bodengleicher Dusche,..."
3,district_heating,35,Glück-Auf-Straße,Abseits von Lärm und Abgasen in Ihre neue Wohn...,
4,gas,10,Hermann-Henrich-Meier-Allee,Es handelt sich hier um ein saniertes Mehrfami...,Diese Wohnung wurde neu saniert und ist wie fo...
5,gas,1e,Hardeseiche,,"helle ebenerdige 2 Zi. Wohnung mit Terrasse, h..."
6,,14,Am_Bahnhof,Am Bahnhof 14 in Freiberg\nHeizkosten und Warm...,
7,gas:electricity,35,Lesumer_Heerstr.,+ Komfortabler Bodenbelag: Die Wohnung ist zus...,Rollläden; Warmwasserbereiter; Kellerraum; Gas...
8,oil,,,"Diese ansprechende, lichtdurchflutete DG-Wohnu...","Parkett, Einbauküche, kein Balkon"
9,gas,30,Hüttenstr.,Sie sind auf der Suche nach einer gepflegten u...,In Ihrem neuen Zuhause können Sie nach wenigen...


In [326]:
incomplete_cat.firingTypes.unique()[:20]

array(['oil', 'gas', nan, 'district_heating', 'gas:electricity',
       'electricity', 'pellet_heating', 'natural_gas_light',
       'combined_heat_and_power_fossil_fuels',
       'district_heating:local_heating', 'steam_district_heating',
       'natural_gas_heavy', 'gas:district_heating', 'solar_heating:gas',
       'environmental_thermal_energy', 'local_heating',
       'gas:natural_gas_light', 'geothermal',
       'combined_heat_and_power_regenerative_energy', 'heat_supply'],
      dtype=object)

firingTypes is described by the data owner as the "main energy sources, separated by colon". This is somewhat similar to the heatingType feature described as the "Type of heating". We can determine how often these features have the same value.

In [327]:
(immo_data.firingTypes != immo_data.heatingType).value_counts()

True     249296
False     19554
dtype: int64

We note that firingTypes and heatingType are usually not the same. Note that != also counts a pair of missing values as unequal, so the True count includes records where either value is missing. As mentioned earlier, firingTypes lists all the fuel sources separated by a colon. We can split these fuel sources and keep the first one as the main fuel source instead of a list.
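If only the first fuel source is needed, the split can skip the intermediate 13-column frame: .str.split(":") produces a list per row and .str[0] takes its first element. Missing values stay missing, so the result should match the fuelType1 column built below. The toy series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of firingTypes values
firing = pd.Series(["oil", "gas:electricity", np.nan, "solar_heating:gas"])

# Split on the colon and keep the first fuel source of each record
primary = firing.str.split(":").str[0]
print(primary.tolist())
```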

In [328]:
split_data = immo_data["firingTypes"].str.split(":", expand = True)

In [329]:
split_data.tail(50)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12
268800,electricity,,,,,,,,,,,,
268801,gas,,,,,,,,,,,,
268802,gas,,,,,,,,,,,,
268803,gas,,,,,,,,,,,,
268804,gas,,,,,,,,,,,,
268805,natural_gas_light,,,,,,,,,,,,
268806,district_heating,,,,,,,,,,,,
268807,gas,,,,,,,,,,,,
268808,,,,,,,,,,,,,
268809,,,,,,,,,,,,,


One record lists 13 fuel sources, but most entries list only one. We'll name these columns fuelType1 to fuelType13.

In [330]:
split_data.columns = [f"fuelType{i}" for i in range(1, split_data.shape[1] + 1)] # fuelType1 ... fuelType13

In [331]:
fueltypes = [[col, split_data[col].count()] for col in split_data.columns] # Non-null count for each fuelType column

In [332]:
fueltypes

[['fuelType1', 211886],
 ['fuelType2', 3407],
 ['fuelType3', 69],
 ['fuelType4', 17],
 ['fuelType5', 4],
 ['fuelType6', 4],
 ['fuelType7', 2],
 ['fuelType8', 2],
 ['fuelType9', 2],
 ['fuelType10', 2],
 ['fuelType11', 2],
 ['fuelType12', 2],
 ['fuelType13', 1]]

fuelType1 is populated far more often than the others (211,886 records, compared with 3,407 for fuelType2), so we select it as the representative fuel for the firingTypes feature.

In [333]:
primaryFuel = split_data[["fuelType1"]]

In [334]:
immo_data.shape

(268850, 47)

In [335]:
immo_data = pd.concat([immo_data, primaryFuel], axis = 1)

In [336]:
immo_data.shape

(268850, 48)

In [337]:
immo_data.head()

Unnamed: 0,scoutId,regio1,serviceCharge,heatingType,telekomTvOffer,telekomHybridUploadSpeed,newlyConst,balcony,picturecount,pricetrend,telekomUploadSpeed,totalRent,yearConstructed,noParkSpaces,firingTypes,hasKitchen,cellar,yearConstructedRange,baseRent,houseNumber,livingSpace,condition,interiorQual,petsAllowed,street,streetPlain,lift,baseRentRange,typeOfFlat,geo_plz,noRooms,thermalChar,floor,numberOfFloors,noRoomsRange,garden,livingSpaceRange,regio2,regio3,description,facilities,heatingCosts,energyEfficiencyClass,lastRefurbish,electricityBasePrice,electricityKwhPrice,date,fuelType1
0,96107057,Nordrhein_Westfalen,245.0,central_heating,ONE_YEAR_FREE,,False,False,6,4.62,10.0,840.0,1965.0,1.0,oil,False,True,2.0,595.0,244.0,86.0,well_kept,normal,,Sch&uuml;ruferstra&szlig;e,Schüruferstraße,False,4,ground_floor,44269,4.0,181.4,1.0,3.0,4,True,4,Dortmund,Schüren,Die ebenerdig zu erreichende Erdgeschosswohnun...,Die Wohnung ist mit Laminat ausgelegt. Das Bad...,,,,,,May19,oil
1,111378734,Rheinland_Pfalz,134.0,self_contained_central_heating,ONE_YEAR_FREE,,False,True,8,3.47,10.0,,1871.0,2.0,gas,False,False,1.0,800.0,,89.0,refurbished,normal,no,no_information,,False,5,ground_floor,67459,3.0,,,,3,False,4,Rhein_Pfalz_Kreis,Böhl_Iggelheim,Alles neu macht der Mai – so kann es auch für ...,,,,2019.0,,,May19,gas
2,113147523,Sachsen,255.0,floor_heating,ONE_YEAR_FREE,10.0,True,True,8,2.72,2.4,1300.0,2019.0,1.0,,False,True,9.0,965.0,4.0,83.8,first_time_use,sophisticated,,Turnerweg,Turnerweg,True,6,apartment,1097,3.0,,3.0,4.0,3,False,4,Dresden,Äußere_Neustadt_Antonstadt,Der Neubau entsteht im Herzen der Dresdner Neu...,"* 9 m² Balkon\n* Bad mit bodengleicher Dusche,...",,,,,,Oct19,
3,108890903,Sachsen,58.15,district_heating,ONE_YEAR_FREE,,False,True,9,1.53,40.0,,1964.0,,district_heating,False,False,2.0,343.0,35.0,58.15,,,,Gl&uuml;ck-Auf-Stra&szlig;e,Glück-Auf-Straße,False,2,other,9599,3.0,86.0,3.0,,3,False,2,Mittelsachsen_Kreis,Freiberg,Abseits von Lärm und Abgasen in Ihre neue Wohn...,,87.23,,,,,May19,district_heating
4,114751222,Bremen,138.0,self_contained_central_heating,,,False,True,19,2.46,,903.0,1950.0,,gas,False,False,1.0,765.0,10.0,84.97,refurbished,,,Hermann-Henrich-Meier-Allee,Hermann-Henrich-Meier-Allee,False,5,apartment,28213,3.0,188.9,1.0,,3,False,4,Bremen,Neu_Schwachhausen,Es handelt sich hier um ein saniertes Mehrfami...,Diese Wohnung wurde neu saniert und ist wie fo...,,,,,,Feb20,gas


We can drop the original firingTypes column, since its relevant information is now captured in fuelType1.

In [338]:
immo_data.drop(columns = ["firingTypes"], axis = 1, inplace = True)

We can now rename fuelType1 as firingType

In [339]:
immo_data.rename(columns={'fuelType1':'firingType'}, inplace=True)


In [340]:
immo_data.columns

Index(['scoutId', 'regio1', 'serviceCharge', 'heatingType', 'telekomTvOffer',
       'telekomHybridUploadSpeed', 'newlyConst', 'balcony', 'picturecount',
       'pricetrend', 'telekomUploadSpeed', 'totalRent', 'yearConstructed',
       'noParkSpaces', 'hasKitchen', 'cellar', 'yearConstructedRange',
       'baseRent', 'houseNumber', 'livingSpace', 'condition', 'interiorQual',
       'petsAllowed', 'street', 'streetPlain', 'lift', 'baseRentRange',
       'typeOfFlat', 'geo_plz', 'noRooms', 'thermalChar', 'floor',
       'numberOfFloors', 'noRoomsRange', 'garden', 'livingSpaceRange',
       'regio2', 'regio3', 'description', 'facilities', 'heatingCosts',
       'energyEfficiencyClass', 'lastRefurbish', 'electricityBasePrice',
       'electricityKwhPrice', 'date', 'firingType'],
      dtype='object')

In [341]:
(immo_data.firingType != immo_data.heatingType).value_counts()

True     249238
False     19612
dtype: int64

We observe that splitting the firingTypes column and keeping the first fuel type (now firingType) increases the number of records for which heatingType is the same as firingType. However, the difference is small: only 19,612 of 268,850 records (about 7.3%) match.
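One caveat when reading these counts: in pandas, `!=` treats a missing value as unequal to everything, including another missing value, so any record where either column is NaN is counted as a mismatch. The sketch below uses a small made-up frame (not the real data) to show the effect and one way of restricting the comparison to records where both columns are populated.

```python
import numpy as np
import pandas as pd

# Toy stand-in for immo_data (made-up values, not the real dataset)
toy = pd.DataFrame({
    "heatingType": ["district_heating", "central_heating", np.nan, "solar_heating", np.nan],
    "firingType": ["district_heating", "oil", "gas", "solar_heating", np.nan],
})

# Naive comparison: NaN != anything (even NaN) evaluates to True
naive = (toy["firingType"] != toy["heatingType"]).value_counts()

# NaN-aware comparison: keep only rows where both columns have a value
both_present = toy["heatingType"].notna() & toy["firingType"].notna()
matched = toy.loc[both_present, "firingType"] == toy.loc[both_present, "heatingType"]
nan_aware = matched.value_counts()

print(naive)
print(nan_aware)
```

Here the two NaN-containing rows inflate the naive mismatch count; on the real data the same restriction would show how much of the 249,238 "mismatches" is due to missing values.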

In [348]:
immo_data.heatingType.nunique(), immo_data.firingType.nunique()

(13, 25)
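These counts are each one lower than the number of entries in the arrays printed next, because `Series.nunique()` drops missing values by default (`dropna=True`), whereas `Series.unique()` keeps NaN as an entry. A minimal illustration on made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series(["gas", "oil", np.nan, "gas"])

print(s.nunique())              # 2: NaN excluded by default
print(s.nunique(dropna=False))  # 3: NaN counted as its own value
print(len(s.unique()))          # 3: unique() keeps NaN in the returned array
```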

In [344]:
immo_data[['heatingType', 'firingType']]

Unnamed: 0,heatingType,firingType
0,central_heating,oil
1,self_contained_central_heating,gas
2,floor_heating,
3,district_heating,district_heating
4,self_contained_central_heating,gas
...,...,...
268845,heat_pump,geothermal
268846,gas_heating,gas
268847,central_heating,gas
268848,heat_pump,gas


In [349]:
heatingList = immo_data.heatingType.unique()

In [351]:
heatingList

array(['central_heating', 'self_contained_central_heating',
       'floor_heating', 'district_heating', 'gas_heating', 'oil_heating',
       nan, 'wood_pellet_heating', 'electric_heating',
       'combined_heat_and_power_plant', 'heat_pump',
       'night_storage_heater', 'stove_heating', 'solar_heating'],
      dtype=object)

In [350]:
firingList = immo_data.firingType.unique()

In [352]:
firingList

array(['oil', 'gas', nan, 'district_heating', 'electricity',
       'pellet_heating', 'natural_gas_light',
       'combined_heat_and_power_fossil_fuels', 'steam_district_heating',
       'natural_gas_heavy', 'solar_heating',
       'environmental_thermal_energy', 'local_heating', 'geothermal',
       'combined_heat_and_power_regenerative_energy', 'heat_supply',
       'liquid_gas', 'wood', 'hydro_energy',
       'combined_heat_and_power_renewable_energy', 'coal', 'bio_energy',
       'wood_chips', 'combined_heat_and_power_bio_energy', 'wind_energy',
       'coal_coke'], dtype=object)

Let's see which values these two lists have in common and which are unique to each.

In [367]:
firingSet = set(firingList)

In [368]:
heatingSet = set(heatingList)

In [369]:
firingSet.intersection(heatingSet) # Common values

{'district_heating', nan, 'solar_heating'}
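Note that nan appears as a "common value" only because Python set lookup checks object identity before equality, and both arrays contain the same NaN object. Since NaN never compares equal to itself, two distinct NaN objects would not be matched, so this behaviour is fragile. The sketch below illustrates this in plain Python and then shows one way to compare only the actual category labels by dropping missing values first; the toy columns are made up.

```python
import numpy as np
import pandas as pd

# Plain-Python behaviour of NaN inside sets
nan_a = float("nan")
nan_b = float("nan")
same_object = {nan_a} & {nan_a}       # matched: set lookup checks identity first
distinct_objects = {nan_a} & {nan_b}  # empty: different objects and NaN != NaN

# Toy columns (made-up values, not the real dataset)
heating = pd.Series(["district_heating", "central_heating", np.nan, "solar_heating"])
firing = pd.Series(["district_heating", "oil", np.nan, "solar_heating"])

# Dropping missing values first keeps NaN out of the comparison entirely
common = set(heating.dropna().unique()) & set(firing.dropna().unique())
print(common)
```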

In [374]:
heatingSet.difference(firingSet) # Values in heatingSet that are not in firingSet

{'central_heating',
 'combined_heat_and_power_plant',
 'electric_heating',
 'floor_heating',
 'gas_heating',
 'heat_pump',
 'night_storage_heater',
 'oil_heating',
 'self_contained_central_heating',
 'stove_heating',
 'wood_pellet_heating'}

In [372]:
firingSet.difference(heatingSet) # Values in firingSet that are not in heatingSet

{'bio_energy',
 'coal',
 'coal_coke',
 'combined_heat_and_power_bio_energy',
 'combined_heat_and_power_fossil_fuels',
 'combined_heat_and_power_regenerative_energy',
 'combined_heat_and_power_renewable_energy',
 'electricity',
 'environmental_thermal_energy',
 'gas',
 'geothermal',
 'heat_supply',
 'hydro_energy',
 'liquid_gas',
 'local_heating',
 'natural_gas_heavy',
 'natural_gas_light',
 'oil',
 'pellet_heating',
 'steam_district_heating',
 'wind_energy',
 'wood',
 'wood_chips'}

The values unique to firingSet are mainly energy sources, as noted earlier. Of these, environmental_thermal_energy, heat_supply and local_heating are unclear fuel types. The two shared values can also be read as fuel sources within firingType: district_heating may draw on a variety of fuel sources, while solar_heating uses energy from the sun. Apart from the unclear values above, firingType is therefore easy to interpret.

Conversely, the values unique to heatingSet are inconsistent in what the feature represents. heatingType mixes heat generation systems (i.e. combined_heat_and_power_plant, electric_heating, heat_pump, night_storage_heater), heat distribution systems (i.e. central_heating, floor_heating, self_contained_central_heating, stove_heating) and fuel types (i.e. gas_heating, oil_heating, wood_pellet_heating). Of the shared values, district_heating would be a heat generation system within heatingType, while solar_heating would be a fuel type. Because heatingType mixes categories that describe different aspects of a heating setup, it would be difficult to establish a meaningful relationship between it and our target feature.

Moreover, we still need to determine whether either feature matters in determining rent. Do landlords price apartments with renewable energy sources higher than those with non-renewable sources? We will attempt to answer this question.
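A first, rough way to approach this question is to collapse firingType into a few energy groups and compare typical rents per group. The sketch below assumes a hypothetical mapping (`ENERGY_GROUP`) and a helper `add_energy_group`, neither of which exists in the notebook; the grouping of each value is an assumption that would need review. Values whose source is unknown (e.g. district_heating, electricity, heat_supply) are deliberately labelled "unclear". The toy rents are made up.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL grouping of firingType values; each assignment is an assumption
ENERGY_GROUP = {
    **dict.fromkeys(
        ["solar_heating", "geothermal", "hydro_energy", "wind_energy",
         "bio_energy", "wood", "wood_chips", "pellet_heating",
         "combined_heat_and_power_renewable_energy",
         "combined_heat_and_power_regenerative_energy",
         "combined_heat_and_power_bio_energy"],
        "renewable"),
    **dict.fromkeys(
        ["oil", "gas", "natural_gas_light", "natural_gas_heavy", "liquid_gas",
         "coal", "coal_coke", "combined_heat_and_power_fossil_fuels"],
        "fossil"),
}


def add_energy_group(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an energyGroup column derived from firingType."""
    out = df.copy()
    out["energyGroup"] = out["firingType"].map(ENERGY_GROUP)
    # Known firingType values that are not in the mapping (e.g. district_heating)
    unmapped = out["firingType"].notna() & out["energyGroup"].isna()
    out.loc[unmapped, "energyGroup"] = "unclear"
    # Listings with no firingType at all
    out["energyGroup"] = out["energyGroup"].fillna("missing")
    return out


# Toy data with made-up rents, not values from immo_data
toy = pd.DataFrame({
    "firingType": ["gas", "oil", "geothermal", "wood", "district_heating", np.nan],
    "baseRent": [600.0, 500.0, 900.0, 700.0, 650.0, 550.0],
})

median_rent = add_energy_group(toy).groupby("energyGroup")["baseRent"].median()
print(median_rent)
```

The median is used rather than the mean because rents are typically right-skewed. Any difference between groups would only be descriptive; location, size and construction year would need to be controlled for before attributing a rent difference to the energy source.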

In [343]:
immo_data.head()

Unnamed: 0,scoutId,regio1,serviceCharge,heatingType,telekomTvOffer,telekomHybridUploadSpeed,newlyConst,balcony,picturecount,pricetrend,telekomUploadSpeed,totalRent,yearConstructed,noParkSpaces,hasKitchen,cellar,yearConstructedRange,baseRent,houseNumber,livingSpace,condition,interiorQual,petsAllowed,street,streetPlain,lift,baseRentRange,typeOfFlat,geo_plz,noRooms,thermalChar,floor,numberOfFloors,noRoomsRange,garden,livingSpaceRange,regio2,regio3,description,facilities,heatingCosts,energyEfficiencyClass,lastRefurbish,electricityBasePrice,electricityKwhPrice,date,firingType
0,96107057,Nordrhein_Westfalen,245.0,central_heating,ONE_YEAR_FREE,,False,False,6,4.62,10.0,840.0,1965.0,1.0,False,True,2.0,595.0,244.0,86.0,well_kept,normal,,Sch&uuml;ruferstra&szlig;e,Schüruferstraße,False,4,ground_floor,44269,4.0,181.4,1.0,3.0,4,True,4,Dortmund,Schüren,Die ebenerdig zu erreichende Erdgeschosswohnun...,Die Wohnung ist mit Laminat ausgelegt. Das Bad...,,,,,,May19,oil
1,111378734,Rheinland_Pfalz,134.0,self_contained_central_heating,ONE_YEAR_FREE,,False,True,8,3.47,10.0,,1871.0,2.0,False,False,1.0,800.0,,89.0,refurbished,normal,no,no_information,,False,5,ground_floor,67459,3.0,,,,3,False,4,Rhein_Pfalz_Kreis,Böhl_Iggelheim,Alles neu macht der Mai – so kann es auch für ...,,,,2019.0,,,May19,gas
2,113147523,Sachsen,255.0,floor_heating,ONE_YEAR_FREE,10.0,True,True,8,2.72,2.4,1300.0,2019.0,1.0,False,True,9.0,965.0,4.0,83.8,first_time_use,sophisticated,,Turnerweg,Turnerweg,True,6,apartment,1097,3.0,,3.0,4.0,3,False,4,Dresden,Äußere_Neustadt_Antonstadt,Der Neubau entsteht im Herzen der Dresdner Neu...,"* 9 m² Balkon\n* Bad mit bodengleicher Dusche,...",,,,,,Oct19,
3,108890903,Sachsen,58.15,district_heating,ONE_YEAR_FREE,,False,True,9,1.53,40.0,,1964.0,,False,False,2.0,343.0,35.0,58.15,,,,Gl&uuml;ck-Auf-Stra&szlig;e,Glück-Auf-Straße,False,2,other,9599,3.0,86.0,3.0,,3,False,2,Mittelsachsen_Kreis,Freiberg,Abseits von Lärm und Abgasen in Ihre neue Wohn...,,87.23,,,,,May19,district_heating
4,114751222,Bremen,138.0,self_contained_central_heating,,,False,True,19,2.46,,903.0,1950.0,,False,False,1.0,765.0,10.0,84.97,refurbished,,,Hermann-Henrich-Meier-Allee,Hermann-Henrich-Meier-Allee,False,5,apartment,28213,3.0,188.9,1.0,,3,False,4,Bremen,Neu_Schwachhausen,Es handelt sich hier um ein saniertes Mehrfami...,Diese Wohnung wurde neu saniert und ist wie fo...,,,,,,Feb20,gas


Numeric Features

Let's now examine the features that may be related to rent, our target feature.

In [30]:
# Create a list and a dataframe of the rent-related features
rentList = ['totalRent', 'baseRent', 'serviceCharge', 'heatingCosts', 'electricityBasePrice', 'electricityKwhPrice', 'baseRentRange', 'pricetrend']
rentDf = immo_data[rentList]

In [32]:
rentDf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 268850 entries, 0 to 268849
Data columns (total 8 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   totalRent             228333 non-null  float64
 1   baseRent              268850 non-null  float64
 2   serviceCharge         261941 non-null  float64
 3   heatingCosts          85518 non-null   float64
 4   electricityBasePrice  46846 non-null   float64
 5   electricityKwhPrice   46846 non-null   float64
 6   baseRentRange         268850 non-null  int64  
 7   pricetrend            267018 non-null  float64
dtypes: float64(7), int64(1)
memory usage: 16.4 MB


We notice that only baseRent and baseRentRange have values for all 268,850 records. We will determine the proportion of missing values for each feature.

In [61]:
# Build a dataframe with the count and percentage of missing values for each feature
missing_rent = pd.concat([rentDf.isnull().sum(), 100 * rentDf.isnull().mean()], axis=1)
missing_rent.columns = ['count', '%']
missing_rent.sort_values(by='count')

Unnamed: 0,count,%
baseRent,0,0.0
baseRentRange,0,0.0
pricetrend,1832,0.681421
serviceCharge,6909,2.569834
totalRent,40517,15.070485
heatingCosts,183332,68.191185
electricityBasePrice,222004,82.575414
electricityKwhPrice,222004,82.575414


We notice that electricityBasePrice and electricityKwhPrice are each missing over 82% of their values, which limits their usefulness.
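One common way to act on this is to flag features whose share of missing values exceeds a chosen cut-off. The sketch below defines a hypothetical helper, `sparse_columns`, and uses an 80% threshold purely as an illustrative assumption; the toy frame stands in for rentDf.

```python
import numpy as np
import pandas as pd


def sparse_columns(df: pd.DataFrame, threshold: float = 80.0) -> list:
    """Return the columns whose percentage of missing values exceeds `threshold`."""
    pct_missing = 100 * df.isnull().mean()
    return pct_missing[pct_missing > threshold].index.tolist()


# Toy stand-in for rentDf (made-up values)
toy = pd.DataFrame({
    "baseRent": [500.0, 600.0, 700.0, 800.0, 900.0, 1000.0],
    "heatingCosts": [50.0, np.nan, np.nan, 70.0, np.nan, 60.0],              # 50% missing
    "electricityBasePrice": [90.76, np.nan, np.nan, np.nan, np.nan, np.nan],  # ~83% missing
})

print(sparse_columns(toy))
```

With the same 80% threshold, the percentages in the table above would flag only electricityBasePrice and electricityKwhPrice in rentDf; heatingCosts (about 68% missing) would be kept.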

Let's see the number of unique values associated with these features.

In [33]:
rentDf.nunique()

totalRent               28486
baseRent                26659
serviceCharge           12266
heatingCosts             5669
electricityBasePrice        2
electricityKwhPrice        15
baseRentRange               9
pricetrend               1234
dtype: int64

In [41]:
rentDf.electricityBasePrice.unique() # Distinct values associated with the electricityBasePrice feature

array([  nan, 90.76, 71.43])

In [42]:
rentDf.electricityKwhPrice.unique() # Distinct values associated with the electricityKwhPrice feature

array([   nan, 0.1915, 0.2055, 0.1985, 0.1845, 0.2265, 0.2137, 0.2074,
       0.1775, 0.1705, 0.2125, 0.2276, 0.2144, 0.2205, 0.2132, 0.2195])

The dataset owner describes electricityBasePrice as the "monthly base price for electricity in € (deprecated since Feb 2020)" and electricityKwhPrice as the "electricity price per kwh (deprecated since Feb 2020)". 
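If these prices were deprecated in February 2020, we would expect them to be absent from listings scraped from that date on. A sketch of how this could be checked by computing, for each value of the date column, the share of listings that still report an electricity base price. It assumes the date column holds labels such as 'May19', 'Oct19' and 'Feb20' (as in the rows shown earlier); the prices in the toy frame are made up.

```python
import numpy as np
import pandas as pd

# Toy stand-in for immo_data: date labels follow the format seen in the data,
# but the prices and their placement are made up
toy = pd.DataFrame({
    "date": ["May19", "May19", "Oct19", "Oct19", "Feb20", "Feb20"],
    "electricityBasePrice": [90.76, np.nan, 71.43, 90.76, np.nan, np.nan],
})

# Share of listings per scrape date that still report an electricity base price
coverage = toy["electricityBasePrice"].notna().groupby(toy["date"]).mean()
print(coverage)
```

On the real data, a coverage of zero for 'Feb20' would be consistent with the owner's deprecation note.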

In [48]:
rentDf[rentDf['electricityBasePrice'].notna()]  # rows that report an electricity base price

Unnamed: 0,totalRent,baseRent,serviceCharge,heatingCosts,electricityBasePrice,electricityKwhPrice,baseRentRange,pricetrend
15,300.00,220.00,80.00,,90.76,0.1915,1,1.67
17,325.00,200.00,50.00,75.00,90.76,0.2055,1,1.96
30,970.00,800.00,170.00,,90.76,0.1985,5,6.49
31,445.18,315.18,130.00,,90.76,0.1845,2,3.90
41,,333.20,196.80,,90.76,0.1985,2,1.82
...,...,...,...,...,...,...,...,...
268834,610.00,440.00,170.00,,90.76,0.2055,3,1.92
268835,764.82,509.88,,,90.76,0.1985,4,3.57
268839,1980.00,1780.00,100.00,100.00,90.76,0.1985,8,7.43
268840,1479.64,1255.38,112.13,112.13,90.76,0.1985,7,6.90
