# Construction of Building Typologies from a Regional Material Catalog:
## Assessment of Urban Heat Demand and the Environmental Impact of Retrofit policies.

# (a) Data retrieval

**Supplementary material to the paper with the same name**

[M. Esteban Munoz H.](mailto:emunozh@gmail.com)

Thu Mar 12, 2015

# Table of Contents
* [1. Retrieve the data from the internet](#1.-Retrieve-the-data-from-the-internet)
* [2. Parse html data](#2.-Parse-html-data)
* [3. Material characteristics](#3.-Material-characteristics)
* [4. Store the data on a HDF5 file](#4.-Store-the-data-on-a-HDF5-file)


In this notebook we present the algorithm used to download and process the data from <a name="ref-1"/>[(Klauß, Kirchhof and Gissel, 2009)](#cite-Klauss.2009). For a description of the method behind this data set, see <a name="ref-2"/>[(Klauß, Kirchhof and Gissel, 2009b)](#cite-Klauss.2009b).

# 1. Retrieve the data from the internet

Unfortunately, the data from the regional material catalog is not available for direct download, so we have to extract the information directly from the website. First we download the raw html files containing the information on all the building components; in a second step we process each html file to extract the desired data. Because the individual links to the building components do not follow a coherent format, we request every page id in a fixed range (0 to 1499 in the loop below) and drop the empty html files. All files are downloaded from: http://altbauatlas.de/index.php?suche=1.

In [1]:
# talk to my os
import os
# used to download web pages
from urllib import request
# display html content on the ipython notebook
from IPython.display import HTML
# internal libraries
from scripts.fetchData import getData

In order to parse the html data we first download a single web page. We can then inspect the structure of this page and design an algorithm to download and process the data it contains.

In [2]:
test_url = "http://altbauatlas.de/datenblatt.php?id=187"
dat_path = os.path.join(os.getcwd(), "html/187.html")
request.urlretrieve(test_url, dat_path)

('/home/esteban/workspace/github/RegionalMaterialTypologies/html/187.html',
 <http.client.HTTPMessage at 0x7f49c3b5bb00>)

In [3]:
#!cat ./html/187.html # In case you need to view the raw html file

In [4]:
HTML('<iframe src="{}" width="900" height="350"></iframe>'.format(test_url))

In [5]:
for i in range(1500):
    test_url = "http://altbauatlas.de/datenblatt.php?id={}".format(i)
    dat_path = os.path.join(os.getcwd(), "html/{}.html".format(i))
    if not os.path.isfile(dat_path):
        request.urlretrieve(test_url, dat_path)
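
The download-then-drop-empties procedure described in this section can be sketched as a small helper. This is only an illustration under stated assumptions, not the code behind `scripts.fetchData`: the names `fetch_url`, `download_pages` and `min_bytes`, the injectable `fetch` callable, and the idea that an "empty" page is one with fewer than `min_bytes` bytes are all hypothetical choices made for this example.

```python
import os
from urllib import request
from urllib.error import URLError

BASE_URL = "http://altbauatlas.de/datenblatt.php?id={}"

def fetch_url(url):
    """Return the raw bytes of a page (thin wrapper around urllib)."""
    with request.urlopen(url, timeout=30) as response:
        return response.read()

def download_pages(ids, out_dir, fetch=fetch_url, min_bytes=1):
    """Save one html file per id and return the ids that were kept.

    Files already on disk are not requested again. Pages that fail to
    download or are shorter than `min_bytes` (assumed to mean "empty")
    are skipped, so no empty file is written.
    """
    os.makedirs(out_dir, exist_ok=True)
    kept = []
    for i in ids:
        path = os.path.join(out_dir, "{}.html".format(i))
        if os.path.isfile(path):
            kept.append(i)
            continue
        try:
            content = fetch(BASE_URL.format(i))
        except (URLError, OSError):
            continue  # unreachable page: skip it
        if len(content.strip()) < min_bytes:
            continue  # empty page: nothing to store
        with open(path, "wb") as handle:
            handle.write(content)
        kept.append(i)
    return kept
```

A call such as `download_pages(range(1500), "html")` would mirror the loop above. Passing `fetch` as a parameter keeps the helper testable without network access.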

# 2. Parse html data

In [6]:
material_table = getData()
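
The function `getData` lives in `scripts.fetchData` and its implementation is not reproduced in this notebook. As a rough illustration of the kind of parsing involved, the sketch below collects the cells of an html table with the standard library only. The class `TableCellParser`, the function `parse_table` and the sample markup are hypothetical and do not claim to match the structure of the real pages.

```python
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently open
        self._cell = None  # text chunks of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def parse_table(html_text):
    """Return the table cells found in `html_text` as a list of rows."""
    parser = TableCellParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows

# hypothetical sample markup, for illustration only
sample = ("<table><tr><th>Material</th><th>Stärke [cm]</th></tr>"
          "<tr><td>Innenputz</td><td>1,0</td></tr></table>")
rows = parse_table(sample)
```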

In [7]:
material_table.head()

Unnamed: 0,Construction,File,Location,Name,Source,Type,Uval,Year
0,massiv,849.html,48,"Fachdach, massiv, Stahbeton, geringe Dämmung, ...","Typologie Münster, DIN 4108:1960-05,",Flachdach,"[0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.9...",1958 bis 1978
1,massiv,445.html,45,"Fachdach, massiv, Stahbeton, Dämmung, Warmdach","Typologie Essen, DIN 4108:1969-08,",Flachdach,[0.51],1969 bis 1978
2,Holzkonstruktion,762.html,36,"Steidach, Hozbauweise, Dämmung","Typologie Bad Hersfeld, DIN 4108-4:1981-08",Steildach,[0.35],1984 bis 1994
3,Holzkonstruktion,886.html,26,"Steidach, Hozbauweise, Dämmung, Gipskartonpatte","BMBau:1985, DIN 4108-4:1981-08",Steildach,[0.42],1979 bis 1983
4,"massiv, zweischalig",941.html,48,"Außenwand, massiv, Kaksand-Lochstein, Dämmung,...","Haustypologie Münster, Nikolic:1977, DIN 4108:...",Außenwand,"[0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93...",1979 bis 1983


I found **779** building components in a sample of 1499.
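
The `Year` column stores the construction period as German text ("bis" means "to"). For later filtering by building age it can be convenient to turn it into numbers. The helper below is a minimal sketch; `parse_year_range` is a hypothetical name, not part of `scripts.fetchData`, and it assumes that every period is written with four-digit years.

```python
import re

def parse_year_range(text):
    """Turn a string such as '1958 bis 1978' into a (start, end) tuple.

    Returns None when the text contains no four-digit year.
    """
    years = [int(y) for y in re.findall(r"\d{4}", text)]
    if not years:
        return None
    return years[0], years[-1]
```

Applied to the table above, `material_table.Year.map(parse_year_range)` would yield one tuple per building component.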

# 3. Material characteristics

In [8]:
from scripts.fetchData import getHtmlData, a2
from scripts.fetchData import getMaterialTable

In [9]:
example_file = "365.html"

In [10]:
html_data = getHtmlData(example_file)

In [11]:
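# index of the first html line that mentions "Material" (start of the table)
p0 = next(i for i, a in enumerate(html_data) if "Material" in str(a))
# take five entries from there: the header plus the four material rows of this file
p1 = p0 + 5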

In [12]:
# join the table lines and strip the `a2` string imported from scripts.fetchData
table_html = " ".join(html_data[p0:p1]).replace(a2, "")
HTML(table_html)

0,1,2,3
Material,Stärke [cm],Rohdichte [kg/m³],λ-Wert [W/(mK)]
Innenputz,"1,0",-,"0,7"
Bimsbetonhohlblockstein Hbl 25,"24,0",1000,"0,44"
Bimsbetonhohlblockstein Hbl 25,"24,0",1400,"0,56"
Außenputz,"1,5",-,"0,87"


In [13]:
getMaterialTable(html_data)

Unnamed: 0,Material,Width,Density,Conductivity
0,Innenputz,1.0,-,0.7
1,Bimsbetonhohlblockstein Hbl 25,24.0,1000,0.44
2,Bimsbetonhohlblockstein Hbl 25,24.0,1400,0.56
3,Außenputz,1.5,-,0.87
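
Comparing the raw table with the parsed one suggests that the page writes numbers with a German decimal comma (for example `0,44`), which has to be converted before the values can be used as floats. The sketch below shows one way to do this. `to_float` is a hypothetical helper, and returning `None` for the `-` placeholder is a choice made for this example; `getMaterialTable` keeps the `-` as it is.

```python
def to_float(text):
    """Convert '0,44' or '1000' to a float; return None for '-' or blanks."""
    cleaned = text.strip()
    if cleaned in ("", "-"):
        return None
    return float(cleaned.replace(",", "."))
```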


In [14]:
material_table[material_table.File == example_file]

Unnamed: 0,Construction,File,Location,Name,Source,Type,Uval,Year
172,"massiv, monolithisch",365.html,40,"Außenwand, massiv, Bimshohbockstein","Typologie Düsseldorf, DIN 4108:1952-07",Außenwand,"[1.34, 1.59]",1949 bis 1957
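
The two U-values listed for this component are consistent with the layer table returned by `getMaterialTable`: there is one value per density variant of the pumice-concrete block. The check below uses the usual steady-state formula $U = 1 / (R_{si} + \sum_i d_i/\lambda_i + R_{se})$. The surface resistances $R_{si} = 0.13$ and $R_{se} = 0.04$ m²K/W are the standard values for walls and are an assumption here, since the catalog's own calculation is not shown in this notebook. The function name `u_value` is likewise hypothetical.

```python
R_SI = 0.13  # internal surface resistance [m²K/W], assumed
R_SE = 0.04  # external surface resistance [m²K/W], assumed

def u_value(layers, r_si=R_SI, r_se=R_SE):
    """U-value [W/(m²K)] of a wall given (thickness_cm, conductivity) layers."""
    r_layers = sum((width_cm / 100.0) / lam for width_cm, lam in layers)
    return 1.0 / (r_si + r_layers + r_se)

# layers of 365.html as (Width [cm], Conductivity [W/(mK)])
plaster_in = (1.0, 0.70)
plaster_out = (1.5, 0.87)
variant_light = [plaster_in, (24.0, 0.44), plaster_out]  # density 1000 kg/m³
variant_heavy = [plaster_in, (24.0, 0.56), plaster_out]  # density 1400 kg/m³

print(round(u_value(variant_light), 2))  # 1.34
print(round(u_value(variant_heavy), 2))  # 1.59
```

Both results match the `Uval` entry `[1.34, 1.59]` of the component.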


# 4. Store the data on a HDF5 file

In [20]:
from pandas import HDFStore
import pandas as pd

See http://pandas.pydata.org/pandas-docs/stable/io.html#io-hdf5 for more information about HDF5 files.

In [16]:
store = HDFStore('materials.h5')

In [None]:
store['elements'] = material_table

In [None]:
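# unique (material, density, conductivity) combinations over all components
P = []
for html_file in material_table.File.tolist():
    # store the layer table of each component under the key "table_<id>"
    file_name = "table_"+html_file.split(".")[0]
    html_data = getHtmlData(html_file)
    element_materials = getMaterialTable(html_data)
    store[file_name] = element_materials
    # remember every material variant that has not been seen before
    for a,b,c in zip(element_materials.Material.tolist(),
                     element_materials.Density.tolist(),
                     element_materials.Conductivity.tolist()):
        if (a,b,c) not in P:
            P.append((a,b,c))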

In [None]:
P = pd.DataFrame(P, columns=["Material", "Density", "Conductivity"])
store["materials"] = P
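
The membership test `(a,b,c) not in P` scans the whole list for every layer, which is fine for a few hundred components but grows quadratically. As an alternative sketch, the unique combinations can be collected with a dictionary, which preserves insertion order in Python 3.7 and later. `unique_rows` and the sample data are hypothetical and only illustrate the idea.

```python
def unique_rows(rows):
    """Return the distinct tuples in `rows`, keeping first-seen order."""
    return list(dict.fromkeys(rows))

# sample layers, for illustration only
layers = [
    ("Innenputz", "-", 0.7),
    ("Außenputz", "-", 0.87),
    ("Innenputz", "-", 0.7),  # duplicate of the first entry
]
materials = unique_rows(layers)
```

The result can be passed to `pd.DataFrame` with the same column names as before.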

In [22]:
if store.is_open:
    store.close()
print(store.is_open)

False


# References

<a name="cite-Klauss.2009"/><sup>[^](#ref-1) </sup>Klauß, Swen and Kirchhof, Wiebke and Gissel, Johanna. 2009. _Katalog regionaltypischer Materialien im Gebäudebestand mit Bezug auf die Baualtersklasse und Ableitung typischer Bauteilaufbauten: 2., berichtigte Version_.

<a name="cite-Klauss.2009b"/><sup>[^](#ref-2) </sup>Klauß, Swen and Kirchhof, Wiebke and Gissel, Johanna. 2009b. _Erfassung regionaltypischer Materialien im Gebäudebestand mit Bezug auf die Baualtersklasse und Ableitung typischer Bauteilaufbauten_.

