# Project Concrete 

_Project Concrete_ aims to correlate, visualize and predict real estate prices based on the relationship between various
factors such as unemployment rate, disposable income and population growth in Austria.

Data sources used for this endeavour:
- [Unemployment Rates (Source: data.gv.at)](https://www.data.gv.at/katalog/dataset/CFE2FF7E9AD53C1EE053C630070AB105)
- !NOT IN USE! [Real Estate Prices (Source: statistik.at)](https://www.statistik.at/web_de/statistiken/wirtschaft/preise/immobilien_durchschnittspreise/index.html)
- [Real Estate Prices Vienna (Source: data.gv.at)](https://www.data.gv.at/katalog/dataset/kaufpreissammlung-liegenschaften-wien/resource/7b9bdd2d-2ff0-4e6e-bba5-21483d8cf55b)
- !NOT IN USE! [Disposable Income  (Source: statistik.at)](https://www.statistik.at/web_de/statistiken/menschen_und_gesellschaft/soziales/haushalts-einkommen/index.html)
- [Net Income Vienna (Source: statistik.at)](https://www.data.gv.at/katalog/dataset/d76c0e8b-c599-4700-8a88-29d0d87e563d)
- [Population Growth (Source: data.gv.at)](https://www.data.gv.at/katalog/dataset/f5f823c1-631b-35bd-abed-1442a7cb52a2)

Planned execution steps:
- Data aggregation and collection (see the data sources listed above)
- Data preparation and import into MongoDB
- Analysis of data via Jupyter Notebook
- Presentation and visualization of results (e.g. a map of Austria/Vienna)


### Install pymongo

In [1]:
!pip install pymongo

Collecting pymongo
  Downloading pymongo-4.1.1-cp310-cp310-macosx_10_9_universal2.whl (364 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 364.9/364.9 KB 3.4 MB/s eta 0:00:00
Installing collected packages: pymongo
Successfully installed pymongo-4.1.1


### Connect to DB via connection string

In [3]:
import pymongo
from pymongo import MongoClient
import pandas as pd

# Connection string for the local MongoDB instance
CONNECTION_STRING = "mongodb://127.0.0.1:27017"

# Create a client; MongoClient connects lazily, so ping the server to verify the connection
myclient = MongoClient(CONNECTION_STRING)
myclient.admin.command("ping")

print("Connection Successful")

# Check DB data
for db in myclient.list_databases():
    print(db)

Connection Successful
{'name': 'admin', 'sizeOnDisk': 8192, 'empty': False}
{'name': 'config', 'sizeOnDisk': 12288, 'empty': False}
{'name': 'local', 'sizeOnDisk': 8192, 'empty': False}


In [7]:
# Import unemployment CSV (data from 1946 onwards)
unemploy_df = pd.read_csv('ub_al_alq_os.csv', sep = ";", decimal=',')
unemploy_df

Unnamed: 0,Jahr,UnselbstBesch,ArbeitslosVorgemerkte,Arbeitslosenquote,OffeneStellen,Unnamed: 5
0,1946,1760000,74000,4.034896,140067,
1,1947,1900000,52700,2.698827,102810,
2,1948,1926700,54500,2.750858,45334,
3,1949,1944700,99900,4.886041,35724,
4,1950,1946886,128745,6.202692,25187,
...,...,...,...,...,...,...
71,2017,3655297,339976,8.509456,56854,
72,2018,3741484,312107,7.699519,71545,
73,2019,3797304,301328,7.351916,77093,
74,2020,3717164,409639,9.926304,62833,


In [12]:
# Import population growth CSV (data from 1974 onwards)
pop_growth_df = pd.read_csv('OGD_ake003j_AKEZR_1.csv', sep = ";", decimal=',')
pop_growth_df = pop_growth_df.rename(columns={'C-A10-0': 'YEAR', 'F-ISIS-1': 'PERSON_IN_THOUSAND'})
pop_growth_df.YEAR = pop_growth_df.YEAR.str.replace('A10-','',regex = True)
pop_growth_df

Unnamed: 0,YEAR,PERSON_IN_THOUSAND
0,1974,7519
1,1975,7501
2,1976,7484
3,1977,7485
4,1978,7482
5,1979,7467
6,1980,7465
7,1981,7481
8,1982,7493
9,1983,7479


In [16]:
# Import real estate prices for Vienna (data from 1990 onwards)
real_estate_vienna_df = pd.read_csv('kaufpreissammlung-liegenschaften.csv', sep = ";", decimal=',', encoding='latin-1')
real_estate_vienna_df


Unnamed: 0,KG.Code,Katastralgemeinde,EZ,PLZ,Straße,ON,Gst.,Gst.Fl.,ErwArt,Erwerbsdatum,...,Baureifgest,% Widmung,Baurecht,Bis,auf EZ,Stammeinlage,sonst_wid,sonst_wid_prz,ber. Kaufpreis,Bauzins
0,1617,Strebersdorf,1417.0,1210.0,Mühlweg,13,752/16,755.0,Kaufvertrag,13.10.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,750000.0,
1,1607,Groß Jedlersdorf II,193.0,1210.0,Bahnsteggasse,4,408,510.0,Kaufvertrag,13.09.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,1250000.0,
2,1209,Ober St.Veit,3570.0,1130.0,Jennerplatz,34/20,938/3,456.0,Kaufvertrag,10.08.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,480000.0,
3,1207,Lainz,405.0,1130.0,Sebastian-Brunner-Gasse,6,8/23,523.0,Kaufvertrag,30.12.2020,...,FALSCH,100.0,FALSCH,,,FALSCH,,,1600000.0,
4,1101,Favoriten,3831.0,1100.0,Laxenburger Straße,2C -2 D,2044/19,12768.0,Kaufvertrag,04.11.2020,...,FALSCH,30.0,FALSCH,,,FALSCH,"W V 22 g , Wi g","40 ,30",15000000.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57907,1204,Hadersdorf,1057.0,1140.0,Laskywiesengasse,10,889,1313.0,Kaufvertrag,18.08.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,950000.0,
57908,1204,Hadersdorf,1200.0,1140.0,Robert-Fuchs-Gasse,25-31,448,4003.0,Kaufvertrag,05.12.2018,...,FALSCH,100.0,FALSCH,,,FALSCH,,,5200000.0,
57909,1206,Hütteldorf,2760.0,1140.0,Ulmenstraße,48,1232/1,499.0,Kaufvertrag,15.06.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,699000.0,
57910,1216,Weidlingau,5.0,1140.0,Hauptstraße,114,11/2,1649.0,Kaufvertrag,26.07.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,1496000.0,
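
The other datasets are yearly, while `Erwerbsdatum` is loaded as a `DD.MM.YYYY` string. One possible way to bring the purchase records to a yearly level is to parse the date and aggregate per year. The sketch below uses five rows copied from the output above; treating `ber. Kaufpreis` as the price measure and using the median are assumptions made for illustration, not the project's chosen method.

```python
import pandas as pd

# Five rows copied from the output above (only the two columns used here)
sample = pd.DataFrame({
    "Erwerbsdatum": ["13.10.2021", "13.09.2021", "10.08.2021", "30.12.2020", "04.11.2020"],
    "ber. Kaufpreis": [750000.0, 1250000.0, 480000.0, 1600000.0, 15000000.0],
})

# Parse day.month.year strings; unparseable values become NaT instead of raising
sample["Erwerbsdatum"] = pd.to_datetime(
    sample["Erwerbsdatum"], format="%d.%m.%Y", errors="coerce"
)

# Median price per acquisition year (ASSUMPTION: the median is used because
# it is less sensitive to single very large sales than the mean)
median_by_year = (
    sample.groupby(sample["Erwerbsdatum"].dt.year)["ber. Kaufpreis"].median()
)

print(median_by_year)
```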


In [23]:
# Import net income for Vienna (data from 2002 onwards)
net_income_vienna_df = pd.read_csv('vie-bez-biz-ecn-inc-sex-2002f.csv', sep=';', decimal=',')
net_income_vienna_df

Unnamed: 0,NUTS,DISTRICT_CODE,SUB_DISTRICT_CODE,REF_YEAR,REF_DATE,INC_TOT_VALUE,INC_MAL_VALUE,INC_FEM_VALUE
0,AT13,90000,90000,2002,20021231,18.217,20.709,15.424
1,AT13,90100,90100,2002,20021231,25.463,31.961,18.536
2,AT13,90200,90200,2002,20021231,16.439,18.301,14.282
3,AT13,90300,90300,2002,20021231,18.701,21.444,15.804
4,AT13,90400,90400,2002,20021231,20.325,23.641,16.876
...,...,...,...,...,...,...,...,...
451,AT13,91900,91900,2020,20201231,29.195,34.102,24.257
452,AT13,92000,92000,2020,20201231,20.671,21.591,19.531
453,AT13,92100,92100,2020,20201231,24.061,26.202,21.693
454,AT13,92200,92200,2020,20201231,26.272,29.302,23.100


### Create collection
Loading dataframes into database

In [24]:
# In MongoDB, collections and databases are only created on the first insert (reminiscent of empty folders in Git)
db = myclient["immodb"]

# We insert the whole dataframes
# unemploymentData
db.unemploymentData.insert_many(unemploy_df.to_dict('records'))
# populationGrowth
db.populationGrowth.insert_many(pop_growth_df.to_dict('records'))
# realEstateVienna
db.realEstateVienna.insert_many(real_estate_vienna_df.to_dict('records'))
# netIncomeVienna
db.netIncomeVienna.insert_many(net_income_vienna_df.to_dict('records'))

# Check if the collections have been created
print(db.list_collection_names())


['netIncomeVienna', 'unemploymentData', 'realEstateVienna', 'populationGrowth']
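
`insert_many` appends documents, so running the insert cell a second time stores every row again. A minimal sketch of a repeatable load follows: it empties the collection before inserting. `replace_collection` and `InMemoryCollection` are hypothetical names introduced here; the stand-in class only mimics the three pymongo `Collection` methods used (`delete_many`, `insert_many`, `count_documents`) so the example runs without a database server.

```python
def replace_collection(collection, records):
    """Empty the collection, insert the given records, return the new document count."""
    collection.delete_many({})      # remove all existing documents
    if records:                     # pymongo's insert_many rejects an empty list
        collection.insert_many(records)
    return collection.count_documents({})


class InMemoryCollection:
    """Hypothetical stand-in mimicking the three pymongo methods used above."""

    def __init__(self):
        self.docs = []

    def delete_many(self, filter):
        # The sketch only ever passes {} (match everything), so clear the list
        self.docs = []

    def insert_many(self, documents):
        self.docs.extend(documents)

    def count_documents(self, filter):
        return len(self.docs)


demo_col = InMemoryCollection()
rows = [
    {"Jahr": 1946, "OffeneStellen": 140067},
    {"Jahr": 1947, "OffeneStellen": 102810},
]

first_count = replace_collection(demo_col, rows)
second_count = replace_collection(demo_col, rows)  # second run does not duplicate rows
print(first_count, second_count)
```

Against the real database, the call would look like `replace_collection(db.unemploymentData, unemploy_df.to_dict('records'))`.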


In [28]:
#check if collection has content

# unemploymentData
unemp_col = db["unemploymentData"]
y = unemp_col.find()

for data in y:
    print(data)

# populationGrowth
populationGrowth_col = db["populationGrowth"]
# y = populationGrowth_col.find()
#
# for data in y:
#     print(data)

# realEstateVienna
realEstateVienna_col = db["realEstateVienna"]
# y = realEstateVienna_col.find()
#
# for data in y:
#     print(data)

# netIncomeVienna
netIncomeVienna_col = db["netIncomeVienna"]
# y = netIncomeVienna_col.find()
#
# for data in y:
#     print(data)

{'_id': ObjectId('626163596daf38671d1eb1cf'), 'Jahr': 1946, 'UnselbstBesch': 1760000, 'ArbeitslosVorgemerkte': 74000, 'Arbeitslosenquote': 4.034896401308616, 'OffeneStellen': 140067, 'Unnamed: 5': nan}
{'_id': ObjectId('626163596daf38671d1eb1d0'), 'Jahr': 1947, 'UnselbstBesch': 1900000, 'ArbeitslosVorgemerkte': 52700, 'Arbeitslosenquote': 2.6988272648128238, 'OffeneStellen': 102810, 'Unnamed: 5': nan}
{'_id': ObjectId('626163596daf38671d1eb1d1'), 'Jahr': 1948, 'UnselbstBesch': 1926700, 'ArbeitslosVorgemerkte': 54500, 'Arbeitslosenquote': 2.750858065818696, 'OffeneStellen': 45334, 'Unnamed: 5': nan}
{'_id': ObjectId('626163596daf38671d1eb1d2'), 'Jahr': 1949, 'UnselbstBesch': 1944700, 'ArbeitslosVorgemerkte': 99900, 'Arbeitslosenquote': 4.886041279467867, 'OffeneStellen': 35724, 'Unnamed: 5': nan}
{'_id': ObjectId('626163596daf38671d1eb1d3'), 'Jahr': 1950, 'UnselbstBesch': 1946886, 'ArbeitslosVorgemerkte': 128745, 'Arbeitslosenquote': 6.202692097005683, 'OffeneStellen': 25187, 'Unnamed: 
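
The documents above carry `'Unnamed: 5': nan`, meaning the missing values were stored as floating-point NaN. If MongoDB `null` is preferred, the records can be converted before insertion, since pymongo encodes Python `None` as `null`. A minimal sketch using two rows of the unemployment data; the variable names are illustrative.

```python
import pandas as pd

# Two rows from the unemployment data, including the empty 'Unnamed: 5' column
sample = pd.DataFrame({
    "Jahr": [1946, 1947],
    "OffeneStellen": [140067, 102810],
    "Unnamed: 5": [float("nan"), float("nan")],
})

# Replace every missing value with None so it would be stored as null, not NaN
records = [
    {key: (None if pd.isna(value) else value) for key, value in row.items()}
    for row in sample.to_dict("records")
]

print(records[0])
```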

In [115]:
# Code for dropping collections (tables)
# db["unemploymentData"].drop()
# db["populationGrowth"].drop()
# db["realEstateVienna"].drop()
# db["netIncomeVienna"].drop()

### Load from database
Build new dataframe by reading from database

In [31]:
# Unemployment
unemployment_fromDb = pd.DataFrame(list(unemp_col.find()))
unemployment_fromDb = unemployment_fromDb.set_index("_id")
unemployment_fromDb

_id,Jahr,UnselbstBesch,ArbeitslosVorgemerkte,Arbeitslosenquote,OffeneStellen,Unnamed: 5
626163596daf38671d1eb1cf,1946,1760000,74000,4.034896,140067,
626163596daf38671d1eb1d0,1947,1900000,52700,2.698827,102810,
626163596daf38671d1eb1d1,1948,1926700,54500,2.750858,45334,
626163596daf38671d1eb1d2,1949,1944700,99900,4.886041,35724,
626163596daf38671d1eb1d3,1950,1946886,128745,6.202692,25187,
...,...,...,...,...,...,...
62617cb56daf38671d1f9546,2017,3655297,339976,8.509456,56854,
62617cb56daf38671d1f9547,2018,3741484,312107,7.699519,71545,
62617cb56daf38671d1f9548,2019,3797304,301328,7.351916,77093,
62617cb56daf38671d1f9549,2020,3717164,409639,9.926304,62833,


In [32]:
#population growth
populationGrowth_fromDb = pd.DataFrame(list(populationGrowth_col.find()))
populationGrowth_fromDb = populationGrowth_fromDb.set_index("_id")
populationGrowth_fromDb

_id,YEAR,PERSON_IN_THOUSAND
626163596daf38671d1eb21b,1974,7519
626163596daf38671d1eb21c,1975,7501
626163596daf38671d1eb21d,1976,7484
626163596daf38671d1eb21e,1977,7485
626163596daf38671d1eb21f,1978,7482
...,...,...
62617cb56daf38671d1f9576,2017,8646
62617cb56daf38671d1f9577,2018,8679
62617cb56daf38671d1f9578,2019,8717
62617cb56daf38671d1f9579,2020,8766


In [33]:
#real estate vienna
realEstateVienna_fromDb = pd.DataFrame(list(realEstateVienna_col.find()))
realEstateVienna_fromDb = realEstateVienna_fromDb.set_index("_id")
realEstateVienna_fromDb

_id,KG.Code,Katastralgemeinde,EZ,PLZ,Straße,ON,Gst.,Gst.Fl.,ErwArt,Erwerbsdatum,...,Baureifgest,% Widmung,Baurecht,Bis,auf EZ,Stammeinlage,sonst_wid,sonst_wid_prz,ber. Kaufpreis,Bauzins
6261665e6daf38671d1eb2c7,1617,Strebersdorf,1417.0,1210.0,Mühlweg,13,752/16,755.0,Kaufvertrag,13.10.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,750000.0,
6261665e6daf38671d1eb2c8,1607,Groß Jedlersdorf II,193.0,1210.0,Bahnsteggasse,4,408,510.0,Kaufvertrag,13.09.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,1250000.0,
6261665e6daf38671d1eb2c9,1209,Ober St.Veit,3570.0,1130.0,Jennerplatz,34/20,938/3,456.0,Kaufvertrag,10.08.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,480000.0,
6261665e6daf38671d1eb2ca,1207,Lainz,405.0,1130.0,Sebastian-Brunner-Gasse,6,8/23,523.0,Kaufvertrag,30.12.2020,...,FALSCH,100.0,FALSCH,,,FALSCH,,,1600000.0,
6261665e6daf38671d1eb2cb,1101,Favoriten,3831.0,1100.0,Laxenburger Straße,2C -2 D,2044/19,12768.0,Kaufvertrag,04.11.2020,...,FALSCH,30.0,FALSCH,,,FALSCH,"W V 22 g , Wi g","40 ,30",15000000.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
62617cb96daf38671d2077ae,1204,Hadersdorf,1057.0,1140.0,Laskywiesengasse,10,889,1313.0,Kaufvertrag,18.08.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,950000.0,
62617cb96daf38671d2077af,1204,Hadersdorf,1200.0,1140.0,Robert-Fuchs-Gasse,25-31,448,4003.0,Kaufvertrag,05.12.2018,...,FALSCH,100.0,FALSCH,,,FALSCH,,,5200000.0,
62617cb96daf38671d2077b0,1206,Hütteldorf,2760.0,1140.0,Ulmenstraße,48,1232/1,499.0,Kaufvertrag,15.06.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,699000.0,
62617cb96daf38671d2077b1,1216,Weidlingau,5.0,1140.0,Hauptstraße,114,11/2,1649.0,Kaufvertrag,26.07.2021,...,FALSCH,100.0,FALSCH,,,FALSCH,,,1496000.0,


In [34]:
# net income vienna
netIncomeVienna_fromDb = pd.DataFrame(list(netIncomeVienna_col.find()))
netIncomeVienna_fromDb = netIncomeVienna_fromDb.set_index("_id")
netIncomeVienna_fromDb
