# Project Concrete 

_Project Concrete_ aims to correlate, visualize and predict real estate prices in Austria based on their relationship
to factors such as the unemployment rate, disposable income and population growth.

Data sources used for this endeavour:
- [Unemployment Rates (Source: data.gv.at)](https://www.data.gv.at/katalog/dataset/CFE2FF7E9AD53C1EE053C630070AB105)
- [Real Estate Prices (Source: statistik.at)](https://www.statistik.at/web_de/statistiken/wirtschaft/preise/immobilien_durchschnittspreise/index.html)
- [Disposable Income (Source: statistik.at)](https://www.statistik.at/web_de/statistiken/menschen_und_gesellschaft/soziales/haushalts-einkommen/index.html)
- [Population Growth (Source: statistik.at)](https://www.statistik.at/web_de/statistiken/menschen_und_gesellschaft/bevoelkerung/index.htm)

Planned execution steps:
- Data collection and aggregation (see the data sources listed above)
- Data preparation and import into MongoDB
- Analysis of the data in a Jupyter Notebook
- Presentation and visualization of results (e.g. a map of Austria)


### Install pymongo

In [2]:
!pip install pymongo



### Connect to DB via connection string

In [105]:
from pymongo import MongoClient
import pymongo
import pandas as pd

# Connection string for the local MongoDB instance (swap in an Atlas URL if needed)
CONNECTION_STRING = "mongodb://127.0.0.1:27017"

# Create a client; MongoClient connects lazily, so ping the server to confirm it is reachable
myclient = MongoClient(CONNECTION_STRING)
myclient.admin.command("ping")

print("Connection Successful")

# List the databases available on the server
for db in myclient.list_databases():
    print(db)

Connection Successful
{'name': 'admin', 'sizeOnDisk': 40960, 'empty': False}
{'name': 'config', 'sizeOnDisk': 110592, 'empty': False}
{'name': 'local', 'sizeOnDisk': 40960, 'empty': False}


In [106]:
# The CSV uses ';' as the field separator and ',' as the decimal mark
df = pd.read_csv('ub_al_alq_os.csv', sep=";", decimal=',')
df

| | Jahr | UnselbstBesch | ArbeitslosVorgemerkte | Arbeitslosenquote | OffeneStellen | Unnamed: 5 |
|---|---|---|---|---|---|---|
| 0 | 1946 | 1760000 | 74000 | 4.034896 | 140067 | NaN |
| 1 | 1947 | 1900000 | 52700 | 2.698827 | 102810 | NaN |
| 2 | 1948 | 1926700 | 54500 | 2.750858 | 45334 | NaN |
| 3 | 1949 | 1944700 | 99900 | 4.886041 | 35724 | NaN |
| 4 | 1950 | 1946886 | 128745 | 6.202692 | 25187 | NaN |
| ... | ... | ... | ... | ... | ... | ... |
| 71 | 2017 | 3655297 | 339976 | 8.509456 | 56854 | NaN |
| 72 | 2018 | 3741484 | 312107 | 7.699519 | 71545 | NaN |
| 73 | 2019 | 3797304 | 301328 | 7.351916 | 77093 | NaN |
| 74 | 2020 | 3717164 | 409639 | 9.926304 | 62833 | NaN |
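
The empty `Unnamed: 5` column is probably caused by a trailing `;` at the end of every line in the CSV; that is an assumption, since the raw file is not shown here. The following minimal sketch drops such all-empty columns before the import. It also checks that `Arbeitslosenquote` appears to equal the registered unemployed divided by the sum of employees and registered unemployed. `SAMPLE_CSV` is a stand-in rebuilt from the first rows of the output above, not the original file.

```python
import io

import pandas as pd

# Stand-in for 'ub_al_alq_os.csv', rebuilt from the first rows shown above.
# Assumption: the real file ends every line with ';', which yields 'Unnamed: 5'.
SAMPLE_CSV = """Jahr;UnselbstBesch;ArbeitslosVorgemerkte;Arbeitslosenquote;OffeneStellen;
1946;1760000;74000;4,034896;140067;
1947;1900000;52700;2,698827;102810;
1948;1926700;54500;2,750858;45334;
"""

raw = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";", decimal=",")

# Drop columns that contain no values at all (here: 'Unnamed: 5')
clean = raw.dropna(axis=1, how="all")

# Recompute the rate as unemployed / (employees + unemployed), in percent
recomputed = 100 * clean["ArbeitslosVorgemerkte"] / (
    clean["UnselbstBesch"] + clean["ArbeitslosVorgemerkte"]
)
max_deviation = (recomputed - clean["Arbeitslosenquote"]).abs().max()

print(list(clean.columns))
print(max_deviation < 1e-5)
```

Applying `dropna(axis=1, how="all")` to `df` before the insert would keep the empty field out of the stored documents.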


### Create collection 

In [117]:
# In MongoDB, databases and collections are only created on the first insert
# (much like Git, which does not track empty folders)
db = myclient["immodb"]
# Insert the whole dataframe with the unemployment data
db.unemploymentData.insert_many(df.to_dict('records'))

# Check that the collection has been created
print(db.list_collection_names())


['unemploymentData']


In [118]:
# Check that the collection has content

unemp_col = db["unemploymentData"]
y = unemp_col.find()
 
for data in y:
    print(data)

{'_id': ObjectId('624d8d4f3c4c66d63708e6f3'), 'Jahr': 1946, 'UnselbstBesch': 1760000, 'ArbeitslosVorgemerkte': 74000, 'Arbeitslosenquote': 4.034896401308616, 'OffeneStellen': 140067, 'Unnamed: 5': nan}
{'_id': ObjectId('624d8d4f3c4c66d63708e6f4'), 'Jahr': 1947, 'UnselbstBesch': 1900000, 'ArbeitslosVorgemerkte': 52700, 'Arbeitslosenquote': 2.6988272648128238, 'OffeneStellen': 102810, 'Unnamed: 5': nan}
{'_id': ObjectId('624d8d4f3c4c66d63708e6f5'), 'Jahr': 1948, 'UnselbstBesch': 1926700, 'ArbeitslosVorgemerkte': 54500, 'Arbeitslosenquote': 2.750858065818696, 'OffeneStellen': 45334, 'Unnamed: 5': nan}
{'_id': ObjectId('624d8d4f3c4c66d63708e6f6'), 'Jahr': 1949, 'UnselbstBesch': 1944700, 'ArbeitslosVorgemerkte': 99900, 'Arbeitslosenquote': 4.886041279467867, 'OffeneStellen': 35724, 'Unnamed: 5': nan}
{'_id': ObjectId('624d8d4f3c4c66d63708e6f7'), 'Jahr': 1950, 'UnselbstBesch': 1946886, 'ArbeitslosVorgemerkte': 128745, 'Arbeitslosenquote': 6.202692097005682, 'OffeneStellen': 25187, 'Unnamed: 

In [115]:
# Code for dropping the collection (MongoDB's counterpart to a table)
# db["unemploymentData"].drop()
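
Every run of the `insert_many` cell appends a fresh copy of all rows, which is likely why the `ObjectId` values differ between the outputs in this notebook and why dropping the collection is handy. As an alternative, the sketch below upserts one document per year. It relies on pymongo's `Collection.replace_one(filter, replacement, upsert=True)`; the helper name `upsert_by_year` and the choice of `Jahr` as the unique key are assumptions made for illustration.

```python
def upsert_by_year(collection, records, key="Jahr"):
    """Insert or replace one document per year instead of appending duplicates."""
    count = 0
    for record in records:
        # With upsert=True, replace_one inserts the document when no match exists
        collection.replace_one({key: record[key]}, record, upsert=True)
        count += 1
    return count


# Hypothetical usage with the objects from the cells above:
# upsert_by_year(db["unemploymentData"], df.to_dict("records"))
```

Running the helper twice leaves one document per year in the collection, so the cell can be re-executed without dropping the collection first.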

In [113]:
# Build new dataframe by reading from database
unemployment_df = pd.DataFrame(list(unemp_col.find()))
unemployment_df = unemployment_df.set_index("_id")
unemployment_df

| _id | Jahr | UnselbstBesch | ArbeitslosVorgemerkte | Arbeitslosenquote | OffeneStellen | Unnamed: 5 |
|---|---|---|---|---|---|---|
| 624d8c893c4c66d63708e6a7 | 1946 | 1760000 | 74000 | 4.034896 | 140067 | NaN |
| 624d8c893c4c66d63708e6a8 | 1947 | 1900000 | 52700 | 2.698827 | 102810 | NaN |
| 624d8c893c4c66d63708e6a9 | 1948 | 1926700 | 54500 | 2.750858 | 45334 | NaN |
| 624d8c893c4c66d63708e6aa | 1949 | 1944700 | 99900 | 4.886041 | 35724 | NaN |
| 624d8c893c4c66d63708e6ab | 1950 | 1946886 | 128745 | 6.202692 | 25187 | NaN |
| ... | ... | ... | ... | ... | ... | ... |
| 624d8c893c4c66d63708e6ee | 2017 | 3655297 | 339976 | 8.509456 | 56854 | NaN |
| 624d8c893c4c66d63708e6ef | 2018 | 3741484 | 312107 | 7.699519 | 71545 | NaN |
| 624d8c893c4c66d63708e6f0 | 2019 | 3797304 | 301328 | 7.351916 | 77093 | NaN |
| 624d8c893c4c66d63708e6f1 | 2020 | 3717164 | 409639 | 9.926304 | 62833 | NaN |
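
For the later analysis, an index by year is usually more convenient than the technical `_id`. The following sketch shows one possible way to do that; the helper `documents_to_frame` is hypothetical and the `_id` values in `sample_docs` are placeholders. With a live database the `_id` field could instead be excluded in the query itself via the projection `unemp_col.find({}, {"_id": 0})`.

```python
import pandas as pd


def documents_to_frame(documents, index="Jahr"):
    """Build a DataFrame from MongoDB documents, indexed by year instead of ObjectId."""
    frame = pd.DataFrame(list(documents))
    # Drop the technical '_id' column if the query did not already exclude it
    frame = frame.drop(columns=["_id"], errors="ignore")
    return frame.set_index(index).sort_index()


# Stand-in documents shaped like the output above (placeholder '_id' values);
# with a live database, pass unemp_col.find({}, {"_id": 0}) instead.
sample_docs = [
    {"_id": "b", "Jahr": 1947, "Arbeitslosenquote": 2.698827},
    {"_id": "a", "Jahr": 1946, "Arbeitslosenquote": 4.034896},
]

frame = documents_to_frame(sample_docs)
print(frame.loc[1946, "Arbeitslosenquote"])
```

Indexing by `Jahr` also makes it straightforward to join this frame with the other yearly data sets listed at the top of the notebook.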
