If I'm going to find a flat, I might as well do it in style. This will be a helper application so that I don't have to narrow down my search from scratch every time. My current idea is:
* scrape the [ImmobilienScout24 website](https://www.immobilienscout24.de/) based on keywords (neighborhood, price, etc.)
* return the address and a summary translated from German to English using the [translate tool](https://github.com/terryyin/translate-python), which is integrated with the Microsoft API (why did I think it was Google?)
* (optional) maybe also return a sample picture, if this is possible!
* list by cheapest (I'm a cheapskate :/)
* get the average rent of each area and compare each listing's rent against it
* create a map based on the addresses
    * if possible, add where my company is (company location in big fat red) and the nearest public transportation

In [189]:
# import libraries
import pandas as pd
import numpy as np
import translate
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
from time import sleep
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
from translate import Translator

%matplotlib inline

In [10]:
# set params
nb = "Mitte-Mitte/" #neighborhood
prices = "EURO--1000,00/" #note that price decimal is ',' in Europe
rooms = "-1,00/" #again, the decimal

set_url = "https://www.immobilienscout24.de/Suche/S-T/"
set_url2 = "Wohnung-Miete/Berlin/Berlin/"

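# build one search URL per result page and keep them all,
# instead of overwriting url on every iteration
urls = []
for i in range(1,5):
    if i == 1:
        url = set_url + set_url2 + nb + rooms + prices
    else:
        url = set_url + "P-" + str(i) + "/" + set_url2 + nb + rooms + prices
    urls.append(url)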

In [16]:
url = set_url + set_url2 + nb + rooms + prices

'https://www.immobilienscout24.de/Suche/S-T/Wohnung-Miete/Berlin/Berlin/Mitte-Mitte/-1,00/EURO--1000,00/'
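
The filter segments can also be built from plain numbers rather than hand-typed strings. This is a minimal sketch, assuming the URL pattern seen above (`-1,00/` for the room filter, `EURO--1000,00/` for the price filter, `P-<n>/` for later pages) holds for other values too. `de_decimal` and `build_search_url` are hypothetical helper names, not part of any library.

```python
def de_decimal(value):
    """Format a number with two decimals and a decimal comma, e.g. 1000 -> '1000,00'."""
    return "{:.2f}".format(value).replace(".", ",")


def build_search_url(neighborhood, max_rooms, max_price, page=1):
    """Assemble a search URL following the pattern used in this notebook (assumed, not documented)."""
    base = "https://www.immobilienscout24.de/Suche/S-T/"
    # page 1 has no page segment; later pages get "P-<n>/"
    page_part = "" if page == 1 else "P-{}/".format(page)
    return (base + page_part + "Wohnung-Miete/Berlin/Berlin/"
            + neighborhood + "/"
            + "-" + de_decimal(max_rooms) + "/"
            + "EURO--" + de_decimal(max_price) + "/")


first_page = build_search_url("Mitte-Mitte", 1, 1000)
second_page = build_search_url("Mitte-Mitte", 1, 1000, page=2)
```

With these arguments `first_page` reproduces the URL shown above, which is a quick sanity check that the pattern was copied correctly.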

In [21]:
soup = BeautifulSoup(requests.get(url).content, "lxml")

In [62]:
# translation to english
translator = Translator(to_lang = 'en', from_lang = 'de')

In [208]:
# get listing title and link to listing

# dataframe of listings
listings = pd.DataFrame(columns = ['title_de', 
                                   'address','cold_rent',
                                   'room_size','num_rooms','other_att',
                                   'link'])
listings.other_att = listings.other_att.astype(object)

link_base = "https://www.immobilienscout24.de"

rows = soup.find_all('div', {'class': 'result-list-entry__data'})
ind = 0

for r in rows:
    link = link_base + r.find('a', href = True)['href']
    title = r.find('a', href = True).find('h5').text
    address = r.find('div', {'class': 'result-list-entry__address'}).find('a').text
    
    primary_att = [dl.text for dl in r.find('div', {'class': 'grid'})]
    secondary_att = [i.text for i in r.find('div', {'class': 'result-list-entry__secondary-criteria-container'}).find_all('li')]
    
    listings.loc[ind,'title_de'] = title
    listings.loc[ind,'link'] = link
    listings.loc[ind,'address'] = address
    listings.loc[ind,'cold_rent'] = primary_att[0]
    listings.loc[ind,'room_size'] = primary_att[1]
    listings.loc[ind,'num_rooms'] = primary_att[2]
    listings.loc[ind,'other_att'] = ','.join(secondary_att)
    
    ind = ind + 1

In [209]:
listings.head(2)

Unnamed: 0,title_de,address,cold_rent,room_size,num_rooms,other_att,link
0,sophienstraße 1 zi-apt. mit südterrasse und ko...,"Mitte (Mitte), Berlin",1.500 €Kaltmiete,39 m²Wohnfläche,1 Zi.1Zi.,"Balkon/Terrasse,Einbauküche,Keller",https://www.immobilienscout24.de/expose/104969872
1,sophienstraße 1 zi-apt. mit südterrasse und ko...,"Mitte (Mitte), Berlin",1.500 €Kaltmiete,39 m²Wohnfläche,1 Zi.1Zi.,"Balkon/Terrasse,Einbauküche,Keller",https://www.immobilienscout24.de/expose/105923967


In [390]:
# cleaning up the dataframe

# strip the wording from cold rent, size, and num_rooms
# also convert the European number format: drop the '.' thousands
# separator and turn the decimal ',' into '.'
# unless I change it, I'm going to be wondering
# why there is a cheap room at 1.5 euro
def text_to_num(text):
    return float(text.split(' ')[0].replace('.','').replace(',','.'))

listings.cold_rent = [text_to_num(i) for i in listings.cold_rent]
listings.room_size = [text_to_num(i) for i in listings.room_size]
listings.num_rooms = [text_to_num(i) for i in listings.num_rooms]
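
`text_to_num` raises a `ValueError` as soon as a cell does not start with a number. A slightly more defensive variant is sketched below, assuming that a missing value is acceptable for such cells. `parse_de_number` is a hypothetical name, and the last sample string is made up to show the fallback.

```python
import re


def parse_de_number(text):
    """Return the first German-formatted number in text as a float, or None if there is none."""
    # digits with optional '.' thousands separators, then an optional ',' decimal part
    match = re.search(r"\d[\d.]*(?:,\d+)?", text)
    if match is None:
        return None
    return float(match.group(0).replace(".", "").replace(",", "."))


# the first three strings are taken from the scraped output above
samples = ["1.500 €Kaltmiete", "38,29 m²Wohnfläche", "1 Zi.1Zi.", "auf Anfrage"]
parsed = [parse_de_number(s) for s in samples]
```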

In [391]:
# translate German
en_trans = Translator(to_lang = 'en', from_lang = 'de')

listings['title_en'] = [en_trans.translate(i) for i in listings.title_de]
listings['other_att_en'] = [en_trans.translate(i) for i in listings.other_att]
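
Several listings share the same title or attribute string, so each distinct string only needs to be sent to the translation service once. A minimal sketch of that idea follows. `translate_unique` is a hypothetical helper, and `fake_translate` is only a stand-in so the example runs offline; in the notebook, `en_trans.translate` would be passed instead.

```python
def translate_unique(texts, translate):
    """Translate each distinct string once and return the results in the original order."""
    cache = {}
    for text in texts:
        if text not in cache:
            cache[text] = translate(text)
    return [cache[text] for text in texts]


calls = []  # records every string that actually reaches the translator


def fake_translate(text):
    """Stand-in for a real translator: records the call and upper-cases the text."""
    calls.append(text)
    return text.upper()


attributes = ["Einbauküche,Aufzug", "Einbauküche,Aufzug", "Keller"]
translated = translate_unique(attributes, fake_translate)
```

Here three inputs lead to only two translator calls, which matters when the translation API has a request quota.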

In [392]:
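# collect coordinates (lat, lon) using the Nominatim geocoder
# recent geopy versions require a user_agent (the name here is a placeholder);
# a longer timeout makes read timeouts less likely
geolocator = Nominatim(user_agent="flat-finder-notebook", timeout=10)

# use sleep method to ensure that Nominatim policy is followed
# which is at most 1 query per 1 sec
lat = []
lon = []
for i in listings.address:
    location = geolocator.geocode(i)
    # geocode returns None when the address cannot be resolved
    lat.append(location.latitude if location else np.nan)
    lon.append(location.longitude if location else np.nan)
    sleep(1)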

listings['address_lat'] = lat
listings['address_lon'] = lon

timeout: The read operation timed out
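
A timeout like the one above aborts the whole loop, so one option is to retry each lookup a few times before giving up on that address. The sketch below is one way to do that, not the notebook's original code: `geocode_with_retry` is a hypothetical helper, and `flaky_geocode` is a stand-in that fails twice so the example runs without network access. In real use, `geolocator.geocode` would be passed in, and the broad `except` could be narrowed to `geopy.exc.GeocoderTimedOut`.

```python
from time import sleep


def geocode_with_retry(geocode, address, retries=3, delay=1.0):
    """Call geocode(address) up to `retries` times; return None if every attempt fails."""
    for attempt in range(retries):
        try:
            return geocode(address)
        except Exception:  # deliberately broad in this sketch
            if attempt < retries - 1:
                sleep(delay)  # wait before the next attempt
    return None


attempts = []  # records each call made to the stand-in geocoder


def flaky_geocode(address):
    """Stand-in geocoder: times out on the first two calls, then returns made-up coordinates."""
    attempts.append(address)
    if len(attempts) < 3:
        raise TimeoutError("The read operation timed out")
    return (52.5, 13.4)  # placeholder coordinates, not a real lookup result


result = geocode_with_retry(flaky_geocode, "Mitte (Mitte), Berlin", retries=3, delay=0)
```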

In [367]:
# getting more specific information from each page
# using the link scraped :)
def get_detailed_info(soup,class_name):
    td = dict()
    for d in soup.find('div', {'class': class_name}).find_all('dl'):
        td[d.find('dt').text] = d.find('dd').text
    return td

temp_df = pd.DataFrame()

for l in listings.link:
    ind_soup = BeautifulSoup(requests.get(l).content, 'lxml')
    dict_general = get_detailed_info(ind_soup,'criteriagroup criteria-group--two-columns')
    dict_costs_1 = get_detailed_info(ind_soup, 'grid-item lap-one-half desk-one-half padding-right-s')
    dict_costs_2 = get_detailed_info(ind_soup, 'grid-item lap-one-half desk-one-half padding-left-s')
    dict_building_energy = get_detailed_info(ind_soup, 'criteriagroup criteria-group--border criteria-group--two-columns criteria-group--spacing')
    
    info_dict = dict()
    info_dict.update(dict_general)
    info_dict.update(dict_costs_1)
    info_dict.update(dict_costs_2)
    info_dict.update(dict_building_energy)
    
    other_text = ind_soup.find('div', {'class': 'grid-item padding-desk-right-xl desk-two-thirds lap-one-whole desk-column-left flex-item palm--flex__order--1 lap--flex__order--1'})
    titles = []
    for h in other_text.find_all('h4'):
        if 'is24qa' in h['class'][0]:
            titles.append(h.text)
    texts = []
    for t in other_text.find_all('div', {'class': 'is24-text'}):
        texts.append(t.text)
    for i in range(len(titles)):
        info_dict[titles[i]] = texts[i]
    
    temp_df = pd.concat([temp_df,pd.DataFrame(pd.Series(info_dict)).transpose()])
    
    
temp_df.reset_index(drop = True, inplace = True)    

In [393]:
# concat with listings info
listings_full = pd.concat([listings,temp_df], axis = 1)

In [394]:
listings_full.head()

Unnamed: 0,title_de,address,cold_rent,room_size,num_rooms,other_att,link,title_en,other_att_en,Ausstattung,...,Nutzfläche ca.,Objektbeschreibung,Objektzustand,Schlafzimmer,Sonstiges,Typ,Umzugskosten,Wesentliche Energieträger,Wohnfläche ca.,Zimmer
0,sophienstraße 1 zi-apt. mit südterrasse und ko...,"Mitte (Mitte), Berlin",1500.0,39.0,1.0,"Balkon/Terrasse,Einbauküche,Keller",https://www.immobilienscout24.de/expose/104969872,Sophienstraße 1 zi-apt. with south facing terr...,"Balcony / terrace, fitted kitchen, cellar",Luxus,...,,"in berlin mitte,\r\nin einer der allerschönst...",Erstbezug nach Sanierung,1.0,,Etagenwohnung,Berechnung starten,Gas,39 m²,1
1,sophienstraße 1 zi-apt. mit südterrasse und ko...,"Mitte (Mitte), Berlin",1500.0,39.0,1.0,"Balkon/Terrasse,Einbauküche,Keller",https://www.immobilienscout24.de/expose/105923967,Sophienstraße 1 zi-apt. with south facing terr...,"Balcony / terrace, fitted kitchen, cellar",Luxus,...,,"in berlin mitte,\r\nin einer der allerschönst...",Erstbezug nach Sanierung,1.0,,Etagenwohnung,Berechnung starten,Gas,39 m²,1
2,NEUERSTBEZUG NACH KOMPLETTER TOP-SANIERUNG!!! ...,"Ackerstr. 156, Mitte (Mitte), Berlin",750.0,38.29,1.0,"Balkon/Terrasse,Einbauküche,Keller",https://www.immobilienscout24.de/expose/106342992,NEWEST STAY AFTER COMPLETE TOP-RENOVATION !!! ...,"Balcony / terrace, fitted kitchen, cellar",Diese traumhafte 1-Zimmer-Wohnung wird als Er...,...,,"VORAB MÖCHTEN WIR SIE HÖFLICH BITTEN, IHRE AN...",Erstbezug nach Sanierung,,Die Ackerstraße gehört heute zu den gefragtes...,Etagenwohnung,Berechnung starten,Erdgas leicht,"38,29 m²",1
3,"Erstbezug nach umfassender Modernisierung, 1-Z...","Holzmarktstraße 64, Mitte (Mitte), Berlin",845.0,42.6,1.0,"Einbauküche,Aufzug",https://www.immobilienscout24.de/expose/104322026,First time use after comprehensive modernizati...,"Fitted, Elevator","Allgemein:\n- Kabel-TV, Grundversorgung\n- Wa...",...,,Allgemein:\n- Wärmedämmung\n\nExtras:\n- Trep...,Modernisiert,,,Etagenwohnung,Berechnung starten,Fernwärme,"42,6 m²",1
4,Möbiliertes Appartement in erstklassiger Lage ...,"Mitte (Mitte), Berlin",890.0,42.54,1.0,"Balkon/Terrasse,Einbauküche,Aufzug",https://www.immobilienscout24.de/expose/105773236,Furnished apartment in a prime location - visi...,"Balcony / terrace, fitted kitchen, lift",,...,,,Neuwertig,1.0,,Etagenwohnung,Berechnung starten,Erdgas leicht,"42,54 m²",1
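
With numeric rents in place, the "list by cheapest" and "average rent of each area" items from the plan can be sketched with a `groupby`. This assumes the area is always the second-to-last comma-separated part of the address, which holds for the rows shown above but is not guaranteed. The first three rows reuse addresses and rents from the output above; the Wedding row is made up for contrast, and the column names `area`, `area_mean_rent`, and `rent_vs_area` are hypothetical.

```python
import pandas as pd

sample = pd.DataFrame({
    "address": [
        "Ackerstr. 156, Mitte (Mitte), Berlin",
        "Holzmarktstraße 64, Mitte (Mitte), Berlin",
        "Mitte (Mitte), Berlin",
        "Musterstr. 1, Wedding (Wedding), Berlin",  # made-up row for contrast
    ],
    "cold_rent": [750.0, 845.0, 890.0, 600.0],
})

# the area is taken as the second-to-last comma-separated part of the address
sample["area"] = sample["address"].str.split(", ").str[-2]

# average cold rent per area, repeated on every row of that area
sample["area_mean_rent"] = sample.groupby("area")["cold_rent"].transform("mean")

# negative values mean the listing is cheaper than its area average
sample["rent_vs_area"] = sample["cold_rent"] - sample["area_mean_rent"]

# cheapest listings first
cheapest_first = sample.sort_values("cold_rent").reset_index(drop=True)
```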


# All in one