# Scraping vacation apartment listings in Croatia and plotting their locations on Google Maps

Open questions:
- For a chosen area (a district of a town), how do I find apartments?
- Where are the interesting apartments whose addresses I can get from the tourist information websites?
- What are the contact details (email, phone) of the interesting apartments?

Task:
- Scrape the list of apartments from the Rovinj tourist information website and, using the listed geographic coordinates (or the address), build a custom Google Maps map with the apartments marked.

### Tools used
- Pandas https://en.wikipedia.org/wiki/Pandas_%28software%29
- Web scraping with Selenium https://en.wikipedia.org/wiki/Selenium_(software)
- Creating a KML file for import into Google Maps https://pl.wikipedia.org/wiki/Keyhole_Markup_Language

In [1]:
```python
# Firefox profile: save downloads without a prompt and connect directly (no proxy)
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
fxProfile = FirefoxProfile()
#fxProfile.set_preference("browser.download.folderList", 2)
fxProfile.set_preference("browser.download.manager.showWhenStarting", False)
#fxProfile.set_preference("browser.download.dir","c:\\tmp")
fxProfile.set_preference("browser.helperApps.alwaysAsk.force", False)
# MIME types saved to disk without asking (binary, Word, Excel)
fxProfile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream;application/msword;application/vnd.openxmlformats-officedocument.wordprocessingml.document;application/vnd.ms-excel;application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
# 0 = direct connection (no proxy)
fxProfile.set_preference("network.proxy.type", 0)
```

In [2]:
```python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
```

In [3]:
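```python
import os
import time
from selenium.webdriver.firefox.options import Options
folder = 'C:\\Users\\mw\\Downloads\\'  # local download folder (not used in the cells shown)
# Selenium 4 passes the profile via Options; the firefox_profile argument is deprecated/removed
fxOptions = Options()
fxOptions.profile = fxProfile
driver = webdriver.Firefox(options=fxOptions)
```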

In [145]:
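```python
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

appts = []
print('Starting...')
for page in range(32):  # listing pages 0-31
    driver.get("http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/" + str(page))
    print("Processing %s" % driver.current_url)
    # Each apartment on the page sits in a <div> with exactly this class attribute
    for i_accom in driver.find_elements(By.XPATH, '//div[@class="text-opis-kategorije accomodations"]'):
        accom = i_accom.find_element(By.XPATH, './/h4').text                        # apartment name
        url = i_accom.find_element(By.XPATH, './/p[1]/a[1]').get_property('href')  # detail page
        try:
            # The map link ends in .../<lat>/<lon>; store it as "lat,lon"
            gps = i_accom.find_element(By.XPATH, './/a[@class="iframe"]').get_property('href')
            gps = gps.split('/')[-2] + ',' + gps.split('/')[-1]
        except NoSuchElementException:
            gps = ''  # no map link for this apartment
        appts.append([accom, url, gps])
print('Completed.')
```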

```
Starting...
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/0
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/1
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/2
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/3
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/4
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/5
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/6
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/7
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/8
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/9
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommodation/p/10
Processing %s http://www.tzgrovinj.hr/page/accommodation/private-accommoda
```
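The `gps` extraction in the scraping loop assumes that the map link ends in `.../<latitude>/<longitude>`, which is what the sample output suggests. A small standalone helper can make that assumption explicit and also reject links whose last two path segments are not numbers. `parse_gps` and the example URL are hypothetical; this is a sketch under that assumption, not the site's documented URL format.

```python
def parse_gps(href):
    """Return "lat,lon" from a map link ending in /<lat>/<lon>, or '' otherwise.

    Assumes (hypothetically) that the last two path segments are the coordinates.
    """
    parts = href.rstrip('/').split('/')
    if len(parts) < 2:
        return ''
    lat, lon = parts[-2], parts[-1]
    try:
        # Both segments must parse as numbers, otherwise the link is not a map link
        float(lat)
        float(lon)
    except ValueError:
        return ''
    return lat + ',' + lon


# Made-up link in the assumed format
print(parse_gps('http://example.com/map/45.1062425418872/13.71516466140747'))
```

Inside the loop, the two `gps = ...` lines could then be replaced by a single call on the link's `href`, with `''` still marking apartments without coordinates.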

In [144]:
```python
appts
```

```
[['Villa Laiki',
  'http://www.tzgrovinj.hr/page/accommodation/private-accommodation/detail/13866',
  '45.1062425418872,13.71516466140747'],
 ['Casa Caterina',
  'http://www.tzgrovinj.hr/page/accommodation/private-accommodation/detail/15479',
  ''],
 ['Nataša',
  'http://www.tzgrovinj.hr/page/accommodation/private-accommodation/detail/3583',
  '45.07391459332291,13.64652156829834'],
 ['Apartman Benjamin',
  'http://www.tzgrovinj.hr/page/accommodation/private-accommodation/detail/16787',
  '45.06472860036322,13.689265251159668'],
 ['Sara',
  'http://www.tzgrovinj.hr/page/accommodation/private-accommodation/detail/3862',
  ''],
 ['Alida Apartments & Rooms',
  'http://www.tzgrovinj.hr/page/accommodation/private-accommodation/detail/16128',
  '45.08883158585556,13.64178264144357'],
 ['Medar Nadia',
  'http://www.tzgrovinj.hr/page/accommodation/private-accommodation/detail/15685',
  ''],
 ['St. Eufemia',
  'http://www.tzgrovinj.hr/page/accommodation/private-accommodation/detail/5110',
  '']
```
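The scraped list exists only in memory, and the output above shows that several apartments (e.g. Casa Caterina, Sara) have no coordinates. A minimal sketch, assuming a hypothetical helper `appts_to_csv` and a file name of your choice, that saves `appts` with separate latitude and longitude columns so the data can be reused without scraping again. Google My Maps can also import CSV files that contain latitude/longitude columns.

```python
import csv


def appts_to_csv(appts, path):
    """Write [name, url, "lat,lon"] rows to a CSV with separate latitude/longitude columns."""
    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'url', 'latitude', 'longitude'])
        for name, url, gps in appts:
            # Apartments without coordinates keep empty latitude/longitude cells
            lat, lon = gps.split(',') if gps else ('', '')
            writer.writerow([name, url, lat, lon])
```

For example, `appts_to_csv(appts, 'Rovinj_appts.csv')`; the `csv` module quotes names that contain commas or quotes automatically.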

In [191]:
```python
from xml.sax.saxutils import escape

# Write the apartments as a KML file that can be imported into Google My Maps
with open('Rovinj_appts.kml', 'w', encoding='utf-8') as f:
    f.write("<?xml version='1.0' encoding='UTF-8'?>\n")
    f.write("<kml xmlns='http://www.opengis.net/kml/2.2'>\n")
    f.write("<Document>\n")
    f.write("   <name>Rovinj_appts.kml</name>\n")
    for name, url, gps in appts:
        # Rovinj lies at about 45° N, so valid entries start with '45.'; empty ones are skipped
        if gps[:3] == '45.':
            lat, lon = gps.split(',')
            # KML expects "longitude,latitude"; keep at most 17 characters of each value
            g = lon[:17] + ',' + lat[:17]
            f.write("   <Placemark>\n")
            # escape() handles '&', '<' and '>' in apartment names
            f.write("       <name>" + escape(name) + "</name>\n")
            # The detail-page URL goes into the description; CDATA avoids escaping it
            f.write("       <description><![CDATA[" + url + "]]></description>\n")
            f.write("       <Point>\n")
            f.write("           <coordinates>" + g + "</coordinates>\n")
            f.write("       </Point>\n")
            f.write("   </Placemark>\n")
    f.write("</Document>\n")
    f.write("</kml>\n")
```
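The cell above builds the KML by string concatenation. An alternative sketch uses the standard library's `xml.etree.ElementTree`, which escapes text automatically and writes the KML 2.2 namespace. `build_kml` is a hypothetical helper; unlike the cell above, it keeps every entry with a non-empty `gps` value (instead of checking for a `45.` prefix) and does not shorten the coordinates.

```python
import xml.etree.ElementTree as ET

KML_NS = 'http://www.opengis.net/kml/2.2'
ET.register_namespace('', KML_NS)  # serialize the KML namespace as the default xmlns


def build_kml(appts, doc_name='Rovinj_appts.kml'):
    """Return a KML document (as a string) with one Placemark per apartment that has coordinates."""
    kml = ET.Element('{%s}kml' % KML_NS)
    doc = ET.SubElement(kml, '{%s}Document' % KML_NS)
    ET.SubElement(doc, '{%s}name' % KML_NS).text = doc_name
    for name, url, gps in appts:
        if not gps:
            continue  # no coordinates were scraped for this apartment
        lat, lon = gps.split(',')
        pm = ET.SubElement(doc, '{%s}Placemark' % KML_NS)
        ET.SubElement(pm, '{%s}name' % KML_NS).text = name  # '&', '<', '>' escaped on output
        ET.SubElement(pm, '{%s}description' % KML_NS).text = url
        point = ET.SubElement(pm, '{%s}Point' % KML_NS)
        # KML coordinates are "longitude,latitude"
        ET.SubElement(point, '{%s}coordinates' % KML_NS).text = lon + ',' + lat
    return ET.tostring(kml, encoding='unicode')
```

`ET.tostring(..., encoding='unicode')` returns the markup without an XML declaration, so writing the file reduces to `f.write("<?xml version='1.0' encoding='UTF-8'?>\n" + build_kml(appts))` on a file opened with `encoding='utf-8'`.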