# Venues Around Hotels in Yekaterinburg, Russia

### Capstone Project
by Eduard Meilakh

## Business Problem

Yekaterinburg is the largest city of the Ural Federal District, Russia, and the administrative center of Sverdlovsk Region. The city has many hotels, and the market niches for venues near them are crowded and highly competitive.

In [1]:
yekat_location = (56.83555556, 60.61277778) # latitude and longitude coordinates of Yekaterinburg

In [2]:
import folium
yekat_map = folium.Map(location=yekat_location, zoom_start=11)
label = "Yekaterinburg"
folium.CircleMarker(
    yekat_location,
    radius=5,
    popup=label,
    fill=True,
    fill_opacity=0.7,
).add_to(yekat_map)
yekat_map



The business problem addressed in this project is choosing a location for a new venue near a hotel in Yekaterinburg.
We can define the following sub-tasks:
* determine the correlation between a hotel's features (stars, rating) and the types of venues around it
* compare venues near different hotels in Yekaterinburg (their density and diversity)
* identify the hotels with the lowest density of venues near them
* give recommendations for the location and type of a new venue.
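One possible way to make "density" and "diversity" concrete is sketched below. It is only an illustration under stated assumptions, not the method used later in the project: density is taken as the number of venues per square kilometre inside a circle around the hotel, and diversity as the Shannon entropy of the venue categories. The function names, the 500 m radius, and the dictionary keys (`lat`, `lon`, `category`) are all hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def venue_stats(hotel, venues, radius_m=500):
    """Return (density, diversity) for the venues within radius_m of a hotel.

    density   : venues per square kilometre inside the circle (assumed definition)
    diversity : Shannon entropy, in nats, of the venue categories (assumed definition)
    """
    nearby = [
        v for v in venues
        if haversine_m(hotel["lat"], hotel["lon"], v["lat"], v["lon"]) <= radius_m
    ]
    area_km2 = math.pi * (radius_m / 1000) ** 2
    density = len(nearby) / area_km2
    counts = Counter(v["category"] for v in nearby)
    total = sum(counts.values())
    if total == 0:
        return density, 0.0
    diversity = -sum((n / total) * math.log(n / total) for n in counts.values())
    return density, diversity


# Illustrative call: the venues below are invented for the example, not real data.
example_hotel = {"lat": 56.828814, "lon": 60.61389}
example_venues = [
    {"lat": 56.8290, "lon": 60.6140, "category": "Cafe"},
    {"lat": 56.8285, "lon": 60.6135, "category": "Bar"},
    {"lat": 56.8900, "lon": 60.6100, "category": "Cafe"},  # several km away, excluded
]
print(venue_stats(example_hotel, example_venues))
```

Under these definitions, a hotel with few venues nearby gets a low density, and a hotel surrounded by venues of a single type gets a diversity of zero, so both numbers can be compared across hotels.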

## Data

We need the following data for our project:
* a list of hotels in Yekaterinburg with their coordinates, stars, and ratings; these data can be scraped from https://www.gogototour.com/ru/city/yekaterinburg/ and partly obtained from Foursquare;
* venues around each hotel with their types and ratings; these data can be obtained using the Foursquare API.
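As a rough sketch of how venues around a hotel might be requested, the block below builds a request URL for the legacy Foursquare v2 `venues/explore` endpoint and extracts venue names, categories, and coordinates from a response of the shape that endpoint has historically returned. Whether this endpoint is still available for a given account, the response layout, and the helper names `explore_url` and `parse_venues` are assumptions; the credentials are placeholders. Venue ratings are not covered here.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 endpoint (assumed to be the one used; check current API docs).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_url(lat, lng, client_id, client_secret,
                radius=500, limit=100, version="20180605"):
    """Build the request URL for venues within `radius` metres of (lat, lng)."""
    params = {
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": version,                    # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(payload):
    """Extract (name, category, lat, lng) tuples from an explore-style JSON payload."""
    venues = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item.get("venue", {})
            categories = venue.get("categories") or [{}]
            location = venue.get("location", {})
            venues.append((venue.get("name"),
                           categories[0].get("name"),
                           location.get("lat"),
                           location.get("lng")))
    return venues


# Hypothetical usage (requires real credentials and network access):
# import requests
# url = explore_url(56.828814, 60.61389, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
# venues = parse_venues(requests.get(url, timeout=10).json())
```

Keeping URL construction and response parsing in separate functions makes it possible to check both without network access or valid credentials.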

In [3]:
import requests
from bs4 import BeautifulSoup

# hotels of Yekaterinburg sorted by their stars
hotels_url = "https://www.gogototour.com/ru/city/yekaterinburg/sort/star/"
hotels_list = set()

# the listing spans 5 pages: the base URL plus page/2 ... page/5
pages = [""] + ["page/" + str(page_ind) for page_ind in range(2, 6)]

for page in pages:
    cur_url = hotels_url + page
    hotels = requests.get(cur_url)  # request the current page, not only the first one
    html = BeautifulSoup(hotels.text, "html.parser")
    for a_tag in html.find_all("a"):  # looking for links to hotels
        try:
            href = a_tag["href"]
            name = a_tag["title"]  # only links that carry a title attribute are kept
        except KeyError:
            continue
        if href.startswith("/ru/hotel") and not href.endswith("review/"):
            hotels_list.add(href)
print("URLs of all hotels are scraped!")

URLs of all hotels are scraped!


In [6]:
hotel_url = "https://www.gogototour.com"        

hotel_features = {"names": [],
                   "lats": [],
                   "longs": [],
                   "stars": [],
                 }

for hotel in hotels_list:  # scraping hotels' features: name, latitude, longitude, stars
    try:
        cur_url = hotel_url + hotel
        results = requests.get(cur_url)
        results = BeautifulSoup(results.text, "html.parser")
        name = results.find("h1", {"class": "hotel"}).text
        lat = results.find("meta", {"itemprop": "latitude"})["content"]
        long = results.find("meta", {"itemprop": "longitude"})["content"]
        stars = results.find("div", {"id": "ofstars"}).find("input", {"class": "val"})["value"]
    except (AttributeError, KeyError, TypeError, requests.RequestException):
        # skip hotels whose page is unreachable or lacks one of the expected tags
        continue
    if name and lat and long:  # if everything is okay, append the hotel's data to the dictionary
        hotel_features["names"].append(name)
        hotel_features["lats"].append(lat)
        hotel_features["longs"].append(long)
        hotel_features["stars"].append(stars)

In [5]:
import pandas as pd
hotels_df = pd.DataFrame(hotel_features)
hotels_df

|   | names | lats | longs | stars |
|---|---|---|---|---|
| 0 | Atlaza City Residence | 56.823979 | 60.63792 | 4 |
| 1 | Onegin Hotel | 56.828814 | 60.61389 | 4 |
| 2 | Viz'avi Hotel | 56.837612 | 60.548379 | 4 |
| 3 | Vizit Hotel Yekaterinburg | 56.890967 | 60.6120014 | 4 |
| 4 | Ramada Yekaterinburg | 56.77528 | 60.71783 | 5 |
| 5 | Best Eastern Uralsky Dvor | 56.827579 | 60.616588 | 4 |
| 6 | Senator Business Hotel | 56.841709 | 60.576708 | 4 |
| 7 | Novotel Yekaterinburg Centre | 56.8332 | 60.61362 | 4 |
| 8 | Grand Hall Hotel | 56.82864 | 60.5595703 | 4 |
| 9 | Ural Hotel Yekaterinburg | 56.841729 | 60.572492 | 4 |
