# Scraping Restaurant Links from RestaurantGuru

#### Purpose
This notebook is the **first step of the RestaurantGuru review scraping pipeline**.
It collects all restaurant profile URLs for a selected city from RestaurantGuru.

The extracted links serve as the input for the subsequent review scraping step.

---

#### What This Notebook Does
- Opens the RestaurantGuru restaurant list for a given city
- Scrolls to the bottom of the page repeatedly until no further results load
- Extracts the link stored on each restaurant row
- Exports the collected URLs to a CSV file for later reuse

---

#### Role in the Project
This notebook defines the **scope of the dataset** by determining which restaurants
will be included in the review collection process.

---

#### Output
- A CSV file (`Links_<city>.csv`) with a single `Link` column containing the restaurant URLs
- This output is used directly by:
  `scrape_restaurant_reviews_restaurantguru.ipynb`
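
As a rough sketch of how the downstream notebook might read this output, assuming it is a CSV file with a single `Link` column as written by the export step in this notebook. The helper name `load_links` and the example path are hypothetical:

```python
import csv


def load_links(csv_path):
    """Return the values of the 'Link' column, skipping empty cells."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [row["Link"] for row in reader if row.get("Link")]


# Example call (hypothetical path):
# links = load_links("Links_Berlin.csv")
```

With pandas, `pd.read_csv(csv_path)["Link"].tolist()` should give an equivalent list, apart from how empty cells are represented.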

---

#### Notes
- The city is set via the `stadt` variable in the variable definitions cell
- Selenium and a compatible Chrome WebDriver are required
- Random pauses between browser actions throttle the scraper to reduce the risk of being blocked
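
The throttling mentioned in the notes amounts to pausing for a random interval between browser actions. A minimal sketch of that idea; `polite_pause` is a hypothetical helper and not part of the notebook, which calls `time.sleep(uniform(...))` directly:

```python
import random
import time


def polite_pause(low, high, sleep=time.sleep):
    """Sleep for a random duration between low and high seconds and return it."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


# Example: pause 3 to 5 seconds after loading a page
# polite_pause(3, 5)
```

Passing the `sleep` function as a parameter is only a convenience so the helper can be exercised without actually waiting.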


IMPORTS

In [None]:
import os
import time
from random import uniform

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


DEFINE VARIABLES

In [None]:
stadt = "Berlin"  # city to scrape
dateipfad = r"\Outputs_Links\01"  # output path for the link file
dateiname = "Links_"+stadt+".csv"  # output file name
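
The two path variables are combined into the export target further down. A minimal sketch of building that target with `pathlib`, assuming `dateipfad` is meant to name a directory; `build_output_path` is a hypothetical helper:

```python
from pathlib import Path


def build_output_path(base_dir, city):
    """Return base_dir/Links_<city>.csv, creating base_dir if it is missing."""
    folder = Path(base_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"Links_{city}.csv"


# Example with the notebook's naming scheme:
# target = build_output_path("Outputs_Links/01", "Berlin")
```

Creating the folder up front avoids a failed export after a long scraping run.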

EXPORT LINKS OF RESTAURANTS IN THE CITY

In [None]:
# Path to the Chrome WebDriver executable
PATH = "C:\\Users\\ElianeTuchborn\\chromedriver.exe"

# Create a Chrome options instance
options = Options()

# Set a few options to make the browser look less like an automated session
options.add_argument("--disable-blink-features=AutomationControlled")  # Makes automation harder to detect
options.add_argument("--disable-extensions")  # Disables browser extensions
options.add_argument("--disable-infobars")  # Disables info bars
options.add_argument("--disable-notifications")  # Disables notifications
options.add_argument("--disable-popup-blocking")  # Disables pop-up blocking
options.add_argument("--disable-web-security")  # Disables web security checks (same-origin policy)
options.add_argument("--disable-dev-shm-usage")  # Uses /tmp instead of /dev/shm for shared memory (only relevant on Linux)
options.add_argument("--disable-gpu")  # Disables GPU usage
options.add_argument("--disable-features=VizDisplayCompositor")  # Disables the VizDisplayCompositor feature
options.add_argument("--window-size=1366,768")  # Sets the window size

# Create the Chrome WebDriver with these options
# (Selenium 4 takes the driver path via a Service object)
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(PATH), options=options)

# Load cookies (placeholder; no cookie handling is implemented here)

# Open the website
driver.get("https://de.restaurantguru.com/"+stadt+"#restaurant-list")

# Wait for the page to load
time.sleep(uniform(3, 5))

# Scroll to the bottom of the page until no new results load
current_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(uniform(3, 4))
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == current_height:
        break
    current_height = new_height

# Wait
time.sleep(uniform(2, 4))

# Select the restaurant rows that carry a review link (Selenium 4 locator API)
from selenium.webdriver.common.by import By
links = driver.find_elements(By.CSS_SELECTOR, '.restaurant_row[data-review-href]')

# Store the extracted links in a DataFrame
link_list = [link.get_attribute('data-review-href') for link in links]
dflinks = pd.DataFrame(link_list, columns=["Link"])

# Display the DataFrame
print(dflinks)


# Export the DataFrame as CSV (treats dateipfad as a directory and joins it with the file name)
dflinks.to_csv(os.path.join(dateipfad, dateiname), index=False)

# Close the browser and delete leftover variables
driver.quit()
del link_list
del current_height
del links
del new_height
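
The scroll loop in this cell runs until the page height stops changing, which could loop for a very long time on a page that keeps growing, and the extracted list could contain repeated or empty entries. A sketch of a bounded variant plus order-preserving de-duplication follows. `scroll_to_end`, `unique_links`, and the `max_rounds` cap are hypothetical additions, and `driver` is assumed to be the Selenium WebDriver created above:

```python
import time
from random import uniform


def scroll_to_end(driver, max_rounds=200, pause=(3, 4)):
    """Scroll down until the page height stops growing or max_rounds is reached.

    Returns the number of scroll rounds performed.
    """
    height = driver.execute_script("return document.body.scrollHeight")
    for round_no in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Random pause so newly loaded rows have time to appear
        time.sleep(uniform(*pause))
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == height:
            return round_no
        height = new_height
    return max_rounds


def unique_links(links):
    """Drop empty entries and duplicates while keeping the original order."""
    return list(dict.fromkeys(link for link in links if link))
```

In the notebook, `scroll_to_end(driver)` would take the place of the `while True` loop, and `unique_links(link_list)` could be applied before building the DataFrame.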
