In [1]:
import pandas as pd
from geoservices_bayern_scraping import building_plans

# Scraping Building Plans of Bayern

The [Geoportal of Bayern](https://geoportal.bayern.de/bauleitplanungsportal/karte.html) exposes an API that returns direct links to building plans across the region. The API takes a bounding box as input, which defines the area over which the plans are scraped. This notebook shows how to use two functions written to scrape the entire API: one that defines the bounding boxes covering all of Bayern, and one that efficiently downloads the results into CSV files.
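To illustrate the idea of covering a region with bounding boxes, here is a minimal sketch that splits one large box into a grid of smaller tiles. It is not the function shipped in `geoservices_bayern_scraping`; the name `tile_bounding_box`, the `tile_size` parameter, and the `(xmin, ymin, xmax, ymax)` coordinate order are assumptions made for this example.

```python
import math
from typing import List, Tuple

# Assumed coordinate order: (xmin, ymin, xmax, ymax)
BBox = Tuple[float, float, float, float]


def tile_bounding_box(bbox: BBox, tile_size: float) -> List[BBox]:
    """Split `bbox` into square tiles of side `tile_size` (hypothetical helper).

    Tiles on the right and top edges are clipped so that no tile
    extends beyond the original bounding box.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")

    xmin, ymin, xmax, ymax = bbox
    # Number of tiles needed along each axis.
    n_cols = math.ceil((xmax - xmin) / tile_size)
    n_rows = math.ceil((ymax - ymin) / tile_size)

    tiles = []
    for row in range(n_rows):
        y0 = ymin + row * tile_size
        y1 = min(y0 + tile_size, ymax)
        for col in range(n_cols):
            x0 = xmin + col * tile_size
            x1 = min(x0 + tile_size, xmax)
            tiles.append((x0, y0, x1, y1))
    return tiles
```

Each resulting tile could then be passed to the API as a separate request, which keeps every query small.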

## Step 1: Load the data for each municipality (Gemeinde)

In [2]:
gemeinden = pd.read_json('../data/proc/geobayern_bpsites/geobayern_bauleitsites.json', orient="index")

In [4]:
gemeinden[gemeinden['name'] == 'Herrngiersdorf']


| | name | ags | kreis | bezirk | boundingbox | midpoint | bauleit_urls |
|---|---|---|---|---|---|---|---|
| Herrngiersdorf | Herrngiersdorf | 9273127 | Kelheim | Niederbayern | [722630.85, 5404128.13, 727950.62, 5411909.47] | [725290.73, 5408018.8] | [{'url': 'https://herrngiersdorf.de/bauen-lebe... |
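The sample row suggests that `midpoint` is the centre of `boundingbox`. The sketch below checks that relationship, assuming the box is ordered `[xmin, ymin, xmax, ymax]`; the helper name `bbox_midpoint` is hypothetical and not part of the scraping package.

```python
from typing import List, Sequence


def bbox_midpoint(bbox: Sequence[float]) -> List[float]:
    """Return the centre [x, y] of a box given as [xmin, ymin, xmax, ymax]."""
    xmin, ymin, xmax, ymax = bbox
    return [(xmin + xmax) / 2, (ymin + ymax) / 2]


# Bounding box of Herrngiersdorf, copied from the row above.
herrngiersdorf_bbox = [722630.85, 5404128.13, 727950.62, 5411909.47]
mid_x, mid_y = bbox_midpoint(herrngiersdorf_bbox)
print(round(mid_x, 2), round(mid_y, 2))
```

The result agrees with the stored `midpoint` to within rounding.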


In [3]:
munich = gemeinden[gemeinden['name'] == 'München']

## Step 2: Run the scraper

- Adjust `batch_size` to set the number of items run in each batch.
- Adjust `batch_delay` to lengthen or shorten the pause between batches. A longer pause reduces the load on the server and helps avoid IP blocking or other restrictions triggered by rapid sequential requests.
- Adjust `max_retries` to set how many times a failed download is attempted.
- Adjust `output_folder` to change the name of the output folder.
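The interplay of these parameters can be sketched as a generic batching loop with retries. This is an illustration only, not the implementation of `building_plans.scrape_in_batches`; the function `run_in_batches`, the `fetch` callable, and the injectable `sleep` argument are hypothetical, and the parameter semantics are assumed from the descriptions above.

```python
import time
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_batches(
    items: Sequence[T],
    fetch: Callable[[T], R],
    batch_size: int = 10,
    batch_delay: float = 30.0,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[R], List[T]]:
    """Process `items` in batches, retrying failures (hypothetical sketch).

    Returns the successful results and the items that still failed
    after `max_retries` attempts.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    results: List[R] = []
    failed: List[T] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        for item in batch:
            for attempt in range(1, max_retries + 1):
                try:
                    results.append(fetch(item))
                    break  # success, stop retrying this item
                except Exception:
                    if attempt == max_retries:
                        failed.append(item)
        # Pause between batches, but not after the last one.
        if start + batch_size < len(items):
            sleep(batch_delay)
    return results, failed
```

Passing `sleep` as an argument makes the pause easy to replace in tests; in normal use the default `time.sleep` applies.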

In [4]:
building_plans.scrape_in_batches(
    munich,
    batch_delay=30,
    max_retries=5,
    output_folder='../data/raw/geoservices_results_sample',
    size_n=None,
)

Running batch 0 of 1 bounding boxes.


1it [00:11, 11.09s/it]
