In [2]:
from src.geoservices_bayern_scraping import building_plans, bounding_boxes

# Scraping Building Plans of Bayern

The [Geoportal of Bayern](https://geoportal.bayern.de/bauleitplanungsportal/karte.html) has an API that contains the direct links to different building plans across the region. The API takes a bounding box as input, which defines the area over which plans are scraped. This notebook shows how to use two functions written to scrape the entire API: one that generates the bounding boxes covering all of Bayern, and one that downloads the results in batches into CSV files.

## Step 1: Define the regions

- Define the bounding box for the part of Bavaria you want to scrape. This example uses the bounding box for the entire region.
- Set the `sample_n` parameter to the number of bounding boxes to sample, or omit it to keep all of them.
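
The idea of covering a large region with smaller bounding boxes can be sketched as a regular grid. The helper below is a hypothetical illustration, not the project's `generate_sub_bboxes`: the name `split_bbox`, the `n_cols` and `n_rows` parameters, and the comma-separated string output are all assumptions (the string format is chosen only because the plotting helper later in this notebook splits each box on commas).

```python
def split_bbox(bounding_box, n_cols, n_rows):
    """Split (x_min, y_min, x_max, y_max) into n_cols * n_rows equal tiles.

    Hypothetical helper for illustration only.
    Each tile is returned as an 'x_min,y_min,x_max,y_max' string.
    """
    if n_cols < 1 or n_rows < 1:
        raise ValueError("n_cols and n_rows must be at least 1")
    x_min, y_min, x_max, y_max = bounding_box
    tile_w = (x_max - x_min) / n_cols
    tile_h = (y_max - y_min) / n_rows
    tiles = []
    # Walk the grid row by row, starting at the lower-left corner
    for row in range(n_rows):
        for col in range(n_cols):
            left = x_min + col * tile_w
            bottom = y_min + row * tile_h
            tiles.append(f"{left},{bottom},{left + tile_w},{bottom + tile_h}")
    return tiles


# Small made-up box, split into a 2 x 2 grid
example_tiles = split_bbox((0.0, 0.0, 100.0, 50.0), n_cols=2, n_rows=2)
print(example_tiles)
```

A finer grid means more requests, each covering less area, so the grid size trades request count against the amount of data returned per request.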

In [3]:
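# Bounding box as (x_min, y_min, x_max, y_max)
bavaria_bounding_box = (4195669.333333333, 4998144, 4724053.333333333, 5766144)

# Note: this rebinds the name `bounding_boxes` from the imported module to the
# result, so re-run the import cell before running this cell a second time.
bounding_boxes = bounding_boxes.generate_sub_bboxes(bounding_box=bavaria_bounding_box)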

In [23]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import folium
from pyproj import Transformer


In [28]:
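def convert_to_wgs84(bounding_boxes, src_crs='EPSG:31468'):
    """Convert 'x_min,y_min,x_max,y_max' strings from src_crs to WGS 84 [lon_min, lat_min, lon_max, lat_max] lists."""
    # always_xy=True keeps the (x, y) / (lon, lat) axis order on both sides
    transformer = Transformer.from_crs(src_crs, 'EPSG:4326', always_xy=True)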
    converted_boxes = []
    for bbox in bounding_boxes:
        x_min, y_min, x_max, y_max = map(float, bbox.split(','))
        lon_min, lat_min = transformer.transform(x_min, y_min)
        lon_max, lat_max = transformer.transform(x_max, y_max)
        converted_boxes.append([lon_min, lat_min, lon_max, lat_max])
    return converted_boxes

def plot_bounding_boxes_on_map(bounding_boxes, src_crs='EPSG:31468'):
    """Return a folium map with each bounding box drawn as a blue rectangle."""
    # Convert bounding box coordinates to WGS 84
    bounding_boxes = convert_to_wgs84(bounding_boxes, src_crs)
    
    # Center the map on the average lower-left (minimum) corner of the bounding boxes
    avg_lat = sum(bbox[1] for bbox in bounding_boxes) / len(bounding_boxes)
    avg_lon = sum(bbox[0] for bbox in bounding_boxes) / len(bounding_boxes)
    m = folium.Map(location=[avg_lat, avg_lon], zoom_start=12)
    
    # Add bounding boxes to the map
    for bbox in bounding_boxes:
        bounds = [[bbox[1], bbox[0]], [bbox[3], bbox[2]]]
        folium.Rectangle(bounds=bounds, color='blue', fill=True, fill_opacity=0.2).add_to(m)
    
    # Return the map; a notebook displays it when it is the last expression in a cell
    return m

In [29]:
map_with_bounding_boxes = plot_bounding_boxes_on_map(bounding_boxes)

In [30]:
map_with_bounding_boxes

## Step 2: Run the scraper

- Adjust `batch_size` to set the number of items run in each batch.
- Adjust `batch_delay` to lengthen or shorten the pause between batches. A pause reduces the load on the server and helps avoid IP blocking or other restrictions triggered by rapid sequential requests.
- Adjust `max_retries` to set how many times each download is attempted if it fails.
- Adjust `output_folder` to change the name of the output folder.
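
To make the roles of these parameters concrete, here is a minimal sketch of a batch loop with a pause between batches and per-item retries. It is not the implementation of `scrape_in_batches`: the function `run_in_batches`, the `fetch` callback, and the injectable `sleep` argument are hypothetical names introduced for illustration, and the real scraper may handle failures and output differently.

```python
import time


def run_in_batches(items, fetch, batch_size=100, batch_delay=30,
                   max_retries=5, sleep=time.sleep):
    """Call fetch(item) for every item in batches, retrying failed items.

    Hypothetical illustration only. Returns (results, failed_items).
    """
    results, failed = [], []
    for start in range(0, len(items), batch_size):
        for item in items[start:start + batch_size]:
            for attempt in range(1, max_retries + 1):
                try:
                    results.append(fetch(item))
                    break  # success, move on to the next item
                except Exception:
                    if attempt == max_retries:
                        failed.append(item)  # give up on this item
        # Pause between batches, but not after the last one
        if start + batch_size < len(items):
            sleep(batch_delay)
    return results, failed


# Usage with a fake fetch function; pauses are recorded instead of slept
calls = {}

def flaky_fetch(item):
    calls[item] = calls.get(item, 0) + 1
    if item == "b" and calls[item] < 3:
        raise ConnectionError("temporary failure")
    if item == "d":
        raise ConnectionError("permanent failure")
    return item.upper()

pauses = []
results, failed = run_in_batches(["a", "b", "c", "d"], flaky_fetch,
                                 batch_size=2, batch_delay=30,
                                 max_retries=3, sleep=pauses.append)
print(results, failed, pauses)
```

Passing `sleep` as an argument is only a convenience for trying the loop without waiting; with the default `time.sleep`, the loop actually pauses `batch_delay` seconds between batches.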

In [4]:
building_plans.scrape_in_batches(bounding_boxes,
                      batch_size = 100,
                      batch_delay = 30, 
                      max_retries = 5,
                      output_folder = 'data/raw/geoservices_results_sample')

100%|██████████| 100/100 [19:24<00:00, 11.65s/it]
100%|██████████| 100/100 [19:39<00:00, 11.79s/it]
100%|██████████| 100/100 [19:24<00:00, 11.65s/it]
