# Crawling Food Product Data from German Retailer Budni

**Goal**: extract details of food products through Budni's API.

Relevant API endpoints:

- `POST https://www.budni.de/api/content/articles/v2/search` - list/search products.
- `GET  https://www.budni.de/api/content/articles/v3/{articleId}?branchId={branchId}` - get product details.
- `GET  https://www.budni.de/api/infra/branches/{branchId}` - get Budni store details.

Important: the API requests must include the ID of a specific Budni store (`branchId`), e.g. `412101` for Bergstr. 16, 20095 Hamburg (Europa Passage).
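
The endpoint URLs listed above can be assembled from a branch ID and an article ID. The following is a minimal sketch using only the standard library; the helper names `branch_url` and `article_url` are hypothetical and are not part of Budni's API.

```python
from urllib.parse import urlencode

API_BASE = "https://www.budni.de/api"
SEARCH_URL = f"{API_BASE}/content/articles/v2/search"


def branch_url(branch_id: int) -> str:
    """URL of the store-details endpoint for one branch (hypothetical helper)."""
    return f"{API_BASE}/infra/branches/{branch_id}"


def article_url(article_id: int, branch_id: int) -> str:
    """URL of the product-details endpoint; branchId goes into the query string."""
    query = urlencode({"branchId": branch_id})
    return f"{API_BASE}/content/articles/v3/{article_id}?{query}"


print(branch_url(412101))
print(article_url(2854279009, 412101))
```

Keeping the branch ID as an explicit argument makes it harder to forget, since the requests depend on it.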

In [1]:
import requests
import json
import time
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm

## Search API — Get List of Food Products

Goal: get a list of all food products offered by Budni.

The most suitable API endpoint is `https://www.budni.de/api/content/articles/v2/search`. It expects a `POST` request with the search parameters in the JSON payload and returns a paginated result set. The request used here:

- asks for 200 results per page (the maximum allowed by the API)
- filters on the category 'Genuss' so that only food products are returned.

In [2]:
search_url = "https://www.budni.de/api/content/articles/v2/search"

# responses are limited to 200 returned items (hitsPerPage)
req_data = {
  "branchId": 412101,
  "query": "*",
  "page": 1,
  "hitsPerPage": 200,
  "filters": [
    {
      "name": "category",
      "values": [
        {
          "value": "Genuss",
          "type": "or",
          "exclude": False
        }
      ]
    }
  ],
  "sortItems": [],
  "identifier": "web"
}
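
Since the same payload is reused for every page, it may help to generate it from a function. This is a sketch only: `build_search_payload` is a hypothetical helper, and 'Genuss' is the only category value confirmed in this notebook, so other values passed as `category` are an assumption.

```python
import json

MAX_HITS_PER_PAGE = 200  # upper limit stated in this notebook


def build_search_payload(branch_id, category="Genuss", page=1,
                         hits_per_page=MAX_HITS_PER_PAGE):
    """Build the JSON body for the search endpoint (hypothetical helper)."""
    return {
        "branchId": branch_id,
        "query": "*",
        "page": page,
        # never ask for more than the documented maximum
        "hitsPerPage": min(hits_per_page, MAX_HITS_PER_PAGE),
        "filters": [
            {
                "name": "category",
                "values": [{"value": category, "type": "or", "exclude": False}],
            }
        ],
        "sortItems": [],
        "identifier": "web",
    }


payload = build_search_payload(412101, page=3)
print(json.dumps(payload, indent=2))
```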

In [3]:
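def send_search_request(url, payload):
    """POST a search payload and return the parsed JSON, or None on failure."""
    try:
        response = requests.post(url, json=payload, timeout=30)
        response.raise_for_status()
    # HTTPError is a subclass of RequestException, so it must be handled first
    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
        return None
    except requests.exceptions.RequestException as req_err:
        print(f"Request failed: {req_err}")
        return None

    try:
        return response.json()
    except ValueError:
        # response.json() raises a ValueError subclass on invalid JSON
        print("Cannot parse json response")
        return None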

def show_result_stats(result: dict):
    print(f"result sections: {', '.join(result.keys())}")
    print(f"total results: {result['totalHits']}")
    print(f"results items: {len(result['hits'])}")
    print(f"result page: {result['paging']['currentPage']}")
    print(f"total pages: {result['paging']['pageCount']}")

In [4]:
result = send_search_request(search_url, req_data)
show_result_stats(result)

result sections: searchParams, totalHits, paging, hits, facets, sortItems
total results: 14787
results items: 200
result page: 1
total pages: 74
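
The reported page count can be checked against the total number of hits and the page size. A short worked example, assuming the API fills every page except possibly the last:

```python
import math

total_hits = 14787    # totalHits from the response above
hits_per_page = 200   # hitsPerPage in the request

# number of pages needed to hold all hits
page_count = math.ceil(total_hits / hits_per_page)

# items expected on the final, partially filled page
last_page_items = total_hits - (page_count - 1) * hits_per_page

print(page_count, last_page_items)  # 74 187
```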


In [5]:
page_results = [ result ]

# fetch other pages
last_page = result['paging']['pageCount']
for page in tqdm(range(2, last_page + 1)):
    next_page_req = {**req_data, "page": page}
    page_results.append(send_search_request(search_url, next_page_req))
    time.sleep(0.2)


In [6]:
# combine the hits of all pages; each row is the item id plus its 'masterValues'
data = [{'id': hit['id']} | hit['masterValues'] for res in page_results for hit in res['hits']]
len(data)

14787
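
`send_search_request` returns nothing when a request fails, so a single failed page would break the list comprehension above. The sketch below shows a more defensive way to flatten the pages. `flatten_hits` is a hypothetical helper, and the sample pages are made up to mirror the shape of the search response.

```python
def flatten_hits(page_results):
    """Merge each hit's id with its masterValues, skipping pages that failed."""
    rows = []
    for page in page_results:
        if not page:  # a failed request yields None instead of a dict
            continue
        for hit in page.get("hits", []):
            rows.append({"id": hit["id"], **hit["masterValues"]})
    return rows


# Illustrative pages only; real responses carry many more fields
pages = [
    {"hits": [{"id": 1, "masterValues": {"name": "A", "price": 679}}]},
    None,  # simulates a failed request
    {"hits": [{"id": 2, "masterValues": {"name": "B", "price": 899}}]},
]

rows = flatten_hits(pages)
print(rows)
```

Comparing `len(rows)` with `totalHits` afterwards is a cheap way to notice that a page went missing.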

### Show one result item

In [7]:
print(json.dumps(data[0], indent=4, ensure_ascii=False))

{
    "id": 2854279009,
    "ean": 4039057411558,
    "brand": "BIO GOURMET",
    "name": "Biog.Kürbisk.Öl kaltg.100ml",
    "isDiscountable": true,
    "hasBabyClubBonus": false,
    "category": {
        "level1": "Genuss",
        "level2": "Kochhilfen & Gewürze"
    },
    "labels": [],
    "description": "Biogourmet Kürbiskernöl geröstet und kaltgepresst aus Österreich 100ml",
    "tradeDescription": "Speiseöle",
    "searchAttributes": [
        "Kochhilfen & Gewürze",
        "Öle & Essig",
        "Öle",
        "Speiseöle",
        "BIO GOURMET"
    ],
    "price": 679,
    "images": [
        "https://budni-static.live.cellular.de/images/edeka/articles/DV019_4039057411558_VOR.png"
    ],
    "displayName": "Kürbiskernöl",
    "base": {
        "price": 6790,
        "contents": "1",
        "baseUnit": "L"
    },
    "contents": "0.1",
    "bonusPoints": {
        "bonusPoints": 679
    },
    "sustainable": {
        "attributes": [
            "SPECIES_DIVERSITY",
         

### Convert JSON result into Data Frame

In [8]:
products_df = pd.DataFrame.from_records(data)
print(f"Retrieved {products_df.shape[0]} results.")

Retrieved 14787 results.


In [9]:
products_df.drop(columns=['isDiscountable', 'hasBabyClubBonus', 'labels', 'bonusPoints']).head()

| | id | ean | brand | name | category | description | tradeDescription | searchAttributes | price | images | ... | sustainable | isBiocid | qualityBrand | priceGroup | depositPrice | priceOffer | referencePrice | herstellerUvp | discount | strikePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2854279009 | 4039057411558 | BIO GOURMET | Biog.Kürbisk.Öl kaltg.100ml | {'level1': 'Genuss', 'level2': 'Kochhilfen & G... | Biogourmet Kürbiskernöl geröstet und kaltgepre... | Speiseöle | [Kochhilfen & Gewürze, Öle & Essig, Öle, Speis... | 679 | [https://budni-static.live.cellular.de/images/... | ... | {'attributes': ['SPECIES_DIVERSITY', 'CLIMATE'... | False | | | | | | | | |
| 1 | 2969148007 | 4104420030220 | ALNATURA | Bio Alna.Italie.Olivenöl 0,5l | {'level1': 'Genuss', 'level2': 'Kochhilfen & G... | Bio Alnatura Italienisches Olivenöl (Olio Extr... | Öl | [Kochhilfen & Gewürze, Öle & Essig, Öle, Öl, V... | 899 | [https://budni-static.live.cellular.de/images/... | ... | {'attributes': ['SPECIES_DIVERSITY', 'CLIMATE'... | False | | | | | | | | |
| 2 | 2967808006 | 4104420028708 | ALNATURA | Bio Alna.Ghee 180g | {'level1': 'Genuss', 'level2': 'Kochhilfen & G... | Bio Alnatura Ghee 180g | Fette | [Kochhilfen & Gewürze, Öle & Essig, Öle, Fette... | 499 | [https://budni-static.live.cellular.de/images/... | ... | {'attributes': ['SPECIES_DIVERSITY', 'CLIMATE'... | False | | | | | | | | |
| 3 | 2969405007 | 4104420030923 | ALNATURA | Bio Alna.Aceto Balsamico 0,5l | {'level1': 'Genuss', 'level2': 'Kochhilfen & G... | Bio Alnatura (Aceto Balsamico di Modena) 0,5l | Essig | [Kochhilfen & Gewürze, Öle & Essig, Balsamico+... | 299 | [https://budni-static.live.cellular.de/images/... | ... | {'attributes': ['SPECIES_DIVERSITY', 'CLIMATE'... | False | | | | | | | | |
| 4 | 2969150000 | 4104420031050 | ALNATURA | Bio Alna.Olivenöl 0,5l | {'level1': 'Genuss', 'level2': 'Kochhilfen & G... | Bio Alnatura Olivenöl 0,5l | Öl | [Kochhilfen & Gewürze, Öle & Essig, Öle, Öl, V... | 399 | [https://budni-static.live.cellular.de/images/... | ... | {'attributes': ['SPECIES_DIVERSITY', 'CLIMATE'... | False | | | | | | | | |


### Save Result to CSV

In [10]:
from pathlib import Path

Path("data").mkdir(exist_ok=True)

products_df.to_csv("data/budni_food_data.csv", index=False)
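
One thing to keep in mind when saving to CSV: nested columns such as `category` or `searchAttributes` are written as their string representation, so they come back as plain strings when the file is read again. A minimal sketch of how they could be restored, assuming the cells contain Python literals as in the table output above; `parse_nested` is a hypothetical helper:

```python
import ast


def parse_nested(cell: str):
    """Turn a stringified dict or list from the CSV back into a Python object."""
    if not cell:
        return None  # empty CSV cell, i.e. a missing value
    return ast.literal_eval(cell)


category = parse_nested("{'level1': 'Genuss', 'level2': 'Kochhilfen & Gewürze'}")
print(category["level2"])
```

The function can be passed to `pd.read_csv` through its `converters` argument, for example `converters={"category": parse_nested}`, so that the column is parsed while loading.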

## Fetch Article Data

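The article endpoint adds little beyond what the search response already returns.

The most notable differences are:

- more category levels (`sortCategories` has four levels for the sample product, compared with two in the search result)
- Edeka data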
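
The deeper category hierarchy can be turned into a readable path. The sketch below assumes the `sortCategories` shape of the article response for the sample product (keys `level1` to `level4`, each with `id` and `name`); `category_path` is a hypothetical helper.

```python
def category_path(sort_categories: dict) -> list:
    """Return the category names ordered by level (level1, level2, ...)."""
    levels = sorted(sort_categories, key=lambda key: int(key[len("level"):]))
    return [sort_categories[level]["name"] for level in levels]


# sortCategories copied from the article response of the sample product
sort_categories = {
    "level1": {"id": "04", "name": "Genuss"},
    "level2": {"id": "0413", "name": "Kochhilfen & Gewürze"},
    "level3": {"id": "041301", "name": "Öle & Essig"},
    "level4": {"id": "04130101", "name": "Öle"},
}

path = " > ".join(category_path(sort_categories))
print(path)  # Genuss > Kochhilfen & Gewürze > Öle & Essig > Öle
```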

In [11]:
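article_url_base = "https://www.budni.de/api/content/articles/v3/"
article_id = products_df.id[0]  # first product from the search result

try:
    res = requests.get(f"{article_url_base}{article_id}", params={"branchId": 412101}, timeout=30)
    res.raise_for_status()  # report HTTP error codes instead of parsing an error body
except requests.exceptions.RequestException as e:
    print(e)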

In [12]:
print(json.dumps(res.json(), indent=4, ensure_ascii=False))

{
    "isDiscountable": true,
    "hasBabyClubBonus": false,
    "price": 679,
    "base": {
        "baseUnit": "L",
        "price": 6790,
        "contents": 1
    },
    "bonusPoints": 340,
    "name": "Biog.Kürbisk.Öl kaltg.100ml",
    "displayName": "Kürbiskernöl",
    "tradeDescription": "Speiseöle",
    "sortCategories": {
        "level1": {
            "id": "04",
            "name": "Genuss"
        },
        "level2": {
            "id": "0413",
            "name": "Kochhilfen & Gewürze"
        },
        "level3": {
            "id": "041301",
            "name": "Öle & Essig"
        },
        "level4": {
            "id": "04130101",
            "name": "Öle"
        }
    },
    "suggestions": {
        "suggest": {
            "input": [
                "Kochhilfen & Gewürze",
                "Öle & Essig",
                "Öle",
                "Speiseöle"
            ]
        },
        "brandSuggest": {
            "input": [
                "BIO GOURMET"
      