Bug Report: Reader API Fails to Process Content in Specific Section of Website #1196

@tim-dim

Description

Issue: When using the Jina Reader API (r.jina.ai) to process the URL https://www.foerderdatenbank.de/FDB/Content/DE/Foerderprogramm/Land/Schleswig-Holstein/soziale-wohnraumfoerderung-eigentumsmassnahmen.html, the content within the element identified by the CSS selector body > main > div.jumbotron > div:nth-child(2) > div is not included in the output. Only the lower part of the page's content is returned by the API.

Steps to Reproduce:

  1. Execute the following curl command:

    curl "https://r.jina.ai/https://www.foerderdatenbank.de/FDB/Content/DE/Foerderprogramm/Land/Schleswig-Holstein/soziale-wohnraumfoerderung-eigentumsmassnahmen.html" \
      -H "Authorization: Bearer YOUR_JINA_API_TOKEN" \
      -H "X-Wait-For-Selector: body, .class, #id"

    (Note: Replace YOUR_JINA_API_TOKEN with your actual API token. Keep the X-Wait-For-Selector value exactly as shown, although this header may not be related to the missing content.)

Expected Behavior:

The Jina Reader API should process and return the content from the entire main body of the specified webpage, including the section identified by the selector body > main > div.jumbotron > div:nth-child(2) > div.

Actual Behavior:

The content from the element with the CSS selector body > main > div.jumbotron > div:nth-child(2) > div is omitted from the API's output. Only the lower portion of the website's content is returned.
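
The omission described above can be checked mechanically rather than by eye. The following is a minimal sketch, not part of the original report: the helper names `fetch_reader` and `contains_phrase`, the `JINA_API_TOKEN` environment variable, and the `PHRASE` placeholder are all assumptions made for illustration. It fetches the page through the Reader endpoint used in the reproduction step and tests whether a phrase that is visible in the missing section appears in the output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fetch a URL through the Reader endpoint (r.jina.ai).
# Assumes the API token is exported as JINA_API_TOKEN (name chosen for this sketch).
fetch_reader() {
  local url="$1"
  curl -sS "https://r.jina.ai/${url}" \
    -H "Authorization: Bearer ${JINA_API_TOKEN:?set JINA_API_TOKEN first}"
}

# Hypothetical helper: succeed if the text on stdin contains the given phrase.
# -F treats the phrase as a fixed string, -q suppresses output.
contains_phrase() {
  local phrase="$1"
  grep -Fq -- "$phrase"
}

# Example usage (commented out so that sourcing this file makes no network call):
#
#   URL="https://www.foerderdatenbank.de/FDB/Content/DE/Foerderprogramm/Land/Schleswig-Holstein/soziale-wohnraumfoerderung-eigentumsmassnahmen.html"
#   PHRASE="text copied from the missing section"   # placeholder, replace before running
#   if fetch_reader "$URL" | contains_phrase "$PHRASE"; then
#     echo "section present in Reader output"
#   else
#     echo "section missing from Reader output"
#   fi
```

Copying the phrase directly from the rendered page keeps the check independent of how the Reader formats its output.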
