
When a page contains iframe tags, the extracted content is the content of the iframes. Is there a way to extract the expected content that is outside the iframe tags? #59

Open
fu1996 opened this issue May 15, 2024 · 1 comment

fu1996 commented May 15, 2024

url: https://r.jina.ai/https://new.qq.com/rain/a/20230723A067YG00

@nomagick
Member

This isn't an iframe problem. It's a return-timing problem:
our default return timing didn't work on this page.

To crawl this kind of page properly, you need to know its structure.
For this particular case, use our new x-target-selector header to target the part of the page you want:

curl https://r.jina.ai/https://new.qq.com/rain/a/20230723A067YG00 -H 'x-target-selector: .content-article'
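
For anyone calling this from a script instead of curl, here is a minimal Python sketch of the same request. It relies only on what the command above shows: the `https://r.jina.ai/` prefix, the `x-target-selector` header, and the `.content-article` selector. The function names `build_reader_request` and `fetch_article` and the `timeout` default are made up for illustration and are not part of any official client.

```python
import urllib.request

# Prefix taken from the curl command above.
READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(page_url: str, target_selector: str) -> urllib.request.Request:
    """Build (but do not send) a request for page_url limited to a CSS selector.

    Hypothetical helper: it only mirrors the curl command above.
    """
    request = urllib.request.Request(READER_PREFIX + page_url)
    # Same header as in the curl command; HTTP header names are case-insensitive.
    request.add_header("x-target-selector", target_selector)
    return request


def fetch_article(page_url: str, target_selector: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text.

    The timeout value is an arbitrary assumption, not a recommended setting.
    """
    request = build_reader_request(page_url, target_selector)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (performs a real network request, so it is left commented out):
# print(fetch_article("https://new.qq.com/rain/a/20230723A067YG00", ".content-article"))
```

The selector depends on the page, so `.content-article` will likely need to be replaced after inspecting the markup of whatever site you are crawling.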
