Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

community[patch]: feat: add headers param to cheerio loader #5806

Conversation

versole
Copy link
Contributor

@versole versole commented Jun 19, 2024

Feature request: Add a 'fetch headers' parameter to the Cheerio loader to accommodate different results based on the user-agent used.

@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jun 19, 2024
Copy link

vercel bot commented Jun 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchainjs-api-refs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jun 20, 2024 10:17pm
langchainjs-docs ✅ Ready (Inspect) Visit Preview Jun 20, 2024 10:17pm

@dosubot dosubot bot added the auto:improvement Medium size change to existing code to handle new use-cases label Jun 19, 2024
@@ -92,11 +99,13 @@ export class CheerioWebBaseLoader
caller: AsyncCaller,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there! I noticed that this code change introduces a new HTTP request using the fetch function with the addition of the headers parameter. I've flagged this for your review to ensure it aligns with the project's requirements. Let me know if you have any questions!

@versole versole changed the title feat: add_headers_param_2_cherrio_loader feat: add headers param to cheerio loader Jun 19, 2024
Copy link
Collaborator

@bracesproul bracesproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good, left a couple comments. Thanks for adding this!

Comment on lines 126 to 127
this.textDecoder,
this.headers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is passing headers via the options field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad

Comment on lines 102 to 103
options?: CheerioOptions,
headers?: Headers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we combine these? Something like this:

static async _scrape(
  ...other args...
  options?: CheerioOptions & { headers?: Headers },
) {
  const { headers, ...cheerioOptions } = options ?? {};
  ...existing code
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@jacoblee93 jacoblee93 changed the title feat: add headers param to cheerio loader community[patch]" feat: add headers param to cheerio loader Jun 20, 2024
@jacoblee93 jacoblee93 changed the title community[patch]" feat: add headers param to cheerio loader community[patch]: feat: add headers param to cheerio loader Jun 20, 2024
@jacoblee93 jacoblee93 added the lgtm PRs that are ready to be merged as-is label Jun 20, 2024
@jacoblee93
Copy link
Collaborator

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto:improvement Medium size change to existing code to handle new use-cases lgtm PRs that are ready to be merged as-is size:S This PR changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants