langchain[minor]: Added Search Functionality to FireCrawl Document Loader #5418

Open. Wants to merge 6 commits into main.

rafaelsideguide

This PR adds search functionality to the FireCrawl document loader, enabling users to search and retrieve specific data from the web.

This enhancement builds on the original FireCrawl loader introduced in PR #5180.

Thank you!

dosubot (bot) added the size:L label (This PR changes 100-499 lines, ignoring generated files.) on May 16, 2024

vercel (bot) commented on May 16, 2024:

The latest updates on your projects.

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| langchainjs-api-refs | ❌ Failed | May 28, 2024 1:53pm |
| langchainjs-docs | ❌ Failed | May 28, 2024 1:53pm |

dosubot (bot) added the auto:enhancement label (A large net-new component, integration, or chain. Use sparingly. The largest features) on May 16, 2024
rafaelsideguide changed the title from "Added Search Functionality to FireCrawl Loader" to "langchain[minor]: Added Search Functionality to FireCrawl Document Loader" on May 16, 2024
@@ -603,7 +603,7 @@
"@jest/globals": "^29.5.0",

Hey there! 👋 I noticed a change in the "@mendable/firecrawl-js" dependency version from "^0.0.13" to "^0.0.21" in the package.json file. This seems to be a modification in the hard dependency, and I'm flagging this for your review. Keep up the great work! 🚀
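For context on this flag: under npm's caret rules, a range on a `0.0.x` version is pinned to that exact patch (`^0.0.21` means `>=0.0.21 <0.0.22`), so the previous `^0.0.13` range could never have resolved to `0.0.21`. The sketch below is illustrative only and not part of the PR; `satisfiesCaret` is a hypothetical helper that handles plain `major.minor.patch` strings (no prerelease tags), and the `semver` package remains the real tool for this.

```typescript
// Hypothetical helper: a minimal sketch of npm's caret (^) rule for plain
// "major.minor.patch" versions. Prerelease tags and build metadata are not
// supported; use the `semver` package for real dependency checks.
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Unsupported version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// ^X.Y.Z allows changes that keep the left-most non-zero component fixed.
function caretUpperBound(v: Version): Version {
  const [major, minor, patch] = v;
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) throw new Error(`Not a caret range: ${range}`);
  const lower = parseVersion(range.slice(1));
  const candidate = parseVersion(version);
  return (
    compareVersions(candidate, lower) >= 0 &&
    compareVersions(candidate, caretUpperBound(lower)) < 0
  );
}
```

For example, `satisfiesCaret("0.0.21", "^0.0.13")` is `false`, while `satisfiesCaret("1.9.0", "^1.2.3")` is `true`.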

@@ -32,3 +32,17 @@ test("Test FireCrawlLoader load method with crawl mode", async () => {
expect(document.pageContent).toBeTruthy();

Hey there! I noticed that the recent change in the test file explicitly accesses an environment variable using process.env. I've flagged this for your review to ensure it aligns with our best practices for handling environment variables. Let me know if you have any questions!

Hey team, I've flagged this PR for review because it introduces a new test case that makes an external HTTP request (using fetch or axios) to retrieve documents based on the search query. Please take a look and ensure it aligns with our code standards. Thanks!

Hey team, I've flagged a change in the PR for the FireCrawlLoader load method that accesses an environment variable via process.env. Please review this change to ensure proper handling of environment variables.

@@ -82,6 +108,9 @@ export class FireCrawlLoader extends BaseDocumentLoader {
let firecrawlDocs: FirecrawlDocument[];

if (this.mode === "scrape") {
if (!this.url) {
throw new Error("Firecrawl URL not provided.");
}
const response = await app.scrapeUrl(this.url, this.params);
if (!response.success) {
throw new Error(

Hey there! I noticed that this PR introduces a new HTTP request for the "search" mode using app.search. I've flagged this change for your review to ensure it aligns with the project's architecture and requirements. Let me know if you have any questions or need further clarification!
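As a rough illustration of the branch this comment refers to, here is a minimal sketch of a loader dispatching on the three modes (`scrape`, `crawl`, and the proposed `search`). `FirecrawlLikeClient`, `loadPages`, and the search-branch error message are hypothetical; they do not reproduce the real `@mendable/firecrawl-js` API or the PR's code, and the reviewer later asked for search to move into a retriever instead.

```typescript
// Hypothetical sketch: FirecrawlLikeClient is an illustrative interface,
// NOT the real @mendable/firecrawl-js API.
type FireCrawlMode = "scrape" | "crawl" | "search";

interface PageResult {
  markdown: string;
  sourceURL: string;
}

interface FirecrawlLikeClient {
  scrape(url: string): Promise<PageResult>;
  crawl(url: string): Promise<PageResult[]>;
  search(query: string): Promise<PageResult[]>;
}

interface LoadOptions {
  mode: FireCrawlMode;
  url?: string; // used by "scrape" and "crawl"
  query?: string; // used by "search"
}

async function loadPages(
  client: FirecrawlLikeClient,
  opts: LoadOptions,
): Promise<PageResult[]> {
  switch (opts.mode) {
    case "scrape":
      // Scrape only the single page at the given URL.
      if (!opts.url) throw new Error("Firecrawl URL not provided.");
      return [await client.scrape(opts.url)];
    case "crawl":
      // Crawl the whole site starting from the given URL.
      if (!opts.url) throw new Error("Firecrawl URL not provided.");
      return client.crawl(opts.url);
    case "search":
      // Search the web for the query; no URL is involved.
      if (!opts.query) throw new Error("Firecrawl search query not provided.");
      return client.search(opts.query);
    default: {
      // Compile-time exhaustiveness check over FireCrawlMode.
      const unreachable: never = opts.mode;
      throw new Error(`Unknown mode: ${String(unreachable)}`);
    }
  }
}
```

The `never` default makes the compiler reject the function if a new member is added to `FireCrawlMode` without a matching branch, and passing the client in as an interface lets the dispatch be tested with a stub instead of live HTTP calls.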

bracesproul (Collaborator) left a comment:

Thanks for pushing this up! I left a couple of comments and have one bigger request:
This search functionality should be a retriever, and not a document loader. Would you be willing to instead refactor this PR to not add any new functionality to the document loader, but instead create a new firecrawl retriever which has the ability to search?

You can use this retriever for reference
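The distinction the reviewer draws is that a retriever maps a query string to relevant documents, whereas the loader is configured with a URL. A minimal, dependency-free sketch of that retriever shape follows; `WebSearchClient`, `SearchHit`, and `FireCrawlSearchRetrieverSketch` are hypothetical names, and the client interface is an assumption rather than the firecrawl-js API. In LangChain.js itself, a retriever would extend `BaseRetriever` from `@langchain/core/retrievers` and implement `_getRelevantDocuments`.

```typescript
// Hypothetical sketch: WebSearchClient is an illustrative interface, not the
// @mendable/firecrawl-js API. A real LangChain.js retriever would extend
// BaseRetriever from "@langchain/core/retrievers" instead.
interface SearchHit {
  markdown: string;
  sourceURL: string;
}

interface WebSearchClient {
  search(query: string): Promise<SearchHit[]>;
}

// Mirrors the { pageContent, metadata } shape of a LangChain Document.
interface RetrievedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

class FireCrawlSearchRetrieverSketch {
  private readonly client: WebSearchClient;
  private readonly maxResults: number;

  constructor(client: WebSearchClient, maxResults = 5) {
    this.client = client;
    this.maxResults = maxResults;
  }

  // Takes only a query (no URL) and returns at most maxResults documents,
  // recording each hit's source URL in the metadata.
  async getRelevantDocuments(query: string): Promise<RetrievedDocument[]> {
    const hits = await this.client.search(query);
    return hits.slice(0, this.maxResults).map((hit) => ({
      pageContent: hit.markdown,
      metadata: { source: hit.sourceURL },
    }));
  }
}
```

Keeping the search client behind an interface means the retriever can be unit-tested with a stub, without the live HTTP calls the bot flagged in the test file.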

@@ -22,7 +22,7 @@ Sign up and get your free [FireCrawl API key](https://firecrawl.dev) to start. F

Here's an example of how to use the `FireCrawlLoader` to load web search results:

Firecrawl offers 2 modes: `scrape` and `crawl`. In `scrape` mode, Firecrawl will only scrape the page you provide. In `crawl` mode, Firecrawl will crawl the entire website.
Firecrawl offers 3 modes: `scrape`, `crawl` and `search`. In `scrape` mode, Firecrawl will only scrape the page you provide. In `crawl` mode, Firecrawl will crawl the entire website. In `search` mode, Firecrawl will search the web for the query you provide.
A collaborator commented:

I think this would work better as bullet points.

A collaborator commented:

The langchain entrypoint is deprecated, so we shouldn't add more features to it. Only add to the community integration.

- Got back changes made on previous commits on firecrawl document loader
rafaelsideguide (Author):

> Thanks for pushing this up! I left a couple comments, and have one bigger request: This search functionality should be a retriever, and not a document loader. Would you be willing to instead refactor this PR to not add any new functionality to the document loader, but instead create a new firecrawl retriever which has the ability to search?
>
> You can use this retriever for reference

@bracesproul I just changed the search implementation following your suggestions!

rafaelsideguide (Author):

Hey @bracesproul, I've made the changes you asked for in the FireCrawl retriever. Could you take a look when you get a chance? I want to make sure it's all good.

Labels: auto:enhancement, size:L
Projects: None yet
Linked issues: None yet
Participants: 2