langchain[minor]: Added Search Functionality to FireCrawl Document Loader #5418
base: main
langchain/package.json
Outdated
@@ -603,7 +603,7 @@
"@jest/globals": "^29.5.0",
Hey there! 👋 I noticed a change in the "@mendable/firecrawl-js" dependency version from "^0.0.13" to "^0.0.21" in the package.json file. This seems to be a modification in the hard dependency, and I'm flagging this for your review. Keep up the great work! 🚀
@@ -32,3 +32,17 @@ test("Test FireCrawlLoader load method with crawl mode", async () => {
expect(document.pageContent).toBeTruthy();
Hey there! I noticed that the recent change in the test file explicitly accesses an environment variable using `process.env`. I've flagged this for your review to ensure it aligns with our best practices for handling environment variables. Let me know if you have any questions!
Hey team, I've flagged this PR for review as it introduces a new test case that triggers an external HTTP request using fetch or axios to fetch documents based on the search query. Please take a look and ensure it aligns with our code standards. Thanks!
Hey team, I've flagged a change in the PR for the `FireCrawlLoader` load method that accesses an environment variable via `process.env`. Please review this change to ensure proper handling of environment variables.
@@ -82,6 +108,9 @@ export class FireCrawlLoader extends BaseDocumentLoader {
    let firecrawlDocs: FirecrawlDocument[];

    if (this.mode === "scrape") {
      if (!this.url) {
        throw new Error("Firecrawl URL not provided.");
      }
      const response = await app.scrapeUrl(this.url, this.params);
      if (!response.success) {
        throw new Error(
Hey there! I noticed that this PR introduces a new HTTP request for the "search" mode using `app.search`. I've flagged this change for your review to ensure it aligns with the project's architecture and requirements. Let me know if you have any questions or need further clarification!
Thanks for pushing this up! I left a couple comments, and have one bigger request:
This search functionality should be a retriever, and not a document loader. Would you be willing to instead refactor this PR to not add any new functionality to the document loader, but instead create a new firecrawl retriever which has the ability to search?
You can use this retriever for reference
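For illustration, here is a minimal sketch of the shape such a retriever might take. It is not the code from this PR: `SearchClient`, `SearchResponse`, `SearchHit`, `RetrievedDocument`, and `SearchRetrieverSketch` are hypothetical names, and the response fields are assumptions rather than the actual `@mendable/firecrawl-js` types. A real integration would presumably extend LangChain's `BaseRetriever` and wrap the Firecrawl client instead of the stand-in interface used here.

```typescript
// Hypothetical shapes for illustration only; these are NOT the real
// @mendable/firecrawl-js or LangChain types.
interface SearchHit {
  content: string;
  metadata?: Record<string, unknown>;
}

interface SearchResponse {
  success: boolean;
  data?: SearchHit[];
  error?: string;
}

// Stand-in for a search client. A real integration would wrap the
// Firecrawl SDK behind something like this (assumption).
interface SearchClient {
  search(query: string): Promise<SearchResponse>;
}

// Simplified stand-in for a LangChain document.
interface RetrievedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

class SearchRetrieverSketch {
  private readonly client: SearchClient;

  constructor(client: SearchClient) {
    this.client = client;
  }

  // Takes a query (not a URL) and returns documents, which is the
  // retriever contract, as opposed to a loader that loads a fixed source.
  async getRelevantDocuments(query: string): Promise<RetrievedDocument[]> {
    const response = await this.client.search(query);
    if (!response.success) {
      throw new Error(`Search failed: ${response.error ?? "unknown error"}`);
    }
    // Map each search hit onto the document shape, defaulting metadata to {}.
    return (response.data ?? []).map((hit) => ({
      pageContent: hit.content,
      metadata: hit.metadata ?? {},
    }));
  }
}
```

Keeping the client behind an interface also means tests can pass in a fake client instead of making a live HTTP request, which speaks to the concern flagged above about tests that hit the network.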
@@ -22,7 +22,7 @@ Sign up and get your free [FireCrawl API key](https://firecrawl.dev) to start. F

Here's an example of how to use the `FireCrawlLoader` to load web search results:

-Firecrawl offers 2 modes: `scrape` and `crawl`. In `scrape` mode, Firecrawl will only scrape the page you provide. In `crawl` mode, Firecrawl will crawl the entire website.
+Firecrawl offers 3 modes: `scrape`, `crawl` and `search`. In `scrape` mode, Firecrawl will only scrape the page you provide. In `crawl` mode, Firecrawl will crawl the entire website. In `search` mode, Firecrawl will search the web for the query you provide.
I think this would do better as bullet points
langchain/package.json
Outdated
The `langchain` entrypoint is deprecated, so we shouldn't be adding more features to it. Only add to the community integration.
- Got back changes made on previous commits on firecrawl document loader
@bracesproul I just changed the search implementation following your suggestions!
Hey @bracesproul I've made the changes you asked for in the firecrawl retriever. Could you take a look when you get a chance? I want to make sure it's all good.
This PR adds search functionality to the FireCrawl document loader, enabling users to search and retrieve specific data from the web.
This enhancement builds on the original FireCrawl loader introduced in PR #5180.
Thank you!