feat: Add Apify integration #998

Merged
merged 3 commits into from May 15, 2023

Conversation

jirimoravcik
Contributor

This is the JS version of langchain-ai/langchain#2201.

If you have any suggestions, feel free to comment on this PR.
Also, if any issues come up concerning the Apify integration in LangChain, feel free to contact me.


@@ -0,0 +1,93 @@
import { Document } from "../document.js";
Collaborator

Tools have a specific meaning in LangChain, so this is in the wrong place. Can we bundle it into the document loader and run callActor when loading documents?

Contributor Author

Thanks for the review.
In the Python version, it lives in the utilities: https://github.com/hwchase17/langchain/blob/master/langchain/utilities/apify.py
I'm not sure whether the JS version has an equivalent.

Contributor

Please note that an Actor can run for quite a long time: hours, or even days for large sites. The scenario where we run the Actor, wait for it to finish, and then feed the data to the vector index is meant to demonstrate how it works. For large-scale production use cases, running Actors will be separate from loading the vector index; often the loading will be invoked via a webhook once the Actor finishes. Hence, linking the callActor action with loading documents might not be ideal.
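
To illustrate the production flow described above, here is a minimal TypeScript sketch in which loading is triggered by a run-finished notification rather than by the code that started the Actor. All names (`Doc`, `RunFinishedEvent`, `onRunFinished`, `fetchItems`) and the `text`/`url` item fields are assumptions made for illustration; they are not the Apify webhook payload or the LangChain API.

```typescript
// Hypothetical shapes for illustration; not the Apify or LangChain API.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}
type DatasetItem = Record<string, unknown>;

// Simplified stand-in for the notification sent when an Actor run ends.
interface RunFinishedEvent {
  datasetId: string;
  succeeded: boolean;
}

// Maps one crawled item to a document. The `text` and `url` fields are assumed.
function toDoc(item: DatasetItem): Doc {
  return {
    pageContent: String(item.text ?? ""),
    metadata: { source: item.url },
  };
}

// Webhook-style handler: it runs only after the Actor has finished, so nothing
// here waits on the (possibly hours-long) crawl itself.
// `fetchItems` is synchronous to keep the sketch small; a real dataset read
// would be an asynchronous network call.
function onRunFinished(
  event: RunFinishedEvent,
  fetchItems: (datasetId: string) => DatasetItem[],
  index: Doc[]
): number {
  if (!event.succeeded) return 0; // skip failed runs
  const docs = fetchItems(event.datasetId).map(toDoc);
  index.push(...docs); // `index` stands in for a vector index
  return docs.length;
}
```

In this shape, the process that starts the Actor and the process that fills the index only share a dataset ID, which is the separation the comment argues for.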

Collaborator

In that case, I'd prefer to make this a callActor method on the document_loader class. In a broad sense, you're still preparing documents to be loaded and interacted with, right?

I can make the change today and polish everything up if that's ok with you?

Collaborator

The other reason is that it gives everything that requires Apify a single entry point.
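
One way the single-entry-point idea could look is sketched below: a loader class whose static `callActor` method runs an Actor and returns a loader for the resulting dataset, while the constructor still accepts a dataset from a run that finished earlier. The class name `ApifyLoaderSketch`, the `ActorClient` interface, and every method signature here are hypothetical; this is not the code merged in this PR or in #1271.

```typescript
// Hypothetical names throughout; this is not the merged LangChain.js code.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}
type DatasetItem = Record<string, unknown>;

// Minimal client surface the loader needs; a real client would talk to Apify.
interface ActorClient {
  callActor(
    actorId: string,
    input: Record<string, unknown>
  ): Promise<{ datasetId: string }>;
  listItems(datasetId: string): Promise<DatasetItem[]>;
}

class ApifyLoaderSketch {
  private readonly datasetId: string;
  private readonly mapItem: (item: DatasetItem) => Doc;
  private readonly client: ActorClient;

  // Use the constructor directly when the Actor run has already finished,
  // for example when loading is triggered later by a webhook.
  constructor(
    datasetId: string,
    mapItem: (item: DatasetItem) => Doc,
    client: ActorClient
  ) {
    this.datasetId = datasetId;
    this.mapItem = mapItem;
    this.client = client;
  }

  // Runs an Actor, waits for it to finish, and returns a loader bound to the
  // dataset that run produced.
  static async callActor(
    actorId: string,
    input: Record<string, unknown>,
    mapItem: (item: DatasetItem) => Doc,
    client: ActorClient
  ): Promise<ApifyLoaderSketch> {
    const run = await client.callActor(actorId, input);
    return new ApifyLoaderSketch(run.datasetId, mapItem, client);
  }

  // Reads the dataset items and maps each one to a document.
  async load(): Promise<Doc[]> {
    const items = await this.client.listItems(this.datasetId);
    return items.map((item) => this.mapItem(item));
  }
}
```

Keeping the constructor usable on its own means the long-running case raised earlier in the thread is still covered: callers who run Actors separately can skip `callActor` and pass a dataset ID.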

Contributor

Thank you, @jacoblee93. If you prefer that, feel free to update it that way.

jacoblee93 self-assigned this on May 12, 2023
# Conflicts:
#	docs/docs/modules/agents/tools/integrations/index.mdx
#	langchain/package.json
#	yarn.lock
jacoblee93 merged commit 32816c0 into langchain-ai:main on May 15, 2023
1 check passed
@jacoblee93
Collaborator

Extended and merged here: #1271

Thanks for your patience!
