feat: Add Apify integration #998

jirimoravcik · 2023-04-26T12:35:53Z

JS version of langchain-ai/langchain#2201

If you have any suggestions, feel free to comment on this PR.
Also, if there are any issues concerning Apify integrations in LangChain, feel free to contact me.

vercel · 2023-04-26T12:35:57Z

The latest updates on your projects.

Name	Status	Updated (UTC)
langchainjs-docs	✅ Ready	May 15, 2023 1:12pm

jacoblee93 · 2023-05-03T23:33:35Z

langchain/src/tools/apify.ts

@@ -0,0 +1,93 @@
+import { Document } from "../document.js";


Tools have a specific meaning in LangChain, so this is in the wrong place. Can we bundle it into the document loader and run callActor when loading documents?

jirimoravcik · 2023-05-04

Thanks for the review.
In the Python version, it's in the utilities: https://github.com/hwchase17/langchain/blob/master/langchain/utilities/apify.py
I'm not sure whether there's an equivalent in the JS version.

jancurn · 2023-05-11

Please note that an Actor can run for a long time: hours, or even days for large sites. The scenario where we run the Actor, wait for it to finish, and then feed the data to the vector index is meant to demonstrate how it works. In large-scale production use cases, running Actors will be separate from loading the vector index, and the loading will often be invoked via a webhook once the Actor finishes. Hence, linking the callActor action with loading documents might not be ideal.
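A rough sketch of the decoupled flow described in this comment, where loading is triggered by a webhook after the Actor has finished rather than by waiting on the run. Everything here is a hypothetical illustration: `RunFinishedPayload`, `DatasetReader`, `WebhookDoc` and `handleRunFinished` are made-up names and shapes, not the Apify or LangChain API.

```typescript
// Hypothetical shapes for illustration only; not the Apify or LangChain API.
interface WebhookDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Assumed webhook body: says whether the run succeeded and where it wrote.
interface RunFinishedPayload {
  status: "SUCCEEDED" | "FAILED";
  datasetId: string;
}

// Anything that can return the items of a dataset by id.
interface DatasetReader {
  listItems(datasetId: string): Promise<Record<string, unknown>[]>;
}

type ItemMapper = (item: Record<string, unknown>) => WebhookDoc;

// Called when the webhook fires, possibly hours after the Actor was started.
// It never starts or waits for a run; it only loads what the run produced.
async function handleRunFinished(
  payload: RunFinishedPayload,
  reader: DatasetReader,
  mapItem: ItemMapper
): Promise<WebhookDoc[]> {
  if (payload.status !== "SUCCEEDED") {
    return []; // nothing to index for a failed run
  }
  const items = await reader.listItems(payload.datasetId);
  return items.map((item) => mapItem(item));
}
```

The returned documents would then be handed to whatever vector index is in use; that step is left out because it does not depend on how the run was started.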

jacoblee93 · 2023-05-15 (edited)

In that case, I'd prefer to put this as a callActor method in the document_loader class. In a broad sense, you're still preparing documents to be loaded and interacted with, right?

I can make the change today and polish everything up, if that's OK with you?

jacoblee93 · 2023-05-15

The other reason is so that we can give everything that requires Apify one entrypoint.

jancurn · 2023-05-15

Thank you, @jacoblee93. If you prefer it that way, feel free to update it.
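For readers following the thread, the design agreed on here (a callActor method on the document loader, giving Apify one entrypoint) could look roughly like the sketch below. It is an assumption-laden illustration, not the code that was merged: `DatasetLoaderSketch`, `ActorClient`, `runActor` and `listItems` are hypothetical names, and the client is a stand-in interface rather than the real Apify client.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the merged API.
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

type DatasetItem = Record<string, unknown>;

// Minimal stand-in for whatever client talks to Apify.
interface ActorClient {
  // Starts an Actor, waits for it to finish, resolves with its dataset id.
  runActor(actorId: string, input: DatasetItem): Promise<string>;
  listItems(datasetId: string): Promise<DatasetItem[]>;
}

class DatasetLoaderSketch {
  private readonly client: ActorClient;
  private readonly datasetId: string;
  private readonly mapItem: (item: DatasetItem) => LoadedDocument;

  constructor(
    client: ActorClient,
    datasetId: string,
    mapItem: (item: DatasetItem) => LoadedDocument
  ) {
    this.client = client;
    this.datasetId = datasetId;
    this.mapItem = mapItem;
  }

  // Single entrypoint for the "run the Actor, then load" demo flow.
  static async callActor(
    client: ActorClient,
    actorId: string,
    input: DatasetItem,
    mapItem: (item: DatasetItem) => LoadedDocument
  ): Promise<DatasetLoaderSketch> {
    const datasetId = await client.runActor(actorId, input);
    return new DatasetLoaderSketch(client, datasetId, mapItem);
  }

  // Loading stays independent of how the dataset was produced.
  async load(): Promise<LoadedDocument[]> {
    const items = await this.client.listItems(this.datasetId);
    return items.map((item) => this.mapItem(item));
  }
}
```

Constructing the loader directly with a known dataset id would cover the long-running production case raised earlier in the thread, while the static method keeps the short demo flow in the same class.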

jacoblee93 · 2023-05-15T20:53:05Z

Extended and merged here: #1271

Thanks for your patience!

feat: Add Apify integration (#1)

7146bee

vercel bot deployed to Preview April 26, 2023 12:47

Better copy for Apify intergration (#2)

163e36d

vercel bot deployed to Preview May 3, 2023 18:31

jacoblee93 requested changes May 3, 2023

jirimoravcik requested a review from jacoblee93 May 6, 2023 12:13

jacoblee93 self-assigned this May 12, 2023

Merge remote-tracking branch 'upstream/main'

32816c0

# Conflicts: # docs/docs/modules/agents/tools/integrations/index.mdx # langchain/package.json # yarn.lock

vercel bot deployed to Preview May 15, 2023 13:12

jacoblee93 mentioned this pull request May 15, 2023

Add Apify integration, update docs #1271

Merged

jacoblee93 merged commit 32816c0 into langchain-ai:main May 15, 2023
1 check passed

jirimoravcik commented Apr 26, 2023

vercel bot commented Apr 26, 2023 •

edited

jacoblee93 May 3, 2023

jirimoravcik May 4, 2023

jancurn May 11, 2023

jacoblee93 May 15, 2023 •

edited

jacoblee93 May 15, 2023

jancurn May 15, 2023

jacoblee93 commented May 15, 2023
