enhancement: add strategy to unstructured options #1208

Merged

MthwRobinson
Contributor

Summary

Adds strategy as an option for the Unstructured API loader. Available strategies are "hi_res", "ocr_only", and "fast". The strategy option only impacts processing for PDFs and images. See the Unstructured documentation for more information on strategies.
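
As a rough illustration of how such an option might be threaded through to the request, here is a minimal sketch. It assumes the loader forwards the value as a form field named `strategy` alongside the uploaded file; `PartitionOptions`, `isStrategy`, and `buildFormFields` are hypothetical names for this example and are not taken from the PR.

```typescript
// Hypothetical sketch: these names are illustrative, not the loader's real code.
const STRATEGIES = ["hi_res", "ocr_only", "fast"] as const;
type Strategy = (typeof STRATEGIES)[number];

interface PartitionOptions {
  // Optional: when omitted, no strategy field is sent and the API default applies.
  strategy?: Strategy;
}

// Type guard for values that arrive from untyped sources (env vars, CLI flags).
function isStrategy(value: string): value is Strategy {
  return (STRATEGIES as readonly string[]).indexOf(value) !== -1;
}

// Build the extra form fields to send with the file upload.
// Assumption: the API reads the strategy from a form field named "strategy".
function buildFormFields(options: PartitionOptions = {}): Record<string, string> {
  const fields: Record<string, string> = {};
  if (options.strategy !== undefined) {
    fields.strategy = options.strategy;
  }
  return fields;
}
```

Leaving the field out when no strategy is given keeps existing callers unchanged, since they continue to get whatever default the API applies.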

jacoblee93 self-assigned this May 12, 2023
@jacoblee93
Collaborator

It's failing the integration test on my end:

[Screenshot, 2023-05-12 at 10:15:36 AM]

Can you try it on your machine with
yarn test:single langchain/src/document_loaders/tests/unstructured.int.test.ts?

@MthwRobinson
Contributor Author

@jacoblee93 - Thanks, have that passing now locally. Looks like there was one doc in there with a null pageContent.
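
For readers hitting the same failure, one way a loader could guard against elements with no text is sketched below. This is only an assumption about the shape of the problem, not the change made in this PR: `RawElement`, `Doc`, and `toDocuments` are hypothetical names, and the `text` and `type` fields are assumed for illustration.

```typescript
// Hypothetical element shape: the real API response may differ.
interface RawElement {
  text?: string | null;
  type?: string;
}

interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Convert raw elements to documents, skipping any element whose text is
// null or missing so that no document ends up with a null pageContent.
function toDocuments(elements: RawElement[]): Doc[] {
  const docs: Doc[] = [];
  for (const el of elements) {
    if (typeof el.text === "string") {
      docs.push({ pageContent: el.text, metadata: { category: el.type } });
    }
  }
  return docs;
}
```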

jacoblee93 merged commit 098e90c into langchain-ai:main May 13, 2023
10 checks passed