Error with UnstructuredLoader when attempting to load markdown files #738
Comments
It seems the file MIME type can't be detected from the buffer that is read with the fs.readFile API.
fix langchain-ai#738 Reference link
Yes, as mentioned in this PR unstructured-api, it should use the filename to determine the file type.
It's still not detecting the type if it's a PDF.
@alextkd You can use
It worked in the end. A typo in my PDF was preventing it from loading. Thanks for sharing, though. I need to handle files from an S3 bucket, so I'll have to try using blobs if I go with this approach instead of the s3Loader that uses unstructured. It should be good now; it may use more resources, but it does the job. Cheers!
I have successfully run Docker for unstructured-api and I am using UnstructuredLoader to load markdown files.
However, it reports an error.
I have tested it using Postman and everything works there.
Maybe we should append the file name (with the .md extension) to the formData? https://github.com/hwchase17/langchainjs/blob/47539dae010cd6a38c10ebcf3fb315339c348889/langchain/src/document_loaders/fs/unstructured.ts#L30-L36