Two different document loaders for Microsoft Word files #1716
Comments
I agree!
…st (#1891) In langchain-ai/langchain#1716, it was identified that two .py files performed similar tasks. To resolve this, one of the files has been removed, since the other file already fulfilled its purpose, and the init has been updated accordingly. The how_to_guides.rst file has also been updated to include links to documentation that was previously missing. This was necessary because the existing list on https://langchain.readthedocs.io/en/latest/modules/document_loaders/how_to_guides.html was incomplete, which confused users who rely on the full list of documentation in the left sidebar of the website.
I'm unable to load all the Word files present in the folder. Below is the code,
but it works for only one file.
Is there any way to overcome this?
@nithinreddyyyyyy - That seems like it should work. Could you post in here your loader = DirectoryLoader(folder_path, glob="./*.docx", loader_cls=UnstructuredWordDocumentLoader)
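One way to narrow this down is to check which files the glob pattern actually matches before handing the folder to the loader. The snippet below is only a sketch: it assumes DirectoryLoader resolves its glob argument relative to the folder in roughly the way pathlib does, and the helper name list_matching_files is hypothetical, not part of langchain. If the list it returns is shorter than expected, the pattern (for example, files sitting in subfolders) is the likely cause rather than the loader class.

```python
from pathlib import Path


def list_matching_files(folder_path: str, pattern: str = "*.docx") -> list[str]:
    """Return sorted, folder-relative paths of files matching `pattern`.

    Hypothetical diagnostic helper (not a langchain API): it only shows
    which files a pathlib-style glob would pick up inside `folder_path`.
    """
    folder = Path(folder_path)
    return sorted(
        p.relative_to(folder).as_posix()
        for p in folder.glob(pattern)
        if p.is_file()
    )


if __name__ == "__main__":
    # "*.docx" matches only files directly inside the folder;
    # "**/*.docx" also descends into subfolders.
    print(list_matching_files(".", "*.docx"))
    print(list_matching_files(".", "**/*.docx"))
```

If the top-level pattern lists every expected file but only one document still comes back, the problem is more likely in how the individual files are parsed than in the pattern.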
Above is the code; below is the error.
I don't know why it is returning an error.
I didn't see
Hello, I've noticed that after the latest commit by @MthwRobinson there are two different modules for loading Word documents. Could they be unified into a single version? There are also two notebooks that do almost the same thing:
docx.py and word_document.py
microsoft_word.ipynb and word_document.ipynb
Or am I just missing something?