Remove redundant .docx loader (closes #1716) + update how_to_guides.rst (#1891)

Issue #1716 identified two .py files performing similar tasks. To
resolve this, the redundant docx.py has been removed, as its purpose
was already fulfilled by the other file, and the loader package's
__init__.py has been updated accordingly.

Furthermore, the how_to_guides.rst file has been updated to include
links to documentation that was previously missing. This was deemed
necessary as the existing list on
https://langchain.readthedocs.io/en/latest/modules/document_loaders/how_to_guides.html
was incomplete, causing confusion for users who rely on the full list of
documentation on the left sidebar of the website.
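
To keep the list from drifting out of date again, a check along the following lines could be run against the docs tree. This is only a minimal sketch, not part of this commit: the helper names (`linked_examples`, `notebook_names`, `missing_links`) are hypothetical, and it assumes every guide entry uses the `<./examples/NAME.html>` link form seen in how_to_guides.rst and that each example is a NAME.ipynb notebook.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches reST links of the form `Title <./examples/name.html>`_
# (assumed to be the only link style used in how_to_guides.rst).
LINK_RE = re.compile(r"<\./examples/([\w-]+)\.html>`_")


def linked_examples(rst_text: str) -> set[str]:
    """Return the example names that the guide text links to."""
    return set(LINK_RE.findall(rst_text))


def notebook_names(examples_dir: Path) -> set[str]:
    """Return the stem of every .ipynb file directly inside examples_dir."""
    return {p.stem for p in Path(examples_dir).glob("*.ipynb")}


def missing_links(rst_text: str, names) -> list[str]:
    """Return the example names that the guide text never links to."""
    return sorted(set(names) - linked_examples(rst_text))


# Small in-memory demonstration using two entries from the guide.
sample = (
    "`CSV <./examples/csv.html>`_: A walkthrough.\n"
    "`Word Document <./examples/word_document.html>`_: A walkthrough.\n"
)
print(missing_links(sample, ["csv", "srt", "word_document"]))  # ['srt']
```

Against a real checkout, the text of docs/modules/document_loaders/how_to_guides.rst and the result of `notebook_names` on docs/modules/document_loaders/examples would be passed to `missing_links`; any names it returns are notebooks with no entry in the guide.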
klein-t committed Mar 22, 2023
1 parent 1f93c5c commit d3d4503
Showing 5 changed files with 22 additions and 164 deletions.
145 changes: 0 additions & 145 deletions docs/modules/document_loaders/examples/microsoft_word.ipynb

This file was deleted.

4 changes: 2 additions & 2 deletions docs/modules/document_loaders/examples/word_document.ipynb
@@ -27,7 +27,7 @@
"metadata": {},
"outputs": [],
"source": [
-"loader = UnstructuredWordDocumentLoader(\"fake.docx\")"
+"loader = UnstructuredWordDocumentLoader(\"example_data/fake.docx\")"
]
},
{
@@ -78,7 +78,7 @@
"metadata": {},
"outputs": [],
"source": [
-"loader = UnstructuredWordDocumentLoader(\"fake.docx\", mode=\"elements\")"
+"loader = UnstructuredWordDocumentLoader(\"example_data/fake.docx\", mode=\"elements\")"
]
},
{
22 changes: 20 additions & 2 deletions docs/modules/document_loaders/how_to_guides.rst
@@ -21,8 +21,6 @@ There are a lot of different document loaders that LangChain supports. Below are

`GoogleDrive <./examples/googledrive.html>`_: A walkthrough of how to load data from Google drive.

-`Microsoft Word <./examples/microsoft_word.html>`_: A walkthrough of how to load data from Microsoft Word files.
-
`Obsidian <./examples/obsidian.html>`_: A walkthrough of how to load data from an Obsidian file dump.

`Roam <./examples/roam.html>`_: A walkthrough of how to load data from a Roam file export.
@@ -59,6 +57,26 @@ There are a lot of different document loaders that LangChain supports. Below are

`iFixit <./examples/ifixit.html>`_: A walkthrough of how to search and load data like guides, technical Q&A's, and device wikis from iFixit.com

+`Notebook <./examples/notebook.html>`_: A walkthrough of how to load data from .ipynb notebook.
+
+`Copypaste <./examples/copypaste.html>`_: A walkthrough of how to load a document object from something you just want to copy and paste.
+
+`CSV <./examples/csv.html>`_: A walkthrough of how to load data from a .csv file.
+
+`Facebook Chat <./examples/facebook_chat.html>`_: A walkthrough of how to load data from a Facebook Chat json file.
+
+`Image <./examples/image.html>`_: A walkthrough of how to load images such as JPGs PNGs into a document format that can be used downstream.
+
+`Markdown <./examples/markdown.html>`_: A walkthrough of how to load data from a markdown file.
+
+`SRT <./examples/srt.html>`_: A walkthrough of how to load data from a subtitle (`.srt`) file.
+
+`Telegram <./examples/telegram.html>`_: A walkthrough of how to load data from a Telegram Chat json file.
+
+`URL <./examples/url.html>`_: A walkthrough of how to load HTML documents from a list of URLs into a document format that we can use downstream.
+
+`Word Document <./examples/word_document.html>`_: A walkthrough of how to load data from Microsoft Word files.
+
`Blackboard <./examples/blackboard.html>`_: A walkthrough of how to load data from a Blackboard course.

.. toctree::
2 changes: 0 additions & 2 deletions langchain/document_loaders/__init__.py
@@ -7,7 +7,6 @@
from langchain.document_loaders.conllu import CoNLLULoader
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.document_loaders.directory import DirectoryLoader
-from langchain.document_loaders.docx import UnstructuredDocxLoader
from langchain.document_loaders.email import UnstructuredEmailLoader
from langchain.document_loaders.evernote import EverNoteLoader
from langchain.document_loaders.facebook_chat import FacebookChatLoader
@@ -72,7 +71,6 @@
"UnstructuredPDFLoader",
"UnstructuredImageLoader",
"ObsidianLoader",
-"UnstructuredDocxLoader",
"UnstructuredEmailLoader",
"UnstructuredMarkdownLoader",
"RoamLoader",
13 changes: 0 additions & 13 deletions langchain/document_loaders/docx.py

This file was deleted.
