Load file creation and modification date to metadata with DirectoryLoader #13462
-
I was looking around the repository and web. I couldn't find the information if it is possible to load file creation and/or modification date together with a file with DirectoryLoader. Is it possible? How to do that? Currently, I load only docx files, but I would also like to load some other types of documents in the future. My current code snippet is:
|
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 1 reply
-
🤖 Based on the current implementation of the DirectoryLoader class in the LangChain framework, it does not support loading file metadata such as creation and modification dates. The However, you can extend the DirectoryLoader class to include file metadata such as creation and modification dates. Here's how you can modify the import os
import time
def load_file(
self, item: Path, path: Path, docs: List[Document], pbar: Optional[Any]
) -> None:
"""Load a file.
Args:
item: File path.
path: Directory path.
docs: List of documents to append to.
pbar: Progress bar. Defaults to None.
"""
if item.is_file():
if _is_visible(item.relative_to(path)) or self.load_hidden:
try:
logger.debug(f"Processing file: {str(item)}")
sub_docs = self.loader_cls(str(item), **self.loader_kwargs).load()
# Get file metadata
stat = os.stat(str(item))
creation_date = time.ctime(stat.st_ctime)
modification_date = time.ctime(stat.st_mtime)
for doc in sub_docs:
doc.metadata['creation_date'] = creation_date
doc.metadata['modification_date'] = modification_date
docs.extend(sub_docs)
except Exception as e:
if self.silent_errors:
logger.warning(f"Error loading file {str(item)}: {e}")
else:
raise e
finally:
if pbar:
pbar.update(1) In this modification, we use You can find the original implementation of the SourcesThis response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot. |
Beta Was this translation helpful? Give feedback.
-
Bot, thank you for response. Based on your solution, I created a bit different code: class DateDirectoryLoader(DirectoryLoader):
def load_file(
self, item: Path, path: Path, docs: List[Document], pbar: Optional[Any]
) -> None:
prev_len = len(docs)
super().load_file(item, path, docs, pbar)
if len(docs) > prev_len:
# if any file was loaded by super().load_file == no error loading
stat = os.stat(str(item))
creation_date = datetime.fromtimestamp(stat.st_ctime).isoformat()
modification_date = datetime.fromtimestamp(stat.st_mtime).isoformat()
for doc in docs[prev_len:]:
doc.metadata['creation_date'] = creation_date
doc.metadata['modification_date'] = modification_date Key changes:
I also have one important notice. |
Beta Was this translation helpful? Give feedback.
Bot, thank you for response. Based on your solution, I created a bit different code: