community[minor]: add exclude parameter to DirectoryLoader #17316

Merged: eyurtsev merged 10 commits into langchain-ai:master from feat/exclude-directoryloader on Feb 16, 2024

nejch (Contributor) commented Feb 9, 2024

  • Description: adds an exclude parameter to the DirectoryLoader class, based on similar behavior in GenericLoader
  • Issue: discussed in "Exclude documents using DocumentLoaders" #9059, and I think in some other issues that I cannot find at the moment 🙇
  • Dependencies: None
  • Twitter handle: don't have one, sorry! Just https://github.com/nejch


vercel bot commented Feb 9, 2024

The latest updates on your projects.

1 ignored deployment:
langchain: ⬜️ Ignored (updated Feb 15, 2024 9:14pm UTC)

dosubot bot added the size:S (This PR changes 10-29 lines, ignoring generated files.), Ɑ: doc loader (Related to document loader module, not documentation) and 🤖:improvement (Medium size change to existing code to handle new use-cases) labels Feb 9, 2024
@@ -51,6 +52,7 @@ def __init__(
path: Path to directory.
glob: Glob pattern to use to find files. Defaults to "**/[!.]*"
(all files except hidden).
exclude: patterns to exclude from results, use glob syntax
eyurtsev (Collaborator)

Could you provide an example of how to exclude something?

nejch (Contributor Author)

@eyurtsev this is a direct copy from the docstring in the file system loader:

exclude: patterns to exclude from results, use glob syntax

Should I update all occurrences to keep them consistent?

nejch (Contributor Author), Feb 9, 2024

I added some docstring examples based on other loaders.
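To make the reviewer's question concrete, here is a standard-library sketch of how glob-style exclusion can behave. It assumes, like the file system loader whose docstring is quoted here, that every globbed path is tested against each exclude pattern with pathlib's `Path.match`. `find_files` is a hypothetical helper written for illustration only; it is not the DirectoryLoader code merged in this PR, and accepting a single string for `exclude` is a choice of this sketch.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Union


def find_files(
    path: Union[str, Path],
    glob: str = "**/[!.]*",
    exclude: Union[Sequence[str], str] = (),
) -> List[Path]:
    """Hypothetical helper: files under `path` matching `glob`, minus `exclude`."""
    # Treat a bare string as one pattern rather than a sequence of characters.
    if isinstance(exclude, str):
        exclude = (exclude,)
    matches = []
    for candidate in sorted(Path(path).glob(glob)):
        # Path.match anchors a relative pattern at the right-hand end of the
        # path, so "*.log" drops .log files at any depth and "drafts/*"
        # drops anything directly inside a directory named "drafts".
        if any(candidate.match(pattern) for pattern in exclude):
            continue
        # The default glob also yields directories; keep files only.
        if candidate.is_file():
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # Build a throwaway tree: docs/a.md, docs/b.log, docs/drafts/c.md
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "docs" / "drafts").mkdir(parents=True)
        for name in ("docs/a.md", "docs/b.log", "docs/drafts/c.md"):
            (root / name).write_text("example")
        kept = find_files(root, exclude=["*.log", "drafts/*"])
        print([p.relative_to(root).as_posix() for p in kept])  # ['docs/a.md']
```

With the merged change, an equivalent loader call would presumably look like `DirectoryLoader("docs", glob="**/*.md", exclude=["drafts/*"])`; check the DirectoryLoader docstring for the exact types `exclude` accepts.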

eyurtsev self-assigned this Feb 9, 2024

nejch (Contributor Author) commented Feb 9, 2024

Thanks for the quick review @eyurtsev! Just noticed the tests also need some optional deps in CI; will rework this ASAP.

nejch force-pushed the feat/exclude-directoryloader branch from 167616c to 6f524c4 on February 9, 2024 19:58
dosubot bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) and removed the size:S label (This PR changes 10-29 lines, ignoring generated files.) Feb 9, 2024
nejch force-pushed the feat/exclude-directoryloader branch 4 times, most recently from 7011967 to 143d007, on February 12, 2024 07:30
nejch requested a review from eyurtsev February 12, 2024 07:33
dosubot bot added the lgtm label (PR looks good. Use to confirm that a PR is ready for merging.) Feb 13, 2024
eyurtsev (Collaborator)

Can merge if tests pass

nejch force-pushed the feat/exclude-directoryloader branch from 143d007 to bd784e5 on February 13, 2024 08:45

eyurtsev (Collaborator)

Going to help resolve conflicts so we can merge

eyurtsev changed the title from "community: add exclude parameter to DirectoryLoader" to "community[minor]: add exclude parameter to DirectoryLoader" Feb 15, 2024

eyurtsev (Collaborator)

Can merge when tests pass

nejch (Contributor Author) commented Feb 15, 2024

Sorry, I updated the wrong lockfile there. I was going to get back to the dependency issue, but I see you already got around it by mocking it out, @eyurtsev. I guess next time this should be green ;)

eyurtsev (Collaborator)

@nejch awesome, thank you! We need to investigate what's going on with the extra dependency, but didn't want that to block your contribution from getting merged in!

nejch (Contributor Author) commented Feb 16, 2024

Perfect, thanks a lot @eyurtsev. Looks like it's green now 👍

eyurtsev merged commit b4fa847 into langchain-ai:master Feb 16, 2024
58 checks passed
haydeniw pushed a commit to haydeniw/langchain that referenced this pull request Feb 27, 2024
…-ai#17316)

- **Description:** adds an `exclude` parameter to the DirectoryLoader
class, based on similar behavior in GenericLoader
- **Issue:** discussed in
langchain-ai#9059 and I think
in some other issues that I cannot find at the moment 🙇
- **Dependencies:** None
- **Twitter handle:** don't have one, sorry! Just https://github.com/nejch

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Labels
Ɑ: doc loader (Related to document loader module, not documentation)
🤖:improvement (Medium size change to existing code to handle new use-cases)
lgtm (PR looks good. Use to confirm that a PR is ready for merging.)
size:M (This PR changes 30-99 lines, ignoring generated files.)

2 participants