Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

community: removed unnecessary dependecy use from NotebookLoader #17614

Conversation

lachiewalker
Copy link
Contributor

Description: Removes unnecessary pandas use from the NotebookLoader document loader.

  • In no way changes the output from this classes .load() method.
  • Is compatible with existing tests.
  • Delivers a noticeable speedup over the previous method.

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 16, 2024
Copy link

vercel bot commented Feb 16, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Mar 29, 2024 0:34am

@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases labels Feb 16, 2024
p = Path(self.file_path)

with open(p, encoding="utf8") as f:
d = json.load(f)

data = pd.json_normalize(d["cells"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does pd.json_normalize do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the documentation: Normalizes semi-structured JSON data into a flat table.

Combined with filtered_data = data[["cell_type", "source", "outputs"]] this statement makes a table of [ cell_type, source, output ] information.

This behaviour can be simply replicated by a list of dictionaries with keys cell_type, source, output, without relying on a dependency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, could we add a simple unit test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lachiewalker 👋 new to contributing here, I was wondering if I can help write the unit test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulonasc yeah, of course 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's the commit for the unit tests btw! #17721

Is it okay that it's in a separate PR?

@baskaryan baskaryan added the needs test PR needs to be updated with tests label Feb 18, 2024
baskaryan added a commit that referenced this pull request Mar 29, 2024
Thank you for contributing to LangChain!

- **Description:** added unit tests for NotebookLoader. Linked PR:
#17614
- **Issue:**
[#17614](#17614)
    - **Twitter handle:** @paulodoestech
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: lachiewalker <lachiewalker1@hotmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Mar 29, 2024
@baskaryan baskaryan closed this Mar 29, 2024
bechbd pushed a commit to bechbd/langchain that referenced this pull request Mar 29, 2024
Thank you for contributing to LangChain!

- **Description:** added unit tests for NotebookLoader. Linked PR:
langchain-ai#17614
- **Issue:**
[langchain-ai#17614](langchain-ai#17614)
    - **Twitter handle:** @paulodoestech
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: lachiewalker <lachiewalker1@hotmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
gkorland pushed a commit to FalkorDB/langchain that referenced this pull request Mar 30, 2024
Thank you for contributing to LangChain!

- **Description:** added unit tests for NotebookLoader. Linked PR:
langchain-ai#17614
- **Issue:**
[langchain-ai#17614](langchain-ai#17614)
    - **Twitter handle:** @paulodoestech
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: lachiewalker <lachiewalker1@hotmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
marlenezw pushed a commit to marlenezw/langchain that referenced this pull request Apr 2, 2024
Thank you for contributing to LangChain!

- **Description:** added unit tests for NotebookLoader. Linked PR:
langchain-ai#17614
- **Issue:**
[langchain-ai#17614](langchain-ai#17614)
    - **Twitter handle:** @paulodoestech
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: lachiewalker <lachiewalker1@hotmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Je-Cp pushed a commit to Je-Cp/jcp-langchain that referenced this pull request Apr 2, 2024
Thank you for contributing to LangChain!

- **Description:** added unit tests for NotebookLoader. Linked PR:
langchain-ai/langchain#17614
- **Issue:**
[#17614](langchain-ai/langchain#17614)
    - **Twitter handle:** @paulodoestech
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: lachiewalker <lachiewalker1@hotmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
hinthornw pushed a commit that referenced this pull request Apr 26, 2024
Thank you for contributing to LangChain!

- **Description:** added unit tests for NotebookLoader. Linked PR:
#17614
- **Issue:**
[#17614](#17614)
    - **Twitter handle:** @paulodoestech
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: lachiewalker <lachiewalker1@hotmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases needs test PR needs to be updated with tests size:XS This PR changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants