
sitemap loader throws error TypeError: _request() got an unexpected keyword argument 'verify', many docs refer to wrong links for sitemap as well. #6691

Closed
luca-git opened this issue Jun 24, 2023 · 8 comments
Labels: 01 bug (Confirmed bug), documentation (Improvements or additions to documentation)

Comments

@luca-git

System Info

Windows 11, Python 3.9.16, LangChain 0.0.212

Who can help?

Code from https://python.langchain.com/docs/modules/data_connection/document_loaders/integrations/sitemap

```python
from langchain.document_loaders.sitemap import SitemapLoader

sitemap_loader = SitemapLoader(web_path="https://langchain.readthedocs.io/sitemap.xml")

docs = sitemap_loader.load()
```

throws:

```
    self._request(hdrs.METH_GET, url, allow_redirects=allow_redirects, **kwargs)

TypeError: _request() got an unexpected keyword argument 'verify'
```
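The traceback line comes from aiohttp's `ClientSession.get()`, which forwards its extra keyword arguments unchanged to `ClientSession._request()`; that is why the error names `_request()` rather than anything in LangChain. The sketch below reproduces the same forwarding pattern with simplified, hypothetical stand-in functions (they are not aiohttp's real code): the outer wrapper accepts `verify` through `**kwargs`, and the TypeError only appears when the inner function, which knows `ssl` but not `verify`, binds its parameters.

```python
from typing import Any, Optional


def _request(method: str, url: str, *, allow_redirects: bool = True,
             headers: Optional[dict] = None, ssl: Any = None) -> str:
    # Hypothetical stand-in for the inner request function: it has an `ssl`
    # parameter but no `verify` parameter.
    return f"{method} {url}"


def get(url: str, *, allow_redirects: bool = True, **kwargs: Any) -> str:
    # Hypothetical stand-in for a convenience wrapper that forwards unknown
    # keywords without checking them, like the traceback line above.
    return _request("GET", url, allow_redirects=allow_redirects, **kwargs)


try:
    get("https://example.com/sitemap.xml", verify=False)
except TypeError as exc:
    print(exc)  # the message names _request(), not get()

# Passing the keyword the inner function actually accepts succeeds.
print(get("https://example.com/sitemap.xml", ssl=False))
```

The takeaway is that the `verify` keyword is a requests-style option reaching an aiohttp-style call; the fix belongs wherever that keyword is passed, not in a method of the loader itself.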

Information

- [X] The official example notebooks/scripts
- [ ] My own modified scripts

Related Components

- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [X] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async

Reproduction

```python
from langchain.document_loaders.sitemap import SitemapLoader

sitemap_loader = SitemapLoader(web_path="https://langchain.readthedocs.io/sitemap.xml")

docs = sitemap_loader.load()
```

Expected behavior

The example should work as documented, or the docs should be updated.
dosubot bot added the 01 bug and documentation labels Jun 24, 2023
@dosubot

dosubot bot commented Jun 24, 2023

Answer generated by a 🤖

Answer

It seems like the issue is caused by the 'verify' keyword argument being passed to the _request() method. To fix this issue, you can update the SitemapLoader class to handle the 'verify' keyword argument properly.
First, you need to update the SitemapLoader class to accept the requests_kwargs parameter in its constructor. Then, you can pass the requests_kwargs to the _request() method.
Here's the updated code:

```python
from langchain.document_loaders.sitemap import SitemapLoader

class CustomSitemapLoader(SitemapLoader):
    def __init__(self, *args, requests_kwargs=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.requests_kwargs = requests_kwargs or {}

    def _request(self, method, url, **kwargs):
        kwargs.update(self.requests_kwargs)
        return super()._request(method, url, **kwargs)

sitemap_loader = CustomSitemapLoader(web_path="https://langchain.readthedocs.io/sitemap.xml", requests_kwargs={"verify": False})

docs = sitemap_loader.load()
```

This custom implementation of SitemapLoader should resolve the TypeError issue you encountered (Source: docs/extras/modules/data_connection/document_loaders/integrations/sitemap.ipynb).


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.


@juanvecino

Have you managed to solve this?

@luca-git
Author

Nope. I don't think the bot's advice to create a custom class is fully appropriate for an official notebook example. I'm hoping for something more 'official' from the team.

@juanvecino

@hwchase17, is this being fixed? @support

@weasley0-0

> Nope. I don't think the bot's advice to create a custom class is fully appropriate for an official notebook example. I'm hoping for something more 'official' from the team.

Just tried it out and can confirm it doesn't work.

@gj315

gj315 commented Jul 1, 2023

Hi @weasley0-0 and @juanvecino, it seems this works with the following change.

Just change one line in this file: /document_loaders/web_base.py

Change this code:

```python
async with session.get(
    url, headers=self.session.headers, verify=self.verify
) as response:
```

to this updated code:

```python
async with session.get(
    url, headers=self.session.headers, ssl=self.verify
) as response:
```
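The change above swaps `verify` for `ssl`, the keyword aiohttp actually uses for certificate settings. A slightly more defensive variant is sketched below as a hypothetical helper (`aiohttp_get_kwargs` is not part of LangChain or aiohttp): it passes `ssl=False` only when verification is switched off and otherwise leaves aiohttp's default in place, so the call does not depend on how a particular aiohttp version interprets `ssl=True`. It assumes the loader's `verify` value follows requests' conventions (`True`, `False`, or a path to a CA bundle).

```python
import ssl
from typing import Any, Dict, Mapping, Union


def aiohttp_get_kwargs(headers: Mapping[str, str],
                       verify: Union[bool, str] = True) -> Dict[str, Any]:
    """Build keyword arguments for aiohttp's session.get() from
    requests-style settings (hypothetical helper for illustration)."""
    kwargs: Dict[str, Any] = {"headers": dict(headers)}
    if verify is False:
        # aiohttp disables certificate verification with ssl=False.
        kwargs["ssl"] = False
    elif isinstance(verify, str):
        # requests also accepts a CA-bundle path; aiohttp takes an SSLContext.
        kwargs["ssl"] = ssl.create_default_context(cafile=verify)
    # verify=True (or None): no ssl argument, so aiohttp keeps its default
    # certificate verification. No `verify` key is ever emitted.
    return kwargs


print(aiohttp_get_kwargs({"User-Agent": "sitemap-test"}))
print(aiohttp_get_kwargs({}, verify=False))
```

Under these assumptions the call site would become `session.get(url, **aiohttp_get_kwargs(self.session.headers, self.verify))`. The `verify is False` check is deliberate: a plain `not verify` would also treat `None` as "turn verification off", which is the less safe default.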

@rincon-santi

Any updates on this bug?

baskaryan pushed a commit that referenced this issue Oct 17, 2023
**Description:**
While working on the Docusaurus site loader #9138, I noticed some
outdated docs and tests for the Sitemap Loader.

**Issue:** 
This is tangentially related to #6691 in reference to doc links. I plan
on digging into a few of these issues when I find time next.
chocolate4 pushed a commit to chocolate4/langchain that referenced this issue Oct 17, 2023
@dosubot

dosubot bot commented Oct 17, 2023

Hi, @luca-git! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you reported is related to the sitemap loader in the langchain library throwing a TypeError due to an unexpected keyword argument. There have been some suggestions for a solution, including creating a custom class and updating the code in the web_base.py file. However, it seems that an official solution from the team has not been provided yet.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and patience. Let us know if you have any further questions or concerns.

dosubot bot added the stale label Oct 17, 2023
dosubot bot closed this as not planned Oct 24, 2023
dosubot bot removed the stale label Oct 24, 2023
hoanq1811 pushed a commit to hoanq1811/langchain that referenced this issue Feb 2, 2024

5 participants