Harrison/rtd loader #1513

Merged
merged 2 commits into from Mar 8, 2023

hwchase17
Contributor

No description provided.

yousseb and others added 2 commits March 7, 2023 15:39
This change improves ReadTheDocsLoader

1. Pass initialization args to BeautifulSoup. This allows selecting the
parser library, e.g. `features='html5lib'`.
2. The `load` method gets optional `encoding` and `errors` arguments, which
are passed to the file `open` call.

This allows developers to do things like:

Original:
```
from langchain.document_loaders import ReadTheDocsLoader

# BeautifulSoup warns here because no parser library is specified
loader = ReadTheDocsLoader('langchain.readthedocs.io/en/latest/')

# This can fail if the files are not in the OS default encoding
raw_documents = loader.load()
```

New:
```
from langchain.document_loaders import ReadTheDocsLoader

# features='html5lib' is passed down to BeautifulSoup, giving the same parser
# library across environments; it is optional, for backward compatibility
loader = ReadTheDocsLoader('langchain.readthedocs.io/en/latest/', features='html5lib')

# Specify the document encoding (optional, for backward compatibility).
# errors='ignore' is also supported; it defaults to None, as in the built-in open()
raw_documents = loader.load(encoding='utf-8')
```
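
To illustrate why the `encoding` and `errors` arguments matter, here is a minimal standard-library sketch of the file-reading step. It is not the loader's actual code: `read_html_files` is a hypothetical helper and the temporary directory with `index.html` is made up for the demo. The only assumption carried over from the description above is that both arguments are forwarded unchanged to the built-in `open`.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def read_html_files(
    root: str,
    encoding: Optional[str] = None,
    errors: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper (not LangChain code): read every .html file under root."""
    texts = []
    for path in sorted(Path(root).rglob("*.html")):
        # encoding=None falls back to the platform default, like a plain open(path).
        with open(path, encoding=encoding, errors=errors) as f:
            texts.append(f.read())
    return texts


# Build a small directory with one file containing a byte that is invalid UTF-8.
demo_dir = tempfile.mkdtemp()
Path(demo_dir, "index.html").write_bytes(b"<p>caf\xe9</p>")

# Strict decoding (errors=None) raises on the invalid byte.
try:
    read_html_files(demo_dir, encoding="utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="ignore" drops the undecodable byte instead of raising.
lenient = read_html_files(demo_dir, encoding="utf-8", errors="ignore")
```

Keeping both arguments optional with a default of `None` means existing callers get the same behavior as before, which matches the backward-compatibility goal stated in the commit message.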
@hwchase17 hwchase17 merged commit a4a2d79 into master Mar 8, 2023
@hwchase17 hwchase17 deleted the harrison/rtd-loader branch March 8, 2023 05:09
zachschillaci27 pushed a commit to zachschillaci27/langchain that referenced this pull request Mar 8, 2023
Co-authored-by: Youssef A. Abukwaik <yousseb@users.noreply.github.com>