Commit

Web Loader: Add proxy support (#6792)
Proxies are helpful, especially when you start querying websites with
stricter anti-bot measures.

[Proxy services](https://developers.oxylabs.io/advanced-proxy-solutions/web-unblocker/making-requests)
(of which there are many) and `requests` make it easy to rotate IPs and
avoid bans: you just pass a simple dict of proxies to `requests`.
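
As a rough illustration of the "simple dict" mentioned above, here is a minimal sketch of building `requests`-style proxy dicts and rotating through them. The helper names (`build_proxies`, `rotating_proxies`), the hosts, and the credentials are all hypothetical and not part of this commit; the only assumption about `requests` is that it accepts a dict mapping URL scheme to proxy URL.

```python
from itertools import cycle
from typing import Dict, Iterator, List
from urllib.parse import quote


def build_proxies(username: str, password: str, host: str, port: int) -> Dict[str, str]:
    """Return a requests-style proxies dict for one (hypothetical) proxy endpoint."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # cannot be mistaken for URL delimiters.
    auth = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{auth}@{host}:{port}/"
    # Route both plain and TLS traffic through the same endpoint.
    return {"http": proxy_url, "https": proxy_url}


def rotating_proxies(pool: List[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield proxy dicts from the pool in round-robin order, forever."""
    return cycle(pool)


# Placeholder endpoints; substitute the values from your proxy provider.
pool = [
    build_proxies("user", "p@ss", "proxy-a.example.com", 6666),
    build_proxies("user", "p@ss", "proxy-b.example.com", 6666),
]
rotation = rotating_proxies(pool)
first = next(rotation)
second = next(rotation)
third = next(rotation)  # wraps around to the first entry again
```

Each dict produced this way could then be passed as `proxies=` to a `requests` call, or to the loader's new `proxies` argument, picking `next(rotation)` per request.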

CC @rlancemartin, @eyurtsev
timothyasp committed Jun 28, 2023
1 parent f92ccf7 commit 3ca1a38
Showing 2 changed files with 27 additions and 3 deletions.
@@ -224,13 +224,33 @@
"docs"
]
},
{
"cell_type": "markdown",
"source": [
"## Using proxies\n",
"\n",
"Sometimes you might need to use proxies to get around IP blocks. You can pass in a dictionary of proxies to the loader (and `requests` underneath) to use them."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"id": "1dd8ab23",
"metadata": {},
"outputs": [],
"source": []
"source": [
"loader = WebBaseLoader(\n",
" \"https://www.walmart.com/search?q=parrots\", proxies={\n",
" \"http\": \"http://{username}:{password}@proxy.service.com:6666/\",\n",
" \"https\": \"https://{username}:{password}@proxy.service.com:6666/\"\n",
" }\n",
")\n",
"docs = loader.load()\n"
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
4 changes: 4 additions & 0 deletions langchain/document_loaders/web_base.py
@@ -61,6 +61,7 @@ def __init__(
        web_path: Union[str, List[str]],
        header_template: Optional[dict] = None,
        verify: Optional[bool] = True,
        proxies: Optional[dict] = None,
    ):
        """Initialize with webpage path."""

@@ -97,6 +98,9 @@ def __init__(
)
        self.session.headers = dict(headers)

        if proxies:
            self.session.proxies.update(proxies)

    @property
    def web_path(self) -> str:
        if len(self.web_paths) > 1:
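
The `if proxies:` / `.update(proxies)` pattern in the hunk above can be sketched in isolation. In `requests`, `Session.proxies` is a plain dict, so the snippet below uses an ordinary dict as a stand-in; `apply_proxies` and the example proxy URLs are hypothetical names for illustration, not part of `WebBaseLoader`.

```python
from typing import Dict, Optional


def apply_proxies(
    session_proxies: Dict[str, str],
    proxies: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge caller-supplied proxies into an existing proxies mapping."""
    # The truthiness check skips both None and an empty dict,
    # leaving whatever was already configured untouched.
    if proxies:
        # update() overrides matching scheme keys and keeps the rest.
        session_proxies.update(proxies)
    return session_proxies


# Stand-in for session.proxies with one pre-existing entry.
existing = {"http": "http://old-proxy.example.com:8080"}

apply_proxies(existing, None)  # no change
apply_proxies(existing, {})    # no change
apply_proxies(existing, {"https": "http://new-proxy.example.com:8080"})
```

Because the merge is per scheme, supplying only an `https` entry leaves any existing `http` proxy in place; replacing the mapping outright would instead require assigning a new dict.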
