# URL

This example covers how to load `HTML` documents from a list of `URLs` into the `Document` format that we can use downstream.

## Unstructured URL Loader

For the examples below, please install the `unstructured` library and see [this guide](/docs/integrations/providers/unstructured/) for more instructions on setting up Unstructured locally, including setting up required system dependencies:

In [None]:
%pip install --upgrade --quiet unstructured

In [2]:
from langchain_community.document_loaders import UnstructuredURLLoader

urls = [
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023",
]

To get past SSL verification errors, pass `ssl_verify=False` together with `headers=headers` when constructing the loader, where `headers` is a dictionary of request headers.
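As a minimal sketch of the two options just mentioned, the helper below collects them into keyword arguments for the loader. The function names `build_loader_kwargs` and `make_loader` and the `User-Agent` value are illustrative assumptions, not part of the library; only `ssl_verify` and `headers` come from the text above.

```python
def build_loader_kwargs(urls, user_agent="Mozilla/5.0"):
    """Collect keyword arguments for UnstructuredURLLoader.

    `ssl_verify` and `headers` are the two options named in the text;
    the User-Agent value is only an illustrative placeholder.
    """
    return {
        "urls": list(urls),
        # Skips certificate checks, so use it only for hosts you trust.
        "ssl_verify": False,
        "headers": {"User-Agent": user_agent},
    }


def make_loader(urls):
    # Imported lazily so the helper above works without the library installed.
    from langchain_community.document_loaders import UnstructuredURLLoader

    return UnstructuredURLLoader(**build_loader_kwargs(urls))
```

Calling `make_loader(urls).load()` would then fetch the pages with certificate verification disabled, which is reasonable for debugging but should not be left on by default.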

In [3]:
loader = UnstructuredURLLoader(urls=urls)

data = loader.load()

data[0]

Document(page_content='Skip to main content\n\nSearch form\n\nHome\n\nWho We Are\n\nResearch\n\nPublications\n\nGet Involved\n\nPlanned Giving\n\nDonate\n\nRussian Offensive Campaign Assessment, February 8, 2023\n\nFeb 8, 2023 - ISW Press\n\nDownload the PDF\n\nKarolina Hird, Riley Bailey, George Barros, Layne Philipson, Nicole Wolkov, and Mason Clark\n\nFebruary 8, 8:30pm ET\n\nClick\xa0here\xa0to see ISW’s interactive map of the Russian invasion of Ukraine. This map is updated daily alongside the static maps present in this report.\n\nRussian forces have regained the initiative in Ukraine and have begun their next major offensive in Luhansk Oblast.\xa0The pace of Russian operations along the Svatove-Kreminna line in western Luhansk Oblast has increased markedly over the past week, and Russian sources are widely reporting that conventional Russian troops are attacking Ukrainian defensive lines and making marginal advances along the Kharkiv-Luhansk Oblast border, particularly northwest
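Before passing the documents downstream, it can help to check what was loaded. The sketch below is one hedged way to do that: `summarize` is a hypothetical helper, and it assumes each `Document` keeps its originating URL under `metadata["source"]`. A stand-in object with the same two attributes is used so the example runs without network access.

```python
from types import SimpleNamespace


def summarize(docs, preview_chars=60):
    """Return one (source, length, preview) tuple per loaded document."""
    rows = []
    for doc in docs:
        # Assumption: the loader stored the originating URL under "source".
        source = doc.metadata.get("source", "<unknown>")
        text = doc.page_content
        # Collapse the blank-line separators so the preview fits on one line.
        preview = " ".join(text.split())[:preview_chars]
        rows.append((source, len(text), preview))
    return rows


# Stand-in with the same attributes as a Document, so the sketch runs offline.
sample = SimpleNamespace(
    page_content="Skip to main content\n\nSearch form\n\nHome",
    metadata={"source": "https://example.com/page"},
)
rows = summarize([sample])
```

With real data, `summarize(data)` would give a quick view of which URLs produced content and how much of it is navigation text like the menu items visible in the output above.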

## Selenium URL Loader

This covers how to load HTML documents from a list of URLs using the `SeleniumURLLoader`.

Using `Selenium` allows us to load pages that require JavaScript to render.


To use the `SeleniumURLLoader`, you have to install `selenium` and `unstructured`.

In [None]:
%pip install --upgrade --quiet selenium unstructured

In [6]:
from langchain_community.document_loaders import SeleniumURLLoader

urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://goo.gl/maps/NDSHwePEyaHMFGwh8",
]

loader = SeleniumURLLoader(urls=urls)

data = loader.load()

data[1]

Document(page_content='Menu\n\nSearch\n\nClose\n\nCollapse side panel\n\n323,527 photos\n\nCN Tower\n\n4.6\n\n(71,071)\n\nTourist attraction\n\nOverview\n\nTickets\n\nReviews\n\nAbout\n\nDirections\n\nSave\n\nNearby\n\nSend to phone\n\nShare\n\nLandmark, over 553-metre tower featuring a glass floor & a revolving eatery with panoramic views.\n\nSponsoredBy CityPASS\n\nSave 42% at 5 top Toronto attractions.$92\xa0·\xa04.6(9k+)Entry included\n\nAdmission\n\nAbout these results\n\nGives you entry to this place\n\nCN Tower Official site$32.87\ue315Instant confirmation · Mobile ticket\n\nEvendo $60.43\ue315Multi-attraction pass · Free cancellation\n\nCityPASS $92.04\ue315Multi-attraction pass · Mobile ticket\n\nMore\n\n290 Bremner Blvd, Toronto, ON M5V 3L9, Canada\n\nOpen ⋅ Closes 9:30\u202fPM\n\nCanada Day hours\n\nMonday (Canada Day) 9:30\u202fAM–9:30\u202fPM Holiday hours Tuesday 9:30\u202fAM–9:30\u202fPM Wednesday 9:30\u202fAM–9:30\u202fPM Thursday 9:30\u202fAM–9:30\u202fPM Friday 9:30\u

## Playwright URL Loader

>[Playwright](https://github.com/microsoft/playwright) is an open-source automation tool developed by `Microsoft` that allows you to programmatically control and automate web browsers. It is designed for end-to-end testing, scraping, and automating tasks across various web browsers such as `Chromium`, `Firefox`, and `WebKit`.

This covers how to load HTML documents from a list of URLs using the `PlaywrightURLLoader`.

[Playwright](https://playwright.dev/) enables reliable end-to-end testing for modern web apps.

As with Selenium, `Playwright` lets us load pages that require JavaScript to render.

To use the `PlaywrightURLLoader`, you have to install `playwright` and `unstructured`, and then install the Chromium browser that Playwright drives:

In [None]:
%pip install --upgrade --quiet playwright unstructured

In [9]:
!playwright install

Currently, only the async method is supported:

In [14]:
from langchain_community.document_loaders import PlaywrightURLLoader

urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://goo.gl/maps/NDSHwePEyaHMFGwh8",
]

loader = PlaywrightURLLoader(urls=urls, remove_selectors=["header", "footer"])

data = await loader.aload()

data[0]

Document(page_content="Rick Astley - Never Gonna Give You Up (Official Music Video)\n\nSearch\n\nWatch later\n\nShare\n\nCopy link\n\nInfo\n\nShopping\n\nTap to unmute\n\n2x\n\nIf playback doesn't begin shortly, try restarting your device.\n\nUp next\n\nLiveUpcoming\n\nPlay Now\n\nRick Astley\n\nSubscribe\n\nSubscribed\n\nThe new album, 'Are We There Yet?' out now!\n\nRick Astley - Forever and More (Official Video)3:47\n\nThis video is unavailable\n\nAre We There Yet?15 videos\n\nRick Astley\n\nSubscribe\n\nSubscribed\n\nYou're signed out\n\nVideos you watch may be added to the TV's watch history and influence TV recommendations. To avoid this, cancel and sign in to YouTube on your computer.\n\nShare\n\nAn error occurred while retrieving sharing information. Please try again later.\n\n0:00\n\n0:00 / 3:32\n\nWatch full video\n\n•\n\nScroll for details\n\n•\n              \n            \n          \n        \n        \n          \n            \n          \n        \n      \n      \n    \
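The cell above awaits `aload()` directly, which works in a notebook because an event loop is already running. In a plain Python script there is no running loop, so one possible approach is to wrap the call in `asyncio.run`. The helper names `load_documents`, `load_documents_blocking`, and `main` below are illustrative assumptions, not part of the library.

```python
import asyncio


async def load_documents(loader):
    """Await the loader's async method and return the documents."""
    return await loader.aload()


def load_documents_blocking(loader):
    """Run the async load to completion from synchronous code.

    asyncio.run() starts a new event loop, so this is meant for plain
    scripts; in a notebook, where a loop is already running, use `await`.
    """
    return asyncio.run(load_documents(loader))


def main():
    # Imported lazily; requires playwright and unstructured to be installed.
    from langchain_community.document_loaders import PlaywrightURLLoader

    loader = PlaywrightURLLoader(
        urls=["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
        remove_selectors=["header", "footer"],
    )
    docs = load_documents_blocking(loader)
    print(len(docs))
```

Calling `main()` from a script would load the page and print the number of documents returned. The wrapper takes any object with an async `aload()` method, which also makes it easy to exercise with a stub loader.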