# Scraping URLs in Python with `urllib` and `BeautifulSoup`

Python's standard library includes the `urllib` package for working with URLs. Here's how you can use its `urllib.request` module alongside `BeautifulSoup` (a third-party library, installed as `beautifulsoup4`) to scrape data from a website.

### Steps:

1. **Initialize the Scraper Class**
   - The `__init__` method takes a URL as a parameter and stores it on the instance as `self.url`.
   - Example: Pass `"https://news.google.com/"` as the parameter during initialization.

2. **Retrieve HTML**
   - The `scrape` method calls `urlopen()` (from `urllib.request`) to send a request to the specified website.
   - For HTTP and HTTPS URLs, the function returns an `http.client.HTTPResponse` object containing:
     - The HTML of the page, as bytes.
     - Metadata about the response, such as the status code and headers.
   - Use the `read()` method to extract the HTML and store it in a variable like `html`.

3. **Parse the HTML**
   - Use `BeautifulSoup` to parse the HTML. This step makes the HTML easy to search and process:
     ```python
     from bs4 import BeautifulSoup
     soup = BeautifulSoup(html, "html.parser")
     ```

4. **Extract Links**
   - Use the `find_all` method to retrieve all `<a>` tags (hyperlinks):
     ```python
     links = soup.find_all("a")
     ```
   - Iterate through the returned tags and extract the `href` attribute (the actual URL) using the `get()` method. `get()` returns `None` for a tag without an `href`, so check the value before filtering:
     ```python
     for link in links:
         url = link.get("href")
         if url and "topics" in url:  # Filter URLs containing the string "topics"
             print(url)
     ```
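
Steps 1 and 2 can also be sketched on their own, without the parsing step. The helper below is a minimal sketch, not part of the original class: the name `fetch_html`, the `timeout` value, and the UTF-8 fallback are assumptions made for illustration. It decodes the bytes returned by `read()` into a string using the charset declared in the response headers.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Fetch a URL and return its body as text (hypothetical helper)."""
    # The response object is a context manager, so the connection is
    # closed when the block exits.
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, not str
        # Charset from the Content-Type header; falling back to UTF-8
        # is an assumption of this sketch.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Example call (requires network access):
# html = fetch_html("https://news.google.com/")
```

`BeautifulSoup` accepts either the raw bytes or the decoded string, so decoding is optional when the result goes straight to the parser.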

### Example Implementation
Below is a complete example of how this process could be implemented:

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

class Scraper:
    def __init__(self, url):
        self.url = url

    def scrape(self):
        response = urlopen(self.url)  # Fetch the HTML from the website
        html = response.read()  # Read the HTML content
        soup = BeautifulSoup(html, "html.parser")  # Parse the HTML with BeautifulSoup

        links = soup.find_all("a")  # Find all anchor tags
        for link in links:
            url = link.get("href")  # Get the href attribute
            if url and "topics" in url:  # Filter links containing "topics"
                print(url)

# Example usage
scraper = Scraper("https://news.google.com/")
scraper.scrape()
```
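
Because `scrape` prints its results and needs network access, it is awkward to reuse or test. One possible variation, sketched below, separates parsing from fetching: the hypothetical function `extract_links` (its name and its `needle` parameter are assumptions, not part of the original) takes HTML that has already been retrieved and returns the matching `href` values as a list.

```python
from bs4 import BeautifulSoup


def extract_links(html, needle="topics"):
    """Return href values containing `needle` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags that have no href attribute at all.
    hrefs = [a["href"] for a in soup.find_all("a", href=True)]
    return [href for href in hrefs if needle in href]


# A small inline document stands in for a downloaded page.
sample = """
<a href="./topics/abc">Topic</a>
<a href="/about">About</a>
<a name="no-href">Anchor</a>
"""
print(extract_links(sample))  # ['./topics/abc']
```

Returning a list instead of printing lets the caller decide what to do with the links, such as storing or de-duplicating them.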

Output:

```text
./topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNRGxqTjNjd0VnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US%3Aen
./topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx1YlY4U0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen
./topics/CAAqHAgKIhZDQklTQ2pvSWJHOWpZV3hmZGpJb0FBUAE?hl=en-US&gl=US&ceid=US%3Aen
./topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx6TVdZU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen
./topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGRqTVhZU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen
./topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen
./topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRFp1ZEdvU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen
./topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRFp0Y1RjU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen
./topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNR3QwTlRFU0FtVnVLQUFQAQ?hl=en-US&gl=US&ceid=US%3Aen
./topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRFZxYUdjU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen
./topics/CAAqHAgKIhZDQklTQ2pvSWJHOWpZV3hmZGpJb0FBUAE?hl=en-US&gl=US&ceid=US%3Aen
```
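
The printed links are relative (they start with `./`), and one of them appears twice. A possible follow-up step, sketched here with the standard library's `urllib.parse.urljoin`, resolves each link against the page URL and drops duplicates while keeping the original order. The function name `absolutize` and the shortened sample paths are assumptions for illustration.

```python
from urllib.parse import urljoin


def absolutize(base_url, hrefs):
    """Resolve hrefs against base_url, dropping duplicates in order."""
    absolute = (urljoin(base_url, href) for href in hrefs)
    # dict.fromkeys keeps the first occurrence of each key, in order.
    return list(dict.fromkeys(absolute))


# Shortened, made-up paths standing in for the real output above.
hrefs = ["./topics/aaa?hl=en-US", "./topics/bbb?hl=en-US", "./topics/aaa?hl=en-US"]
for url in absolutize("https://news.google.com/", hrefs):
    print(url)
# https://news.google.com/topics/aaa?hl=en-US
# https://news.google.com/topics/bbb?hl=en-US
```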
