# Simple Web Scraper: Fetching News Headlines

This project demonstrates basic web scraping with two popular Python libraries, **requests** and **BeautifulSoup**. The goal is to extract the current headlines from a target news website.

### Python Concepts Demonstrated:
* **HTTP Requests:** Using the `requests` library to fetch HTML content.
* **HTML Parsing:** Utilizing `BeautifulSoup` to navigate and search the DOM structure.
* **Error Handling:** Implementing `try...except` and `response.raise_for_status()` for robust network error management.

---
### ⚠️ Ethical Note on Scraping
Always check a website's `robots.txt` file (e.g., `https://www.news24.com/robots.txt`) before scraping. Excessive requests can burden a server, and violating terms of service may lead to legal issues. This script is intended for educational purposes only.
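
As a minimal sketch of that check, Python's standard `urllib.robotparser` module can evaluate robots.txt rules before any page is requested. The rules below are a hypothetical sample embedded as a string so the example runs offline; they are not the real news24.com rules, and `build_parser` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of downloading it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a generic crawler ("*") may fetch each URL.
home_allowed = parser.can_fetch("*", "https://www.news24.com/")
private_allowed = parser.can_fetch("*", "https://www.news24.com/private/page")

# Seconds the site asks crawlers to wait between requests (None if unset).
delay = parser.crawl_delay("*")

print(f"Home page allowed: {home_allowed}")
print(f"Private page allowed: {private_allowed}")
print(f"Requested crawl delay: {delay}")
```

Against a live site, calling `set_url()` with the robots.txt address and then `read()` on the parser would load the real rules instead of the sample string.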

In [1]:
import requests
from bs4 import BeautifulSoup

# --- Scraper Logic ---

try:
    # 1. URL of the website to scrape
    url = "https://www.news24.com"
    print(f"Attempting to connect to: {url}")

    # 2. Send an HTTP GET request to the url
    response = requests.get(url, timeout=10)  # Stop waiting if the server does not respond within 10 seconds

    # Check for HTTP errors (4xx or 5xx status codes)
    response.raise_for_status()

    # 3. Parse the HTML content of the page
    soup = BeautifulSoup(response.content, "html.parser")
    print("Successfully fetched and parsed HTML content.")

    # 4. Find specific tags with a specific class
    # NOTE: The class 'example-class' is a placeholder. 
    # To extract real data, you must inspect the target website's HTML 
    # to find the correct CSS class for headlines/articles.
    target_tags = soup.find_all("div", class_="example-class")

    # 5. Print the tags
    if target_tags:
        print(f"\n--- Found {len(target_tags)} Tags ---")
        for tag in target_tags:
            # We print the stripped text to show just the content, not the HTML
            print(tag.get_text(strip=True)) 

    else:
        # This will be the result if 'example-class' is wrong
        print("\nNo tags found with the class 'example-class'. Try a different class!")

# Network-level failures: connection errors, timeouts, and HTTP error statuses
except requests.exceptions.RequestException as e:
    print(f"\nAn error occurred during the request: {e}")
    print("Check your internet connection or the URL.")
except Exception as e:
    print(f"\nAn unexpected error occurred: {e}")

# We don't need a separate run function since we execute the logic directly.

Attempting to connect to: https://www.news24.com
Successfully fetched and parsed HTML content.

No tags found with the class 'example-class'. Try a different class!


### 🔧 Next Steps: Targeting Real Headlines

The previous cell ran without errors but found no matching tags, because the class name `'example-class'` is a placeholder. To extract actual headlines, follow these steps:

1.  **Open the Target Site:** Go to `https://www.news24.com` in your browser.
2.  **Inspect Element:** Right-click on a headline you want to scrape.
3.  **Find the Selector:** Use the browser's Developer Tools to identify the HTML tag and, more importantly, the **CSS class** (e.g., `h3` with class `article-title`).
4.  **Update the Code:** Once you find the real class (e.g., `article-title`), you must replace the placeholder:

    ```python
    # CHANGE THIS LINE:
    target_tags = soup.find_all("div", class_="example-class")

    # TO SOMETHING LIKE THIS (example only; use the tag and class you found in step 3):
    # target_tags = soup.find_all("h3", class_="article-title")
    ```
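
The replacement above can be tried offline before pointing it at the live site. The sketch below runs the same `find_all` call against a small hand-written HTML snippet; the markup, the `article-title` class, and the `extract_headlines` helper are all assumptions for illustration, not the structure of the real site.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (hypothetical; the real page will differ).
SAMPLE_HTML = """
<html><body>
  <h3 class="article-title"><a href="/story-1">First sample headline</a></h3>
  <h3 class="article-title"><a href="/story-2">  Second sample headline </a></h3>
  <div class="advert">Not a headline</div>
</body></html>
"""


def extract_headlines(html: str, tag: str, css_class: str) -> list[str]:
    """Return the stripped text of every `tag` element carrying `css_class`."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.find_all(tag, class_=css_class)]


headlines = extract_headlines(SAMPLE_HTML, "h3", "article-title")
for headline in headlines:
    print(headline)
```

Keeping the tag and class as parameters means only the call site changes once the real selector is known; the parsing logic stays the same.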

## Author & Project Attribution

This scraper project demonstrates network requests, HTML parsing, and error handling, which are fundamental skills in data engineering and automation.

| Detail | Information |
| :--- | :--- |
| **Author** | Lindokuhle Hlatshwayo |
| **Date Completed** | 07/11/2025 |
| **Email** | lindokuhlecebisa7@gmail.com |
| **GitHub** | https://github.com/lindokuhlecebisa/Lindokuhle-Hlatshwayo--Portfolio |