# **Web Scraping and API Calling in Python**

---

### **1. Setup and Importing Libraries**
We need to install and import the necessary Python libraries for web scraping and API calls:
- **`requests`**: For making HTTP requests to websites and APIs.
- **`BeautifulSoup`**: For parsing and navigating HTML data.
- **`pandas`**: For data manipulation and exporting data.
- **`time`**: To introduce delays between requests.



In [1]:
# Install the required libraries (run this only if the libraries are not already installed)
!pip install requests beautifulsoup4 pandas



In [3]:
# Importing Libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import json
import time

### **2. Real-World Web Scraping Example: IMDb Top 250 Movies**

We’ll scrape the **IMDb Top 250 Movies** to fetch:
- Movie Titles
- Release Years
- IMDb Ratings

In [5]:
import requests  # Import library to make HTTP requests
from bs4 import BeautifulSoup  # Import library to parse HTML
import pandas as pd  # Import pandas for data manipulation

def scrape_imdb_top_250():  # Define main scraping function
    url = "https://www.imdb.com/chart/top/"  # Define IMDb top movies URL

    # Set headers to mimic browser request and avoid being blocked
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    }

    try:
        # Send GET request to URL with specified headers and timeout
        response = requests.get(url, headers=headers, timeout=10)
        # Raise exception for bad HTTP responses
        response.raise_for_status()

        # Parse HTML content using BeautifulSoup
        soup = BeautifulSoup(response.text, 'html.parser')

        # Select all movie list items using CSS selector
        rows = soup.select('ul.ipc-metadata-list li.ipc-metadata-list-summary-item')

        movies = []  # Initialize empty list to store movie data
        for row in rows[:250]:  # Iterate through first 250 movies
            # Extract movie title, removing ranking number
            title_elem = row.select_one('h3.ipc-title__text')
            title = title_elem.text.split('. ', 1)[1] if title_elem else "N/A"

            # Extract movie year from metadata
            year_elem = row.select_one('.cli-title-metadata .cli-title-metadata-item')
            year = year_elem.text if year_elem else "N/A"

            # Extract movie rating
            rating_elem = row.select_one('span.ipc-rating-star--imdb')
            rating = rating_elem.text.split()[0] if rating_elem else "N/A"

            # Append movie details to movies list
            movies.append({
                'Title': title,
                'Year': year,
                'Rating': rating
            })

        # Convert movies list to pandas DataFrame
        movies_df = pd.DataFrame(movies)
        # Save DataFrame to CSV file
        movies_df.to_csv('imdb_top_250.csv', index=False)

        return movies_df

    except requests.RequestException as e:
        # Print error message if request fails
        print(f"Error fetching IMDb data: {e}")
        return pd.DataFrame()

# Run the scraper if script is executed directly
if __name__ == "__main__":
    top_movies = scrape_imdb_top_250()  # Call scraping function
    print("Top 5 Movies:")  # Print first 5 movies
    print(top_movies.head())
    print("Data saved to imdb_top_250.csv")  # Confirm CSV save

Top 5 Movies:
                      Title  Year Rating
0  The Shawshank Redemption  1994    9.3
1             The Godfather  1972    9.2
2           The Dark Knight  2008    9.0
3    The Godfather: Part II  1974    9.0
4              12 Angry Men  1957    9.0
Data saved to imdb_top_250.csv


### **3. Real-World Web Scraping Example: Books to Scrape**
We’ll scrape data from **Books to Scrape**:
- Book Titles
- Prices
- Availability Status

In [7]:
# URL for the "Fiction" books category
books_url = "http://books.toscrape.com/catalogue/category/books/fiction_10/index.html"

# Fetch webpage content
response = requests.get(books_url)
soup = BeautifulSoup(response.content, 'html.parser')

# Extract book details
books = []
for book in soup.select('article.product_pod'):
    title = book.h3.a['title']
    price = book.select_one('.price_color').text
    availability = book.select_one('.availability').text.strip()
    books.append({'Title': title, 'Price': price, 'Availability': availability})

# Save to DataFrame
books_df = pd.DataFrame(books)
print("Sample Data:")
print(books_df.head())

# Save to CSV
books_df.to_csv('books_data.csv', index=False)
print("Books data saved to books_data.csv")

Sample Data:
                                               Title   Price Availability
0                                         Soumission  £50.10     In stock
1                        Private Paris (Private #10)  £47.61     In stock
2                       We Love You, Charlie Freeman  £50.27     In stock
3                                             Thirst  £17.27     In stock
4  The Murder That Never Was (Forensic Instincts #5)  £54.11     In stock
Books data saved to books_data.csv


### **4. Real-World API Example: OpenWeatherMap**
We will use the **OpenWeatherMap API** to fetch:
- Real-time weather data for cities like London, New York, and Tokyo.
- Data includes Temperature, Weather Description, and Humidity.

**Note:** Replace `"your_api_key_here"` with your OpenWeatherMap API Key.

In [9]:
import requests  # Import library for making HTTP requests
import pandas as pd  # Import pandas for data manipulation
from google.colab import userdata # Import the userdata module to access user secrets.


def get_coordinates(city):
    """Geocoding API to convert city names to coordinates"""
    # Define Geocoding API endpoint
    geocode_url = "http://api.openweathermap.org/geo/1.0/direct"
    # API key for authentication
    api_key = userdata.get('OPENWEATHERMAP_API_KEY')

    # Parameters for geocoding request
    params = {
        'q': city,  # City name
        'limit': 1,  # Limit to first result
        'appid': api_key  # API authentication
    }

    try:
        # Send GET request to geocoding API
        response = requests.get(geocode_url, params=params)
        # Raise exception for bad HTTP responses
        response.raise_for_status()
        # Parse JSON response
        data = response.json()

        # Check if coordinates found
        if data:
            # Return dictionary with latitude and longitude
            return {
                'lat': data[0]['lat'],
                'lon': data[0]['lon']
            }
        else:
            # Print error if no coordinates found
            print(f"No coordinates found for {city}")
            return None

    except requests.RequestException as e:
        # Handle any request-related errors
        print(f"Geocoding error for {city}: {e}")
        return None

def fetch_weather_data(cities):
    """Fetch weather data for given cities using coordinates"""
    # Weather API endpoint
    weather_url = "https://api.openweathermap.org/data/2.5/weather"
    # API key
    api_key = userdata.get('OPENWEATHERMAP_API_KEY')

    # List to store weather data
    weather_data = []

    # Iterate through each city
    for city in cities:
        # Get coordinates for the city
        coords = get_coordinates(city)

        # Proceed if coordinates found
        if coords:
            # Parameters for weather API request
            params = {
                'lat': coords['lat'],  # Latitude
                'lon': coords['lon'],  # Longitude
                'appid': api_key,  # API authentication
                'units': 'metric'  # Temperature in Celsius
            }

            try:
                # Send GET request to weather API
                response = requests.get(weather_url, params=params)
                # Raise exception for bad HTTP responses
                response.raise_for_status()
                # Parse JSON response
                data = response.json()

                # Append weather data to list
                weather_data.append({
                    'City': city,
                    'Latitude': coords['lat'],
                    'Longitude': coords['lon'],
                    'Temperature (C)': data['main']['temp'],
                    'Feels Like (C)': data['main']['feels_like'],
                    'Weather': data['weather'][0]['description'],
                    'Humidity (%)': data['main']['humidity'],
                    'Wind Speed (m/s)': data['wind']['speed']
                })

            except requests.RequestException as e:
                # Handle any request-related errors
                print(f"Weather data error for {city}: {e}")

    # Return collected weather data
    return weather_data

def main():
    # List of cities to fetch weather for
    cities = ["London", "New York", "Tokyo"]

    # Fetch weather data for cities
    weather_data = fetch_weather_data(cities)

    # Process and save data if retrieved
    if weather_data:
        # Convert to pandas DataFrame
        weather_df = pd.DataFrame(weather_data)
        # Print weather data
        print("Weather Data:")
        print(weather_df)

        # Save to CSV file
        weather_df.to_csv('weather_data.csv', index=False)
        print("Weather data saved to weather_data.csv")
    else:
        # Print message if no data retrieved
        print("No weather data retrieved.")

# Run main function if script is executed directly
if __name__ == "__main__":
    main()

ModuleNotFoundError: No module named 'google.colab'

### **5. Handling Errors in API Requests**

Error handling is critical for robust programs. Here, we handle:
- **Network issues**: Using `try-except` blocks around HTTP requests.
- **Invalid responses**: Using `response.raise_for_status()` to catch HTTP errors.
- **No data**: Checking if the API returns valid data before proceeding.

In [None]:
try:
    # Example with an invalid city
    city = "InvalidCity"
    api_key = userdata.get('OPENWEATHERMAP_API_KEY')
    params = {'q': city, 'appid': api_key}
    response = requests.get("http://api.openweathermap.org/geo/1.0/direct", params=params)
    response.raise_for_status()  # Raise an error for bad HTTP responses

    data = response.json()
    if not data:
        print(f"No data found for city: {city}")
    else:
        print(f"Geocoding data for {city}: {data}")
except requests.exceptions.HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
except requests.exceptions.RequestException as req_err:
    print(f"Request error occurred: {req_err}")


No data found for city: InvalidCity


### **6. Respecting API Rate Limits and Adding Delays**

When making repeated requests (e.g., to scrape multiple pages or fetch data for multiple cities), respect rate limits using the `time.sleep()` function to introduce delays.




In [None]:
import time

# Example with delays
cities = ["London", "New York", "Tokyo"]
for city in cities:
    print(f"Fetching weather data for {city}...")
    # Add a delay of 2 seconds between API calls
    time.sleep(2)
    # Fetch weather data (code here calls the API functions written earlier)
    print(f"Data for {city} fetched!")


Fetching weather data for London...
Data for London fetched!
Fetching weather data for New York...
Data for New York fetched!
Fetching weather data for Tokyo...
Data for Tokyo fetched!


### **7. Exporting and Analyzing the Data**


The retrieved weather data can be saved in a CSV file for further analysis using tools like Pandas. This allows us to:
- Store structured data for future use.
- Analyze or visualize the data with other Python libraries or BI tools.



In [None]:
# Load and display the saved weather data
try:
    weather_df = pd.read_csv('weather_data.csv')
    print("Loaded Weather Data:")
    print(weather_df.head())
except FileNotFoundError:
    print("Weather data CSV file not found. Run the script to generate it first.")


Loaded Weather Data:
       City   Latitude   Longitude  Temperature (C)  Feels Like (C)  \
0    London  51.507322   -0.127647             8.35            6.47   
1  New York  40.712728  -74.006015             1.24           -4.61   
2     Tokyo  35.682839  139.759455             9.68            7.80   

            Weather  Humidity (%)  Wind Speed (m/s)  
0   overcast clouds            76              3.09  
1  scattered clouds            40              7.72  
2         clear sky            53              3.60  


### **8. Conclusion**

#### **Text Cell**:
**Key Takeaways**:
- **Web Scraping**: Extract structured data using `requests` and `BeautifulSoup`.
- **API Calling**: Use `requests` to interact with RESTful APIs.
- **Error Handling**: Catch and handle common errors to make your program robust.
- **Rate Limits**: Respect API usage guidelines with delays.
- **Data Export**: Save data in CSV format for further use and analysis.

This notebook provided examples using:
1. The **OpenWeatherMap API** for weather and geocoding data.
2. **Error handling** and **rate limits** for safe and efficient API use.
