
# Q1. Write a Python program to extract the video URL of the first five videos.

To extract the URLs of the first five videos on a YouTube search results page, you can fetch the page source with requests and parse it with BeautifulSoup. Note that YouTube builds its results with JavaScript, so the static HTML that requests receives may contain few or no <a> tags pointing at videos; in that case the script returns fewer than five URLs or an empty list. Here's an example Python script (for educational purposes):

In [None]:
import requests
from bs4 import BeautifulSoup

# Function to extract video URLs
def get_video_urls(query):
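    # YouTube search URL; passing the query via params lets requests URL-encode it
    # (spaces, '&', '#', and non-ASCII characters included)
    url = "https://www.youtube.com/results"

    # Send a request to YouTube; fail on HTTP errors instead of parsing an error page
    response = requests.get(url, params={"search_query": query}, timeout=10)
    response.raise_for_status()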

    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all links; video links are the ones whose href starts with "/watch?v="
    video_links = soup.find_all('a', href=True)

    # Extract the first 5 unique video URLs
    video_urls = []
    for link in video_links:
        href = link['href']
        full_url = f"https://www.youtube.com{href}"
        if href.startswith("/watch?v=") and full_url not in video_urls:
            video_urls.append(full_url)
            if len(video_urls) == 5:  # Break once we have 5 videos
                break

    return video_urls

# Example usage
search_query = "Python programming"
video_urls = get_video_urls(search_query)

# Print the extracted video URLs
for i, url in enumerate(video_urls, start=1):
    print(f"Video {i}: {url}")

In [None]:
pip install requests beautifulsoup4

# Explanation:
1. requests.get: Fetches the HTML of the search results page.
2. BeautifulSoup: Parses the HTML and finds all <a> tags (links).
3. URL filtering: Keeps only links whose href starts with /watch?v=, the path YouTube uses for video pages.
4. List of URLs: Collects the first five unique URLs.
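
YouTube's results page appears to embed its search data as a JSON object assigned to a variable named ytInitialData inside a <script> tag. The sketch below shows how video URLs could be pulled from that embedded data when the <a> tags are missing from the static HTML. It assumes that marker and the videoId key name, both of which are undocumented and may change, and the helper names are hypothetical:

```python
import json

# Assumed marker that precedes the embedded JSON (undocumented, may change)
MARKER = "ytInitialData = "


def extract_embedded_json(html, marker=MARKER):
    """Return the JSON object that follows `marker` in the page, or None."""
    start = html.find(marker)
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON document and ignores the trailing text
        data, _ = json.JSONDecoder().raw_decode(html[start + len(marker):])
    except json.JSONDecodeError:
        return None
    return data


def find_values(node, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def first_video_urls(html, limit=5):
    """Build up to `limit` unique watch URLs from the embedded video IDs."""
    urls = []
    # "videoId" is an assumed key name for the video identifier
    for video_id in find_values(extract_embedded_json(html), "videoId"):
        if not isinstance(video_id, str):
            continue
        url = f"https://www.youtube.com/watch?v={video_id}"
        if url not in urls:
            urls.append(url)
            if len(urls) == limit:
                break
    return urls


# Synthetic page used only to illustrate the parsing step (no network access)
sample_page = (
    '<script>var ytInitialData = {"contents": ['
    '{"videoRenderer": {"videoId": "abc"}}, '
    '{"videoRenderer": {"videoId": "def"}}]};</script>'
)
print(first_video_urls(sample_page))
```

To try it on a real page, pass response.text from the request above to first_video_urls; an empty list means the assumed marker or key was not found.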

# Q2. Write a Python program to extract the URL of the video thumbnails of the first five videos.

To extract the thumbnail URLs of the first five videos from a YouTube search result page, we can modify the previous program to target images instead of links. On YouTube, thumbnails are served from the i.ytimg.com host and usually follow a fixed URL pattern, so the script looks for img tags whose src attribute points at that host.

Below is an example Python script that extracts the thumbnail URLs:

# Python Code:

In [None]:
import requests
from bs4 import BeautifulSoup

# Function to extract thumbnail URLs
def get_thumbnail_urls(query):
    # YouTube search URL; passing the query via params lets requests URL-encode it
    url = "https://www.youtube.com/results"

    # Send a request to YouTube; fail on HTTP errors instead of parsing an error page
    response = requests.get(url, params={"search_query": query}, timeout=10)
    response.raise_for_status()

    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all image tags containing thumbnails
    thumbnails = soup.find_all('img')

    # Extract the first 5 unique thumbnail URLs
    thumbnail_urls = []
    for thumb in thumbnails:
        src = thumb.get('src')
        if src and src.startswith("https://i.ytimg.com") and src not in thumbnail_urls:
            thumbnail_urls.append(src)
            if len(thumbnail_urls) == 5:  # Break once we have 5 thumbnails
                break

    return thumbnail_urls

# Example usage
search_query = "Python programming"
thumbnail_urls = get_thumbnail_urls(search_query)

# Print the extracted thumbnail URLs
for i, url in enumerate(thumbnail_urls, start=1):
    print(f"Thumbnail {i}: {url}")

In [None]:
pip install requests beautifulsoup4
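
The thumbnail URL pattern can also be used directly, without scraping img tags. This sketch builds a thumbnail URL from a video URL, assuming thumbnails follow the commonly seen https://i.ytimg.com/vi/<video id>/hqdefault.jpg layout. That layout is an assumption, not a documented guarantee, and the helper names are hypothetical:

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_url(video_url):
    """Return the value of the `v` query parameter, or None if it is absent."""
    ids = parse_qs(urlparse(video_url).query).get("v", [])
    return ids[0] if ids else None


def thumbnail_url(video_url, quality="hqdefault"):
    """Build a thumbnail URL from a watch URL (assumed URL layout)."""
    video_id = video_id_from_url(video_url)
    if video_id is None:
        return None
    # Assumed pattern: https://i.ytimg.com/vi/<video id>/<quality>.jpg
    return f"https://i.ytimg.com/vi/{video_id}/{quality}.jpg"


print(thumbnail_url("https://www.youtube.com/watch?v=abc123"))
```

This approach depends only on the video URLs from Q1, so it still works when the img tags are not present in the static HTML.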

# Q3. Write a Python program to extract the title of the first five videos.

To extract the titles of the first five videos from a YouTube search result page, we can modify the previous scraping approach to target the video titles. On YouTube, a video's title usually appears in the title attribute of the <a> tag that links to the video, often nested inside an <h3>. Here's an example Python script to extract the video titles:

# Python Code:

In [None]:
import requests
from bs4 import BeautifulSoup

# Function to extract video titles
def get_video_titles(query):
    # YouTube search URL; passing the query via params lets requests URL-encode it
    url = "https://www.youtube.com/results"

    # Send a request to YouTube; fail on HTTP errors instead of parsing an error page
    response = requests.get(url, params={"search_query": query}, timeout=10)
    response.raise_for_status()

    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

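    # Find links that have both a title and an href attribute
    titles = soup.find_all('a', href=True, title=True)

    # Extract the titles of the first 5 unique videos
    video_titles = []
    seen_links = set()
    for title in titles:
        href = title['href']
        # Skip non-video links and repeated links to the same video;
        # deduplicating by link keeps different videos that share a title
        if not href.startswith("/watch?v=") or href in seen_links:
            continue
        seen_links.add(href)
        video_titles.append(title['title'])
        if len(video_titles) == 5:  # Break once we have 5 titles
            break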

    return video_titles

# Example usage
search_query = "Python programming"
video_titles = get_video_titles(search_query)

# Print the extracted video titles
for i, title in enumerate(video_titles, start=1):
    print(f"Video {i}: {title}")

# Requirements:


In [None]:
pip install requests beautifulsoup4
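
The filtering step can be checked offline on a small HTML string. This sketch uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs. The class and function names are hypothetical, and the sample markup is made up for illustration:

```python
from html.parser import HTMLParser


class WatchLinkTitleParser(HTMLParser):
    """Collect (title, href) pairs from <a> tags that link to videos."""

    def __init__(self, limit=5):
        super().__init__()
        self.limit = limit
        self.titles = []      # (title, href) pairs in document order
        self._seen = set()    # hrefs already collected

    def handle_starttag(self, tag, attrs):
        if tag != "a" or len(self.titles) >= self.limit:
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        title = attributes.get("title")
        # Keep only video links that carry a title and were not seen before
        if title and href.startswith("/watch?v=") and href not in self._seen:
            self._seen.add(href)
            self.titles.append((title.strip(), href))


def titles_from_html(html, limit=5):
    """Return up to `limit` (title, href) pairs found in `html`."""
    parser = WatchLinkTitleParser(limit)
    parser.feed(html)
    return parser.titles


# Made-up markup: one non-video link, one repeated video link, two videos
sample = (
    '<a href="/about" title="About">About</a>'
    '<a href="/watch?v=a1" title="Intro to Python">thumbnail</a>'
    '<a href="/watch?v=a1" title="Intro to Python">Intro to Python</a>'
    '<a href="/watch?v=b2" title="Python for Data">Python for Data</a>'
)
for title, href in titles_from_html(sample):
    print(title, href)
```

Pairing each title with its link makes it possible to deduplicate by video rather than by title text.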

# Q4. Write a Python program to extract the number of views of the first five videos.

To extract the number of views for the first five videos from a YouTube search result page, we need to identify where the view count is stored in the HTML structure. Typically, YouTube displays the view count near the title or metadata of the video.

We can scrape this information from the HTML using BeautifulSoup. Below is an example Python script that extracts the view count for the first five videos:


In [None]:
import requests
from bs4 import BeautifulSoup

# Function to extract the view counts of the first five videos
def get_video_views(query):
    # YouTube search URL; passing the query via params lets requests URL-encode it
    url = "https://www.youtube.com/results"

    # Send a request to YouTube; fail on HTTP errors instead of parsing an error page
    response = requests.get(url, params={"search_query": query}, timeout=10)
    response.raise_for_status()

    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # The view count is part of each video's metadata block
    video_views = []

    # Get all divs with class 'style-scope ytd-video-meta-block' which often contains the view count
    views_divs = soup.find_all('div', class_="style-scope ytd-video-meta-block")

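    # Extract the view counts of the first 5 videos.
    # Counts are not deduplicated: two different videos can show the same view count.
    for div in views_divs:
        # Join child texts with a space so "1.2M views" and "2 years ago" stay separate
        views = div.get_text(" ", strip=True)
        if "views" in views.lower():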
            video_views.append(views)
            if len(video_views) == 5:  # Break once we have 5 view counts
                break

    return video_views

# Example usage
search_query = "Python programming"
video_views = get_video_views(search_query)

# Print the extracted video view counts
for i, views in enumerate(video_views, start=1):
    print(f"Video {i}: {views}")
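
The scraped view counts are display strings such as "1.2M views", which cannot be compared or sorted numerically. The sketch below shows one way to convert them to integers, assuming the English display format with optional K, M, or B suffixes. The function name is hypothetical and the result is approximate for abbreviated counts:

```python
import re

# Multipliers for the abbreviated suffixes (assumed English display format)
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

# A number, an optional K/M/B suffix, then the word "view" or "views"
_VIEW_RE = re.compile(r"([\d.,]+)\s*([KMB])?\s*views?", re.IGNORECASE)


def parse_view_count(text):
    """Convert text such as '1.2M views' to an integer, or return None."""
    match = _VIEW_RE.search(text)
    if not match:
        return None
    number, suffix = match.groups()
    try:
        value = float(number.replace(",", ""))
    except ValueError:
        return None
    multiplier = _SUFFIXES[suffix.upper()] if suffix else 1
    return int(round(value * multiplier))


for sample in ["1.2M views", "45K views", "1,234 views", "No views"]:
    print(sample, "->", parse_view_count(sample))
```

Strings without a numeric count, such as "No views", yield None so the caller can decide how to treat them.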

# Q5. Write a Python program to extract the posting time of the first five videos.

To extract the time of posting (e.g., "2 days ago", "1 month ago") for the first five videos from a YouTube search result page, you can use an approach similar to the previous tasks. The posting time is usually found in <span> or <div> elements of the video's metadata line.

Here’s a Python program to extract the time of posting for the first five videos:

# Python Code:

In [None]:
import requests
from bs4 import BeautifulSoup

# Function to extract the time of posting for the first five videos
def get_video_posting_times(query):
    try:
        # YouTube search URL; passing the query via params lets requests URL-encode it
        url = "https://www.youtube.com/results"

        # Send a request to YouTube
        response = requests.get(url, params={"search_query": query}, timeout=10)
        response.raise_for_status()  # Raise an exception for a bad status code

        # Parse the HTML content
        soup = BeautifulSoup(response.text, 'html.parser')

        # Find all spans in the video metadata blocks
        spans = soup.find_all('span', class_='style-scope ytd-video-meta-block')

        # Extract the first 5 posting times (e.g., "2 days ago")
        posting_times = []
        for span in spans:
            text = span.get_text(strip=True)
            if 'ago' in text:  # "ago" marks a relative posting time
                posting_times.append(text)
                if len(posting_times) == 5:  # Break once we have 5 posting times
                    break

        # Report a short result instead of discarding the times that were found
        if len(posting_times) < 5:
            print(f"Only {len(posting_times)} posting times found.")

        return posting_times

    except requests.exceptions.RequestException as e:
        print(f"Error fetching the page: {e}")
        return []

# Example usage
search_query = "Python programming"
posting_times = get_video_posting_times(search_query)

# Print the extracted posting times
for i, posting_time in enumerate(posting_times, start=1):
    print(f"Video {i} Posted: {posting_time}")
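
Relative strings such as "2 days ago" can be turned into approximate timestamps for sorting or filtering. The sketch below assumes the English "<number> <unit> ago" format and treats a month as 30 days and a year as 365 days, so the result is only an estimate. The function name is hypothetical:

```python
import re
from datetime import datetime, timedelta

# Approximate length of each unit (months and years are rough estimates)
_UNITS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}

_RELATIVE_RE = re.compile(
    r"(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago",
    re.IGNORECASE,
)


def parse_relative_time(text, now=None):
    """Estimate the posting datetime from text such as '2 days ago'."""
    match = _RELATIVE_RE.search(text)
    if not match:
        return None
    amount = int(match.group(1))
    unit = match.group(2).lower()
    if now is None:
        now = datetime.now()
    return now - amount * _UNITS[unit]


reference = datetime(2024, 1, 10)
for sample in ["2 days ago", "Streamed 3 weeks ago", "1 month ago"]:
    print(sample, "->", parse_relative_time(sample, now=reference))
```

Passing an explicit now value keeps the conversion reproducible; without it the current local time is used.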