## Question 1: Write a Python program to extract the video URL of the first five videos.

To extract the URLs of the first five videos from a YouTube channel page such as "https://www.youtube.com/@PW-Foundation/videos", you can use web scraping. The Python libraries requests and BeautifulSoup work on the static HTML that the server returns. YouTube builds most of the video grid with JavaScript, so the static approach may return few or no links; in that case, use Selenium, which drives a real browser and reads the rendered page.

#### Using BeautifulSoup and requests

First, install the necessary libraries if you haven't already:

In [1]:
pip install requests beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Extract the video URLs
import requests
from bs4 import BeautifulSoup

# URL of the YouTube channel's videos page
url = "https://www.youtube.com/@PW-Foundation/videos"

# Send a GET request to the URL
response = requests.get(url)
response.raise_for_status()  # Ensure the request was successful

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all <a> tags that carry an href attribute
video_links = soup.find_all('a', href=True)

# Filter and collect the first five unique video URLs
video_urls = []
for link in video_links:
    href = link['href']
    full_url = f"https://www.youtube.com{href}"
    # Compare the absolute URL so that duplicates are actually detected
    if "/watch" in href and full_url not in video_urls:
        video_urls.append(full_url)
        if len(video_urls) == 5:
            break

# Print the extracted video URLs
for idx, video_url in enumerate(video_urls, start=1):
    print(f"Video {idx}: {video_url}")


#### Explanation
1. Import Libraries: Import the necessary libraries requests and BeautifulSoup.
2. Send GET Request: Send a GET request to the YouTube channel's videos page.
3. Parse HTML Content: Use BeautifulSoup to parse the HTML content of the page.
4. Find Video Links: Find all <a> tags that have an href attribute.
5. Filter and Collect URLs: Keep the links whose href contains /watch, and add the first five unique URLs to a list.
6. Print URLs: Print the extracted video URLs.
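
The filter-and-collect step can be isolated into a small helper that is testable without a network call. This is a minimal sketch, not the notebook's own code: the function name `first_unique_watch_urls` and the sample hrefs are hypothetical, and it assumes the hrefs have already been pulled out of the parsed page.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.youtube.com"

def first_unique_watch_urls(hrefs, limit=5):
    """Return up to `limit` unique absolute URLs whose href contains '/watch'."""
    seen = set()
    urls = []
    for href in hrefs:
        if "/watch" not in href:
            continue
        # urljoin resolves relative hrefs and leaves absolute URLs unchanged
        full_url = urljoin(BASE_URL, href)
        if full_url in seen:
            continue
        seen.add(full_url)
        urls.append(full_url)
        if len(urls) == limit:
            break
    return urls

# Hypothetical hrefs, including a duplicate and a non-video link
sample = ["/watch?v=abc", "/about", "/watch?v=abc", "/watch?v=def"]
print(first_unique_watch_urls(sample))
```

Tracking seen URLs in a set keeps the duplicate check fast, while the list preserves the order in which the links appear on the page.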

#### Using Selenium

If the static HTML lacks the video links because the page is rendered with JavaScript, you can use Selenium to drive a real browser and read the rendered page:

In [3]:
pip install selenium webdriver-manager


In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

# Set up the Selenium WebDriver
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

# URL of the YouTube channel's videos page
url = "https://www.youtube.com/@PW-Foundation/videos"

# Navigate to the URL
driver.get(url)

# Wait up to 10 seconds for the JavaScript-rendered video links, then collect them
video_elements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//a[@href and contains(@href, "/watch")]'))
)

# Extract the first five unique video URLs
video_urls = []
for video_element in video_elements:
    href = video_element.get_attribute('href')
    if href not in video_urls:
        video_urls.append(href)
    if len(video_urls) == 5:
        break

# Print the extracted video URLs
for idx, video_url in enumerate(video_urls, start=1):
    print(f"Video {idx}: {video_url}")

# Close the WebDriver
driver.quit()

## Question 2: Write a Python program to extract the URL of the video thumbnails of the first five videos.

### Using BeautifulSoup and requests

First, ensure you have the necessary libraries installed:

In [1]:
pip install requests beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


In [2]:
import requests
from bs4 import BeautifulSoup

# URL of the YouTube channel's videos page
url = "https://www.youtube.com/@PW-Foundation/videos"

# Send a GET request to the URL
response = requests.get(url)
response.raise_for_status()  # Ensure the request was successful

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all video thumbnail images
thumbnail_images = soup.find_all('img', {'src': True, 'alt': True})

# Filter and collect the first five unique thumbnail URLs
thumbnail_urls = []
for img in thumbnail_images:
    src = img['src']
    if 'https://i.ytimg.com/vi/' in src and src not in thumbnail_urls:
        thumbnail_urls.append(src)
        if len(thumbnail_urls) == 5:
            break

# Print the extracted thumbnail URLs
for idx, thumbnail_url in enumerate(thumbnail_urls, start=1):
    print(f"Thumbnail {idx}: {thumbnail_url}")
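
The filter above relies on thumbnails being served from `https://i.ytimg.com/vi/`. When the `<img>` tags are missing from the static HTML, a thumbnail URL can often be built from a video's watch URL instead. The sketch below is an assumption rather than part of the original notebook: the helper name `thumbnail_url_from_watch_url` is hypothetical, and the `hqdefault.jpg` file name is a commonly seen convention, not a documented guarantee.

```python
from urllib.parse import urlparse, parse_qs

def thumbnail_url_from_watch_url(watch_url, quality="hqdefault"):
    """Build a thumbnail URL from the `v` parameter of a watch URL.

    Returns None when the URL carries no video ID.
    The `<quality>.jpg` naming is an assumed convention.
    """
    query = parse_qs(urlparse(watch_url).query)
    video_ids = query.get("v")
    if not video_ids:
        return None
    return f"https://i.ytimg.com/vi/{video_ids[0]}/{quality}.jpg"

# Hypothetical watch URL used only for illustration
print(thumbnail_url_from_watch_url("https://www.youtube.com/watch?v=abc123"))
```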

### Using Selenium

If you encounter issues with dynamically loaded content, you can use Selenium:

In [3]:
pip install selenium webdriver-manager


In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

# Set up the Selenium WebDriver
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

# URL of the YouTube channel's videos page
url = "https://www.youtube.com/@PW-Foundation/videos"

# Navigate to the URL
driver.get(url)

# Wait up to 10 seconds for the JavaScript-rendered thumbnails, then collect them
thumbnail_elements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//img[@src and contains(@src, "https://i.ytimg.com/vi/")]'))
)

# Extract the first five unique thumbnail URLs
thumbnail_urls = []
for thumbnail_element in thumbnail_elements:
    src = thumbnail_element.get_attribute('src')
    if src not in thumbnail_urls:
        thumbnail_urls.append(src)
    if len(thumbnail_urls) == 5:
        break

# Print the extracted thumbnail URLs
for idx, thumbnail_url in enumerate(thumbnail_urls, start=1):
    print(f"Thumbnail {idx}: {thumbnail_url}")

# Close the WebDriver
driver.quit()

## Question 3: Write a Python program to extract the title of the first five videos.

### Using BeautifulSoup and requests
First, make sure you have the required libraries installed:

In [5]:
pip install requests beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


In [6]:
import requests
from bs4 import BeautifulSoup

# URL of the YouTube channel's videos page
url = "https://www.youtube.com/@PW-Foundation/videos"

# Send a GET request to the URL
response = requests.get(url)
response.raise_for_status()  # Ensure the request was successful

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all video title elements
title_elements = soup.find_all('a', {'id': 'video-title'})

# Extract the first five unique video titles
video_titles = []
for title in title_elements:
    # Fall back to the link text when the title attribute is missing
    video_title = title.get('title') or title.get_text(strip=True)
    if video_title and video_title not in video_titles:
        video_titles.append(video_title)
    if len(video_titles) == 5:
        break

# Print the extracted video titles
for idx, video_title in enumerate(video_titles, start=1):
    print(f"Video {idx}: {video_title}")
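
To check the title-selection logic offline, the same idea can be run against a static snippet. The sketch below swaps BeautifulSoup for the standard library's `html.parser` so that it needs no extra packages. The class name `VideoTitleParser` and the sample HTML are hypothetical and only mimic the `id="video-title"` anchors targeted above.

```python
from html.parser import HTMLParser

class VideoTitleParser(HTMLParser):
    """Collect the title attribute of every <a id="video-title"> tag."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("id") == "video-title":
            title = attr_map.get("title")
            # Skip anchors without a title and titles already collected
            if title and title not in self.titles:
                self.titles.append(title)

# Hypothetical markup: one duplicate and one anchor without a title attribute
sample_html = '''
<a id="video-title" title="Lesson 1" href="/watch?v=a1">Lesson 1</a>
<a id="video-title" title="Lesson 1" href="/watch?v=a1">Lesson 1</a>
<a id="video-title" href="/watch?v=a2">No title attribute</a>
<a id="video-title" title="Lesson 2" href="/watch?v=a3">Lesson 2</a>
'''

parser = VideoTitleParser()
parser.feed(sample_html)
first_five_titles = parser.titles[:5]
print(first_five_titles)
```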

### Using Selenium
If you encounter issues due to dynamically loaded content, you can use Selenium:

In [7]:
pip install selenium webdriver-manager

Note: you may need to restart the kernel to use updated packages.


In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

# Set up the Selenium WebDriver
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

# URL of the YouTube channel's videos page
url = "https://www.youtube.com/@PW-Foundation/videos"

# Navigate to the URL
driver.get(url)

# Wait up to 10 seconds for the JavaScript-rendered title elements, then collect them
title_elements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//*[@id="video-title"]'))
)

# Extract the first five unique video titles
video_titles = []
for title_element in title_elements:
    video_title = title_element.get_attribute('title')
    # Skip elements whose title attribute is missing or empty
    if video_title and video_title not in video_titles:
        video_titles.append(video_title)
    if len(video_titles) == 5:
        break

# Print the extracted video titles
for idx, video_title in enumerate(video_titles, start=1):
    print(f"Video {idx}: {video_title}")

# Close the WebDriver
driver.quit()

## Question 4: Write a Python program to extract the number of views of the first five videos.

### Using BeautifulSoup and requests
First, ensure you have the necessary libraries installed:

In [9]:
pip install requests beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


In [10]:
import requests
from bs4 import BeautifulSoup

# URL of the YouTube channel's videos page
url = "https://www.youtube.com/@PW-Foundation/videos"

# Send a GET request to the URL
response = requests.get(url)
response.raise_for_status()  # Ensure the request was successful

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all elements containing the view count
view_count_elements = soup.find_all('span', class_='style-scope ytd-grid-video-renderer')

# Filter and collect the first five view counts (no de-duplication: two videos can share a count)
view_counts = []
for element in view_count_elements:
    # Check if the element contains 'views'
    if 'views' in element.text:
        views_text = element.text.strip()
        view_counts.append(views_text)
        if len(view_counts) == 5:
            break

# Print the extracted view counts
for idx, view_count in enumerate(view_counts, start=1):
    print(f"Video {idx}: {view_count}")
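
The scraped values are display strings such as "1.2K views". If numbers are needed, the text can be converted afterwards. This is a minimal sketch under stated assumptions: the helper name `parse_view_count` is hypothetical, and it assumes English-language text with the K/M/B abbreviations.

```python
import re

# Multipliers for the abbreviated suffixes (assumed English-language page)
_SUFFIX_MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def parse_view_count(text):
    """Convert text such as '1.2K views' to an int; return None if it does not match."""
    match = re.search(r"(\d+(?:[.,]\d+)*)\s*([KMB]?)\s+views?", text, flags=re.IGNORECASE)
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    multiplier = _SUFFIX_MULTIPLIER[match.group(2).upper()]
    return int(round(number * multiplier))

# Hypothetical display strings used only for illustration
for sample_text in ["1.2K views", "3,456 views", "2M views", "No views"]:
    print(sample_text, "->", parse_view_count(sample_text))
```

Abbreviated counts are already rounded by the site, so the result is an approximation of the true view count.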

### Using Selenium
If the view count information is dynamically loaded, you may need to use Selenium:

In [11]:
pip install selenium webdriver-manager

Note: you may need to restart the kernel to use updated packages.


In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

# Set up the Selenium WebDriver
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

# URL of the YouTube channel's videos page
url = "https://www.youtube.com/@PW-Foundation/videos"

# Navigate to the URL
driver.get(url)

# Wait up to 10 seconds for the JavaScript-rendered view count elements, then collect them
view_count_elements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//*[@id="metadata-line"]/span[1]'))
)

# Extract the first five view counts
view_counts = []
for element in view_count_elements:
    views_text = element.text.strip()
    if "views" in views_text:
        view_counts.append(views_text)
    if len(view_counts) == 5:
        break

# Print the extracted view counts
for idx, view_count in enumerate(view_counts, start=1):
    print(f"Video {idx}: {view_count}")

# Close the WebDriver
driver.quit()

## Question 5: Write a Python program to extract the posting time of the first five videos.

### Using BeautifulSoup and requests
First, ensure you have the necessary libraries installed:

In [13]:
pip install requests beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


In [None]:
import requests
from bs4 import BeautifulSoup

# URL of the YouTube channel's videos page
url = "https://www.youtube.com/@PW-Foundation/videos"

# Send a GET request to the URL
response = requests.get(url)
response.raise_for_status()  # Ensure the request was successful

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all elements containing the time of posting
time_elements = soup.find_all('span', class_='style-scope ytd-grid-video-renderer')

# Filter and collect the first five times of posting (no de-duplication: two videos can share a time)
times_of_posting = []
for element in time_elements:
    # Check if the element contains time information (usually in the form of "X time ago")
    if 'ago' in element.text:
        time_text = element.text.strip()
        times_of_posting.append(time_text)
        if len(times_of_posting) == 5:
            break

# Print the extracted times of posting
for idx, time in enumerate(times_of_posting, start=1):
    print(f"Video {idx}: {time}")
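
The page shows relative times such as "3 days ago" rather than timestamps. If an approximate date is needed, the text can be converted afterwards. The sketch below is an assumption layered on top of the scraper: the helper name `approximate_post_time` is hypothetical, it expects English-language text, and it treats months and years as 30 and 365 days.

```python
import re
from datetime import datetime, timedelta

# One unit of each time span; month and year are rough approximations
_UNIT_TO_DELTA = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}

def approximate_post_time(text, now=None):
    """Convert text such as '3 days ago' to an approximate datetime.

    Returns None when the text does not match the expected pattern.
    """
    match = re.search(r"(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago", text)
    if not match:
        return None
    now = now or datetime.now()
    return now - int(match.group(1)) * _UNIT_TO_DELTA[match.group(2)]

# A fixed reference time keeps the example reproducible
reference = datetime(2024, 1, 31, 12, 0)
print(approximate_post_time("3 days ago", now=reference))
```

Because the site rounds the relative time, the result is only as precise as the unit shown; "2 months ago" could be off by several weeks.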


### Using Selenium
If the time of posting information is dynamically loaded, you may need to use Selenium:

In [14]:
pip install selenium webdriver-manager

Note: you may need to restart the kernel to use updated packages.


In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

# Set up the Selenium WebDriver
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

# URL of the YouTube channel's videos page
url = "https://www.youtube.com/@PW-Foundation/videos"

# Navigate to the URL
driver.get(url)

# Wait up to 10 seconds for the JavaScript-rendered time elements, then collect them
time_elements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//*[@id="metadata-line"]/span[2]'))
)

# Extract the first five times of posting
times_of_posting = []
for element in time_elements:
    time_text = element.text.strip()
    if "ago" in time_text:
        times_of_posting.append(time_text)
    if len(times_of_posting) == 5:
        break

# Print the extracted times of posting
for idx, time in enumerate(times_of_posting, start=1):
    print(f"Video {idx}: {time}")

# Close the WebDriver
driver.quit()