<a href="https://colab.research.google.com/github/raynerz/niche_research/blob/main/Niche_Research.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# https://github.com/googleapis/google-api-python-client
!pip install google-api-python-client

# Build the YouTube API client


In [2]:
from googleapiclient.discovery import build

api_key = 'Replace this with your own API key'  # TODO

youtube = build('youtube', 'v3', developerKey=api_key)
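
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative is to read the key from an environment variable. The helper name `load_api_key` and the variable name `YOUTUBE_API_KEY` are assumptions made for this example, not part of the original notebook or of the Google client library.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable.

    Both the helper and the default variable name are hypothetical choices
    for this sketch; use whatever name fits your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

With this in place, the client could be built as `build('youtube', 'v3', developerKey=load_api_key())`.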

# Functions

## 1. Retrieve video IDs from a YouTube search query

In [3]:
def get_video_ids(youtube, query, published_after, region_code, video_duration, max_results):
    """
    Retrieve video IDs from YouTube based on a specified search query and filtering parameters.

    Parameters:
    - youtube: The YouTube Data API service obtained by calling build('youtube', 'v3', developerKey=api_key).
    - query: A string representing the search query for which videos are to be retrieved.
    - published_after: A string representing the date and time in ISO 8601 format to filter videos published after this time.
    - region_code: A string representing the region code to focus the search results on a specific region.
    - video_duration: A string representing the duration filter for videos, such as 'any', 'long', 'medium', or 'short'.
    - max_results: An integer representing the maximum number of video IDs to retrieve (the search endpoint returns at most 50 per request).

    Returns:
    - A list of video IDs obtained from the search results.
    """
    # Run the search for videos only, ordered by view count (highest first)
    response = youtube.search().list(
        q=query,
        part='snippet',
        type="video",
        order="viewCount",
        publishedAfter=published_after,
        regionCode=region_code,
        videoDuration=video_duration,
        maxResults=max_results
    ).execute()

    # Process the search results
    video_ids = []
    for item in response['items']:
        if item['id']['kind'] == 'youtube#video':
            video_ids.append(item['id']['videoId'])

    return video_ids
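
The `published_after` argument has to be a timestamp string such as `2023-08-01T00:00:00Z`. Rather than typing it by hand, it could be derived from a relative offset. The helper below is a sketch only; the name `published_after_days_ago` and its `now` parameter are assumptions introduced for illustration.

```python
from datetime import datetime, timedelta, timezone


def published_after_days_ago(days, now=None):
    """Return a UTC timestamp string for `days` days before `now`.

    `now` must be a timezone-aware datetime; it defaults to the current
    time and exists mainly to make the helper easy to test.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now.astimezone(timezone.utc) - timedelta(days=days)
    # Same shape as the timestamp used in the Usage section below
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, `published_after_days_ago(90)` yields a value suitable for "videos from roughly the last three months".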


## 2. Retrieve detailed information for the video IDs returned by the search

In [4]:
def get_video_info(youtube, video_ids):
    """
    Retrieve detailed information for a list of YouTube videos based on their video IDs.

    Parameters:
    - youtube: The YouTube Data API service obtained by calling build('youtube', 'v3', developerKey=api_key).
    - video_ids: A list of strings containing the video IDs for which information is to be retrieved.

    Returns:
    - A list of dictionaries containing detailed information for each video, including its ID, title, duration in ISO 8601 format, publication date, and view count.
    """

    # Convert list of video IDs to a comma-separated string
    video_ids_str = ','.join(video_ids)

    # Make a request to the videos.list endpoint with the video IDs, 'snippet', 'contentDetails', and 'statistics' parts
    video_request = youtube.videos().list(
        part='snippet,contentDetails,statistics',
        id=video_ids_str
    )

    response = video_request.execute()

    video_info = []

    # Process the response
    for item in response['items']:
        video_id = item['id']
        title = item['snippet']['title']
        published_at = item['snippet']['publishedAt']
        duration_iso8601 = item['contentDetails']['duration']
        views = item['statistics'].get('viewCount', 'Not available')

        video_data = {
            'video_id': video_id,
            'title': title,
            'duration': duration_iso8601,
            'published_at': published_at,
            'views': views
        }

        video_info.append(video_data)

    return video_info
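
The `duration` field comes back as an ISO 8601 string (for example `PT1H50M56S`) and `views` as a string, so neither can be sorted or compared numerically as-is. The sketch below converts both. The helper names are assumptions made for this example, and the parser only covers the day, hour, minute, and second components; other ISO 8601 forms (weeks, months, fractions) are deliberately not handled.

```python
import re

# Matches durations such as PT21M24S, PT1H50M, or P1DT2H (days are optional)
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration):
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unsupported duration format: {duration!r}")
    # Components missing from the string are treated as zero
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


def views_as_int(views):
    """Return the view count as an int, or None for the 'Not available' placeholder."""
    try:
        return int(views)
    except (TypeError, ValueError):
        return None
```

A list returned by `get_video_info` could then be sorted with, for instance, `sorted(video_info, key=lambda v: duration_to_seconds(v['duration']))`.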

## 3. Pretty-print video data as a Markdown table

In [5]:
def print_video_data_as_markdown_table(video_info):
  """Print the video data as a Markdown table with padded, aligned columns."""
  # max() fails on an empty sequence, so handle the empty case up front
  if not video_info:
      print("No videos to display.")
      return

  columns = ['video_id', 'views', 'duration', 'published_at', 'title']

  # Column width: the longest value in the column, or the header if it is longer
  max_lengths = {
      column: max(len(column), *(len(str(data[column])) for data in video_info))
      for column in columns
  }

  # Print the header row and end the line before the data rows start
  print("|", end=" ")
  for column, length in max_lengths.items():
      print(f"{column.capitalize().ljust(length)} |", end=" ")
  print()

  # Print the separator row that Markdown requires under the header
  print("|", end=" ")
  for length in max_lengths.values():
      print(f"{'-' * length} |", end=" ")
  print()

  # Print table rows
  for video_data in video_info:
      print("|", end=" ")
      for column, length in max_lengths.items():
          value = str(video_data[column]).ljust(length)
          print(f"{value} |", end=" ")
      print()
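
Video titles can themselves contain `|` characters, which a Markdown renderer reads as extra column separators. One way to guard against this, sketched below, is to escape each value before it is padded and printed. The helper name `escape_markdown_cell` is an assumption for this example, and backslash-escaping pipes follows the GitHub Flavored Markdown convention; other renderers may behave differently.

```python
def escape_markdown_cell(value):
    """Escape pipes and flatten line breaks so the text stays inside one table cell."""
    text = str(value).replace("|", "\\|")
    # A table row must fit on a single line, so join any line breaks with spaces
    return " ".join(text.splitlines())
```

Applying it to every value before computing the column widths keeps the padding consistent with what is actually printed.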

# Usage

In [6]:
video_ids = get_video_ids(youtube,
                          query="Space",
                          published_after="2023-08-01T00:00:00Z",
                          region_code="US",
                          video_duration="long",
                          max_results=25)

In [7]:
video_info = get_video_info(youtube, video_ids)

In [8]:
print_video_data_as_markdown_table(video_info)

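| Video_id    | Views   | Duration   | Published_at         | Title                                                                                         | 
| ----------- | ------- | ---------- | -------------------- | --------------------------------------------------------------------------------------------- | 
| uwekHBt4KIs | 9417486 | PT21M24S   | 2023-08-19T14:59:56Z | Mystery of Apollo 13 Mission | Lost in Space | Dhruv Rathee                                   | 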
| LlY79zjud-Q | 8697692 | PT1H50M    | 2023-10-14T17:30:41Z | The Ring of Fire: 2023 Annular Solar Eclipse (Official NASA Broadcast)                        | 
| udzaNR8o-48 | 7921428 | PT2H56M52S | 2023-08-13T20:00:13Z | The Universe: The Most DANGEROUS Phenomena in Our Solar System *3 Hour Marathon*              | 
| _IcgGYZTXQw | 7437877 | PT1H50M56S | 2023-09-02T07:41:32Z | Launch of PSLV-C57/Aditya-L1 Mission from Satish Dhawan Space Centre (SDSC) SHAR, Sriharikota | 
| R18q4F6Lqnk | 5232239 | PT21M16S   | 2023-08-22T16:41:40Z | Apollo Astronaut Breaks In Tears: "The Moon Is NOT What You Think!"                           | 
| FXpIXQf6Y7w | 5046836 | PT4H11M51S | 2023-10-