# Workshop YouTube API 1. Keyword search for videos and getting their metadata

This Colab is part of a series that introduces some of the things you can do with the free YouTube API. This workshop covers how to perform a search and download metadata for all results in that search.

# Run the cell below first to install and import the necessary code

It takes some time, so be patient.

In [4]:
!pip install --upgrade google-api-python-client
!pip install --upgrade google-auth-oauthlib google-auth-httplib2
!pip install isodate

import pandas as pd
# googleapiclient is the current package name; apiclient is a legacy alias
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

Collecting google-api-python-client
  Downloading google_api_python_client-2.161.0-py2.py3-none-any.whl.metadata (6.7 kB)
Downloading google_api_python_client-2.161.0-py2.py3-none-any.whl (12.9 MB)
Installing collected packages: google-api-python-client
  Attempting uninstall: google-api-python-client
    Found existing installation: google-api-python-client 2.160.0
    Uninstalling google-api-python-client-2.160.0:
      Successfully uninstalled google-api-python-client-2.160.0
Successfully installed google-api-python-client-2.161.0
Collecting isodate
  Downloading isodate-0.7.2-py3-none-any.whl.metadata (11 kB)
Downloading isodate-0.7.2-py3-none-any.whl (22 kB)
Installing collected packages: isodate
Successfully installed isodate-0.7.2


# Verify Your Identity
When you run the cell below, it will ask for your API key. You can obtain a key from Google: https://developers.google.com/youtube/v3/getting-started. Save it in a secure location and do not share it.

Here is a thorough walk-through for obtaining a key from Google: https://blog.hubspot.com/website/how-to-get-youtube-api-key

Run the cell, enter your key, press return.

In [5]:
from IPython.display import clear_output

key = input("What is your API key? ")
clear_output(wait=False)  # clear the cell output so the key is not left on screen
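
Because `input` echoes what you type, the key is visible until the output is cleared. A minimal sketch of an alternative, assuming Python's standard `getpass` module is acceptable here. The helper name `read_api_key` and the environment variable name `YOUTUBE_API_KEY` are made up for illustration and are not part of the workshop code.

```python
import os
from getpass import getpass


def read_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key from an environment variable, or prompt without echo.

    Both the function name and the default variable name are hypothetical.
    """
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    # getpass hides the typed characters, so the key never shows in the output
    return getpass("What is your API key? ").strip()


# In the notebook you would then write:
# key = read_api_key()
```

With this approach there is nothing to clear afterwards, since the key is never printed.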

# Let's build the YouTube "engine" and call it "youtube".
Note how "youtube" is used throughout the code below. Each time, we call this "engine" and give it instructions, such as:

youtube.search  
youtube.videos  
youtube.channels   

etc.

In [6]:
DEVELOPER_KEY = key
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"
youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
    developerKey=DEVELOPER_KEY)  # this is where we build our engine

# Create functions for later use
The large cell below defines functions. We will call them later to perform different tasks.


The functions are the following:

*   youtube_search takes a search string and returns a list of search result items
*   search_to_df takes the result from the search, extracts the video ID, channel, title and description of each video, and arranges them in a dataframe
*   metaDataExtractor takes a list of video IDs and returns the metadata for each video
*   MetaDownloadDF calls metaDataExtractor and returns the metadata as a dataframe




In [7]:
def youtube_search(q, max_results=50, order="relevance", type="video", language=None, token=None, location=None, location_radius=None, total=200):
    """Performs a YouTube keyword search with the specified parameters."""
    search_list = []
    iterations = -(-total // max_results)  # Calculate the number of iterations (ceil division)

    for _ in range(iterations):
        search_response = youtube.search().list(
            q=q,
            type=type,
            pageToken=token,
            order=order,
            part="id,snippet",
            maxResults=max_results,
            relevanceLanguage=language,
            location=location,
            locationRadius=location_radius
        ).execute()
        search_list.extend(search_response.get("items", []))
        print(search_response.get("pageInfo", {}))

        # Update the token for the next page or break if no token is available
        token = search_response.get("nextPageToken")
        if not token:
            break

    return search_list

def search_to_df(test_search):
    """Transforms a result from a keywords search into a dataframe."""
    search_list = [
        (
            post["id"]["videoId"],
            post["snippet"]["channelTitle"],
            post["snippet"]["title"],
            post["snippet"]["description"]
        )
        for post in test_search
        # Keep only well-formed items whose "id" is a dict containing a "videoId"
        if isinstance(post, dict) and isinstance(post.get("id"), dict) and post["id"].get("videoId")
    ]
    return pd.DataFrame(search_list, columns=["Id", "Channel", "Title", "Description"])

def metaDataExtractor(video_ids):
    """Takes a list of video IDs as input and returns their metadata."""
    import isodate  # For parsing duration

    video_data = []
    for num, video_id in enumerate(video_ids, start=1):
        # The documented part name is "contentDetails" (camelCase)
        res = youtube.videos().list(id=video_id, part="snippet,statistics,contentDetails").execute()
        video_data.append(res)
        print(f"\rDownloading metadata for video {num} of {len(video_ids)}. Please wait...", end="")

    metadata_list = []
    keys = {
        "Id": ("items", 0, "id"),
        "Channel": ("items", 0, "snippet", "channelTitle"),
        "Date": ("items", 0, "snippet", "publishedAt"),
        "Time": ("items", 0, "snippet", "publishedAt"),
        "Title": ("items", 0, "snippet", "title"),
        "Description": ("items", 0, "snippet", "description"),
        "Duration": ("items", 0, "contentDetails", "duration"),
        "Tags": ("items", 0, "snippet", "tags"),
        "Views": ("items", 0, "statistics", "viewCount"),
        "Likes": ("items", 0, "statistics", "likeCount"),
        "Favourite": ("items", 0, "statistics", "favoriteCount"),
        "Comments": ("items", 0, "statistics", "commentCount"),
    }

    for item in video_data:
        tempdict = {}
        for key, path in keys.items():
            try:
                value = item
                for p in path:
                    value = value[p]  # walk down the nested response one key/index at a time
                if key == "Duration":
                    # total_seconds() includes whole days; .seconds would drop them
                    value = int(isodate.parse_duration(value).total_seconds())
                tempdict[key] = value
            except (KeyError, IndexError, TypeError):
                tempdict[key] = ""
        # Separate Date and Time from publishedAt
        if tempdict.get("Date"):
            tempdict["Date"], tempdict["Time"] = tempdict["Date"][:10], tempdict["Time"][11:19]
        metadata_list.append(tempdict)

    return metadata_list



def MetaDownloadDF(video_list):
    """Calls metaDataExtractor and returns a pandas DataFrame
    with metadata for the videos in video_list (a list of video IDs)."""
    metadict = metaDataExtractor(video_list)
    metadf = pd.DataFrame(metadict)
    return metadf
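
metaDataExtractor sends one request per video. The `videos.list` endpoint also accepts up to 50 comma-separated IDs in a single `id` parameter, which reduces the number of requests considerably. The following is only a sketch of that idea, not the workshop's method: the names `chunked` and `fetch_items_batched` are hypothetical, and the function expects the `youtube` engine to be passed in as an argument.

```python
def chunked(items, size=50):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_items_batched(youtube, video_ids, part="snippet,statistics,contentDetails"):
    """Return the raw video resources for all IDs, requesting 50 IDs at a time.

    Hypothetical helper: `youtube` is the engine built with build(...).
    """
    items = []
    for batch in chunked(list(video_ids), 50):
        # One request covers the whole batch of IDs
        response = youtube.videos().list(id=",".join(batch), part=part).execute()
        items.extend(response.get("items", []))
    return items
```

Note that this returns the individual video resources directly, so the lookup paths in metaDataExtractor would need to drop their leading `"items", 0` to work with this output. Videos that are deleted or private are simply missing from the returned items.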

# Performing a search

Enter your search terms below, between quotation marks.

In [25]:
search_terms = "بيت المولد"  # replace with your own search terms

Perform your search and transform the search results into a dataframe (an Excel-like structure).

In [26]:
search=youtube_search(search_terms, total=10000)
searchdf=search_to_df(search)
searchdf

{'totalResults': 1000000, 'resultsPerPage': 50}
{'totalResults': 1000000, 'resultsPerPage': 50}
{'totalResults': 1000000, 'resultsPerPage': 50}
{'totalResults': 1000000, 'resultsPerPage': 50}
{'totalResults': 1000000, 'resultsPerPage': 50}
{'totalResults': 1000000, 'resultsPerPage': 50}
{'totalResults': 1000000, 'resultsPerPage': 50}
{'totalResults': 1000000, 'resultsPerPage': 50}
{'totalResults': 1000000, 'resultsPerPage': 50}
{'totalResults': 1000000, 'resultsPerPage': 50}




{'totalResults': 1000000, 'resultsPerPage': 50}


HttpError: <HttpError 403 when requesting https://youtube.googleapis.com/youtube/v3/search?q=%D8%A8%D9%8A%D8%AA+%D8%A7%D9%84%D9%85%D9%88%D9%84%D8%AF&type=video&pageToken=CKYEEAA&order=relevance&part=id%2Csnippet&maxResults=50&key=[REDACTED]&alt=json returned "The request cannot be completed because you have exceeded your <a href="/youtube/v3/getting-started#quota">quota</a>.". Details: "[{'message': 'The request cannot be completed because you have exceeded your <a href="/youtube/v3/getting-started#quota">quota</a>.', 'domain': 'youtube.quota', 'reason': 'quotaExceeded'}]">
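
The 403 error above is a quota error, not a fault in the search code. According to the YouTube Data API documentation, each `search.list` call costs 100 quota units, and a project receives 10,000 units per day by default (your own allocation may differ). The sketch below estimates what a search will cost before you run it. The function and constant names are illustrative only.

```python
import math

SEARCH_COST_UNITS = 100       # documented cost of one search.list call
DEFAULT_DAILY_QUOTA = 10_000  # default daily allocation; check your own project


def search_quota_cost(total, max_results=50):
    """Estimate the quota units needed to page through `total` results."""
    pages = math.ceil(total / max_results)
    return pages * SEARCH_COST_UNITS


def max_affordable_total(daily_quota=DEFAULT_DAILY_QUOTA, max_results=50):
    """Largest `total` that fits within the given daily quota."""
    return (daily_quota // SEARCH_COST_UNITS) * max_results


print(search_quota_cost(10000))   # cost of the search attempted above
print(max_affordable_total())     # upper bound with a fresh default quota
```

Under these assumptions, `total=10000` would need 200 pages and 20,000 units, twice the default daily quota, so a smaller `total` is needed. Because the error is raised inside youtube_search, the results already downloaded are lost unless the call is wrapped in a `try`/`except HttpError` block.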


### Save search results to CSV
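The code below saves a CSV file to your Google Drive, which appears in the file browser to the left of here <----- (click the folder symbol under the key symbol if it is not visible). Drive must be mounted first (run the drive.mount cell further down), and the path should point to a folder in your own Drive.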

In [24]:
searchdf.to_csv("/content/drive/MyDrive/Forskning/Relikprojekt/Youtube/Resultat/"+search_terms+".csv")

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Perhaps you want all metadata for all videos in the search results (publication date, channel, views, likes, etc.)
Then we first extract all the video IDs.

In [None]:
video_list=searchdf["Id"].tolist()

Then we use the MetaDownloadDF function from above.

In [None]:
metadf=MetaDownloadDF(video_list)
metadf

And then we can save the result as a CSV file.

In [None]:
metadf.to_csv("searchmetadf.csv")