# Platforms

In [The Politics of Platforms](https://journals.sagepub.com/doi/abs/10.1177/1461444809342738) Gillespie traces the history of the term "platform" and why it is being used by the technology industry today. In 2006 Google acquired YouTube for \$1.65 billion. Gillespie argues that Google describes YouTube as a "platform" because it allows them to simultaneously speak to an audience of content creators and an audience of advertisers, which in turn allows the service to generate revenue, estimated in 2020 to be [\$15 billion/year](https://www.theverge.com/2020/2/3/21121207/youtube-google-alphabet-earnings-revenue-first-time-reveal-q4-2019). Calling itself a "platform" also allows Google to provide tools (such as [ContentID](https://en.wikipedia.org/wiki/Content_ID_(system))) for the policing of content, to sidestep regulation by the government, and thus to become a regulator of the web itself.

> Whatever possible tension there is between being a "platform" for empowering individual users and being a robust marketing ‘platform’ and being a ‘platform’ for major studio content is elided in the versatility of the term and the powerful appeal of the idea behind it. And the term is a valuable and persuasive token in legal environments, positing their service in a familiar metaphoric framework as merely the neutral provision of content, a vehicle for art rather than its producer or patron, where liability should fall to the users themselves. (p. 358)

Gillespie also points out that YouTube have a key role in "curating" content on the web. In this notebook we will look at the YouTube API and what it tells us about the sociotechnical politics of video curation on the web.

## The YouTube API

The [YouTube API](https://developers.google.com/youtube/v3/docs/) allows you to interact with the YouTube platform programmatically. This API, or [Application Programming Interface](https://en.wikipedia.org/wiki/API), is what allows people to create software services and applications that interact with YouTube. For example, an Apple or Android smartphone app can use it to upload a video from your phone to YouTube and to add or edit information about it (title, description, etc.). From previous modules we know that a video is a file that has metadata associated with it.

The [documentation](https://developers.google.com/youtube/v3/docs/) for the YouTube API describes a set of *resources* that you can work with, such as *Videos*, *Channels*, *Comments*, *Playlists*, *Members* and more. We are going to look specifically at the *Videos* resource and use it to retrieve information about this historic video created by CNN which was published to YouTube.

    https://www.youtube.com/watch?v=dH6EW7iSoLA

As a convenience I've embedded the video below using Jupyter's ability to add inline HTML, in this case the embed code that YouTube make available for their videos.

In [82]:
from IPython.display import HTML

HTML('''<iframe width="560" height="315" src="https://www.youtube.com/embed/dH6EW7iSoLA" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
''')

### API Keys

To talk to the YouTube API, and many other APIs, you first need to get an *API Key*. An API Key is a unique string that identifies you, the user of the API. Anyone with a Google account can get a key to talk to Google's APIs, including the YouTube API. The key lets Google know who is making each request, and it is used to limit the number of interactions you can perform with the API in a certain period of time (quotas). Operations that change content, such as uploading or deleting a video, require you to additionally authorize the request with your Google account, which prevents other people from changing your content without your permission.

I have given you a temporary API key in the ELMS assignment which you will need to add here. If you prefer you can [follow the instructions](https://console.developers.google.com/apis/library/youtube.googleapis.com) to get your own key.

When evaluating the ways that platforms shape the curation of data it is critically important to think about the role that *API keys* play in determining what is possible and what is not possible.

In [85]:
# Important: add the key from the ELMS assignment here between the quotes!

key = ""
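
Because a key identifies you and counts against your quota, it is worth keeping it out of notebooks that you share. Here is a minimal sketch of one way to do that, assuming you have stored the key in an environment variable. The variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are made up for this example and are not something the YouTube API requires.

```python
import os


def load_api_key(default=""):
    """Return the API key from the environment, or `default` if it is unset.

    YOUTUBE_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    return os.environ.get("YOUTUBE_API_KEY", default).strip()


if not load_api_key():
    print("No API key found: paste it into `key` above or set YOUTUBE_API_KEY.")
```

Calling `key = load_api_key(key)` would then prefer the environment variable and fall back to whatever was pasted into the cell above.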

### Videos

The [Videos](https://developers.google.com/youtube/v3/docs/videos) API Resource lets you add videos and retrieve information about them. You can see the various **methods** or operations that they allow you to perform on videos:

* **getRating:** Retrieves the ratings that the authorized user gave to a list of specified videos.
* **list:** Returns a list of videos that match the API request parameters.
* **insert:** Uploads a video to YouTube and optionally sets the video's metadata. 
* **update:** Updates a video's metadata.
* **delete:** Deletes a YouTube video.
* **rate:** Add a like or dislike rating to a video or remove a rating from a video.
* **reportAbuse:** Report a video for containing abusive content.

We are just going to look at one method, [list](https://developers.google.com/youtube/v3/docs/videos/list), which is used to retrieve information about a video or set of videos.

### List Method

The **list** method itself has a set of *parameters* that allow you to control how the method is used. You can see these described in the [parameters section of the documentation](https://developers.google.com/youtube/v3/docs/videos/list#parameters).

Web APIs like the YouTube API are basically just instructions for how to construct URLs for data, usually [JSON](https://en.wikipedia.org/wiki/JSON) data. URLs are the addresses you see when you browse the web in your browser. Once you have a URL for some data you can use the HTTP protocol, the protocol your web browser uses to navigate the web, to *GET* the data.

So for example if we wanted to list the information for our video which has an ID of `dH6EW7iSoLA` we could use the **list** documentation to formulate the following URL:

    https://www.googleapis.com/youtube/v3/videos?id=dH6EW7iSoLA&part=contentDetails,statistics,snippet&key=key
    
I've chosen to use the *id* parameter to identify the video and the *part* parameter to select three parts of metadata that I'm interested in, and I've also passed it my key (which I've stubbed out).
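
To make the idea of an API as "instructions for how to construct URLs" concrete, here is a small sketch that assembles the same URL from its pieces using only Python's standard library. `YOUR_API_KEY` is a placeholder, and `build_videos_url` is a helper name invented for this example.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_videos_url(video_id, parts, api_key):
    """Assemble a videos list URL from a video id, metadata parts and a key."""
    params = {
        "id": video_id,
        "part": ",".join(parts),
        "key": api_key,
    }
    # safe="," leaves the commas in `part` readable instead of encoding them as %2C
    return BASE_URL + "?" + urlencode(params, safe=",")


url = build_videos_url(
    "dH6EW7iSoLA",
    ["contentDetails", "statistics", "snippet"],
    "YOUR_API_KEY",
)
print(url)
```

Nothing is sent over the network here: the function only builds a string, which you could paste into a browser once a real key is substituted.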

Fortunately Python's [requests](https://requests.readthedocs.io/en/master/) module makes it pretty easy to construct these URLs and to use the HTTP protocol to fetch the data associated with them. First let's install it:

In [55]:
! pip3 --quiet install requests

Now we can use it to fetch the data. Notice how a Python dictionary is used to express the API parameters, and then the base URL and the parameters are passed to the `requests.get()` function.

In [88]:
import requests

params = {
    "id": "dH6EW7iSoLA",
    "part": "contentDetails,statistics,snippet",
    "key": key,
}

resp = requests.get('https://www.googleapis.com/youtube/v3/videos', params=params)
resp.json()

{'kind': 'youtube#videoListResponse',
 'etag': 'tkukhXwf6YS-rPbaV7XmncoCXJU',
 'items': [{'kind': 'youtube#video',
   'etag': 'rsj08Q7Mid9KWRbhhYFJ8q3lcXk',
   'id': 'dH6EW7iSoLA',
   'snippet': {'publishedAt': '2020-11-08T08:17:31Z',
    'channelId': 'UCQJiKgCuBwtje8BgzX0-9NQ',
    'title': "Hear Kamala Harris' tribute to her mother",
    'description': "Vice President-elect Kamala Harris pays tribute to her mother and the women who made her path possible during a victory speech in Wilmington, Delaware. Source: https://goo-gl.su/rhIN85\n\nSubscribe to CNN News! https://goo-gl.su/SsHTo3D3\nFollow CNN News on Facebook: https://www.facebook.com/cnn\nFollow CNN News on Twitter: https://twitter.com/cnnbrk\n\nRepublican Sen. Rick Scott breaks with President Donald Trump: https://goo-gl.su/CvQ5 \nPolitics on TikTok is anything but boring':https://goo-gl.su/5g98mSaZ\nVoting underway in swing state of Pennsylvania: https://goo-gl.su/7qTcKw7\nCNN Projection: Biden wins Michigan: https://goo-gl.

You can see a variety of information comes back, like the *title* and the *description* as well as *tags* that it was assigned and various sizes of images. But notice what is missing? Where is the video content itself?

YouTube's API lets you upload video and add metadata to it, but it doesn't let you download the video! To do that we need to turn to other tools.
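
The metadata that does come back is ordinary Python data once it has been parsed, so individual fields can be picked out with normal dictionary and list access. The sketch below works on a trimmed-down copy of the response shown above rather than calling the API, so it runs without a key. `summarize_video` is a hypothetical helper, and the handling of an empty `items` list assumes that the API returns no items for an id it cannot find.

```python
def summarize_video(response):
    """Pick a few snippet fields out of a parsed videos list response."""
    items = response.get("items", [])
    if not items:
        # Assumption: an unknown or unavailable video id yields an empty list.
        return None
    video = items[0]
    snippet = video.get("snippet", {})
    return {
        "id": video.get("id"),
        "title": snippet.get("title"),
        "published": snippet.get("publishedAt"),
        "channel_id": snippet.get("channelId"),
    }


# A trimmed copy of the response above (only fields visible in the output).
sample = {
    "kind": "youtube#videoListResponse",
    "items": [
        {
            "kind": "youtube#video",
            "id": "dH6EW7iSoLA",
            "snippet": {
                "publishedAt": "2020-11-08T08:17:31Z",
                "channelId": "UCQJiKgCuBwtje8BgzX0-9NQ",
                "title": "Hear Kamala Harris' tribute to her mother",
            },
        }
    ],
}

summary = summarize_video(sample)
print(summary["title"], summary["published"])
```

With a live response you would call `summarize_video(resp.json())` instead of passing in the sample.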

## youtube-dl

There are other approaches to fetching data from web platforms than using their officially documented APIs. Since many web platforms like YouTube are "on the web" and viewable in web browsers, it's possible to automate the extraction of data directly from the web pages themselves. This is commonly known as [web scraping](https://en.wikipedia.org/wiki/Web_scraping). It sounds kind of messy, and it is, but it's also widely practiced, by none other than big companies like Google when they crawl and index the web for their search product, or by Facebook when they create a snippet of information about a URL you just pasted into a new post.
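
As a rough illustration of what scraping involves, the sketch below pulls Open Graph `<meta>` tags, the kind of information a link preview is typically built from, out of a hard-coded HTML string using Python's built-in `html.parser` module. The HTML snippet and the `OpenGraphParser` class are invented for this example. A real scraper would first have to fetch the page, and real pages are considerably messier.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("property") or ""
        if name.startswith("og:") and "content" in attrs:
            self.properties[name] = attrs["content"]


# Invented page source, standing in for HTML fetched from the web.
page = """
<html><head>
  <title>Example video page</title>
  <meta property="og:title" content="An example video">
  <meta property="og:type" content="video.other">
  <meta name="description" content="Not an Open Graph tag">
</head><body></body></html>
"""

parser = OpenGraphParser()
parser.feed(page)
print(parser.properties)
```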

A platform's Terms of Service often do not permit scraping of its web pages, so it is good to be careful when doing it, and to pay particular attention to what you do with the collected data. For example, you wouldn't want to crawl a website as a logged-in user and then publish that potentially sensitive material on the public web, as [some researchers](https://www.wired.com/2016/05/okcupid-study-reveals-perils-big-data-science/) did with the dating site OkCupid. It is often a gray area as to what does and does not constitute a violation of a site's Terms of Service. Recently, in [Sandvig v. Barr](https://www.eff.org/deeplinks/2020/04/federal-judge-rules-it-not-crime-violate-websites-terms-service), a group of researchers won a lawsuit in which a federal judge ruled that violating a website's Terms of Service is not in itself a crime.

Many tools have been developed to make it easier to extract content from the web. For example, the [youtube-dl](https://yt-dl.org/) utility, which is written in Python, is open source software that allows you to download videos and their metadata from YouTube and [over a thousand](https://yt-dl.org/supportedsites.html) other video service providers on the web. Recently the Recording Industry Association of America (RIAA) issued a takedown request to GitHub, where the youtube-dl software was being developed. The RIAA requested that GitHub remove youtube-dl from its platform, and GitHub complied. However, some, like the Electronic Frontier Foundation (EFF), [point out](https://www.eff.org/deeplinks/2020/11/riaa-abuses-dmca-take-down-popular-tool-downloading-online-video) that the tool itself was not violating copyright and can be used in a lawful way.

If you upload a video to YouTube, and aren't able to download it, shouldn't you be able to use a tool like youtube-dl to allow you to have a copy of your video? If the video is licensed to allow copying, as with a Creative Commons License, or is in the public domain, isn't a tool like youtube-dl appropriate? 

Let's use youtube-dl to download the video above. First we need to install it:

In [None]:
!pip3 install --quiet youtube-dl 

### Download

Now we can download the video for: https://www.youtube.com/watch?v=dH6EW7iSoLA by creating a `YoutubeDL` object and using its `download()` method to download our YouTube URL. Notice we pass in a list, since download() can download multiple videos.

In [96]:
import youtube_dl

ydl = youtube_dl.YoutubeDL()
result = ydl.download(["https://www.youtube.com/watch?v=dH6EW7iSoLA"])

[youtube] dH6EW7iSoLA: Downloading webpage
[youtube] dH6EW7iSoLA: Downloading MPD manifest
[dashsegments] Total fragments: 40
[download] Destination: Hear Kamala Harris' tribute to her mother-dH6EW7iSoLA.f134.mp4
[download] 100% of 8.84MiB in 00:04
[dashsegments] Total fragments: 22
[download] Destination: Hear Kamala Harris' tribute to her mother-dH6EW7iSoLA.f140.m4a
[download] 100% of 3.22MiB in 00:02
[ffmpeg] Merging formats into "Hear Kamala Harris' tribute to her mother-dH6EW7iSoLA.mp4"
Deleting original file Hear Kamala Harris' tribute to her mother-dH6EW7iSoLA.f134.mp4 (pass -k to keep)
Deleting original file Hear Kamala Harris' tribute to her mother-dH6EW7iSoLA.f140.m4a (pass -k to keep)


Notice how youtube-dl prints out a log of what it's doing. It first downloads the video as an mp4 file. Then it downloads the audio as an m4a file, and then it merges the video and audio into a single mp4 file named: 

    Hear Kamala Harris' tribute to her mother-dH6EW7iSoLA.mp4
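
The file name comes from youtube-dl's output template, which by default combines the video's title, id and extension. If you would rather have a shorter, more predictable name you can pass options when creating the `YoutubeDL` object. This is a sketch rather than a recommendation: `build_options` and `download_video` are helper names invented here, while `outtmpl` is youtube-dl's output template option. The import is kept inside the function so the options can be inspected without downloading anything.

```python
def build_options(template="%(id)s.%(ext)s"):
    """Return a YoutubeDL options dictionary with a custom output template."""
    return {"outtmpl": template}


def download_video(url, options):
    """Download one URL with youtube-dl using the given options."""
    import youtube_dl  # imported here so build_options() works on its own

    with youtube_dl.YoutubeDL(options) as ydl:
        return ydl.download([url])


options = build_options()
print(options)
# Uncomment to actually download (requires youtube-dl and a network connection):
# download_video("https://www.youtube.com/watch?v=dH6EW7iSoLA", options)
```

With this template the merged file should be named after the id alone (`dH6EW7iSoLA.mp4`) rather than carrying the title. The rest of this notebook assumes the default, title-based name.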
    
If you click on the file explorer on the left in Colab you can see the file. We can also use Python to examine the file, and see what the size is in bytes using techniques we learned back in Module 2.

In [106]:
import pathlib 

p = pathlib.Path("Hear Kamala Harris' tribute to her mother-dH6EW7iSoLA.mp4")
video_size = p.stat().st_size
print(video_size)

12715964


With a bit of math we can print out the size in megabytes too, or if you are lazy you can use [hurry.filesize](https://pypi.org/project/hurry.filesize/):

In [109]:
! pip install --quiet hurry.filesize

In [110]:
from hurry.filesize import size

size(video_size)

'12M'
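
The "bit of math" is just division. One wrinkle is that "megabyte" can mean either 1,000,000 bytes (MB) or 1,048,576 bytes (MiB), so the sketch below prints both for the size we measured. The two helper names are made up for this example.

```python
video_size = 12715964  # bytes, as reported by p.stat().st_size above


def to_megabytes(num_bytes):
    """Decimal megabytes: 1 MB = 1000 * 1000 bytes."""
    return num_bytes / (1000 * 1000)


def to_mebibytes(num_bytes):
    """Binary mebibytes: 1 MiB = 1024 * 1024 bytes."""
    return num_bytes / (1024 * 1024)


print(f"{to_megabytes(video_size):.2f} MB")   # 12.72 MB
print(f"{to_mebibytes(video_size):.2f} MiB")  # 12.13 MiB
```

The `'12M'` reported by hurry.filesize appears to correspond to the binary figure rounded down to a whole number.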

## Exercises

1. Find a video on [YouTube](https://youtube.com) and use the requests code above in the *List Method* section to get and print the JSON metadata for it from the YouTube API. **Note:** you will need to know the *id* for the video you find. The id is the unique string that is contained in the video URL's `v` parameter. For example the YouTube video `https://www.youtube.com/watch?v=dH6EW7iSoLA` has an id of `dH6EW7iSoLA`.

2. Download the same video you found using youtube-dl as shown above. What format is the downloaded video file? What is the size of the file in bytes?

3. What reasons do you think Google has for not allowing the download of video through the YouTube API?

4. **Optional:** The CNN video on YouTube that we have been working with was also published directly on the CNN site at https://www.cnn.com/videos/politics/2020/11/08/kamala-harris-victory-speech-mom-women-postelex-vpx.cnn. See if you can use youtube-dl to download it, using the technique above. If you can download the video file, how does it compare to the one we downloaded from YouTube?
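
For the first exercise, the id can also be extracted from a video URL in code rather than by eye. Here is a small sketch using the standard library, assuming the URL has the `watch?v=...` form shown above (other YouTube URL shapes are not handled). `video_id_from_url` is a name chosen for this example.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Return the value of the `v` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


print(video_id_from_url("https://www.youtube.com/watch?v=dH6EW7iSoLA"))
# prints: dH6EW7iSoLA
```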