Prototype API and sample app for searching Google developer videos

Search Google developer videos

This is a prototype API and application for searching transcripts and metadata for videos from the Google Developers, Android Developers and Chrome Developers channels.



With minor tweaks, the app and API can be used to build search for any YouTube channel that has manually captioned videos.

[Image: JSON response from the API]


For those who prefer to access information by reading text rather than watching videos, the app provides downloadable transcripts:

[Image: Developer video transcript]

The transcripts have Google Translate built in, so you can choose to read them in a different language. Caption highlighting is synchronised with video playback, and you can tap or click on any part of a transcript to navigate through the video.
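Keeping the highlight in step with playback comes down to finding the caption that covers the current time. The following is a minimal sketch, not the app's actual code. It assumes captions are sorted and non-overlapping, and that each has hypothetical `start` and `dur` fields in seconds:

```javascript
// Hypothetical caption shape: { start: seconds, dur: seconds, text: string },
// sorted by start time and non-overlapping.
function activeCaptionIndex(captions, time) {
  let low = 0;
  let high = captions.length - 1;
  // Binary search: O(log n) per lookup, cheap enough to run on every tick.
  while (low <= high) {
    const mid = (low + high) >> 1;
    const { start, dur } = captions[mid];
    if (time < start) {
      high = mid - 1;
    } else if (time >= start + dur) {
      low = mid + 1;
    } else {
      return mid; // start <= time < start + dur
    }
  }
  return -1; // time falls in a gap between captions
}

const captions = [
  { start: 0, dur: 2, text: 'Hi everyone.' },
  { start: 2, dur: 3, text: 'Welcome to the show.' },
  { start: 6, dur: 1, text: 'Let us begin.' },
];
console.log(activeCaptionIndex(captions, 2.5)); // 1
```

With the YouTube IFrame Player API, the current time could come from polling `player.getCurrentTime()`. A click on a caption could then call `player.seekTo(caption.start, true)` to navigate.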

[Image: Google Translate French translation of a developer video transcript]


Search for something:

Readable transcripts for one or more videos:,3i9WFgMuKHs

Link to a query:

Data and transcript for a video:

Transcript only: or or

Multiple values — comma, semicolon or pipe delimiter, spaces OK:,iZZdhTUP5qg, iZZdhTUP5qg
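Splitting a multi-value parameter on any of the three delimiters takes one regular expression. This is a minimal sketch that assumes the server receives the raw parameter string. The function name is hypothetical:

```javascript
// Split on comma, semicolon or pipe, trim surrounding spaces, drop empty items.
function parseMultipleValues(value) {
  return value
    .split(/[,;|]/)
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

console.log(parseMultipleValues('3i9WFgMuKHs, iZZdhTUP5qg | abc'));
// [ '3i9WFgMuKHs', 'iZZdhTUP5qg', 'abc' ]
```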

Search any field for a query, spaces OK — can be a bit slow: 203

More shortcuts: c for captions and s for speakers (speaker names are parsed from transcripts):
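One way to support the single-letter shortcuts is to rename parameters before running the query. This sketch uses a hypothetical alias table. `URLSearchParams` is built into Node.js 10 and later and into modern browsers, and it decodes `+` and `%20` as spaces:

```javascript
// Hypothetical alias table built from the shortcuts listed above.
const PARAM_ALIASES = { c: 'captions', s: 'speakers' };

// Turn a query string into a plain object, expanding shortcut names.
function expandAliases(queryString) {
  const params = {};
  for (const [key, value] of new URLSearchParams(queryString)) {
    params[PARAM_ALIASES[key] || key] = value;
  }
  return params;
}

console.log(expandAliases('c=Android+Wear&s=Reto&title=WebRTC'));
// { captions: 'Android Wear', speakers: 'Reto', title: 'WebRTC' }
```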

Specify ranges for commentCount, dislikeCount, favoriteCount, likeCount, viewCount:>10000

Use any of these values to specify order:>10000&sort=viewCount

Add a hyphen for descending order:>10000&sort=-viewCount
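The range and sort options above could be implemented along the following lines. This is a sketch, not the project's code. The README only shows `>`, so support for `<`, `>=`, `<=` and a bare number is an assumption, as are the function names. YouTube Data API statistics may arrive as strings, so values are converted with `Number()`:

```javascript
// Parse a range expression such as ">10000" into a predicate on numbers.
function parseRange(expr) {
  const match = /^\s*(>=|<=|>|<)?\s*(\d+)\s*$/.exec(expr);
  if (!match) throw new Error(`Unsupported range expression: ${expr}`);
  const [, op = '=', digits] = match; // no operator means an exact match
  const limit = Number(digits);
  switch (op) {
    case '>': return (n) => n > limit;
    case '<': return (n) => n < limit;
    case '>=': return (n) => n >= limit;
    case '<=': return (n) => n <= limit;
    default: return (n) => n === limit;
  }
}

// "viewCount" sorts ascending; a leading hyphen ("-viewCount") sorts descending.
function sortComparator(sortParam) {
  const descending = sortParam.startsWith('-');
  const field = descending ? sortParam.slice(1) : sortParam;
  return (a, b) => {
    const diff = Number(a[field]) - Number(b[field]);
    return descending ? -diff : diff;
  };
}

function filterAndSort(videos, field, rangeExpr, sortParam) {
  const inRange = parseRange(rangeExpr);
  return videos
    .filter((video) => inRange(Number(video[field])))
    .sort(sortComparator(sortParam));
}

const videos = [
  { id: 'a', viewCount: '500' },
  { id: 'b', viewCount: '20000' },
  { id: 'c', viewCount: '15000' },
];
console.log(filterAndSort(videos, 'viewCount', '>10000', '-viewCount').map((v) => v.id));
// [ 'b', 'c' ]
```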

Show items with titles that include 'Android' or

Items with speakers that include Reto and a title that includes Android:

Spaces are OK: Meier&title=Android

More complex stuff works too: Wear|description=Android Wear)&speakers=Reto Wear|description=Android Wear)&speakers=[Reto,Wayne]"Android Wear"|title=WebRTC Wear|description=Android Wear)&speakers=Timothy

Fuzzy matching — with apologies to Wayne :):
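The README doesn't say how fuzzy matching works. The following sketch shows one common technique purely as an illustration, not necessarily what this API uses: accept names within a small Levenshtein (edit) distance of the query. `fuzzyMatch` and its default threshold of 2 are hypothetical:

```javascript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions needed to turn `a` into `b`.
function levenshtein(a, b) {
  // row[j] holds the distance between the first i chars of a and first j chars of b.
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Case-insensitive match within maxDistance edits.
function fuzzyMatch(name, query, maxDistance = 2) {
  return levenshtein(name.toLowerCase(), query.toLowerCase()) <= maxDistance;
}

console.log(fuzzyMatch('Wayne', 'Wyane')); // true (distance 2)
console.log(fuzzyMatch('Wayne', 'Reto')); // false (distance 5)
```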

For dates, use 'from' and 'to', which can cope with anything the JavaScript Date constructor can handle:
// assumes text-only is a month this year
2014 // midnight, 1 January to midnight, 1 January
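The following is a minimal sketch of how `from` and `to` might be applied. It assumes each video has a hypothetical `publishedAt` ISO timestamp, and it treats `to` as exclusive, which is also an assumption. The sketch doesn't handle the "text-only is a month this year" case. Note that the Date constructor treats date-only ISO strings such as `2014` as midnight UTC:

```javascript
// Parse a 'from' or 'to' value with the built-in Date constructor.
function parseDateParam(value) {
  const date = new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Unparseable date: ${value}`);
  }
  return date;
}

// Keep videos published in [from, to); either bound may be omitted.
function filterByDate(videos, from, to) {
  const start = from ? parseDateParam(from) : null;
  const end = to ? parseDateParam(to) : null;
  return videos.filter((video) => {
    const published = new Date(video.publishedAt);
    return (!start || published >= start) && (!end || published < end);
  });
}

const videos = [
  { id: 'a', publishedAt: '2013-12-31T23:00:00Z' },
  { id: 'b', publishedAt: '2014-06-01T00:00:00Z' },
  { id: 'c', publishedAt: '2015-01-01T00:00:00Z' },
];
console.log(filterByDate(videos, '2014', '2015').map((v) => v.id)); // [ 'b' ]
```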

Get total for any quantity field — this query returns the total number of views for all videos:

Get total for any query and quantity field:

Get all individual values for any quantity field for all videos — returns an object keyed by amounts, values are number of occurrences for each amount:

Get all individual values for any quantity field for any query:
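The total and per-value queries amount to simple aggregations over a quantity field. This sketch works on an in-memory array of hypothetical video objects. The function names are illustrative, not the API's:

```javascript
// Sum a quantity field across videos; missing values count as 0.
function total(videos, field) {
  return videos.reduce((sum, video) => sum + Number(video[field] || 0), 0);
}

// Object keyed by amount; each value is how many videos have that amount.
function valueCounts(videos, field) {
  const counts = {};
  for (const video of videos) {
    const amount = Number(video[field] || 0);
    counts[amount] = (counts[amount] || 0) + 1;
  }
  return counts;
}

const videos = [{ likeCount: '3' }, { likeCount: '5' }, { likeCount: '3' }];
console.log(total(videos, 'likeCount')); // 11
console.log(valueCounts(videos, 'likeCount')); // { '3': 2, '5': 1 }
```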

Build a chart from results (views for videos that mention 'Chrome'):

The code

Issues and pull requests welcome.

There are three code directories:

### app

The web client (as used at …). This will automatically choose the local Node middle layer (below) if run from localhost.


Middle layer: Node app to get data from the database. For testing, you can run this locally with the app running from localhost. The live version is on Nodejitsu at …, for queries like this: … (same as …).

### put

Node app to get YouTube data and transcripts, massage the response and put it in a CouchDB database at …


Why didn't you use Firebase?

Cloudant has Lucene search built in, and is based on CouchDB, which is easy to use from Node.

Firebase can now be used with Elasticsearch, but at the start of the project this required extra installation.

Why didn't you just use MySQL or …

An SQL database with Lucene for full text search might have been more appropriate than CouchDB.

(This kind of search is actually much easier with Firebase now.)

How was CouchDB?

Good in some ways, and quick. In particular, the JSON/HTTP/REST style fits well with Node/JavaScript development.

Problems came with full text search:

  • Full text search is not built into CouchDB, though it can be added on with Lucene or other search engines.
  • CouchDB searches return entire documents, with no 'partial' results. (In my case, a document represents all data for a video.) So, for example, to return only captions that include 'Android Wear', it's necessary to retrieve all the documents (in their entirety) that have captions that mention 'Android Wear' then filter.
  • CouchDB search queries cannot be combined: for example, 'get me all videos from 2013 with WebRTC in the title'. So, again, you have to add your own filter.
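Both workarounds come down to filtering in Node after the search returns. This sketch assumes one hypothetical document per video with `id`, `title`, `publishedAt` and a `captions` array. The real document layout may differ:

```javascript
// Hypothetical document shape, one per video:
// { id, title, publishedAt, captions: [{ start, text }] }

// Keep only captions that mention a phrase; drop videos with no matching captions.
function matchingCaptions(docs, phrase) {
  const needle = phrase.toLowerCase();
  return docs
    .map((doc) => ({
      id: doc.id,
      captions: (doc.captions || []).filter((caption) =>
        caption.text.toLowerCase().includes(needle)),
    }))
    .filter((result) => result.captions.length > 0);
}

// Combine two conditions that a single search query can't express together,
// e.g. "videos from 2013 with WebRTC in the title".
function byYearAndTitle(docs, year, titleWord) {
  const word = titleWord.toLowerCase();
  return docs.filter((doc) =>
    new Date(doc.publishedAt).getUTCFullYear() === year &&
    doc.title.toLowerCase().includes(word));
}

const docs = [
  {
    id: 'v1',
    title: 'WebRTC in Chrome',
    publishedAt: '2013-05-15T00:00:00Z',
    captions: [{ start: 0, text: 'Hello' }, { start: 5, text: 'Android Wear is here' }],
  },
  { id: 'v2', title: 'WebRTC update', publishedAt: '2014-02-01T00:00:00Z', captions: [] },
];
console.log(byYearAndTitle(docs, 2013, 'WebRTC').map((d) => d.id)); // [ 'v1' ]
```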

How big is the database?

Around 250MB, but more like 150MB without transcripts. The transcript in each document is really just a convenience that makes it quick and simple to retrieve human-readable transcripts; it replicates the captions (with a few tweaks).

How often is the data updated?

At present the database is updated manually to avoid code changes breaking it.

Why didn't you use io.js?

No big reason. Node.js has been around longer.

How many videos have transcripts?

When the repo was created: 4312 videos, 3550 with transcripts.

How did you get the speaker names?

With a bit of sneaky regexing, speaker names are parsed from transcripts. NB: speaker names can't be parsed from many captions, so speaker search results may be incomplete.
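The actual regex isn't shown. The following is a rough sketch under one loudly stated assumption: a speaker turn starts with an all-caps name followed by a colon, as in `>> RETO MEIER: Hi everyone.` It needs Node.js 12 or later for `String.prototype.matchAll`. It also shows why results can be incomplete: names written any other way are missed, and all-caps words such as `NOTE:` would be false positives.

```javascript
// Assumption: a speaker turn starts with an all-caps name and a colon,
// e.g. ">> RETO MEIER: Hi everyone."
const SPEAKER_PATTERN = /\b([A-Z][A-Z'.-]+(?: [A-Z][A-Z'.-]+)*):/g;

// "RETO MEIER" -> "Reto Meier"
function toTitleCase(name) {
  return name
    .split(' ')
    .map((word) => word[0] + word.slice(1).toLowerCase())
    .join(' ');
}

// Return unique speaker names in order of first appearance.
function parseSpeakers(transcript) {
  const names = new Set();
  for (const match of transcript.matchAll(SPEAKER_PATTERN)) {
    names.add(toTitleCase(match[1]));
  }
  return [...names];
}

console.log(parseSpeakers('>> RETO MEIER: Hello. >> JANE DOE: Hi Reto. >> RETO MEIER: Thanks.'));
// [ 'Reto Meier', 'Jane Doe' ]
```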

Why are caption matches returned as span elements?

The primary use for the caption matches is within HTML markup. Returning JSON for each span might be neater and less verbose, but for most apps that would entail extra effort transforming to HTML.
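The following is a minimal sketch of rendering a caption match as a span. It assumes a hypothetical `data-start` attribute that would let the client seek to the caption when it's clicked. Escaping the caption text keeps stray `<` or `&` characters from breaking the markup:

```javascript
// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities aren't double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical markup: one span per caption, carrying its start time in seconds.
function captionToSpan(caption) {
  return `<span data-start="${caption.start}">${escapeHtml(caption.text)}</span>`;
}

console.log(captionToSpan({ start: 12.5, text: 'Use <video> & WebRTC' }));
// <span data-start="12.5">Use &lt;video&gt; &amp; WebRTC</span>
```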

How long does it take to store and index data?

This depends a lot on connectivity. From work, the app gets and inserts the video data and transcripts in under three minutes. From home, it takes about 10 minutes.

Indexing takes about 10 minutes.

What build tools do you use?

JSCS and JSHint with grunt and githooks to force validation on commit.

The Chrome JSON formatting extensions … and … were very useful.


To do

  • General code refactoring.
  • Unit tests.
  • Better error handling.
  • Better Node socket handling: a lot of the code is deliberately synchronous to avoid errors.
  • The API is HTTP only as yet.
  • Use the official YouTube Captions API.
  • Move to Firebase. When the project started, it was a bit tricky to implement full-text search with Firebase, so Cloudant (which has full-text search built in) was chosen. It's now pretty simple to use Firebase with Elasticsearch, so the data will be ported at some stage.
  • Database updates are done manually at the moment — mostly to avoid messing up the sample app. Easily automated.


Copyright 2015 Google, Inc.

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Please note: this is not a Google product.