# BigQuery - JavaScript UDFs

This notebook demonstrates how to write JavaScript UDFs (user-defined functions) and use them in BigQuery SQL queries. This lets computation logic that is simpler to express in script than in SQL run as part of a query. The notebook also shows how to test a UDF before it is included in a query and processed by BigQuery.

JavaScript UDFs process the records of a table or query using arbitrary JavaScript logic. The function takes two arguments: an input record and an emit function. It produces output by calling emit(..) once for each output record, so a single input record can yield zero, one, or many output records. In this sense UDFs are the equivalent of mappers in map/reduce. Both input and output records are plain JavaScript objects.
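The following is a minimal local sketch of that contract in plain JavaScript. It makes no claims about BigQuery's actual runtime. runUdf and splitWords are hypothetical names used only for illustration. runUdf calls a UDF once per input record and collects every object passed to emit. splitWords shows that, like a mapper, a UDF can emit zero, one, or several records for a single input.

```javascript
// Hypothetical local harness (not BigQuery's runtime): calls `udf` once per
// input record and collects every record passed to the emit callback.
function runUdf(udf, records) {
  const output = [];
  for (const record of records) {
    udf(record, (row) => output.push(row));
  }
  return output;
}

// Illustrative UDF: emits one output record per whitespace-separated word,
// so one input record can produce zero, one, or many output records.
function splitWords(r, emit) {
  for (const word of r.line.split(/\s+/)) {
    if (word.length > 0) {
      emit({ word: word, line_number: r.line_number });
    }
  }
}

const rows = [
  { line: 'to be or not', line_number: 1 }, // emits four records
  { line: '   ', line_number: 2 },          // emits nothing
  { line: 'adieu', line_number: 3 },        // emits one record
];
console.log(runUdf(splitWords, rows));
```

Running it prints five records. Four come from the first line, none from the blank line, and one from the last line.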

----

NOTE:

* JavaScript UDFs are a new feature of BigQuery. In order to try the functionality here, you'll need to get your project white-listed.
* For an introduction to BigQuery and notebook basics, see the full list of introductory notebooks.


# Quick Look at the Data

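BigQuery provides a word index of Shakespeare's works as a public sample table, publicdata:samples.shakespeare. The table gives the number of times each word appears in each corpus (work). The query below lists the ten longest words.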

```
%%sql
SELECT word, word_count, corpus, LENGTH(word) as length
FROM [publicdata:samples.shakespeare]
ORDER BY length DESC
LIMIT 10
```

# JavaScript function

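Let's say we want to find the words, across all of Shakespeare's works, that contain at least one of the letters in "shakespeare" (s, h, a, k, e, p, r). The UDF below emits each such word and renames the word_count field to count.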

```
%%bigquery udf --module word_filter
/**
 * @param {{word: string, corpus: string, word_count: integer}} r
 * @param function({{word: string, corpus: string, count: integer}}) emitFn
 */
function(r, emitFn) {
  // Keep only words containing at least one of the letters s, h, a, k, e, p, r.
  if (r.word.match(/[shakespeare]/) !== null) {
    // Emit the record, renaming word_count to count.
    var result = { word: r.word, corpus: r.corpus, count: r.word_count };
    emitFn(result);
  }
}
```
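Note that /[shakespeare]/ is a character class, not a literal string. It matches any single character from the set s, h, a, k, e, p, r, and repeated letters in the class have no effect. Because it has no i flag, it is also case-sensitive. The short check below runs in any JavaScript engine and shows which words pass the filter. The word list is arbitrary.

```javascript
// A character class matches ONE character from the set, so
// /[shakespeare]/ behaves the same as /[aehkprs]/.
const classPattern = /[shakespeare]/;

const words = ['love', 'not', 'midsummer', 'wit', 'shakespeare'];
for (const w of words) {
  // RegExp.prototype.test returns a boolean. It is equivalent to the
  // `w.match(classPattern) !== null` check used in the UDF.
  console.log(w, classPattern.test(w));
}
// Prints: love true, not false, midsummer true, wit false, shakespeare true
```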

## Testing JavaScript

The notebook allows JavaScript UDFs to be tested as plain JavaScript before they are included in a SQL query. The module name assigned to the UDF (word_filter) is available in the notebook, so you can experiment with the function using mock data.

```python
# A few sample records to run through the function.
# 'not' contains none of the letters s, h, a, k, e, p, r, so it should be filtered out.
word_filter([{ 'word': 'love', 'corpus': 'othello', 'word_count': 78 },
             { 'word': 'not', 'corpus': 'kinghenryviii', 'word_count': 171 },
             { 'word': 'midsummer', 'corpus': 'asyoulikeit', 'word_count': 100 }])
```
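You can also exercise the same logic outside the notebook, for example with Node.js. To do this, give the UDF body a name and pass it a callback that collects the emitted records. This is only a sketch: wordFilter is a hypothetical name. It tests the JavaScript logic alone, not BigQuery's execution or the notebook's handling of the type annotations. The mock rows mirror those in the cell above.

```javascript
// Copy of the word_filter UDF body, given a name so it can be called directly.
function wordFilter(r, emitFn) {
  if (r.word.match(/[shakespeare]/) !== null) {
    emitFn({ word: r.word, corpus: r.corpus, count: r.word_count });
  }
}

const sampleRows = [
  { word: 'love', corpus: 'othello', word_count: 78 },
  { word: 'not', corpus: 'kinghenryviii', word_count: 171 },
  { word: 'midsummer', corpus: 'asyoulikeit', word_count: 100 },
];

// Collect everything the UDF emits.
const emitted = [];
for (const row of sampleRows) {
  wordFilter(row, (out) => emitted.push(out));
}
console.log(emitted);
```

Only love and midsummer are emitted, and each has word_count renamed to count.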
              

# Including JavaScript in SQL

The UDF can now be applied to a table or a SQL query. Applying it to the Shakespeare table produces a filtering query, which other SQL can then use as a data source.

```python
import gcp.bigquery as bq
shakespeare = bq.Table('publicdata:samples.shakespeare')
```

```python
filtering_query = word_filter(shakespeare)
```

```
%%sql --module query
SELECT word, sum(count) as count
FROM $filter
WHERE LENGTH(word) >= 5
GROUP BY word
ORDER BY count DESC
LIMIT 100
```
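To make the query's semantics concrete, here is an in-memory JavaScript sketch of the same steps applied to records emitted by the UDF. It keeps words of length 5 or more, sums count per word, sorts by the total in descending order, and keeps the top 100. The function name topWords and the sample rows are hypothetical, and the counts are made up rather than real table values. BigQuery executes the query differently, and the SQL does not define an order for words with equal counts.

```javascript
// In-memory equivalent of:
//   SELECT word, sum(count) as count FROM $filter
//   WHERE LENGTH(word) >= 5 GROUP BY word ORDER BY count DESC LIMIT 100
function topWords(rows, limit = 100) {
  const totals = new Map();
  for (const { word, count } of rows) {
    // WHERE LENGTH(word) >= 5. JavaScript's .length counts UTF-16 code units,
    // which matches character length for ASCII words.
    if (word.length < 5) continue;
    totals.set(word, (totals.get(word) || 0) + count); // GROUP BY word, sum(count)
  }
  return [...totals]
    .map(([word, count]) => ({ word, count }))
    .sort((a, b) => b.count - a.count) // ORDER BY count DESC
    .slice(0, limit);                  // LIMIT
}

// Made-up rows shaped like the UDF's output {word, corpus, count}.
const udfRows = [
  { word: 'midsummer', corpus: 'asyoulikeit', count: 100 },
  { word: 'midsummer', corpus: 'othello', count: 20 },
  { word: 'heart', corpus: 'othello', count: 50 },
  { word: 'love', corpus: 'othello', count: 78 }, // dropped: shorter than 5
];
console.log(topWords(udfRows));
```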

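It can be instructive to inspect the SQL that results from applying the JavaScript UDF. The $filter placeholder in the query module is replaced by the filtering query passed as the filter argument.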

```python
final_query = bq.Query(query, filter=filtering_query)
print(final_query.sql)
```

```
SELECT word, sum(count) as count
FROM (SELECT word, corpus, word_count FROM word_filter([publicdata:samples.shakespeare]))
WHERE LENGTH(word) >= 5
GROUP BY word
ORDER BY count DESC
LIMIT 100
```


Finally, execute the query and view the results.

```python
final_query.results()
```