The diagram below is a very rough sketch of what happens when you [use Spark](https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark) to run an SQL query on [our internal analytics cluster](https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster).

<img src="Hadoop-Spark-sketch.png">

Now, let's run a big query.

In [None]:
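# Put the cluster's Spark 2 installation on the Python path so pyspark can be imported.
import findspark
findspark.init('/usr/lib/spark2')
from pyspark.sql import SparkSession

# wmfdata is Wikimedia's helper package for querying the analytics cluster.
import wmfdata as wmf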

big_query = """
SELECT
  SUBSTR(r.rev_timestamp, 0, 10) AS date,
  IF(ARRAY_CONTAINS(t.tags, "mobile edit"), 'mobile', 'desktop') AS access_method,
  ARRAY_CONTAINS(t.tags, "campaign-external-machine-translation") AS is_eg_edit,
  r.`database` = 'idwiki' AS is_id,
  r.rev_parent_id IS NULL AS is_new_page,
  COUNT(DISTINCT r.rev_id) AS edits
FROM event.mediawiki_revision_tags_change t
RIGHT OUTER JOIN event.mediawiki_revision_create r
ON
  t.rev_id = r.rev_id AND
  r.year >= 2017 AND
  t.year >= 2017 AND
  r.page_id = t.page_id AND
  r.`database` = t.`database` AND
  r.page_namespace = t.page_namespace
WHERE
  NOT r.performer.user_is_bot AND
  NOT ARRAY_CONTAINS(r.performer.user_groups, 'bot') AND
  SUBSTR(r.rev_timestamp, 0, 4) >= 2017 AND
  r.meta.domain LIKE '%wikipedia%'
GROUP BY
  SUBSTR(r.rev_timestamp, 0, 10),
  ARRAY_CONTAINS(t.tags, "mobile edit"),
  ARRAY_CONTAINS(t.tags, "campaign-external-machine-translation"),
  r.`database` = 'idwiki',
  r.rev_parent_id IS NULL
ORDER BY date ASC
LIMIT 1000000
"""

# The app_name is what will appear in the "Name" column of the YARN application list.
wmf.hive.run(big_query, app_name="yarn-demo")

Now, we can use the YARN tracker at [yarn.wikimedia.org](https://yarn.wikimedia.org) to learn more about what's going on.
1. Find the application corresponding to this query by looking for "yarn-demo" in the "Name" column.
1. Go to the application details by clicking on the application ID, which will be something like "application_1576512674871_254013".
1. This page has basic information, but for a Spark query that is still running, you can get *much* more by clicking the "ApplicationMaster" link next to "Tracking URL". Here's part of what you'll see:
<img src="Spark-application-UI-example.png">
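
The same lookup can, in principle, be scripted. The sketch below is only an illustration: it assumes the standard YARN ResourceManager REST endpoint (`/ws/v1/cluster/apps`) is reachable at the same address as the web interface, which is not confirmed here. The `RM_BASE_URL` constant, both function names, and the sample response (including its tracking URL) are hypothetical. The filtering step runs on the sample data, so nothing here contacts the cluster unless you call `fetch_running_apps()` yourself.

```python
import json
from urllib.request import urlopen

# Assumption: the ResourceManager REST API is served from the same address
# as the web interface mentioned above. This has not been verified.
RM_BASE_URL = "https://yarn.wikimedia.org"


def fetch_running_apps(base_url=RM_BASE_URL):
    """Fetch the running applications as parsed JSON from the REST API."""
    with urlopen(f"{base_url}/ws/v1/cluster/apps?states=RUNNING") as response:
        return json.load(response)


def find_apps_by_name(payload, name):
    """Return (id, state, trackingUrl) for every application with this name."""
    # The API returns {"apps": null} when no application matches the query.
    apps = (payload.get("apps") or {}).get("app") or []
    return [
        (app["id"], app["state"], app.get("trackingUrl"))
        for app in apps
        if app["name"] == name
    ]


# Hypothetical response, trimmed to the fields used above.
sample_payload = {
    "apps": {
        "app": [
            {
                "id": "application_1576512674871_254013",
                "name": "yarn-demo",
                "state": "RUNNING",
                "trackingUrl": "https://yarn.wikimedia.org/proxy/application_1576512674871_254013/",
            },
            {
                "id": "application_1576512674871_254014",
                "name": "some-other-job",
                "state": "RUNNING",
                "trackingUrl": None,
            },
        ]
    }
}

matches = find_apps_by_name(sample_payload, "yarn-demo")
for app_id, state, tracking_url in matches:
    print(app_id, state, tracking_url)
```

Matching on the name only works if the `app_name` passed when launching the query is distinctive, which is one more reason to set it explicitly.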