# SELECT from Nobel

## `nobel` Nobel Laureates

We continue practicing simple SQL queries on a single table.

This tutorial is concerned with a table of Nobel prize winners:

```
nobel(yr, subject, winner)
```

The exercises use the `SELECT` statement, expressed here through Spark's DataFrame API in Scala.

In [1]:
import $ivy.`org.apache.spark::spark-sql:3.4.0`

import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF)

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val spark = {
    NotebookSparkSession.builder()
    .progress(false)
    .appName("app03")
    // .master("spark://192.168.31.31:7077")
    .master("local[*]")
    .config("spark.sql.warehouse.dir", 
            "hdfs://192.168.31.31:9000/user/hive/warehouse") 
    .config("spark.cores.max", "4") 
    .config("spark.executor.instances", "1") 
    .config("spark.executor.cores", "2") 
    .config("spark.executor.memory", "10g") 
    .config("spark.shuffle.service.enabled", "false") 
    .config("spark.dynamicAllocation.enabled", "false") 
    .config("spark.sql.catalogImplementation", "hive")
    .config("spark.sql.repl.eagerEval.enabled", "true")
    .config("spark.driver.allowMultipleContexts", "true")
    .getOrCreate()
}

Loading spark-stubs, spark-hive
Adding Hive conf dir /opt/hive/conf to classpath
Creating SparkSession


SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.


import $ivy.$
import org.apache.log4j.{Level, Logger}
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
spark: SparkSession = org.apache.spark.sql.SparkSession@16562d5a

In [2]:
import spark.implicits._
def sc = spark.sparkContext
val hiveCxt = new org.apache.spark.sql.hive.HiveContext(sc)

import spark.implicits._
defined function sc
hiveCxt: sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@41c1703f

In [3]:
// Credit to Aivean
implicit class RichDF(val ds:DataFrame) {
    def showHTML(limit: Int = 50, truncate: Int = 100) = {
        import xml.Utility.escape
        val data = ds.take(limit)
        val header = ds.schema.fieldNames.toSeq        
        val rows: Seq[Seq[String]] = data.map { row =>
          row.toSeq.map {cell =>
            val str = cell match {
              case null => "null"
              case binary: Array[Byte] => binary.map("%02X".format(_)).mkString("[", " ", "]")
              case array: Array[_] => array.mkString("[", ", ", "]")
              case seq: Seq[_] => seq.mkString("[", ", ", "]")
              case _ => cell.toString
            }
            if (truncate > 0 && str.length > truncate) {
              // do not show ellipses for strings shorter than 4 characters.
              if (truncate < 4) str.substring(0, truncate)
              else str.substring(0, truncate - 3) + "..."
            } else {
              str
            }
          }: Seq[String]
        }
        publish.html(s""" <table>
                <tr>
                 ${header.map(h => s"<th>${escape(h)}</th>").mkString}
                </tr>
                ${rows.map {row =>
                  s"<tr>${row.map{c => s"<td>${escape(c)}</td>" }.mkString}</tr>"
                }.mkString}
            </table>
        """)
    }
}

defined class RichDF

In [4]:
val nobel = hiveCxt.table("sqlzoo.nobel")

nobel: DataFrame = [yr: int, subject: string ... 1 more field]

## 1. Winners from 1950

Change the query shown so that it displays Nobel prizes for 1950.

In [5]:
nobel.filter($"yr"===1950).showHTML()

| yr | subject | winner |
| --- | --- | --- |
| 1950 | Chemistry | Kurt Alder |
| 1950 | Chemistry | Otto Diels |
| 1950 | Literature | Bertrand Russell |
| 1950 | Medicine | Philip S. Hench |
| 1950 | Medicine | Edward C. Kendall |
| 1950 | Medicine | Tadeus Reichstein |
| 1950 | Peace | Ralph Bunche |
| 1950 | Physics | Cecil Powell |
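
Since the exercises are phrased in SQL, it may help to see the same filter written as SQL text and run through `sql` on a temporary view. This is only a sketch: the local session, the view name `nobel_sample`, and the three-row stand-in table are assumptions for illustration (the rows are copied from outputs in this tutorial), whereas the notebook reads `sqlzoo.nobel` from Hive.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; in the notebook, `spark` already exists.
val session = SparkSession.builder()
  .master("local[1]")
  .appName("nobel-sql-sketch")
  .getOrCreate()

// A few rows from this tutorial's outputs, standing in for sqlzoo.nobel.
val sample = session.createDataFrame(Seq(
  (1950, "Literature", "Bertrand Russell"),
  (1950, "Peace", "Ralph Bunche"),
  (1962, "Literature", "John Steinbeck")
)).toDF("yr", "subject", "winner")

// Register the DataFrame under a (hypothetical) view name so SQL text can refer to it.
sample.createOrReplaceTempView("nobel_sample")

// The SQL counterpart of filter(yr === 1950).
val winners1950 = session.sql(
  "SELECT yr, subject, winner FROM nobel_sample WHERE yr = 1950")
winners1950.show()
```

The DataFrame form and the SQL form describe the same query, so either can be used depending on which reads better.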


## 2. 1962 Literature

Show who won the 1962 prize for Literature.

In [6]:
(nobel.filter(($"yr"===1962) && ($"subject"==="Literature"))
     .select($"winner")
     .showHTML())

| winner |
| --- |
| John Steinbeck |


## 3. Albert Einstein

Show the year and subject that won 'Albert Einstein' his prize.

In [7]:
(nobel.filter($"winner"==="Albert Einstein")
    .select($"yr", $"subject").showHTML())

| yr | subject |
| --- | --- |
| 1921 | Physics |


## 4. Recent Peace Prizes

Give the name of the 'Peace' winners since the year 2000, including 2000.

In [8]:
(nobel.filter(($"yr">=2000) && ($"subject"==="Peace"))
     .select($"winner")
     .showHTML())

| winner |
| --- |
| Tunisian National Dialogue Quartet |
| Kailash Satyarthi |
| Malala Yousafzai |
| European Union |
| Ellen Johnson Sirleaf |
| Leymah Gbowee |
| Tawakel Karman |
| Liu Xiaobo |
| Barack Obama |
| Martti Ahtisaari |


## 5. Literature in the 1980's

Show all details **(yr, subject, winner)** of the Literature prize winners for 1980 to 1989 inclusive.

In [9]:
(nobel.filter(($"yr".between(1980, 1989)) && 
              ($"subject"==="Literature"))
 .showHTML())

| yr | subject | winner |
| --- | --- | --- |
| 1989 | Literature | Camilo José Cela |
| 1988 | Literature | Naguib Mahfouz |
| 1987 | Literature | Joseph Brodsky |
| 1986 | Literature | Wole Soyinka |
| 1985 | Literature | Claude Simon |
| 1984 | Literature | Jaroslav Seifert |
| 1983 | Literature | William Golding |
| 1982 | Literature | Gabriel García Márquez |
| 1981 | Literature | Elias Canetti |
| 1980 | Literature | Czeslaw Milosz |
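
Because the exercise asks for 1980 to 1989 inclusive, it is worth checking that `between` includes both end points. The sketch below compares it with explicit `>=` and `<=` bounds on a tiny stand-in table; the local session and the four sample rows (taken from outputs in this tutorial) are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a throwaway local session; in the notebook, `spark` already exists.
val session = SparkSession.builder()
  .master("local[1]")
  .appName("between-sketch")
  .getOrCreate()

// Two rows on the range boundaries and two rows outside the range.
val sample = session.createDataFrame(Seq(
  (1980, "Literature", "Czeslaw Milosz"),
  (1989, "Literature", "Camilo José Cela"),
  (1962, "Literature", "John Steinbeck"),
  (2006, "Literature", "Orhan Pamuk")
)).toDF("yr", "subject", "winner")

// between(lower, upper) keeps rows where lower <= yr <= upper.
val viaBetween = sample.filter(col("yr").between(1980, 1989))

// The same condition written with explicit bounds.
val viaBounds = sample.filter((col("yr") >= 1980) && (col("yr") <= 1989))

viaBetween.show()
viaBounds.show()
```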


## 6. Only Presidents

Show all details of the presidential winners:

- Theodore Roosevelt
- Woodrow Wilson
- Jimmy Carter
- Barack Obama

In [10]:
nobel.filter($"winner".isin(List(
    "Theodore Roosevelt", "Woodrow Wilson", "Jimmy Carter", 
    "Barack Obama"): _*)).showHTML()

| yr | subject | winner |
| --- | --- | --- |
| 2009 | Peace | Barack Obama |
| 2002 | Peace | Jimmy Carter |
| 1919 | Peace | Woodrow Wilson |
| 1906 | Peace | Theodore Roosevelt |
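
The `List(...): _*` in the cell above expands a collection into the variable-length argument list that `isin` expects. As a sketch of the alternatives, the values can also be passed directly, or a collection can be handed to `isInCollection`. The local session and the three sample rows (copied from outputs in this tutorial) are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a throwaway local session; in the notebook, `spark` already exists.
val session = SparkSession.builder()
  .master("local[1]")
  .appName("isin-sketch")
  .getOrCreate()

val sample = session.createDataFrame(Seq(
  (2009, "Peace", "Barack Obama"),
  (2002, "Peace", "Jimmy Carter"),
  (1984, "Peace", "Desmond Tutu")
)).toDF("yr", "subject", "winner")

val presidents = Seq(
  "Theodore Roosevelt", "Woodrow Wilson", "Jimmy Carter", "Barack Obama")

// 1. Expand a collection into varargs, as the notebook cell does.
val viaExpansion = sample.filter(col("winner").isin(presidents: _*))

// 2. Pass the values directly as varargs.
val viaLiterals = sample.filter(col("winner").isin(
  "Theodore Roosevelt", "Woodrow Wilson", "Jimmy Carter", "Barack Obama"))

// 3. Hand over the collection itself.
val viaCollection = sample.filter(col("winner").isInCollection(presidents))

viaCollection.show()
```

All three forms express the SQL `IN (...)` test; which one to use is mostly a matter of whether the values already live in a collection.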


## 7. John

Show the winners with first name John.

In [11]:
(nobel.filter($"winner".startsWith("John"))
 .select($"winner").showHTML())

| winner |
| --- |
| John O'Keefe |
| John B. Gurdon |
| John C. Mather |
| John L. Hall |
| John B. Fenn |
| John E. Sulston |
| John Pople |
| John Hume |
| John E. Walker |
| John C. Harsanyi |
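
Note that `startsWith("John")` is a plain prefix test, so it would also accept a first name that merely begins with those four letters. If only the exact first name is wanted, one option is a `like` pattern with a trailing space, sketched below. The local session is an assumption, and the row `Johnson Placeholder` is a made-up name added only to show the difference; it is not a laureate.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a throwaway local session; in the notebook, `spark` already exists.
val session = SparkSession.builder()
  .master("local[1]")
  .appName("prefix-sketch")
  .getOrCreate()

// Two names from the output above plus one hypothetical name (not a real winner).
val names = session.createDataFrame(Seq(
  Tuple1("John Hume"),
  Tuple1("John E. Walker"),
  Tuple1("Johnson Placeholder")
)).toDF("winner")

// Prefix test: matches every name beginning with "John".
val prefixOnly = names.filter(col("winner").startsWith("John"))

// SQL LIKE pattern: "John", then a space, then anything.
val exactFirstName = names.filter(col("winner").like("John %"))

prefixOnly.show()
exactFirstName.show()
```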


## 8. Chemistry and Physics from different years

**Show the year, subject, and name of Physics winners for 1980 together with the Chemistry winners for 1984.**

In [12]:
(nobel.filter((($"subject"==="Physics") && 
               ($"yr"===1980)) ||
              (($"subject"==="Chemistry") &&
               ($"yr"===1984)))
     .select($"yr", $"subject", $"winner")
     .showHTML())

| yr | subject | winner |
| --- | --- | --- |
| 1984 | Chemistry | Bruce Merrifield |
| 1980 | Physics | James Cronin |
| 1980 | Physics | Val Fitch |
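
The phrase "together with" can also be read as combining two separate result sets. As an alternative sketch to the single `||` filter, each group can be filtered on its own and the two results appended with `union`, which keeps duplicates like SQL's `UNION ALL`. The local session and the five sample rows (copied from outputs in this tutorial) are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a throwaway local session; in the notebook, `spark` already exists.
val session = SparkSession.builder()
  .master("local[1]")
  .appName("union-sketch")
  .getOrCreate()

val sample = session.createDataFrame(Seq(
  (1980, "Physics", "James Cronin"),
  (1980, "Physics", "Val Fitch"),
  (1984, "Chemistry", "Bruce Merrifield"),
  (1984, "Physics", "Carlo Rubbia"),
  (1980, "Literature", "Czeslaw Milosz")
)).toDF("yr", "subject", "winner")

// Each group is selected separately...
val physics1980 = sample.filter((col("subject") === "Physics") && (col("yr") === 1980))
val chemistry1984 = sample.filter((col("subject") === "Chemistry") && (col("yr") === 1984))

// ...and the rows of the second are appended to the first.
val together = physics1980.union(chemistry1984)
together.show()
```

The two groups cannot overlap here, so the `union` result has the same rows as the single `||` filter.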


## 9. Exclude Chemists and Medics

**Show the year, subject, and name of winners for 1980, excluding Chemistry and Medicine.**

In [13]:
(nobel.filter(($"yr"===1980) && 
              ! ($"subject".isin(List("Chemistry", "Medicine"): _*)))
     .select($"yr", $"subject", $"winner").showHTML())

| yr | subject | winner |
| --- | --- | --- |
| 1980 | Economics | Lawrence R. Klein |
| 1980 | Literature | Czeslaw Milosz |
| 1980 | Peace | Adolfo Pérez Esquivel |
| 1980 | Physics | James Cronin |
| 1980 | Physics | Val Fitch |
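
The exclusion can be spelled in several equivalent ways. The sketch below compares the `!` prefix used in the cell above with the `not` function and with two explicit `=!=` tests. The local session is an assumption; the sample rows come from outputs in this tutorial, except the 1980 Chemistry row for Paul Berg, which does not appear above and is included so that the exclusion has something to drop.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, not}

// Assumption: a throwaway local session; in the notebook, `spark` already exists.
val session = SparkSession.builder()
  .master("local[1]")
  .appName("not-in-sketch")
  .getOrCreate()

val sample = session.createDataFrame(Seq(
  (1980, "Economics", "Lawrence R. Klein"),
  (1980, "Physics", "James Cronin"),
  (1980, "Chemistry", "Paul Berg"),          // not in the outputs above
  (1984, "Chemistry", "Bruce Merrifield")
)).toDF("yr", "subject", "winner")

val is1980 = col("yr") === 1980
val excluded = col("subject").isin("Chemistry", "Medicine")

// 1. Prefix negation, as in the notebook cell.
val viaBang = sample.filter(is1980 && !excluded)

// 2. The `not` function from org.apache.spark.sql.functions.
val viaNot = sample.filter(is1980 && not(excluded))

// 3. Two explicit inequality tests.
val viaInequality = sample.filter(
  is1980 && (col("subject") =!= "Chemistry") && (col("subject") =!= "Medicine"))

viaBang.show()
```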


## 10. Early Medicine, Late Literature

Show year, subject, and name of people who won a 'Medicine' prize in an early year (before 1910, not including 1910) together with winners of a 'Literature' prize in a later year (after 2004, including 2004).

In [14]:
(nobel.filter((($"yr" < 1910) && ($"subject"==="Medicine")) ||
              (($"yr" >= 2004) && ($"subject"==="Literature")))
         .select($"yr", $"subject", $"winner")
         .showHTML())

| yr | subject | winner |
| --- | --- | --- |
| 2015 | Literature | Svetlana Alexievich |
| 2014 | Literature | Patrick Modiano |
| 2013 | Literature | Alice Munro |
| 2012 | Literature | Mo Yan |
| 2011 | Literature | Tomas Tranströmer |
| 2010 | Literature | Mario Vargas Llosa |
| 2009 | Literature | Herta Müller |
| 2008 | Literature | Jean-Marie Gustave Le Clézio |
| 2007 | Literature | Doris Lessing |
| 2006 | Literature | Orhan Pamuk |


## 11. Umlaut

Find all details of the prize won by PETER GRÜNBERG.

> _Non-ASCII characters_   
> The u in his name has an umlaut. You may find this link useful <https://en.wikipedia.org/wiki/%C3%9C#Keyboarding>

In [15]:
nobel.filter(upper($"winner")==="PETER GRÜNBERG").showHTML()

| yr | subject | winner |
| --- | --- | --- |
| 2007 | Physics | Peter Grünberg |
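
If typing the umlaut is awkward, a Scala string can carry it as a Unicode escape instead. The sketch below uses plain Scala, without Spark, to check that `\u00DC` (the code point for Ü) gives the same string as upper-casing the name stored in the table. The variable names are illustrative only.

```scala
import java.util.Locale

// The target name written with a Unicode escape for the capital U with umlaut.
val escaped = "PETER GR\u00DCNBERG"

// The name as stored in the table, upper-cased independently of the default locale.
val upperCased = "Peter Grünberg".toUpperCase(Locale.ROOT)

// Both routes should produce the same characters.
println(escaped)
println(escaped == upperCased)
```

The escaped form does not depend on how the source file or notebook cell is encoded, which can matter when code is copied between editors.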


## 12. Apostrophe

Find all details of the prize won by EUGENE O'NEILL.

> _Escaping single quotes_   
> In SQL you can't put a single quote directly inside a single-quoted string; write two single quotes instead. The Scala string in the cell below is double-quoted, so the apostrophe needs no doubling.

In [16]:
nobel.filter(upper($"winner")==="EUGENE O'NEILL").showHTML()

| yr | subject | winner |
| --- | --- | --- |
| 1936 | Literature | Eugene O'Neill |
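
On the Scala side, an apostrophe can be written directly in a double-quoted string; the `\'` escape is also accepted and yields the same text. The short plain-Scala sketch below checks this. The variable names are illustrative only.

```scala
// Apostrophe written directly inside a double-quoted string.
val direct = "EUGENE O'NEILL"

// The same text with the optional backslash escape.
val escaped = "EUGENE O\'NEILL"

// A character literal is delimited by single quotes, so there the escape is required.
val apostrophe: Char = '\''

println(direct == escaped)
println(direct.contains(apostrophe))
```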


## 13. Knights of the realm

Knights in order

**List the winners, year and subject where the winner starts with Sir. Show the most recent first, then by name order.**

In [17]:
(nobel.filter($"winner".startsWith("Sir"))
     .select($"winner", $"yr", $"subject")
     .orderBy($"yr".desc, $"winner").showHTML())

| winner | yr | subject |
| --- | --- | --- |
| Sir Martin J. Evans | 2007 | Medicine |
| Sir Peter Mansfield | 2003 | Medicine |
| Sir Paul Nurse | 2001 | Medicine |
| Sir Harold Kroto | 1996 | Chemistry |
| Sir James W. Black | 1988 | Medicine |
| Sir Arthur Lewis | 1979 | Economics |
| Sir Nevill F. Mott | 1977 | Physics |
| Sir Bernard Katz | 1970 | Medicine |
| Sir John Eccles | 1963 | Medicine |
| Sir Frank Macfarlane Burnet | 1960 | Medicine |


## 14. Chemistry and Physics last

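The expression **subject IN ('Chemistry','Physics')** can be used as a value: in SQL it will be 0 or 1. In the Spark code below, `isin` yields a Boolean column instead; `false` sorts before `true` in ascending order, so the effect is the same.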

**Show the 1984 winners and subject ordered by subject and winner name; but list Chemistry and Physics last.**

In [18]:
(nobel.withColumn("flg", $"subject".isin(
    List("Chemistry", "Physics"): _*))
  .filter($"yr"===1984)
  .orderBy($"flg", $"subject", $"winner")
  .select($"winner", $"subject")
  .showHTML())

| winner | subject |
| --- | --- |
| Richard Stone | Economics |
| Jaroslav Seifert | Literature |
| César Milstein | Medicine |
| Georges J.F. Köhler | Medicine |
| Niels K. Jerne | Medicine |
| Desmond Tutu | Peace |
| Bruce Merrifield | Chemistry |
| Carlo Rubbia | Physics |
| Simon van der Meer | Physics |
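
The SQL hint speaks of a 0 or 1 value. One way to mirror that more literally is to cast the `isin` result to an integer before sorting, as sketched below on four of the 1984 rows shown above. The local session is an assumption for illustration; `flg` is the same column name the notebook cell uses.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a throwaway local session; in the notebook, `spark` already exists.
val session = SparkSession.builder()
  .master("local[1]")
  .appName("flag-sort-sketch")
  .getOrCreate()

val sample = session.createDataFrame(Seq(
  (1984, "Physics", "Carlo Rubbia"),
  (1984, "Peace", "Desmond Tutu"),
  (1984, "Chemistry", "Bruce Merrifield"),
  (1984, "Economics", "Richard Stone")
)).toDF("yr", "subject", "winner")

// flg is 1 for Chemistry and Physics, 0 for every other subject.
val flagged = sample.withColumn(
  "flg", col("subject").isin("Chemistry", "Physics").cast("int"))

// Sorting by flg first pushes the flagged subjects to the end.
val ordered = flagged
  .orderBy(col("flg"), col("subject"), col("winner"))
  .select("winner", "subject", "flg")

ordered.show()
```

Keeping `flg` in the output makes it easy to see why Chemistry and Physics end up last; it can be dropped from the final `select` once the ordering is confirmed.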


In [19]:
spark.stop()