# Guest House - Easy

Data for this assessment is available:

- [guesthouse data in MySQL format](http://sqlzoo.net/guesthouse.sql)
- [guesthouse data in Microsoft SQL Server format](http://sqlzoo.net/guesthouse-ms.sql)

Background

- Guests stay at a small hotel.
- Each booking is recorded in the table booking. The date of the first night of the booking is stored there (we do not record the date the booking was made).
- The room to be used is decided at the time of booking.
- There are several room types (single, double, ...).
- The amount charged depends on the room type, the number of people staying and the number of nights.
- Guests may be charged extras (for breakfast or using the minibar).
- [Guest House Assessment Sample Queries](https://sqlzoo.net/wiki/Guest_House_Assessment_Sample_Queries)

![rel](https://sqlzoo.net/w/images/8/83/Hotel.png)

## Table booking
The table booking contains an entry for every booking made at the hotel. A booking is made by one guest; even though more than one person may be staying, we do not record the details of other guests in the same room. In normal operation the table includes both past and future bookings.

```
+------------+--------------+---------+----------+-----------+---------------------+--------+--------------+
| booking_id | booking_date | room_no | guest_id | occupants | room_type_requested | nights | arrival_time |
+------------+--------------+---------+----------+-----------+---------------------+--------+--------------+
|       5001 | 2016-11-03   |     101 |     1027 |         1 | single              |      7 | 13:00        |
|       5002 | 2016-11-03   |     102 |     1179 |         1 | double              |      2 | 18:00        |
|       5003 | 2016-11-03   |     103 |     1106 |         2 | double              |      2 | 21:00        |
|       5004 | 2016-11-03   |     104 |     1238 |         1 | double              |      3 | 22:00        |
+------------+--------------+---------+----------+-----------+---------------------+--------+--------------+
```

## Table room
Rooms are either single, double, twin or family.

```
+-----+-----------+---------------+
| id  | room_type | max_occupancy |
+-----+-----------+---------------+
| 101 | single    |             1 |
| 102 | double    |             2 |
| 103 | double    |             2 |
| 104 | double    |             2 |
| 105 | family    |             3 |
+-----+-----------+---------------+
```

## Table rate
Rooms are charged per night. The amount charged depends on the "room type requested" value of the booking and the number of people staying:

```
+-----------+-----------+--------+
| room_type | occupancy | amount |
+-----------+-----------+--------+
| double    |         1 |  56.00 |
| double    |         2 |  72.00 |
| family    |         1 |  56.00 |
| family    |         2 |  72.00 |
| family    |         3 |  84.00 |
| single    |         1 |  48.00 |
| twin      |         1 |  50.00 |
| twin      |         2 |  72.00 |
+-----------+-----------+--------+
```

You can see that a double room with one person staying costs £56 per night, while a double room with two people staying costs £72 per night.

Note that the room actually assigned to a booking might not match the room type requested (a customer may ask for a single room but be assigned a double). In that case we charge the rate for the room type requested.
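
As a rough illustration of how the rate table is used, the sketch below looks up the nightly amount by requested room type and number of occupants, then multiplies it by the number of nights. The helper `roomCharge` and the in-memory `nightlyRate` map are assumptions made for this example (they are not part of the database), and extras such as breakfast are left out.

```scala
// Nightly rates keyed by (room type requested, occupants),
// copied from the rate table above.
val nightlyRate: Map[(String, Int), BigDecimal] = Map(
  ("double", 1) -> BigDecimal("56.00"),
  ("double", 2) -> BigDecimal("72.00"),
  ("family", 1) -> BigDecimal("56.00"),
  ("family", 2) -> BigDecimal("72.00"),
  ("family", 3) -> BigDecimal("84.00"),
  ("single", 1) -> BigDecimal("48.00"),
  ("twin", 1)   -> BigDecimal("50.00"),
  ("twin", 2)   -> BigDecimal("72.00")
)

// Hypothetical helper: total room charge for a booking,
// or None when no rate row matches the (type, occupants) pair.
def roomCharge(roomTypeRequested: String, occupants: Int, nights: Int): Option[BigDecimal] =
  nightlyRate.get((roomTypeRequested, occupants)).map(_ * BigDecimal(nights))

// Booking 5001 from the sample rows: single, 1 occupant, 7 nights.
val charge5001 = roomCharge("single", 1, 7)
// Booking 5003 from the sample rows: double, 2 occupants, 2 nights.
val charge5003 = roomCharge("double", 2, 2)
```

For booking 5001 this works out to 7 × 48.00 = 336.00. The lookup uses the requested room type, so the result would be the same even if the guest were assigned a larger room.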

In [1]:
import $ivy.`org.apache.spark::spark-sql:3.4.0`

import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF)

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

val spark = {
    NotebookSparkSession.builder()
    .progress(false)
    .appName("app13-1")
    // .master("spark://192.168.31.31:7077")
    .master("local[*]")
    .config("spark.sql.warehouse.dir", 
            "hdfs://192.168.31.31:9000/user/hive/warehouse") 
    .config("spark.cores.max", "4") 
    .config("spark.executor.instances", "1") 
    .config("spark.executor.cores", "2") 
    .config("spark.executor.memory", "10g") 
    .config("spark.shuffle.service.enabled", "false") 
    .config("spark.dynamicAllocation.enabled", "false") 
    .config("spark.sql.catalogImplementation", "hive")
    .config("spark.sql.repl.eagerEval.enabled", "true")
    .config("spark.driver.allowMultipleContexts", "true")
    .getOrCreate()
}

Loading spark-stubs, spark-hive
Adding Hive conf dir /opt/hive/conf to classpath
Creating SparkSession

spark: SparkSession = org.apache.spark.sql.SparkSession@191b2f1a

In [2]:
import spark.implicits._
def sc = spark.sparkContext
val hiveCxt = new org.apache.spark.sql.hive.HiveContext(sc)

defined function sc
hiveCxt: sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@11d07a78

In [3]:
// Credit to Aivean
implicit class RichDF(val ds:DataFrame) {
    def showHTML(limit: Int = 50, truncate: Int = 100) = {
        import xml.Utility.escape
        val data = ds.take(limit)
        val header = ds.schema.fieldNames.toSeq        
        val rows: Seq[Seq[String]] = data.map { row =>
          row.toSeq.map {cell =>
            val str = cell match {
              case null => "null"
              case binary: Array[Byte] => binary.map("%02X".format(_)).mkString("[", " ", "]")
              case array: Array[_] => array.mkString("[", ", ", "]")
              case seq: Seq[_] => seq.mkString("[", ", ", "]")
              case _ => cell.toString
            }
            if (truncate > 0 && str.length > truncate) {
              // do not show ellipses for strings shorter than 4 characters.
              if (truncate < 4) str.substring(0, truncate)
              else str.substring(0, truncate - 3) + "..."
            } else {
              str
            }
          }: Seq[String]
        }
        publish.html(s"""
            <table>
                <tr>
                    ${header.map(h => s"<th>${escape(h)}</th>").mkString}
                </tr>
                ${rows.map { row =>
                    s"<tr>${row.map { c => s"<td>${escape(c)}</td>" }.mkString}</tr>"
                }.mkString}
            </table>
        """)
    }
}

defined class RichDF

In [4]:
val booking = hiveCxt.table("sqlzoo.booking")
val guest = hiveCxt.table("sqlzoo.guest")
val room = hiveCxt.table("sqlzoo.room")
val rate = hiveCxt.table("sqlzoo.rate")

booking: DataFrame = [booking_id: int, booking_date: string ... 6 more fields]
guest: DataFrame = [id: int, first_name: string ... 2 more fields]
room: DataFrame = [id: int, room_type: string ... 1 more field]
rate: DataFrame = [room_type: string, occupancy: int ... 1 more field]

## 1.
Guest 1183. Give the booking_date and the number of nights for guest 1183.

```
+--------------+--------+
| booking_date | nights |
+--------------+--------+
| 2016-11-27   |      5 |
+--------------+--------+
```

In [5]:
(booking.filter(col("guest_id")===1183)
 .select("booking_date", "nights")
 .showHTML())

booking_date,nights
2016-11-27,5


## 2.
When do they get here? List the arrival time and the first and last names for all guests due to arrive on 2016-11-05, order the output by time of arrival.

```
+--------------+------------+-----------+
| arrival_time | first_name | last_name |
+--------------+------------+-----------+
| 14:00        | Lisa       | Nandy     |
| 15:00        | Jack       | Dromey    |
| 16:00        | Mr Andrew  | Tyrie     |
| 21:00        | James      | Heappey   |
| 22:00        | Justin     | Tomlinson |
+--------------+------------+-----------+
```

In [6]:
(booking.filter(col("booking_date")==="2016-11-05")
 .join(guest, (booking("guest_id")===guest("id")))
 .select("arrival_time", "first_name", "last_name")
 .orderBy("arrival_time")
 .showHTML())

arrival_time,first_name,last_name
14:00,Lisa,Nandy
15:00,Jack,Dromey
16:00,Mr Andrew,Tyrie
21:00,James,Heappey
22:00,Justin,Tomlinson


## 3.
Look up daily rates. Give the daily rate that should be paid for bookings with ids 5152, 5165, 5154 and 5295. Include booking id, room type, number of occupants and the amount.

```
+------------+---------------------+-----------+--------+
| booking_id | room_type_requested | occupants | amount |
+------------+---------------------+-----------+--------+
|       5152 | double              |         2 |  72.00 |
|       5154 | double              |         1 |  56.00 |
|       5295 | family              |         3 |  84.00 |
+------------+---------------------+-----------+--------+
```

In [7]:
(booking
 .filter(col("booking_id").isin(5152, 5165, 5154, 5295))
 // Charge by the room type requested, not by the room actually assigned.
 .join(rate, (booking("room_type_requested")===rate("room_type")) &&
             (booking("occupants")===rate("occupancy")))
 .select("booking_id", "room_type_requested", "occupants", "amount")
 .orderBy("booking_id")
 .showHTML())

booking_id,room_type_requested,occupants,amount
5152,double,2,72.0
5154,double,1,56.0
5295,family,3,84.0


## 4.
Who’s in 101? Find who is staying in room 101 on 2016-12-03, include first name, last name and address.

```
+------------+-----------+-------------+
| first_name | last_name | address     |
+------------+-----------+-------------+
| Graham     | Evans     | Weaver Vale |
+------------+-----------+-------------+
```

In [8]:
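(booking
 // A guest is in the room on a date if the stay starts on or before it
 // and the departure date (first night + nights) falls after it.
 .filter((col("room_no")===101) &&
         (col("booking_date") <= "2016-12-03") &&
         (date_add(col("booking_date"), col("nights")) >
             to_date(lit("2016-12-03"))))
 .join(guest, (col("guest_id")===guest("id")))
 .select("first_name", "last_name", "address")
 .showHTML())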

first_name,last_name,address
Graham,Evans,Weaver Vale
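
"Staying on a given date" can be read as an interval test: the first night is on or before that date, and the departure date (first night plus nights) is after it. The plain-Scala sketch below spells that reading out. The helper name `occupiesOn` is hypothetical, and treating the departure day as not occupied is an assumption about how the hotel counts nights.

```scala
import java.time.LocalDate

// Hypothetical helper: true when a stay that starts on `firstNight` and lasts
// `nights` nights includes the night of `day`. The departure day itself
// (firstNight + nights) is assumed not to count as a night in the room.
def occupiesOn(firstNight: LocalDate, nights: Int, day: LocalDate): Boolean = {
  val departure = firstNight.plusDays(nights.toLong)
  !firstNight.isAfter(day) && departure.isAfter(day)
}

// Booking 5001 from the sample rows: first night 2016-11-03, 7 nights.
val firstNight5001 = LocalDate.parse("2016-11-03")
val onLastNight    = occupiesOn(firstNight5001, 7, LocalDate.parse("2016-11-09"))
val onDepartureDay = occupiesOn(firstNight5001, 7, LocalDate.parse("2016-11-10"))
```

Here `onLastNight` is true and `onDepartureDay` is false, so a new guest arriving in the same room on 2016-11-10 would not be reported alongside the departing one.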


## 5.
How many bookings, how many nights? For guests 1185 and 1270 show the number of bookings made and the total number of nights. Your output should include the guest id and the total number of bookings and the total number of nights.

```
+----------+---------------+-------------+
| guest_id | COUNT(nights) | SUM(nights) |
+----------+---------------+-------------+
|     1185 |             3 |           8 |
|     1270 |             2 |           3 |
+----------+---------------+-------------+
```

In [9]:
(booking.filter(col("guest_id").isin(List(1185, 1270): _*))
 .groupBy("guest_id")
 .agg(count("nights").alias("COUNT(nights)"),
      sum("nights").alias("SUM(nights)"))
 .orderBy("guest_id")
 .showHTML())

guest_id,COUNT(nights),SUM(nights)
1185,3,8
1270,2,3


In [10]:
spark.stop()