# Congestion Charging - Easy

![rel](https://sqlzoo.net/w/images/f/f6/CongestionCharge.png)

camera(**id**, perim)

keeper(**id**, name, address)

vehicle(**id**, keeper)

image(**_camera_**, **whn**, reg)

permit(**_reg_**, **sDate**, chargeType)

In [1]:
import findspark
import pandas as pd  # toPandas() in the cells below needs pandas installed

findspark.init()  # make the local Spark installation's pyspark importable

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, to_date  # only the functions used below

SVR = '192.168.31.31'  # host running the Spark master and HDFS

# Note: `sc` is a SparkSession here, not a SparkContext.
sc = (SparkSession.builder.appName('app18-1')
      .master(f'spark://{SVR}:7077')
      .config('spark.sql.warehouse.dir', f'hdfs://{SVR}:9000/user/hive/warehouse')
      .config('spark.cores.max', '4')
      .config('spark.executor.instances', '1')
      .config('spark.executor.cores', '2')
      .config('spark.executor.memory', '10g')
      .enableHiveSupport().getOrCreate())

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


In [2]:
# Load the five tables of the `sqlzoo` database as DataFrames
camera = sc.read.table('sqlzoo.camera')
keeper = sc.read.table('sqlzoo.keeper')
vehicle = sc.read.table('sqlzoo.vehicle')
image = sc.read.table('sqlzoo.image')
permit = sc.read.table('sqlzoo.permit')

## Sample query

List the vehicles for which 'Strenuous, Sam' is the registered keeper. Keepers and vehicles are linked by the foreign key declared in the `CREATE TABLE vehicle` statement. Note the line:

```
 ,FOREIGN KEY(keeper) REFERENCES keeper(id)
```

This will be the basis of our join condition.
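
The notebook gives no code for this sample query, so here is a minimal sketch of the join using Python's built-in `sqlite3` on a tiny in-memory database. The keeper ids, the second keeper's vehicle `XX 00 AAA`, and the helper name `vehicles_kept_by` are made up for illustration; only the table layout and the foreign key come from the schema above. With the notebook's DataFrames the same link would be the condition `vehicle['keeper'] == keeper['id']`.

```python
import sqlite3


def vehicles_kept_by(conn, keeper_name):
    """Return the ids of vehicles whose registered keeper has the given name."""
    rows = conn.execute(
        """
        SELECT vehicle.id
        FROM vehicle
        JOIN keeper ON vehicle.keeper = keeper.id  -- the foreign-key link
        WHERE keeper.name = ?
        ORDER BY vehicle.id
        """,
        (keeper_name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keeper (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE vehicle (
        id TEXT PRIMARY KEY,
        keeper INTEGER,
        FOREIGN KEY(keeper) REFERENCES keeper(id)
    );
    -- Illustrative rows only: the ids 1 and 2 and 'XX 00 AAA' are hypothetical.
    INSERT INTO keeper VALUES (1, 'Strenuous, Sam', 'Surjection Street');
    INSERT INTO keeper VALUES (2, 'Ambiguous, Arthur', 'Absorption Ave.');
    INSERT INTO vehicle VALUES ('SO 02 PSP', 1);
    INSERT INTO vehicle VALUES ('XX 00 AAA', 2);
""")

sam_vehicles = vehicles_kept_by(conn, 'Strenuous, Sam')
print(sam_vehicles)
```

Passing the keeper name as a bound parameter (`?`) rather than formatting it into the SQL string avoids quoting problems with names that contain commas or apostrophes.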

## 1.
Show the name and address of the keeper of vehicle SO 02 PSP.

In [3]:
(keeper.join(vehicle.filter(vehicle['id']=='SO 02 PSP'),
             on=(keeper['id']==vehicle['keeper']))
 .select('name', 'address')
 .toPandas())

| | name | address |
|---|---|---|
| 0 | Strenuous, Sam | Surjection Street |


## 2.
Show the number of cameras that take images of incoming vehicles.

In [4]:
(camera.filter(col('perim')=='IN')
 .groupBy()
 .agg(count('id').alias('cnt'))
 .toPandas())

| | cnt |
|---|---|
| 0 | 8 |


## 3.
List the image details taken by Camera 10 before 26 Feb 2007.

In [5]:
(image.filter(image['whn'] < '2007-02-26')
 .join(camera, on=(image['camera']==camera['id']))
 .orderBy('whn', 'camera')
 .toPandas())

| | camera | whn | reg | id | perim |
|---|---|---|---|---|---|
| 0 | 1 | 2007-02-25 06:10:13.0 | SO 02 ASP | 1 | IN |
| 1 | 17 | 2007-02-25 06:20:01.0 | SO 02 ASP | 17 | |
| 2 | 18 | 2007-02-25 06:23:40.0 | SO 02 ASP | 18 | |
| 3 | 9 | 2007-02-25 06:26:04.0 | SO 02 ASP | 9 | OUT |
| 4 | 17 | 2007-02-25 06:57:31.0 | SO 02 CSP | 17 | |
| 5 | 17 | 2007-02-25 07:00:40.0 | SO 02 CSP | 17 | |
| 6 | 12 | 2007-02-25 07:04:31.0 | SO 02 CSP | 12 | OUT |
| 7 | 5 | 2007-02-25 07:10:00.0 | SO 02 GSP | 5 | IN |
| 8 | 16 | 2007-02-25 07:13:00.0 | SO 02 GSP | 16 | OUT |
| 9 | 2 | 2007-02-25 07:20:01.0 | SO 02 TSP | 2 | IN |
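
The query above filters only on the date, so its result includes images from every camera rather than camera 10 alone. A minimal sketch of combining both conditions, again using `sqlite3` on made-up rows (the two camera 10 rows and their registrations are hypothetical, not taken from the SQLZoo data): in the notebook's DataFrame API the equivalent change would be to add `image['camera'] == 10` to the filter with `&`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (
        camera INTEGER,
        whn TEXT,
        reg TEXT,
        PRIMARY KEY (camera, whn)
    );
    INSERT INTO image VALUES (1,  '2007-02-25 06:10:13', 'SO 02 ASP');
    -- Hypothetical camera 10 rows, one on each side of the cut-off date.
    INSERT INTO image VALUES (10, '2007-02-25 08:00:00', 'XX 00 AAA');
    INSERT INTO image VALUES (10, '2007-02-26 09:30:00', 'XX 00 BBB');
""")

# Both conditions must hold: the right camera AND a timestamp before the cut-off.
# ISO-formatted timestamps compare correctly as plain strings.
camera_10_rows = conn.execute(
    """
    SELECT camera, whn, reg
    FROM image
    WHERE camera = ? AND whn < ?
    ORDER BY whn
    """,
    (10, '2007-02-26'),
).fetchall()

print(camera_10_rows)
```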


## 4.
List the number of images taken by each camera. Your answer should show how many images have been taken by camera 1, camera 2, etc. The list must NOT include the images taken by cameras 15, 16, 17, 18 and 19.

In [6]:
(image.filter(~ image['camera'].between(15, 19))
 .groupBy('camera')
 .agg(count('reg').alias('cnt'))
 .toPandas())

| | camera | cnt |
|---|---|---|
| 0 | 12 | 4 |
| 1 | 1 | 1 |
| 2 | 3 | 5 |
| 3 | 5 | 1 |
| 4 | 9 | 8 |
| 5 | 8 | 2 |
| 6 | 10 | 4 |
| 7 | 11 | 3 |
| 8 | 2 | 1 |
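
The counts above come back in no particular order, because a grouped aggregation does not sort its result. As a sketch of one way to get "camera 1, camera 2, ..." order, the SQL below excludes cameras 15 to 19 with `NOT BETWEEN` and sorts by camera. The rows are invented for illustration; with the notebook's DataFrames, appending `.orderBy('camera')` after the aggregation should have the same effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image (camera INTEGER, whn TEXT, reg TEXT, "
    "PRIMARY KEY (camera, whn))"
)
# Hypothetical rows: cameras 15 and 19 are present so the exclusion is visible.
conn.executemany(
    "INSERT INTO image VALUES (?, ?, ?)",
    [
        (3, '2007-02-25 07:00:00', 'XX 00 AAA'),
        (1, '2007-02-25 07:05:00', 'XX 00 AAA'),
        (3, '2007-02-25 07:10:00', 'XX 00 BBB'),
        (15, '2007-02-25 07:15:00', 'XX 00 BBB'),
        (19, '2007-02-25 07:20:00', 'XX 00 CCC'),
        (9, '2007-02-25 07:25:00', 'XX 00 CCC'),
    ],
)

# Count images per camera, skip cameras 15..19, and sort by camera number.
counts = conn.execute(
    """
    SELECT camera, COUNT(*) AS cnt
    FROM image
    WHERE camera NOT BETWEEN 15 AND 19
    GROUP BY camera
    ORDER BY camera
    """
).fetchall()

print(counts)
```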


## 5.
A number of vehicles have permits that start on 30th Jan 2007. List the name and address for each keeper in alphabetical order without duplication.

In [7]:
(keeper.withColumnRenamed('id', 'keeper')
 .join(vehicle, on='keeper')
 .join(permit.filter(to_date(permit['sdate'])=='2007-01-30'),
       on=(vehicle['id']==permit['reg']))
 .select('name', 'address')
 .distinct()
 .orderBy('name')
 .toPandas())

| | name | address |
|---|---|---|
| 0 | Ambiguous, Arthur | Absorption Ave. |
| 1 | Assiduous, Annie | Attribution Alley |
| 2 | Contiguous, Carol | Circumscription Close |
| 3 | Strenuous, Sam | Surjection Street |
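
To see why the `distinct()` step matters, here is a small sketch in `sqlite3` where one keeper has two vehicles with permits starting on the same day. Without `DISTINCT` that keeper would be listed twice. The keeper ids, the `XX 00 ...` registrations, the `chargeType` values, and the permit rows are all hypothetical; only the three-table join shape follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keeper (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE vehicle (id TEXT PRIMARY KEY, keeper INTEGER);
    CREATE TABLE permit (reg TEXT, sDate TEXT, chargeType TEXT,
                         PRIMARY KEY (reg, sDate));

    -- Illustrative rows only.
    INSERT INTO keeper VALUES (1, 'Strenuous, Sam', 'Surjection Street');
    INSERT INTO keeper VALUES (2, 'Ambiguous, Arthur', 'Absorption Ave.');

    INSERT INTO vehicle VALUES ('SO 02 PSP', 1);
    INSERT INTO vehicle VALUES ('XX 00 AAA', 1);  -- second vehicle, same keeper
    INSERT INTO vehicle VALUES ('XX 00 BBB', 2);
    INSERT INTO vehicle VALUES ('XX 00 CCC', 2);

    INSERT INTO permit VALUES ('SO 02 PSP', '2007-01-30 00:00:00', 'Daily');
    INSERT INTO permit VALUES ('XX 00 AAA', '2007-01-30 00:00:00', 'Daily');
    INSERT INTO permit VALUES ('XX 00 BBB', '2007-02-03 00:00:00', 'Weekly');
    INSERT INTO permit VALUES ('XX 00 CCC', '2007-01-30 00:00:00', 'Daily');
""")

# date() drops the time part so the comparison is on the calendar day only.
keepers = conn.execute(
    """
    SELECT DISTINCT keeper.name, keeper.address
    FROM keeper
    JOIN vehicle ON vehicle.keeper = keeper.id
    JOIN permit ON permit.reg = vehicle.id
    WHERE date(permit.sDate) = '2007-01-30'
    ORDER BY keeper.name
    """
).fetchall()

print(keepers)
```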


In [8]:
sc.stop()