# Congestion Charges - Hard

You may need to create views to complete these questions, but you do not have permission to create tables or views in the default schema. Your SQL commands are executed by user scott in schema gisq: you may create or drop views and tables in schema scott, but not in gisq.

In [1]:
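import findspark
import pandas as pd

findspark.init()  # make the local Spark installation importable as pyspark

from pyspark.sql import SparkSession
# NOTE: the star import shadows Python built-ins such as min, max and sum
# with their Spark column-function versions.
from pyspark.sql.functions import *
from pyspark.sql import Window

SVR = '192.168.31.31'  # host used for both the Spark master and HDFS

# `sc` is a SparkSession with Hive support, not a SparkContext.
sc = (SparkSession.builder.appName('app18-3')
      .master(f'spark://{SVR}:7077')
      .config('spark.sql.warehouse.dir', f'hdfs://{SVR}:9000/user/hive/warehouse')
      .config('spark.cores.max', '4')
      .config('spark.executor.instances', '1')
      .config('spark.executor.cores', '2')
      .config('spark.executor.memory', '10g')
      .enableHiveSupport().getOrCreate())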


In [2]:
camera = sc.read.table('sqlzoo.camera')
keeper = sc.read.table('sqlzoo.keeper')
vehicle = sc.read.table('sqlzoo.vehicle')
image = sc.read.table('sqlzoo.image')
permit = sc.read.table('sqlzoo.permit')

## 1.
When creating a view in scott you must specify the schema name of the sources and the destination.
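
As a minimal illustration of what qualifying both sides looks like, the statement below is held in a Python string. The view name `scott.daily_permit` and the selected columns are hypothetical; only the schema names `scott` and `gisq` and the `permit` table come from the exercise. The notebook reads its tables through Spark, so this statement is not executed here.

```python
# Hypothetical view: the name daily_permit is an assumption for illustration.
# The destination is qualified with scott, the source with gisq.
create_view_sql = """
CREATE VIEW scott.daily_permit AS
SELECT reg, sdate
FROM gisq.permit
WHERE chargetype = 'Daily'
"""

print(create_view_sql.strip())
```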

## 2.
There are four types of permit. The most popular type is the one that has been issued the highest number of times. Find the most popular type, together with the total number of permits issued.

In [3]:
# Count permits per charge type and keep the most frequent one.
(permit.groupBy('chargetype')
 .agg(count('reg').alias('cnt'))
 .orderBy(col('cnt').desc())
 .limit(1)
 .toPandas())

| chargetype | cnt |
|---|---|
| Daily | 27 |


## 3.
For each of the vehicles caught by camera 19, show the registration, the earliest time at camera 19, and the time and camera at which it left the zone.

In [4]:
# Step 1: earliest sighting at camera 19 for each registered vehicle.
(image.filter(image['camera']==19)
 .join(vehicle, on=(image['reg']==vehicle['id']))
 .groupBy('reg')
 .agg(min('whn').alias('earliest'))
 # Step 2: the first sighting after that time, taken as the moment the vehicle left.
 .join(image, on='reg', how='left')
 .filter(col('earliest') < col('whn'))
 .groupBy('reg', 'earliest')
 .agg(min('whn').alias('next'))
 # Step 3: look up the camera that recorded that next sighting.
 .withColumnRenamed('reg', 'reg1')
 .join(image, on=((col('reg1')==image['reg']) &
                  (col('next')==image['whn'])))
 .select('reg', 'earliest', 'next', 'camera')
 .toPandas())

| reg | earliest | next | camera |
|---|---|---|---|
| SO 02 CSP | 2007-02-25 07:51:10.0 | 2007-02-25 07:55:11.0 | 18 |
| SO 02 DSP | 2007-02-25 16:31:01.0 | 2007-02-25 17:42:41.0 | 19 |
| SO 02 JSP | 2007-02-25 17:14:11.0 | 2007-02-25 17:17:03.0 | 3 |
| SO 02 TSP | 2007-02-25 07:23:00.0 | 2007-02-25 07:26:31.0 | 19 |
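
The query treats the first sighting after the earliest camera-19 sighting as the exit. A plain-Python sketch of that same rule, runnable without a cluster, might look like the following. The function name `first_and_next` is hypothetical; the first two sightings are taken from the output above and the third is an invented later sighting, included only to show that it is ignored.

```python
from datetime import datetime

# (reg, whn, camera) sightings. The first two rows come from the output above;
# the third is a hypothetical later sighting added to show it is ignored.
sightings = [
    ('SO 02 CSP', datetime(2007, 2, 25, 7, 51, 10), 19),
    ('SO 02 CSP', datetime(2007, 2, 25, 7, 55, 11), 18),
    ('SO 02 CSP', datetime(2007, 2, 25, 9, 0, 0), 3),  # hypothetical
]

def first_and_next(sightings, camera_id=19):
    """Map reg -> (earliest time at camera_id, next time, next camera)."""
    earliest = {}
    for reg, whn, camera in sightings:
        if camera == camera_id and (reg not in earliest or whn < earliest[reg]):
            earliest[reg] = whn
    result = {}
    for reg, start in earliest.items():
        later = [(whn, camera) for r, whn, camera in sightings
                 if r == reg and whn > start]
        if later:  # vehicles never seen again are left out, as in the query
            nxt, cam = min(later)
            result[reg] = (start, nxt, cam)
    return result

exits = first_and_next(sightings)
print(exits['SO 02 CSP'])
```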


## 4.
For all 19 cameras, show the position as IN, OUT or INTERNAL and the busiest hour for that camera.

In [5]:
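# Label each camera IN, OUT or INTERNAL (no perimeter flag), then count images per hour.
(camera
 .withColumn('type', when(camera['perim'].isNull(), 'INTERNAL').otherwise(camera['perim']))
 .join(image.withColumn('hr', hour(image['whn'])),
       on=(camera['id']==image['camera']))
 .groupBy('camera', 'type', 'hr')
 .agg(count('id').alias('n'))
 # NOTE: this second groupBy repeats the keys above, so max('n') keeps one row per
 # (camera, type, hr) rather than only the busiest hour of each camera.
 .groupBy('camera', 'type', 'hr')
 .max('n')
 .orderBy('camera')
 .toPandas())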

| camera | type | hr | max(n) |
|---|---|---|---|
| 1 | IN | 6 | 1 |
| 2 | IN | 7 | 1 |
| 3 | IN | 18 | 2 |
| 3 | IN | 17 | 3 |
| 5 | IN | 7 | 1 |
| 8 | IN | 7 | 2 |
| 9 | OUT | 16 | 6 |
| 9 | OUT | 18 | 1 |
| 9 | OUT | 6 | 1 |
| 10 | OUT | 18 | 2 |
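
The output above lists several hours for some cameras (cameras 3 and 9, for example), whereas the question asks for one busiest hour per camera. A plain-Python sketch of that final reduction is shown below, using the per-hour counts for cameras 3 and 9 copied from the output. The helper names `position` and `busiest_hours` are hypothetical, and breaking ties in favour of the first hour seen is an assumption, since the exercise does not say how ties should be handled. In Spark the same reduction is commonly written with a window partitioned by camera and ordered by count.

```python
# Per-hour image counts for two cameras, copied from the output above:
# (camera, type, hr) -> n
counts = {
    (3, 'IN', 18): 2,
    (3, 'IN', 17): 3,
    (9, 'OUT', 16): 6,
    (9, 'OUT', 18): 1,
    (9, 'OUT', 6): 1,
}

def position(perim):
    """Cameras with no perimeter flag are treated as INTERNAL."""
    return 'INTERNAL' if perim is None else perim

def busiest_hours(counts):
    """Keep, for each camera, the single hour with the largest count."""
    best = {}
    for (camera, cam_type, hr), n in counts.items():
        # Assumption: ties go to the hour seen first.
        if camera not in best or n > best[camera][2]:
            best[camera] = (cam_type, hr, n)
    return best

busiest = busiest_hours(counts)
print(busiest)  # {3: ('IN', 17, 3), 9: ('OUT', 16, 6)}
```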


## 5.
Anomalous daily permits. Daily permits should not be issued for non-charging days. Find a way to represent charging days. Identify the anomalous daily permits.

In [6]:
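# Spark's dayofweek returns 1 for Sunday through 7 for Saturday,
# so [1, 7] selects weekend start dates.
(permit.filter((dayofweek(permit['sdate']).isin([1, 7])) &
               (permit['chargetype']=='Daily'))
 .toPandas())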

| reg | sdate | chargetype |
|---|---|---|
| SO 02 ATP | 2007-01-21 00:00:00.0 | Daily |
| SO 02 BTP | 2007-02-03 00:00:00.0 | Daily |
| SO 02 BTP | 2007-02-04 00:00:00.0 | Daily |
| SO 02 CTP | 2007-01-21 00:00:00.0 | Daily |
| SO 02 FTP | 2007-02-25 00:00:00.0 | Daily |
| SO 02 HTP | 2006-01-21 00:00:00.0 | Daily |
| SO 02 HTP | 2006-01-22 00:00:00.0 | Daily |
| SO 02 JTP | 2007-01-21 00:00:00.0 | Daily |
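
One way to represent charging days outside Spark is a small predicate on the date. The sketch below assumes, as the filter above does, that charging days are Monday to Friday; public holidays are not modelled. The function name `is_charging_day` is hypothetical. Note that Python's `weekday()` numbers days differently from Spark's `dayofweek`.

```python
from datetime import date

def is_charging_day(day):
    """Assumption: charging days are Monday to Friday; public holidays are ignored."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return day.weekday() < 5

# Start dates of some permits flagged in the output above.
flagged = [date(2007, 1, 21), date(2007, 2, 3), date(2006, 1, 22)]
print([is_charging_day(d) for d in flagged])  # [False, False, False]
```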


## 6.
Issuing fines: Vehicles using the zone during the charge period on charging days must be issued with fine notices unless they have a permit covering that day. List the name and address of such culprits, and give the camera and the date and time of the first offence.

In [7]:
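# Vehicles joined to their permits; edate marks the end of each permit's validity.
f = (vehicle.join(permit, on=(vehicle['id']==permit['reg']), how='left')
     .withColumn('sdate', to_timestamp(col('sdate')))
     .withColumn('edate', when(
        col('chargetype')=='Daily', col('sdate') + expr('interval 1 day')).when(
        col('chargetype')=='Weekly', col('sdate') + expr('interval 1 week')).when(
        col('chargetype')=='Monthly', col('sdate') + expr('interval 1 month')).when(
        col('chargetype')=='Annual', col('sdate') + expr('interval 1 year')))
     # Attach the images, then the keeper's name and address.
     .join(image, on='reg', how='right')
     .join(keeper.withColumnRenamed('id', 'keeper'), on='keeper')
     # Keep images taken outside the permit window.
     # NOTE: vehicles with no permit have NULL sdate and edate, and a comparison
     # with NULL is never true, so they cannot pass this filter. The charge
     # period and charging days are not checked either.
     .filter((col('whn')<col('sdate')) | (col('whn')>col('edate'))))

# Earliest offending image per vehicle and keeper.
a = (f.groupBy('reg', 'name', 'address')
     .agg(min('whn').alias('first_offence')))

# Join back to f to recover the camera that recorded the first offence.
(a.alias('a')
 .join(f.alias('f').select('reg', 'whn', 'camera'), 
        on=((a['reg']==f['reg']) & (a['first_offence']==f['whn'])))
 .select(col('a.reg'), 'name', 'address', 'first_offence', 'camera')
 .toPandas())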

| reg | name | address | first_offence | camera |
|---|---|---|---|---|
| SO 02 ISP | Incongruous, Ingrid | Irresolution Pl. | 2007-02-25 16:58:01.0 | 9 |
| SO 02 JSP | Inconspicuous, Iain | Interception Rd. | 2007-02-25 17:07:00.0 | 3 |
| SO 02 HSP | Assiduous, Annie | Attribution Alley | 2007-02-25 16:45:04.0 | 9 |
| SO 02 GSP | Incongruous, Ingrid | Irresolution Pl. | 2007-02-25 07:10:00.0 | 5 |
| SO 02 ASP | Ambiguous, Arthur | Absorption Ave. | 2007-02-25 06:10:13.0 | 1 |
| SO 02 DSP | Strenuous, Sam | Surjection Street | 2007-02-25 16:29:11.0 | 18 |
| SO 02 CSP | Ambiguous, Arthur | Absorption Ave. | 2007-02-25 06:57:31.0 | 17 |
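
The fine rule turns on whether any permit covers the moment an image was taken. The plain-Python sketch below shows one way to express that test, assuming permit lengths of one day, week, month and year as in the query above and treating validity as the half-open interval from `sdate` up to, but not including, the end. The helper names `add_months`, `permit_end` and `is_covered` are hypothetical, clamping the day to the length of the target month is an assumption of this sketch, and the permit dates in the usage lines are illustrative. A vehicle with no permits at all is reported as not covered.

```python
import calendar
from datetime import datetime, timedelta

def add_months(start, months):
    """Shift start by whole months, clamping the day to the target month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day)

def permit_end(sdate, chargetype):
    """End of validity for a permit starting at sdate (exclusive bound)."""
    if chargetype == 'Daily':
        return sdate + timedelta(days=1)
    if chargetype == 'Weekly':
        return sdate + timedelta(weeks=1)
    if chargetype == 'Monthly':
        return add_months(sdate, 1)
    if chargetype == 'Annual':
        return add_months(sdate, 12)
    raise ValueError(f'unknown charge type: {chargetype}')

def is_covered(whn, permits):
    """True if any (sdate, chargetype) permit covers the timestamp whn."""
    return any(sdate <= whn < permit_end(sdate, chargetype)
               for sdate, chargetype in permits)

whn = datetime(2007, 2, 25, 16, 58, 1)  # an image time from the output above
print(is_covered(whn, [(datetime(2007, 1, 21), 'Daily')]))    # False
print(is_covered(whn, [(datetime(2007, 1, 30), 'Monthly')]))  # True
print(is_covered(whn, []))                                    # False
```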


In [8]:
sc.stop()