-
Notifications
You must be signed in to change notification settings - Fork 0
/
homework_sql.txt
92 lines (80 loc) · 2.53 KB
/
homework_sql.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
Question 1:
Create a materialized view to compute the average, min and max trip time between each taxi zone.
From this MV, find the pair of taxi zones with the highest average trip time.
CREATE MATERIALIZED VIEW trip_time_stats AS
SELECT
tz1.zone AS pickup_zone,
tz2.zone AS dropoff_zone,
AVG(EXTRACT(EPOCH FROM (tpep_dropoff_datetime - tpep_pickup_datetime))) AS avg_trip_time,
MIN(EXTRACT(EPOCH FROM (tpep_dropoff_datetime - tpep_pickup_datetime))) AS min_trip_time,
MAX(EXTRACT(EPOCH FROM (tpep_dropoff_datetime - tpep_pickup_datetime))) AS max_trip_time
FROM
trip_data td
JOIN
taxi_zone tz1 ON td.pulocationid = tz1.location_id
JOIN
taxi_zone tz2 ON td.dolocationid = tz2.location_id
GROUP BY
tz1.zone, tz2.zone;
SELECT
pickup_zone,
dropoff_zone,
avg_trip_time
FROM
trip_time_stats
ORDER BY
avg_trip_time DESC
LIMIT 1;
ANSWER: Yorkville East | Steinway | 86373.000000
Question 2:
Recreate the MV(s) in question 1, to also find the number of trips for the pair of taxi zones with the highest average trip time.
CREATE MATERIALIZED VIEW trip_time_stats_with_count AS
SELECT
tz1.zone AS pickup_zone,
tz2.zone AS dropoff_zone,
COUNT(*) AS num_trips,
AVG(EXTRACT(EPOCH FROM (tpep_dropoff_datetime - tpep_pickup_datetime))) AS avg_trip_time,
MIN(EXTRACT(EPOCH FROM (tpep_dropoff_datetime - tpep_pickup_datetime))) AS min_trip_time,
MAX(EXTRACT(EPOCH FROM (tpep_dropoff_datetime - tpep_pickup_datetime))) AS max_trip_time
FROM
trip_data td
JOIN
taxi_zone tz1 ON td.pulocationid = tz1.location_id
JOIN
taxi_zone tz2 ON td.dolocationid = tz2.location_id
GROUP BY
tz1.zone, tz2.zone;
SELECT
pickup_zone,
dropoff_zone,
num_trips,
avg_trip_time
FROM
trip_time_stats_with_count
ORDER BY
avg_trip_time DESC
LIMIT 1;
ANSWER: 1
Question 3:
From the latest pickup time to 17 hours before, what are the top 3 busiest zones in terms of number of pickups?
SELECT
tz.zone AS pickup_zone,
COUNT(*) AS num_pickups
FROM
trip_data td
JOIN
taxi_zone tz ON td.pulocationid = tz.location_id
WHERE
td.tpep_pickup_datetime >= (SELECT MAX(tpep_pickup_datetime) - INTERVAL '17 hours' FROM trip_data)
AND td.tpep_pickup_datetime <= (SELECT MAX(tpep_pickup_datetime) FROM trip_data)
GROUP BY
tz.zone
ORDER BY
num_pickups DESC
LIMIT 3;
ANSWER:
pickup_zone | num_pickups
---------------------+-------------
LaGuardia Airport | 19
JFK Airport | 17
Lincoln Square East | 17