# Information Systems for Engineers Fall 2024 - Cheat Sheet

During the exam, you will be required to write SQL queries using a Jupyter notebook.

This notebook is designed to help you start writing your queries: it provides an environment with the datasets loaded and simple example queries that you can use to recap SQL syntax.

Feel free to extend this notebook and use it to prepare the answers you need for the exam. Note that the content of this notebook will not be considered for grading.

## SQL

There is a local PostgreSQL 16.4 installation with a dataset loaded into a database. Run the next cell to connect to it.

**PLEASE NOTE: IF YOUR NOTEBOOK CRASHES, TRY TO RESTART THE KERNEL AS SHOWN BELOW. IF THIS DOESN'T FIX THE PROBLEM, ASK A TA.**
![instructions.png](attachment:instructions.png)

In [1]:
%load_ext sql
%sql postgresql://postgres:example@db 

To print the tables currently loaded in the database, run:

In [2]:
%%sql

SELECT * 
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_CATALOG = 'examdb' AND TABLE_SCHEMA = 'public';

 * postgresql://postgres:***@db
0 rows affected.


table_catalog,table_schema,table_name,table_type,self_referencing_column_name,reference_generation,user_defined_type_catalog,user_defined_type_schema,user_defined_type_name,is_insertable_into,is_typed,commit_action


To print the attributes of a particular table (`airports`, for example), run:

In [3]:
%%sql

SELECT column_name, data_type, character_maximum_length
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE table_name = 'airports';

 * postgresql://postgres:***@db
6 rows affected.


column_name,data_type,character_maximum_length
latitude,double precision,
longtide,double precision,
code,character varying,5.0
name,character varying,80.0
city,character varying,40.0
residence,character varying,2.0


## Useful SQL Keywords

The `SELECT DISTINCT` keyword returns only distinct values. For example:

In [4]:
%%sql
SELECT DISTINCT residence 
FROM airlines
LIMIT 3;

 * postgresql://postgres:***@db
3 rows affected.


residence
NY
WA
CO
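
`DISTINCT` can also appear inside an aggregate, as in `COUNT(DISTINCT ...)`. The sketch below is illustrative only: it uses Python's built-in `sqlite3` module and a made-up three-row stand-in for `airlines` (an assumption, not the exam's PostgreSQL data) to contrast `SELECT DISTINCT`, `COUNT(*)`, and `COUNT(DISTINCT ...)`.

```python
import sqlite3

# Toy stand-in for the `airlines` table; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airlines (code TEXT, name TEXT, residence TEXT)")
conn.executemany(
    "INSERT INTO airlines VALUES (?, ?, ?)",
    [("X1", "Toy Air", "TX"), ("X2", "Demo Jet", "TX"), ("X3", "Sample Wings", "NY")],
)

# SELECT DISTINCT collapses duplicate result rows into one.
distinct_states = [row[0] for row in conn.execute(
    "SELECT DISTINCT residence FROM airlines ORDER BY residence")]

# COUNT(*) counts rows; COUNT(DISTINCT col) counts different values of col.
row_count, state_count = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT residence) FROM airlines").fetchone()

print(distinct_states)         # the two different states
print(row_count, state_count)  # three rows, two different states
```

The same SQL statements should behave the same way in PostgreSQL; only the Python wrapper is specific to this sketch.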


## Complex query example

A more complex PostgreSQL query might look like this:

In [5]:
%%sql
SELECT airlines.residence, COUNT(airlines.code)
FROM airlines INNER JOIN airports ON airlines.residence = airports.residence
WHERE airlines.residence <> 'CA'
GROUP BY airlines.residence
HAVING COUNT(airlines.code) > 5
ORDER BY airlines.residence
OFFSET 1
LIMIT 10;

 * postgresql://postgres:***@db
5 rows affected.


residence,count
FL,17
GA,14
IL,7
NY,14
TX,72
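
To recap the order in which the clauses of such a query take effect, here is a small sketch on invented data. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the table contents are hypothetical. Note that SQLite expects `LIMIT` before `OFFSET`, whereas PostgreSQL accepts either order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented toy data: 2 airlines and 3 airports in TX, 1 airline and 2 airports in NY, ...
conn.executescript("""
CREATE TABLE airlines (code TEXT, residence TEXT);
CREATE TABLE airports (code TEXT, residence TEXT);
INSERT INTO airlines VALUES ('A1','TX'), ('A2','TX'), ('A3','NY'), ('A4','CA');
INSERT INTO airports VALUES ('P1','TX'), ('P2','TX'), ('P3','TX'),
                            ('P4','NY'), ('P5','NY'),
                            ('P6','CA'), ('P7','CA'), ('P8','CA'), ('P9','CA'),
                            ('P10','WA');
""")

query = """
SELECT airlines.residence, COUNT(airlines.code)
FROM airlines INNER JOIN airports ON airlines.residence = airports.residence
WHERE airlines.residence <> 'CA'   -- 1. filter joined rows before grouping
GROUP BY airlines.residence        -- 2. build one group per state
HAVING COUNT(airlines.code) > 1    -- 3. filter whole groups
ORDER BY airlines.residence        -- 4. sort the surviving groups
LIMIT 10 OFFSET ?                  -- 5. skip rows, then cap the result size
"""
all_groups = conn.execute(query, (0,)).fetchall()
after_offset = conn.execute(query, (1,)).fetchall()

print(all_groups)    # TX counts 2 airlines x 3 airports = 6 joined rows
print(after_offset)  # OFFSET 1 drops the first sorted group (NY)
```

Because the join matches every airline with every airport in the same state, `COUNT(airlines.code)` counts airline-airport pairs rather than airlines.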


## Exam database: data about flight delays in the US

The dataset consists of relations containing information such as airports, airlines, flights and flight irregularities. Tables include both real-world and synthetic data. 

Here is some basic information on the database tables.

### 1) `airlines` table

Contains the list of airlines serving flights in our database.

* `code` is the two-character IATA code identifier of the airline (e.g., `UA`, `F9`)

* `name` is the airline name

* `residence` is the two-letter code of the state of residence

In [6]:
%%sql
SELECT * 
FROM airlines
LIMIT 3;

 * postgresql://postgres:***@db
3 rows affected.


code,name,residence
UA,United Air Lines Inc.,IL
AA,American Airlines Inc.,TX
US,US Airways Inc.,AZ


### 2) `airports` table

Contains the list of available airports.

* `code` is the three-letter IATA code identifier of the airport

* `name` is the airport name

* `city` is the name of the city where the airport is located

* `residence` is the two-letter code of the state in which the airport is located

* `latitude` and `longtide` are floating-point numbers giving the latitude and longitude of the airport (note that the longitude column is spelled `longtide` in the database)

In [7]:
%%sql
SELECT * 
FROM airports 
LIMIT 3;

 * postgresql://postgres:***@db
3 rows affected.


code,name,city,residence,latitude,longtide
ABE,Lehigh Valley International Airport,Allentown,PA,40.65236,-75.4404
ABI,Abilene Regional Airport,Abilene,TX,32.41132,-99.6819
ABQ,Albuquerque International Sunport,Albuquerque,NM,35.04022,-106.60918999999998


### 3) `flights` table

Contains the list of flights conducted during the first seven days of January 2015. This table contains many rows, so be careful when printing the data!

* `id` is the unique flight ID

* `flight_number` is the IATA flight code

* `airline` is the IATA code of the airline

* `departure` and `arrival` are the IATA codes of the departure and arrival airports

* `year`, `month`, and `day` are integers encoding the year, month, and day on which the flight departed. `month` and `day` are 1-based, e.g., the date 2 January 2015 is stored in columns `year`, `month`, `day` as `2015`, `1`, and `2`, respectively.

* `load_factor` is a floating-point number in the range `[0, 1]` that describes the load factor of the flight, i.e., the fraction of occupied passenger seats

In [8]:
%%sql
SELECT * 
FROM flights 
LIMIT 3;

 * postgresql://postgres:***@db
3 rows affected.


id,flight_number,airline,departure,arrival,year,month,day,load_factor
0,98,AS,ANC,SEA,2015,1,1,0.465
1,2336,AA,LAX,PBI,2015,1,1,0.507
2,840,US,SFO,CLT,2015,1,1,0.774
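
Since the date is split across three integer columns, selecting one day means comparing all three. The following sketch shows this on invented rows, using Python's built-in `sqlite3` module as an assumed stand-in for the exam database. In PostgreSQL, the two-argument `ROUND` is defined for `NUMERIC` but not for `double precision`, which is why the average is cast before rounding; SQLite accepts the same expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented toy rows; only the columns needed for the example are included.
conn.executescript("""
CREATE TABLE flights (id INTEGER, year INTEGER, month INTEGER, day INTEGER,
                      load_factor REAL);
INSERT INTO flights VALUES (0, 2015, 1, 1, 0.5),
                           (1, 2015, 1, 2, 0.25),
                           (2, 2015, 1, 2, 0.75);
""")

# Select 2 January 2015 by matching year, month, and day together.
num_flights, avg_load = conn.execute("""
SELECT COUNT(*), ROUND(CAST(AVG(load_factor) AS NUMERIC), 2)
FROM flights
WHERE year = 2015 AND month = 1 AND day = 2
""").fetchone()

print(num_flights, avg_load)
```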


### 4) `flights_delay` table

Contains information on flight delays and irregularities. This table contains many rows, so be careful when printing the data!

* `flight_id` is the unique flight ID (it matches `id` in the `flights` table)

* `arrival_delay` is the arrival delay in minutes: a positive value means the flight arrived late, a negative value means it arrived early

* `cancellation` and `divertion` are boolean flags indicating cancelled and diverted flights

In [9]:
%%sql
SELECT * 
FROM flights_delay 
LIMIT 3;

 * postgresql://postgres:***@db
3 rows affected.


flight_id,arrival_delay,cancellation,divertion
0,-22,False,False
1,-9,False,False
2,5,False,False
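
Because early arrivals are stored as negative delays, an average over all flights differs from an average over late flights only. The sketch below illustrates this with invented rows, again assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the airport codes `AAA` and `BBB` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented toy rows: flight 0 arrives 10 minutes early, flights 1 and 2 arrive late.
conn.executescript("""
CREATE TABLE flights (id INTEGER, arrival TEXT);
CREATE TABLE flights_delay (flight_id INTEGER, arrival_delay INTEGER,
                            cancellation BOOLEAN, divertion BOOLEAN);
INSERT INTO flights VALUES (0, 'AAA'), (1, 'AAA'), (2, 'BBB');
INSERT INTO flights_delay VALUES (0, -10, 0, 0), (1, 30, 0, 0), (2, 20, 0, 0);
""")

base = """
SELECT f.arrival, ROUND(AVG(fd.arrival_delay), 2) AS avg_delay
FROM flights f
JOIN flights_delay fd ON f.id = fd.flight_id
{where}
GROUP BY f.arrival
ORDER BY f.arrival
"""
# Average over every flight: early arrivals pull the average down.
all_arrivals = conn.execute(base.format(where="")).fetchall()
# Average over late flights only: filter on the sign before grouping.
late_only = conn.execute(base.format(where="WHERE fd.arrival_delay > 0")).fetchall()

print(all_arrivals)
print(late_only)
```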


##### Note: the examples provided above do not contain all the query operations you might need during the exam.

Now it's your turn: write your queries in new cells below. Feel free to add as many cells as needed.

In [10]:
%%sql 
SELECT year, name, residence, airport, ROUND(CAST(AVG(load_factor) AS NUMERIC),2)
FROM fact_table
GROUP BY ROLLUP(year, name, residence, airport)
ORDER BY year, name, residence, airport
LIMIT 3;

 * postgresql://postgres:***@db
3 rows affected.


year,name,residence,airport,round
2015,Alaska Airlines Inc.,WA,Adak Airport,0.44
2015,Alaska Airlines Inc.,WA,Albuquerque International Sunport,0.48
2015,Alaska Airlines Inc.,WA,Austin-Bergstrom International Airport,0.58


In [11]:
%%sql 
SELECT * FROM (
    (
    SELECT year, name, residence, NULL AS airport, ROUND(CAST(AVG(load_factor) AS NUMERIC),2)
    FROM fact_table
    GROUP BY ROLLUP(year, name, residence)
    )
    UNION ALL
    (
    SELECT year, name, residence, airport, ROUND(CAST(AVG(load_factor) AS NUMERIC),2)
    FROM fact_table
    GROUP BY GROUPING SETS((), (year, name, residence, airport))
    ORDER BY year, name, residence, airport
    )

)
ORDER BY year, name, residence, airport
LIMIT 3;

 * postgresql://postgres:***@db
3 rows affected.


year,name,residence,airport,round
2015,Alaska Airlines Inc.,WA,Adak Airport,0.44
2015,Alaska Airlines Inc.,WA,Albuquerque International Sunport,0.48
2015,Alaska Airlines Inc.,WA,Austin-Bergstrom International Airport,0.58


In [12]:
%%sql
SELECT * FROM (
    (
    SELECT year, name, residence, airport, ROUND(CAST(AVG(load_factor) AS NUMERIC),2)
    FROM fact_table
    GROUP BY CUBE(year, name, residence, airport)
    )
    INTERSECT ALL
    (
    SELECT year, name, residence, airport, ROUND(CAST(AVG(load_factor) AS NUMERIC),2)
    FROM fact_table
    GROUP BY ROLLUP(year, name, residence, airport)
    )
)
ORDER BY year, name, residence, airport
LIMIT 3;

 * postgresql://postgres:***@db
3 rows affected.


year,name,residence,airport,round
2015,Alaska Airlines Inc.,WA,Adak Airport,0.44
2015,Alaska Airlines Inc.,WA,Albuquerque International Sunport,0.48
2015,Alaska Airlines Inc.,WA,Austin-Bergstrom International Airport,0.58


---

In [13]:
%%sql
SELECT COUNT(DISTINCT code)
FROM airports;

 * postgresql://postgres:***@db
1 rows affected.


count
322


In [14]:
%%sql
SELECT airline, COUNT(DISTINCT flight_number)
FROM flights
WHERE year = 2015 AND month = 1 AND day = 3
GROUP BY airline
ORDER BY count ASC
LIMIT 1;

 * postgresql://postgres:***@db
1 rows affected.


airline,count
VX,152


In [15]:
%%sql
SELECT COUNT(DISTINCT departure)
FROM flights f
JOIN airports a
ON f.arrival = a.code
WHERE a.residence = 'IL'
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
159


In [16]:
%%sql
SELECT COUNT(*)
FROM flights f
JOIN airports a1
ON f.arrival = a1.code
JOIN airports a2
ON f.departure = a2.code
WHERE a1.residence = a2.residence
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
11761


In [17]:
%%sql

-- Direct flights
SELECT COUNT(DISTINCT arrival)
FROM flights
WHERE departure = 'ABR'
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
1


In [18]:
%%sql
SELECT COUNT(DISTINCT f2.arrival)
FROM flights f1
JOIN flights f2
ON f1.arrival = f2.departure
WHERE f1.departure = 'ABR' AND f1.departure != f2.arrival
  AND f1.day = f2.day AND f1.month = f2.month AND f1.year = f2.year;


 * postgresql://postgres:***@db
1 rows affected.


count
103


In [19]:
%%sql
SELECT ROUND(AVG(arrival_delay),2)
FROM flights f
JOIN airlines a
ON f.airline = a.code
JOIN flights_delay fd
ON f.id = fd.flight_id
WHERE a.residence = 'GA' AND load_factor >= 0.5
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


round
12.71


---

# Round 1

In [20]:
%%sql
SELECT COUNT(*)
FROM airlines
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
14


In [21]:
%%sql
SELECT COUNT(*)
FROM airports
WHERE residence = 'CA'
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
22


In [22]:
%%sql
SELECT arrival, COUNT(*) AS count_flights_from_lax
FROM flights
WHERE departure = 'LAX'
GROUP BY arrival
ORDER BY count_flights_from_lax DESC
LIMIT 1;

 * postgresql://postgres:***@db
1 rows affected.


arrival,count_flights_from_lax
JFK,238


In [23]:
%%sql
SELECT ROUND(CAST(AVG(load_factor) AS NUMERIC), 2)
FROM flights
WHERE year = 2015 AND month = 1 AND day = 2
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


round
0.58


In [24]:
%%sql
SELECT COUNT(DISTINCT code)
FROM airports
WHERE code NOT IN (SELECT arrival FROM flights WHERE departure = 'CRW')
  AND code NOT IN (SELECT departure FROM flights WHERE arrival = 'CRW')
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
318


In [25]:
%%sql
SELECT COUNT(*)
FROM flights f
JOIN airports a
ON f.departure = a.code
JOIN airlines al
ON f.airline = al.code
WHERE latitude >= 40 AND al.residence = 'TX' AND day = 1 AND month = 1 AND year = 2015
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
1172


In [26]:
%%sql
SELECT f.arrival, ROUND(AVG(arrival_delay),2) AS avg_delay
FROM flights f
JOIN flights_delay fd
ON f.id = fd.flight_id
WHERE arrival_delay > 0
GROUP BY f.arrival
ORDER BY avg_delay DESC
LIMIT 1 OFFSET 1;

 * postgresql://postgres:***@db
1 rows affected.


arrival,avg_delay
EAU,110.75


---

# Round 2

In [27]:
%%sql
SELECT COUNT(*)
FROM airlines
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
14


In [28]:
%%sql
SELECT COUNT(DISTINCT code)
FROM airports
WHERE residence = 'CA'
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
22


In [29]:
%%sql
SELECT arrival, COUNT(id) AS num_flights
FROM flights
WHERE departure = 'LAX'
GROUP BY arrival
ORDER BY num_flights DESC
LIMIT 1;

 * postgresql://postgres:***@db
1 rows affected.


arrival,num_flights
JFK,238


In [30]:
%%sql
SELECT ROUND(CAST(AVG(load_factor) AS NUMERIC),2)
FROM flights
WHERE year = 2015 AND month = 1 AND day = 2
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


round
0.58


In [31]:
%%sql
SELECT COUNT(DISTINCT code)
FROM airports
WHERE code NOT IN
    ((SELECT arrival
    FROM flights
    WHERE departure = 'CRW')
    UNION ALL
    (SELECT departure
    FROM flights
    WHERE arrival = 'CRW'))

 * postgresql://postgres:***@db
1 rows affected.


count
318


In [32]:
%%sql
SELECT COUNT(DISTINCT id)
FROM flights f
JOIN airports a
ON f.departure = a.code
JOIN airlines al
ON al.code = f.airline
WHERE latitude >= 40 AND al.residence = 'TX' AND day = 1 AND month = 1 AND year = 2015
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
1172


In [33]:
%%sql
SELECT arrival, ROUND(AVG(arrival_delay),2) AS avg_del
FROM flights f
JOIN flights_delay fd
ON f.id = fd.flight_id
WHERE arrival_delay > 0
GROUP BY arrival
ORDER BY avg_del DESC
LIMIT 1 OFFSET 1;

 * postgresql://postgres:***@db
1 rows affected.


arrival,avg_del
EAU,110.75


---

# Round 3

In [34]:
%%sql
SELECT COUNT(DISTINCT name)
FROM airlines
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
14


In [35]:
%%sql
SELECT COUNT(DISTINCT code)
FROM airports
WHERE residence = 'CA'
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
22


In [36]:
%%sql
SELECT arrival, COUNT(DISTINCT id) AS num_arr
FROM flights
WHERE departure = 'LAX'
GROUP BY arrival
ORDER BY num_arr DESC
LIMIT 1;

 * postgresql://postgres:***@db
1 rows affected.


arrival,num_arr
JFK,238


In [37]:
%%sql
SELECT ROUND(CAST(AVG(load_factor) AS NUMERIC),2)
FROM flights
WHERE year = 2015 AND month = 1 AND day = 2
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


round
0.58


In [38]:
%%sql
SELECT COUNT(*)
FROM airports
WHERE code NOT IN 
    ((SELECT DISTINCT arrival
    FROM flights
    WHERE departure = 'CRW')
    UNION
    (SELECT DISTINCT departure
    FROM flights
    WHERE arrival = 'CRW'))
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
318


In [39]:
%%sql
SELECT COUNT(id)
FROM flights f
JOIN airlines a
ON f.airline = a.code
JOIN airports ap
ON f.departure = ap.code
WHERE ap.latitude >= 40 AND a.residence = 'TX' AND day = 1 AND month = 1 AND year = 2015
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
1172


In [40]:
%%sql
SELECT arrival, ROUND(AVG(arrival_delay),2) as avg_delay
FROM flights f
JOIN flights_delay fd
ON f.id = fd.flight_id
WHERE arrival_delay > 0
GROUP BY arrival
ORDER BY avg_delay DESC
LIMIT 1 OFFSET 1;

 * postgresql://postgres:***@db
1 rows affected.


arrival,avg_delay
EAU,110.75


---

# Round 4

In [41]:
%%sql
SELECT *
FROM airlines
LIMIT 10;

 * postgresql://postgres:***@db
10 rows affected.


code,name,residence
UA,United Air Lines Inc.,IL
AA,American Airlines Inc.,TX
US,US Airways Inc.,AZ
F9,Frontier Airlines Inc.,CO
B6,JetBlue Airways,NY
OO,Skywest Airlines Inc.,UT
AS,Alaska Airlines Inc.,WA
NK,Spirit Air Lines,FL
WN,Southwest Airlines Co.,TX
DL,Delta Air Lines Inc.,GA


In [42]:
%%sql
SELECT COUNT(DISTINCT code)
FROM airlines
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
14


In [43]:
%%sql
SELECT COUNT(DISTINCT code)
FROM airports
WHERE residence = 'CA'
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
22


In [44]:
%%sql
SELECT arrival, COUNT(id) AS num_flights
FROM flights
WHERE departure = 'LAX'
GROUP BY arrival
ORDER BY num_flights DESC
LIMIT 1;

 * postgresql://postgres:***@db
1 rows affected.


arrival,num_flights
JFK,238


In [45]:
%%sql
SELECT ROUND(AVG(CAST(load_factor AS NUMERIC)),2)
FROM flights
WHERE day = 2 AND month = 1 AND year = 2015
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


round
0.58


In [46]:
%%sql
SELECT COUNT(DISTINCT code)
FROM airports
WHERE code NOT IN
    (SELECT DISTINCT departure
    FROM flights
    WHERE arrival = 'CRW'
    UNION
    SELECT DISTINCT arrival
    FROM flights
    WHERE departure = 'CRW')
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
318


In [47]:
%%sql
SELECT COUNT(f.id)
FROM flights f
JOIN airports a
ON a.code = f.departure
JOIN airlines al
ON f.airline = al.code
WHERE a.latitude >= 40 AND al.residence = 'TX' AND day = 1 AND month = 1 AND year = 2015
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
1172


In [48]:
%%sql
SELECT arrival, ROUND(AVG(arrival_delay),2) AS avg_del
FROM flights f
JOIN flights_delay fd
ON f.id = fd.flight_id
WHERE arrival_delay > 0
GROUP BY arrival
ORDER BY avg_del DESC
LIMIT 1 OFFSET 1;

 * postgresql://postgres:***@db
1 rows affected.


arrival,avg_del
EAU,110.75


---

# Round 5

In [50]:
%%sql
SELECT COUNT(DISTINCT residence)
FROM airports
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
54


In [59]:
%%sql
SELECT COUNT(DISTINCT arrival)
FROM flights
WHERE departure = 'TYS'
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
8


In [66]:
%%sql
SELECT airline, COUNT(id) AS num_delays
FROM flights f
JOIN flights_delay fd
ON f.id = fd.flight_id
WHERE arrival_delay > 0
GROUP BY airline
ORDER BY num_delays ASC
LIMIT 1 OFFSET 1;

 * postgresql://postgres:***@db
1 rows affected.


airline,num_delays
HA,760


In [97]:
%%sql
SELECT ROUND(AVG(CAST(load_factor AS NUMERIC)),3)
FROM flights f
JOIN flights_delay fd
ON f.id = fd.flight_id
WHERE day = 5 AND month = 1 AND year = 2015 AND cancellation = FALSE
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


round
0.578


In [80]:
%%sql
SELECT COUNT(*)
FROM
    (SELECT city
    FROM airports
    GROUP BY city
    HAVING COUNT(*) > 1) AS cities_with_multiple_airports;

 * postgresql://postgres:***@db
1 rows affected.


count
14


In [85]:
%%sql
SELECT COUNT(*)
FROM flights f
JOIN airports a1
ON a1.code = f.departure
JOIN airports a2
ON a2.code = f.arrival
WHERE a1.residence = a2.residence
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
11761


In [88]:
%%sql

--- DIRECT FLIGHT
SELECT COUNT(DISTINCT arrival)
FROM flights
WHERE departure = 'ABR'
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
1


In [92]:
%%sql

--- Connecting Flights
SELECT COUNT(DISTINCT f2.arrival)
FROM flights f1
JOIN flights f2
ON f1.arrival = f2.departure
WHERE f1.departure = 'ABR' AND f1.departure != f2.arrival AND f1.day = f2.day AND f1.month = f2.month AND f1.year = f2.year
LIMIT 10;

 * postgresql://postgres:***@db
1 rows affected.


count
103


---