# Information Systems for Engineers Spring 2022 - Cheat Sheet

During the exam, you will be required to write SQL queries using a Jupyter notebook.

This notebook is designed to help you start writing your queries: it provides an environment with the datasets already loaded, along with a few simple queries you can use to review SQL syntax.

Feel free to extend this notebook and use it to prepare your answers for the exam. Note that the content of this notebook will not be graded.

## SQL

There is a local PostgreSQL 13 installation with a dataset loaded into a database. Run the next cell to connect to it.

In [1]:
%load_ext sql
%sql  postgresql://postgres:example@db 

To list the tables currently loaded in the database, run:

In [2]:
%%sql

SELECT * 
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_TYPE = 'BASE TABLE' and TABLE_CATALOG = 'postgres' and TABLE_SCHEMA = 'public';

 * postgresql://postgres:***@db
39 rows affected.


| table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | user_defined_type_name | is_insertable_into | is_typed | commit_action |
|---|---|---|---|---|---|---|---|---|---|---|---|
| postgres | public | categories | BASE TABLE | | | | | | YES | NO | |
| postgres | public | customers | BASE TABLE | | | | | | YES | NO | |
| postgres | public | nwemployees | BASE TABLE | | | | | | YES | NO | |
| postgres | public | employeeterritories | BASE TABLE | | | | | | YES | NO | |
| postgres | public | order_details | BASE TABLE | | | | | | YES | NO | |
| postgres | public | orders | BASE TABLE | | | | | | YES | NO | |
| postgres | public | products | BASE TABLE | | | | | | YES | NO | |
| postgres | public | region | BASE TABLE | | | | | | YES | NO | |
| postgres | public | shippers | BASE TABLE | | | | | | YES | NO | |
| postgres | public | suppliers | BASE TABLE | | | | | | YES | NO | |


To list the columns of a particular table (for example, `airports`), run:

In [3]:
%%sql

SELECT column_name, data_type, character_maximum_length
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE table_name = 'airports';

 * postgresql://postgres:***@db
6 rows affected.


| column_name | data_type | character_maximum_length |
|---|---|---|
| latitude | double precision | |
| longtide | double precision | |
| code | character varying | 5 |
| name | character varying | 80 |
| city | character varying | 40 |
| residence | character varying | 2 |


## Useful SQL Keywords

The keyword `SELECT DISTINCT` returns only distinct values, removing duplicate rows from the result. For example:

In [4]:
%%sql
SELECT DISTINCT residence FROM airlines;

 * postgresql://postgres:***@db
11 rows affected.


| residence |
|---|
| NY |
| WA |
| CO |
| AZ |
| HI |
| FL |
| UT |
| CA |
| TX |
| GA |
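
To see the effect of `DISTINCT` without touching the exam database, here is a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for PostgreSQL. The `airlines` rows and the codes `X1` to `X3` are hypothetical and were made up for illustration.

```python
import sqlite3

# Hypothetical rows in an in-memory SQLite database (not the exam's PostgreSQL data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airlines (code TEXT, name TEXT, residence TEXT)")
conn.executemany(
    "INSERT INTO airlines VALUES (?, ?, ?)",
    [("X1", "Airline One", "TX"), ("X2", "Airline Two", "TX"), ("X3", "Airline Three", "IL")],
)

# Without DISTINCT, every row contributes its residence, duplicates included.
all_states = [row[0] for row in conn.execute(
    "SELECT residence FROM airlines ORDER BY residence")]

# With DISTINCT, each residence value is returned only once.
distinct_states = [row[0] for row in conn.execute(
    "SELECT DISTINCT residence FROM airlines ORDER BY residence")]

print(all_states)       # ['IL', 'TX', 'TX']
print(distinct_states)  # ['IL', 'TX']
```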


## Complex query example

A more complex query combines joins, filtering, grouping and ordering, for example:

In [5]:
%%sql
-- Number of (airline, airport) pairs located in the same state, excluding California
SELECT airlines.residence, COUNT(airlines.code)
FROM airlines
INNER JOIN airports ON airlines.residence = airports.residence
WHERE airlines.residence <> 'CA'
GROUP BY airlines.residence
ORDER BY airlines.residence;

 * postgresql://postgres:***@db
10 rows affected.


| residence | count |
|---|---|
| AZ | 4 |
| CO | 10 |
| FL | 17 |
| GA | 14 |
| HI | 5 |
| IL | 7 |
| NY | 14 |
| TX | 72 |
| UT | 5 |
| WA | 4 |
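
The join condition matches on `residence` alone, so each airline is paired with every airport in its state. The count per state is therefore (airlines in the state) × (airports in the state), not the number of airlines. The minimal sketch below makes this visible on a small scale. It uses Python's `sqlite3` module in place of PostgreSQL, with made-up rows; the codes `A1` to `A4` and `P1` to `P5` are hypothetical.

```python
import sqlite3

# Hypothetical data: TX has 2 airlines and 3 airports, IL has 1 of each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airlines (code TEXT, residence TEXT)")
conn.execute("CREATE TABLE airports (code TEXT, residence TEXT)")
conn.executemany("INSERT INTO airlines VALUES (?, ?)",
                 [("A1", "TX"), ("A2", "TX"), ("A3", "CA"), ("A4", "IL")])
conn.executemany("INSERT INTO airports VALUES (?, ?)",
                 [("P1", "TX"), ("P2", "TX"), ("P3", "TX"), ("P4", "CA"), ("P5", "IL")])

# Same shape as the notebook query above.
query = """
SELECT airlines.residence, COUNT(airlines.code)
FROM airlines
INNER JOIN airports ON airlines.residence = airports.residence
WHERE airlines.residence <> 'CA'
GROUP BY airlines.residence
ORDER BY airlines.residence
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('IL', 1), ('TX', 6)] -> TX: 2 airlines x 3 airports
```

To count airlines rather than pairs, `COUNT(DISTINCT airlines.code)` would be the usual choice. It would give 2 for TX in this toy data.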


## Exam database: data about flight delays in the US

The dataset consists of relations containing information such as airports, airlines, flights and flight irregularities. Tables include both real-world and synthetic data. 

Here is some basic information on the database tables.

### 1) `airlines` table

Contains the list of airlines serving flights in our database.

* `code` is the two-character IATA code identifier of the airline (e.g., `UA`, `F9`)

* `name` is the airline name

* `residence` is the two-letter code of the US state where the airline is based

In [6]:
%%sql
SELECT * FROM airlines;

 * postgresql://postgres:***@db
14 rows affected.


| code | name | residence |
|---|---|---|
| UA | United Air Lines Inc. | IL |
| AA | American Airlines Inc. | TX |
| US | US Airways Inc. | AZ |
| F9 | Frontier Airlines Inc. | CO |
| B6 | JetBlue Airways | NY |
| OO | Skywest Airlines Inc. | UT |
| AS | Alaska Airlines Inc. | WA |
| NK | Spirit Air Lines | FL |
| WN | Southwest Airlines Co. | TX |
| DL | Delta Air Lines Inc. | GA |


### 2) `airports` table

Contains the list of available airports.

* `code` is the three-letter IATA code identifier of the airport

* `name` is the airport name

* `city` is the name of the city where the airport is located

* `residence` is the two-letter code of the state in which the airport is located

* `latitude` and `longtide` (the longitude column is spelled this way in the database) are floating-point numbers describing the geographical location of the airport

In [7]:
%%sql
SELECT * FROM airports LIMIT 4;

 * postgresql://postgres:***@db
4 rows affected.


| code | name | city | residence | latitude | longtide |
|---|---|---|---|---|---|
| ABE | Lehigh Valley International Airport | Allentown | PA | 40.65236 | -75.4404 |
| ABI | Abilene Regional Airport | Abilene | TX | 32.41132 | -99.6819 |
| ABQ | Albuquerque International Sunport | Albuquerque | NM | 35.04022 | -106.60918999999998 |
| ABR | Aberdeen Regional Airport | Aberdeen | SD | 45.44906 | -98.42183 |


### 3) `flights` table

Contains the list of flights operated during the first seven days of January 2015. This table contains many rows, so be careful when printing the data!

* `id` is the unique flight ID

* `flight_number` is the IATA flight number, without the airline prefix (e.g., `98` for the `AS` flight with `id` 0)

* `airline` is the IATA code of the airline

* `departure` and `arrival` are the IATA codes of the departure and arrival airports

* `year`, `month`, and `day` are integers encoding the departure date. Month and day are 1-based, e.g., 2 January 2015 is stored in columns `year`, `month`, `day` as `2015`, `1`, and `2`, respectively.

* `load_factor` is a floating-point number in the range `[0, 1]` describing the load factor of the flight, i.e., the fraction of passenger seats that were occupied

In [8]:
%%sql
SELECT * FROM flights LIMIT 4;

 * postgresql://postgres:***@db
4 rows affected.


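| id | flight_number | airline | departure | arrival | year | month | day | load_factor |
|---|---|---|---|---|---|---|---|---|
| 0 | 98 | AS | ANC | SEA | 2015 | 1 | 1 | 0.465 |
| 1 | 2336 | AA | LAX | PBI | 2015 | 1 | 1 | 0.507 |
| 2 | 840 | US | SFO | CLT | 2015 | 1 | 1 | 0.774 |
| 3 | 258 | AA | LAX | MIA | 2015 | 1 | 1 | 0.633 |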
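
The departure date is split across three integer columns, so grouping or filtering by date means naming all three. The minimal sketch below shows both. It uses Python's `sqlite3` module as a stand-in for PostgreSQL and a hypothetical three-row `flights` table that keeps only the columns needed here.

```python
import sqlite3

# Hypothetical flights: two on 1 January 2015, one on 2 January 2015.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id INTEGER, year INTEGER, month INTEGER, day INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [(0, 2015, 1, 1), (1, 2015, 1, 1), (2, 2015, 1, 2)],
)

# Flights per departure date: all three date columns go into GROUP BY.
per_day = conn.execute(
    """
    SELECT year, month, day, COUNT(*) AS num_flights
    FROM flights
    GROUP BY year, month, day
    ORDER BY year, month, day
    """
).fetchall()
print(per_day)  # [(2015, 1, 1, 2), (2015, 1, 2, 1)]

# Filtering on a single date (2 January 2015) also uses all three columns.
jan_2 = conn.execute(
    "SELECT COUNT(*) FROM flights WHERE year = 2015 AND month = 1 AND day = 2"
).fetchone()[0]
print(jan_2)  # 1
```

If you need real date values in PostgreSQL, `make_date(year, month, day)` combines the three columns into a single `date`.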


### 4) `flights_delay` table

Contains information on flight delays and irregularities. This table contains many rows, so be careful when printing the data!

* `flight_id` is the unique flight ID, matching `id` in the `flights` table

* `arrival_delay` is the arrival delay in minutes: a positive value indicates a late arrival, a negative value an early arrival

* `cancellation` and `divertion` (spelled this way in the database) are boolean flags indicating cancelled and diverted flights, respectively

In [9]:
%%sql
SELECT * FROM flights_delay LIMIT 4;

 * postgresql://postgres:***@db
4 rows affected.


| flight_id | arrival_delay | cancellation | divertion |
|---|---|---|---|
| 0 | -22 | False | False |
| 1 | -9 | False | False |
| 2 | 5 | False | False |
| 3 | -9 | False | False |
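
Delay information lives in a separate table, so most delay questions join `flights_delay` to `flights` on `flight_id = id`. The minimal sketch below counts, per airline, the non-cancelled flights and how many of them arrived late. It rests on several assumptions:

* SQLite (via Python's `sqlite3`) stands in for PostgreSQL.
* The rows and airline codes `X1`/`X2` are hypothetical.
* The boolean flags are stored as 0/1, because SQLite has no native BOOLEAN type.
* A cancelled flight is assumed to have a NULL `arrival_delay`. This has not been verified for the exam data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id INTEGER PRIMARY KEY, airline TEXT)")
conn.execute(
    "CREATE TABLE flights_delay (flight_id INTEGER, arrival_delay INTEGER, "
    "cancellation INTEGER, divertion INTEGER)"
)
conn.executemany("INSERT INTO flights VALUES (?, ?)",
                 [(0, "X1"), (1, "X1"), (2, "X1"), (3, "X2")])
# Hypothetical delays; flight 2 is cancelled (flags stored as 0/1 in SQLite).
conn.executemany(
    "INSERT INTO flights_delay VALUES (?, ?, ?, ?)",
    [(0, 15, 0, 0), (1, -5, 0, 0), (2, None, 1, 0), (3, 30, 0, 0)],
)

rows = conn.execute(
    """
    SELECT f.airline,
           COUNT(*) AS flown,
           SUM(CASE WHEN fd.arrival_delay > 0 THEN 1 ELSE 0 END) AS late
    FROM flights f
    JOIN flights_delay fd ON f.id = fd.flight_id
    WHERE fd.cancellation = 0  -- skip cancelled flights
    GROUP BY f.airline
    ORDER BY f.airline
    """
).fetchall()
print(rows)  # [('X1', 2, 1), ('X2', 1, 1)]
```

On the real PostgreSQL boolean column, the cancellation filter would be written `NOT fd.cancellation` or `fd.cancellation IS FALSE` instead of comparing with 0.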


##### Note: the examples above do not cover all the query operations you might need during the exam.

Now it's your turn: write your queries in new cells below, adding as many cells as needed.

In [10]:
%%sql
-- Number of distinct residence (state) codes among all airports
SELECT COUNT(DISTINCT residence)
FROM airports;

 * postgresql://postgres:***@db
1 rows affected.


| count |
|---|
| 54 |


In [11]:
%%sql 
-- Number of distinct destinations with a flight departing from TYS
SELECT COUNT(DISTINCT arrival)
FROM flights
WHERE departure = 'TYS';

 * postgresql://postgres:***@db
1 rows affected.


| count |
|---|
| 8 |


In [12]:
%%sql
-- Airline with the second-lowest number of delayed arrivals (arrival_delay > 0)
SELECT airline, COUNT(id) AS num_flights
FROM flights f
JOIN flights_delay fd
ON f.id = fd.flight_id
WHERE arrival_delay > 0
GROUP BY airline
ORDER BY num_flights ASC
LIMIT 1 OFFSET 1;

 * postgresql://postgres:***@db
1 rows affected.


| airline | num_flights |
|---|---|
| HA | 760 |
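
`ORDER BY ... LIMIT 1 OFFSET 1` returns the second row of the sorted result, which here is the airline with the second-fewest delayed flights. The minimal sketch below uses Python's `sqlite3` in place of PostgreSQL (both support `LIMIT ... OFFSET ...`). It simplifies the join above to a hypothetical table `delayed` that holds one row per delayed flight.

```python
import sqlite3

# Hypothetical delayed flights: X1 has 3, X2 has 1, X3 has 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delayed (airline TEXT)")
conn.executemany(
    "INSERT INTO delayed VALUES (?)",
    [("X1",), ("X1",), ("X1",), ("X2",), ("X3",), ("X3",)],
)

ranked = """
SELECT airline, COUNT(*) AS num_flights
FROM delayed
GROUP BY airline
ORDER BY num_flights ASC
"""
print(conn.execute(ranked).fetchall())  # [('X2', 1), ('X3', 2), ('X1', 3)]

# OFFSET 1 skips the first (lowest) row; LIMIT 1 keeps only the next one.
second = conn.execute(ranked + "LIMIT 1 OFFSET 1").fetchall()
print(second)  # [('X3', 2)]
```

If two airlines have the same count, the row that `OFFSET 1` lands on is arbitrary. Adding a tiebreaker such as `ORDER BY num_flights, airline` makes the result deterministic.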


In [13]:
%%sql 
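-- Average load factor of non-cancelled flights on 5 January 2015, rounded to 3 decimals.
-- PostgreSQL's two-argument ROUND is defined for NUMERIC, hence the CAST of the floating-point column.
SELECT ROUND(AVG(CAST(load_factor AS NUMERIC)), 3)
FROM flights f
JOIN flights_delay fd
ON f.id = fd.flight_id
WHERE cancellation IS FALSE AND year = 2015 AND month = 1 AND day = 5;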

 * postgresql://postgres:***@db
1 rows affected.


| round |
|---|
| 0.578 |


In [14]:
%%sql 
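-- Number of cities served by more than one airport.
-- Cities are grouped by name only, so same-named cities in different states are merged.
SELECT COUNT(*)
FROM 
    (SELECT city, COUNT(DISTINCT name)
    FROM airports
    GROUP BY city
    HAVING COUNT(DISTINCT name) > 1) AS multi_airport_cities;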

 * postgresql://postgres:***@db
1 rows affected.


| count |
|---|
| 14 |
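
This query nests an aggregate: the inner query uses `GROUP BY ... HAVING` to keep cities with more than one airport, and the outer query counts those rows. PostgreSQL 13 requires an alias for a subquery in `FROM` (here the hypothetical name `multi_airport_cities`). The minimal sketch below uses SQLite, which only allows the alias without requiring it, through Python's `sqlite3`. The city and airport names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (code TEXT, name TEXT, city TEXT)")
# Hypothetical airports: Alpha and Gamma have two each, Beta has one.
conn.executemany(
    "INSERT INTO airports VALUES (?, ?, ?)",
    [
        ("C01", "North Field", "Alpha"),
        ("C02", "South Field", "Alpha"),
        ("C03", "Main Airport", "Beta"),
        ("C04", "East Field", "Gamma"),
        ("C05", "West Field", "Gamma"),
    ],
)

# Inner query: cities with more than one distinct airport name.
# Outer query: how many such cities there are.
query = """
SELECT COUNT(*)
FROM (
    SELECT city
    FROM airports
    GROUP BY city
    HAVING COUNT(DISTINCT name) > 1
) AS multi_airport_cities
"""
count = conn.execute(query).fetchone()[0]
print(count)  # 2
```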


In [15]:
%%sql 
-- Number of flights whose departure and arrival airports are in the same state
SELECT COUNT(DISTINCT id)
FROM flights f
JOIN airports a1
ON f.departure = a1.code
JOIN airports a2
ON f.arrival = a2.code
WHERE a1.residence = a2.residence;

 * postgresql://postgres:***@db
1 rows affected.


| count |
|---|
| 11761 |
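
The same `airports` table is joined twice under two aliases. `a1` describes the departure airport and `a2` the arrival airport of the same flight, which lets the query compare their states. The minimal sketch below uses Python's `sqlite3` as a stand-in for PostgreSQL, with hypothetical airports `P1` to `P3` and four made-up flights.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (code TEXT, residence TEXT)")
conn.execute("CREATE TABLE flights (id INTEGER, departure TEXT, arrival TEXT)")
conn.executemany("INSERT INTO airports VALUES (?, ?)",
                 [("P1", "TX"), ("P2", "TX"), ("P3", "CA")])
# Flights 0 and 3 stay within TX; flights 1 and 2 cross state lines.
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)",
                 [(0, "P1", "P2"), (1, "P2", "P3"), (2, "P3", "P1"), (3, "P2", "P1")])

# a1 = departure airport, a2 = arrival airport of the same flight.
query = """
SELECT COUNT(DISTINCT f.id)
FROM flights f
JOIN airports a1 ON f.departure = a1.code
JOIN airports a2 ON f.arrival = a2.code
WHERE a1.residence = a2.residence
"""
same_state = conn.execute(query).fetchone()[0]
print(same_state)  # 2
```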


In [16]:
%%sql 

-- Direct flights: distinct destinations reachable non-stop from ABR

SELECT COUNT(DISTINCT arrival)
FROM flights
WHERE departure = 'ABR';

 * postgresql://postgres:***@db
1 rows affected.


| count |
|---|
| 1 |


In [17]:
%%sql

-- Connecting flights: distinct destinations reachable from ABR with one
-- connection on the same day, excluding flights back to ABR

SELECT COUNT(DISTINCT f2.arrival)
FROM flights f1
JOIN flights f2
ON f1.arrival = f2.departure
WHERE f1.departure = 'ABR'
  AND f1.year = f2.year AND f1.month = f2.month AND f1.day = f2.day
  AND f1.departure != f2.arrival;

 * postgresql://postgres:***@db
1 rows affected.


| count |
|---|
| 103 |
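
The connecting-flights query joins `flights` with itself. `f1` is the first leg and `f2` the second: the two legs must meet at the same airport on the same date, and the second leg must not return to the origin. The minimal sketch below uses Python's `sqlite3` in place of PostgreSQL, with hypothetical airports `AAA`, `HUB`, `BBB` and `CCC`. It places the date conditions in `ON`, which is equivalent to putting them in `WHERE` for an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (id INTEGER, departure TEXT, arrival TEXT, "
    "year INTEGER, month INTEGER, day INTEGER)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?, ?, ?)",
    [
        (0, "AAA", "HUB", 2015, 1, 1),  # first leg out of AAA
        (1, "HUB", "BBB", 2015, 1, 1),  # same-day connection: counted
        (2, "HUB", "AAA", 2015, 1, 1),  # back to the origin: excluded
        (3, "HUB", "CCC", 2015, 1, 2),  # next day: excluded
    ],
)

# f1 = first leg, f2 = second leg, joined at the connecting airport and date.
query = """
SELECT COUNT(DISTINCT f2.arrival)
FROM flights f1
JOIN flights f2
  ON f1.arrival = f2.departure
 AND f1.year = f2.year AND f1.month = f2.month AND f1.day = f2.day
WHERE f1.departure = 'AAA'
  AND f2.arrival <> f1.departure
"""
destinations = conn.execute(query).fetchone()[0]
print(destinations)  # 1
```

The columns described above contain no departure or arrival times, so "same day" is the only ordering constraint this approach can enforce. Cancelled legs are also not filtered out unless `flights_delay` is joined in as well.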


---