# PART 1. QUERYING DATA
Taken from this tutorial: https://www.dataquest.io/blog/python-pandas-databases/

## Table of Contents
### Getting Oriented with the Database
1. [What tables does my database have?](#1.-What-tables-does-my-database-have?)
2. [What columns does my table have?](#2.-What-columns-does-my-table-have?)

### Querying
1. [Retrieve data from **all** columns](#1.-Select-All-Columns,-Display-the-First-15-Rows)
2. [Retrieve data from **some** columns](#2.-Select-Some-Columns,-Display-the-First-15-Rows)
3. [Filter data by 1 condition](#3.-Select-Some-Columns,-Filter-by-a-Single-Condition)
4. [Filter data by multiple conditions (AND)](#4.-Select-Some-Columns,-Filter-by-Two-Condition-(AND))
5. [Filter data by multiple conditions (OR)](#5.-Select-Some-Columns,-Filter-by-Two-Condition-(OR))
6. [Getting unique rows](#6.-Getting-Unique-Rows)
7. [Grouping](#7.-Grouping)
8. [Grouping and then filtering](#8.-Grouping-and-Then-Filtering)

## I. Connecting to your database
To connect to your database, import the `sqlite3` module and create a connection object:

In [1]:
# use the sqlite3 library
import sqlite3
conn = sqlite3.connect('../databases/flights.db')
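
One thing to watch for: `sqlite3.connect` creates a new, empty database file when the path does not exist, so a typo in the path fails silently until the first query. The sketch below guards against that. The helper name `connect_existing` and the temporary demo file are assumptions made for illustration, not part of the original tutorial.

```python
import os
import sqlite3
import tempfile

def connect_existing(path):
    """Open a SQLite database, refusing to create a new empty file."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"No database file at {path!r}")
    return sqlite3.connect(path)

# Demonstration with a throwaway file (hypothetical path);
# in this notebook you would pass '../databases/flights.db' instead.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "demo.db")
sqlite3.connect(db_path).close()  # creates the empty database file

conn_demo = connect_existing(db_path)  # succeeds: the file exists
conn_demo.close()

try:
    connect_existing(os.path.join(tmp_dir, "typo.db"))
except FileNotFoundError as err:
    print("Refused:", err)
```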

## II. Getting Oriented with Your Database
To understand what exists in your database, you have to know what tables are available and what each table looks like. This section will help you:
1. Figure out what tables you have
2. Figure out what columns each table has

[back to top](#PART-1.-QUERYING-DATA)

### 1. What tables does my database have?
```sql
SELECT name 
FROM sqlite_master 
WHERE type='table';
```

In [2]:
# create a cursor and execute a query (a cursor is analogous to a file reader with an instruction):
cur = conn.cursor()
cur.execute("SELECT name FROM sqlite_master WHERE type='table';")

# put the results of your query into the results variable:
results = cur.fetchall()
print(results)

# just adding a little formatting:
print()
print('List Tables:')
for row in results:
    print(row[0])

[('airports',), ('airlines',), ('routes',)]

List Tables:
airports
airlines
routes
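
If you look up table names often, the query above can be wrapped in a small helper. This is a minimal sketch: the function name `list_tables` is hypothetical, and it runs against an in-memory database with two made-up tables so that it is self-contained.

```python
import sqlite3

def list_tables(connection):
    """Return the names of all tables in the database as a list of strings."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"
    )
    # Each row is a one-element tuple such as ('airlines',)
    return [row[0] for row in cursor.fetchall()]

# In-memory database with made-up tables, for illustration only
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE airlines (id TEXT, name TEXT)")
demo.execute("CREATE TABLE routes (airline_id TEXT, source TEXT)")

tables = list_tables(demo)
print(tables)
demo.close()
```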


[back to top](#PART-1.-QUERYING-DATA)

### 2. What columns does my table have?
```sql
pragma table_info(<table_name>);
```

In [3]:
cur = conn.cursor()
cur.execute("pragma table_info(airlines);")
results = cur.fetchall()
print(results)

# add a little formatting to make things easier to read:
print()
print('-' * 35)
print('List each column name and datatype:')
print('-' * 35)
for row in results:
    print(row[1] + ' (' +  row[2] + ')')

[(0, 'index', 'INTEGER', 0, None, 0), (1, 'id', 'TEXT', 0, None, 0), (2, 'name', 'TEXT', 0, None, 0), (3, 'alias', 'TEXT', 0, None, 0), (4, 'iata', 'TEXT', 0, None, 0), (5, 'icao', 'TEXT', 0, None, 0), (6, 'callsign', 'TEXT', 0, None, 0), (7, 'country', 'TEXT', 0, None, 0), (8, 'active', 'TEXT', 0, None, 0)]

-----------------------------------
List each column name and datatype:
-----------------------------------
index (INTEGER)
id (TEXT)
name (TEXT)
alias (TEXT)
iata (TEXT)
icao (TEXT)
callsign (TEXT)
country (TEXT)
active (TEXT)
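
Each row returned by `pragma table_info` has six fields: column position, name, declared type, a not-null flag, the default value, and a primary-key flag. The sketch below turns those tuples into dictionaries so the fields can be read by name. The helper `describe_table` and the demo table are assumptions for illustration. Because a pragma argument is interpolated into the SQL text here, the helper only accepts simple identifiers.

```python
import sqlite3

def describe_table(connection, table_name):
    """Return one dict per column of table_name, built from PRAGMA table_info."""
    # The table name is placed directly in the SQL string, so only allow plain identifiers.
    if not table_name.isidentifier():
        raise ValueError(f"Unexpected table name: {table_name!r}")
    cursor = connection.execute(f"PRAGMA table_info({table_name});")
    return [
        {
            "name": row[1],
            "type": row[2],
            "not_null": bool(row[3]),
            "primary_key": bool(row[5]),
        }
        for row in cursor.fetchall()
    ]

# Made-up table definition, for illustration only
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE airlines (id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT)")

columns = describe_table(demo, "airlines")
for column in columns:
    print(column)
demo.close()
```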


## III. SELECT STATEMENTS 
SELECT statements query your database to retrieve the specific information needed to answer a question. Each clause of the statement targets a different aspect of the data; when present, the clauses are written in this order:

1. SELECT clause: which columns do you want (* means you want all of the columns)?
2. FROM clause: which table do you want to query?
3. WHERE clause: how do you want to filter your data?
4. GROUP BY clause: do you want to aggregate information by a particular column?
5. ORDER BY clause: how do you want to order your data?
6. LIMIT clause: how many records do you want back?
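
To see how these clauses fit together, here is a minimal sketch of one query that uses all six. When they appear together, SQLite expects them written as SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT. The rows are made up for the example and do not come from `flights.db`.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE airlines (name TEXT, country TEXT, active TEXT)")
demo.executemany(
    "INSERT INTO airlines VALUES (?, ?, ?)",
    [
        ("Airline A", "United Kingdom", "Y"),
        ("Airline B", "United Kingdom", "Y"),
        ("Airline C", "United Kingdom", "N"),
        ("Airline D", "Brazil", "Y"),
        ("Airline E", "Canada", "N"),
    ],
)

top_countries = demo.execute('''
    SELECT country, count(*) AS airline_count   -- which columns
    FROM airlines                               -- which table
    WHERE active = 'Y'                          -- filter rows
    GROUP BY country                            -- aggregate per country
    ORDER BY airline_count DESC                 -- sort the groups
    LIMIT 2;                                    -- cap the result size
''').fetchall()

print(top_countries)
demo.close()
```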

[back to top](#PART-1.-QUERYING-DATA)

### 1. Select All Columns, Display the First 15 Rows
Here, we will select data from every column in the airlines table, but only return the first 15 rows (since it's a big dataset).

```sql
SELECT * FROM airlines LIMIT 15;
```

* The asterisk indicates that we want all of the columns from the airlines table
* The LIMIT clause limits the results to just the first 15 rows
* Question: what is the data type of results?

In [4]:
cur = conn.cursor()
cur.execute("SELECT * FROM airlines LIMIT 15;")
results = cur.fetchall()
# print(results)

In [35]:
# make a loop and print everything:
for row in results:
    print(row)

(0, '1', 'Private flight', '\\N', '-', None, None, None, 'Y')
(1, '2', '135 Airways', '\\N', None, 'GNL', 'GENERAL', 'United States', 'N')
(2, '3', '1Time Airline', '\\N', '1T', 'RNX', 'NEXTIME', 'South Africa', 'Y')
(3, '4', '2 Sqn No 1 Elementary Flying Training School', '\\N', None, 'WYT', None, 'United Kingdom', 'N')
(4, '5', '213 Flight Unit', '\\N', None, 'TFU', None, 'Russia', 'N')
(5, '6', '223 Flight Unit State Airline', '\\N', None, 'CHD', 'CHKALOVSK-AVIA', 'Russia', 'N')
(6, '7', '224th Flight Unit', '\\N', None, 'TTF', 'CARGO UNIT', 'Russia', 'N')
(7, '8', '247 Jet Ltd', '\\N', None, 'TWF', 'CLOUD RUNNER', 'United Kingdom', 'N')
(8, '9', '3D Aviation', '\\N', None, 'SEC', 'SECUREX', 'United States', 'N')
(9, '10', '40-Mile Air', '\\N', 'Q5', 'MLA', 'MILE-AIR', 'United States', 'Y')
(10, '11', '4D Air', '\\N', None, 'QRT', 'QUARTET', 'Thailand', 'N')
(11, '12', '611897 Alberta Limited', '\\N', None, 'THD', 'DONUT', 'Canada', 'N')
(12, '13', 'Ansett Australia', '\\N', 'AN', '

In [36]:
# just print the name of the airline (third column) for each row:
for row in results:
    print(row[2])

Private flight
135 Airways
1Time Airline
2 Sqn No 1 Elementary Flying Training School
213 Flight Unit
223 Flight Unit State Airline
224th Flight Unit
247 Jet Ltd
3D Aviation
40-Mile Air
4D Air
611897 Alberta Limited
Ansett Australia
Abacus International
Abelag Aviation
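
Indexing with `row[2]` works, but it breaks silently if the column order changes. One alternative is to set the connection's `row_factory` to `sqlite3.Row`, which lets you read values by column name as well as by position. The sketch below uses an in-memory table with a single made-up row.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.row_factory = sqlite3.Row  # rows now support access by column name
demo.execute("CREATE TABLE airlines (id TEXT, name TEXT, country TEXT)")
demo.execute("INSERT INTO airlines VALUES ('1', 'Example Air', 'Canada')")

row = demo.execute("SELECT * FROM airlines LIMIT 15;").fetchone()
name_by_key = row["name"]    # access by column name
name_by_index = row[1]       # positional access still works
column_names = row.keys()    # list of column names in SELECT order
demo.close()

print(name_by_key, column_names)
```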


[back to top](#PART-1.-QUERYING-DATA)

### 2. Select Some Columns, Display the First 15 Rows
Now, rather than retrieving data from all of the columns, we ask for only some of them.

```sql
SELECT id, name, country, active 
FROM airlines 
LIMIT 15;
```

In [37]:
cur = conn.cursor()
cur.execute("SELECT id, name, country, active FROM airlines LIMIT 15;")
results = cur.fetchall()
# print(results)

# adding some formatting:
for result in results:
    print(result[0], result[1], result[2], result[3], sep=' | ')
    
    
# # alternative syntax (using the * operator):
# for result in results:
#     print(*result, sep=' | ')

1 | Private flight | None | Y
2 | 135 Airways | United States | N
3 | 1Time Airline | South Africa | Y
4 | 2 Sqn No 1 Elementary Flying Training School | United Kingdom | N
5 | 213 Flight Unit | Russia | N
6 | 223 Flight Unit State Airline | Russia | N
7 | 224th Flight Unit | Russia | N
8 | 247 Jet Ltd | United Kingdom | N
9 | 3D Aviation | United States | N
10 | 40-Mile Air | United States | Y
11 | 4D Air | Thailand | N
12 | 611897 Alberta Limited | Canada | N
13 | Ansett Australia | Australia | Y
14 | Abacus International | Singapore | Y
15 | Abelag Aviation | Belgium | N


[back to top](#PART-1.-QUERYING-DATA)

### 3. Select Some Columns, Filter by a Single Condition
What if we only want airlines **in the United Kingdom**?
```sql
SELECT id, name, country, active 
FROM airlines 
WHERE country = 'United Kingdom' 
LIMIT 15;
```


In [38]:
cur = conn.cursor()
cur.execute("SELECT id, name, country, active FROM airlines WHERE country = 'United Kingdom' LIMIT 15;")
results = cur.fetchall()

for result in results:
    print(result[0], result[1], result[2], result[3], sep=' | ')

4 | 2 Sqn No 1 Elementary Flying Training School | United Kingdom | N
8 | 247 Jet Ltd | United Kingdom | N
16 | Army Air Corps | United Kingdom | N
52 | Avcard Services | United Kingdom | N
59 | Air Charter Service | United Kingdom | N
77 | Aero Dynamics | United Kingdom | N
105 | Air Atlantique | United Kingdom | N
112 | Astraeus | United Kingdom | Y
138 | Air Partner | United Kingdom | N
143 | Air Data | United Kingdom | N
158 | Airfreight Express | United Kingdom | N
232 | A J Services | United Kingdom | N
269 | Albion Aviation | United Kingdom | N
302 | Alan Mann Helicopters Ltd. | United Kingdom | N
311 | Aeromedicare Ltd. | United Kingdom | N
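
The country is hard-coded in the query above. When the filter value comes from a variable or user input, it is safer to pass it as a parameter with a `?` placeholder than to build the SQL string yourself. A minimal sketch with made-up rows in an in-memory table:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE airlines (id TEXT, name TEXT, country TEXT, active TEXT)")
demo.executemany(
    "INSERT INTO airlines VALUES (?, ?, ?, ?)",
    [
        ("1", "Example Air", "United Kingdom", "Y"),
        ("2", "Sample Jet", "Canada", "N"),
        ("3", "Demo Wings", "United Kingdom", "N"),
    ],
)

country = "United Kingdom"  # could come from user input
# The second argument is a tuple holding one value per ? placeholder
uk_rows = demo.execute(
    "SELECT id, name, country, active FROM airlines WHERE country = ? LIMIT 15;",
    (country,),
).fetchall()

for row in uk_rows:
    print(*row, sep=" | ")
demo.close()
```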


[back to top](#PART-1.-QUERYING-DATA)

### 4. Select Some Columns, Filter by Two Condition (AND)
What if we only want airlines **in the United Kingdom that are currently active**?
```sql
SELECT id, name, country, active 
FROM airlines 
WHERE country = 'United Kingdom' and active = 'Y' 
LIMIT 15;
```



In [39]:
cur = conn.cursor()
cur.execute('''
SELECT id, name, country, active 
FROM airlines 
WHERE country = 'United Kingdom' and active = 'Y' 
LIMIT 15;
''')
results = cur.fetchall()

for result in results:
    print(result[0], result[1], result[2], result[3], sep=' | ')

112 | Astraeus | United Kingdom | Y
492 | Air Southwest | United Kingdom | Y
508 | Aurigny Air Services | United Kingdom | Y
565 | Air Wales | United Kingdom | Y
665 | AD Aviation | United Kingdom | Y
690 | Air Foyle | United Kingdom | Y
1355 | British Airways | United Kingdom | Y
1411 | British International Helicopters | United Kingdom | Y
1437 | bmi | United Kingdom | Y
1441 | bmibaby | United Kingdom | Y
1445 | British Midland Regional | United Kingdom | Y
1543 | British Mediterranean Airways | United Kingdom | Y
1795 | BA CityFlyer | United Kingdom | Y
1923 | Crest Aviation | United Kingdom | Y
2117 | Eastern Airways | United Kingdom | Y


[back to top](#PART-1.-QUERYING-DATA)

### 5. Select Some Columns, Filter by Two Condition (OR)
What if we only want airlines **in Colombia or Brazil**?
```sql
SELECT id, name, country, active 
FROM airlines 
WHERE country = 'Colombia' or country = 'Brazil' 
LIMIT 15;
```



In [40]:
cur = conn.cursor()
cur.execute('''
SELECT id, name, country, active 
FROM airlines 
WHERE country = 'Colombia' or country = 'Brazil' 
LIMIT 15;
''')
results = cur.fetchall()

for result in results:
    print(result[0], result[1], result[2], result[3], sep=' | ')

42 | ABSA - Aerolinhas Brasileiras | Brazil | Y
43 | Abaet | Brazil | N
45 | APSA Colombia | Colombia | N
46 | Aerovias Bueno | Colombia | N
51 | ATA Brasil | Brazil | N
98 | Aeroexpreso Interamericano | Colombia | N
110 | ACES Colombia | Colombia | Y
226 | Airvias S/A Linhas Aereas | Brazil | N
245 | Aeroejecutivos Colombia | Colombia | N
258 | Arca Aerovias Colombianas Ltda. | Colombia | N
270 | Aeroalas Colombia | Colombia | N
297 | Aerolineas Medellin | Colombia | N
301 | Air Minas Linhas A | Brazil | N
339 | Aerol | Colombia | N
359 | Aeroatlantico Colombia | Colombia | N
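
When several OR conditions test the same column, the `IN` operator is a more compact way to write the same filter. The sketch below checks that both forms return the same rows, using a small in-memory table with made-up airlines.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE airlines (id TEXT, name TEXT, country TEXT, active TEXT)")
demo.executemany(
    "INSERT INTO airlines VALUES (?, ?, ?, ?)",
    [
        ("1", "Airline A", "Brazil", "Y"),
        ("2", "Airline B", "Colombia", "N"),
        ("3", "Airline C", "Canada", "Y"),
    ],
)

rows_or = demo.execute('''
    SELECT id, name, country, active
    FROM airlines
    WHERE country = 'Colombia' or country = 'Brazil';
''').fetchall()

rows_in = demo.execute('''
    SELECT id, name, country, active
    FROM airlines
    WHERE country IN ('Colombia', 'Brazil');
''').fetchall()

print(sorted(rows_or) == sorted(rows_in))
demo.close()
```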


[back to top](#PART-1.-QUERYING-DATA)

### 6. Getting Unique Rows
What if we want to know which countries were represented in the airlines data table?

```sql
SELECT DISTINCT country 
FROM airlines 
LIMIT 25;
```

In [41]:
cur = conn.cursor()
cur.execute("SELECT DISTINCT country FROM airlines LIMIT 25;")
results = cur.fetchall()

# each row is a one-element tuple, so print its only value:
for result in results:
    print(result[0])

None
United States
South Africa
United Kingdom
Russia
Thailand
Canada
Australia
Singapore
Belgium
Mexico
Spain
France
United Arab Emirates
Republic of Korea
Pakistan
Libya
Gambia
Ivory Coast
Ukraine
Democratic Republic of the Congo
Iran
Finland
Brazil
Colombia
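
The first value in the output above is `None`: DISTINCT treats a missing country (NULL) as a value of its own. The sketch below shows two related variations on made-up data: excluding NULL with `IS NOT NULL`, and counting the distinct countries with `count(DISTINCT ...)`, which skips NULL.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE airlines (name TEXT, country TEXT)")
demo.executemany(
    "INSERT INTO airlines VALUES (?, ?)",
    [
        ("Airline A", None),       # country unknown (NULL)
        ("Airline B", "Canada"),
        ("Airline C", "Canada"),
        ("Airline D", "Brazil"),
    ],
)

# Distinct countries, leaving out rows where the country is missing
known_countries = demo.execute('''
    SELECT DISTINCT country
    FROM airlines
    WHERE country IS NOT NULL
    ORDER BY country;
''').fetchall()

# Number of distinct non-NULL countries
country_total = demo.execute(
    "SELECT count(DISTINCT country) FROM airlines;"
).fetchone()[0]

print(known_countries, country_total)
demo.close()
```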


[back to top](#PART-1.-QUERYING-DATA)

### 7. Grouping
What if we want to know how many airlines each country has, listing the 25 countries with the most airlines?

```sql
SELECT country, count(country) as airline_count
FROM airlines
GROUP BY country
ORDER BY airline_count desc
LIMIT 25;
```


In [42]:
# Note: you can break the line (if you surround a string with triple quotes) for readability:
cur = conn.cursor()
cur.execute('''
   SELECT country, count(country) as airline_count 
   FROM airlines 
   GROUP BY country 
   ORDER BY airline_count desc 
   LIMIT 25;
''')
results = cur.fetchall()

# print each country and its airline count:
for result in results:
    print(result[0], result[1])

United States 1080
Mexico 439
United Kingdom 407
Canada 318
Russia 230
Spain 166
Germany 131
France 119
Australia 93
South Africa 91
Italy 90
Ukraine 89
Nigeria 85
Kazakhstan 79
China 70
Sweden 70
Switzerland 60
Brazil 58
Netherlands 52
Austria 50
Sudan 49
Egypt 48
Indonesia 48
Thailand 48
Portugal 45
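
A detail worth knowing about the query above: `count(country)` counts only non-NULL values, so airlines with no country end up in a group whose count is 0, whereas `count(*)` counts every row in the group. A minimal sketch on made-up data:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE airlines (name TEXT, country TEXT)")
demo.executemany(
    "INSERT INTO airlines VALUES (?, ?)",
    [
        ("Airline A", None),
        ("Airline B", None),
        ("Airline C", "Canada"),
    ],
)

counts = demo.execute('''
    SELECT country,
           count(country) AS non_null_count,  -- ignores NULL countries
           count(*)       AS row_count        -- counts every row in the group
    FROM airlines
    GROUP BY country
    ORDER BY country;                         -- SQLite sorts NULL first
''').fetchall()

for row in counts:
    print(row)
demo.close()
```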


[back to top](#PART-1.-QUERYING-DATA)

### 8. Grouping and Then Filtering
What if we want to count only active airlines, and keep only the countries that have more than 10 of them? WHERE filters individual rows before grouping; HAVING filters the groups afterwards.

```sql
SELECT country, count(country) as airline_count
FROM airlines
WHERE active = 'Y'
GROUP BY country
HAVING count(country) > 10
ORDER BY airline_count desc;
```


In [43]:
# Note: you can break the line (if you surround a string with triple quotes) for readability:
cur = conn.cursor()
cur.execute('''
   SELECT country, count(country) as airline_count 
   FROM airlines 
   WHERE active ='Y'
   GROUP BY country 
   HAVING count(country) > 10
   ORDER BY airline_count desc;
''')
results = cur.fetchall()

# print each country and its count of active airlines:
for result in results:
    print(result[0], result[1])

United States 141
Russia 72
United Kingdom 40
Germany 37
Canada 34
Australia 26
China 25
Spain 24
Brazil 23
France 22
Japan 19
Italy 18
India 17
Indonesia 17
Thailand 16
Turkey 16
Sweden 15
Switzerland 14
Portugal 13
Ukraine 13
Austria 12
Egypt 12
Finland 12
Mexico 12
Peru 11


In [44]:
conn.close()
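
If you would rather not remember to call `close()` yourself, one option is `contextlib.closing`, which closes the connection when the `with` block ends. Note that using a connection directly in a `with` statement manages transactions and does not close it. A minimal sketch with an in-memory database and made-up data:

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as demo:
    demo.execute("CREATE TABLE airlines (name TEXT)")
    demo.execute("INSERT INTO airlines VALUES ('Example Air')")
    row_count = demo.execute("SELECT count(*) FROM airlines;").fetchone()[0]

# The connection is closed at this point; using it again raises ProgrammingError
try:
    demo.execute("SELECT 1;")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(row_count, still_open)
```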