# Retrieving Data with SELECT

When working with databases and SQL, the most common task is to request data from one or more tables. The SELECT statement does this and returns the data as a result table, called a result set. In most applications, SELECT is the most commonly used data query language (DQL) command. SELECT can also do far more than retrieve and display data, as the following sections show.

**SELECT Syntax**

- SELECT column1, column2, columnN FROM table_name;
Here, column1, column2... are the fields (or columns) of a table (table_name).

To retrieve every column of the table, use an asterisk instead of a column list:

- SELECT * FROM table_name;
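
To make the two forms concrete before connecting to MySQL, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table `demo_country` and its two rows are hypothetical stand-ins I made up for illustration; they are not part of the world database used below.

```python
import sqlite3

# Hypothetical in-memory table, used only to illustrate the two query forms.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_country (Code TEXT, Name TEXT, Continent TEXT)")
conn.executemany(
    "INSERT INTO demo_country VALUES (?, ?, ?)",
    [("ABW", "Aruba", "North America"), ("AFG", "Afghanistan", "Asia")],
)

# Named columns: only Name and Continent come back, in the order listed.
named = conn.execute("SELECT Name, Continent FROM demo_country").fetchall()

# Asterisk: every column of the table comes back.
everything = conn.execute("SELECT * FROM demo_country").fetchall()

print(named)
print(everything)
conn.close()
```

Each row of `named` has two fields, while each row of `everything` has all three columns of the table.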

In [1]:
# Create SQL connection
# From now on I'll use MySQL instead of SQLite
import mysql.connector as sql
import pandas as pd
import os

In [2]:
connection = sql.connect(
    host = os.environ.get('mysql_host'),
    user = os.environ.get('mysql_user'),
    password = os.environ.get('mysql_password')
)

cursor = connection.cursor()

### 1. Connecting to the world database and listing its tables

In [3]:
# List all databases available on this connection
show_db_query = "SHOW DATABASES"
with connection.cursor() as cursor:
    cursor.execute(show_db_query)
    for db in cursor:
        print(db)

('employees',)
('information_schema',)
('mysql',)
('performance_schema',)
('sakila',)
('sys',)
('test',)
('trading',)
('world',)


In [4]:
world_tables = pd.read_sql_query('SHOW TABLES FROM world', connection)
world_tables

Unnamed: 0,Tables_in_world
0,city
1,country
2,countrylanguage


We can extract the table names as a pandas Series:

In [5]:
tables = world_tables['Tables_in_world']
tables

0               city
1            country
2    countrylanguage
Name: Tables_in_world, dtype: object

Description of the tables:

In [6]:
for table_name in tables:
    output = pd.read_sql_query('DESCRIBE world.{0}'.format(table_name), connection)
    print(table_name)
    print(output)

city
         Field         Type Null  Key Default           Extra
0           ID       b'int'   NO  PRI    None  auto_increment
1         Name  b'char(35)'   NO          b''                
2  CountryCode   b'char(3)'   NO  MUL     b''                
3     District  b'char(20)'   NO          b''                
4   Population       b'int'   NO         b'0'                
country
             Field                                               Type Null  \
0             Code                                         b'char(3)'   NO   
1             Name                                        b'char(52)'   NO   
2        Continent  b"enum('Asia','Europe','North America','Africa...   NO   
3           Region                                        b'char(26)'   NO   
4      SurfaceArea                                     b'float(10,2)'   NO   
5        IndepYear                                        b'smallint'  YES   
6       Population                                             b'int'

### 2. Retrieving all data from the country table

In [12]:
pd.read_sql_query("""SELECT *
                     FROM world.country""",
                     connection)

Unnamed: 0,Code,Name,Continent,Region,SurfaceArea,IndepYear,Population,LifeExpectancy,GNP,GNPOld,LocalName,GovernmentForm,HeadOfState,Capital,Code2
0,ABW,Aruba,North America,Caribbean,193.0,,103000,78.4,828.0,793.0,Aruba,Nonmetropolitan Territory of The Netherlands,Beatrix,129.0,AW
1,AFG,Afghanistan,Asia,Southern and Central Asia,652090.0,1919.0,22720000,45.9,5976.0,,Afganistan/Afqanestan,Islamic Emirate,Mohammad Omar,1.0,AF
2,AGO,Angola,Africa,Central Africa,1246700.0,1975.0,12878000,38.3,6648.0,7984.0,Angola,Republic,José Eduardo dos Santos,56.0,AO
3,AIA,Anguilla,North America,Caribbean,96.0,,8000,76.1,63.2,,Anguilla,Dependent Territory of the UK,Elisabeth II,62.0,AI
4,ALB,Albania,Europe,Southern Europe,28748.0,1912.0,3401200,71.6,3205.0,2500.0,Shqipëria,Republic,Rexhep Mejdani,34.0,AL
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
234,YEM,Yemen,Asia,Middle East,527968.0,1918.0,18112000,59.8,6041.0,5729.0,Al-Yaman,Republic,Ali Abdallah Salih,1780.0,YE
235,YUG,Yugoslavia,Europe,Southern Europe,102173.0,1918.0,10640000,72.4,17000.0,,Jugoslavija,Federal Republic,Vojislav Koštunica,1792.0,YU
236,ZAF,South Africa,Africa,Southern Africa,1221037.0,1910.0,40377000,51.1,116729.0,129092.0,South Africa,Republic,Thabo Mbeki,716.0,ZA
237,ZMB,Zambia,Africa,Eastern Africa,752618.0,1964.0,9169000,37.2,3377.0,3922.0,Zambia,Republic,Frederick Chiluba,3162.0,ZM


### 3. Retrieving data from specific columns

In many cases, you do not need every column of a table. Instead, you can list only the columns you are interested in.

In [13]:
pd.read_sql_query("""SELECT Name, Continent, Region
                     FROM world.country""",
                     connection)

Unnamed: 0,Name,Continent,Region
0,Aruba,North America,Caribbean
1,Afghanistan,Asia,Southern and Central Asia
2,Angola,Africa,Central Africa
3,Anguilla,North America,Caribbean
4,Albania,Europe,Southern Europe
...,...,...,...
234,Yemen,Asia,Middle East
235,Yugoslavia,Europe,Southern Europe
236,South Africa,Africa,Southern Africa
237,Zambia,Africa,Eastern Africa


### 4. Performing calculations in SELECT statements

The SELECT statement can do far more than simply select columns.

Sometimes we are also interested in the relationship between columns, which can be expressed directly in a SELECT statement. For example, the query below computes the difference between the GNP and GNPOld columns using the minus operator (-) and names the result GNPDelta with AS. You can also try other operators such as +, *, / or %.

In [16]:
pd.read_sql_query("""
    SELECT Name, GNP, GNPOld, GNP - GNPOld AS GNPDelta
    FROM world.country""",
    connection)

Unnamed: 0,Name,GNP,GNPOld,GNPDelta
0,Aruba,828.0,793.0,35.0
1,Afghanistan,5976.0,,
2,Angola,6648.0,7984.0,-1336.0
3,Anguilla,63.2,,
4,Albania,3205.0,2500.0,705.0
...,...,...,...,...
234,Yemen,6041.0,5729.0,312.0
235,Yugoslavia,17000.0,,
236,South Africa,116729.0,129092.0,-12363.0
237,Zambia,3377.0,3922.0,-545.0
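
Notice that GNPDelta is empty for Afghanistan, Anguilla and Yugoslavia. Their GNPOld value is NULL, and in SQL any arithmetic involving NULL yields NULL. The sketch below reproduces this with Python's built-in `sqlite3` module instead of MySQL. The table `demo_gnp` is a hypothetical stand-in that I filled with two rows copied from the output above, and `GNPRatio` is an extra expression I added only to show a second operator.

```python
import sqlite3

# Hypothetical in-memory table; the two rows mirror the output shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_gnp (Name TEXT, GNP REAL, GNPOld REAL)")
conn.executemany(
    "INSERT INTO demo_gnp VALUES (?, ?, ?)",
    [("Aruba", 828.0, 793.0), ("Afghanistan", 5976.0, None)],  # None is stored as NULL
)

rows = conn.execute(
    """
    SELECT Name,
           GNP - GNPOld AS GNPDelta,
           GNP / GNPOld AS GNPRatio
    FROM demo_gnp
    """
).fetchall()
conn.close()

# Map each country name to its computed difference.
delta_by_name = {name: delta for name, delta, _ratio in rows}
print(delta_by_name)
```

`sqlite3` returns a NULL result as Python `None`, whereas pandas displays it as NaN in a numeric column, which is why the cells look empty in the table above.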


Aliases are not limited to naming expressions. They can also rename an existing column within the query.

However, keep in mind that an alias does not affect the stored data or change the column name in the table. It only changes how the column is labeled in the result set.
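
As a small check of this behavior, the sketch below renames two columns with AS and then inspects the table definition. It uses Python's built-in `sqlite3` module rather than MySQL. The table `demo_country` and the alias names `CountryName` and `Inhabitants` are hypothetical, and `PRAGMA table_info` is SQLite-specific (in MySQL, `DESCRIBE` plays a similar role).

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_country (Name TEXT, Population INTEGER)")
conn.execute("INSERT INTO demo_country VALUES ('Aruba', 103000)")

# Rename both columns in the result set with AS.
cursor = conn.execute(
    "SELECT Name AS CountryName, Population AS Inhabitants FROM demo_country"
)
# cursor.description holds one tuple per result column; the name is the first field.
result_columns = [col[0] for col in cursor.description]

# Inspect the table itself; field 1 of each table_info row is the column name.
table_columns = [row[1] for row in conn.execute("PRAGMA table_info(demo_country)")]
conn.close()

print(result_columns)
print(table_columns)
```

The result set carries the alias names, while the table still reports its original column names.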

# References
- [Chonghua Yin notebook](https://github.com/royalosyin/Practice-SQL-with-SQLite-and-Jupyter-Notebook/blob/master/ex03-Retrieving%20Data%20with%20SELECT.ipynb)