# Pulling Data with SELECT

In this section, we are going to learn the most common SQL command: `SELECT`. It retrieves data from one or more tables and can also transform that data before it is returned. It is a read-only operation, so it does not change the underlying tables.

## Setup 
First, let's get set up. Download the SQLite database file `company_operations.db` and connect to it. We will also bring in `pandas` to display our SQL query results as a `DataFrame`.

In [2]:
import sqlite3
import pandas as pd
import urllib.request

# download SQLite database and connect to it 
urllib.request.urlretrieve("https://github.com/thomasnield/anaconda_intro_to_sql/blob/main/company_operations.db?raw=true", "company_operations.db")
conn = sqlite3.connect('company_operations.db')

## Selecting Columns 

Let's first select all columns from the `CUSTOMER` table. 

In [3]:
sql = "SELECT * FROM CUSTOMER"

pd.read_sql(sql, conn)

Unnamed: 0,CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,STATE,ZIP,CATEGORY
0,1,Alpha Medical,18745 Train Dr,Dallas,TX,75021,INDUSTRIAL
1,2,Oak Cliff Base,2379 Cliff Ave,Abbevile,LA,70510,GOVERNMENT
2,3,Sports Unlimited,1605 Station Dr,Alexandrai,LA,71301,COMMERCIAL
3,4,Riley Sporting Goods,9854 Firefly Blvd,Austin,TX,78701,COMMERCIAL
4,5,Lite Industrial,462 Roadrunner Blvd,Houston,TX,77254,INDUSTRIAL
5,6,Prairie Sports Center,689 Stadium Way,Tulsa,OK,74101,COMMERCIAL
6,7,Facility 95,2396 Runway Dr,Oklahoma City,OK,73101,GOVERNMENT
7,8,Allen Stadium,573 HIllcrest Rd,Allen,TX,75002,COMMERCIAL
8,9,Dent Research,392 45th St,Waco,TX,76700,INDUSTRIAL
9,10,Gamma Solutions,2752 27th St,Phoenix,AZ,85001,COMMERCIAL


Note that the asterisk `*` means "select all columns," and the `FROM` keyword is followed by the table you are selecting the columns from, which is `CUSTOMER`. We can see that there are 10 customers in this table.
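
To see what `SELECT *` hands back without `pandas`, here is a minimal sketch using only Python's built-in `sqlite3` module. It assumes a small in-memory stand-in for the `CUSTOMER` table (three made-up rows and only some of the columns), not the real `company_operations.db` file, and the name `demo` is just for illustration.

```python
import sqlite3

# Hypothetical stand-in for CUSTOMER so the sketch runs without the download
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER, CUSTOMER_NAME TEXT, STATE TEXT)")
demo.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [(1, "Alpha Medical", "TX"), (2, "Oak Cliff Base", "LA"), (3, "Sports Unlimited", "LA")],
)

cursor = demo.execute("SELECT * FROM CUSTOMER")
columns = [d[0] for d in cursor.description]  # column names, in table order
rows = cursor.fetchall()                      # every row, as a list of tuples

print(columns)
print(len(rows), "rows")
```

The asterisk brings back every column the table defines, and with no filter every row comes back too.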

If you want to limit your query to just the first 5 results, add `LIMIT 5` so it stops returning data after 5 records. This is helpful when there are a lot of records and you just want a sample to see what the data looks like.

In [4]:
sql = "SELECT * FROM CUSTOMER LIMIT 5"

pd.read_sql(sql, conn)

Unnamed: 0,CUSTOMER_ID,CUSTOMER_NAME,ADDRESS,CITY,STATE,ZIP,CATEGORY
0,1,Alpha Medical,18745 Train Dr,Dallas,TX,75021,INDUSTRIAL
1,2,Oak Cliff Base,2379 Cliff Ave,Abbevile,LA,70510,GOVERNMENT
2,3,Sports Unlimited,1605 Station Dr,Alexandrai,LA,71301,COMMERCIAL
3,4,Riley Sporting Goods,9854 Firefly Blvd,Austin,TX,78701,COMMERCIAL
4,5,Lite Industrial,462 Roadrunner Blvd,Houston,TX,77254,INDUSTRIAL


Note you can also select specific columns separated by commas. This is helpful to only grab columns you are interested in as well as reduce the amount of data that has to be retrieved. Below we only retrieve the `CUSTOMER_NAME` and `ADDRESS` columns. 

In [5]:
sql = "SELECT CUSTOMER_NAME, ADDRESS FROM CUSTOMER"

pd.read_sql(sql, conn)

Unnamed: 0,CUSTOMER_NAME,ADDRESS
0,Alpha Medical,18745 Train Dr
1,Oak Cliff Base,2379 Cliff Ave
2,Sports Unlimited,1605 Station Dr
3,Riley Sporting Goods,9854 Firefly Blvd
4,Lite Industrial,462 Roadrunner Blvd
5,Prairie Sports Center,689 Stadium Way
6,Facility 95,2396 Runway Dr
7,Allen Stadium,573 HIllcrest Rd
8,Dent Research,392 45th St
9,Gamma Solutions,2752 27th St
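
Column selection and `LIMIT` can be combined in one query. The sketch below assumes a made-up, in-memory version of `CUSTOMER` with three rows (not the real database file) and shows that the result follows the column order written after `SELECT`.

```python
import sqlite3

# Hypothetical stand-in for CUSTOMER (made-up subset of rows and columns)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER, CUSTOMER_NAME TEXT, ADDRESS TEXT)")
demo.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [
        (1, "Alpha Medical", "18745 Train Dr"),
        (2, "Oak Cliff Base", "2379 Cliff Ave"),
        (3, "Sports Unlimited", "1605 Station Dr"),
    ],
)

# Columns come back in the order listed after SELECT, not in table order
cursor = demo.execute("SELECT ADDRESS, CUSTOMER_NAME FROM CUSTOMER LIMIT 2")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

Only two of the three rows are returned, and `CUSTOMER_ID` is never retrieved at all.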


If you want to see what tables are available in a database, you can ask for documentation from the database administrator or use a graphical user interface tool which displays the tables. In a Python environment, you will need a SQL command for your database platform that lists all the tables. 

In SQLite, there is a hidden administrative table called `sqlite_master` that lets you list all the objects in a database. We will learn more about the `WHERE` keyword later, but note that here it filters the results to only `table` objects.

In [6]:
sql = "SELECT NAME FROM sqlite_master WHERE type='table'"

pd.read_sql(sql, conn)

Unnamed: 0,name
0,CALENDAR
1,CUSTOMER
2,EMPLOYEE
3,PRODUCT
4,CUSTOMER_ORDER
5,EMPLOYEE_AIR_TRAVEL
6,WEATHER_MONITOR
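
To see why the `type='table'` filter matters, here is a small sketch against an in-memory database. The two tables and the index `IDX_PRODUCT` are hypothetical and created only for this illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER)")
demo.execute("CREATE TABLE PRODUCT (PRODUCT_ID INTEGER)")
demo.execute("CREATE INDEX IDX_PRODUCT ON PRODUCT (PRODUCT_ID)")  # an object that is not a table

# Without the filter, sqlite_master lists every object, including the index
all_objects = demo.execute("SELECT type, name FROM sqlite_master").fetchall()

# With the filter, only tables remain
tables = [name for (name,) in demo.execute("SELECT name FROM sqlite_master WHERE type='table'")]

print(all_objects)
print(tables)
```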


## Expressions and Functions

Let's take a look at the `PRODUCT` table. 

In [7]:
sql = "SELECT * FROM PRODUCT"

pd.read_sql(sql, conn)

Unnamed: 0,PRODUCT_ID,PRODUCT_NAME,PRODUCT_GROUP,PRICE
0,1,Eagle Kit,ALPHA,120
1,2,Hawkeye Cam,ALPHA,80
2,3,Sparrow Blade,BETA,40
3,4,Raven Klaw,BETA,40
4,5,Kriket Light,GAMMA,25
5,6,Owl NV,ALPHA,100
6,7,Vulture X,BETA,56
7,8,Roadrunner Pro,ALPHA,70
8,9,Falcon Tracker,GAMMA,20
9,10,Emu Handheld,GAMMA,35


Let's say we want to drop each price by 10%. We can multiply each price by `0.9` by creating a new field as an expression. We will call it `REDUCED_PRICE`. This does not modify the table, but rather transforms the data before it is returned. It is calculating that `REDUCED_PRICE` only within this query, much like a formula in Excel. This is what's great about SQL. It allows the stored data to be simple and minimal, but we can layer calculations and manipulations on top of it within a query. 

In [8]:
sql = """
SELECT PRODUCT_NAME,
PRICE,
PRICE * 0.9 AS REDUCED_PRICE

FROM PRODUCT
"""

pd.read_sql(sql, conn)

Unnamed: 0,PRODUCT_NAME,PRICE,REDUCED_PRICE
0,Eagle Kit,120,108.0
1,Hawkeye Cam,80,72.0
2,Sparrow Blade,40,36.0
3,Raven Klaw,40,36.0
4,Kriket Light,25,22.5
5,Owl NV,100,90.0
6,Vulture X,56,50.4
7,Roadrunner Pro,70,63.0
8,Falcon Tracker,20,18.0
9,Emu Handheld,35,31.5
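
As a quick check that an expression like `PRICE * 0.9` leaves the stored data alone, the sketch below runs the calculation and then reads the table again. It assumes a two-row, in-memory copy of `PRODUCT` built just for this example.

```python
import sqlite3

# Hypothetical two-row stand-in for PRODUCT
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE PRODUCT (PRODUCT_NAME TEXT, PRICE INTEGER)")
demo.executemany("INSERT INTO PRODUCT VALUES (?, ?)", [("Eagle Kit", 120), ("Hawkeye Cam", 80)])

# The expression is computed only for this result set
reduced = demo.execute(
    "SELECT PRODUCT_NAME, PRICE * 0.9 AS REDUCED_PRICE FROM PRODUCT"
).fetchall()

# Reading the table again shows the original prices are untouched
stored = demo.execute("SELECT PRODUCT_NAME, PRICE FROM PRODUCT").fetchall()

print(reduced)
print(stored)
```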


Note how I can write my SQL query across multiple lines for legibility, and I leveraged the triple double-quote syntax in Python `"""` to take advantage of this. 

The mathematical operators you can expect on most SQL platforms are as follows:

| Symbol | Operation |
|--------|-----------|
| `+` | Adds two numbers |
| `-` | Subtracts two numbers |
| `*` | Multiplies two numbers |
| `/` | Divides two numbers |
| `%` | Divides, but returns the remainder |
Note that these mathematical operators only work between numeric values or fields. The same symbols may mean something else in other contexts: `*` can mean "select all columns," but between two numbers it is multiplication.
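
The operators can be tried directly in SQLite without any table, as in the sketch below (the column aliases are made up for readability). One behavior worth knowing: in SQLite, dividing one integer by another gives an integer result, so at least one operand needs a decimal point to get a fractional answer. Other platforms may handle integer division differently.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

row = demo.execute("""
SELECT 7 + 2   AS ADDED,
       7 - 2   AS SUBTRACTED,
       7 * 2   AS MULTIPLIED,
       7 / 2   AS INT_DIVIDED,
       7 / 2.0 AS DIVIDED,
       7 % 2   AS REMAINDER
""").fetchone()

print(row)  # (9, 5, 14, 3, 3.5, 1)
```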

Now let's say we want to calculate a `PROCESS_FEE` for each product, which is the `PRICE` multiplied by `.00047`.

In [9]:
sql = """
SELECT PRODUCT_NAME,
PRICE,
PRICE * .00047 AS PROCESS_FEE

FROM PRODUCT
"""

pd.read_sql(sql, conn)

Unnamed: 0,PRODUCT_NAME,PRICE,PROCESS_FEE
0,Eagle Kit,120,0.0564
1,Hawkeye Cam,80,0.0376
2,Sparrow Blade,40,0.0188
3,Raven Klaw,40,0.0188
4,Kriket Light,25,0.01175
5,Owl NV,100,0.047
6,Vulture X,56,0.02632
7,Roadrunner Pro,70,0.0329
8,Falcon Tracker,20,0.0094
9,Emu Handheld,35,0.01645


If we want to round these values to two decimal places, we have to use a function. SQL functions are much like functions in Python: they have a name, take arguments inside parentheses, and return a result. Here is the `ROUND()` function rounding the `PROCESS_FEE` expression to two decimal places.

In [10]:
sql = """
SELECT PRODUCT_NAME,
PRICE,
ROUND(PRICE * .00047, 2) AS PROCESS_FEE

FROM PRODUCT
"""

pd.read_sql(sql, conn)

Unnamed: 0,PRODUCT_NAME,PRICE,PROCESS_FEE
0,Eagle Kit,120,0.06
1,Hawkeye Cam,80,0.04
2,Sparrow Blade,40,0.02
3,Raven Klaw,40,0.02
4,Kriket Light,25,0.01
5,Owl NV,100,0.05
6,Vulture X,56,0.03
7,Roadrunner Pro,70,0.03
8,Falcon Tracker,20,0.01
9,Emu Handheld,35,0.02
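
The second argument to `ROUND()` controls how many decimal places are kept. The sketch below tries a few variations in SQLite on literal values rather than the `PRODUCT` table. In SQLite, leaving the second argument out rounds to zero decimal places.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

two_places, three_places, no_places = demo.execute(
    "SELECT ROUND(0.0564, 2), ROUND(0.0564, 3), ROUND(56.4)"
).fetchone()

print(two_places, three_places, no_places)
```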


When you are working with text, the `||` operator can be used to concatenate text together (although some database platforms use a `CONCAT()` function instead). If we wanted to merge several fields in the `CUSTOMER` table to create a `SHIP_ADDRESS`, we can do so like this. Note how spaces `' '` and commas `', '` are inserted between the fields.

In [11]:
sql = """
SELECT CUSTOMER_NAME,
ADDRESS || ' ' || CITY || ', ' || STATE || ' ' || ZIP AS SHIP_ADDRESS
FROM CUSTOMER
"""

pd.read_sql(sql, conn)

Unnamed: 0,CUSTOMER_NAME,SHIP_ADDRESS
0,Alpha Medical,"18745 Train Dr Dallas, TX 75021"
1,Oak Cliff Base,"2379 Cliff Ave Abbevile, LA 70510"
2,Sports Unlimited,"1605 Station Dr Alexandrai, LA 71301"
3,Riley Sporting Goods,"9854 Firefly Blvd Austin, TX 78701"
4,Lite Industrial,"462 Roadrunner Blvd Houston, TX 77254"
5,Prairie Sports Center,"689 Stadium Way Tulsa, OK 74101"
6,Facility 95,"2396 Runway Dr Oklahoma City, OK 73101"
7,Allen Stadium,"573 HIllcrest Rd Allen, TX 75002"
8,Dent Research,"392 45th St Waco, TX 76700"
9,Gamma Solutions,"2752 27th St Phoenix, AZ 85001"
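
Unlike the mathematical operators, `||` in SQLite also accepts numeric values and converts them to text. The sketch below assumes `ZIP` is stored as an integer, which the real table may or may not do, and uses a one-row, in-memory table made up for the example.

```python
import sqlite3

# Hypothetical one-row table; ZIP is declared INTEGER for illustration only
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE CUSTOMER (CITY TEXT, STATE TEXT, ZIP INTEGER)")
demo.execute("INSERT INTO CUSTOMER VALUES ('Dallas', 'TX', 75021)")

location = demo.execute(
    "SELECT CITY || ', ' || STATE || ' ' || ZIP AS LOCATION FROM CUSTOMER"
).fetchone()[0]

print(location)  # Dallas, TX 75021
```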


## Commenting Code and Syntax Rules

You can comment code out in SQL using a double dash `--` or multiline syntax `/* */`. These will be ignored by the SQL engine and can be a helpful way to provide context and explanations to your SQL code. 

```sql
-- this is a comment

/*
This is a
multiline comment
*/
```

On most platforms, including SQLite, SQL is not case sensitive, so keywords, fields, and table names can be written in uppercase or lowercase regardless of how they are named in storage. You will often see queries end with a semicolon `;`, but this is only necessary when running multiple SQL commands at once. Usually, running multiple SQL commands happens when writing data, not selecting data.
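
Here is a minimal sketch of these rules in SQLite, using a made-up one-row `CUSTOMER` table held in memory. The second query is written in lowercase, carries both comment styles, and ends with a semicolon, yet it returns the same result as the first.

```python
import sqlite3

# Hypothetical one-row table for illustration
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE CUSTOMER (CUSTOMER_NAME TEXT)")
demo.execute("INSERT INTO CUSTOMER VALUES ('Alpha Medical')")

upper = demo.execute("SELECT CUSTOMER_NAME FROM CUSTOMER").fetchall()

lower = demo.execute("""
select customer_name  -- lowercase keywords and names work too
/* block comments are ignored as well */
from customer;
""").fetchall()

print(upper == lower)  # True
```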

## Exercise

Complete the SQL query below by replacing the question marks `?`. Retrieve all records from the `CUSTOMER` table, but grab only the `CUSTOMER_NAME` and `CATEGORY` fields. Also concatenate the `CITY` and `STATE` with a comma in between and name that expression `LOCATION`.

In [12]:
sql = "SELECT CUSTOMER_NAME, CATEGORY, CITY ||', '|| STATE AS LOCATION FROM CUSTOMER"

pd.read_sql(sql, conn)

Unnamed: 0,CUSTOMER_NAME,CATEGORY,LOCATION
0,Alpha Medical,INDUSTRIAL,"Dallas, TX"
1,Oak Cliff Base,GOVERNMENT,"Abbevile, LA"
2,Sports Unlimited,COMMERCIAL,"Alexandrai, LA"
3,Riley Sporting Goods,COMMERCIAL,"Austin, TX"
4,Lite Industrial,INDUSTRIAL,"Houston, TX"
5,Prairie Sports Center,COMMERCIAL,"Tulsa, OK"
6,Facility 95,GOVERNMENT,"Oklahoma City, OK"
7,Allen Stadium,COMMERCIAL,"Allen, TX"
8,Dent Research,INDUSTRIAL,"Waco, TX"
9,Gamma Solutions,COMMERCIAL,"Phoenix, AZ"


### Answer

In [13]:
sql = "SELECT CUSTOMER_NAME, CATEGORY, CITY || ', ' || STATE AS LOCATION FROM CUSTOMER"

pd.read_sql(sql, conn)

Unnamed: 0,CUSTOMER_NAME,CATEGORY,LOCATION
0,Alpha Medical,INDUSTRIAL,"Dallas, TX"
1,Oak Cliff Base,GOVERNMENT,"Abbevile, LA"
2,Sports Unlimited,COMMERCIAL,"Alexandrai, LA"
3,Riley Sporting Goods,COMMERCIAL,"Austin, TX"
4,Lite Industrial,INDUSTRIAL,"Houston, TX"
5,Prairie Sports Center,COMMERCIAL,"Tulsa, OK"
6,Facility 95,GOVERNMENT,"Oklahoma City, OK"
7,Allen Stadium,COMMERCIAL,"Allen, TX"
8,Dent Research,INDUSTRIAL,"Waco, TX"
9,Gamma Solutions,COMMERCIAL,"Phoenix, AZ"
