# BEGINNER TUTORIAL

Welcome to the first part of the SQL tutorial! We are happy to see you here :) In this notebook, you will learn the basic syntax of a SQL query, and begin to navigate a database to get the information you need. Once you've finished, you will be able to specify what kind of data you want from any database you have on hand. As with all unknown things, this is going to be an adventure - but no worries, since you already worked up the courage to be here, you'll be just fine :)


To start things off, run the first command down below by pressing **shift + enter**.  

In [1]:
-- connection: postgresql://localhost:5432/northwind

For our exercises, we will be using the Northwind Database. This database is about a company named "Northwind Traders" and captures all the sales transactions that occur between the company and the customers, as well as the purchase transactions between Northwind and its suppliers.

The diagram below shows the table structure of the Northwind database.

![](img/northwind_schema.png)

There are additional tables, but we will only be using the ones shown above in this tutorial.

Most of the actions you need to perform on a database are done with SQL statements. The general syntax of a SQL query takes the form below (DISTINCT and every clause after FROM are optional):

SELECT DISTINCT < column expression list >
FROM < relation >
WHERE < predicate >
GROUP BY < column list > 
HAVING < predicate > 
ORDER BY < column list > 
LIMIT < number > 
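
To make the template concrete, here is a rough sketch of one query that uses most of the clauses in the order shown. It assumes the Northwind connection from the first cell; the alias `customer_count` and the threshold of 5 are made up for illustration, and GROUP BY and HAVING have not been introduced yet at this point, so treat it as a preview rather than something you need to understand fully.

```sql
-- Count customers per country (skipping Mexico), keep only countries
-- with at least 5 customers, and show the 3 largest groups.
SELECT Country, COUNT(*) AS customer_count   -- column expression list
FROM Customers                               -- relation
WHERE Country <> 'Mexico'                    -- predicate applied to each row
GROUP BY Country                             -- column list to group on
HAVING COUNT(*) >= 5                         -- predicate applied to each group
ORDER BY customer_count DESC                 -- column list to sort by
LIMIT 3;                                     -- maximum number of rows returned
```

The clauses have to appear in this order, even when some of them are left out.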

SQL keywords are NOT case sensitive; select is the same as SELECT. However, it is common practice to write SQL keywords in capital letters, which also helps to visually structure your query for others to read. We will write all SQL keywords in uppercase.

A semicolon is the standard way to separate each SQL statement in database systems that allow more than one SQL statement to be executed in the same call to the server. In this tutorial, we will use a semicolon at the end of each SQL statement.


### Select Statements

The SELECT statement is used to select data from a database. The data returned is stored in a result table, called the result-set.

Try out the following SQL statements to see which columns each one selects from the "Customers" table. The first lists specific columns by name; the second uses * to select every column.

In [None]:
SELECT CompanyName, Address FROM Customers;

In [None]:
SELECT * FROM Customers;

Inside a table, a column often contains many duplicate values, and sometimes you only want to list the different (distinct) values. The SELECT DISTINCT statement is used to return only unique values.

What is the difference between the two statements below?

In [None]:
SELECT Country FROM Customers;

In [None]:
SELECT DISTINCT Country FROM Customers;
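
When more than one column is selected, DISTINCT applies to the whole row rather than to each column separately. A small sketch, assuming the same Customers table and its City column:

```sql
-- Each (Country, City) combination is returned once.
-- A country can still appear on several rows, one per distinct city.
SELECT DISTINCT Country, City
FROM Customers;
```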

### Where Clause

The WHERE clause is used to filter records: it extracts only those records that fulfill a specified condition.

In [None]:
SELECT * FROM Customers 
WHERE Country = 'Mexico';

The following operators can be used in the WHERE clause:

| Operator | Description |
| --- | --- |
| = | Equal |
| <> | Not equal |
| > | Greater than |
| < | Less than |
| >= | Greater than or equal |
| <= | Less than or equal |
| BETWEEN | Between an inclusive range |
| LIKE | Search for a pattern |
| IN | To specify multiple possible values for a column |
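
The last three operators in the table are easier to understand with an example. The queries below are a sketch against the Northwind tables used in this notebook; the specific values (10, 20, 'A%', and the country names) are arbitrary choices for illustration.

```sql
-- BETWEEN: unit prices from 10 to 20, with both endpoints included.
SELECT * FROM Order_Details
WHERE UnitPrice BETWEEN 10 AND 20;

-- LIKE: % matches any run of characters (including none),
-- so this finds company names that start with 'A'.
SELECT CompanyName FROM Customers
WHERE CompanyName LIKE 'A%';

-- IN: customers located in any of the listed countries.
SELECT CompanyName, Country FROM Customers
WHERE Country IN ('Mexico', 'Italy');
```

Note that text values go inside single quotes while numbers are written without quotes, and that LIKE is case sensitive in PostgreSQL.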


### And/Not/Or Operators
The WHERE clause can be combined with AND, OR, and NOT operators.

The AND and OR operators are used to filter records based on more than one condition, and the NOT operator negates a condition:

- The AND operator displays a record if all the conditions separated by AND are TRUE.
- The OR operator displays a record if any of the conditions separated by OR is TRUE.
- The NOT operator displays a record if the condition is NOT TRUE.

In [None]:
-- Here AND is part of the BETWEEN ... AND ... syntax, not the logical AND operator.
SELECT * FROM Customers
WHERE Postalcode BETWEEN '05021' AND '05030';

In [None]:
SELECT companyname, contactname, country
FROM Customers
WHERE NOT country = 'Mexico';
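
The BETWEEN query above uses AND only as part of the BETWEEN syntax, so here is a sketch that combines conditions with the logical AND and OR operators. It assumes the ContactTitle column of the Customers table; the value 'Owner' is just an illustrative filter.

```sql
-- AND is evaluated before OR, so the parentheses are needed to say
-- "(Mexico or Italy) and the contact is an owner".
SELECT CompanyName, ContactTitle, Country
FROM Customers
WHERE (Country = 'Mexico' OR Country = 'Italy')
  AND ContactTitle = 'Owner';
```

Without the parentheses, the condition would be read as Country = 'Mexico' OR (Country = 'Italy' AND ContactTitle = 'Owner'), which returns every Mexican customer regardless of contact title.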

### Order By Keyword

The ORDER BY keyword is used to sort the result-set in ascending or descending order.

The ORDER BY keyword sorts the records in ascending order by default. To sort the records in descending order, use the DESC keyword.

In [None]:
SELECT DISTINCT * FROM Customers
ORDER BY Country ASC;

In [None]:
SELECT * FROM Customers
ORDER BY City ASC, Country DESC;
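
When ORDER BY lists several columns, rows are sorted by the first column, and the later columns only break ties. Each column gets its own direction. A short sketch on the same Customers table:

```sql
-- Countries from Z to A; within each country, company names from A to Z
-- (no keyword after CompanyName means the default ascending order).
SELECT CompanyName, Country
FROM Customers
ORDER BY Country DESC, CompanyName;
```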

### Your turn to try! 

Select the company name, contact title, address, and region from the Customers table where the country isn't Italy. 

Select all distinct rows (with all columns) from the Customers table where the city is Sao Paulo, in ascending order by country. 

Select the birthdate, address, city, and home phone from the Employees table where the city is Seattle and the region is Washington, ordered by postal code in descending order.

**(Hint: refer to the database schema at the top of the notebook)**

### Aggregate Functions

So far, we've only worked with data from the existing rows in the table - all of our returned tables have been some subset of the entries found in the table. But to conduct data analysis, we'll want to compute summary values over many rows at once. In SQL, the functions that do this are called aggregate functions.

Common aggregate functions include **COUNT, SUM, AVG (average), MAX (maximum), and MIN (minimum).**

For example, if we want to find the average price of all units in the Order Details table:

In [None]:
SELECT AVG(UnitPrice) 
FROM Order_Details;
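
The other aggregate functions are used the same way, and several can appear in one SELECT. The sketch below assumes that Order_Details also has a Quantity column (as in the standard Northwind schema); the aliases after AS are illustrative names for the result columns.

```sql
-- One result row summarising the whole Order_Details table.
SELECT COUNT(*)       AS line_items,     -- number of rows
       MIN(UnitPrice) AS lowest_price,   -- smallest unit price
       MAX(UnitPrice) AS highest_price,  -- largest unit price
       SUM(Quantity)  AS total_units     -- total of all quantities
FROM Order_Details;
```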

### Limit Clause

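LIMIT controls how many rows (tuples) a query returns, which is handy for previewing a large table or keeping only the first few rows of a sorted result.
If we want the first 3 distinct combinations of Customer ID and ship name from the Orders table, sorted by Customer ID, then we would write: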

In [None]:
SELECT DISTINCT CustomerID, shipname 
FROM Orders
ORDER BY CustomerID
LIMIT 3;

We don't have to put ASC in the ORDER BY clause since SQL sorts in ascending order by default.  
Also note that LIMIT goes at the end of the query, after ORDER BY.
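
Combining LIMIT with a descending sort is a common way to ask for the "largest N" of something. A sketch, assuming Order_Details has an OrderID column as in the standard Northwind schema:

```sql
-- The 3 order lines with the highest unit price.
SELECT OrderID, UnitPrice
FROM Order_Details
ORDER BY UnitPrice DESC
LIMIT 3;
```

Without an ORDER BY, the database is free to return any 3 rows, so LIMIT is usually paired with a sort. If several rows tie on UnitPrice, which of them make the cut is not guaranteed.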

### Let's try it!

How many employees are from London? Use the Employees table. 

What are the first name, last name, and address of the first 10 employees in alphabetical order by last name?