SQL, which stands for Structured Query Language, is a language for interacting with data stored in something called a relational database.

You can think of a relational database as a collection of tables. A table is just a set of rows and columns, like a spreadsheet, which represents exactly one type of entity. For example, a table might represent employees in a company or purchases made, but not both.

Each row, or record, of a table contains information about a single entity. For example, in a table representing employees, each row represents a single person. Each column, or field, of a table contains a single attribute for all rows in the table. For example, in a table representing employees, we might have a column containing first and last names for all employees.

A query is a request for data from a database table (or combination of tables). 

In SQL, you can select data from a table using a SELECT statement. For example, the following query selects the name column from the people table:

~~~
SELECT name
FROM people;
~~~

In the real world, you will often want to select multiple columns. Luckily, SQL makes this really easy. To select multiple columns from a table, simply separate the column names with commas!

For example, this query selects two columns, name and birthdate, from the people table:

~~~
SELECT name, birthdate
FROM people;
~~~

Sometimes, you may want to select all columns from a table. Typing out every column name would be a pain, so there's a handy shortcut:

~~~
SELECT *
FROM people;
~~~

If you only want to return a certain number of results, you can use the LIMIT keyword to limit the number of rows returned:

~~~
SELECT *
FROM people
LIMIT 10;
~~~
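To try the SELECT and LIMIT queries above without setting up a database server, one option is Python's built-in `sqlite3` module. This is only a sketch: the choice of SQLite and the sample rows are assumptions made for illustration, not part of the original material.

```python
import sqlite3

# In-memory SQLite database; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "1815-12-10"), ("Alan", "1912-06-23"), ("Grace", "1906-12-09")],
)

# One column: each row comes back as a 1-tuple.
names = conn.execute("SELECT name FROM people").fetchall()

# All columns, but at most two rows.
first_two = conn.execute("SELECT * FROM people LIMIT 2").fetchall()

print(names)
print(first_two)
conn.close()
```

Without an ORDER BY clause, SQL does not promise which rows LIMIT will return, only how many.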

Often your results will include many duplicate values. If you want to select all the unique values from a column, you can use the DISTINCT keyword.

This might be useful if, for example, you're interested in knowing which languages are represented in the films table:

~~~
SELECT DISTINCT language
FROM films;
~~~

What if you want to count the number of employees in your employees table? The COUNT function lets you do this by returning the number of rows in a table, or the number of non-missing values in a column.

For example, this code gives the number of rows in the people table:

~~~
SELECT COUNT(*)
FROM people;
~~~

As you've seen, COUNT(*) tells you how many rows are in a table. However, if you want to count the number of non-missing values in a particular column, you can call COUNT on just that column.

For example, to count the number of birth dates present in the people table:

~~~
SELECT COUNT(birthdate)
FROM people;
~~~

It's also common to combine COUNT with DISTINCT to count the number of distinct values in a column.

For example, this query counts the number of distinct birth dates contained in the people table:

~~~
SELECT COUNT(DISTINCT birthdate)
FROM people;
~~~
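The three COUNT variants can give three different answers on the same table. The sketch below assumes SQLite (through Python's `sqlite3` module) and a made-up people table containing one duplicate birth date and one missing birth date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Ada", "1815-12-10"),
        ("Alan", "1912-06-23"),
        ("Twin", "1912-06-23"),  # duplicate birth date
        ("Unknown", None),       # None is stored as NULL
    ],
)

total_rows, known_dates, distinct_dates = conn.execute(
    "SELECT COUNT(*), COUNT(birthdate), COUNT(DISTINCT birthdate) FROM people"
).fetchone()

print(total_rows, known_dates, distinct_dates)  # 4 3 2
conn.close()
```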

In SQL, the WHERE keyword allows you to filter based on both text and numeric values in a table. There are a few different comparison operators you can use:

- '=' equal
- '<>' not equal
- '<' less than
- '>' greater than
- '<=' less than or equal to
- '>=' greater than or equal to

For example, you can filter text records such as title. The following code returns all films with the title 'Metropolis':

~~~
SELECT title
FROM films
WHERE title = 'Metropolis';
~~~

**Notice that the WHERE clause always comes after the FROM clause!**

Often, you'll want to select data based on multiple conditions. You can build up your WHERE queries by combining multiple conditions with the AND keyword.

For example,

~~~
SELECT title
FROM films
WHERE release_year > 1994
AND release_year < 2000;
~~~

gives you the titles of films released after 1994 and before 2000, that is, from 1995 through 1999.

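What if you want to select rows where at least one of several conditions is met, rather than all of them? For this, SQL has the OR operator.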

For example, the following returns all films released in either 1994 or 2000:

~~~
SELECT title
FROM films
WHERE release_year = 1994
OR release_year = 2000;
~~~

When combining AND and OR, be sure to enclose the individual clauses in parentheses, like so:

~~~
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');
~~~

Otherwise, because SQL's precedence rules evaluate AND before OR, you may not get the results you're expecting!
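To see how much the parentheses matter, the following sketch runs the query with and without them. It assumes SQLite via Python's `sqlite3` module, and the five films are invented so that the two versions disagree.

```python
import sqlite3

# Invented sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE films (title TEXT, release_year INTEGER, certification TEXT)"
)
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [
        ("A", 1994, "PG"),
        ("B", 1994, "G"),
        ("C", 1995, "R"),
        ("D", 1995, "G"),
        ("E", 2001, "R"),
    ],
)

with_parens = conn.execute(
    """
    SELECT title FROM films
    WHERE (release_year = 1994 OR release_year = 1995)
    AND (certification = 'PG' OR certification = 'R')
    ORDER BY title
    """
).fetchall()

# Without parentheses, AND binds first, so the filter reads as:
# year = 1994 OR (year = 1995 AND cert = 'PG') OR cert = 'R'
without_parens = conn.execute(
    """
    SELECT title FROM films
    WHERE release_year = 1994 OR release_year = 1995
    AND certification = 'PG' OR certification = 'R'
    ORDER BY title
    """
).fetchall()

print(with_parens)     # [('A',), ('C',)]
print(without_parens)  # [('A',), ('B',), ('C',), ('E',)]
conn.close()
```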

Checking whether a value falls within a range is very common, so SQL provides the BETWEEN keyword as a shorthand. The following query is equivalent to filtering with release_year >= 1994 AND release_year <= 2000:

~~~
SELECT title
FROM films
WHERE release_year
BETWEEN 1994 AND 2000;
~~~

It's important to remember that BETWEEN is inclusive, meaning the beginning and end values are included in the results!
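The inclusive behaviour is easy to check on a small table. This sketch assumes SQLite via Python's `sqlite3` module; the films and their years are made up so that both endpoints, 1994 and 2000, are present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", 1993), ("B", 1994), ("C", 1997), ("D", 2000), ("E", 2001)],
)

def titles(where_clause):
    """Return film titles matching the given WHERE condition, oldest first."""
    sql = f"SELECT title FROM films WHERE {where_clause} ORDER BY release_year"
    return [row[0] for row in conn.execute(sql)]

between = titles("release_year BETWEEN 1994 AND 2000")
inclusive = titles("release_year >= 1994 AND release_year <= 2000")
exclusive = titles("release_year > 1994 AND release_year < 2000")

print(between)    # ['B', 'C', 'D']
print(inclusive)  # ['B', 'C', 'D']
print(exclusive)  # ['C']
conn.close()
```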

The IN operator allows you to specify multiple values in a WHERE clause, making it easier and quicker to specify multiple OR conditions! Neat, right?

~~~
SELECT name
FROM kids
WHERE age IN (2, 4, 6, 8, 10);
~~~
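As a quick check that IN behaves like a chain of OR conditions, the sketch below compares the two forms. SQLite (via Python's `sqlite3` module) and the contents of the kids table are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kids (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO kids VALUES (?, ?)",
    [("Mia", 2), ("Leo", 3), ("Zoe", 4), ("Sam", 11)],
)

with_in = conn.execute(
    "SELECT name FROM kids WHERE age IN (2, 4, 6, 8, 10) ORDER BY name"
).fetchall()

with_or = conn.execute(
    """
    SELECT name FROM kids
    WHERE age = 2 OR age = 4 OR age = 6 OR age = 8 OR age = 10
    ORDER BY name
    """
).fetchall()

print(with_in)  # [('Mia',), ('Zoe',)]
print(with_or)  # [('Mia',), ('Zoe',)]
conn.close()
```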

In SQL, NULL represents a missing or unknown value. You can check for NULL values using the expression IS NULL. For example, to count the number of missing birth dates in the people table:

~~~
SELECT COUNT(*)
FROM people
WHERE birthdate IS NULL;
~~~

As you can see, IS NULL is useful when combined with WHERE to figure out what data you're missing.

Sometimes, you'll want to filter out missing values so you only get results which are not NULL. To do this, you can use the IS NOT NULL operator.

For example, this query gives the names of all people whose birth dates are not missing in the people table.

~~~
SELECT name
FROM people
WHERE birthdate IS NOT NULL;
~~~
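A common stumbling block is writing `= NULL` instead of `IS NULL`. Comparing anything to NULL with `=` never evaluates to true, so such a filter matches no rows. The sketch below illustrates this, assuming SQLite via Python's `sqlite3` module and an invented people table with one missing birth date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "1815-12-10"), ("Alan", "1912-06-23"), ("Unknown", None)],
)

# "= NULL" is never true, even for rows where birthdate is NULL.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM people WHERE birthdate = NULL"
).fetchone()[0]

is_null = conn.execute(
    "SELECT COUNT(*) FROM people WHERE birthdate IS NULL"
).fetchone()[0]

known = conn.execute(
    "SELECT name FROM people WHERE birthdate IS NOT NULL ORDER BY name"
).fetchall()

print(equals_null)  # 0
print(is_null)      # 1
print(known)        # [('Ada',), ('Alan',)]
conn.close()
```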

In SQL, the LIKE operator can be used in a WHERE clause to search for a pattern in a column. To accomplish this, you use something called a wildcard as a placeholder for some other values. There are two wildcards you can use with LIKE:

The % wildcard will match zero, one, or many characters in text. For example, the following query matches companies like 'Data', 'DataC', 'DataCamp', 'DataMind', and so on:

~~~
SELECT name
FROM companies
WHERE name LIKE 'Data%';
~~~

The _ wildcard will match a single character. For example, the following query matches companies like 'DataCamp', 'DataComp', and so on:

~~~
SELECT name
FROM companies
WHERE name LIKE 'DataC_mp';
~~~

You can also use the NOT LIKE operator to find records that don't match the pattern you specify.
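The two wildcards and NOT LIKE can be compared side by side. This is a sketch assuming SQLite via Python's `sqlite3` module, with an invented companies table. One caveat specific to that assumption: SQLite's LIKE ignores case for ASCII letters by default, whereas some other databases match case-sensitively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?)",
    [("Data",), ("DataCamp",), ("DataComp",), ("DataMind",), ("BigData",)],
)

def names(condition):
    """Return company names matching the given WHERE condition, sorted."""
    sql = f"SELECT name FROM companies WHERE {condition} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

starts_with_data = names("name LIKE 'Data%'")  # % matches zero or more characters
one_char_gap = names("name LIKE 'DataC_mp'")   # _ matches exactly one character
not_data = names("name NOT LIKE 'Data%'")

print(starts_with_data)  # ['Data', 'DataCamp', 'DataComp', 'DataMind']
print(one_char_gap)      # ['DataCamp', 'DataComp']
print(not_data)          # ['BigData']
conn.close()
```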

#### Aggregate functions

Often, you will want to perform some calculation on the data in a database. SQL provides a few functions, called aggregate functions, to help you out with this.

For example,

~~~
SELECT AVG(budget)
FROM films;
~~~

gives you the average value from the budget column of the films table. Similarly, the MAX function returns the highest budget:

~~~
SELECT MAX(budget)
FROM films;
~~~

The SUM function returns the result of adding up the numeric values in a column:

~~~
SELECT SUM(budget)
FROM films;
~~~

SQL allows you to do something called aliasing. Aliasing simply means you assign a temporary name to something, such as a column in a query's result. To alias, you use the AS keyword.

For example, when a query returns several aggregates, aliases make the result columns easier to tell apart:

~~~
SELECT MAX(budget) AS max_budget,
       MAX(duration) AS max_duration
FROM films;
~~~
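The effect of an alias shows up in the column names of the result. The sketch below reads those names from the cursor, assuming SQLite via Python's `sqlite3` module and a made-up films table. The `total_budget` and `avg_budget` aliases are hypothetical names chosen for this example. Note that aggregate functions skip NULL values, so the film without a budget does not affect the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, budget INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("A", 100, 90), ("B", 300, 120), ("C", None, 150)],  # C has no budget
)

cur = conn.execute(
    """
    SELECT AVG(budget) AS avg_budget,
           MAX(budget) AS max_budget,
           SUM(budget) AS total_budget,
           MAX(duration) AS max_duration
    FROM films
    """
)
columns = [col[0] for col in cur.description]  # result column names
row = cur.fetchone()

print(columns)  # ['avg_budget', 'max_budget', 'total_budget', 'max_duration']
print(row)      # (200.0, 300, 400, 150)
conn.close()
```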

In SQL, the ORDER BY keyword is used to sort results in ascending or descending order according to the values of one or more columns.

By default ORDER BY will sort in ascending order. If you want to sort the results in descending order, you can use the DESC keyword. For example,

~~~
SELECT title
FROM films
ORDER BY release_year DESC;
~~~

gives you the titles of films sorted by release year, from newest to oldest.
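Sorting on more than one column is useful for breaking ties. In this sketch (assuming SQLite via Python's `sqlite3` module and invented film rows), two films share a release year, and the title column decides their relative order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Old", 1927), ("Mid", 1994), ("New", 2010), ("Also Mid", 1994)],
)

# Newest first; films from the same year are sorted by title (ascending).
newest_first = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM films ORDER BY release_year DESC, title"
    )
]

# Ascending is the default when no direction is given.
oldest_first = [
    row[0]
    for row in conn.execute("SELECT title FROM films ORDER BY release_year, title")
]

print(newest_first)  # ['New', 'Also Mid', 'Mid', 'Old']
print(oldest_first)  # ['Old', 'Also Mid', 'Mid', 'New']
conn.close()
```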

In SQL, GROUP BY allows you to group a result by one or more columns, like so:

~~~
SELECT sex, COUNT(*)
FROM employees
GROUP BY sex;
~~~

**Commonly, GROUP BY is used with aggregate functions like COUNT() or MAX(). Note that GROUP BY always goes after the FROM clause (and after the WHERE clause, if there is one)!**

Note also that ORDER BY always goes after GROUP BY.
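Here is a small sketch of GROUP BY followed by ORDER BY, assuming SQLite via Python's `sqlite3` module. The employees rows and the alias `n` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", "F"), ("Bea", "F"), ("Cat", "F"), ("Dan", "M"), ("Eli", "M")],
)

# One output row per distinct value of sex, largest group first.
counts = conn.execute(
    """
    SELECT sex, COUNT(*) AS n
    FROM employees
    GROUP BY sex
    ORDER BY n DESC
    """
).fetchall()

print(counts)  # [('F', 3), ('M', 2)]
conn.close()
```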

In SQL, aggregate functions can't be used in WHERE clauses. For example, the following query is invalid:

~~~
SELECT release_year
FROM films
GROUP BY release_year
WHERE COUNT(title) > 10;
~~~

This means that if you want to filter based on the result of an aggregate function, you need another way! That's where the HAVING clause comes in. For example,

~~~
SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10;
~~~

shows only those years in which more than 10 films were released.
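The difference between WHERE and HAVING can be checked directly. The sketch below assumes SQLite via Python's `sqlite3` module, where an aggregate inside WHERE raises `sqlite3.OperationalError`. Other databases report the error differently. The sample films are invented, and the threshold is lowered from 10 to 2 to keep the table small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", 1994), ("B", 1994), ("C", 1994), ("D", 2000)],
)

# An aggregate function inside WHERE is rejected by SQLite.
try:
    conn.execute(
        "SELECT release_year FROM films WHERE COUNT(title) > 2 GROUP BY release_year"
    )
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

# HAVING filters the groups after they have been formed.
busy_years = conn.execute(
    "SELECT release_year FROM films GROUP BY release_year HAVING COUNT(title) > 2"
).fetchall()

print(where_failed)  # True
print(busy_years)    # [(1994,)]
conn.close()
```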

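These clauses can be combined in a single query. They must appear in the order SELECT, FROM, GROUP BY, HAVING, ORDER BY, LIMIT (a WHERE clause, if present, goes between FROM and GROUP BY):

~~~
-- select country, average budget, average gross
SELECT country, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
-- from the films table
FROM films
-- group by country
GROUP BY country
-- keep only countries with more than 10 titles
HAVING COUNT(title) > 10
-- order by country
ORDER BY country
-- limit to only show 5 results
LIMIT 5;
~~~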