# Learning to COUNT()

### The number of records containing a film_id.

* As you've seen, COUNT(*) tells you how many records are in a table. However, if you want to count the number of non-missing values in a particular field, you can call COUNT() on just that field.

## Count the number of records in the people table, aliasing the result as count_records.

## Count the number of records with a birthdate in the people table, aliasing the result as count_birthdate.

## Count the languages and countries in the films table; alias as count_languages and count_countries.
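
One way to answer the three counting prompts above, as a sketch that assumes a PostgreSQL-style dialect and the `people` and `films` tables used throughout these exercises:

```sql
-- Total number of rows in people, whether or not fields are missing
SELECT COUNT(*) AS count_records
FROM people;

-- Rows in people whose birthdate is not NULL
SELECT COUNT(birthdate) AS count_birthdate
FROM people;

-- Non-NULL languages and countries in films, counted in one query
SELECT COUNT(language) AS count_languages,
       COUNT(country) AS count_countries
FROM films;
```

`COUNT(field)` skips NULLs, so `count_birthdate` can be smaller than `count_records`.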

# SELECT DISTINCT

* Often query results will include many duplicate values. You can use the DISTINCT keyword to select the unique values from a field.

* This might be useful if, for example, you're interested in knowing which languages are represented in the films table.

## Return the unique countries represented in the films table using DISTINCT.

## Return the number of unique countries represented in the films table, aliased as count_distinct_countries.
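
A possible pair of queries for the two DISTINCT prompts above, assuming the `films` table has a `country` column as the prompts suggest:

```sql
-- Each country appears once in the result, however many films it has
SELECT DISTINCT country
FROM films;

-- Number of different non-NULL countries
SELECT COUNT(DISTINCT country) AS count_distinct_countries
FROM films;
```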

# Using WHERE with numbers
* Filtering with WHERE allows you to analyze your data better. You may have a dataset that includes a range of different movies, and you need to do a case study on the most notable films with the biggest budgets. In this case, you'll want to filter your data to a specific budget range.

## Select the film_id and imdb_score from the reviews table and filter on scores higher than 7.0.

## Select the film_id and facebook_likes of the first ten records with fewer than 1,000 likes from the reviews table.

## Count how many records have a num_votes of at least 100,000; use the alias films_over_100K_votes.
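
The three numeric filters above might be written as follows. This is a sketch against the `reviews` table named in the prompts, not the only valid form:

```sql
-- Scores strictly higher than 7.0
SELECT film_id, imdb_score
FROM reviews
WHERE imdb_score > 7.0;

-- First ten records with fewer than 1,000 likes
SELECT film_id, facebook_likes
FROM reviews
WHERE facebook_likes < 1000
LIMIT 10;

-- "At least" means >=, so exactly 100,000 votes is included
-- (PostgreSQL folds the unquoted alias to lowercase in the output)
SELECT COUNT(num_votes) AS films_over_100K_votes
FROM reviews
WHERE num_votes >= 100000;
```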

# Using WHERE with text
* WHERE can also filter string values.

* Imagine you are part of an organization that gives cinematography awards, and you have several international categories. Before you confirm an award for every language listed in your dataset, it may be worth seeing if there are enough films of a specific language to make it a fair competition. If there is only one movie or a significant skew, it may be worth considering a different way of giving international awards.

## Select and count the language field using the alias count_spanish.
## Apply a filter to select only Spanish from the language field.
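
A minimal sketch for this prompt, assuming the language is stored as the string `'Spanish'` with that exact capitalization:

```sql
-- Count films whose language is Spanish
SELECT COUNT(language) AS count_spanish
FROM films
WHERE language = 'Spanish';
```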

# Using AND
* The following exercises combine AND and OR with the WHERE clause. Using these operators together strengthens your queries and analyses of data.

* You will apply these new skills now on the films database.

## Select the title and release_year for all German-language films released before 2000.

## Update the query from the previous step to show German-language films released after 2000 rather than before.

## Select all details for German-language films released after 2000 but before 2010 using only WHERE and AND.
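
The first and last prompts of this section could be answered as below. The sketch assumes `language` holds values such as `'German'`:

```sql
-- German-language films released before 2000
SELECT title, release_year
FROM films
WHERE language = 'German'
  AND release_year < 2000;

-- All details for German-language films released after 2000 but before 2010
SELECT *
FROM films
WHERE language = 'German'
  AND release_year > 2000
  AND release_year < 2010;
```

For the middle prompt, changing `<` to `>` in the first query is enough.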

# Using OR
* This time you'll write a query to get the title and release_year of films released in 1990 or 1999, which were in English or Spanish and took in more than $2,000,000 gross.

## Select the title and release_year for films released in 1990 or 1999 using only WHERE and OR.

## Filter the records to only include English or Spanish-language films.

## Finally, restrict the query to only return films worth more than $2,000,000 gross.
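
Putting the three steps above together gives something like the following sketch. The parentheses matter: AND binds more tightly than OR, so without them the conditions would group differently.

```sql
-- Films from 1990 or 1999, in English or Spanish, grossing over $2,000,000
SELECT title, release_year
FROM films
WHERE (release_year = 1990 OR release_year = 1999)
  AND (language = 'English' OR language = 'Spanish')
  AND gross > 2000000;
```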

# Using BETWEEN
* Let's use BETWEEN with AND on the films database to get the title and release_year of all Spanish-language films released between 1990 and 2000 (inclusive) with budgets over $100 million.

## Select the title and release_year of all films released between 1990 and 2000 (inclusive) using BETWEEN.

## Build on your previous query to select only films with a budget over $100 million.

## Now, restrict the query to only return Spanish-language films.

## Finally, amend the query to include all Spanish-language or French-language films with the same criteria.
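
After the final step, the query might look like this sketch (it assumes `budget` is stored in dollars, as the prompts imply):

```sql
-- Spanish- or French-language films from 1990 to 2000 (inclusive)
-- with budgets over $100 million
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
  AND budget > 100000000
  AND (language = 'Spanish' OR language = 'French');
```

`BETWEEN 1990 AND 2000` is equivalent to `release_year >= 1990 AND release_year <= 2000`.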

# LIKE and NOT LIKE
* The LIKE and NOT LIKE operators can be used to find records that either match or do not match a specified pattern, respectively. They can be coupled with the wildcards % and _. The % will match zero or more characters, and _ will match exactly one character.

## Select the names of all people whose names begin with 'B'.

## Select the names of people whose names have 'r' as the second letter.

## Select the names of people whose names don't start with 'A'.
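
The three pattern prompts in this section could be sketched as follows, assuming a `name` column in `people`. Note that LIKE is case-sensitive in PostgreSQL:

```sql
-- Names beginning with 'B': B followed by anything
SELECT name
FROM people
WHERE name LIKE 'B%';

-- 'r' as the second letter: exactly one character, then r, then anything
SELECT name
FROM people
WHERE name LIKE '_r%';

-- Names that do not start with 'A'
SELECT name
FROM people
WHERE name NOT LIKE 'A%';
```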

# WHERE IN
* You now know you can query multiple conditions using the IN operator and a set of parentheses. It is a valuable piece of code that helps us keep our queries clean and concise.

## Select the title and release_year of all films released in 1990 or 2000 that were longer than two hours.

## Select the title and language of all films in English, Spanish, or French using IN.

## Select the title, certification and language of all films certified NC-17 or R that are in English, Italian, or Greek.
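
Possible queries for the three IN prompts above. The sketch assumes `duration` is stored in minutes, so "longer than two hours" becomes `duration > 120`:

```sql
-- Films from 1990 or 2000 that run longer than two hours
SELECT title, release_year
FROM films
WHERE release_year IN (1990, 2000)
  AND duration > 120;

-- Films in English, Spanish, or French
SELECT title, language
FROM films
WHERE language IN ('English', 'Spanish', 'French');

-- NC-17 or R films in English, Italian, or Greek
SELECT title, certification, language
FROM films
WHERE certification IN ('NC-17', 'R')
  AND language IN ('English', 'Italian', 'Greek');
```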

# Combining filtering and selecting

* Time for a little challenge. So far, your SQL vocabulary from this course includes COUNT(), DISTINCT, LIMIT, WHERE, OR, AND, BETWEEN, LIKE, NOT LIKE, and IN. In this exercise, you will try to use some of these together. Writing more complex queries will be standard for you as you become a qualified SQL programmer.

* As this query will be a little more complicated than what you've seen so far, we've included a bit of code to get you started. You will be using DISTINCT here too because, surprise, there are two movies named 'Hamlet' in this dataset!

# Practice with NULLs

## Select the title of every film that doesn't have a budget associated with it and use the alias no_budget_info.

## Count the number of films with a language associated with them and use the alias count_language_known.
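
A sketch for the two NULL prompts above. Missing values are tested with `IS NULL` and `IS NOT NULL`, because a comparison such as `budget = NULL` never evaluates to true:

```sql
-- Titles of films with no budget recorded
SELECT title AS no_budget_info
FROM films
WHERE budget IS NULL;

-- Number of films whose language is known
SELECT COUNT(*) AS count_language_known
FROM films
WHERE language IS NOT NULL;
```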

# Aggregate functions and data types
Aggregate functions are another valuable tool for the SQL programmer. They are used extensively across businesses to calculate important metrics, such as the average cost of making a film.

You know five different aggregate functions:

* AVG()
* SUM()
* MIN()
* MAX()
* COUNT()

## Use the SUM() function to calculate the total duration of all films and alias with total_duration.

## Calculate the average duration of all films and alias with average_duration.

## Find the most recent release_year in the films table, aliasing as latest_year.

## Find the duration of the shortest film and use the alias shortest_film.
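
The four aggregate prompts above might be answered like this, as a sketch against the `films` table:

```sql
-- Total running time of all films
SELECT SUM(duration) AS total_duration
FROM films;

-- Average running time
SELECT AVG(duration) AS average_duration
FROM films;

-- Most recent release year
SELECT MAX(release_year) AS latest_year
FROM films;

-- Duration of the shortest film
SELECT MIN(duration) AS shortest_film
FROM films;
```

All four ignore NULL values in the aggregated field.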

# Combining aggregate functions with WHERE
* When combining aggregate functions with WHERE, you get a powerful tool that allows you to get more granular with your insights, for example, to get the total budget of movies made from the year 2010 onwards.

* This combination is useful when you only want to summarize a subset of your data. In your film-industry role, as an example, you may like to summarize each certification category to compare how they each perform or if one certification has a higher average budget than another.

## Calculate the average amount grossed by all films whose titles start with the letter 'A' and alias with avg_gross_A.

## Find the lowest gross among films released in 1994 and use the alias lowest_gross.

## Find the highest gross among films released between 2000 and 2012, inclusive, and use the alias highest_gross.
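
A sketch of the three prompts in this section. In each query the WHERE clause narrows the rows first, and the aggregate then summarizes what is left:

```sql
-- Average gross of films whose titles start with 'A'
SELECT AVG(gross) AS avg_gross_A
FROM films
WHERE title LIKE 'A%';

-- Lowest gross among films released in 1994
SELECT MIN(gross) AS lowest_gross
FROM films
WHERE release_year = 1994;

-- Highest gross among films released from 2000 to 2012, inclusive
SELECT MAX(gross) AS highest_gross
FROM films
WHERE release_year BETWEEN 2000 AND 2012;
```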

# Using ROUND()
* Aggregate functions work great with numerical values; however, these results can sometimes get unwieldy when dealing with long decimal values. Luckily, SQL provides you with the ROUND() function to tame these long decimals.

## Calculate the average facebook_likes to one decimal place and assign to the alias, avg_facebook_likes.
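
One possible form, assuming PostgreSQL and an integer `facebook_likes` column in the `reviews` table (so that `AVG()` returns a numeric value that two-argument `ROUND()` accepts):

```sql
-- Average likes, rounded to one decimal place
SELECT ROUND(AVG(facebook_likes), 1) AS avg_facebook_likes
FROM reviews;
```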

# ROUND() with a negative parameter
* A useful thing you can do with ROUND() is have a negative number as the decimal place parameter. This can come in handy if your manager only needs to know the average number of facebook_likes to the hundreds since granularity below one hundred likes won't impact decision making.

## Calculate the average budget from the films table, aliased as avg_budget_thousands, and round to the nearest thousand.
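
A sketch assuming PostgreSQL, where a negative second argument rounds to the left of the decimal point (`-3` means the nearest thousand). Other dialects may treat negative precision differently:

```sql
-- Average budget, rounded to the nearest thousand
SELECT ROUND(AVG(budget), -3) AS avg_budget_thousands
FROM films;
```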

# Aliasing with functions
* Aliasing can be a lifesaver, especially as we start to do more complex SQL queries with multiple criteria. Aliases help you keep your code clean and readable. For example, if you want to find the MAX() value of several fields without aliasing, you'll end up with a result containing several columns all called max and no idea which is which. You can fix this with aliasing.

## Select the title and duration in hours for all films and alias as duration_hours; since the current durations are in minutes, you'll need to divide duration by 60.0.

## Calculate the percentage of people who are no longer alive and alias the result as percentage_dead.

## Find how many decades the films table covers by using MIN() and MAX() and alias as number_of_decades.
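
The three prompts above could be sketched as follows. The `deathdate` column is an assumption (it does not appear elsewhere in these notes), and a missing `deathdate` is taken to mean the person is still alive:

```sql
-- Duration in hours; dividing by 60.0 rather than 60 avoids integer division
SELECT title,
       duration / 60.0 AS duration_hours
FROM films;

-- Share of people with a recorded deathdate (assumed column name)
SELECT COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead
FROM people;

-- Span of release years expressed in decades
SELECT (MAX(release_year) - MIN(release_year)) / 10.0 AS number_of_decades
FROM films;
```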

# Rounding results
* You found some valuable insights in the previous exercise, but many of the results were inconveniently long. We forgot to round! We won't make you redo them all; however, you'll update the worst offender in this exercise.

## Update the query by adding ROUND() around the calculation and round to two decimal places.

# Sorting single fields
* Now that you understand how ORDER BY works, you'll put it into practice. In this exercise, you'll work on sorting single fields only. This can be helpful to extract quick insights such as the top-grossing or top-scoring film.

## Select the name of each person in the people table, sorted alphabetically.

## Select the title and duration for every film, from longest duration to shortest.
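
A sketch of the two sorting prompts above. ORDER BY sorts ascending by default, so only the second query needs an explicit direction:

```sql
-- People sorted alphabetically by name
SELECT name
FROM people
ORDER BY name;

-- Films from longest to shortest
SELECT title, duration
FROM films
ORDER BY duration DESC;
```

In PostgreSQL, rows with a NULL `duration` come first under `DESC` unless `NULLS LAST` is added.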

# Sorting multiple fields
* ORDER BY can also be used to sort on multiple fields. It will sort by the first field specified, then sort by the next, and so on. As an example, you may want to sort the people data by age and keep the names in alphabetical order.

## Select the release_year, duration, and title of films ordered by their release year and duration, in that order.

## Select the certification, release_year, and title from films ordered first by certification (alphabetically) and second by release year, starting with the most recent year.
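
Possible queries for the two multi-field sorts above. Each field in ORDER BY carries its own direction, so `DESC` below applies only to `release_year`:

```sql
-- Sort by release year, then by duration within each year
SELECT release_year, duration, title
FROM films
ORDER BY release_year, duration;

-- Certification A to Z, newest films first within each certification
SELECT certification, release_year, title
FROM films
ORDER BY certification, release_year DESC;
```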

# GROUP BY single fields
* GROUP BY is a SQL keyword that allows you to group and summarize results with the additional use of aggregate functions. For example, films can be grouped by the certification and language before counting the film titles in each group. This allows you to see how many films had a particular certification and language grouping.

## Select the release_year and count of films released in each year aliased as film_count.

## Select the release_year and average duration aliased as avg_duration of all films, grouped by release_year.

## Select the release_year, country, and the maximum budget aliased as max_budget for each year and each country; sort your results by release_year and country.
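
The three grouping prompts above might look like this sketch. Every non-aggregated field in SELECT also appears in GROUP BY:

```sql
-- Number of films released each year
SELECT release_year, COUNT(*) AS film_count
FROM films
GROUP BY release_year;

-- Average duration per release year
SELECT release_year, AVG(duration) AS avg_duration
FROM films
GROUP BY release_year;

-- Largest budget for each year and country combination
SELECT release_year, country, MAX(budget) AS max_budget
FROM films
GROUP BY release_year, country
ORDER BY release_year, country;
```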

# Answering business questions
* In the real world, every SQL query starts with a business question. Then it is up to you to decide how to write the query that answers the question. Let's try this out.
* Which release_year had the most language diversity?
* "Most language diversity" can be interpreted as COUNT(DISTINCT ___).

```sql
SELECT release_year,
    COUNT(DISTINCT language) AS diversity
FROM films
GROUP BY release_year
ORDER BY diversity DESC;
```

# Filter with HAVING
* HAVING works similarly to WHERE in that it is a filtering clause, with the difference that HAVING filters grouped data.
* Filtering grouped data can be especially handy when working with a large dataset. When working with thousands or even millions of rows, HAVING lets you keep only the groups you want, such as release years whose films average over two hours in length.

## Select country from the films table, and get the distinct count of certification aliased as certification_count.

## Group the results by country.

```sql
SELECT country, COUNT(DISTINCT certification) AS certification_count
FROM films
GROUP BY country;
```

## Filter the unique count of certifications to only results greater than 10.
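
Extending the grouped query above with a HAVING clause gives the following sketch. The aggregate is repeated in HAVING because PostgreSQL does not allow a SELECT alias to be referenced there:

```sql
-- Countries with more than 10 distinct certifications
SELECT country, COUNT(DISTINCT certification) AS certification_count
FROM films
GROUP BY country
HAVING COUNT(DISTINCT certification) > 10;
```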

# HAVING and sorting

## Select the country and the average budget as average_budget, rounded to two decimal places, from films.
## Group the results by country.
## Filter the results to countries with an average budget of more than one billion (1000000000).
## Sort by descending order of the average_budget.
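
The four steps above combine into a single query along these lines. This is a sketch assuming PostgreSQL, where the alias can be used in ORDER BY but not in HAVING:

```sql
-- Countries whose average film budget exceeds one billion, largest first
SELECT country,
       ROUND(AVG(budget), 2) AS average_budget
FROM films
GROUP BY country
HAVING AVG(budget) > 1000000000
ORDER BY average_budget DESC;
```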

# All together now
* In this exercise, you'll write a query that returns the average budget and gross earnings for films each year after 1990 if the average budget is greater than 60 million.
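
One way to write that query is sketched below. The aliases `avg_budget` and `avg_gross` are illustrative choices, and the clauses appear in the order SQL requires: WHERE filters rows, GROUP BY forms the groups, then HAVING filters the groups.

```sql
-- Average budget and gross per year after 1990,
-- keeping only years whose average budget exceeds 60 million
SELECT release_year,
       AVG(budget) AS avg_budget,
       AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000;
```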