## Aggregate Functions, Sorting, Groups 

- aggregate functions
- aliasing
- group by
- order by
- having


In [86]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [87]:
%sql sqlite:///imdb.db

'Connected: @imdb.db'

List the tables in the database.

In [88]:
%sql select name from sqlite_master where type = 'table'

 * sqlite:///imdb.db
Done.


name
films
people
reviews
roles


Use the SUM function to get the total duration of all films.

Get the duration of the longest film.

In [89]:
%sql select sum(duration) from films;

 * sqlite:///imdb.db
Done.


sum(duration)
1069764


In [90]:
%sql select max(duration) from films;

 * sqlite:///imdb.db
Done.


max(duration)
334
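
The behaviour of SUM and MAX can be checked outside the notebook. The following is a minimal sketch using Python's built-in sqlite3 module and a hypothetical three-row films table (not the real imdb.db data); it also shows that both aggregates skip NULL values.

```python
import sqlite3

# Hypothetical toy table standing in for films; not the real imdb.db data.
conn = sqlite3.connect(":memory:")
conn.execute("create table films (title text, duration integer)")
conn.executemany(
    "insert into films values (?, ?)",
    [("A", 90), ("B", 120), ("C", None)],
)

# Aggregates collapse all rows into a single value and ignore NULLs.
total, longest = conn.execute(
    "select sum(duration), max(duration) from films"
).fetchone()
print(total, longest)
```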


Get the amount grossed by the best performing film.

In [91]:
%sql select * from films limit 3;

 * sqlite:///imdb.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
1,Intolerance: Love's Struggle Throughout the Ages,1916,USA,123,,Not Rated,,385907
2,Over the Hill to the Poorhouse,1920,USA,110,,,3000000.0,100000
3,The Big Parade,1925,USA,151,,Not Rated,,245000


In [92]:
%sql select max(gross) from films;

 * sqlite:///imdb.db
Done.


max(gross)
936627416


Get the amount grossed by the best performing film between 2000 and 2012, inclusive.

In [93]:
%sql select max(gross) from films where release_year between 2000 and 2012;

 * sqlite:///imdb.db
Done.


max(gross)
760505847
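
BETWEEN includes both endpoints, which is why it matches the "inclusive" wording of the task. A small sketch on a hypothetical in-memory table (the years and gross figures are made up for illustration):

```python
import sqlite3

# Hypothetical data: one film per year, with invented gross values.
conn = sqlite3.connect(":memory:")
conn.execute("create table films (release_year integer, gross integer)")
conn.executemany(
    "insert into films values (?, ?)",
    [(1999, 900), (2000, 100), (2006, 300), (2012, 200), (2013, 800)],
)

# BETWEEN 2000 AND 2012 keeps 2000 and 2012 themselves, but not 1999 or 2013.
years = [
    row[0]
    for row in conn.execute(
        "select release_year from films "
        "where release_year between 2000 and 2012 order by release_year"
    )
]
best = conn.execute(
    "select max(gross) from films where release_year between 2000 and 2012"
).fetchone()[0]
print(years, best)
```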


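Get the average duration for all films, aliased as avg_duration_hours. The duration column appears to hold minutes (the longest film is 334), so the query below returns the average in minutes; divide by 60.0 to express it in hours.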

In [94]:
%sql select avg(duration) as avg_duration_hours from films;

 * sqlite:///imdb.db
Done.


avg_duration_hours
107.94793138244198
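
To express an average in hours, the aggregate can be divided by 60.0 inside the query. This is a sketch on a hypothetical table, assuming duration is stored in minutes:

```python
import sqlite3

# Hypothetical durations in minutes; not the real imdb.db data.
conn = sqlite3.connect(":memory:")
conn.execute("create table films (duration integer)")
conn.executemany("insert into films values (?)", [(90,), (120,), (150,)])

# avg() returns a float; dividing by 60.0 converts minutes to hours.
avg_minutes, avg_hours = conn.execute(
    "select avg(duration), avg(duration) / 60.0 as avg_duration_hours from films"
).fetchone()
print(avg_minutes, avg_hours)
```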


Get the percentage of people who are no longer alive. Alias the result as percentage_dead.

In [95]:
%sql select * from people limit 3;

 * sqlite:///imdb.db
Done.


id,name,birthdate,deathdate
1,50 Cent,1975-07-06,
2,A. Michael Baldwin,1963-04-04,
3,A. Raven Cruz,,


In [96]:
%sql select count(deathdate) * 100.0 / count(*) as percentage_dead from people;

 * sqlite:///imdb.db
Done.


percentage_dead
9.372394902941526
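
The percentage query relies on two details: count(deathdate) counts only non-NULL values while count(*) counts every row, and multiplying by 100.0 rather than 100 forces floating-point division. A minimal sketch with a hypothetical three-person table:

```python
import sqlite3

# Hypothetical people table: one of three rows has a deathdate.
conn = sqlite3.connect(":memory:")
conn.execute("create table people (name text, deathdate text)")
conn.executemany(
    "insert into people values (?, ?)",
    [("A", "1990-01-01"), ("B", None), ("C", None)],
)

# 100.0 gives a float result; 100 makes SQLite do integer division.
pct, int_pct = conn.execute(
    "select count(deathdate) * 100.0 / count(*), "
    "count(deathdate) * 100 / count(*) from people"
).fetchone()
print(pct, int_pct)
```

With integer operands SQLite truncates the quotient, so the second column loses the fractional part.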


Get the titles of films sorted by release_year, from newest to oldest.

In [97]:
%sql select title,release_year from films order by release_year desc limit 5;

 * sqlite:///imdb.db
Done.


title,release_year
10 Cloverfield Lane,2016
13 Hours,2016
A Beginner's Guide to Snuff,2016
Airlift,2016
Alice Through the Looking Glass,2016


Get the names of people from the people table, sorted alphabetically.

In [98]:
%sql select name from people order by name asc limit 5;

 * sqlite:///imdb.db
Done.


name
50 Cent
A. Michael Baldwin
A. Raven Cruz
A.J. Buckley
A.J. DeLucia


Get the titles of films released in 2000 or 2012, in the order they were released.

In [99]:
%sql select title from films where release_year in (2000,2012) order by release_year limit 5;

 * sqlite:///imdb.db
Done.


title
102 Dalmatians
28 Days
3 Strikes
Aberdeen
All the Pretty Horses


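Get the title and gross earnings for movies whose titles begin with the letter 'M'. The query below omits the sort, so rows come back in table order; add order by title before the limit to order the results alphabetically.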

In [100]:
%sql select title,gross from films where title like 'M%' limit 10;

 * sqlite:///imdb.db
Done.


title,gross
Metropolis,26435.0
Modern Times,163245.0
Mr. Smith Goes to Washington,
Moby Dick,
Mary Poppins,102300000.0
My Fair Lady,72000000.0
Major Dundee,14873.0
Machine Gun McCain,
Midnight Cowboy,
Mississippi Mermaid,26893.0
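
For an alphabetical listing, an ORDER BY on title can follow the LIKE filter. The sketch below uses a hypothetical in-memory table with a handful of titles, not the real films data:

```python
import sqlite3

# Hypothetical titles, inserted in non-alphabetical order.
conn = sqlite3.connect(":memory:")
conn.execute("create table films (title text, gross real)")
conn.executemany(
    "insert into films values (?, ?)",
    [("Metropolis", 26435.0), ("Alien", None),
     ("Moby Dick", None), ("Mary Poppins", 102300000.0)],
)

# LIKE 'M%' keeps titles starting with M; ORDER BY sorts what is left.
titles = [
    row[0]
    for row in conn.execute(
        "select title, gross from films where title like 'M%' order by title"
    )
]
print(titles)
```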


Get the birth date and name of people in the people table, ordered by birth date and then alphabetically by name.

In [101]:
%sql select birthdate,name from people where birthdate != 'None' order by birthdate,name limit 5;

 * sqlite:///imdb.db
Done.


birthdate,name
1837-10-10,Robert Shaw
1872-11-07,Lucille La Verne
1874-03-14,Mary Carr
1875-01-22,D.W. Griffith
1878-01-20,Finlay Currie
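
When ORDER BY lists several columns, the later ones only break ties in the earlier ones. The sketch below uses a hypothetical people table and filters with is not null, which assumes missing birth dates are stored as NULL rather than as the text 'None':

```python
import sqlite3

# Hypothetical people; two share a birthdate and one has none.
conn = sqlite3.connect(":memory:")
conn.execute("create table people (name text, birthdate text)")
conn.executemany(
    "insert into people values (?, ?)",
    [("Zed", "1900-01-01"), ("Amy", "1900-01-01"),
     ("Bob", "1890-05-05"), ("Nul", None)],
)

# Sort by birthdate first; name decides the order within equal birthdates.
rows = conn.execute(
    "select birthdate, name from people "
    "where birthdate is not null order by birthdate, name"
).fetchall()
print(rows)
```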


GROUP BY collapses rows that share a value into one group, for example grouping employees by sex.

Get the release year and count of films released in each year.

In [102]:
%sql select release_year, count(*) from films group by release_year limit 5;

 * sqlite:///imdb.db
Done.


release_year,count(*)
,84
1916.0,2
1920.0,2
1925.0,2
1927.0,2
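
The blank first row in the output is the group of films whose release_year is NULL: GROUP BY treats all NULLs as one group. A minimal sketch on a hypothetical table shows the same effect:

```python
import sqlite3

# Hypothetical films; two rows have no release_year.
conn = sqlite3.connect(":memory:")
conn.execute("create table films (title text, release_year integer)")
conn.executemany(
    "insert into films values (?, ?)",
    [("A", None), ("B", None), ("C", 1916), ("D", 1916), ("E", 1920)],
)

# One output row per distinct release_year; NULLs form their own group.
# The explicit ORDER BY makes the row order deterministic (NULLs sort first).
counts = conn.execute(
    "select release_year, count(*) from films "
    "group by release_year order by release_year"
).fetchall()
print(counts)
```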


Get the release year and average duration of all films, grouped by release year.

In [103]:
%sql select release_year, avg(duration) from films group by release_year limit 5;

 * sqlite:///imdb.db
Done.


release_year,avg(duration)
,77.4390243902439
1916.0,123.0
1920.0,110.0
1925.0,151.0
1927.0,145.0


Get the IMDB score and count of film reviews grouped by IMDB score in the reviews table.

In [104]:
%sql select * from reviews limit 1;

 * sqlite:///imdb.db
Done.


id,film_id,num_user,num_critic,imdb_score,num_votes,facebook_likes
1,3934,588,432,7.1,203461,46000


In [105]:
%sql select imdb_score, count(*) from reviews  group by imdb_score order by imdb_score desc limit 5;

 * sqlite:///imdb.db
Done.


imdb_score,count(*)
9.5,1
9.3,1
9.2,1
9.1,1
9.0,2


Get the country, release year, and lowest amount grossed per release year per country. Order your results by country and release year.

In [106]:
%%sql 
select country,release_year, min(gross) from films
group by country, release_year
order by country, release_year
limit 5;

 * sqlite:///imdb.db
Done.


country,release_year,min(gross)
,,
,2014.0,
Afghanistan,2003.0,1127331.0
Argentina,2000.0,1221261.0
Argentina,2004.0,304124.0
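
Grouping by two columns produces one row per distinct (country, release_year) pair. The sketch below, on hypothetical data, also shows that min() returns NULL for a group whose gross values are all NULL, which explains the empty cells in the output above:

```python
import sqlite3

# Hypothetical rows: (country, release_year, gross).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table films (country text, release_year integer, gross integer)"
)
conn.executemany(
    "insert into films values (?, ?, ?)",
    [("USA", 2000, 10), ("USA", 2000, 5),
     ("USA", 2001, None), ("France", 2000, 7)],
)

# min() is computed separately inside each (country, release_year) group.
lowest = conn.execute(
    "select country, release_year, min(gross) from films "
    "group by country, release_year "
    "order by country, release_year"
).fetchall()
print(lowest)
```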


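In SQL, aggregate functions can't be used in WHERE clauses, because WHERE filters individual rows before they are grouped. To filter on an aggregate, use HAVING, which is applied after GROUP BY. For example, get the years in which more than 10 films were released: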


In [107]:
%%sql 
SELECT release_year,count(*)
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10
limit 5;

 * sqlite:///imdb.db
Done.


release_year,count(*)
,84
1962.0,16
1963.0,16
1964.0,20
1965.0,14
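
The difference between WHERE and HAVING can be demonstrated directly. This is a sketch on a hypothetical table: SQLite rejects the aggregate in WHERE, while the same condition in HAVING filters the groups.

```python
import sqlite3

# Hypothetical films: one from 1962, three from 1964.
conn = sqlite3.connect(":memory:")
conn.execute("create table films (title text, release_year integer)")
conn.executemany(
    "insert into films values (?, ?)",
    [("A", 1962), ("B", 1964), ("C", 1964), ("D", 1964)],
)

# An aggregate in WHERE is an error, because WHERE runs before grouping.
try:
    conn.execute(
        "select release_year from films where count(*) > 2 group by release_year"
    )
    where_failed = False
except sqlite3.Error:
    where_failed = True

# HAVING filters the groups after the aggregate has been computed.
rows = conn.execute(
    "select release_year, count(*) from films "
    "group by release_year having count(*) > 2"
).fetchall()
print(where_failed, rows)
```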


Get the release year, budget and gross earnings for each film in the films table.

In [108]:
%sql select release_year, budget, gross from films limit 5;

 * sqlite:///imdb.db
Done.


release_year,budget,gross
1916,385907.0,
1920,100000.0,3000000.0
1925,245000.0,
1927,6000000.0,26435.0
1929,,9950.0


Modify your query so that only records with a release_year after 1990 are included.

In [109]:
%sql select release_year, budget, gross from films where release_year > 1990 limit 5;

 * sqlite:///imdb.db
Done.


release_year,budget,gross
1991,6000000,869325
1991,20000000,38037513
1991,6000000,57504069
1991,35000000,79100000
1991,15000000,30102717


Remove the budget and gross columns, and group your results by release year.

In [110]:
%sql select release_year from films where release_year > 1990 group by release_year limit 5;

 * sqlite:///imdb.db
Done.


release_year
1991
1992
1993
1994
1995


Modify your query to include the average budget and average gross earnings for the results you have so far. Alias the average budget as avg_budget and the average gross earnings as avg_gross.

In [111]:
%%sql
select release_year, avg(budget) as avg_budget, avg(gross) as avg_gross from films
where release_year > 1990 group by release_year limit 5;

 * sqlite:///imdb.db
Done.


release_year,avg_budget,avg_gross
1991,25176548.387096774,53844501.66666666
1992,25982030.303030305,63665195.14705882
1993,20729787.23404255,45302091.41304348
1994,29013773.58490566,59395666.16981132
1995,32775000.0,44909519.98550725


Modify your query so that only years with an average budget greater than $60 million are included.

In [112]:
%%sql
select release_year, avg(budget) as avg_budget, avg(gross) as avg_gross from films
where release_year > 1990 group by release_year having avg(budget) > 60000000 limit 5;

 * sqlite:///imdb.db
Done.


release_year,avg_budget,avg_gross
2005,70323938.23152709,41159143.29064039
2006,93968929.5774648,39237855.9537037
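
The final query combines three steps in a fixed order: WHERE filters rows, GROUP BY forms groups, and HAVING filters the groups. A minimal sketch on a hypothetical table with small invented budgets makes each step easy to follow:

```python
import sqlite3

# Hypothetical rows: (release_year, budget), with invented budgets.
conn = sqlite3.connect(":memory:")
conn.execute("create table films (release_year integer, budget integer)")
conn.executemany(
    "insert into films values (?, ?)",
    [(1985, 100), (1995, 50), (1995, 70), (2005, 10), (2005, 20)],
)

# 1. WHERE drops the 1985 row (even though its budget is high).
# 2. GROUP BY forms the 1995 and 2005 groups.
# 3. HAVING keeps only groups whose average budget exceeds 40.
result = conn.execute(
    "select release_year, avg(budget) as avg_budget from films "
    "where release_year > 1990 "
    "group by release_year "
    "having avg(budget) > 40 "
    "order by release_year"
).fetchall()
print(result)
```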
