
# Has Many Movie Lab

### Introduction
In this lab we will continue to look at the "Has-Many" relationships in our data. The database we will use contains information about a selection of movies and related entities: actors, directors, and writers. A movie has relationships with actor, director, and writer entities, and these entities are in turn related to one another through the movies they share (e.g., a director will have worked with many actors). In the problems below, we will use our knowledge of these relationships to build SQL queries.
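These relationships are many-to-many, and they are stored through join tables such as `movie_actors`, which hold one row per (movie, person) credit. A minimal sketch of that layout, assuming only the columns reported by the `PRAGMA table_info` calls later in this lab and using a few made-up rows in an in-memory database, looks like this:

```python
import sqlite3

# In-memory database whose table and column names mirror the lab's schema;
# the rows are made-up sample data, not taken from the lab's CSV files.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE movies (id INTEGER, title TEXT, year INTEGER);
CREATE TABLE actors (id INTEGER, name TEXT);
-- Join table: one row per (movie, actor) credit.
CREATE TABLE movie_actors (id INTEGER, movie_id INTEGER, actor_id INTEGER);

INSERT INTO movies VALUES (1, 'Movie A', 2001), (2, 'Movie B', 2002);
INSERT INTO actors VALUES (10, 'Actor X'), (11, 'Actor Y');
INSERT INTO movie_actors VALUES
    (100, 1, 10),  -- Actor X in Movie A
    (101, 1, 11),  -- Actor Y in Movie A
    (102, 2, 10);  -- Actor X in Movie B
""")

# A movie has many actors ...
cast_of_a = [row[0] for row in demo_conn.execute("""
    SELECT a.name FROM actors AS a
    JOIN movie_actors AS ma ON ma.actor_id = a.id
    WHERE ma.movie_id = 1
    ORDER BY a.name""")]

# ... and an actor has many movies, through the same join table.
films_of_x = [row[0] for row in demo_conn.execute("""
    SELECT m.title FROM movies AS m
    JOIN movie_actors AS ma ON ma.movie_id = m.id
    WHERE ma.actor_id = 10
    ORDER BY m.title""")]

print(cast_of_a)   # ['Actor X', 'Actor Y']
print(films_of_x)  # ['Movie A', 'Movie B']
```

The `demo_` prefix keeps these names from clobbering the lab's own `conn` and `cursor`.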

Let's begin by connecting to the database and reviewing the schema of the tables.

In [1]:
import sqlite3
conn = sqlite3.connect('movie_films_actors.db')
cursor = conn.cursor()

In [2]:
import pandas as pd
root_url = "https://raw.githubusercontent.com/jigsawlabs-student/curriculum-images/main/has-many-movies-lab/"
names = ['actors', 'directors', 'movies', 'writers', 'movie_actors', 'movie_directors', 'movie_writers']
loaded_dfs = [pd.read_csv(f'{root_url}{name}.csv') for name in names]

In [3]:
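# Write each DataFrame to a table of the same name; if_exists='replace'
# lets this cell be re-run without a "table already exists" error.
for name, df in zip(names, loaded_dfs):
    df.to_sql(name, conn, index=False, if_exists='replace')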

In [4]:
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
cursor.fetchall()

[('actors',),
 ('directors',),
 ('movies',),
 ('writers',),
 ('movie_actors',),
 ('movie_directors',),
 ('movie_writers',)]

In [5]:
cursor.execute('PRAGMA table_info(movies)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'title', 'TEXT', 0, None, 0),
 (2, 'studio', 'TEXT', 0, None, 0),
 (3, 'runtime', 'REAL', 0, None, 0),
 (4, 'description', 'TEXT', 0, None, 0),
 (5, 'release_date', 'TEXT', 0, None, 0),
 (6, 'year', 'INTEGER', 0, None, 0)]

In [6]:
cursor.execute('PRAGMA table_info(actors)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0), (1, 'name', 'TEXT', 0, None, 0)]

In [7]:
cursor.execute('PRAGMA table_info(directors)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0), (1, 'name', 'TEXT', 0, None, 0)]

In [8]:
cursor.execute('PRAGMA table_info(writers)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0), (1, 'name', 'TEXT', 0, None, 0)]

In [9]:
cursor.execute('PRAGMA table_info(movie_actors)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'movie_id', 'INTEGER', 0, None, 0),
 (2, 'actor_id', 'INTEGER', 0, None, 0)]

In [10]:
cursor.execute('PRAGMA table_info(movie_directors)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'movie_id', 'INTEGER', 0, None, 0),
 (2, 'director_id', 'INTEGER', 0, None, 0)]

In [11]:
cursor.execute('PRAGMA table_info(movie_writers)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'movie_id', 'INTEGER', 0, None, 0),
 (2, 'writer_id', 'INTEGER', 0, None, 0)]
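The seven `PRAGMA table_info` cells above can be collapsed into one loop over `sqlite_master`. Below is a sketch; `describe_tables` is a hypothetical helper name, and it is run here against a small in-memory database holding two of the lab's tables so that it works on its own (pass the lab's `conn` to inspect the real schema).

```python
import sqlite3

def describe_tables(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for every table."""
    cur = conn.cursor()
    tables = [row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA arguments cannot be bound with '?', so the name is interpolated;
        # it comes from sqlite_master, not from user input.
        info = cur.execute(f"PRAGMA table_info({table})").fetchall()
        # Each info row is (cid, name, type, notnull, dflt_value, pk).
        schema[table] = [(col[1], col[2]) for col in info]
    return schema

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE actors (id INTEGER, name TEXT);
CREATE TABLE movie_actors (id INTEGER, movie_id INTEGER, actor_id INTEGER);
""")
for table, columns in describe_tables(demo_conn).items():
    print(table, columns)
```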

Let's start off with some basic one table queries:

* What is the title, length, and id of the movie with the longest runtime?

In [12]:
cursor.execute('''SELECT title, runtime, id FROM movies
ORDER BY runtime DESC
LIMIT 1''')
cursor.fetchall()

# [('Never Sleep Again: The Elm Street Legacy', 480.0, 11415)]

[('Never Sleep Again: The Elm Street Legacy', 480.0, 11415)]

* Using your answer from the previous question, how many actors were credited for the movie with the longest runtime? Hint: use the COUNT function with the movie ID.

In [13]:
cursor.execute('''SELECT count(a.movie_id) FROM movies
JOIN movie_actors as a ON movies.id = a.movie_id
WHERE a.movie_id = 11415''')
cursor.fetchall()

# [(6,)]

[(6,)]
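Because `movie_actors` already holds one row per credit, the count can also be taken from the join table alone. The sketch below does that and passes the movie ID as a `?` parameter instead of writing it into the SQL text; `count_credited_actors` is a hypothetical helper and the rows are made up.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE movie_actors (id INTEGER, movie_id INTEGER, actor_id INTEGER);
-- Made-up credits: movie 7 has three actors, movie 8 has one.
INSERT INTO movie_actors VALUES (1, 7, 100), (2, 7, 101), (3, 7, 102), (4, 8, 100);
""")

def count_credited_actors(conn, movie_id):
    # The '?' placeholder lets sqlite3 bind the value safely instead of
    # formatting it into the SQL string.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM movie_actors WHERE movie_id = ?", (movie_id,)
    ).fetchone()
    return count

print(count_credited_actors(demo_conn, 7))  # 3
```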

* What was the shortest movie released in 2006?

In [14]:
cursor.execute('''SELECT title FROM movies
WHERE year = 2006
ORDER BY runtime ASC
LIMIT 1 ''')
cursor.fetchall()

# [('The Guardian',)]

[('The Guardian',)]
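One caveat with `ORDER BY runtime ASC`: SQLite treats NULL as smaller than any other value, so a 2006 movie with a missing runtime would be returned as the "shortest". Whether the lab's data contains such rows isn't shown here; the sketch below uses made-up rows to show the effect and an `IS NOT NULL` guard.

```python
import sqlite3

# Made-up rows: one 2006 movie has no recorded runtime.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE movies (id INTEGER, title TEXT, runtime REAL, year INTEGER);
INSERT INTO movies VALUES
    (1, 'Long Film', 150.0, 2006),
    (2, 'Short Film', 85.0, 2006),
    (3, 'Unknown Runtime', NULL, 2006);
""")

# NULL sorts first under ASC, so the movie without a runtime wins.
naive = demo_conn.execute("""SELECT title FROM movies
    WHERE year = 2006
    ORDER BY runtime ASC
    LIMIT 1""").fetchone()

# Excluding missing runtimes returns the shortest movie with a known runtime.
guarded = demo_conn.execute("""SELECT title FROM movies
    WHERE year = 2006 AND runtime IS NOT NULL
    ORDER BY runtime ASC
    LIMIT 1""").fetchone()

print(naive)    # ('Unknown Runtime',)
print(guarded)  # ('Short Film',)
```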

### Has Many

* What are the names of the actors in Toy Story?

In [32]:
cursor.execute('''SELECT ac.name FROM movies AS m
JOIN movie_actors AS ma ON ma.movie_id = m.id
JOIN actors AS ac on ma.actor_id = ac.id
WHERE m.title = ?''', ('Toy Story',))
cursor.fetchall()

# [('Tom Hanks',),
#  ('Jim Varney',),
#  ('Wallace Shawn',),
#  ('Don Rickles',),
#  ('John Ratzenberger',),
#  ('Tim Allen',)]

[('Tom Hanks',),
 ('Tim Allen',),
 ('Jim Varney',),
 ('Wallace Shawn',),
 ('Don Rickles',),
 ('John Ratzenberger',)]

* What is the name of the director of Toy Story?

In [33]:
cursor.execute('''SELECT d.name FROM movies AS m
JOIN movie_directors AS md ON md.movie_id = m.id
JOIN directors AS d ON md.director_id = d.id
WHERE m.title = ?''', ('Toy Story',))
cursor.fetchall()

# [('John Lasseter',)]

[('John Lasseter',)]

* What are the names of the writers of Toy Story?

In [35]:
cursor.execute('''select w.name from movies as m
join movie_writers as mw on mw.movie_id = m.id
join writers as w on mw.writer_id = w.id
where m.title = ?''', ('Toy Story',))
cursor.fetchall()


# [('Joss Whedon',), ('Joel Cohen',), ('Andrew Stanton',), ('Alec Sokolow',)]

[('Joss Whedon',), ('Joel Cohen',), ('Andrew Stanton',), ('Alec Sokolow',)]
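The three Toy Story queries share one shape: movie, then join table, then person table. Below is a sketch of a helper that chooses the tables by role. `ROLE_TABLES` and `people_for_title` are hypothetical names, and the table names come from a fixed dictionary (never from user input) because SQL identifiers cannot be bound with `?`. The sample rows are made up.

```python
import sqlite3

# Hypothetical mapping from role to (join table, foreign-key column, person table),
# mirroring the lab's table names.
ROLE_TABLES = {
    "actor": ("movie_actors", "actor_id", "actors"),
    "director": ("movie_directors", "director_id", "directors"),
    "writer": ("movie_writers", "writer_id", "writers"),
}

def people_for_title(conn, title, role):
    join_table, fk, person_table = ROLE_TABLES[role]  # KeyError for an unknown role
    sql = f"""SELECT p.name FROM movies AS m
              JOIN {join_table} AS j ON j.movie_id = m.id
              JOIN {person_table} AS p ON j.{fk} = p.id
              WHERE m.title = ?
              ORDER BY p.name"""
    return [row[0] for row in conn.execute(sql, (title,))]

# Made-up sample data so the sketch runs on its own.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE movies (id INTEGER, title TEXT);
CREATE TABLE directors (id INTEGER, name TEXT);
CREATE TABLE movie_directors (id INTEGER, movie_id INTEGER, director_id INTEGER);
INSERT INTO movies VALUES (1, 'Sample Movie');
INSERT INTO directors VALUES (5, 'Director B'), (6, 'Director A');
INSERT INTO movie_directors VALUES (1, 1, 5), (2, 1, 6);
""")
print(people_for_title(demo_conn, "Sample Movie", "director"))  # ['Director A', 'Director B']
```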

* What is the name and actor id of the actor with the most credits in the database?

In [40]:
cursor.execute('''select a.name, a.id, count(ma.id) from movie_actors as ma
join actors as a on ma.actor_id = a.id
group by a.id, a.name
order by 3 desc
limit 1
''')
cursor.fetchall()

# [('Robert De Niro', 429, 78)]

[('Robert De Niro', 429, 78)]
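Grouping by the actor's `id` rather than by name matters whenever two different people share a name: grouping by name alone merges their credits. The rows below are invented to show the difference.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE actors (id INTEGER, name TEXT);
CREATE TABLE movie_actors (id INTEGER, movie_id INTEGER, actor_id INTEGER);
-- Two different (made-up) actors who happen to share a name.
INSERT INTO actors VALUES (1, 'Sam Lee'), (2, 'Sam Lee'), (3, 'Ana Ruiz');
INSERT INTO movie_actors VALUES
    (10, 100, 1), (11, 101, 1),                -- Sam Lee #1: 2 credits
    (12, 102, 2), (13, 103, 2),                -- Sam Lee #2: 2 credits
    (14, 100, 3), (15, 101, 3), (16, 104, 3);  -- Ana Ruiz: 3 credits
""")

# Grouping by name adds both Sam Lees together.
by_name = demo_conn.execute("""
    SELECT a.name, COUNT(ma.id) FROM movie_actors AS ma
    JOIN actors AS a ON ma.actor_id = a.id
    GROUP BY a.name ORDER BY 2 DESC LIMIT 1""").fetchone()

# Grouping by id keeps each person separate.
by_id = demo_conn.execute("""
    SELECT a.name, a.id, COUNT(ma.id) FROM movie_actors AS ma
    JOIN actors AS a ON ma.actor_id = a.id
    GROUP BY a.id, a.name ORDER BY 3 DESC LIMIT 1""").fetchone()

print(by_name)  # ('Sam Lee', 4)
print(by_id)    # ('Ana Ruiz', 3, 3)
```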

* What are the titles of the movies the actor from the previous question has been in, after the year 2005?

In [42]:
cursor.execute('''select m.title from movies as m
join movie_actors as ma on ma.movie_id = m.id
join actors as a on ma.actor_id = a.id
where a.name = 'Robert De Niro' and year > 2005''')
cursor.fetchall()

# [("New Year's Eve",),
#  ('Mr. Warmth: The Don Rickles Project',),
#  ('Hands of Stone',),
#  ('Last Vegas',),
#  ('I Knew It Was You: Rediscovering John Cazale',),
#  ('Stardust',),
#  ('Killer Elite',),
#  ("Everybody's Fine",),
#  ('Stone',),
#  ('Machete',),
#  ('Red Lights',),
#  ('Righteous Kill',),
#  ('The Good Shepherd',),
#  ('The Bag Man',),
#  ('Being Flynn',),
#  ('Joy',),
#  ('The Wizard of Lies',),
#  ('Limitless',),
#  ('Killing Season',),
#  ('The Family',),
#  ('Heist',),
#  ('Great Expectations',),
#  ('Little Fockers',),
#  ('What Just Happened?',),
#  ('The Comedian',),
#  ('The Big Wedding',),
#  ('Dirty Grandpa',),
#  ('Grudge Match',)]

[('Stardust',),
 ('Heist',),
 ('Hands of Stone',),
 ('Killer Elite',),
 ('Machete',),
 ('The Family',),
 ('Great Expectations',),
 ('Little Fockers',),
 ('What Just Happened?',),
 ('The Comedian',),
 ('The Big Wedding',),
 ('Dirty Grandpa',),
 ('Grudge Match',),
 ("New Year's Eve",),
 ('Mr. Warmth: The Don Rickles Project',),
 ('Last Vegas',),
 ('I Knew It Was You: Rediscovering John Cazale',),
 ("Everybody's Fine",),
 ('Stone',),
 ('Red Lights',),
 ('Righteous Kill',),
 ('The Good Shepherd',),
 ('The Bag Man',),
 ('Being Flynn',),
 ('Joy',),
 ('The Wizard of Lies',),
 ('Limitless',),
 ('Killing Season',)]

* What are the titles of movies with more than two directors -- order by title ascending and limit to the first five results

In [43]:
cursor.execute('''select m.title from movies as m
join movie_directors as md on md.movie_id = m.id
group by m.title
having count(md.director_id) > 2
order by 1 asc
limit 5''')
cursor.fetchall()

# [('101 Dalmatians',),
#  ('11/8/2016',),
#  ('A Crude Awakening: The Oil Crash',),
#  ('A Farewell To Arms',),
#  ("A Liar's Autobiography - The Untrue Story of Monty Python's Graham Chapman",)]

[('101 Dalmatians',),
 ('11/8/2016',),
 ('A Crude Awakening: The Oil Crash',),
 ('A Farewell To Arms',),
 ("A Liar's Autobiography - The Untrue Story of Monty Python's Graham Chapman",)]
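`WHERE` filters individual rows before grouping, while `HAVING` filters whole groups after `COUNT` has been computed, which is why the director-count condition above sits in `HAVING`. A small made-up example, which also groups by the movie `id` so that two films sharing a title would be counted separately:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE movies (id INTEGER, title TEXT);
CREATE TABLE movie_directors (id INTEGER, movie_id INTEGER, director_id INTEGER);
INSERT INTO movies VALUES (1, 'Anthology'), (2, 'Solo Work'), (3, 'Co-directed');
INSERT INTO movie_directors VALUES
    (1, 1, 10), (2, 1, 11), (3, 1, 12),   -- three directors
    (4, 2, 10),                           -- one director
    (5, 3, 11), (6, 3, 12);               -- two directors
""")

rows = demo_conn.execute("""
    SELECT m.title, COUNT(md.director_id) AS n_directors
    FROM movies AS m
    JOIN movie_directors AS md ON md.movie_id = m.id
    GROUP BY m.id, m.title
    HAVING COUNT(md.director_id) > 2   -- applied to each group, after counting
    ORDER BY m.title""").fetchall()
print(rows)  # [('Anthology', 3)]
```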

### Has Many Through

* What is the name of the writer in the database that has been credited the most times during the year 2018?

In [48]:
cursor.execute('''select w.name, count(mw.id) from writers as w
join movie_writers as mw on mw.writer_id = w.id
join movies as m on m.id = mw.movie_id
where m.year = 2018
group by w.name
order by 2 desc
limit 1''')
cursor.fetchall()

# [('Ryan Engle', 3)]

[('Ryan Engle', 3)]

* What is the name of the actor or actress in the database that has been credited the most between 2010 and 2015 (inclusive)?

In [59]:
cursor.execute('''select a.name, count(ma.id) from actors as a
join movie_actors as ma on ma.actor_id = a.id
join movies as m on ma.movie_id = m.id
where year >= 2010 and year <= 2015
group by 1
order by 2 desc, 1
limit 2''')
cursor.fetchall()

# [('James Franco', 22)]

[('James Franco', 22), ('Liam Neeson', 22)]
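The `LIMIT 2` above reveals a tie at 22 credits. Rather than guessing a limit, one way to return every actor tied for the top count is to compare each group with the maximum count, here via a CTE named `credits` (a name chosen for this sketch). The rows are made up, and the lab's `year` filter is left out for brevity; it would go inside the CTE.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE actors (id INTEGER, name TEXT);
CREATE TABLE movie_actors (id INTEGER, movie_id INTEGER, actor_id INTEGER);
INSERT INTO actors VALUES (1, 'Actor A'), (2, 'Actor B'), (3, 'Actor C');
INSERT INTO movie_actors VALUES
    (1, 100, 1), (2, 101, 1),   -- Actor A: 2 credits
    (3, 100, 2), (4, 102, 2),   -- Actor B: 2 credits
    (5, 103, 3);                -- Actor C: 1 credit
""")

top = demo_conn.execute("""
    WITH credits AS (
        SELECT a.id, a.name, COUNT(ma.id) AS n
        FROM actors AS a
        JOIN movie_actors AS ma ON ma.actor_id = a.id
        GROUP BY a.id, a.name
    )
    SELECT name, n FROM credits
    WHERE n = (SELECT MAX(n) FROM credits)   -- keep every group tied for first
    ORDER BY name""").fetchall()
print(top)  # [('Actor A', 2), ('Actor B', 2)]
```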

* What are the names of all actors who performed in more than 3 movies in 2010?

In [62]:
cursor.execute('''select a.name, count(ma.id) from actors as a
join movie_actors as ma on ma.actor_id = a.id
join movies as m on ma.movie_id = m.id
where year = 2010
group by 1
having count(ma.id) > 3
''')
cursor.fetchall()

# [('Aaron Taylor-Johnson',),
#  ('Adam Scott',),
#  ('Barry Pepper',),
#  ('Ben Stiller',),
#  ('Danny Huston',),
#  ('Gemma Arterton',),
#  ('Helen Mirren',),
#  ('Jay Baruchel',),
#  ('Jessica Alba',),
#  ('Jonah Hill',),
#  ('Josh Brolin',),
#  ('Josh Duhamel',),
#  ('Keith David',),
#  ('Liam Neeson',),
#  ('Matt Damon',),
#  ('Melissa Leo',),
#  ('Patricia Clarkson',),
#  ('Pierce Brosnan',),
#  ('Ralph Fiennes',),
#  ('Susan Sarandon',),
#  ('Zach Galifianakis',)]

[('Aaron Taylor-Johnson', 4),
 ('Adam Scott', 5),
 ('Barry Pepper', 4),
 ('Ben Stiller', 4),
 ('Danny Huston', 4),
 ('Gemma Arterton', 4),
 ('Helen Mirren', 4),
 ('Jay Baruchel', 4),
 ('Jessica Alba', 4),
 ('Jonah Hill', 4),
 ('Josh Brolin', 5),
 ('Josh Duhamel', 4),
 ('Keith David', 4),
 ('Liam Neeson', 7),
 ('Matt Damon', 4),
 ('Melissa Leo', 5),
 ('Patricia Clarkson', 4),
 ('Pierce Brosnan', 5),
 ('Ralph Fiennes', 5),
 ('Susan Sarandon', 5),
 ('Zach Galifianakis', 5)]

* What studio has Steven Spielberg worked with the most?

In [65]:
cursor.execute('''select m.studio, count(md.id) from movies as m
join movie_directors as md on md.movie_id = m.id
join directors as d on md.director_id = d.id
where d.name = 'Steven Spielberg'
group by 1
order by 2 desc
limit 1''')
cursor.fetchall()

# [('Universal Pictures', 7)]

[('Universal Pictures', 7)]

* What years did Steven Spielberg direct 2 movies?

In [68]:
cursor.execute('''select m.year, count(md.id) from movies as m
join movie_directors as md on md.movie_id = m.id
join directors as d on md.director_id = d.id
where d.name = 'Steven Spielberg'
group by 1
having count(md.id) = 2
''')
cursor.fetchall()

# [(1989, 2), (1993, 2), (1997, 2), (2002, 2),
# (2005, 2), (2011, 2), (2018, 2)]

[(1989, 2), (1993, 2), (1997, 2), (2002, 2), (2005, 2), (2011, 2), (2018, 2)]
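The results so far come back as plain tuples, so columns are read by position. Setting the connection's `row_factory` to `sqlite3.Row` allows access by column name (including aliases such as `n_movies`) as well. A sketch on made-up data:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.row_factory = sqlite3.Row   # rows support row["column"] as well as row[0]
demo_conn.executescript("""
CREATE TABLE movies (id INTEGER, title TEXT, year INTEGER);
INSERT INTO movies VALUES (1, 'Film One', 1993), (2, 'Film Two', 1993), (3, 'Film Three', 1997);
""")

rows = demo_conn.execute("""
    SELECT year, COUNT(id) AS n_movies FROM movies
    GROUP BY year
    HAVING COUNT(id) = 2""").fetchall()
for row in rows:
    print(row["year"], row["n_movies"])   # 1993 2
```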

* How many movies has each of the actors from Toy Story been in? (movie ID is 3648)

In [76]:
cursor.execute('''select a.name, count(ma.id) from movies as m
join movie_actors as ma on ma.movie_id = m.id
join actors as a on a.id = ma.actor_id
where a.id in (select actor_id from movie_actors where movie_id = 3648)
group by 1
order by 1
''')
cursor.fetchall()

# [('Tom Hanks', 46),
#  ('Jim Varney', 8),
#  ('Wallace Shawn', 27),
#  ('Don Rickles', 10),
#  ('John Ratzenberger', 7),
#  ('Tim Allen', 20)]

[('Don Rickles', 10),
 ('Jim Varney', 8),
 ('John Ratzenberger', 7),
 ('Tim Allen', 20),
 ('Tom Hanks', 46),
 ('Wallace Shawn', 27)]
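The `IN (subquery)` above first collects the Toy Story actor IDs and then counts every credit for those actors. The same idea can be written as a self-join of `movie_actors`: one alias (`cast_row`) picks out the cast of the target movie, and a second alias (`credit`) gathers all credits of those actors. A sketch on invented rows, with the movie ID passed as a parameter:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE actors (id INTEGER, name TEXT);
CREATE TABLE movie_actors (id INTEGER, movie_id INTEGER, actor_id INTEGER);
INSERT INTO actors VALUES (1, 'Actor A'), (2, 'Actor B'), (3, 'Actor C');
INSERT INTO movie_actors VALUES
    (1, 50, 1), (2, 50, 2),   -- the cast of movie 50
    (3, 51, 1), (4, 52, 1),   -- Actor A's other credits
    (5, 51, 3);               -- Actor C is not in movie 50
""")

rows = demo_conn.execute("""
    SELECT a.name, COUNT(credit.id) AS n_movies
    FROM movie_actors AS cast_row                -- cast of the target movie
    JOIN actors AS a ON a.id = cast_row.actor_id
    JOIN movie_actors AS credit ON credit.actor_id = cast_row.actor_id
    WHERE cast_row.movie_id = ?
    GROUP BY a.id, a.name
    ORDER BY a.name""", (50,)).fetchall()
print(rows)  # [('Actor A', 3), ('Actor B', 1)]
```

One difference: if an actor were listed twice for the target movie, the join version would count that actor's credits twice, whereas `IN` would not.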

* What are the names of other movies the director of Toy Story directed?

In [77]:
cursor.execute('''select m.title from movies as m
join movie_directors as md on md.movie_id = m.id
join directors as d on d.id = md.director_id
where d.id in (select md.director_id from movie_directors as md where md.movie_id = 3648)
group by 1
''')

cursor.fetchall()

# [('Cars 2',), ('Cars',), ("A Bug's Life",), ('Toy Story 2',), ('Toy Story',)]

[("A Bug's Life",), ('Cars',), ('Cars 2',), ('Toy Story',), ('Toy Story 2',)]
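The result above includes Toy Story itself. Since the question asks for the director's other movies, one option is to exclude the starting movie by ID. This sketch also matches directors with `IN`, which keeps working if the starting movie has more than one director; `other_movies_by_same_directors` is a hypothetical helper and the data is made up.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE movies (id INTEGER, title TEXT);
CREATE TABLE movie_directors (id INTEGER, movie_id INTEGER, director_id INTEGER);
INSERT INTO movies VALUES (1, 'First Film'), (2, 'Second Film'), (3, 'Unrelated Film');
INSERT INTO movie_directors VALUES (1, 1, 7), (2, 2, 7), (3, 3, 8);
""")

def other_movies_by_same_directors(conn, movie_id):
    sql = """SELECT DISTINCT m.title FROM movies AS m
             JOIN movie_directors AS md ON md.movie_id = m.id
             WHERE md.director_id IN (SELECT director_id FROM movie_directors
                                      WHERE movie_id = ?)
               AND m.id != ?          -- drop the starting movie itself
             ORDER BY m.title"""
    return [row[0] for row in conn.execute(sql, (movie_id, movie_id))]

print(other_movies_by_same_directors(demo_conn, 1))  # ['Second Film']
```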

* What are the names of all the directors Tom Hanks has worked with? (Actor ID 189) -- order by the director's name ascending and limit to the first five results.

In [79]:
cursor.execute('''select d.name from directors as d
where d.id in (select d.id from directors d
                join movie_directors as md on md.director_id = d.id
                join movie_actors as ma on md.movie_id = ma.movie_id
                where ma.actor_id = 189)
order by 1
limit 5
''')

cursor.fetchall()

# [('Alexander Mackendrick',),
#  ('Angus MacLane',),
#  ('Brian DePalma',),
#  ('Chris Paine',),
#  ('Clint Eastwood',)]

[('Alexander Mackendrick',),
 ('Angus MacLane',),
 ('Brian DePalma',),
 ('Chris Paine',),
 ('Clint Eastwood',)]
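The "has many through" path here runs from an actor through `movie_actors` and `movie_directors` (linked on `movie_id`) to `directors`. The `IN` subquery above can also be written as a straight join chain with `DISTINCT`, so a director who worked with the actor several times appears once. A sketch on invented rows:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE directors (id INTEGER, name TEXT);
CREATE TABLE movie_actors (id INTEGER, movie_id INTEGER, actor_id INTEGER);
CREATE TABLE movie_directors (id INTEGER, movie_id INTEGER, director_id INTEGER);
INSERT INTO directors VALUES (1, 'Director Z'), (2, 'Director Y'), (3, 'Director X');
INSERT INTO movie_actors VALUES (1, 100, 42), (2, 101, 42), (3, 102, 99);
INSERT INTO movie_directors VALUES (1, 100, 1), (2, 101, 1), (3, 101, 2), (4, 102, 3);
""")

names = [row[0] for row in demo_conn.execute("""
    SELECT DISTINCT d.name
    FROM movie_actors AS ma
    JOIN movie_directors AS md ON md.movie_id = ma.movie_id   -- same movie
    JOIN directors AS d ON d.id = md.director_id
    WHERE ma.actor_id = ?
    ORDER BY d.name""", (42,))]
print(names)  # ['Director Y', 'Director Z']
```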

* What is the name of the director Tom Hanks has worked with the most?

In [90]:
cursor.execute('''select d.id, count(md.id) from directors d
                join movie_directors as md on md.director_id = d.id
                join movie_actors as ma on md.movie_id = ma.movie_id
                where ma.actor_id = 189
                group by 1

''')
cursor.fetchall()

[(9, 1),
 (23, 5),
 (45, 1),
 (98, 3),
 (141, 1),
 (262, 1),
 (378, 5),
 (511, 1),
 (514, 1),
 (535, 1),
 (540, 1),
 (561, 1),
 (622, 1),
 (652, 1),
 (725, 1),
 (774, 2),
 (911, 2),
 (1030, 1),
 (1095, 1),
 (1322, 1),
 (1535, 1),
 (1580, 1),
 (1751, 1),
 (2428, 1),
 (2771, 1),
 (3299, 1),
 (3475, 2),
 (3605, 1),
 (3766, 1),
 (3971, 1),
 (4202, 2),
 (4814, 1),
 (5259, 1),
 (5523, 1),
 (5729, 1),
 (6370, 1),
 (6485, 1)]

In [94]:
cursor.execute('''select dl.name, dl.cd from
                (select d.name, count(md.id) as cd from directors d
                join movie_directors as md on md.director_id = d.id
                join movie_actors as ma on md.movie_id = ma.movie_id
                where ma.actor_id = 189
                group by 1
                ) dl
order by 2 desc
limit 3
''')

cursor.fetchall()

# [('Steven Spielberg', 5)]

[('Ron Howard', 5), ('Steven Spielberg', 5), ('Robert Zemeckis', 3)]

* What are the names of all the writers Tom Hanks has worked with?

In [102]:
cursor.execute('''select distinct w.name from (select movie_id from movie_actors as ma
                where ma.actor_id = 189) as mov
                join movie_writers as mw on mov.movie_id = mw.movie_id
                join writers as w on w.id = mw.writer_id
''')

cursor.fetchall()

# [('Eric Roth',),
#  ('Nia Vardalos',),
#  ('Tom Hanks',),
#  ('Gary Ross',),
#  ('Anne Spielberg',),
#  ('Chris Paine',),
#  ('Scott Frank',),
#  ('Robert Rodat',),
#  ('Frank Darabont',),
#  ('Tom Tykwer',)]

[('Eric Roth',),
 ('Nia Vardalos',),
 ('Tom Hanks',),
 ('Gary Ross',),
 ('Anne Spielberg',),
 ('Chris Paine',),
 ('Scott Frank',),
 ('Robert Rodat',),
 ('Frank Darabont',),
 ('Tom Tykwer',),
 ('Max Allan Collins',),
 ('David Self',),
 ('Richard Piers Rayner',),
 ('Steve Purcell (II)',),
 ('Jeff Nathanson',),
 ('Sacha Gervasi',),
 ('Nora Ephron',),
 ('Delia Ephron',),
 ('Miklós László',),
 ('Billy Ray',),
 ('David Koepp',),
 ('Akiva Goldsman',),
 ('Dave Eggers',),
 ('James Ponsoldt',),
 ('Joel Coen',),
 ('Ethan Coen',),
 ('Matt Charman',),
 ('Matthew Charman',),
 ('David Seltzer',),
 ('Erik Jendresen',),
 ('Josh Singer',),
 ('Liz Hannah',),
 ('Lilly Wachowski',),
 ('Lana Wachowski',),
 ('Lowell Ganz',),
 ('Bruce Jay Friedman',),
 ('Babaloo Mandel',),
 ('Brian Grazer',),
 ('William Broyles',),
 ('Dario Argento',),
 ('Andrew Stanton',),
 ('Todd Komarnicki',),
 ('John Lasseter',),
 ('Lee Unkrich',),
 ('Michael Arndt',),
 ('Robert Zemeckis',),
 ('Joss Whedon',),
 ('Joel Cohen',),
 ('Alec Sokol

### Conclusion
The movie database we queried during this lab had many relationships between its tables. We saw how to use JOIN to connect those tables in order to query information about entities stored in different tables.