# More JOIN operations

![rel](https://sqlzoo.net/w/images/5/50/Movie2-er.png)

In [1]:
import os
import pandas as pd
import findspark
os.environ['SPARK_HOME'] =  '/opt/spark'
findspark.init()

from pyspark.sql import SparkSession
ss = (SparkSession.builder.appName('app00')
      .config('spark.sql.warehouse.dir', 'hdfs://quickstart.cloudera:8020/user/hive/warehouse')
      .config('hive.metastore.uris', 'thrift://quickstart.cloudera:9083')
      .enableHiveSupport().getOrCreate())

 ····


## 1. 1962 movies

List the films where the **yr** is 1962 [Show **id, title**]

In [2]:
movie = pd.read_sql_table('movie', engine)
actor = pd.read_sql_table('actor', engine)
casting = pd.read_sql_table('casting', engine)

In [3]:
movie.loc[movie['yr']==1962, ['id', 'title']]

|  | id | title |
|---|---|---|
| 211 | 10212 | A Kind of Loving |
| 328 | 10329 | A Symposium on Popular Songs |
| 346 | 10347 | A Very Private Affair (Vie Privée) |
| 647 | 10648 | An Autumn Afternoon |
| 867 | 10868 | Atraco a las tres |
| 1005 | 11006 | Barabbas |
| 1052 | 11053 | Battle Beyond the Sun (Небо зовет) |
| 1198 | 11199 | Big and Little Wong Tin Bar |
| 1229 | 11230 | Billy Budd |
| 1233 | 11234 | Billy Rose's Jumbo |


## 2. When was Citizen Kane released?

Give the release year of 'Citizen Kane'.

In [4]:
movie.loc[movie['title']=='Citizen Kane', ['yr']]

|  | yr |
|---|---|
| 1954 | 1941 |


## 3. Star Trek movies

List all of the Star Trek movies, include the **id**, **title** and **yr** (all of these movies include the words Star Trek in the title). Order results by year.

In [24]:
movie.loc[movie['title'].str.lower().str.contains('star trek').fillna(False),
         ['id', 'title', 'yr']]

|  | id | title | yr |
|---|---|---|---|
| 7769 | 17770 | Star Trek: First Contact | 1996 |
| 7770 | 17771 | Star Trek: Insurrection | 1998 |
| 7771 | 17772 | Star Trek: The Motion Picture | 1979 |
| 7772 | 17773 | Star Trek | 2009 |
| 7773 | 17774 | Star Trek Generations | 1994 |
| 7774 | 17775 | Star Trek II: The Wrath of Khan | 1982 |
| 7775 | 17776 | Star Trek III: The Search for Spock | 1984 |
| 7776 | 17777 | Star Trek IV: The Voyage Home | 1986 |
| 7777 | 17778 | Star Trek Nemesis | 2002 |
| 7778 | 17779 | Star Trek V: The Final Frontier | 1989 |

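The exercise asks for the results ordered by year, while the cell above filters without sorting. A minimal sketch of one way to add the ordering is shown below. `movie_demo` is a hypothetical stand-in for the `movie` table: three rows are copied from the output above, and the row with a missing title is invented to show why `na=False` is useful.

```python
import pandas as pd

# Hypothetical stand-in for the `movie` table; the last row is invented.
movie_demo = pd.DataFrame({
    'id': [17770, 17772, 17773, 1],
    'title': ['Star Trek: First Contact',
              'Star Trek: The Motion Picture',
              'Star Trek',
              None],
    'yr': [1996, 1979, 2009, 2000],
})

# case=False makes the match case-insensitive; na=False treats missing titles as non-matches.
mask = movie_demo['title'].str.contains('star trek', case=False, na=False)

# Keep the requested columns, then order by release year.
star_trek = movie_demo.loc[mask, ['id', 'title', 'yr']].sort_values('yr')
print(star_trek)
```

Passing `case=False` and `na=False` to `str.contains` should behave like the `.str.lower()` plus `.fillna(False)` chain used above, in a single call.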

## 4. id for actor Glenn Close

What **id** number does the actor 'Glenn Close' have?

In [25]:
actor.loc[actor['name']=='Glenn Close', ['id']]

|  | id |
|---|---|
| 22511 | 140 |


## 5. id for Casablanca

What is the **id** of the film 'Casablanca'?

In [26]:
movie.loc[movie['title']=='Casablanca', ['id']]

|  | id |
|---|---|
| 1767 | 11768 |


## 6. Cast list for Casablanca

Obtain the cast list for 'Casablanca'.

> _what is a cast list?_  
> The cast list is the names of the actors who were in the movie.

Use **movieid=11768** (or whatever value you got from the previous question).

In [32]:
a = casting.merge(actor, how='inner', left_on='actorid', right_on='id')
a.loc[a['movieid']==11768, ['name']]

|  | name |
|---|---|
| 3766 | Peter Lorre |
| 8665 | John Qualen |
| 8938 | Madeleine LeBeau |
| 15264 | Jack Benny |
| 24061 | Norma Varden |
| 24380 | Ingrid Bergman |
| 24740 | Conrad Veidt |
| 25362 | Leon Belasco |
| 27263 | Humphrey Bogart |
| 27294 | Sydney Greenstreet |


## 7. Alien cast list

Obtain the cast list for the film 'Alien'.

In [91]:
a = (movie.merge(casting, how='right', left_on='id', right_on='movieid')
     .merge(actor, how='left', left_on='actorid', right_on='id'))
a.loc[a['title']=='Alien', ['name']]

|  | name |
|---|---|
| 5425 | John Hurt |
| 5426 | Sigourney Weaver |
| 5427 | Yaphet Kotto |
| 5428 | Harry Dean Stanton |
| 5429 | Ian Holm |
| 5430 | Tom Skerritt |
| 5431 | Veronica Cartwright |


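Because both `movie` and `actor` have an `id` column, the three-way merge above ends up with pandas' default `id_x` and `id_y` columns. The sketch below shows the same join pattern on toy data with explicit suffixes. All ids and `ord` values here are invented for illustration, and the `_demo` frames are hypothetical stand-ins for the real tables.

```python
import pandas as pd

# Toy stand-ins for the three tables; ids and ord values are made up.
movie_demo = pd.DataFrame({'id': [1, 2],
                           'title': ['Alien', 'Casablanca'],
                           'yr': [1979, 1942]})
actor_demo = pd.DataFrame({'id': [10, 11, 12],
                           'name': ['Sigourney Weaver', 'John Hurt', 'Humphrey Bogart']})
casting_demo = pd.DataFrame({'movieid': [1, 1, 2],
                             'actorid': [10, 11, 12],
                             'ord': [1, 2, 1]})

# casting is the link table: join it to movie first, then to actor.
# The suffixes rename the two clashing `id` columns to id_movie and id_actor.
joined = (casting_demo
          .merge(movie_demo, left_on='movieid', right_on='id')
          .merge(actor_demo, left_on='actorid', right_on='id',
                 suffixes=('_movie', '_actor')))

alien_cast = joined.loc[joined['title'] == 'Alien', 'name'].tolist()
print(alien_cast)
```

The default `how='inner'` is used here, so casting rows without a matching movie or actor are dropped; the `right` and `left` joins in the cell above keep such rows instead.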
## 8. Harrison Ford movies

List the films in which 'Harrison Ford' has appeared.

In [37]:
# a was obtained in #7
a.loc[a['name']=='Harrison Ford', ['title']]

|  | title |
|---|---|
| 15429 | A Hundred and One Nights |
| 15430 | Air Force One |
| 15431 | American Graffiti |
| 15432 | Apocalypse Now |
| 15433 | Clear and Present Danger |
| 15434 | Cowboys & Aliens |
| 15435 | Crossing Over |
| 15436 | Dead Heat on a Merry-Go-Round |
| 15437 | Extraordinary Measures |
| 15438 | Firewall |


## 9. Harrison Ford as a supporting actor

List the films in which 'Harrison Ford' has appeared, but not in the starring role. [Note: the **ord** field of casting gives the position of the actor. If ord=1, this actor is in the starring role.]

In [38]:
# a was obtained in #7
a.loc[(a['name']=='Harrison Ford') & (a['ord']>1), ['title']]

|  | title |
|---|---|
| 15429 | A Hundred and One Nights |
| 15431 | American Graffiti |
| 15432 | Apocalypse Now |
| 15434 | Cowboys & Aliens |
| 15436 | Dead Heat on a Merry-Go-Round |
| 15437 | Extraordinary Measures |
| 15439 | Force 10 From Navarone |
| 15441 | Hawthorne of the U.S.A. |
| 15446 | Jimmy Hollywood |
| 15448 | More American Graffiti |


## 10. Lead actors in 1962 movies

List the films together with the leading star for all 1962 films.

In [39]:
# a was obtained in #7
a.loc[(a['yr']==1962) & (a['ord']==1), ['title', 'name']]

|  | title | name |
|---|---|---|
| 3454 | Birdman of Alcatraz | Burt Lancaster |
| 4041 | What Ever Happened to Baby Jane? | Bette Davis |
| 4099 | David and Lisa | Keir Dullea |
| 5668 | Experiment in Terror | Glenn Ford |
| 6323 | Who's Got the Action? | Dean Martin |
| 6334 | It's Only Money | Jerry Lewis |
| 7302 | Term of Trial | Laurence Olivier |
| 8093 | Boys' Night Out | Kim Novak |
| 8696 | La notte | Marcello Mastroianni |
| 10637 | Long Day's Journey into Night | Katharine Hepburn |


## 11. Busy years for Rock Hudson

Which were the busiest years for 'Rock Hudson'? Show the year and the number of movies he made, for every year in which he made more than 2 movies.

In [50]:
# a was obtained in #7
b = (a.loc[a['name']=='Rock Hudson', ['yr', 'title']]
     .groupby('yr').count()
     .reset_index()
     .rename(columns={'title': 'n'}))
b.loc[b['n']>2, :]

|  | yr | n |
|---|---|---|
| 2 | 1953 | 5 |
| 8 | 1961 | 3 |

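The group, count, then filter pattern above plays the role of SQL's `GROUP BY ... HAVING`. A slightly shorter variant, sketched here on invented rows, uses `size()` and names the count column directly in `reset_index`. The `roles` frame and its titles are hypothetical.

```python
import pandas as pd

# Invented rows standing in for the merged frame `a`.
roles = pd.DataFrame({
    'name': ['Rock Hudson'] * 5 + ['Other Actor'],
    'yr': [1953, 1953, 1953, 1961, 1961, 1953],
    'title': ['A', 'B', 'C', 'D', 'E', 'F'],
})

# size() counts rows per group; reset_index(name='n') names the count column.
counts = (roles.loc[roles['name'] == 'Rock Hudson']
          .groupby('yr').size()
          .reset_index(name='n'))

# Equivalent of HAVING COUNT(*) > 2.
busy = counts.loc[counts['n'] > 2]
print(busy)
```

Note that `size()` counts all rows in a group, whereas `count()` skips missing values in the counted column, so the two can differ if `title` contains nulls.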

## 12. Lead actor in Julie Andrews movies

List the film title and the leading actor for all of the films 'Julie Andrews' played in.

> _Did you get "Little Miss Marker" twice?_  
> Julie Andrews starred in the 1980 remake of Little Miss Marker and not the original (1934).
>
> Title is not a unique field; create a table of IDs in your subquery.

In [56]:
# a was obtained in #7
b = a.loc[a['name']=='Julie Andrews', 'movieid'].values
a.loc[(a['movieid'].isin(b)) & (a['ord']==1), ['title', 'name']]

|  | title | name |
|---|---|---|
| 1201 | 10 | Dudley Moore |
| 1212 | Darling Lili | Julie Andrews |
| 1214 | Duet for One | Julie Andrews |
| 1215 | Hawaii | Julie Andrews |
| 1217 | Mary Poppins | Julie Andrews |
| 1218 | Relative Values | Julie Andrews |
| 1220 | Star! | Julie Andrews |
| 1226 | The Tamarind Seed | Julie Andrews |
| 1227 | Thoroughly Modern Millie | Julie Andrews |
| 1230 | Victor Victoria | Julie Andrews |

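The hint about non-unique titles can be seen on a small invented example. In the sketch below, two films share a title; matching on `title` pulls in the lead of both, while matching on `movieid` (as the cell above does) returns only the film Julie Andrews was in. The ids, `ord` values, and placeholder lead names are all hypothetical.

```python
import pandas as pd

# Two films with the same title; ids, ord values and lead names are placeholders.
cast = pd.DataFrame({
    'movieid': [1, 1, 2, 2],
    'title': ['Little Miss Marker'] * 4,
    'yr': [1934, 1934, 1980, 1980],
    'name': ['Lead of 1934 film', 'Other 1934 actor',
             'Lead of 1980 film', 'Julie Andrews'],
    'ord': [1, 2, 1, 2],
})

is_julie = cast['name'] == 'Julie Andrews'
is_lead = cast['ord'] == 1

# Matching on title: both films qualify, so two leads come back.
titles = cast.loc[is_julie, 'title']
by_title = cast.loc[cast['title'].isin(titles) & is_lead, ['title', 'name']]

# Matching on movieid: only the film she actually appeared in qualifies.
ids = cast.loc[is_julie, 'movieid']
by_id = cast.loc[cast['movieid'].isin(ids) & is_lead, ['title', 'name']]

print(len(by_title), len(by_id))
```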

## 13. Actors with 15 leading roles

Obtain a list, in alphabetical order, of actors who've had at least 15 **starring** roles.

In [80]:
# a was obtained in #7
b = (a.loc[a['ord']==1, ['name', 'actorid', 'movieid']]
     .groupby(['actorid', 'name'])
     .count()
     .reset_index()
     .rename(columns={'movieid': 'n'}))
b.loc[b['n']>=15, ['name']].sort_values('name')

|  | name |
|---|---|
| 264 | Adam Sandler |
| 304 | Al Pacino |
| 303 | Anthony Hopkins |
| 1810 | Antonio Banderas |
| 1611 | Arnold Schwarzenegger |
| 623 | Barbara Stanwyck |
| 1601 | Ben Affleck |
| 141 | Bette Davis |
| 1549 | Bing Crosby |
| 73 | Bruce Willis |


## 14.

List the films released in the year 1978 ordered by the number of actors in the cast, then by title.

In [92]:
# a was obtained in #7
b = (a.loc[a['yr']==1978, ['title', 'movieid', 'actorid']]
      .groupby(['title', 'movieid'])
      .count()
      .reset_index()
      .rename(columns={'actorid': 'n'})
      .loc[:, ['title', 'n']])
b.sort_values(['n', 'title'], ascending=[False, True])

|  | title | n |
|---|---|---|
| 75 | The Bad News Bears Go to Japan | 50 |
| 94 | The Swarm | 37 |
| 36 | Grease | 28 |
| 5 | American Hot Wax | 27 |
| 77 | The Boys from Brazil | 26 |
| ... | ... | ... |
| 52 | Lies My Father Told Me | 2 |
| 66 | Same Time, Next Year | 2 |
| 70 | Somebody Killed Her Husband | 2 |
| 72 | That's Carry On! | 2 |

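The two-level ordering (largest cast first, ties broken alphabetically by title) relies on passing one `ascending` flag per sort key. A minimal sketch on toy data follows. The titles are borrowed from the output above, but the cast sizes, ids, and the `cast_1978` frame are made up for the example.

```python
import pandas as pd

# Toy casting rows for 1978 films; cast sizes here are invented.
cast_1978 = pd.DataFrame({
    'movieid': [1, 1, 1, 2, 2, 3, 3],
    'title': ['Grease', 'Grease', 'Grease',
              'The Swarm', 'The Swarm',
              'American Hot Wax', 'American Hot Wax'],
    'actorid': [10, 11, 12, 20, 21, 30, 31],
})

# Group on movieid as well as title, since titles are not guaranteed unique.
sizes = (cast_1978.groupby(['movieid', 'title']).size()
         .reset_index(name='n'))

# Descending by cast size, then ascending by title within equal sizes.
ranked = sizes.sort_values(['n', 'title'], ascending=[False, True])
print(ranked[['title', 'n']])
```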

## 15.

List all the people who have worked with 'Art Garfunkel'.

In [96]:
# a was obtained in #7
b = a.loc[a['name']=='Art Garfunkel', 'movieid']
a.loc[(a['movieid'].isin(b)) & (a['name']!='Art Garfunkel'),
      ['name']].sort_values('name')

|  | name |
|---|---|
| 980 | Beverly Johnson |
| 14279 | Bill Paxton |
| 959 | Breckin Meyer |
| 981 | Bruce Jay Friedman |
| 975 | Cecilie Thomsen |
| 972 | Cindy Crawford |
| 974 | Donald Trump |
| 969 | Elio Fiorucci |
| 966 | Ellen Albertini Dow |
| 976 | Frederique van der Wal |

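In SQL this exercise is usually solved with a self-join on casting. The `isin` approach above gives the same result; for comparison, a self-merge version is sketched below on toy data. The movie ids and the "Unrelated Actor" rows are invented, and `cast` is a hypothetical stand-in for the merged frame `a`.

```python
import pandas as pd

# Toy cast rows: movie 1 includes Art Garfunkel, movie 2 does not.
cast = pd.DataFrame({
    'movieid': [1, 1, 1, 2, 2],
    'name': ['Art Garfunkel', 'Cindy Crawford', 'Breckin Meyer',
             'Unrelated Actor A', 'Unrelated Actor B'],
})

# Self-merge on movieid pairs every cast member with every other member of the same film.
pairs = cast.merge(cast, on='movieid', suffixes=('', '_costar'))

# Keep pairs whose left side is Art Garfunkel and whose right side is someone else.
costars = (pairs.loc[(pairs['name'] == 'Art Garfunkel')
                     & (pairs['name_costar'] != 'Art Garfunkel'), 'name_costar']
           .drop_duplicates()
           .sort_values()
           .tolist())
print(costars)
```

The `drop_duplicates()` call matters when a co-star shares more than one film with him; the `isin` version above would list such a person once per shared film.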