# Project Overview

Introduction:

The provided task involves web scraping, data storage in CSV files, creation of tables in SQLite database, executing SQL queries on the table, and performing similar queries using Pandas SQL on the loaded CSV data.
    
The first CSV file includes the data like - Sno, Movie Name, Director Name(bifurcated into subfields as per number of directors in the movie), Duration, Year, Ratings, and Metascore.
The second CSV file includes the data like - Movie Name, Stars(bifurcated into 4 subfields as per number of stars in the movie), Votes, Genere(bifurcated into 3 subfields as per the number of genere movie belongs), Gross Collection, Popularity, and Certification.
The above task of project will be the phase1. 

Now for phase2: Here, two tables will be created in SQLite database using the columns provided in the CSV files. The data from each CSV file is gonna inserted into the corresponding tables. The subsequent SQL queries are executed on the tables in the SQLite database.
The queries involve retrieving specific details from the tables based on various conditions and sorting orders.

The CSV data is then loaded into Pandas DataFrames. The DataFrames are then used to execute Pandas SQL queries, which involve SQL-like operations on the DataFrames. The queries may include joins, filtering, sorting, and other operations as required. The results of the queries are obtained as Dataframes, which can be used for further analysis and visualization.

# 1

In [1]:
# Importing necessary libraries

import pandas as pd
import requests
from bs4 import BeautifulSoup
import pdb
import re
import csv

In [2]:
# Scrap the IMDB data and return a list of dictionaries
def scrape_imdb_data(page_num):
    url = f"https://www.imdb.com/search/title/?genres=action&sort=user_rating,desc&title_type=feature&num_votes=25000,&pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=f11158cc-b50b-4c4d-b0a2-40b32863395b&pf_rd_r=XZ8X52H1R40B7KG5SNZ9&pf_rd_s=right-6&pf_rd_t=15506&pf_rd_i=top&ref_=chttp_gnr_1&start={(page_num - 1) * 50 + 1}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    
    movie_list = []
    
    for movie in soup.find_all("div", class_ = "lister-item mode-advanced"):
        title = movie.h3.a.text
        director_and_stars = movie.find("p", class_ = "").text.strip()
        director_start = director_and_stars.find("Director:") + len("Director:")
        director_end = director_and_stars.find("|", director_start)
        director_names = [name.strip() for name in director_and_stars[director_start:director_end].split(",")]
        # Bifurcate director field in three subfield
        director1 = director_names[0] if director_names else None
        director2 = director_names[1] if len(director_names) > 1 else None
        director3 = director_names[2] if len(director_names) > 2 else None
        duration = movie.find("span", class_ = "runtime").text if movie.find("span", class_ = "runtime") else ""
        year = movie.find("span", class_ = "lister-item-year").text.strip("()")
        rating = movie.find("div", class_ = "inline-block ratings-imdb-rating").strong.text
        metascore = movie.find("span", class_ = "metascore").text.strip() if movie.find("span", class_ = "metascore") else ""
        stars_start = director_and_stars.find("Stars:") + len("Stars:")
        stars = director_and_stars[stars_start:].strip().split(",")
        # Bifurcate the stars field into four subfields
        star1 = stars[0] if stars else None
        star2 = stars[1] if len(stars) > 1 else None
        star3 = stars[2] if len(stars) > 2 else None
        star4 = stars[3] if len(stars) > 3 else None
        votes_element = movie.find("p", class_ = "sort-num_votes-visible")
        votes = votes_element.find_all("span")[1].text.strip().replace(",","") if votes_element else ""
        genre_element = movie.find("span", class_ = "genre")
        genre = genre_element.text.strip().replace("\n", "") if genre_element else ""
        genres = [genre.strip() for genre in genre.split(",")]
        # bifurcate the  genre field to three subfields
        genre1 = genres[0] if genres else None
        genre2 = genres[1] if len(genres) > 1 else None
        genre3 = genres[2] if len(genres) > 2 else None
        gross_element = movie.find("span", text= re.compile("Gross:"))
        if gross_element and gross_element.find_next_sibling("span"):
            gross = gross_element.find_next_sibling("span").text.strip() if gross_element else ""
        else:
            gross = ""
        popularity_element = movie.find("span", class_ = "lister-item-index unbold text-primary")
        popularity = popularity_element.text.strip() if popularity_element else ""  
        certification_element = movie.find("span", class_ = "certificate")
        certification = certification_element.text.strip() if certification_element else ""
        
        movie_data = {"Sno": len(movie_list) + 1,
                     "Movie Name": title,
                     "Director1": director1,
                     "Director2": director2,
                     "Director3": director3,
                     "Duration": duration,
                     "Year": year,
                     "Rating": rating,
                     "Metascore": metascore,
                     "Star1": star1,
                     "Star2": star2,
                     "Star3": star3,
                     "Star4": star4,
                     "Votes": votes,
                     "Genre1": genre1,
                     "Genre2": genre2,
                     "Genre3": genre3,
                     "Gross collection": gross,
                     "Popularity": popularity,
                     "Certification": certification}
        movie_list.append(movie_data)
    return movie_list   

# Function to store data into two seperate CSV files
def store_data_in_csv(movie_list):
    with open("movies_directors.csv", mode = "w", newline = "", encoding = "utf-8") as df1:
        writer = csv.DictWriter(df1, fieldnames = ["Sno", "Movie Name", "Director1", "Director2", "Director3", "Duration", "Year", "Rating", "Metascore"])
        writer.writeheader()
        
        for index, movie in enumerate(movie_list, start = 1):
            writer.writerow({"Sno": index,
                            "Movie Name": movie["Movie Name"],
                            "Director1": movie["Director1"],
                            "Director2": movie["Director2"],
                            "Director3": movie["Director3"],
                            "Duration": movie["Duration"],
                            "Year": movie["Year"],
                            "Rating": movie["Rating"],
                            "Metascore": movie["Metascore"]})
            
    with open("movies_genres.csv", mode = "w", newline = "", encoding = "utf-8") as df2:
        writer = csv.DictWriter(df2, fieldnames = ["Movie Name", "Star1", "Star2", "Star3", "Star4", "Votes", "Genre1", "Genre2", "Genre3", "Gross collection", "Popularity", "Certification"])
        writer.writeheader() 
        
        for movie in movie_list:
            writer.writerow({"Movie Name": movie["Movie Name"],
                            "Star1": movie["Star1"],
                            "Star2": movie["Star2"],
                            "Star3": movie["Star3"],
                            "Star4": movie["Star4"],
                            "Votes": movie["Votes"],
                            "Genre1": movie["Genre1"],
                            "Genre2": movie["Genre2"],
                            "Genre3": movie["Genre3"],
                            "Gross collection": movie["Gross collection"],
                            "Popularity": movie["Popularity"],
                            "Certification": movie["Certification"]})
if __name__ == "__main__":
    all_movie_data = []
    
    # Loop through all pages and append data to the all_movie_data list.
    for page_num in range(1,38):
        movie_data = scrape_imdb_data(page_num)
        all_movie_data.extend(movie_data)
    store_data_in_csv(all_movie_data)    
    print("Data has been successfully scraped and stored in CSV files.")    

Data has been successfully scraped and stored in CSV files.


In [3]:
df1 = pd.read_csv("movies_directors.csv")
df1

Unnamed: 0,Sno,Movie Name,Director1,Director2,Director3,Duration,Year,Rating,Metascore
0,1,The Dark Knight,Christopher Nolan,,,152 min,2008,9.0,84.0
1,2,The Lord of the Rings: The Return of the King,Peter Jackson,,,201 min,2003,9.0,94.0
2,3,Spider-Man: Across the Spider-Verse,s:\nJoaquim Dos Santos,Kemp Powers,Justin K. Thompson,140 min,2023,8.9,86.0
3,4,Inception,Christopher Nolan,,,148 min,2010,8.8,74.0
4,5,The Lord of the Rings: The Fellowship of the Ring,Peter Jackson,,,178 min,2001,8.8,92.0
...,...,...,...,...,...,...,...,...,...
1765,1766,Radhe,Prabhu Deva,,,109 min,2021,1.9,
1766,1767,Race 3,Remo D'Souza,,,160 min,2018,1.9,
1767,1768,Angels Apocalypse,s:\nSean Cain,Enzo Zelocchi,,85 min,2015,1.6,
1768,1769,Elk*rtuk,Keith English,,,125 min,2021,1.5,


In [4]:
df2 = pd.read_csv("movies_genres.csv")
df2

Unnamed: 0,Movie Name,Star1,Star2,Star3,Star4,Votes,Genre1,Genre2,Genre3,Gross collection,Popularity,Certification
0,The Dark Knight,Christian Bale,\nHeath Ledger,\nAaron Eckhart,\nMichael Caine,2742002,Action,Crime,Drama,$534.86M,1.,UA
1,The Lord of the Rings: The Return of the King,Elijah Wood,\nViggo Mortensen,\nIan McKellen,\nOrlando Bloom,1899372,Action,Adventure,Drama,$377.85M,2.,U
2,Spider-Man: Across the Spider-Verse,Shameik Moore,\nHailee Steinfeld,\nBrian Tyree Henry,\nLuna Lauren Velez,186672,Animation,Action,Adventure,,3.,U
3,Inception,Leonardo DiCaprio,\nJoseph Gordon-Levitt,\nElliot Page,\nKen Watanabe,2433238,Action,Adventure,Sci-Fi,$292.58M,4.,UA
4,The Lord of the Rings: The Fellowship of the Ring,Elijah Wood,\nIan McKellen,\nOrlando Bloom,\nSean Bean,1927794,Action,Adventure,Drama,$315.54M,5.,U
...,...,...,...,...,...,...,...,...,...,...,...,...
1765,Radhe,Salman Khan,\nDisha Patani,\nRandeep Hooda,\nJackie Shroff,179060,Action,Crime,Thriller,,1766.,UA
1766,Race 3,Anil Kapoor,\nSalman Khan,\nBobby Deol,\nJacqueline Fernandez,47815,Action,Crime,Thriller,$1.69M,1767.,UA
1767,Angels Apocalypse,Enzo Zelocchi,\nJana Rochelle,\nRyan C.F. Buckley,\nWilliam Kirkham,42919,Action,Fantasy,Sci-Fi,,1768.,
1768,Elk*rtuk,Vivianne Bánovits,\nAndrás Mózes,\nBarna Bokor,\nGabriella Gubás,39601,Action,Crime,Drama,,1769.,


# 2

Now we will create two tables in SQLite and insert the data from the CSV files into the tables

In [5]:
import sqlite3

# Create a connection to the SQLite database
conn = sqlite3.connect("movies.db")

# Create Table 1
conn.execute(""" 
CREATE TABLE IF NOT EXISTS Table1(
     Sno INTEGER PRIMARY KEY AUTOINCREMENT,
     MovieName TEXT,
     Director1 TEXT,
     Director2 TEXT,
     Director3 TEXT,
     Duration TEXT,
     Year INTEGER,
     Rating REAL,
     Metascore REAL)""")

# Insert data into Table 1
df1 = pd.read_csv("movies_directors.csv")
df1.to_sql("Table1", conn, if_exists = "replace", index = False)

# Table 2
conn.execute("""
CREATE TABLE IF NOT EXISTS Table2(
     MovieName TEXT,
     Star1 TEXT,
     Star2 TEXT,
     Star3 TEXT,
     Star4 TEXT,
     Votes INTEGER,
     Genre1 TEXT,
     Genre2 TEXT,
     Genre3 TEXT,
     GrossCollection REAL,
     Popularity INTEGER,
     Certification TEXT)""")

# Insert data
df2 = pd.read_csv("movies_genres.csv")
df2.to_sql("Table2", conn, if_exists = "replace", index = False)

# Commit and close the database connection
conn.commit()
conn.close()

* Now we will start quering the tables using SQL queries

In [6]:
# Lets establish a new connection to the SQLite database
conn = sqlite3.connect("movies.db")

# Queries for table 1

# 1. Let's Display all the details of movies created by directors Christopher and Matt Reeves.
query1 = """
SELECT * FROM Table1
WHERE Director1 LIKE "%Christopher%" OR Director1 LIKE "%Matt Reeves%"
   OR Director2 LIKE "%Christopher%" OR Director2 LIKE "%Matt Reeves%"
   OR Director3 LIKE "%Christopher%" OR Director3 LIKE "%Matt Reeves%"
"""
result1 = pd.read_sql_query(query1, conn)
print("Query 1:")
print(result1)
print()

# 2. Display all the details of movies with a duration of 140 to 190 minutes.
query2 = """
SELECT * FROM Table1
WHERE Duration >=140 AND Duration <=190"""

result2 = pd.read_sql_query(query2, conn)
print("Query 2:")
print(result2)
print()

# 3. Display all details of movies with ratings above 7 in ascending order.
query3 = """
SELECT * FROM Table1
WHERE Rating > 7
ORDER BY Rating ASC"""

result3 = pd.read_sql_query(query3, conn)
print("Query 3:")
print(result3)
print()

# 4. Display all movie names in descending order.
query4 = """
SELECT DISTINCT `Movie Name` FROM Table1
ORDER BY `Movie Name` DESC"""

result4 = pd.read_sql_query(query4, conn)
print("Query 4:")
print(result4)
print()

# 5. Display movie name starts with 'P' and their rating is greater than 7.
query5 = """
SELECT `Movie Name` FROM Table1
WHERE `Movie Name` LIKE 'P%' AND Rating > 7"""

result5 = pd.read_sql_query(query5, conn)
print("Query 5:")
print(result5)
print()

Query 1:
     Sno                                     Movie Name  \
0      1                                The Dark Knight   
1      4                                      Inception   
2     22                          The Dark Knight Rises   
3     40                                  Batman Begins   
4     57  Mission: Impossible - Dead Reckoning Part One   
5    132                                     The Batman   
6    133                                        Dunkirk   
7    167                  Mission: Impossible - Fallout   
8    175                                 The Lego Movie   
9    231                 Dawn of the Planet of the Apes   
10   289             Mission: Impossible - Rogue Nation   
11   310                 War for the Planet of the Apes   
12   323                                          Tenet   
13   386                                 21 Jump Street   
14   500                                   Jack Reacher   
15   517                                    Clo

In [7]:
# Queries for table 2

# 1. Display all movie names with star 'Arnold Schwarzenegger' in ascending order.

query6 = """
SELECT  DISTINCT `Movie Name` FROM Table2
WHERE (`Star1` LIKE '%Arnold Schwarzenegger%')
   OR (`Star2` LIKE '%Arnold Schwarzenegger%') 
   OR (`Star3` LIKE '%Arnold Schwarzenegger%')
   OR (`Star4` LIKE '%Arnold Schwarzenegger%')
ORDER BY `Movie Name` ASC"""

result6 = pd.read_sql_query(query6, conn)
print("Query 6:")
print(result6)
print()


# 2. Now display all details of the movie with the highest number of votes.
query7 = """
SELECT * FROM Table2
WHERE Votes = (SELECT MAX(Votes) FROM Table2)
LIMIT 1 """

result7 = pd.read_sql_query(query7, conn)
print("Query 7:")
print(result7)
print()

# 3. Now display  movie names with gross colletion in descending order.
query8 = """
SELECT DISTINCT `Movie Name` FROM Table2
ORDER BY `Gross collection` DESC"""

result8 = pd.read_sql_query(query8, conn)
print("Query8:")
print(result8)
print()

# 4. Display the gross collection of movies with the star 'Arnold'.
query9 = """
SELECT DISTINCT `Movie Name`, `Gross collection` FROM Table2
WHERE (`Star1` LIKE '%Arnold%')
   OR (`Star2` LIKE '%Arnold')
   OR (`Star3` LIKE '%Arnold')
   OR (`Star4` LIKE '%Arnold')"""

result9 = pd.read_sql_query(query9, conn)
print("Query 9:")
print(result9)
print()

# 5. Dispaly all details of movies with comedy and action genres.
query10 = """
SELECT * FROM Table2
WHERE (`Genre1` LIKE '%Comedy%' AND `Genre2` LIKE '%Action%')
   OR (`Genre1` LIKE '%Action%' AND `Genre2` LIKE '%Comedy%')
   OR (`Genre1` LIKE '%Comedy%' AND `Genre3` LIKE '%Action%')
   OR (`Genre1` LIKE '%Action%' AND `Genre3` LIKE '%Comedy%')
   OR (`Genre2` LIKE '%Comedy%' AND `Genre3` LIKE '%Action%')
   OR (`Genre2` LIKE '%Action%' AND `Genre3` LIKE '%Comedy%')"""

result10 = pd.read_sql_query(query10, conn)
print("Query10:")
print(result10)
print()

Query 6:
                            Movie Name
0                       Batman & Robin
1                    Collateral Damage
2                             Commando
3                  Conan the Barbarian
4                  Conan the Destroyer
5                          End of Days
6                               Eraser
7                          Escape Plan
8                     Kindergarten Cop
9                     Last Action Hero
10                            Predator
11                            Raw Deal
12                            Red Heat
13                           Red Sonja
14                            Sabotage
15          Terminator 2: Judgment Day
16  Terminator 3: Rise of the Machines
17                  Terminator Genisys
18               Terminator: Dark Fate
19                         The 6th Day
20                      The Last Stand
21                     The Running Man
22                      The Terminator
23                        Total Recall
24              

In [8]:
# Subqueries

# 1. Display all details from both tables where movie names are the same
query11 = """
SELECT DISTINCT * FROM Table1
JOIN Table2 ON Table1.`Movie Name` = Table2.`Movie Name` """

result11 = pd.read_sql_query(query11, conn)
print("Query11:")
print(result11)
print()

# 2. Let's display all movie names, Director, ratings, and gross collection where the genre is action.
query12 = """
SELECT DISTINCT Table1.`Movie Name`, Table1. `Director1`, Table1. `Director2`, Table1. `Director3`, Table1. `Rating`, Table2.`Gross collection`
FROM Table1
JOIN Table2 ON Table1.`Movie Name` = Table2.`Movie Name`
WHERE Table2. `Genre1` LIKE '%Action%' OR Table2. `Genre2` LIKE '%Action%' OR Table2. `Genre3` LIKE '%Action%'"""

result12 = pd.read_sql_query(query12, conn)
print("Query12:")
print(result12)
print()

# 3. Now query will display all details from both the tables with the highest gross collection.
query13 = """
SELECT DISTINCT *
FROM Table1
JOIN Table2 ON Table1.`Movie Name` = Table2.`Movie Name`
WHERE Table2.`Gross collection` = (SELECT MAX(`Gross collection`) FROM Table2)"""

result13 = pd.read_sql_query(query13, conn)
print("Query13:")
print(result13)
print()


# 4. Now query will display all details from both the tables with the highest rating.
query14 = """SELECT DISTINCT *
FROM Table1
JOIN Table2 ON Table1. `Movie Name` = Table2.`Movie Name`
WHERE Table1.Rating = (SELECT MAX(Rating) FROM Table1)"""

result14 = pd.read_sql_query(query14, conn)
print("Query14:")
print(result14)
print()

# 5. This query will display all the details from both tables with the lowest gross collection and lowest rating.
query15 = """SELECT DISTINCT *
FROM Table1
JOIN Table2 ON Table1.`Movie Name` = Table2.`Movie Name`
WHERE Table1.Rating = (SELECT MIN(Rating) AS MinRating FROM Table1)
 AND Table2.`Gross collection` = (SELECT MIN(`Gross collection`) AS MinGross FROM Table2)"""

result15 = pd.read_sql_query(query15, conn)
print("Query15:")
print(result15)
print()

Query11:
       Sno                                         Movie Name  \
0        1                                    The Dark Knight   
1        2      The Lord of the Rings: The Return of the King   
2        3                Spider-Man: Across the Spider-Verse   
3        4                                          Inception   
4        5  The Lord of the Rings: The Fellowship of the Ring   
...    ...                                                ...   
1827  1766                                              Radhe   
1828  1767                                             Race 3   
1829  1768                                  Angels Apocalypse   
1830  1769                                           Elk*rtuk   
1831  1770                                            Sadak 2   

                   Director1      Director2           Director3 Duration  \
0          Christopher Nolan           None                None  152 min   
1              Peter Jackson           None               

In [9]:
# Close the connection
conn.close()

* Now we will make solution of above all 15 query using Pandas SQL

In [10]:
! pip install pandasql



In [11]:
from pandasql import sqldf

# Define pandasql function
pysqldf = lambda q: sqldf(q, globals())

# Load CSV files into dataframes
df1 = pd.read_csv("movies_directors.csv")
df2 = pd.read_csv("movies_genres.csv")

# Concatenate the dataframes for subqueries
df_new = pd.merge(df1, df2, on = 'Movie Name')

Queries' solution using Pandas SQL

In [12]:
# Q1
query1 = """SELECT * FROM df1
WHERE Director1 LIKE '%Christopher%' OR Director1 LIKE '%Matt Reeves%'
   OR Director1 LIKE '%Christopher%' OR Director1 LIKE '%Matt Reeves%'
   OR Director1 LIKE '%Christopher%' OR Director1 LIKE '%Matt Reeves%'"""

result1 = pysqldf(query1)
print("Query1:")
print(result1)
print()


# Q2
query2 = """SELECT *
FROM df1
WHERE Duration >= 140
AND Duration <= 190"""

result2 = pysqldf(query2)
print("Query 2:")
print(result2)
print()

# Q3
query3 = """SELECT *
FROM df1
WHERE Rating >7
ORDER BY Rating ASC"""

result3 = pysqldf(query3)
print("Query 3:")
print(result3)
print()

# Q4
query4 = """SELECT DISTINCT `Movie Name` FROM df1
ORDER BY `Movie Name` DESC"""

result4 = pysqldf(query4)
print("Query 4:")
print(result4)
print()

# Q5
query5 = """SELECT `Movie Name` FROM df1
WHERE `Movie Name` LIKE 'P%'
AND Rating>7 """

result5 = pysqldf(query5)
print("Query 5:")
print(result5)
print()

# Q6
query6 = """SELECT DISTINCT `Movie Name` FROM df2
WHERE Star1 LIKE '%Arnold Schwarzenegger%'
   OR Star2 LIKE '%Arnold Schwarzenegger%'
   OR Star3 LIKE '%Arnold Schwarzenegger%'
   OR Star4 LIKE '%Arnold Schwarzenegger%'
ORDER BY `Movie Name` ASC """

result6 = pysqldf(query6)
print("Query 6:")
print(result6)
print()

# Q7
query7 = """SELECT * FROM df2
WHERE Votes == (SELECT MAX(Votes) FROM df2)"""

result7 = pysqldf(query7)
print("Query 7:")
print(result7)
print()

# Q8
query8 = """SELECT DISTINCT `Movie Name` FROM df2
ORDER BY `Gross collection` DESC """

result8 = pysqldf(query8)
print("Query 8:")
print(result8)
print()

# Q9
query9 = """SELECT DISTINCT `Movie Name`, `Gross collection`
FROM df2 WHERE Star1 LIKE '%Arnold%'
            OR Star2 LIKE '%Arnold%'
            OR Star3 LIKE '%Arnold%'
            OR Star4 LIKE '%Arnold%'"""

result9 = pysqldf(query9)
print("Query 9 :")
print(result9)
print()

# Q10
query10 = """SELECT * FROM df2
WHERE Genre1 LIKE '%Comedy%' AND Genre2 LIKE '%Action%'
   OR Genre1 LIKE '%Action%' AND Genre2 LIKE '%Comedy%'
   OR Genre1 LIKE '%Comedy%' AND Genre3 LIKE '%Action%'
   OR Genre1 LIKE '%Action%' AND Genre3 LIKE '%Comedy%'
   OR Genre2 LIKE '%Comedy%' AND Genre3 LIKE '%Action%'
   OR Genre2 LIKE '%Action%' AND Genre3 LIKE '%Comedy%'"""

result10 = pysqldf(query10)
print("Query 10:")
print(result10)
print()

Query1:
     Sno                                     Movie Name  \
0      1                                The Dark Knight   
1      4                                      Inception   
2     22                          The Dark Knight Rises   
3     40                                  Batman Begins   
4     57  Mission: Impossible - Dead Reckoning Part One   
5    132                                     The Batman   
6    133                                        Dunkirk   
7    167                  Mission: Impossible - Fallout   
8    231                 Dawn of the Planet of the Apes   
9    289             Mission: Impossible - Rogue Nation   
10   310                 War for the Planet of the Apes   
11   323                                          Tenet   
12   500                                   Jack Reacher   
13   517                                    Cloverfield   
14   656                                     Young Guns   
15   789                             The Way of 

In [14]:
# Subqueries

# Q11
query11 = """SELECT DISTINCT * FROM df1
JOIN df2 ON df1. `Movie Name` = df2. `Movie Name` """

result11 = pysqldf(query11)
print("Query 11:")
print(result11)
print()

# Q12
query12 = """SELECT DISTINCT df1. `Movie Name`, df1. Director1, df1. Director2, df1. Director3, df1. Rating, df2. `Gross collection`
FROM df1 JOIN df2 ON df1. `Movie Name` = df2. `Movie Name` 
WHERE df2. Genre1 LIKE '%Action%' OR df2. Genre2 LIKE '%Action%' OR df2. Genre3 LIKE '%Action%'"""

result12 = pysqldf(query12)
print("Query 12:")
print(result12)
print()

# Q13
query13 = """SELECT DISTINCT *
FROM df1
JOIN df2 ON df1. `Movie Name` = df2. `Movie Name`
WHERE df2. `Gross collection` = (SELECT MAX(`Gross collection`)FROM df2) """

result13 = pysqldf(query13)
print("Query 13:")
print(result13)
print()

# Q14
query14 = """SELECT DISTINCT * FROM df1
JOIN df2 ON df1. `Movie Name` = df2. `Movie Name`
WHERE df1. Rating = (SELECT MAX(Rating) FROM df1)"""

result14 = pysqldf(query14)
print("Query14:")
print(result14)
print()

# Q15
query15 = """SELECT DISTINCT * FROM df1
JOIN df2 ON df1. `Movie Name` = df2. `Movie Name`
WHERE df2. `Gross collection` = (SELECT MIN(`Gross collection`) FROM df2)
AND df1. Rating = (SELECT MIN(Rating)FROM df1)"""

result15 = pysqldf(query15)
print("Query 15:")
print(result15)
print()

Query 11:
       Sno                                         Movie Name  \
0        1                                    The Dark Knight   
1        2      The Lord of the Rings: The Return of the King   
2        3                Spider-Man: Across the Spider-Verse   
3        4                                          Inception   
4        5  The Lord of the Rings: The Fellowship of the Ring   
...    ...                                                ...   
1827  1766                                              Radhe   
1828  1767                                             Race 3   
1829  1768                                  Angels Apocalypse   
1830  1769                                           Elk*rtuk   
1831  1770                                            Sadak 2   

                   Director1      Director2           Director3 Duration  \
0          Christopher Nolan           None                None  152 min   
1              Peter Jackson           None              

Here All queries and subqueries are sorted using Pandasql.