<a href="https://colab.research.google.com/github/mading225/GROUP-6-PROJECT-/blob/main/Movie_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Business Understanding

## Business Problem

The company is planning to enter the entertainment industry by launching a new movie studio. While the company has experience in other business areas, it currently lacks expertise in movie production and distribution.

Creating movies requires significant upfront investment, and poor decisions around genre, audience targeting, or production strategy can lead to major financial losses. To reduce this risk, the company needs data-driven guidance on what types of films are most likely to succeed in today’s market.

## Business Objectives

The objective of this project is to **analyze historical movie data** to identify patterns associated with **financial success and positive audience reception**.

Using box office performance and audience ratings, this analysis aims to answer the following questions:

 - What types of movies generate the highest box office revenue?

 - Which movie characteristics are associated with strong audience approval?

 - What trends can guide the studio’s initial movie production strategy?

## Key Business Questions

To support leadership decision-making, this project focuses on three core questions:

 - **Which movie genres consistently perform best at the box office?**

 - **How does audience reception (ratings and engagement) relate to financial success?**

 - **What movie characteristics (such as runtime or release trends) are most associated with successful films?**

# Data Understanding

To explore which types of movies perform best both financially and critically, this project uses three datasets sourced from widely recognized movie industry platforms. Each dataset offers a different perspective on movie performance, which supports a more rounded analysis.

In [1]:
# Importing the necessary libraries
import pandas as pd
import sqlite3
import matplotlib.pyplot as plt

## IMDB Dataset

The IMDB data is stored in a SQLite database. The database contains multiple tables, but this analysis focuses on two of them:

 - `movie_basics`: movie characteristics (titles, start year, runtime, and genres)

 - `movie_ratings`: audience ratings (average rating and number of votes)
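One way to confirm which tables a SQLite database actually holds is to query its built-in `sqlite_master` catalog. The sketch below is illustrative only: `list_tables` is a hypothetical helper, and the in-memory database with two empty tables is a stand-in so the example runs without `im.db`. Against the real file, the same helper could be called on a connection opened from `zippedData/im.db`.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in a SQLite database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical in-memory stand-in (not the real im.db) so the sketch is runnable.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movie_basics (movie_id TEXT, primary_title TEXT)")
demo_conn.execute(
    "CREATE TABLE movie_ratings (movie_id TEXT, averagerating REAL, numvotes INTEGER)"
)

tables = list_tables(demo_conn)
print(tables)
demo_conn.close()
```

Listing the tables first makes it easier to justify loading only the two that the analysis needs.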

In [2]:
# Unzip the IMDB database archive into the zippedData directory
!unzip im.db.zip -d zippedData


Archive:  im.db.zip
  inflating: zippedData/im.db        


In [3]:
# Connect to the SQLite database
conn = sqlite3.connect('zippedData/im.db')

# Load relevant tables
movie_basics = pd.read_sql("""
SELECT * FROM movie_basics;
""", conn)

movie_ratings = pd.read_sql("""
SELECT * FROM movie_ratings;
""", conn)



In [4]:
# Previewing the movie_basics table
movie_basics.head()

| | movie_id | primary_title | original_title | start_year | runtime_minutes | genres |
|---|---|---|---|---|---|---|
| 0 | tt0063540 | Sunghursh | Sunghursh | 2013 | 175.0 | Action,Crime,Drama |
| 1 | tt0066787 | One Day Before the Rainy Season | Ashad Ka Ek Din | 2019 | 114.0 | Biography,Drama |
| 2 | tt0069049 | The Other Side of the Wind | The Other Side of the Wind | 2018 | 122.0 | Drama |
| 3 | tt0069204 | Sabse Bada Sukh | Sabse Bada Sukh | 2018 | NaN | Comedy,Drama |
| 4 | tt0100275 | The Wandering Soap Opera | La Telenovela Errante | 2017 | 80.0 | Comedy,Drama,Fantasy |
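The preview already hints at two things worth checking before any analysis: `runtime_minutes` has a missing value, and `genres` packs several genres into one comma-separated string. The following is a minimal sketch of both checks, rebuilt from the five preview rows only so that it runs on its own. The names `preview`, `genre_rows`, and `genre_counts` are illustrative, and counts from five rows say nothing about the full table.

```python
import pandas as pd

# The five rows shown in the movie_basics preview (a subset of its columns).
preview = pd.DataFrame({
    "movie_id": ["tt0063540", "tt0066787", "tt0069049", "tt0069204", "tt0100275"],
    "start_year": [2013, 2019, 2018, 2018, 2017],
    "runtime_minutes": [175.0, 114.0, 122.0, None, 80.0],
    "genres": [
        "Action,Crime,Drama",
        "Biography,Drama",
        "Drama",
        "Comedy,Drama",
        "Comedy,Drama,Fantasy",
    ],
})

# Count missing values in each column.
missing_per_column = preview.isna().sum()
print(missing_per_column)

# Split the comma-separated genres so each (movie, genre) pair gets its own row.
genre_rows = preview.assign(genre=preview["genres"].str.split(",")).explode("genre")
genre_counts = genre_rows["genre"].value_counts()
print(genre_counts)
```

Putting each genre on its own row means a multi-genre film counts once per genre, which is one possible convention for genre-level comparisons rather than the only one.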


In [5]:
# Previewing the movie_ratings table
movie_ratings.head()

| | movie_id | averagerating | numvotes |
|---|---|---|---|
| 0 | tt10356526 | 8.3 | 31 |
| 1 | tt10384606 | 8.9 | 559 |
| 2 | tt1042974 | 6.4 | 20 |
| 3 | tt1043726 | 4.2 | 50352 |
| 4 | tt1060240 | 6.5 | 21 |
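In these five rows, `numvotes` ranges from 20 to 50,352, so some average ratings rest on far fewer votes than others. One possible way to account for this, sketched below on the preview rows alone, is to keep only movies above a minimum vote count. The `MIN_VOTES` value of 100 is an arbitrary assumption for illustration, not a threshold taken from this project.

```python
import pandas as pd

# The five rows shown in the movie_ratings preview.
ratings_preview = pd.DataFrame({
    "movie_id": ["tt10356526", "tt10384606", "tt1042974", "tt1043726", "tt1060240"],
    "averagerating": [8.3, 8.9, 6.4, 4.2, 6.5],
    "numvotes": [31, 559, 20, 50352, 21],
})

# Assumed cutoff for illustration only; a real threshold should be chosen
# after inspecting the vote distribution of the full table.
MIN_VOTES = 100

# Summarize how widely the vote counts vary.
print(ratings_preview["numvotes"].describe())

# Keep only the movies whose rating is backed by at least MIN_VOTES votes.
well_voted = ratings_preview[ratings_preview["numvotes"] >= MIN_VOTES]
print(well_voted)
```

With this cutoff, only the two movies with 559 and 50,352 votes remain, which shows how strongly such a filter can shrink the data.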
