# Team A08- Analysing Factors Behind Success In Modern TV Industry

#### **Collaborators:** Adarsh Prajapat, Rohit Devanaboina, Muhammad Shayan Hasan Khan, Ritam Bhattacharya, Shizuka Takahashi, Shravani Thalla

![IMDb-TV.png](https://cordcutting.com/wp-content/uploads/2020/09/IMDb-TV-.png)

<br>

## **Table of Content**
---

### 0. Tableau
### I. Introduction and Motivation

### II. Executive Summary

### III. Project Description & Dataset Overview

    1. Problem Statement
    2. Data Source
    3. Data Dictionary


### IV. Entity Relationship Diagram

### V. Exploratory Data Analysis
    
### VI. Conclusion

### VII. Challenges

### VIII. References

### IX. Tableau Dashboard

<br>

## 0) Tableau
---

__Link to Tableau Public Dashboard__ - https://public.tableau.com/app/profile/rohit.devanaboina2582/viz/Team-A08-TableauProject775-Final/FinalDashboard

### Dashboard Page 1

In [25]:
from IPython.display import Image 
Image(url="https://drive.google.com/uc?export=view&id=1CqhEE0lj8uK3H2ZjdUEc4JKoG5e7oZj6", width = 1280, height = 720)

### Dashboard Page 2

In [26]:
from IPython.display import Image 
Image(url="https://drive.google.com/uc?export=view&id=1KkbEl4SIVFLGvlBpfcBduWRqFWMIFPB1", width = 1280, height = 720)

<br>

## I) Introduction and Motivation
---

Our exploration into factors behind success in modern TV industry using the IMDb dataset is an exciting journey into understanding what makes TV shows popular and also successful. We're diving deep into this vast dataset, which holds a treasure trove of information about movies, TV series, video games and much more.

We're passionate about this project because we want to uncover the secrets behind why some shows become huge hits while others might not. By analyzing the data from IMDb, we aim to __discover the patterns and factors__ that make a show successful. This knowledge can help TV makers create better and more engaging content, improving the experience for viewers like us.

**IMDb Dataset and its Importance**

IMDb is like a huge library filled with information about movies, TV shows, games, podcasts, etc., including cast, production crew and personal biographies, plot summaries, trivia, ratings, and fan and critical reviews. It collects data from various sources, including users who write reviews and give ratings for each show or movie they watch. This data is incredibly valuable because it reflects what people like and don't like about different TV series, movies and others.

The ratings and reviews on IMDb act as a guide for viewers looking for something to watch. They help suggest which shows or movies might be worth watching based on other people's experiences. Filmmakers and TV producers also use this information to understand audience preferences, guiding them in making decisions about what types of shows or movies are currently resonating with viewers and what to create in the future. Therefore, our objective here is to streamline their work process and provide comprehensive insights, enabling them to make __well-informed decisions__ based on our thorough __descriptive analysis__.

<BR>

## II) Executive Summary
---

Our comprehensive analysis of the IMDb dataset aimed to unearth the secrets behind the success of media shows in the contemporary landscape, specifically focusing on TV series genres and their contributing elements. By scrutinizing multiple facets encompassing genres, cast, crew, and regions, our objective was to __empower TV producers and production companies__ in making informed decisions for show development, thereby amplifying their __market influence globally__. We strategically narrowed our focus to __TV industry data__, a pivotal market segment capturing a significant audience share.

Refining our dataset by eliminating null values and duplicates, we concentrated on TV industry data spanning from the 2000s to present, ensuring data accuracy. Employing descriptive analytics through __SQL__ and implementing interesting plots leveraging BI tool like __Tableau__, we extracted profound insights into the factors driving success in TV series and episodes across various levels: location, genres, cast, and crew dynamics. Notably, our analysis unveiled the pivotal role of language popularity and geographic locations in dictating a show's triumph or downfall, in our case a _Blockbuster_ or a _Flop_, indicating that a devoted audience base trumps widespread popularity in elevating ratings.

Contrary to common belief, high popularity doesn't invariably translate to high ratings; rather, it's the __dedicated audience base__ that uplifts a show's ratings, particularly in smaller regions. We firmly believe that insights observed from analyses of this nature will serve as guiding beacons for content creators, enabling them to channel their focus on critical factors, ultimately augmenting a show's popularity, ratings, and, consequently, enhancing the profitability of their productions.

<BR>

## III) Project Description & Dataset Overview
---

### 1. Problem Definition

Entering the exciting world of TV, our project dives deep into understanding how shows have changed over many years using the huge IMDb dataset. __Leveraging rigorous SQL queries__ and meticulous analyses, our aim is to unveil pivotal insights shaping the evolving landscape of TV shows. Using most powerful __Business Intelligence tool - Tableau__ through custom SQL queries, and visualizing our insights through __dashboards and stories.__

We really want to know what people like to watch, how types of shows are changing, and what's new in the TV world. Our main goal is to help people who make shows, big companies like Netflix and Amazon Prime, and others in the TV business and also provide meaning insights to viewrs who watch them. These companies have changed how we watch TV, and our research wants to help them understand what viewers like, what shows work best, and what's becoming popular.

Hence, our analysis spans more than __two decades of TV shows__, scrutinizing standout episodes and series, dissecting the creative minds behind them, analyzing geographic expansions in the industry, and uncovering critical factors impacting a show's trajectory. By describing viewer preferences and industry dynamics, our project endeavors to drive data-informed strategies, fostering superior content creation, resource allocation, and heightened audience engagement within the dynamic domain of television entertainment.

### 2. Data Source

We have chosen to utilize the official dataset provided by IMDb, a reputable source for tv-related data. This dataset encompasses a wide range of information, including details about tv series, miniseries, actors, directors, genres, and much more. The dataset can be accessed through the following link: [IMDb Dataset Source and Link](https://developer.imdb.com/non-commercial-datasets/)


### 3. Data Dictionary

__Our project comprises of seven datasets as described below.__

**`title_principals table`**<br>
Contains the job roles (eg.director, producer etc.), category of jobs, and characters played. It has 59026652 different entries and has 6 columns:
* `tconst` (string) - alphanumeric unique identifier of the title
* `ordering` (integer) – a number to uniquely identify rows for a given titleId
* `nconst` (string) - alphanumeric unique identifier of the name/person
* `category` (string) - the category of job that person was in
* `job` (string) - the specific job title if applicable, else '\N'
* `characters` (string) - the name of the character played if applicable, else '\N'

**`names_basics table`**<br>
Contains the basic detials of the person who played a role in certain tv shows. This has 12999851 entries and has 6 columns:

* `nconst` (string) - alphanumeric unique identifier of the name/person
* `primaryName` (string)– name by which the person is most often credited
* `birthYear` – in YYYY format
* `deathYear` – in YYYY format if applicable, else '\N'
* `primaryProfession` (array of strings)– the top-3 professions of the person
* `knownForTitles` (array of tconsts) – titles the person is known for

**`titles_akas table`**<br>
This dataset is valuable for understanding how movies or TV shows are localized or presented in different regions, languages.
This has 37806944 and contains 6 columns:

* `titleId` (string) - a tconst, an alphanumeric unique identifier of the title
* `ordering` (integer) – a number to uniquely identify rows for a given titleId
* `title` (string) – the localized title
* `region` (string) - the region for this version of the title
* `language` (string) - the language of the title
* `types` (array) - Enumerated set of attributes for this alternative title. One or more of the following: "alternative", "dvd", "festival", "tv", "video", "working", "original", "imdbDisplay". New values may be added in the future without warning
* `attributes` (array) - Additional terms to describe this alternative title, not enumerated
* `isoriginalTitle` (boolean) – 0: not original title; 1: original title

**`title_basics table`**<br>
Contains title details of the tv shows etc. This table has 10306126 entries and has 9 columns:

* `tconst` (string): Unique identifier for media for
* `titleType` (string): Identifies the type of media eg. movie, tvepisode etc
* `primaryTitle` (string): the more popular title / the title used by the filmmakers on promotional materials at the point of release
* `originalTitle` (string) - original title, in the original language 
* `isAdult` (boolean) - 0: non-adult title; 1: adult title
* `startYear` (YYYY) – represents the release year of a title. In the case of TV Series, it is the series start year
* `endYear` (YYYY) – TV Series end year. ‘\N’ for all other title types
* `runtimeMinutes` – primary runtime of the title, in minutes
* `genres` (string array) – includes up to three genres associated with the title

**`title_crew table`**<br>
Contains director details. This table has 10309011 and has 3 columns:
* `tconst` (string) - alphanumeric unique identifier of the title
* `directors` (array of nconsts) - director(s) of the given title
* `writers` (array of nconsts) – writer(s) of the given title

**`title_episode table`**<br>
Contains season and episode details. This table contains 7862280 entries and has 4 columns:
* `tconst` (string) - alphanumeric identifier of episode
* `parentTconst` (string) - alphanumeric identifier of the parent TV Series
* `seasonNumber` (integer) – season number the episode belongs to
* `episodeNumber` (integer) – episode number of the tconst in the TV series

**`title_ratings table`**<br>
Contains user rating and votes. This table contains 1368920 entries and has 3 columns:
* `tconst` (string) - alphanumeric unique identifier of the title
* `averageRating` – weighted average of all the individual user ratings
* `numVotes` - number of votes the title has received

<BR>

## IV) ERD - Entity Relationship Diagram
---

In [27]:
from IPython.display import Image 
Image(url="https://drive.google.com/uc?export=view&id=1pdtKn47t6LtSwQ2my7h9ibVY-Dg28CU7", width = 600, height = 800)


### Summary of ERD

The IMDB dataset consists of 7 distinct tables containing information on over 1 million movies, TV Series, Video Games and much more. These tables are related to each other via two keys - tconst (unique ID for each piece of media, such as a movie or a tv series) & nconst (unique ID for each individual, such as actors or directors). 

The titles_basic table serves as the primary source of truth, containing basic information for all titles (tconst) on IMDB. The remaining 6 tables contain additional information for each title (tconst), or the crew (nconsts) associated with each title. For example, title_akas contains geographic information for each title (tconst), such as coutnry of origin & language.

In addition to this, we have 3 tables - names_basic, title_crew, title_principals - which provide information on the individuals that worked on the titles present in the titles_basic table. This includes info on cast, technical crew, writers and more.

<br>

## V) Exploratory Data Analysis
---

##### Create a separate table to filter the category specific data (here:tvepisodes, tvseries and tvminiseries) and date range from 2000 onwards.

In [1]:
%%bigquery
CREATE OR REPLACE TABLE `ba775team08.imdb_non_commercial_datasets_11_11.title_basic_Tvseries_Tvminiseries_Tvepisodes` AS
SELECT * FROM `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` 
WHERE titleType= "tvEpisode" OR titleType= "tvSeries" OR titleType= "tvMiniSeries" AND startYear > '2000'

Query is running:   0%|          |

### Question 1. Which countries have produced TV shows with the highest popularity and ratings since 2000?

#### Creating view by joining required table 

In [10]:
%%bigquery
CREATE OR REPLACE VIEW `imdb_non_commercial_datasets_11_11.region_rat_pop` AS
WITH series AS
(SELECT * FROM `ba775team08.imdb_non_commercial_datasets_11_11.title_basic_Tvseries_Tvminiseries_Tvepisodes` 
WHERE titleType = 'tvSeries'
AND startYear not like '%\\\\%'),
region AS
(SELECT * FROM `ba775team08.imdb_non_commercial_datasets_11_11.title_akas_new`
WHERE region NOT LIKE '%\\\\%'
AND ordering = 2),
rating AS
(SELECT * FROM `ba775team08.imdb_non_commercial_datasets_11_11.title_ratings`)

SELECT A.tconst, A.primaryTitle, A.startYear, B.region, C.averageRating, c.numVotes
FROM series A
INNER JOIN region B
ON A.tconst = B.titleId
INNER JOIN rating C
ON A.tconst = C.tconst;

SELECT * from `imdb_non_commercial_datasets_11_11.region_rat_pop`

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,tconst,primaryTitle,startYear,region,averageRating,numVotes
0,tt0443361,Con,2005,US,4.7,314
1,tt7236776,Dani & Flo,2017,ES,3.6,19
2,tt0065343,San Francisco International Airport,1970,US,2.6,154
3,tt0368494,Fame,2003,US,3.1,28
4,tt12354316,Group,2020,PH,9.1,36
...,...,...,...,...,...,...
55674,tt2720708,Ainak Wala Jin,1993,IN,8.9,200
55675,tt0233810,Der goldene Schuß,1964,XWG,8.9,47
55676,tt0803091,Trailer Court Justice,2006,US,8.9,10
55677,tt0831187,La légende des sciences,1997,FR,8.9,9


**The above SQL query creates a view with all TV Series since the year 2000, along with their region (for regional analysis), IMDB rating, number of votes (proxy for popularity), and start year.**

#### Top 5 by  Popularity (numVotes) - Since 2000

In [13]:
%%bigquery
SELECT region, SUM(numVotes) total_votes FROM `imdb_non_commercial_datasets_11_11.region_rat_pop`
WHERE startYear LIKE '20%' 
GROUP BY region
ORDER BY total_votes DESC
LIMIT 5;

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,region,total_votes
0,IN,6996092
1,US,6838931
2,CA,6540746
3,GB,4607874
4,PL,4257316


**India, USA, Canada, Great Britain and Poland produce the most popular TV Shows, with shows from these regions receiving an aggregate of 4-7 million in votes each from IMDB users.**

#### Top 5 -  Average Rating - Since 2000

In [43]:
%%bigquery
SELECT region, AVG(averageRating) agg_avg_rating FROM `imdb_non_commercial_datasets_11_11.region_rat_pop`
WHERE startYear LIKE '20%' 
GROUP BY region
HAVING SUM(numVotes) > 100000
ORDER BY agg_avg_rating DESC
LIMIT 5;

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,region,agg_avg_rating
0,MK,7.953846
1,LV,7.6
2,BA,7.566667
3,BY,7.385714
4,SI,7.368571


**Smaller countries, such as Macedonia, Belarus and Slovenia seem to have the highest rated TV shows. This is likely due to the small number of shows produced there receiving high ratings from local viewers.**

#### Findings:

* English is a major language in 4 out of the Top 5 regions by popularity. The dominance of english-language media in the movie industry is reflected in the television industry as well.

* High popularity does not equal high ratings. The Top 5 regions by popularity and the Top 5 by ratings have no overlap. Shows from niche regions tend to be higher rated, possibly because these markets have smaller, but more dedicated audiences that lead to inflated ratings compared to their peers from larger regions.

### Question 2. Which shows are the highest rated since 2000?

In [44]:
%%bigquery
SELECT titleType, primaryTitle, startYear, endYear, genres, averageRating, numVotes
FROM `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` AS A
LEFT JOIN `ba775team08.imdb_non_commercial_datasets_11_11.title_ratings` AS B
on A.tconst = B.tconst
WHERE B.averageRating is not NULL 
    and titleType = 'tvSeries' 
    and numVotes > 100000
    and startYear LIKE '20%'
order by averageRating desc
limit 20
;

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,titleType,primaryTitle,startYear,endYear,genres,averageRating,numVotes
0,tvSeries,Breaking Bad,2008,2013,"Crime,Drama,Thriller",9.5,2060194
1,tvSeries,The Wire,2002,2008,"Crime,Drama,Thriller",9.3,364872
2,tvSeries,Avatar: The Last Airbender,2005,2008,"Action,Adventure,Animation",9.3,346805
3,tvSeries,Aspirants,2021,\N,Drama,9.2,306603
4,tvSeries,Game of Thrones,2011,2019,"Action,Adventure,Drama",9.2,2221092
5,tvSeries,Fullmetal Alchemist: Brotherhood,2009,2010,"Action,Adventure,Animation",9.1,189130
6,tvSeries,Sherlock,2010,2017,"Crime,Drama,Mystery",9.1,973109
7,tvSeries,Attack on Titan,2013,2023,"Action,Adventure,Animation",9.1,464970
8,tvSeries,Rick and Morty,2013,\N,"Adventure,Animation,Comedy",9.1,575993
9,tvSeries,Better Call Saul,2015,2022,"Crime,Drama",9.0,614809


#### Findings:

* High Genre Diversity:
The top 20 list of highest-rated TV shows consists of Action, Adventure, Drama, Crime and Animation genres, reflecting a diverse range of themes and storytelling.

* Significant Presence of Animated Shows:
A notable observation is that 40% of the TV shows in the Top 20 list are animated. This demonstrates the increasing popularity and appeal of animated content among viewers.

* Exclusive English Language Content:
All TV shows listed in the Top 20 are exclusively in the English language. This indicates a strong preference or recognition for English-language productions in terms of quality and viewership.

* Comprehensive Ratings and Popularity:
Alongside genre and language, the list provides insights into each show's start and end years, total runtime, average rating, and number of votes. The high number of votes serves as a proxy for the show's popularity among audiences.

<br>

### Question 3. Which genre is the most popular?

In [8]:
%%bigquery
SELECT
  basics.genres,
  COUNT(ratings.tconst) AS titleCount
FROM
  `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` AS basics 
INNER JOIN
  `ba775team08.imdb_non_commercial_datasets_11_11.title_ratings` AS ratings ON basics.tconst = ratings.tconst
WHERE
  basics.startYear like '20%'
  and basics.titleType = 'tvSeries'
  and genres not like '%N'
GROUP BY
  basics.genres
ORDER BY
  titleCount desc limit 10

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,genres,titleCount
0,Comedy,9719
1,Reality-TV,6679
2,Drama,6614
3,Documentary,5661
4,Talk-Show,2212
5,"Drama,Romance",1500
6,"Comedy,Drama",1346
7,Game-Show,1182
8,Animation,1179
9,Family,980


#### Findings:
* **Pervasive Presence of 'Comedy' Category:**
The analysis underscores that the 'Comedy' genre emerged as the most prevalent category across a wide spectrum of entertainment media available within the dataset. This also suggest its unversal appeal and cultural significance resonating strongly with viewers.

<br>

### Question 4. Which region produces the most shows?

In [101]:
%%bigquery
SELECT
  akas.region,
  COUNT(ratings.tconst) AS titleCount
FROM
  `ba775team08.imdb_non_commercial_datasets_11_11.title_akas` AS akas
INNER JOIN
  `ba775team08.imdb_non_commercial_datasets_11_11.title_ratings` AS ratings ON akas.titleId = ratings.tconst
WHERE
  akas.region NOT like '%N'
GROUP BY
  akas.region
ORDER BY
  titleCount DESC 
LIMIT 10



Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,region,titleCount
0,US,572789
1,GB,238383
2,DE,215339
3,FR,212212
4,JP,202537
5,ES,186162
6,IT,173630
7,CA,150147
8,PT,130420
9,XWW,115429


#### Findings:
* **Leading Producers of Entertainment Media:** 
The analysis distinctly identifies the United States as the primary producer of entertainment media. Following closely behind are Great Britain, Germany, and France, in that order.



<br>

### Question 5. What are the highest rated Episodes of all time and which shows do those episodes belong to?

#### Creating a view to simplify this query

In [17]:
%%bigquery
CREATE OR REPLACE VIEW `imdb_non_commercial_datasets_11_11.new_title_parentTconst` AS
WITH pTconst AS (
    SELECT episode.*, highest_rated.tconst AS rated_tconst, highest_rated.primaryTitle AS rated_primaryTitle, highest_rated.averageRating AS rated_averageRating, highest_rated.numVotes AS rated_numVotes, highest_rated.startYear AS rated_startYear, highest_rated.runtimeMinutes AS rated_runtimeMinutes, highest_rated.genres AS rated_genres, highest_rated.titleType AS rated_titleType
    FROM `ba775team08.imdb_non_commercial_datasets_11_11.title_episode` AS episode
    LEFT JOIN (
        SELECT tconst, primaryTitle, averageRating, numVotes, startYear, runtimeMinutes, genres, titleType
        FROM `ba775team08.imdb_non_commercial_datasets_11_11.title_ratings`
        LEFT JOIN `ba775team08.imdb_non_commercial_datasets_11_11.title_basics`
        USING (tconst)
        WHERE CAST(numVotes AS FLOAT64) > 4000
            AND runtimeMinutes != '\\N'
            AND titleType = 'tvEpisode'
            AND startYear > '2000'
        ORDER BY averageRating DESC
    ) AS highest_rated ON episode.tconst = highest_rated.tconst
    WHERE highest_rated.averageRating IS NOT NULL
    ORDER BY highest_rated.averageRating DESC
)
SELECT * FROM pTconst;


Query is running:   0%|          |

In [18]:
%%bigquery

(SELECT primaryTitle, rated_primaryTitle, rated_averageRating, rated_numVotes, rated_runtimeMinutes, genres FROM `ba775team08.imdb_non_commercial_datasets_11_11.new_title_parentTconst` as pT 
inner join `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` as title
on pT.parentTconst = title.tconst
order by rated_averageRating desc, rated_numVotes desc)
limit 20

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,primaryTitle,rated_primaryTitle,rated_averageRating,rated_numVotes,rated_runtimeMinutes,genres
0,Breaking Bad,Ozymandias,10.0,208203,47,"Crime,Drama,Thriller"
1,Game of Thrones,Battle of the Bastards,9.9,220604,60,"Action,Adventure,Drama"
2,Game of Thrones,The Winds of Winter,9.9,157054,68,"Action,Adventure,Drama"
3,Breaking Bad,Felina,9.9,135029,55,"Crime,Drama,Thriller"
4,Game of Thrones,The Rains of Castamere,9.9,114663,51,"Action,Adventure,Drama"
5,Breaking Bad,Face Off,9.9,71877,50,"Crime,Drama,Thriller"
6,Better Call Saul,Plan and Execution,9.9,54213,50,"Crime,Drama"
7,Succession,Connor's Wedding,9.9,33682,62,"Comedy,Drama"
8,Mr. Robot,407 Proxy Authentication Required,9.9,33130,56,"Crime,Drama,Thriller"
9,BoJack Horseman,The View from Halfway Down,9.9,20131,26,"Animation,Comedy,Drama"


#### Findings

**Insights on the Highest-Rated Episodes and Their Hosting TV Shows**:

* Selection Based on Widespread Popularity:
By specifically filtering episodes with votes exceeding 20,000, the query narrows its focus to identify episodes that have garnered significant popularity and substantial viewer participation, indicating widespread acclaim.

* Inclusion of TV Show Context:
An additional column from another table is incorporated, displaying the TV show associated with the highest-rated episode. This added context provides insight into the hosting TV series, offering a broader perspective on the top-rated content.

* Dominance of a few TV shows:
We can see that the Top 20 episodes of all time are dominated by Game of Thrones, Breaking Bad and Attack on Titan, indicating the popularity of these shows


<br>

### Question 6. What are the highest Rated Comedy Episodes of the past two decades and which shows do those episodes belong to?

 #### Creating a View to simplify this query

In [22]:
%%bigquery
CREATE OR REPLACE VIEW `imdb_non_commercial_datasets_11_11.new_title_parentTconst_comedy` AS
WITH pTconst AS (
    SELECT episode.*, highest_rated.tconst AS rated_tconst, highest_rated.primaryTitle AS rated_primaryTitle, highest_rated.averageRating AS rated_averageRating, highest_rated.numVotes AS rated_numVotes, highest_rated.startYear AS rated_startYear, highest_rated.runtimeMinutes AS rated_runtimeMinutes, highest_rated.genres AS rated_genres, highest_rated.titleType AS rated_titleType
    FROM `ba775team08.imdb_non_commercial_datasets_11_11.title_episode` AS episode
    LEFT JOIN (
        SELECT tconst, primaryTitle, averageRating, numVotes, startYear, runtimeMinutes, genres, titleType
        FROM `ba775team08.imdb_non_commercial_datasets_11_11.title_ratings`
        LEFT JOIN `ba775team08.imdb_non_commercial_datasets_11_11.title_basics`
        USING (tconst)
        WHERE CAST(numVotes AS FLOAT64) > 4000
            AND runtimeMinutes != '\\N'
            AND titleType = 'tvEpisode'
            AND startYear like '20%'
            AND genres = 'Comedy'
        ORDER BY averageRating DESC
    ) AS highest_rated ON episode.tconst = highest_rated.tconst
    WHERE highest_rated.averageRating IS NOT NULL
    ORDER BY highest_rated.averageRating DESC
)
SELECT * FROM pTconst;


Query is running:   0%|          |

In [23]:
%%bigquery

(SELECT primaryTitle, rated_primaryTitle, rated_averageRating, rated_numVotes, rated_runtimeMinutes, genres FROM `ba775team08.imdb_non_commercial_datasets_11_11.new_title_parentTconst_comedy` as pT 
inner join `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` as title
on pT.parentTconst = title.tconst
order by rated_averageRating desc, rated_numVotes desc)
limit 20

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,primaryTitle,rated_primaryTitle,rated_averageRating,rated_numVotes,rated_runtimeMinutes,genres
0,The Office,Finale,9.8,20853,51,Comedy
1,The Office,"Goodbye, Michael",9.8,17571,36,Comedy
2,Community,Modern Warfare,9.8,13776,20,Comedy
3,Community,Remedial Chaos Theory,9.8,13255,22,Comedy
4,It's Always Sunny in Philadelphia,Charlie Work,9.8,9824,23,Comedy
5,The Office,Stress Relief,9.7,16044,42,Comedy
6,It's Always Sunny in Philadelphia,The Nightman Cometh,9.7,7769,22,Comedy
7,Community,A Fistful of Paintballs,9.6,8241,30,Comedy
8,Community,Advanced Dungeons & Dragons,9.5,7373,22,Comedy
9,Community,For a Few Paintballs More,9.5,6784,30,Comedy


#### Findings:

**Insights on the Highest-Rated Comedy Episodes and Their Respective TV Shows:**

* Refinement for Widespread Popularity:
By specifically considering episodes with votes surpassing 20,000, the query focuses on identifying episodes that have garnered widespread popularity and significant viewer engagement.

* Inclusion of Hosting TV Show Information:
Notably, the query incorporates a column from another table, displaying the TV Show that hosts the highest-rated episode. This additional context provides insight into the TV series associated with the top-rated comedic content.

* Dominance of a few TV shows:
We can see that the Top 20 episodes of all time are dominated by 4 shows: The Office, Community, Its Always Sunny in Philadelphia, indicating their popularity.


<br>

### Question 7. Determining most successfull Directors based on the ratings and votings for their TV series 

In [129]:
%%bigquery
CREATE  OR REPLACE TABLE `imdb_non_commercial_datasets_11_11.title_parentTconst` AS
SELECT DISTINCT parentTconst FROM `ba775team08.imdb_non_commercial_datasets_11_11.title_episode` 

Query is running:   0%|          |

#### Creating a new table by joining the reqired datasets for determining the Director's ratings for series 

In [28]:
%%bigquery
CREATE OR REPLACE TABLE `imdb_non_commercial_datasets_11_11.Director_TvSeries_miniSeries_Ratings` AS
WITH Numbers AS 
(SELECT ROW_NUMBER() OVER () - 1 AS n
FROM `imdb_non_commercial_datasets_11_11.title_parentTconst`
LIMIT 20)

SELECT A.parentTconst,B.titleType,B.originalTitle,C.averageRating,C.numVotes,
SPLIT(D.directors, ',')[OFFSET(N.n)] AS director_nconst,E.primaryName AS director_name,B.startYear
FROM
`imdb_non_commercial_datasets_11_11.title_parentTconst` A
INNER JOIN
`ba775team08.imdb_non_commercial_datasets_11_11.title_basic_Tvseries_Tvminiseries_Tvepisodes` B 
ON A.parentTconst = B.tconst
INNER JOIN
`ba775team08.imdb_non_commercial_datasets_11_11.title_ratings` C 
ON A.parentTconst = C.tconst
INNER JOIN
`ba775team08.imdb_non_commercial_datasets_11_11.title_crew` D 
ON A.parentTconst = D.tconst
CROSS JOIN Numbers N
INNER JOIN
`ba775team08.imdb_non_commercial_datasets_11_11.names_basic` E ON SPLIT(D.directors, ',')[OFFSET(N.n)] = E.nconst
WHERE
C.averageRating IS NOT NULL 
AND N.n < ARRAY_LENGTH(SPLIT(D.directors, ','))
AND B.startYear LIKE '20%' 

Query is running:   0%|          |

#### Creating a new table to determine the Director's average ratings for all the series worked 

In [29]:
%%bigquery
CREATE OR REPLACE TABLE  `ba775team08.imdb_non_commercial_datasets_11_11.Director_AVGRatings` AS
WITH Series_Names AS 
(SELECT director_nconst,
STRING_AGG(originalTitle, ', ') AS Series_Names
FROM `ba775team08.imdb_non_commercial_datasets_11_11.Director_TvSeries_miniSeries_Ratings`
GROUP BY director_nconst)
SELECT 
A.director_nconst,B.primaryName AS director_name,A.titleType,AVG(A.averageRating) AS director_avg_rating,
COUNT(A.parentTconst) AS number_of_titles,SUM(A.numVotes) AS total_votes,C.Series_Names
FROM 
`ba775team08.imdb_non_commercial_datasets_11_11.Director_TvSeries_miniSeries_Ratings` A
INNER JOIN 
`ba775team08.imdb_non_commercial_datasets_11_11.names_basic` B ON A.director_nconst = B.nconst
LEFT JOIN 
Series_Names C ON A.director_nconst = C.director_nconst
GROUP BY 
A.director_nconst, B.primaryName,A.titleType,C.Series_Names;

Query is running:   0%|          |

In [30]:
%%bigquery
SELECT  * 
FROM `ba775team08.imdb_non_commercial_datasets_11_11.Director_AVGRatings`
WHERE titleType = "tvSeries" AND number_of_titles>=10 AND Total_votes >=100000
ORDER BY director_avg_rating DESC
LIMIT 10

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,director_nconst,director_name,titleType,director_avg_rating,number_of_titles,total_votes,Series_Names
0,nm0139081,Benjamin Caron,tvSeries,8.3,10,1502258,"Scott & Bailey, Derren Brown: Trick or Treat, ..."
1,nm0887700,Timothy Van Patten,tvSeries,8.263636,11,3737358,"Into the West, Pasadena, Deadwood, Boardwalk E..."
2,nm0851930,Alan Taylor,tvSeries,8.2,19,4237525,"House of the Dragon, Blue Eye Samurai, Intervi..."
3,nm0528186,Euros Lyn,tvSeries,8.2,15,2459991,"Belonging, Happy Valley, Cutting It, All About..."
4,nm0267497,Julian Farino,tvSeries,8.127273,11,1316191,"The Office, Bob & Rose, Ballers, Giri/Haji, Th..."
5,nm0661751,Dean Parisot,tvSeries,8.066667,12,1115830,"Modern Family, The Deep End, The Tick, Santa C..."
6,nm0132600,Jonny Campbell,tvSeries,8.054545,11,316123,"The Casual Vacancy, Shameless, Glasgow Kiss, D..."
7,nm1047532,Brian Kirk,tvSeries,8.028571,14,3557721,"Great Expectations, Father & Son, The Riches, ..."
8,nm0002399,Alik Sakharov,tvSeries,8.023529,17,5222776,"Marco Polo, Counterpart, The Witcher, Ozark, G..."
9,nm0590889,Daniel Minahan,tvSeries,7.994444,18,4122078,"John from Cincinnati, Life on Mars, Halston, H..."


#### Findings :

* The above analysis helps us identify the success for the directors from the TV series they have worked on based on the average ratings from 2000. When the analysis was done just on the basis of the ratings ,there were more than 200 top performing directors. To find out the top 10 directors, the conditions for number of votings (more than 100,000) and number of TV series(10) were applied. By doing this, we will be able to identify the top consistently high-performing directors in these categories.This analysis can help us understand which directors have the most significant impact on the success of TV series as judged by audience ratings and votings.

* A director's rank in the top 10 list might not have a direct impact with the number of titles they have worked on. A higher number of titles can mean their experience but it does not mean they get the higher rank. The quality of work also playes a crucial role for the ratings of the director. Benjamin Caron has titles (10 titles) less than Daniel Minahan (18 titles) but, has ranked more successfull. A director with fewer but highly-rated titles could outrank one with more titles but lower ratings.

<br>

### Question 8. What are the Top-rated TV series genre across different Regions and how is the genre preference now different from 2000 

In [31]:
%%bigquery
WITH GenreRanking AS (
SELECT akas.region, basics.genres,
ROUND(AVG(ratings.averageRating), 2) AS avgRating,
ROW_NUMBER() OVER(PARTITION BY akas.region ORDER BY AVG(ratings.averageRating) DESC) as Rank
FROM `imdb_non_commercial_datasets_11_11.title_akas` AS akas
INNER JOIN
`imdb_non_commercial_datasets_11_11.title_basics` AS basics ON akas.titleId = basics.tconst
INNER JOIN
`imdb_non_commercial_datasets_11_11.title_ratings` AS ratings ON basics.tconst = ratings.tconst
WHERE akas.region IS NOT NULL AND basics.titleType = "tvSeries" 
AND akas.region NOT LIKE '%\\\\%' AND basics.startYear = '2000'
GROUP BY akas.region, basics.genres)
SELECT
region,genres,avgRating
FROM GenreRanking
WHERE Rank = 1 AND genres NOT LIKE '%N'
ORDER BY avgRating DESC
LIMIT 10

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,region,genres,avgRating
0,PK,Romance,9.6
1,IN,Romance,9.5
2,RS,Family,9.5
3,AE,"Action,Biography,Drama",9.3
4,SY,"Action,Biography,Drama",9.3
5,EG,"Action,Biography,Drama",9.3
6,DE,"Action,Biography,Drama",9.3
7,VN,Drama,9.2
8,XWW,"Animation,Comedy,Family",9.1
9,US,"Adventure,Documentary,Family",9.1


In [32]:
%%bigquery
WITH GenreRanking AS (
SELECT 
    akas.region, basics.genres,
    ROUND(AVG(ratings.averageRating), 2) AS avgRating,
    ROW_NUMBER() OVER(PARTITION BY akas.region 
ORDER BY 
    AVG(ratings.averageRating) DESC) as Rank
FROM 
    `imdb_non_commercial_datasets_11_11.title_akas` AS akas
INNER JOIN
 `imdb_non_commercial_datasets_11_11.title_basics` AS basics 
ON akas.titleId = basics.tconst
INNER JOIN
 `imdb_non_commercial_datasets_11_11.title_ratings` AS ratings 
ON basics.tconst = ratings.tconst
WHERE 
    akas.region IS NOT NULL AND basics.titleType = "tvSeries" 
    AND akas.region NOT LIKE '%\\\\%' AND basics.startYear LIKE '202_'
GROUP BY 
    akas.region, basics.genres)


SELECT region,genres,avgRating
FROM GenreRanking
WHERE Rank = 1 AND genres NOT LIKE '%N'
ORDER BY avgRating DESC
LIMIT 10; 

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,region,genres,avgRating
0,SI,Adventure,9.8
1,CA,"Sport,Talk-Show",9.8
2,TZ,"Adventure,Drama",9.8
3,BR,"Animation,Drama",9.8
4,NG,Talk-Show,9.7
5,AU,"Comedy,Crime,Reality-TV",9.7
6,ZA,"Comedy,Crime,Reality-TV",9.7
7,US,"Comedy,Crime,Reality-TV",9.7
8,AO,Drama,9.7
9,NL,"Comedy,Crime,Reality-TV",9.7


#### Findings: 

* In the year 2000, the top genres are more varied across different regions. The genres include Romance, Family, Action,Biography,Drama,Animation, Comedy,Adventure and Documentary. Certain regions seem to have a strong preference for specific genres. For example, Romance is highly rated in Pakistan (PK) and India (IN), while Action/Biography/Drama is popular in Syria (SY), Egypt (EG), Germany (DE), and the United Arab Emirates (AE).

* By 2023, there is a clear difference in genre preferences. The top genres now include Adventure, Sport,Talk-Show, Animation,Drama,Comedy,Crime and Reality-TV. Comedy,Crime,Reality-TV is highly rated in Australia (AU), South Africa (ZA), United States (US) and Netherlands (NL).

* The ratings when compares to year 2000, remnained same with most genres receiving around 9.7 to 9.8 rankings. This suggests that viewer's engagement and satisfaction with TV series have remained strong over the years.

<br>

### Question 9. How do average ratings of TV series differ when they are broadcasted outside their original region?

In [24]:
%%bigquery
SELECT
  original.region AS originalRegion,
  original.language AS originalLanguage,
  akas.region AS popularityRegion,
  basics.genres,
  ROUND(AVG(ratings.averageRating), 2) AS avgRating,
  COUNT(DISTINCT basics.tconst) AS titleCount
FROM
  `imdb_non_commercial_datasets_11_11.title_basics` AS basics
INNER JOIN
  `imdb_non_commercial_datasets_11_11.title_akas` AS original ON basics.tconst = original.titleId AND original.isOriginalTitle = '1'
INNER JOIN
  `imdb_non_commercial_datasets_11_11.title_akas` AS akas ON basics.tconst = akas.titleId
INNER JOIN
  `imdb_non_commercial_datasets_11_11.title_ratings` AS ratings ON basics.tconst = ratings.tconst
WHERE
  basics.titleType IN ('tvSeries', 'tvEpisode')
  AND original.region NOT LIKE '%\\\\%'
  AND akas.region NOT LIKE '%\\\\%'
  AND original.region != akas.region
  AND original.region NOT LIKE '%\\\\%' AND original.language NOT LIKE  '%\\\\%' 
GROUP BY
  originalRegion, originalLanguage, popularityRegion, basics.genres

ORDER BY
  originalRegion, avgRating DESC
LIMIT 10;

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,originalRegion,originalLanguage,popularityRegion,genres,avgRating,titleCount
0,US,en,IT,"Family,Game-Show",7.5,1
1,US,en,SUHH,"Family,Game-Show",7.5,1
2,US,en,JP,"Family,Game-Show",7.5,1
3,US,en,GB,"Family,Game-Show",7.5,1
4,US,en,DE,"Family,Game-Show",7.5,1
5,US,en,CA,"Family,Game-Show",7.5,1
6,US,en,AU,"Family,Game-Show",7.5,1
7,US,en,IN,"Family,Game-Show",7.5,1
8,US,en,PH,"Family,Game-Show",7.5,1


#### Findings: 

* Regardless the country of origin, TV series with genres like Family and Game-Show  maintained consistent ratings across various regions. By this, it is said that the international success lies in the content which is created by addressing universal themes and emotions, capable of bridging cultural divides.


<br>

### Question 10. What are the top 5 TV Series with the highest number of episodes?

In [40]:
%%bigquery
WITH EpisodeCounts AS (
    SELECT
        tb.tconst AS tvSeriesId,
        tb.primaryTitle AS tvSeriesTitle,
        COUNT(te.tconst) AS numberOfEpisodes
    FROM
        `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` AS tb
    JOIN
        `ba775team08.imdb_non_commercial_datasets_11_11.title_episode` AS te ON tb.tconst = te.parentTconst
    WHERE
        tb.titleType = 'tvSeries'
    GROUP BY
        tb.tconst, tb.primaryTitle
)

SELECT
    tvSeriesId,
    tvSeriesTitle,
    numberOfEpisodes
FROM
    EpisodeCounts
ORDER BY
    numberOfEpisodes DESC
LIMIT 5;


Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,tvSeriesId,tvSeriesTitle,numberOfEpisodes
0,tt12164062,NRK Nyheter,18593
1,tt0058796,Days of Our Lives,14729
2,tt0069658,The Young and the Restless,12791
3,tt0056758,General Hospital,12455
4,tt0053494,Coronation Street,10744


#### Findings:

* Among the TV series listed, 'NRK Nyheater' commands the highest episode count, an impressive 18,593 episodes. Following closely behind are 'Days of our Lives' with 14,729 episodes, 'The Young and the Restless' with 12,791 episodes, 'General Hospital' with 12,455 episodes, and 'Coronation Street' with 10,744 episodes.

<br>

### Question 11. How has the average number of episodes in TV Shows changed since the launch of streaming platforms?

#### 2003-2012

In [5]:
%%bigquery
SELECT
    ROUND(AVG(numberOfEpisodes)) AS averageEpisodes
FROM (
    SELECT
        tb.tconst AS tvSeriesId,
        COUNT(te.tconst) AS numberOfEpisodes
    FROM
        `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` AS tb
    JOIN
        `ba775team08.imdb_non_commercial_datasets_11_11.title_episode` AS te ON tb.tconst = te.parentTconst
    WHERE
        tb.titleType = 'tvSeries'
        AND SAFE_CAST(tb.startYear AS INT64) BETWEEN 2003 AND 2013
        AND tb.endYear LIKE '20%'
    GROUP BY
        tb.tconst
) EpisodeCounts;


Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,averageEpisodes
0,61.0


#### 2013-2022

In [47]:
%%bigquery
SELECT
    ROUND(AVG(numberOfEpisodes)) AS averageEpisodes
FROM (
    SELECT
        tb.tconst AS tvSeriesId,
        COUNT(te.tconst) AS numberOfEpisodes
    FROM
        `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` AS tb
    JOIN
        `ba775team08.imdb_non_commercial_datasets_11_11.title_episode` AS te ON tb.tconst = te.parentTconst
    WHERE
        tb.titleType = 'tvSeries'
        AND SAFE_CAST(tb.startYear AS INT64) BETWEEN 2014 AND 2023
        AND tb.endYear LIKE '20%'
    GROUP BY
        tb.tconst
) EpisodeCounts;


Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,averageEpisodes
0,31.0


#### Findings: 
* From these two tables we can see that the number of episodes on an average has almost halved from 2014-present in comparison to the number of episodes in 2003-2013.The streaming services changed the whole dynamics of the TV industry by dropping entire seasons of 13 episodes in 2013 to compete with the 22-24 episode year long run tv shows of network TV.

<br>

### Question 12. How has the average runtime of episodes changed since debut of streaming services?

#### 2003-2012

In [37]:
%%bigquery
SELECT
    ROUND(SUM(total_runTime)/SUM(numberOfEpisodes)) AS averageRuntime
FROM (
    SELECT
        tb.tconst AS tvSeriesId,
        COUNT(te.tconst) AS numberOfEpisodes,
        SUM(SAFE_CAST(tb.runtimeMinutes as INT64)) as total_runTime

    FROM
        `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` AS tb
    JOIN
        `ba775team08.imdb_non_commercial_datasets_11_11.title_episode` AS te ON tb.tconst = te.parentTconst
    WHERE
        tb.titleType = 'tvSeries'
        AND SAFE_CAST(tb.startYear AS INT64) BETWEEN 2003 AND 2014
    GROUP BY
        tb.tconst
) EpisodeCounts;

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,averageRuntime
0,26.0


#### 2013-2022

In [38]:
%%bigquery
SELECT
    ROUND(SUM(total_runTime)/SUM(numberOfEpisodes)) AS averageRuntime
FROM (
    SELECT
        tb.tconst AS tvSeriesId,
        COUNT(te.tconst) AS numberOfEpisodes,
        SUM(SAFE_CAST(tb.runtimeMinutes as INT64)) as total_runTime

    FROM
        `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` AS tb
    JOIN
        `ba775team08.imdb_non_commercial_datasets_11_11.title_episode` AS te ON tb.tconst = te.parentTconst
    WHERE
        tb.titleType = 'tvSeries'
        AND SAFE_CAST(tb.startYear AS INT64) BETWEEN 2014 AND 2023
    GROUP BY
        tb.tconst
) EpisodeCounts;

Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,averageRuntime
0,19.0


#### Findings: 
* From these two tables we can see that the average episode length has reduced by 25% over the last decade, reflecting the broader trend in the media landscape towards short form content. 

<br>

### Question 13. How does the popularity of different genres vary over the years?

In [6]:
%%bigquery
WITH RankedGenres AS (
  SELECT
    CAST(basics.startYear AS STRING) AS Year,
    basics.genres,
    COUNT(*) AS GenreCount,
    ROW_NUMBER() OVER(PARTITION BY CAST(basics.startYear AS STRING) ORDER BY COUNT(*) DESC) AS rn
  FROM
    `ba775team08.imdb_non_commercial_datasets_11_11.title_basics` AS basics
  WHERE
    basics.titleType = 'tvSeries'  -- Assuming analysis for TV series
    AND CAST(basics.startYear AS STRING) > '2000' AND CAST(basics.startYear AS STRING) <= '2023'  -- Casting startYear as string and comparing
    AND basics.startYear NOT LIKE '%\\\\%'  -- Exclude rows with startYear containing '%\\\\%'
    AND basics.genres NOT LIKE '%\\\\%'  -- Exclude rows with genres containing '%\\\\%'
    AND basics.genres != 'N/A' -- Exclude null or 'N/A' genres
  GROUP BY
    Year, basics.genres
)

SELECT
  Year,
  genres AS TopGenre,
  GenreCount
FROM
  RankedGenres
WHERE
  rn = 1
ORDER BY
  Year desc;


Query is running:   0%|          |

Downloading:   0%|          |

Unnamed: 0,Year,TopGenre,GenreCount
0,2023,Drama,866
1,2022,Drama,1019
2,2021,Comedy,1265
3,2020,Comedy,1717
4,2019,Comedy,1589
5,2018,Comedy,1696
6,2017,Comedy,1923
7,2016,Comedy,2057
8,2015,Comedy,2047
9,2014,Comedy,1862


#### Findings:
- During the early 2000s, there was a noticeable preference for the Documentary genre. However, a shift occurred over the years, indicating a rising popularity in the Comedy genre. This shift might be attributed to the younger generation's inclination towards more easily digestible content rather than the documentary format.

- Additionally, there was a significant decline observed in the production of TV shows between 2020 and 2022. This decline could potentially be associated with the global impact of the COVID-19 pandemic, which might have disrupted or slowed down the production processes within the entertainment industry during that period.

<br>

## VI) Conclusion
---

__Based on our various findings we identified several intriguing patterns, strategies and best practices for television producers to follow for improving the chances of success in the market:__

The TV industry is marked by it's adaptability, as seen in the preferences of it's gobal audience. Ability to create content that is not just high in quality but also by keeping in mind the tatste and preferences of the viewers contributes to the success in this industry.

* __Regional Popularity and Ratings:__ Countries like India, USA, Canada, Great Britain, and Poland emerged as hubs for TV show production, garnering significant attention from IMDb users. However, high popularity doesn't always align with top ratings.Smaller countries like Macedonia, Bosnia, and Slovenia showcased higher-rated TV shows, indicating localized shows with devoted audiences can attain inflated ratings, contrasting the popularity-driven ratings of larger regions.

* __Genre Preferences:__ 'Comedy' emerged as the predominant genre across media, indicating its dominance in the entertainment industry. Regional preferences unveiled strong inclinations towards specific genres; for instance, documentaries in English-speaking nations and action-oriented content in India. As the years passed, regardless of the changing preferences in genres, the quality or appeal of TV series has remained high, keeping viewers engaged and satisfied. It's also indicative of the TV industry's ability to evolve and adapt to changing viewer preferences and global trends.

* __Director Success:__ Directors with a few highly-rated titles might be seen as specialists who deliver quality over quantity.Conversely, directors with a long list of titles may be viewed as versatile and capable of handling diverse genres and themes, even if each individual title doesn’t have top-tier ratings.

* __Shift in TV Dynamics:__ The transition from traditional network TV's lengthy seasons to streaming services releasing shorter seasons with fewer episodes, notably from 22-24 episodes to 13-episode runs, revolutionized the TV landscape. This shift aimed at capturing audience attention by offering more condensed, impactful content.

* __Enhanced Viewer Experience:__ The condensed episode format enhanced the overall viewing experience, fostering a binge-watching culture. Viewers appreciated the concise storytelling, leading to heightened anticipation for each new episode release, thereby contributing to increased show popularity and sustained audience engagement.

* __Meaningful Content Creation:__ The paradigm shift enabled TV producers to focus on crafting more concise, story-driven content. This change allowed for more intricate plot development, character arcs, and focused narratives, leading to increased viewer engagement and sustained interest throughout the show's duration.


<br>

## VII. Challenges
---

* Uploading files to Big Query was an issue as it was not automatically detecting the type of the columns. We had to choose type as "string" and "cast" to change most of the required columns appropriate types.

* Analysing seven datasets to find out meaningful insights was a challenge. To overcome this issue, we had to refine our analysis to Tv series and their episodes only

* Multiple query dataset connection issue: Since it was custom query set, we could not use multiple query set on the same worksheet. It needed tricky joining to make it work. 

* Creating dashboard visualizations directly from our custom SQL query sets posed a challenge because of the limitations in Tableau features. As a workaround, we transformed them into a story format within Tableau.

* Visualizing extremely large datasets, complex queries sometimes affected Tableau's performance, leading to slower rendering and response times in Tableau.



<br>

## VIII) References
---

1. [IMDb Logo](https://en.wikipedia.org/wiki/File:IMDb_TV.png)
2. ChatGPT - to check syntax for certain commands in Bigquery like - SPLIT and Rank Window functions, etc.
3. [IMDB wikipedia](https://en.wikipedia.org/wiki/IMDb)
4. ChatGPT and Grammarly - for grammar and spelling check.
5. [IMDb country code reference](https://help.imdb.com/article/contribution/other-submission-guides/country-codes/G99K4LFRMSC37DCN#)
6. [IMDb data dictionary](https://developer.imdb.com/non-commercial-datasets/)
7. [Miro - ERD Maker Software](https://miro.com/app/board/uXjVNNjFs3M=/)
8. Tableau Desktop, Public for Visualization

<br>