## What are the Golden Globe Awards?
The Golden Globe Awards are accolades bestowed by the 93 members of the Hollywood Foreign Press Association, beginning in January 1944, recognizing excellence in film, both American and international, and in American television.

The annual ceremony at which the awards are presented is a major part of the film industry's awards season, which culminates each year in the Academy Awards. The eligibility period for the Golden Globes corresponds to the calendar year (i.e., January 1 through December 31). The 77th Golden Globe Awards, honoring the best in film and television in 2019, were held on January 5, 2020. [Wiki reference](https://en.wikipedia.org/wiki/Golden_Globe_Awards)
![Golden Globe Awards](https://upload.wikimedia.org/wikipedia/en/0/09/Golden_Globe_Trophy.jpg)
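Since the ceremonies have been held once a year from January 1944 and the 77th took place in January 2020, the ceremony number and the calendar year of the ceremony differ by a fixed offset of 1943, and each ceremony honors the preceding calendar year. A minimal sketch of that arithmetic, assuming no ceremony numbers were skipped (the function names are hypothetical, not from the dataset):

```python
# Ceremony n is held in January of year 1943 + n, assuming one ceremony
# per year starting with the 1st in 1944 (consistent with the 77th in 2020).
FIRST_CEREMONY_YEAR = 1944


def ceremony_year(n: int) -> int:
    """Calendar year in which the n-th ceremony was held (hypothetical helper)."""
    return FIRST_CEREMONY_YEAR - 1 + n


def eligibility_year(n: int) -> int:
    """Calendar year whose films and shows the n-th ceremony honors."""
    return ceremony_year(n) - 1


print(ceremony_year(77), eligibility_year(77))  # 2020 2019
```

Once the data is loaded, `(df['year_award'] - df['ceremony']).eq(1943).all()` would check whether the dataset agrees with this assumption.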

In this notebook, I perform exploratory data analysis (EDA) on the Golden Globe Awards dataset to identify trends and patterns in the data.

I hope you find this kernel helpful; some **<font color='red'>UPVOTES</font>** would be very much appreciated.


In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

### **Importing Required Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

## **Describing the Dataset**

### Loading the dataset

In [None]:
df = pd.read_csv('/kaggle/input/golden-globe-awards/golden_globe_awards.csv')
df.head()

In [None]:
df.tail()

The dataset covers the Golden Globe Awards from 1944, when the first awards were presented, up to the most recent ceremony in 2020.

### Dimensions of dataset

In [None]:
print('Number of rows in the dataset: ', df.shape[0])
print('Number of columns in the dataset: ', df.shape[1])

**The dataset contains the following features**

**1. year_film:** The year in which the film was screened.

**2. year_award:** The year of the award ceremony.

**3. ceremony:** Ceremony Number.

**4. category:** The award category of the nomination.

**5. nominee:** Name of the nominee.

**6. film:** Name of the film.

**7. win:** Whether the nominee won the award.
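A tiny stand-in frame with the same seven columns makes it possible to try the grouping cells below without the Kaggle file. The rows are invented for illustration (only the column names come from the dataset), and `win` is assumed to be boolean, which is what lets `.sum()` count wins and `.count()` count nominations in the cells that follow:

```python
import pandas as pd

# Hypothetical rows: column names match the dataset, values are invented.
toy = pd.DataFrame({
    'year_film':  [2000, 2000, 2000, 2001],
    'year_award': [2001, 2001, 2001, 2002],
    'ceremony':   [58, 58, 58, 59],
    'category':   ['Category 1', 'Category 1', 'Category 2', 'Category 1'],
    'nominee':    ['Nominee A', 'Nominee B', 'Film X', 'Nominee A'],
    'film':       ['Film X', 'Film Y', None, 'Film Z'],
    'win':        [True, False, True, True],
})

# Gap between the film year and the ceremony year (1 for every toy row).
print((toy['year_award'] - toy['year_film']).value_counts())

# On a boolean column, sum() counts True values and count() counts non-null rows.
print(toy.groupby('nominee')['win'].agg(['sum', 'count']))
```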


### Checking for Null values in the dataset

In [None]:
null_values = df.isnull().sum()
null_values = pd.DataFrame(null_values, columns=['Missing Values'])
null_values

There are 1800 missing values in the film column.
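To see where the missing film values sit, one can tally them by category. A sketch on a small invented frame (hypothetical rows, real column names); on the actual data the same expression, `df.loc[df['film'].isna(), 'category'].value_counts()`, applies unchanged:

```python
import pandas as pd

# Invented rows for illustration; only the column names mirror the dataset.
df_demo = pd.DataFrame({
    'category': ['Picture', 'Director', 'Picture', 'Honorary'],
    'nominee':  ['Film P', 'Director Q', 'Film R', 'Person S'],
    'film':     [None, 'Film P', None, None],
})

# Keep only rows whose film is missing, then count them per category.
missing_by_category = df_demo.loc[df_demo['film'].isna(), 'category'].value_counts()
print(missing_by_category)
```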

### Unique award categories
There are 75 distinct categories in which awards have been given so far. They are listed below:

In [None]:
print('------CATEGORIES----------')
# enumerate() numbers each unique category, starting from 0
for idx, category in enumerate(df['category'].unique()):
    print(idx, ': ', category)

**The following awards are no longer given (discontinued):**

**1. Best Documentary Film:** Awarded from 1972 to 1976

**2. Best English-Language Foreign Film:** Awarded from 1957 to 1973

**3. New Star of the Year – Actor:** Awarded from 1948 to 1983

**4. New Star of the Year – Actress:** Awarded from 1948 to 1983

**5. Henrietta Award (World Film Favorite – Female):** Awarded from 1950 to 1979

**6. Henrietta Award (World Film Favorite – Male):** Awarded from 1950 to 1979

**7. Best Film Promoting International Understanding:** Awarded from 1945 to 1963

**8. Golden Globe Award for Best Cinematography:** Awarded from 1948 to 1953, in 1955, and in 1963
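The active years of each category can also be read off the data by taking the earliest and latest `year_award` per category. A sketch on invented rows (hypothetical category names and years; on the real frame, substitute `df` for `awards`). Note that a renamed category would also show up as "ended", so the result is a list of candidates rather than a definitive answer:

```python
import pandas as pd

# Invented rows: category names and years are placeholders, not real data.
awards = pd.DataFrame({
    'category':   ['Old Award', 'Old Award', 'Old Award', 'Current Award', 'Current Award'],
    'year_award': [1950, 1955, 1960, 1950, 2020],
})

# First and last ceremony year in which each category appears (named aggregation).
span = awards.groupby('category')['year_award'].agg(first_year='min', last_year='max')

# Categories whose last appearance predates the latest ceremony in the data
# are candidates for discontinued (or renamed) awards.
latest = awards['year_award'].max()
discontinued = span[span['last_year'] < latest]
print(discontinued)
```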

## **Exploratory Data Analysis**

### Number of unique nominees in the dataset

In [None]:
print('Number of unique nominees for the award: ', df['nominee'].nunique())

### Number of unique films in the dataset

In [None]:
print('Number of unique films in the dataset: ', df['film'].nunique())
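Because the film column has missing values, it helps to know that `nunique()` ignores NaN by default; passing `dropna=False` counts NaN as one extra distinct value. A minimal illustration on an invented Series:

```python
import pandas as pd

# Invented values; None becomes a missing value in the Series.
films = pd.Series(['Film A', 'Film B', None, 'Film A', None])

print(films.nunique())              # 2: missing values are ignored by default
print(films.nunique(dropna=False))  # 3: all missing values count as one extra value
```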

### Nominee who won the most awards

In [None]:
nominee_most_awards = df.groupby('nominee')['win'].sum()
nominee_most_awards = pd.DataFrame(nominee_most_awards[nominee_most_awards == nominee_most_awards.max()])
nominee_most_awards

### Nominee who was nominated the most number of times

In [None]:
most_nominated_nominee = df.groupby('nominee')['win'].count()
pd.DataFrame(most_nominated_nominee[most_nominated_nominee == most_nominated_nominee.max()])

Meryl Streep tops both lists: she has been nominated more times and has won more awards than any other nominee.
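The two cells above can be combined into one table of nominations, wins, and win rate per nominee using pandas named aggregation. A sketch on invented rows, assuming `win` is boolean as the sums above imply (the column names `nominations`, `wins`, and `win_rate` are illustrative choices):

```python
import pandas as pd

# Invented rows; 'win' is assumed to be boolean.
noms = pd.DataFrame({
    'nominee': ['A', 'A', 'A', 'B', 'B', 'C'],
    'win':     [True, False, True, False, False, True],
})

# Nominations = number of rows per nominee, wins = number of True values.
summary = noms.groupby('nominee')['win'].agg(nominations='count', wins='sum')
summary['win_rate'] = summary['wins'] / summary['nominations']

# Most wins first; break ties by number of nominations.
summary = summary.sort_values(['wins', 'nominations'], ascending=False)
print(summary)
```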

### Nominees who won at least 5 awards

In [None]:
more_5_win = df.groupby('nominee')['win'].sum()
more_5_win = more_5_win.reset_index()
more_5_win = more_5_win[more_5_win['win'] >= 5].sort_values(ascending=False, by='win')

# print('There are', more_5_win['win'].count(), 'nominees who won at least 5 awards')
plt.figure(figsize=(20,8))
sns.set_style('whitegrid')
sns.barplot(x='nominee', y='win', data=more_5_win, palette='hls')
plt.title('Nominees who won at least 5 awards', fontsize=14)
plt.xlabel('Nominee', fontsize=14)
plt.ylabel('Total Wins', fontsize=14)
plt.xticks(fontsize=12, rotation=90)
plt.yticks(fontsize=12)
plt.show()
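This cell and the similar ones that follow all repeat the same steps: group, aggregate `win`, keep groups at or above a threshold, and sort. A small helper could remove that repetition; `at_least` is a hypothetical name, not part of the original notebook, and the demo rows are invented:

```python
import pandas as pd


def at_least(frame: pd.DataFrame, key: str, threshold: int, how: str = 'sum') -> pd.DataFrame:
    """Group `frame` by `key`, aggregate the `win` column with `how`
    ('sum' counts wins, 'count' counts nominations), and keep the groups
    whose total reaches `threshold`, largest first. Hypothetical helper."""
    totals = frame.groupby(key)['win'].agg(how).reset_index()
    return totals[totals['win'] >= threshold].sort_values('win', ascending=False)


# Invented rows for a quick check.
noms = pd.DataFrame({
    'nominee': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'win':     [True, True, True, False, True, False, True, True],
})

print(at_least(noms, 'nominee', 2))               # wins: A (3), C (2)
print(at_least(noms, 'nominee', 3, how='count'))  # nominations: A (4)
```

With the real data, `at_least(df, 'nominee', 5)` would correspond to the win filter above, and `at_least(df, 'film', 5)` to the film filter further down.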

### Nominees who were nominated at least 15 times

In [None]:
more_15_nominated = df.groupby('nominee')['win'].count()
more_15_nominated = more_15_nominated.reset_index()
more_15_nominated = more_15_nominated[more_15_nominated['win'] >= 15].sort_values(ascending=False, by='win')

# Here the 'win' column holds count(), i.e. the number of nominations
plt.figure(figsize=(20,8))
sns.set_style('whitegrid')
sns.barplot(x='nominee', y='win', data=more_15_nominated, palette='hls')
plt.title('Nominees who were nominated at least 15 times', fontsize=14)
plt.xlabel('Nominee', fontsize=14)
plt.ylabel('Nomination Count', fontsize=14)
plt.xticks(fontsize=12, rotation=90)
plt.yticks(fontsize=12)
plt.show()

### Film that has won the most awards

In [None]:
film_most_awards = df.groupby('film')['win'].sum()
pd.DataFrame(film_most_awards[film_most_awards == film_most_awards.max()])

### Films/TV shows that have won at least 5 awards

In [None]:
more_5_films = df.groupby('film')['win'].sum()
more_5_films = more_5_films.reset_index()
more_5_films = more_5_films[more_5_films['win'] >= 5].sort_values(ascending=False, by='win')

plt.figure(figsize=(20,8))
sns.set_style('whitegrid')
sns.barplot(x='film', y='win', data=more_5_films, palette='hls')
plt.title('Films/TV shows that have won at least 5 awards', fontsize=12)
plt.xlabel('Film', fontsize=12)
plt.ylabel('Award Count', fontsize=12)
plt.xticks(rotation=90, fontsize=12)
plt.yticks(fontsize=12)
plt.show()

### Film that has won the most awards in a single year

In [None]:
film_awards_year = df.groupby(['film', 'year_award'])['win'].sum()
pd.DataFrame(film_awards_year[film_awards_year == film_awards_year.max()])

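La La Land won 6 awards at the 2017 ceremony in this count. It in fact won a record seven Golden Globes that night; the gap is likely because picture-category rows name the film under nominee and leave film empty, so grouping by film misses them.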
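Grouping by `film` silently drops every row whose film is missing. Assuming (to be verified on the real data) that picture-category rows store the title in `nominee` and leave `film` empty, a fallback key recovers those wins. A sketch on invented rows; the `title` column is a hypothetical addition:

```python
import pandas as pd

# Invented rows: the picture award names the film in 'nominee', not 'film'.
rows = pd.DataFrame({
    'year_award': [2017, 2017, 2017],
    'category':   ['Picture', 'Actor', 'Actress'],
    'nominee':    ['Film M', 'Actor N', 'Actress O'],
    'film':       [None, 'Film M', 'Film M'],
    'win':        [True, True, True],
})

# Use the film title where present, otherwise the nominee (assumed to be the title).
rows['title'] = rows['film'].fillna(rows['nominee'])

naive = rows.groupby(['film', 'year_award'])['win'].sum()   # rows with missing film are dropped
fixed = rows.groupby(['title', 'year_award'])['win'].sum()
print(naive.max(), fixed.max())  # 2 3
```

Rows for person-only honors with no film would then be keyed by the person's name, so in practice this fallback is best restricted to the picture categories.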

### Films that won at least four awards in a single year

In [None]:
film_award_df = pd.DataFrame(film_awards_year).reset_index()
film_four_awards = film_award_df[film_award_df['win'] >= 4].sort_values(ascending=False, by='win')

plt.figure(figsize=(20,8))
sns.set_style('whitegrid')
sns.barplot(x='film', y='win', data=film_four_awards, palette='hls')
plt.title('Films with at least 4 awards in a single year', fontsize=12)
plt.xlabel('Film', fontsize=12)
plt.ylabel('Award Count', fontsize=12)
plt.xticks(rotation=90, fontsize=12)
plt.yticks(fontsize=12)
plt.show()

### Film that got the most nominations in a single year

In [None]:
film_nominations = df.groupby(['film', 'year_award'])['win'].count()
pd.DataFrame(film_nominations[film_nominations == film_nominations.max()]).reset_index()

Nashville received 10 nominations at the 1976 ceremony, the most for any film in a single year in this dataset.
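Going one step further, the most-nominated film at every ceremony can be listed by counting rows per (year, film), sorting, and keeping the first row per year. A sketch on invented rows (hypothetical films and years; on the real data, substitute `df`):

```python
import pandas as pd

# Invented rows: two ceremonies with a few nominations each.
nominations = pd.DataFrame({
    'year_award': [2000, 2000, 2000, 2001, 2001, 2001],
    'film':       ['Film A', 'Film A', 'Film B', 'Film C', 'Film D', 'Film D'],
})

# Nominations per (ceremony year, film); size() counts rows, missing films are dropped.
per_film = nominations.groupby(['year_award', 'film']).size().reset_index(name='nominations')

# Within each year, sort by nominations descending and keep the top film.
top = (per_film.sort_values(['year_award', 'nominations'], ascending=[True, False])
               .drop_duplicates('year_award'))
print(top)
```

Ties within a year keep whichever film sorts first, so a tie-aware version would filter on the per-year maximum instead.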

### Most nominated directors in the Motion Picture category

Each of the directors below has at least five nominations.

In [None]:
# regex=False matches the category text literally instead of as a regular expression
director_motion_picture = df[df['category'].str.contains('Best Director - Motion Picture', regex=False)]
director_motion_picture = director_motion_picture.groupby('nominee')['win'].count().reset_index().sort_values(ascending=False, by='win')
director_motion_picture = director_motion_picture[director_motion_picture['win'] >= 5]

plt.figure(figsize=(15,6))
sns.set_style('whitegrid')
sns.barplot(x='nominee', y='win', data=director_motion_picture, palette='hls')
plt.title('Most nominated directors in the Motion Picture category', fontsize=12)
plt.xlabel('Director', fontsize=12)
plt.ylabel('Nominations', fontsize=12)
plt.xticks(rotation=90, fontsize=12)
plt.yticks(fontsize=12)
plt.show()

Steven Spielberg has the most nominations (12), followed by Martin Scorsese with 9.

### Director who won the most awards in the Motion Picture category

In [None]:
winner_director = df[df['category'].str.contains('Best Director - Motion Picture')]
winner_director = winner_director.groupby('nominee')['win'].sum()
pd.DataFrame(winner_director[winner_director == winner_director.max()])

**Elia Kazan** won four Golden Globe Awards for Best Director in the Motion Picture category.
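The director cells select rows with `str.contains`, which matches any category label that merely includes the text (and treats the pattern as a regular expression unless `regex=False` is passed). When the exact label is known, an equality test is stricter. A sketch on invented labels, where the second label is a hypothetical variant that a substring match would also pick up:

```python
import pandas as pd

# Invented labels; only the first is meant to be the target category.
cats = pd.Series([
    'Best Director - Motion Picture',
    'Best Director - Motion Picture (Hypothetical Variant)',
    'Best Actor - Motion Picture',
])

target = 'Best Director - Motion Picture'

# Substring match, taken literally: any label containing the text counts.
contains = cats.str.contains(target, regex=False)

# Exact match: only labels identical to the target count.
exact = cats.eq(target)

print(contains.sum(), exact.sum())  # 2 1
```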

**I will update the notebook with more analysis and trends in the future**

**Suggestions are welcome**