<h2 align="center">Movies Dataset</h2>

**Converting an Excel file with 3 sheets into a single-sheet CSV file**

In [None]:
import numpy as np
import pandas as pd

**Reading each sheet of the Excel file into its own DataFrame**

In [None]:
s1 = pd.read_excel("Dataset/movies.xls",sheet_name=0)
s1.head(2)

In [None]:
s2 = pd.read_excel("Dataset/movies.xls",sheet_name=1)
s2.head(2)

In [None]:
s3 = pd.read_excel("Dataset/movies.xls",sheet_name=2)
s3.head(2)

**Now that all 3 sheets are imported, let's combine them**

In [None]:
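# Stack the three sheets vertically and renumber the rows from 0
df = pd.concat([s1, s2, s3], ignore_index=True)
# Spot-check: every 2000th row among the first 5000
df[0:5000:2000]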
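
A minimal sketch of what `ignore_index=True` changes in the cell above. The frames `a` and `b` and their contents are made-up stand-ins for the sheets, not data from `movies.xls`. Without the flag each piece keeps its own row labels, so labels repeat after concatenation.

```python
import pandas as pd

# Hypothetical stand-ins for two sheets
a = pd.DataFrame({"Title": ["A", "B"], "Year": [1950, 1960]})
b = pd.DataFrame({"Title": ["C", "D"], "Year": [2001, 2011]})

# Default: each frame keeps its original row labels
kept = pd.concat([a, b])

# ignore_index=True: rows are relabelled 0..n-1
renumbered = pd.concat([a, b], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```

Repeated labels make `df.loc[0]` return several rows, which is usually not what a combined table should do.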

In [None]:
# Saving the converted file in csv format
df.to_csv("Dataset/movies-comb.csv", index=False)

In [None]:
# Import the new file
df = pd.read_csv("Dataset/movies-comb.csv")
df.head(2)

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.columns

In [None]:
# Finding all missing values
df.isna().sum()

In [None]:
# All records of year 2005
df[df.Year==2005]

In [None]:
# Genre total counts
df.Genres.value_counts()

In [None]:
# Unique genres, sorted; missing values are dropped first so np.unique only compares strings
np.unique(df.Genres.dropna())
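
As a small illustration of the unique-values step, `np.unique` returns its result sorted, while the pandas method `Series.unique` keeps values in order of first appearance. The series `genres` below is a made-up example, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical genre column
genres = pd.Series(["Drama", "Action", "Drama", "Comedy"])

sorted_unique = list(np.unique(genres))      # alphabetical order
appearance_unique = list(genres.unique())    # order of first appearance

print(sorted_unique)
print(appearance_unique)
```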

In [None]:
# Showing records having Language 'Portuguese' & Country 'UK'
df[(df.Language=="Portuguese")&(df.Country=="UK")]

In [None]:
# Showing movies from Japan with a duration of less than 100
df[(df.Duration<100)&(df.Country=="Japan")]

In [None]:
# Showing records directed by Frank Capra with a budget over 1 million
df[(df.Budget>1000000)&(df.Director=="Frank Capra")]
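
The filters above all combine two conditions with `&`. A minimal sketch on a made-up frame (`movies` and its rows are hypothetical) shows the pattern. Each condition needs its own parentheses because `&` binds more tightly than `>` and `==`. `DataFrame.query` is shown as one possible alternative spelling of the same filter.

```python
import pandas as pd

# Hypothetical data with the same column names as the notebook
movies = pd.DataFrame({
    "Title": ["X", "Y", "Z"],
    "Director": ["Frank Capra", "Frank Capra", "Someone Else"],
    "Budget": [2_000_000, 500_000, 3_000_000],
})

# Boolean mask: one True/False per row, combined element-wise with &
mask = (movies["Budget"] > 1_000_000) & (movies["Director"] == "Frank Capra")
by_mask = movies[mask]

# Same filter written as a query string
by_query = movies.query("Budget > 1000000 and Director == 'Frank Capra'")

print(by_mask)
```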

In [None]:
# Showing the top 5 records by IMDB Score, only 3 columns
df.nlargest(5, "IMDB Score")[["Title", "Genres", "Budget"]]

In [None]:
# Fill missing countries with "India" (returns a new Series; df itself is not modified)
df.Country.fillna("India")

In [None]:
# Drop every row that has at least one missing value (returns a new DataFrame; df itself is not modified)
df.dropna()
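
Both `fillna` and `dropna` return new objects and leave the original untouched, so the two cells above only display a result. A minimal sketch on a made-up frame (`movies` and its values are hypothetical) shows how the result could be kept by assigning it.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values
movies = pd.DataFrame({
    "Title": ["X", "Y", "Z"],
    "Country": ["UK", np.nan, "Japan"],
    "Budget": [1.0, 2.0, np.nan],
})

# fillna returns a new Series; the original column still has its gap
filled = movies["Country"].fillna("India")
missing_before = int(movies["Country"].isna().sum())

# Assign the result back to make the change stick
movies["Country"] = movies["Country"].fillna("India")
missing_after = int(movies["Country"].isna().sum())

# dropna also returns a new frame; movies keeps all of its rows
complete = movies.dropna()

print(missing_before, missing_after, len(movies), len(complete))
```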

**Another way to do the same:** passing a list to `sheet_name` makes `read_excel` return a dictionary of DataFrames, one entry per sheet

In [None]:
s4 = pd.read_excel("Dataset/movies.xls", sheet_name=[0,1,2])
# s4 = pd.read_excel("Dataset/movies.xls", sheet_name=['1900s','2000s','2010s'])
print(s4.keys())
print(type(s4))

In [None]:
# Concatenate the DataFrames stored in the dictionary, in sheet order
df = pd.concat(s4.values(), ignore_index=True)
df
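
A minimal sketch of combining a dictionary of DataFrames, using a made-up dictionary `sheets` in place of the one returned by `read_excel`. Besides concatenating the values, `pd.concat` also accepts the dictionary itself and uses its keys as an outer index level, which is one way to remember which sheet each row came from. The level names `"sheet"` and `"row"` are arbitrary choices for this example.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by read_excel(..., sheet_name=[0, 1])
sheets = {
    0: pd.DataFrame({"Title": ["A", "B"], "Year": [1950, 1960]}),
    1: pd.DataFrame({"Title": ["C"], "Year": [2001]}),
}

# Option 1: just stack the frames and renumber the rows
flat = pd.concat(sheets.values(), ignore_index=True)

# Option 2: keep the dictionary key as a "sheet" column
tagged = (
    pd.concat(sheets, names=["sheet", "row"])  # keys become the outer index level
    .reset_index(level="sheet")                # move that level into a column
    .reset_index(drop=True)                    # renumber the rows from 0
)

print(flat)
print(tagged)
```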

In [None]:
# m1 = pd.read_excel("Dataset/movies.xls", sheet_name=0, index_col=0)
# m1.head()