# **Netflix and ...Analyze?**

<img src="https://media.giphy.com/media/KZe02gpoAj4yVjxKQt/giphy.gif">

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv('/kaggle/input/netflix-shows/netflix_titles.csv')

In [None]:
print(df.shape)
print(df.columns)
df.head()

In [None]:
for column in df:
    # Print each column name with its number of missing values
    print(column, df[column].isnull().sum())

In [None]:
df = df.fillna('Unknown')
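A shorter way to count the missing values in every column is a single `df.isnull().sum()` call. The sketch below also shows a variant of the fill step that touches only non-numeric columns, so the string `'Unknown'` never ends up inside a numeric column. The frame is a small hypothetical example made up for illustration, not rows from the Netflix file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, made up for illustration only
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", np.nan, np.nan],
    "country": [np.nan, "India", "Brazil"],
    "release_year": [2019, np.nan, 2021],
})

# Missing-value count for every column in one call
missing = toy.isnull().sum()
print(missing)

# Fill only the non-numeric columns, so no string lands in a numeric column
text_cols = toy.select_dtypes(exclude="number").columns
toy[text_cols] = toy[text_cols].fillna("Unknown")
print(toy)
```

Gaps in numeric columns stay as `NaN`, which pandas and seaborn skip in most calculations and plots.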

In [None]:
df.head()

**First, I will look at the ratings of the titles across the whole world.**

In [None]:
ratings = df['rating'].value_counts()

In [None]:
# Bar chart: one bar per rating, sorted by count
plt.bar(ratings.index, ratings.values)
plt.xticks(rotation=90)
plt.xlabel('Ratings')
plt.ylabel('Number of titles')
plt.title('World')

We see that the most common rating is TV-MA, so a large share of the catalogue is made for mature audiences.
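How large that mature share is can be checked with `value_counts(normalize=True)`, which returns proportions instead of raw counts. In this minimal sketch the ratings are a hypothetical toy sample, and treating TV-MA, R and NC-17 as the "mature" group is my own assumption, not a definition taken from the dataset.

```python
import pandas as pd

# Hypothetical toy ratings, not the real Netflix counts
ratings = pd.Series(["TV-MA", "TV-MA", "TV-14", "R", "PG", "TV-MA", "TV-14", "NC-17"])

# Share of titles per rating (the shares sum to 1.0)
shares = ratings.value_counts(normalize=True)
print(shares)

# Assumed definition of a "mature" rating
MATURE = {"TV-MA", "R", "NC-17"}
mature_share = ratings.isin(MATURE).mean()
print(mature_share)
```

On the notebook's data the same idea would be `df['rating'].isin(MATURE).mean()`.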

Genre is an interesting category here, but it is not clean enough to analyze in its current form, because each title lists several genres. I will refine it a little by keeping only the first genre listed for each title.
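The `listed_in` values are comma-separated lists such as `"Dramas, International Movies"`. Splitting on whitespace (what `str.split()` does with no argument) therefore keeps only the first *word*, turning `"British TV Shows"` into `"British"`, while splitting on commas keeps the first full genre. A minimal sketch comparing both on a few hypothetical values:

```python
import pandas as pd

# Hypothetical values in the same comma-separated format as `listed_in`
genres = pd.Series([
    "Dramas, International Movies",
    "British TV Shows, Docuseries",
    "Documentaries",
    "International TV Shows, TV Dramas, TV Mysteries",
])

# First word only: whitespace split, then drop a trailing comma
first_word = genres.str.split().str[0].str.rstrip(",")

# First genre: split on commas, take the first item, trim surrounding spaces
first_genre = genres.str.split(",").str[0].str.strip()

print(pd.DataFrame({"first_word": first_word, "first_genre": first_genre}))
```

Both approaches agree on one-word genres such as `Dramas`; they differ only for multi-word genres.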

In [None]:
world_genre = df['listed_in']
world_genre.shape, df.shape

In [None]:
# Keep only the first comma-separated genre of every title (including the last row)
world_genre = world_genre.str.split(',').str[0].str.strip()
df['listed_in'] = world_genre

In [None]:
# Remove any leading or trailing dots and commas
df['listed_in'] = df['listed_in'].str.strip('.,')

In [None]:
df.head()

In [None]:
values = df['listed_in'].value_counts()

In [None]:
# Bar chart: one bar per genre, sorted by count
plt.bar(values.index, values.values)
plt.xticks(rotation=90)
plt.xlabel('Genres')
plt.ylabel('Number of titles')
plt.title('World')

As the chart shows, Dramas are the most common genre worldwide, with a huge gap to the runner-up, Comedies.
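To put a number on that gap, compare the first two entries of the counts sorted in descending order (`value_counts()` already returns them that way). The counts in this sketch are hypothetical, not the real Netflix numbers.

```python
import pandas as pd

# Hypothetical genre counts, made up for illustration
genre_counts = pd.Series({"Dramas": 120, "Comedies": 60, "Documentaries": 45})
genre_counts = genre_counts.sort_values(ascending=False)

# Leader and runner-up, with the absolute and relative gap between them
leader, runner_up = genre_counts.index[0], genre_counts.index[1]
gap = genre_counts.iloc[0] - genre_counts.iloc[1]
ratio = genre_counts.iloc[0] / genre_counts.iloc[1]
print(f"{leader} leads {runner_up} by {gap} titles ({ratio:.1f}x)")
```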

**Now, I want to look at how the number of titles changes across release years.**

In [None]:
# histplot replaces the deprecated distplot; stat='density' keeps distplot's scale
sns.histplot(df['release_year'], kde=True, stat='density')

In [None]:
movie_world = df.loc[df['type'] == 'Movie']
tv_world = df.loc[df['type'] == 'TV Show']

In [None]:
sns.set(rc={'figure.figsize':(20,10)})
f, axes = plt.subplots(2, 1)
sns.histplot(movie_world['release_year'], kde=True, stat='density', ax=axes[0]).set(title='Movie')
sns.histplot(tv_world['release_year'], kde=True, stat='density', ax=axes[1]).set(title='TV Show')

Globally, we can see how strongly Netflix grew towards the end of the 2010s. We can also see what is most likely the effect of the pandemic: after years of rapid growth, the number of titles dropped sharply in 2020.
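Instead of reading the 2020 drop off the density curve, one option is to count titles per release year and compute the year-over-year change with `pct_change()`. A minimal sketch on a hypothetical list of release years:

```python
import pandas as pd

# Hypothetical release years, made up for illustration
release_year = pd.Series([2017, 2018, 2018, 2019, 2019, 2019, 2019, 2020, 2020])

# Titles per year, in chronological order
per_year = release_year.value_counts().sort_index()

# Relative change from the previous year (the first year has no predecessor, so NaN)
yoy = per_year.pct_change()
print(pd.DataFrame({"titles": per_year, "change": yoy}))
```

On the notebook's data this would be `df['release_year'].value_counts().sort_index().pct_change()`.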

TV shows were affected much less than movies by the pandemic. Both dropped sharply, but TV shows managed to stay above water. This must be why Tom Cruise was so furious.

Finally, TV show releases are concentrated more heavily in recent years than movie releases, which suggests that, globally, TV shows have been gaining ground on movies lately.
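The shift towards TV shows can be checked directly by cross-tabulating release year against `type`: with `normalize='index'`, each row of `pd.crosstab` becomes the share of movies and TV shows in that year. The rows in this sketch are hypothetical.

```python
import pandas as pd

# Hypothetical rows, made up for illustration
titles = pd.DataFrame({
    "release_year": [2018, 2018, 2018, 2018, 2020, 2020, 2020, 2020],
    "type": ["Movie", "Movie", "Movie", "TV Show", "Movie", "Movie", "TV Show", "TV Show"],
})

# Rows: release years; columns: Movie / TV Show; each row sums to 1
share = pd.crosstab(titles["release_year"], titles["type"], normalize="index")
print(share)
print(share["TV Show"])
```

A rising `TV Show` column over the years is what the claim above predicts.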

**Now, I want to analyze the data continent by continent to see whether there are any differences between cultures. However, there are too many countries in the dataset, so I will not be able to include every country.**

In [None]:
country = df['country']

In [None]:
country = country.sort_values()
country = country.reset_index(drop=True)

In [None]:
country.unique()
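Some `country` entries name several production countries in one string, such as `"United States, India"`, so the unique values mix single countries with combinations, and an exact test like `df['country'] == 'India'` only matches titles produced in India alone. A sketch that splits the strings and uses `Series.explode()` to get one row per country, on hypothetical values:

```python
import pandas as pd

# Hypothetical country strings in the comma-separated format
countries = pd.Series(["United States, India", "India", "Brazil", "Unknown", "France, Belgium"])

# One row per individual country; co-productions repeat their original index
single = countries.str.split(",").explode().str.strip()

print(sorted(single.unique()))
print(single.value_counts())
```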

In [None]:
#Africa
south_africa = df.loc[df['country'] == 'South Africa']
algeria = df.loc[df['country'] == 'Algeria']
sudan = df.loc[df['country'] == 'Sudan']
angola =  df.loc[df['country'] == 'Angola']

#Asia
turkey = df.loc[df['country'] == 'Turkey']
japan = df.loc[df['country'] == 'Japan']
russia = df.loc[df['country'] == 'Russia']
china = df.loc[df['country'] == 'China']
india = df.loc[df['country'] == 'India']
arab_emirates = df.loc[df['country'] == 'United Arab Emirates']
jordan = df.loc[df['country'] == 'Jordan']

#Australia
aust = df.loc[df['country'] == 'Australia']
new_zealand = df.loc[df['country'] == 'New Zealand']

#Europe
france = df.loc[df['country'] == 'France']
italy = df.loc[df['country'] == 'Italy']
uk = df.loc[df['country'] == 'United Kingdom']
poland = df.loc[df['country'] == 'Poland']
spain = df.loc[df['country'] == 'Spain']
portugal = df.loc[df['country'] == 'Portugal']
germany = df.loc[df['country'] == 'Germany']
sweden = df.loc[df['country'] == 'Sweden']

#North America
us = df.loc[df['country'] == 'United States']
canada = df.loc[df['country'] == 'Canada']
mexico = df.loc[df['country'] == 'Mexico']

#South America
brazil = df.loc[df['country'] == 'Brazil']
argentina = df.loc[df['country'] == 'Argentina']
paraguay = df.loc[df['country'] == 'Paraguay']
peru = df.loc[df['country'] == 'Peru']
uruguay = df.loc[df['country'] == 'Uruguay']

In [None]:
africa = pd.concat([south_africa, algeria, sudan, angola])
asia = pd.concat([turkey, japan, russia, china, india, arab_emirates, jordan])
australia = pd.concat([aust, new_zealand])
europe = pd.concat([france, italy, uk, poland, spain, portugal, germany, sweden])
north_america = pd.concat([us, canada, mexico])
south_america = pd.concat([brazil, argentina, paraguay, peru, uruguay])

For the record, I chose these countries completely at random.
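The per-country filters and `pd.concat` calls above can also be written as a single lookup table from country to continent, applied with `Series.map`. This is a sketch under assumptions: the mapping is hand-written and deliberately partial (a few of the countries chosen above), and the rows are hypothetical. As with the `==` filters, the lookup is an exact match, so a multi-country string such as `"United States, India"` maps to `NaN` and is left out of the groups.

```python
import pandas as pd

# Hand-written, partial country -> continent mapping (an assumption for illustration)
CONTINENT = {
    "South Africa": "Africa",
    "Japan": "Asia",
    "India": "Asia",
    "France": "Europe",
    "United Kingdom": "Europe",
    "United States": "North America",
    "Brazil": "South America",
}

# Hypothetical rows, made up for illustration
titles = pd.DataFrame({
    "country": ["India", "France", "United States", "Japan", "Chile", "United States, India"],
    "type": ["Movie", "TV Show", "Movie", "TV Show", "Movie", "Movie"],
})

# Exact-match lookup: unmapped or multi-country strings become NaN
titles["continent"] = titles["country"].map(CONTINENT)
print(titles["continent"].value_counts())

# One frame per continent, similar to africa, asia, ... above (NaN rows are dropped)
by_continent = {name: group for name, group in titles.groupby("continent")}
print(sorted(by_continent))
```

Keeping the continent as a column also lets later steps use `groupby` instead of repeating the same code for each continent.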

In [None]:
sns.set(rc={'figure.figsize':(8,3)})
sns.histplot(africa['release_year'], kde=True, stat='density')
plt.title('Africa')

In [None]:
sns.set(rc={'figure.figsize':(8,3)})
sns.histplot(asia['release_year'], kde=True, stat='density')
plt.title('Asia')

In [None]:
sns.set(rc={'figure.figsize':(8,3)})
sns.histplot(australia['release_year'], kde=True, stat='density')
plt.title('Australia')

In [None]:
sns.set(rc={'figure.figsize':(8,3)})
sns.histplot(europe['release_year'], kde=True, stat='density')
plt.title('Europe')

In [None]:
sns.set(rc={'figure.figsize':(8,3)})
sns.histplot(north_america['release_year'], kde=True, stat='density')
plt.title('North America')

In [None]:
sns.set(rc={'figure.figsize':(8,3)})
sns.histplot(south_america['release_year'], kde=True, stat='density')
plt.title('South America')

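It seems 2020 was not a downhill year for every continent: the numbers for Africa, Australia and South America kept growing, while the other continents followed the world's trend. Keep in mind that these three groups contain far fewer titles than Asia, Europe or North America, so their curves are noisier.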
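To put numbers on the per-continent trend, counts per continent and release year can be laid out side by side with `groupby(...).size().unstack()`, and the 2020 column compared with 2019. The rows in this sketch are hypothetical, and a `continent` column is assumed to exist already.

```python
import pandas as pd

# Hypothetical rows with a continent label, made up for illustration
titles = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Europe", "Europe", "Europe", "Europe"],
    "release_year": [2019, 2020, 2020, 2019, 2019, 2019, 2020],
})

# Rows: continents; columns: release years; missing combinations become 0
counts = titles.groupby(["continent", "release_year"]).size().unstack(fill_value=0)
print(counts)

# Positive means more titles in 2020 than in 2019
change_2020 = counts[2020] - counts[2019]
print(change_2020)
```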

**Comparing each culture's tendency towards TV shows or movies does not seem very interesting to me. Instead, I want to see which genres each culture favors.**

In [None]:
africa_val = africa['listed_in'].value_counts()
asia_val = asia['listed_in'].value_counts()
australia_val = australia['listed_in'].value_counts()
europe_val = europe['listed_in'].value_counts()
north_america_val = north_america['listed_in'].value_counts()
south_america_val = south_america['listed_in'].value_counts()

In [None]:
plt.bar(africa_val.index, africa_val.values)
plt.xticks(rotation=90)
plt.xlabel('Genres')
plt.ylabel('Number of titles')
plt.title('Africa')

In [None]:
plt.bar(asia_val.index, asia_val.values)
plt.xticks(rotation=90)
plt.xlabel('Genres')
plt.ylabel('Number of titles')
plt.title('Asia')

In [None]:
plt.bar(australia_val.index, australia_val.values)
plt.xticks(rotation=90)
plt.xlabel('Genres')
plt.ylabel('Number of titles')
plt.title('Australia')

In [None]:
plt.bar(europe_val.index, europe_val.values)
plt.xticks(rotation=90)
plt.xlabel('Genres')
plt.ylabel('Number of titles')
plt.title('Europe')

In [None]:
plt.bar(north_america_val.index, north_america_val.values)
plt.xticks(rotation=90)
plt.xlabel('Genres')
plt.ylabel('Number of titles')
plt.title('North America')

In [None]:
plt.bar(south_america_val.index, south_america_val.values)
plt.xticks(rotation=90)
plt.xlabel('Genres')
plt.ylabel('Number of titles')
plt.title('South America')

* African productions are mostly comedies.
* Australian titles are mostly international shows and/or movies.
* In Europe, British productions make up a large share of the titles.
* North American titles lean heavily towards documentaries.
* The rest follow the world's trend: dramas.
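The bullet points above summarize the most frequent genre per continent. The same summary can be computed in one step by grouping on a continent label and taking the most common `listed_in` value in each group. A sketch on hypothetical rows, assuming a `continent` column is available:

```python
import pandas as pd

# Hypothetical rows, made up for illustration
titles = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Asia", "Asia", "Asia"],
    "listed_in": ["Comedies", "Comedies", "Dramas", "Dramas", "Dramas", "Documentaries"],
})

# Most frequent genre per continent (if two genres tie, only one is reported)
top_genre = titles.groupby("continent")["listed_in"].agg(lambda s: s.value_counts().idxmax())
print(top_genre)
```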

**That's it for this notebook. If you like, you can repeat the continent-based analysis for the ratings using the same procedure as above. Take care!**
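For the ratings version of the continent analysis, one possible starting point is a continent-by-rating table built with `pd.crosstab`; `normalize='index'` turns each row into shares, so continents with very different numbers of titles can be compared. The rows below are hypothetical, and a `continent` column is assumed to exist.

```python
import pandas as pd

# Hypothetical rows, made up for illustration
titles = pd.DataFrame({
    "continent": ["Africa", "Africa", "Europe", "Europe", "Europe", "Europe"],
    "rating": ["TV-MA", "TV-14", "TV-MA", "TV-MA", "TV-MA", "PG"],
})

# Rows: continents; columns: ratings; each row sums to 1
rating_share = pd.crosstab(titles["continent"], titles["rating"], normalize="index")
print(rating_share.round(2))
```

The resulting table can be drawn as one stacked bar per continent with `rating_share.plot(kind='bar', stacked=True)`.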

<img src="https://media.giphy.com/media/xUPOqo6E1XvWXwlCyQ/giphy.gif">