# Statistical Rating Analysis and Customer Behavioural Analysis :

This project is based on one of the most interesting topics: **Movie Recommendations**. In this notebook we check whether we can find the top movies by rating, popularity and other features.

While going through the ratings we will also try to identify the most engaged audience and the content that suits them, which is the **GOAL** of this project.

# UPVOTE if you like this notebook :)

### Libraries :

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

Now we read the file and take a first look at it. This helps us understand how the data needs to be wrangled.

In [None]:
df=pd.read_csv('/kaggle/input/amazon-movie-ratings/Amazon.csv')
df.head()

We can see that no subscriber has rated every single movie; each one has rated only a few. This sparsity helps us understand each subscriber's taste in content.
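
The sparsity described above can be quantified directly. The following is a minimal sketch on a toy table (the `toy` frame and its values are made up for illustration and only mimic the layout of one row per user and one column per movie):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the ratings table: one row per user, one column per movie.
toy = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3'],
    'Movie1': [5.0, np.nan, np.nan],
    'Movie2': [np.nan, 4.0, np.nan],
    'Movie3': [np.nan, np.nan, 2.0],
    'Movie4': [5.0, np.nan, 1.0],
})

ratings = toy.drop('user_id', axis=1)

# Number of movies each user has rated (non-null cells per row).
ratings_per_user = ratings.notna().sum(axis=1)

# Fraction of the user x movie grid that actually holds a rating.
fill_fraction = ratings.notna().to_numpy().mean()

print(ratings_per_user.tolist())
print(round(fill_fraction, 3))
```

On the real data the same two quantities could be computed by replacing `toy` with `df`.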

In [None]:
df.info()

In [None]:
df_1=df.fillna(0.0)
df_1.head()

In [None]:

plt.figure(figsize=(20,10))
sns.heatmap(df.drop('user_id',axis=1),cmap="YlGnBu")
plt.show()

In [None]:
plt.figure(figsize=(20,5))
plt.title('Total Rating in Movies')
plt.plot(df.drop('user_id',axis=1).sum())
plt.xticks(rotation=90)
plt.show()

We can conclude that some movies have been rated by many consumers while others have received very few ratings.
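
Note that the plot above shows the sum of the ratings, which mixes how many people rated a movie with how high they rated it. A hedged sketch of separating the two, using pandas' `count()` on made-up toy data:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): Movie1 has three ratings, Movie2 only one.
toy = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4'],
    'Movie1': [5.0, 4.0, 5.0, np.nan],
    'Movie2': [np.nan, np.nan, 1.0, np.nan],
})

ratings = toy.drop('user_id', axis=1)

votes = ratings.count()  # number of non-null ratings per movie
total = ratings.sum()    # sum of the ratings per movie (NaN is skipped)

print(votes.to_dict())
print(total.to_dict())
```

Plotting `votes` instead of `total` would show the number of raters per movie directly.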

In [None]:
df.shape

# Movie Recommendation based on votes and ratings :

In this part we look for the highest-rated and the most-voted films. This lets us find the movies best suited to a mass audience. Production houses could then produce content of the same type to increase their profits.

In [None]:
# Keep only the columns that contain at least one non-null value
n_rows=len(df)
arr=[]
for i in df.columns:
  if df[i].isnull().sum()!=n_rows:
    arr.append(i)

df_actual=df[arr]
df_actual.head()

In [None]:
df_actual.shape

The new dataframe has the same shape as the previous one, which indicates that every movie has received at least one rating so far.
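
The same check can be written without an explicit loop. This is a small sketch using `DataFrame.dropna` with `axis=1, how='all'`, shown on a made-up toy frame in which `Movie2` has no ratings at all:

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical): Movie2 was never rated.
toy = pd.DataFrame({
    'user_id': ['u1', 'u2'],
    'Movie1': [5.0, np.nan],
    'Movie2': [np.nan, np.nan],
})

# Drop every column whose values are all missing.
kept = toy.dropna(axis=1, how='all')

print(kept.columns.tolist())
print(kept.shape == toy.shape)  # False here, because one column was dropped
```

If the shape is unchanged after this call, every column has at least one non-null value.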

In [None]:
arr=[]      # mean rating per movie, ignoring missing ratings
pep_cnt=[]  # number of users who rated each movie
for col in df_actual.drop('user_id',axis=1).columns:
  n_votes=df_actual[col].count()  # non-null ratings in this column
  arr.append(df_actual[col].sum()/n_votes)
  pep_cnt.append(n_votes)
plt.figure(figsize=(20,5))
plt.title('Effective stars per movie')
plt.xlabel('Movie index')
plt.ylabel('Effective Ratings')
plt.plot(arr)
plt.show()

After visualizing the average ratings we can see that most movies have an average rating between 4 and 5.

Still, a small number of movies have received very low consumer ratings.

In [None]:
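# Movies are numbered 1..N in column order; N is taken from the data instead of being hard-coded
df_ranked=pd.DataFrame({'movie_name':np.arange(1,len(arr)+1),'eff_score':arr,'votes':pep_cnt})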
df_ranked.head()

This dataframe holds the number of votes and the effective (average) rating of each movie. It lets us derive recommendations from the consumer responses found in the dataset.
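
As an alternative sketch, the same kind of table can be built with vectorized pandas calls, keeping the real column names instead of numeric ids. The toy values and the name `ranked` are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Toy ratings (hypothetical values).
toy = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3'],
    'Movie1': [5.0, 4.0, np.nan],
    'Movie2': [np.nan, 5.0, np.nan],
    'Movie3': [5.0, 4.0, 5.0],
})

ratings = toy.drop('user_id', axis=1)

ranked = pd.DataFrame({
    'movie_name': ratings.columns,
    'eff_score': ratings.mean().to_numpy(),  # mean() skips NaN by default
    'votes': ratings.count().to_numpy(),     # non-null ratings per movie
})

# Highest average rating first; ties are broken by the number of votes.
ranked = ranked.sort_values(by=['eff_score', 'votes'], ascending=False)
ranked = ranked.reset_index(drop=True)
print(ranked)
```

Keeping the column names makes the sorted output readable without a separate lookup from id to title.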

#### Best Rating Recommendation

In [None]:
df_ranked.sort_values(by=['eff_score','votes'],ascending=False)

In [None]:
# Movies whose average rating is above 4
x=df_ranked[df_ranked['eff_score']>4]
Best_movie_per_rating=x['movie_name'].tolist()
# Shown as a flat array; a fixed reshape(23,7) only works when exactly 161 movies qualify
np.asarray(Best_movie_per_rating)

#### Best popularity recommendation :

In [None]:
df_ranked.sort_values(by=['votes','eff_score'],ascending=False)

In [None]:
# Movies that received more than 4 votes
x=df_ranked[df_ranked['votes']>4]
Best_movie_per_popularity=x['movie_name'].tolist()
# Shown as a flat array; a fixed reshape(19,3) only works when exactly 57 movies qualify
np.asarray(Best_movie_per_popularity)

In [None]:
plt.figure(figsize=(20,8))
plt.title('Votes per rating')
plt.scatter(df_ranked['eff_score'],df_ranked['votes'],color='g',s=150)
plt.show()

# Behavioural Analysis of Consumers :

Besides producing the best content according to the ratings, the company should also identify consumers who do not show healthy rating behaviour (for example, giving ratings without actually watching the content).

In [None]:
consumer=df.T
consumer.head()

In [None]:
consumer.shape

In [None]:
print('Given rating :')

# Exclude the user_id row so that only movie ratings are counted
consumer.drop('user_id',axis=0).notnull().sum()

In [None]:
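# Number of movies rated by each consumer (user_id row excluded)
m=consumer.drop('user_id',axis=0).notnull().sum()
plt.figure(figsize=(20,8))
plt.title('Number of ratings given per consumer')
plt.scatter(np.arange(1,len(m)+1),m,s=2)
plt.show()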

In [None]:
consumer.fillna(0.0,inplace=True)

In [None]:
c1=consumer.drop('user_id',axis=0)
c1.head()

In [None]:
average_rating=[]
ratings_given=[]
for i in c1.columns:
  val=c1[i].sum()/(len(c1[c1[i]>0]))
  average_rating.append(val)
  ratings_given.append(len(c1[c1[i]>0]))
plt.figure(figsize=(15,8))
plt.title('Consumer ratings')
plt.xlabel('consumer id')
plt.ylabel('Average rating')
plt.scatter(np.arange(1,len(average_rating)+1,1),average_rating)
plt.show()

In [None]:
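# Consumer ids run from 1 to the number of consumers (taken from the data, not hard-coded)
con_data=pd.DataFrame({'id':np.arange(1,len(average_rating)+1),'av_rating':average_rating,'rating_count':ratings_given})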
con_data.head()

In [None]:
con_data.sort_values(by=['av_rating','rating_count'],ascending=False)

In [None]:
con_data.sort_values(by=['rating_count','av_rating'],ascending=False)

In [None]:
len(con_data.loc[(con_data.av_rating<2) & (con_data.rating_count<2)])


In [None]:
fin_data=con_data.loc[(con_data.av_rating>=2) | (con_data.rating_count>=2)]

In [None]:
fin_data

So fin_data contains every consumer except those who gave only a single rating with an average below 2, i.e. the consumers whose rating behaviour looks healthy and who are not degrading the content with throwaway low ratings.
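
The filter can also be sketched end to end on a toy table, carrying `user_id` along so that no positional lookup is needed afterwards. The toy values and the names `stats` and `kept` are assumptions; the thresholds mirror the ones used in this notebook:

```python
import numpy as np
import pandas as pd

# Toy ratings (hypothetical): u1 left a single rating of 1.0.
toy = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3'],
    'Movie1': [1.0, 5.0, np.nan],
    'Movie2': [np.nan, 4.0, 1.0],
    'Movie3': [np.nan, np.nan, 2.0],
})

ratings = toy.drop('user_id', axis=1)

# Per-user statistics computed row-wise; NaN cells are ignored.
stats = pd.DataFrame({
    'user_id': toy['user_id'],
    'av_rating': ratings.mean(axis=1),
    'rating_count': ratings.count(axis=1),
})

# Keep a user if the average is at least 2 OR at least 2 ratings were given.
kept = stats[(stats['av_rating'] >= 2) | (stats['rating_count'] >= 2)]
print(kept['user_id'].tolist())
```

Here `u1` is dropped (one rating, average 1.0), while `u3` is kept despite a low average because two ratings were given.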

In [None]:
well_subscribers=[]

# ids in fin_data start at 1, so the matching row of df is at position id-1
for i in fin_data['id']:
  well_subscribers.append(df['user_id'].iloc[i-1])
well_subscribers=pd.DataFrame({'names':well_subscribers})
well_subscribers.to_csv('well_subscribers.csv',index=False)

In [None]:
# Assuming a 1-5 rating scale, av_rating>5 never matches, so rating_count>=4 drives this selection
best_film_data=con_data.loc[(con_data.av_rating>5) | (con_data.rating_count>=4)]

In [None]:
len(best_film_data)

Since these consumers have rated at least 4 titles, we assume they are engaged with both the content and the platform. Producing content similar to the titles they rated could help the platform, i.e. Amazon, grow.
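
A minimal sketch of collecting the titles rated by such engaged consumers, on made-up toy data. The threshold of 2 is used here only because the toy table is tiny (the notebook uses 4), and `active` and `movies` are hypothetical names:

```python
import numpy as np
import pandas as pd

# Toy ratings (hypothetical): u1 rated one movie, u2 and u3 rated two each.
toy = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3'],
    'Movie1': [np.nan, 5.0, np.nan],
    'Movie2': [np.nan, 4.0, 3.0],
    'Movie3': [2.0, np.nan, np.nan],
    'Movie4': [np.nan, np.nan, 5.0],
})

ratings = toy.drop('user_id', axis=1)

# Rows of consumers who rated at least `min_ratings` movies.
min_ratings = 2
active = ratings[ratings.count(axis=1) >= min_ratings]

# Movies rated by at least one of those consumers, in column order.
movies = active.columns[active.notna().any(axis=0)].tolist()
print(movies)
```

Unlike a nested loop that appends titles in first-seen order, this returns them in column order; the set of titles is the same.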

In [None]:
best_film_data

In [None]:
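movie_name=[]

# For each selected consumer (ids start at 1), collect the movies they rated
for i in best_film_data['id']:
  # dropna() keeps the rated columns; index[1:] skips the leading user_id entry
  for j in df.iloc[i-1].dropna().index[1:]:
    if j not in movie_name:
      movie_name.append(j)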


In [None]:
movie_name

The project ends here with this list of movies, which are the ones rated by the most engaged subscribers. If similar content is produced, it could increase profit.


## HURRAH!
We've completed a recommendation system project.

If you liked this project, have a look at my other projects on other topics on [kaggle](https://kaggle.com/sagnik1511/notebooks) or on [github](https://github.com/sagnik1511/repositories).

# THANK YOU :)