
**Capstone Project (Netflix)**

Problem Statement:
Customer behaviour and its prediction lie at the core of every business model. Predictions serve a great purpose in fields ranging from the stock exchange, e-commerce and the automobile industry to presidential elections. Most of these predictions are based on the data available about a person's activity, either online or in person.

In [None]:
%pip install pandas scikit-learn



In [None]:
#Customer Purchase Prediction

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

data = {
    'Age': [25, 34, 45, 23, 35, 50, 28, 40, 33, 29],
    'Annual_Income': [50000, 60000, 70000, 45000, 80000, 100000, 55000, 65000, 48000, 72000],
    'Num_Transactions': [5, 7, 3, 8, 9, 6, 4, 5, 7, 6],
    'Has_Purchased': [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]  # 1 = Purchased, 0 = Not Purchased
}

df = pd.DataFrame(data)

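X = df[['Age', 'Annual_Income', 'Num_Transactions']]
y = df['Has_Purchased']

# With only 10 rows, test_size=0.2 leaves just 2 test rows, so the
# metrics printed below are not a reliable estimate of model quality.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)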

model = LogisticRegression()

model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

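accuracy = accuracy_score(y_test, y_pred)
# Without labels=[0, 1], confusion_matrix and classification_report only cover the
# classes present in y_test or y_pred, which is why the matrix below is 1x1.
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)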

print(f'Accuracy: {accuracy:.2f}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(class_report)

new_customer = pd.DataFrame({
    'Age': [30],
    'Annual_Income': [75000],
    'Num_Transactions': [6]
})

new_customer_scaled = scaler.transform(new_customer)

purchase_prediction = model.predict(new_customer_scaled)
print(f'Will the new customer purchase? {"Yes" if purchase_prediction[0] == 1 else "No"}')


Accuracy: 1.00
Confusion Matrix:
[[2]]
Classification Report:
              precision    recall  f1-score   support

           1       1.00      1.00      1.00         2

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2

Will the new customer purchase? Yes
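The 80/20 split above leaves only two test rows, and both are class 1, so the reported accuracy of 1.00 says little about the model. The sketch below gives a somewhat more robust estimate on the same toy data. It uses stratified 4-fold cross-validation and places the scaler inside a scikit-learn `Pipeline`, so the scaler is refit on each training fold. The fold count of 4 is an illustrative choice: the smaller class has only 4 rows, which caps the number of stratified folds.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy data as the cell above
df = pd.DataFrame({
    'Age': [25, 34, 45, 23, 35, 50, 28, 40, 33, 29],
    'Annual_Income': [50000, 60000, 70000, 45000, 80000, 100000, 55000, 65000, 48000, 72000],
    'Num_Transactions': [5, 7, 3, 8, 9, 6, 4, 5, 7, 6],
    'Has_Purchased': [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
})
X = df[['Age', 'Annual_Income', 'Num_Transactions']]
y = df['Has_Purchased']

# Scaling inside the pipeline means each fold fits its own scaler on its training rows only
pipeline = make_pipeline(StandardScaler(), LogisticRegression())

# Stratified folds keep the 6:4 class balance in every fold as closely as possible
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring='accuracy')
print(f'Fold accuracies: {scores}')
print(f'Mean accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})')

# Refit on all rows, then report a purchase probability rather than only a hard label
pipeline.fit(X, y)
new_customer = pd.DataFrame({'Age': [30], 'Annual_Income': [75000], 'Num_Transactions': [6]})
proba = pipeline.predict_proba(new_customer)[0, 1]  # column 1 = P(Has_Purchased == 1)
print(f'Estimated purchase probability: {proba:.2f}')
```

With only 10 rows, the fold scores will still vary a lot. The spread printed next to the mean is a useful reminder of that uncertainty.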




Problem Statement: Recommendation engines are a much-needed manifestation of the desire to predict user activity. They go one step further: rather than only providing information, they put forth strategies to increase users' interaction with the platform.

In [None]:
#Popularity-Based Recommendation Engine (ranks items by mean rating; the same list for every user)
import pandas as pd

data = {
    'item_id': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'user_id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8],
    'rating': [5, 3, 4, 5, 2, 4, 3, 5, 4, 2, 5, 4, 3, 5, 4]
}

df = pd.DataFrame(data)

item_popularity = df.groupby('item_id')['rating'].mean().sort_values(ascending=False)

def get_recommendations(top_n=3):
    return item_popularity.head(top_n).index.tolist()

recommendations = get_recommendations()
print(f"Top 3 recommendations: {recommendations}")  # items 1 and 4 tie at a mean rating of 4.67

Top 3 recommendations: [1, 4, 3]
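The cell above ranks items by their mean rating, so every user gets the same list. For comparison, here is a minimal sketch of item-based collaborative filtering on the same `data`. Items are compared by the cosine similarity of their rating columns. Each item a user has not rated is then scored by the similarity-weighted average of that user's own ratings. The function name `recommend_for_user` is hypothetical. With only 15 ratings, many item pairs share no raters, so some items cannot be scored at all.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same ratings as the cell above
df = pd.DataFrame({
    'item_id': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'user_id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8],
    'rating': [5, 3, 4, 5, 2, 4, 3, 5, 4, 2, 5, 4, 3, 5, 4],
})

# Rows = users, columns = items, 0 = not rated
user_item = df.pivot_table(index='user_id', columns='item_id', values='rating', fill_value=0)

# Item-item cosine similarity computed over the users' rating columns
item_sim = pd.DataFrame(cosine_similarity(user_item.T),
                        index=user_item.columns, columns=user_item.columns)

def recommend_for_user(user_id, top_n=3):
    user_ratings = user_item.loc[user_id]
    rated = user_ratings[user_ratings > 0]
    unrated = user_ratings[user_ratings == 0].index
    # Similarity of each unrated item to each item the user has rated
    sims = item_sim.loc[unrated, rated.index]
    # Similarity-weighted average of the user's ratings; 0/0 (no similar rated item) becomes NaN
    scores = sims.dot(rated) / sims.sum(axis=1)
    return scores.dropna().sort_values(ascending=False).head(top_n).index.tolist()

print(f"Recommendations for user 1: {recommend_for_user(1)}")
```

User 1 rated items 1 and 2. Item 4 shares no raters with either of them, so it receives no score and is left out of that user's list.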


Problem Statement: In today's world, OTT platforms and streaming services have taken a big share of the retail and entertainment industries. Organizations like Netflix and Amazon analyse users' activity patterns and suggest products that better suit their needs and choices.

In [None]:
#OTT Content-Based Recommendation (TF-IDF similarity of descriptions and genres)
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

data = {
    'item_id': ['Stranger Things', 'The Crown', 'The Witcher', 'Bridgerton', 'Ozark'],
    'item_description': [
        'Sci-fi horror series about a group of friends who encounter supernatural forces.',
        'Historical drama series about the reign of Queen Elizabeth II.',
        'Fantasy series based on the novels of the same name.',
        'Romance series set in Regency England.',
        'Crime drama series about a family involved in money laundering.'
    ],
    'genres': ['Sci-Fi, Horror, Drama', 'Drama, History', 'Fantasy, Drama, Action',
               'Romance, Drama, Period', 'Crime, Drama, Thriller']
}

df = pd.DataFrame(data)

df['content'] = df['item_description'] + ' ' + df['genres']

vectorizer = TfidfVectorizer(stop_words='english')

tfidf_matrix = vectorizer.fit_transform(df['content'])

cosine_sim = cosine_similarity(tfidf_matrix)

def get_recommendations(item_id, top_n=3):
    item_index = df.index[df['item_id'] == item_id][0]

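    # Pair each item's position with its similarity to the query item
    sim_scores = list(enumerate(cosine_sim[item_index]))

    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Skip the first entry: it is the query item itself (similarity 1.0)
    sim_scores = sim_scores[1:top_n + 1]

    recommendations = [df['item_id'].iloc[i] for i, _ in sim_scores]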

    return recommendations

recommendations = get_recommendations('Stranger Things')
print(f"Recommendations for Stranger Things: {recommendations}")


Recommendations for Stranger Things: ['The Crown', 'Ozark', 'The Witcher']
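The function above starts from a single title. One common way to connect it to a user's activity pattern is to build a profile from the user's history. This sketch assumes that a history is simply a list of watched titles. It averages the TF-IDF vectors of those titles into a profile and ranks the unwatched titles by cosine similarity to that profile. `recommend_from_history` and the example history are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Same catalogue as the cell above
df = pd.DataFrame({
    'item_id': ['Stranger Things', 'The Crown', 'The Witcher', 'Bridgerton', 'Ozark'],
    'item_description': [
        'Sci-fi horror series about a group of friends who encounter supernatural forces.',
        'Historical drama series about the reign of Queen Elizabeth II.',
        'Fantasy series based on the novels of the same name.',
        'Romance series set in Regency England.',
        'Crime drama series about a family involved in money laundering.'
    ],
    'genres': ['Sci-Fi, Horror, Drama', 'Drama, History', 'Fantasy, Drama, Action',
               'Romance, Drama, Period', 'Crime, Drama, Thriller']
})
df['content'] = df['item_description'] + ' ' + df['genres']
tfidf_matrix = TfidfVectorizer(stop_words='english').fit_transform(df['content'])

def recommend_from_history(watched, top_n=2):
    """Rank unwatched titles by cosine similarity to the mean TF-IDF vector of watched titles."""
    watched_mask = df['item_id'].isin(watched).to_numpy()
    watched_idx = np.flatnonzero(watched_mask)
    unseen_idx = np.flatnonzero(~watched_mask)
    if watched_idx.size == 0:
        return []
    # User profile: average of the watched titles' TF-IDF rows (a 1 x vocabulary array)
    profile = np.asarray(tfidf_matrix[watched_idx].mean(axis=0))
    scores = cosine_similarity(profile, tfidf_matrix[unseen_idx]).ravel()
    # Highest similarity first
    ranked = unseen_idx[np.argsort(scores)[::-1]]
    return df['item_id'].iloc[ranked[:top_n]].tolist()

print(recommend_from_history(['Stranger Things', 'Ozark']))
```

Averaging is the simplest possible profile. You could weight each watched title by the user's rating of it instead, if ratings are available.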


Problem Statement: For the purpose of this project, we will create one such recommendation engine from the ground up. Every user will be recommended a list of the movies best suited to them, based on their areas of interest and their ratings.

In [None]:
#Movie Recommendation Engine Based on User Interests and Ratings

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = {
    'user_id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'movie_id': [1, 3, 2, 4, 1, 5, 3, 4, 2, 5],
    'rating': [5, 4, 3, 5, 3, 4, 4, 5, 4, 3]
}
ratings_df = pd.DataFrame(ratings)

movies = {
    'movie_id': [1, 2, 3, 4, 5],
    'title': ['The Shawshank Redemption', 'The Godfather', 'The Dark Knight', 'Pulp Fiction', '12 Angry Men']
}
movies_df = pd.DataFrame(movies)

user_item_matrix = ratings_df.pivot_table(index='user_id', columns='movie_id', values='rating', fill_value=0)

user_similarity = cosine_similarity(user_item_matrix)

def get_recommendations(user_id, top_n=5):
    # Look up the user's row position instead of assuming it equals user_id - 1
    user_pos = user_item_matrix.index.get_loc(user_id)
    similar_users = pd.Series(user_similarity[user_pos], index=user_item_matrix.index).sort_values(ascending=False)
    similar_users = similar_users[similar_users.index != user_id]

    # Collect movies the target user has not rated but another user rated above 3.
    # Note: similarity_score is not used, so every other user contributes equally.
    recommendations = []
    for similar_user_id, similarity_score in similar_users.items():
        similar_user_ratings = user_item_matrix.loc[similar_user_id]
        unrated_movies = similar_user_ratings[user_item_matrix.loc[user_id] == 0]
        recommendations.extend(unrated_movies[unrated_movies > 3].index.tolist())

    # Keep the movies mentioned most often; titles come back in movie_id order
    recommendations = pd.Series(recommendations).value_counts().head(top_n).index.tolist()
    return movies_df[movies_df['movie_id'].isin(recommendations)]['title'].tolist()

print(get_recommendations(1))

print(get_recommendations(3))

['The Godfather', 'Pulp Fiction', '12 Angry Men']
['The Godfather', 'The Dark Knight', 'Pulp Fiction']
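The engine above computes user similarity but never uses it. Every other user's ratings count equally, whether or not that user resembles the target user. A minimal sketch of similarity-weighted user-based collaborative filtering on the same toy data follows. Only users with a positive cosine similarity act as neighbours, and each unrated movie gets a predicted rating equal to the similarity-weighted average of the neighbours' ratings. The helper `weighted_recommendations` is hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same toy ratings and titles as the cell above
ratings_df = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'movie_id': [1, 3, 2, 4, 1, 5, 3, 4, 2, 5],
    'rating': [5, 4, 3, 5, 3, 4, 4, 5, 4, 3],
})
movies_df = pd.DataFrame({
    'movie_id': [1, 2, 3, 4, 5],
    'title': ['The Shawshank Redemption', 'The Godfather', 'The Dark Knight', 'Pulp Fiction', '12 Angry Men'],
})

user_item = ratings_df.pivot_table(index='user_id', columns='movie_id', values='rating', fill_value=0)
# User-user cosine similarity, labelled by user_id on both axes
user_sim = pd.DataFrame(cosine_similarity(user_item), index=user_item.index, columns=user_item.index)

def weighted_recommendations(user_id, top_n=3):
    # Neighbours: every other user with a positive similarity to the target user
    weights = user_sim[user_id].drop(user_id)
    weights = weights[weights > 0]
    if weights.empty:
        return []
    neighbour_ratings = user_item.loc[weights.index]
    # Per movie: sum of similarity * rating, and sum of similarity over neighbours who rated it
    numerator = neighbour_ratings.T.dot(weights)
    denominator = (neighbour_ratings > 0).astype(float).T.dot(weights)
    predicted = numerator / denominator  # NaN where no neighbour rated the movie
    # Keep only movies the target user has not rated, best predicted rating first
    own = user_item.loc[user_id]
    predicted = predicted.reindex(own[own == 0].index).dropna()
    top = predicted.sort_values(ascending=False).head(top_n)
    titles = movies_df.set_index('movie_id')['title']
    return titles.loc[top.index].tolist()

print(weighted_recommendations(1))
```

For user 1, only users 3 and 4 have a positive similarity. The Godfather was liked only by users with zero similarity, so it drops out of the list, unlike in the unweighted version.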


Dataset Information:
- ID (CustomerID, MovieID): the separate keys for customers and movies.
- Rating: the ratings users gave to the movies.
- Genre: the category of the movie.
- Movie Name (MovieName): the name of the movie corresponding to its movie ID.

In [None]:
import pandas as pd

data = {
    'CustomerID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'MovieID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'Rating': [5, 4, 3, 4, 5, 2, 3, 4, 5, 3],
    'Genre': ['Action', 'Drama', 'Comedy', 'Action', 'Drama',
              'Sci-Fi', 'Action', 'Comedy', 'Drama', 'Sci-Fi'],
    'MovieName': ['The Matrix', 'The Godfather', 'The Dark Knight', 'Pulp Fiction', 'Forrest Gump',
                  'Inception', 'Terminator', 'The Hangover', 'The Shawshank Redemption', 'Interstellar']
}

df = pd.DataFrame(data)

print(df)


   CustomerID  MovieID  Rating   Genre                 MovieName
0           1      101       5  Action                The Matrix
1           2      102       4   Drama             The Godfather
2           3      103       3  Comedy           The Dark Knight
3           4      104       4  Action              Pulp Fiction
4           5      105       5   Drama              Forrest Gump
5           1      106       2  Sci-Fi                 Inception
6           2      107       3  Action                Terminator
7           3      108       4  Comedy              The Hangover
8           4      109       5   Drama  The Shawshank Redemption
9           5      110       3  Sci-Fi              Interstellar


1. Find the most popular and the most liked genres.
2. Create a model that finds the best-suited movie for a user in every genre.
3. Find which genres have received the best and worst ratings, based on user ratings.

In [None]:
import pandas as pd

df = pd.DataFrame(data)

# Task 1: popularity = number of rows per genre; liking = mean rating per genre
genre_popularity = df['Genre'].value_counts()

genre_ratings = df.groupby('Genre')['Rating'].mean()

print("Most Popular Genres (by count of movies):")
print(genre_popularity)

print("\nMost Liked Genres (by average rating):")
print(genre_ratings)

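# Task 2: the user's own highest-rated movie in each genre they have rated
# (genres the user has not rated are not covered)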
def best_movie_for_user_in_genres(user_id, df):
    user_data = df[df['CustomerID'] == user_id]
    best_movies = {}

    for genre in user_data['Genre'].unique():
        genre_movies = user_data[user_data['Genre'] == genre]
        best_movie = genre_movies.loc[genre_movies['Rating'].idxmax()]
        best_movies[genre] = best_movie['MovieName']

    return best_movies

user_id = 1
best_movies_for_user = best_movie_for_user_in_genres(user_id, df)
print(f"\nBest Movies for User {user_id} in Every Genre:")
for genre, movie in best_movies_for_user.items():
    print(f"{genre}: {movie}")

# Task 3: best and worst genres by mean rating
genre_avg_ratings = df.groupby('Genre')['Rating'].mean()

best_genre = genre_avg_ratings.idxmax()
worst_genre = genre_avg_ratings.idxmin()

print(f"\nBest Genre (Highest Average Rating): {best_genre} with Rating: {genre_avg_ratings[best_genre]:.2f}")
print(f"Worst Genre (Lowest Average Rating): {worst_genre} with Rating: {genre_avg_ratings[worst_genre]:.2f}")


Most Popular Genres (by count of movies):
Genre
Action    3
Drama     3
Comedy    2
Sci-Fi    2
Name: count, dtype: int64

Most Liked Genres (by average rating):
Genre
Action    4.000000
Comedy    3.500000
Drama     4.666667
Sci-Fi    2.500000
Name: Rating, dtype: float64

Best Movies for User 1 in Every Genre:
Action: The Matrix
Sci-Fi: Inception

Best Genre (Highest Average Rating): Drama with Rating: 4.67
Worst Genre (Lowest Average Rating): Sci-Fi with Rating: 2.50
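Task 2 asks for the best-suited movie for a user in every genre. However, the function above only returns the user's own top-rated title within the genres that user has already rated (Action and Sci-Fi for user 1). The sketch below is a recommendation-style alternative. It assumes that "best suited" means the movie with the highest average rating across all users, among movies the user has not rated yet. `recommend_per_genre` is a hypothetical helper. In this toy dataset every movie has a single rating, so the average equals that rating. Ties, such as Forrest Gump and The Shawshank Redemption in Drama, resolve to the first row.

```python
import pandas as pd

# Same toy dataset as the cells above
data = {
    'CustomerID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'MovieID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'Rating': [5, 4, 3, 4, 5, 2, 3, 4, 5, 3],
    'Genre': ['Action', 'Drama', 'Comedy', 'Action', 'Drama',
              'Sci-Fi', 'Action', 'Comedy', 'Drama', 'Sci-Fi'],
    'MovieName': ['The Matrix', 'The Godfather', 'The Dark Knight', 'Pulp Fiction', 'Forrest Gump',
                  'Inception', 'Terminator', 'The Hangover', 'The Shawshank Redemption', 'Interstellar']
}
df = pd.DataFrame(data)

def recommend_per_genre(user_id, df):
    # Movies the user has already rated are excluded from the candidates
    seen = set(df.loc[df['CustomerID'] == user_id, 'MovieID'])
    # Average rating of each movie across all users (one row per movie)
    movie_stats = df.groupby(['Genre', 'MovieID', 'MovieName'], as_index=False)['Rating'].mean()
    candidates = movie_stats[~movie_stats['MovieID'].isin(seen)]
    # Highest-rated unseen movie per genre; idxmax keeps the first row on ties
    best_rows = candidates.loc[candidates.groupby('Genre')['Rating'].idxmax()]
    return dict(zip(best_rows['Genre'], best_rows['MovieName']))

for genre, movie in recommend_per_genre(1, df).items():
    print(f"{genre}: {movie}")
```

A genre in which the user has already rated every movie simply does not appear in the result.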
