## **Movies Recommendation System Using Machine Learning**

Objective:
To analyze and develop a movie recommendation system that suggests movies to users based on various features and user preferences.

Data Source:
The dataset is sourced from the YBI Foundation's Dataset repository on GitHub:
Movies Recommendation Dataset (https://github.com/YBI-Foundation/Dataset/raw/main/Movies%20Recommendation.csv)

In [8]:
## Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## Importing Data
# Load the dataset
url = "https://github.com/YBI-Foundation/Dataset/raw/main/Movies%20Recommendation.csv"
data = pd.read_csv(url)

## Checking Column Names
print("Column names:", data.columns.tolist())

## Describing Data
# Display the first few rows of the dataset
print(data.head())

# Describe the dataset
print(data.describe())

# Display information about the dataset
# (info() prints directly and returns None, so it is not wrapped in print)
data.info()

## Data Preprocessing
# Check for missing values
print(data.isnull().sum())

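# Column names below follow the printed column list of this dataset.
# 'Movie_Vote' is assumed to be the rating; the dataset has no 'rating' column.
target_column = 'Movie_Vote'
# Movie_Director and Movie_Cast are left out: one-hot encoding such free-text
# columns would likely create thousands of sparse columns.
categorical_columns = ['Movie_Genre', 'Movie_Language']
numeric_columns = ['Movie_Budget', 'Movie_Popularity', 'Movie_Revenue',
                   'Movie_Runtime', 'Movie_Vote_Count']

# Check if these columns exist in the DataFrame
required_columns = categorical_columns + numeric_columns + [target_column]
missing_columns = [col for col in required_columns if col not in data.columns]
if missing_columns:
    print(f"Columns not found in the DataFrame: {missing_columns}")
else:
    # Drop rows with missing values only in the columns that are actually used
    data = data.dropna(subset=required_columns)

    # Convert categorical variables to numeric using one-hot encoding
    data_encoded = pd.get_dummies(data[categorical_columns + numeric_columns],
                                  columns=categorical_columns, dtype=float)

    # Scale the feature variables (the target is not part of data_encoded)
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(data_encoded)

    ## Defining Target Variable (y) and Feature Variables (X)
    X = X_scaled
    # RandomForestClassifier needs discrete labels, so the vote is rounded
    # to the nearest integer (an assumption made to keep the classifier)
    y = data[target_column].round().astype(int)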

    ## Training Test Split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    ## Modeling
    # Initialize and train the RandomForestClassifier
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    ## Model Evaluation
    # Predict on the test data
    y_pred = model.predict(X_test)

    # Calculate accuracy
    accuracy = accuracy_score(y_test, y_pred)
    print(f'Accuracy: {accuracy * 100:.2f}%')

    # Display confusion matrix
    conf_matrix = confusion_matrix(y_test, y_pred)
    sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
    plt.title('Confusion Matrix')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()

    # Display classification report
    print(classification_report(y_test, y_pred))

    ## Prediction
    # Make predictions on new data (example)
    new_data = X_test[:5]  # Using the first 5 samples from the test set
    predictions = model.predict(new_data)
    print(f'Predictions: {predictions}')


Column names: ['Movie_ID', 'Movie_Title', 'Movie_Genre', 'Movie_Language', 'Movie_Budget', 'Movie_Popularity', 'Movie_Release_Date', 'Movie_Revenue', 'Movie_Runtime', 'Movie_Vote', 'Movie_Vote_Count', 'Movie_Homepage', 'Movie_Keywords', 'Movie_Overview', 'Movie_Production_House', 'Movie_Production_Country', 'Movie_Spoken_Language', 'Movie_Tagline', 'Movie_Cast', 'Movie_Crew', 'Movie_Director']
   Movie_ID      Movie_Title                       Movie_Genre Movie_Language  \
0         1       Four Rooms                      Crime Comedy             en   
1         2        Star Wars  Adventure Action Science Fiction             en   
2         3     Finding Nemo                  Animation Family             en   
3         4     Forrest Gump              Comedy Drama Romance             en   
4         5  American Beauty                             Drama             en   

   Movie_Budget  Movie_Popularity Movie_Release_Date  Movie_Revenue  \
0       4000000         22.876230         09-
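
The cell above fits the scaler on the full dataset before splitting, so statistics from the test rows influence the scaling. A variant that fits the scaler on the training split only can be sketched with a scikit-learn `Pipeline`. This is a minimal sketch, not the notebook's method: the data comes from `make_classification` as a synthetic stand-in for the movies dataset, and every name ending in `_demo` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration, not the movies dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=4, random_state=42
)

# Split first, so the test rows stay unseen during preprocessing
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training data only,
# then reuses those statistics when predicting on the test data
pipeline_demo = Pipeline([
    ("scaler", StandardScaler()),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=42)),
])
pipeline_demo.fit(X_train_demo, y_train_demo)

y_pred_demo = pipeline_demo.predict(X_test_demo)
accuracy_demo = accuracy_score(y_test_demo, y_pred_demo)
print(f"Demo accuracy: {accuracy_demo * 100:.2f}%")
```

Tree-based models such as random forests are largely insensitive to feature scaling, so the scaler matters more if the classifier is later swapped for a distance-based or linear model.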

Explanation:
In this project, we used the Movies Recommendation dataset to build a machine learning model that supports movie recommendations. We inspected the data (column names, summary statistics, and missing values) and preprocessed it by handling missing values, one-hot encoding categorical columns, and scaling the features. We then split the data into training (80%) and testing (20%) sets, trained a RandomForestClassifier with 100 trees, and evaluated it using accuracy, a confusion matrix, and a classification report. These metrics indicate how well the model predicts movie ratings, and the predicted ratings can in turn be used to rank movies for recommendation.
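
The classifier predicts a rating for each movie, so one simple way to turn predictions into recommendations is to rank the movies a user has not seen by predicted rating. The sketch below illustrates that idea under stated assumptions: `recommend_top_n` is a hypothetical helper, the titles are taken from the dataset preview, and the scores in `toy_predictions` are made-up placeholders rather than model output.

```python
import pandas as pd


def recommend_top_n(movies, predicted_ratings, seen_titles, n=3):
    """Return the n unseen titles with the highest predicted rating (hypothetical helper)."""
    # Attach one predicted rating per movie row
    ranked = movies.assign(predicted_rating=list(predicted_ratings))
    # Exclude titles the user has already seen
    unseen = ranked[~ranked["Movie_Title"].isin(list(seen_titles))]
    # Highest predicted rating first, then keep the top n
    top = unseen.sort_values("predicted_rating", ascending=False).head(n)
    return top["Movie_Title"].tolist()


# Titles come from the dataset preview; the scores are made-up placeholders
toy_movies = pd.DataFrame({
    "Movie_Title": ["Four Rooms", "Star Wars", "Finding Nemo",
                    "Forrest Gump", "American Beauty"]
})
toy_predictions = [6, 9, 7, 8, 5]

recommendations = recommend_top_n(
    toy_movies, toy_predictions, seen_titles={"Star Wars"}, n=3
)
print(recommendations)
```

In the notebook itself, `predicted_ratings` could come from `model.predict` applied to the feature rows of the candidate movies, provided those rows are kept aligned with their titles.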






