MOVIE RECOMMENDATION

-------------

## **Objective**

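Build a machine learning model (linear regression) that predicts a movie's popularity from its rating, genre, runtime, and director.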

## **Data Source**

CSV file (`Movies Recommendation.csv`) from the YBIFoundation/Dataset repository on GitHub.

## **Import Library**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

## **Import Data**

In [None]:
df = pd.read_csv(r"https://raw.githubusercontent.com/YBIFoundation/Dataset/main/Movies%20Recommendation.csv")
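# Preview the first rows; print() is needed because a cell only displays its last expression
print(df.head())
# Summarize column dtypes and non-null counts
df.info()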

## **Describe Data**

In [None]:
df.describe()

## **Data Visualization**

In [None]:
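# Bar plot of vote counts for the 20 most-voted movies
# (plotting every title at once would make the axis unreadable)
top_voted = df.nlargest(20, 'Movie_Vote_Count')
plt.bar(top_voted['Movie_Title'], top_voted['Movie_Vote_Count'])

# Customize the plot
plt.xlabel('Movie_Title')
plt.ylabel('Movie_Vote_Count')
plt.title('Movie Vote Counts')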
plt.xticks(rotation=90)

# Display the plot
plt.show()

## **Data Preprocessing**

In [None]:
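# Handle missing values: drop rows only where a column used by the model is missing,
# so that gaps in unrelated columns do not discard usable rows
model_columns = ['Movie_Popularity', 'Movie_Vote', 'Movie_Genre', 'Movie_Runtime', 'Movie_Director']
df.dropna(subset=model_columns, inplace=True)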

## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
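# Define the target variable (y) as 'Movie_Popularity'
y = df['Movie_Popularity']

# Select the input features (X). 'Movie_Genre' and 'Movie_Director' hold text,
# which LinearRegression cannot consume directly, so one-hot encode them.
X = pd.get_dummies(
    df[['Movie_Vote', 'Movie_Genre', 'Movie_Runtime', 'Movie_Director']],
    columns=['Movie_Genre', 'Movie_Director'],
    dtype=float,
)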

# Display the target variable and features
print("Target variable (y):")
print(y.head())

print("\nFeatures (X):")
print(X.head())

## **Train Test Split**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shape of the train and test sets
print("Training set shape:")
print(X_train.shape, y_train.shape)

print("\nTest set shape:")
print(X_test.shape, y_test.shape)

## **Modeling**

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

## **Prediction**

In [None]:

y_pred = model.predict(X_test)

## **Model Evaluation**

In [None]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean squared error:", mse)
print("R-squared:", r2)

## **Explanation**

We start by importing the necessary libraries: pandas for data handling, matplotlib for plotting, and scikit-learn for splitting the data, modeling, and computing evaluation metrics.

Next, we read the movie recommendation dataset from the provided URL into a pandas DataFrame (df).

The target variable, 'Movie_Popularity', is the value we want to predict, and the features 'Movie_Vote', 'Movie_Genre', 'Movie_Runtime', and 'Movie_Director' are selected as the input variables.
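scikit-learn's LinearRegression only accepts numeric inputs, so text columns such as genre or director need to be encoded before fitting. One common option is one-hot encoding with pandas `get_dummies`. The sketch below uses a small made-up DataFrame (the rows and values are illustrative assumptions, not taken from the dataset):

```python
import pandas as pd

# Toy data standing in for the real dataset; the values are invented for illustration.
toy = pd.DataFrame({
    "Movie_Vote": [7.2, 6.1, 8.0],
    "Movie_Genre": ["Action", "Drama", "Action"],
    "Movie_Runtime": [120.0, 95.0, 142.0],
})

# One-hot encode the text column; numeric columns pass through unchanged.
encoded = pd.get_dummies(toy, columns=["Movie_Genre"], dtype=float)

print(encoded.columns.tolist())
print(encoded)
```

Each distinct genre becomes its own 0/1 column. For a column with many distinct values, such as director names, this can produce a very wide feature matrix, so grouping rare values may be worth considering.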

We then split the dataset into training and test sets using the train_test_split function from scikit-learn. The test size is set to 0.2, meaning 20% of the data will be used for testing, while the remaining 80% will be used for training. A random state of 42 is set for reproducibility.
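To see what the split does, here is a minimal sketch on ten made-up samples (the arrays `X_demo` and `y_demo` are assumptions for illustration only). It checks the 80/20 row counts and shows that repeating the call with the same `random_state` gives the same split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10 made-up samples with 2 features each, plus one target value per sample.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# 80% of the rows go to training, 20% to testing.
print(X_tr.shape, X_te.shape)

# Re-running with the same random_state reproduces the same test rows.
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(np.array_equal(X_te, X_te_again))
```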

After splitting the data, we instantiate a linear regression model using LinearRegression() from scikit-learn.

The model is trained on the training set (X_train and y_train) using the fit method, which fits the linear regression model to the training data.

We then use the trained model to make predictions on the test set (X_test) using the predict method, storing the predicted values in y_pred.
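The fit and predict steps can be illustrated on synthetic data where the true relationship is known. In this sketch the data follow y = 3*x1 - 2*x2 + 5 exactly; the data and the names `X_demo`, `y_demo`, and `demo_model` are assumptions made for the example, not part of the notebook:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from y = 3*x1 - 2*x2 + 5.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + 5

# fit() estimates one coefficient per feature plus an intercept.
demo_model = LinearRegression()
demo_model.fit(X_demo, y_demo)
print(demo_model.coef_, demo_model.intercept_)

# predict() applies the learned coefficients to new rows.
pred = demo_model.predict(np.array([[1.0, 1.0]]))
print(pred)
```

Because the data contain no noise, the fitted coefficients should recover the generating values up to floating-point error. On real data such as the movie dataset, the fit will be approximate.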

To evaluate the performance of the model, we calculate the mean squared error (MSE) using the mean_squared_error function and the R-squared score using the r2_score function. Lower values of MSE indicate better model fit, while R-squared measures the proportion of the variance in the target variable explained by the model (closer to 1 indicates better fit).
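Both metrics are simple enough to compute by hand, which can help in interpreting them. The following sketch uses four made-up true and predicted values (assumptions for illustration) and plain Python, without scikit-learn:

```python
# Made-up true and predicted values for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.0, 8.0, 9.5]

n = len(y_true)
mean_y = sum(y_true) / n

# Residual sum of squares: squared prediction errors.
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
# Total sum of squares: squared deviations of the targets from their mean.
ss_tot = sum((t - mean_y) ** 2 for t in y_true)

mse_demo = ss_res / n           # average squared error
r2_demo = 1 - ss_res / ss_tot   # share of variance explained

print(mse_demo, r2_demo)
```

MSE is expressed in squared units of the target, so it is only comparable between models that predict the same quantity. R-squared is unitless and can be negative when a model does worse than always predicting the mean.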

Finally, we print the MSE and R-squared score as evaluation metrics for the linear regression model.

Note that this is a basic example: the linear regression predicts a popularity score rather than producing recommendations. For a more sophisticated recommendation system, you might explore other algorithms like collaborative filtering or content-based filtering, as well as more advanced evaluation techniques and feature engineering methods specific to recommendation systems.
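As a rough idea of what content-based filtering looks like, the sketch below ranks movies by how many genre tags they share with a chosen title, using Jaccard similarity. The catalogue, the titles, and the helper names `jaccard` and `recommend` are all hypothetical. A real system would typically use richer features such as keywords, cast, or text embeddings.

```python
# Hypothetical mini-catalogue: titles and genre tags are made up for illustration.
catalogue = {
    "Film A": {"Action", "Adventure"},
    "Film B": {"Action", "Thriller"},
    "Film C": {"Romance", "Drama"},
    "Film D": {"Action", "Adventure", "Fantasy"},
}

def jaccard(a, b):
    """Shared tags divided by all tags of both movies (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b)

def recommend(title, k=2):
    """Return the k titles whose genre sets are most similar to `title`."""
    target = catalogue[title]
    scores = [
        (other, jaccard(target, genres))
        for other, genres in catalogue.items()
        if other != title
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]

print(recommend("Film A"))
```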