# <u>MovieLens Tagging Data Research</u>

### This notebook demonstrates my research for the MovieLens group where we try to connect movie tags to movie recommendation. The overall end goal of this project is to be able to recommend movies based on which tags the users on the MovieLens website are having a positive interaction with. This means that a user puts a "+1" for that tag on that specific movie. 

### <u>10/26/23-11/16/23 </u> 

##### Shown below is the code that I wrote when I tried to correlate user-affect interaction with the names of different actors/actresses. We tried looking through this through random "active" movie data to see if there was sufficient "starter" data for the tagging system. By doing this we are hoping we could dive deeper into this aspect of the database to have a better understanding of how to recommend movies that users could like. 

In [1]:
import mysql.connector
import csv
import random

def get_user_input():
    while True:
        try:
            num_movies = int(input("Enter the number of movies you want to print: "))
            if num_movies > 0:
                return num_movies
            else:
                print("Please enter a positive number.")
        except ValueError:
            print("Invalid input. Please enter a valid number.")

db_config = {
    "host": "127.0.0.1",
    "user": "readonly",
    "password": "",
    "database": "ML3_mirror"
}

conn = mysql.connector.connect(**db_config)
cursor = conn.cursor()

movies = []

# Fetch movie data
query = "SELECT movieID, title, directedBy, starring FROM movie_data WHERE movieStatus = 2 AND rowType = 11"
cursor.execute(query)

# Loop through movies to get title, directedBy, and Starring within the database
for (movieID, title, directedBy, starring) in cursor:
    movie_info = {
        "movieID": movieID,
        "title": title,
        "directedBy": directedBy.split(", "),
        "starring": starring.split(", ")
    }
    movies.append(movie_info)

# Get the number of movies the user wants to print
num_movies_to_print = get_user_input()

# Ensure at least num_movies_to_print unique movies are selected
selected_movies = random.sample(movies, num_movies_to_print)

# Write statistics for the selected random movies and relevant tags to a CSV file
csv_filename = "random_movies_statistics.csv"

with open(csv_filename, "w", newline="") as csv_file:
    csv_writer = csv.writer(csv_file)

    # Write header
    csv_writer.writerow(["Movie Title", "Tag", "Affect Percentage"])

    # Loop through each random movie
    for movie in selected_movies:
        movie_title = movie["title"]

        tag_stats = {}

        # Fetch tag data for the current movie and actors
        query = f"SELECT tag, uniqueUsers, totalPositive FROM tag_data WHERE tag IN ({', '.join(['%s']*len(movie['starring']))})"
        cursor.execute(query, movie['starring'])

        # Get tag, uniqueUsers, and totalPositive values for the current movie
        for (tag, uniqueUsers, totalPositive) in cursor:
            tag_stats[tag] = {
                "uniqueUsers": uniqueUsers,
                "totalPositive": totalPositive,
            }

        # Check if there are tags related to the actors and actresses in the selected movie
        if not tag_stats:
            # Write a row to the CSV file indicating no tags were found for the actors in the movie
            csv_writer.writerow([movie_title, "No tags found", "N/A"])
            continue

        user_affect = {}

        # Calculate user influence for the current movie
        for actor in movie["starring"]:
            for tag, stats in tag_stats.items():
                if actor in tag:
                    user_affect[tag] = (
                        user_affect.get(tag, 0) + (stats["totalPositive"] / stats["uniqueUsers"])
                    )

        user_total_affect = sum(user_affect.values())

        # Calculate the percentage of influence for each tag
        user_percentage_influence = {}
        if user_total_affect != 0:
            user_percentage_influence = {
                tag: round((influence / user_total_affect) * 100, 2) for tag, influence in user_affect.items()
            }

        # Write data for each tag that includes an actor's or actress's name
        for tag, percentage in user_percentage_influence.items():
            csv_writer.writerow([movie_title, tag, '{:.2f}'.format(percentage)])

# Close the database connection
cursor.close()
conn.close()

print(f"Statistics for {num_movies_to_print} random movies written to {csv_filename}")

Statistics for 5 random movies written to random_movies_statistics.csv


In [8]:
import pandas as pd 

In [9]:
data = pd.read_csv("random_movies_statistics.csv")

#### Shown below is the resulting csv file that is outputed with the title of the random movie selected, the tags that were interacted with and the percentage of user-interaction with that tag out of all the tags on that movie.

In [10]:
data.head(30)

Unnamed: 0,Movie Title,Tag,Affect Percentage
0,Worm (2014),No tags found,
1,"Happy March 8th, Men! (2014)",No tags found,
2,"Kindred, The (1986)",Rod Steiger,100.0
3,Jannat 2 (2012),No tags found,
4,Apocalypse '45 (2020),,100.0


### <u>12/1/23-Present</u>

##### As of right now, since we figured out there is clear evidence of sufficient tag data to use for movie recommendation we are attempting to deeper into finding a way to connect the results of tag interactions with another category to recommend movies effectively to users. Below is shown my previous proposal from 12/1/23, and then we had gone on Winter Break and I had not really done anything after the meeting nor have we had another meeting yet.

### Question That is Being Asked: 
     What is the meaning of the valence/total Application of tags in the database?

### Findings: 
    From what I've found in the databases, it appears that when diving deeper into specific tables, the meaning behind numNuetral, numPosiitve, and numNegative is the number of possibilities of effects for each tag. That means that there is only one option for each which means you felt neutral, positive, or Negative for each tag. I tried selecting the counts for what affect through the tag_events table. I believe that this is just the total amount of times that the affect was used for all tags in all of the movies. I'm not exactly sure if this is the correct interpretation of what I was trying to do, or whether or not this is what Daniel had asked for at the last meeting, but this is how I understood it. I would love it if somebody from the last meeting would give me feedback on whether or not this is correct. 
### Proposal for Moving Forward: 
    I think that moving forward, it would be interesting to move forward with the idea of figuring out how we can suggest movies based on how users are using the affect on tags. For example, if a user puts a +1 for a tag, then we can use that in-time data to affect the "we think you'd like these movies" section of the website and then it would give movies with that exact tag or similar tags. If a user puts a -1 for a tag then we would find movies that have tags that are opposite of that tag that the user didn't like. If a user uses the neutral affect on the tag then there would be no effect on which movies are recommended. 

#### New Proposal as of 1/22/24:

##### What if we looked at not only the tags in general but maybe tried to look at more specifically the tags throughout specific genres? We could try to see if there is similarity in tags for example between actors, themes in movies, or other types of tags in specific genres of movies. Then we could try to recommend movies to users who liked movies in that genre based on the tag that they liked. For example, say John Doe was a tag for one of the movies and then the user puts a "+1" for that tag. Then on the homepage, there could be a section that loads other movies of that genre containing that John Doe tag. I think we have usable the data for this as we have the movie genre data and the tag data in our database. With that data, we can essentially loop through all the different active movies throughout the different genres using real-time user interactions with tags to keep updating the homepage with different tags. 