<a href="https://colab.research.google.com/github/holatung/DSN-AI-INVASION-OSUN/blob/main/MyNews_AI_News_Recommender_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This code and algorithm were developed as part of research responding to the call for papers on '**Recommender Systems in the Media, Creative, and Cultural Industries**'. A real-world production system would require more sophisticated techniques, a larger dataset, and continuous monitoring for bias.

With the growing dominance of online platforms in the media, creative, and cultural industries (MCCI), recommender systems have become omnipresent across the sector, from production to distribution and consumption. These systems leverage training data, algorithms, and prediction models to deliver automated, flexible, and immediate responses tailored to the personal preferences of content consumers. By filtering, ranking, and eventually recommending content to users, recommender systems influence exposure in two ways: they can promote items that might otherwise go unnoticed, or they can hide, and thus virtually remove, other items. In both cases, the algorithms are not neutral, which raises important questions about how they are designed and implemented, who decides, and on what basis (Kunaver & Požrl, 2017).

**Algorithm: Content-Based Recommender System for News**

This algorithm recommends news articles by identifying articles with topics similar to those a user has shown interest in.

* Data Preparation: The algorithm first requires a dataset of news articles. Each article needs to be associated with one or more of the specified categories: Politics, Entertainment, Lifestyle, Sports, Health, Business, Opinion, Tech, and Fashion.


* User Profile Creation: For each user, a "profile" is created. This profile is a vector representing their interests. For example, if a user likes a "Politics" and a "Tech" article, their profile vector will have a positive value for those categories.

* Content Vectorization: Each news article is also represented as a vector. For simplicity, we can use a one-hot encoding scheme where a '1' indicates the presence of a specific topic and a '0' indicates its absence.

* Similarity Calculation: The core of the recommendation is calculating the similarity between the user's profile vector and the content vector of each unread news article. The algorithm uses cosine similarity, a common metric that measures the cosine of the angle between two vectors. A smaller angle gives a score closer to 1, meaning the vectors are more similar; for the non-negative vectors used here, the score ranges from 0 (no shared topic) to 1 (identical direction).

* Recommendation Generation: The algorithm ranks the unread articles based on their similarity score. The top 'N' articles with the highest scores are then recommended to the user.
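
The vectorization, similarity, and ranking steps above can be sketched by hand with NumPy. This is a minimal illustration, not the notebook's implementation: the three-topic vocabulary `TOPICS` and the helpers `one_hot` and `cosine` are hypothetical names introduced here. The score computed is $\cos\theta = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$.

```python
import numpy as np

# Hypothetical three-topic vocabulary, for illustration only.
TOPICS = ["Health", "Politics", "Tech"]

def one_hot(topics):
    """Return a 0/1 vector marking which entries of TOPICS are present."""
    return np.array([1.0 if t in topics else 0.0 for t in TOPICS])

def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm else 0.0

# The user liked one Politics article and one Tech article.
profile = one_hot({"Politics", "Tech"})

# Unread candidates, each tagged with a single topic.
candidates = {
    "Debate on Climate Change Policy": one_hot({"Politics"}),
    "Healthy Eating Tips": one_hot({"Health"}),
}

# Score every candidate against the profile, then rank by descending score.
scores = {title: cosine(profile, vec) for title, vec in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The Politics article shares one of the two profile topics, so it scores $1/\sqrt{2} \approx 0.71$; the Health article shares none and scores 0.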

With this approach, a user who reads and enjoys a "Health" article is more likely to see other "Health" articles, or articles that share a topic with it, which helps them discover relevant content within their interests.
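
One possible variation, shown here only as a sketch, is to weight the profile by how many liked articles fall under each topic instead of using 0/1 flags, so that a stronger interest ranks higher. The vocabulary `TOPICS` and the helper `build_profile` are assumptions made for this example and are not part of the notebook's code.

```python
from collections import Counter

import numpy as np

# Hypothetical vocabulary, for illustration only.
TOPICS = ["Fashion", "Health", "Tech"]

def build_profile(liked_topics):
    """Count how often each topic appears among the liked articles."""
    counts = Counter(liked_topics)
    return np.array([float(counts[t]) for t in TOPICS])

def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm else 0.0

# Two liked Health articles and one liked Tech article give the profile [0, 2, 1].
profile = build_profile(["Health", "Health", "Tech"])

# One-hot vectors for a single-topic article of each kind.
article_vectors = {
    "Health": np.array([0.0, 1.0, 0.0]),
    "Tech": np.array([0.0, 0.0, 1.0]),
    "Fashion": np.array([1.0, 0.0, 0.0]),
}
topic_scores = {topic: cosine(profile, vec) for topic, vec in article_vectors.items()}
print(topic_scores)
```

Under these assumptions a Health article scores higher than a Tech article, and a Fashion article scores 0 because the user has shown no interest in that topic.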

This Python code implements the algorithm described above. It uses a sample dataset and the *scikit-learn* library to perform the similarity calculations.

In [None]:
import random

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def create_sample_dataset():
    """
    Creates a sample dataset of news articles with their topics.
    In a real application, this would be a large database or a DataFrame
    loaded from a file.
    """
    data = {
        'article_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'title': [
            "Presidential Race Heats Up", "New Movie Breaks Box Office Records",
            "Simple Yoga Poses for Beginners", "Messi Scores Hat-Trick",
            "New Study on Heart Disease", "Stock Market Sees Major Gains",
            "Analyzing the Latest Policy Debate", "Fashion Week 2025 Highlights",
            "Tech Giant Launches New Phone", "Debate on Climate Change Policy",
            "Healthy Eating Tips", "New Fashion Trends for Fall",
            "SpaceX Rocket Launch Successful", "Op-Ed: The Future of Democracy",
            "Local Business Thrives"
        ],
        'topics': [
            'Politics', 'Entertainment', 'Health', 'Sports', 'Health',
            'Business', 'Politics', 'Fashion', 'Tech', 'Politics',
            'Health', 'Fashion', 'Tech', 'Opinion', 'Business'
        ]
    }
    return pd.DataFrame(data)

def get_recommendations(user_id, num_recommendations=5):
    """
    Generates news recommendations for a given user.

    Args:
        user_id (int): The ID of the user.
        num_recommendations (int): The number of articles to recommend.

    Returns:
        list: A list of recommended article titles.
    """
    # 1. Load data and simulate user interactions
    articles_df = create_sample_dataset()
    user_interactions = {
        101: [1, 7, 10], # User 101 likes Politics articles
        102: [2, 8, 11, 12], # User 102 likes Entertainment, Fashion, Health
        103: [4, 6] # User 103 likes Sports and Business
    }

    if user_id not in user_interactions:
        print(f"User {user_id} not found. Providing random recommendations.")
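        # Cap the sample size so random.sample never asks for more titles than exist.
        titles = articles_df['title'].tolist()
        return random.sample(titles, min(num_recommendations, len(titles)))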

    liked_articles = articles_df[articles_df['article_id'].isin(user_interactions[user_id])]

    # 2. Vectorize the topics
    # We use a simple one-hot encoding for topics to create the feature matrix
    all_topics = sorted(articles_df['topics'].unique().tolist())

    def vectorize_topics(topics):
        vector = [0] * len(all_topics)
        for topic in topics.split(','):
            if topic in all_topics:
                vector[all_topics.index(topic)] = 1
        return vector

    articles_df['vector'] = articles_df['topics'].apply(vectorize_topics)

    # 3. Create the user profile
    user_profile_vector = [0] * len(all_topics)
    for topic in liked_articles['topics']:
        user_profile_vector[all_topics.index(topic)] = 1

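    # 4. Calculate similarity scores for unread articles
    # .copy() yields an independent DataFrame, so adding a column below does not
    # trigger pandas' SettingWithCopyWarning.
    unread_articles = articles_df[~articles_df['article_id'].isin(user_interactions[user_id])].copy()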

    # Calculate cosine similarity between the user profile and each unread article
    similarity_scores = cosine_similarity([user_profile_vector], list(unread_articles['vector']))[0]

    unread_articles['similarity_score'] = similarity_scores

    # 5. Get top recommendations
    recommended_articles = unread_articles.sort_values(
        by='similarity_score', ascending=False
    ).head(num_recommendations)

    return recommended_articles['title'].tolist()

if __name__ == "__main__":
    # Example usage for different users
    user_101_recs = get_recommendations(user_id=101)
    print("Recommendations for User 101 (likes Politics):")
    for rec in user_101_recs:
        print(f"- {rec}")
    print("\n" + "="*50 + "\n")

    user_102_recs = get_recommendations(user_id=102)
    print("Recommendations for User 102 (likes Entertainment, Fashion, Health):")
    for rec in user_102_recs:
        print(f"- {rec}")
    print("\n" + "="*50 + "\n")

    user_103_recs = get_recommendations(user_id=103)
    print("Recommendations for User 103 (likes Sports and Business):")
    for rec in user_103_recs:
        print(f"- {rec}")
    print("\n" + "="*50 + "\n")

Recommendations for User 101 (likes Politics):
- New Movie Breaks Box Office Records
- Simple Yoga Poses for Beginners
- Messi Scores Hat-Trick
- New Study on Heart Disease
- Stock Market Sees Major Gains


Recommendations for User 102 (likes Entertainment, Fashion, Health):
- Simple Yoga Poses for Beginners
- New Study on Heart Disease
- Presidential Race Heats Up
- Messi Scores Hat-Trick
- Stock Market Sees Major Gains


Recommendations for User 103 (likes Sports and Business):
- Local Business Thrives
- New Movie Breaks Box Office Records
- Simple Yoga Poses for Beginners
- New Study on Heart Disease
- Presidential Race Heats Up
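
In the output above, User 101 has already read every Politics article in the sample, so all unread articles score 0 and the list shown is unrelated to that user's interests. The same applies to the trailing entries for Users 102 and 103. One possible guard, sketched below with a hypothetical helper `top_n_nonzero`, keeps only articles with a positive score; it assumes a DataFrame with `title` and `similarity_score` columns like the one built inside `get_recommendations`.

```python
import pandas as pd

def top_n_nonzero(scored, n=5):
    """Return up to n titles whose similarity score is strictly positive."""
    relevant = scored[scored["similarity_score"] > 0]
    # mergesort is stable, so equal scores keep their original order.
    ranked = relevant.sort_values("similarity_score", ascending=False, kind="mergesort")
    return ranked["title"].head(n).tolist()

# Toy scores resembling User 103's case: one matching article, the rest unrelated.
scored = pd.DataFrame({
    "title": [
        "Local Business Thrives",
        "New Movie Breaks Box Office Records",
        "Simple Yoga Poses for Beginners",
    ],
    "similarity_score": [0.7071, 0.0, 0.0],
})
recs = top_n_nonzero(scored)
print(recs)
```

With this guard the list can be shorter than `n`, or empty; a caller might then fall back to popular or recent articles, which is a design choice outside the scope of this notebook.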





