# News Recommendation Using GNN

## 🟢 Step 1: Set Up the Environment

In your first cell, install the required libraries:

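The first cell below quietly installs PyTorch 2.1.0+cu121, torchvision 0.16.0+cu121, and torchaudio 2.1.0 from the PyTorch cu121 wheel index. The second cell installs PyTorch Geometric and its companion packages from wheels built for that same PyTorch and CUDA version.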

In [1]:
!pip install -q torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0 --extra-index-url https://download.pytorch.org/whl/cu121


In [2]:
!pip install -q torch-scatter -f https://data.pyg.org/whl/torch-2.1.0+cu121.html --timeout=300
!pip install -q torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu121.html --timeout=300
!pip install -q torch-cluster -f https://data.pyg.org/whl/torch-2.1.0+cu121.html --timeout=300
!pip install -q torch-spline-conv -f https://data.pyg.org/whl/torch-2.1.0+cu121.html --timeout=300
!pip install -q torch-geometric --timeout=300


In [3]:
import torch
from torch_geometric.data import Data

# Print the versions to confirm that the installation worked
print("PyTorch version:", torch.__version__)
print("CUDA version:", torch.version.cuda)
print("PyTorch Geometric is installed and accessible!")
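
Before moving on, it can help to confirm that the packages are importable at all. The sketch below is one possible check using only the standard library. `report_packages` is a hypothetical helper, not part of the original notebook, and it only tests top-level module names.

```python
import importlib.util

def report_packages(names):
    """Return {module_name: True/False} depending on whether the module can be found."""
    # find_spec returns None for a missing top-level module instead of raising.
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = report_packages(["torch", "torch_geometric", "torch_scatter", "torch_sparse"])
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

A `MISSING` entry suggests rerunning the matching install line before continuing.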



## 🟢 **Step 2: Create a Toy Dataset**

Create a small dataset that simulates users, news articles, and user–news interactions. In a new cell, paste:

In [4]:
import pandas as pd

# Define a small list of users.
users = ['User0', 'User1', 'User2', 'User3', 'User4']

# Create a DataFrame for news articles with news IDs and titles.
news = pd.DataFrame({
    'news_id': ['News0', 'News1', 'News2', 'News3', 'News4', 'News5', 'News6', 'News7', 'News8', 'News9'],
    'title': [
        "Economic Growth in 2025",
        "New Advances in AI",
        "Global Warming Trends",
        "Political Shifts Worldwide",
        "Sports Update: Finals",
        "Innovative Startups to Watch",
        "Health: New Breakthroughs",
        "Tech Gadgets of Tomorrow",
        "Art and Culture Trends",
        "Local News: Community Events"
    ]
})

# Create a DataFrame to represent interactions,
# where each row indicates that a user interacted with a specific news article.
interactions = pd.DataFrame({
    'user_id': ['User0', 'User0', 'User1', 'User1', 'User2', 'User3', 'User3', 'User4', 'User4', 'User4'],
    'news_id': ['News0', 'News1', 'News1', 'News2', 'News3', 'News0', 'News4', 'News2', 'News6', 'News9']
})

# Display the dataset to verify its contents.
print("Users:")
print(users)
print("\nNews Articles:")
print(news)
print("\nUser Interactions:")
print(interactions)


Users:
['User0', 'User1', 'User2', 'User3', 'User4']

News Articles:
  news_id                         title
0   News0       Economic Growth in 2025
1   News1            New Advances in AI
2   News2         Global Warming Trends
3   News3    Political Shifts Worldwide
4   News4         Sports Update: Finals
5   News5  Innovative Startups to Watch
6   News6     Health: New Breakthroughs
7   News7      Tech Gadgets of Tomorrow
8   News8        Art and Culture Trends
9   News9  Local News: Community Events

User Interactions:
  user_id news_id
0   User0   News0
1   User0   News1
2   User1   News1
3   User1   News2
4   User2   News3
5   User3   News0
6   User3   News4
7   User4   News2
8   User4   News6
9   User4   News9
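
For later steps it is handy to have each user's reading history as a set. Here is a minimal sketch in plain Python, with the interaction pairs copied from the table above. `history` and `never_read` are illustrative names that the rest of the notebook does not use.

```python
from collections import defaultdict

# (user_id, news_id) pairs copied from the interactions table above.
pairs = [
    ("User0", "News0"), ("User0", "News1"), ("User1", "News1"), ("User1", "News2"),
    ("User2", "News3"), ("User3", "News0"), ("User3", "News4"), ("User4", "News2"),
    ("User4", "News6"), ("User4", "News9"),
]

# Group the articles each user interacted with.
history = defaultdict(set)
for user_id, news_id in pairs:
    history[user_id].add(news_id)

# Find articles that no user interacted with.
all_news = [f"News{i}" for i in range(10)]
read_by_someone = {news_id for _, news_id in pairs}
never_read = [n for n in all_news if n not in read_by_someone]

print(dict(history))
print("Articles with no interactions:", never_read)
```

Articles with no interactions become isolated nodes in the graph built in Step 4, so their learned embeddings depend only on their own title features.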


## 🟢 **Step 3: Preprocess News Titles with GloVe**

We now download small (50-dimensional) GloVe embeddings, load them, and convert each news title into a numeric vector by averaging word embeddings.

### 3.A – Download and Load GloVe Embeddings

In [5]:
import os
import numpy as np

# Create a directory for GloVe if not already present
os.makedirs('.glove', exist_ok=True)

# Download the GloVe 6B archive, which includes the 50D vectors
!wget -q http://nlp.stanford.edu/data/glove.6B.zip -P .glove
!unzip -o -q .glove/glove.6B.zip -d .glove


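unzip:  cannot find or open .glove/glove.6B.zip, .glove/glove.6B.zip.zip or .glove/glove.6B.zip.ZIP.

The archive was not downloaded in this run, so `unzip` found nothing to extract. The next cell loads the same 50-dimensional GloVe vectors through gensim instead.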


### 3.B – Load GloVe Embeddings with gensim

In [6]:
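# Reinstall scipy and gensim in case the preinstalled versions are incompatible
!pip install --upgrade --force-reinstall scipy gensim

# Load pretrained GloVe vectors through gensim's downloader instead of the zip file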
import gensim.downloader as api

glove_embeddings = api.load("glove-wiki-gigaword-50")

print(f"Loaded {len(glove_embeddings)} GloVe vectors.")
print("Example:", glove_embeddings['king'][:5])

Loaded 400000 GloVe vectors.
Example: [ 0.50451   0.68607  -0.59517  -0.022801  0.60046 ]


### 3.C – Convert Titles to Embeddings

In [7]:
import re
import numpy as np

def title_to_embedding(title, embeddings_dict, embedding_dim=50):
    # Tokenize: lowercase and extract words using regex
    tokens = re.findall(r'\b\w+\b', title.lower())
    valid_vectors = [embeddings_dict[token] for token in tokens if token in embeddings_dict]

    if valid_vectors:
        return np.mean(valid_vectors, axis=0)  # average the word vectors
    else:
        return np.zeros(embedding_dim, dtype='float32')  # fallback if no valid token

# Apply this function to each news title
news['embedding'] = news['title'].apply(lambda x: title_to_embedding(x, glove_embeddings))

# Inspect the result
print(news[['news_id', 'title', 'embedding']].head())



  news_id                       title  \
0   News0     Economic Growth in 2025   
1   News1          New Advances in AI   
2   News2       Global Warming Trends   
3   News3  Political Shifts Worldwide   
4   News4       Sports Update: Finals   

                                           embedding  
0  [0.36180013, 0.21652225, 0.3831025, -0.2690132...  
1  [0.07600999, 0.03465251, -0.009784758, 0.30647...  
2  [0.064313345, 0.4191023, 0.26325163, -0.171656...  
3  [0.26426998, 0.020493334, 0.09324334, -0.13158...  
4  [-0.7301367, 0.7985837, -0.00303333, 1.28415, ...  
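
To see what the averaging does without downloading GloVe, here is a sketch with a hand-made two-dimensional vocabulary. The vectors in `toy_vectors` are invented for illustration and have nothing to do with real GloVe values. `average_title` is a hypothetical plain-Python counterpart of `title_to_embedding`.

```python
import re

# Invented 2-D "embeddings"; the real GloVe vectors used above are 50-dimensional.
toy_vectors = {"sports": [1.0, 0.0], "finals": [0.0, 1.0]}

def average_title(title, vectors, dim=2):
    # Same tokenization as title_to_embedding: lowercase, then extract word tokens.
    tokens = re.findall(r"\b\w+\b", title.lower())
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim  # zero-vector fallback when no token is in the vocabulary
    # Average each coordinate across the known word vectors.
    return [sum(col) / len(known) for col in zip(*known)]

print(average_title("Sports Update: Finals", toy_vectors))  # "update" is skipped
print(average_title("Quantum Chess", toy_vectors))          # no known words
```

Words missing from the vocabulary are simply left out of the mean, so a title made up mostly of rare words is represented by only its few known tokens.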


## 🟢 Step 4: Build the Bipartite Graph Using PyTorch Geometric

### 4.A – Map Node IDs to Indices

In [8]:
import numpy as np
import torch
from torch_geometric.data import Data

# Map each user to a numerical index
user_to_index = {user: idx for idx, user in enumerate(users)}

# Extract all news IDs from the news DataFrame
all_news_ids = news['news_id'].tolist()
news_to_index = {nid: idx for idx, nid in enumerate(all_news_ids)}

num_users = len(users)
num_news = len(all_news_ids)
total_nodes = num_users + num_news

### 4.B – Create Node Features

In [9]:
# --- USER NODE FEATURES ---
# One-hot embedding for each user. Shape: [num_users, num_users]
user_features = np.eye(num_users, dtype=np.float32)

# We pad from [num_users, num_users] to [num_users, 50]
if user_features.shape[1] < 50:
    pad_size = 50 - user_features.shape[1]
    user_features = np.pad(user_features, ((0, 0), (0, pad_size)), 'constant', constant_values=0)

# Convert user_features to a Torch tensor
x_users = torch.tensor(user_features, dtype=torch.float)  # shape: [num_users, 50]

# --- NEWS NODE FEATURES ---
# Stack the 50D embeddings from the news DataFrame
news_embeds = np.vstack(news['embedding'].values).astype(np.float32)  # shape: [num_news, 50]
x_news = torch.tensor(news_embeds, dtype=torch.float)

# Combine user and news features into a single feature matrix
x = torch.cat([x_users, x_news], dim=0)  # shape: [num_users + num_news, 50]

print("Combined feature matrix shape:", x.shape)


Combined feature matrix shape: torch.Size([15, 50])
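
The padding step only works while there are at most 50 users. With more, the one-hot matrix would be wider than the 50-dimensional news embeddings and the `torch.cat` call would fail. The sketch below shows the padded row layout in plain Python and makes that limit explicit. `padded_one_hot` is a hypothetical helper, not something the notebook defines.

```python
def padded_one_hot(index, num_users, width=50):
    """One-hot row for a user, zero-padded to `width` columns."""
    if num_users > width:
        raise ValueError(f"{num_users} users do not fit in {width} one-hot columns")
    return [1.0 if col == index else 0.0 for col in range(width)]

# Row for User2 in the five-user toy dataset.
row = padded_one_hot(2, num_users=5)
print(len(row), row[:6])
```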


### 4.C – Build the Edges


In [10]:
edges_src = []
edges_dst = []

for _, row in interactions.iterrows():
    u_idx = user_to_index[row['user_id']]
    n_idx = news_to_index[row['news_id']] + num_users  # offset for news
    # Add undirected edges
    edges_src.extend([u_idx, n_idx])
    edges_dst.extend([n_idx, u_idx])

edge_index = torch.tensor([edges_src, edges_dst], dtype=torch.long)  # shape: [2, num_edges]

print("Edge index shape:", edge_index.shape)


Edge index shape: torch.Size([2, 20])
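
The index offset for news nodes is easy to get wrong, so a quick check can be worthwhile. This sketch rebuilds the edge lists in plain Python from the same interactions, written as index pairs, and verifies that every edge joins a user node (index below `num_users`) to a news node. `is_bipartite_edge` is an illustrative name.

```python
num_users, num_news = 5, 10

# (user index, news index) pairs, in the same order as the interactions table.
pairs = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 2), (4, 6), (4, 9)]

src, dst = [], []
for u, n in pairs:
    n_node = n + num_users  # news nodes are numbered after all user nodes
    src += [u, n_node]      # store both directions of each interaction
    dst += [n_node, u]

def is_bipartite_edge(a, b):
    # Exactly one endpoint may be a user node.
    return (a < num_users) != (b < num_users)

assert all(is_bipartite_edge(a, b) for a, b in zip(src, dst))
assert max(src + dst) < num_users + num_news
print(len(src), "directed edges; first two:", list(zip(src, dst))[:2])
```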


### 4.D – Create the PyG Data Object

In [11]:
data = Data(x=x, edge_index=edge_index)
print("Number of nodes:", data.num_nodes)
print("Node feature shape:", data.x.size())
print("Number of edges:", data.num_edges)


Number of nodes: 15
Node feature shape: torch.Size([15, 50])
Number of edges: 20


## 🟢 Step 5: Define a Simple GCN Model

In [12]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

# Define a simple 2-layer GCN model.
class SimpleGCN(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(SimpleGCN, self).__init__()
        # First GCN layer
        self.conv1 = GCNConv(in_channels, hidden_channels)
        # Second GCN layer to produce final embeddings
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        # Apply first convolution and non-linearity.
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        # Apply second convolution.
        x = self.conv2(x, edge_index)
        return x

# Instantiate the model:
# Input features: 50 (from our node features)
# Hidden dimension: 32 (arbitrarily chosen for this toy project)
# Output embedding dimension: 16 (this is the dimension of the learned embeddings)
model = SimpleGCN(in_channels=50, hidden_channels=32, out_channels=16)

# Print the model summary
print(model)

# Run a forward pass using our graph data (data from Step 4)
model.eval()
with torch.no_grad():
    output_embeddings = model(data)
    print("Output embeddings shape:", output_embeddings.shape)

SimpleGCN(
  (conv1): GCNConv(50, 32)
  (conv2): GCNConv(32, 16)
)
Output embeddings shape: torch.Size([15, 16])
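
For intuition about what each `GCNConv` layer does around its learned weight matrix: with default settings it adds a self-loop to every node and combines neighbor features using the symmetric weights `1/sqrt(d_i * d_j)`. The sketch below reproduces only that propagation step in plain Python on a three-node path graph with scalar features. It leaves out the weights and bias, and the graph is made up for illustration.

```python
import math

# Undirected path graph 0 - 1 - 2, with one scalar feature per node.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
x = [1.0, 2.0, 3.0]

# Degree of each node, counted after adding a self-loop.
deg = {i: len(nbrs) + 1 for i, nbrs in neighbors.items()}

def propagate(i):
    # Sum over the node itself and its neighbors, weighted by 1/sqrt(d_i * d_j).
    total = 0.0
    for j in [i] + neighbors[i]:
        total += x[j] / math.sqrt(deg[i] * deg[j])
    return total

out = [propagate(i) for i in range(3)]
print([round(v, 4) for v in out])
```

Stacking two such layers, as `SimpleGCN` does, lets a user node receive information from nodes two hops away, such as other users who read the same articles.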


## 🟢 Step 6: Train the GNN for Interaction Prediction

In [13]:
import random
import torch.nn.functional as F

# STEP 6.A: Prepare the Training Data

# Build a list of positive edges: each is a tuple (user_index, news_index)
positive_edges = []
for _, row in interactions.iterrows():
    user_idx = user_to_index[row['user_id']]
    news_idx = news_to_index[row['news_id']] + num_users  # offset for news nodes
    positive_edges.append((user_idx, news_idx))

# Function for negative sampling:
positive_set = set(positive_edges)  # set lookup instead of scanning the list

def sample_negative_edges(num_samples):
    neg_edges = []
    while len(neg_edges) < num_samples:
        # Randomly select a user and a news article
        u = random.randrange(0, num_users)
        n = random.randrange(0, num_news) + num_users  # offset for news
        # Only add if this (user, news) edge doesn't exist in positive_edges
        if (u, n) not in positive_set:
            neg_edges.append((u, n))
    return neg_edges

num_positive = len(positive_edges)
negative_edges = sample_negative_edges(num_positive)

# Helper function: get dot-product scores for a list of edges.
def get_edge_scores(embeddings, edge_list):
    scores = []
    for (u_idx, n_idx) in edge_list:
        # Dot product between user and news embeddings.
        score = torch.dot(embeddings[u_idx], embeddings[n_idx])
        scores.append(score)
    return torch.stack(scores)

# STEP 6.B: Training Loop

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
num_epochs = 200  # You can adjust the epochs as needed

for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()

    # Compute all node embeddings with a forward pass
    embeddings = model(data)  # shape: [total_nodes, out_channels] i.e., [15, 16]

    # Get scores for both positive and negative samples
    pos_scores = get_edge_scores(embeddings, positive_edges)
    neg_scores = get_edge_scores(embeddings, negative_edges)

    # Create labels: positive = 1, negative = 0
    pos_labels = torch.ones(pos_scores.size(0))
    neg_labels = torch.zeros(neg_scores.size(0))

    # Concatenate scores and labels
    all_scores = torch.cat([pos_scores, neg_scores])
    all_labels = torch.cat([pos_labels, neg_labels])

    # Compute Binary Cross Entropy Loss (with logits)
    loss = F.binary_cross_entropy_with_logits(all_scores, all_labels)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 20 == 0:
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}")

print("Training complete.")


Epoch 20/200, Loss: 0.0715
Epoch 40/200, Loss: 0.0003
Epoch 60/200, Loss: 0.0000
Epoch 80/200, Loss: 0.0000
Epoch 100/200, Loss: 0.0000
Epoch 120/200, Loss: 0.0000
Epoch 140/200, Loss: 0.0000
Epoch 160/200, Loss: 0.0000
Epoch 180/200, Loss: 0.0000
Epoch 200/200, Loss: 0.0000
Training complete.
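
The loss used above treats each dot-product score as a logit. As a worked sketch of what `F.binary_cross_entropy_with_logits` averages, here is the per-edge formula in plain Python, using the numerically stable form `max(s, 0) - s*y + log(1 + exp(-|s|))`. The scores are invented for illustration and `bce_with_logits` is a hypothetical stand-in, not the PyTorch implementation.

```python
import math

def bce_with_logits(scores, labels):
    """Mean binary cross-entropy computed directly on raw scores (logits)."""
    losses = [
        max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
        for s, y in zip(scores, labels)
    ]
    return sum(losses) / len(losses)

# Invented scores: two positive edges (label 1) and two negative edges (label 0).
scores = [4.0, 0.0, -3.0, 0.0]
labels = [1.0, 1.0, 0.0, 0.0]
print(round(bce_with_logits(scores, labels), 4))
```

A large positive score on a positive edge, or a large negative score on a negative edge, contributes almost nothing to the mean. That is how the training loss can reach 0.0000 on a dataset this small.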


## 🟢 Step 7: Generate Recommendations

In [14]:
def recommend_for_user(user_id, top_k=3):
    # Basic checks to ensure required variables exist in the current scope
    required_vars = ['data', 'model', 'user_to_index', 'num_users', 'num_news', 'news']
    missing_vars = [var for var in required_vars if var not in globals()]
    if missing_vars:
        print("The following variables are missing. Please run previous steps:", missing_vars)
        return None

    # Check if user_id is in user_to_index:
    if user_id not in user_to_index:
        print(f"User '{user_id}' not found in user_to_index.")
        return None

    # Set the model to evaluation mode and disable gradient computation.
    model.eval()
    with torch.no_grad():
        # Run a forward pass to obtain the node embeddings.
        embeddings = model(data)

    # Extract the user's embedding
    user_idx = user_to_index[user_id]
    user_embedding = embeddings[user_idx]  # shape: [16]

    # Extract the news embeddings
    news_embeddings = embeddings[num_users : num_users + num_news]  # shape: [num_news, 16]

    # Calculate dot product scores between the user's embedding and each news embedding (still on GPU or CPU).
    scores = torch.matmul(news_embeddings, user_embedding)  # shape: [num_news]

    # Use torch.topk to get top-k indices (no NumPy call).
    top_values, top_indices = torch.topk(scores, top_k)  # both are Tensors

    # Get the corresponding news IDs from the news DataFrame
    news_ids = news['news_id'].tolist()

    # Build recommendation list
    # Convert each score to a Python float using .item()
    recommended_news = []
    for i, idx in enumerate(top_indices):
        news_index = idx.item()
        recommended_news.append((news_ids[news_index], top_values[i].item()))

    return recommended_news

# Example usage
recommendations = recommend_for_user('User0', top_k=3)
if recommendations is not None:
    print("Recommendations for User0:", recommendations)

Recommendations for User0: [('News8', 28.454833984375), ('News1', 27.753583908081055), ('News0', 14.280116081237793)]
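
In the output above, News1 and News0 are articles that User0 has already interacted with. One common follow-up is to drop seen articles before taking the top k. Here is a minimal sketch in plain Python. The score values are placeholders, not model output, and `top_k_unseen` is a hypothetical helper.

```python
def top_k_unseen(scores, seen, k=3):
    """Rank news IDs by score, skipping those the user has already read."""
    candidates = [(nid, s) for nid, s in scores.items() if nid not in seen]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]

# Placeholder scores for illustration only.
scores = {"News0": 1.4, "News1": 2.7, "News5": 0.9, "News7": 1.1, "News8": 2.8}
seen_by_user0 = {"News0", "News1"}  # from the interactions table in Step 2

print(top_k_unseen(scores, seen_by_user0, k=2))
```

In the notebook itself, the same effect could be achieved by setting the scores of seen articles to negative infinity before the `torch.topk` call.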


## 🟢 Step 8: Build a Simple UI with Gradio

In [15]:
!pip install -q --upgrade gradio

In [18]:
import gradio as gr

# ✅ User and City keys (file paths no longer needed here)
user_images = {
    "User0": None,
    "User1": None,
    "User2": None,
    "User3": None,
    "User4": None,
}

city_images = {
    "Delhi": None,
    "Mumbai": None,
    "Pune": None,
    "Bangalore": None,
    "Srinagar": None,
}

time_slots = ["Morning", "Afternoon", "Evening", "Night"]

# ✅ Output function connected to GNN
def display_output(user, time_of_day, city):
    try:
        recommendations = recommend_for_user(user, top_k=3)
        if recommendations is None:
            return "⚠️ Recommendation system not ready. Please check your model."
        result = f"📍 City: {city} | 🕒 Time: {time_of_day}\n\n"
        result += "\n".join([f"📰 {news_id} — Score: {score:.2f}" for news_id, score in recommendations])
        return result
    except Exception as e:
        return f"❌ Error: {str(e)}"

# ✅ Build the Gradio UI
with gr.Blocks(
    css="""
        body {
            background: linear-gradient(135deg, #0f2027, #203a43, #2c5364);
        }
        .gr-button {
            background: linear-gradient(to right, #667eea, #764ba2) !important;
            color: white !important;
            font-weight: bold;
        }
        h1, h2, label {
            color: white !important;
        }
    """
) as demo:
    gr.Markdown("## 🌙 GNN-Based News Recommender")

    # 👤 User selection
    gr.Markdown("### 👤 Select a User")
    with gr.Row():
        user_choice = gr.Radio(choices=list(user_images.keys()), label="Select User")

    with gr.Row():
        for name in user_images.keys():
            gr.Image(
                label=f"Upload {name} Image",
                type="filepath"
            )

    # 🕒 Time of Day
    gr.Markdown("### 🕓 Select Time of Day")
    time_input = gr.Dropdown(choices=time_slots, label="Time of Day", value="Night")

    # 🏙️ City selection
    gr.Markdown("### 🏙️ Select a City")
    with gr.Row():
        city_choice = gr.Radio(choices=list(city_images.keys()), label="City")

    with gr.Row():
        for name in city_images.keys():
            gr.Image(
                label=f"Upload {name} Image",
                type="filepath"
            )

    # 🎯 Submit and Output
    gr.Markdown("### ✅ Get Recommendations")
    submit = gr.Button("🎬 Get Recommendations")
    output = gr.Textbox(label="📦 Output", lines=5)

    submit.click(fn=display_output, inputs=[user_choice, time_input, city_choice], outputs=output)

demo.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://f73ff0ff121edba1c9.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
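
Because neither radio group has a default value, `display_output` may receive `None` for `user` or `city` if the button is clicked before a selection is made. A small guard, sketched here as a hypothetical `missing_selection` helper, could return a clearer message in that case.

```python
def missing_selection(user, time_of_day, city):
    """Return a message naming the first unset input, or None if all are set."""
    fields = {"user": user, "time of day": time_of_day, "city": city}
    for label, value in fields.items():
        if value is None or value == "":
            return f"Please select a {label} first."
    return None

print(missing_selection(None, "Night", "Pune"))
print(missing_selection("User0", "Night", "Pune"))
```

Inside `display_output`, the guard's message could be returned early whenever it is not `None`.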


