# Evnto Recommendation System
The goal of a recommendation system is to predict what a user might be interested in. Given user data, the system suggests the closest events, based either on the similarity between user data and event data (content-based filtering) or on user behavior (collaborative filtering). Our approach uses the HuggingFace `transformers` package to build a simple content-based recommender that ranks events by embedding similarity.
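Content-based filtering with embeddings comes down to ranking items by how close their vectors are to the user's vector. The minimal sketch below uses hand-made 3-dimensional vectors (hypothetical values standing in for real model embeddings) and only the standard library:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; the notebook gets real ones from a transformer model.
user_vector = [0.9, 0.1, 0.0]
event_vectors = {
    "Tech Talk": [0.8, 0.2, 0.1],
    "Music Fest": [0.0, 0.1, 0.9],
}

# Rank events from most to least similar to the user.
ranking = sorted(
    event_vectors,
    key=lambda name: cosine(user_vector, event_vectors[name]),
    reverse=True,
)
print(ranking)  # ['Tech Talk', 'Music Fest']
```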

In [1]:
import requests
import pandas as pd
from transformers import AutoTokenizer, AutoModel
import torch
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import json

* **Fetching users and events data** from the provided API endpoints.

In [33]:
# Define API endpoints
USERS_API = 'https://example.com/api/users'  # Replace with actual endpoint
EVENTS_API = 'https://example.com/api/events'  # Replace with actual endpoint

# Define function to fetch data from API
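def fetch_data_from_api(url, timeout=10):
    try:
        # timeout (seconds, 10 is an arbitrary default) keeps the call from hanging indefinitely
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # Raise requests.HTTPError for 4xx/5xx responses
        return response.json()
    except (requests.RequestException, ValueError) as error:
        # RequestException covers connection errors, timeouts and HTTPError;
        # ValueError covers a response body that is not valid JSON
        print(f"Error fetching data: {error}")
        return None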

If fetching from the API fails, we fall back to the following initial data, saved as local JSON files:

In [26]:
%%writefile users_data.json
[
  {
    "id": 1,
    "college_name": "Engineering",
    "user_skill": "machine learning",
    "user_interests": "AI, tech"
  },
  {
    "id": 2,
    "college_name": "Business",
    "user_skill": "finance",
    "user_interests": "economics, startups"
  }
]



In [30]:
%%writefile events_data.json
[
  {
    "event_id": 1,
    "name": "Tech Talk",
    "description": "AI advancements",
    "start_date": "2024-09-15",
    "end_date": "2024-09-16",
    "type": "Talk",
    "status": "upcoming",
    "category": "tech"
  },
  {
    "event_id": 2,
    "name": "Music Fest",
    "description": "Live concert by a famous artist",
    "start_date": "2024-09-20",
    "end_date": "2024-09-21",
    "type": "Concert",
    "status": "upcoming",
    "category": "music"
  },
  {
    "event_id": 3,
    "name": "AI Conference",
    "description": "Advances in AI and ML",
    "start_date": "2024-09-10",
    "end_date": "2024-09-11",
    "type": "Conference",
    "status": "ended",
    "category": "tech"
  }
]



In [15]:
def read_json_file(filename):
    """Reads a JSON file and returns its contents as a Python object.

    Args:
        filename (str): The name of the JSON file.

    Returns:
        The contents of the JSON file as a Python object.
    """

    with open(filename, 'r') as f:
        data = json.load(f)
    return data

In [31]:
INIT_USERS_DATA = read_json_file("users_data.json")
INIT_EVENTS_DATA = read_json_file("events_data.json")
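The later steps read fields such as `user_skill`, `user_interests`, `status`, `type`, `description` and `category` directly, so a record missing one of them fails only later, deep inside the pipeline. A small check right after loading catches this early. This is a sketch: `find_invalid_records` and the two key sets are hypothetical names, with the keys taken from the fields the notebook uses:

```python
# Hypothetical key sets: the fields the recommender reads in later steps.
REQUIRED_USER_KEYS = {"id", "user_skill", "user_interests"}
REQUIRED_EVENT_KEYS = {"event_id", "type", "description", "category", "status"}

def find_invalid_records(records, required_keys):
    """Return (index, sorted missing keys) for every record lacking a required key."""
    problems = []
    for i, record in enumerate(records):
        missing = required_keys - set(record)
        if missing:
            problems.append((i, sorted(missing)))
    return problems

# Example: the second user has no 'user_interests' field.
users = [
    {"id": 1, "user_skill": "machine learning", "user_interests": "AI, tech"},
    {"id": 2, "user_skill": "finance"},
]
print(find_invalid_records(users, REQUIRED_USER_KEYS))  # [(1, ['user_interests'])]
```

In the notebook this would be called on `INIT_USERS_DATA` and `INIT_EVENTS_DATA` (or on the API responses) before building the DataFrame.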

In [35]:
# Fetch users and events data from the API, falling back to the initial data on failure.
# The API calls are commented out until real endpoints are configured.
# users_data = fetch_data_from_api(USERS_API) or INIT_USERS_DATA
# events_data = fetch_data_from_api(EVENTS_API) or INIT_EVENTS_DATA
users_data = INIT_USERS_DATA
events_data = INIT_EVENTS_DATA
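The commented-out lines fall back with `or`, which also discards a valid but empty response such as `[]`, because an empty list is falsy. If an empty API result should be kept as-is, one option is to fall back only on `None`, the value `fetch_data_from_api` returns on failure. A sketch, where `load_with_fallback` and the two stub fetchers are hypothetical stand-ins so the example runs offline:

```python
def load_with_fallback(fetch, url, fallback):
    """Call fetch(url); use fallback only when the fetch signalled failure with None."""
    data = fetch(url)
    return fallback if data is None else data

# Hypothetical stand-ins for fetch_data_from_api.
def failing_fetch(url):
    return None  # simulates a connection error

def empty_fetch(url):
    return []  # simulates an API that has no users yet

fallback_users = [{"id": 1}]
url = "https://example.com/api/users"
print(load_with_fallback(failing_fetch, url, fallback_users))  # [{'id': 1}]
print(load_with_fallback(empty_fetch, url, fallback_users))    # []
print(empty_fetch(url) or fallback_users)                      # [{'id': 1}]  (the `or` version)
```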

## Step 1: Filter out events that have ended
The events with status "ended" are filtered out so they aren’t recommended.

In [36]:
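events_df = pd.DataFrame(events_data)
# reset_index(drop=True) returns a new DataFrame with a clean 0..n-1 index, so the
# column added in Step 2 is set on an independent frame rather than a filtered slice
events_df = events_df[events_df['status'] != 'ended'].reset_index(drop=True)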
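Filtering on `status` assumes that field is kept up to date. An alternative is to keep only events whose `end_date` has not yet passed. A stdlib sketch, assuming dates are ISO `YYYY-MM-DD` strings as in `events_data.json` and that the reference date is passed in explicitly so the result is reproducible (`upcoming_events` is a hypothetical name):

```python
from datetime import date

def upcoming_events(events, today):
    """Keep events whose end_date (ISO 'YYYY-MM-DD') is today or later."""
    return [e for e in events if date.fromisoformat(e["end_date"]) >= today]

events = [
    {"event_id": 1, "end_date": "2024-09-16"},
    {"event_id": 2, "end_date": "2024-09-21"},
    {"event_id": 3, "end_date": "2024-09-11"},
]
kept = upcoming_events(events, today=date(2024, 9, 12))
print([e["event_id"] for e in kept])  # [1, 2]
```

With the pandas frame, the same condition could be built from `pd.to_datetime(events_df['end_date'])`.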

## Step 2: Combine event details into a single feature column for embedding

In [37]:
events_df['combined_features'] = events_df['type'] + " " + events_df['description'] + " " + events_df['category']
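Concatenating columns with `+` propagates missing values. If any of `type`, `description` or `category` is missing for an event, its `combined_features` becomes NaN, which the tokenizer cannot process. A more forgiving combiner for the raw event dicts is sketched below. The function name and the choice to skip missing or blank fields are assumptions:

```python
def combine_features(event, fields=("type", "description", "category")):
    """Join the given text fields with spaces, skipping missing, None or blank values."""
    parts = [str(event[f]).strip() for f in fields if event.get(f)]
    return " ".join(p for p in parts if p)

print(combine_features({"type": "Talk", "description": "AI advancements", "category": "tech"}))
# Talk AI advancements tech
print(combine_features({"type": "Concert", "description": None, "category": "music"}))
# Concert music
```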

## Step 3: Initialize transformer model and tokenizer
The system generates text embeddings for events with the `sentence-transformers/all-MiniLM-L6-v2` model and builds a user-profile embedding from the user's skills and interests.

In [38]:
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')


## Step 4: Generate embeddings for events

In [39]:
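def generate_event_embeddings(events):
    texts = events['combined_features'].tolist()
    inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)

    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state  # (batch, seq_len, hidden)

    # Mean-pool over real tokens only: padding positions get weight 0 via the attention mask
    mask = inputs['attention_mask'].unsqueeze(-1).float()
    event_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

    return event_embeddings.cpu().numpy()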
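Why the attention mask matters for mean pooling: with `padding=True`, shorter texts in a batch are padded to the longest one, so a plain `last_hidden_state.mean(dim=1)` would also average in the padding positions. A toy illustration with made-up 2-dimensional token vectors (not real model outputs):

```python
def mean_pool(token_vectors, attention_mask):
    """Average only the token vectors whose mask value is 1."""
    dim = len(token_vectors[0])
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]

# Two real tokens followed by one padding position (mask 0).
tokens = [[1.0, 3.0], [3.0, 5.0], [0.5, -2.0]]
mask = [1, 1, 0]

masked = mean_pool(tokens, mask)          # [2.0, 4.0]
unmasked = mean_pool(tokens, [1, 1, 1])   # [1.5, 2.0]: the padding vector leaks in
print(masked, unmasked)
```

The masked version gives the same embedding for a text whether or not it was batched with longer texts, which is what the similarity comparison in Step 5 relies on.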

In [40]:
# Generate embeddings for events
event_embeddings = generate_event_embeddings(events_df)
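Optionally, the event embeddings can be L2-normalized once. After that, cosine similarity reduces to a plain dot product, which is cheaper when scoring many users. A stdlib sketch of that identity on toy vectors (not model output):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

u = [3.0, 4.0]
e = [4.0, 3.0]
cos_direct = dot(u, e) / (math.sqrt(dot(u, u)) * math.sqrt(dot(e, e)))
cos_via_normalized = dot(l2_normalize(u), l2_normalize(e))
print(cos_direct, cos_via_normalized)  # both 0.96, up to floating-point rounding
```

In the notebook, `sklearn.preprocessing.normalize(event_embeddings)` would normalize every row of the embedding matrix in one call.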

## Step 5: Generate recommendations for each user
For each user, the system calculates cosine similarity between the user’s profile embedding and event embeddings. The events with the highest similarity scores are recommended.
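For reference, the score for a user embedding $\mathbf{u}$ and the embedding $\mathbf{e}_j$ of event $j$ is the standard cosine similarity, where $d$ is the embedding dimension (384 for `all-MiniLM-L6-v2`). Events are then ranked by $s_j$ in descending order:

$$
s_j = \cos(\mathbf{u}, \mathbf{e}_j) = \frac{\mathbf{u} \cdot \mathbf{e}_j}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{e}_j \rVert} = \frac{\sum_{k=1}^{d} u_k \, e_{j,k}}{\sqrt{\sum_{k=1}^{d} u_k^2} \; \sqrt{\sum_{k=1}^{d} e_{j,k}^2}}
$$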


In [41]:
def recommend_events_for_user(user_profile, top_n=5):
    # Combine the user's profile data (skills and interests) into one text
    user_profile_text = user_profile['user_skill'] + " " + user_profile['user_interests']

    # Tokenize and encode the user profile
    inputs = tokenizer([user_profile_text], return_tensors='pt', padding=True, truncation=True)

    with torch.no_grad():
        # A single text is never padded, so a plain mean over its tokens is a valid mean pooling
        user_embedding = model(**inputs).last_hidden_state.mean(dim=1)

    user_embedding = user_embedding.cpu().numpy()

    # Cosine similarity between the user profile and every event embedding: shape (1, n_events)
    similarity_scores = cosine_similarity(user_embedding, event_embeddings)

    # Event positions sorted from most to least similar
    recommended_event_indices = similarity_scores[0].argsort()[::-1]

    # Return the IDs of the top_n most similar events (fewer if fewer events remain)
    recommended_events = events_df.iloc[recommended_event_indices[:top_n]]['event_id'].values

    return recommended_events
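`argsort` sorts every score even though only the top `top_n` are needed. For a large event catalog, a partial selection is enough. Below is a stdlib sketch with `heapq.nlargest`, where `top_k_indices` is a hypothetical name; in NumPy, `np.argpartition` plays a similar role:

```python
import heapq

def top_k_indices(scores, k):
    """Indices of the k highest scores, best first; fewer if len(scores) < k."""
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])

scores = [0.42, 0.91, 0.13, 0.77]
print(top_k_indices(scores, 2))  # [1, 3]
print(top_k_indices(scores, 5))  # [1, 3, 0, 2]
```

As the second call shows, asking for more items than exist simply returns all of them. This is also why the notebook's output lists only two events per user.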

## Step 6: Loop through each user and generate recommendations

In [42]:
for user in users_data:
    recommended_events = recommend_events_for_user(user)
    print(f"Recommended events for user {user['id']}: {recommended_events}")

Recommended events for user 1: [1 2]
Recommended events for user 2: [1 2]


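Because only two upcoming events remain after filtering and `top_n` is 5, both users receive both events; only their order depends on the similarity scores. This approach can be extended with **more sophisticated filtering** or switched to **collaborative filtering** once enough behavioral data becomes available.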
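To illustrate the collaborative direction, here is a minimal user-based sketch over a hypothetical attendance log (user id mapped to the set of attended event ids). This notebook has no such data. Users are compared by the Jaccard overlap of their attended events. Each unseen event is scored by the summed similarity of the users who attended it. All names here are illustrative:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_collaborative(attendance, user_id, top_n=3):
    """Score events the user has not attended by the similarity of users who did."""
    seen = attendance[user_id]
    scores = {}
    for other_id, events in attendance.items():
        if other_id == user_id:
            continue
        sim = jaccard(seen, events)
        if sim == 0:
            continue  # users with no overlap contribute nothing
        for event_id in events - seen:
            scores[event_id] = scores.get(event_id, 0.0) + sim
    # Highest score first; ties broken by event id for a stable order
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return ranked[:top_n]

# Hypothetical attendance log: user id -> set of attended event ids.
attendance = {
    1: {10, 11},
    2: {10, 11, 12},
    3: {13},
    4: {11, 14},
}
print(recommend_collaborative(attendance, 1))  # [12, 14]
```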