# Collaborative Filtering (CF)

### Overview
Collaborative Filtering is a popular technique in **recommender systems** that suggests items to a user based on the **preferences of similar users** (user-user CF) or on **similarity between items** (item-item CF). This section focuses on the user-user variant.


**User-User CF**  
   - Finds users with similar tastes.  
   - Recommends items liked by similar users that the target user hasn't interacted with.  
   - Example: “People who liked X also liked Y.”

---

### Key Concepts
- **Similarity Metrics**: Measure closeness between users or items.  
  - Cosine similarity, Pearson correlation, Euclidean distance.  
- **Rating Matrix**: Users as rows, items as columns, values as ratings.  
- **Sparsity Problem**: Most users rate only a few items, so the rating matrix is mostly empty. Neighborhood-based methods then have few co-rated items to compare on; matrix factorization is a common way to cope.
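
To make the three similarity metrics concrete, here is a minimal sketch that computes each one for a pair of rating vectors. The vectors `u` and `v` and the helper names are made up for illustration and do not come from the dataset used later.

```python
import numpy as np

# Hypothetical ratings from two users for the same four items
u = np.array([5.0, 4.0, 1.0, 2.0])
v = np.array([4.0, 5.0, 2.0, 1.0])

def cosine_pair(a, b):
    # Cosine of the angle between the two rating vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_pair(a, b):
    # Pearson correlation: cosine similarity of the mean-centered vectors
    a_c, b_c = a - a.mean(), b - b.mean()
    return float(a_c @ b_c / (np.linalg.norm(a_c) * np.linalg.norm(b_c)))

def euclidean_pair(a, b):
    # Straight-line distance; smaller means more similar
    return float(np.linalg.norm(a - b))

print("cosine:   ", round(cosine_pair(u, v), 4))
print("pearson:  ", round(pearson_pair(u, v), 4))
print("euclidean:", round(euclidean_pair(u, v), 4))
```

Cosine and Pearson are similarities (higher is closer), while Euclidean is a distance (lower is closer), so they cannot be ranked in the same direction without converting one of them.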

---

### Advantages
- No domain knowledge required (learns purely from data).  
- Captures latent preferences and trends.

### Limitations
- **Cold Start Problem**: New users or items with no ratings are hard to recommend.  
- **Scalability**: Large datasets require efficient similarity computations.  
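
The cold start problem can be seen directly in the similarity scores. The sketch below assumes a small hypothetical matrix `R` in which 0 means "no rating"; the helper `safe_cosine` is illustrative, not part of any library.

```python
import numpy as np

# Hypothetical rating matrix: rows are users, columns are items, 0 = no rating
R = np.array([
    [5, 4, 1, 0],
    [4, 0, 1, 1],
    [0, 0, 0, 0],  # brand-new user with no ratings yet
], dtype=float)

def safe_cosine(a, b):
    # Return 0.0 when either vector has zero length instead of dividing by zero
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

new_user = R[2]
sims = [safe_cosine(new_user, R[i]) for i in range(2)]
print(sims)  # [0.0, 0.0]
```

With every similarity at zero, no existing user stands out as a neighbor, so a purely neighborhood-based recommender has nothing to rank for the new user.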


In [1]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Sample user-movie rating matrix (0 means the user has not rated the movie)
ratings = pd.DataFrame({
    "Movie1": [5, 4, 0, 1],
    "Movie2": [4, 0, 0, 1],
    "Movie3": [1, 1, 0, 5],
    "Movie4": [0, 1, 5, 4]
}, index=["User1", "User2", "User3", "User4"])

# Compute user-user similarity
user_similarity = cosine_similarity(ratings)
user_sim_df = pd.DataFrame(user_similarity, index=ratings.index, columns=ratings.index)

# Recommendation function
def recommend_movies(user, top_n=2):
    # Movies the target user has not rated yet
    user_ratings = ratings.loc[user]
    unrated = user_ratings[user_ratings == 0].index
    # Other users, ordered from most to least similar to the target user
    similar_users = user_sim_df[user].drop(user).sort_values(ascending=False).index
    for sim_user in similar_users:
        # Keep only movies this neighbor has actually rated, highest rating first
        rec_movies = ratings.loc[sim_user, unrated]
        rec_movies = rec_movies[rec_movies > 0].sort_values(ascending=False)
        if not rec_movies.empty:
            return rec_movies.head(top_n).index.tolist()
    return []

# Example usage
print("Recommendations for User1:")
print(recommend_movies("User1"))

Recommendations for User1:
['Movie4']
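
The function above takes its answer from the single closest neighbor who has rated something new. A common variation is to predict a rating as a similarity-weighted average over all neighbors who rated the movie. The following is a rough sketch of that idea using plain NumPy on the same ratings; `predict_rating` is a hypothetical helper, and this is a different technique from the one in the cell above.

```python
import numpy as np

users = ["User1", "User2", "User3", "User4"]
movies = ["Movie1", "Movie2", "Movie3", "Movie4"]
# Same ratings as the example above; 0 means "not rated"
R = np.array([
    [5, 4, 1, 0],
    [4, 0, 1, 1],
    [0, 0, 0, 5],
    [1, 1, 5, 4],
], dtype=float)

# Cosine similarity between every pair of users
norms = np.linalg.norm(R, axis=1, keepdims=True)
S = (R @ R.T) / (norms @ norms.T)

def predict_rating(user, movie):
    u, m = users.index(user), movies.index(movie)
    # Neighbors are the other users who actually rated this movie
    mask = (R[:, m] > 0) & (np.arange(len(users)) != u)
    weights = S[u, mask]
    if weights.sum() == 0:
        return None  # no informative neighbor
    # Similarity-weighted average of the neighbors' ratings
    return float(weights @ R[mask, m] / weights.sum())

print(predict_rating("User1", "Movie4"))
```

For User1 the prediction for Movie4 comes out low, because the most similar neighbor (User2) rated it 1. That suggests a predicted-rating threshold may be worth applying before recommending an item.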


# Content-Based Filtering

Content-based filtering is a recommendation technique that suggests items to users based on the **features of the items** and the **user's previous interactions**. Instead of relying on other users' preferences (like in collaborative filtering), it focuses on the **content characteristics**.
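
One way to combine item features with a user's previous interactions is to build a user profile and score unseen items against it. The sketch below assumes hand-made feature vectors and invented book titles; none of these names come from the dataset used in the next cell.

```python
import numpy as np

# Hypothetical items described by three hand-made features: [mystery, romance, sci-fi]
item_features = {
    "Book A": np.array([1.0, 0.0, 0.0]),
    "Book B": np.array([0.8, 0.2, 0.0]),
    "Book C": np.array([0.0, 1.0, 0.0]),
    "Book D": np.array([0.0, 0.0, 1.0]),
}

liked = ["Book A"]  # items the user interacted with before

# User profile: average feature vector of the liked items
profile = np.mean([item_features[t] for t in liked], axis=0)

def score(vec):
    # Cosine similarity between the profile and an item's features
    return float(profile @ vec / (np.linalg.norm(profile) * np.linalg.norm(vec)))

# Score only the items the user has not seen, then rank best first
candidates = {t: score(v) for t, v in item_features.items() if t not in liked}
ranked = sorted(candidates, key=candidates.get, reverse=True)
print(ranked[0], round(candidates[ranked[0]], 3))
```

No other user's ratings are involved here: the ranking depends only on the item features and this one user's history.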


In [1]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Sample dataset with new books
books = pd.DataFrame({
    "title": ["Mystic River", "The Silent Garden", "Cosmic Voyage", "Enchanted Tales"],
    "description": [
        "A gripping mystery set in a small town along a river",
        "A romantic story unfolding in a hidden, tranquil garden",
        "An exciting space journey exploring distant galaxies",
        "A collection of magical stories and fantastical adventures"
    ]
})

# TF-IDF vectorization
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(books["description"])

# Cosine similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

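# Recommendation function
def recommend_books(title, top_n=2):
    # Row position of the queried book (raises IndexError if the title is absent)
    idx = books[books["title"] == title].index[0]
    # Pair every other book with its similarity to the queried book
    sim_scores = [(i, score) for i, score in enumerate(cosine_sim[idx]) if i != idx]
    # Highest similarity first; tied scores keep their original row order
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top_n]
    book_indices = [i for i, _ in sim_scores]
    return books["title"].iloc[book_indices]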

# Example usage
print("Recommendations for 'Mystic River':")
print(recommend_books("Mystic River"))


Recommendations for 'Mystic River':
1    The Silent Garden
2        Cosmic Voyage
Name: title, dtype: object
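
It is worth checking the similarity scores behind a ranking like this one. The sketch below rebuilds the TF-IDF matrix for the same four descriptions and prints the scores for "Mystic River". My reading is that the descriptions share no terms once English stop words are removed, so the scores should all be zero and the order above would simply follow row order.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Same descriptions as in the cell above
descriptions = [
    "A gripping mystery set in a small town along a river",
    "A romantic story unfolding in a hidden, tranquil garden",
    "An exciting space journey exploring distant galaxies",
    "A collection of magical stories and fantastical adventures",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
sim = cosine_similarity(tfidf)

# Similarity of "Mystic River" (row 0) to the other three books
print(sim[0, 1:])
```

On a larger catalog with overlapping vocabulary the scores would differ and the ranking would carry real signal. Returning the score alongside each title makes that easy to check.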
