# Recommender Systems 1
### Pavan Singh - MSc Advanced Analytics - 2023
#### An Introduction to Recommender Systems in Python


> The objective of a RecSys is to recommend relevant items to users based on their preferences. Preference and relevance are subjective, and they are generally inferred from the items users have consumed previously.

<br/><br/>

***
As discussed in the README.md file, there are three main families of methods for RecSys:

1. **Collaborative Filtering**: This method makes automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption is that if person `A` has the same opinion as person `B` on a set of items, `A` is more likely to share `B`'s opinion on another item than the opinion of a randomly chosen person.
    * Imagine that there is a website that sells books, and we have data on which books each user has purchased. We can use this data to build a recommendation system that suggests books to users based on the purchases of other similar users. 

<br/><br/>

2. **Content-Based Filtering**: This method models a user's preferences using only information about the descriptions and attributes of the items that user has previously consumed. In other words, these algorithms try to recommend items that are similar to those a user liked in the past (or is examining in the present). In particular, candidate items are compared with items previously rated by the user, and the best-matching items are recommended.
    * Imagine that there is a website that sells books, and each book has a number of attributes, such as the author, the publisher, and the genre. We can use these attributes to build a recommendation system that suggests books to users based on their past preferences. For example, we look at the attributes of the books person `A` has purchased and recommend books with similar attributes.

<br/><br/>

3. **Hybrid methods**: Recent research has demonstrated that a hybrid approach, combining collaborative filtering and content-based filtering, can be more effective than either pure approach in some cases. Hybrid methods can also help overcome common problems in recommender systems, such as cold start (new users or items with little or no interaction history) and data sparsity (most users interact with only a small fraction of the items).
    * Imagine that there is a website that sells books, and each book has a number of attributes, such as the author, the publisher, and the genre. We also have data on which books each user has purchased. We can use both to build a recommendation system that combines content-based and collaborative filtering techniques. First, we use content-based filtering to recommend books to user `A` based on the attributes of the books they have purchased. Then we use collaborative filtering to recommend books to user `A` based on the purchases of similar users. Finally, we combine the two sets of recommendations into a single list, for example by taking a weighted average of their scores.
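
The three families can be made concrete with a toy version of the bookshop example. This is a minimal sketch under stated assumptions, not necessarily the approach taken in the rest of this notebook: the purchase matrix, book titles and one-hot genre attributes are made up, user-based cosine similarity stands in for collaborative filtering, and the hybrid is a simple weighted average of min-max scaled scores controlled by a hypothetical weight `alpha`.

```python
import numpy as np

# Hypothetical toy data: rows are users A, B, C; columns are books; 1 = purchased.
books = ["Dune", "Foundation", "Emma", "Persuasion", "Neuromancer"]
purchases = np.array([
    [1, 1, 0, 0, 0],  # user A
    [1, 1, 0, 0, 1],  # user B
    [0, 0, 1, 1, 0],  # user C
])

# Hypothetical item attributes: one-hot genre per book (columns: sci-fi, romance).
genres = np.array([
    [1, 0],  # Dune
    [1, 0],  # Foundation
    [0, 1],  # Emma
    [0, 1],  # Persuasion
    [1, 0],  # Neuromancer
])


def cosine_sim(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    Xn = X / norms
    return Xn @ Xn.T


def collaborative_scores(purchases, user):
    """User-based CF: other users' purchases weighted by their similarity to `user`."""
    sims = cosine_sim(purchases)[user]
    sims[user] = 0.0  # exclude the user's own purchases
    return sims @ purchases


def content_scores(purchases, features, user):
    """Content-based: cosine similarity between each book and the user's profile."""
    profile = purchases[user] @ features  # summed attributes of purchased books
    return cosine_sim(np.vstack([profile, features]))[0, 1:]


def min_max(scores):
    """Rescale scores to [0, 1] so the two methods are comparable."""
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores, dtype=float)
    return (scores - scores.min()) / span


def hybrid_scores(purchases, features, user, alpha=0.5):
    """Weighted hybrid: alpha * CF + (1 - alpha) * content-based."""
    cf = min_max(collaborative_scores(purchases, user))
    cb = min_max(content_scores(purchases, features, user))
    return alpha * cf + (1 - alpha) * cb


def recommend(scores, purchases, user, k=2):
    """Top-k books the user has not purchased yet (stable sort breaks ties by index)."""
    order = np.argsort(-scores, kind="stable")
    return [books[i] for i in order if purchases[user, i] == 0][:k]


user_a = 0
print(recommend(hybrid_scores(purchases, genres, user_a), purchases, user_a, k=1))
```

With this toy data, user `A` has bought only science-fiction books and shares both purchases with user `B`, so the collaborative and content-based components agree and the snippet prints `['Neuromancer']`.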

***
# Goal

We will demonstrate how to implement **Collaborative Filtering**, **Content-Based Filtering** and **Hybrid methods** in Python to provide personalized recommendations to users.

***
# Data

We use a data set available on Kaggle: [*Articles sharing and reading from CI&T DeskDrop*](https://www.kaggle.com/datasets/gspmoreira/articles-sharing-reading-from-cit-deskdrop?select=users_interactions.csv). The data set contains logs of user interactions on shared articles and was published for the purpose of building content recommender systems.
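
Interaction logs like these are usually aggregated into a user-item matrix before collaborative filtering is applied. The following is a minimal sketch with pandas and SciPy, assuming the interactions file exposes `personId`, `contentId` and `eventType` columns; the rows below are invented, and counting every event equally is a simplifying assumption (event types could instead be given different weights).

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Invented log rows; the column names are assumed to match users_interactions.csv.
interactions = pd.DataFrame({
    "personId":  [101, 101, 102, 103, 103, 103],
    "contentId": [9001, 9001, 9002, 9001, 9002, 9003],
    "eventType": ["VIEW", "LIKE", "VIEW", "VIEW", "VIEW", "BOOKMARK"],
})

# Interaction strength = number of logged events per (user, article) pair.
# eventType is ignored here; every event counts as 1.
strength = (
    interactions.groupby(["personId", "contentId"])
    .size()
    .rename("strength")
    .reset_index()
)

# Map raw ids to contiguous row (user) and column (article) indices.
users = strength["personId"].astype("category")
items = strength["contentId"].astype("category")

# Sparse user-item matrix: most (user, article) pairs have no interaction.
user_item = csr_matrix(
    (
        strength["strength"].to_numpy(),
        (users.cat.codes.to_numpy(), items.cat.codes.to_numpy()),
    ),
    shape=(len(users.cat.categories), len(items.cat.categories)),
)

print(user_item.toarray())
```

User `101` viewed and liked article `9001`, so that cell holds a strength of 2, and `user_item.toarray()` gives `[[2, 0, 0], [0, 1, 0], [1, 1, 1]]`. Storing the matrix in CSR format keeps memory proportional to the number of observed interactions rather than users times articles.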

The data set is useful because, in addition to the interaction logs, it contains item attributes, which allow the application of Content-Based Filtering and Hybrid approaches as well as Collaborative Filtering methods.
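
A common way to turn textual item attributes into comparable vectors is TF-IDF weighting followed by cosine similarity, using scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The sketch below assumes each article is represented by a short piece of text; the four texts and the helper `similar_items` are hypothetical and only illustrate the idea.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented article texts standing in for real item attributes (e.g. title + body).
articles = [
    "machine learning models for recommendation",
    "deep learning for image recognition",
    "cloud infrastructure and container orchestration",
    "collaborative filtering models for recommendation",
]

# TF-IDF down-weights terms shared by many articles; English stop words are dropped.
vectorizer = TfidfVectorizer(stop_words="english")
item_profiles = vectorizer.fit_transform(articles)  # sparse, shape (n_articles, n_terms)

# Dense matrix of pairwise article similarities, shape (n_articles, n_articles).
item_sim = cosine_similarity(item_profiles)


def similar_items(idx, k=2):
    """Indices of the k articles most similar to article `idx`, excluding itself."""
    order = item_sim[idx].argsort()[::-1]  # most similar first
    return [int(j) for j in order if j != idx][:k]


print(similar_items(0, k=1))
```

`similar_items(0, k=1)` returns `[3]`: after stop-word removal, the first and last texts share two terms (*models*, *recommendation*), while the first and second share only *learning*, and the third shares nothing with the first.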
***

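```python
# Import the libraries (collections of modules) used throughout this notebook

# Standard library
import math
import random

# Numerical computing and sparse linear algebra
import numpy as np
import scipy
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Data manipulation
import pandas as pd

# Machine learning: data splitting, TF-IDF text features, similarity, scaling
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Text processing: stop-word lists (the corpus may need nltk.download("stopwords"))
from nltk.corpus import stopwords

# Plotting
import matplotlib.pyplot as plt
```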


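- **NumPy**: provides support for large, multi-dimensional arrays and matrices of numerical data, and for performing mathematical operations on them
- **SciPy**: provides scientific computing routines, including sparse matrices (`csr_matrix`) and truncated singular value decomposition (`svds`)
- **Pandas**: provides tools for data manipulation and analysis
- **Scikit-learn**: provides tools for machine learning and statistical modeling, such as train/test splitting, TF-IDF text vectorization, cosine similarity and feature scaling
- **NLTK**: provides natural language processing resources, such as lists of stop words
- **Matplotlib**: provides functions for creating visualizations of data, such as plots, histograms, and scatter plots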
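
The SciPy imports in the cell above (`csr_matrix` and `svds`) support a matrix-factorization form of collaborative filtering: a sparse user-item matrix is approximated by its top `k` singular vectors, and the low-rank reconstruction fills in scores for items a user has not interacted with. The following is a minimal sketch on a made-up 4 x 5 matrix; the number of latent factors `k = 2` and the helper `top_unseen` are arbitrary, hypothetical choices for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Hypothetical user-item matrix (4 users x 5 items); 0 means "no interaction".
ratings = csr_matrix(np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 4],
    [1, 0, 4, 5, 5],
], dtype=float))

# Truncated SVD keeping k latent factors (svds requires k < min(n_users, n_items)).
k = 2
U, sigma, Vt = svds(ratings, k=k)

# Dense low-rank reconstruction; entries for unseen (user, item) pairs
# can be read as predicted preference scores.
predicted = U @ np.diag(sigma) @ Vt


def top_unseen(user, n=1):
    """Highest-scoring items the user has not interacted with yet."""
    seen = ratings[user].toarray().ravel() > 0
    scores = np.where(seen, -np.inf, predicted[user])  # mask out seen items
    return np.argsort(-scores)[:n].tolist()


print(top_unseen(0, n=2))
```

`svds` may return the singular values in ascending rather than descending order; because `U`, `sigma` and `Vt` are permuted consistently, the reconstruction `U @ np.diag(sigma) @ Vt` is unaffected.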