# **Research**

# Recommender Systems
Recommender systems are strategies designed to help users make decisions when the available information is complex.
The aim of a recommender system is to help users choose options that match their preferences when they lack experience with some of the available options. The system is given large amounts of historical data and outputs personalised recommendations for the user.

# Approaches to Recommender Systems
There are a variety of techniques that a recommender system may use. Some examples include collaborative filtering, content-based filtering and hybrid filtering. These can utilise many statistical methods, including neural networks and graphical approaches.

Collaborative filtering identifies users with similar preferences and uses their data to make recommendations to the user of interest.

Content-based filtering focuses on the characteristics of the items that it needs to recommend, disregarding preferences of other users.

Both methods have drawbacks. Collaborative filtering encounters the cold-start problem, meaning it can be difficult to give meaningful recommendations for new users or items. Furthermore, sparse data has a significant impact on the quality of the predictions, and there may be issues with scalability depending on the specific collaborative technique used. Content-based filtering also has problems when dealing with sparse data and may give only a limited analysis of the given content.

Hybrid filtering can be used to combine collaborative and content-based filtering, allowing us to use both the information about similar users and the characteristics of the items. This may mitigate some of the drawbacks and improve the accuracy of a recommender system. There are a number of different approaches to hybrid filtering, including weighted hybrid, mixed hybrid and meta-level hybrid.
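As a rough illustration of the weighted hybrid idea, the sketch below blends two sets of scores with a single weight. The function name `weighted_hybrid`, the weight `alpha`, the item ids and the scores are all hypothetical, and treating a missing score as 0 is an assumption made for simplicity rather than a standard rule.

```python
def weighted_hybrid(cf_scores, cb_scores, alpha=0.5):
    """Blend collaborative (cf) and content-based (cb) scores per item.

    alpha is the assumed weight on the collaborative score;
    1 - alpha is applied to the content-based score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    items = set(cf_scores) | set(cb_scores)
    # An item missing from one model contributes 0 from that model (assumption).
    return {
        item: alpha * cf_scores.get(item, 0.0)
        + (1 - alpha) * cb_scores.get(item, 0.0)
        for item in items
    }


# Hypothetical scores from two separately trained recommenders.
cf = {"item_a": 0.9, "item_b": 0.2}
cb = {"item_a": 0.4, "item_b": 0.6, "item_c": 0.8}

blended = weighted_hybrid(cf, cb, alpha=0.7)
ranking = sorted(blended, key=blended.get, reverse=True)
```

Note that `item_c`, which has no collaborative score (for example, a new item), still receives a score from the content-based side, which is one way a hybrid can soften the cold-start problem.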

# Process of a Recommender System
There are three phases in a recommender system. Firstly, all relevant data is collected and pre-processed. This data may be explicit or implicit feedback. Explicit feedback, such as item ratings given by the user, is seen as more reliable. Implicit feedback, such as clicks and counts of listens or views, is less reliable but does not need any user effort. Hybrid filtering allows us to gain knowledge from both types of feedback. In the next phase, the recommender system learns from the chosen data using algorithms that depend on the type of recommender system. Finally, the system makes recommendations or gives predictions for items that the user will hopefully like.

# Collaborative Filtering & Methods

Collaborative filtering creates a user-item matrix with values corresponding to the users' preferences. Next, using a chosen similarity metric, the similarities between users' preferences are used to give recommendations for each user. Each user is recommended items that they have not given feedback for, but that have received positive feedback from similar users. These recommendations may also be expressed as predictions.
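A minimal sketch of building such a user-item matrix with NumPy is shown below. The users, items and ratings are made-up illustration data, and using `NaN` to mark missing feedback is one possible convention, not the only one.

```python
import numpy as np

# Hypothetical explicit feedback as (user, item, rating) triples.
ratings = [
    ("alice", "film_1", 5.0),
    ("alice", "film_2", 3.0),
    ("bob", "film_1", 4.0),
    ("bob", "film_3", 2.0),
    ("carol", "film_2", 1.0),
]

users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})
user_index = {u: k for k, u in enumerate(users)}
item_index = {i: k for k, i in enumerate(items)}

# NaN marks "no feedback", which is not the same as a rating of zero.
matrix = np.full((len(users), len(items)), np.nan)
for user, item, rating in ratings:
    matrix[user_index[user], item_index[item]] = rating

# Fraction of user-item pairs with no feedback at all.
sparsity = np.isnan(matrix).mean()
```

Even in this tiny example almost half of the entries are missing; real user-item matrices are usually far sparser, which is why sparsity matters so much for the methods that follow.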

The two types of collaborative filtering techniques are model-based and memory-based.

## Memory-based Methods
Memory-based collaborative filtering can be user-based or item-based. User-based techniques compute the similarities between users based on their feedback for the same items. Then, the predicted rating is calculated as a weighted average of the ratings that similar users gave to the item. The weights are the similarities between the other users and the chosen user. Item-based techniques work similarly but use the similarity between items instead of the similarity between users. Both of these methods form a similarity matrix.
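The user-based prediction step can be sketched as follows. This is an illustrative implementation under stated assumptions: missing ratings are stored as `NaN`, similarity is the cosine over items both users have rated, and the helper names `cosine_on_corated` and `predict_user_based` and the matrix `R` are hypothetical.

```python
import numpy as np


def cosine_on_corated(a, b):
    """Cosine similarity restricted to items that both users have rated."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return 0.0
    x, y = a[both], b[both]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def predict_user_based(matrix, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num, den = 0.0, 0.0
    for other in range(matrix.shape[0]):
        if other == user or np.isnan(matrix[other, item]):
            continue  # skip the target user and users who have not rated the item
        w = cosine_on_corated(matrix[user], matrix[other])
        num += w * matrix[other, item]
        den += abs(w)
    return num / den if den > 0 else np.nan


# Rows are users, columns are items; user 0 has not rated item 2.
R = np.array([
    [5.0, 3.0, np.nan],
    [4.0, 2.0, 4.0],
    [1.0, 5.0, 2.0],
])
pred = predict_user_based(R, user=0, item=2)
```

User 1 has tastes closer to user 0 than user 2 does, so the prediction is pulled towards user 1's rating of 4 rather than user 2's rating of 2. An item-based variant would apply the same idea to the columns of `R`.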

## Similarity
There are different similarity measures that can be chosen. The Pearson correlation coefficient measures the linear relationship between two variables, and the cosine similarity measures the similarity between two vectors based on the angle between them in a vector space. A similarity measure may also be referred to as a distance metric or correlation metric.
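The two measures can be compared directly in a few lines of NumPy. The vectors `u` and `v` below are made-up ratings, chosen so that `v` is `u` shifted down by one point; the function names are illustrative.

```python
import numpy as np


def cosine_similarity(x, y):
    """Cosine of the angle between x and y."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def pearson_correlation(x, y):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    return cosine_similarity(x - x.mean(), y - y.mean())


u = np.array([5.0, 3.0, 4.0, 1.0])
v = np.array([4.0, 2.0, 3.0, 0.0])  # same pattern as u, one point lower

cos_uv = cosine_similarity(u, v)
pearson_uv = pearson_correlation(u, v)
```

Because Pearson subtracts each user's mean first, it treats a user who rates everything one point lower as perfectly correlated, whereas the plain cosine similarity is slightly below 1 for the same pair. This is one reason Pearson is often considered when users differ in how generously they rate.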

## Model-based Methods
Model-based collaborative filtering can be a lot quicker than memory-based methods at producing recommendations once the model has been built. An example of this is the singular value decomposition (SVD). These methods use the user-item matrix to find rules between items and use these rules to give a list of recommendations. If data is sparse, then model-based methods are recommended to deal with this. More advanced model-based recommendation systems can use clustering, neural networks and elements of graph theory. The main drawback of model-based methods is that building the model typically has a very high computational cost and may require a large amount of memory.
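To make the SVD example concrete, the sketch below computes a rank-2 approximation of a small matrix with `numpy.linalg.svd`. The matrix `R` and the choice `k = 2` are assumptions for illustration, and filling unobserved entries with 0 is a simplification that a real system would usually avoid.

```python
import numpy as np

# Hypothetical 4-user x 4-item matrix; 0 stands in for "no rating" here.
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

# Thin SVD: R = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2  # number of latent factors to keep (an assumed choice)
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius error of the rank-k approximation equals the norm
# of the singular values that were dropped.
error = np.linalg.norm(R - R_k)
```

The rows of `U[:, :k]` and the columns of `Vt[:k, :]` can be read as latent user and item factors, and the entries of `R_k` at positions with no rating serve as predicted scores.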

One of the most popular algorithms used for collaborative filtering, when the user-item matrix is sparse, is Alternating Least Squares (ALS) minimisation. Simply, this aims to estimate the entries of a matrix $M=UV^T$ when only a subset of these entries is observed. The algorithm minimises the squared error on the observed entries, alternating between optimising $U$ with $V$ fixed and optimising $V$ with $U$ fixed.
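A small NumPy sketch of ALS is given below. It is not the implementation used by any particular library. It assumes an L2 penalty with weight $\lambda$ (the `reg` parameter), which is a common addition not described above, so that the objective is $\sum_{(i,j)\in\Omega}(M_{ij}-u_i^Tv_j)^2+\lambda(\lVert U\rVert_F^2+\lVert V\rVert_F^2)$, where $\Omega$ is the set of observed entries. The function names, the toy matrix and the hyperparameter values are all hypothetical.

```python
import numpy as np


def als(M, mask, rank=2, reg=0.1, n_iters=20, seed=0):
    """Factorise M ~ U @ V.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = M.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)

    for _ in range(n_iters):
        # Fix V and solve a small ridge regression for each user's factors.
        for i in range(n_users):
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ M[i, obs])
        # Fix U and solve the same kind of problem for each item's factors.
        for j in range(n_items):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ M[obs, j])
    return U, V


def observed_rmse(M, mask, U, V):
    """Root mean squared error over the observed entries only."""
    diff = (M - U @ V.T)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy data: zeros stand for "not observed" in this example.
M = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])
mask = M > 0

U0, V0 = als(M, mask, n_iters=0)   # random initial factors only
U, V = als(M, mask, n_iters=20)    # same initialisation, then 20 sweeps
rmse_before = observed_rmse(M, mask, U0, V0)
rmse_after = observed_rmse(M, mask, U, V)
```

Each inner update is an independent least-squares problem of size `rank`, which is why ALS parallelises well across users and items. The unobserved entries of `U @ V.T` are the model's predictions.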

## Pros & Cons
Collaborative filtering can be used when data is difficult to analyse, since it can use implicit feedback. However, there are a few problems. Firstly, there is the cold-start problem: a new user has no data, so the system cannot make meaningful recommendations for them. Also, if data is sparse, then recommendations can be less accurate and many items may not be recommended at all. Finally, the method must be scalable in order to stay efficient. The basic collaborative filtering methods can struggle with this, but model-based methods like SVD can be used to give efficient and robust recommendations.

# Content-based Filtering
...

...

...

# Hybrid Filtering
...

...

...

## ALS with `implicit`

## ALS with PySpark

## **References**
[1] F.O. Isinkaye, Y.O. Folajimi, B.A. Ojokoh,
Recommendation systems: Principles, methods and evaluation,
Egyptian Informatics Journal,
Volume 16, Issue 3,
2015,
Pages 261-273.
(https://www.sciencedirect.com/science/article/pii/S1110866515000341)