# BLU10 - Learning Notebook - Part 2 of 3 - Rating Matrix

In [1]:
import numpy as np
import pandas as pd
import scipy as sp

from scipy.sparse import random, coo_matrix, lil_matrix, dok_matrix, csr_matrix, csc_matrix

# 1 Creating a ratings matrix

We come back to our framework, and we now focus on the base model, which only considers the interactions between users and items:

![Recommender Systems Framework](./media/recommender_systems_framework_base.png)

This basic model is the foundation of every RS and gives rise to the Ratings Matrix $R$. 

## 1.2 Types of data

Different types of interactions between users and items can manifest the user's opinion about an item in different ways.

### Explicit and implicit feedback

Feedback is said to be explicit when provided by the user and implicit if inferred based on user actions (e.g., clicks).

Implicit feedback usually takes the form of unary data: an interaction is either observed (1) or not (0), and an absent interaction is not a negative rating.
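As a minimal sketch of implicit feedback, the snippet below turns a click log into unary data. The `clicks` DataFrame and its column names are made up for illustration; a real log would come from your application.

```python
import pandas as pd

# Hypothetical click log: one row per click, repeated clicks are possible.
clicks = pd.DataFrame({
    "user": ["Ana", "Ana", "Miguel", "Ana"],
    "item": ["Milk", "Milk", "Water", "Bananas"],
})

# Unary data only records that an interaction happened,
# so we keep each (user, item) pair once and assign it the rating 1.
unary = clicks.drop_duplicates().assign(rating=1)
print(unary)
```

Pairs that never show up in the log are simply absent. We do not know whether the user disliked the item or never saw it.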

### Rating scale

We write $S$ for the set of possible ratings. For example, in a 1-5 star rating system, $r_{u, i} \in S = \{1, 2, 3, 4, 5\}$.

| Type of data    | Description                          | Rating scale (examples) | Explicit/Implicit |  
|-----------------|--------------------------------------|-------------------------|-------------------|
| Numeric         | Continuous ratings                   | $S = [1, 5]$            | Explicit          |
| Ordinal         | Ordered categories                   | $S = \{1, 2, 3, 4, 5\}$ | Explicit          |
| Binary          | Good or bad  (e.g., Upvote/Downvote) | $S = \{-1, 1\}$         | Explicit          |
| Unary           | User action  (e.g., Click, Purchase) | $S = \{1\}$             | Implicit          |

*Table 1: Different types of data and rating scales*

## 1.3 Ratings matrix

Consider the following ratings matrix $R$, with $S = \{1, 2, 3, 4, 5\}$:

$$\begin{bmatrix}1 &  & 2\\ 1 & 5 & \\  & 2 & 1\end{bmatrix}$$
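In practice, ratings often arrive as a log of (user, item, rating) triples rather than as a matrix. The sketch below shows one way to pivot such a log into $R$ with pandas. The `ratings` log is hypothetical, and the user and item labels are assumed to follow the row and column order used in the rest of this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical log: one (user, item, rating) triple per observed rating.
ratings = pd.DataFrame(
    [
        ("Ana", "Bananas", 1),
        ("Ana", "Milk", 2),
        ("Miguel", "Bananas", 1),
        ("Miguel", "Water", 5),
        ("Beatriz", "Water", 2),
        ("Beatriz", "Milk", 1),
    ],
    columns=["user", "item", "rating"],
)

users = ["Ana", "Miguel", "Beatriz"]   # assumed row order
items = ["Bananas", "Water", "Milk"]   # assumed column order

# Pivot to users x items; pairs without a rating become NaN.
R_df = (
    ratings.pivot(index="user", columns="item", values="rating")
    .reindex(index=users, columns=items)
)
R_from_log = R_df.to_numpy(dtype=float)
print(R_from_log)
```

The explicit `reindex` matters because `pivot` sorts labels alphabetically, which would otherwise reorder the rows and columns.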

## 1.4 Representing vectors

Let's go bit by bit, starting with the first row of the matrix, corresponding to:

$$\begin{bmatrix}(Ana, Bananas) & (Ana, Water) & (Ana, Milk)\end{bmatrix}$$

To clarify, $I_{Ana} = \{Bananas, Milk\}$ and $(Ana, Water) \notin R'$. Right?
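To make $I_{Ana}$ concrete, here is a small sketch that recovers the set of items Ana rated from her row. It assumes missing ratings are stored as `np.nan`; the names `items`, `ana` and `I_ana` are chosen for illustration.

```python
import numpy as np

items = np.array(["Bananas", "Water", "Milk"])
ana = np.array([1, np.nan, 2])  # Ana's row; NaN marks a missing rating

# Boolean mask of the positions where a rating exists.
rated = ~np.isnan(ana)

# The items Ana rated, as a plain Python set.
I_ana = set(items[rated].tolist())
print(I_ana)
```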

At the core of NumPy is the homogeneous (i.e., all elements of the same type) n-dimensional array.

The first row of $R$ corresponds to the NumPy array:

```
┌───┬───┬───┐
│ 1 │   │ 2 │
└───┴───┴───┘
```

We can create it using `numpy.array` with an array-like object, a standard Python list in this case.

In [2]:
np.array([1, np.nan, 2])  # np.nan marks the missing rating (the np.NaN alias was removed in NumPy 2.0)

array([ 1., nan,  2.])

The resulting array is what we call a rank-1 array because it is a vector with a single dimension.

Nonetheless, rank-1 arrays can be ambiguous, so we represent vectors as rank-2 arrays, i.e., as matrices with two dimensions.

The general convention is to use a column vector instead, i.e., an $n$ by 1 matrix, such as:

$$\begin{bmatrix}1 \\  \\ 2\end{bmatrix}$$

Corresponding to a 3 by 1 matrix, such as:

```
┌───┐
│ 1 │
├───┤
│   │
├───┤
│ 2 │
└───┘
```

We do this by using a list of lists, with one nested list per row.

In [3]:
np.array([[1], [np.nan], [2]])  # one nested list per row gives a 3 by 1 column vector

array([[ 1.],
       [nan],
       [ 2.]])
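The ambiguity of rank-1 arrays is easiest to see through their shapes. The sketch below compares a rank-1 array with explicit column and row vectors; `reshape` is just one way to go from one to the other.

```python
import numpy as np

v = np.array([1, np.nan, 2])  # rank-1 array, shape (3,)
col = v.reshape(-1, 1)        # column vector, shape (3, 1)
row = v.reshape(1, -1)        # row vector, shape (1, 3)

# Transposing a rank-1 array does nothing: there is no second axis to swap.
print(v.shape, v.T.shape)

# Transposing a rank-2 column vector gives a row vector, as expected.
print(col.shape, col.T.shape)
```

With rank-2 arrays, it is always explicit whether we are working with a row or a column.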

## 1.5 Representing matrices

Finally, we create our ratings matrix $R$, corresponding to:
```
┌───┬───┬───┐
│ 1 │   │ 2 │
├───┼───┼───┤
│ 1 │ 5 │   │
├───┼───┼───┤
│   │ 2 │ 1 │
└───┴───┴───┘
```

Conveniently, we can pass a list of lists, just like we did above.

In [4]:
R = [[1, np.nan, 2], [1, 5, np.nan], [np.nan, 2, 1]]  # np.nan marks missing ratings
R = np.array(R)
R

array([[ 1., nan,  2.],
       [ 1.,  5., nan],
       [nan,  2.,  1.]])
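Because missing ratings are stored as NaN, ordinary aggregations propagate them. As a brief sketch, the snippet below counts the ratings per user and computes each user's mean rating with NumPy's NaN-aware functions. The variable names are chosen for illustration.

```python
import numpy as np

R = np.array([[1, np.nan, 2], [1, 5, np.nan], [np.nan, 2, 1]])

# True where a rating exists.
observed = ~np.isnan(R)

# Number of ratings per user (one row per user).
n_ratings_per_user = observed.sum(axis=1)

# Mean over the observed ratings only, ignoring NaN.
mean_rating_per_user = np.nanmean(R, axis=1)

# The plain mean returns NaN for every row that has a missing rating.
naive_mean = R.mean(axis=1)

print(n_ratings_per_user)
print(mean_rating_per_user)
print(naive_mean)
```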

## 1.6 Matrix attributes

Some important attributes of any `ndarray` are worth keeping in mind.

In [5]:
ndims = R.ndim
nrows = R.shape[0]
ncols = R.shape[1] 
dtype = R.dtype

print("R is a {}-dimensional, {} by {} matrix, of {} elements.".format(ndims, nrows, ncols, dtype))

R is a 2-dimensional, 3 by 3 matrix, of float64 elements.
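A few more attributes describe how much memory a dense array takes. The sketch below reads them for $R$ and then estimates the size of a larger dense matrix. The `n_users` and `n_items` figures are hypothetical and only there to give a sense of scale.

```python
import numpy as np

R = np.array([[1, np.nan, 2], [1, 5, np.nan], [np.nan, 2, 1]])

# size: number of elements; itemsize: bytes per element; nbytes: total bytes.
print(R.size, R.itemsize, R.nbytes)

# Hypothetical catalogue: a dense float64 matrix stores every cell,
# including the missing ratings.
n_users, n_items = 100_000, 10_000
dense_bytes = n_users * n_items * R.itemsize
print("{:.1f} GB".format(dense_bytes / 1e9))
```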


## 1.7 Saving the matrix

We can save the matrix to a binary file in NumPy `.npy` format.

Note that `save` is a stand-alone function and not an array method.

In [6]:
np.save('data/interim/ratings_matrix', R)

Alternatively, we can dump the matrix into a `.csv` file, as we would typically do.

In [7]:
np.savetxt("data/interim/ratings_matrix.csv", R, delimiter=",")
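To check that both formats round-trip, we can load the files back with `np.load` and `np.loadtxt`. The sketch below writes to a temporary directory instead of `data/interim` so that it runs anywhere; the file names are otherwise arbitrary.

```python
import tempfile
from pathlib import Path

import numpy as np

R = np.array([[1, np.nan, 2], [1, 5, np.nan], [np.nan, 2, 1]])

with tempfile.TemporaryDirectory() as tmp:
    npy_path = Path(tmp) / "ratings_matrix.npy"
    csv_path = Path(tmp) / "ratings_matrix.csv"

    # Write the matrix in both formats.
    np.save(npy_path, R)
    np.savetxt(csv_path, R, delimiter=",")

    # Read it back; NaN is written to the text file as "nan" and parsed back as NaN.
    R_npy = np.load(npy_path)
    R_csv = np.loadtxt(csv_path, delimiter=",")

# NaN != NaN, so the comparison needs equal_nan=True.
print(np.allclose(R_npy, R, equal_nan=True))
print(np.allclose(R_csv, R, equal_nan=True))
```

Note that `np.save` appends the `.npy` extension when the file name does not already have it.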

### 2.2.3 From pandas DataFrame to scipy sparse

If you have a pandas DataFrame (containing only numerical values, of course), you don't need to create a NumPy array from it and then convert it to a SciPy sparse matrix: you can do it directly!
This allows you to use pandas to do cool feature engineering, plot some things and pretend you actually understand what the data is telling you.

In [22]:
df = pd.DataFrame({
    'Bananas': [1,1,0],
    'Water': [0,5,2],
    'Milk': [2,0,1]
    },
    index=['Ana', 'Miguel', 'Beatriz']
)
df

|         | Bananas | Water | Milk |
|---------|---------|-------|------|
| Ana     | 1       | 0     | 2    |
| Miguel  | 1       | 5     | 0    |
| Beatriz | 0       | 2     | 1    |


In [23]:
H_ = csc_matrix(df.values)
H_

<3x3 sparse matrix of type '<class 'numpy.int64'>'
	with 6 stored elements in Compressed Sparse Column format>

In [24]:
H_.toarray()

array([[1, 0, 2],
       [1, 5, 0],
       [0, 2, 1]], dtype=int64)
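A sparse matrix keeps only the values, so the row and column labels of the DataFrame are lost in the conversion. As a sketch, the snippet below computes the density of the matrix and rebuilds a labelled DataFrame by reusing the original index and columns. The names `H`, `density` and `df_back` are illustrative.

```python
import pandas as pd
from scipy.sparse import csc_matrix

df = pd.DataFrame(
    {"Bananas": [1, 1, 0], "Water": [0, 5, 2], "Milk": [2, 0, 1]},
    index=["Ana", "Miguel", "Beatriz"],
)

H = csc_matrix(df.values)

# Density: fraction of cells that hold a stored (non-zero) value.
density = H.nnz / (H.shape[0] * H.shape[1])
print(density)

# The sparse matrix has no labels, so we reattach them when going back.
df_back = pd.DataFrame(H.toarray(), index=df.index, columns=df.columns)
print(df_back)
```

Here the zeros stand for missing ratings, which is why they are not stored.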