In [1]:
import pandas as pd
import numpy as np

### Creating the Interaction Matrix
----

In [2]:
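# note: np.array coerces the mixed list to a string array, so user_id is stored as str ('1', '2', '3')
data = np.array([[1,1,2,3],['A','A','B','B']]).T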

In [3]:
df = pd.DataFrame(data, columns = ['user_id','product_name'])
df

  user_id product_name
0       1            A
1       1            A
2       2            B
3       3            B


In [4]:
# create interaction matrix
interaction_matrix_count = df.pivot_table(index='user_id', columns='product_name', aggfunc=len, fill_value=0)
interaction_matrix_count.head()

product_name  A  B
user_id
1             2  0
2             0  1
3             0  1
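As a cross-check, the same count matrix can also be built with `pd.crosstab`, which counts how often each pair of values from two columns occurs and fills missing pairs with 0. A minimal sketch that rebuilds the toy data from above (here with `user_id` kept as an integer rather than a string):

```python
import pandas as pd

# Toy data from above: one row per purchased product
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "product_name": ["A", "A", "B", "B"],
})

# Rows = users, columns = products, cells = number of purchases (0 if none)
interaction_matrix_count = pd.crosstab(df["user_id"], df["product_name"])
print(interaction_matrix_count)
```

Unlike `pivot_table`, `crosstab` does not need an `aggfunc` or `fill_value` for plain counting.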


In [5]:
#interaction_matrix_count = df.pivot_table(index='user_id', columns='product_name', values=[1])


In [6]:
def binary(x):
    # 1 if the user bought the product at least once, otherwise 0
    return 1 if x > 0 else 0

In [7]:
interaction_matrix_count.applymap(binary)

product_name  A  B
user_id
1             1  0
2             0  1
3             0  1
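The element-wise `binary` function can also be replaced by a vectorized comparison, which avoids calling a Python function for every cell. This also sidesteps `DataFrame.applymap`, which was deprecated in pandas 2.1 in favour of `DataFrame.map`. A small sketch on the count matrix from above:

```python
import pandas as pd

# Count matrix as produced above: rows = users, columns = products
interaction_matrix_count = pd.DataFrame(
    {"A": [2, 0, 0], "B": [0, 1, 1]},
    index=pd.Index([1, 2, 3], name="user_id"),
)
interaction_matrix_count.columns.name = "product_name"

# Any positive count becomes 1, everything else 0
interaction_matrix_binary = (interaction_matrix_count > 0).astype(int)
print(interaction_matrix_binary)
```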


## User Product Rating

#### Version 1

To obtain a user-item rating for the interaction matrix, we developed a rating function.

We took the following considerations into account:
- We want to give sufficient weight to the first product purchase
    - We decided to give the first purchase a weight of $\displaystyle \omega = \frac{1}{3}$
- A reorder of a product should be weighted even more strongly (a reorder suggests that the user liked the product)
    - To create a clear gap, every reordered product receives a rating of at least $\displaystyle 2 \cdot \omega$
- In addition, multiple reorders should not be weighted too heavily, so that users with e.g. 6 or 10 reorders do not differ too much
    - To achieve this, we take the square root of the share of orders containing the product, $\sqrt{o / o_{tot}}$; since $\sqrt{x}\,\, |\,\, x \geq 0\,\,$ is concave, each additional reorder increases the rating by less than the previous one
- The rating should be a number between 0 and 1

Therefore the following formula has been developed:

$\displaystyle{ rating(o, o_{tot}) =
  \begin{cases}
    0 & \quad \text{if } o = 0\\
    \omega & \quad \text{if } o = 1 \land o_{tot} > 1\\
    2 \cdot \omega + (1 - 2 \cdot \omega) \cdot \sqrt{\frac{o}{o_{tot}}} & \quad \text{if } o > 1 \lor \left( o = 1 \land o_{tot} = 1 \right)
  \end{cases}}$
    
- where $o$ is the number of the user's orders that contain the product $p$ and $o_{tot}$ is the user's total number of orders.
- $\omega$ must satisfy $0 < \omega < 0.5$ (so that $2 \cdot \omega < 1$ and the rating stays within $[0, 1]$), ideally somewhere in the middle of this range.

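This approach weights the first reorder very strongly (the rating jumps from $\omega$ to at least $2 \cdot \omega$), but it takes neither the absolute number of a user's orders nor the aisles into account: only the ratio $o / o_{tot}$ enters the formula.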
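The case distinction can be written as a small reusable function. This sketch follows the same logic as the loop in the next cell (a product that appears in the user's only order gets the full rating); the name `rating_v1` and the input checks are our own additions, not part of the original analysis.

```python
import math


def rating_v1(o: int, o_tot: int, omega: float = 1 / 3) -> float:
    """Version 1 rating for one user/product pair.

    o:     number of the user's orders that contain the product
    o_tot: total number of orders of the user
    """
    if not 0 < omega < 0.5:
        raise ValueError("omega must satisfy 0 < omega < 0.5")
    if not 0 <= o <= o_tot:
        raise ValueError("expected 0 <= o <= o_tot")
    if o == 0:
        return 0.0                      # product never bought
    if o == 1 and o_tot > 1:
        return omega                    # bought once, not reordered
    # reordered, or the product is in the user's only order
    return 2 * omega + (1 - 2 * omega) * math.sqrt(o / o_tot)


for o in range(11):
    print(o, rating_v1(o, 10, omega=0.35))
```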

In [16]:
omega = 0.35
gamma = 10 # number of orders of the user
for o in range(gamma + 1):
    if o == 0:
        x = 0
    elif o == 1 and gamma > 1:
        x = omega
    else:
        x = 2*omega + (1-2*omega) * np.sqrt(o/gamma)
    print(x)

0
0.35
0.8341640786499873
0.8643167672515498
0.8897366596101027
0.9121320343559642
0.9323790007724451
0.9509980079602227
0.9683281572999747
0.9846049894151541
1.0
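To turn the rating into a rated interaction matrix, the loop above can be vectorized over a whole user x product matrix. This sketch assumes a count matrix that holds, per user and product, the number of orders containing that product, plus a hypothetical Series `orders_per_user` with each user's total number of orders (the toy data above has no order IDs, so these totals are made up for illustration).

```python
import numpy as np
import pandas as pd


def rating_v1_matrix(counts: pd.DataFrame, orders_per_user: pd.Series,
                     omega: float = 1 / 3) -> pd.DataFrame:
    """Apply the Version 1 rating to a whole user x product matrix.

    counts:          number of orders of each user that contain each product (o)
    orders_per_user: total number of orders of each user (o_tot), indexed like counts
    """
    o = counts.to_numpy(dtype=float)
    # align the totals with the matrix rows and broadcast them across the columns
    o_tot = orders_per_user.reindex(counts.index).to_numpy(dtype=float)[:, None]
    o_tot = np.broadcast_to(o_tot, o.shape)
    # reorder branch; a 0/0 for users without orders is masked by the o == 0 case
    with np.errstate(divide="ignore", invalid="ignore"):
        reorder = 2 * omega + (1 - 2 * omega) * np.sqrt(o / o_tot)
    rating = np.where(o == 0, 0.0,
                      np.where((o == 1) & (o_tot > 1), omega, reorder))
    return pd.DataFrame(rating, index=counts.index, columns=counts.columns)


# Toy count matrix from above plus hypothetical order totals per user
counts = pd.DataFrame({"A": [2, 0, 0], "B": [0, 1, 1]},
                      index=pd.Index([1, 2, 3], name="user_id"))
orders_per_user = pd.Series({1: 4, 2: 3, 3: 1})  # hypothetical values
ratings = rating_v1_matrix(counts, orders_per_user, omega=0.35)
print(ratings)
```

Nested `np.where` calls keep the three cases of the formula explicit while avoiding a per-cell Python function.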


#### Version 2

To obtain a user-item rating for the interaction matrix, we developed a rating function.

We took the following considerations into account:
- We want to give sufficient weight to the first product purchase
    - We decided to give the first purchase a weight of $\displaystyle \omega = \frac{1}{3}$

- The main weight should be the frequency with which a product appears in the customer's orders.
    - To achieve this, we take the square root of the number of orders containing the product, $o$, divided by the customer's total number of orders, $o_{tot}$: $\sqrt{\frac{o}{o_{tot}}}$

- Because many customers have only a few orders, and their ratings are correspondingly uncertain, we want to weaken the ratings for these customers.
    - To achieve this, we multiply by the square root of the customer's total number of orders $o_{tot}$ divided by a threshold value $m$: $\sqrt{\frac{o_{tot}}{m}}\,\, |\,\, o_{tot} < m\,\,$

- The rating should be a number between 0 and 1

Therefore the following formula has been developed:

$\displaystyle{ rating(o, o_{tot}) =
  \begin{cases}
    0 & \quad \text{if } o = 0\\
    \omega & \quad \text{if } o = 1\\
    \omega + (1 - \omega) \cdot \sqrt{\frac{o}{o_{tot}}} & \quad \text{if } o > 1 \land o_{tot} \geq m\\
    \omega + (1 - \omega) \cdot \sqrt{\frac{o}{o_{tot}}} \cdot \sqrt{\frac{o_{tot}}{m}} & \quad \text{if } o > 1 \land o_{tot} < m
  \end{cases}}$
    
- where $o$ is the number of the customer's orders that contain the specified product $p$, $o_{tot}$ is the customer's total number of orders, and $m$ is the threshold below which the rating is weakened.
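For customers below the threshold, the two square-root factors in the last case partly cancel. Below the threshold the rating therefore depends only on the absolute count $o$, not on $o_{tot}$, which is why the printed ratings further down repeat the same values for every $o_{tot} < m$ (e.g. $\omega = \frac{1}{3}$, $m = 10$, $o = 2$ gives $\frac{1}{3} + \frac{2}{3} \cdot \sqrt{0.2} \approx 0.6315$):

$\displaystyle{ \sqrt{\frac{o}{o_{tot}}} \cdot \sqrt{\frac{o_{tot}}{m}} = \sqrt{\frac{o}{m}} \quad \Rightarrow \quad rating(o, o_{tot}) = \omega + (1 - \omega) \cdot \sqrt{\frac{o}{m}} \quad \text{if } o > 1 \land o_{tot} < m }$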

This approach uses the share of a customer's orders that contain the product and weakens the rating if too few orders are available.
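The loop in the next cell can be condensed into a reusable function. This is a sketch that mirrors that loop with the same defaults ($\omega = \frac{1}{3}$, $m = 10$); the name `rating_v2` and the input check are our own additions.

```python
import math


def rating_v2(o: int, o_tot: int, omega: float = 1 / 3, m: int = 10) -> float:
    """Version 2 rating for one customer/product pair.

    o:     number of the customer's orders that contain the product
    o_tot: total number of orders of the customer
    m:     threshold below which the rating is damped
    """
    if not 0 <= o <= o_tot:
        raise ValueError("expected 0 <= o <= o_tot")
    if o == 0:
        return 0.0
    if o == 1:
        return omega
    w_prod = math.sqrt(o / o_tot)                        # share of orders with the product
    w_freq = math.sqrt(o_tot / m) if o_tot < m else 1.0  # damping for few orders
    return omega + (1 - omega) * w_prod * w_freq


print(rating_v2(2, 2), rating_v2(2, 9), rating_v2(20, 25))
```

The first two printed values are identical, matching the observation that below the threshold only $o$ matters.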

In [10]:
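theta = 1/3  # weight of the first purchase (omega in the formula)
m = 10       # threshold: ratings are lowered if the customer has fewer than m orders

# o_tot: total number of orders of the customer, o: number of those orders containing the product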

for o_tot in range(m+20):
    print("o_tot =", o_tot)
    for o in range(o_tot + 1):
        if o == 0:
            x = 0
        elif o == 1:
            x = theta
        else:
            if o_tot < m:
                w_freq = np.sqrt(o_tot / m)
            else:
                w_freq = 1

            w_prod = np.sqrt(o / o_tot)
            
            x = theta + (1-theta) * w_prod * w_freq

        print("x =", x)

o_tot = 0
x = 0
o_tot = 1
x = 0
x = 0.3333333333333333
o_tot = 2
x = 0
x = 0.3333333333333333
x = 0.6314757303333053
o_tot = 3
x = 0
x = 0.3333333333333333
x = 0.6314757303333054
x = 0.6984817050034441
o_tot = 4
x = 0
x = 0.3333333333333333
x = 0.6314757303333054
x = 0.6984817050034442
x = 0.7549703546891173
o_tot = 5
x = 0
x = 0.3333333333333333
x = 0.6314757303333054
x = 0.6984817050034442
x = 0.7549703546891173
x = 0.804737854124365
o_tot = 6
x = 0
x = 0.3333333333333333
x = 0.6314757303333053
x = 0.6984817050034442
x = 0.7549703546891173
x = 0.804737854124365
x = 0.8497311128276557
o_tot = 7
x = 0
x = 0.3333333333333333
x = 0.6314757303333054
x = 0.6984817050034441
x = 0.7549703546891173
x = 0.804737854124365
x = 0.8497311128276557
x = 0.8911066843560504
o_tot = 8
x = 0
x = 0.3333333333333333
x = 0.6314757303333053
x = 0.698481705003444
x = 0.7549703546891173
x = 0.804737854124365
x = 0.8497311128276557
x = 0.8911066843560504
x = 0.9296181273332773
o_tot = 9
x = 0
x = 0.33333333333