# Matrix Factorization for a small subset

In this notebook, we're going to build our first recommender system, which follows a **collaborative filtering approach** and only takes into account all the readers and all the articles in a small subset of our data. The goal with this **matrix factorization technique** is to 'learn' two embedding matrices with the respective size of the numbers of readers/articles and an arbitrarily chosen (and thus tunable) size of latent factors. 

Thus, if we had 10 readers, 5 articles and were to assume we needed 3 latent factors (which could represent implicit, but substantive differences in our reader/article-base), our method will calculate two matrices (a 10 by 3 for the readers and a 3 by 5 for the articles) whose scalar products yield a new matrix the size of our original one (10 x 5), which *approximates* the original matrix best. This optimization problem is typically solved by stochastic gradient descent (although there are, of course, other possibilities) and from a once extremely sparse matrix (obviously, ervery single reader only reads/clicks a tiny fraction of the articles available to us), we get a densely populated table which now contains information on wether some reader might be more or less inclined to read certain articles. 

The approach might sound a bit dry and mathematic at first, but with the embeddings we actually learn some lower dimensional representations of our readers/articles and can hereby determine *resemblances in preferences*. If you ever wondered how amazon or google knew what you were interested in before you even searched for it: here you go!

## Python Imports

In [3]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from scipy.sparse.linalg import svds

## Data import and cleaning

### Load the impression logs and news articles information into pandas data frames

In [6]:
behaviors = pd.read_csv('../../data/mind_small_train/behaviors.tsv', sep="\t", header=None)
news = pd.read_csv('../../data/mind_small_train/news.tsv', sep="\t", header = None)

The news dataset stores the information of all the news articles (id, header, abstract, ...). It looks like this:

In [7]:
news.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7
0,N55528,lifestyle,lifestyleroyals,"The Brands Queen Elizabeth, Prince Charles, an...","Shop the notebooks, jackets, and more that the...",https://assets.msn.com/labs/mind/AAGH0ET.html,"[{""Label"": ""Prince Philip, Duke of Edinburgh"",...",[]
1,N19639,health,weightloss,50 Worst Habits For Belly Fat,These seemingly harmless habits are holding yo...,https://assets.msn.com/labs/mind/AAB19MK.html,"[{""Label"": ""Adipose tissue"", ""Type"": ""C"", ""Wik...","[{""Label"": ""Adipose tissue"", ""Type"": ""C"", ""Wik..."
2,N61837,news,newsworld,The Cost of Trump's Aid Freeze in the Trenches...,Lt. Ivan Molchanets peeked over a parapet of s...,https://assets.msn.com/labs/mind/AAJgNsz.html,[],"[{""Label"": ""Ukraine"", ""Type"": ""G"", ""WikidataId..."


At first, we will only need to work with the behaviors dataset, which stores the click history and the impression logs. It looks like this:

In [44]:
behaviors.head()

Unnamed: 0,0,1,2,3,4
0,1,U13740,11/11/2019 9:05:58 AM,N55189 N42782 N34694 N45794 N18445 N63302 N104...,N55689-1 N35729-0
1,2,U91836,11/12/2019 6:11:30 PM,N31739 N6072 N63045 N23979 N35656 N43353 N8129...,N20678-0 N39317-0 N58114-0 N20495-0 N42977-0 N...
2,3,U73700,11/14/2019 7:01:48 AM,N10732 N25792 N7563 N21087 N41087 N5445 N60384...,N50014-0 N23877-0 N35389-0 N49712-0 N16844-0 N...
3,4,U34670,11/11/2019 5:28:05 AM,N45729 N2203 N871 N53880 N41375 N43142 N33013 ...,N35729-0 N33632-0 N49685-1 N27581-0
4,5,U8125,11/12/2019 4:11:21 PM,N10078 N56514 N14904 N33740,N39985-0 N36050-0 N16096-0 N8400-1 N22407-0 N6...


and needs some column-relabelling:

In [45]:
behaviors= behaviors.rename(columns={3:'history'})
behaviors = behaviors.rename(columns={0:'impression_id'})
behaviors = behaviors.rename(columns= {1 : 'user_id'})
behaviors = behaviors.rename(columns= {2 : 'time'})
behaviors = behaviors.rename(columns= {4 : 'labels'})

### Are there multiple readers (users) with multiple sessions in the impression logs?

Now we want to check if there are readers with multiple sessions:

In [6]:
behaviors.user_id.value_counts()

U32146    62
U15740    44
U20833    41
U51286    40
U44201    40
          ..
U27713     1
U68491     1
U56233     1
U58450     1
U80626     1
Name: user_id, Length: 50000, dtype: int64

In [7]:
len(behaviors.user_id.unique()), len(behaviors.user_id)

(50000, 156965)

**Apparently, there are!** For matrix factorization, we only want to work with the click history, so let's check whether the click histories for the duplicate users are the same:

In [8]:
duplicate_users_value_counts = behaviors.user_id.value_counts()

In [48]:
# Create list with the IDs of duplicate users
duplicate_users = duplicate_users_value_counts[duplicate_users_value_counts!=1].index.to_list()

In [49]:
behaviors[behaviors.user_id == duplicate_users[0]].head(3)

Unnamed: 0,impression_id,user_id,time,history,labels
379,380,U32146,11/11/2019 7:39:08 AM,N17933 N55829 N61864 N46346 N29597 N52097 N291...,N55689-1 N35729-0
3635,3636,U32146,11/13/2019 11:59:19 AM,N17933 N55829 N61864 N46346 N29597 N52097 N291...,N11551-1 N56214-0 N51048-0 N10913-0 N28523-0
6686,6687,U32146,11/9/2019 8:22:36 AM,N17933 N55829 N61864 N46346 N29597 N52097 N291...,N27845-0 N51398-0 N41881-1 N60374-0 N52000-0 N...


### Do the users with multiple sessions have equal click histories?

In [53]:
# ATTENTION: This cell needs some time to compute (~5min).
# So only uncomment if you have some spare time.
# Check whether the click histories of the duplicate users are the same. 
# If not, save the user ID to diff_hist.

# diff_hist = []
# for user in duplicate_users:
#     l = behaviors[behaviors.user_id==user].history.to_list()
#     if len(set(l)) != 1:
#         diff_hist.append(user)

# if len(diff_hist) == 0:
#     print(f"Length of diff_hist is {len(diff_hist)},",
#           "i.e. all users with multiple session have equal history logs.")

All users with multiple sessions have equal history logs. In contrast, the recommendation and click logs differ from one impression to another.

In [40]:
user = 'U89995'
x = behaviors[behaviors.user_id == user].history.iloc[1].split(' ')
print(f"History log of user {user} has length:   {len(x)}",
      f"\nNumber of unique entries in history log: {len(set(x))}")

History log of user U89995 has length:   79 
Number of unique entries in history log: 56


It also looks like there are readers who clicked the same articles multiple times. We treat these instances as redundancies here, which -- together with the repeating histories in general -- don't pose a problem for constructing our **original reader-article-matrix**.

Let's take a look at the most clicked article of the particular user U89995:

In [54]:
U89995 = behaviors[behaviors.user_id == 'U89995']
U89995 = U89995.history.iloc[0]
max([(s, U89995.split().count(s)) for s in U89995.split()], key=lambda x: x[1])

('N47020', 16)

In [55]:
news[news[0]=='N47020']

Unnamed: 0,0,1,2,3,4,5,6,7
22967,N47020,news,newsopinion,The News In Cartoons,News as seen through the eyes of the nation's ...,https://assets.msn.com/labs/mind/AAJ7oYd.html,[],[]


### Remove duplicate user IDs

Still, to prevent the leakage of the same user-article pairs into the test set, we remove duplicate user IDs from the impression logs.

In [75]:
behaviors_unique_userIDs = behaviors.drop_duplicates(subset="user_id").copy()
behaviors_unique_userIDs.dropna(inplace=True)

## Data preparation for the model

### Restrict data size and create user-article table

In order to reduce computing time, we want to reduce our dataset to the first 10,000 impressions for this task:

In [77]:
behav_part_1 = behaviors_unique_userIDs.iloc[:10000, :]

In [59]:
behav_part_1.shape

(10000, 5)

Create a dictonary that maps impression IDs to corresponding user IDs for later use in evaluation.

In [60]:
id_dict = pd.Series(behav_part_1.user_id.values,
                    index=behav_part_1.impression_id
                   ).to_dict()

Create table which lists all the user-article pairs and labels them as read.

In [90]:
x = behav_part_1.set_index('user_id').history.str.split(' ', expand =True)
x = x.stack().reset_index(1, drop=True).reset_index(name='article')
behaviors_part_1_set = x

In [127]:
behaviors_part_1_set['read'] = 1

In [128]:
behaviors_part_1_set.head()

Unnamed: 0,user_id,article,read
0,U13740,N55189,1.0
1,U13740,N42782,1.0
2,U13740,N34694,1.0
3,U13740,N45794,1.0
4,U13740,N18445,1.0


#### Short intermezzo: Let's take a look at some articles users clicked more than once

In [129]:
user_article_vc = behaviors_part_1_set.value_counts(["user_id", "article"])
user_article_vc[user_article_vc!=1][:10]

user_id  article
U89995   N47020     16
U28941   N47020     14
U80573   N20413     14
U20271   N47020     13
U45154   N47020     13
U28941   N61864     10
U26511   N47020     10
U10703   N47020     10
U64006   N47020     10
U17022   N47020     10
dtype: int64

Article "N47020" seems to be very popular! As you can see in the following output it is a cartoon, which depicts the news of the day in one comic. So no wonder people click this article more than once.

In [130]:
news[news[0]=='N47020']

Unnamed: 0,0,1,2,3,4,5,6,7
22967,N47020,news,newsopinion,The News In Cartoons,News as seen through the eyes of the nation's ...,https://assets.msn.com/labs/mind/AAJ7oYd.html,[],[]


In [131]:
news[news[0]=='N20413']

Unnamed: 0,0,1,2,3,4,5,6,7
3593,N20413,sports,football_nfl,The 2019 NFL Season,The 2019 NFL Season,https://assets.msn.com/labs/mind/AAGShGI.html,"[{""Label"": ""NFL regular season"", ""Type"": ""E"", ...","[{""Label"": ""2019 NFL season"", ""Type"": ""N"", ""Wi..."


In [132]:
news[news[0]=='N61864']

Unnamed: 0,0,1,2,3,4,5,6,7
86,N61864,news,newsopinion,The News In Cartoons,News as seen through the eyes of the nation's ...,https://assets.msn.com/labs/mind/AABGTFJ.html,[],[]


**Oh dang!** From the title and apstract this seems to be the same cartoon article as above :O so some of the articles can be stored under different article IDs. That's not good! Later we will have to identify those duplicates and sort this out, but for now we will just pretend we never saw this. 

In the folowing we compare the number of all articles to the number of articles, which were read more than once by the some particular user:

In [133]:
user_article_vc.shape, user_article_vc[user_article_vc!=1].shape

((287806,), (3502,))

### Train Test Split

Next we will perform the train-test-split on the user-article table. Then we want to make sure we have a good overlap of the same users and articles in the two splits. This is important for the evaluation of the model later on, as we can only give recommendations for users the model already saw in training. 

In [134]:
train, test = train_test_split(behaviors_part_1_set, test_size=0.5, random_state=420)

In [135]:
user_intersection = set(train.user_id) & set(test.user_id)
article_intersection = set(train.article) & set(test.article)
print("User ID overlap in train and test split:    ",
      f"{len(user_intersection)} / {behaviors_part_1_set.user_id.nunique()}",
      "\n"
      "Article ID overlap in train and test split: ",
      f"{len(article_intersection)} / {behaviors_part_1_set.article.nunique()}")   

User ID overlap in train and test split:     9519 / 10000 
Article ID overlap in train and test split:  11502 / 21954


As we can see from the numbers above we have a sufficient amount of the same users and articles in both of the splits.

### Create Pivot Table

Now we create the user-article matrix from our train set, which we then approximate by singular value decomposition aka matrix factorization.

In [151]:
train_pivot = train.pivot_table(index='user_id', 
                                columns='article',
                                values='read',
                                fill_value=0,
                             #   aggfunc=np.sum
                               )

In [152]:
train_pivot = train_pivot.astype(np.float64)
train_pivot.head()

article,N1001,N10016,N10021,N10024,N10025,N10034,N10040,N10041,N10047,N10048,...,N9955,N9958,N996,N9969,N997,N9973,N9974,N9977,N9978,N9992
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
U10022,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
U10043,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
U10045,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
U10059,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
U10062,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Model Fitting

In [154]:
b1 = train_pivot.to_numpy(copy=True)
b1_mean = np.mean(b1, axis=1)
b1 -= b1_mean.reshape(-1,1)

In [156]:
U, sigma, Vt = svds(b1, k=5)

In [157]:
sigma = np.diag(sigma)

In [158]:
sigma.shape

(5, 5)

In [162]:
recommendations = np.dot(np.dot(U, sigma), Vt) + b1_mean.reshape(-1, 1)
recos_df = pd.DataFrame(recommendations)
recos_df.columns = train_pivot.columns
recos_df['user_ids'] = train_pivot.index
recos_df.set_index('user_ids', inplace=True)

In [163]:
recos_df.head()

article,N1001,N10016,N10021,N10024,N10025,N10034,N10040,N10041,N10047,N10048,...,N9955,N9958,N996,N9969,N997,N9973,N9974,N9977,N9978,N9992
user_ids,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
U10022,-0.000549,0.001115,-0.000521,-9.5e-05,-0.000311,-0.00042,0.000838,-0.000275,-0.0002,-0.000547,...,0.005826,-0.000582,0.010311,-0.000471,-0.000564,-0.000477,-0.000552,-0.000293,-0.000484,-0.000459
U10043,0.000767,0.001345,0.000837,0.0008,0.000704,0.00098,0.000768,0.00079,0.000868,0.000888,...,0.000983,0.000873,0.002325,0.000882,0.000769,0.000802,0.000692,0.000931,0.000952,0.000795
U10045,0.00094,0.001574,0.001075,0.001005,0.000912,0.001234,0.000918,0.000972,0.001074,0.001089,...,0.001429,0.00107,0.002735,0.001098,0.000957,0.00106,0.000934,0.001172,0.001149,0.000981
U10059,-0.001402,0.001535,-0.00128,-0.002161,-0.002106,0.000199,-0.001775,-0.001481,-0.001074,-0.000877,...,-0.001184,-0.000892,0.007148,-0.001404,-0.001579,-0.000987,-0.001721,-0.000319,-0.000194,-0.001439
U10062,-0.001496,0.007858,-0.002202,-0.000852,-0.002468,0.000586,0.00262,-0.000646,-0.000238,-0.000757,...,0.016033,-0.000977,0.041127,-0.000991,-0.001842,-0.001996,-0.003065,0.000322,0.000139,-0.001339


In [37]:
news.head()

Unnamed: 0,0,1,2,3,4,5,6,7
0,N55528,lifestyle,lifestyleroyals,"The Brands Queen Elizabeth, Prince Charles, an...","Shop the notebooks, jackets, and more that the...",https://assets.msn.com/labs/mind/AAGH0ET.html,"[{""Label"": ""Prince Philip, Duke of Edinburgh"",...",[]
1,N19639,health,weightloss,50 Worst Habits For Belly Fat,These seemingly harmless habits are holding yo...,https://assets.msn.com/labs/mind/AAB19MK.html,"[{""Label"": ""Adipose tissue"", ""Type"": ""C"", ""Wik...","[{""Label"": ""Adipose tissue"", ""Type"": ""C"", ""Wik..."
2,N61837,news,newsworld,The Cost of Trump's Aid Freeze in the Trenches...,Lt. Ivan Molchanets peeked over a parapet of s...,https://assets.msn.com/labs/mind/AAJgNsz.html,[],"[{""Label"": ""Ukraine"", ""Type"": ""G"", ""WikidataId..."
3,N53526,health,voices,I Was An NBA Wife. Here's How It Affected My M...,"I felt like I was a fraud, and being an NBA wi...",https://assets.msn.com/labs/mind/AACk2N6.html,[],"[{""Label"": ""National Basketball Association"", ..."
4,N38324,health,medical,"How to Get Rid of Skin Tags, According to a De...","They seem harmless, but there's a very good re...",https://assets.msn.com/labs/mind/AAAKEkt.html,"[{""Label"": ""Skin tag"", ""Type"": ""C"", ""WikidataI...","[{""Label"": ""Skin tag"", ""Type"": ""C"", ""WikidataI..."


In [38]:
titles_dict = pd.Series(news[3].values,index=news[0]).to_dict()

In [39]:
def give_recommendations(user, n = 5):
    recos = recommendations_df.T[user].sort_values().tail(n)
    return recos

In [40]:
give_recommendations('U91836')

article
N11101    0.306672
N6233     0.320884
N41375    0.329777
N37509    0.354515
N14761    0.456252
Name: U91836, dtype: float64

In [41]:
recommendations_df.T['U91836']    #[user].sort_values().tail(n)

article
N100      0.000032
N1000     0.000729
N10001   -0.001700
N10003   -0.000208
N10009    0.000684
            ...   
N9977     0.000857
N9978    -0.002033
N9984     0.001415
N9992    -0.000673
N9993     0.000366
Name: U91836, Length: 20688, dtype: float64

In [42]:
news.head()

Unnamed: 0,0,1,2,3,4,5,6,7
0,N55528,lifestyle,lifestyleroyals,"The Brands Queen Elizabeth, Prince Charles, an...","Shop the notebooks, jackets, and more that the...",https://assets.msn.com/labs/mind/AAGH0ET.html,"[{""Label"": ""Prince Philip, Duke of Edinburgh"",...",[]
1,N19639,health,weightloss,50 Worst Habits For Belly Fat,These seemingly harmless habits are holding yo...,https://assets.msn.com/labs/mind/AAB19MK.html,"[{""Label"": ""Adipose tissue"", ""Type"": ""C"", ""Wik...","[{""Label"": ""Adipose tissue"", ""Type"": ""C"", ""Wik..."
2,N61837,news,newsworld,The Cost of Trump's Aid Freeze in the Trenches...,Lt. Ivan Molchanets peeked over a parapet of s...,https://assets.msn.com/labs/mind/AAJgNsz.html,[],"[{""Label"": ""Ukraine"", ""Type"": ""G"", ""WikidataId..."
3,N53526,health,voices,I Was An NBA Wife. Here's How It Affected My M...,"I felt like I was a fraud, and being an NBA wi...",https://assets.msn.com/labs/mind/AACk2N6.html,[],"[{""Label"": ""National Basketball Association"", ..."
4,N38324,health,medical,"How to Get Rid of Skin Tags, According to a De...","They seem harmless, but there's a very good re...",https://assets.msn.com/labs/mind/AAAKEkt.html,"[{""Label"": ""Skin tag"", ""Type"": ""C"", ""WikidataI...","[{""Label"": ""Skin tag"", ""Type"": ""C"", ""WikidataI..."


In [43]:
titles_dict = pd.Series(news[3].values,index=news[0]).to_dict()

In [44]:
give_recommendations('U91836', n=10)

article
N12349    0.236248
N59704    0.239799
N27526    0.259654
N4607     0.269513
N11231    0.276449
N11101    0.306672
N6233     0.320884
N41375    0.329777
N37509    0.354515
N14761    0.456252
Name: U91836, dtype: float64

In [45]:
xy=give_recommendations('U91836')

In [46]:
xy

article
N11101    0.306672
N6233     0.320884
N41375    0.329777
N37509    0.354515
N14761    0.456252
Name: U91836, dtype: float64

In [47]:
recommendations_df.index.unique()

Index(['U10022', 'U10043', 'U10045', 'U10059', 'U10062', 'U10064', 'U10079',
       'U10099', 'U10101', 'U10123',
       ...
       'U9881', 'U9920', 'U9923', 'U9929', 'U994', 'U9965', 'U9969', 'U9984',
       'U999', 'U9991'],
      dtype='object', name='user_ids', length=8502)

In [48]:
behaviors_dev = pd.read_csv('../../data/mind_small_dev/behaviors.tsv', sep="\t", header=None)

In [51]:
beh_num = behav_part_1.to_numpy()


In [52]:
user_dic = {}
for i in range(beh_num.shape[0]):
    tri = [s[:-2] for s in beh_num[i][4].split(' ') if s[-1] == '1']
    
    unity = set(tri) & hist_set
    if len(unity) > 0:
        user_dic[i] = list(unity)

NameError: name 'hist_set' is not defined

In [53]:
map_dict = {}
for i, s in enumerate(behaviors_part_1_pivot.columns):
    map_dict[s] = i

In [None]:
map_dict['N10284'], user_dic[21]


In [None]:
np.dot(np.dot(U[21, :], sigma), Vt[:, 13175])
np.dot(np.dot(U[24, :], sigma), Vt[:, 7831])

In [None]:
results = []
for k, v in user_dic.items():
    for n in v:
        news_idx = map_dict[n]
        pred = np.dot(np.dot(U[k, :], sigma), Vt[:, news_idx])
        results.append(pred + b1_mean[k])
    

In [None]:
results.sort(reverse=True)


In [None]:
erg = pd.DataFrame(np.dot(np.dot(U, sigma), Vt) + b1_mean.reshape(-1, 1))

In [None]:
erg.columns = behaviors_part_1_pivot.columns

In [None]:
erg.iloc[24]['N10016']

In [None]:
np.mean(erg.mean())

In [None]:
np.std(erg.mean())

In [None]:
user_dic
erg.iloc[24]['N47020']

In [None]:
recos = []
for user, article in user_dic.items():
    recos.append(erg.iloc[user][article].to_list())

In [None]:
recos_2 =[]
for x in recos:
    for y in x:
        recos_2.append(y)
        

In [None]:
recos_2 = pd.Series(recos_2)

In [None]:
recos_2.describe()

In [None]:
np.mean(erg.mean())

In [None]:
np.std(erg.mean())

In [None]:
erg.mean

In [None]:
from sklearn.decomposition import NMF

In [None]:
beahviors_np = behaviors_part_1_pivot.to_numpy(copy=True)

In [None]:
beahviors_np.shape

In [None]:
model = NMF(n_components=10, init='random', random_state=420)

In [None]:
W = model.fit_transform(beahviors_np)

In [None]:
H = model.components_

In [None]:
H.shape

In [None]:
W.shape

In [None]:
nmf_matrix = np.dot(W, H)

In [None]:
nfm_matrix_df = pd.DataFrame(nmf_matrix)

In [None]:
nfm_matrix_df.columns = behaviors_part_1_pivot.columns

In [None]:
nfm_matrix_df

In [None]:
recos_nfm = []
for user, article in user_dic.items():
    recos_nfm.append(nfm_matrix_df.iloc[user][article].to_list())
    
recos_nfm_2 =[]
for x in recos_nfm:
    for y in x:
        recos_nfm_2.append(y)
        