# Matrix Factorization

## [Task 1] Fill Missing Values

<br />

I am going to implement the matrix factorization by using scikit-learn. At first, I will fill missing values by evaluating them as 0s.

In [10]:
import numpy as np
import pandas as pd
from sklearn.decomposition import NMF

In [6]:
# Prepare a concise dataset

data1 = pd.DataFrame({"ゼロから作るDeepLearning": [2, 1, 0, 2, 1],
                      "Python機械学習プログラミング": [2, 1, 5, 4, 3],
                      "ゼロから作るDeepLearning-自然言語処理編-": [4, 2, 2, 2, 0],
                      "はじめてのパターン認識": [np.nan, 4, 2, 3, 4],
                      "これからの強化学習": [np.nan, 3, 5, 4, 1]}, index=["user1", "user2", "user3", "user4", "user5"])

In [7]:
# Check

data1

Unnamed: 0,ゼロから作るDeepLearning,Python機械学習プログラミング,ゼロから作るDeepLearning-自然言語処理編-,はじめてのパターン認識,これからの強化学習
user1,2,2,4,,
user2,1,1,2,4.0,3.0
user3,0,5,2,2.0,5.0
user4,2,4,2,3.0,4.0
user5,1,3,0,4.0,1.0


In [8]:
# Fill missing values

new_data1 = data1.fillna(0)

In [9]:
# Check

new_data1

Unnamed: 0,ゼロから作るDeepLearning,Python機械学習プログラミング,ゼロから作るDeepLearning-自然言語処理編-,はじめてのパターン認識,これからの強化学習
user1,2,2,4,0.0,0.0
user2,1,1,2,4.0,3.0
user3,0,5,2,2.0,5.0
user4,2,4,2,3.0,4.0
user5,1,3,0,4.0,1.0


## [Task 2] Implement Non-negative Matrix Factorization(NMF) by scikit-learn

<br />

sklearn.decomposition.NMF


https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html

In [12]:
# NMF

model = NMF(n_components=2, init='random', random_state=0)
W = model.fit_transform(new_data1)
H = model.components_

In [13]:
# Show the result

np.dot(W, H)

array([[1.90202882, 1.92346129, 4.01522546, 0.        , 0.52282576],
       [0.7813298 , 2.83404347, 1.24516592, 2.59768915, 2.84379341],
       [1.20809937, 4.10725982, 1.97962923, 3.66736152, 4.04367796],
       [1.35579048, 3.98848411, 2.34443869, 3.32658269, 3.73938561],
       [0.35469018, 2.45515588, 0.33412505, 2.66448919, 2.79412539]])

# Try the Implementation by Big Data

<br />

I will do a recommendation by using a dataset of animations on Kaggle.

<br />

"Anime Recommendations Database"


https://www.kaggle.com/CooperUnion/anime-recommendations-database/version/1

### Preprocessing

In [14]:
# Read the data

ratings = pd.read_csv('anime-recommendations-database/rating.csv')
anime = pd.read_csv('anime-recommendations-database/anime.csv')

In [16]:
# Check

ratings.head()

Unnamed: 0,user_id,anime_id,rating
0,1,20,-1
1,1,24,-1
2,1,79,-1
3,1,226,-1
4,1,241,-1


In [15]:
anime.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [17]:
# Preprocessing

# Get only animes evaluated by more than 1000 members
anime = anime[anime['members'] > 1000]

# Delete missing values while evaluating animes that have not been evaluated yet as -1s
anime = anime.dropna()
ratings[ratings.rating == -1] = 0

# Merge
merge_df = ratings.merge(anime, left_on='anime_id',
                         right_on='anime_id', suffixes=['_user', ''])

# Delete duplications
merge_df = merge_df.drop_duplicates(['user_id', 'name'])
anime_pivot = merge_df.pivot(index='user_id', columns='name', values='rating').fillna(0)

In [18]:
# Check

anime_pivot.head()

name,&quot;0&quot;,&quot;Bungaku Shoujo&quot; Kyou no Oyatsu: Hatsukoi,&quot;Bungaku Shoujo&quot; Memoire,&quot;Bungaku Shoujo&quot; Movie,.hack//G.U. Returner,.hack//G.U. Trilogy,.hack//G.U. Trilogy: Parody Mode,.hack//Gift,.hack//Intermezzo,.hack//Liminality,...,gdgd Fairies,gdgd Fairies 2,iDOLM@STER Xenoglossia,iDOLM@STER Xenoglossia Specials,s.CRY.ed,xxxHOLiC,xxxHOLiC Kei,xxxHOLiC Movie: Manatsu no Yoru no Yume,xxxHOLiC Rou,xxxHOLiC Shunmuki
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,8.11,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## [Advanced Task 1] Implement NMF by scikit-learn

In [21]:
# NMF

model2 = NMF(n_components=2, init='random', random_state=0)
W2 = model.fit_transform(anime_pivot)
H2 = model.components_

In [22]:
# Show the result

np.dot(W2, H2)

array([[1.56695401e-04, 6.05443697e-03, 9.80824391e-03, ...,
        0.00000000e+00, 7.34831264e-03, 2.54466534e-03],
       [2.81847785e-05, 1.08901068e-03, 1.76420738e-03, ...,
        0.00000000e+00, 1.32173991e-03, 4.57708579e-04],
       [2.12356740e-03, 8.36358988e-02, 1.17250369e-01, ...,
        1.00098874e-01, 1.27831224e-01, 1.06096963e-01],
       ...,
       [6.91283539e-05, 2.83700420e-03, 2.68549560e-03, ...,
        1.04841427e-02, 6.20017557e-03, 8.62301371e-03],
       [6.91425028e-03, 2.82910533e-01, 2.76989543e-01, ...,
        9.95075907e-01, 6.05033144e-01, 8.24165859e-01],
       [1.38532928e-04, 5.64667199e-03, 5.76413502e-03, ...,
        1.85678025e-02, 1.17359478e-02, 1.55331976e-02]])

## [Advanced Task 2] Recommendation