## Restricted Boltzmann Machine that recommends music artists to a user based on the artists that user has already rated (Yahoo! dataset, TensorFlow)

### Information about the dataset

- Number of inputs (ratings): **>150 000 000**; only 1.55 million are used in this project
- Total number of music artists: **97 956**
- Dataset: https://webscope.sandbox.yahoo.com/catalog.php?datatype=r&did=1 <br>
Permission to use this dataset for non-commercial purposes was granted by Yahoo.

### Data preprocessing stage

**Import libraries**

In [3]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

**Import datasets** 

In [2]:
music_rat = pd.read_csv('data/part1.txt', sep=',', header=None)
# A multi-character separator is parsed as a regex, which needs the Python engine
music_art = pd.read_csv('data/artists.txt', sep='::', header=None, engine='python')

The first column is the ID of a user, the second is the ID of an artist, and the last is the score that the user gave to that artist.

In [4]:
music_rat.head()

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1 | 1000125 | 90 |
| 1 | 1 | 1006373 | 100 |
| 2 | 1 | 1006978 | 90 |
| 3 | 1 | 1007035 | 100 |
| 4 | 1 | 1007098 | 100 |


In [5]:
music_art.head()

| | 0 | 1 |
|---|---|---|
| 0 | -100 | Not Applicable |
| 1 | -99 | Unknown Artist |
| 2 | 1000001 | Bobby "O" |
| 3 | 1000002 | Jimmy "Z" |
| 4 | 1000003 | '68 Comeback |


**Naming columns**

In [6]:
music_rat.columns = ["UserID", "ArtistID", "score"]
music_art.columns = ["ArtistID", "Name"]

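**Remove the special score 255** <br>
A score of 255 is a flag rather than a rating (in this dataset it appears to mark "never play again"), so those rows are dropped.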

In [7]:
music_rat = music_rat[music_rat["score"] < 255] 

In [8]:
len(music_rat)

1550379

The total number of ratings to be used is: **1550379**

**Add index to every artist**

In [10]:
music_art["List Index"] = music_art.index

**Combine the two dataframes on the unique artist ID**

In [11]:
data_combined = pd.merge(music_rat, music_art, on="ArtistID")
data_combined = data_combined.drop(columns=["Name"])  # the artist name is not needed for training
data_combined.head(4)

| | UserID | ArtistID | score | List Index |
|---|---|---|---|---|
| 0 | 1 | 1000125 | 90 | 123 |
| 1 | 5 | 1000125 | 90 | 123 |
| 2 | 10 | 1000125 | 90 | 123 |
| 3 | 21 | 1000125 | 90 | 123 |
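
To make the effect of the merge concrete, here is a small sketch on made-up data (the IDs and names are invented for illustration and are not taken from the dataset). `pd.merge` performs an inner join by default, so a rating whose `ArtistID` is missing from the artist table is silently dropped.

```python
import pandas as pd

# Toy data; IDs and names are invented for illustration only
ratings = pd.DataFrame({
    "UserID": [1, 1, 2],
    "ArtistID": [10, 20, 99],   # 99 has no entry in the artist table
    "score": [90, 100, 50],
})
artists = pd.DataFrame({"ArtistID": [10, 20], "Name": ["A", "B"]})
artists["List Index"] = artists.index

# Inner join (the default): the rating for ArtistID 99 is dropped
combined = pd.merge(ratings, artists, on="ArtistID").drop(columns=["Name"])
print(combined)
```

Comparing `len(combined)` with `len(ratings)` is a quick way to see whether any ratings were lost in the join.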


**Group data by user and create a template to be used by the neural network** <br>
Each input is one user together with all of that user's artist ratings, scaled to the range 0 to 1. If the user did not rate an artist, a very small number (0.000000001) is used instead.

In [14]:
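user_Group = data_combined.groupby('UserID')
TotalUsers = 3500  # number of users to include in the training set
X = []
for userID, curUser in user_Group:
    # Stop once exactly TotalUsers rows have been collected
    if len(X) >= TotalUsers:
        break
    # Start from the "unrated" placeholder value for every artist
    temp = [0.000000001]*len(music_art)
    for num, artist in curUser.iterrows():
        # Scale the score from 0-100 to 0-1 and store it at the artist's index
        temp[artist["List Index"]] = artist["score"]/100
    X.append(temp)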
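
The nested loop is easy to follow but `iterrows` is slow on large frames. As a possible alternative, the sketch below fills the same kind of matrix with NumPy fancy indexing. It runs on toy arrays; the names `user_rows`, `artist_cols`, `scores`, and `X_demo` and the sizes are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

n_users, n_artists = 3, 5   # toy sizes (assumed)

# One entry per rating: user row, artist column, and raw score (0-100)
user_rows = np.array([0, 0, 1, 2])
artist_cols = np.array([1, 4, 1, 3])
scores = np.array([90, 100, 30, 0])

# Start from the "unrated" placeholder, then overwrite the rated cells
X_demo = np.full((n_users, n_artists), 1e-9, dtype=np.float32)
X_demo[user_rows, artist_cols] = scores / 100

print(X_demo)
```

With this encoding a genuine score of 0 is stored as exactly 0.0, while an unrated artist keeps the tiny placeholder value, so the two cases remain distinguishable.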

### Create a neural network

**Placeholders for the visible bias, hidden bias, and weights** <br>
The number of hidden units is set to 30.

In [16]:
hiddenUnits = 30
visibleUnits = len(music_art)
vb = tf.placeholder("float", [visibleUnits])  # visible bias: one entry per artist
hb = tf.placeholder("float", [hiddenUnits])  # hidden bias: one entry per learned feature
W = tf.placeholder("float", [visibleUnits, hiddenUnits])  # weights between visible and hidden units
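
Because `W`, `vb`, and `hb` are placeholders rather than variables, their current values have to be kept outside the graph and supplied through `feed_dict` on every run. The sketch below shows one way to hold those values in NumPy arrays. Zero initialization and the names `cur_w`, `cur_vb`, and `cur_hb` are assumptions (the source does not show the initialization at this point), and a small hypothetical artist count is used so the example runs quickly.

```python
import numpy as np

hidden_units = 30
visible_units = 1000  # hypothetical; the notebook uses len(music_art)

# Current parameter values, kept as NumPy arrays between training steps
cur_w = np.zeros((visible_units, hidden_units), dtype=np.float32)
cur_vb = np.zeros(visible_units, dtype=np.float32)
cur_hb = np.zeros(hidden_units, dtype=np.float32)

# Taking the artist count quoted earlier (97 956), W alone would hold:
n_weights = 97956 * hidden_units
print(n_weights)  # 2938680
```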

**Input Processing**

In [17]:
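v0 = tf.placeholder("float", [None, visibleUnits])
# Probability that each hidden unit is active, given the visible input
_h0 = tf.nn.sigmoid(tf.matmul(v0, W) + hb)
# Sample binary hidden states: 1 where the probability exceeds uniform noise, else 0
h0 = tf.nn.relu(tf.sign(_h0 - tf.random_uniform(tf.shape(_h0))))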
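
The expression `relu(sign(p - u))` draws a binary sample: with `u` uniform on [0, 1), `p - u` is positive with probability `p`, `sign` maps it to +1 or -1, and `relu` turns -1 into 0. The NumPy sketch below mimics the trick outside TensorFlow as a sanity check; the probabilities, sample count, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.1, 0.5, 0.9])        # example activation probabilities
u = rng.random((100_000, p.size))    # uniform noise in [0, 1)

# Same trick as in the graph: sign gives -1/0/+1, maximum(., 0) acts as relu
samples = np.maximum(np.sign(p - u), 0.0)

# Each column's mean should be close to the corresponding probability
print(samples.mean(axis=0))
```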

**Reconstruction**

In [None]:
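# Probability of each visible unit, reconstructed from the sampled hidden states
_v1 = tf.nn.sigmoid(tf.matmul(h0, tf.transpose(W)) + vb)
# Sample binary visible states from those probabilities
v1 = tf.nn.relu(tf.sign(_v1 - tf.random_uniform(tf.shape(_v1))))
# Hidden probabilities computed from the reconstructed visible layer
h1 = tf.nn.sigmoid(tf.matmul(v1, W) + hb)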
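
The quantities `v0`, `h0`, `v1`, and `h1` are the usual ingredients of one step of contrastive divergence (CD-1). The source does not show an update rule at this point, so the NumPy sketch below is only one common formulation and not necessarily the one used in this notebook. The learning rate `alpha`, the toy sizes, and the random data standing in for real ratings are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p, rng):
    # Bernoulli sample, equivalent to relu(sign(p - uniform))
    return (p > rng.random(p.shape)).astype(np.float32)

rng = np.random.default_rng(0)
n_visible, n_hidden, batch = 6, 3, 4   # toy sizes (assumed)
alpha = 0.1                            # hypothetical learning rate

W = np.zeros((n_visible, n_hidden), dtype=np.float32)
vb = np.zeros(n_visible, dtype=np.float32)
hb = np.zeros(n_hidden, dtype=np.float32)
v0 = rng.random((batch, n_visible)).astype(np.float32)  # stand-in ratings

# Forward pass, reconstruction, and second forward pass
h0 = sample(sigmoid(v0 @ W + hb), rng)
v1 = sample(sigmoid(h0 @ W.T + vb), rng)
h1 = sigmoid(v1 @ W + hb)

# CD-1 update: data-driven statistics minus reconstruction-driven statistics
W += alpha * (v0.T @ h0 - v1.T @ h1) / batch
vb += alpha * (v0 - v1).mean(axis=0)
hb += alpha * (h0 - h1).mean(axis=0)
```

In the placeholder-based graph, updated arrays like these would be fed back in as the new values of `W`, `vb`, and `hb` on the next run.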