# Double domain recommendations on ML-1m

## Introduction

In this tutorial, we train the DCDCSR model on the MovieLens 1M dataset. The data is divided into four parts: D1, D2, D3, and D4. D1 and D2 have users in common, D1 and D3 have items in common, and D1 and D4 have neither users nor items in common. We evaluate the model on the held-out test portion of each part (for D1, a test set drawn from 10% of D1) and compute the MAE, RMSD, Precision, and Recall values. The same procedure is repeated for every part, which gives four cross-domain recommendation cases:

- Case 1: D1 is the target domain and D4 is the source domain.
- Case 2: D2 is the target domain and D3 is the source domain.
- Case 3: D3 is the target domain and D2 is the source domain.
- Case 4: D4 is the target domain and D1 is the source domain.
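
The four-way split described above can be pictured as halving the users and halving the items. The sketch below is one possible way to build such a split, not necessarily the procedure used to prepare the data for this tutorial. The function name `split_four_domains`, the 50/50 halves, and the toy ratings are all assumptions made for illustration.

```python
import random


def split_four_domains(ratings, seed=0):
    """Split (user, item, rating) triples into D1..D4.

    Assumption: users and items are each cut into two random halves.
    D1 and D2 then share users, D1 and D3 share items, and D1 and D4
    share neither.
    """
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    rng = random.Random(seed)
    rng.shuffle(users)
    rng.shuffle(items)
    users_a = set(users[:len(users) // 2])
    items_a = set(items[:len(items) // 2])

    domains = {"D1": [], "D2": [], "D3": [], "D4": []}
    for u, i, r in ratings:
        if u in users_a:
            key = "D1" if i in items_a else "D2"
        else:
            key = "D3" if i in items_a else "D4"
        domains[key].append((u, i, r))
    return domains


# Toy data: 4 users x 4 items, every pair rated (hypothetical values).
toy = [(u, i, float((u + i) % 5 + 1)) for u in range(4) for i in range(4)]
domains = split_four_domains(toy)
for name, part in domains.items():
    print(name, len(part))
```

With this layout, each of the four cases pairs a target domain with the source domain that sits diagonally opposite it (D1 with D4, D2 with D3), so the pair shares neither users nor items.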

### Model Architecture

<p><center><img src='_images/T490340_1.png'></center></p>

### Training Procedure

<p><center><img src='_images/T490340_2.png'></center></p>

## Setup

In [None]:
!git clone https://github.com/Worm4047/crossDomainRecommenderSystem.git

Cloning into 'crossDomainRecommenderSystem'...
remote: Enumerating objects: 79, done.
remote: Total 79 (delta 0), reused 0 (delta 0), pack-reused 79
Unpacking objects: 100% (79/79), done.


In [None]:
%cd crossDomainRecommenderSystem

/content/crossDomainRecommenderSystem


## Simple recommendations

In [None]:
from math import *


def sim_distance(prefs,person1,person2):
    si={}
    for item in prefs[person1]: 
        if item in prefs[person2]: si[item]=1
    if len(si)==0: return 0
    sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2) 
                        for item in prefs[person1] if item in prefs[person2]])

    return 1/(1+sum_of_squares)


def sim_pearson(prefs,p1,p2):
    si={}
    for item in prefs[p1]: 
        if item in prefs[p2]: si[item]=1
    if len(si)==0: return 0

    # Sum calculations
    n=len(si)

    # Sums of all the preferences
    sum1=sum([prefs[p1][it] for it in si])
    sum2=sum([prefs[p2][it] for it in si])

    # Sums of the squares
    sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
    sum2Sq=sum([pow(prefs[p2][it],2) for it in si])	

    # Sum of the products
    pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])

    # Calculate r (Pearson score)
    num=pSum-(sum1*sum2/n)
    den=sqrt((sum1Sq-pow(sum1,2)/n)*(sum2Sq-pow(sum2,2)/n))
    if den==0: return 0

    r=num/den

    return r


# Returns the best matches for person from the prefs dictionary. 
# Number of results and similarity function are optional params.
def topMatches(prefs,person,n=5,similarity=sim_pearson):
        scores=[(similarity(prefs,person,other),other) 
                        for other in prefs if other!=person]
        scores.sort()
        scores.reverse()
        return scores[0:n]

# Gets recommendations for a person by using a weighted average
# of every other user's rankings
def getRecommendations(prefs,person,similarity=sim_pearson):
    totals={}
    simSums={}
    for other in prefs:
        # don't compare me to myself
        if other==person: continue
        sim=similarity(prefs,person,other)

        # ignore scores of zero or lower
        if sim<=0: continue
        for item in prefs[other]:
            
            # only score movies I haven't seen yet
            if item not in prefs[person] or prefs[person][item]==0:
                # Similarity * Score
                totals.setdefault(item,0)
                totals[item]+=prefs[other][item]*sim
                # Sum of similarities
                simSums.setdefault(item,0)
                simSums[item]+=sim

    # Create the normalized list
    rankings=[(total/simSums[item],item) for item,total in totals.items()]

    # Return the sorted list
    rankings.sort()
    rankings.reverse()
    return rankings


def transformPrefs(prefs):
    result={}
    for person in prefs:
        for item in prefs[person]:
            result.setdefault(item,{})

            # Flip item and person (must run once per item, inside the inner loop)
            result[item][person]=prefs[person][item]
    return result


def calculateSimilarItems(prefs,n=10):
    # Create a dictionary of items showing which other items they
    # are most similar to.
    result={}
    # Invert the preference matrix to be item-centric
    itemPrefs=transformPrefs(prefs)
    c=0
    for item in itemPrefs:
        # Status updates for large datasets
        c+=1
        if c%100==0: print("%d / %d" % (c,len(itemPrefs)))
        # Find the most similar items to this one
        scores=topMatches(itemPrefs,item,n=n,similarity=sim_distance)
        result[item]=scores
    return result


def getRecommendedItems(prefs,itemMatch,user):
    userRatings=prefs[user]
    scores={}
    totalSim={}
    # Loop over items rated by this user
    for (item,rating) in userRatings.items( ):

        # Loop over items similar to this one
        for (similarity,item2) in itemMatch[item]:

            # Ignore if this user has already rated this item
            if item2 in userRatings: continue
            # Weighted sum of rating times similarity
            scores.setdefault(item2,0)
            scores[item2]+=similarity*rating
            # Sum of all the similarities
            totalSim.setdefault(item2,0)
            totalSim[item2]+=similarity

    # Divide each total score by total weighting to get an average
    rankings=[(score/totalSim[item],item) for item,score in scores.items( )]

    # Return the rankings from highest to lowest
    rankings.sort( )
    rankings.reverse( )
    return rankings


def loadMovieLens(path='movielens',file='/u1.base'):
    # Get movie titles
    movies={}
    for line in open(path+'/u.item', encoding="ISO-8859-1"):
        (id,title)=line.split('|')[0:2]
        movies[id]=title

    # Load data
    prefs={}
    for line in open(path+file, encoding="ISO-8859-1"):
        (user,movieid,rating,ts)=line.split('\t')
        prefs.setdefault(user,{})
        prefs[user][movieid]=float(rating)
    return prefs


if __name__=='__main__':
    trainPrefs = loadMovieLens()
    testPrefs = loadMovieLens(file='/u1.test')
    # Movie titles, only needed for the optional debug print below
    movies={}
    for line in open('movielens/u.item', encoding="ISO-8859-1"):
        (id,title)=line.split('|')[0:2]
        movies[id]=title
    for user in testPrefs:
        pred = getRecommendations(trainPrefs,user)
        preds={}
        for rating,item in pred:
            preds[item]=rating
            # print(movies[item],rating,item)
        accuracies=[]
        for movie in testPrefs[user]:
            # Skip test movies for which no prediction exists
            if movie not in preds: continue
            actualRating = testPrefs[user][movie]
            predictedRating = preds[movie]
            # Relative error of the prediction with respect to the true rating
            accu = fabs(predictedRating - actualRating)/actualRating
            # Discard predictions that are off by more than 100%
            if accu > 1:
                continue
            accuracies.append(1 - accu)
        # Per-user accuracy in percent; skip users with no usable predictions
        if accuracies:
            print((sum(accuracies)/len(accuracies))*100)

79.24292127362807
87.93479303981941
75.80961193124574
77.8640632822975
76.56256923376974
78.05246040052124
78.19357343520991
81.13604068897467
82.26781258070697
87.34195067949857
79.91200870698799
82.33292876950038
74.91055413077252
82.47050838332606
70.36418973654769
79.56924386852161
70.06096018208072
82.54765533265174
86.16115379810184
61.402091303279896
74.29525866837294
77.43795288652258
78.53617559609039
84.07707243773565
83.87027342334234
81.29841478761122
75.16255625046323
81.93612533003622
82.3030263233203
77.599541842924
76.0172517761654
81.30200636053584
88.2794180153652
73.41566820212797
76.43439609741209
62.22666268033935
81.81228812713917
67.40551694521065
81.27439063057882
73.8401045804389
82.43483676518201
77.4490840352222
82.05600364094401
80.84286023516452
81.40923574121854
72.96913711111404
83.95842685723139
79.34709717966854
68.46162508791912
65.89825421429686
90.59991854329544
82.60698558853309
79.07418800988738
77.43232254022591
79.78923642779793
80.42694643711489
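
The percentages above are per-user relative accuracies, which is not the same thing as the MAE and RMSD mentioned in the introduction. As a minimal sketch, assuming predictions and true ratings have been collected as `(predicted, actual)` pairs, the two error metrics could be computed as follows. The helper name `mae_rmse` and the sample pairs are hypothetical and not part of the original notebook.

```python
from math import sqrt


def mae_rmse(pairs):
    """Return (MAE, RMSE) for an iterable of (predicted, actual) pairs."""
    errors = [predicted - actual for predicted, actual in pairs]
    if not errors:
        return None, None
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Hypothetical predictions, for illustration only.
pairs = [(3.5, 4.0), (2.0, 1.0), (5.0, 5.0), (4.5, 3.0)]
mae, rmse = mae_rmse(pairs)
print("MAE:", mae)
print("RMSE:", rmse)
```

Unlike the relative accuracy, these metrics do not divide by the true rating, so an error of one star counts the same for a 1-star movie as for a 5-star movie.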

## Double-domain recommendations

In [None]:
from math import *


def sim_distance(prefs,person1,person2):
  si={}
  for item in prefs[person1]: 
    if item in prefs[person2]: si[item]=1
  if len(si)==0: return 0
  sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2) for item in prefs[person1] if item in prefs[person2]])
  return 1/(1+sum_of_squares)

  
def sim_pearson(prefs,p1,p2):
  si={}
  for item in prefs[p1]: 
    if item in prefs[p2]: si[item]=1
  if len(si)==0: return 0

  # Sum calculations
  n=len(si)
  
  # Sums of all the preferences
  sum1=sum([prefs[p1][it] for it in si])
  sum2=sum([prefs[p2][it] for it in si])
  
  # Sums of the squares
  sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
  sum2Sq=sum([pow(prefs[p2][it],2) for it in si])	
  
  # Sum of the products
  pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])
  
  # Calculate r (Pearson score)
  num=pSum-(sum1*sum2/n)
  den=sqrt((sum1Sq-pow(sum1,2)/n)*(sum2Sq-pow(sum2,2)/n))
  if den==0: return 0

  r=num/den

  return r


# Returns the best matches for person from the prefs dictionary. 
# Number of results and similarity function are optional params.
def topMatches(prefs,person,n=5,similarity=sim_pearson):
  scores=[(similarity(prefs,person,other),other) 
                  for other in prefs if other!=person]
  scores.sort()
  scores.reverse()
  return scores[0:n]


# Gets recommendations for a person in domain1 by using a weighted average
# of every other user's ratings in domain1, with the user-to-user
# similarities computed from domain2
def getRecommendations(domain1,domain2,person,similarity=sim_pearson):
  totals={}
  simSums={}
  for other in domain2:
    # don't compare me to myself
    if other==person: continue
    # skip users that have no ratings in domain1
    if other not in domain1: continue
    sim=similarity(domain2,person,other)

    # ignore scores of zero or lower
    if sim<=0: continue
    for item in domain1[other]:

      # only score movies I haven't seen yet
      if item not in domain1[person] or domain1[person][item]==0:
        # Similarity * Score
        totals.setdefault(item,0)
        totals[item]+=domain1[other][item]*sim
        # Sum of similarities
        simSums.setdefault(item,0)
        simSums[item]+=sim

  # Create the normalized list
  rankings=[(total/simSums[item],item) for item,total in totals.items()]

  # Return the sorted list
  rankings.sort()
  rankings.reverse()
  return rankings


def loadMovieLens(path='movielens',file='/u1.base'):
  # Get movie titles
  movies={}
  for line in open(path+'/u.item', encoding="ISO-8859-1"):
    (id,title)=line.split('|')[0:2]
    movies[id]=title
  
  # Load data
  prefs={}
  for line in open(path+file, encoding="ISO-8859-1"):
    (user,movieid,rating,ts)=line.split('\t')
    prefs.setdefault(user,{})
    prefs[user][movieid]=float(rating)
  return prefs
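
To make the two-domain idea concrete, here is a small self-contained sketch on toy data: user similarity is measured on one set of ratings (the source) and then used to weight ratings from another set (the target). It mirrors the logic of `getRecommendations(domain1,domain2,...)` but is a simplified rewrite, not the notebook's code. The names `pearson` and `cross_domain_scores`, the users, and the ratings are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the keys shared by rating dicts a and b."""
    shared = [k for k in a if k in b]
    n = len(shared)
    if n == 0:
        return 0.0
    sa = sum(a[k] for k in shared)
    sb = sum(b[k] for k in shared)
    num = sum(a[k] * b[k] for k in shared) - sa * sb / n
    den = sqrt((sum(a[k] ** 2 for k in shared) - sa ** 2 / n)
               * (sum(b[k] ** 2 for k in shared) - sb ** 2 / n))
    return num / den if den else 0.0


def cross_domain_scores(target, source, person):
    """Predict target-domain ratings using similarities from the source domain."""
    totals, sim_sums = {}, {}
    for other in source:
        if other == person or other not in target:
            continue
        sim = pearson(source[person], source[other])
        if sim <= 0:
            continue  # ignore dissimilar users
        for item, rating in target[other].items():
            if item in target.get(person, {}):
                continue  # already rated by this person
            totals[item] = totals.get(item, 0.0) + rating * sim
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    return {item: totals[item] / sim_sums[item] for item in totals}


# Toy ratings: "ann" agrees with "bob" and disagrees with "cat" in the source.
source = {
    "ann": {"s1": 5.0, "s2": 1.0, "s3": 4.0},
    "bob": {"s1": 4.0, "s2": 2.0, "s3": 5.0},
    "cat": {"s1": 1.0, "s2": 5.0, "s3": 2.0},
}
target = {
    "ann": {"t1": 4.0},
    "bob": {"t1": 5.0, "t2": 3.0},
    "cat": {"t2": 5.0},
}
scores = cross_domain_scores(target, source, "ann")
print(scores)
```

In this toy case only "bob" has a positive source-domain similarity with "ann", so the prediction for the unseen item `t2` follows bob's rating and cat's rating is ignored.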

In [None]:
if __name__=='__main__':

    # First-domain ratings, second-domain ratings, and the first-domain test split
    fDomainDict=loadMovieLens('movielens','/u1.base')
    sDomainDict=loadMovieLens('movielens','/u2.base')
    fDomainTest =loadMovieLens('movielens','/u1.test')

    sumAccuracy=0
    lenCount=0
    for user in fDomainTest:
        pred = getRecommendations(fDomainDict,sDomainDict,user)
        preds={}
        for rating,item in pred:
            preds[item]=rating
        accuracies=[]
        for movie in fDomainTest[user]:
            # Skip test movies for which no prediction exists
            if movie not in preds: continue
            actualRating = fDomainTest[user][movie]
            predictedRating = preds[movie]
            # Relative error of the prediction with respect to the true rating
            accu = fabs((predictedRating - actualRating)/actualRating)
            # Discard predictions that are off by more than 100%
            if accu > 1:
                continue
            accuracies.append(1 - accu)
        # Skip users with no usable predictions to avoid dividing by zero
        if not accuracies: continue
        lenCount+=1
        userAccuracy=(sum(accuracies)/len(accuracies))*100
        print(userAccuracy)
        sumAccuracy+=userAccuracy

    print('Average Accuracy')
    print(float(sumAccuracy)/lenCount)

78.24458597531506
88.72733991801918
74.95084077484837
78.07399138163689
78.33014107962788
79.37675348022614
78.16251325662003
80.61837594957294
83.27084079284761
88.20977500645698
82.01889065118986
83.9474574689836
75.9191819559501
84.16207820451268
72.51444430410291
79.87563986794397
74.58287423991234
84.03080793731756
82.0726031809847
66.1935783088158
75.04043838849766
75.81956475714028
81.09323822210993
84.8137538788754
87.69639052666197
82.65709505265816
83.32672887963535
84.33751301478816
86.89653169865294
81.7428314527821
82.18927200064815
87.03495023540914
88.64489145800378
79.1915150582124
75.28049685820244
80.19683014300965
84.68910045981767
65.7860993956031
86.22655908813755
79.36955779517363
86.45272893856121
80.13292208729263
83.6397888730046
81.70979636168832
84.53468384734644
84.14168996785631
89.82850502678284
82.16464240366996
70.24732219787117
74.03423604766205
92.100430389486
87.06309009254586
83.95274443284171
79.20398611773642
76.17186131332959
82.0360042814422
84.5
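
The introduction also lists Precision and Recall, which the cells above do not compute. One common way to obtain them from rating predictions is to treat ratings at or above a threshold as "relevant" and predictions at or above the same threshold as "recommended". The sketch below follows that convention. The threshold of 3.5, the helper name `precision_recall`, and the sample pairs are assumptions, not values taken from this notebook.

```python
def precision_recall(pairs, threshold=3.5):
    """Return (precision, recall) for (predicted, actual) rating pairs.

    Assumption: an item is relevant if actual >= threshold and
    recommended if predicted >= threshold.
    """
    pairs = list(pairs)
    recommended = [(p, a) for p, a in pairs if p >= threshold]
    relevant = [(p, a) for p, a in pairs if a >= threshold]
    hits = sum(1 for p, a in recommended if a >= threshold)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical predictions, for illustration only.
pairs = [(4.2, 5.0), (3.9, 2.0), (2.1, 4.0), (4.8, 4.0), (1.5, 1.0)]
precision, recall = precision_recall(pairs)
print("Precision:", precision)
print("Recall:", recall)
```

Here three items are recommended and three are relevant, with two items in both groups, so both metrics come out at two thirds.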

## Citations

A Deep Framework for Cross-Domain and Cross-System Recommendations. Feng Zhu, Yan Wang, Chaochao Chen, Guanfeng Liu, Mehmet Orgun, Jia Wu. 2018. IJCAI. [https://www.ijcai.org/proceedings/2018/0516.pdf](https://www.ijcai.org/proceedings/2018/0516.pdf)