# DaRE Cross-domain recommender on Amazon Reviews dataset

Cross-domain recommendation (CDR) uses information from source domains to alleviate the cold-start problem in a target domain. Early studies adopt feature-mapping techniques that require overlapping users. For example, RC-DFM applies a Stacked Denoising Autoencoder (SDAE) to each domain and transfers the knowledge learned for the same set of users from the source to the target domain. To overcome the restrictive requirement of overlapping users, CDLFM and CATN employ feature mapping based on neighboring or similar users. However, these cross-domain algorithms still have drawbacks, such as having to filter noise or requiring duplicate users.

## Problem Statement

Let two datasets, $D^s$ and $D^t$, hold the information from the source and target domains, respectively. Each dataset consists of tuples $(u, i, y_{u,i}, r_{u,i})$, each representing an individual review $r_{u,i}$ written by user $u$ for item $i$ with rating $y_{u,i}$. The two datasets take the form $D^s = (u^s, i^s, y^s_{u,i}, r^s_{u,i})$ and $D^t = (u^t, i^t, y^t_{u,i}, r^t_{u,i})$. The goal is to predict the rating $y^t_{u,i}$ accurately using $D^s$ and a partial set of $D^t$.
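
As a rough illustration of this notation, one review tuple can be modeled as a small record. The class name `Review`, its field names, and the toy values are hypothetical and only mirror the symbols above; the notebook itself works with plain lists and dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    user: str      # u
    item: str      # i
    rating: float  # y_{u,i}
    text: str      # r_{u,i}


# Toy source and target datasets (values are made up for illustration).
D_s = [Review("u1", "guitar_strap", 5.0, "sturdy and comfortable")]
D_t = [Review("u9", "garden_hose", 3.0, "leaks at the connector")]

# The task: predict `rating` for target-domain (user, item) pairs,
# given D_s and only a part of D_t.
target_pairs = [(r.user, r.item) for r in D_t]
```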

## Model Architecture

<p><center><img src='_images/T519611_1.png'></center></p>

## Training Procedure

The training phase starts with review embedding layers, followed by three feature extractors (FEs), ${FE}^s$, ${FE}^c$, and ${FE}^t$, named source, common, and target, which separate domain-specific from domain-common knowledge. Integrated with a domain discriminator, the three FEs are trained independently to extract domain-specific knowledge $O^s$, $O^t$ and domain-common knowledge $O^{c,s}$, $O^{c,t}$ in parallel.

<p><center><img src='_images/T519611_2.png'></center></p>

Then, for each domain, the review encoder generates a single vector, $E^s$ or $E^t$, from the extracted features $O$ by aligning it with the individual review, $I^s$ or $I^t$. Finally, the regressor predicts the rating that the user will give to an item. The parameters shared across the two domains are the common FE and the domain discriminator.

## Setup

### Imports

In [None]:
import re
import json
import numpy as np
from string import punctuation
from tqdm.notebook import tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.autograd import Function

In [None]:
import warnings
warnings.filterwarnings('ignore') 

### Params

In [None]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
batch_size = 32

## Dataset

### Loading

In [None]:
# !wget -q --show-progress https://anonymous.4open.science/api/repo/DaRE-9CC9/file/DaRE/Musical_Instruments.json
# !wget -q --show-progress https://anonymous.4open.science/api/repo/DaRE-9CC9/file/DaRE/Patio_Lawn_and_Garden.json

!wget -q --show-progress https://github.com/sparsh-ai/coldstart-recsys/raw/main/data/DaRE/Musical_Instruments.zip
!unzip Musical_Instruments.zip

!wget -q --show-progress https://github.com/sparsh-ai/coldstart-recsys/raw/main/data/DaRE/Patio_Lawn_and_Garden.zip
!unzip Patio_Lawn_and_Garden.zip

In [None]:
!head -1 Musical_Instruments.json

{"reviewerID": "A2IBPI20UZIR0U", "asin": "1384719342", "reviewerName": "cassandra tu \"Yeah, well, that's just like, u...", "helpful": [0, 0], "reviewText": "Not much to write about here, but it does exactly what it's supposed to. filters out the pop sounds. now my recordings are much more crisp. it is one of the lowest prices pop filters on amazon so might as well buy it, they honestly work the same despite their pricing,", "overall": 5.0, "summary": "good", "unixReviewTime": 1393545600, "reviewTime": "02 28, 2014"}
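
Each line of the dataset file is one JSON object, as the record above shows. A minimal sketch of pulling out the four fields the notebook uses (`reviewerID`, `asin`, `reviewText`, `overall`); the string below is an abbreviated copy of that record with a shortened review text, used only for illustration.

```python
import json

# One abbreviated line in the same JSON-lines layout as the record above.
raw = (
    '{"reviewerID": "A2IBPI20UZIR0U", "asin": "1384719342", '
    '"reviewText": "Not much to write about here", "overall": 5.0}'
)

record = json.loads(raw)
user, item = record["reviewerID"], record["asin"]
review, rating = record["reviewText"], record["overall"]
```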


In [None]:
!wget -q --show-progress https://github.com/allenai/spv2/raw/master/model/glove.6B.100d.txt.gz



In [None]:
!gunzip glove.6B.100d.txt.gz
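
Each line of the GloVe text file holds a token followed by its vector components, separated by spaces. A minimal sketch of the parsing step, using two made-up 3-dimensional lines in place of the real 100-dimensional file; `parse_glove_lines` is a hypothetical helper, not a function from this notebook.

```python
def parse_glove_lines(lines):
    """Map each token to its vector, given 'token v1 v2 ...' lines."""
    embeddings = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


toy_lines = ["good 0.1 0.2 0.3", "bad -0.1 -0.2 -0.3"]
w_embed = parse_glove_lines(toy_lines)
```

Words that are missing from the resulting dictionary are simply skipped later, when reviews are embedded.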

### Preprocessing

In [None]:
def read_dataset(s_path, t_path):
    # Initialization
    s_dict, t_dict, w_embed = dict(), dict(), dict()
    s_data, t_train, t_valid, t_test = [], [], [], []
    len_t_data = 0

    print('\nProcessing Source & Target Data ... \n')

    f = open(s_path, 'r')

    # Read source data and generate user & item's review dict
    while True:
        line = f.readline()
        if not line: break

        # Convert str to json format
        line = json.loads(line)

        try:
            user, item, review, rating = line['reviewerID'], line['asin'], line['reviewText'], line['overall']

            review = review.lower()
            review = ''.join([c for c in review if c not in punctuation])

        except KeyError:
            continue

        s_data.append([user, item, rating])

        if user in s_dict:
            s_dict[user].append([item, review])
        else:
            s_dict[user] = [[item, review]]

        if item in s_dict:
            s_dict[item].append([user, review])
        else:
            s_dict[item] = [[user, review]]
    f.close()

    # For the separation of train / valid / test data in a target domain
    f = open(t_path, 'r')
    while True:
        len_t_data += 1
        line = f.readline()
        if not line: break

    len_train_data = int(len_t_data * 0.8)
    len_t_data = int(len_t_data * 0.2)
    f.close()

    # Read target domain's data
    f = open(t_path, 'r')
    while True:
        line = f.readline()
        if not line: break

        line = json.loads(line)

        try:
            user, item, review, rating = line['reviewerID'], line['asin'], line['reviewText'], line['overall']

            review = review.lower()
            review = ''.join([c for c in review if c not in punctuation])

        except KeyError:
            continue

        if user in t_dict and item in t_dict and len(t_valid) < len_t_data:
            t_valid.append([user, item, rating])
        else:
            if len(t_train) > len_train_data:
                break

            t_train.append([user, item, rating])

            if user in t_dict:
                t_dict[user].append([item, review])
            else:
                t_dict[user] = [[item, review]]
            if item in t_dict:
                t_dict[item].append([user, review])
            else:
                t_dict[item] = [[user, review]]

    f.close()

    # Split valid / test data
    t_test, t_valid = t_valid[int(len_t_data/2):len_t_data], t_valid[0:int(len_t_data/2)]

    print('Size of Train / Valid / Test data  : %d / %d / %d' % (len(t_train), len(t_valid), len(t_test)))

    # Dictionary for word embedding
    f = open('glove.6B.100d.txt')

    for line in f:
        word_vector = line.split()
        word = word_vector[0]
        word_vector_arr = np.asarray(word_vector[1:], dtype='float32')
        w_embed[word] = word_vector_arr

    f.close()

    return s_data, s_dict, t_train, t_valid, t_test, t_dict, w_embed
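
`read_dataset` keeps user histories and item histories in one dictionary per domain: a user ID maps to `[item, review]` pairs and an item ID maps to `[user, review]` pairs. The sketch below reproduces that structure on made-up records. `build_review_dict` is a hypothetical helper, and it uses `collections.defaultdict` instead of the explicit membership checks in the function above.

```python
from collections import defaultdict
from string import punctuation


def build_review_dict(records):
    """Index reviews by user ID and by item ID in a single dictionary."""
    review_dict = defaultdict(list)
    for user, item, review in records:
        # Same normalization as read_dataset: lowercase, drop punctuation.
        review = "".join(c for c in review.lower() if c not in punctuation)
        review_dict[user].append([item, review])
        review_dict[item].append([user, review])
    return review_dict


toy = [("u1", "i1", "Great strap!"), ("u1", "i2", "Too short.")]
d = build_review_dict(toy)
```

Because users and items share one dictionary, this layout relies on user IDs and item IDs never colliding.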

**Define the gradient reversal layer (GRL) for common feature extraction**

In [None]:
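class GradientReversalFunction(Function):
    """Identity in the forward pass; negates the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grads):
        # Reversal strength is fixed to 1. forward() takes a single input,
        # so backward() returns a single gradient.
        lambda_ = 1.0
        return -lambda_ * grads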
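
In equation form, the layer above acts as the identity in the forward pass and flips the sign of the gradient in the backward pass. Writing $R$ for the layer and $\mathcal{L}$ for the loss (both symbols are introduced here for illustration), with the scaling factor $\lambda$ fixed to 1 as in this notebook:

$$
R(x) = x, \qquad \frac{\partial \mathcal{L}}{\partial x} = -\lambda \, \frac{\partial \mathcal{L}}{\partial R(x)}, \qquad \lambda = 1
$$

Because of the sign flip, the discriminator is trained to tell the domains apart, while the common feature extractors feeding it are pushed toward features that make the domains indistinguishable.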

## Model Definition

In [None]:
class DaRE(nn.Module):
    def __init__(self):
        super(DaRE, self).__init__()
        # Num of CNN filter, CNN filter size 5x100
        self.filters_num = 100
        self.kernel_size = 5
        # Word embedding dimension
        self.word_dim = 100
        # Loss for siamese encoder
        self.dist = nn.MSELoss()

        self.s_user_feature_extractor = nn.Sequential(
            nn.Conv2d(1, self.filters_num, (self.kernel_size, self.word_dim)),
            nn.BatchNorm2d(self.filters_num),
            nn.Sigmoid(),
            nn.MaxPool2d((496, 1)),
            nn.Dropout(),
        )

        self.s_item_feature_extractor = nn.Sequential(
            nn.Conv2d(1, self.filters_num, (self.kernel_size, self.word_dim)),
            nn.BatchNorm2d(self.filters_num),
            nn.Sigmoid(),
            nn.MaxPool2d((496, 1)),
            nn.Dropout(),
        )

        self.t_user_feature_extractor = nn.Sequential(
            nn.Conv2d(1, self.filters_num, (self.kernel_size, self.word_dim)),
            nn.BatchNorm2d(self.filters_num),
            nn.Sigmoid(),
            nn.MaxPool2d((496, 1)),
            nn.Dropout(),
        )

        self.t_item_feature_extractor = nn.Sequential(
            nn.Conv2d(1, self.filters_num, (self.kernel_size, self.word_dim)),
            nn.BatchNorm2d(self.filters_num),
            nn.Sigmoid(),
            nn.MaxPool2d((496, 1)),
            nn.Dropout(),
        )

        self.c_user_feature_extractor = nn.Sequential(
            nn.Conv2d(1, self.filters_num, (self.kernel_size, self.word_dim)),
            nn.BatchNorm2d(self.filters_num),
            nn.Sigmoid(),
            nn.MaxPool2d((496, 1)),
            nn.Dropout(),
        )

        self.c_item_feature_extractor = nn.Sequential(
            nn.Conv2d(1, self.filters_num, (self.kernel_size, self.word_dim)),
            nn.BatchNorm2d(self.filters_num),
            nn.Sigmoid(),
            nn.MaxPool2d((496, 1)),
            nn.Dropout(),
        )

        self.discriminator = nn.Sequential(
            nn.Linear(200, 64),
            nn.Sigmoid(),
            nn.Linear(64, 1),
        )

        self.s_encoder = nn.Sequential(
            nn.Linear(200, 200)
        )

        self.s_classifier = nn.Sequential(
            nn.Linear(200, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1),
        )

        self.t_encoder = nn.Sequential(
            nn.Linear(200, 200)
        )

        self.t_classifier = nn.Sequential(
            nn.Linear(200, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1),
        )

        self.reset_para()

    def reset_para(self):
        for cnn in [self.s_user_feature_extractor[0], self.s_item_feature_extractor[0]]:
            nn.init.xavier_normal_(cnn.weight)
            nn.init.constant_(cnn.bias, 0.1)

        for cnn in [self.t_user_feature_extractor[0], self.t_item_feature_extractor[0]]:
            nn.init.xavier_normal_(cnn.weight)
            nn.init.constant_(cnn.bias, 0.1)

        for cnn in [self.c_user_feature_extractor[0], self.c_item_feature_extractor[0]]:
            nn.init.xavier_normal_(cnn.weight)
            nn.init.constant_(cnn.bias, 0.1)

        for fc in [self.s_classifier[0]]:
            nn.init.uniform_(fc.weight, -0.1, 0.1)
            nn.init.constant_(fc.bias, 0.1)

        for fc in [self.t_classifier[0]]:
            nn.init.uniform_(fc.weight, -0.1, 0.1)
            nn.init.constant_(fc.bias, 0.1)

    def forward(self, user, item, ans, label):
        # Source individual review FE
        s_u_ans_fea = self.s_user_feature_extractor(ans).squeeze(2).squeeze(2)
        c_u_ans_fea = self.c_user_feature_extractor(ans).squeeze(2).squeeze(2)
        s_u_ans_fea = (s_u_ans_fea + c_u_ans_fea) / 2

        s_i_ans_fea = self.s_item_feature_extractor(ans).squeeze(2).squeeze(2)
        c_i_ans_fea = self.c_item_feature_extractor(ans).squeeze(2).squeeze(2)
        s_i_ans_fea = (s_i_ans_fea + c_i_ans_fea) / 2

        s_ans_fea = torch.cat((s_u_ans_fea, s_i_ans_fea), 1).squeeze(1)

        # Label of source individual review
        s_cls_out = self.s_classifier(s_ans_fea)

        # Output is [Source | Target] --> Masking target output for loss calculation
        masking = torch.cat([torch.ones(batch_size), torch.zeros(batch_size)]).view(batch_size * 2, -1).to(device)
        s_ans_out, s_label = torch.mul(s_cls_out, masking), torch.mul(label, masking)

        # Source aggregated reviews FE
        s_u_fea = self.s_user_feature_extractor(user).squeeze(2).squeeze(2)
        s_i_fea = self.s_item_feature_extractor(item).squeeze(2).squeeze(2)

        s_c_u_fea = self.c_user_feature_extractor(user).squeeze(2).squeeze(2)
        s_c_i_fea = self.c_item_feature_extractor(item).squeeze(2).squeeze(2)

        s_u_fea = (s_u_fea + s_c_u_fea) / 2
        s_i_fea = (s_i_fea + s_c_i_fea) / 2

        s_fea = torch.cat((s_u_fea, s_i_fea), 1).squeeze(1)

        # Passing through encoder for aggregated review embedding
        s_fea = self.s_encoder(s_fea)

        s_cls_out = self.s_classifier(s_fea)
        s_out = torch.mul(s_cls_out, masking)

        # Distance between individual review & aggregated review
        s_dist = self.dist(torch.mul(s_ans_fea, masking), torch.mul(s_fea, masking))

        # Same for target domain
        t_u_ans_fea = self.t_user_feature_extractor(ans).squeeze(2).squeeze(2)
        c_u_ans_fea = self.c_user_feature_extractor(ans).squeeze(2).squeeze(2)
        t_u_ans_fea = (t_u_ans_fea + c_u_ans_fea) / 2

        t_i_ans_fea = self.t_item_feature_extractor(ans).squeeze(2).squeeze(2)
        c_i_ans_fea = self.c_item_feature_extractor(ans).squeeze(2).squeeze(2)
        t_i_ans_fea = (t_i_ans_fea + c_i_ans_fea) / 2

        t_ans_fea = torch.cat((t_u_ans_fea, t_i_ans_fea), 1).squeeze(1)

        t_cls_out = self.t_classifier(t_ans_fea)

        masking = torch.cat([torch.zeros(batch_size), torch.ones(batch_size)]).view(batch_size * 2, -1).to(device)
        t_ans_out, t_label = torch.mul(t_cls_out, masking), torch.mul(label, masking)

        # Target aggregated reviews FE
        t_u_fea = self.t_user_feature_extractor(user).squeeze(2).squeeze(2)
        t_i_fea = self.t_item_feature_extractor(item).squeeze(2).squeeze(2)

        t_c_u_fea = self.c_user_feature_extractor(user).squeeze(2).squeeze(2)
        t_c_i_fea = self.c_item_feature_extractor(item).squeeze(2).squeeze(2)

        t_u_fea = (t_u_fea + t_c_u_fea) / 2
        t_i_fea = (t_i_fea + t_c_i_fea) / 2

        t_fea = torch.cat((t_u_fea, t_i_fea), 1).squeeze(1)

        t_fea = self.t_encoder(t_fea)

        t_cls_out = self.t_classifier(t_fea)
        t_out = torch.mul(t_cls_out, masking)

        t_dist = self.dist(torch.mul(t_ans_fea, masking), torch.mul(t_fea, masking))

        # Discriminator label
        s_domain_specific = torch.zeros(batch_size).to(device)
        t_domain_specific = torch.ones(batch_size).to(device)

        # Common source discriminator loss
        s_c_d_fea = torch.cat((s_c_u_fea, s_c_i_fea), 1)
        s_c_d_fea = GradientReversalFunction.apply(s_c_d_fea)
        s_c_d_fea = self.discriminator(s_c_d_fea).squeeze(1)[0:batch_size]
        s_c_domain_loss = F.binary_cross_entropy_with_logits(s_c_d_fea, s_domain_specific)

        # Common target discriminator loss
        t_c_d_fea = torch.cat((t_c_u_fea, t_c_i_fea), 1)
        t_c_d_fea = GradientReversalFunction.apply(t_c_d_fea)
        t_c_d_fea = self.discriminator(t_c_d_fea).squeeze(1)[batch_size:batch_size * 2]
        t_c_domain_loss = F.binary_cross_entropy_with_logits(t_c_d_fea, t_domain_specific)

        domain_common_loss = (s_c_domain_loss + t_c_domain_loss) / 2

        # Source specific discriminator loss
        s_d_fea = torch.cat((s_u_fea, s_i_fea), 1)
        s_d_fea = self.discriminator(s_d_fea).squeeze(1)[0:batch_size]

        # Target specific discriminator loss
        t_d_fea = torch.cat((t_u_fea, t_i_fea), 1)
        t_d_fea = self.discriminator(t_d_fea).squeeze(1)[batch_size:batch_size * 2]

        s_domain_specific = torch.zeros(batch_size).to(device)
        s_domain_loss = F.binary_cross_entropy_with_logits(s_d_fea, s_domain_specific)
        t_domain_specific = torch.ones(batch_size).to(device)
        t_domain_loss = F.binary_cross_entropy_with_logits(t_d_fea, t_domain_specific)
        domain_specific_loss = (s_domain_loss + t_domain_loss) / 2

        return s_ans_out, s_out, s_label, s_dist, t_ans_out, t_out, t_label, t_dist, domain_common_loss, domain_specific_loss
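
The hard-coded sizes in `MaxPool2d((496, 1))` and `nn.Linear(200, ...)` follow from the input shape: each review sequence is padded or truncated to 500 word vectors of dimension 100. A short sketch of that arithmetic; `conv_out_len` is a hypothetical helper that applies the standard output-length formula for convolution and pooling windows.

```python
def conv_out_len(seq_len, kernel_size, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (seq_len + 2 * padding - kernel_size) // stride + 1


seq_len, kernel_size, filters_num = 500, 5, 100

# Conv2d with a (5, 100) kernel spans the full embedding width,
# so only the sequence axis keeps more than one position.
after_conv = conv_out_len(seq_len, kernel_size)

# MaxPool2d((496, 1)) uses its kernel size as the default stride.
after_pool = conv_out_len(after_conv, 496, stride=496)

# User and item features (100 filters each) are concatenated.
fused_dim = 2 * filters_num
```

If the sequence limit or kernel size changes, the pooling window has to change with it, which is why deriving it from the two values is less error-prone than hard-coding 496.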

## Clean strings for reviews

In [None]:
def clean_str(string):
    # Replace every non-alphanumeric character with a space. Because this
    # removes all punctuation, the apostrophe and punctuation rules that
    # originally followed could never match and have been dropped.
    string = re.sub(r"[^A-Za-z0-9]", " ", string)
    # Collapse runs of whitespace into a single space
    string = re.sub(r"\s{2,}", " ", string)
    string = re.sub(r"sssss ", " ", string)

    return string.strip().lower()

## Review embedding layer

In [None]:
def pre_processing(s_data, s_dict, t_data, t_dict, w_embed, valid_idx):
    # Return embedded vector [user, item, rev_ans, rat]
    u_embed, i_embed, ans_embed, label = [], [], [], []
    limit = 500

    for idx in range(batch_size):
        u, i, rat = s_data[0][idx], s_data[1][idx], s_data[2][idx]

        u_rev, i_rev, ans_rev = [], [], []

        reviews = s_dict[u]
        for review in reviews:
            if review[0] != i:
                review = review[1].split(' ')
                for rev in review:
                    try:
                        rev = clean_str(rev)
                        rev = w_embed[rev]
                        u_rev.append(rev)
                        if len(u_rev) > limit:
                            break
                    except KeyError:
                        continue

        reviews = s_dict[i]
        for review in reviews:
            if review[0] != u:
                review = review[1].split(' ')
                for rev in review:
                    try:
                        rev = clean_str(rev)
                        rev = w_embed[rev]
                        i_rev.append(rev)
                        if len(i_rev) > limit:
                            break
                    except KeyError:
                        continue

        reviews = s_dict[u]
        for review in reviews:
            if review[0] == i:
                review = review[1].split(' ')
                for rev in review:
                    try:
                        rev = clean_str(rev)
                        rev = w_embed[rev]
                        ans_rev.append(rev)
                        if len(ans_rev) > limit:
                            break
                    except KeyError:
                        continue

        if len(u_rev) > limit:
            u_rev = u_rev[0:limit]
        else:
            lis = [0.0] * 100
            pend = limit - len(u_rev)
            for p in range(pend):
                u_rev.append(lis)

        if len(i_rev) > limit:
            i_rev = i_rev[0:limit]
        else:
            lis = [0.0] * 100
            pend = limit - len(i_rev)
            for p in range(pend):
                i_rev.append(lis)

        if len(ans_rev) > limit:
            ans_rev = ans_rev[0:limit]
        else:
            lis = [0.0] * 100
            pend = limit - len(ans_rev)
            for p in range(pend):
                ans_rev.append(lis)

        u_embed.append(u_rev)
        i_embed.append(i_rev)
        ans_embed.append(ans_rev)
        label.append([rat])

    if valid_idx:
        u_embed = torch.tensor(u_embed, requires_grad=True).view(batch_size, 1, 500, 100).to(device)
        i_embed = torch.tensor(i_embed, requires_grad=True).view(batch_size, 1, 500, 100).to(device)
        ans_embed = torch.tensor(ans_embed, requires_grad=True).view(batch_size, 1, 500, 100).to(device)
        label = torch.FloatTensor(label).to(device)

        return u_embed, i_embed, ans_embed, label

    for idx in range(batch_size):
        u, i, rat = t_data[0][idx], t_data[1][idx], t_data[2][idx]

        u_rev, i_rev, ans_rev = [], [], []

        reviews = t_dict[u]
        for review in reviews:
            if review[0] != i:
                review = review[1].split(' ')
                for rev in review:
                    try:
                        rev = clean_str(rev)
                        rev = w_embed[rev]
                        u_rev.append(rev)
                        if len(u_rev) > limit:
                            break
                    except KeyError:
                        continue

        reviews = t_dict[i]
        for review in reviews:
            if review[0] != u:
                review = review[1].split(' ')
                for rev in review:
                    try:
                        rev = clean_str(rev)
                        rev = w_embed[rev]
                        i_rev.append(rev)
                        if len(i_rev) > limit:
                            break
                    except KeyError:
                        continue

        reviews = t_dict[u]
        for review in reviews:
            if review[0] == i:
                review = review[1].split(' ')
                for rev in review:
                    try:
                        rev = clean_str(rev)
                        rev = w_embed[rev]
                        ans_rev.append(rev)
                        if len(ans_rev) > limit:
                            break
                    except KeyError:
                        continue

        if len(u_rev) > limit:
            u_rev = u_rev[0:limit]
        else:
            lis = [0.0] * 100
            pend = limit - len(u_rev)
            for p in range(pend):
                u_rev.append(lis)

        if len(i_rev) > limit:
            i_rev = i_rev[0:limit]
        else:
            lis = [0.0] * 100
            pend = limit - len(i_rev)
            for p in range(pend):
                i_rev.append(lis)

        if len(ans_rev) > limit:
            ans_rev = ans_rev[0:limit]
        else:
            lis = [0.0] * 100
            pend = limit - len(ans_rev)
            for p in range(pend):
                ans_rev.append(lis)

        u_embed.append(u_rev)
        i_embed.append(i_rev)
        ans_embed.append(ans_rev)
        label.append([rat])

    u_embed = torch.tensor(u_embed, requires_grad=True).view(batch_size * 2, 1, 500, 100).to(device)
    i_embed = torch.tensor(i_embed, requires_grad=True).view(batch_size * 2, 1, 500, 100).to(device)
    ans_embed = torch.tensor(ans_embed, requires_grad=True).view(batch_size * 2, 1, 500, 100).to(device)
    label = torch.FloatTensor(label).to(device)

    return u_embed, i_embed, ans_embed, label
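
`pre_processing` repeats the same truncate-or-pad step for the user, item, and individual-review sequences in both domains. A minimal sketch of that step as a single function; `pad_or_truncate` is a hypothetical helper and is not part of the original notebook.

```python
def pad_or_truncate(vectors, limit=500, dim=100):
    """Return exactly `limit` vectors: cut extras, append zero vectors if short."""
    vectors = list(vectors[:limit])
    vectors.extend([[0.0] * dim for _ in range(limit - len(vectors))])
    return vectors


# Tiny sizes keep the example readable; the notebook uses limit=500, dim=100.
short = pad_or_truncate([[1.0, 1.0]], limit=3, dim=2)
long = pad_or_truncate([[1.0, 1.0]] * 5, limit=3, dim=2)
```

Each padded sequence can then be reshaped to `(1, limit, dim)`, which matches the single-channel input the convolutional feature extractors expect.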

## Training function

In [None]:
def learning(s_data, s_dict, t_data, t_dict, w_embed, save, idx):
    # Model
    print('Start Training ... \n')
    enc_loss_ratio, domain_loss_ratio = 0.05, 0.1
    model = DaRE()
    # After 1 epoch, load trained parameters
    if idx == 1:
        model.load_state_dict(torch.load(save, map_location=device))
    model.to(device)
    model.train()

    criterion = nn.MSELoss()

    optim = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Make batch
    batch_size = 32
    s_batch = DataLoader(s_data, batch_size=batch_size, shuffle=True, num_workers=2)
    t_batch = DataLoader(t_data, batch_size=batch_size, shuffle=True, num_workers=2)

    batch_data, zip_size = zip(s_batch, t_batch), min(len(s_batch), len(t_batch))

    for source_x, target_x in tqdm(batch_data, leave=False, total=zip_size):
        # Pre processing
        if len(source_x[0]) != batch_size or len(target_x[0]) != batch_size:
            continue

        # Get embedding of user and item reviews
        u_embed, i_embed, ans_embed, label = pre_processing(source_x, s_dict, target_x, t_dict, w_embed, 0)

        s_ans_out, s_out, s_label, s_dist, t_ans_out, t_out, t_label, t_dist, \
        c_domain_loss, domain_loss = model(u_embed, i_embed, ans_embed, label)

        # Loss
        s_ans_loss, s_loss = criterion(s_ans_out, s_label) * 2, criterion(s_out, s_label) * 2
        t_ans_loss, t_loss = criterion(t_ans_out, t_label) * 2, criterion(t_out, t_label) * 2

        # Train
        loss_func = (s_loss + t_loss + s_ans_loss + t_ans_loss) / 2 + \
                    (s_dist + t_dist) * enc_loss_ratio + (c_domain_loss + domain_loss) * domain_loss_ratio

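        optim.zero_grad()
        loss_func.backward()
        optim.step()

        print('Prediction Loss / Encoder Loss / Domain Loss: %.2f %.2f %.2f %.2f %.2f %.2f' %
              (s_loss, t_loss, s_dist, t_dist, c_domain_loss, domain_loss))

    # Save once per call instead of after every batch; valid() and the next
    # call to learning() only read the final checkpoint.
    torch.save(model.state_dict(), save)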
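
The objective assembled in `learning` can be written compactly. The symbols below are introduced here for illustration and do not come from the source: $\mathcal{L}_{\text{agg}}$ and $\mathcal{L}_{\text{ind}}$ are the rating losses computed from aggregated and individual reviews (`s_loss`, `t_loss`, `s_ans_loss`, `t_ans_loss`), $d^s$ and $d^t$ are the encoder distances (`s_dist`, `t_dist`), and $\mathcal{L}_{\text{common}}$, $\mathcal{L}_{\text{specific}}$ are the two discriminator losses.

$$
\mathcal{L} = \tfrac{1}{2}\left(\mathcal{L}^{s}_{\text{agg}} + \mathcal{L}^{t}_{\text{agg}} + \mathcal{L}^{s}_{\text{ind}} + \mathcal{L}^{t}_{\text{ind}}\right) + \alpha\left(d^{s} + d^{t}\right) + \beta\left(\mathcal{L}_{\text{common}} + \mathcal{L}_{\text{specific}}\right)
$$

Here $\alpha = 0.05$ corresponds to `enc_loss_ratio` and $\beta = 0.1$ to `domain_loss_ratio`. Each rating loss is an MSE multiplied by 2: the model output has `2 * batch_size` rows, half of which are zeroed by the domain mask, so the factor restores the average over the rows that actually belong to that domain.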

## Validation & Inference function

In [None]:
def valid(v_data, t_data, t_dict, w_embed, save, write_file):
    model = DaRE()
    model.load_state_dict(torch.load(save, map_location=device))
    model.to(device)
    model.eval()

    criterion = nn.MSELoss()

    t_user_feature_extractor = model.t_user_feature_extractor
    t_item_feature_extractor = model.t_item_feature_extractor
    t_encoder = model.t_encoder
    t_clf = model.t_classifier

    c_user_feature_extractor = model.c_user_feature_extractor
    c_item_feature_extractor = model.c_item_feature_extractor

    v_batch = DataLoader(v_data, batch_size=batch_size, shuffle=True, num_workers=2)
    v_loss, idx = 0, 0

    for v_data in tqdm(v_batch, leave=False):
        if len(v_data[0]) != batch_size:
            continue
        u_embed, i_embed, ans_embed, label = pre_processing(v_data, t_dict, v_data, t_dict, w_embed, 1)

        with torch.no_grad():
            # Target rating encoder
            c_u_fea = c_user_feature_extractor(u_embed).squeeze(2).squeeze(2)
            c_i_fea = c_item_feature_extractor(i_embed).squeeze(2).squeeze(2)

            t_u_fea = t_user_feature_extractor(u_embed).squeeze(2).squeeze(2)
            t_i_fea = t_item_feature_extractor(i_embed).squeeze(2).squeeze(2)

            u_fea, i_fea = (c_u_fea + t_u_fea) / 2, (c_i_fea + t_i_fea) / 2

            t_fea = t_encoder(torch.cat((u_fea, i_fea), 1).squeeze(1))

            t_out = t_clf(t_fea)

            v_loss += criterion(t_out, label)
        idx += 1
    v_loss = v_loss / idx

    t_batch = DataLoader(t_data, batch_size=batch_size, shuffle=True, num_workers=2)
    t_loss, idx = 0, 0

    for t_data in tqdm(t_batch, leave=False):
        if len(t_data[0]) != batch_size:
            continue
        u_embed, i_embed, ans_embed, label = pre_processing(t_data, t_dict, t_data, t_dict, w_embed, 1)

        with torch.no_grad():
            # Target rating encoder
            c_u_fea = c_user_feature_extractor(u_embed).squeeze(2).squeeze(2)
            c_i_fea = c_item_feature_extractor(i_embed).squeeze(2).squeeze(2)

            t_u_fea = t_user_feature_extractor(u_embed).squeeze(2).squeeze(2)
            t_i_fea = t_item_feature_extractor(i_embed).squeeze(2).squeeze(2)

            u_fea, i_fea = (c_u_fea + t_u_fea) / 2, (c_i_fea + t_i_fea) / 2

            t_fea = t_encoder(torch.cat((u_fea, i_fea), 1).squeeze(1))

            t_out = t_clf(t_fea)

            t_loss += criterion(t_out, label)
        idx += 1

    t_loss = t_loss / idx

    print('Loss: %.4f %.4f' % (v_loss, t_loss))

    # Append the validation and test losses; the context manager closes the file
    with open(write_file, 'a') as w:
        w.write('%.6f %.6f\n' % (v_loss, t_loss))

In [None]:
if __name__ == '__main__':
    # Define paths for source & target domain
    source_path = './Musical_Instruments.json'
    target_path = './Patio_Lawn_and_Garden.json'

    iteration = 5

    path = source_path[2:-5] + '_plus_' + target_path[2:-5]
    print('Source & Target domain: ', path)

    save = './' + path + '.pth'
    write_file = './Performance_' + path + '.txt'

    s_data, s_dict, t_train, t_valid, t_test, t_dict, w_embed = read_dataset(source_path, target_path)

    for i in range(iteration):
        # After 1 epoch of training -> load trained parameter
        if i > 0:
            learning(s_data, s_dict, t_train, t_dict, w_embed, save, 1)
        # First training
        else:
            learning(s_data, s_dict, t_train, t_dict, w_embed, save, 0)

        # Validation and Test
        valid(t_valid, t_test, t_dict, w_embed, save, write_file)

Source & Target domain:  Musical_Instruments_plus_Patio_Lawn_and_Garden

Processing Source & Target Data ... 

Size of Train / Valid / Test data  : 10618 / 1327 / 1327
Start Training ... 



Prediction Loss / Encoder Loss / Domain Loss: 26.19 19.32 0.90 0.98 0.70 0.69
Prediction Loss / Encoder Loss / Domain Loss: 23.41 18.45 0.88 0.96 0.71 0.70
Prediction Loss / Encoder Loss / Domain Loss: 22.63 18.95 0.89 0.95 0.70 0.70
Prediction Loss / Encoder Loss / Domain Loss: 25.88 19.87 0.85 0.99 0.70 0.70
Prediction Loss / Encoder Loss / Domain Loss: 22.18 17.75 0.85 0.94 0.70 0.70
Prediction Loss / Encoder Loss / Domain Loss: 23.46 19.79 0.85 0.93 0.69 0.69
Prediction Loss / Encoder Loss / Domain Loss: 24.17 16.33 0.83 0.93 0.70 0.70
Prediction Loss / Encoder Loss / Domain Loss: 22.42 17.31 0.84 0.93 0.69 0.69
Prediction Loss / Encoder Loss / Domain Loss: 23.93 15.30 0.82 0.91 0.69 0.69
Prediction Loss / Encoder Loss / Domain Loss: 23.86 18.12 0.80 0.91 0.69 0.69
Prediction Loss / Encoder Loss / Domain Loss: 22.18 14.11 0.81 0.89 0.70 0.70
Prediction Loss / Encoder Loss / Domain Loss: 21.92 15.48 0.80 0.89 0.69 0.69
Prediction Loss / Encoder Loss / Domain Loss: 20.06 16.45 0.81 0


Loss: 6.0532 6.0182
Start Training ... 



Prediction Loss / Encoder Loss / Domain Loss: 9.92 4.63 1.27 1.35 0.69 0.68
Prediction Loss / Encoder Loss / Domain Loss: 10.16 5.18 1.26 1.33 0.69 0.68
Prediction Loss / Encoder Loss / Domain Loss: 9.64 5.82 1.25 1.33 0.69 0.68
Prediction Loss / Encoder Loss / Domain Loss: 8.70 7.07 1.26 1.35 0.69 0.68
Prediction Loss / Encoder Loss / Domain Loss: 8.87 5.80 1.27 1.27 0.69 0.68
Prediction Loss / Encoder Loss / Domain Loss: 11.17 5.20 1.27 1.31 0.69 0.67
Prediction Loss / Encoder Loss / Domain Loss: 9.91 6.11 1.23 1.29 0.69 0.68
Prediction Loss / Encoder Loss / Domain Loss: 9.28 6.36 1.22 1.33 0.68 0.67
Prediction Loss / Encoder Loss / Domain Loss: 7.65 5.89 1.22 1.28 0.68 0.67
Prediction Loss / Encoder Loss / Domain Loss: 7.82 6.34 1.15 1.28 0.68 0.68
Prediction Loss / Encoder Loss / Domain Loss: 8.93 4.26 1.21 1.25 0.69 0.67
Prediction Loss / Encoder Loss / Domain Loss: 8.86 5.24 1.19 1.31 0.68 0.67
Prediction Loss / Encoder Loss / Domain Loss: 9.16 5.43 1.17 1.25 0.68 0.67

Loss: 3.9323 3.9282
Start Training ... 



Prediction Loss / Encoder Loss / Domain Loss: 6.04 2.99 0.36 0.38 0.70 0.64
Prediction Loss / Encoder Loss / Domain Loss: 7.33 3.03 0.35 0.38 0.68 0.62
Prediction Loss / Encoder Loss / Domain Loss: 6.74 3.27 0.34 0.38 0.70 0.65
Prediction Loss / Encoder Loss / Domain Loss: 6.86 3.32 0.35 0.37 0.68 0.64
Prediction Loss / Encoder Loss / Domain Loss: 6.29 4.19 0.34 0.37 0.69 0.64
Prediction Loss / Encoder Loss / Domain Loss: 6.57 4.05 0.36 0.37 0.68 0.63
Prediction Loss / Encoder Loss / Domain Loss: 7.53 3.91 0.34 0.37 0.70 0.63
Prediction Loss / Encoder Loss / Domain Loss: 7.46 3.65 0.36 0.35 0.71 0.65
Prediction Loss / Encoder Loss / Domain Loss: 6.20 3.46 0.35 0.35 0.69 0.64
Prediction Loss / Encoder Loss / Domain Loss: 7.01 3.39 0.35 0.37 0.72 0.65
Prediction Loss / Encoder Loss / Domain Loss: 6.53 3.06 0.35 0.36 0.71 0.65
Prediction Loss / Encoder Loss / Domain Loss: 6.94 3.20 0.34 0.35 0.69 0.63
Prediction Loss / Encoder Loss / Domain Loss: 7.15 3.83 0.34 0.36 0.72 0.65

## Extra Notes

### Inference Process

<p><center><img src='_images/T519611_3.png'></center></p>

### Domain-Aware Feature Extraction Example

The following example illustrates domain-aware feature extraction on the real-world Amazon benchmark dataset.

<p><center><img src='_images/T519611_4.png'></center></p>

<p><center><img src='_images/T519611_5.png'></center></p>

We assume two phases, training and inference, and two domains, Musical Instruments and Toys & Games, for the cross-domain recommendation scenario. The figures show the training phase for the source (upper) and target (lower) domains. A common FE (red box) is shared across domains, whereas the source and target FEs (green and blue boxes) are domain-specific networks. **The objective is to predict the rating that user $A$ gives to item 2.** Excluding the individual review, that is, user $A$'s review of item 2, the source and common extractors each distill latent features of the user and of the item. Specifically, for user $A$ in Musical Instruments, the source FE captures the domain-specific knowledge that she values sound quality, while the common FE extracts domain-common information such as "beautiful" and "nice price". The analysis for item 2 follows the same mechanism. In summary, the DaRE model not only considers domain-shareable knowledge through the common FE but also reflects domain-specific information through the source and target FEs.

### Review Encoder Example

<p><center><img src='_images/T519611_6.png'></center></p>

<p><center><img src='_images/T519611_7.png'></center></p>

To train the review encoder, we use the individual review that user $A$ has written on item 2 (blue box) as an additional label. Taking the figure above as an example, the review encoder (purple box) takes four types of inputs, which are extracted by the source and common FEs. The encoder then generates a single output that mixes the information of user $A$ and item 2. Here, the encoder is trained to infer the individual review: negative feedback from user $A$, who cares about sound quality. Likewise, the encoder in the target domain can be trained in the same manner. Given the previous reviews of a user and an item, the encoder estimates the feedback that the user would leave after purchasing the item.
