The goal of this part is to fit the model to a sparse matrix in which the FPS of some games is missing.
The key step is to compute the MSE when some elements of F, the array holding the FPS data, are missing.
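
As a minimal sketch of that idea, the MSE can be restricted to the measured entries with a boolean mask. The function name `masked_mse` and the toy numbers are illustrative assumptions, not part of the pipeline in this notebook, which uses TensorFlow instead.

```python
import numpy as np

def masked_mse(F_predict, F):
    """Mean squared error over the entries of F that are not NaN."""
    mask = ~np.isnan(F)                # True where an FPS value was measured
    diff = F_predict[mask] - F[mask]   # compare only the measured entries
    return float(np.mean(diff ** 2))

# Toy data (hypothetical): 1 game, 2 GPUs, 2 CPUs, one measurement missing.
F = np.array([[[60.0, np.nan],
               [30.0, 45.0]]])
F_predict = np.array([[[58.0, 100.0],
                       [30.0, 47.0]]])

print(masked_mse(F_predict, F))
```

The prediction at the missing position (100.0 here) does not affect the result, because only the three measured entries enter the average.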

## Building a model to predict FPS
First, we build our model as:
\begin{align} 
F^{i}_{mn}=g^{i}P_{mn}+\alpha_{mn}
\end{align}
where $i$ labels the game, $m$ and $n$ label the GPU and CPU respectively, and $\alpha$ contains other, game-independent information.

Next, I will use the current data to test this model.
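
To make the index structure of $F^{i}_{mn}=g^{i}P_{mn}+\alpha_{mn}$ concrete, here is a small NumPy sketch that evaluates it with broadcasting. The sizes and random values are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
i, m, n = 3, 4, 2                  # toy sizes: games, GPUs, CPUs

g = rng.normal(size=i)             # one factor per game
P = rng.normal(size=(m, n))        # GPU/CPU performance term
alpha = rng.normal(size=(m, n))    # game-independent offset

# g[:, None, None] has shape (i, 1, 1), so it broadcasts against (m, n)
# and the result has shape (i, m, n).
F = g[:, None, None] * P + alpha

print(F.shape)
```

Each slice `F[j]` equals `g[j] * P + alpha`, which is the formula written out for game `j`.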

## Building a new model to predict FPS
First, we build our model as:
\begin{align} 
F^{i}_{mn}=g^{i}G_{m}C_{n}
\end{align}
where $i$ labels the game, $m$ and $n$ label the GPU and CPU respectively, $g^{i}$ is the per-game factor, and $G_{m}$ and $C_{n}$ are the GPU and CPU factors.

Next, I will use the current data to test this model.
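
The factorized form $F^{i}_{mn}=g^{i}G_{m}C_{n}$ is an outer product of three vectors. The following sketch, with made-up sizes and values, builds it with `np.einsum` and checks one consequence of the model: the FPS ratio between two GPUs does not depend on the game or the CPU.

```python
import numpy as np

rng = np.random.default_rng(0)
i, m, n = 3, 4, 2                        # toy sizes: games, GPUs, CPUs
g = rng.uniform(0.5, 1.5, size=i)        # per-game factor
G = rng.uniform(0.5, 1.5, size=m)        # per-GPU factor
C = rng.uniform(0.5, 1.5, size=n)        # per-CPU factor

# Outer product of the three vectors: F[i, m, n] = g[i] * G[m] * C[n]
F = np.einsum('i,m,n->imn', g, G, C)

# Ratio between GPU 1 and GPU 2, evaluated for every game and CPU.
ratio = F[:, 1, :] / F[:, 2, :]
print(np.allclose(ratio, G[1] / G[2]))
```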

The number of parameters in this model is $i+m+n$.
Because we have FPS benchmarks for $i$ games, the number of data points is $i \cdot m \cdot n$.
In the case where $i=24$, $m=28$, $n=14$, the number of parameters is $66$ and the number of data points is $9408$.
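
The arithmetic can be checked directly. For comparison, the sketch also counts the parameters of the first model, assuming it holds one $g$ per game plus full $m \times n$ arrays for $P$ and $\alpha$, as in the `model` class defined in this notebook.

```python
i, m, n = 24, 28, 14                  # games, GPUs, CPUs

params_factorized = i + m + n         # g, G, C
params_first_model = i + 2 * m * n    # g, P, alpha
data_points = i * m * n               # a full table with no missing entries

print(params_factorized, params_first_model, data_points)
# prints: 66 808 9408
```

The count of data points assumes every game was tested on every GPU and CPU combination; with missing measurements the usable number is lower.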

import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import random

# The following cells define the model classes.
The `__call__` method returns the predicted FPS according to the aforementioned formula.
The `load_variables` method loads previously trained parameters, which the `__call__` method then uses to make predictions.

In [2]:
## i is the total number of games, m is the total number of GPUs considered,
## and n is the total number of CPUs considered.
class model():
    def __init__(self,shape):
        self.i=shape[0]
        self.m=shape[1]
        self.n=shape[2]
        self.g=tf.Variable(tf.random.truncated_normal(shape=(self.i,)))
        self.P=tf.Variable(tf.random.truncated_normal(shape=(self.m,self.n)))
        self.alpha=tf.Variable(tf.random.truncated_normal(shape=(self.m,self.n)))
        self.trainable_variables=[self.P,self.alpha,self.g]
        
    def __call__(self):
        F_predict=tf.concat([tf.expand_dims(self.g[j]*self.P,0) for j in range(self.i)],0)\
                    +tf.tile(tf.expand_dims(self.alpha,0),[self.i,1,1])
        return F_predict
    
    def load_variables(self,parameters):
        self.P=tf.constant(parameters[0])
        self.alpha=tf.constant(parameters[1])
        self.g=tf.constant(parameters[2])

In [3]:
## model without alpha
class model_without_alpha():
    def __init__(self,shape):
        self.i=shape[0]
        self.m=shape[1]
        self.n=shape[2]
        self.g=tf.Variable(tf.random.truncated_normal(shape=(self.i,)))
        self.P=tf.Variable(tf.random.truncated_normal(shape=(self.m,self.n)))
        self.trainable_variables=[self.P,self.g]
        
    def __call__(self):
        F_predict=tf.concat([tf.expand_dims(self.g[j]*self.P,0) for j in range(self.i)],0)
        return F_predict
    
    def load_variables(self,parameters):
        ## parameters follow the order of trainable_variables: [P, g]
        self.P=tf.constant(parameters[0])
        self.g=tf.constant(parameters[1])

In [4]:
## model that also decomposes GPU and CPU
class model_cpu_gpu():
    def __init__(self,shape):
        self.i=shape[0]
        self.m=shape[1]
        self.n=shape[2]
        self.g=tf.Variable(tf.random.truncated_normal(shape=(self.i,)))
        self.G=tf.Variable(tf.random.truncated_normal(shape=(self.m,)))
        self.C=tf.Variable(tf.random.truncated_normal(shape=(self.n,)))
        self.trainable_variables=[self.G,self.C,self.g]
        
    def __call__(self):
        P=tf.concat([tf.expand_dims(self.G[j]*self.C,0) for j in range(self.m)],0)
        F_predict=tf.concat([tf.expand_dims(self.g[j]*P,0) for j in range(self.i)],0)
        return F_predict
    
    def load_variables(self,parameters):
        ## parameters follow the order of trainable_variables: [G, C, g]
        self.G=tf.constant(parameters[0])
        self.C=tf.constant(parameters[1])
        self.g=tf.constant(parameters[2])

# Next we define a pipeline to train the model.

This is the main pipeline. It takes the model, the training data F, a save path, and the number of epochs.
F is a NumPy array with dimensions (games, GPU, CPU).
Untested FPS entries in the training data F should be marked with np.nan.

In [5]:
def train_model(model,F,savepath,epochs=300): 
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
    for epoch in range(epochs):           
        train_one_step(model,F,optimizer)  
        if epoch%10==0:
            F_predict=model()
            print('for epoch {}, MSE is {}'.format(epoch,compute_loss_sparse(F_predict,F)))
    save_model(model,savepath)

In [6]:
## uses tensorflow to do one backpropagation step; train_model calls it once per epoch.
def train_one_step(model,F,optimizer):
    with tf.GradientTape() as tape:
        F_predict = model()
        loss=compute_loss_sparse(F_predict, F)
        # compute gradient
        grads = tape.gradient(loss, model.trainable_variables)
        # update to weights
        optimizer.apply_gradients(zip(grads, model.trainable_variables))      

In [7]:
## computes the mean squared error of the predicted FPS with respect to the real FPS at the tested data points in F.
def compute_loss_sparse(F_predict, F):
    mse = tf.keras.losses.MeanSquaredError()
    indices_true,indices_false=cal_indices(F)
    
    ## if there is no missing (NaN) data in F, return the ordinary mse
    ## else return the mse over the entries that are given
    if not indices_false:
        return mse(F_predict,F) 
    else:
        F=tf.constant(F)    
        return mse(tf.gather_nd(F_predict,indices_true),tf.gather_nd(F,indices_true))

In [8]:
## indices_true lists the positions where an FPS measurement is given
## indices_false lists the positions where the FPS measurement is missing
def cal_indices(F):
    indices_true=[]
    indices_false=[]
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            for k in range(F.shape[2]):
                if np.isnan(F[i,j,k]):
                    indices_false.append([i,j,k])
                else:
                    indices_true.append([i,j,k])
    return indices_true, indices_false
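
The triple loop can also be written with `np.argwhere`, which returns the positions in the same row-major order. This is a sketch of an alternative, not the function used in the pipeline; the name `cal_indices_vectorized` and the toy array are assumptions.

```python
import numpy as np

def cal_indices_vectorized(F):
    """Return [i, j, k] positions of measured and missing entries of F."""
    mask = np.isnan(F)                           # True where FPS is missing
    indices_true = np.argwhere(~mask).tolist()   # measured entries
    indices_false = np.argwhere(mask).tolist()   # missing entries
    return indices_true, indices_false

# Toy data (hypothetical): 1 game, 2 GPUs, 2 CPUs, one measurement missing.
F = np.array([[[60.0, np.nan],
               [30.0, 45.0]]])
indices_true, indices_false = cal_indices_vectorized(F)
print(indices_true)
print(indices_false)
```

Converting to plain lists with `.tolist()` keeps a check such as `if not indices_false` working, since an empty list is falsy.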

In [9]:
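def save_model(model,path):
    ## the variables have different shapes, so store them in an object array
    stored_variables=np.empty(len(model.trainable_variables),dtype=object)
    for idx,v in enumerate(model.trainable_variables):
        stored_variables[idx]=v.numpy()
    np.save(path, stored_variables,allow_pickle=True, fix_imports=True)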
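
A small round trip shows how arrays of different lengths survive `np.save` and `np.load` when they are held in an object array. The arrays and the temporary directory are placeholders for illustration; only the file name `savedmodel` matches the one used later in this notebook.

```python
import os
import tempfile
import numpy as np

# Stand-ins for trained parameters with different lengths.
G = np.ones(4, dtype=np.float32)
C = np.ones(2, dtype=np.float32)
g = np.ones(3, dtype=np.float32)

# Fill an object array element by element so each entry keeps its own shape.
stored = np.empty(3, dtype=object)
for idx, v in enumerate([G, C, g]):
    stored[idx] = v

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'savedmodel')
    np.save(path, stored, allow_pickle=True)        # writes savedmodel.npy
    G2, C2, g2 = np.load(path + '.npy', allow_pickle=True)

print(G2.shape, C2.shape, g2.shape)
```

Note that `np.save` appends `.npy` to the file name when it is missing, while `np.load` expects the full name.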

# Next, we will load the data and take part of the data as a validation set.
The data is a numpy.array with shape (i,m,n), where i is the game label, m the GPU label, and n the CPU label.

In [72]:
import sqlite3
import os
import pandas as pd
import numpy as np

In [17]:
def sql_to_np():
    cwd = os.getcwd()
    cwd='/'.join(cwd.split('/')[:-1])
    path=cwd+'/tested_data/games_fps_cpu_gpu.db'
    
    cnx = sqlite3.connect(path)
    c=cnx.cursor()
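    Game_Name=c.execute('''SELECT DISTINCT Game_Name FROM games_fps''').fetchall()
    ## distinct GPU and CPU labels (assumed to come from the same table, as in the query below)
    GPU=c.execute('''SELECT DISTINCT GPU FROM games_fps''').fetchall()
    CPU=c.execute('''SELECT DISTINCT CPU FROM games_fps''').fetchall()

    Game_Name=[i[0] for i in Game_Name]
    GPU=[i[0] for i in GPU]
    CPU=[i[0] for i in CPU]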

    total=[]
    for game in Game_Name:
        ## pass the game name as a query parameter so that names containing quotes do not break the SQL
        result=pd.read_sql('''SELECT GPU,CPU,FPS FROM games_fps WHERE Game_Name=?''',cnx,params=(game,))
        result=result.pivot(index='GPU', columns='CPU', values='FPS')
        result=result.sort_index()
        result=result.reindex(sorted(result.columns), axis=1)
        total.append(result.to_numpy())

    total=np.array(total)

    cnx.commit()
    c.close()
    cnx.close()
    
    return total

In [74]:
total.shape

(24, 28, 14)

In [10]:
## find2d returns a 2d array holding the fps for
## different cpu, gpu combinations for a fixed game
def find2d(game_name):
    game_fps=pd.read_csv(game_name+'.csv')
    game_fps=np.array(game_fps)
    game_fps=[i[1:] for i in game_fps]
    game_fps=np.array(game_fps).astype(np.float32)
    return game_fps

def load_fps_data():
    game_names=pd.read_csv('games_fps_hyperlinks.csv')['game_name']
    fps_all=np.array([find2d(i) for i in game_names])
    print('The training data contains {} games tested,\n'.format(fps_all.shape[0]), 
          '{} gpu types,'.format(fps_all.shape[1]), 
          '{} cpu types'.format(fps_all.shape[2]))
    return fps_all

In [11]:
## randomly set up to N entries of F to np.nan and return the indices that were drawn
## (indices are drawn with replacement, so fewer than N distinct entries may be removed)
def setzero(F,N):
    indices=[]
    F_missing=np.copy(F)
    shape=F.shape
    for i in range(N):
        indices.append([random.randint(0,shape[0]-1),random.randint(0,shape[1]-1),random.randint(0,shape[2]-1)])    
    for i,j,k in indices:
        F_missing[i,j,k]=np.nan
    
    return indices,F_missing     
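
Because the positions above are drawn independently, the same entry can be picked more than once. With 10000 draws over 9408 entries, about 3250 entries are expected to stay untouched, which is close to the 3244 training points reported in the run below. If exactly N distinct entries should be hidden, one option is to sample flat positions without replacement. This is a sketch of that alternative, with the hypothetical name `hide_entries`; it is not what the pipeline in this notebook does.

```python
import numpy as np

def hide_entries(F, N, seed=0):
    """Set exactly N distinct entries of F to NaN (requires N <= F.size)."""
    rng = np.random.default_rng(seed)
    # Draw N distinct flat positions, then convert them to (i, j, k) indices.
    flat = rng.choice(F.size, size=N, replace=False)
    indices = np.stack(np.unravel_index(flat, F.shape), axis=1)
    F_missing = F.astype(float)            # astype returns a copy
    F_missing[tuple(indices.T)] = np.nan   # hide the selected entries
    return indices.tolist(), F_missing

# Toy data (hypothetical): 2 games, 3 GPUs, 4 CPUs.
F = np.arange(24, dtype=float).reshape(2, 3, 4)
indices, F_missing = hide_entries(F, 5)
print(int(np.isnan(F_missing).sum()))
```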

In [12]:
def validation(indices,model,F):
    mse=tf.keras.losses.MeanSquaredError()
    F_predict=model()  
    return mse(tf.gather_nd(F_predict,indices),tf.gather_nd(F,indices))

In [13]:
# This is the pipeline that discards N training data points and trains a model on the processed data
def valid_pipeline(N):
    F=load_fps_data()
    testmodel=model_cpu_gpu(F.shape)
    ## create some missing data manually
    indices,F_missing=setzero(F,N)
    i,j,k=F.shape
    print('\n','The number of training data is {} out of {} \n'.format(np.count_nonzero(~np.isnan(F_missing)),i*j*k))
    ## use the data with missing entries to train the model and save the model
    train_model(testmodel,F_missing,'savedmodel')
    ## print out the validation MSE on the held-out entries
    print('\n','The validation MSE is {}'.format(tf.keras.backend.get_value(validation(indices, testmodel,F))))

In [14]:
valid_pipeline(10000)

The training data contains 24 games tested,
 28 gpu types, 14 cpu types

 The number of training data is 3244 out of 9408 

for epoch 0, MSE is 11633.25390625
for epoch 10, MSE is 11560.9765625
for epoch 20, MSE is 10882.68359375
for epoch 30, MSE is 8228.2333984375
for epoch 40, MSE is 4202.56640625
for epoch 50, MSE is 1544.2957763671875
for epoch 60, MSE is 454.0294494628906
for epoch 70, MSE is 143.18777465820312
for epoch 80, MSE is 43.316951751708984
for epoch 90, MSE is 14.380014419555664
for epoch 100, MSE is 6.280126571655273
for epoch 110, MSE is 2.0478708744049072
for epoch 120, MSE is 0.7865241765975952
for epoch 130, MSE is 0.3017795979976654
for epoch 140, MSE is 0.13132977485656738
for epoch 150, MSE is 0.05146137624979019
for epoch 160, MSE is 0.03429296985268593
for epoch 170, MSE is 0.028801994398236275
for epoch 180, MSE is 0.025604140013456345
for epoch 190, MSE is 0.02415713667869568
for epoch 200, MSE is 0.024025723338127136
for epoch 210, MSE is 0.023909011855721

In [69]:
G,C,g=np.load('savedmodel.npy',allow_pickle=True)

In [70]:
G=abs(G)
C=abs(C)
g=abs(g)

In [71]:
g=g*max(G)*max(C)/(10000)
G=(G/max(G))*100
C=(C/max(C))*100
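
The rescaling above moves constant factors between g, G and C without changing their product, so the predicted FPS stays the same while the largest GPU and CPU scores become 100. A quick numerical check with made-up positive factors (the values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.uniform(0.5, 2.0, size=3)   # per-game factors
G = rng.uniform(0.5, 2.0, size=4)   # per-GPU factors
C = rng.uniform(0.5, 2.0, size=2)   # per-CPU factors
F_before = np.einsum('i,m,n->imn', g, G, C)

# Same rescaling as in the cell above: best GPU and CPU are set to 100.
g_scaled = g * G.max() * C.max() / 10000
G_scaled = G / G.max() * 100
C_scaled = C / C.max() * 100
F_after = np.einsum('i,m,n->imn', g_scaled, G_scaled, C_scaled)

print(np.allclose(F_before, F_after))
```

Taking absolute values first, as done above, likewise keeps the prediction unchanged only if the product of the three factors is positive everywhere, which is what positive FPS values require.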

In [73]:
1/g

array([ 54.790993,  68.70181 ,  51.418213, 105.09394 ,  82.19783 ,
        49.767143,  28.36716 ,  57.85394 ,  42.797348,  65.78896 ,
        54.470993,  53.646492,  83.26474 ,  31.391153,  44.04533 ,
        44.72377 ,  84.23137 ,  74.925   ,  71.426605,  53.308975,
        46.95886 ,  91.00462 ,  41.15585 , 100.87287 ], dtype=float32)

In [None]:
def load_predict(model,path):
    ## np.save appends '.npy' to the file name, so path should include it, e.g. 'savedmodel.npy'
    parameters=np.load(path,allow_pickle=True)
    model.load_variables(parameters)

initializer=[(fps,gpu_number,cpu_number),......]