# Use Case

ProFAB is a benchmarking platform that provides datasets, classifies proteins according to their functional annotations, and evaluates the trained models. The platform was built to offer the complete dataset-training-evaluation triangle in one place. Because the workflow is dense, an easy-to-follow use case is presented here.

## 1. Data Importing

The test files for ProFAB can be downloaded from https://drive.google.com/file/d/1slhzT7LBp_AE67XoxjIWhlbWnfRZD8_j/view?usp=sharing. After downloading them, move the folders they contain ('ec_dataset' and 'go_dataset') into the '/prb/import_dataset' directory. This extra step is needed because of the size of the data.
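
The move described above can also be scripted. The following is a minimal sketch using only the standard library; the helper name `place_datasets` and the example download location are assumptions made for illustration, and the target path should be adjusted to wherever 'prb/import_dataset' lives on your machine.

```python
import shutil
from pathlib import Path

def place_datasets(download_dir, target_dir, folders=("ec_dataset", "go_dataset")):
    """Move the extracted dataset folders into the import directory.

    download_dir and target_dir are paths chosen by the user (hypothetical here).
    Returns the list of folders that were actually moved.
    """
    download_dir, target_dir = Path(download_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for folder in folders:
        source = download_dir / folder
        destination = target_dir / folder
        # Skip folders that are missing or already in place.
        if source.is_dir() and not destination.exists():
            shutil.move(str(source), str(destination))
            moved.append(folder)
    return moved

# Example call (paths are placeholders):
# place_datasets("downloads/profab_test_files", "prb/import_dataset")
```

Calling the helper a second time returns an empty list, since folders that are already in place are left alone.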

To import data in Python, the following lines of code can be used. The interface is designed to be easy to use, and if several datasets need to be imported at the same time, a loop can be used. For example:

- To import a single dataset from the ecNo prediction data

In [None]:
from prb.import_dataset import ECNO
#'pf' must be defined beforehand as the protein feature to use.
data_model = ECNO(ratio = 0.2, protein_feature = pf, pre_determined = True, set_type = 'target')
X_train,X_test,X_validation,y_train,y_test,y_validation = data_model.get_data(data_name = 'ec_1-2-1')

- To import multiple datasets from the ecNo prediction data in a loop

In [None]:
from prb.import_dataset import ECNO
#'pf' must be defined beforehand as the protein feature to use.
data_model = ECNO(ratio = 0.2, protein_feature = pf, pre_determined = True, set_type = 'random')
ecNames = ['ec_1-2-1','ec_1-4-4-2','ec_2-7-4-6']
for name in ecNames:
    X_train,X_test,X_validation,y_train,y_test,y_validation = data_model.get_data(data_name = name)
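
Note that each pass of the loop above overwrites the splits from the previous pass. If all datasets are needed afterwards, they can be kept in a dictionary keyed by dataset name. The sketch below shows the pattern with a stand-in loader, `fake_get_data`, which is hypothetical and only mimics the six-value return shown above; in real use it would be replaced by `data_model.get_data`.

```python
def fake_get_data(data_name):
    """Hypothetical stand-in for data_model.get_data: returns six toy splits."""
    X = [[float(len(data_name)), 1.0], [0.0, 2.0], [3.0, 4.0]]
    y = [1, 0, 1]
    # Order follows the use case:
    # X_train, X_test, X_validation, y_train, y_test, y_validation
    return X[:1], X[1:2], X[2:], y[:1], y[1:2], y[2:]

def collect_splits(names, loader):
    """Load every dataset once and keep its splits under its own name."""
    keys = ("X_train", "X_test", "X_validation",
            "y_train", "y_test", "y_validation")
    return {name: dict(zip(keys, loader(data_name=name))) for name in names}

ecNames = ['ec_1-2-1', 'ec_1-4-4-2', 'ec_2-7-4-6']
datasets = collect_splits(ecNames, fake_get_data)
print(sorted(datasets))
print(datasets['ec_1-2-1']['y_train'])
```

Each entry can then be looked up later, for example `datasets['ec_1-4-4-2']['X_test']`.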

## 2. Training

ProFAB can train models on any type of data, and it supports both classification and regression. Since the provided datasets are based on the classification of proteins, classification is shown as the example; the same process applies to regression as well.

After training, the trained model is stored at 'model_path'. Because training takes a long time, saving the model saves time later. The stored model must be written and read with the 'pickle' Python library.
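
As a reminder of how the 'pickle' round trip works, here is a minimal sketch. A plain dictionary stands in for the trained model, and the helper names and the file name are placeholders invented for the example; they are not part of the `prb` package.

```python
import os
import pickle
import tempfile

def save_model(model, path):
    """Write any picklable object to disk."""
    # Binary mode is required for pickle files.
    with open(path, 'wb') as f:
        pickle.dump(model, f)

def load_model(path):
    """Read the object back from disk."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Placeholder for a trained model (hypothetical content).
toy_model = {'ml_type': 'naive_bayes', 'class_prior': [0.5, 0.5]}

toy_path = os.path.join(tempfile.gettempdir(), 'toy_model_path.txt')
save_model(toy_model, toy_path)
restored = load_model(toy_path)
print(restored == toy_model)
```

Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.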

In [None]:
#To train the data:
import pickle
from prb.process_learn_evaluate import scale_methods, classification_methods, evaluate_score

#Let's define the model path where the trained model will be saved.
model_path = 'model_path.txt'

#Then the sets are scaled to eliminate bias. The scaler is fitted on the train data and can be reused for the other sets.
X_train,scaler = scale_methods(X_train,scale_type = 'Standard_Scaler')
X_test,X_validation = scaler.transform(X_test),scaler.transform(X_validation)

#After assigning the path and scaling the datasets, training can be run manually like this:
classification_methods(path = model_path,ml_type = 'naive_bayes',
                                        X_train = X_train,
                                        y_train = y_train,
                                        cv = None)
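
The scaling step in the cell above fits the scaler on the train set only and then reuses it for the other sets. The sketch below illustrates that idea in plain Python, assuming 'Standard_Scaler' refers to the usual zero-mean, unit-variance standardization; that is an assumption about ProFAB, and `fit_standard_scaler` and `apply_scaler` are illustrative names, not part of the `prb` package.

```python
from statistics import fmean, pstdev

def fit_standard_scaler(X_train):
    """Compute per-column mean and population standard deviation from the train set."""
    columns = list(zip(*X_train))
    means = [fmean(col) for col in columns]
    # Guard against constant columns to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_scaler(X, means, stds):
    """Standardize any set with the statistics learned from the train set."""
    return [[(value - m) / s for value, m, s in zip(row, means, stds)]
            for row in X]

X_train_toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_test_toy = [[2.0, 40.0]]

means, stds = fit_standard_scaler(X_train_toy)
X_train_scaled = apply_scaler(X_train_toy, means, stds)
X_test_scaled = apply_scaler(X_test_toy, means, stds)
print(X_train_scaled)
print(X_test_scaled)
```

Reusing the train statistics keeps information from the test and validation sets out of the preprocessing step.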

## 3. Evaluation

After training is done, the model can be evaluated with the following lines of code. The output of the evaluation is given below the code.

### Get Scores

In [None]:
#To load the saved model, the following code can be run.
with open(model_path,'rb') as f:
    model = pickle.load(f)

#After that, the evaluation metrics can be obtained separately for each set.
score_train,f_train = evaluate_score(model,X_train,y_train)
score_test,f_test = evaluate_score(model,X_test,y_test)
score_validation,f_validation = evaluate_score(model,X_validation,y_validation)

The scores below were obtained for the dataset 'ecNo_1-2-1' with the 'target' set type.
```{python}
score_train =  {Precision:0.74352651, Recall:0.914560162, F1-score:0.820222172, F05-score:0.772416738, Accuracy:0.821917808, MCC:0.661149793, TP:1809, FP:624, TN:1851, FN:169}

score_test =  {Precision:0.817891, Recall:0.937729, F1-score:0.87372, F05-score:0.839344, Accuracy:0.864964, MCC:0.737965, TP:256, FP:57, TN:218, FN:17}

score_validation =  {Precision:0.749588138, Recall:0.913655, F1-score:0.823529, F05-score:0.777512, Accuracy:0.824955, MCC:0.665838, TP:455, FP:152, TN:464, FN:43}
```
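
The reported values can be reproduced from the four confusion-matrix counts using the standard definitions of each metric. The sketch below recomputes the first set of scores listed above; the function name `scores_from_counts` is illustrative, and this is not claimed to be how `evaluate_score` is implemented.

```python
from math import sqrt

def scores_from_counts(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

    def f_beta(beta):
        # Weighted harmonic mean of precision and recall.
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        'Precision': precision,
        'Recall': recall,
        'F1-score': f_beta(1.0),
        'F05-score': f_beta(0.5),
        'Accuracy': (tp + tn) / (tp + fp + tn + fn),
        'MCC': mcc,
    }

first_row = scores_from_counts(tp=1809, fp=624, tn=1851, fn=169)
for metric, value in first_row.items():
    print(f"{metric}: {value:.6f}")
```

The same function can be applied to the other two sets of counts to check their scores.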

### Table Formatting

To get the results in table format, the following lines of code can be executed. Besides the scores, the size of each set is also given. The table is stored in .csv format.

In [None]:
#To see all results in a table, the following code can be run:
from prb.utils import form_table

score_path = 'score_path.csv' #To save the results.

scores = [score_train,score_test,score_validation]
#Size of each set as 'rows x columns'
size_of = [str(len(X_train)) + 'x' + str(len(X_train[0])),
           str(len(X_test)) + 'x' + str(len(X_test[0])),
           str(len(X_validation)) + 'x' + str(len(X_validation[0]))]

preds = [f_train,f_test,f_validation]
names = ['Train','Test','Validation']


learning_method = 'classification'
form_table(score_path = score_path, names = names,
         scores = scores,sizes = size_of, 
         learning_method = learning_method,preds = preds)
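
The size strings built in the cell above encode each set as rows by columns. A small helper makes the intent easier to read; `shape_label` is a hypothetical name, and the sketch assumes each set is a non-empty sequence of equally long rows, as the `len(X[0])` calls above already do.

```python
def shape_label(X):
    """Return the size of a 2-D set as 'rowsxcolumns', e.g. '3x2'."""
    return f"{len(X)}x{len(X[0])}"

toy_train = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
toy_test = [[0.7, 0.8]]
toy_validation = [[0.9, 1.0], [1.1, 1.2]]

size_of = [shape_label(X) for X in (toy_train, toy_test, toy_validation)]
print(size_of)  # ['3x2', '1x2', '2x2']
```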

To process multiple datasets, all of these steps can be placed in a 'for' loop. The following lines can be copied and pasted to do that.

In [None]:
from prb.import_dataset import ECNO
import pickle
from prb.process_learn_evaluate import scale_methods, classification_methods, evaluate_score
from prb.utils import form_table

#'pf' must be defined beforehand as the protein feature to use.
data_model = ECNO(ratio = 0.2, protein_feature = pf, pre_determined = True, set_type = 'random')
ecNames = ['ec_1-2-1','ec_1-4-4-2','ec_2-7-4-6']
for name in ecNames:
    X_train,X_test,X_validation,y_train,y_test,y_validation = data_model.get_data(data_name = name)
    #To train the data:

    #Let's define the model path where the trained model will be saved.
    model_path = 'model_path.txt'

    #Then the sets are scaled to eliminate bias. The scaler is fitted on the train data and can be reused for the other sets.
    X_train,scaler = scale_methods(X_train,scale_type = 'Standard_Scaler')
    X_test,X_validation = scaler.transform(X_test),scaler.transform(X_validation)

    #After assigning the path and scaling the datasets, training can be run manually like this:
    classification_methods(path = model_path,ml_type = 'naive_bayes',
                           X_train = X_train,
                           y_train = y_train,
                           cv = None)
    #To load the saved model, the following code can be run.
    with open(model_path,'rb') as f:
        model = pickle.load(f)

    #After that, for all sets evaluation metrics can be obtained separately.
    score_train,f_train = evaluate_score(model,X_train,y_train)
    score_test,f_test = evaluate_score(model,X_test,y_test)
    score_validation,f_validation = evaluate_score(model,X_validation,y_validation)
    
    #To see all results in a table, the following code can be run:

    score_path = 'score_path.csv' #To save the results.

    scores = [score_train,score_test,score_validation]
    #Size of each set as 'rows x columns'
    size_of = [str(len(X_train)) + 'x' + str(len(X_train[0])),
               str(len(X_test)) + 'x' + str(len(X_test[0])),
               str(len(X_validation)) + 'x' + str(len(X_validation[0]))]

    preds = [f_train,f_test,f_validation]
    names = ['Train','Test','Validation']
    
    learning_method = 'classification'
    form_table(score_path = score_path, names = names,
             scores = scores,sizes = size_of, 
             learning_method = learning_method,preds = preds)
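
One thing to watch in the loop above: 'model_path' and 'score_path' are the same for every dataset, so each iteration reuses the same files. If separate files per dataset are wanted, the names can be derived from the dataset name. This is a sketch of one possible naming scheme; the `results` folder and the `paths_for` helper are assumptions, not ProFAB conventions.

```python
from pathlib import Path

def paths_for(name, out_dir="results"):
    """Build per-dataset model and score paths from a dataset name.

    The output folder is not created here; create it before writing,
    e.g. with Path(out_dir).mkdir(parents=True, exist_ok=True).
    """
    out_dir = Path(out_dir)
    model_path = out_dir / f"model_{name}.txt"
    score_path = out_dir / f"score_{name}.csv"
    return model_path.as_posix(), score_path.as_posix()

ecNames = ['ec_1-2-1', 'ec_1-4-4-2', 'ec_2-7-4-6']
for name in ecNames:
    model_path, score_path = paths_for(name)
    print(model_path, score_path)
```

Inside the loop, the two returned strings would take the place of the fixed 'model_path.txt' and 'score_path.csv' values.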