# Use Case

PROFAB is a benchmarking platform that provides datasets, classifies proteins according to their functional annotations, and evaluates trained models. The goal of the platform is to provide a complete dataset-training-evaluation triangle. Because the workflow involves many steps, this use case walks through it in a form that is easy to follow.

## 1. Data Importing

Data can be imported in Python with the following lines of code. The interface is designed to be simple, and if you need to import multiple datasets at the same time, you can use a loop. For example:

- To import a single dataset from the ecNo prediction data:

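```python
from profab.import_dataset import ECNO

# Create the data model for the EC number (ecNo) prediction datasets
data_model = ECNO(ratio=0.2, protein_feature='paac', pre_determined=True, set_type='similarity')
# get_data returns features and labels for the train, test and validation sets
X_train, X_test, X_validation, y_train, y_test, y_validation = data_model.get_data(data_name='ecNo_1-2-7')
```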

- To import multiple datasets from the ecNo prediction data in a loop:
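A minimal sketch of such a loop, assuming that one `ECNO` data model can serve several datasets by calling `get_data` with different `data_name` values. `import_datasets` is a hypothetical helper of ours, not part of PROFAB, and `'ecNo_1-2-7'` is the only dataset name taken from this guide; any other names you add are your own choice.

```python
def import_datasets(data_model, data_names):
    """Call data_model.get_data once per dataset name and collect the six splits.

    Returns a dict that maps each dataset name to a dict of its splits.
    """
    datasets = {}
    for name in data_names:
        X_train, X_test, X_validation, y_train, y_test, y_validation = (
            data_model.get_data(data_name=name)
        )
        datasets[name] = {
            'X_train': X_train, 'X_test': X_test, 'X_validation': X_validation,
            'y_train': y_train, 'y_test': y_test, 'y_validation': y_validation,
        }
    return datasets


# Example usage (requires PROFAB):
# from profab.import_dataset import ECNO
# data_model = ECNO(ratio=0.2, protein_feature='paac',
#                   pre_determined=True, set_type='similarity')
# all_data = import_datasets(data_model, ['ecNo_1-2-7'])  # hypothetical: add more names
# X_train = all_data['ecNo_1-2-7']['X_train']
```

Keeping every split in one dictionary keyed by dataset name makes it easy to run the same training and evaluation steps over all datasets afterwards.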

## 2. Training

PROFAB can train on any type of data and provides both classification and regression training. Since our datasets are built for protein classification, classification is shown as the example; the same process applies to regression, where only the names of the training algorithms differ.

After training, the resulting model is stored at `model_path` if `path` is not `None`. Because training can take a long time, saving the trained model saves time later. The stored model is written with, and must be loaded back using, Python's `pickle` library.
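PROFAB saves the model for you when `path` is given. If you also train models yourself and want the same save-once, reload-later behaviour, the standard `pickle` round trip can be wrapped as below. `save_model` and `load_model` are our own helper names, not PROFAB functions. Only unpickle files you trust, because `pickle.load` can execute arbitrary code.

```python
import pickle


def save_model(model, path):
    """Serialize any picklable model object to `path`."""
    with open(path, 'wb') as handle:  # pickle needs binary mode
        pickle.dump(model, handle)


def load_model(path):
    """Load a model previously written with save_model (or pickle.dump)."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)
```

Using `with` blocks closes the file even if (un)pickling fails, which a bare `pickle.load(open(...))` does not guarantee.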

```python
# To train on the data:
import pickle
from profab.model_process import scale_methods, classification_methods

# Define the path where the trained model will be saved.
model_path = 'model_path.txt'

# Standardize the features so that no feature dominates because of its scale.
# The scaler is fitted on the training data only and then applied to the other sets.
X_train, scaler = scale_methods(X_train, scale_type='standard')
X_test, X_validation = scaler.transform(X_test), scaler.transform(X_validation)

# With the path assigned and the sets scaled, train the model,
# passing the separate validation set explicitly (validation by hand):
model = classification_methods(ml_type='logistic_reg',
                               X_train=X_train,
                               y_train=y_train,
                               X_valid=X_validation,
                               y_valid=y_validation,
                               path=model_path
                               )
```

```
LogisticRegression(C=67.67709090909092, class_weight=None, dual=False,
                   fit_intercept=True, intercept_scaling=1, l1_ratio=None,
                   max_iter=1000, multi_class='ovr', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
```
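`scale_methods(..., scale_type='standard')` returns a fitted scaler that is then applied to the test and validation sets. Assuming `'standard'` means ordinary z-score standardization (the `scaler.transform(...)` calls suggest a scikit-learn-style scaler, but we have not checked PROFAB's internals), the sketch below shows why the scaler is fitted on the training set only: the other sets are transformed with the *training* mean and standard deviation, so nothing about them leaks into training. `TrainFittedStandardizer` and the toy arrays are hypothetical.

```python
import numpy as np


class TrainFittedStandardizer:
    """Minimal z-score scaler whose statistics come from the training set only."""

    def fit(self, X_train):
        X_train = np.asarray(X_train, dtype=float)
        self.mean_ = X_train.mean(axis=0)
        std = X_train.std(axis=0)  # population std (ddof=0)
        std[std == 0.0] = 1.0      # constant features: avoid dividing by zero
        self.scale_ = std
        return self

    def transform(self, X):
        # Every set is shifted and scaled with the training statistics.
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_


# Hypothetical toy data: 3 training samples and 1 test sample, 2 features each.
toy_train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
toy_test = np.array([[3.0, 12.0]])

scaler = TrainFittedStandardizer().fit(toy_train)
toy_train_scaled = scaler.transform(toy_train)
toy_test_scaled = scaler.transform(toy_test)  # uses the training mean and std
```

The scaled training features have mean 0 and standard deviation 1, while the test features generally do not; that is expected, since they are measured on the training scale.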


## 3. Evaluation

Once training is complete, the model can be evaluated with the following code. The output of the evaluation is shown below the code.

### Get Scores

```python
from profab.model_evaluate import evaluate_score

# The returned model can be used directly, or the saved model
# can be loaded from disk with the 'pickle' package.
with open(model_path, 'rb') as model_file:
    model = pickle.load(model_file)

score_train, f_train = evaluate_score(model, X_train, y_train, preds=True)
score_test, f_test = evaluate_score(model, X_test, y_test, preds=True)
score_validation, f_validation = evaluate_score(model, X_validation, y_validation, preds=True)
```

The train, test and validation scores for the dataset 'ecNo_1-2-7' are given below.

```python
print(score_train)
```

```
{'Precision': 0.868421052631579, 'Recall': 0.75, 'F1-Score': 0.8048780487804879, 'F05-Score': 0.8418367346938777, 'Accuracy': 0.9166666666666666, 'MCC': 0.7555256046223662, 'AUC': 0.8581081081081082, 'AUPRC': 0.8378563596491229, 'TP': 33, 'FP': 5, 'TN': 143, 'FN': 11}
```


```python
print(score_test)
```

```
{'Precision': 0.8, 'Recall': 0.6666666666666666, 'F1-Score': 0.7272727272727272, 'F05-Score': 0.7692307692307692, 'Accuracy': 0.875, 'MCC': 0.6515837655350015, 'AUC': 0.8055555555555555, 'AUPRC': 0.775, 'TP': 4, 'FP': 1, 'TN': 17, 'FN': 2}
```


```python
print(score_validation)
```

```
{'Precision': 0.8823529411764706, 'Recall': 0.8823529411764706, 'F1-Score': 0.8823529411764706, 'F05-Score': 0.8823529411764706, 'Accuracy': 0.9183673469387755, 'MCC': 0.8198529411764706, 'AUC': 0.9099264705882353, 'AUPRC': 0.9027611044417767, 'TP': 15, 'FP': 2, 'TN': 30, 'FN': 2}
```
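Every score above can be recomputed from its TP, FP, TN and FN counts, which is a useful sanity check. In the sketch below, `scores_from_counts` is our own helper, not a PROFAB function. The AUC and AUPRC values printed above match the trapezoidal areas under the ROC and precision-recall curves of hard 0/1 predictions (for example, AUC = (Recall + TN/(TN+FP))/2). That is an inference from the printed numbers, not something stated by PROFAB, so treat those two formulas as assumptions. The sketch does not guard against zero denominators.

```python
import math


def scores_from_counts(tp, fp, tn, fn):
    """Recompute the threshold-based scores from confusion-matrix counts.

    AUC and AUPRC are computed as trapezoidal areas under the ROC and
    precision-recall curves of hard 0/1 predictions (an assumption that
    reproduces the values printed above).
    """
    n = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    prevalence = (tp + fn) / n       # precision when every sample is called positive

    def f_beta(beta):
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # ROC points: (0, 0) -> (1 - specificity, recall) -> (1, 1)
    auc = (recall + specificity) / 2
    # PR points: (recall 0, precision 1) -> (recall, precision) -> (1, prevalence)
    auprc = recall * (1 + precision) / 2 + (1 - recall) * (precision + prevalence) / 2

    return {'Precision': precision, 'Recall': recall,
            'F1-Score': f_beta(1.0), 'F05-Score': f_beta(0.5),
            'Accuracy': (tp + tn) / n, 'MCC': mcc,
            'AUC': auc, 'AUPRC': auprc}


# Counts taken from the test-set output above.
print(scores_from_counts(tp=4, fp=1, tn=17, fn=2))
```

Because AUC and AUPRC here come from hard predictions, they say less about ranking quality than the same metrics computed from predicted probabilities would.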


### Table Formatting

To collect the results in a table, run the following code. In addition to the scores, the table lists the size of each set. The table is saved in .csv format.

```python
# To see all results in one table:
from profab.model_evaluate import form_table

score_path = 'score_path.csv'  # where the table is saved

scores = {'train': score_train, 'test': score_test, 'validation': score_validation}
form_table(scores=scores, path=score_path)
```
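`form_table` already produces the table. If you want a table whose layout you control, for example when combining PROFAB scores with results from elsewhere, the standard `csv` module is enough. `write_score_table` is a hypothetical helper and its column layout is our assumption, not `form_table`'s actual format. Here the set size is taken as TP + FP + TN + FN, which gives 192, 24 and 49 samples for the train, test and validation sets above.

```python
import csv

# Column order for the metric columns (keys of the score dictionaries above).
METRICS = ['Precision', 'Recall', 'F1-Score', 'F05-Score', 'Accuracy',
           'MCC', 'AUC', 'AUPRC', 'TP', 'FP', 'TN', 'FN']


def write_score_table(scores, path):
    """Write one CSV row per set: its name, its size (TP+FP+TN+FN) and its scores."""
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['Set', 'Set size'] + METRICS)
        for set_name, score in scores.items():
            size = score['TP'] + score['FP'] + score['TN'] + score['FN']
            writer.writerow([set_name, size] + [score[m] for m in METRICS])
```

With the `scores` dictionary built above, `write_score_table(scores, 'score_table.csv')` writes one row each for train, test and validation.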