# AutoPrognosis API Tutorial

A demonstration of AutoPrognosis (AP) functionality and operation.

This tutorial shows how to use [AutoPrognosis](https://arxiv.org/abs/1802.07207).

See [installation instructions](../../doc/install.md) to install the dependencies.

In [None]:
import pandas as pd
import numpy as np
import initpath_ap
initpath_ap.init_sys_path()
import utilmlab
import json
from scipy import stats
# Import the AutoPrognosis library
import model

# Run the model from the command line

## Introduction with a sample dataset

In [None]:
df = pd.read_csv('../../../AutoPrognosisThings/cardio_data/withImage2/with_prob_no_test_vascular.csv')
df

-i : the input CSV file

--target : the name of the column that contains the outcome (the value to predict for the validation/test set)

-o : the folder to which AutoPrognosis writes its output

--it : the total number of iterations for each fold of n-fold cross-validation

--cv : if 0, a single train/test (or train/validation) split is used, and -iValIndex must also be set; otherwise, n for n-fold cross-validation

-iValIndex : the path to the test or validation index file, e.g. test_indexes.csv or val_indexes.csv

--nstage : the size of the pipeline:
        0: auto (selects imputation when missing data is detected),
        1: classifiers only,
        2: feature processing + classifier,
        3: imputers + feature processors + classifier,
        4: imputers (if needed) + classifier

--ensemble : include ensembles when fitting. Known issue: setting this to 0 raises an assertion error and needs investigation.

--modelindexes : the indexes of the classifiers to try, taken from the following list:

0 Random Forest,
1 Gradient Boosting, 
2 XGBoost, 
3 Adaboost, 
4 Bagging, 
5 Bernoulli Naive Bayes, 
6 Gauss Naive Bayes, 
7 Multinomial Naive Bayes, 
8 Logistic Regression, 
9 Perceptron, 
10 Decision Trees, 
11 QDA, 
12 LDA, 
13 KNN, 
14 Linear SVM, 
15 Neural Network
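
When choosing classifiers by name rather than by number, a small lookup table can help. The sketch below copies the index list above into a dictionary; `MODEL_INDEXES` and `indexes_for` are hypothetical names introduced here for illustration and are not part of AutoPrognosis. The resulting list has the same shape as the `my_model_indexes=[2]` argument used later in this tutorial.

```python
# Index-to-name table copied from the list above (not read from the library).
MODEL_INDEXES = {
    0: "Random Forest",
    1: "Gradient Boosting",
    2: "XGBoost",
    3: "Adaboost",
    4: "Bagging",
    5: "Bernoulli Naive Bayes",
    6: "Gauss Naive Bayes",
    7: "Multinomial Naive Bayes",
    8: "Logistic Regression",
    9: "Perceptron",
    10: "Decision Trees",
    11: "QDA",
    12: "LDA",
    13: "KNN",
    14: "Linear SVM",
    15: "Neural Network",
}


def indexes_for(names):
    """Return the model indexes for the given classifier names.

    Hypothetical helper: matching is case-insensitive and unknown
    names raise a KeyError instead of being silently dropped.
    """
    by_name = {name.lower(): idx for idx, name in MODEL_INDEXES.items()}
    missing = [n for n in names if n.lower() not in by_name]
    if missing:
        raise KeyError(f"unknown classifier name(s): {missing}")
    return [by_name[n.lower()] for n in names]


print(indexes_for(["XGBoost", "Logistic Regression"]))  # → [2, 8]
```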

### Use AutoPrognosis with cross-validation

In [None]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage2/with_prob_no_test_vascular.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 3 \
--cv 3 \
--nstage 4 \
--modelindexes 2 \
--num_components 1 \
--kernel_freq 100

In [None]:
!python3 autoprognosis_report.py -i ../../../AutoPrognosisThings/outputs --verbose 1

### Use AutoPrognosis with a train and validation set

In [None]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage2/with_prob_no_test_vascular.csv \
-iValIndex ../../../AutoPrognosisThings/cardio_data/val_indexes.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 15 \
--cv 0 \
--nstage 4 \
--modelindexes 2 \
--num_components 1 \
--kernel_freq 100
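
The two invocations above differ only in `--it`, `--cv` and `-iValIndex`, so they can be assembled from one function. The following is a minimal sketch, assuming only the flags shown in this tutorial; `build_command` is a hypothetical helper, not part of AutoPrognosis. It enforces the rule stated earlier that `--cv 0` needs `-iValIndex`, and it only prints the command rather than running it.

```python
import shlex


def build_command(input_csv, target, output_dir, iterations, cv,
                  val_index=None, nstage=4, model_index=2,
                  num_components=1, kernel_freq=100):
    """Assemble the autoprognosis.py argument list (hypothetical helper)."""
    # --cv 0 means a single train/validation split, which needs an index file.
    if cv == 0 and val_index is None:
        raise ValueError("--cv 0 requires -iValIndex")
    cmd = ["python3", "autoprognosis.py", "-i", input_csv]
    if val_index is not None:
        cmd += ["-iValIndex", val_index]
    cmd += [
        "--target", target,
        "-o", output_dir,
        "--it", str(iterations),
        "--cv", str(cv),
        "--nstage", str(nstage),
        "--modelindexes", str(model_index),
        "--num_components", str(num_components),
        "--kernel_freq", str(kernel_freq),
    ]
    return cmd


cmd = build_command(
    "../../../AutoPrognosisThings/cardio_data/withImage2/with_prob_no_test_vascular.csv",
    "outcome",
    "../../../AutoPrognosisThings/outputs",
    iterations=15,
    cv=0,
    val_index="../../../AutoPrognosisThings/cardio_data/val_indexes.csv",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; building a list instead of a single string avoids problems with missing spaces between arguments.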

# Run the model from a short Python script

In [None]:
df_all = pd.read_csv('../../../AutoPrognosisThings/cardio_data/withImage2/with_prob_no_test_vascular.csv')
X_ = df_all.drop(columns=['outcome'])
Y_ = df_all[['outcome']]

In [None]:
metric = 'aucprc'
# LCB is the default and preferred acquisition type, but it generates excessive warnings; MPI is a good compromise.
acquisition_type = 'MPI'
model.nmax_model = 4  # same as --nstage on the command line
AP_mdl = model.AutoPrognosis_Classifier(
    metric=metric, CV=3, num_iter=3, kernel_freq=100, ensemble=True,
    ensemble_size=3, Gibbs_iter=100, burn_in=50, num_components=1,
    acquisition_type=acquisition_type, my_model_indexes=[2])
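
The name `'aucprc'` presumably refers to the area under the precision-recall curve; how AutoPrognosis computes it is not shown in this tutorial. As a rough illustration of the quantity, the sketch below computes average precision, one common estimate of that area, in plain Python. The function name `average_precision` and the toy data are assumptions made for this example, and tied scores are not given any special treatment.

```python
def average_precision(y_true, scores):
    """Mean of the precision values measured at each positive example,
    with examples ranked by decreasing score (illustrative sketch only)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, y in ranked if y == 1)
    if n_pos == 0:
        raise ValueError("no positive labels")
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy labels and scores: positives land at ranks 1 and 3.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(round(ap, 4))  # → 0.8333
```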

In [None]:
AP_mdl.fit(X_, Y_)

## Compute model predictions

In [None]:
AP_mdl.predict(X_)

## Compute performance via multi-fold cross-validation

In [None]:
model.evaluate_ens(X_, Y_, AP_mdl, n_folds=3, visualize=True, X_val_indexes=[])
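
To make the `n_folds=3` argument concrete, the sketch below shows how n-fold cross-validation partitions row indexes: each fold serves once as the held-out set while the remaining rows are used for training. This is a generic illustration with a hypothetical `kfold_indexes` helper; it does not shuffle or stratify, and it is not necessarily how `evaluate_ens` splits the data internally.

```python
def kfold_indexes(n_samples, n_folds):
    """Yield (train, test) index lists for consecutive, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(kfold_indexes(n_samples=10, n_folds=3))
for train, test in folds:
    print(len(train), test)
```

With 10 rows and 3 folds, the held-out sets have sizes 4, 3 and 3, and every row is held out exactly once.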

## Visualize the data

In [None]:
AP_mdl.visualize_data(X_)

## Visualize the model

In [None]:
AP_mdl.APReport()