# Tutorial Autoprognosis

## Automated Clinical Prognostic Modeling 

This tutorial shows how to use [Autoprognosis](https://arxiv.org/abs/1802.07207) on the breast cancer dataset stored in `kaggle_breastcancer.csv`.

See [installation instructions](../../doc/install.md) to install the dependencies.



In [1]:
import pandas as pd
import initpath_ap
initpath_ap.init_sys_path()
import utilmlab

* Load the dataset.
* Convert the `id` column to object type.
* Set the target column.
* Show the first five samples.

In [2]:
fn_csv = '{}/kaggle_breastcancer.csv'.format(utilmlab.get_data_dir())
df = pd.read_csv(fn_csv)
df['id'] = df['id'].astype(object)
print(df.dtypes)
target = 'diagnosis'
df.head()

id                          object
diagnosis                    int64
radius_mean                float64
texture_mean               float64
perimeter_mean             float64
area_mean                  float64
smoothness_mean            float64
compactness_mean           float64
concavity_mean             float64
concave points_mean        float64
symmetry_mean              float64
fractal_dimension_mean     float64
radius_se                  float64
texture_se                 float64
perimeter_se               float64
area_se                    float64
smoothness_se              float64
compactness_se             float64
concavity_se               float64
concave points_se          float64
symmetry_se                float64
fractal_dimension_se       float64
radius_worst               float64
texture_worst              float64
perimeter_worst            float64
area_worst                 float64
smoothness_worst           float64
compactness_worst          float64
concavity_worst     

| | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 842517 | 0 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 84300903 | 0 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 84348301 | 0 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 84358402 | 0 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |
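
The loading steps above can be sanity-checked on a small sample before starting a long run. The sketch below is only an illustration: it rebuilds the first three rows of the table above (restricted to four columns) from an inline string instead of reading `kaggle_breastcancer.csv`, applies the same `astype(object)` conversion, and then lists the numeric columns other than the target. The names `sample`, `feature_cols`, and `label_counts` are hypothetical helpers, not part of Autoprognosis.

```python
import io
import pandas as pd

# First three rows of the table above, restricted to four columns for brevity.
csv_text = """id,diagnosis,radius_mean,texture_mean
842302,0,17.99,10.38
842517,0,20.57,17.77
84300903,0,19.69,21.25
"""
sample = pd.read_csv(io.StringIO(csv_text))

dtype_before = sample['id'].dtype            # integer dtype as parsed by read_csv
sample['id'] = sample['id'].astype(object)   # same conversion as in the tutorial
dtype_after = sample['id'].dtype

target = 'diagnosis'
# Numeric columns, excluding the target; 'id' no longer counts as numeric.
numeric_cols = sample.select_dtypes(include='number').columns.tolist()
feature_cols = [c for c in numeric_cols if c != target]
# How many samples carry each label value.
label_counts = sample[target].value_counts().to_dict()

print(dtype_before, '->', dtype_after)
print('numeric feature columns:', feature_cols)
print('label counts:', label_counts)
```

After the conversion, `id` is excluded from any selection based on numeric dtypes, and the label counts show at a glance whether the target column contains the expected classes.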


Run Autoprognosis for `niter` iterations:

In [None]:
python_exe = 'python3'  # on some platforms the Python 3.6 executable is named python or python3.6
odir = "."  # output directory
verboselevel = 0  # verbosity level
niter = 3  # number of iterations
nstage = 1  # number of components in the pipeline: 1: classifiers, 2: feature processing + classifier, 3: imputation + ...
acquisition_type = 'MPI'  # LCB is the default and preferred, but it generates excessive warnings; MPI is a good compromise
!{python_exe} autoprognosis.py -i {fn_csv} -o {odir} --target {target} --verbose {verboselevel} --nstage {nstage} --it {niter} --acquisitiontype {acquisition_type}

R[write to console]: Loading required package: missForest

R[write to console]: Loading required package: randomForest

R[write to console]: randomForest 4.6-14

R[write to console]: Type rfNews() to see new features/changes/bug fixes.

R[write to console]: Loading required package: foreach

R[write to console]: Loading required package: itertools

R[write to console]: Loading required package: iterators

R[write to console]: Loading required package: softImpute

R[write to console]: Loading required package: Matrix

R[write to console]: Loaded softImpute 1.4


[ Gradient Boosting ]
[ MultinomialNaiveBayes ]
[ LinearSVM ]
[ Random Forest ]
[ BernoullinNaiveBayes ]
[ GaussianNaiveBayes ]
Iteration number: 1 4s (4s) (13s), Current pipelines:  [[[ Random Forest ]]], [[[ BernoullinNaiveBayes ]]], [[[ GaussianNaiveBayes ]]], BO objective
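
The `acquisition_type` comment above names two acquisition functions, LCB and MPI, without defining them. Assuming these names carry their usual Bayesian-optimisation meaning (lower confidence bound and maximum probability of improvement, for a minimised objective), the difference can be sketched as follows. The posterior means and standard deviations, and the `kappa` and `jitter` values, are made-up numbers for illustration; this is not the code Autoprognosis runs.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lcb(mu, sigma, kappa=2.0):
    """Lower confidence bound for minimisation: a smaller value is more promising."""
    return mu - kappa * sigma

def mpi(mu, sigma, best, jitter=0.01):
    """Probability that a candidate beats the best observed value by at least jitter."""
    if sigma <= 0.0:
        return 1.0 if mu < best - jitter else 0.0
    return normal_cdf((best - jitter - mu) / sigma)

best_so_far = 0.32  # hypothetical best (lowest) objective value seen so far
candidates = {
    'A': (0.30, 0.02),  # slightly better mean, low uncertainty
    'B': (0.35, 0.10),  # worse mean, high uncertainty
}

lcb_scores = {name: lcb(mu, s) for name, (mu, s) in candidates.items()}
mpi_scores = {name: mpi(mu, s, best_so_far) for name, (mu, s) in candidates.items()}

lcb_choice = min(lcb_scores, key=lcb_scores.get)  # LCB picks the lowest bound
mpi_choice = max(mpi_scores, key=mpi_scores.get)  # MPI picks the highest probability

print('LCB scores:', lcb_scores, '-> picks', lcb_choice)
print('MPI scores:', mpi_scores, '-> picks', mpi_choice)
```

With these numbers LCB favours the uncertain candidate B, while MPI favours the safer candidate A, which illustrates why the two settings can explore the pipeline space differently.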

Display the results:

In [None]:
!{python_exe} autoprognosis_report.py -i {odir}
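
Outside a notebook the `!` shell escape is not available. As a minimal sketch, the two commands used in this tutorial can be assembled as argument lists and passed to `subprocess.run`. The script names and flags are copied from the cells above; the helper `build_commands` is a hypothetical name introduced here, and the `subprocess.run` calls are left commented out because they only work from a directory that contains the Autoprognosis scripts.

```python
import shlex
import subprocess  # needed only if the commented-out run calls below are enabled

def build_commands(python_exe, fn_csv, odir, target,
                   verboselevel, nstage, niter, acquisition_type):
    """Return (train, report) argument lists mirroring the notebook cells."""
    train = [
        python_exe, 'autoprognosis.py',
        '-i', fn_csv,
        '-o', odir,
        '--target', target,
        '--verbose', str(verboselevel),
        '--nstage', str(nstage),
        '--it', str(niter),
        '--acquisitiontype', acquisition_type,
    ]
    report = [python_exe, 'autoprognosis_report.py', '-i', odir]
    return train, report

train_cmd, report_cmd = build_commands(
    python_exe='python3',
    fn_csv='kaggle_breastcancer.csv',
    odir='.',
    target='diagnosis',
    verboselevel=0,
    nstage=1,
    niter=3,
    acquisition_type='MPI',
)

# Print the commands in shell-quoted form for inspection.
print(shlex.join(train_cmd))
print(shlex.join(report_cmd))

# To execute them, uncomment the following lines:
# subprocess.run(train_cmd, check=True)
# subprocess.run(report_cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems when the CSV path or output directory contains spaces.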