<a href="https://colab.research.google.com/github/rayou2/AutoML-examples/blob/main/AutoML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Packages

In [None]:
!pip install tpot mljar-supervised

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tpot
  Downloading TPOT-0.11.7-py3-none-any.whl (87 kB)
[K     |████████████████████████████████| 87 kB 3.1 MB/s 
[?25hCollecting mljar-supervised
  Downloading mljar-supervised-0.11.3.tar.gz (112 kB)
[K     |████████████████████████████████| 112 kB 30.5 MB/s 
Collecting stopit>=1.1.1
  Downloading stopit-1.1.2.tar.gz (18 kB)
Collecting deap>=1.2
  Downloading deap-1.3.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (139 kB)
[K     |████████████████████████████████| 139 kB 34.4 MB/s 
[?25hCollecting xgboost>=1.1.0
  Downloading xgboost-1.6.2-py3-none-manylinux2014_x86_64.whl (255.9 MB)
[K     |████████████████████████████████| 255.9 MB 43 kB/s 
Collecting update-checker>=0.16
  Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Collecting lightgbm>=3.0.0
  Downloading lightgbm-3.3.3-py3-none-manylinux1_x86_64.whl

In [None]:
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

# Options Available

*  mode — the package ships with four built-in models.
      *  The Explain mode is ideal for explaining and understanding the data. It results in visualizations of feature importance as well as tree visualizations.
      *  The Perform is used when building ML models for production.
      *  The Compete is meant to build models used in machine learning competitions.
      *  The Optuna mode is used to search for highly-tuned ML models.
*  algorithms — specifies the algorithms you would like to use. They are usually passed in as a list.
*  results_path — the path where the results will be stored
*  total_time_limit — the total time in seconds for training the model
*  train_ensemble — dictates if an ensemble will be created at the end of the training process
*  stack_models — determines if a models stack will be created
*  eval_metric — the metric that will be optimized. If auto the logloss is used for classification problems while the rmse is used for regression problems

In [None]:
#automl = AutoML(
    # mode="Explain"
    # algorithms=""
    # results_path="AutoML_22",
    # total_time_limit= 30 * 60,
    # train_ensemble=true,
    # stack_models="",
    # eval_metric=""
#)

# Healthcare Dataset - SPARCS

## Load in dataset

In [None]:
import pandas as pd
tuition = pd.read_csv('/content/Tuition_Assistance_Program__TAP__Recipients___Dollars_by_College__Sector_Group__and_Level_of_Study__Beginning_2000.csv')
tuition

Unnamed: 0,Academic Year,TAP College Code,Federal School Code,Level,TAP Level of Study,TAP College Name,Sector Type,TAP Sector Group,TAP Recipient Headcount,TAP Recipient FTEs,TAP Recipient Dollars
0,2020,1075,7465,U,2 yr Undergrad,AMERICAN ACAD OF DRAMATIC ARTS,PRIVATE,5-INDEPENDENT,3,3.06,12364.30
1,2020,8323,42785,U,4 yr Undergrad,YESHIVA HECHAL SHEMUEL,PRIVATE,9-CHAPTER XXII,7,7.03,25198.17
2,2020,1040,2899,U,4 yr Undergrad,WAGNER COLLEGE 4YR UNDERGRAD,PRIVATE,5-INDEPENDENT,252,239.09,722802.93
3,2020,8244,7264,U,4 yr Undergrad,MESIVTA TORAH VODAATH RABBINICAL,PRIVATE,9-CHAPTER XXII,127,130.41,551196.83
4,2020,1016,2894,U,4 yr Undergrad,UNIV OF ROCHESTER EASTMAN UG,PRIVATE,5-INDEPENDENT,14,12.73,30044.02
...,...,...,...,...,...,...,...,...,...,...,...
11121,2000,5215,2688,G,Grad,CUNY CITY COLLEGE GRAD,PUBLIC,1-CUNY SR,56,41.75,17967.00
11122,2000,535,20662,U,5 yr Undergrad,THE NEW SCHOOL 4YR UNDERGRAD,PRIVATE,5-INDEPENDENT,22,12.25,41006.00
11123,2000,965,7109,U,2 yr Undergrad,SUC OLD WESTBURY (UNDERGRAD),PUBLIC,3-SUNY SO,10,6.75,18912.25
11124,2000,5210,2681,G,Grad,CANISIUS COLLEGE GRAD,PRIVATE,5-INDEPENDENT,44,31.75,10030.75


In [None]:
tuition.columns

Index(['Academic Year', 'TAP College Code', 'Federal School Code', 'Level',
       'TAP Level of Study', 'TAP College Name', 'Sector Type',
       'TAP Sector Group', 'TAP Recipient Headcount', 'TAP Recipient FTEs',
       'TAP Recipient Dollars'],
      dtype='object')

## Potential variables of interest

* TAP Level of Study (categorical) 
* TAP Recipient Dollars (continuous)
* Level




In [None]:
tuition['TAP Level of Study'].describe()

count              11126
unique                 6
top       4 yr Undergrad
freq                3945
Name: TAP Level of Study, dtype: object

In [None]:
tuition['TAP Level of Study'].value_counts()

4 yr Undergrad    3945
2 yr Undergrad    3286
5 yr Undergrad    2367
Grad              1487
STAP                39
Unknown              2
Name: TAP Level of Study, dtype: int64

In [None]:
tuition['TAP Recipient Dollars'].describe()

count    1.112600e+04
mean     1.599339e+06
std      3.303060e+06
min      1.875000e+01
25%      1.795453e+04
50%      1.538812e+05
75%      1.662690e+06
max      3.393526e+07
Name: TAP Recipient Dollars, dtype: float64

In [None]:
tuition['Level'].value_counts()

U    9633
G    1493
Name: Level, dtype: int64

# Create new model

In [None]:
x = tuition.drop(columns =['TAP Level of Study'])

In [None]:
y = tuition["TAP Level of Study"]

In [None]:
x

Unnamed: 0,Academic Year,TAP College Code,Federal School Code,TAP College Name,Sector Type,TAP Sector Group,TAP Recipient Headcount,TAP Recipient FTEs,TAP Recipient Dollars,tuition_los
0,2020,1075,7465,AMERICAN ACAD OF DRAMATIC ARTS,PRIVATE,5-INDEPENDENT,3,3.06,12364.30,short
1,2020,8323,42785,YESHIVA HECHAL SHEMUEL,PRIVATE,9-CHAPTER XXII,7,7.03,25198.17,short
2,2020,1040,2899,WAGNER COLLEGE 4YR UNDERGRAD,PRIVATE,5-INDEPENDENT,252,239.09,722802.93,short
3,2020,8244,7264,MESIVTA TORAH VODAATH RABBINICAL,PRIVATE,9-CHAPTER XXII,127,130.41,551196.83,short
4,2020,1016,2894,UNIV OF ROCHESTER EASTMAN UG,PRIVATE,5-INDEPENDENT,14,12.73,30044.02,short
...,...,...,...,...,...,...,...,...,...,...
11121,2000,5215,2688,CUNY CITY COLLEGE GRAD,PUBLIC,1-CUNY SR,56,41.75,17967.00,short
11122,2000,535,20662,THE NEW SCHOOL 4YR UNDERGRAD,PRIVATE,5-INDEPENDENT,22,12.25,41006.00,short
11123,2000,965,7109,SUC OLD WESTBURY (UNDERGRAD),PUBLIC,3-SUNY SO,10,6.75,18912.25,short
11124,2000,5210,2681,CANISIUS COLLEGE GRAD,PRIVATE,5-INDEPENDENT,44,31.75,10030.75,short


In [None]:
y

0        2 yr Undergrad
1        4 yr Undergrad
2        4 yr Undergrad
3        4 yr Undergrad
4        4 yr Undergrad
              ...      
11121              Grad
11122    5 yr Undergrad
11123    2 yr Undergrad
11124              Grad
11125              Grad
Name: TAP Level of Study, Length: 11126, dtype: object

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, stratify=y, test_size= 0.25)

In [None]:
x_test

Unnamed: 0,Academic Year,TAP College Code,Federal School Code,TAP College Name,Sector Type,TAP Sector Group,TAP Recipient Headcount,TAP Recipient FTEs,TAP Recipient Dollars,tuition_los
477,2019,305,2734,HOUGHTON COLLEGE,PRIVATE,5-INDEPENDENT,341,316.25,885722.00,short
25,2020,405,2758,MANHATTAN COLLEGE 4 YR UNDERGRAD,PRIVATE,5-INDEPENDENT,817,773.53,2625417.24,short
6575,2007,650,2806,ROCHESTER INST TECH 4YR UNDERGRAD,PRIVATE,5-INDEPENDENT,2,0.67,2080.30,short
6751,2006,1402,2694,CUNY KINGSBOROUGH CC,PUBLIC,2-CUNY CC,2,1.00,1314.00,short
6265,2007,5390,2835,SUNY AT ALBANY GRAD,PUBLIC,3-SUNY SO,363,281.50,99848.25,short
...,...,...,...,...,...,...,...,...,...,...
6588,2007,575,2790,NYACK COLLEGE 4YR UNDERGRAD,PRIVATE,5-INDEPENDENT,2,1.00,3499.50,short
4369,2011,695,2817,ST BONAVENTURE UNIVERSITY UG,PRIVATE,5-INDEPENDENT,5,3.00,14592.00,short
6318,2007,5030,2711,CORNELL UNIVERSITY GRAD,PRIVATE,5-INDEPENDENT,41,34.00,12063.50,short
1689,2017,2070,2866,FASHION INST TECH-LOWER DIVISION,PUBLIC,4-SUNY CC,1140,986.75,2841713.02,short


In [None]:
automl = AutoML(results_path="tuition_level", mode="Explain")

In [None]:
automl.fit(x_train, y_train)

AutoML directory: tuition_level
The task is multiclass_classification with evaluation metric logloss
AutoML will use algorithms: ['Baseline', 'Linear', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 3 models
1_Baseline logloss 1.358953 trained in 0.63 seconds




2_DecisionTree logloss 0.638737 trained in 25.38 seconds
3_Linear logloss 0.737755 trained in 14.49 seconds
* Step default_algorithms will try to check up to 3 models
4_Default_Xgboost logloss 0.175651 trained in 39.61 seconds
5_Default_NeuralNetwork logloss 0.364518 trained in 8.1 seconds
6_Default_RandomForest logloss 0.527478 trained in 17.53 seconds
* Step ensemble will try to check up to 1 model
Ensemble logloss 0.175651 trained in 0.52 seconds
AutoML fit time: 117.35 seconds
AutoML best model: 4_Default_Xgboost


AutoML(results_path='tuition_level')

In [None]:
pred = automl.predict(x_test)
pred

array(['4 yr Undergrad', '4 yr Undergrad', '2 yr Undergrad', ..., 'Grad',
       '2 yr Undergrad', '2 yr Undergrad'], dtype=object)

In [None]:
automl.report()

Best model,name,model_type,metric_type,metric_value,train_time
,1_Baseline,Baseline,logloss,1.35895,1.42
,2_DecisionTree,Decision Tree,logloss,0.638737,26.51
,3_Linear,Linear,logloss,0.737755,15.63
the best,4_Default_Xgboost,Xgboost,logloss,0.175651,40.78
,5_Default_NeuralNetwork,Neural Network,logloss,0.364518,9.07
,6_Default_RandomForest,Random Forest,logloss,0.527478,18.71
,Ensemble,Ensemble,logloss,0.175651,0.52

Unnamed: 0,2 yr Undergrad,4 yr Undergrad,5 yr Undergrad,Grad,STAP,Unknown,accuracy,macro avg,weighted avg,logloss
precision,0.754829,0.870036,0.652397,0.996429,0,0,0.789096,0.545615,0.801755,0.527478
recall,0.824675,0.651351,0.858108,1.0,0,0,0.789096,0.555689,0.789096,0.527478
f1-score,0.788208,0.744977,0.741245,0.998211,0,0,0.789096,0.54544,0.786434,0.527478
support,616.0,740.0,444.0,279.0,7,5,0.789096,2091.0,2091.0,0.527478

Unnamed: 0,Predicted as 2 yr Undergrad,Predicted as 4 yr Undergrad,Predicted as 5 yr Undergrad,Predicted as Grad,Predicted as STAP,Predicted as Unknown
Labeled as 2 yr Undergrad,508,19,89,0,0,0
Labeled as 4 yr Undergrad,150,482,107,1,0,0
Labeled as 5 yr Undergrad,15,48,381,0,0,0
Labeled as Grad,0,0,0,279,0,0
Labeled as STAP,0,0,7,0,0,0
Labeled as Unknown,0,5,0,0,0,0

Unnamed: 0,2 yr Undergrad,4 yr Undergrad,5 yr Undergrad,Grad,STAP,Unknown,accuracy,macro avg,weighted avg,logloss
precision,0.768362,0.760243,0.57971,0.996429,0,0,0.740316,0.517457,0.751452,0.638737
recall,0.662338,0.677027,0.810811,1.0,0,0,0.740316,0.525029,0.740316,0.638737
f1-score,0.711421,0.716226,0.676056,0.998211,0,0,0.740316,0.516986,0.739796,0.638737
support,616.0,740.0,444.0,279.0,7,5,0.740316,2091.0,2091.0,0.638737

Unnamed: 0,Predicted as 2 yr Undergrad,Predicted as 4 yr Undergrad,Predicted as 5 yr Undergrad,Predicted as Grad,Predicted as STAP,Predicted as Unknown
Labeled as 2 yr Undergrad,408,75,133,0,0,0
Labeled as 4 yr Undergrad,117,501,121,1,0,0
Labeled as 5 yr Undergrad,6,78,360,0,0,0
Labeled as Grad,0,0,0,279,0,0
Labeled as STAP,0,0,7,0,0,0
Labeled as Unknown,0,5,0,0,0,0

Unnamed: 0,2 yr Undergrad,4 yr Undergrad,5 yr Undergrad,Grad,STAP,Unknown,accuracy,macro avg,weighted avg,logloss
precision,0.869085,0.889201,0.744589,0.975177,0,0,0.862745,0.579675,0.858937,0.364518
recall,0.894481,0.856757,0.774775,0.985663,0,0,0.862745,0.585279,0.862745,0.364518
f1-score,0.8816,0.872677,0.759382,0.980392,0,0,0.862745,0.582342,0.860613,0.364518
support,616.0,740.0,444.0,279.0,7,5,0.862745,2091.0,2091.0,0.364518

Unnamed: 0,Predicted as 2 yr Undergrad,Predicted as 4 yr Undergrad,Predicted as 5 yr Undergrad,Predicted as Grad,Predicted as STAP,Predicted as Unknown
Labeled as 2 yr Undergrad,551,24,41,0,0,0
Labeled as 4 yr Undergrad,31,634,70,5,0,0
Labeled as 5 yr Undergrad,50,48,344,2,0,0
Labeled as Grad,0,0,4,275,0,0
Labeled as STAP,2,2,3,0,0,0
Labeled as Unknown,0,5,0,0,0,0

Unnamed: 0,2 yr Undergrad,4 yr Undergrad,5 yr Undergrad,Grad,STAP,Unknown,accuracy,macro avg,weighted avg,logloss
precision,0.946429,0.959722,0.880342,0.996429,1.0,0.833333,0.942611,0.936042,0.943681,0.175651
recall,0.946429,0.933784,0.927928,1.0,0.142857,1.0,0.942611,0.825166,0.942611,0.175651
f1-score,0.946429,0.946575,0.903509,0.998211,0.25,0.909091,0.942611,0.825636,0.941856,0.175651
support,616.0,740.0,444.0,279.0,7.0,5.0,0.942611,2091.0,2091.0,0.175651

Unnamed: 0,Predicted as 2 yr Undergrad,Predicted as 4 yr Undergrad,Predicted as 5 yr Undergrad,Predicted as Grad,Predicted as STAP,Predicted as Unknown
Labeled as 2 yr Undergrad,583,8,25,0,0,0
Labeled as 4 yr Undergrad,16,691,31,1,0,1
Labeled as 5 yr Undergrad,15,17,412,0,0,0
Labeled as Grad,0,0,0,279,0,0
Labeled as STAP,2,4,0,0,1,0
Labeled as Unknown,0,0,0,0,0,5

Unnamed: 0,2 yr Undergrad,4 yr Undergrad,5 yr Undergrad,Grad,STAP,Unknown,accuracy,macro avg,weighted avg,logloss
precision,0,0.353898,0,0,0,0,0.353898,0.0589829,0.125244,1.35895
recall,0,1.0,0,0,0,0,0.353898,0.166667,0.353898,1.35895
f1-score,0,0.522783,0,0,0,0,0.353898,0.0871306,0.185012,1.35895
support,616,740.0,444,279,7,5,0.353898,2091.0,2091.0,1.35895

Unnamed: 0,Predicted as 2 yr Undergrad,Predicted as 4 yr Undergrad,Predicted as 5 yr Undergrad,Predicted as Grad,Predicted as STAP,Predicted as Unknown
Labeled as 2 yr Undergrad,0,616,0,0,0,0
Labeled as 4 yr Undergrad,0,740,0,0,0,0
Labeled as 5 yr Undergrad,0,444,0,0,0,0
Labeled as Grad,0,279,0,0,0,0
Labeled as STAP,0,7,0,0,0,0
Labeled as Unknown,0,5,0,0,0,0

Model,Weight
4_Default_Xgboost,1

Unnamed: 0,2 yr Undergrad,4 yr Undergrad,5 yr Undergrad,Grad,STAP,Unknown,accuracy,macro avg,weighted avg,logloss
precision,0.946429,0.959722,0.880342,0.996429,1.0,0.833333,0.942611,0.936042,0.943681,0.175651
recall,0.946429,0.933784,0.927928,1.0,0.142857,1.0,0.942611,0.825166,0.942611,0.175651
f1-score,0.946429,0.946575,0.903509,0.998211,0.25,0.909091,0.942611,0.825636,0.941856,0.175651
support,616.0,740.0,444.0,279.0,7.0,5.0,0.942611,2091.0,2091.0,0.175651

Unnamed: 0,Predicted as 2 yr Undergrad,Predicted as 4 yr Undergrad,Predicted as 5 yr Undergrad,Predicted as Grad,Predicted as STAP,Predicted as Unknown
Labeled as 2 yr Undergrad,583,8,25,0,0,0
Labeled as 4 yr Undergrad,16,691,31,1,0,1
Labeled as 5 yr Undergrad,15,17,412,0,0,0
Labeled as Grad,0,0,0,279,0,0
Labeled as STAP,2,4,0,0,1,0
Labeled as Unknown,0,0,0,0,0,5

Unnamed: 0,2 yr Undergrad,4 yr Undergrad,5 yr Undergrad,Grad,STAP,Unknown,accuracy,macro avg,weighted avg,logloss
precision,0.649819,0.762821,0.639432,0.773639,0,0,0.7011,0.470952,0.700396,0.737755
recall,0.584416,0.643243,0.810811,0.967742,0,0,0.7011,0.501035,0.7011,0.737755
f1-score,0.615385,0.697947,0.714995,0.859873,0,0,0.7011,0.481367,0.694845,0.737755
support,616.0,740.0,444.0,279.0,7,5,0.7011,2091.0,2091.0,0.737755

Unnamed: 0,Predicted as 2 yr Undergrad,Predicted as 4 yr Undergrad,Predicted as 5 yr Undergrad,Predicted as Grad,Predicted as STAP,Predicted as Unknown
Labeled as 2 yr Undergrad,360,139,98,19,0,0
Labeled as 4 yr Undergrad,138,476,91,34,0,1
Labeled as 5 yr Undergrad,50,8,360,26,0,0
Labeled as Grad,1,1,7,270,0,0
Labeled as STAP,0,0,7,0,0,0
Labeled as Unknown,5,0,0,0,0,0

Unnamed: 0,2 yr Undergrad,4 yr Undergrad,5 yr Undergrad,Grad,STAP,Unknown
intercept,6.06943,6.27401,0.326364,-2.90855,-2.346,-7.41525
Academic Year,-0.273755,0.15808,0.189285,-1.38461,-0.857558,2.16856
TAP College Code,-0.633708,-0.628574,-0.927317,2.42116,-0.980313,0.748751
Federal School Code,0.630413,0.379101,0.0386286,-0.00863484,-0.344665,-0.694843
TAP College Name,-0.157055,-0.0477283,-0.0106782,0.433878,-0.209735,-0.00868053
Sector Type,0.926113,0.407807,1.13307,-2.63263,0.177392,-0.0117496
TAP Sector Group,1.07061,0.623883,0.437183,-4.19718,0.215916,1.84959
TAP Recipient Headcount,7.24263,-3.42542,-3.7786,0.72071,-0.598216,-0.161104
TAP Recipient FTEs,-3.56634,7.03864,-4.50686,1.89698,-0.675325,-0.187093
TAP Recipient Dollars,3.33042,3.84856,2.39031,-8.64638,-0.623034,-0.299876


# Regression

In [43]:
df = pd.read_csv("/content/Tuition_Assistance_Program__TAP__Recipients___Dollars_by_College__Sector_Group__and_Level_of_Study__Beginning_2000.csv")
x_cols = [c for c in df.columns if c != "TAP COllege Code"]
x = df[x_cols]
y = df["TAP College Code"]

In [44]:
df

Unnamed: 0,Academic Year,TAP College Code,Federal School Code,Level,TAP Level of Study,TAP College Name,Sector Type,TAP Sector Group,TAP Recipient Headcount,TAP Recipient FTEs,TAP Recipient Dollars
0,2020,1075,7465,U,2 yr Undergrad,AMERICAN ACAD OF DRAMATIC ARTS,PRIVATE,5-INDEPENDENT,3,3.06,12364.30
1,2020,8323,42785,U,4 yr Undergrad,YESHIVA HECHAL SHEMUEL,PRIVATE,9-CHAPTER XXII,7,7.03,25198.17
2,2020,1040,2899,U,4 yr Undergrad,WAGNER COLLEGE 4YR UNDERGRAD,PRIVATE,5-INDEPENDENT,252,239.09,722802.93
3,2020,8244,7264,U,4 yr Undergrad,MESIVTA TORAH VODAATH RABBINICAL,PRIVATE,9-CHAPTER XXII,127,130.41,551196.83
4,2020,1016,2894,U,4 yr Undergrad,UNIV OF ROCHESTER EASTMAN UG,PRIVATE,5-INDEPENDENT,14,12.73,30044.02
...,...,...,...,...,...,...,...,...,...,...,...
11121,2000,5215,2688,G,Grad,CUNY CITY COLLEGE GRAD,PUBLIC,1-CUNY SR,56,41.75,17967.00
11122,2000,535,20662,U,5 yr Undergrad,THE NEW SCHOOL 4YR UNDERGRAD,PRIVATE,5-INDEPENDENT,22,12.25,41006.00
11123,2000,965,7109,U,2 yr Undergrad,SUC OLD WESTBURY (UNDERGRAD),PUBLIC,3-SUNY SO,10,6.75,18912.25
11124,2000,5210,2681,G,Grad,CANISIUS COLLEGE GRAD,PRIVATE,5-INDEPENDENT,44,31.75,10030.75


In [45]:
x_cols

['Academic Year',
 'TAP College Code',
 'Federal School Code',
 'Level',
 'TAP Level of Study',
 'TAP College Name',
 'Sector Type',
 'TAP Sector Group',
 'TAP Recipient Headcount',
 'TAP Recipient FTEs',
 'TAP Recipient Dollars']

In [46]:
x

Unnamed: 0,Academic Year,TAP College Code,Federal School Code,Level,TAP Level of Study,TAP College Name,Sector Type,TAP Sector Group,TAP Recipient Headcount,TAP Recipient FTEs,TAP Recipient Dollars
0,2020,1075,7465,U,2 yr Undergrad,AMERICAN ACAD OF DRAMATIC ARTS,PRIVATE,5-INDEPENDENT,3,3.06,12364.30
1,2020,8323,42785,U,4 yr Undergrad,YESHIVA HECHAL SHEMUEL,PRIVATE,9-CHAPTER XXII,7,7.03,25198.17
2,2020,1040,2899,U,4 yr Undergrad,WAGNER COLLEGE 4YR UNDERGRAD,PRIVATE,5-INDEPENDENT,252,239.09,722802.93
3,2020,8244,7264,U,4 yr Undergrad,MESIVTA TORAH VODAATH RABBINICAL,PRIVATE,9-CHAPTER XXII,127,130.41,551196.83
4,2020,1016,2894,U,4 yr Undergrad,UNIV OF ROCHESTER EASTMAN UG,PRIVATE,5-INDEPENDENT,14,12.73,30044.02
...,...,...,...,...,...,...,...,...,...,...,...
11121,2000,5215,2688,G,Grad,CUNY CITY COLLEGE GRAD,PUBLIC,1-CUNY SR,56,41.75,17967.00
11122,2000,535,20662,U,5 yr Undergrad,THE NEW SCHOOL 4YR UNDERGRAD,PRIVATE,5-INDEPENDENT,22,12.25,41006.00
11123,2000,965,7109,U,2 yr Undergrad,SUC OLD WESTBURY (UNDERGRAD),PUBLIC,3-SUNY SO,10,6.75,18912.25
11124,2000,5210,2681,G,Grad,CANISIUS COLLEGE GRAD,PRIVATE,5-INDEPENDENT,44,31.75,10030.75


In [47]:
y

0        1075
1        8323
2        1040
3        8244
4        1016
         ... 
11121    5215
11122     535
11123     965
11124    5210
11125    5205
Name: TAP College Code, Length: 11126, dtype: int64

In [48]:
automl = AutoML()
automl.fit(x, y)

Linear algorithm was disabled.
AutoML directory: AutoML_1
The task is regression with evaluation metric rmse
AutoML will use algorithms: ['Baseline', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 2 models
1_Baseline rmse 2761.685621 trained in 2.52 seconds
2_DecisionTree rmse 199.660753 trained in 12.9 seconds
* Step default_algorithms will try to check up to 3 models
3_Default_Xgboost rmse 3.994924 trained in 12.79 seconds
4_Default_NeuralNetwork rmse 43.200041 trained in 1.92 seconds
5_Default_RandomForest rmse 80.804783 trained in 10.15 seconds
* Step ensemble will try to check up to 1 model
Ensemble rmse 3.994922 trained in 0.31 seconds
AutoML fit time: 49.17 seconds
AutoML best model: 3_Default_Xgboost


AutoML()

In [49]:
df["predictions"] = automl.predict(x)

In [50]:
print("Predictions")
print(df[["TAP College Code", "predictions"]].head())

Predictions
   TAP College Code  predictions
0              1075  1071.956665
1              8323  8323.121094
2              1040  1040.532715
3              8244  8244.034180
4              1016  1016.330750


In [51]:
# get current working directory
import os
os.getcwd()

'/content'

In [59]:
folders = os.listdir()
foldersML = [x for x in folders if x.startswith('AutoML')]
print(foldersML) 

['AutoML_1']


In [60]:
!zip -r /content/AutoML_1.zip /content/AutoML_1

  adding: content/AutoML_1/ (stored 0%)
  adding: content/AutoML_1/data_info.json (deflated 76%)
  adding: content/AutoML_1/ldb_performance_boxplot.png (deflated 21%)
  adding: content/AutoML_1/4_Default_NeuralNetwork/ (stored 0%)
  adding: content/AutoML_1/4_Default_NeuralNetwork/status.txt (stored 0%)
  adding: content/AutoML_1/4_Default_NeuralNetwork/permutation_importance.png (deflated 18%)
  adding: content/AutoML_1/4_Default_NeuralNetwork/learner_fold_0_importance.csv (deflated 37%)
  adding: content/AutoML_1/4_Default_NeuralNetwork/true_vs_predicted.png (deflated 9%)
  adding: content/AutoML_1/4_Default_NeuralNetwork/README.md (deflated 50%)
  adding: content/AutoML_1/4_Default_NeuralNetwork/predicted_vs_residuals.png (deflated 4%)
  adding: content/AutoML_1/4_Default_NeuralNetwork/learner_fold_0.neural_network (stored 0%)
  adding: content/AutoML_1/4_Default_NeuralNetwork/learner_fold_0_training.log (deflated 53%)
  adding: content/AutoML_1/4_Default_NeuralNetwork/framework.jso