## CAS Connection

### Connect to the CAS Server

In [1]:
import swat
s = swat.CAS(host, port)
s.session.setLocale(locale="en_US") 
s.sessionProp.setSessOpt(timeout=864000)
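The connection cell uses `host` and `port` without defining them. One possible way to supply them is sketched below, assuming they are kept in environment variables. The variable names `CAS_HOST` and `CAS_PORT`, the helper `cas_connection_params`, and the fallback values are hypothetical and not part of this notebook.

```python
import os

def cas_connection_params(env=None):
    """Return (host, port) read from environment-style settings.

    CAS_HOST and CAS_PORT are hypothetical variable names chosen for
    this sketch; substitute whatever your deployment uses.
    """
    env = os.environ if env is None else env
    host = env.get("CAS_HOST", "localhost")   # assumed fallback
    port = int(env.get("CAS_PORT", "5570"))   # assumed fallback
    return host, port

# Passing a plain dict keeps the example independent of the real environment.
host, port = cas_connection_params({"CAS_HOST": "cas.example.com", "CAS_PORT": "5570"})
print(host, port)
```

The resulting `host` and `port` can then be passed to `swat.CAS(host, port)` as in the cell above.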



# Document Classification
## Part 2: Forest Models and Autotune
In this notebook, you build forest models that perform the classification task. To improve the models and illustrate the power of SAS Optimization, hyperparameter tuning is automated with the tuneForest action in the Autotune action set. The autotuned models improve significantly on the forest models trained with default parameters.

# Load Data

The Cora data set is publicly available via [this hyperlink](https://linqs.soe.ucsc.edu/data).

In [2]:
import document_classification_scripts as scripts
import importlib
importlib.reload(scripts)
from document_classification_scripts import AttributeDict, nClasses, nWords, targetColumn, baseFeatureList
demo = scripts.Demo(s)

NOTE: Added action set 'sampling'.
NOTE: Added action set 'pca'.
NOTE: Added action set 'fedsql'.
NOTE: Added action set 'deepLearn'.
NOTE: Added action set 'network'.
NOTE: Added action set 'transpose'.
NOTE: Added action set 'table'.
NOTE: Added action set 'builtins'.
NOTE: Added action set 'neuralNet'.
NOTE: Added action set 'autotune'.
NOTE: Added action set 'session'.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'aStore'.
NOTE: Added action set 'aggregation'.


In [3]:
demo.loadRawData()

NOTE: Cloud Analytic Services made the uploaded file available as table CONTENT in caslib CASUSERHDFS(brrees).
NOTE: The table CONTENT has been created in caslib CASUSERHDFS(brrees) from binary data uploaded to Cloud Analytic Services.
NOTE: Cloud Analytic Services made the uploaded file available as table CITES in caslib CASUSERHDFS(brrees).
NOTE: The table CITES has been created in caslib CASUSERHDFS(brrees) from binary data uploaded to Cloud Analytic Services.


# Data Preprocessing
### Creates a custom format definition for target labels

In [4]:
demo.defineTargetVariableFormat()

NOTE: Format library MYFMTLIB added. Format search update using parameter APPEND completed.


### Partitions data into training and test sets

In [5]:
demo.loadOrPartitionData()

NOTE: Cloud Analytic Services added the caslib 'cora'.
NOTE: Cloud Analytic Services made the file contentPartitioned.sashdat available as table CONTENTPARTITIONED in caslib CASUSERHDFS(brrees).
NOTE: Cloud Analytic Services made the file contentTrain.sashdat available as table CONTENTTRAIN in caslib CASUSERHDFS(brrees).
NOTE: Cloud Analytic Services made the file contentTest.sashdat available as table CONTENTTEST in caslib CASUSERHDFS(brrees).


### Performs Principal Component Analysis (PCA)

In [6]:
nPca = 40
demo.performPca(nPca)
pcaFeatureList = [f"pca{i}" for i in range(1,nPca)]
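Note that `range(1, nPca)` stops at `nPca - 1`, so the feature list built in the cell above names 39 columns when `nPca = 40`. The sketch below only illustrates that arithmetic. Whether `performPca` produces a column named `pca40` is not shown in this notebook, so the inclusive variant (`allComponents`, a hypothetical name) is an assumption to check against the actual table.

```python
nPca = 40

# Same comprehension as in the cell above: range(1, nPca) stops at nPca - 1.
pcaFeatureList = [f"pca{i}" for i in range(1, nPca)]
print(len(pcaFeatureList), pcaFeatureList[0], pcaFeatureList[-1])

# Hypothetical variant that would cover pca1 through pca40 inclusive.
allComponents = [f"pca{i}" for i in range(1, nPca + 1)]
print(len(allComponents), allComponents[-1])
```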



### Joins citations and training data targets

In [7]:
demo.joinTrainingTargets()

NOTE: Table CITESTRAIN was created in caslib CASUSERHDFS(brrees) with 3562 rows returned.
NOTE: Table CITESCOMBINED was created in caslib CASUSERHDFS(brrees) with 5429 rows returned.


## Generate Network Features

In [8]:
%%capture
networkParam=AttributeDict({
    "useCentrality":True,
    "useNodeSimilarity":True,
    "useCommunity":True,
    "useCore":True
})

tableContentNetwork, networkFeatureList = demo.addNetworkFeatures(
    "contentTrain", "citesTrain", networkParam)
tableContentPartitionedNetwork, networkFeatureList = demo.addNetworkFeatures(
    "contentPartitioned", "citesCombined", networkParam)

tableContentNetworkPca, networkFeatureList = demo.addNetworkFeatures(
    "contentTrainPca", "citesTrain", networkParam)
tableContentPartitionedNetworkPca, networkFeatureList = demo.addNetworkFeatures(
    "contentPartitionedPca", "citesCombined", networkParam)

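The `networkParam` options are wrapped in `AttributeDict`, which is imported from `document_classification_scripts` and whose implementation is not shown here. A minimal sketch of how such a class could behave, assuming it is simply a dictionary whose keys are also readable as attributes, is:

```python
class AttributeDict(dict):
    """Stand-in sketch: a dictionary whose keys can also be read and set as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc

    def __setattr__(self, name, value):
        self[name] = value

networkParam = AttributeDict({"useCentrality": True, "useCore": False})
print(networkParam.useCentrality, networkParam["useCore"])
```

With a class like this, code can test `networkParam.useCommunity` instead of `networkParam["useCommunity"]`, which keeps option checks short.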
In [9]:
s.datastep.runCode(
    code = f"data contentTestNetwork; set {tableContentPartitionedNetwork}(where=(partition=0)); run;"
)
print(f"contentTestNetwork: (rows, cols) = {s.CASTable('contentTestNetwork').shape}")

s.datastep.runCode(
    code = f"data contentTestPcaNetwork; set {tableContentPartitionedNetworkPca}(where=(partition=0)); run;"
)
print(f"contentTestPcaNetwork: (rows, cols) = {s.CASTable('contentTestPcaNetwork').shape}")

contentTestNetwork: (rows, cols) = (542, 1485)
contentTestPcaNetwork: (rows, cols) = (542, 92)


# Build Forest Classifiers
Using the Decision Tree action set, you can train a forest model. With default hyperparameters and baseline features, this model predicts poorly compared to the neural networks trained in Part 1. Adding network features to the model and finding better hyperparameters with the tuneForest action in the Autotune action set both improve accuracy significantly.

In [10]:
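import numpy as np

def trainForestModel(modelName, tableTrain, featureList, forestParam=None, randomSeed=12345):
    # Train a forest; save the model table and an analytic store (AStore) for scoring.
    # NOTE: forestParam is accepted so that tuned settings can be passed through, but
    # this listing does not show how they are mapped onto forestTrain options.
    return s.decisionTree.forestTrain(
        inputs=featureList,
        target=targetColumn,
        nominal={targetColumn},
        table=tableTrain,
        varImp=True,
        seed=randomSeed,
        casOut={"name": modelName, "replace": True},
        saveState={"name": f"{modelName}AStore", "replace": True}
    )

def scoreForestModel(modelName, tableTest):
    # Score the test table with the saved AStore, keeping the node ID and true label.
    s.aStore.score(
        table=tableTest,
        rstore=f"{modelName}AStore",
        copyVars={"node", "target"},
        casout={"name": f"{modelName}Scored", "replace": True}
    )
    # Count correct predictions in a single-threaded DATA step.
    s.datastep.runCode(
        single="YES",
        code=f"""
        data {modelName}Scores;
           set {modelName}Scored end=_end;
           retain correct count 0;
           if I_target EQ target then correct = correct+1;
           count=count+1;
           if _end then do;
              accuracy = correct / count;
              misclassification = 1 - accuracy;
              output;
           end;
           keep correct count accuracy misclassification;
        run;
     """
    )
    # Fetch the one-row summary table and report the accuracy.
    scores = s.CASTable(f"{modelName}Scores").to_frame()
    accuracy = float(scores["accuracy"].iloc[0])
    print(f"Accuracy = {accuracy}")
    return accuracy

def bootstrapForestModel(modelName, tableTrain, tableTest, featureList, forestParam=None, n=25):
    # partitionData is a helper that is not defined in this listing; it is assumed
    # to sample frac1 percent of tableTest into table1Out.
    accuracies = []
    for i in range(n):
        partitionData(tableIn=tableTest, tableOut=f"{tableTest}Part_", table1Out=f"{tableTest}Boot_", table2Out=None, frac1=90, randomSeed=(i+5678), partName="bootstrap")
        trainForestModel(modelName, tableTrain, featureList,
                         forestParam=forestParam, randomSeed=(12345+i))
        accuracies.append(scoreForestModel(modelName, f"{tableTest}Boot_"))
    print(f"Bootstrap Accuracy = {np.mean(accuracies)} +- {np.std(accuracies)}")
    return accuracies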
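The DATA step inside `scoreForestModel` counts the rows where the predicted label `I_target` equals the true `target`, then derives accuracy and misclassification from the two counters. The same arithmetic is sketched below in plain Python. The function name `accuracy_summary` and the sample labels are made up for illustration and do not come from the Cora tables.

```python
def accuracy_summary(predicted, actual):
    """Mirror the DATA step: count matches, then derive accuracy and misclassification."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one row is required")
    count = len(actual)
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    accuracy = correct / count
    return {
        "correct": correct,
        "count": count,
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
    }

# Made-up labels for illustration only.
predicted = ["Theory", "Neural_Networks", "Theory", "Case_Based"]
actual = ["Theory", "Neural_Networks", "Case_Based", "Case_Based"]
summary = accuracy_summary(predicted, actual)
print(summary)
```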

## Train Baseline Forest Model

In [11]:
baseForestModel = "baseForestModel"

In [12]:
%%time
resultsTrainBaseForest = demo.trainForestModel(
    baseForestModel, "contentTrain", baseFeatureList)
resultsTrainBaseForest['OutputCasTables']

NOTE: 1274001 bytes were written to the table "baseForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
CPU times: user 31.2 ms, sys: 0 ns, total: 31.2 ms
Wall time: 32.7 s


### Score Baseline Forest Model

In [13]:
resultsScoreBaseForest=demo.scoreForestModel(baseForestModel,"contentTest")

Accuracy = 0.43357933579335795


### Bootstrap Runs

In [14]:
%%time
accuracies = demo.bootstrapForestModel(baseForestModel,"contentTrain",
                          "contentTest",
                          baseFeatureList);

NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5678 for sampling.
NOTE: 1274001 bytes were written to the table "baseForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.4385245901639344
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5679 for sampling.
NOTE: 1267373 bytes were written to the table "baseForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.4139344262295082
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5680 for sampling.
NOTE: 1269965 bytes were written to the table "baseForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.4323770491803279
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5681 for sampling.
NOTE: 1270045 bytes were written to the table "baseForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.41598360655737704
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5682 for sampling.
NOTE: 1272049 bytes were written to the table "baseFores
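Each bootstrap run retrains the model with a new seed and scores it on a 90% random sample of the test table. The per-run accuracies are then summarized by their mean and standard deviation. A small sketch of that summary, using only the four accuracies visible in the truncated output above and the standard library instead of NumPy, is:

```python
import statistics

# The four accuracies visible in the output above (the rest is truncated).
accuracies = [0.4385245901639344, 0.4139344262295082,
              0.4323770491803279, 0.41598360655737704]

mean_acc = statistics.fmean(accuracies)
# pstdev is the population standard deviation, matching np.std's default (ddof=0).
std_acc = statistics.pstdev(accuracies)
print(f"Bootstrap Accuracy = {mean_acc:.4f} +- {std_acc:.4f}")
```

Because only four of the runs are included, these figures illustrate the calculation and are not the notebook's final bootstrap estimate.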

## Train PCA Forest Model

In [15]:
pcaForestModel = "pcaForestModel"
pcaFeatureList = [f"pca{i}" for i in range(1,nPca)]

In [16]:
%%time
resultsTrainPcaForest = demo.trainForestModel(
    pcaForestModel, "contentTrainPca", pcaFeatureList)
resultsTrainPcaForest['OutputCasTables']

NOTE: 1158292 bytes were written to the table "pcaForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
CPU times: user 15.6 ms, sys: 0 ns, total: 15.6 ms
Wall time: 2.08 s


### Score PCA Forest Model

In [17]:
resultsScorePcaForest=demo.scoreForestModel(pcaForestModel,"contentTestPca")

Accuracy = 0.44095940959409596


### Bootstrap Runs

In [18]:
%%time
accuracies = demo.bootstrapForestModel(pcaForestModel,"contentTrainPca",
                          "contentTestPca",
                          pcaFeatureList);

NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5678 for sampling.
NOTE: 1158292 bytes were written to the table "pcaForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.4385245901639344
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5679 for sampling.
NOTE: 1153612 bytes were written to the table "pcaForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.4180327868852459
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5680 for sampling.
NOTE: 1153708 bytes were written to the table "pcaForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.41598360655737704
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5681 for sampling.
NOTE: 1155204 bytes were written to the table "pcaForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.4385245901639344
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5682 for sampling.
NOTE: 1155108 bytes were written to the table "pcaForestMode

## Train Network-Features-Only Forest Model

In [19]:
networkForestModel = "networkForestModel"

In [20]:
%%time
resultsTrainNetworkForest = demo.trainForestModel(
    networkForestModel, "contentTrainNetwork", networkFeatureList)
resultsTrainNetworkForest['OutputCasTables']

NOTE: 1366499 bytes were written to the table "networkForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
CPU times: user 31.2 ms, sys: 0 ns, total: 31.2 ms
Wall time: 2.55 s


### Score Network-Features-Only Forest Model

In [21]:
resultsScoreNetworkForest=demo.scoreForestModel(networkForestModel,"contentTestNetwork")

Accuracy = 0.8081180811808119


### Bootstrap Runs

In [22]:
%%time
accuracies = demo.bootstrapForestModel(networkForestModel,"contentTrainNetwork",
                          "contentTestNetwork",
                          networkFeatureList);

NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5678 for sampling.
NOTE: 1366499 bytes were written to the table "networkForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.8114754098360656
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5679 for sampling.
NOTE: 1355451 bytes were written to the table "networkForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.805327868852459
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5680 for sampling.
NOTE: 1352235 bytes were written to the table "networkForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.7991803278688525
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5681 for sampling.
NOTE: 1352267 bytes were written to the table "networkForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.8012295081967213
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5682 for sampling.
NOTE: 1353875 bytes were written to the table 

## Train Baseline+Network Features Forest Model

In [23]:
networkBaseForestModel = "networkBaseForestModel"

In [24]:
%%time
resultsTrainNetworkBaseForest = demo.trainForestModel(
    networkBaseForestModel, "contentTrainNetwork", baseFeatureList+networkFeatureList)
resultsTrainNetworkBaseForest['OutputCasTables']

NOTE: 1608867 bytes were written to the table "networkBaseForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
CPU times: user 46.9 ms, sys: 0 ns, total: 46.9 ms
Wall time: 34.5 s


### Score Baseline+Network Forest Model

In [25]:
resultsScoreNetworkBaseForest=demo.scoreForestModel(networkBaseForestModel,"contentTestNetwork")

Accuracy = 0.7416974169741697


### Bootstrap Runs

In [26]:
%%time
accuracies = demo.bootstrapForestModel(networkBaseForestModel,"contentTrainNetwork",
                          "contentTestNetwork",
                          baseFeatureList+networkFeatureList);

NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5678 for sampling.
NOTE: 1608867 bytes were written to the table "networkBaseForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.7459016393442623
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5679 for sampling.
NOTE: 1615683 bytes were written to the table "networkBaseForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.7418032786885246
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5680 for sampling.
NOTE: 1616983 bytes were written to the table "networkBaseForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.7295081967213115
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5681 for sampling.
NOTE: 1612143 bytes were written to the table "networkBaseForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.7295081967213115
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5682 for sampling.
NOTE: 1614167 bytes were writ

## Train PCA+Network Features Forest Model

In [27]:
networkPcaForestModel = "networkPcaForestModel"

In [28]:
%%time
resultsTrainNetworkPcaForest = demo.trainForestModel(
    networkPcaForestModel, "contentTrainPcaNetwork", pcaFeatureList+networkFeatureList)
resultsTrainNetworkPcaForest['OutputCasTables']

NOTE: 1334778 bytes were written to the table "networkPcaForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
CPU times: user 15.6 ms, sys: 0 ns, total: 15.6 ms
Wall time: 3.61 s


### Score PCA+Network Forest Model

In [29]:
resultsScoreNetworkPcaForest=demo.scoreForestModel(networkPcaForestModel,"contentTestPcaNetwork")

Accuracy = 0.7915129151291513


### Bootstrap Runs

In [30]:
%%time
accuracies = demo.bootstrapForestModel(networkPcaForestModel,"contentTrainPcaNetwork",
                          "contentTestPcaNetwork",
                          pcaFeatureList+networkFeatureList);

NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5678 for sampling.
NOTE: 1334778 bytes were written to the table "networkPcaForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.7950819672131147
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5679 for sampling.
NOTE: 1336482 bytes were written to the table "networkPcaForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.7971311475409836
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5680 for sampling.
NOTE: 1333346 bytes were written to the table "networkPcaForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.7971311475409836
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5681 for sampling.
NOTE: 1341178 bytes were written to the table "networkPcaForestModelAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.7971311475409836
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5682 for sampling.
NOTE: 1342754 bytes were written 

# Autotune the Forest Models

In [31]:
import os
import pandas as pd

def tuneForestModel(modelName, tableTrain, featureList, tunerOptions=None):
    if tunerOptions is None:
        tunerOptions = {
            "seed": 123,
            "objective": "MISC"
        }
    result = s.autotune.tuneForest(
        trainOptions={
            "table": tableTrain,
            "inputs": featureList,
            "target": targetColumn,
            "nominal": {targetColumn},
            "casout": {"name": modelName, "replace": True},
            "saveState": {"name": f"{modelName}AStore", "replace": True}
        },
        tunerOptions=tunerOptions
    )
    return result

def loadOrTuneForestModel(
        modelName,
        tableTrain,
        featureList,
        tunerOptions=None,
        newRun=False):
    coraCaslib = "cora"
    addCaslibIfNeeded(coraCaslib)

    r = s.table.fileInfo(caslib=coraCaslib)
    if f"{modelName}AStore.sashdat" not in r.FileInfo["Name"].unique():
        newRun = True
    if not os.path.exists(f"../data/{modelName}Best.pkl"):
        newRun = True

    if newRun:
        r = tuneForestModel(
            modelName, tableTrain, featureList, tunerOptions)
        saveTables([f"{modelName}AStore", f"{modelName}"])
        r.BestConfiguration.to_pickle(f"../data/{modelName}Best.pkl")
        return r.BestConfiguration
    else:
        loadTables([f"{modelName}AStore", f"{modelName}"])
        bestConfiguration = pd.read_pickle(f"../data/{modelName}Best.pkl")
        return bestConfiguration
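`loadOrTuneForestModel` follows a load-or-compute pattern: it reruns the tuning only when the saved artifacts are missing or `newRun` is set, and otherwise reloads the stored result. The sketch below shows that pattern in isolation with the standard library. The names `load_or_compute` and `expensive_tuning`, the file name, and the returned values are hypothetical stand-ins, not the notebook's helpers or real tuning output.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute, new_run=False):
    """Return a cached result from `path`, recomputing when it is missing or new_run is set."""
    if new_run or not os.path.exists(path):
        result = compute()
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    with open(path, "rb") as f:
        return pickle.load(f)

calls = []

def expensive_tuning():
    # Stand-in for a long autotune run; the values are invented for illustration.
    calls.append(1)
    return {"nTree": 100, "maxLevel": 10}

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "exampleBest.pkl")
    first = load_or_compute(cache, expensive_tuning)   # runs the tuning
    second = load_or_compute(cache, expensive_tuning)  # served from the pickle
    print(first == second, len(calls))
```

Caching this way keeps repeated notebook runs fast, at the cost of having to set `newRun=True` whenever the training data or feature list changes.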

## Autotune PCA Features Model

In [32]:
newRun=False

In [33]:
pcaModelAuto = "pcaModelAuto"

In [34]:
forestParamAuto = demo.loadOrTuneForestModel(pcaModelAuto,
                           "contentTrainPca",
                           pcaFeatureList,
                           newRun=newRun
                          )

NOTE: Cloud Analytic Services made the file pcaModelAutoAStore.sashdat available as table PCAMODELAUTOASTORE in caslib CASUSERHDFS(brrees).
NOTE: Cloud Analytic Services made the file pcaModelAuto.sashdat available as table PCAMODELAUTO in caslib CASUSERHDFS(brrees).


### Score Autotuned PCA Features Model

In [35]:
resultsScorePcaModelAuto=demo.scoreForestModel(pcaModelAuto,"contentTestPca")

Accuracy = 0.6881918819188192


### Bootstrap Runs

In [36]:
%%time
accuracies = demo.bootstrapForestModel(pcaModelAuto,"contentTrainPca",
                          "contentTestPca",
                          pcaFeatureList,
                          forestParamAuto,
                          25);

NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5678 for sampling.
NOTE: 27483492 bytes were written to the table "pcaModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.694672131147541
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5679 for sampling.
NOTE: 27513260 bytes were written to the table "pcaModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.6844262295081968
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5680 for sampling.
NOTE: 27535980 bytes were written to the table "pcaModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.680327868852459
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5681 for sampling.
NOTE: 27490660 bytes were written to the table "pcaModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.6844262295081968
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5682 for sampling.
NOTE: 27451780 bytes were written to the table "pcaModelAutoAStore"

## Autotune Network-Features-Only Model

In [37]:
networkModelAuto = "networkModelAuto"

In [38]:
forestParamAuto = demo.loadOrTuneForestModel(networkModelAuto,
                           "contentTrainNetwork",
                           networkFeatureList,
                           newRun=newRun
                          )

NOTE: Cloud Analytic Services made the file networkModelAutoAStore.sashdat available as table NETWORKMODELAUTOASTORE in caslib CASUSERHDFS(brrees).
NOTE: Cloud Analytic Services made the file networkModelAuto.sashdat available as table NETWORKMODELAUTO in caslib CASUSERHDFS(brrees).


### Score Autotuned Network-Features-Only Model

In [39]:
resultsScoreNetworkModelAuto=demo.scoreForestModel(networkModelAuto,"contentTestNetwork")

Accuracy = 0.8523985239852399


### Bootstrap Runs

In [40]:
%%time
accuracies = demo.bootstrapForestModel(networkModelAuto,"contentTrainNetwork",
                          "contentTestNetwork",
                          networkFeatureList,
                          forestParamAuto,
                          25);

NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5678 for sampling.
NOTE: 6362659 bytes were written to the table "networkModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.8463114754098361
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5679 for sampling.
NOTE: 6363299 bytes were written to the table "networkModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.8504098360655737
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5680 for sampling.
NOTE: 6358395 bytes were written to the table "networkModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.8442622950819673
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5681 for sampling.
NOTE: 6356387 bytes were written to the table "networkModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.8463114754098361
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5682 for sampling.
NOTE: 6353107 bytes were written to the table "networ

## Autotune PCA+Network Model

In [41]:
networkPcaModelAuto = "networkPcaModelAuto"

In [42]:
forestParamAuto = demo.loadOrTuneForestModel(networkPcaModelAuto,
                           "contentTrainPcaNetwork",
                           pcaFeatureList + networkFeatureList,
                           newRun=newRun
                          )

NOTE: Cloud Analytic Services made the file networkPcaModelAutoAStore.sashdat available as table NETWORKPCAMODELAUTOASTORE in caslib CASUSERHDFS(brrees).
NOTE: Cloud Analytic Services made the file networkPcaModelAuto.sashdat available as table NETWORKPCAMODELAUTO in caslib CASUSERHDFS(brrees).


### Score Autotuned PCA+Network Model

In [43]:
resultsScoreNetworkPcaModelAuto=demo.scoreForestModel(networkPcaModelAuto,"contentTestPcaNetwork")

Accuracy = 0.8523985239852399


### Bootstrap Runs

In [44]:
%%time
accuracies = demo.bootstrapForestModel(networkPcaModelAuto,"contentTrainPcaNetwork",
                          "contentTestPcaNetwork",
                          pcaFeatureList+networkFeatureList,
                          forestParamAuto,
                          25);

NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5678 for sampling.
NOTE: 6615870 bytes were written to the table "networkPcaModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.8545081967213115
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5679 for sampling.
NOTE: 6621238 bytes were written to the table "networkPcaModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.8565573770491803
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5680 for sampling.
NOTE: 6590182 bytes were written to the table "networkPcaModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.8504098360655737
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5681 for sampling.
NOTE: 6590214 bytes were written to the table "networkPcaModelAutoAStore" in the caslib "CASUSERHDFS(brrees)".
Accuracy = 0.8504098360655737
NOTE: Simple Random Sampling is in effect.
NOTE: Using SEED=5682 for sampling.
NOTE: 6589022 bytes were written to the t
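To make the effect of autotuning concrete, the single-run test accuracies reported by the scoring cells above can be compared side by side. The sketch below copies those values by hand and computes the gain for the three feature sets that were tuned. The baseline and baseline+network models were not autotuned in this notebook, so they are omitted.

```python
# Single-run test accuracies copied from the scoring cells above:
# (default hyperparameters, autotuned hyperparameters).
results = {
    "PCA": (0.44095940959409596, 0.6881918819188192),
    "Network only": (0.8081180811808119, 0.8523985239852399),
    "PCA + network": (0.7915129151291513, 0.8523985239852399),
}

gains = {}
for name, (default_acc, tuned_acc) in results.items():
    gains[name] = tuned_acc - default_acc
    print(f"{name:<14} default={default_acc:.3f} tuned={tuned_acc:.3f} gain={gains[name]:+.3f}")
```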

# Session Cleanup

In [45]:
s.terminate();