## Amazon SageMaker Data-Processing & Training Job

With Amazon SageMaker, you can leverage a simplified, managed experience to run data pre- or post-processing and model evaluation workloads on the Amazon SageMaker platform.

A processing job downloads input from Amazon Simple Storage Service (Amazon S3), then uploads outputs to Amazon S3 during or after the processing job.

This notebook shows how you can:

1. Run a processing job to run a scikit-learn script that cleans, pre-processes, performs feature engineering, and splits the input data into train and test sets.
2. Run a training job on the pre-processed training data to train a model
3. Predict on the trained model

The dataset used here is the [Census-Income KDD Dataset](https://archive.ics.uci.edu/ml/datasets/Census-Income+%28KDD%29). You select features from this dataset, clean the data, and turn the data into features that the training algorithm can use to train a binary classification model, and split the data into train and test sets. The task is to predict whether rows representing census responders have an income greater than `$50,000`, or less than `$50,000` by training a logistic regression model.

## Mounting the EFS filesystem

In [1]:
import boto3

client = boto3.client('efs')

In [2]:
ip_addr = client.describe_mount_targets(FileSystemId='fs-7b1a6df8')['MountTargets'][0]['IpAddress']
ip_addr

'10.0.2.136'

In [4]:
%%sh 
mkdir efs
sudo mount -t nfs \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    10.0.2.136:/ \
    ./efs

sudo chmod go+rw ./efs

mount.nfs: /home/ec2-user/SageMaker/repo_efs/efs_lambda/efs is busy or already mounted


## Data pre-processing and feature engineering

To run the scikit-learn preprocessing script as a processing job, create a `SKLearnProcessor`, which lets you run scripts inside of processing jobs using the scikit-learn image provided.

In [15]:
!jupyter kernelspec list

Available kernels:
  ir         /home/ec2-user/.local/share/jupyter/kernels/ir
  python3    /home/ec2-user/anaconda3/envs/python3/share/jupyter/kernels/python3


In [10]:
!pip install pandas==0.25.3 
!pip install scikit-learn==0.21.3

Collecting pandas==0.25.3
  Downloading pandas-0.25.3-cp36-cp36m-manylinux1_x86_64.whl (10.4 MB)
[K     |████████████████████████████████| 10.4 MB 17.5 MB/s eta 0:00:01
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.0.3
    Uninstalling pandas-1.0.3:
      Successfully uninstalled pandas-1.0.3
Successfully installed pandas-0.25.3
You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.[0m
Collecting scikit-learn==0.21.3
  Downloading scikit_learn-0.21.3-cp36-cp36m-manylinux1_x86_64.whl (6.7 MB)
[K     |████████████████████████████████| 6.7 MB 15.5 MB/s eta 0:00:01
Installing collected packages: scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 0.22.1
    Uninstalling scikit-learn-0.22.1:
      Successfully uninstalled scikit-learn-0.22.1
Successfully installed scikit-learn-0.21.3
You should consider upgrad

In [29]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor

region = boto3.session.Session().region_name

role = get_execution_role()
sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                     role=role,
                                     instance_type='ml.m5.xlarge',
                                     instance_count=1)

Before introducing the script you use for data cleaning, pre-processing, and feature engineering, inspect the first 20 rows of the dataset. The target is predicting the `income` category. The features from the dataset you select are `age`, `education`, `major industry code`, `class of worker`, `num persons worked for employer`, `capital gains`, `capital losses`, and `dividends from stocks`.

In [30]:
import pandas as pd

input_data = 's3://sagemaker-sample-data-{}/processing/census/census-income.csv'.format(region)
df = pd.read_csv(input_data, nrows=10)
df.head(n=10)

Unnamed: 0,age,class of worker,detailed industry recode,detailed occupation recode,education,wage per hour,enroll in edu inst last wk,marital stat,major industry code,major occupation code,...,country of birth father,country of birth mother,country of birth self,citizenship,own business or self employed,fill inc questionnaire for veteran's admin,veterans benefits,weeks worked in year,year,income
0,73,Not in universe,0,0,High school graduate,0,Not in universe,Widowed,Not in universe or children,Not in universe,...,United-States,United-States,United-States,Native- Born in the United States,0,Not in universe,2,0,95,- 50000.
1,58,Self-employed-not incorporated,4,34,Some college but no degree,0,Not in universe,Divorced,Construction,Precision production craft & repair,...,United-States,United-States,United-States,Native- Born in the United States,0,Not in universe,2,52,94,- 50000.
2,18,Not in universe,0,0,10th grade,0,High school,Never married,Not in universe or children,Not in universe,...,Vietnam,Vietnam,Vietnam,Foreign born- Not a citizen of U S,0,Not in universe,2,0,95,- 50000.
3,9,Not in universe,0,0,Children,0,Not in universe,Never married,Not in universe or children,Not in universe,...,United-States,United-States,United-States,Native- Born in the United States,0,Not in universe,0,0,94,- 50000.
4,10,Not in universe,0,0,Children,0,Not in universe,Never married,Not in universe or children,Not in universe,...,United-States,United-States,United-States,Native- Born in the United States,0,Not in universe,0,0,94,- 50000.
5,48,Private,40,10,Some college but no degree,1200,Not in universe,Married-civilian spouse present,Entertainment,Professional specialty,...,Philippines,United-States,United-States,Native- Born in the United States,2,Not in universe,2,52,95,- 50000.
6,42,Private,34,3,Bachelors degree(BA AB BS),0,Not in universe,Married-civilian spouse present,Finance insurance and real estate,Executive admin and managerial,...,United-States,United-States,United-States,Native- Born in the United States,0,Not in universe,2,52,94,- 50000.
7,28,Private,4,40,High school graduate,0,Not in universe,Never married,Construction,Handlers equip cleaners etc,...,United-States,United-States,United-States,Native- Born in the United States,0,Not in universe,2,30,95,- 50000.
8,47,Local government,43,26,Some college but no degree,876,Not in universe,Married-civilian spouse present,Education,Adm support including clerical,...,United-States,United-States,United-States,Native- Born in the United States,0,Not in universe,2,52,95,- 50000.
9,34,Private,4,37,Some college but no degree,0,Not in universe,Married-civilian spouse present,Construction,Machine operators assmblrs & inspctrs,...,United-States,United-States,United-States,Native- Born in the United States,0,Not in universe,2,52,94,- 50000.


This notebook cell writes a file `preprocessing.py`, which contains the pre-processing script. You can update the script, and rerun this cell to overwrite `preprocessing.py`. You run this as a processing job in the next cell. In this script, you

* Remove duplicates and rows with conflicting data
* transform the target `income` column into a column containing two labels.
* transform the `age` and `num persons worked for employer` numerical columns into categorical features by binning them
* scale the continuous `capital gains`, `capital losses`, and `dividends from stocks` so they're suitable for training
* encode the `education`, `major industry code`, `class of worker` so they're suitable for training
* split the data into training and test datasets, and saves the training features and labels and test features and labels.

Our training script will use the pre-processed training features and labels to train a model, and our model evaluation script will use the trained model and pre-processed test features and labels to evaluate the model.

In [31]:
%%writefile preprocessing.py

import argparse
import os
import warnings

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelBinarizer, KBinsDiscretizer
from sklearn.preprocessing import PolynomialFeatures
from sklearn.compose import make_column_transformer
from pickle import dump
from sklearn.exceptions import DataConversionWarning
warnings.filterwarnings(action='ignore', category=DataConversionWarning)


columns = ['age', 'education', 'major industry code', 'class of worker', 'num persons worked for employer',
           'capital gains', 'capital losses', 'dividends from stocks', 'income']
class_labels = [' - 50000.', ' 50000+.']

def print_shape(df):
    negative_examples, positive_examples = np.bincount(df['income'])
    print('Data shape: {}, {} positive examples, {} negative examples'.format(df.shape, positive_examples, negative_examples))

if __name__=='__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--train-test-split-ratio', type=float, default=0.3)
    args, _ = parser.parse_known_args()
    
    print('Received arguments {}'.format(args))

    input_data_path = os.path.join('/opt/ml/processing/input', 'census-income.csv')
    
    print('Reading input data from {}'.format(input_data_path))
    df = pd.read_csv(input_data_path)
    df = pd.DataFrame(data=df, columns=columns)
    df.dropna(inplace=True)
    df.drop_duplicates(inplace=True)
    df.replace(class_labels, [0, 1], inplace=True)
    
    negative_examples, positive_examples = np.bincount(df['income'])
    print('Data after cleaning: {}, {} positive examples, {} negative examples'.format(df.shape, positive_examples, negative_examples))
    
    split_ratio = args.train_test_split_ratio
    print('Splitting data into train and test sets with ratio {}'.format(split_ratio))
    X_train, X_test, y_train, y_test = train_test_split(df.drop('income', axis=1), df['income'], test_size=split_ratio, random_state=0)

    preprocess = make_column_transformer(
        (['age', 'num persons worked for employer'], KBinsDiscretizer(encode='onehot-dense', n_bins=10)),
        (['capital gains', 'capital losses', 'dividends from stocks'], StandardScaler()),
        (['education', 'major industry code', 'class of worker'], OneHotEncoder(sparse=False))
    )
    print('Running preprocessing and feature engineering transformations')
    train_features = preprocess.fit_transform(X_train)
    test_features = preprocess.transform(X_test)
    
    print('Train data shape after preprocessing: {}'.format(train_features.shape))
    print('Test data shape after preprocessing: {}'.format(test_features.shape))
    
    train_features_output_path = os.path.join('/opt/ml/processing/train', 'train_features.csv')
    train_labels_output_path = os.path.join('/opt/ml/processing/train', 'train_labels.csv')
    
    test_features_output_path = os.path.join('/opt/ml/processing/test', 'test_features.csv')
    test_labels_output_path = os.path.join('/opt/ml/processing/test', 'test_labels.csv')
    
    print('Saving training features to {}'.format(train_features_output_path))
    pd.DataFrame(train_features).to_csv(train_features_output_path, header=False, index=False)
    
    print('Saving test features to {}'.format(test_features_output_path))
    pd.DataFrame(test_features).to_csv(test_features_output_path, header=False, index=False)
    
    print('Saving training labels to {}'.format(train_labels_output_path))
    y_train.to_csv(train_labels_output_path, header=False, index=False)
    
    print('Saving test labels to {}'.format(test_labels_output_path))
    y_test.to_csv(test_labels_output_path, header=False, index=False)
    
    dump(preprocess, open('/opt/ml/processing/processor/preprocessor.pkl', 'wb'))

Overwriting preprocessing.py


Run this script as a processing job. Use the `SKLearnProcessor.run()` method. You give the `run()` method one `ProcessingInput` where the `source` is the census dataset in Amazon S3, and the `destination` is where the script reads this data from, in this case `/opt/ml/processing/input`. These local paths inside the processing container must begin with `/opt/ml/processing/`.

Also give the `run()` method a `ProcessingOutput`, where the `source` is the path the script writes output data to. For outputs, the `destination` defaults to an S3 bucket that the Amazon SageMaker Python SDK creates for you, following the format `s3://sagemaker-<region>-<account_id>/<processing_job_name>/output/<output_name/`. You also give the ProcessingOutputs values for `output_name`, to make it easier to retrieve these output artifacts after the job is run.

The `arguments` parameter in the `run()` method are command-line arguments in our `preprocessing.py` script.

In [None]:
from sagemaker.processing import ProcessingInput, ProcessingOutput

sklearn_processor.run(code='preprocessing.py',
                      inputs=[ProcessingInput(
                        source=input_data,
                        destination='/opt/ml/processing/input')],
                      outputs=[ProcessingOutput(output_name='train_data',
                                                source='/opt/ml/processing/train'),
                               ProcessingOutput(output_name='test_data',
                                                source='/opt/ml/processing/test'),
                               ProcessingOutput(output_name='saved_processor',
                                                source='/opt/ml/processing/processor')],
                      arguments=['--train-test-split-ratio', '0.2']
                     )

preprocessing_job_description = sklearn_processor.jobs[-1].describe()

output_config = preprocessing_job_description['ProcessingOutputConfig']
for output in output_config['Outputs']:
    if output['OutputName'] == 'train_data':
        preprocessed_training_data = output['S3Output']['S3Uri']
    if output['OutputName'] == 'test_data':
        preprocessed_test_data = output['S3Output']['S3Uri']
    if output['OutputName'] == 'saved_processor':
        preprocessor = output['S3Output']['S3Uri']

In [None]:
preprocessor

In [61]:
%%sh
mkdir efs/ml/sagemaker_model
os.system("aws s3 cp {} efs/ml/sagemaker_model".format(preprocessor))


Completed 256.0 KiB/99.1 MiB (1.6 MiB/s) with 1 file(s) remainingCompleted 512.0 KiB/99.1 MiB (3.2 MiB/s) with 1 file(s) remainingCompleted 768.0 KiB/99.1 MiB (4.6 MiB/s) with 1 file(s) remainingCompleted 1.0 MiB/99.1 MiB (6.0 MiB/s) with 1 file(s) remaining  Completed 1.2 MiB/99.1 MiB (7.4 MiB/s) with 1 file(s) remaining  Completed 1.5 MiB/99.1 MiB (8.7 MiB/s) with 1 file(s) remaining  Completed 1.8 MiB/99.1 MiB (10.0 MiB/s) with 1 file(s) remaining Completed 2.0 MiB/99.1 MiB (11.4 MiB/s) with 1 file(s) remaining Completed 2.2 MiB/99.1 MiB (12.4 MiB/s) with 1 file(s) remaining Completed 2.5 MiB/99.1 MiB (13.6 MiB/s) with 1 file(s) remaining Completed 2.8 MiB/99.1 MiB (14.7 MiB/s) with 1 file(s) remaining Completed 3.0 MiB/99.1 MiB (15.9 MiB/s) with 1 file(s) remaining Completed 3.2 MiB/99.1 MiB (16.8 MiB/s) with 1 file(s) remaining Completed 3.5 MiB/99.1 MiB (18.0 MiB/s) with 1 file(s) remaining Completed 3.8 MiB/99.1 MiB (18.8 MiB/s) with 1 file(s) remaining Completed 

mkdir: cannot create directory ‘efs/ml/sagemaker_model’: File exists
mkdir: cannot create directory ‘efs/ml/sagemaker_model/test’: File exists
mkdir: cannot create directory ‘efs/ml/sagemaker_model/train’: File exists


Now inspect the output of the pre-processing job, which consists of the processed features.

In [9]:
training_features = pd.read_csv(preprocessed_training_data + '/train_features.csv', nrows=10)
print('Training features shape: {}'.format(training_features.shape))
training_features.head(n=10)

Training features shape: (10, 73)


Unnamed: 0,0.0,0.0.1,0.0.2,0.0.3,0.0.4,1.0,0.0.5,0.0.6,0.0.7,0.0.8,...,0.0.56,0.0.57,0.0.58,0.0.59,0.0.60,1.0.4,0.0.61,0.0.62,0.0.63,0.0.64
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0


## Training using the pre-processed data

We create a `SKLearn` instance, which we will use to run a training job using the training script `train.py`.  

In [10]:
from sagemaker.sklearn.estimator import SKLearn

sklearn = SKLearn(
    entry_point='train.py',
    train_instance_type="ml.m5.xlarge",
    role=role)

The training script `train.py` trains a logistic regression model on the training data, and saves the model to the `/opt/ml/model` directory, which Amazon SageMaker tars and uploads into a `model.tar.gz` file into S3 at the end of the training job.

In [11]:
%%writefile train.py

import os

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.externals import joblib

if __name__=="__main__":
    training_data_directory = '/opt/ml/input/data/train'
    train_features_data = os.path.join(training_data_directory, 'train_features.csv')
    train_labels_data = os.path.join(training_data_directory, 'train_labels.csv')
    print('Reading input data')
    X_train = pd.read_csv(train_features_data, header=None)
    y_train = pd.read_csv(train_labels_data, header=None)

    model = LogisticRegression(class_weight='balanced', solver='lbfgs')
    print('Training LR model')
    model.fit(X_train, y_train)
    model_output_directory = os.path.join('/opt/ml/model', "model.joblib")
    print('Saving model to {}'.format(model_output_directory))
    joblib.dump(model, model_output_directory)

Writing train.py


Run the training job using `train.py` on the preprocessed training data.

In [None]:
sklearn.fit({'train': preprocessed_training_data})
training_job_description = sklearn.jobs[-1].describe()
model_data_s3_uri = '{}{}/{}'.format(
    training_job_description['OutputDataConfig']['S3OutputPath'],
    training_job_description['TrainingJobName'],
    'output/model.tar.gz')

In [None]:
model_data_s3_uri

In [46]:
os.system("aws s3 cp {} efs/ml/sagemaker_model".format(model_data_s3_uri))
os.system("tar -xzf efs/ml/sagemaker_model/model.tar.gz --directory efs/ml/sagemaker_model")

0

## Creating Sample Test Data For Inference

In [None]:
os.system('mkdir test_data')

In [49]:
import pandas as pd
df = pd.read_csv('s3://sagemaker-sample-data-us-east-1/processing/census/census-income.csv')

In [81]:
df1 =df.sample(5)
df1['income']

88215       50000+.
94347       50000+.
135917     - 50000.
175569     - 50000.
74772      - 50000.
Name: income, dtype: object

In [84]:
pd.DataFrame(df1).to_csv("test_data/test_data.csv", index=False)

In [None]:
%%sh

aws s3 cp test_data/test_data.csv s3://lambdatestbucket

In [87]:
df2 = pd.read_csv("s3://lambdatestbucket/test_data.csv")
df2

Unnamed: 0,age,class of worker,detailed industry recode,detailed occupation recode,education,wage per hour,enroll in edu inst last wk,marital stat,major industry code,major occupation code,...,country of birth father,country of birth mother,country of birth self,citizenship,own business or self employed,fill inc questionnaire for veteran's admin,veterans benefits,weeks worked in year,year,income
0,51,Private,35,17,Some college but no degree,0,Not in universe,Married-civilian spouse present,Finance insurance and real estate,Sales,...,United-States,United-States,United-States,Native- Born in the United States,1,Not in universe,2,52,95,50000+.
1,55,Private,25,3,High school graduate,0,Not in universe,Married-civilian spouse present,Manufacturing-nondurable goods,Executive admin and managerial,...,United-States,United-States,United-States,Native- Born in the United States,0,Not in universe,2,52,94,50000+.
2,3,Not in universe,0,0,Children,0,Not in universe,Never married,Not in universe or children,Not in universe,...,United-States,United-States,United-States,Native- Born in the United States,0,Not in universe,0,0,94,- 50000.
3,31,Private,34,2,Bachelors degree(BA AB BS),0,Not in universe,Married-civilian spouse present,Finance insurance and real estate,Executive admin and managerial,...,United-States,United-States,United-States,Native- Born in the United States,2,Not in universe,2,51,94,- 50000.
4,28,Private,4,40,High school graduate,0,Not in universe,Married-civilian spouse present,Construction,Handlers equip cleaners etc,...,United-States,United-States,United-States,Native- Born in the United States,2,Not in universe,2,14,95,- 50000.


## Inference


In [1]:
import json
import os
import tarfile
from pickle import load
import pandas as pd

from sklearn.externals import joblib
from sklearn.metrics import classification_report, roc_auc_score, accuracy_score

model_path = os.path.join('efs/ml/sagemaker_model', 'model.joblib')
preprocessor_path = os.path.join('efs/ml/sagemaker_model','preprocessor.pkl' )
preprocessor = load(open(preprocessor_path, 'rb'))
print("Preprocessor Loaded")

print('Loading model')
model = joblib.load(model_path)



Preprocessor Loaded
Loading model




In [2]:
columns = ['age', 'education', 'major industry code', 'class of worker', 'num persons worked for employer',
       'capital gains', 'capital losses', 'dividends from stocks', 'income']
class_labels = [' - 50000.', ' 50000+.']

In [3]:
print('Loading test input data')
test_data = "test_data/test_data.csv"
df = pd.read_csv(test_data)
df = pd.DataFrame(data=df, columns=columns)
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)
df.replace(class_labels, [0, 1], inplace=True)
X_test = df.drop('income', axis=1)
y_test = df['income']
X_test

Loading test input data


Unnamed: 0,age,education,major industry code,class of worker,num persons worked for employer,capital gains,capital losses,dividends from stocks
0,51,Some college but no degree,Finance insurance and real estate,Private,6,5178,0,0
1,55,High school graduate,Manufacturing-nondurable goods,Private,6,0,0,175
2,3,Children,Not in universe or children,Not in universe,0,0,0,0
3,31,Bachelors degree(BA AB BS),Finance insurance and real estate,Private,2,0,0,0
4,28,High school graduate,Construction,Private,4,0,0,0


In [4]:
print('Running preprocessing and feature engineering transformations')
test_features = preprocessor.transform(X_test)

test_features_output_path = os.path.join('test_data', 'test_features.csv')  
test_labels_output_path = os.path.join('test_data', 'test_labels.csv')

print('Saving test features to {}'.format(test_features_output_path))
pd.DataFrame(test_features).to_csv(test_features_output_path, header=False, index=False)

print('Saving test labels to {}'.format(test_labels_output_path))
y_test.to_csv(test_labels_output_path, header=False, index=False)
    
X_test = pd.read_csv(test_features_output_path, header=None)
actual_values = pd.read_csv(test_labels_output_path, header=None)

Running preprocessing and feature engineering transformations
Saving test features to test_data/test_features.csv
Saving test labels to test_data/test_labels.csv


In [5]:
predictions = model.predict(X_test)
predictions_df = pd.DataFrame(predictions)
predictions_df.replace([0,1], ["Less than 50K", "Greater than 50K"], inplace=True)
actual_values.replace([0,1], ["Less than 50K", "Greater than 50K"], inplace=True)

In [6]:
print("Actual Values:")
actual_values

Actual Values:


Unnamed: 0,0
0,Greater than 50K
1,Greater than 50K
2,Less than 50K
3,Less than 50K
4,Less than 50K


In [7]:
print("Model Predicitons:")
predictions_df

Model Predicitons:


Unnamed: 0,0
0,Greater than 50K
1,Less than 50K
2,Less than 50K
3,Greater than 50K
4,Less than 50K
