### Install the DataRobot client and mount Google Drive

In [None]:
# installing datarobot client
!pip install datarobot

In [None]:
# mount Google Drive (Colab only) so the notebook can read the files stored there
from google.colab import drive
drive.mount('/content/drive')

### Configure application credentials

To use Python with DataRobot, you first need to establish a connection between your application or notebook and the DataRobot server. This requires your credentials, consisting of a DataRobot application *endpoint* URL and the *API access token* for your DataRobot account.

* To retrieve your access token, log in to your DataRobot application account, and then select **Profile Settings** (top right icon) -> **Developer Tools**. Create a new API token there if you haven't created one already.

* Your [endpoint](https://api-docs.datarobot.com/docs/guide-to-different-datarobot-endpoints) URL depends on the type of your DataRobot account:

    * Use `https://app.datarobot.com/api/v2` if you have a US **Managed AI Cloud** account

    * Use `https://app.eu.datarobot.com/api/v2` if you have a European **Managed AI Cloud** account
    
    * Use `https://app2.datarobot.com/api/v2` if you have a **trial or pay-as-you-go** account
    
    * If you are on a **self-managed or on-premises system**, find the host name you use to access your DataRobot account and use `https://your-host-name/api/v2`
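If you script your setup, the endpoint rules above fit in a small lookup. This is a minimal sketch: the helper name `endpoint_for` and the account-type labels (`"managed-us"`, `"managed-eu"`, `"trial"`, `"self-managed"`) are hypothetical and not part of the `datarobot` package, and `datarobot.example.com` stands in for your own host name.

```python
from urllib.parse import urlparse

# Hypothetical account-type labels mapped to the endpoint URLs listed above.
ENDPOINTS = {
    "managed-us": "https://app.datarobot.com/api/v2",
    "managed-eu": "https://app.eu.datarobot.com/api/v2",
    "trial": "https://app2.datarobot.com/api/v2",  # trial or pay-as-you-go
}


def endpoint_for(account_type, host=None):
    """Return the DataRobot API endpoint URL for an account type.

    For a self-managed or on-premises system, pass account_type="self-managed"
    plus the host name (or any URL) you use to reach DataRobot.
    """
    if account_type == "self-managed":
        if not host:
            raise ValueError("a self-managed system needs a host name")
        # Accept a bare host name or a full URL copied from the browser.
        netloc = urlparse(host if "://" in host else f"https://{host}").netloc
        return f"https://{netloc}/api/v2"
    if account_type not in ENDPOINTS:
        raise ValueError(f"unknown account type: {account_type!r}")
    return ENDPOINTS[account_type]


print(endpoint_for("managed-eu"))
print(endpoint_for("self-managed", host="datarobot.example.com"))  # hypothetical host
```

The returned string is what goes in the `endpoint` line of the credentials file described below.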

You can specify your credentials in [a number of ways](https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.0/setup/getting_started.html#credentials). For this activity, specify them in the provided file called `drconfig.yaml`, which should be in the same directory as this Jupyter notebook.

The `.yaml` file is a text file containing two lines. Edit the file to add your credentials and save the file.

`token: YOUR_API_TOKEN`  
`endpoint: YOUR_ENDPOINT`
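If you'd rather create `drconfig.yaml` from the notebook than edit it by hand (for example, in Colab), the sketch below uses only the standard library. The helper names `render_drconfig` and `write_drconfig` are hypothetical, and the placeholders must be replaced with your own token and endpoint. Each value is written as a double-quoted string via `json.dumps`, which YAML reads back as the same string for ordinary ASCII values such as tokens and URLs.

```python
import json
from pathlib import Path


def render_drconfig(token, endpoint):
    """Return the two-line YAML text that dr.Client(config_path=...) reads.

    Each value is written as a double-quoted string (json.dumps output), so
    characters such as ':' or '#' can't be misread by the YAML parser.
    """
    return f"token: {json.dumps(token)}\nendpoint: {json.dumps(endpoint)}\n"


def write_drconfig(token, endpoint, path="drconfig.yaml"):
    """Write the credentials file, overwriting any existing file at `path`."""
    Path(path).write_text(render_drconfig(token, endpoint), encoding="utf-8")


# Usage (replace the placeholders with your own values; this overwrites the file):
# write_drconfig("YOUR_API_TOKEN", "https://app.datarobot.com/api/v2")
print(render_drconfig("YOUR_API_TOKEN", "https://app.datarobot.com/api/v2"))
```

Keep the file out of version control, since it contains your API token.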


### Connect the application or notebook to DataRobot

The DataRobot Python API package is called `datarobot`. Take a look at the [API documentation](https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.0/index.html) now. Keep this page open so that you can refer to the documentation throughout the lesson.

For convenience, import it using the identifier `dr`.

In [4]:
import datarobot as dr

In order for your client to connect to DataRobot, you need to [create and configure](https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.0/setup/configuration.html) a global `Client` object, which will be used by any Python Client API calls that need to connect to the DataRobot server. 

**Tip**: To learn more about any Python object, property, or method, you can use Jupyter's contextual help. Try this now: type `dr.Client` and press `SHIFT+TAB+TAB` (that is, hold the `SHIFT` key while you press the `TAB` key twice). This pops up a window with details about how to configure the client. Click anywhere outside the window to close it.

In [5]:
dr.Client

<function datarobot.client.Client>

Now create a new `Client` object based on the credentials in the configuration YAML file you edited above.

In [6]:
dr.Client(config_path='drconfig.yaml')

<datarobot.rest.RESTClientObject at 0x7fc0a4cdfed0>

***

## Training Models

### Explore the training data using [pandas](https://pandas.pydata.org/)

This activity uses the Python `pandas` data analysis package. You do not have to use `pandas` to use the DataRobot Python API, but it is a convenient way to work with datasets, so we will use it here.

In [7]:
import pandas as pd

Create a pandas DataFrame based on the wine quality training dataset we've provided and view the first few rows.

In [8]:
trainingData = pd.read_csv('winequality-white-training.csv')
trainingData.head()

| Unnamed: 0 | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
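Before uploading, a few quick pandas checks on the DataFrame can catch surprises. To keep the cell self-contained, this sketch rebuilds the five preview rows above in memory (without the leading index column); in the notebook you would run the same checks on `trainingData` directly. Rows 3 and 4 in the preview are identical, which `duplicated()` picks up.

```python
import io

import pandas as pd

# The five preview rows shown above, rebuilt in memory so this cell runs on its own.
sample_csv = """fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6
7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
"""
df = pd.read_csv(io.StringIO(sample_csv))

print(df.shape)                      # (rows, columns)
print(df.isna().sum().sum())         # total number of missing values
print(df.duplicated().sum())         # number of fully duplicated rows
print(df["quality"].value_counts())  # distribution of the target values
```

On the full training set, the `value_counts()` output shows how the quality ratings are spread, which is useful context when you later read the model's predictions.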


---

### Start a project


A [`Project`](https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.0/entities/project.html) object contains a dataset as well as the models trained from that dataset. You need a project before you can build a model.

Every project requires a name. Here we set a project name that includes the creation date, which makes it easier to distinguish projects from one another.


In [9]:
from datetime import date
projectName = 'Python wine quality ' + date.today().strftime('%Y-%m-%d')
projectName

'Python wine quality 2021-11-17'

There are a number of ways to create a project. Here we use the `Project.create` method, passing the DataFrame created above and the name. When you create a project, it uploads the dataset to DataRobot and performs the initial [exploratory data analysis](https://app.datarobot.com/docs/modeling/reference/model-detail/eda-explained.html#eda1); the operation may take a minute to complete. 

**Tip**: Position the cursor after `dr.Project` and press `SHIFT+TAB+TAB` to view the various attributes of a `Project` object.

In [10]:
project = dr.Project.create(
    sourcedata=trainingData,
    project_name=projectName
)

print(project.id, project.project_name)

6194e7b6430565f5e943de01 Python wine quality 2021-11-17


In [11]:
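# If the project already exists (for example, from an earlier session), load it by ID
# instead of creating a new one. Replace this ID with your own project's ID.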
project = dr.Project.get(project_id='6194e7b6430565f5e943de01')
project.project_name

'Python wine quality 2021-11-17'

*Optional*: Log in to the DataRobot web UI and confirm the new project is shown in the list of projects. (Use https://app.datarobot.com/manage-projects for a Managed AI Cloud account or https://app2.datarobot.com/manage-projects for a trial or pay-as-you-go account). If you leave this page open, you can return to it to view and verify the results of your work using the API.

`Project` objects give you many ways to explore and interact with projects and datasets. Going into depth on these capabilities is beyond the scope of this activity, but as an example, try viewing the project's list of features.

In [12]:
project.get_features()

[Feature(alcohol),
 Feature(chlorides),
 Feature(citric acid),
 Feature(density),
 Feature(fixed acidity),
 Feature(free sulfur dioxide),
 Feature(pH),
 Feature(quality),
 Feature(residual sugar),
 Feature(sulphates),
 Feature(total sulfur dioxide),
 Feature(volatile acidity)]

----

### Build models for the project

Now that you have the data uploaded, you can start DataRobot Autopilot to perform the [second round of exploratory data analysis](https://app.datarobot.com/docs/modeling/reference/model-detail/eda-explained.html#eda2) and train an [initial set of models](https://app.datarobot.com/docs/tutorials/creating-ai-models/tut-model-mode.html#autopilot). 

There are a number of ways to do this. Here we use the `set_target()` method.
This tells DataRobot which feature to use as the target, that is, the feature the models will predict.
By default, this starts Autopilot in "quick" mode, which builds a limited set of common models based
on the informative features in the data.

Autopilot runs asynchronously. It takes about a minute to kick off Autopilot, after which the API call returns while Autopilot keeps running in the background.

In [12]:
project.set_target(target = 'quality')

Project(Python wine quality 2021-11-17)

_Optional_: Open the project in the DataRobot web UI and click the **Models** tab. This will display the models that have been built so far and show the Autopilot status on the right.


You can track the Autopilot process in your notebook using `wait_for_autopilot`, which blocks the notebook until Autopilot has finished building models.

The process can take a while.
While you are waiting, _go back to the main course content_, and we'll meet back here in about 15 minutes.

In [13]:
project.wait_for_autopilot()

In progress: 0, queued: 0 (waited: 0s)


*Optional*: While you wait, view the project in the web UI and select **Models**, where you can see the models as they complete.

---

# DataRobot Python API Starter Activity 
# Part 2 — Predictions

## Pre-requisites


*Before starting this activity (Part 2), be sure to complete Part 1 (including waiting for Autopilot to complete).*
    


## Part 2 Objectives

In Part 1 of this activity, you:
- Created a project based on a dataset containing wine quality ratings
- Ran Autopilot to generate a set of models based on the data

In Part 2, you will:
- Deploy the model recommended by DataRobot
- Request wine quality predictions in batch and real-time modes

The goal in this activity is to predict the quality rating for a particular wine based on various characteristics of the wine, such as its acidity and alcohol and sugar content.

The wine data comes from the University of California, Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality. 

Citation: *P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.*

---

## Viewing and Deploying Models 

### Retrieve the recommended model for the project

When `wait_for_autopilot()` returns, Autopilot has trained an initial set of models in the project. If you are working in a separate notebook from the one where you started the project, you need to retrieve a reference to the project from DataRobot.

There are a number of ways to get a handle to your project. For example:

- You can go to the DataRobot application UI, open the project, and extract the project ID from the URL. For example:
    - `app.datarobot.com/projects/`**`60faf10710c1574209c6ddb0`**`/models`
    

- You can get a list of all of your projects using `Project.list()` and find the right one by name. Try that now:
    

In [None]:
for p in dr.Project.list():
    print(p.id, p.project_name)
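If you have many projects, scanning the printed list by eye gets tedious. This minimal sketch filters by exact name; the helper name `find_project_ids` is hypothetical, and it assumes only that each item has `id` and `project_name` attributes, as the objects printed above do. The stand-in objects (including the made-up all-zeros ID) let the cell run without a DataRobot connection.

```python
from types import SimpleNamespace


def find_project_ids(projects, name):
    """Return the IDs of all projects whose name matches `name` exactly.

    `projects` can be any iterable of objects with `id` and `project_name`
    attributes, such as the list returned by dr.Project.list().
    """
    return [p.id for p in projects if p.project_name == name]


# Stand-in objects so this cell runs without a DataRobot connection.
demo_projects = [
    SimpleNamespace(id="6194e7b6430565f5e943de01", project_name="Python wine quality 2021-11-17"),
    SimpleNamespace(id="000000000000000000000000", project_name="Some other project"),
]
print(find_project_ids(demo_projects, "Python wine quality 2021-11-17"))
```

Against your account you would call `find_project_ids(dr.Project.list(), 'your project name')`. An empty list means no match; more than one ID means several projects share that name, which is one reason the project name in Part 1 includes the date.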

Using the ID, you can get the right `Project` object.

In [None]:
projectId = 'your-project-id'  # replace with your own project's ID
project = dr.Project.get(projectId)
print(project)
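If you copied a project URL from the web UI (the first option above), a regular expression can pull out the ID. This is a sketch under one stated assumption: project IDs are 24-character hexadecimal strings, as every ID in this notebook is. The helper name `project_id_from_url` is hypothetical.

```python
import re

# Assumes IDs are 24 hex characters, like those in this notebook; the lookahead
# rejects longer hex runs, so a trailing "/models" or "?..." is fine.
PROJECT_ID_IN_URL = re.compile(r"/projects/([0-9a-fA-F]{24})(?![0-9a-fA-F])")


def project_id_from_url(url):
    """Return the project ID from a DataRobot project URL copied from the browser."""
    match = PROJECT_ID_IN_URL.search(url)
    if match is None:
        raise ValueError(f"no project ID found in {url!r}")
    return match.group(1)


print(project_id_from_url("app.datarobot.com/projects/60faf10710c1574209c6ddb0/models"))
```

In the notebook you could then write `project = dr.Project.get(project_id_from_url(url))`.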

A DataRobot Client API `Model` object represents a model calculated by DataRobot.
The `Model` class provides numerous ways to evaluate, interact with and test models.
Most of those are beyond the scope of this activity.

For now, just get a list of `Model` objects representing all of the models Autopilot built for the project earlier.

In [14]:
models = project.get_models()
for m in models:
    print(m.id, m.model_type)

6194ea2520596cfc64d0e8c0 RandomForest Regressor
6194e8e38d412dfa379cc98d RandomForest Regressor
6194ea531cea9a6c169cc9a4 AVG Blender
6194e9bc630dbdcf5cd0e8c3 RandomForest Regressor
6194e8e38d412dfa379cc98f Light Gradient Boosted Trees Regressor with Early Stopping
6194e9f39e8dc9cac39cc99a RandomForest Regressor
6194e8e38d412dfa379cc990 eXtreme Gradient Boosted Trees Regressor
6194e8e38d412dfa379cc98e Light Gradient Boosting on ElasticNet Predictions 
6194e88f4c3a645fc36d7600 RandomForest Regressor
6194e8904c3a645fc36d7604 Light Gradient Boosting on ElasticNet Predictions 
6194e8904c3a645fc36d7603 Light Gradient Boosted Trees Regressor with Early Stopping
6194ea501cea9a6c169cc99d RandomForest Regressor
6194e88f4c3a645fc36d75fd eXtreme Gradient Boosted Trees Regressor
6194e8904c3a645fc36d7602 RuleFit Regressor
6194e88f4c3a645fc36d75fc Keras Slim Residual Neural Network Regressor using Training Schedule (1 Layer: 64 Units)
6194e88f4c3a645fc36d7601 Generalized Additive2 Model
6194e88f4c3a6

Evaluating and comparing the various characteristics of the models and choosing one to deploy is beyond the scope of this activity. 

Instead, use the "recommended model" chosen automatically by DataRobot.

In [15]:
recommendedModel = dr.ModelRecommendation.get(project.id).get_model()
print(recommendedModel.id, recommendedModel.model_type)

6194ea2520596cfc64d0e8c0 RandomForest Regressor


Note that when using Autopilot in "quick" mode, the recommended model is already [prepared for deployment](https://app.datarobot.com/docs/modeling/reference/model-detail/model-rec-process.html#prepare-a-model-for-deployment), meaning that it has been trained on the validation and holdout datasets in addition to the initial training dataset sample.

----

### Deploy the recommended model

Now that you have selected a model (the recommended model, in this case), the next step is to deploy it to a prediction server. This makes it available to do real-time or batch predictions. 

This operation is different depending on whether you are using a trial or pay-as-you-go account, full Managed AI Cloud account, or an on-premises/self-managed DataRobot installation. 

* **Managed AI Cloud Accounts** and **self-managed DataRobot installations**: for full (as opposed to trial) accounts, you must specify a default prediction server. Select one from the list of available prediction servers provided by your DataRobot account. In this example, the list contains only a single prediction server.

In [18]:
predictionServer = dr.PredictionServer.list()[0]

deployment = dr.Deployment.create_from_learning_model(
    model_id=recommendedModel.id, 
    label='Wine Quality',
    description='Model for scoring wine quality',
    default_prediction_server_id=predictionServer.id
)


* With **Trial or pay-as-you-go accounts**, you don't specify a prediction server when deploying.

In [None]:
deployment = dr.Deployment.create_from_learning_model(
    model_id=recommendedModel.id, 
    label='Wine Quality',
    description='Model for scoring wine quality'
)

## Requesting Predictions

For this activity, we have provided you with a small test dataset containing wines and their feature values. You will practice scoring this data to predict the `quality` target using both the batch prediction method and the real-time prediction method.

Review the data in the `winequality-white-score.csv` file.

### Request batch predictions

Start a prediction job that passes in the scoring data from the provided data file, and saves the predictions to a local file called `winequality-white-predictions.csv`.

A "passthrough column" is copied unchanged from the scoring data into the prediction output. In this example, passing through the unique ID for each wine (`wine_id`) lets you easily match rows in the scoring dataset with rows in the predictions output.

The prediction job might take a minute or so.

In [20]:
job = dr.BatchPredictionJob.score(
    deployment=deployment.id,
    passthrough_columns=['wine_id'],
    intake_settings={
        'type': 'localFile',
        'file': 'winequality-white-score.csv'
    },
    output_settings={
        'type': 'localFile',
        'path': 'winequality-white-predictions.csv'
    }
)

View the output file `winequality-white-predictions.csv` using the `cat` command on Linux or Mac, or `type` on Windows.

**Note**: If your account has the [Model Deployment Approval Workflow](https://app.datarobot.com/docs/mlops/governance/dep-admin.html) enabled, the output will include a column called `DEPLOYMENT_APPROVAL_STATUS`. For this activity, you can disregard those values.

In [None]:
# On Mac/Linux
!cat ./winequality-white-predictions.csv
# On Windows (uncomment the following line and comment the previous one)
# !type winequality-white-predictions.csv
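Viewing the file with `cat` or `type` depends on your operating system. A portable alternative is to read both files with Python's `csv` module and use the `wine_id` passthrough column to line each prediction up with its input row. This is a minimal sketch: the helper name `join_on_id` is hypothetical, IDs are matched as text exactly as they appear in each file, and it makes no assumption about the name of the prediction column, so check the header of your own output file.

```python
import csv


def join_on_id(scoring_path, predictions_path, id_column="wine_id"):
    """Pair each prediction row with the scoring row that has the same ID.

    Returns one dict per prediction row, in prediction-file order, combining
    the scoring columns with the prediction columns. If a column name appears
    in both files, the prediction file's value is kept.
    """
    with open(scoring_path, newline="", encoding="utf-8") as f:
        scoring_rows = {row[id_column]: row for row in csv.DictReader(f)}
    with open(predictions_path, newline="", encoding="utf-8") as f:
        return [{**scoring_rows.get(row[id_column], {}), **row} for row in csv.DictReader(f)]


# Usage in the notebook, with the file names from this activity:
# for row in join_on_id('winequality-white-score.csv', 'winequality-white-predictions.csv')[:5]:
#     print(row)
```

This is the correlation the passthrough column is meant to enable: without `wine_id` in the output, you would have to rely on row order alone.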

### Request real-time predictions

Real-time predictions in DataRobot use a [REST API](https://app.datarobot.com/docs/predictions/api/dr-predapi.html) rather than the Python Client API you've been using so far. Like any REST API, using the DataRobot Predictions API consists of making HTTP _requests_ to which the API server sends _responses_.

In this activity, you'll access the API using the standard Python `requests` package. However, the API can be used by any language or system that supports REST APIs, such as the Linux `curl` command.

Define the URL of the prediction server, based on the hostname of the prediction server for your model's deployment and the deployment ID. This is the server you will send prediction requests to below.

In [21]:
predictionServer = deployment.default_prediction_server['url']

predictionUrl = f'{predictionServer}/predApi/v1.0/deployments/{deployment.id}/predictions'
predictionUrl

'https://app2.datarobot.com/predApi/v1.0/deployments/61950cba667c1dacfd41e7f0/predictions'

Read the data to score into a byte string (Python `bytes`), which will be sent as the body of the request.

In [23]:
# Read the raw CSV bytes; the with statement closes the file afterwards.
with open('winequality-white-score.csv', 'rb') as f:
    dataToScore = f.read()

Set up the credentials to access the server. These are the same credentials you used at the beginning of the activity when you created the DataRobot Python API client. However, real-time predictions don't use the Python Client API; the API access token needs to be set as part of the HTTP request you will use to request predictions. You can pull the token value out of the same `drconfig.yaml` file you edited earlier.

In [25]:
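import yaml

# Read the API token from the same config file used to create the client.
# safe_load parses plain YAML data without constructing arbitrary Python objects.
with open('drconfig.yaml', 'r') as stream:
    drconfig = yaml.safe_load(stream)
drtoken = drconfig['token']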

Next, use the Python `requests` package to construct a request to send (post) to the prediction server. Note that the settings in the headers depend on whether you have a trial or pay-as-you-go account, or a Managed AI Cloud or self-managed system.

Managed AI Cloud and self-managed systems also require a prediction server key (`datarobot-key`), which is separate from the API token used for the modeling server:

In [None]:
import requests

# Construct the request headers: the content type (CSV text), the API token,
# and, for Managed AI Cloud and self-managed installations, the prediction server key.
predictionRequestHeaders = {
    'Content-Type': 'text/plain; charset=UTF-8', 
    'Authorization': f'Bearer {drtoken}',
    'datarobot-key': deployment.default_prediction_server['datarobot-key']}

If you are using a trial or pay-as-you-go account, you don't need to specify a key:

In [27]:
import requests

# Construct the request headers: the content type (CSV text) and the API token.
# Trial and pay-as-you-go accounts don't need a prediction server key.
predictionRequestHeaders = {
    'Content-Type': 'text/plain; charset=UTF-8', 
    'Authorization': f'Bearer {drtoken}'
}
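The two header cells above differ only in the `datarobot-key` entry, so you could fold them into one helper that adds the key only when you have one. This is a minimal sketch: the function name `prediction_headers` is hypothetical, and the header names and values are copied from the cells above.

```python
def prediction_headers(token, datarobot_key=None):
    """Build headers for a real-time prediction request with a CSV text body.

    Pass datarobot_key only for Managed AI Cloud and self-managed installations;
    trial and pay-as-you-go accounts use the bearer token alone.
    """
    headers = {
        "Content-Type": "text/plain; charset=UTF-8",
        "Authorization": f"Bearer {token}",
    }
    if datarobot_key:
        headers["datarobot-key"] = datarobot_key
    return headers


# Managed AI Cloud or self-managed:
# predictionRequestHeaders = prediction_headers(
#     drtoken, deployment.default_prediction_server['datarobot-key'])
# Trial or pay-as-you-go:
# predictionRequestHeaders = prediction_headers(drtoken)
```

Keeping the branching in one place means the request cell that follows works unchanged for either account type.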

Use the header you created above and the server URL you set up earlier to make a predictions request.

In [28]:
# construct and send the request using the prediction server URL, 
# the binary string containing the data to score, and the header
# values you set above.
predictionsResponse = requests.post(
    predictionUrl,
    data=dataToScore,
    headers=predictionRequestHeaders
)

The server sends a response that includes a status code and, if the operation succeeded, the predictions. A status code of 200 means the request completed successfully. For details about other status codes, see the [Prediction API Reference](https://app.datarobot.com/docs/predictions/predapi/index.html) docs.

In [29]:
predictionsResponse.status_code

200

Display the returned predictions. If the request was not successful, this will display additional information about the error.

In [30]:
predictionsResponse.json()

{'data': [{'deploymentApprovalStatus': 'APPROVED',
   'prediction': 6.213347619,
   'predictionValues': [{'label': 'quality', 'value': 6.213347619}],
   'rowId': 0},
  {'deploymentApprovalStatus': 'APPROVED',
   'prediction': 6.5103365079,
   'predictionValues': [{'label': 'quality', 'value': 6.5103365079}],
   'rowId': 1},
  {'deploymentApprovalStatus': 'APPROVED',
   'prediction': 5.6953134921,
   'predictionValues': [{'label': 'quality', 'value': 5.6953134921}],
   'rowId': 2},
  {'deploymentApprovalStatus': 'APPROVED',
   'prediction': 6.468334127,
   'predictionValues': [{'label': 'quality', 'value': 6.468334127}],
   'rowId': 3},
  {'deploymentApprovalStatus': 'APPROVED',
   'prediction': 5.8119634921,
   'predictionValues': [{'label': 'quality', 'value': 5.8119634921}],
   'rowId': 4},
  {'deploymentApprovalStatus': 'APPROVED',
   'prediction': 5.5977793651,
   'predictionValues': [{'label': 'quality', 'value': 5.5977793651}],
   'rowId': 5},
  {'deploymentApprovalStatus': 'APPR

The data in the response should include one JSON record for each row in the scoring dataset. Each record includes the predicted wine quality (`prediction`) and the `rowId`, the zero-based row number of the corresponding row in the scoring data file.
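To work with the predictions rather than just display them, you can pull the `rowId` and `prediction` fields out of each record. This minimal sketch relies only on the fields visible in the response above; the helper name `predictions_by_row` is hypothetical, and `sample_response` is a trimmed copy of the first two records shown.

```python
def predictions_by_row(response_json):
    """Return (rowId, prediction) pairs from a real-time prediction response, sorted by rowId."""
    return sorted((record["rowId"], record["prediction"]) for record in response_json["data"])


# A trimmed copy of the first two records in the response shown above.
sample_response = {
    "data": [
        {"deploymentApprovalStatus": "APPROVED", "prediction": 6.213347619,
         "predictionValues": [{"label": "quality", "value": 6.213347619}], "rowId": 0},
        {"deploymentApprovalStatus": "APPROVED", "prediction": 6.5103365079,
         "predictionValues": [{"label": "quality", "value": 6.5103365079}], "rowId": 1},
    ]
}

for row_id, prediction in predictions_by_row(sample_response):
    print(row_id, round(prediction, 2))
```

In the notebook you would call `predictions_by_row(predictionsResponse.json())`; comparing the length of the result with the number of data rows in `winequality-white-score.csv` is a quick check that every row was scored.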