# <H1>The DXC AI Starter</H1>

The code in this document makes it easier to build and deploy a machine-learning microservice. It installs the required library dependencies, builds a data pipeline, builds a model, deploys a microservice, and publishes an API endpoint to the microservice. Find the code marked with <code># TODO</code> and replace it with your own.

<table class="tfo-notebook-buttons" align="left">

  <td>
    <a target="_blank" href="https://colab.research.google.com/github/dxc-technology/DXC-Industrialized-AI-Starter/blob/master/DXC_Industrialized_AI_Starter.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>

  <td>
    <a target="_blank" href="https://github.com/dxc-technology/DXC-Industrialized-AI-Starter"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

## <H2> Set up the development environment</H2>

This code installs all the packages you'll need. Run it first. It should take 30 seconds or so to complete. If you get missing module errors later, it may be because you haven't run this code. Restart the runtime/session after executing the below code.

In [1]:
%%capture
# The %%capture cell magic must be the first line of the cell.
# After executing this code, you must restart the runtime/session to use the newly installed versions.
# ! pip install DXC-Industrialized-AI-Starter
! pip install DXC-Industrialized-AI-Starter==2.0.7
# import os
# os.kill(os.getpid(), 9)


Import the <code>ai</code> package from the DXC-Industrialized-AI-Starter library.

### <b>Todo: </b>
Restart the runtime/session after executing the above code cell. 

Runtime -> Restart Runtime

In [1]:
%%capture
from dxc import ai

ModuleNotFoundError: No module named 'dxc'
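If the import fails with a missing-module error, a quick check can tell whether the package is visible to the current session. The following is a minimal sketch using only the standard library. The helper name `module_available` is hypothetical and is not part of the DXC library.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(name) is not None


# 'dxc' is the top-level package that `from dxc import ai` relies on.
if module_available("dxc"):
    print("dxc is installed; the import should succeed.")
else:
    print("dxc is not installed; run the install cell and restart the runtime.")
```

If the check reports that the package is missing, rerun the install cell and restart the runtime before importing again.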

## <H2>The Industrialized AI Open Badge Academy</H2>

The AI Open Badges are verifiable, portable digital badges with embedded metadata about skills and achievements. They comply with the Open Badges Specification and are shareable across the web. This code defines the parameters needed to apply for an Industrialized AI Open Badge. This is where you define the email address that gets credit for the badge, the platform responsible for issuing the badge, and the evidence used to justify granting the badge. You should not have to change any of the badge platform parameters. For the badge evidence, you must paste a link to this notebook.
<code>AI_Badge</code> is an enumeration of all unique badges. <code>ai_badge_id</code> is a mapping from <code>AI_Badge</code> to a unique identifier.<br />
**AI_Badge:** <ol> <li> CREATE_DATA_STORIES </li><li>RUN_AGILE_TRANSFORMATION</li><li>BUILD_DATA_PIPELINES</li><li>RUN_AI_EXPERIMENT</li><li>BUILD_UTILITY_AI_SERVICES</li><li>PERFORM_AI_FORENSICS</li><li>TEST</li> </ol>**AI_Guild_Roles:** <ol><li>PROJECT_MANAGER</li><li>DATA_SCIENTIST</li><li>DATA_ENGINEER</li><li>ALL</li></ol>
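As a rough illustration of the enumeration-plus-mapping idea described above, the sketch below uses Python's standard `enum` module. It is a stand-in written for this explanation, not the library's own definition, and the identifier strings are placeholders (the real identifiers come from the badge platform).

```python
from enum import Enum


class AI_Badge(Enum):
    """Illustrative stand-in listing the badge names from the text above."""
    CREATE_DATA_STORIES = 1
    RUN_AGILE_TRANSFORMATION = 2
    BUILD_DATA_PIPELINES = 3
    RUN_AI_EXPERIMENT = 4
    BUILD_UTILITY_AI_SERVICES = 5
    PERFORM_AI_FORENSICS = 6
    TEST = 7


# Placeholder identifiers: the real values are issued by the badge platform.
ai_badge_id = {badge: f"placeholder-id-{badge.value}" for badge in AI_Badge}

print(ai_badge_id[AI_Badge.CREATE_DATA_STORIES])
```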

In [None]:
# TODO: create an AI guild profile
ai_guild_profile = {
    "guild_number": 37,
    #Provide the URL to the current notebook
    "badge_evidence": "https://colab.research.google.com/drive/18G3PKNxFJk-_h0S8unPPfywHAfakt3AR?usp=sharing",
    "badge_platform_apiKey": "Yp8bmtzN85lrkGGmhjAM8jGpC1QniYw6EFk5lHh7",
    "badge_platform_apiHost": "https://uefowgpyw6.execute-api.us-east-1.amazonaws.com/",
    "badge_platform_apiBasePath": "prod/partner/",
    #Please identify guild members and roles
    #Please have each guild member use their DXC email address
    "guild_members" : {
        1: {
            "badge_applicant_email": "smathari@dxc.com",
            "roles" : [ai.AI_Guild_Role.ALL]
        },
        2: {
            "badge_applicant_email": "sjoshi86@dxc.com",
            "roles" : [ai.AI_Guild_Role.ALL]
        },
        3: {
            "badge_applicant_email": "nvelagapudi2@dxc.com",
            "roles" : [ai.AI_Guild_Role.ALL]
        },
        4: {
            "badge_applicant_email": "sb263@dxc.com",
            "roles" : [ai.AI_Guild_Role.ALL]
        },
        5: {
            "badge_applicant_email": "dlakshminar2@dxc.com",
            "roles" : [ai.AI_Guild_Role.ALL]
        },
        6: {
            "badge_applicant_email": "rta3@dxc.com",
            "roles" : [ai.AI_Guild_Role.ALL]
        },
        7: {
            "badge_applicant_email": "rgupta329@dxc.com",
            "roles" : [ai.AI_Guild_Role.ALL]
        } 
    }  
}


## Import Modules

This code imports the modules that you will need from each installed library. If you require additional modules, place them here. Modules that have been deprecated should be upgraded or replaced.

In [None]:
import doctest #documenting data stories
from IPython.display import YouTubeVideo
import requests

## <H1>Create a Data Story</H1>

The data story defines what the microservice is required to do. The code in this section accesses the raw data and defines an interface that the microservice must satisfy. Explore the raw data. Decide what the microservice will do. Write a test (data story) that will pass only when the microservice is successfully deployed.

## <H2> Access the raw data </H2>
Getting access to raw data is the very first task you have to complete. Your microservice is a wrapper for a machine-learning model. This code accesses the raw data that will be used to train the model. The <code>read_data_frame_from_local_csv</code> function allows you to import local character-delimited (commas, tabs, spaces) files. All parameters are optional. By default, the function will infer the header from the data, but an explicit header can be specified. <code>read_data_frame_from_remote_csv</code> works the same way except that it reads the file from a URL instead of from your local machine. The URL is required. The <code>read_data_frame_from_local_excel_file</code> function allows you to import XLSX files. When the file explorer is launched, you must select an XLSX file or the function will result in an error. The <code>read_data_frame_from_local_json</code> function allows you to import JSON files. The <code>read_data_frame_from_remote_json</code> function reads JSON files from a URL. The JSON data is flattened (in the case of nested data) and cast into a Pandas data frame.

NOTE: Run the code below to access the required file format. For <code>remote files</code>, provide the <code>URL</code>. For a <code>local file</code>, once you run the code, it will allow you to select the file from your local drive.
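To make "flattened" concrete, here is a minimal pure-Python sketch of turning nested JSON into a single-level record with dotted keys. How the library flattens data internally is not shown in this notebook, so treat the function below as an assumption-based illustration. Pandas provides `pandas.json_normalize` for the same purpose.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Made-up nested record for illustration only.
nested = {"id": 1, "person": {"age": 19, "habits": {"smoker": "yes"}}}
row = flatten(nested)
print(row)
```

Each flattened record then maps naturally onto one row of a data frame, with the dotted keys as column names.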



In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# To upload dataset into google colab file
from google.colab import files
uploaded = files.upload()

In [None]:
# TODO: Access raw data.
## Reads the uploaded CSV file into a Pandas dataframe
# URL to download the file: https://www.kaggle.com/teertha/ushealthinsurancedataset 
# Dataset can be uploaded by using the above cell code
import pandas as pd

dataset =  pd.read_csv("insurance.csv")

dataset.head()

In [None]:
dataset.shape

### <H2> Define data fields </H2>
Specify the <code>text_fields</code>, <code>date_fields</code>, <code>numeric_fields</code>, and <code>categorical_fields</code> according to your data set. The values below are examples only.

In [None]:
# TODO: define the data fields
text_fields = []
date_fields = []
numeric_fields = ['age','bmi','charges','children']
categorical_fields = ['sex','smoker','region']
target_clm=['charges']
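Before cleaning, it can help to confirm that every column is assigned to exactly one field list. The sketch below is a hypothetical helper (`check_field_lists` is not part of the DXC library) that reports unknown, duplicated, or unassigned columns. The column names are those of the insurance data set described in this notebook.

```python
def check_field_lists(columns, **field_lists):
    """Return a list of problems found in the field definitions."""
    problems = []
    seen = {}
    for list_name, fields in field_lists.items():
        for field in fields:
            if field not in columns:
                problems.append(f"{field!r} in {list_name} is not a column")
            if field in seen:
                problems.append(f"{field!r} is in both {seen[field]} and {list_name}")
            seen[field] = list_name
    for column in columns:
        if column not in seen:
            problems.append(f"column {column!r} is not assigned to any list")
    return problems


columns = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]
problems = check_field_lists(
    columns,
    text_fields=[],
    date_fields=[],
    numeric_fields=["age", "bmi", "charges", "children"],
    categorical_fields=["sex", "smoker", "region"],
)
print(problems or "field lists look consistent")
```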

### <H2> Clean the raw data </H2>
Execute <code>raw_data</code> so that it accesses your raw data and returns it as a Pandas dataframe. Any preprocessing of the raw data should be done here. 

In [None]:
#clean the data
impute = True
data1 = ai.clean_dataframe(dataset, impute, text_fields, date_fields, numeric_fields, categorical_fields)

#display excerpts of the raw data
data1.head()
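What <code>ai.clean_dataframe</code> does internally when <code>impute</code> is <code>True</code> is not shown in this notebook. As a rough illustration of one common strategy (an assumption, not necessarily the library's method), the sketch below fills missing numeric values with the mean of the known values.

```python
from statistics import mean


def impute_numeric_mean(rows, field):
    """Replace None in one numeric field with the mean of the known values."""
    known = [row[field] for row in rows if row[field] is not None]
    fill = mean(known)
    return [
        {**row, field: fill if row[field] is None else row[field]}
        for row in rows
    ]


# Made-up rows for illustration only.
rows = [{"bmi": 20.0}, {"bmi": None}, {"bmi": 30.0}]
imputed = impute_numeric_mean(rows, "bmi")
print(imputed)
```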

### <H2>Explore the raw data</H2>
Now that you've read in the raw data, you can explore the data to determine how it can be used. This code provides methods for visualizing the data in useful ways.

<code>explore_features</code> visualizes the relationships between all features in a given data frame. Areas of heat show closely-related features. This visualization is useful when trying to determine which features can be predicted and which features are needed to make the prediction.

<code>visualize_missing_data</code> creates a visual display of missing data in a data frame. Each column of the data frame is shown as a column in the graph. Missing data is represented as horizontal lines in each column. This visualization is useful when determining whether or not to impute missing values or for determining whether or not the data is complete enough for analysis.

<code>plot_distributions</code> creates a distribution graph for each column in a given data frame. Graphs for data types that cannot be plotted on a distribution graph without refinement (types like dates), will show as blank in the output. This visualization is useful for determining skew or bias in the source data.


Use <code>visualize_missing_data</code> to visualize missing fields in your raw data. Determine if imputing is necessary. Refine <code>raw_data()</code>, if necessary, and repeat this analysis.

In [None]:
ai.visualize_missing_data(data1)

In [None]:
# checking for null values in the dataset
data1.isnull().sum()

In [None]:
# checking for null values in the dataset
data1.isna().sum()

Use <code>explore_features</code> to explore the correlations between features in the <code>raw_data</code>. Use the visualization to form a hypothesis about how the <code>raw_data</code> can be used. It may be necessary to enrich <code>raw_data</code> with other features to increase the number and strength of correlations. If necessary, refine <code>raw_data()</code> and repeat this analysis.

In [None]:
ai.explore_features(data1)
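To see what a correlation between a categorical feature and the target means numerically, the sketch below computes a Pearson correlation coefficient by hand after encoding the categorical values as 0/1. Whether <code>explore_features</code> uses Pearson correlation is an assumption, and the values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration; encode the categorical field as 0/1 first.
smoker = [1 if s == "yes" else 0 for s in ["yes", "no", "yes", "no"]]
charges = [30000.0, 5000.0, 32000.0, 7000.0]
print(round(pearson(smoker, charges), 3))
```

A value close to 1 indicates a strong positive relationship, a value close to 0 indicates little or no linear relationship, and a value close to -1 indicates a strong negative relationship.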

Use <code>plot_distributions</code> to show the distributions for each feature in <code>raw_data</code>. Depending on <code>raw_data</code>, this visualization may take several minutes to complete. Use the visualization to determine if there is a data skew that may prevent proper analysis or useful insight. If necessary, refine <code>raw_data()</code> and repeat this analysis.

In [None]:
ai.plot_distributions(data1)

## <H2>Define a story</H2>

The data story is a unit test that will only pass when the microservice is successfully deployed. After defining <code>raw_data()</code>, you will build a <code>data_story()</code>. Although this test will fail (initially), it defines the requirements for all remaining work. After writing the <code>data_story()</code>, complete the remaining tasks in this notebook. Rerun the <code>data_story()</code>. At this point, the test should succeed. All tasks are successfully complete when the <code>data_story()</code> succeeds.

<b>DO NOT SKIP THIS STEP.</b> Although unit testing does not contribute to the functionality that you will deploy, it does determine the requirements of success. You should clearly document your goals before continuing. This video provides an overview of test-driven development. It describes the concept of writing tests first and the reasons for doing so.

In [None]:
YouTubeVideo('uGaNkTahrIw')

This video provides an overview of Python Doctests. It provides an explanation of automated testing in Python. It walks you through the basic tasks of creating and executing a test. Watch this video if you are unfamiliar with Doctests. This video should be removed or replaced if data stories are executed using something other than Doctests.

In [None]:
YouTubeVideo('_BFeAJ8hC7Y')

This code defines a unit test that sends data to an API endpoint and checks for an expected result. Update the <code>Context</code>, <code>Intent</code>, and <code>Design</code> to reflect your story. 

The <code>design</code> is the specification for your AI microservice. It defines the URL endpoint for the service. The test submits test input to the endpoint and checks whether the output is within an expected range. Given the input you defined, you must also define the range within which the output should fall when the microservice is working properly. This means that you must form an expectation of reasonable behavior for the microservice.

The <code>datastory</code> function acts as a contract that automatically verifies when you have completed the microservice. Create the <code>datastory()</code> and verify that the test fails. Complete the remaining tasks in this notebook. Rerun the <code>datastory()</code> and verify that the test passes. If the requirements of the microservice change, update <code>datastory</code> and repeat this process.
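The shape of such a range-checking test can be sketched with a toy function. In the example below, `predict_premium` is a hypothetical stand-in for a call to a deployed endpoint, and its formula is made up. Note that the expected output (`True`) must follow the `>>>` line immediately, because doctest treats a blank line as the end of the expected output.

```python
import doctest


def predict_premium(age, smoker):
    """Toy stand-in for a call to a deployed microservice (hypothetical).

    >>> 100 < predict_premium(30, "yes") < 200
    True
    """
    return 50 + 2 * age + (40 if smoker == "yes" else 0)


# Parse the docstring into a doctest and run it explicitly.
story = doctest.DocTestParser().get_doctest(
    predict_premium.__doc__,
    {"predict_premium": predict_premium},
    "predict_premium",
    None,
    0,
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(story)
print(results)
```

In a notebook, calling <code>doctest.testmod()</code> achieves the same thing for every docstring in the session. The explicit runner is used here only so the result can be inspected.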


In [None]:
# TODO: write the AI microservice specification
def datastory(api_endpoint, input, header):
    """
    Context:
    
    This microservice is part of AI which predicts the Insurance premium based on the factors of: 
     - Smoking
     - Region
     - Age
     - Gender
     - BMI
    =================================================
    
    Intent:
    
    This microservice intends to provide information regarding change of premiums for an individual based on the age, BMI and smoking habits while also considering 
    at the same time the other factors such as gender/region/dependents.

    This microservice can be used by the Insurance Companies to predict the premium rates for an individual based on age, BMI and smoking habits. When this microservice is 
    integrated with any Insurance Company portal, customer could check how much premium is to be paid by providing the inputs as mentioned in the DESIGN section below.
    ========================================================
    
    Data Source: 
    
    Data set has been taken from Kaggle: https://www.kaggle.com/teertha/ushealthinsurancedataset
    This dataset contains 1338 rows of insured data, where the Insurance charges are given against the following attributes of the insured: 
    Age, Sex, BMI, Number of Children, Smoker and Region. The attributes are a mix of numeric and categorical variables.

    Numerical attributes:
     - age
     - bmi
     - children
    
    Categorical attributes:
     - sex
     - smoker
     - region

    Target attribute:
     - charges

    ========================================================
    
    Observations on dataset: 

    We have observed that there is a strong positive correlation between the 'smoker' and 'charges' attributes and no correlation between the 'region' and 'charges' attributes. A positive 
    correlation implies that the attributes are dependent on each other, and no correlation implies that a change in the 'region' attribute will not affect the 'charges'
    attribute. Our dataset does not contain any negatively correlated attributes. 
    
    As per the observation from our dataset, 'age', 'smoker', 'bmi' and 'sex' attributes are majorly contributing towards variation of 'charges' attribute. 
    After exploring the data we found out that there are no missing, null or undefined values in the dataset.
    ========================================================

    Business Value:
    
    Once this micro service is consumed and made available to the front end customers, it would allow the individual to reach and look out for the premium themselves which gives a 
    better experience to them. Also, the need for external agents to perform this task can be automated which will allow the company to channel some premium discount to the customer. 
    As the customer experience is the key in the market and insurance premium are always cut-throat in the market, such transparency will give customer an experience and command 
    that will help improve the business in the longer run.

    ========================================================

    Doctest working: 
    
    The inputs required for our microservice are 'age', 'sex', 'bmi', 'children', 'smoker' and 'region'. When the inputs are provided, the microservice will predict the charges of 
    Insurance premium for the individual. If any of the input is not provided, there are chances that our microservice may fail to execute. 
    
    Our aim is that our microservice would provide 95% of accuracy in predicting the Insurance premium.


    Design:
    >>> api_endpoint = "https://api.algorithmia.com/v1/algo/shubhamjoshi/aibootcamp/0.1.0"
    >>> input = '{"age":19, "sex":"female", "bmi":27.9, "children":0, "smoker":"yes", "region":"southwest"}'
    >>> header = {'Content-Type': 'application/json',  'Authorization': 'simVnAbGo/j8B2nfYqb5yxKRw891'}
    >>> 9000 < float(datastory(api_endpoint, input, header)) < 9200
    True
    """

    try:
      # Use the header supplied by the caller instead of a hard-coded copy
      params = (
          ('timeout', '300'),
      )
      response = requests.post(api_endpoint, headers=header, params=params, data=input)
      result = response.json()['result']['results']
    
    except Exception as error:
      # Return the error message so that a failed call is visible to the caller
      result = str(error)

    return result

doctest.testmod(verbose=False)
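Hand-written JSON strings are easy to get wrong. As a minimal sketch, the request body can instead be built from a dictionary with the standard `json` module. The helper name `build_request` is hypothetical, and `"YOUR_API_KEY"` is a placeholder rather than a real credential.

```python
import json


def build_request(record, api_key):
    """Return the JSON body and headers for a POST to the prediction endpoint."""
    body = json.dumps(record)
    headers = {"Content-Type": "application/json", "Authorization": api_key}
    return body, headers


record = {"age": 19, "sex": "female", "bmi": 27.9,
          "children": 0, "smoker": "yes", "region": "southwest"}
# "YOUR_API_KEY" is a placeholder, not a real credential.
body, headers = build_request(record, "YOUR_API_KEY")
print(body)
```

The resulting `body` and `headers` can be passed as the `input` and `header` arguments of a function such as <code>datastory()</code>.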

### <H2>Apply for the Create Data Stories badge </H2>
This code applies for the Create Data Stories Industrialized AI Open Badge. <code>apply_for_an_ai_badge</code> applies for a specific <code>ai_badge</code> on behalf of the user specified in the <code>ai_guild_profile</code>.
<b>Run this code only if you are interested in earning the badge.</b> This code will submit a link to this notebook to reviewers as evidence for your badge. Badge reviewers will inspect this notebook to ensure that:
<ul>
  <li>You have successfully completed all <code>#TODO</code> items for <code>datastory()</code>.</li>
  <li><code>datastory()</code> is a test that runs and fails.</li>
  <li>The <code>datastory()</code> makes sense given the output of <code>raw_data()</code>.</li>
</ul>

After inspection, you will receive notification either confirming that you have earned the badge or with suggested changes.

### <H2>Todo: </H2>
Before applying for the create data stories badge please answer the following questions.

- Please provide the reviewer access to the raw data so that the reviewer can upload the data and run the code block.
- Describe your observations and analysis from the data exploration.
- Describe how you will implement the AI functionality/AI-driven transformation in your project.


# **Todo: Response**

*Please provide the reviewer access to the raw data so that the reviewer can upload the data and run the code block.*

**Ans**: Access provided to the reviewer.

*Describe your observations and analysis from the data exploration.*

**Ans**: The dataset is suited to regression analysis. We have observed that there is a strong positive correlation between the 'smoker' and 'charges' attributes and no correlation between the 'region' and 'charges' attributes. A positive correlation implies that the attributes are dependent on each other, and no correlation implies that a change in the 'region' attribute will not affect the 'charges' attribute. Our dataset does not contain any negatively correlated attributes. As per the observations from our dataset, the 'age', 'smoker', 'bmi' and 'sex' attributes contribute most to the variation of the 'charges' attribute. After exploring the data we found that there are no missing, null or undefined values in the dataset.

*Describe how you will implement the AI functionality/AI-driven transformation in your project.*

**Ans**: We will store the data in MongoDB. Once done, we will use the data in the DXC AI Starter package and deploy the model in Algorithmia. We will use the API to test the microservice. Once this microservice is consumed and made available to front-end customers, it will allow individuals to look up the premium themselves, which gives them a better experience. Also, the task currently performed by external agents can be automated, which will allow the company to channel some premium discount to the customer. As customer experience is key in the market and insurance premiums are always cut-throat, such transparency will give customers an experience that will help improve the business in the longer run.

Doctest working: 
The inputs required for our microservice are 'age', 'sex', 'bmi', 'children', 'smoker' and 'region'. When the inputs are provided, the microservice will predict the charges of Insurance premium for the individual. If any of the input is not provided, there are chances that our microservice may fail to execute. Our aim is that our microservice would provide 95% of accuracy in predicting the Insurance premium.

In [None]:
##Todo: When you are ready to apply for the badge, please uncomment the below code and run the code for badge submission.
# ai.apply_for_an_ai_badge(ai_guild_profile, ai.AI_Badge.CREATE_DATA_STORIES)

## <H1>Build a data pipeline</H1>
A data pipeline takes raw data and turns it into refined data that can be used to train and score a machine-learning model. The code in this section takes the output of <code>raw_data()</code> and puts it into a data store. It instructs the data store to refine the raw data into training data. It extracts the training data for use in training a machine-learning model.

You will be using MongoDB as your data store. This video provides a general overview of MongoDB. The document model of MongoDB breaks from the traditional relational model of common relational databases. This video describes the basic idea behind the document model. It also describes MongoDB clusters and the methods used to scale. It introduces MongoDB Atlas, which you will be using in the remainder of this notebook.

In [None]:
YouTubeVideo('EE8ZTQxa0AM')

This video provides an overview of Mongo DB Atlas. It provides an explanation of the software. It walks you through the basic tasks of setting up an account and generating the proper connection credentials. Watch this video if you are unfamiliar with Mongo DB Atlas. This video should be removed or replaced if the data is stored using something other than Mongo DB Atlas.

In [None]:
YouTubeVideo('rPqRyYJmx2g')

### <H2>Collect raw data</H2>

This code defines the meta-data needed to connect to Mongo DB Atlas and create a new data store cluster. This is where you define basic information about the location of the cluster and the collection and database to use. Update this code with information appropriate to your project. This code assumes that the data store is Mongo DB Atlas. If the raw data is stored and refined using something other than Mongo DB Atlas, the parameters of the <code>data_layer</code> will need to be updated or replaced with something else. In order to provide the information required in <code>data_layer</code>, you must:
<ul>
  <li>Create a MongoDB Atlas account</li>
  <li>Create a cluster</li>
  <li>Create a user</li>
  <li>Generate a connection string</li>
</ul>

Note: 

When you configure the IP whitelist for your cluster, choose to allow a connection from anywhere. Since your notebook is running in Colab, we cannot guarantee a known IP address.

When creating the database connection string, choose the <code>Python</code> driver version 3.4 or later.
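Rather than pasting credentials directly into the notebook, the connection string can be assembled from environment variables. The sketch below is a minimal illustration. The variable names `MONGO_USER` and `MONGO_PASSWORD`, the host, and the database name are all hypothetical placeholders. Special characters in the user name and password are percent-escaped with `urllib.parse.quote_plus`.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(user, password, host, db_name):
    """Assemble a mongodb+srv URI, escaping special characters in credentials."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{db_name}?retryWrites=true&w=majority"
    )


# MONGO_USER and MONGO_PASSWORD are hypothetical environment variable names;
# the host and database below are placeholders.
uri = build_connection_string(
    os.environ.get("MONGO_USER", "example_user"),
    os.environ.get("MONGO_PASSWORD", "p@ssword"),
    "cluster0.example.mongodb.net",
    "example_db",
)
print(uri)
```

The resulting string can be used as the `connection_string` value of a data layer definition.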

In [None]:
# TODO: specify the details of the data layer

data_layer = {
    "connection_string": "mongodb+srv://shubham:Dxc1234@cluster0.cptej.mongodb.net/<dbname>?retryWrites=true&w=majority",
    "collection_name": "ai_clnme_37",
    "database_name": "ai_dbnme_37",
    "data_source":"data1",
    "cleaner":"no"
}

Use the <code>write_raw_data</code> function from the <code>ai</code> library to convert <code>Arrow</code> dates to <code>String</code> data types. This function also connects to Mongo DB Atlas and builds a database and collection according to the parameters of <code>data_layer</code>. It also transfers the output of <code>raw_data()</code> into the database and collection. This function handles Mongo DB Atlas automatically.

In [None]:
data1

In [None]:
data1.age.unique()

In [None]:
data2 = ai.write_raw_data(data_layer, data1, date_fields)

In [None]:
data2

### <H2>Ingest and clean data</H2>
This video provides an overview of how to create aggregation pipelines in Mongo DB Atlas. It describes the basic concepts and walks you through example pipelines. Watch this video if you are unfamiliar with Mongo DB Atlas aggregation pipelines. This video should be removed or replaced if the data is stored using something other than Mongo DB Atlas or the data is refined using something other than aggregation pipelines.

In [None]:
YouTubeVideo('Kk6Er0c7srU')

This code instructs the data store on how to refine the output of <code>raw_data()</code> into something that can be used to train a machine-learning model. Update <code>data_pipeline()</code> with an aggregation pipeline that fits your project. The refined data returned by <code>access_data_from_pipeline</code> from the <code>ai</code> library will be stored in the <code>df</code> Pandas dataframe. Make sure the output is what you want before continuing. 

In [None]:
# TODO: define the code needed to refine the raw data

def data_pipeline():
 
  pipe = [
          {
              '$group':{
                  '_id': {
                            "age":"$age",
                            "sex":"$sex",
                            "bmi":"$bmi",
                            "children":"$children",
                            "smoker":"$smoker",
                            "region":"$region",
                            "charges":"$charges",
                  }
              }
          }
  ]

  return pipe

df = ai.access_data_from_pipeline(data2, data_pipeline())
df.head()
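To see what a `$group` stage with a compound `_id` produces, the following pure-Python sketch mimics its effect on a few made-up documents. It is an illustration only and does not talk to MongoDB. Documents that share the same key collapse into one group, and the grouped fields end up nested under `_id`, which is why the resulting data frame columns carry names such as `_id.charges`. MongoDB does not guarantee the order of the output documents.

```python
def group_by_id(documents, fields):
    """Mimic a $group stage whose _id is a compound key of the given fields."""
    groups = {}
    for doc in documents:
        key = tuple(doc.get(field) for field in fields)
        # Documents that share the same key collapse into a single group.
        groups.setdefault(key, {"_id": dict(zip(fields, key))})
    return list(groups.values())


# Made-up documents; the second one duplicates the first.
docs = [
    {"age": 19, "smoker": "yes", "charges": 16884.92},
    {"age": 19, "smoker": "yes", "charges": 16884.92},
    {"age": 18, "smoker": "no", "charges": 1725.55},
]
grouped = group_by_id(docs, ["age", "smoker", "charges"])
print(grouped)
```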

### <H2>Apply for the Build Data Pipeline badge</H2>

This code applies for the Build Data Pipeline AI Open Badge.
<b>Run this code only if you are interested in earning the badge.</b> This code will submit a link to this notebook to reviewers as evidence for your badge. Badge reviewers will inspect this notebook to ensure that:
<ul>
  <li>You have successfully completed all <code>#TODO</code> items for <code>data_pipeline()</code> and <code>access_data_from_pipeline()</code>.</li>
  <li><code>df</code> is populated with data.</li>
  <li>The data in <code>df</code> matches the input specified in the design section of <code>datastory()</code>.</li>
</ul>

After inspection, you will receive notification either confirming that you have earned the badge or with suggested changes.

In [None]:
##Todo: When you are ready to apply for the badge, please uncomment the below code and run the code for badge submission.
ai.apply_for_an_ai_badge(ai_guild_profile, ai.AI_Badge.BUILD_DATA_PIPELINES)

## <H1>Run an experiment</H1>
An experiment trains and tests a machine-learning model. The code in this section runs a model through a complete lifecycle and saves the final model to the local drive. Run the code that defines a machine-learning model and its lifecycle. Design an experiment and execute it. Most of the work of choosing features and specific model parameters will be done automatically. The code will also automatically score each option and return the options with the best predictive performance.

### <H2>Execute the experiment</H2>

This code executes an experiment by running <code>run_experiment</code> from the <code>ai</code> library on a model. Update <code>experiment_design</code> with parameters that fit your project. The <code>data</code> parameter should remain <code>df</code>, the refined training data. The <code>model</code> parameter must be a <code>model</code> subclass. The <code>labels</code> parameter indicates the column of the <code>data</code> dataframe to be predicted. For the <code>prediction</code> model, the <code>meta_data</code> must describe the column to be predicted and the types for non-numeric columns. Check out [auto_ml](https://auto-ml.readthedocs.io/en/latest/index.html) to learn more about the auto_ml library usage and documentation. 

An Auto_Clustering model is also available in the AI Starter. It looks for the best model among three clustering models (Affinity Propagation, DBSCAN, and K-means). Please refer to the [example document](https://github.com/dxc-technology/DXC-Industrialized-AI-Starter/blob/master/Examples/Clustering.ipynb.ipynb) on implementing the clustering model.


In [None]:
df.head()


In [None]:
# TODO: design and run an experiment
experiment_design = {
    #model options include ['regression()', 'classification()','timeseries']
    "model": ai.regression(),
    "labels": df['_id.charges'],
    "data": df,
    #Tell the model which column is 'output'
    #Also note columns that aren't purely numerical
    #Examples include ['nlp', 'date', 'categorical', 'ignore']
    "meta_data": {
      "_id.charges": "output",
      "_id.sex": "categorical",
      "_id.smoker": "categorical",
      "_id.region": "categorical",

  }
}
trained_model = ai.run_experiment(experiment_design, verbose = False)
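The idea of scoring several options and keeping the best one can be sketched in a few lines. The example below is not how <code>run_experiment</code> or auto_ml work internally. It is a toy comparison, on made-up data, of two very simple predictors using mean absolute error on held-out rows.

```python
from statistics import mean


def mean_absolute_error(predict, rows):
    """Average absolute difference between predictions and actual charges."""
    return mean(abs(predict(smoker) - charge) for smoker, charge in rows)


# Made-up (smoker, charges) pairs, split by hand into training and held-out rows.
train = [("yes", 30000), ("yes", 32000), ("no", 5000), ("no", 7000)]
holdout = [("yes", 31000), ("no", 6000)]

overall = mean(charge for _, charge in train)
by_group = {
    flag: mean(charge for smoker, charge in train if smoker == flag)
    for flag in ("yes", "no")
}

# Two candidate "models": one ignores the input, one uses the smoker flag.
candidates = {
    "overall_mean": lambda smoker: overall,
    "group_mean": lambda smoker: by_group[smoker],
}
scores = {name: mean_absolute_error(fn, holdout) for name, fn in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

The candidate with the lowest held-out error is kept. Automated tools apply the same principle across many model types and parameter settings.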

### <H2> Apply for the Run AI Experiment badge </H2>

This code applies for the Run AI Experiment Industrialized AI Open Badge. Run this code only if you are interested in earning the badge. This code will submit a link to this notebook to reviewers as evidence for your badge. Badge reviewers will inspect this notebook to ensure that:
<ul>
<li>You have successfully completed all <code>#TODO</code> items for the <code>experiment_design</code>.</li>
<li>You have successfully executed <code>run_experiment</code> on the <code>experiment_design</code>.</li>
</ul>

After inspection, you will receive notification either confirming that you have earned the badge or with suggested changes.

### <H2>Todo: </H2>
Before applying for the Run AI Experiment badge please answer the following questions.
<ul>
<li><b>Goal:</b> What is the overall goal of the AI? This should be an expansion on the text already supplied for AI service in your Data Story. </li>

<b>Ans</b> The overall goal of the AI is to predict insurance charges based on smoking habits, age, and sex.

<li><b>Source:</b> Where the data was obtained and what type of data it is. </li>
<b>Ans</b> https://www.kaggle.com/teertha/ushealthinsurancedataset


<li><b>Processing Steps:</b> Bullet points detailing what your AI intends to perform.</li>
<b>Ans</b> It checks the smoking habit of the person for whom the insurance charge is to be obtained. Then it looks at the age and sex of the person. This way it is able to calculate the premium charges for the person.

<li><b>Output:</b> Describe the output type (last step from processing) and the type of resultants you expect to obtain (success and failures if appropriate).</li>

<b>Ans</b> The output type is a continuous value: it provides the insurance charges for the person applying. We expect the charges to be within range and the experiment to be successful.
</ul>

In [None]:
##Todo: When you are ready to apply for the badge, please uncomment the below code and run the code for badge submission.
ai.apply_for_an_ai_badge(ai_guild_profile, ai.AI_Badge.RUN_AI_EXPERIMENT)

## <H1>Model Explainability</H1>

<b>Note:</b> In the present version, Model Explainability supports only the custom models. We are implementing the explainability changes for Auto_ml models. Changes will be published soon. 

Model explainability is one of the most important problems in machine learning today. The code in this section helps you to understand the output of the machine learning model using interactive dashboards. Model explainability supports [SHAP - based explainer](https://github.com/slundberg/shap). Depending on the model, Model Explainer uses one of the supported SHAP explainers.  

### <H4>SHAP explainers:</H4>
<ul>
<li>SHAP TreeExplainer</li>
<li>SHAP DeepExplainer</li>
<li>SHAP LinearExplainer</li>
<li>SHAP KernelExplainer</li></ul>


<code>ai.Global_Model_Explanation</code> function generates the overall model predictions and generates a dictionary of sorted feature importance names and values. <code>ai.Explanation_Dashboard</code> function will generate an interactive visualization dashboard, you can investigate different aspects of your dataset and trained model via four tab views:
<ul>
<li>Model Performance</li>
<li>Data Explorer</li>
<li>Aggregate Feature Importance</li>
<li>Individual Feature Importance</li></ul>
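The "sorted feature importance" dictionary mentioned above can be pictured with a small, library-independent sketch. The helper `rank_features` is hypothetical and the numbers are made up for illustration; they are not results from this notebook, and the exact structure returned by the `ai` library may differ.

```python
def rank_features(names, importances):
    """Return {name: importance}, ordered from most to least influential."""
    pairs = zip(names, importances)
    # Rank by magnitude so that strong negative effects are not pushed to the end
    ordered = sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)
    return dict(ordered)


# Made-up importances, for illustration only
ranked = rank_features(["age", "smoker", "region"], [0.2, -0.7, 0.05])
for name, value in ranked.items():
    print(f"{name}: {value:+.2f}")
```

Ranking by absolute value is a design choice: a feature that strongly lowers the prediction is as informative as one that strongly raises it.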

To generate the model explanations, pass your model, training data, and test data to the functions. You can also optionally pass feature names and, for classification, output class names, which make the explanations and visualizations more informative. By default, explanations are generated on the test data. If you pass 'Training' as the value of the <code>explantion_data</code> parameter, the explanations are generated on the training data instead. With more examples, explanations take longer to compute, although they may be more accurate.

Check out the [Examples](https://github.com/dxc-technology/DXC-Industrialized-AI-Starter/tree/master/Examples) to see how to use each function and which parameters each one expects. Also check out the [shap](https://github.com/slundberg/shap), [lime](https://github.com/marcotcr/lime), and [interpret-community](https://github.com/interpretml/interpret-community) libraries to learn more about model explainability and its usage.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# Categorical columns to one-hot encode
cat_attribs = ['_id.sex', '_id.smoker', '_id.region']

# Column names of the raw DataFrame (this includes the target '_id.charges')
feature_names = df.columns.to_numpy()

# Only the categorical columns are encoded; ColumnTransformer drops every
# other column by default (remainder='drop')
full_pipeline = ColumnTransformer([
    ("cat", OneHotEncoder(), cat_attribs),
])

# Separate the features from the target
train_prepared = df.drop('_id.charges', axis=1)
y_train1 = df['_id.charges']

x_train1 = full_pipeline.fit_transform(train_prepared)

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x_train1, y_train1, test_size=0.2, random_state=0)

# Train a custom model that the explainer can work with
reg = GradientBoostingRegressor(n_estimators=100, max_depth=4,
                                learning_rate=0.1, loss='huber',
                                random_state=1)
model = reg.fit(x_train, y_train)

In [None]:
global_explanation = ai.Global_Model_Explanation(model,x_train,x_test,feature_names = None,classes = None, explantion_data = None)

In [None]:
ai.Explanation_Dashboard(global_explanation, model, x_train, x_test, explantion_data = None)

## <H2>Apply for the AI Forensics badge</H2>

This code applies for the AI Forensics Open Badge. Run this code only if you are interested in earning the badge. This code will submit a link to this notebook to reviewers as evidence for your badge. Badge reviewers will inspect this notebook to ensure that:
<ul>
<li>After exploring the raw data, you have ensured that the <code>raw_data</code> is free from bias that could adversely affect the <code>intent</code> of the <code>datastory()</code></li>
</ul>

After inspection, you will receive a notification either confirming that you have earned the badge or suggesting changes.

### <H2>Todo: </H2>
Before applying for the badge please answer the following questions.
<ul>
<li>State where you got the dataset and what it contains.</li>
<b>Ans</b> We obtained the data from Kaggle: https://www.kaggle.com/teertha/ushealthinsurancedataset

<li>State how you got your pipeline into your system.</li>
<b>Ans</b>
First the AI checks parameters such as smoking, sex and age, and then uses them to predict the insurance charges.

Pipelining is done in MongoDB.
<li>Describe the Predicted Outcome – what you intended it to do.
</li> <b>Ans</b> The outcome is a continuous variable: the charges incurred by the person applying for insurance. These details can be used in the pre-application process.

<li>Analysis of the solution/General Analysis of your AI. This should be an evaluation of the topics, e.g. what topic modelling did you perform, topic identification, AI Experiment Result Observations. </li>
<b>Ans</b> We used the DXC Industrialized AI AutoML program to model the data. We also used GradientBoostingRegressor for explainability, because the explainer was unable to explain the AutoML model.
<li>High Level Overview of your AI – enterprise risks, establish there is no risk.</li><b>Ans</b> DXC Industrialized AI tests different models and predicts the outcomes based on these models. No risks have been identified so far.

</ul> 


In [None]:
##Todo: When you are ready to apply for the badge, please uncomment the below code and run the code for badge submission.
ai.apply_for_an_ai_badge(ai_guild_profile, ai.AI_Badge.PERFORM_AI_FORENSICS)

## <H1>Generate insight</H1>
Insights are delivered through microservices with published APIs. The code in this section prepares an execution environment for the microservice, builds a microservice using the machine-learning model, deploys the microservice into the execution environment, and publishes an API endpoint for the microservice. Design the microservice and deploy it. The work of creating the microservice and deploying it will be done automatically. The code will also automatically handle the source code repository management.

This video provides an overview of the algorithm execution environment provided by Algorithmia. It describes the basic concept of the Algorithmia AI Layer and walks you through publishing a microservice. Watch this video if you are unfamiliar with publishing microservices using Algorithmia. This video should be removed or replaced if the microservices are run using something other than Algorithmia.

In [None]:
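# YouTubeVideo is provided by IPython; import it in case it is not already loaded
from IPython.display import YouTubeVideo
YouTubeVideo('56yt2Bouq0o')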

### <H2>Configure the microservice execution environment</H2>
The execution environment is where the microservice runs. This code assumes that the microservice execution environment is Algorithmia. If the microservices will be deployed somewhere other than Algorithmia, the code in this section will need to be replaced. In order to provide the information required to design the microservice, you must:
<ul>
  <li>create an Algorithmia account</li>
  <li>create an <a href='https://algorithmia.com/user#credentials' target='new'>API key</a> with BOTH "Read & Write Data" and "Manage Algorithms" permissions enabled</li>
  <li>create an algorithm user name</li>
</ul>

### <H2> Design the microservice </H2>
This code defines the parameters needed to build and deploy a microservice based on the trained <code>model</code>. Update <code>microservice_design</code> with parameters appropriate for your project. The parameters must contain valid keys, namespaces, and model paths from Algorithmia (see above). The <code>microservice_design</code> will need to be updated if the microservice will run in something other than Algorithmia.

In [None]:
# TODO design a microservice
microservice_design = {
    "microservice_name": "aibootcamp",
    "microservice_description": "API generated for AIbootcamp for Insurance data",
    "execution_environment_username": "shubhamjoshi",
    "api_key": "simEKjTH3G4gK+6sdj9+sfKcjki1",
    "api_namespace": "shubhamjoshi/aibootcamp",   
    "model_path":"data://.my/mycollection"
}
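Because this notebook is submitted to reviewers as badge evidence, an API key typed directly into <code>microservice_design</code> is visible to every reader. The following is a minimal sketch of one way to avoid that, assuming the key is stored in an environment variable. The variable name `ALGORITHMIA_API_KEY`, the helper `build_microservice_design`, and the `"<username>/<microservice_name>"` namespace pattern are assumptions for illustration and are not part of the `ai` library.

```python
import os

REQUIRED_KEYS = (
    "microservice_name",
    "microservice_description",
    "execution_environment_username",
    "api_key",
    "api_namespace",
    "model_path",
)


def build_microservice_design(name, description, username, model_path, env=None):
    """Assemble the design dict, reading the API key from the environment."""
    env = os.environ if env is None else env
    design = {
        "microservice_name": name,
        "microservice_description": description,
        "execution_environment_username": username,
        # ALGORITHMIA_API_KEY is a hypothetical variable name
        "api_key": env.get("ALGORITHMIA_API_KEY", ""),
        # Assumes the namespace follows "<username>/<microservice_name>"
        "api_namespace": f"{username}/{name}",
        "model_path": model_path,
    }
    # Fail early if any setting is empty, before anything is published
    missing = [key for key in REQUIRED_KEYS if not design[key]]
    if missing:
        raise ValueError("missing microservice settings: " + ", ".join(missing))
    return design


# Example with an explicit mapping standing in for os.environ
example_design = build_microservice_design(
    "aibootcamp",
    "API generated for AIbootcamp for Insurance data",
    "my-user",
    "data://.my/mycollection",
    env={"ALGORITHMIA_API_KEY": "placeholder-key"},
)
print(sorted(example_design))
```

In a real session you would omit the `env` argument so that the key is read from `os.environ`, and the key itself never appears in the saved notebook.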

### <H2>Apply for the Create Utility AI Services Badge</H2>

This code applies for the Create Utility AI Services Open Badge. Run this code only if you are interested in earning the badge. This code will submit a link to this notebook to reviewers as evidence for your badge. Badge reviewers will inspect this notebook to ensure that you have successfully published the AI microservice.

After inspection, you will receive a notification either confirming that you have earned the badge or suggesting changes.
### <H2>Todo: </H2> 
Before applying for the badge please answer the following questions.
<ul>
<li>How did you monitor the AI Utility and manage the data pipelines? Should they have implemented any monitoring? Were there additional needs in scope?

<b>Ans</b> The pipelines in the project were built using MongoDB, which provides a security checklist that covers enabling access control and enforcing authentication. We have also configured role-based access control.
</li><li>How did you manage the security of the data and models?

<b>Ans</b> Once the trained models are created, they are uploaded to Algorithmia, which stores the trained model and provides access to it through an API that can be called with a JSON request. In return, we receive a response with the prediction.
</li><li>How did you expose models of the AI Utility as APIs?

<b>Ans</b> We used DXC Industrialized AI, which automatically tests different models using the packages installed in the Colab notebook.
</li></ul>

In [None]:
##Todo: When you are ready to apply for the badge, please uncomment the below code and run the code for badge submission.
ai.apply_for_an_ai_badge(ai_guild_profile, ai.AI_Badge.BUILD_UTILITY_AI_SERVICES)

### <H2>Publish the microservice</H2>
The <code>publish_microservice</code> function from the <code>ai</code> library commits the changes made to the local, cloned GitHub repository, compiles the new microservice in Algorithmia, and publishes it. It also generates the API endpoint for the newly published microservice. Run the code. Copy the URL and paste it into the <code>datastory</code>. After you paste the endpoint into the <code>datastory</code>, the <code>datastory</code> should succeed and you should be done.

In [None]:
# publish the microservice and display the URL of the API
api_url = ai.publish_microservice(microservice_design, trained_model, verbose = False)
print("api url: " + api_url)
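Once the endpoint is published, it can be called with a JSON request. The sketch below only builds such a request with the standard library and does not send it. It assumes the endpoint accepts a JSON body via POST and uses Algorithmia's `Authorization: Simple <key>` header scheme. The helper name `build_prediction_request`, the placeholder URL, and the payload field names are hypothetical, and the input format your microservice actually expects may differ.

```python
import json
import urllib.request


def build_prediction_request(api_url, api_key, payload):
    """Build (but do not send) a JSON POST request for a prediction."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed authentication scheme for Algorithmia-hosted endpoints
            "Authorization": f"Simple {api_key}",
        },
        method="POST",
    )


# Placeholder URL and hypothetical input fields, for illustration only
example_payload = {"age": 40, "sex": "female", "smoker": "no"}
request = build_prediction_request(
    "https://example.invalid/predict", "placeholder-key", example_payload
)
print(request.get_method(), request.full_url)

# To send the request for real, pass it to urllib.request.urlopen(request)
# and decode the JSON body of the response.
```

Separating request construction from sending makes it easy to inspect exactly what will be transmitted before any credentials leave the notebook.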

## <H2>Apply for the Agile Transformation Badge</H2>
This code applies for the Agile Transformation Open Badge. Run this code only if you are interested in earning the badge. This code will submit a link to this notebook to reviewers as evidence for your badge. Badge reviewers will inspect this notebook to ensure that the <code>datastory()</code> runs properly and the test passes.

After inspection, you will receive a notification either confirming that you have earned the badge or suggesting changes.
### <H2>Todo: </H2> 
Before applying for the badge please answer the following questions.
<ul><li>Explain your data, how it is transmitted and stored – how? 

<b>Ans</b> Our data relates to the insurance charges for individuals applying for an insurance policy. We use the data to predict the charges that would apply to an individual based on smoking habits, age and sex.

The data was first read from a CSV file and wrangled as required; the processed data is then stored in MongoDB and accessed from there.

Modelling was done using DXC Industrialized AI with AutoML models, and the accuracy was measured.

We then moved the model to Algorithmia to create an API that can be called to get predictions.

</li></ul>

In [None]:
##Todo: When you are ready to apply for the badge, please uncomment the below code and run the code for badge submission.
ai.apply_for_an_ai_badge(ai_guild_profile, ai.AI_Badge.RUN_AGILE_TRANSFORMATION)