# QBE

## Goal
Modernize the way we deliver models and technical premium to our business partners (e.g., Underwriters, Claims Reps, etc.). Our aim is to deploy these models as an API so that they can be easily consumed by various downstream applications. In addition, we’re interested in seeing a solution with the following components:
- Deployment Governance (e.g., approval process to move to production, version history, etc.)
- Complete, searchable, and auditable provenance of all models
- Model explainability (e.g., which features were important to the prediction and why)
- Model quality monitoring (e.g., model accuracy over time)
- Data quality monitoring (e.g., data drift)
- Holistic view of all models in production (e.g., how many models do we have, what’s the aggregate model quality, how many need to be retrained, etc.)
- API telemetry and transaction auditability (e.g., how often is the API used, how performant is it, what were the inputs and outputs in each transaction, etc.)
- Native support for development in open-source languages (e.g., R, Python, etc.) and deployment of varying model types to the solution (e.g., GLM, GBM, CNN, etc.)
- Ability to make pricing transactional data available for analytical purposes

## Load Libraries

In [1]:
!pip3 install --user -r libraries.txt

Collecting statsmodels
  Downloading statsmodels-0.12.2-cp36-cp36m-manylinux1_x86_64.whl (9.5 MB)
Collecting patsy>=0.5
  Downloading patsy-0.5.2-py2.py3-none-any.whl (233 kB)
Installing collected packages: patsy, statsmodels
Successfully installed patsy-0.5.2 statsmodels-0.12.2


## Import Dependencies

In [1]:
import requests
import numpy as np 
import pandas as pd
import scipy
import statsmodels.api as sm

## Load Data 

In [18]:
data = pd.read_csv("comm_auto_sample_data.csv")
data.info()
indication = pd.read_csv("comm_auto_sample_indication.csv")
indication.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131773 entries, 0 to 131772
Data columns (total 11 columns):
 #   Column                           Non-Null Count   Dtype  
---  ------                           --------------   -----  
 0   coverage_type                    131773 non-null  object 
 1   pol_eff_year                     131773 non-null  int64  
 2   risk_state                       131773 non-null  object 
 3   naics3                           131539 non-null  float64
 4   n_power_units                    131773 non-null  int64  
 5   prior_claim_freq_3yr             99408 non-null   float64
 6   dnb_credit_score                 108504 non-null  float64
 7   easi_snowfall                    131504 non-null  float64
 8   experience_rated_manual_premium  131773 non-null  float64
 9   incurred_loss_and_alae           131773 non-null  float64
 10  split                            131773 non-null  object 
dtypes: float64(6), int64(2), object(3)
memory usage: 11.1+ MB

## Split Data

In [3]:
train_df = data[data.split == 'Training']
train_df = train_df.drop(['split'], axis=1)
train_df.info()
test_df = data[data.split == 'Holdout']
test_df = test_df.drop(['split'], axis=1)
test_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 103414 entries, 1 to 131772
Data columns (total 10 columns):
 #   Column                           Non-Null Count   Dtype  
---  ------                           --------------   -----  
 0   coverage_type                    103414 non-null  object 
 1   pol_eff_year                     103414 non-null  int64  
 2   risk_state                       103414 non-null  object 
 3   naics3                           103219 non-null  float64
 4   n_power_units                    103414 non-null  int64  
 5   prior_claim_freq_3yr             78067 non-null   float64
 6   dnb_credit_score                 85344 non-null   float64
 7   easi_snowfall                    103201 non-null  float64
 8   experience_rated_manual_premium  103414 non-null  float64
 9   incurred_loss_and_alae           103414 non-null  float64
dtypes: float64(6), int64(2), object(2)
memory usage: 8.7+ MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 28359 entries, 0 to 1
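The goals above include data drift monitoring. One common, simple drift measure is the population stability index (PSI), which compares the binned distribution of a feature in a reference sample (here, Training) with a comparison sample (here, Holdout): PSI = Σ (a_i - e_i) · ln(a_i / e_i), where e_i and a_i are the shares of each sample falling in bin i. The sketch below is one way to compute it; the decile binning, the small floor for empty bins, and the choice of feature in the usage note are assumptions, and the often-quoted rule of thumb that PSI above roughly 0.25 signals a material shift is an industry convention rather than a statistical test.

```python
import numpy as np
import pandas as pd


def population_stability_index(expected: pd.Series, actual: pd.Series, bins: int = 10) -> float:
    """PSI of `actual` relative to `expected`, using quantile bins of `expected`."""
    expected = expected.dropna().to_numpy(dtype=float)
    actual = actual.dropna().to_numpy(dtype=float)
    # Quantile edges from the reference sample; np.unique removes duplicate edges
    # that appear when a feature has many tied values.
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    if len(edges) < 2:
        return 0.0  # constant reference feature: no bins to compare
    # Clip comparison values into the reference range so out-of-range values
    # land in the first or last bin instead of being dropped by np.histogram.
    actual = np.clip(actual, edges[0], edges[-1])
    expected_share = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_share = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins to avoid log(0) and division by zero.
    eps = 1e-6
    expected_share = np.clip(expected_share, eps, None)
    actual_share = np.clip(actual_share, eps, None)
    return float(np.sum((actual_share - expected_share) * np.log(actual_share / expected_share)))
```

For example, `population_stability_index(train_df["dnb_credit_score"], test_df["dnb_credit_score"])` compares the Holdout credit score distribution with the Training one; running the same calculation on recent production inputs against the training sample is one way to track drift over time.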

## Sample Use Case
To deliver technical premium for Commercial Retail – Auto, you need a combination of pricing inputs from:
### 1. Manual Premium
- Comes from our Policy Admin System, Majesco
- Would be API data from Majesco
- For the demo, briefly describe how you would ingest this type of data from an internal vendor.

In [14]:
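# httpbin.org echoes the request back, so it stands in for the policy admin
# system's API and lets the call be demonstrated without Majesco credentials.
url = "https://httpbin.org/post"
querystring = {"search.sessiontype": "1522435540042001BxTD"}
payload = "{}"
# Headers as exported from Postman; in production, credentials such as
# rfapiprofileid should come from a secret store rather than being hard-coded.
headers = {
    'content-type': "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW",
    'rfapiprofileid': "uGiII5rYGOjoHXOZx0ch4r7f1KzFC0zd",
    'rfwidgetid': "KKA8rC3VuZo5clh8gX5Aq07XFonUTLyU",
    'cache-control': "no-cache",
    'Postman-Token': "dadd9f76-6a7f-41ed-8f31-04359976c622"
}
response = requests.post(url, data=payload, headers=headers, params=querystring)
print(response.text)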

{
  "args": {
    "search.sessiontype": "1522435540042001BxTD"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Cache-Control": "no-cache", 
    "Content-Length": "2", 
    "Content-Type": "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW", 
    "Host": "httpbin.org", 
    "Postman-Token": "dadd9f76-6a7f-41ed-8f31-04359976c622", 
    "Rfapiprofileid": "uGiII5rYGOjoHXOZx0ch4r7f1KzFC0zd", 
    "Rfwidgetid": "KKA8rC3VuZo5clh8gX5Aq07XFonUTLyU", 
    "User-Agent": "python-requests/2.25.1", 
    "X-Amzn-Trace-Id": "Root=1-61b3bd9b-347d774243eccd5a019311e8"
  }, 
  "json": null, 
  "origin": "104.197.249.205", 
  "url": "https://httpbin.org/post?search.sessiontype=1522435540042001BxTD"
}
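The cell above only demonstrates an HTTP round trip against an echo service. A minimal sketch of a more realistic ingestion path follows, assuming the policy admin system exposes a JSON endpoint that returns a list of policy records. The base URL, the `/policies` path and `effective_date` parameter, the `MAJESCO_API_TOKEN` environment variable, and the `manual_premium` field name are all hypothetical placeholders, not Majesco's actual API.

```python
import os
from typing import Any, Dict, List

import pandas as pd
import requests

# Hypothetical base URL; the real policy admin endpoint is not given in the source.
BASE_URL = "https://majesco.example.internal/api"


def fetch_policies(session: requests.Session, effective_date: str) -> List[Dict[str, Any]]:
    """Fetch raw policy records from a hypothetical /policies endpoint."""
    response = session.get(
        f"{BASE_URL}/policies",
        params={"effective_date": effective_date},
        # Hypothetical bearer-token auth; the token is read from the environment
        # instead of being hard-coded in the notebook.
        headers={"Authorization": f"Bearer {os.environ.get('MAJESCO_API_TOKEN', '')}"},
        timeout=30,
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()


def policies_to_frame(records: List[Dict[str, Any]]) -> pd.DataFrame:
    """Flatten nested JSON records into one row per policy."""
    df = pd.json_normalize(records)  # nested keys become dotted column names
    # Coerce premium to numeric so malformed values surface as NaN, not strings.
    df["manual_premium"] = pd.to_numeric(df["manual_premium"], errors="coerce")
    return df
```

Keeping the HTTP call and the parsing separate means the parsing can be tested without network access; in use, something like `policies_to_frame(fetch_policies(requests.Session(), "2021-12-01"))` would yield a DataFrame ready to join to the other rating inputs.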



### 2. Experience Mod (tool_name = Experience Rater, file_type = Excel, model_type = Rater)
- Relies on historical losses, which come in the form of PDFs, Excel files, etc.
- Sample Algorithm:

`If Manual Premium < $10,000
    If losses > $20,000 then 2 else 0.8
Else
    If losses > $1,000,000 then 1.5 else 1.0`

In [5]:
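def experience_rater(manual_premium, losses):
    """Return the Experience Mod for a manual premium and historical losses."""
    if manual_premium < 10000:
        # Accounts under $10,000 manual premium: 2.0 if losses exceed $20,000, else 0.8.
        return 2.0 if losses > 20000 else 0.8
    # Accounts at or above $10,000: 1.5 if losses exceed $1,000,000, else 1.0.
    return 1.5 if losses > 1000000 else 1.0

In [16]:
print(experience_rater(20000, 0))

1.0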
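`experience_rater` scores one account at a time. To score many accounts at once (for example, a whole DataFrame of quotes), the same thresholds can be applied element-wise with NumPy's `np.where`. This is a minimal sketch; the function name is hypothetical, and which columns feed `manual_premium` and `losses` depends on where the historical loss data lands after ingestion.

```python
import numpy as np


def experience_mod_vectorized(manual_premium, losses):
    """Element-wise version of the Experience Rater sample algorithm."""
    manual_premium = np.asarray(manual_premium, dtype=float)
    losses = np.asarray(losses, dtype=float)
    small_account = manual_premium < 10_000
    return np.where(
        small_account,
        np.where(losses > 20_000, 2.0, 0.8),     # accounts under $10,000 manual premium
        np.where(losses > 1_000_000, 1.5, 1.0),  # accounts at or above $10,000
    )


mods = experience_mod_vectorized([5_000, 5_000, 20_000, 20_000], [25_000, 0, 2_000_000, 0])
```

Here `mods` holds 2.0, 0.8, 1.5 and 1.0, one per account, matching what `experience_rater` gives for each pair individually.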


### 3. Loss Rating Tool (tool_name = CLRT, file_type = Excel, model_type = Rater)
- Relies on (1) and (2)
- Only used for large accounts
    - $500,000+ experience rated manual premium
    - This is an example of a business rule we’d like to see handled (i.e., when to use a specific algorithm)
- In production, the input to this algorithm is dynamic (e.g., what-if analysis) and will come from our Underwriting Workbench (i.e., the UI where the Underwriter works to quote policies).
- Example Input Variables
    - Historical exposure by year
        - Auto exposure is vehicle count
    - Losses by year
    - Claim detail
    - Aggregate losses by year
    - Account information
- Please describe how you would handle the dynamic input and output of the Example Input Variables as part of your deployment.
- Output is technical premium
    - CLRT Mod * Experience Mod * Manual Premium = Technical Premium
    - CLRT Mod Sample Algorithm:
    
`If exposure < 100
    If losses > 1000000 then 1.8
    If losses > 500000 then 1.4
    If losses > 250000 then 1.0
    If losses > 100000 then 0.9
    Else 0.8
Else
    If losses > 1000000 then 2.0
    If losses > 500000 then 1.5
    If losses > 250000 then 1.0
    If losses > 100000 then 0.85
    Else 0.75`
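One way to handle the dynamic Underwriting Workbench input is to validate each request against an explicit schema, derive the rater inputs from it, and return the output together with the inputs so every transaction can be logged for audit. The sketch below assumes a JSON payload with per-year exposure and loss records; the field names (`account_id`, `history`, etc.) and the choice to sum exposure and losses across years before rating are hypothetical, since the source does not say how the yearly history maps onto the single `exposure` and `losses` values in the CLRT algorithm.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class YearRecord:
    year: int
    exposure: float  # auto exposure is vehicle count
    losses: float


@dataclass
class ClrtRequest:
    account_id: str
    manual_premium: float
    experience_mod: float
    history: List[YearRecord] = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: str) -> "ClrtRequest":
        body = json.loads(raw)
        # Coerce each yearly record; a missing key raises KeyError (fail fast).
        history = [
            YearRecord(year=int(r["year"]), exposure=float(r["exposure"]), losses=float(r["losses"]))
            for r in body.get("history", [])
        ]
        if not history:
            raise ValueError("history must contain at least one year")
        return cls(
            account_id=str(body["account_id"]),
            manual_premium=float(body["manual_premium"]),
            experience_mod=float(body["experience_mod"]),
            history=history,
        )

    # Hypothetical aggregation: totals over the experience period.
    @property
    def total_exposure(self) -> float:
        return sum(r.exposure for r in self.history)

    @property
    def total_losses(self) -> float:
        return sum(r.losses for r in self.history)


def handle_request(raw: str, clrt_mod: Callable[[float, float], float]) -> Dict[str, Any]:
    """Validate, score, and return inputs alongside outputs for transaction auditing."""
    req = ClrtRequest.from_json(raw)
    mod = clrt_mod(req.total_exposure, req.total_losses)
    return {
        "inputs": asdict(req),
        "derived": {"exposure": req.total_exposure, "losses": req.total_losses},
        "clrt_mod": mod,
        "technical_premium": mod * req.experience_mod * req.manual_premium,
    }
```

Passing the rating function in, e.g. `handle_request(raw_json, clrt)` with the notebook's `clrt` function, keeps request handling independent of a particular rater version, and the returned dictionary can be written to an audit log as-is.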


In [7]:
def clrt(exposure, losses):
    """Return the CLRT Mod for an exposure (vehicle count) and loss amount."""
    # Thresholds are checked from largest to smallest with elif, so exactly one mod applies.
    if exposure < 100:
        if losses > 1000000:
            clrt_mod = 1.8
        elif losses > 500000:
            clrt_mod = 1.4
        elif losses > 250000:
            clrt_mod = 1.0
        elif losses > 100000:
            clrt_mod = 0.9
        else:
            clrt_mod = 0.8
    else:
        if losses > 1000000:
            clrt_mod = 2.0
        elif losses > 500000:
            clrt_mod = 1.5
        elif losses > 250000:
            clrt_mod = 1.0
        elif losses > 100000:
            clrt_mod = 0.85
        else:
            clrt_mod = 0.75
    return clrt_mod

In [8]:
print(clrt(101, 110000))

0.85
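The large-account business rule and the technical premium formula can also be expressed together. The sketch below encodes the CLRT Mod sample algorithm's thresholds as data (descending loss thresholds per exposure band), so rule changes can be reviewed and versioned without editing control flow, and applies CLRT only at $500,000 or more of experience rated manual premium. Treating $500,000 as inclusive follows the "$500,000+" wording; reading "experience rated manual premium" as Manual Premium × Experience Mod is an assumption, and because the source does not say what smaller accounts receive instead, the sketch returns `None` for them. All names here are hypothetical.

```python
from typing import Optional

# CLRT Mod thresholds as data: (loss threshold, mod) pairs per exposure band, checked
# from the largest threshold down; losses must strictly exceed a threshold.
CLRT_TABLE = {
    "exposure_under_100": [(1_000_000, 1.8), (500_000, 1.4), (250_000, 1.0), (100_000, 0.9)],
    "exposure_100_plus": [(1_000_000, 2.0), (500_000, 1.5), (250_000, 1.0), (100_000, 0.85)],
}
CLRT_FLOOR = {"exposure_under_100": 0.8, "exposure_100_plus": 0.75}
LARGE_ACCOUNT_THRESHOLD = 500_000  # "$500,000+ experience rated manual premium"


def clrt_mod_from_table(exposure: float, losses: float) -> float:
    """Look up the CLRT Mod for an exposure (vehicle count) and loss amount."""
    band = "exposure_under_100" if exposure < 100 else "exposure_100_plus"
    for threshold, mod in CLRT_TABLE[band]:
        if losses > threshold:
            return mod
    return CLRT_FLOOR[band]


def clrt_technical_premium(manual_premium: float, experience_mod: float,
                           exposure: float, losses: float) -> Optional[float]:
    """CLRT Mod * Experience Mod * Manual Premium, applied to large accounts only."""
    # Assumption: "experience rated manual premium" = Manual Premium * Experience Mod.
    experience_rated_premium = manual_premium * experience_mod
    if experience_rated_premium < LARGE_ACCOUNT_THRESHOLD:
        return None  # CLRT does not apply; the source does not specify the fallback
    return clrt_mod_from_table(exposure, losses) * experience_rated_premium
```

For the example above, `clrt_mod_from_table(101, 110000)` also returns 0.85.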


### 4. Discretionary Pricing Guidance (tool_name = NBPT, file_type = Excel, model_type = GLM)
- Relies on (1) and (2)
- Sample data and schema provided
- Please build a GLM with the following model specification
    - link = “log”
    - family = “Tweedie”
    - weight = manual_premium
    - target = manual_loss_ratio (incurred_loss_and_alae / manual_premium)
    - features = [
        - policy_year (`pol_eff_year` in the sample data)
        - risk_state
        - n_power_units
        - prior_claim_freq_3yr
        - easi_snowfall
        - dnb_credit_score
    ]
- Output is technical premium
    - GLM prediction * Experience Mod * Manual Premium * (1 + Indication) = Technical Premium
    - Indications are provided in a sample file
        - [sample_data].[risk_state] = [indication].[risk_state]
        - [sample_data].[coverage_type] = [indication].[coverage_type]
        - Join on these keys to obtain the Indication

In [20]:
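# Placeholder: this fits statsmodels' bundled Scotland dataset, not the commercial
# auto sample, to demonstrate the GLM API. Tweedie() defaults to a log link and var_power=1.
glm_data = sm.datasets.scotland.load()
glm_data.exog = sm.add_constant(glm_data.exog)
glm_model = sm.GLM(glm_data.endog, glm_data.exog, family=sm.families.Tweedie())
glm_results = glm_model.fit()
print(glm_results.summary())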

                 Generalized Linear Model Regression Results                  
Dep. Variable:                      y   No. Observations:                   32
Model:                            GLM   Df Residuals:                       24
Model Family:                 Tweedie   Df Model:                            7
Link Function:                    log   Scale:                         0.21414
Method:                          IRLS   Log-Likelihood:                    nan
Date:                Fri, 10 Dec 2021   Deviance:                       5.1846
Time:                        21:08:14   Pearson chi2:                     5.14
No. Iterations:                     5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.7793      0.683      8.457      0.0