# Motor Burning Cost Calculation for AXA France

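This notebook computes burning costs for every contract in a contract database by applying a set of burning cost models with Spark.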

## 1. Parameters

Change these parameters to:
1. update the burning cost models;
2. apply the models to a different contract database;
3. select where the output CSV is written.

In [13]:
burning_cost_models_filename = './data/models.xlsx'
contract_database_filename = './data/base_RC_1000.csv'
output_filename = './output/pp'
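Before starting Spark, it can help to confirm that the two input files exist. The sketch below is not part of the original notebook: `missing_input_files` is a hypothetical helper name, and it relies only on the standard library.

```python
from pathlib import Path


def missing_input_files(*filenames):
    """Return the given paths that do not point to an existing file.

    Hypothetical helper, not part of the original notebook.
    """
    return [name for name in filenames if not Path(name).is_file()]
```

In the notebook it could be called as `missing_input_files(burning_cost_models_filename, contract_database_filename)` right after the parameters cell; an empty list means both inputs were found.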

## 2. Initialization

This block:
1. imports all external libraries;
2. configures the Spark big data engine for distributed computation.

In [14]:
import shutil
import importlib
import findspark
import pyspark
import model

findspark.init()
spark = pyspark.sql.SparkSession.builder \
        .master("local") \
        .appName("Pure Premium") \
        .getOrCreate()

## 3. Data Loading

This cell loads:
1. the burning cost models;
2. the contract database.

The cell outputs the number of contracts in the database.

In [15]:
models = importlib.reload(model).Models(burning_cost_models_filename)

df = spark.read.csv(contract_database_filename, header=True)
df.count()

999
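The count can be cross-checked without Spark. This is a minimal sketch, assuming the contract database is a plain CSV file with a single header row (which is what `header=True` implies); `count_contracts` is a hypothetical helper that does not appear in the original notebook.

```python
import csv


def count_contracts(filename):
    """Count the data rows of a CSV file, excluding the header row.

    Hypothetical helper, not part of the original notebook.
    """
    with open(filename, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row, if there is one
        # Blank lines come back as empty lists, so they are not counted.
        return sum(1 for row in reader if row)
```

For a simple file the result should match `df.count()`. The two may differ if some fields contain embedded newlines, since the two readers do not necessarily split records the same way.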

## 4. User-Defined Computation Functor

This cell creates a [functor](https://github.com/dbrattli/oslash/wiki/Functors,-Applicatives,-And-Monads-In-Pictures) (here, a callable that Spark can apply to each contract) using a [closure](https://en.wikipedia.org/wiki/Closure_(computer_programming)) to bind the models into the Spark scope, a technique closely related to [currying](https://en.wikipedia.org/wiki/Currying).

The method uses an outer function that creates and returns an inner function; the variable `models` is bound into the scope of that inner function.

At the end of the cell, we create the functor by calling the outer function with the models as its parameter.

In [21]:
def create_calculate_models_function(models):
    
    def calculate_models(contract):
        results = {key: model.calculate(contract) for (key, model) in models.models.items()}
        return pyspark.sql.Row(**results)
    
    return calculate_models

calculate_model_functor = create_calculate_models_function(models)
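The closure pattern can be tried outside Spark. The sketch below is an illustration under stated assumptions: `ConstantModel`, `ScaledModel` and the `bonus` field are invented stand-ins for the real model objects, which are only assumed to expose a `calculate(contract)` method. A plain dict of models is passed directly (instead of an object with a `.models` attribute), and a plain dict replaces `pyspark.sql.Row`.

```python
class ConstantModel:
    """Hypothetical model that ignores the contract and returns a fixed value."""

    def __init__(self, value):
        self.value = value

    def calculate(self, contract):
        return self.value


class ScaledModel:
    """Hypothetical model that multiplies one contract field by a factor."""

    def __init__(self, field, factor):
        self.field = field
        self.factor = factor

    def calculate(self, contract):
        # CSV values arrive as strings unless a schema is given, hence float().
        return float(contract[self.field]) * self.factor


def create_calculate_models_function(models):
    # `models` is captured by the inner function (a closure), so the returned
    # function takes the contract as its only argument.
    def calculate_models(contract):
        return {key: model.calculate(contract) for key, model in models.items()}

    return calculate_models


toy_models = {"freq": ConstantModel(0.1), "cost": ScaledModel("bonus", 1000.0)}
calculate = create_calculate_models_function(toy_models)

contracts = [{"bonus": "0.5"}, {"bonus": "1.0"}]
# Plain-Python analogue of applying the function to every row with rdd.map.
results = [calculate(contract) for contract in contracts]
```

Because the returned function takes a single argument, it has the shape that `map` expects. `functools.partial` applied to a two-argument function would give the same binding.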

## 5. Apply the Computation Functor in the Distributed Spark Engine

This cell applies the functor to every row of the contract database and returns a DataFrame with the results of all models.

In [22]:
burning_costs = df.rdd.map(calculate_model_functor).toDF()

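## 6. Persist Results to Disk (CSV File)

This cell deletes any previous output, then writes the burning costs in CSV format. Spark writes `output_filename` as a directory of part files rather than a single file, which is why the cleanup uses `shutil.rmtree`.
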
In [23]:
# delete any previous result
shutil.rmtree(output_filename, ignore_errors=True)

# persist results to a csv file
burning_costs.write.csv(output_filename, header=True)
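If a single CSV file is needed downstream, the `part-*.csv` files that Spark writes into the output directory can be concatenated with the standard library. This is a minimal sketch, not part of the original notebook: `merge_csv_parts` is a hypothetical helper, and it assumes every non-empty part file starts with the same header line (the case when `header=True`).

```python
import glob
import os


def merge_csv_parts(output_dir, merged_filename):
    """Concatenate Spark's part-*.csv files into one CSV with a single header.

    Hypothetical helper: assumes each non-empty part file begins with the
    same header line. Returns the number of data lines written.
    """
    part_files = sorted(glob.glob(os.path.join(output_dir, "part-*.csv")))
    header_written = False
    data_lines = 0
    with open(merged_filename, "w", newline="") as merged:
        for part in part_files:
            with open(part, newline="") as handle:
                header = handle.readline()
                if header and not header_written:
                    merged.write(header)  # keep only the first header found
                    header_written = True
                for line in handle:
                    merged.write(line)
                    data_lines += 1
    return data_lines
```

Alternatively, `burning_costs.coalesce(1).write.csv(output_filename, header=True)` produces a single part file, still inside a directory, at the cost of writing from a single task.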

### Appendices

In [None]:
# print(models.models)

In [None]:
# print(models.models['Modele_RC_FREQ'])

In [None]:
# models.models['Modele_RC_FREQ'].calculate(df.head(), True)

In [None]:
# {key: model.calculate(df.head(), True) for key, model in models.models.items()}

In [20]:
for x in df.take(1000):
    {key: model.calculate(x) for key, model in models.models.items()}
print('ok', len(model.variables_ok), sorted(model.variables_ok))
print('ko', len(model.variables_ko), sorted(model.variables_ko))
print('not_found', len(model.variables_not_found), sorted(model.variables_not_found))
print(model.variables_ok & model.variables_ko)
print(model.variables_ok & model.variables_not_found)
print(model.variables_not_found & model.variables_ko)

ok 0 []
ko 0 []
not_found 0 []
set()
set()
set()
