# Main calculations

Currently this kicks off by creating the `coeff_ser` `pd.Series`.  In due course it might be worth setting things up so the series is created automatically when the library is imported (i.e. by calling the functions in the `__init__.py` of the module directory).  That would mean the target had to be specified at import time, though.  Not great.  Possibly best left as is, given that the target needs to be explicit. Perhaps I should pass views of `coeff_ser` to the functions?
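One way to keep the target explicit and still avoid a module-level global would be to hand each function only the coefficient group it needs. A minimal sketch, assuming a two-level index (coefficient name, position); the toy values and the `linear_score` helper are invented for illustration and are not part of `read_coeffs`:

```python
import pandas as pd

# Toy stand-in for coeff_ser: float Series with a two-level index
# (coefficient name, position). All values are made up.
toy_index = pd.MultiIndex.from_tuples(
    [("FormCare_const", 0),
     ("Prod_MCS_coeffs", 0), ("Prod_MCS_coeffs", 1), ("Prod_MCS_coeffs", 2)]
)
toy_ser = pd.Series([2.5, 1.0, 5.0, 30.0], index=toy_index)


def linear_score(coeffs, age, qol):
    """Hypothetical helper: receives only the three coefficients it uses."""
    return coeffs[0] * (age / 10) + coeffs[1] * qol + coeffs[2]


# Select one group by its first-level label and pass it as a plain array
mcs_coeffs = toy_ser.loc["Prod_MCS_coeffs"].to_numpy()
score = linear_score(mcs_coeffs, age=30, qol=0.9)
```

The function then has no hidden dependency on a global name, which should also make it easier to test with hand-made coefficients.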


## Next up
Just work through the component functions.
Haven't worked out how to do higher-order calcs, e.g. for treatment impact or populations, but figure these components will be needed anyway.  May do patient objects?

Write some unit tests.

Probably a good idea to `pickle` or `shelve` the `coeff_ser` object.  (And rename it!)
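A minimal sketch of the pickle route, using a toy series in place of `coeff_ser` and a temporary file path (both are assumptions for illustration). `Series.to_pickle` and `pd.read_pickle` are the standard pandas calls for this:

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for coeff_ser
toy_ser = pd.Series([1.0383, 5.0122, 32.5459], name="toy_coeffs")

# Write to a temporary location; a real path would live next to the data
path = os.path.join(tempfile.mkdtemp(), "coeffs.pkl")
toy_ser.to_pickle(path)

# Read it back and check that values, index and dtype survived
restored = pd.read_pickle(path)
round_trip_ok = restored.equals(toy_ser)
```

Loading the pickle would skip re-reading the workbook each session, at the cost of having to regenerate the file whenever the spreadsheet changes.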


## Notes
Only `FormCare_spICD_mults` is imported, not the ICD identifiers, so the calculation has to do its lookups by knowing which ICD corresponds to which index.  This is basically what I do for other stuff anyway.  Had to do it this way to keep the `pd.Series` as a float.
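The positional lookup could be sketched as below. The ICD labels, the multiplier values, the `icd_multiplier` helper and the default of 1.0 for unlisted codes are all hypothetical; only the idea of mapping an identifier to its position comes from the note above:

```python
import pandas as pd

# Hypothetical ICD labels, kept in the same order as the multipliers
icd_labels = ["ICD_A", "ICD_B", "ICD_C"]
icd_position = {label: pos for pos, label in enumerate(icd_labels)}

# Toy multipliers; the series stays float because the labels live elsewhere
toy_mults = pd.Series([1.1, 0.8, 1.4])


def icd_multiplier(label, mults=toy_mults, positions=icd_position, default=1.0):
    """Return the multiplier for an ICD label, or `default` if it is not listed."""
    pos = positions.get(label)
    if pos is None:
        return default  # assumption: unlisted ICDs get no adjustment
    return float(mults.iloc[pos])


mult_b = icd_multiplier("ICD_B")
mult_unknown = icd_multiplier("not_listed")
```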


In [1]:
import pandas as pd
import numpy as np
from openpyxl import load_workbook
from read_coeffs import get_coeffs, make_pdseries

In [45]:
wb = load_workbook('data/wsi_v0.1.31.xlsx', data_only=True)
coeff_ser = make_pdseries(get_coeffs(wb))

Check the index

In [78]:
ind = coeff_ser.index.levels[0]
print("{:<25}".format("Entry"), "Length")
print("")
for entry in ind:
    print(entry.ljust(27), str(len(coeff_ser[entry])).rjust(4))
print("\n{:<27}".format("Total"), len(coeff_ser))    
coeff_ser.dtype

Entry                     Length

ChCareCons_costpcm             1
FormCare_agecutoff             1
FormCare_coeff                 1
FormCare_const                 1
FormCare_costpcm               1
FormCare_spICD_mults           6
GovCons_Ed                     1
GovCons_EdAdj_byage          100
GovCons_Health                 1
GovCons_HealthAdj_byage      100
GovCons_exclHealthEd           1
InfCare_betacoeffs            25
InfCare_gammacoeffs            4
InfCare_maxhpd                 1
InfCare_minhpd                 1
PrivCons_byage               100
Prod_MCS_coeffs                3
Prod_PCS_coeffs                3
Prod_prod_coeffs               7
Prod_rate_in_FH_byage        100
Prod_wagepcm_F_byage         100
Prod_wagepcm_M_byage         100
UChCare_hpcm_F_byage         100
UChCare_hpcm_M_byage         100
UProd_agecutoff                1
UProd_coeff_F                  1
UProd_coeff_M                  1
UProd_const_F                  1
UProd_const_M                  1
USickCare

dtype('float64')

In [58]:
len(coeff_ser)

1066

The top-level index entries can be accessed as attributes of the series.  E.g.:

In [54]:
coeff_ser.Prod_MCS_coeffs

0     1.0383
1     5.0122
2    32.5459
dtype: float64

## Now the actual functions
Currently in the middle of `unpaid_prod`.

In [82]:
def production(age, gen, qol):
    '''returns total production: paid plus unpaid'''
    return paid_prod(age, gen, qol) + unpaid_prod(age, gen, qol)

In [56]:
def paid_prod(age=35, gen='F', qol=0.9, debug=False):
    '''returns paid production: productivity * wage * (1 + on-costs)'''
    # wage lookup by gender and age
    if gen == 'F':
        wage = coeff_ser.Prod_wagepcm_F_byage[age]
    else:
        wage = coeff_ser.Prod_wagepcm_M_byage[age]
    on_costs = coeff_ser.on_costs
    # pass the caller's debug flag through rather than hard-coding False
    prod = productivity(age, qol, debug=debug)
    return prod * wage * (1 + on_costs)

In [55]:
def productivity(age, qol, debug=False):
    '''returns productivity, as a proportion (0 to 1) of time worked, for given age and QoL'''
    # NB debug is accepted but not used yet
    MCS = ((coeff_ser.Prod_MCS_coeffs[0] * (age/10))
           + (coeff_ser.Prod_MCS_coeffs[1] * qol)
           + coeff_ser.Prod_MCS_coeffs[2])

    PCS = ((coeff_ser.Prod_PCS_coeffs[0] * (age/10))
           + (coeff_ser.Prod_PCS_coeffs[1] * qol)
           + coeff_ser.Prod_PCS_coeffs[2])

    prod_sum = ((coeff_ser.Prod_prod_coeffs[0] * (age/10))
                + (coeff_ser.Prod_prod_coeffs[1] * ((age/10)**2))
                + (coeff_ser.Prod_prod_coeffs[2] * (MCS/10))
                + (coeff_ser.Prod_prod_coeffs[3] * ((MCS/10)**2))
                + (coeff_ser.Prod_prod_coeffs[4] * (PCS/10))
                + (coeff_ser.Prod_prod_coeffs[5] * ((PCS/10)**2))
                + coeff_ser.Prod_prod_coeffs[6])

    # logistic transform: maps prod_sum onto the interval (0, 1)
    prod_rate = np.exp(prod_sum)/(1+np.exp(prod_sum))

    return prod_rate
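The last step of `productivity` is the logistic function, which is why the result always lies between 0 and 1. A small sketch of an algebraically equivalent form; the `logistic` helper is a name introduced here for illustration, not something defined elsewhere in the notebook:

```python
import numpy as np


def logistic(x):
    """1 / (1 + exp(-x)), algebraically equal to exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))


x = 0.75
# Both forms agree for moderate inputs
same = bool(np.isclose(logistic(x), np.exp(x) / (1 + np.exp(x))))

# exp(0) = 1, so the midpoint is 1 / (1 + 1)
midpoint = logistic(0.0)

# For large positive x, exp(-x) underflows to 0 instead of overflowing
large = logistic(1000.0)
```

The `1 / (1 + exp(-x))` form avoids computing `exp` of a large positive number, though for the range of `prod_sum` values these coefficients produce either form is likely fine.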

In [70]:
def unpaid_prod(age, gen, qol):
    '''returns unpaid production for given age, gender and QoL (work in progress)'''
    # gen is not used yet: the three hpcm components below are still stubs
    hpcm = (gen_unpaid_prod_hpcm()
            + unpaid_sick_care_hpcm()
            + unpaid_childcare_hpcm())
    sick_rate = productivity(age, qol) / coeff_ser.Prod_rate_in_FH_byage[age]
    return hpcm * coeff_ser.Value_per_hour_of_time * sick_rate

In [69]:
def gen_unpaid_prod_hpcm():
    return 1  # placeholder until implemented

In [68]:
def unpaid_sick_care_hpcm():
    return 1  # placeholder until implemented

In [67]:
def unpaid_childcare_hpcm():
    return 1  # placeholder until implemented

In [84]:
%timeit production(33, 'M', 0.9)

100 loops, best of 3: 8.17 ms per loop


In [75]:
productivity(33, 0.9)
coeff_ser.Prod_rate_in_FH_byage[33]
coeff_ser.Value_per_hour_of_time

0    8.698674
dtype: float64

In [80]:
%timeit unpaid_prod(88, 'M', 0.5)

The slowest run took 5.51 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 4 ms per loop
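Outside IPython, the same kind of measurement can be made with the standard-library `timeit` module. The sketch below uses a toy series with invented values and two hypothetical helpers to compare looking up a coefficient group on every call against extracting it once. Whether label lookups actually account for the timings above has not been profiled, so treat that as a guess:

```python
import timeit

import pandas as pd

# Toy stand-in for coeff_ser with a two-level index; values are made up
toy_index = pd.MultiIndex.from_tuples(
    [("Prod_MCS_coeffs", 0), ("Prod_MCS_coeffs", 1), ("Prod_MCS_coeffs", 2)]
)
toy_ser = pd.Series([1.0, 5.0, 30.0], index=toy_index)


def lookup_each_call():
    """Hypothetical: repeat the label lookup inside the function."""
    return toy_ser.loc["Prod_MCS_coeffs"].to_numpy().sum()


# Extract the group once, outside the function
cached = toy_ser.loc["Prod_MCS_coeffs"].to_numpy()


def use_cached():
    """Hypothetical: reuse the pre-extracted array."""
    return cached.sum()


# Total seconds for 200 calls of each variant
t_lookup = timeit.timeit(lookup_each_call, number=200)
t_cached = timeit.timeit(use_cached, number=200)
```

Comparing `t_lookup` with `t_cached` on the real series would show whether caching the coefficient groups is worth the extra bookkeeping.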
