# Disentangling cost drivers using cost change decomposition
This notebook demonstrates a Python module (`ccdecomp`) for cost change decomposition, a method of separating the contributions of different driving variables to cost change in a technology.  This method is described in [Kavlak G, McNerney J, Trancik JE, “Evaluating the causes of cost reduction in photovoltaic modules”, Energy Policy 123, 700 - 710 (2018)](https://www.sciencedirect.com/science/article/abs/pii/S0301421518305196).

Cost change decomposition is useful for studying a technology or production process whose cost can be expressed as an equation in a set of underlying variables, when one has data on how these variables have changed (or will change) over time.  For example, in the reference above, the authors analyzed the drivers of cost evolution in solar modules.  The cost of a solar module (in dollars per watt) can be modeled as
$$
C = \frac{\alpha}{\sigma A \eta y} \left[ A \nu \rho p_s + cA + p_0 \left( \frac{K}{K_0} \right)^{-b} \right],
$$
where symbols have the following meanings:

| Symbol   | Meaning                                | Type      |
|:--------:|----------------------------------------|-----------|
| $A$      | wafer area                             | variable  |
| $y$      | manufacturing yield                    | variable  |
| $\eta$   | module efficiency                      | variable  |
| $\nu$    | silicon usage                          | variable  |
| $p_s$    | polysilicon price                      | variable  |
| $c$      | per-area cost of non-silicon materials | variable  |
| $K$      | manufacturing plant size               | variable  |
| $\alpha$ | module area utilization                | parameter |
| $\sigma$ | solar constant                         | parameter |
| $\rho$   | wafer density                          | parameter |
| $K_0$    | reference manufacturing plant size     | parameter |
| $p_0$    | cost of reference plant                | parameter |
| $b$      | scaling factor                         | parameter |
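
Distributing the prefactor writes the same model as a sum of three cost components, each a product of powers of the symbols. The component labels under the braces are ours, added for readability:

$$
C = \underbrace{\frac{\alpha \nu \rho p_s}{\sigma \eta y}}_{\text{silicon}}
  + \underbrace{\frac{\alpha c}{\sigma \eta y}}_{\text{non-silicon materials}}
  + \underbrace{\frac{\alpha p_0}{\sigma A \eta y} \left( \frac{K}{K_0} \right)^{-b}}_{\text{manufacturing plant}}.
$$

In this form, wafer area $A$ cancels from the first two terms, so in this model it affects cost only through the plant term, whereas efficiency $\eta$ and yield $y$ scale every term.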

Improvements in wafer area, manufacturing yield, and other variables caused significant reductions in the cost of solar modules over time.  However, it can be challenging to pinpoint how much each variable contributed to cost reduction, and hence which variables were most important, because cost may depend non-additively and non-linearly on these variables, and because many variables may change at the same time.  Cost change decomposition provides a method for quantifying the separate contributions of these variables to cost reduction.  The mathematical details of this approach are given in [the reference above](https://www.sciencedirect.com/science/article/abs/pii/S0301421518305196).  This Python module makes it easier for users to apply the method to their own cost problems.

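As a rough guide to what is computed, here is one formulation that is consistent with the results printed at the end of this notebook. It assumes a logarithmic-mean weighting (the log-mean Divisia index approach) and parameters held fixed over time; the reference above remains the authoritative source for the derivation.  Write each cost component as $C_j = k_j \prod_i x_i^{e_{ij}}$, where $k_j$ collects the fixed parameters and $e_{ij}$ is the exponent of variable $x_i$ in component $j$.  Between periods 0 and 1,

$$
\Delta C_j = C_j^{(1)} - C_j^{(0)} = L\left(C_j^{(0)}, C_j^{(1)}\right) \Delta \ln C_j,
\qquad
L(a, b) = \frac{b - a}{\ln b - \ln a}, \quad L(a, a) = a,
$$

and because $\Delta \ln C_j = \sum_i e_{ij} \, \Delta \ln x_i$,

$$
\Delta C = \sum_j \Delta C_j = \sum_i \underbrace{\sum_j L\left(C_j^{(0)}, C_j^{(1)}\right) e_{ij} \, \Delta \ln x_i}_{\text{contribution of } x_i}.
$$

Under this formulation the variable contributions add up exactly to the total cost change, with no residual interaction term.
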
The library exposes a simple interface with **three steps**:

1. **Specify the cost model:** Enter a cost model equation.
2. **Bind the model to data:** Enter the values of variables and parameters of this equation at different periods in time.
3. **Compute cost change decompositions:** Decompose total cost change between any two periods into contributions from individual variables.

After each step, various display methods can be used to verify that the problem was specified correctly.  The final results are the amounts that each variable contributed to cost change over each requested time span.  These are returned in a data frame on which a user can carry out further analysis.

This code is at an early stage.  Future work will include adding tests, providing other ways to load data (such as from CSV files), and adding documentation.

```python
import numpy as np
import pandas as pd
from ccdecomp import CostModel
pd.set_option('display.precision', 3)
```

## 1. Specify the cost model


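```python
# Initialize the cost model with a title and an equation.
# Multiplication may be written with '*' or implied by whitespace;
# the summary printed below shows the normalized form.
cp = CostModel(
    title='A made up cost model.',
    equation='a1 * x1 x2+a2 * x3 *x4_cubed x2 * x1 + x5 + a3 a4 CF * x2'
)

# Rename cost components, in the order they appear in the equation [optional]
cp.name_cost_components(['Materials', 'Labor', 'Equipment', 'O&M'])

# Specify which symbols are fixed parameters; all other symbols are treated as variables [optional]
cp.identify_parameters(['a1', 'a2', 'a3', 'a4'])

# Check that the cost model was correctly parsed
print('\n\nCost model summary:')
print(cp)
```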



```
Cost model summary:
Title:                  A made up cost model.
Equation:               a1 x1 x2 + a2 x3 x4_cubed x2 x1 + x5 + a3 a4 CF x2
Cost comp. names:       ['Materials', 'Labor', 'Equipment', 'O&M']
Cost comp. expressions: ['a1 x1 x2', 'a2 x3 x4_cubed x2 x1', 'x5', 'a3 a4 CF x2']
Symbols:                ['a1', 'x1', 'x2', 'a2', 'x3', 'x4_cubed', 'x5', 'a3', 'a4', 'CF']
Variables:              ['x1', 'x2', 'x3', 'x4_cubed', 'x5', 'CF']
Parameters:             ['a1', 'a2', 'a3', 'a4']
Num. cost components:   4
Num. symbols:           10
Num. variables:         6
Num. parameters:        4
Dependency matrix:
[[1. 1. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 1. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 1. 1. 1.]]
```

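The dependency matrix in the summary has one row per cost component and one column per symbol (in the order listed under `Symbols`), with a 1 where the symbol appears in that component. As a minimal sketch of how such a matrix can be derived, assuming each component is a plain product of symbols (no exponents, division, or numeric constants), the hypothetical helper `parse_sum_of_products` below reproduces the expressions, symbol order, and matrix printed above. It is an illustration only, not `ccdecomp`'s parser.

```python
def parse_sum_of_products(equation):
    """Hypothetical helper (not part of ccdecomp): split a sum-of-products
    equation into components and build a component-by-symbol dependency matrix."""
    # Split into additive components, treating '*' and whitespace alike
    components = [term.replace('*', ' ').split() for term in equation.split('+')]

    # Order symbols by first appearance across the whole equation
    symbols = []
    for factors in components:
        for s in factors:
            if s not in symbols:
                symbols.append(s)

    # Entry (j, k) is 1 if symbol k appears in component j, else 0
    matrix = [[1 if s in factors else 0 for s in symbols] for factors in components]
    expressions = [' '.join(factors) for factors in components]
    return expressions, symbols, matrix


expressions, symbols, matrix = parse_sum_of_products(
    'a1 * x1 x2+a2 * x3 *x4_cubed x2 * x1 + x5 + a3 a4 CF * x2'
)
print(expressions)
print(symbols)
for row in matrix:
    print(row)
```

Ordering symbols by first appearance is what makes the columns line up with the `Symbols` list in the summary.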

## 2. Bind to data

```python
# Enter data: one row per time period, one column per symbol
time = [1980, 1985, 1993]
data = pd.DataFrame(index=time)

# Parameters are held constant across periods (a scalar broadcasts to every row)
data['a1'] = 1.0
data['a2'] = 2.5
data['a3'] = 10.0
data['a4'] = 0.03

# Variables change from period to period
data['x1'] = [1, 1, 1.2]
data['x2'] = [100, 120, 150]
data['x3'] = [0.5, 0.6, 0.55]
data['x4_cubed'] = [5, 5.2, 5.7]
data['x5'] = [10, 80, 120]
data['CF'] = [30, 40, 41]

# Bind data to symbols
cp.bind_data(data)

# Verify that the data were bound correctly by examining costs in each period
print('\n\nOn binding the cost model to data, cost components are automatically computed:')
cp.data
```



On binding the cost model to data, cost components are automatically computed:


| Time period | a1 | a2 | a3 | a4 | x1 | x2 | x3 | x4_cubed | x5 | CF | Materials | Labor | Equipment | O&M | Total_cost | Share_Materials | Share_Labor | Share_Equipment | Share_O&M |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1980 | 1.0 | 2.5 | 10.0 | 0.03 | 1.0 | 100 | 0.5 | 5.0 | 10 | 30 | 100.0 | 625.0 | 10 | 900.0 | 1635.0 | 0.061 | 0.382 | 0.006 | 0.55 |
| 1985 | 1.0 | 2.5 | 10.0 | 0.03 | 1.0 | 120 | 0.6 | 5.2 | 80 | 40 | 120.0 | 936.0 | 80 | 1440.0 | 2576.0 | 0.047 | 0.363 | 0.031 | 0.559 |
| 1993 | 1.0 | 2.5 | 10.0 | 0.03 | 1.2 | 150 | 0.55 | 5.7 | 120 | 41 | 180.0 | 1410.75 | 120 | 1845.0 | 3555.75 | 0.051 | 0.397 | 0.034 | 0.519 |

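The derived columns follow directly from the equation: each component is the product of its symbols, `Total_cost` is their sum, and each `Share_` column is a component divided by the total. As an independent check that does not use `ccdecomp`, the same columns can be recomputed with plain pandas. This mirrors the table above; it is a sketch, not the module's internal code.

```python
import pandas as pd

# Same inputs as in the data-entry cell above; scalar parameters broadcast to every row
data = pd.DataFrame(
    {
        'a1': 1.0, 'a2': 2.5, 'a3': 10.0, 'a4': 0.03,
        'x1': [1, 1, 1.2],
        'x2': [100, 120, 150],
        'x3': [0.5, 0.6, 0.55],
        'x4_cubed': [5, 5.2, 5.7],
        'x5': [10, 80, 120],
        'CF': [30, 40, 41],
    },
    index=pd.Index([1980, 1985, 1993], name='Time period'),
)

# Each cost component is the product of the symbols it contains
data['Materials'] = data['a1'] * data['x1'] * data['x2']
data['Labor'] = data['a2'] * data['x3'] * data['x4_cubed'] * data['x2'] * data['x1']
data['Equipment'] = data['x5']
data['O&M'] = data['a3'] * data['a4'] * data['CF'] * data['x2']

# Total cost and each component's share of it
components = ['Materials', 'Labor', 'Equipment', 'O&M']
data['Total_cost'] = data[components].sum(axis=1)
for name in components:
    data[f'Share_{name}'] = data[name] / data['Total_cost']

print(data[components + ['Total_cost']].round(3))
```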

## 3. Compute cost change decompositions

```python
# Specify the desired time spans to study
time_spans = [
    (1980, 1985),
    (1985, 1993),
    (1980, 1993)
]

# Compute change decompositions over these spans
cp.cost_change_decomposition(time_spans)

# Report results
print('\n\nAfter asking for change decompositions over particular time spans:')
cp.cost_change_contributions
```



After asking for change decompositions over particular time spans:


| Time span | Total | Sum_of_changes(vars) | Materials | Labor | Equipment | O&M | x1 | x2 | x3 | x4_cubed | x5 | CF |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1980-1985 | 941.0 | 941.0 | 20.0 | 311.0 | 70 | 540.0 | 0.0 | 369.873 | 140.399 | 30.202 | 70.0 | 330.526 |
| 1985-1993 | 979.75 | 979.75 | 60.0 | 474.75 | 40 | 405.0 | 237.96 | 655.888 | -100.689 | 106.239 | 40.0 | 40.351 |
| 1980-1993 | 1920.75 | 1920.75 | 80.0 | 785.75 | 110 | 945.0 | 200.782 | 980.293 | 91.988 | 126.461 | 110.0 | 411.226 |

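Because the variable contributions sum to the total change (compare `Total` with `Sum_of_changes(vars)`), they can be normalized into fractions of the total. As an example of the follow-up analysis the returned data frame supports, the sketch below rebuilds the variable columns from the table above by hand; in the notebook one would instead start from `cp.cost_change_contributions`, assuming its columns are named as displayed. Each row is then divided by its total.

```python
import pandas as pd

# Variable contributions copied from the table above (one row per time span)
contributions = pd.DataFrame(
    {
        'x1': [0.0, 237.96, 200.782],
        'x2': [369.873, 655.888, 980.293],
        'x3': [140.399, -100.689, 91.988],
        'x4_cubed': [30.202, 106.239, 126.461],
        'x5': [70.0, 40.0, 110.0],
        'CF': [330.526, 40.351, 411.226],
    },
    index=pd.Index(['1980-1985', '1985-1993', '1980-1993'], name='Time span'),
)
total = pd.Series([941.0, 979.75, 1920.75], index=contributions.index)

# Fraction of each span's total cost change attributable to each variable
fractions = contributions.div(total, axis=0)
print(fractions.round(3))

# The variable with the largest contribution in each span
print(fractions.idxmax(axis=1))
```

Since the copied values are rounded to three decimals, each row of `fractions` sums to one only to within rounding.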

```python
# Show auxiliary quantities used in the computation (optional)
print('\n\nRepresentative values of cost components:')
cp.representative_costs
```



Representative values of cost components:


| Time span | Materials | Labor | Equipment | O&M |
|---:|---:|---:|---:|---:|
| 1980-1985 | 109.696 | 770.062 | 33.663 | 1148.927 |
| 1985-1993 | 147.978 | 1157.189 | 98.652 | 1634.144 |
| 1980-1993 | 136.104 | 965.147 | 44.267 | 1316.45 |

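The representative values agree with the logarithmic mean of each component's cost at the start and end of the span, $L(a, b) = (b - a)/(\ln b - \ln a)$. A short check with NumPy, using the component costs from the table in step 2, is sketched below; the `log_mean` function is our reconstruction from the printed numbers, not code taken from `ccdecomp`.

```python
import numpy as np

def log_mean(a, b):
    """Logarithmic mean (b - a) / (ln b - ln a) of positive a and b; equals a when a == b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # Suppress the 0/0 warning where a == b; those entries are replaced by a below
    with np.errstate(divide='ignore', invalid='ignore'):
        L = (b - a) / (np.log(b) - np.log(a))
    return np.where(np.isclose(a, b), a, L)

# Component costs (Materials, Labor, Equipment, O&M) in each period, from step 2
costs = {
    1980: np.array([100.0, 625.0, 10.0, 900.0]),
    1985: np.array([120.0, 936.0, 80.0, 1440.0]),
    1993: np.array([180.0, 1410.75, 120.0, 1845.0]),
}

for start, end in [(1980, 1985), (1985, 1993), (1980, 1993)]:
    print(f'{start}-{end}', np.round(log_mean(costs[start], costs[end]), 3))
```

The logarithmic mean always lies between the two endpoint costs, and it is the weight that makes $L \cdot \Delta \ln C$ equal the cost change exactly.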

```python
print('Log changes to variables:')
cp.variable_changes
```

Log changes to variables:


| Time span | x1 | x2 | x3 | x4_cubed | x5 | CF |
|---:|---:|---:|---:|---:|---:|---:|
| 1980-1985 | 0.0 | 0.182 | 0.182 | 0.039 | 2.079 | 0.288 |
| 1985-1993 | 0.182 | 0.223 | -0.087 | 0.092 | 0.405 | 0.025 |
| 1980-1993 | 0.182 | 0.405 | 0.095 | 0.131 | 2.485 | 0.312 |

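Combining the representative costs and the log changes reproduces the decomposition: for each span, a variable's contribution is the sum, over the components it appears in, of the component's representative cost times the variable's log change. The sketch below recomputes the 1980-1985 row of the contributions table in step 3 from the raw data with NumPy. It assumes every symbol enters its components with exponent 1, as in this toy model, and it is our reconstruction from the printed results rather than `ccdecomp`'s implementation.

```python
import numpy as np

def log_mean(a, b):
    """Logarithmic mean (b - a) / (ln b - ln a); assumes a != b, true for every component here."""
    return (b - a) / (np.log(b) - np.log(a))

variables = ['x1', 'x2', 'x3', 'x4_cubed', 'x5', 'CF']
# Variable values in 1980 and 1985
x_start = np.array([1.0, 100, 0.5, 5.0, 10, 30])
x_end = np.array([1.0, 120, 0.6, 5.2, 80, 40])

# Dependency of each component (rows) on each variable (columns)
D = np.array([
    [1, 1, 0, 0, 0, 0],  # Materials = a1 x1 x2
    [1, 1, 1, 1, 0, 0],  # Labor     = a2 x3 x4_cubed x2 x1
    [0, 0, 0, 0, 1, 0],  # Equipment = x5
    [0, 1, 0, 0, 0, 1],  # O&M       = a3 a4 CF x2
])
# Fixed parameter prefactor of each component: a1, a2, (none), a3 * a4
k = np.array([1.0, 2.5, 1.0, 10 * 0.03])

def component_costs(x):
    # Prefactor times the product of the variables each component contains
    return k * np.prod(np.where(D == 1, x, 1.0), axis=1)

C_start, C_end = component_costs(x_start), component_costs(x_end)
L = log_mean(C_start, C_end)             # representative cost of each component
dlnx = np.log(x_end) - np.log(x_start)   # log change of each variable
contributions = (L[:, None] * D).sum(axis=0) * dlnx

for name, value in zip(variables, contributions):
    print(f'{name:>9}: {value:8.3f}')
print('sum of contributions:', round(contributions.sum(), 3))
print('total cost change:   ', round((C_end - C_start).sum(), 3))
```

Because each component's log change is the sum of its variables' log changes, the contributions add up to the total change without a residual.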