# Convex Nonparametric Least Squares (`CNLS`)

   + Author : Sheng Dai (sheng.dai@aalto.fi)
   + Date : April 26, 2020

References:

[1] Kuosmanen, T., Johnson, A. and Saastamoinen, A. (2015). Stochastic Nonparametric Approach to Efficiency Analysis: A Unified Framework, in Zhu, J. (ed.) Data Envelopment Analysis. Springer, pp. 191–244.

[2] Johnson, A. L. and Kuosmanen, T. (2015). An Introduction to CNLS and StoNED Methods for Efficiency Analysis: Economic Insights and Computational Aspects, in Ray, S. C., Kumbhakar, S. C., and Dua, P. (eds) Benchmarking for Performance Evaluation: A Production Frontier Approach. Springer, pp. 117–186.

## Estimating cost function

Hildreth (1954) was the first to consider nonparametric regression subject to monotonicity and concavity constraints in the case of a single input variable $x$. Kuosmanen (2008) extended Hildreth’s approach to the multivariate setting with a vector-valued $\bf{x}$ and coined the term convex nonparametric least squares (`CNLS`) for this method. When estimating a cost function, `CNLS` builds upon the assumption that the true but unknown cost function $f$ belongs to the set of continuous, monotonic increasing and globally convex functions. These shape restrictions mirror the monotonicity and concavity axioms that standard DEA imposes on a production function.

+ The multivariate `CNLS` formulation is defined as:

\begin{align*}
& \underset{\alpha, \beta, \varepsilon} {min} \sum_{i=1}^n\varepsilon_i^2 \\
& \text{s.t.} \\
&  y_i = \alpha_i + \beta_i^{'}X_i + \varepsilon_i \quad \forall i \\
&  \alpha_i + \beta_i^{'}X_i \ge \alpha_j + \beta_j^{'}X_i  \quad  \forall i, j\\
&  \beta_i \ge 0 \quad  \forall i \\
\end{align*}

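   where $\alpha_i$ and $\beta_i$ define the intercept and slope parameters of the tangent hyperplanes that characterize the estimated piecewise linear frontier, and $\varepsilon_i$ denotes the `CNLS` residuals. The first constraint can be interpreted as a multivariate regression equation. The second constraint imposes convexity: at each observed $X_i$, the hyperplane belonging to observation $i$ must lie on or above every other hyperplane, so the fitted function is the upper envelope of the hyperplanes. The third constraint imposes monotonicity.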
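To make the role of the second and third constraints concrete, here is a minimal sketch that checks them for a given set of hyperplanes. The function names (`hyperplane_values`, `satisfies_cost_constraints`, `frontier_value`) and the toy data are hypothetical and are not part of `pystoned`; the example uses the tangents of $f(x) = x^2$, which is convex and increasing for $x > 0$.

```python
import numpy as np


def hyperplane_values(alpha, beta, X):
    """Return V with V[i, j] = alpha[j] + beta[j]' X[i]."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    return alpha[None, :] + X @ beta.T


def satisfies_cost_constraints(alpha, beta, X, tol=1e-9):
    """Check the convexity and monotonicity constraints of the cost formulation."""
    V = hyperplane_values(alpha, beta, X)
    own = np.diag(V)  # alpha_i + beta_i' X_i
    # convexity: hyperplane i is the highest one at its own point X_i
    convex = bool(np.all(own[:, None] >= V - tol))
    # monotonicity: all slopes are nonnegative
    monotone = bool(np.all(np.asarray(beta, dtype=float) >= -tol))
    return convex and monotone


def frontier_value(alpha, beta, x):
    """Evaluate the piecewise linear frontier max_j (alpha_j + beta_j' x)."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return float(np.max(alpha + beta @ np.asarray(x, dtype=float)))


# Toy data (hypothetical): tangents of f(x) = x^2 at x = 1, 2, 3
X = np.array([[1.0], [2.0], [3.0]])
beta = 2.0 * X              # slope of the tangent at each point
alpha = -(X[:, 0] ** 2)     # intercept of the tangent at each point

ok = satisfies_cost_constraints(alpha, beta, X)
value_at_2 = frontier_value(alpha, beta, [2.0])
```

Tangents of a concave function, such as $\sqrt{x}$, fail the same check, because at each point some other hyperplane lies above the point's own hyperplane.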

+ Log-transformed CNLS formulation

    Most SFA studies use Cobb-Douglas or translog functional forms, where inefficiency and noise affect production in a multiplicative fashion. Note that the assumption of constant returns to scale (CRS) would also require a multiplicative error structure. The log-transformed `CNLS` formulation is:

\begin{align*}
& \underset{\alpha, \beta, \varepsilon} {min} \sum_{i=1}^n\varepsilon_i^2 \\
& \text{s.t.} \\
&  \text{ln}y_i = \text{ln}(\phi_i+1) + \varepsilon_i  \quad \forall i\\
& \phi_i  = \alpha_i+\beta_i^{'}X_i -1 \quad \forall i \\
&  \alpha_i + \beta_i^{'}X_i \ge \alpha_j + \beta_j^{'}X_i  \quad  \forall i, j\\
&  \beta_i \ge 0 \quad  \forall i \\
\end{align*}
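Substituting the definition of $\phi_i$ into the first constraint makes the multiplicative error structure explicit. The lines below are a direct algebraic consequence of the first two constraints above and use the same symbols:

\begin{align*}
& \text{ln}y_i = \text{ln}(\alpha_i + \beta_i^{'}X_i) + \varepsilon_i \quad \forall i \\
& \Longleftrightarrow \quad y_i = (\alpha_i + \beta_i^{'}X_i) \cdot \text{exp}(\varepsilon_i) \quad \forall i \\
\end{align*}

The residual therefore scales the fitted value $\alpha_i + \beta_i^{'}X_i$ instead of shifting it, and the least squares objective is evaluated on the log scale.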

## Example

In [1]:
import pandas as pd
import numpy as np

In [2]:
# import the package pystoned
from pystoned import CNLS

In [3]:
# import data
url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/data.csv'
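# on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv(url, on_bad_lines='skip')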

# dependent variable y: total expenditure (TOTEX)
y = df['TOTEX']

# explanatory variables x: Energy, Length and Customers, stacked column-wise into an (n x 3) matrix
x = np.asmatrix(df[['Energy', 'Length', 'Customers']].to_numpy())

In [4]:
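# define and solve the CNLS model

cet = "mult"  # multiplicative error term (log-transformed formulation)
fun = "cost"  # estimate a cost function
rts = "crs"   # constant returns to scale

model = CNLS.cnls(y, x, cet, fun, rts)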

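# solve on the remote NEOS server (needs internet access; recent Pyomo versions
# also expect a contact address in the NEOS_EMAIL environment variable)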
from pyomo.environ import SolverManagerFactory
solver_manager = SolverManagerFactory('neos')
results = solver_manager.solve(model, opt='knitro', tee=True)

In [5]:
# display the estimates (beta and residual)
model.b.display()
model.e.display()

b : beta
    Size=267, Index=b_index
    Key     : Lower : Value                  : Upper : Fixed : Stale : Domain
     (0, 0) :   0.0 :        8.3344157645552 :  None : False : False :  Reals
     (0, 1) :   0.0 :     0.9986512579899623 :  None : False : False :  Reals
     (0, 2) :   0.0 :   0.010791032486150796 :  None : False : False :  Reals
     (1, 0) :   0.0 :  6.088020374594125e-08 :  None : False : False :  Reals
     (1, 1) :   0.0 :     0.9426507526204403 :  None : False : False :  Reals
     (1, 2) :   0.0 :    0.11508362781827979 :  None : False : False :  Reals
     (2, 0) :   0.0 :      5.014162092001479 :  None : False : False :  Reals
     (2, 1) :   0.0 :     0.9639507564790553 :  None : False : False :  Reals
     (2, 2) :   0.0 :   0.057820335789778306 :  None : False : False :  Reals
     (3, 0) :   0.0 :      6.928933032375247 :  None : False : False :  Reals
     (3, 1) :   0.0 :     1.1081384440873212 :  None : False : False :  Reals
     (3, 2) :   0.0 :   0.0

In [6]:
# retrieve the residuals
val = list(model.e[:].value)
eps = np.asarray(val)
eps

array([ 0.03591705,  0.02603548,  0.20219662, -0.00451797,  0.17430052,
       -0.11912249, -0.01130134,  0.1192172 ,  0.37369844, -0.04018423,
        0.02303955, -0.07779539,  0.24790125,  0.25962007, -0.02513006,
       -0.14458385,  0.19495398,  0.16927398, -0.01384278, -0.09871993,
       -0.12696537, -0.26051316,  0.18701396, -0.09403671,  0.12152759,
        0.06493156,  0.24369437, -0.19088083,  0.10527361,  0.0277184 ,
       -0.00155868, -0.43113541, -0.0450978 ,  0.21595499, -0.00401978,
        0.15996073, -0.21593312, -0.15266743, -0.115049  , -0.03704671,
       -0.04825153,  0.04513426,  0.06501279, -0.06184066, -0.08923651,
       -0.13083339,  0.02732435,  0.08403226, -0.13731314, -0.13258987,
        0.19606224, -0.02272717,  0.20040745,  0.1998938 ,  0.04287153,
       -0.21797092, -0.22973008, -0.08861547, -0.19033982,  0.02023355,
       -0.21417276, -0.08037898, -0.03575945,  0.01552399,  0.36593916,
       -0.10950723,  0.08123884, -0.04885468,  0.08217855, -0.23
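Because the model was defined with `cet = "mult"`, the residuals are on the log scale, assuming they follow the log-transformed formulation given earlier. Under that assumption the fitted cost can be recovered from the observed cost and the residual as $y_i \cdot \text{exp}(-\varepsilon_i)$. The sketch below illustrates this with made-up numbers; `fitted_cost`, `y_demo` and `eps_demo` are hypothetical names and values, not taken from the data set.

```python
import numpy as np


def fitted_cost(y, eps):
    """Recover fitted values from ln(y) = ln(fitted) + eps."""
    y = np.asarray(y, dtype=float)
    eps = np.asarray(eps, dtype=float)
    return y * np.exp(-eps)


# Hypothetical observed costs and log-scale residuals
y_demo = np.array([100.0, 250.0])
eps_demo = np.array([0.0, np.log(1.25)])

fitted = fitted_cost(y_demo, eps_demo)

# The least squares objective is the sum of squared log-scale residuals
sse = float(np.sum(eps_demo ** 2))
```

A positive residual means the observed cost lies above the fitted cost; a zero residual means the two coincide.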

In [7]:
# retrieve the beta
ind = list(model.b)
val = list(model.b[:, :].value)
beta = np.asarray([i + tuple([j]) for i, j in zip(ind, val)])

beta = pd.DataFrame(beta, columns=['Name', 'Key', 'Value'])
beta = beta.pivot(index='Name', columns='Key', values='Value')
beta.columns = ['b1', 'b2', 'b3']
beta

| Name | b1 | b2 | b3 |
|---|---|---|---|
| 0.0 | 8.334416e+00 | 0.998651 | 1.079103e-02 |
| 1.0 | 6.088020e-08 | 0.942651 | 1.150836e-01 |
| 2.0 | 5.014162e+00 | 0.963951 | 5.782034e-02 |
| 3.0 | 6.928933e+00 | 1.108138 | 7.154119e-03 |
| 4.0 | 6.887182e+00 | 1.139710 | 6.203649e-12 |
| ... | ... | ... | ... |
| 84.0 | 8.334416e+00 | 0.998651 | 1.079103e-02 |
| 85.0 | 8.334414e+00 | 0.998651 | 1.079108e-02 |
| 86.0 | 8.334415e+00 | 0.998651 | 1.079104e-02 |
| 87.0 | 4.714679e+00 | 1.033776 | 5.042171e-02 |
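The reshaping in the cell above turns coefficients indexed by (observation, variable) pairs into one row per observation. As a minimal sketch of the same idea without the pivot step, the helper below fills a NumPy array from a plain dictionary of index pairs. The helper name `coefficients_to_matrix` is hypothetical, and the dictionary is typed in by hand from the first two observations of the `model.b.display()` output rather than read from the model.

```python
import numpy as np


def coefficients_to_matrix(values):
    """Arrange {(i, j): value} entries into an array with one row per observation i."""
    n_rows = 1 + max(i for i, _ in values)
    n_cols = 1 + max(j for _, j in values)
    out = np.full((n_rows, n_cols), np.nan)  # NaN marks any missing (i, j) pair
    for (i, j), v in values.items():
        out[i, j] = v
    return out


# First two observations copied from the displayed estimates
demo = {
    (0, 0): 8.3344157645552,
    (0, 1): 0.9986512579899623,
    (0, 2): 0.010791032486150796,
    (1, 0): 6.088020374594125e-08,
    (1, 1): 0.9426507526204403,
    (1, 2): 0.11508362781827979,
}

B = coefficients_to_matrix(demo)
```

Row `i` of `B` then holds the slope vector $\beta_i$ of the hyperplane assigned to observation $i$.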
