# Convex Nonparametric Least Squares (`CNLS`)

   + Author : Sheng Dai (sheng.dai@aalto.fi)
   + Date : April 26, 2020

References:

[1] Kuosmanen, T., Johnson, A. and Saastamoinen, A. (2015). Stochastic Nonparametric Approach to Efficiency Analysis: A unified Framework, in Zhu, J. (ed.) Data Envelopment Analysis. Springer, pp. 191–244.

[2] Johnson, A. L. and Kuosmanen, T. (2015). An Introduction to CNLS and StoNED Methods for Efficiency Analysis: Economic Insights and Computational Aspects, in Ray, S. C., Kumbhakar, S. C., and Dua, P. (eds) Benchmarking for Performance Evaluation: A Production Frontier Approach. Springer, pp. 117–186.

## Estimating production function

Hildreth (1954) was the first to consider nonparametric regression subject to monotonicity and concavity constraints in the case of a single input variable $x$. Kuosmanen (2008) extended Hildreth’s approach to the multivariate setting with a vector-valued $\mathbf{x}$ and coined the term convex nonparametric least squares (`CNLS`) for this method. `CNLS` assumes that the true but unknown production function $f$ belongs to the set of continuous, monotonically increasing, and globally concave functions, thereby imposing exactly the same production axioms as standard DEA.

The multivariate `CNLS` formulation is defined as:

\begin{align*}
& \underset{\alpha, \beta, \varepsilon}{\min} \sum_{i=1}^n \varepsilon_i^2 \\
& \text{s.t.} \\
&  y_i = \alpha_i + \beta_i^{'}X_i + \varepsilon_i \quad \forall i \\
&  \alpha_i + \beta_i^{'}X_i \le \alpha_j + \beta_j^{'}X_i  \quad  \forall i, j\\
&  \beta_i \ge 0 \quad  \forall i \\
\end{align*}

where $\alpha_i$ and $\beta_i$ are the intercept and slope parameters of the tangent hyperplanes that characterize the estimated piecewise linear frontier, and $\varepsilon_i$ denotes the CNLS residual of observation $i$. The first constraint can be interpreted as a multivariate regression equation. The second constraint, a system of Afriat inequalities, imposes concavity. The third constraint imposes monotonicity.
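At each observed $X_i$, the Afriat inequalities make the $i$-th hyperplane the lowest of all $n$ hyperplanes. The fitted values therefore coincide with the lower envelope $\hat f(\mathbf{x}) = \min_{i} \{\alpha_i + \beta_i^{'}\mathbf{x}\}$, which also extends the estimator to unobserved input vectors. The NumPy sketch below evaluates that envelope. `cnls_frontier` is a hypothetical helper written for illustration, not a pystoned function, and the two toy hyperplanes are made up.

```python
import numpy as np


def cnls_frontier(alpha, beta, x_new):
    """Evaluate a concave piecewise linear frontier (hypothetical helper).

    alpha : (n,) intercepts of the n estimated hyperplanes
    beta  : (n, m) slopes, one row per hyperplane
    x_new : (k, m) input vectors at which to evaluate the frontier
    Returns a (k,) array holding min_i (alpha_i + beta_i' x) for each x.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    x_new = np.atleast_2d(np.asarray(x_new, dtype=float))
    # (k, n) matrix: value of every hyperplane at every evaluation point
    planes = alpha[None, :] + x_new @ beta.T
    # the lower envelope of the hyperplanes is the concave frontier
    return planes.min(axis=1)


# toy example with one input and two hyperplanes: 1 + 2x and 3 + 1x
alpha_toy = np.array([1.0, 3.0])
beta_toy = np.array([[2.0], [1.0]])
print(cnls_frontier(alpha_toy, beta_toy, [[0.0], [2.0], [4.0]]))  # [1. 5. 7.]
```

With the estimates retrieved later in this tutorial, the same call would take `alpha`, the `b1`/`b2` columns of `beta`, and any array of (OPEX, CAPEX) pairs.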

## Example

```python
import pandas as pd
import numpy as np
```

```python
# import the package pystoned
from pystoned import CNLS
```

```python
# import Finnish electricity distribution firms data
url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/firms.csv'
# skip malformed rows; `on_bad_lines` replaces `error_bad_lines`, which was removed in pandas 2.0
df = pd.read_csv(url, on_bad_lines='skip')
df.head(5)
```

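| Unnamed: 0 | OPEX | CAPEX | TOTEX | Energy | Length | Customers | PerUndGr |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 681 | 729 | 1612 | 75 | 878 | 4933 | 0.11 |
| 1 | 559 | 673 | 1659 | 62 | 964 | 6149 | 0.21 |
| 2 | 836 | 851 | 1708 | 78 | 676 | 6098 | 0.75 |
| 3 | 7559 | 8384 | 18918 | 683 | 12522 | 55226 | 0.13 |
| 4 | 424 | 562 | 1167 | 27 | 697 | 1670 | 0.03 |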


```python
# output
y = df['Energy']

# inputs: one column per input, one row per firm
x1 = df['OPEX']
x1 = np.asmatrix(x1).T
x2 = df['CAPEX']
x2 = np.asmatrix(x2).T
x = np.concatenate((x1, x2), axis=1)
```
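NumPy's documentation no longer recommends the `np.matrix` class. The same $n \times 2$ input layout can be built as a regular 2-D array with `np.column_stack`. The sketch below uses the first three firms from the table above as toy data to show that the two constructions hold the same numbers. It is an assumption, not something this tutorial confirms, that the pystoned version used here accepts a plain `ndarray` in place of the matrix.

```python
import numpy as np

# toy stand-ins for df['OPEX'] and df['CAPEX'] (first three firms above)
opex = np.array([681, 559, 836])
capex = np.array([729, 673, 851])

# construction used in this tutorial: two column matrices joined side by side
x_matrix = np.concatenate((np.asmatrix(opex).T, np.asmatrix(capex).T), axis=1)

# equivalent plain 2-D array: one row per firm, one column per input
x_array = np.column_stack((opex, capex))

print(x_array.shape)                                   # (3, 2)
print(np.array_equal(np.asarray(x_matrix), x_array))   # True
```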

```python
# define and solve the CNLS model
cet = "addi"  # additive composite error term
fun = "prod"  # production function
rts = "vrs"   # variable returns to scale

model = CNLS.cnls(y, x, cet, fun, rts)

# solve with a locally installed solver (MOSEK, called through Pyomo)
from pyomo.opt import SolverFactory
opt = SolverFactory("mosek")
results = opt.solve(model, tee=True)
```

```
Problem
  Name                   :
  Objective sense        : min
  Type                   : QO (quadratic optimization problem)
  Constraints            : 7921
  Cones                  : 0
  Scalar variables       : 445
  Matrix variables       : 0
  Integer variables      : 0

Optimizer started.
Quadratic to conic reformulation started.
Quadratic to conic reformulation terminated. Time: 0.02
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 89
Eliminator terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 2                 time                   : 0.00
Lin. dep.  - tries                  : 1                 time                   : 0.00
Lin. dep.  - number                 : 0
```

```python
# display the estimates (alpha, beta, and residual)
model.a.display()
model.b.display()
model.e.display()
```

```
a : alpha
    Size=89, Index=i
    Key : Lower : Value               : Upper : Fixed : Stale : Domain
      0 :  None : -22.935741971867564 :  None : False : False :  Reals
      1 :  None :  -22.98654289510969 :  None : False : False :  Reals
      2 :  None :   -22.8637959714704 :  None : False : False :  Reals
      3 :  None :   33.46107704292502 :  None : False : False :  Reals
      4 :  None : -22.938166263476212 :  None : False : False :  Reals
      5 :  None :  -17.75390495681091 :  None : False : False :  Reals
      6 :  None :  -22.92070101309105 :  None : False : False :  Reals
      7 :  None : -17.794878482671272 :  None : False : False :  Reals
      8 :  None :  -22.90315616956426 :  None : False : False :  Reals
      9 :  None :  -17.87076425504981 :  None : False : False :  Reals
     10 :  None :   89.09009200831117 :  None : False : False :  Reals
     11 :  None :   90.88462434112832 :  None : False : False :  Reals
     12 :  None : -22.835086214732662 :  None
...
      1 :  None :  1.4139940976609893 :  None : False : False :  Reals
      2 :  None : -22.223707778234058 :  None : False : False :  Reals
      3 :  None : -350.91104218460237 :  None : False : False :  Reals
      4 :  None :  -13.70003770933792 :  None : False : False :  Reals
      5 :  None :  101.01492221871933 :  None : False : False :  Reals
      6 :  None : -28.872351766285107 :  None : False : False :  Reals
      7 :  None : -14.039666476541953 :  None : False : False :  Reals
      8 :  None : -0.8474305859600975 :  None : False : False :  Reals
      9 :  None :  56.894372464358895 :  None : False : False :  Reals
     10 :  None :   285.5068453191418 :  None : False : False :  Reals
     11 :  None :   679.3838954793009 :  None : False : False :  Reals
     12 :  None : -20.229963004068964 :  None : False : False :  Reals
     13 :  None :  -70.00784191554195 :  None : False : False :  Reals
     14 :  None :  10.506539573501414 :  None : False : False :  Reals
...
```

```python
# retrieve the alpha estimates as a NumPy array
val = list(model.a[:].value)
alpha = np.asarray(val)
alpha
```

```
array([-22.93574197, -22.9865429 , -22.86379597,  33.46107704,
       -22.93816626, -17.75390496, -22.92070101, -17.79487848,
       -22.90315617, -17.87076426,  89.09009201,  90.88462434,
       -22.83508621, -17.80922168, 102.38639157, -17.67072607,
       -22.96478157, -22.91711305, -22.90710605, -17.6820098 ,
       -17.28941355, -21.14677138, -25.26674503, -17.63199337,
       -22.95755108, -22.84459768, -22.84621257,  33.4613221 ,
       -22.96383052, -17.80276734, -22.90794557, 104.33191265,
       -17.7114322 , -17.77622109, -19.65343202, -17.66687285,
       -17.70397656, -22.96480511, -22.99005476, -17.6281484 ,
       -17.66150147, -22.99221792,  33.46178545, -22.92142768,
       -22.89371926, -22.94276172, -17.6369817 , -22.99131548,
       -17.66649268,  33.45230894, -22.94213913, -17.70744385,
       -22.87512746, -17.66156705, -22.91239992,  22.31227387,
       -14.38711198, -22.8824601 , -22.0951086 , -17.76763495,
       -24.54292696,  33.46143999, -17.62882348, -13.85
```

```python
# retrieve the residuals as a NumPy array
val = list(model.e[:].value)
eps = np.asarray(val)
eps
```

```
array([  -2.80240049,    1.4139941 ,  -22.22370778, -350.91104218,
        -13.70003771,  101.01492222,  -28.87235177,  -14.03966648,
         -0.84743059,   56.89437246,  285.50684532,  679.38389548,
        -20.229963  ,  -70.00784192,   10.50653957,   74.59739943,
         -6.53896952,  -30.07641319,  -40.134715  ,  -27.77778312,
         48.75951015,   87.00804975,   22.48052139,   23.95079962,
         -2.14170243,  -22.83655374,  -37.86099929, -351.07100145,
         66.85755272,  -21.70272781,   -8.95127728,  216.0060913 ,
         37.59744429,  -31.81138603,  -23.78465848, -201.34117915,
        -69.29026659,    6.26873673,    3.75574364,  -74.60736509,
         -1.97475364,  -11.97099005, -236.74633737,    6.57459175,
         11.65690679,  -20.00411823,  -67.2145622 ,   -5.23924544,
        -55.94688679,  265.51405577,    1.92405574,  -17.65073349,
          4.65633465,   38.85438249,   -2.90330145,  349.52822197,
        163.99055389,   35.00139386,   28.39297571,  -99.13207
```
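By the first constraint, each firm's fitted value is $\hat y_i = \alpha_i + \beta_i^{'}X_i = y_i - \varepsilon_i$, so fitted values follow directly from the residuals. With `rts = "vrs"` the intercepts $\alpha_i$ are unrestricted (assuming this is how pystoned implements `vrs`). Adding the same constant to every $\alpha_i$ then leaves all constraints satisfied, and the first-order condition for that shift forces $\sum_i \varepsilon_i = 0$ at the optimum. `eps.sum()` should therefore be close to zero up to solver tolerance. The minimal sketch below uses made-up numbers; `fitted_values` is a hypothetical helper, not a pystoned function.

```python
import numpy as np


def fitted_values(y, eps):
    """Fitted CNLS values implied by the regression constraint (hypothetical helper)."""
    # y_i = alpha_i + beta_i' X_i + eps_i  =>  fitted_i = y_i - eps_i
    return np.asarray(y, dtype=float) - np.asarray(eps, dtype=float)


# made-up numbers, not the tutorial's estimates
y_toy = np.array([10.0, 12.0, 15.0])
eps_toy = np.array([1.0, -2.0, 1.0])

print(fitted_values(y_toy, eps_toy))   # [ 9. 14. 14.]
# with unrestricted intercepts the optimal residuals should sum to ~0
print(np.isclose(eps_toy.sum(), 0.0))  # True
```

With this tutorial's objects, the same checks would be `fitted_values(y, eps)` and `eps.sum()`.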

```python
# retrieve the beta estimates: one row per firm, one column per input
ind = list(model.b)              # index tuples (i, j)
val = list(model.b[:, :].value)  # values in the same index order
beta = np.asarray([i + tuple([j]) for i, j in zip(ind, val)])

beta = pd.DataFrame(beta, columns=['Name', 'Key', 'Value'])
beta = beta.pivot(index='Name', columns='Key', values='Value')
beta.columns = ['b1', 'b2']
beta
```

| Name | b1 | b2 |
|---:|---:|---:|
| 0.0 | 0.135858 | 1.127412e-02 |
| 1.0 | 0.136375 | 1.090467e-02 |
| 2.0 | 0.135667 | 1.136316e-02 |
| 3.0 | 0.132352 | 2.103738e-08 |
| 4.0 | 0.147554 | 1.913021e-03 |
| ... | ... | ... |
| 84.0 | 0.134069 | 8.090706e-03 |
| 85.0 | 0.135626 | 1.175963e-02 |
| 86.0 | 0.135828 | 1.145338e-02 |
| 87.0 | 0.136025 | 1.123857e-02 |
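After retrieving `alpha`, `beta` and `eps`, it can be worth confirming that they satisfy the three constraints of the CNLS formulation up to solver tolerance. `check_cnls_constraints` below is a hypothetical NumPy helper, not part of pystoned. Its default tolerance `tol=1e-6` is an assumption; MOSEK's feasibility tolerances may call for a looser value.

```python
import numpy as np


def check_cnls_constraints(y, x, alpha, beta, eps, tol=1e-6):
    """Check the three CNLS constraints for given estimates (hypothetical helper).

    y : (n,), x : (n, m), alpha : (n,), beta : (n, m), eps : (n,)
    Returns a dict with one boolean per constraint.
    """
    y, x, alpha, beta, eps = (np.asarray(a, dtype=float) for a in (y, x, alpha, beta, eps))
    # own[i] = alpha_i + beta_i' x_i  (fitted value of observation i)
    own = alpha + np.einsum("ij,ij->i", beta, x)
    # cross[i, j] = alpha_j + beta_j' x_i  (hyperplane j evaluated at x_i)
    cross = alpha[None, :] + x @ beta.T
    return {
        # y_i = alpha_i + beta_i' x_i + eps_i
        "regression": bool(np.allclose(y, own + eps, atol=tol)),
        # Afriat inequalities: alpha_i + beta_i' x_i <= alpha_j + beta_j' x_i
        "concavity": bool(np.all(own[:, None] <= cross + tol)),
        # beta_i >= 0
        "monotonicity": bool(np.all(beta >= -tol)),
    }


# made-up estimates: hyperplanes 1 + 2x and 3 + x, observed at x = 1 and x = 3
print(check_cnls_constraints([3.5, 5.5], [[1.0], [3.0]], [1.0, 3.0],
                             [[2.0], [1.0]], [0.5, -0.5]))
# {'regression': True, 'concavity': True, 'monotonicity': True}
```

For this tutorial, a call might look like `check_cnls_constraints(y, np.asarray(x), alpha, beta[['b1', 'b2']].to_numpy(), eps, tol=1e-4)`. This assumes the rows of `beta` follow the same firm order as `alpha` and `eps`.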
