Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
branch: master
Fetching contributors…

Octocat-spinner-32-eaf2f5

Cannot retrieve contributors at this time

file 270 lines (195 sloc) 12.084 kb
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270
> module Main (
> main
> ) where
>
> import Data.LinearProgram
> import Control.Monad.LPMonad
> import Data.LinearProgram.GLPK
> import Control.Monad.State
> import qualified Data.Map as Map

One of the more important tools in my computational toolbox is linear programming. It's a great way to solve a
lot of otherwise difficult problems in a straightforward, nearly black-box manner. I've discovered that a lot
of programmers aren't too familiar with it, so I'm writing this tutorial that explains how to use it in
practical purposes.

The problem
-----------

Specifically, I'll discuss AWS capacity plannning. On AWS, you have two types of servers.

The first is an on-demand instance. You can spin these servers up or down whenever you want.
An extra large server costs you $0.64/hour if you rent it on-demand.

The second type of server is a reserved instance. To use a reserved instance, you must
pay a large fixed cost. After that, to use the server you pay a much lower hourly rate.
The specific costs:

1. Heavy utilization - $1560/year $0.128/hour
2. Medium utilization - $1280/year, $0.192/hour
2. Light utilization - $552/year, $0.312/hour

I'll demonstrate shortly (with a few lines of Haskell) how to use linear programming to
provide sufficient capacity while minimizing your costs.

Introduction
------------

Linear programming is the science of solving a deceptively simple math problem:

    find argmax f(x)
    subject to constraint: M*x >= v

Here, `x` is a finite dimensional vector, in computational terms, an array with
N elements in it.

`f(x)` is a linear function of type: `R^N -> R` (vectors of `N` real numbers to a
single real number) with the property that `f(a*x+y)=a*f(x)+f(y)`. Here, `a` is
a scalar, and `y` is also a vector. `f(x)` is typically called the *objective function*.

The constraint takes the form `M*x >= v`, where `M` is a matrix, and `v` is a vector.
Inequality is interpreted pointwise. A simple example:

    find argmax 1.5*x+y
    subject to:
        x >= 0
        y >= 0
        -1*x + -2*y >= 4

The constraints demand that the solution `(x, y)` lives in a bounded set, specifically
the triangle with vertices `(0,0)`, `(4,0)` and `(0,2)`. And the goal is to make `1.5*x+y`
as big as possible.

So where can the solution lie?

The first fact to observe is that it can't lie on the *interior* of the triangle. It must
be on one of the edges. To see why this is so, imagine the maximum were at some point`(x0, y0)`
inside the triangle but not on the edge. In that case, I can find a very small value `d`
for which `(x0+d, y0)` is still inside the triangle - specifically, I just pick `d` equal
to half the distance to the edge. If I compute `f(x0+d,y) = 1.5*x0+1.5*d+y0`, then this value
is bigger than `f(x0,y)=1.5*x0+y0`. This means that `(x0, y0)` was not really the maximum.

I can make similar arguments about the edges. Suppose `p` is a point
on the edge of triangle (but not the corner), and suppose `d` is a
vector pointing along the edge. By the linearity of the objective
function, `f(p+d)=f(p)+f(d)`. If `f(d) > 0`, then I can increase the
objective function by moving in the direction of `d`, otherwise I can
increase it by moving in the direction of `-d`. This means the maximum
can't be on the edge either, except in the special case that `f(d)=0`.

![diagram illustrating reasoning](polygon_diagram.png)

So since the maximum of the objective function doesn't live inside the
triangle or on the side, we find it must like at one of the corners of
the triangle.

This observation gives rise to the [simplex
algorithm](https://en.wikipedia.org/wiki/Simplex_algorithm). The
simplex algorithm, as oversimplified by myself, consists of the
following:

1. Start at at a corner
2. Compute the set of edges adjacent to this corner for which moving
away from the corner along that edge increases `f(x)`.
3. Pick one of these edges, and follow it until you reach another corner.
4. Repeat until you can't find any directions to move which increase `f(x)`.
5. Declare victory.

In practice, you need to be a lot more careful than this. But in
principle, this is the core of the simplex algorithm.



How it's used
-------------

To begin, make sure you have the
[GLPK](http://www.gnu.org/software/glpk/) installed. On an ubuntu
system with [cabal](http://www.haskell.org/cabal/) installed, run:

    $ sudo aptitude install glpk glpk-dev
    $ cabal install glpk-hs

Now let's represent our problem in Haskell.

For simplicity, suppose we have 3 typical 8-hour load periods -
midnight-8AM, 8AM-4PM, and 4PM-midnight. During these periods, we
expect various load patterns. For example, we might expect:

> loadPattern = Map.fromList [ ("night", 5), ("morning", 25), ("evening", 100) ]

As I mentioned in the intro, we have the option of using Amazon
on-demand instances or various types of reserved instances. Amazon's
[pricing](http://aws.amazon.com/ec2/pricing/) is described here, so
all we need to do is translate this to code.

First, on-demand instances:

> onDemandHourlyCost = 0.64

Then, reserved instances:

> reservationFixedCosts = [("light", 552.0), ("medium", 1280.0), ("heavy", 1560.0)]
> reservationVariableCosts = [("light", 0.312), ("medium", 0.192), ("heavy", 0.128)]
> reservationTypes = map (\(x,y) -> x) reservationFixedCosts

Now let's write our *objective function*, given a load
pattern. Although we are trying to minimize rather than maximize it,
everything I said in the introduction applies.

We will do this as follows. We will first construct a list of tuples,
of the form `[(3, "x"), (2, "y")]`. We then pass this to the function
`linCombination` from
[glpk-hs](http://hackage.haskell.org/package/glpk-hs), which
translates this into a form glpk can read. I.e., `linCombination [(3,
"x"), (2, "y")]` is interpreted by glpk to mean 3*x+2*y.

> dailyCost :: Map.Map [Char] Double -> LinFunc [Char] Double
> dailyCost loadPattern = let
> periods = Map.keys loadPattern
> reservationFixedCostsObj = [ ( cost / 365.0, "reservation_" ++ kind)
> | (kind, cost) <- reservationFixedCosts ]
> -- cost of reserving an instance
> reservationVariableCostsObj = [ (cost*8, "reserved_" ++ kind ++ "_" ++ period)
> | (kind, cost) <- reservationVariableCosts,
> period <- periods ]
> -- cost of *running* the reserved instance
> onDemandVariableCostsObj = [ (onDemandHourlyCost*8, "onDemand_" ++ p) |
> p <- periods ] -- cost of running on-demand instances
> in
> linCombination (reservationFixedCostsObj ++
> reservationVariableCostsObj ++
> onDemandVariableCostsObj)

Now we need to actually represent the problem to glpk. We do this using our constraints.

The first constraint is that we actually need to meet our capacity requirements:

> capacityConstraints loadPattern = [ (linCombination ([ (1.0, "onDemand_" ++ period) ] ++
> [(1.0, "reserved_" ++ k ++ "_" ++ period)
> | k <- reservationTypes]), load)
> | (period, load) <- (Map.assocs loadPattern)]

This set of constraints means that in any period, the number of
on-demand instances running plus the number of reserved instances
running must be high enough to meet capacity in that period. I.e.:

    (# on-demand) + (# light reserved) + (# medium reserved) + (# heavy reserved) >= desired capacity

The second constraint tells GLPK that in order to *use* a reserved
instance, we need to reserve it first.

> reservationConstraints loadPattern = [ (linCombination [ (1.0, "reservation_" ++ k), (-1.0, "reserved_" ++ k ++ "_" ++ p) ], 0.0) |
> p <- (Map.keys loadPattern),
> k <- reservationTypes ]

In traditional math terminology, this is just the constraint that:

    (# heavy instances reserved) - (# heavy reserved instances running at night) >= 0

Equivalently:

    (# heavy instances reserved) >= (# heavy reserved instances running at night)

(And similarly for other periods, and for medium and light instances.)


The final set of constraints is that all these quantities must be
positive. We just create a list of variables, and then use the glpk
function `varGeq` on each of these variables:

> allVariables loadPattern = ["onDemand_" ++ p | p <- periods] ++
> ["reserved_" ++ k ++ "_" ++ p | p <- periods, k <- reservationTypes] ++
> ["reservation_" ++ k | k <- reservationTypes]
> where periods = Map.keys loadPattern

Now we actually call glpk:

> lp :: Map.Map [Char] Double -> LP String Double
> lp loadPattern = execLPM $ do
> setDirection Min

Tell glpk we want to minimize, rather than maximize.

> setObjective (dailyCost loadPattern)

Tell glpk what we want to minimize.

> mapM (\(func, val) -> func `geqTo` val) (reservationConstraints loadPattern)
> mapM (\(func, val) -> func `geqTo` val) (capacityConstraints loadPattern)

Tell glpk about our first and second set of constraints.

> mapM (\var -> varGeq var 0.0) (allVariables loadPattern)

Tell glpk that all variables need to be positive.

> mapM (\var -> setVarKind var IntVar) (allVariables loadPattern)

Lastly, we'll tell glpk to make each variable an Integer (since we
can't reserve fractional instances).


Finally, a little function to print the results:

> printLPSolution loadPattern = do
> x <- glpSolveVars mipDefaults (lp loadPattern)
> putStrLn (show (allVariables loadPattern))
> case x of (Success, Just (obj, vars)) -> do
> putStrLn "Success!"
> putStrLn ("Cost: " ++ (show obj))
> putStrLn ("Variables: " ++ (show vars))
> (failure, result) -> putStrLn ("Failure: " ++ (show failure))

And we run.

> main :: IO ()
> main = do
> printLPSolution (Map.fromList [ ("night", 12.2), ("morning", 25.1), ("evening", 53.5) ])

This applies to the particular case that we need 12 servers at night, 25 in the morning, and 53 in the evening. The result:

    Success!
    Cost: 289.9164931506849
    Variables: fromList [("onDemand_evening",0.0),("onDemand_morning",0.0),("onDemand_night",0.0),("reservation_heavy",26.0),("reservation_light",28.0),("reservation_medium",0.0),("reserved_heavy_evening",26.0),("reserved_heavy_morning",26.0),("reserved_heavy_night",13.0),("reserved_light_evening",28.0),("reserved_light_morning",0.0),("reserved_light_night",0.0),("reserved_medium_evening",0.0),("reserved_medium_morning",0.0),("reserved_medium_night",0.0)]

So now we know what we need to do. We make 26 heavy reservations and
28 light ones. At night we run 13 heavy reserved instances. In the
morning we run 26 heavy reserved instances. In the evening, we run 26
heavy reserved instances and 28 light reserved instances.

As this post shows, linear programming is fairly straightforward to
use. This post only contains 30 lines of code, ignoring the
`printLPSolution` function. And once you learn it, it's a great tool
to have in your toolbox. A surprisingly large number of problems can
be solved using linear programming, and calling glpk is virtually
always easier than writing your own code to solve the problem.

P.S. This blog post is generated from [literate Haskell](http://www.haskell.org/haskellwiki/Literate_programming) -
it's a great way to write code. If you go read the [source code](https://github.com/stucchio/Cloud-capacity-planning/blob/master/cloud-planning.lhs)
for this program, you'll discover it's the blog post itself.
Something went wrong with that request. Please try again.