Permalink
Switch branches/tags
Nothing to show
Find file
Fetching contributors…
Cannot retrieve contributors at this time
353 lines (321 sloc) 11.7 KB
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Building Reliable Websites</title>
<meta name="description" content="Building Reliable Websites that Operate as Designed">
<meta name="author" content="Stephen Kuenzli">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<link href='http://fonts.googleapis.com/css?family=Lato:400,700,400italic,700italic' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="css/reveal.css">
<link rel="stylesheet" href="css/theme/serif.css" id="theme">
<!-- For syntax highlighting -->
<link rel="stylesheet" href="lib/css/zenburn.css">
<!-- If the query includes 'print-pdf', use the PDF print sheet -->
<script>
document.write( '<link rel="stylesheet" href="css/print/' + ( window.location.search.match( /print-pdf/gi ) ? 'pdf' : 'paper' ) + '.css" type="text/css" media="print">' );
</script>
<!--[if lt IE 9]>
<script src="lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Used to fade in a background when a specific slide state is reached -->
<div class="state-background"></div>
<!-- Any section element inside of this container is displayed as a slide -->
<div class="slides">
<section>
<h2>Building Reliable Websites</h2>
<br/>
<h4>Load and Performance Edition</h4>
<br/>
<i><small><a href="https://github.com/skuenzli">Stephen Kuenzli</a></small></i>
</section>
<section>
<h3>@author skuenzli</h3>
<br/>
<p>
breaking systems for fun and profit since 2000
</p>
<br/>
<br/>
<p>
<i><small><a href="https://github.com/skuenzli">https://github.com/skuenzli</a></small></i>
</p>
</section>
<section>
<h3>The Process</h3>
<ol>
<li>determine expected site load</li>
<li>validate site handles expected load</li>
<li>stay operational when load exceeds expectations</li>
<li class="fragment">profit!</li>
</ol>
<div class="fragment">
<image src="images/nice.jpg" width="375" height="375"/>
</div>
</section>
<section>
<section>
<br/>
<br/>
<h2>determine expected site load</h2>
</section>
<section>
<h2>Key Metrics</h2>
<ul>
<li>Throughput<span class="fragment">: requests per second</span></li>
<li>Performance<span class="fragment">: response time</span></li>
</ul>
</section>
<section>
<p>What percentage of your customers do you care about?</p>
<br/>
<p class="fragment">50%</p>
<p class="fragment">95%</p>
<p class="fragment">99%</p>
<br/>
<p class="fragment">?</p>
<aside class="notes">
In reality, a page requires multiple requests, so we have a conditional probability to account for.
Good designs will attempt to make services as independent and parallelized as possible.
</aside>
</section>
<section>
<h3>Define a Service Level Agreement</h3>
<br/>
<ul>
<li class="fragment">Throughput: 42 requests per second</li>
<li class="fragment">Performance: 99% of response times <= 100ms</li>
</ul>
<br/>
</section>
<section>
<h3>Don't Forget!</h3>
<br/>
<ul>
<li class="fragment">network latency and bandwidth</li>
<li class="fragment">client processing power</li>
</ul>
</section>
<section>
<h3>HOWTO: measure historical throughput</h3>
<pre><code contenteditable>
# total number of GETs to /myservice for a given day
grep -c 'GET /myservice' logs/app*/access.log.2012-11-16
# estimate peak hour for service from sample
grep 'GET /myservice' logs/app??5/access.log.2012-11-16 | \
perl -nle 'print m|/201\d:(\d\d):|' | sort -n | uniq -c
# total number of GETs to /myservice at peak hour
grep -c '2012-11-16 17:.*GET /myservice' \
logs/app*/access.log.2012-11-16
</code></pre>
</section>
<section>
<h3>HOWTO: measure response time</h3>
<pre><code contenteditable>
# processing times recorded by server in access log
grep "GET /myservice" logs/app*/access.log.2012-11-16 | \
cut -d\" -f7 | sort -n > service.access_times.2012-11-16
</code></pre>
<br/>
<p class="fragment">what about network latency and bandwidth?</p>
<br/>
<p class="fragment">does request fit in the client's resource budget? 50/95/99%?</p>
</section>
<section>
<p>all models are wrong; some models are useful</p>
<br/>
<table>
<tr>
<td width="40%">
<div class="fragment">
<p>model +/- 20%</p>
<ol>
<li>count</li>
<li>compute</li>
<li>judge</li>
</ol>
</div>
</td>
<td width="20%"/>
<td width="40%">
<div class="fragment">
<p>adjust for</p>
<ol>
<li>growth</li>
<li>seasonal loading</li>
<li>margin of error</li>
</ol>
</div>
</td>
</tr>
</table>
<br/>
<br/>
<aside class="notes">
count actual requests in server logs
compute expected request patterns from firebug/chrome waterfall charts
defer to judgement of 'experts' - use with extreme caution
</aside>
</section>
</section>
<section>
<section>
<br/>
<br/>
<h2>validate site handles expected load</h2>
</section>
<section>
<h3>validation process</h3>
<ol>
<li>select a tool</li>
<li>build a simulation</li>
<li>run simulation multiple times / periodically</li>
<li>analyze trends</li>
</ol>
</section>
<section>
<p>Gatling is an Open Source Stress Tool with:</p>
<ul>
<li>A DSL to describe scenarios</li>
<li>High performance</li>
<li>HTTP support</li>
<li>Meaningful reports</li>
<li>Executable from command-line or maven</li>
<li>A scenario recorder</li>
</ul>
<br/>
<br/>
<p>gatling-tool.org</p>
<image src="images/gatling-logo.png" width="274" height="90" />
</section>
<section>
<h3>build simulation</h3>
</section>
<section>
<h3>run simulation multiple times / periodically</h3>
<ul>
<li>gather statistically significant results</li>
<li>establish a baseline</li>
<li>verify site does [not meet] SLAs</li>
<li>detect changes over time</li>
</ul>
</section>
<section>
<h3>analyze trends</h3>
<p>detect changes in trend with control charts</p>
</section>
<section>
<p>is process changing?</p>
<img src="images/UT_SmokeTestSignIn_LoadSignInPage.rchart.jpg"/>
</section>
<section>
<p>is process changing?</p>
<img src="images/UT_SmokeTestSignIn_SignIn.rchart.jpg"/>
</section>
</section>
<section>
<section>
<br/>
<br/>
<h2>stay operational when load exceeds expectations</h2>
</section>
<section>
<iframe width="640" height="360" src="http://www.youtube.com/embed/Xa6c3OTr6yA?feature=player_detailpage#t=10s" frameborder="0" allowfullscreen></iframe>
<br/>
<br/>
<p>the needs of the many outweigh the needs of the few or the one</p>
</section>
<section>
<br/>
<p>a <i>dead</i> site is no good to <i>anyone</i></p>
<br/>
<p class="fragment">know the site's limits and stay within them</p>
</section>
<section>
<p>implement a series of circuit breakers that can be tripped to <i>reduce load</i> in a <i>managed</i> way</p>
<br/>
<ul>
<li class="fragment">manual breakers, tripped by operations staff</li>
<li class="fragment">automatic breakers, tripped by software</li>
</ul>
<aside class="notes">
This is classic triage. It is much better to shut down or limit less critical, nice-to-have features rather than letting the whole site become unavailable.
Especially when starting out, there is no shame in tripping breakers manually. Having an experienced engineer
make a decision is rarely a bad thing when managing a non-trivial distributed system.
In particular, people are good at avoiding 'flapping' situations where a service oscillates between available and unavailable.
</aside>
</section>
<section>
<p>Example services ranked by criticality</p>
<ol reversed>
<li>send marketing email<span class="fragment" style="color:blue"> off-line / off-hours</span></li>
<li>update customer's dashboard<span class="fragment" style="color:goldenrod"> breaker #1</span></li>
<li>upload images<span class="fragment" style="color:orange"> breaker #2</span></li>
<li>render images<span class="fragment" style="color:red"> breaker #3</span></li>
<li>sign-in</li>
<li>checkout</li>
<li>save customer's work</li>
</ol>
</br>
</br>
<p class="fragment">there's usually a trade-off available</p>
<aside class="notes">
Of course, the definition of critical and non-critical totally depends on the business.
However, if you try hard enough, you can always rank services in order of criticality.
</aside>
</section>
</section>
<section>
<p>Resources</p>
<ul>
<li>This presentation: https://github.com/skuenzli/building-reliable-websites</li>
<li>Concurrency Limiting Filter: https://github.com/skuenzli/simplyreliable</li>
<li>Web Operations: http://shop.oreilly.com/product/0636920000136.do</li>
<li>Circuit Breaker Pattern: http://doc.akka.io/docs/akka/2.1.0-RC1/common/circuitbreaker.html</li>
<li>Gatling: http://gatling-tool.org</li>
<li>USL</li>
</ul>
</section>
<section>
<h1>fin</h1>
<i><small><a href="https://github.com/skuenzli">https://github.com/skuenzli</a></small></i>
</section>
</div>
<!-- The navigational controls UI -->
<aside class="controls">
<a class="left" href="#">&#x25C4;</a>
<a class="right" href="#">&#x25BA;</a>
<a class="up" href="#">&#x25B2;</a>
<a class="down" href="#">&#x25BC;</a>
</aside>
<!-- Presentation progress bar -->
<div class="progress"><span></span></div>
</div>
<script src="lib/js/head.min.js"></script>
<script src="js/reveal.min.js"></script>
<script>
// Full list of configuration options available here:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: true,
progress: true,
history: true,
theme: Reveal.getQueryHash().theme || 'serif', // available themes are in /css/theme
transition: Reveal.getQueryHash().transition || 'default', // default/cube/page/concave/linear(2d)
// Optional libraries used to extend on reveal.js
dependencies: [
{ src: 'lib/js/highlight.js', async: true, callback: function() { window.hljs.initHighlightingOnLoad(); } },
{ src: 'lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: 'lib/js/showdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: 'lib/js/data-markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '/socket.io/socket.io.js', async: true, condition: function() { return window.location.host === 'localhost:1947'; } },
{ src: 'plugin/speakernotes/client.js', async: true, condition: function() { return window.location.host === 'localhost:1947'; } },
]
});
</script>
</body>
</html>