<!DOCTYPE html>
<!--[if IEMobile 7 ]><html class="no-js iem7"><![endif]-->
<!--[if lt IE 9]><html class="no-js lte-ie8"><![endif]-->
<!--[if (gt IE 8)|(gt IEMobile 7)|!(IEMobile)|!(IE)]><!--><html class="no-js" lang="en"><!--<![endif]-->
<head>
<meta charset="utf-8">
<title>Liam Kaufman</title>
<meta name="author" content="Liam Kaufman">
<meta name="description" content="UPDATE: Based on the feedback in the comments (Phineas), I’ve added an index to the comments table and updated the results. Using the right …">
<!-- http://t.co/dKP3o1e -->
<meta name="HandheldFriendly" content="True">
<meta name="MobileOptimized" content="320">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="canonical" href="http://liamkaufman.com">
<link href="/stylesheets/screen.css" media="screen, projection" rel="stylesheet" type="text/css">
<script src="/javascripts/modernizr-2.0.js"></script>
<script src="/javascripts/ender.js"></script>
<script src="/javascripts/octopress.js" type="text/javascript"></script>
<link href="http://feeds.feedburner.com/liamkaufman/uHFr" rel="alternate" title="Liam Kaufman" type="application/atom+xml">
<!--Fonts from Google's Web font directory at http://google.com/webfonts -->
<link href="http://fonts.googleapis.com/css?family=PT+Serif:regular,italic,bold,bolditalic" rel="stylesheet" type="text/css">
<link href="http://fonts.googleapis.com/css?family=PT+Sans:regular,italic,bold,bolditalic" rel="stylesheet" type="text/css">
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-1004776-5']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body >
<header role="banner"><hgroup>
<h1><a href="/">Liam Kaufman</a></h1>
<h2>Software Developer and Entrepreneur</h2>
</hgroup>
</header>
<nav role="navigation"><ul class="subscription" data-subscription="rss">
<li><a href="http://feeds.feedburner.com/liamkaufman/uHFr" rel="subscribe-rss" title="subscribe via RSS">RSS</a></li>
</ul>
<form action="http://google.com/search" method="get">
<fieldset role="search">
<input type="hidden" name="q" value="site:liamkaufman.com" />
<input class="search" type="text" name="q" results="0" placeholder="Search"/>
</fieldset>
</form>
<ul class="main-navigation">
<li><a href="/about.html">About</a></li>
<li><a href="/projects.html">Projects</a></li>
<li><a href="/">Blog</a></li>
<li><a href="/blog/archives">Archives</a></li>
</ul>
</nav>
<div id="main">
<div id="content">
<div class="blog-index">
<article>
<header>
<h1 class="entry-title"><a href="/blog/2012/06/04/redis-and-relational-data/">Redis and Relational Data</a></h1>
<p class="meta">
<time datetime="2012-06-04T00:00:00-04:00" pubdate data-updated="true">Jun 4<span>th</span>, 2012</time>
</p>
</header>
<div class="entry-content"><p>
<strong>UPDATE:</strong> Based on the feedback in the comments (Phineas), I’ve added an index to the comments table and updated the results.
</p>
<p>
Using the right tool for the job is a basic tenet amongst programmers. However, with all the currently available database options it’s increasingly difficult to figure out what the right tool is. Sometimes it’s nice to have a very simple tool that can be used for many different tasks: Redis. Over the last 4 months I’ve been using Redis heavily and I’ve even started to use it for relational data. I’ve been curious to find out the performance differences between Redis and PostgreSQL. Below I’ll provide an example of storing a simple relational dataset in Redis, and I’ll look at the performance differences between Redis and PostgreSQL.
</p>
<h2>Why use Redis for Relational data?</h2>
<p>
I find Redis appealing because it’s the simplest database that I have ever used (relative to: MySQL, PostgreSQL, Riak & Mongo). The documentation lists the time complexity of each command and provides an interactive console for experimenting with it. There’s also a certain appeal to using a single database instead of 2 or 3:
<ol>
<li>It’s much quicker to master 1 database than 2.</li>
<li>Two different databases means twice the updates, bugs and crashes.</li>
</ol>
</p>
<p>
I’ll outline a few ways Redis can be used to store relational data and the performance differences between Redis and PostgreSQL. All the examples and performance tests were done using Node.js.
</p>
<h2>Storing Relational Data in Redis</h2>
<p>
Redis values can be 1 of 5 different datatypes: strings, hashes, lists, sets and sorted sets. Each row in a relational database can be represented using a hash, and a list, set or sorted set can be used to represent a table. The datatype that’s used to represent the table is dependent on how the data needs to be retrieved.
</p>
<p>
For example, let’s say we’re storing blog posts. In Redis, each post will be stored in its own hash, with its key corresponding to the post’s url:
</p>
<figure class='code'><figcaption><span>A Post</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='javascript'><span class='line'><span class="s1">'a-post-about-databases'</span> <span class="o">:</span>
</span><span class='line'> <span class="p">{</span> <span class="nx">title</span> <span class="o">:</span> <span class="s1">'A post about databases'</span><span class="p">,</span> <span class="nx">body</span> <span class="o">:</span> <span class="s1">'...'</span><span class="p">,</span> <span class="nx">createdAt</span> <span class="o">:</span> <span class="mi">1338751532301</span><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>
<p>
Retrieving a single post using the url is O(N), where N is the number of fields in the hash (post). Since the number of fields in a post is constant, retrieving that post is effectively O(1). However, if we wanted to get all the posts, or a subset of them, it becomes useful to also store the keys in a sorted set (i.e. the “table”). Using a sorted set means that posts can be ordered by their createdAt date and it allows us to retrieve all the posts, or a subset of them (useful for pagination).
</p>
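<p>
To make the write side concrete, here is a minimal sketch of the two commands needed to store a post: HMSET for the hash and ZADD for the sorted set. The function name postWriteCommands is hypothetical, and the sketch only builds the command arguments; how they are sent depends on the Redis client being used.
</p>

```javascript
// Build the two Redis commands that store one post: a hash for the "row"
// and a sorted-set entry (scored by createdAt) for the "table".
// The 'posts' key matches the examples in this post; the function name is hypothetical.
function postWriteCommands(url, post) {
  var hashArgs = [url];
  Object.keys(post).forEach(function (field) {
    // Redis stores hash values as strings, so convert explicitly.
    hashArgs.push(field, String(post[field]));
  });
  return [
    ['HMSET'].concat(hashArgs),
    ['ZADD', 'posts', String(post.createdAt), url]
  ];
}

var commands = postWriteCommands('a-post-about-databases', {
  title: 'A post about databases',
  body: '...',
  createdAt: 1338751532301
});
```

<p>
Sending both commands together, for example inside a single multi, should keep the hash and the sorted set from drifting apart.
</p>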
<figure class='code'><figcaption><span>Retrieving a subset of all posts</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='javascript'><span class='line'><span class="nx">redis</span><span class="p">.</span><span class="nx">zrange</span><span class="p">(</span><span class="s1">'posts'</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">error</span><span class="p">,</span> <span class="nx">posts</span><span class="p">){</span>
</span><span class='line'> <span class="c1">//returns the keys (urls) associated with the first 11 posts</span>
</span><span class='line'><span class="p">})</span>
</span><span class='line'>
</span><span class='line'><span class="kd">var</span> <span class="nx">startDate</span> <span class="o">=</span> <span class="p">(</span><span class="k">new</span> <span class="nb">Date</span><span class="p">(</span><span class="mi">2012</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">1</span><span class="p">)).</span><span class="nx">getTime</span><span class="p">()</span> <span class="p">;</span> <span class="c1">// June 1st</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">endDate</span> <span class="o">=</span> <span class="p">(</span><span class="k">new</span> <span class="nb">Date</span><span class="p">(</span><span class="mi">2012</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">30</span><span class="p">)).</span><span class="nx">getTime</span><span class="p">();</span> <span class="c1">// June 30th</span>
</span><span class='line'><span class="nx">redis</span><span class="p">.</span><span class="nx">zrangebyscore</span><span class="p">(</span><span class="s1">'posts'</span><span class="p">,</span> <span class="nx">startDate</span><span class="p">,</span> <span class="nx">endDate</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span> <span class="nx">posts</span><span class="p">){</span>
</span><span class='line'> <span class="c1">//returns the keys (urls) associated with all the posts from June 2012</span>
</span><span class='line'><span class="p">})</span>
</span></code></pre></td></tr></table></div></figure>
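<p>
For pagination, the only subtlety is that ZRANGE’s stop index is inclusive, which is why the range 0 to 10 returns 11 keys. A small helper can compute the indices for a zero-based page number; pageRange is a hypothetical name, not something from the Redis client.
</p>

```javascript
// ZRANGE takes zero-based, inclusive start and stop indices,
// so a page of `perPage` posts ends at start + perPage - 1.
function pageRange(page, perPage) {
  var start = page * perPage;
  return [start, start + perPage - 1];
}

var firstPage = pageRange(0, 10); // [0, 9]
var thirdPage = pageRange(2, 10); // [20, 29]
```

<p>
The two values can be passed straight to zrange, e.g. redis.zrange('posts', firstPage[0], firstPage[1], callback).
</p>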
<p>
The above example is relatively straightforward, but what about storing the post’s comments? For every post we create a new sorted set called: ‘comments-KEYofPOST’. The comments are sorted by their creation time. To get a post, and its comments, we could do the following:
</p>
<figure class='code'><figcaption><span>Retrieving a post and its comments</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
</pre></td><td class='code'><pre><code class='javascript'><span class='line'><span class="kd">var</span> <span class="nx">postURL</span> <span class="o">=</span> <span class="s1">'a-post-about-databases'</span>
</span><span class='line'>
</span><span class='line'><span class="kd">var</span> <span class="nx">multi</span> <span class="o">=</span> <span class="nx">redis</span><span class="p">.</span><span class="nx">multi</span><span class="p">();</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// queue up the queries</span>
</span><span class='line'><span class="nx">multi</span><span class="p">.</span><span class="nx">hgetall</span><span class="p">(</span><span class="nx">postURL</span><span class="p">);</span>
</span><span class='line'><span class="nx">multi</span><span class="p">.</span><span class="nx">zrange</span><span class="p">(</span><span class="s1">'comments-'</span> <span class="o">+</span> <span class="nx">postURL</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// execute the queries atomically</span>
</span><span class='line'><span class="nx">multi</span><span class="p">.</span><span class="nx">exec</span><span class="p">(</span> <span class="kd">function</span><span class="p">(</span><span class="nx">error</span><span class="p">,</span> <span class="nx">results</span><span class="p">){</span>
</span><span class='line'> <span class="cm">/*</span>
</span><span class='line'><span class="cm"> results[0] will contain the post</span>
</span><span class='line'><span class="cm"> results[1] will contain an array with all the comments</span>
</span><span class='line'><span class="cm"> */</span>
</span><span class='line'><span class="p">});</span>
</span></code></pre></td></tr></table></div></figure>
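<p>
One detail to keep in mind is that Redis returns hash values as strings, so numeric fields such as createdAt need converting. The sketch below combines the two replies into a single object. It assumes each comment was stored in the sorted set as a JSON string, which is only one possible encoding, and assemblePost is a hypothetical helper.
</p>

```javascript
// Combine the two replies from multi.exec() into one object.
// Assumption: each comment was stored as a JSON string in the sorted set.
function assemblePost(results) {
  var post = results[0];
  var comments = results[1].map(function (raw) {
    return JSON.parse(raw);
  });
  return {
    title: post.title,
    body: post.body,
    createdAt: Number(post.createdAt), // hash values arrive as strings
    comments: comments
  };
}

// Stand-in replies shaped like [hash, array of sorted-set members].
var assembled = assemblePost([
  { title: 'A post about databases', body: '...', createdAt: '1338751532301' },
  ['{"author":"a","text":"first"}', '{"author":"b","text":"second"}']
]);
```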
<h2>Redis vs. PostgreSQL Performance</h2>
<p>
In SQL you might do 1 query to get the post and another to get the comments, or use a join to get the post and the comments in one query. With the approach above, using Redis, 2 queries are atomically executed, using the multi and exec commands. In both PostgreSQL and Redis, a single request is sent to the database to retrieve 1 post and its 10 comments.
</p>
<p>
To test the performance I created a dataset that includes 10,000 ‘blog posts’, with each post having 10 comments (100,000 comments in total). All tests were run on a 2011 MacBook Pro (2.3 GHz i7, 8GB RAM). To test PostgreSQL, I sequentially fetched each post and used a join to retrieve its comments (10,000 separate queries). The test was repeated six times to produce an average time and was done for both PostgreSQL and Redis.
</p>
<div class="table">
<h2>Redis & PostgreSQL Performance</h2>
<table>
<tr>
<th></th>
<th>Average Total Time (Seconds)</th>
<th>Time per Query (Milliseconds)</th>
</tr>
<tr>
<td>psql</td>
<td>138.34</td>
<td>13.8</td>
</tr>
<tr>
<td>psql (Native Bindings - NB)</td>
<td>125.95</td>
<td>12.6</td>
</tr>
<tr>
<td>psql (NB + Index)</td>
<td>2.72</td>
<td>0.27</td>
</tr>
<tr>
<td>Redis (Hiredis)</td>
<td>0.76</td>
<td>0.067</td>
</tr>
</table>
</div>
<p>
Using PostgreSQL, it took an average of 138.34 seconds to execute all 10,000 queries, or 13.8 milliseconds/query. Using the native bindings that come with the psql node module yielded an improvement: 12.6 milliseconds/query. When an index was added to comments (post_id), the time dropped to 2.72 seconds, or 0.27 milliseconds for a post and its 10 comments. In contrast, Redis can retrieve a post and its comments in 0.067 milliseconds. Of course the above is akin to comparing apples to oranges, but it still provides a glimpse into the performance differences between Redis and PostgreSQL.
</p>
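<p>
The per-query figures for PostgreSQL are simply the average total time divided by the 10,000 queries in each run. As a quick check of that arithmetic (msPerQuery is a hypothetical helper, and the totals are the ones from the table):
</p>

```javascript
// Convert an average total time in seconds into milliseconds per query.
function msPerQuery(totalSeconds, queryCount) {
  return (totalSeconds * 1000) / queryCount;
}

var QUERIES = 10000;
var plain = msPerQuery(138.34, QUERIES);          // psql
var nativeBindings = msPerQuery(125.95, QUERIES); // psql with native bindings
var indexed = msPerQuery(2.72, QUERIES);          // psql with native bindings and an index
```

<p>
Rounded, these come to roughly 13.8, 12.6 and 0.27 milliseconds per query.
</p>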
<p>
While Redis is in memory and should be fast, PostgreSQL uses caching algorithms (<a href="http://archives.postgresql.org/pgsql-hackers/2007-11/msg00562.php">LRU</a>) to keep its contents in memory. Of course, keeping everything in memory (Redis) will most likely be faster than using LRU.
</p>
<h2>Caveats to using Redis for Relational Data</h2>
<p>
The single biggest caveat to using Redis is that it is entirely in memory. If your relational dataset is 2.5GB (not that large), you’ll need a $160/month Linode (4GB RAM) to keep it in Redis. In contrast, a $20/month Linode (512MB RAM) has 20GB of disk space and could easily hold that same dataset using PostgreSQL. This tradeoff becomes even more of an issue as your dataset becomes larger than 4GB.
</p>
<p>
The above example only represents a very simple relationship between two pieces of data (posts and comments); mapping a many-to-many relationship in Redis would take a little more imagination.
</p>
<h2>Conclusions</h2>
<p>
Before storing all your app’s data in Redis it’s advisable to estimate how large your dataset will be in a year, or two, and how much RAM will be required to use Redis. If your dataset will be greater than 4GB in a year, and money is a constraint, it probably makes sense to put all, or a portion of the data, in PostgreSQL, or use an alternative NoSQL solution (e.g. Riak or Mongo).
</p>
<p>
<a class="github" href="https://github.com/liamks/Redis-and-Relational-Data"><span></span>Code on Github</a>
</p>
</div>
</article>
<article>
<header>
<h1 class="entry-title"><a href="/blog/2012/04/21/adding-authentication-waiting-lists-and-sign-ups-to-and-express-app-using-drawbridge-and-redis/">Adding Authentication, Waiting Lists and Sign Ups to an Express App Using Drawbridge.js and Redis</a></h1>
<p class="meta">
<time datetime="2012-04-21T00:00:00-04:00" pubdate data-updated="true">Apr 21<span>st</span>, 2012</time>
</p>
</header>
<div class="entry-content"><p>
There are several popular modules for adding password-based user authentication to an Express.js app. Unfortunately, they require writing lots of code to get started. I prefer the approach that authentication libraries like Devise take: they generate code and views, and you’re free to modify, or delete, what’s created.
</p>
<p>
Given the authentication options for Express.js I wanted to create a module that would make adding user authentication quick and easy. Moreover, I also wanted developers to be free to edit and modify the generated views. In addition to authentication I wanted the module to handle sign ups (the type you see on a just-launched startup’s page) and to handle waiting lists and invitations. Based on the module’s functionality I’ve decided to call it Drawbridge.js.
</p>
<h2>User Authentication with Drawbridge.js</h2>
<p>
Drawbridge.js uses Redis to persist its data, but it’s possible for developers to create other database adapters for Drawbridge (pull-requests accepted). I chose Redis because of its ability to pipeline multiple commands, reducing round trips between the server and the database. The atomic nature of pipelined commands obviates a lot of complex callbacks and makes the resulting code much easier to understand. Overall Redis is easy to use, easy to understand and fast - great features for an authentication module.
</p>
<p>
To send email, Drawbridge uses either the nodemailer or the postmark module. I included the <a href="https://postmarkapp.com">Postmark</a> option because I’m currently using it and I like it. However, developers are free to add additional email adapters.
</p>
<h2>Drawbridge Screencast</h2>
<p>
I’ve created a short screencast to show how easy it is to add Drawbridge to an existing Express.js application. Before you watch the screencast it’s important that I outline a couple of caveats:
</p>
<p>
<ol>
<li>Drawbridge is not ready for production - it’s basically a working prototype.</li>
<li>Drawbridge views and variables are inconsistently named; that will need to be fixed.</li>
<li>The code needs refactoring and more testing.</li>
<li>Drawbridge needs to be picked apart for security issues.</li>
</ol>
</p>
<p>
With those caveats out of the way here is the video:
</p>
<p>
<iframe src="http://player.vimeo.com/video/40780990?title=0&byline=0&portrait=0" width="400" height="300" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe><p><a href="http://vimeo.com/40780990">Drawbridge.js</a> from <a href="http://vimeo.com/user11381617">Liam Kaufman</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
</p>
<p>
While I built Drawbridge.js to scratch my own itch, I hope others will find it useful as well. Once I refine it further I will most certainly start to use it in my own projects. If you’re interested in Drawbridge 1) watch the project on Github and 2) try to get it working on your toy Express apps. I welcome feedback on both the architecture of Drawbridge and its security.
</p>
<p>
<a class="github" href="https://github.com/liamks/Drawbridge.js"><span></span>Drawbridge.js on Github</a>
</p></div>
</article>
<article>
<header>
<h1 class="entry-title"><a href="/blog/2012/03/22/making-hacker-news-faster-two-approaches/">Making Hacker News Faster: Two Approaches</a></h1>
<p class="meta">
<time datetime="2012-03-22T00:00:00-04:00" pubdate data-updated="true">Mar 22<span>nd</span>, 2012</time>
</p>
</header>
<div class="entry-content"><p>
Over the years traffic to <a href="http://news.ycombinator.com">Hacker News (HN)</a>, “a social news website about computer hacking and startup companies” <a href="http://en.wikipedia.org/wiki/Hacker_News">(Wikipedia)</a>, has grown consistently, with an <a href="http://www.ycombinator.com/images/hntraffic-5mar12.png">average of 150,000 daily uniques</a>. The growth in traffic may explain why load times seem increasingly variable. I couldn’t help but wonder if some optimizations could be made to decrease both variability and load times. I’ll propose two broad approaches: the first involves migrating away from table-based layouts, while the second involves consuming a JSON API.
</p>
<h2> Approach 1: Tables to Divs </h2>
<div class="table">
<h2> Table 1. Hacker News Resource Statistics </h2>
<table>
<tr>
<th>Resource</th>
<th>Size (With Tables) </th>
<th>Size (With Divs) </th>
<th>% Change </th>
</tr>
<tr>
<td>HTML</td>
<td>26KB</td>
<td>15KB</td>
<td>-42%</td>
</tr>
<tr>
<td>CSS</td>
<td>1.7KB</td>
<td>2.3KB</td>
<td>+35%</td>
</tr>
<tr>
<td>Logo</td>
<td>100B</td>
<td>0</td>
<td>-100%</td>
</tr>
<tr>
<td>Up Arrow</td>
<td>111B</td>
<td>0</td>
<td>-100%</td>
</tr>
<tr>
<td>Total</td>
<td>27.9KB</td>
<td>17.3KB</td>
<td>-37.2%</td>
</tr>
</table>
<p>
In the DIV version, the logo and up arrow were base64 encoded and included in the HTML and CSS files.
</p>
</div>
<p>
HN’s front page is composed of 4 tables, 98 rows, 159 columns, 37 inline style declarations and numerous attributes that dictate style. To reduce the markup on the front page I created a new HN front page (<a href="https://github.com/liamks/Making-HN-Faster/blob/master/HackerNewsV2.html">Github link</a>) that looks identical to the existing page but does not include tables or inline CSS. I also went a step further and base64 encoded both the logo and the up arrow to decrease the number of requests. The completed CSS file was run through a <a href="http://www.minifycss.com/css-compressor/">CSS minifier</a> to yield further reductions. With those changes only two requests are necessary, one for the HTML file and one for the CSS file. Table 1 shows that those changes yielded an overall reduction of 37%.
</p>
<p>
I also slightly modified the JavaScript responsible for sending up-votes to the server. Instead of grabbing a vote’s id from the id of the HTML node, it gets it from the ‘data-id’ attribute. Otherwise, the JavaScript remains identical. As an aside, if you have not examined the JavaScript that is responsible for sending votes to the server, I’ve included it below (the existing code). It’s a creative use of an image tag. An image node is created, but not added to the DOM. When the image node is assigned a ‘src’, which happens to include all the vote info, it then requests the ‘image’, using the constructed url. Thus the ‘image’ request becomes analogous to an AJAX GET request, but without a conventional response.
</p>
<figure class='code'><figcaption><span>Votes With IMG Nodes</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
</pre></td><td class='code'><pre><code class='javascript'><span class='line'><span class="kd">function</span> <span class="nx">byId</span><span class="p">(</span><span class="nx">id</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'> <span class="k">return</span> <span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="nx">id</span><span class="p">);</span>
</span><span class='line'><span class="p">}</span>
</span><span class='line'>
</span><span class='line'><span class="kd">function</span> <span class="nx">vote</span><span class="p">(</span><span class="nx">node</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'> <span class="kd">var</span> <span class="nx">v</span> <span class="o">=</span> <span class="nx">node</span><span class="p">.</span><span class="nx">id</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="sr">/_/</span><span class="p">);</span> <span class="c1">// {'up', '123'}</span>
</span><span class='line'> <span class="kd">var</span> <span class="nx">item</span> <span class="o">=</span> <span class="nx">v</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
</span><span class='line'>
</span><span class='line'> <span class="c1">// hide arrows</span>
</span><span class='line'> <span class="nx">byId</span><span class="p">(</span><span class="s1">'up_'</span> <span class="o">+</span> <span class="nx">item</span><span class="p">).</span><span class="nx">style</span><span class="p">.</span><span class="nx">visibility</span> <span class="o">=</span> <span class="s1">'hidden'</span><span class="p">;</span>
</span><span class='line'> <span class="nx">byId</span><span class="p">(</span><span class="s1">'down_'</span> <span class="o">+</span> <span class="nx">item</span><span class="p">).</span><span class="nx">style</span><span class="p">.</span><span class="nx">visibility</span> <span class="o">=</span> <span class="s1">'hidden'</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'> <span class="c1">// ping server</span>
</span><span class='line'> <span class="kd">var</span> <span class="nx">ping</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Image</span><span class="p">();</span>
</span><span class='line'> <span class="nx">ping</span><span class="p">.</span><span class="nx">src</span> <span class="o">=</span> <span class="nx">node</span><span class="p">.</span><span class="nx">href</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'> <span class="k">return</span> <span class="kc">false</span><span class="p">;</span> <span class="c1">// cancel browser nav</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>
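<p>
Reading the id from the ‘data-id’ attribute, as described above, might look like the following sketch. The helper name itemIdFromNode is hypothetical, and it accepts anything with a getAttribute method, so it can be tried outside the browser with a stand-in object.
</p>

```javascript
// Return the story id stored in the node's data-id attribute.
function itemIdFromNode(node) {
  return node.getAttribute('data-id');
}

// Stand-in for a DOM element such as <a data-id="123" href="...">.
var fakeNode = {
  getAttribute: function (name) {
    return name === 'data-id' ? '123' : null;
  }
};
var item = itemIdFromNode(fakeNode);
```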
<h2> Approach 2: JSON API </h2>
<p>
Although approach 1 results in a 37% decrease in data transferred to the client, markup and data must be transferred to the client on every refresh. In approach two, the markup is only transferred to the client once, and then cached, while the data is sent to the client via JSON. Using this approach would shrink the HTML file but no doubt grow the JavaScript file. However, both of those resources could be cached in the browser, and cached on a CDN, drastically reducing the number of requests to HN’s server. Furthermore, the JSON representing the stories on the front page <a href="https://github.com/liamks/Making-HN-Faster/blob/master/frontpage.json">is 7.8KB</a>, much smaller than the existing page or even approach 1.
</p>
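<p>
On the client, the cached JavaScript would turn that JSON into markup. The following is only a rough sketch of the idea: the field names (title, url, points) are hypothetical and may not match the linked JSON file, and renderStories and escapeHtml are illustrative helpers rather than existing HN code.
</p>

```javascript
// Escape text before placing it in markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn an array of story objects into list items.
// The field names (title, url, points) are hypothetical.
function renderStories(stories) {
  return stories.map(function (story) {
    return '<li><a href="' + escapeHtml(story.url) + '">' +
      escapeHtml(story.title) + '</a> (' + Number(story.points) + ' points)</li>';
  }).join('\n');
}

var html = renderStories([
  { title: 'A <b> story', url: 'http://example.com/?a=1&b=2', points: 5 }
]);
```

<p>
In the browser the resulting string could be assigned to a container’s innerHTML once the JSON request completes.
</p>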
<p>
Approach 2 is not without its drawbacks. It would require significant changes to both HN’s back-end and the client-side. A JavaScript application and API would have to be created. This approach would likely be incompatible with bots that would not execute the JavaScript necessary to populate the page with stories. To get around this the user agent could be detected and a static version could be served to bots. Alternatively, the webpage could be pre-populated with stories and subsequent requests would take advantage of AJAX GET requests. This would simplify matters, but make caching more difficult, since the cached page would require updating every time the front page changes.
</p>
<h2> Conclusions </h2>
<p>
By transitioning from tables to divs, and inline CSS to external CSS, HN could dramatically reduce the bandwidth required to serve its web pages. The first approach would require minimal changes to HN’s back-end making it a good candidate for adoption. While the second approach could yield even better results, it would require drastic changes to both the server and the client, making it more suitable as a long term solution.
</p>
<p>
In addition to the two approaches above, gzip compressing both the .html and the .css would further reduce transferred data. It would also be beneficial to add the appropriate headers to enable browser caching for CSS.
</p>
<p>
While Paul Graham may have insufficient time for, or interest in, implementing some of the above changes, I suspect he knows a few individuals who would be willing to help out.
</p>
<p>
<a class="github" href="https://github.com/liamks/Making-HN-Faster"><span></span>Code on Github</a>
</p>
</div>
</article>
<article>
<header>
<h1 class="entry-title"><a href="/blog/2012/03/22/from-digg-to-reddit-to-hacker-news-whats-next/">From Digg to Reddit to Hacker News: What’s Next?</a></h1>
<p class="meta">
<time datetime="2012-03-22T00:00:00-04:00" pubdate data-updated="true">Mar 22<span>nd</span>, 2012</time>
</p>
</header>
<div class="entry-content"><p>
Dustin Curtis, the creator of <a href="http://svbtle.com/">Svbtle</a>, recently mentioned on <a href="https://twitter.com/#!/dcurtis/status/182986897444966402">Twitter</a>:
</p>
<blockquote>
I miss the Hacker News from four years ago. It was awesome. The discussions there are not even worth reading anymore. It’s sad.
</blockquote>
<p>
Based on the number of retweets and favorites, I suspect that others agree. In fact the idea that Hacker News is degrading is common enough that it has been addressed on <a href="http://ycombinator.com/newsguidelines.html">HN’s guidelines</a>:
</p>
<blockquote>
If your account is less than a year old, please don’t submit comments saying that HN is turning into Reddit. (It’s a common semi-noob illusion.).
</blockquote>
<p>
However, Mr Curtis has been on HN for <a href="http://news.ycombinator.com/user?id=dcurtis">over five years</a>, and is certainly not subject to the ‘common semi-noob illusion’. Is HN getting worse then?
</p>
<p>
I suspect that when HN started, only those who were most passionate about hacking were familiar with HN and would take the time to comment on it. As time went on, the popularity of both Y Combinator and HN may have resulted in the level of discourse regressing to the mean. That’s not to say that there aren’t still intelligent comments; in fact, I’d argue that there are more intelligent comments than there were 4 or 5 years ago. However, people tend to remember the unintelligent comments more, especially when those comments contain opinions that differ from their own or originate from non-experts.
</p>
<p>
If Digg, Reddit and Hacker News are no longer the best places for discussion, how can we create a place that is? While there ought to be many ways to encourage scholarly discussion and discourage idiotic comments, I want to explore several ideas.
</p>
<h2>Exclusivity</h2>
<p>
In the early stages Digg, Reddit and Hacker News were implicitly exclusive. They didn’t discourage people from joining, but their initial lack of popularity acted as a filter to those who were technically savvy and within certain social networks. Once the exclusivity vanished the communities became diluted. <a href="http://forrst.com">Forrst</a> is explicitly exclusive and is by invitation only. Does Forrst’s exclusivity lead to a stronger community? To reiterate, is exclusivity a necessity in keeping a social news site strong and viable?
</p>
<h2>Experts</h2>
<p>
I enjoy it when an article pops up on HN about physics or biology and several graduate students in those fields provide intelligent comments. Is there a way to officially denote that someone is an expert in a field and automatically give their comments more weight? In very esoteric subjects it isn’t necessary, since the complexity of the subject reduces “average” comments. However, in simpler subjects, <a href="http://en.wikipedia.org/wiki/Parkinson's_Law_of_Triviality">bikeshedding</a> becomes an issue. Could bikeshedding be prevented by weighting comments based on the user’s past comments? For instance, if an individual has been voted up when discussing physics, perhaps future comments on physics should be algorithmically voted up.
</p>
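<p>
As a rough illustration of what “algorithmically voted up” could mean, the sketch below boosts a comment’s score according to the votes its author has previously earned on the same topic. Everything here is hypothetical: the <code>topicKarma</code> structure, the function name and the logarithmic formula are assumptions chosen for the example, not anything HN does.
</p>

```javascript
// Hypothetical scoring sketch; none of these names come from HN.
// author.topicKarma maps a topic to the net votes the author earned there.
function weightedScore(comment, author) {
  var karma = author.topicKarma[comment.topic] || 0;
  // Logarithmic boost, so prolific experts gain weight slowly and
  // authors with no (or negative) history get no boost at all.
  var boost = 1 + Math.log(1 + Math.max(karma, 0)) / Math.LN10;
  return comment.votes * boost;
}

var physicist = { topicKarma: { physics: 99 } };
var newcomer = { topicKarma: {} };
var expertScore = weightedScore({ topic: 'physics', votes: 10 }, physicist);
var newcomerScore = weightedScore({ topic: 'physics', votes: 10 }, newcomer);
```

<p>
With these numbers the physicist’s comment is ranked roughly three times higher than an identical comment from the newcomer. Whether that kind of boost would help or simply entrench early users is exactly the sort of thing that would need testing.
</p>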
<h2>Focus</h2>
<p>
One thing I really appreciate about HN is the variety of content. However, I can’t help but wonder whether restricting a social news site to a specific topic would decrease the probability that the content, and the discussion, become watered down. Forrst focuses on design and development; has that focus helped them? Reddit has addressed this issue with subreddits, but the result seems to be many hardly used subreddits. I think focus is important, but at the same time I like being exposed to new topics. Can those two wishes be balanced?
</p>
<h2>Conclusions</h2>
<p>
There is no magic bullet for maintaining the quality of a social news site, but there is a collection of concepts that may help. It would be interesting to A/B test some of those ideas. One could imagine creating several social news websites for different topics, making some exclusive and some not, or altering other variables, and seeing which succeed. What do you think is important in a social news site?
</p></div>
</article>
<article>
<header>
<h1 class="entry-title"><a href="/blog/2012/03/08/scraping-web-pages-with-jquery-nodejs-and-jsdom/">Scraping Web Pages With jQuery, Node.js and Jsdom</a></h1>
<p class="meta">
<time datetime="2012-03-08T00:00:00-05:00" pubdate data-updated="true">Mar 8<span>th</span>, 2012</time>
</p>
</header>
<div class="entry-content"><p>
I always found it odd that accessing DOM elements with Ruby, or Python, wasn’t as easy as it was with jQuery. Many HTML parsing libraries employ Simple API for XML (SAX) that can handle extremely large XML documents, but is cumbersome and adds complexity. Other parsing libraries use XML Path Language (XPath), which is conceptually simpler than SAX, but still more of an effort than jQuery. I was pleasantly surprised to discover that it’s possible to use jQuery to parse web pages with Node.js. This is accomplished by using <a href="https://github.com/tmpvar/jsdom">jsdom, “a javascript implementation of the W3C DOM”</a>.
</p>
<h2>jQuery and jsdom</h2>
<p>
Using jsdom you can specify a local file, or url, and jsdom will return the <code>window</code> object for that document. Additionally, JavaScript can be inserted into the document; in our case we’re inserting the jQuery library. In the example below all the links from the Hacker News front page are logged to the console.
</p>
<figure class='code'><figcaption><span>Scraping Links From Hacker News</span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='javascript'><span class='line'><span class="kd">var</span> <span class="nx">jsdom</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'jsdom'</span><span class="p">);</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">fs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">jquery</span> <span class="o">=</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="s2">"./jquery-1.7.1.min.js"</span><span class="p">).</span><span class="nx">toString</span><span class="p">();</span>
</span><span class='line'>
</span><span class='line'><span class="nx">jsdom</span><span class="p">.</span><span class="nx">env</span><span class="p">({</span>
</span><span class='line'> <span class="nx">html</span><span class="o">:</span> <span class="s1">'http://news.ycombinator.com/'</span><span class="p">,</span>
</span><span class='line'> <span class="nx">src</span><span class="o">:</span> <span class="p">[</span>
</span><span class='line'> <span class="nx">jquery</span>
</span><span class='line'> <span class="p">],</span>
</span><span class='line'> <span class="nx">done</span><span class="o">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">errors</span><span class="p">,</span> <span class="nb">window</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'> <span class="kd">var</span> <span class="nx">$</span> <span class="o">=</span> <span class="nb">window</span><span class="p">.</span><span class="nx">$</span><span class="p">;</span>
</span><span class='line'> <span class="nx">$</span><span class="p">(</span><span class="s1">'a'</span><span class="p">).</span><span class="nx">each</span><span class="p">(</span><span class="kd">function</span><span class="p">(){</span>
</span><span class='line'> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span> <span class="nx">$</span><span class="p">(</span><span class="k">this</span><span class="p">).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'href'</span><span class="p">)</span> <span class="p">);</span>
</span><span class='line'> <span class="p">});</span>
</span><span class='line'> <span class="p">}</span>
</span><span class='line'><span class="p">});</span>
</span></code></pre></td></tr></table></div></figure>
<h2>Making Scraping More Robust</h2>
<p>
Unfortunately, there are a few common bugs that I ran into when scraping content with jQuery and jsdom. Two issues in particular, which aren’t necessarily specific to jsdom, are worth watching out for.
</p>
<h3>jQuery Return Values</h3>
<p>
The first issue is return values from jQuery function calls, which deserve extra attention. Calling a method on <code>undefined</code> will crash a program, a problem that can be especially apparent in DOM parsing. Consider the example below:
</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='javascript'><span class='line'><span class="nx">$</span><span class="p">(</span><span class="nx">$</span><span class="p">(</span><span class="s1">'a'</span><span class="p">)[</span><span class="mi">7</span><span class="p">]).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'href'</span><span class="p">).</span><span class="nx">split</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>
<p>
If there are 8 or more links on a page, the 8th link will be returned and its href attribute will be split into an array. However, if there are fewer than 8, <code>attr('href')</code> will return <code>undefined</code> and calling <code>split()</code> on it will crash the program. Since HTML pages aren’t as structured as API responses, it’s important not to assume too much and to always check return values.
</p>
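<p>
One way to guard against this is to check the attribute before splitting it. The helper below is a minimal sketch; <code>splitHref</code> is a name made up for this example and is not part of jQuery or jsdom.
</p>

```javascript
// Split an href into path segments, tolerating a missing attribute.
// splitHref is a hypothetical helper, not a jQuery or jsdom function.
function splitHref(href) {
  if (typeof href !== 'string') {
    return [];
  }
  return href.split('/');
}

// With jQuery loaded, attr('href') is undefined when the page has fewer
// than 8 links, so the guard returns an empty array instead of throwing:
//   var parts = splitHref($('a').eq(7).attr('href'));
var present = splitHref('item/123');
var missing = splitHref(undefined);
```

<p>
Returning an empty array keeps the calling code simple, but depending on the scraper it may be better to log the missing link so that structural changes to the page don’t go unnoticed.
</p>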
<h3>Web Page Errors</h3>
<p>
It’s entirely possible that the url passed to jsdom returns an error. If the error is temporary, your scraper might miss out on important information. This issue can be mitigated by recursively retrying the url, like the example below:
</p>
<figure class='code'><figcaption><span>Managing Errors</span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='javascript'><span class='line'><span class="kd">var</span> <span class="nx">getLinks</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">retries</span><span class="p">){</span>
</span><span class='line'>
</span><span class='line'> <span class="k">if</span><span class="p">(</span><span class="nx">retries</span> <span class="o">===</span> <span class="mi">3</span><span class="p">){</span>
</span><span class='line'> <span class="k">return</span><span class="p">;</span>
</span><span class='line'> <span class="p">}</span><span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">retries</span> <span class="o">===</span> <span class="kc">undefined</span><span class="p">){</span>
</span><span class='line'> <span class="nx">retries</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</span><span class='line'> <span class="p">}</span>
</span><span class='line'>
</span><span class='line'> <span class="nx">jsdom</span><span class="p">.</span><span class="nx">env</span><span class="p">({</span>
</span><span class='line'> <span class="nx">html</span><span class="o">:</span> <span class="s1">'http://news.ycombinator.com/'</span><span class="p">,</span>
</span><span class='line'> <span class="nx">src</span><span class="o">:</span> <span class="p">[</span>
</span><span class='line'> <span class="nx">jquery</span>
</span><span class='line'> <span class="p">],</span>
</span><span class='line'> <span class="nx">done</span><span class="o">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">errors</span><span class="p">,</span> <span class="nb">window</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>
</span><span class='line'> <span class="k">if</span><span class="p">(</span><span class="nx">errors</span><span class="p">){</span>
</span><span class='line'> <span class="k">return</span> <span class="nx">getLinks</span><span class="p">(</span><span class="nx">retries</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
</span><span class='line'> <span class="p">}</span>
</span><span class='line'>
</span><span class='line'> <span class="kd">var</span> <span class="nx">$</span> <span class="o">=</span> <span class="nb">window</span><span class="p">.</span><span class="nx">$</span><span class="p">;</span>
</span><span class='line'> <span class="nx">$</span><span class="p">(</span><span class="s1">'a'</span><span class="p">).</span><span class="nx">each</span><span class="p">(</span><span class="kd">function</span><span class="p">(){</span>
</span><span class='line'> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span> <span class="nx">$</span><span class="p">(</span><span class="k">this</span><span class="p">).</span><span class="nx">attr</span><span class="p">(</span><span class="s1">'href'</span><span class="p">)</span> <span class="p">);</span>
</span><span class='line'> <span class="p">});</span>
</span><span class='line'> <span class="p">}</span>
</span><span class='line'> <span class="p">});</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>
<p>
With the above approach, if errors are encountered <code>getLinks</code> will be called recursively with a larger <code>retries</code> value. On the 3rd retry the function will return. If you wanted to go further you could wrap the recursive call in <code>setTimeout</code> to ensure that the recursive web request was not made immediately after the error was encountered.
</p>
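<p>
The <code>setTimeout</code> idea can be sketched as a small, generic helper. This is only one possible shape: <code>retryWithDelay</code>, its option names and the <code>fetchPage</code> callback are all made up for this example, and <code>fetchPage</code> stands in for any callback-style request such as the <code>jsdom.env</code> call above.
</p>

```javascript
// Retry a callback-style request, waiting between attempts.
// fetchPage(cb) is a placeholder: it must call cb(errors, result).
// options.schedule defaults to setTimeout and exists only so the
// waiting strategy can be swapped out (for example in tests).
function retryWithDelay(fetchPage, options, callback) {
  var maxAttempts = options.maxAttempts || 3;
  var delayMs = options.delayMs || 1000;
  var schedule = options.schedule || setTimeout;

  function attempt(attemptNumber) {
    fetchPage(function (errors, result) {
      if (!errors) {
        return callback(null, result);
      }
      if (attemptNumber + 1 >= maxAttempts) {
        // Out of attempts: report the last error to the caller.
        return callback(errors);
      }
      // Wait before trying again rather than retrying immediately.
      schedule(function () {
        attempt(attemptNumber + 1);
      }, delayMs);
    });
  }

  attempt(0);
}
```

<p>
Unlike the version above, which returns silently after the last retry, this sketch passes the final error to the callback so the caller can decide what to do with a page that never loaded.
</p>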
<h2>Conclusions</h2>
<p>
Parsing web pages with jQuery on the server is a much more natural experience for developers already accustomed to using jQuery in the client. However, prior to scraping it’s worth checking if the site 1) allows scraping and 2) does not already have an API. Consuming a JSON API would be even easier than scraping and parsing!
</p></div>
</article>
<article>
<header>
<h1 class="entry-title"><a href="/blog/2012/03/01/why-riak-and-nodejs-make-a-great-pair/">Why Riak and Node.js Make a Great Pair</a></h1>
<p class="meta">
<time datetime="2012-03-01T00:00:00-05:00" pubdate data-updated="true">Mar 1<span>st</span>, 2012</time>
</p>
</header>
<div class="entry-content"><p>
In the last few years there has been a proliferation of NoSQL databases. Searching on Google for <code>site:news.ycombinator.com nosql</code> yields over 2,500 hits, many of which include posts asking when you’d want to use a NoSQL database. If you’re used to a relational database it might seem like an unnecessary burden to learn another database paradigm, but there’s one open source NoSQL database that I think is not only worth the burden, but is a perfect fit for Node.js development: <a href="http://wiki.basho.com/What-is-Riak%3F.html">Riak (pronounced “REE-ack”)</a>.
</p>
<h2>Why is Riak a Good Fit with JavaScript and Node.js?</h2>
<p>
<a href="http://riakjs.org/">Riak-js</a> makes storing JavaScript objects easy. There is no need to call <code>JSON.stringify()</code> on a JavaScript object when saving it, or <code>JSON.parse()</code> when retrieving it.
</p>
<figure class='code'><figcaption><span>Riak and Node.js Basics</span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='javascript'><span class='line'><span class="kd">var</span> <span class="nx">db</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'riak-js'</span><span class="p">).</span><span class="nx">getClient</span><span class="p">();</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">post</span> <span class="o">=</span> <span class="p">{</span><span class="nx">id</span><span class="o">:</span> <span class="mi">17</span><span class="p">,</span>
</span><span class='line'> <span class="nx">date</span><span class="o">:</span> <span class="k">new</span> <span class="nb">Date</span><span class="p">(),</span>
</span><span class='line'> <span class="nx">title</span><span class="o">:</span> <span class="s1">'a blog post'</span><span class="p">,</span>
</span><span class='line'> <span class="nx">body</span><span class="o">:</span> <span class="s1">'A blog post about Riak'</span><span class="p">};</span>
</span><span class='line'>
</span><span class='line'><span class="nx">db</span><span class="p">.</span><span class="nx">save</span><span class="p">(</span><span class="s1">'posts'</span><span class="p">,</span> <span class="nx">post</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span> <span class="nx">post</span><span class="p">);</span>
</span><span class='line'><span class="nx">db</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'posts'</span><span class="p">,</span> <span class="mi">17</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span> <span class="nx">data</span><span class="p">,</span> <span class="nx">meta</span><span class="p">){</span>
</span><span class='line'> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span>
</span><span class='line'><span class="p">});</span>
</span><span class='line'><span class="cm">/* prints</span>
</span><span class='line'><span class="cm">{ id: 17,</span>
</span><span class='line'><span class="cm"> date: '2012-02-29T18:26:44.400Z',</span>
</span><span class='line'><span class="cm"> title: 'a blog post',</span>
</span><span class='line'><span class="cm"> body: 'A blog post about Riak' }</span>
</span><span class='line'><span class="cm">*/</span>
</span></code></pre></td></tr></table></div></figure>
<p>
In the above example, a riak-js client is created and a post object is created. The post object is saved into the ‘posts’ bucket, with its id as its key. To retrieve the post, the bucket and key are referenced.
</p>
<p>
Using JavaScript on both the client and the server is nice, and being able to easily save JavaScript objects is even better. Not having to switch between languages would certainly reduce annoying syntactic problems. This would also allow you to write the entire stack in JavaScript, CoffeeScript or ClojureScript, giving you several programming paradigms to choose from.
</p>
<p>
If you decided to use <a href="http://andyet.net/blog/2011/feb/15/re-using-backbonejs-models-on-the-server-with-node/">Backbone.js on the server</a>, calling <code>toJSON()</code> on a Backbone model would allow you to easily store the model in Riak.
</p>
<h2>Why Riak?</h2>
<p>
At this point you might be wondering why you’d want to use Riak when CouchDB also has a JavaScript interface and can do some of the above. As Damien Katz, the creator of CouchDB has pointed out, <a href="http://damienkatz.net/2012/01/why_couchbase.html">CouchDB is slow and can’t “scale-out on it’s own”</a>. In contrast, Riak was built for replication and scaling out. In fact, people at Basho, the company behind Riak, <a href="http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-April/000876.html">indicate that adding new nodes actually increases throughput</a>.
</p>
<h2>Additional Features of Riak</h2>
<p>
Saving JavaScript objects and easy scaling are both good fits with Node.js, but Riak has some additional features, such as buckets and links, that make retrieval convenient. With buckets the following functions become possible:
</p>
<figure class='code'><figcaption><span>Riak Buckets</span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='javascript'><span class='line'><span class="c1">// Get all the posts within the posts bucket</span>
</span><span class='line'><span class="nx">db</span><span class="p">.</span><span class="nx">getAll</span><span class="p">(</span><span class="s1">'posts'</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// Get all the posts with the title === 'a blog post' </span>
</span><span class='line'><span class="nx">db</span><span class="p">.</span><span class="nx">getAll</span><span class="p">(</span><span class="s1">'posts'</span><span class="p">,</span> <span class="p">{</span> <span class="nx">where</span><span class="o">:</span> <span class="p">{</span><span class="nx">title</span><span class="o">:</span> <span class="s1">'a blog post'</span> <span class="p">}});</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// Get the number of posts</span>
</span><span class='line'><span class="nx">db</span><span class="p">.</span><span class="nx">count</span><span class="p">(</span><span class="s1">'posts'</span><span class="p">);</span>
</span></code></pre></td></tr></table></div></figure>
<p>
Another interesting property of Riak is its concept of links, which establish “one-way relationships between objects in Riak”. For instance, if we wanted to link similar posts, the following would do:
</p>
<figure class='code'><figcaption><span>Riak Links</span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='javascript'><span class='line'><span class="kd">var</span> <span class="nx">aNewPost</span> <span class="o">=</span> <span class="p">{</span><span class="nx">id</span><span class="o">:</span> <span class="mi">18</span><span class="p">,</span>
</span><span class='line'> <span class="nx">date</span><span class="o">:</span> <span class="k">new</span> <span class="nb">Date</span><span class="p">(),</span>
</span><span class='line'> <span class="nx">title</span><span class="o">:</span> <span class="s1">'a second blog post'</span><span class="p">,</span>
</span><span class='line'> <span class="nx">body</span><span class="o">:</span> <span class="s1">'blog post about Riak part 2'</span><span class="p">};</span>
</span><span class='line'>
</span><span class='line'><span class="c1">// Save the second post, with a link to the first post</span>
</span><span class='line'><span class="nx">db</span><span class="p">.</span><span class="nx">save</span><span class="p">(</span><span class="s1">'posts'</span><span class="p">,</span> <span class="nx">aNewPost</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span> <span class="nx">aNewPost</span><span class="p">,</span>
</span><span class='line'> <span class="p">{</span> <span class="nx">links</span><span class="o">:</span> <span class="p">[</span> <span class="p">{</span><span class="nx">bucket</span><span class="o">:</span> <span class="s1">'posts'</span><span class="p">,</span> <span class="s1">'key'</span><span class="o">:</span> <span class="mi">17</span> <span class="p">}</span> <span class="p">]});</span>
</span><span class='line'>
</span><span class='line'><span class="nx">db</span><span class="p">.</span><span class="nx">walk</span><span class="p">(</span><span class="s1">'posts'</span><span class="p">,</span> <span class="s1">'18'</span><span class="p">,</span> <span class="p">[{</span><span class="nx">bucket</span><span class="o">:</span><span class="s1">'posts'</span><span class="p">,</span><span class="nx">tag</span><span class="o">:</span><span class="s1">'_'</span><span class="p">}]);</span>
</span><span class='line'><span class="c1">// db.walk, traverses object 18's links, </span>
</span><span class='line'><span class="c1">// which happens to be post 17, and returns them.</span>
</span></code></pre></td></tr></table></div></figure>
<h2>Conclusions</h2>
<p>
While the above assessment is pretty rosy, Riak shouldn’t be the only database in your toolbox. Redis’ pub/sub and sorted sets are unmatched in Riak. If you’re running <a href="http://labs.linkfluence.net/nosql/2011/03/07/moving_from_couchdb_to_riak.html">map/reduce over large datasets, you’re likely better off using Hadoop</a>. Conversely, if you’re already well-versed in SQL and your data is relational, PostgreSQL is probably a better fit. Despite those caveats, being able to easily scale your database and save JavaScript objects is a pretty compelling reason to use Riak with Node.js.
</p>
<h2>Further Riak and Node.js Reading</h2>
<ul>
<li><a href="http://www.slideshare.net/seancribbs/riak-with-nodejs">Riak with Node.js</a></li>
<li><a href="http://blogs.digitar.com/jjww/2011/03/riak-vs-couchdb-for-storing-100000-coupons/">Riak vs Couchdb for storing 100000 Coupons</a></li>
<li><a href="http://labs.linkfluence.net/nosql/2011/03/07/moving_from_couchdb_to_riak.html">Moving from Couchdb to Riak</a></li>
<li><a href="http://siculars.posterous.com/paginating-with-riak">Pagination with Riak</a></li>
</ul></div>
</article>
<article>
<header>
<h1 class="entry-title"><a href="/blog/2012/02/25/adding_real-time_to_rails_with_socket.IO_nodejs_and_backbonejs_with_demo/">Adding Real-Time to Rails With Socket.IO, Node.js and Backbone.js (With Demo)</a></h1>
<p class="meta">
<time datetime="2012-02-25T00:00:00-05:00" pubdate data-updated="true">Feb 25<span>th</span>, 2012</time>
</p>
</header>
<div class="entry-content"><p>
<a href="http://node-chatty.herokuapp.com/chatty" target="_blank"><img src="/images/chatty-screen.png"></a>
</p>
<p>
Despite the <a href="http://gilesbowkett.blogspot.in/2012/02/rails-went-off-rails-why-im-rebuilding.html">recent distaste for Rails</a>, I still think it’s a nice framework for developing websites (e.g. Devise and Active Record). However, if you want real-time communication, Socket.IO and Node.js seem to be the best options. If you already have an existing Rails application, porting the entire application to Node.js is likely not an option. Fortunately, it is relatively easy to use Rails to serve your client-side Socket.IO web application, while Node.js and Socket.IO are used for real-time communication. The primary goal of this article is to show one method of integrating a real-time application that is slightly more complex than a todo app with Rails. Thus, I created Chatty, a simple chat room web application that allows a user to see all the messages in the chat room, or filter the messages by user. <a href="http://twitter.github.com/bootstrap/index.html">Twitter’s Bootstrap</a> was used for the CSS and modal dialogue.
</p>
<p>
<a class="github" href="https://github.com/liamks/Chatty"><span></span>Code on Github</a>
</p>
<p>
Rather than explain the code step-by-step, I’ll provide a high-level overview of:
</p>
<ul>
<li>File organization</li>
<li>JavaScript Templates and EJS</li>
<li>Application Architecture and Publish/Subscribe</li>
<li>Module Architecture</li>
<li>Deploying to Heroku</li>
</ul>
<h2>File Organization</h2>
<p>
The entire client-side Backbone.js application is within <code>app/assets/javascripts</code>. Using a JavaScript manifest file (<code>backboneApp.js</code>) all of the application’s JavaScript files are specified.
</p>
<figure class='code'><figcaption><span>Manifest file (app/assets/javascripts/backboneApp.js)</span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='text'><span class='line'>//= require jquery
</span><span class='line'>//= require bootstrap
</span><span class='line'>//= require underscore
</span><span class='line'>//= require backbone
</span><span class='line'>//= require socket.io
</span><span class='line'>//= require app
</span></code></pre></td></tr></table></div></figure>
<p>
The Backbone application is within the <code>app</code> folder, which also has a manifest file. The manifest files describe all the JavaScript files that comprise the application. Within the application’s html file only a single line of code is needed to include the manifest file: <code>=javascript_include_tag "backboneApp"</code> (haml for templating). The actual organization of the files is as follows:
</p>
<figure class='code'><figcaption><span>app/assets</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>