<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[Mark S. Rasmussen]]></title>
<link href="/atom.xml" rel="self"/>
<link href="http://improve.dk/"/>
<updated>2014-08-25T07:22:59.326Z</updated>
<id>http://improve.dk/</id>
<author>
<name><![CDATA[Mark S. Rasmussen]]></name>
<email><![CDATA[mark@improve.dk]]></email>
</author>
<generator uri="http://zespia.tw/hexo/">Hexo</generator>
<entry>
<title><![CDATA[New Horizons]]></title>
<link href="http://improve.dk/new-horizons/"/>
<id>http://improve.dk/new-horizons/</id>
<published>2014-08-25T00:00:00.000Z</published>
<updated>2014-08-25T07:22:22.000Z</updated>
<content type="html"><![CDATA[<p>Over the last few years I’ve been doing less and less coding while doing more and more management. As of today I’m taking it a step further as I’m assuming the role of CEO at <a href="http://www.ipaper.dk">iPaper A/S</a>.</p>
<a id="more"></a>
<p><a href="http://www.ipaper-cms.com/pages/ipaper-as-appoints-new-ceo/" target="_blank">Official press release</a></p>
<p>While this does mean less coding at work, I won’t be going full manager. I’ll still be reverse engineering databases in my spare time, I’ll still be debugging interesting problems, just as I’ll still continue contributing to open source.</p>
<h2 id="Me,_Myself_&_SQL_Server">Me, Myself & SQL Server</h2>
<p>Though I won’t be giving up coding, this will have an impact on the amount of time I can spend on extracurricular activities, such as presenting. Unfortunately this also means I won’t be able to participate in either the MVP or the SQL PASS summits this year. I had really looked forward to joining the MVP summit for the first time, but unfortunately I will have to prioritize my time differently for now.</p>
<p>This is <strong>not</strong> a goodbye to the family, simply an explanation for why I won’t be seeing you this November. </p>
]]></content>
<summary type="html"><![CDATA[<p>Over the last few years I’ve been doing less and less coding while doing more and more management. As of today I’m taking it a step further as I’m assuming the role of CEO at <a href="http://www.ipaper.dk">iPaper A/S</a>.</p>
]]></summary>
<category term="Misc" scheme="http://improve.dk/category/Misc/"/>
</entry>
<entry>
<title><![CDATA[Presenting at Microsoft DevCon 2014]]></title>
<link href="http://improve.dk/presenting-at-microsoft-devcon-2014/"/>
<id>http://improve.dk/presenting-at-microsoft-devcon-2014/</id>
<published>2014-05-18T13:27:00.000Z</published>
<updated>2014-05-19T07:07:16.000Z</updated>
<content type="html"><![CDATA[<p>I’m happy to announce that I’ll be presenting at <a href="http://www.msdevcon.ru/en/" target="_blank">Microsoft DevCon 2014</a> in Russia!</p>
<a id="more"></a>
<div class="imgwrapper" style=""><div><a href="/presenting-at-microsoft-devcon-2014/devcon.png" class="fancy"><img src="/presenting-at-microsoft-devcon-2014/devcon.png" style="max-height: 250px"/></a></div></div>
<p>While visiting Russia will be a first-time experience for me, the topic I’m presenting on is not. In just 30 minutes I will try to give an overview of not only how SQL Server stores data internally, but also how it keeps track of that data.</p>
<h2 id="Full_Abstract">Full Abstract</h2>
<blockquote>
<p><strong>Understanding SQL Server Data Files at the Byte Level</strong></p>
<p><em>Think SQL Server is magical? You’re right! However, there’s some sense to the magic, and that’s what I’ll show you in this level 500 deep dive session. I will walk you through the internal storage format of MDF files, how we might go about parsing a complete database ourselves, using nothing but a hex editor. I will cover how SQL Server stores its own internal metadata about objects, how it knows where to find your data on disk, and once it finds it, how to read it.</em></p>
</blockquote>
]]></content>
<summary type="html"><![CDATA[<p>I’m happy to announce that I’ll be presenting at <a href="http://www.msdevcon.ru/en/" target="_blank">Microsoft DevCon 2014</a> in Russia!</p>
]]></summary>
<category term="Conferences and Presenting" scheme="http://improve.dk/category/Conferences%20and%20Presenting/"/>
</entry>
<entry>
<title><![CDATA[Redirecting Old Permalinks on Statically Generated Blogs]]></title>
<link href="http://improve.dk/redirecting-old-permalinks-on-statically-generated-blogs/"/>
<id>http://improve.dk/redirecting-old-permalinks-on-statically-generated-blogs/</id>
<published>2014-04-20T23:56:12.000Z</published>
<updated>2014-05-04T16:12:23.000Z</updated>
<content type="html"><![CDATA[<p>Having <a href="/migrating-from-wordpress-to-hexo/">just migrated from Wordpress to Hexo</a>, I quickly realized I forgot something. I forgot to redirect my old permalinks to the new ones…</p>
<h2 id="Permalinks_Aren’t_Necessarily_Permanent">Permalinks Aren’t Necessarily Permanent</h2>
<p>A permalink ought to live for the duration of your content, and most importantly, never change. However, having been through a number of different blog engines, I’ve found that not all of them support the same permalink structures, and some might not even support redirecting old ones. As such, through the years my posts have ended up with multiple permalinks:</p>
<ul>
<li><a href="http://improve.dk/archive/2008/03/23/sql-server-mirroring-a-practical-approach.aspx">http://improve.dk/archive/2008/03/23/sql-server-mirroring-a-practical-approach.aspx</a></li>
<li><a href="http://improve.dk/blog/2008/03/23/sql-server-mirroring-a-practical-approach/">http://improve.dk/blog/2008/03/23/sql-server-mirroring-a-practical-approach/</a></li>
<li><a href="http://improve.dk/sql-server-mirroring-a-practical-approach/">http://improve.dk/sql-server-mirroring-a-practical-approach/</a></li>
</ul>
<p>As you can see, I’ve dropped both the /archive/ and the /blog/ prefixes, as well as the dates. Redirecting old incoming links to the new ones was easy enough when I ran Wordpress on Apache. All it required was a couple of lines in the .htaccess file:</p>
<figure class="highlight"><pre><span class="comment"># Redirect old permalink structure</span>
<span class="tag"><IfModule mod_rewrite.c></span>
<span class="keyword"><span class="common">RewriteEngine</span></span> <span class="literal">On</span>
<span class="keyword"><span class="common">RewriteRule</span></span> ^archive/([0-9]{4})/([0-9]{2})/([0-9]{2})/([^\.]+)\.aspx$ http://improve.dk/<span class="number">$4</span>/<span class="sqbracket"> [NC,R=301,L]</span>
<span class="keyword"><span class="common">RewriteRule</span></span> ^blog/([0-9]{4})/([0-9]{2})/([0-9]{2})/([^\.]+)$ http://improve.dk/<span class="number">$4</span>/<span class="sqbracket"> [NC,R=301,L]</span>
<span class="tag"></IfModule></span>
</pre></figure>
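<p>To make the two rules concrete, here is a minimal JavaScript sketch of the same mapping. The function name <code>rewriteOldPermalink</code> and the constant <code>OLD_PERMALINK_PATTERNS</code> are hypothetical and only mirror the patterns above; this is an illustration of what the rules match, not how Apache implements them.</p>

```javascript
// Hypothetical helper mirroring the two RewriteRule patterns above.
// The "i" flag plays the role of the [NC] (no case) flag.
const OLD_PERMALINK_PATTERNS = [
  /^archive\/([0-9]{4})\/([0-9]{2})\/([0-9]{2})\/([^.]+)\.aspx$/i,
  /^blog\/([0-9]{4})\/([0-9]{2})\/([0-9]{2})\/([^.]+)$/i
];

// Returns the new permalink for an old-style path, or null if no rule applies.
function rewriteOldPermalink(path) {
  // In .htaccess context the pattern is matched without the leading slash
  const relative = path.replace(/^\//, '');

  for (const pattern of OLD_PERMALINK_PATTERNS) {
    const match = pattern.exec(relative);
    if (match) {
      // Only the fourth capture group (the slug) survives, like $4 in the rules
      return 'http://improve.dk/' + match[4] + '/';
    }
  }

  return null;
}
```

<p>The year, month and day groups are captured but never used, which is why the dates disappear from the resulting URL.</p>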
<h2 id="Static_Woes">Static Woes</h2>
<p>Since I’ve migrated to <a href="http://hexo.io" target="_blank">Hexo</a> it’s not as simple, unfortunately. I no longer host my site on Apache, but on <a href="https://pages.github.com/" target="_blank">GitHub Pages</a>. GitHub Pages only allows static files to be served, so I’m no longer able to utilize the .htaccess rewriting rules. There’s also no server-side functionality available, so I can’t even manually send out a 301 redirect, which is what I’d need to preserve the SEO value of my incoming links.</p>
<p>What I ended up doing was to write a small script that would parse my Wordpress backup file and then recreate the /blog/ and /archive/ directories, as if the posts were actually stored there:</p>
<figure class="highlight cs"><pre><span class="keyword">string</span> template = <span class="string">@"layout: false
---
<!DOCTYPE html>
<html>
<head>
<title>Redirecting to [Title]</title>
<link rel=""canonical"" href=""[Permalink]""/>
<meta http-equiv=""content-type"" content=""text/html; charset=utf-8"" />
<meta http-equiv=""refresh"" content=""0;url=[Permalink]"" />
</head>
<body>
Redirecting to <a href=""[Permalink]"">[Title]</a>...
</body>
</html>"</span>;
<span class="keyword">void</span> Main()
{
<span class="keyword">var</span> outputPath = <span class="string">@"D:\Projects\improve.dk (GIT)\source\"</span>;
<span class="keyword">var</span> xmlPath = <span class="string">@"D:\Projects\improve.dk (GIT)\marksrasmussen-blog.wordpress.2014-03-08.xml"</span>;
<span class="keyword">var</span> xml = File.ReadAllText(xmlPath);
<span class="keyword">var</span> xd = <span class="keyword">new</span> XmlDocument();
xd.LoadXml(xml);
<span class="keyword">var</span> nsmgr = <span class="keyword">new</span> XmlNamespaceManager(xd.NameTable);
nsmgr.AddNamespace(<span class="string">"content"</span>, <span class="string">"http://purl.org/rss/1.0/modules/content/"</span>);
nsmgr.AddNamespace(<span class="string">"wp"</span>, <span class="string">"http://wordpress.org/export/1.2/"</span>);
<span class="keyword">foreach</span> (XmlNode item <span class="keyword">in</span> xd.SelectNodes(<span class="string">"//item"</span>))
{
<span class="keyword">var</span> title = item.SelectSingleNode(<span class="string">"title"</span>).InnerText;
<span class="keyword">var</span> date = Convert.ToDateTime(item.SelectSingleNode(<span class="string">"pubDate"</span>).InnerText);
<span class="keyword">var</span> slug = item.SelectSingleNode(<span class="string">"wp:post_name"</span>, nsmgr).InnerText;
<span class="keyword">var</span> indexHtml = template
.Replace(<span class="string">"[Permalink]"</span>, <span class="string">"http://improve.dk/"</span> + slug + <span class="string">"/"</span>)
.Replace(<span class="string">"[Title]"</span>, HttpUtility.HtmlEncode(title));
<span class="comment">// First create the /archive/ entry</span>
<span class="keyword">var</span> outputFolder = Path.Combine(outputPath, <span class="string">"archive"</span>, date.Year.ToString(), date.Month.ToString().PadLeft(<span class="number">2</span>, <span class="string">'0'</span>), date.Day.ToString().PadLeft(<span class="number">2</span>, <span class="string">'0'</span>), slug + <span class="string">".aspx"</span>);
<span class="keyword">var</span> indexPath = Path.Combine(outputFolder, <span class="string">"index.html"</span>);
Directory.CreateDirectory(outputFolder);
File.WriteAllText(indexPath, indexHtml);
<span class="comment">// Then the /blog/ entry</span>
outputFolder = Path.Combine(outputPath, <span class="string">"blog"</span>, date.Year.ToString(), date.Month.ToString().PadLeft(<span class="number">2</span>, <span class="string">'0'</span>), date.Day.ToString().PadLeft(<span class="number">2</span>, <span class="string">'0'</span>), slug);
indexPath = Path.Combine(outputFolder, <span class="string">"index.html"</span>);
Directory.CreateDirectory(outputFolder);
File.WriteAllText(indexPath, indexHtml);
}
}
</pre></figure>
<p>Now each post is stored directly in the root, while placeholders have been put in place in the old /blog/ and /archive/ directories. The placeholder code is very simple:</p>
<figure class="highlight"><pre>layout: false
---
<span class="doctype"><!DOCTYPE html></span>
<span class="tag"><<span class="title">html</span>></span>
<span class="tag"><<span class="title">head</span>></span>
<span class="tag"><<span class="title">title</span>></span>Redirecting to TxF presentation materials<span class="tag"></<span class="title">title</span>></span>
<span class="tag"><<span class="title">link</span> <span class="attribute">rel</span>=<span class="value">"canonical"</span> <span class="attribute">href</span>=<span class="value">"http://improve.dk/txf-presentation-materials/"</span>/></span>
<span class="tag"><<span class="title">meta</span> <span class="attribute">http-equiv</span>=<span class="value">"content-type"</span> <span class="attribute">content</span>=<span class="value">"text/html; charset=utf-8"</span> /></span>
<span class="tag"><<span class="title">meta</span> <span class="attribute">http-equiv</span>=<span class="value">"refresh"</span> <span class="attribute">content</span>=<span class="value">"0;url=http://improve.dk/txf-presentation-materials/"</span> /></span>
<span class="tag"></<span class="title">head</span>></span>
<span class="tag"><<span class="title">body</span>></span>
Redirecting to <span class="tag"><<span class="title">a</span> <span class="attribute">href</span>=<span class="value">"http://improve.dk/txf-presentation-materials/"</span>></span>TxF presentation materials<span class="tag"></<span class="title">a</span>></span>...
<span class="tag"></<span class="title">body</span>></span>
<span class="tag"></<span class="title">html</span>></span>
</pre></figure>
<p>It’s simply a small page containing a meta refresh tag that sends the user on to the new URL. By adding the <code>rel="canonical"</code> link tag, I signal to search engines that the new URL is the authoritative one, which should retain the SEO value much like a 301 redirect would.</p>
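<p>For readers who prefer Node.js over C#, the placeholder generation can be sketched roughly as follows. The function names <code>placeholderPaths</code> and <code>buildRedirectPage</code> are hypothetical, the title is assumed to be HTML-encoded already, and dates are treated as UTC, which is my assumption rather than something the original script does.</p>

```javascript
// Hypothetical sketch of where the placeholder files end up for one post.
// Returns the two index.html paths, relative to the Hexo source folder.
function placeholderPaths(date, slug) {
  const pad = function (n) { return String(n).padStart(2, '0'); };
  const datePart = [
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(date.getUTCDate())
  ].join('/');

  return [
    // The old ".aspx" URL becomes a folder containing an index.html
    'archive/' + datePart + '/' + slug + '.aspx/index.html',
    'blog/' + datePart + '/' + slug + '/index.html'
  ];
}

// Builds the placeholder page; title is assumed to be HTML-encoded already.
function buildRedirectPage(title, slug) {
  const permalink = 'http://improve.dk/' + slug + '/';

  return [
    'layout: false',
    '---',
    '<!DOCTYPE html>',
    '<html>',
    '<head>',
    '<title>Redirecting to ' + title + '</title>',
    '<link rel="canonical" href="' + permalink + '"/>',
    '<meta http-equiv="content-type" content="text/html; charset=utf-8" />',
    '<meta http-equiv="refresh" content="0;url=' + permalink + '" />',
    '</head>',
    '<body>',
    'Redirecting to <a href="' + permalink + '">' + title + '</a>...',
    '</body>',
    '</html>'
  ].join('\n');
}
```

<p>Writing each page to both of its paths reproduces the old URL structure as plain folders, which is all a static host needs in order to serve the redirect.</p>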
<h2 id="Going_Forward">Going Forward</h2>
<p>Creating the placeholder files is a one-off task, seeing as I’ll only ever need to redirect posts that precede the time when I changed my URL structure to contain neither the /blog/ and /archive/ prefixes, nor the dates. All posts from the beginning of 2013 were published using the current URL scheme, which I intend to keep for the foreseeable future.</p>
]]></content>
<category term="Miscellaneous" scheme="http://improve.dk/category/Miscellaneous/"/>
</entry>
<entry>
<title><![CDATA[Migrating from Wordpress to Hexo]]></title>
<link href="http://improve.dk/migrating-from-wordpress-to-hexo/"/>
<id>http://improve.dk/migrating-from-wordpress-to-hexo/</id>
<published>2014-04-19T00:00:00.000Z</published>
<updated>2014-04-22T10:39:11.000Z</updated>
<content type="html"><![CDATA[<p>It’s this time of the year again - the time to migrate from one blog engine to another.</p>
<a id="more"></a>
<p>About a year ago, I migrated from <a href="http://subtextproject.com/" target="_blank">Subtext</a> to <a href="http://wordpress.org/" target="_blank">Wordpress</a>. While I was initially happy, I still wasn’t completely satisfied with the workflow. My primary peeves were:</p>
<ul>
<li>Complexity - I had to pay a host to run a stack consisting of PHP and MySQL and keep it updated.</li>
<li>Security - I needed to constantly keep watch over Wordpress and keep it updated, seeing as it’s a popular target for mass defacements, etc.</li>
<li>Backups - While I did run an automated backup plugin, it was cumbersome as I needed an offsite location (I used FTP).</li>
<li>Writing - While the WYSIWYG editor works for some, it didn’t for me. As such I ended up writing all my posts in pure HTML.</li>
<li>Openness - I’m a big proponent of open source and while I did publish the source code for my <a href="https://github.com/improvedk/improve.dk_Wordpress" target="_blank">custom Wordpress theme</a>, I wanted to also open up my blog posts themselves.</li>
<li>Speed - I’ve spent more time than I’d like to, just keeping Wordpress running smoothly. A lot of things were outside of my control though, seeing as performance optimization was typically relegated to third party plugins.</li>
</ul>
<p>While considering the above list, I ended up settling on <a href="http://hexo.io" target="_blank">Hexo</a> - a static site generator powered by <a href="http://nodejs.org" target="_blank">Node.js</a>.</p>
<h2 id="Migration">Migration</h2>
<p>The migration process was simple enough, though it required some manual labor. All my Wordpress posts are written in HTML and since Hexo posts are based on Markdown, they needed to be converted. After dumping my old Wordpress site into a backup XML file, I was able to <a href="https://github.com/improvedk/improve.dk/blob/master/WP%20Conversion.linq" target="_blank">write a script</a> that parsed the backup XML file and converted each post into the Hexo Markdown format. There were some misses that required manual intervention, seeing as I had invalid HTML, special cases, etc. But overall, 95% of the posts were converted automatically.</p>
<p>Since Hexo is a static site generator, I needed to host my comments offsite. Thankfully <a href="http://disqus.com/" target="_blank">Disqus</a> has native support for the Wordpress comment backup format so importing the comments was a breeze.</p>
<p>Hexo does not support storing a post and its assets together in one folder, but prefers to store posts and assets separately. As I like to keep them together (seeing as I’ve got close to 300 posts), I had to write a small script that copied the assets into the right output locations:</p>
<figure class="highlight js"><pre><span class="keyword">var</span> fs = <span class="built_in">require</span>(<span class="string">'fs'</span>);
<span class="keyword">var</span> path = <span class="built_in">require</span>(<span class="string">'path'</span>);
<span class="keyword">var</span> publicDir = hexo.public_dir;
<span class="keyword">var</span> sourceDir = hexo.source_dir;
<span class="keyword">var</span> postsDir = path.join(sourceDir, <span class="string">'_posts'</span>);
<span class="keyword">var</span> htmlTag = hexo.util.html_tag;
<span class="keyword">var</span> route = hexo.route;
<span class="comment">// Stores assets that'll need to be copied to the post output folders</span>
<span class="keyword">var</span> filesToCopy = [];
<span class="comment">// After Hexo's done generating, we'll copy post assets to their public folders</span>
hexo.on(<span class="string">'generateAfter'</span>, <span class="function"><span class="keyword">function</span><span class="params">()</span> {</span>
filesToCopy.forEach(<span class="function"><span class="keyword">function</span><span class="params">(obj)</span> {</span>
fs.writeFileSync(obj.destination, fs.readFileSync(obj.source));
});
});
<span class="comment">// Each time a post is rendered, note that we need to copy its assets</span>
hexo.extend.filter.register(<span class="string">'post'</span>, <span class="function"><span class="keyword">function</span><span class="params">(data, cb)</span> {</span>
<span class="keyword">if</span> (data.slug) {
<span class="keyword">var</span> postDir = path.join(postsDir, data.slug);
<span class="keyword">var</span> files = fs.readdirSync(postDir);
files.forEach(<span class="function"><span class="keyword">function</span><span class="params">(file)</span> {</span>
<span class="comment">// Skip the markdown files themselves</span>
<span class="keyword">if</span> (path.extname(file) == <span class="string">'.md'</span>)
<span class="keyword">return</span>;
<span class="keyword">var</span> outputDir = path.join(publicDir, data.slug);
<span class="keyword">var</span> outputPath = path.join(publicDir, data.slug, file);
<span class="keyword">var</span> inputPath = path.join(postDir, file);
<span class="keyword">if</span> (!fs.existsSync(outputDir))
fs.mkdirSync(path.join(outputDir));
filesToCopy.push({ source: inputPath, destination: outputPath });
});
}
cb();
});
</pre></figure>
<p>Though Hexo has a number of helpers to easily insert image links, I prefer to be able to just write an image name on a line by itself and then have the asset link inserted. Enabling that was easy enough too:</p>
<figure class="highlight js"><pre><span class="comment">// Replaces lines with image names with the actual image markup</span>
hexo.extend.filter.register(<span class="string">'pre'</span>, <span class="function"><span class="keyword">function</span><span class="params">(data, cb)</span> {</span>
<span class="comment">// Find all matching image tags</span>
<span class="keyword">var</span> regex = <span class="keyword">new</span> <span class="built_in">RegExp</span>(<span class="regexp">/^([a-z_0-9\-\.]+\.(?:jpg|png|gif))(?: ([a-z]+)( \d+)?)?$/gim</span>);
data.content = data.content.replace(regex, <span class="function"><span class="keyword">function</span><span class="params">(match, file, type, maxHeight)</span> {</span>
<span class="comment">// Create image link</span>
<span class="keyword">var</span> imgLink;
<span class="keyword">if</span> (data.slug) <span class="comment">// Posts need to reference image absolutely</span>
imgLink = <span class="string">'/'</span> + data.slug + <span class="string">'/'</span> + file;
<span class="keyword">else</span>
imgLink = file;
<span class="comment">// Max height of image</span>
<span class="keyword">var</span> imgMaxHeight = <span class="string">'250px'</span>;
<span class="keyword">if</span> (maxHeight)
imgMaxHeight = maxHeight + <span class="string">'px'</span>;
<span class="comment">// Set style depending on type</span>
<span class="keyword">var</span> style = <span class="string">''</span>;
<span class="keyword">if</span> (type) {
<span class="keyword">switch</span> (type) {
<span class="keyword">case</span> <span class="string">'right'</span>:
style = <span class="string">'float: right; margin: 20px'</span>;
<span class="keyword">break</span>;
<span class="keyword">case</span> <span class="string">'left'</span>:
style = <span class="string">'float: left'</span>;
<span class="keyword">break</span>;
}
}
<span class="keyword">return</span> <span class="string">'<div class="imgwrapper" style="'</span> + style + <span class="string">'"><div><a href="'</span> + imgLink + <span class="string">'" class="fancy"><img src="'</span> + imgLink + <span class="string">'" style="max-height: '</span> + imgMaxHeight + <span class="string">'"/></a></div></div>'</span>;
});
<span class="comment">// Let hexo continue</span>
cb();
});
</pre></figure>
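<p>To show which lines the filter picks up, here is a small standalone sketch of the matching step. It uses a variation of the pattern with the extension dot escaped and the height captured without its leading space; both tweaks, as well as the names <code>IMAGE_LINE</code> and <code>parseImageLine</code>, are my assumptions and not part of Hexo or the script above.</p>

```javascript
// Variation of the pattern above: "file.ext", optionally followed by a
// type ("left"/"right") and a max height in pixels.
const IMAGE_LINE = /^([a-z_0-9\-.]+\.(?:jpg|png|gif))(?: ([a-z]+)(?: (\d+))?)?$/i;

// Parses a single line; returns null when the line is not an image reference.
function parseImageLine(line) {
  const match = IMAGE_LINE.exec(line);
  if (!match)
    return null;

  return {
    file: match[1],
    type: match[2] || null,
    // Falls back to the same 250px default as the filter above
    maxHeight: (match[3] || '250') + 'px'
  };
}
```

<p>A line such as <code>A9.png right 300</code> would yield the file name, the <code>right</code> type and a max height of <code>300px</code>, while an ordinary sentence returns <code>null</code> and is left untouched.</p>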
<h2 id="Hosting,_Security,_Backup_&_Speed">Hosting, Security, Backup & Speed</h2>
<p>Due to its static nature, there are no logins to protect, per se - seeing as there’s no backend. The blog itself is hosted on GitHub, both the <a href="https://github.com/improvedk/improve.dk" target="_blank">source</a> as well as the statically generated <a href="https://github.com/improvedk/improvedk.github.io" target="_blank">output files</a>. This means I’ve got full backup in the form of distributed git repositories, as well as very easy rollback in case of mistakes.</p>
<p>As for speed, it doesn’t get much faster than serving static files. Comments are lazily loaded after the post itself is loaded. While I can’t utilize the GitHub CDN (seeing as I’m hosting the blog at an apex domain, making it impossible for me to set up a CNAME - which is required to use the GitHub CDN), the speed is way faster than it used to be on Wordpress. I could move my DNS to a registrar that supports apex aliasing, but I’m happy with the speed for now.</p>
<h2 id="Openness">Openness</h2>
<p>Finally, since the source for the blog itself is hosted on GitHub, including the posts themselves, each post is actually editable directly on GitHub. You’ll notice that I’ve added an Octocat link at the bottom of each post, next to the social sharing icons. Clicking the Octocat will lead you directly to the source of the post you’re looking at. If you find an error or have a suggestion for an edit, feel free to fork the repository and submit a pull request.</p>
]]></content>
<summary type="html"><![CDATA[<p>It’s this time of the year again - the time to migrate from one blog engine to another.</p>
]]></summary>
<category term="Miscellaneous" scheme="http://improve.dk/category/Miscellaneous/"/>
</entry>
<entry>
<title><![CDATA[Presenting at SQL Saturday 275]]></title>
<link href="http://improve.dk/presenting-sql-saturday-275/"/>
<id>http://improve.dk/presenting-sql-saturday-275/</id>
<published>2014-02-04T00:00:00.000Z</published>
<updated>2014-04-22T10:39:11.000Z</updated>
<content type="html"><![CDATA[<p>I’m happy to announce that I’ll be presenting at <a href="http://sqlsaturday.com/275/" target="_blank">SQLSaturday #275</a> in Copenhagen on March 29th!</p>
<a id="more"></a>
<p>I’ll be presenting my <strong>Recovering Data from Fatally Corrupt Databases</strong> session:</p>
<blockquote>
<p>Imagine the worst case scenario: Your database won’t come online. Lots of checksum errors logged. DBCC CheckDB won’t even run on the database. And worst of all - you have no backups! Now what do you do with this 20GB binary blob of an MDF file? In this demo-rich session I will briefly introduce the internals of MDF files while primarily concentrating on how to manually extract data from corrupt databases. I will be using the OrcaMDF RawDatabase framework to do most of the parsing, which will also be explained during the session.</p>
</blockquote>
<p>If you want to be able to <a href="/sql-server-corruption-recovery-when-all-else-fails/">save the day</a> when all other options are exhausted, you shouldn’t miss this session.</p>
]]></content>
<summary type="html"><![CDATA[<p>I’m happy to announce that I’ll be presenting at <a href="http://sqlsaturday.com/275/" target="_blank">SQLSaturday #275</a> in Copenhagen on March 29th!</p>
]]></summary>
<category term="SQL Server - Community" scheme="http://improve.dk/category/SQL%20Server%20-%20Community/"/>
<category term="Conferences and Presenting" scheme="http://improve.dk/category/Conferences%20and%20Presenting/"/>
<category term="SQL Server - OrcaMDF" scheme="http://improve.dk/category/SQL%20Server%20-%20OrcaMDF/"/>
</entry>
<entry>
<title><![CDATA[SQL Server Corruption Recovery - When All Else Fails]]></title>
<link href="http://improve.dk/sql-server-corruption-recovery-when-all-else-fails/"/>
<id>http://improve.dk/sql-server-corruption-recovery-when-all-else-fails/</id>
<published>2013-11-06T00:00:00.000Z</published>
<updated>2014-05-04T16:12:23.000Z</updated>
<content type="html"><![CDATA[<p>In this post I want to walk through a number of SQL Server corruption recovery techniques for when you’re out of luck, have no backups, and the usual methods don’t work. I’ll be using the <a href="http://msftdbprodsamples.codeplex.com/releases/view/93587" target="_blank">AdventureWorksLT2008R2 sample database</a> as my victim.</p>
<a id="more"></a>
<h2 id="A_Clean_Start">A Clean Start</h2>
<p>To start out, I’ve attached the downloaded database and it’s available on my SQL Server 2008 R2 instance, under the name of <strong>AWLT2008R2</strong>.</p>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/A9.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/A9.png" style="max-height: 250px"/></a></div></div>
<p>To ensure we’ve got a clean start, I’ll run DBCC CHECKDB with the DATA_PURITY flag set, just to make sure the database is OK.</p>
<figure class="highlight sql"><pre>DBCC CHECKDB (AWLT2008R2) WITH ALL_ERRORMSGS, DATA_PURITY
</pre></figure>
<figure class="highlight"><pre>DBCC results <span class="keyword">for</span> <span class="string">'AWLT2008R2'</span>.
Service Broker Msg <span class="number">9675</span>, State <span class="number">1</span>: Message Types analyzed: <span class="number">14.</span>
Service Broker Msg <span class="number">9676</span>, State <span class="number">1</span>: Service Contracts analyzed: <span class="number">6.</span>
Service Broker Msg <span class="number">9667</span>, State <span class="number">1</span>: Services analyzed: <span class="number">3.</span>
Service Broker Msg <span class="number">9668</span>, State <span class="number">1</span>: Service Queues analyzed: <span class="number">3.</span>
Service Broker Msg <span class="number">9669</span>, State <span class="number">1</span>: Conversation Endpoints analyzed: <span class="number">0.</span>
Service Broker Msg <span class="number">9674</span>, State <span class="number">1</span>: Conversation Groups analyzed: <span class="number">0.</span>
Service Broker Msg <span class="number">9670</span>, State <span class="number">1</span>: Remote Service Bindings analyzed: <span class="number">0.</span>
Service Broker Msg <span class="number">9605</span>, State <span class="number">1</span>: Conversation Priorities analyzed: <span class="number">0.</span>
DBCC results <span class="keyword">for</span> <span class="string">'sys.sysrscols'</span>.
There are <span class="number">805</span> rows <span class="keyword">in</span> <span class="number">9</span> pages <span class="keyword">for</span> object <span class="string">"sys.sysrscols"</span>.
DBCC results <span class="keyword">for</span> <span class="string">'sys.sysrowsets'</span>.
There are <span class="number">125</span> rows <span class="keyword">in</span> <span class="number">1</span> pages <span class="keyword">for</span> object <span class="string">"sys.sysrowsets"</span>.
DBCC results <span class="keyword">for</span> <span class="string">'SalesLT.ProductDescription'</span>.
There are <span class="number">762</span> rows <span class="keyword">in</span> <span class="number">18</span> pages <span class="keyword">for</span> object <span class="string">"SalesLT.ProductDescription"</span>.
<span class="keyword">...</span>
CHECKDB found <span class="number">0</span> allocation errors and <span class="number">0</span> consistency errors <span class="keyword">in</span> database <span class="string">'AWLT2008R2'</span>.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
</pre></figure>
<h2 id="Enter_Corruption">Enter Corruption</h2>
<p>As I don’t want to kill my disk drives just to introduce corruption, I’ll be using <a href="/corrupting-databases-purpose-using-orcamdf-corruptor/">OrcaMDF’s Corruptor class</a> instead. First up we need to shut down SQL Server:</p>
<figure class="highlight sql"><pre>SHUTDOWN WITH NOWAIT
</pre></figure>
<figure class="highlight"><pre><span class="built_in">Server</span> shut down by NOWAIT <span class="built_in">request</span> from login MSR\Mark S. Rasmussen.
SQL <span class="built_in">Server</span> <span class="keyword">is</span> terminating this process.
</pre></figure>
<p>With the instance shut down, I’ve located my MDF file, stored at <strong>D:\MSSQL Databases\AdventureWorksLT2008R2.mdf</strong>. Knowing the path to the MDF file, I’ll now intentionally corrupt 5% of the pages in the database (at a database size of 5,312KB this will end up corrupting 33 random pages, out of a total of 664 pages).</p>
<figure class="highlight cs"><pre>Corruptor.CorruptFile(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>, <span class="number">0.05</span>);
</pre></figure>
<p>At this point I have no idea which pages were actually corrupted; I just know that 33 random pages were overwritten with zeros.</p>
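<p>As a quick sanity check on those numbers, here’s a minimal sketch of the arithmetic. It assumes the standard 8KB page size and that the fractional page is simply dropped; how Corruptor itself rounds is an assumption here, not something taken from its source.</p>

```csharp
using System;

const int PageSizeBytes = 8192;        // SQL Server data pages are 8 KB
long fileSizeBytes = 5312L * 1024;     // 5,312 KB, the file size quoted above
double corruptionRatio = 0.05;         // same ratio as passed to CorruptFile

long totalPages = fileSizeBytes / PageSizeBytes;
// Assumption: the fractional page is dropped (33.2 becomes 33).
long pagesToCorrupt = (long)(totalPages * corruptionRatio);

Console.WriteLine($"Total pages: {totalPages}");          // Total pages: 664
Console.WriteLine($"Pages to corrupt: {pagesToCorrupt}"); // Pages to corrupt: 33
```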
<h2 id="Uh_Oh">Uh Oh</h2>
<p>After restarting the SQL Server instance and looking at the tree of databases, it’s obvious we’re in trouble…</p>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/A11.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/A11.png" style="max-height: 250px"/></a></div></div>
<p>Running DBCC CHECKDB doesn’t help much:</p>
<figure class="highlight sql"><pre>DBCC CHECKDB (AWLT2008R2) WITH ALL_ERRORMSGS, DATA_PURITY
</pre></figure>
<figure class="highlight"><pre>Msg <span class="number">926</span>, Level <span class="number">14</span>, State <span class="number">1</span>, Line <span class="number">1</span>
Database <span class="string">'AWLT2008R2'</span> cannot be opened. It has been marked SUSPECT <span class="keyword">by</span> recovery.
See <span class="operator">the</span> SQL Server errorlog <span class="keyword">for</span> more information.
</pre></figure>
<p>What does the errorlog say?</p>
<ul>
<li>Starting up database ‘AWLT2008R2’.</li>
<li>1 transactions rolled forward in database ‘AWLT2008R2’ (13). This is an informational message only. No user action is required.</li>
<li>Error: 824, Severity: 24, State: 2.</li>
<li><strong>SQL Server detected a logical consistency-based I/O error</strong>: incorrect pageid (expected 1:2; actual 0:0). It occurred during a read of page (1:2) in database ID 13 at offset 0x00000000004000 in file ‘D:\MSSQL Databases\AdventureWorksLT2008R2.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. <strong>This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB).</strong> This error can be caused by many factors; for more information, see SQL Server Books Online.</li>
<li>Error: 3414, Severity: 21, State: 1.</li>
<li><strong>An error occurred during recovery, preventing the database ‘AWLT2008R2’ (database ID 13) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.</strong></li>
<li>CHECKDB for database ‘AWLT2008R2’ finished without errors on 2013-11-05 20:02:07.810 (local time). This is an informational message only; no user action is required.</li>
<li>Recovery is complete. This is an informational message only. No user action is required.</li>
</ul>
<p>This is officially not good. Our database failed to recover and can’t be put online at the moment, due to I/O consistency errors. We’ve also got our first hint:</p>
<figure class="highlight"><pre><span class="tag">incorrect</span> <span class="tag">pageid</span> (<span class="tag">expected</span> 1<span class="pseudo">:2</span>; <span class="tag">actual</span> 0<span class="pseudo">:0)</span>
</pre></figure>
<p>What this tells us is that the header of page 2 has been overwritten by zeros since SQL Server expected to find the value 1:2, but found 0:0 instead. Page 2 is the first GAM page in the database and is an essential part of the metadata.</p>
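<p>The offset in the error message follows directly from the page ID, since pages are laid out back to back in the file. A minimal sketch, assuming 8KB pages (the <strong>PageOffset</strong> helper is just an illustrative name):</p>

```csharp
using System;

const int PageSizeBytes = 8192; // SQL Server data pages are 8 KB

// Byte offset of a page within its data file: page ID times page size.
long PageOffset(int pageId) => (long)pageId * PageSizeBytes;

Console.WriteLine($"Page 2 starts at 0x{PageOffset(2):X}");   // Page 2 starts at 0x4000
Console.WriteLine($"Page 16 starts at 0x{PageOffset(16):X}"); // Page 16 starts at 0x20000
```

<p>The first value matches the offset 0x00000000004000 that the errorlog reported for page (1:2).</p>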
<p>SQL Server also wisely told us to either fix the errors or <strong>restore from a known good backup</strong>. And this is why you should always have a recovery strategy. If you ever end up in a situation like this, without a backup, you’ll have to continue reading.</p>
<h2 id="DBCC_CHECKDB">DBCC CHECKDB</h2>
<p>SQL Server recommended that we run a <strong>full database consistency check</strong> using DBCC CHECKDB. Unfortunately, given the state of our database, DBCC CHECKDB is unable to run:</p>
<figure class="highlight sql"><pre>DBCC CHECKDB (AWLT2008R2) WITH ALL_ERRORMSGS, DATA_PURITY
</pre></figure>
<figure class="highlight"><pre>Msg <span class="number">926</span>, Level <span class="number">14</span>, State <span class="number">1</span>, Line <span class="number">1</span>
Database <span class="string">'AWLT2008R2'</span> cannot be opened. It has been marked SUSPECT <span class="keyword">by</span> recovery.
See <span class="operator">the</span> SQL Server errorlog <span class="keyword">for</span> more information.
</pre></figure>
<p>In some cases you may be able to force the database online, by putting it into <strong>EMERGENCY</strong> mode. If we could get the database into EMERGENCY mode, we might just be able to run DBCC CHECKDB.</p>
<figure class="highlight sql"><pre><span class="operator"><span class="keyword">ALTER</span> <span class="keyword">DATABASE</span> AWLT2008R2 <span class="keyword">SET</span> EMERGENCY</span>
</pre></figure>
<figure class="highlight"><pre>Msg <span class="number">824</span>, Level <span class="number">24</span>, State <span class="number">2</span>, Line <span class="number">1</span>
SQL Server detected <span class="operator">a</span> logical consistency-based I/O error: incorrect pageid
(expected <span class="number">1</span>:<span class="number">16</span>; actual <span class="number">0</span>:<span class="number">0</span>). It occurred during <span class="operator">a</span> <span class="built_in">read</span> <span class="operator">of</span> page (<span class="number">1</span>:<span class="number">16</span>) <span class="operator">in</span> database
ID <span class="number">13</span> <span class="keyword">at</span> <span class="built_in">offset</span> <span class="number">0x00000000020000</span> <span class="operator">in</span> <span class="built_in">file</span> <span class="string">'D:\MSSQL Databases\AdventureWorksLT2008R2.mdf'</span>.
Additional messages <span class="operator">in</span> <span class="operator">the</span> SQL Server error <span class="built_in">log</span> <span class="operator">or</span> <span class="keyword">system</span> event <span class="built_in">log</span> may provide more
detail. This is <span class="operator">a</span> severe error condition that threatens database integrity <span class="operator">and</span> must
be corrected immediately. Complete <span class="operator">a</span> full database consistency check (DBCC CHECKDB).
This error can be caused <span class="keyword">by</span> many factors; <span class="keyword">for</span> more information, see SQL Server
Books Online.
</pre></figure>
<p>Even worse, it seems that page 16 has also been hit by corruption. Page 16 is the root page of the sysallocunits base table, holding all of the allocation unit storage metadata. Without page 16 there is no way for SQL Server to access any of its metadata. In short, there’s no way we’re getting this database online!</p>
<h2 id="Enter_OrcaMDF">Enter OrcaMDF</h2>
<p>The OrcaMDF Database class won’t be able to open the database, seeing as it does not handle corruption very well. Even so, I want to try anyway; you never know. First off you’ll have to shut down SQL Server to release the locks on the corrupt MDF file.</p>
<figure class="highlight sql"><pre>SHUTDOWN WITH NOWAIT
</pre></figure>
<p>If you then try opening the database using the OrcaMDF Database class, you’ll get a result like this:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> Database(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture.png" style="max-height: 250px"/></a></div></div>
<p>Interestingly the Database class didn’t puke on the boot page (ID 9) itself, so we know that that one’s OK, at least. But as soon as it hit page 16, things started to fall apart - and we already knew page 16 was corrupt.</p>
<h3 id="RawDatabase">RawDatabase</h3>
<p>While the OrcaMDF <strong>Database</strong> class can’t read the database file either, <strong>RawDatabase</strong> can. RawDatabase doesn’t care about metadata, it doesn’t read anything but what you tell it to, and as a result of that, it’s much more resilient to corruption.</p>
<p>Given that we know the corruption has resulted in pages being zeroed out, we could easily gather a list of corrupted pages by just searching for pages whose logical page ID doesn’t match the one in the header:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
db.Pages
.Where(x => x.Header.PageID != x.PageID)
.Select(x => x.PageID)
.ToList()
.ForEach(Console.WriteLine);
</pre></figure>
<figure class="highlight"><pre><span class="number">2</span>
<span class="number">4</span>
<span class="number">5</span>
<span class="number">16</span>
<span class="number">55</span>
<span class="keyword">...</span>
<span class="number">639</span>
<span class="number">649</span>
<span class="number">651</span>
<span class="number">662</span>
<span class="number">663</span>
</pre></figure>
<p>This only works because we know the corruption zeroed out whole pages, so you’ll rarely be this lucky. Still, if you can identify the exact effect of the corruption, you may be able to pinpoint the corrupted pages just like we did here. On its own, though, this doesn’t help us much - all we have now is a list of page IDs that are of no use to us yet.</p>
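<p>If you don’t have OrcaMDF at hand, the same idea can be sketched against the raw bytes. The example below is not how RawDatabase does it; it simply assumes 8KB pages and treats any page consisting solely of zero bytes as corrupted, using a small synthetic buffer instead of a real MDF file (<strong>FindZeroedPages</strong> is a hypothetical helper):</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int PageSizeBytes = 8192;

// Returns the IDs of all pages in the buffer that consist solely of zero bytes.
List<int> FindZeroedPages(byte[] file)
{
    var zeroed = new List<int>();
    for (int pageId = 0; pageId < file.Length / PageSizeBytes; pageId++)
    {
        var page = new ArraySegment<byte>(file, pageId * PageSizeBytes, PageSizeBytes);
        if (page.All(b => b == 0))
            zeroed.Add(pageId);
    }
    return zeroed;
}

// Synthetic four-page "file": fill everything with 0xFF, then zero out pages 1 and 3.
var data = Enumerable.Repeat((byte)0xFF, 4 * PageSizeBytes).ToArray();
Array.Clear(data, 1 * PageSizeBytes, PageSizeBytes);
Array.Clear(data, 3 * PageSizeBytes, PageSizeBytes);

var zeroedPages = FindZeroedPages(data);
Console.WriteLine(string.Join(", ", zeroedPages)); // 1, 3
```

<p>For a real file, the buffer could come from File.ReadAllBytes once SQL Server has released its lock on the MDF file.</p>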
<h3 id="Getting_a_List_of_Objects">Getting a List of Objects</h3>
<p>For this next part we’ll need a working database, any database, on an instance running the same version as our corrupted database. This could be the master database - literally any working database. First you’ll want to connect to the database using the <a href="http://technet.microsoft.com/en-us/library/ms178068(v=sql.105).aspx" target="_blank">Dedicated Administrator Connection</a>. Connecting through the DAC allows us to query the base tables of the database.</p>
<p>The base table beneath sys.tables is called <strong>sys.sysschobjs</strong>, and if we can get to that, we can get a list of all the objects in the database, which might be a good start. Having connected to the working database, we can get the sys.sysschobjs details like so:</p>
<figure class="highlight sql"><pre><span class="operator"><span class="keyword">SELECT</span> * <span class="keyword">FROM</span> sys.sysschobjs <span class="keyword">WHERE</span> name = <span class="string">'sysschobjs'</span></span>
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture1.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture1.png" style="max-height: 250px"/></a></div></div>
<p>The only thing I’m looking for here is the object id, provided by the <strong>id</strong> column. In contrast to all user tables, the system tables have their actual object id stored in the page header, which allows us to easily query for pages by their id. Knowing sys.sysschobjs has ID <strong>34</strong>, let’s see if we can get a list of all the pages belonging to it (note that the .Dump() method is native to <a href="http://www.linqpad.net/" target="_blank">LinqPad</a> - all it does is to output the resulting objects as a table):</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
db.Pages
.Where(x => x.Header.ObjectID == <span class="number">34</span>)
.Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture2.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture2.png" style="max-height: 250px"/></a></div></div>
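<p>To make the idea of “an object ID stored in the page header” a bit more concrete, here’s a rough sketch of reading a few header fields straight from a page’s bytes. The byte offsets are an assumption based on the commonly documented page header layout, not something shown in this post, and the page itself is synthetic:</p>

```csharp
using System;

// Assumed offsets within the 96-byte page header. They match the commonly documented
// SQL Server page layout, but they are not taken from the article itself.
const int TypeOffset = 1;       // page type (1 = data page)
const int ObjectIdOffset = 24;  // m_objId
const int PageIdOffset = 32;    // m_pageId, page part (the file ID follows at byte 36)

byte ReadPageType(byte[] bytes) => bytes[TypeOffset];
int ReadObjectId(byte[] bytes) => BitConverter.ToInt32(bytes, ObjectIdOffset);
int ReadPageId(byte[] bytes) => BitConverter.ToInt32(bytes, PageIdOffset);

// Synthetic 8 KB page claiming to be data page 20 of object 34 (sys.sysschobjs).
// BitConverter uses the machine's byte order, which is little-endian on x86/x64.
var samplePage = new byte[8192];
samplePage[TypeOffset] = 1;
BitConverter.GetBytes(34).CopyTo(samplePage, ObjectIdOffset);
BitConverter.GetBytes(20).CopyTo(samplePage, PageIdOffset);

Console.WriteLine($"Page {ReadPageId(samplePage)}: type {ReadPageType(samplePage)}, object {ReadObjectId(samplePage)}");
```

<p>Filtering on the object ID field is then just a matter of running a check like this over every 8KB chunk in the file.</p>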
<p>Now that we have a list of pages belonging to the sys.sysschobjs table, we need to retrieve the actual rows from there. Using <strong>sp_help</strong> on the working database, we can see the underlying schema of sys.sysschobjs:</p>
<figure class="highlight sql"><pre>sp_help 'sys.sysschobjs'
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture3.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture3.png" style="max-height: 250px"/></a></div></div>
<p>Once we have the schema of sys.sysschobjs, we can make RawDatabase parse the actual rows for us, after which we can filter it down to just the user tables, seeing as we don’t care about procedures, views, indexes and so forth:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
<span class="keyword">var</span> pages = db.Pages.Where(x => x.Header.ObjectID == <span class="number">34</span> && x.Header.Type == PageType.Data);
<span class="keyword">var</span> records = pages.SelectMany(x => x.Records).Select(x => (RawPrimaryRecord)x);
<span class="keyword">var</span> rows = RawColumnParser.Parse(records, <span class="keyword">new</span> IRawType[] {
RawType.Int(<span class="string">"id"</span>),
RawType.NVarchar(<span class="string">"name"</span>),
RawType.Int(<span class="string">"nsid"</span>),
RawType.TinyInt(<span class="string">"nsclass"</span>),
RawType.Int(<span class="string">"status"</span>),
RawType.Char(<span class="string">"type"</span>, <span class="number">2</span>),
RawType.Int(<span class="string">"pid"</span>),
RawType.TinyInt(<span class="string">"pclass"</span>),
RawType.Int(<span class="string">"intprop"</span>),
RawType.DateTime(<span class="string">"created"</span>),
RawType.DateTime(<span class="string">"modified"</span>)
});
rows.Where(x => x[<span class="string">"type"</span>].ToString().Trim() == <span class="string">"U"</span>)
.Select(x => <span class="keyword">new</span> {
ObjectID = (<span class="keyword">int</span>)x[<span class="string">"id"</span>],
Name = x[<span class="string">"name"</span>]
}).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture4.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture4.png" style="max-height: 250px"/></a></div></div>
<p>We just went from a completely useless suspect database, with no knowledge of the schema, to having a list of each user table’s name and object ID. Sure, if one of the pages belonging to sys.sysschobjs was corrupt, we’d be missing some of the tables without knowing it. Even so, this is a good start, and there are ways of detecting the missing pages (we could look for broken page header references, for example).</p>
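<p>As a rough illustration of the “broken page header references” idea: the pages of an object point at each other through next-page pointers in their headers, so a pointer to a page we couldn’t read is a hint that rows are missing. The sketch below uses a hypothetical, simplified list of (page ID, next page ID) pairs rather than real OrcaMDF objects:</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified view of the readable pages of one object:
// each entry is the page's own ID and its next-page pointer (0 = end of chain).
var readablePages = new List<(int PageId, int NextPageId)>
{
    (100, 101),
    (101, 102), // page 102 was zeroed out, so it is not in this list
    (103, 0),
};

var knownIds = new HashSet<int>(readablePages.Select(p => p.PageId));

// Any pointer to a page we could not read hints at rows missing from the result.
var missing = readablePages
    .Where(p => p.NextPageId != 0 && !knownIds.Contains(p.NextPageId))
    .Select(p => p.NextPageId)
    .ToList();

Console.WriteLine("Referenced but unreadable pages: " + string.Join(", ", missing));
// Referenced but unreadable pages: 102
```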
<h3 id="Getting_Schemas">Getting Schemas</h3>
<p>As we saw for sys.sysschobjs, if we are to parse any of the user table data, we need to know the schema of the tables. The schema happens to be stored in the <strong>sys.syscolpars</strong> base table, and if we look up ‘sys.syscolpars’ in sys.sysschobjs, we’ll get an object ID of <strong>41</strong>. As we did before, we can get a list of all pages belonging to sys.syscolpars:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
db.Pages
.Where(x => x.Header.ObjectID == <span class="number">41</span>)
.Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture5.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture5.png" style="max-height: 250px"/></a></div></div>
<p>By looking up the schema of sys.syscolpars using sp_help, in the working database, we can parse the actual rows much the same way:</p>
<figure class="highlight cs"><pre><span class="comment">// Parse sys.syscolpars</span>
<span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
<span class="keyword">var</span> pages = db.Pages.Where(x => x.Header.ObjectID == <span class="number">41</span> && x.Header.Type == PageType.Data);
<span class="keyword">var</span> records = pages.SelectMany(x => x.Records).Select(x => (RawPrimaryRecord)x);
<span class="keyword">var</span> rows = RawColumnParser.Parse(records, <span class="keyword">new</span> IRawType[] {
RawType.Int(<span class="string">"id"</span>),
RawType.SmallInt(<span class="string">"number"</span>),
RawType.Int(<span class="string">"colid"</span>),
RawType.NVarchar(<span class="string">"name"</span>),
RawType.TinyInt(<span class="string">"xtype"</span>),
RawType.Int(<span class="string">"utype"</span>),
RawType.SmallInt(<span class="string">"length"</span>),
RawType.TinyInt(<span class="string">"prec"</span>),
RawType.TinyInt(<span class="string">"scale"</span>),
RawType.Int(<span class="string">"collationid"</span>),
RawType.Int(<span class="string">"status"</span>),
RawType.SmallInt(<span class="string">"maxinrow"</span>),
RawType.Int(<span class="string">"xmlns"</span>),
RawType.Int(<span class="string">"dflt"</span>),
RawType.Int(<span class="string">"chk"</span>),
RawType.VarBinary(<span class="string">"idtval"</span>)
});
rows.Select(x => <span class="keyword">new</span> {
ObjectID = (<span class="keyword">int</span>)x[<span class="string">"id"</span>],
ColumnID = (<span class="keyword">int</span>)x[<span class="string">"colid"</span>],
Number = (<span class="keyword">short</span>)x[<span class="string">"number"</span>],
TypeID = (<span class="keyword">byte</span>)x[<span class="string">"xtype"</span>],
Length = (<span class="keyword">short</span>)x[<span class="string">"length"</span>],
Name = x[<span class="string">"name"</span>]
}).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture6.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture6.png" style="max-height: 250px"/></a></div></div>
<h3 id="Recovering_the_Customer_Table_Schema">Recovering the Customer Table Schema</h3>
<p>While there are 12 tables, none is probably more important than the <strong>Customer</strong> table. Based on parsing the sys.sysschobjs base table, we know that the Customer table has an object ID of <strong>117575457</strong>. Let’s try and filter down to just that object ID, using the code above:</p>
<figure class="highlight cs"><pre>rows.Where(x => (<span class="keyword">int</span>)x[<span class="string">"id"</span>] == <span class="number">117575457</span>).Select(x => <span class="keyword">new</span> {
ObjectID = (<span class="keyword">int</span>)x[<span class="string">"id"</span>],
ColumnID = (<span class="keyword">int</span>)x[<span class="string">"colid"</span>],
Number = (<span class="keyword">short</span>)x[<span class="string">"number"</span>],
TypeID = (<span class="keyword">byte</span>)x[<span class="string">"xtype"</span>],
Length = (<span class="keyword">short</span>)x[<span class="string">"length"</span>],
Name = x[<span class="string">"name"</span>]
}).OrderBy(x => x.Number).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture7.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture7.png" style="max-height: 250px"/></a></div></div>
<p>Running the following query in any working database, we can correlate the TypeID values with the SQL Server type names:</p>
<figure class="highlight sql"><pre><span class="operator"><span class="keyword">SELECT</span>
*
<span class="keyword">FROM</span>
sys.types
<span class="keyword">WHERE</span>
system_type_id <span class="keyword">IN</span> (<span class="number">56</span>, <span class="number">104</span>, <span class="number">231</span>, <span class="number">167</span>, <span class="number">36</span>, <span class="number">61</span>) <span class="keyword">AND</span>
system_type_id = user_type_id</span>
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture8.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture8.png" style="max-height: 250px"/></a></div></div>
<p>Using the output from syscolpars and the type names, we can now deduce the schema of the Customer table (note that the syscolpars lengths are physical, meaning a length of 16 for an nvarchar column means a logical length of 8):</p>
<figure class="highlight sql"><pre><span class="operator"><span class="keyword">CREATE</span> <span class="keyword">TABLE</span> Customer (
CustomerID <span class="keyword">int</span>,
NameStyle <span class="keyword">bit</span>,
Title nvarchar(<span class="number">8</span>),
FirstName nvarchar(<span class="number">50</span>),
MiddleName nvarchar(<span class="number">50</span>),
LastName nvarchar(<span class="number">50</span>),
Suffix nvarchar(<span class="number">10</span>),
CompanyName nvarchar(<span class="number">128</span>),
SalesPerson nvarchar(<span class="number">256</span>),
EmailAddress nvarchar(<span class="number">50</span>),
Phone nvarchar(<span class="number">25</span>),
PasswordHash <span class="keyword">varchar</span>(<span class="number">128</span>),
PasswordSalt <span class="keyword">varchar</span>(<span class="number">10</span>),
rowguid uniqueidentifier,
ModifiedDate datetime
)</span>
</pre></figure>
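<p>The deduction itself is mechanical enough to sketch in code. The helper below (<strong>DescribeType</strong> is a made-up name) only covers the six type IDs that show up for the Customer table, and it assumes plain, non-MAX lengths:</p>

```csharp
using System;

// Maps a syscolpars (xtype, length) pair to a column type declaration.
// Only the six type IDs that occur in the Customer table are covered.
string DescribeType(byte xtype, short physicalLength) => xtype switch
{
    56  => "int",
    104 => "bit",
    36  => "uniqueidentifier",
    61  => "datetime",
    167 => $"varchar({physicalLength})",       // one byte per character
    231 => $"nvarchar({physicalLength / 2})",  // two bytes per character
    _   => throw new NotSupportedException($"Unhandled type ID {xtype}")
};

Console.WriteLine(DescribeType(231, 16));   // nvarchar(8)
Console.WriteLine(DescribeType(167, 128));  // varchar(128)
Console.WriteLine(DescribeType(56, 4));     // int
```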
<p>All we need now is to find the pages belonging to the Customer table. That’s slightly easier said than done however. While each object has an object ID, as can be verified using sys.sysschobjs, that object ID is not what’s stored in the page headers, except for system objects. Thus we can’t just query for all pages whose Header.ObjectID == 117575457, as the value 117575457 won’t be stored in the header.</p>
<h3 id="Recovering_the_Customer_Allocation_Unit">Recovering the Customer Allocation Unit</h3>
<p>To find the pages belonging to the Customer table, we’ll first need to find the allocation unit to which it belongs. Unfortunately we already know that page 16 is corrupt - the first page of the <strong>sys.sysallocunits</strong> table, containing all of the metadata. However, we might just be lucky enough for that first page to contain the allocation units for all of the internal tables, which we do not care about. Let’s see if there are any other pages belonging to sys.sysallocunits:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
db.Pages
.Where(x => x.Header.ObjectID == <span class="number">7</span>)
.Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture9.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture9.png" style="max-height: 250px"/></a></div></div>
<p>There are 5 other pages available. Let’s try to parse them so we have as much of the allocation unit data available as possible. Once again we’ll get the schema from the working database, using sp_help, after which we can parse the remaining rows using RawDatabase. By looking up ‘sysallocunits’ in sysschobjs, we know it has an object ID of 7:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
<span class="keyword">var</span> pages = db.Pages.Where(x => x.Header.ObjectID == <span class="number">7</span> && x.Header.Type == PageType.Data);
<span class="keyword">var</span> records = pages.SelectMany(x => x.Records).Select(x => (RawPrimaryRecord)x);
<span class="keyword">var</span> rows = RawColumnParser.Parse(records, <span class="keyword">new</span> IRawType[] {
RawType.BigInt(<span class="string">"auid"</span>),
RawType.TinyInt(<span class="string">"type"</span>),
RawType.BigInt(<span class="string">"ownerid"</span>),
RawType.Int(<span class="string">"status"</span>),
RawType.SmallInt(<span class="string">"fgid"</span>),
RawType.Binary(<span class="string">"pgfirst"</span>, <span class="number">6</span>),
RawType.Binary(<span class="string">"pgroot"</span>, <span class="number">6</span>),
RawType.Binary(<span class="string">"pgfirstiam"</span>, <span class="number">6</span>),
RawType.BigInt(<span class="string">"pcused"</span>),
RawType.BigInt(<span class="string">"pcdata"</span>),
RawType.BigInt(<span class="string">"pcreserved"</span>),
RawType.Int(<span class="string">"dbfragid"</span>)
});
rows.Select(x => <span class="keyword">new</span> {
AllocationUnitID = (<span class="keyword">long</span>)x[<span class="string">"auid"</span>],
Type = (<span class="keyword">byte</span>)x[<span class="string">"type"</span>],
ContainerID = (<span class="keyword">long</span>)x[<span class="string">"ownerid"</span>]
}).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture10.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture10.png" style="max-height: 250px"/></a></div></div>
<p>By itself, we can’t use this data, but we’ll need it in just a moment. First we need to get a hold of the Customer table partitions as well. We do so by looking up the schema of <strong>sys.sysrowsets</strong> using sp_help, after which we can parse it. Looking up ‘sysrowsets’ in sysschobjs, we know that sys.sysrowsets has an object ID of 5:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
<span class="keyword">var</span> pages = db.Pages.Where(x => x.Header.ObjectID == <span class="number">5</span> && x.Header.Type == PageType.Data);
<span class="keyword">var</span> records = pages.SelectMany(x => x.Records).Select(x => (RawPrimaryRecord)x);
<span class="keyword">var</span> rows = RawColumnParser.Parse(records, <span class="keyword">new</span> IRawType[] {
RawType.BigInt(<span class="string">"rowsetid"</span>),
RawType.TinyInt(<span class="string">"ownertype"</span>),
RawType.Int(<span class="string">"idmajor"</span>),
RawType.Int(<span class="string">"idminor"</span>),
RawType.Int(<span class="string">"numpart"</span>),
RawType.Int(<span class="string">"status"</span>),
RawType.SmallInt(<span class="string">"fgidfs"</span>),
RawType.BigInt(<span class="string">"rcrows"</span>),
RawType.TinyInt(<span class="string">"cmprlevel"</span>),
RawType.TinyInt(<span class="string">"fillfact"</span>),
RawType.SmallInt(<span class="string">"maxnullbit"</span>),
RawType.Int(<span class="string">"maxleaf"</span>),
RawType.SmallInt(<span class="string">"maxint"</span>),
RawType.SmallInt(<span class="string">"minleaf"</span>),
RawType.SmallInt(<span class="string">"minint"</span>),
RawType.VarBinary(<span class="string">"rsguid"</span>),
RawType.VarBinary(<span class="string">"lockres"</span>),
RawType.Int(<span class="string">"dbfragid"</span>)
});
rows.Where(x => (<span class="keyword">int</span>)x[<span class="string">"idmajor"</span>] == <span class="number">117575457</span>).Select(x => <span class="keyword">new</span> {
RowsetID = (<span class="keyword">long</span>)x[<span class="string">"rowsetid"</span>],
ObjectID = (<span class="keyword">int</span>)x[<span class="string">"idmajor"</span>],
IndexID = (<span class="keyword">int</span>)x[<span class="string">"idminor"</span>]
}).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture11.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture11.png" style="max-height: 250px"/></a></div></div>
<p>By filtering down to just the Customer table’s object ID, we’ve now got the three partitions that belong to the table - one for each allocation unit type - ROW_OVERFLOW_DATA (3), LOB_DATA (2) and IN_ROW_DATA (1). We don’t care about LOB and SLOB for now; all we need is the IN_ROW_DATA partition, which gives us a RowsetID value of <strong>72057594039697408</strong>.</p>
<p>Now that we have the RowsetID, let’s look up the allocation unit using the data we got from sys.sysallocunits earlier on:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
<span class="keyword">var</span> pages = db.Pages.Where(x => x.Header.ObjectID == <span class="number">7</span> && x.Header.Type == PageType.Data);
<span class="keyword">var</span> records = pages.SelectMany(x => x.Records).Select(x => (RawPrimaryRecord)x);
<span class="keyword">var</span> rows = RawColumnParser.Parse(records, <span class="keyword">new</span> IRawType[] {
RawType.BigInt(<span class="string">"auid"</span>),
RawType.TinyInt(<span class="string">"type"</span>),
RawType.BigInt(<span class="string">"ownerid"</span>),
RawType.Int(<span class="string">"status"</span>),
RawType.SmallInt(<span class="string">"fgid"</span>),
RawType.Binary(<span class="string">"pgfirst"</span>, <span class="number">6</span>),
RawType.Binary(<span class="string">"pgroot"</span>, <span class="number">6</span>),
RawType.Binary(<span class="string">"pgfirstiam"</span>, <span class="number">6</span>),
RawType.BigInt(<span class="string">"pcused"</span>),
RawType.BigInt(<span class="string">"pcdata"</span>),
RawType.BigInt(<span class="string">"pcreserved"</span>),
RawType.Int(<span class="string">"dbfragid"</span>)
});
rows.Where(x => (<span class="keyword">long</span>)x[<span class="string">"ownerid"</span>] == <span class="number">72057594039697408</span>).Select(x => <span class="keyword">new</span> {
AllocationUnitID = (<span class="keyword">long</span>)x[<span class="string">"auid"</span>],
Type = (<span class="keyword">byte</span>)x[<span class="string">"type"</span>],
ContainerID = (<span class="keyword">long</span>)x[<span class="string">"ownerid"</span>]
}).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture12.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture12.png" style="max-height: 250px"/></a></div></div>
<h3 id="Recovering_the_Customers">Recovering the Customers</h3>
<p>Now that we have the allocation unit ID, we can convert that into the object ID value, as stored in the page headers (big thanks goes out to <a href="http://www.sqlskills.com/blogs/paul/" target="_blank">Paul Randal</a> who was kind enough to blog about the <a href="http://www.sqlskills.com/blogs/paul/inside-the-storage-engine-how-are-allocation-unit-ids-calculated/" target="_blank">relationship between the allocation unit ID and the page header m_objId and m_indexId fields</a>):</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> allocationUnitID = <span class="number">72057594041270272</span>;
<span class="keyword">var</span> indexID = allocationUnitID >> <span class="number">48</span>;
<span class="keyword">var</span> objectID = (allocationUnitID - (indexID << <span class="number">48</span>)) >> <span class="number">16</span>;
Console.WriteLine(<span class="string">"IndexID: "</span> + indexID);
Console.WriteLine(<span class="string">"ObjectID: "</span> + objectID);
</pre></figure>
<figure class="highlight"><pre><span class="attribute">IndexID</span>: <span class="string">256</span>
<span class="attribute">ObjectID</span>: <span class="string">51</span>
</pre></figure>
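<p>The conversion above also works in reverse. The following is a minimal sketch, assuming the bit layout implied by the shifts above (index ID in the top 16 bits, object ID shifted up by 16 bits, low 16 bits zero). The class and method names are made up for illustration and are not part of OrcaMDF:</p>
<figure class="highlight cs"><pre>using System;

// Hypothetical helper, not part of OrcaMDF.
public static class AllocationUnitMath
{
    // Splits an allocation unit ID into the index ID and object ID
    // as they appear in the page header.
    public static void Decompose(long allocationUnitID, out long indexID, out long objectID)
    {
        indexID = allocationUnitID >> 48;
        objectID = (allocationUnitID - (indexID << 48)) >> 16;
    }

    // Inverse operation: rebuilds the allocation unit ID from the header values.
    public static long Compose(long indexID, long objectID)
    {
        return (indexID << 48) | (objectID << 16);
    }

    public static void Main()
    {
        long indexID, objectID;
        Decompose(72057594041270272L, out indexID, out objectID);
        Console.WriteLine("IndexID: " + indexID);   // 256
        Console.WriteLine("ObjectID: " + objectID); // 51

        // Round trip back to the allocation unit ID we started with
        Console.WriteLine(Compose(indexID, objectID) == 72057594041270272L); // True
    }
}
</pre></figure>
<p>Being able to go both ways is handy as a sanity check: if composing the header values of a page does not give back an allocation unit ID you recognize, the page most likely belongs to something else.</p>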
<p>Now that we have not only the object ID, but also the index ID, we can easily get a list of all the pages belonging to the Customer table:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
db.Pages
.Where(x => x.Header.ObjectID == <span class="number">51</span> && x.Header.IndexID == <span class="number">256</span>)
.Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture13.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture13.png" style="max-height: 250px"/></a></div></div>
<p>And since we already know the schema for the Customer table, it’s a simple matter of making RawDatabase parse the actual rows:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"D:\MSSQL Databases\AdventureWorksLT2008R2.mdf"</span>);
<span class="keyword">var</span> pages = db.Pages.Where(x => x.Header.ObjectID == <span class="number">51</span> && x.Header.IndexID == <span class="number">256</span> && x.Header.Type == PageType.Data);
<span class="keyword">var</span> records = pages.SelectMany(x => x.Records).Select(x => (RawPrimaryRecord)x);
<span class="keyword">var</span> rows = RawColumnParser.Parse(records, <span class="keyword">new</span> IRawType[] {
RawType.Int(<span class="string">"CustomerID"</span>),
RawType.Bit(<span class="string">"NameStyle"</span>),
RawType.NVarchar(<span class="string">"Title"</span>),
RawType.NVarchar(<span class="string">"FirstName"</span>),
RawType.NVarchar(<span class="string">"MiddleName"</span>),
RawType.NVarchar(<span class="string">"LastName"</span>),
RawType.NVarchar(<span class="string">"Suffix"</span>),
RawType.NVarchar(<span class="string">"CompanyName"</span>),
RawType.NVarchar(<span class="string">"SalesPerson"</span>),
RawType.NVarchar(<span class="string">"EmailAddress"</span>),
RawType.NVarchar(<span class="string">"Phone"</span>),
RawType.Varchar(<span class="string">"PasswordHash"</span>),
RawType.Varchar(<span class="string">"PasswordSalt"</span>),
RawType.UniqueIdentifier(<span class="string">"rowguid"</span>),
RawType.DateTime(<span class="string">"ModifiedDate"</span>)
});
rows.Select(x => <span class="keyword">new</span> {
CustomerID = (<span class="keyword">int</span>)x[<span class="string">"CustomerID"</span>],
FirstName = (<span class="keyword">string</span>)x[<span class="string">"FirstName"</span>],
MiddleName = (<span class="keyword">string</span>)x[<span class="string">"MiddleName"</span>],
LastName = (<span class="keyword">string</span>)x[<span class="string">"LastName"</span>],
CompanyName = (<span class="keyword">string</span>)x[<span class="string">"CompanyName"</span>],
EmailAddress = (<span class="keyword">string</span>)x[<span class="string">"EmailAddress"</span>]
}).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/sql-server-corruption-recovery-when-all-else-fails/Capture15.png" class="fancy"><img src="/sql-server-corruption-recovery-when-all-else-fails/Capture15.png" style="max-height: 250px"/></a></div></div>
<p>And there we have it. 795 customers were just recovered from an otherwise unrecoverable state. Now it’s just a matter of repeating this process for the other tables as well.</p>
<h2 id="Summary">Summary</h2>
<p>As I’ve just shown, even when all hope seems lost, there are still options. If you know what you’re doing, a tool like OrcaMDF, or another homebrewed solution, might prove an invaluable way out during a disaster. This is not, and should never be, a replacement for a good recovery strategy. That being said, not a week goes by without someone posting on a forum somewhere about a corrupt database without any backups.</p>
<p>In this case we went from fatal corruption to recovering 795 customers from the Customer table. Looking at the database, before it was corrupted, there were originally 847 customers in the table. Thus 52 customers were lost due to the corruption. If the pages really are hit by corruption, nothing will get that data back, unless you have a backup. However, if you’re unlucky and end up with metadata corruption, and/or a database that won’t come online, this may be a viable solution.</p>
<p>Should you come across a situation where OrcaMDF might come in handy, I’d love to hear about it - nothing better to hear than success stories! If you don’t feel like going through this process yourself, feel free to contact me; I may be able to help.</p>
]]></content>
<summary type="html"><![CDATA[<p>In this post I want to walk through a number of SQL Server corruption recovery techniques for when you’re out of luck, have no backups, and the usual methods don’t work. I’ll be using the <a href="http://msftdbprodsamples.codeplex.com/releases/view/93587" target="_blank">AdventureWorksLT2008R2 sample database</a> as my victim.</p>
]]></summary>
<category term=".NET" scheme="http://improve.dk/category/.NET/"/>
<category term="SQL Server - Internals" scheme="http://improve.dk/category/SQL%20Server%20-%20Internals/"/>
<category term="SQL Server - OrcaMDF" scheme="http://improve.dk/category/SQL%20Server%20-%20OrcaMDF/"/>
<category term="SQL Server" scheme="http://improve.dk/category/SQL%20Server/"/>
</entry>
<entry>
<title><![CDATA[Corrupting Databases on Purpose Using the OrcaMDF Corruptor]]></title>
<link href="http://improve.dk/corrupting-databases-purpose-using-orcamdf-corruptor/"/>
<id>http://improve.dk/corrupting-databases-purpose-using-orcamdf-corruptor/</id>
<published>2013-11-05T00:00:00.000Z</published>
<updated>2014-05-04T16:12:23.000Z</updated>
<content type="html"><![CDATA[<p>Sometimes you must first do evil, to do good. Such is the case when you want to hone your skills in corruption recovery of SQL Server databases.</p>
<a id="more"></a>
<p>To give me more material to test the new <a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/">RawDatabase</a> functionality, I’ve now added a <a href="https://github.com/improvedk/OrcaMDF/blob/master/src/OrcaMDF.Framework/Corruptor.cs" target="_blank">Corruptor class</a> to OrcaMDF. Corruptor does more or less what the name says - it corrupts database files on purpose.</p>
<p>The corruption itself is quite simple. Corruptor will choose a number of random pages and simply overwrite the page completely with all zeros. Depending on what pages are hit, this can be quite fatal.</p>
<p>I shouldn’t have to say this, but just in case… Please do not use this on anything valuable. <strong>It will fatally corrupt your data.</strong></p>
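<p>To illustrate the idea, here is a minimal standalone sketch of zeroing random pages in a file. It is not the actual Corruptor implementation; the class name, method name and seed parameter are made up for this example, and it only assumes the standard 8 KB SQL Server page size. As above, only ever point it at a throwaway copy:</p>
<figure class="highlight cs"><pre>using System;
using System.IO;

// Hypothetical helper for illustration, NOT OrcaMDF's Corruptor class.
public static class PageZeroer
{
    const int PageSize = 8192; // SQL Server pages are 8 KB

    // Overwrites 'count' distinct, randomly chosen pages with zeros
    // and returns the IDs of the pages that were overwritten.
    public static int[] ZeroRandomPages(string path, int count, int seed)
    {
        var random = new Random(seed);
        var zeros = new byte[PageSize];

        using (var file = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            int totalPages = (int)(file.Length / PageSize);
            if (count > totalPages)
                throw new ArgumentException("File has fewer pages than requested.");

            // Partial Fisher-Yates shuffle, so each page is picked at most once
            var pageIDs = new int[totalPages];
            for (int i = 0; i < totalPages; i++)
                pageIDs[i] = i;

            var chosen = new int[count];
            for (int i = 0; i < count; i++)
            {
                int j = random.Next(i, totalPages);
                int tmp = pageIDs[i];
                pageIDs[i] = pageIDs[j];
                pageIDs[j] = tmp;
                chosen[i] = pageIDs[i];

                // Page N starts at byte offset N * 8192
                file.Seek((long)chosen[i] * PageSize, SeekOrigin.Begin);
                file.Write(zeros, 0, PageSize);
            }
            return chosen;
        }
    }
}
</pre></figure>
<p>Passing a fixed seed makes a run repeatable, which is convenient when you want to hit the same set of pages again while practicing a recovery.</p>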
<h2 id="Examples">Examples</h2>
<p>There are two overloads of the Corruptor.CorruptFile method. Both return an IEnumerable of integers - a list of the page IDs that have been overwritten with zeros.</p>
<p>The following code will corrupt 5% of the pages in the AdventureWorks2008R2LT.mdf file, after which it will output each page ID that has been corrupted. You can specify the percentage of pages to corrupt by changing the second parameter.</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> corruptedPageIDs = Corruptor.CorruptFile(<span class="string">@"C:\AdventureWorks2008R2LT.mdf"</span>, <span class="number">0.05</span>);
Console.WriteLine(<span class="keyword">string</span>.Join(<span class="string">", "</span>, corruptedPageIDs));
</pre></figure>
<figure class="highlight"><pre>606, 516, 603, 521, 613, 621, 118, 47, 173, 579,
323, 217, 358, 515, 615, 271, 176, 596, 417, 379,
269, 409, 558, 103, 8, 636, 200, 361, 60, 486,
366, 99, 87
</pre></figure>
<p>To make the corruption hit even harder, you can also use the second overload of the CorruptFile method, allowing you to specify the exact number of pages to corrupt, within a certain range of page IDs. The following code will corrupt exactly 10 pages within the first 50 pages (zero-based), thus hitting mostly metadata.</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> corruptedPageIDs = Corruptor.CorruptFile(<span class="string">@"C:\AdventureWorks2008R2LT.mdf"</span>, <span class="number">10</span>, <span class="number">0</span>, <span class="number">49</span>);
Console.WriteLine(<span class="keyword">string</span>.Join(<span class="string">", "</span>, corruptedPageIDs));
</pre></figure>
<figure class="highlight"><pre>16, 4, 0, 32, 15, 14, 30, 2, 49, 9
</pre></figure>
<p>In the above case I was extraordinarily unlucky seeing as page 0 is the file header page, page 2 is the first GAM page, page 9 is the boot page and finally page 16 is the page that contains the allocation unit metadata. With corruption like this, you can be certain that DBCC CHECKDB will be giving up, leaving you with no other alternative than to restore from a backup.</p>
<p>Or… You could try to recover as much data as possible using <a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/">OrcaMDF RawDatabase</a>, but I’ll get back to that later :)</p>
]]></content>
<summary type="html"><![CDATA[<p>Sometimes you must first do evil, to do good. Such is the case when you want to hone your skills in corruption recovery of SQL Server databases.</p>
]]></summary>
<category term=".NET" scheme="http://improve.dk/category/.NET/"/>
<category term="SQL Server - Internals" scheme="http://improve.dk/category/SQL%20Server%20-%20Internals/"/>
<category term="SQL Server - OrcaMDF" scheme="http://improve.dk/category/SQL%20Server%20-%20OrcaMDF/"/>
<category term="SQL Server" scheme="http://improve.dk/category/SQL%20Server/"/>
</entry>
<entry>
<title><![CDATA[OrcaMDF RawDatabase - A Swiss Army Knife for MDF Files]]></title>
<link href="http://improve.dk/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/"/>
<id>http://improve.dk/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/</id>
<published>2013-11-04T00:00:00.000Z</published>
<updated>2014-05-04T16:12:23.000Z</updated>
<content type="html"><![CDATA[<p>When I initially started working on <a href="/introducing-orcamdf/">OrcaMDF</a> I had just one goal, to gain a deeper knowledge of MDF file internals than I could through most books available.</p>
<a id="more"></a>
<p>As time progressed, so did OrcaMDF. While I had no initial plans of doing so, OrcaMDF has ended up being capable of parsing base tables, metadata and even <a href="/orcamdf-now-exposes-metadata-through-system-dmvs/">dynamically recreating common DMVs</a>. On top of this, I made a <a href="/orcamdf-studio-release-feature-recap/">simple GUI</a>, just to make OrcaMDF easier to use.</p>
<p>While that’s great, it comes at the price of extreme complexity. To be able to automatically parse table metadata like schemas, partitions, allocation units and more, not to mention abstracting away details like heaps and indexes, it takes a lot of code and it requires intimate knowledge of the database itself. Seeing as metadata changes between versions, OrcaMDF currently only supports SQL Server 2008 R2. While the data structures themselves are rather stable, there are minor differences in the way metadata is stored, the data exposed by DMVs and so forth. And on top of this, requiring all of the metadata to be perfect, for OrcaMDF to work, results in OrcaMDF being just as vulnerable to corruption as SQL Server is itself. Got a corrupt boot page? Neither SQL Server nor OrcaMDF will be able to parse the database.</p>
<h2 id="Say_Hello_to_RawDatabase">Say Hello to RawDatabase</h2>
<p>I tried to imagine the future of OrcaMDF and how to make it the most useful. I could march on and make it support more and more of the same features that SQL Server does, eventually being able to parse 100% of an MDF file. But what would the value be? Sure, it would be a great learning opportunity, but the thing is, if you’ve got a working database, SQL Server does a pretty good job too. So what’s the alternative?</p>
<p><em>RawDatabase</em>, in contrast to the <em>Database</em> class, doesn’t try to parse anything besides what you tell it to. There’s no automatic parsing of schemas. It doesn’t know about base tables. It doesn’t know about DMVs. It does however know about the SQL Server data structures and it gives you an interface for working with the MDF file directly. Letting RawDatabase parse nothing but the data structures means it’s significantly less vulnerable to corruption or bad data.</p>
<h2 id="Examples">Examples</h2>
<p>It’s still early in the development, but let me show some examples of what can be done using RawDatabase. While I’m running the code in <a href="http://www.linqpad.net/" target="_blank">LINQPad</a>, as that makes it easy to show the results, the results are just standard .NET objects. All examples are run against the AdventureWorks 2008R2 LT (Light Weight) database.</p>
<h3 id="Getting_a_Single_Page">Getting a Single Page</h3>
<p>In the most basic example, we’ll parse just a single page.</p>
<figure class="highlight cs"><pre><span class="comment">// Get page 197 in file 1</span>
<span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"C:\AWLT2008R2.mdf"</span>);
db.GetPage(<span class="number">1</span>, <span class="number">197</span>).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A.png" class="fancy"><img src="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A.png" style="max-height: 250px"/></a></div></div>
<h3 id="Parsing_the_Page_Header">Parsing the Page Header</h3>
<p>Now that we’ve got a page, how about we dump the header values?</p>
<figure class="highlight cs"><pre><span class="comment">// Get the header of page 197 in file 1</span>
<span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"C:\AWLT2008R2.mdf"</span>);
db.GetPage(<span class="number">1</span>, <span class="number">197</span>).Header.Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A1.png" class="fancy"><img src="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A1.png" style="max-height: 250px"/></a></div></div>
<h3 id="Parsing_the_Slot_Array">Parsing the Slot Array</h3>
<p>Just as the header is available, you can also get the raw slot array entries.</p>
<figure class="highlight cs"><pre><span class="comment">// Get the slot array entries of page 197 in file 1</span>
<span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"C:\AWLT2008R2.mdf"</span>);
db.GetPage(<span class="number">1</span>, <span class="number">197</span>).SlotArray.Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A2.png" class="fancy"><img src="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A2.png" style="max-height: 250px"/></a></div></div>
<h3 id="Parsing_Records">Parsing Records</h3>
<p>While getting the raw slot array entries can be useful, you’ll usually want to look at the records themselves. Fortunately, that’s easy to do too.</p>
<figure class="highlight cs"><pre><span class="comment">// Get all records on page 197 in file 1</span>
<span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"C:\AWLT2008R2.mdf"</span>);
db.GetPage(<span class="number">1</span>, <span class="number">197</span>).Records.Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A3.png" class="fancy"><img src="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A3.png" style="max-height: 250px"/></a></div></div>
<h3 id="Retrieving_Data_from_Records">Retrieving Data from Records</h3>
<p>Once you’ve got the records, you can access the FixedLengthData or the VariableLengthOffsetValues properties to get the raw fixed length and variable length column values. However, what you’ll typically want is the actual parsed values. To spare you the work, OrcaMDF can parse them for you, if you just provide it with the schema.</p>
<figure class="highlight cs"><pre><span class="comment">// Read the record contents of the first record on page 197 of file 1</span>
<span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"C:\AWLT2008R2.mdf"</span>);
RawPrimaryRecord firstRecord = (RawPrimaryRecord)db.GetPage(<span class="number">1</span>, <span class="number">197</span>).Records.First();
<span class="keyword">var</span> values = RawColumnParser.Parse(firstRecord, <span class="keyword">new</span> IRawType[] {
RawType.Int(<span class="string">"AddressID"</span>),
RawType.NVarchar(<span class="string">"AddressLine1"</span>),
RawType.NVarchar(<span class="string">"AddressLine2"</span>),
RawType.NVarchar(<span class="string">"City"</span>),
RawType.NVarchar(<span class="string">"StateProvince"</span>),
RawType.NVarchar(<span class="string">"CountryRegion"</span>),
RawType.NVarchar(<span class="string">"PostalCode"</span>),
RawType.UniqueIdentifier(<span class="string">"rowguid"</span>),
RawType.DateTime(<span class="string">"ModifiedDate"</span>)
});
values.Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A4.png" class="fancy"><img src="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A4.png" style="max-height: 250px"/></a></div></div>
<p>RawColumnParser.Parse will, given a schema, automatically convert the raw bytes into a Dictionary<string, object>, the key being the column name from the schema and the value being the parsed value in the column’s corresponding .NET type, e.g. int, short, Guid, string, etc. By letting you, the user, specify the schema, OrcaMDF can get rid of a slew of dependencies on metadata, thus ignoring any possible corruption in metadata. Given the availability of the Next & PreviousPageID properties of the header, it would be simple to iterate through all linked pages, parsing all records of each page - basically performing a scan on a given allocation unit.</p>
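<p>Such a scan of linked pages could look roughly like the sketch below. To keep it self-contained it runs against a stand-in page type rather than the real RawDatabase pages; DemoPage, its fields and the use of page ID 0 as the "no next page" marker are all assumptions made for this example. The point is the guard against revisiting pages, since a corrupt next-page pointer could otherwise send the scan in circles:</p>
<figure class="highlight cs"><pre>using System;

// Stand-in for a parsed page; NOT an OrcaMDF type.
public class DemoPage
{
    public int PageID;
    public int NextPageID; // 0 means "no next page" in this sketch
    public string[] Records;
}

public static class LinkedPageScan
{
    // Follows NextPageID from startPageID, printing every record on the way.
    // Returns the number of pages visited. Stops at a missing page or at a
    // page already seen, so a corrupt pointer cannot cause an endless loop.
    public static int Scan(DemoPage[] pagesByID, int startPageID)
    {
        var seen = new bool[pagesByID.Length];
        int visited = 0;
        int current = startPageID;

        while (current > 0 && current < pagesByID.Length
            && pagesByID[current] != null && !seen[current])
        {
            seen[current] = true;
            visited++;

            foreach (var record in pagesByID[current].Records)
                Console.WriteLine(current + ": " + record);

            current = pagesByID[current].NextPageID;
        }
        return visited;
    }
}
</pre></figure>
<p>With real pages you would replace the array lookup with a call to db.GetPage and parse each record using RawColumnParser.Parse, as shown earlier.</p>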
<h3 id="Filtering_Pages">Filtering Pages</h3>
<p>Besides retrieving a specific page, RawDatabase also has a Pages property that enumerates over all pages in a database. Using this you could, for example, get a list of all IAM pages in the database.</p>
<figure class="highlight cs"><pre><span class="comment">// Get a list of all IAM pages in the database</span>
<span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"C:\AWLT2008R2.mdf"</span>);
db.Pages
.Where(x => x.Header.Type == PageType.IAM)
.Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A5.png" class="fancy"><img src="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A5.png" style="max-height: 250px"/></a></div></div>
<p>And since this is powered by LINQ, it’s easy to project just the properties you want. For example, you could get all index pages and their slot counts like this:</p>
<figure class="highlight cs"><pre><span class="comment">// Get all index pages and their slot counts</span>
<span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"C:\AWLT2008R2.mdf"</span>);
db.Pages
.Where(x => x.Header.Type == PageType.Index)
.Select(x => <span class="keyword">new</span> {
x.PageID,
x.Header.SlotCnt
}).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A6.png" class="fancy"><img src="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A6.png" style="max-height: 250px"/></a></div></div>
<p>Or let’s say you wanted to get all data pages with at least one record and more than 7000 bytes of free space - with the page id, free count, record count and average record size as the output:</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"C:\AWLT2008R2.mdf"</span>);
db.Pages
.Where(x => x.Header.FreeCnt > <span class="number">7000</span>)
.Where(x => x.Header.SlotCnt >= <span class="number">1</span>)
.Where(x => x.Header.Type == PageType.Data)
.Select(x => <span class="keyword">new</span> {
x.PageID,
x.Header.FreeCnt,
RecordCount = x.Records.Count(),
RecordSize = (<span class="number">8096</span> - x.Header.FreeCnt) / x.Records.Count()
}).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A7.png" class="fancy"><img src="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A7.png" style="max-height: 250px"/></a></div></div>
<p>And as a final example, imagine you’ve got just an MDF file but you seem to have forgotten what objects are stored inside of it. Fret not, we’ll just get the data from the sysschobjs base table! Sysschobjs is the base table that stores all object data, and fortunately it has a static object ID of <em>34</em>. Using this, we can filter down to all of the data pages for object 34, get all the records and then parse just the first two columns of the schema (you may specify a partial schema, as long as you only omit columns at the end), so that we end up dumping just the names (we could of course have gotten the full schema, if we wanted to).</p>
<figure class="highlight cs"><pre><span class="keyword">var</span> db = <span class="keyword">new</span> RawDatabase(<span class="string">@"C:\AWLT2008R2.mdf"</span>);
<span class="keyword">var</span> records = db.Pages
.Where(x => x.Header.ObjectID == <span class="number">34</span> && x.Header.Type == PageType.Data)
.SelectMany(x => x.Records);
<span class="keyword">var</span> rows = records.Select(x => RawColumnParser.Parse((RawPrimaryRecord)x, <span class="keyword">new</span> IRawType[] {
RawType.Int(<span class="string">"id"</span>),
RawType.NVarchar(<span class="string">"name"</span>)
}));
rows.Select(x => x[<span class="string">"name"</span>]).Dump();
</pre></figure>
<div class="imgwrapper" style=""><div><a href="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A8.png" class="fancy"><img src="/orcamdf-rawdatabase-a-swiss-army-knife-for-mdf-files/A8.png" style="max-height: 250px"/></a></div></div>
<h2 id="Compatibility">Compatibility</h2>
<p>Seeing as RawDatabase doesn’t rely on metadata, it’s much easier to support multiple SQL Server versions. Thus, I’m happy to say that RawDatabase fully supports SQL Server 2005, 2008, 2008R2 and 2012. It probably supports 2014 too, I just haven’t tested that. Speaking of testing, all unit tests are automatically run against AdventureWorksLT for 2005, 2008, 2008R2 and 2012 during testing. Right now there are tests demonstrating that OrcaMDF RawDatabase is able to parse the first record of each and every table in the AdventureWorks LT databases.</p>
<h2 id="Corruption">Corruption</h2>
<p>One of the really interesting use cases for RawDatabase is in the case of corrupted databases. You could filter pages on the object id you’re searching for and then brute-force parse each of them, retrieving whatever data is readable. If metadata is corrupted, you could ignore it, provide the schema manually and then just follow the linked lists of pages, or parse the IAM pages to read heaps. During the next couple of weeks I’ll be blogging more on OrcaMDF RawDatabase to show various use case examples, including ones on corruption.</p>
<h2 id="Source_&_Feedback">Source & Feedback</h2>
<p>I’m really excited about the new RawDatabase addition to OrcaMDF and I hope I’m not the only one who can see the potential. If you try it out, have any ideas, suggestions or other kinds of feedback, I’d love to hear it.</p>
<p>If you want to try it out, head on over to the <a href="https://github.com/improvedk/OrcaMDF" target="_blank">OrcaMDF project on GitHub</a>. Once it’s just a bit more polished, I’ll make it available on NuGet as well. Just like the rest of OrcaMDF, the code is licensed under GPL v3.</p>
]]></content>
<summary type="html"><![CDATA[<p>When I initially started working on <a href="/introducing-orcamdf/">OrcaMDF</a> I had just one goal, to gain a deeper knowledge of MDF file internals than I could through most books available.</p>
]]></summary>
<category term=".NET" scheme="http://improve.dk/category/.NET/"/>
<category term="SQL Server - Internals" scheme="http://improve.dk/category/SQL%20Server%20-%20Internals/"/>
<category term="SQL Server - OrcaMDF" scheme="http://improve.dk/category/SQL%20Server%20-%20OrcaMDF/"/>
<category term="SQL Server" scheme="http://improve.dk/category/SQL%20Server/"/>
<category term="Tools of the Trade" scheme="http://improve.dk/category/Tools%20of%20the%20Trade/"/>
</entry>
<entry>
<title><![CDATA[PowerPad - Powerpoint Presenters View for Tablets & Phones]]></title>
<link href="http://improve.dk/powerpad-powerpoint-presenters-view-for-tablets-phones/"/>
<id>http://improve.dk/powerpad-powerpoint-presenters-view-for-tablets-phones/</id>
<published>2013-10-28T00:00:00.000Z</published>
<updated>2014-04-22T10:39:11.000Z</updated>