Working with Big Data: Infrastructure, Algorithms, and Visualizations
<h1>Lesson 2: Building a Big Data Infrastructure Part 2</h1>
<section class="slide">
<h1>Structured Storage &amp; Cassandra</h1>
<section class="slide">
<h2>Structured Data</h2>
<li class="slide"><h3>More like tables</h3></li>
<li class="slide"><h3>Fast write and query times</h3></li>
<section class="slide">
<h2>Cassandra</h2>
<li class="slide"><h3>Modeled after Google's BigTable</h3></li>
<li class="slide"><h3>Distributed</h3></li>
<li class="slide"><h3>Column-oriented (wide-column) database</h3></li>
<li class="slide"><h3>Open source, originally developed at Facebook</h3></li>
<li class="slide"><h3>Used at Twitter, LinkedIn, Netflix, and others</h3></li>
<section class="slide">
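<h2>Column-Oriented Properties</h2>
<li class="slide"><h3>Column names are not fixed in advance; each row can have its own</h3></li>
<li class="slide"><h3>Wide rows: one row can hold a very large number of columns</h3></li>
<li class="slide"><h3>A row's data need not occupy contiguous disk space</h3></li>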
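<p>The properties above can be sketched with a toy in-memory model. This is only an illustration of the wide-row idea, not how Cassandra stores data; the class <code>WideRowStore</code>, its methods, and the sample keys and values are all hypothetical.</p>

```python
from bisect import insort


class WideRowStore:
    """Toy model of a wide-row layout (illustrative only, not Cassandra's API)."""

    def __init__(self):
        self._rows = {}   # row key -> {column name -> value}
        self._order = {}  # row key -> column names kept in sorted order

    def put(self, row_key, column, value):
        # No schema: any row may introduce any column name at write time.
        cols = self._rows.setdefault(row_key, {})
        if column not in cols:
            insort(self._order.setdefault(row_key, []), column)
        cols[column] = value

    def slice(self, row_key, start, end):
        """Return (column, value) pairs with start <= column <= end, in column order."""
        cols = self._rows.get(row_key, {})
        return [(c, cols[c]) for c in self._order.get(row_key, []) if start <= c <= end]


store = WideRowStore()
store.put("user:1", "email", "a@example.com")
store.put("user:1", "city", "Oslo")
store.put("user:2", "last_login", "2013-05-01")  # a column that user:1 never defines

print(store.slice("user:1", "a", "z"))
```

<p>Keeping column names sorted within a row is what makes a range read over one row cheap in this sketch, and it is why a single row can grow wide without every row sharing the same columns.</p>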
<section class="slide">
<h2>Other Column-Oriented Data Stores</h2>
<li class="slide"><h3>BigTable</h3></li>
<li class="slide"><h3>HBase</h3></li>
<li class="slide"><h3>DynamoDB (a key-value store, often grouped with these)</h3></li>
<section class="slide">
<h2>CAP Theorem</h2>
<li class="slide"><h3>Consistency</h3></li>
<li class="slide"><h3>Availability</h3></li>
<li class="slide"><h3>Partition Tolerance</h3></li>
<section class="slide">
<h1>Cassandra relaxes consistency in favor of availability and partition tolerance</h1>
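<p>One way to make the trade-off concrete is the usual replica-overlap rule: with N replicas, a read that contacts R of them is guaranteed to see a write acknowledged by W of them only when R + W &gt; N. The sketch below just does that arithmetic. The function names are hypothetical, and the comparison to Cassandra's ONE and QUORUM consistency levels is an assumption about how those levels map onto R and W.</p>

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when the read and write replica sets must overlap (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


n = 3  # hypothetical replication factor

# Writing and reading a single replica each: fast, but the two sets may not overlap.
print(read_sees_latest_write(n, 1, 1))

# Writing and reading a majority each: the two sets always share a replica.
print(read_sees_latest_write(n, quorum(n), quorum(n)))
```

<p>Smaller R and W keep reads and writes available when some replicas are unreachable, at the cost of possibly returning stale data.</p>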
<section class="slide">
<h2>Cassandra is Good For</h2>
<li class="slide"><h3>Time Series Data</h3></li>
<li class="slide"><h3>Event Data</h3></li>
<li class="slide"><h3>Timelines</h3></li>
<li class="slide"><h3>High Volume</h3></li>
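<p>Time series and timelines fit the wide-row layout because each event can become one more column in a row, ordered by timestamp. A minimal sketch, assuming a hypothetical scheme that buckets rows by source and day; the names <code>row_key</code>, <code>record</code>, and <code>timeline</code> and the sample readings are made up for illustration.</p>

```python
from datetime import datetime

rows = {}  # row key -> {timestamp -> value}; one wide row per source per day


def row_key(source, ts):
    # Bucketing by day keeps any single row from growing without bound.
    return f"{source}:{ts:%Y-%m-%d}"


def record(source, ts, value):
    # Each event becomes one more column in that day's row.
    rows.setdefault(row_key(source, ts), {})[ts] = value


def timeline(source, day):
    """Events for one source on one day, oldest first."""
    return sorted(rows.get(row_key(source, day), {}).items())


record("sensor-7", datetime(2013, 5, 1, 12, 0), 21.5)
record("sensor-7", datetime(2013, 5, 1, 9, 30), 20.1)
record("sensor-7", datetime(2013, 5, 2, 8, 0), 19.8)

for ts, value in timeline("sensor-7", datetime(2013, 5, 1)):
    print(ts.isoformat(), value)
```

<p>Writes only ever append a column to one row, and reading a day's timeline touches a single row, which is the access pattern the list above describes.</p>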
<section class="slide">
<h1>On to the install...</h1>
