# DS108 Databases : Lesson Ten Companion Notebook

### Table of Contents <a class="anchor" id="DS108L10_toc"></a>

* [Table of Contents](#DS108L10_toc)
    * [Page 1 - Overview](#DS108L10_page_1)
    * [Page 2 - Sharding](#DS108L10_page_2)
    * [Page 3 - More Methods](#DS108L10_page_3)
    * [Page 4 - Key Terms](#DS108L10_page_4)
    * [Page 5 - Lesson 5 Hands On](#DS108L10_page_5)

    

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 1 - Overview of this Module<a class="anchor" id="DS108L10_page_1"></a>

[Back to Top](#DS108L10_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

```python
from IPython.display import VimeoVideo
# Tutorial Video Name: Sharding, More Methods and Project
VimeoVideo('245797657', width=720, height=480)
```

# Overview

In this last lesson, you will learn a few more in-depth NoSQL terms and methods. You will also work on an in-depth Lesson 5 Hands-On for NoSQL. It is time to dive right into Sharding.

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 2 - Sharding<a class="anchor" id="DS108L10_page_2"></a>

[Back to Top](#DS108L10_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Sharding

**Sharding** is a way to spread data across multiple machines and servers. MongoDB uses Sharding to support deployments and applications that contain huge data sets, because a single server may have trouble keeping up with a very large data set. There are _two_ ways to deal with a situation like this: *Vertical* or *Horizontal* Scaling.

---

## Vertical Scaling

**Vertical Scaling** involves ways to increase the capacity of a server, such as using a much more powerful CPU, adding more RAM, or increasing the amount of storage space. _Vertical Scaling_ has limits: a single machine can only hold so much CPU, RAM, and storage, and cloud-based providers cap the hardware configurations they offer.

---

## Horizontal Scaling

**Horizontal Scaling** is the process of spreading out the dataset between multiple servers and adding servers or storage as needed. Even if no single machine is especially fast, many machines working together may handle the application's load more efficiently than one high-end server. If the dataset expands, all that is needed is to add servers to handle the additional data. MongoDB supports _Horizontal Scaling_ through _Sharding_.
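
To make the idea of spreading a dataset across servers concrete, here is a minimal sketch in plain JavaScript. It is an illustration only and not how MongoDB routes documents: the shard names, the `userId` field, and the `simpleHash` function are all assumptions made up for this example.

```js
// Hypothetical shard names; a real cluster would have its own.
const shards = ["shardA", "shardB", "shardC"];

// Simple deterministic string hash (illustrative only, not MongoDB's hash).
function simpleHash(value) {
  let hash = 0;
  for (const ch of String(value)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Pick a shard for a key value: the same value always maps to the same shard.
function pickShard(shardKeyValue, shardList) {
  return shardList[simpleHash(shardKeyValue) % shardList.length];
}

// Group documents by the shard that would hold them.
function distribute(docs, keyField, shardList) {
  const placement = Object.fromEntries(shardList.map((name) => [name, []]));
  for (const doc of docs) {
    placement[pickShard(doc[keyField], shardList)].push(doc);
  }
  return placement;
}

// Twelve made-up documents spread over three shards.
const docs = Array.from({ length: 12 }, (_, i) => ({ userId: `user${i}` }));
const placement = distribute(docs, "userId", shards);
for (const [name, held] of Object.entries(placement)) {
  console.log(name, held.length);
}
```

Each server ends up holding only part of the data, which is the core idea behind horizontal scaling.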

---

## Enable Sharding

**Sharding** is configured at a very high level in your database, usually on the admin side of the database. The following command, run against the `admin` database, is used when you would like to shard a collection:

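```js
// shardCollection must be run against the admin database
db.adminCommand({
   shardCollection: "<database>.<collection>",  // namespace to shard
   key: <shardkey>,                             // shard key index specification
   unique: <boolean>,                           // optional
   numInitialChunks: <integer>,                 // optional
   collation: { locale: "simple" }              // optional
})
```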

As you can see, there are several fields available to you when running this command; `shardCollection` and `key` are required, and the remaining fields are optional. Now it's time to explore these parts:

* **shardCollection:** The namespace of the collection you would like to shard, written as `<database>.<collection>`. It will always be a string.

* **key:** The index specification document to use as the shard key. The shard key determines how MongoDB distributes the documents among the shards.

* **unique:** When true, the unique option ensures that the underlying index enforces a unique constraint. Hashed shard keys do not support unique constraints. Defaults to false.

* **numInitialChunks:** Specifies the number of chunks to initially create when sharding a collection that is empty with a hashed shard key. Then, MongoDB will create and balance chunks across the cluster. The `numInitialChunks` must be less than 8192 per shard.
  * MongoDB divides sharded data into chunks. Each chunk has an inclusive lower and exclusive upper range based on the shard key.

* **collation:** _Optional._ If the collection specified to shardCollection has a default collation, you must include a collation document with `{ locale : "simple" }`, or the shardCollection command fails. At least one of the indexes whose fields support the shard key pattern must have a simple collation.
  * Collation allows users to specify language-specific string comparison rules, such as letter case and accent marks.
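
The fields described above can be assembled into a concrete command document. The sketch below is plain JavaScript that only builds and prints the document; the `store.inventory` namespace, the `sku` field, and the `buildShardCollectionCommand` helper are assumptions invented for this example, not part of MongoDB.

```js
// Build a shardCollection command document from its parts.
// Hypothetical helper: it only assembles a plain object and runs nothing.
function buildShardCollectionCommand(namespace, key, options = {}) {
  // The namespace must be a string shaped like "<database>.<collection>".
  if (typeof namespace !== "string" || !/^[^.]+\..+$/.test(namespace)) {
    throw new Error('namespace must look like "<database>.<collection>"');
  }
  // The key must be a non-empty index specification document.
  if (key === null || typeof key !== "object" || Object.keys(key).length === 0) {
    throw new Error("key must be a non-empty index specification document");
  }
  const command = { shardCollection: namespace, key };
  // Copy only the optional fields that were actually supplied.
  for (const name of ["unique", "numInitialChunks", "collation"]) {
    if (options[name] !== undefined) command[name] = options[name];
  }
  return command;
}

// Assumed example: a hashed shard key on a made-up "sku" field.
const command = buildShardCollectionCommand(
  "store.inventory",
  { sku: "hashed" },
  { numInitialChunks: 4 }
);
console.log(JSON.stringify(command, null, 2));
```

In the shell, a document like this would be the argument passed to the command shown earlier. Note that `unique` is left out here because hashed shard keys do not support unique constraints.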

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p><b>Sharding</b> can get quite complicated quickly, but you now have a basic understanding of what sharding is and how you can accomplish it. The documentation on <b>Sharding</b> is extensive, so if you would like to read more about it, you can visit MongoDB's documentation website <a href="https://docs.mongodb.com/manual/sharding/" target="_blank">here</a>.</p>
    </div>
</div>


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 3 - More Methods<a class="anchor" id="DS108L10_page_3"></a>

[Back to Top](#DS108L10_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# More Methods

Now that you have made it this far in NoSQL, it is time to look into a few more available methods when working with a collection. Some of these methods can be in-depth, but it is good to know they are available to you.

---

## aggregate()

This method calculates aggregate values (such as totals, averages, or counts) for data in a collection. Below is the syntax:

```js
db.collectionName.aggregate(pipeline, options);
```

Below is a description of the parameters of the above query:

* **pipeline:** An array that is a sequence of data aggregation operations or stages.
    <div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>There are many pipeline stages, which you can read about <a href="https://docs.mongodb.com/v3.0/reference/operator/aggregation-pipeline/" target="_blank">here</a>.</p>
    </div>
    </div>

* **options:** _Optional_, additional documents that are passed in when using aggregate.
    <div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>There are many options available to the aggregate method, which you can read about <a href="https://docs.mongodb.com/v3.0/reference/method/db.collection.aggregate/#db.collection.aggregate">here</a>.</p>
    </div>
    </div>
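
To show what a pipeline of stages does, the sketch below writes out a two-stage pipeline and then emulates it on an in-memory array in plain JavaScript. The sample documents, the `category` and `qty` fields, and the `runPipelineSketch` function are assumptions for illustration; the emulation is not MongoDB's implementation.

```js
// Made-up documents standing in for a collection.
const inventory = [
  { item: "pen", category: "office", qty: 10 },
  { item: "stapler", category: "office", qty: 5 },
  { item: "mug", category: "kitchen", qty: 7 },
  { item: "plate", category: "kitchen", qty: 0 },
];

// The pipeline as it would be passed to aggregate():
// keep documents with qty > 0, then total qty per category.
const pipeline = [
  { $match: { qty: { $gt: 0 } } },
  { $group: { _id: "$category", totalQty: { $sum: "$qty" } } },
];
console.log(JSON.stringify(pipeline));

// Plain JavaScript emulation of those two stages, in order.
function runPipelineSketch(docs) {
  // Stage 1 ($match): filter documents.
  const matched = docs.filter((doc) => doc.qty > 0);
  // Stage 2 ($group): sum qty for each category.
  const totals = new Map();
  for (const doc of matched) {
    totals.set(doc.category, (totals.get(doc.category) || 0) + doc.qty);
  }
  return [...totals].map(([_id, totalQty]) => ({ _id, totalQty }));
}

const results = runPipelineSketch(inventory);
console.log(results);
```

Each stage receives the output of the stage before it, which is why the order of the stages in the array matters.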

---

## count()

This method counts and returns the number of documents that match a query; with no query, it counts every document in the collection. The syntax is below:

```js
db.collectionName.count();
```

For example, if you wanted to count the number of documents in your `inventory` collection, you would run the following:

```js
db.inventory.count();
```

The query above will return 10, or however many documents are currently in the `inventory` collection.

You could also run this query with a filter. Check to see how many of your app users in your `appusers` collection have an age greater than 20 by running the below query:

```js
db.appusers.count( { age: { $gt : 20 } } )
```

After running the above query, it should return the number 4 or a number close to it, depending on the changes you have made to that collection.
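
The filtered count above can be pictured as "keep the documents that match, then count them." The sketch below emulates that in plain JavaScript on made-up users; `countSketch` and `matches` are hypothetical helpers that support only equality and a few comparison operators. As a side note, newer MongoDB shells document `countDocuments()` as the replacement for `count()`, so check which one your version expects.

```js
// Made-up users standing in for the appusers collection.
const appusers = [
  { name: "Ana", age: 18 },
  { name: "Ben", age: 25 },
  { name: "Cam", age: 31 },
  { name: "Dee", age: 20 },
  { name: "Eli", age: 42 },
];

// The comparison operators this sketch understands.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// Return true when a document satisfies every condition in the filter.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object") {
      return Object.entries(condition).every(([op, value]) => {
        const compare = comparators[op];
        if (!compare) throw new Error(`unsupported operator: ${op}`);
        return compare(doc[field], value);
      });
    }
    return doc[field] === condition; // plain equality
  });
}

// Count matching documents; an empty filter counts everything.
function countSketch(docs, filter = {}) {
  return docs.filter((doc) => matches(doc, filter)).length;
}

console.log(countSketch(appusers));                        // all documents
console.log(countSketch(appusers, { age: { $gt: 20 } }));  // age greater than 20
```

Note that `$gt` is strict, so the user whose age is exactly 20 is not counted.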

---


## totalSize()

This method will return the total size in bytes of the data in the collection plus the size of every index on the collection.

If you run the query below, a number around 16000 will be returned based on what your collection currently contains:

```js
db.appusers.totalSize()
```
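
The arithmetic behind that number is a simple sum. The sketch below uses a hypothetical, simplified stats object with made-up sizes; the field names `storageSize` and `indexSizes` are assumed here to mirror the collection statistics output, so treat them as illustrative rather than authoritative.

```js
// Hypothetical, simplified stats for a small collection (sizes in bytes).
const stats = {
  storageSize: 8192,                        // space used by the data
  indexSizes: { _id_: 4096, age_1: 4096 },  // one entry per index
};

// Total size = data size + the size of every index.
function totalSizeSketch(collectionStats) {
  const indexTotal = Object.values(collectionStats.indexSizes)
    .reduce((sum, bytes) => sum + bytes, 0);
  return collectionStats.storageSize + indexTotal;
}

// 8192 + 4096 + 4096 = 16384
console.log(totalSizeSketch(stats));
```

Adding another index to the collection would raise the total even if no documents change.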

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>There are many more methods available to you, and some of them can be slightly complex. If you would like to read more about the methods available in NoSQL, visit MongoDB's documentation <a href="https://docs.mongodb.com/v3.0/reference/method/js-collection/" target="_blank">Collection Methods</a>.</p>
    </div>
</div>


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 4 - Key Terms<a class="anchor" id="DS108L10_page_4"></a>

[Back to Top](#DS108L10_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Key Terms

Below is a list of short descriptions of the important keywords you have learned in this lesson. Please read through and go back and review any concepts you don't fully understand. Great Work!

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Sharding</td>
        <td>Sharding is a way to spread data across multiple machines and servers. MongoDB uses Sharding to support deployments and applications that contain huge data sets, because a single server may have trouble keeping up with a very large data set. There are two ways to deal with a situation like this: <em>Vertical</em> or <em>Horizontal</em> Scaling.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Vertical Scaling</td>
        <td>Involves ways to increase the capacity of a server, such as using a much more powerful CPU, adding more RAM, or increasing the amount of storage space.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Horizontal Scaling</td>
        <td>The process of spreading out the dataset between multiple servers and increasing the storage to those servers as needed.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>aggregate()</td>
        <td>This method calculates aggregate values (such as totals, averages, or counts) for data in a collection.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>count()</td>
        <td>This method counts and returns the number of documents that match a query.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>totalSize()</td>
        <td>This method will return the total size in bytes of the data in the collection plus the size of every index on the collection.</td>
    </tr>
</table>

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 5 - Lesson 5 Hands On<a class="anchor" id="DS108L10_page_5"></a>

[Back to Top](#DS108L10_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


Welcome to the last project for the NoSQL course! Great job making it this far! This Hands-On differs from the previous Hands-On projects in a couple of ways. You will be putting together the numerous topics you have learned into one large project. It is designed to mimic real problems that you may face in your career, so it may be a challenge for you and will also take several hours.

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>Before beginning this hands-on, you may want to watch this <a href="https://vimeo.com/428206689"> recorded live workshop, "Winnie the Pooh and Databases Too," </a> that goes over a similar example. </p>
    </div>
</div>

Take this project step-by-step and be aware that the project description below is written to be a bit less specific than previous Hands-Ons. The Hands-On is supposed to challenge you to do some problem solving to figure out how to accomplish a task. You can always review past lessons or use a Google search if needed. Good luck!

<div class="panel panel-danger">
    <div class="panel-heading">
        <h3 class="panel-title">Caution!</h3>
    </div>
    <div class="panel-body">
        <p>Do not submit your project until you have completed all requirements! You will not be able to resubmit.</p>
    </div>
</div>

---

## Requirements

For this Hands-On, you will be working through several real-life scenarios within new collections. This Hands-On is structured into _two_ parts, and each part will ask you to run multiple queries. After each query, please take a screenshot and add it to a text document (or an equivalent) and name this file `Lesson5handson`. This way, you will be able to submit your answers to each part all at once.

---

## Part 1

You have just been hired at a startup company. The company currently has only ten employees, and they all need to be added to the database. So far, the employees have only been tracked in an Excel sheet. Your boss would like you to create a new collection in Atlas named `employees`. Take a look at the following data and the notes listed below before inserting any data:

<table class="table table-striped">
    <tr>
        <th>Name</th>
        <th>Birthday</th>
        <th>Address</th>
        <th>City</th>
        <th>State</th>
        <th>Position Name</th>
        <th>Remote</th>
        <th>Full Time</th>
    </tr>
    <tr>
        <td>Alison Davidson</td>
        <td>04/05/75</td>
        <td>874 W. Oak Place</td>
        <td>Gary</td>
        <td>Indiana</td>
        <td>Customer Support</td>
        <td>Yes</td>
        <td>Yes</td>
    </tr>
    <tr>
        <td>Henry Chapelton</td>
        <td>09/29/80</td>
        <td>9324 E. Vista Way</td>
        <td>Tempe</td>
        <td>Arizona</td>
        <td>Customer Support</td>
        <td>No</td>
        <td>Yes</td>
    </tr>
    <tr>
        <td>Alex Miller</td>
        <td>11/22/83</td>
        <td>244 Price Road</td>
        <td>Mesa</td>
        <td>Arizona</td>
        <td>Customer Support</td>
        <td>No</td>
        <td>No</td>
    </tr>
    <tr>
        <td>Carly Nielson</td>
        <td>08/04/87</td>
        <td>678 W. Westward Road</td>
        <td>Phoenix</td>
        <td>Arizona</td>
        <td>Office Manager</td>
        <td>No</td>
        <td>Yes</td>
    </tr>
    <tr>
        <td>Tom Talbot</td>
        <td>12/30/89</td>
        <td>12 Oakland Way</td>
        <td>Chandler</td>
        <td>Arizona</td>
        <td>Inventory Manager</td>
        <td>No</td>
        <td>Yes</td>
    </tr>
    <tr>
        <td>Mary Crawley</td>
        <td>07/06/80</td>
        <td>1010 Granite Way</td>
        <td>Charlotte</td>
        <td>North Carolina</td>
        <td>Human Resources</td>
        <td>Yes</td>
        <td>Yes</td>
    </tr>
    <tr>
        <td>Daisy Baxter</td>
        <td>09/09/87</td>
        <td>990 E. 84th St.</td>
        <td>Tempe</td>
        <td>Arizona</td>
        <td>CEO</td>
        <td>No</td>
        <td>Yes</td>
    </tr>
    <tr>
        <td>William Coyle</td>
        <td>10/11/91</td>
        <td>944 W. 16th St.</td>
        <td>Phoenix</td>
        <td>Arizona</td>
        <td>Intern</td>
        <td>No</td>
        <td>No</td>
    </tr>
    <tr>
        <td>Edith Bates</td>
        <td>07/28/90</td>
        <td>7 E. 20th Pl.</td>
        <td>Chandler</td>
        <td>Arizona</td>
        <td>Customer Support</td>
        <td>No</td>
        <td>Yes</td>
    </tr>
    <tr>
        <td>Gwen Harding</td>
        <td>10/11/86</td>
        <td>234 W. 48th. St.</td>
        <td>Phoenix</td>
        <td>Arizona</td>
        <td>Office Assistant</td>
        <td>No</td>
        <td>Yes</td>
    </tr>
</table>

**Notes:**

* The `Birthday` field should have a data type of Date.
* The `Position Name`, `Remote`, and `Full Time` fields should be within an embedded document called `position`.
* `Remote` and `Full Time` fields should have boolean values.
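
As a hedged illustration of the notes above, the sketch below converts one spreadsheet-style row into a document shape in plain JavaScript. The employee is made up (not one from the table), the field names are only one possible choice, and the helper functions are hypothetical. It also assumes that two-digit years belong to the 1900s, which fits the birthdays listed above.

```js
// Parse an "MM/DD/YY" string into a Date (midnight UTC).
// Assumption: two-digit years are in the 1900s.
function parseBirthday(text) {
  const [month, day, year] = text.split("/").map(Number);
  return new Date(Date.UTC(1900 + year, month - 1, day));
}

// Convert a "Yes"/"No" cell into a boolean.
function yesNoToBoolean(text) {
  return text.trim().toLowerCase() === "yes";
}

// Turn one row into a document with an embedded "position" document.
function rowToDocument(row) {
  return {
    name: row.name,
    birthday: parseBirthday(row.birthday),
    address: row.address,
    city: row.city,
    state: row.state,
    position: {
      name: row.positionName,
      remote: yesNoToBoolean(row.remote),
      fullTime: yesNoToBoolean(row.fullTime),
    },
  };
}

// Made-up example row.
const exampleRow = {
  name: "Jane Example",
  birthday: "01/15/85",
  address: "1 Sample St.",
  city: "Tempe",
  state: "Arizona",
  positionName: "Analyst",
  remote: "Yes",
  fullTime: "No",
};

const exampleDocument = rowToDocument(exampleRow);
console.log(exampleDocument);
```

Storing the birthday as a Date and the flags as booleans lets later queries compare and filter on them directly instead of matching text.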

It has been about a month since you inserted all employees into the database, and there have been a couple of changes to the company. The CEO no longer wants remote employees, so the remote employees have been transferred and now live in Arizona. Alison Davidson now lives at 777 E. 1st St. # 120, Tempe, AZ, and Mary Crawley now lives at 8322 W. Vista Pl., Scottsdale, AZ. Since all employees now live in Arizona, there is no need for a `State` field in this collection, so please remove it. Lastly, the company would like very efficient searching on the `position` field (remember that this field holds an embedded document with three other fields).

---

## Part 2

You are currently working for a company that wants to build an app similar to Spotify. Below is a list of data for different songs. Please insert this data into a new collection named `songs`.

<table class="table table-striped">
    <tr>
        <th>SongId</th>
        <th align="left">Title</th>
        <th align="left">Artist</th>
        <th align="left">Album</th>
        <th>ReleaseYear</th>
    </tr>
    <tr>
        <td>1</td>
        <td>Girls Just Want To Have Fun</td>
        <td>Cyndi Lauper</td>
        <td>She's So Unusual</td>
        <td>1983</td>
    </tr>
    <tr>
        <td>2</td>
        <td>Hips Don't Lie</td>
        <td>Shakira feat. Wyclef Jean</td>
        <td>Oral Fixation Vol. 2</td>
        <td>2006</td>
    </tr>
    <tr>
        <td>3</td>
        <td>Poker Face</td>
        <td>Lady Gaga</td>
        <td>The Fame</td>
        <td>2008</td>
    </tr>
    <tr>
        <td>4</td>
        <td>Wannabe</td>
        <td>Spice Girls</td>
        <td>Spice</td>
        <td>1996</td>
    </tr>
    <tr>
        <td>5</td>
        <td>California Gurls</td>
        <td>Katy Perry feat. Snoop Dogg</td>
        <td>Teenage Dream</td>
        <td>2010</td>
    </tr>
    <tr>
        <td>6</td>
        <td>Bye, Bye, Bye</td>
        <td>NSYNC</td>
        <td>No Strings Attached</td>
        <td>2000</td>
    </tr>
    <tr>
        <td>7</td>
        <td>I Will Always Love You</td>
        <td>Whitney Houston</td>
        <td>I Will Always Love You: The Best of Whitney Houston</td>
        <td>2012</td>
    </tr>
    <tr>
        <td>8</td>
        <td>Baby One More Time</td>
        <td>Britney Spears</td>
        <td>Baby One More Time</td>
        <td>1999</td>
    </tr>
    <tr>
        <td>9</td>
        <td>Vogue</td>
        <td>Madonna</td>
        <td>I'm Breathless</td>
        <td>1990</td>
    </tr>
    <tr>
        <td>10</td>
        <td>Rolling in the Deep</td>
        <td>Adele</td>
        <td>21</td>
        <td>2011</td>
    </tr>
    <tr>
        <td>11</td>
        <td>1234</td>
        <td>Feist</td>
        <td>The Reminder</td>
        <td>2007</td>
    </tr>
    <tr>
        <td>12</td>
        <td>Elastic Heart</td>
        <td>Sia</td>
        <td>The Hunger Games: Catching Fire Soundtrack</td>
        <td>2015</td>
    </tr>
    <tr>
        <td>13</td>
        <td>Oops! I Did It Again</td>
        <td>Britney Spears</td>
        <td>Oops! I Did It Again</td>
        <td>2000</td>
    </tr>
    <tr>
        <td>14</td>
        <td>Bad Romance</td>
        <td>Lady Gaga</td>
        <td>The Fame Monster</td>
        <td>2009</td>
    </tr>
    <tr>
        <td>15</td>
        <td>Lose Control</td>
        <td>Missy Elliott</td>
        <td>The Cookbook</td>
        <td>2005</td>
    </tr>
    <tr>
        <td>16</td>
        <td>U Can't Touch This</td>
        <td>MC Hammer</td>
        <td>Please Hammer, Don't Hurt 'Em</td>
        <td>1990</td>
    </tr>
    <tr>
        <td>17</td>
        <td>Thriller</td>
        <td>Michael Jackson</td>
        <td>Thriller</td>
        <td>1982</td>
    </tr>
    <tr>
        <td>18</td>
        <td>Single Ladies</td>
        <td>Beyonce</td>
        <td>I am... Sasha Fierce</td>
        <td>2008</td>
    </tr>
    <tr>
        <td>19</td>
        <td>Rhythm Nation</td>
        <td>Janet Jackson</td>
        <td>Janet Jackson's Rhythm Nation 1814</td>
        <td>1989</td>
    </tr>
</table>

**Notes:**

* The `artist`, `album`, and `releaseYear` fields should be an embedded document named `details`.
* Be sure that the `songId` and `releaseYear` fields have a type of number.

Next, your company has run into some things they would like to be changed within the database:

* The `title` field needs to be renamed to `songTitle`, so it is clearer to the developers working with the data.
* They would like the `artist` field moved outside the `details` document, while the `album` and `releaseYear` fields stay within that document.
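
To picture the target shape these two changes describe, the sketch below restructures a single made-up song in plain JavaScript. It only shows the before and after shapes of a document; `restructureSong` is a hypothetical helper, and finding the database update that produces this result is still part of the exercise.

```js
// Return a new document with "title" renamed to "songTitle" and
// "artist" moved out of the embedded "details" document.
function restructureSong(song) {
  const { title, details, ...rest } = song;
  const { artist, ...remainingDetails } = details;
  return { ...rest, songTitle: title, artist, details: remainingDetails };
}

// Made-up song in the original shape.
const before = {
  songId: 100,
  title: "Example Song",
  details: { artist: "Example Artist", album: "Example Album", releaseYear: 2001 },
};

const after = restructureSong(before);
console.log(after);
```

The original object is left untouched, so the two shapes can be compared side by side.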

<div class="panel panel-danger">
    <div class="panel-heading">
        <h3 class="panel-title">Caution!</h3>
    </div>
    <div class="panel-body">
        <p>Be sure to zip and submit your <code>Lesson5handson</code> text document when finished! You will not be able to resubmit, so be sure the screenshots for each part are included in this document.</p>
    </div>
</div>
