# NoSQL Databases - Quiz


## Introduction

In this lesson, you'll answer some common open-ended interview questions about NoSQL databases.


## Question 1

What is the difference between a NoSQL and a SQL database?  Which is better?

Write your answer below this line: 
_______________________________________________________________________________________________________________________________

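SQL (relational) databases are traditionally scaled vertically, by moving to a bigger server, while NoSQL databases are typically designed to scale horizontally, by adding more servers. SQL databases have a predefined schema, whereas NoSQL databases use a dynamic schema suited to unstructured or semi-structured data. Relational databases have often relied on powerful, specialized hardware for better performance, while NoSQL databases are designed to run on clusters of commodity hardware. Neither is better in general: NoSQL can work better when the data has little fixed structure and a whole record can be stored as a single document, while SQL is the stronger choice when data is highly structured and needs strict consistency and multi-row transactions.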
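Horizontal scaling usually means spreading records across several servers (shards) and routing each record to a shard by its key. The sketch below assumes a fixed number of shards and a simple 32-bit FNV-1a hash; the function names are hypothetical, and real databases also handle rebalancing and replication, which this omits.

```javascript
// Minimal sketch of hash-based sharding (horizontal scaling).
// Assumption: a fixed shard count; real systems rebalance when servers are added.

// 32-bit FNV-1a hash of a string key
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Pick the shard (server) responsible for a key
function shardFor(key, numShards) {
  return fnv1a(key) % numShards;
}

// Group keys by the shard that would store them
function distribute(keys, numShards) {
  const shards = Array.from({ length: numShards }, () => []);
  for (const key of keys) {
    shards[shardFor(key, numShards)].push(key);
  }
  return shards;
}

const keys = Array.from({ length: 1000 }, (_, i) => `user${i}`);
const shards = distribute(keys, 4);
console.log(shards.map((s) => s.length)); // number of keys per shard
```

Because the shard is computed from the key alone, any client can find a record without asking a central server; adding capacity means adding shards rather than buying a bigger machine.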





## Question 2

Describe a situation where a NoSQL database might be a better choice than a relational database, and explain your reasoning.

Write your answer below this line: 
_______________________________________________________________________________________________________________________________
The best way to determine which database is right for your business is to analyze what you need it to do. SQL is a good choice for any organization that will benefit from a predefined structure and set schemas, particularly if it requires multi-row transactions. It is also a good option when all data must be consistent with no room for error, as in accounting systems.

NoSQL is a good choice for companies that are growing rapidly and have no clear schema definitions. NoSQL offers much more flexibility than a relational database and is a solid option for companies that must analyze large quantities of data or that manage variable data structures.

### Examples
In the JSON document below, the first field is `student` and the second field is `class`.

```json
{ "student": "Walker Rowe",
  "class": "biology"
}
```
In SQL, you must first create a schema (a table definition) before you can add data to the database:

```sql
-- (100) is the maximum length; some databases, such as MySQL, require one
CREATE TABLE studentClasses (
    student varchar(100),
    class   varchar(100)
);
```
Here, `varchar` means a variable-length character string.

To add a row to that table:

```sql
INSERT INTO studentClasses (student, class)
VALUES ('Walker Rowe', 'biology');
```
With a NoSQL database, in this example MongoDB, you would use the database API to insert data like this:

```javascript
db.studentClasses.insertOne({ student: "Walker Rowe", class: "biology" })
```
With SQL, you can also combine sets of rows, for example with a union (all elements from two or more sets) or an intersection (the elements common to two or more sets).
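The set semantics can be sketched in JavaScript. This toy version works on plain arrays of values rather than table rows, and the class rosters are hypothetical; like SQL's `UNION` and `INTERSECT` without `ALL`, it returns distinct values.

```javascript
// Sketch of SQL-style UNION and INTERSECT on arrays of values.
// Like SQL UNION/INTERSECT (without ALL), duplicates are removed.

function union(a, b) {
  return [...new Set([...a, ...b])];
}

function intersect(a, b) {
  const inB = new Set(b);
  return [...new Set(a.filter((x) => inB.has(x)))];
}

// Hypothetical class rosters
const biology = ["Walker Rowe", "Ana Li", "Sam Park"];
const chemistry = ["Ana Li", "Jo Diaz"];

console.log(union(biology, chemistry));     // Walker Rowe, Ana Li, Sam Park, Jo Diaz
console.log(intersect(biology, chemistry)); // Ana Li
```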

The big breakthrough of relational databases was letting programmers do all this with easy-to-understand SQL syntax. Oracle then made further technological advances to enforce referential integrity and to improve performance by indexing fields and caching records. Referential integrity means that every reference from one table to another points to a row that actually exists, so there are no orphaned records: for example, no sales record refers to a product item that does not exist. This is how the database enforces the relationships between tables.
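A minimal sketch of what referential integrity guards against, using hypothetical `products` and `sales` arrays in place of tables: `findOrphans` finds sales whose `productId` has no matching product, and `insertSale` rejects such a row up front, roughly as a foreign-key constraint would. The names are illustrative, not any database's API.

```javascript
// Hypothetical tables: products (key: id) and sales (productId -> products.id)
const products = [
  { id: 1, name: "Widget" },
  { id: 2, name: "Gadget" },
];

const sales = [
  { saleId: 100, productId: 1, qty: 3 },
  { saleId: 101, productId: 2, qty: 1 },
  { saleId: 102, productId: 9, qty: 5 }, // no product 9: an orphaned record
];

// Return every sale whose productId has no matching product
function findOrphans(sales, products) {
  const productIds = new Set(products.map((p) => p.id));
  return sales.filter((s) => !productIds.has(s.productId));
}

// Like a foreign-key constraint: refuse to insert a sale for a missing product
function insertSale(sales, products, sale) {
  if (!products.some((p) => p.id === sale.productId)) {
    throw new Error(`Foreign key violation: product ${sale.productId} does not exist`);
  }
  sales.push(sale);
}

console.log(findOrphans(sales, products).map((s) => s.saleId)); // [ 102 ]
```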

Note that relational programmers, such as Oracle developers, would call the `studentClasses` table from the example above an intersection table (also known as a junction table), because from it you can determine both which classes a student takes and which students are in each class. In that design, details such as the student's phone number and the class's room number would be kept in separate student and class tables rather than repeated in `studentClasses`.
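The intersection table can be modeled as a list of (student, class) pairs and queried in both directions. This is a hypothetical in-memory sketch; a real database would index both columns instead of scanning.

```javascript
// Each row links one student to one class (hypothetical data)
const studentClasses = [
  { student: "Walker Rowe", class: "biology" },
  { student: "Walker Rowe", class: "chemistry" },
  { student: "Ana Li", class: "biology" },
];

// Which classes does a student take?
function classesOf(student) {
  return studentClasses.filter((r) => r.student === student).map((r) => r.class);
}

// Which students are in a class?
function studentsIn(className) {
  return studentClasses.filter((r) => r.class === className).map((r) => r.student);
}

console.log(classesOf("Walker Rowe")); // [ 'biology', 'chemistry' ]
console.log(studentsIn("biology"));    // [ 'Walker Rowe', 'Ana Li' ]
```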

The Oracle database is a row-oriented database: data is grouped into rows and columns, and each row's values are stored together. Column-oriented databases such as Cassandra differ from it more in architecture than in concept, so the gap is not as fundamental as SQL versus NoSQL. In particular, Cassandra groups similar columns of data near each other so they can be retrieved as fast as possible. Cassandra and other NoSQL databases also drop database normalization, which is key to Oracle, as explained below. And they do not store empty column values, so row lengths can differ.
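A simplified in-memory sketch of the two layouts, plus sparse rows that skip empty values. The data and helper names are hypothetical, and this does not reflect how Oracle or Cassandra actually lay out bytes on disk.

```javascript
// Row-oriented: each record's values are kept together
const rows = [
  { id: 1, name: "Walker Rowe", phone: "555-0100" },
  { id: 2, name: "Ana Li", phone: null }, // the empty value still occupies a slot
];

// Column-oriented: each column's values are kept together
function toColumns(rows) {
  const columns = {};
  for (const row of rows) {
    for (const [col, value] of Object.entries(row)) {
      if (!columns[col]) columns[col] = [];
      columns[col].push(value);
    }
  }
  return columns;
}

// Sparse rows: drop null/undefined values, so row lengths can differ
function toSparse(rows) {
  return rows.map((row) =>
    Object.fromEntries(Object.entries(row).filter(([, v]) => v != null))
  );
}

const cols = toColumns(rows);
console.log(cols.name);         // [ 'Walker Rowe', 'Ana Li' ]
console.log(toSparse(rows)[1]); // { id: 2, name: 'Ana Li' }
```

Reading one column across many rows touches a single array in the columnar layout, which is why column grouping helps scans over a few fields.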

### Efficiency and Normalization
One thing that relational databases like Oracle stress is the relationships between objects: all data should be normalized, which means no data should be stored twice. So instead of putting, for example, the school address in every student record, it is better to maintain a school table and store the address there. NoSQL databases have relaxed this constraint to a certain degree.

Disk space and memory were expensive in the 1970s, so normalization made sense. But a join operation, which brings together a record stored across different tables into one logical unit, takes time. Normalized designs also carry the overhead of maintaining index files and writing to them as data is added or deleted.

NoSQL proponents argue that this matters less now that disk space and memory are cheap. In the example above, they would say it is fine to put the school address directly in each student record. This speeds up data retrieval, because no join is needed, and makes coding easier, at the cost of storing the address many times and having to update every copy when it changes.
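The trade-off can be sketched with hypothetical school and student data: the normalized version stores the address once and needs a lookup (a join) to read it, while the denormalized version embeds a copy in every student document, so reads are simple but updates must touch every copy. All names here are illustrative.

```javascript
// Normalized: the school's address is stored once
const schools = [{ id: 1, name: "Central High", address: "1 Main St" }];
const students = [
  { id: 10, name: "Walker Rowe", schoolId: 1 },
  { id: 11, name: "Ana Li", schoolId: 1 },
];

// Reading a full student record needs a join on schoolId
function studentWithSchool(studentId) {
  const student = students.find((s) => s.id === studentId);
  const school = schools.find((sc) => sc.id === student.schoolId);
  return { ...student, schoolAddress: school.address };
}

// Denormalized: each student document embeds its own copy of the address
const studentDocs = [
  { id: 10, name: "Walker Rowe", school: { name: "Central High", address: "1 Main St" } },
  { id: 11, name: "Ana Li", school: { name: "Central High", address: "1 Main St" } },
];

// Reads need no join, but changing the address means updating every copy
function updateSchoolAddress(docs, schoolName, newAddress) {
  for (const doc of docs) {
    if (doc.school.name === schoolName) doc.school.address = newAddress;
  }
}

console.log(studentWithSchool(11).schoolAddress); // 1 Main St
console.log(studentDocs[0].school.address);       // 1 Main St
```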

### NoSQL vs SQL
Oracle’s largest competitor in the business market is SAP, which has its own database, SAP HANA. One notable difference between HANA and Oracle is that HANA keeps all its records in memory (flushing them to disk as needed) for speed. Regardless, it is still an RDBMS.

One obstacle is that it is difficult to make the case for switching to NoSQL databases in business applications that have been running for decades, or to propose them for new applications when companies already have RDBMS expertise. Oracle has also solved management issues, such as data replication, that could leave a team using, for example, Elasticsearch without support and with a downed system. To fill that gap, some companies have taken over the support of, and sometimes most of the programming for, open-source databases like Elasticsearch. If you want support for Elasticsearch, for example, you can buy support and a supported version from Elastic.

The other obstacle is the paradigm shift for transactional systems. It is easy to conceive of adding a sale to a sales database. Oracle can then calculate on-hand inventory automatically using a saved SQL query called a view. In MongoDB, by contrast, you would typically have to work through the inventory items and subtract the sales yourself (in application code or an aggregation pipeline) to determine the new on-hand inventory.
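A sketch of the on-hand calculation as application code, assuming hypothetical `inventory` (quantity received per item) and `sales` arrays; in an RDBMS the same logic could be saved once as a view and queried like a table.

```javascript
// Hypothetical data: quantity received per item, and individual sales
const inventory = [
  { item: "widget", received: 100 },
  { item: "gadget", received: 40 },
];

const sales = [
  { item: "widget", qty: 30 },
  { item: "gadget", qty: 5 },
  { item: "widget", qty: 20 },
];

// On hand = quantity received minus total quantity sold, per item
function onHand(inventory, sales) {
  const sold = new Map();
  for (const { item, qty } of sales) {
    sold.set(item, (sold.get(item) || 0) + qty);
  }
  return Object.fromEntries(
    inventory.map(({ item, received }) => [item, received - (sold.get(item) || 0)])
  );
}

console.log(onHand(inventory, sales)); // { widget: 50, gadget: 35 }
```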

### Some Common NoSQL Databases
If you read the use cases for NoSQL databases, you will find that they tend to be adopted for niche workloads rather than as core enterprise systems. For example, Uber uses Cassandra to keep track of drivers. But its needs are unusual, including the need to write millions of records per second across multiple data centers. Uber even adapted Cassandra to run on Apache Mesos, a cluster manager that, like Kubernetes, can orchestrate containers across many machines.

Amazon markets its DynamoDB database as having “millisecond latency.” Amazon also describes it simply as a nonrelational database.

DynamoDB, like MongoDB, can be used from JavaScript (through the AWS SDK), so you can work with it in that relatively simple language as well. For example, to add a record, you first create a document client and then put the item:

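```javascript
const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
const docClient = new AWS.DynamoDB.DocumentClient();
// put() takes the table name (assumed here) and the item as a plain object
docClient.put({ TableName: "studentClasses", Item: { /* JSON … */ } }, (err) => { if (err) console.error(err); });
```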
One implementation detail is that you can run these operations against MongoDB and DynamoDB from Node.js, which runs JavaScript in the middle tier (on the server), so you do not need to build JAR files or deploy to middleware servers such as Oracle WebLogic.

So, which should you use for your new project? Your accounting system could very well continue to run on an RDBMS, though there are alternatives to paying Oracle licensing fees, such as MySQL. Will it move to MongoDB? That is not very likely in the short term, since millions of programmers around the world use Java and Oracle, and project managers and users understand that stack. Consider Elasticsearch for logs and Spark for analytics. Evaluate the others case by case to see which works best given your resources, skills, and tolerance for lost transactions.

### Conclusion
No matter what field you are in, choosing the correct database for your organization is an important decision. NoSQL databases have become a major part of the database landscape. Their benefits include lower cost, open-source availability, and easier horizontal scaling, which make them an appealing option for big data workloads. They are a younger technology, however, so they are less mature and change more quickly than relational databases.

SQL databases, on the other hand, have proven themselves over more than 40 years and rely on long-established, well-defined standards. They have a large community of experts behind them and a deep pool of shared knowledge and tooling.

Overall, the choice between SQL and NoSQL is not black and white; it requires comparing the two against your specific needs. With proper research and preparation, you can choose a database that manages your organization's data efficiently.





## Question 3

How does MongoDB work? How is it different from a relational database?

Write your answer below this line: 
_______________________________________________________________________________________________________________________________


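MongoDB is a document database. It stores data as documents, which are JSON-like records of field/value pairs (kept internally in a binary encoding called BSON), and groups related documents into collections, much as a relational database groups rows into tables. The immediate and fundamental difference between MongoDB and an RDBMS is this underlying data model: a relational database structures data into tables with fixed columns, while documents in a MongoDB collection do not have to share the same fields, and related data can be embedded in a single document instead of being spread across tables and joined at query time. JSON is a self-describing, human-readable data format.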
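A toy sketch of the document model, assuming a hypothetical in-memory collection. The `find` helper here only supports exact-match filters and is not the MongoDB driver API, though it mirrors the idea of querying documents by field values.

```javascript
// A "collection" of JSON-like documents; fields can differ between documents
const students = [
  { _id: 1, name: "Walker Rowe", classes: ["biology", "chemistry"] },
  { _id: 2, name: "Ana Li", phone: "555-0100" }, // different fields: no fixed schema
];

// Return the documents whose fields equal every value in the filter
function find(collection, filter) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  );
}

// The class list is embedded in the document, so no join is needed to read it
console.log(find(students, { name: "Walker Rowe" })[0].classes); // [ 'biology', 'chemistry' ]
```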





## Question 4

What is fault tolerance, and what does it have to do with Resilient Distributed Datasets?

Write your answer below this line: 
_______________________________________________________________________________________________________________________________
Basically, a "fault-tolerant" system is one that can keep operating when some of its parts fail.
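A Resilient Distributed Dataset (RDD), Spark's core data abstraction, splits a dataset into partitions spread across the nodes of a cluster. Spark records each RDD's lineage, that is, the sequence of transformations used to build it from its source data. If a node fails and some partitions are lost, Spark recomputes just those partitions from the lineage, so the job can continue normally. RDDs can optionally be persisted with replicated copies as well, but lineage-based recomputation is what makes them fault tolerant ("resilient").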
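A toy sketch of lineage-based recovery. `ToyRDD` is hypothetical and not the Spark API: for simplicity it computes partitions eagerly (Spark evaluates lazily), records each transformation in its lineage, and rebuilds a lost partition by replaying that lineage on the source data.

```javascript
// Toy model of lineage-based fault tolerance (not the Spark API)
class ToyRDD {
  constructor(sourcePartitions, transforms = []) {
    this.source = sourcePartitions; // stable input, e.g. blocks stored in HDFS
    this.transforms = transforms;   // lineage: per-partition steps, in order
    // Computed eagerly for simplicity; real Spark is lazy
    this.cache = sourcePartitions.map((_, i) => this.compute(i));
  }

  // Each transformation returns a new dataset with a longer lineage
  map(fn) {
    return new ToyRDD(this.source, [...this.transforms, (part) => part.map(fn)]);
  }

  filter(fn) {
    return new ToyRDD(this.source, [...this.transforms, (part) => part.filter(fn)]);
  }

  // Rebuild partition i by replaying the lineage on its source data
  compute(i) {
    return this.transforms.reduce((part, step) => step(part), this.source[i]);
  }

  // Simulate losing a partition when a node fails
  lose(i) {
    this.cache[i] = undefined;
  }

  // Gather all partitions, recomputing any that were lost
  collect() {
    return this.cache.flatMap((part, i) => {
      if (part === undefined) this.cache[i] = this.compute(i);
      return this.cache[i];
    });
  }
}

const nums = new ToyRDD([[1, 2, 3], [4, 5, 6]]);
const evenSquares = nums.filter((n) => n % 2 === 0).map((n) => n * n);
evenSquares.lose(1); // partition 1 is lost on a failed node
console.log(evenSquares.collect()); // [ 4, 16, 36 ]
```

Only the lost partition is recomputed; nothing had to be copied to another node in advance.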







## Question 5

What is MapReduce? How is it related to Hadoop?

Write your answer below this line: 
_______________________________________________________________________________________________________________________________
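MapReduce is a programming model for processing large data sets in parallel: a map function turns each input record into intermediate key/value pairs, the framework groups (shuffles) those pairs by key, and a reduce function combines the values for each key into a result. Hadoop MapReduce (Hadoop Map/Reduce) is a software framework that implements this model for distributed processing of large data sets on compute clusters of commodity hardware. It is one of the core modules of the Apache Hadoop project and typically reads its input from, and writes its output to, HDFS. The framework takes care of scheduling tasks, monitoring them, and re-executing any failed tasks.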
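A single-process sketch of the three phases, using word count, the classic MapReduce example. The function names are hypothetical and this is not the Hadoop API; Hadoop runs the same phases in parallel across a cluster.

```javascript
// Map: turn one input record (a line) into intermediate [key, value] pairs
function mapPhase(line) {
  return line.toLowerCase().split(/\W+/).filter(Boolean).map((word) => [word, 1]);
}

// Shuffle: group all intermediate values by key
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce: combine the values for one key into a single result
function reducePhase(key, values) {
  return [key, values.reduce((a, b) => a + b, 0)];
}

// Run map over every line, shuffle the pairs, then reduce each group
function mapReduce(lines) {
  const pairs = lines.flatMap((line) => mapPhase(line));
  const groups = shuffle(pairs);
  return Object.fromEntries([...groups].map(([k, vs]) => reducePhase(k, vs)));
}

console.log(mapReduce(["the cat sat", "the dog sat"])); // { the: 2, cat: 1, sat: 2, dog: 1 }
```

Because each map call and each reduce call is independent, the framework can spread them over many machines and simply rerun any task that fails.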



## Summary


In this lesson, we reviewed some common NoSQL interview questions. 