<a href="https://colab.research.google.com/github/pallavibekal/Data-Engineering-Spark-and-Hadoop-Code/blob/main/NoSQL_databases_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Information

### Introduction

NoSQL ('Not only SQL') is a non-relational or distributed database system. It has a dynamic schema, best suited for hierarchical data storage. It is horizontally scalable.

**SQL databases** are queried and manipulated with Structured Query Language (SQL). They are powerful and versatile, but they can also be restrictive: a relational database requires a predefined schema that fixes the structure of the data before you work with it, and all rows in a table must follow that structure. This can require significant up-front design work.

A **NoSQL database** has a dynamic schema suited to unstructured data. It can store data in several ways: document-oriented, column-oriented, graph-based, or as key-value pairs. This flexibility means that documents can be created without a previously defined structure, each document can have its own structure, and you can add fields as you go. The query syntax varies from database to database.
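To make the idea of a dynamic schema concrete, here is a minimal sketch that uses plain Python dictionaries as stand-ins for documents. The collection name, field names, and values are made up for illustration and do not come from the course dataset.

```python
# Two "documents" in the same hypothetical collection; their field sets differ.
students_collection = [
    {"studentID": 1, "name": "Asha", "marks": 91},
    {"studentID": 2, "name": "Ravi", "marks": 78, "clubs": ["chess", "robotics"]},
]

# A new field can be added to one document without touching the others.
students_collection[0]["email"] = "asha@example.com"

# Collect the field names each document actually carries.
fields_per_document = [sorted(doc) for doc in students_collection]
for fields in fields_per_document:
    print(fields)
```

In a relational table, adding `email` or `clubs` would mean altering the table definition for every row.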

Here are some key comparisons between SQL and NoSQL:

- **Scalability**: SQL databases are typically scaled vertically, that is, a single server handles more load by adding RAM, CPU, or SSD capacity. NoSQL databases are horizontally scalable, that is, they handle more traffic by sharding data across additional servers.
- **Structure**: SQL databases are table-based. NoSQL databases are key-value stores, document stores, graph databases, or wide-column stores. Relational SQL databases are a better option for applications that require multi-row transactions, such as an accounting system.
- **Property**: SQL databases follow the ACID properties (Atomicity, Consistency, Isolation, and Durability), whereas the design of NoSQL databases is usually discussed in terms of Brewer's CAP theorem (Consistency, Availability, and Partition tolerance), which states that a distributed system cannot guarantee all three at once.
- **Examples**: Examples of SQL databases include PostgreSQL, MySQL, Oracle and Microsoft SQL Server. NoSQL database examples include Cassandra, MongoDB, BigTable, HBase, Neo4j and CouchDB.
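The sharding mentioned under scalability can be sketched as follows. This is a simplified illustration that assumes a hash-modulo placement rule; `shard_for` is a hypothetical helper, and real databases generally use more elaborate schemes such as consistent hashing so that adding a server does not move most keys.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Spread 1000 made-up keys over three shards and count how many land on each.
keys = [f"student-{i}" for i in range(1000)]
counts = [0, 0, 0]
for key in keys:
    counts[shard_for(key, 3)] += 1
print(counts)
```

Because the hash is stable, the same key always maps to the same shard, so reads know where to look.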

**Why do we need NoSQL databases?**

We see two primary reasons why people consider using a NoSQL database.
* Productivity in application development: A lot of application development effort is spent on mapping data between in-memory data structures and a relational database. A NoSQL database may provide a data model that better fits the application’s needs, thus simplifying that interaction and resulting in less code to write, debug, and evolve.
* Large-scale data: Organizations are finding it valuable to capture more data and process it more quickly. They are finding it expensive, if even possible, to do so with relational databases. The primary reason is that a relational database is typically designed to run on a single machine, whereas it is usually more economical to run large data and computing loads on clusters of many smaller and cheaper machines. Many NoSQL databases are designed explicitly to run on clusters, so they are a better fit for big data scenarios.

**Terminology**

The basic terms related to NoSQL databases are as follows:

* **Big data:** a collection of data that is huge in volume, yet growing exponentially with time.
* **Polyglot persistence:** a term that refers to using different data stores in different circumstances.
* **Database cluster:** a collection of databases that is managed by a single instance of a running database server. 

### Types of NoSQL Databases

The following are the different types of NoSQL databases:

* **Document databases** pair each key with a complex data structure known as a document. A document is a set of key-value pairs. MongoDB is an example of a document store database. A group of MongoDB documents is known as a collection. This is the equivalent of an RDBMS table.

* **Graph stores** are used to store information about networks of data, for instance, social connections. Graph stores include Neo4j and Giraph.

* **Key-value stores** store every single item in the database as a key together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as an integer, which adds functionality.

* **Wide-column** stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.

<figure>
<img src='https://cdn.iisc.talentsprint.com/CDS/Images/Nosql_databases.png' />
</figure>

### Setup Steps:

In [None]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "2200092" #@param {type:"string"}

In [None]:
#@title Please enter your password (your registered phone number) to continue: { run: "auto", display-mode: "form" }
password = "9686800288" #@param {type:"string"}

In [None]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()
  
notebook= "M5_AST_01_NoSQL_databases_C" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")  
    ipython.magic("sx wget https://cdn.iisc.talentsprint.com/CDS/Datasets/students.csv")
    ipython.magic("sx wget https://cdn.iisc.talentsprint.com/CDS/MiniProjects/secure-connect-cds.zip")
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None
    
    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      # Read the notebook file and close it automatically afterwards
      with open(notebook + ".ipynb", "rb") as f:
        file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:        
        print(r["err"])
        return None   
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://cds.iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else: return submission_id
    

def getAdditional():
  try:
    if not Additional: 
      raise NameError
    else:
      return Additional  
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None
  
  
# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None
  
def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None
  

def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError 
    else: 
      return Answer
  except NameError:
    print ("Please answer Question")
    return None
  

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup() 
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


### Importing required packages

In [None]:
import pandas as pd
from pprint import pprint

### Loading the data

In [None]:
students = pd.read_csv('students.csv')

### Cassandra

Apache Cassandra is a free, open-source, distributed, wide-column NoSQL database management system designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure. It was originally developed at Facebook and lets you run queries efficiently over large amounts of structured and semi-structured data.

Cassandra has three containers, one within another. The outermost container is Keyspace. You can think of Keyspace as a database in the RDBMS land. Next, you will see the column family, which is like a table. Within a column family are columns, and columns are placed under rows. Each row is identified by a unique row key, similar to the primary key in RDBMS.

<figure>
<img src='https://cdn.iisc.talentsprint.com/CDS/Datasets/cassandra-data-model.ppm' />
</figure>

The difference from RDBMS is in the way Cassandra treats the data. Column families, unlike tables, can be schema free (schema optional). This means we can have different column names for different rows within the same column family. We can store about two billion columns per row. This means it can be very handy to store time-series data, such as tweets or comments on a blog post.

Besides Cassandra, HBase is also a wide-column store. For their similarities and differences, see [this comparison](https://www.scnsoft.com/blog/cassandra-vs-hbase).

#### Components of Cassandra

Cassandra consists of the following components:

<figure>
<img src='https://cdn.iisc.talentsprint.com/CDS/Images/cassandra_cluster.jpg' width= 600 px/>
</figure>

**Node**: The place where data is stored. It is the basic component of Cassandra.

**Data Center**: A collection of related nodes.

**Cluster**: A collection of one or more data centers.

**Commit Log**: Every write operation is written to the commit log, which is used for crash recovery.

**Mem-table**: After data is written to the commit log, it is written to the mem-table, an in-memory structure that holds the data temporarily.

**SSTable**: When the mem-table reaches a certain threshold, its data is flushed to an SSTable file on disk.
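The write path described by the last three components can be sketched with a toy model. This is only an illustration under simplifying assumptions (a flush threshold counted in entries, no compaction, no real disk I/O); `ToyNode` and its methods are hypothetical names and not part of Cassandra or its driver.

```python
class ToyNode:
    """Toy model of a single node's write path; not Cassandra's actual code."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory table holding the most recent writes
        self.sstables = []     # immutable flushed snapshots, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log the write for crash recovery
        self.memtable[key] = value             # 2. then update the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. flush once the threshold is reached

    def flush(self):
        # Keep keys in sorted order, as a sorted-string table would.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):   # check the newest snapshot first
            if key in sstable:
                return sstable[key]
        return None

node = ToyNode(memtable_limit=3)
for student_id, marks in [(1, 91), (2, 78), (3, 85), (4, 66)]:   # made-up writes
    node.write(student_id, marks)
print(len(node.commit_log), node.memtable, node.sstables)
```

After the third write the mem-table is flushed, so the fourth write is the only entry still held in memory.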

#### Data Replication

Hardware can fail or a network link can go down at any time, so a backup is needed when a problem occurs. Cassandra therefore replicates data so that there is no single point of failure.

Cassandra places replicas of data on different nodes based on two factors:
* The Replication Strategy determines where the next replica is placed.
* The Replication Factor determines the total number of replicas placed on different nodes.

A replication factor of one means that there is only a single copy of the data, while a replication factor of three means that there are three copies of the data on three different nodes.
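As a rough illustration of how a strategy and a factor interact, the sketch below places nodes on a hash ring and walks clockwise from a key's position, which is the idea behind Cassandra's `SimpleStrategy`. It is a simplification: the MD5-based `token` function is an assumption made for the example (Cassandra's default partitioner is Murmur3), there are no virtual nodes, and the node names are made up.

```python
import hashlib
from bisect import bisect_right

def token(value: str) -> int:
    """Hash a string onto a 0..2**32-1 ring (illustrative, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (2 ** 32)

def replicas_for(partition_key: str, nodes: list, replication_factor: int) -> list:
    """Pick the first node clockwise from the key's token, then the following distinct nodes."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    start = bisect_right(tokens, token(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))   # cannot have more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d"]
single = replicas_for("studentID=2", nodes, replication_factor=1)
triple = replicas_for("studentID=2", nodes, replication_factor=3)
print(single)
print(triple)
```

With a factor of three, the loss of any one node still leaves two copies of the row available.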

To learn more about data replication, see the [DataStax documentation](https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/architecture/archDataDistributeReplication.html#:~:text=and%20fault%20tolerance.-,A%20replication%20strategy%2A).

#### Installs

In [None]:
!pip install cassandra-driver

Collecting cassandra-driver
  Downloading cassandra_driver-3.25.0-cp37-cp37m-manylinux1_x86_64.whl (3.8 MB)
     |████████████████████████████████| 3.8 MB 16.4 MB/s 
Collecting geomet<0.3,>=0.1
  Downloading geomet-0.2.1.post1-py3-none-any.whl (18 kB)
Installing collected packages: geomet, cassandra-driver
Successfully installed cassandra-driver-3.25.0 geomet-0.2.1.post1


In [None]:
import cassandra
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

In [None]:
print(cassandra.__version__)

3.25.0


#### Connecting the database

**DataStax**

DataStax, Inc. is a data management company based in Santa Clara, California.
Its product provides commercial support, software, and cloud database-as-a-service based on open source NoSQL database Apache Cassandra.

We will be using its free tier version here.

**Important:** Datastax account and keyspace creation steps provided below are encouraged but not mandatory. It will allow you to create your own cluster, perform data insertion and code execution steps end-to-end using your own credentials. Note that we have already inserted the data and provided the cluster connection through the CDS account in Datastax for the purpose of running this notebook.

**For detailed instructions on account creation and keyspace creation**, please refer to this [document](https://cdn.iisc.talentsprint.com/CDS/DB_Connect_Docs/Instruction_for_Astra_Datastax_Database_Creation.pdf)

**Astra Datastax login:** Login to [Datastax](https://www.datastax.com/) and create a database


**Connect the database and create keyspace:**

* Download the Secure Connect Bundle zip file from the Datastax [connect](https://docs.datastax.com/en/astra/docs/obtaining-database-credentials.html) section, following the instructions on that page.
* Upload the `Secure-connect-XXXX.zip` file downloaded from Datastax.
* Generate the token and save the credentials (.csv) from the settings section.
    * Hint: Select the role admin-user and generate the token.
* Using the credentials generated in settings, assign the Client ID and Client Secret to the variables below.

Set the path of the Secure Connect Bundle zip file in `zip_path`, and specify `Client_ID` and `Client_Secret`:

In [None]:
zip_path = '/content/secure-connect-cds.zip'
Client_ID = 'SzdMZDsXLvXUQiHRsEogQgtR'
Client_Secret = 'SaYcoWaejFAx4CxXzuf1spOMRa+t1oyTd8Z+Medbuba1q0Ww5AY,1MOPNvrr9wWSnR82,IiQa4muFoF8OfOhxdndNXtZbuZsv.dSwKKccUaHr96B8-88gyAWGURFO2Wa'

#### Create a Cluster instance to connect to your Astra database

You will typically have one instance of Cluster for each Cassandra cluster you want to interact with. Create a session object using the cluster.

In [None]:
cloud_config= {
        'secure_connect_bundle': zip_path
}
auth_provider = PlainTextAuthProvider(Client_ID,  Client_Secret)
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)

In [None]:
session = cluster.connect()

#### Verifying the database connection

In [None]:
row = session.execute("select release_version from system.local").one()
if row:
    print(row[0])
else:
    print("An error occurred.")

#### Setting the Key Space in database

A keyspace is the top-level database object that controls the replication for the object it contains at each datacenter in the cluster. Keyspaces contain tables, materialized views and user-defined types, functions and aggregates. Typically, a cluster has one keyspace per application. Since replication is controlled on a per-keyspace basis, store data with different replication requirements (at the same datacenter) in different keyspaces.

Before creating tables and inserting data, let us create and set the keyspace:
* We can create the keyspace either manually on the Datastax dashboard or with a CQL command ([hint](https://docs.datastax.com/en/cql-oss/3.x/cql/cql_reference/cqlCreateKeyspace.html)).
* Once the keyspace has been created, select it with the session's `set_keyspace()` method.

In [None]:
try:
    session.set_keyspace('ast_student')
except Exception as e:
    print(e)
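For the CQL route to keyspace creation, a statement can be assembled as in the sketch below. `create_keyspace_cql` is a hypothetical helper written for this example, and it assumes the `SimpleStrategy` replication class; a managed service may restrict `CREATE KEYSPACE` and require the dashboard instead, so treat this as an illustration of the syntax rather than a guaranteed path.

```python
def create_keyspace_cql(name: str, replication_factor: int = 1) -> str:
    """Build a CREATE KEYSPACE statement using SimpleStrategy (hypothetical helper)."""
    if not name.isidentifier():
        # Keyspace names cannot be bound as parameters, so validate before formatting.
        raise ValueError(f"unsafe keyspace name: {name!r}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {int(replication_factor)}}};"
    )

statement = create_keyspace_cql("ast_student", replication_factor=1)
print(statement)
# With a live connection, the statement would be run as: session.execute(statement)
```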

#### Insert the data into Database

In [None]:
# Display few rows of students dataframe
students.head()

For the following data insertion step we have already uploaded the data on Datastax using the CDS account, so you are not required to insert the data again. Therefore we have commented out the below lines of code.

However, if you would like to perform the data insertion step then please **create your own account** on Datastax as given in the reference [here](https://cdn.iisc.talentsprint.com/CDS/DB_Connect_Docs/Assignment_Datastax_Connect.pdf) and change the credentials and run the below code by uncommenting it.

**Create a column family in keyspace and insert the data using CQL command**

In [None]:
# Creating the students table
# query = """CREATE TABLE  IF NOT EXISTS students (studentID INT,
#                                                 name TEXT,
#                                                 age INT,
#                                                 marks INT,
#                                                 PRIMARY KEY (studentID)); """
# try:
#     session.execute(query)
# except Exception as e:
#     print(e)

In [None]:
#students_cols = ','.join(students.columns.values)
#for (i,row) in students.iterrows():
#    query = 'INSERT INTO ast_student.students ({}) VALUES (%s, %s, %s, %s)'.format(students_cols)
#    session.execute(query, tuple(row))
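The insertion loop above can also be written as a small function that builds the query text once and passes each row as parameters. This is a sketch: `insert_rows` and `RecordingSession` are hypothetical names, the sample rows are made up, and the stand-in session only records calls so that the example runs without a cluster. A real driver session's `execute(query, parameters)` would be passed in its place.

```python
def insert_rows(session, table, columns, rows):
    """Insert each row with a parameterized statement and return the query text."""
    placeholders = ", ".join(["%s"] * len(columns))
    query = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    for row in rows:
        session.execute(query, tuple(row))
    return query

class RecordingSession:
    """Stand-in for a driver session so the sketch runs without a cluster."""

    def __init__(self):
        self.calls = []

    def execute(self, query, parameters=None):
        self.calls.append((query, parameters))

fake_session = RecordingSession()
columns = ["studentID", "name", "age", "marks"]
rows = [(1, "Asha", 21, 91), (2, "Ravi", 22, 78)]   # made-up sample rows
query = insert_rows(fake_session, "ast_student.students", columns, rows)
print(query)
```

Passing values as parameters, rather than formatting them into the string, leaves quoting and type conversion to the driver.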

#### Querying the database

Select first 5 rows of the students table

In [None]:
query = 'SELECT * FROM ast_student.students LIMIT 5;'
rows=session.execute(query)
for row in rows:
    print(row)

Select the count of records where marks are above 85

In [None]:
Query = 'SELECT COUNT(*) FROM ast_student.students WHERE marks>85 ALLOW FILTERING;'
count = session.execute(Query)

count.one()

#### Updating the database

<font color='blue'>Uncomment and run the code below</font> **<font color='blue'>only if you are using your own credentials</font>**<font color='blue'>, so that the original database provided through the CDS account is not affected.</font>

Update the value of marks to 98 in the row where studentID is 2.

In [None]:
# Query = 'UPDATE ast_student.students SET marks = 98 WHERE studentID = 2;'
# session.execute(Query)

# query = 'SELECT * FROM ast_student.students LIMIT 5;'
# rows = session.execute(query)
# for row in rows:
#     print(row)

The CQL query below lists the tables in the keyspace, which is useful for verification.

In [None]:
query = "SELECT * FROM system_schema.tables WHERE keyspace_name = 'ast_student';"
rows = session.execute(query)
for row in rows:
    print(row[1])

#### Drop table

It is not advisable to delete a table, but we may sometimes need to drop tables from the Datastax database to free up space.

<font color='blue'>Uncomment and run the code below</font> **<font color='blue'>only if you are using your own credentials</font>**<font color='blue'>, so that the original database provided through the CDS account is not affected.</font>

In [None]:
# query = "DROP TABLE IF EXISTS ast_student.students;"
# session.execute(query)

#### Close the session and cluster connection

In [None]:
session.shutdown()
cluster.shutdown()

### MongoDB

As more and more data becomes available in unstructured or semi-structured form, the need to manage it with NoSQL databases increases. Python can interact with NoSQL databases in much the same way as it interacts with relational databases. The figure below compares the terms used in MongoDB with their RDBMS equivalents.

<figure>
<img src='https://cdn.iisc.talentsprint.com/CDS/Images/Mongodb.PNG' width=700 px />
</figure>

**Database:** In simple words, it can be called the physical container for data. Each of the databases has its own set of files on the file system with multiple databases existing on a single MongoDB server.

**Collection:** A group of database documents can be called a collection. The RDBMS equivalent to a collection is a table. The entire collection exists within a single database. There are no schemas when it comes to collections. Inside the collection, various documents can have varied fields, but mostly the documents within a collection are meant for the same purpose or for serving the same end goal.

**Document:** A set of key-value pairs can be designated as a document. Documents are associated with dynamic schemas. The benefit of having dynamic schemas is that a document in a single collection does not have to possess the same structure or fields. Also, the common fields in a collection’s document can have varied types of data.

The figure below shows the steps for making a connection to a MongoDB cluster.

<br>
<center>
<img src='https://cdn.iisc.talentsprint.com/CDS/Images/MongoDB_conn_steps.JPG' width =  900px/>
</center>

To connect to MongoDB, Python uses a library known as PyMongo. This library can be added to the Python environment using the command below.

In [None]:
!pip install pymongo

#### Making a connection with MongoClient

**MongoDB Atlas login:** Login to [MongoDB Atlas](https://www.mongodb.com/) and create a cluster

For detailed instructions on account creation, please refer to this [document](https://cdn.iisc.talentsprint.com/CDS/DB_Connect_Docs/Assignment_MongoDB_Connect.pdf)

**Connect the cluster:**

* Create a cluster and click on the connect option
* Select the "Connect your application" option
* Select the driver and version, i.e., Python 3.4 or later
* Generate the connection string

Establishing a connection to MongoDB requires creating a `MongoClient` for the running MongoDB instance.

In [None]:
import pymongo
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')

The above code will connect to the default host and port, but we can specify the host and port as shown below:

In [None]:
client = MongoClient("mongodb://CDS:cdsuser123@cdscluster-shard-00-00.jpzjh.mongodb.net:27017,cdscluster-shard-00-01.jpzjh.mongodb.net:27017,cdscluster-shard-00-02.jpzjh.mongodb.net:27017/myFirstDatabase?ssl=true&replicaSet=atlas-vhmege-shard-0&authSource=admin&retryWrites=true&w=majority")
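Connection strings such as the one above embed the username and password, so any special characters in them must be percent-escaped. The sketch below assembles a URI with `urllib.parse.quote_plus`; `build_mongo_uri`, the host name, and the credentials are all made up for illustration, and the `mongodb+srv` scheme is assumed (it typically needs PyMongo installed with its `srv` extra).

```python
from urllib.parse import quote_plus

def build_mongo_uri(user: str, password: str, host: str, database: str) -> str:
    """Assemble a mongodb+srv URI with percent-escaped credentials (hypothetical helper)."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?retryWrites=true&w=majority"
    )

# Placeholder values; substitute the ones from your own Atlas cluster.
uri = build_mongo_uri("demo_user", "p@ss/word", "cluster0.example.mongodb.net", "database_1")
print(uri)
# A client would then be created with: MongoClient(uri)
```

Building the URI from separate variables also makes it easier to keep the password out of the notebook, for example by reading it from an environment variable.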

#### Creating a database

To work with a database in MongoDB, index the client instance with the database name. If the database already exists, the client connects to it; otherwise MongoDB creates it lazily, when data is first stored in it.

In [None]:
db = client['database_1']

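After connecting to the database, we need to specify which collection (the equivalent of a table) we want to use. Here, `db.collection` refers to a collection that is literally named `collection`.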

In [None]:
coll = db.collection

#### Data in MongoDB

Data in MongoDB is represented and stored using JSON-style documents. In PyMongo we use dictionaries to represent documents.

In [None]:
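# Convert the first row to a dict and cast NumPy numbers to plain int,
# because PyMongo cannot encode NumPy integer types
document = students.iloc[0, :].to_dict()
for key in document:
    if not isinstance(document[key], str):
        document[key] = int(document[key])
document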

In [None]:
# Convert the remaining rows (index 1 onwards) to a list of dicts
documents = []
for i in range(1, len(students)):
    doc = students.iloc[i, :].to_dict()
    for key in doc:
        if not isinstance(doc[key], str):
            doc[key] = int(doc[key])
    documents.append(doc)
documents
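The same row-to-dictionary conversion can be done without pandas. The sketch below reads CSV text with the standard `csv` module and converts digit-only strings to `int`. The CSV content is made up and merely stands in for `students.csv`, whose actual contents may differ; `to_document` is a hypothetical helper.

```python
import csv
import io

# Made-up CSV text standing in for students.csv.
csv_text = """studentID,name,age,marks
1,Asha,21,91
2,Ravi,22,78
"""

def to_document(record: dict) -> dict:
    """Convert digit-only strings to int, leaving other values as text."""
    return {k: int(v) if v.isdigit() else v for k, v in record.items()}

documents_from_csv = [to_document(r) for r in csv.DictReader(io.StringIO(csv_text))]
print(documents_from_csv)
```

Each resulting dictionary contains only built-in Python types, which is what PyMongo expects for a document.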

#### Inserting a Document

To insert a document into a collection, we use the `insert_one()` method. As we saw earlier, a collection is similar to a table in RDBMS while a document is similar to a row.

For the following data insertion step we have already uploaded the data on MongoDB Atlas using the CDS account, so you are not required to insert the data again. Therefore we have commented out the below lines of code.

However, if you would like to perform the data insertion step then please **create your own account** on MongoDB Atlas as given in the reference [here](https://cdn.iisc.talentsprint.com/CDS/DB_Connect_Docs/Assignment_MongoDB_Connect.pdf) and change the credentials and run the below code by uncommenting it.

In [None]:
# result = coll.insert_one(document)

We can insert multiple documents to a collection using the `insert_many()` method as shown below.

In [None]:
# new_document = coll.insert_many(documents)

Display list of collections within database

In [None]:
db.list_collection_names()

Display the number of documents within a collection

In [None]:
coll.count_documents({})

### Query the Database

The cells below show some example queries. For a comprehensive list of query examples, see the [MongoDB documentation](https://docs.mongodb.com/manual/tutorial/query-documents/).

#### Retrieving a Single Document

`find_one()` returns a single document matching the query, or `None` if no document matches. If several documents match, it returns the first one it comes across. The call below returns the first document in our collection whose `marks` field is 86.

In [None]:
pprint(coll.find_one({"marks": 86}))

#### Finding all Documents in a Collection

MongoDB also allows us to retrieve all documents in a collection using the `find` method.

In [None]:
# To retrieve all documents we can use find method with empty query
for i in coll.find():
    pprint(i)

#### Filter based on fields

If we want to see only a few fields, we can do that by just putting all the required field names with value 1.

In [None]:
# For documents where marks=86, display only 'name' and 'marks' fields
pprint(coll.find_one({"marks": 86},{"name": 1,"marks": 1}))

On the other hand, if we want to discard only a few fields from the complete document, we can set those field names to 0, and only those fields will be excluded. Please note that you cannot combine 1s and 0s in the same projection: either all values should be 1 or all should be 0. The one exception is the `_id` field, which can be set to 0 in a projection that otherwise uses 1s.

In [None]:
# For documents where marks=86, display all fields other than 'name' and 'marks' 
pprint(coll.find_one({"marks": 86},{"name": 0,"marks": 0}))
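The inclusion and exclusion rules can be imitated in plain Python to see what each projection returns. This is a simplified model of top-level field projection only, not PyMongo code; `project` is a hypothetical helper and the document is made up.

```python
def project(document: dict, projection: dict) -> dict:
    """Pure-Python imitation of top-level projection rules; not PyMongo code."""
    flags = {k: v for k, v in projection.items() if k != "_id"}
    if len(set(flags.values())) > 1:
        raise ValueError("cannot mix inclusion (1) and exclusion (0)")
    include_mode = bool(flags) and all(flags.values())
    if include_mode:
        result = {k: document[k] for k in flags if k in document}
        # _id is returned unless it is explicitly set to 0
        if projection.get("_id", 1) and "_id" in document:
            result = {"_id": document["_id"], **result}
        return result
    excluded = {k for k, v in projection.items() if not v}
    return {k: v for k, v in document.items() if k not in excluded}

doc = {"_id": 7, "studentID": 3, "name": "Joy", "age": 20, "marks": 86}   # made-up document
print(project(doc, {"name": 1, "marks": 1}))
print(project(doc, {"name": 1, "marks": 1, "_id": 0}))
print(project(doc, {"name": 0, "marks": 0}))
```

The second call shows the `_id` field being suppressed alongside an inclusion projection, which is the one permitted mix of 1s and 0s.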

#### Filter based on less than and greater than

Now, let us find all the documents where marks is greater than 75 and less than 96.

In [None]:
result_ = coll.find({
                    "marks" : { "$lt" : 96, "$gt" : 75}
                    })
for i in result_:
    print(i)

#### Filter with Regular Expressions

Regular Expressions are of great use when you have text fields and you want to search for documents with a specific pattern.

It can be used with the operator `$regex` and we can provide value to the operator for the regex pattern to match.

In [None]:
# Display the documents where the 'name' field starts with character 'J'
result = coll.find({
                    "name" : { "$regex" : "^J" }
                    })
for i in result:
    print(i)

#### Filter based on Logical operator

The following query returns all the documents where marks is between 75 and 96 and the name starts with the character 'J'. Note that the subqueries for the `$and` operator are passed inside a list.

In [None]:
result = coll.find({
    "$and" : [{
                 "marks" : { "$lt" : 96, "$gt" : 75}
              },
              {
                   "name" : { "$regex" : "^J" }
              }]
})

for i in result:
    print(i)
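To see how the comparison, regex, and logical operators combine, here is a toy matcher that evaluates the same kind of query against in-memory dictionaries. It is a sketch that supports only equality, `$gt`, `$lt`, `$regex`, and `$and`; `matches` is a hypothetical helper, not part of PyMongo, and the sample documents are made up.

```python
import re

def matches(document: dict, query: dict) -> bool:
    """Toy matcher for equality, $gt, $lt, $regex and $and; not PyMongo code."""
    for field, condition in query.items():
        if field == "$and":
            # Every subquery in the list must match.
            if not all(matches(document, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            value = document.get(field)
            for op, operand in condition.items():
                if op == "$gt":
                    ok = value is not None and value > operand
                elif op == "$lt":
                    ok = value is not None and value < operand
                elif op == "$regex":
                    ok = isinstance(value, str) and re.search(operand, value) is not None
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif document.get(field) != condition:
            return False
    return True

sample = [   # made-up documents
    {"name": "Joy", "marks": 86},
    {"name": "Jim", "marks": 99},
    {"name": "Ann", "marks": 80},
]
query = {"$and": [{"marks": {"$lt": 96, "$gt": 75}}, {"name": {"$regex": "^J"}}]}
selected = [d["name"] for d in sample if matches(d, query)]
print(selected)
```

In MongoDB, conditions on different fields in one query document are combined with an implicit AND, so `{"marks": {...}, "name": {...}}` selects the same documents as the explicit `$and` form here.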

#### Updating a Document

To update a document, the `update_one()` method is used. Its first parameter is a query object that selects the document to update, and its second parameter describes the change, for example with the `$set` operator. If the query matches more than one document, only the first one is updated.

<font color='blue'>Uncomment and run the code below</font> **<font color='blue'>only if you are using your own credentials</font>**<font color='blue'>, so that the original database provided through the CDS account is not affected.</font>

In [None]:
# Update the value of marks to 98 in the first document where marks is 100

# query = { "marks": 100 }
# new_marks = { "$set": { "marks": 98 } }

# coll.update_one(query, new_marks)

# for i in coll.find():
#     pprint(i)

#### MongoDB Delete Document

To delete a document in MongoDB, the `delete_one()` method is used. The first parameter for this method is a query object that selects the document we want to delete. If the query matches more than one document, only the first one found is deleted. Let's delete the document with the name *Joy*.

<font color='blue'>Uncomment and run the code below</font> **<font color='blue'>only if you are using your own credentials</font>**<font color='blue'>, so that the original database provided through the CDS account is not affected.</font>

In [None]:
# delete_document = coll.delete_one({"name": "Joy"})
# print(delete_document.deleted_count, " document deleted.")

In order to delete many documents, the `delete_many()` method is used. Passing an empty query object will delete all the documents.

In [None]:
# delete_documents = coll.delete_many({})
# print(delete_documents.deleted_count, " documents deleted.")

#### Dropping a Collection

In MongoDB, we can delete a collection using the `drop()` method.

<font color='blue'>Uncomment and run the code below</font> **<font color='blue'>only if you are using your own credentials</font>**<font color='blue'>, so that the original database provided through the CDS account is not affected.</font>

In [None]:
# db.collection.drop()

#### Close the MongoClient connection

In [None]:
client.close()

### Please answer the questions below to complete the experiment:




In [None]:
# @title Select the False statement: { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "NoSQL databases have a predefined schema whereas SQL databases use dynamic schema for unstructured data" #@param ["","SQL databases are table based databases whereas NoSQL databases can be document based keyvalue pairs and graph databases","NoSQL databases have a predefined schema whereas SQL databases use dynamic schema for unstructured data", "keyvalue databases store data as a single collection without any structure or relation"]

In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good and Challenging for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "na" #@param {type:"string"}


In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]


In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 6039
Date of submission:  09 Jan 2022
Time of submission:  15:38:54
View your submissions: https://cds.iisc.talentsprint.com/notebook_submissions
