# A Brief Intro to MongoDB with mongoose
## CSCI E-31 Final Project - Spring 2022
### Nathan Weeks

## Introduction

Web applications commonly need to persist application state.
In the [Web 1.0](https://www.techopedia.com/definition/27960/web-10) era, this was typically done with flat files or relational database management systems.

However, the relational model does not easily accommodate semi-structured data or data with hierarchical relationships, which led to the advent of alternatives (traditionally referred to as [NoSQL](https://en.wikipedia.org/wiki/NoSQL) databases).
A [document-oriented database](https://www.mongodb.com/document-databases) (aka "document database" or "document store") is one category of NoSQL database.
Document databases store [documents](https://www.mongodb.com/docs/manual/core/document/) (analogous to relational database *rows/records*), which are data structures that resemble JavaScript objects.
Documents are organized into [collections](https://www.mongodb.com/docs/manual/core/databases-and-collections/#collections) (analogous to relational database *tables*).
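
As a rough illustration of this analogy, a document can be pictured as a plain JavaScript object and a collection as an array of such objects. The field names and values below are made up for illustration; only the nesting and the optional fields matter.

```javascript
// A "collection" pictured as an array of "documents" (plain objects).
// All field names and values here are hypothetical.
const accessions = [
  {
    accessionNumber: 'X001',
    commonCropName: 'barley',
    // Hierarchical data nests directly inside the document
    origin: { country: 'CA', site: { name: 'Plot 7' } },
    notes: ['first note', 'second note']
  },
  {
    // Semi-structured: this document simply omits `origin` and `notes`
    accessionNumber: 'X002',
    commonCropName: 'oats'
  }
];

// Nested values are reached by ordinary property access
console.log(accessions[0].origin.site.name); // prints "Plot 7"

// Documents in one collection need not share the same set of fields
const withOrigin = accessions.filter((doc) => doc.origin !== undefined);
console.log(withOrigin.length); // prints 1
```

In a relational design, the nested `origin` data would usually be split into separate tables joined by keys.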

This tutorial provides a brief overview of one such document database (MongoDB) and illustrates a few essentials for using the mongoose API to interact with MongoDB databases from Node.js applications.

### MongoDB

[MongoDB](https://www.mongodb.com/) is a popular open-source document database (for a listing of others, see [this Wikipedia entry](https://en.wikipedia.org/wiki/Document-oriented_database#Implementations)).

#### Installing

This [Jupyter](https://jupyter.org/) notebook runs in a [Binder](https://jupyter.org/binder) computing environment that has MongoDB preinstalled.

For other software environments, a number of options exist for provisioning MongoDB, including:

* [MongoDB Community Server](https://www.mongodb.com/try/download/community) can be downloaded and installed on a local Windows, macOS, or Linux workstation/server (note that macOS aarch64 is not officially supported as of this writing, but MongoDB can be installed there via [homebrew](https://brew.sh/)).
* The [conda](https://docs.conda.io/) package manager can be used to install mongodb from the [conda-forge](https://anaconda.org/conda-forge/mongodb) channel:
```
conda install -c conda-forge mongodb
```
* MongoDB can also be deployed using cloud service free tiers, such as [MongoDB Atlas](https://www.mongodb.com/cloud/atlas/).
 [Azure CosmosDB](https://azure.microsoft.com/en-us/services/cosmos-db/) is a similar document store that provides a compatible API (in particular, mongoose can be used with CosmosDB).


### mongoose
[mongoose](https://mongoosejs.com/) is an Object Data Modeling (ODM) library for Node.js and MongoDB.
mongoose builds on top of the [MongoDB Node.js driver](https://www.mongodb.com/docs/drivers/node/current/), additionally providing application-layer schema validation.

#### Installing
```
npm install mongoose
```

#### Connecting to a MongoDB instance
The [mongoose.connect()](https://mongoosejs.com/docs/connections.html) method is used to establish a connection from the Node.js application to a MongoDB instance.

e.g., to connect to a MongoDB database `mydb` on a MongoDB instance running on `localhost`, listening on the MongoDB-default port 27017:

In [116]:
const mongoose = require('mongoose');

mongoose.connect('mongodb://localhost:27017/mydb');

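
The connection string packs the host, port, and database name into a single URI. As an illustration only (mongoose parses the string itself), Node's built-in `URL` class can split a simple, single-host connection string into those parts. The variable names are hypothetical.

```javascript
// Illustration only: split a simple connection string into its parts.
// Connection strings listing several hosts cannot be parsed this way.
const uri = 'mongodb://localhost:27017/mydb';
const parts = new URL(uri);

const dbName = parts.pathname.slice(1); // drop the leading "/"

console.log(parts.protocol); // prints "mongodb:"
console.log(parts.hostname); // prints "localhost"
console.log(parts.port);     // prints "27017"
console.log(dbName);         // prints "mydb"
```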

## Schemas and Models

In mongoose, collections are accessed through [Models](https://mongoosejs.com/docs/models.html), which are compiled from [Schemas](https://mongoosejs.com/docs/guide.html#definition) that define the structure of the collection.

The following is an example schema that implements a subset of the [Breeding API (BrAPI) Germplasm API](https://app.swaggerhub.com/apis/PlantBreedingAPI/BrAPI-Germplasm/2.0) (where germplasm == germ cells; e.g., crop seeds).

In [117]:
var germplasmSchema = new mongoose.Schema({
    accessionNumber: {type: String, required: true},
    acquisitionDate: {type: Date, required: true},
    commonCropName: {type: String, required: true},
    additionalInfo: [String]  // array of strings; not required
});

`type` and `required` properties are specified in a [SchemaType](https://mongoosejs.com/docs/schematypes.html) object; `required: true` is analogous to SQL `NOT NULL`.
See a list of valid SchemaTypes [here](https://mongoosejs.com/docs/schematypes.html#what-is-a-schematype).
Note that `additionalInfo` is an *Array* of *String*s.

Next, a *model* must be created from the schema.\
Models act as constructors from which documents in an underlying collection are created or accessed.

The basic syntax to create a model from a schema using the `mongoose.model()` function is:
```
var model = mongoose.model('ModelName', schema);
```
where `ModelName` is singular; this model name will be mapped to a lower-case, pluralized MongoDB collection name.

For example, specifying "Germplasm" as the model name:

In [118]:
var Germplasm = mongoose.model('Germplasm', germplasmSchema);


will create a model for documents that will be saved in the "germplasms" collection (note the plural of "germplasm" should be "germplasm", not "germplasms", but c'est la vie...)

## CRUD with mongoose

A complete list of CRUD (Create, Read, Update, Delete) operations supported by mongoose is available at https://mongoosejs.com/docs/queries.html

A few common operations in each category are illustrated below.

#### Create

The [Model.create()](https://mongoosejs.com/docs/api/model.html#model_Model.create) method creates a document in the underlying collection.

E.g., using our `Germplasm` model to create a document in the `germplasms` collection:

In [124]:
// analogous SQL: INSERT INTO germplasms VALUES ('A000123', DATE '2020-01-03', 'barley', NULL);
Germplasm.create({
    accessionNumber: 'A000123',
    acquisitionDate: '2020-01-03',
    commonCropName: 'barley',
    additionalInfo: null
}, (err, germplasm)=>{
    if (err){console.log(err)}
    else {
        console.log("germplasm created!");
        console.log(germplasm);
    }
  });

germplasm created!
{
  accessionNumber: 'A000123',
  acquisitionDate: 2020-01-03T00:00:00.000Z,
  commonCropName: 'barley',
  additionalInfo: null,
  _id: new ObjectId("6266784ce4f787609db9f58b"),
  __v: 0
}
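
Later mongoose releases (version 7 onward) no longer accept callbacks for methods such as `Model.create()`, so the same operation is often written with promises or `async`/`await`. The helper below is a sketch, not part of mongoose: `createAccession` and the stand-in `FakeModel` are hypothetical, and the stand-in exists only so the snippet runs without a database. With an open connection, a compiled model such as `Germplasm` would be passed instead.

```javascript
// Hypothetical helper: calls Model.create() in async/await style.
// `Model` is anything with a promise-returning create() method,
// such as a compiled mongoose model.
async function createAccession(Model, fields) {
  try {
    const doc = await Model.create(fields);
    return { ok: true, doc };
  } catch (err) {
    // A rejected promise takes the place of the callback's `err` argument
    return { ok: false, message: err.message };
  }
}

// Stand-in for a model so the sketch runs without MongoDB (assumption:
// it only mimics a check for one required field).
const FakeModel = {
  async create(fields) {
    if (fields.acquisitionDate == null) {
      throw new Error('Path `acquisitionDate` is required.');
    }
    return { ...fields, _id: 'fake-id' };
  }
};

createAccession(FakeModel, {
  accessionNumber: 'A000123',
  acquisitionDate: '2020-01-03',
  commonCropName: 'barley'
}).then((result) => console.log(result.ok)); // prints true
```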


A main feature of mongoose models is *schema validation*.
In the example above, the value of the `additionalInfo` property may be `null` since a `required: true` SchemaType option wasn't specified.
However, if a required property is `null` or missing, an error will result.

e.g., attempting to create a document with the required `acquisitionDate` property missing will result in an error, and a document will not be created:

In [121]:
// THIS WILL ERROR!
Germplasm.create({
    accessionNumber: 'A000123',
//  acquisitionDate: '2020-01-03', // required property commented-out
    commonCropName: 'barley',
    additionalInfo: null
}, (err, germplasm)=>{
    if (err){console.log(err)}
    else {
        console.log("germplasm saved!");
        console.log(germplasm);
    }
  });

Error: Germplasm validation failed: acquisitionDate: Path `acquisitionDate` is required.
    at ValidationError.inspect (/home/nweeks/node_modules/mongoose/lib/error/validation.js:48:26)
    at formatValue (internal/util/inspect.js:745:19)
    at inspect (internal/util/inspect.js:319:10)
    at formatWithOptionsInternal (internal/util/inspect.js:1979:40)
    at formatWithOptions (internal/util/inspect.js:1861:10)
    at Console.value (internal/console/constructor.js:328:14)
    at Console.log (internal/console/constructor.js:364:61)
    at evalmachine.<anonymous>:8:22
    at /home/nweeks/node_modules/mongoose/lib/helpers/promiseOrCallback.js:17:11
    at /home/nweeks/node_modules/mongoose/lib/model.js:5028:21 {
  errors: {
    acquisitionDate: ValidatorError: Path `acquisitionDate` is required.
        at validate (/home/nweeks/node_modules/mongoose/lib/schematype.js:1331:13)
        at SchemaDate.SchemaType.doValidate (/home/nweeks/node_modules/mongoose/lib/schematype.js:1315:7)
     

Let's save a few more accessions to make things interesting when searching:

In [122]:
Germplasm.create([
  {
    accessionNumber: 'C123',
    acquisitionDate: '1963-10-20',
    commonCropName: 'oats',
    additionalInfo: ["donated by Quaker Oats Company", "missing; suspect accidentally consumed for breakfast"]
  },
  {
    accessionNumber: 'D123',
    acquisitionDate: '1999-12-31',
    commonCropName: 'maize',
    additionalInfo: ["popcorn variety", "missing; suspect accidentally microwaved and eaten during Oscars"]
  }
]).then(() => {console.log("done!")});

Promise { <pending> }

done!


#### Read

The `Model.find()` function is the core query function (analogous to SQL `SELECT`).
`Model.find()` returns the documents in the MongoDB collection associated with the `Model` as an array, optionally restricted to those matching a filter.

In [132]:
// Query object assigned to variable so it isn't pretty-printed below cell

// SELECT * FROM germplasms;
var query = Germplasm.find({}, function(err, germplasm){ console.log(germplasm) });

[
  {
    _id: new ObjectId("62652d30e4f787609db9f512"),
    accessionNumber: 'A000123',
    acquisitionDate: 2009-12-30T00:00:00.000Z,
    commonCropName: 'barley',
    additionalInfo: [ 'foo' ],
    __v: 0
  },
  {
    _id: new ObjectId("62652d3ee4f787609db9f514"),
    accessionNumber: 'A000123',
    acquisitionDate: 2020-01-03T00:00:00.000Z,
    commonCropName: 'barley',
    additionalInfo: null,
    __v: 0
  },
  {
    _id: new ObjectId("62652f75e4f787609db9f516"),
    accessionNumber: 'A000123',
    acquisitionDate: 2020-01-03T00:00:00.000Z,
    commonCropName: 'barley',
    additionalInfo: null,
    __v: 0
  },
  {
    _id: new ObjectId("62652f7be4f787609db9f518"),
    accessionNumber: 'A000123',
    acquisitionDate: 2020-01-03T00:00:00.000Z,
    commonCropName: 'barley',
    additionalInfo: null,
    __v: 0
  },
  {
    _id: new ObjectId("626530d1e4f787609db9f51c"),
    accessionNumber: 'A000123',
    acquisitionDate: 2020-01-03T00:00:00.000Z,
    commonCropName: 'barley',
    a

In [126]:
// example using a filter

// SQL: SELECT * FROM germplasms WHERE commonCropName = 'maize';
var query = Germplasm.find({commonCropName: "maize"}, function(err, germplasm){ console.log(germplasm) });

[
  {
    _id: new ObjectId("626677e7e4f787609db9f586"),
    accessionNumber: 'D123',
    acquisitionDate: 1999-12-31T00:00:00.000Z,
    commonCropName: 'maize',
    additionalInfo: [
      'popcorn variety',
      'missing; suspect accidentally microwaved and eaten during Oscars'
    ],
    __v: 0
  }
]


#### Update

The simplest routine to update a single document is `Model.updateOne()`.
As with `Model.find()`, the first argument is a filter; the second argument is an object containing updated values for the specified properties:

In [133]:
// SQL: UPDATE germplasms SET acquisitionDate = DATE '2009-12-31' WHERE accessionNumber = 'A000123';
var query =
  Germplasm
    .updateOne({ accessionNumber: "A000123" },
               { acquisitionDate: '2009-12-31' },
               // the callback receives a result summary, not the updated document
               function(err, result){
                   console.log(`Number of documents matched ${result.matchedCount}`)});

Number of documents matched 1


There is also a [Model.updateMany()](https://mongoosejs.com/docs/api.html#model_Model.updateMany) method for updating multiple documents.

#### Delete

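[Model.findOneAndDelete()](https://mongoosejs.com/docs/api/model.html#model_Model.findOneAndDelete) deletes a single document matching the specified filter and passes the deleted document to the callback, or `null` if no document matched the filter (as in the output below):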

In [109]:
var query = Germplasm.findOneAndDelete({ accessionNumber: "D123" }, function(err, germplasm) { console.log(`DELETED: ${germplasm}`) });

DELETED: null


To delete *all* documents in a collection, the [Model.deleteMany()](https://mongoosejs.com/docs/api/model.html#model_Model.deleteMany) method can be used, with an empty object specified for the filter:

In [129]:
// SQL: DELETE FROM germplasms;
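// .then() executes the query; without it (or .exec() / a callback) nothing is sent to MongoDB
var query = Germplasm.deleteMany({}).then((result) => { console.log(`Number of documents deleted: ${result.deletedCount}`) });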

## Conclusions
TODO

## References
TODO