# An Intro to MongoDB with mongoose
## CSCI E-31 Final Project
### Nathan Weeks
### Spring 2022

## Abstract

Node.js applications, like applications built with other backend web languages and frameworks, often need to persist state.
Flat files and relational database management systems have historically been used to persist state for web applications.

However, the relational model does not easily accommodate semi-structured data or data with hierarchical relationships, which led to the advent of alternatives (traditionally referred to as [NoSQL](https://en.wikipedia.org/wiki/NoSQL) databases).

A [document-oriented database](https://www.mongodb.com/document-databases) (aka "document database" or "document store") is one such alternative database.
In contrast to relational databases, document databases store *collections* of *documents* (as opposed to *tables* of *records*).

This tutorial provides a brief overview of one such document database (MongoDB) and illustrates a few essentials of using the mongoose API to interact with MongoDB databases from Node.js applications.

### MongoDB

[MongoDB](https://www.mongodb.com/) is an open-source [document-oriented database](https://www.mongodb.com/document-databases).



#### Installing

A number of options exist for installing MongoDB, including:

* [MongoDB Community Server](https://www.mongodb.com/try/download/community) can be downloaded and installed on a local Windows, macOS, or Linux workstation/server (note that macOS on aarch64 is not officially supported as of this writing, but MongoDB can be installed there via [homebrew](https://brew.sh/)).
* The [conda](https://docs.conda.io/) package manager can be used to install mongodb from the [conda-forge](https://anaconda.org/conda-forge/mongodb) channel:
```
conda install -c conda-forge mongodb
```
* MongoDB can also be deployed using the free tiers of cloud services, such as [MongoDB Atlas](https://www.mongodb.com/cloud/atlas/). [Azure Cosmos DB](https://azure.microsoft.com/en-us/services/cosmos-db/) is a similar document store that provides a MongoDB-compatible API (in particular, mongoose can be used with Cosmos DB).

This example [Binder](https://jupyter.org/binder) instance uses a local MongoDB server (running in the same container as JupyterLab) installed via conda.


### mongoose
The [mongoose](https://mongoosejs.com/) library provides an API to MongoDB for Node.js applications.

#### Installing
```
npm install mongoose
```



#### Connecting to a MongoDB instance
The general syntax for connecting to a MongoDB instance is:


```
const mongoose = require('mongoose');

mongoose.connect('mongodb://localhost:27017/mydb');
```

Output:
```
Promise { <pending> }
```

## Schemas



## CRUD with mongoose

#### Create

mongoose requires that a [schema](https://mongoosejs.com/docs/guide.html#definition) be defined for a MongoDB collection.

The following is an example schema that implements a subset of the [BrAPI Germplasm API](https://app.swaggerhub.com/apis/PlantBreedingAPI/BrAPI-Germplasm/2.0) (where germplasm == germ cells; e.g., for crops, typically seeds).

```
var germplasmSchema = new mongoose.Schema({
    accessionNumber: {type: String, required: true},
    acquisitionDate: {type: Date, required: true},
    commonCropName: {type: String, required: true},
    additionalInfo: [String]
});
```

`type` and `required` properties are specified in a [SchemaType](https://mongoosejs.com/docs/schematypes.html) object; `required: true` is analogous to SQL `NOT NULL`.
See a list of valid SchemaTypes [here](https://mongoosejs.com/docs/schematypes.html#what-is-a-schematype).
Note that `additionalInfo` is an *Array* of *String*s.

Next, a [model](https://mongoosejs.com/docs/models.html) must be compiled from the schema.
A model is a class whose instances represent documents in the corresponding MongoDB collection; the model also provides static methods for reading from and writing to that collection.

The basic syntax to create a model from a schema using `mongoose.model()` is:
```
var model = mongoose.model('ModelName', schema);
```
where `ModelName` is conventionally singular; mongoose maps the model name to a lower-case, pluralized MongoDB collection name.

For example, specifying "Germplasm" as the model name:

```
var Germplasm = mongoose.model('Germplasm', germplasmSchema);
```

will create a model for documents that will be saved in the "germplasms" collection (note the plural of "germplasm" should be "germplasm", not "germplasms", but c'est la vie...)

Finally, we can create an instance of the model (i.e., a *document*) and save it to the "germplasms" collection in one step using the `Model.create()` method.

```
Germplasm.create({
    accessionNumber: 'A000123',
    acquisitionDate: '2020-01-03',
    commonCropName: 'barley',
    additionalInfo: null
}, (err, germplasm) => {
    if (err) { console.log(err); }
    else {
        console.log("germplasm created!");
        console.log(germplasm);
    }
});
```

Output:
```
germplasm created!
{
  accessionNumber: 'A000123',
  acquisitionDate: 2020-01-03T00:00:00.000Z,
  commonCropName: 'barley',
  additionalInfo: null,
  _id: new ObjectId("626549ede4f787609db9f53b"),
  __v: 0
}
```


The value of the `additionalInfo` property may be `null` since a `required: true` SchemaType option wasn't specified.
However, if a required property is `null` or missing, an error will result; e.g., if the required `acquisitionDate` property is missing:

```
Germplasm.create({
    accessionNumber: 'A000123',
//  acquisitionDate: '2020-01-03', // commenting-out; will error!
    commonCropName: 'barley',
    additionalInfo: null
}, (err, germplasm) => {
    if (err) { console.log(err); }
    else {
        console.log("germplasm saved!");
        console.log(germplasm);
    }
});
```

Output (truncated):
```
Error: Germplasm validation failed: acquisitionDate: Path `acquisitionDate` is required.
    at ValidationError.inspect (/home/nweeks/node_modules/mongoose/lib/error/validation.js:48:26)
    at formatValue (internal/util/inspect.js:745:19)
    at inspect (internal/util/inspect.js:319:10)
    at formatWithOptionsInternal (internal/util/inspect.js:1979:40)
    at formatWithOptions (internal/util/inspect.js:1861:10)
    at Console.value (internal/console/constructor.js:328:14)
    at Console.log (internal/console/constructor.js:364:61)
    at evalmachine.<anonymous>:7:22
    at /home/nweeks/node_modules/mongoose/lib/helpers/promiseOrCallback.js:17:11
    at /home/nweeks/node_modules/mongoose/lib/model.js:5028:21 {
  errors: {
    acquisitionDate: ValidatorError: Path `acquisitionDate` is required.
        at validate (/home/nweeks/node_modules/mongoose/lib/schematype.js:1331:13)
        at SchemaDate.SchemaType.doValidate (/home/nweeks/node_modules/mongoose/lib/schematype.js:1315:7)
```

Let's save a few more accessions to make things interesting when searching:

```
Germplasm.create([
  {
    accessionNumber: 'C123',
    acquisitionDate: '1963-10-20',
    commonCropName: 'oats',
    additionalInfo: ["donated by Quaker Oats Company", "missing; suspect accidentally consumed for breakfast"]
  },
  {
    accessionNumber: 'D123',
    acquisitionDate: '1999-12-31',
    commonCropName: 'maize',
    additionalInfo: ["popcorn variety", "missing; suspect accidentally microwaved and eaten during Oscars"]
  }
]).then(() => {console.log("done!")});
```

Output:
```
Promise { <pending> }

done!
```


#### Read

mongoose models have a number of static helper functions for querying documents (see the [complete list](https://mongoosejs.com/docs/queries.html)).

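The `Model.find()` function is the core query function (analogous to SQL `SELECT`).
When executed, `Model.find()` yields an array of the documents in the model's MongoDB collection that match an optional filter; an empty filter (`{}`) matches every document.
The repeated `A000123` documents in the output below come from running the earlier `create()` cells more than once: each call inserts a new document with its own `_id`.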


```
// Query object assigned to variable so it isn't pretty-printed below cell

// SELECT * FROM germplasms;
var query = Germplasm.find({}, function(err, germplasm){ console.log(germplasm) });
```

Output (truncated):
```
[
  {
    _id: new ObjectId("62652d30e4f787609db9f512"),
    accessionNumber: 'A000123',
    acquisitionDate: 2020-01-03T00:00:00.000Z,
    commonCropName: 'barley',
    additionalInfo: null,
    __v: 0
  },
  {
    _id: new ObjectId("62652d3ee4f787609db9f514"),
    accessionNumber: 'A000123',
    acquisitionDate: 2020-01-03T00:00:00.000Z,
    commonCropName: 'barley',
    additionalInfo: null,
    __v: 0
  },
  {
    _id: new ObjectId("62652f75e4f787609db9f516"),
    accessionNumber: 'A000123',
    acquisitionDate: 2020-01-03T00:00:00.000Z,
    commonCropName: 'barley',
    additionalInfo: null,
    __v: 0
  },
  {
    _id: new ObjectId("62652f7be4f787609db9f518"),
    accessionNumber: 'A000123',
    acquisitionDate: 2020-01-03T00:00:00.000Z,
    commonCropName: 'barley',
    additionalInfo: null,
    __v: 0
  },
  {
    _id: new ObjectId("626530d1e4f787609db9f51c"),
    accessionNumber: 'A000123',
    acquisitionDate: 2020-01-03T00:00:00.000Z,
    commonCropName: 'barley',
    additi
```

```
// example using a filter

// SQL: SELECT * FROM germplasms WHERE commonCropName = 'maize';
var query = Germplasm.find({commonCropName: "maize"}, function(err, germplasm){ console.log(germplasm) });
```

Output (truncated):
```
[
  {
    _id: new ObjectId("62653aa7e4f787609db9f52d"),
    accessionNumber: 'D123',
    acquisitionDate: 1999-12-31T00:00:00.000Z,
    commonCropName: 'maize',
    additionalInfo: [
      'popcorn variety',
      'missing; suspect accidentally microwaved and eaten during Oscars'
    ],
    __v: 0
  },
  {
    _id: new ObjectId("6265422be4f787609db9f531"),
    accessionNumber: 'D123',
    acquisitionDate: 1999-12-31T00:00:00.000Z,
    commonCropName: 'maize',
    additionalInfo: [
      'popcorn variety',
      'missing; suspect accidentally microwaved and eaten during Oscars'
    ],
    __v: 0
  },
  {
    _id: new ObjectId("62654c05e4f787609db9f540"),
    accessionNumber: 'D123',
    acquisitionDate: 1999-12-31T00:00:00.000Z,
    commonCropName: 'maize',
    additionalInfo: [
      'popcorn variety',
      'missing; suspect accidentally microwaved and eaten during Oscars'
    ],
    __v: 0
  },
  {
    _id: new ObjectId("62654c44e4f787609db9f544"),
    accessionNumber: 'D123',
    a
```

#### Update

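The simplest routine to update a single document is `Model.updateOne(filter, update)`, which modifies the first document matching `filter`.
If `update` contains no update operator, mongoose wraps it in `$set`, so only the listed fields change.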

```
// SQL: UPDATE germplasms SET acquisitionDate = '2009-12-31' WHERE accessionNumber = 'A000123'
// (unlike SQL UPDATE, updateOne() modifies only the first matching document)
var query =
  Germplasm
    .updateOne({ accessionNumber: "A000123" },
               { acquisitionDate: '2009-12-31' },
               function(err, result) {
                   if (err) { console.log(err); return; }
                   console.log(`Number of documents matched ${result.matchedCount}`);
               });
```

Output:
```
Number of documents matched 1
```


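#### Delete

`Model.findOneAndDelete(filter)` deletes the first document matching `filter` and passes the deleted document to the callback, or `null` if no document matched.
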
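```
var query = Germplasm.findOneAndDelete({ accessionNumber: "D123" }, function(err, germplasm) {
    if (err) { console.log(err); return; }
    console.log(`DELETED: ${germplasm}`);
});
```

Output:
```
DELETED: null
```

A result of `null` means that no document with `accessionNumber` "D123" matched when this cell was run.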
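When the deleted document itself is not needed, `Model.deleteOne()` (at most one match) and `Model.deleteMany()` (all matches) are alternatives; both resolve to a result object with a `deletedCount` property. The helpers below are hypothetical sketches using async/await; they take the model as a parameter, an assumption made so they can be exercised without a database (for example with a stub object).

```javascript
// Hypothetical helper: delete the first document with this accession number.
// Resolves to the number of documents deleted (0 or 1).
async function deleteAccession(Model, accessionNumber) {
  const result = await Model.deleteOne({ accessionNumber });
  return result.deletedCount;
}

// Hypothetical helper: delete every document for a crop.
// SQL: DELETE FROM germplasms WHERE commonCropName = ?;
async function deleteCrop(Model, commonCropName) {
  const result = await Model.deleteMany({ commonCropName });
  return result.deletedCount;
}

module.exports = { deleteAccession, deleteCrop };
```

With the notebook's model this would be called as, for example, `await deleteAccession(Germplasm, 'D123')`.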


## Conclusions
TODO

## References
TODO