# MongoDB Tutorial
## *Prof. Gary L. Pavlis*

## Overview
This notebook is designed as a teaching tutorial on use of the MongoDB database used in MsPASS.  Numerous pedagogic materials exist online for learning MongoDB, but this notebook focuses on key features the author has found useful in seismology research.  It is best used in conjunction with two other sources:
1.  The section of the User's Manual titled "Using MongoDB with MsPASS".
2.  As with most modern IT topics a web search for details of some topics addressed in this tutorial may be helpful if the MsPASS User's Manual doesn't address the topic.

The bulk of this notebook is organized by the keywords of the standard CRUD acronym of database theory.  CRUD is an abbreviation of Create (save), Read, Update, and Delete.  Sections are titled with those keywords and covered in the order defined by CRUD.  Before that, however, it is necessary to review a few basic concepts covered in the section immediately below.

## MongoDB Core Concepts
### Client-server model
MongoDB is a client-server system.  That bit of jargon has some important implications:

1.  Database commands issued from python are not executed directly by the python interpreter.  Instead, instructions are sent to the MongoDB server.   In MsPASS the server is launched inside a container.   Unless you are running this notebook on a cluster with multiple nodes, you can verify the server is running by launching a terminal in the jupyterlab interface and running the command `ps -A`.  You should get output similar to the following that shows the server as the CMD with the name `mongod`:
```
root@b0d79c4cc440:/home/scoped# ps -A
  PID TTY          TIME CMD
    1 ?        00:00:00 tini
    8 ?        00:00:00 start-mspass.sh
   15 ?        00:07:27 dask-scheduler
   21 ?        00:06:47 dask-worker
   22 ?        00:01:44 mongod
   23 ?        00:00:44 jupyter-lab
   34 ?        00:00:00 python3.10
   37 ?        00:10:43 python3.10
  154 ?        00:00:20 python
  364 ?        00:00:01 python
 1010 pts/0    00:00:00 bash
 1036 pts/0    00:00:00 ps
```
2.  All database IO passes through a network data connection on network "port number" 27017.   That is important to know as a fundamental issue because a network communication channel is not the fastest data pipe on most computers.
3.  To communicate with MongoDB, your program must create a connection to the "server".  In the jargon of modern computing you have to create a "client" that will act as your agent to talk to the arrogant MongoDB "server" (the mongod program running in the background).  

With that background, the first thing you will need to do, since mongod is already running in this environment, is to create the "client".    

In [1]:
from mspasspy.db import DBClient
dbclient=DBClient()

A geeky detail worth noting here is that we are using a python class (object) called `DBClient` that is a "subclass" of `pymongo.MongoClient`.  I point that out because all internet sources that are MongoDB introductions will create an instance of `pymongo.MongoClient` instead of the MsPASS extension used above.  An important "extension" DBClient adds is illustrated by the next code box:

In [2]:
db = dbclient.get_database("dbtutorial")

This incantation runs the `get_database` "method" of the class called `DBClient`.   It returns what we call a "database handle" in the User's Manual.   The MsPASS "database handle" is a python class that is itself a subclass  of another pymongo class.  Both have the name `Database`, but the MsPASS version adds a number of extensions for handling of seismic data.   The main ones of interest are readers and writers for seismic data objects, station metadata, and source metadata.  A key point is almost all MsPASS workflows begin with a variation of the combination of the two python code boxes above.   

When you call the `get_database` method as shown above the "handle" is created/constructed and can be accessed for the rest of your python workflow with the symbol you put on the left hand side of the expression (`db` in this example).  That name, of course, can be anything you want it to be.  For all examples in the MsPASS documentation we use `db` as a standard symbol to reduce confusion, but that should be viewed as simply a notation convention, not a rule.  

### Documents and Collections
The User's Manual section companion to this tutorial discusses the MongoDB jargon terms `document` and `collection` at length.   I will not repeat that material here, but note from here on I assume you know what those two terms mean.   If you don't know what these terms mean consult the "Using MongoDB with MsPASS" section of the User's Manual or some other source before proceeding.

## Create
The first letter in the CRUD acronym is "Create".  For this tutorial some form of "create" is an essential first step to put some kind of data into our tutorial database.   Most tutorials begin by inserting some largely arbitrary data.  Since this tutorial is designed for seismologists it seems more appropriate to work with seismology data.   The box below is a variant of one in the "getting_started" tutorial. It uses obspy's web service module to fetch station metadata for all "B-channels" defined for Earthscope TA stations that operated during the calendar year 2011. Obspy creates a python image of the stationxml downloaded from IRIS that it calls an `Inventory`.  In this code we use the MsPASS "create" method `save_inventory` to save a version of `Inventory` repackaged to mesh with MongoDB.

In [3]:
from obspy import UTCDateTime
from obspy.clients.fdsn import Client
client=Client("IRIS")
starttime=UTCDateTime('2011-01-01T00:00:00.0')
endtime=UTCDateTime('2012-01-01T00:00:00.0')
inv=client.get_stations(network='TA',starttime=starttime,endtime=endtime,
                        format='xml',channel='BH?',level='response')
db.save_inventory(inv)

Database.save_inventory processing summary:
Number of site records processed= 653
number of site records saved= 653
number of channel records processed= 2091
number of channel records saved= 2079


(653, 2079, 653, 2091)

## Read
The R of CRUD is "Read" and is more-or-less the inverse of "create".   The keyword used for pulling "documents" from a MongoDB database, however, is `find`.  There are two basic methods in the core MongoDB API for fetching documents:  `find_one` and `find`.  They behave completely differently.

### find_one

Let's begin with a simple application of `find_one`.  As the name implies it always returns one and only one document.  Here is a default application to the "site" collection that was created under the hood when we ran `save_inventory` above:

In [4]:
doc = db.site.find_one()
print("The type of a document = ",type(doc))
print("This is the content of that document")
print(doc)

The type of a document =  <class 'dict'>
This is the content of that document
{'_id': ObjectId('65e322fb5cd7e1e9490a5904'), 'loc': '', 'net': 'TA', 'sta': '034A', 'lat': 27.064699, 'lon': -98.683296, 'coords': [-98.683296, 27.064699], 'location': {'type': 'Point', 'coordinates': [-98.683296, 27.064699]}, 'elev': 0.155, 'edepth': 0.0, 'starttime': 1262908800.0, 'endtime': 1321574399.0, 'site_id': ObjectId('65e322fb5cd7e1e9490a5904')}


As the output demonstrates, `find_one` returns data in a python dictionary.   You might also note the raw `print(doc)` output is a bit challenging to read.   For the rest of this tutorial we will use a construct I've used a lot that makes the output easier to read.   I'll define a small function we will use elsewhere in this tutorial to make output more readable.

In [5]:
from bson import json_util
def pretty_print(doc,indent=2):
    print(json_util.dumps(doc,indent=indent))
doc=db['site'].find_one()
pretty_print(doc)

{
  "_id": {
    "$oid": "65e322fb5cd7e1e9490a5904"
  },
  "loc": "",
  "net": "TA",
  "sta": "034A",
  "lat": 27.064699,
  "lon": -98.683296,
  "coords": [
    -98.683296,
    27.064699
  ],
  "location": {
    "type": "Point",
    "coordinates": [
      -98.683296,
      27.064699
    ]
  },
  "elev": 0.155,
  "edepth": 0.0,
  "starttime": 1262908800.0,
  "endtime": 1321574399.0,
  "site_id": {
    "$oid": "65e322fb5cd7e1e9490a5904"
  }
}


Things of note in that box are:
1.  The `pretty_print` function definition is a bit trivial, which is why it isn't a standard MsPASS function.   It uses the `json_util.dumps` function to create the curly bracket formatted print that is a lot easier to understand than the raw dump of the python dictionary.   It shows more clearly that a document is always made up of one or more key-value pairs.
2.  This example intentionally uses a variant of the syntax for interacting with the database handle.   Note in the first box I used `db.site` while in the second I used `db['site']`.   A powerful but confusing, in my opinion, feature of python is its capability to create that type of syntactic alternative incantation.   Technically, what it does is specify a "collection", which in this case is named "site".  In the jargon of MongoDB the `find` and `find_one` methods, which are the core MongoDB "read" methods, are "collection operations".   You should realize that `db` is the top-level symbol that refers to the "whole" database that is assumed to contain one or more "collection"s.  The two incantations used above are alternative ways to get a handle to a specific "collection".   To clarify that point the following box illustrates a useful way to find the set of collections defined in our tutorial database at this point:

In [6]:
cursor=db.list_collections()
print("Current collections in tutorial database:")
for doc in cursor:
    print(doc['name'])

Current collections in tutorial database:
site
channel
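
The two access styles noted above (`db.site` and `db['site']`) are possible because a python class can define both attribute lookup and item lookup.  The box below is a minimal sketch of that idea only.  `ToyDatabase` is a hypothetical stand-in I made up for illustration; it is not the pymongo or MsPASS `Database` class and it never talks to a server.

```python
class ToyDatabase:
    """Hypothetical stand-in for a database handle (illustration only)."""

    def __init__(self, name):
        self.name = name
        self._collections = {}

    def __getitem__(self, collection_name):
        # toy_db['site'] lands here; create the entry on first use
        return self._collections.setdefault(
            collection_name, f"{self.name}.{collection_name}"
        )

    def __getattr__(self, collection_name):
        # toy_db.site lands here because 'site' is not a regular attribute.
        # Names with a leading underscore are rejected to avoid surprises.
        if collection_name.startswith("_"):
            raise AttributeError(collection_name)
        return self[collection_name]


toy_db = ToyDatabase("dbtutorial")
print(toy_db.site)
print(toy_db["site"])
print("same object:", toy_db.site is toy_db["site"])
```

The bracket form is the safer choice when a collection name collides with an existing attribute or method name of the handle, or when the name is held in a variable.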


### find
The above is also a good segue to the second standard MongoDB read method called `find`.  We defined the return of the `list_collections` function with the symbol "cursor".   That was a choice for the name, but consider this output:

In [7]:
print("The type of the symbol cursor is ",type(cursor))

The type of the symbol cursor is  <class 'pymongo.command_cursor.CommandCursor'>


A MongoDB `CommandCursor` is technically a __[forward iterator](https://www.boost.org/sgi/stl/ForwardIterator.html)__.   That means it acts like a list that can only be traversed "forward" with a construct like that above.   It is not at all the same thing, however, as a python list.   It is a handle that interacts with the database to sequentially return documents.   The following example with the `find` method illustrates the more common usage of a cursor:

In [8]:
cursor=db.site.find()
cursor.limit(3)
for doc in cursor:
    pretty_print(doc)

{
  "_id": {
    "$oid": "65e322fb5cd7e1e9490a5904"
  },
  "loc": "",
  "net": "TA",
  "sta": "034A",
  "lat": 27.064699,
  "lon": -98.683296,
  "coords": [
    -98.683296,
    27.064699
  ],
  "location": {
    "type": "Point",
    "coordinates": [
      -98.683296,
      27.064699
    ]
  },
  "elev": 0.155,
  "edepth": 0.0,
  "starttime": 1262908800.0,
  "endtime": 1321574399.0,
  "site_id": {
    "$oid": "65e322fb5cd7e1e9490a5904"
  }
}
{
  "_id": {
    "$oid": "65e322fb5cd7e1e9490a5907"
  },
  "loc": "",
  "net": "TA",
  "sta": "035A",
  "lat": 26.937901,
  "lon": -98.102303,
  "coords": [
    -98.102303,
    26.937901
  ],
  "location": {
    "type": "Point",
    "coordinates": [
      -98.102303,
      26.937901
    ]
  },
  "elev": 0.029,
  "edepth": 0.0,
  "starttime": 1263254400.0,
  "endtime": 1321315199.0,
  "site_id": {
    "$oid": "65e322fb5cd7e1e9490a5907"
  }
}
{
  "_id": {
    "$oid": "65e322fb5cd7e1e9490a590a"
  },
  "loc": "",
  "net": "TA",
  "sta": "035Z",
  "lat":

A few points of note about that simple 3 line code box:
1.  I used the default return for `find`.   The default returns "all", which in this case would mean several hundred documents. For a large waveform data set it can easily be millions.
2.  To limit the output for this notebook I used a "method" of the `Cursor` class (the type `find` returns) called "limit".  Here I did that with a separate line, but most python programmers would write the same expression as `cursor=db.site.find().limit(3)`.
3.  The output shows iterating through that (modified) cursor retrieves 3 documents from site.

Returning "all" is rarely what you want.  The more common use is to run `find` with a query as arg0 to the function.  The next subsection illustrates that use along with basics of the query language discussed in numerous printed sources, online sources, and the MsPASS User's Manual.   
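
The forward-iterator behavior of a cursor described earlier can be mimicked with a plain python generator.  The box below is only an analogy under that assumption: `fake_cursor` is a hypothetical stand-in, not a pymongo object, and the three station names are the ones printed above.

```python
def fake_cursor(docs):
    """Hypothetical stand-in for a cursor: hands back documents one at a time."""
    for doc in docs:
        yield doc


docs = [{"sta": "034A"}, {"sta": "035A"}, {"sta": "035Z"}]
cursor_like = fake_cursor(docs)

# The first traversal consumes the iterator
first_pass = [doc["sta"] for doc in cursor_like]
# A second traversal finds nothing left to return
second_pass = [doc["sta"] for doc in cursor_like]

print("first pass: ", first_pass)
print("second pass:", second_pass)
```

If you need to traverse the result of a real query more than once, the usual options are to copy it into a list with `list(cursor)`, which is only sensible when the result is small, or to call the cursor's `rewind` method before looping again.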

### Mongo Query Language (MQL)
#### Single key match and basics
I will run a set of examples of increasing levels of complexity.   This particular section of this tutorial is intended as a hands on supplement to the section of the User's Manual titled "Using MongoDB with MsPASS" describing MQL.  

First, a unique match query:

In [9]:
query={'sta' : '134A'}
nsite=db.site.count_documents(query)
print("Number of site documents for station 134A=",nsite)
nchannel=db.channel.count_documents(query)
print("Number of channel documents for station 134A=",nchannel)
cursor=db.site.find(query)
for doc in cursor:
    pretty_print(doc)

Number of site documents for station 134A= 1
Number of channel documents for station 134A= 3
{
  "_id": {
    "$oid": "65e322fb5cd7e1e9490a591c"
  },
  "loc": "",
  "net": "TA",
  "sta": "134A",
  "lat": 32.572899,
  "lon": -98.079498,
  "coords": [
    -98.079498,
    32.572899
  ],
  "location": {
    "type": "Point",
    "coordinates": [
      -98.079498,
      32.572899
    ]
  },
  "elev": 0.297,
  "edepth": 0.0,
  "starttime": 1258329600.0,
  "endtime": 1315526399.0,
  "site_id": {
    "$oid": "65e322fb5cd7e1e9490a591c"
  }
}


Notice:
1.  I used another important collection method called `count_documents` to fetch the expected number of documents the query would yield.  Standard practice in working through many queries is to do a check that the number it returns makes sense.
2.  We see there is one and only one document matching the query in site and three in channel.  The reason channel has three, of course, is that there is a three-component sensor at that station that defines the recording channels.  To see why I didn't run the for loop over a cursor created from channel consider this:

In [10]:
# Note find_one accepts the same query but returns 
# only the first one it finds
doc=db.channel.find_one(query)
pretty_print(doc)

{
  "_id": {
    "$oid": "65e322fb5cd7e1e9490a591c"
  },
  "loc": "",
  "net": "TA",
  "sta": "134A",
  "lat": 32.572899,
  "lon": -98.079498,
  "coords": [
    -98.079498,
    32.572899
  ],
  "location": {
    "type": "Point",
    "coordinates": [
      -98.079498,
      32.572899
    ]
  },
  "elev": 0.297,
  "edepth": 0.0,
  "starttime": 1258329600.0,
  "endtime": 1315487100.0,
  "chan": "BHE",
  "vang": 90.0,
  "hang": 90.7,
  "serialized_channel_data": {
    "$binary": {
      "base64": "gASVoBoAAAAAAACMHG9ic3B5LmNvcmUuaW52ZW50b3J5LmNoYW5uZWyUjAdDaGFubmVslJOUKYGUfZQojA5fbG9jYXRpb25fY29kZZSMAJSMCV9sYXRpdHVkZZSMGW9ic3B5LmNvcmUuaW52ZW50b3J5LnV0aWyUjAhMYXRpdHVkZZSTlEdAQElUwSJ0n4WUgZR9lCiMBWRhdHVtlE6MEWxvd2VyX3VuY2VydGFpbnR5lE6MEXVwcGVyX3VuY2VydGFpbnR5lE6MEm1lYXN1cmVtZW50X21ldGhvZJROdWKMCl9sb25naXR1ZGWUaAiMCUxvbmdpdHVkZZSTlEfAWIUWfseGPIWUgZR9lChoDk5oD05oEE5oEU51YowKX2VsZXZhdGlvbpRoCIwIRGlzdGFuY2WUk5RHQHKQAAAAAACFlIGUfZQoaA9OaBBOaBFOjAVfdW5pdJROdWKMBl9kZXB0aJRoGkcAAAAAAAAAAIWUgZR9lChoD

As you can see, the attribute "serialized_channel_data" is huge and creates voluminous output.   The reason is that it is a pickle format image of the raw "Inventory" record for that channel created by obspy's web service reader.  This example shows the common problem that documents can be too big to view with a simple `json_util.dumps` or a raw print.   For that reason it is often useful to specify a "projection" argument.   Here is an example where we extract and print only net, sta, chan, loc from each of the 3 channel documents:

In [11]:
projection={'net':1,'sta':1,'chan':1,'loc':1,'_id':0}
cursor=db.channel.find(query,projection)
for doc in cursor:
    print(doc)

{'loc': '', 'net': 'TA', 'sta': '134A', 'chan': 'BHE'}
{'loc': '', 'net': 'TA', 'sta': '134A', 'chan': 'BHN'}
{'loc': '', 'net': 'TA', 'sta': '134A', 'chan': 'BHZ'}
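
To make the rule behind a projection concrete, the box below is a small local emulation of an inclusion-style projection applied to a python dictionary.  It is a sketch for illustration only: `apply_projection` is a hypothetical helper that runs entirely in python, the `_id` value is a placeholder string, and the real work in the example above is done by the MongoDB server.

```python
def apply_projection(doc, projection):
    """Local emulation of an inclusion-style projection (illustration only)."""
    # Keys set to 1 are kept; '_id' follows its own rule below
    wanted = {key for key, flag in projection.items() if flag and key != "_id"}
    out = {key: value for key, value in doc.items() if key in wanted}
    # '_id' is returned unless the projection explicitly turns it off
    if projection.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]
    return out


sample = {
    "_id": "placeholder-id",
    "loc": "",
    "net": "TA",
    "sta": "134A",
    "chan": "BHE",
    "vang": 90.0,
    "hang": 90.7,
}
trimmed = apply_projection(sample, {"net": 1, "sta": 1, "chan": 1, "loc": 1, "_id": 0})
with_id = apply_projection(sample, {"sta": 1})
print(trimmed)
print(with_id)
```

The second call shows why the examples in this tutorial set `'_id':0`: without it the id comes back even though it was never requested.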


Here is a fancier variant using pandas to print a longer list of attributes in tabular form:

In [12]:
import pandas as pd
projection={
    'net':1,
    'sta':1,
    'chan':1,
    'lat':1,
    'lon':1,
    'elev':1,
    'hang':1,
    'vang':1,
    '_id':0,
}
cursor=db.channel.find(query,projection)
doclist=[]
for doc in cursor:
    doclist.append(doc)
df = pd.DataFrame.from_dict(doclist)
print(df)

  net   sta        lat        lon   elev chan  vang  hang
0  TA  134A  32.572899 -98.079498  0.297  BHE  90.0  90.7
1  TA  134A  32.572899 -98.079498  0.297  BHN  90.0   0.7
2  TA  134A  32.572899 -98.079498  0.297  BHZ   0.0   0.0


The pandas construct is useful for a number of reasons.  Therefore, let's create a function to simplify that type of printing operation.

In [13]:
import pandas as pd
def print_as_table(doclist):
    df = pd.DataFrame.from_dict(doclist)
    print(df)

#### Multiple key equality matching
Next let's do a query with multiple keys.   We will fetch the (shortened) record for the BHZ component of a different station:

In [14]:
query={
    'sta' : '131A',
    'chan' : 'BHZ',
}
cursor=db.channel.find(query,projection)
for doc in cursor:
    pretty_print(doc)

{
  "net": "TA",
  "sta": "131A",
  "lat": 32.673698,
  "lon": -100.388802,
  "elev": 0.622,
  "chan": "BHZ",
  "vang": 0.0,
  "hang": 0.0
}


#### Range operator examples (compound query)
We often want to query by a range of values.  Here is an example that returns the coordinates of all TA stations within a box defined by 30 to 35 degrees latitude and -110 to -100 degrees longitude: 

In [15]:
query={
    'lat' : {'$gte' : 30.0,'$lte' : 35.0},
    'lon' : {'$gte' : -110.0, '$lte' : -100},
}
projection={
   'net':1,
    'sta':1,
    'chan':1,
    'lat':1,
    'lon':1,
    'elev':1,
    '_id':0, 
}
cursor=db.site.find(query,projection)
doclist=[]
for doc in cursor:
    doclist.append(doc)
print_as_table(doclist)


   net   sta        lat         lon   elev
0   TA  121A  32.532398 -107.785103  1.652
1   TA  130A  32.596100 -100.965202  0.676
2   TA  131A  32.673698 -100.388802  0.622
3   TA  230A  31.887800 -101.112396  0.742
4   TA  231A  31.935301 -100.316299  0.574
5   TA  330A  31.406300 -101.175201  0.742
6   TA  331A  31.308500 -100.426598  0.615
7   TA  431A  30.682400 -100.607903  0.700
8   TA  530A  30.148899 -101.337898  0.636
9   TA  531A  30.164499 -100.546402  0.661
10  TA  MSTX  33.969601 -102.772400  1.167
11  TA  X30A  34.446098 -100.874001  0.698
12  TA  Y22D  34.073900 -106.921000  1.436
13  TA  Y22E  34.074200 -106.920799  1.444
14  TA  Y22E  34.074200 -106.920799  1.444
15  TA  Y30A  33.876598 -100.897797  0.812
16  TA  Y31A  33.962898 -100.261497  0.530
17  TA  Z30A  33.286098 -101.128197  0.729
18  TA  Z31A  33.318298 -100.143501  0.547
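
A point that is easy to miss in the query above is that multiple keys are combined with an implied logical AND, and that multiple operators under one key (`$gte` and `$lte` here) must all be satisfied.  The box below is a toy matcher that mimics that logic on python dictionaries.  It is a sketch under stated assumptions: `matches` is a hypothetical helper that handles only equality and the four comparison operators, and it is not how MongoDB evaluates queries internally.  The coordinates are copied from output earlier in this tutorial.

```python
import operator

# The only operators this toy matcher understands
_COMPARISONS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, query):
    """Return True if doc satisfies every clause of query (implied AND)."""
    for key, condition in query.items():
        if key not in doc:
            return False
        value = doc[key]
        if isinstance(condition, dict):
            # every operator listed under this key must hold
            for op, bound in condition.items():
                if not _COMPARISONS[op](value, bound):
                    return False
        elif value != condition:
            return False
    return True


box_query = {
    "lat": {"$gte": 30.0, "$lte": 35.0},
    "lon": {"$gte": -110.0, "$lte": -100},
}
stations = [
    {"sta": "034A", "lat": 27.064699, "lon": -98.683296},
    {"sta": "131A", "lat": 32.673698, "lon": -100.388802},
    {"sta": "134A", "lat": 32.572899, "lon": -98.079498},
]
inside = [doc["sta"] for doc in stations if matches(doc, box_query)]
print(inside)
```

Station 134A passes the latitude test but fails the longitude test, so it is rejected; that is the implied AND at work.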


A variant using a regular expression to select only station names that start with the letter "Y":

In [16]:
query={
    'lat' : {'$gte' : 30.0,'$lte' : 35.0},
    'lon' : {'$gte' : -110.0, '$lte' : -100},
    'sta' : {'$regex' : '^Y'},
}
cursor=db.site.find(query,projection)
doclist=[]
for doc in cursor:
    doclist.append(doc)
print_as_table(doclist)

  net   sta        lat         lon   elev
0  TA  Y22D  34.073900 -106.921000  1.436
1  TA  Y22E  34.074200 -106.920799  1.444
2  TA  Y22E  34.074200 -106.920799  1.444
3  TA  Y30A  33.876598 -100.897797  0.812
4  TA  Y31A  33.962898 -100.261497  0.530
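
One detail worth checking with any `$regex` query is anchoring.  A MongoDB regular expression matches anywhere in the string unless it is anchored, so a pattern like `Y.*` also matches names that merely contain a "Y", while `^Y` restricts the match to names that start with it.  The box below illustrates the difference with python's `re` module, which I am assuming behaves the same as the server's regex engine for patterns this simple.  The name "KMSY" is hypothetical and was added only to expose the difference.

```python
import re

# "KMSY" is a made-up name; the others appear in the output above
names = ["Y22D", "Y30A", "X30A", "MSTX", "KMSY"]

# re.search, like $regex, looks for a match anywhere in the string
unanchored = [name for name in names if re.search("Y.*", name)]
anchored = [name for name in names if re.search("^Y", name)]

print("pattern 'Y.*':", unanchored)
print("pattern '^Y': ", anchored)
```

For the TA stations inside the box used above the two patterns happen to return the same set, which is why this kind of mistake can go unnoticed until the data set changes.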


#### Geospatial query
MongoDB has some very useful geospatial query capabilities.  See the "Using MongoDB with MsPASS" section of the User's Manual for more about this capability.  On the other hand, it is probably best thought of, at least at present, as an advanced feature.   The syntax is complex and, as noted in that section of the manual, MongoDB documentation is less than ideal and many online sources are inconsistent with the current implementation.  For this tutorial I will just show an example that is a variant of that shown in the User's Manual.

An IMPORTANT rule about using geospatial searches is that a special index is REQUIRED.  For this example the following is needed to make this work:

In [17]:
db.site.create_index({'location' : '2dsphere'})

'location_2dsphere'

Noting:
1.  'location' is the key used to tag the geoJSON format documents `save_inventory` created in the site collection.  It is a constant tag in the MsPASS schema for these data.  Note also that if you were running this on the source collection the key has a different name ('epicenter') since the content exactly matches the definition of the jargon term. 
2. '2dsphere' is a magic string that tells MongoDB to create a special index that uses spherical geometry for spatial calculations.  The alternative is '2d', but it is not advised for most if not all seismology applications.  The '2d' index uses a map projection that produces distorted answers unless the area of study is small. Examples you can find online use a '2d' index for applications such as apps that have data only for a single city.

Now that we have an index, we can do a search.  This search produces a result similar to the lat-lon range query above, but for a circular region (a circle defined by great circle path distance) centered on the same lat-lon box.  

In [18]:
query = {"location":{
        '$nearSphere': {
            '$geometry' : {
                'type' : 'Point',
                'coordinates' : [-105.0,32.5]
            },
            '$maxDistance' : 300000.0,
        }
      }
    }
# count_documents does not accept $near or $nearSphere
# queries because it runs through the aggregation framework.
# If you uncomment the line below you will see
# the error it throws with this query.
#n=db.site.count_documents(query)
cursor=db.site.find(query)
for doc in cursor:
    pretty_print(doc)

{
  "_id": {
    "$oid": "65e322fe5cd7e1e9490a608d"
  },
  "loc": "",
  "net": "TA",
  "sta": "Y22D",
  "lat": 34.0739,
  "lon": -106.921,
  "coords": [
    -106.921,
    34.0739
  ],
  "location": {
    "type": "Point",
    "coordinates": [
      -106.921,
      34.0739
    ]
  },
  "elev": 1.436,
  "edepth": 0.0,
  "starttime": 1191024000.0,
  "endtime": 1575158399.9998999,
  "site_id": {
    "$oid": "65e322fe5cd7e1e9490a608d"
  }
}
{
  "_id": {
    "$oid": "65e322fe5cd7e1e9490a6099"
  },
  "loc": "01",
  "net": "TA",
  "sta": "Y22E",
  "lat": 34.0742,
  "lon": -106.920799,
  "coords": [
    -106.920799,
    34.0742
  ],
  "location": {
    "type": "Point",
    "coordinates": [
      -106.920799,
      34.0742
    ]
  },
  "elev": 1.444,
  "edepth": 0.0,
  "starttime": 1301270400.0,
  "endtime": 1344297599.0,
  "site_id": {
    "$oid": "65e322fe5cd7e1e9490a6099"
  }
}
{
  "_id": {
    "$oid": "65e322fe5cd7e1e9490a6090"
  },
  "loc": "",
  "net": "TA",
  "sta": "Y22E",
  "lat": 34.074

Because of the pretty print of the full documents, that is a bit verbose, but it hopefully illustrates the point.  Although geospatial queries are complex, they have a lot of potential use for workflows that need to group data by the spatial location of stations (a "virtual array" concept) or by source (stacking of closely spaced sources).  
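
If you need a count for a circular region, one workaround is to express the circle with `$geoWithin` and `$centerSphere`, which is the replacement the pymongo documentation suggests when `count_documents` is given a `$nearSphere` query.  The box below is a minimal sketch that only builds the query document.  `circle_count_query` is a hypothetical helper, and the Earth radius used to convert meters to radians is an assumption (the equatorial value); `$centerSphere` expects the radius in radians.

```python
EARTH_RADIUS_M = 6378100.0  # assumed equatorial radius in meters


def circle_count_query(lon, lat, radius_m, key="location"):
    """Build a $geoWithin/$centerSphere query for a circle of radius_m meters."""
    radius_radians = radius_m / EARTH_RADIUS_M
    return {key: {"$geoWithin": {"$centerSphere": [[lon, lat], radius_radians]}}}


count_query = circle_count_query(-105.0, 32.5, 300000.0)
print(count_query)
# With a live database handle one would then run, for example:
# n = db.site.count_documents(count_query)
```

Unlike `$nearSphere`, a `$geoWithin` query does not sort the result by distance from the center point, so use it for counting and selection rather than as a drop-in replacement when the ordering matters.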

### Sorting
There are many situations where it is advantageous to sort the return of a query by one or more keys.   Sorting is technically a method of the `Cursor` object returned by `find`, but more magic happens when the client passes the query to the MongoDB server to assure the operation is done efficiently.   The reason I point that out here is mostly to clarify why the sort clause appears where it does in typical usage.  The User's Manual addresses this in more detail, but here is an example that sorts channel documents into an order sensible for miniseed, which uses net:sta:chan:loc:time-interval as a unique key combination.  

In [19]:
# Sort channel documents by net, sta, chan, and starttime.
# filter_clause is the projection passed as the second argument of find
filter_clause = {
    "_id":0,
    "sta":1,
    "chan":1,
    "starttime":1,
    "endtime":1,
}
sort_clause = [
    ("net",1),
    ("sta",1),
    ("chan",1),
    ("starttime",1),
  ]
cursor=db.channel.find({},filter_clause).sort(sort_clause).limit(6)
doclist=[]
for doc in cursor:
    doclist.append(doc)
from obspy import UTCDateTime
for doc in doclist:
    doc['starttime']=UTCDateTime(doc['starttime'])
    doc['endtime']=UTCDateTime(doc['endtime'])
print_as_table(doclist)
    

    sta                    starttime                      endtime chan
0  034A  2010-01-08T00:00:00.000000Z  2011-11-17T17:05:00.000000Z  BHE
1  034A  2010-01-08T00:00:00.000000Z  2011-11-17T17:05:00.000000Z  BHN
2  034A  2010-01-08T00:00:00.000000Z  2011-11-17T17:05:00.000000Z  BHZ
3  035A  2010-01-12T00:00:00.000000Z  2011-11-14T17:40:00.000000Z  BHE
4  035A  2010-01-12T00:00:00.000000Z  2011-11-14T17:40:00.000000Z  BHN
5  035A  2010-01-12T00:00:00.000000Z  2011-11-14T17:40:00.000000Z  BHZ


Noting:
1.  The "sort" function call appears after the find function with arguments.   That is the syntax because "sort" is a Cursor "method".
2.  I added a second qualifier, limit, to only return the first 6 documents.  I did that just to keep the volume of the output under control.   The number returned is much larger if you remove the `.limit(6)` qualifier.
3.  I did a projection and used the `print_as_table` function we defined to make a more readable report. 
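
The order of the tuples in a sort clause matters: the first key is the most significant and later keys only break ties.  The box below mimics that behavior locally with python's stable sort.  It is a sketch for illustration only; `local_sort` is a hypothetical helper, and in the example above the sorting is done by the MongoDB server, not in python.

```python
def local_sort(docs, sort_clause):
    """Sort a list of dicts the way a list of (key, direction) tuples implies."""
    result = list(docs)
    # Apply keys from least to most significant; python's sort is stable,
    # so earlier (more significant) keys end up dominating the order.
    for key, direction in reversed(sort_clause):
        result.sort(key=lambda doc: doc[key], reverse=(direction == -1))
    return result


unsorted_docs = [
    {"sta": "035A", "chan": "BHZ"},
    {"sta": "034A", "chan": "BHN"},
    {"sta": "035A", "chan": "BHE"},
    {"sta": "034A", "chan": "BHE"},
]
ascending = local_sort(unsorted_docs, [("sta", 1), ("chan", 1)])
mixed = local_sort(unsorted_docs, [("sta", -1), ("chan", 1)])
for doc in ascending:
    print(doc["sta"], doc["chan"])
```

A direction of 1 means ascending and -1 means descending, which is the same convention the sort clause uses.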

## Update
One has to do an "update" to a MongoDB database if you need to change the contents of one or more documents.  Database updates happen in the modern world in inconceivably huge numbers every day in commercial operations.  For example, if you order something from Amazon all those tracking stages from your clicking history to the time a package is delivered to your home invoke a series of database transactions including, I presume, a lot of updates.  

Although updates are a common requirement in commercial databases, a less obvious thing to most people is that updates are rarely if ever needed in data processing with a system like MsPASS.   Most data processing involves three stages:  1) read the data set, 2) process the data set, and 3) save the results.   Some processors may need to do read operations from the database, but updates are rarely needed.  They are also highly undesirable in a data-driven workflow like that because database transactions, from the computer's perspective, are like a human talking to someone on Jupiter; a response to the request for an update takes forever in terms of computer clock cycles.  For that reason, updates should be avoided in any workflow and should absolutely never be embedded in a large, parallel processing sequence. 

In MsPASS updates can nearly always be avoided by a simple, alternative approach:   if a change is needed that needs to be saved (e.g. you compute a set of new attributes from the data) simply post that data to the associated object's `Metadata` container.   In that model, when the final results are saved the newly computed attributes will be saved with the data.  Then the overhead of writing to the database is absorbed in the normally essential save step anyway.  

With that long caveat, there are two standard ways to do updates:  `update_one` changes one document at a time, and `update_many` updates multiple documents with one client-server transaction.  Most people can understand usage of these two methods better by examples.  The examples below focus on updates to "normalizing" collections as that, from my experience, is the most common need for updates when using MsPASS.

### update_one example
Suppose we learned that the recording period for a seismic station is wrong.  That is, with SEED data, station information has a time period for which the data are considered valid.   That period is defined by two attributes with the keys "starttime" and "endtime".  Changing these fields would be highly unusual for data downloaded from the FDSN, but is not at all uncommon for portable deployments while the experiment is in progress.  Our example is contrived, as what we are about to do will make the entry we edit wrong.   The hypothetical situation we are modeling is that we learned the "endtime" for station O34A is wrong.  We first query the site collection to verify what we have:

In [20]:
from obspy import UTCDateTime
query={'sta' : 'O34A'}
# verify there is only one entry - not always true with this query
ndocs=db.site.count_documents(query)
print('Number of documents for station O34A = ',ndocs)
doc=db.site.find_one(query)
print(doc['sta'],
    UTCDateTime(doc['starttime']), UTCDateTime(doc['endtime']))

Number of documents for station O34A =  1
O34A 2010-06-11T00:00:00.000000Z 2012-04-18T23:59:59.000000Z


We say, "Ahh, the endtime should have been on April 19, not April 18, and our field notes show the actual time was 13:44 UTC."   We can make that change with this use of `update_one`.  

In [21]:
new_time=UTCDateTime('2012-04-19T13:44:00.0Z')
update_doc={ '$set' :
            {'endtime' : new_time.timestamp}
           }
db.site.update_one(query,update_doc)
print('Updated data for O34A')
doc=db.site.find_one(query)
print(doc['sta'],
    UTCDateTime(doc['starttime']), UTCDateTime(doc['endtime']))

Updated data for O34A
O34A 2010-06-11T00:00:00.000000Z 2012-04-19T13:44:00.000000Z


Notice `update_one` has two required arguments: arg0 is a query and arg1 is required to be an 'operator' document, meaning it has to use one of the update 'dollar' operators, which follow the same dollar-prefix syntax as the query operators used above.  This one uses '$set', which means replace the value.  In my experience, that is the most common operator for updates.
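
The box below is a minimal sketch of two habits that help with updates: building the `$set` document in one place, and checking what the update actually did.  It assumes only the python standard library for the time conversion (the example above uses obspy's `UTCDateTime` instead).  The helper names `endtime_update` and `update_and_check` are hypothetical; `matched_count` and `modified_count` are attributes of the result object pymongo returns from `update_one`.

```python
from datetime import datetime, timezone


def endtime_update(year, month, day, hour=0, minute=0, second=0):
    """Build a $set document that stores endtime as a UTC epoch time."""
    t = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return {"$set": {"endtime": t.timestamp()}}


def update_and_check(collection, query, update_doc):
    """Run update_one and complain if the query matched nothing."""
    result = collection.update_one(query, update_doc)
    if result.matched_count == 0:
        raise LookupError(f"no document matched {query}")
    return result.modified_count


new_endtime = endtime_update(2012, 4, 19, 13, 44)
print(new_endtime)
# With a live database handle one would then run, for example:
# update_and_check(db.site, {'sta': 'O34A'}, new_endtime)
```

Checking `matched_count` catches the easy mistake of a query with a typo, which otherwise fails silently because an update that matches nothing is not an error.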

### update_many example
The basic argument structure required for `update_many` is the same as `update_one`.   The difference is you should use `update_many` when the query in arg0 is expected to match more than one document to be modified.  The example below is the same as for `update_one` but applied to the "channel" collection.   As the `count_documents` output shows below the same query yields 3 documents for channel because the site has a three component sensor.

In [22]:
ndocs=db.channel.count_documents(query)
print('number of channel documents for O34A=',ndocs)
# we use the same query and update_doc as above
db.channel.update_many(query,update_doc)
print('Updated data for O34A')
cursor=db.channel.find(query)
for doc in cursor:
    print(doc['sta'],doc['chan'],
      UTCDateTime(doc['starttime']), 
      UTCDateTime(doc['endtime']))

number of channel documents for O34A= 3
Updated data for O34A
O34A BHE 2010-06-11T00:00:00.000000Z 2012-04-19T13:44:00.000000Z
O34A BHN 2010-06-11T00:00:00.000000Z 2012-04-19T13:44:00.000000Z
O34A BHZ 2010-06-11T00:00:00.000000Z 2012-04-19T13:44:00.000000Z


## Delete
The API for deleting documents is very similar to that for find.  There is a `delete_one` method to delete a single document and a `delete_many` method that more-or-less does a find followed by deleting each document the query found.  For instance, the following deletes what we just updated in channel:

In [23]:
# repeating this query to be clear but not required in this context
query={'sta' : 'O34A'}
ndocs=db.channel.count_documents(query)
print('number of channel documents for O34A before delete=',ndocs)
ret=db.channel.delete_many(query)
ndocs=db.channel.count_documents(query)
print('number of channel documents for O34A after delete_many=',ndocs)

number of channel documents for O34A before delete= 3
number of channel documents for O34A after delete_many= 0
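
Because a delete cannot be undone, it can be worth wrapping the count-then-delete pattern used above in a small guard.  The box below is a sketch of that idea.  `guarded_delete` is a hypothetical helper, and `ListCollection` is a made-up in-memory stand-in (equality queries only) so the sketch runs without a server.  The only pymongo features assumed are the `count_documents` and `delete_many` methods and the `deleted_count` attribute of the object `delete_many` returns.

```python
from types import SimpleNamespace


class ListCollection:
    """Hypothetical in-memory stand-in for a collection (equality queries only)."""

    def __init__(self, docs):
        self.docs = list(docs)

    def _hits(self, query):
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in query.items())]

    def count_documents(self, query):
        return len(self._hits(query))

    def delete_many(self, query):
        hits = self._hits(query)
        self.docs = [d for d in self.docs if d not in hits]
        # mimic the deleted_count attribute of the pymongo result object
        return SimpleNamespace(deleted_count=len(hits))


def guarded_delete(collection, query, expected):
    """Delete only if the query matches exactly the expected number of documents."""
    found = collection.count_documents(query)
    if found != expected:
        raise ValueError(f"expected {expected} documents, query matched {found}")
    return collection.delete_many(query).deleted_count


toy_channel = ListCollection([
    {"sta": "O34A", "chan": "BHE"},
    {"sta": "O34A", "chan": "BHN"},
    {"sta": "O34A", "chan": "BHZ"},
    {"sta": "034A", "chan": "BHZ"},
])
ndeleted = guarded_delete(toy_channel, {"sta": "O34A"}, expected=3)
print("deleted", ndeleted, "documents;", len(toy_channel.docs), "remain")
```

The same function should work unchanged with a real collection such as `db.channel`, since it uses only methods that pymongo collections provide, but I have not shown that here.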


Handling deletions of waveform data is a much more difficult problem.   In MsPASS there is a special method of our `Database` class called `delete_data`.  That method has to do a lot more than just call the `delete_one` method to remove the database document.  There are two reasons for that:
1.  In MsPASS the sample data, which are typically orders of magnitude larger than the "document" saved in MongoDB, are stored separately from the "document" of name-value pairs.
2.  MsPASS also supports multiple "storage modes" for how to handle the sample data.   It also allows multiple "formats" for how the data are represented externally (e.g. miniseed is a "format" that is light years from the natural representation of seismic data). At this time there are three basic "storage modes":  (1) "file", (2) "gridfs", and (3) "url".  How they need to be handled with a "delete" operation is very different.  When "storage_mode" is set to "file" the sample data are stored in a file system in a set of files.  There the problem is that one file normally contains many waveforms, so if a lot of editing is done data will be stranded.  MsPASS has a way to automatically delete files that are no longer referenced in the database to reduce debris, but it only works if the entire file content is deleted.   Using "gridfs" storage is a simpler problem as our waveform delete operator will automatically clear sample data stored in the gridfs system.  If your application requires a lot of editing to remove stale waveforms, gridfs is by far the best choice.  Finally, "url" is pretty much defined to be read-only so the only thing that happens for data indexed that way is that the document vanishes. For data access via the cloud with the new Earthscope system this mode may become common.

One common application of `delete_data` is to clear a temporary saved copy that is no longer needed.  In MsPASS when data are saved we recommend ALWAYS using the "data_tag" argument to provide a unique tag for data at a specific stage of processing.   With that understood, suppose we saved an intermediate copy of a working dataset with `data_tag="preprocessed"` and we wanted to clear the disk space associated with that intermediate copy.  The following simple code box would do that (Note it will do nothing here because the db we have been using contains no waveform data so I disabled the code box):  

Note arg0 of this method (currently) requires the ObjectId of the document to be deleted.  arg1 must be either "TimeSeries" or "Seismogram" or the method will throw an exception.
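
For readers who want to see the shape of such a cleanup, here is a minimal sketch based only on the calling convention described above (arg0 an ObjectId, arg1 the data type name).  The helper name `delete_tagged_waveforms` is hypothetical, and the collection names `wf_TimeSeries` and `wf_Seismogram` are an assumption about the schema in use; check the `delete_data` docstring in your MsPASS version before running anything like this on real data.

```python
def delete_tagged_waveforms(db, data_tag, object_type="TimeSeries"):
    """Call db.delete_data on every waveform document with a given data_tag.

    db is assumed to be a MsPASS Database handle.  Returns the number of
    documents passed to delete_data.
    """
    if object_type not in ("TimeSeries", "Seismogram"):
        raise ValueError("object_type must be 'TimeSeries' or 'Seismogram'")
    # ASSUMPTION: waveform documents live in wf_TimeSeries or wf_Seismogram
    collection = db["wf_" + object_type]
    # Collect the ids first so we are not deleting while iterating a cursor
    ids = [doc["_id"] for doc in collection.find({"data_tag": data_tag}, {"_id": 1})]
    for oid in ids:
        db.delete_data(oid, object_type)
    return len(ids)

# Example use (would do nothing on the tutorial database, which has no waveforms):
# n = delete_tagged_waveforms(db, "preprocessed", "TimeSeries")
```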

## Create part 2:  data import
### Import PhaseNet picks
A very common need in MsPASS, or any project where you want to utilize MongoDB, is to import data in some standard or weird format and put it into a form you can manage with MongoDB.   I'll close this tutorial with two examples.  The first reads data in a standard format called "comma separated value (csv)".  The second is a good example of a weird (aka clumsy) legacy format that has been around for decades and exists only because of inertia. 

Our example with a standard format is the output of a newer package called __[PhaseNet](https://github.com/AI4EPS/PhaseNet)__.  PhaseNet uses a neural net for picking seismic phases.  We will be cracking an output file from this package found in the data directory where you are assumed to have run this tutorial.  (Thanks to __[Jianhua Gong](https://earth.indiana.edu/directory/faculty/gong-ginny.html)__ for supplying these data.)  The data are a standard "csv" format file and are relatively easy to read with pandas.   This next box loads the file of picks into memory as a pandas DataFrame:

In [24]:
import pandas as pd
df = pd.read_csv('./data/picks.csv')
print(df[0:4])

                                           file_name               begin_time  \
0  X9.BB060..HH*__20120919T000000Z__20120920T0000...  2012-09-19T00:00:00.000   
1  X9.BB060..HH*__20120919T000000Z__20120920T0000...  2012-09-19T00:00:00.000   
2  X9.BB060..HH*__20120919T000000Z__20120920T0000...  2012-09-19T00:00:00.000   
3  X9.BB060..HH*__20120919T000000Z__20120920T0000...  2012-09-19T00:00:00.000   

                                          station_id  phase_index  \
0  X9.BB060..HH*__20120919T000000Z__20120920T0000...      1390027   
1  X9.BB060..HH*__20120919T000000Z__20120920T0000...      4120059   
2  X9.BB060..HH*__20120919T000000Z__20120920T0000...      4123674   
3  X9.BB060..HH*__20120919T000000Z__20120920T0000...      4252371   

                phase_time  phase_score phase_type  
0  2012-09-19T03:51:40.270        0.337          P  
1  2012-09-19T11:26:40.590        0.867          P  
2  2012-09-19T11:27:16.740        0.417          P  
3  2012-09-19T11:48:43.710        0.

Earlier we used pandas in the reverse direction, to reformat MongoDB documents into a more easily read form, and the print above illustrates that tabular display.   Now we want the inverse operation:  saving the contents of this DataFrame to MongoDB.   Fortunately there is a very easy way to do that.  The first step, illustrated in the next box, is to convert the DataFrame to something MongoDB can digest.  The DataFrame API has a stock way to do that with its `to_dict` method:

In [25]:
doclist=df.to_dict('records')
print('type of to_dict output=',type(doclist),' size=',len(doclist))
print('First document in output of to_dict')
pretty_print(doclist[0])

type of to_dict output= <class 'list'>  size= 35917
First document in output of to_dict
{
  "file_name": "X9.BB060..HH*__20120919T000000Z__20120920T000000Z.mseed",
  "begin_time": "2012-09-19T00:00:00.000",
  "station_id": "X9.BB060..HH*__20120919T000000Z__20120920T000000Z.mseed",
  "phase_index": 1390027,
  "phase_time": "2012-09-19T03:51:40.270",
  "phase_score": 0.337,
  "phase_type": "P"
}


The print illustrates `to_dict` converts the DataFrame to a python list of dictionaries.   We print the first component of the list to show what it contains.  

It is good practice at this point to release the DataFrame.  We wouldn't have to do that, but if the file were large it could be an issue so it is best to do a bit of housecleaning:

In [26]:
del df

This particular example illustrates the nearly universal problem that data imported from anything independent of your own work has some kind of mismatch with your needs.   The data we just imported are phase pick estimates from a package called "PhaseNet" that uses a neural net to "pick" P and S phase arrival times.   A detail of earthquake catalog preparation, of which "picking" is a basic operation, is that each "pick" is defined by the channel of seismic data from which it was derived.   Ultimately that channel needs to be associated with a particular instrument with a known location on the earth for it to be used for an earthquake location (a primary purpose of "picks").  This particular implementation does that in a particularly weird way with the attribute seen in the DataFrame and conversion above with the key `station_id`.   ("Weird", by the way, is the norm with any research application that hasn't had a standard applied.)  The SEED standard has set how this is supposed to be done for something like 30 years.  All we really need are "seed station code names" and a time to match each arrival to the full set of metadata for a given seismic station.  A full "seed station code" consists of four keywords commonly called the "network", "station", "channel", and "location" codes.  In MsPASS we use the CSS3.0 abbreviations from Antelope of "net", "sta", "chan", and "loc" respectively.   In the data we are parsing, the "station_id" attribute has "net" and "sta" defined at the start of the string separated by a "." character.  For example, the first entry printed above has the following value:

In [27]:
print(doclist[0]['station_id'])

X9.BB060..HH*__20120919T000000Z__20120920T000000Z.mseed


So "net" is "X9" and "sta" is "BB060".   "HH*" means "all H channels" but that is ambiguous and rightly so.  For practical purposes all we usually care about for processing downstream is where the instrument was located so "net" plus "sta" is all we really need.   The whole point of that long discussion was to explain why, when we actually write these data to MongoDB, we use the python `split` method to extract the "net" and "sta" values and insert them into the data we are saving.  The following does that in a loop.  Once the data are edited in that way we insert the results into MongoDB with a standard create (C of CRUD) method called `insert_many`.  (Note `insert_many` used this way is orders of magnitude faster than if we had called `insert_one` after each doc was edited in the loop.  Why is easily learned from internet sources, but the basic point is that the data are written in blocks instead of as single transactions.)

In [28]:
for doc in doclist:
    station_id = doc['station_id']
    slist=station_id.split('.')
    net=slist[0]
    sta=slist[1]
    doc['sta']=sta
    doc['net']=net

insert_many_output=db.arrival.insert_many(doclist)
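
As a variation on the loop above, the parsing can be wrapped in a small function that also recovers the "loc" and "chan" codes and fails loudly if a string does not have the expected layout.  This is a sketch that assumes every `station_id` follows the pattern seen in this file (`net.sta.loc.chan__starttime__endtime.mseed`); the function name `parse_station_id` is hypothetical.

```python
def parse_station_id(station_id):
    """Split a PhaseNet station_id string into SEED name codes.

    ASSUMPTION: the string has the layout seen in this tutorial's file,
    net.sta.loc.chan__starttime__endtime.mseed
    """
    # Everything before the first double underscore is the SEED name part
    seed_name = station_id.split("__")[0]
    fields = seed_name.split(".")
    if len(fields) != 4:
        raise ValueError("unexpected station_id layout: " + station_id)
    net, sta, loc, chan = fields
    return {"net": net, "sta": sta, "loc": loc, "chan": chan}

codes = parse_station_id("X9.BB060..HH*__20120919T000000Z__20120920T000000Z.mseed")
print(codes)
# {'net': 'X9', 'sta': 'BB060', 'loc': '', 'chan': 'HH*'}
```

Inside the loop, `doc.update(parse_station_id(doc['station_id']))` would add all four keys at once.  Whether to keep the empty "loc" and the wildcard "chan" is a design choice that depends on how the picks will be matched to station metadata.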

Let's verify that worked by comparing the collection count to the number of documents we put in and looking at the first document we saved:

In [29]:
print('Size of doclist = ',len(doclist))
print('Number of arrival documents saved=',db.arrival.count_documents({}))
doc=db.arrival.find_one()
pretty_print(doc)

Size of doclist =  35917
Number of arrival documents saved= 35917
{
  "_id": {
    "$oid": "65e322fe5cd7e1e9490a6123"
  },
  "file_name": "X9.BB060..HH*__20120919T000000Z__20120920T000000Z.mseed",
  "begin_time": "2012-09-19T00:00:00.000",
  "station_id": "X9.BB060..HH*__20120919T000000Z__20120920T000000Z.mseed",
  "phase_index": 1390027,
  "phase_time": "2012-09-19T03:51:40.270",
  "phase_score": 0.337,
  "phase_type": "P",
  "sta": "BB060",
  "net": "X9"
}
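
Note that `phase_time` and `begin_time` were saved as strings.  If the picks are later to be compared with times stored as epoch seconds, converting at import time may be convenient.  The sketch below uses only the python standard library and assumes the PhaseNet time strings are UTC, which the file itself does not state; the function name `iso_to_epoch` is hypothetical.

```python
from datetime import datetime, timezone

def iso_to_epoch(timestring):
    """Convert a time string like '2012-09-19T03:51:40.270' to epoch seconds.

    ASSUMPTION: the string has no time zone suffix and represents UTC.
    """
    dt = datetime.fromisoformat(timestring)
    # Attach UTC explicitly so the result does not depend on the local zone
    return dt.replace(tzinfo=timezone.utc).timestamp()

print(iso_to_epoch("1970-01-01T00:00:01.500"))
# 1.5
```

A line such as `doc['time'] = iso_to_epoch(doc['phase_time'])` in the editing loop would add the numeric value under a key of your choosing ("time" here is only an illustration).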


### Load Centroid Moment Tensor Catalog
The centroid moment tensor catalog is a heavily used catalog of earthquake source information. For those unfamiliar with the concept, a Moment Tensor can be thought of as the union of a focal mechanism and magnitude. It arises from a general theoretical framework for elastic waves and is the best theoretical model we know for any seismic source. The point of this tutorial, however, is a lesson in how to read data not directly supported by MsPASS, not a lesson in seismology. The standard CMT catalog is a type example of data distributed in an archaic, nonstandard, and/or complicated format. All three of those pretty much describe the CMT format.  It is well documented __[here](https://www.ldeo.columbia.edu/~gcmt/projects/CMT/catalog/allorder.ndk_explained)__ and the data we will use in this section can be downloaded directly __[here](https://www.globalcmt.org/CMTfiles.html)__.  "Well documented", however, does not mean simple.  The format is a legacy of card images as the lines are limited to 80 ascii characters.  Each CMT solution spans 5 lines with a few keywords inserted to give humans a hope of figuring out what is what.   

With that long background, here is an implementation of a reader for "ndk" format files from the CMT project.   Run the next code box to define a couple key functions and then go to the text box following for a description.

In [30]:
"""
This module contains a set of utilities to parse the archaic so 
called "ndk" format text files used to distribute the Global 
Centroid Moment Tensor catalog.  ndk is an obnoxious text format 
with obvious archaic roots to the dark ages of punched cards.  
It uses 5, 80-column lines for each earthquake in the catalog. 
That type of data is very difficult to parse in python, but 
that is what this does.   

Created on Wed Jan  3 06:03:05 2024

@author: pavlis
"""

def read_ndk_file(path,apply_checks=True):
    """
    Reads the contents of the "ndk" format file defined by the "path" 
    argument.  The result is really just an image of the file returned 
    by the standard python function readlines.  The only thing this 
    function adds is a sanity check on the file content to verify the 
    file has the weird structure of 5 lines per CMT solution with some 
    key words in the expected place.  That can be turned off by setting 
    "apply_checks" to False.  In that case it is little more than open, 
    readlines, and close. 
    """
    # the with statement closes the file even if readlines raises
    with open(path,'r') as fh:
        alllines = fh.readlines()
    # This is a sanity check.  The third line of every 5-line block in 
    # the full catalog I have starts with this magic string.  The format 
    # description doesn't state this as a requirement, but it seems to be so.
    # If you get this message you may want to turn off the default check
    if apply_checks:
        if len(alllines)%5 != 0:
            message = "read_ndk_file: file {} appears to be corrupted\n".format(path)
            message += "File has {} lines which is not a multiple of 5 as required for ndk files\n".format(len(alllines))
            raise RuntimeError(message)
        # index 2 is the third line of the first block; step one block at a time
        for i in range(2,len(alllines),5):
            testline=alllines[i]
            if not testline.startswith("CENTROID"):
                message = "read_ndk_file: file {} appears to be corrupted at line {}\n".format(path,i)
                message += "Line content:  [{}]\n".format(testline)
                message += "That line should contain the magic string CENTROID"
                raise RuntimeError(message)
    return alllines

def parse_ndk_image(lines)->list:
    """
    Parses the in-memory image of an ndk file read with the read_ndk_file 
    function.   That means it assumes arg0 is a list of strings 
    from the formatted ndk text file.  The algorithm assumes the 
    file line count is a multiple of 5, which is specified by the 
    archaic ndk file structure.   
    
    
    Returns a list of python dicts with keys for attributes set as constants 
    in this function.  That effectively imposes a schema definition 
    that could be used with MongoDB but would require coordination 
    with the constants in this function.
    """
    # These tuples define the format for each line and the key 
    # to which each value should be associated
    # each tuple is start, end, key, type (s,f,i for string, float, and int)  
    # date strings have to be treated specially
    form_def = []
    # the document on this format is wrong on some of these fields
    # I think the original format may have dropped the decimal points for 
    # float values.  Also the ms and mb fields are not properly defined
    # so I had to infer this
    f = [
          [0,3,'location_source','s'],
          [4,14,'date','s'],
          [16,25,'time_of_day','s'],
          [27,33,'lat','f'],
          [34,41,'lon','f'],
          [42,47,'depth','f'],
          [48,51,'mb','f'],
          [52,54,'Ms','f'],
          [56,79,'geographic_comment','s']
        ]
    form_def.append(f)
    # line 2
    f = [
            [0,15,'CMT_event','s'],
            [17,60,'CMT_data_used','s'],
            [62,67,'invesion_type','s'],
            [69,79,'moment_rate_function','s']
        ]
    form_def.append(f)
    # line 3
    f = [
            [0,57,'centroid','s'],  # this token will need to be split into multiple attributes
            [59,61,'depth_type','s'],
            [64,79,'timestamp','s']
        ]
    form_def.append(f)
    # line 4
    f = [
            [0,1,'exponent','i'],
            [2,79,'MT_components','s'],  # will definitely need to be split up to be useful
        ]
    form_def.append(f)
    # line 5
    f = [
            [0,2,'version','s'],
            [3,47,'MT_pc','s'],
            [49,55,'moment','f'],
            [57,79,'sdr','s'], # strike-dip-rake needs to be split
        ]
    form_def.append(f)
    
    # loop over the list of lines 5 at a time. 
    # range increment makes this simpler
    doclist=[]
    for l0 in range(0,len(lines),5):
        doc = dict()
        # i runs over line number but ii run 0 to 4
        ii = 0
        for i in range(l0,l0+5):
            s = lines[i]
            f = form_def[ii]
            for j in range(len(f)):
                #print(l0,i,ii,j)
                start = f[j][0]
                end = f[j][1]
                key = f[j][2]
                type_def = f[j][3]
                sval = s[start:end]
                # python before 3.10 has no switch-case (match) so use an if-elif chain
                if type_def == 's':
                    val = sval
                elif type_def == 'f':
                    val = float(sval)
                elif type_def == 'i':
                    val = int(sval)
                else:
                    raise ValueError("Illegal value for type in format="+type_def)
                doc[key] = val
            ii += 1  
        doclist.append(doc)
        
    return doclist
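
When transcribing column ranges from a format description into slices such as `s[start:end]`, it helps to remember that python slices are 0-based and exclude the end index, while format documents usually give 1-based, inclusive columns.  The sketch below shows the conversion on a made-up 10-column string; the helper name `field` is hypothetical and is not used by the parser above.

```python
def field(line, first_col, last_col):
    """Return the text in 1-based, inclusive columns first_col..last_col."""
    # Python slices are 0-based and exclude the end index, so the
    # inclusive range [first_col, last_col] maps to [first_col-1:last_col]
    return line[first_col - 1:last_col]

card = "ABCDEFGHIJ"         # made-up 10-column "card image"
print(field(card, 3, 5))    # columns 3 through 5
# CDE
print(card[2:4])            # same start, but the end index is excluded
# CD
```

A value that comes back one character short, such as a number missing its last digit, is the usual symptom of an end index that was copied without this adjustment.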


A lot of lines of code there, but the docstrings for the two functions describe what they do.  We run both functions to translate the file into a form we can push to MongoDB.

In [31]:
# This form works for running from standard MsPASS container 
# Change may be needed if file is downloaded 
fname='/home/data/jan76_dec20.ndk'
lines = read_ndk_file(fname)
doclist=parse_ndk_image(lines)
del lines
insout=db.cmt.insert_many(doclist)
print("Number of documents inserted=",len(insout.inserted_ids))
print("Size of the cmt collection is now = ",db.cmt.count_documents({}))

Number of documents inserted= 56832
Size of the cmt collection is now =  56832


Note the `del lines` statement is not essential, but demonstrates good practice for memory management.   The first function simply loads the file into memory as a list of strings while the second (`parse_ndk_image`) converts that to a list of python dictionaries.   Once the conversion is finished the output of the first function is no longer needed.  The last line uses the `insert_many` method of `Database.collection` to add one document for each CMT solution to a collection I called "cmt".  Let's look at one of the documents that produced:

In [32]:
doc=db.cmt.find_one()
pretty_print(doc)

{
  "_id": {
    "$oid": "65e322ff5cd7e1e9490aed70"
  },
  "location_source": "MLI",
  "date": " 1976/01/0",
  "time_of_day": "01:29:39.",
  "lat": -28.61,
  "lon": -177.64,
  "depth": 59.0,
  "mb": 6.2,
  "Ms": 0.0,
  "geographic_comment": "KERMADEC ISLANDS REGION",
  "CMT_event": "M010176A       ",
  "CMT_data_used": "B:  0    0   0 S:  0    0   0 M: 12   30 13",
  "invesion_type": "CMT: ",
  "moment_rate_function": "BOXHD:  9.",
  "centroid": "CENTROID:     13.8 0.2 -29.25 0.02 -176.96 0.01  47.8  0.",
  "depth_type": "FR",
  "timestamp": "O-0000000000000",
  "exponent": 2,
  "MT_components": "  7.680 0.090  0.090 0.060 -7.770 0.070  1.390 0.160  4.520 0.160 -3.260 0.06",
  "version": "V1",
  "MT_pc": "   8.940 75 283   1.260  2  19 -10.190 15 11",
  "moment": 9.56,
  "sdr": "202 30   93  18 60   8"
}


As you can see, the reason the CMT format is so complex is that there are a lot of attributes for each CMT solution.  There are actually a lot more than the document layout suggests.  My implementation parsed the file by field widths described in the __[format description_document](https://www.ldeo.columbia.edu/~gcmt/projects/CMT/catalog/allorder.ndk_explained)__.  If I were going to actually use this further, I would change the code to split out the attributes above whose strings contain multiple numbers.  For example, the most fundamental quantities in the CMT are the 12 numbers above following the key "MT_components".   The last box of this tutorial shows how to split the "MT_components" entry and convert it to a MongoDB "subdocument" keyed by the component names defined in the format description.  As written, the values are stored as the strings produced by `split`:

In [33]:
mtcomp=doc['MT_components']
cstrl = mtcomp.split()
print(cstrl)

['7.680', '0.090', '0.090', '0.060', '-7.770', '0.070', '1.390', '0.160', '4.520', '0.160', '-3.260', '0.06']


In [34]:
cursor=db.cmt.find({}).limit(3)
for doc in cursor:
    doc_id=doc['_id']   # we need this later for update (avoids shadowing the builtin id)
    mtcomp=doc['MT_components']
    slist = mtcomp.split()   # python function parsing tokens by white space
    subdoc = dict()
    #Mrr, Mtt, Mpp, Mrt, Mrp, Mtp,
    subdoc['Mrr'] = slist[0]
    subdoc['Mrr_sigma'] = slist[1]
    subdoc['Mtt'] = slist[2]
    subdoc['Mtt_sigma'] = slist[3]
    subdoc['Mpp'] = slist[4]
    subdoc['Mpp_sigma'] = slist[5]
    subdoc['Mrt'] = slist[6]
    subdoc['Mrt_sigma'] = slist[7]
    subdoc['Mrp'] = slist[8]
    subdoc['Mrp_sigma'] = slist[9]
    subdoc['Mtp'] = slist[10]
    subdoc['Mtp_sigma'] = slist[11]
    matcher={'_id' : doc_id}
    update_command = {'$set' : {'MT_components' : subdoc}}
    db.cmt.update_one(matcher,update_command)
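
The loop above stores each token exactly as `split` returned it, which means the subdocument values are strings.  If numeric queries on the tensor components are needed, the tokens can be converted with `float` while the subdocument is built.  The following is a sketch of that variation; the names `MT_KEYS` and `mt_subdocument` are hypothetical, and the key order is the one used in the loop above.

```python
MT_KEYS = ("Mrr", "Mtt", "Mpp", "Mrt", "Mrp", "Mtp")

def mt_subdocument(mtcomp):
    """Build a subdocument of floats from the MT_components string.

    The string is expected to hold 12 white-space separated numbers:
    each component followed by its standard error.
    """
    tokens = mtcomp.split()
    if len(tokens) != 2 * len(MT_KEYS):
        raise ValueError("expected 12 numbers, found {}".format(len(tokens)))
    subdoc = {}
    for i, key in enumerate(MT_KEYS):
        subdoc[key] = float(tokens[2 * i])
        subdoc[key + "_sigma"] = float(tokens[2 * i + 1])
    return subdoc

sample = "  7.680 0.090  0.090 0.060 -7.770 0.070  1.390 0.160  4.520 0.160 -3.260 0.06"
print(mt_subdocument(sample)["Mrr"])
# 7.68
```

Replacing the twelve assignments in the loop with `subdoc = mt_subdocument(doc['MT_components'])` would store numbers instead of strings.  It only works on documents that still hold the original string, so it should not be rerun on documents already converted to a subdocument.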

To see what that did, here is the same document displayed above but with the change we just made:

In [35]:
doc = db.cmt.find_one()
pretty_print(doc)

{
  "_id": {
    "$oid": "65e322ff5cd7e1e9490aed70"
  },
  "location_source": "MLI",
  "date": " 1976/01/0",
  "time_of_day": "01:29:39.",
  "lat": -28.61,
  "lon": -177.64,
  "depth": 59.0,
  "mb": 6.2,
  "Ms": 0.0,
  "geographic_comment": "KERMADEC ISLANDS REGION",
  "CMT_event": "M010176A       ",
  "CMT_data_used": "B:  0    0   0 S:  0    0   0 M: 12   30 13",
  "invesion_type": "CMT: ",
  "moment_rate_function": "BOXHD:  9.",
  "centroid": "CENTROID:     13.8 0.2 -29.25 0.02 -176.96 0.01  47.8  0.",
  "depth_type": "FR",
  "timestamp": "O-0000000000000",
  "exponent": 2,
  "MT_components": {
    "Mrr": "7.680",
    "Mrr_sigma": "0.090",
    "Mtt": "0.090",
    "Mtt_sigma": "0.060",
    "Mpp": "-7.770",
    "Mpp_sigma": "0.070",
    "Mrt": "1.390",
    "Mrt_sigma": "0.160",
    "Mrp": "4.520",
    "Mrp_sigma": "0.160",
    "Mtp": "-3.260",
    "Mtp_sigma": "0.06"
  },
  "version": "V1",
  "MT_pc": "   8.940 75 283   1.260  2  19 -10.190 15 11",
  "moment": 9.56,
  "sdr": "202 30  