Having used AWS DynamoDB for Alexa, I decided it was time to investigate DynamoDB as a NoSQL database in its own right.
I will be examining DynamoDB for use in serverless architectures as a replacement for MongoDB.
Specifically, I will be examining DynamoDB (and alternatives) for Database as a Service (DBaaS) capabilities.
[A principal component of serverless architectures is the ability to outsource the bulk of database operations.]
In particular, the ability to scale and maintain NoSQL databases at enterprise levels can involve substantial time and costs.
DBaaS is generally a premium cloud offering and can get expensive quickly. Even so, it's still probably far less expensive than managing an on-premises database. The choice of cloud provider is important as egress charges tend to be prohibitive (MongoDB Atlas can use any of the major cloud providers).
DBaaS is supposed to be a cost-effective way to handle bursty traffic where volumes are unknown. The classic example is getting a mention on Slashdot or HackerNoon that generates flash crowds - usually there is not enough capacity to handle the load and the servers fall over. Or to use a more recent success story, consider Pokemon Go, where initial traffic estimates were quickly exceeded by astronomical demand.
DynamoDB auto-scaling is relatively simple but probably does not extend across AWS regions.
By default, auto-scaling is not enabled - but once auto-scaling is enabled, it becomes the default.
It is simple to create an auto-scaling IAM role from the DynamoDB Capacity tab.
[This role cannot then be deleted until it is no longer referenced - in other words, until auto-scaling has been disabled everywhere.]
TL;DR Don't enable auto-scaling until you actually need it; it will be in permanent Alarm state otherwise.
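For scripted setups, the same auto-scaling configuration can be sketched as two Application Auto Scaling requests. This is a minimal sketch, not a definitive implementation: the function name, the policy name and the 70% target are assumptions made for illustration, and only read capacity is covered.

```python
def autoscaling_requests(table_name, min_units, max_units, target_pct=70.0):
    """Build the two requests that enable read auto-scaling for one table.

    The policy name and the default target utilization are hypothetical
    choices for this sketch.
    """
    resource_id = f"table/{table_name}"
    dimension = "dynamodb:table:ReadCapacityUnits"

    # Request 1: declare the table's read capacity as a scalable target.
    target = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_units,
        "MaxCapacity": max_units,
    }

    # Request 2: attach a target-tracking policy to that scalable target.
    policy = {
        "PolicyName": f"{table_name}-read-target-tracking",
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_pct,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
        },
    }
    return target, policy


target, policy = autoscaling_requests("YourTableName", 5, 100)
```

With boto3 installed, the two dictionaries would be passed to `register_scalable_target(**target)` and `put_scaling_policy(**policy)` on a `boto3.client("application-autoscaling")` client. Write capacity would need a second pair of requests.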
AWS offers Global Tables for replication across regions.
[This is a premium service, and may well be overkill for most use cases.]
DynamoDB does not have a query optimizer. For some tips, refer to DynamoDB and Indexes.
Although NoSQL databases can offer transaction guarantees, most are geared more towards Eventual Consistency.
So it is little surprise that DynamoDB is Eventually Consistent.
[This is the default; Strong Consistency options are available (probably at a premium).]
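The premium for Strong Consistency can be sketched with a little arithmetic: a strongly consistent read consumes one read capacity unit per 4 KB of item size, while an eventually consistent read consumes half of that. The function names below are assumptions for illustration; `ConsistentRead` is the actual GetItem parameter.

```python
import math


def read_capacity_units(item_size_bytes, consistent):
    """Read capacity units consumed by reading one item of the given size."""
    # Item size is rounded up to the next 4 KB block.
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return blocks * (1.0 if consistent else 0.5)


def get_item_request(table_name, key, consistent=False):
    """Build GetItem parameters; ConsistentRead=True asks for a strong read."""
    return {"TableName": table_name, "Key": key, "ConsistentRead": consistent}


request = get_item_request("YourTableName", {"id": {"S": "42"}}, consistent=True)
print(read_capacity_units(6000, consistent=True))
print(read_capacity_units(6000, consistent=False))
```

The request dictionary would be passed as `get_item(**request)` on a `boto3.client("dynamodb")` client.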
To the extent that they are needed, advanced query capabilities (text search, JSON parsing) are nice to have.
Of course, for more complicated use cases GraphQL is an option, but this is to be avoided if at all possible.
There is no clear winner in terms of querying. MongoDB queries are not particularly intuitive, which is perhaps why MongoDB offers the MongoDB Connector for BI, which allows for standard SQL queries. Couchbase has N1QL (which seems simple enough). For another standard SQL option, there is Amazon Athena, which may well be a very good choice for a CQRS-type solution (BigQuery is the Google Cloud alternative to Amazon Athena).
Consistent with usage and query requirements, the database should be as simple as possible to use and manage.
[DynamoDB is simple to set up and administer, although the IAM aspects can be tricky.]
The following is a quick list of alternatives to DynamoDB.
Some notes are listed for each; otherwise there would be no basis for evaluating DynamoDB against the alternatives.
[It will be assumed that what is required is a JSON-friendly NoSQL DBaaS that can auto-scale.]
Wikipedia has a good (if slightly out-of-date) summary of the alternatives: http://en.wikipedia.org/wiki/DBaaS
NoSQL databases are built to service heavy read/write loads and can scale up and down easily
[The article makes no mention of either AWS DocumentDB or Azure DocumentDB.]
AWS DocumentDB:
- Seems to have been designed as a drop-in replacement for MongoDB
- Apparently runs on Aurora PostgreSQL under the covers
- Does not support all MongoDB data types
- Probably overkill for simple use cases
- Probably not a good choice for greenfield applications
- Apparently has the same price structure as MongoDB Atlas
- Requires a VPC (additional costs & doesn't help with Lambda cold starts, possibly leading to high tail-latencies)
Amazon Athena:
- Uses S3 as a backing store
- Supports CSV, JSON, ORC, Avro, and Parquet
- Uses standard SQL (so good integration with traditional analysis tools)
- Based on Facebook Presto
MongoDB Atlas:
- Offers a choice of cloud providers (AWS, Azure, GCP)
- Has a query optimizer (indexes are preferred), can be overridden with $hint
- Widest choice of supported programming languages
- A relatively old and mature offering, so offers good third-party tools
- Seems to offer global clusters which pair well with GCP's global VPCs
- Offers the widest JSON support (including BSON)
- Uses JSON query syntax
- Offers MongoDB Connector for BI which allows for standard SQL queries
- Has a free tier (which offers a subset of Atlas features) which doesn't require a credit card
- Seems to be playing catch-up with AWS and Azure (both of which have competing offerings)
- Scalable, both horizontally and vertically
- Apparently has the same price structure as AWS DocumentDB
The following article offers a good overview of MongoDB cluster options:
http://docs.atlas.mongodb.com/create-new-cluster/
Some quotes follow.
Electable nodes for high availability
[HA is nice.]
Read-only nodes for optimal local reads
[They cannot take part in elections and cannot be used for replication, but possibly a nice option.]
Auto-Expand Storage: Available on clusters of size M10 and larger. When disk usage reaches 90%, automatically increase storage by an amount necessary to achieve 70% utilization. To enable this feature, check the box marked Auto-expand storage when disk usage reaches 90%.
Changes to storage capacity affect cost.
[Functional auto-scale but only at the enterprise (M10 and larger, not free) level.]
IOPS (configurable for AWS only)
The initial configuration is not cast in stone either; it's possible to modify just about everything after the fact (including scaling-up the cluster). For a list of what can be modified after the fact:
http://docs.atlas.mongodb.com/scale-cluster/
This article largely restates the initial article; however, some interesting quotes follow:
You can only modify the cloud provider backing your cluster when you upgrade from an Atlas M0 Free Tier or M2/M5 Shared Tier cluster to a larger cluster.
And:
You cannot modify the cloud provider of M10 or larger dedicated clusters.
[You can, however, create a new cluster and do a live migration to a different cloud provider (possibly expensive, but probably still very useful). Expect to pay egress charges.]
And:
For dedicated clusters with an Instance Size of M10 or larger, you can modify cluster’s region.
[These are all pretty attractive options.]
Azure Cosmos DB:
- Apparently rebranded from Azure DocumentDB in 2017
[I am unfamiliar with Azure so I won't be examining this product.]
Couchbase:
- Seems to have better scaling and replication options
- Probably more JSON-friendly than its competitors (apart from MongoDB)
- Has its own query language (N1QL)
- Not really DBaaS
[Check out my Couchbase repo.]
If performance becomes an issue, it is always possible to add a caching layer with Amazon DynamoDB Accelerator (DAX).
[DAX went GA in June, 2017.]
DynamoDB has a handy cost calculator, which is tied to DynamoDB Capacity.
It is accessible via the DynamoDB Capacity tab.
For more precise estimates, there is the Capacity calculator (available via a link).
And for the ultimate in precision, there is the AWS Simple Monthly Calculator.
DynamoDB has sensible default values. These can be easily modified after the fact:
Note the following:
Select on-demand if you want to pay only for the read and writes you perform, with no capacity planning required.
Select provisioned to save on throughput costs if you can reliably estimate your application's throughput requirements.
You can update to on-demand mode at any time.
[Switching to On-demand capacity can take 5 minutes or so. But switching back to Provisioned is nearly instantaneous.]
Nota bene:
- Changes to On-demand capacity are limited
- Auto-scaling is not an option if On-demand capacity is selected
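Switching between the two capacity modes can also be scripted. The following is a minimal sketch of the UpdateTable parameters involved; the function name and the default of 5 units are assumptions for illustration.

```python
def billing_mode_request(table_name, on_demand, read_units=5, write_units=5):
    """Build UpdateTable parameters that switch a table's capacity mode."""
    request = {"TableName": table_name}
    if on_demand:
        # On-demand: pay per request, no throughput settings.
        request["BillingMode"] = "PAY_PER_REQUEST"
    else:
        # Provisioned: throughput must be supplied alongside the mode.
        request["BillingMode"] = "PROVISIONED"
        request["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    return request


to_on_demand = billing_mode_request("YourTableName", on_demand=True)
to_provisioned = billing_mode_request("YourTableName", on_demand=False)
```

Either dictionary would be passed as `update_table(**request)` on a `boto3.client("dynamodb")` client.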
Once production volumes have become established (after a few months running in production perhaps), it is possible to reserve DynamoDB capacity. This is a moderately long-term commitment (one or three years) but offers discounts. The capacity to reserve should be based upon the expected usage. The cost factor will play into this calculation of course; the higher the capacity reserved, the greater the cost savings.
[Costs can apparently be expected to continue to go down, so the one year option is the term to choose.]
Rather annoyingly, there is no obvious way to export DynamoDB configurations/definitions from the console.
Perhaps DDL (Data Definition Language) is not the correct term for this, as NoSQL databases are schema-less, but for replicating (say perhaps in different regions or for offline testing) or re-creating a database (NoSQL or not) it is useful to have a backup copy of the database definition (if only to be able to check it into a Git repo).
This is a pretty minor complaint, as it takes five minutes or so to create a DynamoDB table. However, it can get a little tedious setting up the same DynamoDB table in multiple regions, so not having a backup or definitive copy of the table configuration is a bit of an issue.
Probably the best option is to take a screenshot of the table overview once it has been defined:
[This should enable an easy re-creation of the table, if it ever got accidentally deleted or something.]
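As a possible scripted alternative to a screenshot, the response of the DescribeTable API can be trimmed down to a CreateTable request and checked into a Git repo as JSON. This is a sketch under assumptions: the function name is hypothetical, the sample description is made up for illustration, and secondary indexes, streams and tags are deliberately left out.

```python
import json


def create_table_request(description):
    """Reduce a DescribeTable response to the core of a CreateTable request."""
    table = description["Table"]
    request = {
        "TableName": table["TableName"],
        "AttributeDefinitions": table["AttributeDefinitions"],
        "KeySchema": table["KeySchema"],
    }
    billing = table.get("BillingModeSummary", {}).get("BillingMode")
    if billing == "PAY_PER_REQUEST":
        request["BillingMode"] = "PAY_PER_REQUEST"
    else:
        # Copy only the two settable fields; the response carries extra
        # read-only fields that CreateTable would reject.
        throughput = table["ProvisionedThroughput"]
        request["ProvisionedThroughput"] = {
            "ReadCapacityUnits": throughput["ReadCapacityUnits"],
            "WriteCapacityUnits": throughput["WriteCapacityUnits"],
        }
    return request


# Made-up sample in the shape of a DescribeTable response.
sample = {
    "Table": {
        "TableName": "YourTableName",
        "TableStatus": "ACTIVE",
        "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 5,
            "NumberOfDecreasesToday": 0,
        },
    }
}
print(json.dumps(create_table_request(sample), indent=2, sort_keys=True))
```

With boto3, the input would come from `describe_table(TableName=...)` and the output could be replayed in another region with `create_table(**request)`.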
It's pretty easy to find examples where a DynamoDB project failed. Lots of projects fail, for any number of reasons, so this is hardly surprising. However, it's worth looking into these stories, if only to find examples of things to NOT do. The following article gives a good explanation of the hot key problem - which seems to be a common problem with DynamoDB:
http://syslog.ravelin.com/you-probably-shouldnt-use-dynamodb-89143c1287ca
There are work-arounds for hot keys (DAX should fix this problem) but the key takeaway should be that the ease and simplicity of using DynamoDB comes at a cost - that of visibility into DynamoDB internals.
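One commonly described work-around for write-heavy hot keys is write sharding: a calculated suffix spreads items that share a logical key across several partition key values. The sketch below is an illustration only; the function names, the `#` separator and the shard count of 10 are assumptions.

```python
import hashlib


def sharded_key(partition_key, item_id, shards=10):
    """Append a deterministic shard suffix derived from the item's own id."""
    digest = hashlib.sha256(str(item_id).encode("utf-8")).digest()
    suffix = int.from_bytes(digest[:4], "big") % shards
    return f"{partition_key}#{suffix}"


def all_shard_keys(partition_key, shards=10):
    """Every key a reader must query to reassemble one logical partition."""
    return [f"{partition_key}#{n}" for n in range(shards)]


key = sharded_key("2019-07-01", "order-123")
print(key in all_shard_keys("2019-07-01"))
```

The trade-off is on the read side: fetching everything for one logical key now takes one query per shard.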
DynamoDB is available for local use.
[This may well be coupled with AWS CloudFormation (which can run locally) or LocalStack.]
Probably the best option is to use the Dockerized version:
$ docker run -p 8000:8000 amazon/dynamodb-local
Usage notes:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.UsageNotes.html
You can find the Docker tags here.
The example shown above is equivalent to:
$ docker run -p 8000:8000 amazon/dynamodb-local:latest
And a better option is to use a tagged version as follows:
$ docker run -p 8000:8000 amazon/dynamodb-local:tag
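Once the container is running, an SDK client has to be pointed at the local endpoint rather than at AWS. A minimal sketch of the client settings follows; the function name and the placeholder credentials are assumptions (DynamoDB Local needs credentials to be present but does not check them against AWS).

```python
def local_client_kwargs(port=8000, region="us-east-1"):
    """Keyword arguments for an SDK client that talks to DynamoDB Local."""
    return {
        # The port must match the one published by `docker run -p`.
        "endpoint_url": f"http://localhost:{port}",
        "region_name": region,
        # Placeholder values, never real credentials.
        "aws_access_key_id": "dummy",
        "aws_secret_access_key": "dummy",
    }


kwargs = local_client_kwargs()
print(kwargs["endpoint_url"])
```

With boto3 installed, this would be used as `boto3.client("dynamodb", **local_client_kwargs())`.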
My approach to security involves the Principle of least privilege.
Accordingly, it's better to create 'YourTableName' manually rather than grant table-creation permission (dynamodb:CreateTable).
For online use, restrict access as follows:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem",
        "dynamodb:GetItem"
      ],
      "Resource": ["arn:aws:dynamodb:us-east-1:xxxxxxxxxxxx:table/YourTableName"]
    }
  ]
}
The example above is fine for Alexa skills, but for a CRUD application something like the following is more appropriate:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DeleteItem",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Scan",
        "dynamodb:UpdateItem"
      ],
      "Resource": ["arn:aws:dynamodb:us-east-1:xxxxxxxxxxxx:table/YourTableName"]
    }
  ]
}
You may wish to include dynamodb:Query in your access list - or even have dynamodb:Query only:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:Query"
      ],
      "Resource": ["arn:aws:dynamodb:us-east-1:xxxxxxxxxxxx:table/YourTableName"]
    }
  ]
}
TL;DR Use an inline policy for DynamoDB unless you are anticipating reuse. Likewise, restrict DynamoDB access to the needed table or tables - it's not just a best practice, it will probably help with GDPR compliance.
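Since the policies above differ only in their action lists, they can be generated rather than hand-edited, which also rules out JSON syntax slips. This is a minimal sketch; the function name and the account id in the usage line are placeholders for illustration.

```python
import json


def table_policy(region, account_id, table_name, actions):
    """Build a single-table, allow-only policy document for the given actions."""
    arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Prefix each action and sort for stable diffs in Git.
                "Action": sorted(f"dynamodb:{name}" for name in actions),
                "Resource": [arn],
            }
        ],
    }


# "123456789012" is a placeholder account id.
policy = table_policy("us-east-1", "123456789012", "YourTableName", ["PutItem", "GetItem"])
print(json.dumps(policy, indent=2))
```

Note that this sketch covers the table ARN only; queries against a secondary index would need the index ARN listed as an additional resource.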
DynamoDB offers Web Identity Federation.
[Current identity providers are Amazon, Facebook and Google.]
As always with the cloud, documentation is voluminous. Some useful links are listed below.
For a quick (and largely provider-agnostic) summary of Serverless, the MongoDB Stitch FAQ provides this:
Stitch represents the next stage in the industry's migration to a more streamlined, managed infrastructure. Virtual Machines running in public clouds (notably AWS EC2) led the way, followed by hosted containers, and serverless offerings such as AWS Lambda and Google Cloud Functions. With serverless systems, you don't need to pre-provision computing resources – you just send requests and rely on the provider to handle them.
Existing serverless offerings (sometimes referred to as "Functions as a Service") still require backend developers to implement and manage access controls and REST APIs to provide access to microservices, public cloud services, and of course data.
[MongoDB Stitch seems to be JavaScript-only. Also "You pay for both data transfer and compute usage (memory x time)." although there is a free tier. It does seem to have good developer tooling.]
Tracking Your Free Tier Usage:
http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/tracking-free-tier-usage.html
Querying and Scanning an Index:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.Indexes.QueryAndScan.html
[Shows how to specify an index, also how to use a ProjectionExpression (basically, subset the query results).]
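The two ideas in that article, naming an index and subsetting the results, can be sketched as a single set of Query parameters. The function name and the table, index and attribute names in the usage line are hypothetical, and a string-typed key is assumed.

```python
def index_query_request(table_name, index_name, key_name, key_value, attributes=()):
    """Build Query parameters for an index, optionally projecting attributes."""
    # Placeholders (#k, #a0, ...) avoid clashes with DynamoDB reserved words.
    names = {"#k": key_name}
    request = {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": names,
        # Assumes the key attribute is a string ("S").
        "ExpressionAttributeValues": {":v": {"S": key_value}},
    }
    if attributes:
        for i, attribute in enumerate(attributes):
            names[f"#a{i}"] = attribute
        # ProjectionExpression limits which attributes are returned.
        request["ProjectionExpression"] = ", ".join(
            f"#a{i}" for i in range(len(attributes))
        )
    return request


request = index_query_request(
    "YourTableName", "YourIndexName", "status", "open", ["id", "title"]
)
```

The dictionary would be passed as `query(**request)` on a `boto3.client("dynamodb")` client.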
Some databases will return an error if the document being modified or deleted does not exist.
Likewise if the document being created already exists.
Not so DynamoDB.
A case in point: putItem functions as an upsert, meaning it will replace the document in question if it exists or create it if it does not yet exist. This may be helpful but it is not RESTful.
TL;DR It is necessary to check for the presence (or absence) of the document in question in DynamoDB to avoid surprises.
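One way to perform that check without a separate read is a condition expression, which makes the write itself fail if the document is (or is not) already there. The sketch below builds the request parameters; the function names are assumptions for illustration.

```python
def create_only_put(table_name, item, key_name):
    """PutItem parameters that fail if an item with the same key exists."""
    return {
        "TableName": table_name,
        "Item": item,
        "ConditionExpression": "attribute_not_exists(#k)",
        "ExpressionAttributeNames": {"#k": key_name},
    }


def delete_if_exists(table_name, key, key_name):
    """DeleteItem parameters that fail if the item is not present."""
    return {
        "TableName": table_name,
        "Key": key,
        "ConditionExpression": "attribute_exists(#k)",
        "ExpressionAttributeNames": {"#k": key_name},
    }


put_request = create_only_put("YourTableName", {"id": {"S": "42"}}, "id")
delete_request = delete_if_exists("YourTableName", {"id": {"S": "42"}}, "id")
```

With boto3, these would be passed to `put_item(**put_request)` and `delete_item(**delete_request)`; a failed condition surfaces as a `ClientError` whose error code is `ConditionalCheckFailedException`.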
How to update a DynamoDB item:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.UpdateData.html
[Also shows how to use atomic counters.]
ReturnValues:
Use ReturnValues if you want to get the item attributes as they appear before or after they are updated. For UpdateItem, the valid values are:
- NONE - If ReturnValues is not specified, or if its value is NONE, then nothing is returned. (This setting is the default for ReturnValues.)
- ALL_OLD - Returns all of the attributes of the item, as they appeared before the UpdateItem operation.
- UPDATED_OLD - Returns only the updated attributes, as they appeared before the UpdateItem operation.
- ALL_NEW - Returns all of the attributes of the item, as they appear after the UpdateItem operation.
- UPDATED_NEW - Returns only the updated attributes, as they appear after the UpdateItem operation.
[Note that unchanged values will be considered to be UPDATED if they were specified in the UpdateExpression.]
From:
http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html
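Atomic counters and ReturnValues combine naturally: the counter is incremented on the server and the new value comes back in the same call. This is a minimal sketch of the UpdateItem parameters; the function name and the attribute name in the usage line are hypothetical.

```python
def increment_counter_request(table_name, key, counter, amount=1):
    """UpdateItem parameters that atomically add to a numeric attribute."""
    return {
        "TableName": table_name,
        "Key": key,
        # ADD creates the attribute if it is missing, then adds the amount.
        "UpdateExpression": "ADD #c :n",
        "ExpressionAttributeNames": {"#c": counter},
        # Numbers travel as strings in the low-level API.
        "ExpressionAttributeValues": {":n": {"N": str(amount)}},
        # Return only the counter's value after the update.
        "ReturnValues": "UPDATED_NEW",
    }


request = increment_counter_request("YourTableName", {"id": {"S": "42"}}, "views")
```

With boto3, `update_item(**request)` on a `boto3.client("dynamodb")` client would return the new counter value under the response's `Attributes` key.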
DAX as a drop-in accelerator for DynamoDB:
http://www.allthingsdistributed.com/2017/06/amazon-dynamodb-accelerator-dax.html
With DAX, we've created a fully managed caching service that is API-compatible with DynamoDB.
And:
With DAX, you get faster reads, more throughput, and cost savings - without having to write any new code.
What Is AWS Billing and Cost Management?:
http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-what-is.html
Simple Monthly Calculator:
http://calculator.s3.amazonaws.com/index.html
[Click the Amazon DynamoDB tab in the column on the right-hand side.]
FREE TIER: Each month, Amazon DynamoDB users pay no charges on the first 25GB of storage, the first 2.5 million DynamoDB Streams read request units, as well as 25 write capacity units and 25 read capacity units of provisioned capacity. Free tier also provides 25 replicated write capacity units to deploy DynamoDB Global Tables in up to 2 AWS regions.
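To put the capacity figures in perspective, they can be converted into requests per month. This is a rough sketch under stated assumptions: a 30-day month, perfectly even traffic, writes of at most 1 KB and reads of at most 4 KB (so each request costs exactly one unit, or half a unit for an eventually consistent read). The function name is hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def monthly_requests(read_units=25, write_units=25):
    """Upper bound on requests per month for a given provisioned capacity."""
    return {
        # One write capacity unit = one write per second (item up to 1 KB).
        "writes": write_units * SECONDS_PER_MONTH,
        # One read capacity unit = one strongly consistent read per second...
        "strong_reads": read_units * SECONDS_PER_MONTH,
        # ...or two eventually consistent reads per second (item up to 4 KB).
        "eventual_reads": 2 * read_units * SECONDS_PER_MONTH,
    }


totals = monthly_requests()
print(totals)
```

Real traffic is rarely evenly spread, so these figures are a ceiling rather than an estimate.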
Billing alarm:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/monitor_estimated_charges_with_cloudwatch.html
You must be signed in using AWS account root user credentials; IAM users cannot enable billing alerts for your AWS account.
Local usage notes:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.UsageNotes.html
- Investigate MongoDB DBaaS offering (Atlas)
- Investigate MongoDB Atlas free tier
- Investigate DynamoDB data residency (in terms of regions)
- Investigate DynamoDB Capacity, Reserved Capacity, and Cost Calculator
- Investigate backing-up DynamoDB configuration
- Investigate DynamoDB indexing
- Investigate DynamoDB DML (updateItem, etc)
- Test DynamoDB Accelerator (DAX)
- Investigate Google's offerings (Cloud Firestore and Cloud Bigtable)
- Investigate AWS Athena
- Investigate BigQuery (Google Cloud alternative to AWS Athena)
- Investigate DynamoDB offline use
- Investigate AWS Cost Explorer
- Investigate AWS Budgets (budgeting, cost allocation tags, alerts, consolidated billing)
- Investigate AWS billing alerts
- Investigate AWS current IAM and RBAC
- Verify if the access permissions shown above are still current (yes, as of July 2019)
- Investigate Web Identity Federation
- Investigate DynamoDB auto-scaling
- Test DynamoDB Global Tables
- Investigate exporting DDL for DynamoDB alternatives