Getting Started

Yan Cui edited this page May 6, 2013 · 20 revisions
Clone this wiki locally

Whilst I tried to make the syntactical difference between querying and scanning minimal, DynamoDB has some limitations and differences to what you can do in a Query and Scan request. For instance, whilst you can use the full range of comparison operators ("EQ", "NE", "GT", etc.) in a Scan request, only a subset of these operators ("EQ", "GT", "GE", "LT", "LE", "BEGINS_WITH" and "BETWEEN") can be used in a Query request.

To learn more about the supported operations, read this and this API doc.

When to use

The external DSL outlined below is only intended to be used to perform a Query and Scan request, the code provided in the repository only allows you to use them as such. If you're looking to Get an item by keys (hash and range), you should continue to use the existing mechanisms to do so. This library does not intent to replace all existing functionalities of the AWS SDK but to make it easy to perform Query and Scan operations.

Assumptions

The rest of this guide assumes that you have at least basic understanding of DynamoDB and its data model. If you're not familiar with DynamoDB already, please refer to its API documentation here to help getting yourself familiarized, believe me, it'll pay off in the long run because it's such an awesome product!

Syntax

Query

A Query request is only permitted if your table defines both a hash and a range key. A Query allows you to specify the hash key value and optionally specify at most one comparison against the range key value, if no range key comparison is specified then all available range keys will be returned.

The basic format of a Query can be expressed as:

SELECT AttributeName1, AttributeName2, ..., AttributeNameN
FROM   TableName
WHERE  @HashKey = HashKeyValue
AND    @RangeKey operator RangeKeyValue(s)
ORDER DESC
LIMIT  N
WITH   (NoConsistentRead, PageSize(n))

where:

  • AttributeName1 to AttributeNameN are the names of the attributes you want to return with the query, and if you want to return all attributes, you can also just use asterisk (*) instead
  • TableName is the name of the table, which at the time of writing, the library does not allow table names to be any case variants of 'where' or 'limit' in order to avoid subtle bugs such as:

    SELECT * FROM WHERE @HashKey = 2012

    from creeping into your code because TableName had been parsed as "WHERE @HashKey = 2012"

  • @HashKey and @RangeKey are special keywords used to identify the hash and range key in your table
  • HashKeyValue and RangeKeyValue can be either numeric (integer or floating number) or string (which must be enclosed in double quotes ("))
  • WHERE clause is non-optional, and must specify one and only one value for the hash key with the '=' operator
  • an optional comparison can be made against the range key by specifying an operator with one of more values for the operator
  • operator is one of the allowed comparison operators, i.e.
    • = : SELECT * FROM Employees WHERE @HashKey = \"Yan\" AND @RangeKey = 99
    • >= : SELECT * FROM Employees WHERE @HashKey = \"Yan\" AND @RangeKey >= 99
    • > : SELECT * FROM Employees WHERE @HashKey = \"Yan\" AND @RangeKey > 99
    • <= : SELECT * FROM Employees WHERE @HashKey = \"Yan\" AND @RangeKey <= 99
    • < : SELECT * FROM Employees WHERE @HashKey = \"Yan\" AND @RangeKey < 99
    • BEGINS WITH : SELECT * FROM Employees WHERE @HashKey = \"Yan\" AND @RangeKey BEGINS WITH \"Dynamo\"
    • BETWEEN .. AND .. : SELECT * FROM Employees WHERE @HashKey = \"Yan\" AND @RangeKey BETWEEN 10 AND 30
  • ORDER clause is optional, and can be followed by either ASC or DESC to search forward or backward respectively. If not specified, the default behaviour (search forward) will be used.
  • LIMIT clause is optional, and can be used to limit the number of results that are returned.
  • WITH clause is optional, and can be used to set optional parameters for the query:
    • NoConsistentRead : by default the DSL will issue query requests with consitent read, the NoConsistentRead option will force the DSL to use eventual consistency instead
    • PageSize : by default DynamoDB will return as many items as available in your table that can fit into the max 1MB return size per request, but this creates strong bursts of activities against DynamoDB which is likely to cause other operations to be throttled. You can use the PageSize option to set the max number of items that are returned per query request to reduce the amount of bursts your generate against DynamoDB.
  • the keywords SELECT, FROM, WHERE, LIMIT, ORDER, WITH, @HashKey, @RangeKey and the operators are all case insensitive

Scan

A Scan request is more permissive but also more expensive to run (both in terms of latency and read units consumed which translates directly to running cost). A 'Scan' allows you to filter on arbitrary fields in your data using a wide range of comparison operators.

The basic format of a Scan can be expressed as:

SELECT AttributeName1, AttributeName2, ..., AttributeNameN
FROM   TableName
WHERE  AttributeName1 operator1 AttributeValue1
AND    AttributeName2 operator2 AttributeValue2
AND    ...
AND    AttributeNameN operatorN AttributeValueN
LIMIT  N
WITH   (PageSize(n))

where:

  • AttributeName1 to AttributeNameN are the names of the attributes you want to return with the query, and if you want to return all attributes, you can also just use asterisk (*) instead
  • TableName is the name of the table, which at the time of writing, the library does not allow table names to be any case variants of 'where' or 'limit' in order to avoid subtle bugs such as:

    SELECT * FROM WHERE Year = 2012

    from creeping into your code because TableName had been parsed as "WHERE Year = 2012"

  • WHERE clause is optional, and can specify one or more comparisons against the values of attributes using one of the supported operators
  • if WHERE clause is not specified, then all rows in the table will be returned
  • AttributeValue1 to AttributeValueN can be either numeric (integer or floating number) or string (which must be enclosed in double quotes ("))
  • operator is one of the allowed comparison operators, i.e.
    • = : SELECT * FROM Employees WHERE FirstName = \"Yan\"
    • != : SELECT * FROM Employees WHERE FirstName != \"Yan\"
    • >= : SELECT * FROM Employees WHERE Age >= 18
    • > : SELECT * FROM Employees WHERE Age > 18
    • <= : SELECT * FROM Employees WHERE Age <= 99
    • < : SELECT * FROM Employees WHERE Age < 99
    • CONTAINS : SELECT * FROM Employees WHERE Interests CONTAINS \"DynamoDB\"
    • NOT CONTAINS : SELECT * FROM Employees WHERE LuckyNumbers NOT CONTAINS 11
    • BEGINS WITH : SELECT * FROM Employees WHERE DateJoined BEGINS WITH \"2010\"
    • IS NULL : SELECT * FROM Employees WHERE FavouriteTeam IS NULL
    • IS NOT NULL : SELECT * FROM Employees WHERE FavouriteTeam IS NOT NULL
    • BETWEEN .. AND .. : SELECT * FROM Employees WHERE Age BETWEEN 18 AND 30
    • IN (...) : SELECT * FROM Employees WHERE Age IN (18, 25, 32, 39)
  • LIMIT clause is optional, and can be used to limit the number of results that are returned.
  • WITH clause is optional, and can be used to set optional parameters for the scan:
    • PageSize : by default DynamoDB will return as many items as available in your table that can fit into the max 1MB return size per request, but this creates strong bursts of activities against DynamoDB which is likely to cause other operations to be throttled. You can use the PageSize option to set the max number of items that are returned per scan request to reduce the amount of bursts your generate against DynamoDB.
  • the keywords SELECT, FROM, WHERE, LIMIT, WITH and the operators are all case insensitive

Query and Scan Count

If all you want to find out is the number of matching items rather than the actual items themselves, then you can also use the COUNT keyword in place of SELECT.

For instance, for a query count:

COUNT  * 
FROM   TableName
WHERE  @HashKey = HashKeyValue
AND    @RangeKey operator RangeKeyValue(s)
LIMIT  N

or a scan count:

COUNT  * 
FROM   TableName
WHERE  AttributeName1 operator1 AttributeValue1
AND    AttributeName2 operator2 AttributeValue2
AND    ...
AND    AttributeNameN operatorN AttributeValueN
LIMIT  N

Please note that DynamoDB does not allow attribute names to specified when performing a count operation, so a valid count query will always start with COUNT * FROM ....

Also, at the time of writing, count queries can only be used with the low-level AmazonDynamoDBClient class. This is because there's no support for performing counts from the DynamoDBContext class, which as an ORM layer I think it's sensible decision.

Usages

For now, there are extension methods for the AmazonDynamoDBClient and DynamoDBContext classes to allow you to execute a Query or Scan using a query string written in the syntax described above.

Working with the low-level AmazonDynamoDBClient class

In the DynamoDb.SQL.Execution namespace, there exists a number of extension methods to asynchronously or synchronously execute a Query or Scan:

/// Executes a query asynchronously and returns the results
[<Extension>]
static member QueryAsync       : AmazonDynamoDBClient * string -> Async<QueryResponse>

/// Executes a query asynchronously as a task and returns the results
[<Extension>]
static member QueryAsyncAsTask : AmazonDynamoDBClient * string -> Task<QueryResponse>

/// Executes a query synchronously and returns the results
[<Extension>]
static member Query            : AmazonDynamoDBClient * string -> QueryResponse

/// Executes a scan asynchronously and returns the results
[<Extension>]
static member ScanAsync        : AmazonDynamoDBClient * string -> Async<ScanResponse>

/// Executes a scan asynchronously as a task and returns the results
[<Extension>]
static member ScanAsyncAsTask  : AmazonDynamoDBClient * string -> Task<ScanResponse>

/// Executes a scan synchronously and returns the results
[<Extension>]
static member Scan             : AmazonDynamoDBClient * string -> ScanResponse

Here are some simple examples written in C#:

using DynamoDb.SQL.Execution

...

var client = new AmazonDynamoDBClient();

// query all
var hashKey = "\"Amazon DynamoDB#DynamoDB Thread 1\"";
var queryAllRes = client.Query("SELECT * from Reply where @HashKey = " + hashKey);

// query subset
var querySubsetRes1 = client.Query("select * from Reply where @HashKey = " + hashKey + " and @RangeKey >= \"2012-06-10\"");
var querySubsetRes2 = client.Query("select * from Reply where @HashKey = " + hashKey + " and @RangeKey < \"2012-06-10\"");

// scans
var scanRes1 = client.Scan("Select * from Reply where PostedBy contains \"A\"");
var scanRes2 = client.Scan("Select * from Reply where ReplyDateTime between \"2012-06-10\" and \"2012-06-20\"");

Working with the high-level DynamoDBContext class

In the DynamoDb.SQL.Execution namespace, there are also two extension methods for DynamoDBContext class to synchronously execute a Query or Scan:

/// Executes a query synchronously and returns the results
[<Extension>]
static member ExecQuery<'T>    : DynamoDBContext * string -> IEnumerable<'T>

/// Executes a scan synchronously and returns the results
[<Extension>]
static member ExecScan<'T>     : DynamoDBContext * string -> IEnumerable<'T>

I had decided to name the methods ExecQuery and ExecScan intentionally to avoid clashing with the existing Query<T>(object hashKey) and Scan<T>(object hashKey) methods. This is because methods that take in object as argument always leave doubt in developers' minds, and can introduce subtle runtime bugs if someone had forgotten to use the DynamoDb.SQL.Execution namespace and attempted to run a Query/Scan using a string written in the above DSL syntax, in which case the DSL string will be interpreted as the hash key instead and mostly likely return nothing...

Again, here are some simple examples written in C#:

using DynamoDb.SQL.Execution

...

var client = new AmazonDynamoDBClient();
var cxt = new DynamoDBContext(client);

// query all
var hashKey = "\"Amazon DynamoDB#DynamoDB Thread 1\"";            
var queryAllRes = cxt.ExecQuery<Reply>("SELECT * from Reply where @HashKey = " + hashKey);

// query subset
var querySubsetRes1 = cxt.ExecQuery<Reply>("select * from Reply where @HashKey = " + hashKey + " and @RangeKey >= \"2012-06-10\"");
var querySubsetRes2 = cxt.ExecQuery<Reply>("select * from Reply where @HashKey = " + hashKey + " and @RangeKey < \"2012-06-10\"");

// scans
var scanRes1 = cxt.ExecScan<Reply>("Select * from Reply where PostedBy contains \"A\"");
var scanRes2 = cxt.ExecScan<Reply>("Select * from Reply where ReplyDateTime between \"2012-06-10\" and \"2012-06-20\"");

Limitations

  • the usual DynamoDB limitations also supplies here, so a maximum of 1MB of data can be returned per API call, there's currently no support to automatically use LastEvaluatedKey to make the follow up request.
  • DynamoDB supports a number of data types (String, String Set, Number, Number Set, Binary, Binary Set), but for the time being this library only supports the most commonly used data types of String and Number.
  • there is no support for paginated queries or scans when using the low-level DynamoDBClient, this behaviour is built into the DynamoDBContext however.