Skip to content

Querying Data from Table Storage

Daniel Chambers edited this page Jan 12, 2021 · 9 revisions

To use the FSharp.Azure.Storage querying functions open the TableStorage module:

open FSharp.Azure.Storage.Table

Compatible Types

To provide an idiomatic F# experience when querying table storage, FSharp.Azure.Storage supports the use of record types when querying. For example, the following record type would be used to read a table with columns that match the field names:

type Game = 
    { Name : string
      Developer : string
      HasMultiplayer : bool
      Notes : string }

We will use this record type in the examples below. We will also assume, for the sake of these examples, that the Developer field is also used as the PartitionKey and the Name field is used as the RowKey. To understand how FSharp.Azure.Storage conceptually separates record fields away from the PK and RK table properties, please read Modifying Data in Table Storage.

FSharp.Azure.Storage also supports querying class types that implement the Microsoft.Azure.Cosmos.Table.ITableEntity interface.

Setting up

The easiest way to use the FSharp.Azure.Storage API is to define a quick helper function that allows you to query for rows from a particular table:

open Microsoft.Azure.Cosmos.Table

let account = CloudStorageAccount.Parse "UseDevelopmentStorage=true;" //Or your connection string here
let tableClient = account.CreateCloudTableClient()

let fromGameTable q = fromTable tableClient "Games" q

The fromGameTable function fixes the tableClient and table name parameters of the fromTable function, so you don't have to keep passing them. This technique is very common when using the FSharp.Azure.Storage API.

Getting everything

Here's how we'd query for all rows in the "Games" table:

let games = Query.all<Game> |> fromGameTable

games above is of type seq<Game * EntityMetadata>. The EntityMetadata type contains the Etag and Timestamp of each Game. Here's how you might work with that:

let gameRecords = games |> Seq.map fst
let etags = games |> Seq.map (fun game, metadata -> metadata.Etag)

The etags in particular are useful when updating those records in table storage, because they allow you to utilise Azure Table Storage's optimistic concurrency protection to ensure nothing else has changed the record since you queried for it. See Modifying Data in Table Storage for more information.

If you'd like the ETag and Timestamp values to be written to fields on your record type as well, you can add Etag and Timestamp attributes to the record fields like so:

type Game = 
    { Name : string
      Developer : string
      HasMultiplayer : bool
      Notes : string
      [<Timestamp>] Modified : DateTimeOffset option
      [<Etag>] Etag : string option }

Note that these fields will be ignored when you save back to table storage, as they are system-controlled properties.

Filtering with where

The Query.where function allows you to use an F# quotation of a lambda to specify what conditions you want to filter by. The lambda you specify must be of type:

'T -> SystemProperties -> bool

The SystemProperties type allows you to construct filters against system properties such as the Partition Key and Row Key, which are the only two properties that are indexed by Table Storage, and therefore the ones over which you will most likely be performing filtering.

For example, this is how we'd get an individual record by PartitionKey and RowKey:

let halo4, metadata = 
    Query.all<Game>
    |> Query.where <@ fun g s -> s.PartitionKey = "343 Industries" && s.RowKey = "Halo 4" @>
    |> fromGameTable
    |> Seq.head

You can, however, query over properties on your record type too. Be aware that queries over those properties are not indexed by Table Storage and as such will suffer performance penalties.

For example, if we wanted to find all multiplayer games made by Valve, we could write:

let multiplayerValveGames = 
    Query.all<Game>
    |> Query.where <@ fun g s -> s.PartitionKey = "Valve" && g.HasMultiplayer @>
    |> fromGameTable

The following operators/functions are supported for use inside the where lambda:

  • The =, <>, <, <=, >, >= operators
  • The not function

Taking only the first n rows

Table storage allows you to limit the query results to be only the first 'n' results it finds. Naturally, FSharp.Azure.Storage supports this.

Here's an example query that limits the results to the first 5 multiplayer games made by Valve:

let multiplayerValveGames = 
    Query.all<Game>
    |> Query.where <@ fun g s -> s.PartitionKey = "Valve" && g.HasMultiplayer @>
    |> Query.take 5
    |> fromGameTable

Query segmentation

Azure table storage may not return all the results that match the query in one go. Instead it may split the results over multiple segments, each of which must be queried for separately and sequentially. According to the documentation, table storage will start segmenting results if:

  • The resultset contains more than 1000 items
  • The query took longer than five seconds
  • The query crossed a partition boundary

FSharp.Azure.Storage supports handling query segmentation manually as well as automatically. The fromTable function we used in the previous examples returns a seq that will automatically query for additional segments as you iterate.

If you want to handle segmentation manually, you can use the fromTableSegmented function instead of fromTable. First, define a helper function:

let fromGameTableSegmented c q = fromTableSegmented tableClient "Games" c q

The fromGameTableSegmented function will have the type:

TableContinuationToken option -> EntityQuery<'T> -> List<'T * EntityMetadata> * TableContinuationToken option

This means it takes an optional continuation token and the query, and returns the list of results in that segment, and optionally the continuation token used to access the next segment, if any.

Here's an example that gets the first two segments of query results:

let query = Query.all<Game>

let games1, segmentToken1 = 
    query |> fromGameTableSegmented None //None means querying for the first segment (ie. no continuation)

//We're making the assumption segmentToken1 here is not None and therefore 
//there is another segment to read. In practice, this is a very poor assumption
//to make, since segmentation is performed arbitrarily by table storage
if segmentToken1.IsNone then failwith "No segment 2!"

let games2, segmentToken2 = 
    query |> fromGameTableSegmented segmentToken1

In practice, you'd probably write a recursive function or a loop to iterate through the segments until a certain condition, or use the asynchronous sequences support (see below).

Asynchronous support

FSharp.Azure.Storage also supports asynchronous equivalents of fromTable and fromTableSegmented. To use them, you would first create your helper functions:

let fromGameTableAsync q = fromTableAsync tableClient "Games" q
let fromGameTableSegmentedAsync c q = fromTableSegmentedAsync tableClient "Games" c q

fromTableAsync automatically and asynchronously makes requests for all the segments and returns all the results in a single seq. Note that unlike fromTable, all segments are queried for during the asynchronous operation, not during sequence iteration. (This is because seq doesn't support asynchronous iteration.)

Here's an example of using fromTableAsync:

let valveGames = 
    Query.all<Game> 
    |> Query.where <@ fun g s -> s.PartitionKey = "Valve" @> 
    |> fromGameTableAsync 
    |> Async.RunSynchronously

And finally, an example using the asynchronous segmentation variant:

let asyncOp = async {
    let query = Query.all<Game>

    let! games1, segmentToken1 = 
        query |> fromGameTableSegmentedAsync None //None means querying for the first segment (ie. no continuation)
        
    //We're making the assumption segmentToken1 here is not None and therefore 
    //there is another segment to read. In practice, this is a very poor assumption
    //to make, since segmentation is performed arbitrarily by table storage
    if segmentToken1.IsNone then failwith "No segment 2!"
    
    let! games2, segmentToken2 = 
        query |> fromGameTableSegmentedAsync segmentToken1
        
    return games1 @ games2
}

let games = asyncOp |> Async.RunSynchronously

Asynchronous sequences

If you need to query a large number of rows and you don't want to materialise them all in memory simultaneously, you can stream through them by using the fromTableSegmentedAsyncSeq function. This function returns an AsyncSeq (from the FSharp.Control.AsyncSeq library), where each item yielded by the sequence is a segment's worth of rows.

let fromGameTableSegmentedAsyncSeq q = fromTableSegmentedAsyncSeq tableClient "Games" q

let printAllGames = async {
    do! Query.all<Game>
        |> fromGameTableSegmentedAsyncSeq
        |> AsyncSeq.iter (fun rows ->
            for (game, metadata) in rows do
                printfn "%A" game
        )
}

For more information about what you can do with asynchronous sequences, see the FSharp.Control.AsyncSeq documentation.