Moving work into Initialization
Up until we implemented the database connection and query, our function's runtime was single-digit milliseconds and initialization from a cold start was around 25ms.

With the connection initialization and querying both being in the handler, our cold-start initialization time hasn't changed much, but our runtime ballooned to 700ms.

```text
10:50:08 AM: cold start
10:50:08 AM: handler
10:50:08 AM: 6e363ebc Duration: 754.40 ms	Memory Usage: 31 MB	Init Duration: 29.82 ms
10:50:11 AM: handler
10:50:12 AM: ae0fbc1d Duration: 752.15 ms	Memory Usage: 31 MB
10:50:14 AM: handler
10:50:14 AM: c4eedefd Duration: 689.70 ms	Memory Usage: 31 MB
```

The work we're doing in a serverless function splits into two major pieces:

1. The work done when initializing: aka the "cold start" work
2. The work we do on every request: aka the `handler`

Our initialization phase isn't doing much more than it did before we added SQL, as evidenced by the fact that our previous and current cold start times are more or less the same.

Looking at our handler function, we have a couple segments of work happening.

- Retrieve an environment variable
- Set up a connection pool
- Query the database
- Serialize our result to JSON

Without measuring we can make a couple assumptions about the performance of this work and where it should happen.

Netlify doesn't let us change environment variables while a function is running, so it seems reasonable to only get it once when the function starts up.

The connection pool handles instantiating connections to the database when we need them. On first initialization, it connects to the database once, and if the pool needs more it will spin up more connections.

By placing the initialization of the connection pool in the handler function, we're ensuring that a new connection is established on each invocation of the lambda. If we can instead move this to the cold start, then we should be able to re-use connections from the pool across warm lambda invocations.

Querying the database has to happen in the handler, because we'll be using user input to adjust the query.

Serializing our JSON relies on the result of the SQL query, so it has to be in the handler as well.

So if the only large piece of work we need in our handler is the SQL query, what times should we expect for such a request?

This gets into a bit of physics. Specifically:

- the location of our database
- the location our functions are running in
- the location we are making the request from

From the east coast of the US to the west coast, the theoretical minimum latency is about 25ms for a round trip. This is the amount of time it takes light to travel that far and back. Realistically, information travels over the internet at something like 66% of the speed of light.

So we can estimate a single round trip at approximately 40ms.

This matters because I'm recording from San Francisco, Netlify Functions are in a US East datacenter, and my PlanetScale database is in US West. So my theoretical minimum latency is 80ms for the request to hit the function, then the database, and return to me.

We won't reach 80ms, because the minimum in practice is always higher than the theoretical minimum, but 80ms is a lot lower than 700ms, so we've got some work to do.
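The arithmetic behind these estimates can be sketched in a few lines. This is a rough illustration, not a measurement: the 4,000 km coast-to-coast distance is an assumption, and `round_trip_ms` is a hypothetical helper name.

```rust
/// Speed of light in a vacuum, in kilometres per millisecond.
const LIGHT_KM_PER_MS: f64 = 299.792_458;

/// Round-trip time in milliseconds over `distance_km`, with the signal
/// travelling at `fraction_of_c` times the speed of light.
fn round_trip_ms(distance_km: f64, fraction_of_c: f64) -> f64 {
    2.0 * distance_km / (LIGHT_KM_PER_MS * fraction_of_c)
}

fn main() {
    // Assumption: roughly 4,000 km between the US coasts.
    let coast_to_coast_km = 4_000.0;

    // Light in a vacuum versus a signal at about 66% of that speed.
    let ideal = round_trip_ms(coast_to_coast_km, 1.0);
    let realistic = round_trip_ms(coast_to_coast_km, 0.66);

    println!("ideal round trip:     {ideal:.1} ms");
    println!("realistic round trip: {realistic:.1} ms");
    // Client -> function -> database and back is two coast-to-coast round trips.
    println!("two round trips:      {:.1} ms", 2.0 * realistic);
}
```

Under those assumptions the numbers come out at roughly 27ms, 40ms, and 81ms, which lines up with the estimates above.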

To move the connection pool to the initialization phase, we'll use a new data type: `OnceCell`. `OnceCell` comes from a third party crate called `once_cell`.

```shell
cargo add -p pokemon-api once_cell
```

Cells are data structures that exist in the standard library under the `std::cell` module path. `OnceCell` is an extension to this set of data structures that can only be set once, hence the name: "once" cell.

We can read the value as many times as we need to, later in our program.
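To get a feel for the "set once, read many times" behavior, here is a minimal sketch. It uses `std::sync::OnceLock` (available in the standard library since Rust 1.70) as a stand-in so that it runs without any extra crates; its `new`, `set`, and `get` methods behave like those of `once_cell::sync::OnceCell`. The `demo` function is purely illustrative.

```rust
use std::sync::OnceLock;

/// Returns what we observe at each step: the value before setting, whether
/// the first and second `set` calls succeeded, and the final stored value.
fn demo() -> (Option<u32>, bool, bool, u32) {
    let cell: OnceLock<u32> = OnceLock::new();

    // Nothing has been stored yet, so `get` returns `None`.
    let before = cell.get().copied();

    // The first `set` succeeds and stores the value.
    let first = cell.set(1).is_ok();

    // A second `set` is rejected: the cell can only be set once.
    let second = cell.set(2).is_ok();

    // Reads can happen as often as we like and always see the first value.
    let value = *cell.get().unwrap();

    (before, first, second, value)
}

fn main() {
    println!("{:?}", demo());
}
```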

We need to bring `OnceCell`, `MySql`, and `Pool` into scope.

```rust
use once_cell::sync::OnceCell;
use sqlx::{mysql::MySqlPoolOptions, MySql, Pool};
```

To construct a new `OnceCell`, we can use `OnceCell::new()` which returns a new `OnceCell` struct.

`OnceCell` takes a type argument that indicates what kind of value we're going to store in it. We're going to store an instantiated MySql connection pool, so the type of `POOL` will be `OnceCell<Pool<MySql>>`.

```rust
static POOL: OnceCell<Pool<MySql>> = OnceCell::new();
```

Which brings us to `static`. The `static` declaration is at the root of our file rather than inside a function, where our `let` declarations live.

`static` basically means: make one of these and only one of these. All references will refer to this specific instance and there will never be a second.

We can continue in our `main` function, moving our `database_url` and `pool` initialization work from the match expression into `main`.

To actually initialize the `POOL` item, we can use `get_or_init`. `get_or_init` takes a closure and only runs it if `POOL` hasn't been initialized yet. We return our initialized `pool` from that closure, which moves `pool` into `POOL` and enables us to access it later in our handler.

```rust
#[tokio::main]
async fn main() -> Result<(), Error> {
    println!("cold start");
    let database_url = env::var("DATABASE_URL")?;
    let pool = MySqlPoolOptions::new()
        .max_connections(5)
        .connect(&database_url)
        .await?;
    POOL.get_or_init(|| pool);
    let processor = handler_fn(handler);
    lambda_runtime::run(processor).await?;
    Ok(())
}
```
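The reason this pays off on warm invocations is that the closure passed to `get_or_init` runs at most once for the lifetime of the process. The sketch below illustrates that with stand-ins: `FakePool`, `connect`, `pool`, and the `mysql://example` URL are hypothetical, and `std::sync::OnceLock` (Rust 1.70+) is used in place of `once_cell::sync::OnceCell` so the example needs no dependencies.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for sqlx's `Pool<MySql>`.
struct FakePool {
    url: String,
}

static POOL: OnceLock<FakePool> = OnceLock::new();

/// Counts how many times we "connected" to the database.
static CONNECTS: AtomicUsize = AtomicUsize::new(0);

/// Pretend to open a connection pool, recording that we did so.
fn connect(url: &str) -> FakePool {
    CONNECTS.fetch_add(1, Ordering::SeqCst);
    FakePool { url: url.to_string() }
}

/// Returns the shared pool, creating it only on the first call.
fn pool() -> &'static FakePool {
    POOL.get_or_init(|| connect("mysql://example"))
}

fn main() {
    // Simulate three invocations that all want the pool.
    for _ in 0..3 {
        let shared = pool();
        println!("using pool for {}", shared.url);
    }
    println!("connects: {}", CONNECTS.load(Ordering::SeqCst));
}
```

However many times `pool()` is called, `connect` only runs for the first call; every later call gets a reference to the same `FakePool`.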

In our handler, we can use `.get()` to get access to the connection pool. `.get()` returns an `Option`, which is `None` if the cell hasn't been set yet, so we `.unwrap()` it. This will panic if `POOL` isn't set, but we always set it in the cold start phase, so if it doesn't exist something has gone very wrong.

```rust
.fetch_one(POOL.get().unwrap())
```

This leaves us with quite a difference in cold start vs handler times. The cold start is over 500ms, while the warm execution time of our actual data fetch is around 150ms.

```text
3:40:38 AM: 4943c4cf Duration: 222.31 ms	Memory Usage: 20 MB	Init Duration: 569.64 ms
3:41:16 AM: handler
3:41:16 AM: 0bf48336 Duration: 149.41 ms	Memory Usage: 20 MB
3:41:35 AM: handler
3:41:35 AM: ec25755d Duration: 147.69 ms	Memory Usage: 20 MB
```
ChristopherBiscardi committed Oct 24, 2021
1 parent df01e51 commit 91128b7
Showing 3 changed files with 13 additions and 7 deletions.
1 change: 1 addition & 0 deletions Cargo.lock

1 change: 1 addition & 0 deletions crates/pokemon-api/Cargo.toml
```diff
@@ -13,3 +13,4 @@ sqlx = { version = "0.5.7", features = ["mysql", "runtime-tokio-rustls"] }
 serde = { version = "1.0.130", features = ["derive"] }
 tokio = "1.12.0"
 serde_json = "1.0.68"
+once_cell = "1.8.0"
```
18 changes: 11 additions & 7 deletions crates/pokemon-api/src/main.rs
```diff
@@ -6,14 +6,23 @@ use aws_lambda_events::{
 };
 use http::header::HeaderMap;
 use lambda_runtime::{handler_fn, Context, Error};
+use once_cell::sync::OnceCell;
 use serde::Serialize;
 use serde_json::json;
-use sqlx::mysql::MySqlPoolOptions;
+use sqlx::{mysql::MySqlPoolOptions, MySql, Pool};
 use std::env;
 
+static POOL: OnceCell<Pool<MySql>> = OnceCell::new();
+
 #[tokio::main]
 async fn main() -> Result<(), Error> {
     println!("cold start");
+    let database_url = env::var("DATABASE_URL")?;
+    let pool = MySqlPoolOptions::new()
+        .max_connections(5)
+        .connect(&database_url)
+        .await?;
+    POOL.get_or_init(|| pool);
     let processor = handler_fn(handler);
     lambda_runtime::run(processor).await?;
     Ok(())
@@ -30,7 +39,6 @@ async fn handler(
     _: Context,
 ) -> Result<ApiGatewayProxyResponse, Error> {
     println!("handler");
-    let database_url = env::var("DATABASE_URL")?;
     let path = event
         .path
         .expect("expect there to always be an event path");
@@ -51,16 +59,12 @@ async fn handler(
         },
         None => panic!("requested_pokemon is None, which should never happen"),
         Some(pokemon_name) => {
-            let pool = MySqlPoolOptions::new()
-                .max_connections(5)
-                .connect(&database_url)
-                .await?;
             let result = sqlx::query_as!(
                 PokemonHp,
                 r#"SELECT name, hp FROM pokemon WHERE slug = ?"#,
                 pokemon_name
             )
-            .fetch_one(&pool)
+            .fetch_one(POOL.get().unwrap())
             .await?;
 
             let json_pokemon =
```