This is an example of how to build a serverless API that leverages Snowflake's elastic data warehouse as DB engine for analytics applications. This example implements a read-only API that can run and paginate through the results of any view defined within a Snowflake database to which it has access to. It also leverages advanced Snowflake features like Multi-cluster Warehouses and multiple caching layers to build a truly scalable and performant analytics API at a fraction of the cost of legacy systems.
-
Access to a Snowflake account. Get your free trial account at: https://trial.snowflake.com/
-
Administrative access to an AWS account: https://aws.amazon.com/
This API is implemented as a completely serverless solution leveraging various AWS services including Lambda, API Gateway and Step Functions.
-
Typical REST endpoints timeout after a few minutes. So, we've implemented this analytics API on websockets which lends itself better for long running analytical queries. The client initiates a connection and sends a message:
{"action": "run_view", "view_name": "trip_weather_vw"}
-
The request is routed to a Lambda function which, in turn, triggers an AWS Step Functions workflow.
-
The step functions workflow initiates the query/view in Snowflake using the Snowflake connector for Python. The workflow then checks the status of the job in Snowflake every 5 seconds.
-
Once the query/view finishes execution, the response (either results or error) is sent back to the client asynchronously.
- Install the latest NodeJS: https://nodejs.org/en/download/. You can check to see if you have NodeJS installed by:
node --version
- This lab uses the Serverless framework (https://serverless.com/) which you can install globally using:
sudo npm install -g serverless
If installed successfully, you should be able to now check the installed version:
serverless --version
- The API is implemented using Python 3. Check to see if you Python 3 installed on your machine:
python --version
OR
python3 --version
If not installed, download and install Python 3: https://www.python.org/downloads/
- If you don't already, install the AWS CLI using pip3:
sudo pip3 install awscli --upgrade --user
You can use aws --version
command to verify if the AWS CLI was
correctly installed. If it wasn't, see this to troubleshoot:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
If installing for the first time, you will need to configure AWS CLI by using:
aws configure
Make sure you have your AWS credentials handy when you configure the AWS CLI.
- You can check if you have Git installed already by:
git --version
If not, install Git: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
- You will also need Docker to deploy the API. Download and install from here: https://www.docker.com/products/docker-desktop
Before we get into building the API, lets setup our backend Snowflake environment correctly so we have all the parameters ready when it comes time to edit the API code.
- Create and save the RSA public and private keys using the procedure described here: https://docs.snowflake.net/manuals/user-guide/snowsql-start.html#using-key-pair-authentication
Jot down the passphrase you used to encrypt the private key.
- This example API is read-only and will get data by running a particular view within Snowflake. Lets go ahead and create a view that the API can run. Login to your Snowflake account and run the following SQL statements:
use role accountadmin;
create role if not exists snowflake_api_role;
grant usage on database <YOUR_DB_NAME> to role snowflake_api_role;
grant usage on schema <YOUR_DB_NAME>.<YOUR_SCHEMA_NAME> to role snowflake_api_role;
grant select on all views in schema <YOUR_DB_NAME>.<YOUR_SCHEMA_NAME> to role snowflake_api_role;
grant select on future views in schema <YOUR_DB_NAME>.<YOUR_SCHEMA_NAME> to role snowflake_api_role;
grant usage on warehouse <YOUR_WAREHOUSE_NAME> to role snowflake_api_role;
create user snowflake_api_user password='Snowfl*ke' default_role = snowflake_api_role must_change_password = false;
alter user snowflake_api_user set rsa_public_key='<YOUR_RSA_PUBLIC_KEY>'; --exclude the header and footer
grant role snowflake_api_role to user snowflake_api_user;
grant role snowflake_api_role to user <CURRENT_LOGGED_IN_USER>;
- Create a test view with some test data, switch to using the new
snowflake_api_role
and try a simple select to see if the permissions work:
use role snowflake_api_role;
select * from <your_view_name> limit 10;
- Clone this repo:
git clone https://github.com/filanthropic/snowflake-api.git
- Before the Serverless framework can deploy this code, it needs the
serverless-python-requirements
plugin so lets install that (dependency is declared in package.json)
cd snowflake-api/
npm install
- Open the AWS Secrets Manager and create a new secret that will hold the private key. Select 'Other type of secret' and then select
plaintext
and usep_key
as the key and your private key that you generated in the Snowflake setup step 1 as the value.
Hit Next
, give the secret a name and description. Hit Next
again twice and then hit Store
. Note the name you gave to the secret.
- From within the
snowflake-api
directory, openkeypair_auth.py
and update the following line with the passphrase that you used when you created the key pair in Snowflake Setup step #1:
passkey = "<your_passphrase>"
- Also within
keypair_auth.py
, update thesecret_name
variable with the name (not the ARN) of the secret you just created within AWS Secrets Manager:
secret_name = "<SECRET_NAME>"
-
Open
serverless.yml
. At the top of this file contains the 'service' -> 'name' configuration. Go ahead and change the service name to whatever you want to name this project. -
Change AWS account number in serverless.yml
- If using the default AWS CLI profile, remove the
profile
attribute inserverless.yml
. If using a named profile, change it to match the AWS CLI profile you want to use to deploy:
- In serverless.yml, update the last part of the ARN (not the name) of the secret that holds the private key you previously created:
- Update the rest of the environment variables to match your Snowflake account, warehouse name, database and schema name within
serverless.yml
.
- Now we are ready to deploy the API to AWS. Go to the 'snowflake-api' folder and deploy the serverless stack:
serverless deploy
The command above will take all the code you cloned, package it up as AWS Lambda functions and deploys them. It also creates the AWS API Gateway endpoint with websockets and the AWS Step Functions state machine that orchestrates the Lambda functions.
The API is based on websockets because of the long running nature of analytics queries. The best way to understand how the client interacts with the API is to first install the "wscat" tool.
- Install the "wscat"
sudo npm install -g wscat
- Connect to the API endpoint you created in step #13:
wscat -c wss://<your_api_endpoint>
-
In the API code, we have implemented two websocket "routes" or types of actions that the API supports. First one is used to run a particular view and is called "run_view" and the other one called "fetch_results" is used to fetch cached results of an already run query and helps the client paginate through the results in an efficient manner.
-
Once connected, you can run the secure view you created previously by running:
{"action": "run_view", "view_name":"<your_view_name>"}
The response should look something like this:
- The response of the previous command should give you a
query_id
which you can use to paginate through the results:
{"action": "fetch_results","query_id": "<query_id>", "offset": "100"}
- Open up the 'client.html' file in a browser to see how a simple HTML client can interact with our Snowflake API.
-
AWS API Gateway WebSocket Support: https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-websocket-api.html
-
Serverless Framework: https://serverless.com/
-
AWS Serverless Stack: https://aws.amazon.com/serverless/
-
Snowflake Python Connector Keypair Auth: https://docs.snowflake.net/manuals/user-guide/python-connector-example.html#using-key-pair-authentication