Skip to content

sfc-gh-ahkhan/snowflake-api

Repository files navigation

Snowflake API for Analytics Applications

This is an example of how to build a serverless API that leverages Snowflake's elastic data warehouse as DB engine for analytics applications. This example implements a read-only API that can run and paginate through the results of any view defined within a Snowflake database to which it has access to. It also leverages advanced Snowflake features like Multi-cluster Warehouses and multiple caching layers to build a truly scalable and performant analytics API at a fraction of the cost of legacy systems.

Requirements

Architecture

This API is implemented as a completely serverless solution leveraging various AWS services including Lambda, API Gateway and Step Functions.

  1. Typical REST endpoints timeout after a few minutes. So, we've implemented this analytics API on websockets which lends itself better for long running analytical queries. The client initiates a connection and sends a message: {"action": "run_view", "view_name": "trip_weather_vw"}

  2. The request is routed to a Lambda function which, in turn, triggers an AWS Step Functions workflow.

  3. The step functions workflow initiates the query/view in Snowflake using the Snowflake connector for Python. The workflow then checks the status of the job in Snowflake every 5 seconds.

  4. Once the query/view finishes execution, the response (either results or error) is sent back to the client asynchronously.

Setup & Configuration

  1. Install the latest NodeJS: https://nodejs.org/en/download/. You can check to see if you have NodeJS installed by:
node --version
  1. This lab uses the Serverless framework (https://serverless.com/) which you can install globally using:
sudo npm install -g serverless

If installed successfully, you should be able to now check the installed version:

serverless --version
  1. The API is implemented using Python 3. Check to see if you Python 3 installed on your machine:
python --version

OR

python3 --version

If not installed, download and install Python 3: https://www.python.org/downloads/

  1. If you don't already, install the AWS CLI using pip3:
sudo pip3 install awscli --upgrade --user

You can use aws --version command to verify if the AWS CLI was correctly installed. If it wasn't, see this to troubleshoot: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html If installing for the first time, you will need to configure AWS CLI by using:

aws configure

Make sure you have your AWS credentials handy when you configure the AWS CLI.

  1. You can check if you have Git installed already by:
  git --version

If not, install Git: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git

  1. You will also need Docker to deploy the API. Download and install from here: https://www.docker.com/products/docker-desktop

Snowflake Setup

Before we get into building the API, lets setup our backend Snowflake environment correctly so we have all the parameters ready when it comes time to edit the API code.

  1. Create and save the RSA public and private keys using the procedure described here: https://docs.snowflake.net/manuals/user-guide/snowsql-start.html#using-key-pair-authentication

Jot down the passphrase you used to encrypt the private key.

  1. This example API is read-only and will get data by running a particular view within Snowflake. Lets go ahead and create a view that the API can run. Login to your Snowflake account and run the following SQL statements:
use role accountadmin;
create role if not exists snowflake_api_role;
grant usage on database <YOUR_DB_NAME> to role snowflake_api_role;
grant usage on schema <YOUR_DB_NAME>.<YOUR_SCHEMA_NAME> to role snowflake_api_role;
grant select on all views in schema <YOUR_DB_NAME>.<YOUR_SCHEMA_NAME> to role snowflake_api_role;
grant select on future views in schema <YOUR_DB_NAME>.<YOUR_SCHEMA_NAME> to role snowflake_api_role;
grant usage on warehouse <YOUR_WAREHOUSE_NAME> to role snowflake_api_role;

create user snowflake_api_user password='Snowfl*ke' default_role = snowflake_api_role must_change_password = false;
alter user snowflake_api_user set rsa_public_key='<YOUR_RSA_PUBLIC_KEY>'; --exclude the header and footer
grant role snowflake_api_role to user snowflake_api_user;
grant role snowflake_api_role to user <CURRENT_LOGGED_IN_USER>;
  1. Create a test view with some test data, switch to using the new snowflake_api_role and try a simple select to see if the permissions work:
use role snowflake_api_role;
select * from <your_view_name> limit 10;

Clone, Modify and Deploy Code

  1. Clone this repo:
  git clone https://github.com/filanthropic/snowflake-api.git
  1. Before the Serverless framework can deploy this code, it needs the serverless-python-requirements plugin so lets install that (dependency is declared in package.json)
cd snowflake-api/
npm install
  1. Open the AWS Secrets Manager and create a new secret that will hold the private key. Select 'Other type of secret' and then select plaintext and use p_key as the key and your private key that you generated in the Snowflake setup step 1 as the value.

Hit Next, give the secret a name and description. Hit Next again twice and then hit Store. Note the name you gave to the secret.

  1. From within the snowflake-api directory, open keypair_auth.py and update the following line with the passphrase that you used when you created the key pair in Snowflake Setup step #1:
    passkey = "<your_passphrase>"
  1. Also within keypair_auth.py, update the secret_name variable with the name (not the ARN) of the secret you just created within AWS Secrets Manager:
secret_name = "<SECRET_NAME>"
  1. Open serverless.yml. At the top of this file contains the 'service' -> 'name' configuration. Go ahead and change the service name to whatever you want to name this project.

  2. Change AWS account number in serverless.yml

  1. If using the default AWS CLI profile, remove the profile attribute in serverless.yml. If using a named profile, change it to match the AWS CLI profile you want to use to deploy:

  1. In serverless.yml, update the last part of the ARN (not the name) of the secret that holds the private key you previously created:

  1. Update the rest of the environment variables to match your Snowflake account, warehouse name, database and schema name within serverless.yml.

  1. Now we are ready to deploy the API to AWS. Go to the 'snowflake-api' folder and deploy the serverless stack:
serverless deploy

The command above will take all the code you cloned, package it up as AWS Lambda functions and deploys them. It also creates the AWS API Gateway endpoint with websockets and the AWS Step Functions state machine that orchestrates the Lambda functions.

  1. Go ahead and make note of the API endpoint that you just created.

Using the API

The API is based on websockets because of the long running nature of analytics queries. The best way to understand how the client interacts with the API is to first install the "wscat" tool.

  1. Install the "wscat"
sudo npm install -g wscat
  1. Connect to the API endpoint you created in step #13:
wscat -c wss://<your_api_endpoint>
  1. In the API code, we have implemented two websocket "routes" or types of actions that the API supports. First one is used to run a particular view and is called "run_view" and the other one called "fetch_results" is used to fetch cached results of an already run query and helps the client paginate through the results in an efficient manner.

  2. Once connected, you can run the secure view you created previously by running:

 {"action": "run_view", "view_name":"<your_view_name>"}

The response should look something like this:

  1. The response of the previous command should give you a query_id which you can use to paginate through the results:
{"action": "fetch_results","query_id": "<query_id>", "offset": "100"}
  1. Open up the 'client.html' file in a browser to see how a simple HTML client can interact with our Snowflake API.

Additional Resources

About

Snowflake websockets API

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published