This is a demo of delivering a stream of Stack Exchange questions using Streamdata.io running on AWS Lambda as a "serverless" function. It is designed to take this existing JSON API and turn it into a streaming API using Server-Sent Events (SSE), applying incremental updates using JSON Patch, and then stream the results into a data lake store on Amazon S3.
You will need a Streamdata.io account and application key to run this function, as well as an AWS account to run the Lambda function in, and AWS S3 to store the data. You can run this as stream for up to five minutes (the maximum timeout of a Lambda script), then rerun every X minutes, or other schedule using AWS CloudWatch Events. Providing an existing API stream that can be turned on or off as needed, delivering real time updates to your Amazon S3 data lake.
- Streamdata.io Service
- Streamdata.io Account Signup
- Stack Exchange API
- Example API Call
- AWS Serverless Application Repository
- License: MIT License
- Where do I put my Streamdata.io key? In the appToken environment variable available in the settings for your function after you have installed in.
- Where does the data end up in Amazon S3? There are two environment variables for the function: 1) s3bucket, and 2) targetFolder, which will define your "data lake" destination on Amazon S3 -- you will need to create the bucket, but the folder will automatically be created.
- Where do I get support for this function? Submit your issues via the Github repository for this project
- Is access_token and key required? No, but if you want to have higher rate limits, you will need.
- How do I get a Stack Exchange key? You will have to register an application yourself using Stack Exchange.
- How do I get a Stack Exchange access_token? You will need to setup an application, and perform the OAuth dance for your application. We could generate one for you, but then we'd have access to your token.