
How to use KnexJS with serverless-offline / AWS Lambda #3464

Closed

deepakgonda opened this issue Oct 4, 2019 · 25 comments
@deepakgonda

deepakgonda commented Oct 4, 2019

Environment

Knex version: 0.19.3
Database + version: Postgres/RDS- Postgres, (PostgreSQL) 10.10 (Ubuntu 10.10-0ubuntu0.18.04.1)

Bug

We are trying to use Knex in our serverless application. We are using serverless-offline for local development and serverless package to deploy application to AWS Lambda.

We are using the following configuration:

module.exports = require('knex')({
    client: 'pg',
    connection: {
        host: process.env.DB_HOST,
        user: process.env.DB_USER,
        password: process.env.DB_PASS,
        database: process.env.DB_NAME
    },
    debug: process.env.NODE_ENV === 'local' ? true : false,
    pool: {
        min: 1,
        max: 1,
        afterCreate: (conn, done) => {
            console.log('AFTERCREATE DB CONNECTION');
            done();
            console.log('AFTERCREATE DB CONNECTION - 2');
            return ;
        }
    },
    migrations: {
        tableName: 'knex_migrations',
        directory: __dirname + '/src/db/migrations',
    },
    seeds: {
        directory: __dirname + '/src/db/seeds'
    }
});

We can see the following in the console when a function is called that requires knex to communicate with the DB:

Serverless: Offline [HTTP] listening on http://localhost:3000
Serverless: Enter "rp" to replay the last request

Serverless: ANY /test (λ: app)
AFTERCREATE DB CONNECTION
AFTERCREATE DB CONNECTION - 2
Serverless: Replying timeout after 30000ms

Sample code from the controller where we are using knex:

const {Router} = require("express")
const knex = require('../db/knex');


const router = Router()

router.get('/', async(req,res) => {
    try {
        let users =null
        await knex.transaction(async trx => {
            users = await knex.select().returning(['*']).transacting(trx).into('users')
            trx.commit
        })

        res.status(200).json({
            data: {
                users:users
            }
        })

    } catch(err){
        res.status(200).json({failed:true , err: err})
    }
})

module.exports = router;

From the console output above, I can only conclude that in the afterCreate function of the knex pool configuration, execution never completes even though done() is called; that is why the timeout happens after 30000ms.

After some reading, I found that in a serverless environment / AWS Lambda we have to add the following to fix this:

context.callbackWaitsForEmptyEventLoop = false

I have added it as well, but with no improvement. Here is the code sample for it:

const server = sls(app);
module.exports.server = async (event, context) => {
  context.callbackWaitsForEmptyEventLoop = false;
  console.log('Remaining time: ', context.getRemainingTimeInMillis())
  console.log('Function name: ', context.functionName)
  const result = await server(event, context);
  // and here
  return result;
};

Please help me understand what we are doing wrong, or whether this is a bug in the Knex library or in Serverless, and how to fix it.


@Deepak969

Facing the same issue for a long time, please help as soon as possible.

@salcanzar

Facing the same issue.

@deepakgonda
Author

Also, if we increase the max value to 100, then in the local setup with serverless-offline and Postgres it works once or twice, because each time it attempts to create that many connections on the Postgres server.

After a few queries it then fails with a "too many clients" error, because the maximum number of connections is created every time a query is issued. I can check this via the following query:

select * from pg_stat_activity;

And for all the connections, the state is: idle in transaction

The query shown for all of these connections is 'BEGIN;'.
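
For reference, here is a minimal sketch of how the same check can be run from Node through knex.raw (the helper name is illustrative, and it assumes the shared knex instance from db/knex.js):

const knex = require('../db/knex');

// Debugging helper: list only the Postgres sessions that are stuck in
// "idle in transaction", instead of dumping all of pg_stat_activity.
async function listIdleInTransaction() {
    const result = await knex.raw(
        "SELECT pid, usename, state, query FROM pg_stat_activity WHERE state = 'idle in transaction'"
    );
    // With the pg client, knex.raw resolves to the pg result object,
    // so the matching rows are on `result.rows`.
    console.log(result.rows);
    return result.rows;
}

module.exports = { listIdleInTransaction };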

@kumar88kanishk

Facing the same issue, please help!

@elhigu
Member

elhigu commented Oct 5, 2019

I don't see anything here that would suggest there is something wrong with knex... If you can provide a complete reproduction example project (for example, one that runs with Docker), I could take a look.

Also, in the example code in the issue description there is this line:

            trx.commit

which actually does not do anything (it references the function without calling it), but that is not the problem in this case.

You could clean up the router code to look like this:

const {Router} = require("express");
const knex = require('../db/knex');
const router = Router();

router.get('/', async(req,res) => {
    try {
        const users = await knex.transaction(async trx => {
          return knex.select().returning(['*']).transacting(trx).into('users');
        });

        res.status(200).json({
            data: {
                users:users
            }
        });
    } catch(err){
        res.status(200).json({failed:true , err: err});
    }
});

module.exports = router;

There could be something fishy going on in ../db/knex though...

@deepakgonda
Author

@elhigu, I have the following code in db/knex.js:

module.exports = require('knex')({
    client: 'pg',
    connection: {
        host: process.env.DB_HOST,
        user: process.env.DB_USER,
        password: process.env.DB_PASS,
        database: process.env.DB_NAME
    },
    debug: process.env.NODE_ENV === 'local' ? true : false,
    pool: {
        min: 1,
        max: 1,
        afterCreate: async (conn, done) => {
            console.log('AFTER CREATE DB CONNECTION');
            await conn.query('SET timezone="UTC";');
            done(null, conn);
            console.log('AFTER CREATE DB CONNECTION - 2');
            return done;
        }
    },
    migrations: {
        tableName: 'knex_migrations',
        directory: __dirname + '/src/db/migrations',
    },
    seeds: {
        directory: __dirname + '/src/db/seeds'
    }
});

If you want, I can give you access to my sample application code.

@deepakgonda
Author

deepakgonda commented Oct 6, 2019

@elhigu Here is the link to the sample application GitHub project:

https://github.com/deepakgonda/sample-serverless-app

I have set up the project with the online Postgres RDS server which we are planning to use. Please clone this app to your PC.

To run it:

  • npm install
  • npm start

Try accessing this URL when the app starts:

http://localhost:3000/users

After some time you will see an error like this:

[Serverless-Offline] Your λ handler 'app' timed out after 30000ms.

If you want to use a local PG setup, I have included the SQL file as well.

@elhigu
Member

elhigu commented Oct 6, 2019

Mikaels-MacBook-Pro:sample-serverless-app mikaelle$ DEBUG=knex:* node 
> const db = require('./src/db/knex')
undefined
> db('users').then(res => console.log(res)).catch(e => console.log(e))
Promise [Object] {
  _bitField: 0,
  _fulfillmentHandler0: undefined,
  _rejectionHandler0: undefined,
  _promise0: undefined,
  _receiver0: undefined }
> AFTER CREATE DB CONNECTION
AFTER CREATE DB CONNECTION - 2
  knex:client acquired connection from pool: __knexUid1 +0ms
  knex:query select * from "users" undefined +0ms
  knex:bindings [] undefined +0ms
  knex:client releasing connection to pool: __knexUid1 +6ms
{ error: relation "users" does not exist
    at Connection.parseE (/Users/mikaelle/Projects/Vincit/sample-serverless-app/node_modules/pg/lib/connection.js:604:11)
    at Connection.parseMessage (/Users/mikaelle/Projects/Vincit/sample-serverless-app/node_modules/pg/lib/connection.js:401:19)
    at Socket.<anonymous> (/Users/mikaelle/Projects/Vincit/sample-serverless-app/node_modules/pg/lib/connection.js:121:22)
    at Socket.emit (events.js:198:13)
    at Socket.EventEmitter.emit (domain.js:466:23)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
    at Socket.Readable.push (_stream_readable.js:224:10)
    at TCP.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)
  name: 'error',
  length: 104,
  severity: 'ERROR',
  code: '42P01',
  detail: undefined,
  hint: undefined,
  position: '15',
  internalPosition: undefined,
  internalQuery: undefined,
  where: undefined,
  schema: undefined,
  table: undefined,
  column: undefined,
  dataType: undefined,
  constraint: undefined,
  file: 'parse_relation.c',
  line: '1180',
  routine: 'parserOpenTable' }
(node:34788) [DEP0097] DeprecationWarning: Using a domain property in MakeCallback is deprecated. Use the async_context variant of MakeCallback or the AsyncResource class instead.
> db.select(1).then(res => console.log(res)).catch(e => console.log(e))
Promise [Object] {
  _bitField: 0,
  _fulfillmentHandler0: undefined,
  _rejectionHandler0: undefined,
  _promise0: undefined,
  _receiver0: undefined }
>   knex:client acquired connection from pool: __knexUid1 +5s
  knex:query select 1 undefined +5s
  knex:bindings [] undefined +5s
  knex:client releasing connection to pool: __knexUid1 +3ms
[ { '?column?': 1 } ]

Looks like knex is working just as expected... I'll look at your users case a bit more to see if I can easily spot where your error is. At least from the knex debug output it did look like a transaction is left open and never ended...

@elhigu elhigu closed this as completed Oct 6, 2019
@elhigu
Member

elhigu commented Oct 6, 2019

Something seems to be wrong in your middlewares. They always start a new transaction and then leave it open after the request ends. Even when I just query http://localhost:3000/, the debug output shows that a new transaction is created and never ended.

I was running the app with the DEBUG=knex:* environment variable set to see what knex is doing.

@elhigu
Member

elhigu commented Oct 7, 2019

Yep... in your app you have a pool size of 1, and then in controllers/users.js you create a globally abandoned transaction that is never resolved, which permanently reserves the single connection in the pool. When you then try to get another connection for another transaction, you cannot, because only one connection was available.
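
To make the failure mode concrete, here is a minimal sketch of the anti-pattern described above (the file contents are illustrative, not the actual controllers/users.js): a transaction started at module load time whose promise never settles holds the pool's only connection, so every later knex.transaction() call just waits until it times out.

const knex = require('../db/knex');

// Anti-pattern: this runs once when the module is first required.
// The handler never resolves, so knex never commits or rolls back,
// and with pool { min: 1, max: 1 } the single connection stays reserved.
knex.transaction(trx => {
    return new Promise(() => {}); // never settles
});

// Any request handler that now needs a connection will hang and
// eventually hit the acquire/handler timeout:
async function listUsers() {
    return knex.transaction(trx => trx.select('*').from('users'));
}

The fix is to start transactions only inside request handlers and to let the promise returned to knex.transaction settle, so knex can commit (or roll back) and release the connection back to the pool, as in the cleaned-up router shown earlier.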

If anyone is having real problems with serverless-offline / AWS Lambda, please open a new issue with a complete single-file reproduction case. I would have spotted that problem directly from the reproduction code.

@deepakgonda
Author

deepakgonda commented Oct 7, 2019

@elhigu, I have now removed all middlewares and other clutter from the code; now we only have one route file and one controller file with a single method.

And yes!!! After removing the middlewares from the code, it is working again. Now I am debugging what was wrong in my middleware.

Please check.

And if possible, please help us understand the best way to use knex in serverless, because I think that in the serverless case, after a request is finished the function becomes cold and cannot reuse a connection from the pool when a new request comes in.

@elhigu
Member

elhigu commented Oct 7, 2019

And if possible, please help us understand the best way to use knex in serverless, because I think that in the serverless case, after a request is finished the function becomes cold and cannot reuse a connection from the pool when a new request comes in.

You should ask the Serverless people what the best way is to use databases with their system.

And yes!!! After removing the middlewares from the code, it is working again. Now I am debugging what was wrong in my middleware.

Please check.

Nope, I have already spent an hour too much of my time on this. So good luck, and try to get help from Stack Overflow or somewhere else. The Knex issue tracker is for bug / feature discussions.

@deniapps

deniapps commented Aug 22, 2020

I have been looking into this issue for a whole week and would like to share what I have found. The key is:

although Lambda functions do not share a connection pool, the queries in the same function do share a connection pool.

So we should take advantage of connection pooling, but since Lambda functions do not share a connection pool, we should set min to 0 so that idle connections can be destroyed after the period defined by idleTimeoutMillis. If we set min: 1, max: 1, then due to how Lambda functions work, one connection is left behind for each Lambda invocation, and scaling Lambda functions will overwhelm the RDS instance.

So try this setting:

pool: {
    min: 0, // very important!!!
    max: 10, 
    createTimeoutMillis: 30000,
    acquireTimeoutMillis: 30000,
    idleTimeoutMillis: 1000, //since no share, set this to a small number
    reapIntervalMillis: 1000,
    createRetryIntervalMillis: 100,
    propagateCreateError: true
  },

You could adjust max depending on how many queries you have in one Lambda function call. You can also adjust idleTimeoutMillis, which controls how long an idle connection sits before it is destroyed.

Lastly, we can now use AWS RDS Proxy, which allows applications to pool and share connections established with the database. More information can be found in the article I wrote: https://deniapps.com/blog/mysql-rds-aborted-connection-error-aws-lambda
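
Putting the settings above together, here is a minimal sketch of a Lambda-friendly db/knex.js (the environment variable names are placeholders): the knex instance is created once per container, outside the handler, so warm invocations reuse it, while min: 0 and a short idleTimeoutMillis let the pool drain between invocations.

// db/knex.js -- created once per Lambda container and reused across warm invocations
module.exports = require('knex')({
    client: 'pg',
    connection: {
        host: process.env.DB_HOST,   // placeholder env vars
        user: process.env.DB_USER,
        password: process.env.DB_PASS,
        database: process.env.DB_NAME
    },
    pool: {
        min: 0,                      // very important: let idle connections be destroyed
        max: 10,
        createTimeoutMillis: 30000,
        acquireTimeoutMillis: 30000,
        idleTimeoutMillis: 1000,     // drop idle connections quickly between invocations
        reapIntervalMillis: 1000,
        createRetryIntervalMillis: 100,
        propagateCreateError: true
    }
});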

@lastmjs

lastmjs commented Sep 3, 2020

@deniapps Hey, I'm attempting to try out your solution. Where are all of these settings documented? And what version of Knex are you using? I see the typescript types for some of these settings, but not all of them. I also don't see any information about these settings in the documentation, so it's hard for me to reason about what each setting does. Any help would be great! Thanks

@deniapps

deniapps commented Sep 3, 2020

@deniapps Hey, I'm attempting to try out your solution. Where are all of these settings documented? And what version of Knex are you using? I see the typescript types for some of these settings, but not all of them. I also don't see any information about these settings in the documentation, so it's hard for me to reason about what each setting does. Any help would be great! Thanks

KnexJS uses tarn.js to manage the connection pool, so you should check those settings at https://github.com/vincit/tarn.js
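
For context, knex hands the pool options through to tarn more or less as-is, so the option names in the settings above are the ones documented in the tarn README. Here is a rough sketch of an equivalent standalone tarn pool (the create/destroy callbacks are illustrative, not what knex uses verbatim):

const { Pool } = require('tarn');
const { Client } = require('pg');

const pool = new Pool({
    // tarn requires create/destroy; knex wires these up to the database driver internally.
    create: async () => {
        const client = new Client({ connectionString: process.env.DATABASE_URL });
        await client.connect();
        return client;
    },
    destroy: (client) => client.end(),
    // The same option names you put under `pool` in the knex config:
    min: 0,
    max: 10,
    acquireTimeoutMillis: 30000,
    createTimeoutMillis: 30000,
    idleTimeoutMillis: 1000,
    reapIntervalMillis: 1000,
    createRetryIntervalMillis: 100,
    propagateCreateError: true
});

// Usage: const client = await pool.acquire().promise; ... pool.release(client);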

@lastmjs

lastmjs commented Sep 3, 2020

Oh perfect, that really helps. And what version of Knex are you using?

@deniapps

deniapps commented Sep 3, 2020

Oh perfect, that really helps. And what version of Knex are you using?

I use knex@0.20.13, but it should be the same for the latest version (I will upgrade soon). This is not a problem with Knex; it's a problem of AWS Lambda with Serverless, which does not share the connection pool among Lambda functions. If you also use AWS Lambda with RDS, you may want to check RDS Proxy, which allows you to share the database connection pool and improve database efficiency. I also wrote an article about that: https://deniapps.com/blog/setup-aws-lambda-to-use-amazon-rds-proxy.

@eugene-kim

@deniapps thank you for the WONDERFUL article! This might be exactly what I've been looking for. Will be trying this out shortly.

@lastmjs did you happen to get this working for you? I've been looking for ways to get knex, Amazon RDS Proxy, and Lambdas to play nicely, and it seems as if you were in the same boat.

@eugene-kim

@deniapps how did you get around connection pinning? Knex will issue prepared statements and as a result, RDS Proxy will pin the client connection to a specific DB connection, reducing the performance improvements that RDS Proxy would otherwise provide.

Reference: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy.html

@deniapps

deniapps commented Oct 3, 2020

@deniapps how did you get around connection pinning? Knex will issue prepared statements and as a result, RDS Proxy will pin the client connection to a specific DB connection, reducing the performance improvements that RDS Proxy would otherwise provide.

Reference: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy.html

Hi @eugene-kim, my understanding is that RDS Proxy is completely transparent to the application code. What we did was make the DB connection to the proxy's DNS endpoint instead of the standard RDS address. So we don't make any other changes except increasing the pool max/min.
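
In other words, the only change on the knex side is the connection host. A minimal sketch, assuming a hypothetical RDS_PROXY_ENDPOINT environment variable that holds the proxy's DNS name:

module.exports = require('knex')({
    client: 'pg',
    connection: {
        host: process.env.RDS_PROXY_ENDPOINT, // proxy endpoint instead of the RDS instance address
        user: process.env.DB_USER,
        password: process.env.DB_PASS,
        database: process.env.DB_NAME,
        ssl: { rejectUnauthorized: false }    // only if the proxy is configured to require TLS; tighten cert handling as appropriate
    },
    pool: { min: 0, max: 10, idleTimeoutMillis: 1000 }
});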

@eugene-kim

RDS Proxy is completely transparent to the application code. What we did was make the DB connection to the proxy's DNS endpoint instead of the standard RDS address. So we don't make any other changes except increasing the pool max/min.

Yup, I'm with you there. The problem, however, is that given that Knex uses prepared statements, RDS Proxy will detect this and pin the connection between client and DB, which prevents other client instances from taking advantage of this connection.

AWS Documentation:

Your connections to the proxy can enter a state known as pinning. When a connection is pinned, each later transaction uses the same underlying database connection until the session ends. Other client connections also can't reuse that database connection until the session ends. The session ends when the client connection is dropped.

I imagine this will nullify many of the benefits that having a dedicated DB proxy is supposed to provide in the first place. Have you seen a lot of connection pinning happening, @deniapps? The CloudWatch metric DatabaseConnectionsCurrentlySessionPinned should be useful here.
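
If it helps, here is a rough sketch of pulling that metric with the AWS SDK for JavaScript (v2). I am assuming RDS Proxy publishes it under the AWS/RDS namespace with a ProxyName dimension, so double-check the namespace and dimension names against your CloudWatch console:

const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch({ region: process.env.AWS_REGION });

async function pinnedConnectionStats(proxyName) {
    const now = new Date();
    const data = await cloudwatch.getMetricStatistics({
        Namespace: 'AWS/RDS',                                  // assumed namespace for proxy metrics
        MetricName: 'DatabaseConnectionsCurrentlySessionPinned',
        Dimensions: [{ Name: 'ProxyName', Value: proxyName }], // assumed dimension name
        StartTime: new Date(now.getTime() - 60 * 60 * 1000),   // last hour
        EndTime: now,
        Period: 300,
        Statistics: ['Average', 'Maximum']
    }).promise();
    return data.Datapoints;
}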

@elhigu
Member

elhigu commented Oct 5, 2020

I would guess that pretty much every connection allocated by the knex pool of a Lambda process is pinned, especially if there is an afterCreate (https://knexjs.org/#Installation-pooling-afterCreate) configuration function set which sets some session variables (timezone etc.) for the DB session.

Though I don't see how that nullifies the benefits that much, since RDS Proxy can still keep those connections up and let Lambda instances connect to the DB rapidly (I would imagine that connecting from Lambda -> proxy is made much faster than connecting to the DB directly). Also, just using transactions would directly cause automatic pinning, which is a pretty common case when writing SQL.

@deniapps

deniapps commented Oct 5, 2020

@eugene-kim The "pinning" is something new to me. I checked CloudWatch metric, there isn't too many DatabaseConnectionsCurrentlySessionPinned compared to DatabaseConnections:

I am still trying to understand this, so please let me know if you find anything.
Thanks!

@AtherBilal

AtherBilal commented Jul 1, 2021

@deniapps Can you share a little bit more about how you are able to achieve those benchmarks? I'm also using knex with RDS Proxy, and this is what my configuration looks like at the moment:

pool: {
    min: 5,
    max: 30,
    createTimeoutMillis: 30000,
    acquireTimeoutMillis: 10000,
    idleTimeoutMillis: 30000,
    reapIntervalMillis: 1000,
    createRetryIntervalMillis: 100
}

