Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Official benchmark of this repo comparing other PL #301

Closed
ppKrauss opened this issue Aug 13, 2018 · 18 comments
Closed

Official benchmark of this repo comparing other PL #301

ppKrauss opened this issue Aug 13, 2018 · 18 comments

Comments

@ppKrauss
Copy link

Hi, there are a "reliable source" or sister-project where we can check basic benchmark results comparing plv8 with PL/Python, PL/Perl or PL/pgSQL?

@JerrySievert
Copy link
Member

we don't provide benchmarks, and performance is going to vary based on what exactly you're trying to accomplish.

if you're trying to choose between languages based on speed alone, you might as well write whatever you need in c as an extension. otherwise, use your strengths and develop against whichever language makes the most sense for you.

any benchmarks that I've been able to find have been years old, and v8 (and thus plv8) have made huge improvements in speed since then.

what are you trying to do? maybe knowing that will help me be able to give you a better answer.

@ppKrauss
Copy link
Author

Hi @JerrySievert , thanks the explanations (!). Well, I try to check if the dream of the unified Javascript development stack is possible in nowdays, for reliable software, for day-by-day serious production...

Example: PostgreSQL have problems with filesystem, and I suppose that NodeJS functions offer better exprience for programmers (no permission barriers, etc.)... But we need peformance of NodeJS-filesystem to be comparable with foregin table with file_fdw.

@JerrySievert
Copy link
Member

plv8 is a trusted extension, meaning that it does not have direct access to the network, nor the filesystem. if that's what you're trying to accomplish, it's not going to work.

the good news is that you can have tests, see https://github.com/jerrysievert/equinox for an example.

@JerrySievert
Copy link
Member

fyi, the only relationship between plv8 and node.js is that they both embed the v8 javascript engine, and both happen to be javascript.

@ppKrauss
Copy link
Author

Oops, there are no way to use (call lib) with plv8? Is possible to do a fork of this project to use something (partial) of NodeJS, or it is a PostgreSQL constraint?

@JerrySievert
Copy link
Member

first, you need to understand the differences between trusted and untrusted languages in postgres. trusted languages are sandboxed to not allow any access to the filesystem or network. this is by design, and not an afterthought.

untrusted languages have full access to the machine and network running as the user that postgres is run under. you can see a list of trusted vs untrusted languages on the postgres wiki: https://wiki.postgresql.org/wiki/PL_Matrix - hosted providers, such as amazon's RDS, or Google or Microsoft's shared postgres hosting specifically do not support untrusted languages.

some languages are designed to be able to be both, plv8 is not. there is a fork of plv8 to be built as an untrusted language, but it is not complete, as there has not been much (any?) call for it.

if you just want to use modules from npm, there are a few projects that allow you to do that, but that are not affiliated with plv8 itself. you can find a few by [https://www.npmjs.com/search?q=plv8](searching npm for plv8).

@brandonros
Copy link

I'd be curious to see rudimentary operations per second on normal benchmarks like, fibonacci?

Obviously nobody implementing a performant fibonacci application would stand up a Postgre instance and add this extension to achieve it. However, comparing this extension's performance to say, a similar node.js app would be pretty telling on overhead.

@JerrySievert
Copy link
Member

ostensibly it should be pretty much the same speed as node - at least on comparable versions of v8 (I've tried to track v8 versions with node where it made sense).

the only differences that I can think of would be startup time - plv8 doesn't use snapshots, and when crossing the c++/javascript membrane - there's an additional layer of indirection because of type conversions.

in regards to startup time, this will be a hit at the beginning of a session/connection when using your first stored procedure for that session, but will use the same instance of the interpreter for the rest of the connection as long as the role remains the same.

as far as the membrane goes, that occurs right before entering the function and at exit, as well as SPI (queries).

@brandonros
Copy link

This project is very interesting and it is reassuring to hear your response that the overhead should not be too high.

Out of curiosity, what did you write this project for/what is it used for?

@JerrySievert
Copy link
Member

I'm not the original author, I'm just the maintainer.

as for what it's used for, it provides javascript as a language for stored procedures and triggers inside of postgres. it's used for quite a lot of things, and is one of the few language plugins available in RDS and Microsoft's hosted postgres.

I've used it extensibly in a lot of projects where it made sense, probably the most interesting was doing spatial conversions between geojson and esri's spatial json format.

@brandonros
Copy link

docker run -d --name postgres clkao/postgres-plv8:latest
docker exec -it postgres bash -c 'psql -U postgres -c "CREATE EXTENSION plv8;"'
docker run --rm -p 3000:3000 -e PGRST_DB_URI="postgres://postgres@postgres/postgres" -e PGRST_DB_ANON_ROLE="postgres" -e PGRST_DB_SCHEMA="public" --link postgres:postgres postgrest/postgrest

docker exec -it postgres bash -c "psql -U postgres"

CREATE FUNCTION plv8_test(input TEXT) RETURNS JSON AS $$
  return {
    output: input
  };
$$ LANGUAGE plv8 IMMUTABLE STRICT;

npm install -g loadtest
loadtest -c 10 --rps 200 http://localhost:3000/rpc/plv8_test?input=Hello

$ loadtest -c 10 --rps 200 http://localhost:3000/rpc/plv8_test?input=Hello
[Tue Oct 16 2018 22:55:44 GMT-0400 (EDT)] INFO Requests: 0, requests per second: 0, mean latency: 0 ms
[Tue Oct 16 2018 22:55:49 GMT-0400 (EDT)] INFO Requests: 899, requests per second: 180, mean latency: 18.9 ms
[Tue Oct 16 2018 22:55:54 GMT-0400 (EDT)] INFO Requests: 1900, requests per second: 200, mean latency: 486 ms
[Tue Oct 16 2018 22:55:54 GMT-0400 (EDT)] INFO Errors: 263, accumulated errors: 263, 13.8% of total requests
[Tue Oct 16 2018 22:55:59 GMT-0400 (EDT)] INFO Requests: 2760, requests per second: 172, mean latency: 14 ms
[Tue Oct 16 2018 22:55:59 GMT-0400 (EDT)] INFO Errors: 0, accumulated errors: 263, 9.5% of total requests
[Tue Oct 16 2018 22:56:04 GMT-0400 (EDT)] INFO Requests: 3825, requests per second: 213, mean latency: 144.7 ms
[Tue Oct 16 2018 22:56:04 GMT-0400 (EDT)] INFO Errors: 34, accumulated errors: 297, 7.8% of total requests
[Tue Oct 16 2018 22:56:09 GMT-0400 (EDT)] INFO Requests: 4826, requests per second: 200, mean latency: 36.2 ms
[Tue Oct 16 2018 22:56:09 GMT-0400 (EDT)] INFO Errors: 0, accumulated errors: 297, 6.2% of total requests
[Tue Oct 16 2018 22:56:14 GMT-0400 (EDT)] INFO Requests: 5816, requests per second: 198, mean latency: 14.5 ms
[Tue Oct 16 2018 22:56:14 GMT-0400 (EDT)] INFO Errors: 0, accumulated errors: 297, 5.1% of total requests
[Tue Oct 16 2018 22:56:19 GMT-0400 (EDT)] INFO Requests: 6826, requests per second: 202, mean latency: 57.2 ms

versus

var express = require('express')

var app = express()

app.get('/rpc/plv8_test', function(req, res) {
  console.log('Request');

  res.send({
    output: req.query.input
  });
});

app.listen(3000, function() {
  console.log('Listening...');
});
Brandons-iMac:~ brandonros$ loadtest -c 10 --rps 200 http://localhost:3000/rpc/plv8_test?input=Hello
[Tue Oct 16 2018 22:53:22 GMT-0400 (EDT)] INFO Requests: 0, requests per second: 0, mean latency: 0 ms
[Tue Oct 16 2018 22:53:27 GMT-0400 (EDT)] INFO Requests: 924, requests per second: 185, mean latency: 5 ms
[Tue Oct 16 2018 22:53:32 GMT-0400 (EDT)] INFO Requests: 1922, requests per second: 200, mean latency: 4.4 ms
[Tue Oct 16 2018 22:53:37 GMT-0400 (EDT)] INFO Requests: 2925, requests per second: 201, mean latency: 4.2 ms
[Tue Oct 16 2018 22:53:42 GMT-0400 (EDT)] INFO Requests: 3916, requests per second: 198, mean latency: 4.4 ms
[Tue Oct 16 2018 22:53:47 GMT-0400 (EDT)] INFO Requests: 4925, requests per second: 202, mean latency: 3.7 ms
[Tue Oct 16 2018 22:53:52 GMT-0400 (EDT)] INFO Requests: 5925, requests per second: 200, mean latency: 3.3 ms
[Tue Oct 16 2018 22:53:57 GMT-0400 (EDT)] INFO Requests: 6925, requests per second: 200, mean latency: 3.1 ms
[Tue Oct 16 2018 22:54:02 GMT-0400 (EDT)] INFO Requests: 7925, requests per second: 200, mean latency: 3.7 ms

I guess this + Postgrest really isn't viable?

@JerrySievert
Copy link
Member

there are so many variables in that test that it'd be hard to point a finger at any specific part of it as the limitation.

if you're not pooling a connection, or destroying connections at any point, there could be a huge cost in the connection (there's a reason why connection pooling is pretty much a requirement for any large Postgres implementation), in the instantiation of the v8 engine if a role is set or a new connection is made, etc.

a much better test of plv8 speed specifically would be to isolate each part of the test:

[local] jerry@test=# select plv8_version();
┌──────────────┐
│ plv8_version │
├──────────────┤
│ 3.0alpha     │
└──────────────┘
(1 row)

Time: 6.533 ms
[local] jerry@test=# DO $$
DECLARE
   counter INTEGER := 0 ;
BEGIN

LOOP
 EXIT WHEN counter = 10000 ;
 counter := counter + 1 ;
 PERFORM plv8_test('123456');
END LOOP ;
END ;
$$ LANGUAGE plpgsql;
DO
Time: 20.652 ms
CREATE OR REPLACE FUNCTION fibonacci (n INTEGER) 
 RETURNS INTEGER AS $$ 
DECLARE
   counter INTEGER := 0 ; 
   i INTEGER := 0 ; 
   j INTEGER := 1 ;
BEGIN
 
 IF (n < 1) THEN
 RETURN 0 ;
 END IF; 
 
 LOOP 
 EXIT WHEN counter = n ; 
 counter := counter + 1 ; 
 SELECT j, i + j INTO i, j ;
 END LOOP ; 
 
 RETURN i ;
END ; 
$$ LANGUAGE plpgsql;
[local] jerry@test=# select fibonacci(31);
┌───────────┐
│ fibonacci │
├───────────┤
│   1346269 │
└───────────┘
(1 row)

Time: 0.667 ms
CREATE FUNCTION plv8_fibonacci (n INTEGER) RETURNS INTEGER AS $$
function fibonacci(num, memo) {
  memo = memo || {};

  if (memo[num]) return memo[num];
  if (num <= 1) return 1;

  return memo[num] = fibonacci(num - 1, memo) + fibonacci(num - 2, memo);
}

return fibonacci(30);
$$ LANGUAGE plv8;
[local] jerry@test=# select plv8_fibonacci(30);
┌────────────────┐
│ plv8_fibonacci │
├────────────────┤
│        1346269 │
└────────────────┘
(1 row)

Time: 0.391 ms
function fibonacci(num, memo) {
  memo = memo || {};

  if (memo[num]) return memo[num];
  if (num <= 1) return 1;

  return memo[num] = fibonacci(num - 1, memo) + fibonacci(num - 2, memo);
}

console.log(fibonacci(30));
 ~  node --version
v8.9.4
 ~  time node /tmp/fib.js
1346269
        0.09 real         0.06 user         0.01 sys

@JerrySievert
Copy link
Member

and just for fun:

CREATE FUNCTION plv8_fibonacci2 (num INTEGER, memo JSON) RETURNS INTEGER AS $$
  memo = memo || {};

  if (memo[num]) return memo[num];
  if (num <= 1) return 1;

  let q = plv8.execute('SELECT plv8_fibonacci2($1 - 1, $2) + plv8_fibonacci2($1 - 2, $2)', [num, memo]);

  return q[0]['?column?'];
$$ LANGUAGE plv8;

which is exceedingly slow:

[local] jerry@test=# select plv8_fibonacci2(30, NULL);
┌─────────────────┐
│ plv8_fibonacci2 │
├─────────────────┤
│         1346269 │
└─────────────────┘
(1 row)

Time: 66144.019 ms (01:06.144)

@JerrySievert
Copy link
Member

the test was accidentally set to do 30 regardless, but the timing is the same fixed:

CREATE OR REPLACE FUNCTION plv8_fibonacci (n INTEGER) RETURNS INTEGER AS $$
function fibonacci(num, memo) {
  memo = memo || {};

  if (memo[num]) return memo[num];
  if (num <= 1) return 1;

  return memo[num] = fibonacci(num - 1, memo) + fibonacci(num - 2, memo);
}

return fibonacci(n);
$$ LANGUAGE plv8;
[local] jerry@test=# select plv8_fibonacci(30);
┌────────────────┐
│ plv8_fibonacci │
├────────────────┤
│        1346269 │
└────────────────┘
(1 row)

Time: 0.233 ms

@brandonros
Copy link

0.391 ms for the plv8 fibonnacci implementation versus 0.09ms for the node.js version? What could be the cause for that overhead?

@JerrySievert
Copy link
Member

JerrySievert commented Oct 17, 2018

these are in seconds, not milliseconds:

 ~  time node /tmp/fib.js
1346269
        0.09 real         0.06 user         0.01 sys

but, that said, there are differences in environment between a command-line program and a database are pretty large, including external startup data, transition between the c++/javascript layers, query parsing for the select statement, all sorts of things.

you're welcome to audit the code, and I'm happy to accept pull requests that will help improve the project, within the limitations of this being a trusted language extension, but the underlying engine here is still v8, and I think that you will find that beyond the specific choice of not using external startup data, that the javascript code executes the same in both environments.

@awbacker
Copy link

awbacker commented Sep 15, 2020

Since this is the first performance hit on google, I thought I'd add my thoughts. I haven't looked in depth, but I was looking for something faster for a task: generate 100K random strings, following some simple rules (with a variable length, but 12 in this test).

  • Postgres 12, PLv8 2.3.13

  • pgpl: ~3600+ ms

  • plv8: ~450 ms (8x)

  • cleaner code

I re-wrote in plv8 using the first reasonable answer (slightly tweaked) found on copy-paste heaven. No attempt at optimization, not that it seems much could be done. Same for the pg/pl version.

It seems that v8 can make a huge difference, at least depending on what you are doing. If I can get it deployed everywhere then I'll probably switch over to it for at least some things. I can't imagine a simple trigger would be much faster, but maybe if its cleaner reading...

CREATE OR REPLACE FUNCTION gen_rand_str_plv8(length int) returns text
    language plv8 volatile parallel safe
AS $$
    var charset = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    var result  = charset.charAt(Math.floor(Math.random() * 26));
    for (var i = 0; i < length - 1; i++ ) {
      result += charset.charAt(Math.floor(Math.random() * charset.length));
    }
    return result;
$$;
CREATE OR REPLACE FUNCTION gen_rand_str_pgpl(length int) RETURNS text
    language plpgsql volatile parallel safe
AS
$$
    charset constant char[] := '{A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,0,1,2,3,4,5,6,7,8,9}';
    charlen constant int    := array_length(charset, 1);
    result           text   := charset[floor(random() * 26):: int + 1];
BEGIN
    loop
        exit when length(result) = length;
        -- generate_series(1, length-1) removes loop, but >2x perf hit
        result := result || charset[floor(random() * charlen):: int + 1];
    end loop;
    return result;
END;
$$;

And a simple query to test it, looking for duplicates. Similar to how it would actually be used:

with src as (
    select gen_rand_str_plv8(12) as x from generate_series(1, 100000) s
)
select x, count(*) from src group by x having count(*) > 1;

@luss
Copy link
Contributor

luss commented Sep 16, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants