Skip to content

Added out_pgsql plugin to support output to PostgreSQL#1931

Merged
edsiper merged 1 commit into
fluent:masterfrom
sxd:master
Mar 18, 2020
Merged

Added out_pgsql plugin to support output to PostgreSQL#1931
edsiper merged 1 commit into
fluent:masterfrom
sxd:master

Conversation

@sxd
Copy link
Copy Markdown
Member

@sxd sxd commented Feb 4, 2020

this pull request refers to issue #1739

Signed-off-by: Jonathan Gonzalez V jonathan.gonzalez@2ndquadrant.com

@sxd sxd mentioned this pull request Feb 4, 2020
Comment thread plugins/out_pgsql/pgsql.c
Comment thread plugins/out_pgsql/pgsql.c
Comment thread plugins/out_pgsql/pgsql.c
Comment thread plugins/out_pgsql/pgsql.c Outdated
@PettitWesley
Copy link
Copy Markdown
Contributor

@sxd My understanding of the contributing guidelines is that the commit message should be something like "out_ pgsql: new plugin to support output to PostgreSQL"

See: https://github.com/fluent/fluent-bit/blob/master/CONTRIBUTING.md#commit-changes

@sxd
Copy link
Copy Markdown
Member Author

sxd commented Feb 6, 2020

@sxd My understanding of the contributing guidelines is that the commit message should be something like "out_ pgsql: new plugin to support output to PostgreSQL"

See: https://github.com/fluent/fluent-bit/blob/master/CONTRIBUTING.md#commit-changes

yeah! that's correct that was the first commit in the base source but forgot the rule here, thanks for the remind =)

Thanks!

Comment thread plugins/out_pgsql/pgsql.c Outdated
@edsiper
Copy link
Copy Markdown
Member

edsiper commented Feb 6, 2020

@sxd I've submitted another change request

@sxd
Copy link
Copy Markdown
Member Author

sxd commented Feb 6, 2020

@sxd I've submitted another change request

thank you! working on it

Copy link
Copy Markdown
Member

@edsiper edsiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some license change requests

Comment thread plugins/out_pgsql/pgsql.c Outdated
Comment thread plugins/out_pgsql/pgsql.h Outdated
Comment thread plugins/out_pgsql/pgsql.h Outdated
Comment thread plugins/out_pgsql/pgsql.c
Comment thread plugins/out_pgsql/pgsql.c Outdated
@edsiper
Copy link
Copy Markdown
Member

edsiper commented Feb 6, 2020

@PettitWesley comments?

Comment thread plugins/out_pgsql/pgsql.c Outdated
@PettitWesley
Copy link
Copy Markdown
Contributor

(This comment is mostly for my benefit so I can look back at it later)

To get this to build on Amazon Linux (context: I maintain aws-for-fluent-bit which is built on amazon linux), I had to create a symlink for the postgre header file:

sudo ln -s /usr/include/libpq-fe.h /usr/include/postgresql/libpq-fe.h

@PettitWesley
Copy link
Copy Markdown
Contributor

@sxd I tested with the following config:

[SERVICE]
    Log_Level debug

[INPUT]
    Name  cpu
    Tag   cpu

[OUTPUT]
    Name  pgsql
    Match *
    User fluent
    Password test

And Fluent Bit crashed with:

[2020/02/06 23:46:44] [ info] [out_pgsql] host=127.0.0.1 port=5432 dbname=fluentbit ...
[2020/02/06 23:46:44] [ info] [out_pgsql] host=127.0.0.1 port=5432 dbname=fluentbit OK
[2020/02/06 23:46:44] [ info] [out_pgsql] we check that the table "" exists, if not we create it
double free or corruption (out)[2020/02/06 23:46:44] [error] [out_pgsql] ERROR:  zero-length delimited identifier at or near """"
LINE 1: CREATE TABLE IF NOT EXISTS "" (data json);
                                   ^


Aborted

Note the double free or corruption (out)

@sxd
Copy link
Copy Markdown
Member Author

sxd commented Feb 7, 2020

@sxd I tested with the following config:

[SERVICE]
    Log_Level debug

[INPUT]
    Name  cpu
    Tag   cpu

[OUTPUT]
    Name  pgsql
    Match *
    User fluent
    Password test

And Fluent Bit crashed with:

[2020/02/06 23:46:44] [ info] [out_pgsql] host=127.0.0.1 port=5432 dbname=fluentbit ...
[2020/02/06 23:46:44] [ info] [out_pgsql] host=127.0.0.1 port=5432 dbname=fluentbit OK
[2020/02/06 23:46:44] [ info] [out_pgsql] we check that the table "" exists, if not we create it
double free or corruption (out)[2020/02/06 23:46:44] [error] [out_pgsql] ERROR:  zero-length delimited identifier at or near """"
LINE 1: CREATE TABLE IF NOT EXISTS "" (data json);
                                   ^


Aborted

Note the double free or corruption (out)

I'm checking this, give me a few minutes

@sxd
Copy link
Copy Markdown
Member Author

sxd commented Feb 7, 2020

@PettitWesley ok checking again looks like the name "Table" was taken and read somewhere before, don't know where but it's not the first time, had the same issue with other names before. I'm changing the name of the conf variable now

@sxd
Copy link
Copy Markdown
Member Author

sxd commented Feb 7, 2020

@PettitWesley look into the solution, it was more related to flb_sds_create() behavior rather than anything else =)

Comment thread CMakeLists.txt
@edsiper
Copy link
Copy Markdown
Member

edsiper commented Feb 7, 2020 via email

Comment thread plugins/out_pgsql/pgsql.h Outdated
@PettitWesley
Copy link
Copy Markdown
Contributor

@sxd So I tested it a few times, and it worked at first. Then I think the client ran into a connection issue (or something) which led the code to crash:

[2020/02/07 04:05:28] [debug] [storage] [cio stream] new stream registered: cpu.0
[2020/02/07 04:05:28] [ info] [storage] initializing...
[2020/02/07 04:05:28] [ info] [storage] in-memory
[2020/02/07 04:05:28] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/02/07 04:05:28] [ info] [engine] started (pid=21081)
[2020/02/07 04:05:28] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/02/07 04:05:28] [ info] [out_pgsql] host=127.0.0.1 port=5432 dbname=fluentbit ...
double free or corruption (out)
Aborted

My config is the same as last time.

Comment thread plugins/out_pgsql/pgsql.c Outdated
Comment thread plugins/out_pgsql/pgsql.c
@PettitWesley
Copy link
Copy Markdown
Contributor

I'm not sure what causes the current double free or corruption, but I suspect it has something to do with init failing (may be connection issues) and then clean up freeing pointers that are null. Something like that.

When Fluent Bit crashes, I believe you can not necessarily count on log messages being printed- if there was a log line right before the crash, it might not be printed.

For debugging purposes, you might be able to get around this by adding calls to sleep in the code- logging happens in a separate thread, so it you give it extra time all statements should be printed. I haven't tried this technique myself but I suspect it might work.

Comment thread plugins/out_pgsql/pgsql.c
@edsiper
Copy link
Copy Markdown
Member

edsiper commented Feb 7, 2020

let me know if we are ready to grab a beer :)

Comment thread plugins/out_pgsql/pgsql.c
Comment thread plugins/out_pgsql/pgsql.c
Comment thread plugins/out_pgsql/pgsql.c
Comment thread plugins/out_pgsql/pgsql.c
Comment thread plugins/out_pgsql/pgsql.c
Copy link
Copy Markdown
Contributor

@PettitWesley PettitWesley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sxd A bunch of suggestions, and a few required changes involving memory allocations.

I tested it again; Valgrind did not turn up anything this time.

I'll note though that I saw a lot of the same warning:

[2020/03/04 08:24:21] [ warn] [out_pgsql] another command is already in progress

[2020/03/04 08:24:21] [ warn] [out_pgsql] another command is already in progress

[2020/03/04 08:24:21] [ warn] [out_pgsql] another command is already in progress

@sxd
Copy link
Copy Markdown
Member Author

sxd commented Mar 6, 2020

@sxd A bunch of suggestions, and a few required changes involving memory allocations.

I tested it again; Valgrind did not turn up anything this time.

I'll note though that I saw a lot of the same warning:

[2020/03/04 08:24:21] [ warn] [out_pgsql] another command is already in progress

[2020/03/04 08:24:21] [ warn] [out_pgsql] another command is already in progress

[2020/03/04 08:24:21] [ warn] [out_pgsql] another command is already in progress

hoo yeah that's because there must be a command running or pushing that's another issue that we may want to discuss like provide the install of pgbouncer for connection pooling on the postgresql side, that's because we open just one connection and needs to handle all that in only one :(

Comment thread plugins/out_pgsql/pgsql.c Outdated
@edsiper
Copy link
Copy Markdown
Member

edsiper commented Mar 13, 2020

I would like to get this merged for v1.4 release in "experimental mode" and iterate over it after that.

Do you guys think this is ready to go from that perspective ?

@PettitWesley
Copy link
Copy Markdown
Contributor

@edsiper @sxd I am ready to approve it for "experimental mode" provided that the issues raised in my last review are addressed:

@sxd
Copy link
Copy Markdown
Member Author

sxd commented Mar 13, 2020

  • add memory check here: #1931 (comment)
    I change a little bit the logic here since we always need to use the value from PQescapeIdentifier() and added the proper memory check.
  • Add a call to flb_errno() in a number of places
    Done!

@sxd sxd requested a review from edsiper March 13, 2020 13:40
@PettitWesley
Copy link
Copy Markdown
Contributor

@sxd @edsiper The code looks good to me now, but while testing I'm seeing an issue:

Screen Shot 2020-03-14 at 6 30 31 PM

It looks like the single quote ' might be the problem?

@sxd
Copy link
Copy Markdown
Member Author

sxd commented Mar 16, 2020

@sxd @edsiper The code looks good to me now, but while testing I'm seeing an issue:

Screen Shot 2020-03-14 at 6 30 31 PM

It looks like the single quote ' might be the problem?

sorry about that but can't reproduce the issue :/ (last reply was a mistake) can you post the config and all that so I can reproduce? please!

@PettitWesley
Copy link
Copy Markdown
Contributor

@sxd

[SERVICE]
    Log_Level debug

[INPUT]
    Name  cpu
    Tag   cpu

[INPUT]
    Name dummy
    Tag  dummy

[INPUT]
    Name          netif
    Tag           netif
    Interval_Sec  1
    Interval_NSec 0
    Interface     eth0

[INPUT]
    Name   mem
    Tag    memory

[OUTPUT]
    Name  pgsql
    Match *
    User fluent
    Password test

@sxd
Copy link
Copy Markdown
Member Author

sxd commented Mar 17, 2020

@sxd

[SERVICE]
    Log_Level debug

[INPUT]
    Name  cpu
    Tag   cpu

[INPUT]
    Name dummy
    Tag  dummy

[INPUT]
    Name          netif
    Tag           netif
    Interval_Sec  1
    Interval_NSec 0
    Interface     eth0

[INPUT]
    Name   mem
    Tag    memory

[OUTPUT]
    Name  pgsql
    Match *
    User fluent
    Password test

I wasn't able to reproduce this for the last three hours, can you confirm that you can produce it? The case is that if theres any ' missing we need to find where because I'm working on the base that data from msgpack comes escaped

@PettitWesley
Copy link
Copy Markdown
Contributor

@sxd I am still seeing the error.

$ git fetch sxd && git checkout sxd/master
$ make
...
$ $ git rev-parse HEAD
1e262e10046c19e14e623567a264d0b753e6ffbb

This is the part that specifically keeps coming up; the error does not usually appear on the first round of flushes, but happens by the 2nd or third:

ERROR:  unterminated quoted string at or near "'[{"date":1584470022.001518,"eth0.rx.b

Notice the single quote before the [, it seems to be upset about that.

Postgre version:

$ postgres -V
postgres (PostgreSQL) 10.4

Comment thread plugins/out_pgsql/pgsql.c
}

snprintf(query, str_len,
"INSERT INTO %s SELECT * FROM json_array_elements('%s');",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is the line of code that causes the issue I saw?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's the line we changed to insert data... I'm using PostgreSQL 11 I'll try with that version later in the day

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with that version I mean "10.4" sorry not to be clear

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PettitWesley @sxd Btw, I tried out 1.4.1, and ran into this same issue. I think the issue happens when there's a single quote in the json. I'm not super familiar with C or fluent-bit, but according to this stack overflow issue I think the solution would be to replace all single quotes with two single quotes to escape them.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Tucker! sadly the solution means that we may run into a few more issues so I expect to have an answer tomorrow =) will keep you post!

Thanks again @tuckerconnelly

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jonathan, looking forward to the fix--let me know if I can do any testing or help at all.

@PettitWesley
Copy link
Copy Markdown
Contributor

@sxd I have been unable to reproduce this issue if I switch to a new table name (instead of the default "fluentbit" table which I have been using since I first started reviewing this plugin). May be that's the problem- it has old data in it??

@sxd
Copy link
Copy Markdown
Member Author

sxd commented Mar 17, 2020

@sxd I have been unable to reproduce this issue if I switch to a new table name (instead of the default "fluentbit" table which I have been using since I first started reviewing this plugin). May be that's the problem- it has old data in it??

it's probably since we changed the way to put data inside to something more effective, can you share me a dump with your old data? that will help me a lot since I don't have old data on my table :/

@PettitWesley
Copy link
Copy Markdown
Contributor

can you share me a dump with your old data

@sxd I'm not sure how to do that. I think that as long we can not reproduce this problem with the current iteration of the code it is fine.

Comment thread plugins/out_pgsql/pgsql.c Outdated
Copy link
Copy Markdown
Contributor

@PettitWesley PettitWesley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code looks good. Let's not merge it till @sxd finishes testing though.

Signed-off-by: Jonathan Gonzalez V <jonathan.gonzalez@2ndquadrant.com>
@sxd
Copy link
Copy Markdown
Member Author

sxd commented Mar 18, 2020

I think the code looks good. Let's not merge it till @sxd finishes testing though.

I've been testing since yesterday and haven't hit a bug yet so I think we're fine for an experimental state of this

@edsiper
Copy link
Copy Markdown
Member

edsiper commented Mar 18, 2020

thanks @sxd and @PettitWesley for the hard work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants