pack: added java_sql_timestamp, a format string used by amazon athena #4811

Merged
merged 2 commits into fluent:master on Feb 17, 2022

@marcosdiez marcosdiez commented Feb 14, 2022

This commit adds a new timestamp format, java_sql_timestamp.
It is very similar to iso8601, except that it uses a space instead
of a T between the date and the time and does not end with Z
(or any other time zone designator).

This is the format: "%Y-%m-%d %H:%M:%S"

This is unfortunately the only format accepted by Amazon Athena.
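
To make the difference from iso8601 concrete, here is a minimal Python sketch (not the C code from this PR). The function names and the sample instant are illustrative assumptions; only the format string comes from the description above.

```python
from datetime import datetime, timezone

# Format string quoted in the PR description.
JAVA_SQL_TIMESTAMP = "%Y-%m-%d %H:%M:%S"


def java_sql_timestamp(ts: datetime) -> str:
    """Space between date and time, no trailing time zone designator."""
    return ts.strftime(JAVA_SQL_TIMESTAMP)


def iso8601_utc(ts: datetime) -> str:
    """ISO 8601 style for comparison: 'T' separator and trailing 'Z'."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical sample instant, chosen only for illustration.
sample = datetime(2022, 2, 14, 13, 22, 59, tzinfo=timezone.utc)
print(java_sql_timestamp(sample))  # 2022-02-14 13:22:59
print(iso8601_utc(sample))         # 2022-02-14T13:22:59Z
```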

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Documentation

  • Documentation required for this feature

fluent/fluent-bit-docs#708

Backporting

This PR will break #4131, but I can adapt it if this one gets merged first.

Example Configuration snippet

[INPUT]
    name              dummy
    Samples           1

[OUTPUT]
    name  stdout
    match *
    Format json
    json_date_key date
    json_date_format java_sql_timestamp

Debug log output from testing the change

Fluent Bit v1.9.0
* Copyright (C) 2015-2021 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/02/14 10:22:59] [ info] [engine] started (pid=542166)
[2022/02/14 10:22:59] [ info] [storage] version=1.1.6, initializing...
[2022/02/14 10:22:59] [ info] [storage] in-memory
[2022/02/14 10:22:59] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/02/14 10:22:59] [ info] [cmetrics] version=0.2.3
[2022/02/14 10:22:59] [ info] [sp] stream processor started
[{"date":"2022-02-14 13:22:59.927113","message":"dummy"}]
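
Note that the emitted value also carries a six-digit fractional part after the seconds. A small Python sketch for checking such a record on the consumer side follows. It assumes the fraction is microseconds and that each stdout line is a JSON array, as in the sample above; the helper name is hypothetical.

```python
import json
from datetime import datetime

# Record copied from the debug output above.
line = '[{"date":"2022-02-14 13:22:59.927113","message":"dummy"}]'


def parse_date(value: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS.ffffff'; raises ValueError on mismatch."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")


records = json.loads(line)
parsed = parse_date(records[0]["date"])
print(parsed.isoformat(sep=" "))  # 2022-02-14 13:22:59.927113
```

A value in iso8601 form (with T and Z) would make parse_date raise ValueError, which is a quick way to confirm that the new format is actually in effect.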

valgrind

==542169== Memcheck, a memory error detector
==542169== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==542169== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==542169== Command: bin/fluent-bit -c test.conf
==542169==
Fluent Bit v1.9.0
* Copyright (C) 2015-2021 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/02/14 10:24:19] [ info] [engine] started (pid=542169)
[2022/02/14 10:24:19] [ info] [storage] version=1.1.6, initializing...
[2022/02/14 10:24:19] [ info] [storage] in-memory
[2022/02/14 10:24:19] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/02/14 10:24:19] [ info] [cmetrics] version=0.2.3
[2022/02/14 10:24:19] [ info] [sp] stream processor started
[{"date":"2022-02-14 13:24:19.925510","message":"dummy"}]
^C[2022/02/14 10:24:25] [engine] caught signal (SIGINT)
[2022/02/14 10:24:25] [ warn] [engine] service will shutdown in max 5 seconds
[2022/02/14 10:24:25] [ info] [engine] service has stopped (0 pending tasks)
==542169==
==542169== HEAP SUMMARY:
==542169==     in use at exit: 0 bytes in 0 blocks
==542169==   total heap usage: 2,976 allocs, 2,976 frees, 804,426 bytes allocated
==542169==
==542169== All heap blocks were freed -- no leaks are possible
==542169==
==542169== For lists of detected and suppressed errors, rerun with: -s
==542169== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Fluent Bit is licensed under Apache 2.0. By submitting this pull request, I understand that this code will be released under the terms of that license.

Signed-off-by: Marcos Diez <marcos@unitron.com.br>

@PettitWesley PettitWesley left a comment

@marcosdiez Thank you, this looks useful

@edsiper edsiper merged commit b02cb10 into fluent:master Feb 17, 2022

edsiper commented Feb 17, 2022

Please help with #4131.

Spritekin commented Feb 24, 2022

Thank you so much! This is exactly what I'm looking for. When is the next release?

@marcosdiez
Contributor Author

@Spritekin I've been running my branch in production without issues. The image is available via: docker pull marcosdiez/fluent-bit:1.8.13
I recommend using it until the next release arrives.

@Spritekin

@marcosdiez

I tested your image: I changed the pod definition to use it, and it loaded:

Containers:
  fluent-bit:
    Container ID:   docker://97a64e66d30c0797fab990d097f21191d60fec4e24652a670bb8ef746f0db99e
    Image:          marcosdiez/fluent-bit:1.8.13
    Image ID:       docker-pullable://marcosdiez/fluent-bit@sha256:ce13540c8d364c55a61f1f7dfcbb2f315ac4140e038e78505c0e3e00a397b24e

Changed my configmap:

[OUTPUT]
    Name s3
    Match *
    bucket mybucket
    region ap-southeast-2
    store_dir /home/ec2-user/buffer
    s3_key_format /fluentbit/$TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S/$UUID.txt
    s3_key_format_tag_delimiters .-
    json_date_format java_sql_timestamp      <<<<<<<<<<<<< Added this
    total_file_size 50M
    upload_timeout 10m
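
For readers unfamiliar with s3_key_format, the following Python sketch approximates how such a key might expand. It is not the plugin's implementation. The tag value is hypothetical, and the splitting rule (split on any delimiter character, drop empty pieces) and the 8-character stand-in for $UUID are assumptions.

```python
import re
import uuid
from datetime import datetime, timezone

KEY_FORMAT = "/fluentbit/$TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S/$UUID.txt"


def expand_s3_key(key_format: str, tag: str, delimiters: str, when: datetime) -> str:
    """Approximate expansion of an s3_key_format string (illustrative only)."""
    # Assumption: split the tag on any single delimiter character, drop empties.
    parts = [p for p in re.split("[" + re.escape(delimiters) + "]", tag) if p]
    # Expand the strftime-style specifiers first.
    key = when.strftime(key_format)
    # Replace each $TAG[n] with the n-th tag part (IndexError if out of range).
    key = re.sub(r"\$TAG\[(\d+)\]", lambda m: parts[int(m.group(1))], key)
    # Assumption: a short random string stands in for $UUID.
    return key.replace("$UUID", uuid.uuid4().hex[:8])


# Hypothetical tag and timestamp, chosen only for illustration.
when = datetime(2022, 2, 24, 23, 33, 48, tzinfo=timezone.utc)
key = expand_s3_key(KEY_FORMAT, "kube.var-log.app", ".-", when)
print(key)
```

With the delimiters ".-", the hypothetical tag splits into kube, var, log and app, so $TAG[2] becomes log and $TAG[0] becomes kube.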

The container starts ok:

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  37m   default-scheduler  Successfully assigned default/fluentbit-fluent-bit-5njfk to ip-xxx-yyy-zzz-www.ap-southeast-2.compute.internal
  Normal  Pulling    37m   kubelet            Pulling image "marcosdiez/fluent-bit:1.8.13"
  Normal  Pulled     37m   kubelet            Successfully pulled image "marcosdiez/fluent-bit:1.8.13" in 4.576690508s
  Normal  Created    37m   kubelet            Created container fluent-bit
  Normal  Started    37m   kubelet            Started container fluent-bit

But now it won't write any logs to S3. So I went to the container logs:

Fluent Bit v1.8.13                                   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<  New image
* Copyright (C) 2015-2021 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

Input plugin 'systemd' cannot be loaded       <<<<<<<<<<<<<<<<<<<<< HERE
[2022/02/24 23:33:48] [ info] [engine] started (pid=1)
[2022/02/24 23:33:48] [ info] [storage] version=1.1.6, initializing...
[2022/02/24 23:33:48] [ info] [storage] in-memory
[2022/02/24 23:33:48] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/02/24 23:33:48] [ info] [cmetrics] version=0.2.2
[2022/02/24 23:33:48] [ info] [input:tail:tail.0] multiline core started
[2022/02/24 23:33:48] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/02/24 23:33:48] [ info] [sp] stream processor started
[2022/02/24 23:33:48] [ info] [input:tail:tail.0] inotify_fs_add(): inode=188745418 watch_fd=1 name=/var/log/containers/aws-node-b64k4_kube-system_aws-node-97f509bb38f82c2d370df937144148e9d3076466ddd5754db5e34a01fa6561ef.log
[2022/02/24 23:33:48] [ info] [input:tail:tail.0] inotify_fs_add(): inode=89129095 watch_fd=2 name=/var/log/containers/aws-node-b64k4_kube-system_aws-vpc-cni-init-095c2b6261bf2c92c0fcdb1e784ea350a8d2e7fa679855508b3d9a3e83384ea8.log

You can see the "Input plugin 'systemd' cannot be loaded" message, which does not appear when I run the previous 1.8.12 image:

Fluent Bit v1.8.12                                    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Old image
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/02/23 06:44:03] [ info] [engine] started (pid=1)
[2022/02/23 06:44:03] [ info] [storage] version=1.1.5, initializing...
[2022/02/23 06:44:03] [ info] [storage] in-memory
[2022/02/23 06:44:03] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/02/23 06:44:03] [ info] [cmetrics] version=0.2.2
[2022/02/23 06:44:03] [ info] [input:tail:tail.0] multiline core started
[2022/02/23 06:44:03] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/02/23 06:44:03] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/02/23 06:44:03] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/02/23 06:44:03] [ info] [filter:kubernetes:kubernetes.0] connectivity OK
[2022/02/23 06:44:03] [ info] [fstore] created root path /home/ec2-user/buffer/wagestream-au-dev-local-backup
[2022/02/23 06:44:03] [ info] [output:s3:s3.0] Using upload size 50000000 bytes
[2022/02/23 06:44:03] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/02/23 06:44:03] [ info] [sp] stream processor started

It looks like there might be a problem in the build process for this version; the same error is mentioned in some older posts in #1696.

So I need to revert and I will have to wait for the official release.

Thanks!

@marcosdiez
Contributor Author

Sorry, @Spritekin
I don't use the systemd plugin, so I can't support it. I have too much on my plate right now.
Just wait for the official release then.

That being said, in your output settings, I would add/change two lines:

s3_key_format /fluentbit/$TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S/$UUID.txt.gz
compression gzip

That will save you a lot of money on both storage and Athena processing fees.

@Spritekin

@marcosdiez

No worries, thanks, will wait for the final release.

I should mention that compression gzip might not be working. I tried sending gzip files to S3, but however I downloaded them (I tried multiple ways), the content always came back as plain text. Finally, I connected with Athena and tried to run the query on the files: when I selected gzip as the source format it threw an error, and when I selected plain text it ran fine.

This suggests the files are not gzipped even when the option is on. I described it all here:
#3676
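
One way to check a downloaded object independently of the download tool is to look at its first two bytes, since gzip streams start with the magic bytes 0x1f 0x8b. A minimal Python sketch follows; the sample payload is made up. Keep in mind that some download paths may decompress content transparently, so the check only tells you about the bytes you actually received.

```python
import gzip

# Every gzip stream begins with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data: bytes) -> bool:
    """Return True if the payload starts with the gzip magic bytes."""
    return data[:2] == GZIP_MAGIC


# Hypothetical payloads: one plain record and its gzip-compressed form.
plain = b'{"message":"dummy"}\n'
packed = gzip.compress(plain)

print(looks_gzipped(plain))   # False
print(looks_gzipped(packed))  # True
```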

Regards,
Guillermo
