Object client scrape configs #2045

cyriltovena · 2020-05-05T20:55:59Z

Is your feature request related to a problem? Please describe.
I think it could be interesting to let Promtail scrape logs from object storage such as S3, GCS, etc.

We already have this abstraction:

// ObjectClient is used to store arbitrary data in Object Store (S3/GCS/Azure/Etc)
type ObjectClient interface {
	PutObject(ctx context.Context, objectKey string, object io.ReadSeeker) error
	GetObject(ctx context.Context, objectKey string) (io.ReadCloser, error)
	List(ctx context.Context, prefix string) ([]StorageObject, []StorageCommonPrefix, error)
	DeleteObject(ctx context.Context, objectKey string) error
	Stop()
}

It seems we already have everything needed to list files and store a positions file. Although, could the positions file still be kept local?
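As a rough illustration of how listing plus a positions map could fit together, here is a minimal sketch. It is not Promtail code: `objectReader`, `memClient`, `scrapeOnce`, and `scrape` are hypothetical names, `StorageObject` is reduced to a key, and an in-memory map stands in for a real bucket. It also assumes whole objects fit in memory.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"sort"
	"strings"
)

// StorageObject and StorageCommonPrefix are simplified placeholders for the
// types returned by List (assumption: only the key matters here).
type StorageObject struct{ Key string }
type StorageCommonPrefix string

// objectReader is the read-only subset of the ObjectClient interface above.
type objectReader interface {
	GetObject(ctx context.Context, objectKey string) (io.ReadCloser, error)
	List(ctx context.Context, prefix string) ([]StorageObject, []StorageCommonPrefix, error)
}

// memClient is a hypothetical in-memory bucket used in place of S3/GCS.
type memClient struct{ objects map[string][]byte }

func (m *memClient) GetObject(_ context.Context, key string) (io.ReadCloser, error) {
	data, ok := m.objects[key]
	if !ok {
		return nil, fmt.Errorf("object %q not found", key)
	}
	return io.NopCloser(bytes.NewReader(data)), nil
}

func (m *memClient) List(_ context.Context, prefix string) ([]StorageObject, []StorageCommonPrefix, error) {
	var out []StorageObject
	for key := range m.objects {
		if strings.HasPrefix(key, prefix) {
			out = append(out, StorageObject{Key: key})
		}
	}
	// Sort so that scraping order is deterministic.
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out, nil, nil
}

// scrapeOnce lists the objects under prefix and returns the complete lines
// that were not consumed on an earlier pass. positions maps an object key to
// the byte offset already read and is updated in place.
func scrapeOnce(ctx context.Context, c objectReader, prefix string, positions map[string]int64) ([]string, error) {
	objects, _, err := c.List(ctx, prefix)
	if err != nil {
		return nil, err
	}
	var lines []string
	for _, obj := range objects {
		rc, err := c.GetObject(ctx, obj.Key)
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		offset := positions[obj.Key]
		if offset > int64(len(data)) {
			offset = 0 // the object was replaced by a shorter one; start over
		}
		rest := data[offset:]
		end := bytes.LastIndexByte(rest, '\n')
		if end < 0 {
			continue // no complete new line yet
		}
		lines = append(lines, strings.Split(string(rest[:end]), "\n")...)
		positions[obj.Key] = offset + int64(end) + 1
	}
	return lines, nil
}

// scrape is a convenience wrapper for callers without their own context.
func scrape(c objectReader, prefix string, positions map[string]int64) ([]string, error) {
	return scrapeOnce(context.Background(), c, prefix, positions)
}
```

Calling `scrape` repeatedly with the same `positions` map returns only lines added since the previous call. Where that map gets persisted (local file or the bucket itself) is exactly the open question above.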

Describe the solution you'd like

The configuration could be like this:

scrape_configs:
  - job_name: gcs
    object_storage:
      bucket_name: GCS_BUCKET_NAME
    labels:
      __path__: /foo/**.log
      # Additional labels to assign to the logs
      [ <labelname>: <labelvalue> ... ]
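One way a `__path__` glob such as `/foo/**.log` could map onto an object store is to use the literal part of the pattern as the `List` prefix and then filter the returned keys. The sketch below is only an illustration: `listPrefix`, `globToRegexp`, and `matchKeys` are hypothetical helpers, and it assumes `**` may cross `/` while `*` and `?` may not. Promtail's own glob handling may interpret `**.log` differently.

```go
package main

import (
	"regexp"
	"strings"
)

// listPrefix returns the literal part of a glob pattern before the first
// wildcard, which can be passed to List to narrow the listing.
func listPrefix(pattern string) string {
	if i := strings.IndexAny(pattern, "*?"); i >= 0 {
		return pattern[:i]
	}
	return pattern
}

// globToRegexp translates a glob into an anchored regular expression.
// Assumed semantics: "**" matches anything including "/", "*" matches
// anything except "/", and "?" matches one character other than "/".
func globToRegexp(pattern string) (*regexp.Regexp, error) {
	runes := []rune(pattern)
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(runes); i++ {
		switch r := runes[i]; r {
		case '*':
			if i+1 < len(runes) && runes[i+1] == '*' {
				b.WriteString(".*")
				i++ // consume the second '*'
			} else {
				b.WriteString("[^/]*")
			}
		case '?':
			b.WriteString("[^/]")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// matchKeys keeps only the object keys that match the glob pattern.
func matchKeys(pattern string, keys []string) ([]string, error) {
	re, err := globToRegexp(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, k := range keys {
		if re.MatchString(k) {
			out = append(out, k)
		}
	}
	return out, nil
}
```

With these assumptions, `/foo/**.log` lists under `/foo/` and keeps keys such as `/foo/a.log` and `/foo/bar/b.log` while dropping `/foo/c.txt`.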

Describe alternatives you've considered

Fluentd, but it seems to struggle with out-of-order entries.

Punkoivan · 2020-05-07T07:46:28Z

Hello,
It's a great initiative. As a user, I would like to add a few things:

The positions file should be kept in that storage, or at least this should be an option. Reading from S3 or GCS is a kind of stateless process for Promtail. If Promtail is running in Docker or on spot instances, we should be able to resume exactly where we left off.
This will of course require the additional PutObject permission, which read-only scraping would not.
It would be great to "teach" Promtail to read compressed data.
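For the compressed-data point, one possible approach is to sniff the gzip magic bytes and decompress transparently. This is a sketch using only Go's standard library; `maybeGunzip`, `decodeObject`, `decodeBytes`, and `gzipBytes` are hypothetical helper names, and only gzip is handled.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"io"
)

// maybeGunzip wraps r in a gzip reader when the stream starts with the gzip
// magic bytes (0x1f 0x8b); any other stream is returned unchanged.
func maybeGunzip(r io.Reader) (io.Reader, error) {
	br := bufio.NewReader(r)
	magic, err := br.Peek(2)
	if err == nil && magic[0] == 0x1f && magic[1] == 0x8b {
		zr, err := gzip.NewReader(br)
		if err != nil {
			return nil, err
		}
		return zr, nil
	}
	// Streams shorter than two bytes make Peek fail; pass them through as-is.
	return br, nil
}

// decodeObject reads a whole object body, decompressing it if needed.
// In a real scraper r would be the io.ReadCloser returned by GetObject.
func decodeObject(r io.Reader) ([]byte, error) {
	plain, err := maybeGunzip(r)
	if err != nil {
		return nil, err
	}
	return io.ReadAll(plain)
}

// decodeBytes is a convenience wrapper for in-memory payloads.
func decodeBytes(data []byte) (string, error) {
	out, err := decodeObject(bytes.NewReader(data))
	return string(out), err
}

// gzipBytes compresses data; it exists only to produce sample input.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```

Note that a byte offset into a compressed object does not map directly to a position in the decompressed stream, so a positions file for such objects would likely need to track whole objects or decompressed offsets.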

About the alternative (Fluentd):
We're using this approach now and it works fine. The only real caveat is out-of-order entries: Fluentd reads new objects from S3 via SQS, and SQS doesn't use FIFO, so sometimes we hit out-of-order errors (I guess due to SQS, not Fluentd itself).

chancez · 2020-05-21T16:46:18Z

> Hello,
> it's great initiative. As a user I would like to add smth:
>
> Position file should be kept on that storage or at least this ability should be provided. Reading from S3 or GCS is a kind of stateless process for promtail. If promtail is running in a docker or on some spot instances, we should be able to start exactly on the same step where we finished.

You gotta be very careful with this due to the data consistency model in object storage systems. It might also be an option to use something like etcd/Consul, similar to how Loki uses it for its ring; you could store cursors in there as well. You could also do locking this way, which would allow multiple Promtails to coordinate scraping from the same, or different, object storage systems.
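To make the coordination idea concrete, here is a minimal sketch of cursor updates guarded by compare-and-swap, the kind of primitive that etcd transactions or Consul check-and-set provide. `cursorStore` is a hypothetical in-memory stand-in, not a real etcd or Consul client, and the versioning scheme is an assumption for illustration.

```go
package main

import (
	"errors"
	"sync"
)

// ErrConflict is returned when another scraper updated the cursor first.
var ErrConflict = errors.New("cursor was modified by another scraper")

// cursorStore is a hypothetical in-memory stand-in for a KV store that
// supports compare-and-swap on a per-key version.
type cursorStore struct {
	mu       sync.Mutex
	offsets  map[string]int64
	versions map[string]int64
}

func newCursorStore() *cursorStore {
	return &cursorStore{
		offsets:  make(map[string]int64),
		versions: make(map[string]int64),
	}
}

// Get returns the stored offset for key and the version it was written at.
// A key that was never written has offset 0 and version 0.
func (s *cursorStore) Get(key string) (offset, version int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.offsets[key], s.versions[key]
}

// CompareAndSwap stores newOffset only if the key is still at
// expectedVersion, and bumps the version on success.
func (s *cursorStore) CompareAndSwap(key string, expectedVersion, newOffset int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.versions[key] != expectedVersion {
		return ErrConflict
	}
	s.offsets[key] = newOffset
	s.versions[key] = expectedVersion + 1
	return nil
}
```

A scraper would call `Get`, read from the returned offset, and then call `CompareAndSwap` with the version it saw. On `ErrConflict` it discards its batch, since another instance has already advanced the cursor.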

leon-seagate · 2022-04-26T13:31:54Z

Hey @adityacs,
Are there any updates on scraping logs from an S3 bucket using Promtail?

AnthonyWC · 2022-05-23T14:43:40Z

Would be interested in this as well.

Design doc: object store scrape #2107

This was merged in #2270 (does anyone have a success story with it?).

jeschkies · 2022-11-11T10:40:22Z

@AnthonyWC #2270 was dropped, not merged.

However, it came to my attention that Lambda Promtail can scrape logs from S3. That feature is not well documented. It's also specific to load balancer logs.

bt909 · 2023-03-24T16:52:23Z

Has anyone used the Lambda Promtail option? Or has anyone tried Vector for this use case?
I don't have the use case yet, but I may in the future, and if Loki/Promtail doesn't support this, I would probably give Vector a try.

neerajgk · 2023-09-21T05:29:10Z

> Hey @adityacs, Are there any updates about scraping logs from s3 bucket using promtail ?

Yes, I am interested too. Anyone, please?

cstyan · 2023-11-08T01:13:13Z

In general, pulling logs from S3 buckets using lambda-promtail is currently possible. Some tweaks may be needed to make it work for all cases, though.

As for Promtail and object storage scraping, we likely won't support it within Promtail. We're currently reevaluating Promtail's position as a project within Grafana Labs. Internally we're actually using the Agent for both metrics and logs collection at this point.

While we haven't made a formal decision yet, we expect in the near future that all new feature work will be done in the Agent's log collection pipelines rather than in Promtail.

cyriltovena added the component/agent, component/integrations, help wanted, keepalive, and type/feature labels May 5, 2020

adityacs mentioned this issue May 21, 2020

Design doc: object store scrape #2107

Closed

adityacs mentioned this issue Jun 28, 2020

Feature: Promtail, scrape logs from Object store #2270

Closed

2 tasks
