Object client scrape configs #2045

Open
cyriltovena opened this issue May 5, 2020 · 8 comments
Labels
component/agent, component/integrations, help wanted, keepalive, type/feature

Comments

@cyriltovena
Contributor

Is your feature request related to a problem? Please describe.
I think it would be interesting to give Promtail the ability to scrape logs from object storage such as S3, GCS, etc.

We already have this abstraction:

// ObjectClient is used to store arbitrary data in Object Store (S3/GCS/Azure/Etc)
type ObjectClient interface {
	PutObject(ctx context.Context, objectKey string, object io.ReadSeeker) error
	GetObject(ctx context.Context, objectKey string) (io.ReadCloser, error)
	List(ctx context.Context, prefix string) ([]StorageObject, []StorageCommonPrefix, error)
	DeleteObject(ctx context.Context, objectKey string) error
	Stop()
}

It seems we already have everything needed to list files and store a positions file. Or should the positions file still be kept local?

Describe the solution you'd like

The configuration could be like this:

scrape_configs:
  - job_name: gcs
    object_storage:
      bucket_name: GCS_BUCKET_NAME
    labels:
      __path__: /foo/**.log

  # Additional labels to assign to the logs
  [ <labelname>: <labelvalue> ... ]

Describe alternatives you've considered

Fluentd, but it seems to struggle with out-of-order entries.

@cyriltovena added the component/agent, component/integrations, help wanted, keepalive, and type/feature labels on May 5, 2020
@Punkoivan

Hello,
this is a great initiative. As a user, I would like to add a couple of points:

  1. The positions file should be kept in that storage, or at least this option should be provided. Reading from S3 or GCS is a kind of stateless process for Promtail. If Promtail is running in Docker or on spot instances, we should be able to resume exactly where we left off.
    Of course, this requires the additional PutObject permission, which reading alone does not.

  2. It would be great to "teach" Promtail to read compressed data.

About the alternative (Fluentd):
We're using this approach now and it works fine. The only real caveat is out-of-order entries: Fluentd reads new objects from S3 via SQS, and SQS doesn't use FIFO, so we sometimes hit out-of-order entries (I guess due to SQS, not Fluentd itself).

@chancez
Contributor

chancez commented May 21, 2020

> Hello,
> this is a great initiative. As a user, I would like to add a couple of points:
>
>   1. The positions file should be kept in that storage, or at least this option should be provided. Reading from S3 or GCS is a kind of stateless process for Promtail. If Promtail is running in Docker or on spot instances, we should be able to resume exactly where we left off.

You gotta be very careful with this due to the data consistency model in object storage systems. It might also be an option to use something like etcd/Consul, similar to how Loki uses it for its ring; you could store cursors in there as well. You could also do locking this way, which would allow multiple Promtails to coordinate scraping from the same, or different, object storage systems.
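
To make the coordination idea concrete, here is a minimal Go sketch of claiming an object through an atomic compare-and-swap. Everything in it is an assumption for illustration: `kvStore` is an in-memory stand-in for a store such as etcd or Consul, and `compareAndSwap` and `claim` are hypothetical names that do not correspond to any real client API. Leases and expiry, which a real lock would need, are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// kvStore is a hypothetical in-memory stand-in for a consistent KV store.
type kvStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newKVStore() *kvStore { return &kvStore{data: map[string]string{}} }

// compareAndSwap sets key to next only if its current value equals prev.
// An absent key is treated as the empty string.
func (s *kvStore) compareAndSwap(key, prev, next string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data[key] != prev {
		return false
	}
	s.data[key] = next
	return true
}

// claim tries to make worker the owner of objectKey; only one caller wins.
func claim(s *kvStore, objectKey, worker string) bool {
	return s.compareAndSwap("owner/"+objectKey, "", worker)
}

func main() {
	store := newKVStore()
	results := make([]bool, 3)

	// Three workers race to claim the same object.
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = claim(store, "foo/app.log", fmt.Sprintf("promtail-%d", i))
		}(i)
	}
	wg.Wait()

	winners := 0
	for _, ok := range results {
		if ok {
			winners++
		}
	}
	fmt.Println("winners:", winners)
}
```

The same compare-and-swap primitive could guard cursor updates, so that a worker only advances an offset it last wrote itself.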

@leon-seagate

Hey @adityacs,
Are there any updates on scraping logs from an S3 bucket using Promtail?

@AnthonyWC

Would be interested in this as well.

Design doc: object store scrape #2107

This was merged in #2270. Does anyone have a success story with it?

@jeschkies
Contributor

@AnthonyWC #2270 was dropped, not merged.

However, it came to my attention that Lambda Promtail can scrape logs from S3. That feature is not well documented. It's also specific to load balancer logs.

@bt909
Contributor

bt909 commented Mar 24, 2023

Has anyone used the Lambda Promtail option? Or has anyone tried Vector for this use case?
I don't have this use case yet, but I may in the future, and if Loki/Promtail doesn't support it, I might give Vector a try.

@neerajgk

> Hey @adityacs, are there any updates on scraping logs from an S3 bucket using Promtail?

Yes, I am interested too. Anyone, please?

@cstyan
Contributor

cstyan commented Nov 8, 2023

In general, pulling logs from S3 buckets using lambda-promtail is currently possible. Some tweaks may be needed to make it work for all cases, though.

As for object storage scraping in Promtail, we likely won't support it within Promtail. We're currently reevaluating Promtail's position as a project within Grafana Labs. Internally, we're actually using the Agent for both metrics and logs collection at this point.

While we haven't made a formal decision yet, we expect in the near future that all new feature work will be done in the Agent's log collection pipelines rather than in Promtail.
