Index aliasing on ES output #1670

Closed
andye2004 opened this issue Oct 21, 2019 · 25 comments

@andye2004

andye2004 commented Oct 21, 2019

Firstly, I just want to say a huge thanks 👏 for all the effort you guys put into Fluent Bit and for producing what we consider an essential component of our tech stack!

Is your feature request related to a problem? Please describe.

We use ES primarily for centralised logging of on-prem k8s clusters, and we are experiencing frustration (rather than outright problems) in trying to keep the overall size of our ES DB manageable.

Given our primary usage, we don't need to retain log events beyond a certain period, and they need to remain 'hot', to use ES phraseology, only for a fairly short time. Given this, we set Logstash_Format to on, which effectively gives us daily rolling of indices.

However, this leaves us having to manage the ever-growing list of daily indices, which isn't ideal. We are currently doing this manually. Yes, we could use Curator, and may end up having to, but a 'better' solution is to use the built-in ILM (see below).

Describe the solution you'd like

In newer versions of ES (6.7+) there is index lifecycle management (ILM), which lets us create a policy to rotate log indices using more granular timeframes than daily, something that is not essential but still very useful. More importantly, it lets us specify a maximum retention time for indices, so that they are automatically deleted after a specified period. This is now the recommended way to manage index lifecycles.

The use of the ILM feature requires that indices are created with an alias such that the underlying index can be rolled over and anything writing to the alias can carry on as normal. It is for this reason that I'd like to request the addition of an aliasing feature to the ES output plugin that can be used in conjunction with the Index attribute.

Thanks, Andy.

P.S. I thought it worth adding that this feature is available when using Filebeat.

@edsiper edsiper self-assigned this Nov 1, 2019
@andye2004
Author

Hi @edsiper, I was just wondering if this was likely to make it into the 1.3 release, whenever that may be?

@JanKowalik

JanKowalik commented Nov 27, 2019

This feature is something that would be very useful for us too.

Also, as mentioned in #1381, I was wondering if it could work without any changes. Would it work if I provide an alias name in the fluentbit index configuration field? The alias would point to a pre-created index with ILM linked to it. Or would it cause issues, as the precreated index would not have mappings defined?
In that case I would not use Logstash_Format, so that Fluent Bit does not create an index every day.

Thanks!

@JanKowalik

Would it work if I provide an alias name in the fluentbit index configuration field? The alias would point to a pre-created index with ILM linked to it. Or would it cause issues, as the precreated index would not have mappings defined?

This does not work.
I disabled Logstash_Format so that it does not create an index per day. This way it only uses the fluent-bit index name.
Next, before I started Fluent Bit, I created an index with an ILM policy and a fluent-bit alias. I was hoping I could trick it into using the alias instead of creating its own index. I was not successful. Moreover, Fluent Bit did not send any data to Elasticsearch at all. I looked briefly at the Kubernetes DaemonSet pod logs but did not see any errors popping up there.

@JanKowalik

Hi again :)
I spoke too early above. It does work. It collects data, rolls over, and ages indices following the ILM policy I specified.

I still have a couple of concerns though.

  1. I am not sure if it creates correct mappings if I provide it with an empty index (via an alias).
  2. I still want to wait for a day at least to see if Fluentbit won't do something funny and rollover to a new index name on a day boundary (as it does when LogStash_Format is enabled)

@JanKowalik

JanKowalik commented Dec 4, 2019

I am not sure if it creates correct mappings if I provide it with an empty index (via an alias).

Elasticsearch deduces the mappings from the first document indexed into each new index. This might cause issues if the first document produces a date mapping for a field and subsequent documents from other sources do not match it and would need a plain text mapping instead.

I still want to wait for a day at least to see if Fluentbit won't do something funny and rollover to a new index name on a day boundary (as it does when LogStash_Format is enabled)

All works fine here and indices rollover without issues.
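
To make the mapping concern above concrete, here is a deliberately simplified Python model of "the first document fixes the field type". It is not how Elasticsearch is implemented: `infer_type`, `accepts` and `ToyIndex` are hypothetical names, and date detection is reduced to "parses as an ISO-like timestamp".

```python
from datetime import datetime


def infer_type(value):
    """Rough stand-in for dynamic mapping: guess a field type from one value."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # ISO-like strings count as dates here
            return "date"
        except ValueError:
            return "text"
    return "object"


def accepts(mapped_type, value):
    """A text field takes any string; other types must match the inferred type."""
    if mapped_type == "text":
        return isinstance(value, str)
    return infer_type(value) == mapped_type


class ToyIndex:
    """Hypothetical in-memory index: the first value seen for a field fixes its type."""

    def __init__(self):
        self.mapping = {}

    def index(self, doc):
        # Validate every field first so a rejected document leaves the mapping untouched.
        for field, value in doc.items():
            mapped = self.mapping.get(field)
            if mapped is not None and not accepts(mapped, value):
                raise ValueError(f"field '{field}' is mapped as {mapped}")
        for field, value in doc.items():
            self.mapping.setdefault(field, infer_type(value))


idx = ToyIndex()
idx.index({"time": "2019-12-04T10:15:00", "log": "started"})
try:
    idx.index({"time": "ten past ten", "log": "from another source"})
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
print(idx.mapping)
```

In this toy model the second document is rejected because `time` was already mapped as a date. Defining explicit mappings in an index template is the usual way to avoid depending on whichever document arrives first.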

@naseemkullah
Contributor

Extra points if the fluent-bit es output could set up the ILM policy, as Filebeat does today.

@plaformsre

Setting a write index and alias on an index should not be rocket science. One of the reasons we are switching back to filebeat is exactly the lack of support for ILM through index aliases.

@Giaco9

Giaco9 commented Oct 22, 2020

@edsiper is there any news about this issue? We have the same need. We would like to apply an ILM policy to an index created by fluent bit.

Thank you

@beer1970

Any update on this issue?

edsiper added a commit that referenced this issue Jan 25, 2021
@edsiper
Member

edsiper commented Jan 25, 2021

The Elasticsearch output plugin has been updated with record accessor support for the logstash_prefix_key option. Changes:

    out_es: add support for record accessor on 'logstash_prefix_key' (#1670)
    
    The following patch adds 'record accessor' support for the
    'logstash_prefix_key' configuration property. This enables
    custom indices based on nested fields if required, e.g.:
    
     {"key1": 1234, "kubernetes": {"labels": {"app": "prod"}}}
    
    You can configure a logstash prefix as follows:
    
     [OUTPUT]
         name                 es
         match                *
         logstash_format      on
         logstash_prefix_key  $kubernetes['labels']['app']
    
    For the record above, the proposed config generates the index:
    
     prod-2021.01.24
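
As an illustration of what the commit message describes, the Python sketch below resolves a record accessor pattern against a nested record and appends the date suffix. It is not the plugin's code: `resolve_accessor` and `index_name` are hypothetical helpers, only the `$key['sub']['sub']` form is handled, and the fallback to a default prefix of `logstash` when the key is missing is an assumption.

```python
import re
from datetime import date


def resolve_accessor(pattern, record):
    """Resolve a pattern such as $kubernetes['labels']['app'] against a nested dict.

    Returns None when any key along the path is missing.
    """
    match = re.fullmatch(r"\$(\w+)((?:\['[^']+'\])*)", pattern)
    if match is None:
        raise ValueError(f"unsupported accessor: {pattern}")
    keys = [match.group(1)] + re.findall(r"\['([^']+)'\]", match.group(2))
    value = record
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def index_name(record, prefix_key, day, default_prefix="logstash"):
    """Build '<prefix>-YYYY.MM.DD', using default_prefix if the key is absent."""
    prefix = resolve_accessor(prefix_key, record)
    if prefix is None:
        prefix = default_prefix  # assumed fallback behaviour
    return f"{prefix}-{day:%Y.%m.%d}"


record = {"key1": 1234, "kubernetes": {"labels": {"app": "prod"}}}
name = index_name(record, "$kubernetes['labels']['app']", date(2021, 1, 24))
print(name)  # prod-2021.01.24
```

Note that the date suffix is still appended, so every prefix gets a new index per day.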

@edsiper edsiper added the fixed label Jan 25, 2021
@edsiper edsiper closed this as completed Jan 25, 2021
@beer1970

But fluent-bit will still create a new index every day with the date in the name, because there is no way to drop the date suffix.
We would need prod instead of prod-2021.01.24 so that Elasticsearch can roll over the index with ILM.

@edsiper
Member

edsiper commented Jan 25, 2021

@beer1970 do you think the ideal case would be to port the same feature to the non-logstash_format mode?

@beer1970

It would be nice to keep logstash_format as the format of the event but have an option to skip the date at the end.

@mayjak

mayjak commented Feb 4, 2021

@edsiper At the minimum what is needed to use ILM successfully is basically an ability to specify a single index without trying to recreate it if it's already present.

I want to store my logs in:
logs-prod
not
logs-prod-2021.01.24

That way the automation scripts can configure a lifecycle policy for that index prior to running fluentbit. This makes ES in charge of maintaining the storage for the index without a need to create maintenance jobs to clean up old indexes etc.

A more fancy solution would require fluentbit to configure ILM, like filebeat does here:
https://www.elastic.co/guide/en/beats/filebeat/current/ilm.html
The most important part of that config is setting the ILM alias for the index. Unfortunately there's no way to automatically attach an ILM to an index pattern in ES.

@mayjak

mayjak commented Feb 9, 2021

Just a comment about what I said above.
I couldn't believe it's still not supported so I updated our ELK stacks to see what changed.
Starting with 7.5 there is a way to attach an ILM to an index pattern, so this issue is not critical for anyone using ELK version 7.5 and above. It was a real PITA when I was using 7.3, but with the new Index Templates feature it's almost all good.
It would probably be better to support skipping the suffix as it would save some ES resources by having less indexes/shards, but at least there's a workaround now.

@alternaivan

On our side (elasticsearch v7.14.0 and fluent-bit v1.8.15) we've configured the ILM to work with fluent-bit in this way:

  1. Create an ILM policy named fluent-bit.
  2. Create an index template that attaches that ILM policy and specifies rollover_alias: fluent-bit.
  3. Create a write index named fluent-bit-000001 with the alias fluent-bit.
  4. On the Fluent Bit side, set Logstash_Format off and Index fluent-bit.
  5. Logs then go into fluent-bit-000001 and, upon rollover, into the next index.
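
Steps 1 to 3 above can be sketched as three request bodies. The Python snippet below only builds and prints them; it does not talk to a cluster. The name `fluent-bit` and the index `fluent-bit-000001` come from the steps; the rollover and retention thresholds, the `fluent-bit-*` pattern, and the variable names are assumptions for illustration.

```python
import json

ALIAS = "fluent-bit"  # alias and policy name taken from the steps above

# Step 1: ILM policy. The thresholds are placeholder values, not recommendations.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {"actions": {"rollover": {"max_age": "1d", "max_size": "50gb"}}},
            "delete": {"min_age": "7d", "actions": {"delete": {}}},
        }
    }
}

# Step 2: index template that attaches the policy and names the rollover alias.
index_template = {
    "index_patterns": [f"{ALIAS}-*"],  # assumed pattern
    "template": {
        "settings": {
            "index.lifecycle.name": ALIAS,
            "index.lifecycle.rollover_alias": ALIAS,
        }
    },
}

# Step 3: first backing index, marked as the write index for the alias.
bootstrap_index = {"aliases": {ALIAS: {"is_write_index": True}}}

requests_to_send = [
    ("PUT", f"/_ilm/policy/{ALIAS}", ilm_policy),
    ("PUT", f"/_index_template/{ALIAS}", index_template),
    ("PUT", f"/{ALIAS}-000001", bootstrap_index),
]

for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=2))
```

The order matters: the template has to exist before the first index is created so that the lifecycle settings are applied to it. Step 4 is then just the two Fluent Bit output settings, Logstash_Format off and Index fluent-bit.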

Maybe the documentation for the elasticsearch output plugin should be updated? The part related to the Index configuration field description.

Regards,
Marjan

@zeusal

zeusal commented Mar 8, 2023

Hi @alternaivan

Can you send me the index template definition?

I tried with:
{
    "index_patterns": ["logs-*"],
    "template": {
        "aliases": {"logs": {}},
        "settings": {"number_of_shards": 2, "number_of_replicas": 1},
        "mappings": {
            "properties": {
                "timestamp": {
                    "type": "date",
                    "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
                },
                "value": {"type": "double"}
            }
        }
    }
}

@alternaivan

Hi @zeusal,

Sure, this is the index template definition I'm using.

{
    "index_patterns": [
        "fluent-*"
    ],
    "priority": 1,
    "template": {
        "settings": {
            "number_of_shards": 3,
            "number_of_replicas": 1,
            "index.lifecycle.name": "fluent-bit",
            "index.lifecycle.rollover_alias": "fluent-bit"
        }
    }
}

Hope this helps!

@zeusal

zeusal commented Mar 8, 2023

@alternaivan

Thank you for your help.

I had to change that definition because I use OpenSearch in my case:

{
    "index_patterns": [
        "cluster-pro-*"
    ],
    "priority": 1,
    "template": {
        "settings": {
            "index": {
                "number_of_shards": 3,
                "number_of_replicas": 1,
                "opendistro": {
                    "index_state_management": {
                        "policy_id": "cluster-pro-policy",
                        "rollover_alias": "cluster-pro"
                    }
                }
            }
        }
    }
}

Then I created the index with:

{
  "aliases": {
    "cluster-pro": {
      "is_write_index": true
    }
  }
}

I only have one question: when creating my policy, should I add a rollover action?

Regards and thank you!

@alternaivan

I only have one question: when creating my policy, should I add a rollover action?

Yes, correct.

Regards! ;)
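
For reference, a policy with a rollover action could look roughly like the body built below. This is a sketch in the OpenSearch ISM policy format, not the policy used in this thread: the state names, the thresholds, and the description are placeholders, and the endpoint printed at the end is an assumption (Open Distro installations used an `_opendistro/_ism` path instead).

```python
import json

# Hypothetical ISM policy for the "cluster-pro" alias; all values are examples.
ism_policy = {
    "policy": {
        "description": "Roll over daily, delete after a week (example values)",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                # The rollover action is what creates the next index behind the alias.
                "actions": [{"rollover": {"min_index_age": "1d"}}],
                "transitions": [
                    {"state_name": "delete", "conditions": {"min_index_age": "7d"}}
                ],
            },
            {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
        ],
    }
}

# Sanity check: every transition must point at a state that is defined.
state_names = {state["name"] for state in ism_policy["policy"]["states"]}
for state in ism_policy["policy"]["states"]:
    for transition in state["transitions"]:
        assert transition["state_name"] in state_names

print("PUT /_plugins/_ism/policies/cluster-pro-policy")
print(json.dumps(ism_policy, indent=2))
```

Without the rollover action the policy would only age and delete the single index behind the alias, and no new backing index would be created.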

@rats-github01

@alternaivan - after much hassle, I ended up with exactly the same approach. However, if I delete the main index (the one being written to), we get back to the same problem. Have you been able to get around that? I'm not sure why this issue is closed.

@alternaivan

Hi @rats-github01,

Sorry for taking so long to reply, I didn't see the comment.

If by main index you mean fluent-bit-000001, it gets rolled over automatically by Elasticsearch and, after the period configured in your ILM policy, it is deleted automatically. I haven't removed an index manually.

Hope this helps.

@huziahmetovsv

@alternaivan

Thank you for your help.

I had to change that definition because I use OpenSearch in my case:

{
    "index_patterns": [
        "cluster-pro-*"
    ],
    "priority": 1,
    "template": {
        "settings": {
            "index": {
                "number_of_shards": 3,
                "number_of_replicas": 1,
                "opendistro": {
                    "index_state_management": {
                        "policy_id": "cluster-pro-policy",
                        "rollover_alias": "cluster-pro"
                    }
                }
            }
        }
    }
}

Then I created the index with:

{
  "aliases": {
    "cluster-pro": {
      "is_write_index": true
    }
  }
}

I only have one question: when creating my policy, should I add a rollover action?

Regards and thank you!

The solution works fine. But what if I have dynamic index naming, for example prod-app01, prod-app02, dev-app01, and whatever else I deploy to the k8s cluster? I can't create a single template that matches all my names. Is there a solution, a regex for example?
I tried using a data stream template, and it works fine with dynamic naming, but data streams are not always suitable.

@zeusal

zeusal commented Sep 12, 2023

In the following link you can see what Elasticsearch allows:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates-v1.html

As far as I understand, it only accepts wildcards (*).

You can use " *-app* " as the index pattern.
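
To check which of the example names such a wildcard pattern would cover, here is a small Python sketch. `matches_index_pattern` is a hypothetical helper that treats `*` as the only wildcard, which is my understanding of how index patterns behave; it is not Elasticsearch code, and `prod-db01` is an invented name added as a counter-example.

```python
import re


def matches_index_pattern(pattern, index_name):
    """Match with '*' as the only wildcard; every other character is literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, index_name) is not None


names = ["prod-app01", "prod-app02", "dev-app01", "prod-db01"]
matched = [n for n in names if matches_index_pattern("*-app*", n)]
print(matched)  # ['prod-app01', 'prod-app02', 'dev-app01']
```

The pattern only decides which indices the template applies to; it does not by itself give each index its own rollover alias.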

@huziahmetovsv

Sorry, my question was not clear. I asked about "rollover_alias"
