index rollover cronjob fails on openshift-logging operator #859

Closed
tucsolo opened this issue Mar 11, 2022 · 7 comments

Comments

@tucsolo

tucsolo commented Mar 11, 2022

As the documentation suggests, we installed the 5.3.5-20 OpenShift Logging and Elasticsearch operators on our OKD cluster (4.9.0-0.okd-2022-02-12-140851). During a routine oc get pods check we noticed a failing pod, elasticsearch-im-app-xxx, created by the elasticsearch-im-app cronjob. In the delete-then-rollover script the delete step completes fine, but the rollover step then fails on some indexes:

OK Process:

Index management rollover process starting for app-.orphaned

Current write index for app-.orphaned-write: app-openshift-config-000047
/tmp/scripts/indexManagementClient.py:74: ElasticsearchDeprecationWarning: [types removal] The parameter include_type_name should be explicitly specified in rollover requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means requests must omit the type name in mapping definitions.
  response = es_client.indices.rollover(alias=alias, body=decoded)
Checking results from _rollover call
Next write index for app-.orphaned-write: app-openshift-config-000047
Checking if app-openshift-config-000047 exists
/tmp/scripts/indexManagementClient.py:86: ElasticsearchDeprecationWarning: [types removal] The parameter include_type_name should be explicitly specified in get indices requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means responses will omit the type name in mapping definitions.
  return es_client.indices.exists(index=index)
Checking if app-openshift-config-000047 is the write index for app-.orphaned-write
Done!

Faulty process:

Index management rollover process starting for app-openshift-config

Current write index for app-openshift-config-write: app-openshift-config-000046
/tmp/scripts/indexManagementClient.py:74: ElasticsearchDeprecationWarning: [types removal] The parameter include_type_name should be explicitly specified in rollover requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means requests must omit the type name in mapping definitions.
  response = es_client.indices.rollover(alias=alias, body=decoded)
Checking results from _rollover call
Calculating next write index based on current write index...
/tmp/scripts/indexManagement: line 94: openshift: unbound variable
Next write index for app-openshift-config-write: app-
Checking if app- exists
/tmp/scripts/indexManagementClient.py:86: ElasticsearchDeprecationWarning: [types removal] The parameter include_type_name should be explicitly specified in get indices requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means responses will omit the type name in mapping definitions.
  return es_client.indices.exists(index=index)
{"acknowledged": false, "shards_acknowledged": false, "old_index": "app-openshift-config-000047", "new_index": "app-openshift-config-000048", "rolled_over": false, "dry_run": false, "conditions": {"[max_age: 8h]": false, "[max_size: 120gb]": false, "[max_docs: 122880000]": false}}

The relevant part of the failing output is:

/tmp/scripts/indexManagement: line 94: openshift: unbound variable
Next write index for app-openshift-config-write: app-
Checking if app- exists

I then found that the problem is in lines 346-347 of the index management scripts. We have indexes named like app-openshift-something-012345, and the cut command does not split such a name into "app-openshift-something" and "012345"; it returns "app" and "openshift" instead, so advancing the counter to 012346 fails.
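To make the failure mode concrete, here is a minimal reproduction sketch. It is not the operator's actual script: it assumes the suffix is extracted with cut -d'-' -f2 (the command named in the follow-up comment) and then used in shell arithmetic under set -u. The helper name second_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical reproduction; NOT the operator's real indexManagement script.

# Split on '-' and return field 2, as the cut call under discussion does.
second_field() {
  printf '%s\n' "$1" | cut -d'-' -f2
}

second_field "app-000046"                    # prints 000046
second_field "app-openshift-config-000046"   # prints openshift

# Assumption: the script then does arithmetic on that field under `set -u`.
# A non-numeric word is not a valid number, so the expansion fails; bash
# reports it as "openshift: unbound variable", matching the log above.
( set -u
  suffix=$(second_field "app-openshift-config-000046")
  echo $(( suffix + 1 ))
) || echo "arithmetic on the suffix failed"
```

With a single hyphen the second field happens to be the counter, which would explain why only index names containing extra hyphens break.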

What should I do? Do I need to notify someone else? On the OKD GitHub I was told to open an issue at https://issues.redhat.com/projects/LOG/, but I am not able to open one there.

@tucsolo
Author

tucsolo commented Mar 11, 2022

A possible solution is to replace the cut -d'-' -f2 command with grep -o '[0-9]\+$'

or maybe replace the entire line with shell parameter expansion:

$ echo $test
app-test-000342-aa-we-000044123

$ echo ${test%-*}
app-test-000342-aa-we

$ echo ${test##*-}
000044123
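Building on the two expansions above, the following is a sketch of how the next write index could be computed. The function name next_write_index, the numeric-suffix check, and the rule of keeping the original zero-padded width are assumptions for illustration, not the operator's code.

```shell
#!/bin/sh
# Hypothetical helper; the name and the padding rule are assumptions.
# The body runs in a subshell ( ... ) so its variables stay local.
next_write_index() (
  current=$1
  base=${current%-*}      # everything before the last '-'
  suffix=${current##*-}   # everything after the last '-'

  # Refuse names whose last segment is not purely numeric.
  case $suffix in
    ''|*[!0-9]*) echo "no numeric suffix in: $current" >&2; exit 1 ;;
  esac

  # Strip leading zeros so the counter is not parsed as octal.
  digits=${suffix#"${suffix%%[!0]*}"}
  next=$(( ${digits:-0} + 1 ))

  # Re-pad to the width of the original suffix.
  printf "%s-%0${#suffix}d\n" "$base" "$next"
)

next_write_index "app-openshift-config-000046"      # app-openshift-config-000047
next_write_index "app-test-000342-aa-we-000044123"  # app-test-000342-aa-we-000044124
```

Stripping the leading zeros matters because shell arithmetic treats a zero-prefixed number as octal, so a counter such as 000048 would otherwise be rejected.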

@tucsolo
Author

tucsolo commented May 19, 2022

Is there any update on this? We updated the operator to the latest version, but the issue is still there.

@btaani
Member

btaani commented May 19, 2022

Hi @tucsolo, thanks for reporting this.
We will take a look at the issue.

@xperimental
Contributor

The issue has been migrated to JIRA: https://issues.redhat.com/browse/LOG-2644

@xperimental
Contributor

/close

@openshift-ci openshift-ci bot closed this as completed May 30, 2022
@openshift-ci
Contributor

openshift-ci bot commented May 30, 2022

@xperimental: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tucsolo
Author

tucsolo commented May 30, 2022

Just so you know, I tried to open an issue on JIRA as someone else suggested, but there wasn't any way to do it.
