Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filebeat] aws-s3 input - fix integration test with AWS #33893

Merged

Conversation

andrewkroh
Copy link
Member

@andrewkroh andrewkroh commented Nov 30, 2022

What does this PR do?

[Filebeat] aws-s3 input - fix integration test with AWS

These tests verify the behavior by fetching metrics from the monitoring
registry. After the fix in #33259 the metrics were properly removed
from the registry after the input's Run() method returned. This meant
that the tests could not access the metrics.

The fix is to keep reference to the inputMetrics struct within the input
so that the tests can directly access the metrics without going through
the monitoring registry.

[Filebeat] aws-s3 - fix panic in aws integ test

The way the stub beat.Pipeline worked in the test code resulted
in a panic because the both the test code and input code was closing
the beat.Client that it produced. The test implementation had a Close()
method that closed a channel. And since you can only close a channel
once this caused a panic.

The solution is to use the fakePipeline which is simpler because it
produces a beat.Client implementation (ackClient) that only invokes
ACK() on published events and does not use any channels.

panic: close of closed channel

goroutine 62186 [running]:
github.com/elastic/beats/v7/libbeat/publisher/testing.(*ChanClient).Close(0xc0011e4bd0)
        /Users/akroh/code/beats/libbeat/publisher/testing/testing.go:71 +0x3c
github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsS3EventProcessor).processS3Events(0xc00042e700, {0x103dc78d8, 0xc00049b280}, 0xc000aae820, {0xc001050380, 0x318})
        /Users/akroh/code/beats/x-pack/filebeat/input/awss3/sqs_s3_event.go:327 +0x77c

Why is it important?

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Notes

I don't know how CI could have been passing unless the cloudAWS job is not working or not running.

On the last PR to change the package the tests were skipped. This means that -tags integration,aws were used, but the test could not find the terraform output file.

Screen Shot 2022-11-30 at 14 00 36

I checked the CI logs and terraform did run successfully. So checking the code that reads the terraform output file and I see some unusual code. I don't know why this was added. Could the test be looking in the wrong place?

_, filename, _, _ := runtime.Caller(0)
ymlData, err := ioutil.ReadFile(path.Join(path.Dir(filename), terraformOutputYML))

update 1: On the first run of this PR the test was skipped.

update 2: On the second run, after I removed the suspicious code above it still was skipped. This will need deeper investigation into what CI is doing.

update 3: @aspacca identified the cause to that the terraform local_file output is "stashed" before the Filebeat test runs. So the test cannot access the file that it needs to it skips itself. I have no idea how to fix that. Opening an issue to track that in #34003.

How to test this PR locally

cd x-pack/filebeat/input/awss3/_meta/terraform
terraform apply
cd ../.. # x-pack/filebeat/input/awss3
go test -tags=integration,aws -race -count=1 .

Related issues

These tests verify the behavior by fetching metrics from the monitoring
registry. After the fix in elastic#33259 the metrics were properly removed
from the registry after the input's `Run()` method returned. This meant
that the tests could not access the metrics.

The fix is to keep reference to the inputMetrics struct within the input
so that the tests can directly access the metrics without going through
the monitoring registry.
The way the stub beat.Pipeline worked in the test code resulted
in a panic because the both the test code and input code was closing
the beat.Client that it produced. The test implementation had a Close()
method that closed a channel. And since you can only close a channel
once this caused a panic.

The solution is to use the fakePipeline which is simpler because it
produces a beat.Client implementation (ackClient) that only invokes
ACK() on published events and does not use any channels.

panic: close of closed channel

goroutine 62186 [running]:
github.com/elastic/beats/v7/libbeat/publisher/testing.(*ChanClient).Close(0xc0011e4bd0)
        /Users/akroh/code/beats/libbeat/publisher/testing/testing.go:71 +0x3c
github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsS3EventProcessor).processS3Events(0xc00042e700, {0x103dc78d8, 0xc00049b280}, 0xc000aae820, {0xc001050380, 0x318})
        /Users/akroh/code/beats/x-pack/filebeat/input/awss3/sqs_s3_event.go:327 +0x77c
@andrewkroh andrewkroh added bug Filebeat Filebeat aws Enable builds in the CI for aws cloud testing backport-v8.6.0 Automated backport with mergify labels Nov 30, 2022
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 30, 2022
@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 30, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-01-24T02:47:42.707+0000

  • Duration: 132 min 16 sec

Test stats 🧪

Test Results
Failed 0
Passed 7775
Skipped 516
Total 8291

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@andrewkroh andrewkroh force-pushed the bugfix/fb/aws-s3-integ-test-metrics branch from 4f1d4cc to 611d60f Compare December 1, 2022 13:53
@andrewkroh andrewkroh marked this pull request as ready for review December 1, 2022 14:52
@andrewkroh andrewkroh requested a review from a team as a code owner December 1, 2022 14:52
@andrewkroh
Copy link
Member Author

/test

@aspacca
Copy link
Contributor

aspacca commented Dec 2, 2022

/test

aws integrations tests are skipped in CI because they need the terraform output yml.
but after the step in the CI job where terraform apply is run the content of the working directory is stashed, and in the next step, when the tests are run, the terraform output yml is not present anymore, ending up here:

if os.IsNotExist(err) {
t.Skipf("Run 'terraform apply' in %v to setup S3 and SQS for the test.", filepath.Dir(terraformOutputYML))
}

@andrewkroh
Copy link
Member Author

@elastic/observablt-robots Regarding what @aspacca discovered about "stashing", is there a way to allow these cloud tests that depend on a terraform local_file output to run on CI?

@amannocci
Copy link
Contributor

It should be possible to run these cloud tests that depend on terraform to run on CI by stashing the output file and unstashing it at the right moment before integration tests.
A similar approach has been achieved in other repositories.

@mergify
Copy link
Contributor

mergify bot commented Dec 8, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b bugfix/fb/aws-s3-integ-test-metrics upstream/bugfix/fb/aws-s3-integ-test-metrics
git merge upstream/main
git push upstream bugfix/fb/aws-s3-integ-test-metrics

@andrewkroh
Copy link
Member Author

I created a separate issue for the CI changes because I don't know how to fix it myself. These test fixes can merge independently of the CI fix.

@sonarcloud
Copy link

sonarcloud bot commented Dec 8, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
2.4% 2.4% Duplication

@mergify
Copy link
Contributor

mergify bot commented Dec 12, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b bugfix/fb/aws-s3-integ-test-metrics upstream/bugfix/fb/aws-s3-integ-test-metrics
git merge upstream/main
git push upstream bugfix/fb/aws-s3-integ-test-metrics

This merges the changes from main minus the conflicting changes
from 065307c in elastic#33559 which fixed the
tests in a manner that conflicts with these changes. That change also
revert previous enhancements to report metrics under the "inputs" dataset.
@aspacca
Copy link
Contributor

aspacca commented Jan 4, 2023

@andrewkroh integration tests were fixed on https://github.com/elastic/beats/pull/33559/files#diff-c6b63b9cdaac006aa7c2fc1a148f81c69102e7eefc57bcf0a36aec3220c6b86f

feel free to refactor my solution, that's far from to be elegant

@andrewkroh andrewkroh added the Team:Cloud-Monitoring Label for the Cloud Monitoring team label Jan 4, 2023
@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jan 4, 2023
Copy link
Contributor

@aspacca aspacca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewkroh the same has to be addressed in x-pack/filebeat/input/awscloudwatch/input_integration_test.go:142

Unregister metrics when the input stops. This ensures that when the input stops
the metrics are removed. This allows the input to start again without encountering
an issue that the metric ID is already registered.

As a result the integration test needs modified to still be able to read the metrics
after the input stops to allow it to runs some assertions. To accomplish this the
inputMetrics struct is made directly accessible to the tests so they don't need to
go through the monitoring registry.
@andrewkroh andrewkroh requested a review from a team as a code owner January 9, 2023 16:55
@andrewkroh andrewkroh requested review from cmacknz and fearful-symmetry and removed request for a team January 9, 2023 16:55
@andrewkroh andrewkroh force-pushed the bugfix/fb/aws-s3-integ-test-metrics branch from c429c87 to 9545759 Compare January 24, 2023 02:47
@andrewkroh andrewkroh added backport-skip Skip notification from the automated backport with mergify and removed backport-v8.6.0 Automated backport with mergify labels Jan 24, 2023
@andrewkroh andrewkroh merged commit c7d9c64 into elastic:main Jan 24, 2023
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
* [Filebeat] aws-s3 input - fix integration test with AWS

These tests verify the behavior by fetching metrics from the monitoring
registry. After the fix in #33259 the metrics were properly removed
from the registry after the input's `Run()` method returned. This meant
that the tests could not access the metrics.

The fix is to keep reference to the inputMetrics struct within the input
so that the tests can directly access the metrics without going through
the monitoring registry.

* [Filebeat] aws-s3 - fix panic in aws integ test

The way the stub beat.Pipeline worked in the test code resulted
in a panic because the both the test code and input code was closing
the beat.Client that it produced. The test implementation had a Close()
method that closed a channel. And since you can only close a channel
once this caused a panic.

The solution is to use the fakePipeline which is simpler because it
produces a beat.Client implementation (ackClient) that only invokes
ACK() on published events and does not use any channels.

panic: close of closed channel

goroutine 62186 [running]:
github.com/elastic/beats/v7/libbeat/publisher/testing.(*ChanClient).Close(0xc0011e4bd0)
        /Users/akroh/code/beats/libbeat/publisher/testing/testing.go:71 +0x3c
github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsS3EventProcessor).processS3Events(0xc00042e700, {0x103dc78d8, 0xc00049b280}, 0xc000aae820, {0xc001050380, 0x318})
        /Users/akroh/code/beats/x-pack/filebeat/input/awss3/sqs_s3_event.go:327 +0x77c

* Update terraform aws provider version to 4.46.0

* revert: fix integration tests from 33559

Revert 065307c

* aws-cloudwatch - unregister metrics on close

Unregister metrics when the input stops. This ensures that when the input stops
the metrics are removed. This allows the input to start again without encountering
an issue that the metric ID is already registered.

As a result the integration test needs modified to still be able to read the metrics
after the input stops to allow it to runs some assertions. To accomplish this the
inputMetrics struct is made directly accessible to the tests so they don't need to
go through the monitoring registry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Enable builds in the CI for aws cloud testing backport-skip Skip notification from the automated backport with mergify bug Filebeat Filebeat Team:Cloud-Monitoring Label for the Cloud Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants