Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jaeger-v2] add GRPC storage backend integration test #5259

Merged
merged 21 commits into from
Mar 16, 2024

Conversation

james-ryans
Copy link
Contributor

Which problem is this PR solving?

Description of the changes

  • Created a grpc-integration-test.sh script to run a jaeger remote storage and execute the end-to-end test through the OpenTelemetry Collector pipeline with the jaeger storage extension inside connected to the remote storage. To have a visualization of this architecture, see the proposal at [jaeger-v2] Storage backend integration tests #5254
  • Separate the GRPC and Badger integration test CI because GRPC need to be run twice for the v1 and v2 versions.

How was this change tested?

  • Run ./scripts/grpc-integration-test.sh latest and the whole remote storage and pipeline will be built and executed for you.
  • I also ran a jaeger-querycomponent to query traces from the remote storage for manual checks.
Screenshot 2024-03-08 at 09 58 13

Checklist

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Copy link

codecov bot commented Mar 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.06%. Comparing base (3912f00) to head (a5d395e).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5259   +/-   ##
=======================================
  Coverage   95.05%   95.06%           
=======================================
  Files         339      340    +1     
  Lines       16529    16597   +68     
=======================================
+ Hits        15712    15778   +66     
- Misses        629      630    +1     
- Partials      188      189    +1     
Flag Coverage Δ
badger 13.27% <ø> (?)
cassandra-3.x 26.47% <ø> (ø)
cassandra-4.x 26.47% <ø> (ø)
elasticsearch-5.x 21.73% <ø> (+0.01%) ⬆️
elasticsearch-6.x 21.73% <ø> (ø)
elasticsearch-7.x 21.81% <ø> (ø)
elasticsearch-8.x 21.88% <ø> (-0.02%) ⬇️
grpc 10.86% <95.71%> (?)
grpc-badger ?
kafka 14.63% <ø> (ø)
opensearch-1.x 21.81% <ø> (ø)
opensearch-2.x 21.79% <ø> (-0.02%) ⬇️
unittests 92.30% <7.14%> (-0.38%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
@james-ryans
Copy link
Contributor Author

Am I allowed to split the grpc-and-badger CI? I need to do this because the CIT gRPC requires a matrix for Jaeger v1 and v2. But the grpc-and-badger is stuck on Expected — Waiting for status to be reported. Is this something that needs to be tinkered with in the repository settings?

@james-ryans james-ryans marked this pull request as ready for review March 8, 2024 15:36
@james-ryans james-ryans requested a review from a team as a code owner March 8, 2024 15:36
Makefile Show resolved Hide resolved
cmd/jaeger/integration/fixtures/grpc_config.yaml Outdated Show resolved Hide resolved
scripts/grpc-integration-test.sh Show resolved Hide resolved
scripts/grpc-integration-test.sh Show resolved Hide resolved
"github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed"
"github.com/stretchr/testify/require"

"github.com/jaegertracing/jaeger/cmd/jaeger/integration/datareceivers"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you foresee other data receivers? If not we could define this directly in integration pkg since they are closely coupled. And we could also move integration pkg under internal so that you can call components() function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you foresee other data receivers?

We will need another data receivers for Kafka, I've created one at #4971. Actually, I've just read your comment at Kafka data receiver that you prefer this package to be moved under "$root/internal/cit/". Is that still relevant?

we could also move integration pkg under internal so that you can call components() function

I've just tried that to move cmd/jaeger/integration to cmd/jaeger/internal/integration but I can't call the components() function when it is in private. Am I miss understood something here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we'd still need Components as public, I suppose. Although another alternative is to define your own components for integration tests, as you probably won't need everything from production build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I just felt like making the Components function public, since it will still be kept private within the internal directory. Defining a different components function means maintaining two similar functions with different environments, which could lead to oversight in the future (probably? this is only my thought process).

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
@james-ryans
Copy link
Contributor Author

james-ryans commented Mar 11, 2024

The unit tests CI is failing due to jaeger storage extension goroutine leak but I still cannot find the cause. It's so hard to replicate, I can't even reproduce the error once.

goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 96 in state chan receive, with net.(*Resolver).lookupIPAddr.func2 on top of the stack:
net.(*Resolver).lookupIPAddr.func2(0xc0001107e0, 0x225ebb0)
	/opt/hostedtoolcache/go/1.22.0/x64/src/net/lookup.go:338 +0x31
created by net.(*Resolver).lookupIPAddr in goroutine 57
	/opt/hostedtoolcache/go/1.22.0/x64/src/net/lookup.go:353 +0x7dc
 Goroutine 58 in state chan receive, with net.(*Resolver).goLookupIPCNAMEOrder.func4 on top of the stack:
 ...
 Goroutine 60 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
 ...
]
FAIL	github.com/jaegertracing/jaeger/cmd/jaeger/internal/extension/jaegerstorage	6.670s

@yurishkuro yurishkuro added the changelog:ci Change related to continuous integration / testing label Mar 12, 2024
…twork

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
@james-ryans
Copy link
Contributor Author

Not sure if this actually solves the issue but I've changed the TestESStorageExtensionError at extension_test.go to connect to http://127.0.0.1:65535 instead of http://badurl to avoid connecting to an external network, which caused the goroutine leaks of looking for badurl IP addr.

@yurishkuro
Copy link
Member

Not sure if this actually solves the issue but I've changed ...

This may cause the execution to go through a different code path and reduce code coverage (just guessing).

@james-ryans
Copy link
Contributor Author

Not sure if this actually solves the issue but I've changed ...

This may cause the execution to go through a different code path and reduce code coverage (just guessing).

Hmm.. from my understanding, I believe this unit test is to cover this line of code, which both http://badurl and http://localhost:65535 failed to initialize the storage due to no elasticsearch server hosted at that URL.

if err != nil {
return fmt.Errorf("failed to initialize %s storage %s: %w", s.storageKind, name, err)
}

And also, to back up my argument, I've done test coverage locally specific only to the TestESStorageExtensionError unit test, and both are exactly the same.

Or do you have any suggestions on how to fix this? Perhaps I'm doing it wrong. Because I've looked into another similar issue https://gitlab.com/gitlab-org/gitaly/-/issues/4916 and they fixed it by changing the URL to connect to localhost in the unit test too.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with badurl change

cmd/jaeger/internal/integration/grpc_test.go Show resolved Hide resolved
james-ryans and others added 4 commits March 15, 2024 19:49
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
@james-ryans
Copy link
Contributor Author

I've made some adjustments:

  • Refactor to remove the boilerplate of testbed codes and now we just need to pass the config file
  • I also implemented to automatically create the data receiver from that config file since the data receiver needs to host the storage extension exactly as the collector.
  • Fixed unit test goroutine leaking at /plugin/storage/es similar to the previous issue, instead this time fails at FAIL: TestPasswordFromFile because it contains individual goleak check which detected goroutine leaking exactly from the previous test at TestESStorageFactoryWithConfigError.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jaeger-ui sub module should not be changing in this PR, right?

)

func TestGRPCStorage(t *testing.T) {
if os.Getenv("STORAGE") != "jaeger_grpc" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why do you want it to have jaeger prefix? What other possibilities are there? Would es storage also have jaeger prefix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, now that you mention it, we don't need the jaeger_ prefix. I thought we might need it to differentiate the v1 and v2 envvar so they wouldn't collide, but they are already separated by the Makefile target. V1 calls storage-integration-test, and v2 calls jaeger-storage-integration-test. I'll remove the prefix.

func TestGRPCStorage(t *testing.T) {
if os.Getenv("STORAGE") != "jaeger_grpc" {
t.Skip("Integration test against Jaeger-V2 GRPC skipped; set STORAGE env var to jaeger_grpc to run this")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you foresee NOT using envvar to enable/skip storage tests? If it's always going to be used we could move the check to s.Test().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think it's always going to be used. Will move it.

// Hacky way to get the storage extension name.
// This way we don't need to modify this,
// when a new storage backend is added.
name := ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain what you are trying to do. Can we not create the component always with the same name for integration tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the architecture at issue #5254, I've designed that the data receiver will utilize the artificial jaeger receiver to pull the traces from the database which is through jaeger storage extension. Because of that, we need to host another jaeger storage extension that lives in the data receiver, and the jaeger receiver will point to that extension.

So, since the jaeger storage extension in the data receiver is a duplication from the collector's extension, I unmarshal the extension config to be used for the data receiver to host it. But now the jaeger receiver needs to know the name of that storage to be set at TraceStorage, so I need to scrape it through the config whether it is in Memory/Badger/GRPC/Elasticsearch.

But while explaining, I've thought of another easier way to set the jaeger receiver TraceStorage. Which is I just need to get it from the exporter. I'm changing the implementation right now and add the explanation to newDataReceiver function.

@james-ryans
Copy link
Contributor Author

jaeger-ui sub module should not be changing in this PR, right?

This is because I merged this change #5255 from main into this branch

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
james-ryans and others added 3 commits March 16, 2024 13:03
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to fix jaeger-ui:

  • git checkout main
  • git submodule update
  • git checkout your-branch
  • git status (should show jaeger-ui as changed)
  • git commit -s


config, err := os.ReadFile(s.ConfigFile)
require.NoError(t, err)
log.Println(string(config))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.Log

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You used to have all kinds of diagrams explaining the set up of these integration tests, where are those? I only see one README, which is quite short.

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Signed-off-by: James Ryans <james.ryans2012@gmail.com>
@james-ryans
Copy link
Contributor Author

You used to have all kinds of diagrams explaining the set up of these integration tests, where are those? I only see one README, which is quite short.

I forgot to add the README but I have now added it. Currently, there's only one diagram for the general architecture of the integration test. For additional diagrams, I've referred to the Testbed README since they have their own diagram and another diagram will be included with the Jaeger-v2 Kafka PR.

@yurishkuro yurishkuro merged commit b82c806 into jaegertracing:main Mar 16, 2024
37 checks passed
@yurishkuro
Copy link
Member

@james-ryans can we add goleaks check?

🔴 Error(check-goleak): no goleak check in package ./cmd/jaeger/internal/integration/

@james-ryans
Copy link
Contributor Author

@james-ryans can we add goleaks check?

🔴 Error(check-goleak): no goleak check in package ./cmd/jaeger/internal/integration/

Yes, but apparently it has a goroutine leaking from the testbed. There's no functionality to close the data sender's gRPC connection, I'll try to solve the upstream issue first then get back here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog:ci Change related to continuous integration / testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants