Skip to content

Conversation

austin-denoble
Copy link
Contributor

@austin-denoble austin-denoble commented Mar 13, 2024

Problem

We've had continued issues with flakiness of integration tests, and tests taking a while to run (25 - 30 minutes for the entire pr workflow in some cases). A lot of this is waiting for indexes and collections to be Ready instead of Initializing. Specifically, the CollectionTest and CollectionErrorTest can take a really long time due to setting up and waiting on multiple indexes and collections.

While working on the above issue, I also noticed test failures and repeated issues due to findIndexWithDimensionAndType, and tests running with indexes created by and belonging to other test runs. The isIndexReady function also tends to be inefficient as the polling isn't consistent, and we can end up waiting sometimes minutes longer than we may need to.

With the new Pinecone and Index/AsyncIndex class structure in place, we can refactor our integration tests around the new pattern.

Solution

Ultimately, I would like to move our tests away from the createIndexIfNotExistsDataPlane / createIndexIfNotExistsControlPlane functions and their reliance on the findIndexWithDimensionAndType function, which should be deprecated. We should come up with a pattern for sharing resources across all tests in a run, and setting up / tearing those down once. This can be worked on in subsequent PRs.

  • Move CollectionErrorTests tests into CollectionTest. Share index and collection setup @BeforeAll across collections tests. Wait less time for the indexes created from the collection to be ready as this specifically can take a number of minutes. Relax some of the assertions on the status of created collections and indexes as I don't think we need to be that thorough here.
  • Update createIndexIfNotExistsDataPlane to return a tuple (AbstractMap.SimpleEntry<String, Pinecone>) of the indexName and the Pinecone client to make things a bit easier to work with.
  • Update all of the dataPlane/ tests to use pineconeClient.createIndexConnection() and pineconeClient.createAsyncIndexConnection() for managing data plane operations.
  • Deprecate isIndexReady in lieu of waitUntilIndexIsReady. The polling is more consistent with this function, and it most likely saves us time overall.
  • Refactor ConfigureIndexTest to create its own index and clean up after.
  • Fix issue with findIndexByDimensionAndType calling isIndexReady() on each index while iterating through the list.
  • Fix issue in IndexInterface.validateUpsertRequest where we were trying to call sparseValuesWithUnsignedIndices.getIndicesWithUnsigned32IntList() and sparseValuesWithUnsignedIndices.getValuesList() on a possible null causing a NullPointerException.
  • Talked with @ssmith-pc, and I think it's standard practice in tests to not try/catch yourself unless you need to assert on the result. We should be letting errors throw to the test runner and let it handle them so we're not clobbering logs and stack traces. I've cleaned up try/catch statements which don't seem to be needed.
  • Running gradle integrationTest --info in pr.yml to get more detailed log output in the console for better troubleshooting of ongoing flapping.
  • Adding assertWithRetry wrappers for specific actions which have been troublesome. Adding Thread.sleep() to a few places to avoid hammering an index too quickly / etc.
  • Added a new describeIndexStats() overload to IndexInterface and Index / AsyncIndex to allow calling without needing to explicitly pass null.

I spent a lot of time running these locally and in CI to see how they perform. Overall, it seems like these changes improve overall reliability, although we still do see a few failures like the gRPC no healthy upstream on data plane operations with fresh indexes.

The total amount of time it takes to run both sets of integration tests has been cut significantly in most cases:

Screenshot 2024-03-13 at 6 09 36 PM

Next steps would be to create a Junit extension and possible IndexManagerSingleton to manage index resources across all tests directly. This would help make things more predictable and reliable when adding future tests. This would also allow us to handle concurrent integration test runs under the same API key, which is currently very difficult due to findIndexByDimensionAndType.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • Infrastructure change (CI configs, etc)

Test Plan

Run integration tests and compare over run length and

@austin-denoble austin-denoble changed the title Tweak Collection and Configure Index tests again Refactor CollectionTest and ConfigureIndexTest to increase speed and reliability Mar 14, 2024
@austin-denoble austin-denoble force-pushed the adenoble/tweak-collection-and-configure-index-tests branch from 47513a6 to 5e1fef3 Compare March 14, 2024 21:21
@austin-denoble austin-denoble changed the title Refactor CollectionTest and ConfigureIndexTest to increase speed and reliability Refactor CollectionTest, ConfigureIndexTest, and IndexManager to improve integration test speed and reliability Mar 16, 2024
Copy link
Contributor

@ssmith-pc ssmith-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general these changes look good and should improve the reliability and speed of the test runs.

I added some various test structure comments throughout

Copy link
Contributor

@rohanshah18 rohanshah18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for going through all of the changes. Looks like the collections tests are still failing. Also updating the CI to output more logs will add more noise and its not easily clear what tests are failing. I think having the short summary of what assertions failed with line numbers is cleaner and we can reproduce them locally but if this is helpful to you, we can keep add more info temporarily.

@austin-denoble
Copy link
Contributor Author

Looks like the collections tests are still failing. Also updating the CI to output more logs will add more noise and its not easily clear what tests are failing. I think having the short summary of what assertions failed with line numbers is cleaner and we can reproduce them locally but if this is helpful to you, we can keep add more info temporarily.

I left a comment about this, but I don't think running locally is a substitute for debugging what is happening in CI. We should be able to look at the CI logs, and have a somewhat clear idea what the issue is. You can pretty easily search through the logs for FAILED to see which specific tests are failing.

@austin-denoble austin-denoble merged commit c79c2ee into main Mar 18, 2024
@austin-denoble austin-denoble deleted the adenoble/tweak-collection-and-configure-index-tests branch March 18, 2024 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants