Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In-place update of Cosmos DB geo_location #4315

Closed

Conversation

mikhailshilkov
Copy link
Contributor

Fixes #3532
Removes ForceNew from location property of geo_location

@ghost ghost added the size/XS label Sep 13, 2019
Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pr @mikhailshilkov,

Could we add a test to verify this works and prevent regressions? thanks

@mikhailshilkov
Copy link
Contributor Author

@katbyte Thanks for a quick response!

There are several tests that poke with locations and validate the result:

  • TestAbcAzureRMCosmosDBAccount_updatePropertiesAndLocation
  • TestAccAzureRMCosmosDBAccount_geoReplicated_add_remove
  • TestAccAzureRMCosmosDBAccount_geoReplicated_rename
  • TestAccAzureRMCosmosDBAccount_complete

Do you think that's not enough to prevent regression? If so, do you have a suggestion for scenarios of the extra test(s)?

Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mikhailshilkov,

I looked at TestAbcAzureRMCosmosDBAccount_updatePropertiesAndLocation as that would be the relevent test and it doesn't appear to actually change the location! Could we update it so it is, as well as update the documentation to reflect that changing the property no longer forces a new resource?

@mikhailshilkov
Copy link
Contributor Author

@katbyte I'm a bit confused here. My change is related to geo_location sub-resource, not the top-level location property.

The mentioned test does add the second geo_location. Here is the first configuration:

resource "azurerm_resource_group" "test" {
          name     = "acctestRG-190917093538081830"
          location = ""
        }

        resource "azurerm_cosmosdb_account" "test" {
          name                = "acctest-190917093538081830"
          location            = "${azurerm_resource_group.test.location}"
          resource_group_name = "${azurerm_resource_group.test.name}"
          offer_type          = "Standard"

          consistency_policy {
            consistency_level = "Session"


          }

          geo_location {
            location          = "${azurerm_resource_group.test.location}"
            failover_priority = 0
          }


        }

here is the second one:

resource "azurerm_resource_group" "test" {
          name     = "acctestRG-190917093538081830"
          location = ""
        }

        resource "azurerm_cosmosdb_account" "test" {
          name                = "acctest-190917093538081830"
          location            = "${azurerm_resource_group.test.location}"
          resource_group_name = "${azurerm_resource_group.test.name}"
          offer_type          = "Standard"

          consistency_policy {
            consistency_level = "BoundedStaleness"


                max_interval_in_seconds = 373
                max_staleness_prefix    = 100001

          }

          geo_location {
            location          = "${azurerm_resource_group.test.location}"
            failover_priority = 0
          }


                geo_location {
                    prefix            = "acctest-190917093538081830-custom-id"
                    location          = "westus"
                    failover_priority = 1
                }


        }

The doc doesn't mention that geo_location causes a replace either (while it does cause a replace in the current master).

@ghost ghost removed the waiting-response label Sep 17, 2019
@mikhailshilkov
Copy link
Contributor Author

@katbyte Were you able to take a look at my last comment? I'd love to finish this and make it into 1.35.

@tombuildsstuff tombuildsstuff modified the milestones: v1.35.0, v1.36.0 Oct 2, 2019
Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mikhailshilkov,

Sorry for the delay in looping back, I was pulled away from PRs for the last little while and am now trying to catch up. You are correct i mistakenly thought you were talking about the resource location, sorry about that i was trying to review as many as i could then.

The existing tests work with the proposed changes however this resource was written with the assumption changing that property would force a new resource because at the time i don't think the API could handle such requests. So the tests don't completely cover all situations, ie changing the location of a geo_location or the failover priority.

I tried to add a new test covering that but i can't push to you branch:

func TestAccAzureRMCosmosDBAccount_geoReplicated_changeLocation(t *testing.T) {
	ri := tf.AccRandTimeInt()
	resourceName := "azurerm_cosmosdb_account.test"

	resource.ParallelTest(t, resource.TestCase{
		PreCheck:     func() { testAccPreCheck(t) },
		Providers:    testAccProviders,
		CheckDestroy: testCheckAzureRMCosmosDBAccountDestroy,
		Steps: []resource.TestStep{
			{
				Config: testAccAzureRMCosmosDBAccount_geoReplicated(ri, testLocation(), testAltLocation()),
				Check:  checkAccAzureRMCosmosDBAccount_basic(resourceName, testLocation(), string(documentdb.BoundedStaleness), 2),
			},
			{
				Config: testAccAzureRMCosmosDBAccount_geoReplicated(ri, testLocation(), testAltLocation2()),
				Check:  checkAccAzureRMCosmosDBAccount_basic(resourceName, testLocation(), string(documentdb.BoundedStaleness), 2),
			},
			{
				ResourceName:      resourceName,
				ImportState:       true,
				ImportStateVerify: true,
			},
		},
	})
}

Would you mind adding that & a test that adds two replicated locations and then swaps their priority? (that one may exist but i'm not sure)

thanks!

@ghost ghost added size/M and removed size/XS labels Oct 9, 2019
@mikhailshilkov
Copy link
Contributor Author

mikhailshilkov commented Oct 9, 2019

Hey @katbyte thank you for coming back and for your test suggestions.

I added your test + a test with three locations and swapping the priorities of the last two. The first location's priority is hard-coded deep inside the chain, so I didn't dare to touch it. Please take a look.

@ghost ghost removed the waiting-response label Oct 9, 2019
Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for those new tests @mikhailshilkov,

I tried to verify they pass in our CI and i am getting the following errors:

------- Stdout: -------
=== RUN   TestAccAzureRMCosmosDBAccount_geoReplicated_changeLocation
=== PAUSE TestAccAzureRMCosmosDBAccount_geoReplicated_changeLocation
=== CONT  TestAccAzureRMCosmosDBAccount_geoReplicated_changeLocation
--- FAIL: TestAccAzureRMCosmosDBAccount_geoReplicated_changeLocation (2314.40s)
    testing.go:569: Step 1 error: Check failed: Check 10/16 error: azurerm_cosmosdb_account.test: Attribute 'read_endpoints.#' expected "2", got "1"
    testing.go:630: Error destroying resource! WARNING: Dangling resources
        may exist. The full state and error is shown below.
        
        Error: errors during apply: Error issuing AzureRM delete request for CosmosDB Account 'acctest-191010034311487224': documentdb.DatabaseAccountsClient#Delete: Failure sending request: StatusCode=412 -- Original Error: Code="PreconditionFailed" Message="There is already an operation in progress which requires exlusive lock on this service acctest-191010034311487224. Please retry the operation after sometime.\r\nActivityId: d2f4c128-5521-4550-9686-a791407b0b16, Microsoft.Azure.Documents.Common/2.7.0"

and

=== RUN   TestAccAzureRMCosmosDBAccount_geoReplicated_swapPriority
=== PAUSE TestAccAzureRMCosmosDBAccount_geoReplicated_swapPriority
=== CONT  TestAccAzureRMCosmosDBAccount_geoReplicated_swapPriority
--- FAIL: TestAccAzureRMCosmosDBAccount_geoReplicated_swapPriority (3685.57s)
    testing.go:630: Error destroying resource! WARNING: Dangling resources
        may exist. The full state and error is shown below.
        
        Error: errors during apply: Error issuing AzureRM delete request for CosmosDB Account 'acctest-191010034311531053': documentdb.DatabaseAccountsClient#Delete: Failure sending request: StatusCode=412 -- Original Error: Code="PreconditionFailed" Message="There is already an operation in progress which requires exlusive lock on this service acctest-191010034311531053. Please retry the operation after sometime.\r\nActivityId: 53cfbf65-3ca1-4a38-9f8a-75fa70b8205d, Microsoft.Azure.Documents.Common/2.7.0"

so some additional changes may be required.

@mikhailshilkov
Copy link
Contributor Author

@katbyte Did you run all tests and only the two new ones failed? Or you only ran two?

Based on the error, it sounds like we didn't wait for the update operation to complete properly.

@ghost ghost removed the waiting-response label Oct 10, 2019
@mikhailshilkov
Copy link
Contributor Author

mikhailshilkov commented Oct 10, 2019

Yes, I set up the infrastructure to test locally and can reproduce the failure.

In case of TestAccAzureRMCosmosDBAccount_geoReplicated_changeLocation, the update code runs two operation: scale down to 1 "old" region, then scale to two "new" regions. The second operation insta-succeeds while Cosmos is still in progress of actually updating. Subsequently, the delete operation fails.

Do you remember the reason why we have this 2-step account change with old locations and new locations? That was #1055 of yours. Should I try removing it? Or shall we adjust the readiness check to check the number of regions?

@katbyte
Copy link
Collaborator

katbyte commented Oct 15, 2019

@mikenomitch,

Honestly i don't recall, most likely it was because at the time the API didn't support a single step change. I would try removing it and seeing if it is smart enough now to figure it out (hence why i believe this property used to be force new, at the time it didn't work). the API has changed a fair amount in the last year and half.

@tombuildsstuff tombuildsstuff removed this from the v1.36.0 milestone Oct 22, 2019
@tombuildsstuff tombuildsstuff added this to the v1.37.0 milestone Oct 22, 2019
@katbyte katbyte removed this from the v1.37.0 milestone Oct 31, 2019
@katbyte
Copy link
Collaborator

katbyte commented Dec 4, 2019

Hi @mikhailshilkov,

Are you still working on this? I am looking into finishing this up and am unable to push to your branch so would need to open a new PR.

@mikhailshilkov
Copy link
Contributor Author

@katbyte I wasn't able to find the right combination to make this reliable, and then switched to something else... It would be awesome if you could do a better job.
I have "Allow edits from maintainers" on, let me know if I can grant you access in another way. Otherwise, a new PR should be fine.

@ghost ghost removed the waiting-response label Dec 4, 2019
@katbyte
Copy link
Collaborator

katbyte commented Dec 4, 2019

Still can't push to the branch, i'll see what i can do and open a new PR if when i'm ready i still can't.

@katbyte
Copy link
Collaborator

katbyte commented Dec 17, 2019

Hi @mikhailshilkov,

I wasn't able to get this working reliably either and am going to move on to other PRs. As it appears it may not be possible to reliably do this i am going to close the PR until we can figure out a solution that works in all cases. Thanks again for trying!

@katbyte katbyte closed this Dec 17, 2019
@ghost
Copy link

ghost commented Mar 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@hashicorp hashicorp locked and limited conversation to collaborators Mar 29, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding an additional geo_location to an azurerm_cosmosdb_account should not require replacement
3 participants