Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add public IP option to AzureML Computes #21377

Merged
merged 22 commits into from May 31, 2023

Conversation

Lucasjuv
Copy link
Contributor

@Lucasjuv Lucasjuv commented Apr 11, 2023

Trying to address issue #20078.

I've added the enable_node_public_ip option to AzureML computes to be able to provision secure environments.

Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pr @Lucasjuv - I think we'll need to read this back into state in the read function? aside from that i've left to comments inline and once address i think this will be good to test and merge

Copy link
Member

@stephybun stephybun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Property names need to be updated in the tests

@stephybun
Copy link
Member

@Lucasjuv I updated the properties in the tests hoping that'd be enough to get this in to this week's release but unfortunately there are still test failures:

------- Stdout: -------
=== RUN   TestAccComputeInstance_complete
=== PAUSE TestAccComputeInstance_complete
=== CONT  TestAccComputeInstance_complete
testcase.go:110: Step 1/2 error: Error running apply: exit status 1
Error: creating Machine Learning Compute ("Compute (Subscription: \"*******\"\nResource Group Name: \"acctestRG-ml-230419131828695174\"\nWorkspace Name: \"acctest-MLW2304191318286951\"\nCompute Name: \"acctest23041974\")"): machinelearningcomputes.MachineLearningComputesClient#ComputeCreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="UserError" Message="Provided EnableNodePublicIp 'False' requires workspace private endpoint." Details=[]
with azurerm_machine_learning_compute_instance.test,
on terraform_plugin_test.tf line 106, in resource "azurerm_machine_learning_compute_instance" "test":
106: resource "azurerm_machine_learning_compute_instance" "test" {
--- FAIL: TestAccComputeInstance_complete (365.64s)
FAIL

@Lucasjuv
Copy link
Contributor Author

@stephybun I managed to run the acceptance tests locally but they are hanging once the compute instance is created successfully without public the public IP. I'll try to run the test for the cluster.

@Lucasjuv
Copy link
Contributor Author

I've tried testing the cluster and it complains about vCPU quotas. Even though I have nothing else on the subscription. The Compute Instance test still hangs. I've noticed that on some cases the public access on the workspace is disabled. That might be be due to me adding the private endpoint. I've also tried opening the nsg rules to access the workspace and take a look at the computes on the studio.

@irakhlin
Copy link

@Lucasjuv any progress on this getting merged in? I was about to push up a commit with these same changes but then found this open pull request. I notice you were having issues with your tests not finishing when no public IP is set with a private endpoint, I think the issue is related to not including a private DNS zone. I believe when deploy azure ML workspace on a private endpoint you will need to add a private DNS zone for ML that is linked to the vnet.
https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-dns?view=azureml-api-2
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-private-link?view=azureml-api-2&tabs=cli#create-a-workspace-that-uses-a-private-endpoint

@irakhlin
Copy link

@stephybun now that tests are passing what would be needed to have this merged in for the next release?

@Lucasjuv
Copy link
Contributor Author

Related PR: #21706

@Lucasjuv
Copy link
Contributor Author

@stephybun Acceptance Tests are passing.

Copy link
Member

@mbfrahry mbfrahry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Lucasjuv, I've taken a pass at this PR and it's got breaking changes the way it is written. I've detailed what is going on below. We'll most likely want to remove the Default value and just have people pass it in when they want to use it.

@Lucasjuv
Copy link
Contributor Author

Hey @Lucasjuv, I've taken a pass at this PR and it's got breaking changes the way it is written. I've detailed what is going on below. We'll most likely want to remove the Default value and just have people pass it in when they want to use it.

I think that the current API version is indeed not returning the property for the compute instance. I didn't test the cluster yet but when we get that API version update it might work. If not I'll have to open a bug on azure's API spec repo.

@Lucasjuv Lucasjuv closed this May 25, 2023
@Lucasjuv
Copy link
Contributor Author

Lucasjuv commented May 29, 2023

After some testing I discovered that the API indeed does not return the value for the public IP on compute instances but It does for the compute cluster so I removed the compute instance changes from this PR.

@stephybun
Copy link
Member

stephybun commented May 30, 2023

@Lucasjuv unfortunately I see the same behaviour mentioned by @mbfrahry on compute clusters. If the value hasn't previously been set then it isn't returned by the API so the current implementation would be a breaking change for existing clusters.

@ms-henglu the default value defined in the swagger is for version 2023-04-01 so it looks like we should probably upgrade the SDK before adding support for this - could you take a look into doing this?

@ms-henglu
Copy link
Contributor

Hi @stephybun ,

My bad, I attached the wrong link, this default value is defined in 2022-05-01 as well, this is the api-version azurerm is using.
https://github.com/Azure/azure-rest-api-specs/blob/main/specification/machinelearningservices/resource-manager/Microsoft.MachineLearningServices/stable/2022-05-01/machineLearningServices.json#L3530

@Lucasjuv
Copy link
Contributor Author

Lucasjuv commented May 31, 2023

@Lucasjuv unfortunately I see the same behaviour mentioned by @mbfrahry on compute clusters. If the value hasn't previously been set then it isn't returned by the API so the current implementation would be a breaking change for existing clusters.

@ms-henglu the default value defined in the swagger is for version 2023-04-01 so it looks like we should probably upgrade the SDK before adding support for this - could you take a look into doing this?

If you look at the code right now the read function assumes the property true if it is not received. The basic and complete acceptance tests are working so the property is returned in case it is set to false. It is important to note that in the API the property for the CC is in a different object called 'AMLCompute'. This shouldn`t be a breaking change for existing clusters.

Copy link
Member

@stephybun stephybun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies - it appears I was looking at an outdated version of the PR. I made a minor change to remove the RequiredWith and to add validation in the create to check that subnet_resource_id is set when node_public_ip_enabled is false since otherwise subnet_resource_id will always be required even when node_public_ip_enabled is set to true.

Other than that LGTM 🎋

@stephybun stephybun merged commit 3e5769f into hashicorp:main May 31, 2023
14 checks passed
@github-actions github-actions bot added this to the v3.59.0 milestone May 31, 2023
stephybun added a commit that referenced this pull request May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants