epomatti/azure-machinelearning-cm-vnet

Azure ML VNET

Implementation of AML network isolation with a customer-managed VNET.

Setup

Create the variables file:

cp config/template.tfvars .auto.tfvars

Configuration:

  1. Set your IP address in the allowed_ip_address variable.
  2. Set your Entra ID tenant domain in the entraid_tenant_domain variable.
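Both values live in .auto.tfvars, so the edit can also be scripted. Below is a minimal sketch that assumes both variables are plain HCL strings in config/template.tfvars (not verified here). set_tfvar is a hypothetical helper and is not part of the repository. It rewrites an existing assignment rather than appending a second one.

```shell
# set_tfvar FILE KEY HCL_VALUE
# Replaces an existing "KEY = ..." line in a .tfvars file, or appends one if
# KEY is not present. Duplicate assignments of KEY collapse into one line.
# HCL_VALUE is written verbatim, so strings need their own double quotes,
# e.g. '"203.0.113.10"'.
set_tfvar() {
  file="$1"
  key="$2"
  value="$3"
  tmp="$(mktemp)" || return 1
  awk -v key="$key" -v value="$value" '
    # Match the key at the start of a line, followed by optional blanks and "=".
    $0 ~ ("^[ \t]*" key "[ \t]*=") {
      if (!done) print key " = " value
      done = 1
      next
    }
    { print }
    END { if (!done) print key " = " value }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage, with 203.0.113.10 as a documentation placeholder for your own public IP: `set_tfvar .auto.tfvars allowed_ip_address '"203.0.113.10"'`.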

Generate a key pair to manage instances with SSH:

mkdir keys
ssh-keygen -f keys/ssh_key

Tip

To allow public network access to the AML workspace, set mlw_public_network_access_enabled = true.

Create the resources:

terraform init
terraform apply -auto-approve

Approve any pending private endpoint connections, both in the subscription and within the managed AML workspace.

Manually create the datastores in AML and run the test notebooks.

Compute

Create the AML compute and other resources by changing the appropriate flags:

Note

Follow the documentation steps to enable AKS VNET integration, if you have not done so yet.

mlw_instance_create_flag = true
mlw_aks_create_flag      = true
mlw_mssql_create_flag    = true
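The flags can also be switched on from the shell. This is a minimal sketch that assumes the flags are top-level boolean assignments in .auto.tfvars. enable_flags is a hypothetical helper and is not part of the repository. It rewrites existing lines in place instead of appending blindly, because Terraform reports an error when a variables file assigns the same variable twice.

```shell
# enable_flags FILE FLAG...
# Sets every FLAG to true in FILE: existing "FLAG = ..." lines are rewritten
# in place and missing flags are appended, so re-running is harmless.
enable_flags() {
  file="$1"
  shift
  tmp="$(mktemp)" || return 1
  awk -v flags="$*" '
    BEGIN {
      n = split(flags, list, " ")
      for (i = 1; i <= n; i++) wanted[list[i]] = 1
    }
    {
      # Attribute name: text left of the first "=", without surrounding blanks.
      name = $0
      sub(/[ \t]*=.*/, "", name)
      sub(/^[ \t]*/, "", name)
      if (index($0, "=") > 0 && (name in wanted)) {
        print name " = true"
        seen[name] = 1
        next
      }
      print
    }
    END {
      # Append any requested flag that was not already in the file.
      for (i = 1; i <= n; i++)
        if (!(list[i] in seen)) { print list[i] " = true"; seen[list[i]] = 1 }
    }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage: `enable_flags .auto.tfvars mlw_instance_create_flag mlw_aks_create_flag mlw_mssql_create_flag`, followed by `terraform apply`. The same call works for firewall_create_flag and vm_proxy_create_flag.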

Container Registry

Extra configuration is required when using a Container Registry with private endpoints.

After creating the compute cluster, follow the documentation to enable Docker image builds in AML:

az ml workspace update --name myworkspace --resource-group myresourcegroup --image-build-compute mycomputecluster

IAM

This project uses two roles, each requiring a different set of permissions:

| User          | Activities                                     |
|---------------|------------------------------------------------|
| azureadmin    | Administration of all related Azure resources. |
| datascientist | Development in the AML workspace.              |

Firewall

To demonstrate protection against data exfiltration, this exercise implements Azure Firewall. The requirements for this design are documented in the Configure inbound and outbound network traffic article.

Important

Additional steps for hardening the data exfiltration protection are available in the Azure Machine Learning data exfiltration prevention documentation.

Set the flag to enable the Azure Firewall resources and apply the infrastructure:

firewall_create_flag = true

This will create the firewall, policies, rules, routes, and other resources.

Tip

You can also get the list of required hosts and ports by following this guideline.

Forward Proxy

Caution

It was not possible to configure a forward proxy at instance creation (with a creation script) when deploying to an isolated virtual network: the provisioning procedure appears to override the proxy configuration set by the startup script. The only architecture that Microsoft officially supports for egress with network isolation appears to be a firewall.

Enable Proxy

Set the proxy flag to true:

vm_proxy_create_flag = true

Configure the compute instance with the sample file custom/instance-proxy-init.sh.

The proxy connection is configured at initialization, following the proxy documentation.
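The contents of custom/instance-proxy-init.sh are not reproduced here. The function below is a hypothetical sketch of one thing such an init step can do: write the standard proxy variables to an environment file. The function name, the use of /etc/environment, and the no_proxy defaults are all assumptions, and the repository's script may differ.

```shell
# write_proxy_env FILE PROXY_URL [NO_PROXY]
# Hypothetical sketch: (re)writes proxy variables into an environment file
# such as /etc/environment. Both lower- and upper-case names are written
# because tools differ in which spelling they read.
write_proxy_env() {
  file="$1"
  proxy_url="$2"
  # Assumed default exclusions: loopback, the Azure Instance Metadata Service
  # (169.254.169.254) and the Azure platform address (168.63.129.16).
  exclusions="${3:-localhost,127.0.0.1,169.254.169.254,168.63.129.16}"
  tmp="$(mktemp)" || return 1
  # Keep every existing line except previous proxy settings (any case).
  if [ -f "$file" ]; then
    grep -v -i -E '^(http_proxy|https_proxy|no_proxy)=' "$file" > "$tmp" || true
  fi
  {
    printf 'http_proxy=%s\nhttps_proxy=%s\nno_proxy=%s\n' "$proxy_url" "$proxy_url" "$exclusions"
    printf 'HTTP_PROXY=%s\nHTTPS_PROXY=%s\nNO_PROXY=%s\n' "$proxy_url" "$proxy_url" "$exclusions"
  } >> "$tmp"
  # Copy back instead of mv so the original file keeps its owner and mode.
  cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage, assuming the function is saved in a hypothetical proxy-env.sh: `sudo sh -c '. ./proxy-env.sh && write_proxy_env /etc/environment http://squid.private.litware.com:3128'`. /etc/environment is read at login, so only new sessions pick up the variables.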

Squid

Connect to the proxy VM server:

ssh -i keys/ssh_key azureuser@<public-ip>

Squid is already installed by cloud-init. If you need to make changes, check the official docs.

The configuration file is /etc/squid/squid.conf.

Set some hostname parameters:

visible_hostname squid.private.litware.com
hostname_aliases squid.private.litware.com

Change the http_access setting to allow all connections:

# http_access deny !Safe_ports
http_access allow all
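The two squid.conf edits above can be scripted so that they are repeatable. This is a minimal sketch that assumes a stock squid.conf containing the line http_access deny !Safe_ports. configure_squid is a hypothetical helper. Squid checks http_access rules from top to bottom and applies the first match, so an allow all rule placed where the deny !Safe_ports rule was takes precedence over the deny rules that follow it.

```shell
# configure_squid FILE HOSTNAME
# Sets visible_hostname and hostname_aliases to HOSTNAME, comments out
# "http_access deny !Safe_ports" and puts "http_access allow all" in its place.
configure_squid() {
  file="$1"
  host="$2"
  tmp="$(mktemp)" || return 1
  awk -v host="$host" '
    # Drop existing hostname settings; they are re-added at the end.
    /^[ \t]*(visible_hostname|hostname_aliases)[ \t]/ { next }
    # Replace the first-match deny rule with an allow-all rule.
    /^[ \t]*http_access[ \t]+deny[ \t]+!Safe_ports/ {
      print "# " $0
      print "http_access allow all"
      next
    }
    { print }
    END {
      print "visible_hostname " host
      print "hostname_aliases " host
    }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage, in a root shell: back up /etc/squid/squid.conf, run `configure_squid /etc/squid/squid.conf squid.private.litware.com`, check the result with `squid -k parse`, then restart the service.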

Restart the service:

sudo systemctl restart squid.service

Test the proxy with the default configuration (Squid listens on port 3128 by default):

curl -x "http://squid.private.litware.com:3128" "https://example.com/"

NGINX

Note

According to this thread, running NGINX as a full forward proxy for HTTPS requires additional configuration steps.

Connect to the proxy server:

ssh -i keys/ssh_key azureuser@<public-ip>

I used this article as a reference to set up the forward proxy server on NGINX.

  1. Comment out the default server config in /etc/nginx/sites-enabled/default.
  2. Create the nginx/forward config file.
  3. Restart NGINX (sudo systemctl restart nginx.service).

The forward proxy service should be available on port 8888.

curl -x "http://127.0.0.1:8888" "https://example.com/"

Clean-up

Delete the resources to avoid unplanned costs:

terraform destroy -auto-approve