Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ARM shapes on Oracle OCI #16277

Merged
merged 3 commits into from Oct 6, 2023
Merged

Conversation

nvinuesa
Copy link
Member

@nvinuesa nvinuesa commented Sep 18, 2023

By correctly parsing the version of the ubuntu cloud images returned by OCI's API, we can then create different bases for ubuntu aarch64 (the cloud image needed to target arm-based shapes).

It is important to note that we will not be supporting Minimal aarch64 ubuntu images, based on what is documented for using ARM-based shapes: https://docs.oracle.com/en-us/iaas/Content/Compute/References/images.htm

...For Arm-based shapes, use the Ubuntu image, not Minimal Ubuntu.

Bonus: By moving the retrieve of instanceTypes() after we correctly parsed the InstanceImage, we avoid useless calls to the API to retrieve the shapes for images that we don't support anyways.

Bonus 2: Fixed the Makefile for building cross-compiled versions (thx to @hmlanigan and @tlm).

Bonus 3: Fixed the TestRefreshImageCacheWithInvalidImage test which had incorrect logic.

Checklist

  • Code style: imports ordered, good names, simple structure, etc
  • Comments saying why design decisions were made
  • Go unit tests, with comments saying what you're testing
  • Integration tests, with comments saying what you're testing
  • doc.go added or updated in changed packages

QA steps

  • Build
rm -rf _build/
make go-install
GOOS=linux GOARCH=arm64 make go-build
AGENT_PACKAGE_PLATFORMS=linux/arm64 make simplestreams
export JUJU_METADATA_SOURCE="/home/nicolas/workspace/canonical/juju/_build/simplestreams"
  • Bootstrap and add model
juju bootstrap --debug --constraints arch=arm64 --config compartment-id=ocid1.compartment.oc1..aaaaaaaanvu2racnlczevenu73dlcf3nokgsjpdkbdgp4xbrz3lb2y4giyjq oci-canonical c
juju add-model m
  • Add machines
juju add-machine --constraints "arch=arm64"
juju add-machine --constraints "arch=arm64 cores=3 mem=11G"

If you log in the OCI dashboard, you should see 3 VM.Standard.A1.Flex instances (one for the controller, two for the added machines) with 1 Ocpu for two of them and 3 Ocpus/ 11GB for the third one.

Links

Jira card: JUJU-4514

Copy link
Member

@hmlanigan hmlanigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Images supporting Ubuntu 22.04 with amd64 are getting lost in the processing somewhere, not sure where: Image ocid1.image.oc1.phx.aaaaaaaapj3tk2to46hrihgab6aytk3xfv5tcdbdms4k2ixzoz64zxlpmjfq with String Canonical-Ubuntu-22.04-aarch64-2023.08.23 is being parsed as Canonical Ubuntu 22.04 per MM yesterday

Makefile Show resolved Hide resolved
// Specifically, for ubuntu, we need to create a different base when the image
// name contains "aarch64", which are the images that target ARM-based shapes.
func NewInstanceImage(img ociCore.Image, compartmentID *string) (InstanceImage, error) {
logger.Warningf("*** CREATING INSTANCE IMAGE FROM '%s'//'%s'", *img.OperatingSystem, *img.OperatingSystemVersion)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a trace or debug, it's noise as a warning

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I had forgotten to clean up after debugging, done

Comment on lines -334 to -353
// This code is very brittle. It's highly dependent on the strings currently
// returned by the oracle sdk. The best option is to have
// PlatformConfigOptions as they have types we can rely on.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a warning like this for developers somewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we are no longer using PlatformConfigOptions, what comment should we keep from that?

Comment on lines 41 to 42
UbuntuMinimalBase = "ubuntu-minimal"
UbuntuARMBase = "ubuntu-" + arch.ARM64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not a valid OS in a Base in juju proper, but an Oracle implementation detail. If they leak out of the provider into state, juju won't be able to handle them appropriately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic, this is gone now

UbuntuPostfix UbuntuVersionPostfix = ""
UbuntuMinimalPostfix UbuntuVersionPostfix = "Minimal"
UbuntuAARCH64Postfix UbuntuVersionPostfix = "aarch64"
UbuntuMinimalAARCH64Postfix UbuntuVersionPostfix = "Minimal aarch64"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UbuntuMinimalAARCH64Postfix is unused.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned it up, done

@@ -116,6 +127,21 @@ func (i *InstanceImage) SetInstanceTypes(types []instances.InstanceType) {
i.InstanceTypes = types
}

// UniqueArchs returns the list of unique supported architectures by the
// InstanceImage. This is necessary because an InstanceImage can support
// multiple shapes, and therefore supporting multiple architectures.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"and therefore supporting multiple architectures." is incorrect for oracle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes, I removed it thanks

Comment on lines 134 to 131
uniqueVals := make(map[string]bool)
var archs []string
for _, val := range i.InstanceTypes {
if !uniqueVals[val.Arch] {
archs = append(archs, val.Arch)
}
uniqueVals[val.Arch] = true
}
return archs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set.Strings from github.com/juju/collections does this for you, nor does it allow you to have duplicate strings.

Suggested change
uniqueVals := make(map[string]bool)
var archs []string
for _, val := range i.InstanceTypes {
if !uniqueVals[val.Arch] {
archs = append(archs, val.Arch)
}
uniqueVals[val.Arch] = true
}
return archs
archs := set.NewStrings()
for _, val := range i.InstanceTypes {
arches.Add(val.Arch)
}
return archs.Values()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice! done

Comment on lines 506 to 512
// Since the images for ARM machine will include "aarch64" on their
// name, we need to make sure we select the correct base that was
// taken from the parsed image name.
if arch == corearch.ARM64 {
args.InstanceConfig.Base.OS = UbuntuARMBase
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns about overloading the defined juju base.OS values with architecture details and the values ending up in state. Nor am I clear how this interacts with the architecture constraints.

What about updating imgCache.SupportedShapes to take an architecture value as well. The actual imgCache is local to the Oracle provider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, this was a very bad idea. I changed the implementation, now we pass the architecture as parameter to the image functions. Also, we keep track of the image's architecture on the cache so we can select them using the passed arch parameter.

// normaliseArchAndInstType returns the Juju architecture and instance type
// corresponding to a shape's reported architecture and instance type based
// off ShapePlatformConfigOptionsTypeEnum.
func normaliseArchAndInstType(val ociCore.ShapePlatformConfigOptionsTypeEnum) (string, string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still need to handle the InstanceTypes here, VM, BM etc for use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is done in instTypeByShapeName() which I didn't touch

@hmlanigan
Copy link
Member

@nvinuesa once a review has started, please add new commits with changes rather than force pushing an amended commit (excepting rebases). It makes it more difficult for reviewers. They can be squashed before merge.

@hmlanigan hmlanigan added the do not merge Even if a PR has been approved, do not merge the PR! label Sep 22, 2023
@hmlanigan
Copy link
Member

This PR is blocked while @nvinuesa investigates improved image and instance modeling for juju where Flexible instances exist in a cloud.

jujubot added a commit that referenced this pull request Oct 3, 2023
#16357

In the context of adding ARM support for Oracle OCI (#16277), we must fist support oracle flexible shapes because the only ARM shape on OCI is a flexible one (https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#flexible).

In order to support flexible (https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#flexible) shapes we must pass some configuration `ShapeConfig` (at least the number of cpu-cores) at the moment of launching the instance. This configuration value will be either the user-passed constraint(s) cpu-cores and memory, else a default minimum cpu-cores. We also define a new constant for minimum cpu-cores that is matched on `instances.MatchingInstanceTypes`. 

Also, we cannot pass this `ShapeConfig` value if the selected shape is not flexible so we have to be able to distinct between flexible and non-flexible shapes. This was done by adding `MaxCpuCores` and `MaxMem` on `InstanceType`

_Bonus: some cleanup on tests._

## Checklist

*If an item is not applicable, use `~strikethrough~`.*

- [X] Code style: imports ordered, good names, simple structure, etc
- [X] Comments saying why design decisions were made
- [X] Go unit tests, with comments saying what you're testing
- [ ] ~[Integration tests](https://github.com/juju/juju/tree/main/tests), with comments saying what you're testing~
- [ ] ~[doc.go](https://discourse.charmhub.io/t/readme-in-packages/451) added or updated in changed packages~

## QA steps

```
juju bootstrap --debug --config compartment-id=ocid1.compartment.oc1..aaaaaaaanvu2racnlczevenu73dlcf3nokgsjpdkbdgp4xbrz3lb2y4giyjq oci-canonical c
```
check OCIs console and make sure the `VM.Standard1.1` shape was selected as the controller running instance.
 
## Links

**Jira card:** JUJU-[4694]
@nvinuesa nvinuesa removed the do not merge Even if a PR has been approved, do not merge the PR! label Oct 3, 2023
@hmlanigan
Copy link
Member

@nvinuesa with the qa instructions, I fail to bootstrap with:

18:36:11 ERROR juju.cmd.juju.commands bootstrap.go:988 failed to bootstrap model: cannot use agent built for "arm64" using a machine running on "amd64"
18:36:11 DEBUG juju.cmd.juju.commands bootstrap.go:989 (error details: [{github.com/juju/juju/cmd/juju/commands.(*bootstrapCommand).Run:1087: failed to bootstrap model} {github.com/juju/juju/environs/bootstrap.Bootstrap:732: } {cannot use agent built for "arm64" using a machine running on "amd64"}])

Copy link
Member

@hmlanigan hmlanigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need more time to review. The parsing is complex. The qa did not work as written.

provider/oci/images.go Show resolved Hide resolved
provider/oci/images.go Show resolved Hide resolved
VirtType: string(val.ImageType),
// Iterate over the list of supported architectures of the
// image.
for _, arch := range val.Archs() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to iterate over val.Arches(), if val can only be 1 arch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue is that we cannot enforce that. Since the instanceTypes are created from the returned list of shapes from the API, we can't be sure that it will always be the case. You suggest that we throw an error if len(val.Archs()) > 1?

Comment on lines 272 to 273
// struct returned by oci's API. Also, it returns the architecture that is
// detected from the image OperatingSystemVersion and it's name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2nd sentence doesn't match the function return values.

s/it's/its/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just rephrased it

// we need to split the provided OperatingSystemVersion, since
// it can contain information that does not belong to the base's
// channel.
// E.g: Canonical-Ubuntu-22.04-Minimal-aarch64-2023.08.27-0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More examples of possible OperatingSystemVersion values would be helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a second example, with more detailed explanation

Comment on lines 321 to 323
// becomes:
// OperatingSystem: Canonical Ubuntu
// OperatingSystemVersion: 22.04 Minimal aarch64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied from older code, and should not be valid with your fixes. Please validate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rephrased this comment

@nvinuesa
Copy link
Member Author

nvinuesa commented Oct 4, 2023

@hmlanigan regarding the QA, I suspect you already had something in _build/ probably another simplestreams? can you try running:

rm -rf _build
make go-install
GOOS=linux GOARCH=arm64 make go-build
AGENT_PACKAGE_PLATFORMS=linux/arm64 make simplestreams
export JUJU_METADATA_SOURCE="/home/nicolas/workspace/canonical/juju/_build/simplestreams"

please?

Copy link
Member

@hmlanigan hmlanigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one took a while. It was worth it to improve how juju models instance selections for newer types of cloud machines.

provider/oci/images.go Show resolved Hide resolved
By correctly parsing the version of the ubuntu cloud images returned by
OCI's API, we can then create different bases for ubuntu aarch64 (the
cloud image needed to target arm-based shapes).
Bonus: fixed a test that had incorrect logic.
Since OCI image parsing is becoming complex, more examples were needed.
@nvinuesa
Copy link
Member Author

nvinuesa commented Oct 6, 2023

/merge

@jujubot jujubot merged commit 53ac4b4 into juju:3.3 Oct 6, 2023
18 of 19 checks passed
@jack-w-shaw jack-w-shaw mentioned this pull request Oct 6, 2023
jujubot added a commit that referenced this pull request Oct 8, 2023
#16396

Merges:
- #16368
- #16357
- #16135
- #16391
- #16277
- #16296

Conflicts:
- deleted by us: api/state_test.go
- both modified: apiserver/admin_test.go
- both modified: apiserver/facades/client/client/client.go
- both modified: apiserver/facades/client/client/status.go
- both modified: apiserver/facades/client/client/status_test.go
- both modified: apiserver/facades/client/storage/storage.go
- both modified: cmd/juju/ssh/ssh_machine.go
- both modified: cmd/juju/ssh/ssh_machine_test.go
- both modified: cmd/juju/status/export_test.go
- both modified: cmd/juju/status/status.go
- both modified: cmd/juju/status/status_internal_test.go
- both modified: cmd/juju/status/status_test.go
- deleted by them: core/charm/computedseries.go
- both modified: state/application.go
- both modified: state/application_test.go
- both modified: state/state.go
- both added: testcharms/charm-repo/bionic/mysql/manifest.yaml

All conflicts originating from:
- #16135
- #16296
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants