some datasets in RSS blueprint erroneously include addresses #7299

@davepacheco

After a fresh deployment of a4x2, I found that many datasets in the system's initial blueprint have non-NULL (and misleading) address and port fields in the database. (This is very far removed from the initial symptoms so I'll jump to the root cause here and put the consequences / debugging process into a separate comment.) I think the problem is here:

for d in sled_config.datasets.datasets.values() {
    // Only the "Crucible" dataset needs to know the address
    let address = sled_config.zones.iter().find_map(|z| {
        if let BlueprintZoneType::Crucible(
            blueprint_zone_type::Crucible { address, dataset },
        ) = &z.zone_type
        {
            if &dataset.pool_name == d.name.pool() {
                return Some(*address);
            }
        };
        None
    });
    datasets.insert(
        d.id,
        BlueprintDatasetConfig {
            disposition: BlueprintDatasetDisposition::InService,
            id: d.id,
            pool: d.name.pool().clone(),
            kind: d.name.dataset().clone(),
            address,
            compression: d.inner.compression,
            quota: d.inner.quota,
            reservation: d.inner.reservation,
        },
    );
}

This code takes the DatasetsConfig that was generated during RSS and converts it into a BlueprintDatasetsConfig that will become the rack's initial blueprint. The blueprint struct has space for a socket address (IP address and TCP port), which is only used for one kind of dataset: the persistent dataset of a Crucible zone. That address isn't in DatasetsConfig, so this code has to fill it in from the zone information. For each dataset in DatasetsConfig, it does this by looking for any zone of type "Crucible" on the same pool. If it finds one, it populates the new BlueprintDatasetConfig for this dataset with that Crucible zone's socket address, regardless of the dataset's kind.

I think this is just wrong. As an example, my system has these datasets on this pool:

oxp_15b53b30-72cf-4edb-a7c4-325ee3f7c679/crucible
oxp_15b53b30-72cf-4edb-a7c4-325ee3f7c679/crypt/debug
oxp_15b53b30-72cf-4edb-a7c4-325ee3f7c679/crypt/zone
oxp_15b53b30-72cf-4edb-a7c4-325ee3f7c679/crypt/zone/oxz_crucible_049d9f96-6e06-43a0-a924-35146efd7b8c
oxp_15b53b30-72cf-4edb-a7c4-325ee3f7c679/crypt/zone/oxz_ntp_2b3c2cf8-bf97-4a7c-9327-712f1d589c7b

That's one Crucible zone's persistent dataset, a debug dataset, and a couple of transient zone root filesystems. In the initial blueprint, all of these have the same IP address and port (the one from the Crucible zone):

root@[fd00:1122:3344:102::3]:32221/omicron> select ip,port,id,kind,zone_name from bp_omicron_dataset where pool_id = '15b53b30-72cf-4edb-a7c4-325ee3f7c679' AND blueprint_id = '831679c9-26f8-4e3b-9873-e2522cfdc087';
           ip           | port  |                  id                  |   kind    |                     zone_name
------------------------+-------+--------------------------------------+-----------+----------------------------------------------------
  fd00:1122:3344:101::a | 32345 | 43a80037-e23f-44be-84eb-bb30bd1f539e | zone      | oxz_ntp_2b3c2cf8-bf97-4a7c-9327-712f1d589c7b
  fd00:1122:3344:101::a | 32345 | 6f610524-4329-4634-adab-ffbd6f65a653 | debug     | NULL
  fd00:1122:3344:101::a | 32345 | 801a8141-9e83-4cc0-9428-fb1db210657d | zone      | oxz_crucible_049d9f96-6e06-43a0-a924-35146efd7b8c
  fd00:1122:3344:101::a | 32345 | aff65822-a39d-4b21-9b1e-d94ba1688057 | crucible  | NULL
  fd00:1122:3344:101::a | 32345 | cac4df64-07ac-4266-9c73-822fb620ff9f | zone_root | NULL
(5 rows)

I believe this is wrong because the IP/port fields are supposed to be NULL for datasets other than a Crucible zone's persistent dataset. It's also misleading because if you didn't know that, you might reasonably think that the value for the NTP zone's dataset there is the IP of the NTP zone (for example), but it's not.
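
To make the intended behavior concrete, here is a minimal sketch of the lookup with one extra check on the dataset's kind. This is not the actual omicron code or a proposed patch: `DatasetKind`, `Dataset`, `CrucibleZone`, and `dataset_address` are simplified, hypothetical stand-ins for the real types, and the sketch assumes the kind can be compared directly (in the snippet above it would come from `d.name.dataset()`).

```rust
use std::net::SocketAddrV6;

// Hypothetical, simplified stand-ins for the real blueprint types.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum DatasetKind {
    Crucible,
    Debug,
    TransientZoneRoot,
    TransientZone { name: String },
}

struct Dataset {
    pool: String,
    kind: DatasetKind,
}

struct CrucibleZone {
    pool: String,
    address: SocketAddrV6,
}

/// Returns the address to record for dataset `d`.
///
/// Only a dataset of kind `Crucible` gets an address, taken from the
/// Crucible zone on the same pool. Every other kind gets `None`, even
/// when a Crucible zone shares its pool.
fn dataset_address(
    d: &Dataset,
    zones: &[CrucibleZone],
) -> Option<SocketAddrV6> {
    // The check that the original loop lacks: bail out unless this
    // is the Crucible zone's persistent dataset.
    if d.kind != DatasetKind::Crucible {
        return None;
    }
    zones.iter().find(|z| z.pool == d.pool).map(|z| z.address)
}
```

With the pool from the example above, this sketch would return the Crucible zone's address for the `crucible` dataset and `None` for the debug and zone datasets, which is what I'd expect the initial blueprint to contain.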
