Closed
59 commits
7f1087d
Pull in RSS changes from 'use-dns' branch
smklein Jun 9, 2022
eca5484
RSS performs config by itself, mostly
smklein Jun 9, 2022
565862e
RSS side of handoff to Nexus mostly complete
smklein Jun 9, 2022
dfa614b
Handoff to Nexus is hacky, but working
smklein Jun 10, 2022
dc3b84b
Add bg work user, rack insert populate, patch tests
smklein Jun 12, 2022
e265f0d
Await RSS handoff, even in tests
smklein Jun 12, 2022
b5ca139
Partway through service allocation - still very WIP
smklein Jun 13, 2022
7e986b8
v1 of nexus-managed services is code complete; no tests yet
smklein Jun 14, 2022
95a5873
Add indices, add tests, fix bugs
smklein Jun 15, 2022
2a28eb9
It's hacky, but it's working. I'm seeing services be re-balanced corr…
smklein Jun 15, 2022
248d4cb
Merge branch 'dns-client' into rss-handoff
smklein Jun 15, 2022
b07322c
clippy, fmt
smklein Jun 15, 2022
7f41e42
Strongly-typed DNS service names
smklein Jun 15, 2022
a68de33
Populate DNS records
smklein Jun 16, 2022
746114b
Fix dns client bug, start shortening timeouts
smklein Jun 16, 2022
1b019b1
clippy
smklein Jun 16, 2022
94b4b46
Concurrent provisioning
smklein Jun 16, 2022
a02e009
Dynamic oximeter config
smklein Jun 16, 2022
a5be4d0
Allow oximeter to use config-provided addresses
smklein Jun 17, 2022
59dc382
Fix command-based tests
smklein Jun 17, 2022
81bf2d4
Nexus lazily accessing timeseries DB
smklein Jun 20, 2022
aed3ba6
Cleanup TODOs
smklein Jun 20, 2022
8fce9a1
Box resolver to make clippy happy
smklein Jun 20, 2022
d26ee14
Internal DNS tests
smklein Jun 20, 2022
4b5dab7
Clean up test code
smklein Jun 20, 2022
db2b545
no retry in client library
smklein Jun 20, 2022
027fb3b
Fix internal-dns
smklein Jun 20, 2022
bccb416
Merge branch 'dns-client' into rss-handoff
smklein Jun 20, 2022
bce58f4
Merge branch 'rack-id' into rss-handoff
smklein Jun 20, 2022
089623e
Merge branch 'background-work-user' into rss-handoff
smklein Jun 20, 2022
e33fb4b
fix typos, warnings
smklein Jun 20, 2022
9a9ca35
Merge branch 'rss-set-dns' into rss-handoff
smklein Jun 20, 2022
36135b2
Merge branch 'oximeter-resolves-nexus-address' into rss-handoff
smklein Jun 20, 2022
a82b653
Merge branch 'rack-populate' into rss-handoff
smklein Jun 21, 2022
9fc4994
Cleanup imports
smklein Jun 21, 2022
11ebb7b
[nexus] Add tests for rack endpoints
smklein Jun 21, 2022
19356a8
Merge branch 'rack-populate' into rss-handoff
smklein Jun 21, 2022
e6dc594
Merge branch 'nexus-resolves-clickhouse' into rss-handoff
smklein Jun 21, 2022
1822762
[nexus] Add tunable to disable background tasks
smklein Jun 21, 2022
55feaf6
Merge branch 'nexus-service-management' into rss-handoff
smklein Jun 21, 2022
d2536d7
Delete out-dated docs
smklein Jun 21, 2022
7fb947b
Merge branch 'nexus-service-management' into rss-handoff
smklein Jun 21, 2022
6c5e035
Merge branch 'nexus-service-management' into rss-handoff
smklein Jun 21, 2022
768221d
Merge branch 'nexus-service-management' into rss-handoff
smklein Jun 21, 2022
b6434e3
Merge branch 'nexus-service-management' into rss-handoff
smklein Jun 22, 2022
c5bd827
Merge branch 'nexus-service-management' into rss-handoff
smklein Jun 22, 2022
0ff033a
renamed opctx
smklein Jun 22, 2022
4d7a46c
in tests too
smklein Jun 22, 2022
5fc64aa
merge
smklein Jun 24, 2022
286699c
Merge branch 'nexus-service-management' into rss-handoff
smklein Jun 24, 2022
eed9437
Merge branch 'nexus-service-management' into rss-handoff
smklein Jun 24, 2022
900cd38
Merge branch 'nexus-service-management' into rss-handoff
smklein Jun 24, 2022
f540d3a
Merge branch 'nexus-service-management' into rss-handoff
smklein Jun 26, 2022
8cf9ad4
Merge branch 'nexus-service-management' into rss-handoff
smklein Jul 6, 2022
1a7bb40
partial merge
smklein Jul 11, 2022
ba1a731
Merge branch 'nexus-service-management' into rss-handoff
smklein Jul 11, 2022
6047b93
remove unused
smklein Jul 11, 2022
eba4486
Finish merge
smklein Jul 11, 2022
9b07d55
Merge branch 'nexus-service-management' into rss-handoff
smklein Jul 27, 2022
15 changes: 3 additions & 12 deletions docs/how-to-run.adoc
@@ -147,11 +147,11 @@ When we deploy, we're effectively creating a number of different zones
for all the components that make up Omicron (Nexus, Clickhouse, Crucible, etc).
Since all these services run in different zones they cannot communicate with
each other (and Sled Agent in the global zone) via `localhost`. In practice,
-we'll assign addresses as per RFD 63 as well as incorporating DNS based
+we assign addresses as per RFD 63 as well as incorporating DNS based
service discovery.

-For the purposes of local development today, we specify some hardcoded IPv6
-unique local addresses in the subnet of the first Sled Agent: `fd00:1122:3344:1::/64`.
+For the purposes of local development today, we specify some hardcoded IP
+addresses.

If you'd like to modify these values to suit your local network, you can modify
them within the https://github.com/oxidecomputer/omicron/tree/main/smf[`smf/` subdirectory].
@@ -164,15 +164,6 @@ be set as a default route for the Nexus zone.
|===================================================================================================
| Service | Endpoint
| Sled Agent: Bootstrap | Derived from MAC address of physical data link.
-| Sled Agent: Dropshot API | `[fd00:1122:3344:0101::1]:12345`
-| Cockroach DB | `[fd00:1122:3344:0101::2]:32221`
-| Nexus: Internal API | `[fd00:1122:3344:0101::3]:12221`
-| Oximeter | `[fd00:1122:3344:0101::4]:12223`
-| Clickhouse | `[fd00:1122:3344:0101::5]:8123`
-| Crucible Downstairs 1 | `[fd00:1122:3344:0101::6]:32345`
-| Crucible Downstairs 2 | `[fd00:1122:3344:0101::7]:32345`
-| Crucible Downstairs 3 | `[fd00:1122:3344:0101::8]:32345`
-| Internal DNS Service | `[fd00:1122:3344:0001::1]:5353`
| Nexus: External API | `192.168.1.20:80`
| Internet Gateway | None, but can be set in `smf/sled-agent/config.toml`
|===================================================================================================
54 changes: 46 additions & 8 deletions nexus/src/app/rack.rs
@@ -8,7 +8,7 @@ use crate::authz;
use crate::context::OpContext;
use crate::db;
use crate::db::lookup::LookupPath;
-use crate::internal_api::params::ServicePutRequest;
+use crate::internal_api::params::RackInitializationRequest;
use omicron_common::api::external::DataPageParams;
use omicron_common::api::external::Error;
use omicron_common::api::external::ListResultVec;
@@ -57,12 +57,13 @@ impl super::Nexus {
&self,
opctx: &OpContext,
rack_id: Uuid,
-services: Vec<ServicePutRequest>,
+request: RackInitializationRequest,
) -> Result<(), Error> {
opctx.authorize(authz::Action::Modify, &authz::FLEET).await?;

// Convert from parameter -> DB type.
-let services: Vec<_> = services
+let services: Vec<_> = request
+.services
.into_iter()
.map(|svc| {
db::model::Service::new(
@@ -74,14 +75,51 @@ impl super::Nexus {
})
.collect();

-// TODO(https://github.com/oxidecomputer/omicron/pull/1216):
-// Actually supply datasets provided from the sled agent.
-//
-// This requires corresponding changes on the RSS side.
+let datasets: Vec<_> = request
+.datasets
+.into_iter()
+.map(|dataset| {
+db::model::Dataset::new(
+dataset.dataset_id,
+dataset.zpool_id,
+dataset.request.address,
+dataset.request.kind.into(),
+)
+})
+.collect();
self.db_datastore
-.rack_set_initialized(opctx, rack_id, services, vec![])
+.rack_set_initialized(opctx, rack_id, services, datasets)
.await?;

Ok(())
}
+
+/// Awaits the initialization of the rack.
+///
+/// This will occur by either:
+/// 1. RSS invoking the internal API, handing off responsibility, or
+/// 2. Re-reading a value from the DB, if the rack has already been
+/// initialized.
+///
+/// See RFD 278 for additional context.
+pub async fn await_rack_initialization(&self, opctx: &OpContext) {
+loop {
+let result = self.rack_lookup(&opctx, &self.rack_id).await;
+match result {
+Ok(rack) => {
+if rack.initialized {
+return;
+}
+info!(
+self.log,
+"Still waiting for rack initialization: {:?}", rack
+);
+}
+Err(e) => {
+warn!(self.log, "Cannot look up rack: {}", e);
+}
+}
+tokio::time::sleep(std::time::Duration::from_secs(2)).await;
+}
+}
}
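`await_rack_initialization` retries `rack_lookup` every two seconds until the rack row reports `initialized`, treating a failed lookup as transient rather than fatal. The synchronous sketch below shows the same retry shape. It assumes a hypothetical `lookup` closure in place of `rack_lookup`, `eprintln!` in place of the `info!`/`warn!` log macros, and `std::thread::sleep` in place of `tokio::time::sleep`; it is an illustration of the pattern, not the Nexus code.

```rust
use std::thread;
use std::time::Duration;

/// Polls `lookup` until it reports an initialized rack, sleeping `interval`
/// between attempts, and returns how many lookups were made.
///
/// `Ok(false)` (rack found but not yet initialized) and `Err(_)` (lookup
/// failed) are both treated as "try again later", mirroring the two
/// non-returning arms of `await_rack_initialization`.
fn await_initialized<F>(mut lookup: F, interval: Duration) -> usize
where
    F: FnMut() -> Result<bool, String>,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        match lookup() {
            Ok(true) => return attempts,
            Ok(false) => eprintln!("still waiting for rack initialization"),
            Err(e) => eprintln!("cannot look up rack: {}", e),
        }
        thread::sleep(interval);
    }
}
```

Like the original loop, the sketch has no upper bound on retries; a caller that needs a deadline would have to add one.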
11 changes: 6 additions & 5 deletions nexus/src/internal_api/http_entrypoints.rs
@@ -7,8 +7,9 @@ use crate::context::OpContext;
use crate::ServerContext;

use super::params::{
-DatasetPutRequest, DatasetPutResponse, OximeterInfo, ServicePutRequest,
-SledAgentStartupInfo, ZpoolPutRequest, ZpoolPutResponse,
+DatasetPutRequest, DatasetPutResponse, OximeterInfo,
+RackInitializationRequest, SledAgentStartupInfo, ZpoolPutRequest,
+ZpoolPutResponse,
};
use dropshot::endpoint;
use dropshot::ApiDescription;
@@ -104,15 +105,15 @@ struct RackPathParam {
async fn rack_initialization_complete(
rqctx: Arc<RequestContext<Arc<ServerContext>>>,
path_params: Path<RackPathParam>,
-info: TypedBody<Vec<ServicePutRequest>>,
+info: TypedBody<RackInitializationRequest>,
) -> Result<HttpResponseUpdatedNoContent, HttpError> {
let apictx = rqctx.context();
let nexus = &apictx.nexus;
let path = path_params.into_inner();
-let svcs = info.into_inner();
+let request = info.into_inner();
let opctx = OpContext::for_internal_api(&rqctx).await;

-nexus.rack_initialize(&opctx, path.rack_id, svcs).await?;
+nexus.rack_initialize(&opctx, path.rack_id, request).await?;

Ok(HttpResponseUpdatedNoContent())
}
74 changes: 55 additions & 19 deletions nexus/src/lib.rs
@@ -66,23 +66,27 @@ pub fn run_openapi_internal() -> Result<(), String> {
.map_err(|e| e.to_string())
}

-/// Packages up a [`Nexus`], running both external and internal HTTP API servers
-/// wired up to Nexus
-pub struct Server {
+/// A partially-initialized Nexus server, which exposes an internal interface,
+/// but is not ready to receive external requests.
+pub struct InternalServer<'a> {
/// shared state used by API request handlers
pub apictx: Arc<ServerContext>,
-/// dropshot server for external API
-pub http_server_external: dropshot::HttpServer<Arc<ServerContext>>,
/// dropshot server for internal API
pub http_server_internal: dropshot::HttpServer<Arc<ServerContext>>,
+
+config: &'a Config,
+log: Logger,
}

-impl Server {
-/// Start a nexus server.
+impl<'a> InternalServer<'a> {
+/// Creates a Nexus instance with only the internal API exposed.
+///
+/// This is often used as an argument when creating a [`Server`],
+/// which also exposes the external API.
pub async fn start(
-config: &Config,
+config: &'a Config,
log: &Logger,
-) -> Result<Server, String> {
+) -> Result<InternalServer<'a>, String> {
let log = log.new(o!("name" => config.deployment.id.to_string()));
info!(log, "setting up nexus server");

@@ -92,24 +96,55 @@ impl Server {
ServerContext::new(config.deployment.rack_id, ctxlog, &config)
.await?;

-let http_server_starter_external = dropshot::HttpServerStarter::new(
-&config.deployment.dropshot_external,
-external_api(),
-Arc::clone(&apictx),
-&log.new(o!("component" => "dropshot_external")),
-)
-.map_err(|error| format!("initializing external server: {}", error))?;
-
let http_server_starter_internal = dropshot::HttpServerStarter::new(
&config.deployment.dropshot_internal,
internal_api(),
Arc::clone(&apictx),
&log.new(o!("component" => "dropshot_internal")),
)
.map_err(|error| format!("initializing internal server: {}", error))?;
+let http_server_internal = http_server_starter_internal.start();

+Ok(Self { apictx, http_server_internal, config, log })
+}
+}
+
+/// Packages up a [`Nexus`], running both external and internal HTTP API servers
+/// wired up to Nexus
+pub struct Server {
+/// shared state used by API request handlers
+pub apictx: Arc<ServerContext>,
+/// dropshot server for external API
+pub http_server_external: dropshot::HttpServer<Arc<ServerContext>>,
+/// dropshot server for internal API
+pub http_server_internal: dropshot::HttpServer<Arc<ServerContext>>,
+}
+
+impl Server {
+pub async fn start(internal: InternalServer<'_>) -> Result<Self, String> {
+let apictx = internal.apictx;
+let http_server_internal = internal.http_server_internal;
+let log = internal.log;
+let config = internal.config;
+
+// Wait until RSS handoff completes.
+let opctx = apictx.nexus.opctx_for_service_balancer();
+apictx.nexus.await_rack_initialization(&opctx).await;
+
+// With the exception of integration tests environments,
+// we expect background tasks to be enabled.
+if config.pkg.tunables.enable_background_tasks {
+apictx.nexus.start_background_tasks().map_err(|e| e.to_string())?;
+}
+
+let http_server_starter_external = dropshot::HttpServerStarter::new(
+&config.deployment.dropshot_external,
+external_api(),
+Arc::clone(&apictx),
+&log.new(o!("component" => "dropshot_external")),
+)
+.map_err(|error| format!("initializing external server: {}", error))?;
let http_server_external = http_server_starter_external.start();
-let http_server_internal = http_server_starter_internal.start();

Ok(Server { apictx, http_server_external, http_server_internal })
}
@@ -167,7 +202,8 @@ pub async fn run_server(config: &Config) -> Result<(), String> {
} else {
debug!(log, "registered DTrace probes");
}
-let server = Server::start(config, &log).await?;
+let internal_server = InternalServer::start(config, &log).await?;
+let server = Server::start(internal_server).await?;
server.register_as_producer().await;
server.wait_for_finish().await
}
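The `lib.rs` change splits startup into two phases: `InternalServer::start` brings up only the internal API, and `Server::start` consumes that value, waits for the RSS handoff, optionally starts background tasks, and only then exposes the external API. The sketch below models that ordering with plain std types. `Config`, `rack_initialized`, `background_tasks_started`, and `external_api_started` are hypothetical stand-ins, and unlike Nexus (which waits for the handoff) this version returns an error if the handoff has not happened yet, so the required ordering is easy to check.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the Nexus `Config`; only the tunable matters.
struct Config {
    enable_background_tasks: bool,
}

/// Phase one: only the internal API is "up". It borrows the config for `'a`
/// so phase two can read it without cloning.
struct InternalServer<'a> {
    config: &'a Config,
    // Stands in for the `initialized` flag on the rack row in the database.
    rack_initialized: AtomicBool,
}

impl<'a> InternalServer<'a> {
    fn start(config: &'a Config) -> InternalServer<'a> {
        InternalServer { config, rack_initialized: AtomicBool::new(false) }
    }

    /// Stands in for the RSS handoff (or a test harness performing it).
    fn rack_initialize(&self) {
        self.rack_initialized.store(true, Ordering::SeqCst);
    }
}

/// Phase two: both APIs are "up". Holds no borrow of the config.
struct Server {
    background_tasks_started: bool,
    external_api_started: bool,
}

impl Server {
    /// Consumes phase one. Nexus waits for the handoff at this point; this
    /// sketch returns an error instead so the ordering is visible.
    fn start(internal: InternalServer<'_>) -> Result<Server, String> {
        if !internal.rack_initialized.load(Ordering::SeqCst) {
            return Err("rack has not been initialized".to_string());
        }
        // Background tasks are gated on the tunable before the external API
        // is started, matching the order in `Server::start`.
        let background_tasks_started = internal.config.enable_background_tasks;
        Ok(Server { background_tasks_started, external_api_started: true })
    }
}
```

Because `Server::start` takes the `InternalServer` by value, the phase-one value cannot be reused once the external API is up, and `Server` itself holds no borrow of `Config`. The test harness in `nexus/test-utils` follows the same order: start the internal server, perform the handoff itself with an empty `RackInitializationRequest`, then call `Server::start`.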
33 changes: 30 additions & 3 deletions nexus/test-utils/src/lib.rs
@@ -109,15 +109,42 @@ pub async fn test_setup_with_config(
.expect("Tests expect to set a port of Clickhouse")
.set_port(clickhouse.port());

-let server =
-omicron_nexus::Server::start(&config, &logctx.log).await.unwrap();
-server
+// Start the Nexus internal API.
+let internal_server =
+omicron_nexus::InternalServer::start(&config, &logctx.log)
+.await
+.unwrap();
+internal_server
.apictx
.nexus
.wait_for_populate()
.await
.expect("Nexus never loaded users");

+// Perform the "handoff from RSS".
+//
+// However, RSS isn't running, so we'll do the handoff ourselves.
+let opctx = internal_server.apictx.nexus.opctx_for_service_balancer();
+internal_server
+.apictx
+.nexus
+.rack_initialize(
+&opctx,
+config.deployment.rack_id,
+// NOTE: In the context of this test utility, we arguably do have an
+// instance of CRDB and Nexus running. However, as this info isn't
+// necessary for most tests, we pass no information here.
+omicron_nexus::internal_api::params::RackInitializationRequest {
+services: vec![],
+datasets: vec![],
+},
+)
+.await
+.expect("Could not initialize rack");
+
+// Start the Nexus external API.
+let server = omicron_nexus::Server::start(internal_server).await.unwrap();
+
let testctx_external = ClientTestContext::new(
server.http_server_external.local_addr(),
logctx.log.new(o!("component" => "external client test context")),
1 change: 1 addition & 0 deletions nexus/tests/config.test.toml
@@ -34,6 +34,7 @@ address = "[::1]:0"
[tunables]
# Allow small subnets, so we can test IP address exhaustion easily / quickly
max_vpc_ipv4_subnet_prefix = 29
# Disable background tasks to help with test determinism
enable_background_tasks = false

[deployment]
13 changes: 13 additions & 0 deletions nexus/types/src/internal_api/params.rs
@@ -152,6 +152,19 @@ pub struct ServicePutRequest {
pub kind: ServiceKind,
}

+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
+pub struct DatasetCreateRequest {
+pub zpool_id: Uuid,
+pub dataset_id: Uuid,
+pub request: DatasetPutRequest,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
+pub struct RackInitializationRequest {
+pub services: Vec<ServicePutRequest>,
+pub datasets: Vec<DatasetCreateRequest>,
+}
+
/// Message used to notify Nexus that this oximeter instance is up and running.
#[derive(Debug, Clone, Copy, JsonSchema, Serialize, Deserialize)]
pub struct OximeterInfo {
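`rack_initialize` consumes the new `datasets` field by mapping each `DatasetCreateRequest` into a `db::model::Dataset`. A std-only sketch of that conversion follows. It assumes simplified, hypothetical stand-in types: `u128` IDs in place of `Uuid`, a `DbDatasetKind` enum converted with `From` to play the role of `dataset.request.kind.into()`, and a plain `Dataset` struct rather than the real `db::model::Dataset::new`.

```rust
use std::net::SocketAddrV6;

/// API-side dataset kind (hypothetical subset).
#[derive(Debug, Clone, Copy)]
enum DatasetKind {
    Crucible,
    Cockroach,
    Clickhouse,
}

/// Database-side dataset kind (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DbDatasetKind {
    Crucible,
    Cockroach,
    Clickhouse,
}

// Lets the conversion use `.into()`, as `rack_initialize` does.
impl From<DatasetKind> for DbDatasetKind {
    fn from(kind: DatasetKind) -> Self {
        match kind {
            DatasetKind::Crucible => DbDatasetKind::Crucible,
            DatasetKind::Cockroach => DbDatasetKind::Cockroach,
            DatasetKind::Clickhouse => DbDatasetKind::Clickhouse,
        }
    }
}

#[derive(Debug, Clone)]
struct DatasetPutRequest {
    address: SocketAddrV6,
    kind: DatasetKind,
}

#[derive(Debug, Clone)]
struct DatasetCreateRequest {
    zpool_id: u128,
    dataset_id: u128,
    request: DatasetPutRequest,
}

/// Row handed to the datastore (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Dataset {
    id: u128,
    pool_id: u128,
    address: SocketAddrV6,
    kind: DbDatasetKind,
}

/// Converts every requested dataset into a database row, consuming the
/// request vector with the same `into_iter`/`map`/`collect` shape.
fn datasets_from_request(datasets: Vec<DatasetCreateRequest>) -> Vec<Dataset> {
    datasets
        .into_iter()
        .map(|dataset| Dataset {
            id: dataset.dataset_id,
            pool_id: dataset.zpool_id,
            address: dataset.request.address,
            kind: dataset.request.kind.into(),
        })
        .collect()
}
```

An empty `datasets` list, which the test harness passes, simply produces no rows.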
48 changes: 43 additions & 5 deletions openapi/nexus-internal.json
@@ -255,11 +255,7 @@
"content": {
"application/json": {
"schema": {
-"title": "Array_of_ServicePutRequest",
-"type": "array",
-"items": {
-"$ref": "#/components/schemas/ServicePutRequest"
-}
+"$ref": "#/components/schemas/RackInitializationRequest"
}
}
},
@@ -674,6 +670,27 @@
"value"
]
},
+"DatasetCreateRequest": {
+"type": "object",
+"properties": {
+"dataset_id": {
+"type": "string",
+"format": "uuid"
+},
+"request": {
+"$ref": "#/components/schemas/DatasetPutRequest"
+},
+"zpool_id": {
+"type": "string",
+"format": "uuid"
+}
+},
+"required": [
+"dataset_id",
+"request",
+"zpool_id"
+]
+},
"DatasetKind": {
"description": "Describes the purpose of the dataset.",
"type": "string",
@@ -1711,6 +1728,27 @@
}
]
},
+"RackInitializationRequest": {
+"type": "object",
+"properties": {
+"datasets": {
+"type": "array",
+"items": {
+"$ref": "#/components/schemas/DatasetCreateRequest"
+}
+},
+"services": {
+"type": "array",
+"items": {
+"$ref": "#/components/schemas/ServicePutRequest"
+}
+}
+},
+"required": [
+"datasets",
+"services"
+]
+},
"Sample": {
"description": "A concrete type representing a single, timestamped measurement from a timeseries.",
"type": "object",