-
Notifications
You must be signed in to change notification settings - Fork 1.6k
/
component.yaml
164 lines (154 loc) · 9.71 KB
/
component.yaml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
name: model_deploy
description: |
Deploys a Google Cloud Vertex Model to the Endpoint, creating a DeployedModel within it.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/deployModel.
Args:
model (google.VertexModel):
Required. The model to be deployed.
endpoint (google.VertexEndpoint):
Required. The endpoint to be deployed to.
deployed_model_display_name (Optional[str]):
The display name of the DeployedModel. If not provided
upon creation, the Model's display_name is used.
traffic_split (Optional[Dict[str, int]]):
A map from a DeployedModel's ID to the percentage
of this Endpoint's traffic that should be forwarded to that DeployedModel.
If this field is non-empty, then the Endpoint's trafficSplit
will be overwritten with it. To refer to the ID of the just
being deployed Model, a "0" should be used, and the actual ID
of the new DeployedModel will be filled in its place by this method.
The traffic percentage values must add up to 100.
If this field is empty, then the Endpoint's trafficSplit is not updated.
dedicated_resources_machine_type (Optional[str]):
The specification of a single machine used by the prediction.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints#dedicatedresources.
dedicated_resources_accelerator_type (Optional[str]):
Hardware accelerator type. Must also set accelerator_count if used.
See https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec#AcceleratorType
for available options.
dedicated_resources_accelerator_count (Optional[int]):
The number of accelerators to attach to a worker replica.
dedicated_resources_min_replica_count (Optional[int]):
The minimum number of machine replicas this DeployedModel will be
always deployed on. This value must be greater than or equal to 1.
If traffic against the DeployedModel increases, it may dynamically be deployed
onto more replicas, and as traffic decreases, some of these extra replicas may be freed.
dedicated_resources_max_replica_count (Optional[int]):
The maximum number of replicas this deployed model may
the larger value of min_replica_count or 1 will
be used. If value provided is smaller than min_replica_count, it
will automatically be increased to be min_replica_count.
The maximum number of replicas this deployed model may
be deployed on when the traffic against it increases. If requested
value is too large, the deployment will error, but if deployment
succeeds then the ability to scale the model to that many replicas
is guaranteed (barring service outages). If traffic against the
deployed model increases beyond what its replicas at maximum may
handle, a portion of the traffic will be dropped. If this value
is not provided, will use dedicated_resources_min_replica_count as
the default value.
automatic_resources_min_replica_count (Optional[int]):
The minimum number of replicas this DeployedModel
will be always deployed on. If traffic against it increases,
it may dynamically be deployed onto more replicas up to
automatic_resources_max_replica_count, and as traffic decreases,
some of these extra replicas may be freed. If the requested value
is too large, the deployment will error.
automatic_resources_max_replica_count (Optional[int]):
The maximum number of replicas this DeployedModel may
be deployed on when the traffic against it increases. If the requested
value is too large, the deployment will error, but if deployment
succeeds then the ability to scale the model to that many replicas
is guaranteed (barring service outages). If traffic against the
DeployedModel increases beyond what its replicas at maximum may handle,
a portion of the traffic will be dropped. If this value is not provided,
a no upper bound for scaling under heavy traffic will be assume,
though Vertex AI may be unable to scale beyond certain replica number.
service_account (Optional[str]):
The service account that the DeployedModel's container runs as. Specify the
email address of the service account. If this service account is not
specified, the container runs as a service account that doesn't have access
to the resource project.
Users deploying the Model must have the `iam.serviceAccounts.actAs`
permission on this service account.
disable_container_logging (Optional[bool]):
For custom-trained Models and AutoML Tabular Models, the container of the
DeployedModel instances will send stderr and stdout streams to Stackdriver
Logging by default. Please note that the logs incur cost, which are subject
to Cloud Logging pricing.
User can disable container logging by setting this flag to true.
enable_access_logging (Optional[bool]):
These logs are like standard server access logs, containing information like
timestamp and latency for each prediction request.
Note that Stackdriver logs may incur a cost, especially if your project
receives prediction requests at a high queries per second rate (QPS).
Estimate your costs before enabling this option.
explanation_metadata (Optional[dict]):
Metadata describing the Model's input and output for explanation.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#explanationmetadata.
explanation_parameters (Optional[dict]):
Parameters that configure explaining information of the Model's predictions.
For more details, see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/ExplanationSpec#explanationmetadata.
Returns:
gcp_resources (str):
Serialized gcp_resources proto tracking the deploy model's long running operation.
For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
inputs:
- {name: model, type: google.VertexModel}
- {name: endpoint, type: google.VertexEndpoint, optional: true}
- {name: deployed_model_display_name, type: String, optional: true}
- {name: traffic_split, type: JsonArray, optional: true, default: '{}'}
- {name: dedicated_resources_machine_type, type: String, optional: true}
- {name: dedicated_resources_min_replica_count, type: Integer, optional: true, default: 0}
- {name: dedicated_resources_max_replica_count, type: Integer, optional: true, default: 0}
- {name: dedicated_resources_accelerator_type, type: String, optional: true}
- {name: dedicated_resources_accelerator_count, type: Integer, optional: true, default: 0}
- {name: automatic_resources_min_replica_count, type: Integer, optional: true, default: 0}
- {name: automatic_resources_max_replica_count, type: Integer, optional: true, default: 0}
- {name: service_account, type: String, optional: true}
- {name: disable_container_logging, type: Boolean, optional: true}
- {name: enable_access_logging, type: Boolean, optional: true}
- {name: explanation_metadata, type: JsonObject, optional: true, default: '{}'}
- {name: explanation_parameters, type: JsonObject, optional: true, default: '{}'}
outputs:
- {name: gcp_resources, type: String}
implementation:
container:
image: gcr.io/ml-pipeline/google-cloud-pipeline-components:0.1.9
command: [python3, -u, -m, google_cloud_pipeline_components.container.experimental.gcp_launcher.launcher]
args: [
--type, DeployModel,
--payload,
concat: [
'{',
'"endpoint": "', "{{$.inputs.artifacts['endpoint'].metadata['resourceName']}}", '"',
', "traffic_split": ', {inputValue: traffic_split},
', "deployed_model": {',
'"model": "', "{{$.inputs.artifacts['model'].metadata['resourceName']}}", '"',
', "dedicated_resources": {',
'"machine_spec": {',
'"machine_type": "',{inputValue: dedicated_resources_machine_type}, '"',
', "accelerator_type": "',{inputValue: dedicated_resources_accelerator_type}, '"',
', "accelerator_count": ',{inputValue: dedicated_resources_accelerator_count},
'}',
', "min_replica_count": ', {inputValue: dedicated_resources_min_replica_count},
', "max_replica_count": ', {inputValue: dedicated_resources_max_replica_count},
'}',
', "automatic_resources": {',
'"min_replica_count": ',{inputValue: automatic_resources_min_replica_count},
', "max_replica_count": ',{inputValue: automatic_resources_max_replica_count},
'}',
', "service_account": "', {inputValue: service_account}, '"',
', "disable_container_logging": "', {inputValue: disable_container_logging}, '"',
', "enable_access_logging": "', {inputValue: enable_access_logging}, '"',
', "explanation_spec": {',
'"parameters": ', {inputValue: explanation_parameters},
', "metadata": ', {inputValue: explanation_metadata},
'}',
'}',
'}'
],
--project, '', # not being used
--location, '', # not being used
--gcp_resources, {outputPath: gcp_resources},
]