New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add binary configmap #57938

Merged
merged 4 commits into from Jan 26, 2018

Conversation

@dims
Member

dims commented Jan 6, 2018

Reviving code from #33549 submitted by @zreigz

What this PR does / why we need it:
Add support for binary files in ConfigMap

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #32432

Special notes for your reviewer:

Sample API data:

apiVersion: v1
kind: ConfigMap
data:
  foo: text value
binaryData:
  bar: jmjKxUGNs/WqDQ==

Release note:

ConfigMap objects now support binary data via a new `binaryData` field. When using `kubectl create configmap --from-file`, files containing non-UTF8 data will be placed in this new field in order to preserve the non-UTF8 data. Use of this feature requires 1.10+ apiserver and kubelets.
@dims

This comment has been minimized.

Member

dims commented Jan 6, 2018

New variation based on discussion in #57899

@dims

This comment has been minimized.

Member

dims commented Jan 6, 2018

/assign @liggitt
/assign @thockin

@@ -4353,6 +4353,12 @@ func ValidateConfigMap(cfg *core.ConfigMap) field.ErrorList {
}
totalSize += len(value)
}
for key, value := range cfg.BinaryData {
for _, msg := range validation.IsConfigMapKey(key) {
allErrs = append(allErrs, field.Invalid(field.NewPath("binarydata").Key(key), key, msg))

This comment has been minimized.

@liggitt

liggitt Jan 7, 2018

Member

binaryData

This comment has been minimized.

@dims

dims Jan 7, 2018

Member

Fixed!

@@ -4353,6 +4353,12 @@ func ValidateConfigMap(cfg *core.ConfigMap) field.ErrorList {
}
totalSize += len(value)
}
for key, value := range cfg.BinaryData {

This comment has been minimized.

@liggitt

liggitt Jan 7, 2018

Member

also need to ensure there is no key duplication between data and binarydata maps

This comment has been minimized.

@dims

dims Jan 7, 2018

Member

Please see note from @smarterclayton here, there was an ask to remove that validation i believe:
9ba0da8#r100961600

This comment has been minimized.

@liggitt

liggitt Jan 7, 2018

Member

hmm... allowing the same key in both data and binaryData seems buggy and undefined...

also, from #33549 (comment):

Needs a much more descriptive comment, including a description of the validation rules (must not overlap with data)

This comment has been minimized.

@dims

dims Jan 7, 2018

Member

@smarterclayton can you please chime in?

@thockin

This comment has been minimized.

Member

thockin commented Jan 7, 2018

@dims

This comment has been minimized.

Member

dims commented Jan 7, 2018

/test pull-kubernetes-verify
/test pull-kubernetes-bazel-test

@dims

This comment has been minimized.

Member

dims commented Jan 8, 2018

What we have today is as follows

  • staging/src/k8s.io/api/core/v1/types.go:

    1. type ConfigMap struct
      • has field Data - which is a map of string to string
    2. type Secret struct
      • has field Data - which is a map of string to byte array
      • has write-only convenience field StringData - which is a map of string to string
      • has optional field Type - only possible value is "Opaque"
  • pkg/apis/core/types.go:

    1. type ConfigMap struct
      • has field Data - which is a map of string to string
    2. type Secret struct
      • has field Data - which is a map of string to byte array
      • has optional field Type

Proposal is to:

  • proposed field BinaryData to type ConfigMap (both spots) - which is a map of string to byte array, this will enable to functionality needed to save binary files using ConfigMap.

Problems still that need to be thought about:

  • Data field is different between Secret and ConfigMap, one takes map with byte array values and another takes string values. This is an existing problem, not introduced by this PR
  • with this PR we will have BinaryData in ConfigMap and StringData in ConfigMap (and Data in both!) which seems bad.
    1. Do we make sure both have BinaryData and StringData?
    2. In ConfigMap, we will have two buckets stored in etcd (Data and BinaryData), while in Secret, we have only one. Does it matter to the end user as long as we have validation rules that the buckets cannot have duplicate items?
  • Do we need a Type in ConfigMap to be consistent with Secret's Type?

Suggestions welcome!

( cc @thockin @liggitt @smarterclayton )

@liggitt

This comment has been minimized.

Member

liggitt commented Jan 8, 2018

Do we make sure both have BinaryData and StringData?

I don't think so. The existing data field in Secret objects can express both binary and text content. I don't think the improved usability of representing text as text outweighs the loss of compatibility with old clients that wouldn't know about any new field.

In ConfigMap, we will have two buckets stored in etcd (Data and BinaryData), while in Secret, we have only one. Does it matter to the end user as long as we have validation rules that the buckets cannot have duplicate items?

the objects are fundamentally different today, and ConfigMap is less expressive than Secret. duplicating the same data in multiple fields won't work (see https://github.com/kubernetes/community/blob/master/contributors/devel/api_changes.md#backward-compatibility-gotchas - "A single feature/property cannot be represented using multiple spec fields in the same API version simultaneously")

I probably wouldn't try to harmonize the two types here. This change makes ConfigMap as expressive as Secret.

@dims

This comment has been minimized.

Member

dims commented Jan 8, 2018

+1 @liggitt. i agree.

@roberthbailey roberthbailey removed their assignment Jan 8, 2018

@dims

This comment has been minimized.

Member

dims commented Jan 9, 2018

/assign @smarterclayton

},
"binaryData": {
"type": "object",
"description": "Data contains the binary data. Each key must consist of alphanumeric characters, '-', '_' or '.'."

This comment has been minimized.

@smarterclayton

smarterclayton Jan 9, 2018

Contributor

Better description here, and improve the existing data description. binaryData contains configuration data that is not required to be UTF-8 and .... Describe how the two fields validate together. Describe why I would use one or the other.

This comment has been minimized.

@dims

dims Jan 18, 2018

Member

Done

@smarterclayton

This comment has been minimized.

Contributor

smarterclayton commented Jan 9, 2018

I agree with Jordan's position - V2 can pick up the pieces, as long as validation is crisp, this is the best way I can see to add the feature without breaking old clients. The minor ugliness seems outweighed by the value in the config map.

I'd like to see the api description significantly improved so users understand when to use each, and understand that keys are exclusive. Improve both. Otherwise makes sense to do this change.

@dims

This comment has been minimized.

Member

dims commented Jan 9, 2018

@smarterclayton @liggitt thanks for the guidance, looks like we are on the right track. will add validation, update description, add some tests over the next few days.

@dims

This comment has been minimized.

Member

dims commented Jan 9, 2018

/hold

@@ -4383,8 +4383,22 @@ type ConfigMap struct {
// Data contains the configuration data.
// Each key must consist of alphanumeric characters, '-', '_' or '.'.
// The value must be a string (UTF-8). If the value has byte
// sequences not in the UTF-8 range, please use the BinaryData field.
// The keys stored in Data should not overlap with the keys in

This comment has been minimized.

@liggitt

liggitt Jan 10, 2018

Member

must not overlap

This comment has been minimized.

@dims

dims Jan 10, 2018

Member

Done

@@ -4383,8 +4383,22 @@ type ConfigMap struct {
// Data contains the configuration data.
// Each key must consist of alphanumeric characters, '-', '_' or '.'.
// The value must be a string (UTF-8). If the value has byte
// sequences not in the UTF-8 range, please use the BinaryData field.

This comment has been minimized.

@liggitt

liggitt Jan 10, 2018

Member

values with non-UTF-8 byte sequences must use the binaryData field

This comment has been minimized.

@dims

dims Jan 10, 2018

Member

Done

// +optional
Data map[string]string
// Data contains the binary data.

This comment has been minimized.

@liggitt

liggitt Jan 10, 2018

Member

binaryData

This comment has been minimized.

@dims

dims Jan 10, 2018

Member

Done

// Data contains the binary data.
// Each key must consist of alphanumeric characters, '-', '_' or '.'.
// BinaryData can contain byte sequences that are not in the UTF-8 range.
// The keys stored in BinaryData should not overlap with the ones in

This comment has been minimized.

@liggitt

liggitt Jan 10, 2018

Member

must not overlap

This comment has been minimized.

@dims

dims Jan 10, 2018

Member

Done

// Each key must consist of alphanumeric characters, '-', '_' or '.'.
// BinaryData can contain byte sequences that are not in the UTF-8 range.
// The keys stored in BinaryData should not overlap with the ones in
// the Data field. Since BinaryData is a new field (since 1.10), older

This comment has been minimized.

@liggitt

liggitt Jan 10, 2018

Member

Maybe just say "Requires 1.10+ apiserver and kubelet"

This comment has been minimized.

@dims

dims Jan 10, 2018

Member

Done

@cblecker

This comment has been minimized.

Member

cblecker commented Jan 19, 2018

/approve
For the hack/ test

@saad-ali

This comment has been minimized.

Member

saad-ali commented Jan 19, 2018

Don't want to derail this PR. But I want to make sure we don't make more mistakes in the Kubernetes v2 API around encoding.

Do you have thoughts on what you'd want this and Secret to look like in v2 API?

I hadn't thought a whole lot about it, but possibly something like:

type Secret struct {
  ...
  Data map[string]DataValue `json:"data"`
}

type ConfigMap struct {
  ...
  Data map[string]DataValue `json:"data"`
}

type DataValue struct {
  Value    string `json:"value"`
  Encoding string `json:"encoding"`
}
"data":{
  "file1": {"value":"text file data", "encoding":"UTF8"},
  "file2": {"value":"Gr26nF1UfAH4FQ", "encoding":"Base64"}
}

We must be careful about not mixing up "content transfer encodings" (like base64 and quoted-printable) and "character encodings" (like UTF-8, US-ASCII, GB 18030, etc.). The former encodes binary data in to text. The later encodes binary characters/strings in to bytes.

Looking at email SMTP headers is a great way to understand the difference:

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Content-Type: image/png; name="logo.png"
Content-Transfer-Encoding: base64
Content-Type: application/pdf; name="test.pdf"
Content-Transfer-Encoding: base64

Because the SMTP protocol started out as purely ASCII text-based, unless plain ASCII is being transmitted, an SMTP client must encode non-ASCII data (including non-ASCII text data, like UTF-8) using a binary-to-text encoding scheme (like base64 or quoted-printable).

The recipient can then use the "Content-Transfer-Encoding" to convert from encoded string to bytes. It then uses the "Content-Type" to decode the bytes in to something meaningful. If the "Content-Type" is text/* it MUST also specify a charset or the character encoding to use to convert from bytes to a meaningful character string.

The JSON format doesn't support binary data fields, and if we intend to continue to support JSON as a Kubernetes API protocol in v2, we're in a similar predicament as SMTP. In handling this we will have to accept Content-Transfer-Encoding but be careful not to confuse it with charset.

Starting with @liggitt's suggestion:

type Secret struct {
  ...
  Data map[string]DataValue `json:"data"`
}

type ConfigMap struct {
  ...
  Data map[string]DataValue `json:"data"`
}

type DataValue struct {
  Value    string `json:"value"`
  Encoding string `json:"encoding"`
}
"data":{
  "file1": {"value":"text file data", "encoding":"UTF8"},
  "file2": {"value":"Gr26nF1UfAH4FQ", "encoding":"Base64"}
}

Let's consider the three possible cases we must support:

  1. Plain US-ASCII text data
    • No issue. No need to specify any encoding.
  2. Non-US-ASCII text data (e.g. UTF-8 data)
    • JSON is just a data format, if the transport protocol used to transmit the JSON encodes it using UTF-8, then there is no issue, and no need to specify any encoding. The sender must ensure that any text that they want to transmit is UTF-8 encoded.
    • If we can't make that assumption (and maybe we should not depend on it), then we must encode our non-ASCII string to a byte sequence, using for example UTF-8, and take the byte sequence and encode it to plain ASCII using a binary-to-text encoding scheme. However, if we do this then we need to be able to communicate both what the "Content-Transfer-Encoding" was (base64) and what the character encoding was (UTF-8).
  3. Non-text binary data
    • We will need to use a binary-to-text encoding scheme, let's say base64, to encode the binary data to text. So we will set "encoding Base64".

If the answer to 2 is that we will ensure the transport protocol handles text encoding, then for all text encoding we only need to specify "value" not "encoding". And encoding would only need to be specified for binary data. Presence of an encoding would indicate binary data. And encoding should be a "Content-Transfer-Encoding" (e.g. base64, quoted-printable, etc.), it should never be a character encoding (UTF-8, GB 18030, etc.) since that is assumed to be handled by the transport protocol.

If the answer to to 2 is that we can't depend on the transport protocol to properly encode text and can only assume the JSON is US-ASCII, then we must allow the user to specify "Content-Transfer-Encoding" for binary and both character set and "Content-Transfer-Encoding" for text.

// Using this field will require 1.10+ apiserver and
// kubelet.
// +optional
BinaryData map[string][]byte `json:"binaryData,omitempty" protobuf:"bytes,3,rep,name=binaryData"`

This comment has been minimized.

@saad-ali

saad-ali Jan 20, 2018

Member

How are byte arrays encoded in JSON by our API machinery?

This comment has been minimized.

@liggitt

liggitt Jan 20, 2018

Member

As base64-encoded strings

This comment has been minimized.

@liggitt

liggitt Jan 20, 2018

Member

As seen in Secret objects

This comment has been minimized.

@saad-ali
@dims

This comment has been minimized.

Member

dims commented Jan 20, 2018

@saad-ali +1 on all points. that was a great write up and will be very useful when we design V2 API.

@thockin

This LGTM - I don't see much else to pick at. Thanks. Just one request for comment.

totalSize += len(value)
}
if totalSize > core.MaxSecretSize {
allErrs = append(allErrs, field.TooLong(field.NewPath("data"), "", core.MaxSecretSize))
allErrs = append(allErrs, field.TooLong(field.NewPath(""), cfg, core.MaxSecretSize))

This comment has been minimized.

@thockin

thockin Jan 23, 2018

Member

Ugh, I hate this. BUt I don't have anything better. Leave a comment?

@dims

This comment has been minimized.

Member

dims commented Jan 23, 2018

/retest

@dims

This comment has been minimized.

Member

dims commented Jan 23, 2018

thanks @thockin i've added a comment and rebased the PR to latest master as well.

@k8s-merge-robot k8s-merge-robot removed the lgtm label Jan 23, 2018

@liggitt

This comment has been minimized.

Member

liggitt commented Jan 23, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jan 23, 2018

@dims

This comment has been minimized.

Member

dims commented Jan 23, 2018

/test pull-kubernetes-e2e-kops-aws

1 similar comment
@dims

This comment has been minimized.

Member

dims commented Jan 23, 2018

/test pull-kubernetes-e2e-kops-aws

@dims

This comment has been minimized.

Member

dims commented Jan 25, 2018

@pmorie @saad-ali @thockin @matchstick - Can one of you please approve for "pkg/volume/configmap"? thanks!

@saad-ali

This comment has been minimized.

Member

saad-ali commented Jan 26, 2018

/lgtm
/approve

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Jan 26, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cblecker, dims, liggitt, saad-ali

Associated issue: #32432

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Jan 26, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@dims

This comment has been minimized.

Member

dims commented Jan 26, 2018

/test pull-kubernetes-unit

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Jan 26, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Jan 26, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit 27d01b5 into kubernetes:master Jan 26, 2018

12 of 13 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-e2e-gce
Details
cla/linuxfoundation dims authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke-gci Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-unit Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

@tengqm tengqm referenced this pull request Jan 27, 2018

Open

Document binaryData for ConfigMap #7108

1 of 2 tasks complete
@@ -148,7 +148,7 @@ not their metadata (e.g. the Data of a ConfigMap, but nothing in ObjectMeta).
obj interface{}
expect int
}{
{"ConfigMap", v1.ConfigMap{}, 3},
{"ConfigMap", v1.ConfigMap{}, 4},

This comment has been minimized.

@mtaufen

mtaufen Mar 14, 2018

Contributor

Shouldn't you have also updated the hash function to include the new binary data field in the hash? @liggitt @dims

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment