Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable WebClient to send binary data #513

Closed
wants to merge 1 commit into from

Conversation

clavin
Copy link
Contributor

@clavin clavin commented Mar 17, 2018

Summary

Enables the WebClient to send Buffers and Streams as binary data via multipart/form-data requests.

web.files.upload({
    file: new Buffer('Hi! I like crème-filled chocolate eggs!'),

    // or:
    file: fs.createReadStream('image/of/chocolate_egg.jpg'),

    // or:
    file: new ReadableEggFileStream({ type: 'chocolate' }),
});

This is mostly done using a simple WebClientBinaryData container which signals to WebClient that the request should be sent as multipart/form-data. All top-level arguments that are Buffers or Streams are automatically wrapped in WebClientBinaryData for convenience.

WebClientBinaryData is exposed for future cases where a user might specifically want the multipart/form-data file name to be something specific.

web.some.method({
    binary_argument: new WebClientBinaryData(bufferOrStream, 'binary data name'),
});

If the name argument is omitted (as it is when wrapping Buffers and most Streams), then a random 24-character name is generated to ensure that the data will be sent.

Best of all, this solution is generic, so there's nothing in it specific to files.upload. The only cost of this generic solution is the multipart/form-data name of the file won't be the same as the filename argument, if provided, but this shouldn't matter as the Web API itself handles renaming the file to the provided value of filename.

Follow-up

If this and #512 are merged, any binary arguments (file of files.upload and image of users.setPhoto) should be modified to also accept WebClientBinaryData.

Requirements (place an x in each [ ])

@codecov
Copy link

codecov bot commented Mar 17, 2018

Codecov Report

Merging #513 into master will decrease coverage by <.01%.
The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #513      +/-   ##
==========================================
- Coverage   81.32%   81.32%   -0.01%     
==========================================
  Files           6        6              
  Lines         241      257      +16     
  Branches       37       43       +6     
==========================================
+ Hits          196      209      +13     
- Misses         29       30       +1     
- Partials       16       18       +2
Impacted Files Coverage Δ
src/WebClient.ts 80.95% <85.71%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a206c3e...ad2c4d3. Read the comment docs.

@aoberoi
Copy link
Contributor

aoberoi commented Mar 21, 2018

i like the fact that this makes sending binary data arguments via the WebClient less dependent on the argument name or method name (this fixes bug #452 - and as you said enables users.setPhoto to work).

i'm not sure how i feel about creating the new top-level export of WebClientBinaryData since it seems a bit heavy for the functionality it actually adds -- which is a randomly generated filename if none is specified. IMHO this could be achieved simply by swapping out the this.logger.warn() call for the bit of code that generates the name. is there some other benefit of this type that I am missing?

i see these two changes as orthogonal and so i'd like to discuss them separately (unless you see some reason they make more sense being grouped into one change).

@clavin
Copy link
Contributor Author

clavin commented Mar 22, 2018

tl;dr: because implementation logistics and developer experience (DX) 👏 👍 ✨.


It's important to remember that WebClientBinaryData is not only there to generate a name, but also store the name and the file data--it's a container that can be recognized.

I made a personal call on the random name generation because:

  • The multipart/form-data file name will probably never matter outside of file uploads (be it files.upload or a new method); thus, a user shouldn't be forced to supply a file name in these (notably less common) cases.
  • When the filename is significant, the user will either be using a file stream from fs where they expect the filename to transfer (which it does), or they're supplying data from a different stream where one wouldn't expect this SDK to magically guess the name, hence a developer should be willing to supply a name (via filename or by using the WebClientBinaryData container).

But why make a whole new type?

Swapping out the this.logger.warn call with some code to generate a name field on the accompanying Buffer isn't the only work to be done in this change. You also need to:

  • Recognize binary fields once to set containsBinaryData to true
  • Recognize them once more when adding them to the form so proper options can be forwarded to form.append.

The container approach makes changing the implementation somewhat trivial, as seen in this PR: just wrap any binary parameters in the container, then only supply options to form.append for anything that's in the container.

So why can't we just use some kind of special object shape?

A practical solution would be to have the user just supply an object with name and data fields, i.e.:

web.files.upload({
    file: {
        data: someBuffer,
        name: 'magical kitten gifs'
    }
});

In contrast, I find the above breaks some kind of unspoken rule about the API method bindings, wherein we've added our own unique structure to the method, which has some consequences:

An application that accepts key-value pairs based on user data (i.e. { [username]: someValue }) then a malicious attacker could force API requests to be sent as multipart/form-data simply by targeting the right key (i.e. with a username of data) for any method (remember, this solution is generic!). While this would probably never lead to any security holes, it might cause false positives and is a waste of resources.

If a developer is reading the source of an application that uses this SDK, but making their app/bot in anything but this library, they can't really transfer ideas from the application they're attempting to learn from. I can say I've personally learned about numerous concepts and APIs just by reading open source code, be it in a different programming language or with a different library than what I might use. Making this SDK's structure differ from the official API's parameter structure breaks this ability to learn about the Slack API from consumers' apps.

An example situation of the above

Consider this situation: you want to write a Slack application in Lua, but you know nothing about Slack's API. You get some of the gist from the official API documentation, but you really want to see some code to solidify the idea.

So you search on GitHub for some other Slack apps. You come across GitHub's slack integration. You know some JS, so you're able to read the source and transfer the ideas.

At some point, you want to use the files.upload method in your Lua app. You see this in the (theoretically v4-based) code:

web.files.upload({
    file: {
        data: someBuffer,
        name: 'magical kitten gifs'
    }
});

So you believe that the Slack API accepts a file object with two fields. Thus, you write for your app:

web.files.upload({
    file = {
        data = someDataObject,
        name = 'magical kitten gifs'
    }
})

Alas, you run into an API error.



The WebClientBinaryData type solves this issue by signalling to users that binary fields aren't handled by Slack's API, but rather an implementation detail of whatever library is used to communicate with the API, as other object-like fields

This also somewhat carries over in code readability; it's easier to identify the data source of a binary field at a quick glance if the data field is always a single object (Stream/Buffer/WebClientBinaryData), not sometimes a Stream/Buffer, sometimes an object shape.

I'll agree, functionality-wise, the WebClientBinaryData type is shallow and at first glance seems like an odd choice of making a single function into an entire class; however, the class also exists to satisfy much stronger and subliminal psychological and logistical constraints as I've shown above.


Side note: one more possible "solution"

There is one solution that I haven't mentioned above: exporting a function that somehow attaches the name field to an input data source (a Stream or Buffer), i.e.:

import { attachNameMetadata } from '@slack/client';

web.files.upload({
    file: attachNameMetadata(bufferOrStream, "super top-secret kitten gif")
});

This function is limited to not being able to return an object because either it returns an object shape, which I've debunked above, or it returns a custom type, which is the solution I've already suggested above, without the need to call a function.

The other option is for the function to set the name field of the input data source and then returning the source, but behavior like this (mutating function parameters) is generally considered bad, and is unreliable.

Also, it's actually surprisingly hard to name this function in a way that both conveys its functionality and also keeps developers at ease knowing it doesn't do anything bad (i.e. setFileName is a bad name because it implies the bad behavior of setting the name field on the input data source).

For those reasons, I eliminated this solution early on.


These reasons are very niche and opinionated, and if you're not sold, I'm more than happy with implementing any of the other solutions I mentioned. I found that the runtime cost of creating one object in the uncommon case of binary data was worth improving the developer experience for consumers and helping consumers write intuitive code for others, all at what feels like little to no cost.

@aoberoi
Copy link
Contributor

aoberoi commented Mar 22, 2018

It's important to remember that WebClientBinaryData is not only there to generate a name, but also store the name and the file data--it's a container that can be recognized.

Sure, its implemented as a container, but does the user care? The user just wants to use the method. Wrapping things up in another container is an additional cognitive step towards reaching that goal. AFAICT the ability to be recognized is only to the benefit of the RTMClient, not to the user or the user's application. However, the RTMClient can just as easily recognize a Buffer or a Stream (as your PR already shows). For instance, I can't imagine a scenario where an application would keep around a data structure of WebClientBinaryDatas around.

The container approach makes changing the implementation somewhat trivial, as seen in this PR: just wrap any binary parameters in the container, then only supply options to form.append for anything that's in the container.

I guess my point is, its more trivial for the user without a new type. Just supply a file argument that has the binary data and optionally a filename argument to override what may or may not be specified in the binary data (the detection code inside the WebClientBinaryData constructor seems useful for this).

In contrast, I find the above breaks some kind of unspoken rule about the API method bindings, wherein we've added our own unique structure to the method...

I agree, I'd like to preserve the structure as much as possible, the chief reason being that the general reference documentation at https://api.slack.com/methods should map cleanly to the methods and arguments presented by this library.


You make multiple appeals to the idea that the WebClientBinaryData container "attaches" a name field to the binary data, and that's why this container is useful. I just want to illustrate that the Slack Web API (and HTTP) already solve this problem:

In the multipart/form-data body, each part has a header. For file uploads the part's header contains name="file". It may (and is recommended) also contain filename="foo.txt". If you think of that part as a "container" it wraps both the file and filename. The Slack Web API allows for an additional part, whose header has name="filename" that behaves as an override for the (only optionally present) filename attribute in the part I previously described.

Now from the perspective of this package, the most clean mapping to HTTP is to map the file argument to the part with name="file" (lets call this part A), and the filename argument to the part with name="filename" (lets call this part B). When the value of the file argument is some binary type that has hints of its filename, use some "magic" to populate the filename="" attribute of part A.
If there are no hints, that's fine, its optional anyway. The filename argument already allows the user to specify an explicit override anyway.

So you see, file on its own is a container.


It might be useful to compare this implementation with the one I'm theoretically suggesting. I'm happy to open a new PR based on this branch to illustrate, and then we'd have something concrete to compare.


edit: i've learned that the statement about the filename attribute being optional isn't entirely correct. while in HTTP it is optional, Slack doesn't accept parts with binary data that don't include that attribute.

@aoberoi
Copy link
Contributor

aoberoi commented Mar 22, 2018

@clavin #519 is where I'm suggesting we go with this. please take a look if you have a chance. if we can agree on it being a valid next step (whether or not you'd still like to go further with the new class), then i'd like to make a release today (4.1.0). there's some demand for a release that contains landed patches like #500.

update: i'm pretty happy with #519 now, would like to merge ASAP.

@clavin
Copy link
Contributor Author

clavin commented Mar 23, 2018

@aoberoi That PR has my approval 👍

#519 turns my "it works" PR into a "it just works" solution, which I like. I was definitely too caught up in the idea of giving the consumer an option to specify the multipart/form-data filename to consider a solution that doesn't have that kind of unneeded/contrived configurability.

I also commend you on the extra details in the comments and test names, some of that flew right over my head 👏 👏 👏

@clavin clavin closed this Mar 23, 2018
@clavin clavin deleted the feat-webclient-binary-data branch December 15, 2018 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants