@squeakymouse
Contributor

Client functions + some documentation for File API

@squeakymouse squeakymouse requested a review from yixu34 July 21, 2023 09:07

id: str = Field(..., description="ID of the requested file.")
"""ID of the requested file."""
filename: str = Field(..., description="File name.")
Member


nit: newline spacing to let things breathe a bit?


class File(APIEngine):
"""
File API. This API is used to upload private files to Scale so that fine-tunes can access them for training and validation data.
Member


We shouldn't mention Scale here, because this API will eventually work for self-hosted users too. After all, the client can work in dual modes, talking to either the Scale-hosted or a self-hosted LLM Engine server.

"""

@classmethod
def upload(cls, file_path: str) -> UploadFileResponse:
Member


I think, to start, we can accept a file object instead of a path.
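One way to read the file-object suggestion, as a minimal sketch: `upload` takes an already-opened binary file and hands its bytes to a transport. The `send` callable, the `"file"` form field, the `"upload"` fallback name, and the `v1/files` route are hypothetical stand-ins, not the actual LLM Engine client API; the `(name, content)` tuple only mirrors the shape `requests` accepts for multipart `files=`.

```python
import io
import os
from typing import BinaryIO, Callable, Dict, Tuple

# Hypothetical transport: takes a route and a multipart "files" mapping
# (same shape as requests' files= argument) and returns parsed JSON.
Transport = Callable[[str, Dict[str, Tuple[str, bytes]]], dict]


def upload(file: BinaryIO, send: Transport) -> dict:
    """Upload an already-opened binary file object (sketch, not the real client).

    Taking a file object instead of a path lets callers pass files on disk
    (open(path, "rb")) as well as in-memory buffers such as io.BytesIO.
    """
    raw_name = getattr(file, "name", None)
    # Real files expose .name as their path; keep only the base name.
    # Buffers like io.BytesIO have no .name, so fall back to a placeholder.
    name = os.path.basename(raw_name) if isinstance(raw_name, str) else "upload"
    # "v1/files" is an assumed route for illustration.
    return send("v1/files", {"file": (name, file.read())})


if __name__ == "__main__":
    def fake_send(route, files):
        # Echo what would be sent instead of making a network call.
        return {"route": route, "fields": sorted(files)}

    print(upload(io.BytesIO(b"prompt,response\n"), fake_send))
```

A path-based convenience wrapper could still be layered on top later by opening the path and delegating to this function.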

@classmethod
def get(cls, file_id: str) -> GetFileResponse:
"""
Get filename and size of a file.
Member


I'd suggest generalizing this to "metadata of a file", in case we add more fields.

@classmethod
def list(cls) -> ListFilesResponse:
"""
List all files, with information about their filenames and sizes.
Member


Same thing w.r.t. metadata.
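A minimal sketch of the metadata generalization suggested for `get` and `list`, using stdlib dataclasses as a stand-in for the client's pydantic DTOs. The `FileMetadata` name, the `size` field's type, and the `{"files": [...]}` list payload are hypothetical. The idea is that docstrings talk about "metadata" and unknown keys are dropped, so the server can add fields without breaking older clients.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class FileMetadata:
    """Metadata about an uploaded file (hypothetical stand-in for GetFileResponse)."""

    id: str
    """ID of the requested file."""

    filename: str
    """File name."""

    size: int
    """Size of the file."""

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "FileMetadata":
        # Keep only the keys this client knows about, so fields the server
        # adds later do not raise a TypeError in older clients.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


@dataclass
class ListFilesResponse:
    """Metadata for every file (assumed payload shape: {"files": [...]})."""

    files: List[FileMetadata] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ListFilesResponse":
        return cls(files=[FileMetadata.from_dict(d) for d in data.get("files", [])])
```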

return DeleteFileResponse.parse_obj(response)

@classmethod
def get_content(cls, file_id: str) -> GetFileContentResponse:
Member


We might want to call this download for symmetry with your upload function. Granted, from the server's perspective get_content makes more sense, but since we're in the context of the client, I think it's OK for the functions to be client-centric.

Contributor Author


Should the response be GetFileContentResponse or DownloadFileResponse, then? (I.e., is it more important for naming to be consistent within the client, or for the client DTOs to be copy-pastable from the server DTOs?)

Member


I would say it's more important to be consistent in the client because, per the above, from the server's perspective it's about getting (extracting) content, whereas from the client you're more specifically downloading.

Contributor Author


Ok, I'll change it to DownloadFileResponse in the client, but I'm concerned it might be overridden if someone re-copies the DTOs from the server in the future 🤔

Member


Oh wait, sorry, what I meant was that the client methods should be named upload/download, but to make it easier to avoid mistakes when copying DTOs, the DTOs can follow the server. Also curious whether @phil-scale or Will have thoughts on this.
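A sketch of the convention settled on here, under stated assumptions: the client-facing method is named `download` (pairing with `upload`), while the DTO keeps the server's name, `GetFileContentResponse`, so re-copying DTOs from the server does not undo a rename. The route matches the `v1/files/{file_id}/content` path in this PR; the `get` transport callable, the simplified `File` class, and the DTO's `id`/`content` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class GetFileContentResponse:
    """Server-named DTO, kept identical to the server so it can be re-copied.

    The id/content fields are assumed for this sketch.
    """

    id: str
    content: str


class File:
    """Simplified client API: method names are client-centric (upload/download)."""

    @classmethod
    def download(
        cls, file_id: str, get: Callable[[str], Dict[str, Any]]
    ) -> GetFileContentResponse:
        # `get` is a hypothetical transport that performs an HTTP GET on the
        # route and returns the parsed JSON body.
        response = get(f"v1/files/{file_id}/content")
        return GetFileContentResponse(**response)
```

Only the method name is client-centric; the type annotation still tells readers exactly which server DTO comes back.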

f"v1/files/{file_id}/content",
timeout=DEFAULT_TIMEOUT,
)
return GetFileContentResponse.parse_obj(response)
Member


Actually, maybe it's fine for this to always be a string; we just need to document the expectations. E.g., if you uploaded text, it will of course be that text; if it's binary, we'll return it as a string in some encoding.
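If content is always returned as a string, the documented contract could look like this minimal sketch: UTF-8 text passes through unchanged, and anything that is not valid UTF-8 is base64-encoded, with an explicit encoding label so the client can tell the two apart. Base64 and the `(content, encoding)` pair are assumptions; the comment above only says "some encoding".

```python
import base64
from typing import Tuple


def encode_content(raw: bytes) -> Tuple[str, str]:
    """Server side (sketch): turn stored bytes into (content, encoding)."""
    try:
        # Uploaded text comes back as that same text.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Binary data is returned as an ASCII-safe base64 string.
        return base64.b64encode(raw).decode("ascii"), "base64"


def decode_content(content: str, encoding: str) -> bytes:
    """Client side (sketch): recover the original bytes."""
    if encoding == "utf-8":
        return content.encode("utf-8")
    if encoding == "base64":
        return base64.b64decode(content)
    raise ValueError(f"unsupported encoding: {encoding!r}")
```

Without the label, a client could not distinguish uploaded text that happens to look like base64 from base64-encoded binary.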

@squeakymouse squeakymouse requested a review from yixu34 July 31, 2023 19:50
@squeakymouse squeakymouse merged commit 24f6a32 into main Aug 1, 2023
@squeakymouse squeakymouse deleted the katiewu/file-api-functions branch August 1, 2023 23:28

3 participants