
Error when uploading a group file whose filename contains Chinese characters #698

Closed
Natsukage opened this issue Mar 14, 2023 · 7 comments


@Natsukage

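Files whose names contain only digits and English letters upload fine, so the rest of my configuration should be OK, and I won't bother posting logs and config.
But as soon as the filename contains Chinese, Japanese, or similar characters, the upload fails with an error.
For various reasons I can't rename the local files before uploading, so these files cannot be uploaded at all.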

The message shown in the Mirai console:

2023-03-14 08:57:21 E/MAH Access: java.lang.IllegalArgumentException: Chars ':*?"<>|' are not allowed in path. RemoteFile path contains illegal char: '?'. path='=?utf-8?B?5rWLMTg4NDY1OTguemlw?='
java.lang.IllegalArgumentException: Chars ':*?"<>|' are not allowed in path. RemoteFile path contains illegal char: '?'. path='=?utf-8?B?5rWLMTg4NDY1OTguemlw?='
        at net.mamoe.mirai.internal.utils.FileSystem.checkLegitimacy(FileSystem.kt:17)
        at net.mamoe.mirai.internal.utils.FileSystem.normalize(FileSystem.kt:26)
        at net.mamoe.mirai.internal.contact.file.RemoteFilesImpl$Companion.findFileByPath(RemoteFilesImpl.kt:34)
        at net.mamoe.mirai.internal.contact.file.CommonAbsoluteFolderImpl.uploadNewFile$suspendImpl(AbsoluteFolderImpl.kt:400)
        at net.mamoe.mirai.internal.contact.file.CommonAbsoluteFolderImpl.uploadNewFile(AbsoluteFolderImpl.kt)
        at net.mamoe.mirai.contact.file.AbsoluteFolder.uploadNewFile$default(AbsoluteFolder.kt:194)
        at mirai-api-http-2.9.1.mirai2.jar//net.mamoe.mirai.api.http.adapter.internal.action.FileKt.onUploadFile(file.kt:62)
        at mirai-api-http-2.9.1.mirai2.jar//net.mamoe.mirai.api.http.adapter.http.router.FileKt$fileRouter$1$invoke$$inlined$httpAuthedMultiPart$1$1.invokeSuspend(dsl.kt:228)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:42)
        at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)
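The error message itself lists the characters the RemoteFile path check rejects. The sketch below is a hypothetical client-side pre-check: `FindIllegalChar` is a name made up here, and it mirrors only the text of the message, not mirai's actual `FileSystem.checkLegitimacy` source. It shows that the encoded name trips on `?` while the plain name would pass.

```csharp
using System;

// Returns the first character from the set named in the error message
// (Chars ':*?"<>|' are not allowed in path), or null if the path has none.
// Hypothetical helper; it only mirrors the message, not mirai's implementation.
static char? FindIllegalChar(string path)
{
    int index = path.IndexOfAny(":*?\"<>|".ToCharArray());
    return index >= 0 ? path[index] : (char?)null;
}

Console.WriteLine(FindIllegalChar("=?utf-8?B?5rWLMTg4NDY1OTguemlw?=")); // ?
Console.WriteLine(FindIllegalChar("测18846598.zip") is null);            // True
```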

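Yet both the mobile and desktop clients can upload group files with Chinese filenames. Is this a limitation of MAH, i.e. does it only allow English filenames? Thanks!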

@ryoii
Collaborator

ryoii commented Mar 14, 2023

How are you sending it? Try turning on debug mode and checking the parameters.

@cssxsh
Contributor

cssxsh commented Mar 14, 2023

path='=?utf-8?B?5rWLMTg4NDY1OTguemlw?='

=?utf-8?B?5rWLMTg4NDY1OTguemlw?=

That doesn't look much like Chinese or Japanese to me.
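The value is in fact an RFC 2047 "encoded-word" of the form `=?charset?B?base64?=`. A minimal decoder sketch, assuming only the B (Base64) form and a single word (`DecodeEncodedWord` is a hypothetical helper), recovers the original names: the value in the error decodes to `测18846598.zip`, and the header value quoted in the next comment decodes to `新建文本文档.txt`.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Decodes a single RFC 2047 encoded-word that uses the B (Base64) encoding.
// The Q encoding and multiple concatenated words are not handled in this sketch.
static string? DecodeEncodedWord(string value)
{
    Match m = Regex.Match(value, @"^=\?([^?]+)\?[Bb]\?([^?]*)\?=$");
    if (!m.Success)
        return null; // not a B-encoded word

    Encoding charset = Encoding.GetEncoding(m.Groups[1].Value); // "utf-8" in this thread
    byte[] bytes = Convert.FromBase64String(m.Groups[2].Value);
    return charset.GetString(bytes);
}

Console.WriteLine(DecodeEncodedWord("=?utf-8?B?5rWLMTg4NDY1OTguemlw?="));
Console.WriteLine(DecodeEncodedWord("=?utf-8?B?5paw5bu65paH5pys5paH5qGjLnR4dA==?="));
```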

@Natsukage
Author

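There's no need to turn on debug mode for this; the error message already shows what is going on.
In C#, I add the file with Flurl's .AddFile() method. But I've tested and confirmed that submitting the data with C#'s own HttpClient and MultipartFormDataContent gives the same result, for example
(the code below was generated by ChatGPT):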

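using System;
using System.IO;
using System.Net.Http;

using var httpClient = new HttpClient();
using var content = new MultipartFormDataContent();
byte[] fileBytes = await File.ReadAllBytesAsync("新建文本文档.txt");

// Create the file part; Add(content, name, fileName) also sets the part's Content-Disposition header
var fileContent = new ByteArrayContent(fileBytes);
content.Add(fileContent, "file", "新建文本文档.txt");
// Print the generated header to see how the non-ASCII filename was encoded
Console.WriteLine(fileContent.Headers.ContentDisposition);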

Inspecting the header shows that the filename= value has the form filename="=?utf-8?B?5paw5bu65paH5pys5paH5qGjLnR4dA==?=", which matches the MAH error.

   If a "filename" parameter is supplied, the requirements of
   Section 2.3 of [RFC2183] for the "receiving MUA" (i.e., the receiving
   Mail User Agent) apply to receivers of multipart/form-data as well:
   do not use the file name blindly, check and possibly change to match
   local file system conventions if applicable, and do not use directory
   path information that may be present.

   In most multipart types, the MIME header fields in each part are
   restricted to US-ASCII; for compatibility with those systems, file
   names normally visible to users MAY be encoded using the percent-
   encoding method in Section 2, following how a "file:" URI
   [URI-SCHEME] might be encoded.

   NOTE: The encoding method described in [RFC5987], which would add a
   "filename*" parameter to the Content-Disposition header field, MUST
   NOT be used.
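For comparison, the percent-encoding that the quoted RFC text allows would look like the sketch below (`PercentEncodedDisposition` is a hypothetical name). The RFC only says senders MAY use it, so whether a given server decodes the name back is up to that server.

```csharp
using System;

// Percent-encodes the file name's UTF-8 bytes, as the quoted RFC text permits.
// Uri.EscapeDataString leaves only unreserved characters (letters, digits, '-', '.', '_', '~') as-is.
static string PercentEncodedDisposition(string fieldName, string fileName) =>
    $"form-data; name=\"{fieldName}\"; filename=\"{Uri.EscapeDataString(fileName)}\"";

Console.WriteLine(PercentEncodedDisposition("file", "新建文本文档.txt"));
```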

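Apparently C#'s built-in MultipartFormDataContent follows Section 4.2 of RFC 7578 and escapes non-ASCII strings, which turns the filename into something like =?utf-8?B?5paw5bu65paH5pys5paH5qGjLnR4dA==?=. But when MAH receives this string, it doesn't convert it back; it just throws an error.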

If I roll my own and build the header by hand, MAH uploads the file (with its Chinese filename) without problems:

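// fileName, streamContent (a StreamContent) and content (the MultipartFormDataContent) are defined earlier; needs System.Linq and System.Text
// Reinterpret each UTF-8 byte as one char (U+0000-U+00FF) so that, written one byte per char, the header carries the raw UTF-8 bytes
var hackedFileName = new string(Encoding.UTF8.GetBytes(fileName).Select(b => (char)b).ToArray());
// Note: the filename* value here is not in the RFC 5987 form (charset'language'percent-encoded)
streamContent.Headers.Add("Content-Disposition", $@"form-data; name=file; filename=""{hackedFileName}""; filename*=""{hackedFileName}""");
content.Add(streamContent);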

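In the header built this way, filename="新建文本文档.txt", and MAH receives and uploads the file correctly.
But this clearly doesn't conform to the standard, and neither the standard library nor most third-party libraries support doing it this way. So I'd like to confirm whether this is intended MAH behavior or a bug.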
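Why the workaround can work, as a sketch: it assumes .NET writes multipart part headers one byte per char (ISO-8859-1), so mapping each UTF-8 byte to a char in U+0000..U+00FF puts the original UTF-8 bytes on the wire. The snippet only checks that round trip with `Encoding`; it does not exercise HttpClient itself, and `ToByteChars` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text;

// Same transformation as the workaround: one char (U+0000..U+00FF) per UTF-8 byte.
static string ToByteChars(string s) =>
    new string(Encoding.UTF8.GetBytes(s).Select(b => (char)b).ToArray());

string fileName = "新建文本文档.txt";
string hacked = ToByteChars(fileName);

// Assumption: the part headers are serialized as ISO-8859-1 (one byte per char).
// Under that assumption the bytes on the wire are exactly the UTF-8 bytes of the name.
byte[] wire = Encoding.GetEncoding("ISO-8859-1").GetBytes(hacked);
Console.WriteLine(wire.SequenceEqual(Encoding.UTF8.GetBytes(fileName))); // True
Console.WriteLine(Encoding.UTF8.GetString(wire) == fileName);            // True
```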

@ryoii
Collaborator

ryoii commented Mar 14, 2023

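I looked at the source and at various client implementations: most clients put the UTF-8-encoded name directly into filename. My guess is that it's a de facto convention.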

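According to the RFC, filename uses URL (percent) encoding; if a charset needs to be specified, it goes in an additional filename* parameter, in the form filename*=<charset>''<content>. Clearly, putting the =?utf-8?B?...?= form into filename is not reasonable.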
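The filename*=<charset>''<content> form is the RFC 5987 ext-value. A sketch of encoding and decoding it, assuming UTF-8 only (`ToExtValue` and `FromExtValue` are hypothetical helpers). Note that RFC 7578, as quoted later in this thread, says filename* MUST NOT be used in multipart/form-data; the form is defined for the Content-Disposition HTTP header field (RFC 6266).

```csharp
using System;

// Encodes a name as an RFC 5987 ext-value: <charset>'<language>'<percent-encoded bytes>.
// Uri.EscapeDataString percent-encodes UTF-8 bytes, so the charset is fixed to UTF-8 here.
static string ToExtValue(string fileName) => "UTF-8''" + Uri.EscapeDataString(fileName);

// Decodes an ext-value back to text; only the UTF-8 charset is supported in this sketch.
static string? FromExtValue(string extValue)
{
    string[] parts = extValue.Split('\'', 3);
    if (parts.Length != 3 || !parts[0].Equals("UTF-8", StringComparison.OrdinalIgnoreCase))
        return null;
    return Uri.UnescapeDataString(parts[2]);
}

string ext = ToExtValue("新建文本文档.txt");
Console.WriteLine($"filename*={ext}");
Console.WriteLine(FromExtValue(ext) == "新建文本文档.txt"); // True
```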

Also, stepping through the source in a debugger shows that the ktor engine MAH uses only reads originalFileName from filename.

@ryoii
Collaborator

ryoii commented Mar 14, 2023

Capturing the request sent by tools such as Postman shows that they send the filename directly as UTF-8.
[Screenshot: Postman capture of the request]
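What Postman sends can be reproduced by building the body by hand, writing the filename as raw UTF-8. A sketch, assuming a single file part; the boundary string, the `BuildMultipartBody` name and the application/octet-stream type are illustrative choices, and the other form fields MAH expects are omitted.

```csharp
using System;
using System.IO;
using System.Text;

// Builds a one-part multipart/form-data body by hand, with the filename written as raw UTF-8.
static byte[] BuildMultipartBody(string boundary, string fieldName, string fileName, byte[] fileBytes)
{
    using var body = new MemoryStream();
    void WriteUtf8(string s) { byte[] b = Encoding.UTF8.GetBytes(s); body.Write(b, 0, b.Length); }

    WriteUtf8($"--{boundary}\r\n");
    // Quotes or CR/LF in the name would break this header; a real client must escape or reject them.
    WriteUtf8($"Content-Disposition: form-data; name=\"{fieldName}\"; filename=\"{fileName}\"\r\n");
    WriteUtf8("Content-Type: application/octet-stream\r\n\r\n");
    body.Write(fileBytes, 0, fileBytes.Length);
    WriteUtf8($"\r\n--{boundary}--\r\n");
    return body.ToArray();
}

byte[] payload = BuildMultipartBody("----sketch-boundary", "file", "新建文本文档.txt", Encoding.UTF8.GetBytes("hello"));
Console.WriteLine(Encoding.UTF8.GetString(payload));
```

The resulting bytes could then be sent as a ByteArrayContent whose Content-Type header is multipart/form-data; boundary=----sketch-boundary.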

@ryoii
Copy link
Collaborator

ryoii commented Mar 14, 2023

Since your reply brings up RFC 7578, here is what it says:

NOTE: The encoding method described in [[RFC5987](https://www.rfc-editor.org/rfc/rfc5987)], which would add a
   "filename*" parameter to the Content-Disposition header field, MUST
   NOT be used.

   Some commonly deployed systems use multipart/form-data with file
   names directly encoded including octets outside the US-ASCII range.
   The encoding used for the file names is typically UTF-8, although
   HTML forms will use the charset associated with the form.

So, as most clients do, the filename can be sent directly as UTF-8. Also, the encoding method used for filename* must not appear in filename.

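Judging by the results, ktor conforms to RFC 7578. It does not, however, handle the compatibility case described in RFC 6266 Section 4.3, where filename and filename* are both present.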
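A sketch of the RFC 6266 Section 4.3 rule, assuming a simplified parameter parser that ignores backslash escapes (`ParseParameters` and `PickFileName` are hypothetical): when both parameters are present, filename* wins and filename is ignored. The sample header pairs the filename= value from this thread with an assumed filename* value; the thread itself only shows the filename= part.

```csharp
using System;
using System.Collections.Generic;

// Splits a Content-Disposition value into name/value parameters, ignoring ';' inside quotes.
// Simplified: backslash escapes inside quoted strings are not handled.
static Dictionary<string, string> ParseParameters(string header)
{
    var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    var pieces = new List<string>();
    int start = 0;
    bool inQuotes = false;
    for (int i = 0; i < header.Length; i++)
    {
        if (header[i] == '"')
            inQuotes = !inQuotes;
        else if (header[i] == ';' && !inQuotes)
        {
            pieces.Add(header[start..i]);
            start = i + 1;
        }
    }
    pieces.Add(header[start..]);

    foreach (string piece in pieces)
    {
        int eq = piece.IndexOf('=');
        if (eq < 0)
            continue; // the disposition type itself, e.g. "form-data"
        result[piece[..eq].Trim()] = piece[(eq + 1)..].Trim().Trim('"');
    }
    return result;
}

// RFC 6266 section 4.3: when both are present, pick filename* and ignore filename.
static string? PickFileName(string header)
{
    Dictionary<string, string> parameters = ParseParameters(header);
    if (parameters.TryGetValue("filename*", out string? extValue))
    {
        // ext-value: <charset>'<language>'<percent-encoded>; only UTF-8 is handled here
        string[] parts = extValue.Split('\'', 3);
        if (parts.Length == 3 && parts[0].Equals("UTF-8", StringComparison.OrdinalIgnoreCase))
            return Uri.UnescapeDataString(parts[2]);
    }
    return parameters.TryGetValue("filename", out string? plain) ? plain : null;
}

string fromDotNet = "form-data; name=file; filename=\"=?utf-8?B?5paw5bu65paH5pys5paH5qGjLnR4dA==?=\"; "
                  + "filename*=utf-8''%E6%96%B0%E5%BB%BA%E6%96%87%E6%9C%AC%E6%96%87%E6%A1%A3.txt";
Console.WriteLine(PickFileName(fromDotNet));
```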

@Natsukage
Author

Natsukage commented Mar 14, 2023

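I see. When I read that non-ASCII characters need to be encoded, I jumped to conclusions: I didn't notice that the standard encoding for filename is URL encoding, and assumed that =?utf-8?B?5paw5bu65paH5pys5paH5qGjLnR4dA==?= was the correctly encoded result.
So it looks like the MultipartFormDataContent used by .NET Standard 2.0 is the problem, and I'll have to roll my own after all.
Thanks for the explanation!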
