Support characters in the Unicode Astral Plane #7030
Comments
Upon further investigation, this appears to be caused by ownCloud using the utf8 charset with its MySQL backend, which only supports characters up to three bytes long. Perhaps switching to utf8mb4 would be enough to fix this and the related issues? Right now the database truncates entries at the first 4-byte character, so invalid objects are stored, and they cause problems when loaded back into ownCloud.
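The following Python sketch models the truncation described above. It assumes a column keeps everything before the first character that needs four bytes in UTF-8. This models the report, not MySQL's implementation, and the real outcome may also depend on the server's SQL mode. The function name truncate_like_mysql_utf8 is hypothetical. Applied to a small vCard, it shows how losing the tail leaves a document that ends prematurely, which matches the log entry in the report.

```python
def truncate_like_mysql_utf8(text: str) -> str:
    """Hypothetical model: keep only the prefix before the first character
    that needs 4 bytes in UTF-8 (one a 3-byte utf8 column cannot hold)."""
    for index, ch in enumerate(text):
        if len(ch.encode("utf-8")) == 4:
            return text[:index]
    return text


if __name__ == "__main__":
    card = "BEGIN:VCARD\nFN:Anna \U0001F600\nEND:VCARD"
    stored = truncate_like_mysql_utf8(card)
    print(repr(stored))                  # 'BEGIN:VCARD\nFN:Anna '
    print(stored.endswith("END:VCARD"))  # False: the card is now incomplete
```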
If we do, we'd need to make sure this also works with other databases.
This is a MySQL-only problem; every other database handles this properly. See utf8mb4.
Moving to utf8mb4 affects the index length, as @bantu pointed out: https://area51.phpbb.com/phpBB/viewtopic.php?f=108&t=44807#p258271 We will run into real issues here from a conceptual point of view, because our indexes are 'optimized' to fit into 3x255 bytes. Would it make sense to extend your DB schema XML so that we can choose utf8mb4?
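The 3x255 figure can be checked with a little arithmetic. The sketch below assumes InnoDB's 767-byte limit on index key prefixes (the default for the COMPACT and REDUNDANT row formats in MySQL 5.5 and 5.6), and that MySQL sizes an index on a VARCHAR column using the charset's maximum bytes per character. The helper names are illustrative.

```python
# Assumption: InnoDB's default 767-byte index key prefix limit
# (COMPACT/REDUNDANT row formats, MySQL 5.5/5.6).
INNODB_KEY_PREFIX_LIMIT = 767


def index_bytes(chars: int, max_bytes_per_char: int) -> int:
    """Worst-case key size for an index on a VARCHAR(chars) column."""
    return chars * max_bytes_per_char


def max_indexable_chars(max_bytes_per_char: int,
                        limit: int = INNODB_KEY_PREFIX_LIMIT) -> int:
    """Longest VARCHAR whose index still fits into the key prefix limit."""
    return limit // max_bytes_per_char


if __name__ == "__main__":
    for charset, width in (("utf8", 3), ("utf8mb4", 4)):
        needed = index_bytes(255, width)
        print(f"{charset}: VARCHAR(255) key needs {needed} bytes, "
              f"fits: {needed <= INNODB_KEY_PREFIX_LIMIT}, "
              f"max indexable length: {max_indexable_chars(width)}")
```

With utf8 a 255-character key needs 765 bytes and fits; with utf8mb4 it needs 1020 bytes and does not, so such indexes would have to shrink to 191 characters or rely on InnoDB's large-prefix support (innodb_large_prefix) where it is available.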
We have to check compatibility with older versions and other databases. We also have to consider the increased space requirements and the decreased speed this would mean. I think a valid option would be to simply not support such characters.
As I said, this is a MySQL-only issue. All other DBMSes support 4-byte UTF-8 characters just fine.
There are basically no additional space requirements: both utf8 and utf8mb4 use a variable number of 8-bit blocks (this is what the 8 stands for). The difference is that utf8 only supports up to three bytes per character, while utf8mb4 supports up to four bytes (which, per RFC 3629, covers all UTF-8 characters). Four bytes per character are only used when required.
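To illustrate the variable-length point, here is a small Python sketch that computes how many bytes a code point needs from the RFC 3629 ranges and cross-checks the result against Python's own UTF-8 encoder. Only the last sample, an emoji outside the Basic Multilingual Plane, needs the fourth byte that utf8mb4 adds. The function name is illustrative.

```python
def utf8_len(code_point: int) -> int:
    """Bytes needed to encode a code point in UTF-8 (RFC 3629 ranges)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4


if __name__ == "__main__":
    for ch in "A\u00e9\u20ac\U0001F600":  # A, é, €, 😀
        n = utf8_len(ord(ch))
        assert n == len(ch.encode("utf-8"))  # agrees with Python's encoder
        print(f"U+{ord(ch):04X}: {n} byte(s)")
    # Text without astral characters encodes to the same bytes either way.
    print(len("owncloud.pdf".encode("utf-8")))  # 12
```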
In my opinion, the only good way of doing this (in terms of complexity and required work) is to simply switch from utf8 to utf8mb4 and require MySQL 5.5.3. The only remaining concern is key/index size.
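Requiring MySQL 5.5.3 implies checking the server version, for example at install or upgrade time. A minimal Python sketch, assuming the version string comes from something like `SELECT VERSION()` and starts with major.minor.patch. The function names are hypothetical and this is not ownCloud code. The reporter's server (5.5.35) would pass the check.

```python
import re

# utf8mb4 first appeared in MySQL 5.5.3.
MIN_UTF8MB4_VERSION = (5, 5, 3)


def parse_mysql_version(version: str) -> tuple:
    """Extract (major, minor, patch) from e.g. '5.5.35-0ubuntu0.12.04.2'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised MySQL version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def supports_utf8mb4(version: str) -> bool:
    """True if the server is new enough to offer the utf8mb4 charset."""
    return parse_mysql_version(version) >= MIN_UTF8MB4_VERSION


if __name__ == "__main__":
    for v in ("5.1.73", "5.5.2", "5.5.35-0ubuntu0.12.04.2"):
        print(v, supports_utf8mb4(v))
```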
Not sure if this is related, but I tested the mapper.php file in ownCloud 6.0.2 (\lib\private\files). There is a private function slugify($text) that appears to transform the file name stored as physic_path in the MySQL table oc_file_map in a way that removes the Unicode part of the file name. For example, (unicodename).pdf becomes -.pdf in the physic_path stored in the oc_file_map table. I think this may be one of the reasons why Unicode files have problems, because, considering two Unicode file names: (unicode1).pdf j:\datapath(unicode1).pdf under the current arrangement, both files will be stored in physic_path as As a test, I made the private function slugify($text) simply return $text, but other functions may still need to be modified so that the correct Unicode file name is also stored in the data folder (keeping the Unicode file name). Andrew
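The collision described above can be reproduced with a hypothetical ASCII-only slugify in Python. This is not ownCloud's mapper.php code. It simply assumes, as in the -.pdf example, that every run of non-ASCII characters collapses to a single '-'. Two different names then map to the same stored value, while returning the text unchanged keeps them distinct.

```python
import re


def hypothetical_slugify(name: str) -> str:
    """Hypothetical ASCII-only slugify, NOT ownCloud's implementation:
    each run of characters outside [A-Za-z0-9.] becomes a single '-'."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def identity_slugify(name: str) -> str:
    """The experiment from the comment: keep the name unchanged."""
    return name


if __name__ == "__main__":
    names = ["\u65e5\u672c.pdf", "\u4e2d\u6587.pdf"]  # 日本.pdf, 中文.pdf
    for slugify in (hypothetical_slugify, identity_slugify):
        stored = [slugify(n) for n in names]
        collision = len(set(stored)) < len(names)
        print(slugify.__name__, stored, "collision" if collision else "distinct")
```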
Please don't push this back too far. Without it, notes, calendar, and text apps can't be used in a professional environment because they are too unreliable. Any field containing an emoji is saved as empty text.
I'm totally 100% with what @oparoz said! I'm not that deep into coding; I'm more of a frontend developer. But as I see it, this problem doesn't seem very hard to solve. It is, however, a big problem and a long-standing showstopper for me when recommending ownCloud to others. I don't understand why it is taking so long to get this working, because other projects that use sabredav and MySQL got it working a long time ago. One example is http://baikal-server.com/.
Please be aware that there were many other, more important bugs that needed to be fixed and that resources and time are limited, which is why this issue hasn't been fixed yet. As this is an open-source project, you and others are free to look into the issue too and submit a proposal or pull request that fixes it. Even documentation or research details about how to fix it can be useful and save some time. Thanks for your understanding.
To ensure greater consistency, with oc8.1 we properly detect astral plane characters and throw exceptions to the clients (browser, desktop and mobile). Full support for astral plane characters in files has been moved to the backlog. For contacts and calendar we need to find a way to make things work on MySQL; URL-encoding is still my idea for fighting this issue...
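The two approaches mentioned here can be sketched in Python: rejecting input that contains code points above U+FFFF, and percent-encoding values so that only ASCII reaches a 3-byte utf8 column. This is an illustration under those assumptions, not ownCloud's implementation. AstralPlaneError and the helper names are hypothetical, while quote and unquote are the standard urllib.parse functions.

```python
import re
from urllib.parse import quote, unquote

# Code points above U+FFFF lie outside the Basic Multilingual Plane.
ASTRAL_PLANE = re.compile("[\U00010000-\U0010FFFF]")


class AstralPlaneError(ValueError):
    """Hypothetical error raised when input contains astral plane characters."""


def reject_astral(text: str) -> str:
    """Option 1: detect astral plane characters and refuse the input."""
    match = ASTRAL_PLANE.search(text)
    if match is not None:
        raise AstralPlaneError(
            f"U+{ord(match.group()):X} at index {match.start()} is not supported")
    return text


def encode_for_utf8_column(text: str) -> str:
    """Option 2: percent-encode the UTF-8 bytes so the stored value is ASCII."""
    return quote(text, safe="")


def decode_from_utf8_column(stored: str) -> str:
    """Reverse of encode_for_utf8_column."""
    return unquote(stored)


if __name__ == "__main__":
    summary = "Party \U0001F389"
    try:
        reject_astral(summary)
    except AstralPlaneError as err:
        print("rejected:", err)  # rejected: U+1F389 at index 6 is not supported
    stored = encode_for_utf8_column(summary)
    print(stored)  # Party%20%F0%9F%8E%89
    assert decode_from_utf8_column(stored) == summary
```

The trade-off of the encoding approach is that stored values grow (each non-ASCII byte becomes three ASCII characters) and any searching or sorting done by the database operates on the encoded form.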
@DeepDiver1975 does mb4 support cover this? If yes, please close.
Please try again with 10.0.4, which supports emojis when MySQL is configured properly for mb4 support. If emojis work but other astral plane characters don't, please reopen.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Expected behaviour
Users can use all Unicode characters when naming files, entering calendar appointments, or saving contacts.
Actual behaviour
Using characters outside the Basic Multilingual Plane causes severe problems on both the web interface and through WebDAV-based sync services.
Steps to reproduce
Server configuration
Operating system: Ubuntu 12.04.4 LTS
Web server: Apache/2.2.22
Database: MySQL 5.5.35-0ubuntu0.12.04.2
PHP version: 5.3.10-1ubuntu3.9
ownCloud version: ownCloud 6.0.1 (stable)
Updated from an older ownCloud or fresh install: updated from 6.0.0
Client configuration
Browser: Safari 7.0.1
Operating system: Mac OS X 10.9.1
Logs
ownCloud log (data/owncloud.log)
Example with contact card:
OCA\Contacts\Contact::retrieve Error parsing carddata for: 907 Invalid VObject. Document ended prematurely.
Related Issues