Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Filename encoding with gbk standard should be supported #5676

Closed
ericlin1001 opened this issue Apr 3, 2017 · 23 comments
Closed

[Feature] Filename encoding with gbk standard should be supported #5676

ericlin1001 opened this issue Apr 3, 2017 · 23 comments
Assignees
Labels
ReadyToTest QA, please validate the fix/enhancement
Milestone

Comments

@ericlin1001
Copy link

Steps to reproduce

  1. In linux, use owncloud client(owncloud version 2.3.0 Using Qt 4.8.6)
  2. In the owncloud syncing directory.
  3. touch 乱码.txt
  4. convmv -f utf-8 -t gbk 乱码.txt
  5. owncloud complained: *???.txt could not be synced due to errors. *
  6. In log, I found: File ignored because of invalid utf-8 sequence: ???.txt

Expected behaviour

Tell us what should happen
The file with name encoding in gbk, should be synced. Because I'm converting file name encoding of a large amount of files in linux, and these files are to be used in windows(win7).

Actual behaviour

Tell us what happens instead

  1. owncloud complained: *???.txt could not be synced due to errors. *
  2. In log, I found: File ignored because of invalid utf-8 sequence: ???.txt

Server configuration

Operating system:
LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty

Web server:

Database:

PHP version:

ownCloud version: (see ownCloud admin page)

Updated from an older ownCloud or fresh install:

Where did you install ownCloud from:

Signing status (ownCloud 9.0 and above):

Login as admin user into your ownCloud and access 
http://example.com/index.php/settings/integrity/failed 
paste the results here.

List of activated apps:

If you have access to your command line run e.g.:
sudo -u www-data php occ app:list
from within your ownCloud installation folder

The content of config/config.php:

If you have access to your command line run e.g.:
sudo -u www-data php occ config:list system
from within your ownCloud installation folder

or 

Insert your config.php content here
(Without the database password, passwordsalt and secret)

Are you using external storage, if yes which one: local/smb/sftp/...

Are you using encryption: yes/no

Are you using an external user-backend, if yes which one: LDAP/ActiveDirectory/Webdav/...

LDAP configuration (delete this part if not used)

With access to your command line run e.g.:
sudo -u www-data php occ ldap:show-config
from within your ownCloud installation folder

Without access to your command line download the data/owncloud.db to your local
computer or access your SQL server remotely and run the select query:
SELECT * FROM `oc_appconfig` WHERE `appid` = 'user_ldap';


Eventually replace sensitive data as the name/IP-address of your LDAP server or groups.

Client configuration

Browser:

Operating system:

Logs

Web server error log

Insert your webserver log here

ownCloud log (data/owncloud.log)

Insert your ownCloud log here

Browser log

Insert your browser log here, this could for example include:

a) The javascript console log
b) The network log 
c) ...
@hodyroff
Copy link

hodyroff commented Apr 3, 2017

We are currently preparing 2.3.1 packages with QT5.6 as new Linux packages, those will be worth a try first and then we might be able to look into this.

@guruz guruz added this to the 2.3.2 milestone Apr 4, 2017
@guruz
Copy link
Contributor

guruz commented Apr 4, 2017

@guruz guruz changed the title Filename encoding with gbk in Linux, will not be synced! [Linux] Filename encoding with gbk will not be synced Apr 4, 2017
@hodyroff
Copy link

hodyroff commented Apr 4, 2017

@guruz
Copy link
Contributor

guruz commented Apr 4, 2017

I heard from the other engineers that we only support UTF-8 on the client side, sorry.

@guruz guruz closed this as completed Apr 4, 2017
@ericlin1001
Copy link
Author

Ok, that's fine. Though I'm still hoping it'll support all file name encoding. Because with it, life will be more easy.

@hodyroff hodyroff changed the title [Linux] Filename encoding with gbk will not be synced [Feature] Filename encoding with gbk standard should be supported Apr 5, 2017
@hodyroff hodyroff removed this from the 2.3.2 milestone Apr 5, 2017
@hodyroff
Copy link

hodyroff commented Apr 5, 2017

Lets keep it as a feature request, but if not heavily requested from Chinese users this will not be implemented. If you could add some reasoning in regards of the use case and benefits rather then using UTF-8 only this would also be appreicated.

@hodyroff hodyroff reopened this Apr 5, 2017
@guruz
Copy link
Contributor

guruz commented Apr 9, 2017

My wife (who is Chinese) says:

it is the most used codec in mainland china
big5 is the one for taiwan.

@ericlin1001
Copy link
Author

Yes, in China, most of Chinese computer users use GBK encoding in Windows, even Linux.

@guruz guruz added this to the 2.4.0 milestone Jul 4, 2017
ogoffart added a commit that referenced this issue Jul 4, 2017
Issues:
 - #5661   On mac, iconv did not support all of unicode and some
           files with emoji in the filename could not be uploaded

 - #5719 , #5676  On linux, we will now support non utf-8 locale
ogoffart added a commit that referenced this issue Jul 13, 2017
Issues:
 - #5661   On mac, iconv did not support all of unicode and some
           files with emoji in the filename could not be uploaded

 - #5719 , #5676  On linux, we will now support non utf-8 locale
@ogoffart ogoffart added the ReadyToTest QA, please validate the fix/enhancement label Jul 17, 2017
@guruz
Copy link
Contributor

guruz commented Sep 11, 2017

@ericlin1001 Do you want to try this using a 2.4 nightly build?
We have a testpilot package available https://download.owncloud.com/desktop/daily/testpilotcloud-client-2.4.0.8373nightly20170911.linux-repo.html

@guruz
Copy link
Contributor

guruz commented Oct 7, 2017

@guruz
Copy link
Contributor

guruz commented Oct 17, 2017

@ericlin1001 ping

@ckamm
Copy link
Contributor

ckamm commented Oct 26, 2017

@guruz @ogoffart I've tested with the example from the first post and it still doesn't work. The client attempts to convert the gbk characters to utf8 and fails, showing a "Filename encoding is not valid" error. That's done in SyncEngine::treewalkFile. I don't think the client is prepared to deal with non-unicode filenames?

EDIT: Oh, 'codecForLocale' is used in the conversion to unicode in csync? I'll try again with a different locale then.
EDIT2: Couldn't convince Qt to not use UTF8 as codecForLocale. Any ideas?

@guruz guruz modified the milestones: 2.4.0, 2.5.0 Oct 30, 2017
@guruz guruz removed the ReadyToTest QA, please validate the fix/enhancement label Oct 30, 2017
@guruz
Copy link
Contributor

guruz commented Oct 30, 2017

Moved to 2.5

@ogoffart
Copy link
Contributor

@ckamm, but what's your file system encoding, and the environment variable that specify the encoding (LC_ALL/LC_CTYPE/LANG)?
QString::fromLocal8Bit should do the right thing.
If the filename is invalid on the file system, then there is nothing that can be done.

@ckamm
Copy link
Contributor

ckamm commented Nov 14, 2017

@ogoffart I agree, it's just that none of the env variables I tried had an effect on the system locale. So what I report is a failure of testing and not indicative of a bug on its own.

@ogoffart
Copy link
Contributor

I tested:

unset LC_ALL
unset LC_CTYPE
export LANG=en_US

(en_US is specified in my /etc/locale.gen as en_US ISO-8859-1)
I verified that the codecForLocale is indeed latin1.
Filename that can be encoded as latin1 works as expected.

However the bug i noticed is that if the server contains filename that cannot be encoded in the user's locale, Qt uses replacement character instead ('?'). And next sync rename the file with the replacement character.

ogoffart added a commit that referenced this issue Nov 21, 2017
This is mainly for linux, whose local is not UTF-8.
For example, in latin1, it is not possible to encode emoji or chinese character.
If there are such character in the filename, Qt would just save the file using
the replacement character ('?'). Then, on the next sync, client would rename
the files using this replacement character.

Avoid this by ignoring the files which cannot be downloaded because the
filename cannot be represented with the user's locale

Relates to issue #5676 and #5719
ogoffart added a commit that referenced this issue Nov 23, 2017
This is mainly for linux, whose local is not UTF-8.
For example, in latin1, it is not possible to encode emoji or chinese character.
If there are such character in the filename, Qt would just save the file using
the replacement character ('?'). Then, on the next sync, client would rename
the files using this replacement character.

Avoid this by ignoring the files which cannot be downloaded because the
filename cannot be represented with the user's locale

Relates to issue #5676 and #5719
@ogoffart ogoffart added the ReadyToTest QA, please validate the fix/enhancement label Nov 28, 2017
ckamm added a commit that referenced this issue Jan 10, 2018
There's an upstream bug where QTextCodec::canEncode returns true even
though it should be false. This works around that issue and adds a test.

The original work was done in 72809ef

See #6287, #5676, #5719
See https://bugreports.qt.io/browse/QTBUG-6925
ckamm added a commit that referenced this issue Jan 10, 2018
There's an upstream bug where QTextCodec::canEncode returns true even
though it should be false. This works around that issue and adds a test.

The original work was done in 72809ef

See #6287, #5676, #5719
See https://bugreports.qt.io/browse/QTBUG-6925
ckamm added a commit that referenced this issue Jan 10, 2018
There's an upstream bug where QTextCodec::canEncode returns true even
though it should be false. This works around that issue and adds a test.

The original work was done in 72809ef

See #6287, #5676, #5719
See https://bugreports.qt.io/browse/QTBUG-6925
@guruz
Copy link
Contributor

guruz commented Jul 23, 2018

2.5.0 beta1 is out :-)
https://central.owncloud.org/t/desktop-sync-client-2-5-0-beta1-released/14667
Everyone, please comment here if we can close this issue. Thank you.

@jnweiger
Copy link
Contributor

jnweiger commented Aug 5, 2018

Mint Tara (aka Ubuntu 18.04)

testpilotcloud version 2.5.0daily20180801 (build 9974)

touch 乱码.txt
... syncs fine.
convmv -f utf-8 -t gbk 乱码.txt --notest
... but then we still get a sync error as reported by the OP:
grafik

This feature does not seem to be ready for 2.5.0 release.

@ckamm I understand the expected outcome should be: file ignored.
Maybe 7d70f1b only sets the ignore flag when down-syncing? This was asked for upwards (too).

My personal opinion (aka server enhancement idea): filenames should be opaque byte streams for us. Whatever bytes a user puts into owncloud, should store that and bring it back unchanged. Interpreting encoding of a filename is a problem of the representation layer, not of the transport layer.

@michaelstingl
Copy link
Contributor

My personal opinion (aka server enhancement idea): filenames should be opaque byte streams for us.

@jnweiger Could you open or link to an issue in core and elaborate on your idea?

@ogoffart
Copy link
Contributor

ogoffart commented Aug 6, 2018

@jnweiger that's because your locale is still UTF-8. So it is expected that when a filename is not in UTF-8, we show one error for this file. (but we still continue to sync the other files)

My personal opinion (aka server enhancement idea): filenames should be opaque byte streams for us. Whatever bytes a user puts into owncloud, should store that and bring it back unchanged. Interpreting encoding of a filename is a problem of the representation layer, not of the transport layer.

That's not really possible as you need to display the filename in the web interface from HTML whose encoding is UTF-8. And since you need to sync, you need to compare the filename for equality. So we need to understand what it is. And we can't sync invalid filename.

@jnweiger
Copy link
Contributor

jnweiger commented Aug 6, 2018

Client observing the local locale is actually cool! So correct behaviour there!
Some more tests:

Mint Tara (aka Ubuntu 18.04)

locale LC_ALL=de_DE.UTF-8

LC_CTYPE= LANG=C LC_ALL=C LANGUAGE=C testpilotcloud &

touch xxx.txt; sleep 10 # sufficient sleep so that it syncs
mv xxx.txt x乱码.txt
OK

  • syncs up fine.

Client Error log:

  • 6 Aug 2018 16:51:25, Documents/issue5676/x������.txt, testpilotcloud3, The filename cannot be encoded on your file system.

BUT: the file gets renamed on the server.

  • local changes are no longer synced up.
  • changes on the server are no longer synced down.

OK:
Files that 'cannot be encoded on your file system' according to the locale of the owncloud-client (LC_ALL=C) in this case are rejected with the correct error message.

BUT:

  • touch 乱码.txt
    Different error message: 'File Removed'. But the file was not removed.

@jnweiger
Copy link
Contributor

jnweiger commented Aug 6, 2018

@ogoffart Regarding byte-streams.
I believe both your points are invalid/can be worked around. Let's move that discussion to the server space: owncloud/core#32251

@jnweiger
Copy link
Contributor

jnweiger commented Aug 8, 2018

The behaviour seen in #5676 (comment) is as expected. Closing.

@jnweiger jnweiger closed this as completed Aug 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ReadyToTest QA, please validate the fix/enhancement
Projects
None yet
Development

No branches or pull requests

7 participants