Case sensitive versus case insensitive or case preserving file systems #1348

Open
dragotin opened this issue Jan 15, 2014 · 28 comments
@dragotin
Contributor

dragotin commented Jan 15, 2014

Different systems have different file systems that handle file names differently. One particular problem is the case sensitivity handling which can cause all kinds of problems when operating in a cross platform environment.

We need to track the problems down and find a solution. This bug tracks the changes to be done on the desktop sync clients.

There is owncloud/core#4747 to track this problem on the server side.



@dragotin
Contributor Author

Here is some input to consider summarized by @danimo:

This problem is part of a much bigger problem scope: Limitations of the OS/file systems on the clients (1), limitations on the backend (2), limitations due to the nature of PHP and / or web technology used to build ownCloud (3).

Examples for (1):

1. Case sensitive vs Case preserving file system
2. Allowed character set (usually Unicode NFD vs Unicode NFC problems these days)
3. Reserved characters
4. Reserved words (NUL, COM1, etc)
5. Maximum length (253 chars on Windows, insignificant on all other supported OSes)

Examples for (2):

- All of (1)
- Additionally, backends that violate POSIX FS semantics (hopefully only a temporary problem with the first version of the Jive plugin)

Examples for (3):

- Limitations purposefully introduced to avoid security issues: disallowing files called ".htaccess" (same problem as reserved words) and disallowing '<' and '>' (same problem as reserved characters)
- Limitations due to PHP limitations, e.g. missing Unicode support on Windows, which Thomas wrote a mapper for.
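
To make the client-side categories in (1) concrete, here is a minimal Python sketch of a per-name check. The function name `portability_problems` and both constant sets are hypothetical and deliberately incomplete; the real rules depend on the OS and file system in use.

```python
import unicodedata

# Illustrative subsets only (assumption): real reserved lists are longer.
WINDOWS_RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL", "COM1", "LPT1"}
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')


def portability_problems(name: str) -> list[str]:
    """Return reasons why `name` may not be storable on every platform."""
    problems = []
    # Reserved words: Windows matches them case-insensitively and
    # ignores the extension, so "nul.txt" is affected too.
    stem = name.split(".", 1)[0]
    if stem.upper() in WINDOWS_RESERVED_NAMES:
        problems.append("reserved word")
    # Reserved characters.
    if any(ch in WINDOWS_RESERVED_CHARS for ch in name):
        problems.append("reserved character")
    # Unicode NFD vs NFC: flag names that are not already in NFC form.
    if unicodedata.normalize("NFC", name) != name:
        problems.append("not NFC-normalized")
    return problems
```

For example, `portability_problems("NUL.txt")` reports a reserved word, while a plain name such as `Monster.jpg` yields an empty list. Case conflicts are not covered here because they depend on the other names in the same directory.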

@dragotin
Contributor Author

And this is some general information on how others solve this problem: https://www.dropbox.com/help/145/en

@PVince81
Contributor

One idea @schiesbn and I examined during lunch was to store the files on the server using hashes instead of real file names to get rid of all the encoding/special characters/casing issues. But that wouldn't be compatible with external storage, as we still need the real file names on the target storages...

Just putting this here in case it might spawn other ideas.

@DeepDiver1975
Member

> And this is some general information on how others solve this problem: https://www.dropbox.com/help/145/en

Awesome! More than a great starting point!

@ogoffart
Contributor

One "simple" way to do it would be to have mirall detect the error and just show an error to the user without syncing the file. That way we avoid data loss (but we don't sync all the files).

@DeepDiver1975
Member

> One idea @schiesbn and I examined during lunch was to store the files on the server using hashes instead of real file names to get rid of all the encoding/special characters/casing issues. But that wouldn't be compatible with external storage, as we still need the real file names on the target storages...
>
> Just putting this here in case it might spawn other ideas.

We do something similar on Windows Server already - files are stored with their 'sluggified' name.
In addition we maintain a table where we map the real name to the physical name.

https://github.com/owncloud/core/blob/master/lib/private/files/mapper.php#L170
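
As a rough illustration of the idea of a slugified physical name plus a lookup table, here is a small Python sketch. It is not a port of mapper.php; the class `NameMapper`, its methods, and the slug rules are all assumptions made for the example.

```python
import re
import unicodedata


class NameMapper:
    """Maps logical (user-visible) names to storage-safe physical names."""

    def __init__(self) -> None:
        self._physical_by_logical: dict[str, str] = {}
        self._used_physical: set[str] = set()

    def _slugify(self, name: str) -> str:
        # Keep only lowercase ASCII letters, digits, dot, dash, underscore.
        ascii_name = (
            unicodedata.normalize("NFKD", name)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        slug = re.sub(r"[^a-z0-9._-]+", "-", ascii_name.lower()).strip("-")
        return slug or "file"

    def physical_name(self, logical: str) -> str:
        """Return the stable physical name for `logical`, creating one if needed."""
        if logical in self._physical_by_logical:
            return self._physical_by_logical[logical]
        base = self._slugify(logical)
        candidate, counter = base, 1
        # Two logical names can slugify identically (e.g. case variants),
        # so append a counter until the physical name is unused.
        while candidate in self._used_physical:
            counter += 1
            candidate = f"{base}-{counter}"
        self._used_physical.add(candidate)
        self._physical_by_logical[logical] = candidate
        return candidate
```

With this sketch, `Monster.jpg` and `monster.jpg` both keep their real names in the table but land on distinct physical names. As noted above, such a mapping does not help on external storage, where the real names are needed.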

@dragotin
Contributor Author

Referring to @ogoffart: ...like we already do with invalid characters today, for example. But what is "the error"?

  • a file Monster.jpg exists on the Linux client and is synced to the server. Now another file monster.jpg is created on the Linux client, which is legal from the Linux POV. The client could recognise that and exclude monster.jpg from syncing and/or offer a way to rename the file.
  • a file Monster.jpg is created on the Linux client. At the same time a file monster.jpg is created on the server. We exclude monster.jpg from sync (do not download it). Additionally, we could offer a dialog to rename the local file Monster.jpg to something different, as we do not want to rename server files in the first place.

other cases?
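
Both scenarios above start with spotting names that differ only in case. A minimal Python sketch of that check follows; the function name `case_collisions` is made up for illustration, and `str.casefold()` is only an approximation of the folding rules real file systems apply.

```python
from collections import defaultdict


def case_collisions(names: list[str]) -> list[list[str]]:
    """Return groups of names that differ only in case."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        # casefold() is Python's aggressive lowercasing; NTFS, HFS+ and
        # SMB servers each use their own folding tables (assumption:
        # close enough for a sketch).
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Given the merged local and remote listing `["Monster.jpg", "monster.jpg", "readme.txt"]`, this returns a single group containing the two Monster files.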

@DeepDiver1975
Member

@dragotin in that given scenario on the client we are okay on:

  • Linux - nothing has to be changed - works today
  • MacOS - I have no idea - @danimo
  • On Windows we can treat this as a conflict - right?

@danimo
Contributor

danimo commented Jan 16, 2014

@DeepDiver1975:

  • MacOS (cheap): Treat like Windows
  • MacOS (thorough): Check the file system properties. If the case sensitive bit is set, do nothing. If not, handle the conflict.

Note that Linux (i.e. VFS) also does not guarantee a case sensitive file system. Prominent examples are JFS (by default) and SMB mounts.

@PVince81
Contributor

There is another case, with external storage: if there's a Linux filesystem for root, but an SMB storage is mounted in "/smb", then for that mountpoint we need the same handling as with a Windows FS.
The idea is to let the server find out what partition the mount is on and then return a "case sensitive/insensitive" flag in the PROPFIND headers of folders.

@DeepDiver1975
Member

@PVince81 I think we should continue the server side discussion in owncloud/core#4747

But generally speaking I have to agree

@dragotin
Contributor Author

@DeepDiver1975 no, none of this works today. Currently we only have a solution that expects the Linux way everywhere. Once we start fixing this we need to fix it on all clients very conservatively, because we might have a Linux -> Server -> MacOS sync path. And in addition, what @PVince81 says.

The idea to handle that problem on a per-directory basis with a PROPFIND header is ...interesting. It would be much easier to change filenames to something that is accepted everywhere.

Btw, we should make the whole story configurable for those people who never want to sync stuff across to a system with a crippled file system.

@ogoffart
Contributor

ogoffart commented Feb 5, 2014

Proposed solution:

If a file cannot be synced because a file whose name differs only in case already exists, we do not sync that file and report an error to the user.

Example: The server reports to the client that there are two files, Monster.png and monster.png. If neither file was previously synced, the client will pick one at random and sync it normally, preserving the case. It will report an error to the user regarding the second file: "File cannot be synced because it conflicts with another file that differs only in case: Monster.png. Please rename or remove one of the two files."
Once a file is synced, we keep syncing that file and throw an error for the other one until the conflict is resolved (i.e. the file is renamed on the server).

Symmetrically, exactly the same happens if the client is case sensitive but the server is not. All we need from the server is a flag telling us whether or not it is case sensitive (possibly per directory).
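
A minimal Python sketch of this proposal, for illustration only. The function `plan_sync` and its parameters are hypothetical, and it swaps the random pick for a deterministic one (first name in sorted order) so the result is reproducible.

```python
def plan_sync(remote_names, already_synced):
    """Return (names_to_sync, errors) for a case-insensitive target.

    `already_synced` holds names synced in earlier runs; they keep
    winning until the conflict is resolved on the server.
    """
    winner_by_key = {name.casefold(): name for name in already_synced}
    to_sync, errors = [], []
    # Sorting replaces the random pick from the proposal (assumption).
    for name in sorted(remote_names):
        winner = winner_by_key.setdefault(name.casefold(), name)
        if winner == name:
            to_sync.append(name)
        else:
            errors.append(
                f"{name}: File cannot be synced because it conflicts with "
                f"another file that differs only in case: {winner}. "
                "Please rename or remove one of the two files."
            )
    return to_sync, errors
```

On a first run with Monster.png and monster.png, only Monster.png is synced and the other is reported. If monster.png was the one synced earlier, it stays the winner on later runs.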

@jancborchardt Is that solution good enough from a usability point of view?
@DeepDiver1975 All we would need is a flag (in the PROPFIND, for example) telling us whether this directory is case sensitive.

@jancborchardt
Member

@ogoffart I’m not comfortable at all with »we do not sync the file« and »the client will pick one at random«. That’s exactly what the »(case conflict)« is for. So one would be uploaded as »Monster.png« and the other one as »monster (case conflict).png«. Does that work?
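
For illustration, the »(case conflict)« naming could be sketched in Python roughly as follows. The function name, the numbering scheme for a third or later conflict, and the choice of which file keeps its name (first in sorted order) are all assumptions.

```python
import os


def with_case_conflict_suffix(names):
    """Map each name to one that is unique on a case-insensitive file system."""
    taken = set()   # casefolded names already handed out
    result = {}
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        candidate, counter = name, 1
        while candidate.casefold() in taken:
            # First clash gets " (case conflict)", later ones get a number.
            suffix = (
                " (case conflict)" if counter == 1
                else f" (case conflict {counter})"
            )
            candidate = f"{stem}{suffix}{ext}"
            counter += 1
        taken.add(candidate.casefold())
        result[name] = candidate
    return result
```

Here Monster.png keeps its name and monster.png becomes »monster (case conflict).png«. A real client would also have to persist this mapping between runs so that the same file keeps the same name.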

@ogoffart
Contributor

ogoffart commented Feb 5, 2014

@jancborchardt The problem with the (case conflict) mapping is that it is complicated to maintain: we have to make sure we still sync the same file, and so on. For this reason we thought that reporting an error for one of the files was good enough. Case conflicts are supposed to be rare, and not many users will have the problem. As for the 'randomness', do you have a suggestion on which one to pick?

@jancborchardt
Member

Well, even if case conflicts are supposed to be rare, we should still handle them properly. Picking one (whether at random or not doesn’t matter) and not syncing the other is not really a feasible solution for a sync client.

@dragotin
Contributor Author

dragotin commented Feb 5, 2014

@jancborchardt yes, but we also should not have the client moving files around. Moreover, moving can cause other serious trouble if the file is open in other software such as M$ Word.

@woboq

woboq commented Feb 7, 2014

CreateFile on Windows has an interesting flag:

FILE_FLAG_POSIX_SEMANTICS
0x01000000
Access will occur according to POSIX rules. This includes allowing multiple files with names, differing only in case, for file systems that support that naming. Use care when using this option, because files created with this flag may not be accessible by applications that are written for MS-DOS or 16-bit Windows.

@jancborchardt
Member

Ok, the client should show a notification as well. Then when you have the file open and you see that notification, you probably get that there’s a problem.

If we’re saying that case conflicts are edge cases already, then case conflicts where one of the files is open are even more on the edge.

As an additional idea: Can we by any chance detect if a file is opened? Then rename the non-open file?

@jancborchardt
Member

Some agreements we came to during discussion – whatever implementation we end up with, it should:

  • not rename files (especially if there’s no need to, like someone who does not use Windows)
  • not just leave files unsynced and throw a warning (we should not try to educate and condition people on such a basic sync aspect)

@danimo
Contributor

danimo commented May 21, 2014

APIs for detecting case preserving / case sensitivity:

  • Windows: GetVolumeInformationByHandleW(): FILE_CASE_PRESERVED_NAMES, FILE_CASE_SENSITIVE_SEARCH
  • Mac OS: getattrlist(): VOL_CAP_FMT_CASE_SENSITIVE, VOL_CAP_FMT_CASE_PRESERVING
  • Linux: no API (samba just assumes FSes are case sensitive and case preserving, see source3/smbd/statvfs.c)
  • BSD: TODO
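
Where no such API exists, one fallback is to probe the directory directly. The Python sketch below is an assumption-laden illustration, not what any of the APIs above do: the function name `is_case_sensitive` is made up, and it needs write access to the directory being tested.

```python
import os
import tempfile


def is_case_sensitive(directory: str) -> bool:
    """Probe `directory` by creating a temp file and looking it up under another case."""
    # The mixed-case prefix guarantees that swapcase() changes the name.
    with tempfile.NamedTemporaryFile(prefix="CaseProbe_", dir=directory) as probe:
        folder, name = os.path.split(probe.name)
        swapped = os.path.join(folder, name.swapcase())
        # On a case-insensitive file system the swapped name resolves
        # to the probe file itself, so it appears to exist.
        return not os.path.exists(swapped)
```

Because the answer can differ per mount point (for example an SMB mount below a Linux root), the probe has to run in the directory that is actually being synced, not just once per machine.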

@guruz added the type:bug label Sep 12, 2014
@phil-davis
Contributor

This discussion is a bit old, so maybe there is somewhere more current that an ongoing discussion of file-naming issues is happening? Anyway, I will post here!
In a use case with mostly or only Windows clients, the Windows clients are "dumb" (case-preserving only) because the typical Windows file systems are that way anyhow. But users can induce sync issues from the web-browser interface by making case-sensitive-conflicting file names there.
Perhaps it would be useful to be able to set the server to "dumb case-preserving mode" - a server setting "Use dumbed-down case-preserving file naming only".
Then Windows clients would be happy - they are already dumbed-down because they are Windows. And the web-browser client could understand when the server is "dumbed-down case-preserving" and thus simply refuse to accept uploads/creations of files or folders that would create case-sensitive conflicts.
*nix and other clients could also understand the server setting and only upload the first of a case-sensitive-conflicting set of files, and log errors about the others.
That would allow the server admin to side-step this issue (at their discretion), pushing it back to the (*nix) case-sensitive clients (of which there might be none or few in use).
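
A small Python sketch of what such a "case-preserving only" mode might enforce for one folder. The class and exception names are invented for the example, and it models the folder as an in-memory table rather than any real server storage.

```python
class CaseConflictError(Exception):
    """Raised when a name would differ from an existing one only in case."""


class CasePreservingFolder:
    """In-memory model of a folder that preserves case but ignores it for lookup."""

    def __init__(self) -> None:
        self._names: dict[str, str] = {}  # casefolded name -> stored name

    def create(self, name: str) -> None:
        existing = self._names.get(name.casefold())
        if existing is not None and existing != name:
            raise CaseConflictError(f"{name!r} conflicts with {existing!r}")
        self._names[name.casefold()] = name

    def rename(self, old: str, new: str) -> None:
        old_key, new_key = old.casefold(), new.casefold()
        if self._names.get(old_key) != old:
            raise FileNotFoundError(old)
        # A case-only rename (same key) is allowed; any other clash is refused.
        if new_key != old_key and new_key in self._names:
            raise CaseConflictError(f"{new!r} conflicts with {self._names[new_key]!r}")
        del self._names[old_key]
        self._names[new_key] = new

    def names(self) -> list[str]:
        return sorted(self._names.values())
```

In this model, creating FOLDER next to an existing Folder is refused, while renaming Folder to FOLDER succeeds because it stays the same entry.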

@PVince81
Contributor

New ticket for the server-side discussion: owncloud/core#17161

@count0-krsk

So, has any decision been made on this problem? Or have the developers simply stopped caring about it?
We have a Linux server, Windows clients, and stupid users who often rename Folder to FOLDER and then ask: "Why doesn't your buggy cloud sync anything?"

@michaelstingl
Contributor

@pmaier1 One day, we need a decision here… Related: https://github.com/owncloud/documentation/issues/2832

@guruz
Contributor

guruz commented Aug 18, 2017

@count0-krsk Can you post this as a new bug (or search for duplicates)?
Changing the case of a file is/was supported AFAIK.
(Do the users rename via the Windows client or the web interface?)

@count0-krsk

count0-krsk commented Aug 18, 2017

@guruz
Of course, if it is required. I searched before posting and found this issue, plus a more complex thread about platform-dependent issues (the one michaelstingl linked above), which says there is no "silver bullet" for Mac, Windows and Linux for now.
Our clients have many folders that one user shares partially with many others (some dirs with one group, some with other users). Some users rename Folder locally to FOLDER and add some files to it. Others still see Folder with no new files and add their own. Then I have to go to their PCs and sync manually...

@guruz
Contributor

guruz commented Aug 25, 2017

@count0-krsk Please create a new issue and mention the exact server and client versions you are using. Thanks
