Don't drop non-utf8 file paths #20

Open
fezzzza opened this issue Oct 28, 2018 · 2 comments

Comments

@fezzzza

fezzzza commented Oct 28, 2018

I can get mirror working fine as a client without watchman, but with watchman installed the client fails at startup with the exception shown below.
I am running Linux Mint 19 (roughly Ubuntu 18.04 Bionic).
I get the same result whether running as a normal user or as root.
I get the same result with every version of openjdk-8/9/10/11-jre.
A quick Google search suggests it is related to character encodings. I am in the UK and most of my system defaults to UTF-8, but it may be related to some form of internationalisation.
With reference to your notes about WatchService, I notice that JDK-8145981 is now fixed. Is WatchService still considered buggy in the latest release, and is watchman still recommended/required for stability?

$ mirror client -h localhost -l /var/www/html -r /var/www/html
2018-10-28 16:15:39 INFO Connected, starting session, version unspecified
2018-10-28 16:15:41 INFO Watchman root is /var/www/html
2018-10-28 16:15:41 ERROR Exception starting the client
java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:816)
at com.facebook.buck.bser.BserDeserializer.deserializeString(BserDeserializer.java:236)
at com.facebook.buck.bser.BserDeserializer.deserializeRecursiveWithType(BserDeserializer.java:332)
at com.facebook.buck.bser.BserDeserializer.deserializeTemplate(BserDeserializer.java:302)
at com.facebook.buck.bser.BserDeserializer.deserializeRecursiveWithType(BserDeserializer.java:338)
at com.facebook.buck.bser.BserDeserializer.deserializeRecursive(BserDeserializer.java:313)
at com.facebook.buck.bser.BserDeserializer.deserializeObject(BserDeserializer.java:276)
at com.facebook.buck.bser.BserDeserializer.deserializeRecursiveWithType(BserDeserializer.java:336)
at com.facebook.buck.bser.BserDeserializer.deserializeRecursive(BserDeserializer.java:313)
at com.facebook.buck.bser.BserDeserializer.deserializeBserValue(BserDeserializer.java:113)
at mirror.watchman.WatchmanChannelImpl.read(WatchmanChannelImpl.java:93)
at mirror.watchman.WatchmanChannelImpl.query(WatchmanChannelImpl.java:87)
at mirror.watchman.WatchmanFileWatcher.startWatchAndInitialFind(WatchmanFileWatcher.java:197)
at mirror.watchman.WatchmanFileWatcher.performInitialScan(WatchmanFileWatcher.java:140)
at mirror.MirrorSession.calcInitialState(MirrorSession.java:78)
at mirror.MirrorClient.startSession(MirrorClient.java:88)
at mirror.MirrorClient.access$300(MirrorClient.java:27)
at mirror.MirrorClient$SessionStarter.runOneLoop(MirrorClient.java:198)
at mirror.tasks.ThreadBasedTask.run(ThreadBasedTask.java:62)
at mirror.tasks.ThreadBasedTask.lambda$new$0(ThreadBasedTask.java:39)
at java.lang.Thread.run(Thread.java:748)
2018-10-28 16:15:41 INFO Stopping session

@stephenh
Owner

stephenh commented Oct 28, 2018

Oh, yes, this is from getting non-UTF8 paths. I ran into this myself but hadn't released the "fix". If you bump to 1.2.1, which I just pushed, it should not blow up.

The disclaimer is that I wasn't sure how to fix it, so for now when watchman says "um, this file path can't be decoded as utf-8", mirror just skips it and does not sync that path.
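
The skip-on-failure behaviour described above can be sketched roughly as follows. This is a minimal illustration, not mirror's actual code: the class `StrictPathDecoder` and the method `decodeOrSkip` are hypothetical names, and the sketch assumes the raw path bytes are available before decoding. It uses the same strict `CharsetDecoder` that appears in the stack trace, which reports malformed input as a `CharacterCodingException` instead of substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper for illustration only; not taken from mirror's source.
final class StrictPathDecoder {

    // Returns the decoded path, or Optional.empty() when the bytes are not
    // valid UTF-8, so the caller can skip that path instead of failing.
    static Optional<String> decodeOrSkip(byte[] rawPath) {
        try {
            String decoded = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(rawPath))
                .toString();
            return Optional.of(decoded);
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The same name encoded two ways; \u00f4 is "o with circumflex".
        byte[] utf8 = "flags/c\u00f4te.png".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "flags/c\u00f4te.png".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(decodeOrSkip(utf8).isPresent());   // true
        System.out.println(decodeOrSkip(latin1).isPresent()); // false
    }
}
```

The trade-off is the one already noted: a path that cannot be decoded is silently left out of the sync rather than stopping the whole session.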

I guess in theory it could transfer the file path as binary (just a byte[]) across the wire. However, all of the Java file system APIs take strings, so once the remote side received it, there would be no (standard) Java API that would accept it. I'd have to do something janky like save it to a temp file (via the Java APIs) and then use a JNI call or something similar to rename it.

In my case, these were corrupted file paths, so I used env LC_ALL=C find . -name '*[! -~]*' to find them and delete them. But I suppose for you they are real files...

I'll leave this issue open as "somehow support non-utf8 file names in a way that is not dropping them".

stephenh changed the title from "Watchman causes exception" to "Don't drop non-utf8 file paths" on Oct 28, 2018
@fezzzza
Author

fezzzza commented Oct 28, 2018

Ah yes, just to confirm: there are a bunch of image files of international flags that have accented characters in their filenames. That's the way they came from the source. I certainly wouldn't have chosen to use complex characters in the filenames, and I've seen it documented that it's not a good idea, but I don't know how to check whether they are UTF-8, an international ISO encoding like ISO-8859-1, or something else.
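
One rough way to answer the "UTF-8 or something else" question is to look at the raw bytes of a filename. The sketch below is illustrative only: `FilenameEncodingCheck`, `Kind`, `classify`, and `asLatin1` are made-up names, and it assumes the raw bytes have already been obtained by some other means, since the standard Java file APIs hand back strings. A name that fails strict UTF-8 decoding is not thereby proven to be ISO-8859-1, because every byte sequence decodes under ISO-8859-1; the Latin-1 view is only something to inspect by eye.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class FilenameEncodingCheck {

    enum Kind { ASCII, UTF8, NOT_UTF8 }

    // Classifies raw filename bytes: pure ASCII, valid UTF-8 with at least
    // one non-ASCII character, or not decodable as UTF-8 at all.
    static Kind classify(byte[] name) {
        boolean ascii = true;
        for (byte b : name) {
            if (b < 0) { // high bit set, so the byte is outside 7-bit ASCII
                ascii = false;
                break;
            }
        }
        if (ascii) {
            return Kind.ASCII;
        }
        try {
            // newDecoder() reports malformed input by default.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(name));
            return Kind.UTF8;
        } catch (CharacterCodingException e) {
            return Kind.NOT_UTF8;
        }
    }

    // Shows what the bytes would look like if they were ISO-8859-1.
    // This never fails, so it is a guess to eyeball, not a detection.
    static String asLatin1(byte[] name) {
        return new String(name, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] suspect = {'c', (byte) 0xF4, 't', 'e', '.', 'p', 'n', 'g'};
        System.out.println(classify(suspect));
        System.out.println(asLatin1(suspect));
    }
}
```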
