Unicode normalization #5162
I have been struggling for a while with uploading filenames with accented characters in them to an FTP site. The file name "seems" to be preserved on upload and it looks correct, but if I copy the filename out of CyberDuck after the upload has completed, and paste it into TextWrangler, it shows a red upside down question mark instead of the accented characters.
And if I retype that name in CyberDuck, I can then upload the file again from my Mac so that it then looks like there are 2 identically named files on the server.
For whatever it's worth, Captain FTP also seems to have this same problem, but FileZilla doesn't.
If you look at this directory listing
all the names are mangled (apparently an Apache thing). The 2 that end in .CD.2.gif point to different files, yet when you click them the name that comes up in the Safari address bar is apparently the same, even though both exist in the same directory on the server. One of these was uploaded with FileZilla, the other with CyberDuck.
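One way two names can look identical yet be distinct on the server is if they differ only in Unicode normalization form: precomposed characters (NFC) versus a base letter followed by a combining accent (NFD). Whether that is what happened here is an assumption at this point in the thread. A minimal Python sketch, using the file name quoted later in this thread as sample data:

```python
import unicodedata

# Sample name (quoted later in this thread), written with escapes so the
# source text of this example is unambiguous.
name = "K\u00e9nn\u00e1ts\u00eedeheads2.gif"

nfc = unicodedata.normalize("NFC", name)  # precomposed accented letters
nfd = unicodedata.normalize("NFD", name)  # base letter + combining mark

print(nfc == nfd)           # the two strings are not equal
print(len(nfc), len(nfd))   # NFD has one extra code point per accent
print(nfc.encode("utf-8"))  # bytes a server would store for the NFC form
print(nfd.encode("utf-8"))  # bytes a server would store for the NFD form
```

Both strings render the same on screen, but a file system that compares names byte for byte treats them as two different files.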
This has come about because the files are used in PHP scripts for genealogy, and there are problems reading the file names and writing them into the database for later retrieval. The files that cause issues with the PHP scripts are the ones whose names show the red upside down question mark when copied into TextWrangler. The file names "look" OK in phpMyAdmin, but there's an issue somewhere.
CyberDuck is set to UTF-8 in the Preferences, and in the settings for that Bookmark. I've been told that the server is set to UTF-8 also.
If you need access to this server, let me know and I can email the credentials to you.
Replying to [comment:1 dkocher]:
yes, that makes things even worse. There is something about how CyberDuck (and Captain FTP) uploads files with accented characters when UTF-8 is chosen compared to how FileZilla uploads them. If you look in this directory
you see two files that apparently have the same name, but if you view the source code for the page, you see one file represented as
and the other as
If I view their names in CyberDuck and copy it out to TextWrangler and turn on "Show Invisibles" then I see what is in the attached image - the CyberDuck file shows the red upside down ? symbol in place of each accented character.
If I turn off the Apache option for showing UTF-8 Directory listings, then the CyberDuck file shows as
and the FileZilla file shows as
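The actual link targets from the page source are not preserved in this thread. Purely as an illustration, and assuming the two uploads differ in Unicode normalization form, the sketch below shows how the same visible name percent-encodes differently in each form. The name `é.gif` and the helper `encoded_forms` are hypothetical, not taken from the server.

```python
import unicodedata
from urllib.parse import quote

def encoded_forms(name: str) -> dict:
    """Percent-encode the UTF-8 bytes of a name in both normalization forms."""
    return {
        form: quote(unicodedata.normalize(form, name))
        for form in ("NFC", "NFD")
    }

forms = encoded_forms("\u00e9.gif")  # hypothetical name: "é.gif"
print(forms["NFC"])  # %C3%A9.gif   (one precomposed character)
print(forms["NFD"])  # e%CC%81.gif  (plain "e" plus a combining acute accent)
```

A plain ASCII letter followed by a `%CC%..` sequence in a link is a hint that the name was stored in decomposed form.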
I worked yesterday with the owner of Simply Hosting, where this site is hosted, and he confirms that the server is running UTF-8. I am able to repeat the same results on my own server, which is running Mac OS X 10.5.8 with PureFTPd as the FTP server: the file uploaded by CyberDuck isn't referenced correctly once it's saved using PHP. This can be seen here
where, of the bottom 2 files, clicking the link for the FileZilla one shows the image but clicking the link for the CyberDuck one doesn't, yet the file name field below the image is the same in each case.
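A lookup can fail even though the displayed name matches if the stored name and the name on disk use different normalization forms. The scripts in question are PHP; the Python sketch below only illustrates the idea of normalizing both sides before comparing. The function `same_name` and both sample strings are hypothetical.

```python
import unicodedata

def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two file names after bringing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

stored = "K\u00e9nn.gif"    # hypothetical: name as written to the database (precomposed)
on_disk = "Ke\u0301nn.gif"  # hypothetical: name as found on the server (decomposed)

print(stored == on_disk)           # False: the raw comparison fails
print(same_name(stored, on_disk))  # True: equal once both are normalized
```

Normalizing to a single form at the point where names are written to the database would avoid the mismatch regardless of which client uploaded the file.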
Thanks for the detailed analysis! I certainly hope that will bring us closer to finding the cause of the issue. In the meantime, could you try the following? Open a Terminal.app window and paste
Restart Cyberduck. Let me know if that makes any difference. I'll have a closer look next week.
That has made all the difference. That same file now uploaded by CyberDuck is recognised correctly by the PHP scripts and can be retrieved and displayed.
(compared to previously with CyberDuck)
where even though the filename on the page "seemed" correct, the image wasn't displaying
On the Apache index page here
the View Source of that page shows the same encoding of the file name
as resulted from the upload by FileZilla (see above).
It also fixed the same issue on my Mac OS X 10.5.8 server where now the image will display on the PHP pages calling it.
What is the significance of setting that preference to Unicode?
Replying to [comment:4 thekiwi]:
Hope you are still here. Can you let me know from where (which volume or network mount) you were uploading the files in question?
Replying to [comment:4 thekiwi]:
Can you please let me know if using the latest snapshot build still works with the custom property you set removed using
Thanks for your reply.
David - sorry I hadn't noticed before now the couple of recent messages from you about this.
I've just now done as you asked - removed the custom property (how can I check that it truly was removed?), and installed the latest nightly build 4.0b9, and as far as I can remember from when this cropped up, the behaviour is working as one would expect - a file with accented characters in the file name seems to be correctly uploaded to a server, and can be found by the PHP scripts involved. And when the name is copied from CyberDuck and pasted into TextWrangler the name is as expected.
I tried it with 2 different servers - one at SimplyHosting.net running Linux, and my server on Mac OS X 10.6.5 with pureftpd 1.0.29 - the same computer I upload the files from.
As to the previous question - I'm not entirely sure what you're asking about "volume or network mount" - the files with the accented characters in the file name were on my Mac's startup disk - the same computer, and disk that has the server on it that the files are uploaded to.
Hope this all helps.
Edit about 30 minutes later with some extra information...
Actually I had forgotten to tell TextWrangler to "Show Invisibles" when I checked the file names as copied out of CyberDuck.
1 - for the Linux Server - it was all OK - the file name - Kénnátsîdeheads2.gif appeared as expected.
2 - for my Mac OS X Server, the file name copied from CyberDuck showed an upside down red question mark in each place where the accented characters were meant to be. In the Finder, however, the filename looked as expected and was in fact identical to the filename that I uploaded, at least inasmuch as the Finder said the file already existed when I copied the uploaded file from the source folder to the destination folder.
But the PHP scripts had detected the expected file name and written it into the MySQL database as expected - is this perhaps an issue with CyberDuck not displaying the name correctly?
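To check a name independently of how any one application displays it, the pasted text can be broken down into code points. The sketch below is an illustration only; `describe` is a hypothetical helper, and the assumption is that the question marks appear where the name contains combining marks (decomposed form).

```python
import unicodedata

def describe(name: str) -> list:
    """Return (code point, Unicode name) for every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in name
        if ord(ch) > 127
    ]

# Hypothetical inputs: the same visible letter in each normalization form.
print(describe("\u00e9"))   # [('U+00E9', 'LATIN SMALL LETTER E WITH ACUTE')]
print(describe("e\u0301"))  # [('U+0301', 'COMBINING ACUTE ACCENT')]
```

If the output lists `COMBINING ...` entries, the name is in decomposed form, and the accents are separate code points that some editors render as placeholder symbols.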
Some more information on the connections to the 2 different servers.
1 - the connection to the Linux Server
2 - the connection to my Mac OS X Server
The notable difference is that the Mac OS X server indicates that
while the Linux server doesn't show this. Knowing why this might be is outside my range of knowledge/skill.
The Linux server seemed to get the filename entirely correct while the Mac OS X server had trouble when the file name was copied out of CyberDuck and pasted into TextWrangler.
I also add that since my earlier testing some months ago on my Mac, I've upgraded to Mac OS X 10.6 from 10.5 and had to reinstall PureFTPd using the PureFTPd Manager application.
OK - I ran it again and got
Replying to [comment:16 theKiwi]:
The property was successfully removed then.