import - multibyte/special characters problem #600

Closed
JKoelman opened this issue Feb 15, 2014 · 20 comments

3 participants
@JKoelman
Contributor

commented Feb 15, 2014

Right, we had an export problem (#599), and it seems we have an import problem too, or I'm doing something wrong.

I added some text (Chinese/Greek), exported it, and am now trying to import the file.
Result: question marks.

@JKoelman JKoelman added this to the JEM 1.9.6 beta milestone Feb 15, 2014

@JKoelman JKoelman added the bug label Feb 15, 2014

@JKoelman

Contributor Author

commented Feb 15, 2014

Not sure about this one, as I didn't save the export file correctly.

//
Will try it later, as my PC is experiencing some problems right now (it doesn't like special characters).
It seems that Joomla has a problem with a long title made of special characters :(

@JKoelman

Contributor Author

commented Feb 15, 2014

Will close this issue, as import seems to be working correctly.

@JKoelman JKoelman closed this Feb 15, 2014

@Hoffi1

Contributor

commented Feb 15, 2014

...when importing a new UTF-8 (BOM) file I see "ä½ ä»¬å¥½" instead of "你们好". It seems the importer always expects ANSI and tries to convert to UTF-8 again.
Also, the BOM must be removed to handle the id field as expected ("(BOM)id" is not "id", so the first column is always ignored).
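
Both effects can be reproduced in a few lines. This is a minimal sketch in Python, not JEM's PHP code. It assumes "ANSI" means Windows-1252, and the column names `id` and `title` are only placeholders.

```python
import codecs

text = "你们好"
utf8_bytes = text.encode("utf-8")

# Reading UTF-8 bytes as if they were Windows-1252 produces the mojibake
# seen after the import.
mojibake = utf8_bytes.decode("cp1252")

# A header line that starts with a UTF-8 BOM: the first column name
# no longer compares equal to "id" (column names are hypothetical).
header = (codecs.BOM_UTF8 + b"id;title").decode("utf-8").split(";")
first_column_is_id = header[0] == "id"

# Stripping the BOM character restores the expected column name.
stripped_is_id = header[0].lstrip("\ufeff") == "id"

print(repr(mojibake), first_column_is_id, stripped_is_id)
```

The third byte of "你" decodes to a non-breaking space in Windows-1252, which is why the garbled text appears to contain a blank.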

@Hoffi1 Hoffi1 reopened this Feb 15, 2014

@JKoelman

Contributor Author

commented Feb 15, 2014

I tried:

  • added an event with Greek characters
  • exported it to a CSV file
  • didn't open it with Excel, but imported the file and it looked fine

I will try some other characters, etc. and see what happens.

@JKoelman

Contributor Author

commented Feb 15, 2014

Hmm, tried again but I'm not seeing the problem :(
I tried Greek, Hindi, and Japanese, but will try again tomorrow.
(I didn't add the categories, so that's why the categories are missing.)

[Screenshots: clipboard01a, clipboard02a]

@Hoffi1

Contributor

commented Feb 15, 2014

Did you forget to commit your fixed import controller?
I see iconv('windows-1252', 'utf-8', file_get_contents... in JEMControllerImport::CsvImport().
Commenting out this and the next line solves the conversion problem (but may cause a problem on EL import). But the BOM-id problem persists.
I think I will try to search for and remove the BOM, or convert if there is no BOM (for my tests, not committing yet).

@JKoelman

Contributor Author

commented Feb 15, 2014

Whoops, you're right, I did alter the import controller but didn't think of it.

I have two versions offline, one adapted and the other not, so I was a bit confused.
And yep, the "replace id" option is not working :(

@Hoffi1

Contributor

commented Feb 15, 2014

😜

@JKoelman

Contributor Author

commented Feb 15, 2014

@Hoffi1 maybe you found a solution in the meantime, but I found an answer here: http://anupamsaha.wordpress.com/2011/08/02/detecting-utf-byte-order-mark-using-php/

It seems to be doing the trick, and I left out the iconv part
(didn't upload).

@Hoffi1

Contributor

commented Feb 15, 2014

@JKoelman: Yes, I have. 😉 In particular, I try not to use file_get_contents(). But pack() is nice. 👍
I have a solution which reads the file only once, line by line, removes the BOM, and converts each line if no BOM was present. If you like, I can commit it.
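
The approach described above could look roughly like this. It is a Python sketch of the idea, not the committed PHP code. The helper name `read_lines` and the Windows-1252 fallback encoding are assumptions made for illustration.

```python
import codecs
import io


def read_lines(stream, fallback_encoding="cp1252"):
    """Yield decoded lines from a binary stream, reading it only once.

    Hypothetical helper: if the first line starts with a UTF-8 BOM, the BOM
    is dropped and all lines are treated as UTF-8. Otherwise every line is
    converted from the fallback encoding (assumed to be Windows-1252).
    """
    has_bom = False
    for index, raw in enumerate(stream):
        if index == 0 and raw.startswith(codecs.BOM_UTF8):
            has_bom = True
            raw = raw[len(codecs.BOM_UTF8):]
        encoding = "utf-8" if has_bom else fallback_encoding
        yield raw.decode(encoding).rstrip("\r\n")


# In-memory stand-ins for uploaded files (sample data is made up).
with_bom = io.BytesIO(codecs.BOM_UTF8 + "id;title\n1;你们好\n".encode("utf-8"))
legacy = io.BytesIO("id;title\n1;Grüße\n".encode("cp1252"))

bom_lines = list(read_lines(with_bom))
legacy_lines = list(read_lines(legacy))
print(bom_lines, legacy_lines)
```

A UTF-8 file without a BOM would be misread by this sketch, which is the compatibility trade-off discussed further down the thread.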

@Hoffi1 Hoffi1 self-assigned this Feb 16, 2014

@Hoffi1 Hoffi1 closed this Feb 16, 2014

@JKoelman JKoelman reopened this Feb 16, 2014

@JKoelman

Contributor Author

commented Feb 16, 2014

Am reopening this one, as I am getting the error "Wrong number of fields" when trying to import.

@Hoffi1

Contributor

commented Feb 16, 2014

Do you have a semicolon somewhere in your data fields?
This is a general problem because fields are not enclosed in "". Maybe we should change that too.

But I will also check with a no-BOM file. Maybe we need additional parentheses in line 118.
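
The semicolon problem mentioned above is easy to show. This is a small Python sketch with made-up sample data, not the JEM exporter: without quoting, a semicolon inside a field changes the field count, while quoting every field lets the row round-trip.

```python
import csv
import io

# Hypothetical row whose second field contains the delimiter.
row = ["1", "Concert; open air", "Berlin"]

# Unquoted: joining on ";" makes the embedded semicolon look like a separator.
unquoted_line = ";".join(row)
unquoted_fields = next(csv.reader([unquoted_line], delimiter=";"))

# Quoted: every field is wrapped in double quotes, so the parser can tell
# data semicolons from separators.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_ALL).writerow(row)
quoted_line = buffer.getvalue()
quoted_fields = next(csv.reader([quoted_line], delimiter=";"))

print(len(unquoted_fields), len(quoted_fields))
```

The unquoted line parses into four fields instead of three, which is the kind of mismatch a "Wrong number of fields" check would catch.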

@diesl

Contributor

commented Feb 16, 2014

In general, no conversion should be needed, as the whole workflow (import, export, etc.) should be UTF-8 (without BOM, I think). So these conversions should be removed everywhere.

@JKoelman

Contributor Author

commented Feb 16, 2014

@diesl if you want to open an exported CSV in Excel, the BOM needs to be there, as far as I know.
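
For reference, this is what "UTF-8 with BOM" means at the byte level, sketched in Python with made-up sample data. Whether a given Excel version actually needs the BOM is not verified here.

```python
import codecs

csv_text = "id;title\n1;你们好\n"

# The "utf-8-sig" codec prepends the UTF-8 BOM when encoding
# and strips it again when decoding.
data = csv_text.encode("utf-8-sig")
starts_with_bom = data.startswith(codecs.BOM_UTF8)
round_trip = data.decode("utf-8-sig")

print(starts_with_bom, round_trip == csv_text)
```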

@JKoelman

Contributor Author

commented Feb 16, 2014

Do you have a semicolon somewhere in your data fields?

I took a look but don't see one. I do have semicolons, but they are there to separate the fields.
(image above)

@Hoffi1

Contributor

commented Feb 16, 2014

I see the effect too. There seems to be an undocumented difference between fgetcsv() and str_getcsv(). Will alter the code to avoid using str_getcsv()...

@diesl: Generally I agree. But this makes the new JEM version incompatible with older exports.
Because JEM is not in a stable phase yet, it's not really a problem, yes. But for JEM 2.0 I would recommend having a mechanism for versioning export files.
Whether Excel needs a BOM I can't say. My old Excel doesn't understand it and always expects the proprietary Windows encoding.

Hoffi1 added a commit that referenced this issue Feb 16, 2014

alter import controller (issue #600)
Don't use str_getcsv() because it's different to fgetcsv().

@Hoffi1 Hoffi1 modified the milestones: JEM 1.9.7 beta, JEM 1.9.6 beta Apr 6, 2014

@JKoelman

Contributor Author

commented Apr 17, 2014

Not sure if there was still something to be addressed here, but I tried import/export and it looked fine. I tried the text "Iñtërnâtiônàlizætiøn" but didn't try the Greek or Chinese text.

@JKoelman

Contributor Author

commented Apr 17, 2014

Well, I tried importing Greek + Chinese too and that looks fine, so I will close this issue.
--> tried importing events only

@JKoelman JKoelman closed this Apr 17, 2014

@Hoffi1

Contributor

commented Apr 17, 2014

The original problem was fixed, so export/import works.

But there was an open discussion about switching to UTF-8 without BOM (incompatible), Excel, and much more.

@JKoelman

Contributor Author

commented Apr 17, 2014

Ah right,
for now I think it's best to leave this issue closed and to continue the discussion in the forum if needed.
