Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify CSV import to comply RFC 4180 #585

Open
wants to merge 1 commit into
base: master
from

Conversation

@ljelinek-cznic
Copy link

commented Sep 10, 2019

Description

This patch enhances CSV import. It changes CSV parsing to be compliant to RFC 4180.

Related Issue

https://mantis.phplist.org/view.php?id=19934

Screenshots (if appropriate):

@samtuke samtuke requested a review from suelaP Sep 10, 2019

@samtuke

This comment has been minimized.

Copy link
Contributor

commented Sep 10, 2019

Thanks @ljelinek-cznic ! This will go through the usual QA process and hopefully make it into phpList 3.4.7

@bramley

This comment has been minimized.

Copy link
Contributor

commented Sep 10, 2019

@ljelinek-cznic You seem to have written your own parser. php has a function to parse a csv string str_getcsv() https://www.php.net/manual/en/function.str-getcsv.php
Is there a reason why that cannot be used?

@bramley

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2019

To answer my own question. The import data needs to be split into lines before str_getcsv() can parse a line of text. As one of the intentions of this change is to allow a new-line character within a field then that cannot now be done.

Instead fgetcsv() could be used.

@bramley

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2019

This current piece of code normalises the line ending to "\n".

    // Clean up email file
    $email_list = trim($email_list);
    $email_list = str_replace("\r", "\n", $email_list);
    $email_list = str_replace("\n\r", "\n", $email_list);
    $email_list = str_replace("\n\n", "\n", $email_list);

But if "\n" can now appear within a field then I'm not sure that the processing is entirely valid as it could change the content of a field.

@ljelinek-cznic

This comment has been minimized.

Copy link
Author

commented Sep 17, 2019

Unfortunately, fgetcsv() don't ensure full compliance to RFC 4180. This is the reason that I solve it by implementing own mechanism. Probably the most exhaustive explanation is here: https://wiki.php.net/rfc/kill-csv-escaping

@bramley

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2019

Thanks.
From what I can see from the link you provided the problems are with the escaping mechanism (using the \ character) which is outside RFC4180. But it appears that can be disabled by using \0 as the escape character.

The php function fgetcsv() seems to support the seven points in the definition at https://tools.ietf.org/html/rfc4180#page-2
Or is there something that I am missing?

@bramley

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2019

I have changed the code to use fgetcsv(). Using your file with some additions to have embedded new line, and embedded double quotes,

email;Last name;Name;Country
"johnsmith@example.com";"butter";"";"Czech Republic"
"x.smith@example.com";"maslo
after new line";"smith";"Czech Republic"
foo@bar.com;"name including a"" character";"Jim";"Great Britain"

the test output is
image

I think that the \ in front of " is a phplist bug. The value stored in the database does not have that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.