sympasoap oddity with utf-8 input #1862

Closed
dpc22 opened this issue Jul 8, 2024 · 8 comments

@dpc22
Contributor

dpc22 commented Jul 8, 2024

Version

6.2.72

Installation method

My own rpm, derived from "official" RHEL 9 rpm.

Expected behavior

If someone calls the SOAP "add" method with a GeCOS value which contains non-ASCII characters, the data should be processed as UTF-8.

Actual behavior

The PostgreSQL database back end throws an exception:

Jul 8 09:19:27 lists-2 sympasoap[298198]: err main::#85 > Sympa::WWW::SOAP::Transport::handle#118 > SOAP::Transport::HTTP::CGI::handle#627 > SOAP::Transport::HTTP::Server::handle#459 > SOAP::Server::handle#2844 > (eval)#2878 > (eval)#2893 > Sympa::WWW::SOAP::add#812 > Sympa::Spindle::spin#95 > Sympa::Request::Handler::add::_twist#80 > Sympa::List::add_list_member#3291 > Sympa::DatabaseDriver::PostgreSQL::do_prepared_query#112 > Sympa::Database::do_prepared_query#383 Unable to execute SQL statement "INSERT INTO subscriber_table (subscribed_subscriber, reception_subscriber, update_epoch_subscriber, number_messages_subscriber, date_epoch_subscriber, visibility_subscriber, user_subscriber, comment_subscriber, list_subscriber, robot_subscriber) SELECT ?, ?, ?, ?, ?, ?, ?, ?, ?, ? FROM dual WHERE NOT EXISTS ( SELECT 1 FROM subscriber_table WHERE user_subscriber = ? AND list_subscriber = ? AND robot_subscriber = ? )": (22021) ERROR: invalid byte sequence for encoding "UTF8": 0xa3

"0xa3" is the single byte ISO-8859-1 character "£".

This is correctly encoded using the 2 byte UTF-8 sequence: "0xc2 0xa3" in my SOAP client.

Something has transcoded the UTF-8 to ISO-8859-1, but the database back end is expecting UTF-8.
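To make the suspected failure concrete, here is a minimal Python sketch of the byte-level difference. It only illustrates the hypothesis above (UTF-8 decoded to characters, then re-encoded as ISO-8859-1 somewhere on the server side); it does not reproduce what SOAP::Lite or Sympa actually do internally, and the helper name `is_valid_utf8` is made up for the illustration.

```python
# Illustration only: models the suspected transcoding, not Sympa's real code path.
pound = "£"

utf8_bytes = pound.encode("utf-8")      # what the SOAP client sends
latin1_bytes = pound.encode("latin-1")  # what PostgreSQL complained about


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Suspected path: correct UTF-8 is decoded to characters, then written out
# again as ISO-8859-1 before it reaches the database.
transcoded = utf8_bytes.decode("utf-8").encode("latin-1")

print(utf8_bytes.hex(), is_valid_utf8(utf8_bytes))
print(transcoded.hex(), is_valid_utf8(transcoded))
```

A lone 0xa3 byte is a UTF-8 continuation byte with no lead byte, which is why a back end expecting UTF-8 rejects it.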

Steps to reproduce

SOAP client script (written in Python) available on request.

Additional information

I have an unpleasant feeling that this is in some way related to:

#1407

"This behavior seems due to bug (or buggy behavior) of SOAP::Lite".

(We are using the version of SOAP-Lite which ships with RHEL 9, which is: perl-SOAP-Lite-1.27-8.el9.noarch).

If I add "Encode::_utf8_off($gecos);" to the add subroutine in lib/Sympa/WWW/SOAP.pm:

sub add {
    my $class    = shift;
    my $listname = shift;
    my $email    = shift;
    my $gecos    = shift;
    my $quiet    = shift;

    Encode::_utf8_off($gecos);

Then things start to work in the way that I would expect. However, it isn't clear to me whether this is a safe or sensible thing to do.

@dpc22 dpc22 added the bug label Jul 8, 2024
@ikedas
Member

ikedas commented Jul 8, 2024

Hi @dpc22 ,

> This is correctly encoded using the 2 byte UTF-8 sequence: "0xc2 0xa3" in my SOAP client.

Please provide a sample of the input data, the client script you used and how you made sure the client encoded it correctly.

@dpc22
Contributor Author

dpc22 commented Jul 9, 2024

I attach my example Python script which fails (.txt extension required by github)

sync.py.txt

The equivalent Perl script seems to work:

sync.pl.txt

The only obvious difference is:

$soap->default_ns('urn:sympasoap');

I can't find a direct equivalence to "$soap->default_ns()" in the Zeep library that I am using in Python.

There is: "zeep.set_ns_prefix()", but that takes two arguments.

     |  set_ns_prefix(self, prefix, namespace)
     |      Set a shortcut for the given namespace.

The following didn't help:

zeep.set_ns_prefix(None, 'urn:sympasoap');

I'm afraid I don't know what SOAP namespaces do, so I'm rather blundering around in the dark.
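For anyone else unsure what a default namespace changes, here is a small sketch using only Python's standard library. The prefixed form matches what Zeep sends in this thread; the default-namespace form is an assumption about the general shape a `default_ns`-style client produces, since the Perl client's actual output is not shown here.

```python
import xml.etree.ElementTree as ET

# Prefixed form: only <add> is bound to urn:sympasoap.
prefixed = '<ns0:add xmlns:ns0="urn:sympasoap"><list>test</list></ns0:add>'

# Default-namespace form (assumed shape): unprefixed children inherit it too.
defaulted = '<add xmlns="urn:sympasoap"><list>test</list></add>'

p = ET.fromstring(prefixed)
d = ET.fromstring(defaulted)

# ElementTree writes namespaces as {uri}localname.
print(p.tag, p[0].tag)
print(d.tag, d[0].tag)
```

Both forms put `add` in the same namespace; the difference is whether the child elements are namespace-qualified. Neither form changes the character encoding of the payload.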

@dpc22
Contributor Author

dpc22 commented Jul 9, 2024

I'm pretty sure that my Python code was originally derived from: https://pypi.org/project/sympasoap/.

That doesn't seem to do anything with namespaces either.

(Edit to add)

It also has a normalize method which just discards any non-ASCII characters from the GeCOS field before invoking the SOAP add method. Presumably the author ran into the same issue, but didn't come up with a more sensible fix.
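As an illustration of what that kind of workaround costs, here is a hypothetical sketch of ASCII-only filtering. This is not the package's actual code; the function name `ascii_only` is invented for the example.

```python
def ascii_only(value: str) -> str:
    """Hypothetical stand-in: drop every character outside ASCII."""
    return value.encode("ascii", "ignore").decode("ascii")


# The server never sees a non-ASCII byte, but the data is silently damaged.
print(repr(ascii_only("Test £")))
print(repr(ascii_only("José")))
```

This avoids the encoding error only by losing information, so names with accented characters end up stored incorrectly.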

@dpc22
Contributor Author

dpc22 commented Jul 9, 2024

https://docs.python-zeep.org/en/master/transport.html#debugging

tells me how to dump the raw XML which is sent to the sympasoap server.
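For reference, the logging setup described on that page looks roughly like the following. It uses only the standard `logging` module; the logger name `zeep.transports` is the one that appears in the output in this comment, and the formatter and handler names are arbitrary.

```python
import logging.config

# Route DEBUG messages from Zeep's transport layer to the console so the
# raw request and response XML are printed.
logging.config.dictConfig({
    "version": 1,
    "formatters": {
        "verbose": {"format": "%(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",
            "formatter": "verbose",
        },
    },
    "loggers": {
        "zeep.transports": {
            "level": "DEBUG",
            "propagate": True,
            "handlers": ["console"],
        },
    },
})
```

This needs to run before the Zeep client makes its first call.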

The raw HTTP POST request was:

zeep.transports: HTTP Post to https://test.lists.cam.ac.uk/sympasoap:
<?xml version='1.0' encoding='utf-8'?>
<soap-env:Envelope xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/"><soap-env:Body><ns0:add xmlns:ns0="urn:sympasoap"><list>test-dpc22</list><email>dpc99@cam.ac.uk</email><gecos>Test £</gecos><quiet>true</quiet></ns0:add></soap-env:Body></soap-env:Envelope>

The XML declaration is <?xml ... encoding='utf-8'?>.

The <gecos> field appears to be correctly encoded as UTF-8: if I send the output to a file and use "od -c", I see the two byte sequence: "0xc2 0xa3" sent by the SOAP client.

0000440   l   >   <   g   e   c   o   s   >   T   e   s   t     302 243
0000460   <   /   g   e   c   o   s   >   <   q   u   i   e   t   >   t

>>> hex(0o302)
'0xc2'
>>> hex(0o243)
'0xa3'
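The same check can be done without od. The sketch below rebuilds the request body from the logged text above (an assumption: the real captured file was not attached) and inspects the bytes inside the gecos element. The helper name `gecos_bytes` is made up for the example.

```python
import xml.etree.ElementTree as ET

# Request body rebuilt from the log output above, encoded as the client declares.
captured = (
    "<?xml version='1.0' encoding='utf-8'?>\n"
    '<soap-env:Envelope xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap-env:Body><ns0:add xmlns:ns0="urn:sympasoap">'
    "<list>test-dpc22</list><email>dpc99@cam.ac.uk</email>"
    "<gecos>Test £</gecos><quiet>true</quiet>"
    "</ns0:add></soap-env:Body></soap-env:Envelope>"
).encode("utf-8")


def gecos_bytes(body: bytes) -> bytes:
    """Return the raw bytes between <gecos> and </gecos> (hypothetical helper)."""
    start = body.index(b"<gecos>") + len(b"<gecos>")
    end = body.index(b"</gecos>", start)
    return body[start:end]


raw = gecos_bytes(captured)
print(raw.hex(" "))  # separator argument needs Python 3.8+

# An XML parser honouring the declared encoding recovers the original text.
parsed = ET.fromstring(captured).find(".//gecos").text
print(parsed)
```

If the bytes are c2 a3 on the wire and the parser recovers "£", the client side is behaving correctly and the problem lies on the server side.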

I have a dedicated test server if I can add useful debugging at the server end. The normal Sympa verbose logging didn't tell me anything.

@ikedas
Member

ikedas commented Jul 9, 2024

@dpc22, could you please apply #1592 and check if the problem will be solved?

@dpc22
Contributor Author

dpc22 commented Jul 10, 2024

Thank you.

That seems to have fixed the problem on my test server.

I did need to add a patch for src/lib/Makefile.in in order to backport your fix from the Git repository to the 6.2.72 release tarball, because the change includes a rename:

rename from src/lib/Sympa/WWW/SOAP/Transport.pm
rename to src/lib/Sympa/WWW/SOAP/FastCGI.pm

I will apply the fix to the live system either tomorrow morning or Monday morning.

@ikedas
Member

ikedas commented Jul 10, 2024

Duplicate of #1541.

@dpc22
Contributor Author

dpc22 commented Jul 11, 2024

Okay, that seems to have worked on the live system as well. Thanks for your help here!

@ikedas ikedas closed this as completed Jul 11, 2024