Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP::Message: new() is quietly accepting $content as ref and as scalar #13

Closed
isync opened this issue Nov 21, 2013 · 5 comments
Closed

Comments

@isync
Copy link

isync commented Nov 21, 2013

Feature request:
...and messes up utf8 handling by doing so.

Today I ran into a nasty little caveat of HTTP::Message:

Unknowingly, I constructed a HTTP::Message object in a wrong way: by passing content as a ref instead of a scalar to new(). After that I happily asked the object to give me ->content and ->decoded_content. All was
working, until I noticed strange inconsistent behaviour with utf8 content.

As it turned out, new() was not actually looking into the var I passed in as $content, and as a result, utf8::downgrade was done on the ref and not the actual string (as utf8::downgrade() works on scalars only, right?), so the $content never got the initial utf8 treatment. Summary: new() is not ref/scalar aware.

But later on, when content() is called, internal vars are passed through content_ref() which heals that I wrongly passed in a content ref - only the encoding messed up at that point. (in turn: content() is ref/scalar aware)

Took me hours to track down that a single slash in front of $content in HTTP::Message->new( $headers, $content ) was the root of all evil.

Emitting a warning somewhere might help equally misusing users For example in new():
warn("new() does not accept scalar-refs as content!") if ref($content);
Or the module has to decide if it is ref/scalar compatible all-the-way-through, or not-at-all.

@herveus
Copy link

herveus commented Apr 23, 2015

There are tests that set the content to a scalar ref and do things based on that, so it seems that it's trying to have it both ways. This appears to be the one spot where it misses.

herveus added a commit to herveus/http-message that referenced this issue Apr 23, 2015
    If it is a ref, dereference it before doing utf8::downgrade on it.
    This addresses issue libwww-perl#13 in GitHub
@isync
Copy link
Author

isync commented Apr 24, 2015

This is an old issue... can't totally wrap my head around what I described back then. But from looking at your changes... is there a typo?

you said: "ref, dereference it before doing utf8::downgrade on it."

but commited:

my $dcontent = ref($content) ? $$content : $content;
_utf8_downgrade($content);
$content = $dcontent;

could be you intended to do:
_utf8_downgrade($dcontent);

No? Just have a second look to make sure.
Also, why do a dereference? Wouldn't it be more efficient without a temp var? (untested:)

if (defined $content) {
    # POD says "should be bytes", but tests also use refs
    if(ref($content)){ _utf8_downgrade($$content) }else{ _utf8_downgrade($content) }
}

But then, thinking again, it might not be a good idea to keep $content a ref...
You decide,
thanks for working on this!

@herveus
Copy link

herveus commented Apr 25, 2015

You're right about the typo. I'm not sure about efficiency being an actual issue here.
I should probably maintain the ref-ness of $content, too.

On thing that is holding me up with the pull request is sorting out how to test it.

I'm not Unicode savvy enough to quite understand just how to write a test that will fail with your case.
Can you help me out? I have a test case that puts "foo" expressed in Cyrillic letters in and it passes through just fine, but it doesn't fail in the current version, so it's not as helpful as I'd prefer.

I'm doing the CPAN challenge and got handed this module; this looked like something I could take on.

@isync
Copy link
Author

isync commented Apr 25, 2015

Dug around quite a bit in my repos but couldn't figure out where I found this bug. Sorry.

The thing is not that it fails, but that decoded_content() and content() return inconsistent values. The module thinks it has treated the actual content, while it hasn't.

This is quite a fundamental module.

Sorry I can't put more work into this. Already took me hours to track down this bug (which was 50% on my end - the module doesn't have to anticipate every way someone might be misusing it). And today, starts feeling like a time-drain again, wrapping my head around this old issue... The simplest fix might be to emit a warning, as initially proposed.

@herveus
Copy link

herveus commented Apr 25, 2015

OK. Thanks for looking.

herveus added a commit to herveus/http-message that referenced this issue May 31, 2015
@isync isync closed this as completed Aug 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants