Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix possible ValueError in convert_to_int caused by values like 1px #10

Merged
merged 3 commits into from
Sep 10, 2013

Conversation

yaph
Copy link
Contributor

@yaph yaph commented Sep 4, 2013

When trying to parse http://www.wired.com/wiredscience/2013/09/rim-fire-map-color-scale/ a ValueError was raised in convert_to_img, because the page has image width and height values ending in px.

I changed the function to be more liberal regarding dimension values, by extracting the digits before casting to int. I added a test for this.

Not sure though if the value should be converted to int at all or kept as a string.

…om dimensions attributes values in convert_int. Also be more liberal allowing values with px or percentage suffixes. Added test for image dimensions.
@coveralls
Copy link

Coverage Status

Coverage decreased (-2.35%) when pulling ebbd936 on yaph:convertint into a8430c0 on michaelhelmick:master.

@michaelhelmick
Copy link
Owner

Very good. I'm glad that adding tests and modifying this source code has been understandable.

Now for the pull request, the only gripe I have is that it is invalid HTML to have width="100px" in the img tag. So, should we even parse it?

We could either:

  1. Accept this pull request and gracefully accept a width or height that might be invalid
  2. Modify the pull request and except both TypeError and ValueError, therefore ignoring the value all together

If we chose the second option, the convert_to_int method would look like this:

def convert_to_int(value):
    """Attempts to convert a specified value to an integer

    :param value: Content to be converted into an integer
    :type value: string or int

    """
    try:
        return int(value)
    except TypeError:
        return None

Thoughts @ashibble?

@ashibble
Copy link
Contributor

ashibble commented Sep 5, 2013

Although it may be invalid to write width="100px", I have seen it done before. I feel that the intended meaning is the same for px sizes, and so it wouldn't necessarily be a BAD thing.

I do think that we should ignore sizes with em or % in them, as these may not display correctly converted to px sizes. Changing % to pixels would reflect an incorrect width.

@yaph
Copy link
Contributor Author

yaph commented Sep 5, 2013

As I mentioned, I'm not sure, whether the value should be changed at all, except maybe for calling strip() on it.

I agree that just removing/ignoring em and % is not a good idea.

That being said valid HTML and real-world usage are 2 different things, so I suggest the following.

    if value is None:
        return None

    value = value.strip().strip('px')
    try:
        return int(value)
    except TypeError:
        return None
    except ValueError:
        return None

If a value contains a px suffix, although not valid, accept it and use the digits only. Everything else that can't be converted to int will be omitted.

@michaelhelmick
Copy link
Owner

We could probably do this:

pattern = re.compile(r'(?P<value>\d+)?px+')
def convert_to_int(value):
    try:
        match = re.search(pattern, value))
        if match:
            value = int(match.group('value')
    except (TypeError, ValueError):
        return None

I don't know if the double strip or the regex pattern search is faster.
I'll test that tomorrow, I'm getting sleepy and don't feel like testing it right now, haha.

@michaelhelmick
Copy link
Owner

@yaph

Alright, it appears the basic double strip is faster! Think you can update your pull request?

Except... instead of excepting TypeError and ValueError separately, can you do them in one like like in my last comment? Thanks!

@yaph
Copy link
Contributor Author

yaph commented Sep 9, 2013

Okay I'll update it soon.

@michaelhelmick
Copy link
Owner

Thanks!

Sent from my iPhone

On Sep 9, 2013, at 7:35 PM, Ramiro Gómez notifications@github.com wrote:

Okay I'll update it soon.


Reply to this email directly or view it on GitHub.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.17%) when pulling 7db8a0a on yaph:convertint into a8430c0 on michaelhelmick:master.

@michaelhelmick
Copy link
Owner

One more thing if it isn't too much trouble!

Change

value = value.strip().strip('px')

to

value = value.strip(' px')

I actually just did some reading and found out that the all instances of characters passed to strip are removed from the string!
http://docs.python.org/2/library/stdtypes.html#str.strip

Crazy, I thought it was just striping the phrase. My change has a space along side p and x value.strip(' px')

@yaph
Copy link
Contributor Author

yaph commented Sep 10, 2013

Interesting, I wasn't aware of that either, I'll update that.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.17%) when pulling f35cbef on yaph:convertint into a8430c0 on michaelhelmick:master.

michaelhelmick added a commit that referenced this pull request Sep 10, 2013
Fix possible ValueError in convert_to_int caused by values like 1px
@michaelhelmick michaelhelmick merged commit 5d44363 into michaelhelmick:master Sep 10, 2013
@michaelhelmick
Copy link
Owner

Merged, thanks!

@yaph
Copy link
Contributor Author

yaph commented Sep 10, 2013

You're welcome! Glad I could contribute to lassie.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants