Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible integer overflow in operations with addresses and sizes. #59349

Closed
serhiy-storchaka opened this issue Jun 22, 2012 · 22 comments
Closed
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Copy link
Member

BPO 15144
Nosy @jcea, @mdickinson, @pitrou, @vstinner, @serhiy-storchaka
Files
  • align_operations.patch
  • align_operations2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-09-20.18:58:50.319>
    created_at = <Date 2012-06-22.20:02:48.379>
    labels = ['interpreter-core', 'type-bug']
    title = 'Possible integer overflow in operations with addresses and sizes.'
    updated_at = <Date 2012-09-28.11:45:51.241>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2012-09-28.11:45:51.241>
    actor = 'jcea'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-09-20.18:58:50.319>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2012-06-22.20:02:48.379>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['26091', '27232']
    hgrepos = []
    issue_num = 15144
    keywords = ['patch', 'needs review']
    message_count = 22.0
    messages = ['163473', '164799', '164800', '164801', '164804', '164805', '164809', '164810', '164818', '164821', '167480', '170568', '170620', '170623', '170684', '170809', '170811', '170817', '170828', '170830', '170831', '170842']
    nosy_count = 6.0
    nosy_names = ['jcea', 'mark.dickinson', 'pitrou', 'vstinner', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue15144'
    versions = ['Python 3.3']

    @serhiy-storchaka
    Copy link
    Member Author

    In unicodeobject.c and stringlib aligned addresses and sizes are used for optimization. pointer->integer and implicit integer->integer conversions may overflow or underflow on platforms with sizeof(size_t) != sizeof(void *) or sizeof(size_t) != sizeof(int). The proposed patch fixes such unsafe things in unicodeobject.c, stringlib and some other files.

    There are still a few unsafe places in libffi, but in this library Py_uintptr_t nor uintptr_t are not available.

    @serhiy-storchaka serhiy-storchaka added type-security A security issue interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jun 22, 2012
    @mdickinson mdickinson self-assigned this Jun 23, 2012
    @mdickinson
    Copy link
    Member

    If we're worrying about undefined behaviour, it looks like recent optimizations have *introduced* new undefined behaviour in the form of strict aliasing violations. E.g., from ascii_decode:

    unsigned long value = *(const unsigned long *) _p;
    

    (here _p has type const char *). This should really be fixed; compilers are known to make optimizations based on the assumption that this sort of undefined behaviour doesn't occur.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 7, 2012

    This should really be fixed; compilers are known to make optimizations
    based on the assumption that this sort of undefined behaviour doesn't
    occur.

    Doesn't the compiler have all the necessary information here? It's not like a subroutine is called.
    (also, I'm not sure what optimization it could actually make)

    @mdickinson
    Copy link
    Member

    (also, I'm not sure what optimization it could actually make)

    Me neither, but it doesn't seem safe to assume that no compiler will take advantage of this. I don't want to start guessing what compilers might or might not do; it would be much better simply to stick to valid C where possible.

    @mdickinson
    Copy link
    Member

    N.B. This could probably be fixed without affecting performance by using the usual union trick. (IIUC, that trick was technically still undefined behaviour for a while, but was eventually made legal by C99 + TC3.) As far as I know there aren't any instances of compilers causing problems with that construct.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 7, 2012

    N.B. This could probably be fixed without affecting performance by
    using the usual union trick.

    How would it work? We would have to add various unions to the
    PyUnicode_Object definition?

    @mdickinson
    Copy link
    Member

    How would it work? We would have to add various unions to the
    PyUnicode_Object definition?

    No, you'd just need a temporary union defined in unicodeobject.c that would look something like:

    typedef union { unsigned long v; char s[SIZEOF_LONG]; } U;

    (with better choices of names). Python/dtoa.c does a similar thing to read / write the pieces of a C double using integers safely.

    @mdickinson
    Copy link
    Member

    I'll see if I can come up with a patch, and open a new issue for it (since I've successfully derailed this issue from its original topic)

    @serhiy-storchaka
    Copy link
    Member Author

    If we're worrying about undefined behaviour, it looks like recent optimizations have *introduced* new undefined behaviour in the form of strict aliasing violations. E.g., from ascii_decode:

    unsigned long value = \*(const unsigned long \*) \_p;
    

    (here _p has type const char *).

    I don't see what the undefined behavior. Can you specify exactly the
    item of the Standard, in which it is mentioned?

    This should really be fixed; compilers are known to make optimizations based on the assumption that this sort of undefined behaviour doesn't occur.

    I don't know how else you can rewrite it, without destroying completely
    the effect of optimization.

    In any case, I don't think that the original patch introduces some new
    undefined behavior.

    @mdickinson
    Copy link
    Member

    I don't see what the undefined behavior. Can you specify exactly the
    item of the Standard, in which it is mentioned?

    It's C99 section 6.5, paragraph 7: "An object shall have its stored value accessed only by an lvalue expression that has one of the following types ...". It's the dereferencing of the pointer that's the problem: that's accessing a stored value of type 'char' by an lvalue expression that has type 'unsigned long'.

    In any case, I don't think that the original patch introduces some new
    undefined behavior.

    Ah; were the strict aliasing problems already there before the patch? I didn't check.

    @serhiy-storchaka
    Copy link
    Member Author

     Ah; were the strict aliasing problems already there before the patch? I didn't check.

    Please open separate issue for this.

    @serhiy-storchaka
    Copy link
    Member Author

    Please, Mark or some other C expert, can you do a code review for the patch?

    @pitrou
    Copy link
    Member

    pitrou commented Sep 17, 2012

    Perhaps the three new macros should be made available in a .h file?
    (in which case they should be private, i.e. start with _).

    @pitrou pitrou added type-bug An unexpected behavior, bug, or error and removed type-security A security issue labels Sep 17, 2012
    @serhiy-storchaka
    Copy link
    Member Author

    Perhaps the three new macros should be made available in a .h file?
    (in which case they should be private, i.e. start with _).

    Perhaps. There are similar macros in other files (Include/objimpl.h,
    Objects/obmalloc.c, Python/pyarena.c, Modules/expat/xmlparse.c).

    @vstinner
    Copy link
    Member

    Perhaps the three new macros should be made available in a .h file?

    Good idea. Maybe pymacros.h? These macros need to be documented (with a comment in the .h file)

    @serhiy-storchaka
    Copy link
    Member Author

    Well, here is a new patch. The five new macros moved to pymacros.h and used in more files.

    @mdickinson
    Copy link
    Member

    Apologies; I got distracted from the main point of this issue with the strict aliasing stuff, and then it fell off the to-do list. Unassigning; Antoine or Victor, do you want to take this?

    @mdickinson mdickinson removed their assignment Sep 20, 2012
    @serhiy-storchaka
    Copy link
    Member Author

    Mark, please open a new discussion, so we don't lose it and that was the place for discussion.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 20, 2012

    Thanks for the patch! These macros will be useful.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 20, 2012

    New changeset 99112b851b25 by Antoine Pitrou in branch 'default':
    Issue bpo-15144: Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t.
    http://hg.python.org/cpython/rev/99112b851b25

    @pitrou
    Copy link
    Member

    pitrou commented Sep 20, 2012

    Committed in 3.3(.1).

    @pitrou pitrou closed this as completed Sep 20, 2012
    @mdickinson
    Copy link
    Member

    Mark, please open a new discussion.

    Done: bpo-15992.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants