Add SHA-3 and SHAKE (Keccak) support #60317

Closed
tiran opened this issue Oct 3, 2012 · 80 comments
Labels
extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@tiran
Member

tiran commented Oct 3, 2012

BPO 16113
Nosy @tim-one, @loewis, @rhettinger, @gpshead, @jcea, @pitrou, @vstinner, @larryhastings, @tiran, @ezio-melotti, @asvetlov, @mgorny, @dstufft
Files
  • 521e85a613bf.diff
  • remove_sha3.patch
  • SHA3-and-SHAKE-support-for-Python.patch
  • SHA3-and-SHAKE-support-for-Python-2.patch
  • SHA3-and-SHAKE-support-for-Python-3.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2016-09-08.09:51:00.910>
    created_at = <Date 2012-10-03.03:10:04.668>
    labels = ['extension-modules', 'type-feature']
    title = 'Add SHA-3 and SHAKE (Keccak) support'
    updated_at = <Date 2017-02-28.11:18:50.985>
    user = 'https://github.com/tiran'

    bugs.python.org fields:

    activity = <Date 2017-02-28.11:18:50.985>
    actor = 'mgorny'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-09-08.09:51:00.910>
    closer = 'christian.heimes'
    components = ['Extension Modules']
    creation = <Date 2012-10-03.03:10:04.668>
    creator = 'christian.heimes'
    dependencies = []
    files = ['27441', '33298', '42764', '43107', '44176']
    hgrepos = []
    issue_num = 16113
    keywords = ['patch']
    message_count = 80.0
    messages = ['171848', '171882', '171898', '171913', '171929', '171963', '171964', '171968', '171971', '171983', '171995', '172070', '172100', '172144', '172152', '172157', '172158', '172313', '172314', '172316', '172319', '172324', '183129', '190303', '191931', '191940', '191971', '201078', '201079', '201080', '201081', '201082', '201083', '201084', '201085', '201086', '201092', '201096', '201928', '207169', '207170', '207171', '207184', '207187', '207188', '207189', '207190', '207191', '207192', '207225', '207226', '207228', '207229', '231838', '253023', '253025', '253028', '253029', '253174', '264029', '265033', '265059', '265066', '265088', '265125', '266911', '266974', '273252', '273328', '273330', '273331', '273363', '274786', '274789', '274790', '274791', '274793', '274797', '275000', '288706']
    nosy_count = 21.0
    nosy_names = ['tim.peters', 'loewis', 'rhettinger', 'gregory.p.smith', 'jcea', 'pitrou', 'vstinner', 'larry', 'christian.heimes', 'habnabit', 'ezio.melotti', 'spatz', 'Arfrever', 'asvetlov', 'mgorny', 'python-dev', 'sbt', 'bjornedstrom', 'dstufft', 'markk', 'haakon']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16113'
    versions = ['Python 3.6']

    @tiran
    Member Author

    tiran commented Oct 3, 2012

    Today the latest crypto hash function was announced by NIST [1]. I suggest that we include the new hash algorithm in 3.4 once it lands in OpenSSL.

    The Keccak site also has a reference implementation in C and assembly [2]. Integrating it may take some effort, as it contains several optimized backends for x86, x86_64, SIMD and various ARM platforms.

    [1] http://www.nist.gov/itl/csd/sha-100212.cfm
    [2] http://keccak.noekeon.org/

    @tiran tiran added extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Oct 3, 2012
    @jcea
    Member

    jcea commented Oct 3, 2012

    We have MD5, SHA-1, SHA-256 and SHA-512 implemented for use when OpenSSL is not available. Can we do the same with SHA-3? I would suggest adopting the reference implementation without extensive optimizations, since we will get those when OpenSSL has them.

    So we could implement SHA-3 now and integrate the OpenSSL implementation later, when it is available. This matters because, for instance, many users of Python 3.4 will have an out-of-date system OpenSSL library.

    @tiran
    Member Author

    tiran commented Oct 3, 2012

    I've done some experiments with the reference implementation and adopted code of sha1module.c for sha3: https://bitbucket.org/tiran/pykeccak

    So far the code just compiles (64bit only) but doesn't work properly yet. I may need to move away from the NIST interface and use the sponge interface directly.

    @tiran tiran self-assigned this Oct 3, 2012
    @bjornedstrom
    Mannequin

    bjornedstrom mannequin commented Oct 3, 2012

    For what it's worth, I've built a working C-based sha3-module that is available here: https://github.com/bjornedstrom/python-sha3

    Note that I've only tested this on Python 2, for Python 3 YMMV.

    Best regards
    Björn

    @tiran
    Member Author

    tiran commented Oct 4, 2012

    Hello Björn,

    thanks for the information. Your package didn't turn up on Google when I started with my experiment. Perhaps it's too new?

    Your code and mine have lots of similarities. I was amused when I saw that you had the same issue with the block size attribute. At first I set it to 200 (1600 / 8) but eventually I didn't implement it.

    My code does everything in C with a separate constructor for each flavor of SHA-3. It's compatible with Python 2.6 through 3.4 and uses the optimized code for 32-bit and 64-bit platforms.

    Oh, and my code is now working properly. Feel free to review the module. I'll upload the test code later.

    @tiran
    Member Author

    tiran commented Oct 4, 2012

    Release 0.1 of pysha3 [1] is out. I've tweaked the C module to make it compatible with Python 2.6 through 3.4. The module and its tests run successfully under Linux and Windows. So far I've tested Linux X86_64 (2.7, 3.2, 3.3, 3.4), Windows X86 (2.6, 2.7, 3.2, 3.3) and Windows X86_64 (2.6, 2.7, 3.2, 3.3).

    Please review Modules/sha3module.c and ignore all version specific #if blocks. For Python 3.4 I'm going to remove all blocks for Python < 3.3.

    [1] http://pypi.python.org/pypi/pysha3/0.1

    @pitrou
    Member

    pitrou commented Oct 4, 2012

    Please review Modules/sha3module.c

    Can't you post a patch here?

    @tiran
    Member Author

    tiran commented Oct 4, 2012

    How about a sandbox repo?

    @pitrou
    Member

    pitrou commented Oct 4, 2012

    Good, you can click the "create patch" button when it's ready :)

    @tiran
    Member Author

    tiran commented Oct 4, 2012

    Antoine pointed out that the code contains C++ comments and exports a lot of functions. The latest patch has all // comments replaced, marks all functions and globals as static and #includes the C files directly.

    @tiran
    Member Author

    tiran commented Oct 4, 2012

    Please review the latest patch.

    I've included Gregory as he is the creator of hashlib.

    @tiran
    Member Author

    tiran commented Oct 5, 2012

    The highlights of the next patch are:

    • release the GIL
    • more test vectors
    • remove bgr_endian.h
    • move typedef UINT64 to sha3module
    • declare more globals as static
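Releasing the GIL matters mainly when several threads hash large buffers at the same time. The following is a minimal sketch of that usage, assuming a Python 3.6+ `hashlib` that provides `sha3_256`; the helper names `sha3_hex` and `digest_all` and the buffer sizes are made up for illustration, and any actual speedup depends on the build and the hardware.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha3_hex(data: bytes) -> str:
    """Hash one buffer with SHA3-256 and return the hex digest."""
    return hashlib.sha3_256(data).hexdigest()


def digest_all(buffers, workers=4):
    """Hash every buffer in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha3_hex, buffers))


# Hypothetical workload: four 1 MiB buffers with different contents.
buffers = [bytes([i]) * (1 << 20) for i in range(4)]
threaded = digest_all(buffers)
sequential = [sha3_hex(b) for b in buffers]
```

The threaded and sequential results should be identical; only the wall-clock time can differ when the hash update runs without holding the GIL.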

    @tiran
    Member Author

    tiran commented Oct 5, 2012

    I've documented the optimization options of Keccak. The block also contains a summary of my modifications to the reference code.

    http://hg.python.org/sandbox/cheimes/file/57948df78dbd/Modules/_sha3/sha3module.c#l22

    @tiran
    Member Author

    tiran commented Oct 5, 2012

    New patch. I've removed the dependency on uint64 types. On platforms without a uint64 type the module is using the 32bit implementation with interleave tables.

    By the way, the SSE / SIMD instructions aren't useful. They are two to four times slower.

    @gpshead
    Member

    gpshead commented Oct 5, 2012

    don't worry about optimization settings in python itself for now. the canonical optimized version will be in a future openssl version. now that it has been declared the standard it will get a *lot* more attention in the next few years.

    as it is, we _may_ want to replace this reference implementation with one from libtomcrypt in the future when it gets around to implementing it just so that the code for all of our bundled hash functions comes from the same place.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 6, 2012

    New changeset 11c9a894680e by Christian Heimes in branch 'default':
    Issue bpo-16113: integrade SHA-3 (Keccak) patch from http://hg.python.org/sandbox/cheimes
    http://hg.python.org/cpython/rev/11c9a894680e

    @tiran
    Member Author

    tiran commented Oct 6, 2012

    The code has landed in default. Let's see how the build bots like my patch and the reference implementation.

    @sbt
    Mannequin

    sbt mannequin commented Oct 7, 2012

    _sha3 is not being built on Windows, so importing hashlib fails

    >>> import hashlib
    ERROR:root:code for hash sha3_224 was not found.
    Traceback (most recent call last):
      File "C:\Repos\cpython-dirty\lib\hashlib.py", line 109, in __get_openssl_constructor
        f = getattr(_hashlib, 'openssl_' + name)
    AttributeError: 'module' object has no attribute 'openssl_sha3_224'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:\Repos\cpython-dirty\lib\hashlib.py", line 154, in <module>
        globals()[__func_name] = __get_hash(__func_name)
      File "C:\Repos\cpython-dirty\lib\hashlib.py", line 116, in __get_openssl_constructor
        return __get_builtin_constructor(name)
      File "C:\Repos\cpython-dirty\lib\hashlib.py", line 104, in __get_builtin_constructor
        raise ValueError('unsupported hash type ' + name)
    ValueError: unsupported hash type sha3_224
    ...
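A quick way to check from the interpreter whether a given build actually provides the SHA-3 constructors is sketched below. It assumes only the documented `hashlib.algorithms_available` set and `hashlib.new()`; the helper name `has_algorithm` is hypothetical and not part of hashlib.

```python
import hashlib


def has_algorithm(name: str) -> bool:
    """Return True if hashlib can construct the named algorithm."""
    if name not in hashlib.algorithms_available:
        return False
    try:
        hashlib.new(name)
    except ValueError:
        # Listed by the backend but not usable in this build.
        return False
    return True


sha3_names = ["sha3_224", "sha3_256", "sha3_384", "sha3_512"]
status = {name: has_algorithm(name) for name in sha3_names}
print(status)
```

On a build where the `_sha3` extension is missing and OpenSSL offers no SHA-3, the dictionary would be expected to contain `False` entries rather than raising.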

    @tiran
    Member Author

    tiran commented Oct 7, 2012

    I've pushed a fix about 5 minutes ago. The module wasn't compiled in debug builds due to an error in the project file. Please update your copy and try again.

    @tiran
    Member Author

    tiran commented Oct 7, 2012

    6cf6b8265e57 and 8172cc8bfa6d have fixed the issue on my VM. I didn't notice the issue as I only tested hashlib with the release builds, not the debug builds. Sorry for that.

    @sbt
    Mannequin

    sbt mannequin commented Oct 7, 2012

    6cf6b8265e57 and 8172cc8bfa6d have fixed the issue on my VM. I didn't
    notice the issue as I only tested hashlib with the release builds, not
    the debug builds. Sorry for that.

    Ah. I did not even notice there was _sha3.vcxproj.

    Is there any particular reason for not making it part of python3.dll like _sha1, _sha256, _sha512 are? (I thought it was only modules with special link requirements that became separate DLLs.)

    @tiran
    Member Author

    tiran commented Oct 7, 2012

    The module is rather large (about 190 KB) because the optimized SHA-3 implementation isn't optimized for size. For this reason I would like to keep the module out of the main binary for now.

    @englabenny
    Mannequin

    englabenny mannequin commented Feb 27, 2013

    Please do not go forward until NIST publishes its SHA-3 specification document. We don't know yet which parameters they will finally choose when turning Keccak into SHA-3.

    @englabenny
    Mannequin

    englabenny mannequin commented May 29, 2013

    NIST has published a tentative schedule for SHA-3 standardization. They expect to publish in the second quarter of 2014.

    See http://csrc.nist.gov/groups/ST/hash/sha-3/timeline_fips.html

    and http://csrc.nist.gov/groups/ST/hash/sha-3/sha-3_standardization.html

    @habnabit
    Mannequin

    habnabit mannequin commented Jun 27, 2013

    As long as the reference Keccak code is going to live in the python stdlib anyway, I would /greatly/ appreciate it if the Keccak sponge function was directly exposed instead of just the fixed parameters used for SHA-3.

    A Keccak sponge can have a much wider range of rates/capacities, and after absorption can have any number of bytes squeezed out. The ability to get an unbounded number of bytes out is very useful and I've written some code that uses that behavior. I ended up having to write my own Keccak python library since none of the other SHA-3 libraries exposed this either.

    @tiran
    Member Author

    tiran commented Jun 27, 2013

    Hi Aaron,

    it's a tempting idea but I have to decline. The API is deliberately limited to the NIST interface. Once OpenSSL gains SHA-3 support we are going to use it in favor of the reference implementation. I don't expect OpenSSL to provide the full sponge API.

    I would also like to keep all options open so I can switch to a different and perhaps smaller implementation in the future. The reference implementation is huge and the binary is more than 400 KB. For comparison, the SHA-2 384 + 512 module's binary is just about 60 KB on a 64-bit Linux system.

    Once a new API has been introduced, it takes at least two minor Python releases and about four to five years to remove it.

    But I could add a more flexible interface to Keccak's sponge to my standalone sha3 module https://pypi.python.org/pypi/pysha3 ...

    @habnabit
    Mannequin

    habnabit mannequin commented Jun 27, 2013

    https://pypi.python.org/pypi/cykeccak/ is what I've written to do this, for reference.

    Honestly I hope that the Keccak sponge is directly exposed in openssl (or any other SHA-3 implementation) because of its utility beyond SHA-3. If the source of some other implementation is going to be bundled with python anyway, it shouldn't be difficult to expose the sponge bits.

    @larryhastings
    Contributor

    I should clarify, I don't speak for 2.7. The rules there are a little different and it's up to Benjamin to decide. But please don't add new features to 3.4 and 3.5.

    @Arfrever Arfrever mannequin changed the title SHA-3 (Keccak) support may need to be removed before 3.4 Add SHA-3 (Keccak) support Oct 15, 2015
    @Arfrever Arfrever mannequin reopened this Oct 15, 2015
    @Arfrever Arfrever mannequin removed the release-blocker label Oct 15, 2015
    @bjornedstrom
    Mannequin

    bjornedstrom mannequin commented Oct 19, 2015

    Remember that FIPS 202 slightly changed some parts of the Keccak submission that won the competition, so test results are different. I updated my standalone SHA3 module, for anyone who is interested in using this now in Python 2 and 3.
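The difference is visible in the test vectors: FIPS 202 appends two domain-separation bits to the message before padding, so SHA3-256 and the original Keccak-256 give different digests for the same input. A minimal sketch, assuming a Python 3.6+ `hashlib`; the constant names are made up for illustration, and the Keccak-256 value is listed only for contrast because hashlib does not expose that variant.

```python
import hashlib

# SHA3-256 of the empty message as standardized in FIPS 202.
FIPS202_SHA3_256_EMPTY = (
    "a7ffc6f8bf1ed76651c14756a061d662"
    "f580ff4de43b49fa82d80a4b80f8434a"
)

# Keccak-256 of the empty message with the original competition padding.
# Not computable through hashlib; shown only for comparison.
KECCAK_256_EMPTY = (
    "c5d2460186f7233c927e7db2dcc703c0"
    "e500b653ca82273b7bfad8045d85a470"
)

digest = hashlib.sha3_256(b"").hexdigest()
matches_fips202 = digest == FIPS202_SHA3_256_EMPTY
print(digest, matches_fips202)
```

A module written against the pre-standard Keccak would produce the second value, which is why older third-party packages needed updating.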

    https://github.com/bjornedstrom/python-sha3

    @tiran
    Member Author

    tiran commented Apr 22, 2016

    The authors of Keccak have released a new version of the Keccak Code Package, http://keccak.noekeon.org/reorganized_code.html . The new package makes it much easier to integrate Keccak in Python. I'm working on a new patch with SHA3 and SHAKE support.

    @tiran tiran changed the title Add SHA-3 (Keccak) support Add SHA-3 and SHAKE (Keccak) support Apr 22, 2016
    @tiran
    Member Author

    tiran commented May 6, 2016

    This patch implements SHA-3 and SHAKE for Python 3.6. The algorithm is provided by a slightly modified copy of the Keccak Code Package. I had to replace C++ comments and perform some minor cleanups.

    @pitrou
    Member

    pitrou commented May 7, 2016

    Is there any guidance or recommendation on how to use the SHAKE variants?

    @larryhastings
    Contributor

    Christian: any interest in proposing this for 2.7? We could ask Benjamin. It could still make 2.7.11; rc1 should be tagged in about a month.

    @gpshead
    Member

    gpshead commented May 7, 2016

    Is there any good reason 2.7 needs this? They are available via PyPI as
    extensions. (Read: I vote no)


    @tiran
    Member Author

    tiran commented May 8, 2016

    Larry,
    I'm with Gregory. There is no good reason to add SHA3 to Python 2.7. The SHA-2 family is still safe. Besides, I'd rather add BLAKE2 to Python 2.7. It's much faster and more versatile than SHA3.

    Antoine,
    SHAKEs are XOFs (extendable-output functions). NIST has standardized the XOFs but has not yet approved them as replacements for other constructs. They are useful for signatures or as a simple stream cipher. The SHAKEs were low-hanging fruit to implement, so I included them.
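As a rough illustration of what "extendable output" means in practice, the sketch below uses the `shake_128` and `shake_256` constructors as they appear in Python 3.6+ `hashlib`, where `digest()` and `hexdigest()` take the output length in bytes. The input message is made up, and this is not a recommendation for any particular protocol.

```python
import hashlib

message = b"example input"  # hypothetical data

# A SHAKE digest takes the output length in bytes as an argument.
short = hashlib.shake_128(message).digest(16)
longer = hashlib.shake_128(message).digest(64)

# Both outputs come from the same output stream, so the short one
# is a prefix of the longer one.
is_prefix = longer.startswith(short)

# hexdigest() takes a byte count too; the hex string is twice as long.
hex_out = hashlib.shake_256(message).hexdigest(32)
print(len(short), len(longer), is_prefix, len(hex_out))
```

Because shorter outputs are prefixes of longer ones, callers that need independent values for different purposes should vary the input rather than the requested length.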

    @tiran
    Member Author

    tiran commented Jun 2, 2016

    New patch:

    • I moved the test vectors out of the repo. They are currently hosted on GitHub. I'll move them to the pythontest infra later.

    @gpshead
    Member

    gpshead commented Jun 2, 2016

    comments added to the code review.

    @tiran tiran removed their assignment Jun 12, 2016
    @tiran
    Member Author

    tiran commented Aug 20, 2016

    Patch 3 addresses GPS' code review.

    @rhettinger
    Contributor

    The SHAKEs were low-hanging fruit to implement, so I included them.

    I don't think this is sufficient motivation. Each new API is a permanent maintenance and documentation burden. It is also a burden to every new user seeing the module and trying to decide which offering to use. We should provide tools that we know people need and err on the side of economy. I asked a room full of network engineers about SHAKE and not a single one of them had heard of it, so I think it would be premature to add it to the standard library.

    @dstufft
    Member

    dstufft commented Aug 22, 2016

    I asked a room full of network engineers about SHAKE and not a single one of them had heard of it

    Why would a network engineer know about a new variable length hashing algorithm? It's not really within their problem domain.

    @habnabit
    Mannequin

    habnabit mannequin commented Aug 22, 2016

    I'm not sure why one would pick and choose here—SHAKE is part of the NIST
    SHA-3 standard.

    @tiran
    Member Author

    tiran commented Aug 22, 2016

    The maintenance burden is minimal. All six algorithms are just variants of the same KeccakP-1600 sponge construction with different initialization parameters for rate, capacity, delimiter and output size. The SHAKEs have no default output length and use a different delimiter than the SHA3s. https://github.com/gvanas/KeccakCodePackage/blob/master/Modes/KeccakHash.h#L34
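The shared structure can be seen from Python by deriving the sponge parameters of each variant. This is a sketch under the assumption that `block_size` reports the sponge rate in bytes, as it does for the SHA-3 objects in Python 3.6+ `hashlib`; the names `STATE_BITS` and `params` are made up for illustration, and the delimiter is not exposed through hashlib at all.

```python
import hashlib

STATE_BITS = 1600  # width of the Keccak-p[1600] permutation state

names = ["sha3_224", "sha3_256", "sha3_384", "sha3_512",
         "shake_128", "shake_256"]

params = {}
for name in names:
    h = getattr(hashlib, name)()
    rate_bits = h.block_size * 8  # assumed: block_size is the rate in bytes
    params[name] = {
        "rate_bits": rate_bits,
        "capacity_bits": STATE_BITS - rate_bits,
        "digest_bytes": h.digest_size,
    }

for name, p in params.items():
    print(f"{name:10} rate={p['rate_bits']:5} "
          f"capacity={p['capacity_bits']:4} "
          f"digest_bytes={p['digest_bytes']}")
```

For the four fixed-length SHA3 variants the capacity comes out as twice the digest length in bits, which is the only parameter that really distinguishes them.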

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 7, 2016

    New changeset f8700ee4aef0 by Christian Heimes in branch 'default':
    Issue bpo-16113: Add SHA-3 and SHAKE support to hashlib module.
    https://hg.python.org/cpython/rev/f8700ee4aef0

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 7, 2016

    New changeset 4971ca2960c7 by Christian Heimes in branch 'default':
    Issue bpo-16113: KeccakP-1600-opt64 does not support big endian platforms yet.
    https://hg.python.org/cpython/rev/4971ca2960c7

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 7, 2016

    New changeset e8884dcace9f by Christian Heimes in branch 'default':
    Issue bpo-16113: compile the module on Windows, too.
    https://hg.python.org/cpython/rev/e8884dcace9f

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 7, 2016

    New changeset 68df416e94ba by Christian Heimes in branch 'default':
    Issue bpo-16113: take 2 on big endian machines.
    https://hg.python.org/cpython/rev/68df416e94ba

    @tiran
    Member Author

    tiran commented Sep 7, 2016

    A buildbot is complaining about strict aliasing:

    In file included from /buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/Modules/_sha3/sha3module.c:113:0:
    /buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/Modules/_sha3/kcp/KeccakP-1600-inplace32BI.c: In function ‘_PySHA3_KeccakP1600_SetBytesInLaneToZero’:
    /buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/Modules/_sha3/kcp/KeccakP-1600-inplace32BI.c:97:5: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    low = *((UINT32*)(laneAsBytes+0));
    ^
    In file included from /buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/Modules/_sha3/sha3module.c:113:0:
    /buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/Modules/_sha3/kcp/KeccakP-1600-inplace32BI.c: In function ‘_PySHA3_KeccakP1600_AddBytesInLane’:
    /buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/Modules/_sha3/kcp/KeccakP-1600-inplace32BI.c:152:5: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    low = *((UINT32*)(laneAsBytes+0));
    ^
    /buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/Modules/_sha3/kcp/KeccakP-1600-inplace32BI.c: In function ‘_PySHA3_KeccakP1600_ExtractBytesInLane’:
    /buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/Modules/_sha3/kcp/KeccakP-1600-inplace32BI.c:294:5: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    *((UINT32*)(laneAsBytes+0)) = low;
    ^
    /buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/Modules/_sha3/kcp/KeccakP-1600-inplace32BI.c: In function ‘_PySHA3_KeccakP1600_ExtractAndAddBytesInLane’:
    /buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/Modules/_sha3/kcp/KeccakP-1600-inplace32BI.c:367:5: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    *((UINT32*)(laneAsBytes+0)) = low;

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 7, 2016

    New changeset ddc95a9bc2e0 by Christian Heimes in branch 'default':
    Issue bpo-16113: one more C90 violation in big endian code.
    https://hg.python.org/cpython/rev/ddc95a9bc2e0

    @tiran tiran closed this as completed Sep 8, 2016
    @python-dev
    Mannequin

    python-dev mannequin commented Sep 8, 2016

    New changeset e5871ffe9ac0 by Christian Heimes in branch 'default':
    Issue bpo-16113: SHA3: allocate extra memory for lane extraction and check return value of PyModule_Create()
    https://hg.python.org/cpython/rev/e5871ffe9ac0

    @mgorny
    Mannequin

    mgorny mannequin commented Feb 28, 2017

    Christian, since the code is now integrated in Python 3.6+ (with some bugfixes AFAICS), could you consider updating your bitbucket package to match it? It would be helpful as a backport package for older Python versions.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022