Commits on Oct 27, 2012
  1. crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher

    authored
    
    TODO: benchmarks on real hardware.
    
    Performance tests using Intel Architecture Code Analyzer Version - 2.0.1:
    
    Estimates assume that the latencies and throughput of 256-bit AVX2 instructions
    are the same as those of 128-bit AVX instructions on Ivy Bridge.
    
    serpent-16way, cycles/byte: 4.84
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
  2. crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher

    authored
    
    TODO: benchmarks on real hardware.
    
    Performance tests using Intel Architecture Code Analyzer Version - 2.0.1:
    
    Estimates assume that the latencies and throughput of 256-bit AVX2 instructions
    are the same as those of 128-bit AVX instructions, except for vpgatherdd, which is
    estimated as follows:
     *1: Dispatches 8 load uops, which can be dispatched to two load ports, and 2
         helper uops for merging loads. This is also nearly the same as estimating
         the case where there is only one load port with gather hardware with a
         4-cycle latency.
     *2: Dedicated gather hardware on two load ports, 1 uop with a latency of 8
         cycles (can fetch in parallel on two ports) + 2 helper uops.
     *3: Dedicated gather hardware on two load ports, 1 uop with a latency of 4
         cycles (can fetch in parallel on two ports) + 2 helper uops.
    
    twofish-16way, cycles/byte:
     *1:    5.40
     *2:    5.80
     *3:    3.87
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
  3. crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher

    authored
    TODO: test performance on real hardware.
    
    Performance tests using Intel Architecture Code Analyzer Version - 2.0.1:
    
    Estimates assume that the latencies and throughput of 256-bit AVX2 instructions
    are the same as those of 128-bit AVX instructions, except for vpgatherdd, which is
    estimated as follows:
     *1: Dispatches 8 load uops, which can be dispatched to two load ports, and 2
         helper uops for merging loads. This is also nearly the same as estimating
         the case where there is only one load port with gather hardware with a
         4-cycle latency.
     *2: Dedicated gather hardware on two load ports, 1 uop with a latency of 8
         cycles (can fetch in parallel on two ports) + 2 helper uops.
     *3: Dedicated gather hardware on two load ports, 1 uop with a latency of 4
         cycles (can fetch in parallel on two ports) + 2 helper uops.
    
    blowfish-32way, cycles/byte:
     *1:    4.22
     *2:    3.76
     *3:    2.62
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
  4. crypto: tcrypt - randomize speed test memory

    authored
    Running speed tests on uniform memory can give better performance results
    than running them on random or real-world data. Therefore, randomize the
    memory used by the speed tests.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
  5. crypto: tcrypt - add async cipher speed tests for blowfish and camellia

    authored
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Commits on Oct 26, 2012
  1. crypto: cast5/cast6 - move lookup tables to shared module

    authored
    CAST5 and CAST6 both use the same lookup tables, which can be moved to the
    shared module 'cast_common'.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
  2. [v2] crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher

    authored
    
    This patch adds an AES-NI/AVX/x86_64 assembler implementation of the Camellia
    block cipher. The implementation processes data in sixteen-block chunks, which
    are byte-sliced, and AES SubBytes is reused for the Camellia s-box with the
    help of pre- and post-filtering.
    
    Patch has been tested with tcrypt and automated filesystem tests.
    
    tcrypt test results:
    
    Intel Core i5-2450M:
    
    camellia-aesni-avx vs camellia-asm-x86_64-2way:
    128bit key:                                             (lrw:256bit)    (xts:256bit)
    size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
    16B     0.98x   0.96x   0.99x   0.96x   0.96x   0.95x   0.95x   0.94x   0.97x   0.98x
    64B     0.99x   0.98x   1.00x   0.98x   0.98x   0.99x   0.98x   0.93x   0.99x   0.98x
    256B    2.28x   2.28x   1.01x   2.29x   2.25x   2.24x   1.96x   1.97x   1.91x   1.90x
    1024B   2.57x   2.56x   1.00x   2.57x   2.51x   2.53x   2.19x   2.17x   2.19x   2.22x
    8192B   2.49x   2.49x   1.00x   2.53x   2.48x   2.49x   2.17x   2.17x   2.22x   2.22x
    
    256bit key:                                             (lrw:384bit)    (xts:512bit)
    size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
    16B     0.97x   0.98x   0.99x   0.97x   0.97x   0.96x   0.97x   0.98x   0.98x   0.99x
    64B     1.00x   1.00x   1.01x   0.99x   0.98x   0.99x   0.99x   0.99x   0.99x   0.99x
    256B    2.37x   2.37x   1.01x   2.39x   2.35x   2.33x   2.10x   2.11x   1.99x   2.02x
    1024B   2.58x   2.60x   1.00x   2.58x   2.56x   2.56x   2.28x   2.29x   2.28x   2.29x
    8192B   2.50x   2.52x   1.00x   2.56x   2.51x   2.51x   2.24x   2.25x   2.26x   2.29x
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
  3. [v2] crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file

    authored
    
    Prepare camellia-x86_64 functions to be reused from AVX/AESNI implementation
    module.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Commits on Oct 24, 2012
  1. [v2] crypto: tcrypt - add async speed test for camellia cipher

    authored
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
  2. [v2] crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption

    authored
    
    cryptd_queue_worker attempts to prevent simultaneous accesses to the crypto
    workqueue by cryptd_enqueue_request using preempt_disable/preempt_enable.
    However, cryptd_enqueue_request might be called from softirq context,
    so add local_bh_disable/local_bh_enable to prevent data corruption and
    panics.
    
    Bug report at http://marc.info/?l=linux-crypto-vger&m=134858649616319&w=2
    
    v2:
     - Disable software interrupts instead of hardware interrupts
    
    Cc: stable@vger.kernel.org
    Reported-by: Gurucharan Shetty <gurucharan.shetty@gmail.com>
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
  3. crypto: aesni - fix XTS mode on x86-32, add wrapper function for asmlinkage aesni_enc()

    authored
    
    The calling convention for internal functions and 'asmlinkage' functions is
    different on x86-32. Therefore, do not cast aesni_enc directly to an XTS tweak
    function; use a wrapper function in between instead. This fixes a crash with
    the "XTS + aesni_intel + x86-32" combination.
    
    Cc: stable@vger.kernel.org
    Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
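
The wrapper approach described in the aesni commit above can be illustrated with a minimal userspace sketch. Everything here is an assumption made for illustration: `tweak_fn_t`, `asm_enc_standin` and `xts_tweak_wrapper` are hypothetical names, and the stand-in routine is a toy XOR, not AES. In userspace both functions share one calling convention, so the sketch only shows the shape of the fix: the callback pointer is taken from an ordinary C function that forwards to the special-convention routine, rather than from a cast of that routine.

```c
#include <stdint.h>

/* Hypothetical callback type standing in for the XTS tweak function. */
typedef void (*tweak_fn_t)(void *ctx, uint8_t *dst, const uint8_t *src);

/* Stand-in for an assembler routine that, in the kernel, would be declared
 * 'asmlinkage'. It XORs a 16-byte block with the 16-byte "key" in ctx. */
static void asm_enc_standin(void *ctx, uint8_t *dst, const uint8_t *src)
{
    const uint8_t *key = ctx;

    for (int i = 0; i < 16; i++)
        dst[i] = src[i] ^ key[i];
}

/* Ordinary C function with the callback's calling convention. The compiler
 * generates the correct call sequence for the routine it forwards to. */
static void xts_tweak_wrapper(void *ctx, uint8_t *dst, const uint8_t *src)
{
    asm_enc_standin(ctx, dst, src);
}

/* Hand out the wrapper, never a cast of the assembler routine itself. */
static tweak_fn_t get_tweak_fn(void)
{
    return xts_tweak_wrapper;
}
```

The cost is one extra call per tweak computation, which is small next to the block encryption itself.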
  4. crypto - ablk_helper: add module parameter to allow testing cryptd redirection of requests

    authored
    
    Add a module parameter to allow forcing redirection of crypto requests to
    cryptd worker threads. This makes these code paths easier to test.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
  5. @herbertx

    crypto: tegra - fix missing unlock on error case

    Wei Yongjun authored herbertx committed
    Add the missing unlock on the error handling path in function
    tegra_aes_get_random() and tegra_aes_rng_reset().
    
    Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  6. @herbertx

    crypto: cast5/avx - avoid using temporary stack buffers

    authored herbertx committed
    Introduce new assembler functions to avoid the use of temporary stack buffers
    in glue code. This also allows use of vector instructions for xoring output
    in CTR and CBC modes and construction of IVs for CTR mode.
    
    ECB mode sees a ~0.5% decrease in speed because of one extra function
    call. CBC mode decryption and CTR mode benefit from vector operations
    and gain ~5%.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  7. @herbertx

    crypto: serpent/avx - avoid using temporary stack buffers

    authored herbertx committed
    Introduce new assembler functions to avoid the use of temporary stack buffers
    in glue code. This also allows use of vector instructions for xoring output
    in CTR and CBC modes and construction of IVs for CTR mode.
    
    ECB mode sees a ~0.5% decrease in speed because of one extra function
    call. CBC mode decryption and CTR mode benefit from vector operations
    and gain ~3%.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  8. @herbertx

    crypto: twofish/avx - avoid using temporary stack buffers

    authored herbertx committed
    Introduce new assembler functions to avoid the use of temporary stack buffers
    in glue code. This also allows use of vector instructions for xoring output
    in CTR and CBC modes and construction of IVs for CTR mode.
    
    ECB mode sees a ~0.2% decrease in speed because of one extra function
    call. CBC mode decryption and CTR mode benefit from vector operations
    and gain ~3%.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  9. @herbertx

    crypto: cast6/avx - avoid using temporary stack buffers

    authored herbertx committed
    Introduce new assembler functions to avoid the use of temporary stack buffers
    in glue code. This also allows use of vector instructions for xoring output
    in CTR and CBC modes and construction of IVs for CTR mode.
    
    ECB mode sees a ~0.5% decrease in speed because of one extra function
    call. CBC mode decryption and CTR mode benefit from vector operations
    and gain ~2%.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  10. @herbertx

    crypto: x86/glue_helper - use le128 instead of u128 for CTR mode

    authored herbertx committed
    On little-endian machines, the 'u128' type currently used for CTR mode has
    its two 'long long' halves in swapped order, so SSE/AVX code would need
    extra swap operations. Using le128 instead of u128 makes it easier to do IV
    calculations in vector registers.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
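
To make the le128 idea in the glue_helper commit above concrete, here is a minimal userspace sketch of a 128-bit counter kept as two 64-bit halves with the low half first. The type and function names (`le128_sketch`, `le128_sketch_inc`, `le128_sketch_to_be_bytes`) are hypothetical, and the halves are plain host-order integers rather than the kernel's endian-annotated types, so this models the layout idea only and is not the kernel's code.

```c
#include <stdint.h>

/* Hypothetical model of a little-endian 128-bit counter: 'b' is the low
 * 64 bits and 'a' the high 64 bits. With the low half stored first, the
 * struct on a little-endian CPU matches the byte order of a 128-bit
 * integer held in a vector register. */
typedef struct {
    uint64_t b; /* low half */
    uint64_t a; /* high half */
} le128_sketch;

/* Increment the counter by one, carrying from the low to the high half. */
static void le128_sketch_inc(le128_sketch *ctr)
{
    ctr->b += 1;
    if (ctr->b == 0)
        ctr->a += 1;
}

/* Serialize the counter as a big-endian 16-byte block, the form in which
 * a CTR-mode counter block is fed to the cipher. */
static void le128_sketch_to_be_bytes(const le128_sketch *ctr, uint8_t out[16])
{
    for (int i = 0; i < 8; i++) {
        out[i]     = (uint8_t)(ctr->a >> (56 - 8 * i));
        out[8 + i] = (uint8_t)(ctr->b >> (56 - 8 * i));
    }
}
```

Keeping the running counter in this form means only the final conversion to a big-endian block needs a byte swap, which vector code can do with a single shuffle.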
  11. @herbertx

    crypto: testmgr - add new larger DES3_EDE testvectors

    authored herbertx committed
    Most DES3_EDE testvectors are short and do not test parallelised codepaths
    well. Add larger testvectors to test large crypto operations and to test
    multi-page crypto with DES3_EDE.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Acked-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  12. @herbertx

    crypto: testmgr - add new larger DES testvectors

    authored herbertx committed
    Most DES testvectors are short and do not test parallelised codepaths
    well. Add larger testvectors to test large crypto operations and to test
    multi-page crypto with DES.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Acked-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  13. @herbertx

    crypto: testmgr - add new larger AES testvectors

    authored herbertx committed
    Most AES testvectors are short and do not test parallelised codepaths
    well. Add larger testvectors to test large crypto operations and to test
    multi-page crypto with AES.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Acked-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  14. @herbertx

    crypto: testmgr - expand serpent test vectors

    authored herbertx committed
    The AVX2 implementation of the serpent cipher processes 16 blocks in parallel,
    so we need to make test vectors larger to check parallel code paths.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Acked-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  15. @herbertx

    crypto: testmgr - expand blowfish test vectors

    authored herbertx committed
    The AVX2 implementation of the blowfish cipher processes 32 blocks in parallel,
    so we need to make test vectors larger to check parallel code paths.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Acked-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  16. @herbertx

    crypto: testmgr - expand camellia test vectors

    authored herbertx committed
    The AVX/AES-NI implementation of the camellia cipher processes 16 blocks
    in parallel, so we need to make test vectors larger to check parallel
    code paths.
    
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Acked-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commits on Oct 18, 2012
Commits on Oct 15, 2012
  1. @sqazi @herbertx

    crypto: vmac - Make VMAC work when blocks aren't aligned

    sqazi authored herbertx committed
    The VMAC implementation, as it stands, does not work with blocks whose
    size is not a multiple of 128 bytes.  This is also a problem
    when using the implementation on scatterlists, even
    when the complete plaintext is a multiple of 128 bytes, because the pieces
    that get passed to vmac_update can be almost any size.
    
    I also added test cases for unaligned blocks.
    
    Signed-off-by: Salman Qazi <sqazi@google.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
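
One common way to handle update calls of arbitrary size, as the VMAC commit above requires, is to carry a partial block between calls. The sketch below shows that buffering pattern only, under stated assumptions: `buf_ctx`, `process_blocks` and `buffered_update` are hypothetical names, the per-block step is a placeholder that counts blocks instead of hashing, and nothing here is taken from the actual patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 128 /* block size named in the commit message */

struct buf_ctx {
    uint8_t partial[BLOCK_SIZE]; /* bytes carried over between updates */
    size_t  partial_len;         /* number of valid bytes in partial[] */
    size_t  blocks_done;         /* placeholder for the real hash state */
};

/* Placeholder for the real per-block hash step: it only counts blocks. */
static void process_blocks(struct buf_ctx *ctx, const uint8_t *data,
                           size_t nblocks)
{
    (void)data;
    ctx->blocks_done += nblocks;
}

static void buffered_update(struct buf_ctx *ctx, const uint8_t *data,
                            size_t len)
{
    /* Top up a previously buffered partial block first. */
    if (ctx->partial_len) {
        size_t fill = BLOCK_SIZE - ctx->partial_len;

        if (fill > len)
            fill = len;
        memcpy(ctx->partial + ctx->partial_len, data, fill);
        ctx->partial_len += fill;
        data += fill;
        len -= fill;
        if (ctx->partial_len < BLOCK_SIZE)
            return; /* still not a full block */
        process_blocks(ctx, ctx->partial, 1);
        ctx->partial_len = 0;
    }

    /* Process whole blocks straight from the caller's buffer. */
    if (len >= BLOCK_SIZE) {
        size_t nblocks = len / BLOCK_SIZE;

        process_blocks(ctx, data, nblocks);
        data += nblocks * BLOCK_SIZE;
        len -= nblocks * BLOCK_SIZE;
    }

    /* Keep the tail for the next update or for finalization. */
    if (len) {
        memcpy(ctx->partial, data, len);
        ctx->partial_len = len;
    }
}
```

Feeding 300 bytes as three 100-byte pieces leaves two processed blocks and 44 buffered bytes, the same state a single 300-byte call would produce.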
  2. @herbertx

    crypto: talitos - convert to use be16_add_cpu()

    Wei Yongjun authored herbertx committed
    Convert cpu_to_be16(be16_to_cpu(E1) + E2) to use be16_add_cpu().
    
    The dpatch engine was used to auto-generate this patch.
    (https://github.com/weiyj/dpatch)
    
    Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
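
The conversion in the talitos commit above replaces a convert, add, convert-back expression with a single in-place helper. A minimal userspace sketch of what such a helper does is shown here; `be16_add_cpu_sketch` is a hypothetical stand-in that uses `htons`/`ntohs` in place of the kernel's `cpu_to_be16`/`be16_to_cpu`, so it illustrates the semantics and is not the kernel definition.

```c
#include <stdint.h>
#include <arpa/inet.h> /* htons, ntohs */

/* Add a CPU-order value to a 16-bit big-endian field in place.
 * Equivalent to: *var = cpu_to_be16(be16_to_cpu(*var) + val); */
static inline void be16_add_cpu_sketch(uint16_t *var, uint16_t val)
{
    *var = htons((uint16_t)(ntohs(*var) + val));
}
```

The helper keeps the field in big-endian form at rest, so callers never handle a half-converted value.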
  3. @herbertx

    crypto: tcrypt - Added speed test in tcrypt for crc32c

    Tim Chen authored herbertx committed
    This patch adds a test case in tcrypt to perform speed test for
    crc32c checksum calculation.
    
    Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  4. @herbertx

    crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction

    Tim Chen authored herbertx committed
    This patch adds the crc_pcl function, which calculates the CRC32C checksum using
    the PCLMULQDQ instruction on processors that support it. This provides a
    speedup over using the CRC32 instruction alone.
    Using PCLMULQDQ requires calls to kernel_fpu_begin and kernel_fpu_end, which
    incur some overhead, so the new crc_pcl function is only invoked for buffers
    of 512 bytes or more.  Larger buffers can expect a greater speedup.  This
    feature is best used together with eager_fpu, which reduces the
    kernel_fpu_begin/end overhead.  For a buffer size of 1K the speedup is around
    1.6x, and for buffer sizes greater than 4K the speedup is around 3x compared
    to the original implementation in the crc32c-intel module. The test was
    performed on a Sandy Bridge based platform with the CPU set to a constant
    frequency.
    
    A white paper detailing the algorithm can be found here:
    http://download.intel.com/design/intarch/papers/323405.pdf
    
    Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
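
The size threshold described in the crc32c commit above amounts to a dispatch on buffer length. The following userspace sketch shows that dispatch under stated assumptions: `PCL_BREAKEVEN`, `crc32c_fast_standin` and `crc32c_update_sketch` are hypothetical names, and the "fast" path is the same bitwise reference routine plus a call counter, since PCLMULQDQ code and the kernel FPU calls are not modeled here.

```c
#include <stddef.h>
#include <stdint.h>

#define PCL_BREAKEVEN 512 /* hypothetical name for the 512-byte threshold */

/* Bitwise reference CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Takes and returns the raw running state; callers apply the inversions. */
static uint32_t crc32c_bitwise(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

static int fast_path_calls; /* counts how often the fast path was chosen */

/* Stand-in for the accelerated routine, where the setup cost
 * (kernel_fpu_begin/end in the kernel) would be paid. */
static uint32_t crc32c_fast_standin(uint32_t crc, const uint8_t *p, size_t len)
{
    fast_path_calls++;
    return crc32c_bitwise(crc, p, len);
}

/* Use the fast path only when the buffer is large enough to amortize
 * its fixed setup overhead. */
static uint32_t crc32c_update_sketch(uint32_t crc, const uint8_t *p, size_t len)
{
    if (len >= PCL_BREAKEVEN)
        return crc32c_fast_standin(crc, p, len);
    return crc32c_bitwise(crc, p, len);
}
```

With the usual convention of starting from an all-ones state and inverting the result, the nine ASCII bytes "123456789" give the standard CRC32C check value 0xE3069283 through either path.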
  5. @herbertx

    crypto: crc32c - Rename crc32c-intel.c to crc32c-intel_glue.c

    Tim Chen authored herbertx committed
    This patch renames the crc32c-intel.c file to crc32c-intel_glue.c file
    in preparation for linking with the new crc32c-pcl-intel-asm.S file,
    which contains optimized crc32c calculation based on PCLMULQDQ
    instruction.
    
    Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commits on Oct 14, 2012
  1. @torvalds

    Linux 3.7-rc1

    torvalds authored
  2. @torvalds

    Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

    torvalds authored
    
    Pull MIPS update from Ralf Baechle:
     "Cleanups and fixes for breakage that occurred earlier during this merge
      phase.  Also a few patches that didn't make the first pull request.
      Of those is the Alchemy work that merges code for many of the SOCs and
      evaluation boards thus among other code shrinkage, reduces the number
      of MIPS defconfigs by 5."
    
    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (22 commits)
      MIPS: SNI: Switch RM400 serial to SCCNXP driver
      MIPS: Remove unused empty_bad_pmd_table[] declaration.
      MIPS: MT: Remove kspd.
      MIPS: Malta: Fix section mismatch.
      MIPS: asm-offset.c: Delete unused irq_cpustat_t struct offsets.
      MIPS: Alchemy: Merge PB1100/1500 support into DB1000 code.
      MIPS: Alchemy: merge PB1550 support into DB1550 code
      MIPS: Alchemy: Single kernel for DB1200/1300/1550
      MIPS: Optimize TLB refill for RI/XI configurations.
      MIPS: proc: Cleanup printing of ASEs.
      MIPS: Hardwire detection of DSP ASE Rev 2 for systems, as required.
      MIPS: Add detection of DSP ASE Revision 2.
      MIPS: Optimize pgd_init and pmd_init
      MIPS: perf: Add perf functionality for BMIPS5000
      MIPS: perf: Split the Kconfig option CONFIG_MIPS_MT_SMP
      MIPS: perf: Remove unnecessary #ifdef
      MIPS: perf: Add cpu feature bit for PCI (performance counter interrupt)
      MIPS: perf: Change the "mips_perf_event" table unsupported indicator.
      MIPS: Align swapper_pg_dir to 64K for better TLB Refill code.
      vmlinux.lds.h: Allow architectures to add sections to the front of .bss
      ...
  3. @torvalds

    Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

    torvalds authored
    
    Pull module signing support from Rusty Russell:
     "module signing is the highlight, but it's an all-over David Howells frenzy..."
    
    Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.
    
    * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
      X.509: Fix indefinite length element skip error handling
      X.509: Convert some printk calls to pr_devel
      asymmetric keys: fix printk format warning
      MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
      MODSIGN: Make mrproper should remove generated files.
      MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
      MODSIGN: Use the same digest for the autogen key sig as for the module sig
      MODSIGN: Sign modules during the build process
      MODSIGN: Provide a script for generating a key ID from an X.509 cert
      MODSIGN: Implement module signature checking
      MODSIGN: Provide module signing public keys to the kernel
      MODSIGN: Automatically generate module signing keys if missing
      MODSIGN: Provide Kconfig options
      MODSIGN: Provide gitignore and make clean rules for extra files
      MODSIGN: Add FIPS policy
      module: signature checking hook
      X.509: Add a crypto key parser for binary (DER) X.509 certificates
      MPILIB: Provide a function to read raw data into an MPI
      X.509: Add an ASN.1 decoder
      X.509: Add simple ASN.1 grammar compiler
      ...
  4. @mfleming @torvalds

    x86, boot: Explicitly include autoconf.h for hostprogs

    mfleming authored torvalds committed
    The hostprogs need access to the CONFIG_* symbols found in
    include/generated/autoconf.h.  But commit abbf159 ("UAPI: Partition
    the header include path sets and add uapi/ header directories") replaced
    $(LINUXINCLUDE) with $(USERINCLUDE) which doesn't contain the necessary
    include paths.
    
    This has the undesirable effect of breaking the EFI boot stub because
    the #ifdef CONFIG_EFI_STUB code in arch/x86/boot/tools/build.c is
    never compiled.
    
    It should also be noted that because $(USERINCLUDE) isn't exported by
    the top-level Makefile it's actually empty in arch/x86/boot/Makefile.
    
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Acked-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Matt Fleming <matt.fleming@intel.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>