Skip to content

Conversation

@silverjam
Copy link
Contributor

@silverjam silverjam commented Feb 28, 2019

Design notes

Two general SBP parsing improvements:

  • To avoid buffer copies add an "into" variant for filling a buffer with SBP data, this allows the handler object to re-use one buffer and simply fill it with SBP data as needed -- note this only improves things on the send path, a similar variant on the recv path needs to be added.
  • Add a Numba based CRC routine to speed up CRC calculations

Testing

  • Send path copies/JIT:
    • Send path without reduced copies (and JIT):
      • 95.13, 108.14, 114.50
      • Average: 105.92
    • Send path with reduced copies and JIT:
      • 91.16, 94.41, 94.80
      • Average: 93.46
    • 11.76% improvement
    • Send path with reduced copies (and without JIT):
      • 147.58, 121.28, 107.57
      • Average: 125.47
    • Send path without reduced copies (and without JIT):
      • 111.83, 111.30, 102.92
      • Average: 108.68
    • 13.38% improvement
  • Receive path JIT:
    • Receive path with JIT:
      • 96.90, 99.66, 98.34, 99.38, 96.23, 95.08
      • Average: 97.60
    • Receive path without JIT:
      • 100.71, 100.55, 102.90, 100.26, 99.76, 101.41
      • Average: 100.93
    • 3.3% improvement

Todo

  • Re-enable JIT routines for CRCs
  • Re-stack git commits

@silverjam silverjam changed the title [WIP] Add JIT for crc, build into existing buffer to avoid copies [v0] Add JIT for crc, build into existing buffer to avoid copies Mar 4, 2019
@silverjam silverjam changed the title Add JIT for crc, build into existing buffer to avoid copies Add JIT for crc, build into existing buffer to avoid copies [ESD-1108] Mar 4, 2019
@silverjam silverjam changed the title Add JIT for crc, build into existing buffer to avoid copies [ESD-1108] [WIP] Add JIT for crc, build into existing buffer to avoid copies [ESD-1108] Mar 5, 2019
@silverjam silverjam force-pushed the silverjam/numba-jit0 branch from 2157a03 to eb21ef4 Compare March 6, 2019 01:57
@silverjam silverjam changed the title [WIP] Add JIT for crc, build into existing buffer to avoid copies [ESD-1108] Add JIT for crc, build into existing buffer to avoid copies [ESD-1108] Mar 6, 2019
@denniszollo
Copy link
Contributor

Three questions:

  • Do you plan to replace the callable of the handler / framer with these new "into_buffer" features that reduce copying?
  • Should the "bench" features be turned into unit tests type features instead?
  • Can you try the receive path benchmarking with files in which MsgObs which dominate the stream? These messages are usually big (255 bytes) so the CRC and copying might eat a larger proportion of CPU for streams dominated by the Obs, which will likely show that these changes will have a bigger performance impact.

@denniszollo
Copy link
Contributor

Wait I see you already setup into_buffer as part of the framer here: https://github.com/swift-nav/libsbp/pull/658/files#diff-f7e7ceb9c48458bb44be66283049a27bR162 so you can ignore my question about whether the python drivers use these features yet.

@silverjam
Copy link
Contributor Author

@denniszollo I think the bench scripts should be run regularly, but I don't think it should be part of CI-- maybe we can set these up in the Travis "cron" to run weekly or so.

@silverjam
Copy link
Contributor Author

@denniszollo Benchmarking obs would be good for the next task I have planned, which is to add an "into_buffer" variant for the receive path.

@silverjam silverjam force-pushed the silverjam/numba-jit0 branch 3 times, most recently from 5c16df1 to 1202479 Compare March 12, 2019 00:43
@silverjam silverjam force-pushed the silverjam/numba-jit0 branch 2 times, most recently from 045c9d0 to 05663fe Compare March 12, 2019 19:11
@pmiettinen
Copy link
Contributor

make python problems on Ubuntu 16.04, concerns py35 and py37.

Failed to build numba llvmlite
Installing collected packages: construct, futures, six, httpretty, pyserial, pyusb, pyftdi, pylibftdi, chardet, urllib3, idna, certifi, requests, requests-futures, numpy, llvmlite, numba, py, pytest, coverage, pytest-cov, cov-core, virtualenv, toml, pluggy, filelock, tox, ruamel.yaml
  Running setup.py install for llvmlite: started
    Running setup.py install for llvmlite: finished with status 'error'
    Complete output from command /mnt/users/pasi/swiftnav/libsbp/python/.tox/py35/bin/python3.5 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-sp_tmt65/llvmlite/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-qxxtn93f/install-record.txt --single-version-externally-managed --compile --install-headers /mnt/users/pasi/swiftnav/libsbp/python/.tox/py35/include/site/python3.5/llvmlite:
    running install
    running build
    got version from file /tmp/pip-install-sp_tmt65/llvmlite/llvmlite/_version.py {'version': '0.27.1', 'full': 'f008359c1f9ee5e8a6a98f2095ea460f09f57edb'}
    running build_ext
    /mnt/users/pasi/swiftnav/libsbp/python/.tox/py35/bin/python3.5 /tmp/pip-install-sp_tmt65/llvmlite/ffi/build.py
    LLVM version... make[1]: Entering directory '/tmp/pip-install-sp_tmt65/llvmlite/ffi'
    # static-libstdc++ avoids runtime dependencies on a
    # particular libstdc++ version.
    g++  -shared -I/usr/lib/llvm-7/include -fno-exceptions -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -fno-rtti -g -flto assembly.cpp bitcode.cpp core.cpp initfini.cpp module.cpp value.cpp executionengine.cpp transforms.cpp passmanagers.cpp targets.cpp dylib.cpp linker.cpp object_file.cpp -o libllvmlite.so  -L/usr/lib/llvm-7/lib -flto -Wl,--exclude-libs=ALL -lLLVM-7
    transforms.cpp:10:32: warning: ‘unwrap’ violates the C++ One Definition Rule  [-Wodr]
         inline PassManagerBuilder *unwrap(LLVMPassManagerBuilderRef P) {
                                    ^
    executionengine.cpp:53:27: note: return value type mismatch
         inline TargetMachine *unwrap(LLVMTargetMachineRef P) {
                               ^
    /usr/lib/llvm-7/include/llvm/Target/TargetMachine.h:59:7: note: type name ‘llvm::TargetMachine’ should match type name ‘llvm::PassManagerBuilder’
     class TargetMachine {
           ^
    /usr/lib/llvm-7/include/llvm/Transforms/IPO/PassManagerBuilder.h:59:7: note: the incompatible type is defined here
     class PassManagerBuilder {
           ^
    executionengine.cpp:53:27: note: ‘unwrap’ was previously declared here
         inline TargetMachine *unwrap(LLVMTargetMachineRef P) {
                               ^
    executionengine.cpp:53:27: note: code may be misoptimized unless -fno-strict-aliasing is used
    /usr/bin/x86_64-linux-gnu-ld: /tmp/ccqf68sU.ltrans0.ltrans.o: relocation R_X86_64_32S against `_ZN4llvm3sys14DynamicLibrary7InvalidE' can not be used when making a shared object; recompile with -fPIC
    /tmp/ccqf68sU.ltrans0.ltrans.o: error adding symbols: Bad value
    collect2: error: ld returned 1 exit status
    Makefile.linux:20: recipe for target 'libllvmlite.so' failed
    make[1]: *** [libllvmlite.so] Error 1
    make[1]: Leaving directory '/tmp/pip-install-sp_tmt65/llvmlite/ffi'
    7.0.1

    SVML not detected
    Traceback (most recent call last):
      File "/tmp/pip-install-sp_tmt65/llvmlite/ffi/build.py", line 167, in <module>
        main()
      File "/tmp/pip-install-sp_tmt65/llvmlite/ffi/build.py", line 157, in main
        main_posix('linux', '.so')
      File "/tmp/pip-install-sp_tmt65/llvmlite/ffi/build.py", line 149, in main_posix
        subprocess.check_call(['make', '-f', makefile])
      File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['make', '-f', 'Makefile.linux']' returned non-zero exit status 2
    error: command '/mnt/users/pasi/swiftnav/libsbp/python/.tox/py35/bin/python3.5' failed with exit status 1

Google finding https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912790

@benjaminaltieri
Copy link
Contributor

benjaminaltieri commented Mar 13, 2019

@silverjam I haven't had a chance to review/test, but would it be difficult to encapsulate numpy/numba into an optional feature with a subset of cross-platform support, while retaining the un-optimized functionality across all previously supported platforms? that way a user external to Swift who may want to avoid these dependencies can do so, while internally we can benefit from the speedups you are introducing

@silverjam
Copy link
Contributor Author

@benjaminaltieri @pmiettinen Fixed Ubuntu 16.04 by pinning the numba/llvmlite versions to something that doesn't require LLVM 7. The eventual goal with these speed-ups is to not require numba for "deployed" versions of libsbp (that is, what we push to PyPI) since numba supports ahead of time compilation of accelerated functions (cffi, which is used in the Numba based parsing prototype, also supports this).

@silverjam
Copy link
Contributor Author

silverjam commented Mar 13, 2019

The continuous-integration/travis-ci/push failure is weird, both the push and pr travis check seem to be running the same unit test code (with "localhost" wrapped in getaddressbyname for the tcp test). On the push build py35 fails (but everything else succeeds)-- on the pr build all Python versions succeed... the one that's succeeding is the version that represents a merge to master, so this likely means the unit test will be fine once this PR is merged?

Copy link
Contributor

@denniszollo denniszollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional python dependencies don't worry me too much, that is the python way. I'm all for any performance improvements in the client libraries and think this makes sense.

Copy link
Contributor

@benjaminaltieri benjaminaltieri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IGTM!

+ If we're using libs that don't have a binary wheel available, we need
  Python dev packages to be able to build any C extensions from source.

+ Add numba and numpy to enable 'into_buffer' support
+ Use Numba JIT to speed up the crc routine in sbp.msg

+ Add 'into_buffer' support for sbp.msg so we can pack an SBP message
into an existing buffer at the specified offset.
This change allows us to re-use one buffer for serializing a group of
SBP messages and thus avoid unnecessary copies.
Make a helper function to reduce the scope of the try/catch that
attempts to coerce a value to an iterator.
@silverjam silverjam force-pushed the silverjam/numba-jit0 branch from 3d97ab0 to 01d354a Compare March 26, 2019 23:24
@silverjam silverjam merged commit c569e80 into master Mar 27, 2019
@silverjam silverjam deleted the silverjam/numba-jit0 branch March 27, 2019 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants