Broad XX enhancement and performance improvements #32

mmaiers-nmdp · 2020-03-09T14:45:30Z

Closes DRB1*02:XX #31: creates mappings for broad XX allele codes such as DRB1*02:XX from a static file.
Added pyard/dna_relshp.csv to the repo (extracted from STAR mdp_cmn_prd..t_dna_relshp; no changes in 20 years).
Modified logging so that it does not reconfigure the format.
Added documentation.
Several performance enhancements:

Refactored the isvalid() function to look the allele up in a dictionary instead of testing whether it is in a list.
Refactored the redux() function to look the allele up in a dictionary instead of testing whether it is in a list.
Precompiled the regex instead of compiling it on each call to redux().
Used memoization to store the results of previous calls to redux().
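
The three techniques listed above can be sketched in a few lines. This is a minimal illustration, not the code in pyard.py: the allele list, the regex pattern, and the function names `is_valid` and `reduce_to_two_fields` are all hypothetical stand-ins.

```python
import functools
import re

# Hypothetical data: a few allele names standing in for the real tables.
VALID_ALLELES_LIST = ["A*01:01", "A*02:01", "DRB1*15:01", "DRB1*16:01"]

# Dictionary keyed by allele: membership is a hash lookup instead of a
# linear scan through a list.
VALID_ALLELES = {allele: True for allele in VALID_ALLELES_LIST}

# Compiled once at import time rather than inside the function.
# Hypothetical pattern: locus, '*', then colon-separated numeric fields.
ALLELE_PATTERN = re.compile(r"^([A-Z0-9]+)\*(\d+(?::\d+)*)$")


def is_valid(allele: str) -> bool:
    """Return True if the allele is a key of the lookup dictionary."""
    return allele in VALID_ALLELES


@functools.lru_cache(maxsize=1000)
def reduce_to_two_fields(allele: str) -> str:
    """Toy reduction (an assumption, not pyard's rule): keep two fields."""
    match = ALLELE_PATTERN.search(allele)
    if match is None:
        raise ValueError(f"not an allele name: {allele!r}")
    locus, fields = match.groups()
    return f"{locus}*{':'.join(fields.split(':')[:2])}"


first = reduce_to_two_fields("A*01:01:01:01")
second = reduce_to_two_fields("A*01:01:01:01")  # served from the cache
cache_hits = reduce_to_two_fields.cache_info().hits
```

The second call with the same argument never reaches the regex; `cache_info()` reports it as a hit.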

Profile of convert_pull.py before performance enhancements (columns: ncalls, tottime, percall, cumtime, percall, filename:lineno(function)):

   744150  254.186    0.000  255.649    0.000 pyard.py:457(isvalid)
   371361  107.522    0.000  108.164    0.000 pyard.py:366(redux)
  5227562    9.539    0.000    9.979    0.000 pyard.py:51(getvalue)
        1    6.400    6.400    6.400    6.400 {built-in method _pickle.load}
  2613781    4.156    0.000   16.305    0.000 pyard.py:55(loci_sort)
  7501937    3.821    0.000    5.614    0.000 re.py:271(_compile)
  7501306    3.100    0.000   11.050    0.000 re.py:180(search)
 10968273    2.467    0.000    2.467    0.000 {method 'split' of 'str' objects}
  7501361    2.360    0.000    2.360    0.000 {method 'search' of 're.Pattern' objects}
     6020    2.207    0.000    2.207    0.000 pyard.py:450(<lambda>)

Profile of convert_pull.py after performance enhancements (columns: ncalls, tottime, percall, cumtime, percall, filename:lineno(function)):

   281302   89.124    0.000   89.349    0.000 pyard.py:384(redux)
  5237286    9.816    0.000   10.269    0.000 pyard.py:52(getvalue)
        1    5.149    5.149    5.149    5.149 {built-in method _pickle.load}
  2618643    4.335    0.000   16.832    0.000 pyard.py:56(loci_sort)
     1439    3.616    0.003    3.616    0.003 {method 'flush' of '_io.TextIOWrapper' objects}
  7130577    3.144    0.000    4.640    0.000 re.py:271(_compile)
  7129945    2.758    0.000    9.248    0.000 re.py:180(search)
 10987721    2.468    0.000    2.468    0.000 {method 'split' of 'str' objects}
     6020    2.350    0.000    2.350    0.000 pyard.py:473(<lambda>)
  7411302    1.943    0.000    1.943    0.000 {method 'search' of 're.Pattern' objects}
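
A listing in this shape, sorted by total time and cut to the top ten entries, can be produced with the standard-library `cProfile` and `pstats` modules. The sketch below assumes that is how the numbers above were collected. `workload` is a hypothetical stand-in for running convert_pull.py, and `profile_top` is an illustrative helper name.

```python
import cProfile
import io
import pstats


def workload() -> int:
    """Hypothetical stand-in for the script being profiled."""
    return sum(len(str(i).split("0")) for i in range(10_000))


def profile_top(func, limit: int = 10) -> str:
    """Profile func() and return the `limit` entries with highest tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


report = profile_top(workload)
print(report)
```

Sorting by `tottime` ranks functions by time spent in their own bodies, which is the ordering both listings above follow.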

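The improvement in isvalid() has been dramatic: it accounted for 254.186 s over 744,150 calls before the changes and no longer appears among the ten most expensive entries afterwards.
The improvement in redux() has been less substantial: its total time fell from 107.522 s to 89.124 s (about 17%), while its call count fell from 371,361 to 281,302.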

Further performance enhancement suggestions:

Increase the memoization cache size for redux(). It currently uses @functools.lru_cache(maxsize=1000).
Add memoization to getvalue().
Hold a code clinic to discuss other ideas (parallelize? more caching/memoizing?).
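
Whether a larger cache would help depends on how often the same arguments recur, which `cache_info()` can measure. The sketch below uses a synthetic access pattern and a hypothetical `slow_lookup` function; the hit rate of the real redux() workload is not known from this PR.

```python
import functools


def slow_lookup(key: int) -> int:
    """Hypothetical stand-in for an expensive, pure function."""
    return key * key


def hit_rate(maxsize, keys) -> float:
    """Wrap slow_lookup in a fresh lru_cache and report hits / total calls."""
    cached = functools.lru_cache(maxsize=maxsize)(slow_lookup)
    for key in keys:
        cached(key)
    info = cached.cache_info()
    return info.hits / (info.hits + info.misses)


# Synthetic pattern: cycle through 2000 distinct keys three times.
keys = list(range(2000)) * 3

bounded = hit_rate(1000, keys)    # cache smaller than the working set
unbounded = hit_rate(None, keys)  # maxsize=None disables eviction

print(f"maxsize=1000: {bounded:.2f}, maxsize=None: {unbounded:.2f}")
```

With this cyclic pattern, the 1000-entry LRU cache evicts each key before it is requested again, so every call is a miss. The unbounded cache serves the second and third passes entirely from memory. Running the same measurement on real input would show whether maxsize=1000 is the limiting factor.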

cicd-nmdp

🥇 LGTM

pbashyal-nmdp

LGTM 👍

mmaiers-nmdp added 4 commits March 6, 2020 22:44

handle broad XX codes

87a0a7e

relshp file

ac95609

packaging

802b909

performance

8ed549f

mmaiers-nmdp requested a review from pbashyal-nmdp March 9, 2020 14:45

mmaiers-nmdp added 2 commits March 9, 2020 17:04

performance code clinic

6d72f57

gitignore and performance enhancements to pyard.py

789d4be

cicd-nmdp approved these changes Mar 10, 2020

pbashyal-nmdp approved these changes Mar 10, 2020

pbashyal-nmdp merged commit 7983a79 into nmdp-bioinformatics:master Mar 10, 2020
