@mmaiers-nmdp

  • Closes #31 (DRB1*02:XX): creates mappings for broad XX allele codes such as DRB1*02:XX from a static file.
  • Added pyard/dna_relshp.csv to the repo (extracted from STAR mdp_cmn_prd..t_dna_relshp; no changes in 20 years).
  • Modified logging so that it no longer reconfigures the format.
  • Added documentation.
  • Several performance enhancements:
  1. Refactored isvalid() to use a dictionary lookup instead of testing whether the allele is in a list.
  2. Refactored redux() to use a dictionary lookup instead of testing whether the allele is in a list.
  3. Precompiled the regex instead of compiling it on each call to redux().
  4. Used memoization to store the results of previous calls to redux().
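
A minimal sketch of the techniques in items 1 to 4, not the actual pyard code. The allele list, the regex pattern, and the helper names `is_valid` and `reduce_allele` are hypothetical stand-ins chosen for illustration.

```python
import functools
import re

# Hypothetical sample data; the real pyard tables are much larger.
VALID_ALLELES_LIST = ["A*01:01", "A*02:01", "DRB1*15:01"]

# A dict keyed by allele gives average O(1) membership tests,
# whereas `allele in some_list` scans the list on every call.
VALID_ALLELES = {allele: True for allele in VALID_ALLELES_LIST}

# Compiled once at import time instead of inside the function body.
# The pattern itself is an assumption made for this sketch.
TWO_FIELD = re.compile(r"^([A-Z0-9]+\*\d+:\d+)")


def is_valid(allele: str) -> bool:
    """Dictionary-based membership test."""
    return allele in VALID_ALLELES


@functools.lru_cache(maxsize=1000)
def reduce_allele(allele: str) -> str:
    """Truncate an allele name to two fields (illustrative only)."""
    match = TWO_FIELD.search(allele)
    return match.group(1) if match else allele
```

Repeated calls to `reduce_allele` with the same argument are answered from the cache, so the function body runs only once per distinct allele while the entry remains cached.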

Profile of convert_pull.py before the performance enhancements (sorted by tottime; columns are ncalls, tottime, percall, cumtime, percall, function):

   744150  254.186    0.000  255.649    0.000 pyard.py:457(isvalid)
   371361  107.522    0.000  108.164    0.000 pyard.py:366(redux)
  5227562    9.539    0.000    9.979    0.000 pyard.py:51(getvalue)
        1    6.400    6.400    6.400    6.400 {built-in method _pickle.load}
  2613781    4.156    0.000   16.305    0.000 pyard.py:55(loci_sort)
  7501937    3.821    0.000    5.614    0.000 re.py:271(_compile)
  7501306    3.100    0.000   11.050    0.000 re.py:180(search)
 10968273    2.467    0.000    2.467    0.000 {method 'split' of 'str' objects}
  7501361    2.360    0.000    2.360    0.000 {method 'search' of 're.Pattern' objects}
     6020    2.207    0.000    2.207    0.000 pyard.py:450(<lambda>)

Profile of convert_pull.py after the performance enhancements (sorted by tottime; columns are ncalls, tottime, percall, cumtime, percall, function):

   281302   89.124    0.000   89.349    0.000 pyard.py:384(redux)
  5237286    9.816    0.000   10.269    0.000 pyard.py:52(getvalue)
        1    5.149    5.149    5.149    5.149 {built-in method _pickle.load}
  2618643    4.335    0.000   16.832    0.000 pyard.py:56(loci_sort)
     1439    3.616    0.003    3.616    0.003 {method 'flush' of '_io.TextIOWrapper' objects}
  7130577    3.144    0.000    4.640    0.000 re.py:271(_compile)
  7129945    2.758    0.000    9.248    0.000 re.py:180(search)
 10987721    2.468    0.000    2.468    0.000 {method 'split' of 'str' objects}
     6020    2.350    0.000    2.350    0.000 pyard.py:473(<lambda>)
  7411302    1.943    0.000    1.943    0.000 {method 'search' of 're.Pattern' objects}

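The improvement in isvalid() is dramatic: it accounted for 254.186 s of total time across 744,150 calls before the change and no longer appears among the top entries afterwards.
The improvement in redux() is less substantial: total time fell from 107.522 s to 89.124 s (about 17%), with calls down from 371,361 to 281,302.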
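
Listings in this shape can be generated with the standard-library cProfile and pstats modules. The sketch below assumes that is how the numbers above were collected; `work` is a hypothetical stand-in for the real convert_pull.py workload.

```python
import cProfile
import io
import pstats


def work():
    """Hypothetical workload standing in for convert_pull.py."""
    return sum(len(str(i).split("0")) for i in range(10000))


def profile_top(func, limit=10):
    """Run func under cProfile and return the top entries by tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by time spent in the function itself, then keep `limit` rows.
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


report = profile_top(work)
print(report)
```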

Further performance enhancement suggestions:

  • Increase the memoization cache size for redux(), which currently uses @functools.lru_cache(maxsize=1000).
  • Add memoization to getvalue().
  • Hold a code clinic to discuss other ideas (parallelize? more caching/memoizing?).
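
To judge whether a larger cache would help, `cache_info()` on an `lru_cache`-decorated function reports hits and misses. The sketch below uses a hypothetical `lookup` function and an assumed access pattern (2000 distinct keys visited in order, twice) to show how a cache smaller than the working set can yield no hits at all.

```python
import functools


def make_cached(maxsize):
    """Build a fresh cached function with the given cache size."""
    @functools.lru_cache(maxsize=maxsize)
    def lookup(key):
        # Hypothetical stand-in for an expensive call such as redux().
        return key * 2
    return lookup


def hit_rate(cached_func, keys, passes=2):
    """Call cached_func over keys several times and return hits / calls."""
    for _ in range(passes):
        for key in keys:
            cached_func(key)
    info = cached_func.cache_info()
    return info.hits / (info.hits + info.misses)


keys = range(2000)
bounded = hit_rate(make_cached(1000), keys)    # cache smaller than key set
unbounded = hit_rate(make_cached(None), keys)  # maxsize=None: no eviction
print(bounded, unbounded)
```

With this cyclic pattern the bounded cache evicts each key before it is requested again, so checking `cache_info()` on the real workload is a cheap way to decide whether raising `maxsize` is worthwhile.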

@cicd-nmdp cicd-nmdp left a comment

🥇 LGTM

@pbashyal-nmdp pbashyal-nmdp left a comment

LGTM 👍

@pbashyal-nmdp pbashyal-nmdp merged commit 7983a79 into nmdp-bioinformatics:master Mar 10, 2020

Development

Successfully merging this pull request may close these issues.

DRB1*02:XX
