Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update API and add performance enhancements to CollectionTables #229

Merged
merged 1 commit into from
Dec 16, 2020

Conversation

robertcv
Copy link
Contributor

@robertcv robertcv commented Dec 6, 2020

What started with just a few fixes ended up being closer to a rewrite.
The major changes are:

  • change main methods to attributes: metadata() -> meta, expressions() -> exp, counts() -> rc
  • add clear_cache() which removes all cache files from the default location
  • add method id_to_symbol which holds gene id to symbole mapping
  • exp and rc return gene ids instead of symboles
  • sample name is used instead of sample slug for tables index
  • add separate versioning for expression data
  • seperatly fetch metadata and expressions
  • add downloading progress bar
  • mapping of gene ids to symboles is now cached on disc and is also used on other collection
  • add CollectionTable to resdk namespace
  • store expression type in returned tables attrs['exp_type'] attribute

Performance on 45 samples (exp and rc have the same performance):

how what old new
/ init 0.4s <0.01s
uncached meta 127s* 1.7s
uncached exp 255s* 64s**
uncached id_to_symbol / 86s
cached meta 1.7s 0.4s
cached exp 0.02s 1.8s
cached id_to_symbol / 2.8s

* calling just one of metadata() or expressions(), if used together resolwe server gets queried only once and performance improves for he second function for ~120s
** without gene id to symbol mapping

@robertcv
Copy link
Contributor Author

robertcv commented Dec 7, 2020

@JureZmrzlikar @mstajdohar

Copy link
Member

@JureZmrzlikar JureZmrzlikar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have few comments. Also, make sure that tests pass.

setup.py Outdated Show resolved Hide resolved
src/resdk/collection_tables.py Show resolved Hide resolved
src/resdk/collection_tables.py Show resolved Hide resolved
src/resdk/collection_tables.py Outdated Show resolved Hide resolved
src/resdk/collection_tables.py Show resolved Hide resolved
@robertcv
Copy link
Contributor Author

robertcv commented Dec 7, 2020

@JureZmrzlikar Can you look at Jenkins and help me understand why it is failing?

@JureZmrzlikar
Copy link
Member

@JureZmrzlikar Can you look at Jenkins and help me understand why it is failing?

I have no clue. @acopar Do you have any idea?

@acopar
Copy link
Contributor

acopar commented Dec 9, 2020

This problem happened because pip switched the default dependency resolver on the recent 20.3.* version. The new resolver has different dependency resolution, and causes some glitches with dependencies, in particular backtracking package versions until timeout. This PR should fix this problem:
https://github.com/genialis/genialis-base/pull/596

We can re-trigger build after genialis-base PR is merged.

@mstajdohar
Copy link
Member

What is the status? Can we merge this PR?

Copy link
Member

@JureZmrzlikar JureZmrzlikar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have few comments. Otherwise looks good to me.

src/resdk/collection_tables.py Outdated Show resolved Hide resolved
src/resdk/collection_tables.py Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Dec 16, 2020

Codecov Report

Merging #229 (1196e38) into master (0d51338) will increase coverage by 0.20%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #229      +/-   ##
==========================================
+ Coverage   80.72%   80.92%   +0.20%     
==========================================
  Files          27       27              
  Lines        1619     1636      +17     
==========================================
+ Hits         1307     1324      +17     
  Misses        312      312              
Impacted Files Coverage Δ
src/resdk/__init__.py 100.00% <100.00%> (ø)
src/resdk/collection_tables.py 100.00% <100.00%> (ø)
src/resdk/utils/table_cache.py 97.14% <100.00%> (+0.47%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d51338...1196e38. Read the comment docs.

@JureZmrzlikar JureZmrzlikar merged commit aa64587 into genialis:master Dec 16, 2020
@robertcv robertcv deleted the enh/collection_table branch August 26, 2021 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants