Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAGMA.Celltyping: bschilder_dev upgrade #93

Merged
merged 56 commits into from
Jan 27, 2022
Merged

MAGMA.Celltyping: bschilder_dev upgrade #93

merged 56 commits into from
Jan 27, 2022

Conversation

bschilder
Copy link
Collaborator

@bschilder bschilder commented Jan 5, 2022

MAGMA.Celltyping 2.0.0

  • MAJOR UPGRADE: MAGMA.Celltyping was revamped to meet CRAN standards,
    automatically install MAGMA, and take any species as input.

New features

  • Added a NEWS.md file to track changes to the package.
  • Automatically install MAGMA with new magma_install function;
    stores binaries in MAGMA.Celltyping-specific cache dir. Added various support functions to make this possible and ensure correct version is being used.
  • Added magma_uninstall function to remove one or all MAGMA binaries.
  • Allow MAGMA.Celltyping to install even if MAGMA is not installed. Instead,
    check at the beginning of functions that require MAGMA using magma_check.
    • magma_links_stored: Include built-in metadata with links to all MAGMA
      versions with parsed version numbers, OS, and which is the latest version.
  • Call MAGMA commands with magma_run which finds the requested version of
    MAGMA and uses it.
  • Print readable, user-friendly MAGMA commands when being run through
    magma_cmd function.
  • Added unit tests.
  • Create hex sticker
  • New function: get_sub_SNP_LOC_DATA
  • Formally deprecated functions using .Deprecated function and removing all other internal code:
    • get_genomebuild_for_sumstats
    • build_snp_location_tables
    • format.sumstats.for.magma
    • format_sumstats_for_magma_macOnly
    • standardise.sumstats.column.headers
    • standardise.sumstats.column.headers.crossplatform
  • Removed sumstatsColHeaders from data, as it was only used in now-deprecated
    functions.
  • Renamed all functions with "." to "_" to meet coding standards.
  • Renamed functions to be more concise and avoid issues with
    test file names being too long:
    • calculate.celltype.enrichment.probabilities.wtLimma --> calculate_celltype_enrichment_limma
    • calculate.conditional.celltype.enrichment.probabilities.wtLimma --> calculate_conditional_celltype_enrichment_limma
  • Removed all large data/ to GitHub Releases, now accessible with dedicated
    piggyback-based functions:
    • get_ctd: CellTypeDatasets
    • get_example_gwas: GWAS summary stats
    • get_genomeLocFile: NCBI gene coordinate references.
  • Create example full GWAS summary stats (both unfiltered and filtered + munged with MungeSumstats). Accessed by get_example_gwas.
    • "prospective_memory"
    • "fluid_intelligence"
    • "educational_attainment"
  • Updated vignettes:
    • Created concise Getting started vignette.
    • Updated origiinal vignette and turned into full_workflow vignette.
  • Made certain functions run automatically internally,
    instead of having the user run them:
    • get_genome_ref
    • prepare_quantile_groups
  • Remove unnecessary dependencies:
    • reshape
    • cowplot
    • SNPlocs.Hsapiens.dbSNP144.GRCh37
    • SNPlocs.Hsapiens.dbSNP144.GRCh38
  • Replaced hgnc2entrez with improved hgnc2entrez_ortohgene from
    orthogene::all_genes. Benchmarked to confirm that the latter
    increases the number of genes that can be converted.
  • Allow all functions to accept datasets/gene lists from any species.
    Now automatically converted to output_species (default: "human") using orthogene.
  • Create MAGMA files repository using various OpenGWAS datasets
    that have been munged with MungeSumstats: https://github.com/neurogenomics/MAGMA_Files_Public
    • magma_files_metadata: Built-in table of all pre-processed MAGMA files
      currently in the database.
  • Added API to search and access MAGMA files repository: import_magma_files.
  • Allow all relevant functions to take only MAGMA files as input
    (instead of requiring the GWAS summary stats); e.g. calculate_celltype_associations(magma_dir="<folder_containing_magma_files>")
    This function is also used for downloading MAGMA files in examples/unit tests.
  • Add header notation in code comments to improve code navigability.
  • Fix Roxygen notes:
    • Document @title,@description,@param, @return
      for all exported (and many internal) functions.
    • Document @examples
      for all exported (and many internal) functions.
    • Used @importFrom or requireNamespace for all imports functions.
  • Replace usage of all 1:10 syntax.
  • Reduce number of functions in NAMESPACE
  • Set all defaults consistently across all functions:
    • upstream_kb = 35
    • downstream_kb = 10
  • Allow the use of non-European populations by downloading
    population-specific LD panels from 1KG with
    get_genome_ref(population = "<population_name>")
  • Handle other CTD matrix input types by ensuring standardisation
    as dense matrices when computing quantiles/normalization.
  • Take advantage of new EWCE features in bschilder_dev branch:
    • Standardise CTD internally in all relevant functions using new
      EWCE::standardise_ctd
  • Create all-in-one functions celltype_associations_pipeline,
    which lets users specify which test they want to run with arguments, including:
    • calculate_celltype_associations (Linear mode)
    • calculate_celltype_associations (Top10% mode)
    • calculate_conditional_celltype_associations
  • Parallelise celltype_associations_pipeline across multiple cores.
  • Removed old functions whose output were not being used:
    • normalise_mean_exp
    • bin_specificityDistance_into_quantiles
    • bin_expression_into_quantiles
  • Add new function (plus tests):
    • get_driver_genes
  • Added unit tests for:
    • calculate_celltype_enrichment_limma
    • adjust_zstat_in_genesOut
    • Deprecated functions
  • Added R script to produce vignette results inst/extdata/MAGMA_Celltyping_1.0_vignette.R, and uploaded zipped folder via piggyback: MAGMA_Celltyping_1.0_results.zip
  • Added unit tests comparing old (1.0.0) vs new (>=2.0) MAGMA.Celltyping versions produce the same results; test-MAGMA_Celltyping_1.0_vs_2.0.R. Full report here.

Bug fixes

  • Removed usethis call from code.
  • Removed all library calls from code.
  • Avoid accidentally renaming columns with data.frame
  • Remove all suppressWarnings calls and resolve the underlying issues instead.
  • Add utils as Suggest.
  • Normalize paths to magma executables (to avoid path issues on WindowsOS).
  • Fixed axes in plot_celltype_associations, first reported here.
  • Fixed prepare_quantile_groups so that it's consistent with how EWCE
    compute specificity quantiles. Ensures that all celltypes (columns)
    have exactly the same number of quantiles, which was not the case before.
  • Fixed bug in ``

bschilder and others added 30 commits November 12, 2021 17:31
…magma files. Allow calculate_conditional_geneset_enrichment to run in any species an d handle version differences in magma file formats
@bschilder bschilder added the enhancement New feature or request label Jan 5, 2022
@Al-Murphy
Copy link
Collaborator

Some notes on the PR:

  • DESCRIPTION -
    • Change my role to role="ctb", as a contributor. I haven't done much work on MAGMA.Celltyping so it really shouldn't be called out like this.
    • I think you should be the maintainer of the package now, this makes the most sense since you will now understand it far better than anyone else (plus this will split EWCE and MAGMA between us which will hopefully mean less work for both of us). I think we should be consistent with our other packages and with this maintainer idea and you should put your role as role="cre" and Nathan's as role=c("cre","aut").
    • I agree that we should change the version to 2. See below for Hadley Wickham's comments on this:

    Increment major, e.g. 1.0.0, for a major release. This is best reserved for changes that are not backward compatible and that are likely to affect many users. Going from 0.b.c to 1.0.0 typically indicates that your package is feature complete with a stable API.

    • Since major changes are made here including deprecated functions so it makes sense. However, note the new version should be 2.0.0 not 2.0.1. Can you change this?
    • The call Remotes: github::NathanSkene/EWCE, please make a note to remove this once the new EWCE version goes live in Apr/May.
  • Add function documentation -
    • R/bin_expression_into_quantiles.R
    • R/bin_specificity.R
    • R/bin_specificityDistance_into_quantiles.R
    • R/check_access.R
    • R/check_celltype_names.R
    • R/check_enrichment_mode.R
    • R/check_entrez_genes.R
    • R/check_quantiles.R
    • R/create_genesets.R
    • R/decompress.R
    • R/find_GenesOut_files.R
    • R/fix_ctd.R
    • R/fix_path.R
    • R/get_actual_path.R
    • R/get_celltype_dict.R
    • R/get_example_magma_files.R
    • R/get_os.R
    • R/get_top10percent.R
    • R/github_download_files.R
    • R/import_magma_files_metadata.R
    • R/invert_dict.R - Is this necessary to have as a function, it is just a one liner?
    • R/load_rdata.R
    • R/magma_check.R
    • R/magma_check_version_match.R
    • R/magma_create_symlink.R
    • R/magma_download_binary.R
    • R/magma_executable_select.R
    • R/magma_find_executable.R
    • R/magma_installation_info.R
    • R/magma_installed.R
    • R/magma_installed_version.R
    • R/magma_links.R
    • R/magma_links_gather.R
    • R/magma_links_query.R
    • R/magma_os_suffix.R
    • R/magma_read_gsa_out.R
    • R/magma_read_sets_out.R
    • R/message_cmd.R - Is this necessary to have as a function, it is just a one liner?
    • R/messager.R
    • R/normalise_mean_exp.R
    • R/set_permissions.R
    • R/use_distance_to_add_expression_level_info.R
  • R/magma_geneset_test.r - This script should be deleted
  • R/utils.r seems to do roughly the same as R/messager.R, delete utils.r and update any calls
  • README.md -
    • I think Nathan's name should probably come last (as the PI for MAGMA.Celltyping)
    • Can you update the MungeSumstats reference to the correct link and update the authors?
    • When you submit to CRAN don't forget to add info to the README about installing from CRAN rather than Github.
  • Unit tests
    • test-calculate_conditional_geneset_enrichment.R - isn't being used, should it be?
    • test-map_snps_to_genes.r - isn't being used, should it be?
    • Add multiple tests to show that MAGMA.Celltyping 2.0 gives the same results as 1.0. I think this is a very important step so we know all the changes you made didn't change the results you will get.
    • What's the overall code coverage, it seems like further tests may be needed to cover more of the functionality? R CMD check on GHA seems to be running in 8 mins so we have a lot of time to play with which we should use to make the unit tests more robust.
  • Vignettes
    • MAGMA.Celltyping.Rmd - in # Run cell-type enrichment analyses give information on what MAGMA.Celltyping::celltype_associations_pipeline is doing at a high-level for new users. Currently only the code is given with no explanation.
    • MAGMA.Celltyping.Rmd - Same for # Plot results and the two subheadings. I get this is like the quickstart vignette but I think some more information is needed.
    • full_workflow.Rmd - Similar to other vignette I think a few lines on why you run each of the MAGMA.Celltyping functions would be really helpful. You explain the functions but then don't give the bigger picture to why you would want to use each. Both vignettes feel slightly too bare and unintuitive for a new user currently.

@bschilder
Copy link
Collaborator Author

bschilder commented Jan 7, 2022

DESCRIPTION

  • Change my role to role="ctb", as a contributor. I haven't done much work on MAGMA.Celltyping so it really shouldn't be called out like this.
  • I think you should be the maintainer of the package now, this makes the most sense since you will now understand it far better than anyone else (plus this will split EWCE and MAGMA between us which will hopefully mean less work for both of us). I think we should be consistent with our other packages and with this maintainer idea and you should put your role as role="cre" and Nathan's as role=c("cre","aut").

There can only be one "cre" (building the package throws an error otherwise). Keeping that as Nathan, since he both created it and will be the one to continue maintaining it after I leave the lab.

  • I agree that we should change the version to 2. See below for Hadley Wickham's comments on this:

    Increment major, e.g. 1.0.0, for a major release. This is best reserved for changes that are not backward compatible and that are likely to affect many users. Going from 0.b.c to 1.0.0 typically indicates that your package is feature complete with a stable API.

  • Since major changes are made here including deprecated functions so it makes sense. However, note the new version should be 2.0.0 not 2.0.1. Can you change this?
  • The call Remotes: github::NathanSkene/EWCE, please make a note to remove this once the new EWCE version goes live in Apr/May.

Noted here.

Add function documentation

Is this a requirement for CRAN, or a suggestion? Def good practice, but just trying to figure out what to prioritize

  • R/bin_expression_into_quantiles.R

  • R/bin_specificity.R

  • R/bin_specificityDistance_into_quantiles.R

  • R/check_access.R

  • R/check_celltype_names.R

  • R/check_enrichment_mode.R

  • R/check_entrez_genes.R

  • R/check_quantiles.R

  • R/create_genesets.R

  • R/decompress.R

  • R/find_GenesOut_files.R

  • R/fix_ctd.R

  • R/fix_path.R

  • R/get_actual_path.R

  • R/get_celltype_dict.R

  • R/get_example_magma_files.R

  • R/get_os.R

  • R/get_top10percent.R

  • R/github_download_files.R

  • R/import_magma_files_metadata.R

  • R/invert_dict.R - Is this necessary to have as a function, it is just a one liner?

  • R/load_rdata.R

  • R/magma_check.R

  • R/magma_check_version_match.R

  • R/magma_create_symlink.R

  • R/magma_download_binary.R

  • R/magma_executable_select.R

  • R/magma_find_executable.R

  • R/magma_installation_info.R

  • R/magma_installed.R

  • R/magma_installed_version.R

  • R/magma_links.R

  • R/magma_links_gather.R

  • R/magma_links_query.R

  • R/magma_os_suffix.R

  • R/magma_read_gsa_out.R

  • R/magma_read_sets_out.R

  • R/messager.R

  • R/normalise_mean_exp.R

  • R/set_permissions.R

  • R/use_distance_to_add_expression_level_info.R

  • R/magma_geneset_test.r - This script should be deleted

  • R/utils.r seems to do roughly the same as R/messager.R, delete utils.r and update any calls

  • R/message_cmd.R - Is this necessary to have as a function, it is just a one liner?

As a rule of mine, any bit of code you use more than once should be a function (even small ones). That way it is always consistent across usage.

README.md

  • I think Nathan's name should probably come last (as the PI for MAGMA.Celltyping)
  • Can you update the MungeSumstats reference to the correct link and update the authors?
  • When you submit to CRAN don't forget to add info to the README about installing from CRAN rather than Github.

Unit tests

  • test-calculate_conditional_geneset_enrichment.R - isn't being used, should it be?

Still trying to figure out this function. Need to have a discussion with @NathanSkene about this.

  • test-map_snps_to_genes.r - isn't being used, should it be?

Wrote this and then realized it takes too long to run. Keeping in case we decide to use it later, and also just to have some means of checking whether it works (even if manually).

  • Add multiple tests to show that MAGMA.Celltyping 2.0 gives the same results as 1.0. I think this is a very important step so we know all the changes you made didn't change the results you will get.
  • What's the overall code coverage, it seems like further tests may be needed to cover more of the functionality? R CMD check on GHA seems to be running in 8 mins so we have a lot of time to play with which we should use to make the unit tests more robust.

40% currently. Getting this up is def a longer-term goal of mine, but I can't sink too much time into this atm. Also, some tests take an extremely long time (thus why i hashed out test-map_snps_to_genes). Added Issue here.

Vignettes

  • MAGMA.Celltyping.Rmd - in # Run cell-type enrichment analyses give information on what MAGMA.Celltyping::celltype_associations_pipeline is doing at a high-level for new users. Currently only the code is given with no explanation.
  • MAGMA.Celltyping.Rmd - Same for # Plot results and the two subheadings. I get this is like the quickstart vignette but I think some more information is needed.
  • full_workflow.Rmd - Similar to other vignette I think a few lines on why you run each of the MAGMA.Celltyping functions would be really helpful. You explain the functions but then don't give the bigger picture to why you would want to use each. Both vignettes feel slightly too bare and unintuitive for a new user currently.

@Al-Murphy
Copy link
Collaborator

Just on "There can only be one "cre" (building the package throws an error otherwise). Keeping that as Nathan, since he both created it and will be the one to continue maintaining it after I leave the lab." - The bioconductor standard is that the maintainer should be the "cre" so that should be you. I would put Nathan as the aut only since having two cre throws an error. This is consistent with the labs other packages so I think we should follow it here too (I'm down as cre for EWCE/MungeSumstats). I know you will leave the lab after your PhD but that is a while away yet so I think we can update the cre when the time comes!

@bschilder bschilder added the bug Something isn't working label Jan 24, 2022
@bschilder bschilder merged commit a466ba2 into master Jan 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants