`MAGMA.Celltyping`: bschilder_dev upgrade #93

bschilder · 2022-01-05T13:21:48Z

MAGMA.Celltyping 2.0.0

MAJOR UPGRADE: MAGMA.Celltyping was revamped to meet CRAN standards,
automatically install MAGMA, and take any species as input.

New features

Added a NEWS.md file to track changes to the package.
Automatically install MAGMA with new magma_install function;
stores binaries in MAGMA.Celltyping-specific cache dir. Added various support functions to make this possible and ensure correct version is being used.
Added magma_uninstall function to remove one or all MAGMA binaries.
Allow MAGMA.Celltyping to install even if MAGMA is not installed. Instead,
check at the beginning of functions that require MAGMA using magma_check.
- magma_links_stored: Include built-in metadata with links to all MAGMA
  versions with parsed version numbers, OS, and which is the latest version.
Call MAGMA commands with magma_run which finds the requested version of
MAGMA and uses it.
Print readable, user-friendly MAGMA commands when being run through
magma_cmd function.
Added unit tests.
Create hex sticker
New function: get_sub_SNP_LOC_DATA
Formally deprecated functions using .Deprecated function and removing all other internal code:
- get_genomebuild_for_sumstats
- build_snp_location_tables
- format.sumstats.for.magma
- format_sumstats_for_magma_macOnly
- standardise.sumstats.column.headers
- standardise.sumstats.column.headers.crossplatform
Removed sumstatsColHeaders from data, as it was only used in now-deprecated
functions.
Renamed all functions with "." to "_" to meet coding standards.
Renamed functions to be more concise and avoid issues with
test file names being too long:
- calculate.celltype.enrichment.probabilities.wtLimma --> calculate_celltype_enrichment_limma
- calculate.conditional.celltype.enrichment.probabilities.wtLimma --> calculate_conditional_celltype_enrichment_limma
Removed all large data/ to GitHub Releases, now accessible with dedicated
piggyback-based functions:
- get_ctd: CellTypeDatasets
- get_example_gwas: GWAS summary stats
- get_genomeLocFile: NCBI gene coordinate references.
Create example full GWAS summary stats (both unfiltered and filtered + munged with MungeSumstats). Accessed by get_example_gwas.
- "prospective_memory"
- "fluid_intelligence"
- "educational_attainment"
Updated vignettes:
- Created concise Getting started vignette.
- Updated origiinal vignette and turned into full_workflow vignette.
Made certain functions run automatically internally,
instead of having the user run them:
- get_genome_ref
- prepare_quantile_groups
Remove unnecessary dependencies:
- reshape
- cowplot
- SNPlocs.Hsapiens.dbSNP144.GRCh37
- SNPlocs.Hsapiens.dbSNP144.GRCh38
Replaced hgnc2entrez with improved hgnc2entrez_ortohgene from
orthogene::all_genes. Benchmarked to confirm that the latter
increases the number of genes that can be converted.
Allow all functions to accept datasets/gene lists from any species.
Now automatically converted to output_species (default: "human") using orthogene.
Create MAGMA files repository using various OpenGWAS datasets
that have been munged with MungeSumstats: https://github.com/neurogenomics/MAGMA_Files_Public
- magma_files_metadata: Built-in table of all pre-processed MAGMA files
  currently in the database.
Added API to search and access MAGMA files repository: import_magma_files.
Allow all relevant functions to take only MAGMA files as input
(instead of requiring the GWAS summary stats); e.g. calculate_celltype_associations(magma_dir="<folder_containing_magma_files>")
This function is also used for downloading MAGMA files in examples/unit tests.
Add header notation in code comments to improve code navigability.
Fix Roxygen notes:
- Document @title,@description,@param, @return
  for all exported (and many internal) functions.
- Document @examples
  for all exported (and many internal) functions.
- Used @importFrom or requireNamespace for all imports functions.
Replace usage of all 1:10 syntax.
Reduce number of functions in NAMESPACE
Set all defaults consistently across all functions:
- upstream_kb = 35
- downstream_kb = 10
Allow the use of non-European populations by downloading
population-specific LD panels from 1KG with
get_genome_ref(population = "<population_name>")
Handle other CTD matrix input types by ensuring standardisation
as dense matrices when computing quantiles/normalization.
Take advantage of new EWCE features in bschilder_dev branch:
- Standardise CTD internally in all relevant functions using new
  EWCE::standardise_ctd
Create all-in-one functions celltype_associations_pipeline,
which lets users specify which test they want to run with arguments, including:
- calculate_celltype_associations (Linear mode)
- calculate_celltype_associations (Top10% mode)
- calculate_conditional_celltype_associations
Parallelise celltype_associations_pipeline across multiple cores.
Removed old functions whose output were not being used:
- normalise_mean_exp
- bin_specificityDistance_into_quantiles
- bin_expression_into_quantiles
Add new function (plus tests):
- get_driver_genes
Added unit tests for:
- calculate_celltype_enrichment_limma
- adjust_zstat_in_genesOut
- Deprecated functions
Added R script to produce vignette results inst/extdata/MAGMA_Celltyping_1.0_vignette.R, and uploaded zipped folder via piggyback: MAGMA_Celltyping_1.0_results.zip
Added unit tests comparing old (1.0.0) vs new (>=2.0) MAGMA.Celltyping versions produce the same results; test-MAGMA_Celltyping_1.0_vs_2.0.R. Full report here.

Bug fixes

Removed usethis call from code.
Removed all library calls from code.
Avoid accidentally renaming columns with data.frame
Remove all suppressWarnings calls and resolve the underlying issues instead.
Add utils as Suggest.
Normalize paths to magma executables (to avoid path issues on WindowsOS).
Fixed axes in plot_celltype_associations, first reported here.
Fixed prepare_quantile_groups so that it's consistent with how EWCE
compute specificity quantiles. Ensures that all celltypes (columns)
have exactly the same number of quantiles, which was not the case before.
Fixed bug in ``

…magma files. Allow calculate_conditional_geneset_enrichment to run in any species an d handle version differences in magma file formats

…gs to resolve Windows GHA error.

…alize magma_x paths for Windows

…version

Al-Murphy · 2022-01-07T13:09:10Z

Some notes on the PR:

DESCRIPTION -
- Change my role to role="ctb", as a contributor. I haven't done much work on MAGMA.Celltyping so it really shouldn't be called out like this.
- I think you should be the maintainer of the package now, this makes the most sense since you will now understand it far better than anyone else (plus this will split EWCE and MAGMA between us which will hopefully mean less work for both of us). I think we should be consistent with our other packages and with this maintainer idea and you should put your role as role="cre" and Nathan's as role=c("cre","aut").
- I agree that we should change the version to 2. See below for Hadley Wickham's comments on this:
Increment major, e.g. 1.0.0, for a major release. This is best reserved for changes that are not backward compatible and that are likely to affect many users. Going from 0.b.c to 1.0.0 typically indicates that your package is feature complete with a stable API.
- Since major changes are made here including deprecated functions so it makes sense. However, note the new version should be 2.0.0 not 2.0.1. Can you change this?
- The call Remotes: github::NathanSkene/EWCE, please make a note to remove this once the new EWCE version goes live in Apr/May.
Add function documentation -
- R/bin_expression_into_quantiles.R
- R/bin_specificity.R
- R/bin_specificityDistance_into_quantiles.R
- R/check_access.R
- R/check_celltype_names.R
- R/check_enrichment_mode.R
- R/check_entrez_genes.R
- R/check_quantiles.R
- R/create_genesets.R
- R/decompress.R
- R/find_GenesOut_files.R
- R/fix_ctd.R
- R/fix_path.R
- R/get_actual_path.R
- R/get_celltype_dict.R
- R/get_example_magma_files.R
- R/get_os.R
- R/get_top10percent.R
- R/github_download_files.R
- R/import_magma_files_metadata.R
- R/invert_dict.R - Is this necessary to have as a function, it is just a one liner?
- R/load_rdata.R
- R/magma_check.R
- R/magma_check_version_match.R
- R/magma_create_symlink.R
- R/magma_download_binary.R
- R/magma_executable_select.R
- R/magma_find_executable.R
- R/magma_installation_info.R
- R/magma_installed.R
- R/magma_installed_version.R
- R/magma_links.R
- R/magma_links_gather.R
- R/magma_links_query.R
- R/magma_os_suffix.R
- R/magma_read_gsa_out.R
- R/magma_read_sets_out.R
- R/message_cmd.R - Is this necessary to have as a function, it is just a one liner?
- R/messager.R
- R/normalise_mean_exp.R
- R/set_permissions.R
- R/use_distance_to_add_expression_level_info.R
R/magma_geneset_test.r - This script should be deleted
R/utils.r seems to do roughly the same as R/messager.R, delete utils.r and update any calls
README.md -
- I think Nathan's name should probably come last (as the PI for MAGMA.Celltyping)
- Can you update the MungeSumstats reference to the correct link and update the authors?
- When you submit to CRAN don't forget to add info to the README about installing from CRAN rather than Github.
Unit tests
- test-calculate_conditional_geneset_enrichment.R - isn't being used, should it be?
- test-map_snps_to_genes.r - isn't being used, should it be?
- Add multiple tests to show that MAGMA.Celltyping 2.0 gives the same results as 1.0. I think this is a very important step so we know all the changes you made didn't change the results you will get.
- What's the overall code coverage, it seems like further tests may be needed to cover more of the functionality? R CMD check on GHA seems to be running in 8 mins so we have a lot of time to play with which we should use to make the unit tests more robust.
Vignettes
- MAGMA.Celltyping.Rmd - in # Run cell-type enrichment analyses give information on what MAGMA.Celltyping::celltype_associations_pipeline is doing at a high-level for new users. Currently only the code is given with no explanation.
- MAGMA.Celltyping.Rmd - Same for # Plot results and the two subheadings. I get this is like the quickstart vignette but I think some more information is needed.
- full_workflow.Rmd - Similar to other vignette I think a few lines on why you run each of the MAGMA.Celltyping functions would be really helpful. You explain the functions but then don't give the bigger picture to why you would want to use each. Both vignettes feel slightly too bare and unintuitive for a new user currently.

bschilder · 2022-01-07T14:18:31Z

DESCRIPTION

Change my role to role="ctb", as a contributor. I haven't done much work on MAGMA.Celltyping so it really shouldn't be called out like this.
I think you should be the maintainer of the package now, this makes the most sense since you will now understand it far better than anyone else (plus this will split EWCE and MAGMA between us which will hopefully mean less work for both of us). I think we should be consistent with our other packages and with this maintainer idea and you should put your role as role="cre" and Nathan's as role=c("cre","aut").

There can only be one "cre" (building the package throws an error otherwise). Keeping that as Nathan, since he both created it and will be the one to continue maintaining it after I leave the lab.

I agree that we should change the version to 2. See below for Hadley Wickham's comments on this:

Increment major, e.g. 1.0.0, for a major release. This is best reserved for changes that are not backward compatible and that are likely to affect many users. Going from 0.b.c to 1.0.0 typically indicates that your package is feature complete with a stable API.
Since major changes are made here including deprecated functions so it makes sense. However, note the new version should be 2.0.0 not 2.0.1. Can you change this?
The call Remotes: github::NathanSkene/EWCE, please make a note to remove this once the new EWCE version goes live in Apr/May.

Noted here.

Add function documentation

Is this a requirement for CRAN, or a suggestion? Def good practice, but just trying to figure out what to prioritize

As a rule of mine, any bit of code you use more than once should be a function (even small ones). That way it is always consistent across usage.

README.md

I think Nathan's name should probably come last (as the PI for MAGMA.Celltyping)
Can you update the MungeSumstats reference to the correct link and update the authors?
When you submit to CRAN don't forget to add info to the README about installing from CRAN rather than Github.

Unit tests

test-calculate_conditional_geneset_enrichment.R - isn't being used, should it be?

Still trying to figure out this function. Need to have a discussion with @NathanSkene about this.

test-map_snps_to_genes.r - isn't being used, should it be?

Wrote this and then realized it takes too long to run. Keeping in case we decide to use it later, and also just to have some means of checking whether it works (even if manually).

Add multiple tests to show that MAGMA.Celltyping 2.0 gives the same results as 1.0. I think this is a very important step so we know all the changes you made didn't change the results you will get.
What's the overall code coverage, it seems like further tests may be needed to cover more of the functionality? R CMD check on GHA seems to be running in 8 mins so we have a lot of time to play with which we should use to make the unit tests more robust.

40% currently. Getting this up is def a longer-term goal of mine, but I can't sink too much time into this atm. Also, some tests take an extremely long time (thus why i hashed out test-map_snps_to_genes). Added Issue here.

Vignettes

MAGMA.Celltyping.Rmd - in # Run cell-type enrichment analyses give information on what MAGMA.Celltyping::celltype_associations_pipeline is doing at a high-level for new users. Currently only the code is given with no explanation.
MAGMA.Celltyping.Rmd - Same for # Plot results and the two subheadings. I get this is like the quickstart vignette but I think some more information is needed.
full_workflow.Rmd - Similar to other vignette I think a few lines on why you run each of the MAGMA.Celltyping functions would be really helpful. You explain the functions but then don't give the bigger picture to why you would want to use each. Both vignettes feel slightly too bare and unintuitive for a new user currently.

Al-Murphy · 2022-01-07T14:29:01Z

Just on "There can only be one "cre" (building the package throws an error otherwise). Keeping that as Nathan, since he both created it and will be the one to continue maintaining it after I leave the lab." - The bioconductor standard is that the maintainer should be the "cre" so that should be you. I would put Nathan as the aut only since having two cre throws an error. This is consistent with the labs other packages so I think we should follow it here too (I'm down as cre for EWCE/MungeSumstats). I know you will leave the lab after your PhD but that is a while away yet so I think we can update the cre when the time comes!

…entation

bschilder and others added 30 commits November 12, 2021 17:31

Upload reduced GWAS examples. Allow non-eur pops in get_genome_ref

91d042d

update documentation

b3139b9

Fix install_magma bugs

03355b9

Fix install_magma bugs

6c0a4f8

Remove extra magma executables

75bea7b

Fix magma_install

baca60a

Fix magma_install

16e4b65

Fix magma_install

132887c

Automatically find valid installation directory

b43dd61

Automatically find valid installation directory

4800df1

Automatically find valid installation directory

edb4e5f

Make symlink creation more robust

7f64742

fix magma

a2487e5

Remove magma exec

074b534

rename magma_install

bfae5fd

Upgrade to EWCEto EWCE 1.3.2. Fix entrez IDs

838ea30

Allow calculate[_conditional]_celltype_associations to run with only …

90c2258

…magma files. Allow calculate_conditional_geneset_enrichment to run in any species an d handle version differences in magma file formats

Remove NCBI files. Use get_genomeLocFile

2e3b477

Add GHA workflows

a62cc99

Specify EWCE bschilder_dev version

92cb6ff

Adjust GHA to install rmarkdown

ba5e098

Found missing ) typo in GHA

f0c11ba

Change MAGMA install dir to cache

e85b691

Add NEWS.md

2475b72

Add backup data for magma_links_stored. Ensure system calls are strin…

c18962c

…gs to resolve Windows GHA error.

Update NEWS

d09aabe

Add utils as dep, add magma unit tests

eb7069a

Add error handling for rmarkdown heatmap to avoid polygon error. Norm…

499fcc9

…alize magma_x paths for Windows

only normalizePath on Windows

ea732e9

Make MAGMA install/uninstall more robust

d7038e0

bschilder and others added 18 commits December 16, 2021 18:27

Fix tests. Update MAGMA installation info in REEADME

f621b22

Ensure MAGMA doesnt get constantly reinstaled

d55f4d0

Save MAGMA_Files_Public metadata

559d0c2

With with /nocache

fc1b1f4

Set zipped magma folder permissions

d19db79

Add badger as sugget

81fa7f7

Use sorted magma_links_stored metadata to avoid installing the wrong …

77ac859

…version

Fix wrong magma version install. \nocache

f3663f9

Fix magma_install test

9b40571

Replace noramlizePath with path.expand \nocache

5233856

Switch bioconductor_docker to latest while devel version is being fixed

aba6162

Switch bioconductor_docker to latest while devel version is being fixed

ceab405

Use remote version of EWCE until bioc gets updated

6cadc08

Add Windows-specific file permission function

a77d57d

Windows mods [skip ci]

e2e285c

Windows mods [skip ci]

ba349eb

Check on Mac [skip ci]

503ee77

All tests passing on Windows

8637fb9

bschilder added the enhancement New feature or request label Jan 5, 2022

bschilder requested a review from Al-Murphy January 5, 2022 13:21

bschilder added 5 commits January 7, 2022 15:49

Added all of Alan's PR suggestions except the internal function docum…

770bd36

…entation

Document internal functions. Delete unused functions. Update MAGMA links

87cd35f

Fix conditional tests. Add v1.0 vs. v2.0 tests

9b0f3cc

Add fix_path to calculate_conditional_geneset_enrichment

3ed7507

Add get_actual_path

bc23b54

bschilder added the bug Something isn't working label Jan 24, 2022

bschilder merged commit a466ba2 into master Jan 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`MAGMA.Celltyping`: bschilder_dev upgrade #93

`MAGMA.Celltyping`: bschilder_dev upgrade #93

bschilder commented Jan 5, 2022 •

edited

Loading

Al-Murphy commented Jan 7, 2022

bschilder commented Jan 7, 2022 •

edited

Loading

Al-Murphy commented Jan 7, 2022

MAGMA.Celltyping: bschilder_dev upgrade #93

MAGMA.Celltyping: bschilder_dev upgrade #93

Conversation

bschilder commented Jan 5, 2022 • edited Loading

MAGMA.Celltyping 2.0.0

New features

Bug fixes

Al-Murphy commented Jan 7, 2022

bschilder commented Jan 7, 2022 • edited Loading

DESCRIPTION

Add function documentation

README.md

Unit tests

Vignettes

Al-Murphy commented Jan 7, 2022

`MAGMA.Celltyping`: bschilder_dev upgrade #93

`MAGMA.Celltyping`: bschilder_dev upgrade #93

bschilder commented Jan 5, 2022 •

edited

Loading

bschilder commented Jan 7, 2022 •

edited

Loading