Add sophia script #1105

Merged

Contributor

@callachennault callachennault commented Jan 17, 2024

  • WIP: Need to complete testing to make sure this produces the same results as the previous script

Adds Sophia cohort generation script to import-scripts directory. Note that this script and the update-az-mskimpact.sh script have some minor overlap (report_error, rename_files_in_delivery_directory, add_metadata_headers) and should be generalized.
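One way the overlapping functions could be generalized is a small helper file that both scripts source. The sketch below is only illustrative: the file name delivery-functions.sh, the function rename_files_with_prefix, and both function bodies are assumptions, not the code in this PR or in update-az-mskimpact.sh.

```shell
#!/usr/bin/env bash
# Sketch of a shared helper file (hypothetical name: delivery-functions.sh).
# Function bodies are illustrative assumptions, not the code from this PR.

# Log an error to stderr and return non-zero so the caller can decide to exit.
report_error() {
    local message="$1"
    echo "ERROR: ${message}" >&2
    return 1
}

# Rename files in a directory by replacing a leading prefix in each file name.
# Arguments: directory, old prefix, new prefix (may be empty).
rename_files_with_prefix() {
    local dir="$1" old_prefix="$2" new_prefix="$3"
    local path name
    for path in "$dir"/"$old_prefix"*; do
        [ -e "$path" ] || continue   # the glob matched nothing
        name="$(basename "$path")"
        mv "$path" "$dir/${new_prefix}${name#"$old_prefix"}"
    done
}
```

Each script would then load the helpers with `source delivery-functions.sh` and keep only its cohort-specific logic.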

export SOPHIA_MSKIMPACT_STABLE_ID="sophia_mskimpact"
export SOPHIA_COHORT_HOME="$SOPHIA_DATA_HOME/deliveries/$COHORT_NAME"
export SOPHIA_MSK_IMPACT_DATA_HOME="$SOPHIA_COHORT_HOME/$SOPHIA_MSKIMPACT_STABLE_ID"
export SOPHIA_TMPDIR="$SOPHIA_COHORT_HOME/tmp"
Collaborator

So, if I understand right:

sophia_portal_data/deliveries/-/sophia_mskimpact

Is there anything parallel to deliveries?

Also, for the tmp directory, let's use something on the tmp volume. I think we autogenerate a tmp directory as part of the script. If that's here, will we still need the sophia_mskimpact directory?
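A minimal sketch of the autogenerated tmp directory idea, assuming the tmp volume is reachable through TMPDIR (falling back to /tmp). The directory name pattern and the cleanup function are illustrative and not taken from the script in this PR.

```shell
#!/usr/bin/env bash
# Assumption: TMPDIR (or /tmp) points at the tmp volume mentioned above.
tmp_root="${TMPDIR:-/tmp}"

# mktemp -d creates a uniquely named directory and prints its path.
SOPHIA_TMPDIR="$(mktemp -d "${tmp_root}/sophia_cohort.XXXXXX")"

# Remove the scratch directory when the script exits, even on failure.
cleanup() {
    rm -rf "$SOPHIA_TMPDIR"
}
trap cleanup EXIT

# Intermediate files go here instead of under the delivery directory.
echo "intermediate output" > "$SOPHIA_TMPDIR/scratch.txt"
```

With this approach, SOPHIA_TMPDIR no longer needs to live under SOPHIA_COHORT_HOME, so nothing temporary is left behind in the delivery tree.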

Collaborator

@averyniceday averyniceday Jan 29, 2024

Was thinking something closer to

cbio-portal-data/delivery-pipelines/<recipient e.g. sophia>/<cancer-type-date>
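The proposed layout could be built with a small helper like the one below. This is a sketch under assumptions: the function name delivery_dir is hypothetical, and the MM.DD.YY date format is inferred from the sophia-lung-data-01.28.24 example elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Build <base>/delivery-pipelines/<recipient>/<cancer-type>-<date>.
# Arguments: base directory, recipient, cancer type, date string.
delivery_dir() {
    local base="$1" recipient="$2" cancer_type="$3" delivery_date="$4"
    echo "${base}/delivery-pipelines/${recipient}/${cancer_type}-${delivery_date}"
}

# Example call; the date format is an assumption (MM.DD.YY).
delivery_dir "/data/portal-cron/cbio-portal-data" "sophia" "lung" "$(date +%m.%d.%y)"
```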

Contributor Author

@callachennault callachennault Jan 29, 2024

Right now /data/portal-cron/cbio-portal-data/sophia-data contains these directories/files:

  • cohort_ids/: contains files with list of IDs for each cohort
  • deliveries/: contains a directory for each cancer type cohort and date (in case we have to make changes or deliver multiple versions). For example, a subdirectory would be sophia-lung-data-01.28.24, which contains README.pdf, a gene_panels/ directory, and a sophia-mskimpact/ directory containing study files. This structure was copied from the az repo. Since sophia doesn't import these files to a cbio instance, it's probably not strictly necessary; I just thought it was a good idea to keep the same structure
  • gene_panels/: gene panel files that are packaged into each delivery
  • logs/: log files from cohort generation
  • README.pdf: readme file describing files, is packaged into each delivery
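As a rough illustration of how one delivery directory follows from this layout, here is a sketch. The function name assemble_delivery is hypothetical and this is not the script's actual packaging logic; it only mirrors the directory names listed above.

```shell
#!/usr/bin/env bash
# Assemble one delivery directory from the layout described above.
# Arguments: sophia-data root, cancer type, date string (e.g. 01.28.24).
assemble_delivery() {
    local root="$1" cancer_type="$2" delivery_date="$3"
    local delivery="${root}/deliveries/sophia-${cancer_type}-data-${delivery_date}"

    # Study files would be written into sophia-mskimpact/ by the cohort script.
    mkdir -p "${delivery}/sophia-mskimpact" || return 1

    # README.pdf and gene_panels/ are shared inputs packaged into every delivery.
    cp "${root}/README.pdf" "${delivery}/" || return 1
    cp -r "${root}/gene_panels" "${delivery}/" || return 1

    echo "$delivery"
}
```

Calling `assemble_delivery /data/portal-cron/cbio-portal-data/sophia-data lung 01.28.24` would print the path of the new delivery directory.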

If you want to talk through this setup, @averyniceday, let me know. I'm totally open to opinions.

@averyniceday averyniceday merged commit eab94cd into knowledgesystems:master Feb 2, 2024
2 checks passed
@callachennault callachennault deleted the add-sophia-script branch February 2, 2024 15:33
sheridancbio pushed a commit to sheridancbio/cmo-pipelines that referenced this pull request Feb 5, 2024
* Add sophia script

* rename transpose_cna file

* Add filter-clinical-arg-functions script

* Add az var to correct automation environment

* Add correct path to transpose_cna script

* Call seq_date function

* Add seq_date before filtering columns

* syntax fix

* Fix call to filter out clinical attribute columns

* Fix nonsigned out file path

* Automate folder name

* directory fixes

* remove quotes?

* change date formatting

* output filepath for duplicate variants script

* use az_msk_impact_data_home var

* move sophia_data_home to automation environment

* Add comments

* Change dir structures in sophia script to match new repo structure

* Add git operations

* Remove test file

* Fix dirs for sophia zip command

* remove quotes

* Zip files before cleanup

* move zip step before git push
sheridancbio pushed a commit to sheridancbio/cmo-pipelines that referenced this pull request Feb 9, 2024
sheridancbio pushed a commit to mandawilson/cmo-pipelines that referenced this pull request Mar 27, 2024
author Manda Wilson <1458628+mandawilson@users.noreply.github.com> 1703199176 -0500
committer Robert Sheridan <sheridan@cbio.mskcc.org> 1711560265 -0400

upgrade to java 21

switch to genome-nexus-annotation-pipeline that uses new maf repo

updated to spring 6, spring batch 5, spring boot 3 to match cbioportal

fix typos

Updates to AZ-MSKIMPACT to integrate with CDM (knowledgesystems#1098)

Fix bug in checking for duplicate Mutation Records (knowledgesystems#1099)

* Check if mutationRecord is duplicated before annotating

* Populate mutationMap in loadMutationRecordsFromJson

* add addRecordToMap

* Remove comments, add local vars for debugging

* Remove duplicate MAF variants for AZ

* Fix remove-duplicate-maf-variants call

* revert whitespace change

updates for migrating darwin and crdb to java11 (knowledgesystems#1080)

pom changes for pulling moved dependencies
changes to java args to silence warnings

Co-authored-by: cbioportal import user <cbioportal_importer@pipelines.cbioportal.mskcc.org>

Remove Annotated MAF before Import (knowledgesystems#958)

* remove annotated MAF to prevent duplicate

* Update subset_and_merge_crdb_pdx_studies.py

---------

Co-authored-by: Avery Wang <averyjwang@gmail.com>

Script to combine arbitrary files (knowledgesystems#1104)

* Script to combine arbitrary files

* Modify unit tests to work with script changes

* Remove unnecessary column specifier

* Fix syntax bug

Add sophia script (knowledgesystems#1105)

Add script for merging Dremio/SMILE into cmo-access (knowledgesystems#1102)

- adds cfdna clinical and timeline data from dremio/SMILE
- converts patient identifiers using "dmp over cmo" identifier logic from dremio
- dremio patient id mapping table export code called to produce mapping table
- main script then calls update_cfdna_clinical_sample_patient_ids_via_dremio.sh
- merge.py used to combine clinical data from dremio with clinical data from cmo-access
- metadata headers added using new script : merge_clinical_metadata_headers_py3.py
- other import process flow (similar to other import scripts) followed
- error detection step added after debugging for sporadic data loss in results

Co-authored-by: Manda Wilson <1458628+mandawilson@users.noreply.github.com>

Modify preconsume script to work on one cohort at a time (knowledgesystems#1107)

Call correct function name

add options for logging in for different accounts

Preconsume archer-solid-cv4 and add fetch loop (knowledgesystems#1129)

* Handle archer-solid-cv4 samples
* Add loop
* move each cohort to its own dir and fix filename

switch to genome-nexus-annotation-pipeline that uses new maf repo

use updated genome-nexus-annotation-pipeline

update version of cmo-pipelines to 1.0.0

Convert BatchConfiguration to new Spring Batch format

drop unneeded dependency from redcap

removed gdd, updated crdb and ddp batch configs to spring batch 5

removed commons-lang

start of converting cvr to spring batch 5

fix cvr fetcher BatchConfiguration

fixed redcap pipeline spring batch 5 configuration

make spring-batch-integration match batch version

Co-authored-by: Manda Wilson <1458628+mandawilson@users.noreply.github.com>

drop darwin fetcher (and docs/scripts)
mandawilson pushed a commit to mandawilson/cmo-pipelines that referenced this pull request Mar 27, 2024