all_scripts.rst
Sequencing run processing

Metadata registration

Usage

find_and_register_project_metdata.py
[-h] -p PROJET_INFO_PATH -d DBCONFIG -t USER_ACCOUNT_TEMPLATE -n SLACK_CONFIG -u HPC_USER -a HPC_ADDRESS -l LDAP_SERVER [-s] [-c] [-i] [-m]

Parameters

-h, --help : Show this help message and exit
-p, --projet_info_path
 : Project metadata directory path
-d, --dbconfig : Database configuration file path
-t, --user_account_template
 : User account information email template file path
-s, --log_slack
 : Toggle slack logging
-n, --slack_config
 : Slack configuration file path
-c, --check_hpc_user
 : Toggle HPC user checking
-u, --hpc_user : HPC user name for LDAP server checking
-a, --hpc_address
 : HPC address for LDAP server checking
-l, --ldap_server
 : LDAP server address
-i, --setup_irods
 : Setup iRODS account for user
-m, --notify_user
 : Notify user about new account and password
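Nearly every script in this document takes a `-d/--dbconfig` JSON file. The document itself does not define that file's format; the key names below are hypothetical, and the sketch only illustrates the general shape of writing and reading such a configuration file:

```python
import json

# Hypothetical dbconfig keys -- the real key names are defined by the
# IGF pipeline's database adaptor, not by this document.
dbconfig = {
    "dbhost": "localhost",
    "dbname": "igfdb",
    "dbuser": "igf_user",
    "dbpass": "change-me",
    "driver": "mysql",
}

# Write the configuration file that would be passed via --dbconfig
with open("dbconfig.json", "w") as fh:
    json.dump(dbconfig, fh, indent=2)

# A script would load it back the same way:
with open("dbconfig.json") as fh:
    loaded = json.load(fh)
```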

Monitor sequencing run for demultiplexing

Usage

find_new_seqrun_and_prepare_md5.py
[-h] -p SEQRUN_PATH -m MD5_PATH -d DBCONFIG_PATH -s SLACK_CONFIG -a ASANA_CONFIG -i ASANA_PROJECT_ID -n PIPELINE_NAME -j SAMPLESHEET_JSON_SCHEMA [-e EXCLUDE_PATH]

Parameters

-h, --help : show this help message and exit
-p, --seqrun_path SEQRUN_PATH
 : Seqrun directory path
-m, --md5_path MD5_PATH
 : Seqrun md5 output dir
-d, --dbconfig_path DBCONFIG_PATH
 : Database configuration json file
-s, --slack_config SLACK_CONFIG
 : Slack configuration json file
-a, --asana_config ASANA_CONFIG
 : Asana configuration json file
-i, --asana_project_id ASANA_PROJECT_ID
 : Asana project id
-n, --pipeline_name PIPELINE_NAME
 : IGF pipeline name
-j, --samplesheet_json_schema SAMPLESHEET_JSON_SCHEMA
 : JSON schema for samplesheet validation
-e, --exclude_path EXCLUDE_PATH
 : List of sub directories excluded from the search

Switch off project barcode checking

Usage

mark_project_barcode_check_off.py
[-h] -p PROJET_ID_LIST -d DBCONFIG [-s] -n SLACK_CONFIG

Parameters

-h, --help : show this help message and exit
-p, --projet_id_list PROJET_ID_LIST
 : A file path listing project_igf_id
-d, --dbconfig DBCONFIG
 : Database configuration file path
-s, --log_slack
 : Toggle slack logging
-n, --slack_config SLACK_CONFIG
 : Slack configuration file path

Accept modified samplesheet for demultiplexing run

Usage

reset_samplesheet_for_pipeline.py
[-h] -p SEQRUN_PATH -d DBCONFIG -n SLACK_CONFIG -a ASANA_CONFIG -i ASANA_PROJECT_ID -f INPUT_LIST

Parameters

-h, --help : show this help message and exit
-p, --seqrun_path SEQRUN_PATH
 : Sequencing run directory path
-d, --dbconfig DBCONFIG
 : Database configuration file path
-n, --slack_config SLACK_CONFIG
 : Slack configuration file path
-a, --asana_config ASANA_CONFIG
 : Asana configuration file path
-i, --asana_project_id ASANA_PROJECT_ID
 : Asana project id
-f, --input_list INPUT_LIST
 : Sequencing run id list file

Copy files to temp directory for demultiplexing run

Usage

moveFilesForDemultiplexing.py
[-h] -i INPUT_DIR -o OUTPUT_DIR -s SAMPLESHEET_FILE -r RUNINFO_FILE

Parameters

-h, --help : show this help message and exit
-i, --input_dir INPUT_DIR
 : Input files directory
-o, --output_dir OUTPUT_DIR
 : Output files directory
-s, --samplesheet_file SAMPLESHEET_FILE
 : Illumina format samplesheet file
-r, --runinfo_file RUNINFO_FILE
 : Illumina format RunInfo.xml file

Transfer metadata to experiment from sample entries

Usage

update_experiment_metadata_from_sample_attribute.py [-h] -d DBCONFIG -n SLACK_CONFIG

Parameters

-h, --help : show this help message and exit
-d, --dbconfig DBCONFIG
 : Database configuration file path
-n, --slack_config SLACK_CONFIG
 : Slack configuration file path

Pipeline control

Reset pipeline for data processing

Usage

batch_modify_pipeline_seed.py [-h] -t TABLE_NAME -p PIPELINE_NAME
-s SEED_STATUS -d DBCONFIG -n SLACK_CONFIG -a ASANA_CONFIG -i ASANA_PROJECT_ID -f INPUT_LIST

Parameters

-h, --help : show this help message and exit
-t, --table_name TABLE_NAME
 : Table name for igf id lookup
-p, --pipeline_name PIPELINE_NAME
 : Pipeline name for seed modification
-s, --seed_status SEED_STATUS
 : New seed status for pipeline_seed table
-d, --dbconfig DBCONFIG
 : Database configuration file path
-n, --slack_config SLACK_CONFIG
 : Slack configuration file path
-a, --asana_config ASANA_CONFIG
 : Asana configuration file path
-i, --asana_project_id ASANA_PROJECT_ID
 : Asana project id
-f, --input_list INPUT_LIST
 : IGF id list file

Samplesheet processing

Divide samplesheet data

Usage

divide_samplesheet.py
[-h] -i SAMPLESHEET_FILE -d OUTPUT_DIR [-p]

Parameters

-h, --help : show this help message and exit

-i, --samplesheet_file SAMPLESHEET_FILE
 : Illumina format samplesheet file
-d, --output_dir OUTPUT_DIR
 : Output directory for writing samplesheet file
-p, --print_stats
 : Print available stats for the samplesheet and exit

Reformat samplesheet for demultiplexing

Usage

reformatSampleSheet.py
[-h] -i SAMPLESHEET_FILE -f RUNINFOXML_FILE [-r] -o OUTPUT_FILE

Parameters

-h, --help : show this help message and exit
-i, --samplesheet_file SAMPLESHEET_FILE
 : Illumina format samplesheet file
-f, --runinfoxml_file RUNINFOXML_FILE
 : Illumina RunInfo.xml file
-r, --revcomp_index
 : Reverse complement HiSeq and NextSeq index2 column, default: True
-o, --output_file OUTPUT_FILE
 : Reformatted samplesheet file
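The `-r/--revcomp_index` option exists because HiSeq and NextSeq instruments read the i5 (index2) barcode in the opposite orientation, so its sequence must be reverse complemented before demultiplexing. The underlying transform can be sketched as (illustrative, not the script's code):

```python
# Reverse complement an index sequence, as applied to the index2 (i5)
# column for instruments that read it in the reverse orientation.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(index_seq):
    """Return the reverse complement of a DNA index sequence."""
    return index_seq.upper().translate(_COMPLEMENT)[::-1]
```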

Calculate basesmask for demultiplexing

Usage

makeBasesMask.py
[-h] -s SAMPLESHEET_FILE -r RUNINFO_FILE [-a READ_OFFSET] [-b INDEX_OFFSET]

Parameters

-h, --help : show this help message and exit
-s, --samplesheet_file SAMPLESHEET_FILE
 : Illumina format samplesheet file
-r, --runinfo_file RUNINFO_FILE
 : Illumina format RunInfo.xml file
-a, --read_offset READ_OFFSET
 : Extra sequencing cycle for reads, default: 1
-b, --index_offset INDEX_OFFSET
 : Extra sequencing cycle for index, default: 0
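A bases mask tells the demultiplexer how to treat each sequencing cycle (y = read, i = index, n = skip), reconciling the cycle counts in RunInfo.xml with the index lengths in the samplesheet. A simplified sketch of that derivation (illustrative only; the real script also applies the read/index offsets above):

```python
def bases_mask(run_cycles, index_lengths):
    """Build a bcl2fastq-style bases mask string.

    run_cycles: list of (num_cycles, is_index) pairs, as described
        by the <Read> entries in RunInfo.xml
    index_lengths: index lengths from the samplesheet, consumed in order
    """
    masks = []
    idx = iter(index_lengths)
    for cycles, is_index in run_cycles:
        if not is_index:
            masks.append("y%d" % cycles)
        else:
            used = next(idx, 0)
            if used == 0:
                masks.append("n%d" % cycles)  # index read not used
            elif used < cycles:
                masks.append("i%dn%d" % (used, cycles - used))
            else:
                masks.append("i%d" % cycles)
    return ",".join(masks)
```

For a dual-index paired-end run with 8-base barcodes this yields the familiar `y151,i8,i8,y151` form; a single 6-base barcode on the same run gives `y151,i6n2,n8,y151`.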

Create or modify data to database

Clean up data from existing database and create new tables

Usage

clean_and_rebuild_database.py
[-h] -d DBCONFIG_PATH -s SLACK_CONFIG

Parameters

-h, --help : Show this help message and exit
-d, --dbconfig_path
 : Database configuration json file
-s, --slack_config
 : Slack configuration json file

Load flowcell rules data to database

Usage

load_flowcell_rules_data.py
[-h] -f FLOWCELL_DATA [-u] -d DBCONFIG_PATH -s SLACK_CONFIG

Parameters

-h, --help : Show this help message and exit
-f, --flowcell_data
 : Flowcell rules data json file
-u, --update : Update existing flowcell rules data, default: False
-d, --dbconfig_path
 : Database configuration json file
-s, --slack_config
 : Slack configuration json file

Load pipeline configuration to database

Usage

load_pipeline_data.py
[-h] -p PIPELINE_DATA [-u] -d DBCONFIG_PATH -s SLACK_CONFIG

Parameters

-h, --help : Show this help message and exit
-p, --pipeline_data
 : Pipeline data json file
-u, --update : Update existing pipeline data, default: False
-d, --dbconfig_path
 : Database configuration json file
-s, --slack_config
 : Slack configuration json file

Load sequencing platform information to database

Usage

load_platform_data.py [-h] -p PLATFORM_DATA [-u] -d DBCONFIG_PATH -s SLACK_CONFIG

Parameters

-h, --help : Show this help message and exit
-p, --platform_data
 : Platform data json file
-u, --update : Update existing platform data, default: False
-d, --dbconfig_path
 : Database configuration json file
-s, --slack_config
 : Slack configuration json file

Load sequencing run information to database from a text input

Usage

load_seqrun_data.py [-h] -p SEQRUN_DATA -d DBCONFIG_PATH -s SLACK_CONFIG

Parameters

-h, --help : Show this help message and exit
-p, --seqrun_data
 : Seqrun data json file
-d, --dbconfig_path
 : Database configuration json file
-s, --slack_config
 : Slack configuration json file

Load file entries and build collection in database

Usage

load_files_collecion_to_db.py
[-h] -f COLLECTION_FILE_DATA -d DBCONFIG_PATH [-s]

Parameters

-h, --help : show this help message and exit
-f, --collection_file_data COLLECTION_FILE_DATA
 : Collection file data json file
-d, --dbconfig_path DBCONFIG_PATH
 : Database configuration json file
-s, --calculate_checksum
 : Toggle file checksum calculation

Check Storage utilisation

Calculate disk usage summary

Usage

calculate_disk_usage_summary.py
[-h] -p DISK_PATH [-c] [-r REMOTE_SERVER] -o OUTPUT_PATH

Parameters

-h, --help : show this help message and exit
-p, --disk_path DISK_PATH
 : List of disk path for summary calculation
-c, --copy_to_remoter
 : Toggle file copy to remote server
-r, --remote_server REMOTE_SERVER
 : Remote server address
-o, --output_path OUTPUT_PATH
 : Output directory path

Calculate disk usage for a top level directory

Usage

calculate_sub_directory_usage.py
[-h] -p DIRECTORY_PATH [-c] [-r REMOTE_SERVER] -o OUTPUT_FILEPATH

Parameters

-h, --help : show this help message and exit
-p, --directory_path DIRECTORY_PATH
 : A directory path for sub directory lookup
-c, --copy_to_remoter
 : Toggle file copy to remote server
-r, --remote_server REMOTE_SERVER
 : Remote server address
-o, --output_filepath OUTPUT_FILEPATH
 : Output gviz file path

Merge disk usage summary file and build a gviz json

Usage

merge_disk_usage_summary.py
[-h] -f CONFIG_FILE [-l LABEL_FILE] [-c] [-r REMOTE_SERVER] -o OUTPUT_FILEPATH

Parameters

-h, --help : show this help message and exit
-f, --config_file CONFIG_FILE
 : A configuration json file for disk usage summary
-l, --label_file LABEL_FILE
 : A json file for disk label name
-c, --copy_to_remoter
 : Toggle file copy to remote server
-r, --remote_server REMOTE_SERVER
 : Remote server address
-o, --output_filepath OUTPUT_FILEPATH
 : Output gviz file path

Seed analysis pipeline

A script for finding new experiment entries for seeding analysis pipeline

Usage

find_and_seed_new_analysis.py
[-h] -d DBCONFIG_PATH -s SLACK_CONFIG -p PIPELINE_NAME -t FASTQ_TYPE -f PROJECT_NAME_FILE [-m SPECIES_NAME] [-l LIBRARY_SOURCE]

Parameters

-h, --help : show this help message and exit

-d, --dbconfig_path DBCONFIG_PATH
 : Database configuration json file
-s, --slack_config SLACK_CONFIG
 : Slack configuration json file
-p, --pipeline_name PIPELINE_NAME
 : IGF pipeline name
-t, --fastq_type FASTQ_TYPE
 : Fastq collection type
-f, --project_name_file PROJECT_NAME_FILE
 : File containing project names for seeding analysis pipeline
-m, --species_name SPECIES_NAME
 : Species name to filter analysis
-l, --library_source LIBRARY_SOURCE
 : Library source to filter analysis