all_scripts.rst
Sequencing run processing

Metadata registration

Usage

find_and_register_project_metdata.py
[-h] -p PROJET_INFO_PATH -d DBCONFIG -t USER_ACCOUNT_TEMPLATE -n SLACK_CONFIG -u HPC_USER -a HPC_ADDRESS -l LDAP_SERVER [-s] [-c] [-i] [-m]

Parameters

-h, --help : Show this help message and exit
-p, --projet_info_path
 : Project metadata directory path
-d, --dbconfig : Database configuration file path
-t, --user_account_template
 : User account information email template file path
-s, --log_slack
 : Toggle slack logging
-n, --slack_config
 : Slack configuration file path
-c, --check_hpc_user
 : Toggle HPC user checking
-u, --hpc_user : HPC user name for LDAP server checking
-a, --hpc_address
 : HPC address for LDAP server checking
-l, --ldap_server
 : LDAP server address
-i, --setup_irods
 : Setup iRODS account for user
-m, --notify_user
 : Notify user about new account and password
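Nearly every script in this document takes a `-d/--dbconfig` JSON file. The document itself does not define that file's format; the key names below are hypothetical, and the sketch only illustrates the general shape of writing and reading such a configuration file:

```python
import json

# Hypothetical dbconfig keys -- the real key names are defined by the
# IGF pipeline's database adaptor, not by this document.
dbconfig = {
    "dbhost": "localhost",
    "dbname": "igfdb",
    "dbuser": "igf_user",
    "dbpass": "change-me",
    "driver": "mysql",
}

# Write the configuration file that would be passed via --dbconfig
with open("dbconfig.json", "w") as fh:
    json.dump(dbconfig, fh, indent=2)

# A script would load it back the same way:
with open("dbconfig.json") as fh:
    loaded = json.load(fh)
```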

Monitor sequencing run for demultiplexing

Usage

find_new_seqrun_and_prepare_md5.py
[-h] -p SEQRUN_PATH -m MD5_PATH -d DBCONFIG_PATH -s SLACK_CONFIG -a ASANA_CONFIG -i ASANA_PROJECT_ID -n PIPELINE_NAME -j SAMPLESHEET_JSON_SCHEMA [-e EXCLUDE_PATH]

Parameters

-h, --help : show this help message and exit
-p, --seqrun_path SEQRUN_PATH
 : Seqrun directory path
-m, --md5_path MD5_PATH
 : Seqrun md5 output dir
-d, --dbconfig_path DBCONFIG_PATH
 : Database configuration json file
-s, --slack_config SLACK_CONFIG
 : Slack configuration json file
-a, --asana_config ASANA_CONFIG
 : Asana configuration json file
-i, --asana_project_id ASANA_PROJECT_ID
 : Asana project id
-n, --pipeline_name PIPELINE_NAME
 : IGF pipeline name
-j, --samplesheet_json_schema SAMPLESHEET_JSON_SCHEMA
 : JSON schema for samplesheet validation
-e, --exclude_path EXCLUDE_PATH
 : List of sub directories excluded from the search

Switch off project barcode checking

Usage

mark_project_barcode_check_off.py
[-h] -p PROJET_ID_LIST -d DBCONFIG [-s] -n SLACK_CONFIG

Parameters

-h, --help : show this help message and exit
-p, --projet_id_list PROJET_ID_LIST
 : A file path listing project_igf_id
-d, --dbconfig DBCONFIG
 : Database configuration file path
-s, --log_slack
 : Toggle slack logging
-n, --slack_config SLACK_CONFIG
 : Slack configuration file path

Accept modified samplesheet for demultiplexing run

Usage

reset_samplesheet_for_pipeline.py
[-h] -p SEQRUN_PATH -d DBCONFIG -n SLACK_CONFIG -a ASANA_CONFIG -i ASANA_PROJECT_ID -f INPUT_LIST

Parameters

-h, --help : show this help message and exit
-p, --seqrun_path SEQRUN_PATH
 : Sequencing run directory path
-d, --dbconfig DBCONFIG
 : Database configuration file path
-n, --slack_config SLACK_CONFIG
 : Slack configuration file path
-a, --asana_config ASANA_CONFIG
 : Asana configuration file path
-i, --asana_project_id ASANA_PROJECT_ID
 : Asana project id
-f, --input_list INPUT_LIST
 : Sequencing run id list file

Copy files to temp directory for demultiplexing run

Usage

moveFilesForDemultiplexing.py
[-h] -i INPUT_DIR -o OUTPUT_DIR -s SAMPLESHEET_FILE -r RUNINFO_FILE

Parameters

-h, --help : show this help message and exit
-i, --input_dir INPUT_DIR
 : Input files directory
-o, --output_dir OUTPUT_DIR
 : Output files directory
-s, --samplesheet_file SAMPLESHEET_FILE
 : Illumina format samplesheet file
-r, --runinfo_file RUNINFO_FILE
 : Illumina format RunInfo.xml file

Transfer metadata to experiment from sample entries

Usage

update_experiment_metadata_from_sample_attribute.py [-h] -d DBCONFIG -n SLACK_CONFIG

Parameters

-h, --help : show this help message and exit
-d, --dbconfig DBCONFIG
 : Database configuration file path
-n, --slack_config SLACK_CONFIG
 : Slack configuration file path

Pipeline control

Reset pipeline for data processing

Usage

batch_modify_pipeline_seed.py [-h] -t TABLE_NAME -p PIPELINE_NAME
-s SEED_STATUS -d DBCONFIG -n SLACK_CONFIG -a ASANA_CONFIG -i ASANA_PROJECT_ID -f INPUT_LIST

Parameters

-h, --help : show this help message and exit
-t, --table_name TABLE_NAME
 : Table name for igf id lookup
-p, --pipeline_name PIPELINE_NAME
 : Pipeline name for seed modification
-s, --seed_status SEED_STATUS
 : New seed status for pipeline_seed table
-d, --dbconfig DBCONFIG
 : Database configuration file path
-n, --slack_config SLACK_CONFIG
 : Slack configuration file path
-a, --asana_config ASANA_CONFIG
 : Asana configuration file path
-i, --asana_project_id ASANA_PROJECT_ID
 : Asana project id
-f, --input_list INPUT_LIST
 : IGF id list file

Samplesheet processing

Divide samplesheet data

Usage

divide_samplesheet.py
[-h] -i SAMPLESHEET_FILE -d OUTPUT_DIR [-p]

Parameters

-h, --help : show this help message and exit

-i, --samplesheet_file SAMPLESHEET_FILE
 : Illumina format samplesheet file
-d, --output_dir OUTPUT_DIR
 : Output directory for writing samplesheet file
-p, --print_stats
 : Print available stats for the samplesheet and exit

Reformat samplesheet for demultiplexing

Usage

reformatSampleSheet.py
[-h] -i SAMPLESHEET_FILE -f RUNINFOXML_FILE [-r] -o OUTPUT_FILE

Parameters

-h, --help : show this help message and exit
-i, --samplesheet_file SAMPLESHEET_FILE
 : Illumina format samplesheet file
-f, --runinfoxml_file RUNINFOXML_FILE
 : Illumina RunInfo.xml file
-r, --revcomp_index
 : Reverse complement HiSeq and NextSeq index2 column, default: True
-o, --output_file OUTPUT_FILE
 : Reformatted samplesheet file
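The `-r/--revcomp_index` option exists because HiSeq and NextSeq instruments read the i5 (index2) barcode in the opposite orientation, so its sequence must be reverse complemented before demultiplexing. The underlying transform can be sketched as (illustrative, not the script's code):

```python
# Reverse complement an index sequence, as applied to the index2 (i5)
# column for instruments that read it in the reverse orientation.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(index_seq):
    """Return the reverse complement of a DNA index sequence."""
    return index_seq.upper().translate(_COMPLEMENT)[::-1]
```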

Calculate basesmask for demultiplexing

Usage

makeBasesMask.py
[-h] -s SAMPLESHEET_FILE -r RUNINFO_FILE [-a READ_OFFSET] [-b INDEX_OFFSET]

Parameters

-h, --help : show this help message and exit
-s, --samplesheet_file SAMPLESHEET_FILE
 : Illumina format samplesheet file
-r, --runinfo_file RUNINFO_FILE
 : Illumina format RunInfo.xml file
-a, --read_offset READ_OFFSET
 : Extra sequencing cycle for reads, default: 1
-b, --index_offset INDEX_OFFSET
 : Extra sequencing cycle for index, default: 0
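A bases mask tells the demultiplexer how to treat each sequencing cycle (y = read, i = index, n = skip), reconciling the cycle counts in RunInfo.xml with the index lengths in the samplesheet. A simplified sketch of that derivation (illustrative only; the real script also applies the read/index offsets above):

```python
def bases_mask(run_cycles, index_lengths):
    """Build a bcl2fastq-style bases mask string.

    run_cycles: list of (num_cycles, is_index) pairs, as described
        by the <Read> entries in RunInfo.xml
    index_lengths: index lengths from the samplesheet, consumed in order
    """
    masks = []
    idx = iter(index_lengths)
    for cycles, is_index in run_cycles:
        if not is_index:
            masks.append("y%d" % cycles)
        else:
            used = next(idx, 0)
            if used == 0:
                masks.append("n%d" % cycles)  # index read not used
            elif used < cycles:
                masks.append("i%dn%d" % (used, cycles - used))
            else:
                masks.append("i%d" % cycles)
    return ",".join(masks)
```

For a dual-index paired-end run with 8-base barcodes this yields the familiar `y151,i8,i8,y151` form; a single 6-base barcode on the same run gives `y151,i6n2,n8,y151`.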

Create or modify data to database

Clean up data from existing database and create new tables

Usage

clean_and_rebuild_database.py
[-h] -d DBCONFIG_PATH -s SLACK_CONFIG

Parameters

-h, --help : Show this help message and exit
-d, --dbconfig_path
 : Database configuration json file
-s, --slack_config
 : Slack configuration json file

Load flowcell rules data to database

Usage

load_flowcell_rules_data.py
[-h] -f FLOWCELL_DATA [-u] -d DBCONFIG_PATH -s SLACK_CONFIG

Parameters

-h, --help : Show this help message and exit
-f, --flowcell_data
 : Flowcell rules data json file
-u, --update : Update existing flowcell rules data, default: False
-d, --dbconfig_path
 : Database configuration json file
-s, --slack_config
 : Slack configuration json file

Load pipeline configuration to database

Usage

load_pipeline_data.py
[-h] -p PIPELINE_DATA [-u] -d DBCONFIG_PATH -s SLACK_CONFIG

Parameters

-h, --help : Show this help message and exit
-p, --pipeline_data
 : Pipeline data json file
-u, --update : Update existing pipeline data, default: False
-d, --dbconfig_path
 : Database configuration json file
-s, --slack_config
 : Slack configuration json file

Load sequencing platform information to database

Usage

load_platform_data.py [-h] -p PLATFORM_DATA [-u] -d DBCONFIG_PATH -s SLACK_CONFIG

Parameters

-h, --help : Show this help message and exit
-p, --platform_data
 : Platform data json file
-u, --update : Update existing platform data, default: False
-d, --dbconfig_path
 : Database configuration json file
-s, --slack_config
 : Slack configuration json file

Load sequencing run information to database from a text input

Usage

load_seqrun_data.py [-h] -p SEQRUN_DATA -d DBCONFIG_PATH -s SLACK_CONFIG

Parameters

-h, --help : Show this help message and exit
-p, --seqrun_data
 : Seqrun data json file
-d, --dbconfig_path
 : Database configuration json file
-s, --slack_config
 : Slack configuration json file

Load file entries and build collection in database

Usage

load_files_collecion_to_db.py
[-h] -f COLLECTION_FILE_DATA -d DBCONFIG_PATH [-s]

Parameters

-h, --help : show this help message and exit
-f, --collection_file_data COLLECTION_FILE_DATA
 : Collection file data json file
-d, --dbconfig_path DBCONFIG_PATH
 : Database configuration json file
-s, --calculate_checksum
 : Toggle file checksum calculation

Check Storage utilisation

Calculate disk usage summary

Usage

calculate_disk_usage_summary.py
[-h] -p DISK_PATH [-c] [-r REMOTE_SERVER] -o OUTPUT_PATH

Parameters

-h, --help : show this help message and exit
-p, --disk_path DISK_PATH
 : List of disk path for summary calculation
-c, --copy_to_remoter
 : Toggle file copy to remote server
-r, --remote_server REMOTE_SERVER
 : Remote server address
-o, --output_path OUTPUT_PATH
 : Output directory path

Calculate disk usage for a top level directory

Usage

calculate_sub_directory_usage.py
[-h] -p DIRECTORY_PATH [-c] [-r REMOTE_SERVER] -o OUTPUT_FILEPATH

Parameters

-h, --help : show this help message and exit
-p, --directory_path DIRECTORY_PATH
 : A directory path for sub directory lookup
-c, --copy_to_remoter
 : Toggle file copy to remote server
-r, --remote_server REMOTE_SERVER
 : Remote server address
-o, --output_filepath OUTPUT_FILEPATH
 : Output gviz file path

Merge disk usage summary file and build a gviz json

Usage

merge_disk_usage_summary.py
[-h] -f CONFIG_FILE [-l LABEL_FILE] [-c] [-r REMOTE_SERVER] -o OUTPUT_FILEPATH

Parameters

-h, --help : show this help message and exit
-f, --config_file CONFIG_FILE
 : A configuration json file for disk usage summary
-l, --label_file LABEL_FILE
 : A json file for disk label name
-c, --copy_to_remoter
 : Toggle file copy to remote server
-r, --remote_server REMOTE_SERVER
 : Remote server address
-o, --output_filepath OUTPUT_FILEPATH
 : Output gviz file path

Seed analysis pipeline

A script for finding new experiment entries for seeding analysis pipeline

Usage

find_and_seed_new_analysis.py
[-h] -d DBCONFIG_PATH -s SLACK_CONFIG -p PIPELINE_NAME -t FASTQ_TYPE -f PROJECT_NAME_FILE [-m SPECIES_NAME] [-l LIBRARY_SOURCE]

Parameters

-h, --help : show this help message and exit

-d, --dbconfig_path DBCONFIG_PATH
 : Database configuration json file
-s, --slack_config SLACK_CONFIG
 : Slack configuration json file
-p, --pipeline_name PIPELINE_NAME
 : IGF pipeline name
-t, --fastq_type FASTQ_TYPE
 : Fastq collection type
-f, --project_name_file PROJECT_NAME_FILE
 : File containing project names for seeding analysis pipeline
-m, --species_name SPECIES_NAME
 : Species name to filter analysis
-l, --library_source LIBRARY_SOURCE
 : Library source to filter analysis