ChangeLog
2015-08-05 Peter Briggs <peter.briggs@manchester.ac.uk>
* genomics/bcftbx version 0.99.2
- Porting to Ubuntu: update Python scripts to use
'#!/usr/bin/env python' and shell scripts to use
'#!/bin/bash'
- bcftbx/TabFile: add switch to TabFile class to
prevent type conversions when reading in data
- bcftbx/utils: new function 'get_hostname'.
- NGS-general/split_fasta.py: fixes to handle
comments in sequence definition lines.
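The fix for comments in sequence definition lines can be pictured with a minimal sketch. It assumes (this log does not confirm it) that a record's name is the first whitespace-delimited token after '>' and that any remaining text on that line is a comment. `split_fasta_records` is a hypothetical name, not part of split_fasta.py.

```python
def split_fasta_records(lines):
    """Group FASTA lines into {name: sequence} (illustrative sketch only).

    Assumes the record name is the first whitespace-delimited token after
    '>' and that any further text on the definition line is a comment.
    """
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("Definition line has no name: %r" % line)
            name = tokens[0]  # drop any trailing comment text
            records[name] = []
        elif name is None:
            raise ValueError("Sequence data before first definition line")
        else:
            records[name].append(line)
    # Join the sequence lines collected for each record
    return {n: "".join(seq) for n, seq in records.items()}

records = split_fasta_records([">chr1 assembled 2013\n", "ACGT\n", "GG\n", ">chr2\n", "TT\n"])
print(records)
```

Each key could then be written to its own file (for example chr1.fa) to split a genome into individual chromosomes.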
2015-04-16 Peter Briggs <peter.briggs@manchester.ac.uk>
* genomics/bcftbx version 0.99.1
- First version which is installable via setup.py
- Significant rearrangement of various scripts and
programs
- First version of sphinx-based documentation added
- First version of test scripts for SOLiD and
Illumina QC scripts
2015-02-12 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/illumina_qc.sh
- Version 1.2.2
- Add --threads option (pass number of threads to
use to fastq_screen and fastqc)
* QC-pipeline/fastq_screen.sh
- Add --threads option (pass number of threads to
use to fastq_screen command)
2014-12-10 Peter Briggs <peter.briggs@manchester.ac.uk>
* utils/cmpdirs.py
- Version 0.0.1
- Version 0.0.2
- Version 0.0.3
- New program to recursively compare the contents
of one directory against another.
2014-12-04 Peter Briggs <peter.briggs@manchester.ac.uk>
* build-indexes/make_seq_alignments.sh
- New script to create sequence alignment (.nib)
files from a Fasta file.
2014-12-03 Peter Briggs <peter.briggs@manchester.ac.uk>
* utils/symlink_checker.py
- version 1.1.1
- Add 'genomics' top-level directory to search path
for Python modules.
2014-10-31 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/illumina_qc.sh
- version 1.2.0
- Default behaviour is now *not* to decompress fastq
files, unless new '--ungzip-fastqs' option is
specified (and existing option '--no-gzip-fastqs' now
does nothing).
- version 1.2.1
- Added --version option.
2014-10-14 Peter Briggs <peter.briggs@manchester.ac.uk>
* bcftbx/cmdparse.py
- version 1.0.0
- New module for creating 'command parsers', for
processing command lines of the form 'PROG CMD OPTIONS
ARGS'.
* bcftbx/JobRunner.py
- version 1.1.0
- New function 'fetch_runner', returns appropriate job
runner instance matching text description (used for
specifying job runners on command line or in config
files).
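The 'PROG CMD OPTIONS ARGS' style of command line can be illustrated with the standard library's argparse sub-commands. This is a sketch of the pattern only, not the bcftbx/cmdparse.py API; the command names 'setup' and 'run' and their options are hypothetical.

```python
import argparse

def build_parser():
    """Build a 'PROG CMD OPTIONS ARGS' style parser using argparse sub-commands.

    The command names and options below are hypothetical placeholders.
    """
    parser = argparse.ArgumentParser(prog="prog")
    commands = parser.add_subparsers(dest="command", metavar="CMD")
    commands.required = True  # a command must always be given
    # 'setup' command with its own options and positional arguments
    setup = commands.add_parser("setup", help="set up a directory")
    setup.add_argument("--name", default=None)
    setup.add_argument("dirs", nargs="*")
    # 'run' command taking a job runner description as an option
    run = commands.add_parser("run", help="run jobs")
    run.add_argument("--runner", default="SimpleJobRunner")
    return parser

args = build_parser().parse_args(["setup", "--name", "test", "dir1", "dir2"])
print(args.command, args.name, args.dirs)
```

Each command gets its own option set, so the options valid after 'setup' need not be valid after 'run'.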
2014-10-10 Peter Briggs <peter.briggs@manchester.ac.uk>
* bcftbx/utils.py
- version 1.5.0
- New function 'list_dirs', gets subdirectories of
specified parent directory.
* bcftbx/Solid.py
- Updated 'SolidRun' class to handle cases where the
run definition file is missing.
2014-10-09 Peter Briggs <peter.briggs@manchester.ac.uk>
* bcftbx/Md5sum.py
- version 1.1.0
- 'md5sum' function updated to handle either file name,
or a file-like object opened for reading.
* bcftbx/utils.py
- version 1.4.8
- New function 'get_current_user', gets name of
user running the program.
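A minimal sketch of an 'md5sum' that accepts either a file name or a file-like object opened for reading, using hashlib. The signature and block size are assumptions; this is not the bcftbx/Md5sum.py code, and it expects file objects opened in binary mode.

```python
import hashlib
import io

def md5sum(f, blocksize=65536):
    """Return the MD5 hex digest of a file name or a binary file-like object.

    Sketch only: reads in fixed-size blocks so large files are not loaded
    into memory at once.
    """
    if isinstance(f, str):
        # Treat a string as a path and recurse on the opened file
        with open(f, "rb") as fp:
            return md5sum(fp, blocksize)
    checksum = hashlib.md5()
    for block in iter(lambda: f.read(blocksize), b""):
        checksum.update(block)
    return checksum.hexdigest()

print(md5sum(io.BytesIO(b"hello")))  # 5d41402abc4b2a76b9719d911017c592
```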
2014-10-08 Peter Briggs <peter.briggs@manchester.ac.uk>
* bcftbx/utils.py
- version 1.4.7
- New property 'resolve_link_via_parent' for PathInfo
class, gets 'real' path from one that includes
symbolic links at any level.
2014-09-01 Peter Briggs <peter.briggs@manchester.ac.uk>
* bcftbx/qc/report.py
- version 0.99.1
- relocated QC reporting classes and functions from the
qcreporter.py program into a new module in the bcftbx
package.
* bcftbx
- version 0.99.0
- add a single version for the whole package, accessible
using the 'bcftbx.get_version()' function.
* utils/md5checker.py
- version 0.3.2
- move unit tests into separate test module & remove --test
option.
2014-08-21 Peter Briggs <peter.briggs@manchester.ac.uk>
* bcftbx
- Substantial update: Python library modules from 'share'
relocated to 'bcftbx' and turned into a Python package.
- 'bcf_utils.py' also renamed to 'bcftbx/utils.py'.
- Python applications also updated to reflect the changes.
* microarray/best_exons.py
- version 1.2.1
- new program: averages data for 'best' exons for each gene
symbol in a file.
2014-08-15 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/JobRunner.py
- version 1.0.5
- new 'ge_extra_args' property for GEJobRunner.
2014-08-11 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/Md5sum.py
- version 1.0.1
- fixed compute_md5sums function to handle broken links
2014-06-16 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/illumina_qc.sh
- version 1.1.1
- Specify the --extract option when running FastQC, as required by
FastQC 0.11.2 (should be backwards compatible with 0.10.1).
* share/IlluminaData.py
- version 1.1.5
- 'get_casava_sample_sheet' now handles leading & trailing
spaces in barcode sequences.
* share/bcf_utils.py
- version 1.4.5
- New function 'walk' traverses directory tree (wrapper for
os.walk function).
2014-06-04 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/IlluminaData.py
- version 1.1.4
- 'fix_bases_mask' updated to handle the situation when a single index
sequence is supplied for dual index data.
* illumina2cluster/report_barcodes.py
- version 0.0.2
- Make reporting cutoff apply only to exact matches.
2014-06-02 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/prep_sample_sheet.py
- version 0.2.1
- New options --include-lanes and --truncate-barcodes allow
selection of a subset of lanes, and truncation of barcode
sequences.
2014-05-22 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/report_barcodes.py
- New program: examine barcode sequences from one or more
FASTQ files and report the most prevalent.
2014-05-15 Peter Briggs <peter.briggs@manchester.ac.uk>
* utils/manage_seqs.py
- New program: utility to handle sets of named sequences;
intended to help manage custom 'contaminants' files for input
into the Babraham 'FastQC' program.
2014-05-07 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/illumina_qc.sh
- version 1.1.0
- Optionally use a non-default list of contaminants for
FastQC (if specified in the qc.setup file)
- Create and set a local tmp directory for Java when
running FastQC.
- New --no-gunzip option suppresses creation of uncompressed
fastq files.
* share/bcf_utils.py
- version 1.4.4
- New functions for getting user and group names and ID numbers
from the system.
- New 'PathInfo' class for getting information about file system
paths.
- Moved symbolic link handling classes and functions in from
utils/symlink_checker.py program.
- 'format_file_size' function updated to format to specific
units, and to handle terabyte sizes.
- new function 'find_program'.
* share/htmlpagewriter.py
- version 1.0.0
- New module: HTML page generation functionality relocated from
the QC-pipeline/qcreporter.py utility.
* share/IlluminaData.py
- version 1.1.3
- Move 'describe_project', 'summarise_projects' and
'verify_run_against_sample_sheet' functions from
illumina2cluster/analyse_illumina_run.py into this
module.
* share/JobRunner.py
- version 1.0.4
- fix broken 'terminate' method for SimpleJobRunner.
- move set/get of log directory into the BaseJobRunner
class.
* share/Md5sum.py
- Moved Md5Checker and Md5Reporter classes from
utils/md5checker.py program.
* share/Pipeline.py
- version 0.1.3
- add 'runner' property to Job class (to access associated
JobRunner instance).
* share/platforms.py
- added additional platforms and new function 'list_platforms'
* utils/md5checker.py
- version 0.3.0
- substantial refactoring of code to add unit tests;
core functions and classes moved to the share/Md5sum.py
module.
* utils/symlink_checker.py
- version 1.1.0
- refactored to add unit tests and move core functions and
classes to share/bcf_utils.py.
* utils/uncompress_fastqz.sh
- New utility script for uncompressing fastq files.
2014-04-17 Peter Briggs <peter.briggs@manchester.ac.uk>
* ChIP-seq/make_macs2_xls.py
- version 0.3.2
- Only sort output on fold enrichment
- Handle output from --broad option of MACS2
- Split data over multiple sheets if row limit is exceeded
(approx 64k records)
- Prevent reported command line being truncated if maximum
cell size is exceeded (approx 250 characters)
- Refactored internals to make more robust, added unit
tests and switched to use simple_xls module for
spreadsheet generation.
2014-04-10 Peter Briggs <peter.briggs@manchester.ac.uk>
* RNA-seq/bowtie_mapping_stats.py
- version 1.1.5
- Updated to handle paired-end output from Bowtie2
2014-04-09 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/simple_xls.py
- version 0.0.7
- New methods for inserting and appending columns and rows,
which better mimic operations that would be used within a
graphical spreadsheet program.
- Significant updates to internal book-keeping to
improve performance.
2014-04-04 Peter Briggs <peter.briggs@manchester.ac.uk>
* RNA-seq/bowtie_mapping_stats.py
- version 1.1.3
- Updated, now works with output from both Bowtie and Bowtie2
* share/simple_xls.py
- version 0.0.3
- New module intended to provide a nicer programmatic interface
to Excel spreadsheet generation (built on top of
Spreadsheet.py).
2014-02-11 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/JobRunner.py
- version 1.0.2
- SimpleJobRunner: 'join_logs' option joins stderr to stdout
- GEJobRunner: jobs in 't' (transferring) and 'qw'
(queued-waiting) states counted as "running"
- GEJobRunner: arbitrary qsub arguments can be specified via
'ge_extra_args' option
* share/Spreadsheet.py
- version 0.1.8: add support for additional style options
('font_height', 'centre', 'shrink_to_fit')
* share/bcf_utils.py
- version 1.0.3
- New function 'find_program' (locate file on PATH)
- New function 'name_matches' (simple pattern matching for project
and sample names, moved from analyse_illumina_run.py)
- New class 'AttributeDictionary'
- New class 'OrderedDictionary'
- New function 'touch' (creates new empty file)
* QC-pipeline/illumina_qc.sh
- Gunzip fastq.gz files via temporary name, to avoid partial
fastqs left behind if script terminates prematurely
- Write program version information to 'qc' subdirectory
* QC-pipeline/fastq_screen.sh
- Clean up existing files from previous incomplete run
* QC-pipeline/qcreporter.py
- version 0.1.1
- QCSample: 'fastqc' method made into a property
* share/Pipeline.py
- version 0.1.2
- Job class: add 'wait' method (waits for job to complete)
- PipelineRunner: 'max_concurrent_jobs' now applies only to
pipeline instance (i.e. not across all pipelines)
- PipelineRunner: implemented __del__ method to clean up
running pipeline instance (i.e. terminate running jobs)
* share/IlluminaData.py
- version 1.1.2
- New function 'fix_bases_mask' (adjust bases mask to match
actual barcode sequence lengths, for bclToFastq)
* ChIP-seq/make_macs_xls.sh
- Removed (redundant wrapper script to make_macs_xls.py)
* Unit tests
- Python unit tests moved into separate files in 'share'
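The idea behind 'fix_bases_mask' (including the later dual-index fix of 2014-06-04) can be sketched as follows. Assumptions not stated in this log: masks are simple comma-separated reads such as 'y101,I8,I8,y101', dual-index barcodes are joined with '-', and unused index cycles are written in the 'I6n2' style. The real function may produce a different but equivalent mask.

```python
def fix_bases_mask(bases_mask, barcode):
    """Adjust the index reads of a bases mask to a barcode's actual length.

    Sketch only. Assumes a simple mask such as 'y101,I8,I8,y101' (each index
    read written as 'I<cycles>') and dual indexes separated by '-',
    e.g. 'ACGTAC-GGTTCCAA'.
    """
    index_lengths = [len(seq) for seq in barcode.split("-")] if barcode else []
    fixed = []
    nindex = 0
    for read in bases_mask.split(","):
        if not read.upper().startswith("I"):
            fixed.append(read)  # non-index reads are unchanged
            continue
        cycles = int(read[1:])
        # No barcode for this index read (e.g. a single index supplied for
        # dual-index data): ignore all cycles of that read
        length = index_lengths[nindex] if nindex < len(index_lengths) else 0
        nindex += 1
        if length > cycles:
            raise ValueError("Barcode longer than index read: %s" % read)
        if length == 0:
            fixed.append("n%d" % cycles)
        elif length == cycles:
            fixed.append("I%d" % cycles)
        else:
            fixed.append("I%dn%d" % (length, cycles - length))
    return ",".join(fixed)

print(fix_bases_mask("y101,I8,y101", "ACGTAC"))
print(fix_bases_mask("y101,I8,I8,y101", "ACGTACGT"))
```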
2013-11-18 Peter Briggs <peter.briggs@manchester.ac.uk>
* build-indexes/fetch_fasta.sh
- Neurospora crassa (Ncrassa) updated to June 25th 2013
version.
* build-indexes/bowtie2_build_indexes.sh
- New: wrapper script to build bowtie2 indexes from a
fasta file.
* build-indexes/build_indexes.sh
- remove bfast indexes & add bowtie2.
2013-11-15 Peter Briggs <peter.briggs@manchester.ac.uk>
* build-indexes/fetch_fasta.sh
- various builds renamed to longer & more accurate names:
* hg18 -> hg18_random_chrM
* hg19 -> hg19_GRCh37_random_chrM
* mm9 -> mm9_random_chrM_chrUn
* mm10 -> mm10_random_chrM_chrUn
* dm3 -> dm3_het_chrM_chrU
* ecoli -> e_coli
* dicty -> dictyostelium
* chlamyR -> Creinhardtii169
- updates to broken download URLs and checksums for PhiX,
sacBay, ws200 and ws201 genome builds.
- UniVec updated to build #7.1.
2013-11-13 Peter Briggs <peter.briggs@manchester.ac.uk>
* build-indexes/fetch_fasta.sh
- updated to include sacCer1, sacCer3 and mm10 sequences.
- updated URL for C. reinhardtii.
- fixed minor bug in 'fetch_url' function.
2013-09-11 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/IlluminaData.py
- version 1.1.1: update get_casava_sample_sheet function to
handle "Experiment Manager"-type sample sheet files when
there are no barcode indexes.
* share/JobRunner.py
- version 1.0.1: fix and standardise handling of log and error
files for SimpleJobRunner and GEJobRunner classes; also added
minimal unit tests for these classes.
2013-09-09 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/FASTQFile.py
- version 0.3.0: attempt to improve performance of
SequenceIdentifier class (use string parsing instead of
regular expressions), and added new method 'is_pair_of'
(can be used to check if another SequenceIdentifier forms
an R1/2 pair with this one). FastqRead class has new attribute
'raw_seqid' (returns original sequence id header supplied on
instantiation). New function 'fastqs_are_pair' checks that
corresponding read headers match between two FASTQ files.
* illumina2cluster/verify_paired.py
- version 1.0.0: new utility to check that two fastq files form
an R1/R2 pair.
* illumina2cluster/analyse_illumina_run.py
- version 0.1.11: updated implementation of --merge-fastqs option.
* illumina2cluster/check_paired_fastqs.py
- Removed: replaced by 'verify_paired.py'.
* share/JobRunner.py
- version 1.0.1: updates to SimpleJobRunner and GEJobRunner classes
(store names associated with each job, and enable lookup via 'name'
method; ensure stored log directory is an absolute path, and that
log and error file names can be retrieved correctly even if log dir
is subsequently changed).
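The R1/R2 pairing check described for 'is_pair_of' and 'fastqs_are_pair' can be sketched for Illumina 1.8+ style headers. This is a standalone illustration with hypothetical names (`headers_form_pair`, `fastq_streams_are_pair`). It assumes four-line FASTQ records and treats two headers as a pair when the identifier, control and index fields match and the read numbers are 1 and 2.

```python
from itertools import zip_longest

def headers_form_pair(header1, header2):
    """Return True if two Illumina 1.8+ read headers look like an R1/R2 pair.

    Sketch only. Assumes headers of the form
    '@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>'.
    """
    id1, _, desc1 = header1.strip().partition(" ")
    id2, _, desc2 = header2.strip().partition(" ")
    fields1 = desc1.split(":")
    fields2 = desc2.split(":")
    if id1 != id2 or len(fields1) != 4 or len(fields2) != 4:
        return False
    # One header must be read 1 and the other read 2; control and index must agree
    return {fields1[0], fields2[0]} == {"1", "2"} and fields1[2:] == fields2[2:]

def fastq_headers(lines):
    """Yield the header line of each standard four-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line.rstrip("\n")

def fastq_streams_are_pair(r1_lines, r2_lines):
    """Check every corresponding pair of headers in two FASTQ line streams."""
    for h1, h2 in zip_longest(fastq_headers(r1_lines), fastq_headers(r2_lines)):
        # None means one stream has more records than the other
        if h1 is None or h2 is None or not headers_form_pair(h1, h2):
            return False
    return True
```

With real files the two arguments could simply be open text-mode file objects, since both are only iterated line by line.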
2013-09-06 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/analyse_illumina_run.py
- version 0.1.9: improvements to reporting options when using
--summary and --list options.
- version 0.1.10: fix bug for runs that don't have undetermined
indices.
* share/IlluminaData.py
- version 1.0.2: new method 'fastq_subset' for IlluminaSample
(returns subset of fastq files based on read number).
2013-08-22 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/bcf_utils.py
- version 1.0.1: added new function 'concatenate_fastq_files'
(concatenates a list of fastq files).
- version 1.0.2: updated 'concatenate_fastq_files' to improve
performance, and added tests.
* illumina2cluster/analyse_illumina_run.py
- version 0.1.8: new option --merge-fastqs, creates
concatenated fastq files for each sample.
* share/IlluminaData.py
- version 1.0.1: new property 'full_name' for IlluminaData,
(returns name suitable for analysis subdirectory); new
function 'get_unique_fastq_names' (generates mapping of
full Illumina-style fastq file names to shortest unique
version).
* illumina2cluster/build_illumina_analysis_dir.py
- version 1.0.1: move analysis directory creation code from
__main__ to new 'create_analysis_dir' function.
- version 1.0.2: remove redundant functions and switch to
versions in bcf_utils module.
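A sketch of a block-copying approach for 'concatenate_fastq_files'. The signature, and choosing gzip or plain I/O from a '.gz' extension, are assumptions; the log does not say how the performance improvement was achieved. Copying large blocks with `shutil.copyfileobj` avoids parsing individual reads.

```python
import gzip
import os
import shutil
import tempfile

def open_maybe_gzipped(path, mode):
    """Open a file with gzip if its name ends in '.gz', otherwise as a plain file."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)

def concatenate_fastq_files(merged_fastq, fastq_files, bufsize=1024 * 1024):
    """Write the contents of each input FASTQ, in order, to merged_fastq.

    Sketch only: copies decompressed data in large blocks rather than
    parsing individual reads.
    """
    with open_maybe_gzipped(merged_fastq, "wb") as out:
        for fastq in fastq_files:
            with open_maybe_gzipped(fastq, "rb") as fp:
                shutil.copyfileobj(fp, out, bufsize)

# Example: merge two small gzipped FASTQs into one uncompressed file
tmpdir = tempfile.mkdtemp()
parts = []
for i, seq in enumerate(["ACGT", "TTGA"]):
    path = os.path.join(tmpdir, "part%d.fastq.gz" % i)
    with gzip.open(path, "wt") as fp:
        fp.write("@read%d\n%s\n+\nIIII\n" % (i, seq))
    parts.append(path)
merged = os.path.join(tmpdir, "merged.fastq")
concatenate_fastq_files(merged, parts)
```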
2013-08-21 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/bcf_utils.py
- added baseline version number (1.0.0)
* illumina2cluster/build_illumina_analysis_dir.py
- added baseline version number (1.0.0)
2013-08-20 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/IlluminaData.py, JobRunner.py
- added version numbers (baseline 1.0.0)
* share/FASTQFile.py
- version 0.2.6: fix sequence length returned for
colorspace reads by FastqRead.seqlen
- version 0.2.5: added is_colorspace property to FastqRead
2013-08-19 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/prep_sample_sheet.py
- version 0.2.0: --miseq option is deprecated as it's no
longer necessary; sample sheet conversion is performed
automatically if required.
* share/IlluminaData.py
- new function 'get_casava_sample_sheet' produces a
CasavaSampleSheet object from sample sheet CSV file
regardless of format. 'convert_miseq_samplesheet_to_casava'
is deprecated as it is now just a wrapper to the more
general function.
* share/FASTQFile.py
- version 0.2.4: added new properties to FastqRead: seqlen
(return sequence length), maxquality and minquality (max
and min encoded quality scores).
2013-08-14 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/FASTQFile.py
- version 0.2.3: new FastqAttributes class provides
access to "gross" attributes of FASTQ file (e.g. read
count, file size).
* share/JobRunner.py
- SimpleJobRunner and GEJobRunner classes allow destination
directory for log files to be specified explicitly, and
to be changed after instantiation via new 'log_dir' methods.
- GEJobRunner class has new 'queue' method allowing GE queue
to be changed after instantiation.
2013-08-08 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/analyse_illumina_run.py
- version 0.1.7: --summary option generates a one-line
description of projects and numbers of samples, suitable
for logging file entries.
2013-08-05 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/IlluminaData.py
- new classes IlluminaRun (extracts data from a directory
with the "raw" data from a sequencer run) and
IlluminaRunInfo (extracts data from a RunInfo.xml file).
* share/platforms.py
- new Python module with utilities and data to identify NGS
sequencer platforms
* illumina2cluster/rsync_seq_data.py
- version 0.0.5: moved sequencer platform identification
code to share/platforms.py
- version 0.0.4: new options --no-log (write rsync output
directly to stdout) and --exclude (specify rsync filter
patterns to exclude files from transfer); explicitly
handle keyboard interrupt (i.e. ctrl-C) during rsync
operation.
2013-08-01 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/rsync_seq_data.py
- version 0.0.3: added new hiseq sequencer pattern to
PLATFORMS.
2013-07-26 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/rsync_seq_data.py
- version 0.0.2: add --mirror option, runs rsync with
--delete-after option to remove files from target directory
which are no longer present in the source.
* share/Spreadsheet.py
- version 0.1.7: fixed bug which meant formulae generation
failed for columns after 'Z' (i.e. 'AA', 'AB' etc).
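The fixed behaviour (column letters past 'Z') amounts to a bijective base-26 conversion of a 1-based column index. The sketch uses a hypothetical `column_letters` helper and is not the Spreadsheet.py code.

```python
def column_letters(index):
    """Convert a 1-based column index to spreadsheet letters (1 -> 'A', 27 -> 'AA').

    Bijective base-26: there is no zero digit, so subtract 1 before each
    divmod step.
    """
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

for n in (1, 26, 27, 28, 52, 53, 702, 703):
    print(n, column_letters(n))
```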
2013-07-19 Peter Briggs <peter.briggs@manchester.ac.uk>
* ChIP-seq/make_macs2_xls.py
- modified version of make_macs_xls.py to convert XLS output
files from MACS 2.0.10 (contributed by Ian Donaldson).
2013-07-15 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/rsync_seq_data.sh
- removed, replaced by rsync_seq_data.py.
* illumina2cluster/rsync_seq_data.py
- version 0.0.1: new program for rsync'ing sequencing data to
the appropriate location in the archive.
* utils/cluster_load.py
- new utility for reporting current Grid Engine utilisation by
wrapping the qstat program.
2013-05-21 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/auto_process_illumina.sh
- version 0.2.4: use multiple cores for bcl-to-fastq conversion.
* share/IlluminaData.py
- IlluminaSample class no longer raises an exception if no fastq
files are found, so IlluminaData objects can be populated from
an incomplete CASAVA run.
* illumina2cluster/build_illumina_analysis_dir.py
- automatically determine the set of shortest unique link names
to use for fastqs in each project.
2013-05-20 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/bclToFastq.sh
- New option --nprocessors allows specification of number of
cores to utilise when performing bcl to Fastq conversion.
2013-05-17 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/auto_process_illumina.sh
- version 0.2.3: fix bug with extracting the exit code from the
CASAVA/bcl2fastq step.
* share/FASTQFile.py
- version 0.2.1: implement more efficient line counting in nreads
function.
* illumina2cluster/analyse_illumina_run.py
- version 0.1.4: print results from --stats option in real time.
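One way to count lines efficiently, in the spirit of the 'nreads' change above, is to count newline bytes in large blocks rather than iterating line by line. A sketch under stated assumptions: four lines per FASTQ record (no wrapped sequences) and gzip detected from a '.gz' extension; `count_lines` is a hypothetical helper.

```python
import gzip
import io

def count_lines(fp, blocksize=1024 * 1024):
    """Count lines in a binary file-like object by counting b'\\n' in blocks."""
    nlines = 0
    last = b""
    for block in iter(lambda: fp.read(blocksize), b""):
        nlines += block.count(b"\n")
        last = block
    # Count a final line that lacks a trailing newline
    if last and not last.endswith(b"\n"):
        nlines += 1
    return nlines

def nreads(path):
    """Return the number of reads in a (possibly gzipped) FASTQ file.

    Sketch only: assumes four lines per record.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fp:
        nlines = count_lines(fp)
    if nlines % 4 != 0:
        raise ValueError("%s: %d lines is not a multiple of 4" % (path, nlines))
    return nlines // 4

print(count_lines(io.BytesIO(b"@r1\nACGT\n+\nIIII\n")) // 4)  # 1
```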
2013-05-15 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/auto_process_illumina.sh
- version 0.2.2: fix automatic determination of number of allowed
mismatches from the bases mask, to deal with e.g. 'I6n'
2013-05-02 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/auto_process_illumina.sh
- version 0.2.1: write log files to "logs" subdirectory.
2013-05-01 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/auto_process_illumina.sh
- version 0.2.0: updated to work with multiple sample sheets.
2013-04-25 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/auto_process_illumina.sh
- version 0.1.0: significant updates to improve robustness, automatically
determine the number of allowed mismatches, and generate a statistics report.
* illumina2cluster/analyse_illumina_run.py
- version 0.1.2: also report file sizes as well as number of reads for
Fastq files using --stats option.
* share/bcf_utils.py
- new function "format_file_size" (converts file size supplied in bytes
into human-readable form e.g. 4.0K, 186.0M, 1.6G).
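A minimal sketch of 'format_file_size' that reproduces the examples above with 1024-based units. The `units` parameter mirrors the later 2014-05-07 change (formatting to specific units, terabyte sizes), but its name and behaviour here are assumptions.

```python
def format_file_size(size, units=None):
    """Return a human-readable string for a size in bytes, e.g. 4096 -> '4.0K'.

    Sketch only: uses 1024-based units K, M, G and T. If units is None the
    first unit (from K upwards) giving a value below 1024 is used, or T for
    anything larger; otherwise the named unit is used.
    """
    suffixes = ("K", "M", "G", "T")
    value = size / 1024.0
    for i, suffix in enumerate(suffixes):
        if units is not None:
            if suffix == units.upper():
                return "%.1f%s" % (value, suffix)
        elif value < 1024.0 or i == len(suffixes) - 1:
            return "%.1f%s" % (value, suffix)
        value /= 1024.0
    raise ValueError("Unknown units: %r" % units)

for size in (4096, 195035136, 1717986918):
    print(format_file_size(size))  # 4.0K, 186.0M, 1.6G
```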
2013-04-24 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/bcf_utils.py
- fix bug in extract_index (failed for names ending with 0 e.g. 'PJB0').
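The fixed behaviour can be sketched with a regular expression. The log does not give the cause of the original bug; the point illustrated here is that a valid index of 0 must be told apart from 'no index' by testing `is None`, since 0 is falsy in Python. This is not the bcf_utils.py code.

```python
import re

def extract_index(name):
    """Return the trailing integer in name (e.g. 'PJB12' -> 12), or None if absent.

    Sketch only: callers should test 'is None' rather than truthiness,
    since a valid index of 0 (as in 'PJB0') is falsy.
    """
    match = re.search(r"(\d+)$", name)
    return int(match.group(1)) if match else None

for name in ("PJB0", "PJB12", "PJB"):
    print(name, extract_index(name))
```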
2013-04-23 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/analyse_illumina_run.py
- version 0.1.1: added --stats option (reports number of reads for each
FASTQ file generated by CASAVA's bcl-to-FASTQ conversion).
* share/IlluminaData.py
- IlluminaData class has new property "undetermined" (allows access to
undetermined reads produced by demultiplexing).
- IlluminaProject.prettyPrintSamples() no longer includes info on paired
endedness of the data in the project.
2013-04-22 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/auto_process_illumina.sh
- new script to automate processing of sequencing data from Illumina
platforms.
2013-04-16 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/run_qc_pipeline.py
- fix bug with --queue option which meant queue specification was not
being honoured by the program.
2013-04-11 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/analyse_illumina_run.py
- version 0.1.0: new option --verify=SAMPLE_SHEET, verifies outputs
against those predicted by the named sample sheet.
* share/IlluminaData.py
- CasavaSampleSheet class:
1. In "duplicated_names" method, now considers index and lane number
as well as SampleID and SampleProject in determining uniqueness.
2. New method "predict_output", returns a data structure describing
the expected project/sample/base file name hierarchy that would be
created using the sample sheet.
3. Added 'paired_end' attribute to the IlluminaData and
IlluminaProject classes.
* illumina2cluster/prep_sample_sheet.py
- version 0.1.0: renamed from 'update_sample_sheet.py'
- version 0.1.1: print predicted outputs for the input sample sheet.
* illumina2cluster/update_sample_sheet.py
- renamed to 'prep_sample_sheet.py'
* illumina2cluster/demultiplex_undetermined_fastq.py
- new program: reassign reads with undetermined index sequences (i.e.
barcodes) from the FASTQ files in the 'Undetermined_indices'
output directory from CASAVA.
2013-04-10 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/qcreporter.py
- version 0.1.0: added version number, and write this to report header
along with date and time of report generation.
- put the per-base quality boxplot from FastQC into the top-level
report.
* share/IlluminaData.py
- CasavaSampleSheet class: automatically remove double quotes from
around sample sheet values upon reading.
2013-04-09 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/FASTQFile.py
- version 0.2.0: added tests, new function "nreads" (counts reads in
FASTQ), and enabled FastqIterator to read data from an open
file-like object.
2013-04-08 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/IlluminaData.py
- updated IlluminaProject class: allow "Undetermined_indices" dir to
also be treated as a "project" within the class framework.
* illumina2cluster/analyse_illumina_run.py
- added --copy option, to copy specific FASTQ files to pwd.
2013-04-05 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/qcreporter.py
- new --regexp option allows selection of a subset of samples based on
regular expression pattern matching e.g. --regexp=SY[1-4]?_trim
2013-03-13 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/JobRunner.py
- update GEJobRunner and DRMAAJobRunner classes to deal with suspended
jobs.
* share/FASTQFile.py
- version 0.1.2: update FastqRead class to operate in a more efficient
"lazy" fashion.
2013-03-07 Peter Briggs <peter.briggs@manchester.ac.uk>
* utils/fastq_sniffer.py
- new utility to identify likely FASTQ file format, quality encoding
and equivalent Galaxy data type.
2013-02-19 Peter Briggs <peter.briggs@manchester.ac.uk>
* utils/extract_reads.py
- version 0.1.3: fix bug handling fastq files, was confused by quality
lines beginning with '#' character.
2013-02-18 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/update_sample_sheet.py
- fix bug in --set-id option which misidentified lanes by their number.
2013-01-29 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/update_sample_sheet.py
- new option --miseq indicates input sample sheet is in MiSeq format
(which will be converted to CASAVA format on output).
* share/IlluminaData.py
- update convert_miseq_samplesheet_to_casava to handle paired-end MiSeq
sample sheet.
- add new attribute "paired_end" to IlluminaSample objects, to indicate
whether the sample has paired end data.
* illumina2cluster/build_illumina_analysis_dir.py
- deal correctly with linking to paired end Fastq files.
2013-01-25 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/IlluminaData.py
- fix bug in convert_miseq_samplesheet_to_casava (always wrote empty
sample sheet).
2013-01-24 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/FASTQFile.py
- version 0.1.0: "casava" format now renamed to "illumina18", for
consistency with FASTQ information at
http://en.wikipedia.org/wiki/FASTQ_format
- version 0.1.1: fixed failure to read Illumina 1.8+ files that are
missing barcode sequences in the identifier string.
2013-01-23 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/IlluminaData.py
- new class CasavaSampleSheet for handling sample sheet files for input
into CASAVA.
- new function convert_miseq_samplesheet_to_casava for creating CASAVA
style sample sheet from one from a MiSeq sequencer.
* illumina2cluster/update_sample_sheet.py
- updated to use the CasavaSampleSheet class from IlluminaData.py.
2013-01-22 Peter Briggs <peter.briggs@manchester.ac.uk>
* share/FASTQFile.py
- version 0.0.2: enable FastqIterator to operate on gzipped FASTQ input.
2013-01-21 Peter Briggs <peter.briggs@manchester.ac.uk>
* utils/split_fasta.py
- version 0.1.0: substantial rewrite to enable the core functionality
to be unit tested.
* utils/extract_reads.py
- version 0.1.2: cosmetic updates to comments etc only.
2013-01-18 Peter Briggs <peter.briggs@manchester.ac.uk>
* utils/split_fasta.py
- new utility for splitting Fasta file into individual chromosomes.
2013-01-14 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/qcreporter.py
- new option --verify: reports if all expected outputs from the QC
pipeline exist for each sample, to check that the pipeline ran to
completion.
2013-01-10 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/fastq_stats.sh
- fix bug in sorting stats file, now header lines should always sort to
the top of the file.
* illumina2cluster/analyse_illumina_run.py
- first version of reporting utility for Illumina data, similar to the
"analyse_solid_run.py" in solid2cluster.
* illumina2cluster/build_illumina_analysis_dir.py
- moved --list and --report functions to new analyse_illumina_run.py
utility.
* solid2cluster/analyse_solid_run.py
- only print paths to primary data files if --report-paths option is
specified
- print timestamps for primary data files along with sample names
- --quiet option renamed to --no-warnings
2013-01-09 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/build_illumina_analysis_dir.py
- moved classes for handling Illumina data to IlluminaData.py, and take
other utility functions from bcf_utils.py
* share/Experiment.py
- moved utility functions to bcf_utils.py module
* share/IlluminaData.py
- new Python module containing classes for handling Illumina-based
sequencing data, extracted from build_illumina_analysis_dir.py.
* share/bcf_utils.py
- new Python module containing common utility functions shared between
sequencing data modules, extracted from Experiment.py.
2013-01-07 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/build_illumina_analysis_dir.py
- add --report option to pretty-print sample names within each project.
2012-12-06 Peter Briggs <peter.briggs@manchester.ac.uk>
* NGS-general/boxplotps2png.sh
- utility to generate PNGs from PS boxplots generated by qc_boxplotter.
* QC-pipeline/qcreporter.py
- updated to handle QC reporting for older SOLiD runs that predate
filtering (where only boxplots and fastq_screens are present).
2012-11-27 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/qcreporter.py
- added --qc_dir option to specify a non-default QC directory.
2012-11-26 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/rsync_seq_data.sh
- utility script wrapping rsync command for copying arbitrary sequence
data directories.
* illumina2cluster/update_sample_sheet.py
- check for empty SampleID and SampleProject names.
* QC-pipeline/illumina_qc.sh
- add --nogroup option to FastQC invocation.
- remove ".fastq" from output log file names when running with fastq.gz
input files.
* illumina2cluster/build_illumina_analysis_dir.py
- make relative (rather than absolute) symbolic links to source fastq files
when building analysis directories.
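A minimal sketch of making a relative symbolic link. The link target is computed relative to the directory that contains the link. make_relative_link is a hypothetical name and is not the build_illumina_analysis_dir.py code.

```python
import os

def make_relative_link(target: str, link_path: str) -> str:
    """Create link_path as a symlink to target using a relative path.

    The relative path is computed from the directory containing the link.
    Returns the relative path that was written into the link.
    """
    link_dir = os.path.dirname(os.path.abspath(link_path))
    rel_target = os.path.relpath(os.path.abspath(target), start=link_dir)
    os.symlink(rel_target, link_path)
    return rel_target
```

Relative links keep working when the source data and analysis directories are moved together. Absolute links (the default later adopted in solid2cluster) keep working when only the analysis directory is moved.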
2012-11-16 Peter Briggs <peter.briggs@manchester.ac.uk>
* utils/fastq_edit.py
- version 0.0.2: added --stats option to generate simple statistics
about the input FASTQ file.
2012-11-13 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/bclToFastq.sh
- added --nmismatches option (passes number of allowed mismatches to
the underlying configureBclToFastq.pl script in CASAVA).
2012-11-01 Peter Briggs <peter.briggs@manchester.ac.uk>
* utils/symlink_checker.py
- new utility for checking and updating (broken) symbolic links.
* QC-pipeline/qcreporter.py
- added --format option (explicitly specify format of base input files if
necessary) and updated automatic platform and data type detection.
* share/Spreadsheet.py
- version 0.1.6: Workbook class now issues a warning when appending to an
existing XLS file (previously warned when creating a new file).
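A minimal sketch of the checking half of symlink_checker.py: finding broken symbolic links under a directory. find_broken_links is a hypothetical name. The utility's actual options, and its link-updating behaviour, are not shown here.

```python
import os
from typing import Iterator

def find_broken_links(top: str) -> Iterator[str]:
    """Yield paths under top that are symlinks whose targets do not exist."""
    for dirpath, dirnames, filenames in os.walk(top):
        # os.walk lists dangling links in filenames and links to directories
        # in dirnames (without descending into them), so check both
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists follows the link, so it is False for a dangling link
            if os.path.islink(path) and not os.path.exists(path):
                yield path
```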
2012-10-31 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/update_sample_sheet.py
- new option --fix-duplicates automatically deals with duplicated
SampleID/SampleProject combinations; using --fix-duplicates and
--fix-spaces together should deal with most sample sheet problems
without requiring further intervention.
2012-10-18 Peter Briggs <peter.briggs@manchester.ac.uk>
* solid2cluster/analyse_solid_run.py
- --layout option now defaults to 'absolute', i.e. the generated script makes
absolute links to primary data.
* solid2cluster/build_analysis_dir.py
- default is now to make absolute links to primary data files
2012-10-16 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/update_sample_sheet.py
- added --ignore-warnings option (forces output sample sheet file to
be written out even if there are errors)
2012-10-15 Peter Briggs <peter.briggs@manchester.ac.uk>
* illumina2cluster/bclToFastq.sh
- added --use-bases-mask option (passes mask specification to the underlying
configureBclToFastq.pl script in CASAVA).
* illumina2cluster/build_illumina_analysis_dir.py
- added new options --keep-names (preserve the full names of the source fastq
files when creating links) and --merge-replicates (create merged fastq files
for each set of replicates detected).
2012-10-03 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/run_qc_pipeline.py
- added --regexp option to allow filtering of input file names.
* QC-pipeline/solid_qc.sh, illumina_qc.sh
- write data about underlying QC programs (including versions) to
<sample>.programs output files.
* QC-pipeline/qcreporter.py
- report QC program information from <sample>.programs files (if
available).
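A minimal sketch of filtering input file names with a regular expression, as the run_qc_pipeline.py --regexp option does. It assumes re.search semantics (a match anywhere in the name), which may differ from the script. filter_by_regexp is a hypothetical name.

```python
import re
from typing import Iterable, List

def filter_by_regexp(filenames: Iterable[str], pattern: str) -> List[str]:
    """Return only the file names that match pattern (re.search semantics)."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.search(name)]
```

Anchoring the pattern (for example with '^') is needed if only names that start with a given prefix should be kept.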
2012-10-02 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/qcreporter.py
- output ZIP file has run/sample-specific top-level directory; HTML
report file name restored to 'qc_report.html'.
2012-10-01 Peter Briggs <peter.briggs@manchester.ac.uk>
* QC-pipeline/qcreporter.py
- fixed bug in allocating screens to samples.