Add average quality score to stats #411

Closed
apcamargo opened this issue Sep 27, 2023 · 5 comments

Comments

@apcamargo

Right now seqkit stats displays the percentage of bases with a quality score over 20 and over 30. It would also be useful to report the average quality score, since this can be used for quality control.
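For readers unfamiliar with the Q20 and Q30 metrics mentioned here, the following is a minimal sketch of how such percentages can be computed from FASTQ quality strings. It is not seqkit's code: the function name is hypothetical, Phred+33 encoding is assumed, and treating the threshold as inclusive (score >= 20) is also an assumption.

```python
from typing import Iterable


def percent_bases_at_least(quals: Iterable[str], threshold: int, offset: int = 33) -> float:
    """Percentage of bases whose Phred score is >= threshold.

    `quals` holds FASTQ quality strings; `offset` is the ASCII offset
    (33 is assumed here, i.e. Phred+33 encoding).
    """
    total = 0
    passing = 0
    for qual in quals:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    # An input with no bases yields 0 instead of dividing by zero.
    return 100.0 * passing / total if total else 0.0


# Made-up quality strings: 'I' is Q40, '5' is Q20, '+' is Q10.
example_quals = ["IIII5555", "++++"]
q20 = percent_bases_at_least(example_quals, 20)  # 8 of 12 bases pass
q30 = percent_bases_at_least(example_quals, 30)  # 4 of 12 bases pass
print(round(q20, 2), round(q30, 2))
```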

@shenwei356
Owner

seqkit seq has a flag for filtering reads by average quality:

-Q, --min-qual float            only print sequences with average quality greater or equal than this
                                limit (-1 for no limit) (default -1)

seqkit fx2tab also has a flag that prints each read's average quality, which can be used to filter records before piping to seqkit tab2fx:

  -q, --avg-qual               print average quality of a read

@apcamargo
Author

I don't think those replace computing the average quality across a whole FASTQ file. I could get that with fx2tab by averaging the per-read average quality over all reads, weighted by read length.
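The workaround described in this comment can be sketched as follows. This is only an illustration: the function name is hypothetical, and the input is assumed to be (read length, per-read average quality) pairs that have already been extracted, for example from tabular fx2tab output. How per-read averages are defined is not stated in the thread, so the result may not match any particular tool's file-level figure.

```python
from typing import Iterable, Tuple


def length_weighted_mean_quality(reads: Iterable[Tuple[int, float]]) -> float:
    """Length-weighted mean of per-read average qualities.

    Each item is (read_length, per_read_average_quality). Longer reads
    contribute proportionally more bases, so they get more weight.
    """
    total_len = 0
    weighted_sum = 0.0
    for length, avg_qual in reads:
        total_len += length
        weighted_sum += length * avg_qual
    # No bases at all: return 0 instead of dividing by zero.
    if total_len == 0:
        return 0.0
    return weighted_sum / total_len


# Made-up example: a short high-quality read and a long low-quality read.
example_reads = [(100, 30.0), (300, 10.0)]
print(length_weighted_mean_quality(example_reads))  # 15.0
```

An unweighted mean of the same two reads would be 20.0, which shows why weighting by length matters when read lengths vary.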

@shenwei356
Owner

Oh, people often use Q20 and Q30, but I don't know how many use the average quality score of the whole FASTQ file. Technically, the metric is very easy to add.

shenwei356 added a commit that referenced this issue Sep 29, 2023
@shenwei356
Owner

Added.

$  seqkit stats *.f{a,q}.gz -a
processed files:  6 / 6 [======================================] ETA: 0s. done
file               format  type  num_seqs    sum_len  min_len  avg_len  max_len   Q1   Q2   Q3  sum_gap  N50  Q20(%)  Q30(%)  AvgQual  GC(%)
hairpin.fa.gz      FASTA   RNA     28,645  2,949,871       39      103    2,354   76   91  111        0  101       0       0        0  45.77
mature.fa.gz       FASTA   RNA     35,828    781,222       15     21.8       34   21   22   22        0   22       0       0        0   47.6
Illimina1.8.fq.gz  FASTQ   DNA     10,000  1,500,000      150      150      150  150  150  150        0  150   96.16   89.71    24.82  49.91
nanopore.fq.gz     FASTQ   DNA      4,000  1,798,723      153    449.7    6,006  271  318  391        0  395   40.79   12.63     9.48  46.66
reads_1.fq.gz      FASTQ   DNA      2,500    567,516      226      227      229  227  227  227        0  227   91.24   86.62    15.45  53.63
reads_2.fq.gz      FASTQ   DNA      2,500    560,002      223      224      225  224  224  224        0  224   91.06   87.66    14.62  54.77
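
The AvgQual values in this output sit well below what the Q20 and Q30 percentages might suggest (for example, 24.82 for a file with 89.71% of bases at Q30). The thread does not say how AvgQual is computed. One possible explanation, stated here as an assumption and not confirmed above, is that Phred scores are averaged as error probabilities instead of arithmetically. The sketch below contrasts the two definitions on made-up quality strings; the function names are hypothetical and Phred+33 encoding is assumed.

```python
import math
from typing import Iterable, List


def phred_scores(quals: Iterable[str], offset: int = 33) -> List[int]:
    """Flatten quality strings into a list of Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for qual in quals for ch in qual]


def arithmetic_mean_quality(quals: Iterable[str]) -> float:
    """Plain mean of the Phred scores over all bases."""
    scores = phred_scores(quals)
    return sum(scores) / len(scores) if scores else 0.0


def error_prob_mean_quality(quals: Iterable[str]) -> float:
    """Convert each score to an error probability, average, convert back."""
    scores = phred_scores(quals)
    if not scores:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)


# Made-up data: four Q40 bases ('I') and four Q10 bases ('+').
example_quals = ["IIII", "++++"]
print(arithmetic_mean_quality(example_quals))   # 25.0
print(error_prob_mean_quality(example_quals))   # roughly 13
```

Because low-quality bases dominate the mean error probability, the second definition yields a much lower figure than the arithmetic mean on the same data.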

@apcamargo
Author

Thank you!!
