read multiple files and overlaying plots #4

Merged
merged 3 commits into from
Sep 25, 2017

Conversation

MahShaaban
Contributor

Hi @kassambara,
I have been working on a package similar to fastqcr, then I found out about yours, and I think it's well written. I would like to suggest some code to enable a visual comparison between multiple samples:

  • A function to read multiple files, qc_read_collection: a simple wrapper around your function qc_read that reads multiple FastQC files in a way consistent with the other package functions. The output of this function is an object of class qc_read_collection that can be used to make overlaid plots.
    It works along the lines of the original function, so the output is a list of tibbles, and each tibble has an extra column, sample, that tracks the original file.
# extract paths to the demo files
qc.dir <- system.file("fastqc_results", package = "fastqcr")
qc.files <- list.files(qc.dir, full.names = TRUE)

# read the 'Per sequence GC content' module from all files
qc <- qc_read_collection(qc.files,
                         sample_names = paste('S', 1:5, sep = ''),
                         modules = 'Per sequence GC content')
Output of `r head(qc)`
$per_sequence_gc_content
    sample GC Content     Count
1       S1          0      81.0
2       S1          1      44.0
3       S1          2      14.0
4       S1          3      39.5
5       S1          4      58.0
6       S1          5      78.5
7       S1          6     143.0
8       S1          7     264.5
9       S1          8     342.5
10      S1          9     427.5
  • A function to plot overlaid lines of the GC content of multiple samples. As an example, I added a simple modification of the function .plot_gc_content that produces a line graph of the GC content of multiple samples.
    Here is the output of calling plot_gc_content_collection(qc):

[Image: rplot, the line graph produced by plot_gc_content_collection(qc)]
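
For readers who want to see the idea in isolation, here is a minimal sketch in base R of the two steps described above: tagging each per-file table with a sample column, stacking the tables, and overlaying one line per sample. It does not use the code from this pull request. The helper names make_fake_gc and overlay_gc are hypothetical, and the counts are synthetic values generated only for illustration.

```r
# Hypothetical stand-in for a per-file "Per sequence GC content" table.
# The counts are synthetic (a bell curve), not real FastQC output.
make_fake_gc <- function(peak) {
  gc <- 0:100
  data.frame(`GC Content` = gc,
             Count = round(1000 * dnorm(gc, mean = peak, sd = 10), 1),
             check.names = FALSE)
}

# One table per (imaginary) FastQC file, plus one name per sample.
per_file <- list(make_fake_gc(45), make_fake_gc(50), make_fake_gc(55))
sample_names <- paste0("S", seq_along(per_file))

# Add a 'sample' column to every table, then stack them into one table.
tagged <- Map(function(df, nm) {
  data.frame(sample = nm, df, check.names = FALSE, stringsAsFactors = FALSE)
}, per_file, sample_names)
gc_all <- do.call(rbind, tagged)

# Draw one line per sample on shared axes (hypothetical helper).
overlay_gc <- function(d) {
  samples <- unique(d$sample)
  cols <- seq_along(samples)
  # Empty frame sized to the full data range, then one line per sample.
  plot(range(d[["GC Content"]]), range(d$Count), type = "n",
       xlab = "GC content (%)", ylab = "Count")
  for (i in seq_along(samples)) {
    s <- d[d$sample == samples[i], ]
    lines(s[["GC Content"]], s$Count, col = cols[i])
  }
  legend("topright", legend = samples, col = cols, lty = 1)
  invisible(samples)
}

# Only draw when run in an interactive session.
if (interactive()) overlay_gc(gc_all)
```

Keeping the sample name as a column in one long table means the plotting code never needs to know how many files were read.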

If you find these suggestions interesting, I can extend the plotting to the other modules and figure out a dispatch method for the original qc_read class and qc_read_collection.
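
One possible shape for such a dispatch is R's S3 system, sketched below. This is only an illustration under assumptions: the generic gc_lines and the toy objects are hypothetical and are not part of fastqcr. Only the two class names come from the discussion above.

```r
# Hypothetical generic; not part of fastqcr's API.
gc_lines <- function(x, ...) UseMethod("gc_lines")

# Single-file object: there is exactly one line to draw.
gc_lines.qc_read <- function(x, ...) {
  1L
}

# Collection: one line per distinct value in the 'sample' column.
gc_lines.qc_read_collection <- function(x, ...) {
  length(unique(x$per_sequence_gc_content$sample))
}

# Toy objects carrying only the class attribute and the fields used above.
one <- structure(
  list(per_sequence_gc_content = data.frame(Count = c(81, 44))),
  class = "qc_read"
)
many <- structure(
  list(per_sequence_gc_content = data.frame(sample = c("S1", "S1", "S2"),
                                            Count = c(81, 44, 90))),
  class = "qc_read_collection"
)

n_one <- gc_lines(one)    # dispatches to gc_lines.qc_read
n_many <- gc_lines(many)  # dispatches to gc_lines.qc_read_collection
```

With this layout a single user-facing function could serve both kinds of object, and each class would keep its own plotting logic.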

@MahShaaban MahShaaban mentioned this pull request Sep 25, 2017
@kassambara
Owner

Great job, man, and thank you for your contribution. Being able to plot a collection of samples will be extremely useful to the community.

Please make sure to add your name in the documentation to give you the credit.

#'@author Mahmoud Shaaban, \email{mahmoud.s.fahmy@@students.kasralainy.edu.eg}

@kassambara kassambara merged commit 7c317e7 into kassambara:master Sep 25, 2017
kassambara added a commit that referenced this pull request Sep 25, 2017
@MahShaaban
Contributor Author

I am glad you find the suggestions interesting, @kassambara. I will follow up with another pull request containing the plotting functions for the other modules.
