Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal MGF export functions gain parameter addFields #362

Merged
merged 3 commits into from
Oct 12, 2018
Merged

Conversation

jorainer
Copy link
Collaborator

Add parameter addFields to internal writeMgfDataFile and writeMgfContent to support addition of arbitrary additional fields in the MGF output file.

- Add parameter addFields to internal writeMgfDataFile and writeMgfContent to
  support addition of arbitrary additional fields in the MGF output file.
- Add related unit tests.
@jorainer jorainer requested review from lgatto and sgibb October 10, 2018 08:44
Copy link
Collaborator

@sgibb sgibb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

if (class(con) == "character" && file.exists(con)) {
message("Overwriting ", con, "!")
unlink(con)
}

if (length(addFields)) {
if (is.null(dim(addFields)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you don't use is.matrix(addFields) || is.data.frame(addFields) (I guess because you want to support S4Vectors::DataFrame and so on)? But than I think length(dim(addFields)) == 2 would be safer. I don't know why anybody want to try this but currently your code accepts addFields = array(1:27, dim=c(3, 3, 3)). as.matrix in the following line will turn it in a 27x1 matrix which will (hopefully) cause an error at nrow(addFields) != length(splist).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @sgibb ! Thanks! I'll fix.

stop("Column names required on 'addFields'.")
if (nrow(addFields) != length(splist))
stop("nrow of 'addFields' has to match length of 'splist'")
haveAddFields <- TRUE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this additional variable haveAddFields? Would not is.null(addFields) be enough in line 63 (if (haveAddFields)). But even this would not be needed at all (see below).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't. Usually I add a variable instead of calling is.null 2 or more times on the same object in a function. Doesn't apply here, but if the function would be called in a loop this can have a performance effect.
But good point with the NULL subsetting! Actually never tried subsetting NULL 😄

@@ -45,13 +60,17 @@ writeMgfDataFile <- function(splist, con, COM = NULL, TITLE = NULL,
if (verbose)
setTxtProgressBar(pb, i)

writeMgfContent(splist[[i]], TITLE = NULL, con = con)
if (haveAddFields)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this if statement here. If we reach this point addFields is a matrix or NULL. And you can subset NULL:

f <- function(n, addFields=NULL) {
    for (i in seq_len(n)) {
        print(addFields[i, ])
    }
}

f(3)
# NULL
# NULL
# NULL

f(3, matrix(1:6, ncol=2, dimnames=list(NULL, c("FOO", "BAR"))))
# FOO BAR 
#   1   4 
# FOO BAR 
#   2   5 
# FOO BAR 
#   3   6 

While it looks a little bit odd/dirty the workhorse function writeMgfContent will test for NULL itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

R/readWriteMgfData.R Show resolved Hide resolved
}
if (verbose)
close(pb)
}

writeMgfContent <- function(sp, TITLE = NULL, con) {
writeMgfContent <- function(sp, TITLE = NULL, con, addFields = TRUE) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the default value TRUE for addFields here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ups, well, me neither. I'll change to addFields = NULL.

@@ -83,6 +102,11 @@ writeMgfContent <- function(sp, TITLE = NULL, con) {
}
} else .cat("\nRTINSECONDS=", rtime(sp))

if (length(addFields)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the nested if statement: length(addFields) && !is.null(names(addFields)) or is.vector(addFields) && !is.null(names(addFields))?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. Will change.

if (length(addFields)) {
if (!is.null(names(addFields)))
.cat(paste0("\n", paste(toupper(names(addFields)),
addFields, sep = "="), collapse = ""))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.cat("\n", paste(toupper(names(addFields)), addFields, sep = "=", collapse = "\n"))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, better that way. Thanks.

if (length(dim(addFields)) != 2)
stop("'addFields' has to be a matrix or data.frame.")
if (!is.matrix(addFields))
addFields <- do.call(cbind, lapply(addFields, as.character))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really hate as.matrix for calling format and padding all characters with whitespaces. Instead of do.call and lapply we could use addFields <- trimws(as.matrix(d)) here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: Internal as.matrix.data.frame uses a for loop. So depends on your style whether you like the do.call, cbind, lapply or trimws(as.matrix()) approach.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather take the do.call, cbind, lapply avenue - it's faster then the trimws(as.matrix). But thanks for the suggestion.

@sgibb sgibb merged commit 6ac413a into master Oct 12, 2018
@jorainer jorainer deleted the write_mgf branch October 12, 2018 07:46
@lgatto
Copy link
Owner

lgatto commented Oct 12, 2018

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants