-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hail][feature] Add "comment" potion to import_matrix_table
#7686
Conversation
@@ -1664,6 +1667,9 @@ def import_matrix_table(paths, | |||
instead. | |||
delimiter : :obj:`str` | |||
A single character string which separates values in the file. | |||
comment : :obj:`str` or :obj:`list` of :obj:`str` | |||
Skip lines beginning with the given string if the string is a single |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or :obj:
list
of :obj:str
Is a valid comment a ["str", "str2"]? The explanation doesn't seem to match that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's fine -- used for two distinct comment patterns. Clarified docs.
@@ -218,7 +218,8 @@ class TextMatrixReader(MatrixReader): | |||
has_header=bool, | |||
separator=str, | |||
gzip_as_bgzip=bool, | |||
add_row_id=bool) | |||
add_row_id=bool, | |||
comment=sequenceof(str)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depending on whether or not you really want to allow ["str1", "str2"], this could be str (although sequenceof also works)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading further down it does seem you allow a list. My thought is that allowing either a single character or a regex is sufficient, and avoids needing to traverse (a small) list for every line. I think that operation could add up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an identical interface to import_table
, where we added the possibility of multiple comment characters to provide better user interfaces. I don't think you'd be able to detect a performance cost here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, keeping interface same seems correct.
return escape_str(json.dumps(reader)) | ||
|
||
def __eq__(self, other): | ||
return isinstance(other, MatrixBGENReader) and \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch.
case class TextTableReaderOptions( | ||
files: Array[String], | ||
typeMapStr: Map[String, String], | ||
comment: Array[String], | ||
separator: String, | ||
missing: Set[String], | ||
noHeader: Boolean, | ||
hasHeader: Boolean, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few more places in this class that use noHeader:
val TextTableReaderOptions(files, _, comment, separator, missing, noHeader, impute, _, _, skipBlankLines, forceBGZ, filterAndReplace, forceGZ) = options |
-> hasHeader
val preColumns = if (noHeader) { |
(noHeader || line.value != header) && |
the last 2 !hasHeader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! That's why this was failing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check for noHeader use
If you dismiss I’ll approve |
I’ll just approve |
cc @nawatts