Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hail][feature] Add "comment" potion to import_matrix_table #7686

Merged
merged 3 commits into from
Dec 9, 2019

Conversation

tpoterba
Copy link
Contributor

@tpoterba tpoterba commented Dec 8, 2019

@konradjk
Copy link
Collaborator

konradjk commented Dec 8, 2019

potion

@@ -1664,6 +1667,9 @@ def import_matrix_table(paths,
instead.
delimiter : :obj:`str`
A single character string which separates values in the file.
comment : :obj:`str` or :obj:`list` of :obj:`str`
Skip lines beginning with the given string if the string is a single
Copy link
Contributor

@akotlar akotlar Dec 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or :obj:list of :obj:str

Is a valid comment a ["str", "str2"]? The explanation doesn't seem to match that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's fine -- used for two distinct comment patterns. Clarified docs.

@@ -218,7 +218,8 @@ class TextMatrixReader(MatrixReader):
has_header=bool,
separator=str,
gzip_as_bgzip=bool,
add_row_id=bool)
add_row_id=bool,
comment=sequenceof(str))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depending on whether or not you really want to allow ["str1", "str2"], this could be str (although sequenceof also works)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading further down it does seem you allow a list. My thought is that allowing either a single character or a regex is sufficient, and avoids needing to traverse (a small) list for every line. I think that operation could add up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an identical interface to import_table, where we added the possibility of multiple comment characters to provide better user interfaces. I don't think you'd be able to detect a performance cost here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, keeping interface same seems correct.

return escape_str(json.dumps(reader))

def __eq__(self, other):
return isinstance(other, MatrixBGENReader) and \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch.

case class TextTableReaderOptions(
files: Array[String],
typeMapStr: Map[String, String],
comment: Array[String],
separator: String,
missing: Set[String],
noHeader: Boolean,
hasHeader: Boolean,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few more places in this class that use noHeader:

val TextTableReaderOptions(files, _, comment, separator, missing, noHeader, impute, _, _, skipBlankLines, forceBGZ, filterAndReplace, forceGZ) = options

-> hasHeader

val preColumns = if (noHeader) {

(noHeader || line.value != header) &&

the last 2 !hasHeader

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! That's why this was failing...

Copy link
Contributor

@akotlar akotlar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for noHeader use

@akotlar
Copy link
Contributor

akotlar commented Dec 9, 2019

If you dismiss I’ll approve

@akotlar
Copy link
Contributor

akotlar commented Dec 9, 2019

I’ll just approve

@danking danking merged commit db11ef4 into hail-is:master Dec 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants