Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hail][feature] Add "comment" potion to `import_matrix_table` #7686

Merged
merged 3 commits into from Dec 9, 2019

Conversation

@tpoterba
Copy link
Collaborator

tpoterba commented Dec 8, 2019

cc @nawatts

@konradjk

This comment has been minimized.

Copy link
Collaborator

konradjk commented Dec 8, 2019

potion

@@ -1664,6 +1667,9 @@ def import_matrix_table(paths,
instead.
delimiter : :obj:`str`
A single character string which separates values in the file.
comment : :obj:`str` or :obj:`list` of :obj:`str`
Skip lines beginning with the given string if the string is a single

This comment has been minimized.

Copy link
@akotlar

akotlar Dec 8, 2019

Collaborator

or :obj:list of :obj:str

Is a valid comment a ["str", "str2"]? The explanation doesn't seem to match that.

This comment has been minimized.

Copy link
@tpoterba

tpoterba Dec 8, 2019

Author Collaborator

yes, that's fine -- used for two distinct comment patterns. Clarified docs.

@@ -218,7 +218,8 @@ class TextMatrixReader(MatrixReader):
has_header=bool,
separator=str,
gzip_as_bgzip=bool,
add_row_id=bool)
add_row_id=bool,
comment=sequenceof(str))

This comment has been minimized.

Copy link
@akotlar

akotlar Dec 8, 2019

Collaborator

depending on whether or not you really want to allow ["str1", "str2"], this could be str (although sequenceof also works)

This comment has been minimized.

Copy link
@akotlar

akotlar Dec 8, 2019

Collaborator

Reading further down it does seem you allow a list. My thought is that allowing either a single character or a regex is sufficient, and avoids needing to traverse (a small) list for every line. I think that operation could add up.

This comment has been minimized.

Copy link
@tpoterba

tpoterba Dec 8, 2019

Author Collaborator

This is an identical interface to import_table, where we added the possibility of multiple comment characters to provide better user interfaces. I don't think you'd be able to detect a performance cost here.

This comment has been minimized.

Copy link
@akotlar

akotlar Dec 8, 2019

Collaborator

I see, keeping interface same seems correct.

return escape_str(json.dumps(reader))

def __eq__(self, other):
return isinstance(other, MatrixBGENReader) and \

This comment has been minimized.

Copy link
@akotlar

akotlar Dec 8, 2019

Collaborator

nice catch.

case class TextTableReaderOptions(
files: Array[String],
typeMapStr: Map[String, String],
comment: Array[String],
separator: String,
missing: Set[String],
noHeader: Boolean,
hasHeader: Boolean,

This comment has been minimized.

Copy link
@akotlar

akotlar Dec 8, 2019

Collaborator

There are a few more places in this class that use noHeader:

val TextTableReaderOptions(files, _, comment, separator, missing, noHeader, impute, _, _, skipBlankLines, forceBGZ, filterAndReplace, forceGZ) = options

-> hasHeader

val preColumns = if (noHeader) {

(noHeader || line.value != header) &&

the last 2 !hasHeader

This comment has been minimized.

Copy link
@tpoterba

tpoterba Dec 9, 2019

Author Collaborator

thanks! That's why this was failing...

Copy link
Collaborator

akotlar left a comment

Check for noHeader use

fix
@akotlar

This comment has been minimized.

Copy link
Collaborator

akotlar commented Dec 9, 2019

If you dismiss I’ll approve

@akotlar

This comment has been minimized.

Copy link
Collaborator

akotlar commented Dec 9, 2019

I’ll just approve

@akotlar
akotlar approved these changes Dec 9, 2019
@danking danking merged commit db11ef4 into hail-is:master Dec 9, 2019
1 check passed
1 check passed
ci-test success
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.