Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix empty tabular output error when using discover_datasets. #4240

Merged

Conversation

Projects
None yet
6 participants
@pkrog
Copy link
Contributor

commented Jun 27, 2017

Fix issue #4231 .
I propose this pull request solving an issue for my tool mtbls-dwnld that may at times output empty tabular files (header line only, no data).
Where I have made correction in the code, I've not seen any check about files having at least one line of data (maybe I've missed that), so I've supposed there is no requirements in Galaxy for output files not being empty. Please, tell me if I'm wrong.

@galaxybot galaxybot added the triage label Jun 27, 2017

@galaxybot galaxybot added this to the 17.09 milestone Jun 27, 2017

@@ -0,0 +1 @@
/home/pierrick/dev/mtbls-dwnld

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jun 27, 2017

Member

I suppose this has been added by accident.

@pkrog

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2017

Oops... you are right @mvdbeek , I've removed the incriminated file. Thanks.

@jmchilton

This comment has been minimized.

Copy link
Member

commented Jul 5, 2017

This makes good sense to me - if the tests pass I'll merge it. Thanks a ton for the contribution @pierrickrogermele - it looks nice and it is really appreciated.

@nsoranzo

This comment has been minimized.

Copy link
Member

commented Jul 5, 2017

I have some suggestions, please wait before merging.

if data_row is None:
dataset.metadata.columns = len( header_row )
else:
dataset.metadata.columns = max( len( header_row ), len( data_row ) )
dataset.metadata.column_names = header_row
dataset.metadata.delimiter = reader.dialect.delimiter

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Jul 5, 2017

Member

Some of the code above (e.g. line 997) would fail if the dataset is completely empty. This is how I would write this method (not tested):

    def set_meta( self, dataset, **kwd ):
        column_types = []
        header_row = []
        data_row = []
        data_lines = 0
        if dataset.has_data():
            with open(dataset.file_name, 'r') as csvfile:
                # Parse file with the correct dialect
                reader = csv.reader(csvfile, self.dialect)
                try:
                    header_row = next(reader)
                    data_row = next(reader)
                    for row in reader:
                        pass
                except StopIteration:
                    pass
                except csv.Error as e:
                    raise Exception('CSV reader error - line %d: %s' % (reader.line_num, e))
                else:
                    data_lines = reader.line_num - 1

        # Guess column types
        for cell in data_row:
            column_types.append(self.guess_type(cell))

        # Set metadata
        dataset.metadata.data_lines = data_lines
        dataset.metadata.comment_lines = int(bool(header_row))
        dataset.metadata.column_types = column_types
        dataset.metadata.columns = max( len( header_row ), len( data_row ) )
        dataset.metadata.column_names = header_row
        dataset.metadata.delimiter = self.dialect.delimiter

This comment has been minimized.

Copy link
@pkrog

pkrog Jul 6, 2017

Author Contributor

Yes, this looks good, I'll try these modifications.

@pkrog

This comment has been minimized.

Copy link
Contributor Author

commented Jul 6, 2017

Could someone, please, enlighten me about the failure of api test:

FAIL: test_list_download (api.test_dataset_collections.DatasetCollectionApiTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/galaxy/test/api/test_dataset_collections.py", line 108, in test_list_download
    assert len(namelist) == 3, "Expected 3 elements in [%s]" % namelist
AssertionError: Expected 3 elements in [['Test Dataset Collection/data1.txt', 'Test Dataset Collection/data2.txt']]
@mvdbeek

This comment has been minimized.

Copy link
Member

commented Jul 6, 2017

Could someone, please, enlighten me about the failure of api test:

Sorry about the confusion. This is not related to your PR, you can ignore this. We know it fails every now and then and it should be fixed by #4281. We will merge PRs that fail this test, although this problem is likely solved now.

dataset.metadata.column_types = column_types
dataset.metadata.columns = max( len( header_row ), len( data_row ) )
dataset.metadata.column_names = header_row
dataset.metadata.delimiter = self.dialect.delimiter

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jul 6, 2017

Member

Can you add anther blank line below this line?
Classes should be separated by two blank lines. This is needed for the travis linting to pass.

This comment has been minimized.

Copy link
@pkrog

pkrog Jul 6, 2017

Author Contributor

Ok.

@galaxyproject galaxyproject deleted a comment from jmchilton Jul 6, 2017

@nsoranzo nsoranzo merged commit ba5549b into galaxyproject:dev Jul 6, 2017

5 checks passed

api test Build finished. 279 tests run, 0 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 150 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 37 tests run, 0 skipped, 0 failed.
Details
toolshed test Build finished. 579 tests run, 0 skipped, 0 failed.
Details
@nsoranzo

This comment has been minimized.

Copy link
Member

commented Jul 6, 2017

Thanks @pierrickrogermele!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.