line number approximation of tabular data #6506

bernt-matthias · 2018-07-17T11:05:11Z

For tabular data (GTF in my examples), an approximate number of lines is shown. This approximation is often quite far off, e.g.:

~30,000,000 lines for a data set with 28,147,144 lines
~8,700,000 lines for a data set with 8,232,791 lines

I would like to know whether there is a technical reason for this (I could not find anything in the code). If not, I suggest one of the following solutions:

actually show a shorter representation like ~30 * 10^6
show the precise number (which would use the same space)
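To make the two options concrete, here is a minimal Python sketch. The function names `compact_label` and `exact_label` are hypothetical and not part of Galaxy; the compact form assumes a two-digit mantissa and a non-negative count.

```python
def compact_label(estimate: int) -> str:
    """Render an estimate as a two-digit mantissa times a power of ten."""
    if estimate < 100:
        # Small numbers need no shortening.
        return f"~{estimate}"
    exponent = len(str(estimate)) - 2
    scale = 10 ** exponent
    # Integer round-half-up to two significant digits.
    mantissa = (estimate + scale // 2) // scale
    if mantissa == 100:
        # Rounding carried into a third digit, e.g. 99,600,000.
        mantissa, exponent = 10, exponent + 1
    return f"~{mantissa} * 10^{exponent}"


def exact_label(count: int) -> str:
    """Render the precise count with thousands separators."""
    return f"{count:,}"


if __name__ == "__main__":
    print(compact_label(30_000_000))
    print(exact_label(28_147_144))
```

Note that `exact_label(28_147_144)` is one character shorter than the currently displayed `~30,000,000`, so the precise number does not need more room.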

If I knew where to look and had a suggestion for how to solve this, I would volunteer.


tougai · 2019-02-11T08:53:49Z

I have experienced the same problem, i.e. different approximations when extracting the first column of a tabular file with cut (~3,300,000) compared with the original file (~2,800,000). When I look at the files and run wc -l on the command line, I get the same number (2,711,716) for both .dat files.
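For reference, the figure that `wc -l` reports is the number of newline bytes in the file. Below is a minimal Python sketch of the same exact count, reading in chunks so that large files are not loaded into memory at once. The function names and the chunk size are illustrative assumptions, not Galaxy code.

```python
import io
from typing import BinaryIO


def count_newlines(stream: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes in a binary stream, like `wc -l` does."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # An empty read means end of stream.
            return total
        total += chunk.count(b"\n")


def count_file_lines(path: str) -> int:
    """Exact line count of a file on disk."""
    with open(path, "rb") as handle:
        return count_newlines(handle)


if __name__ == "__main__":
    sample = io.BytesIO(b"chr1\tgene\nchr2\tgene\n")
    print(count_newlines(sample))
```

As with `wc -l`, a final line without a trailing newline is not counted.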

bernt-matthias · 2019-02-12T22:10:24Z

Just found the code used for the estimation:

galaxy/lib/galaxy/datatypes/data.py

Line 787 in 4b92f95

def estimate_file_lines(self, dataset):

Fixes galaxyproject#6506. For large files the number of lines is estimated and shown as a rounded number (using two significant digits), e.g. `~8,700,000 lines`. With this change it will be `~87 10^5 lines`. This commit also makes roundify really round numbers (as the name suggests) rather than simply cutting at two digits, but this could be reverted if there are concerns about speed due to the extra math.
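The difference between cutting at two digits and actually rounding to two significant digits can be sketched as follows. These helpers are hypothetical illustrations for non-negative integers; they are not Galaxy's `roundify` implementation.

```python
def truncate_two_digits(n: int) -> int:
    """Keep the first two digits and zero the rest (always rounds down)."""
    if n < 100:
        return n
    scale = 10 ** (len(str(n)) - 2)
    return n // scale * scale


def round_two_digits(n: int) -> int:
    """Round half-up to two significant digits using integer math only."""
    if n < 100:
        return n
    scale = 10 ** (len(str(n)) - 2)
    return (n + scale // 2) // scale * scale


if __name__ == "__main__":
    estimate = 8_790_000
    print(truncate_two_digits(estimate), round_two_digits(estimate))
```

Truncation always underestimates, by up to one unit of the second digit; rounding halves that worst-case error at the cost of one extra addition and integer division.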

bernt-matthias mentioned this issue May 13, 2022

Make columns types an empty list for empty tabular data #13918

Merged


dannon closed this as completed in 7abb163 Mar 5, 2024
