line number approximation of tabular data #6506
Comments
I have experienced the same problem, i.e. different approximations when extracting the first column of a tabular file with cut (~3,300,000) and for the original file (~2,800,000). When I run wc -l on the command line, I get the same number (2,711,716) for both .dat files.
Just found the code used for the estimation: galaxy/lib/galaxy/datatypes/data.py Line 787 in 4b92f95
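For readers without the source open, here is a minimal sketch of how a sample-based line estimate commonly works. It is not the code at the referenced line; the function name `estimate_line_count` and the `sample_bytes` parameter are hypothetical and chosen only for illustration. The idea is to count newlines in a leading chunk of the file and extrapolate by file size.

```python
import os


def estimate_line_count(path, sample_bytes=1024 * 1024):
    """Estimate the number of lines from the newline density of a leading sample.

    Hypothetical helper for illustration, not Galaxy's implementation.
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as handle:
        sample = handle.read(sample_bytes)
    if not sample:
        return 0
    newlines = sample.count(b"\n")
    if len(sample) >= file_size:
        # The whole file fit in the sample, so the count is exact
        # (add one for a final line without a trailing newline).
        return newlines + (0 if sample.endswith(b"\n") else 1)
    # Extrapolate: assume the rest of the file has the same average line length.
    return int(newlines * file_size / len(sample))
```

An estimator of this kind is only as good as its assumption that the sampled lines are representative. If lines near the start of a file are shorter or longer than average, the extrapolated count drifts away from what `wc -l` reports, and two files with identical line counts but different line lengths can end up with different estimates.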
Fixes galaxyproject#6506. For large files the number of lines is estimated and shown as a rounded number (using two significant digits), e.g. `~8,700,000 lines`. With this change it will be `~87 10^5 lines`. This commit also makes roundify actually round numbers (as the name suggests) rather than simply cutting at two digits; this could be reverted if there are concerns about speed due to the extra math.
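To make the difference between cutting and rounding concrete, here is a small sketch. The names `truncate_sig`, `round_sig`, and `format_estimate` are hypothetical and are not taken from the commit; the output format simply follows the `~87 10^5 lines` example given in the commit message. Non-negative integer inputs are assumed.

```python
def truncate_sig(n, digits=2):
    """Keep the first `digits` digits and zero the rest (no rounding)."""
    s = str(int(n))
    if len(s) <= digits:
        return int(s)
    return int(s[:digits] + "0" * (len(s) - digits))


def round_sig(n, digits=2):
    """Round a non-negative integer to `digits` significant digits (half up)."""
    n = int(n)
    length = len(str(n))
    if length <= digits:
        return n
    scale = 10 ** (length - digits)
    return (n + scale // 2) // scale * scale


def format_estimate(n, digits=2):
    """Render an estimate as '~MM 10^E lines' (format assumed from the commit message)."""
    rounded = round_sig(n, digits)
    exponent = max(len(str(rounded)) - digits, 0)
    mantissa = rounded // 10 ** exponent
    return f"~{mantissa} 10^{exponent} lines"


print(truncate_sig(8_750_000))     # cut at two digits
print(round_sig(8_750_000))        # rounded to two significant digits
print(format_estimate(8_700_000))  # compact display form
```

Note that rounding the exact count 28,147,144 to two significant digits gives 28,000,000, so a displayed value of `~30,000,000` cannot be explained by rounding alone; the deviation has to come from the estimate itself.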
For tabular data (in my examples GTF) an approximate number of lines is shown. This approximation is often quite off, e.g.

- `~30,000,000 lines` for a data set with 28,147,144 lines
- `~8,700,000 lines` for a data set with 8,232,791 lines

I would like to know if there is a technical reason for this (I could not find one in the code). If not, I suggest one of the following solutions:

- `~30 * 10^6`
If someone can point me to where this happens and suggest how to solve it, I would volunteer.