IO position not updated while reading a CSV file #195
Comments
Could you show your use case of |
Yes. Basically it looks like this:
# file = File.expand_path(File.join(File.dirname(__FILE__), file_name))
positions = []
CSV.open(file, 'rb', col_sep: ';', encoding: 'ISO-8859-1') do |csv|
  pos = csv.pos # or csv.tell
  # Store the position read before each row, then read it again once the row is consumed.
  csv.each { |_| positions << pos; pos = csv.pos }
end
I have a full example of the test in a gist. |
Thanks. |
Could you also show the |
In my application I have an autocomplete dropdown for city zip codes, backed by a CSV file. So, to search that file efficiently, we use the For example: we know that in that file Moscow is at position 45 and Monrovia is at position 367. Our index is built from the chunks of Moscow and Monrovia like this:
So when someone enters M or Mo, we can propose the Moscow and Monrovia lines, because we know that M and Mo occur on the lines at positions 45 and 367. Does that make sense? |
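One way to build such a prefix-to-position index without relying on CSV#pos is to read the file line by line through a plain File object and record IO#pos before each line. This is a minimal sketch, not the approach used in the application above: the helper names build_prefix_index and row_at, the max_prefix limit, and the assumption that the city name is the first column are all hypothetical. It also assumes that no field contains an embedded newline, since it treats each physical line as one row.

```ruby
require 'csv'

# Hypothetical helper: map each lowercase name prefix to the byte offsets
# of the lines whose first column starts with that prefix.
def build_prefix_index(path, col_sep: ';', max_prefix: 3)
  index = Hash.new { |hash, key| hash[key] = [] }
  File.open(path, 'rb') do |io|
    until io.eof?
      offset = io.pos                 # byte offset where this line starts
      row = CSV.parse_line(io.gets, col_sep: col_sep)
      next if row.nil? || row.first.nil?
      name = row.first
      1.upto([max_prefix, name.length].min) do |len|
        index[name[0, len].downcase] << offset
      end
    end
  end
  index
end

# Hypothetical helper: seek to a stored offset and parse only that line.
def row_at(path, offset, col_sep: ';')
  File.open(path, 'rb') do |io|
    io.seek(offset)
    CSV.parse_line(io.gets, col_sep: col_sep)
  end
end
```

A lookup then fetches the offsets for the typed prefix (for example index.fetch('mo', [])) and calls row_at for each one, so only the matching lines are read and parsed.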
@kou here is an output of the script.
|
How about using We may read a large chunk at once for performance instead of reading each line. So I don't want to recommend users that they expect |
Yes we could have used Please could you show me some valid examples of If |
How about caching a parsed CSV object in memory instead of using I don't think |
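As a rough illustration of the in-memory idea, the sketch below parses the CSV once and answers prefix queries from the cached rows. The CityCache class, its suggest method, and the assumption that the city name is the first column are hypothetical, and the memory cost grows with the size of the file.

```ruby
require 'csv'

# Hypothetical cache: parse the CSV text once, then serve lookups from memory.
class CityCache
  def initialize(csv_text, col_sep: ';')
    @rows = CSV.parse(csv_text, col_sep: col_sep)
  end

  # Return every cached row whose first column starts with the given prefix,
  # ignoring case.
  def suggest(prefix)
    needle = prefix.downcase
    @rows.select { |row| row.first.to_s.downcase.start_with?(needle) }
  end
end
```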
Yes, it will be faster for sure, but we have a really big CSV file, so I am wondering about memory growth. Anyway, thanks for the advice; we will experiment with it. Then I guess the changes about But I am still waiting for some valid examples of those IO-delegated methods (such as |
I think it's time to switch to another approach in your situation, such as using a DB built from the CSV. I'm closing this. I don't think most of those IO-delegated methods are useful. They still exist for historical reasons. They may be useful with |
While reading CSV files, the pos method always returns the first line's position. It seems the issue appeared with this commit (found using git bisect).
To illustrate it, I made a test file.
I also made a fork on my own repo with the test file and a CSV file (in test/csv/proof) and tried to fix it.
My solution is to pass the @input instance down from Parser to Scanner and update the position while iterating within the each_line method.
I wanted to make sure it was an issue (or maybe misuse on my part) before opening a Pull Request.
I am using Ruby 2.7.2
Best regards,
Emmanuel KONZI.