Implement new file reader #16

markbt · 2020-01-12T20:10:23Z

Implement a new reader for on-disk files, that reads the data out of the file
rather than mmapping it.

It's a bit simple right now, so it reloads each line from the file whenever it
wants to display it. It would be better to go through some kind of LRU block
cache for loading chunks of the file. The cache can be flushed whenever we
detect a reload is necessary.

The heuristic for append-vs-reload is whether the last 4k of the file has
changed or not. Also, any time we try to parse a line and fine a newline
in the middle of the line, we trigger a full reload.

This should help with #8 and #9, as we now watch the file and load new contents
if the file is appended to, or reload the file if the contents change or the
file is truncated.

The mmap implementation is retained in case it will be useful, but for now is
unused.

quark-zju · 2020-01-12T23:49:45Z

I wasn't aware of this. I did something similar at quark-zju@305b3ac. It has some caching support.

jsgf · 2020-01-15T22:14:59Z

How much does/would caching help? The kernel is already doing all that for you, so all you're saving is the cost of the syscalls themselves. There can't be that many of them since you'd be limited by the user's reading rate (and bulk operations like search can amortize the cost with large reads).

quark-zju · 2020-01-17T06:03:42Z

I'm not sure. Each line will trigger a read call. It could be 100+ lines. Search also seems to trigger one read per line due to the current API design.

Implement a new reader for on-disk files, that reads the data out of the file rather than mmapping it. Reads to the file for display and search go through an LRU cache of 1MB blocks. If a file change is detected, then depending on whether or not the file has been appended to or rewritten, streampager will load the remaining data or flush its caches and reload the file. The heuristic for append-vs-reload is whether the last 4k of the file has changed or not, or if any time we try to parse a line we find a newline in the middle of the line. This replaces the mmap implementation, giving a better experience for looking at files on disk. The mmap implementation is retained in case it will be useful, but for now is unused. Fixes #8 Fixes #9

markbt · 2020-01-18T19:21:19Z

I added cache support, so now this is probably ready to go. Any input welcome before I merge it.

The notify behaviour seems a bit unreliable, but when it works, it's quite nice.

markbt changed the title ~~implement new file reader~~ Implement new file reader Jan 12, 2020

markbt force-pushed the new-files branch from 4e13596 to 4593ec7 Compare January 12, 2020 20:17

markbt force-pushed the new-files branch 2 times, most recently from d0f7968 to 914af4d Compare January 18, 2020 18:55

markbt force-pushed the new-files branch from 914af4d to bfb5743 Compare January 18, 2020 19:17

markbt merged commit a0a734e into master Jan 19, 2020

markbt deleted the new-files branch January 19, 2020 10:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement new file reader #16

Implement new file reader #16

markbt commented Jan 12, 2020

quark-zju commented Jan 12, 2020

jsgf commented Jan 15, 2020

quark-zju commented Jan 17, 2020

markbt commented Jan 18, 2020

Implement new file reader #16

Implement new file reader #16

Conversation

markbt commented Jan 12, 2020

quark-zju commented Jan 12, 2020

jsgf commented Jan 15, 2020

quark-zju commented Jan 17, 2020

markbt commented Jan 18, 2020