Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new file reader #16

Merged
merged 1 commit into from
Jan 19, 2020
Merged

Implement new file reader #16

merged 1 commit into from
Jan 19, 2020

Conversation

markbt
Copy link
Owner

@markbt markbt commented Jan 12, 2020

Implement a new reader for on-disk files, that reads the data out of the file
rather than mmapping it.

It's a bit simple right now, so it reloads each line from the file whenever it
wants to display it. It would be better to go through some kind of LRU block
cache for loading chunks of the file. The cache can be flushed whenever we
detect a reload is necessary.

The heuristic for append-vs-reload is whether the last 4k of the file has
changed or not. Also, any time we try to parse a line and fine a newline
in the middle of the line, we trigger a full reload.

This should help with #8 and #9, as we now watch the file and load new contents
if the file is appended to, or reload the file if the contents change or the
file is truncated.

The mmap implementation is retained in case it will be useful, but for now is
unused.

@markbt markbt changed the title implement new file reader Implement new file reader Jan 12, 2020
@quark-zju
Copy link
Contributor

I wasn't aware of this. I did something similar at quark-zju@305b3ac. It has some caching support.

@jsgf
Copy link

jsgf commented Jan 15, 2020

How much does/would caching help? The kernel is already doing all that for you, so all you're saving is the cost of the syscalls themselves. There can't be that many of them since you'd be limited by the user's reading rate (and bulk operations like search can amortize the cost with large reads).

@quark-zju
Copy link
Contributor

I'm not sure. Each line will trigger a read call. It could be 100+ lines. Search also seems to trigger one read per line due to the current API design.

@markbt markbt force-pushed the new-files branch 2 times, most recently from d0f7968 to 914af4d Compare January 18, 2020 18:55
Implement a new reader for on-disk files, that reads the data out of the file
rather than mmapping it.

Reads to the file for display and search go through an LRU cache of 1MB blocks.

If a file change is detected, then depending on whether or not the file
has been appended to or rewritten, streampager will load the remaining data
or flush its caches and reload the file.

The heuristic for append-vs-reload is whether the last 4k of the file has
changed or not, or if any time we try to parse a line we find a newline
in the middle of the line.

This replaces the mmap implementation, giving a better experience for looking
at files on disk.

The mmap implementation is retained in case it will be useful, but for now is
unused.

Fixes #8
Fixes #9
@markbt
Copy link
Owner Author

markbt commented Jan 18, 2020

I added cache support, so now this is probably ready to go. Any input welcome before I merge it.

The notify behaviour seems a bit unreliable, but when it works, it's quite nice.

@markbt markbt merged commit a0a734e into master Jan 19, 2020
@markbt markbt deleted the new-files branch January 19, 2020 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants