Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

Function "dir_read" in PAL layer doesn't handle the case where offset > 0 #469

Open
rainfld opened this issue Feb 12, 2019 · 8 comments
Open
Assignees

Comments

@rainfld
Copy link
Contributor

rainfld commented Feb 12, 2019

The function y "dir_read" takes the offset as an argument but doesn't handle the case that offset > 0.
Currently we return -PAL_ERROR_INVAL when offset > 0. Eventually we need to add handling logic.

@boryspoplawski
Copy link
Contributor

Imho this functions does not need to handle offset other than 0 - it's unclear what would offset in stream of directory entries mean (bytes, entries, ?).
Referenced issue was due to a bug elsewhere, see PR #840

@donporter
Copy link
Contributor

Large directories may not be able to read in one pass, so offset is important.

That said, we should have well-defined units for what offset is in terms of.

@boryspoplawski
Copy link
Contributor

I like the current semantics of directory read - a non-seekable stream of directory entries.
"dir_read" does some internal bookkeeping and second consecutive read on the same handle just returns next portion of entires - large directories are read in multiple passes, without using an offset.

@mkow
Copy link
Member

mkow commented Aug 13, 2019

Closing. IMO @boryspoplawski is right, seeking on directories doesn't make too much sense and the bugs related to this were a result of something else, as Borys noted.

@mkow mkow closed this as completed Aug 13, 2019
@donporter
Copy link
Contributor

I don't agree with this: I would prefer to have minimally stateful PAL abstractions, including handles without implicit cursors. Otherwise, this can create problems for checkpoint/migrate/restore.

But I also don't want to deal with this right now, so I'm ok leaving this on the back-burner. But maybe we don't want this closed?

@mkow
Copy link
Member

mkow commented Aug 13, 2019

I closed it because I thought your previous comment was based on a wrong assumption (that you can't read large directories without seeking), but if what you want is to list directories by explicit seeking instead of by keeping a state in the PAL handle then we can continue discussion.

From my side: I think both solutions need to snapshot the whole directory listing on DkStreamOpen (because of concurrency issues). This means that there's a lot of state inside such handle either way, so I'm not sure that this makes a difference here. On the other hand, explicit seeking would be more consistent with how other types of resources are read.

@mkow mkow reopened this Aug 13, 2019
@boryspoplawski
Copy link
Contributor

As much as I agree that having stateless API is desired, I don't think it is doable in this case. Main issue is that we cannot do a full directory snapshot (at either open or first directory read), because it might contain too many entries to cache. If there is no snapshot, how to interpret an offset? Actual content of directory might change at any time. Besides, giving an ability to ask for arbitrary off means we have to implement it somehow - open and read directory each time Pal's readdir is called?

@dimakuv
Copy link
Contributor

dimakuv commented Jul 14, 2021

Pawel will look more into this, but for now I'm assigning Priority 2 since this doesn't look like blocking any workloads.

@dimakuv dimakuv added the P: 2 label Jul 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants