Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement slurp with os/open. #1397

Closed
amano-kenji opened this issue Feb 9, 2024 · 28 comments
Closed

Reimplement slurp with os/open. #1397

amano-kenji opened this issue Feb 9, 2024 · 28 comments

Comments

@amano-kenji
Copy link
Contributor

Right now, it uses file/open. file/ functions block the event loop. os/open creates a stream which doesn't block the event loop.

@iacore
Copy link
Contributor

iacore commented Feb 9, 2024

Here is a bigger question: how much OS function support should Janet have when compiled without event loop?

@amano-kenji
Copy link
Contributor Author

I don't know.

@bakpakin
Copy link
Member

This might be something better with spork. Linux, and I suspect other operating systems, make reading and writing from the file essential synchronous. A re-implementation of slurp that uses os/open might not be that useful

@iacore
Copy link
Contributor

iacore commented Feb 18, 2024

is current ev/read synchronous? it says otherwise in (doc ev/read)

    cfunction
    src/core/ev.c on line 2985, column 1

    (ev/read stream n &opt buffer timeout)

    Read up to n bytes into a buffer asynchronously from a stream. n 
    can also be the keyword :all to read into the buffer until end of 
...

@amano-kenji
Copy link
Contributor Author

amano-kenji commented Feb 18, 2024

You mean ev/read and ev/write don't yield to the event loop when they act on files opened with os/open?

@iacore
Copy link
Contributor

iacore commented Feb 19, 2024

You mean ev/read and ev/write don't yield to the event loop when they act on files opened with os/open?

Sort of. I was wonder because @bakpakin said

make reading and writing from the file essential synchronous.

@bakpakin
Copy link
Member

Even if you use poll or select on files in Linux, reading and writing can still block. Yes, for special files and pipes this might fix a problem but I think it makes more sense to just move to spork.

@amano-kenji
Copy link
Contributor Author

What do you mean? Do you mean that ev/read and ev/write against a file can block the event loop?

@amano-kenji
Copy link
Contributor Author

I don't know what to add to spork because I don't have grasp of the problem.

@bakpakin
Copy link
Member

What I was trying to say is that the implementation is straight forward, but that it is not all that valuable. For more information, here is a mailing list that in short describes the issue: https://utcc.utoronto.ca/~cks/space/blog/linux/HardAsyncFileIO

If you show some motivating code where blocking the event loop was an issue, maybe we could offer more help.

@iacore
Copy link
Contributor

iacore commented Feb 19, 2024

Even if you use poll or select on files in Linux, reading and writing can still block. Yes, for special files and pipes this might fix a problem but I think it makes more sense to just move to spork.

io_uring would not block for data read on Linux;
we should use something like libuv (but smaller) for sanity if we want to add that to Janet, although current O_NONBLOCK is fine i think? personally, i don't need that much performance from Janet.

@bakpakin
Copy link
Member

An io_uring event loop would be nice and probably work, but again for such a high performance requirement, the other abstractions in Janet would likely be unsuitable anyway. Not to mention a requirement on more modern kernel versions.

@amano-kenji
Copy link
Contributor Author

If you raise minimum required kernel version, you can implement asynchronous file IO now?

@llmII
Copy link
Contributor

llmII commented Feb 21, 2024

It would look like io_uring is not the panacea necessary for AIO on files per the man page sourced here.

aio(7) - See notes section.

@sogaiu
Copy link
Contributor

sogaiu commented Feb 22, 2024

In case things change, here are some bits from the linked page:

       Simultaneous asynchronous read or write operations using the same
       aiocb structure yield undefined results.

       The current Linux POSIX AIO implementation is provided in user
       space by glibc.  This has a number of limitations, most notably
       that maintaining multiple threads to perform I/O operations is
       expensive and scales poorly.  Work has been in progress for some
       time on a kernel state-machine-based implementation of
       asynchronous I/O (see [io_submit(2)](https://man7.org/linux/man-pages/man2/io_submit.2.html), [io_setup(2)](https://man7.org/linux/man-pages/man2/io_setup.2.html), [io_cancel(2)](https://man7.org/linux/man-pages/man2/io_cancel.2.html),
       [io_destroy(2)](https://man7.org/linux/man-pages/man2/io_destroy.2.html), [io_getevents(2)](https://man7.org/linux/man-pages/man2/io_getevents.2.html)), but this implementation hasn't
       yet matured to the point where the POSIX AIO implementation can
       be completely reimplemented using the kernel system calls.

@llmII
Copy link
Contributor

llmII commented Feb 22, 2024

Thinking through this... let's say it was implemented with os/open. Sure, every call to ev/read would block, but say it's only read in 4kb chunks... one could possibly (ev/sleep 0) to effect a yield to the event loop that may result in another task having a chance to do some work till it yields to the event loop?

Would the idea of re-implementing slurp in terms of basically something along the lines of the below make sense? It's more psuedo-code than it is actual code checked for viability.

(while (ev/read file 4096 buf)
  (ev/sleep 0))

@pepe
Copy link
Member

pepe commented Feb 22, 2024

But what if you build Janet without ev? What would be the strategy, then?

@llmII
Copy link
Contributor

llmII commented Feb 22, 2024

I'm unsure if Janet-side code has the ability to detect if ev is present or not. If it did it could conditionally define such perhaps? Or it could be that non-ev-Janet is not as well supported and lacks slurp, leaving people to read the full content of a file in terms of file/read?

@sogaiu
Copy link
Contributor

sogaiu commented Feb 22, 2024

There is this section in janetconf.h regarding not using ev:

/* These settings should be specified before amalgamation is
 * built. Any build with these set should be considered non-standard, and
 * certain Janet libraries should be expected not to work. */
// ... elided ... //
/* #define JANET_NO_EV */

@sogaiu
Copy link
Contributor

sogaiu commented Feb 22, 2024

Regarding detection of ev, perhaps this or a variation?

@pepe
Copy link
Member

pepe commented Feb 22, 2024

It seems superfluous. TBH, the whole debate is just bikesheding to me. Even the name slurp sounds like something not too advanced :-).

@sogaiu
Copy link
Contributor

sogaiu commented Feb 22, 2024

I don't really understand your comment.

Would you mind elaborating on it?

@pepe
Copy link
Member

pepe commented Feb 23, 2024

I meant that I see slurp as a quick, dirty way to get the file's content, for example, in the throwaway scripts or when loading configuration. When I care about some more optimized version, writing my version tailored to that specific needs is easy.

But most of all, looking at the length of this thread, it is not worth the cause.

@sogaiu
Copy link
Contributor

sogaiu commented Feb 23, 2024

Thanks for the explanation.

@sogaiu
Copy link
Contributor

sogaiu commented Feb 23, 2024

FWIW, I'm not convinced that the idea mentioned by llmII was not worth continuing to discuss. It seemed a bit premature to stop the conversation.

@amano-kenji
Copy link
Contributor Author

amano-kenji commented Feb 24, 2024

I want to know what can replace slurp for asynchronous IO.

(while (ev/read file 4096 buf)
  (ev/sleep 0))

Does this yield to the event loop regularly? If IO itself was synchronous, I would still want to yield to the event loop after reading a chunk of 4 kilobytes or something like that.

In real code, I would probably use

(while (ev/read file 4096)
  (ev/give ch buf))

or

(while (ev/read file 4096 buf)
  (yield buf))

I think ev/give and yield yield to the event loop.

@llmII
Copy link
Contributor

llmII commented Feb 24, 2024

@amano-kenji Please note that (yield) does not yield to the event loop but to whatever resumed the currently running fiber. This is important to note because if a task yields without doing so via an ev/* call it needs to be resumed via ev/go or it is done.

The following example code illustrates this working for anyone who comes back to look at this issue in the future, and outlines how slurp could be implemented to work like the example as well.

Before we begin we should note that for the event loop to be able to schedule something other than the currently running fiber, the currently running fiber must yield to the event loop. It may not seem intuitive but (ev/sleep 0) is the equivalent of (yield), the only difference being where the fiber is yielding to, the former yielding to the event loop, the latter to the section of code that resumed the currently running fiber having called (yield).

You'll need to modify the example to use a suitably large file, and change what the file is that it's opening. If you're on *nix something like dd if=/dev/zero of=bad-file bs=1 count=0 seek=1G may be helpful.

Example:

(def buf @"")

(def tf (ev/spawn
          (protect
            (forever
              (ev/sleep .5)
              (print "Here we are @" (length buf))))
          (print "Last @" (length buf))))

# optionally use `os/open`, makes no difference
(with [f (file/open "/tmp/bad-file" :r)]
  (while (:read f 4096 buf)
    (ev/sleep 0)))

(print "We read 1gb!")
(ev/cancel tf :cancel)

Example output:

Here we are @359878656
Here we are @901939200
We read 1gb!
Last @1073741824

Example slurp implementation that won't block tasks as badly as reading all the bytes in one go:

(defn slurp
  ``Read all data from a file with name `path` and then close the file.``
  [path]
  (def buf @"")
  # optionally use `os/open`, makes no difference
  (with [f (file/open path :rb)] 
    (while (:read f 4096 buf)
      (ev/sleep 0)))
  buf)

@bakpakin
Copy link
Member

Closing this as the solution posted here should work as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants