do you want parallel file operations? #58

Closed
UnixJunkie opened this issue Feb 24, 2017 · 11 comments

Comments

@UnixJunkie
Collaborator

I have this one currently:

let parmap_on_file (ncores: int) (fn: string) (f: 'a -> 'b) (read_one: in_channel -> 'a): 'b list = ...
@UnixJunkie
Collaborator Author

Or, I wonder if I should create a separate library depending on parmap ...

@UnixJunkie
Collaborator Author

I will create a separate library if I gather enough interesting primitives.

@rdicosmo
Owner

rdicosmo commented Feb 24, 2017 via email

@UnixJunkie UnixJunkie reopened this Jun 9, 2017
@UnixJunkie
Collaborator Author

I think such a function is quite useful. I'd like to contribute it to parmap.
Here is the current signature:

let parmap_on_file
    (ncores: int)
    (fn: filename)
    (f: 'a -> 'b)
    (read_one: in_channel -> 'a): 'b list
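A minimal sketch of what a function with this signature could look like, assuming the file name is a plain `string` and that `read_one` raises `End_of_file` once the input is exhausted. This is a sequential stand-in and not parmap's implementation: the helper `read_batch` is hypothetical, and `List.map` marks the spot where a parallel map over `ncores` workers would go.

```ocaml
(* Hypothetical helper: read at most [n] items from [ic].
   A list shorter than [n] means end of file was reached. *)
let read_batch (read_one : in_channel -> 'a) (ic : in_channel) (n : int) : 'a list =
  let rec loop acc k =
    if k <= 0 then List.rev acc
    else
      match read_one ic with
      | x -> loop (x :: acc) (k - 1)
      | exception End_of_file -> List.rev acc
  in
  loop [] n

(* Sequential sketch: process the file in batches of [ncores] items,
   so that only one batch of inputs is held in memory at a time. *)
let parmap_on_file (ncores : int) (fn : string) (f : 'a -> 'b)
    (read_one : in_channel -> 'a) : 'b list =
  let batch_size = max 1 ncores in
  let ic = open_in fn in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      let rec loop acc =
        match read_batch read_one ic batch_size with
        | [] -> List.rev acc
        | batch ->
          (* Assumption: a parallel version would map this batch
             across [ncores] workers instead of using List.map. *)
          let results = List.map f batch in
          loop (List.rev_append results acc)
      in
      loop [])
```

For a text file with one integer per line, `parmap_on_file 2 "numbers.txt" int_of_string input_line` would return the parsed integers in file order (the file name is only an example).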

If deemed useful, we can probably add companion functions later, such as pariter_on_file,
parmap_fold_on_file, etc.

Let me know if you have a better interface to propose.

This is the second time I have needed such functionality in a project, so I guess it can be quite
useful to other parmap users as well.
I do chemoinformatics, but I guess bioinformatics people might have such needs as well.

Regards,
Francois.

@UnixJunkie
Collaborator Author

@smondet @agarwal

@agarwal

agarwal commented Jun 9, 2017

I haven't been using parmap in a while, so my opinion is not useful at this time.

@smondet

smondet commented Jun 9, 2017

@UnixJunkie that sounds useful when we want to have only ncores items at once in memory.
A more general version would use any stream-like input: unit -> 'a option.

PS: I haven't done any "analysis-level" bioinformatics in a long while though :)

@UnixJunkie
Collaborator Author

@smondet Is the option just used to send the end of file info via a None?

@UnixJunkie
Collaborator Author

Maybe the most generic construct is:

let parallelize
    (ncores: int)
    (demux: unit -> 'a)
    (work: 'a -> 'b)
    (mux: 'b -> unit): unit

but then that's so generic that it should reside outside of parmap.
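To make the demux/work/mux shape concrete, here is a sequential sketch with `unit` written out as the type. It is only an illustration of the control flow, not a parallel implementation: the `End_of_input` exception is an assumption introduced here so that `demux` has a way to say it has no more items, and `ncores` is accepted but ignored.

```ocaml
(* Assumption: [demux] raises this exception when the input is exhausted. *)
exception End_of_input

(* Sequential sketch of the generic construct: pull an item with [demux],
   transform it with [work], hand the result to [mux], and repeat.
   A real implementation would run [work] on [ncores] workers. *)
let parallelize (ncores : int) (demux : unit -> 'a) (work : 'a -> 'b)
    (mux : 'b -> unit) : unit =
  ignore ncores;
  let rec loop () =
    match demux () with
    | x ->
      mux (work x);
      loop ()
    | exception End_of_input -> ()
  in
  loop ()
```

With this shape, `parmap_on_file` becomes a special case: `demux` reads one item from the file, and `mux` appends each result to an accumulator.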

@smondet

smondet commented Jun 12, 2017

@UnixJunkie Yes, "End of Stream" actually 👍
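As an illustration of the stream-like input discussed here, the sketch below adapts an `in_channel` reader to a `unit -> 'a option` function where `None` marks the end of the stream. Both function names are hypothetical, and the consumer is sequential; it only shows how the `None` sentinel replaces the `End_of_file` exception.

```ocaml
(* Hypothetical adapter: wrap a channel reader as a stream.
   Each call returns [Some item], or [None] once the channel is exhausted. *)
let stream_of_channel (read_one : in_channel -> 'a) (ic : in_channel)
  : unit -> 'a option =
  fun () ->
    match read_one ic with
    | x -> Some x
    | exception End_of_file -> None

(* Hypothetical consumer: apply [f] to every item until the stream ends.
   A parallel version would dispatch the items to workers instead. *)
let map_stream (f : 'a -> 'b) (next : unit -> 'a option) : 'b list =
  let rec loop acc =
    match next () with
    | None -> List.rev acc
    | Some x -> loop (f x :: acc)
  in
  loop []
```

Because the consumer only sees `unit -> 'a option`, the same code would work for input that does not come from a file, such as a queue or a generator.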

@UnixJunkie
Collaborator Author

parany can be used for that
