Fork-join threading type does not work #10
Ok looks like when the
Furthermore, my program doesn't seem to be doing anything:
var a = {
    name: 'a',
    guid: 'f1ec1bba-cb46-4802-be99-70c76db40027'
};
var b = {
    name: 'b',
    guid: 'b16b4d65-a6cd-45f5-bcb9-cdda175bf2ff'
};
var c = {
    name: 'c',
    guid: '7b04ace0-86a7-413f-85c3-1eb7e9f414c7'
};
var d = {
    name: 'd',
    guid: '36b41b0b-fea9-4ee2-8f18-95847ac074f0'
};
var e = {
    name: 'e',
    guid: '17717b9c-8281-4c16-9f74-04a27e6c1784'
};
var f = {
    name: 'f',
    guid: '5738f651-2056-44ed-87f3-8f5a273a5500'
};
var input = [a,b,c,d,e];
var ems = require('ems')(8, false, 'fj');
var maxObjects = 1000;
var sharedData = ems.new({
    dimensions: [maxObjects],
    heapSize: maxObjects * maxObjects,
    useMap: true
});
ems.parallel(function(){
    ems.parForEach(0, 5, function(num) {
        var object = input[num];
        sharedData.writeXF(object.guid, object);
    });
    if(ems.myID !== 0) {
        process.exit();
    }
});
for(var i in input) {
    console.log(sharedData.read(input[i].guid)); // This doesn't get printed
}
UPDATE:
|
Hi Farzad,
Due to the rules for forming closures in JavaScript, everything to be executed in the parallel context must appear inside the function passed to ems.parallel:
var ems = require('ems')(process.argv[2], false, 'fj');
ems.parallel(function(){
    var a = {
        name: 'a',
        guid: 'f1ec1bba-cb46-4802-be99-70c76db40027'
    };
    var b = {
        name: 'b',
        guid: 'b16b4d65-a6cd-45f5-bcb9-cdda175bf2ff'
    };
    var c = {
        name: 'c',
        guid: '7b04ace0-86a7-413f-85c3-1eb7e9f414c7'
    };
    var d = {
        name: 'd',
        guid: '36b41b0b-fea9-4ee2-8f18-95847ac074f0'
    };
    var e = {
        name: 'e',
        guid: '17717b9c-8281-4c16-9f74-04a27e6c1784'
    };
    var f = {
        name: 'f',
        guid: '5738f651-2056-44ed-87f3-8f5a273a5500'
    };
    var input = [a,b,c,d,e];
    var maxObjects = 1000;
    var sharedData = ems.new({
        dimensions: [maxObjects],
        heapSize: maxObjects * maxObjects,
        useMap: true
    });
    ems.parForEach(0, input.length, function(num) {
        var object = input[num];
        sharedData.writeXF(object.guid, object);
    });
    if(ems.myID !== 0) {
        process.exit();
    }
    for(var i in input) {
        ems.diag(JSON.stringify(sharedData.read(input[i].guid)));
    }
});
[EDIT: I made a few mods to allow any number of processes]
Let me know if that doesn't work out in some larger context.
|
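The closure rule described above can be seen without EMS. The sketch below assumes that a parallel region reaches the other processes as source text rather than as a live closure; that is an assumption about the mechanism, not something stated in this thread. `vm.runInNewContext` from Node's standard library stands in for "another process", and `region`, `local`, and `shipped` are names made up for the illustration.

```javascript
const vm = require('vm');

// Declared outside the region, like `input` in the original program.
const input = ['a', 'b', 'c'];

// Stand-in for the function handed to a parallel region.
function region() {
    return typeof input === 'undefined'
        ? 'input is not visible'
        : 'input has ' + input.length + ' items';
}

// Called directly, the function still has its closure over `input`.
const local = region();

// Shipped as text and evaluated in a fresh context, the closure is gone:
// only names defined inside the function (or in that context) exist.
const shipped = vm.runInNewContext('(' + region.toString() + ')()');

console.log(local);
console.log(shipped);
```

Under that assumption, moving the declarations inside the function, as in the reply above, is what makes them visible to every process.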
Thanks for the quick response!
This is problematic for me. Since I have objects that are declared outside of the
Maybe if I explain what I am trying to do, you can give me some more information on how to achieve my goal. Basically, I have a REST server that handles requests by doing some work before serializing the data and sending the response back to the client. My problem is that the serialization takes quite some time. It could be parallelized with multi-threading, but unfortunately Node.js is single-threaded. That is when I came across your library, which seems able to solve this issue for me; however, I can't find a proper way of using it in my program. So is there a way to spawn processes from the main thread so that they:
Thank you in advance
|
To make that possible I added the ability to pass arguments to the parallel region. I just pushed this as v1.1.0 which you can
One difference is that global variables (including modules) cannot be shared between processes, so each process must instantiate its own global variables, meaning some setup code will need to move inside an EMS parallel region.
You can now pass these in as arguments to the function executed as the parallel region.
Each process only interacts with other processes via EMS shared memory and arguments passed to the parallel region from the master process.
Each process is persistent between parallel regions -- a global variable declared in one parallel region is still defined in that process in later parallel regions.
I added an example called
var ems = require('ems')(parseInt(process.argv[2]), true, 'fj');
var http = require('http');
var port = 8080;
var shared_data;
/* Connect to the shared data on every task. The map key will be
 * the URL, the value will be the concatenation of work performed
 * by all the processes. */
ems.parallel(function () {
    shared_data = ems.new({
        dimensions: [1000],
        heapSize: [100000],
        useExisting: false,
        useMap: true,
        setFEtags: 'full',
        filename: '/tmp/EMS_shared_web_data.ems'
    });
});
// When a request arrives, each process does some work and appends the results
// to shared memory
function handleRequest(request, response) {
    // If this URL has not yet been requested, the data is undefined
    // and must be initialized.
    // Alternatively, it may be initialized not here but at ems.new()
    shared_data.cas(request.url, undefined, "Response preamble.");
    // Enter a parallel region, each process does some work, and then
    // appends the result to the value.
    ems.parallel(request.url, function (url) {
        // Do some work
        shared_data.faa(url, " Work from process " + ems.myID + ".");
    });
    // Return the results, leaving them in shared memory for later updates
    response.end('Shared results from(' + request.url + "):" + shared_data.readFF(request.url));
}
// Create the Web server
http.createServer(handleRequest).listen(port, function () {
    ems.diag("Server listening on: http://localhost:" + port);
});
Let me know if this doesn't help with your application.
[EDIT: I slept on it and decided the barrier should be implied by the join at the end of a parallel region. Pushed as v1.1.1]
|
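The two rules from this reply (arguments are passed into the parallel region, and globals persist per process between regions) can be sketched without EMS. This is a simulation under stated assumptions, not EMS code: each `vm` context stands in for one worker process, and `workers`, `parallel`, `visits`, and `myID` are hypothetical names for the illustration. Arguments are assumed to cross the boundary as JSON.

```javascript
const vm = require('vm');

// Hypothetical stand-ins for three worker processes:
// each context has its own, separate set of global variables.
const workers = [0, 1, 2].map(function (id) {
    return vm.createContext({ myID: id });
});

// Hypothetical helper: the last parameter is the region, the rest are its
// arguments. The region travels as source text and the arguments as JSON.
function parallel() {
    const params = Array.prototype.slice.call(arguments);
    const fn = params.pop();
    const source = '(' + fn.toString() + ').apply(null, ' + JSON.stringify(params) + ')';
    return workers.map(function (context) {
        return vm.runInContext(source, context);
    });
}

// Region 1: setup code runs inside the region, once per worker.
parallel(function () {
    globalThis.visits = 0;
});

// Regions 2 and 3: the URL arrives as an argument, and the global
// created in region 1 is still defined in each worker.
function handle(url) {
    globalThis.visits += 1;
    return url + ' seen by ' + myID + ', visit ' + globalThis.visits;
}
const first = parallel('/status', handle);
const second = parallel('/other', handle);

console.log(first);
console.log(second);
```

Each simulated worker keeps its own `visits` counter across regions, which mirrors the persistence described above; nothing is shared between workers except what is passed in as an argument.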
Thanks! This helps quite a bit. However it does raise a few questions:
What I'm looking for is something like this:
var http = require('http');
var port = 8080;
// When a request arrives, each process does some work and appends the results
// to shared memory
function handleRequest(request, response) {
    // I want to fork from here and not from the beginning of the program. Is that possible?
    var ems = require('ems')(4, true, 'fj');
    var shared_data;
    ems.parallel(function(){
        shared_data = ems.new({
            dimensions: [1000],
            heapSize: [100000],
            useExisting: false,
            useMap: true,
            setFEtags: 'full',
            filename: '/tmp/EMS_shared_web_data.ems'
        });
    });
    // If this URL has not yet been requested, the data is undefined
    // and must be initialized.
    // Alternatively, it may be initialized not here but at ems.new()
    shared_data.cas(request.url, undefined, "Response preamble.");
    // Enter a parallel region, each process does some work, and then
    // appends the result to the value.
    ems.parallel(request.url, function (url) {
        // Do some work
        shared_data.faa(url, " Work from process " + ems.myID + ".");
        if(ems.myID !== 0) {
            // exit the child processes because we do not need them anymore
            process.exit();
        }
    });
    // Return the results, leaving them in shared memory for later updates
    response.end('Shared results from(' + request.url + "):" + shared_data.readFF(request.url));
    // Destroy the shared object
    shared_data.destroy();
}
// Create the Web server
http.createServer(handleRequest).listen(port, function () {
    // ems is local to handleRequest here, so log with console instead of ems.diag
    console.log("Server listening on: http://localhost:" + port);
});
|
Thanks for taking the time to explain what you're trying to do, my apologies for taking so long to respond. I have been looking at this and tried a few things that didn't work out so I wanted to get back to you and let you know where things are.
I have a modified example that uses atomic fetch-and-add to generate GUIDs that can be used to distinguish different requests to the same URL. This will be in the next commit.
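The idea of using fetch-and-add to hand out unique IDs can be sketched with JavaScript's built-in `Atomics` on a `SharedArrayBuffer`. This is a swapped-in technique for illustration, not the EMS example mentioned above, and `counter` and `nextRequestId` are hypothetical names. `Atomics.add` returns the value held before the addition, so no two callers sharing the buffer can receive the same number.

```javascript
// One shared 32-bit counter; the buffer could be shared with worker threads.
const counter = new Int32Array(new SharedArrayBuffer(4));

// Atomically add 1 and return the previous value as the request ID.
function nextRequestId() {
    return Atomics.add(counter, 0, 1);
}

// Three requests to the same URL still get three distinct IDs.
const ids = [nextRequestId(), nextRequestId(), nextRequestId()];

console.log(ids);
```

An ID obtained this way could be combined with the URL to form a per-request key, which is one way to keep simultaneous requests to the same URL from overwriting each other's results.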
Presently EMS arrays always persist after the program exits, primarily because EMS was designed for persistent memory. Ephemeral EMS arrays didn't get much attention, and the only two hooks to programmatically remove an EMS array are broken and commented out. There are potential parallel hazards when a new EMS array with the same name is created immediately after the first one is destroyed, so a naive implementation won't work. In the meantime if you want to delete a key-value pair in EMS you can write
I can see why encapsulating EMS entirely within a callback would help with integrating legacy code; however, I'm becoming less optimistic that this can be made to work in EMS 1.x. Initializing EMS is a fairly expensive operation, almost certainly not something you'd want to do in the middle of generating a REST response. Like other Node modules, EMS is meant to be
Using the same process in different parallel regions is why global variables are persistent. It's worth repeating that the only variables that persist between parallel regions are global variables, not
One last note about global variables in EMS: parallel loops are dynamically load balanced, meaning iteration N of a parallel loop may be executed by any process. Over a series of loops, a global variable instantiated by iteration N in one loop may not be present on iteration N of the next loop because it was instantiated in another process.
Over the weekend I should be able to finish up the new example demonstrating persistence, unique IDs, and maybe ephemeral EMS arrays. Thanks for your patience,
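The note about dynamically load-balanced loops can be made concrete with a small single-process simulation. It is only a sketch: the schedules are written out by hand rather than produced by a real scheduler, and `processGlobals`, `runLoop`, and `missing` are hypothetical names. Each `Map` stands in for the global variables of one process.

```javascript
// One Map per simulated process, standing in for that process's globals.
const processGlobals = [new Map(), new Map()];

// Hypothetical helper: claimedBy[i] is the process that happened to
// claim iteration i under dynamic load balancing.
function runLoop(claimedBy, body) {
    claimedBy.forEach(function (processId, iteration) {
        body(iteration, processGlobals[processId]);
    });
}

// Loop 1: every iteration stores some state in the process that ran it.
runLoop([0, 1, 0, 1], function (iteration, globals) {
    globals.set(iteration, 'state for iteration ' + iteration);
});

// Loop 2: a different schedule, so some iterations land on another process
// and do not find the state created for them in loop 1.
const missing = [];
runLoop([1, 1, 0, 0], function (iteration, globals) {
    if (!globals.has(iteration)) {
        missing.push(iteration);
    }
});

console.log(missing); // iterations whose state lives in the other process
```

State that has to be found again by iteration number is therefore safer in shared memory than in a per-process global.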
|
I just pushed a few mods:
Initializing ems more than once per program is still looking unlikely in EMS 1.x. Hopefully the
|
Hi Farzad, It's been a while since any comments or changes were made on this issue, and I think the original fork-join issues eventually were resolved, so I'm closing this issue. Please do re-open it if any fork-join issues remain. |
Setting the threading type to 'fj' prints the following error:
code:
console:
node test.js
> EMS: Must declare number of nodes to use. Input:NaN
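The report does not show where the NaN came from, so the following is not a diagnosis. It only sketches one way to guard a process count read from `process.argv[2]`, as the examples in this thread do, since `parseInt` of a missing argument yields NaN. `parseProcessCount` and the fallback value are hypothetical.

```javascript
// Hypothetical helper: turn a command-line argument into a usable
// process count, falling back when it is missing or not a positive integer.
function parseProcessCount(arg, fallback) {
    const count = Number.parseInt(arg, 10);
    return Number.isInteger(count) && count > 0 ? count : fallback;
}

// With no argument on the command line this yields the fallback, not NaN.
const nProcesses = parseProcessCount(process.argv[2], 1);

console.log('Using ' + nProcesses + ' process(es)');
```

The resulting number could then be passed wherever the thread's examples pass `parseInt(process.argv[2])`.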