issue with getFieldURI in distributed setting #10

Closed
grauchs opened this issue Apr 4, 2018 · 4 comments

grauchs commented Apr 4, 2018

I can run my mupif script on one machine, using two server applications on a different machine, transferring the fields with the getField and setField methods. If I replace getField with getFieldURI, execution hangs in the subsequent setField call. However, if I run the control script and the two server applications on the same machine, it runs to the end without any problem.
The relevant part of the control script is:


print('$$$ Displacement output')
uri = app1.getFieldURI(FieldID.FID_Displacement,istep.getTime().inUnitsOf('s').getValue())
log.info("URI of problem 1's Displacement field is " + str(uri) )
res1 = Pyro4.Proxy(uri)
#res1 = app1.getField(FieldID.FID_Displacement,istep.getTime().inUnitsOf('s').getValue())
res1.toVTK2('testoutput-d',format='ascii')

print('******** SET FIELD TO SECOND MODEL ')
print('\n*************************************')
print('$$$ Field 1 time',res1.getTime())
print('$$$ set displacement field to model 2')
app2.setField(res1)


I noticed that I can use the field res1 in the control script, as in print('$$$ Field 1 time',res1.getTime()), but not in the subsequent setField executed in the API. In the API's setField method, the field is treated as a Pyro4.core.Proxy, and subsequent Field methods do not execute. As mentioned above, the problem does not show up if the control script runs on the same machine as the two application servers!
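
One thing that may be worth checking is the host part of the URI returned by getFieldURI. A direct Pyro4 URI has the form PYRO:objectid@host:port, and the host is interpreted by whichever process opens the proxy, so a loopback address means something different on each machine. The sketch below is only a diagnostic aid, not a confirmed explanation of the hang. The helper names `parse_pyro_uri` and `is_loopback_host` and the example URI are hypothetical and are not part of mupif or Pyro4; the parser handles hostnames and IPv4 literals only.

```python
import ipaddress
import re

# Direct form of a Pyro URI: PYRO:<object id>@<host>:<port>
_PYRO_URI = re.compile(r"^PYRO:(?P<obj>[^@]+)@(?P<host>[^:]+):(?P<port>\d+)$")


def parse_pyro_uri(uri):
    """Split a direct Pyro URI string into (object_id, host, port)."""
    match = _PYRO_URI.match(str(uri))
    if match is None:
        raise ValueError("not a direct PYRO URI: %r" % (uri,))
    return match.group("obj"), match.group("host"), int(match.group("port"))


def is_loopback_host(host):
    """True if host is 'localhost' or a loopback IP literal."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a hostname that cannot be classified without DNS


# Hypothetical URI, made up for illustration only.
example_uri = "PYRO:obj_0123abcd@127.0.0.1:5556"
obj_id, host, port = parse_pyro_uri(example_uri)
if is_loopback_host(host):
    print("URI host %s refers to whichever machine opens the proxy" % host)
```

In the control script, `parse_pyro_uri(str(uri))` could be logged next to the existing log.info call to see which address the second application would be asked to connect to.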

For information, here are the relevant parts of Config.py and the two Serverconfig.py files:


        self.nshost = '10.1.1.231'
        self.nsport = 9090

        self.server = '10.1.1.232'
        self.serverPort = 44382
        self.serverNathost = '127.0.0.1'
        self.serverNatport = 5555

        self.server2 = '10.1.1.232'
        self.serverPort2 = 44385
        self.serverNathost2 = self.server2
        self.serverNatport2 = 5558
        self.appName2 = 'MuPIFServer2'

    self.serverPort = self.serverPort+1
    if self.serverNatport != None:
        self.serverNatport+=1
    self.socketApps = self.socketApps+1
    self.portsForJobs=( 9200, 9249 )
    self.jobNatPorts = [None] if self.jobNatPorts[0]==None else list(range(6200, 6249)) 

    self.serverPort = self.serverPort+2
    if self.serverNatport != None:
        self.serverNatport+=2
    self.socketApps = self.socketApps+2
    self.portsForJobs=( 9250, 9300 )
    self.jobNatPorts = [None] if self.jobNatPorts[0]==None else list(range(6250, 6300)) 
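
Read together, the two server config fragments shift the base ports from Config.py by 1 and by 2. The sketch below only replays that arithmetic on the values quoted above, assuming both server configs start from the same base values. The function name `apply_offset` is hypothetical, and how mupif interprets the portsForJobs tuple is not covered here. Note that Python's range is half-open, so range(6200, 6249) stops at 6248.

```python
def apply_offset(server_port, server_natport, offset):
    """Replay the increments from a Serverconfig fragment (hypothetical helper)."""
    server_port = server_port + offset
    if server_natport is not None:
        server_natport += offset
    return server_port, server_natport


# Base values quoted from Config.py above.
base_port, base_natport = 44382, 5555

print(apply_offset(base_port, base_natport, 1))  # (44383, 5556)
print(apply_offset(base_port, base_natport, 2))  # (44384, 5557)

# NAT ports for jobs as built in the first server config.
job_nat_ports = list(range(6200, 6249))
print(job_nat_ports[0], job_nat_ports[-1], len(job_nat_ports))  # 6200 6248 49
```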

nitramkaroh commented Apr 19, 2018

We tried Example11, which uses the getFieldURI method, and it works in our setup on both Linux and Windows.
Do you use VPN or ssh?
Can you check that your ports are open?
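
A quick way to check a port from a given machine is to attempt a plain TCP connection with a timeout. This is a minimal sketch using only the Python standard library; the function name `port_is_reachable` is hypothetical and not part of mupif. It only tells whether something accepts connections at that address from the machine where it is run, not whether a Pyro daemon is behind it.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_is_reachable('10.1.1.231', 9090)` would test the name server address from the configuration quoted earlier. Running the same call on each machine involved shows whether an address is reachable from all of them or only from some.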

grauchs commented May 2, 2018

I use ssh.
On the application server, which runs the two applications, the following relevant ports are open:
Proto Recv-Q Send-Q Local Address     Foreign Address State
tcp   0      0      10.1.1.232:9250   0.0.0.0:*       LISTEN
tcp   0      0      127.0.0.1:6250    0.0.0.0:*       LISTEN
tcp   0      0      10.1.1.232:9200   0.0.0.0:*       LISTEN
tcp   0      0      127.0.0.1:10001   0.0.0.0:*       LISTEN
tcp   0      0      127.0.0.1:10002   0.0.0.0:*       LISTEN
tcp   0      0      10.1.1.232:44382  0.0.0.0:*       LISTEN
tcp   0      0      10.1.1.232:44383  0.0.0.0:*       LISTEN
tcp   0      0      ::1:6250          :::*            LISTEN

On the machine running the control script, the following ports are open:
Proto Recv-Q Send-Q Local Address     Foreign Address State
tcp   0      0      127.0.0.1:6250    0.0.0.0:*       LISTEN
tcp   0      0      127.0.0.1:5556    0.0.0.0:*       LISTEN
tcp   0      0      127.0.0.1:5557    0.0.0.0:*       LISTEN
tcp   0      0      127.0.0.1:6200    0.0.0.0:*       LISTEN
tcp   0      0      ::1:6250          :::*            LISTEN
tcp   0      0      ::1:5556          :::*            LISTEN
tcp   0      0      ::1:5557          :::*            LISTEN
tcp   0      0      ::1:6200          :::*            LISTEN

So basically, on the application server, port 6200 is not open, despite appearing in the server log. I don't know if this is critical.
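
The comparison above can also be done mechanically, which helps when the port lists get longer. The following is a small sketch that parses netstat-style lines and checks for a given port; the function name `listening_ports` is hypothetical, and the sample text is the application server output quoted above.

```python
def listening_ports(netstat_text):
    """Collect (address, port) pairs from netstat lines in the LISTEN state."""
    found = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[-1] != "LISTEN":
            continue  # skips the header line and anything not listening
        # The local address is the 4th column. Split on the last ':' so that
        # IPv6 forms such as '::1:6250' keep their address part intact.
        address, _, port = fields[3].rpartition(":")
        found.add((address, int(port)))
    return found


server_output = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.1.1.232:9250 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:6250 0.0.0.0:* LISTEN
tcp 0 0 10.1.1.232:9200 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:10001 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:10002 0.0.0.0:* LISTEN
tcp 0 0 10.1.1.232:44382 0.0.0.0:* LISTEN
tcp 0 0 10.1.1.232:44383 0.0.0.0:* LISTEN
tcp 0 0 ::1:6250 :::* LISTEN
"""

ports = {port for _, port in listening_ports(server_output)}
print(sorted(ports))  # [6250, 9200, 9250, 10001, 10002, 44382, 44383]
print(6200 in ports)  # False
```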

grauchs commented May 23, 2018

I managed to reproduce the error with the demoapp! I put it into a tarball. To run it, you have to set the IP of the nameserver and some paths in the Config.py file, then start Server1.py in the ServerA directory, then Server2.py in ServerB, then Example666.py in HQ, which starts the computations. There is a flag in Example666.py for switching between getField and getFieldURI. If Server1.py runs on a different machine than Example666.py, the application hangs when it sets the field. Use the -m1 flag for ssh communication.
testclone.zip
If there are problems running the example, don't hesitate to ask.

@nitramkaroh

Dear Gaston,

We hope we have finally solved the issue. Could you please try running the same example with the latest mupif git version?

Thank you.
Martin
