
error while attempting to bind on address, address already in use #315

Closed · segasai opened this issue Jul 19, 2021 · 5 comments

segasai commented Jul 19, 2021

When running multiple pystan instances in parallel (specifically, I'm calling
.log_prob() directly on subsets of the data in parallel),
an error is raised by httpstan's port allocation:

in getl_one()
     36 
     37 def getl_one(i, x):
---> 38     return si.Ms[i].log_prob(x)
     39 
     40 

~/pyenv38/lib/python3.8/site-packages/stan/model.py in log_prob()
    395                 return resp.json()["log_prob"]
    396 
--> 397         return asyncio.run(go())
    398 
    399     def grad_log_prob(self, unconstrained_parameters: Sequence[float]) -> float:

/usr/lib/python3.8/asyncio/runners.py in run()
     42         if debug is not None:
     43             loop.set_debug(debug)
---> 44         return loop.run_until_complete(main)
     45     finally:
     46         try:

/usr/lib/python3.8/asyncio/base_events.py in run_until_complete()
    614             raise RuntimeError('Event loop stopped before Future completed.')
    615 
--> 616         return future.result()
    617 
    618     def stop(self):

~/pyenv38/lib/python3.8/site-packages/stan/model.py in go()
    389 
    390         async def go():
--> 391             async with stan.common.HttpstanClient() as client:
    392                 resp = await client.post(f"/{self.model_name}/log_prob", json=payload)
    393                 if resp.status != 200:

~/pyenv38/lib/python3.8/site-packages/stan/common.py in __aenter__()
     34         host, port = "127.0.0.1", unused_tcp_port()
     35         site = aiohttp.web.TCPSite(self.runner, host, port)
---> 36         await site.start()
     37         self.session = aiohttp.ClientSession()
     38         self.base_url = f"http://{host}:{port}/v1"

~/pyenv38/lib/python3.8/site-packages/aiohttp/web_runner.py in start()
    119         server = self._runner.server
    120         assert server is not None
--> 121         self._server = await loop.create_server(
    122             server,
    123             self._host,

/usr/lib/python3.8/asyncio/base_events.py in create_server()
   1461                         sock.bind(sa)
   1462                     except OSError as err:
-> 1463                         raise OSError(err.errno, 'error while attempting '
   1464                                       'to bind on address %r: %s'
   1465                                       % (sa, err.strerror.lower())) from None

OSError: [Errno 98] error while attempting to bind on address ('127.0.0.1', 35943): address already in use

Basically, if I understand correctly, the port is returned by unused_tcp_port(), but it could
easily be stolen by a parallel httpstan instance before this process gets to bind it.
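For context, helpers like unused_tcp_port() usually work by binding to port 0, reading back the OS-assigned port, and closing the socket again. A minimal sketch of that pattern (assuming httpstan's helper does roughly this; I haven't quoted its exact source):

import socket

def unused_tcp_port() -> int:
    # Bind to port 0 so the OS assigns a currently free port, then release
    # it immediately. The race window is between this close and the later
    # bind inside aiohttp: any other process can claim the port in between.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]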

Describe your system

Linux, Ubuntu 20.04, 64bit, gcc-9.3

Steps/Code to Reproduce

The following code does nothing useful, but it triggers the bug:

import multiprocessing as mp

import numpy as np
import stan

schools_code = """
data {
  int N;
  vector[N] x;
}
parameters {
  real mu;                          // population mean
}
model {
  target += normal_lpdf(x | mu, 1); // log-likelihood
}
"""


class si:
    # Holder so the built model is inherited by forked worker processes.
    M = None


def func(x):
    # Each call spins up a short-lived httpstan server to evaluate log_prob.
    return si.M.log_prob(x)


if __name__ == '__main__':
    N = 10000
    x = np.random.normal(size=N)
    data = {'x': x, 'N': N}
    si.M = stan.build(schools_code, data=data)
    pool = mp.Pool(36)
    res = []
    for i in range(100000):
        res.append(pool.apply_async(func, ([1], )))
    for r in res:
        r.get()
segasai added the bug label Jul 19, 2021
riddell-stan (Contributor) commented

Thanks for the report.

I'm not sure anyone intended to support the use of pystan in this manner. The provided code is a little fishy -- you could simply run func sequentially, right?

It's difficult to predict the order of execution when using multiprocessing. Perhaps you could arrange things to run in entirely different processes (i.e., without using multiprocessing). I suspect port allocation would work correctly.

segasai (Author) commented Jul 19, 2021

The reason I run things in parallel is that my likelihood calculation is very slow, so I've decided to split the data, use the many cores I have to distribute the likelihood/gradient calculations, and do the HMC myself using littlemcmc (map_rect is too much pain to use).

Regarding multiprocessing: on Linux it uses processes, not threads. In my actual problem I have 36 datasets, which I run over a pool of 36 processes and then accumulate the gradients/likelihoods. The order does not matter for me.
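For what it's worth, a minimal sketch of that setup (si.Ms, shard_log_prob, and total_log_prob are my illustrative names, not pystan API; si.Ms holds one built model per data shard):

def shard_log_prob(args):
    i, theta = args
    # Evaluate the log-density of shard i at the same parameter values.
    return si.Ms[i].log_prob(theta)


def total_log_prob(theta, pool, n_shards):
    # Shards are independent, so their log-likelihoods simply add up.
    parts = pool.map(shard_log_prob, [(i, theta) for i in range(n_shards)])
    return sum(parts)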

segasai (Author) commented Jul 19, 2021

As a temporary stopgap fix I've just put this in stan/common.py:

        last_err = None
        for _ in range(10):
            try:
                host, port = "127.0.0.1", unused_tcp_port()
                site = aiohttp.web.TCPSite(self.runner, host, port)
                await site.start()
                break
            except OSError as err:
                last_err = err  # another process grabbed the port; retry
        else:
            raise last_err  # all 10 attempts lost the race

so it'd try 10 times before bailing out, but that's just a hack.
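An alternative that avoids the race entirely would be to skip unused_tcp_port() and pass port 0 to TCPSite, letting the OS assign a free port atomically at bind time. A minimal sketch, assuming aiohttp >= 3.3 (for the runner.addresses property):

        host = "127.0.0.1"
        # Port 0 asks the OS for any free port; the assignment happens at
        # bind time, so no other process can steal it in between.
        site = aiohttp.web.TCPSite(self.runner, host, 0)
        await site.start()
        port = self.runner.addresses[0][1]  # read back the assigned port
        self.base_url = f"http://{host}:{port}/v1"

Since the port is never released between allocation and use, there is nothing to retry.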

riddell-stan (Contributor) commented Jul 19, 2021 via email

riddell-stan removed the bug label Aug 13, 2021
stale bot commented Nov 25, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Nov 25, 2021
stale bot closed this as completed Apr 16, 2022