import pandas hanging Flask 0.11.1 / Apache 2.4.18 #14641

Closed
Ty-WDFW opened this issue Nov 12, 2016 · 22 comments
Labels
Build (Library building on various platforms), Usage Question
Milestone

Comments

@Ty-WDFW

Ty-WDFW commented Nov 12, 2016

Importing pandas 0.19.1 in the Python script hangs Apache, and the website times out. Downgrading to 0.18.1 resolves the issue. I tested this on a fresh EC2 instance running Ubuntu 16.04.

Apache log:

  • [Sat Nov 12 03:05:18.784672 2016] [core:warn] [pid 23426:tid 139710063925120] AH00045: child process 23563 still did not exit, sending a SIGTERM
  • [Sat Nov 12 03:05:20.786946 2016] [core:warn] [pid 23426:tid 139710063925120] AH00045: child process 23563 still did not exit, sending a SIGTERM
  • [Sat Nov 12 03:05:22.789238 2016] [core:warn] [pid 23426:tid 139710063925120] AH00045: child process 23563 still did not exit, sending a SIGTERM
  • [Sat Nov 12 03:05:24.791547 2016] [core:error] [pid 23426:tid 139710063925120] AH00046: child process 23563 still did not exit, sending a SIGKILL

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Nov 12, 2016

pls show a code example

@Ty-WDFW
Author

Ty-WDFW commented Nov 12, 2016

from flask import Flask
import pandas as pd

app = Flask(__name__)

@app.route("/")
def hello():
    return 'Hello World!'

if __name__ == "__main__":
    app.run()

When import pandas as pd is commented out, the script runs fine and the website loads. Interestingly enough, when pandas is imported in the Python console it works flawlessly, so this is purely an issue with pandas 0.19.1 and Apache. In the example here I'm using a virtualenv; I've also tested this outside of a virtualenv and still have the same issue.

Here's my apache configuration file:

<VirtualHost *:80>
        # The ServerName directive sets the request scheme, hostname and port that
        # the server uses to identify itself. This is used when creating
        # redirection URLs. In the context of virtual hosts, the ServerName
        # specifies what hostname must appear in the request's Host: header to
        # match this virtual host. For the default virtual host (this file) this
        # value is not decisive as it is used as a last resort host regardless.
        # However, you must set it for any further virtual host explicitly.
        #ServerName www.example.com

        ServerAdmin webmaster@localhost
        DocumentRoot /var/www/html

        WSGIDaemonProcess flaskapp user=flask group=www threads=5
        WSGIScriptAlias / /var/www/html/flaskapp/flaskapp.wsgi

        <Directory flaskapp>
            WSGIProcessGroup flaskapp
            WSGIApplicationGroup %{GLOBAL}
            Order deny,allow
            Allow from all
        </Directory>
</VirtualHost>

Here's the .wsgi file:

import os
import sys
import site

# Add virtualenv site packages
site.addsitedir(os.path.join(os.path.dirname(__file__), 'env/local/lib64/python2.7/site-packages'))

# Path of execution
sys.path.append('/var/www/html/flaskapp')

# Activate the virtualenv before importing the application
activate_env = os.path.expanduser(os.path.join(os.path.dirname(__file__), 'env/bin/activate_this.py'))
execfile(activate_env, dict(__file__=activate_env))

from main import app as application

@jreback
Contributor

jreback commented Nov 12, 2016

you are doing odd path manipulation
you shouldn't do any of that in the python program ; activation needs to occur before the app starts

you are probably picking up different versions of pandas from different envs (and/or its deps)

so you need to fix that

closing as not a pandas issue

@jreback jreback closed this as completed Nov 12, 2016
@hdemers

hdemers commented Nov 14, 2016

Same issue here with pandas >= 0.19.0. However, in this case I don't have different versions of pandas from different envs (and/or its deps), because I'm running this web server inside a Docker container that is freshly built each time.

Commenting out import pandas in the following solves the issue, as well as using pandas 0.18.1. Pandas 0.19.0 or 0.19.1 is making Apache hang.

from flask import Flask
import pandas

app = Flask(__name__)

@app.route("/")
def hello():
    return 'Hello World!'

if __name__ == "__main__":
    app.run()

The .wsgi file is:

import sys
sys.path.insert(0, '/app')
from myapplication import app as application

@jreback
Contributor

jreback commented Nov 14, 2016

not really sure what you are actually running. But this works fine.

If you can show a reproducible example, pls comment.

Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from flask import Flask
   ...: import pandas
   ...:
   ...: app = Flask(__name__)
   ...:
   ...: @app.route("/")
   ...: def hello():
   ...:     return 'Hello World!'
   ...:

In [2]: app.run()
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [14/Nov/2016 16:30:11] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [14/Nov/2016 16:30:11] "GET /favicon.ico HTTP/1.1" 404 -

In [3]: pandas.__version__
Out[3]: '0.19.0'

In [5]: import flask

In [6]: flask.__version__
Out[6]: '0.11.1'

@jreback jreback added Build Library building on various platforms Usage Question labels Nov 14, 2016
@hdemers

hdemers commented Nov 14, 2016

The above script has to be run by Apache, with mod_wsgi, exactly like it was first reported. I've included this piece

if __name__ == "__main__":
    app.run()

to show that the script runs fine outside of Apache, but makes it hang otherwise.

@Ty-WDFW
Author

Ty-WDFW commented Nov 14, 2016

However, in this case I don't have different versions of pandas from different envs (and/or its deps)

Just to clarify, I was using a brand new instance with only 0.19.1 freshly installed in my environment. There were no other versions of pandas or even environments installed system-wide.

@jreback
Contributor

jreback commented Nov 14, 2016

you can try running with python -v to see what is happening, otherwise no idea.

@Ty-WDFW
Author

Ty-WDFW commented Nov 21, 2016

It'll take someone with more technical knowledge than a salmon biologist to figure out how to make mod_wsgi run Python in verbose mode 😞. Since I first posted this I've tested it on multiple clean instances; the only resolution is to downgrade to 0.18.1.
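
For anyone who cannot easily pass -v through mod_wsgi, one possible alternative is to let the standard-library faulthandler module dump every thread's traceback if the import stalls. This is only a sketch under assumptions: it needs Python 3.3+ (on 2.7, as in the original report, faulthandler is a separate backport package), the helper name import_with_watchdog is made up for illustration, and the log must be a real file with a file descriptor. The demo at the bottom uses json as a stand-in for pandas.

```python
import faulthandler
import importlib
import tempfile


def import_with_watchdog(module_name, log_file, timeout=10.0):
    """Import `module_name`; if the import is still running after `timeout`
    seconds, write the traceback of every thread to `log_file`.

    Hypothetical helper, not part of pandas or mod_wsgi. The import itself
    is not interrupted; the dump only records where it is stuck.
    """
    # The watchdog runs in a C-level thread, so it can still fire when the
    # Python threads are blocked.
    faulthandler.dump_traceback_later(timeout, file=log_file)
    try:
        return importlib.import_module(module_name)
    finally:
        # The import finished (or raised): disarm the watchdog.
        faulthandler.cancel_dump_traceback_later()


# Demo with a stdlib module standing in for pandas.
with tempfile.TemporaryFile("w+") as log:
    module = import_with_watchdog("json", log, timeout=5.0)
    print(module.__name__)
```

In a .wsgi file the same call could be made with "pandas" and a writable log path before importing the application; if the import hangs, the log should show which line it is stuck on.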

@Qblack

Qblack commented Dec 12, 2016

I have also had this issue.
It can be solved by adding WSGIApplicationGroup %{GLOBAL}.
Unfortunately, that may lead to more issues down the road if packages ever share names.

I found the solution here: http://stackoverflow.com/questions/25782912/pandas-and-numpy-thread-safety. I have only ever had issues with the latest version of pandas, if that helps the salmon people figure it out.

@birdsarah
Contributor

birdsarah commented Dec 14, 2016

Hi all, in case it's useful.

I had a similar issue:
Django==1.10.3
pandas==0.19.1
Python 3.4
Apache with mod_wsgi on AWS Elastic Beanstalk

My solution was to move the imports from the top of my views.py file into the functions that needed them, and all was well. pandas was already being used by this project in Django management commands, with the imports at the top of the module; it was the addition to views.py that caused the problem.

However, it may be worth noting that I think bokeh also had the same problem, and bokeh doesn't have a dependency on pandas any more. I will need to confirm this though if it's a useful avenue.
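
The import-inside-the-function workaround described above might look roughly like this. It is a minimal sketch, not code from the project in question: column_means is a hypothetical handler body, and it assumes pandas is installed and that every column is numeric.

```python
import sys


def column_means(records):
    """Hypothetical request-handler body: average each column of a list of dicts."""
    # Deferred import: pandas is loaded on the first call, in the thread that
    # serves the request, not when the server first loads this module.
    import pandas as pd

    return pd.DataFrame(records).mean().to_dict()


# Defining the function does not import pandas; only calling it does.
pandas_loaded_at_import = "pandas" in sys.modules
```

After the first call Python caches the module in sys.modules, so repeating the import statement inside the function costs little on later requests.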

@birdsarah
Contributor

@jreback, you closed this issue on Nov 12 with the reason "closing as not a pandas issue". I don't know where else I'd file this bug and look for progress/insight on it. Suggestions welcome.

@jreback
Contributor

jreback commented Jan 27, 2017

@birdsarah I would try doing your import of pandas IN the function you need it (rather than at the top of the module). If you have numexpr installed this would make a difference (and make sure you have the latest versions of things).

@Qblack

Qblack commented Jan 27, 2017

I do not have numexpr installed but will try moving the imports when I get a chance. Any idea if the problem comes from numpy instead of pandas?

@jreback
Contributor

jreback commented Jan 27, 2017

@Qblack no idea. it sounds like an initialization problem.

@birdsarah
Contributor

I would try doing your import of pandas IN the function you need it

That's what I'm doing - and it works. But feels like a workaround as I've got commented pandas imports dotted all around my codebase - it's not ideal.

Will have a look at numexpr - thanks

@jzwinck
Contributor

jzwinck commented Mar 9, 2017

I have encountered a similar hang, when doing import pandas as pd at the top level of a Python file which I import using boost::python::exec(). I'm using Pandas 0.19.2 and Python 3.5. It hangs when it imports indexing.py which does this:

_eps = np.finfo('f4').eps

And indeed, if I just import numpy and do that myself instead of importing Pandas, it hangs as well, seemingly trying to manage the GIL.

@Ty-WDFW if you have a chance, perhaps you can try replacing the above-mentioned line in indexing.py with this approximation:

_eps = 1.1920929e-07

And see if that fixes the hang. Or just try doing np.finfo('f4').eps in your own script before you import pandas and see if it hangs there.

In my case, the problem seems to be that Py_Initialize() was called in one thread during startup, then the actual Python code was executed in a different thread later. The GIL ends up being held by the first thread and np.finfo() tries to acquire it in the second. One solution to this is to call PyEval_InitThreads(); PyEval_SaveThread(); after Py_Initialize(), then acquire the GIL explicitly before each call into the Python C API.
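
The shape of that deadlock can be illustrated in pure Python. This is only an analogy, assuming an ordinary threading.Lock can stand in for the GIL; it does not touch the real GIL or the C API, and the names gil_stand_in and worker_can_run are made up for illustration.

```python
import threading

# A plain lock standing in for the GIL. Analogy only: this is not the real GIL.
gil_stand_in = threading.Lock()


def worker_can_run(timeout=0.2):
    """Try to take the lock from a second thread and report whether it succeeded."""
    outcome = []

    def worker():
        acquired = gil_stand_in.acquire(timeout=timeout)
        if acquired:
            gil_stand_in.release()
        outcome.append(acquired)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome[0]


# The "initialising" thread takes the lock and keeps holding it, which is the
# role Py_Initialize() plays in the description above.
gil_stand_in.acquire()
blocked = worker_can_run()    # the second thread cannot get the lock

# Releasing the lock after initialisation plays the role of PyEval_SaveThread().
gil_stand_in.release()
unblocked = worker_can_run()  # now the second thread gets the lock

print(blocked, unblocked)
```

Without the timeout, the second thread would wait forever, which matches the hang reported in this thread.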

@jorisvandenbossche
Member

@jzwinck Thanks for looking into that!
Do I understand you correctly that this is then something that should be reported to numpy? (as it is triggered by just calling np.finfo('f4').eps)

@jzwinck
Contributor

jzwinck commented Mar 13, 2017

@jorisvandenbossche No, it is not a NumPy bug, though it is pretty strange/annoying that np.finfo() does anything with the GIL.

The only true bug that I believe exists here is in the outer application, which in my case and probably in all the cases here called Py_Initialize() in one thread, then ran Python code in another thread without the Python C API mandated calls to PyEval_InitThreads() and so on. In other words, this is a classic deadlock caused by lock precondition violation.

If Pandas wants to make life easier on future folks who could get screwed up by this, it is probably possible to check if the GIL is held by the thread which imports pandas, by importing a small Cython module at the top of pandas.py with code as here: http://stackoverflow.com/questions/11366556/how-can-i-check-whether-a-thread-currently-holds-the-gil

It could really save some people a lot of time (as evidenced by this issue; it took me perhaps two hours to debug my own occurrence of the same). And it isn't a lot of code...just needs a careful hand to get it right, and a willingness to add a bit more Cython, which I have previously been told is not desirable in Pandas generally.

Then again, this diagnostic could equally be applied to Python's very own import mechanism (because importing anything while the GIL is not held is an error, and importing anything is already not high-performance). Or to the various PyRun_*() functions, all of which run Python code so must never be called from C when the GIL is not held.

@jzwinck
Contributor

jzwinck commented Mar 13, 2017

@jreback and @jorisvandenbossche: I just noticed that the NumPy docs explicitly say that finfo() should not be cached at the module level (when developing NumPy itself). I admit that import pandas is already far slower than import numpy, but if Pandas wishes to follow NumPy's edict, it might be a good idea to move Pandas' finfo() call into the one function which needs it, is_index_slice().

@jorisvandenbossche
Member

@jzwinck yes, I think that is certainly OK (want to do a PR?)

Would that actually solve this issue at the same time? Or would it just postpone the hanging until an indexing operation is done?

@jreback
Contributor

jreback commented Mar 13, 2017

@jzwinck we could just hard code this. it is barely used.
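
As a sanity check on hard-coding the value: float32 machine epsilon is the gap between 1.0 and the next representable float32, which is 2**-23. The sketch below verifies this with the standard library only, so it does not call np.finfo() at all. The helper to_f32 is made up for illustration; it is not a pandas or NumPy function.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


F32_EPS = 2.0 ** -23  # gap between 1.0 and the next representable float32

# Adding eps to 1.0 gives a distinct float32; adding a quarter of it does not.
distinct = to_f32(1.0 + F32_EPS) > 1.0
absorbed = to_f32(1.0 + F32_EPS / 4) == 1.0

# The decimal literal suggested earlier in the thread rounds to the same float32.
literal_matches = to_f32(1.1920929e-07) == F32_EPS

print(distinct, absorbed, literal_matches)
```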

jzwinck added a commit to jzwinck/pandas that referenced this issue Mar 15, 2017
NumPy docs for `np.finfo()` say not to call it during import (at module scope).
It's a relatively expensive call, and it modifies the GIL state.
Now we just hard-code it, because it is always the value anyway.
This avoids touching the GIL at import, which helps avoid deadlocks in practice.

Closes pandas-dev#14641.
jzwinck added a commit to jzwinck/pandas that referenced this issue Mar 15, 2017
jzwinck added a commit to jzwinck/pandas that referenced this issue Mar 15, 2017
@jreback jreback added this to the 0.20.0 milestone Mar 15, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
NumPy docs for np.finfo() say not to call it during import (at
module scope).  It's a relatively expensive call, and it modifies the
GIL state.  Now we just hard-code it, because it is always the value
anyway.  This avoids touching the GIL at import, which helps avoid
deadlocks in practice.

closes pandas-dev#14641

Author: John Zwinck <jzwinck@gmail.com>

Closes pandas-dev#15691 from jzwinck/patch-1 and squashes the following commits:

dadb97c [John Zwinck] DOC: mention pandas-dev#14641 in 0.20.0 whatsnew
e565230 [John Zwinck] ENH: use constant f32 eps, not np.finfo() during import
mattip pushed a commit to mattip/pandas that referenced this issue Apr 3, 2017
7 participants