io/common.py: boto3 with python 3.5 #11915

Closed
stharrold opened this issue Dec 28, 2015 · 30 comments
Labels: Build (Library building on various platforms), Compat (pandas objects compatability with Numpy or Python functions), IO Data (IO issues that don't fit into a more specific label)

@stharrold

Pandas v0.17.1 won't import for Python 3.5 due to boto. Replacing with boto3 appears to fix the issue. I'd like to suggest replacing dependencies on boto with boto3 for at least Python 3.5. I didn't test with other Python 3.x versions. Thank you.

Update (2016-01-10T02:45:00Z): There are significant API changes between boto and boto3 (http://boto3.readthedocs.org/en/latest/guide/migrations3.html). Doing import boto3 as boto as below will allow pandas to import for Python 3.5, but then AWS functionality is broken.

samuel_harrold@instance-20151227t225000z:~$ ipython
Python 3.5.1 |Anaconda 2.4.1 (64-bit)| (default, Dec  7 2015, 11:16:01) 
Type "copyright", "credits" or "license" for more information.

IPython 4.0.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import pandas
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-d6ac987968b6> in <module>()
----> 1 import pandas

/home/samuel_harrold/anaconda3/lib/python3.5/site-packages/pandas/__init__.py in <module>()
     40 
     41 # let init-time option registration happen
---> 42 import pandas.core.config_init
     43 
     44 from pandas.core.api import *

/home/samuel_harrold/anaconda3/lib/python3.5/site-packages/pandas/core/config_init.py in <module>()
     15                                 is_instance_factory, is_one_of_factory,
     16                                 get_default_val)
---> 17 from pandas.core.format import detect_console_encoding
     18 
     19 

/home/samuel_harrold/anaconda3/lib/python3.5/site-packages/pandas/core/format.py in <module>()
      8 from pandas.core.base import PandasObject
      9 from pandas.core.common import adjoin, notnull
---> 10 from pandas.core.index import Index, MultiIndex, _ensure_index
     11 from pandas import compat
     12 from pandas.compat import(StringIO, lzip, range, map, zip, reduce, u,

/home/samuel_harrold/anaconda3/lib/python3.5/site-packages/pandas/core/index.py in <module>()
     29 from pandas.core.strings import StringAccessorMixin
     30 from pandas.core.config import get_option
---> 31 from pandas.io.common import PerformanceWarning
     32 
     33 

/home/samuel_harrold/anaconda3/lib/python3.5/site-packages/pandas/io/common.py in <module>()
     66 
     67 try:
---> 68     from boto.s3 import key
     69     class BotoFileLikeReader(key.Key):
     70         """boto Key modified to be more file-like

/home/samuel_harrold/anaconda3/lib/python3.5/site-packages/boto/__init__.py in <module>()
   1214     return storage_uri(uri_str)
   1215 
-> 1216 boto.plugin.load_plugins(config)

/home/samuel_harrold/anaconda3/lib/python3.5/site-packages/boto/plugin.py in load_plugins(config)
     90         return
     91     directory = config.get('Plugin', 'plugin_directory')
---> 92     for file in glob.glob(os.path.join(directory, '*.py')):
     93         _import_module(file)

/home/samuel_harrold/anaconda3/lib/python3.5/posixpath.py in join(a, *p)
     87                 path += sep + b
     88     except (TypeError, AttributeError, BytesWarning):
---> 89         genericpath._check_arg_types('join', a, *p)
     90         raise
     91     return path

/home/samuel_harrold/anaconda3/lib/python3.5/genericpath.py in _check_arg_types(funcname, *args)
    141         else:
    142             raise TypeError('%s() argument must be str or bytes, not %r' %
--> 143                             (funcname, s.__class__.__name__)) from None
    144     if hasstr and hasbytes:
    145         raise TypeError("Can't mix strings and bytes in path components") from None

TypeError: join() argument must be str or bytes, not 'NoneType'

In [2]: exit()
samuel_harrold@instance-20151227t225000z:~$ cp /home/samuel_harrold/anaconda3/lib/python3.5/site-packages/pandas/io/common_boto3.py /home/samuel_harrold/anaconda3/lib/python3.5/site-packages/pandas/io/common.py
samuel_harrold@instance-20151227t225000z:~$ diff /home/samuel_harrold/anaconda3/lib/python3.5/site-packages/pandas/io/common.py /home/samuel_harrold/anaconda3/lib/python3.5/site-packages/pandas/io/common_orig.py
68c68
<     from boto3.s3 import key
---
>     from boto.s3 import key
272c272
<             import boto3 as boto
---
>             import boto
samuel_harrold@instance-20151227t225000z:~$ ipython
Python 3.5.1 |Anaconda 2.4.1 (64-bit)| (default, Dec  7 2015, 11:16:01) 
Type "copyright", "credits" or "license" for more information.

IPython 4.0.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import pandas

In [2]: # success

In [3]: exit()
samuel_harrold@instance-20151227t225000z:~$ ipython
Python 3.5.1 |Anaconda 2.4.1 (64-bit)| (default, Dec  7 2015, 11:16:01) 
Type "copyright", "credits" or "license" for more information.

IPython 4.0.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import pandas as pd

In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-0.bpo.4-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 19.1.1
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Jinja2: None

In [3]: 
@jreback
Contributor

jreback commented Dec 28, 2015

I think the import of boto should first try to import boto3 as boto, then try import boto, then pass on the ImportError (this is all in pandas/io/common.py).

further would need:

  • to update the install.rst (recommend boto3 for PY3).
  • boto3 is ATM only pip-installable, so need to change ci/requirements- for the Python 3 builds where we use boto now to pip install boto3.
  • util/print_versions.py should then print boto3/boto (only one, whichever one imports, or None if neither)
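
A minimal sketch of the fallback order described above, not the actual pandas code. The helper name `first_importable` is hypothetical; it tries each module name in order and reports the first one that imports, which is also the information print_versions.py would need. Since boto and boto3 expose different APIs, this only covers the import step, not the S3 calls themselves.

```python
import importlib


def first_importable(names):
    """Return (name, module) for the first importable name, else (None, None).

    Hypothetical helper for illustration; not part of pandas.
    """
    for name in names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    return None, None


# Prefer boto3, fall back to boto, and tolerate neither being installed.
s3_lib_name, s3_lib = first_importable(["boto3", "boto"])
s3_lib_version = getattr(s3_lib, "__version__", None)
print("S3 library:", s3_lib_name, s3_lib_version)
```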

want to do a pull-request?

@jreback jreback added the Build Library building on various platforms label Dec 28, 2015
@jreback jreback added this to the 0.18.0 milestone Dec 28, 2015
@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Dec 28, 2015
@TomAugspurger
Contributor

I haven't looked much recently, but I think that boto3 does have API changes from boto, so we would need a compatibility layer.

@stharrold
Author

I haven't used boto or boto3, but there do seem to be API changes: boto/boto#3306. From the issue comment thread, it seems preferable to use boto instead of boto3 where boto is sufficient.

Pandas's CI would have caught this if it were an issue for PY<3.5. Above import error only exists for PY3.5 if boto is installed. If PY3.5 but boto is not installed, there is no import error.

In-progress TODO list:

  • Using boto3 for all PY3: if sys.version_info >= (3, ): try: import boto3 as boto except ImportError: pass.
  • Inside pandas source:
    • pandas/io/common.py: import boto3 as above. Note: API change for storing data (boto.key.Key vs boto3.Object.put).
    • pandas/io/tests/test_excel.py: import boto3 as above.
    • pandas/io/tests/test_parsers.py: import boto3 as above.
    • pandas/util/print_versions.py: add boto, boto3.
  • Outside pandas source:
    • tox.ini:
      • Should there be a [testenv:py35]? Fixed with c74b4b4
      • Add boto3 for PY3.
    • asv_bench/benchmarks/io_bench.py: import boto3 as above.
    • README.md: Add boto3 for PY3.
    • doc/source/install.rst: boto3 <https://pypi.python.org/pypi/boto3>__: Recommended for Amazon S3 access in PY3.
    • ci/requirements-3*: Replace boto with boto3 as pip-installable (similar to requirements-2.7*) No file requirements-3.5_SLOW; omitting.

Comment last modified: 2016-01-10T02:55:00Z

@jreback
Contributor

jreback commented Dec 30, 2015

I think boto3 should be used for all PY3. not really sure of any API changes. This is prob not tested very much ATM.

@stharrold
Author

From the terminal output above, it seems that boto/plugin.py failed when it used glob (also see boto/boto#3413). There is a record in the Python 3.5 changelog for glob (https://docs.python.org/3/whatsnew/3.5.html, https://docs.python.org/3.5/whatsnew/changelog.html).
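
The failing step in the traceback can be reproduced without boto. This is a sketch under the assumption that the plugin directory lookup returned None; `plugin_glob_pattern` is a hypothetical guard, not boto's code.

```python
import os


def plugin_glob_pattern(directory):
    """Build the '*.py' glob pattern, or return None when no directory is set."""
    if directory is None:
        return None
    return os.path.join(directory, "*.py")


# The unguarded call mirrors boto/plugin.py line 92 with directory=None.
try:
    os.path.join(None, "*.py")
except TypeError as exc:
    print("unguarded join fails:", type(exc).__name__)

print(plugin_glob_pattern(None))
print(plugin_glob_pattern("plugins"))
```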

@stharrold
Author

I agree that using boto3 for all PY3 is probably the best choice given the apparent movement of the AWS Python SDK project.

After a little searching, there are significant changes in the AWS Python SDK API between boto and boto3: http://boto3.readthedocs.org/en/latest/guide/migrations3.html. I don't think I'm able to make a timely pull request since I'm inexperienced with both nosetests and boto[3]. Perhaps the users who originally included boto in pandas could help?

Thanks again for your help with the issue.

@jreback
Contributor

jreback commented Jan 30, 2016

@stharrold want to do a PR?

@jreback jreback modified the milestones: Next Major Release, 0.18.0 Jan 30, 2016
@jreback jreback added the IO Data IO issues that don't fit into a more specific label label Jan 30, 2016
@jreback jreback modified the milestones: 0.18.0, Next Major Release Jan 30, 2016
@stharrold
Author

@jreback Thanks for the offer. I'm sorry, I'm not in a position to devote the time that I think would be necessary to properly fix the issue.

@jvkersch

I wouldn't mind giving this a closer look, if nobody else beats me to it.

@jreback
Contributor

jreback commented Jan 31, 2016

that would be great!

@TomAugspurger
Contributor

@jvkersch thanks.

FWIW I've had to start porting a few things to use boto3 (apparently changes to ConfigParser in 3.4.3 and 3.5.1 broke boto), and the changes aren't too difficult.

It will be up to you as you try to implement this, but I think that requiring only boto3 is entirely reasonable if it's easier. There's no real reason to clutter up the pandas code base with a bunch of boto / boto3 compatibility stuff.

@jvkersch

@TomAugspurger @jreback I finally had time to look into this, and I agree the immediate issue is a bug in boto for Python 3.5 (reported as boto/boto#3474, where I left a comment). That said, I wouldn't mind trying my hand at a switchover to boto3, using the excellent to-do list that @stharrold prepared as a guide, but I propose first adding a small check to pandas.io.common to check for Python 3.5 + boto, to ensure that Pandas still loads cleanly in the meantime. Does that sound reasonable?
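
One way such a check could look, as a sketch only: the helper name `import_optional` and the `S3_SUPPORTED` flag are hypothetical and not what pandas.io.common contains. TypeError is trapped alongside ImportError because that is what boto raised at import time in the traceback at the top of this issue.

```python
import importlib
import warnings


def import_optional(name, trapped=(ImportError, TypeError)):
    """Import an optional dependency, returning None if it fails to load.

    Hypothetical helper for illustration; not part of pandas.
    """
    try:
        return importlib.import_module(name)
    except trapped as exc:
        warnings.warn("optional dependency %r is unavailable: %r" % (name, exc))
        return None


# pandas itself would still import; only the S3 code paths would be disabled.
boto = import_optional("boto")
S3_SUPPORTED = boto is not None
```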

@jreback
Contributor

jreback commented Feb 16, 2016

Is this a new version of boto? what version do you repro on?

(so for sure would take a PR that adds boto to the version list)

(py3.5)bash-3.2$ python
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec  7 2015, 11:24:55) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas 
>>> quit()
(py3.5)bash-3.2$ conda install boto
Fetching package metadata: ......
Solving package specifications: ............
Package plan for installation in environment /Users/jreback/miniconda/envs/py3.5:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    boto-2.39.0                |           py35_0         1.4 MB

The following NEW packages will be INSTALLED:

    boto: 2.39.0-py35_0

Fetching packages ...
boto-2.39.0-py 100% |##########################################################################################################################################################| Time: 0:00:00   4.58 MB/s
Extracting packages ...
[      COMPLETE      ]|#############################################################################################################################################################################| 100%
Linking packages ...
[      COMPLETE      ]|#############################################################################################################################################################################| 100%
(py3.5)bash-3.2$ python
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec  7 2015, 11:24:55) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> pandas.show_versions()

INSTALLED VERSIONS
------------------
commit: 6fecc9331d54cb3b7c710ac823aa29f02872097d
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0rc1+13.g6fecc93
nose: 1.3.7
pip: 8.0.2
setuptools: 19.6.2
Cython: 0.23.2
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.0.3
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.2.1.1
numexpr: 2.4.3
matplotlib: 1.5.0
openpyxl: None
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: 0.6.6.None
psycopg2: 2.6 (dt dec pq3 ext)
jinja2: None

@jreback jreback modified the milestones: 0.18.0, 0.18.1 Feb 16, 2016
@TomAugspurger
Contributor

@jreback it's a new package entirely called boto3. If we have an interested volunteer (which it sounds like we do! thanks @jvkersch), we could hopefully just require boto3 and not have any compatibility code in pandas (maybe detect boto and warn, recommending an upgrade to boto3 for a version?). boto3 is pure Python and can be installed alongside boto, so that's not a problem. It also picks up AWS credentials, so ideally a user won't have to change anything, just pip install boto3.

@TomAugspurger
Contributor

If you aren't using boto at all, either uninstalling boto or renaming / moving the ~/.boto file should do it. Or downgrade to Python 3.5.0, which I believe works.

Just installing boto3 won't work till we're using it. I've been porting some stuff at work to use boto3 so maybe I can put together a fix tomorrow night.
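
For the "renaming / moving ~/.boto" workaround, a small sketch that sets the file aside rather than deleting it. The function name `set_aside_boto_config` and the `.bak` suffix are hypothetical choices for illustration.

```python
import os


def set_aside_boto_config(home=None, suffix=".bak"):
    """Rename <home>/.boto to <home>/.boto<suffix>; return the new path or None."""
    home = os.path.expanduser("~") if home is None else home
    source = os.path.join(home, ".boto")
    target = source + suffix
    if not os.path.exists(source):
        return None  # nothing to move
    if os.path.exists(target):
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(target)
    os.rename(source, target)
    return target
```

Calling `set_aside_boto_config()` with no arguments acts on the current user's home directory; renaming the file back restores the original configuration.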

@jreback
Contributor

jreback commented Mar 2, 2016

ok, going to trap/ignore the error for now on 0.18.0

@TomAugspurger
Contributor

Makes sense. If I have time to get to this before the next RC I'll remove those.

@jreback
Contributor

jreback commented Mar 3, 2016

@TomAugspurger what is the repro for this error?

e.g. on py3.5.1 and 2.39 for boto it imports fine for me.

jreback added a commit to jreback/pandas that referenced this issue Mar 3, 2016
jreback added a commit that referenced this issue Mar 3, 2016
Closes #12489

Author: Jeff Reback <jeff@reback.net>

Closes #12511 from jreback/numexpr and squashes the following commits:

bb5cb1c [Jeff Reback] COMPAT: remove boto dep check, xref #11915
bad7c0f [Jeff Reback] COMPAT: blacklist numexpr=2.4.4
@TomAugspurger
Contributor

@jreback I haven't been able to repro the import time TypeError. I think these conditions should set it up...

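import os

# boto reads plugin_directory from the [Plugin] section (see boto/plugin.py in the traceback).
with open(os.path.expanduser('~/.boto'), 'wt') as f:
    f.write('[Plugin]\nplugin_directory = {}'.format(os.path.dirname(__file__)))

import pandas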

There just needs to be a .py file in that directory. The actual error comes from joining the plugin_directory value from that parsed Config file to a glob.glob result, one of which is bytes and the other is str in the original example. AFAIK those should always both be strs? But maybe something changed in 3.5.0.

The second bug, present in 3.4.4 and 3.5.1 which prevents the first one from even occurring, only raises when you try to authenticate. It repros with

import boto
boto.connect_s3()
NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials

If you're on 3.4.4 or 3.5.1 you should never get the first error since any key lookup from the Config will give None, even if the key is present.

NoAuthHandlerFound is defined in boto.exception.

@jreback
Contributor

jreback commented Mar 3, 2016

ok, currently master is clear: https://travis-ci.org/pydata/pandas/jobs/113383976 (though for some reason boto was not showing up in show_versions() (I added it), even though conda says it's installing).

even more odd is pip installing doesn't make this show up: see https://travis-ci.org/jreback/pandas/jobs/113398289 (is print_versions() doing something wrong)?

@mrocklin
Contributor

mrocklin commented Mar 9, 2016

FWIW I've had good experiences going with just boto3 and deprecating boto entirely.

@jreback
Contributor

jreback commented Mar 23, 2016

so new package: s3fs from @martindurant, @mrocklin, and @koverholt that might be of interest. This is a pure-python library only dependent on boto3, pip installable.

I think we could make this a dep (rather than boto3), and remove a lot of code in io/common.py as this gives us a nice file-system like object.

could potentially close: #7682, #8508 as well as this issue.
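
A rough sketch of what the file-system-like object could replace, assuming s3fs is installed. The helper names `split_s3_url` and `open_s3` are hypothetical and not pandas code; only `s3fs.S3FileSystem` and its `open` method come from s3fs.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError otherwise."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URL: %r" % (url,))
    return parsed.netloc, parsed.path.lstrip("/")


def open_s3(url, mode="rb", anon=False):
    """Open an S3 object as a file-like object via s3fs (imported lazily)."""
    import s3fs  # optional dependency; assumed to be installed

    bucket, key = split_s3_url(url)
    fs = s3fs.S3FileSystem(anon=anon)
    return fs.open("%s/%s" % (bucket, key), mode)
```

The returned object could then be handed to a reader that accepts file-like input, for example `pd.read_csv(open_s3("s3://hypothetical-bucket/data.csv"))`, where the bucket name is made up for illustration.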

@mrocklin
Contributor

What would you all need from s3fs to make this happen? Is there any particular logic or special cases that you've had to implement here that we would need to ensure was well handled by s3fs?

@TomAugspurger
Contributor

I'll run the test suite with s3fs swapped in tonight or this weekend and see what failures we have.

@jreback jreback modified the milestones: 0.18.1, 0.18.2 Apr 26, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Aug 21, 2016
ShaharBental pushed a commit to ShaharBental/pandas that referenced this issue Dec 26, 2016
closes pandas-dev#11915

Author: Tom Augspurger <tom.augspurger88@gmail.com>

Closes pandas-dev#13137 from TomAugspurger/s3fs and squashes the following commits:

92ac063 [Tom Augspurger] CI: Update deps, docs
81690b5 [Tom Augspurger] COMPAT/REF: Use s3fs for s3 IO

6 participants