Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: Made s3 related tests mock boto #17388

Merged
merged 1 commit into from Sep 14, 2017

Conversation

kirkhansen
Copy link
Contributor

Kept a couple tests un-mocked for testing things like accessing a private bucket as that's hard to mock.

@codecov
Copy link

codecov bot commented Aug 31, 2017

Codecov Report

Merging #17388 into master will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17388      +/-   ##
==========================================
+ Coverage   91.18%    91.2%   +0.01%     
==========================================
  Files         163      163              
  Lines       49545    49545              
==========================================
+ Hits        45179    45188       +9     
+ Misses       4366     4357       -9
Flag Coverage Δ
#multiple 88.98% <ø> (+0.03%) ⬆️
#single 40.21% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.43% <0%> (-0.1%) ⬇️
pandas/io/common.py 68.64% <0%> (+0.84%) ⬆️
pandas/io/s3.py 85% <0%> (+85%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa557f7...8f149d8. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not add moto to EVERY ci file, this completely defeats the purpose. only add where s3fs is atm. we specifically have certain types of tests marked as disabled. you need to handle moto NOT being installed (via a pytest.importorskip..). on windows only add on a single build, ok with adding the pip install but only add for the 3.6 build.

@@ -8,3 +8,4 @@ py
PyCrypto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls make sure to add a return at the end of the line; that's the red character in the display

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you adding tips files (as .csv) that already exist elsewhere?

@@ -5,3 +5,4 @@ pandas_gbq
pandas_datareader
statsmodels
scikit-learn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -2,3 +2,4 @@ html5lib==1.0b2
beautifulsoup4==4.2.0
openpyxl
argparse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -1,3 +1,4 @@
html5lib==1.0b2
beautifulsoup4==4.2.1
blosc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -0,0 +1 @@
moto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -0,0 +1 @@
moto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove but leave the file

@@ -0,0 +1 @@
moto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -0,0 +1 @@
moto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -0,0 +1 @@
moto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok, add a return


import pandas.util.testing as tm
from pandas import DataFrame
from pandas.io.parsers import read_csv, read_table

TIPS_FILE = os.path.join(tm.get_data_path(), 'tips.csv')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a fixture


@pytest.fixture(scope='module')
def salaries_table():
path = os.path.join(tm.get_data_path(), 'salaries.csv')
return read_table(path)


@pytest.fixture(scope='module')
def test_s3_resource(request):
pytest.importorskip('s3fs')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip on moto not installed

import pytest
import six
from moto import mock_s3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import may have to be inside the fixture, so that the test module can be imported without moto being installed.

@TomAugspurger
Copy link
Contributor

why are you adding tips files (as .csv) that already exist elsewhere?

I don't think we have compressed versions of those in the repo.

@gfyoung gfyoung added Testing pandas testing functions or related to the test suite CI Continuous Integration labels Aug 31, 2017
@kirkhansen
Copy link
Contributor Author

do not add moto to EVERY ci file, this completely defeats the purpose. only add where s3fs is atm. we specifically have certain types of tests marked as disabled. you need to handle moto NOT being installed (via a pytest.importorskip..). on windows only add on a single build, ok with adding the pip install but only add for the 3.6 build.

@jreback what is the purpose I've defeated if I may ask? I suppose I can understand skipping some tests based on s3fs not being installed to test that pandas can handle import errors / can run without optional dependencies or something like that, but installing different test dependencies per build is unnecessary and potentially dangerous. It also makes the build process needlessly complicated. I spent an extensive amount of time reading the build scripts and studying the failed build output to figure out which requirements files were being read in each build. I was finally frustrated enough that I just added it to every .pip file so at least all the environments would have that test-only dependency. Most of my time was spent waiting for the build since it took about 25 minutes or so to witness a failure.

Skipping tests based on test-only dependencies can also be dangerous as you could implement a test that has a skip, and never realize the test would actually fail when it should have due to a bug the test would have otherwise found. 48 requirements files seems a bit excessive. Skipping based on test-only dependencies is a bad idea, IMHO.

@TomAugspurger
Copy link
Contributor

Part of the reason for the complicated requirements files is the need to support multiple versions of libraries like numpy, matplotlib, etc. We'll need one file per environment for them.

For optional deps though, it may be simpler to install them in every environment, especially test-only deps like moto or mock.

I like how dask handles checking for accidental use of optional dependencies here

@kirkhansen
Copy link
Contributor Author

@TomAugspurger dask's handling of that is nice! Good call on the different versions; that makes sense.
I'm still fairly new to the github interface for pull requests. Is there a proper protocol to notify @jreback that I believe the requested changes have been dealt with?

conn.create_bucket(Bucket='cant_get_it', ACL='private')
add_tips_files('cant_get_it')

def teardown():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that as of pytest 3.0, the preferred way of doing teardown / finalization is to simply yield and then run the teardown: https://docs.pytest.org/en/latest/fixture.html#fixture-finalization-executing-teardown-code. It's a bit simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it that way originally, but saw this on that same page

Finalizers will always be called regardless if the fixture setup code raises an exception. This is handy to properly close all resources created by a fixture even if one of them fails to be created/acquired

For this case that probably doesn't matter, but I wasn't sure which was preferred. I can change that to a yield.

@@ -50,151 +93,142 @@ def check_compressed_urls(salaries_table, compression, extension, mode,
tm.assert_frame_equal(url_table, salaries_table)


class TestS3(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll still keep the class TestS3, simply for the namespacing, so you can do pytest pandas/tests/io/parser/test_network::TestS3 to hit all the s3 tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. We may want to add something to the contributing docs then to ward people like me from changing existing tests to the 'more functional style' then.

https://pandas.pydata.org/pandas-docs/stable/contributing.html#transitioning-to-pytest

Going forward, we are moving to a more functional style using the pytest framework, which offers a richer testing framework that will facilitate testing and developing. Thus, instead of writing test classes, we will write test functions like this...

# boto3 is a dependency of s3fs
import boto3
client = boto3.client("s3")
def test_read_csv__handles_boto_s3_object(test_s3_resource, tips_file):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra _ between csv and handles

Copy link
Contributor Author

@kirkhansen kirkhansen Sep 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a habit I've formed at my current position. We name our tests like test_name_of_function__the_thing_your_testing. I can remove that.

result = read_csv(s3_object["Body"])
assert isinstance(result, DataFrame)
assert not result.empty
result = read_csv(six.BytesIO(s3_object["Body"].read()), encoding='utf8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

six isn't a dependency of pandas (though it probably is transitively), but you can do from pandas.compat import BytesIO.

I'm curious why you needed it now, when we didn't before. Does simply passing encoding='utf-8' to read_csv work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why that stopped working when I starting using moto...
The error I get using result = read_csv(s3_object["Body"], encoding='utf8') is

        if not is_file_like(filepath_or_buffer):
            msg = "Invalid file path or buffer object type: {_type}"
>           raise ValueError(msg.format(_type=type(filepath_or_buffer)))
E           ValueError: Invalid file path or buffer object type: <class 'botocore.response.StreamingBody'>

pandas/io/common.py:219: ValueError

Making it a BytesIO explicitly makes pandas happy about it.
I'm also running python 3.6.0 locally.

def salaries_table():
path = os.path.join(tm.get_data_path(), 'salaries.csv')
return read_table(path)


@pytest.fixture(scope='module')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a test and not a fixture

pytest.importorskip('s3fs')
moto = pytest.importorskip('moto')
moto.mock_s3().start()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if u r trying to test 3 things then use parametrize

('tips.csv.bz2', tips_file + '.bz2'),
]

def add_tips_files(bucket_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this becomes inline when u use parametrize

moto.mock_s3().stop()
request.addfinalizer(teardown)

return conn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like you need to create a proper yield fixture that does the setup and tear down

the test itself is pretty simple

pytest.importorskip('s3fs')
# more of an integration test due to the not-public contents portion
# can probably mock this though.
for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use parametrize

tm.get_data_path('tips.csv')).iloc[:10], df)


def test_parse_public_s3_bucket_nrows(test_s3_resource):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since u have so many s3 tests would be ok with making a new file just for them (ok leaving them here too)

# Read with a chunksize
chunksize = 5
local_tips = read_csv(tm.get_data_path('tips.csv'))
for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use parametrize

# Read public file from bucket with not-public contents
df = read_csv('s3://cant_get_it/tips.csv')
def test_parse_public_s3_bucket_python(test_s3_resource):
for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@pep8speaks
Copy link

pep8speaks commented Sep 4, 2017

Hello @kirkhansen! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 14, 2017 at 02:26 Hours UTC

@jreback
Copy link
Contributor

jreback commented Sep 5, 2017

@kirkhansen with regards to #17388 (comment)

we have many builds and have a fairly comprehensive test infrastructure in order to support many goals.

  • backcompat testing of specific versions
  • slow/fast testing
  • having builds not take excessive time
  • easy interpretation of failing results

so we don't need every dependency running on every single build. sure the vast majority of them do, but the io sub-systems in particular are somewhat segregated to specific builds.

Imagine you are a contributor diagnosing failures, having some builds succeed, while having failures isolated to a single or small number of builds is generally a good thing.

testing everything on every build is nice to do in a smaller project, but becomes infeasible in larger ones.

finally not having a test mock in every build forces the test writer care of mocking properly (e.g. making sure imports are satisfied and such, making testing more robust).

@TomAugspurger
Copy link
Contributor

so we don't need every dependency running on every single build.

That's true for some dependencies, but I think docs and test-only dependencies should always be installed:

  • pytest
  • pytest-xdist
  • pytest-cov
  • flake8
  • sphinx
  • nbsphinx
  • moto

finally not having a test mock in every build forces the test writer care of mocking properly

How so? The behavior of

  • ImportError if s3fs isn't installed
  • succeed if it is

is the same, regardless of whether moto is installed or not.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kirkhansen about the BytesIO from here, that test wasn't being run, since before, since it didn't start with test_ 😬 . Thanks for fixing that.

@jreback
Copy link
Contributor

jreback commented Sep 6, 2017

ok I'll buy that we should make moto a standard test install. so remove moto from all .pip/.build files, and simply add where we install pytest (for both windows and linux), then it will always be available. would be ok removing the import skips as well. pls add to the ci\requirements_dev.txt though

@kirkhansen
Copy link
Contributor Author

kirkhansen commented Sep 12, 2017

so remove moto from all .pip/.build files, and simply add where we install pytest

It appears pytest is being installed in the install_travis.sh, install_circle.sh, and appveyor.yml via conda. It also appears that the conda-forge channel is only added to some tests causing lots of build fails. I assume there is good reason to not include the conda-forge channel on all builds. How would you like me to proceed, @jreback?

@jreback
Copy link
Contributor

jreback commented Sep 12, 2017

It appears pytest is being installed in the install_travis.sh, install_circle.sh, and appveyor.yml via conda. It also appears that the conda-forge channel is only added to some tests causing lots of build fails. I assume there is good reason to not include the conda-forge channel on all builds. How would you like me to proceed, @jreback?

dont' install the conda version, just pip install, its right under the conda line (e.g. we pip install pytest-xdist)

@@ -5,3 +5,4 @@ cython
pytest>=3.1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the no change files e.g

git checkout master ci/requirements-3.6_NUMPY_DEV.pip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added those files when I was attempting to install moto via pip in those environments; they don't exist in master. Do we want to get rid of them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually these are empty, so ok

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor spelling change. otherwise lgtm. ping on green.

@@ -19,6 +26,40 @@ def salaries_table():
return read_table(path)


@pytest.fixture(scope='module')
def test_s3_resource(tips_file):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename to s3_resources (test is reserved for an actual test)

@jreback jreback added this to the 0.21.0 milestone Sep 13, 2017
Kept a couple around for testing things like accessing a private bucket as that's hard to mock.

Try the pip counterparts

Some more merge request changes
@jreback jreback merged commit 97abd2c into pandas-dev:master Sep 14, 2017
@jreback
Copy link
Contributor

jreback commented Sep 14, 2017

thanks @kirkhansen nice PR!

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 14, 2017 via email

@kirkhansen
Copy link
Contributor Author

my pleasure!

@kirkhansen kirkhansen deleted the moto-s3-tests branch September 15, 2017 13:31
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TST/CI: Use moto to mock S3 calls
5 participants