Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLD: Split out tests into pandas_tests package #53007

Open
wants to merge 56 commits into
base: main
Choose a base branch
from

Conversation

lithomas1
Copy link
Member

@lithomas1 lithomas1 commented Apr 29, 2023

This seems to work(I tested by making a new conda env and running tests via pd.test() ) but is kinda sketchy.
(Last thing to do is to update the absolute imports from within the test folder to relative ones).

This saves about 1-2MBs for me off the total wheel size for me.

@lithomas1 lithomas1 added the Build Library building on various platforms label Apr 29, 2023
MANIFEST.in Outdated
@@ -53,7 +53,11 @@ global-exclude *.pxi
# GH 39321
# csv_dir_path fixture checks the existence of the directory
# exclude the whole directory to avoid running related tests in sdist
prune pandas/tests/io/parser/data
#prune pandas/tests/io/parser/data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this commented out on purpose? Remove?

@jbrockmendel
Copy link
Member

So we wouldn’t need to make a separate repo? Would we need to set something up on pypi? If “no” to both then very neat

@lithomas1
Copy link
Member Author

So we wouldn’t need to make a separate repo? Would we need to set something up on pypi? If “no” to both then very neat

No, we wouldn't need to make a separate repo, but we'd have to set up a "pandas-tests" package on PyPI (unless we are removing ability to test altogether).

The idea is that "pandas-tests" would have a pinned pandas version as a dependency, and we would release pandas and pandas-tests at the same time.

The only thing left to do in this PR is to turn all the absolute imports into relative imports (since the tests may now live elsewhere), which is probably going to cause a lot of churn, and then run the tests to make sure that nothing else in the tests is depending on stuff in pandas proper.

@mroeschke
Copy link
Member

Would also need some validation in pd.test to raise an error if pandas-test package isn't installed

@github-actions
Copy link
Contributor

github-actions bot commented Jun 1, 2023

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 1, 2023
@lithomas1
Copy link
Member Author

Planning on holding off on this until setuptools is completely gone.

@lithomas1 lithomas1 added the Mothballed Temporarily-closed PR the author plans to return to label Jun 2, 2023
@mroeschke
Copy link
Member

Going to close to clear the queue but feel free to reopen when ready

@mroeschke mroeschke closed this Aug 1, 2023
@lithomas1 lithomas1 reopened this Aug 23, 2023
@simonjayhawkins
Copy link
Member

Planning on holding off on this until setuptools is completely gone.

@lithomas1 what's the timescales on this?

@lithomas1
Copy link
Member Author

I'll take another look - thanks for the reminder.

@lithomas1
Copy link
Member Author

Finally green here 😌

Anyone want to take a look?

(NOTE: The diff is only big because I split the doctest stuff out of conftest. Actual changes are only ~200 lines or so)

@@ -188,6 +188,19 @@
__git_version__ = v.get("full-revisionid")
del get_versions, v

import sys
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this break an API users potentially use. If so will we need a depr cycle?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda.
You can still access pandas.tests as before (if you're using a wheel without the tests submodule) e.g. if you wanted to use the extension dtypes tests, but you'll need to install pandas-tests to do so.

This is technically breaking, but the next release also happens to be 3.0, so I think we'd be able to get away with a breaking change here :)

@alimcmaster1
Copy link
Member

alimcmaster1 commented Mar 19, 2024

Hey!

Thinks this is a decent idea.

How much does this decrease the dist size by? Would be good to know that before merging this as that's the main benefit of this change right? Shout if I'm missing something.

How will we ensure users don't have mismatching versions of pandas and pandas_test installed if these are to deployed to pypi as sep packages? I didn't see any logic for this. As will need to pin version of pandas used in pandas_test right?

@lithomas1
Copy link
Member Author

Hey!

Thinks this is a decent idea.

How much does this decrease the dist size by? Would be good to know that before merging this as that's the main benefit of this change right? Shout if I'm missing something.

It's about 15 MBs. The whole of pandas is around 45 MB, so it shaves off a third off the install side.

How will we ensure users don't have mismatching versions of pandas and pandas_test installed if these are to deployed to pypi as sep packages? I didn't see any logic for this. As will need to pin version of pandas used in pandas_test right?

I haven't figured this out yet (My original plan was to get this merged first and see if downstream is able to easily adapt to this before proceeding further). I was thinking I was going to put in a pin in the pyproject.toml for pandas_tests come release time. Open to suggestion on this, though.

ci/code_checks.sh Outdated Show resolved Hide resolved
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think now it may be important to run the wheel build job on PRs again since more PRs will be updating pandas and pandas_tests and we need to make sure there's always compatibility between the two.

@lithomas1
Copy link
Member Author

I think now it may be important to run the wheel build job on PRs again since more PRs will be updating pandas and pandas_tests and we need to make sure there's always compatibility between the two.

I think it should be OK personally.

On a normal pandas build pandas_tests is just pandas.tests, we don't exclude the tests on a regular build, just on builds from sdist (which would be the case of a wheel build).

@lithomas1
Copy link
Member Author

@pandas-dev/pandas-core
any other feedback?

I'd like to merge this in otherwise.

Comment on lines +13 to +15
#dependencies=[
# "pandas==2.1.0"
#]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still be commented out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder.

I planned on doing this in a followup (since technically this only needs to be set at release time), but then realized I can do this programmtically.

I'll make the changes to the wheel builder workflow soon.

@mroeschke
Copy link
Member

I would still prefer to have every PR run 1 job (Ubuntu only) pandas wheel without tests + pandas_tests wheel to validate that the connection between the two is never broken.

Our CI jobs are relatively streamlined now and I think this job is definitely more important than some of the checks we run on every PR

@WillAyd
Copy link
Member

WillAyd commented Mar 27, 2024

Nice work, but I'm pretty lukewarm to the implementation. Seems like there are a lot of potential break points and CI is not the easiest to debug.

For me locally I see 20.7 MB / 33.6 MB in the test folder is from data files, with a good deal of that from SAS. I am unsure if all of the added complexity here is worth it versus a simpler solution to find a better home for data files

image

@lithomas1
Copy link
Member Author

lithomas1 commented Mar 27, 2024

Nice work, but I'm pretty lukewarm to the implementation. Seems like there are a lot of potential break points and CI is not the easiest to debug.

For me locally I see 20.7 MB / 33.6 MB in the test folder is from data files, with a good deal of that from SAS. I am unsure if all of the added complexity here is worth it versus a simpler solution to find a better home for data files

We already don't ship the data files in the test suite, so the 15 MBs of savings is from the actual Python pandas tests files, which lines up with the 13ish MBs of non test data that you're seeing.
(this is around a third of the unzipped wheel size)

Is there something in particular that worries you about the implementation?

(r.e. the breaking points: IMO, this does not affect regular CI at all and should only affect the wheel builders.

I do understand that this adds quite a bit of complexity to the build process, though.
Most of the added complexity should be contained to just to the wheel builders workflow and not to the core Python files of pandas itself.

Do let me know if there's something I can simplify more or clarify/document better.
)

@lithomas1
Copy link
Member Author

I would still prefer to have every PR run 1 job (Ubuntu only) pandas wheel without tests + pandas_tests wheel to validate that the connection between the two is never broken.

Our CI jobs are relatively streamlined now and I think this job is definitely more important than some of the checks we run on every PR

Gotcha.

I guess the easiest route would be to run a subset of the wheel builders on every PR. I'll do that in a followup.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK thanks @lithomas1 and sorry for the ignorance. I've left a few comments where I have concerns

# (unlike Github Actions there is no host access using cibuildwheel with CircleCI)
name: Build the sdist
command: |
pip3 install build setuptools-scm wheel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is setuptools-scm a misnomer or does it have some kind of dependency with setuptools?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use it to version the pandas-tests package?

Setting up versioneer is probably overkill for pandas-tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure I understand. So this does still require setuptools right? I think that is confusing given the work we have put into meson to replace that library

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, pandas-tests is built using setuptools.

I figured it was much easier to use setuptools given this is a pure Python package.

(Note that if you're just doing regular development you'll never need setuptools, unless you want to build pandas_tests by hand for some reason)

.circleci/config.yml Show resolved Hide resolved
.github/workflows/wheels.yml Show resolved Hide resolved
.github/workflows/wheels.yml Show resolved Hide resolved
@@ -1,120 +1,25 @@
"""
This file is very long and growing, but it was decided to not split it yet, as
it's still manageable (2020-03-17, ~1.1k LoC). See gh-31989
Just a conftest file for doctest stuff
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I missed in the thread but why do we need a doctest conftest separate from the test confest? This doesn't work to move everything?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggestion: remove tests from the distribution
9 participants