CLN: Implement io modules as plugins #26804
Comments
generally agree. where would
some of this functionality and again |
I was thinking more of the ones listed in http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html than of methods like I agree the |
The to_ methods could benefit from a more unified API, and perhaps using the I think that for the HTML (for notebooks) especially, a core implementation is needed, but also the ability to use 3rd party plugins for additional capability like cdn DataTables. Enhancements like to_markdown #11052 could benefit from a unified API and a base class to extend, without being part of pandas itself. |
@datapythonista you are saying a lot of things, but aside from creating massive churn, what problem are you actually trying to solve here?
how does this proposal actually help here? The vast majority of generic.py code has nothing to do with this (< 10% at most); look at to_gbq and read_gbq; these wrappers actually take up very little space.
again how does your proposal do anything for this?
not even sure what this means
again how does your proposal actually address any of this
and what would you replace this with? or are you proposing no testing? your general statements are more or less agreeable, but these lack any specific advantages I think. I was expecting you to say: we should move all of the io routines to a public .io accessor to modularize them. I would actually be in favor of this, but your proposal is focused around the actual code layout / implementation and I fail to see much benefit here. |
In Regarding dependencies, with this proposal we should be able to have the imports in every IO module normally at the top of files. And not in If every module in I'm also +1 on a The main advantage I see in this proposal is having a uniform and decoupled way of writing io modules. For me it makes a huge difference when maintaining those modules if:
|
we could probably do this now using pytest custom markers. |
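As a rough illustration of the pytest custom markers idea, a conftest.py could tag tests per io format so that a single format can be selected with `-m`. This is only a sketch under assumptions: the marker names (`io_stata`, `io_parquet`) and the file-name fragments used for matching are hypothetical, and this is not pandas' actual conftest.

```python
# conftest.py (hypothetical sketch, not pandas' actual configuration)

# Assumed mapping: marker name -> fragment expected in the test node id.
IO_MARKERS = {
    "io_stata": "test_stata",
    "io_parquet": "test_parquet",
}


def pytest_configure(config):
    """Register the custom markers so `--strict-markers` accepts them."""
    for name in IO_MARKERS:
        config.addinivalue_line("markers", f"{name}: tests for one io format")


def pytest_collection_modifyitems(config, items):
    """Tag every collected test whose node id mentions a known io format."""
    for item in items:
        for name, fragment in IO_MARKERS.items():
            if fragment in item.nodeid:
                item.add_marker(name)
```

With something like this in place, `pytest -m io_stata pandas/tests/io` would run only the tests tagged for stata, without moving any test files.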
FWIW I would be against moving tests. I think that would make packaging more complex and always runs the risk of messing with test discovery and subtly not running things.
… On Jun 12, 2019, at 8:04 AM, Simon Hawkins ***@***.***> wrote:
If every module in pandas/contrib/io is defining its own tests, we can in the CI (or locally) call things like pytest pandas/contrib/io/stata pandas/contrib/io/parquet and be explicit on what we're testing. This way, we don't need skip_if_no because we won't use the dependencies to decide what's being tested.
we could probably do this now using pytest custom markers.
|
format.py being in io never felt right to me. Most of our classes rely on it for their If (part of?) format.py were moved to |
I don't really have an opinion on the proposed reorganization of implementations / test files.
I think I would be against any magic like this. I'm pretty happy with the |
(following the discussion in #26710)
Currently, most of the io functionality (csv, json, html, pickle, stata...) lives in pandas/io. But the to_* functions are implemented in pandas/core/[generic|frame|series].py. The tests live in pandas/tests/io. And their dependencies are not explicit anywhere afaik: they are imported lazily, so the needed libraries are reported only once the function is used (some are listed in environment.yml but not all).
I propose to move every io type to a directory in pandas/contrib/io that will include:
read_* and to_* functions
So, an example structure could be:
To call the functionality we could simply have something like this in Series, DataFrame (also something similar for read_*), but other ideas welcome:
I see several advantages here:
generic.py has more than 11k lines of code, a significant part are io related, same for frame.py with 8k...)
skip_if_no that cause problems and tests stop being run without noticing
Not part of this proposal, but I think in the future we could also move to contrib other parts that are not IO but could also benefit from being decoupled, like plotting or the extension arrays (that's why I think contrib/io/ makes sense, so we can have contrib/plotting ... in the future).
CC: @pandas-dev/pandas-core
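The snippet showing how Series and DataFrame would call the io functionality is not preserved in this copy of the issue. Purely as an illustration of the general idea (io modules living on their own and being looked up by format), one possible shape is a small registry. Everything here is an assumption made for the sketch: `Frame`, `register_writer`, `_WRITERS` and the `to(...)` method are hypothetical names, not pandas API and not necessarily what the proposal had in mind.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a format name to its writer function.
_WRITERS: Dict[str, Callable[..., str]] = {}


def register_writer(fmt: str):
    """Decorator an io module could use to register its writer for `fmt`."""
    def decorator(func):
        _WRITERS[fmt] = func
        return func
    return decorator


class Frame:
    """Tiny stand-in for DataFrame: column names plus a list of row dicts."""

    def __init__(self, columns: List[str], rows: List[dict]):
        self.columns = columns
        self.rows = rows

    def to(self, fmt: str, **kwargs) -> str:
        """Delegate to whichever io module registered `fmt`."""
        try:
            writer = _WRITERS[fmt]
        except KeyError:
            raise ValueError(f"no io plugin registered for {fmt!r}") from None
        return writer(self, **kwargs)


# This part would live in the io module itself (here: a toy csv plugin).
@register_writer("csv")
def frame_to_csv(frame: Frame, sep: str = ",") -> str:
    header = sep.join(frame.columns)
    body = [sep.join(str(row[col]) for col in frame.columns) for row in frame.rows]
    return "\n".join([header, *body])


frame = Frame(["a", "b"], [{"a": 1, "b": 2}, {"a": 3, "b": 4}])
print(frame.to("csv"))
```

The point of the sketch is only that the core class needs no knowledge of individual formats; each io module carries its own code (and could carry its own top-level imports and tests) and registers itself.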