
API/ERR: allow iterators in df.set_index & improve errors #24984

Merged
merged 36 commits into from Feb 24, 2019


@h-vetinari
Contributor

commented Jan 28, 2019

  • closes the parts of #22484 (or rather, those worth keeping) that were reverted in #25085 due to #24969; closes #24969
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

This is a quick fix for the regression. However, I think this should be deprecated immediately (i.e. in 0.24.1). I haven't yet added a deprecation warning here, pending further discussion in the issue.

@jorisvandenbossche @TomAugspurger @jreback

        or isinstance(keys, (ABCIndexClass, ABCSeries, np.ndarray))):
    # make sure we have a container of keys/arrays we can iterate over
    # tuples can appear as valid column keys!
    if not isinstance(keys, list):
        keys = [keys]
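The normalization in the snippet above can be sketched with the standard library alone. The helper name normalize_keys is hypothetical, and this version only checks for list, whereas the diff also routes Index, Series and ndarray inputs through the wrapping step:

```python
def normalize_keys(keys):
    """Wrap anything that is not already a list (hypothetical helper)."""
    # Tuples can be valid column keys, so a tuple must become one entry
    # rather than being iterated over element by element.
    if not isinstance(keys, list):
        keys = [keys]
    return keys


print(normalize_keys("A"))         # ['A']
print(normalize_keys(("a", "b")))  # [('a', 'b')]
print(normalize_keys(["A", "B"]))  # ['A', 'B']
```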

@h-vetinari

h-vetinari Jan 28, 2019

Author Contributor

If we support custom types, we need to go back to the pre-#22486 state, which just puts everything that's not list-like into a list. Otherwise we can't guarantee that the iteration below will work.

@jreback

jreback Jan 30, 2019

Contributor

I think that is best, let's just revert entirely to pre 22486

@h-vetinari

h-vetinari Jan 30, 2019

Author Contributor

The pre-#22486 code had some other problems we do not need to reinstate (e.g. weird KeyErrors if the input is an iterator).

I agree that the code complexity must stay maintainable, so the most reasonable thing (IMO) is to simply not bother trying to "fix" unhashable custom types being used as keys. At that point, the user is so far off the reservation that it's really not our job to give them a good error message (again, even the repr of such a frame would crash).

# everything else gets tried as a key; see GH 24969
try:
    self[col]
    str(col)

@wkschwartz

wkschwartz Jan 28, 2019

Can str(col) normally raise a KeyError?

@h-vetinari

h-vetinari Jan 28, 2019

Author Contributor

The error might be different, that's true. On second thought, str falls back to repr if __str__ is not implemented, so unless the class manages to break its own repr (haha), this should not be necessary.
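A quick way to confirm the fallback: a class that defines only __repr__ still produces a usable str(), because object.__str__ delegates to repr(). The class Label below is purely illustrative:

```python
class Label:
    """Defines __repr__ but deliberately not __str__."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Label({self.name!r})"


lbl = Label("x")
print(repr(lbl))  # Label('x')
print(str(lbl))   # Label('x'): object.__str__ falls back to __repr__
```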

@codecov


commented Jan 28, 2019

Codecov Report

Merging #24984 into master will decrease coverage by <.01%.
The diff coverage is 90%.


@@            Coverage Diff             @@
##           master   #24984      +/-   ##
==========================================
- Coverage   92.38%   92.38%   -0.01%     
==========================================
  Files         166      166              
  Lines       52400    52405       +5     
==========================================
+ Hits        48409    48413       +4     
- Misses       3991     3992       +1
Flag Coverage Δ
#multiple 90.8% <90%> (-0.01%) ⬇️
#single 42.88% <10%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 96.88% <90%> (-0.05%) ⬇️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf693ff...3e01681. Read the comment docs.

@codecov


commented Jan 28, 2019

Codecov Report

Merging #24984 into master will decrease coverage by <.01%.
The diff coverage is 90.47%.


@@            Coverage Diff             @@
##           master   #24984      +/-   ##
==========================================
- Coverage   91.73%   91.73%   -0.01%     
==========================================
  Files         173      173              
  Lines       52856    52877      +21     
==========================================
+ Hits        48490    48509      +19     
- Misses       4366     4368       +2
Flag Coverage Δ
#multiple 90.3% <90.47%> (ø) ⬆️
#single 41.69% <52.38%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/compat/__init__.py 58.03% <50%> (-0.07%) ⬇️
pandas/core/frame.py 96.85% <94.73%> (-0.03%) ⬇️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3855a27...5f99b15. Read the comment docs.

@TomAugspurger
Contributor

left a comment

Thanks for the PR.

I'm not sure what should be done going forward, or whether we should have a deprecation now. @toobaz may have thoughts here too. I'm fine with leaving that for a comprehensive "what's allowed in an Index" overhaul.

@wkschwartz


commented Jan 28, 2019

Reposting #24969 (comment) here in case the discussion of the deprecation of custom label types proceeds here instead of at the original ticket.

Obviously I would prefer no deprecation of custom label types as they are integral to my company’s applications. However, I would urge strongly that if you do decide to deprecate the feature, you do so starting only in the next major release (presumably 0.25.0) rather than in a minor release (0.24.1).

If you do stop supporting custom label types and I am not to be stuck at Pandas 0.23.4 forever, I could theoretically undertake the (expensive) refactoring to use unique IDs (my production code has the equivalent of the name field from my toy example in the OP). However, other users whose code would break might not have convenient unique IDs to switch to. Please do not remove this feature lightly.

@toobaz

Member

commented Jan 28, 2019

@toobaz may have thoughts here too

I'm boring, I always have the same thoughts every time the discussion comes up :-) And they are:

  • if we didn't say it's not allowed, it's allowed, and should work
  • if there is no reason to disallow it, let's not say it's not allowed
  • if we think that the reason to disallow it is that it makes code simpler, we are almost always wrong. Clean code will work with any type a user throws in, as long as it satisfies general properties (e.g. __hash__). Vice versa, explicitly listing the types we "like" as scalar will require us to amend the list (and the docs) every time we find some new type that, yes, we think is worth allowing.
  • (a valid reason to disallow it is that either it brings ambiguity, or it changes the semantics, as would be accepting lists as keys)
@toobaz

Member

commented Jan 28, 2019

Don't get me wrong: it is good to have this discussion once and for all if it results in cleaner docs/assumptions. But my opinion is that the docs should say "you can use as keys anything which is not mutable".

@wkschwartz


commented Jan 28, 2019

But my opinion is that the docs should say "you can use as keys anything which is not mutable".

"Not mutable" is a bit fuzzy in Python. If, instead, keys are required to be hashable (which usually implies some sort of recursive immutability), then users already familiar with Python dicts would immediately get it. You can test for key eligibility with isinstance(potential_key, collections.abc.Hashable) or with try: hash(potential_key) / except TypeError: ..., and you'll maintain backward compatibility. Moreover, you'll always be able to convert a DataFrame to/from dicts, which most folks already assume they can do freely (at least as far as I've seen).

The documentation should then also have a sentence or two about the importance of immutability and compatibility between the object's __hash__ and __eq__ implementations. A mention of ints, strings, tuples, namedtuples, and dataclass's frozen parameter would then give users some ideas.
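A minimal sketch of such a key type, assuming a frozen dataclass suits the use case (the class Sensor and its fields are hypothetical): with frozen=True and the default eq=True, the dataclass generates __eq__ and __hash__ from the same fields, so equal instances hash equally and the fields cannot be reassigned.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Sensor:
    site: str
    channel: int


a = Sensor("north", 1)
b = Sensor("north", 1)
readings = {a: 3.5}

print(a == b, hash(a) == hash(b))  # True True
print(readings[b])                 # 3.5: an equal instance finds the same entry

try:
    a.channel = 2
except FrozenInstanceError:
    print("frozen")  # reassigning a field is refused
```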

In the Python 3.7.2 session below, a is mutable, b is technically immutable, but not sufficiently so for its hash to work (suggesting that isinstance(potential_key, collections.abc.Hashable) isn't quite good enough).

>>> a = []
>>> b = (a,)
>>> hash(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
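The session above suggests checking eligibility by actually calling hash() rather than relying on isinstance. A small sketch (the function name is_valid_key is hypothetical):

```python
from collections.abc import Hashable


def is_valid_key(obj):
    """Return True if obj can actually be hashed (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


a = []
b = (a,)
print(isinstance(b, Hashable))  # True: tuple defines __hash__
print(is_valid_key(b))          # False: hashing the contained list fails
print(is_valid_key((1, "x")))   # True
```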
@toobaz

Member

commented Jan 28, 2019

"Not mutable" is a bit fuzzy in Python. If, instead, keys are required to be hashable

Yep, my mistake, "hashability" is the right property.

@h-vetinari

Contributor Author

commented Jan 28, 2019

I'm not a core dev, but FWIW, here are some counterpoints to @toobaz's arguments:

if there is no reason to disallow it, let's not say it's not allowed

If the code is not designed and tested for it, such a carte blanche is (not in this example, but across 1000s of small cuts) an immense burden down the road when something that wasn't explicitly forbidden suddenly breaks.

if we didn't say it's not allowed, it's allowed, and should work

See lack of testing above. Duck typing really bites us in the ass here. Stability (a possible SemVer after 1.0) even more.

if we think that the reason to disallow it is that it makes code simpler, we are almost always wrong. Clean code will work with any type a user throws in, as long as it satisfies general properties (e.g. __hash__).

The previous behaviour was specifically causing some weird KeyErrors for stuff that should have never been tested as a key (hence I'm adding a try-except in this PR), see #22484. I could have easily maintained this capability, if it had been tested, or even documented.

Vice versa, explicitly listing the types we "like" as scalar will require us to amend the list (and the docs) every time we find some new type that yes, we think is worth allowing.

That - in contrast to the above - would make it a controlled (non-breaking) expansion where necessary (IMO far preferable). I'm also thinking the ecosystem has matured enough by now to broadly have a common understanding what "scalar" means.

(a valid reason to disallow it is that either it brings ambiguity, or it changes the semantics, as would be accepting lists as keys)

That line can be blurred arbitrarily far (see my example in the issue).

Don't get me wrong: it is good to have this discussion once and for all if it results in cleaner docs/assumptions. But my opinion is that the docs should say "you can use as keys anything which is not mutable".

In #24702, it sounded like you're advocating against using tuples as keys (which I would agree with), which also aren't mutable and are hashable.

IMO, the API should have a clearly defined surface, not an arbitrary "whatever happens to work". The latter would mean (and already does to some extent) a disproportionate amount of dev time to chase down weird corner cases, rather than focus on a consistent API that's useful and unambiguous for 99.99% of the cases.

@toobaz

Member

commented Jan 28, 2019

If the code is not designed and tested for it

You test lines of code, not user behavior. Otherwise test suites would have to be infinite. You write code that satisfies properties, not just code that doesn't break your tests. Tests are a very useful addition, but you will never test every possible combination of parameters, even if you list one by one the possible values they can take. Otherwise, since duck typing is the norm in Python, you would be accusing a very large number of projects of not being able to write effective tests.

Notice, by the way, that "custom types" is not a precise definition. People might feed our indices with types that they did not create, but other libraries did.

The previous behaviour was specifically causing some weird KeyErrors for stuff that should have never been tested as a key (hence I'm adding a try-except in this PR), see #22484.

Feel free to expand, I'm not following you. Notice that Index subclasses have all the right to raise TypeErrors, because there the types are clearly defined.

In #24702, it sounded like you're advocating against using tuples as keys

Not at all. I said multiple times in many other discussions that tuples (with hashable content) are perfectly valid keys. I actually used exactly the same arguments I'm using here, that's why I'm boring :-)

IMO, the API should have a clearly defined surface, not an arbitrary "whatever happens to work"

That's for sure. I'm being very clear (although @wkschwartz beat me to it) on what the API should support. And it should work on what it supports.

That line can be blurred arbitrarily far (see my example in the issue).

Not following you. I'm replying there.

@h-vetinari

Contributor Author

commented Jan 28, 2019

You test lines of code, not user behavior. Otherwise tests suites would have to be infinite.

Not if there's a well-defined set of input types. ;-)

Feel free to expand, I'm not following you.

The iterator that gets passed to df.set_index (see OP of #22484), say, in the expectation of its values being used as the index, somehow gets turned into an Index and then raises a KeyError. That's one of several confusing examples.

That's for sure. I'm being very clear (although @wkschwartz beat me to it) on what the API should support. And it should work on what it supports.

OK, that's a more nuanced discussion then (vs. "everything that works is allowed"). If all hashable objects should be usable as keys, we'd have to drastically improve our testing for it, but it's certainly not impossible to try to support it. I just think it's far too wide a line to draw. I mentioned in the issue that it would be more feasible to add dataclasses specifically to the types we test, rather than all hashables (and to be sure: my point is not about @wkschwartz's use case, much less against him).

@toobaz

Member

commented Jan 28, 2019

If all hashable objects should be usable as keys, we'd have to drastically improve our testing for it, but it's not impossible

Again, non sequitur. Testing is good, but we want to write code that works even in those cases in which it was not tested.

Not if there's a well-defined set of input types. ;-)

OK, let's cut it short: will you soon be asking us to limit the values of Series/DataFrames to well-defined set of input types? I hope not. Do you think this is a problem for our testing suite? I hope not.

If you need to test any possible combination of allowed input types with every possible behavior of every method in the pandas API, tests will be just ''slightly short'' of infinite.

(see OP of #22484)

I see three points there, and not sure if any of them is a problem. Can you assume I'm even more stupid than I am?

General comment: I still wasn't able to understand if this regression was a mistake or was made on purpose. In the second case, while I can only blame myself for not being able to keep up with the PRs, I still think such a change deserved some more discussion, e.g. in mailing list.

@wkschwartz


commented Jan 28, 2019

To expand on @toobaz's #24984 (comment): in a duck-typed language, APIs should be based on protocols, interfaces, or capabilities, not on exact class hierarchies. For example, Python core does fine defining the API surface of dict keys as anything hashable.

Regarding whether that which is not forbidden is allowed (TWINFIA): Pandas' having a version < 1 means the maintainers have the right to change semantics. Please balance that right against the mass of established code bases in production that rely on Pandas. I couldn't be the only author to assume TWINFIA.

Thank you, @h-vetinari and @toobaz, for taking an interest in this!

@h-vetinari

Contributor Author

commented Jan 29, 2019

Again, non sequitur. Testing is good, but we want to write code that works even in those cases in which it was not tested.

I agree that properly written code that only ever depends on hashability for keys is possible. But many methods (not just set_index) need to do a lot of gymnastics for wildly varying input signatures: in this case, column keys, lists of column keys, various arrays, lists of arrays, and mixes of arrays and keys.
To me, this is a large part of pandas' utility, because very different (but equally sensible) scenarios just work. Allowing arbitrary types for keys makes this "magic" much more complicated, but OK, opinions can differ on what the design goal should be.
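To illustrate the kind of sorting this requires, here is a stdlib-only sketch that labels each entry of a set_index-style argument as an array or a key. The helper classify is hypothetical and only recognizes list and iterators; in pandas, Index, Series and np.ndarray would go into the array bucket as well:

```python
from collections.abc import Iterator


def classify(keys):
    """Label each entry of a set_index-style argument (hypothetical helper)."""
    if not isinstance(keys, list):
        keys = [keys]  # a lone key (including a tuple) is one entry
    kinds = []
    for item in keys:
        if isinstance(item, (list, Iterator)):
            kinds.append("array")
        else:
            # Everything else, including tuples, is treated as a column key.
            kinds.append("key")
    return kinds


print(classify(["A", ("B", 1), [1, 2, 3], iter([4, 5, 6])]))
# ['key', 'key', 'array', 'array']
print(classify(("B", 1)))  # ['key']
```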

Not if there's a well-defined set of input types. ;-)

OK, let's cut it short: [strawman]

Let me clarify my language here: it's certainly possible to test methods against the various container types they should accept. What's in those containers is another story.

(see OP of #22484)

I see three points there, and not sure if any of them is a problem. Can you assume I'm even more stupid than I am?

I know you're not (stupid), but ok, here's another try:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))
>>>
>>> # Series-constructor uses iterators as if it was an array
>>> my_iter = map(str, df.A)
>>> pd.Series(my_iter)
0     0.8812414814169731
1    -1.0789948340108944
2    -0.2049936946205116
3    0.35952380472779194
4     -0.529524992759882
>>>
>>> # so why not do the same for df.set_index, which *also* works with arrays?
>>> my_iter = map(str, df.A)  # re-create it; the previous iterator has been consumed
>>> df.set_index(my_iter)
Traceback (most recent call last):
[...]
KeyError: "None of [Index(['0.8812414814169731', '-1.0789948340108944', '-0.2049936946205116',\n       '0.35952380472779194', '-0.529524992759882'],\n      dtype='object')] are in the [columns]"

There's a couple things that are confusing here. How did an iterator get turned into an Index object, without any further user-input? And why is that Index being tested as a column key?!
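Part of the confusion is that an iterator is single-pass: once something has consumed it, it yields nothing, which is why the session re-creates my_iter before calling set_index. A stdlib-only illustration:

```python
values = [0.88, -1.08, -0.2]
my_iter = map(str, values)

first = list(my_iter)   # consumes the iterator
second = list(my_iter)  # nothing left to yield

print(first)   # ['0.88', '-1.08', '-0.2']
print(second)  # []
```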

General comment: I still wasn't able to understand if this regression was a mistake or was made on purpose.

There was no informed decision on this regression because the behaviour was neither tested nor documented (if the documented standard had been "all hashable objects can be keys", it would have been easy to conform to it). The problems were much more mundane, i.e. having to inspect the items of the outermost container and determine whether they are keys / Index / Series / MultiIndex / np.ndarray. Over several rounds of review, @jreback then added the requirement to also accept objects that pass is_list_like (which got reverted just before the release), but that would still have broken this use case.

The common ground here is (as we've seen above) to have a well-defined API. If what can be keys is explicitly documented somewhere, it can be kept in mind for whatever code that's being written, and hashability would be one of several valid choices for that.

My main point above was that enforcing code to stay runnable that only works because of an oversight (and there's a bunch of that in pandas, no matter how much we strive for clean code) is way too generous.

@toobaz

Member

commented Jan 29, 2019

But many methods (not just set_index) need to do a lot of gymnastics for wildly varying input signatures

Claim: identifying lists (more generally, containers) requires gymnastics (and, incidentally, detailed docs), while identifying valid keys is extremely simple if their definition is "hashable object". It requires more complicated code, more complicated docs, and more of the user's memory if the definition is more complicated than that.

As in: ExtensionArray (and having even our own types rely on them) is already making (I think) the pandas codebase cleaner, and easier to maintain, precisely because we gave up setting the admissible values.

I know you're not (stupid), but ok, here's another try:

Thanks, I appreciate. You are perfectly right that pandas' definition of "list-like" is a mess, and that's a ground where we desperately need to set standards. Or maybe, we already all agree on standards (iterators are list-like, tuples are not), and there is just annoying legacy code that doesn't follow them (for sure is_list_like doesn't).

I think that's not the case for keys, where things are... simple.

EDIT: sure, list-likes and keys definitions are not so orthogonal as I am putting them. I'm clearly assuming that if something is a list-like, then it is not a key, by definition. So if the user has hashable list-likes, the user will need to know that they are not valid keys (as they are valid containers). But "outlawing" types which are not list-likes is orthogonal.

@TomAugspurger TomAugspurger added this to the 0.24.1 milestone Jan 29, 2019

@TomAugspurger

Contributor

commented Jan 29, 2019

Steering the discussion back to the PR: what needs to be done for 0.24.1? IMO, the changes and tests look good, just need a release note under "fixed regressions". Anything else?

@jorisvandenbossche
Member

left a comment

Looks good to me, just a minor comment, and the whatsnew note that Tom asked for.

else:
    # everything else gets tried as a key; see GH 24969
    try:
        self[col]

@jorisvandenbossche

jorisvandenbossche Jan 29, 2019

Member

I think doing col in self.columns is a bit more efficient? (It doesn't need to construct the Series.)
Although then you also need to handle the False case.

@h-vetinari

h-vetinari Jan 29, 2019

Author Contributor

I tried, but col in self.columns works without hashing, and even if I check

hash(col) and col in self.columns

this does not catch the iter case (an iterator is hashable, who would have thought...)
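The point about iterators being hashable is easy to check. In this sketch a plain dict stands in for self.columns (an assumption; the real object is a pandas Index). Hashing succeeds, the membership test quietly returns False, and the iterator is not even consumed, so only an explicit isinstance check against collections.abc.Iterator tells it apart from a key:

```python
from collections.abc import Iterator

columns = dict.fromkeys(["A", "B"])  # stand-in for self.columns
it = iter(["A", "B"])

print(isinstance(hash(it), int))  # True: iterators are hashable
print(it in columns)              # False: no TypeError, just "not a column"
print(isinstance(it, Iterator))   # True: this check is what actually identifies it
print(list(it))                   # ['A', 'B']: the membership test did not consume it
```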

@jorisvandenbossche

jorisvandenbossche Jan 30, 2019

Member

Why is the hashing needed? We just need to know whether it is a column name or not?
Or you want to raise a different error message?

@jorisvandenbossche

jorisvandenbossche Jan 30, 2019

Member

And it certainly checks hashing for some cases:

In [5]: {1} in pd.Index([1, 2, 3])
...
TypeError: unhashable type: 'set'

(just not an iter because as you said, it is hashable)
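Catching that TypeError is one way to turn an unhashable key into a clearer error, while collecting names that are simply absent for a KeyError. A sketch under stated assumptions: check_keys and its messages are hypothetical, a dict stands in for self.columns, and only the key branch (not the array branch) is shown:

```python
def check_keys(keys, columns):
    """Validate column keys against `columns` (hypothetical helper)."""
    missing = []
    for col in keys:
        try:
            # dict membership hashes `col`, so unhashable objects raise TypeError
            found = col in columns
        except TypeError:
            raise TypeError(
                f"column keys must be hashable, received {type(col).__name__}"
            ) from None
        if not found:
            missing.append(col)
    if missing:
        raise KeyError(f"None of {missing} are in the columns")
    return keys


columns = dict.fromkeys(["A", "B", "C"])  # stand-in for DataFrame.columns
print(check_keys(["A", "C"], columns))  # ['A', 'C']
```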

@jreback jreback added the Reshaping label Jan 29, 2019

@h-vetinari
Contributor Author

left a comment

@h-vetinari
Contributor Author

left a comment

@jreback PTAL

@h-vetinari
Contributor Author

left a comment

Thanks for the review, PTAL

@@ -21,6 +21,7 @@ Other Enhancements

- :meth:`Timestamp.replace` now supports the ``fold`` argument to disambiguate DST transition times (:issue:`25017`)
- :meth:`DataFrame.at_time` and :meth:`Series.at_time` now support :meth:`datetime.time` objects with timezones (:issue:`24043`)
- :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`)

@h-vetinari

h-vetinari Feb 19, 2019

Author Contributor

I updated the OP to clarify, and added this PR as you asked

@h-vetinari

Contributor Author

commented Feb 19, 2019

One of the azure jobs failed with a ResourceWarning (they're back? yay!)

[...]
s...........sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name=0 mode='r' encoding='UTF-8'>

=================================== FAILURES ===================================
______________ test_chunks_have_consistent_numerical_type[python] ______________
[gw0] linux -- Python 3.6.8 /home/vsts/miniconda3/envs/pandas-dev/bin/python

all_parsers = <pandas.tests.io.parser.conftest.PythonParser object at 0x7f37938b8b00>

    def test_chunks_have_consistent_numerical_type(all_parsers):
        parser = all_parsers
        integers = [str(i) for i in range(499999)]
        data = "a\n" + "\n".join(integers + ["1.0", "2.0"] + integers)
    
        # Coercions should work without warnings.
        with tm.assert_produces_warning(None):
>           result = parser.read_csv(StringIO(data))

pandas/tests/io/parser/test_common.py:1078: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <contextlib._GeneratorContextManager object at 0x7f37472facc0>
type = None, value = None, traceback = None

    def __exit__(self, type, value, traceback):
        if type is None:
            try:
>               next(self.gen)
E               AssertionError: Caused unexpected warning(s): [('ResourceWarning', ResourceWarning('unclosed <ssl.SSLSocket [closed] fd=18, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>',), '/home/vsts/work/1/s/pandas/io/parsers.py', 2854)].

Could someone please restart it? @jreback @TomAugspurger @jorisvandenbossche @gfyoung

@h-vetinari

Contributor Author

commented Feb 19, 2019

Thanks to whoever restarted the job. Unfortunately, it seems to be a non-transient break (which is good, at least from the point of hunting ResourceWarnings, but bad for what else I should be accomplishing today).

My operating assumption is that a new boto release broke something, but unfortunately I can't see the version in the build script. The conda list call comes after the environment is installed.

@h-vetinari

Contributor Author

commented Feb 19, 2019

Added the conda list call in #25377 and merged it in here.

@h-vetinari

Contributor Author

commented Feb 19, 2019

Never mind, this is passing again. Seems those ResourceWarnings will have to wait for another day.

@jorisvandenbossche

Member

commented Feb 20, 2019

Since it is passing again, can you then remove the conda list changes here?

@h-vetinari

Contributor Author

commented Feb 20, 2019

@jorisvandenbossche
Feel free to revert the last commit yourself, otherwise I'll do it tonight (Europe).

@@ -23,6 +23,9 @@ set +v
source activate pandas-dev
set -v

# Display pandas-dev environment (for debugging)

@jreback

jreback Feb 20, 2019

Contributor

yeah pls revert all of this

@h-vetinari

Contributor Author

commented Feb 20, 2019

@jreback @jorisvandenbossche
This should hopefully be done now.

@@ -22,6 +22,7 @@ Other Enhancements
- Indexing of ``DataFrame`` and ``Series`` now accepts zerodim ``np.ndarray`` (:issue:`24919`)
- :meth:`Timestamp.replace` now supports the ``fold`` argument to disambiguate DST transition times (:issue:`25017`)
- :meth:`DataFrame.at_time` and :meth:`Series.at_time` now support :meth:`datetime.time` objects with timezones (:issue:`24043`)
- :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
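The whatsnew condition ("provided their output is of the same length as the calling frame") could be enforced along these lines. This is a sketch, not the merged code: materialize and its error message are hypothetical, and n_rows stands in for the length of the calling frame:

```python
from collections.abc import Iterator


def materialize(col, n_rows):
    """Consume an iterator exactly once and check its length (hypothetical helper)."""
    if isinstance(col, Iterator):
        values = list(col)  # an iterator can only be read once, so keep the result
        if len(values) != n_rows:
            raise ValueError(
                f"Length mismatch: expected {n_rows} rows, "
                f"received an iterator yielding {len(values)} values"
            )
        return values
    # Keys and other array-likes are passed through unchanged.
    return col


print(materialize(map(str, [1, 2, 3]), 3))  # ['1', '2', '3']
```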

@jreback

jreback Feb 21, 2019

Contributor

This is just a code reorg, yes? With slightly better error messages in some cases?

@h-vetinari

h-vetinari Feb 21, 2019

Author Contributor

@jreback:
Minimal reorg, just better errors and enabling iterators (due to your review).

The custom classes were re-enabled by #25085 (which took over the tests from this PR), which closed the regression #24969, and has a corresponding whatsnew note. I guess the issue didn't get closed yet, because I only noted that #25085 was an alternative to this PR (at the time, for solving #24969), but didn't add the "closes #24969" explicitly - sorry.

@h-vetinari

Contributor Author

commented Feb 24, 2019

@jreback
Can we push this one over the finish line? (new commit was just due to whatsnew conflict)

@jreback jreback merged commit 5ae9b48 into pandas-dev:master Feb 24, 2019

11 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
pandas-dev.pandas Build #20190224.3 succeeded
pandas-dev.pandas (Checks_and_doc) Checks_and_doc succeeded
pandas-dev.pandas (Linux py27_locale_slow_old_np) Linux py27_locale_slow_old_np succeeded
pandas-dev.pandas (Linux py27_np_120) Linux py27_np_120 succeeded
pandas-dev.pandas (Linux py36_locale_slow) Linux py36_locale_slow succeeded
pandas-dev.pandas (Linux py37_locale) Linux py37_locale succeeded
pandas-dev.pandas (Linux py37_np_dev) Linux py37_np_dev succeeded
pandas-dev.pandas (Windows py27_np121) Windows py27_np121 succeeded
pandas-dev.pandas (Windows py36_np14) Windows py36_np14 succeeded
pandas-dev.pandas (macOS py35_np_120) macOS py35_np_120 succeeded

@h-vetinari h-vetinari deleted the h-vetinari:set_index_custom branch Feb 24, 2019

@h-vetinari h-vetinari referenced this pull request Feb 24, 2019

Closed

DEPR/API: disallow lists within list for set_index #24697

5 of 5 tasks complete

thoo added a commit to thoo/pandas that referenced this pull request Feb 28, 2019

Merge remote-tracking branch 'upstream/master' into Rt05
* upstream/master:
  DOC: CategoricalIndex doc string (pandas-dev#24852)
  CI: add __init__.py to isort skip list (pandas-dev#25455)
  TST: numpy RuntimeWarning with Series.round() (pandas-dev#25432)
  DOC: fixed geo accessor example in extending.rst (pandas-dev#25420)
  BUG: fixed merging with empty frame containing an Int64 column (pandas-dev#25183) (pandas-dev#25289)
  TST: remove never-used singleton fixtures (pandas-dev#24885)
  PERF/REF: improve performance of Series.searchsorted, PandasArray.searchsorted, collect functionality (pandas-dev#22034)
  BUG: Indexing with UTC offset string no longer ignored (pandas-dev#25263)
  API/ERR: allow iterators in df.set_index & improve errors (pandas-dev#24984)
  DOC: Rewriting of ParserError doc + minor spacing (pandas-dev#25421)
  ENH: Add in sort keyword to DatetimeIndex.union (pandas-dev#25110)
  ERR: doc update for ParsingError (pandas-dev#25414)
  BUG: Fix type coercion in read_json orient='table' (pandas-dev#21345) (pandas-dev#25219)
  DEP: add pytest-mock to environment.yml (pandas-dev#25417)
  Correct a typo of version number for interpolate() (pandas-dev#25418)
  Mark test_pct_max_many_rows as high memory (pandas-dev#25400)
  DOC: Edited docstring of Interval (pandas-dev#25410)

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

the-nose-knows added a commit to the-nose-knows/pandas that referenced this pull request Mar 9, 2019

upstream sync (#1)
* ERR/TST: Add pytest idiom to dtypes/test_cast.py (pandas-dev#24847)

* fix MacPython pandas-wheels failue (pandas-dev#24851)

* DEPS: Bump pyarrow min version to 0.9.0 (pandas-dev#24854)

Closes pandas-devgh-24767

* DOC: Document AttributeError for accessor (pandas-dev#24855)

Closes pandas-dev#20579

* Start whatsnew for 0.24.1 and 0.25.0 (pandas-dev#24848)

* DEPR/API: Non-ns precision in Index constructors (pandas-dev#24806)

* BUG: Format mismatch doesn't coerce to NaT (pandas-dev#24815)

* BUG: Properly parse unicode usecols names in CSV (pandas-dev#24856)

* CLN: fix typo in asv eval.Query suite (pandas-dev#24865)

* BUG: DataFrame respects dtype with masked recarray (pandas-dev#24874)

* REF/CLN: Move private method (pandas-dev#24875)

* BUG: ValueError in case of NaN value in groupby columns (pandas-dev#24850)

* BUG: fix floating precision formatting in presence of inf (pandas-dev#24863)

* DOC: Creating top-level user guide section, and moving pages inside (pandas-dev#24677)

* DOC: Creating top-level development section, and moving pages inside (pandas-dev#24691)

* DOC: Creating top-level getting started section, and moving pages inside (pandas-dev#24678)

* DOC: Implementing redirect system, and adding user_guide redirects (pandas-dev#24715)

* DOC: Implementing redirect system, and adding user_guide redirects

* Using relative urls for the redirect

* Validating that no file is overwritten by a redirect

* Adding redirects for getting started and development sections

* DOC: fixups (pandas-dev#24888)

* Fixed heading on whatnew
* Remove empty scalars.rst

* CLN: fix typo in ctors.SeriesDtypesConstructors setup (pandas-dev#24894)

* DOC: No clean in sphinx_build (pandas-dev#24902)

Closes pandas-dev#24727

* BUG (output formatting): use fixed width for truncation column instead of inferring from last column (pandas-dev#24905)

* DOC: also redirect old whatsnew url (pandas-dev#24906)

* Revert BUG-24212 fix usage of Index.take in pd.merge (pandas-dev#24904)

* Revert BUG-24212 fix usage of Index.take in pd.merge

xref pandas-dev#24733
xref pandas-dev#24897

* test 0.23.4 output

* added note about buggy test

* DOC: Add experimental note to DatetimeArray and TimedeltaArray (pandas-dev#24882)

* DOC: Add experimental note to DatetimeArray and TimedeltaArray

* Disable M8 in nanops (pandas-dev#24907)

* Disable M8 in nanops

Closes pandas-dev#24752

* CLN: fix typo in asv benchmark of non_unique_sorted, which was not sorted (pandas-dev#24917)

* API/VIS: remove misc plotting methods from plot accessor (revert pandas-dev#23811) (pandas-dev#24912)

* DOC: some 0.24.0 whatsnew clean-up (pandas-dev#24911)

* DOC: Final reorganization of documentation pages (pandas-dev#24890)

* DOC: Final reorganization of documentation pages

* Move ecosystem to top level

* DOC: Adding redirects to API moved pages (pandas-dev#24909)

* DOC: Adding redirects to API moved pages

* DOC: Making home page links more compact and clearer (pandas-dev#24928)

* DOC: 0.24 release date (pandas-dev#24930)

* DOC: Adding version to the whatsnew section in the home page (pandas-dev#24929)

* API: Remove IntervalArray from top-level (pandas-dev#24926)

* RLS: 0.24.0

* DEV: Start 0.25 cycle

* DOC: State that we support scalars in to_numeric (pandas-dev#24944)

We support it and test it already.

xref pandas-devgh-24910.

* DOC: Minor what's new fix (pandas-dev#24933)

* TST: GH#23922 Add missing match params to pytest.raises (pandas-dev#24937)

* Add tests for NaT when performing dt.to_period (pandas-dev#24921)

* DOC: switch headline whatsnew to 0.25 (pandas-dev#24941)

* BUG-24212 fix regression in pandas-dev#24897 (pandas-dev#24916)

* CLN: reduce overhead in setup for categoricals benchmarks in asv (pandas-dev#24913)

* Excel Reader Refactor - Base Class Introduction (pandas-dev#24829)

* TST/REF: Add pytest idiom to test_numeric.py (pandas-dev#24946)

* BLD: silence npy_no_deprecated warnings with numpy>=1.16.0 (pandas-dev#24864)

* CLN: Refactor cython to use memory views (pandas-dev#24932)

* DOC: Clean sort_values and sort_index docstrings (pandas-dev#24843)

* STY: use pytest.raises context syntax (indexing) (pandas-dev#24960)

* Fixed itertuples usage in to_dict (pandas-dev#24965)

* Fixed itertuples usage in to_dict

Closes pandas-dev#24940
Closes pandas-dev#24939

* STY: use pytest.raises context manager (resample) (pandas-dev#24977)

* DOC: Document breaking change to read_csv (pandas-dev#24989)

* DEPR: Fixed warning for implicit registration (pandas-dev#24964)

* STY: use pytest.raises context manager (indexes/datetimes) (pandas-dev#24995)

* DOC: move whatsnew note of pandas-dev#24916 (pandas-dev#24999)

* BUG: Fix broken links (pandas-dev#25002)

The previous location of contributing.rst file was
/doc/source/contributing.rst but has been moved to
/doc/source/development/contributing.rst

* fix for BUG: grouping with tz-aware: Values falls after last bin (pandas-dev#24973)

* REGR: Preserve order by default in Index.difference (pandas-dev#24967)

Closes pandas-dev#24959

* CLN: do not use .repeat asv setting for storing benchmark data (pandas-dev#25015)

* CLN: isort asv_bench/benchmark/algorithms.py (pandas-dev#24958)

* fix+test to_timedelta('NaT', box=False) (pandas-dev#24961)

* PERF: significant speedup in sparse init and ops by using numpy in check_integrity (pandas-dev#24985)

* BUG: Fixed merging on tz-aware (pandas-dev#25033)

* Test nested PandasArray (pandas-dev#24993)

* DOC: fix error in documentation pandas-dev#24981 (pandas-dev#25038)

* BUG: support dtypes in column_dtypes for to_records() (pandas-dev#24895)

* Makes example from docstring work (pandas-dev#25035)

* CLN: typo fixups (pandas-dev#25028)

* BUG: to_datetime(strs, utc=True) used previous UTC offset (pandas-dev#25020)

* BUG: Better handle larger numbers in to_numeric (pandas-dev#24956)

* BUG: Better handle larger numbers in to_numeric

* Warn about lossiness when passing really large
numbers that exceed (u)int64 ranges.

* Coerce negative numbers to float when requested
instead of crashing and returning object.

* Consistently parse numbers as integers / floats,
even if we know that the resulting container has
to be float. This is to ensure consistent error
behavior when inputs numbers are too large.

Closes pandas-devgh-24910.

* MAINT: Address comments

* BUG: avoid usage in_qtconsole for recent IPython versions (pandas-dev#25039)

* Drop IPython<4.0 compat

* Revert "Drop IPython<4.0 compat"

This reverts commit 0cb0452.

* update
* whatsnew

* REGR: fix read_sql delegation for queries on MySQL/pymysql (pandas-dev#25024)

* DOC: Start 0.24.2.rst (pandas-dev#25026)

[ci skip]

* REGR: rename_axis with None should remove axis name (pandas-dev#25069)

* clarified the documentation for DF.drop_duplicates (pandas-dev#25056)

* Clarification in docstring of Series.value_counts (pandas-dev#25062)

* ENH: Support fold argument in Timestamp.replace (pandas-dev#25046)

* CLN: to_pickle internals (pandas-dev#25044)

* Implement+Test Tick.__rtruediv__ (pandas-dev#24832)

* API: change Index set ops sort=True -> sort=None (pandas-dev#25063)

* BUG: to_clipboard text truncated for Python 3 on Windows for UTF-16 text (pandas-dev#25040)

* PERF: use new to_records() argument in to_stata() (pandas-dev#25045)

* DOC: Cleanup 0.24.1 whatsnew (pandas-dev#25084)

* Fix quotes position in pandas.core, typos and misspelled parameters. (pandas-dev#25093)

* CLN: Remove sentinel_factory() in favor of object() (pandas-dev#25074)

* TST: remove DST transition scenarios from tc pandas-dev#24689 (pandas-dev#24736)

* BLD: remove spellcheck from Makefile (pandas-dev#25111)

* DOC: small clean-up of 0.24.1 whatsnew (pandas-dev#25096)

* DOC: small doc fix to Series.repeat (pandas-dev#25115)

* TST: tests for categorical apply (pandas-dev#25095)

* CLN: use dtype in constructor (pandas-dev#25098)

* DOC: frame.py doctest fixing (pandas-dev#25097)

* DOC: 0.24.1 release (pandas-dev#25125)

[ci skip]

* Revert set_index inspection/error handling for 0.24.1 (pandas-dev#25085)

* DOC: Minor what's new fix (pandas-dev#24933)

* Backport PR pandas-dev#24916: BUG-24212 fix regression in pandas-dev#24897 (pandas-dev#24951)

* Revert "Backport PR pandas-dev#24916: BUG-24212 fix regression in pandas-dev#24897 (pandas-dev#24951)"

This reverts commit 84056c5.

* DOC/CLN: Timezone section in timeseries.rst (pandas-dev#24825)

* DOC: Improve timezone documentation in timeseries.rst

* edit some of the examples

* Address review

* DOC: Fix validation type error RT04 (pandas-dev#25107) (pandas-dev#25129)

* Reading a HDF5 created in py2 (pandas-dev#25058)

* BUG: Fixing regression in DataFrame.all and DataFrame.any with bool_only=True (pandas-dev#25102)

* Removal of return variable names (pandas-dev#25123)

* DOC: Improve docstring of Series.mul (pandas-dev#25136)

* TST/REF: collect DataFrame reduction tests (pandas-dev#24914)

* Fix validation error type `SS05` and check in CI  (pandas-dev#25133)

* Fixed tuple to List Conversion in Dataframe class (pandas-dev#25089)

* STY: use pytest.raises context manager (indexes/multi) (pandas-dev#25175)

* DOC: Updates to Timestamp document (pandas-dev#25163)

* BLD: pin cython language level to '2' (pandas-dev#25145)

Not explicitly pinning the language level has been producing future
warnings from cython.  The next release of cython is going to change
the default level to '3str' under which the pandas cython extensions
do not compile.

The long term solution is to update the cython files to the next
language level, but this is a stop-gap to keep pandas building.

* CLN: Use ABCs in set_index (pandas-dev#25128)

* DOC: update docstring for series.nunique (pandas-dev#25116)

* DEPR: remove PanelGroupBy, disable DataFrame.to_panel (pandas-dev#25047)

* BUG: DataFrame.merge(suffixes=) does not respect None (pandas-dev#24819)

* fix MacPython pandas-wheels failure (pandas-dev#25186)

* modernize compat imports (pandas-dev#25192)

* TST: follow-up to Test nested pandas array pandas-dev#24993 (pandas-dev#25155)

* revert changes to tests in pandas-devgh-24993

* Test nested PandasArray

* isort test_numpy.py

* change NP_VERSION_INFO

* use LooseVersion

* add _np_version_under1p16

* remove blank line from merge master

* add docstrings to fixtures

* DOC/CLN: Fix errors in Series docstrings (pandas-dev#24945)

* REF: Add more pytest idiom to test_holiday.py (pandas-dev#25204)

* DOC: Fix validation type error SA05 (pandas-dev#25208)

Create check for SA05 errors in CI

* BUG: Fix Series.is_unique with single occurrence of NaN (pandas-dev#25182)

* REF: Remove many Panel tests (pandas-dev#25191)

* DOC: Fixes to docstrings and add PR10 (space before colon) to validation (pandas-dev#25109)

* DOC: exclude autogenerated c/cpp/html files from 'trailing whitespace' checks (pandas-dev#24549)

* STY: use pytest.raises context manager (indexes/period) (pandas-dev#25199)

* fix ci failures (pandas-dev#25225)

* DEPR: remove tm.makePanel and all usages (pandas-dev#25231)

* DEPR: Remove Panel-specific parts of io.pytables (pandas-dev#25233)

* DEPR: Add Deprecated warning for timedelta with passed units M and Y  (pandas-dev#23264)

* BUG-25061 fix printing indices with NaNs (pandas-dev#25202)

* BUG: Fix regression in DataFrame.apply causing RecursionError (pandas-dev#25230)

* BUG: Fix regression in DataFrame.apply causing RecursionError

* Add feedback from PR

* Add feedback after further code review

* Add feedback after further code review 2

* BUG: Fix read_json orient='table' without index (pandas-dev#25170) (pandas-dev#25171)

* BLD: prevent asv from calling sys.stdin.close() by using different launch method (pandas-dev#25237)

* (Closes pandas-dev#25029) Removed extra bracket from cheatsheet code example. (pandas-dev#25032)

* CLN: For loops, boolean conditions, misc. (pandas-dev#25206)

* Refactor groupby group_add from tempita to fused types (pandas-dev#24954)

* CLN: Remove ipython 2.x compat (pandas-dev#25150)

* CLN: Remove ipython 2.x compat

* trivial change to trigger asv

* Update v0.25.0.rst

* revert whatsnew

* BUG: Duplicated returns boolean dataframe (pandas-dev#25234)

* REF/TST: resample/test_base.py (pandas-dev#25262)

* Revert "BLD: prevent asv from calling sys.stdin.close() by using different launch method (pandas-dev#25237)" (pandas-dev#25253)

This reverts commit f67b7fd.

* BUG: pandas Timestamp tz_localize and tz_convert do not preserve `freq` attribute (pandas-dev#25247)

* DEPR: remove assert_panel_equal (pandas-dev#25238)

* PR04 errors fix (pandas-dev#25157)

* Split Excel IO Into Sub-Directory (pandas-dev#25153)

* API: Ensure DatetimeTZDtype standardizes pytz timezones (pandas-dev#25254)

* API: Ensure DatetimeTZDtype standardizes pytz timezones

* Add whatsnew

* BUG: Fix exceptions when Series.interpolate's `order` parameter is missing or invalid (pandas-dev#25246)

* BUG: raise accurate exception from Series.interpolate (pandas-dev#24014)

* Actually validate `order` before use in spline

* Remove unnecessary check and dead code

* Clean up comparison/tests based on feedback

* Include invalid order value in exception

* Check for NaN order in spline validation

* Add whatsnew entry for bug fix

* CLN: Make unit tests assert one error at a time

* CLN: break test into distinct test case

* PEP8 fix in test module

* CLN: Test fixture for interpolate methods

* BUG: DataFrame.join on tz-aware DatetimeIndex (pandas-dev#25260)

* REF: use _constructor and ABCFoo to avoid runtime imports (pandas-dev#25272)

* Refactor groupby group_prod, group_var, group_mean, group_ohlc (pandas-dev#25249)

* Fix typo in Cheat sheet with regex (pandas-dev#25215)

* Edit parameter type in pandas.core.frame.py DataFrame.count (pandas-dev#25198)

* TST/CLN: remove test_slice_ints_with_floats_raises (pandas-dev#25277)

* Removed Panel class from HDF ASVs (pandas-dev#25281)

* DOC: Fix minor typo in docstring (pandas-dev#25285)

* DOC/CLN: Fix errors in DataFrame docstrings (pandas-dev#24952)

* Skipped broken Py2 / Windows test (pandas-dev#25323)

* Rt05 documentation error fix issue 25108 (pandas-dev#25309)

* Fix typos in docs (pandas-dev#25305)

* Doc: corrects spelling in generic.py (pandas-dev#25333)

* BUG: groupby.transform retains timezone information (pandas-dev#25264)

* Fixes Formatting Exception (pandas-dev#25088)

* Bug: OverflowError in resample.agg with tz data (pandas-dev#25297)

* DOC/CLN: Fix various docstring errors (pandas-dev#25295)

* COMPAT: alias .to_numpy() for timestamp and timedelta scalars (pandas-dev#25142)

* ENH: Support times with timezones in at_time (pandas-dev#25280)

* BUG: Fix passing of numeric_only argument for categorical reduce (pandas-dev#25304)

* TST: use a fixed seed to have the same uniques across python versions (pandas-dev#25346)

TST: add pytest-mock to handle mocker fixture

* TST: xfail excel styler tests, xref GH25351 (pandas-dev#25352)

* TST: xfail excel styler tests, xref GH25351

* CI: cleanup .c files for cpplint>1.4

* DOC: Correct doc mistake in combiner func (pandas-dev#25360)

Closes pandas-devgh-25359.

* DOC/BLD: fix --no-api option (pandas-dev#25209)

* DOC: modify typos in Contributing section (pandas-dev#25365)

* Remove spurious MultiIndex creation in `_set_axis_name` (pandas-dev#25371)

* Resolves pandas-dev#25370
* Introduced by pandas-dev#22969

* pandas-dev#23049: test for Fatal Stack Overflow stemming From Misuse of astype('category') (pandas-dev#25366)

* 9236: test for the DataFrame.groupby with MultiIndex having pd.NaT (pandas-dev#25310)

* [BUG] exception handling of MultiIndex.__contains__ too narrow (pandas-dev#25268)

* 14873: test for groupby.agg coercing booleans (pandas-dev#25327)

* BUG/ENH: Timestamp.strptime (pandas-dev#25124)

* BUG: constructor Timestamp.strptime() does not support %z.

* Add doc string to NaT and Timestamp

* updated the error message

* Updated whatsnew entry.

* Interval dtype fix (pandas-dev#25338)

* [CLN] Excel Module Cleanups (pandas-dev#25275)

Closes pandas-devgh-25153

Authored-By: tdamsma <tdamsma@gmail.com>

* ENH: indexing and __getitem__ of dataframe and series accept zerodim integer np.array as int (pandas-dev#24924)

* REGR: fix TimedeltaIndex sum and datetime subtraction with NaT (pandas-dev#25282, pandas-dev#25317) (pandas-dev#25329)

* edited whatsnew typo (pandas-dev#25381)

* fix typo of see also in DataFrame stat funcs (pandas-dev#25388)

* API: more consistent error message for MultiIndex.from_arrays (pandas-dev#25189)

* CLN: (re-)enable infer_dtype to catch complex (pandas-dev#25382)

* DOC: Edited docstring of Interval (pandas-dev#25410)

The docstring contained a repeated segment, which I removed.

* Mark test_pct_max_many_rows as high memory (pandas-dev#25400)

Fixes issue pandas-dev#25384

* Correct a typo of version number for interpolate() (pandas-dev#25418)

* DEP: add pytest-mock to environment.yml (pandas-dev#25417)

* BUG: Fix type coercion in read_json orient='table' (pandas-dev#21345) (pandas-dev#25219)

* ERR: doc update for ParsingError (pandas-dev#25414)

Closes pandas-devgh-22881

* ENH: Add in sort keyword to DatetimeIndex.union (pandas-dev#25110)

* DOC: Rewriting of ParserError doc + minor spacing (pandas-dev#25421)

Follow-up to pandas-devgh-25414.

* API/ERR: allow iterators in df.set_index & improve errors (pandas-dev#24984)

* BUG: Indexing with UTC offset string no longer ignored (pandas-dev#25263)

* PERF/REF: improve performance of Series.searchsorted, PandasArray.searchsorted, collect functionality (pandas-dev#22034)

* TST: remove never-used singleton fixtures (pandas-dev#24885)

* BUG: fixed merging with empty frame containing an Int64 column (pandas-dev#25183) (pandas-dev#25289)

* DOC: fixed geo accessor example in extending.rst (pandas-dev#25420)

I realised "lon" and "lat" had just been switched with "longitude" and "latitude" in the following code block. So I used those names here as well.

* TST: numpy RuntimeWarning with Series.round() (pandas-dev#25432)

* CI: add __init__.py to isort skip list (pandas-dev#25455)

* DOC: CategoricalIndex doc string (pandas-dev#24852)

* DataFrame.drop Raises KeyError definition (pandas-dev#25474)

* BUG: Keep column level name in resample nunique (pandas-dev#25469)

Closes pandas-devgh-23222

xref pandas-devgh-23645

* ERR: Correct error message in to_datetime (pandas-dev#25467)

* ERR: Correct error message in to_datetime

Closes pandas-devgh-23830

xref pandas-devgh-23969

* Fix minor typo (pandas-dev#25458)

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* CI: Set pytest minversion to 4.0.2 (pandas-dev#25402)

* CI: Set pytest minversion to 4.0.2

* STY: use pytest.raises context manager (indexes) (pandas-dev#25447)

* STY: use pytest.raises context manager (tests/test_*) (pandas-dev#25452)

* STY: use pytest.raises context manager (tests/test_*)

* fix ci failures

* skip py2 ci failure

* Fix minor error in dynamic load function (pandas-dev#25256)

* Cythonized GroupBy Quantile (pandas-dev#20405)

* BUG: Fix regression on DataFrame.replace for regex (pandas-dev#25266)

* BUG: Fix regression on DataFrame.replace for regex

The commit ensures that the replacement for regex is not confined to the 
beginning of the string but spans all the characters within. The 
behaviour is then consistent with versions prior to 0.24.0.

One test has been added to account for character replacement when the 
character is not at the beginning of the string.

* Correct contribution guide docbuild instruction (pandas-dev#25479)

* TST/REF: Add pytest idiom to test_frequencies.py (pandas-dev#25430)

* BUG: Fix index type casting in read_json with orient='table' and float index (pandas-dev#25433) (pandas-dev#25434)

* BUG: Groupby.agg with reduction function with tz aware data (pandas-dev#25308)

* BUG: Groupby.agg cannot reduce with tz aware data

* Handle output always as UTC

* Add whatsnew

* isort and add another fixed groupby.first/last issue

* bring condition at a higher level

* Add try for _try_cast

* Add comments

* Don't pass the utc_dtype explicitly

* Remove unused import

* Use string dtype instead

* DOC: Fix docstring for read_sql_table (pandas-dev#25465)

* ENH: Add Series.str.casefold (pandas-dev#25419)

* Fix PR10 error and Clean up docstrings from functions related to RT05 errors (pandas-dev#25132)

* Fix unreliable test (pandas-dev#25496)

* DOC: Clarifying doc/make.py --single parameter (pandas-dev#25482)

* fix MacPython / pandas-wheels ci failures (pandas-dev#25505)

* DOC: Reword Series.interpolate docstring for clarity (pandas-dev#25491)

* Changed insertion order to sys.path (pandas-dev#25486)

* TST: xfail non-writeable pytables tests with numpy 1.16x (pandas-dev#25517)

* STY: use pytest.raises context manager (arithmetic, arrays, computati… (pandas-dev#25504)

* BUG: Fix RecursionError during IntervalTree construction (pandas-dev#25498)

* STY: use pytest.raises context manager (plotting, reductions, scalar...) (pandas-dev#25483)

* STY: use pytest.raises context manager (plotting, reductions, scalar...)

* revert removed testing in test_timedelta.py

* remove TODO from test_frame.py

* skip py2 ci failure

* BUG: Fix potential segfault after pd.Categorical(pd.Series(...), categories=...) (pandas-dev#25368)

* Make DataFrame.to_html output full content (pandas-dev#24841)

* BUG-16807-1 SparseFrame fills with default_fill_value if data is None (pandas-dev#24842)

Closes pandas-devgh-16807.

* DOC: Add conda uninstall pandas to contributing guide (pandas-dev#25490)

* fix pandas-dev#25487 add modify documentation

* fix segfault when running with cython coverage enabled, xref cython#2879 (pandas-dev#25529)

* TST: inline empty_frame = DataFrame({}) fixture (pandas-dev#24886)

* DOC: Polishing typos out of doc/source/user_guide/indexing.rst (pandas-dev#25528)

* STY: use pytest.raises context manager (frame) (pandas-dev#25516)

* DOC: Fix pandas-dev#24268 by updating description for keep in Series.nlargest (pandas-dev#25358)

* DOC: Fix pandas-dev#24268 by updating description for keep

* fix MacPython / pandas-wheels ci failures (pandas-dev#25537)

* TST/CLN: Remove more Panel tests (pandas-dev#25550)

* BUG: caught TypeError in series.at (pandas-dev#25506) (pandas-dev#25533)

* ENH: Add errors parameter to DataFrame.rename (pandas-dev#25535)

* ENH: GH13473 Add errors parameter to DataFrame.rename

* TST: Skip IntervalTree construction overflow test on 32bit (pandas-dev#25558)

* DOC: Small fixes to 0.24.2 whatsnew (pandas-dev#25559)

* minor typo error (pandas-dev#25574)

* BUG: in error message raised when invalid axis parameter (pandas-dev#25553)

* BLD: Fixed pip install with no numpy (pandas-dev#25568)

* Document the behavior of `axis=None` with `style.background_gradient` (pandas-dev#25551)

* fix minor typos in dsintro.rst (pandas-dev#25579)

* BUG: Handle readonly arrays in period_array (pandas-dev#25556)

* BUG: Handle readonly arrays in period_array

Closes pandas-dev#25403

* DOC: Fix typo in tz_localize (pandas-dev#25598)

* BUG: secondary y axis could not be set to log scale (pandas-dev#25545) (pandas-dev#25586)

* TST: add test for groupby on list of empty list (pandas-dev#25589)

* TYPING: Small fixes to make stubgen happy (pandas-dev#25576)

* CLN: Parameterize test cases (pandas-dev#25355)

hksonngan pushed a commit to hksonngan/pandas that referenced this pull request Mar 12, 2019

hksonngan pushed a commit to hksonngan/pandas that referenced this pull request Mar 12, 2019

alimcmaster1 added a commit to alimcmaster1/pandas that referenced this pull request Jun 3, 2019

7 participants