Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: dataframe setitem error on size incompatible assignment #41193

Closed

Conversation

Mxchaeltrxn
Copy link

@Mxchaeltrxn Mxchaeltrxn commented Apr 28, 2021

I've labeled the error something silly but that can be fixed later.

This pull request replaces this older pr #41008

I've put all my tests inside one file for easier review.

Issues

I'm unsure how to stop this test from failing test_pivot_table_doctest_case Could you help with this?

This test (test_invalid_colormap) also fails but when I rerun it individually, it passes ...

I also had an issue with flake8 and black not working together ... in my most recent commit so I just committed it with --no-verify.

@Mxchaeltrxn Mxchaeltrxn changed the title FIX: dataframe setitem error on size incompatible assignment BUG: dataframe setitem error on size incompatible assignment Apr 28, 2021
@Mxchaeltrxn
Copy link
Author

I may have worked something out so don't review this yet.

@Mxchaeltrxn
Copy link
Author

I've gotten test_pivot_table_doctest_case to pass.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't add a new test file. rather somewhere in tests/frame/indexing

@@ -1233,6 +1233,12 @@ def value_getitem(placement):
blk = self.blocks[blkno]
blk_locs = blklocs[val_locs.indexer]
if blk.should_store(value):
if (
value.shape[0] != 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably better to have .should_store return False in this case and have it hit the other path? or is that incorreect

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand you correctly, I think it should stay as it is because the below test case fails:

def test_setitem_size_incompatible_ndarray2(arr):
    data = DataFrame(
        [[1, "A", 1.0], [2, "B", 2.0], [3, "C", 3.0], [4, "D", 4.0]],
        columns=["A", "A", "B"],
    )
    msg = "Errored123"
    with pytest.raises(ValueError, match=msg):
        # data is of shape (4,2) but we are assigning it to shape (4,4)
        data["A"] = np.random.randn(4, 4)

It fails because it goes down the other path (where blk.should_store is false) and no error is thrown despite assigning the column(s) to data of a different shape. I haven't handled this yet because I wanted to make sure the logic was ok so far (sorry I should have mentioned this earlier).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that .should_store is probably not the best description of what it does, (based on my understanding of the method).

@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Apr 30, 2021
@Mxchaeltrxn
Copy link
Author

@jreback Thanks for the review. I will move the tests and rename them later. How do I make sure that I'm not duplicating tests ...?

@jreback
Copy link
Contributor

jreback commented May 12, 2021

@jreback Thanks for the review. I will move the tests and rename them later. How do I make sure that I'm not duplicating tests ...?

IIUC, you are essentially narrowing a case, so either try to find the tests that exercise that part of the code (might) be hard, or simply add in the logical place; by-definition you won't be duplicating.

@simonjayhawkins simonjayhawkins added this to the 1.3 milestone May 25, 2021
@jreback jreback removed this from the 1.3 milestone May 26, 2021
@Mxchaeltrxn
Copy link
Author

@jreback Sorry I've been a bit busy. I'll return to this on the weekend when I have some time.

@Mxchaeltrxn
Copy link
Author

@jreback Hey so I've been trying to work on this but I've been getting an error. Do you have any ideas?
Steps to reproduce:

  1. git fetch upstream
  2. git merge upstream/master
  3. conda env update -n pandas-dev --file environment.yml --prune
  4. python setup.py build_ext -j 4

I get this strange error:

  File "setup.py", line 249
    f"{extension}-source file '{sourcefile}' not found.\n"
                                                         ^
SyntaxError: invalid syntax

I'm not really sure why this is happening when it wasn't before. I'm using the new M1 chip on the mac if that matters.

Also, if I get this issue in the future, where should I be putting it? I don't think should be raising a Github issue for development build errors.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 30, 2021
@Mxchaeltrxn
Copy link
Author

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

I will have a look and see if I can get this repository running in the next few days.

@Mxchaeltrxn
Copy link
Author

Closing as I do not want to work on this anymore.

@Mxchaeltrxn Mxchaeltrxn closed this Aug 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Unexpected behavior when assigning multi-dimensional array to DataFrame column
3 participants