pd.Series.ffill() raise the error: AttributeError: 'numpy.ndarray' object has no attribute 'ffill' #4577

Closed
VasilijKolomiets opened this issue Jun 15, 2022 · 5 comments · Fixed by #4588
Assignees
Labels
bug 🦗 Something isn't working

Comments

@VasilijKolomiets

VasilijKolomiets commented Jun 15, 2022

System information

  • OS Platform and Distribution: Windows 10
  • Modin version: 0.15.0+7.g4ec7f634
  • Python version: 3.9.12
  • Code we can use to reproduce:
import modin
import modin.pandas as pd
from distributed import Client

import numpy as np


if __name__ == '__main__':
    if pd.__name__ == 'modin.pandas':
        client = Client(n_workers=3)
        print(modin.__version__)

    df = pd.DataFrame(
        dict(
            a=[1, 2, None, None, None, ],
            b=(1, None, 3, 4, 5,),
        )
    )

    df.a.ffill(inplace=True)
    print(df)

    df['tr_id'] = 0

    df.tr_id = np.where(
        (df.b <= 4),
        3,
        None,
    )

    df.tr_id.ffill(inplace=True)

Describe the problem

I have the output:

0.15.0+7.g4ec7f634
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:

    import ray
    ray.init()

UserWarning: Distributing <class 'dict'> object. This may take some time.
     a    b
0  1.0  1.0
1  2.0  NaN
2  2.0  3.0
3  2.0  4.0
4  2.0  5.0
Traceback (most recent call last):
  File "d:\OD\OneDrive\Projects\Chud_Amaz\Soft_in_dev\moduled_way_OOP\modin_test_ffill.py", line 33, in <module>
    df.tr_id.ffill(inplace=True)
AttributeError: 'numpy.ndarray' object has no attribute 'ffill'

Problem

When a column is filled with the result of np.where through attribute assignment (df.tr_id = ...), the ffill method does not work on it.

@mvashishtha
Collaborator

@VasilijKolomiets thank you for reporting this issue! I can reproduce it at Modin version 4ec7f63:

import modin.pandas as pd

df = pd.DataFrame([[1]], columns=['col0'])
df.col0 = [3]
print(f'The entire dataframe: {df}')
print(f'df.col0: {df.col0}')
print(f'df["col0"]: {df["col0"]}')
print(df.col0.ffill)

In Modin, I get AttributeError: 'list' object has no attribute 'ffill', but in pandas, the code runs without error. It seems that when we assign a list-like value that is not a series to a Modin dataframe using __setattr__, we assign the item to the dataframe, but we don't reset the attribute of the frame. In the snippet above, df.col0 gets the assigned list of [3] instead of the new column value, which is pd.Series([3], name = 'col0').

If I can see a quick fix, I'll assign this issue to myself and make the fix.

@mvashishtha mvashishtha added the bug 🦗 Something isn't working label Jun 15, 2022
@mvashishtha mvashishtha self-assigned this Jun 15, 2022
@mvashishtha
Collaborator

In the snippet I posted above, the Modin dataframe's __setattr__ calls __setitem__ to modify the col0 column in place. It then calls object.__setattr__(self, key, value), which re-assigns the col0 property to exactly the value that was passed in, i.e. the list. I think a fairly simple fix would be to call object.__setattr__(self, key, self.__getitem__(key)) in the case here where we call __setitem__.
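To make the mechanism described above concrete, here is a minimal, self-contained sketch in plain Python. `ToyColumn`, `ToyFrame`, `BuggyFrame`, and `FixedFrame` are hypothetical stand-ins invented for illustration. They are not Modin's classes and only mimic the `__setattr__` / `__setitem__` interaction, so treat this as a rough model rather than the actual implementation or the actual fix.

```python
class ToyColumn:
    """Hypothetical column type with a forward-fill method."""

    def __init__(self, values):
        self.values = list(values)

    def ffill(self):
        # Replace each None with the last non-None value seen so far.
        filled, last = [], None
        for v in self.values:
            if v is not None:
                last = v
            filled.append(last)
        return ToyColumn(filled)


class ToyFrame:
    """Hypothetical frame: columns live in an internal dict."""

    def __init__(self, columns):
        # Bypass the subclass __setattr__ while creating internal storage.
        object.__setattr__(self, "_columns", dict(columns))

    def __getitem__(self, key):
        return ToyColumn(self._columns[key])

    def __setitem__(self, key, value):
        self._columns[key] = list(value)

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails.
        columns = object.__getattribute__(self, "_columns")
        if key in columns:
            return self[key]
        raise AttributeError(key)


class BuggyFrame(ToyFrame):
    def __setattr__(self, key, value):
        if key in self._columns:
            self[key] = value
        # Stores the raw right-hand side, which now shadows the column.
        object.__setattr__(self, key, value)


class FixedFrame(ToyFrame):
    def __setattr__(self, key, value):
        if key in self._columns:
            self[key] = value
            # Point the attribute at the column object, not the raw value.
            value = self[key]
        object.__setattr__(self, key, value)


buggy = BuggyFrame({"col0": [1]})
buggy.col0 = [3]           # attribute now holds a plain list
fixed = FixedFrame({"col0": [1]})
fixed.col0 = [3, None]     # attribute now holds a ToyColumn
```

In this model `buggy.col0` is the plain list that was assigned, so calling `.ffill` on it fails in the same way as in the report, while `buggy["col0"]` still returns a proper column. `fixed.col0` returns a column object on which `ffill` is available.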

I can't take this on right now, so I'll leave it unassigned.

@mvashishtha mvashishtha removed their assignment Jun 15, 2022
@pyrito pyrito self-assigned this Jun 15, 2022
@VasilijKolomiets
Author

VasilijKolomiets commented Jun 16, 2022

In the snippet I posted above, the Modin dataframe's __setattr__ calls __setitem__ to modify the col0 column in place. It then calls object.__setattr__(self, key, value), which re-assigns the col0 property to exactly the value that was passed in, i.e. the list. I think a fairly simple fix would be to call object.__setattr__(self, key, self.__getitem__(key)) in the case here where we call __setitem__.

I can't take this on right now, so I'll leave it unassigned.

So what do I have to do to get the same result as in 'pure pandas'?
Maybe wrap the resulting NumPy array in a Series on the right-hand side?

Or just wait a bit? )

@mvashishtha
Collaborator

Maybe wrap the resulting NumPy array in a Series on the right-hand side?

@VasilijKolomiets unfortunately it turns out that this won't work in Modin either 😢 Modin assigns the right-hand side as-is to the attribute instead of pointing the attribute to the new column.

import modin.pandas as pd

df = pd.DataFrame([[1]], columns=['col0'])
df.col0 = pd.Series([3])
df.iloc[0, 0] = 4
# BUG: col0 is unchanged!!!
assert df.col0.equals(df['col0'])

What you can do instead is use __setitem__ instead of __setattr__, e.g. df['tr_id'] = np.where(... instead of df.tr_id = np.where(.... This works:

import modin.pandas as pd

df = pd.DataFrame([[1]], columns=['col0'])
df['col0'] = pd.Series([3])
df.iloc[0, 0] = 4
assert df.col0.equals(df['col0'])
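Applied to the reproduction script from the original report, the same workaround could look like the sketch below. It uses plain pandas so that it runs without a Modin installation. Swapping the import for `modin.pandas` is an assumption based on the comment above and has not been verified here. The sketch also reassigns the result of `ffill()` instead of using `inplace=True`, which is a stylistic choice and not part of the original report.

```python
import numpy as np
import pandas as pd  # assumption: `import modin.pandas as pd` behaves the same here

df = pd.DataFrame(
    {
        "a": [1, 2, None, None, None],
        "b": [1, None, 3, 4, 5],
    }
)

# Bracket assignment goes through __setitem__, so the frame stores a real column
# and no raw ndarray ends up shadowing it as an attribute.
df["tr_id"] = np.where(df["b"] <= 4, 3, None)

# Forward-fill the None entries and write the result back into the frame.
df["tr_id"] = df["tr_id"].ffill()
print(df)
```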

Meanwhile, @pyrito will work on a PR that should fix all the bugs identified here.

@VasilijKolomiets
Author

Thanks a lot!
.
'Be happy, do not worry' (c)

pyrito pushed a commit to pyrito/modin that referenced this issue Jun 21, 2022
…alue

Signed-off-by: Karthik Velayutham <vkarthik@ponder.io>
RehanSD pushed a commit that referenced this issue Jun 22, 2022
Signed-off-by: Karthik Velayutham <vkarthik@ponder.io>
RehanSD pushed a commit that referenced this issue Jun 24, 2022
Signed-off-by: Karthik Velayutham <vkarthik@ponder.io>