Fix md.concat error when there are identical fetch chunk data
#3285
Conversation
What's the root cause of this issue?

Like this example:

```python
import mars
import mars.dataframe as md
import mars.tensor as mt

mars.new_session()

data = {"A": [i for i in range(10)]}
df0 = md.DataFrame(data)
df1 = df0[['A']]
df2 = df0[['A']]
df1 = df1.execute()
df2 = df2.execute()
df3 = md.concat([df1, df2], axis=1)
df3.execute()
```

There will be one subtask that has 4 nodes, as follows. (subtask graph image omitted) In an earlier version we introduced a new fusion algorithm that puts two chunk data branches into one subtask, and this causes the problem.
This does not actually
Why do these two chunks have the same index? I think there may be some bugs in tiling.
@qinxuye @wjsi Can we just use a random id instead of the tokenized id? Currently, one stage may contain multiple chunks with the same key in different subtasks.
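To illustrate the trade-off being discussed, here is a minimal sketch (not Mars's actual implementation; `tokenize_key` and `random_id` are hypothetical stand-ins) showing why a key derived by tokenizing an operand's properties collides for identical operands, while a random id never does:

```python
import hashlib
import uuid

def tokenize_key(op_type, params):
    # Deterministic: the same op type and properties always yield the same key.
    token = repr((op_type, sorted(params.items()))).encode()
    return hashlib.md5(token).hexdigest()

def random_id():
    # Non-deterministic: every call yields a fresh identifier.
    return uuid.uuid4().hex

# Two operands with identical properties, as in df0[['A']] built twice:
k1 = tokenize_key("DataFrameIndex", {"col": "A"})
k2 = tokenize_key("DataFrameIndex", {"col": "A"})
assert k1 == k2                      # same properties -> same key
assert random_id() != random_id()    # random ids stay distinct
```

Deterministic keys enable reuse and deduplication of equivalent computations, which is exactly why two chunks in one stage can end up sharing a key.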
Id is generated by |
Duplicate ids and keys may cause unexpected problems in a distributed system. How does Mars optimize computation using the tokenized key?
I guess just regenerating the id is OK; the mechanism for the key should be kept.
For this issue, maybe regenerating the id is OK. But tokenizing keys costs CPU and may introduce bugs into Mars, so I want to check whether the optimization actually works.
Keys could be the same if the tileable, chunk or op has the same properties, while ids should be different, in my opinion.
Yes, but the meta and store management use key as the lookup key, so different subtasks may overwrite each other's meta and store data. Also, there are hundreds of
My suggestion:
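A toy illustration of the overwrite hazard described above, assuming a plain dict-backed meta store (hypothetical, not Mars's actual storage API):

```python
# A dict acting as a meta store keyed by chunk key (illustration only).
meta_store = {}

# Two chunks from different subtasks happen to share the same tokenized key.
meta_store["abc123"] = {"subtask": "s1", "band": "worker-0"}
meta_store["abc123"] = {"subtask": "s2", "band": "worker-1"}

# The first record is silently replaced: lookups now only see subtask s2.
assert len(meta_store) == 1
assert meta_store["abc123"]["subtask"] == "s2"
```

With a random id (or a per-subtask qualifier) the two records would coexist instead of clobbering each other.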
@fyrestone You can try generating keys randomly and see if everything works well.
LGTM
LGTM
What do these changes do?
A GraphContainsCycleError is raised when concatenating two DataFrames that contain the same fetch chunks.
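A minimal sketch (not Mars's actual graph code; node names are hypothetical) of how merging two graph nodes that share a key can turn an acyclic chunk graph into a cyclic one, which is the kind of structure that triggers such an error:

```python
def has_cycle(edges):
    """DFS cycle check; edges maps node -> list of successor nodes."""
    visiting, done = set(), set()

    def dfs(n):
        if n in done:
            return False
        if n in visiting:
            return True  # back edge found: cycle
        visiting.add(n)
        if any(dfs(m) for m in edges.get(n, [])):
            return True
        visiting.remove(n)
        done.add(n)
        return False

    return any(dfs(n) for n in edges)

# Before fusing: c1 -> x -> c2 is a valid DAG.
assert not has_cycle({"c1": ["x"], "x": ["c2"]})
# After fusing c1 and c2 (same key) into one node "c": c -> x -> c cycles.
assert has_cycle({"c": ["x"], "x": ["c"]})
```

This is why deduplicating chunks purely by key is unsafe when the same key appears both upstream and downstream of another node.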
Related issue number
Fixes #3284
Check code requirements