When changing the number of embeddings to 4091 and mch_size to 1021 in the code below, it throws the following exception:
ValueError: ShardedTensor global_size property does not match from different ranks! Found global_size=torch.Size([3070]) on rank:0, and global_size=torch.Size([3068]) on rank:1.
Traceback (most recent call last):
File "test2.py", line 143, in <module>
spmd_sharing_simulation(ShardingType.ROW_WISE)
File "test2.py", line 139, in spmd_sharing_simulation
assert 0 == p.exitcode
AssertionError
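For context, here is a minimal sketch of the kind of configuration that triggers this. The actual test2.py is not shown in this thread, so the table/feature names, embedding_dim, and the exact MCHManagedCollisionModule constructor arguments are assumptions and may differ across torchrec versions:

```python
# Hypothetical repro sketch (not the actual test2.py).
import torch
from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.modules.mc_modules import (
    DistanceLFU_EvictionPolicy,
    ManagedCollisionCollection,
    MCHManagedCollisionModule,
)

NUM_EMBEDDINGS = 4091  # not divisible by world_size=2 -> uneven row-wise shards
MCH_SIZE = 1021

table_config = EmbeddingBagConfig(
    name="table_0",            # illustrative name
    embedding_dim=64,          # illustrative dim
    num_embeddings=NUM_EMBEDDINGS,
    feature_names=["feature_0"],
)

mc_modules = {
    "table_0": MCHManagedCollisionModule(
        zch_size=NUM_EMBEDDINGS,
        mch_size=MCH_SIZE,     # reserved collision region, per the issue text;
                               # argument names follow the hyper-parameters the
                               # reporter mentions and may not match every release
        device=torch.device("meta"),
        eviction_interval=1,
        eviction_policy=DistanceLFU_EvictionPolicy(),
    ),
}

mc_collection = ManagedCollisionCollection(
    managed_collision_modules=mc_modules,
    embedding_configs=[table_config],
)
```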
Hi, thanks for trying out ManagedCollisionCollection!
Not sure if it's a bug. The thing is, for now we only support ManagedCollisionCollection with row-wise sharding, which shards the table evenly across all the GPUs; hence the table size needs to be divisible by the number of GPUs.
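If that is the root cause, the arithmetic bears it out. A quick sketch (how each rank extrapolates the global size is my guess from the error message, not confirmed from the torchrec source):

```python
# Back-of-the-envelope check of the divisibility constraint.
num_embeddings, world_size = 4091, 2

# Row-wise sharding chunks rows ceil-first, so 4091 rows over 2 ranks
# gives uneven shards: rank 0 holds 2046 rows, rank 1 holds 2045.
chunk = -(-num_embeddings // world_size)  # ceil(4091 / 2) = 2046
shard_sizes = [
    min(chunk, num_embeddings - rank * chunk) for rank in range(world_size)
]
print(shard_sizes)  # [2046, 2045]

# Each rank then derives a different global ZCH size from its own shard,
# which is consistent with rank 0 reporting 3070 and rank 1 reporting 3068.
# Choosing sizes divisible by world_size (e.g. num_embeddings=4096,
# mch_size=1024) keeps every rank's view of the table identical.
```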
Thanks for your quick response. Yes, I tried ManagedCollisionCollection on our data, and performance degraded when using it. Training time also increased significantly. Is there any guideline or documentation on how to set the hyper-parameters for this module, e.g., eviction_interval, zch_size, and mch_size, and on which policy, DistanceLFU_EvictionPolicy or LFU_EvictionPolicy, is better under which scenario?