ARK v0.1.0 Known Bugs & Issues #35

Closed
Tracked by #61
chhwang opened this issue Jul 17, 2023 · 0 comments


chhwang commented Jul 17, 2023

    • Executor::tensor_memcpy_host_to_device() causes an unknown error if the tensor on the host is not sequential in memory. We need more checks on the host tensor, or maybe a Python wrapper for this (Improve Python interfaces #48)
    • Sometimes, if the tensor is padded, the allgather operation may overwrite the recv tensor, and the allreduce tensor will also be incorrect. (@chhwang: send/recv now checks contiguity)
    • The current layernorm and softmax operations are scheduled in a rather hacky way and may need further updates in the future. (Minor updates #59)
    • Layernorm needs a recv dependency at its output (@chhwang: it already has one)
    • [ ] Support both source and destination offsets in NetIbQp::stage_send() (moved to the next version)
    • When using python -m unittest discover -s . -p "test_*.py" to run all unit tests, the sendrecv test fails, but when we run the tests separately there is no problem. It seems that in some cases the previous runtime context is not destroyed when one unit test finishes and the next one starts. This problem also exists in the current main branch. (@chhwang: this is the test code's issue, won't fix for now)
    • [ ] Offsets of importing/exporting tensors are not properly handled (moved to the next version)
    • The float matmul error rate seems too high, but it's unclear whether this is ARK's issue or a test code issue (@chhwang: this is not an issue)
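
For the host-to-device copy and padded-tensor items above, a host-side check along these lines might help. This is only a sketch in plain NumPy: the helper name as_sequential is hypothetical and is not part of ARK's API, and it assumes "sequential" means C-contiguous in memory. It shows how a view into a padded buffer is non-contiguous and how it can be packed before being handed to a copy routine.

```python
import numpy as np


def as_sequential(host_array: np.ndarray) -> np.ndarray:
    """Return a C-contiguous array with the same contents as host_array.

    Hypothetical helper (not an ARK function): it returns the input
    unchanged when it is already contiguous, and a packed copy otherwise.
    """
    if host_array.flags["C_CONTIGUOUS"]:
        return host_array
    return np.ascontiguousarray(host_array)


# A 4x8 buffer of which only the first 6 columns hold data (2 columns of padding).
padded = np.arange(32, dtype=np.float32).reshape(4, 8)
view = padded[:, :6]          # strided view: rows are not adjacent in memory
packed = as_sequential(view)  # packed copy that is safe to treat as one flat block

print(view.flags["C_CONTIGUOUS"], packed.flags["C_CONTIGUOUS"])
```

Whether the copy should be made silently or the call should raise on non-contiguous input is a design choice; raising is stricter, while copying is more convenient for a Python wrapper.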
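
The unittest discovery item above describes state leaking from one test into the next. One possible way to isolate tests is sketched below, assuming the runtime context can be destroyed explicitly. FakeRuntime is a hypothetical stand-in for a process-wide context and does not reflect ARK's actual classes; the point is only that unittest's addCleanup runs after every test, including failing ones.

```python
import unittest


class FakeRuntime:
    """Hypothetical process-wide runtime context (not ARK's API)."""

    active = None  # at most one live context per process

    def __init__(self):
        if FakeRuntime.active is not None:
            raise RuntimeError("previous runtime context was not destroyed")
        FakeRuntime.active = self

    def destroy(self):
        FakeRuntime.active = None


class SendRecvTest(unittest.TestCase):
    def setUp(self):
        self.runtime = FakeRuntime()
        # Registered cleanups run after each test, even when the test fails.
        self.addCleanup(self.runtime.destroy)

    def test_first(self):
        self.assertIs(FakeRuntime.active, self.runtime)

    def test_second(self):
        # Would raise in setUp if test_first had leaked its context.
        self.assertIs(FakeRuntime.active, self.runtime)


def run_suite():
    """Run both tests in one process, as unittest discovery would."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SendRecvTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_suite() runs both tests in the same interpreter, which is the situation where a leaked context would show up.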
chhwang changed the title from "Known Issues" to "Known Bugs & Issues" on Jul 17, 2023
chhwang changed the title from "Known Bugs & Issues" to "ARK v0.1.0 Known Bugs & Issues" on Aug 15, 2023
chhwang closed this as completed on Sep 5, 2023