Add Function to Automatically Transform Input To Use The Smallest dtype #6
Comments
The optimal solution (in terms of dtype size) would be to use np.unique. However, a simpler solution that just tests on |
I would agree with this. I think we just need to show some documentation or a good example that demonstrates how much of a waste it is to have an array like:
And one should convert this to integers between |
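import numpy as np
from time import time
from pydivsufsort import divsufsort
n = 1_000_000
# Random bytes stored as uint8
random_string = np.random.randint(255, size=n, dtype=np.uint8)
d = time()
divsufsort(random_string)
print(time() - d)
# Same values, stored as uint64
random_string = random_string.astype(np.uint64)
d = time()
divsufsort(random_string)
print(time() - d)
# Set one element to the largest uint64 value (assigning -1 raises OverflowError on NumPy 2)
random_string[0] = np.iinfo(np.uint64).max
d = time()
divsufsort(random_string)
print(time() - d)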
On such a string, |
Sorry, GitHub is currently experiencing problems and I deleted your double comment. It looks like both are gone now.
I can't tell from this comment if you think this is too fast or slow? IIRC, |
120 ms is comparable to the cost of divsufsort on a string with the same dtype. Hence, in most cases the user would pay a non-negligible cost that cannot be avoided: there is no way to make np.unique faster or to use it without sorting. The current version includes an optimization that works in most cases and doesn't cost much. Why did you reopen the issue? To remember to put that in an example?
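For readers following along, a cheap check of this kind could look roughly like the sketch below. This is an assumption about the general idea (test the maximum, then pick the narrowest unsigned dtype that holds it), not the code pydivsufsort actually ships; `smallest_unsigned_dtype` is a hypothetical name.

```python
import numpy as np

def smallest_unsigned_dtype(arr):
    """Hypothetical helper: cast arr to the narrowest unsigned dtype that holds arr.max()."""
    arr = np.asarray(arr)
    if arr.size == 0:
        return arr.astype(np.uint8)
    if arr.min() < 0:
        raise ValueError("expected non-negative integers")
    max_value = int(arr.max())
    # Try the candidate dtypes from narrowest to widest.
    for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
        if max_value <= np.iinfo(dtype).max:
            # copy=False avoids a copy when arr already has this dtype.
            return arr.astype(dtype, copy=False)
    raise ValueError("values do not fit in uint64")
```

This only needs one linear pass for the minimum and one for the maximum, with no sorting, which is why it is cheap. It cannot help when a single large value is present, as in the third timing above.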
I could add a parameter that runs np.unique, but calling it directly takes the user the same amount of code, and I want the interface to be simple.
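As an illustration of what "just doing it" might look like on the user side, here is a minimal sketch. It assumes a 1-D integer array; `remap_to_dense_alphabet` is a hypothetical name and is not part of pydivsufsort.

```python
import numpy as np

def remap_to_dense_alphabet(arr):
    """Hypothetical helper: replace each value by its rank among the distinct values."""
    arr = np.asarray(arr).ravel()
    # np.unique sorts the distinct values; `codes` holds each element's index into them.
    alphabet, codes = np.unique(arr, return_inverse=True)
    codes = codes.reshape(arr.shape)
    largest_code = len(alphabet) - 1
    for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
        if largest_code <= np.iinfo(dtype).max:
            return codes.astype(dtype), alphabet
    raise ValueError("alphabet too large")  # not reachable for in-memory arrays
```

Because the ranks preserve the relative order of the original values, the suffix order of `codes` matches that of the input, and `alphabet[codes]` recovers the original array. The cost is the sort inside np.unique, which is the overhead discussed above.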
Cool. I'm with you and agree that a simple interface is preferred. Sorry for re-opening. I couldn't comment without re-opening. For issues that aren't closed by an actual commit, I tend to prefer to leave them open for at least 14 days (2 weeks) to allow for discussion and additional thoughts/considerations. After 2 weeks, I usually close and make a comment like "feel free to re-open..." Not trying to dictate here, but this approach might be useful to avoid constantly opening and closing issues just to be able to comment. It would reduce the number of email notifications.
According to #3
It would be nice to add a function to handle this automatically behind the scenes