Basic usage example? #1
Comments
Hi @aalexandersson, thanks for following this! I need to modify the version numbering, because the package is currently only viable for my specific use case and needs more of the basic functionality of the original implementation to get to that point. I have a larger update incoming that refactors some of the code significantly, and it will include a readme. I am also resolving some questions about the methodology with the original authors. Specifically, deduplication is not implemented yet because I don't understand the intuition behind the deduplication in fastLink (see this issue).
Ok, it sounds very promising! 👍 The About your implementation of deduplication, is there a reason why you need to port the
I guess it's less intuition than an actual bug. I need to read the dissertation, but the code starting here doesn't deduplicate the data as intended. I think the intention was to tapply over the ids of dfB for dfA, but it instead tapplies over the ids of dfA against a merge with dfA, and vice versa for dfB. Because the duplicate row ids are removed before dfB is deduplicated, it never actually runs a dedupe on dfB. Thanks for linking Ben Fifield's dissertation; it is helpful to know what the intended behavior was. I will write up a function based on that intended solution.
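To make the intended behavior concrete, here is a minimal sketch in Python of a two-pass deduplication over matched pairs. It is not fastLink's code and not this package's API: the `Pair` type and the `best_per_key` and `dedupe_matches` names are hypothetical, and keeping the highest-posterior pair is an assumed tie-breaking rule. The point it illustrates is that the second pass groups by the dfB id and runs on the output of the first pass, so duplicates on the dfB side are actually removed.

```python
from typing import Callable, Iterable, List, NamedTuple


class Pair(NamedTuple):
    """Hypothetical matched pair; field names are illustrative only."""
    idx_a: int        # row index in dfA
    idx_b: int        # row index in dfB
    posterior: float  # estimated match probability for this pair


def best_per_key(pairs: Iterable[Pair], key: Callable[[Pair], int]) -> List[Pair]:
    """For each distinct key(pair), keep the pair with the highest posterior."""
    best = {}
    for p in pairs:
        k = key(p)
        if k not in best or p.posterior > best[k].posterior:
            best[k] = p
    return list(best.values())


def dedupe_matches(pairs: Iterable[Pair]) -> List[Pair]:
    """Greedy one-to-one deduplication (assumed rule: highest posterior wins)."""
    # Pass 1: each dfA row keeps only its best dfB candidate.
    by_a = best_per_key(pairs, key=lambda p: p.idx_a)
    # Pass 2: group the survivors by dfB row, so dfB is deduplicated as well.
    by_b = best_per_key(by_a, key=lambda p: p.idx_b)
    return sorted(by_b)


if __name__ == "__main__":
    candidates = [
        Pair(0, 0, 0.90),
        Pair(0, 1, 0.60),  # dfA row 0 matched twice
        Pair(1, 0, 0.80),  # dfB row 0 matched twice
        Pair(2, 2, 0.95),
    ]
    for pair in dedupe_matches(candidates):
        print(pair)
```

This is a greedy rule rather than an optimal assignment, so it can drop a dfA row entirely when its best candidate loses in the second pass (dfA row 1 above).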
Oh, that would explain it. I have experienced a few duplicates in dfB several times on real but confidential data despite running the default Therefore, here are my guesses: 1) You are correct, 2) It would help the
Yes, that sounds very reasonable 💯. Implement something that works and is flexible. It can always be improved later.
Hello, I have been looking over this for a couple of days and can't figure out how to get the matches from the two passed-in df's based on the structures returned by
@Westat-Transportation this functionality was added as of 0.0.7. Let me know if the function is what you were looking for.
Can you please add a basic usage example? Is it already a "minimum viable package" worth registering since it is v0.1.x?
I am very excited to learn about your package, and to possibly switch to it one day from fastLink (R) and Splink (Python). I am not a good Julia programmer though. It would be easier to try your package as a user if it has a basic usage example and if it is registered. Best wishes!