feat: Improve recurring transaction detection. #1641
base: main
Conversation
server/recurring/amount_test.go
```go
bandwidth := SilvermansRuleOfThumb(data)

bandwidths := make([]float64, 0)
for i := 500; i < 5000; i += 10 {
	bandwidths = append(bandwidths, float64(i))
}
```
Should also append the bandwidth from Silverman's rule of thumb here too, and then sort the array.

We generate several candidate bandwidths starting at $5.00 and going up in 10-cent increments, but including the Silverman's estimate might make this even more accurate, since it is not constrained to that grid. Alternatively, we might be able to tune our own increment to 50 cents or even $1 instead.
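A minimal sketch of that suggestion, assuming `SilvermansRuleOfThumb` returns its bandwidth on the same cent scale as the fixed grid (and using `sort` from the standard library):

```go
bandwidths := make([]float64, 0, (5000-500)/10+1)
for i := 500; i < 5000; i += 10 {
	bandwidths = append(bandwidths, float64(i))
}
// Include the data-driven estimate so the candidates are not limited to
// the fixed $5.00-and-up grid, then sort to keep the slice ordered.
bandwidths = append(bandwidths, SilvermansRuleOfThumb(data))
sort.Float64s(bandwidths)
```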
TODO: Need to implement some kind of data smoothing. I'm thinking Gaussian smoothing might be the best fit for the dataset, since it's also reasonable to implement in Go:

```go
import "math"

// gaussianKernel builds a normalized 1-D Gaussian kernel of the given size.
func gaussianKernel(size int, sigma float64) []float64 {
	kernel := make([]float64, size)
	sum := 0.0
	m := size / 2
	for i := 0; i < size; i++ {
		diff := float64(i - m)
		kernel[i] = math.Exp(-(diff * diff) / (2 * sigma * sigma))
		sum += kernel[i]
	}
	// Normalize the kernel so the weights sum to 1.
	for i := range kernel {
		kernel[i] /= sum
	}
	return kernel
}

// gaussianSmooth convolves data with a Gaussian kernel, renormalizing at
// the edges where the kernel extends past the ends of the slice.
func gaussianSmooth(data []float64, sigma float64) []float64 {
	size := int(sigma * 6) // a common choice for kernel size
	if size%2 == 0 {
		size++ // ensure kernel size is odd
	}
	kernel := gaussianKernel(size, sigma)
	halfSize := size / 2
	smoothedData := make([]float64, len(data))
	for i := range data {
		var weightedSum float64
		var weightSum float64
		for j := -halfSize; j <= halfSize; j++ {
			if i+j >= 0 && i+j < len(data) {
				weight := kernel[halfSize+j]
				weightedSum += data[i+j] * weight
				weightSum += weight
			}
		}
		smoothedData[i] = weightedSum / weightSum
	}
	return smoothedData
}
```

This should also improve peak detection. Need to experiment with various sigma values, or should the sigma be determined by the bandwidth value?
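A rough usage sketch (the density slice and sigma here are made up for illustration; how sigma should relate to the KDE bandwidth is the open question above):

```go
// Hypothetical example: smooth a toy density curve before looking for peaks.
density := []float64{0.05, 0.10, 0.40, 0.90, 0.45, 0.12, 0.08, 0.30, 0.10}
smoothed := gaussianSmooth(density, 1.0)
fmt.Println(smoothed)
```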
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #1641      +/-   ##
==========================================
- Coverage   51.05%   51.04%   -0.01%
==========================================
  Files         321      321
  Lines       17403    17403
  Branches      438      438
==========================================
- Hits         8885     8884       -1
- Misses       8032     8033       +1
  Partials      486      486
```

☔ View full report in Codecov by Sentry.
Recurring transactions are complicated. I want to try to isolate specific amounts within a dataset of known similar transactions; that way I can determine which transactions are most likely to be recurring, but I want to narrow this down to be more accurate. Some similar transactions might actually be two different subscriptions, or there may be other patterns. This is really just throwing stuff at the wall and seeing what sticks.
This code is an experiment in isolating specific amounts within a known set of similar transactions. The goal is to be able to cherry-pick specific transactions out of a cluttered dataset and identify them as recurring.

An example of this is Amazon: there might be tons of Amazon transactions, but only a handful of them are actually something like an Amazon Prime subscription. So the goal is to isolate the transactions that are part of a subscription, as in the sketch below.
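To make the idea concrete, here is a hypothetical sketch of the peak-detection step (`findPeaks` and the toy density are illustrative, not code from this PR): a density estimate over transaction amounts should show sharp peaks at subscription prices, and transactions whose amounts sit at such a peak are the recurring candidates.

```go
package main

import "fmt"

// findPeaks returns the indices of local maxima in a smoothed density
// curve over amount bins. Each peak marks an amount that occurs far more
// often than its neighbors, e.g. a fixed subscription price.
func findPeaks(density []float64) []int {
	peaks := make([]int, 0)
	for i := 1; i < len(density)-1; i++ {
		if density[i] > density[i-1] && density[i] > density[i+1] {
			peaks = append(peaks, i)
		}
	}
	return peaks
}

func main() {
	// Toy density over amount bins: two bumps could correspond to two
	// distinct recurring charges hiding in one pile of similar transactions.
	density := []float64{0.01, 0.02, 0.08, 0.30, 0.09, 0.02, 0.05, 0.22, 0.06}
	fmt.Println(findPeaks(density)) // [3 7]
}
```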