feat: Improve recurring transaction detection. #1641

elliotcourant · 2023-12-15T16:35:22Z

This code is an experiment in isolating specific amounts within a known set of similar transactions.

The goal is to cherry-pick specific transactions out of a cluttered dataset and identify them as recurring.

Amazon is a good example: there might be tons of Amazon transactions, but only a handful of them are actually something like an "Amazon Prime subscription". The goal is to isolate the transactions that are part of a subscription.

elliotcourant · 2023-12-15T16:38:03Z

server/recurring/amount_test.go

+		bandwidth := SilvermansRuleOfThumb(data)
+
+		bandwidths := make([]float64, 0)
+		for i := 500; i < 5000; i += 10 {
+			bandwidths = append(bandwidths, float64(i))
+		}


Should also append the bandwidth from Silverman's rule of thumb here, and then sort the array.

We generate several bandwidths starting at $5.00 and going up in 10-cent steps, but including the value from Silverman's rule might make this even more accurate, since it has no such constraint. Or we might be able to tune our own increment to be 50 cents or even $1 instead.
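A minimal sketch of that change, assuming amounts are in cents as in the loop above. `candidateBandwidths` is a hypothetical helper name, not something in this PR, and the rule-of-thumb value is passed in rather than computed so the sketch does not depend on how `SilvermansRuleOfThumb` is implemented:

```go
package main

import (
	"fmt"
	"sort"
)

// candidateBandwidths (hypothetical helper) builds the fixed grid of
// bandwidths in [start, stop) with the given step, appends the
// rule-of-thumb bandwidth, and returns the result sorted ascending.
func candidateBandwidths(ruleOfThumb float64, start, stop, step int) []float64 {
	if step <= 0 {
		// Avoid an infinite loop on a bad step; fall back to the rule-of-thumb value only.
		return []float64{ruleOfThumb}
	}

	bandwidths := make([]float64, 0)
	for i := start; i < stop; i += step {
		bandwidths = append(bandwidths, float64(i))
	}

	// The rule-of-thumb value is not constrained to the grid, so sort after appending.
	bandwidths = append(bandwidths, ruleOfThumb)
	sort.Float64s(bandwidths)
	return bandwidths
}

func main() {
	// 1234.5 stands in for the value SilvermansRuleOfThumb(data) would return.
	bandwidths := candidateBandwidths(1234.5, 500, 5000, 10)
	fmt.Println(len(bandwidths), bandwidths[0], bandwidths[len(bandwidths)-1])
}
```

Switching to a 50-cent or $1 increment would then only mean changing the `step` argument to 50 or 100.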

elliotcourant · 2023-12-15T16:40:33Z

TODO

Need to implement some kind of data smoothing. I'm thinking Gaussian smoothing might be the best fit for the dataset, since it's also reasonable to implement in Go:

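// gaussianKernel returns a normalized Gaussian kernel with size weights,
// centered on index size/2. It needs the "math" package and assumes sigma > 0.
func gaussianKernel(size int, sigma float64) []float64 {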
	kernel := make([]float64, size)
	sum := 0.0
	m := size / 2

	for i := 0; i < size; i++ {
		diff := float64(i - m)
		kernel[i] = math.Exp(-(diff * diff) / (2 * sigma * sigma))
		sum += kernel[i]
	}

	// Normalize the kernel
	for i := range kernel {
		kernel[i] /= sum
	}

	return kernel
}

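// gaussianSmooth convolves data with a Gaussian kernel. Near the edges, where
// the kernel extends past the data, the remaining weights are renormalized.
func gaussianSmooth(data []float64, sigma float64) []float64 {
	size := int(sigma * 6) // roughly 3 sigma on each side, a common choice for kernel size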
	if size%2 == 0 {
		size++ // ensure kernel size is odd
	}

	kernel := gaussianKernel(size, sigma)
	halfSize := size / 2
	smoothedData := make([]float64, len(data))

	for i := range data {
		var weightedSum float64
		var weightSum float64

		for j := -halfSize; j <= halfSize; j++ {
			if i+j >= 0 && i+j < len(data) {
				weight := kernel[halfSize+j]
				weightedSum += data[i+j] * weight
				weightSum += weight
			}
		}

		smoothedData[i] = weightedSum / weightSum
	}

	return smoothedData
}

This should also improve peak detection.

Need to experiment with various sigma values. Or should sigma be determined by the bandwidth value?
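For reference, a rough sketch of the kind of peak detection that smoothing would help with. `findPeaks` is a hypothetical helper, not the detection code in this PR. It just reports strict local maxima, which is why noisy input produces spurious peaks and why running `gaussianSmooth` first should help:

```go
package main

import "fmt"

// findPeaks (hypothetical helper) returns the indices of strict local maxima:
// points that are greater than both of their neighbors. The first and last
// points are never reported, and flat plateaus are ignored.
func findPeaks(values []float64) []int {
	peaks := make([]int, 0)
	for i := 1; i < len(values)-1; i++ {
		if values[i] > values[i-1] && values[i] > values[i+1] {
			peaks = append(peaks, i)
		}
	}
	return peaks
}

func main() {
	// Stand-in for the output of gaussianSmooth on a density estimate.
	smoothed := []float64{0, 1, 0, 2, 3, 2, 0}
	fmt.Println(findPeaks(smoothed)) // prints [1 4]
}
```

Counting the peaks returned for a few sigma values on the same dataset would be one simple way to compare them.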

codecov-commenter · 2023-12-15T16:44:43Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (63b5d13) 51.05% compared to head (2b0528c) 51.04%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1641      +/-   ##
==========================================
- Coverage   51.05%   51.04%   -0.01%     
==========================================
  Files         321      321              
  Lines       17403    17403              
  Branches      438      438              
==========================================
- Hits         8885     8884       -1     
- Misses       8032     8033       +1     
  Partials      486      486

Recurring transactions are complicated. I want to try to isolate specific amounts within a dataset of known similar transactions. This way I can determine which transactions are most likely to be recurring, but I want to narrow this down further to be more accurate. Some similar transactions might actually be two subscriptions, or there may be other patterns. This is really just throwing stuff at the wall and seeing what sticks.

elliotcourant self-assigned this Dec 15, 2023

elliotcourant added 3 commits December 15, 2023 16:56

throwing more shit at the wall seeing what sticks

69b8e9e

chore: PRogress

8060ecc

elliotcourant force-pushed the experiment/recurring-amount-isolation branch from 100a2a8 to 8060ecc Compare December 15, 2023 22:56

chore: Throwing more shit at the wall

2b0528c
