The binary outputs to stdout, so you can easily redirect the output to a file
cargo run -- <path> > <output file>
# concrete example
cargo run -- test_assets/single_client/spec.csv > accounts.csv
# with the release flag
cargo run --release -- test_assets/single_client/spec.csv > accounts.csv
This repo can be tested with the default rust toolchain
cargo test
This will run the unit, doc and integration tests.
The benchmarks can be run with
cargo bench
A number of test assets can be found in the test_assets directory.
Rust docs can be built and viewed locally via
cargo doc --open
this is recommended in order explore what is exposed in the crate and examples of how to use it
There are 3 available engines which solve this problem included in this repository - please generate the docs to read more about how each one works in more detail
- A single-threaded synchronous engine (aka
basic
) - A multi-threaded engine (think of a simplified version of Kafka or AWS Kinesis) (aka
stream-like
) - An async task based engine (could be viewed as a simplified actor pattern) (aka
actor-like
)
The default entry-point to the binary exposed by this repository uses the basic (single-threaded)
implementation.
The below benchmarks were generated off the same dataset as the flamegraphs, unfortunately it's too large to commit to GitHub, but contains 5,000,000 CSV rows and approximately 250 unique clients.
Note: the data set used to generate the
below results is different to the one that will be included if you run cargo bench
- which is currently pointing to
test_assets/larger
.
name | flavor | performance |
---|---|---|
basic |
single-threaded-sync |
[4.3701s, 4.3879s, 4.4114s] |
stream-like |
multi-threaded-sync |
[7.3678s, 7.5490s, 7.7421s] |
actor-like |
async-task |
[6.4912s, 6.5779s, 6.6718s] |
The above performance was largely what I expected given the compute performed on each CSV row is minimal - some basic mathematical operations and a few hashmap lookups/inserts/removals.
Although the concurrency based solutions in theory allow multiple rows to be processed at the same time, the overhead associated with the concurrency ie. passing & receiving messages through channels, managing wake-ups etc end up contributing more to the overall runtime than if the data was just processed sequentially.
This is primarily true as the solutions implemented attempt to process the data in as close to "real time" (ie. we process the record as soon as it comes in) as possible.
If eventual consistency was acceptable, it's possible that an epoch based solution might allow for better concurrency and possibly overall perfomance gains - a solution where you would collect all data for an epoch and then resolve it in parallel - this solution would allow you to use other data structures and as such could potentially avoid some of that over head (although they would have their own overhead). This would have to be profiled and tested to be verified though, at this point it is just an idea.
There are flamegraphs available for the 3 implementations - these were generated running over an identical data (same as the benchmarks) set of 5,000,000 CSV rows with approximately 250 unique clients - unfortunately this dataset is too large to upload to GitHub. These can be found in the flamegraph directory or below
My interpretation of this requirement
- In the case of a
Dispute
on aDeposit
. In this scenario it implies that through some mechanism external to the system, the client has forcefully reversed a transactionChargeback
. For example they've asked their card issuer to cancel a deposit before the funds were actually released to the system. This implies that either something went in a manner that the customer really did not expect it to, something went wrong on our end or the client has malious intentions on the system, therefore freezing the account for review makes sense. - In the case of a
Dispute
on aWithdrawal
, this requirement is slightly more vague to me. If we go with the assumption that a withdrawal can only be made to a single/list of approved accounts as is commonly seen on trading platforms the notion of a malicious third party withdrawing funds shouldn't exist - as funds can only be withdrawn to the clients bank account.- In turn this would imply that the only scenario that could be disputed is if the system thought that a
withdrawal had been made, but for some reason the funds hadn't actually been released to the account eg. there was
an error with the client's banks network. From our
systems perspective, the funds have left the system, therefore the client cannot trade with said funds (although
the client believes the funds still exist in the system). Therefore from our system's perspective there is no reason
to apply
Dispute
logic on aWithdrawal
(which would further reduce the client's available funds). Therefore for the sake of this exercise it is NOT possible toDispute
aWithdrawal
.
- In turn this would imply that the only scenario that could be disputed is if the system thought that a
withdrawal had been made, but for some reason the funds hadn't actually been released to the account eg. there was
an error with the client's banks network. From our
systems perspective, the funds have left the system, therefore the client cannot trade with said funds (although
the client believes the funds still exist in the system). Therefore from our system's perspective there is no reason
to apply
- Assuming the state transitions detailed below are valid.
- Key callout: Once a transaction is either resolved, it cannot transition backwards to
Dispute
again.
- Key callout: Once a transaction is either resolved, it cannot transition backwards to
stateDiagram-v2
[*] --> Deposit
[*] --> Withdrawal
Deposit --> Dispute
Dispute --> Resolve
Dispute --> Chargeback
Deposit --> [*]
Withdrawal --> [*]
Resolve --> [*]
Chargeback --> [*]
The rounding strategy that will be used is Bankers Rounding
. Although I don't have much
professional experience working with currency, this seems a reasonable choice given the increased stability it offers.
Ultimately this would be a business decision on how they want to enforce the rounding strategy. For this exercise the
fact that the transaction amounts are in fiat currency (from a quick search a majority of fiat currencies don't go beyond 2 dp, although there
are a few that go to 3 and 4dp), and the requirements specify to carry transactions through to 4dp, makes me believe
this is a safe assumption.
- Can always improve the amount of testing/test coverage, both at a unit and integration level. On a similar note, also improve the assertions which were done very quickly.
- Potentially use a crate such as spectral to enhance how the assertions read
- Improve the error paradigm throughout the application. In a quick project something like
anyhow
oreyre
is okay, however for something that is looking to go into production, I prefer something more robust - ideally without magic strings everywhere.- A tangible example of this would be to have an
ErrorKind
type enum for matching on errors instead of performing string matching
- A tangible example of this would be to have an
- Improve the Telemetry, I've not included a
tracing::subscriber
primarily because I'm not sure if automated tests will look at performance, and any form of telemetry emission will have an effect on that. Similarly I can't log tostdout
given the requirements of how the binary will be invoked.- Having said that, this project does include enough telemetry to get started (the event emission is implemented) however it could probably do some re-visiting and tidying up.
- Use some form of actual data store instead of holding everything in memory
cargo audit
Fetching advisory database from `https://github.com/RustSec/advisory-db.git`
Loaded 421 security advisories (from /Users/naaman/.cargo/advisory-db)
Updating crates.io index
Scanning Cargo.lock for vulnerabilities (140 crate dependencies)
Crate: serde_cbor
Version: 0.11.2
Warning: unmaintained
Title: serde_cbor is unmaintained
Date: 2021-08-15
ID: RUSTSEC-2021-0127
URL: https://rustsec.org/advisories/RUSTSEC-2021-0127
Dependency tree:
serde_cbor 0.11.2
└── criterion 0.3.6
└── transactions 0.1.0
warning: 1 allowed warning found