Replay DB and Global Stats #364

Draft
wants to merge 39 commits into
base: main



@alvarosevilla95 alvarosevilla95 commented Dec 25, 2022

Motivation

This PR revives the original db PR that Vince and I worked on back in the day, and sets up the framework and base implementation for a Global Stats explorer in the app.

General architecture

Database

The core of the database works much like it did in the original PR. We have a SQLite database, and when a replay folder is loaded we cache the basic metadata used by the replay list. This is quite fast, taking just a few seconds to load my 10k replay dataset.

Up to this point, the main benefit of the database is near-instant replay list loading and refreshing (< 1 second on my laptop for 10k games). The database size is also negligible, as each game takes <1% of its original file size.
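As a rough illustration of how the metadata cache can stay in sync with a replay folder, here is a minimal sketch. The names `SyncPlan` and `planFolderSync` are hypothetical and not taken from this PR; the idea is just to compare the files on disk with the paths already cached, so that only new files are parsed and stale rows are removed.

```typescript
// Hypothetical helper, not the PR's actual code.
interface SyncPlan {
  toAdd: string[]; // on disk but not cached yet: parse and insert
  toDelete: string[]; // cached but no longer on disk: remove from the db
}

// Compare the files found in the folder with the paths already in the cache.
function planFolderSync(filesOnDisk: string[], cachedPaths: string[]): SyncPlan {
  const onDisk = new Set(filesOnDisk);
  const cached = new Set(cachedPaths);
  return {
    toAdd: filesOnDisk.filter((p) => !cached.has(p)),
    toDelete: cachedPaths.filter((p) => !onDisk.has(p)),
  };
}
```

With a warm cache both lists are usually short, which would be consistent with the fast refresh times described above.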

Stats cache

The next big thing is stat caching. Due to concerns in the original PR, and with UX in mind, this no longer happens automatically. Instead, the user can choose to enable global stat caching from the replay browser, which triggers a background process that computes stats for all games and stores them in the database. This is a relatively slow process (~20 mins for 10k games on my machine), but it is non-intrusive. It recovers gracefully, so we could halt the process when Dolphin is opened (although, being limited to 1 core, it shouldn't have a noticeable performance impact on most machines).
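The halt-and-resume behaviour can be sketched as follows. This is a simplified, synchronous model under stated assumptions: `StatsStore`, `createMemoryStore` and `cachePendingStats` are hypothetical names, and the in-memory map stands in for the stats table. The real process would run in a worker and write to the database.

```typescript
// Hypothetical stand-in for the stats table.
interface StatsStore {
  pending(): string[]; // files that have no cached stats yet
  save(file: string, stats: unknown): void;
}

// In-memory store used only for illustration.
function createMemoryStore(files: string[]): StatsStore & { cached: Map<string, unknown> } {
  const cached = new Map<string, unknown>();
  return {
    cached,
    pending: () => files.filter((f) => !cached.has(f)),
    save: (file: string, stats: unknown) => {
      cached.set(file, stats);
    },
  };
}

// Process pending files one at a time. Progress is saved per file, so
// stopping early loses no completed work, and the next run only sees
// the files that are still pending.
function cachePendingStats(
  store: StatsStore,
  computeStats: (file: string) => unknown,
  shouldStop: () => boolean,
): number {
  let processed = 0;
  for (const file of store.pending()) {
    if (shouldStop()) break; // e.g. Dolphin was opened
    store.save(file, computeStats(file));
    processed++;
  }
  return processed;
}
```

Calling `cachePendingStats` again after an interruption picks up only the remaining files, because `pending()` is derived from what has already been saved.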

Stats computer

The main reason for caching game stats in the database is to be able to efficiently compute global stats across all games. I'll describe the exact workflow below, but the core idea is that the frontend can send a request to the backend for stats on a collection of files. The backend then opens a cursor on the database and scans all relevant games, reducing them to a global stats object. Notably, we never load all stats into memory, so the system should be resilient to large numbers of games. The calculation is reasonably fast as well, just under 2 seconds on my machine for 9k games.
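The cursor-and-reduce idea can be sketched like this. The `GameStats` and `GlobalStats` shapes are hypothetical and much smaller than the real stats objects, and `fakeCursor` is only a stand-in for a database cursor. The point is that rows are consumed lazily and only the accumulator stays in memory.

```typescript
// Hypothetical per-game row; the cached stats in the PR are richer.
interface GameStats {
  durationFrames: number;
  won: boolean;
  lCancelSuccess: number;
  lCancelFail: number;
}

interface GlobalStats {
  games: number;
  wins: number;
  totalFrames: number;
  lCancelSuccess: number;
  lCancelFail: number;
}

// Fold rows into a single accumulator, one row at a time.
function reduceGlobalStats(rows: Iterable<GameStats>): GlobalStats {
  const acc: GlobalStats = { games: 0, wins: 0, totalFrames: 0, lCancelSuccess: 0, lCancelFail: 0 };
  for (const row of rows) {
    acc.games += 1;
    if (row.won) acc.wins += 1;
    acc.totalFrames += row.durationFrames;
    acc.lCancelSuccess += row.lCancelSuccess;
    acc.lCancelFail += row.lCancelFail;
  }
  return acc;
}

// Stand-in for a database cursor: yields rows lazily instead of
// materializing the whole result set.
function* fakeCursor(n: number): Generator<GameStats> {
  for (let i = 0; i < n; i++) {
    yield { durationFrames: 3600, won: i % 2 === 0, lCancelSuccess: 3, lCancelFail: 1 };
  }
}
```

Because `reduceGlobalStats` accepts any `Iterable`, memory use stays flat however many games are scanned.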

Stats components

Here is the actually flashy stuff. I've chosen recharts as the base for most of these views, which has worked pretty well. There is a new GlobalStats container that can be accessed through a button in ReplayBrowser. At the moment I'm still experimenting a bit with the charts and layouts, but I'm thinking of having 4 main views in this page:

  • General. Shows some general stats (total playtime / games / wins etc) and a quick overview of characters played / against, opponents etc.
  • Progression. Focused on timeseries charts showing evolution of certain metrics across time (l-cancel... %, move/tech/throw distribution...)
  • Analysis. A more focused view that centers on identifying patterns, common followups, option selection...
  • Random. The fun bits, not very developed yet but some ideas include times saved by randall, ledgedash SDs...
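To make the Progression view more concrete, here is a minimal sketch of how a timeseries could be prepared for a chart. The names `GameSample`, `MonthPoint` and `lCancelByMonth` are hypothetical, and the sketch assumes `startTime` is an ISO 8601 string. The output is a plain array of objects, which is the kind of `data` input recharts line charts accept.

```typescript
// Hypothetical input row with just the fields this chart needs.
interface GameSample {
  startTime: string; // assumed ISO 8601, e.g. "2022-01-05T20:15:00Z"
  lCancelSuccess: number;
  lCancelFail: number;
}

interface MonthPoint {
  month: string; // "YYYY-MM"
  lCancelPercent: number;
}

// Bucket games by calendar month and compute the l-cancel success rate per bucket.
function lCancelByMonth(games: GameSample[]): MonthPoint[] {
  const totals = new Map<string, { success: number; fail: number }>();
  for (const g of games) {
    const month = g.startTime.slice(0, 7); // "YYYY-MM" prefix of the timestamp
    const t = totals.get(month) ?? { success: 0, fail: 0 };
    t.success += g.lCancelSuccess;
    t.fail += g.lCancelFail;
    totals.set(month, t);
  }
  return Array.from(totals.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // chronological order
    .map(([month, t]) => {
      const attempts = t.success + t.fail;
      return { month, lCancelPercent: attempts === 0 ? 0 : (100 * t.success) / attempts };
    });
}
```

Summing successes and failures per bucket, rather than averaging per-game percentages, keeps games with few l-cancel attempts from skewing the monthly value.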

User workflow

The user will open the Slippi Launcher as always. On startup, the system automatically loads the default replay folder and sets up the db cache if necessary. So far the experience is identical to before, but with the performance improvements of the database on the replay file list.

On the replay browser, there is a new button to "Enable Global Stats". When clicked, the app starts caching stats in the background, reporting progress in the frontend (with no blocking). The caching is resilient, so on restart it will pick up where it left off without redoing work. Once the caching is complete, the button is replaced with an "Explore Stats" button.

When this is clicked, the app sends a request for stats to the backend, which takes the file names, scans the db and computes the stats object. Once this is returned, the GlobalStats page gets rendered.

Pending discussion

  • Database migrations / versioning. With the split of the db into stats vs non stats, this becomes a bit easier, but still needs to be well defined. We may have to invalidate either the metadata or stats cache at any point (due to a new version of slippi-js for instance). Doing so for the metadata side is inexpensive, as the user will barely notice it getting recreated. For stats it can take a while, but of course we can just leverage the same background process as for the initial load.

  • Limiting games computed by default. As I said, the stats computing is relatively fast, but for users with >100k games the loading will still be quite slow. A solution for this is to limit the stats page to the last N (10k?) games by default, but let the user expand to the full set manually.
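One possible shape for the versioning point is sketched below. It is only an illustration of the idea, not a decided design: `CacheVersions`, `InvalidationPlan` and `planInvalidation` are hypothetical names, and the sketch assumes the database stores one version number per cache, so that each can be invalidated independently.

```typescript
// Hypothetical version record stored alongside the caches.
interface CacheVersions {
  metadata: number;
  stats: number;
}

interface InvalidationPlan {
  rebuildMetadata: boolean; // cheap: recreated when the folder is loaded
  rebuildStats: boolean; // slow: handed to the background caching process
}

// Compare what the database was built with against what the app expects now.
// A missing record (null) is treated as a fresh database.
function planInvalidation(stored: CacheVersions | null, current: CacheVersions): InvalidationPlan {
  if (stored === null) {
    return { rebuildMetadata: true, rebuildStats: true };
  }
  return {
    rebuildMetadata: stored.metadata !== current.metadata,
    rebuildStats: stored.stats !== current.stats,
  };
}
```

Under this assumption, a slippi-js upgrade that only changes stats output would bump `stats` alone, leaving the replay list cache untouched.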

Bonus

  • Opening the stats page with a file selection automatically limits the calculation to the selected files
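Choosing which games feed the calculation (an explicit file selection, or the last N games by default) could look roughly like this. `GameRef` and `selectGamesForStats` are hypothetical names, and the sketch assumes `startTime` values are ISO 8601 strings in a consistent format, so that string comparison matches chronological order.

```typescript
// Hypothetical minimal reference to a cached game.
interface GameRef {
  fullPath: string;
  startTime: string; // assumed ISO 8601 in a consistent format
}

// Pick the files to compute stats for: restrict to the selection if one is
// given, then keep only the `limit` most recent games.
// Passing Infinity as the limit expands to the full set.
function selectGamesForStats(games: GameRef[], limit: number, selection?: string[]): string[] {
  const selected = selection && selection.length > 0 ? new Set(selection) : null;
  const candidates = selected ? games.filter((g) => selected.has(g.fullPath)) : games;
  return [...candidates]
    .sort((a, b) => (a.startTime < b.startTime ? 1 : a.startTime > b.startTime ? -1 : 0)) // newest first
    .slice(0, limit)
    .map((g) => g.fullPath);
}
```

In a real implementation the ordering and limit would more likely be pushed into the database query; the array version here only shows the intended behaviour.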

alvarosevilla95 and others added 30 commits May 31, 2021 10:14
…#126)

* chore(vscode): use LF for new lines by default

* Persist replay stats data to NeDB for faster access

* Upgrade electron to 11.3.0

The NeDB library required optional chaining which is not supported in
the current version

* Set DB location in userData

* Support for folders and file changes in the db

* Compute stats in the main process

* Downgrade Electron

Apparently Electron sucks a lot and the new version breaks the
sourcegraph.
Had to switch nedb-promises-typescript to @types/nedb due to lack of
optional chaining support in the current version.

* Delete from DB files from deleted folders

* Add player tags to top level for easy querying

* optimize file detection

* Lazy load stats from the db

* Switch from NeDB to sqlite3

* Improve querying

* Use worker_threads to parallelize stats computing

* Restructure replay code

* Separate server, db, and stats parser

* Style fixes

* Move replay code back to replayBrowser

* Fix wiring

* Restructure folder loading

* Change db name to 'sqlippi.db'

Co-authored-by: Vince Au <vinceau09@gmail.com>

* Move @types/sqlite3 to dev dependency

* Rename dao.ts to db.ts

* Style fixes

* Add worker pool for bulk stats loading

* Consolidate ipc logic

* Separate db into two tables

* Move startTime to replay_data

* Initial db schema draft

* Remove schemaa file

* Swap node-sqlite with better-sqlite3

* Move db to worker thread

* Add better-sqlite3 to thread-plugins bundled modules

* Pass userdata path to the db from main thread

* Typo in log

* Remove old stats worker

* Discard better-sqlite3

* Revert "Discard better-sqlite3"

This reverts commit 1aaa051.

* fix: type issues

* fix: better-sqlite3 rebuilding

Co-authored-by: Vince Au <vinceau09@gmail.com>
@vinceau vinceau mentioned this pull request Oct 28, 2023
5 tasks

2 participants