Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDK V2 Prerelease - (Slot based storage) #2901

Closed
wants to merge 242 commits into from
Closed

Conversation

martin-lysk
Copy link
Contributor

@martin-lysk martin-lysk commented Jun 7, 2024

Iteration 1 (Prerelease 15.7.)

Meta TODO's:

  • PR description
  • Ticket writing

TODO's

  • Collect requirements
    • Fink
    • Sherlock

@nils.jacobsen

MessageBundle Editor

  • Pattern
  • Make the Component stylable (check where we still have issues)
  • Pass in installedLintRules to render translatable
  • Minimal Docs in Storybook
  • Clean helper functions (Move to SDK, write tests)

Settings Component

  • Adjustments for new settings schema
  • Adjust docs

@loris.sigrist

  • Finalize lint rule interface
    • API Types
    • Typescript gymnastic (finalize run method)
    • ✅ Group results by bundle and hash groups (offload work to worker)
    • Tests
  • IPC FS - needed by sherlock and Parrot
    • ✅ integrate into sample web app for testing
    • test in vs code
    • Tests
  • Review SDK code
    • Support with tests

@martin.lysk1

  • Finalize SlotStorage
    • work in Jan's feedback
    • tests
    • Report property in bundle
    • custom conflict handler
  • Merge state property in bundle
  • Integrate lix (currently iso git)
  • Finalize sdk2 api and loadproject2 (Martin)
    • load tests
    • Tests

Iteration 2 (Experimental release 30.7.)

  • ? Filter Bar
  • Import
  • Export
  • Conflict handling
  • Migration strategy
    • settings shims?
      @loris.sigrist collects requirements (utility functions for upgrades)
  • Quill/Tiptap/FBEditor based BundleEditor
  • Hosted Component Docs
  • Lint Rule Directives

...

Iteration 3

  • Version history

Copy link

changeset-bot bot commented Jun 7, 2024

🦋 Changeset detected

Latest commit: d7bc246

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 27 packages
Name Type
@inlang/sdk Minor
@inlang/message-bundle-component Patch
@inlang/badge Patch
@inlang/doc-layout-component Patch
@inlang/editor Patch
@inlang/github-lint-action Patch
vs-code-extension Patch
@inlang/rpc Patch
@inlang/settings-component Patch
@inlang/telemetry Patch
@inlang/recommend-ninja Patch
@inlang/plugin-i18next Patch
@inlang/plugin-json Patch
@inlang/plugin-m-function-matcher Patch
@inlang/plugin-next-intl Patch
@inlang/plugin-sap-ui5 Patch
@inlang/plugin-t-function-matcher Patch
@inlang/paraglide-unplugin Patch
@inlang/paraglide-js-e2e Patch
@inlang/paraglide-next-e2e Patch
vite-vanilla-ts Patch
@inlang/paraglide-rollup Patch
@inlang/paraglide-vite Patch
@inlang/paraglide-webpack Patch
@inlang/paraglide-astro Patch
@inlang/paraglide-sveltekit Patch
@inlang/paraglide-sveltekit-example Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@jldec
Copy link
Contributor

jldec commented Jun 18, 2024

notes from initial slotfile storage exploration

Slotfiles are an on-disk text file format designed to play nicely with git. They persist inlang message data into discreet slots in files, thereby avoiding collisions and minimizing merge conflicts.

High level design

  • a SlotFile contains an array with a predefined number of slots
  • each slot may contain one record, aka "doc", with an id, or the slot may be empty
  • the slot index is determined using a hash of the doc id
  • this results in a very low probability of new docs occupying the same slot
  • when the slot for a new doc is occupied in the first file, it lands on the same slot index in another slotfile (I think)
  • each slot is serialized as a line in the slotfile, separated from other slots by blank lines.

Questions

  • what does "transient" mean, particularly "transient slot file"?
  • why is file read (load) mixed with write (save) e.g. in saveChangesToDisk
  • what does "YDEEEAS" mean?
  • why do we need a forceReload flag
  • why this duplication ...loadResults.updated, ...loadResults.updated (plan for tombstones?)

Nits

  • would love to see more passing tests
  • format code, and fix or disable ts checks and lints before committing.
  • wrap jsdoc comment lines where they are really long (e.g. types/SlotFile.ts)
  • avoid hard-to-differentiate function names like loadSlotFileFromFs and loadSlotFilesFromFs
  • avoid super long functions like saveChangesToDisk (>200 lines)

@martin-lysk
Copy link
Contributor Author

martin-lysk commented Jun 18, 2024

Thanks for the feedback @jldec

notes from initial slotfile storage exploration

Slotfiles are an on-disk text file format designed to play nicely with git. They persist inlang message data into discreet slots in files, thereby avoiding collisions and minimizing merge conflicts.

High level design

  • a SlotFile contains an array with a predefined number of slots

👍

  • each slot may contain one record, aka "doc", with an id, or the slot may be empty

👍

  • the slot index is determined using a hash of the doc id

👍

  • this results in a very low probability of new docs occupying the same slot

👍

  • when the slot for a new doc is occupied in the first file, it lands on the same slot index in another slotfile (I think)

It checks for a slot in the transient file / files if one excists first (transient files are about to be created - and slots used in those can only be occupied by the current process so conflicts can not occur). If no transient file exists yet (the normal case for non batch operations) it will check the files existing files by the nearest files by name first for a free slot. If all slots in existing files are occupied it will create a transient file.
When it’s slot is occupied it uses the name of the file it would create if there

  • each slot is serialized as a line in the slotfile, separated from other slots by blank lines.

👍

Questions

  • what does "transient" mean, particularly "transient slot file"?

To understand this the following properties are important:

  • New Records that are not yet persist are hold in memory and kept as transient records until persisted (successful fs.write file)
  • Changes made on records that occupy a slot in a slot file are represented in the corresponding slotfiles changedRecords property

During a saving to disc routine, the transient records tries to find a slot in existing files (to reduce file creation to a minimum) if no free slot is found a transient file is created (for the lifetime of a save operation). A transientFile is also taken into account to find free slots for other transient records (compare slot search from comment above)

  • why is file read (load) mixed with write (save) e.g. in saveChangesToDisk

This is an extra roundtrip to collect changes on the disc that may happen right before the save. This is not a guarantee to catch all possible changes from other processes but since we will (TODO in place) block incoming event processing during the save an extra load was placed. Definitely a need to clearify and Optimize with the introduction of the lock.

  • what does "YDEEEAS" mean?

it should be „ASDasd“ but I guess my fingers gone crazy when I needed a log I can look for during debugging 🤣

  • why do we need a forceReload flag

🕵️‍♂️ don’t know without looking into the code Will check tomorrow

  • why this duplication ...loadResults.updated, ...loadResults.updated (plan for tombstones?)

🕵️‍♂️ Will check - as well

Nits

  • would love to see more passing tests

Sure - I started writing some todos in the .test.ts file please add everything you would like to see

  • format code, and fix or disable ts checks and lints before committing.

Disagree - I commit WIP and build only what I need in my focus, everything that kicks me out of the flow - but now it’s time to clean up - on it!

  • wrap jsdoc comment lines where they are really long (e.g. types/SlotFile.ts)

👍

  • avoid hard-to-differentiate function names like loadSlotFileFromFs and loadSlotFilesFromFs

👍

  • avoid super long functions like saveChangesToDisk (>200 lines)
    With

🤓👍

NilsJacobsen and others added 2 commits July 3, 2024 10:07
@marting-lysk lintCOnfig need the installedLintRules to show UI with good UX
…omponent-is-not-reacting-on-updates

fix: example lintConfig -> added installedLintRules
const slotStorage = await createSlotWriter<DocumentExample>({
fs: memoryFs,
path: "/slot/",
fileNameCharacters: 3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a sane default, i expect this will rarely ever need to be touched.

// expect(Object.isFrozen(docs1[0])).eq(true, "Records returned by slotStorage should be frozen")
// eslint-disable-next-line @typescript-eslint/no-non-null-assertion -- we check the length before
expect(docs1[0]!.data.content).eq(insertedDocument.content)
const updateDocument = structuredClone(insertedDocument)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is clone needed? if storage can break without cloning, this has to move into the library

expect(slotStorage._writerInternals.transientSlotEntries.size).eq(1)

await slotStorage.save()
expect(slotStorage._writerInternals.transientSlotEntries.size).eq(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aahhhhh so this is really a transient storage, not just a bad name for a memory layer on top of the file?! this is the first time i understand it is really working like that., maybe i misread this before? if you just updated the memory representation of the file directly an then reconcile when saving you could reuse the same conflict logic as for writes by multiple threads and the whole impl would be simpler. i dont fully understand the need for this 3 layer design.


await slotStorage1.save()
await slotStorage2.save()
const conflictingRecord = slotStorage2.findDocumentsById([insertedDocument.id])[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isnt locking supposed to prevent this case? we need locking in any case to now destroy the file write on concurrent threads, so it should be simple to just prevent any writes from other threads concurrently.

NilsJacobsen and others added 24 commits July 3, 2024 14:19
…-and-languagetag-in-the-settings-component

Nilsjacobsen/inlmc 104 use locale and languagetag in the settings component | INLMC 🧩
…ntation-for-message-bundle-component

feat: add docs for the message bundle component | INLMC 🧩
Add RPC-FS to SDKv2 branch
@samuelstroschein
Copy link
Member

Closing because outdated. Inlang SDK v2 is rebuild on sqlite

@github-actions github-actions bot locked and limited conversation to collaborators Aug 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants