Fast attempt at serialization, using SimpleValue #2739

Merged
merged 15 commits into from
Dec 31, 2023


Contributor

@OAGr OAGr commented Dec 15, 2023

I want to do a few things:

  1. Expose some internal serialized representation for many items in Squiggle. Maybe we'd have a native "toDict" function or similar.
  2. Have a much better JSON serializer. (Right now, ours is very basic).

This code does a first pass at (1). Having (1) makes (2) straightforward to build on top of it.

I'm unsure about how we want to go about the rest of this. Curious to get takes. My recommendation is that we keep it simple, even if kind of ugly, at this point. I'm okay with the syntax changing a bit later; my main goal is to expose a lot. Ideally many of the items in (1) could easily be de-serialized, but that's further work.

If we have (1) and (2), we could later have optional views for both, in our app.


codecov bot commented Dec 15, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8dddd1d) 71.31% compared to head (2f70a2a) 70.15%.
Report is 21 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2739      +/-   ##
==========================================
- Coverage   71.31%   70.15%   -1.16%     
==========================================
  Files         118      119       +1     
  Lines        6581     6748     +167     
  Branches     1368     1436      +68     
==========================================
+ Hits         4693     4734      +41     
- Misses       1880     2006     +126     
  Partials        8        8              


Contributor Author

OAGr commented Dec 15, 2023

Connects to #2563

Collaborator

@berekuk berekuk left a comment

So this adds a JSON function to the Squiggle language that turns Values into other, simplified Values.

Without commenting on the implementation (which can be cleaned up, especially regarding TypeScript types):

  • JSON(...) is an alternative to SqValue.asJS. I'm not sure which one is better; I can see some benefits from doing it this way, but there's a lot to unpack here (do we need the simple value to be available in Squiggle? do we mostly use it in DB and access it through SqValue?)
  • technically, JSON is a text format, so I initially expected that the JSON(...) function would return a string; otherwise, toSimpleValue would be a more fitting name
  • OTOH, I don't like the idea of serializing things to strings only to parse them to JS objects again, so I'm happy that this new function doesn't do it

Another observation/reminder is that for storing things in the database, the serialization format won't be very human-readable.

One reason for this is lambdas; serialized lambdas would look like a deeply nested tree of AST nodes, at best, and eventually will look like bytecode.

The second, more important reason is that Squiggle values are not trees but directed graphs without cycles, and serializing them to trees is too expensive; so we'll have to store them as lists of values which refer to each other by ids in that list.
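
To make the "list of values that refer to each other by ids" idea concrete, here is a minimal sketch. The `Node` and `FlatNode` types and the `flatten` helper are hypothetical, invented for illustration; they are not Squiggle's actual value types or serializer.

```typescript
// Hypothetical value shape for illustration; not Squiggle's real Value type.
type Node =
  | { kind: "number"; value: number }
  | { kind: "array"; items: Node[] };

// Flat form: an array node refers to its children by index into the list.
type FlatNode =
  | { kind: "number"; value: number }
  | { kind: "array"; items: number[] };

function flatten(root: Node): { nodes: FlatNode[]; root: number } {
  const ids = new Map<Node, number>(); // object identity -> id in `nodes`
  const nodes: FlatNode[] = [];

  const visit = (node: Node): number => {
    const seen = ids.get(node);
    if (seen !== undefined) return seen; // shared node: reuse its id
    let flat: FlatNode;
    if (node.kind === "array") {
      // Children are visited first, so they get smaller ids than the parent.
      flat = { kind: "array", items: node.items.map((child) => visit(child)) };
    } else {
      flat = { kind: "number", value: node.value };
    }
    const id = nodes.length;
    nodes.push(flat);
    ids.set(node, id);
    return id;
  };

  return { nodes, root: visit(root) };
}

// `shared` is referenced three times; as a tree it would be copied three times.
const shared: Node = {
  kind: "array",
  items: [
    { kind: "number", value: 1 },
    { kind: "number", value: 2 },
  ],
};
const top: Node = { kind: "array", items: [shared, shared, shared] };
const flat = flatten(top);
```

Here the flat list holds 4 entries, while expanding `top` into a tree would produce 10 nodes. The gap grows quickly when shared values are nested, which is the cost concern raised above.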

So it's better to treat serialized values as black boxes. This is an argument against having JSON(...) as a function in Squiggle: its output won't be very useful for viewing in the playground.

Contributor Author

OAGr commented Dec 16, 2023

(Addressing your comment)
On naming: I'm unsure here. I'm not very attached to the name, but I think JSON would give people the best idea of what it's meant to be. JSON doesn't exactly have to be a string - well, JSON itself is just a format, not a string. JS of course has the popular stringify() method, but this further points out that stringified JSON is not the same as JSON.
In some libraries there are toJSON() methods on classes, which output JSON-like objects similar to these.

I imagine we could also have stringify methods. Maybe we have:

  1. JSON.make() -> Makes the dict type, as shown, probably without lambdas.
  2. JSON.makeWithLambdas() -> If needed. Keeps lambdas.
  3. JSON.stringify(JSON | any) -> Returns JSON string.
  4. (Later) JSON.parse -> Converts some JSON objects into Squiggle objects.

I think the main use wouldn't be to deal with functions as much as to deal with distributions/calculators/plots and so on. Many of these map 1-1 with some JSON-like representation (minus lambdas, sometimes).

Some potential uses:

  1. Get all the data that comes from a distribution, perhaps to get the samples or something.
  2. See the values behind a Plot/Calculator, for debugging.
  3. If you have a Calculator object, maybe modify one field of it and ask for a new Calculator with that change.
  4. Have your API call Squiggle, get this format as JSON, for whatever you might want.

There is one interesting distinction to consider that I'm not sure about. My guess is that we'll want to wrap Values with type information, like:

{type: "PointSetDist", value: {mixed: {}....}}

The downside here is that this makes reading them in the Squiggle viewer a bit annoying, but the upside is that this would allow us to fully reconstruct them (minus lambdas), without otherwise knowing the type.
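
A rough sketch of this tradeoff, using made-up types: the `Wrapped` shape, the `Plain` shape and the `unwrap` helper below are assumptions for illustration, not the format this PR implements. The type tag on every node is what lets a reader dispatch without outside knowledge; stripping the tags gives the easier-to-read view.

```typescript
// Hypothetical wrapped format: every node carries its own type tag.
type Wrapped =
  | { type: "Number"; value: number }
  | { type: "String"; value: string }
  | { type: "Array"; value: Wrapped[] }
  | { type: "Dict"; value: { [key: string]: Wrapped } };

// Untagged view, closer to what one would want to read in a viewer.
type Plain = number | string | Plain[] | { [key: string]: Plain };

// Because each node is tagged, it can be processed without knowing its type in advance.
function unwrap(w: Wrapped): Plain {
  switch (w.type) {
    case "Number":
    case "String":
      return w.value;
    case "Array":
      return w.value.map((item) => unwrap(item));
    case "Dict": {
      const out: { [key: string]: Plain } = {};
      for (const key of Object.keys(w.value)) {
        out[key] = unwrap(w.value[key]);
      }
      return out;
    }
  }
}

const wrapped: Wrapped = {
  type: "Dict",
  value: {
    name: { type: "String", value: "samples" },
    xs: { type: "Array", value: [{ type: "Number", value: 1 }, { type: "Number", value: 2 }] },
  },
};
const plain = unwrap(wrapped);
```

Going the other way, from `Plain` back to `Wrapped`, loses nothing for these four types, but for custom objects such as distributions the tag would be the only way to tell them apart from ordinary dicts.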

Contributor Author

OAGr commented Dec 18, 2023

Quick thought - this could also help with testing. As in, we test against the JSON(Calculator), as that would give us more information than our regular Calculator toString.

@OAGr OAGr changed the title from "Experiment with adding serialization" to "Fast attempt at serialization, using SimpleValue" on Dec 30, 2023
@OAGr OAGr marked this pull request as ready for review December 30, 2023 20:22
Collaborator

@berekuk berekuk left a comment

There are several things I'd change in this PR, but I'm not sure if it's worth the delay.

(I'll write these here instead of inlining, sorry; if I try to comment on the entire diff, this will be too long and I'd go into too many details.)

  1. Value and SimpleValue types could be converted at the TypeScript level with conditional types; then we could replace the JSType generic parameter in SqValue with the result. It's a bit complicated type-wise, but I expect the result would be quite nice.
  2. I'd replace most of the any types in simpleValue.ts with unknown, or improve them in other ways.
  3. Maybe we don't need ImmutableMap in simple values; they could be plain JS objects or builtin Maps? We have to use ImmutableMap in VDict values because we want to modify Squiggle dicts quickly, but I expect that simple values will always be read-only.
  4. I don't think you ever use SqAbstractValue.toSimple, and it could be moved to asJS (like, make asJS non-abstract, then override it in some subclasses)

Some or all of these can be done after this PR is merged; it's not a big deal.
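
As a sketch of what point 1 might look like: the `VNumber`, `VString` and `VArray` interfaces and the `SimpleOf` and `toSimple` names below are hypothetical, heavily reduced stand-ins, not the real Squiggle classes. The conditional type maps each value type to its simple counterpart, and the untyped runtime helper returns `unknown` rather than `any`, in the spirit of point 2.

```typescript
// Hypothetical, reduced stand-ins for Squiggle's value types.
interface VNumber { type: "Number"; value: number }
interface VString { type: "String"; value: string }
interface VArray { type: "Array"; value: Value[] }
type Value = VNumber | VString | VArray;

// Conditional type: maps a Value type to the type of its simple form.
type SimpleOf<V extends Value> =
  V extends VNumber ? number :
  V extends VString ? string :
  V extends VArray ? unknown[] :
  never;

// Runtime conversion, typed loosely with `unknown` instead of `any`.
function simplify(v: Value): unknown {
  switch (v.type) {
    case "Number":
    case "String":
      return v.value;
    case "Array":
      return v.value.map((item) => simplify(item));
  }
}

// Typed wrapper: the cast is the one place where the type-level mapping
// and the runtime conversion are asserted to agree.
function toSimple<V extends Value>(v: V): SimpleOf<V> {
  return simplify(v) as SimpleOf<V>;
}

const num: VNumber = { type: "Number", value: 3 };
const arr: VArray = { type: "Array", value: [num, { type: "String", value: "a" }] };

const simpleNum: number = toSimple(num); // inferred as number, no cast at the call site
const simpleArr: unknown[] = toSimple(arr);
```

A generic parameter like JSType could then be derived from the value type instead of being declared separately, though whether that works out cleanly for the full set of Squiggle values is an open question.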


Another concern is that you've tagged custom object types with vType, but dicts can have this field too; I see that you don't try to deserialize custom objects in toValue, so I guess this format isn't stable yet and we can change it later?

The only unambiguous options I see here are:

  1. wrap all dicts in { type: "Dict", value: ... }
  2. or, replace vType with a JS symbol; but this would work only for SimpleValue, not for serialized JSON

I know there's also a third option, "reserve a rarely used Dict key for this", like __SQUIGGLE_TYPE__, but I'd recommend against that workaround. It's hard to predict how serialization will be used, and doing this would mean that someone in the far future will be able to, for example, crash someone's webpage by passing this unique key, with unforeseeable consequences. (In other words, I hope that Squiggle Dicts will stay close to JS Maps in semantics, not JS Objects, which do have warts like this).
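
The ambiguity, and how option 1 removes it, can be sketched as follows. The encodings, the helper names, and the "Calculator" payload with its `title` field are all hypothetical and only illustrate the collision; they are not the PR's actual format.

```typescript
type Json = number | string | Json[] | { [key: string]: Json };

// Ambiguous scheme: custom objects carry a `vType` field, dicts are bare objects.
function looksLikeCustomObject(v: Json): boolean {
  return typeof v === "object" && !Array.isArray(v) && "vType" in v;
}

// A serialized custom object (hypothetical fields)...
const customObject: Json = { vType: "Calculator", title: "demo" };
// ...and an ordinary user dict that happens to contain the same key.
const userDict: { [key: string]: Json } = { vType: "Calculator", title: "demo" };

// Option 1: every dict is wrapped, so user-chosen keys live one level down
// and can never be mistaken for the tag.
function wrapDict(entries: { [key: string]: Json }): Json {
  return { type: "Dict", value: entries };
}

function isWrappedDict(v: Json): boolean {
  return typeof v === "object" && !Array.isArray(v) && v["type"] === "Dict";
}

const ambiguous = looksLikeCustomObject(customObject) && looksLikeCustomObject(userDict);
const wrappedUserDict = wrapDict(userDict);
const resolved = isWrappedDict(wrappedUserDict) && !looksLikeCustomObject(wrappedUserDict);
```

In the bare scheme both values pass the same check, so a reader cannot tell them apart. With wrapping, the user's `vType` key is just data inside `value`.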


One last small request: toValue -> simpleValueToValue, fromValue -> simpleValueFromValue, etc. Short names look too generic when I look at them out of context, e.g. in SqValue/index.ts which also has wrapValue that's unrelated.

I'd also be fine with fromValue being a method (toSimple()) on Value objects and toValue being a static BaseValue.fromSimple() method, but that might entangle values and simple values too much.
