Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial changes to drop default serde dependency #255

Merged

Conversation

CosmicHorrorDev
Copy link
Contributor

Related #214

Motivation

This was an experiment to see how difficult it would be to drop serde usage by default (including serde_json and serde_yaml). The idea was to see how lightweight insta could be in both clean compile times and the number of dependencies if someone was just using the debug, display, and string snapshot formats

(Spoilers) 💥 Breaking Changes 💥

  1. This does involve removing the Deserialie and Serialize trait from several publicly exposed types
  2. The type for info got changed to a String although I think it could just be moved behind the serialization feature flag
  3. The assert_yaml_snapshot!() and assert_json_snapshot!() formats got moved behind the yaml and json feature flags respectively

Implementation

The serde implementation was replaced by a handwritten baby version that somewhat follows the same principle. The idea is still to translate the underlying data format to a common structure (in this case crate::parse::Value) and to have logic to serialize and deserialize data for structs with that common format

To make the implementation simple the common format is kept very basic to make it easy to map to the data returned by the parsers (This can be seen in src/parse/mod.rs). Also the mapping of the common data format to different structs had to be handwritten instead of just being able to use #[derive(Deserialize, Serialize)] (This can be seen in src/snapshot.rs)

Results

Big dislaimer: The main benefit is to people who can get by with just the debug, display, and string snapshot formats

This drastically drops the number of default dependencies

$ cargo tree --edges no-dev
insta v1.17.1 (/home/wintermute/Programming/Repos/insta)
├── console v0.15.1
│   ├── libc v0.2.126
│   ├── once_cell v1.11.0
│   └── terminal_size v0.1.17
│       └── libc v0.2.126
├── json v0.12.4
├── linked-hash-map v0.5.6
├── once_cell v1.11.0
├── similar v2.1.0
│   ├── bstr v0.2.17
│   │   ├── lazy_static v1.4.0
│   │   ├── memchr v2.5.0
│   │   └── regex-automata v0.1.10
│   └── unicode-segmentation v1.9.0
└── yaml-rust v0.4.5
    └── linked-hash-map v0.5.6

Along with having a large impact on clean build times (Note before filters out crates that built in less than 0.3 seconds while after filters out less than 0.2s)

Before

baseline

After

slim

I also took a brief survey over 20 highly downloaded crates that use insta to see how many use formats outside of debug, default, and string. From

cargo-emit	    cynic	   hostlist-parser   opentelemetry-application-insights  symbolic-debuginfo
cargo-watch	    cynic-codegen  html_parser	     paperclip				 symbolic-symcache
cargo-workspaces    delog	   lustre_collector  similar				 syn
codespan-reporting  full_moon	   minijinja	     stylua				 test-case

only cynic, full_moon, and html_parser used a format outside of debug, default, and string. yaml for cynic and full_moon and json for html_parser

Pending Work

I didn't want to take the time to polish everything since I'm not sure this will even be merged considering it involves breaking changes

  • Going through and updating docs
  • Probably switching info back to Serialize probably

@mitsuhiko
Copy link
Owner

mitsuhiko commented Jul 31, 2022

I like this a lot. There are even quite quite significant benefits for people who are using the serde formats with this. However what I'm not a fan of is that the snapshot format no longer carries info directly and expression is missing entirely:

Before:

---
source: tests/test_settings.rs
description: The snapshot are four integers
expression: "vec![1, 2, 3, 4]"
info:
  env:
    ENVIRONMENT: production
  cmdline:
    - my-tool
    - run
---

After:

---
source: tests/test_settings.rs
description: The snapshot is four integers
info: "env: [\"ENVIRONMENT\", \"production\"], cmdline: [\"my-tool\", \"run\"]"
---

I generally think that the info change to a string would be something I would not be okay with. Since apparently yaml is a dependency now anyways it should be easy enough to parse this. The info object can also be represented by the internal Value type rather than the serde_yaml Value type instead.

More importantly in general content::Content and parse::Value are more or less duplicates. I think it would be a better call to make them the same type and only add the serde specific trait implementations for the former as needed. There is no need to carry two value types in the crate for what is effectively the same purpose. The many different types on the content type could be phased out, they are not that useful and i believe they are already invisible to observers.

I also generally believe that serde_json and serde_yaml could be entirely pointless as dependencies. For as long as Content/Value implements optionally serde::Serialize it would be possible to make a minimal adapter that just serializes out via yaml/json.

In the interest of cutting out more dependencies the json dependency as it exists at the moment is completely unneeded. The only use of parsing seems to be the cargo manifest which can be parsed with the yaml crate as well.

Good work overall. I think this can turn into something quite valuable! Oh in terms of if this could be merged: absolutely. Mostly because I believe this could even be smooth by going in two version steps:

  • In the first release: a new _hidden_serde feature is added which is on by default which gates Serialize and Deserialize on public traits along with the serde based assertion macros and gives a deprecation warning unless the necessary yaml and json features are also enabled.
  • Next release that is off by default.

About info in particular: since it internally holds Content/Value i think it would be more than acceptable to have two apis for it: once that takes the value directly raw, the other takes a Serialize and the latter is only used when the serde feature is enabled. Beyond setting it, it already does not expose serde stuff (very intentionally I might add).

@mitsuhiko
Copy link
Owner

I pushed a small fix for expression up.

@CosmicHorrorDev
Copy link
Contributor Author

I totally agree with all the feedback. A brief summary:

  • Drop parse::Value in favor of content::Content
  • Switch info to store a Value/Content and potentially expose two apis to set it (one by giving the raw value and the other taking some Serialize type)
  • Potentially drop serde_json and serde_yaml entirely by substituting Content rendering to json/yaml

The one thing I still have questions on is

In the interest of cutting out more dependencies the json dependency as it exists at the moment is completely unneeded. The only use of parsing seems to be the cargo manifest which can be parsed with the yaml crate as well

I'm assuming that you're mixing up YAML and TOML when it comes to parsing the manifests unless there's something else I'm missing

Also worth noting that json is currently being used to create the pending inline snapshot files which seems to be a line delimited JSON format. I wonder if we could store them as a YAML array instead. It should be possible to make cargo-insta handle both formats to handle backwards compatibility


Good work overall. I think this can turn into something quite valuable! Oh in terms of if this could be merged: absolutely. Mostly because I believe this could even be smooth by going in two version steps:

  • In the first release: a new _hidden_serde feature is added which is on by default which gates Serialize and Deserialize on public traits along with the serde based assertion macros and gives a deprecation warning unless the necessary yaml and json features are also enabled.
  • Next release that is off by default.

Awesome! That's great to hear

I'll work towards addressing all the feedback above, and thanks for fixing expression being missing. That change was unintentional 😅

@mitsuhiko
Copy link
Owner

I'm assuming that you're mixing up YAML and TOML when it comes to parsing the manifests unless there's something else I'm missing

YAML is a superset of JSON so for the purpose of reading cargo manifest files you can use the yaml-rust crate:

use std::process::Command;

fn main() {
    let out = Command::new("cargo")
        .arg("read-manifest")
        .output()
        .unwrap();
    let manifest = yaml_rust::YamlLoader::load_from_str(
        std::str::from_utf8(&out.stdout).unwrap()).unwrap();
    dbg!(manifest);
}

About this in particular:

Also worth noting that json is currently being used to create the pending inline snapshot files which seems to be a line delimited JSON format. I wonder if we could store them as a YAML array instead. It should be possible to make cargo-insta handle both formats to handle backwards compatibility

For now I would probably keep the logic here but since insta (not cargo insta) only needs to emit these files I wouldn't be entirely opposed to a minimal JSON emitter which can be used for snapshots and this. #256 already showed that relying on generic serialization libraries is not great for insta as formats change all the time with updates, so actually having an embedded JSON emitter in insta is something I would probably prefer having anyways. But in the interest of keeping the PR mergeable I think that should be a followup task. So I would keep the json dependency exclusively for the inline snapshots pending file for now, and I will then follow up with something that kills that too.

@CosmicHorrorDev
Copy link
Contributor Author

Pushed changes that:

  • Replace usages of parse::Value with content::Content
  • Make info take Serialize and add in a raw_info that takes a content::Content e.g.
use insta::internals::Content;

let raw_info = Content::Map(vec![
    (
        Content::from("env"),
        Content::Seq(vec![
            Content::from("ENVIRONMENT"),
            Content::from("production"),
        ]),
    ),
    (
        Content::from("cmdline"),
        Content::Seq(vec![Content::from("my-tool"), Content::from("run")]),
    ),
]);
with_settings!({description => "The snapshot is four integers", raw_info => &raw_info}, {
    assert_debug_snapshot!(vec![1, 2, 3, 4])
});
  • Lower the MSRV to 1.51.0 when using default features

I think this is a good stopping point for this PR to avoid scope creep and trying out a bunch of changes that may need hashing out. With that said the pending work left is (which I'm certainly happy to work on, but feel free to pick some if you want)

  • Switch cargo manifest parsing and pending inline snapshots to use YAML dropping the need for json
    • This will involve changes in cargo-insta to support both formats for backwards compat.
  • Evaluate replacing serde_yaml with Content::as_yaml (manually verify with several projects to make sure there aren't any differences)
  • Smooth the breaking changes with:

In the first release: a new _hidden_serde feature is added which is on by default which gates Serialize and Deserialize on public traits along with the serde based assertion macros and gives a deprecation warning unless the necessary yaml and json features are also enabled.

  • Deprecate the backtrace feature now that it's no longer used? (seems like we might as well)
  • Update docs to reflect all the changes

@CosmicHorrorDev CosmicHorrorDev marked this pull request as ready for review August 5, 2022 00:50
@mitsuhiko
Copy link
Owner

This looks very good. I think I will be merging this to master and then look from there how to close on the remaining items.

@mitsuhiko mitsuhiko changed the title Experiment with dropping default serde dep Initial changes to drop default serde dependency Aug 5, 2022
@mitsuhiko mitsuhiko merged commit b9d58c2 into mitsuhiko:master Aug 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants