Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added debug IDs to source bundle javascript files #762

Merged
merged 14 commits into from
Feb 21, 2023

Conversation

mitsuhiko
Copy link
Member

@mitsuhiko mitsuhiko commented Feb 20, 2023

This adds a new header debug-id to files stored in a source bundle that is used as artifact bundle. The goal of this is to enable the lookup of JavaScript files by debug ID. As we are not using much of the DebugSession abstraction for this yet, i chose not to add this to the trait but just the particular implementation of a session for source bundles. Lookup requires knowledge of the file type and only returns contents, no meta information.

let sess = bundle.debug_session().unwrap();
let f = sess
    .source_by_debug_id(
        "5e618b9f-54a9-4389-b196-519819dd7c47".parse().unwrap(),
        SourceFileType::MinifiedSource,
    )
    .unwrap()
    .expect("should exist");

Manifest Format Changes

The manifest gains a new well known key in the headers dictionary called debug-id which so far is exclusively used to refer to debug IDs of source maps and minified source files. However it's intentionally kept generic in case there is further use of this in the future. For instance it might be valuable for discovering debug files for .wasm files on the internet via HTTP header.

{
  "files": {
    "foo.min.js": {
      "type":"minified_sourec",
      "url": "https://mycdn.invalid/foo/foo.min.js",
      "headers": {
        "sourcemap": "foo.min.map",
        "debug-id": "eb6e60f1-65ff-4f6f-adff-f1bbeded627b"
      }
    }
  },
  "debug_id": "06f93af5-832b-3bbc-6dfd-27a46912605a"
}

Additionally the use of debug_id in the toplevel manifest is now also used to refer to the artifact bundle for JavaScript. This means that in the JavaScript case we have multiple IDs at play:

  • toplevel debug_id: refers to the artifact bundle (source bundle) of a particular upload / release
  • per file debug_id: refers to the particular debug ID of an individual minified source and it's corresponding source map

This also changes the implementation to always lowercase headers before writing and after reading for simplified handling.

Source Maps vs Indexed RAM Bundles

Because we have different SourceFileTypes for normal source maps and indexed RAM bundles they require separate handling for the caller. This is probably sensible as they are nothing alike but it's a bit odd.

Artifact vs Source Bundle

This commit also documents that a source bundle and an artifact bundle are really the same thing, with some minor differences about how we refer to it depending on if it's use to hold sources for a debug file, or if it holds release like artifacts for JavaScript processing.

Other API Changes

This also adds missing functionality to the SourceBundleDebugSession to make this actually useful for JavaScript processing:

  • source_by_url is added to look up a source by URL. Open question here is if this can just be the same as source_by_path? It's odd that we have two ways to key this really.
  • source_by_debug_id is the canonical way now to look up sources by debug ID + kind
  • SourceFileDescriptor is added to better describe what a source is for. The descriptor has all the API necessary to retrieve information from source files (contents and meta data) that is relevant for symbolication and unminification:
    • ty returns the SourceFileType. This was previously impossible to read after setting.
    • contents returns the contents if available, otherwise url should be used for fetching.
    • url doubles as a way to either refer to a source link or the canonical URL of the source file. There is no reason why these need to be separate methods as they are referring to the same thing anyways.
    • path is just the path that was used to look up the source for native source bundle companions
    • debug_id returns the parsed DebugId for source maps and minified sources in bundles that have debug IDs
    • source_mapping_url returns either the sourcemap or x-sourcemap reference in the manifest of the bundle. If neither can be found, it falls back to parsing the embedded contents. This makes it convenient enough to use though one could make an argument that the parsing logic should go elsewhere.

@mitsuhiko mitsuhiko requested a review from a team February 20, 2023 12:43
@github-actions
Copy link

github-actions bot commented Feb 20, 2023

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against 9172abc

symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved
symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved
symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Feb 20, 2023

Codecov Report

Merging #762 (0b23291) into master (9e55fec) will increase coverage by 0.31%.
The diff coverage is 85.24%.

❗ Current head 0b23291 differs from pull request most recent head 9172abc. Consider uploading reports for the commit 9172abc to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #762      +/-   ##
==========================================
+ Coverage   74.22%   74.53%   +0.31%     
==========================================
  Files          70       70              
  Lines       15117    15317     +200     
==========================================
+ Hits        11220    11417     +197     
- Misses       3897     3900       +3     

Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its very unfortunate that we use the same file format for both artifact bundles (that need metadata), and source bundles (that only care about files).

I would separate these two things from one another. We could have an accessor for the metadata which lazily parses the JSON manifest.
This metadata then allows you to query things by path or by debug-id, etc.

For applying the native source context, we do not need the metadata at all.
For that native use-case, the file paths in the zip are completely predictable.
Profiling has even shown that unzipping and parsing the JSON manifest (which can be hundreds of KiB) is expensive, especially considering that you mostly want just a couple of files from it.
So there, it might be cheaper to just access the contents directly via the predictable file paths, and do the manifest parsing and lookup as a fallback.

I’m not entirely sure what the best idea is to deal with "contents" vs "url". We could do the same logic there as well.

One optimized code path for:

  • Give me the contents of the file contained in the zip.

Plus another code path for:

  • Give me the metadata (including the url where to fetch the file) for file with name or debug-id.

symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved
symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved
symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved
symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved
Comment on lines 432 to 434
if possible_sourcemap.starts_with("data:application/json") {
return None;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we actually want to return the data url here? Otherwise we would have to do that same detection logic in symbolicator again. CC @loewenheim wdyt, as you implemented the base64 encoded data urls?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should expose the data url here, yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure though if we should leave the data url prefix alone; if the URL is returned as a plain string, we will need to be able to recognize it later.

@Swatinem Swatinem merged commit 68147ad into master Feb 21, 2023
@Swatinem Swatinem deleted the feature/debug-ids branch February 21, 2023 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants