Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parseFileSync span bug #1366

Open
JiangWeixian opened this issue Jan 28, 2021 · 28 comments
Open

parseFileSync span bug #1366

JiangWeixian opened this issue Jan 28, 2021 · 28 comments
Labels

Comments

@JiangWeixian
Copy link

Describe the bug

same file parse twice, different output

Input code

//test.jsx
const a = 1

import { parseFileSync } from '@swc/core';

const ast = parseFileSync('./test.jsx', { jsx: true, syntax: 'ecmascript', comments: true, dynamicImport: true, decorators: true, decoratorsBeforeExport: true, script: true })
const ast1 = parseFileSync('./test.jsx', { jsx: true, syntax: 'ecmascript', comments: true, dynamicImport: true, decorators: true, decoratorsBeforeExport: true, script: true })

console.log(JSON.stringify(ast))
console.log('====')
console.log(JSON.stringify(ast1))

{"type":"Module","span":{"start":0,"end":11,"ctxt":0},"body":[{"type":"VariableDeclaration","span":{"start":0,"end":11,"ctxt":0},"kind":"const","declare":false,"declarations":[{"type":"VariableDeclarator","span":{"start":6,"end":11,"ctxt":0},"id":{"type":"Identifier","span":{"start":6,"end":7,"ctxt":0},"value":"a","typeAnnotation":null,"optional":false},"init":{"type":"NumericLiteral","span":{"start":10,"end":11,"ctxt":0},"value":1},"definite":false}]}],"interpreter":null}
====
{"type":"Module","span":{"start":12,"end":23,"ctxt":0},"body":[{"type":"VariableDeclaration","span":{"start":12,"end":23,"ctxt":0},"kind":"const","declare":false,"declarations":[{"type":"VariableDeclarator","span":{"start":18,"end":23,"ctxt":0},"id":{"type":"Identifier","span":{"start":18,"end":19,"ctxt":0},"value":"a","typeAnnotation":null,"optional":false},"init":{"type":"NumericLiteral","span":{"start":22,"end":23,"ctxt":0},"value":1},"definite":false}]}],"interpreter":null}

first ast span start at 0, second start at 12

Config

{
    // Please copy and paste your .swcrc file here
}

Expected behavior
A clear and concise description of what you expected to happen.

Version
The version of @swc/core:

"@swc/core": "^1.2.46",

Additional context
Add any other context about the problem here.

@Brooooooklyn
Copy link
Member

By design,just change the file name passed into the parseFileSync you'll get the same result.

@JiangWeixian
Copy link
Author

By design,just change the file name passed into the parseFileSync you'll get the same result.

I think it's a bug

@kdy1
Copy link
Member

kdy1 commented Jan 29, 2021

It's not a bug.
If you want span to start with 0, you can create new instance of Compiler, which have same apis.

@kdy1 kdy1 closed this as completed Jan 29, 2021
@JiangWeixian
Copy link
Author

JiangWeixian commented Jan 29, 2021

Compiler

@kdy1

const compiler = new Compiler()
const ast = compiler.parseFileSync('test.jsx', { syntax: 'ecmascript' })
const compiler1 = new Compiler()
const ast1 = compiler1.parseFileSync('test.js', { syntax: 'ecmascript' })

// test.js
const b = 1
// test.jsx
const a = 1
{"type":"Module","span":{"start":0,"end":11,"ctxt":0},"body":[{"type":"VariableDeclaration","span":{"start":0,"end":11,"ctxt":0},"kind":"const","declare":false,"declarations":[{"type":"VariableDeclarator","span":{"start":6,"end":11,"ctxt":0},"id":{"type":"Identifier","span":{"start":6,"end":7,"ctxt":0},"value":"a","typeAnnotation":null,"optional":false},"init":{"type":"NumericLiteral","span":{"start":10,"end":11,"ctxt":0},"value":1},"definite":false}]}],"interpreter":null}
====
{"type":"Module","span":{"start":12,"end":23,"ctxt":0},"body":[{"type":"VariableDeclaration","span":{"start":12,"end":23,"ctxt":0},"kind":"const","declare":false,"declarations":[{"type":"VariableDeclarator","span":{"start":18,"end":23,"ctxt":0},"id":{"type":"Identifier","span":{"start":18,"end":19,"ctxt":0},"value":"b","typeAnnotation":null,"optional":false},"init":{"type":"NumericLiteral","span":{"start":22,"end":23,"ctxt":0},"value":1},"definite":false}]}],"interpreter":null}

second file still start with 12, not working

@kdy1 kdy1 reopened this Jan 29, 2021
@kdy1 kdy1 modified the milestones: v1.2.47, v1.2.48 Jan 29, 2021
@kdy1 kdy1 modified the milestones: v1.2.48, v1.2.49, v1.2.50 Feb 19, 2021
@kdy1 kdy1 removed this from the v1.2.50 milestone Mar 1, 2021
@mikob
Copy link

mikob commented Apr 15, 2021

Having the same issue. Any known workarounds?

Here's a slightly simpler reproduction:

    const compiler1 = new Compiler();
    console.log(compiler1.parseSync("const a = 1", { syntax: "ecmascript" }));
    const compiler2 = new Compiler();
    console.log(compiler2.parseSync("const b = 1", { syntax: "ecmascript" }));

@kdy1
Copy link
Member

kdy1 commented Apr 15, 2021

Currently, there's no workaround.

I think api to convert a span from swc to line-col span is required, though.

@mikob
Copy link

mikob commented Jun 13, 2021

@kdy1 why is line-col needed? Can't \n just be consistently be used for line endings if it's a windows/unix EOL problem?

@kdy1
Copy link
Member

kdy1 commented Jun 14, 2021

@mikob What do you mean? line-col is about span, not line endings.

@mikob
Copy link

mikob commented Jun 14, 2021

Right, I just don't see the immediate benefit to using line-col. I thought maybe because you wanted to be explicit about line endings. I use spans to get offsets and extract pieces of the code using .substr and line-col would only make that harder.

@jdalton
Copy link

jdalton commented Aug 17, 2021

Ran into this issue too trying to make an adapter for estree AST.

@kdy1
Copy link
Member

kdy1 commented Aug 18, 2021

@jdalton I want to remove parse and print with v2 of swc.
I'm going to provide a rust-based plugin API instead.

@jdalton
Copy link

jdalton commented Aug 19, 2021

@kdy1

I want to remove parse and print with v2 of swc.
I'm going to provide a rust-based plugin API instead.

Ah cool. Will there still be a way to expose the AST? I currently have been able workaround this issue using an adapter to track the offset of the start of the root AST and smooth over the other differences with estree.

@kdy1
Copy link
Member

kdy1 commented Aug 20, 2021

You mean, you want to get actual offset from Span of node side?
I think I can provide an API to get line-col span from lo-hi span.

This API, if added, will be removed along with parse/print on v2, but it will work for now.

@jdalton
Copy link

jdalton commented Aug 20, 2021

@kdy1 I've noticed some confusion around this from another user. Folks use the AST from parsers, in formats like babylon and estree, to perform source transformations. The AST usually has properties such as .start, .end or maybe a .range array of [start, end] values. In the case of swc it has a .span object with .start and .end properties. These coordinates allow for source to be modified in substrings at the start and end of various nodes. So having AST is required for this but also the AST should have coordinates of the nodes (start and end string positions).

@kdy1
Copy link
Member

kdy1 commented Aug 20, 2021

Most ast nodes already have span. What swc is lacking is an api to convert span to line-col coordinates.
Do you mean there should be line-col coordinates?

@jdalton
Copy link

jdalton commented Aug 20, 2021

Are we talking past each other? Maybe an example would help. In ast-explorer using acorn with an example of export * as ns from 'a'; the ExportAllDeclaration node has a start of 0 and an end of 24. swc provides these as a span property (almost identical to the values of acorn's range/state/end) but because of #1366 its coordinates are off on subsequent parsings.

@kdy1
Copy link
Member

kdy1 commented Aug 20, 2021

Now I understand it. But I'm not sure about the change.
Plugin api / parse / print is misdesigned and going to be removed in v2.
In other words, AST will not be exposed to js world.

@jdalton
Copy link

jdalton commented Aug 20, 2021

In other words, AST will not be exposed to js world.

If it could still be exposed to JS by a rust plugin or some such way that would be great too.

@vjpr
Copy link

vjpr commented Sep 30, 2021

Here is the location of the issue.

Every time transform_sync is called, SourceMap#new_source_file is called and the start_pos is set.

If you are using transformSync and transforming one file at at time, there seems to be no way to reset the start pos. And spans use the SourceMap start pos.

#[js_function(4)]
pub fn transform_sync(cx: CallContext) -> napi::Result<JsObject> {
    exec_transform(cx, |c, src, options| {
        Ok(c.cm.new_source_file(
            if options.filename.is_empty() {
                FileName::Anon
            } else {
                FileName::Real(options.filename.clone().into())
            },
            src,
        ))
    })
}
    pub fn new_source_file(&self, filename: FileName, src: String) -> Lrc<SourceFile> {
        // The path is used to determine the directory for loading submodules and
        // include files, so it must be before remapping.
        // Note that filename may not be a valid path, eg it may be `<anon>` etc,
        // but this is okay because the directory determined by `path.pop()` will
        // be empty, so the working directory will be used.
        let unmapped_path = filename.clone();

        let (filename, was_remapped) = match filename {
            FileName::Real(filename) => {
                let (filename, was_remapped) = self.path_mapping.map_prefix(filename);
                (FileName::Real(filename), was_remapped)
            }
            other => (other, false),
        };

        // We hold lock at here to prevent panic
        // If we don't do this, lookup_char_pos and its family **may** panic.
        let mut files = self.files.borrow_mut();

        let start_pos = self.next_start_pos(src.len());

        let source_file = Lrc::new(SourceFile::new(
            filename,
            was_remapped,
            unmapped_path,
            src,
            Pos::from_usize(start_pos),
        ));

Can we get a config option to not set the start_pos from the end of the previous one? It should probably be the default for the public js api.


NOTE: If let start_pos = self.next_start_pos(src.len()); is set to zero on each compile, source maps actually break...the originalLine in the source map is always 1, and the filename is something like jsx-config-pragmaFrag.js.

@marcins
Copy link

marcins commented Aug 19, 2022

For my purposes, I was implementing a parse + visit, and needed spans to get access to comments in the source which SWC doesn't currently include in the AST (in JS at least). I have worked around this for now by adding a visitProgram that captures the node.span.start as an offset, and everywhere I need a span I just transform it to one with the offset subtracted. Works for my purposes, might not for yours.

e.g.

class MyVisitor extends Visitor {
    code: string;
    offset: number;

    constructor(code: string) {
        super();
        this.code = code;
        this.offset = 0;
    }

    visitProgram(node: Program) {
        this.offset = node.span.start;
        return super.visitProgram(node);
    }

    fixSpan(span: Span): Span {
        return {
            start: span.start - this.offset,
            end: span.end - this.offset,
        };
    }

    visitCallExpression(node: CallExpression): Expression {
            const bodySpan = this.fixSpan(node.span);

            // this can now get the correct code for this node with subsequent invocations..
            const nodeBody = this.code.substring(bodySpan.start - 1, bodySpan.end);
   }
}

@sverrejoh
Copy link

For my purposes, I was implementing a parse + visit, and needed spans to get access to comments in the source which SWC doesn't currently include in the AST (in JS at least). I have worked around this for now by adding a visitProgram that captures the node.span.start as an offset, and everywhere I need a span I just transform it to one with the offset subtracted. Works for my purposes, might not for yours.

e.g.

class MyVisitor extends Visitor {
    code: string;
    offset: number;

    constructor(code: string) {
        super();
        this.code = code;
        this.offset = 0;
    }

    visitProgram(node: Program) {
        this.offset = node.span.start;
        return super.visitProgram(node);
    }

    fixSpan(span: Span): Span {
        return {
            start: span.start - this.offset,
            end: span.end - this.offset,
        };
    }

    visitCallExpression(node: CallExpression): Expression {
            const bodySpan = this.fixSpan(node.span);

            // this can now get the correct code for this node with subsequent invocations..
            const nodeBody = this.code.substring(bodySpan.start - 1, bodySpan.end);
   }
}

This works in many cases, but I found that if the file starts with a comment the offset is off.

@voorjaar
Copy link
Contributor

For my purposes, I was implementing a parse + visit, and needed spans to get access to comments in the source which SWC doesn't currently include in the AST (in JS at least). I have worked around this for now by adding a visitProgram that captures the node.span.start as an offset, and everywhere I need a span I just transform it to one with the offset subtracted. Works for my purposes, might not for yours.
e.g.

class MyVisitor extends Visitor {
    code: string;
    offset: number;

    constructor(code: string) {
        super();
        this.code = code;
        this.offset = 0;
    }

    visitProgram(node: Program) {
        this.offset = node.span.start;
        return super.visitProgram(node);
    }

    fixSpan(span: Span): Span {
        return {
            start: span.start - this.offset,
            end: span.end - this.offset,
        };
    }

    visitCallExpression(node: CallExpression): Expression {
            const bodySpan = this.fixSpan(node.span);

            // this can now get the correct code for this node with subsequent invocations..
            const nodeBody = this.code.substring(bodySpan.start - 1, bodySpan.end);
   }
}

This works in many cases, but I found that if the file starts with a comment the offset is off.

This would fix for comments and spaces, we can parse an empty string first to get the initial offset.

   this.offset = parseSync("").span.end

@DoctorGester
Copy link

@voorjaar this only works for me if I'm doing parseFileSync after i.e.

export async function parse_files(files: string[]): Promise<Parsed_Module[]> {
    return await Promise.all(
        files.map(async (file) => {
            // @Hack SWC bug - https://github.com/swc-project/swc/issues/1366
            const offset = (parseSync("")).span.end;

            return {
                file,
                span_offset: offset,
                module: parseFileSync(file, parse_options)
            };
        })
    );
}

Instead of

export async function parse_files(files: string[]): Promise<Parsed_Module[]> {
    return await Promise.all(
        files.map(async (file) => {
            // @Hack SWC bug - https://github.com/swc-project/swc/issues/1366
            const offset = (parseSync("")).span.end;

            return {
                file,
                span_offset: offset,
                module: await parseFile(file, parse_options)
            };
        })
    );
}

Which has unacceptable performance :(
Any ideas?

@voorjaar
Copy link
Contributor

Not sure if this works. We didn't use the built-in visitor. See https://github.com/macro-plugin/macros

async function hackParseFile (file, options) {
  const offset = (await parse("")).span.end

  return [offset, await parseFile(file, options)]
}

export async function parse_files (files: string[]): Promise<Parsed_Module[]> {
  return await Promise.all(
    files.map(async (file) => {
      // @Hack SWC bug - https://github.com/swc-project/swc/issues/1366
      const [span_offset, module] = await hackParseFile(file, parse_options)

      return {
        file,
        span_offset,
        module
      }
    })
  )
}

@joarfish
Copy link

joarfish commented Jun 5, 2023

We've also at first had problems with using spans (in node/typescript). Apart from taking into account the offset as mentioned in other comments above its also important to keep in mind that the span's start/end refer to byte positions and not char positions. So this was helpful in our context to get the right substrings from SWC's spans:

export class StringAsBytes {
    private string: Uint8Array;
    private decoder: TextDecoder;

    constructor(string: string) {
        this.decoder = new TextDecoder();
        this.string = (new TextEncoder()).encode(string);
    }

    /**
     * Returns a slice of the string by providing byte indices.
     * @param from - Byte index to slice from
     * @param to - Optional byte index to slice to
     */
    public slice(from: number, to?: number): string {
        return this.decoder.decode(
            new DataView(this.string.buffer, from, to !== undefined ? to - from : undefined)
        );
    }
}

Hope this is helpful.

@rizrmd
Copy link

rizrmd commented Nov 20, 2023

This bug is kind of deal breaker for switching to swc, In our product we need to invoke parser multiple times in the same process. Increasing span.start with each invocation will make swc stopped working at some point – especially when parsing huge file.

twada added a commit to twada/power-assert-monorepo that referenced this issue Dec 30, 2023
Yash-Singh1 added a commit to Yash-Singh1/stylex-vscode that referenced this issue Jan 10, 2024
- add move away from swc to roadmap because of accumalating offsets (swc-project/swc#1366)
- strict requirements for doc color in keyvalue properties
@Yash-Singh1
Copy link

For my case, I need this to convert spans to line-column values in an LSP that I am implementing. Currently, this breaks in swc when I have a file with a comment at the top, because that offsets all of the positions of my spans.

@knightedcodemonkey
Copy link

knightedcodemonkey commented Jun 1, 2024

There is a workaround if you run each parse call in it's own child_process.

import { spawnSync } from 'node:child_process'
import { join } from 'node:path'

import type { Module } from '@swc/core'

const spawnChild = (...args: string[]) => {
  const { stdout } = spawnSync('node', [join(import.meta.dirname, 'child.js'), ...args])
  const ast = JSON.parse(stdout.toString()) as Module

  return ast
}
export const reparse = (source: string): Module => {
  return spawnChild(source)
}

const ast1 = reparse('const foo = "bar"')
const ast2 = reparse('const foo = "bar"')

console.log(ast1.span.start === ast2.span.start) // true

child.js

import { argv, stdout } from 'node:process'
import { parseSync } from '@swc/core'

stdout.write(JSON.stringify(parseSync(argv[2])))

I've published this to npm as @knighted/reparse. It also supports passing a filename.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests