Add AssemblyScript #5640

Closed

@SebastianSpeitel SebastianSpeitel commented Nov 8, 2021

Description

Adds AssemblyScript language and heuristics.

The heuristics are really rough for now and will be changing.

Checklist:

- '@unmanaged'
- '// @ts-ignore: decorator'
- '[iuf](8|16|32|64)'
- 'usize'

Suggested change
- 'usize'
- '\b[iu]size'

Author

Why only \b at the beginning and not at the end?

because we should accept isize or usize but not iusize or uisize

Author

Yes, that's the \b at the front, but why not \b[iu]size\b?

You could add it at the end as well, but usually it's unnecessary. It depends on how tokens are matched: per line or per word.

Author

So something like [\s:<][iu]size[\s>)] would be more accurate.
(The : is for variable definitions like const ptr:usize, the <> is for generics like Array<usize>, and the ) is for (ptr: usize)=>void.)

That's true, it could produce false positives. But at the same time, this should also be valid: :isize, for example function foo():isize.

Also =>usize

@MaxGraey MaxGraey Nov 9, 2021

All possible variants: Array<usize>, ()=>usize, (x:isize) :isize, : isize{, as isize;, isize(x), <isize>x, [isize,isize], {x:isize}

Author

It seems like a more appropriate border would be [^.\w], so that it matches neither inside other words nor as a property access.
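
To compare the three borders discussed above, here is a minimal sketch that runs each candidate against the usage variants listed earlier in this thread. It uses JavaScript regular expressions rather than the Ruby engine Linguist uses, and the helper names (`candidates`, `variants`, `matchesInFile`, `countMatches`) are made up for illustration. Each snippet is wrapped in newlines to mimic a line in the middle of a file, because `[^.\w]` consumes a character on each side and would otherwise fail at the very start or end of the input.

```typescript
// Candidate patterns from the discussion (JavaScript regex flavour).
const candidates = {
  wordBoundary: /\b[iu]size\b/,
  explicitDelimiters: /[\s:<][iu]size[\s>)]/,
  notDotOrWord: /[^.\w][iu]size[^.\w]/,
};

// Usage variants listed earlier in the thread.
const variants: string[] = [
  "Array<usize>",
  "()=>usize",
  "(x:isize) :isize",
  ": isize{",
  "as isize;",
  "isize(x)",
  "<isize>x",
  "[isize,isize]",
  "{x:isize}",
];

// Wrap the snippet in newlines so it behaves like a line inside a file.
function matchesInFile(pattern: RegExp, snippet: string): boolean {
  return pattern.test(`\n${snippet}\n`);
}

// Count how many of the variants a pattern recognises.
function countMatches(pattern: RegExp): number {
  return variants.filter((v) => matchesInFile(pattern, v)).length;
}

for (const [name, pattern] of Object.entries(candidates)) {
  console.log(name, countMatches(pattern), "of", variants.length);
}

// A property access is not a type usage and should ideally be rejected.
const propertyAccess = "obj.usize = 1";
console.log("wordBoundary on property:", matchesInFile(candidates.wordBoundary, propertyAccess));
console.log("notDotOrWord on property:", matchesInFile(candidates.notDotOrWord, propertyAccess));
```

In this sketch the explicit delimiter list misses most variants, `\b...\b` accepts all of them but also accepts the property access, and `[^.\w]` accepts all variants while rejecting the property access.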

- '@final'
- '@unmanaged'
- '// @ts-ignore: decorator'
- '[iuf](8|16|32|64)'

Suggested change
- '[iuf](8|16|32|64)'
- '\b[iuf](8|16|32|64)'
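
As a rough illustration of why the leading \b matters here, the sketch below checks both forms against a few sample lines. The sample strings and the names `unanchored` and `anchored` are invented for this example, and it uses JavaScript regular expressions, not Linguist's Ruby engine.

```typescript
// The pattern as originally proposed, and the suggested change with a leading \b.
const unanchored = /[iuf](8|16|32|64)/;
const anchored = /\b[iuf](8|16|32|64)/;

// Hypothetical source lines: one real AssemblyScript type, two encoding names.
const samples: string[] = [
  "let x: f32 = 1.0;",
  "const enc = 'utf8';",
  "decode(buf, 'utf16');",
];

for (const line of samples) {
  console.log(JSON.stringify(line), {
    unanchored: unanchored.test(line),
    anchored: anchored.test(line),
  });
}
```

Without the boundary, the `f8` inside `utf8` and the `f16` inside `utf16` count as hits, which would be false positives in ordinary TypeScript. Neither form has a trailing boundary, so an identifier such as `f64x2` still matches both.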

pattern:
- '@inline'
- '@final'
- '@unmanaged'

@MaxGraey MaxGraey Nov 8, 2021

Also

- '@operator'
- '@global'

Author

@SebastianSpeitel SebastianSpeitel Nov 8, 2021

and @unsafe and maybe even @builtin

Member

@lildude lildude left a comment

Wow!! Over 20 samples 😱 We really don't need that many, especially the really big files, which seem to have a lot of duplicate content from the others. Please cut down the samples to only those that are most representative of the language and real-world use. 2-5 is plenty.

Collaborator

@Alhadis Alhadis left a comment

Isn't AssemblyScript quite literally a clean subset of TypeScript? That doesn't make it its own language, and is going to be a nightmare to disambiguate from "regular" TypeScript; i.e., you'll wind up with projects classified as 80% AssemblyScript and perhaps 20% TypeScript due to the presence of .ts files that didn't match a heuristic.

@SebastianSpeitel
Author

@Alhadis Yes it is, except for a few very small edge cases.
In some cases the code is valid TS and AS (sometimes even by design, because there is portable AS), and it should fall back to TS there.

@lildude I included so many examples because it is so hard to distinguish and I even removed some of them, because they were valid TS and AS even for me.
I want to include a few more TS examples as well to test in that direction.

@Alhadis
Collaborator

Alhadis commented Nov 11, 2021

> @Alhadis Yes it is, except for a few very small edge cases.
> In some cases the code is valid TS and AS (sometimes even by design, because there is portable AS), and it should fall back to TS there.

This should tell you that no real (language-level) difference exists between AssemblyScript and TypeScript—the former is simply a restrictive and specialised application of the latter, not unlike asm.js. From my reading, the only difference between the two technologies is that AssemblyScript mandates strictly-typed code and abstinence from JavaScript's more dynamic features (I'm guessing stuff like eval(), new Function(…) and so forth).

@MaxGraey

MaxGraey commented Nov 11, 2021

AssemblyScript has not only a stricter type system, but also some semantic features and fixes that TypeScript cannot afford. Some of these differences are presented in this article: https://blog.suborbital.dev/assemblyscript-vs-typescript.

Btw, good point about eval. It could be added to negative_pattern.

@Alhadis
Collaborator

Alhadis commented Nov 11, 2021

> Some of these differences are presented in this article: https://blog.suborbital.dev/assemblyscript-vs-typescript.

All of those differences are semantic in nature, and have nothing to do with syntax. That is, they pertain to the interpretation of TypeScript, and many of these differences can also be enforced by a conventional TypeScript compiler, IIRC.

Sorry, I'm not seeing anything that warrants recognition as a discrete language.

@dcodeIO

dcodeIO commented Nov 11, 2021

Perhaps for reference, the most prominent syntactic difference is that decorators like @inline are allowed on functions and variables. The remainder is mostly semantics, like a different set of built-in types, some new, some not supported, availability of operator overloads, type assertions becoming casts, different behavior of triple equals, and such. Not taking a side :)

@SebastianSpeitel
Author

This is what I'm currently at:

- extensions: ['.ts']
  rules:
  - language: XML
    pattern: '<TS\b'
  - language: AssemblyScript
    and:
    - negative_pattern: 
      # Invalid types in AssemblyScript
      - '[^"`''.\w]undefined[^"`''.\w]'
      - '[^.\w]any[^.\w]'
      - '[^.\w]unknown[^.\w]'
      # No eval in AssemblyScript
      - '[^.\w]eval\s*\('
    - pattern:
      # Builtin decorators
      - '@inline'
      - '@final'
      - '@unmanaged'
      - '@operator'
      - '@global'
      - '@unsafe'
      - '@builtin'
      - '^// @ts-ignore: decorator$'
      # number types only available in AssemblyScript
      - '[^"`''.\w][iuf](8|16|32|64)[^"`''.\w]'
      - '[^"`''.\w][iu]size[^"`''.\w]'
  - language: TypeScript
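
For readers unfamiliar with how such a rule set is evaluated, here is a minimal sketch of the intended logic. It assumes that rules are tried in order, that `and` requires every sub-rule to hold, that a list under `pattern` succeeds if any entry matches, and that a list under `negative_pattern` succeeds only if no entry matches. That reading of Linguist's semantics is an assumption, and `classify` is a hypothetical helper, not part of Linguist. The patterns are transcribed into JavaScript regex syntax, with the `m` flag added where the YAML relies on line anchors.

```typescript
type Language = "XML" | "AssemblyScript" | "TypeScript";

// First rule: Qt translation files that happen to use the .ts extension.
const xmlPattern = /<TS\b/;

// Constructs assumed to be invalid in AssemblyScript; any hit vetoes the AS rule.
const negativePatterns: RegExp[] = [
  /[^"`'.\w]undefined[^"`'.\w]/,
  /[^.\w]any[^.\w]/,
  /[^.\w]unknown[^.\w]/,
  /[^.\w]eval\s*\(/,
];

// Constructs specific to AssemblyScript; at least one hit is required.
const positivePatterns: RegExp[] = [
  /@inline/,
  /@final/,
  /@unmanaged/,
  /@operator/,
  /@global/,
  /@unsafe/,
  /@builtin/,
  /^\/\/ @ts-ignore: decorator$/m,
  /[^"`'.\w][iuf](8|16|32|64)[^"`'.\w]/,
  /[^"`'.\w][iu]size[^"`'.\w]/,
];

// Try the rules in order and fall through to TypeScript as the default.
function classify(source: string): Language {
  if (xmlPattern.test(source)) return "XML";
  const vetoed = negativePatterns.some((p) => p.test(source));
  const hasMarker = positivePatterns.some((p) => p.test(source));
  if (!vetoed && hasMarker) return "AssemblyScript";
  return "TypeScript";
}

console.log(classify("export function add(a: i32, b: i32): i32 {\n  return a + b;\n}\n"));
console.log(classify("function f(x: any): number {\n  return x.i32;\n}\n"));
```

Under these assumptions, a single `any` or `eval(` anywhere in a file sends it to TypeScript, which trades false negatives for fewer TypeScript files misdetected as AssemblyScript.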

> Sorry, I'm not seeing anything that warrants recognition as a discrete language.

I think AssemblyScript counts as its own language, but I can see that if there is no way to safely distinguish it, it can't be added here.

Also, I think that in cases where a file in an AssemblyScript project is misclassified (which will surely happen) and is valid TypeScript, the classification is still correct. For me it would be enough to get the heuristics to a point where no TypeScript file is detected as AS, because that would probably confuse a lot of people, but false negatives in AS are not that big of a deal (for me).
If false negatives are a no-go here, I would have no problem with that.

@Alhadis
Collaborator

Alhadis commented Nov 11, 2021

> I think AssemblyScript counts as its own language

That's only because the project's name makes it sound like a language. 😉 Would you hold the same perspective if the project were named ASM.ts or ts2wasm?

> but I can see that if there is no way to safely distinguish it, it can't be added here.

You're right, I'm afraid. However, assembly script can still be added as an alias of TypeScript, which will give more meaningful results when searching by language, as well as be recognised in fenced code blocks in readme files:

``` assemblyscript
export function negate(n: i32): i32 {
	return -n;
}
```

``` typescript
export function negate(n: i32): i32 {
	return -n;
}
```

negative_pattern:
- '\s+undefined\s+'
- '\s+any\s+'
- '\s+unknown\s+'

@MaxGraey MaxGraey Nov 13, 2021

Btw I'm not sure about unknown. It is quite likely that it will still be used at least for variadic function declarations

Also some people in the AS community have implemented various forms of unknown, and it could conceivably be supported officially later once semantics are ironed out.

@SebastianSpeitel
Author

> However, assembly script can still be added as an alias of TypeScript, which will give more meaningful results when searching by language, as well as be recognised in fenced code blocks in readme files:

I'm not sure an alias of TypeScript would be correct, because we already have users who expect to be able to use any TS library in their AS project, which is sadly not the case. Letting them find mostly TS code when searching for AS wouldn't help with this.
But the recognition in code blocks would be amazing.

Would it be possible to add it without detection via heuristics, so that it has to be set via overrides?

@Alhadis
Collaborator

Alhadis commented Jan 23, 2022

Just to be clear: AssemblyScript is not a language. It's a compilation target for TypeScript with the same syntax, semantics, and compiler. Certain aspects of the runtime may differ (predefined types, a more restrictive standard library, etc), but runtimes aren't languages.

I noticed this PR was submitted in response to @comtechnet's issue at AssemblyScript/assemblyscript#2127:

Github does NOT show/list AssemblyScript or WASM - so NO ONE knows I have blockchain AS or WASM experience #2127

> That way folks looking at my repo and others can tell right away that we have NEAR blockchain and AssemblyScript smartcontract programming experience.

The most we could do is add AssemblyScript as an alias of TypeScript, for the reasons I outlined earlier. However, this will not show up in repository listings as a project's language; i.e., here:

Figure 1

Long story short, GitHub doesn't support user-defined languages (including arbitrary labels or descriptions). Instead, @comtechnet (and any other AssemblyScript users) are encouraged to use topics to improve "discoverability". Just tag your project with #assemblyscript.

@Alhadis Alhadis closed this Jan 23, 2022
@MaxGraey

> Just to be clear: AssemblyScript is not a language. It's a compilation target for TypeScript with the same syntax, semantics, and compiler.

That's totally false. AssemblyScript is a TypeScript-like language and a compiler to WebAssembly, which is the compilation target. It has its own syntax subset, which is close to TS but has some limitations and extensions (like global function decorators, operator overloads, and native types like i64/u64/i8/u16, which aren't supported by TS). It also has its own parser, compiler, and emitter. The syntax is similar, but the semantics are sometimes totally different.

@Alhadis
Collaborator

Alhadis commented Jan 23, 2022

@MaxGraey A compiler and its output are completely independent of its input. gcc(1), for example, can compile at least 9 different languages:

> -x language
> Specify explicitly the language for the following input files (rather than letting the compiler choose a default based on the file name suffix). This option applies to all following input files until the next -x option. Possible values for language are: c, c++, objective-c, objective-c++, assembler, ada, d, f77, go.

It can output binaries or shared objects for a variety of architectures, which serves as the analogue of a compiler targeting JavaScript or WASM as its output.

> It has its own syntax subset, which is close to TS but has some limitations and extensions (like global function decorators, operator overloads, and native types like i64/u64/i8/u16, which aren't supported by TS).

Yes, that ties in with what I said about runtime details:

> Certain aspects of the runtime may differ (predefined types, a more restrictive standard library, etc), but runtimes aren't languages.

@MaxGraey

> A compiler and its output are completely independent of its input

Yes, but how does that relate to the claim that TS and AS are the same? Every language should have at least one compiler. Some compilers, like Clang or GCC, can compile both C and C++. But I don't really see how this cancels out the fact that AS and TS have pretty serious differences in semantics.

@MaxGraey

MaxGraey commented Jan 23, 2022

For example, Flow and TS have many more similarities with each other: similar types, syntax, and compilation target (JS), and the same extension (.js). Yet their syntaxes are distinguished in Linguist at some points: https://github.com/github/linguist/blob/4fc6808d99980a88465861217db6b74cd0d639d3/lib/linguist/vendor.yml#L229

And here:
#4515

@Alhadis
Collaborator

Alhadis commented Jan 23, 2022

When I say "semantics", I'm talking about the exact behaviour and logic of language-level elements: classes, functions, variables, expressions, scoping, etc. Fundamental elements of JavaScript, and by extension, TypeScript.

> Yet their syntaxes are distinguished in Linguist at some points:

No, we're not distinguishing syntax here. vendor.yml lists filename patterns that are commonly used by vendored code, and are to be excluded from language stats by default.

@MaxGraey

I also meant that cancelled PR #4515

@Alhadis
Collaborator

Alhadis commented Jan 23, 2022

That's a different language of the same name.

@MaxGraey

Ah, you're right!

@munrocket

I agree that it's a nightmare to detect. Moreover, as a programmer I prefer a different extension for AssemblyScript in my personal projects. Here are some issues about supporting a different extension:

  1. Consider a file extension other than .ts AssemblyScript/assemblyscript#1003
  2. Support for .as extension as-pect/as-pect#356
  3. add .as support surma/rollup-plugin-assemblyscript#3

AssemblyScript is still a pretty young language. It follows the WebAssembly specification and has implemented things like operator overloading, SIMD, reference types, its own standard library, and other interfaces that will never be in TypeScript. It's also incompatible with TypeScript, and it's better to write a new project from scratch than to adapt an existing one. WebAssembly is growing and the JavaScript community is huge. Here are statistics from Google Trends, NPM downloads over 5 years, and a video from GitHub itself about it: https://www.youtube.com/watch?v=97ej9-CE3Gc

Will you add it to GitHub if we change the extension to a unique one? @Alhadis, please give it a try! I think we can make this decision, and it's finally the right time to change the extension (the community used .ts just for linting, LOL).

@romdotdog

romdotdog commented Jan 25, 2022

As an AssemblyScript contributor, changing the extension and thus having to replicate the hundreds of errors that TypeScript lints for in the AssemblyScript compiler would be both costly performance-wise and would bloat WASM bootstrapping sizes. I don't intend to hijack this discussion and turn this into a pros-and-cons-of-changing-the-extension discussion so I'd be happy to voice and reiterate my stance on this in the Discord.

@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
8 participants