Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how much work needed to make it work at compile time? #70

Closed
timotheecour opened this issue Jun 28, 2019 · 19 comments
Closed

how much work needed to make it work at compile time? #70

timotheecour opened this issue Jun 28, 2019 · 19 comments

Comments

@timotheecour
Copy link
Contributor

after applying the patch from #69 (comment) to make it work with latest nim 0.20 (which does work!), I'm trying to make it work at compile time; currently it gives compiler errors for the basic examples in https://nimyaml.org/

newFileStream doesn't work at CT so I tried this as well:

  let s1 = staticRead(file)
  let s = newStringStream(s1)
  var personList: seq[Person]
  load(s, personList)

which gives several errors eg:

private/lex.nim(1153, 11) Error: cannot generate code for: mGCref

/yaml/private/lex.nim(1157, 37) Error: VM is only allowed to 'cast' between integers and/or floats of same size
      result[] = YamlLexerObj(source: cast[pointer](blSource), inFlow: false,

Has anyone succeeded in making it work at CT (even with a more limited set of yaml features would be a start)?

@flyx
Copy link
Owner

flyx commented Jul 6, 2019

The lexer can handle different sources (strings, streams) with the SourceProvider concept. To avoid exporting the actual source implementation via generic parameter, the source is stored as pointer and unchecked casts are used to retrieve it. The Lexer stores function pointers to the functions doing the correct casts for the current source.

Since the VM cannot cast, this code needs to be refactored in some way. The idea of SourceProvider was to avoid dispatching calls for retrieving each character as this will slow down the Lexer. So a good solution will not employ methods.

However, there is no guarantee that the rest of the code will work at compile-time.

@sealmove
Copy link

This feature is crucial for nimitai so I plan to tackle it soon.

@flyx
Copy link
Owner

flyx commented Oct 13, 2020

@sealmove There are two other issues related to the current state of the lexer, #85 (also some more information available there) and several failures to comply to the YAML specification which are not documented as GitHub issue but available in this list of skipped tests (the tests come from the official YAML test suite).

The best way to go ahead is to rewrite the Lexer properly. I basically know how to do it since I wrote AdaYaml after NimYAML and avoided all these issues there. Since this and the other issue seem to be important, I will try and accomplish a rewrite till the end of the month, to avoid putting more effort into code that cannot be fixed to solve all its issues without rewriting it (a change is needed in how the input is tokenized).

Would this work for you? I can't really guarantee that a Lexer rewrite will completely fix this issue but you can take it up from there.

@sealmove
Copy link

@flyx Sorry I didn't notice your reply until now. This is awesome news!

Would this work for you

Yes, absolutely. Please notify me when the new lexer is ready.

@flyx
Copy link
Owner

flyx commented Nov 6, 2020

@sealmove Status update:

The rewrite is mostly working (as seen in the branch lexer-rewrite) however the lexer & parser APIs changed and I need to adjust the serialization code (the high level API, i.e. load, dump etc, will be unchanged).

Compile time execution is currently blocked by the global usage of serializationTagLibrary in some places. I will probably go forward and remove the type TagId, instead returning tags directly as ref string or something – this will remove the need for those global references.

A quick downgrading edit shows that NimYAML indeed works at compiletime when this obstacle is removed.

Work on the branch should be finished soon™.

@flyx
Copy link
Owner

flyx commented Nov 10, 2020

The lexer rewrite has now landed in the devel branch.

The following code using NimYAML at compile time now compiles and yields the expected result:

import yaml

type
  Person = object
    firstnamechar: char
    surname: string
    age: int32

setTagUri(Person, "!person")

const persons = loadAs[seq[Person]]("""
- {firstnamechar: K, surname: Koch, age: 23}
- {firstnamechar: S, surname: McDuck, age: 150}""")

for p in persons:
  echo p

However, latest Nim head is required for this to work. Nim 1.4.0 fails to do it because `==` on proc variables appears to be broken there.

Currently not working are ref values of any kind. The reason for this is that those values are being cast to pointer when loading, which is used for storing them as target for aliases. I expect that it is possible to support ref values at compile time by disabling this feature, so that you cannot load cyclic structures. I currently do not plan to implement this as it seems to be a rather exotic feature – if this is important for you, please open a separate issue for it.

I invite everyone taking interest in this issue to test their use-case with NimYAML's current devel branch and give feedback if something is not working as expected. I don't plan to thoroughly test this myself as I don't expect discrepancies from runtime operation seeing that the simple example given above works.

@flyx flyx closed this as completed Nov 10, 2020
@timotheecour
Copy link
Contributor Author

timotheecour commented Nov 10, 2020

@flyx this snippet gives:
NimYAML/yaml/serialization.nim(407, 18) Error: attempting to call undeclared routine: 'getGMTime'

EDIT: looks like i was on master, not devel; why is the main branch called devel instead of master? this goes against virtually all github repos and nimble packages (with the only exception of nim repo itself) ?

(this aside, this is great news, thanks!)

@timotheecour
Copy link
Contributor Author

Currently not working are ref values of any kind. The reason for this is that those values are being cast to pointer when loading, which is used for storing them as target for aliases

I don't understand this part; it prevents a lot of usage at CT even with nim-lang/Nim#15528; why are pointer needed for that?

@flyx
Copy link
Owner

flyx commented Nov 11, 2020

why is the main branch called devel instead of master

The master branch is for releases; a checkout of master will give you the latest release. This is common workflow for larger projects although I admit that most single-dev repos don't use it; it's just my preferred workflow. It should not really matter since devel is the HEAD branch and therefore will be the one checked out when you clone NimYAML or install it with nimble's yaml#head.

I don't understand this part; it prevents a lot of usage at CT even with nim-lang/Nim#15528; why are pointer needed for that?

To resolve aliases. For example:

import yaml

type
  Node = ref object
    name: string
    children: seq[Node]

let input = """
name: root
children:
- &a
  name: A
  children: []
- name: B 
  children: [*a]
"""

let root = loadAs[Node](input)

echo root.children[0].name             # will be A
echo root.children[1].children[0].name # will be the same A

NimYAML will create multiple references to ref values from aliases in the input, as seen here. To facilitate that, it must keep a hashmap of all objects created from YAML nodes with an anchor. Since anchored nodes could be of multiple different types, they are cast to pointer (only ref values are allowed to have an anchor so this always works) for storing them in the hashmap. A type identifier is stored along with the pointer to ensure that the pointer is only used with the correct type.

I am not sure what the alternative to this would be. I guess it would be possible to have one hashmap per type but that would involve lots of macro magic to generate the necessary amount of hashmaps based on the type we deserialize to (it would need to collect all ref types that are part of the root type). I would rather not add that much complexity as maintaining this library is already hard enough as it is.

In OOP languages like Java I would use the base Object type for this but in Nim not every type derives from RootObj and for all I know, upcasting/downcasting is not possible to the extend needed for this.

Perhaps anchors could simply be forbidden during compile time, but I'm not entirely sure how to do that. Is there some when defined(compiletime)? I wonder whether this would help anyone, as using ref types kind-of implies that you want to be able to reference an object multiple times.

I am no active Nim user apart from maintaining this library so there may be options I am not familiar with. If you know one, please tell me.

@sealmove
Copy link

Thanks for this work!

Is there some when defined(compiletime)

Fyi yes, there is.

@timotheecour
Copy link
Contributor Author

timotheecour commented Nov 11, 2020

At least nimyaml should allow to deal with ref types that don't involve multiple references to the same object, that's a very common use case.
For multiple references, I don't see why it wouldn't work in VM, but at least we can deal with that later (note, a difficulty would be if user wants to convert a VM value involving multiple references to same object to a RT value, as nim-lang/Nim#15528 currently doesn't handle those correctly but that's a separate issue that can be dealt with later)

Is there some when defined(compiletime)

when nimvm as mentioned above (see usage restriction in docs)

I wonder whether this would help anyone, as using ref types kind-of implies that you want to be able to reference an object multiple times

I disagree with that; ref could be in the type declaration (eg to allow multiple references at run time) but not in the serialization string (ie no multiple references to same object mentioned in serialization string); it's common

To facilitate that, it must keep a hashmap of all objects created from YAML nodes with an anchor
Since anchored nodes could be of multiple different types, they are cast to pointer

why don't you use int instead of pointer? eg:

when true:
  type Foo = ref object
    x: int
  proc main()=
    let f1 = Foo(x: 10)
    let b1 = cast[int](f1)
  static: main()
  main()

the address (via cast[int]) should be enough to tell whether an object is already in HashMap.

EDIT: maybe the problem is you need the reverse direction as well, ie: cast[Foo](b1) where b1: int ? if so, that doesn't work yet, but I can try to make a PR in nim to support that (I had previously added nim-lang/Nim#12877 but it doesnt' support int => ref)

@flyx
Copy link
Owner

flyx commented Nov 11, 2020

why don't you use int instead of pointer?

pointer seemed to be the more fitting type. Ultimately the only thing that matters is that the target type has the correct size, so if using int avoids any problems I'm unaware of, I can use that instead.

maybe the problem is you need the reverse direction as well

Yes, without the reverse cast, it would not be possible to put the value together.

but I can try to make a PR in nim to support that

That would certainly be the best solution, compared with when nimvm which would just be a functionality-degrading workaround. I'd be grateful if you could do that.

@timotheecour
Copy link
Contributor Author

timotheecour commented Nov 13, 2020

when true:
  import yaml
  type
    Person = ref object
      firstnamechar: char
      surname: string
      age: int32
  proc `$`(a: Person): string = $a[]
  setTagUri(Person, "!person")
  proc fn =
    let persons = loadAs[seq[Person]]("""
      - {firstnamechar: K, surname: Koch, age: 23}
      - {firstnamechar: S, surname: McDuck, age: 150}""")
    echo persons
  static: fn()
  fn()

CT:
@[(firstnamechar: 'K', surname: "Koch", age: 23), (firstnamechar: 'S', surname: "McDuck", age: 150)]
RT:
@[(firstnamechar: 'K', surname: "Koch", age: 23), (firstnamechar: 'S', surname: "McDuck", age: 150)]

@flyx
Copy link
Owner

flyx commented Nov 14, 2020

Great work! Probably better test code is the one with nodes which actually uses anchors & aliases – the code you posted only assures the cast compiles but never uses it:

import yaml

type
  Node = ref object
    name: string
    children: seq[Node]

static:
  let input = """
name: root
children:
- &a
  name: A
  children: []
- name: B
  children: [*a]
"""

  let root = loadAs[Node](input)

  echo root.children[0].name
  echo root.children[1].children[0].name

@timotheecour
Copy link
Contributor Author

timotheecour commented Nov 14, 2020

can we re-open this issue until support at CT is reasonable (doesn't have to be complete for that)?

your snippet doesn't work, but neither does the following that doesn't contain anchors:

when true:
  import yaml
  type
    Node = ref object
      name: string
      children: seq[Node]

  proc main =
    let input = """
  name: root
  children:
  - name: A
    children: []
  """
    let root = loadAs[Node](input)
    echo root.children[0].name
  static: main()
  main()

gives:

/Users/timothee/git_clone/nim/timn/tests/nim/all/t11261.nim(125, 15) t11261
/Users/timothee/git_clone/nim/timn/tests/nim/all/t11261.nim(123, 28) main
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1382, 7) loadAs
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1365, 14) load
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1344, 5) construct
/Users/timothee/git_clone/nim/timn/tests/nim/all/t11261.nim(6, 1) template/generic instantiation from here
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1344, 5) Error: unhandled exception: Expected field name, got yamlStartMap [YamlConstructionError]
      raise (ref YamlConstructionError)(getCurrentException())

with #89 it gives:
with -d:yamlDebug:

yamlDebug: parser: push atStreamStart, indent = -2
yamlDebug: parser: transition atStreamEnd
yamlDebug: parser: push beforeDoc, indent = -1
yamlDebug: lexer -> [1,1-1,3] Indentation
yamlDebug: parser: transition beforeDocEnd
yamlDebug: parser: push beforeImplicitRoot, indent = -1
yamlDebug: parser: update indent = -1
yamlDebug: lexer -> [1,3-1,7] Plain
yamlDebug: parser: transition atBlockIndentationProps
yamlDebug: parser: update indent = 2
yamlDebug: lexer -> [1,7-1,8] MapValueInd
yamlDebug: parser: transition afterImplicitKey
yamlDebug: parser: push emitCached
yamlDebug: emitCollection key: pos = 0, len = 1
yamlDebug: parser: pop

upon further investigating it looks like yamlStartMap is being produced twice, hence the error; but I'm lacking context of the inner workings of your library to figure out the problem

edit: see #91 for further analysis on this particular issue

  • it's really hard to debug this (requires knowledge of the inner workings of your library), what would really help is if you could minimize this down to something that doesn't use any yaml code, ie pointing to a nim bug (ie: works at RT, not at CT).

  • another issue is that nimyaml code keeps reusing the following pattern which is bad:
    try: complexCode except: raise someException
    this is bad because all the context is lost. At least you should populate the parent exception

  • except Exception: is bad (eg, in proc next*(s: YamlStream)) ; it means it'll catch even defects like AssertionDefect

@timotheecour
Copy link
Contributor Author

timotheecour commented Nov 14, 2020

@flyx

Probably better test code is the one with nodes which actually uses anchors & aliases

I've improved the nim PR and now cyclic data (nimYAML anchors) do work, see #92

the above mentioned problem remains (also referenced in #91)

@flyx
Copy link
Owner

flyx commented Nov 14, 2020

Fair enough. I'll reopen the issue.

The error in your code is unrelated to ref types. Here's an edit that only calls the parser:

when true:
  import yaml

  proc main =
    let input = """
  name: root
  children:
  - name: A
    children: []
  """
    let p = initYamlParser(serializationTagLibrary)
    let s = p.parse(input)
    try:
      while true:
        let e = s.next()
        echo e
        if e.kind == yamlEndStream: break
    except:
      echo repr(getCurrentException())
  static: main()
  main()

This runs into an exception at compile time but works at runtime. I'll investigate to pinpoint what goes wrong.

@flyx flyx reopened this Nov 14, 2020
@timotheecour
Copy link
Contributor Author

timotheecour commented Nov 14, 2020

This runs into an exception at compile time

can you show it?

your snippet works for me, no exception raised, but prints different things at RT and CT:

at CT:
+STR
+DOC
+MAP
+MAP
=VAL :root
=VAL :children
+SEQ
+MAP
+MAP
=VAL :A
=VAL :children
+SEQ
-SEQ
-MAP
-SEQ
-MAP
-DOC
-STR

at RT:
+STR
+DOC
+MAP
=VAL :name
=VAL :root
=VAL :children
+SEQ
+MAP
=VAL :name
=VAL :A
=VAL :children
+SEQ
-SEQ
-MAP
-SEQ
-MAP
-DOC
-STR

@flyx flyx mentioned this issue Nov 14, 2020
@flyx
Copy link
Owner

flyx commented Aug 30, 2023

With Nim 2.0.0 and current devel, RT and CT both work for the snippet I posted above. That leaves the ref-to-ptr-cast error the only known remaining issue for CT.

@flyx flyx closed this as completed in f60725f Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants