Way to split code into multiple files #40

Closed
oxinabox opened this issue Dec 19, 2019 · 7 comments

@oxinabox
Contributor

Per a brief discussion at NeurIPS:
sometimes input is not really IO,
in that it is deterministic, e.g. the loading of data.
It is then just code in another format, in another file.
This kind of input is thus more closely linked to metaprogramming than to normal IO.

But right now, AFAIK, we can't actually load Dex code from another file.

I propose adding `:include file/path.dx`
as a command allowed to occur at top level.

Then I will do some metaprogramming in some other language (obviously Julia) in order to generate some Dex code that contains my data,
which I will `:include` at the top of my script.

I don't want to put it directly in my script, as it's probably going to be thousands of lines long.
Also, I might want to regenerate it.
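
A minimal sketch of the proposed semantics, assuming `:include` simply splices the named file's text in place of the directive line (a near-direct text transfer) and that relative paths resolve against the including file's directory. It is written in Python purely for illustration; `expand_includes` is a hypothetical name, not part of Dex.

```python
import os
import re

# Assumed directive syntax: a top-level line of the form ":include some/path.dx".
INCLUDE_RE = re.compile(r"^:include\s+(\S+)\s*$")


def expand_includes(path, _stack=()):
    """Return the text of `path` with every `:include` line replaced by the
    (recursively expanded) text of the named file.

    Relative include paths are resolved against the directory of the
    including file (an assumption). Raises ValueError on an include cycle.
    """
    path = os.path.abspath(path)
    if path in _stack:
        raise ValueError("include cycle: " + " -> ".join(_stack + (path,)))
    with open(path) as f:
        lines = f.read().splitlines(keepends=True)
    out = []
    for line in lines:
        m = INCLUDE_RE.match(line)
        if m:
            target = os.path.join(os.path.dirname(path), m.group(1))
            text = expand_includes(target, _stack + (path,))
            # Keep the next line from being glued onto the included text.
            if text and not text.endswith("\n"):
                text += "\n"
            out.append(text)
        else:
            out.append(line)
    return "".join(out)
```

A file that includes itself, directly or indirectly, raises `ValueError` instead of recursing forever.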

A possible generalisation of this would be `includeby :: (String -> AST) -> Path -> Nothing`,
which would take in a function to do the metaprogramming.
But I don't think we are anywhere near there yet?
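
A rough sketch of what `includeby` could look like, with every piece hypothetical: the AST is modelled as a list of `Decl(name, value)` nodes, the top-level environment is passed explicitly as a dict (standing in for Dex's implicit top-level scope), and `csv_frontend` is an example user-supplied `String -> AST` function. Returning `None` mirrors the `Nothing` in the proposed signature.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Decl:
    """Hypothetical stand-in for a top-level declaration AST node."""
    name: str
    value: object


def csv_frontend(text: str) -> List[Decl]:
    """Example String -> AST function: each CSV column becomes a
    declaration binding the column name to its values as floats."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    return [Decl(name, [float(r[i]) for r in body]) for i, name in enumerate(header)]


def include_by(frontend: Callable[[str], List[Decl]], path: str,
               env: Dict[str, object]) -> None:
    """Read `path`, hand its text to `frontend`, and splice the resulting
    declarations into the top-level environment `env`."""
    with open(path) as f:
        for decl in frontend(f.read()):
            env[decl.name] = decl.value
```

The point of the design is that the host language never needs a built-in CSV (or any other) reader: the metaprogramming lives in the user-supplied frontend.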

@dougalm
Collaborator

dougalm commented Jan 1, 2020

Yes, we definitely want an import system. I think I'll just follow Haskell here, starting with an unqualified `import Foo` that gives you access to the whole top-level namespace of `foo.dx`.

> Then I will do some metaprogramming in some other language (obs. Julia), in order to generate some Dex code that contains my data.

That's a good place to start for now, but I think we'd ideally have a dedicated serialization format (or two) rather than actually executing Dex code containing mostly literals. You'd still load it as if you were importing a module, but our implementation would just need to parse it and put the data in memory, rather than compile and execute it. I'm imagining two formats: textual and binary. The textual one might be a subset of Dex syntax (like JSON is a subset of JavaScript). The binary one could be close to Dex's internal runtime data structures, so that it could just be memmapped.

@oxinabox
Contributor Author

oxinabox commented Jan 1, 2020

Some time this week, I intend to write a few slides explaining why separating files from namespaces, and import (accessing things from a namespace) from include (near-direct text transfer), is, against common wisdom, a good thing.
Title of that part of the talk: "namespaces are overrated, let's have less of them".
That is not the whole argument, at all,
but I have yet to write it.
Other key points include:

  • Making it easy to make and use local packages, as opposed to full-on packages managed by a package manager, means people don't take that jump, and thus neither release things nor benefit from dependency management.
  • Overly long files, or overly empty namespaces.

In short: I think we should have an include right now, not an import.

I think a binary format would be good.
When I had a mere 200 examples from Fashion-MNIST as text constants, it was taking ages to reload in web.
Though an include would help there, as it could avoid reparsing to check for changes, I guess.

@dougalm
Collaborator

dougalm commented Jan 3, 2020

Implementing include is also quite a bit simpler than import, and we can use ordinary Unix file paths (`foo/bar.dx`) rather than inventing our own name-resolution mechanism (`Foo.Bar`). You've convinced me! Commit 3087694 still has a few rough edges, but it's a start.

This still isn't a good way to load data, since compiling huge literals will be very slow (several LLVM instructions per scalar). I'll keep this issue open until we have a dedicated text or binary serialization solution.

@oxinabox changed the title from "Way to split code I to multiple files" to "Way to split code into multiple files" on Jan 7, 2020
@oxinabox
Contributor Author

oxinabox commented Jan 7, 2020

A change also needs to be made to WebOutput.hs,
so that it knows to watch for changes in any `:include`d files.
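
A sketch of the logic such a change might need, written in Python for illustration rather than taken from WebOutput.hs: collect the transitive closure of `:include`d files, snapshot their modification times, and reload when any of them changes, appears, or disappears. It scans only for directive lines rather than running the full parser. The `:include` line syntax, relative-path resolution, and the function names are assumptions.

```python
import os
import re

# Assumed directive syntax: a top-level line of the form ":include some/path.dx".
INCLUDE_RE = re.compile(r"^:include\s+(\S+)\s*$")


def included_files(path, seen=None):
    """Return the set of absolute paths of `path` and every file it
    (transitively) includes. Each file is visited once, so cycles terminate."""
    if seen is None:
        seen = set()
    path = os.path.abspath(path)
    if path in seen:
        return seen
    seen.add(path)
    with open(path) as f:
        for line in f:
            m = INCLUDE_RE.match(line)
            if m:
                included_files(os.path.join(os.path.dirname(path), m.group(1)), seen)
    return seen


def snapshot(path):
    """Map every file in the include closure of `path` to its mtime (ns)."""
    return {p: os.stat(p).st_mtime_ns for p in included_files(path)}


def needs_reload(path, old_snapshot):
    """True if any watched file changed, appeared, or disappeared."""
    try:
        return snapshot(path) != old_snapshot
    except FileNotFoundError:
        # An included file was deleted; treat that as a change.
        return True
```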

dougalm added a commit that referenced this issue Jan 8, 2020
This means we can efficiently(ish) load "dex object" files and write them to
memory directly rather than compiling them as huge LLVM literals (see #40). It's
still not very fast: loading a length-100k vector of integers takes 4 seconds.
It doesn't hit LLVM, but it still uses the general program parser and type
inference. We can probably make it faster with literal-specific ones.
@dougalm
Collaborator

dougalm commented Jan 10, 2020

I just made a little binary data format (0c7f0fa to 70742cb). It's a mmappable format, so it may also help us build a zero-copy FFI. There's a Python module to dump NumPy arrays into it. I'd welcome a Julia one ;).
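
To illustrate why an mmappable layout helps, here is a minimal sketch of such a format using only the Python standard library. It is not the format from those commits: the magic bytes `DXSK`, the 16-byte header, and the names `dump`/`load` are all hypothetical. The data section starts at an 8-byte-aligned offset and is stored in native byte order, so `load` can expose it as a typed, shaped `memoryview` over the mapped pages without copying.

```python
import array
import mmap
import struct

MAGIC = b"DXSK"  # hypothetical magic bytes, not Dex's real format
# magic (4 bytes), dtype code (1 byte), 3 pad bytes, ndim (uint64) -> 16 bytes
HEADER = struct.Struct("=4scxxxQ")


def dump(path, values, shape, code="d"):
    """Write flat, row-major `values` with the given shape (at least 1-D).
    `code` is an array/struct type code, e.g. 'd' (float64) or 'q' (int64)."""
    n = 1
    for s in shape:
        n *= s
    if not shape or n != len(values):
        raise ValueError("shape does not match number of values")
    data = array.array(code, values)  # native byte order
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, code.encode("ascii"), len(shape)))
        f.write(struct.pack("=%dQ" % len(shape), *shape))
        # Header is 16 + 8 * ndim bytes, so the data is 8-byte aligned.
        f.write(data.tobytes())


def load(path):
    """Map the file read-only and return a zero-copy, shaped memoryview."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    buf = memoryview(mm)  # keeps `mm` alive as long as the view exists
    magic, code, ndim = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a file in this sketch format")
    shape = struct.unpack_from("=%dQ" % ndim, buf, HEADER.size)
    offset = HEADER.size + 8 * ndim
    return buf[offset:].cast(code.decode("ascii"), shape=list(shape))
```

For example, after `dump(path, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], (2, 3))`, `load(path).tolist()` gives `[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]`. Because the data is native-endian, such files are not portable between little- and big-endian machines.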

@oxinabox
Contributor Author

> I'd welcome a Julia one ;).

Seems fun.

@dougalm
Collaborator

dougalm commented Feb 26, 2020

I think load and include are good enough for now. Closing.
