WD-40 (for reducing Rust with the next mainstream language) #35

shelby3 opened this Issue Jul 17, 2017 · 291 comments

shelby3 commented Jul 17, 2017

I propose a (rough sketch of a) language design that I posit, while attempting to retain the most significant opportunities for performance (e.g. monomorphisation), would have better tradeoffs than Rust in the areas of complexity bloat, design freedom, and the benefits versus the inflexibility/tsuris of static code analysis. It would have the HLL feel of JavaScript with the LLL control of C++, without all the complexity and unnecessary abstractions (e.g. not the anti-pattern of OOP subclassing, and not shared_ptr as the default over GC).
GC Exclusion Axiom:
Afaics, presuming (enforcement of block-scope lifetimes and) that all function inputs (mutable or immutable, because we don’t care in a single-threaded model) are borrows—i.e. they don’t take lifetime ownership (note callbacks do take lifetime ownership)—then GC is not required for provable use-after-free (aka dangling reference/pointer) safety. However, for dynamic data structures that share references (e.g. directed graphs with in-degree > 1), GC references are much more convenient and performant than the alternative of representing the references with array indices. Note that functions which would otherwise take ownership can instead take ownership of a (GC-referenced or unboxed¹) deep copy/clone. However, where deep cloning would be too costly in terms of performance and memory consumption, shared GC references may be preferred. Where GC is employed we have protection against memory leaks, which Rust does not provide except via Rc<T>, an inferior (albeit prompt/deterministic) reference counting smart pointer paradigm.

Afaics, there are some marvelous simplifying implications of this axiom:

  1. Presuming all functions adhere, except where they declare references to be GC, afaics the compiler can automatically implement RAII (in try-finally blocks², with an equivalent finally concept for Promise callbacks) for deterministic memory deallocation, and automatically deterministically destruct instances of a type that has a destructor¹—a major feature lacking from GC languages, and Python’s with is inferior because it is manual and the object isn’t deallocated deterministically. (See the try-finally sketch after this list.) Note that such types can not be used with GC references, directly or indirectly, as a member of a data structure that is GC referenced.

  2. Afaics, the same code will be sound and applicable whether the data structures are implemented by the compiler unboxed without GC or boxed with GC. In other words, other than the aforementioned explicit GC references for algorithmic convenience and performance, the programmer never needs to indicate whether the data structures should be boxed or unboxed³. I want to strip away the noise/complexity of Ref<>, Box<>, etc. For read-only inputs (i.e. the unannotated default) the programmer need not even think about whether the inputs are cloned-by-value (i.e. a primitive type) or passed-by-reference; and cloned-by-value inputs should never be allowed to be annotated as mutable on the function signature (although they can be mutated inside the function body unless explicitly annotated read-only). The point is that the programmer never needs to think about whether the type is primitive or not, and can see whether there is a mutability implication by looking at the function signature. Although I presume an optimizing compiler will always strive for unboxed (because this will be more performant), if we start writing code in say TypeScript we can strive to adhere to this pattern (or have our compiler generate TypeScript) without all the unboxing mess if we so desire.

  3. Goodbye to the noise/tsuris/complexity of Rust’s lifetime parameters and its 3 kinds of pointers; and due to the proposed single-threaded concurrency event model, goodbye also to Rust’s mutability borrowing restrictions/complexity/tsuris.

  4. When thinking about interfacing with ASM.js (or WebAssembly aka WASM) with minimum (or no) marshaling overhead: since there is not yet any way for those to interact directly with GC references (which IMO is a major paradigmatic flaw, ostensibly because without static typing JavaScript treats these as unbounded polymorphic, i.e. Any aka Object), we can (until some future where we write our own VM) favor unboxed data structures and malloc these via Emscripten (or a similar transpilation of our own C-like language features to ASM.js/WASM).
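
To make point 1 concrete, here is a rough TypeScript sketch (all names hypothetical) of the try-finally code a compiler could emit for a block-scoped resource, plus the Promise finally equivalent:

```typescript
// Hypothetical resource type with a destructor-like `dispose` method.
class FileHandle {
  constructor(private path: string) { /* acquire OS handle here */ }
  read(): string { return `contents of ${this.path}`; }
  dispose(): void { /* release OS handle deterministically */ }
}

// What the programmer would write in the proposed language:
//   let f = FileHandle("log.txt")
//   process(f.read())
// What the compiler would emit (deterministic destruction at block exit):
function example(): void {
  const f = new FileHandle("log.txt");
  try {
    console.log(f.read());
  } finally {
    f.dispose(); // inserted automatically at the end of the block scope
  }
}

// Promise-based equivalent: `finally` gives the same determinism for async flows.
async function exampleAsync(): Promise<void> {
  const f = new FileHandle("log.txt");
  await Promise.resolve(f.read()).then(console.log).finally(() => f.dispose());
}

example();
```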

¹ But not cloning resources such as file handles.

² And Rust’s error handling has no try (panic! doesn’t throw an error up the call stack hierarchy) and instead forces all exceptions to be bubbled up as Result<T, E> return values, which is excessive boilerplate noise/tsuris.

³ Even arrays of data structures that have variable-sized members such as String can be cleverly implemented (by a library or the compiler) to use byte-offset indices into a second array which contains the variable-sized values. This can even work with mutable variable-sized values, by creating a new value at the end of the second array, growing the array as necessary and/or implementing a malloc and free on the second array; i.e. this second-array heap is freed from the parent heap as a monolith. Note that if the second array is encapsulated so it can never be referenced directly, then it can be repositioned in virtual address space as needed to minimize virtual address space fragmentation. Monomorphisation could make such libraries more efficient.
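
As a rough illustration of footnote ³, here is a TypeScript sketch (illustrative names, not a proposed library API) where fixed-size records store byte offsets into a second growable byte array holding the variable-sized string data:

```typescript
// Fixed-size records hold (offset, length) pairs into a separate byte heap.
class StringColumn {
  private offsets: number[] = [];       // per-record byte offset into `bytes`
  private lengths: number[] = [];       // per-record byte length
  private bytes = new Uint8Array(1024); // the "second array" heap
  private used = 0;

  set(index: number, value: string): void {
    const encoded = new TextEncoder().encode(value);
    // Mutation allocates a fresh slot at the end; the old slot is garbage
    // until this encapsulated heap is compacted or freed as a monolith.
    while (this.used + encoded.length > this.bytes.length) this.grow();
    this.bytes.set(encoded, this.used);
    this.offsets[index] = this.used;
    this.lengths[index] = encoded.length;
    this.used += encoded.length;
  }

  get(index: number): string {
    const start = this.offsets[index];
    return new TextDecoder().decode(this.bytes.subarray(start, start + this.lengths[index]));
  }

  private grow(): void {
    const bigger = new Uint8Array(this.bytes.length * 2);
    bigger.set(this.bytes); // repositioning is safe: `bytes` is never referenced directly
    this.bytes = bigger;
  }
}

const col = new StringColumn();
col.set(0, "hello");
col.set(0, "a longer mutated value");
console.log(col.get(0));
```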

Note bounded polymorphism never absolutely requires boxing. Unbounded polymorphism would, but we are refusing to support Any; i.e. all unbounded polymorphism in modules must be bounded when consumed by compiled/linked code.

Subsequent discussion in this thread will explain further the rough sketch so that readers can gain clarity. Let’s see what caveats and flaws are revealed via discussion. I look forward to @keean’s feedback and perhaps he will explain to what degree it corresponds to some features of the Ada PL.

I do not claim any of these ideas are novel inventions. @keean has alluded to for example cloning as an alternative to lifetimes and/or GC. What I am doing above is making a more holistic presentation of various ideas that have been floating around.

Of course typeclasses or ML functor/modules on top of this for HLL abstractions. If typeclasses, there is an option to go with implicit function arguments instead of requires, which makes it more readily implementable in TypeScript.
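
As an illustration of that last point, a minimal TypeScript sketch of the dictionary-passing encoding a typeclass desugars to (what an implicit requires-style argument would pass behind the scenes; names are illustrative):

```typescript
// The typeclass as an interface (a dictionary of methods).
interface Show<A> {
  show(a: A): string;
}

// Instances are plain dictionary values.
const showNumber: Show<number> = { show: n => n.toString() };
const showArray = <A>(inner: Show<A>): Show<A[]> =>
  ({ show: xs => "[" + xs.map(x => inner.show(x)).join(", ") + "]" });

// A generic function: the proposed language would pass `dict` implicitly
// (like Scala implicits); in plain TypeScript we pass it explicitly.
function describe<A>(dict: Show<A>, value: A): string {
  return "value = " + dict.show(value);
}

console.log(describe(showNumber, 42));
console.log(describe(showArray(showNumber), [1, 2, 3]));
```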

@shelby3 wrote:

My expectation is that in Rust the programmer will often need to resort to unsafe code to obtain the necessary freedom. My overriding concern is the K.I.S.S. principle and the amount of the complexity budget consumed for the benefits attained.

@shelby3 wrote:

…the goals that one justifies employing static typing (“better structure/abstraction, have more readable/maintainable code, and giving stronger guarantees about safety, lack of bugs, and lack of exploits”)…

A valid argument against static typing, which I have made in these issues threads, is that no static abstraction can be 100% leak-free. Meaning that eventually we always end up having to violate the invariants of the abstraction, or moreover that it becomes very convoluted to maintain type safety and retain complete degrees-of-freedom of expression. So then static typing becomes a negative factor for productivity instead of a positive one. This is why selecting which abstractions to enforce with a compiler is an engineering design challenge (arguably even perhaps an art) of matching to use cases.

EDIT: on the coding for readability point, “Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.”

@shelby3 wrote:

…there is diminishing return from static checking complexity because we can not statically check every clever algorithm and design paradigm that programmers will come up with. I think he had mentioned fighting with type systems.

I am interested to explore static typing that does not frustrate programmers.

Disagree that Rust’s balance of flexibility and tsuris is ideally suited to most apps.

@shelby3 wrote:

Additionally it’s the noise of extreme static checking such as the lifetime parameters (in addition to generic parameters and the normal kinds of static types) on every function in Rust… GC is so much more elegant. I understand certain real-time use cases will not tolerate GC pauses (although we did also discuss V8’s incremental mark-and-sweep upthread) and are better served by zero-overhead abstractions (e.g. no runtime overhead for deciding what to deallocate).

@shelby3 wrote:

Perhaps we will eventually think of some low-complexity abstraction that gives us non-marshaled integration for tree data structures between HLL and LLL, and perhaps we can even prove borrowing and lifetimes statically. But I suspect we can not without heading towards the complexity of C++ and Rust.

@keean wrote:

I would not write a web service in 'C' or even 'Rust' ever, this is really ideal ground for a garbage collected language. I would use Node (TypeScript), Django(Python) or 'Go'.

@keean wrote:

My view is there needs to be a high-level-language and a low-level-language that share enough memory-layout and data-structure that data can easily be passed backward and forward.

@shelby3 wrote:

I am wanting to move away from such clusterfucks of never ending aggregation of complexity.

I am not going to try to argue that Rust has no use cases.

But I want to create a mainstream programming language used by millions of programmers.

Do you want to compete with Rust?

@keean replied:

@shelby3 wrote:

Once we start heading for Rust’s three different kinds of pointers, then I will think we have failed to make a clean high-level language for the masses.

Here we agree.

shelby3 commented Jul 17, 2017

not shared_ptr as the default over GC

inferior (albeit prompt/deterministic) reference counting smart pointer paradigm

deterministically destruct instances of a type that has a destructor¹, which is a major feature lacking from GC languages … noting that such types can not be used with GC references, directly or indirectly, as a member of a data structure that is GC referenced

We need optional GC and reference counting. The latter excels at deterministic, prompt finalization (which is required for non-main memory finalization) but has the trade-offs quoted below:

@shelby3 wrote:

However, reference counting deallocation is prompt and more deterministic than GC mark-and-sweep (although I rebutted there, “But please note that reference counting can’t break cyclical references but GC can. And reference counting can cause a cascade/domino effect pause that would not be limited to a maximum pause as hypothetically V8’s incremental mark-and-sweep GC.”). Incremental GC schemes, when not overloaded with allocation that escapes generational GC, decrease high-latency stalls.

Has it ever been tried to combine reference counting with GC? Apparently so:

Edaqa Mortoray wrote:

Reference counting with cycle detection adds an interesting twist. I believe both D and Python do this. In order to eliminate memory loops objects are “sometimes” scanned for reference loops. This involves a lot more work than simply touching a reference count — though still not nearly as much as scanning all allocated memory. The impact of this loop detection depends heavily on how often it is scanned, and what the triggers are.

Given our single-threaded model, we do not need atomics on the reference counting, and given the proposal of the OP to emphasize borrowing and RAII, we would not get the benefits of a generational GC scheme. So reference counting would give us determinism except in the case of cyclical references, and then GC would kick in to break cycles. However, when we know cyclical references are probable (e.g. any kind of non-acyclic graph such as the DOM or a doubly-linked list), then it is wasted overhead to employ reference counting, although that overhead could be very tiny (i.e. infrequent tracing) if we only want to catch unexpected cycles (i.e. as an insurance policy, and we’d want to log these and consider them bugs).

I suppose we could rate-limit reference counting finalization, to prevent the rare aberrant high latency stall.
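
A speculative sketch of that rate-limiting idea (single-threaded, non-atomic counts as proposed in the OP; all names hypothetical): decrements that reach zero are queued rather than cascading immediately, and the queue is drained with a fixed per-tick budget:

```typescript
// Single-threaded refcounting with a bounded amount of freeing work per tick.
class RcObject {
  count = 1;
  constructor(public children: RcObject[] = []) {}
}

const pendingFrees: RcObject[] = [];

function release(obj: RcObject): void {
  if (--obj.count === 0) pendingFrees.push(obj); // defer, don't cascade now
}

// Called once per event-loop turn; frees at most `budget` objects to bound latency.
function drainFrees(budget: number): void {
  while (budget-- > 0 && pendingFrees.length > 0) {
    const dead = pendingFrees.pop()!;
    dead.children.forEach(release); // children may enqueue more work for later ticks
    // ...return dead's memory to the allocator here...
  }
}

// Usage: a long chain is freed incrementally instead of in one pause.
let head = new RcObject();
for (let i = 0; i < 10_000; i++) head = new RcObject([head]);
release(head);
while (pendingFrees.length > 0) drainFrees(100);
```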

@fauigerzigerk wrote:

@rbehrends wrote:

It is well-known that the amortized cost of naive reference counting is considerably higher than that of a modern tracing garbage collector (regardless of whether you also support cycle collection). It is also well-known that non-naive RC-based approaches generally make it easier to achieve lower pause times (though cycle collection may throw a spanner in the works).

True, but I think these comparisons are often done under the assumption that memory utilisation remains fairly low. Tracing GCs react very badly to growing memory utilisation in my experience. This is of course ultimately an economic question and maybe it should be modelled that way.

@sklogic wrote:

@jules wrote:

@jules wrote:

Tracing GC is not always faster. GC time is proportional to the size of the allocated memory, reference counting time is not.

It's not about running out of memory altogether, it's about the heap filling up with dead objects while the GC doesn't yet know that they are dead. Of course things are great when they are great and the GC can keep up with the mutator. Yet failure does happen whether or not you think it should happen, hence all the efforts that I mentioned that are trying to fix the issues.

As I said, in such a case RC would behave even worse.

@david-given wrote:

Regarding latency spikes: that's not entirely true. If you drop the last reference to an object, then freeing that object can cause the last reference to other objects to be dropped, etc.

So it's possible that dropping a reference can cause a large amount of work, which if it happens during a critical path can cause a latency spike. Particularly if finalisers are involved. If your object ownership graph is straightforward this isn't a problem, but if you have large objects with shared ownership it's very easy to be surprised here.

@sklogic wrote:

It is really hard to get real time guarantees with RC - given that time for cleaning up depends on a size of a structure.

Just imagine it freeing a 10Gb linked list.

Real time pure mark and sweep GCs are easier.

@rbehrends wrote:

Just keep in mind the trade-offs that you are making.

For starters, you are adding a considerable performance overhead to local pointer assignments; you're replacing a register-register move (and one that can be optimized away in many situations by a modern compiler through register renaming, turning it into a no-op) with a register-register move plus two memory updates plus a branch. Even where you have cache hits and the branch can be predicted correctly almost always, you're dealing with cache and BTB pollution; these effects may not fully show up in micro-benchmarks, but only completely manifest in large programs.

@barrkel wrote:

There's no discussion of thread safety and how that affects reference counting. That's probably the biggest problem; you need to use memory barriers to make reference counting safe with multiple threads, and that can kill performance.

@vardump wrote:

Reference counting is fast with one CPU core. But it gets worse and worse as you add cores. Inter-CPU synchronization is slow and the buses can saturate. With enough cores, at some point that'll be all the system is doing

@iofj wrote:

@rbehrends wrote:

It is well-known that…

I submit this is only true for equivalent number of things to keep track of. In practice this is not the case. Languages with GC [typically] go completely overboard with GC, using it completely everywhere, even when not strictly necessary and certainly java does.

In C++, if you have a map with items, you use move semantics and you have either 0 or 1 refcounts to keep track of, the one for the map itself. The rest is still "refcounted" but without ever touching any integer, by the normal scoping rules. Same goes for Rust code. That's ignoring the fact that a Java object is, at minimum, 11 bytes larger than a C++ object. Given java's boxing rules and string encoding, the difference in object sizes becomes bigger with bigger objects. Because out-of-stackframe RTTI is a basic necessity of tracing garbage collectors this is unavoidable, and cannot be fixed in another language. Bigger object sizes also mean more memory needed, more memory bandwidth needed, ... And Java's constant safety checks also mean

In Java, the same map will give the GC 3 items to keep track of (minimum) per entry in the map, plus half a dozen for the map itself. One for the object that keeps the key and the value, one of the key and one for the value. That's assuming both key and value are boxed primitives, not actual java objects. In that case, it'll be more.

@majke wrote:

Having that in mind, there are couple of problems with refcnt:

  • Cache spill. Increasing/decreasing a counter values in random places in memory kills cache. I remember one time I separated the "string object" data structure in python away from the immutable (!) string value (python has usually C struct object followed in memory by value blob). The python program suddenly went way faster - immutable (readonly) string values in memory were close to each other, mutated refcnts were close to another. Everyone was happier

For this last quoted issue, I am contemplating that employing the unused bits (c.f. also) of 64-bit pointers (64-bit also provides a huge virtual memory space, c.f. also) would enable storing an index into a contiguous array which holds the reference counts as array elements, so that they would be in contiguous pages. If the number of available unused bits is insufficient to count all allocated objects, then perhaps each compiled object type could employ a separate array (i.e. a zero-runtime-cost, compile-time array selection), presuming the cache-thrashing issue is due to intra-page-size fragmentation, not lack of inter-page contiguity. Also then we do not need to create the array element until the second reference. This is applicable when mostly passing around objects and not accessing all the data of the object as frequently.

Weighted reference counting can be employed to reduce the accesses to the reference count by half (since the source pointer has to be accessed anyway), but it requires some unused bits in the pointer to store the weight.
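
A speculative TypeScript sketch of the side-table idea, with a compact object id standing in for the pointer’s unused bits (which JavaScript cannot express): the reference counts live contiguously in a typed array instead of inside each object, and the table entry is only created on the second reference:

```typescript
// Reference counts kept in one contiguous typed array, indexed by a compact id,
// so count updates touch a small number of hot cache lines/pages.
class RefCountTable {
  private counts = new Uint32Array(1024);
  private nextId = 0;

  register(): number {            // called at the moment of the second reference
    const id = this.nextId++;
    if (id >= this.counts.length) {
      const bigger = new Uint32Array(this.counts.length * 2);
      bigger.set(this.counts);
      this.counts = bigger;
    }
    this.counts[id] = 2;          // the object already had one implicit reference
    return id;                    // in the real scheme this id would live in unused pointer bits
  }

  retain(id: number): void { this.counts[id]++; }
  release(id: number): boolean { return --this.counts[id] === 0; } // true => free the object
}

const table = new RefCountTable();
const id = table.register();      // object now has two references
table.release(id);                // back to one
console.log(table.release(id));   // true: last reference dropped, object can be freed
```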


David Ireland wrote:

Don’t get me wrong, Apple must have had a good reason to choose reference counting for memory management. That reason is actually pretty obvious: Objective-C is not a safe language, so you can’t move objects. If you can’t move objects, You have to use something like malloc to minimise fragmentation: you pay a big up front cost to allocate an object in the right place. In most GC schemes, allocation is free – you just increment a pointer. This is fine, because you are going to iterate through all your objects every now and again, and it doesn’t cost a whole lot more to move some of them around at the same time. You get to compact away all the fragmentation this results in. This makes GC fairly miserly with memory too. In iOS , if you want to allocate something bigger than the largest free fragment, you die. With GC, you have to wait a while for the GC to compact everything, after which memory will be compacted 100% efficiently and there will be a single memory extent as big as all available memory. Objective-C can’t have any of these benefits, so Apple chose a different route for iOS, and put the best possible spin on it.

Mobile apps won’t be fast until browsers are re-written in a fast, safe programming language that can share a good GC with ECMAScript. Of course browsers won’t be secure until then either, but I’m not holding my breath. There is no such language that is sufficiently popular, and Google fluffed it’s lines with Go. Maybe ECMAScript implementations will gain so much in performance, that all code that manipulates the DOM will be written in ECMAScript, and we can finally use a high performance GC.


Edit Nov. 18, 2017 Distinguish between non-deterministic reference counting and deterministic RAII.

shelby3 commented Jul 18, 2017

@anonymous wrote in private:

Progress on the language design is great to see.

I need progress on code, not language design, but I am trying to have a paradigm I can be thinking about when structuring my code in TypeScript + C, so that it can be an easy transition if ever we complete a new PL, and also to drive my insights on that potential language.

Concretely, I think it means I will write wrapper classes with getters and setters in TypeScript that take an ArrayBuffer for data, and do my malloc in Emscripten, and eliminate marshalling that way. That will be a precursor to what a compiler can do automatically (and later with better integration by outputting ASM.js or WASM directly, and maybe one day our own VM, since the browser VMs have limitations).
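
A rough sketch of that wrapper-class pattern (the heap buffer and malloc below are stand-ins for what an Emscripten build exposes; treat the exact interop surface as an assumption): getters and setters view a C-style struct laid out in linear memory, so no marshalling copy is needed:

```typescript
// A C-side struct:  struct Point { int32_t x; int32_t y; }
// viewed from TypeScript through a DataView over the (Emscripten) heap buffer.
class Point {
  private view: DataView;
  constructor(heap: ArrayBuffer, ptr: number) {
    this.view = new DataView(heap, ptr, 8);
  }
  get x(): number { return this.view.getInt32(0, true); }  // little-endian
  set x(v: number) { this.view.setInt32(0, v, true); }
  get y(): number { return this.view.getInt32(4, true); }
  set y(v: number) { this.view.setInt32(4, v, true); }
}

// Stand-ins for Module.HEAPU8.buffer / Module._malloc in a real Emscripten build.
const heap = new ArrayBuffer(64 * 1024);
let brk = 0;
const malloc = (size: number) => { const p = brk; brk += size; return p; };

const p = new Point(heap, malloc(8));
p.x = 3;
p.y = 4;
console.log(p.x * p.x + p.y * p.y); // 25, read directly from linear memory
```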

shelby3 referenced this issue Jul 28, 2017: Concurrency #17 (Open)

shelby3 commented Nov 19, 2017

I wrote:

We need optional GC and reference counting. The latter excels at deterministic, prompt finalization (which is required for non-main memory finalization) but has the trade-offs quoted below:

[…]

Edit Nov. 18, 2017 Distinguish between non-deterministic reference counting and deterministic RAII.

Re-reading the prior linked comments…

The first realization is that reference counting is non-deterministic and is inferior to GC in every facet (for the applicable use cases of non-deterministic resource management, and thus the inherent lack of deterministic finalization) except for interop with non-GC objects as explained below (but potentially at the cost of losing compaction and automatic detection of cyclical-reference memory leaks).

Afaics, a useful facility to add in addition to GC is the zero-cost¹ abstraction of deterministic RAII block-scope allocation/deallocation on the stack, because it (probably improves performance and) adds determinism of finalization (aka destructors) and interacts with the optimization of low-level performance as explained below.

To achieve such zero-cost resource management abstractions requires typing/tracking the lifetimes of borrows of references so the compiler can prove a resource has no live borrow when it goes out of block scope. I had proposed the compile-time enforced “is stored” annotation on function inputs for when an input will be (even transitively) borrowed beyond the lifetime of the function application. This is a separate concern from the issue of tracking borrows (à la Rust’s total ordering, which assumes multithreaded preemption everywhere, or my proposed partial ordering in conjunction with the single-threaded event model) for the purpose of preventing concurrent race-condition access (which afair we discussed in depth in the Concurrency thread). This granular typing of the context of side-effects seems more straightforward than some grand concept of typing everything as a (cordoned context of imperative) side-effect?²

One of the key advantages/requirements of GC is the ability to move objects so that memory can be periodically compacted (with the compaction being amortized over many implicit/delayed deallocation events which would otherwise be explicit/immediate/non-deterministic-latency with reference counting), which enables allocation to be as low-cost as incrementing a pointer to the end of allocated objects in a contiguous memory address space.
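
For concreteness, a toy sketch of why a moving/compacting collector makes allocation cheap (assumed layout, not any real engine): allocation is just bumping an offset into a contiguous region:

```typescript
// Bump allocation into a contiguous region: allocation is one add + bounds check.
class BumpRegion {
  private top = 0;
  constructor(private buffer: ArrayBuffer) {}

  alloc(size: number): number {
    if (this.top + size > this.buffer.byteLength) {
      throw new Error("region full: trigger GC, then compact and reset `top`");
    }
    const addr = this.top;
    this.top += size;   // the entire allocator fast path
    return addr;
  }
}

const region = new BumpRegion(new ArrayBuffer(1024));
console.log(region.alloc(16), region.alloc(32)); // 0 16
```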

As an aside, for highly optimized low-level code to run at maximum performance, it must be possible to freeze the position of objects in the virtual memory address space, else the objects must be copied. The former has a concurrency livelock and deadlock issue w.r.t. the said compaction.

For RAII managed data structures that contain pointers to data (i.e. unboxed) which is not RAII managed at the current or a containing block scope (i.e. for GC pointers, since pointers to data RAII managed in a contained block scope would exhibit use-after-free errors), these pointers have to be mutable by the compaction engine and have to be recorded in the list of references which the GC must trace. To have full interoperability between GC instances and RAII (stack allocated data structure) instances without copying data incurs either a performance penalty of reloading pointers on each access (in case they’ve been moved by the compactor) or else the aforementioned concurrency livelock and deadlock issue w.r.t. the said compaction (which is an onerous, undesirable can-of-worms).

Thus, I’m leaning towards stack allocated data structures not interoperating with GC instances, except via copying, because (other than deterministic finalization) performance is their primary raison d'être. And interop with GC would (as GC does) also encourage boxed data structures, which are known to thrash caches and degrade performance. Additionally, I think it should be allowed to have GC instances which are unboxed data structures (e.g. transpiled to ArrayBuffer in JavaScript) so they can be most efficiently copied (i.e. no transformation overhead). One might ponder whether to also have reference counted allocation on the heap, so that these immovable objects would not need to be copied when accessed by low-level performant algorithms (and note that the reason ASM.js didn’t allow very small ArrayBuffers is because there’s no presumption that what was compiled to ASM.js did bounds checking, although not a hindrance because our compiler could do static bounds checking in many cases and insert dynamic checks in the others), but the loss of compaction and the lack of automatic cyclical references detection are onerous. Any attempt to overcome these onerous trade-offs with “stop the world” of the code that is operating on non-GC instances would suffer from “a concurrency livelock and deadlock issue” as well (because, think about it, just stopping the threads is not sufficient since the state can be invalid when the threads are restarted; instead they must be stopped at points where the current thread state has no such dependencies).

However, is memory compaction even necessary on 64-bit virtual address spaces wherein the hardware can map the virtual address pages to compacted, available physical pages? Well I guess yes unless most objects are much larger than the ~4Kb page size.

However, afaics perhaps the compaction issue could be significantly mitigated by allocating same-size objects from a contiguous virtual address space significantly larger than ~4KB in the reference counting algorithm. Thus allocations would be first applied to deallocated slots in that address space, if any, per the standard reference counting allocation and deallocation algorithm, which employs a linked list of LIFO free slots.³
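
A sketch of that per-size-class scheme with a LIFO free list (sizes and names assumed, following the Jones description cited in footnote ³): freed slots are reused first, so fragmentation is confined to the size class without needing a compactor:

```typescript
// One size class: fixed-size slots carved from a large contiguous region,
// with a LIFO free list threaded through the freed slots themselves.
class SizeClassAllocator {
  private region: Int32Array;
  private bumpTop = 0;
  private freeHead = -1; // index of first free slot, -1 if none

  constructor(private slotWords: number, slotCount: number) {
    this.region = new Int32Array(slotWords * slotCount);
  }

  alloc(): number {
    if (this.freeHead !== -1) {              // reuse a freed slot first (LIFO)
      const slot = this.freeHead;
      this.freeHead = this.region[slot];     // next-free link is stored in the slot itself
      return slot;
    }
    const slot = this.bumpTop;               // otherwise bump into fresh space
    this.bumpTop += this.slotWords;
    return slot;
  }

  free(slot: number): void {
    this.region[slot] = this.freeHead;       // push onto the free list
    this.freeHead = slot;
  }
}

const alloc = new SizeClassAllocator(4, 1024);
const a = alloc.alloc();
const b = alloc.alloc();                     // slots at word offsets 0 and 4
alloc.free(a);
console.log(alloc.alloc() === a, b);         // true 4: LIFO reuse of the freed slot
```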

If compaction is required, then I contemplate whether unsafe low-level code which manually employs malloc and free should still be allowed for cases where there’s no other way to achieve performance with dynamic memory allocation.

¹ Zero runtime cost because, due to the compile-time determinism, we know at compile time when the deallocation will occur.

² How does Oleg’s insight conflict with @keean’s prior stance (in our discussions) against AST macros? I vaguely remember that he was against macros because they obfuscated debugging and the type system.

Is this the paper? http://okmij.org/ftp/Computation/having-effect.html#HOPE

I’m unable to readily grok Oleg’s work because he communicates in type theory and Haskell (in neither of which am I highly fluent). Afaics he explains with details/examples instead of conceptually and generatively from ground-level concepts. And I do not have enough time to apply arduous effort. So he limits his audience to those who are fluent in every facet of the prior art he builds on and/or who can justify the time and effort.

Essentially all side-effects distill to the generative essence of shared state. If state is not shared, then there are no interactions of effects between those processes.

Afaics, a Monad controls the threading of access to an orthogonal context across function application. That orthogonal context can be state in the State monad. Correct? That is why Monads can’t generally compose because they cordon orthogonal contexts.

But what is the goal to achieve? If we model higher-level concepts as imperative building blocks with contexts, we can do unlimited paradigms, but not only will our code become impossible to read because of the overhead of the necessary syntax and type system to allow for such generality (which was essentially my complaint against trying to model everything, e.g. even subtyping, as typeclasses as you were proposing before), but we have an explosion of issues with the interaction of too many paradigms. Afaics, this isn’t headed towards a simple language.

There will not be a perfect language. Type theory will never be complete and consistent. Design is driven by goals.

Thoughts?

³ Richard Jones. Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm

NodixBlockchain commented Nov 21, 2017

If i may :)

For me the main problem i would have with garbage collector is that they are harder to debug and test.

The nightmare I have is that you will test the functionality on a simple program, then it works, and you keep adding functions and race conditions, and two months later you realize there is a memory leak or memory corruption somewhere, and you're good for days or weeks of debugging to trace the problem.

With reference counting, it's very easy to trace problems, because you know where the freeing must happen; the point where the last reference to an object is released can be tracked easily, and that makes it very easy to track problems, whether they are memory corruption due to memory being released prematurely, or memory leaks.

With a garbage collector that will parse the whole tree every now and then, it will traverse thousands of pointers allocated over a few minutes of running, and it can become much harder to track where the problem is if it frees memory that it shouldn't, or if it doesn't free memory that it should.

Especially if you go for multi-layered scope, with closures and asynchronous code execution, it can become very hard to make sure the GC will catch everything right. And bugs will generally appear when the program starts to become more complex, rather than in easy vanilla cases when the program is small and simple.

And if you need to have a system of plugins, i.e. code loaded at runtime that can manipulate references itself, it can also become impossible to deterministically track the reference tree at compile time, and the host program might have no clue at all what the plugin is doing with the references passed to it. And it's the same problem with distributed applications.

For me now, when I want to add a new feature, especially for memory management, my first concern is not only 'how to make it work', but also how easy it will be to detect problems, and to track what causes the problem.

With reference counting, it's easy to set a breakpoint when a reference gets freed, and have the debugger pause everything and inspect all the context at the moment it happens.

With GC, the analysis is delayed, and all the context of execution that changed the state of the object is lost, so it makes it very hard to debug.

For me, to make GC programming easy, it needs something like the Android SDK, where the whole application is structured at a macro level with activities, intents, special classes for asynchronous/background execution, making it so that the lifetimes of all objects are predictable due to the structuring of the application, declaration of resources in XML, etc.

https://developer.android.com/guide/components/activities/process-lifecycle.html

They have a whole lot of classes to build the application on, to make resource management and the lifecycle of objects more deterministic.

It imposes a certain structure on all application classes and components to make memory management fit into cases that the GC can easily deal with.

Making a general-purpose GC deal with a complex language with multi-layered scope and asynchronous execution, without giving any hint of the pattern that an object is likely to follow by inheriting some class and structuring the application with built-in classes with predictable behavior, seems very hard.

Reference counting is much easier to deal with for the general case.

keean commented Nov 21, 2017

My preference is to manage as much as possible from the stack. There are some simple tricks that can make this work. If all allocation is RAII and you have procedures not functions, then to return some data you simply pass a collection (set/map/array etc.) into the procedure.

Edit: where this gets complicated is that you then end up with two kinds of pointers: "owning pointers", which must form a directed acyclic graph, and non-owning (or reference) pointers (the links between data structures). The simple solution is to make all 'references' weak references, so that you have to null-test them when dereferencing to check whether the resource they point to has gone away. Then we simply delete things when they go out of scope, removing all resources pointed to by owning pointers in the DAG. No GC, no reference counting, and a simple non-null check on dereference. Of course 'nullable' pointers are not considered a good thing, so we can then look for static techniques where we do not allow references to outlive the object they reference. This is where Rust's lifetimes come from.
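A minimal C++ sketch of the first half of that idea (the names are illustrative only): instead of a function returning a freshly allocated result, the caller owns the storage and passes it into a procedure, so all allocation stays RAII-scoped in the caller:

```cpp
#include <vector>

// Procedure, not function: the caller supplies (and owns) the output storage,
// so no heap allocation escapes the caller's scope.
void squares(int n, std::vector<int>& out) {
    out.clear();
    for (int i = 0; i < n; ++i)
        out.push_back(i * i);
}

int main() {
    std::vector<int> result;   // RAII: destroyed automatically when main exits
    squares(10, result);
}
```

The weak-reference half needs extra bookkeeping so a non-owning pointer can observe that its target has been destroyed; `std::weak_ptr` approximates that behaviour but is itself backed by a reference count, so it is only an analogy for the counting-free scheme described above.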


shelby3 commented Nov 21, 2017

@NodixBlockchain (aka IadixDev @ BCT) wrote:

If i may :)

I can’t speak for @keean (and this is his project area but this is my thread), but personally I’d prefer “please don’t” (or post once for every 10 posts by others in this thread, so as to keep the S/N ratio of the thread higher), i.e that you would comment in your thread “Saluations! :)” instead (or at least not in this/my thread), unless you have something new+important+accurate+well-researched to point out which you have not written already in the past.

If you quote from this thread, and post in your thread, I will probably read (presuming I have time) and perhaps even reply there. I’m trying to avoid this thread becoming another 300+ posts that I can’t wade through when I want to refresh my own memory of what my (and @keean’s) analyses was/is.

@keean and I have been doing backtalking on LinkedIn to reduce clutter in these issues thread. You’re welcome to add me there.

With reference counting, it's very easy to trace problems, because you know where the freeing must happen; the point where the last reference to an object is released can be tracked easily, and that makes it very easy to track problems, whether they are memory corruption due to memory being released prematurely, or memory leaks.

I’m wondering whether that is a superstitious, anecdotal belief (i.e. aliasing error) or an assertion you actually studied, measured, and verified to be the case with in depth effort for comparing reference counting and tracing garbage collection. Because inherently reference counting doesn’t track the directed graph of memory allocation (i.e. all we have are reference counts and no back pointers to the references that comprise those counts); thus, there’s no feasible way to see which data structures have leaked because they’re no longer reachable from the untraced roots of the said graph in a reference counting GC system— for example due to cyclical references. Whereas, in theory a tracing garbage collector could provide such a graph for debugging.

With reference counting, it's easy to set a breakpoint when a reference gets freed, and have the debugger pause everything and inspect all the context at the moment it happens.

With GC, the analysis is delayed, and all the execution context that changed the state of the object is lost, which makes it very hard to debug.

You can’t see which/where are all the references to the shared object that had its reference count decremented, because it’s the implicit nature of non-deterministic memory allocation to not know which pointers would be the candidates, and the reference counting scheme provides no back pointers from the said object to said sharing pointers (as aforementioned).

I agree it wouldn’t normally be possible to set a runtime breakpoint on the change in number of references to a specific object with a tracing GC, because the assignment of pointers doesn’t touch the referenced object (although write-guards for generational GC might get you as close as the virtual page but that might not be helpful). Thus, there’s no way to know which assignment to set a breakpoint on for testing the address of the said shared object. Yet comparably, the more optimized variants of reference counting also have the same debugging problem because they don’t even update the said common object (as explained below) because they keep the reference count in the pointers (and additionally there is the Weighted Reference Counting). In both cases, a sophisticated debugger and language could in theory be designed to set such conditional breakpoints on all pointer assignments (conditionally breaking only if the said shared object address is encountered), if such a facility is helpful for debugging memory leaks. And the tracing GC would have more information about the memory allocation graph.

And if you need a plugin system, i.e. code loaded at runtime that can manipulate references itself, it can also become impossible to track the reference tree deterministically at compile time

It seems you’re presuming that reference counting models deterministic memory allocation, but that is not generally the case and (per my reply to @keean below), if that were generally the case then in theory it could be replaced by zero-runtime-cost compile-time deterministic allocation mgmt (albeit with tsuris of a complex compile-time typing system). Afaics, you continue to repeat this belief system of yours (presumably confirmation biased by your presumed preference for your low-level framework you discussed in your thread “Saluations! :)” and per my comments to @keean about C/C++ below) without refuting the points that have been made to you numerous times previously. I’ll recap and augment… (and hoping you’ll keep the not-so-thoroughly-contemplated-studied noise to a minimum in this/my thread please).

https://developer.android.com/guide/components/activities/process-lifecycle.html

They have a whole lot of classes to build the application on, to make resource management and object lifecycles more deterministic.

I don’t see how the process lifecycle has anything to do with the generalized problem that, in general, memory allocation is non-deterministic.

It imposes a certain structure on all application classes and components so that memory management fits cases the GC can easily deal with.

Programming paradigms for avoiding memory leaks are useful regardless of the memory allocation scheme chosen.

I previously mentioned the following disadvantages for reference counting to you (either in the Concurrency thread and/or the other thread you created). And note the first one below is a form of memory leak inherent in reference counting.

Reference counting (which technically is a form of “direct”1 garbage collection) can’t garbage collect cyclical references. And according to an expert book I’m reading there’s no known means of weak pointers or other paradigm which can reliably solve the problem of cyclical references in all scenarios.2 This makes sense because reference counting (RC) doesn’t contemplate the entire directed graph of objects holistically. Thus only “indirect, tracing”1 garbage collection which deals with the entire directed graph will reliably detect and garbage collect cyclical references. However, note that probabilistic, localized tracing combined with RC decrements is comparable to mark-sweep (MS)3 for (even apparently asymptotic) throughput computational complexity costs, yet with an upper bound (“6 ms”) on pause times due to locality of search. Yet presumably pause times are eliminated with a concurrent MS (and as cited below, RC has less throughput than BG-MS).
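For a concrete illustration of the cycle problem, here is a minimal C++ example using `shared_ptr` as a stand-in for any plain reference-counting scheme:

```cpp
#include <memory>

struct Node {
    std::shared_ptr<Node> next;  // counted (owning) reference
};

int main() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;   // b's count: 2
    b->next = a;   // a's count: 2 -> a cycle
    // When a and b go out of scope each count drops to 1, never to 0, so both
    // Nodes leak. A tracing collector would reclaim them because they are
    // unreachable from any root. std::weak_ptr can break this particular cycle,
    // but only if the programmer already knows where the cycles will form.
}
```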

According to this expert book4, reference counting has significantly worse throughput (except compared to the pathological asymptotic huge heap case for tracing GC which can be avoided15, and perhaps multi-threaded/multi-core/parallelism but I think I need to study that more) because of the significant additional overhead for every pointer assignment. This overhead as a percentage is worse for highly optimized, low-level languages (e.g. C/C++) compared to interpreted languages which have high overhead anyway. Note that a claimed throughput parity for RC with MS5 (and it’s just culling young generation overhead similar to prior art16) is not parity with generational GC combined with MS (BG-MS).6 Additionally the adjustment of the reference counts on each pointer assignment thrashes the cache7, further reducing performance, except for optimizations which generally have to be combined with a tracing GC anyway (and they diminish the afaics mostly irrelevant claimed deterministic finality benefit).8 Additionally for multithreaded code, there is additional overhead due to contention over the race condition when updating reference counts, although there are potential amortization optimizations which are also potentially superior for parallelized and/or distributed paradigms9 (but they diminish the afaics mostly irrelevant claimed deterministic finality benefit). Additionally memory allocation management on the heap is non-deterministic, so the determinism of the immediacy of deallocation for reference counting is afaics more or less irrelevant. Additionally reference counting (especially when executing destructors) can cause a dominoes cascade of latency (aka “stop the world”) and the amortization optimizations10 again diminish the afaics mostly irrelevant claimed deterministic finality benefit.
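To make the per-assignment overhead concrete, this is roughly what every counted-pointer assignment expands to (a simplified sketch only, ignoring the copy constructor and destructor that a real smart pointer such as `shared_ptr` also provides):

```cpp
#include <atomic>

struct Control { std::atomic<long> count{1}; /* ...object payload... */ };

struct CountedPtr {
    Control* p = nullptr;
    CountedPtr& operator=(const CountedPtr& other) {
        // Every assignment touches the cache lines of *both* referenced objects
        // and needs atomic read-modify-write operations once threads share counts.
        if (other.p) other.p->count.fetch_add(1, std::memory_order_relaxed);
        if (p && p->count.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete p;                  // deterministic, possibly cascading, free
        p = other.p;
        return *this;
    }
    // The equivalent raw-pointer / tracing-GC assignment is just: p = other.p;
};
```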

However, in my next post in this thread, I will propose a variant of deferred reference counting11 in conjunction with a clever generational GC scheme (apparently similar to prior art16), for avoiding the pathological asymptotic case of tracing GC. See my further comments about this in my reply to @keean below.

Especially if you go for multi-layered scopes, with closures and asynchronous code execution, it can become very hard to make sure the GC will catch everything correctly. And bugs will generally appear when the program starts to become more complex, rather than in the easy vanilla cases when the program is small and simple.

In general, more complexity in the code will increase the probability of semantic memory leaks, even semantic leaks of the form that automatic garbage collection can’t fix because the error is semantic. Reference counting (a form of automated GC) will also suffer these semantic memory leaks. The issue of complexity of semantics is more or less not peculiar to the type of automatic GC scheme chosen. Thus I don’t view this as a valid complaint against tracing GC in favor of reference counting.
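A short example of such a semantic leak, which no collector (tracing or counting) can fix because the objects stay reachable on purpose (the cache here is purely illustrative):

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Grows forever: entries are inserted but never evicted.
std::unordered_map<std::string, std::shared_ptr<std::string>> cache;

void handle_request(const std::string& key) {
    // Every result stays reachable from the global map, so neither tracing GC
    // nor reference counting will ever reclaim it. The leak is in the logic.
    cache[key] = std::make_shared<std::string>("result for " + key);
}
```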

1 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §1 Introduction, pg. 2.

2 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.5 Cyclic reference counting, pg. 56 – 62, §3.6 Issues to consider: Cyclic data structures, pg. 71.

3 D. F. Bacon and V. T. Rajan (2001). Concurrent cycle collection in reference counted systems. In J. L. Knudsen, editor, Proceedings of 15th European Conference on Object-Oriented Programming, ECOOP 2001, Budapest, Hungary, June 18-22, volume 2072 of Lecture Notes in Computer Science, pp. 207–235. Springer-Verlag, 2001.

4 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm, pg. 19 – 25, §3 Reference Counting, pp 43 – 74.

5 R. Shahriyar, S. M. Blackburn, X. Yang, and K. S. McKinley (2012), "Taking Off the Gloves with Reference Counting Immix," in OOPSLA ‘13: Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, 2013. Originally found on Steve Blackburn’s website.

6 Stephen Blackburn; Kathryn McKinley (2003). "Ulterior Reference Counting: Fast Garbage Collection without a Long Wait". Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications. OOPSLA 2003, Table 3 on pg. 9.

7 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm: The strengths and weakness of reference counting, pg. 22.

8 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.3 Limited-field reference counts, pp. 50 – 55, §3.3 Limited-field reference counts: One-bit reference counts, pg. 52, §3.6 Issues to consider: Space for reference counts & Locality of references, pg. 71.

9 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §8.4 Concurrent Reference Counting, pp. 200 – 201, §3.7 Notes, pg. 74, §4.1 Comparisons with reference counting, pg. 77.

10 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.1 Non-recursive freeing, pg. 44.

11 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.2 Deferred reference counting: The Deutsch-Bobrow algorithm, pg. 46.


@keean wrote:

My preference is to manage as much as possible from the stack. There are some simple tricks that can make this work. If all allocation is RAII and you have procedures not functions, then to return some data you simply pass a collection (set/map/array etc.) into the procedure.

This is conceptually broadly analogous/related to deferred reference counting12 schemes which rely on compile-time deterministic tracing of the deallocation, such as via linear (aka uniqueness) types.
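In C++ terms, the closest everyday approximation of that compile-time discipline is move-only unique ownership (a rough analogy for linear/uniqueness typing rather than an exact one, since C++ does not reject use-after-move at compile time):

```cpp
#include <memory>
#include <utility>

std::unique_ptr<int> make_value() { return std::make_unique<int>(42); }

void consume(std::unique_ptr<int> v) {
    // v is the sole owner; the int is freed deterministically when v dies here.
}

int main() {
    auto v = make_value();    // exactly one owner at all times
    consume(std::move(v));    // ownership transferred; the deallocation point is known statically
    // A true linear type system would reject any further use of v at compile
    // time, whereas C++ merely leaves it null.
}
```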

Although I was obviously also contemplating similarly to you in my prior posts in this thread (after you had reminded me about RAII in our previous discussions and my recollection of programming in C++ in the late 1990s), after reading a book on garbage collection and thinking more about the issues, I’m going to further argue against the compile-time deterministic memory mgmt language design strategy in my upcoming detailed post, because I believe it’s not as generally flexible, generally sound, nor as comprehensible a solution.

These stack memory mgmt paradigms you’re referring to have narrow applicability because they don’t handle the general case of non-deterministic heap management.

Afaics, it’s a special case methodology that leads to a very complex set of corner cases, as for C/C++ which tries to pile on a bunch of different sorts of compile-time optimizations and keep as much memory mgmt on the stack as possible, but leads to a brittle clusterfuck of a language with a 1500 page manual that virtually no one understands entirely (not even the C++ creator Bjarne Stroustrup nor the STL creator Alexander Stepanov, as they both admitted).

And those stack paradigms don’t marry well holistically with a bolt-on tracing garbage collection appendage13 (i.e. one not designed holistically with the language), although I have not yet read a more recent research paper on combining tracing GC with Rust.14 To even attempt to apply this compile-time deterministic memory management holistically requires a complex typing tsuris akin to Rust’s total ordering of borrow lifetimes with lifetime annotation parameters for transitivity of borrows from inputs to outputs, and of course the programmer must violate the invariants and create unsafe code in order to handle non-deterministic (i.e. runtime randomized) memory mgmt, which defeats the entire point, as then the total order is no longer checked and the unsafety can leak back into the code the compiler thinks is safe.

I apply the 80/20 or 90/10 Pareto principle to prioritization. Although all that special case tsuris might be able to obtain an extra 10% of performance, it has the cost of 90% more effort on debugging, readability of the code, maintenance of the code, complexity of the code, etc.. When you really need that 10% boost (or in the case where the more generalized GC paradigm is much slower for some reason), then go write a module in C++ or Rust. I see no need to invest my effort in trying to recreate (or improve upon) those languages, because there are millions of man-hours already in those ecosystems. Our only prayer of creating something useful in this decade with our resources, is to 90/10% prioritize on clean abstractions that are elegant and generalized and 90% performant with 10% of the effort and complexity.

EDIT: on the coding for readability point, “Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.”

Additionally, perhaps my GC design ideas (to be outlined in my next post) might possibly be compelling (if I haven’t made some errors in my thought process, which is entirely possible given I need to eat dark chocolate to even ephemerally coax my brain out of the 105 IQ brain fog gutter and function non-lethargically, presumably due to an ongoing TB infection). Note however, that the prior art for GC is apparently vast and seemingly exhaustive in terms of the low hanging fruit of obvious, simple variants16 (i.e. fertile ground presumably only in the realm of the esoteric, high complexity, and/or abstruse). Thus, it’s unlikely I discovered something (after a day of reading and contemplation) that hadn’t been considered previously in the prior art, although there appears to be potential prior art similar to my ideas16 that appeared later than the book I read. Note I independently arrived at my ideas (after about a day of contemplating the 1996 book) and have not yet read that 2003 research paper16 to compare.

Total ordering compile-time paradigms (including attempting to type check everything) are afaics brittle paradigms that break because the programmer must necessarily break out of them because the entropy of the universe is not bounded. I commented on this previously numerous times in these issues threads. The non-determinism (i.e. unbounded randomness and entropy) of the universe (i.e. the real world in I/O) can’t be statically modeled.

My thought is we need to be circumspect about the benefits versus trade-offs of every paradigm and feature we choose in a programming language. And the target use cases (and acumen/preferences of the target programmers demographics) of the programming language will also factor into the decision process.

For example, Haskell programmers favor brevity and mathematical expression with trade-offs such as hidden (2nd, 3rd, and 4th order) details in the form of implicit typing (which the mathematically minded model in their mind without reading them explicitly in the code), which drastically reduces the demographics of programmers who can read the code and participate. Some snobbishly claim this is a good filter to keep the less intelligent programmers away. Haskell has other trade-offs (necessary to enable the mathematical modelling) which we’ve discussed elsewhere and which are outside the scope of this concise summary, including but not limited to (and my descriptions may not be entirely accurate given my limited Haskell fluency) the debugging non-determinism tsuris of lazy evaluation (although the requirement for lazy evaluation is disputed), the bottom type populating all types, a total ordering (or algebra) on the implementation of each data type for each typeclass interface, a total ordering on the sequential (monadic) threading of I/O processing in the code (requiring unsafe code to fully accommodate the non-determinism of I/O), etc..

Edit: where this gets complicated is that you then end up with two kinds of pointers: "owning pointers", which must form a directed acyclic graph, and non-owning (or reference) pointers (the links between data structures). The simple solution is to make all 'references' weak references, so that you have to null-test them when dereferencing to check whether the resource they point to has gone away. Then we simply delete things when they go out of scope, removing all resources pointed to by owning pointers in the DAG. No GC, no reference counting, and a simple non-null check on dereference. Of course 'nullable' pointers are not considered a good thing, so we can then look for static techniques where we do not allow references to outlive the object they reference. This is where Rust's lifetimes come from.

Human-managed weak references are an incomplete and human-error-prone strategy. Also, as you noted, you then leak the non-determinism from deallocation into how to handle the error conditions of null pointers, i.e. leaking the non-determinism into program logic! That is horrible. Whereas, a provably sound weak pointer algorithm has exponential asymptotics and thus is no better or worse than tracing GC mark-sweep.2

12 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.2 Deferred reference counting, pg. 45.

13 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §9 Garbage Collection for C, pg. 227.

14 Y. Lin, S. M. Blackburn, A. L. Hosking, and M. Norrish (2016), "Rust as a Language for High Performance GC Implementation," in Proceedings of the Sixteenth ACM SIGPLAN International Symposium on Memory Management, ISMM ‘16, Santa Barbara, CA, June 13, 2016, 2016.

15 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, Preface: The audience, pg. xxiv, Preface: The Bibliography and the World-Wide Web, pg. xxv.

16 Stephen Blackburn; Kathryn McKinley (2003). "Ulterior Reference Counting: Fast Garbage Collection without a Long Wait". Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications. OOPSLA 2003. I originally found this on Wikipedia’s page for Reference Counting.


A Kindle version of the book is available on Amazon for less than $20 if you don’t want to turn your monitor sideways to read the “free” (copyright violation theft) pdf linked above.

EDIT: a more concise overview resource:

@NodixBlockchain wrote:

https://openresearch-repository.anu.edu.au/bitstream/1885/9053/1/02WholeThesis_Garner.pdf

This paper covers it all :D


Owner

keean commented Nov 21, 2017

The claims made about GC efficiency in those kinds of books never turn out to be true in reality. Look at how JavaScript is slow, yet emscripten can get within a factor of 2 of native 'C', yet both run on the same VM. The difference is that asm.js and WebASM are strongly typed and manually manage the memory. In theory the JIT can make well-typed JS code as fast as strongly typed code, so well-typed TypeScript should be as fast as the compiled asm.js/WebASM, but it's not. The remaining difference is mostly down to memory management in my opinion.


NodixBlockchain commented Nov 21, 2017

Edit: where this gets complicated is that you then end up with two kinds of pointers: "owning pointers", which must form a directed acyclic graph, and non-owning (or reference) pointers (the links between data structures). The simple solution is to make all 'references' weak references, so that you have to null-test them when dereferencing to check whether the resource they point to has gone away. Then we simply delete things when they go out of scope, removing all resources pointed to by owning pointers in the DAG. No GC, no reference counting, and a simple non-null check on dereference. Of course 'nullable' pointers are not considered a good thing, so we can then look for static techniques where we do not allow references to outlive the object they reference. This is where Rust's lifetimes come from.

I wonder if this solution would work if the object instance referenced can change during the lifetime of the reference.

Because it would mean you might need to delete the referenced object before the reference goes out of scope, in case a new object is assigned to it during its lifetime.
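With the null-test-on-dereference discipline that case is handled mechanically; here is a C++ sketch (using `shared_ptr`/`weak_ptr` as a counted stand-in for the owning/weak pointers described above):

```cpp
#include <iostream>
#include <memory>

struct Widget { int id; };

int main() {
    auto owner = std::make_shared<Widget>(Widget{1});   // owning slot
    std::weak_ptr<Widget> ref = owner;                  // non-owning reference

    owner = std::make_shared<Widget>(Widget{2});        // slot reassigned: Widget{1}
                                                        // is destroyed right here
    if (auto p = ref.lock())                            // null-test on dereference
        std::cout << "still alive: " << p->id << "\n";
    else
        std::cout << "reference expired; dangling access prevented\n";
}
```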


barrkel commented Nov 21, 2017


Owner

keean commented Nov 21, 2017

TypeScript can force you to stick to classes for objects, allowing the JIT to optimise these efficiently. Really, for JS where there are no allocations going on, it is as fast as asm.js. GC hides all sorts of costs, like the extra level of pointer indirection.

Probably a major factor is that GC hides the cost of allocation and deallocation, so the programmer does not realise they are thrashing object creation.

What is the fastest garbage collected language?


Owner

keean commented Nov 21, 2017

GC is even worse with multiple threads, because it has to stop all the threads to do the collection. This leads to GC pauses and bad user-interface jank, plus programs crashing due to fragmentation...


Owner

keean commented Nov 21, 2017

The cache issue, for me, I don't think is a problem, as the pointer and reference count are supposed to always be in the same cache line

The reference count needs to be in the object referenced, not the pointer, so it won't be in the same cache line as the pointer. When you have multiple pointers to the same object they all need to increment/decrement the same counter.
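Sketching that point in C++ (an illustrative intrusive layout, not any particular library's): the count has to live inside the shared object, so copying a handle anywhere in the program writes to the target object's cache line, not the handle's:

```cpp
struct SharedArray {
    long refcount;      // lives inside the shared object...
    int  data[256];     // ...next to the payload it guards
};

struct Handle {
    SharedArray* p;
    explicit Handle(SharedArray* q) : p(q) { p->refcount = 1; }
    Handle(const Handle& other) : p(other.p) {
        ++p->refcount;  // copying a handle writes into the *target* object
    }
    ~Handle() {
        if (--p->refcount == 0) delete p;
    }
    Handle& operator=(const Handle&) = delete;  // kept minimal for the sketch
};
```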


Owner

keean commented Nov 21, 2017

See: https://softwareengineering.stackexchange.com/questions/203745/demonstration-of-garbage-collection-being-faster-than-manual-memory-management

The key fact for me is that managed memory usage is always significantly worse than unmanaged. Yes GC can be fast, providing you have twice the RAM available. Typically on a multi-tasking machine this means less RAM available for other processes. This results in real slowdown of things like the disk-caching layer because it's losing pages to the application.

The end result is that the user experience of using a system with GC is significantly worse than using a system with manual allocation - providing the software does not crash due to bad memory management :-)


Owner

keean commented Nov 21, 2017

It cannot be with the pointer, because when you create a new pointer to the array you need to increment the reference count.


barrkel commented Nov 21, 2017


Owner

keean commented Nov 21, 2017

I mostly just butted in because accusing GC for JS / Typescript being slower than webasm seemed a bit outlandish. GC overhead is directly measurable; it's trivially disproved for anyone inclined to measure.

Actually the difference in performance between JS and WebASM is only about 10-20% anyway, so that is in the 2%-10% region you claimed for the GC, so for some algorithms it could account for 100% of the difference...


Owner

keean commented Nov 21, 2017

Interestingly I am about to port a bunch of asm.js to TypeScript, as the memory allocation for asm.js / webasm does not work well with the main JS garbage collector, and allocating the heap for the webasm is causing fragmentation of the JS heap. In the end it turns out that the better memory usage from keeping all the memory managed by the GC is better for the application's stability than the small performance gain of having the code in asm.js.

Following this argument to its logical conclusion, the better memory usage of manually managed memory for the whole application would be better overall.

Edit: it's probably worth pointing out that this application uses a lot of memory, and has some periods when it is performing rapid allocations. This leads to memory usage for a single tab of about 1GB (even though memory buffers are 16MB, so with manual allocation it would get nowhere near that level). Frequently when usage gets to more than 1GB the web browser crashes, even though most of that memory is probably free and just waiting to be collected.


shelby3 commented Nov 21, 2017

@NodixBlockchain wrote:

What makes me think that cyclic dependencies are not a problem with the way I handle them is that I have a case of cyclic dependency like this.

This is off-topic. Explicit, unchecked (i.e. “unsafe”) manual mgmt is not what we are interested in here. The topic was about automatic (or compiler checked deterministic) mgmt. As you know, both @keean and I are significantly into compiler-assisted checking.

I understand you want to participate. But you really have not wrapped your mind around all the concepts, and so you should be more circumspect and lurk more. I said if you want to post to your thread, then I will try to participate occasionally.

Please be respectful and recognize your limitations.

I'm aware of the potential downsides, but for me they remain benign for the use case I have for the moment.

After briefly learning about it, I’m no longer interested in your low-level framework. Your monotheistic intent on pushing your low-level preferences diverges the focus of my thread, because you don’t have an open mind attempting to be objective about research results. Your low-level framework focus is irrelevant to the topic of this thread. Please talk about that in your thread, which you entitled “Saluations! :)”. This/my thread is not for trying to convince yourself and/or the world that your low-level framework paradigm is great; your thread is for that. I wouldn’t ban you even if this were my repository, but I believe curation (not absolute censorship) is valuable. Trust me, if you write anything in your thread which I happen to read that I think is important enough to be in this thread, I will quote it over in this thread. Also, given that I find your posts have so many errors and such low informational value, I presented a guideline that you could post after every 10 posts by others in the thread (so as to force you to focus yourself more and keep the S/N ratio here reasonable).

I considered the low-level implications to be made even more evident by the post which follows this one, and realized that your preferences are a dinosaur. See my next post.

I am discussing here general-purpose, high-level programming language design that employs compiler checking and/or automated runtime GC to facilitate lower development costs and guard against human error, with some thought given to performance optimization and perhaps some C-like low-level capabilities, as long as the result is clean, simple, and elegant and not explicit, manual, bug-prone human management of memory.

@shelby3

shelby3 commented Nov 21, 2017

@keean wrote:

The difference is that asm.js and WebASM are strongly typed and manually manage the memory. In theory the JIT can make well-typed JS code as fast as strongly typed code, so well-typed TypeScript should be as fast as the compiled asm.js/WebASM, but it's not. The remaining difference is mostly down to memory management in my opinion.

TypeScript transpiles to JavaScript (ES6), not ASM.js nor WASM.

(As you know, TypeScript aims to be round-trip compatible with and a typed superset of JavaScript, meaning it should pass through ES6 code more or less unchanged.)

Also C code which is transpiled to ASM.js/WASM, is employing different abstractions such as not boxing every damn thing, such as every element of every array.

The VM can’t always statically type objects, so methods and type coercions can require dynamic overhead.

Also I believe JS may be passing all function arguments on the heap (via the prototype chain of objects which form the closures we previously discussed). The throughput and locality (i.e. contiguity for minimizing cache and virtual-paging thrashing) are similar, because contiguous allocation on the generational heap costs only an extra end-of-area pointer increment per allocation, comparable to the single SP adjustment per function call for stack allocation. However, the volume of objects that accumulates before a collection (i.e. before the end-of-area pointer is reduced; although presumably all of a function’s arguments could be allocated as one data structure, for equivalent cost) is much greater than for stack-passed arguments, and thus perhaps causes additional thrashing.
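To illustrate the closure point, here is a hedged TypeScript sketch (illustrative only; whether a given local is actually heap-allocated depends on the engine’s escape analysis):

```typescript
// A captured local must outlive its call, so engines typically place the
// closure environment on the (generational) heap rather than the stack.
function makeCounter(start: number): () => number {
  let count = start;      // captured: lives in a heap-allocated environment
  return () => ++count;   // the returned closure keeps that environment alive
}

// A non-capturing function's locals can stay in registers or on the stack.
function addOne(x: number): number {
  const y = x + 1;        // not captured: no heap allocation required
  return y;
}

const next = makeCounter(10);
console.log(next(), next(), addOne(5)); // 11 12 6
```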

C code rarely employs hashmap lookup, yet JS literally encourages it since all object properties and non-numeric array indices employ it.
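For instance, here is a hedged TypeScript sketch of the difference between a fixed object shape (which V8-style engines can optimize with hidden classes and offset-based field access) and dynamically keyed properties that degrade to dictionary-style, hashmap-like lookup (the exact heuristics are engine-specific):

```typescript
// Fixed shape: the engine can assign a hidden class and read fields by offset.
class Point {
  constructor(public x: number, public y: number) {}
}
const p = new Point(1, 2);
const sum = p.x + p.y;     // monomorphic, offset-based property access

// Dynamic shape: computed non-numeric keys behave like hashmap lookups, and
// deleting properties tends to push the object into slow "dictionary mode".
const bag: Record<string, number> = {};
const key = "k" + Date.now();
bag[key] = 1;              // hashed key lookup
delete bag[key];           // deletion further de-optimizes the object's shape
```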

There are many more details and we would need to study all the reasons:

https://news.ycombinator.com/item?id=7943303
https://www.html5rocks.com/en/tutorials/speed/v8/
https://blog.sessionstack.com/how-javascript-works-inside-the-v8-engine-5-tips-on-how-to-write-optimized-code-ac089e62b12e

This thread is in part my attempt to get the best of low-level ASM.js/WASM married to a better GC scheme that is more compatible with (some of) those lower-level abstractions!

[…] GC hides all sorts of costs, like the extra level of pointer indirection.

Probably a major factor is that GC hides the cost of allocation and deallocation, so the programmer does not realise they are thrashing object creation.

Agreed; e.g. they do not realize they’re boxing everything, which forces an extra pointer indirection for every read/write of an array element.
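To make the boxing cost concrete, a hedged TypeScript comparison of an array of boxed objects against a typed array whose elements are stored unboxed and contiguously (sizes and names are arbitrary):

```typescript
// Boxed: each element is a separate heap object; reading v.x follows a pointer.
interface Vec { x: number; y: number }
const boxed: Vec[] = Array.from({ length: 1000 }, (_, i) => ({ x: i, y: i }));

// Unboxed: one contiguous buffer; element reads are plain offset arithmetic.
const unboxed = new Float64Array(2 * 1000); // layout: [x0, y0, x1, y1, ...]
for (let i = 0; i < 1000; i++) {
  unboxed[2 * i] = i;
  unboxed[2 * i + 1] = i;
}

// Summing the x components: the typed-array loop walks one contiguous region,
// while the boxed loop chases one pointer per element (worse cache locality).
let sumBoxed = 0, sumUnboxed = 0;
for (const v of boxed) sumBoxed += v.x;
for (let i = 0; i < 1000; i++) sumUnboxed += unboxed[2 * i];
```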

Actually the difference in performance between JS and WebASM[ASM.js] is only about 10-20% anyway, so that is in the 2%-10% region you claimed for the GC, so for some algorithms it could account for 100% of the difference...

Huh? JS is several times slower than C, whereas ASM.js is (afair) reportedly only about 40% slower than native C speed:

https://julialang.org/benchmarks/
https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=node&lang2=gpp

Perhaps you’re referring to JS that has been highly optimized to avoid the features which impair V8’s optimization (per the links above)?

Yes the choice is between static or runtime checking. The best is to use the stack, so there is no need for checking, but that does not work for all scenarios.

Generational GC is nearly as efficient as the stack (as I explained), and I am going to propose an algorithm that I think might make it even more efficient.
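A hedged sketch of why generational (nursery) allocation is nearly as cheap as the stack: a bump-pointer allocator needs only an add and a limit check per allocation, comparable to adjusting the stack pointer (the `Nursery` class below is hypothetical, not any engine’s actual implementation):

```typescript
// Hypothetical bump-pointer nursery: allocation is an add plus a compare.
class Nursery {
  private top = 0;
  constructor(private readonly capacity: number) {}

  // Returns the offset of the new object, or -1 to signal "minor GC needed".
  alloc(size: number): number {
    if (this.top + size > this.capacity) return -1; // trigger a minor collection
    const offset = this.top;
    this.top += size; // the "bump": one pointer increment per allocation
    return offset;
  }

  // After a minor GC evacuates survivors, the nursery is reset wholesale.
  reset(): void {
    this.top = 0;
  }
}

const nursery = new Nursery(1 << 20);
const obj = nursery.alloc(64); // roughly as cheap as pushing a stack frame
```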

What he fails to realise is that 2-10% is only true when you have enough memory.

It is true that the asymptotic computational complexity (i.e. performance) of tracing GC is pathological, both as live memory usage approaches or exceeds the bound of physical memory and as total memory consumption increases on computers with more GBs (or TBs) of DRAM. That is why I cited that research in footnote 16 of my prior post (to which, btw, my independently derived idea seems roughly analogous) for removing the pathological asymptotics by leveraging an optimized form of RC for the older generations. I will write my detailed post about this soon.

And that is why I allude in my response to @barrkel to the time vs. space trade-off approaching parity between automated GC and manual/stack/compile-time deterministic paradigms for memory management. The reason the deterministic paradigms you espouse will not exceed the space vs. time trade-offs for the throughput (nor maximum latency pauses) of state-of-the-art automated GC in general is, I repeat, that memory allocation management is inherently non-deterministic (and thus requires runtime management)!

Also there is research¹ (which I haven’t read yet) which seems to indicate that as the number of (perhaps also non-deterministic) factors that need to be managed increases (e.g. power and energy management issues), automatic algorithms may become the most plausible option (reasonable cost, time-to-market, stability, maintenance) for most applications, with hand-tuning being further and further relegated to the esoteric applications that demand extreme tuning (e.g. the NASA space shuttle).

Yes GC can be fast providing you have twice the RAM available.

Even manual memory mgmt will have virtual memory paging out to disk as the physical memory is exhausted.

That requirement for twice the RAM is afaik due to the “semi-space” of a copying collector that is not generational. Afaics, there are better alternatives now; see footnote 16 in my prior post. I need to study this more thoroughly though.
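For reference, a hedged sketch of why a non-generational semi-space copying collector reserves roughly twice the live heap: only one half is used for allocation at a time, and a collection evacuates survivors into the idle half before the two halves swap roles (a simplified model with a hypothetical class name):

```typescript
// Simplified semi-space model: half of the reserved memory is always idle.
class SemiSpaceHeap {
  private top = 0;
  constructor(private readonly halfSize: number) {} // total reserved = 2 * halfSize

  alloc(size: number): number | null {
    if (this.top + size > this.halfSize) return null; // a collection is required
    const offset = this.top;
    this.top += size;
    return offset;
  }

  // On collection, live objects are copied compactly into the idle half and the
  // spaces swap; everything not copied is reclaimed implicitly.
  collect(liveBytes: number): void {
    this.top = liveBytes; // survivors now occupy the start of the new from-space
  }
}
```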

Of course manual (both unchecked and compiler-checked) management can be a bit faster or correct some pathological cases (as well as potentially create other ones ;), but at great cost, as I wrote in my prior post as follows:

I apply the 80/20 or 90/10 Pareto principle to prioritization. Although all that special-case tsuris might be able to obtain an extra 10% of performance, it has the cost of 90% more effort on debugging, readability of the code, maintenance of the code, complexity of the code, etc.. When you really need that 10% boost (or in the case where the more generalized GC paradigm is much slower for some reason), then go write a module in C++ or Rust. I see no need to invest my effort in trying to recreate (or improve upon) those languages, because there are millions of man-hours already in those ecosystems. Our only prayer of creating something useful in this decade with our resources is to 90/10% prioritize on clean abstractions that are elegant and generalized and 90% performant with 10% of the effort and complexity.

And you alluded to the great cost in terms of bugs and/or tsuris:

using a system with manual allocation - providing the software does not crash due to bad memory management :-)

In my prior post, I had alluded to how attempting to have the compiler check memory management forces it into a deterministic total order, which will necessarily cause the programmer to violate the compiler checks and thus leads to bugs (unless perhaps that compile-time paradigm is somehow limited to only deterministic instances; but from my study thus far it appears that the cross-pollution/interop between young- and older-generation allocation gives a non-deterministic, runtime automated algorithm interoperating with a deterministic young generation worse characteristics than a runtime, automated, non-deterministic generational collector for the younger generation):

To even attempt to apply this compile-time deterministic memory management holistically requires a complex typing tsuris akin to Rust’s total ordering of borrow lifetimes with lifetime annotation parameters for transitivity of borrows from inputs to outputs, and of course the programmer must violate the invariants and create unsafe code in order to handle non-deterministic (i.e. runtime randomized) memory management, which defeats the entire point, as then the total order is no longer checked and the unsafety can leak back into the code the compiler thinks is safe.

[…]

Total ordering compile-time paradigms (including attempting to type check everything) are afaics brittle paradigms that break because the programmer must necessarily break out of them, since the entropy of the universe is not bounded. I commented on this previously numerous times in these issues threads. The non-determinism (i.e. unbounded randomness and entropy) of the universe (i.e. the real world in I/O) can’t be statically modeled.

¹ T. Cao, S. M. Blackburn, T. Gao, and K. S. McKinley, “The Yin and Yang of Power and Performance for Asymmetric Hardware and Managed Software,” in ISCA ’12: The 39th International Symposium on Computer Architecture, 2012. Originally found on Steve Blackburn’s website. Also T. Cao’s master’s thesis prior art.


@barrkel wrote:

The argument in favour of GC isn't speed of execution, it's speed of development at very low costs - an extra 2 to 10% or so. Speed of development in memory safe languages is significantly higher

+1.

Cited:

Hertz and Berger [2005] explore the tradeoff between automatic and manual memory management in an artificial setting, using a memory trace from a previous run of the benchmark to determine when objects become unreachable and inserting explicit calls to free objects at the moment they become unreachable. This analysis, while interesting in itself, ignores the enormous programming effort and overhead (as in talloc above) required to track the ownership of objects and determine when they are no longer used. The most frequently cited study is Rovner [1985], who found that Mesa programmers spent 40% of their time implementing memory management procedures and finding errors related to explicit storage management.

Also it is about consistency of memory mgmt, i.e. avoiding the pitfalls of the average programmer or rushed development team.

Yes, GC scales because it trades space for time. If you're running on a small embedded device, it can be a poor tradeoff; if you're trying to maximize compute throughput across multiple machines, it can cost real dollars. But these are extremes, and most people aren't on the extremes.

Well yes, but as I mentioned above in my reply to @keean (and I presume you’re also alluding to this), the differences in space vs. time tradeoffs between automatic and manual memory management are getting closer to low double-digit (or even single-digit) percentages (i.e. the “extremes” of optimization), and thus heavily favor automatic GC for most use cases. Note I’m not conflating this metric with the 2 - 10% figure for throughput differences that you asserted.

I mostly just butted in because accusing GC for JS / Typescript being slower than webasm seemed a bit outlandish. GC overhead is directly measurable; it's trivially disproved for anyone inclined to measure.

I'm gonna unsubscribe from this thread now.

Note though that the metric you propose (although significantly representative, all other factors being equal) doesn’t account for all impacts of the chosen GC scheme, such as cache and virtual-paging locality (contiguity), which minimizes thrashing and can significantly impact performance. Note however that deterministic compile-time memory management (manual and explicit, or partially implicit as with stack argument conventions) doesn’t necessarily achieve a panacea for this ancillary factor and could actually degrade it, because generational and copying GC increase locality.

I appreciate your assistance in helping to explain. And I agree that is an important clarification to emphasize. Please feel free to participate here as often or infrequently as you want.

Btw, @keean is an expert in other areas such as C/C++, Haskell, Prolog, type theory, and library algebras. Afaik, he has only been developing seriously with JS/TypeScript for a comparatively short period of time. So that might explain it. Or it could also be rushing to answer late in the evening after a long day of managing his company.

@barrkel wrote:

Types are part of it, but it's more about abstraction level. Higher abstraction means the runtime needs to perform more work to get to machine code, and will be using generalisations (aka baked in assumptions about trade-offs) to get there - and on a JIT time budget too. Whereas webasm is designed to be efficiently and unambiguously converted to machine code, so there will be fewer trade-offs, and the compiler targeting webasm can spend much more time analysing the trade-offs involved in reducing abstraction level.

[…]

And that's without getting into the costs of semantic fidelity. You can't simply pretend JS's prototype inheritance doesn't exist given typescript source; all the work needs to be done anyhow, just in case, since the web is a distributed system and thus non-deterministic.

+1.

@NodixBlockchain

NodixBlockchain commented Nov 22, 2017

I’m wondering whether that is a superstitious, anecdotal belief (i.e. aliasing error) or an assertion you actually studied, measured, and verified to be the case with in depth effort for comparing reference counting and tracing garbage collection. Because inherently reference counting doesn’t track the directed graph of memory allocation (i.e. all we have are reference counts and no back pointers to the references that comprise those counts); thus, there’s no feasible way to see which data structures have leaked because they’re no longer reachable from the untraced roots of the said graph in a reference counting GC system— for example due to cyclical references. Whereas, in theory a tracing garbage collector could provide such a graph for debugging.

If you consider references as objects with a delete operator, rather than only a pointer with a free, the destructor can release references to children before the reference itself is released.

It's not anecdotal: I have a fully working application server with its own memory management and object-oriented script language, which deals with thousands of objects, and I never spent more than an hour tracking a memory leak in the whole development cycle because I have a clear methodology to track them.

The script system is actually close to JavaScript, with a tree of objects on several layers, so I could make a generational GC very easily, but there are too many corner cases with multiple threads and plugins for me to really consider it over simple reference counting, which works for every single case possible with a very small code base.

I've been doing tons of experiments with different models of memory management.

And I also have a lot of experience with GC languages such as AS3, the Android SDK and JavaScript, and there are always many instances where I wish there were manual memory management, because the garbage collector won't really release the memory when it 'should'. It's very often macro-managed, with the GC only releasing things once in a while, which is far from optimal if there is a need for a lot of dynamic object creation inside a component that needs to run for a long time allocating lots of new objects.

And it's not even multi-threaded, and these kinds of problems are already recurrent in every GC language out there.

@shelby3

shelby3 commented Nov 22, 2017

I’m trying to preempt another 1000+ post thread. We already had extensive discussions of your ideas, framework, perspective in the Concurrency and your “Saluations! :)” threads. We don’t need to repeat that again. I’m not trying to be dictatorial or unappreciative of discussion. I’m very interested right now in automated GC, for the reasons I’m explaining. I want to see if we can make it work optimally with the lower-level abstractions and resolve the pathological asymptotics issue.

@NodixBlockchain wrote:

If you consider references as objects with a delete operator, rather than only a pointer with a free, the destructor can release references to children before the reference itself is released.

I already told you that explicitly asking the programmer to remember to break cycles is off-topic (i.e. the destructor will not be automatically fired if there is a cyclic reference preventing the parent-most object’s reference count from going to 0; see the sketch after the quote below):

This is off-topic. Explicit, unchecked (i.e. “unsafe”) manual mgmt is not what we are interested in here. The topic was about automatic (or compiler checked deterministic) mgmt. As you know, both @keean and I are significantly into compiler-assisted checking.
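To make the cyclic-reference point above concrete, here is a minimal hedged TypeScript sketch (hypothetical `RcNode` type, not anyone’s actual framework) showing why a naive reference count never reaches zero for a cycle unless the programmer manually breaks it:

```typescript
// Hypothetical naively reference-counted node.
class RcNode {
  refs = 1;                       // starts with one owning reference
  other: RcNode | null = null;
  retain(): RcNode { this.refs++; return this; }
  release(): void {
    if (--this.refs === 0) {
      this.other?.release();      // destructor releases its child reference
      console.log("freed");
    }
  }
}

const a = new RcNode();
const b = new RcNode();
a.other = b.retain();             // a -> b
b.other = a.retain();             // b -> a  (cycle)
a.release();                      // both counts drop to 1, never 0: nothing is freed
b.release();
// A tracing GC would reclaim both once they are unreachable from the roots;
// naive RC requires the programmer to break the cycle (e.g. a.other = null) first.
```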

Although you’ve deleted (or moved to your thread?) those numerous prior off-topic posts, as I asked you to (thank you), you want to repeat the same points of what I perceive to be off-topic noise w.r.t. the focus I want in this thread. I understand you think you have something important to say, but you apparently did not even pay attention or grok that I had already stated that it (i.e. the same point you’re repeating again) was/is off-topic.

@NodixBlockchain wrote:

but there are too many corner cases with multiple threads

We already discussed in the prior threads that (your and Rust’s paradigm of allowing/presuming) willy-nilly multithreading everywhere is not the paradigm I’m interested in. I’m looking at the single-threaded event model. I will be discussing GC and multithreading in this thread.

@NodixBlockchain wrote:

It's not anecdotal: I have a fully working application server with its own memory management and object-oriented script language, which deals with thousands of objects, and I never spent more than an hour tracking a memory leak in the whole development cycle because I have a clear methodology to track them.

How many times am I going to have to repeat myself that I am not interested in discussing further the anecdotal claims of your low-level framework? You’re apparently overlooking these more holistic points:

  • We’re (or at least I’m) not interested in your manual “unsafe” (i.e. neither compiler checked nor runtime enforced) paradigm. Been there, done that in the 1980s and early 1990s. In the late 1990s I adopted simplistic reference counting with C++ for CoolPage. Now I’m advancing further. IMO, you’re a dinosaur.

  • My reasons for not being interested are detailed in my prior response to @keean and @barrkel.

    Your preferences and experience with your framework are not indicative of the development costs and other trade-offs for the community-at-large with such a paradigm, i.e. your arguments are anecdotal. When you have a large community and a lot of research/surveys/metrics amongst many general-purpose applications refuting the exposition I have made based on more extensive sources, then we can discuss yours. No doubt that if you’re personally very dedicated and expert in hand-tuning with your framework specifically designed for the blockchain application you’re building, then you can probably get outstanding metrics for an anecdotal example application, but that isn’t provably (and IMO is unlikely to be) relevant in the larger scheme of things w.r.t. general applications and general developer productivity, maintenance, stability, bugginess, etc.. Again please refer to my prior response to @keean and @barrkel for my reasoning on why manual, human-error-prone management of resource factors is a computer science cul-de-sac. Whether my perspective is correct or not, afaics it’s the current focus of this thread. Even the owner of this repository @keean, who expressed some preference for some compile-time deterministic memory management, afaics doesn’t want to allow cyclic-reference memory leaks that require human intervention, as you’re apparently proposing.

    Additionally having your own methodology for your own handcrafted framework is not a valid comparison to what is possible but has not yet been so handcrafted for tracing GC (so as to be able to compare apples-to-apples for a diverse ecosystem of users and use cases), as I had already explained in a prior response as follows:

    In both cases, a sophisticated debugger and language could in theory be designed to set such conditional breakpoints on all pointer assignments (conditionally breaking only if the said shared object address is encountered), if such a facility is helpful for debugging memory leaks. And the tracing GC would have more information about the memory allocation graph.

If you’re interested in your paradigm in spite of what I perceive to be the convincing exposition in my prior posts, then please feel free of course to discuss it in a different thread. I may occasionally read there. If you make your best arguments there (not in this thread) and if I see anything I agree with, then I will quote it over here (and invite/entice you to discuss here). Please do not pollute my thread (at least respect the one post for every 10 others request, or 6 if all the intervening posts are lengthy and infrequent) with what I currently perceive to be off-topic discussion. I do not want an interactive chat-style discussion in this thread. We can chat back-and-forth in realtime in private or in another thread. I wanted focused, carefully thought out discussion in this thread that is easy to re-read and condensed enough for others to follow. Because this is my thread wherein I am trying to sort out what language paradigms I need; this is a very serious issue I am in a rush to complete, and I do not want the distraction of what I perceive to be a boisterous (repetitively arguing the same points over and over) and entirely incorrect dinosaur.

Essentially you’re trying to route me away from extensive sources to your pet project. That is very myopic.

You’re apparently a prolific and very productive programmer (unlike my pitiful self, reduced to rubble because I am still apparently suffering from a perhaps resistant strain of tuberculosis). You and I aren’t able to collaborate much because we are ostensibly headed in different directions w.r.t. programming paradigms, even while both aiming to maintain very low overhead on throughput.

@NodixBlockchain wrote:

And I also have a lot of experience with GC languages such as AS3, the Android SDK and JavaScript, and there are always many instances where I wish there were manual memory management, because the garbage collector won't really release the memory when it 'should'. It's very often macro-managed, with the GC only releasing things once in a while, which is far from optimal if there is a need for a lot of dynamic object creation inside a component that needs to run for a long time allocating lots of new objects.

It “should” only do so in a balance of throughput, asymptotics, pause latency, and (time & space) locality. If you’re prioritizing time locality (immediacy), then not only are you attempting the impossible of defying the non-determinism of memory management (i.e. immediacy is irrelevant when the period of allocation is non-deterministic), you’re also not optimizing the entire equation for memory management. This sort of myopia might not manifest as measured problems in your handcrafted project, due to the confirmation bias of handcrafted design decisions, but it may in a more diverse and general ecosystem of use cases and developers. Your exposition has the hallmarks of an amateur lacking academic study.

You’re referring to automated GC implementations from a decade or more ago. Even Google’s V8 JavaScript GC is only incremental and generational, with a standard mark-sweep (with its pathological asymptotics) for the older generation.

I wrote already in this thread that new research has drastically improved this issue:

What he fails to realise is that 2-10% is only true when you have enough memory.

It is true that the asymptotic computational complexity (i.e. performance) of tracing GC is pathological, both as live memory usage approaches or exceeds the bound of physical memory and as total memory consumption increases on computers with more GBs (or TBs) of DRAM. That is why I cited that research in footnote 16 of my prior post (to which, btw, my independently derived idea seems roughly analogous) for removing the pathological asymptotics by leveraging an optimized form of RC for the older generations. I will write my detailed post about this soon.

And that is why I allude in my response to @barrkel to the time vs. space trade-off approaching parity between automated GC and manual/stack/compile-time deterministic paradigms for memory management. The reason the deterministic paradigms you espouse will not exceed the space vs. time trade-offs for the throughput (nor maximum latency pauses) of state-of-the-art automated GC in general is, I repeat, that memory allocation management is inherently non-deterministic (and thus requires runtime management)!

You make me repeat myself because you don’t assimilate all the details I write. Presumably you’re so glued to your mental model of programming that you don’t notice subtleties that paradigm-shift the presumptions.


@keean wrote:

The reference count needs to be in the object referenced, not the pointer, so it won't be in the same cache line as the pointer. When you have multiple pointers to the same object they all need to increment/decrement the same counter.

Agreed, but note that in my prior post with the many footnotes about GC, I cited some research (Weighted Reference Counting) that places the count in the pointers for allocation and only has to decrement the shared count on deallocation. I also cited research about employing reference counting with only a single bit stored in the pointer, in a reference-counting generation scheme. Also, in an earlier post I suggested pulling all the counts closer together in contiguous locality to minimize cache-line and virtual-page faults. There’s also the concept of buffering the increments and decrements to amortize their cost.
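A hedged sketch of the Weighted Reference Counting idea mentioned above (hypothetical types, not a production design): copying a reference only splits the weight carried by that reference, so the shared object is written only when a reference is dropped.

```typescript
// Hypothetical weighted-reference-counting sketch. The object stores a total
// weight; each reference carries a share of it.
class WeightedObj {
  constructor(public totalWeight: number) {}
}

class WeightedRef {
  constructor(public obj: WeightedObj, public weight: number) {}

  // Copying a reference splits its weight in half: no write to the shared
  // object on the copy path, so no cache-line contention there.
  copy(): WeightedRef {
    if (this.weight === 1) {
      // Degenerate case: typically handled with an indirection cell, or by
      // topping up the object's total weight (touching the object once).
      this.obj.totalWeight += 15;
      this.weight += 15;
    }
    const half = this.weight >> 1;
    this.weight -= half;
    return new WeightedRef(this.obj, half);
  }

  // Dropping a reference subtracts its weight from the shared total; the
  // object becomes reclaimable when the total reaches zero.
  drop(): void {
    this.obj.totalWeight -= this.weight;
    this.weight = 0;
    if (this.obj.totalWeight === 0) {
      console.log("reclaim object");
    }
  }
}

// Usage: the initial reference carries the object's full weight.
const shared = new WeightedObj(16);
const r1 = new WeightedRef(shared, 16);
const r2 = r1.copy();  // r1.weight = 8, r2.weight = 8; the object is untouched
r1.drop();             // totalWeight = 8
r2.drop();             // totalWeight = 0 -> reclaim
```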

@NodixBlockchain

NodixBlockchain commented Nov 22, 2017

The thing is, my paradigm is the basis for a high-level language, which is fully working with JS/JSON, can generate binary data for WebGL apps, or other kinds of dynamic applications.

Even Node.js or a JavaScript VM uses this kind of code internally. You can't have any kind of high-level language without this kind of code underlying it.

And for the moment it gives more features than JavaScript, as it works in a multi-threaded environment and memory can be freed safely when it's not used directly.

The script language can already process HTML data, including form variables and query variables, and interact 100% with a JS app, and I think it's still cleaner than PHP.

For example: https://github.com/NodixBlockchain/nodix/blob/master/export/web/nodix.site#L555

It can parse POST variables, including files, and generate JS variables and code with live variables from the node, etc.

The low-level mechanism is only to be seen as a minimalistic VM to run the script language, which either handles network events or generates HTML pages with dynamic data.

Any kind of advanced high-level language will need this sort of code to manage memory, references, multiple threads, etc.

The lexer/VM can be very simple because the low-level code already has advanced features to deal with objects, memory, scope, and evaluation of object data.

shelby3 commented Nov 22, 2017

@NodixBlockchain, you’re repeating the same points I already discussed with you in the Concurrency and your “Saluations! :)” threads. I refuse to repeat my explanations again as to why what you think is applicable is not applicable to what I am doing. It’s not my job to repeat myself over and over, just because you think I didn’t already refute you before in those other threads.

Please stop. Please.

You can't have any kind of high-level language without this kind of code underlying it.

Any kind of advanced high-level language will need this sort of code to manage memory, references, multiple threads, etc.

Incorrect. And I do not want to explain why again.

And for the moment it gives more features than JavaScript, as it works in a multi-threaded environment

Your paradigm for multithreading is not applicable to the direction I’m headed, so I will repeat what I wrote before, quoted as follows:

but there are too many corner cases with multiple threads

We already discussed in the prior threads that (your and Rust’s paradigm of allowing/presuming) willy-nilly multithreading everywhere is not the paradigm I’m interested in. I’m looking at the single-threaded event model. I will be discussing GC and multithreading in this thread.

I’m not trying to be a jerk. I have clear reasons for the focus that I’m pursuing. I hope you also understand my available time is very limited (and no that is not a valid reason that I should employ your framework).

Sometimes I ponder if the trolls at BCT paid you to come and disrupt me. I’m having enough difficulty as it is making forward progress, and I need to be taken off on repetitive tangents like I need another hole in my head in addition to the one I already have from a hammer.

shelby3 commented Nov 22, 2017

@keean wrote:

Interestingly I am about to port a bunch of asm.js to typescript, as the memory allocation for asm.js / webasm does not work well with the main JS garbage collector, and allocating the heap for the webasm is causing fragmentation of the JS heap. In the end it turns out that the better memory usage, keeping all them memory managed by the GC is better for the application stability than the small performance gain of having the code in asm.js.

Edit: its probably worth pointing out that this application uses a lot of memory, and has some periods when it is performing rapid allocations. This leads to memory usage for a single tab of about 1GB (even though memory buffers are 16mb so with manual allocation it would get nowhere near that level). Frequently when usage gets more than 1GB the web-browser crashes, even though most of that memory is probably free, and just waiting to be collected.

Agreed; afaik you appear to be making a valid and important point, which dovetails with my initial peek into analyses of GC for proper integration with low-level coding.

The problem you’re experiencing perhaps has something to do with the fact that most browsers are limited to presuming a least-common-denominator 32-bit virtual address space, and it gets very complicated as to how they split this up between processes. They also have to protect against out-of-bounds writes efficiently. I dug a little bit into the issues (c.f. the last 2 links in the quoted text below) but did not completely climb down the rabbit hole.

@shelby3 wrote:

For this last quoted issue, I am contemplating that employing the unused bits (c.f. also) of 64-bit pointers (64-bit also provides a huge virtual memory space, c.f. also) would enable storing an index into a contiguous array which has the reference counts as array elements, so they would be in contiguous pages.

I had mentioned to you in private discussion my idea about storing older-generation objects in memory pages of same-sized objects in order to minimize fragmentation with a 64-bit virtual address space. This would rely on less frequent compaction pauses for old-generation objects, and would reuse old-generation slots instead of waiting to compact them (i.e. not primarily a copying collector).

Also note that very large or small objects are typically handled differently by GC algorithms, so there may be an issue there w.r.t. the 16MB objects you’re allocating.

I agree that the memory allocation seems to have corner cases around the interaction of the GC in JS and the allocation of the ArrayBuffer for ASM.js/WASM. Also of course objects have to be copied from one to the other; they’re not interoperable.

However, as I pointed out in prior posts, just because automated GC has a corner issue with a particular scenario doesn’t enable the presumption that manual memory mgmt is a panacea in terms of the holistic equation of memory mgmt optimization. You gain some control in one area and lose something on another axis of the optimization space (e.g. more human-error prone, less throughput, or worse asymptotics, etc.).

@shelby wrote:

It “should” only do so in a balance of throughput, asymptotics, pause latency, and (time & space) locality. If you’re preferring priorities of time locality (immediacy), then not only are you attempting the impossible of defying the non-determinism of memory mgmt (i.e. immediacy is irrelevant when the period of allocation is non-deterministic), you’re also not optimizing the entire equation for memory mgmt. This sort of myopia might not manifest as measured problems in your handcrafted project due to the confirmation bias of handcrafted design decisions, but may in more diverse and general ecosystem of use cases and developers. Your exposition has the hallmarks of an amateur lacking academic study.

@shelby3 wrote:

Note though that the metric you propose (although significantly representative given all other factors equivalent) doesn’t account for all impacts of the GC scheme chosen, such as cache and virtual paging locality (contiguity) to minimize thrashing which can significantly impact performance. Note however, that deterministic compile-time (manual explicit or for stack argument conventions and such thus assisted partially implicit) memory mgmt doesn’t necessarily achieve a panacea for this ancillary factor and could actually degrade it because generational and copy GC increase locality.

I’m contemplating whether I can design my own programming language which models what I want, will tolerably transpile to JS/TypeScript (or something else) easily enough to allow me to use it now, and can then have its VM and compiler rewritten later for optimal performance. This is what I’m analysing now and the reason I’m doing research on GC.

@keean wrote:

Following this argument to its logical conclusion, the better memory usage of manually managed memory for the whole application would be better overall.

How do you arrive at this conclusion? Have the posts I have made since you wrote that, changed your mind?

Afaics, the optimum solution is to better design the GC and VM with the low-level coding in mind. JavaScript was designed by Brendan Eich in 10 days. I presume that since then backward compatibility demands have prevented any holistic overhaul.

Also I’m prepared to deprecate 32-bit OSes as a viable target. I’m future directed because anyway, we’re not going to get something popularized within a couple of years at best. Afaik, even mobile is moving towards 64-bit rapidly. Microsoft Windoze may be a bit resistant, but 64-bit Home is available for those who care when they buy a machine. And anyone still running that spyware without dual booting or running Linux in a VirtualBox is really behind the curve.

NodixBlockchain commented Nov 22, 2017

Sometimes I ponder if the trolls at BCT paid you to come and disrupt me.

Just have to comment on this, you're paranoiac dude lol I don't have anything to do with anyone on BCT lol

keean commented Nov 22, 2017

How do you arrive at this conclusion? Have the posts I have made since you wrote that, changed your mind?

Because I am good enough to get the manual memory management right, and it will use a lot less memory, which will result in less fragmentation, and less crashes and therefore a better user experience.

I am still convinced that static analysis and region based memory management is a sweet spot for high performance languages.

I think GC is useful for simple coding, but there are some big problems:

  1. it encourages object thrashing
  2. it suffers from fragmentation
  3. it causes user-interface jank due to the 'stop-the-world' nature of GC.

If you have a system with all three of the above problems, there is nothing you can do as a programmer within the language to solve the problem.

You can however solve problems (2) and (3) if you can avoid (1) by limiting object creation. If you can force pre-allocation of resources, and then use effects to restrict new object creation, so that the compiler will prevent you using idioms that thrash the GC then it could be acceptable. The problem is that you need all the library APIs to respect this too, and provide a way for the user to pass the result buffer into the function/procedure. As we know if you give people multiple ways to solve a problem they will probably pick the wrong way.

This is why I am interested in a language that forces people to do things the right way, so that the library APIs will always be written the correct way. Of course this might not make the language very popular, but I think it is an interesting approach to explore.
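
As a rough illustration of the pre-allocation/result-buffer discipline described above (the function and names are hypothetical, sketched in C++): the API is shaped so the caller passes in the output buffer, and the steady-state loop performs no allocation at all.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical API shape: the caller supplies the result buffer, so the
// steady-state loop allocates nothing. Returns the number of values written.
std::size_t downsample(const std::vector<float>& in,
                       std::size_t stride,
                       std::vector<float>& out) {
    out.clear();                        // keeps the caller's capacity
    for (std::size_t i = 0; i < in.size(); i += stride)
        out.push_back(in[i]);           // no reallocation once capacity suffices
    return out.size();
}

int main() {
    std::vector<float> samples(1'000'000, 1.0f);
    std::vector<float> result;
    result.reserve(samples.size());     // pre-allocate once, up front
    for (int frame = 0; frame < 60; ++frame)
        downsample(samples, 4, result); // zero allocations per frame
}
```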

shelby3 commented Sep 12, 2018

@keean I was careful to point out that I think Jon’s experience applies when trying to force FP on Rust. The benchmarks you cite were hand-coded to use the most optimum programming paradigm for each PL. So I think Jon is probably correct that Rust will not be so decisively more performant when someone is trying to use Rust as a FP language. FP typically doesn’t work well without automatic GC. I continue my stance (until proven otherwise) that Rust is an inflexible PL that forces the programmer to pay a huge price in productivity, expression, and thus readability in order to obtain its vaunted performance and safety guarantees. Some people may justify that in some use cases. I just can’t agree with those who claim that Rust is the best future PL for statically typed programming. I still posit (and hope) we can improve on Rust with the oft-mentioned Actor paradigm we’re contemplating.

In short, I agree with you and I also give some credit to Jon’s point. You are both probably correct, but you are more correct that mark-and-sweep GC is not acceptable moving forward, while Jon is probably also correct that Rust’s lifetimes are not flexible enough to be acceptable moving forward. That’s why I have been looking at other paradigms for automated GC. And I am excited about the “actor epiphany” proposal I devised with your instigation. Interested to see if that will be the sweet spot that checks off all the needs:

  • flexibility
  • statically proven safety (including data race and exceptions safety)
  • performance
  • resource (e.g. memory consumption) efficiency
keean commented Sep 12, 2018

@shelby3 I must have misunderstood what he was saying. When I read it he seemed to be implying that GC was faster than manually managed memory and used more space, both things I have heard before, but which real-world experience and benchmarks do not seem to support.

shelby3 commented Sep 12, 2018

@keean wrote:

When I read it he seemed to be implying that GC was faster than manually managed memory and used more space

I am also interpreting what he wrote as implying that broader claim, but I realized that he is really complaining about how Rust will perform in a FP use case scenario (even if he didn’t realize or at least didn’t explicitly acknowledge that his claim is narrower).

sighoya commented Sep 12, 2018

@shelby3
@keean

Jon Harrop replied again.

And I mostly share his opinion on the following points:

Manual management is not zero cost: deallocation is done at runtime, not at compile time; only the decision to do it is made at compile time.
Freeing memory immediately after it goes out of scope is not the best strategy, as repeatedly freeing small objects can be more costly than purging them all together when a buffer is full.

Rust's memory management only covers compile-time-decidable deallocations, by forcing a global order on its unique pointers, which constrains user flexibility.
For the other cases, Rust falls back to reference counting each object with Rc or Arc, which is known to be inferior to tracing GC.

There is another point.
Because of the large amount of non-determinism in real-world applications, you can never manage all memory at compile time; you will need some form of GC (object pools simulate GC). And with runtime knowledge, you can be more clever about when it is convenient to deallocate memory. Manual memory management can't do this and will never be able to.

Typical code written in Rust or C++ is, in my eyes, slower than code written in a good PL with GC, because outperforming a world-class GC is a hard task these days and incurs development time and therefore costs.
Enterprises are not interested in writing code with the highest performance in the first place; this is one of the reasons why so many enterprises are satisfied with Java and aren't willing to pay for a 10-20% increase in speed.
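
A minimal C++ sketch of the batching point above (the class is hypothetical): instead of freeing each small object the instant it dies, the frees are queued and purged together once a buffer fills, amortizing the per-free cost.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical deferred-free buffer: small objects are not freed one by one
// the moment they die; the pointers are queued and purged in one batch when
// the buffer fills, amortizing the per-free overhead.
class DeferredFree {
    std::vector<void*> pending_;
    std::size_t limit_;
public:
    explicit DeferredFree(std::size_t limit) : limit_(limit) {
        pending_.reserve(limit);
    }
    void release(void* p) {
        pending_.push_back(p);
        if (pending_.size() >= limit_) purge();
    }
    void purge() {
        for (void* p : pending_) std::free(p);
        pending_.clear();
    }
    ~DeferredFree() { purge(); }
};

int main() {
    DeferredFree deferred(1024);
    for (int i = 0; i < 100000; ++i) {
        void* tmp = std::malloc(64);  // short-lived small allocation
        deferred.release(tmp);        // actually freed later, in batches
    }
}                                     // destructor purges the remainder
```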

keean commented Sep 12, 2018

@shelby3 I disagree about Java: we don't use it because it would cost us much more in resources to run the software. We do use languages with GC, just not Java.

I agree that for most business process logic GC is fast enough, but I disagree with his analysis of GC being faster in general. I think it's easy to speculate and be wrong. Real examples of code that performs better with GC would be needed to make that point believable.

shelby3 commented Sep 13, 2018

@keean Jon replied again and I think he is making some valid points. I do not think he is arguing that the current GC we have in Java is more performant in every use case compared to for example Rust or C++. Rather he is arguing that manual memory management is not the best balance we can possibly achieve (for most programmers and most “usual cases”) nor is it best in every use case. He may actually be trying to make a stronger argument than my summary, but for now I will stick with the interpretation that his analysis is more nuanced than just stating that Java’s GC is faster. I think @sighoya summarized it neatly above.

@keean I do agree with your objections about automatic memory management being difficult to control and having in some cases pathological memory footprints. Also I agree with @NodixBlockchain’s concise summary of points I have made in the past. Thus I think if we will have AMM (automatic memory management), then it must address those concerns of yours. Perhaps this “actor epiphany” idea we’re exploring might be such a sweet spot in the design space.

I replied to Jon on Quora:

Java (OOP + type erasure - value types) occupies a very specific corner of memory management where allocation rates are far too high and programs generate far too much short-lived garbage (IMO) and the VM is then optimised to handle this I-wouldn’t-start-from-here situation. Do these results about Java generalise to other paradigms and data representations? I’m skeptical. Just adding value types gets you a much flatter heap topology and a much lower allocation rate.

Agreed. There may be sweet spots in the design space which have not been fully explored yet. I’m onboard with the bump-pointer generational GC. I’m just not sure that mark-and-sweep is what we need to go along with it, per my other mention of the design idea we’re just recently exploring.

An automatic mark-and-sweep is reasoning at runtime, which will never be as efficient as reasoning at compile-time,

I am skeptical. Reasoning at compile-time is extremely limited because so little information is available.

Miscommunication. I meant to imply that given the case of symmetric information, reasoning at compile-time is obviously more efficient. Indeed you’re correct to point out (as I have also written in the Github about my stance against Rust) that static reasoning is too inflexible and limiting. Also your point has support, for example, from the speculative execution pipelines in the CPU, which have more information at runtime and thus are more effective. Keean pointed out that Itanium failed because static analysis of instruction-level parallelism (ILP) is less effective.

Their oracular memory management appears to be very different from manual memory management. With manual memory management the programmer reasons about the program using only information that is available at compile time and using only the tools available. In the real world, I see two important consequences in practice: programmers end up being extremely conservative about lifetime (i.e. they keep a lot of floating garbage around because it makes it easier to reason about the correctness of their memory management) and programmers often end up reaching for extremely inefficient one-size-fits-all solutions like reference counting in order to fix things like nasty concurrency bugs. In contrast, this oracular memory management uses extensive information gained at run-time from a previous run of the program and is presumably vastly more aggressive than a real programmer at deallocating as a consequence.

Agreed. That needs to be considered. For example, I wouldn’t want to use ARC generally without some mark-and-sweep probing for cyclical references. Immutability may help make that more efficient. In the idea I am exploring, only shared references are ARC, and they will typically be immutable (the exception being exclusive writable objects that are moved instead of shared between Actors).
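
For illustration, a minimal C++ sketch of the cycle problem that motivates such probing, with `shared_ptr` standing in for ARC: two objects that own each other never reach a zero count.

```cpp
#include <memory>

// Two nodes that own each other through shared_ptr (ARC) never reach a zero
// count; a weak (non-owning) edge, or a tracing/probing collector, is needed
// to reclaim them.
struct Node {
    std::shared_ptr<Node> next;   // strong, owning edge
    // std::weak_ptr<Node> next; // a non-owning edge would break the cycle
};

int main() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->next = a;   // cycle: both counts are now 2
}                  // a and b go out of scope: counts drop to 1 and 1 -- leaked
```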

You seem to be saying that the detailed exposition of issues is beyond what can be reasonably summarized?

The issues can be enumerated but I cannot see how to quantify them in a useful way.

It’s a complex design space. Perhaps there can be some better attempts at quantification, yet they’ll probably always be incomplete and thus potentially misleading in some use case scenarios.

As you know, Rust’s borrowing lifetimes can prove these are safe at compile-time. No runtime overhead.

I wouldn’t call stack allocations no runtime overhead. To give a concrete example, look at the ABIs for returning complex numbers in C99 and C++. C99 returns them in registers whereas C++ returns them on the stack (sret form). C++ can be a lot slower in this context and has no facility to return them in registers. So what you are calling “zero cost” is slower than an alternative design.

That might be an orthogonal issue. I would think that proving lifetime invariants would aid a hypothetical compiler in register allocation. However, we do come back to your original point about register pressure without TCO. I haven’t really thought about how Rust’s lifetimes could interoperate with TCO, if at all. Seems that Pony’s reference capabilities (for data race safety) would be able to interoperate with TCO:

https://tutorial.ponylang.org/gotchas/recursion.html

Tail calls? They’re useful for representing state machines efficiently and state machines are ubiquitous in the Actor model.

Definitely must have them. AFAICT Pony’s reference capabilities should be compatible with tail calls.

Purely functional data structures?

https://en.wikipedia.org/wiki/Purely_functional_data_structure

We’ve discussed for example implementing collections with lazy iterators and we should have lazy primitives and pureness annotations. Memoization though might need to be manually encoded. This is still on the todo list to study in more detail.
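
As a rough sketch of that lazy-iterator composition (shown here with C++20 ranges rather than any hypothetical Zer0 syntax): the stages are fused and demand-driven, so no intermediate collections are materialized and no recursion is required.

```cpp
#include <cstdio>
#include <numeric>
#include <ranges>
#include <vector>

// Lazy, composed pipeline (C++20 ranges): the stages are fused and evaluated
// on demand, so no intermediate collections are materialized.
int main() {
    std::vector<int> xs(100);
    std::iota(xs.begin(), xs.end(), 1);              // 1, 2, ..., 100

    auto pipeline = xs
        | std::views::filter([](int x) { return x % 2 == 0; })
        | std::views::transform([](int x) { return x * x; })
        | std::views::take(5);                       // nothing computed yet

    long sum = 0;
    for (int v : pipeline) sum += v;                 // first 5 even squares only
    std::printf("%ld\n", sum);                       // 4+16+36+64+100 = 220
}
```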

keean commented Sep 13, 2018

@shelby3

@keean Jon replied again and I think he is making some valid points. I do not think he is arguing that the current GC we have in Java is more performant in every use case compared to for example Rust or C++. Rather he is arguing that manual memory management is not the best we can possibly achieve nor is it best in every use case. He may actually be trying to make a stronger argument than my summary, but for now I will stick with the interpretation that his analysis is more nuanced than just stating that Java’s GC is faster.

Speculation is easy. I want to see an example where any program is measurably faster in a GC language like Java compared to an implementation in a non-GC language. I think these kind of arguments sound good in theory, but in practice I have never seen them to be true. Rust has shown that you can have 'C' level performance with abstraction (traits/typeclasses) and no performance penalty. So what is left to explain the worse performance of other languages? You can no longer argue the higher level abstraction makes it slower.

Note: I am not arguing against automatic memory management in a mainstream language, I think it is essential.

There is a big issue though, and that is when the GC performance is poor, you are reduced to looking at tooling to try and understand why. This can be very difficult. The advantage for Rust is that lifetimes are an abstract concept that we can see in the source code and reason about, so poor memory performance becomes something we can deal with in code-reviews, rather than something we have to deal with using instrumentation and tooling whilst the program is running.

shelby3 commented Sep 13, 2018

@keean wrote:

Speculation is easy. I want to see an example where any program is measurably faster in a GC language like Java compared to an implementation in a non-GC language. I think these kind of arguments sound good in theory, but in practice I have never seen them to be true. Rust has shown that you can have 'C' level performance with abstraction (traits/typeclasses) and no performance penalty. So what is left to explain the worse performance of other languages? You can no longer argue the higher level abstraction makes it slower.

Is Jon correct to imply that Rust’s lifetimes are incompatible with tail calls? If yes, then your argument hinges on the programmer having to program in a certain way that avoids tail calls.

Now I think we can probably side-step recursion with lazy iterators and still achieve the same functional compositionality. So I am thinking Jon’s complaint about lack of tail calls may be a red-herring. Anyway apparently we can get them with Pony’s reference capabilities.

But I think he is absolutely correct that we can’t have the same level of algorithmic expression flexibility with Rust lifetimes as compared to the bump-pointer heap I have proposed for the “actor epiphany”. In fact, we know we can’t express certain constructs with lifetimes, because lifetimes are not smart enough to see they are safe. And that bump-pointer heap I have proposed for the “actor epiphany” will be more efficient than Rust’s lifetimes because of cache locality!! Instead Rust would have been putting it all on the non-bump-pointer heap with ARC.

So you are comparing apples-to-giraffes. 😁 (intended to be humorous)


Indeed you can probably devise some complex memory regions design in order to code something in Rust as fast as it would be with my/our “actor epiphany” but it’s going to be non-standard (i.e. in a custom library of yours), difficult, and even more difficult for others to understand.

Being able to make something more performant because you’re a programming guru is not a solution if your productivity drops into the toilet and/or new hires can’t read your code!

I stumbled onto this comment which is about an unrelated topic but analogously applies:

Have you ever tried to implement a new collection that sits in the collections hierarchy? I have. It's awful. It is unbelievably difficult to get right. And as a consequence, NO ONE DOES IT! Pretty much everyone just sits outside of the collections hierarchy and has conversion methods to and fro, because that's a radically easier thing to do. And so, part of the major selling point of CanBuildFrom, all of the Like traits and the inheritence, simply goes away! These aren't useful features because no one can use them, because they're far to[o] complicated and arcane.


Note: I am not arguing against automatic memory management in a mainstream language, I think it is essential.

So why even argue for something you know we can’t possibly accept for the design of Zer0?

Who are you making your PL for?

I think the key issue is how close we can get to Rust/C++ performance with this “actor epiphany”. If we can get within the noise band of difference for single-threaded, then Rust/C++ have no chance to compete at massively multi-core! Game over. Checkmate.

There is a big issue though, and that is when the GC performance is poor, you are reduced to looking at tooling to try and understand why. This can be very difficult. The advantage for Rust is that lifetimes are an abstract concept that we can see in the source code and reason about, so poor memory performance becomes something we can deal with in code-reviews, rather than something we have to deal with using instrumentation and tooling whilst the program is running.

Ah, when you need to use memory regions and other complex designs, then your Rust code could potentially also require you to use tooling to understand what your code is actually doing. Bugs that programmers generate, such as overflowing with unnecessary allocations, are really the same debugging issue whether you use ARC in Rust or AMM.

I think the problem here about control is really mark-and-sweep and uncontrolled heuristics about collection intervals. So pray tell me how my “actor epiphany” idea is any worse than what you could do with Rust??? I can’t see it. I see only improvement.

shelby3 commented Sep 13, 2018

@keean I think perhaps my prior comment is not 100% clear as to my stance. What I am saying is we can’t just take some optimally coded performance benchmarks as gospel and the whole truth. The actual truth is more nuanced. Although I agree with you that mark-and-sweep is difficult to control, maybe there is something better as a balance between Rust’s lifetimes and mark-and-sweep. And again, I am excited to find out whether this “actor epiphany” idea is that sweet spot or not.

keean commented Sep 13, 2018

@shelby The thing is, people should not be implementing their own collections; they should use the ones in the library. So when I insert some data into my index map, how does GC help? I have to remember to remove the object from the index. Personally I think GC is focusing on too low a level. Memory management should be automatic; I don't want to see malloc/free or new/delete. However, I expect to see a lot of insert/delete on collections. In this way I don't think lifetimes help, nor does GC: if you don't remove things when they are not needed you will leak memory.
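
To make the index-map point concrete, a minimal C++ sketch (the sessions map and its functions are hypothetical): regardless of how memory is reclaimed, the entry stays logically live until someone remembers to erase it.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical long-lived index: whatever reclaims memory (GC, ARC, ...),
// an entry stays reachable -- and therefore alive -- until someone remembers
// to erase it. Forgetting close_session() is a logical leak no collector or
// lifetime checker can flag.
struct Session { std::string user; };

std::unordered_map<int, std::shared_ptr<Session>> sessions;

void open_session(int id, std::string user) {
    sessions[id] = std::make_shared<Session>(Session{std::move(user)});
}

void close_session(int id) {
    sessions.erase(id);   // without this, the Session is kept alive forever
}

int main() {
    open_session(1, "alice");
    close_session(1);     // only now can the Session actually be reclaimed
}
```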

Ichoran commented Sep 13, 2018

There are no cases I know of where, with careful attention to memory allocation, you can do better with GC than without. But there are some cases where, using standard best practice, you do better with GC. A notable example is the binary-trees benchmark in CLBG: https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html

Note that the basic "C gcc" implementation, which naively mallocs/frees tree components, takes 30s (single threaded) to complete. The "Java #3" implementation is similarly straightforward and single-threaded, yet takes only 15s of CPU time (12s wall clock; GC is parallelized).

This has been my (informal) experience with code that is like binary-trees: if you build up lots of transient data structures that require heap allocation, you are faster with GC than with even an explicit per-item scheme, let alone ARC which has additional overhead. However, I have yet to find a case where careful attention to memory is not still radically better (e.g. the memory arena versions, which take ~8s of CPU time and are parallelized). Note that Rust's version uses a memory arena but, because of a lack of placement new, has to copy the key data structure twice (construct on stack, then copy into the memory arena, instead of constructing in place in the memory arena), and is consequently almost twice as slow as the best C/C++ implementations. (Rust is going to get placement new at some point and then should mostly catch up.)

Just FYI, since you were talking about having hard data about GC vs non-GC benchmarks.

shelby3 commented Sep 13, 2018

@keean wrote:

The thing is people should not be implementing their own collections, they should use the ones in the library. So when I insert some data into my index map, how does GC help? I have to remember to remove the object from the index.

In this way I don't think lifetimes help, nor does GC, if you don't remove things when they are not needed you will leak memory.

The collections library can’t call free when an element is removed or replaced, because it doesn’t know if there are other references to the same object (unless you copy everything or basically use value semantics everywhere, which would be inefficient). There must be ARC, lifetimes, or some other form of knowledge about when an object has no more extant references, such as mark-and-sweep or memory regions.
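
A minimal sketch of that point in Rust (Rust’s `Rc` standing in here for whatever ARC/GC mechanism the language would use; the collection and names are purely illustrative): removing an element from the collection cannot free it while another reference is still live.

```rust
use std::rc::Rc;

fn main() {
    // A "collection" holding reference-counted elements.
    let mut index: Vec<Rc<String>> = Vec::new();
    let item = Rc::new(String::from("shared object"));
    index.push(Rc::clone(&item));

    // An outside reference (`item`) to the same object still exists,
    // so removing it from the collection must NOT free the memory.
    let removed = index.pop().unwrap();
    drop(removed);                          // only decrements the count
    assert_eq!(Rc::strong_count(&item), 1); // the object is still alive
    println!("still alive: {}", item);      // freed only when `item` is dropped
}
```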


@Ichoran wrote:

But there are some cases where using standard best practice you do better with GC. A notable example is the binary-trees benchmark in CLBG: https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html

Note that the basic "C gcc" implementation, which naively mallocs/frees tree components, takes 30s (single threaded) to complete. The "Java #3" implementation is similarly straightforward and single-threaded, yet takes only 15s of CPU time (12s wall clock; GC is parallelized).

That’s a good example of the point Jon Harrop was making and which I intuitively understood would be the case.

However, I have yet to find a case where attention to memory is not radically better yet (e.g. the memory arena versions, which have ~8s CPU time and are parallelized).

I think we need memory regions for Zer0 (which can employ efficient bump-pointer allocation and a single region-wide deallocation), because it will be very slow to build, for example, an immutable binary tree for sharing with another Actor using ARC. We can’t use the efficient Actor-local bump-pointer heap of my “actor epiphany” because that will only be for non-shared data.
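
As a hedged illustration of what such a region could look like (a sketch in Rust rather than Zer0, using offsets instead of raw pointers purely to keep it safe; none of these names come from an actual Zer0 design):

```rust
// A toy bump-pointer region: `alloc` just advances an offset into one big
// buffer; dropping the region is the single region-wide deallocation.
struct Region {
    buf: Vec<u8>, // backing storage, released all at once when dropped
    used: usize,  // the "bump pointer" (offset into `buf`)
}

impl Region {
    fn with_capacity(bytes: usize) -> Self {
        Region { buf: vec![0u8; bytes], used: 0 }
    }

    // Reserve `size` bytes aligned to `align`; returns an offset into the region.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = (self.used + align - 1) / align * align; // round up to alignment
        if start + size > self.buf.len() {
            return None; // region exhausted; a real design might chain regions
        }
        self.used = start + size;
        Some(start)
    }
}

fn main() {
    let mut region = Region::with_capacity(1024);
    let a = region.alloc(16, 8).unwrap();
    let b = region.alloc(32, 8).unwrap();
    println!("offsets: {} and {}, {} bytes used", a, b, region.used);
} // `region` dropped here: everything it held is freed in one go
```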

@keean you said you prefer control instead of mark-and-sweep heuristics, so that’s what I’m contemplating providing.

Pony would instead give you GC automation without the regions, but there’s a performance penalty.

keean commented Sep 13, 2018

@shelby I liked the idea of memory regions when I looked into them.

I really like the idea of automatic memory management by default. @Ichoran's example highlights that GC does better than naive management. We should enable beginners, and lazy people like me, to have GC by default, so you get the 15s performance without trying.

Then I like the idea of being able to tune the memory handling to get the 8-second version where you care about it. In most programs 80% or more of the code is not in performance-critical sections, so let's have something simple and safe by default.

Then when performance is critical let's have some concept of thread priority, so we can use regions to manually take over management, but with the guarantee that the GC will clean up after the critical section if you leaked anything.

shelby3 commented Sep 15, 2018

Summarized the concurrency issue w.r.t. recent discussions in this thread.

shelby3 commented Sep 15, 2018

I have written up a detailed new issue on the new Zer0 repository based on our up-thread discussion:

Region-based memory management, aka “Arena” pattern

Please come on over to that GitLab issue and comment there instead of here if you don’t mind. I put considerable effort into that post linked above, as you will see.

Note I am hereby putting everyone on notice that I am ramping down and will discontinue posting to GitHub as soon as possible. The reason is both because I don’t entirely trust Microsoft (has anyone used that ad-malware crap they named Windows 10?) given what is coming down the pike politically (quotes below from #30):

Btw, I recently wrote about the destruction that is being caused by this “fairness” doctrine of the youth. (Both the linked post and my follow-up reply are on point.)

Also I think there is an important principle here at stake and you may not realize how close we are in the West to burning all the books again: http://esr.ibiblio.org/?p=8120

Some more links to the rapid transformation of our freedoms in the West:

And more importantly, because it is time to make a Zer0 repository and GitLab appears to be technologically better than GitHub:

https://usersnap.com/blog/gitlab-github/
https://about.gitlab.com/2017/07/19/git-wars-switching-to-gitlab/
https://about.gitlab.com/comparison/

You will notice the improvement immediately. For example, the markdown support for footnotes and the reddish-colored highlights for code.

I do hope @keean, @sighoya, @Ichoran, @jdh30 et al will join me in ongoing discussions at the new Zer0 repository. I hope I will prove to be as tolerant as @keean has been. Keean, this change is by no means motivated by wanting to take control away from you (and thanks so much for making the issues backup). I just need to have a Zer0 repository and focus my discussions on bringing Zer0 to fruition. I welcome you with open arms (and I am going to try my damnedest to keep my frustration level from boiling over).

jdh30 commented Oct 17, 2018

I just noticed that the Benchmarks Game (aka the Great Computer Language Shootout) has been cited several times from this thread. I'm afraid you cannot draw any strong conclusions from its results because their methodology is so flawed.

In point of fact I think @keean stated that GC'd languages are slower there. That is a reflection of the ideological beliefs of that site's owner, who subjectively "de-optimises" solutions that are deemed by him to be too fast. Another problem is that most of the tasks are trivially reducible because, for example, they take trivial input, produce trivial output (e.g. a constant like the digits of pi) and compute needless data. Haskell caused havoc when it was introduced because lazy evaluation meant that several benchmarks returned immediately, since their intermediate results didn't need computing. More sinisterly, many other languages were doing artificially well because their compilers' optimisers would statically evaluate half of the benchmark.

Consequently, the results say little to nothing about the characteristics of the languages and their implementations being tested.

I'd also note that the original binary trees benchmark is due to Hans Boehm who, I believe, invented it as a test for his conservative Boehm GC.

keean commented Oct 17, 2018

@jdh30 whilst I agree with what you are saying about the benchmarks game, my own benchmarking agrees with those conclusions. I am open to any empirical evidence that those conclusions are wrong, but anecdotal evidence will get us nowhere.

Ichoran commented Oct 17, 2018

@keean @jdh30 - The CLBG expressly says, itself, that the benchmarks are flawed; but there are quite a few, and the flaws are not all identical, and my experience also agrees with Keean's (i.e. CLBG basically gets it right for most stuff).

The nice thing about CLBG is that it lets you talk about something instead of waving your hands in the air. People love waving their hands in the air about their favorite language and how fast it is. But that's even less reliable, even more flawed--by a long shot--than CLBG.

So I'd love to have better, but sadly and weirdly, it seems to be about the best there is out there right now. Efforts like TechEmpower's web benchmarks suffer from two gigantic confounds: first, all the slow languages are calling routines in C and C++ to do a lot of the heavy lifting (not useful if you want to know how good the language is on its own rather than acting as glue); and second, there isn't adequate attention paid to producing optimized implementations on the different platforms due to the effort involved. In contrast, the CLBG benchmarks require simple algorithms where you mostly have to rely upon basic language features, and are small enough that multiple people can submit (and have submitted) implementations that try to improve on the existing scores.

There are a non-negligible number of glitches and problems (e.g. I spent a lot of time writing Scala benchmarks which all were removed when Scala was dropped--I'm pretty annoyed by that), but it's still a source of data in an area that is mostly just hot air.

NodixBlockchain commented Oct 18, 2018

I did experiments with a C runtime some time ago to test the performance of atomic reference counting vs a tracing GC on the binary tree algorithm, as the runtime makes reference tracing easy. The tracing GC was still much faster, but I would need to redo those tests because I have updated the memory management since I made them.

https://github.com/NodixBlockchain/memtest/tree/master/memtest

I also wanted to test cases where there is infrequent allocation/deallocation and a lot of allocated references, so that the GC has to do tracing for nothing many times, to see how much that impacts performance.

keean commented Oct 18, 2018

It really depends on what you are doing with the binary tree. If you know the maximum node count, then pre-allocate all the nodes in a single arena, and then deallocate the lot when you have finished with the tree. Nothing can be faster than that.

Conversely if you need eager deallocation, then freeing memory as soon as the node is removed from the tree will be optimal for memory usage.
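
A hedged sketch of the arena approach for a tree (Rust for illustration; a Vec serves as the arena and indices serve as links, so the whole tree is released by a single deallocation when the arena is dropped):

```rust
// All tree nodes live in one Vec-backed arena; links are indices, and the
// entire tree is freed by a single deallocation when the arena is dropped.
struct Node {
    value: i32,
    left: Option<usize>,  // index into `TreeArena::nodes` instead of a pointer
    right: Option<usize>,
}

struct TreeArena {
    nodes: Vec<Node>,
}

impl TreeArena {
    fn with_capacity(max_nodes: usize) -> Self {
        TreeArena { nodes: Vec::with_capacity(max_nodes) }
    }

    // Allocation is just a push; the returned index acts as the "reference".
    fn new_node(&mut self, value: i32) -> usize {
        self.nodes.push(Node { value, left: None, right: None });
        self.nodes.len() - 1
    }
}

fn main() {
    let mut arena = TreeArena::with_capacity(3);
    let root = arena.new_node(1);
    let l = arena.new_node(2);
    let r = arena.new_node(3);
    arena.nodes[root].left = Some(l);
    arena.nodes[root].right = Some(r);
    println!("root = {}", arena.nodes[root].value);
} // dropping `arena` releases the whole tree at once
```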

jdh30 commented Oct 18, 2018

@keean "my own benchmarking agrees with those conclusions". As long as you've tested it independently that's fine, of course. I'm just highly skeptical of any conclusions drawn from the Benchmarks Game. My own results vary, e.g. http://fsharpnews.blogspot.com/2010/03/f-vs-unmanaged-c-for-parallel-numerics.html

"Conversely if you need eager deallocation, then freeing memory as soon as the node is removed from the tree will be optimal for memory usage". You may want sliding compaction.

@Ichoran "So I'd love to have better, but sadly and weirdly, it seems to be about the best there is out there right now".

You may find the following benchmarks more useful:

- Ray Tracer Language Comparison: http://www.ffconsultancy.com/languages/ray_tracer/ (HLVM version: http://flyingfrogblog.blogspot.com/2010/01/hlvm-on-ray-tracer-language-comparison.html). These programs are only ~100 LOC but cover different (related) algorithms in different languages with a significant performance hot path, i.e. the time isn't focused on a single line of code. The results have been analyzed in detail.

- List-based n-queens Benchmark: http://flyingfrogblog.blogspot.com/2010/12/towards-mark-region-gc-for-hlvm.html. This GC benchmark hits the generational hypothesis squarely.

- Symbolic differentiation with Mathematica vs Swift vs OCaml vs F# on .NET and Mono: http://flyingfrogblog.blogspot.com/2017/12/does-reference-counting-really-use-less_26.html. This benchmark was originally designed to stress garbage collectors with both short- and long-lived values. However, some of the idiomatic implementations (e.g. Swift) produce zero garbage. So it ended up also benchmarking data representations.

"multiple people can (and have) submitted benchmarks that try to improve on the existing scores."

I have contributed many different solutions, often many times faster and sometimes orders of magnitude faster only to have them rejected. My personal experience was that improvements to solutions in non-GC'd languages like C and C++ tended to be accepted whereas huge improvements to OCaml and F# were rejected.

I find it laughable that I am still credited with only C++ versions: https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/spectralnorm-gpp-6.html

"which all were removed when Scala was dropped"

Eh, no Scala?!

keean commented Oct 18, 2018

If you have "owning references" then you don't need reference counting with a tree. The internal links in the tree would be owning references, so when nodes are removed you just free them. In effect you do not allow any external references to outlive the owning reference (this is what Rust lifetimes are all about).

keean commented Oct 18, 2018

@jdh30

You may want sliding compaction

With an MMU we should never need compaction. We should be able to remap the pages...

keean commented Oct 18, 2018

@NodixBlockchain My point is with data structures that have an "insert" and "remove" operation we don't need memory management, because we are manually managing the collection in the first place. If we forget to remove a node from a tree then we leak the space. In other words "insert" is allocation and "remove" is deallocation. There is no need to separately manage the memory.

keean commented Oct 18, 2018

@NodixBlockchain I see, you can use this to compare performance of different automatic memory management algorithms, but it does not allow for meaningful comparison with manual memory management.

keean commented Oct 18, 2018

@jdh30 Interesting that the HLVM work points the finger at boxing for the poor performance of GC'd languages, as HLVM has GC but an unboxed representation and seems to perform pretty well.

If HLVM can keep up that sort of comparison with C++ more widely, I would say it is good enough performance-wise.
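
As a hedged illustration of the boxed-vs-unboxed distinction being discussed (Rust used for concreteness; this is not how HLVM itself is implemented):

```rust
// Boxed: one heap allocation per element and a pointer indirection on each
// access, which is roughly the cost a uniformly boxed GC'd language pays.
// Unboxed: one contiguous allocation with the values stored inline.
#[derive(Clone, Copy)]
struct Point { x: f64, y: f64 }

fn main() {
    // Boxed representation: each element is a separate heap object.
    let boxed: Vec<Box<Point>> =
        (0..4).map(|i| Box::new(Point { x: i as f64, y: 0.0 })).collect();

    // Unboxed representation: values stored contiguously, nothing to trace per element.
    let unboxed: Vec<Point> =
        (0..4).map(|i| Point { x: i as f64, y: 0.0 }).collect();

    let sum: f64 = unboxed.iter().map(|p| p.x).sum();
    println!("{} boxed, {} unboxed, sum of x = {}", boxed.len(), unboxed.len(), sum);
}
```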
