WD-40 (for reducing Rust with the next mainstream language) #35
Afaics, there are some marvelous simplifying implications of this axiom:
¹ But not cloning resources such as file handles.
² And Rust’s error handling has no
³ Even arrays of data structures that have variable sized members such as
Note bounded polymorphism never absolutely requires boxing. Unbounded polymorphism would, but we are refusing to support
Subsequent discussion in this thread will further explain the rough sketch so that readers can gain clarity. Let’s see what caveats and flaws are revealed via discussion. I look forward to @keean’s feedback, and perhaps he will explain to what degree it corresponds to some features of the Ada PL.
I do not claim any of these ideas are novel inventions. @keean has, for example, alluded to cloning as an alternative to lifetimes and/or GC. What I am doing above is making a more holistic presentation of various ideas that have been floating around.
Of course typeclasses or ML functor/modules on top of this for HLL abstractions. If typeclasses, there is an option to go with implicit function arguments instead of
EDIT: on the coding for readability point, “Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.”
We need optional GC and reference counting. The latter excels at deterministic, prompt finalization (which is required for non-main memory finalization) but has the trade-offs quoted below:
Has it ever been tried to combine reference counting with GC? Apparently so:
Edaqa Mortoray wrote:
Given our single-threaded model, we do not need atomics on the reference counting, and given the proposal of the OP to emphasize borrowing and RAII, we would not get the benefits of a generational GC scheme. So reference counting would give us determinism except in the case of cyclical references, and then GC would kick in to break cycles. However, when we know cyclical references are probable (e.g. any kind of cyclic graph, such as the DOM or a doubly-linked list), it is wasted overhead to employ reference counting, although that overhead could be very tiny (i.e. infrequent tracing) if we only want to catch unexpected cycles (i.e. as an insurance policy, and we’d want to log these and consider them bugs).
I suppose we could rate-limit reference counting finalization, to prevent the rare aberrant high latency stall.
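A minimal single-threaded TypeScript sketch of what such rate-limiting could look like (all names here are hypothetical, not from any proposal in this thread): instead of running a finalizer, and any cascade it triggers, immediately when a count hits zero, dead objects are queued and drained with a per-tick budget.

```typescript
// Hypothetical sketch: rate-limited finalization for a reference-counted heap.
type Finalizable = { finalize(): void };

const pendingFinalizers: Finalizable[] = [];

function releaseToQueue(obj: Finalizable): void {
  // Called when an object's reference count reaches zero.
  pendingFinalizers.push(obj);
}

function drainFinalizers(budget: number): number {
  // Run at most `budget` finalizers; leftovers wait for the next tick,
  // bounding the latency of any single finalization pause.
  let ran = 0;
  while (ran < budget && pendingFinalizers.length > 0) {
    pendingFinalizers.shift()!.finalize();
    ran++;
  }
  return ran;
}
```

The trade-off, as noted elsewhere in this thread, is that deferring the drain diminishes the claimed deterministic-finality benefit of reference counting.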
For this last quoted issue, I am contemplating whether the unused bits (c.f. also) of 64-bit pointers (64-bit also provides a huge virtual memory space; c.f. also) could store an index into a contiguous array whose elements hold the reference counts, so the counts would reside in contiguous pages. If the number of available unused bits is insufficient to index all allocated objects, then perhaps each compiled object type could employ a separate array (i.e. a zero-runtime-cost, compile-time array selection), presuming the cache-thrashing issue is due to intra-page fragmentation rather than lack of inter-page contiguity. We also would not need to create the array element until the second reference. This is applicable when objects are mostly passed around, rather than having all of their data accessed frequently.
Weighted reference counting can be employed to halve the accesses to the reference count (since the source pointer has to be accessed anyway), but it requires some unused bits in the pointer to store the weight.
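A toy TypeScript model of weighted reference counting (names are made up; a real implementation would pack the weight into the pointer's unused bits): copying a pointer splits its weight in half and never touches the referent, so only pointer creation and destruction write to the shared total.

```typescript
// Hypothetical sketch of weighted reference counting.
class WeightedObj {
  totalWeight = 0;
  alive = true;
}

class WeightedPtr {
  constructor(public target: WeightedObj, public weight: number) {}

  static create(target: WeightedObj, initialWeight = 1 << 16): WeightedPtr {
    target.totalWeight += initialWeight; // one write to the referent
    return new WeightedPtr(target, initialWeight);
  }

  copy(): WeightedPtr {
    // Split the weight: no access to the referent's count at all.
    const half = this.weight >> 1;
    this.weight -= half;
    return new WeightedPtr(this.target, half);
  }

  release(): void {
    this.target.totalWeight -= this.weight; // one write to the referent
    if (this.target.totalWeight === 0) this.target.alive = false;
  }
}
```

Omitted here: when a pointer's weight reaches 1 and can no longer be split, real schemes allocate an indirection cell with fresh weight.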
David Ireland wrote:
Edit Nov. 18, 2017 Distinguish between non-deterministic reference counting and deterministic RAII.
@anonymous wrote in private:
I need progress on code, not language design, but I am trying to have a paradigm I can be thinking about when structuring my code in TypeScript + C, so that the transition can be easy if we ever complete a new PL, and also to drive my insights on that potential language.
Concretely, I think it means I will write wrapper classes with getters and setters in TypeScript that take an ArrayBuffer for data and do my
Re-reading the prior linked comments…
First realization is that reference counting is non-deterministic and inferior to GC in every facet (for the applicable use cases of non-deterministic resource management and thus the inherent lack of deterministic finalization) except for interop with non-GC objects as explained below (though potentially at the cost of losing compaction and automatic detection of cyclical-reference memory leaks).
Afaics, a useful facility to add in addition to GC is the zero-cost1 abstraction of deterministic RAII block scope allocated/deallocated on the stack, because it (probably improves performance and) adds determinism of finalization (aka destructors) and interacts with the optimization of low-level performance as explained below.
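As a toy illustration of that determinism (TypeScript, with try/finally standing in for true destructors; all names here are invented for the example), block-scoped resources are finalized at a known point, in reverse order of acquisition:

```typescript
// Illustrative sketch: deterministic block-scoped finalization emulated
// with try/finally, approximating RAII destructor ordering.
const log: string[] = [];

class Resource {
  constructor(public name: string) { log.push(`open ${name}`); }
  close(): void { log.push(`close ${this.name}`); }
}

function withResource<T>(name: string, body: (r: Resource) => T): T {
  const r = new Resource(name);
  try {
    return body(r);
  } finally {
    r.close(); // runs at block exit, innermost scope first
  }
}

// Nesting two scopes finalizes them in reverse order: b closes before a.
withResource("a", () => withResource("b", () => 0));
```

The point of the proposed zero-cost abstraction is that the compiler would prove no live borrow escapes the block, so no runtime bookkeeping like this wrapper would be needed.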
To achieve such zero-cost resource management abstractions requires typing/tracking the lifetimes of borrows of references so the compiler can prove a resource has no live borrow when it goes out of block scope. I had proposed the compile-time enforced “is stored” annotation on function inputs when an input will be (even transitively) borrowed beyond the lifetime of the function application. Preventing concurrent-access race conditions (which afair we discussed in depth in the Concurrency thread) is a separate concern from tracking borrows (a la Rust’s total ordering, which assumes multithreaded preemption everywhere, or my proposed partial ordering in conjunction with a single-threaded event model). This granular typing of the context of side-effects seems more straightforward than some grand concept of typing everything as a (cordoned context of imperative) side-effect?2
One of the key advantages/requirements of GC is the ability to move objects so that memory can be periodically compacted (with the compaction being amortized over many implicit/delayed deallocation events which would otherwise be explicit/immediate/non-deterministic-latency with reference counting), which enables allocation to be as low-cost as incrementing a pointer to the end of allocated objects in a contiguous memory address space.
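The allocation fast path being described can be sketched in a few lines (hypothetical names; offsets stand in for addresses): allocation is an increment plus a bounds check, and compaction resets the region after evacuating survivors.

```typescript
// Sketch of bump-pointer allocation into a contiguous region, as enabled
// by a moving/compacting collector.
class BumpAllocator {
  private top = 0;
  constructor(private capacity: number) {}

  alloc(size: number): number | null {
    // Returns the offset of the new object, or null when the region is
    // full (at which point a compacting GC would evacuate live objects).
    if (this.top + size > this.capacity) return null;
    const offset = this.top;
    this.top += size;
    return offset;
  }

  reset(): void {
    this.top = 0; // after compaction, allocation restarts past survivors
  }
}
```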
As an aside, for highly optimized low-level code to run at maximum performance, it must be possible to freeze the position of objects in the virtual memory address space, else the objects must be copied. The former has concurrency livelock and deadlock issues w.r.t. the said compaction.
For RAII-managed data structures that contain pointers to data (i.e. unboxed) which is not RAII-managed at the current or containing block scope (i.e. for GC pointers, since pointers to data RAII-managed in a contained block scope would exhibit use-after-free errors), these pointers have to be mutable by the compaction engine and have to be recorded in the list of references which the GC must trace. To have full interoperability between GC instances and RAII (stack allocated data structure) instances without copying data incurs either a performance penalty of reloading pointers on each access (in case they’ve been moved by the compactor), or else the aforementioned concurrency livelock and deadlock issue w.r.t. the said compaction (which is an onerous, undesirable can-of-worms).
Thus, I’m leaning towards stack allocated data structures not interoperating with GC instances except via copying, because (other than deterministic finalization) performance is their primary raison d'être. And interop with GC would (as GC does) also encourage boxed data structures, which are known to thrash caches and degrade performance. Additionally, I think it should be allowed to have GC instances which are unboxed data structures (e.g. transpiled to
However, is memory compaction even necessary on 64-bit virtual address spaces, wherein the hardware can map virtual address pages to compacted, available physical pages? Well, I guess yes, unless most objects are much larger than the ~4KB page size.
However, afaics the compaction issue could perhaps be significantly mitigated by allocating same-size objects from a contiguous virtual address space much larger than ~4KB in the reference-counting allocator. Allocations would then be first applied to deallocated slots in that address space, if any, per the standard reference-counting allocation and deallocation algorithm, which employs a linked list of LIFO free slots.3
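The size-class scheme just described can be sketched as follows (hypothetical names; slot indices stand in for addresses within the per-size region): freed slots form a LIFO stack that is consulted before the bump pointer extends the region.

```typescript
// Sketch: per-size-class allocator with a LIFO free list of reusable slots.
class SizeClassAllocator {
  private next = 0;                // bump pointer for never-used slots
  private freeList: number[] = []; // LIFO stack of freed slot indices

  constructor(private slotCount: number) {}

  alloc(): number | null {
    if (this.freeList.length > 0) return this.freeList.pop()!; // reuse a hole
    if (this.next < this.slotCount) return this.next++;
    return null; // region exhausted
  }

  free(slot: number): void {
    this.freeList.push(slot); // most recently freed slot is reused first
  }
}
```

The LIFO order tends to hand back recently touched (thus cache-warm) slots first, which is part of the mitigation being argued for.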
If compaction is required, then I contemplate whether to allow unsafe low-level code which manually employs
1 Zero runtime cost because, due to the compile-time determinism, we know at compile time when the deallocation will occur.
2 How does Oleg’s insight conflict with @keean’s prior stance (in our discussions) against AST macros? I vaguely remember that he was against macros because they obfuscated debugging and the type system.
Is this the paper? http://okmij.org/ftp/Computation/having-effect.html#HOPE
I’m unable to readily grok Oleg’s work because he communicates in type theory and Haskell (in neither of which am I highly fluent). Afaics he explains with details/examples instead of conceptually and generatively from ground-level concepts. And I do not have enough time to apply arduous effort. So he limits his audience to those who are fluent in every facet of the prior art he builds on and/or who can justify the time and effort.
Essentially all side-effects distill to the generative essence of shared state. If state is not shared, then there are no interactions of effects between those processes.
Afaics, a Monad controls the threading of access to an orthogonal context across function application. That orthogonal context can be state in the State monad. Correct? That is why Monads can’t generally compose because they cordon orthogonal contexts.
But what is the goal to achieve? If we model higher-level concepts as imperative building blocks with contexts, we can do unlimited paradigms, but not only will our code become impossible to read because of the overhead of the syntax and type system necessary to allow such generality (which was essentially my complaint against trying to model everything, e.g. even subtyping, as typeclasses, as you were proposing before), we also get an explosion of issues from the interaction of too many paradigms. Afaics, this isn’t headed towards a simple language.
There will not be a perfect language. Type theory will never be complete and consistent. Design is driven by goals.
3 Richard Jones. Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm
If I may :)

For me the main problem I have with garbage collectors is that they are harder to debug and test.
The nightmare I have is: you test the functionality on a simple program and it works; then you keep adding functions and race conditions, and two months later you realize there is a memory leak or memory corruption somewhere, and you’re in for days or weeks of debugging to trace the problem.
With reference counting, it’s very easy to trace a problem, because you know where the freeing must happen; the point where the last reference to an object is released can be tracked easily, which makes it very easy to track problems, whether they are memory corruption due to memory being released prematurely, or memory leaks.
With a garbage collector that scans the whole tree every now and then, traversing thousands of pointers allocated over a few minutes of running, it can become much harder to track where the problem is if it frees memory that it shouldn’t, or doesn’t free memory that it should.
Especially if you go for multi-layered scopes, with closures and asynchronous code execution, it can become very hard to make sure the GC will catch everything right. And bugs will generally appear when the program starts to become more complex, rather than in easy vanilla cases when the program is small and simple.
And if you need a plugin system, i.e. code loaded at runtime that can manipulate references itself, it can also become impossible to track the reference tree deterministically at compile time, and the host program might have no clue at all what the plugin is doing with the references passed to it. It’s the same problem with distributed applications.
For me now, when I want to add a new feature, especially for memory management, my first concern is not only ‘how to make it work’, but also how easy it will be to detect problems and to track what caused them.
With reference counting, it’s easy to set a breakpoint where a reference gets freed, and have the debugger pause everything and let you inspect all the context at the moment it happens.
With GC, the analysis is delayed, and all the execution context that changed the state of the object is lost, so it becomes very hard to debug.
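The debugging property being described can be shown with a small TypeScript toy (all names invented for the example): with counting, the release that drops the count to zero is a single identifiable event, so a hook (standing in for a debugger breakpoint) can capture exactly where it happened.

```typescript
// Toy counted reference with a hook on the final release.
class Counted {
  private count = 1; // the creating reference
  constructor(
    public name: string,
    private onLastRelease: (name: string, site: string) => void,
  ) {}

  retain(): void { this.count++; }

  release(site: string): void {
    // `site` is a stand-in for the source location a debugger would show.
    this.count--;
    if (this.count === 0) this.onLastRelease(this.name, site);
  }
}
```

Note the counter-argument made later in this thread: optimized RC variants that keep counts in the pointers would not offer even this hook without debugger support on every pointer assignment.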
For me, to make GC programming easy, it needs something like the Android SDK, where the whole application is structured at the macro level with activities, intents, and special classes for asynchronous/background execution, making the lifetimes of all objects predictable due to the structuring of the application, declaration of resources in XML, etc.

They have a whole lot of classes to build the application on, to make resource management and object lifecycles more deterministic.

It imposes a certain structure on all application classes and components so that memory management fits cases the GC can easily deal with.

Making a general-purpose GC deal with a complex language with multi-layered scopes and asynchronous execution, without any hints about the pattern an object is likely to follow (via inheriting some class), and without structuring the application around built-in classes with predictable behavior, seems very hard.
Reference counting is much easier to deal with for the general case.
My preference is to manage as much as possible from the stack. There are some simple tricks that can make this work. If all allocation is RAII and you have procedures, not functions, then to return some data you simply pass a collection (set/map/array, etc.) into the procedure.
Edit: where this gets complicated is that you then end up with two kinds of pointers: “owning pointers”, which must form a directed acyclic graph, and non-owning (or reference) pointers (the links between data structures). The simple solution is to make all ‘references’ weak references, so that you have to null-test them when dereferencing, to check whether the resource they point to has gone away. Then we simply delete things when they go out of scope, removing all resources pointed to by owning pointers in the DAG. No GC, no reference counting, and a simple non-null check on dereference. Of course ‘nullable’ pointers are not considered a good thing, so we can then look for static techniques where we do not allow references to outlive the object they reference. This is where Rust’s lifetimes come from.
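A minimal TypeScript toy of that scheme, under stated assumptions (an `alive` flag stands in for actual deallocation, and all names are hypothetical): owning pointers form the DAG and are dropped at scope exit, while cross-links are weak references that must be null-tested on every dereference.

```typescript
// Toy owning-DAG plus weak references, per the scheme sketched above.
class OwnedNode {
  children: OwnedNode[] = []; // owning pointers (DAG edges)
  alive = true;               // stands in for "not yet deallocated"
}

class WeakishRef {            // avoids clashing with the built-in WeakRef
  constructor(private target: OwnedNode) {}
  deref(): OwnedNode | null {
    return this.target.alive ? this.target : null; // the null-test on access
  }
}

function dropScope(root: OwnedNode): void {
  // Scope exit: delete everything reachable through owning pointers.
  root.alive = false;
  for (const c of root.children) dropScope(c);
}
```

The null-test on `deref` is exactly the non-null check on dereference mentioned above, and the objection raised later in the thread is that it leaks deallocation non-determinism into program logic.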
I can’t speak for @keean (and this is his project area but this is my thread), but personally I’d prefer “please don’t” (or post once for every 10 posts by others in this thread, so as to keep the S/N ratio of the thread higher), i.e. that you would comment in your thread “Saluations! :)” instead (or at least not in this/my thread), unless you have something new+important+accurate+well-researched to point out which you have not written already in the past.
If you quote from this thread, and post in your thread, I will probably read (presuming I have time) and perhaps even reply there. I’m trying to avoid this thread becoming another 300+ posts that I can’t wade through when I want to refresh my own memory of what my (and @keean’s) analyses was/is.
@keean and I have been doing backtalking on LinkedIn to reduce clutter in these issues thread. You’re welcome to add me there.
I’m wondering whether that is a superstitious, anecdotal belief (i.e. aliasing error) or an assertion you actually studied, measured, and verified with in-depth effort comparing reference counting and tracing garbage collection. Because inherently reference counting doesn’t track the directed graph of memory allocation (i.e. all we have are reference counts and no back pointers to the references that comprise those counts); thus there’s no feasible way in a reference-counting GC system to see which data structures have leaked because they’re no longer reachable from the untraced roots of the said graph (for example due to cyclical references). Whereas, in theory, a tracing garbage collector could provide such a graph for debugging.
You can’t see which/where are all the references to the shared object that had its reference count decremented, because it’s the implicit nature of non-deterministic memory allocation not to know which pointers would be the candidates, and the reference-counting scheme provides no back pointers from the said object to the said sharing pointers (as aforementioned).
I agree it wouldn’t normally be possible to set a runtime breakpoint on the change in number of references to a specific object with a tracing GC, because the assignment of pointers doesn’t touch the referenced object (although write-guards for generational GC might get you as close as the virtual page but that might not be helpful). Thus, there’s no way to know which assignment to set a breakpoint on for testing the address of the said shared object. Yet comparably, the more optimized variants of reference counting also have the same debugging problem because they don’t even update the said common object (as explained below) because they keep the reference count in the pointers (and additionally there is the Weighted Reference Counting). In both cases, a sophisticated debugger and language could in theory be designed to set such conditional breakpoints on all pointer assignments (conditionally breaking only if the said shared object address is encountered), if such a facility is helpful for debugging memory leaks. And the tracing GC would have more information about the memory allocation graph.
It seems you’re presuming that reference counting models deterministic memory allocation, but that is not generally the case and (per my reply to @keean below), if that were generally the case then in theory it could be replaced by zero-runtime-cost compile-time deterministic allocation mgmt (albeit with tsuris of a complex compile-time typing system). Afaics, you continue to repeat this belief system of yours (presumably confirmation biased by your presumed preference for your low-level framework you discussed in your thread “Saluations! :)” and per my comments to @keean about C/C++ below) without refuting the points that have been made to you numerous times previously. I’ll recap and augment… (and hoping you’ll keep the not-so-thoroughly-contemplated-studied noise to a minimum in this/my thread please).
I don’t see how the process lifecycle has anything to do with the generalized problem that, in general, memory allocation is non-deterministic.
Programming paradigms for avoiding memory leaks are useful regardless of the memory allocation scheme chosen.
I previously mentioned the following disadvantages for reference counting to you (either in the Concurrency thread and/or the other thread you created). And note the first one below is a form of memory leak inherent in reference counting.
Reference counting (which technically is a form of “direct”1 garbage collection) can’t garbage collect cyclical references. And according to an expert book I’m reading there’s no known means of weak pointers or other paradigm which can reliably solve the problem of cyclical references in all scenarios.2 This makes sense because reference counting (RC) doesn’t contemplate the entire directed graph of objects holistically. Thus only “indirect, tracing”1 garbage collection which deals with the entire directed graph will reliably detect and garbage collect cyclical references. However, note that probabilistic, localized tracing combined with RC decrements is comparable to mark-sweep (MS)3 for (even apparently asymptotic) throughput computational complexity costs, yet with an upper bound (“6 ms”) on pause times due to locality of search. Yet presumably pause times are eliminated with a concurrent MS (and as cited below, RC has less throughput than BG-MS).
According to this expert book4, reference counting has significantly worse throughput (except compared to the pathological asymptotic huge-heap case for tracing GC, which can be avoided15, and perhaps multi-threaded/multi-core/parallelism, but I think I need to study that more) because of the significant additional overhead for every pointer assignment. This overhead as a percentage is worse for highly optimized, low-level languages (e.g. C/C++) compared to interpreted languages, which have high overhead anyway. Note that a claimed throughput parity for RC with MS5 (and it merely culls young-generation overhead, similar to prior art16) is not parity with generational GC combined with MS (BG-MS).6 Additionally, the adjustment of the reference counts on each pointer assignment thrashes the cache7, further reducing performance, except for optimizations which generally have to be combined with a tracing GC anyway (and they diminish the afaics mostly irrelevant claimed deterministic-finality benefit).8 Additionally, for multithreaded code there is extra overhead due to contention over the race condition when updating reference counts, although there are potential amortization optimizations which are also potentially superior for parallelized and/or distributed paradigms9 (but they diminish the afaics mostly irrelevant claimed deterministic-finality benefit). Additionally, memory allocation management on the heap is non-deterministic, so the determinism of the immediacy of deallocation for reference counting is afaics more or less irrelevant. Additionally, reference counting (especially when executing destructors) can cause a dominoes cascade of latency (aka “stop the world”), and the amortization optimizations10 again diminish the afaics mostly irrelevant claimed deterministic-finality benefit.
However, in my next post in this thread, I will propose a variant of deferred reference counting11 in conjunction with a clever generational GC scheme (apparently similar to prior art16), for avoiding the pathological asymptotic case of tracing GC. See my further comments about this in my reply to @keean below.
In general, more complexity in the code will increase the probability of semantic memory leaks, even semantic leaks of the form that automatic garbage collection can’t fix because the error is semantic. Reference counting (a form of automated GC) will also suffer these semantic memory leaks. The issue of complexity of semantics, is more or less not peculiar to the type of automatic GC scheme chosen. Thus I don’t view this as a valid complaint against tracing GC in favor of reference counting.
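The cycle problem listed above is easy to demonstrate with a toy naive reference counter in TypeScript (invented names; `count` stands in for the per-object header field): after all external references are dropped, the two objects in the cycle still hold each other's counts at one, so neither is ever freed.

```typescript
// Toy naive reference counting, to show the cyclical-reference leak.
class RcCell {
  count = 0;
  ref: RcCell | null = null; // a single counted outgoing reference
}

function retain(c: RcCell): RcCell {
  c.count++;
  return c;
}

function release(c: RcCell): void {
  c.count--;
  if (c.count === 0 && c.ref !== null) {
    release(c.ref); // recursively drop what this object owned
    c.ref = null;
  }
}
```

Only a holistic view of the object graph (i.e. tracing, or the localized trial-deletion cycle collectors cited above3) can reclaim such a cycle.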
1 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §1 Introduction, pg. 2.
2 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.5 Cyclic reference counting, pg. 56 – 62, §3.6 Issues to consider: Cyclic data structures, pg. 71.
3 D. F. Bacon and V. T. Rajan (2001). Concurrent cycle collection in reference counted systems. In J. L. Knudsen, editor, Proceedings of 15th European Conference on Object-Oriented Programming, ECOOP 2001, Budapest, Hungary, June 18-22, volume 2072 of Lecture Notes in Computer Science, pp. 207–235. Springer-Verlag, 2001.
4 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm, pg. 19 – 25, §3 Reference Counting, pp 43 – 74.
5 R. Shahriyar, S. M. Blackburn, X. Yang, and K. S. McKinley (2012), "Taking Off the Gloves with Reference Counting Immix," in OOPSLA ‘13: Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, 2013. Originally found on Steve Blackburn’s website.
6 Stephen Blackburn; Kathryn McKinley (2003). "Ulterior Reference Counting: Fast Garbage Collection without a Long Wait". Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications. OOPSLA 2003, Table 3 on pg. 9.
7 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm: The strengths and weakness of reference counting, pg. 22.
8 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.3 Limited-field reference counts, pp. 50 – 55, §3.3 Limited-field reference counts: One-bit reference counts, pg. 52, §3.6 Issues to consider: Space for reference counts & Locality of references, pg. 71.
9 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §8.4 Concurrent Reference Counting, pp. 200 – 201, §3.7 Notes, pg. 74, §4.1 Comparisons with reference counting, pg. 77.
10 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.1 Non-recursive freeing, pg. 44.
11 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.2 Deferred reference counting: The Deutsch-Bobrow algorithm, pg. 46.
This is conceptually broadly analogous/related to deferred reference counting12 schemes which rely on compile-time deterministic tracing of the deallocation, such as via linear (aka uniqueness) types.
Although I was obviously also contemplating similarly to you in my prior posts in this thread (after you had reminded me about RAII in our previous discussions and my recollection of programming in C++ in the late 1990s), after reading a book on garbage collection and thinking more about the issues, I’m going to further argue against the compile-time deterministic memory mgmt language design strategy in my upcoming detailed post, because I believe it is not as generally flexible, sound, or comprehensible a solution.
These stack memory mgmt paradigms you’re referring to, have narrow applicability because they don’t handle the general case of non-deterministic heap management.
Afaics, it’s a special-case methodology that leads to a very complex set of corner cases, as in C/C++, which piles on a bunch of different sorts of compile-time optimizations to keep as much memory mgmt on the stack as possible, but leads to a brittle clusterfuck of a language with a 1500-page manual that virtually no one understands entirely (not even the C++ creator Bjarne Stroustrup nor the STL creator Alexander Stepanov, as they both admitted).
And those stack paradigms don’t marry well holistically with a bolt-on tracing garbage collection appendage13 (i.e. one not designed holistically with the language), although I have not yet read a more recent research paper on combining tracing GC with Rust.14 Even attempting to apply this compile-time deterministic memory management holistically requires a complex typing tsuris akin to Rust’s total ordering of borrow lifetimes, with lifetime annotation parameters for transitivity of borrows from inputs to outputs; and of course the programmer must violate the invariants and create unsafe code in order to handle non-deterministic (i.e. runtime-randomized) memory mgmt, which defeats the entire point, as then the total order is no longer checked and the unsafety can leak back into the code the compiler thinks is safe.
I apply the 80/20 or 90/10 Pareto principle to prioritization. Although all that special case tsuris might be able to obtain an extra 10% of performance, it has the cost of 90% more effort on debugging, readability of the code, maintenance of the code, complexity of the code, etc.. When you really need that 10% boost (or in the case where the more generalized GC paradigm is much slower for some reason), then go write a module in C++ or Rust. I see no need to invest my effort in trying to recreate (or improve upon) those languages, because there are millions of man-hours already in those ecosystems. Our only prayer of creating something useful in this decade with our resources, is to 90/10% prioritize on clean abstractions that are elegant and generalized and 90% performant with 10% of the effort and complexity.
Additionally, perhaps my GC design ideas (to be outlined in my next post) might possibly be compelling (if I haven’t made some errors in my thought process, which is entirely possible given I need to eat dark chocolate to even ephemerally coax my brain out of the 105-IQ brain-fog gutter and function non-lethargically, presumably due to an ongoing TB infection). Note however that the prior art for GC is apparently vast and seemingly exhaustive in terms of the low-hanging fruit of obvious, simple variants16 (i.e. fertile ground presumably remains only in the realm of the esoteric, high-complexity, and/or abstruse). Thus it is unlikely I discovered something (after a day of reading and contemplation) that hadn’t been considered previously in the prior art, although there appears to be potential prior art similar to my ideas16 that appeared later than the book I read. Note I independently arrived at my ideas (after about a day of contemplating the 1996 book) and have not yet read that 2003 research paper16 to compare.
Total ordering compile-time paradigms (including attempting to type check everything) are afaics brittle paradigms that break because the programmer must necessarily break out of them because the entropy of the universe is not bounded. I commented on this previously numerous times in these issues threads. The non-determinism (i.e. unbounded randomness and entropy) of the universe (i.e. the real world in I/O) can’t be statically modeled.
My thought is we need to be circumspect about the benefits versus trade-offs of every paradigm and feature we choose in a programming language. And the target use cases (and acumen/preferences of the target programmers demographics) of the programming language will also factor into the decision process.
For example, Haskell programmers favor brevity and mathematical expression, with trade-offs such as hidden (2nd, 3rd, and 4th order) details in the form of implicit typing (which the mathematically minded model in their minds without reading it explicitly in the code), which drastically reduces the demographics of programmers who can read the code and participate. Some snobbishly claim this is a good filter to keep the less intelligent programmers away. Haskell has other trade-offs (necessary to enable the mathematical modelling), which we’ve discussed elsewhere and which are outside the scope of this concise summary (and my descriptions may not be entirely accurate given my limited Haskell fluency), including but not limited to: the debugging non-determinism tsuris of lazy evaluation (although the requirement for lazy evaluation is disputed), the bottom type populating all types, a total ordering (or algebra) on the implementation of each data type for each typeclass interface, a total ordering on the sequential (monadic) threading of I/O processing in the code (requiring unsafe code to fully accommodate the non-determinism of I/O), etc.
Human-managed weak references are an incomplete and human-error-prone strategy. Also, as you noted, you then leak the non-determinism from deallocation into how to handle error conditions of null pointers, leaking the non-determinism into program logic! That is horrible. Whereas a provably sound weak-pointer algorithm has exponential asymptotics and thus is no better or worse than tracing GC mark-sweep.2
12 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.2 Deferred reference counting, pg. 45.
13 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §9 Garbage Collection for C, pg. 227.
14 Y. Lin, S. M. Blackburn, A. L. Hosking, and M. Norrish (2016), "Rust as a Language for High Performance GC Implementation," in Proceedings of the Sixteenth ACM SIGPLAN International Symposium on Memory Management, ISMM ‘16, Santa Barbara, CA, June 13, 2016, 2016.
15 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, Preface: The audience, pg. xxiv, Preface: The Bibliography and the World-Wide Web, pg. xxv.
16 Stephen Blackburn; Kathryn McKinley (2003). "Ulterior Reference Counting: Fast Garbage Collection without a Long Wait". Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications. OOPSLA 2003. I originally found this on Wikipedia’s page for Reference Counting.
Kindle version of the book is available on Amazon for less than $20 if you don’t want to turn your monitor sideways to read the “free” (copyright violation theft) pdf linked above.
EDIT: a more concise overview resource:
I wonder if this solution would work if the object instance referenced can change during the lifetime of the reference.
Because it would mean you might need to delete the referenced object before the reference goes out of scope, in case a new object is assigned to it during its lifetime.
TypeScript can force you to stick to classes for objects, allowing the JIT to optimise these efficiently. Really, for JS where there are no allocations going on, it is as fast as asm.js. GC hides all sorts of costs, like the extra level of pointer indirection.
Probably a major factor is that GC hides the cost of allocation and deallocation, so the programmer does not realise they are thrashing object creation.
What is the fastest garbage collected language?
The reference count needs to be in the object referenced, not the pointer, so it won't be in the same cache line as the pointer. When you have multiple pointers to the same object they all need to increment/decrement the same counter.
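To make the point above concrete, here is a minimal sketch (all names such as `RcBox` and `Ref` are hypothetical, purely illustrative) of intrusive reference counting, where the count lives inside the referenced object and every pointer to the same object updates that one shared counter:

```typescript
// Sketch of intrusive reference counting: the count lives inside the
// object itself, so every Ref to the same object increments/decrements
// the same shared counter (which is why it sits on the object's cache
// line, not the pointer's).
class RcBox<T> {
  refCount = 0;
  freed = false;
  constructor(public value: T) {}
}

class Ref<T> {
  constructor(private box: RcBox<T>) {
    box.refCount++; // all Refs to one object share this counter
  }
  get(): T {
    if (this.box.freed) throw new Error("use after free");
    return this.box.value;
  }
  drop(): void {
    if (--this.box.refCount === 0) {
      this.box.freed = true; // simulate deallocation
    }
  }
}

// Two references to the same object touch the same counter.
const box = new RcBox({ name: "shared" });
const a = new Ref(box);
const b = new Ref(box);
console.log(box.refCount); // 2
a.drop();
b.drop();
console.log(box.freed); // true
```

Because the counter is reached through the pointer, every copy or drop of a reference dereferences the object, which is the cache-line cost being discussed.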
The key fact for me is that managed memory usage is always significantly worse than the unmanaged. Yes, GC can be fast provided you have twice the RAM available. Typically on a multi-tasking machine this means less RAM available for other processes. This results in real slowdown of things like the disk-caching layer, because it’s losing pages to the application.
The end result is that the user experience of using a system with GC is significantly worse than using a system with manual allocation - providing the software does not crash due to bad memory management :-)
The argument in favour of GC isn't speed of execution, it's speed of development at very low costs: an extra 2 to 10% or so. Speed of development in memory safe languages is significantly higher; and if you don't like GC, GC isn't the only way to get to memory safety, but it is the easier way.

Reference counting is GC. Arguably, arena allocation, or any other scheme more sophisticated than paired alloc & free, is GC. But those schemes are usually chosen not because they perform better (though they often do; free() isn't "free") but because developer productivity is higher. And none of those schemes are what make something like JS slower than something like webasm.

Yes, GC scales because it trades space for time. If you're running on a small embedded device, it can be a poor tradeoff; if you're trying to maximize compute throughput across multiple machines, it can cost real dollars. But these are extremes, and most people aren't on the extremes.

I mostly just butted in because accusing GC for JS / Typescript being slower than webasm seemed a bit outlandish. GC overhead is directly measurable; it's trivially disproved for anyone inclined to measure. I'm gonna unsubscribe from this thread now. I got linked in by an `@` reference.…
On Tue, Nov 21, 2017 at 7:05 PM Keean Schupke wrote: See: https://softwareengineering.stackexchange.com/questions/203745/demonstration-of-garbage-collection-being-faster-than-manual-memory-management
Actually the difference in performance between JS and WebASM is only about 10-20% anyway, so that is in the 2%-10% region you claimed for the GC, so for some algorithms it could account for 100% of the difference...
Interestingly I am about to port a bunch of asm.js to TypeScript, as the memory allocation for asm.js / webasm does not work well with the main JS garbage collector, and allocating the heap for the webasm is causing fragmentation of the JS heap. In the end it turns out that the better memory usage from keeping all the memory managed by the GC is better for application stability than the small performance gain of having the code in asm.js.
Following this argument to its logical conclusion, the better memory usage of manually managed memory for the whole application would be better overall.
Edit: it’s probably worth pointing out that this application uses a lot of memory, and has some periods when it is performing rapid allocations. This leads to memory usage for a single tab of about 1GB (even though memory buffers are 16MB, so with manual allocation it would get nowhere near that level). Frequently when usage gets above 1GB the web browser crashes, even though most of that memory is probably free, and just waiting to be collected.
This is off-topic. Explicit, unchecked (i.e. “unsafe”) manual mgmt is not what we are interested in here. The topic was about automatic (or compiler checked deterministic) mgmt. As you know, both @keean and I are significantly into compiler-assisted checking.
I understand you want to participate. But you really have not wrapped your mind around all the concepts, and so you should be more circumspect and lurk more. I said if you want to post to your thread, then I will try to participate occasionally.
Please be respectful and recognize your limitations.
After briefly learning about it, I’m no longer interested in your low-level framework. Your monotheistic intent on pushing your low-level preferences diverges the focus of my thread, because you don’t have an open-mind attempting to be objective about research results. Your low-level framework focus is irrelevant to the topic of this thread. Please talk about that in your thread which you entitled “Saluations! :)”. This/my thread is not for trying to convince yourself and/or the world that your low-level framework paradigm is great. Your thread is for that. I wouldn’t ban you even if this were my repository, but I believe curation (not absolute censorship) is valuable. Trust me, if you write anything in your thread which I happen to read, that I think is important enough to be in this thread, I will quote it over in this thread. Also I presented a guideline given that I find your posts have so many errors and low informational value, that you could post after every 10 posts by others in the thread (so as to force you to focus yourself more and keep the S/N ratio here reasonable).
I considered the low-level implications as even more evident by the post which follows this one. And realized that your preferences are a dinosaur. See my next post.
I am discussing here about general purpose, high-level programming language design that employs compiler checking and/or automated runtime GC to facilitate lower costs of development and guard against human error. With some thought given to performance optimization and some perhaps C-like low-level capabilities as long as the result is clean, simple, and elegant and not explicit, manual, bugs-prone human mgmt of memory.
Also C code which is transpiled to ASM.js/WASM, is employing different abstractions such as not boxing every damn thing, such as every element of every array.
The VM can’t always statically type objects, so methods and type coercions can require dynamic overhead.
Also I believe JS may be passing all function arguments on the heap (in the prototype chain of objects which form the closures we previously discussed). The throughput and locality (i.e. contiguity for minimizing cache and virtual-paging thrashing) are similar, because contiguous allocation on the stack versus in the generational heap costs only an extra end-of-area pointer increment per generational allocation, compared to one SP adjustment per function call (and presumably all of a function’s arguments could be allocated as one data structure, thus at equivalent cost). But the volume of objects that accumulates before a collection (i.e. before the end-of-area pointer is reset) is much greater than for stack-passed arguments, and thus perhaps causes additional thrashing.
C code rarely employs hashmap lookup, yet JS literally encourages it since all object properties and non-numeric array indices employ it.
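A small illustration of that point: both snippets below compute the same sum, but the string-keyed version does a hash (or hidden-class) lookup per access, while the typed array is a direct indexed load over contiguous memory, which is the C-like shape. (The snippet is illustrative only, not a benchmark.)

```typescript
// Idiomatic JS reaches for string-keyed lookups; C-like code indexes
// flat memory directly.
const points = new Map<string, number>([
  ["x", 1], ["y", 2], ["z", 3],
]);
let dictSum = 0;
for (const key of ["x", "y", "z"]) {
  dictSum += points.get(key)!; // hashed lookup per access
}

const flat = new Float64Array([1, 2, 3]); // contiguous, C-like layout
let flatSum = 0;
for (let i = 0; i < flat.length; i++) {
  flatSum += flat[i]; // direct index, no hashing
}

console.log(dictSum, flatSum); // 6 6
```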
There are many more details and we would need to study all the reasons:
This thread is in part my attempt to get the best of low-level ASM.js/WASM married to a better GC scheme that is more compatible with (some of) those lower-level abstractions!
Agreed, e.g. do not realize they’re boxing everything forcing an extra pointer indirection for every read/write of an array element for example.
Huh? JS is several times slower than C, and ASM.js is afair purportedly within about 40% of native C speed:
Perhaps you’re referring to JS highly optimized to avoid features which impair aforelinked V8’s optimization?
Generational GC is very efficient nearly as much as the stack (as I explained) and I am going to propose an algorithm that I am thinking might make it even more efficient.
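A rough sketch (names such as `Nursery` are illustrative, not from any real collector) of why nursery allocation is nearly stack-cheap: a bump-pointer allocator just advances one end-of-area pointer per allocation, much like pushing a stack frame, and a minor collection resets it:

```typescript
// Bump-pointer nursery allocation: one pointer increment per alloc.
class Nursery {
  private top = 0;
  constructor(private capacity: number) {}
  alloc(size: number): number {
    if (this.top + size > this.capacity) {
      throw new Error("nursery full: trigger a minor collection");
    }
    const addr = this.top;
    this.top += size; // the entire cost of an allocation
    return addr;
  }
  // A minor collection would evacuate survivors, then reset the pointer.
  reset(): void { this.top = 0; }
}

const nursery = new Nursery(1024);
const a = nursery.alloc(16); // addr 0
const b = nursery.alloc(32); // addr 16
console.log(a, b); // 0 16
```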
It is true that the asymptotic computational complexity (i.e. performance) of tracing GC is pathological both as live memory usage approaches/exceeds the bound of physical memory and as total memory consumption increases for computers with more GBs (or TBs) of DRAM. That is why I cited that research for footnote 16 in my prior post—which btw my independently derived idea seems to be roughly analogous to—for removing the pathological asymptotics by leveraging an optimized form of RC for the older generations. I will write my detailed post about this soon.
And that is why I allude in my response to @barrkel to the time vs. space trade-off approaching parity between automated GC and manual/stack/compile-time deterministic paradigms for memory management. The reason that the deterministic paradigms you espouse will not exceed the space vs. time trade-offs for the throughput (nor maximum latency pauses) of state-of-the-art automated GC in general is because I repeat that memory allocation mgmt is inherently non-deterministic (thus requires runtime management)!
Also there is research1 (which I haven’t read yet) which seems to indicate that as the number of (perhaps also non-deterministic) factors (e.g. power and energy management issues) that need to be managed increases, then automatic algorithms may become the most plausible (reasonable cost, time-to-market, stability, maintenance) for most applications with hand-tuning being further, further relegated to the esoteric that demand extreme tuning (e.g. the NASA space shuttle).
Even manual memory mgmt will have virtual memory paging out to disk as the physical memory is exhausted.
That half-memory deal is afaik due to the “semi-space” of a copy collector that is not generational. Afaics, there are better alternatives now. See footnote 16 in my prior post. I need to study this more thoroughly though.
Of course manual (both unchecked and compiler checked) mgmt can be a bit faster or correct some pathological cases (as well create potentially other ones ;), but at great cost as I wrote in my prior post as follows:
And you alluded to the great cost in terms of bugs and/or tsuris:
In my prior post, I had alluded to how attempting to have the compiler check memory mgmt is forcing it into a deterministic, total order, which will necessarily cause the programmer to violate the compiler checks and leads to bugs (unless perhaps that compile-time paradigm is somehow limited to only deterministic instances, but from my study thus far it appears that the cross-pollution/interopt between young and older generation allocation makes non-deterministic, runtime automated algorithm interop with a deterministic young generation have worse characteristics than a runtime, automated generational non-deterministic collector for the younger generation):
1 T. Cao, S. M. Blackburn, T. Gao, and K. S. McKinley, "The Yin and Yang of Power and Performance for Asymmetric Hardware and Managed Software," in ISCA ‘12: The 39th International Symposium on Computer Architecture, 2012. Originally found on Steve Blackburn’s website. Also T. Cao’s master’s thesis prior art.
Also it is about consistency of memory mgmt, i.e. avoiding the pitfalls of the average programmer or rushed development team.
Well yes, but as I mentioned above in my reply to @keean, and I presume you’re also alluding to, that the differences in space vs. time tradeoffs between automatic and manual memory mgmt are getting closer to low double digit (or even single digit) percentages (i.e. “extremes” of optimization) and thus heavily favoring automatic GC for most use cases. Note I’m not conflating this metric with the 2 - 10% figure for throughput differences that you asserted.
Note though that the metric you propose (although significantly representative given all other factors equivalent) doesn’t account for all impacts of the GC scheme chosen, such as cache and virtual paging locality (contiguity) to minimize thrashing which can significantly impact performance. Note however, that deterministic compile-time (manual explicit or for stack argument conventions and such thus assisted partially implicit) memory mgmt doesn’t necessarily achieve a panacea for this ancillary factor and could actually degrade it because generational and copy GC increase locality.
I appreciate your assistance in helping to explain. And I agree that is an important clarification to emphasize. Please feel free to participate here as often or infrequently as you want.
Btw, @keean is expert in other areas such as C/C++, Haskell, Prolog, type theory, and library algebras. Afaik, he has only been developing seriously with JS/TypeScript for a relatively much shorter period of time. So that might explain it. Or it could also be rushing and trying to answer late in the evening after a long day of managing his company.
If you consider a reference as an object with a delete operator, rather than only a pointer with a free, the destructor can release references to children before releasing the reference itself.
It's not anecdotal. I have a fully working application server with its own memory management and object-oriented script language, which deals with thousands of objects, and I never spent more than an hour tracking a memory leak in the whole development cycle, because I have a clear methodology to track them.
I've been doing tons of experiments with different models of memory management.
And it's not even multi-threaded, yet these kinds of problems are already recurrent in every GC language out there.
I’m trying to preempt another 1000+ post thread. We already had extensive discussions of your ideas, framework, perspective in the Concurrency and your “Saluations! :)” threads. We don’t need to repeat that again. I’m not trying to be dictatorial or unappreciative of discussion. I’m very interested right now in automated GC, for the reasons I’m explaining. I want to see if we can make it work optimally with the lower-level abstractions and resolve the pathological asymptotics issue.
I already told you that explicitly asking the programmer to remember to break cycles is off-topic (i.e. the destructor will not be automatically fired if there is a cyclic reference preventing the parent-most object's reference count from going to 0):
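A minimal illustration (names such as `Node2`, `addRef`, `release` are hypothetical) of the cycle problem described above: parent and child point at each other, so neither count can reach zero without outside intervention, and the "destructor" is never fired.

```typescript
// Naive reference counting never frees a cycle.
class Node2 {
  refCount = 0;
  other: Node2 | null = null;
}
function addRef(n: Node2): void { n.refCount++; }
function release(n: Node2): boolean {
  return --n.refCount === 0; // true => would be deallocated
}

const parent = new Node2();
const child = new Node2();
addRef(parent); addRef(child);       // external handles
parent.other = child; addRef(child); // internal edge parent -> child
child.other = parent; addRef(parent); // internal edge child -> parent

// Dropping both external handles still leaves each count at 1:
const parentFreed = release(parent);
const childFreed = release(child);
console.log(parentFreed, childFreed); // false false => the cycle leaks
```

This is exactly the case where a backup tracing collector (or a weak edge) is needed to break the cycle.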
Although you’ve deleted (or moved to your thread?) those numerous prior off-topic posts I asked you to (thank you), you want to repeat the same points of what I perceive to be off-topic noise w.r.t. the focus I want in this thread. I understand you think you have something important to say, but you apparently did not even pay attention or grok that I had already stated that it (i.e. the same point you’re repeating again) was/is off-topic.
We already discussed in the prior threads that (your and Rust’s paradigm of allowing/presuming) willy-nilly multithreading everywhere is not the paradigm I’m interested in. I’m looking at the single-threaded event model. I will be discussing GC and multithreading in this thread.
How many times am I going to have to repeat myself that I am not interested in discussing further the anecdotal claims of your low-level framework. You’re apparently overlooking the more holistic point that:
If you’re interested in your paradigm in spite of what I perceive to be the convincing exposition in my prior posts, then please feel free of course to discuss in a different thread. I may occasionally read there. If you make your best arguments there (not in this thread) and if I see anything I agree with, then I will quote it over here (and invite/entice you to discuss here). Please do not pollute my thread (at least respect the one post for every 10 others request, or 6 if all the intervening posts are lengthy and infrequent) with what I currently perceive to be off-topic discussion. I do not want an interactive chat style discussion in this thread. We can chat back-and-forth realtime in private or another thread. I wanted focused, carefully thought out discussion in this thread that is easy to re-read and condensed enough for others to follow. Because this is my thread wherein I am trying to sort out what language paradigms I need, and this is a very serious issue I am in a rush to complete, and I do not want the distraction of what I perceive to be a boisterous (repetitively arguing the same points over and over) and entirely incorrect dinosaur.
Essentially you’re trying to route me away from extensive sources to your pet project. That is very myopic.
You’re apparently a prolific and very productive programmer (unlike my pitiful self reduced to rubble because still apparently suffering from a perhaps resistant strain of Tuberculosis). You and I aren’t able to collaborate much because you and I are ostensibly headed in different directions w.r.t. programming paradigms, while maintaining very low overhead on throughput.
It “should” only do so in a balance of throughput, asymptotics, pause latency, and (time & space) locality. If you’re preferring priorities of time locality (immediacy), then not only are you attempting the impossible of defying the non-determinism of memory mgmt (i.e. immediacy is irrelevant when the period of allocation is non-deterministic), you’re also not optimizing the entire equation for memory mgmt. This sort of myopia might not manifest as measured problems in your handcrafted project due to the confirmation bias of handcrafted design decisions, but may in more diverse and general ecosystem of use cases and developers. Your exposition has the hallmarks of an amateur lacking academic study.
I wrote already in this thread that new research has drastically improved this issue:
You make me repeat myself because you don’t assimilate all the details I write. Presumably you’re so glued to your mental model of programming that you don’t notice subtleties that paradigm-shift the presumptions.
Agreed, but note in my prior post with the many footnotes about GC, I cited some research (Weighted Reference Counting) that places the count in the pointers for allocation and only has to decrement the shared count on deallocation. Also cited research about employing the reference counting only for a single-bit stored in the pointer in a reference counting generation scheme. Also in an earlier post I suggested to pulling all the counts closer together in contiguous locality to minimize cache line and virtual page faults. There’s also the concept of buffering the increments and decrements to amortize.
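A hedged sketch of the weighted reference counting idea mentioned above (class and method names such as `Weighted` and `WPtr` are purely illustrative): each pointer carries a weight, copying a pointer splits its weight locally without writing to the shared object, and only dropping a pointer decrements the object's shared total.

```typescript
// Weighted reference counting sketch: copies never touch the shared
// count, so the object's cache line is only written on deallocation
// of a pointer.
class Weighted<T> {
  totalWeight = 0;
  freed = false;
  constructor(public value: T) {}
}

class WPtr<T> {
  constructor(public obj: Weighted<T>, public weight: number) {}
  static create<T>(obj: Weighted<T>, initialWeight = 64): WPtr<T> {
    obj.totalWeight += initialWeight;
    return new WPtr(obj, initialWeight);
  }
  // Copying splits this pointer's weight; the shared object is untouched.
  // (Real schemes handle weight underflow, e.g. via indirection; omitted.)
  copy(): WPtr<T> {
    const half = this.weight / 2;
    this.weight -= half;
    return new WPtr(this.obj, half);
  }
  drop(): void {
    this.obj.totalWeight -= this.weight;
    if (this.obj.totalWeight === 0) this.obj.freed = true;
  }
}

const obj = new Weighted("payload");
const p = WPtr.create(obj); // p weight 64, total 64
const q = p.copy();         // p: 32, q: 32, total still 64 (no write to obj)
p.drop();                   // total 32
q.drop();                   // total 0 => freed
console.log(obj.freed); // true
```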
The thing is my paradigm is the basis for a high-level language, which fully works with JS/JSON and can generate binary data for WebGL apps or other kinds of dynamic applications.
The script language can already process HTML data, including form variables and query variables, and interact 100% with JS apps, and I think it's still cleaner than PHP.
It can parse POST variables, including files, and generate JS variables and code with live variables from the node, etc.
The low-level mechanism is only to be seen as a minimalistic VM to run the script language, which either handles network events or generates HTML pages with dynamic data.
Any kind of advanced high-level language will need this sort of code to manage memory, references, multiple threads, etc.
The lexer/VM can be very simple because the low-level code already has advanced features to deal with objects, memory, scope, and evaluation of object data.
@NodixBlockchain, you’re repeating the same points I already discussed with you in the Concurrency and your “Saluations! :)” threads. I refuse to repeat my explanations again to you as to why what you think is applicable, is not applicable to what I am doing. It’s not my job to repeat myself over and over, just because you think I didn’t already refute you before in those other threads.
Please stop. Please.
Incorrect. And I do not want to explain why again.
Your paradigm for multithreading is not applicable to the direction I’m headed, as I will repeat what I wrote before quoted as follows:
I’m not trying to be a jerk. I have clear reasons for the focus that I’m pursuing. I hope you also understand my available time is very limited (and no that is not a valid reason that I should employ your framework).
Sometimes I ponder if the trolls at BCT paid you to come and disrupt me. I’m having enough difficulty as it is making forward progress, and I need to be taken off on repetitive tangents like I need another hole in my head in addition to the one I already have from a hammer.
Agreed afaik you appear to be making a valid and important point, which dovetails with my initial peek into analyses about GC for proper integration with the low-level coding.
The problem you’re experiencing perhaps has something to do with the fact that most browsers are limited to presuming a least common denominator of a 32-bit virtual address space and this gets very complicated as to how they split this up between processes. They also have to protect against writing outside of bounds efficiently. I dug a little bit into the issues (c.f. the last 2 links in quoted text below) but did not completely climb down the rabbit hole.
I had mentioned to you in private discussion my idea about storing older generation objects in same sized objects memory pages in order to minimize fragmentation with a 64-bit virtual address space. This would rely on less frequent compaction pauses for old generation objects, and would reuse old generation slots instead of waiting to compact them (i.e. not primarily a copy collector).
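An illustrative sketch of that idea (all names, e.g. `SizeClassPage`, are hypothetical): old-generation objects are grouped into pages holding only one object size, and freed slots are reused from a free list instead of waiting for compaction, so a page never fragments internally.

```typescript
// A page of fixed-size slots for one size class; freed slots are
// recycled immediately rather than compacted.
class SizeClassPage {
  private free: number[] = [];
  private next = 0;
  constructor(public slotSize: number, public slotCount: number) {}
  alloc(): number {
    if (this.free.length > 0) return this.free.pop()!; // reuse a freed slot
    if (this.next < this.slotCount) return this.next++;
    throw new Error("page full: allocate a new page of this size class");
  }
  release(slot: number): void {
    this.free.push(slot); // slot becomes reusable, no compaction needed
  }
}

const page = new SizeClassPage(64, 1024); // e.g. 64-byte slots
const s0 = page.alloc();
const s1 = page.alloc();
page.release(s0);
const s2 = page.alloc(); // reuses the slot s0 occupied
console.log(s0 === s2); // true
```

With a 64-bit virtual address space, each size class can reserve its own large region, so same-size reuse avoids the external fragmentation discussed above.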
Also note that very large or small objects are typically handled differently by GC algorithms, so there may be an issue there w.r.t. the 16MB objects you’re allocating.
I agree that the memory allocation seems to have corner cases around the interaction of the GC in JS and the allocation of the large ASM.js/WASM heap buffers.
However, as I pointed out in prior posts, just because automated GC has a corner issue with a particular scenario, doesn’t enable the presumption that manual memory mgmt is a panacea in terms of the holistic equation of memory mgmt optimization. You gain some control in one area and lose something in other axis of the optimization space (e.g. more human-error prone, less throughput, or worse asymptotics, etc).
I’m contemplating whether I can design my own programming language which models what I want, will transpile tolerably to JS/TypeScript (or something else) easily enough to allow me to use it now, and then rewrite the VM and compiler later to optimum. This is what I’m analysing now and the reason I’m doing research on GC.
How do you arrive at this conclusion? Have the posts I have made since you wrote that, changed your mind?
Also I’m prepared to deprecate 32-bit OSes as a viable target. I’m future directed because anyway, we’re not going to get something popularized within a couple of years at best. Afaik, even mobile is moving towards 64-bit rapidly. Microsoft Windoze may be a bit resistant, but 64-bit Home is available for those who care when they buy a machine. And anyone still running that spyware without dual booting or running Linux in a VirtualBox is really behind the curve.
Because I am good enough to get the manual memory management right, and it will use a lot less memory, which will result in less fragmentation, and less crashes and therefore a better user experience.
I am still convinced that static analysis and region based memory management is a sweet spot for high performance languages.
I think GC is useful for simple coding, but there are some big problems:
If you have a system with all three of the above problems, there is nothing you can do as a programmer within the language to solve the problem.
You can however solve problems (2) and (3) if you can avoid (1) by limiting object creation. If you can force pre-allocation of resources, and then use effects to restrict new object creation, so that the compiler will prevent you using idioms that thrash the GC then it could be acceptable. The problem is that you need all the library APIs to respect this too, and provide a way for the user to pass the result buffer into the function/procedure. As we know if you give people multiple ways to solve a problem they will probably pick the wrong way.
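The "pass the result buffer in" API style described above can be sketched as follows (function and variable names are illustrative): the caller pre-allocates the output buffer once and reuses it, so the library creates no garbage on the hot path.

```typescript
// Caller-supplied output buffer: the function never allocates.
function scaleInto(
  src: Float64Array,
  factor: number,
  out: Float64Array,
): Float64Array {
  if (out.length < src.length) throw new Error("output buffer too small");
  for (let i = 0; i < src.length; i++) out[i] = src[i] * factor;
  return out; // same buffer handed back, zero allocation
}

const input = new Float64Array([1, 2, 3]);
const scratch = new Float64Array(3); // allocated once, reused every call
scaleInto(input, 2, scratch);
console.log(Array.from(scratch)); // [2, 4, 6]
```

If the API instead returned a fresh array per call, a tight loop would thrash the nursery, which is exactly the idiom the proposed effect restrictions would forbid.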
This is why I am interested in a language that forces people to do things the right way, so that then library APIs will always be written the correct way. Of course this might not make the language very popular, but I think it is an interesting approach to explore.
@keean I was careful to point out that I think Jon’s experience applies when trying to force FP on Rust. The benchmarks you cite were hand-coded to use the most optimum programming paradigm for each PL. So I think Jon is probably correct that Rust will not be so decisively more performant when someone is trying to use Rust as a FP language. FP typically doesn’t work well without automatic GC. I continue my stance (until proven otherwise) that Rust is an inflexible PL that forces the programmer to pay a huge price in productivity, expression, and thus readability in order to obtain its vaunted performance and safety guarantees. Some people may justify that in some use cases. I just can’t agree with those who claim that Rust is the best future PL for statically typed programming. I still posit (and hope) we can improve on Rust with the oft-mentioned Actor paradigm we’re contemplating.
In short, I agree with you and I also give some credit to Jon’s point. You are both probably correct, but you are more correct that mark-and-sweep GC is not acceptable moving forward. But Jon is probably also correct that Rust’s lifetimes are probably also not flexible enough to be acceptable moving forward. That’s why I have been looking at other paradigms for automated GC. And I am excited about the “actor epiphany” proposal I devised with your instigation. Interested to see if that will be the sweet spot that checks off all the needs:
@shelby3 I must have misunderstood what he was saying. When I read it he seemed to be implying that GC was faster than manually managed memory and used more space, both things I have heard before, but that real world experience, and benchmarks do not seem to support.
I am also interpreting what he wrote as implying that broader claim, but I realized that he is really complaining about how Rust will perform in a FP use case scenario (even if he didn’t realize or at least didn’t explicitly acknowledge that his claim is narrower).
Jon Harrop replied again.
And I'm mostly of his opinion in the following aspects:
Manual management is not zero-cost: deallocation is done at runtime, not at compile time; only the decision to do it is made at compile time.
Rust’s memory management only covers compile-time-decidable deallocations, by forcing a global order for its unique ptrs, which constrains user flexibility.
There is another point.
Usual code written in Rust or C++ is, in my eyes, slower than code written in a good PL with GC, because outperforming a world-class GC is a hard task these days, and doing so incurs development time and therefore costs.
@shelby3 I disagree about Java, we don't use it because it would cost us much more for the resources to run the software. We do use languages with GC, just not Java.
I agree that for most business process logic GC is fast enough, but I disagree with his analysis of GC being faster in general. I think it's easy to speculate and be wrong. Real examples of code that performs better with GC would be needed to make that point believable.
One of the problems of GC languages is often that they also tend to use dynamic allocation much more, as it’s automatic and doesn’t have an immediate cost in development time, but it can also easily encourage sloppy memory management and have bad cache performance. Making it concurrent might also have a performance impact with write barriers and such, and stop-the-world can be bad for multi-threaded real-time applications. On Wed, Sep 12, 2018 at 23:11, Keean Schupke <firstname.lastname@example.org> wrote:…
@keean Jon replied again and I think he is making some valid points. I do not think he is arguing that the current GC we have in Java is more performant in every use case compared to for example Rust or C++. Rather he is arguing that manual memory management is not the best balance we can possibly achieve (for most programmers and most “usual cases”) nor is it best in every use case. He may actually be trying to make a stronger argument than my summary, but for now I will stick with the interpretation that his analysis is more nuanced than just stating that Java’s GC is faster. I think @sighoya summarized it neatly above.
@keean I do agree with your objections about automatic memory management being difficult to control and having in some cases pathological memory footprints. Also I agree with @NodixBlockchain’s concise summary of points I have made in the past. Thus I think if we will have AMM (automatic memory management), then it must address those concerns of yours. Perhaps this “actor epiphany” idea we’re exploring might be such a sweet spot in the design space.
I replied to Jon on Quora:
Speculation is easy. I want to see an example where any program is measurably faster in a GC language like Java compared to an implementation in a non-GC language. I think these kind of arguments sound good in theory, but in practice I have never seen them to be true. Rust has shown that you can have 'C' level performance with abstraction (traits/typeclasses) and no performance penalty. So what is left to explain the worse performance of other languages? You can no longer argue the higher level abstraction makes it slower.
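The claim above that traits/typeclasses carry no performance penalty can be illustrated with a small Rust sketch (the names here are mine, purely illustrative): a generic function bounded by a trait is monomorphized per concrete type, so the abstraction compiles to direct static calls with no boxing or dynamic dispatch.

```rust
// Illustrative sketch of zero-cost trait abstraction: `sum_areas::<Circle>`
// is monomorphized, so it compiles to the same code as a hand-written loop.
trait Area {
    fn area(&self) -> f64;
}

struct Circle { r: f64 }

impl Area for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
}

// Static dispatch: T is known at compile time, no vtable, no boxing.
fn sum_areas<T: Area>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes = vec![Circle { r: 1.0 }, Circle { r: 2.0 }];
    println!("{:.2}", sum_areas(&shapes));
}
```

Contrast with `&[Box<dyn Area>]`, which would force an allocation and an indirect call per element; the bounded-polymorphism form above avoids both.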
Note: I am not arguing against automatic memory management in a mainstream language, I think it is essential.
There is a big issue though, and that is when the GC performance is poor, you are reduced to looking at tooling to try and understand why. This can be very difficult. The advantage for Rust is that lifetimes are an abstract concept that we can see in the source code and reason about, so poor memory performance becomes something we can deal with in code-reviews, rather than something we have to deal with using instrumentation and tooling whilst the program is running.
Is Jon correct to imply that Rust’s lifetimes are incompatible with tail calls? If yes, then your argument hinges on that the programmer has to program a certain way that avoids tail calls.
Now I think we can probably side-step recursion with lazy iterators and still achieve the same functional compositionality. So I am thinking Jon's complaint about the lack of tail calls may be a red herring. Anyway, apparently we can get them with Pony's reference capabilities.
But I think he is absolutely correct that we can't have the same level of algorithmic expression flexibility with Rust lifetimes as compared to the bump-pointer heap I have proposed for the “actor epiphany”. In fact, we know we can't express certain constructs with lifetimes because lifetimes are not smart enough to see they are safe. And that bump-pointer heap I have proposed for the “actor epiphany” will be more efficient than Rust's lifetimes because of cache locality!! Instead, Rust would have been putting everything on the non-bump-pointer heap with ARC.
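To make the bump-pointer idea concrete, here is a rough Rust sketch (hypothetical names, not Zer0's actual design): allocation is just an aligned pointer increment into a contiguous buffer, and the whole region is released in one O(1) reset, which is where the cache-locality and cheap-deallocation wins come from.

```rust
// Minimal bump-pointer arena sketch (illustrative). Allocations are
// contiguous, so successive objects are cache-friendly; freeing is one reset.
struct BumpArena {
    buf: Vec<u8>,
    offset: usize,
}

impl BumpArena {
    fn with_capacity(cap: usize) -> Self {
        BumpArena { buf: vec![0u8; cap], offset: 0 }
    }

    // Reserve `size` bytes at alignment `align` (must be a power of two);
    // returns the offset of the reservation, or None if the arena is full.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = (self.offset + align - 1) & !(align - 1); // round up
        if start + size > self.buf.len() {
            return None;
        }
        self.offset = start + size;
        Some(start)
    }

    // Region-wide deallocation: everything is freed in one O(1) step.
    fn reset(&mut self) {
        self.offset = 0;
    }
}

fn main() {
    let mut arena = BumpArena::with_capacity(64);
    let a = arena.alloc(8, 8); // first allocation starts at offset 0
    let b = arena.alloc(4, 4); // next one bumps past it
    println!("{:?} {:?}", a, b);
    arena.reset(); // all allocations invalidated at once
}
```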
So you are comparing apples-to-giraffes.
Indeed you can probably devise some complex memory regions design in order to code something in Rust as fast as it would be with my/our “actor epiphany” but it’s going to be non-standard (i.e. in a custom library of yours), difficult, and even more difficult for others to understand.
Making something more performant because you're a programming guru is not a solution if your productivity drops into the toilet and/or new hires can't read your code!
I stumbled onto this comment which is about an unrelated topic but analogously applies:
So why even argue for something you know we can’t possibly accept for the design of Zer0?
Who are you making your PL for?
I think the key issue is how close we can get to Rust/C++ performance with this “actor epiphany”. If we can get within the noise band of difference for single-threaded, then Rust/C++ have no chance to compete at massively multi-core! Game over. Checkmate.
Ah, when you need to use memory regions and other complex designs, then your Rust code could potentially also require you to use tooling to understand what your code is actually doing. Bugs that programmers generate, such as overflowing with unnecessary allocations, are really the same debugging issue whether you use ARC in Rust or AMM.
I think the problem here about control is really mark-and-sweep and uncontrolled heuristics about collection intervals. So pray tell me how my “actor epiphany” idea is any worse than what you could do with Rust??? I can’t see it. I see only improvement.
@keean I think perhaps my prior comment is not 100% clear as to my stance. What I am saying is we can’t just take some optimally coded performance benchmarks as gospel and the whole truth. The actual truth is more nuanced. Although I agree with you that mark-and-sweep is difficult to control, maybe there is something better as a balance between Rust’s lifetimes and mark-and-sweep. And again, I am excited to find out whether this “actor epiphany” idea is that sweet spot or not.
@shelby The thing is people should not be implementing their own collections, they should use the ones in the library. So when I insert some data into my index map, how does GC help? I have to remember to remove the object from the index. Personally I think GC is focusing on too low a level. Memory management should be automatic, I don't want to see malloc/free or new/delete. However I expect to see a lot of insert/delete on collections. In this way I don't think lifetimes help, nor does GC; if you don't remove things when they are not needed you will leak memory.
There are no cases I know of where with attention to memory allocation you can do better with GC than without. But there are some cases where using standard best practice you do better with GC. A notable example is the
Note that the basic "C gcc" implementation, which naively mallocs/frees tree components, takes 30s (single threaded) to complete. The "Java #3" implementation is similarly straightforward and single-threaded, yet takes only 15s of CPU time (12s wall clock; GC is parallelized).
This has been my (informal) experience with code that is like
Just FYI, since you were talking about having hard data about GC vs non-GC benchmarks.
The collections library can’t call
That’s a good example of the point Jon Harrop was making and which I intuitively understood would be the case.
I think we need memory regions for Zer0 (which can employ efficient bump-pointer allocation and a single region-wide deallocation), because it will be very slow to build for example an immutable binary tree for sharing with another Actor using ARC. Can't use the efficient Actor-local bump-pointer heap of my “actor epiphany” because that will be only for non-shared data.
@keean you said you prefer control instead of mark-and-sweep heuristics, so that’s what I’m contemplating to provide.
Pony would instead give you GC automation without the regions, but there’s a performance penalty.
@shelby I liked the idea of memory regions when I looked into them.
I really like the idea of automatic memory management by default. @Ichoran's example highlights that GC does better than naive manual management. We should enable beginners, and lazy people like me, to have GC by default, so you get the 15s performance without trying.
Then I like the idea of being able to tune the memory handling to get the 8-second version where you care about it. In most programs 80% or more of the code is not in performance-critical sections, so let's have something simple and safe by default.
Then when performance is critical let's have some concept of thread priority, so we can use regions to manually take over management, but with the guarantee that the GC will clean up after the critical section if you leaked anything.
I have written up a detailed new issue on the new Zer0 repository based on our up-thread discussion:
Please come on over to that GitLab issue and please comment there instead of here if you don’t mind. I put considerable effort into that post linked above as you will see.
Note I am hereby putting everyone on notice that I am ramping down and will ASAP discontinue posting to GitHub. The reason is both because I don't entirely trust Microsoft (has anyone used that ad-malware crap they named Windows 10?) given what is coming down the pike politically (quotes below from #30):
Some more links to the rapid transformation of our freedoms in the West:
And more importantly, because it is time to make a Zer0 repository and GitLab appears to be technologically better than GitHub:
You will notice the improvement immediately. For example, the markdown support for footnotes and the reddish-colored highlights for code.
I do hope @keean, @sighoya, @Ichoran, @jdh30 et al will join me in ongoing discussions at the new Zer0 repository. I hope I will prove to be as tolerant as @keean has been. Keean, this change is by no means motivated by wanting to take control away from you (and thanks so much for making the issues backup). I just need to have a Zer0 repository and focus my discussions for bringing Zer0 to fruition. I welcome you with open arms (and I am going to try my damnedest to keep my frustration level from boiling over).
I just noticed that the Benchmarks Games (aka Great Computer Language Shootout) has been cited several times from this thread. I'm afraid you cannot draw any strong conclusions from its results because their methodology is so flawed.
In point of fact I think @keean stated that GC'd languages are slower there. That is a reflection of the ideological beliefs of that site's owner, who subjectively "de-optimises" solutions that are deemed by him to be too fast. Another problem is that most of the tasks are trivially reducible because, for example, they take trivial input, produce trivial output (e.g. a constant like the digits of pi) and compute needless data. Haskell caused havoc when it was introduced because lazy evaluation meant that several benchmarks returned immediately because their intermediate results didn't need computing. More sinisterly, many other languages were doing artificially well because their compilers' optimisers would statically evaluate half of the benchmark.
Consequently, the results say little to nothing about the characteristics of the languages and their implementations being tested.
I'd also note that the original binary trees benchmark is due to Hans Boehm who, I believe, invented it as a test for his conservative Boehm GC.
@keean @jdh30 - The CLBG expressly says, itself, that the benchmarks are flawed; but there are quite a few, and the flaws are not all identical, and my experience also agrees with Keean's (i.e. CLBG basically gets it right for most stuff).
The nice thing about CLBG is that it lets you talk about something instead of waving your hands in the air. People love waving their hands in the air about their favorite language and how fast it is. But that's even less reliable, even more flawed--by a long shot--than CLBG.
So I'd love to have better, but sadly and weirdly, it seems to be about the best there is out there right now. Efforts like TechEmpower's web benchmarks suffer from two gigantic confounds: first, all the slow languages are calling routines in C and C++ to do a lot of the heavy lifting (not useful if you want to know how good the language is on its own rather than acting as glue); and second, there isn't adequate attention to produce optimized implementations of the different platforms due to the effort involved. In contrast, the CLBG benchmarks require simple algorithms where you mostly have to rely upon basic language features, and are small enough so multiple people can (and have) submitted benchmarks that try to improve on the existing scores.
There are a non-negligible number of glitches and problems (e.g. I spent a lot of time writing Scala benchmarks which all were removed when Scala was dropped--I'm pretty annoyed by that), but it's still a source of data in an area that is mostly just hot air.
I did experiments with a C runtime some time ago, to test the performance of atomic reference counting vs tracing GC on a binary-tree algorithm. As the runtime makes reference tracing easy, tracing GC was still much faster, but I would need to redo those tests because I have updated memory management since I made them.
I also wanted to test cases where there is infrequent allocation/deallocation and a lot of allocated references, so that the GC has to do tracing for nothing many times, to see how that can impact performance.
It really depends on what you are doing with the binary tree. If you know the max bound of the node count, then pre-allocate all the nodes in a single arena, and then deallocate the lot when you have finished with the tree. Nothing can be faster than that.
Conversely if you need eager deallocation, then freeing memory as soon as the node is removed from the tree will be optimal for memory usage.
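The arena approach described above can be sketched in Rust (illustrative names, with a `Vec` standing in for the arena): nodes are linked by index rather than by pointer, and dropping the arena frees every node in one deallocation.

```rust
// Arena-allocated binary tree sketch: all nodes live in one Vec, links are
// indices, and dropping the arena is the single region-wide deallocation.
struct Node {
    value: i32,
    left: Option<usize>,  // index into the arena, not an owned pointer
    right: Option<usize>,
}

struct TreeArena {
    nodes: Vec<Node>,
}

impl TreeArena {
    // Pre-allocate capacity for the known max bound of the node count.
    fn new(max_nodes: usize) -> Self {
        TreeArena { nodes: Vec::with_capacity(max_nodes) }
    }

    // "Allocation" is just pushing into the pre-reserved buffer.
    fn alloc(&mut self, value: i32) -> usize {
        self.nodes.push(Node { value, left: None, right: None });
        self.nodes.len() - 1
    }
}

fn main() {
    let mut arena = TreeArena::new(3);
    let root = arena.alloc(1);
    let l = arena.alloc(2);
    let r = arena.alloc(3);
    arena.nodes[root].left = Some(l);
    arena.nodes[root].right = Some(r);
    // Dropping `arena` here frees all nodes at once: no per-node free calls.
}
```

This is essentially what the naive malloc/free C version of the binary-trees benchmark pays for on every node, and what the arena avoids.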
I tried to find a better benchmark; there is a suite in Java supposed to test memory-management performance, but I didn't find much in C. For overall performance for page generation and such, it doesn't make a huge difference, maybe on the order of 10% faster with tracing GC. Note that tracing GC can incur a penalty for memory shared by several threads.
@keean "my own benchmarking agrees with those conclusions". As long as you've tested it independently that's fine, of course. I'm just highly skeptical of any conclusions drawn from the Benchmarks Game. My own results vary, e.g. http://fsharpnews.blogspot.com/2010/03/f-vs-unmanaged-c-for-parallel-numerics.html
"Conversely if you need eager deallocation, then freeing memory as soon as the node is removed from the tree will be optimal for memory usage". You may want sliding compaction.
@Ichoran "So I'd love to have better, but sadly and weirdly, it seems to be about the best there is out there right now".
You may find the following benchmarks more useful:
Ray Tracer Language Comparison http://www.ffconsultancy.com/languages/ray_tracer/
List-based n-queens Benchmark http://flyingfrogblog.blogspot.com/2010/12/towards-mark-region-gc-for-hlvm.html
Symbolic differentiation with Mathematica vs Swift vs OCaml vs F# on .NET and Mono http://flyingfrogblog.blogspot.com/2017/12/does-reference-counting-really-use-less_26.html
"multiple people can (and have) submitted benchmarks that try to improve on the existing scores."
I have contributed many different solutions, often many times faster and sometimes orders of magnitude faster only to have them rejected. My personal experience was that improvements to solutions in non-GC'd languages like C and C++ tended to be accepted whereas huge improvements to OCaml and F# were rejected.
I find it laughable that I am still credited with only C++ versions: https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/spectralnorm-gpp-6.html
"which all were removed when Scala was dropped"
Eh, no Scala?!
If you have "owning references" then you don't need reference counting with a tree. The internal links in the tree would be owning references, so when nodes are removed you just free them. In effect you do not allow any external references to outlive the owning reference (this is what Rust lifetimes are all about).
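Owning internal links can be sketched in Rust directly (illustrative example, not from the thread): each child link is an owning `Box`, so detaching a subtree from its parent drops and frees it immediately, with no refcount or GC involved.

```rust
// Owning-reference tree sketch: the tree's internal links own their children.
// No external reference can outlive the owning link (enforced by the borrow
// checker), so removal doubles as deallocation.
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn main() {
    let mut root = Node {
        value: 1,
        left: Some(Box::new(Node { value: 2, left: None, right: None })),
        right: None,
    };
    // "Remove" is deallocation: taking the owning Box out and dropping it
    // frees the node (and, recursively, its whole subtree).
    root.left.take();
    println!("left removed: {}", root.left.is_none());
}
```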
The test is not the optimal solution for the problem, but it can use enough allocation and reference traversal to test memory-management performance.
@NodixBlockchain My point is with data structures that have an "insert" and "remove" operation we don't need memory management, because we are manually managing the collection in the first place. If we forget to remove a node from a tree then we leak the space. In other words "insert" is allocation and "remove" is deallocation. There is no need to separately manage the memory.
Yes, it's just to simulate conditions in which there would be dynamic manipulation of an object DAG, for example. The operations would be similar and it can give an idea of the performance, but it's not the best test possible.
@jdh30 Interesting that the HLVM work points the finger at boxing for the poor performance of GC languages, as HLVM has GC but an unboxed representation and seems to perform pretty well.
If HLVM can keep that sort of comparison with C++ more widely, I would say it is good enough performance wise.