Tier 0———————————————————————————

`What problem are you trying to solve?`
Compilers translate our code into machine code.  Without an optimizer, a compiler does this literally.  With an optimizer, it applies a series of quick hacks to make the code faster.  Optimizing code has little formality or rigor to it.  This used to be necessary to keep compilation time reasonable.  Today, this is no longer the case.

I hypothesize that it is now feasible to model code optimization as a search problem.  Given a set of hardware and a piece of code, an optimizer should be able to use search algorithms to make that code much faster.
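To make "optimization as a search problem" concrete, here is a minimal sketch of a brute-force search over a made-up, single-register instruction set.  The op names (`inc`, `dec`, `dbl`, `neg`) and the helper `search_shortest` are assumptions for illustration only.  Real tools like Stoke search actual machine instructions stochastically and then verify equivalence; this toy only checks the candidates against test inputs, which is not a proof.

```python
from itertools import product

# Hypothetical toy instruction set acting on a single accumulator register.
OPS = {
    "inc": lambda a: a + 1,
    "dec": lambda a: a - 1,
    "dbl": lambda a: a * 2,
    "neg": lambda a: -a,
}

def run(program, x):
    """Execute a sequence of op names, starting from the value x."""
    for op in program:
        x = OPS[op](x)
    return x

def search_shortest(target, inputs, max_len=4):
    """Return the shortest op sequence that matches target on every test input."""
    for length in range(max_len + 1):
        for program in product(OPS, repeat=length):
            if all(run(program, x) == target(x) for x in inputs):
                return list(program)
    return None  # nothing found within max_len

# A literal translation of 2 * (x + 1) - 2 takes several steps;
# the search finds that a single doubling does the same job.
best = search_shortest(lambda x: 2 * (x + 1) - 2, inputs=range(-8, 9))
print(best)
```

The search space grows exponentially with `max_len`, which is exactly why cheap parallel hardware matters for this approach.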

Advances in hardware are what make search optimization feasible.  Search optimization is computationally heavy, but now that cloud computing is commonplace, it is easy to run search algorithms.  Faster, stronger hardware makes for faster, stronger code, which in turn again makes for faster, stronger hardware.  You want to know what techniques of optimization are now feasible because of cloud computing.

`What is your plan?`
Find a fun job, and do this in your spare time:
Year 1: Learn Klee, and figure out how to utilize it.
Year 2: Learn cloud and distributed computing.
Year 3 (option 1): If you can find a compiler tool that automatically distributes code to cloud systems, learn it, and contribute to it. 
Year 3 (option 2): If there's nothing like what you want, retool LLVM for distributed compilation and more Klee integration.  Use cloud for super semantic analysis to deduce all you can in order to make the most optimizations.
Year 4: Sweet graphics.

Tier 1———————————————————————————

You need to say more on the automatic theorem proving front.  Before, you could only do the simplest of optimizations: ones obvious enough that they didn’t need a formal proof to show that the code was equivalent.  Even then, sometimes you would get different behavior.  Now with automatic theorem proving, you can have wildly different code, since you can prove automatically that it’s equivalent.  This is what truly enables the next wave of optimizations.
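A rough sketch of what "prove the rewritten code is equivalent" means.  An SMT solver would do this symbolically over full-width machine words; as a stand-in, this sketch just enumerates every 8-bit input, which is only possible because the domain is tiny.  The functions `original`, `rewritten`, and `broken_rewrite` are made-up examples, not taken from any real optimizer.

```python
def original(x):
    # Straightforward version: multiply by 8, keep the low 8 bits.
    return (x * 8) % 256

def rewritten(x):
    # Candidate optimization: shift and mask instead of multiply and modulo.
    return (x << 3) & 0xFF

def broken_rewrite(x):
    # A tempting but wrong rewrite: the mask was forgotten.
    return x << 3

def first_counterexample(f, g):
    """Return the first 8-bit input where f and g disagree, or None if they never do."""
    for x in range(256):
        if f(x) != g(x):
            return x
    return None

print(first_counterexample(original, rewritten))       # no disagreement found
print(first_counterexample(original, broken_rewrite))  # an input that breaks the bad rewrite
```

The useful part is the counterexample: a rejected rewrite comes with a concrete input that shows why it is wrong.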

Also need to find where this fits:  You might take the stance that GPUs will be just one of many ASICs in the near future, and that optimizing specifically for GPUs is overly myopic, and not widely applicable.  But GPUs aren't just for graphics.  A GPU is basically a linear system processing unit.  Optimizing for GPUs isn't just interesting, or nice to have.  It is becoming clear that linear algebra is essential to heavy computation, and that means GPUs are essential to heavy computation.  Big data is only going to get bigger.  Quote Gilbert Strang here?

Right now, I'm getting the feeling that your problem statement isn't going to work.  What you want right now is 'code that is simple like python, but just as fast as C.'  Consider that code paradigms like multithreading and manual garbage collection serve the purpose of taking the load off of the compiler and putting it onto the user.  What happens when it turns out there's some other paradigm that would be better used explicitly?  Like maybe something to do with passing functions as parameters.  Maybe that would be better left unused.  So you have to choose here, again.  It's not a choice between 'should I make the user do it explicitly, or should I make the compiler do it?'  It's a question of 'should I make the user do it explicitly, or should I let the code run slower?'  An example of this is with type inference.  You usually get the same speed up as explicit typing, but not quite.  There's probably a simple example out there.  This language is something that you have a general idea of what you want.  But instead of coming up with a solution to something that's not really a problem, you need to figure out the right question to ask.  Yes, you want to make optimizations, but to what exactly do you want to apply those optimizations?  Here's a possible question:  
Certain code structures are better explicit, and certain code structures are better implicit.  Garbage collection is easier when made implicit.  Typing is better implicit.  Parallelism is complicated.  Numpy is an explicit form of parallelism, but it is conceptually simpler when the user does it explicitly.  Multithreading with stuff like C's \_exec is better left implicit.  What code structures make it easier on the user, and what code structures make it harder on the user?  What exactly is a code structure?

Here's another possible question:  
What exactly is a type?  Do we need explicit types?  Let's say we have an array of ints, A.  Let's also say that they're asteroid weights in kg.  We don't know exactly what each asteroid's weight is, but we know it'll be greater than 10kg.  Would it help if instead of declaring an element x of A to be an int, we declared x > 10?  Would that make the code run faster?  This is probably a bad example, but you get what I mean.  It's highly related to the previous question.  What parts of a problem should the user make explicit, and what can be left off?  I guess another way to say this is:  what if instead of figuring out what type a variable is, we just tried to figure out what operations we could perform on that variable?  I guess you could say every variable has its own custom type.  Most would fall into the standard string / int / float paradigm, but perhaps some could be restricted further.  Would this be helpful?  Would looking at things this way make any difference?
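One way the "x > 10" idea could pay off: if the compiler tracks a range for each value instead of just "int", some comparisons can be decided at compile time and the branch deleted.  This is a minimal sketch of that idea; `Interval` and `is_greater_than` are hypothetical names, and this is ordinary interval reasoning rather than anything a specific compiler is known to do this way.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Hypothetical 'type' for a number: everything known about its range."""
    lo: float = -math.inf
    hi: float = math.inf

def is_greater_than(value, threshold):
    """Decide `value > threshold` from the range alone.

    Returns True or False when the answer is certain, None when a
    runtime check is still required.
    """
    if value.lo > threshold:
        return True
    if value.hi <= threshold:
        return False
    return None

plain_int = Interval()         # declared only as "an int"
asteroid_kg = Interval(lo=11)  # an int declared as x > 10

# A guard like `if x > 0:` must stay for a plain int,
# but is always true for an asteroid weight and could be removed.
print(is_greater_than(plain_int, 0))
print(is_greater_than(asteroid_kg, 0))
```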

A third, more meta question:
You want to make optimizations.  You want code running on all the CPUs, and all the GPUs.  What kind of things would you make, that you would definitely be satisfied with making?

One of the Julia founders got their PhD from Santa Barbara.  It really seems like you could get a nice, easy job if you become a PhD student.  Then it's that easy time as a student where you can make the thing you care about.

https://www.ibm.com/developerworks/community/blogs/jfp/entry/A_Comparison_Of_C_Julia_Python_Numba_Cython_Scipy_and_BLAS_on_LU_Factorization?lang=en
Look at this speed comparison of python numba and C.  Would it be possible to just compile a whole bunch of python code?  You could compile every python library.  Wouldn't it be great to not have to rewrite all those libraries?  Of course, numba has a bunch of extra decorators and stuff in it.  That seems kind of annoying.  But just imagine all the stuff you wouldn't have to do.
Ok, wait wait wait.  Now I'm having second thoughts again.  Numba.  Cython.  SciPy.  Numpy.  All of these things are building parallel to each other.  Again, you need to think about how you can synthesize all of these.  There must be 1 right answer.  How do you tell what it is?  This might be solved by Stoke.  The problem is 'oh, this code thing doesn't do this 1 specific task in the exact way that optimizes it the best, so I better write a whole other thing just like it.'  Stoke would allow all of these very different things to be compiled to basically the same source code.  You need to turn human-hours into computer-hours.  That might be the question.  How do you turn human-hours into computer-hours?  Hm... back to a Stoke-centered idea.

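Remember that 'kolmogorov' thing you read about once?  I think it might actually be important.  Kolmogorov complexity (the length of the shortest program that produces a given output) is uncomputable, so in general it's impossible to determine if there's a shorter solution out there.  Or something to that effect.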

Also, Julia isn't actually JIT compiled, I don't think.  It's just a word they're using for whatever reason.  

`Help`
'Usually, if using STOKE for optimization, starting with a better program often results in a better program'.  I thought this was pretty much a done deal.  So you've been wasting all your time thinking about things that don't matter yet because STOKE is still in its infancy.  No, it's still a fetus.  Oh well, this will be more fun than creating your own language.  Remember the basic principle.  Optimization modeled as a search problem is feasible now that cloud computing is a thing.  You also might want to throw in that it will also allow for easy optimization on arbitrary hardware as long as you can somehow specify to the search algorithm what the hardware is.

`What do you want to do?`
You want a language that goes as fast as possible.
How are you going to achieve that?
The cloud allows anyone (with money) to easily access parallel computers.
This means that all programs, compilers included, can be much more powerful.
Things like stoke and klee can take advantage of this.
Furthermore, stoke can take advantage of different hardware architectures.
The cloud offers a lot of powerful architecture that you could not get otherwise.
Therefore, stoke is not just optimized for parallelism, but also for varied hardware architecture.
Compute clouds offer a lot of varied hardware architecture.
It has never been easier to create a programming language.
Moreover, there is no language that has been specifically optimized for the cloud.
This is what you want to create.  A programming language, or more generally, a computing environment in which scientists are able to take full advantage of the cloud, both in terms of compilation time and in terms of the runtime code that it can produce.

`Why do you need your own language?  LLVM optimizers work for any language, so why do you need your own front end too?`
Because the smaller your language is, the easier it is to optimize.  Normal optimizers use a lot of heuristics on the language, but Stoke would basically do away with this.  All you have to do is specify your language to Stoke, and it will be able to optimize it.  All of those man-hours poured into C and the like are no longer an impossible hurdle to get over.  You can now have code as fast as C without having to use a bunch of guess-like optimization techniques.  So you’re no longer bound to the cumbersome syntax of C either.  Type inference, functions as data, easy vector operations, no need to adhere to 40 years worth of backwards compatibility.  With Stoke, making a new language doesn’t mean you have to sacrifice runtime efficiency.

Basically, having your own frontend language will let you know the search space exactly.  If you use C, there may be code structures that your optimizer will not be able to work around.

`What kinds of optimizations do you actually want to look into?`
Stoke, as it currently is, as well as applied to GPUs and other ASICs
Automatic parallelization in terms of threading and multiple CPUs/GPUs
Polly, which does data locality optimizations.
Using SMT/SAT solvers for general optimizations.
Using software like Klee, Cyclone, CCured, and SAFECode to automatically catch even more errors.
Runtime optimizations.  Not really sure how the cloud can help here, but big data programs by definition take in a lot of varied data, and there is no other way to optimize for that other than at runtime.

Leo said when I asked about the guy at apple:  “He wasn’t on LLVM, but he was using an SMT solver, not sure which one.  But basically yeah searching for a sequence of assembly instructions than can do the same thing as another sequence but in fewer steps”.  Hm… Unrelated, Leo also said he had to use Lex and Yacc at Apple.  Why didn’t they use Antlr?  Much easier, basically the same license.  They use LLVM, which uses a pretty much identical license.

`Who is this for?`
Scientific programmers who want to perform massive computations.
Of particular interest is machine learning, which currently uses many heuristics, not only to get better answers but also to speed up the arrival at those answers.
...Cryptocurrency miners.
Perhaps it won’t be the main language that scientific programmers use, but maybe it could be used to create libraries that these languages could call.

`So how are you actually going to do it?`
Right now, you need to make a toy language.  Get a feel for the whole thing.
Then, look further into the different tools you want to use / work on.  LLVM, Stoke, Klee, and the like.  
If for some reason you want to make your own full language specification, take a deep dive into Antlr.
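Be careful with Flex and Bison: Bison is GPL licensed (with a special exception for the parsers it generates), while Flex is under a BSD-style license.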
It’s important that we stick with BSD-like licenses.
Remember Apple switched from gcc to clang because of the licensing.

`How is this any different from Julia?  Go?  C?  The millions of other languages?`
This language will be designed specifically for compiling and running on massive parallel computers.  It will attempt to be the fastest language possible.

Julia was designed as ‘a fast scripting language’.  It has weird syntax.  Also, it's not that fast.
Go was designed as a 'systems language'.  Not explicitly for fast computations.  Yes, it was designed for concurrency, but for explicit concurrency, as opposed to automatic.
C has garbage syntax and too much to worry about in terms of backwards compatibility to get much faster.

It won't go faster than all other languages in all instances.  But maybe if it goes faster than all languages in just a few instances, then it's a success.  You could make some libraries that could be called from python, and people would get some use out of it, even if they don't use it directly.

Tier 2———————————————————————————

`Why pay any attention to the syntax at all?  Why not stick purely to an optimizer that can take in any kind of language?`
Because the more features your language has, the harder the search problem is.
More features means potentially slower as well.
Read the below link that talks about how tuples are much faster than lists in python. https://stackoverflow.com/questions/2174124/why-do-we-need-tuples-in-python-or-any-immutable-data-type
The smaller your programming language, the smaller the surface area is that you need to optimize.
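A quick way to poke at the tuple vs list point yourself.  This sketch assumes CPython; the size comparison and the timings are implementation details, and the timing numbers will vary from machine to machine.  `is_hashable` is a helper made up for this example.

```python
import sys
import timeit

point_tuple = (1, 2, 3)
point_list = [1, 2, 3]

# On CPython, a tuple carries less per-object overhead than a list,
# because a list must leave room to grow.
print(sys.getsizeof(point_tuple), sys.getsizeof(point_list))

def is_hashable(obj):
    """True if obj can be used as a dict key or set member."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

# Immutability is also what allows a tuple to be hashed.
print(is_hashable(point_tuple), is_hashable(point_list))

# Rough construction cost; treat the numbers as indicative only.
tuple_time = timeit.timeit("(1, 2, 3)", number=1_000_000)
list_time = timeit.timeit("[1, 2, 3]", number=1_000_000)
print(f"tuple literal: {tuple_time:.3f}s, list literal: {list_time:.3f}s")
```

The restricted feature (no mutation) is what hands the implementation room to do less work.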

https://medium.com/@simplyianm/why-gos-structs-are-superior-to-class-based-inheritance-b661ba897c67
Skip to ‘the fragile base class problem’.  I think this explanation is good.

`Writing your own programming language is pointless.  There are so many out there, and they all fail.`
That may be true, but as long as you put most of your effort into established projects like LLVM, Stoke, Klee, and the like, your work won't be wasted.

https://mortoray.com/2018/08/07/sadly-i-must-say-goodbye-to-leaf-my-programming-language/
This guy spent years of his life on this language called leaf, and is now abandoning it.  You can learn from his mistakes.

https://www.ponylang.org/discover/#what-is-pony
A language called Pony.  “In Pony, performance is the most important thing besides correctness.”
https://news.ycombinator.com/item?id=9482483
Rust’s creator doesn’t like it.

https://www.reddit.com/r/rust/comments/7qels2/i_wonder_why_graydon_hoare_the_author_of_rust/
This is from the original creator of rust.  He burned out for a while.

`There are so many different things you want to do.  How are you supposed to do all of this stuff?`
For now, it’s just about exploring.  You won’t be able to do all of this stuff.  But you do want to explore it, and that’s what matters.

Also, to reiterate, this is not a general purpose language.  Since it has fewer concerns than C/C++, it can be faster, and with fewer bugs.

`What exactly do all the code correctness projects do for you?  Things like klee?`
Besides making it easier to debug, it would allow your language to be based on much less safe constructs, which means they can be faster.  In other languages, they're afraid to implement 'unsafe' constructs because in the past they would lead to very difficult bugs.  But now you can have these unsafe constructs and be comfortable in the fact that they will be caught at compile time.  The cloud will also allow you to do more powerful automated testing.  Your general idea is that the power of the cloud unlocks a whole host of new opportunities for simplifying and improving the user experience with compiled languages.

Tier 3———————————————————————————

Google 'skiplang'.  It's a programming language made by Facebook to 'memoize stuff in the cache' or whatever.  Pretty much built just to go faster or something.

Creator of Apache Mesos (often used with Spark): "We wanted people to be able to program for the data center just like they program for their laptop"

https://mesosphere.com/blog/docker-vs-kubernetes-vs-apache-mesos/
Ok, wtf.  Why is mesos being compared with docker and kubernetes?  They're similar?  I didn't know that.

So now you're back to the whole distributed systems thing, except this time you're thinking 'I'll contribute to an existing tool if possible, rather than writing my own language'.  That's all well and good, but what if it so happens that the language itself plays a huge part in how jobs get distributed?  If that's the case, maybe you could ask on Reddit 'what features of Python are absolutely necessary?' and then create a distributed compiler based on only the features that are needed, not the ones that are just cool.  The way code describes different jobs is definitely a big part of this.  I don't think you should design your own language if you can help it, but creating a tool for a 'subset' of a language is never helpful.  Nobody wants to use a 'subset' of a language.  They want to use their whole language, or if that's not possible, just use some other language.  But having to remember 'oh, I forget I can't use feature X Y Z here' is just too much for a developer to deal with.  Perhaps Scala can help you out here.  Yes, it's based on the JVM, but I think there are open source implementations of the JVM.  What big companies support the open source portion of the JVM?  Who runs it?  Is it well run?  Damn, now you might have to choose between supporting whatever JVM and LLVM.  If it already works out fine, you could just skip to making some graphics libraries, or finding some substitute for Klee, or whatever.

https://www.embeddedrelated.com/showarticle/195.php
Gah, this makes FPGAs sound cool.  And also simpler than what we currently have with dedicated whatever.  Hm...what is easier?  Programming an FPGA, or accepting CPUs and GPUs and working from those?  Remember that you want _simple_ abstraction.  What is easier to do a simple abstraction for?  It doesn't matter how smart you are, you need to be able to make something that other people are able to use.  People who aren't hardware experts.  Which of these things will allow you to abstract it, for yourself, and for users?  Remember that the basic problem of distributed systems is "given this code, how do you pull it apart into independent jobs?"  That's easy to abstract.  You don't need to know anything about hardware.  But what about in a heterogeneous system?  "Given this code, how do you pull it apart into jobs such that each job goes to the piece of hardware that can do it best?"

Another important question about FPGAs that you don't want to think about because it's complicated: Can an FPGA reprogram itself fast enough to tailor itself to each program on the fly?  If I have program X, and it does a lot of multiplications and a lot of memory lookups, can I run a program that, before anything starts executing, it first tells the FPGA how to configure itself for max performance?  Could you write a compiler that tells the FPGA how to configure itself?  What would it even mean to translate your program into machine code for an FPGA?  Most people are thinking 'configure the FPGA, then translate the program into something that will run on the FPGA'.  But if there's no set 'language' of the FPGA, that's a relaxation of a constraint that your code translation must adhere to.  If there's fewer constraints, it's easier to translate your code, and it's easier to optimize your code.

WAIT.  If an FPGA is configurable, does that mean it has no set instruction set?  Which means you could basically define your own instruction set?  Which means you wouldn't have to go through some shitty proprietary API like Cuda in order to get maximum performance?  You wouldn't need to figure out how to use x86 intrinsics or blah blah blah.  None of that.  All software defined.

Maybe instead of trying to predict where hardware is going to go, you try and figure out the parts that will work no matter which direction hardware goes.  Hedge your bets.  Like Jeff Bezos.  And MapReduce, and Spark.  But this thing will have to be more general than MapReduce or Spark.  It can't just be 'big data'.  It has to be any type of code.  Yeah, there's no way you can predict where hardware will go and tailor a solution toward that type of hardware.  You just need to figure out how to easily distribute jobs.  Then you need to figure out how you can take processor A and processor B, and ask them 'what are you good at?'  The processor will respond with its specs or something, and then you can distribute jobs based on that.  So leave that part up to the hardware people.  Is this 'serverless' stuff?  As in you just run your program without caring about how many nodes you have?
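A minimal sketch of the "ask each processor what it's good at, then hand out jobs" idea.  The device names, the throughput numbers, and the job list are all made up.  The scheduler is a plain greedy heuristic that ignores job dependencies and data transfer cost, so treat it as a starting point rather than how any real system works.

```python
# Hypothetical self-reported specs: relative throughput per kind of work.
DEVICES = {
    "cpu0": {"branchy": 10, "matrix": 2},
    "gpu0": {"branchy": 1, "matrix": 50},
}

# (job name, kind of work, amount of work)
JOBS = [
    ("parse_input", "branchy", 20),
    ("matmul_a", "matrix", 500),
    ("matmul_b", "matrix", 500),
    ("write_report", "branchy", 10),
]

def schedule(jobs, devices):
    """Greedy: give each job to the device that would finish it earliest."""
    busy_until = {name: 0.0 for name in devices}
    plan = {}
    for job, kind, work in jobs:
        def finish_time(dev):
            # Time the device frees up plus the time this job would take on it.
            return busy_until[dev] + work / devices[dev][kind]
        best = min(devices, key=finish_time)
        busy_until[best] = finish_time(best)
        plan[job] = best
    return plan, busy_until

plan, busy_until = schedule(JOBS, DEVICES)
print(plan)
print(busy_until)
```

Nothing in `schedule` knows what a GPU is.  It only sees the numbers each device reports, which is the part you would leave up to the hardware people.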

Ok, I think I have a good idea of how to tell if something is 'low level' or not.  You have a person run through your code.  If the person can perform the code one step at a time by hand (as is the case for most of python), that's significantly 'high level'.  If there's a bunch of lines that the person will just skip over (like malloc and free for example) then that's lower level.  You want to think about the physical machine you are using as little as possible.  I really feel like Python hit the nail on the head. If you ever think 'maybe a little more other stuff wouldn't be so bad....' just program some more in C.  You'll be over it. 

Julia uses LLVM.  What if you took Julia and just modified it however you wanted?  Then if it's better, you tell everyone that it's better, and if not, you'll know why Julia is the way it is.  Like why it uses the fucking end keyword.

Google 'what are the difficulties in writing an automatically parallelizing compiler?'  I think that should give you a good idea of what's going on.

Ok, you keep going back and forth on this topic but:  Speed vs simplicity; what are you holding constant, what are you trying to make better?  In your head, you have this dumb feeling of 'python with C-like speed.  Not almost C.  C speed'.  Well, that's not going to happen.  So what are you going to do?
Mostly unrelated thought.  You have this itch.  You say 'I want to go fast', but is that really what you mean?  Do you actually mean 'I want to go optimal'?  LLVM takes an intermediate code representation as its input.  This means all your thoughts on 'what should this language look like?' and 'what is the best way to optimize code?'  are more decoupled than you're thinking.  Let's say you had a python-like language that got compiled through some part of LLVM.  Given this python-like language, it was able to utilize the CPU and GPU and whatever else to its fullest potential.  I would be satisfied by that.  It's not as fast as C, but it's close.  I think I would be satisfied by this.  You want a single point of reference, LLVM IR, where you can say 'if I translate my language into LLVM IR code, and run the most basic optimization option, I can rest assured knowing that all known code speed up techniques are being adequately utilized'.  Perhaps really all you want is a unified suite of compiler technologies; not duct-taped together, but as one seamless whole.  Seamless.  Like.... well shit, like MacOS, for lack of a better comparison.  A unification of all these tools that won't make your head spin.  A debugger that's easy.  A compiler that prints helpful error messages.  Assurance that you're using all the compute power you have.  That there isn't some switch that you forgot to flip that would result in a 1000x speed up, but oops, you already released a completed product and now you'll have to rejigger everything because you didn't take this one switch flip into account.  You want assurance that you're using this tool to its full abilities, and that this tool is using its hardware to its full abilities.  I feel like figuring out LLVM would scratch most of that itch.
You want perfect inference.  What is inference?  Given a certain amount of information, the inferred information is all the information you can perfectly deduce out of the pieces that you're given.  Again, super semantic analysis.  Again, I don't know what I'm talking about.  But if you figure out a way to either describe or modify the LLVM compiler suite to make it simple (and give you that peace of mind), then I think your itch will be scratched.
The itch that you need scratched is 'is this compiler doing everything it possibly can?'
To do everything it possibly can, it needs to utilize all of the information it is given, and all the information it can infer from its given information.  All of its information isn't just the code its given.  It's the hardware that it's running on.  It's the input that it will eventually receive.  Peace of mind.  That's what you want.  

Look at the VHDL wikipedia page.  It 'borrows heavily from the Ada programming language'.  It's a hardware description language.  I think this and Verilog are the languages that are 'perfectly parallelizable' or something, because they describe hardware.

I think more than anything else, a language needs support in order to be good.  Python has a ton of support.  How do you generate support?  You need hype.  How do you generate hype?  Make something really flashy with your programming language.  All these other languages try to generate support first.  But that requires supporters to take a big leap of faith.  If you prove that your language can do really cool stuff, then supporters don't need to have faith that your language will do cool stuff, because it already does cool stuff.

Did you know that Spark is a language?  It's not some tool for blah blah blah.  It's a language.  And Scala is supposed to interface with it somehow?  Also, if you look at the wikipedia article for Automated theorem proving, there's a link to Spark.  Why?  wait... nevermind.  This is an entirely different thing from Apache Spark.  They have nothing to do with each other.  On Ada Spark's wikipedia entry, it says something about Ada having 'unrestricted parallel tasking' which is potentially bad.  What does that mean?

Another simple question, probably asked before: If safe means slow, and unsafe means fast, could integrating an automatic theorem prover like Klee alleviate that unsafeness?  Could you get unsafe performance without too much of the annoying debugging?

How does multithreading work if you call a library that multithreads, then you also call your own multithreading stuff?  Then that could be too many jobs.  Why split at the library level when you can split at a higher level?  Would it matter?  Multithreading is a DAG, right?  Some jobs run in parallel, some jobs depend on others.  Does it matter if you split or not?
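The DAG picture can be made concrete with the standard library's `graphlib` (Python 3.9+).  This is a sketch with a made-up job graph: it groups jobs into "waves", where everything in a wave has all of its dependencies finished and could run at the same time.  Whether a job came from a library or from your own code doesn't show up in the graph at all; only the dependencies do.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of jobs it depends on.
DEPENDS_ON = {
    "load": set(),
    "clean": {"load"},
    "stats": {"clean"},
    "plot": {"clean"},
    "report": {"stats", "plot"},
}

def parallel_waves(graph):
    """Group jobs into waves; every job in a wave can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves

print(parallel_waves(DEPENDS_ON))
```

The "too many jobs" worry then becomes a question about how wide a wave is compared to the number of cores you actually have.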

https://arxiv.org/pdf/1809.02161.pdf
A paper that you definitely need to read.
There's a few referenced papers that talk about how compiler research isn't relevant, and how it's just not worth doing, etc etc.  Definitely read those.
Maybe making all code faster isn't that great.  But maintaining the speed of code while making it less verbose is great.  Maybe these studies and statistics are looking at the wrong thing?  Or maybe it's because you just haven't read what they're talking about very closely.

https://en.wikipedia.org/wiki/Python_(programming_language)
Look at ‘unsupported implementations’.  It’s just too hard to make python fast.  There have been so many attempts.


If you look at polly, you’ll see that it ‘finds’ things in the LLVM IR code.  You need a thing that is able to comb through a lot of code, and infer all information that is possible from the code.  Then you can apply it to different stuff like Stoke and Polly and Klee.  What matters is that all the information is present.  Semantic analysis in overdrive.
For some reason, I feel like this could help you with the ASIC/FPGA thing.  Perhaps attempting to solve that more general problem would actually be easier than attempting to solve the GPU problem.
Again about FPGAs.  It's kind of weird to think about.  If a compiler translates human readable code into machine code, and an FPGA basically has programmable machine code, what exactly are you translating to?  How is it decided what instructional pathways or whatever go into an FPGA?  If you can make something that will optimize for any instruction set, then I think it would be easy to optimize for any specific instruction set.  Have a set of rules to follow, rather than guessing through the solution each time, right?  But what do you know about hardware?  Nothin.... maybe to learn you could play that Shenzhen I/O game.  Haha.  But for real.

What if you thought of ‘your’ language as a ‘subset of C’?  As in, you make a language that by design can only do a subset of the things that C can do.  Then, anything you write in this language gets translated to C.  I mean a subset of C in the sense that there’s pretty much a 1-to-1 correspondence with C.  So anything you write in this language will be guaranteed as fast as C.
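A toy version of the "subset that maps 1-to-1 onto C" idea, using Python's own `ast` module as the front end.  The supported subset here (one function, one `return`, integer `+`, `-`, `*`) is an arbitrary choice for the sketch, and `function_to_c` is a made-up name.  One caveat to keep in mind: Python ints never overflow and C ints do, so a real design would have to pick one set of semantics.

```python
import ast

# Operators that mean the same thing in this subset and in C.
C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

def expr_to_c(node):
    """Translate a restricted arithmetic expression into C source text."""
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        return str(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in C_OPS:
        left, right = expr_to_c(node.left), expr_to_c(node.right)
        return f"({left} {C_OPS[type(node.op)]} {right})"
    raise ValueError(f"outside the supported subset: {node!r}")

def function_to_c(source):
    """Translate one `def f(args): return <expr>` into a C function on ints."""
    func = ast.parse(source).body[0]
    if (not isinstance(func, ast.FunctionDef) or len(func.body) != 1
            or not isinstance(func.body[0], ast.Return)):
        raise ValueError("expected a single function with a single return")
    params = ", ".join(f"int {a.arg}" for a in func.args.args)
    body = expr_to_c(func.body[0].value)
    return f"int {func.name}({params}) {{ return {body}; }}"

print(function_to_c("def axpy(a, x, y):\n    return a * x + y"))
```

Anything outside the subset is rejected up front with a `ValueError`, so the user finds out at translation time rather than by getting slow code.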
I get the feeling that ‘inference’ is a very important thing in making programming languages simpler.  If you can infer certain information with 100% certainty, that means that the user will also have inferred that same information.  So having the user type out that information is just annoying.  What exactly is inference?  What is a more formal definition in terms of programming languages?  What can it be extended to outside of types?  Remember that inference in your Stanford class used formal proofs.  Klee uses formal proofs too.
Given a program, what can be stripped out entirely while still maintaining the same amount of informational accuracy?


Max runtime speed and min compile time.  What do you do when you have 2 competing optimization problems?  I guess one thing you could do would be to reformulate the problem, if possible.  Something like ‘maximize language simplicity.’  I feel like that’s actually a better goal than ‘maximize code speed’, because code speed can branch off in many different paths, but it’s pretty easy to tell when a language is simple or not.  I feel like you said literally the exact opposite of this somewhere else.  Or perhaps you could consider ‘language simplicity’ to already be solved.  I mean, Python and pseudocode look really, really similar most of the time.  So, given this set of very simple building blocks, how do you solve the problem of translating it to machine code that is as efficient as possible?

Julia already uses LLVM.  And a language that doesn’t use LLVM now can still adopt it in the future.  C++ didn’t use LLVM before.  It does now.  For Clang, at least.

You know, you’ve been thinking about Julia and Go a lot.  But you know what you’ve kind of not been thinking of?  Python.  You need to read a lot more on why Numpy, Scipy, Pypy, and Numba are or aren’t good enough.
https://discourse.julialang.org/t/julia-motivation-why-werent-numpy-scipy-numba-good-enough/2236
Honestly, why wipe the slate clean?  If you actually look at basic Julia and Python notation, the actual syntax is the same.  Why start over?  How does that help?  You can learn a lot about how to help through the failures of these libraries.  Wait.  These libraries haven’t failed.  I know for a fact numpy and scipy are going strong.  I don’t know about Numba.  Maybe look at why Cython and those others failed.  You know you want to do stuff with Klee and Stoke and whatever.  But who are you going to do it for?
https://github.com/dropbox/pyston
This sounds a lot like what you want to do.  Python.  LLVM.  Open Source.  Sponsored by a Fortune 500 Company, Dropbox.  Failed.  Guido was at Dropbox….
https://www.theregister.co.uk/2018/07/13/python_creator_guido_van_rossum_quits/
Fuck.
Also, look at what goes into a PEP.  There’s a lot of thought put into these.
One of the complaints about Pyston was that Python is ‘just too dynamic’ of a language.  Hm….
http://blog.kevmod.com/2017/02/personal-thoughts-about-pystons-outcome/
Look at the below comment:
‘I have the strong feeling they didn’t spend a lot of time investigating Pypy. It feels like they read a few things about it and called it a day.
This whole work scream the typical attitude of geeks that want to play at creating something, and rationalize it after the fact. I understand that. I do it all the time. But let’s not lie.’
This is totally you.  You need to be very careful.  Learn about all of the failed attempts.  This is a scary thought, though.  You could devote years of your life to a project that might yield absolutely nothing.  How do you avoid this?  By contributing to an already existing project.  A project like LLVM.  Everyone uses it.  You know what you work on will get used.  Things like ‘automatic parallelization’ and blah blah have been tried before.  With our current tools, it just doesn’t work.  You need to look at the cutting edge.  New tools that haven’t been used before.  Tools like Klee, Polly, and Stoke.  Doing things the same way they’ve always been done is not how new, better things are made.
Also, you need constant sanity checks.  Everything you’re thinking about, you need to talk to other people about it.  Pretty much no one does that.  Well, Linus Torvalds did it.  Look at how successful Linux is.  Just talk to people, online and in person.


What if you could have ‘interactive’ Klee?  Something like in a Jupyter notebook?  Take a single function, like this:
def mean(x):
    return sum(x) / len(x)
You could then run Klee on this function, and Klee would determine when the function is properly defined.  For mean, it would spit out:
“Mean will work properly if:
X is a list
X has at least 1 element
The values of X are numeric (ints, floats, doubles, etc)”
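
A hand-written sketch of what that report could look like, in Python.  This is not Klee output (Klee itself works on LLVM bitcode, and finds conditions symbolically); the helper name `mean_precondition_violations` is made up for illustration, and the three conditions are just the ones listed above, written out as runtime checks.

```python
from numbers import Real


def mean(x):
    return sum(x) / len(x)


def mean_precondition_violations(x):
    """Return the listed conditions that x fails (empty list means mean(x) is safe)."""
    problems = []
    if not isinstance(x, list):
        problems.append("x is not a list")
        return problems  # the remaining checks only make sense for a list
    if len(x) < 1:
        problems.append("x has no elements")
    if not all(isinstance(v, Real) for v in x):
        problems.append("x contains non-numeric values")
    return problems
```

Calling `mean_precondition_violations([])` reports the empty-list case that would otherwise surface as a ZeroDivisionError inside `mean`.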

Completely different thing:  there are ‘automatic theorem provers’ for mathematics.  I don’t know how far along they are.  But what if you just made a program that automatically generated queries, or theorems, and ran them through the automatic theorem prover?  I feel like you’d get a bunch of garbage, probably, but perhaps it’s possible you could get something new.


You’re nervous that anything you do will be for nothing.  That your efforts will be wasted.  If you document everything you do very, very clearly, your efforts will not be wasted.  At the absolute minimum, you will have a paper showing the current problems that you are facing, and why you aren’t able to overcome them right now.  Then either you or someone else can keep going from there.  But the most important thing is to very clearly document; to close the distance for other people to continue the fight.  Also, you’ll know what tools you can use to make code really, really fast.

https://en.wikipedia.org/wiki/Superoptimization
Check out these other superoptimizers like Stoke.  One of them, called ‘Souper’, also uses LLVM.

I feel like much of the reason why code is bad is because of legacy.  It gets written, it’s bad, but it works, and now people use it.  Breaking it means breaking all of your users’ code.  You don’t want that.  It’s a lot of politics.  Changing it is high risk, low (very intangible) reward.  How do you make it easy to change your codebase without breaking everyone’s code?  I would say perhaps make everyone use Docker.  You absolutely need to make it so that your codebase is always changeable.  Right now, if you want to make a tool that everyone can use, you need to get it right on the first try.  If you don’t, best case scenario people use it, and then you can’t change it, then it gets abandoned.  Worst case, no one uses it, and it gets abandoned immediately.  Actually, maybe those cases should be reversed.


Strange thing. It's been said that it's more important to be able to ask the right questions than to give the right answers. That's kind of true in convex optimization. The hard part isn't solving these problems. That's mechanical. The hard part is setting them up. A kind of related idea is that of Prolog, and of logic programming languages in general.

What if 'type inference' could be extended to other things?  Say we had a function 'searchForX', which takes in an array of numbers, searches for X, and returns its index.  Now, let's say the compiler can somehow figure out 'searchForX just wants me to return the index of X.  It doesn't matter how I do it.'  Now, let's say that the array is in random order.  In that case, you would just iterate through the array.  O(n) time.  But let's say that on some calls of the function, the compiler knows that the array is sorted.  Then, rather than just searching linearly, it could use a function it made called 'binarySearchForX'.  So it used the additional information that it knew about the parameter being passed in to decide what function to use.  So even though the compiler is using a different function, it's able to do so because it has proven that on that specific input, 'searchForX' and 'binarySearchForX' are equivalent.  It's all about proving equivalency under the specific conditions of the problem at hand.
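
A minimal Python sketch of the searchForX idea, done by hand.  The `known_sorted` flag is an assumption standing in for a fact the compiler would have proven about a particular call site; no real compiler is involved here, and the function names are just the ones from the note in Python style.

```python
from bisect import bisect_left


def linear_search(values, x):
    """O(n): correct on any list. Returns the first index of x, or -1."""
    for i, v in enumerate(values):
        if v == x:
            return i
    return -1


def binary_search(values, x):
    """O(log n): only correct when values is sorted ascending."""
    i = bisect_left(values, x)  # leftmost position where x could be inserted
    if i < len(values) and values[i] == x:
        return i
    return -1


def search_for_x(values, x, known_sorted=False):
    # known_sorted is a hypothetical stand-in for "the compiler proved
    # this argument is sorted", which makes the two searches equivalent.
    if known_sorted:
        return binary_search(values, x)
    return linear_search(values, x)
```

Both versions return the leftmost matching index, so on sorted input they agree even when there are duplicates.  That agreement is the equivalence the compiler would have to prove.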
Here's a difficult question: look at that one graph question you did on hackerrank.  Remember how you had all that code that was taking the difference of sets, and how it could all be replaced by that one line?  Could a compiler figure that out?  I highly doubt it.  Maybe Stoke could figure it out if it was lucky.
What if you used Stoke to find new algorithms?

Here's a question that you've thought of but never explicitly stated:  What is the difference between the search space of the high level user code, and the generated machine code?  When should the structure of user level code be taken into account when trying to optimize?  When should it not?  If the compiler changes code, what is the difference between changing it at the high user level, and the low machine level?

https://www.quora.com/What-are-the-horrors-of-Java

Say you were trying to compile this program using type inference:

def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)
    
If the compiler decided that n was a 

https://discourse.julialang.org/t/notes-on-the-julia-compiler-jit-vs-static/4275
I don't know what to believe anymore.  I don't know anything.

http://beza1e1.tuxen.de/articles/faster_than_C.html
This tells you how to beat C.  It talks about template metaprogramming and JIT compilation.
http://beza1e1.tuxen.de/articles/proglang_mistakes.html
Same person, talking about common mistakes in designing a programming language.

https://julialang.org/blog/2017/12/ml&pl
'Any sufficiently complicated machine learning system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of a programming language.'  Basically explains that TensorFlow is its own language, etc etc.  Also explains that with current languages, machine learning isn't nearly as optimal as it could be.  Talks about array-like type systems.  Also talks about 'end to end compiler stacks for deep learning'.

http://fsl.cs.illinois.edu/images/e/ef/P157-landin.pdf
'The next 700 programming languages'.


What is the difference between actually understanding something vs memorizing it?  Consider regular expressions.  Most people don’t know how they are actually implemented in code.  But I bet there are people who will use regex their whole lives and understand it intuitively without knowing how it works under the hood.  What is the difference between this and someone using PyTorch to do machine learning stuff, and having no idea what they’re doing?  Come to think of it, you don’t really know how a computer really adds 2 numbers together, do you?  But why don’t you feel the need to scratch that itch?  It’s because you know how to do addition by hand.  If asked to, you could match a string to a regular expression in your head using a known procedure.  If I see a \*, I can match as many as I want.  If I see a |, I match one or the other.  But machine learning is complex.  If asked to do machine learning by hand, there is no way in hell you could do most of the coding stuff by hand.  So you don’t need to know how a computer actually does things.  All you need to know is how a human could do them by hand.  Then leave it up to the compiler writer to actually understand how a computer does it.  For all the nay-sayers, do you know how a calculator puts together numbers?  Do you fault every small business owner, accountant, and scientist who uses a calculator without knowing how it works?
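
The ‘by hand’ procedure for regex is small enough to write down.  A toy sketch in Python, assuming a made-up mini-syntax with only literal characters, `c*` (any number of c), and a top-level `|`; it is not how real regex engines are implemented, and the function names are invented for illustration.

```python
def match_here(pattern, text):
    """True if pattern (literals and c* only) matches all of text."""
    if not pattern:
        return not text
    if len(pattern) >= 2 and pattern[1] == "*":
        # c*: try consuming zero copies of c, then one, then two, ...
        c, rest = pattern[0], pattern[2:]
        i = 0
        while True:
            if match_here(rest, text[i:]):
                return True
            if i < len(text) and text[i] == c:
                i += 1
            else:
                return False
    # plain literal: the next character has to be the same
    if text and pattern[0] == text[0]:
        return match_here(pattern[1:], text[1:])
    return False


def full_match(pattern, text):
    # a|b: the text has to match one alternative or the other
    return any(match_here(alt, text) for alt in pattern.split("|"))
```

For example, `full_match("ab*c", "abbbc")` and `full_match("cat|dog", "dog")` both succeed, following exactly the two rules in the note.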

https://en.wikipedia.org/wiki/History_of_Python
Very interesting.  Explains stuff like the GIL (Python was originally designed for a weird operating system called Amoeba).

https://www.reddit.com/r/rust/comments/55k577/rust_compilation_times_compared_to_c_d_go_pascal/
This seems to assert that Rust can be compiled very, very fast.

How to decide on software tools:  whichever is most popular.  If a google search doesn’t help, see if you can collect analytics yourself.  How many contributors does the project have?  How many pull requests?  How compatible is it with other stuff?  The compatibility thing can be found on wikipedia through charts.

Metaprogramming and template metaprogramming.  “Moves computations from runtime to compile time”.  Also look at ‘convention over configuration’ on wikipedia.  Linked to on the metaprogramming page.

Reflection:  another interesting concept.  But looking at it, it makes me think that this is the prototypical example of a feature that your language should not implement.  These types of features (whose commonalities I’m having a hard time describing) are things that increase the linguistic power of the language, not the computational power.

I know this is supposed to be about automatic parallelization, but I feel like a really easy way to implement locking would be to just do something like this:

	Regular code

	LOCK:
		Synced code
		Synced code
		Synced code

	Regular code

This is better than lock.acquire() and lock.release() because you never have to search for where the lock is actually happening.  The indentation makes it obvious.  Of course, in C you could just do lock.acquire and lock.release, then indent every line between them since C doesn’t care about whitespace.  But nobody does that because it would look weird and generate wtfs from readers.
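
Python already has roughly this shape: a `threading.Lock` used in a `with` block is acquired on entry and released on exit, so the indentation shows exactly what is synced.  A small sketch; the counter and the `add_many` worker are made up for the example.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(n):
    global counter
    for _ in range(n):
        # regular code could go here, outside the lock
        with lock:
            # synced code: everything indented under `with` holds the lock
            counter += 1
        # the lock is released here, even if the block raised an exception


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After all four threads finish, `counter` is 40000, because the increment never runs in two threads at once.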

Spark is basically for Scala.  You should look at Scala.

https://stackoverflow.com/questions/6319086/are-gcc-and-clang-parsers-really-handwritten?noredirect=1&lq=1
Even C is ambiguous.

https://softwareengineering.stackexchange.com/questions/23718/whats-the-most-used-programming-language-in-high-performance-computing-and-why
Found when googling ‘best programming language for high performance computing’.  Makes me think that a runtime optimizer is definitely important.  Is it possible if you’re not using an interpreter/virtual machine?

https://lmax-exchange.github.io/disruptor/
Paul says this is a really good white paper on ‘disruptors’, whatever those are.  Something to do with inter-process communication or something.  He says disruptors have been around for a while, but this is just a good white paper on them.

https://www.reddit.com/r/programming/comments/6z6fgz/how_did_python_become_a_data_science_powerhouse/

‘A performance comparison of container-based technologies for the Cloud’.  
https://www.sciencedirect.com/science/article/pii/S0167739X16303041

Wait...I remember a GoDoc saying something like it doesn’t allow implicit type coercion.  But then you have stuff like 120 * time.Minute.  Isn’t that coercion?

http://llvm.org/
See the OpenMP thing?  What does the word ‘runtime’ even mean?  Also if you look at the Go faq they use the word 'runtime' too, and even explain that their definition is different from other people's definition.

http://www.zverovich.net/2016/05/13/giving-up-on-julia.html
People say ‘oh, I don’t like Julia because it’s not as fast as they advertise’, but just look at how this person is benchmarking Julia.  Hello World?  Come on.  Of course Julia is going to be slow for tiny insignificant programs like this since it’s JIT compiled.

Ok, so Julia is compiled just-in-time.
https://agilescientific.com/blog/2014/9/4/julia-in-a-nutshell.html
Also, notice that they say something like ‘metaprogramming can make Julia faster than Fortran’.  Faster to code, or faster to execute?  I’m pretty sure they mean faster to code, but would it have an effect on runtime performance?

Rust used OCaml to implement its first compiler, then it was bootstrapped.

Professor Aiken said Stoke only does work on basic blocks of assembly code.  Why not higher level as well?  Yeah, the search space is larger (or maybe it's equivalent), but maybe from this angle you could find more 'islands' of viable solutions.

No Country For Old Men:  ‘You pick the one right tool for the job.’  Or something to that effect.

Google 'undefined behavior' in C.  It's not laziness.  I think it's actually an important part of what makes C fast.

https://www.reddit.com/r/rust/comments/6g8v6p/seer_symbolic_execution_engine_for_rust/
Forget the actual question, just read some of the comments.  This is the same thing as Klee.  There’s even a comment about ‘super compilation’.

Types are fundamental.  A class is a user defined type.  A type specifies the set of values of a variable, and the operations you can perform on it.  The user should know exactly what they can do with a piece of data at any time, and that means static typing.  Compile time errors are easier to catch than runtime errors.

Professor Aiken said in Static Vs Dynamic Typing Part 1 that ‘there are a lot of fancy new type systems that haven’t been implemented yet.’  What are they?  What exactly is he talking about?

If everyone spoke English and used the English alphabet, would ASCII be ok for everything?  Could we constrain chars to just 8 bits?

Could you potentially use your language to mine crypto?  Haha, just joking, that would be stupid.  Everyone's using crazy ASICs to mine crypto now.  But what if it makes you a gorillionaire?  Haha jk.  But srsly.

Linear algebra is a kind of explicit parallelization (or more accurately, vectorization), but unlike other kinds of parallelization, this one helps programmers because it actually makes things conceptually simpler.  With that in mind, you should make matrix operations a fundamental part of the language.  Not function calls like numpy, but using operators.
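
For what it’s worth, Python has had a dedicated matrix multiplication operator, `@`, since version 3.5 (PEP 465), and any class can opt in by defining `__matmul__`.  A pure-Python sketch of matrices as operator-driven values; the `Matrix` class here is a toy made up for illustration, not numpy’s implementation, and it does no shape checking.

```python
class Matrix:
    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # A @ B: each entry is a row of A dotted with a column of B
        cols = list(zip(*other.rows))
        return Matrix([[sum(a * b for a, b in zip(row, col)) for col in cols]
                       for row in self.rows])

    def __add__(self, other):
        # A + B: elementwise sum
        return Matrix([[a + b for a, b in zip(r1, r2)]
                       for r1, r2 in zip(self.rows, other.rows)])

    def __eq__(self, other):
        return isinstance(other, Matrix) and self.rows == other.rows

    def __repr__(self):
        return f"Matrix({self.rows})"


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[0, 1], [1, 0]])
C = A @ B + A  # reads like the math; @ binds tighter than +
```

The last line is the point: the expression looks like the formula on paper, with no nested function calls.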

Check out the Go testing/benchmark library.  Interesting concept.

DE Shaw's specialized hardware.

Wikipedia ‘array programming’, ‘programming paradigms', 'vector programming'.

Declarative languages like Prolog.  Or perhaps it's something you want to avoid.  You want a language that feels natural.

The Golang regex doc contains a link comparing Go’s implementation to Perl, Python, etc.  Worth looking at.

The cathedral and the bazaar.

Runtime optimizations.

Syntactic sugar:  pretty, also restricting.  If it restricts what you can write, it’s a smaller domain space, and therefore easier to optimize.  Then again, users might get confused.  “This syntactically sugared expression should be identical to this unsugared expression, but for some reason it runs faster.  What gives?”  But maybe you’re conceptualizing it wrong.  If the sugared expression really is identical, then the compiler should see no difference, right?

Geoff really appreciates that React deprecates things.  They are willing to get rid of bad stuff so they don't get crushed under their own weight.  So you shouldn't be afraid to deprecate either.

Why does constexpr exist in C++?  If we have a factorial function f(n), why do we need to say constexpr f(n) in order for C++ to evaluate it at compile time?  If the compiler sees an f(5) somewhere in the program, it could easily just try to execute that function now.  If it works out, great, if it doesn’t, oh well, at least you’ve partially evaluated it.  I mean, there are certain implications.  Like what if the user wants to time the function?  Then it takes a long time to compile, and when they go to test the runtime it comes back as constant time.  That’s not exactly an accurate measurement.  Still, it seems like a really big waste to not do that automatically.  Perhaps another argument in favor of compilation + runtime together.
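
A toy version of ‘just try to execute it now’, sketched in Python with the standard `ast` module (needs Python 3.9+ for `ast.unparse`).  It assumes you already have a whitelist of functions known to be pure; the `PURE_FUNCTIONS` table, the `FoldPureCalls` class, and `fold` are all made-up names, and this has nothing to do with how a C++ compiler handles constexpr.

```python
import ast


def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)


# Assumption: someone has already decided these functions have no side effects.
PURE_FUNCTIONS = {"factorial": factorial}


class FoldPureCalls(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)  # fold inner calls first
        if (isinstance(node.func, ast.Name)
                and node.func.id in PURE_FUNCTIONS
                and not node.keywords
                and all(isinstance(a, ast.Constant) for a in node.args)):
            try:
                value = PURE_FUNCTIONS[node.func.id](*(a.value for a in node.args))
            except Exception:
                return node  # it didn't work out; leave the call for run time
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def fold(source):
    """Replace calls to known-pure functions on constant args with their result."""
    tree = FoldPureCalls().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

`fold("y = factorial(5) + factorial(n)")` gives back `y = 120 + factorial(n)`: the call with a known argument is evaluated ahead of time and the other one is left alone.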

Every dynamic programming algorithm has a recursive equivalent, right?  Does that mean you could auto-dynamize recursive functions?
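
Going in the recursive-to-DP direction, the usual manual trick is memoization: cache results by argument, and a pure recursive function with overlapping subproblems turns into top-down dynamic programming.  A sketch with `functools.lru_cache`; the `calls` counter is only there to show how much work is saved, and the ‘auto’ part (a compiler deciding to do this on its own) is still the open question.

```python
from functools import lru_cache

calls = 0  # counts how many times the function body actually runs


@lru_cache(maxsize=None)
def fib(n):
    global calls
    calls += 1
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


result = fib(30)
```

With the cache, the body runs once for each n from 0 to 30, so 31 times in total, instead of over a million times for the plain recursion.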

Whitespace is good not just because it’s cleaner, but because it frees up other punctuation.  Think about all the different punctuation, and how it’s wasted.  \#, \$, \{\}, etc.  Why do we use \# in front of an include?  What is the point?  I feel like all these symbols could be put to better use.  Like for operations.  Just things you do all the time.  I feel like angle brackets really have a lot of potential.  =>, <>, ->, >>.  Lots of good stuff there.  Not so sure about pound sign.  Seriously, go nuts.  'Operators' are just fancy function calls.  That's literally it.  And why are & and | saved for bitwise operations?  Who does that anymore?  I feel like it's just a leftover from the 60s-90s days when twiddling bits made things faster.  I wouldn't be so quick to replace it though.  It's not like basic math is the only thing you can do with bitwise operations.  There might be some common application out there that you're forgetting.  Why waste these symbols on includes, macros, and other stuff that we rarely even use?  We should be using these for common operations.  Like isn't >> reading from a file or something in C++?  Also don't forget that ** is the cool new power operator now.  Can't forget that one.  But what about the 'and/or' vs '&&\||' debate?  'and/or' feel more natural to type.  With '&&\||' you have to hold the shift key down, then reach above the qwerty row.  With 'and/or' it's much more natural.  Which makes me think that maybe the other operators are also awkward?  Hm.... \* doesn't feel unnatural.  Or maybe it's just that writing out 'mul' would feel really really unnatural.  Yes, you can easily overload all this stuff with macros.  But remember that there should be 1 obvious way to do something.  Make the decision for them.

Remember that crow comic 'Go get is just a fancy git clone'.  Silly, but remember that pip is also a thing.  So is npm.  It seems like all the most popular languages have their own package managers.  Hm....

Why doesn't Java require the use of makefiles?  Or maybe it does and you just never realized it.  But how did they do that?  It's so much easier.  Why didn't Go do that?

Remember what you said about the cloud and how hardware would become more and more specialized?  How can you optimize for hardware that hasn’t come out yet?  Cluster computers are constantly changing.  How can you make something that will be able to adapt to these changes?

If you ever forget why you’re doing this, remember that Jay let his comp run overnight to get pictures of fly neurons.

SQL and all its implementations.  How will these be used in conjunction with this language?

Hacker News article about thread 'nurseries'.

Type inference.

Read “Go’s Declaration Syntax”. Has a good explanation of why C pointers are how they are, and why Go is different.  Maybe instead of ‘pointers’ you could call them ‘boxes’, since you ‘open’ them to get the actual data.  Or maybe that’s too close to packages.  Maybe ‘portal’ is a better word?  But if you’re a scientist, do you even need pointers?  Actually, you know the MapReduce model takes in functions as arguments.  So scientists need functions as args.  I think that counts as needing pointers.

Garbage collection and its alternatives.  Specifically, I think you should go for optional garbage collection.  This way, you can write libraries and manually manage memory while also letting your users have garbage collected automatically.  Call malloc and free 'save' and 'trash' instead.  Forget convention.  This language is for scientists, not computer scientists.  Remember that numpy for Python was written in C, so Python programmers get all the benefits of manual memory management.

Containerization.

Machine learning.  It's a bag of tricks.  Rather than trying to figure out which tricks are the best, just try all the tricks and see which ones give the best results.

Data pipelines.  How do users scrub data, put it into a recognizable format, feed it into a program?

IEEE, Posix, and potentially other standards you will have to comply with.

Licensing, such as GPL vs BSD.

Explicit documentation.  Document everything.  If it's not documented, it doesn't exist.  Users will get frustrated and quit.  Look at the java docs.  Those are good.  Remember at Scale there were 2 documents:  the README, and the wiki.  Nobody ever wanted to change the README because you had to go through bureaucracy to change it.  It had to be spotless before it could be added.  It was much easier to change the wiki.  You need to make documentation easy to change.  You should want to update it.  Maybe you could have some webpages that are 'set in stone' and some that are less so.  Then let users know that there's a difference.  Maybe that sounds like a 'this is right' and 'this is wrong'.  Hm...how do you make people understand the difference?

Golang seems to be of particular interest these days.  A new language that is actually popular and compiled.  If you google 'most popular programming languages on github', Go is on there.  It's pretty low, but it beat out C, Swift, and Scala.  Only ones it didn't beat out were the big hitters.  JavaScript, Python, Java, C++, etc.  Python is number 2 by the way.  I wonder how it did that?  If you look at 'the zen of python' I think the most important one is 'There should be one-- and preferably only one --obvious way to do it.'  That is why everyone loves Python.  The creator, Guido van Rossum, is the ‘benevolent dictator for life’.  I would assume this meant he had the last say on all design decisions.  How did one guy's decisions lead to such unanimous satisfaction with the language?  Wouldn't different people have different preferences?  Nope, it turns out everyone prefers simplicity.  Back to Go.  Another interesting thing is that unlike Python, C++, JavaScript, etc, by the time Go came around there were already tools that did the same thing.  It's a systems programming language.  How did they break into the 'market' of systems programming when it's already so saturated?  Well, maybe not saturated.  Dominated by C.

https://jguegant.github.io/blogs/tech/meta-crush-saga.html
This guy Jguegant made a ‘compile time game’.  He says ‘most of the computations are done during the compiler phase.'  Interesting.  How does this work?  Supposedly it’s very computationally efficient.  But also your compiler is running over and over again.  How could that possibly be efficient? 

Gofmt is an interesting simple tool.  Makes me think about how tools aren’t really a standard thing.  There isn’t just a compiler, and a makefile, or an interpreter.  There isn’t just a java environment or whatever.  It’s whatever you want it to be.  I feel like tools can definitely be simpler, more helpful.  Remember that lisp debugger thing that let you rewind code?  That was pretty interesting.  Why don’t other languages do that?

That startup ‘Big Stream’ was doing a whole bunch of big data compiler stuff.

Rust seems interesting.  Was voted ‘most loved programming language’ on Stack Overflow in 2016, 2017, and 2018.  Also has an interesting not-garbage-collector thing.  I think I’ve mentioned this before, but Bigstream was modifying compilers to make them better for big data or something.

There are so many kinds of parallelization. SIMD and MIMD and bit level and instruction level and task level.  Multi-core and symmetric and distributed.  So so many different techniques that no one uses because it takes too long to implement.

Why do we need virtual addressing?  Why can't the compiler use the physical addresses?  Wouldn't that be faster?  There's probably something you're missing here.

Something on parallelism:  If we have a task that we can make parallel, the best way to do it is to parallelize at the ‘highest’ possible level that we can.  Let’s say we have a big list of matrices.  Each line in our list is a row of matrices.  Let’s say each row is 100 matrices, and there are 10000 rows.  You want to multiply all 100 matrices together for each row, so your task is to spit out a list of 10000 matrices.  We’ll also say you have 4 cores to do this with.  You could split it up in a number of ways.  You could have each core multiply 25 matrices together for each row, then do 3 more multiplications to combine the 4 partial products into the final product.  This involves 3 splits for each row, so 30000 splits, and 3 merges for each row, so 30000 merges.  Or you could just split the rows into 4 chunks and have each core work on 2500 rows.  This is just 3 splits, and 3 merges.  Much better.  So when jobs >> workers, splitting up your workload at the highest level possible is probably the best idea.  But what about if we have the same number of workers as jobs, or workers >> jobs?  The current ‘fastest’ supercomputer in the world is Sunway TaihuLight, which has 10,649,600 total cores, or workers.  Then how would you split up the work?
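
A sketch of the ‘split at the highest level’ version in Python: cut the list of rows into one contiguous chunk per worker, let each worker multiply out its own rows, then concatenate.  All the function names are made up.  It uses threads from `concurrent.futures` only to show the shape of the decomposition; in CPython the GIL means threads won’t actually speed up pure-Python arithmetic, so real speedup would need processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def matmul(a, b):
    """Plain nested-list matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def multiply_row(row):
    """Multiply all matrices in one row together, left to right."""
    return reduce(matmul, row)


def process_chunk(rows):
    return [multiply_row(row) for row in rows]


def split_into_chunks(rows, workers):
    """One chunk per worker: contiguous and nearly equal in size."""
    size, extra = divmod(len(rows), workers)
    chunks, start = [], 0
    for w in range(workers):
        end = start + size + (1 if w < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def multiply_all(rows, workers=4):
    chunks = split_into_chunks(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_chunk, chunks))  # map keeps chunk order
    # the only merge step: concatenate the per-worker results
    return [m for chunk in results for m in chunk]
```

The splitting and merging happen once for the whole job, not once per row, which is the difference between the 3 splits and the 30000 splits above.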

Another separate thought on parallel processing:  splitting up tasks takes time, and you want all of your parallel processes to finish at the same time, so maybe you should give your earlier processes a little more work, since they start earlier.  Would that make much of a difference?  Is that part of load balancing?

If Polly makes C++ perform 100x better on big data sets, how come we’re not hearing about it?  How come libraries like numpy aren’t suddenly 100x faster?  Isn’t Python compiled into C++?  Maybe it’s a separate project because it’s so hard to merge with the regular optimizations present in the normal optimizer.  What if you made a compiler thing that only worked on TPUs?  Since this thing works on CPUs, perhaps its optimizations clash with the more classical optimizations?

You want ‘automatic’ parallelization.  What is automatic parallelization?  If I use numpy, is that ‘automatic’?  Why is numpy a library and not a built in thing?  I think ‘automatic’ parallelization means it’s parallelized without you having to use any libraries or fancy classes.  Just primitives.  Maybe classes too.  But still not sure.  The program must be written in a completely sequential manner.  Then the compiler automatically makes it parallel.  What does it mean for the program to be ‘sequential’?  Are matrices a special exception?  They are a concept that allows us to ‘parallelize’ something in our mind.  While it is explicit parallelization, it is simpler for us to conceptualize it in the form of a matrix than 1 by 1.  So matrices should be allowed to be in your code.  Things like writing #pragma omp parallel or calling go func() are definitely explicit parallelism that you shouldn’t need.  So you should compare based only on being able to use matrices, and perhaps setting compiler options.  If the user has to think 'I'm going to do it this way because I know the compiler will parallelize it', then you're doing your language wrong.  They shouldn't have to think about that.

Why is it that, when you have a compiled language, the options for changing certain behaviors are always hidden behind some menu or as a compilation tag or switch?  Switches for turning on/off garbage collection, or static type checking, or automatic formatting, or whatever environment variables.  Why not just put them in your main file?  Have a file called start/main where all your configurations go.  Now you don't need to remember any terminal commands or a ton of different files.  It's all right there, in the most obvious place.  You might think 'noooo I don't want to copy and paste my config every time I make a new project.'  Well, you don't have to.  The compiler should be able to function in the normal way, or in this much easier way as well.  Now you don't need to know what a .bashrc vs a .bash_profile is (which by the way behave differently on MacOS vs Linux).  That isn't to say that you want a bunch of conflicting compiler options.  All you want is 1 cohesive compiler option.  The MacOS of compilers.  You don't want users messing with flags and stuff.  They should basically be coding in Python.

When doing machine learning, why are we the ones separating pictures into learn and test sets?   I feel like it would be easier if you just give the program all the pictures and labels, then the machine decides what goes into the learning and testing sets.  The idea would be that you want the highest possible accuracy with the smallest possible learning set.  This seems like it would take exponentially more time, though.  But you want your algorithm to learn the ‘essence’ of whatever you give it.  The smaller the learning set, the more you have condensed the essence of the thing it’s trying to learn.  This might be wrong, but right now I think machine learning algorithms just divide the pictures randomly.  I don't think choosing randomly is a good idea.  There's obviously some arrangement of pictures into training / test sets that would result in the optimal way to do things in the wild.  Oh, here's an application of making your training sets smaller: you can retrain much faster.  It would be a way to filter out the useless / bad data.
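
For reference, the random baseline being questioned is only a few lines.  A sketch in plain Python; the function name and the `test_fraction` and `seed` parameters are assumptions made for the example, not any particular library’s API.  The ‘smarter’ version would replace the shuffle with some search over which items go where.

```python
import random


def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items, then cut it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = random_split(range(100))
```

Every item lands in exactly one of the two sets; nothing about the item itself influences which one.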

Haskell and Erlang.  What are they?  What is functional programming?  It sounds like they just took regular programming and removed all the conveniences.

If Stoke can easily optimize anything, does that mean you could easily create a graphics library?  If you can optimize general code for a GPU, doesn't that mean it would be really easy to optimize graphical code for a GPU?
