Introduce SemiScript. #11

Open
wants to merge 3 commits into from

6 participants

Eloy Durán Thomas Fuchs showell Laurent Sansonetti Ninh Bui James Tucker
Eloy Durán
alloy commented

After much discussion, it became clear that to support this library far into the future, we would need to abstract the requirements into its own language, which can be compiled down to different languages.

Currently the compiler generates valid code for JavaScript and CoffeeScript, but could easily be extended to support additional languages.

The compiler has been written in C for optimal ROFLScale performance.

Related #8.

Eloy Durán alloy Introduce SemiScript.
* Allows us to support popular languages, now, and far into the future.
* Translates to JavaScript and CoffeeScript, but is modular enough to
  easily be extended to support other languages.
f9a916a
Eloy Durán
alloy commented

Btw, the option parsing has been implemented very naively on purpose; I think it can provide us with quite a few hours of bike-shedding in the future.

Thomas Fuchs
Owner

+1

showell

Great idea, and nice work.

It should be easy to extend this concept to other languages. Sorry, this is just a sketch (untested):

    if (
        (strcmp(argv[1], "-ada") == 0) ||
        (strcmp(argv[1], "-algol") == 0) ||
        (strcmp(argv[1], "-c") == 0) ||
        (strcmp(argv[1], "-cpp") == 0) ||
        (strcmp(argv[1], "-java") == 0) ||
        (strcmp(argv[1], "-js") == 0) ||
        (strcmp(argv[1], "-pascal") == 0) ||
        (strcmp(argv[1], "-perl") == 0) ||
        (strcmp(argv[1], "-php") == 0)
    ) {
         semicolon = 1;
    }
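
A compilable version of the sketch above might look like the following. It assumes the caller has already checked that `argc == 3` before reading `argv[1]` (the sketch does not). The helper name `is_semicolon_language` is hypothetical. It keeps the same linear `strcmp` scan, but drives it from a table so that adding a language is a one-line change.

```c
#include <stddef.h>
#include <string.h>

/* Flags for target languages whose statements end in semicolons.
   Mirrors the list in the sketch; any other flag (e.g. "-cs") is
   treated as a language without semicolons. */
static const char *const semicolon_flags[] = {
    "-ada", "-algol", "-c", "-cpp", "-java",
    "-js", "-pascal", "-perl", "-php",
};

/* Hypothetical helper: returns 1 if `flag` names a semicolon
   language, 0 otherwise, using one strcmp per table entry. */
int is_semicolon_language(const char *flag)
{
    size_t n = sizeof semicolon_flags / sizeof semicolon_flags[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(flag, semicolon_flags[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

In `main`, after the argument count check, this would be used as `semicolon = is_semicolon_language(argv[1]);`.
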
Laurent Sansonetti
lrz commented

Well, multiple strcmp calls are going to impact the promised ROFLScale performance. It would be more efficient to use a perfect hash table here (one can be generated using gperf(1)). I would certainly throw a few Belgian dollars into a Kickstarter project that promised to contribute such a change.
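
To illustrate what a perfect hash buys here, below is a hand-written sketch of the kind of lookup gperf produces for these nine flags. The hash (key length plus an offset chosen by the second character) and the slot order are copied from the generated `compiler.c` in this pull request; the names `flag_hash`, `flag_table` and `is_semicolon_flag` are hypothetical. Because no two flags share a slot, a lookup costs one hash computation and at most one `strcmp`.

```c
#include <stddef.h>
#include <string.h>

/* Offsets read off asso_values in the generated compiler.c:
   'p' -> 0, 'a' -> 5, 'j' -> 5, 'c' -> 10, everything else -> 15. */
static unsigned int flag_hash(const char *str, size_t len)
{
    switch ((unsigned char)str[1]) {
    case 'p': return (unsigned int)len + 0;
    case 'a':
    case 'j': return (unsigned int)len + 5;
    case 'c': return (unsigned int)len + 10;
    default:  return 15; /* always above the largest slot index */
    }
}

/* Slot table indexed by hash value; "" marks an unused slot. */
static const char *const flag_table[15] = {
    "", "", "", "", "-php", "-perl", "", "-pascal",
    "-js", "-ada", "-java", "-algol", "-c", "", "-cpp",
};

/* Hypothetical helper: 1 if `str` is one of the nine flags, else 0. */
int is_semicolon_flag(const char *str)
{
    size_t len = strlen(str);
    /* Same bounds as MIN_WORD_LENGTH and MAX_WORD_LENGTH; this also
       guarantees str[1] exists before flag_hash reads it. */
    if (len < 2 || len > 7) {
        return 0;
    }
    unsigned int key = flag_hash(str, len);
    if (key > 14) { /* MAX_HASH_VALUE */
        return 0;
    }
    /* Only the string stored in this slot can possibly match. */
    return strcmp(str, flag_table[key]) == 0;
}
```

For example, `"-js"` has length 3 and second character `'j'`, so it hashes to 3 + 5 = 8, which is exactly the slot holding `"-js"`.
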

Laurent Sansonetti lrz commented on the diff
compiler.c
((34 lines not shown))
  34 + // CoffeeScript translation.
  35 + }
  36 + }
  37 + break;
  38 +
  39 + // Pass-through new-lines, unless in a comment.
  40 + case '\n':
  41 + if (!comment) {
  42 + putc('\n', stdout);
  43 + }
  44 + comment = 0;
  45 + break;
  46 +
  47 + // Comment, ignore the rest of the line.
  48 + case '#':
  49 + comment = 1;
Laurent Sansonetti
lrz added a note

That's probably not going to work if the sharp character is used in a string (or properly escaped), right? It might be a better idea to tokenize the file contents first. As performance is an issue here I recommend using the Amazon Mechanical Turk.

Ninh Bui
prototype added a note

Indeed, it's not going to work, because it only considers one character at a time (with too little state information) and therefore doesn't parse the grammar for comments properly. I'd normally recommend a more elaborate finite state automaton for the latter, but crowdsourcing is definitely more cloud 3.0. Beware of using jQuery files, though: the excessive use of dollar signs might lead the tokenizer into thinking you are part of a billion-dollar photo filter company, which could drive up the price per token parsed.
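
The finite state automaton mentioned above can be sketched in a few dozen lines of C. This is a minimal illustration that assumes double-quoted strings with backslash escapes, which SemiScript does not currently have; the names `strip_comments` and `lex_state` are hypothetical. Unlike the loop in `compiler.c`, it keeps the newline that ends a comment.

```c
#include <stddef.h>

/* States of a small finite state automaton for '#' line comments. */
enum lex_state { LEX_CODE, LEX_STRING, LEX_STRING_ESCAPE, LEX_COMMENT };

/* Copies `in` to `out`, dropping '#' line comments while leaving a '#'
   inside a double-quoted string alone. `out` needs room for
   strlen(in) + 1 bytes. Returns the number of bytes written,
   excluding the terminating NUL. */
size_t strip_comments(const char *in, char *out)
{
    enum lex_state state = LEX_CODE;
    size_t n = 0;

    for (const char *p = in; *p != '\0'; p++) {
        char c = *p;
        switch (state) {
        case LEX_CODE:
            if (c == '#') {
                state = LEX_COMMENT;
                continue;             /* drop the '#' itself */
            }
            if (c == '"') {
                state = LEX_STRING;
            }
            break;
        case LEX_STRING:
            if (c == '\\') {
                state = LEX_STRING_ESCAPE;
            } else if (c == '"') {
                state = LEX_CODE;
            }
            break;
        case LEX_STRING_ESCAPE:
            state = LEX_STRING;       /* escaped character is kept as-is */
            break;
        case LEX_COMMENT:
            if (c != '\n') {
                continue;             /* drop comment text */
            }
            state = LEX_CODE;         /* newline ends the comment, kept */
            break;
        }
        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

The extra state is what the single `comment` flag in `compiler.c` lacks: a `'#'` only starts a comment when the automaton is in `LEX_CODE`.
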

Laurent Sansonetti
lrz added a note

Hmm, you make a valid point, @prototype. It would probably be safer to perform a DSE optimization pass (dollar sign elimination) on the JS code before submitting it to Amazon. I believe a paper about this technique was recently published in the Journal of Machine Learning Research.

I would recommend writing the optimization pass using LLVM, since it provides all the necessary foundations.

Eloy Durán
alloy added a note

Guys, I suggest you create a fork if you want to add support for strings, or anything other than a semicolon or sharp character. As it is, there is no support for them, and as such your code would be at fault.

Laurent Sansonetti
lrz added a note

So this is how you thank people working on your code? This is open source, we have the right to open meaningful discussions about any project. I can see that you are trying to create a culture of exclusion here. I do hope that you will properly apologize for this atrocious behavior.

Eloy Durán
alloy added a note

I will do no such thing. Open source means that I get to dictate what you will be using, not some democratic model from your hippy lala-world.

Laurent Sansonetti
lrz added a note

I vigorously protest against this dictatorial behavior. You should be open to our concerns day and night. I am going to write a blog post about this story and submit it to hacker news.

That said, you are contradicting yourself. In your previous comment you proudly state that the code only supports semicolon or sharp characters, yet you are deliberately checking for \n. I suspect you have bigger plans for the code and have already started working on our suggestions, with the goal of delivering a proprietary product and selling it to Facebook.

Eloy Durán
alloy commented

@showell I like the suggestion, but @lrz raises a valid point. Any takers?

Ninh Bui

I do have some ethical concerns, though, about introducing semicolons back into our JavaScript. Do we really want to be responsible for a new generation of emo developers?

emo()

Laurent Sansonetti
lrz commented

Is that valid JavaScript code? I believe that emo is a reserved keyword in the grammar.

Eloy Durán
alloy commented

It would definitely be cool to be able to have Turing-complete code like that in SemiScript.

showell

@lrz I would commit some of my own Belgian money to this project, but I'm afraid Belgium stopped printing its own currency in 1995, and that was AGES ago.

Also, "emo" is only supported as a keyword in certain browser implementations of what we colloquially call "JavaScript." It is not part of the ECMAScript standard. As far as I know, the committee hasn't even reviewed the draft proposal.

James Tucker
raggi commented

Don't you have to wrap all this in a (function(){;})(); ??? ECOMMON

Eloy Durán
alloy commented

@showell & @lrz Done.

showell

@alloy By my calculation, your use of gperf introduces at least 256 bytes of memory overhead into the solution for a minimal speed enhancement. You can look at asso_values to see what I mean. In addition, each word in wordlist necessitates a 4-byte pointer due to the extra level of indirection, not to mention an extra 30 bytes for the six empty strings necessary to make the overly clever hash function work.

Having said that, I do think it's a promising patch. Next time, don't even bother submitting compiler.gperf. The gperf transcompiler generates ugly, non-idiomatic C code. I'd rather just work with the C directly and avoid the extra build steps. I see that you automated it to some degree in the Makefile, but what happens if I forget to run the makefile? Talk about a debugging nightmare.

Also, what if I mistype "pascal" in the "%%" section? At that point, I'm gonna have to drop down into a C debugger to figure out what's broken. So I might as well have just learned C.

Sorry to be so harsh. I totally understand if you want to use gperf for your own projects, and think it's cool, but I don't want to have to learn it myself. I'm perfectly happy with C. I don't even want to hear about gperf.
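
The byte counts in the comment above can be checked with `sizeof`. The declarations below only reproduce the element types and counts of the two tables in the generated `compiler.c` (256 `unsigned char` entries in `asso_values`, 15 `const char *` slots in `wordlist`, six of them empty); the helper function names are hypothetical. Note that the pointer size is platform-dependent: typically 4 bytes on 32-bit targets, as assumed in the comment, and 8 bytes on 64-bit ones.

```c
#include <stddef.h>

/* Same element types and counts as the tables in compiler.c;
   the contents do not affect their size. */
static unsigned char asso_values[256];
static const char *wordlist[15];

/* Hash lookup table: 256 entries of 1 byte each. */
size_t asso_values_bytes(void) { return sizeof asso_values; }

/* Pointer table: 15 slots of sizeof(char *) bytes each. */
size_t wordlist_bytes(void) { return sizeof wordlist; }

/* Six empty slots, each costing one pointer plus the one-byte ""
   literal it points to (a compiler may merge identical literals). */
size_t empty_slot_bytes(void) { return 6 * (sizeof(char *) + 1); }
```

With 4-byte pointers, `empty_slot_bytes()` comes to 6 × 5 = 30 bytes; on a typical 64-bit target it is 54.
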


Showing 3 unique commits by 1 author.

Apr 15, 2012
Eloy Durán alloy Introduce SemiScript.
* Allows us to support popular languages, now, and far into the future.
* Translates to JavaScript and CoffeeScript, but is modular enough to
  easily be extended to support other languages.
f9a916a
Apr 16, 2012
Eloy Durán alloy Support more languages as per @showell's suggestion, with gperf for speed as per @lrz's suggestion.
966e5ea
Eloy Durán alloy Print an error when a syntax error occurs. f3b36c3
4 .gitignore
... ... @@ -0,0 +1,4 @@
  1 +*.o
  2 +.*.sw?
  3 +semiscriptc
  4 +distbuild
12 Makefile
... ... @@ -0,0 +1,12 @@
  1 +semiscriptc: compiler.c
  2 + gcc compiler.c -o semiscriptc
  3 +
  4 +compiler.c: compiler.gperf
  5 + gperf compiler.gperf > compiler.c
  6 +
  7 +distbuild: semiscriptc
  8 + mkdir -p distbuild/JS && ./semiscriptc -js semicolon.semi > distbuild/JS/semicolon.js
  9 + mkdir -p distbuild/CoffeeScript && ./semiscriptc -cs semicolon.semi > distbuild/CoffeeScript/semicolon.coffee
  10 +
  11 +clean:
  12 + rm -rf compiler.c semiscriptc *.o distbuild
46 README.md
@@ -3,6 +3,49 @@
3 3 Semicolon.js is a much more secure, stable and reliable alternative to
4 4 <a href="http://vaporjs.com/">Vapor.js</a>.
5 5
  6 +
  7 +## Build:
  8 +
  9 +After much discussion, it became clear that to support this library far into the
  10 +future, we would need to abstract the requirements into its own language, which
  11 +can be compiled down to different languages.
  12 +
  13 +Currently the compiler generates valid code for JavaScript and CoffeeScript, but
  14 +could easily be extended to support additional languages.
  15 +
  16 +The compiler can be built with:
  17 +
  18 +```
  19 +make
  20 +```
  21 +
  22 +It produces the `semiscriptc` tool, which can then be used to compile SemiScript
  23 +source. These source files typically have the `.semi` extension.
  24 +
  25 +JavaScript:
  26 +
  27 +```
  28 +./semiscriptc -js semicolon.semi
  29 +;
  30 +```
  31 +
  32 +CoffeeScript:
  33 +
  34 +```
  35 +./semiscriptc -cs semicolon.semi
  36 +
  37 +```
  38 +
  39 +Beatiful.
  40 +
  41 +Finally, a convenience task is available to produce output for both JavaScript
  42 +_and_ CoffeeScript:
  43 +
  44 +```
  45 +make distbuild
  46 +```
  47 +
  48 +
6 49 ## Usage:
7 50 ```html
8 51 <script src="semicolon.js"></script>
@@ -20,6 +63,7 @@ Thanks to @alloy for pointing out the inherent code security and
20 63 interoperability problems with Vapor.js; and suggesting to
21 64 leverage the semicolon solution to address the underlying issues.
22 65
  66 +
23 67 ### License
24 68
25 69 (c) Copyright 2012 Thomas Fuchs
@@ -35,4 +79,4 @@ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
35 79 GNU General Public License for more details.
36 80
37 81 You should have received a copy of the GNU General Public License
38   -along with this program. If not, see <http://www.gnu.org/licenses/>.
  82 +along with this program. If not, see <http://www.gnu.org/licenses/>.
196 compiler.c
... ... @@ -0,0 +1,196 @@
  1 +/* C code produced by gperf version 3.0.3 */
  2 +/* Command-line: gperf compiler.gperf */
  3 +/* Computed positions: -k'2' */
  4 +
  5 +#if !((' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \
  6 + && ('%' == 37) && ('&' == 38) && ('\'' == 39) && ('(' == 40) \
  7 + && (')' == 41) && ('*' == 42) && ('+' == 43) && (',' == 44) \
  8 + && ('-' == 45) && ('.' == 46) && ('/' == 47) && ('0' == 48) \
  9 + && ('1' == 49) && ('2' == 50) && ('3' == 51) && ('4' == 52) \
  10 + && ('5' == 53) && ('6' == 54) && ('7' == 55) && ('8' == 56) \
  11 + && ('9' == 57) && (':' == 58) && (';' == 59) && ('<' == 60) \
  12 + && ('=' == 61) && ('>' == 62) && ('?' == 63) && ('A' == 65) \
  13 + && ('B' == 66) && ('C' == 67) && ('D' == 68) && ('E' == 69) \
  14 + && ('F' == 70) && ('G' == 71) && ('H' == 72) && ('I' == 73) \
  15 + && ('J' == 74) && ('K' == 75) && ('L' == 76) && ('M' == 77) \
  16 + && ('N' == 78) && ('O' == 79) && ('P' == 80) && ('Q' == 81) \
  17 + && ('R' == 82) && ('S' == 83) && ('T' == 84) && ('U' == 85) \
  18 + && ('V' == 86) && ('W' == 87) && ('X' == 88) && ('Y' == 89) \
  19 + && ('Z' == 90) && ('[' == 91) && ('\\' == 92) && (']' == 93) \
  20 + && ('^' == 94) && ('_' == 95) && ('a' == 97) && ('b' == 98) \
  21 + && ('c' == 99) && ('d' == 100) && ('e' == 101) && ('f' == 102) \
  22 + && ('g' == 103) && ('h' == 104) && ('i' == 105) && ('j' == 106) \
  23 + && ('k' == 107) && ('l' == 108) && ('m' == 109) && ('n' == 110) \
  24 + && ('o' == 111) && ('p' == 112) && ('q' == 113) && ('r' == 114) \
  25 + && ('s' == 115) && ('t' == 116) && ('u' == 117) && ('v' == 118) \
  26 + && ('w' == 119) && ('x' == 120) && ('y' == 121) && ('z' == 122) \
  27 + && ('{' == 123) && ('|' == 124) && ('}' == 125) && ('~' == 126))
  28 +/* The character set is not based on ISO-646. */
  29 +#error "gperf generated tables don't work with this execution character set. Please report a bug to <bug-gnu-gperf@gnu.org>."
  30 +#endif
  31 +
  32 +#line 1 "compiler.gperf"
  33 + /* -*- C -*- */
  34 +#include <stdio.h>
  35 +#include <string.h>
  36 +
  37 +#define TOTAL_KEYWORDS 9
  38 +#define MIN_WORD_LENGTH 2
  39 +#define MAX_WORD_LENGTH 7
  40 +#define MIN_HASH_VALUE 4
  41 +#define MAX_HASH_VALUE 14
  42 +/* maximum key range = 11, duplicates = 0 */
  43 +
  44 +#ifdef __GNUC__
  45 +__inline
  46 +#else
  47 +#ifdef __cplusplus
  48 +inline
  49 +#endif
  50 +#endif
  51 +static unsigned int
  52 +hash (str, len)
  53 + register const char *str;
  54 + register unsigned int len;
  55 +{
  56 + static unsigned char asso_values[] =
  57 + {
  58 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  59 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  60 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  61 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  62 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  63 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  64 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  65 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  66 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  67 + 15, 15, 15, 15, 15, 15, 15, 5, 15, 10,
  68 + 15, 15, 15, 15, 15, 15, 5, 15, 15, 15,
  69 + 15, 15, 0, 15, 15, 15, 15, 15, 15, 15,
  70 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  71 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  72 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  73 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  74 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  75 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  76 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  77 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  78 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  79 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  80 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  81 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  82 + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
  83 + 15, 15, 15, 15, 15, 15
  84 + };
  85 + return len + asso_values[(unsigned char)str[1]];
  86 +}
  87 +
  88 +#ifdef __GNUC__
  89 +__inline
  90 +#ifdef __GNUC_STDC_INLINE__
  91 +__attribute__ ((__gnu_inline__))
  92 +#endif
  93 +#endif
  94 +const char *
  95 +in_word_set (str, len)
  96 + register const char *str;
  97 + register unsigned int len;
  98 +{
  99 + static const char * wordlist[] =
  100 + {
  101 + "", "", "", "",
  102 + "-php",
  103 + "-perl",
  104 + "",
  105 + "-pascal",
  106 + "-js",
  107 + "-ada",
  108 + "-java",
  109 + "-algol",
  110 + "-c",
  111 + "",
  112 + "-cpp"
  113 + };
  114 +
  115 + if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
  116 + {
  117 + register int key = hash (str, len);
  118 +
  119 + if (key <= MAX_HASH_VALUE && key >= 0)
  120 + {
  121 + register const char *s = wordlist[key];
  122 +
  123 + if (*str == *s && !strcmp (str + 1, s + 1))
  124 + return s;
  125 + }
  126 + }
  127 + return 0;
  128 +}
  129 +#line 16 "compiler.gperf"
  130 +
  131 +
  132 +int main (int argc, char *argv[])
  133 +{
  134 + if (argc != 3) {
  135 + fprintf(stderr, "Usage: %s [-ada|-algol|-c|-cpp|-cs|-java|-js|-pascal|-perl|-php|-ruby] FILE\n", argv[0]);
  136 + return 1;
  137 + }
  138 +
  139 + unsigned char semicolon = 0;
  140 + if (in_word_set(argv[1], strlen(argv[1]))) {
  141 + semicolon = 1;
  142 + }
  143 +
  144 + char *source_file = argv[2];
  145 +
  146 + int row = 0, column = 0;
  147 + unsigned char comment = 0;
  148 +
  149 + FILE *file = fopen(source_file, "r");
  150 + if (file != NULL) {
  151 + int c; /* int, not char, so EOF is distinguishable from byte 0xFF */
  152 + while ((c = fgetc(file)) != EOF) {
  153 + switch (c) {
  154 + // This is where the language translation magic happens.
  155 + case ';':
  156 + if (!comment) {
  157 + if (semicolon) {
  158 + // With semicolon translation.
  159 + putc(';', stdout);
  160 + } else {
  161 + // Without semicolon translation.
  162 + }
  163 + }
  164 + column++;
  165 + break;
  166 +
  167 + // Pass-through new-lines, unless in a comment.
  168 + case '\n':
  169 + if (!comment) {
  170 + putc('\n', stdout);
  171 + }
  172 + comment = 0;
  173 + row++;
  174 + column = 0;
  175 + break;
  176 +
  177 + // Comment, ignore the rest of the line.
  178 + case '#':
  179 + comment = 1;
  180 + break;
  181 +
  182 + default:
  183 + if (!comment) {
  184 + fprintf(stderr, "Syntax error at row `%d' column `%d'.\n", row, column);
  185 + return 1;
  186 + }
  187 + }
  188 + }
  189 + fclose(file);
  190 + } else {
  191 + fprintf(stderr, "Unable to read semiscript source file `%s'.\n", source_file);
  192 + return 1;
  193 + }
  194 +
  195 + return 0;
  196 +}
82 compiler.gperf
... ... @@ -0,0 +1,82 @@
  1 +%{ /* -*- C -*- */
  2 +#include <stdio.h>
  3 +#include <string.h>
  4 +%}
  5 +
  6 +%%
  7 +-ada
  8 +-algol
  9 +-c
  10 +-cpp
  11 +-java
  12 +-js
  13 +-pascal
  14 +-perl
  15 +-php
  16 +%%
  17 +
  18 +int main (int argc, char *argv[])
  19 +{
  20 + if (argc != 3) {
  21 + fprintf(stderr, "Usage: %s [-ada|-algol|-c|-cpp|-cs|-java|-js|-pascal|-perl|-php|-ruby] FILE\n", argv[0]);
  22 + return 1;
  23 + }
  24 +
  25 + unsigned char semicolon = 0;
  26 + if (in_word_set(argv[1], strlen(argv[1]))) {
  27 + semicolon = 1;
  28 + }
  29 +
  30 + char *source_file = argv[2];
  31 +
  32 + int row = 0, column = 0;
  33 + unsigned char comment = 0;
  34 +
  35 + FILE *file = fopen(source_file, "r");
  36 + if (file != NULL) {
  37 + int c; /* int, not char, so EOF is distinguishable from byte 0xFF */
  38 + while ((c = fgetc(file)) != EOF) {
  39 + switch (c) {
  40 + // This is where the language translation magic happens.
  41 + case ';':
  42 + if (!comment) {
  43 + if (semicolon) {
  44 + // With semicolon translation.
  45 + putc(';', stdout);
  46 + } else {
  47 + // Without semicolon translation.
  48 + }
  49 + }
  50 + column++;
  51 + break;
  52 +
  53 + // Pass-through new-lines, unless in a comment.
  54 + case '\n':
  55 + if (!comment) {
  56 + putc('\n', stdout);
  57 + }
  58 + comment = 0;
  59 + row++;
  60 + column = 0;
  61 + break;
  62 +
  63 + // Comment, ignore the rest of the line.
  64 + case '#':
  65 + comment = 1;
  66 + break;
  67 +
  68 + default:
  69 + if (!comment) {
  70 + fprintf(stderr, "Syntax error at row `%d' column `%d'.\n", row, column);
  71 + return 1;
  72 + }
  73 + }
  74 + }
  75 + fclose(file);
  76 + } else {
  77 + fprintf(stderr, "Unable to read semiscript source file `%s'.\n", source_file);
  78 + return 1;
  79 + }
  80 +
  81 + return 0;
  82 +}
1  semicolon.js
... ... @@ -1 +0,0 @@
1   -;
2  semicolon.semi
... ... @@ -0,0 +1,2 @@
  1 +# This source file is available under the GPL license.
  2 +;
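
Because `main` reads from a `FILE`, its translation rules are hard to exercise in isolation. Below is a sketch of the same rules applied to an in-memory string, under the assumption that moving the loop into its own function is acceptable; the name `semi_translate` and its error-reporting convention are hypothetical. Given the contents of `semicolon.semi`, it produces `;` followed by a newline for a semicolon language and a lone newline otherwise, matching the README's JavaScript and CoffeeScript examples.

```c
#include <stddef.h>

/* Same rules as the loop in main(), but over a string.
   Returns 0 on success, or -1 on a syntax error, with *row and
   *column set to the position where it stopped.
   `out` needs room for strlen(src) + 1 bytes. */
int semi_translate(const char *src, int semicolon, char *out,
                   int *row, int *column)
{
    int comment = 0;
    size_t n = 0;
    *row = 0;
    *column = 0;

    for (const char *p = src; *p != '\0'; p++) {
        switch (*p) {
        case ';':
            if (!comment && semicolon) {
                out[n++] = ';';   /* semicolon languages keep it */
            }
            (*column)++;
            break;
        case '\n':
            if (!comment) {
                out[n++] = '\n';  /* newlines pass through outside comments */
            }
            comment = 0;
            (*row)++;
            *column = 0;
            break;
        case '#':
            comment = 1;          /* ignore the rest of the line */
            break;
        default:
            if (!comment) {
                out[n] = '\0';
                return -1;        /* anything else is a syntax error */
            }
        }
    }
    out[n] = '\0';
    return 0;
}
```

A test can then feed `semicolon.semi` as a string literal and compare `out` directly, without touching the file system.
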
