Refine definition of KiCad #3743

Merged 5 commits on Aug 8, 2017

Conversation

Alhadis
Collaborator

@Alhadis Alhadis commented Jul 28, 2017

What the site currently classifies as KiCad is, according to their documentation, three entirely different formats:

Technical specifications of each format are covered in a PDF file, downloadable from KiCad's website.

I collected a fresh harvest of both sch and brd files, with the breakdown graphs included in each silo:

Notes

  • KiCad formats are data formats, not programming languages. I'm not sure why @pchaigno chose the latter in "KiCad language with .sch extension" (#2309) / "*.sch is both Eagle and KiCad schematics file" (#2187), but these formats are essentially just lists of coordinates, property lists, and object descriptions. XML has more in common with programming languages than these do.

  • A considerable number of sch search results were Scheme files. I ran an additional search to gauge the extension's usage better, and concluded that it's common enough to include it as a recognised Scheme extension. There were numerous Racket files as well, but I don't know the difference between Scheme and Racket, so I left the latter as-is.

  • Sample files were sourced from faffing around with vanilla KiCad and gEDA-gaf installations, and me clicking anything that looked like a drawing tool. Needless to say, I couldn't find any samples released under a clear permissive license, so I hacked together my own.

  • Scheme sample sboyer.sch released to the public domain (source confirmed here).

  • Obligatory sidenote: The harvester.js snippet I wrote to collect public search results has been moved to a Gist. Everything else has been torched and salted, with local copies eradicated by rm -rfP. I've never felt such palpable disgust over sloppy code. NFI what I was thinking or doing when I wrote that bullcrap.

    If you want to calculate a summary of unique repositories, you can use the standard Unix toolchain:

# List unique repositories (sort -u also drops non-adjacent duplicates, which plain uniq would miss)
grep < urls.log -iEoe '^https?://raw\.githubusercontent\.com/([^/]+/){2}' |\
sort -u | sed -Ee 's,raw\.(github)usercontent,\1,i' > unique-repos.log
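
A possible extension of the same idea, sketched under assumptions: count how many harvested files each repository contributes. The `count_per_repo` helper and the sample URLs are made up for illustration, and the sketch assumes lowercase hostnames and one raw.githubusercontent.com URL per line.

```shell
# Hypothetical helper: read raw.githubusercontent.com URLs on stdin and
# print "<count> <repository URL>", most frequent repository first.
count_per_repo() {
	grep -Eo '^https?://raw\.githubusercontent\.com/([^/]+/){2}' |
	sed -E 's,raw\.githubusercontent,github,' |
	sort | uniq -c | sort -rn |
	awk '{ print $1, $2 }'   # strip the padding uniq -c adds
}

# Made-up sample input standing in for urls.log
printf '%s\n' \
	'https://raw.githubusercontent.com/alice/board/master/main.sch' \
	'https://raw.githubusercontent.com/bob/amp/master/amp.sch' \
	'https://raw.githubusercontent.com/alice/board/master/power.sch' |
count_per_repo
```

With a real harvest, `count_per_repo < urls.log` would do the same job.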

Update

New samples added from recently-discovered repositories which are, thankfully, released under a permissive license:

Sources shared with #3744.

@Alhadis Alhadis requested a review from pchaigno July 28, 2017 05:32
@@ -4056,6 +4078,7 @@ Scheme:
   color: "#1e4aec"
   extensions:
   - ".scm"
+  - ".sch"
Contributor

No need for a heuristic rule between KiCad Schematic and Scheme?

Collaborator Author

Hrm, do you think it warrants one? These were the only non-KiCad files marked as KiCad Schematics, and this was the only KiCad file identified as Scheme. All-in-all, I think the classifier's doing a good enough job. =)

Contributor

Ok 👍
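
If a rule were ever wanted here, one cheap signal is the first line of the file. The sketch below assumes legacy KiCad schematics open with an `EESchema Schematic File Version` header; the `guess_sch` helper and its `undecided` fallback are hypothetical and are not how Linguist expresses its heuristics.

```shell
# Hypothetical helper: read a .sch file on stdin and guess its type from
# the first line only. Anything without the assumed KiCad header is left
# "undecided" so the statistical classifier can still have its say.
guess_sch() {
	IFS= read -r first_line || true
	case $first_line in
		'EESchema Schematic File Version'*) echo 'KiCad Schematic' ;;
		*) echo 'undecided' ;;
	esac
}

# Made-up inputs: a KiCad-style header, then a Scheme definition
printf 'EESchema Schematic File Version 2\nLIBS:power\n' | guess_sch
printf '(define (square x) (* x x))\n' | guess_sch
```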

Collaborator Author

@Alhadis Alhadis Jul 28, 2017

Travis shat bricks the moment I submitted this PR... any idea what's up with that? 😕

Contributor

I think it's due to a recent update of Travis CI's default images. @kivikakk fixed it in her pull request.

Collaborator Author

@Alhadis Alhadis Jul 28, 2017

I trust she knows what she's doing. 👍 'Cause I don't, haha. Thanks!

(I named one of the samples after her, BTW... or rather, I made a typo and decided to keep it because it was so lulzy)

Contributor

(I named one of the samples after her, BTW... or rather, I made a typo and decided to keep it because it was so lulzy)

Awwww, this is as nice as the time I got called "Purveyor of the finest kivikode" by a coworker!

Collaborator Author

@Alhadis Alhadis Jul 28, 2017

The extent of my Estonian knowledge in one picture, screen-capped from Notes.app:

[Screenshot, 2017-07-28, 6:01:06 pm]

Also, this.

Collaborator Author

I've spent the entire day sorting files into folders, I'm getting jittery and restless. 😆

Contributor

@pchaigno pchaigno left a comment

Thanks for the pull request and all the work you put in!

(I wish I still had the time and resolve to do this kind of pull request...)

@Alhadis
Collaborator Author

Alhadis commented Jul 28, 2017

Haha, merci! This is only the first half of my research; I'm currently working on the second. =) It's related, but a separate format.

@pchaigno
Contributor

Soon, a monopoly on GitHub's grammars?

@Alhadis
Collaborator Author

Alhadis commented Jul 28, 2017

I have a special project in mind for the future regarding grammars and GitHub. =) But I have an infuriating habit of announcing my intent to start projects before I actually do so, and then losing interest when something else distracts me later on. So I'm saying nothing until I actually do start working on it.

I believe I've alluded to it to you before, though. ;)

@Alhadis Alhadis mentioned this pull request Jul 29, 2017
6 tasks
@pchaigno
Contributor

I believe I've alluded to it to you before, though. ;)

This one? 😃

@Alhadis
Collaborator Author

Alhadis commented Jul 29, 2017

In a sense. It won't be a canonicalisation of TextMate, though. I envision it being more of an evolution: building upon every strength the TextMate grammar format has, while addressing the glaring weaknesses that've contributed to its growing fragmentation between editors - no sense of context, zero macro or variable support for grammar authors, and most of all, no support for multiline pattern-matching.

Preserving the existence of a portable, flexible and approachable grammar format is the first half of the project. The other... well, language recognition. Remember my first grammar submission to Linguist, where I mistakenly believed that grammars were needed for classification accuracy? Well, I was damn wrong, but that misconception started an interesting train of thought. If authored carefully, a grammar could reliably identify languages based on the frequency of matches, and the scopes assigned to them. The executable would be passed a file, and a list of grammar scopes to test. It'd then spit out a line-delimited list of multipliers describing how confident it is in each match. So for a start, there's your not-a-language right here:

$ syn something.sol --scopes js,tsx...
0.025 TeX
0.25 JS
...
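
A rough sketch of the frequency part of that idea, under loud assumptions: some matcher has already printed one scope name per successful pattern match, and confidence is simply each scope's share of all matches. The `confidence` helper and its input format are hypothetical; nothing here reflects an actual Synapse design.

```shell
# Hypothetical helper: read one scope name per match on stdin and print
# "<share of matches> <scope>", highest share first.
confidence() {
	awk '
		{ hits[$1]++; total++ }
		END { for (scope in hits) printf "%.3f %s\n", hits[scope] / total, scope }
	' | sort -rn
}

# Made-up match stream: three JS matches, one TeX match
printf '%s\n' js js js tex | confidence
```

Here the shares are 3/4 for js and 1/4 for tex, so js is listed first.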

Secondly, an important characteristic about Synapse (I may as well just spill my damn guts, now nothing will ever get started 😆 ) would be the simplicity of what it does. It doesn't output CSS, or HTML, or anything but an ordered sequence of offsets in a file tagged with human-readable descriptions ("scopes"), which are up to a higher-level implementation like Atom or GitHub to deal with. Hopefully it'll be pure Unix in spirit: think grep or ack with semantic pattern-matching.

There's the idea, now I've jinxed it. 😢

@Alhadis
Collaborator Author

Alhadis commented Jul 29, 2017

Gotta finish this off first, though: I've been promising to do it since last September. Nearly there...

[Screenshots, 2017-07-29, 9:01:18 pm and 9:02:18 pm]

Real-time troff rendering with HTML5 canvas technology. 👍 Only taken me 3-4 months, hahahah.

@pchaigno
Contributor

If authored carefully, a grammar could reliably identify languages based on the frequency of matches, and the scopes assigned to them. The executable would be passed a file, and a list of grammar scopes to test. It'd then spit out a line-delimited list of multipliers describing how confident it is in each match.

I think that was the approach taken by Pygments to classify files. The issue with this approach is that it's rather costly, in particular if you have a long list of grammars to interpret. It would be nice to have some actual benchmarks for this approach though.

@Alhadis
Collaborator Author

Alhadis commented Jul 29, 2017

Nope. See, Pygments uses lexical parsing (IIRC), whereas Synapse grammars use the same nested, regexp-based syntax that TextMate grammars have always used. My grammars are probably the most complex, structured and "semantically authored" grammars out there, and if anybody wants to compete with my belligerently OCD scrutiny, step up to the plate. My Roff grammar is probably the best example of how this can be done well.

There's a damn good reason I refuse to enforce an AST-based approach: artistic freedom. The flexibility of the current format is one thing I refuse to part with, and not everything that receives highlighting is, dare I say, a "language".

Besides, Python is slow as crap and sucks as a language anyway, but oh-its-so-readable-tho.

Accurate.

@Alhadis
Collaborator Author

Alhadis commented Jul 29, 2017

Oh, and grammar authors who've taken care to structure their grammars semantically can assign arbitrary "weights" to each pattern that can influence how strongly a successful match impacts language recognition.

function name ( param ) would, for example, carry much greater weight than, say, a floating-point literal. Obviously, Synapse would use heuristics to draw a best guess for stuff like this, but allowing authors that degree of control over recognition makes me feel more comfortable than entrusting circuitry to make an informed decision.
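
To make the weighting concrete, here is a minimal sketch that assumes each match arrives as a `scope pattern-kind` pair and that the grammar author has assigned weights per pattern kind. The kinds `function` and `float`, their weights, and the `weighted_confidence` helper are all invented for illustration.

```shell
# Hypothetical helper: read "<scope> <pattern-kind>" pairs on stdin and
# print each scope's share of the total weight, highest first.
weighted_confidence() {
	awk '
		BEGIN { weight["function"] = 5; weight["float"] = 1 }  # made-up author weights
		{
			w = ($2 in weight) ? weight[$2] : 1   # unknown kinds default to 1
			score[$1] += w
			total += w
		}
		END { for (scope in score) printf "%.3f %s\n", score[scope] / total, scope }
	' | sort -rn
}

# One function definition in JS against three float literals in TeX
printf '%s\n' 'js function' 'tex float' 'tex float' 'tex float' | weighted_confidence
```

With these weights a single function definition (5 of 8) outweighs three float literals (3 of 8), which is the kind of author control described above.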

This is where both of my backgrounds, artist and programmer, t-bone each other in a foaming display of "control freak". Suffice to say I have this shit planned out... writing 20 CSON grammars by hand will do that to ya.

(Nobody believes me when I tell them I have no girlfriend, can you believe it? L E L)

EDIT: Oh yeah, another responsibility of Synapse would, of course, be converting between established grammar formats for other implementations. Pygments <-> TextMate <-> Highlights …

There's no way I'd do this if I knew this wasn't getting squash-merged.
@Alhadis
Collaborator Author

Alhadis commented Aug 2, 2017

@pchaigno While I'm reminded by #3751, do you feel that Eagle would be better represented as data rather than markup? I suppose there's a grey area where page-description languages overlap with image data, but Eagle seems pretty clearly composed of immutable data...

EDIT: Actually, don't worry. =) We can address this topic in #3751....

@Alhadis Alhadis requested a review from lildude August 8, 2017 06:17