Add support for V #4564

Merged: 6 commits, Aug 12, 2019

S-YOU
Contributor

@S-YOU S-YOU commented Jun 26, 2019

Description

Add support for the V programming language.

Checklist:


@binkiklou binkiklou left a comment


You should make your own syntax highlighting; don't use Go's.

@S-YOU
Contributor Author

S-YOU commented Jun 26, 2019

Maybe later.

@S-YOU S-YOU closed this Jun 26, 2019
@pchaigno
Contributor

It's perfectly fine to use another language's grammar if it works well enough. The search query showing in-the-wild usage and the samples are mandatory though.

@joe-conigliaro

Can linguist use the grammar from here? https://github.com/0x9ef/vscode-vlang/tree/master/syntaxes

@ylluminate

I believe this should help on the syntax highlighting / grammar front: vlang/v#465

@atakanyenel
Contributor

V uses the .v extension, and searching for in-the-wild usage turns up repositories written in Verilog, which also uses .v. Since Verilog already has syntax highlighting, care should be taken not to break it. See: usage of the .v extension on GitHub

@ylluminate

@atakanyenel there has been some discussion on this. Are you suggesting to change the V language extension? @medvednikov believes that changing the extension is not appropriate.

It seems as though we need to consider options to retain the .v extension while also facilitating the syntax highlighting and detection.

Also, all things considered, Verilog is a rather specialized language and ultimately a less widely desirable one (I'm not saying this negatively, just stating that it's not a general-purpose programming language like V and ultimately has lower potential reach and use). Given this, V should have, or at least will need, a higher priority at some point.

@joe-conigliaro

Linguist can support multiple languages with the same extension; it says so here: https://github.com/github/linguist/blob/master/CONTRIBUTING.md

Additionally, if this extension is already listed in languages.yml and associated with another language, then sometimes a few more steps will need to be taken:

@Alhadis
Collaborator

Alhadis commented Jul 7, 2019

Don't change any extensions. I'm working on getting support for this, but it involves going through hundreds of .v files manually and figuring out what language it uses.

Alhadis added a commit to Alhadis/Silos that referenced this pull request Jul 9, 2019
@Alhadis
Collaborator

Alhadis commented Jul 9, 2019

Okay, I've amassed 5,273 unique results from a collection of harvests spanning various keyword searches (my attempt to narrow them down to V files). The Silos repository I've uploaded them to is enormous (well over 10 GBs), so make sure to shallow-clone the branch if you're interested in going through the search results:

$ git clone --branch v --depth 1 https://github.com/Alhadis/Silos.git

The files include Verilog, Coq, and (hopefully) V. Unfortunately, I'm not familiar with any of these languages, and there are too many files to scrutinise by hand. The only sane approach I can think of would be using the V compiler itself to statically parse each file and determine which are lexically valid V syntax. Picture something like this:

$ ./vvalidator verilog.v
✘ Syntax error: Unexpected token "…" on line 4

$ ./vvalidator hello_world.v
✔ File "hello_world.v" is syntactically valid

That would enable us not only to filter the files now, but to assist @pchaigno in monitoring the language's popularity over time if there isn't enough in-the-wild usage yet.

@medvednikov, would you know if this is easy to do?

@joe-conigliaro

joe-conigliaro commented Jul 10, 2019

@Alhadis This is definitely possible, thanks for your work! Couldn't we also train a classifier?

@Alhadis
Collaborator

Alhadis commented Jul 10, 2019

Oh, and that grammar you suggested looks great. Here's a preview:

figure-1

So yeah, we'll definitely use @0x9ef's grammar if V's in-the-wild usage is high enough. 👍

@joe-conigliaro

@Alhadis Excellent 💃 I think some other people were also working on a grammar in another format, but I can't remember where it was.

@medvednikov

medvednikov commented Jul 31, 2019

@Alhadis @pchaigno

Hello,

Sorry for the delay.

Thanks for spending the time to collect the data on .v files.

@medvednikov, would you know if this is easy to do?

This is definitely doable, although my current Internet connection won't allow it.

I just wonder if this work is necessary. V seems to be a language with the fastest growth ever: over 10k stars and 600 forks in just 1 month since the open-source release. By the way, shouldn't that already be enough for the hundreds of repositories rule? (V compiler is written 100% in V.)

I've also just released the web framework, and V UI release is around the corner, so there's going to be even more interest and projects in V.

@S-YOU S-YOU reopened this Jul 31, 2019
@ylluminate

@medvednikov would the donation of a Linux or macOS virtual machine for you to use via ssh, TeamViewer and/or Jump that has a fast and unlimited network connection be helpful to you for remote work? If so, please ping me on Discord and I'll set you up.

@atakanyenel
Contributor

atakanyenel commented Aug 1, 2019

I can write the vvalidator. I have a Linux VM with a fast connection, and as far as I can tell, the program itself is just a script around the V compiler (parser) that counts the number of correctly parsed .v files among the 5,273 results.

I can post the number of correctly parsed results in this thread by the end of the week.

P.S.: Even map-reduce could be used for this, but given V's compilation speed I expect it to be quick.
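The "map-reduce" idea can be sketched in plain Ruby. A pool of threads "maps" each file to a pass/fail result by running the compiler, and the accepted names are then "reduced" into a single list. This is a minimal sketch, assuming the same `./v` binary and `Silos/files/` layout used by the script posted later in this thread. `valid_files` is a hypothetical helper. Threads should be enough here because MRI releases its global lock while a thread waits on a child process.

```ruby
require "etc"

# Hypothetical helper: run `check` on every file using a pool of worker threads
# ("map"), then merge each worker's accepted files into one sorted list ("reduce").
def valid_files(files, workers: Etc.nprocessors, &check)
  queue = Queue.new
  files.each { |f| queue << f }
  queue.close # once the queue is drained, pop returns nil instead of blocking

  pool = Array.new(workers) do
    Thread.new do
      accepted = []
      while (f = queue.pop)
        accepted << f if check.call(f)
      end
      accepted
    end
  end

  pool.flat_map(&:value).sort
end

# Assumed layout: the Silos checkout sits next to the ./v binary.
dir = "Silos/files"
if Dir.exist?(dir)
  accepted = valid_files(Dir.children(dir)) do |f|
    # Exit status 0 from the V compiler is treated as "parsed correctly".
    system("./v", File.join(dir, f), out: File::NULL, err: File::NULL)
  end
  File.write("v_files.txt", accepted.join("\n") + "\n")
  puts "number of v files: #{accepted.size}"
end
```

The check is passed in as a block, so the same pool works with any validator, not only the V compiler.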

@medvednikov

@ylluminate thanks, but I have access to cheap AWS instances I can use for that.

Didn't think about that.

It's just like I said in my previous post: is this work really necessary?

@ylluminate

@medvednikov right, I get it and suspect you're correct. Just wanted to throw it out there "just in case." Hopefully it isn't and we can just move ahead sans the effort, although @Alhadis has done a valiant job so far in pushing things ahead and we certainly need to get behind his effort if there's no other way.

@Alhadis
Collaborator

Alhadis commented Aug 1, 2019

600 forks in just 1 month since the open-source release. By the way, shouldn't that already be enough for the hundreds of repositories rule? (V compiler is written 100% in V.)

That's... a pretty good point, actually. 😂 Don't know why that never occurred to me (I feel a bit dumb now, since the number of forks clearly corresponds to the number of users/repositories).

I'll send a PR. (EDIT: Wait, this is a PR 😀 My bad for reading this on mobile)

@lildude
Member

lildude commented Aug 1, 2019

By the way, shouldn't that already be enough for the hundreds of repositories rule?

Well, technically no as forks aren't unique repositories 😉

Member

@lildude lildude left a comment


As my inline comment states 😄

lib/linguist/languages.yml (outdated)
@Alhadis
Collaborator

Alhadis commented Aug 1, 2019

Well, technically no as forks aren't unique repositories

What defines "unique" when it comes to authorship, then? Given that half of GitHub is comprised of derivative works, should we really be splitting hairs when it comes to distribution?

@lildude
Member

lildude commented Aug 1, 2019

What defines "unique" when it comes to authorship, then? Given that half of GitHub is comprised of derivative works, should we really be splitting hairs when it comes to distribution?

I was being facetious, and also keeping in mind that GitHub search and even Harvester don't include forks in their search results by default 😉

Even ignoring forks, there are tons of files with a .v extension.

Update: oh wait, most of those already have syntax highlighting, which suggests they're not this V, meaning we're going to need heuristics etc. too.

@Alhadis
Collaborator

Alhadis commented Aug 1, 2019

Even ignoring forks, there are tons of files with a .v extension.

Which brings us back to the original problem: sifting through them to filter out Verilog and Coq files...

@atakanyenel
Contributor

atakanyenel commented Aug 1, 2019

@Alhadis I ran the vvalidator script on the Silos folder, and the results are as follows:

5287 files parsed.
106 parsed correctly by V compiler.
Took 3 min 29 s.

The script is as follows:

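# vvalidator.rb
# Runs the V compiler on every file in the Silos harvest and records
# the names of the files it accepts.
files_path = "Silos/files/"

# Dir.children omits "." and ".." (Dir.entries doesn't guarantee they come first)
files = Dir.children(files_path).sort

test = files # for partial testing: test = files[0, 100]
num_of_v = 0

File.open("v_files.txt", "w") do |file|
  test.each_with_index do |f, index|
    puts index + 1
    # Send the compiler's stdout to the null device;
    # system returns true only when the exit status is 0
    is_compiled = system("./v", File.join(files_path, f), out: File::NULL)
    if is_compiled
      num_of_v += 1
      file.puts(f)
    end
  end
end

puts "number of v files: #{num_of_v}"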

The V compiler comes from master (0197f20). The Silos repo is cloned into the V repo.

You can try it and tell me if I missed some cases. 106 seemed small to me; maybe there are other reasons why real V files didn't compile correctly. An updated language spec might be one of them.
Correctly parsed files are attached.
v_files.txt

@medvednikov

Thank you for the release.

The changes are now live! I'm super excited about this :)

@Alhadis some files are identified as Coq and Verilog.

Screen Shot 2019-08-27 at 19 55 39

https://github.com/vlang/v/search?l=coq

https://github.com/vlang/v/search?l=verilog

@Alhadis
Collaborator

Alhadis commented Aug 28, 2019

@medvednikov Try pushing a trivial change to the affected files. That'll force GitHub to refresh its cache and recalculate the languages of every file. When I forked the V compiler, the language bar was 99.7% V, 0.3% "Other":

Figure 1

Changes to Linguist don't retroactively affect repositories until their files need to be reanalysed (such as when they're modified, or the repository is deleted and republished).

@medvednikov

I see, didn't know that.

Thanks, @Alhadis

@medvednikov

@Alhadis indeed pushing fixed the language in all V repos I tried except for vlang/v.

It's still at ~6% Coq/Verilog. For you it was correct because I set up a temporary *.v linguist-language=V.

Once I removed it, Coq/Verilog were back:

https://github.com/vlang/v/search?l=coq
https://github.com/vlang/v/search?l=verilog

@Alhadis
Collaborator

Alhadis commented Aug 30, 2019

For you it was correct because I set up a temporary *.v linguist-language=V.

Honestly, that's something I'd leave in, even if the current classification is correct. Bayesian classification will always have the potential for error, and if repository owners are able to make Linguist's life easier with an explicit override, all the better.
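The override mentioned here is a `.gitattributes` entry, `*.v linguist-language=V`. The Ruby sketch below is a rough illustration of why it sidesteps the classifier entirely. It assumes a simplified matching rule: a pattern without a slash is matched against each file's basename, and a match short-circuits detection. `override_language` and `OVERRIDES` are hypothetical names. They are not Linguist's API.

```ruby
# Hypothetical stand-in for the .gitattributes line: *.v linguist-language=V
OVERRIDES = { "*.v" => "V" }.freeze

# Return the forced language for a path, or nil if no override applies.
# Simplification (assumption): slash-free patterns match the basename at any depth.
def override_language(path)
  name = File.basename(path)
  OVERRIDES.each do |pattern, language|
    return language if File.fnmatch(pattern, name)
  end
  nil
end

# Every .v file resolves to V without its contents ever being inspected.
%w[compiler/cheaders.v vlib/time/time.v README.md].each do |path|
  puts "#{path}: #{override_language(path) || '(fall through to detection)'}"
end
```

Because the decision depends only on the path, the override cannot be swayed by the token statistics that misled the classifier.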

indeed pushing fixed the language in all V repos I tried

Why are these Coqs still showing “Last indexed on 29 Jun”, then? The changes I mentioned have to target the file's contents directly, forcing GitHub to reanalyse each file.

That's another point I need to mention: Linguist analyses files on a case-by-case basis; so as far as it's concerned, each file may as well exist in isolation. Pushing changes to neighbouring directories will therefore have zero effect.

@lildude
Member

lildude commented Aug 30, 2019

Once I removed it, Coq/Verilog were back:

https://github.com/vlang/v/search?l=coq
https://github.com/vlang/v/search?l=verilog

Those results aren't 100% correct, as they show cached search results (see the "Last indexed" date). Linguist has no control over this, but they should become correct when the repo is next reindexed or the files are modified.

As for the percentage breakdown... direct analysis shows the following, which aligns with the language bar results:

$ bundle exec bin/github-linguist ~/tmp/trash/v --breakdown
93.23%  V
5.50%   Coq
0.79%   Verilog
0.36%   Batchfile
0.06%   Dockerfile
0.04%   C
0.03%   Makefile

[... truncated for brevity ...]

Verilog:
compiler/cheaders.v
vlib/encoding/binary/binary.v
vlib/net/socket_nix.v
vlib/net/socket_win.v

Coq:
compiler/fn.v
vlib/gx/gx.v
vlib/time/time.v

[... truncated for brevity ...]

If we take an even closer look, using the first "Verilog" file as an example, we can see why:

$ LINGUIST_DEBUG=1 bundle exec bin/github-linguist ~/tmp/trash/v/compiler/cheaders.v
cheaders.v: 127 lines (104 sloc)
  type:      Text
  mime type: text/plain
       Coq =  -2371.290 +  -5.141 =  -2376.431
         V =  -2227.686 +  -5.508 =  -2233.195
   Verilog =  -2211.562 +  -5.141 =  -2216.703
  language:  Verilog
$
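Each of the three score lines above adds two numbers per language. The two terms are assumed here to be a token log-likelihood and a log prior; that reading of the debug format is an assumption, though the additions check out (-2211.562 + -5.141 = -2216.703). The language with the highest total, the least negative one, wins. This minimal Ruby sketch reproduces that ranking from the printed values.

```ruby
# Scores copied from the LINGUIST_DEBUG=1 output above:
# [token score, prior] for each candidate language (interpretation assumed).
scores = {
  "Coq"     => [-2371.290, -5.141],
  "V"       => [-2227.686, -5.508],
  "Verilog" => [-2211.562, -5.141],
}

# Total per language, then rank from most to least likely (highest total first).
totals = scores.transform_values(&:sum)
ranking = totals.sort_by { |_lang, total| -total }

ranking.each { |lang, total| puts format("%8s = %.3f", lang, total) }
puts "language: #{ranking.first.first}"
```

The recomputed V total may differ from the printed one in the last digit, because the inputs shown in the debug output are already rounded. Verilog's lead over V is about 16.5, which is small next to totals above 2,000. That margin is why a handful of extra samples or a heuristic can flip the result.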

Or even more closely:

$ LINGUIST_DEBUG=2 bundle exec bin/github-linguist ~/tmp/trash/v/compiler/cheaders.v
cheaders.v: 127 lines (104 sloc)
  type:      Text
  mime type: text/plain
                            #       Coq         V   Verilog
                       (   11         -     5.727     8.484
                       )   11         -     5.869     7.329
                      //   14         -   108.060   152.351
                       ;    6         -     4.887    12.294
                       [    1         -     1.994     1.922
                       ]    1         -     2.057     1.997
                     and    3    14.684         -         -
                   const    1         -     7.719         -
                     etc    1         -     7.025         -
                     for    1         -     6.320     3.710
                function    1     3.796         -         -
                     int    1     8.608     9.223         -
                    main    1     3.796     9.223         -
                  module    1         -     7.025     9.010
                    type    2    10.811         -         -
                       x    3     1.274         -     2.186
                       {    2         -     7.640     2.276
                       }    1         -     3.805     1.138
       Coq =  -2371.290 +  -5.141 =  -2376.431
         V =  -2227.686 +  -5.508 =  -2233.195
   Verilog =  -2211.562 +  -5.141 =  -2216.703
  language:  Verilog
$

The heuristic couldn't confirm that this file is definitely V, so it fell through to the classifier. Based on the samples we have for V, Coq, and Verilog, the classifier judged that this file looks more like Verilog than either of the others.

The only solution here is to use an override or improve the heuristic.

@Alhadis
Collaborator

Alhadis commented Aug 30, 2019

$ LINGUIST_DEBUG=2

What the hell. How long has that been there for? What other environment variables or features are available to the command-line? :|

Okay, seriously, you need to consider having a man page for github-linguist. I'm happy to write it as long as you tell me everything the executable can do (and I do mean everything).

@lildude
Member

lildude commented Aug 30, 2019

What the hell. How long has that been there for?

Not long at all... it's only just over six years since #529 was merged 😉

@medvednikov

Linguist analyses files on a case-by-case basis; so as far as it's concerned, each file may as well exist in isolation

Ah, that's an important detail, thanks :) I'll try it.

that is showing the cached search results (see the last indexed date). Linguist has no control over this but they should become correct when the repo is next reindexed or the files modified.

I see, thanks. I'll shut up now. :)

@Alhadis
Collaborator

Alhadis commented Aug 30, 2019

See, this is why I take software documentation seriously. 😜

Had I known which words and tokens the classifier finds most prominent, I would have chosen a more careful mix of samples each time I added support for a new language. Until now, it's always been a case of "find the most diverse-looking sample, check the license, then rerun the classifier, repeating the cycle until everything is 100% accurate."

@Alhadis
Collaborator

Alhadis commented Aug 30, 2019

Anyway, regarding our current heuristic, we could probably amend it to recognise fn as a V keyword, provided it's illegal in Verilog. I've already confirmed it isn't valid Coq.

Which brings me to my next problem: I don't even know how to test Verilog. 😢 Wikipedia's description says that

Verilog is a hardware description language used to model electronic systems. It is most commonly used in the design and verification of digital circuits at the register-transfer level of abstraction.

... which kind of strikes me as something that doesn't have a REPL to "Hello, world" to. 😞

@medvednikov

See, this is why I take software documentation seriously. 😜

Yeah, sometimes 5 minutes spent on documentation can save days or even weeks for other developers.

@Alhadis
Collaborator

Alhadis commented Aug 30, 2019

A well-written man page can also save googling a project and fishing around for its CLI reference. And no, --help output isn't sufficient when there are bells and whistles like LINGUIST_DEBUG lying around in need of attention.

@medvednikov My offer regarding V man pages still stands. 😉

@medvednikov

@Alhadis sorry, I missed it. Can't find it here. What's the offer? :)

By the way, I created a new V repo (not a fork), it's still detected incorrectly, so it's not about the cache:

https://github.com/medvednikov/v3/search?l=coq
https://github.com/medvednikov/v3/search?l=verilog

@lildude
Member

lildude commented Aug 30, 2019

The only thing that is cache related is the number of files in the search results - your new repo has far fewer results. The rest of my previous reply still applies.

@Alhadis
Collaborator

Alhadis commented Aug 30, 2019

Would you mind if we added fn.v as a sample for our classifier? That'll likely improve its accuracy. 👍

@Alhadis sorry, I missed it. Can't find it here. What's the offer? :)

It's over here. We can continue this part of the conversation over there to avoid sidetracking this thread too much. 👍

@medvednikov

Would you mind if we added fn.v as a sample for our classifier? That'll likely improve its accuracy. 👍

Sure, that's fine. Thanks.

@pchaigno
Contributor

@Alhadis I'm definitely in favor of a github-linguist manpage.

@Alhadis
Collaborator

Alhadis commented Sep 8, 2019

Okay, looks like fn is safe to whitelist as V-only syntax. I found a Verilog simulator called Icarus Verilog that “compiled” an a.out executable from this source:

module main;
	initial
		begin
			$display("Hello, world.");
			$finish;
		end
endmodule
λ iverilog i.v
λ ./a.out
Hello, world.
Contents of a.out script (unimportant)
#! /usr/local/Cellar/icarus-verilog/10.3/bin/vvp
:ivl_version "10.3 (stable)" "(v10_3)";
:ivl_delay_selection "TYPICAL";
:vpi_time_precision + 0;
:vpi_module "system";
:vpi_module "vhdl_sys";
:vpi_module "v2005_math";
:vpi_module "va_math";
S_0x7febc2d01ac0 .scope module, "main" "main" 2 1;
 .timescale 0 0;
    .scope S_0x7febc2d01ac0;
T_0 ;
    %vpi_call 2 4 "$display", "Hello, world." {0 0 0};
    %vpi_call 2 5 "$finish" {0 0 0};
    %end;
    .thread T_0;
# The file index is used to find the file name in the following table.
:file_names 3;
    "N/A";
    "<interactive>";
    "i.v";

But adding fn print_field(); or $fn print_field(); produced syntax errors:

@@ ~/Desktop/i.v @@
 	module main;
 	initial
 		begin
+			fn print_field();
 			$display("Hello, world.");
 			$finish;
 		end
 endmodule
λ Desktop: iverilog i.v
i.v:4: syntax error
i.v:4: error: malformed statement
λ Desktop: echo $?
2

👍 So it should be safe to extend V's heuristic to include something like:

/^\s*(pub\s+)?(fn)\s+([\w(])/m
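The Ruby sketch below shows how such a rule might behave. It tries the proposed regex first and falls back to the classifier only when there is no match, following the heuristic-then-classifier order described earlier in the thread. The `detect` helper and the stub classifier are hypothetical, and Linguist's actual heuristic definitions are not reproduced here. The V and Verilog snippets are illustrative.

```ruby
# Regex proposed above: a line starting with "fn" (optionally "pub fn"),
# then whitespace, then a name or an opening parenthesis (a method receiver).
V_FN = /^\s*(pub\s+)?(fn)\s+([\w(])/m

# Hypothetical detector: a heuristic match wins outright;
# anything else is handed to the (stubbed) statistical classifier.
def detect(source, classifier)
  return "V" if source.match?(V_FN)
  classifier.call(source)
end

# Stub standing in for the Bayesian classifier; it always guesses Verilog,
# as the real one did for compiler/cheaders.v.
classifier = ->(_source) { "Verilog" }

v_source = <<~VSRC
  module main

  pub fn greet(name string) string {
    return 'Hello, ' + name
  }
VSRC

verilog_source = <<~VERILOG
  module main;
    initial begin
      $display("Hello, world.");
      $finish;
    end
  endmodule
VERILOG

puts detect(v_source, classifier)       # heuristic matches "pub fn greet"
puts detect(verilog_source, classifier) # no "fn" line, so the classifier decides
```

In Ruby, `^` already anchors at every line start, and the `/m` flag only changes how `.` treats newlines. The pattern contains no `.`, so the flag is harmless but redundant. Verilog's `function` keyword doesn't match, because the regex requires whitespace straight after `fn`.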

@pchaigno I'm definitely in favour of a github-linguist man page.

@pchaigno Sorry, I missed your last response. I'll see what I can do when I find the time. 👍

ylluminate added a commit to ylluminate/explore that referenced this pull request Sep 17, 2019
Add the V programming language to the list (already in Linguist: github-linguist/linguist#4564)
@ylluminate ylluminate mentioned this pull request Sep 17, 2019
@medvednikov

I just realized that none of the .v files have syntax highlighting on GitHub:

https://github.com/mvlootman/vbench/blob/master/bench_ips.v

Language detection works fine, but the syntax is not highlighted for some reason.

@Alhadis
Collaborator

Alhadis commented Sep 27, 2019

@medvednikov A lot of languages aren't showing syntax highlighting (both recent and otherwise). This can only be an issue on GitHub's end, so I expect they're working on fixing it.

You can confirm the grammar's valid using Lightshow. Note that a direct link to the grammar file is necessary because V isn't showing up in the built-in grammars list. @lildude, this may or may not be related.

@lildude
Member

lildude commented Oct 3, 2019

I've taken a look into this, as it appears to only affect languages or grammars added in v7.6.0. Work is underway to improve the syntax highlighting service used by GitHub, and I suspect it has missed the v7.6.0 update. I've opened an issue to bring this to the attention of the team responsible for the improvements.

I'll update when I know more or when things are working as expected again.

@Alhadis
Collaborator

Alhadis commented Oct 3, 2019

Do you think our (admittedly inelegant) transition to GitHub Actions may have botched one of the deployment steps, somewhere?

@lildude
Member

lildude commented Oct 3, 2019

Do you think our (admittedly inelegant) transition to GitHub Actions may have botched one of the deployment steps, somewhere?

Nope. Completely unrelated.

@lildude
Member

lildude commented Oct 3, 2019

Work is underway to improve the syntax highlighting service used by GitHub, and I suspect it has missed the v7.6.0 update.

All sorted. This was indeed the case.

@Alhadis
Collaborator

Alhadis commented Oct 27, 2022

@Alhadis I'm definitely in favour of a github-linguist manpage

It only took three years for me to get to this, but here you go:

Currently blocked on something stupid: figuring out how to install the damn things as part of the github-linguist gem's installation. 😰

(To everybody else who commented here: apologies for the thread necromancy…)

An actual band name, courtesy of Indonesia

@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024

9 participants