Add support for V #4564
You should make your own syntax highlighting; don't use Go's.
Maybe later. |
It's perfectly fine to use another language's grammar if it works well enough. The search query showing in-the-wild usage and the samples are mandatory though. |
Can linguist use the grammar from here? https://github.com/0x9ef/vscode-vlang/tree/master/syntaxes |
I believe this should help on the syntax highlighting / grammar front: vlang/v#465 |
V language uses the extension |
@atakanyenel there has been some discussion on this. Are you suggesting to change the V language extension? @medvednikov believes that changing the extension is not appropriate. It seems as though we need to consider options to retain the Also, all things considered, Verilog is a rather specialized language and ultimately a less widely desirable language (not saying this in a negative context towards it, just simply stating that it's not a general programming language like V is and ultimately has lower potential reach and use). Given this V should have, or at least will need, a higher priority at some point. |
Linguist can support multiple languages with the same extension; it says so here: https://github.com/github/linguist/blob/master/CONTRIBUTING.md
|
Don't change any extensions. I'm working on getting support for this, but it involves going through hundreds of |
Okay, I've amassed 5,273 unique results from a collection of harvests spanning various keyword searches (my attempt to narrow them down to V files). The Silos repository I've uploaded them to is enormous (well over 10 GB), so make sure to shallow-clone the branch if you're interested in going through the search results:
$ git clone --branch v --depth 1 https://github.com/Alhadis/Silos.git
The files include Verilog, Coq, and (hopefully) V. Unfortunately, I'm not familiar with any of these languages, and there are too many files to scrutinise by hand. The only sane approach I can think of would be using the V compiler itself to statically parse each file and determine which are lexically valid V syntax. Picture something like this:
$ ./vvalidator verilog.v
✘ Syntax error: Unexpected token "…" on line 4
$ ./vvalidator hello_world.v
✔ File "hello_world.v" is syntactically valid
That would enable us not only to filter the files now, but to assist @pchaigno in monitoring the language's popularity over time if there isn't enough in-the-wild usage yet. @medvednikov, would you know if this is easy to do? |
@Alhadis This is definitely possible, thanks for your work! Couldn't we also train a classifier? |
Oh, and that grammar you suggested looks great. Here's a preview: So yeah, we'll definitely use @0x9ef's grammar if V's in-the-wild usage is high enough. 👍 |
@Alhadis Excellent 💃 I think some other people were also working on a grammar in another format, but I can't remember where it was. |
Hello, Sorry for the delay. Thanks for spending the time to collect the data on .v files.
This is definitely doable, although my current Internet connection won't allow it. I just wonder if this work is necessary. V seems to be a language with the fastest growth ever: over 10k stars and 600 forks in just 1 month since the open-source release. By the way, shouldn't that already be enough for the hundreds of repositories rule? (V compiler is written 100% in V.) I've also just released the web framework, and V UI release is around the corner, so there's going to be even more interest and projects in V. |
@medvednikov would the donation of a Linux or macOS virtual machine for you to use via ssh, TeamViewer and/or Jump that has a fast and unlimited network connection be helpful to you for remote work? If so, please ping me on Discord and I'll set you up. |
I can do the I can report the number of correctly parsed results on this thread by the end of the week. P.N.: Even map-reduce can be used for this, but with V's compilation speed I expect this to be short. |
@ylluminate thanks, but I have access to cheap AWS instances I can use for that. Didn't think about that. It's just like I said in my previous post: is this work really necessary? |
@medvednikov right, I get it and suspect you're correct. Just wanted to throw it out there "just in case." Hopefully it isn't and we can just move ahead sans the effort, although @Alhadis has done a valiant job so far in pushing things ahead and we certainly need to get behind his effort if there's no other way. |
That's... a pretty good point, actually. 😂 Don't know why that never occurred to me (I feel a bit dumb now, since the number of forks clearly corresponds to the number of users/repositories).
|
Well, technically no as forks aren't unique repositories 😉 |
As my inline comment states 😄
What defines "unique" when it comes to authorship, then? Given that half of GitHub is comprised of derivative works, should we really be splitting hairs when it comes to distribution? |
I was being facetious, and also keeping in mind that GitHub search and even Harvester don't include forks in their search results by default 😉 Even ignoring forks, there are tons of files with a Update: oh wait, most of those already have syntax highlighting which suggests they're not this V meaning we're going to need heuristics etc too. |
Which brings us back to the original problem: sifting through them to filter out Verilog and Coq files... |
@Alhadis I ran the V compiler over the 5287 files mentioned. The script is as follows:
# vvalidator.rb
# Runs the V compiler on every harvested file and records the ones that compile.
files_path = "Silos/files/"
# Dir.children omits "." and "..", so there is no need to assume they come first
files = Dir.children(files_path)
test = files # for partial testing: test = files[0, 100]
num_of_v = 0
File.open("v_files.txt", "w") do |file|
  test.each.with_index(1) do |f, index|
    puts index # progress indicator
    # system returns true only when ./v exits with status 0
    is_compiled = system("./v", File.join(files_path, f), out: :close)
    next unless is_compiled
    num_of_v += 1
    file.puts(f)
  end
end
puts "number of v files: #{num_of_v}"
The You can try it and tell me if I missed some cases. 106 seemed small to me; maybe there are other reasons that real V files didn't compile correctly. An updated language spec might be a reason. |
Thank you for the release. The changes are now live! I'm super excited about this :) @Alhadis some files are identified as Coq and Verilog. |
@medvednikov Try pushing a trivial change to the affected files. That'll force GitHub to refresh its cache and recalculate the languages of every file. When I forked the V compiler, the language bar was 99.7% V, 0.3% "Other": Changes to Linguist don't retroactively affect repositories until their files need to be reanalysed (such as when they're modified, or the repository is deleted and republished). |
I see, didn't know that. Thanks, @Alhadis |
@Alhadis indeed, pushing fixed the language in all V repos I tried except for vlang/v. It's still at ~6% Coq/Verilog. For you it was correct because I set up a temporary Once I removed it, Coq/Verilog were back: https://github.com/vlang/v/search?l=coq |
Honestly, that's something I'd leave in, even if the current classification is correct. Bayesian classification will always have the potential for error, and if repository owners are able to make Linguist's life easier with an explicit override, all the better.
Why are these Coqs still showing “Last indexed on 29 Jun”, then? The changes I mentioned have to target the file's contents directly, forcing GitHub to reanalyse each file. That's another point I need to mention: Linguist analyses files on a case-by-case basis; so as far as it's concerned, each file may as well exist in isolation. Pushing changes to neighbouring directories will therefore have zero effect. |
All those results aren't 100% correct, as that is showing the cached search results (see the last indexed date). Linguist has no control over this, but they should become correct when the repo is next reindexed or the files modified. As for the percentage breakdown... direct analysis shows the following, which aligns with the language bar results:
$ bundle exec bin/github-linguist ~/tmp/trash/v --breakdown
93.23% V
5.50% Coq
0.79% Verilog
0.36% Batchfile
0.06% Dockerfile
0.04% C
0.03% Makefile
[... truncate for brevity ...]
Verilog:
compiler/cheaders.v
vlib/encoding/binary/binary.v
vlib/net/socket_nix.v
vlib/net/socket_win.v
Coq:
compiler/fn.v
vlib/gx/gx.v
vlib/time/time.v
[... truncate for brevity ...]
If we take an even closer look, we can see why, using the first "verilog" file as an example:
$ LINGUIST_DEBUG=1 bundle exec bin/github-linguist ~/tmp/trash/v/compiler/cheaders.v
cheaders.v: 127 lines (104 sloc)
type: Text
mime type: text/plain
Coq = -2371.290 + -5.141 = -2376.431
V = -2227.686 + -5.508 = -2233.195
Verilog = -2211.562 + -5.141 = -2216.703
language: Verilog
$
Or even more closely:
$ LINGUIST_DEBUG=2 bundle exec bin/github-linguist ~/tmp/trash/v/compiler/cheaders.v
cheaders.v: 127 lines (104 sloc)
type: Text
mime type: text/plain
            #       Coq         V   Verilog
       (   11         -     5.727     8.484
       )   11         -     5.869     7.329
      //   14         -   108.060   152.351
       ;    6         -     4.887    12.294
       [    1         -     1.994     1.922
       ]    1         -     2.057     1.997
     and    3    14.684         -         -
   const    1         -     7.719         -
     etc    1         -     7.025         -
     for    1         -     6.320     3.710
function    1     3.796         -         -
     int    1     8.608     9.223         -
    main    1     3.796     9.223         -
  module    1         -     7.025     9.010
    type    2    10.811         -         -
       x    3     1.274         -     2.186
       {    2         -     7.640     2.276
       }    1         -     3.805     1.138
Coq = -2371.290 + -5.141 = -2376.431
V = -2227.686 + -5.508 = -2233.195
Verilog = -2211.562 + -5.141 = -2216.703
language: Verilog
$
The heuristic hasn't been able to confirm this file is definitely V, so it has fallen through to the classifier, which has assessed, based on the samples we have for V, Coq and Verilog, that this file looks more like Verilog than any of the others. The only solution here is to use an override or improve the heuristic. |
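To make the final decision in the debug output concrete, here is a minimal Ruby sketch that recomputes the three totals and picks the winner. It assumes (this is not stated in the thread) that the two addends on each line are the summed token score and a per-language prior, and that the highest total wins. The numbers are copied from the `LINGUIST_DEBUG=1` output for `cheaders.v`; the variable names are made up for illustration.

```ruby
# Values copied from the LINGUIST_DEBUG=1 output for cheaders.v.
# Assumption: first number = token score, second = language prior.
scores = {
  "Coq"     => [-2371.290, -5.141],
  "V"       => [-2227.686, -5.508],
  "Verilog" => [-2211.562, -5.141],
}

# Total per language is simply the sum of the two parts.
totals = scores.transform_values { |(tokens, prior)| tokens + prior }

# Assumption: the classifier picks the language with the highest total.
winner, _best = totals.max_by { |_lang, total| total }

totals.each { |lang, total| puts format("%-8s %.3f", lang, total) }
puts "language: #{winner}"
```

Because the printed inputs are already rounded, a recomputed total can differ from the log in the last digit. The ordering is unaffected: Verilog's total is the least negative, which is why the file is labelled Verilog even though V is a close second.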
What the hell. How long has that been there for? What other environment variables or features are available to the command-line? :| Okay, seriously, you need to consider having a man page for |
Not long at all... it's only just over six years since #529 was merged 😉 |
Ah, that's an important detail, thanks :) I'll try it.
I see, thanks. I'll shut up now. :) |
See, this is why I take software documentation seriously. 😜 Had I known which words and tokens the classifier finds most prominent, I would have chosen a more careful mix of samples each time I added support for a new language. Until now, it's always been a case of "find the most diverse-looking sample, check the license, then rerun the classifier, repeating the cycle until everything is 100% accurate." |
Anyway, regarding our current heuristic, we could probably amend it to recognise Which brings me to my next problem: I don't even know how to test Verilog. 😢 Wikipedia's description says that
... which kind of strikes me as something that doesn't have a REPL to "Hello, world" to. 😞 |
Yeah, sometimes 5 minutes spent on documentation can save days or even weeks for other developers. |
A well-written man page can also save googling a project and fishing around for its CLI reference. And no, @medvednikov My offer regarding V man pages still stands. 😉 |
@Alhadis sorry, I missed it. Can't find it here. What's the offer? :) By the way, I created a new V repo (not a fork), it's still detected incorrectly, so it's not about the cache: https://github.com/medvednikov/v3/search?l=coq |
The only thing that is cache related is the number of files in the search results - your new repo has far fewer results. The rest of my previous reply still applies. |
Would you mind if we added
It's over here. We can continue this part of the conversation over there to avoid sidetracking this thread too much. 👍 |
Sure, that's fine. Thanks. |
@Alhadis I'm definitely in favor of a github-linguist manpage. |
Okay, looks like
But adding
👍 So it should be safe to extend V's heuristic to include something like: /^\s*(pub\s+)?(fn)\s+([\w(])/m
@pchaigno Sorry, I missed your last response. I'll see what I can do when I find the time. 👍 |
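As a rough sanity check of the pattern proposed above, the following Ruby sketch runs it against a few tiny snippets. The regular expression is quoted from the comment; the four sample sources are hypothetical, written only for this illustration, and are far too small to stand in for real-world Verilog or Coq files.

```ruby
# Pattern quoted from the comment above.
v_fn = /^\s*(pub\s+)?(fn)\s+([\w(])/m

# Hypothetical snippets, invented for this illustration only.
samples = {
  "hello.v (V)"       => "module main\n\nfn main() {\n\tprintln('hello')\n}\n",
  "method.v (V)"      => "pub fn (s Socket) close() int {\n\treturn 0\n}\n",
  "adder.v (Verilog)" => "module adder(input a, input b, output y);\n  assign y = a ^ b;\nendmodule\n",
  "id.v (Coq)"        => "Definition id (A : Type) (x : A) : A := x.\n",
}

# true when the pattern finds a V-style function declaration
results = samples.transform_values { |src| v_fn.match?(src) }

results.each { |name, hit| puts "#{name}: #{hit ? 'matches' : 'no match'}" }
```

The pattern keys on a line that begins with an optional `pub` followed by `fn`, so in these samples only the two V snippets match. A real evaluation would still need to run against the harvested Verilog and Coq files.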
Add the V programming language to the list (already in Linguist: github-linguist/linguist#4564)
I just realized that none of the .v files have syntax highlighting on GitHub: https://github.com/mvlootman/vbench/blob/master/bench_ips.v Language detection works fine, but the syntax is not highlighted for some reason. |
@medvednikov A lot of languages aren't showing syntax highlighting (both recent and otherwise). This can only be an issue on GitHub's end, so I expect they're working on fixing it. You can confirm the grammar's valid using Lightshow. Note that a direct link to the grammar file is necessary because V isn't showing up in the built-in grammars list. @lildude, this may or may not be related. |
I've taken a look into this, as it appears to only affect languages or grammars added in v7.6.0. Work is underway to improve the syntax highlighting service used by GitHub, and I suspect it has missed the v7.6.0 update. I've opened an issue to bring this to the attention of the team responsible for the improvements. I'll update when I know more or when things are working as expected again. |
Do you think our (admittedly inelegant) transition to GitHub Actions may have botched one of the deployment steps, somewhere? |
Nope. Completely unrelated. |
All sorted. This was indeed the case. |
It only took three years for me to get to this, but here you go: Currently blocked on something stupid: figuring out how to install the damn things as part of the (To everybody else who commented here: apologies for the thread necromancy…) |
Description
Add support for the V programming language.
Checklist:
vlang/v/examples
0x9ef/vscode-vlang
: Preview 1 | Preview 2 | MIT-licensed