OCaml code is reported as standard ML #2208
Comments
Was going to report that as well. My repos are being converted to SML on new pushes. See e.g. https://github.com/dbuenzli/rtime |
Makes good jokes though... https://github.com/ocaml/ocaml |
My bad, it's a side effect of #2087. |
I'm not very knowledgeable in SML but this (a little bit old) page has a few hints. I'd suggest
For reference here is the list of OCaml reserved keywords. @johnwhitington may have a definitive answer. |
Well or in comments, so I would rule out |
Some thoughts. Reserved words in Standard ML:
In OCaml:
So, if we remove words which might be very common value (variable) names, then good positive indicators of OCaml would be
And good positive indicators of Standard ML might be
Unfortunately, many Standard ML programs may not contain any of these. Perhaps the common strongest discriminator of OCaml would be the two-keyword sequence "let rec" appearing in a .ml file. |
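To make the "let rec" idea concrete, here is a minimal sketch in Python. The name `contains_let_rec` is hypothetical and this is not Linguist's actual heuristic; it only checks whether the two-keyword sequence appears anywhere in the file contents. It does not skip comments or string literals, so an SML file that merely mentions "let rec" in a comment would still match.

```python
import re

# Hypothetical helper for illustration only; not part of Linguist.
# Matches the keywords `let` and `rec` separated by any whitespace.
LET_REC = re.compile(r"\blet\s+rec\b")


def contains_let_rec(source: str) -> bool:
    """Return True if the sequence `let rec` appears anywhere in `source`."""
    return LET_REC.search(source) is not None
```

For example, `contains_let_rec("let rec fact n = if n = 0 then 1 else n * fact (n - 1)")` is true, while the SML definition `fun fact 0 = 1 | fact n = n * fact (n - 1)` does not match.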
One other option: use the project context. If the project name contains the string |
The ratio of It might be that the mere presence of |
We could use the presence of |
@pchaigno |
Something like that. I've never written ruby before so can't guarantee anything. |
SML uses "signature" and "structure" where OCaml uses "module type" and "module". This can never be seen in OCaml:
Case expression and anonymous function syntax is distinct:
SML "val foo = ..." binding syntax also cannot occur in OCaml. OCaml "val" keyword is only used in module signatures where it looks like
Since virtually every ML program contains bindings, this probably can be a good indicator. SML let expression syntax is "let ... in ... end" with multiple bindings between "let" and "in", which also never occurs in OCaml. |
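A rough sketch of how these indicators could be combined, written in Python. The function name, the patterns and the simple vote counting are all assumptions made for illustration, not Linguist's rules. It counts line-initial "signature"/"structure" declarations and "val foo =" bindings as SML votes, and "module"/"module type" declarations as OCaml votes. One caveat: OCaml class bodies can contain "val x = ..." instance variables, so the binding pattern is a signal rather than proof. The "let ... in ... end" form is left out because it is hard to match reliably with a regular expression.

```python
import re

# Hypothetical patterns based on the comment above; not Linguist's rules.
SML_PATTERNS = [
    # `signature FOO ...` or `structure Foo ...` at the start of a line
    re.compile(r"^\s*(?:signature|structure)\s+\w+", re.MULTILINE),
    # `val foo = ...` value binding at the start of a line
    re.compile(r"^\s*val\s+\w+\s*=", re.MULTILINE),
]
OCAML_PATTERNS = [
    # `module Foo ...` or `module type FOO ...` at the start of a line
    re.compile(r"^\s*module\s+(?:type\s+)?\w+", re.MULTILINE),
]


def guess_ml_dialect(source: str) -> str:
    """Return "Standard ML", "OCaml", or "unknown" by counting matches."""
    sml_votes = sum(len(p.findall(source)) for p in SML_PATTERNS)
    ocaml_votes = sum(len(p.findall(source)) for p in OCAML_PATTERNS)
    if sml_votes > ocaml_votes:
        return "Standard ML"
    if ocaml_votes > sml_votes:
        return "OCaml"
    return "unknown"
```

A file with no module-level declarations and no "val" bindings falls through to "unknown", so a check like this would still need a fallback.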
Whoops! My bad. I saw an Octocat and just jumped to conclusions. :) |
@johnwhitington or @kayceesrk, do you know if |
I don't know what is standard, but a quick google suggests both mlton and SML/NJ seem to expect .sml: |
I tried to do some quick searching on GitHub (not straightforward given the current state) and found the following Standard ML project - https://github.com/HOL-Theorem-Prover/HOL. @mn200, sorry to tag you in a thread out of the blue but perhaps you can help us clarify whether |
.sml is nearly universal among SML users (and implementations), I've never seen anyone using .ml. You can see a lot of SML code here: http://github.com/standardml Conversely, .ml and .mli are nearly universal among OCaml users. |
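As a sketch of what that convention implies for a default guess, in Python: the table and the function name `default_language` are hypothetical, and the mapping just restates the comment above (.sml for Standard ML, .ml and .mli for OCaml). It is not how Linguist is actually configured.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical defaults restating the convention described above.
EXTENSION_DEFAULTS = {
    ".sml": "Standard ML",
    ".ml": "OCaml",
    ".mli": "OCaml",
}


def default_language(path: str) -> Optional[str]:
    """Return the default language for a path, or None if the extension is unknown."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_DEFAULTS.get(suffix)
```

With this table, `default_language("src/rtime.ml")` gives "OCaml" and `default_language("lib/Stack.sml")` gives "Standard ML"; content-based checks would only be needed to override the default.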
As others have said, |
#2227 is in production now and the results are looking good: |
Wonderful. Thank you, all. I expect it might take a little while to propagate but it's already looking much better: |
Great! |
There are lots of repos that still seem to be wrong. For example, near and dear to my heart: https://github.com/janestreet/core_kernel is supposedly 77.8% SML. When should we expect this rerun to be complete? |
@yminsky it seems that updates are performed when you push to the repo, see https://help.github.com/articles/my-repository-is-marked-as-the-wrong-language/ (though I also still have mislabellings after having done so e.g. https://github.com/dbuenzli/mtime, I don't know if you have to actually touch the files). |
Would the regexps |
Both are possible but unlikely in SML.
is weird looking but valid SML. |
@yminsky - repository stats are only updated when there is a push event. I've manually recalculated the statistics for https://github.com/janestreet/core_kernel and things are looking much better. |
@dbuenzli - any update to any file after a new version of Linguist will completely rebuild the language statistics. |
I submitted #2270 which should at least catch those few OCaml files which use a shebang. |
I guess Another idea: isn't it the case that See e.g. https://github.com/search?q=language%3Asml+module+-extension%3Asig+-extension%3Acache&type=Code |
Here is a list of repositories that are classified by GitHub/Linguist as SML but contain the string "ocaml" in their name, description, or README. Some of these repositories are actually SML (and some of those contain files incorrectly classified as OCaml) but the most popular ones are definitely OCaml.
frenetic-lang/frenetic (Standard ML) [67 stars]
ocaml/ocaml-re (Standard ML) [46 stars]
zoggy/stog (Standard ML) [34 stars]
Cumulus/Cumulus (Standard ML) [27 stars]
rdicosmo/parmap (Standard ML) [26 stars]
nojb/ocaml-imap (Standard ML) [25 stars]
johnwhitington/cpdf-source (Standard ML) [18 stars]
dbuenzli/tgls (Standard ML) [17 stars]
dbuenzli/tsdl (Standard ML) [16 stars]
akabe/slap (Standard ML) [16 stars]
mackwic/To.ml (Standard ML) [14 stars]
mjambon/biniou (Standard ML) [14 stars]
ocaml/opam2web (Standard ML) [13 stars]
c-cube/cconv (Standard ML) [12 stars]
axiles/ocaml-efl (Standard ML) [11 stars]
pyrocat101/opal (Standard ML) [11 stars]
mirage/ocaml-crunch (Standard ML) [10 stars]
modlfo/firmata (Standard ML) [10 stars]
mirage/ocaml-fat (Standard ML) [8 stars]
tel/ocaml-cats (Standard ML) [8 stars]
coccinelle/herodotos (Standard ML) [8 stars]
mirage/ocaml-pcap (Standard ML) [6 stars]
nojb/ocaml-gsasl (Standard ML) [6 stars]
arlencox/mlbdd (Standard ML) [6 stars]
RobertHarper/TILT-Compiler (Standard ML) [6 stars]
hcarty/ocaml-gdal (Standard ML) [5 stars]
infidel/ocaml-mdns (Standard ML) [5 stars]
tokenrove/shred-for-satan (Standard ML) [5 stars]
ahrefs/ocaml-qfs (Standard ML) [4 stars]
jonsterling/ocaml-modular-typechecking (Standard ML) [4 stars]
rgrinberg/stringext (Standard ML) [4 stars]
mirage/mirage-net-unix (Standard ML) [4 stars]
tobiasBora/phluor_tools (Standard ML) [4 stars]
mirage/mirage-net-xen (Standard ML) [4 stars]
mirage/mirage-console (Standard ML) [4 stars]
mirage/mirage-block-xen (Standard ML) [4 stars]
OCamlPro/operf-macro (Standard ML) [4 stars]
jhckragh/SMLDoc (Standard ML) [4 stars]
avsm/ocaml-dockerfile (Standard ML) [3 stars]
struktured/ocaml-prob-cache (Standard ML) [3 stars]
lpw25/compiler_eq (Standard ML) [3 stars]
aluuu/frmttr (Standard ML) [3 stars]
scvalex/Super-Max (Standard ML) [3 stars]
tokenrove/zookicker (Standard ML) [3 stars]
mietek/et-language (Standard ML) [3 stars]
yanguango/visual_sort (Standard ML) [2 stars]
linerlock/featherweight-java (Standard ML) [2 stars]
samoht/mirmin (Standard ML) [2 stars]
whitequark/ocamlnet (Standard ML) [1 stars]
OCamlPro/ocaml-benchs (Standard ML) [1 stars]
bkc39/ocaml-prelude (Standard ML) [1 stars]
tel/ocaml-collage (Standard ML) [1 stars]
tel/ocaml-abt (Standard ML) [1 stars]
choeger/modelica.ml (Standard ML) [1 stars]
stephlm2dev/SchmilkaHashCode (Standard ML) [1 stars]
smondet/locoseq (Standard ML) [1 stars]
melsman/sml-llvm (Standard ML) [1 stars]
gameboy1024/minijavac (Standard ML) [1 stars]
massimo-nocentini/theory-of-programming-languages (Standard ML) [1 stars]
simonegasperoni/funzionale (Standard ML) [0 stars]
thomas-huet/coop-ocaml (Standard ML) [0 stars]
MFreidank/ocaml_exercising (Standard ML) [0 stars]
Lokibes/obelisk-ocaml (Standard ML) [0 stars]
zakhar/ocaml-onnt (Standard ML) [0 stars]
taquangtrung/ocaml-tools (Standard ML) [0 stars]
i-am-jd/ocaml-onnt (Standard ML) [0 stars]
suisse91/ocaml_mylist (Standard ML) [0 stars]
jrrk/ocaml-for-ios (Standard ML) [0 stars]
zoggy/ocamldoc-generators (Standard ML) [0 stars]
domsj/orocksdb (Standard ML) [0 stars]
HerbertJordan/otest (Standard ML) [0 stars]
fetburner/OFold (Standard ML) [0 stars]
fetburner/OCat (Standard ML) [0 stars]
fetburner/owc (Standard ML) [0 stars]
SusanHuang/MinimalistGrammarWithCoordination (Standard ML) [0 stars]
art1pirat/img_pipieline (Standard ML) [0 stars]
rcefala/pascaml (Standard ML) [0 stars]
fpottier/pprint (Standard ML) [0 stars]
thomas-huet/lwt-pgocaml (Standard ML) [0 stars]
coutar-a/My_list (Standard ML) [0 stars]
sfritz/a-song-of-ones-and-zeros (Standard ML) [0 stars]
antoyo/tq (Standard ML) [0 stars]
juster/ffp (Standard ML) [0 stars]
mohamedaf/Projet1-CompilationAvancee (Standard ML) [0 stars]
IzzyRahaman/99MLProblems (Standard ML) [0 stars]
daherb/Kreis-Kugel (Standard ML) [0 stars]
cicku/stcntroll (Standard ML) [0 stars]
pgalland/ProgProblems (Standard ML) [0 stars]
remyzorg/ppx_comprehension (Standard ML) [0 stars]
iraikov/pprint (Standard ML) [0 stars]
iraikov/mpi-mlton (Standard ML) [0 stars]
BernardBeefheart/ml-games (Standard ML) [0 stars]
spacemanaki/lexluthor (Standard ML) [0 stars]
Alexis211/SystemeReseaux-Projet (Standard ML) [0 stars]
gfxmonk/passe (Standard ML) [0 stars]
khuumi/SNL (Standard ML) [0 stars]
bdkoepke/pfds (Standard ML) [0 stars]
velour/caml-spt (Standard ML) [0 stars] |
It would appear that the search index has a cache that is/was out-of-date. Only some of the repositories I just reported are now misclassified. Sorry for the confusion. |
Yup - I'm going through these manually now re-indexing them. On 8 April 2015 at 10:36, David Sheets notifications@github.com wrote:
|
In case you are interested, some more examples of (recently pushed to) repositories with files misidentified as SML: |
Everything was working fine until a few days ago: all my new projects are now being reported as written in Standard ML instead of OCaml. See https://github.com/samoht/ocaml-huffman-code.