Incorrect code detection #334

Closed
whizzzkid opened this issue Dec 14, 2013 · 6 comments

Comments

@whizzzkid

I was wondering whether this awesome library could be used to detect the language of code snippets. However, I think there are errors even with the test cases provided. I tried many of the code snippets shown at http://highlightjs.org/static/test.html; let's look at the Ruby code. I added the following code to highlight.pack.js:

var code='class A < B; def self.create(object = User) object end end' +
'class Zebra; def inspect; "X#{2 + self.object_id}" end end' +
'' +
'module ABC::DEF' +
'  include Comparable' +
'' +
'  # @param test' +
'  # @return [String] nothing' +
'  def foo(test)' +
'    Thread.new do |blockvar|' +
'      ABC::DEF.reverse(:a_symbol, :\'a symbol\', :<=>, \'test\' + ?\012)' +
'      answer = valid?4 && valid?CONST && ?A && ?A.ord' +
'    end.join' +
'  end' +
'' +
'  def [](index) self[index] end' +
'  def ==(other) other == self end' +
'end' +
'' +
'anIdentifier = an_identifier' +
'Constant = 1' +
'render action: :new' +
'' +
'str =~ /^(?:foo)$/' +
'str =~ %r{foo|bar|buz$}' +
'str =~ %r!foo|bar$!' +
'str =~ %r[foo|bar$]' +
'str =~ %r(\(foo|bar\)$)';
console.log(hljs.highlightAuto(code).language);
console.log(hljs.highlightAuto(code).second_best.language);

On executing:

node highlight.pack.js

I was expecting the language to be ruby, but I got the output:

livecodeserver
ruby

For the Java test case I get:

coffeescript
java

These were cases from the test page. I also tried a random snippet (for example, http://attachment.fbsbx.com/hackercup_source.php?sid=1402033520037632) and got:

d
ocaml

I may be using it incorrectly; if so, please correct me. Otherwise, does this qualify as a bug?
Thanks!

@isagalaev
Member

You're concatenating all the lines without line breaks between them, so the code becomes completely mangled as a result.
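
A minimal sketch of that fix, using a shortened version of the Ruby sample: keep each source line as a separate array element and join them with "\n", so the detector sees real line breaks. The name `rubyLines` is hypothetical, and the `require('highlight.js')` call assumes the library is installed as an npm package rather than run from the packed file as in the report; the detection step is skipped if it is not available.

```javascript
// Each element is one line of the Ruby sample (shortened from the report).
const rubyLines = [
  'class A < B; def self.create(object = User) object end end',
  'class Zebra; def inspect; "X#{2 + self.object_id}" end end',
  '',
  'module ABC::DEF',
  '  include Comparable',
  'end'
];

// Join with explicit newlines instead of concatenating with '+'.
const code = rubyLines.join('\n');

// Assumption: highlight.js is installed as an npm package.
// If it is not, the detection step is simply skipped.
let hljs = null;
if (typeof require === 'function') {
  try {
    hljs = require('highlight.js');
  } catch (e) {
    hljs = null;
  }
}

if (hljs) {
  // highlightAuto returns an object whose `language` is the best guess.
  console.log(hljs.highlightAuto(code).language);
}
```

With the line breaks preserved, statement boundaries and comments survive, which is what the grammars rely on.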

@whizzzkid
Author

You're awesome! It worked. I built http://code.nishantarora.in/langdetect.js/ on top of this, and now I can detect the languages of the multiple files I needed to profile!

Thanks!

@whizzzkid
Author

closing this!

@isagalaev
Member

I should note that using highlight.js specifically for detecting languages might not be a good idea, as our goal is to produce useful highlighting, not to detect a language. That is, we don't consider it a bug if a snippet's language is detected incorrectly but the resulting highlighting looks okay.

@whizzzkid
Author

Makes sense, but I just needed a crude mechanism to profile thousands of source code files (without file extensions or meta comments). I think I can live with a certain percentage of errors. Even though, as you say, it might not detect the language correctly, for now this is the closest thing I have. Thanks!
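
A rough sketch of what such crude profiling could look like, tolerating some wrong guesses. Everything here is hypothetical: `tallyLanguages`, `fakeDetect`, and the `minRelevance` cutoff are illustrative names, not part of highlight.js. The only assumption about the library is that `hljs.highlightAuto` returns an object with `language` and `relevance` fields, so it could be passed in place of the stand-in detector.

```javascript
// Hypothetical helper: count detected languages across many snippets.
// `detect` is any function returning { language, relevance }.
function tallyLanguages(snippets, detect, minRelevance) {
  const counts = {};
  for (const source of snippets) {
    const result = detect(source);
    // Missing or low-relevance guesses are grouped as "unknown".
    const label = result.language && result.relevance >= minRelevance
      ? result.language
      : 'unknown';
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}

// Stand-in detector for illustration only; a real run would pass
// something like (source) => hljs.highlightAuto(source) instead.
function fakeDetect(source) {
  if (source.includes('def ')) return { language: 'ruby', relevance: 10 };
  if (source.includes('public class')) return { language: 'java', relevance: 10 };
  return { language: undefined, relevance: 0 };
}

const counts = tallyLanguages(
  ['def foo\nend', 'public class A {}', '???'],
  fakeDetect,
  5
);
console.log(counts);
```

Grouping low-relevance results separately is one way to keep the weakest guesses out of the totals; the cutoff value would need tuning against files whose language is already known.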

@isagalaev
Member

Glad it worked for you :-)
