.inc file extension #2180
SourcePawn also uses
They're most likely to be in
@xPaw How common would you say the
They're very common, but not guaranteed to be in every file. Maybe @dvander can say more.
By the way, the number of
Let's not forget the samples in #1268
One of "native", "forward", "stock", or "public" would be in almost any SourcePawn .inc file.
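The keyword observation above can be sketched as a simple check. This is only an illustration in Python, not the rule the project uses: the function name `looks_like_sourcepawn` and the choice to match keywords at the start of a line are assumptions, and since "public" also opens lines in other languages, the check on its own cannot separate SourcePawn from, say, C++ or PHP.

```python
import re

# The four keywords named in the comment above. Treating their presence
# as a SourcePawn signal is an assumption of this sketch.
SOURCEPAWN_KEYWORDS = ("native", "forward", "stock", "public")

# Match a keyword as the first token on a line (after optional whitespace),
# which is where a declaration would normally begin.
_DECLARATION = re.compile(
    r"^\s*(?:" + "|".join(SOURCEPAWN_KEYWORDS) + r")\b",
    re.MULTILINE,
)


def looks_like_sourcepawn(source: str) -> bool:
    """Return True if any line starts with one of the four keywords."""
    return _DECLARATION.search(source) is not None
```

For example, `looks_like_sourcepawn("native int GetClientCount();")` is true, while a file containing only assembly instructions is rejected.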
This search will find more assembly files:
@larsbrinkhoff Thanks!
So, I finally turned this into a pull request and here are the results:
It looks to me like heuristic rules won't be needed for
@arfon This is ready for review ;)
Wow, I'm impressed the results are that accurate. I'm 👍. This actually got me thinking that it would be interesting to add an extended test suite that runs against these other corpuses and ensures they match a certain threshold.
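A minimal sketch of what such a threshold check might look like, in Python. The function names, the label lists, and the 0.9 default are all hypothetical; nothing here reflects an existing test suite in the project.

```python
def accuracy(predicted, expected):
    """Fraction of files whose predicted language matches the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty corpus")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


def meets_threshold(predicted, expected, threshold=0.9):
    """Return True if accuracy on the corpus is at least `threshold`.

    The 0.9 default is an arbitrary placeholder, not a project setting.
    """
    return accuracy(predicted, expected) >= threshold
```

A test over an external corpus would then classify each file, collect the predicted labels, and assert `meets_threshold(predicted, expected)`.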
Yep, I am too. I added more samples than usual, so that's one way to explain it, but I'm afraid we could be overfitting. That said, only C++/SourcePawn/Pascal really needed more samples. C++ vs. SourcePawn needed more samples because only a handful of keywords are different. If there's overfitting somewhere, I'd expect it to be for Pascal...
Sounds like a good idea to me. That would be in a separate PR, right? Should we open an issue to track progress on that idea?
👍
👍 thanks @pchaigno.
Please add Makefile to the set of languages which can have a .inc suffix. For that matter, a file named "Makefile.inc" should probably be automatically classified as "Makefile" without even considering other options. Right now GitHub is telling me that Tarsnap/spiped's Makefile.inc is a SourcePawn file! https://github.com/Tarsnap/spiped/search?l=sourcepawn
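The request above amounts to checking the exact filename before any extension-based or content-based classification. A rough Python sketch of that ordering follows; the `FILENAME_OVERRIDES` table, the `classify` function, and the `classify_by_content` callback are hypothetical names used for illustration, not the project's actual API.

```python
import posixpath

# Hypothetical table of exact filenames that decide the language outright.
FILENAME_OVERRIDES = {"Makefile.inc": "Makefile"}


def classify(path, classify_by_content):
    """Classify a repository path, letting an exact filename match win.

    `classify_by_content` is a stand-in for whatever extension or content
    based classifier would otherwise run; it is only called as a fallback.
    """
    name = posixpath.basename(path)
    if name in FILENAME_OVERRIDES:
        return FILENAME_OVERRIDES[name]
    return classify_by_content(path)
```

With this ordering, a `Makefile.inc` is reported as Makefile even when the fallback classifier would have answered SourcePawn, while other .inc files still go through the fallback.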
There are more than 10 million .inc files on GitHub, and the extension is used by many languages. Thus, to add that extension, we need to identify all languages using it (with at least hundreds of examples).
I will update this post with the list of languages as we go. I might eventually turn this issue into a pull request to add support for the extension.
* I checked these by downloading 1,000 samples from the search results and using simple heuristics to triage them.