False Virus Total Flags [Changed title to reflect current status] #342

trholding · 2021-12-29T18:14:06Z

Hi Justine

Awesome work! Love it.

But I have a concern. I uploaded the compiled hello world binary to Virus Total. It was flagged, apparently because the binary pings an IP.

This happens only to freshly compiled binaries.

For binaries that have been executed on Linux once and then uploaded to Virus Total, there are no flags.

Apparently, the binary calls home to Microsoft connection test servers and other IPs when run on Windows:

| Domain | Detections | Created | Registrar |
|---|---|---|---|
| img-prod-cms-rt-microsoft-com.akamaized.net | 0/90 | 2014-03-18 | Akamai Technologies, Inc. |
| www.msftconnecttest.com | 0/90 | 2014-04-04 | NOM-IQ Ltd dba Com Laude |

| IP | Detections | Autonomous System | Country |
|---|---|---|---|
| 23.215.176.152 | 0/90 | 20940 | US |
| 95.101.28.33 | 0/90 | 20940 | GB |
| 95.101.28.59 | 0/90 | 20940 | GB |
| 13.107.4.52 | 1/90 | 8068 | US |

The IP 13.107.4.52 is a flagged and suspicious one. It is hosted at Azure.

I have yet to see how Virus Total parses this after I run the binaries on various other OSes and scan.

So far:

Linux: Works well, no flags in Virus Total. But I am checking network usage / wireshark to be sure.
Windows: Works well. Virus total flags it - On windows, the binaries check for network.
Mac, BSD variants, to be tested soon.

Interesting: The zipped version of a fresh binary does not raise a flag on Virus Total.

Hoping that you'll analyze what is happening on windows.

Regards

Vulcan

jart · 2021-12-30T06:59:01Z

Not possible.

Something somewhere else must be DLL hijacking them or something.

I had this issue recently with LISP.COM which VirusTotal said was contacting MSN, Akamai, and launching all these processes. But it's a programming language from 1961. It can't even call malloc() let alone Berkeley sockets. https://www.virustotal.com/gui/file/869abd2ebd9a31257b768398ac340967f888911dfa9152583fad79575ca11411/details Most of the false flags cleared up when I submitted an FP case with Microsoft 62b62f79-fc7f-4523-a2e5-1f94162462c7

If anyone can help us gain more insight into whatever screwy thing is going on, let us know. For example, maybe VirusTotal changed their service to launch from a downloads folder with hostile DLLs in it, because maybe they expect everyone to use some particular hardening measure.

> For binaries that have been executed on Linux once and then uploaded to virus total, there are no flags.

That's likely because they no longer have PE morphology. APE binaries localize as ELF by default on Linux. It happens only once after the first run. That way the startup latency is reduced from 500µs to 50µs.
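To see which form a given file currently has, one can look at its leading magic bytes. The following is a minimal sketch, assuming a freshly built APE binary begins with the `MZ` magic shared with PE/DOS executables and that a localized one begins with the ELF magic; the function names are hypothetical and not part of any Cosmopolitan tooling.

```python
def classify_executable(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header.startswith(b"\x7fELF"):
        # Standard ELF magic: 0x7f followed by "ELF"
        return "ELF"
    if header.startswith(b"MZ"):
        # DOS/PE magic; assumed here to also be how a fresh APE binary starts
        return "PE/DOS (MZ)"
    return "unknown"


def classify_file(path: str) -> str:
    """Read only the first four bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_executable(f.read(4))
```

Calling `classify_file()` on the same binary before and after its first run on Linux would be one way to check whether the PE-style header is still present when the file is uploaded.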

trholding · 2021-12-30T07:25:33Z

Thanks Justine for clearing it up

If it helps, here is the virus total link:

https://www.virustotal.com/gui/file/297b8c138932dd9added7cb5390d6a5b6953829cd0cb8f73135c1160f111a051/detection

What I figured out after correlating your Virus Total link and mine is that SecureAge APEX and Gridinsoft flag the binaries based on flagged IPs that they have contacted.

And since there is no way that these binaries actually contact anything, there must be an issue at Virus Total or with an OS image that they are using. It could be that MS Windows phones home each time something is executed.

I'll check this by uploading some windows and dos exes to virus total today.

trholding · 2022-01-04T13:01:18Z

Here is a reply I got from Virus Total

Hello XXXXXX,

Thanks for the details.

SecureAge APEX does not use our sandbox data to generate their reports, they have their own in house detection mechanisms.

Regarding the connections that you see, there might be multiple reasons. This could be connections made natively by the operating system. In this case it's not only VirusTotal Zenbox which sees these IPs. Microsoft Sysinternals also shows IPs and this is a third party tool which is not related to VirusTotal. You would have to contact them as well to ask about this.

Please note that VirusTotal only aggregates information, which could be more or less precise. End users should not take it as a source of truth. This is information that should be analyzed and filtered based on each user's needs.

Nevertheless, thanks for your feedback. I will send it to our dev team in case we can improve how we show this information to make it more user friendly.

Best regards,
CXXXX BXXXX - VirusTotal - www.virustotal.com

I am leaving it here and will not pursue contacting SecureAge APEX and Microsoft Sysinternals. Maybe future people who are worried about this could pursue getting the misleading flags removed.

For any future people who may have concerns as to why there is a flag: it's not the build tools or binaries that call home, it's the way proprietary OSes and tools from certain vendors behave. Also, only 1 engine out of several dozen flagged this, so it can be safely ignored.

Closing this non-issue in a few hours.

jart · 2022-01-04T16:42:32Z

I remember noticing earlier that some of the IPs I saw were operated by MSN. Does this mean that Microsoft is now monitoring the executables we run on Windows? Similar to how Macintosh recently changed their policy to upload a hash of all the binaries you run to Apple's servers?

trholding · 2022-01-06T13:26:45Z

I tested all executables in a Windows 11 Virtualbox instance with host only networking. On the host I had wireshark running and I did not see any pings or traffic when running the com exes.

Future people who encounter false positive issues could make use of the following forms to get the false positives removed:

AV

AV/Sandbox?

https://www.microsoft.com/en-us/wdsi/filesubmission

Sandbox

https://github.com/kevoreilly/CAPEv2/issues

TODO: ~~Remove all content on this response after 48 hrs. Keep only the FP form links for info.~~ DONE

gnu-enjoyer · 2022-01-07T08:43:36Z

> I remember noticing earlier that some of the IPs I saw were operated by MSN. Does this mean that Microsoft is now monitoring the executables we run on Windows? Similar to how Macintosh recently changed their policy to upload a hash of all the binaries you run to Apple's servers?

They've done this for a while; it's called "SmartScreen" and is triggered by running unsigned binaries.

trholding · 2022-01-08T17:43:31Z

TL;DR Some Useful Stuff / Skip to Possible Solution

Final Check

I anticipate that cosmo libc / ape would become a UPX alternative and popular method to distribute cross-platform binaries on x86_64 in the future.

For that reason, I investigated to find the specific reasons as to why cosmo binaries are being flagged and what could be done to remediate this.

I checked the Virus Total flagged redbean and the hello world equivalent binary on Intezer - a malware analysis cloud service. Contrary to Virus Total's report of network access, as expected there was no network access:

https://analyze.intezer.com/analyses/b1122329-c81a-4cda-be1b-72ec8cbefdcb/behavior

https://analyze.intezer.com/analyses/6d4a0daa-3c9a-4ea4-a44b-7ae600bd6c33/behavior

(Taking a hint from @gnu-enjoyer, I replied to the email from Virus Total. Pointing out the Intezer analysis results, I asked for clarification on whether their Zenbox and Microsoft Sysinternals sandboxes could be doing the SmartScreen thing in some way and logging spurious pings.)

I wanted to further test those binaries on Joe's Sandbox (https://www.joesandbox.com/#windows) on multiple OS versions (Mac, Win, etc.) but haven't been approved yet.

Findings

The whole AV industry uses some form of rule files to identify Malware or they consider unsigned binaries as malicious by default. Binaries that pass rules are executed and observed based on further such rules that are applied to memory and disk patterns, access to network, resources, etc. Sometimes the rule files are applied to the download URL itself!

Yara rule files seem to be a standard. Rule files are generated or hand-coded, and they contain match rules for strings found in the binaries, minus the matches from a corpus of strings from known-good binaries. The number of matches and simple conditions determine the classification.

AI-based AVs create AV rules on the fly based on machine learning of existing positive rulesets. Since the corpus of positive rulesets is much larger than that of false-positive rulesets, AVs based on AI have a much greater false-positive rate. This comes at the cost of developer reputation and, in some cases such as indie games, lost revenue.
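The "strings minus a goodware corpus" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the minimum string length, and the sample bytes are all hypothetical, and real generators such as yarGen use far larger goodware databases plus scoring on top.

```python
import re
from typing import List, Set


def extract_strings(data: bytes, min_len: int = 6) -> Set[str]:
    """Return printable ASCII runs of at least min_len bytes, similar to `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {m.group().decode("ascii") for m in pattern.finditer(data)}


def candidate_rule_strings(sample: bytes, goodware_corpus: Set[str]) -> List[str]:
    """Keep only strings that do NOT occur in the known-good corpus."""
    return sorted(extract_strings(sample) - goodware_corpus)


# Hypothetical sample bytes and a tiny stand-in for a goodware string corpus
sample = b"\x00\x01printf failed\x00oldskool\x00\xffMapViewOfFileExNuma\x00"
goodware = {"printf failed"}

print(candidate_rule_strings(sample, goodware))
# prints ['MapViewOfFileExNuma', 'oldskool']
```

Whatever survives the subtraction becomes a candidate match string, which is why a small or skewed goodware corpus leaves ordinary strings in the rule.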

Example of a Yara rule file: https://github.com/Yara-Rules/rules/blob/master/malware/000_common_rules.yar

Yara: https://virustotal.github.io/yara/

So the AV tools are sort of glorified search with fancy UIs that make use of dumb rules and certificate-based blocking! A $4B industry based on dumb rules! It should be noted that there are huge repos for positive rule files and none for false positives (which is not the way it should be). False positives are handled manually and it is the burden of the developer to reach out to the AV companies to get it sorted out. (When the AV industry grows to $10B, perhaps by that time all software would be flagged and the industry would self destruct.)

Forcing signed binaries and the AV industry not sharing false positives in repos like they do for signed binaries and positives is a form of punishment to the developer. Moreover, code signing can cost money.

What are these dumb rulesets?

If a few strings such as "ssl_decompress_buf", "Allocating compression buffer", "Sec-GPC", "oldskool", "fumbles", "OPENBSD", "verifies", "caused", "6789:lmnop" are matched and vaguely follow a pattern found in binaries that were considered malicious, then the binary is flagged as malicious with a Malware family name and corresponding severity! (Imagine how flawed their corpus and feedback loop are... AV tools seem like quackery but surprisingly do their job of blocking too well! Maybe more flags mean more business, plus ultimate customer satisfaction and a feeling of safety...)
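To make the failure mode concrete, here is a toy matcher, not how any particular AV engine works. It uses a few of the strings quoted above; the threshold of 3 and the harmless example input are assumptions made up for illustration.

```python
from typing import List, Sequence, Tuple

# A subset of the strings quoted above; the threshold below is hypothetical
SUSPICIOUS: Sequence[bytes] = (
    b"ssl_decompress_buf",
    b"Sec-GPC",
    b"oldskool",
    b"fumbles",
    b"OPENBSD",
)


def match_rule(data: bytes,
               needles: Sequence[bytes] = SUSPICIOUS,
               threshold: int = 3) -> Tuple[bool, List[str]]:
    """Flag data when at least `threshold` of the needle strings occur in it."""
    hits = [n.decode("ascii") for n in needles if n in data]
    return len(hits) >= threshold, hits


# A harmless blob that merely mentions an HTTP header, an OS name and a mode name
benign = b"Sec-GPC: 1\r\nrunning on OPENBSD in oldskool mode"
print(match_rule(benign))
# prints (True, ['Sec-GPC', 'oldskool', 'OPENBSD'])
```

The input does nothing malicious, yet it crosses the threshold purely because short, common substrings happen to co-occur, which is the kind of false positive described here.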

Conclusion

The redbean binary contained 11, 25, and 46 strings common to Malware, Packers, and Admin Tools respectively, which is shocking but judging the quality of the strings, the rules seem silly... See here:

https://analyze.intezer.com/analyses/b1122329-c81a-4cda-be1b-72ec8cbefdcb/sub/883238aa-b803-4676-8bd4-28066ec3c014/string-reuse

Similarly, for the other binary, 2 strings were common to Packers and 1 string to Admin Tools:

https://analyze.intezer.com/analyses/6d4a0daa-3c9a-4ea4-a44b-7ae600bd6c33/sub/44962a9f-2a1f-4b59-be1e-632ea27d9005/string-reuse

These are the offending strings: "CreateFileMappingNumaW", "MapViewOfFileExNuma", "oldskool".

Apparently "oldskool" is related to a remoting tool made by Devolutions inc...

A quick search on GitHub https://github.com/jart/cosmopolitan/search?q=oldskool shows that "oldskool" is a string found in the bootloader.

Some more interesting insight here: https://analyze.intezer.com/analyses/6d4a0daa-3c9a-4ea4-a44b-7ae600bd6c33/sub/44962a9f-2a1f-4b59-be1e-632ea27d9005/ttps which were triggered by the string Qemu and some x86 assembly in the ape bootloader...

Statically compiled binaries would contain many strings like that introduced by libraries, which is expected.

The AV industry's rules are defective because they target very short, broad, general sequences that are found very commonly in both goodware and malware.

Possible Solution

Wherever possible rename / reword string constants, sequences found in Malware, Admin, and Packer family classes to reduce false positive matches. (Maybe rename "oldskool", "fumble", and others)
Wherever possible do the same for libs used. (SSL, zlib related strings, etc)
Generate Yara rule files for each binary release and place them in a "Yara false positives" directory in the git repo for reference (and so that AVs pick it up in the future).
Upload binaries to virus total and note the filenames and hashes / Virus Total URL in a text file called false positives.
Pack them along with the binaries in a zip file and send them to false-positive email addresses of various AV companies.
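The bookkeeping step (filenames, hashes, Virus Total URLs) could be scripted roughly as follows. This is a sketch under assumptions: the URL pattern is copied from the report links earlier in this thread and only resolves once the file has actually been uploaded, and the function names and the tab-separated layout are hypothetical.

```python
import hashlib
from pathlib import Path

# URL shape taken from the Virus Total links posted earlier in this thread
VT_URL = "https://www.virustotal.com/gui/file/{sha256}/detection"


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large binaries are not loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(directory: str) -> str:
    """Return one 'name<TAB>sha256<TAB>url' line per regular file in directory."""
    lines = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digest = sha256_of(path)
            lines.append(f"{path.name}\t{digest}\t{VT_URL.format(sha256=digest)}")
    return "\n".join(lines) + "\n"
```

For example, `Path("false_positives.txt").write_text(build_manifest("FP"))` would write the list for the `FP` directory used in the yarGen commands further down; the output filename is just a placeholder.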

Here is a repo that has a big but incomplete list of AV company contact forms and emails: https://github.com/yaronelh/False-Positive-Center

Here is a generated Yara rule file for all the relevant binaries from redbean.dev and the cosmopolitan website:

https://gist.github.com/trholding/df5ee7ed50d9c47c0e98e6d996f3ba0d

This is how I generated the rules:
Uses: https://github.com/Neo23x0/yarGen

wget https://github.com/Neo23x0/yarGen/archive/refs/tags/0.23.4.tar.gz
tar -xvf 0.23.4.tar.gz 
cd yarGen-0.23.4/
pip install -r requirements.txt
python yarGen.py --update
mkdir FP
# copy all release binaries into the FP dir
python yarGen.py -a "For https://github.com/jart/cosmopolitan" -r "Yara rules for false positives" -m FP/
mv yargen_rules.yar cosmopolitan_virustotal_false_positve_fix_rules.yar

Since we have used an automated tool, it creates rather narrow rules (vs the broad handcrafted ones) which serve our purpose to be very narrow/specific.

All the above could be automated for each build and also used as a template by projects that use cosmopolitan libc once cosmo becomes very popular.

Signing

One could enable code signing, but it has to be researched if it would at all be compatible with ape header/format. Even if it would be, there is a chicken and egg problem: https://en.wikipedia.org/wiki/Microsoft_SmartScreen#Criticism

Links to Code Signing on Windows

https://stackoverflow.com/questions/84847/how-do-i-create-a-self-signed-certificate-for-code-signing-on-windows?noredirect=1&lq=1

https://docs.microsoft.com/en-us/windows/win32/seccrypto/cryptography-tools

Tool for Code Signing on Mac

https://github.com/mitchellh/gon

TODO: Future/Create a false positives spec, repo for false-positive rules and an automated service with a web interface to check for false positives, to help opensource projects (Social Cause / free) as well as legit commercial software (Business Case / paid) to get unflagged, commercial access to AV companies (Pareto Optimal / B2B / paid). 100% of proceeds (after deduction of running costs) to be distributed to support extremely cutting-edge opensource projects.

On another note, sometimes I wonder if an Industry that solely depends on the existence of malware could also be a hidden source of malware and the malware scare. If that is the case, there must be a way to disincentivize it and balance the forces, reverse grade AV tools, point out the false positives, make them write better rules and tools, give them a run for their money, maybe even create a modern free cross-platform alternative to ClamAV - making use of advanced algos, better AI, with training obtained from static analysis and dynamic analysis, virtual software and hardware-based looking glass probes (debug bridge / JTAG / bus pirate like) that monitor memory, IO and execution state, execution of millions of malware samples, and samples from GANs that generate both short positive and false-positive malware samples.

TODO: Remove all content on this response after 48 hrs. Keep only the Possible Solutions and relevant links.

jart · 2022-01-11T07:21:34Z

Thank you for the detailed analysis.

> Wherever possible rename / reword string constants, sequences found in Malware, Admin, and Packer family classes to reduce false positive matches. (Maybe rename "oldskool", "fumble",

That's easily actionable and is something we can do. Many of the other proposals are things where we're all going to need to chip in a little bit. I naturally do everything within my power to get my own release binaries unflagged. If APE users use this information to do the same, then it'll help the antivirus makers improve, even if that's just registering with VirusTotal and upvoting your work so that other people can see that it came from you. Ultimately these things boil down to reputation.

trholding changed the title ~~Possible Security / Privacy issue. Binaries created ping a server.~~ False Virus Total Flags [Changed title to reflect current status] Dec 30, 2021

trholding closed this as completed Jan 4, 2022

bloc97 mentioned this issue Jan 15, 2022

Virus Report? Blinue/Magpie#276

Closed

trholding mentioned this issue Aug 26, 2023

Microsoft Defender warning latest build trholding/llama2.c#8

Closed
