<a href="https://colab.research.google.com/github/ryanoleary26/VirusTotal-Payload-Scanner/blob/master/VirusTotal_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analysing Honeypot Payloads with the VirusTotal API

Any files that are given to this script in a .ZIP file will be individually sent to VirusTotal for scanning using [their v2 API](https://developers.virustotal.com/reference).

# Features



1. Displays scan reports from over [70 different Antivirus products](https://support.virustotal.com/hc/en-us/articles/115002146809-Contributors).
2. Creates a file extension summary graph of all payloads scanned.
3. Provides a URL to each individual scan performed.
4. Provides statistics on total scans performed and total positive results.
5. Ideal for analysis of malicious files collected through honeypots.
6. Detects when the API usage quota has been used up.
7. Can be customised to only display results from specified Antivirus engines.
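
The quota detection listed above relies on the HTTP 204 status code that the script checks for. The script itself stops scanning when it sees that code. As a rough sketch of an alternative, assuming you would rather wait and retry, a small helper could wrap the request. `fetch_report`, `max_retries` and `wait_seconds` are hypothetical names that do not appear in the script, and the `get` argument is assumed to behave like `requests.get`.

```python
import time


def fetch_report(get, url, params, max_retries=3, wait_seconds=15):
    """Return the parsed JSON report, or None if the quota never frees up.

    `get` is any callable with the same interface as requests.get
    (hypothetical parameter, used here so the helper can be tried offline).
    """
    for attempt in range(max_retries):
        response = get(url, params=params)
        # 204 is the status the script treats as "API usage quota used up"
        if response.status_code != 204:
            return response.json()
        # Wait before the next attempt; skip the wait after the last one
        if attempt < max_retries - 1:
            time.sleep(wait_seconds)
    return None
```

With the real library this would be called as `fetch_report(requests.get, url, params)`, using the same `url` and `params` values that the script builds for each payload.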


# Prerequisites

1. Python 3+
2. [Requests](https://requests.readthedocs.io/en/master/) module
3. [Matplotlib](https://matplotlib.org/) module
4. [OrderedDict](https://docs.python.org/3/library/collections.html#collections.OrderedDict) from the [collections](https://docs.python.org/3/library/collections.html) module
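
Before running the script, it may help to confirm that the third-party modules above are installed. This is a minimal sketch using the standard library's `importlib.util.find_spec`; `missing_modules` is a hypothetical helper name and is not part of the script.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party modules listed above; collections ships with Python itself
print(missing_modules(['requests', 'matplotlib']))
```

An empty list means both modules were found. Any name that is printed still needs to be installed.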

# How to use

1. To use this script for yourself you will need to obtain a free public API key by [creating an account on the VirusTotal website](https://www.virustotal.com/gui/join-us).
2. Enter your API key on line 10, see below:

  ```
  #Enter API key here ########
  api_key = '<API KEY HERE>' #
  ############################
  ```
3. Enter the file path of the ZIP file you wish to use on line 18, see below. Note that files contained within sub-directories of the ZIP file are currently not scanned. To get around this, separate groups of files into separate ZIP files until I implement this feature.
```
  #Enter the location of your .ZIP file here #
  filePath = '<FILE PATH HERE>'              #
  ############################################
```
4. Line 14 contains a set of Antivirus (AV) names to include in the scan output. Add or remove Antivirus names from this set as you please. The default configuration will include Avast, AVG, BitDefender, FireEye, F-Secure, Malwarebytes, McAfee and Microsoft in the specific scan result output. This **IS** case and space sensitive.
```
#Customise the scan output by changing the included AV names here. This IS case and space sensitive.
antiviruses = {'Avast', 'AVG', 'BitDefender', 'FireEye', 'F-Secure', 'Malwarebytes', 'McAfee', 'Microsoft'}
```
5. Run the script and enjoy the sexy Antivirus results - Happy Virus Hunting!
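
Step 3 notes that files inside sub-directories are not handled yet. The sketch below shows one possible workaround and is not part of the script: it walks every entry in the archive at any depth, skips directory entries, and computes the SHA-256 hash of each file's contents. It assumes that the report endpoint accepts a SHA-256 hash as the `resource` value. `hash_zip_members` is a hypothetical helper name.

```python
import hashlib
from zipfile import ZipFile


def hash_zip_members(zip_source):
    """Map each file in the archive (at any depth) to its SHA-256 hex digest."""
    hashes = {}
    with ZipFile(zip_source, 'r') as zip_obj:
        for info in zip_obj.infolist():
            # Directory entries carry no payload, so skip them
            if info.is_dir():
                continue
            hashes[info.filename] = hashlib.sha256(zip_obj.read(info)).hexdigest()
    return hashes
```

Calling `hash_zip_members(filePath)` would return a dictionary whose values could be passed as `resource` in place of the file names.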

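The case sensitivity mentioned in step 4 comes from looking up each AV name as a key in the `scans` part of the report. The following sketch illustrates this with a made-up report fragment; `selected_results` and `example_scans` are hypothetical names, and the detection values are invented for illustration only.

```python
def selected_results(scans, antiviruses):
    """Return {engine: (detected, result)} for the chosen engines.

    Engines missing from the report map to None instead of raising KeyError.
    """
    results = {}
    for av in sorted(antiviruses):
        entry = scans.get(av)
        results[av] = None if entry is None else (entry['detected'], entry['result'])
    return results


# Made-up report fragment, shaped like the 'scans' field the script reads
example_scans = {
    'Avast': {'detected': True, 'result': 'Example.Trojan'},
    'Microsoft': {'detected': False, 'result': None},
}
print(selected_results(example_scans, {'Avast', 'Microsoft', 'avast'}))
```

Here the lower-case `'avast'` maps to `None` because it does not match the key `'Avast'` exactly.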






In [0]:
import json
from zipfile import ZipFile
import time
import requests
import os
import matplotlib.pyplot as plotter
from collections import OrderedDict

#Enter API key here ##########################################################
api_key = '<API KEY HERE>' #
##############################################################################  

#Customise the scan output by changing the included AV names here. This IS case and space sensitive.########
antiviruses = {'Avast', 'AVG', 'BitDefender', 'FireEye', 'F-Secure', 'Malwarebytes', 'McAfee', 'Microsoft'}#
############################################################################################################

#Enter the location of your .ZIP file here #############################
filePath = '<FILE PATH HERE>'
########################################################################

with ZipFile(filePath, 'r') as zipObj:
  #List the names of all entries in the ZIP file; each name is later used as the lookup resource
  hashList = zipObj.namelist()

print('Scanning ZIP file: {0}'.format(filePath))
print('{0} files found.'.format(len(hashList)))
print('Your customised Antivirus engine outputs are:', ', '.join([str(av) for av in antiviruses]))
print('Scan commencing in 5 seconds.')
time.sleep(5)

print('\n##############################################################################################################\n')
payloadNo = 0

#Scan statistics tracking class
#The counters are class attributes and are updated directly on the class (e.g. Stats.noResult += 1)
class Stats:
  positiveResult = 0
  negativeResult = 0
  noResult = 0
  extensionsFound = {}

try:
  for payload in hashList:
      payloadNo += 1
      #Makes request to virustotal to scan file
      url = 'https://www.virustotal.com/vtapi/v2/file/report'
      params = {'apikey': api_key, 'resource': payload, 'allinfo': 1}
      #Initialised first so the handler below can tell whether the request itself failed
      response = None
      try:
        response = requests.get(url, params=params)
        vt_response = response.json()
      except Exception:
        #A 204 response has no JSON body, so parsing it raises and lands here
        if response is not None and response.status_code == 204:
          print('Received HTTP Response Code:', response.status_code, '-- Indicates that API usage quota has been used up.\nRefer to the VirusTotal API documentation for error description.\nDocumentation link: https://developers.virustotal.com/reference#api-responses')
          break
        print('An error has occurred while requesting or parsing the report.')
        break
          
      #Parsing the JSON response
      print('Scanning Payload {0}/{1}: {2}'.format( payloadNo, len(hashList),payload))
      if vt_response['response_code'] == 1:
        print('Scan initiated at', vt_response['scan_date'])
        #Doesn't post AV results if there are zero positive results
        if vt_response['positives'] == 0:
          print('\nAll scans returned negative.')
          Stats.negativeResult += 1
        else:
          print('=== Antivirus Results ===')
          #Loops through selected AVs for detection results
          for av in antiviruses:
            try:
              print(av, '-> Detected:', vt_response['scans'][av]['detected'], ' Result:', vt_response['scans'][av]['result'])
            except KeyError:
              #AVs included in 'antiviruses' might not always return scan results, which will cause a KeyError
              print(av, 'could not return any test results.') 
          Stats.positiveResult += 1

        #Standard scan result output
        print('\n=== Scan Results ===')
        print('Total Scans:', vt_response['total'])
        print('Total Positive Results:', vt_response['positives'])
        print('First seen:', vt_response['first_seen'])
        print('Times Submitted:', vt_response['times_submitted'])
        print('Scan Permalink: ', vt_response['permalink'], '\n')
      
        #Attempts to output any EXIF data returned from the API
        print('=== EXIF Data provided by https://exiftool.org/ ===')
        try:
          print('File Type/Extension: {} (.{})'.format(vt_response['additional_info']['exiftool']['FileType'], vt_response['additional_info']['exiftool']['FileTypeExtension']))          
          #Retrieves the extension type and the number of occurrences
          #If the extension is not already stored then it is added with default value of 0
          extension = Stats.extensionsFound.get(vt_response['type'], 0)
          extension += 1
          Stats.extensionsFound[vt_response['type']] = extension
          
          print('File Size:', round(vt_response['size']/1024/1024,2), 'MB')
          print('Description:', vt_response['additional_info']['magic'])
          print('Target Operating System:', vt_response['additional_info']['exiftool']['OperatingSystem'])

        #Not every EXIF data field is returned by the API, which is handled below.  
        except KeyError as err:
          print('\nSome EXIF data was not found :(\n')
        
      #Handles other response codes: https://developers.virustotal.com/reference#api-responses
      elif vt_response['response_code'] != 1:
        print('===========================================================================\n',vt_response['verbose_msg'],'\n===========================================================================')
        Stats.noResult += 1
      print('\n##############################################################################################################\n')

      #Sleeps script to comply with API user agreement
      #Set to 15 for public API, 4.5 for Academic/Premium API
      time.sleep(0)

#Stops the scan cleanly if the user interrupts it
except KeyboardInterrupt:
  print('Keyboard Interrupt')

#Show interesting scan stats
print('=== Scan Report ===')
print('{0} files were scanned'.format(payloadNo))
print('{0} positive matches were found for malicious files'.format(Stats.positiveResult))
print('{0} negative matches were found for malicious files'.format(Stats.negativeResult))
print('{0} scans had no result due to files still being analysed.'.format(Stats.noResult))
print('\nList of different file extensions found from successful scans:')

#Show each file extension found
sortedExtensions = OrderedDict(sorted(Stats.extensionsFound.items(), key=lambda x:x[1], reverse=True))
for ext in sortedExtensions:
  print('   {0}: {1}'.format(ext,sortedExtensions[ext]))

#Graphing
print('\n=== Graph Report ===')
print('Once created, you can customise the graph in the matplotlib GUI.\n')
x_data = []
extensionFreq = []

for ext in sortedExtensions:
    x_data.append(ext)
    extensionFreq.append(sortedExtensions[ext])
x_pos = [i for i, _ in enumerate(x_data)]
#The figure is created before plotting so that its size, DPI and background apply to the bar chart
plotter.rcParams['figure.facecolor'] = 'white'
plotter.figure(figsize=(25,3),dpi=300)
rects = plotter.bar(x_pos, extensionFreq, color='#621360', align='center', width=0.4)
#Annotates the top of each bar with its value
for rect in rects:
    height = rect.get_height()
    plotter.annotate('{}'.format(height), xy=(rect.get_x() + rect.get_width() / 2, height), xytext=(0, 3), textcoords="offset points", ha='center', va='bottom')
plotter.xlabel('Extension type')
plotter.ylabel('Frequency')
plotter.title('File extensions found inside {0}'.format(filePath),bbox={'facecolor':'0.8', 'pad':5}, y=1.08)
plotter.xticks(x_pos, x_data)
#Opens the matplotlib GUI
plotter.show()
print('Scan complete')
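
The extension tally and the descending sort used in the script above could also be written with `collections.Counter`, which lives in the same standard-library module as `OrderedDict`. This is a minimal sketch with made-up type values; `summarise_types` is a hypothetical name and is not part of the script.

```python
from collections import Counter


def summarise_types(types):
    """Count each file type and return (type, count) pairs, most common first."""
    return Counter(types).most_common()


# Made-up values standing in for the 'type' field of each report
print(summarise_types(['ELF', 'Win32 EXE', 'ELF', 'Shell script', 'ELF']))
```

The returned pairs can be unpacked directly into the label and frequency lists that the bar chart needs.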