# VirusShare Scraper

This notebook logs in to the VirusShare website (https://virusshare.com/) and downloads samples automatically.


Developed by: Faithful Onwuegbuche.

Please cite the MLRan paper if you use this code.

**Caution: You are about to download potentially harmful samples from the VirusShare website. Ensure that their use aligns with VirusShare’s policies and adheres to established cybersecurity ethics. Handle these samples responsibly and only within a secure, isolated environment (e.g., a virtual machine) to prevent accidental infection. VirusShare archives these files with the password "infected" as an additional security measure.**

In [2]:
# importing libraries

import requests
from bs4 import BeautifulSoup
import time
import os

### Login Function

The `login()` function logs in to VirusShare with your account credentials.

In [4]:
# VirusShare credentials
# You can obtain these credentials at https://virusshare.com/
username = ''  # your VirusShare username
password = ''  # your VirusShare password
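
Hard-coding credentials in a notebook makes them easy to leak when the file is shared. As a minimal sketch, the credentials could instead be read from environment variables. The variable names `VIRUSSHARE_USERNAME` and `VIRUSSHARE_PASSWORD` and the helper `load_credentials()` are assumptions made for this example; they are not defined by VirusShare or by the original code.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from environment variables.

    VIRUSSHARE_USERNAME and VIRUSSHARE_PASSWORD are hypothetical names
    chosen for this sketch; VirusShare itself does not define them.
    """
    username = env.get("VIRUSSHARE_USERNAME", "")
    password = env.get("VIRUSSHARE_PASSWORD", "")
    if not username or not password:
        raise ValueError("Set VIRUSSHARE_USERNAME and VIRUSSHARE_PASSWORD first")
    return username, password
```

With both variables set in the shell that launches the notebook, `username, password = load_credentials()` would replace the two assignments above.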

In [5]:
# login() logs in to VirusShare programmatically and reports whether it succeeded.
login_url = 'https://virusshare.com/processlogin'

session = requests.Session()

def login(session, username, password):
    login_data = {
        'username': username,
        'password': password
    }
    response = session.post(login_url, data=login_data, allow_redirects=True)
    if response.status_code == 200 and "Logout" in response.text:
        print('Successfully logged in')
        return True
    else:
        print('Failed to log in')
        print(f'Response status code: {response.status_code}')
        print(f'Response text: {response.text}')
        return False

In [6]:
# Testing the login function
login(session, username, password)

Successfully logged in


True

### Search Function

The `search_hash()` function searches VirusShare for a given sample hash and returns the HTML of the results page.

In [8]:
# URL for search
search_url = 'https://virusshare.com/search'

def search_hash(session, hash_to_search):
    search_data = {
        'search': hash_to_search
    }
    response = session.post(search_url, data=search_data, allow_redirects=True)
    if response.status_code == 200:
        print(f'Successfully searched for hash {hash_to_search}')
        return response.text
    else:
        print(f'Failed to search for hash {hash_to_search}')
        print(f'Response status code: {response.status_code}')
        print(f'Response text: {response.text}')
        return None
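
A malformed hash still produces a search request, so it can help to check the input first. The sketch below classifies a hex digest by its length (32 characters for MD5, 40 for SHA-1, 64 for SHA-256). The helper `identify_hash()` is hypothetical and not part of the original code, and which digest types the VirusShare search accepts is not established here.

```python
import re

# Hex digest lengths of the common hash algorithms.
HASH_LENGTHS = {32: "MD5", 40: "SHA-1", 64: "SHA-256"}


def identify_hash(value):
    """Return 'MD5', 'SHA-1' or 'SHA-256' based on digest length, else None."""
    value = value.strip().lower()
    # Reject anything that is not purely hexadecimal.
    if not re.fullmatch(r"[0-9a-f]+", value):
        return None
    return HASH_LENGTHS.get(len(value))
```

A caller could skip any hash for which `identify_hash()` returns `None` before calling `search_hash()`.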

In [9]:
# Testing

try_hash = '60468339f5464275bf51af4bb997ac81d05d75db'
search_results_html = search_hash(session, try_hash)
search_results_html

Successfully searched for hash 60468339f5464275bf51af4bb997ac81d05d75db


'\n<html>\n<head>\n<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n<meta name="viewport" content="width=device-width, initial-scale=1.0">\n<link rel="icon"\n href="https://virusshare.com/favicon.ico">\n<title>VirusShare.com</title>\n<style>\n<!--\nbody,td,a,p,.h{font-family:arial,sans-serif; hyphens: auto;}\n.h{font-size: 20px;}\n.q{color:#4444ff;}\n.small{font-size: small;}\ntable.rpt { max-width: 100%; overflow-wrap:anywhere; border-collapse: collapse;}\ntr { border-bottom: 1px solid #888; }\ntr.nb { border-bottom: 0px; }\n.lc { font-weight : bold; min-width: 5em;  vertical-align: top; }\n.mc { min-width: 5em; }\n\ntable.wordy { max-width: 640px; overflow-wrap:anywhere; text-align: justify; }\n\nhr.break {\n  border-top: 3px double #888;\n}\n\nimg.vxsicon {\n  display: block;\n  max-width: 48px;\n  max-height: 48px;\n  width: auto;\n  height: auto;\n}\nimg.exticon {\n  display: block;\n  max-width: 34px;\n  max-height: 48px;\n  width: auto;\n  height: auto;\n}\n\

### Find Download Function

The `find_download_link()` function reads the sample's SHA256 from the search results and builds the download link from it. You must be logged in to VirusShare to download.

In [11]:
# Download url 
download_base_url = 'https://virusshare.com/download'

In [12]:
def find_download_link(search_results_html, sample_hash):
    # Parse the search results page
    soup = BeautifulSoup(search_results_html, 'html.parser')

    # Remains None if the page has no usable SHA256 row
    hash_value = None

    # Find the first <tr> that mentions 'SHA256'
    for tr in soup.find_all('tr'):
        if 'SHA256' in str(tr):
            # Within that row, find the <td> whose text is exactly 'SHA256'
            for td in tr.find_all('td'):
                if td.text.strip() == 'SHA256':
                    # The hash value is in the next <td>
                    value_td = td.find_next_sibling('td')
                    if value_td is not None:
                        hash_value = value_td.text.strip()
                        print("SHA256 is:", hash_value)
                    break
            break

    # Without a SHA256 there is no link to build (e.g. the hash was not found)
    if hash_value is None:
        print(f"No SHA256 found in the search results for {sample_hash}.")
        return None

    # Construct the download URL
    download_url = f'{download_base_url}?{hash_value}'
    print("The download link is:", download_url)
    return download_url
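
The lookup above relies on the results table placing a label cell (`SHA256`) directly before its value cell. To exercise that logic offline, without a VirusShare session, the sketch below does the same label-then-value lookup with the standard library's `html.parser` instead of BeautifulSoup. The class `CellCollector`, the function `extract_field()`, and the sample HTML in the usage note are all hypothetical, and the sketch only handles flat tables without nested `<td>` elements.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the stripped text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False
        self._buf = []  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append("".join(self._buf).strip())
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)


def extract_field(html, label):
    """Return the text of the cell that follows the cell equal to `label`."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    # Skip the last cell: it has no following cell to hold a value.
    for i, text in enumerate(parser.cells[:-1]):
        if text == label:
            return parser.cells[i + 1]
    return None
```

For a made-up row such as `<tr><td>SHA256</td><td>7c3f</td></tr>`, calling `extract_field(html, "SHA256")` yields the text of the second cell, and a label that is absent yields `None`.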

In [13]:
# Test the find Link 
find_download_link(search_results_html, try_hash)

SHA256 is: 7c3f822d3fb51567e8c629392bc83f55521f4f99aef2da08d8c7925b555fb7bf
The download link is: https://virusshare.com/download?7c3f822d3fb51567e8c629392bc83f55521f4f99aef2da08d8c7925b555fb7bf


'https://virusshare.com/download?7c3f822d3fb51567e8c629392bc83f55521f4f99aef2da08d8c7925b555fb7bf'

### Download Function

The `download_sample()` function downloads a sample from VirusShare and saves it as a zip file.

In [15]:
def download_sample(session, download_url, sample_hash):
    response = session.get(download_url)
    if response.status_code == 200:
        # Save the file
        filename = f"{sample_hash}.zip"  # VirusShare serves samples as zip archives; open them with the password "infected"

        with open(filename, 'wb') as f:
            f.write(response.content)
        print(f"Successfully downloaded sample {sample_hash} to {filename}")
    else:
        print(f"Failed to download {sample_hash}")
        print(f"Status Code: {response.status_code}")
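
`download_sample()` treats any HTTP 200 response as a success, so a non-archive body (for example an HTML page) would still be written to a `.zip` file. A minimal sketch of a post-download check using the standard `zipfile` module follows; the helper `archive_members()` is hypothetical. It only lists member names and does not extract anything, and it assumes conventional zip encryption, where the name list can be read without the password.

```python
import zipfile


def archive_members(path):
    """Return the member names of the zip at `path`, or None if it is not a zip."""
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as zf:
        # Listing names does not decrypt or extract any member.
        return zf.namelist()
```

A `None` result after a download suggests the saved file is not an archive and the request should be inspected or retried.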

### Downloading multiple samples by their hashes

In [17]:
## SHA-1 hashes (40 hex characters) of 3 files for testing
## Replace this list with the hashes of all the samples you would like to download.

hashes = ["60468339f5464275bf51af4bb997ac81d05d75db", "8e9ab34c889dd3741fb251c30bdfc0ee97cfa174", "bd778bb52e3f58957d462e375e69fbf9829bc29b"]
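
When a long run is interrupted and restarted, samples that were already saved do not need to be fetched again. The sketch below filters a hash list down to the entries with no saved archive yet, assuming the `<hash>.zip` naming used by `download_sample()`. The helper `pending_hashes()` and its `folder` parameter are hypothetical additions.

```python
import os


def pending_hashes(hashes, folder):
    """Return the hashes that have no '<hash>.zip' file in `folder` yet."""
    return [
        h for h in hashes
        if not os.path.exists(os.path.join(folder, f"{h}.zip"))
    ]
```

Looping over `pending_hashes(hashes, "sample_download")` instead of `hashes` would skip completed downloads and also avoid the delay between requests for them.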

In [18]:
# Ensure the folder for storing downloaded files exists
download_folder = "sample_download"
os.makedirs(download_folder, exist_ok=True)

# Log in once, then search for and download each sample in turn
if login(session, username, password):
    for sample_hash in hashes:
        search_results_html = search_hash(session, sample_hash)
        if search_results_html:
            download_url = find_download_link(search_results_html, sample_hash)
            if download_url:
                download_sample(session, download_url, os.path.join(download_folder, sample_hash))
                time.sleep(10)  # To avoid too frequent requests, adjust the delay as needed
                print()
                print("Next file processing...")
                print()

Successfully logged in
Successfully searched for hash 60468339f5464275bf51af4bb997ac81d05d75db
SHA256 is: 7c3f822d3fb51567e8c629392bc83f55521f4f99aef2da08d8c7925b555fb7bf
The download link is: https://virusshare.com/download?7c3f822d3fb51567e8c629392bc83f55521f4f99aef2da08d8c7925b555fb7bf
Successfully downloaded sample sample_download/60468339f5464275bf51af4bb997ac81d05d75db to sample_download/60468339f5464275bf51af4bb997ac81d05d75db.zip

Next file processing...

Successfully searched for hash 8e9ab34c889dd3741fb251c30bdfc0ee97cfa174
SHA256 is: 7af6d15f32d699466ee92f978a6eda5ae3ad6223c65caa8f299605e86536840a
The download link is: https://virusshare.com/download?7af6d15f32d699466ee92f978a6eda5ae3ad6223c65caa8f299605e86536840a
Successfully downloaded sample sample_download/8e9ab34c889dd3741fb251c30bdfc0ee97cfa174 to sample_download/8e9ab34c889dd3741fb251c30bdfc0ee97cfa174.zip

Next file processing...

Successfully searched for hash bd778bb52e3f58957d462e375e69fbf9829bc29b
SHA256 is: 7aba