# CVE Mitre Introduction

Vulnerability data is available from three sources: **CVE Mitre, NVD and CVE Details**. Entries are created and then annotated by these sources in that order. Common Vulnerabilities and Exposures (CVE) by Mitre was launched in 1999, when most information security tools used their own databases with their own names for security vulnerabilities; it manually documents known vulnerabilities for public use. Each vulnerability contains a description, is uniquely identified by a CVE ID, and may also include fields specifying the affected software, versions and vendors. Similar vulnerabilities that occur in different software can have different CVE IDs but share the same weakness ID (CWE ID). A vulnerability created by CVE Mitre may or may not be annotated with a CWE ID. When available, CWE IDs can be used to group similar vulnerabilities conceptually and to observe how they have been ‘instantiated’ in different software, versions or vendors.

CVE Mitre’s vulnerabilities are then annotated with severity scores, fix information and impact ratings in the National Vulnerability Database (NVD), and made available for download as XML feeds. CVE Details was created to provide a user-friendly interface to NVD’s XML feeds. For instance, using vulnerabilities’ CWE IDs and keyword matching, it defines 13 vulnerability types to facilitate browsing. Since CVE Details warns about inconsistencies in the NVD XML feeds (e.g. the same vendor’s software having different names) and about entries irrelevant to our purposes (i.e. reserved, duplicate and removed entries), we downloaded all software vulnerabilities to date from the three sources to define our vulnerability dataset and ensure consistency.

## Motivation

The CVE Mitre database records the references (sources) of each vulnerability. There are many sources, and for each reference the database provides the URL it was reported from and a description of the attack (with an associated ID). It is important to identify the right sources of vulnerabilities; this notebook aims to help choose the sources and filter the chosen ones into a new file.

# Method

The files provided by the CVE Mitre website are in CVRF (XML) format and can be found at http://cve.mitre.org/data/downloads/index.html . The XML schema nests tables within a table. We parse the tree to reach the required child nodes and apply pattern matching with regular expressions. This lets us extract the right fields and write them to a file (file1). References from the remaining, unfiltered sources are written to another file (file2), from where they can be fetched if they are later judged worth considering.
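
As a minimal sketch of the approach described above, the snippet below parses a small hand-written document and splits its references with one regular expression. The sample XML, the placeholder ID `CVE-0000-0001`, the `example.com` URLs and the names `VULN_NS`, `matched` and `unmatched` are assumptions made for illustration; they are not taken from the real feed, which contains more elements per vulnerability.

```python
import re
import xml.etree.ElementTree as ET

# Namespace prefix that qualifies the CVRF vulnerability elements.
VULN_NS = "{http://www.icasi.org/CVRF/schema/vuln/1.1}"

# Hand-written sample for illustration only (not real feed content).
SAMPLE_XML = """
<cvrfdoc xmlns="http://www.icasi.org/CVRF/schema/cvrf/1.1">
  <Vulnerability xmlns="http://www.icasi.org/CVRF/schema/vuln/1.1" Ordinal="1">
    <CVE>CVE-0000-0001</CVE>
    <References>
      <Reference>
        <URL>http://example.com/bid/123</URL>
        <Description>BID:123</Description>
      </Reference>
      <Reference>
        <URL>http://example.com/other</URL>
        <Description>UNKNOWNSOURCE:abc</Description>
      </Reference>
    </References>
  </Vulnerability>
</cvrfdoc>
"""

root = ET.fromstring(SAMPLE_XML.strip())
matched, unmatched = [], []

# Walk Vulnerability -> References -> Reference, as the notebook does.
for vuln in root.findall(VULN_NS + "Vulnerability"):
    cve_id = vuln.find(VULN_NS + "CVE").text
    references = vuln.find(VULN_NS + "References")
    for ref in references.findall(VULN_NS + "Reference"):
        description = ref.find(VULN_NS + "Description").text
        url = ref.find(VULN_NS + "URL").text
        # A reference goes to "matched" when its description fits the pattern.
        target = matched if re.search(r"BID:(\d+)", description) else unmatched
        target.append((cve_id, description, url))
```

Here `matched` plays the role of file1 and `unmatched` the role of file2.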

In [122]:
#import Element tree for parsing xml
from xml.etree.ElementTree import ElementTree 
import csv
import re
import glob

#XML namespace prefix used to qualify the CVRF vulnerability tag names
table_root = "{http://www.icasi.org/CVRF/schema/vuln/1.1}"

In [131]:
#regex patterns used to recognise reference sources (raw strings so that \d is not treated as a string escape)
BID_regex = r"BID:(\d+)"
SECTRACK_regex = r"SECTRACK:(\d+)"
MS_regex = r"MS:[A-Z]*[0-9]*-[0-9]*"
REDHAT_regex = r"REDHAT:RHSA-[0-9]*:[0-9]*"
GENTOO_regex = r"GENTOO:GLSA-[0-9]*-[0-9]*"
DEBIAN_regex = r"DEBIAN:DSA-(\d+)"
SECUNIA_regex = r"SECUNIA:(\d+)"
TURBO_regex = r"TURBO:TLSA-[0-9]*-[0-9]*"
AIXAPAR_regex = r"AIXAPAR:[A-Z]*[0-9]*"
ALLAIRE_regex = r"ALLAIRE:[A-Z]*[0-9]*-[0-9]*"
AUSCERT_regex = r"AUSCERT:[A-Z]*-[0-9]*.[0-9]*"
BEA_regex = r"BEA:BEA[0-9]*-[0-9]*.[0-9]*"
CIAC_regex = r"CIAC:[A-Z]*-[0-9]*"
CONECTIVA_regex = r"CONECTIVA:[A-Z]*-[0-9]*:[0-9]*"
OSVDB_regex = r"OSVDB:(\d+)"

#creating headers for file
header_file1= ["CVE ID","BID Description","BID Url","SECTRACK Description",
               "SECTRACK Url","MS Description","MS Url","REDHAT Description",
               "REDHAT Url","DEBIAN Description","DEBIAN Url","GENTOO Description",
               "GENTOO Url","SECUNIA Description","SECUNIA Url","TURBO Description",
               "TURBO Url","AIXAPAR Description","AIXAPAR Url","ALLAIRE Description",
               "ALLAIRE Url","APPLE Description","APPLE Url","ATSTAKE Description",
               "ATSTAKE Url","AUSCERT Description","AUSCERT Url","BEA Description",
               "BEA Url","CALDERA Description","CALDERA Url","CERT Description","CERT Url",
               "CIAC Description","CIAC Url","CONECTIVA Description",
               "CONECTIVA Url","CONFIRM Description","CONFIRM Url","OSVDB Description",
               "OSVDB Url","BUGTRAQ Description", "BUGTRAQ Url","CISCO Description", 
               "CISCO Url","BINDVIEW Description", "BINDVIEW Url","EXPLOIT Description",
               "EXPLOIT Url","FEDORA Description", "FEDORA Url","FULLDISC Description",
               "FULLDISC Url","MILLWORM Description", "MILLWORM Url","MISC Description",
               "MISC Url","MLIST Description", "MLIST Url","SUSE Description",
               "SUSE Url","XF Description", "XF Url","UBUNTU Description", "UBUNTU Url",
              "VUPEN Description", "VUPEN Url","SREASON Description", "SREASON Url",
              "OVAL Description", "OVAL Url","SGI Description", "SGI Url","CHECKPOINT Description",
              "CHECKPOINT Url","MANDRAKE Description", "MANDRAKE Url","MANDRIVA Description", 
               "MANDRIVA Url","COMPAQ Description", "COMPAQ Url","FREEBSD Description", "FREEBSD Url",
              "HP Description", "HP Url","IBM Description", "IBM Url","IDEFENSE Description", 
               "IDEFENSE Url","IMMUNIX Description", "IMMUNIX Url","ISS Description", "ISS Url",
              "JVN Description", "JVN Url","L0PHT Description","L0PHT Url","OPENBSD Description",
               "OPENBSD Url","SUNALERT Description","SUNALERT Url","TRUSTIX Description",
               "TRUSTIX Url","SLACKWARE Description","SLACKWARE Url","NETBSD Description","NETBSD Url",
              "VIM Description","VIM Url","VULNWATCH Description","VULNWATCH Url",
               "MSKB Description","MSKB Url"]
header_file2= ["CVE ID","Reference Description","Reference Url"]

#lists holding the rows to be written to each file
file1_data = []
file2_data = []
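
Before running the patterns over the full feed, it can help to try them on a few strings. The sketch below copies three of the patterns defined above and applies them to made-up descriptions; the `patterns` dictionary, the `samples` list and the `first_match` helper are hypothetical names introduced only for this illustration.

```python
import re

# Three of the notebook's patterns, written as raw strings.
patterns = {
    "BID": r"BID:(\d+)",
    "REDHAT": r"REDHAT:RHSA-[0-9]*:[0-9]*",
    "OSVDB": r"OSVDB:(\d+)",
}

# Made-up description strings for illustration.
samples = ["BID:1234", "REDHAT:RHSA-2000:001", "OSVDB:42", "MISC:http://example.com"]


def first_match(description):
    """Return the name of the first pattern found in the description, or None."""
    for name, pattern in patterns.items():
        if re.search(pattern, description):
            return name
    return None


results = {sample: first_match(sample) for sample in samples}
```

Note that `re.search` looks for the pattern anywhere in the string, so a description such as `XBID:99` would also be reported as `BID`. If that is not wanted, `re.match` or a leading `^` anchors the pattern to the start of the description.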

In [132]:
#write the collected rows to a CSV file under the given header
def write_file(filename, data, header):
    #newline='' lets the csv module control line endings; utf-8 avoids
    #UnicodeEncodeError on non-ASCII reference text (Python 3)
    with open(filename, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=header)
        writer.writeheader()
        for value in data:
            writer.writerow(value)
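
Each row collected for file1 carries only three keys (the CVE ID plus one source's description and URL), while the header lists every source. The sketch below, which uses a shortened hypothetical header and made-up rows, shows how `csv.DictWriter` handles this: columns missing from a row are filled with its `restval`, which defaults to an empty string. An in-memory buffer stands in for the file.

```python
import csv
import io

# Shortened header and made-up rows, for illustration only.
header = ["CVE ID", "BID Description", "BID Url", "MS Description", "MS Url"]
rows = [
    {"CVE ID": "CVE-0000-0001", "BID Description": "BID:123",
     "BID Url": "http://example.com/bid/123"},
    {"CVE ID": "CVE-0000-0002", "MS Description": "MS:MS00-001",
     "MS Url": "http://example.com/ms"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

# One header line followed by one line per row; missing columns are empty.
lines = buffer.getvalue().splitlines()
```

The resulting file is therefore sparse: every line has a value in at most three of its columns.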

In [133]:
def reference_sort(data):
    #data[1] holds references
    if data[1] is not None:
        for child in data[1].findall(table_root+"Reference"):
            file1 = {}
            file2 = {}
            #re.search(regex,text)
            if re.search(BID_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["BID Url"] = child.find(table_root + "URL").text
                file1["BID Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif re.search(SECTRACK_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["SECTRACK Url"] = child.find(table_root + "URL").text
                file1["SECTRACK Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif re.search(MS_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["MS Url"] = child.find(table_root + "URL").text
                file1["MS Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "DEBIAN:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["DEBIAN Url"] = child.find(table_root + "URL").text
                file1["DEBIAN Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)            
            elif re.search(REDHAT_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["REDHAT Url"] = child.find(table_root + "URL").text
                file1["REDHAT Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif re.search(AIXAPAR_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["AIXAPAR Url"] = child.find(table_root + "URL").text
                file1["AIXAPAR Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "GENTOO:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["GENTOO Url"] = child.find(table_root + "URL").text
                file1["GENTOO Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif re.search(SECUNIA_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["SECUNIA Url"] = child.find(table_root + "URL").text
                file1["SECUNIA Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "TURBO:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["TURBO Url"] = child.find(table_root + "URL").text
                file1["TURBO Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif re.search(ALLAIRE_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["ALLAIRE Url"] = child.find(table_root + "URL").text
                file1["ALLAIRE Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "APPLE:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["APPLE Url"] = child.find(table_root + "URL").text
                file1["APPLE Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "ATSTAKE:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["ATSTAKE Url"] = child.find(table_root + "URL").text
                file1["ATSTAKE Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif re.search(AUSCERT_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["AUSCERT Url"] = child.find(table_root + "URL").text
                file1["AUSCERT Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "CALDERA:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["CALDERA Url"] = child.find(table_root + "URL").text
                file1["CALDERA Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif re.search(BEA_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["BEA Url"] = child.find(table_root + "URL").text
                file1["BEA Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)            
            elif re.search(CIAC_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["CIAC Url"] = child.find(table_root + "URL").text
                file1["CIAC Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "CONECTIVA:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["CONECTIVA Url"] = child.find(table_root + "URL").text
                file1["CONECTIVA Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "CERT" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["CERT Url"] = child.find(table_root + "URL").text
                file1["CERT Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "BINDVIEW:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["BINDVIEW Url"] = child.find(table_root + "URL").text
                file1["BINDVIEW Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "CHECKPOINT:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["CHECKPOINT Url"] = child.find(table_root + "URL").text
                file1["CHECKPOINT Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)            
            elif "CONFIRM:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["CONFIRM Url"] = child.find(table_root + "URL").text
                file1["CONFIRM Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif re.search(OSVDB_regex,child.find(table_root + "Description").text):
                file1["CVE ID"] = data[0].text
                file1["OSVDB Url"] = child.find(table_root + "URL").text
                file1["OSVDB Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "BUGTRAQ:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["BUGTRAQ Url"] = child.find(table_root + "URL").text
                file1["BUGTRAQ Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "CISCO:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["CISCO Url"] = child.find(table_root + "URL").text
                file1["CISCO Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "EXPLOIT" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["EXPLOIT Url"] = child.find(table_root + "URL").text
                file1["EXPLOIT Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "FEDORA:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["FEDORA Url"] = child.find(table_root + "URL").text
                file1["FEDORA Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "FULLDISC:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["FULLDISC Url"] = child.find(table_root + "URL").text
                file1["FULLDISC Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "MILW0RM:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["MILLWORM Url"] = child.find(table_root + "URL").text
                file1["MILLWORM Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "MISC:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["MISC Url"] = child.find(table_root + "URL").text
                file1["MISC Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "MLIST:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["MLIST Url"] = child.find(table_root + "URL").text
                file1["MLIST Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "OVAL:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["OVAL Url"] = child.find(table_root + "URL").text
                file1["OVAL Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "SGI:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["SGI Url"] = child.find(table_root + "URL").text
                file1["SGI Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "SREASON:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["SREASON Url"] = child.find(table_root + "URL").text
                file1["SREASON Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "SUSE:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["SUSE Url"] = child.find(table_root + "URL").text
                file1["SUSE Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "UBUNTU:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["UBUNTU Url"] = child.find(table_root + "URL").text
                file1["UBUNTU Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "VUPEN:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["VUPEN Url"] = child.find(table_root + "URL").text
                file1["VUPEN Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "XF:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["XF Url"] = child.find(table_root + "URL").text
                file1["XF Description"] = child.find(table_root + "Description").text
                file1_data.append(file1) 
            elif "COMPAQ" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["COMPAQ Url"] = child.find(table_root + "URL").text
                file1["COMPAQ Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "FREEBSD" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["FREEBSD Url"] = child.find(table_root + "URL").text
                file1["FREEBSD Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "HP:" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["HP Url"] = child.find(table_root + "URL").text
                file1["HP Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "IBM" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["IBM Url"] = child.find(table_root + "URL").text
                file1["IBM Description"] = child.find(table_root + "Description").text
                file1_data.append(file1) 
            elif "IDEFENSE" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["IDEFENSE Url"] = child.find(table_root + "URL").text
                file1["IDEFENSE Description"] = child.find(table_root + "Description").text
                file1_data.append(file1) 
            elif "IMMUNIX" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["IMMUNIX Url"] = child.find(table_root + "URL").text
                file1["IMMUNIX Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "ISS" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["ISS Url"] = child.find(table_root + "URL").text
                file1["ISS Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "JVN" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["JVN Url"] = child.find(table_root + "URL").text
                file1["JVN Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "L0PHT" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["L0PHT Url"] = child.find(table_root + "URL").text
                file1["L0PHT Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "MANDRAKE" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["MANDRAKE Url"] = child.find(table_root + "URL").text
                file1["MANDRAKE Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "MANDRIVA" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["MANDRIVA Url"] = child.find(table_root + "URL").text
                file1["MANDRIVA Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "OPENBSD" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["OPENBSD Url"] = child.find(table_root + "URL").text
                file1["OPENBSD Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "SUNALERT" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["SUNALERT Url"] = child.find(table_root + "URL").text
                file1["SUNALERT Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "SLACKWARE" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["SLACKWARE Url"] = child.find(table_root + "URL").text
                file1["SLACKWARE Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "VULNWATCH" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["VULNWATCH Url"] = child.find(table_root + "URL").text
                file1["VULNWATCH Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "VIM" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["VIM Url"] = child.find(table_root + "URL").text
                file1["VIM Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "NETBSD" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["NETBSD Url"] = child.find(table_root + "URL").text
                file1["NETBSD Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "MSKB" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["MSKB Url"] = child.find(table_root + "URL").text
                file1["MSKB Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)
            elif "TRUSTIX" in child.find(table_root + "Description").text:
                file1["CVE ID"] = data[0].text
                file1["TRUSTIX Url"] = child.find(table_root + "URL").text
                file1["TRUSTIX Description"] = child.find(table_root + "Description").text
                file1_data.append(file1)                        
            else:
                file2["CVE ID"] = data[0].text
                file2["Reference Description"] = child.find(table_root + "Description").text
                file2["Reference Url"] = child.find(table_root + "URL").text
                file2_data.append(file2)
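
The branches above all follow the same shape, so the chain could also be expressed as an ordered table of rules. The following is only a sketch of that alternative, not the notebook's method: `SOURCE_RULES` and `classify` are hypothetical names, only four sources are listed, and a missing description is treated as an empty string, which is an added assumption.

```python
import re

# (source name, test) pairs checked in order; the first match wins,
# mirroring the order-dependent if/elif chain.
SOURCE_RULES = [
    ("BID", re.compile(r"BID:(\d+)").search),
    ("SECTRACK", re.compile(r"SECTRACK:(\d+)").search),
    ("DEBIAN", lambda text: "DEBIAN:" in text),
    ("CONFIRM", lambda text: "CONFIRM:" in text),
]


def classify(cve_id, description, url):
    """Return (source, row); source is None when no rule matches."""
    description = description or ""  # assumption: treat a missing description as empty
    for source, matches in SOURCE_RULES:
        if matches(description):
            row = {
                "CVE ID": cve_id,
                source + " Description": description,
                source + " Url": url,
            }
            return source, row
    # No rule matched: build a row in the file2 layout instead.
    row = {
        "CVE ID": cve_id,
        "Reference Description": description,
        "Reference Url": url,
    }
    return None, row
```

With this layout, adding or reordering a source means editing one line of the table. The order still matters, because loose substring tests such as `"CERT"` or `"ISS"` would otherwise capture descriptions meant for other sources.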

In [134]:
#report the number of vulnerabilities in a parsed tree and sort their references
def module_runner(cve_tree):
    vulnerabilities = cve_tree.findall(table_root + "Vulnerability")
    print("Vulnerability data count:" + str(len(vulnerabilities)))
    for vul in vulnerabilities:
        reference_sort((vul.find(table_root + "CVE"), vul.find(table_root + "References")))

In [135]:
#parse every CVRF file and sort its references, then write both CSV files
for filename in glob.glob("CVE_XML/*.xml"):
    CVE_tree = ElementTree()
    CVE_tree.parse(filename)
    module_runner(CVE_tree)
print("Reference count: " + str(len(file1_data)))
write_file('file1.csv', file1_data, header_file1)
write_file('file2.csv', file2_data, header_file2)

Vulnerability data count:103484
Reference count: 497569
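
To help choose among sources, the total above can be broken down per source. The sketch below assumes rows shaped like those in `file1_data` (a `CVE ID` key plus `<SOURCE> Description` and `<SOURCE> Url` keys); `sample_rows` and `source_of` are hypothetical names, and the rows are made up for illustration.

```python
from collections import Counter

# Made-up rows in the same shape as the entries of file1_data.
sample_rows = [
    {"CVE ID": "CVE-0000-0001", "BID Url": "http://example.com/1", "BID Description": "BID:1"},
    {"CVE ID": "CVE-0000-0001", "MISC Url": "http://example.com/2", "MISC Description": "MISC:x"},
    {"CVE ID": "CVE-0000-0002", "BID Url": "http://example.com/3", "BID Description": "BID:2"},
]


def source_of(row):
    """Derive the source name from the row's '<SOURCE> Url' key."""
    for key in row:
        if key.endswith(" Url"):
            return key[: -len(" Url")]
    return None


# Number of references contributed by each source.
counts = Counter(source_of(row) for row in sample_rows)
```

Calling `counts.most_common()` then lists the sources from most to least frequent.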
