## 1. TRONTO from OWL document

_To speed up the making of an ontology, we will write it directly as an OWL document. This notebook shows the steps that need to be taken to achieve that goal._

The ontology we are going to load has been written directly as an OWL document (i.e. an XML document following the RDF specifications). For a reference on the OWL language, see https://www.w3.org/TR/owl-guide/ and Chapter 3 (file://Users/raranovi/Documents/Research/SemanticWeb/WOHLGENANNT_2011/j.ctv9hj8nd.5.pdf) of Wohlgenannt (2011).

The ontology models the relations between two major taxonomical domains: one for "configurations" of software and hardware and another one for vulnerabilities. Configurations have vulnerabilities (they can be understood as parts of configurations, flaws of design that render a system open to a malicious attack). We model the domains on two structured sources of information: the National Vulnerability Database (NVD) and NIST's Common Platform Enumeration (CPE).

NVD uses a slice of the Common Weakness Enumeration (CWE) to classify vulnerabilities. This slice forms the taxonomic backbone of the Vulnerabilities portion of the ontology (with an additional class for unclassified vulnerabilities: "NVDCWEnoinfo"). Individual vulnerabilities (reported in the Common Vulnerabilities and Exposures database, CVE) are the instances.

CPE offers a dictionary of identifiers (referred to here as IRIs) for a vast array of computer systems and components. The major segments of an IRI are "part", "vendor", and "product". "Part" classifies configurations into "application", "operating system", and "hardware", and gives us the first taxonomic layer of the configurations part of the ontology. Products are the instances of this part of the ontology. Products may have other attributes (e.g. vendor, language, etc.) which we may choose to model as separate classes (even though "vendor" is a higher-order attribute than "product", a product is a configuration, but a vendor is not).

https://nvd.nist.gov/General/News/NVD-CWE-Slice-Update-2019  
https://nvd.nist.gov/vuln/categories/cwe-layout  
https://cwe.mitre.org/data/definitions/1003.html  
https://nvd.nist.gov/products/cpe  
https://csrc.nist.gov/projects/security-content-automation-protocol/specifications/cpe  
https://csrc.nist.gov/publications/detail/nistir/7695/final  
https://csrc.nist.gov/publications/detail/nistir/7696/final  
https://csrc.nist.gov/publications/detail/nistir/7697/final  
https://csrc.nist.gov/publications/detail/nistir/7698/final  


The basic document has two major classes (Vulnerability and Configuration), and their sub-classes, as well as a property `has_vulnerability` that takes a Configuration as its domain and a Vulnerability as its range. We also define an inverse relation `is_vulnerability_of`. There is a second property `has_severity_score`, a data type property, that specifies the severity score of a vulnerability, and a third property `depends_on` that relates configurations to configurations and is transitive.

The goal of the ontology is to deduce the status of a configuration (e.g. an application) as vulnerable or safe, depending on the vulnerabilities that affect it or the configurations it depends on.
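The deduction described above can be pictured without any ontology tooling. The following is a minimal sketch in plain Python, assuming toy data: the configuration names, the made-up CVE identifier, and the helper names `all_dependencies` and `is_vulnerable` are all hypothetical and are not part of the ontology. It only illustrates how a transitive `depends_on` relation propagates vulnerability.

```python
# Hypothetical toy data: none of these names come from the ontology itself.
depends_on = {
    "web_app": {"web_framework"},
    "web_framework": {"os_kernel"},
    "os_kernel": set(),
    "text_editor": set(),
}
has_vulnerability = {
    "os_kernel": {"CVE-0000-0001"},  # made-up identifier
}

def all_dependencies(config, graph):
    """Return every configuration reachable through depends_on (transitive closure)."""
    seen = set()
    stack = [config]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, set()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def is_vulnerable(config, graph, vulns):
    """A configuration is vulnerable if it, or anything it depends on, has a vulnerability."""
    candidates = {config} | all_dependencies(config, graph)
    return any(vulns.get(c) for c in candidates)

status = {name: is_vulnerable(name, depends_on, has_vulnerability) for name in depends_on}
print(status)
```

In the ontology itself, this kind of inference would be left to a reasoner working on the property definitions rather than to hand-written traversal code.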

The rest of the work will consist of processing an NVD feed to enrich the ontology with instances and relations.

### Adding instances to the existing ontology

NVD entries can be retrieved in JSON format from the NVD JSON feed page:

https://nvd.nist.gov/vuln/data-feeds#JSON_FEED

JSON strings can be turned into Python dicts in a simple way:

https://www.w3schools.com/python/python_json.asp

After we do that, it is possible to extract the structured information we need to populate an ontology.
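As a minimal sketch of that conversion, the snippet below parses a deliberately tiny, made-up feed string with the standard `json` module. The string only imitates the top-level shape of the NVD feed (a dict whose `CVE_Items` key holds a list of entries); a real feed carries many more fields.

```python
import json

# A deliberately tiny, made-up feed with the same top-level shape as the NVD JSON feed
feed_string = '{"CVE_Items": [{"cve": {"CVE_data_meta": {"ID": "CVE-2019-0001"}}}]}'

feed = json.loads(feed_string)   # str -> dict
cve_items = feed["CVE_Items"]    # list of CVE entries, each one a dict
first_id = cve_items[0]["cve"]["CVE_data_meta"]["ID"]

print(type(feed).__name__, len(cve_items), first_id)
```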

The JSON feed is a dict. The `CVE_Items` key has as its value a list of CVEs. Each individual CVE is in turn a dict. Each item in this dict encodes some information about the CVE. Below is an example of one particular CVE.

This is a string including the information from one of the NVD entries.
This is a sample that we used to fine-tune the information extraction
process.

~~~~
cve_J = """{
    "cve" : {
      "data_type" : "CVE",
      "data_format" : "MITRE",
      "data_version" : "4.0",
      "CVE_data_meta" : {
        "ID" : "CVE-2019-0001",
        "ASSIGNER" : "cve@mitre.org"
      },
      "problemtype" : {
        "problemtype_data" : [ {
          "description" : [ {
            "lang" : "en",
            "value" : "CWE-400"
          } ]
        } ]
      },
      "references" : {
        "reference_data" : [ {
          "url" : "http://www.securityfocus.com/bid/106541",
          "name" : "106541",
          "refsource" : "BID",
          "tags" : [ "Third Party Advisory", "VDB Entry" ]
        }, {
          "url" : "https://kb.juniper.net/JSA10900",
          "name" : "https://kb.juniper.net/JSA10900",
          "refsource" : "CONFIRM",
          "tags" : [ "Vendor Advisory" ]
        }, {
          "url" : "https://lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/message/RMKFSHPMOZL7MDWU5RYOTIBTRWSZ4Z6X/",
          "name" : "FEDORA-2019-5f14b810f8",
          "refsource" : "FEDORA",
          "tags" : [ ]
        }, {
          "url" : "https://lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/message/W7CPKBW4QZ4VIY4UXIUVUSHRJ4R2FROE/",
          "name" : "FEDORA-2019-815807c020",
          "refsource" : "FEDORA",
          "tags" : [ ]
        } ]
      },
      "description" : {
        "description_data" : [ {
          "lang" : "en",
          "value" : "Receipt of a malformed packet on MX Series devices with dynamic vlan configuration can trigger an uncontrolled recursion loop in the Broadband Edge subscriber management daemon (bbe-smgd), and lead to high CPU usage and a crash of the bbe-smgd service. Repeated receipt of the same packet can result in an extended denial of service condition for the device. Affected releases are Juniper Networks Junos OS: 16.1 versions prior to 16.1R7-S1; 16.2 versions prior to 16.2R2-S7; 17.1 versions prior to 17.1R2-S10, 17.1R3; 17.2 versions prior to 17.2R3; 17.3 versions prior to 17.3R3-S1; 17.4 versions prior to 17.4R2; 18.1 versions prior to 18.1R3; 18.2 versions prior to 18.2R2."
        } ]
      }
    },
    "configurations" : {
      "CVE_data_version" : "4.0",
      "nodes" : [ {
        "operator" : "OR",
        "cpe_match" : [ {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.1:*:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.1:r1:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.1:r2:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.1:r3:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.1:r3-s10:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.1:r4:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.1:r5:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.1:r6:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.1:r6-s6:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.1:r7:*:*:*:*:*:*"
        } ]
      }, {
        "operator" : "OR",
        "cpe_match" : [ {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.2:*:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.2:r1:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:16.2:r2:*:*:*:*:*:*"
        } ]
      }, {
        "operator" : "OR",
        "cpe_match" : [ {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.1:*:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.1:r1:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.1:r2:*:*:*:*:*:*"
        } ]
      }, {
        "operator" : "OR",
        "cpe_match" : [ {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.2:*:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.2:r1:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.2:r1-s7:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.2:r2:*:*:*:*:*:*"
        } ]
      }, {
        "operator" : "OR",
        "cpe_match" : [ {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.3:*:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.3:r1:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.3:r2:*:*:*:*:*:*"
        } ]
      }, {
        "operator" : "OR",
        "cpe_match" : [ {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.4:*:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:juniper:junos:17.4:r1:*:*:*:*:*:*"
        } ]
      }, {
        "operator" : "OR"
      }, {
        "operator" : "OR",
        "cpe_match" : [ {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:a:juniper:junos:18.2:*:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:a:juniper:junos:18.2:r1-s3:*:*:*:*:*:*"
        }, {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:a:juniper:junos:18.2:r1-s4:*:*:*:*:*:*"
        } ]
      } ]
    },
    "impact" : {
      "baseMetricV3" : {
        "cvssV3" : {
          "version" : "3.0",
          "vectorString" : "CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H",
          "attackVector" : "NETWORK",
          "attackComplexity" : "HIGH",
          "privilegesRequired" : "NONE",
          "userInteraction" : "NONE",
          "scope" : "UNCHANGED",
          "confidentialityImpact" : "NONE",
          "integrityImpact" : "NONE",
          "availabilityImpact" : "HIGH",
          "baseScore" : 5.9,
          "baseSeverity" : "MEDIUM"
        },
        "exploitabilityScore" : 2.2,
        "impactScore" : 3.6
      },
      "baseMetricV2" : {
        "cvssV2" : {
          "version" : "2.0",
          "vectorString" : "AV:N/AC:M/Au:N/C:N/I:N/A:C",
          "accessVector" : "NETWORK",
          "accessComplexity" : "MEDIUM",
          "authentication" : "NONE",
          "confidentialityImpact" : "NONE",
          "integrityImpact" : "NONE",
          "availabilityImpact" : "COMPLETE",
          "baseScore" : 7.1
        },
        "severity" : "HIGH",
        "exploitabilityScore" : 8.6,
        "impactScore" : 6.9,
        "acInsufInfo" : false,
        "obtainAllPrivilege" : false,
        "obtainUserPrivilege" : false,
        "obtainOtherPrivilege" : false,
        "userInteractionRequired" : false
      }
    },
    "publishedDate" : "2019-01-15T21:29Z",
    "lastModifiedDate" : "2020-07-22T18:00Z"
  }"""
  ~~~~

Here are the paths to each of the pieces of information we are interested in:

+ for the ID `cve : CVE_data_meta : ID`  
+ for the CWE class `cve : problemtype : problemtype_data : [description : [value]]`  
+ for the description: `cve : description : description_data : [value]`
+ for the affected software: `configurations`, but we need to flatten its content
+ for the severity score: `impact : baseMetricV3 : cvssV3 : baseScore`
+ for the base severity : `impact : baseMetricV3 : cvssV3 : baseSeverity`

`Description` is a definition of the vulnerability. It will be treated as datatype = string in the ontology.

`baseSeverity` can be one of {NONE, LOW, MEDIUM, HIGH, CRITICAL} (see below). We are going to convert this into a scale from 0 to 4.

We are going to store each piece of information in one variable, to assemble an object later.
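The paths listed above translate directly into chained dict and list lookups. The sketch below uses a hypothetical helper, `dig`, which is not part of the notebook; it returns a default instead of raising when a step of the path is missing, which may be convenient for entries that lack some fields. The `entry` dict is an abridged copy of the sample CVE.

```python
def dig(data, *path, default=None):
    """Follow a sequence of dict keys / list indexes; return default if any step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current

# Abridged entry with the same nesting as the sample CVE
entry = {
    "cve": {
        "CVE_data_meta": {"ID": "CVE-2019-0001"},
        "problemtype": {"problemtype_data": [{"description": [{"lang": "en", "value": "CWE-400"}]}]},
    },
    "impact": {"baseMetricV3": {"cvssV3": {"baseScore": 5.9, "baseSeverity": "MEDIUM"}}},
}

cve_id = dig(entry, "cve", "CVE_data_meta", "ID")
cve_class = dig(entry, "cve", "problemtype", "problemtype_data", 0, "description", 0, "value")
cve_score = dig(entry, "impact", "baseMetricV3", "cvssV3", "baseScore")
missing = dig(entry, "impact", "baseMetricV2", "cvssV2", "baseScore")  # path absent in this entry

print(cve_id, cve_class, cve_score, missing)
```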

The **components** present several problems. There are many components listed in each CVE entry. The nested structure of the _configurations_ field is also problematic, since it varies from entry to entry. There must be a way to flatten the field and find all the `cpe23Uri` elements there (see below).

**SEVERITY**

A CVSS score can be between 0.0 and 10.0, with 10.0 being the most severe. To help convey CVSS scores to less technical stakeholders, FIRST maps CVSS scores to the following qualitative ratings:

+ 0.0 = None
+ 0.1-3.9 = Low
+ 4.0-6.9 = Medium
+ 7.0-8.9 = High
+ 9.0 - 10.0 = Critical

To store this as a rank, we will convert the levels into a numeric scale from 0 (lowest) to 4 (highest).
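A minimal sketch of both mappings follows: from a numeric score to the qualitative rating, using the thresholds listed above, and from the rating to the 0 to 4 rank. The names `SEVERITY_RANK` and `rating_from_score` are illustrative assumptions, not identifiers used elsewhere in the notebook.

```python
# Rank of each qualitative level, from 0 (lowest) to 4 (highest)
SEVERITY_RANK = {"NONE": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}

def rating_from_score(score):
    """Map a CVSS base score (0.0-10.0) to the qualitative rating listed above."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores lie between 0.0 and 10.0")
    if score == 0.0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"

rank = SEVERITY_RANK[rating_from_score(5.9)]
print(rating_from_score(5.9), rank)
```

Since the feed already provides `baseSeverity`, only the dictionary lookup is strictly needed; the score-based function is there to make the thresholds explicit.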

#### Flattening the configuration item

We need to extract all the URIs and split them into two sets: vulnerable and non-vulnerable. 

First, we turn the dict into a flat string. Then we use regular expression matching to extract all the URIs. We need to match every sub-string of the form `"{'vulnerable': True, 'cpe23Uri': 'cpe:2.3:o:juniper:junos:16.1:*:*:*:*:*:*:*'}"`

We want the pattern to be: 

`(\{'vulnerable': True, 'cpe23Uri': 'cpe:2.3:)(.*?)('\})`

The pattern will match a sub-string and split it into three groups. We are interested in the second one, which tells us the class (a, o, or h), vendor, product, and other attributes. We use the non-greedy quantifier `*?` to make sure we capture only the smallest substring (otherwise the pattern will scan to the end of the configurations item).

Now we need to capture the items we are interested in. We use findall.

`re.findall(pattern, string, flags=0)`: Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Because we are using grouping in the pattern, each returned item is a tuple. We want to extract the second element, and add it to a set to reduce duplicates.
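The steps above can be tried on a small input. The sketch below uses an abridged, partly duplicated configurations item with the same shape as the sample CVE. Note the assumption built into the pattern: it only matches when `vulnerable` immediately precedes `cpe23Uri` and these are the only two keys of the inner dict, so entries with extra keys would need a different pattern.

```python
import re

# Abridged configurations item with the same shape as the sample CVE
configurations = {
    "nodes": [
        {"operator": "OR", "cpe_match": [
            {"vulnerable": True, "cpe23Uri": "cpe:2.3:o:juniper:junos:16.1:*:*:*:*:*:*:*"},
            {"vulnerable": True, "cpe23Uri": "cpe:2.3:o:juniper:junos:16.1:r1:*:*:*:*:*:*"},
            {"vulnerable": True, "cpe23Uri": "cpe:2.3:o:juniper:junos:16.1:*:*:*:*:*:*:*"},
        ]},
    ]
}

conf_str = str(configurations)  # flatten the nested dict into one string
pattern = re.compile(r"(\{'vulnerable': True, 'cpe23Uri': 'cpe:2.3:)(.*?)('\})")

matches = pattern.findall(conf_str)          # list of 3-tuples, one per match
cpe_set = {groups[1] for groups in matches}  # keep the middle group, drop duplicates

print(len(matches), sorted(cpe_set))
```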

In [156]:
!pip install owlready2





NOTE: The CVE entries also include configurations that are **not vulnerable** with respect to a particular vulnerability. They are identified by the value `"vulnerable" : false`. A TODO task is to exploit that information.
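As an alternative to string flattening, the two sets could also be collected by walking the nested structure directly. This is a sketch of a different technique from the regular-expression approach used in this notebook. The function name `collect_cpes`, the `children` nesting, and the second CPE string (with a made-up vendor and product) are assumptions for illustration.

```python
def collect_cpes(node, vulnerable, not_vulnerable):
    """Recursively visit dicts and lists, sorting every cpe23Uri by its 'vulnerable' flag."""
    if isinstance(node, dict):
        if "cpe23Uri" in node:
            target = vulnerable if node.get("vulnerable") else not_vulnerable
            target.add(node["cpe23Uri"])
        for value in node.values():
            collect_cpes(value, vulnerable, not_vulnerable)
    elif isinstance(node, list):
        for item in node:
            collect_cpes(item, vulnerable, not_vulnerable)

# Made-up configurations item; the "children" nesting is an assumption about
# how deeper entries may be structured
configurations = {
    "nodes": [{
        "operator": "AND",
        "children": [
            {"operator": "OR", "cpe_match": [
                {"vulnerable": True, "cpe23Uri": "cpe:2.3:o:juniper:junos:17.2:r1:*:*:*:*:*:*"}]},
            {"operator": "OR", "cpe_match": [
                {"vulnerable": False, "cpe23Uri": "cpe:2.3:h:examplevendor:examplebox:-:*:*:*:*:*:*:*"}]},
        ],
    }]
}

vulnerable, not_vulnerable = set(), set()
collect_cpes(configurations, vulnerable, not_vulnerable)
print(vulnerable, not_vulnerable)
```

Because the walk does not depend on key order or on how many keys each entry has, it may be more robust to variation between entries than a pattern applied to the flattened string.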

In [158]:
import re

# The argument of this function is one entry of the CVE_Items list (a dict).
# The output is a list of fields, used later to format the OWL/XML strings.

# The patterns are compiled once, when the cell runs. They are applied to the
# flattened configurations string; the middle group holds the CPE fields.
cpe_item = re.compile(r"(\{'vulnerable': True, 'cpe23Uri': 'cpe:2.3:)(.*?)('\})")
cpe_comp_item = re.compile(r"(\{'vulnerable': False, 'cpe23Uri': 'cpe:2.3:)(.*?)('\})")

# Rank of each qualitative severity level, from 0 (lowest) to 4 (highest)
severity_rank = {"NONE": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}

def parse_cve(cve):
    cve_id = cve["cve"]["CVE_data_meta"]["ID"]  # Get the ID
    cve_class = cve["cve"]["problemtype"]["problemtype_data"][0]["description"][0]["value"]  # Get the class
    cve_description = cve["cve"]["description"]["description_data"][0]["value"]  # Get the description
    cve_score = cve["impact"]["baseMetricV3"]["cvssV3"]["baseScore"]  # Get the severity score, numeric
    # Now we get the qualitative severity and convert it to a rank
    baseSeverity = cve["impact"]["baseMetricV3"]["cvssV3"]["baseSeverity"]
    cve_severity = severity_rank[baseSeverity]

    # Flatten the configurations dict into a string
    conf_str = str(cve['configurations'])

    # Extract the matches; each match is a tuple of three groups, and the
    # second group is the item of interest. A set removes duplicates.
    cpe_set = {item[1] for item in cpe_item.findall(conf_str)}

    # Same procedure for the "complement" set of CPE configurations that are
    # NOT vulnerable, using the same flattened configurations string.
    cpe_comp_set = {item[1] for item in cpe_comp_item.findall(conf_str)}

    return [cve_id, cve_class, cve_description, cve_score, cve_severity, cpe_set, cpe_comp_set]

#### Writing to an OWL-XML file
We now need to turn each dict item into a series of XML elements for the ontology.

We are working on a backbone of classes and properties, so we need to render the individuals and relations.

~~~~
<CWE79 rdf:about="#CVE20156099">
	<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/>
</CWE79>
~~~~
We can get a CVE ID from the keys in a dict entry: 

`cve_id = list(cve_dict.keys())[0]`

This is how we format a string to then write down an instance CVE:

`vuln = """<%(class)s rdf:about="#%(id)s"> \
\n\t<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/> \
\n\t<has_severity_level rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal"> \
%(severity_level)d</has_severity_level> \
\n\t<has_description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> \
%(description)s</has_description> \
\n</%(class)s>""" \
% {"class": parsed_cve[1].replace('-', "") , \
   "id": parsed_cve[0].replace('-', '') , \
   "severity_level": parsed_cve[4], \
   "description": parsed_cve[2]}`
          
Components require an extra step. We have to parse the URI string into a class and an identifier:

`re.match(r'(.:)([^:]*):([^:]*):([^:]*):([^:]*)', cve_component).group(#)`

The groups are:
+ 1: class (a, o, h)
+ 2: vendor
+ 3: product
+ 4: version
+ 5: release

The class is the first group, the identifier is the rest.
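A small sketch of this parsing step follows. The helper name `split_cpe` is hypothetical; the pattern is the one shown above, applied to the part of a CPE string that follows the `cpe:2.3:` prefix.

```python
import re

CPE_PARTS = re.compile(r'(.:)([^:]*):([^:]*):([^:]*):([^:]*)')

def split_cpe(cpe_tail):
    """Split the part of a CPE string after 'cpe:2.3:' into its first five fields."""
    match = CPE_PARTS.match(cpe_tail)
    if match is None:
        raise ValueError("not a recognisable CPE fragment: %r" % cpe_tail)
    comp_class, vendor, product, version, release = match.groups()
    return {"class": comp_class, "vendor": vendor, "product": product,
            "version": version, "release": release}

parts = split_cpe("o:juniper:junos:16.1:r1:*:*:*:*:*:*")
identifier = ":".join([parts["vendor"], parts["product"], parts["version"], parts["release"]])
print(parts["class"], identifier)
```

Matching once and unpacking `match.groups()` avoids running the same regular expression five times.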

~~~~
<Application rdf:about="#ASPNET">
  	<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/>
	<has_vulnerability rdf:resource="#CVE20156099"/>
</Application>
~~~~

We have to cycle over all the configurations in the set, each of which is structured as a URI string. We turn the set into a list and iterate (or pick an arbitrary one to work with):

`comp_uri = list(cve_dict[cve_id][3])[5]`

~~~~
import re

# cve_dict is assumed to map each CVE ID to its list of parsed fields,
# with the set of component URIs at index 3
cve_id = list(cve_dict.keys())[0] #iterate over
vuln_id = cve_id.replace('-', '')
comp_uri = list(cve_dict[cve_id][3])[5] #iterate over

# Match once and unpack the five groups
comp_match = re.match(r'(.:)([^:]*):([^:]*):([^:]*):([^:]*)', comp_uri)
comp_class, comp_vendor, comp_product, comp_version, comp_release = comp_match.groups()
comp_id = comp_vendor + ':' + comp_product + ':' + comp_version + ':' + comp_release

if comp_class == 'a:':
    comp = """<Application rdf:about="#%(c_id)s">
\t<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/>
\t<has_vulnerability rdf:resource="#%(v_id)s"/>
</Application>""" % {"c_id" : comp_id, "v_id" : vuln_id}
~~~~

One problem is that the IRIs have a "-" character in their name. It should be removed using the string `.replace(old, new)` method. Another problem is that the periods that designate version or release numbers break the entity identifiers in Owlready, so they must be removed or replaced as well.
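One way to handle both problems at once is to replace every character outside a safe alphabet. The sketch below is an illustration under assumptions: the helper name `sanitize_identifier` and the choice of `_` as the substitute are not from the notebook, which removes hyphens instead of replacing them.

```python
import re

def sanitize_identifier(raw):
    """Replace every character that is not a letter, digit, or underscore with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", raw)

cve_name = sanitize_identifier("CVE-2019-0001")
comp_name = sanitize_identifier("juniper:junos:16.1:r1")
print(cve_name, comp_name)
```

A substitution like this is lossy: two different raw identifiers (for instance `16.1` and `16_1`) end up with the same name, so it is worth checking for collisions before relying on it.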

In [159]:
import json

In [160]:
import re

In [161]:
# mount google drive
from google.colab import drive

drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [162]:
# Load the cve feed json file. It loads it as a string
json_file = '/content/gdrive/My Drive/tronto/nvdcve-1.1-2019.json'

with open(json_file, 'r') as json_in:
    json_string = json_in.read()

# This is necessary to properly load a json feed as a python dict
json_feed = json.loads(json_string)

In [163]:
# add backbone to the beginning of document
with open('/content/gdrive/My Drive/tronto/backbone_begin.txt', 'r') as back_file:
  backbone_begin = back_file.read()


destination_file = '/content/gdrive/My Drive/tronto/tronto_instances.txt'
with open(destination_file, 'a') as outfile:
  outfile.write(backbone_begin)

In [164]:
#As we iterate over the json feed, extracting the information, 
#we format it as a string and write to file directly (no need to store 
#information in a dict). The json feed is a dict. The `CVE_Items` key 
# has as its value a **list** of CVEs.

# Since we are going to write to a file, it is a good idea to specify 
#the file where the information is going to be stored first (uncomment
#one for the desired destination):

#destination_file = '/Users/raranovi/Pywk/ontologies/tronto_instances.txt'

error_list = []

# Now we start the processing of the information from the json feed

# The loop below iterates over the members of the full list: the slice [:]
# selects every CVE in the feed. To keep the iteration short as a trial,
# narrow the slice (e.g. [:100]).
# Remember that the json feed is a dict. The 
# CVE_Items key has as its value a list of CVEs. Each individual CVE 
# is in turn a dict.

for id_num, it in enumerate(json_feed["CVE_Items"][:]): 
    
#Each member or item is a CVE entry, that needs to be parsed using the 
#`parse_cve()` function
    
    try:
        parsed_cve = parse_cve(it)   #parse the item

# The output of parse_cve is a list: [cve_id, cve_class, cve_description, 
# cve_score, cve_severity, cpe_set, cpe_comp_set]. We use the items in this
# list to format strings for writing to the OWL document. First format a
# string for the vulnerability. The fields of interest are the CVE ID and
# the CWE class. We are also including a datatype property for the
# (qualitative) severity score as severity_level, taken from baseSeverity,
# and a field for the description of the vulnerability. The last two are
# datatypes.

        # Escape the XML special characters. The ampersand must be replaced
        # first, otherwise the entities produced by the other replacements
        # would be escaped a second time.
        description = parsed_cve[2].replace('&','&amp;').replace('<','&lt;').replace('>','&gt;').replace('"','&quot;').replace("'",'&apos;')

        vuln = """<%(class)s rdf:about="#%(id)s"> \
\n\t<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/> \
\n\t<has_severity_level rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">%(severity_level)d</has_severity_level> \
\n\t<has_description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">%(description)s</has_description> \
\n</%(class)s>\n""" \
% {"class": parsed_cve[1].replace('-', "") , \
   "id": parsed_cve[0].replace('-', '') , \
   "severity_level": parsed_cve[4], \
   "description": description}

# NOTE: original formulation included a field for severity score: 
#<has_severity rdf:datatype="http://www.w3.org/2001/XMLSchema#float">%(severity_score).f1</has_severity>

# Once the vulnerability string is formatted using the XML/OWL standard 
# and stored as the value of the vuln variable, we write it to a text file.

        with open(destination_file, 'a') as outfile:
            outfile.write(vuln)

# Next format a string for the configuration(s), including the relation 
# to a vulnerability. The parse_cve function stores all the configurations
# in a set. We iterate over the members of the set to extract the fields 
# we are interested in. We join the fields vendor/product/version with
# semicolons to avoid naming conflicts with Owlready (colons designate 
# prefixes in OWL)

        for comp_uri in list(parsed_cve[5]):
            # Match the URI once and unpack the five groups
            comp_match = re.match(r'(.:)([^:]*):([^:]*):([^:]*):([^:]*)', comp_uri)
            comp_class, comp_vendor, comp_product, comp_version, comp_release = comp_match.groups()

            # Escape the XML special characters ('&' first, so that the entities
            # produced by the other replacements are not escaped again) and
            # drop backslashes
            comp_vendor = comp_vendor.replace('&','&amp;').replace('<','&lt;').replace('>','&gt;').replace('"','&quot;').replace("'",'&apos;').replace('\\','')

            comp_product = comp_product.replace('&','&amp;').replace('<','&lt;').replace('>','&gt;').replace('"','&quot;').replace("'",'&apos;').replace('\\','')

            comp_id = comp_vendor + ';' + comp_product + ';' + comp_version + ';' + comp_release #comp_vendor, comp_product -> remove special, same for description

# Replace problematic characters in CPE identifiers (in the end,
# not needed, because only alphanumeric and _ are allowed)

#        comp_id = comp_id.replace('.', '¢')
#        comp_id = comp_id.replace('-', '–')
#        comp_id = comp_id.replace('*', '¥')
        
# Next we assemble the strings, one for each component. The class field 
# determines the class of component (o, a, or h) the configuration is a daughter of.


            if comp_class == 'o:': 
                comp = """<Operating_system rdf:about="#%(c_id)s">\n\t<rdf:type \
rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/> \
\n\t<has_vulnerability rdf:resource="#%(v_id)s"/> \
\n</Operating_system>\n""" \
% {"c_id" : comp_id, "v_id" : parsed_cve[0].replace('-', "")}
            elif comp_class == 'a:': 
                comp = """<Application rdf:about="#%(c_id)s">\n\t<rdf:type \
rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/> \
\n\t<has_vulnerability rdf:resource="#%(v_id)s"/> \
\n</Application>\n""" \
% {"c_id" : comp_id, "v_id" : parsed_cve[0].replace('-', "")}
            elif comp_class == 'h:': 
                comp = """<Hardware rdf:about="#%(c_id)s">\n\t<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/> \n\t<has_vulnerability rdf:resource="#%(v_id)s"/> \n</Hardware>\n""" \
% {"c_id" : comp_id, "v_id" : parsed_cve[0].replace('-', "")}

#  This will write each of the components with the vulnerability to a file

            with open(destination_file, 'a') as outfile:
                outfile.write(comp)
    
    except IndexError:
        error_list.append('IndexError: list index out of range in CVE %d' % (id_num) + " = " + str(it))
    except KeyError as err:
        error_list.append("KeyError: %s in CVE %d" % (err, id_num))
            
# NOTE: Some CVEs raise "IndexError: list index out of range" when we try
# to get the class, because their problemtype description list is empty
# (e.g. rejected candidates). Others raise "KeyError: 'baseMetricV3'".
# The try block above catches both exceptions and records them in error_list.

In [165]:
len(error_list)

992

In [166]:
error_list[:5]

["IndexError: list index out of range in CVE 33 = {'cve': {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', 'CVE_data_meta': {'ID': 'CVE-2019-0034', 'ASSIGNER': 'cve@mitre.org'}, 'problemtype': {'problemtype_data': [{'description': []}]}, 'references': {'reference_data': []}, 'description': {'description_data': [{'lang': 'en', 'value': '** REJECT ** DO NOT USE THIS CANDIDATE NUMBER. Reason: This candidate was withdrawn by its CNA. Further investigation showed that it was not a vulnerability. Notes: Google gRPC credentials were found which existed for specific internal product testing purposes which are not used as part of production releases of Junos OS. Hence this is not a vulnerability and this CVE ID assignment has been withdrawn.'}]}}, 'configurations': {'CVE_data_version': '4.0', 'nodes': []}, 'impact': {}, 'publishedDate': '2019-04-10T20:29Z', 'lastModifiedDate': '2019-04-15T12:31Z'}",
 "IndexError: list index out of range in CVE 122 = {'cve': {'data_type': 'CVE

In [167]:
# write the backbone tag to end of destination file
with open(destination_file, 'a') as outfile:
  outfile.write("</rdf:RDF>")

# now, all that's left is to rename the file to -.owl

Once we have a list of owl/xml formatted instances, it can be appended to the taxonomic backbone to get a full-fledged ontology.

## 2. Loading and exploring an ontology

The resulting ontology can be loaded into Owlready (a Python module to handle ontologies) with the `get_ontology().load()` command. Once loaded, it can be searched for its classes, properties, and individuals. To do so, use the `.classes()`, `.properties()`, and `.individuals()` methods of the ontology. These methods return a generator, which cannot be displayed or printed by itself. To see the output, pass the generator to `list()`, or iterate over it and `print()` each item.

The previous methods help us find the entities and relations in the ontology. Now we need to find out how they are related, i.e. the taxonomy and other relations. 

+ The `.is_a` attribute of any entity (class, property, individual) reveals the taxonomy going upwards.
+ The `.instances()` and `.subclasses()` methods show the taxonomy going downwards.
+ The `.ancestors()` and `.descendants()` methods have similar functionality.
+ The `.get_properties()` method returns the properties associated with an individual or instance; the `.get_class_properties()` method does the same for a class.
+ The `.get_relations()` method returns the subject/object pairs for a given property.
+ A property can be used as an attribute of an individual to find the range in the relation.
+ Properties have `.domain` and `.range` attributes, to query their arguments.

Below I run examples of these query methods.

Another way to query or inspect an ontology is to run **searches**, using the .search() method of the ontology. A search needs a keyword and a value (the value can have wildcards). E.g.:

`onto.search(type = onto.Vulnerability)`

The first steps are to import Owlready and register the path to the local directory where the repository for ontologies is kept.

In [173]:
# look at specific lines to find xml parsing errors
with open("/content/gdrive/My Drive/tronto/tronto_f.owl") as fp:
    for i, line in enumerate(fp):
        if i == 143903:
            print(line)
            break

</Operating_system>
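
Rather than inspecting a known line number by hand, the XML parser itself can report where a well-formedness error occurs. The following is a minimal sketch that uses the standard library's `xml.etree.ElementTree` instead of Owlready; the helper `find_xml_error` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_xml_error(text):
    """Return the (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position  # tuple (line, column); lines are counted from 1
    return None


# Hypothetical samples: the second one contains an unescaped "&" on line 2.
well_formed = "<root>\n  <label>AT and T</label>\n</root>"
broken = "<root>\n  <label>AT&T</label>\n</root>"

print(find_xml_error(well_formed))
print(find_xml_error(broken))
```

For a file as large as the ontology, `ET.parse(path)` raises the same `ParseError`, so the reported position could be used in place of a hard-coded line number.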



In [174]:
from owlready2 import *

#Path to local repository
onto_path.append('/content/gdrive/My Drive/tronto/')

In [175]:
# Specify which ontology we want to load, by IRI
currentOnto = "http://tronto_f.owl"

In [176]:
#load ontology from owl document using IRI
onto = get_ontology(currentOnto).load()

NOTE: There were some errors in the XML/OWL file, in the designators for the configurations. `//"` (used for "inch") created errors because the quote character was interpreted as the end of a string. `//&` also created errors and had to be replaced by "and".
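
An alternative to replacing the offending characters by hand is to escape them when the XML is generated. The sketch below uses `escape` and `quoteattr` from the standard library's `xml.sax.saxutils`; the helper `labelled_element`, the attribute name `label`, and the sample designator are hypothetical. It only covers attribute values and element text. Whether the escaped characters are acceptable inside the IRIs of this ontology is a separate question that the sketch does not settle.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def labelled_element(tag, label):
    """Build <tag label=...>label</tag> with the label escaped for XML."""
    # quoteattr adds the surrounding quotes and escapes &, < and >;
    # escape does the same for element text, without adding quotes.
    return f"<{tag} label={quoteattr(label)}>{escape(label)}</{tag}>"


raw_label = 'AT&T 15" display'  # hypothetical designator with both characters
xml_text = labelled_element("Hardware", raw_label)

# The escaped text parses cleanly and round-trips to the original string.
parsed = ET.fromstring(xml_text)
print(parsed.get("label"))
```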

In [None]:
# Start to explore the ontology
print(onto.base_iri)

In [None]:
list(onto.classes())

In [None]:
list(onto.properties())

In [None]:
onto.CWE74.is_a

In [None]:
onto.CWE74.descendants()

In [None]:
onto.Configuration.ancestors()

In [None]:
# List the instances. Careful: very long!
list(onto.individuals())

Most of the identifiers for the configuration instances contain characters that are not legal in Python names (e.g. `;` and `*`), so they cannot be reached with dot notation. Before querying them, we assign each one to a legally named variable, looking the instance up by its IRI in the `IRIS` pseudo-dictionary.

In [177]:
# Assign individual to variable through IRIS name
junos_17_2 = IRIS['http://tronto_f.owl#juniper;junos;17.2;*']

In [178]:
junos_17_2.is_a

[tronto_f.Operating_system]

In [None]:
onto.CVE20190001.is_a

In [None]:
print(onto.CVE20190001.has_description[0])

In [None]:
onto.CVE20190001.has_severity_level[0]

In [None]:
onto.CVE20190002.has_severity_level[0]

In [None]:
print(onto.CVE20190002.is_vulnerability_of)

In [None]:
# Assign individual to variable through IRIS name
junos_18_2 = IRIS['http://tronto_f.owl#juniper;junos;18.2;*']

In [None]:
onto.CWE400.is_a

In [None]:
junos_17_2.has_vulnerability

#### Adding a configuration of interest
But what if we want to know about a configuration that is not in the original NIST list? Let's say, an application developed by a team of OSS programmers. We can enter it in the system as an instance and list its dependencies. If one of the dependencies has a vulnerability, the reasoner will conclude that the application is vulnerable as well.

The entry will look like this:

~~~~
<Application rdf:about="#my_application">
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/>
  <depends_on rdf:resource="#juniper;junos;17.4;*"/>
</Application>
~~~~

We can do it dynamically using the methods defined in Owlready:

In [None]:
myApp = onto.Application("my_application", depends_on = [junos_18_2])

In [None]:
myApp.is_a

In [None]:
myApp.depends_on

The last step is to save the updated ontology if we so desire.

In [None]:
# Save the updated configuration
# onto.save(file = "tronto_c1")

### Re-evaluating a configuration

Current version of Junos (as of 2/5/2021): 20.4

We want to "update" the dependencies of the application to remove vulnerable dependencies and replace them with non-vulnerable ones.

PROBLEM: Once the app is classified as vulnerable because of a dependency, how do we reclassify it when the dependency is "patched"? 

One possibility: if a configuration has no vulnerable dependencies, then it is safe. But this would make all of our configurations safe, since they are just listed as unsafe.

We may need to make a distinction between "safe" and "secure". Safety is an inherent property of a configuration. A configuration whose code has a vulnerability is unsafe. Security is a systemic property: a configuration is secure if it is safe and it has no unsafe dependencies. 
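
The distinction can be sketched in plain Python, independently of Owlready and of the reasoner. Everything in this sketch is an assumption made for illustration: the dictionaries stand in for the `depends_on` and `has_vulnerability` relations, and the names `is_safe`, `is_secure`, and `some_vulnerability` are hypothetical.

```python
def transitive_dependencies(config, depends_on):
    """Return every configuration reachable from `config` via depends_on."""
    seen = set()
    stack = list(depends_on.get(config, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, ()))
    return seen


def is_safe(config, vulnerabilities):
    """Inherent property: the configuration itself has no vulnerability."""
    return not vulnerabilities.get(config)


def is_secure(config, depends_on, vulnerabilities):
    """Systemic property: safe, and every transitive dependency is safe."""
    return is_safe(config, vulnerabilities) and all(
        is_safe(dep, vulnerabilities)
        for dep in transitive_dependencies(config, depends_on)
    )


# Toy data standing in for the ontology's relations.
depends_on = {"my_application": ["junos_18_2"], "junos_18_2": []}
vulnerabilities = {"junos_18_2": ["some_vulnerability"]}

before_patch = is_secure("my_application", depends_on, vulnerabilities)
vulnerabilities["junos_18_2"] = []  # "patch" the dependency
after_patch = is_secure("my_application", depends_on, vulnerabilities)
print(before_patch, after_patch)
```

Because `is_secure` is recomputed from the current relations on every call, patching a dependency changes the answer without any stored classification having to be retracted.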

OR: Once we remove a "has_vulnerability" relation from an instance, will it be re-classified as not-a-vulnerable configuration? Does the reasoner destroy is-a relations? 

### Modifying a relation
_NOTE: restart the whole process from here._

To modify a relation, assign a new list of values to it; to delete it, set it to the empty list:

`individual.relation = []`


In [179]:
from owlready2 import *

#Path to local repository
onto_path.append('/content/gdrive/My Drive/tronto/')

In [180]:
# Specify which ontology we want to load, by IRI
currentOnto = "http://tronto_f.owl"

In [181]:
#load ontology from owl document using IRI
onto = get_ontology(currentOnto).load()

In [None]:
list(onto.classes())

In [None]:
print(onto.Operating_system.instances())

The "local" solution does not work because the restrictions on the class "vulnerable configuration" have the form of a conditional. Removing the antecedent does not affect the consequent once the truth of the consequent is established. 

There are two avenues:  
a) Adopt a "global" solution, where the status of an application as vulnerable or not is evaluated on the basis of the totality of relations the application has to other instances.
b) Modify the restrictions on the class "vulnerable configuration". This may require setting up a separate class "secure configuration", which is disjoint with the former.

A property name can be prefixed by `INDIRECT_` to obtain all indirectly related entities. It takes into account:

+ transitive, symmetric and reflexive properties,
+ property inheritance (i.e. subproperties),
+ classes of an individual (i.e. values asserted at the class level),
+ class inheritance (i.e. parent classes),
+ equivalences (i.e. equivalent classes, identical "same-as" individuals, …).



In [None]:
# Modified function that queries the ontology and returns an
# evaluation of the application based on global properties.
# It checks whether the app itself or any of its dependencies has a
# "has_vulnerability" property that targets a vulnerability. It
# exploits the "INDIRECT_" prefix to generate a list of all the
# dependencies of a configuration, closed under the transitivity of
# the "depends_on" property.

def tell_me_if_vulnerable(app):
    # transitive list of dependencies, plus the target app itself
    configurations = list(app.INDIRECT_depends_on)
    configurations.append(app)

    # collect every vulnerability found in any of the configurations
    vulnerabilities = [v for c in configurations for v in c.has_vulnerability]
    if not vulnerabilities:
        print("No, it is not vulnerable, AFAIK")
        return

    print("Yes, it is vulnerable")

    # Next, see if any vulnerabilities are CRITICAL (severity above 3.0).
    # Vulnerabilities without a recorded severity level are skipped,
    # and the warning is printed only once.
    if any(v.has_severity_level and v.has_severity_level[0] > 3.0
           for v in vulnerabilities):
        print("WARNING: One or more dependencies have CRITICAL vulnerabilities!")
            

In [None]:
tell_me_if_vulnerable(junos_17_2)

In [None]:
tell_me_if_vulnerable(junos_18_2)

In [None]:
tell_me_if_vulnerable(myApp)

Now let's patch the dependency

In [None]:
# remove the vulnerability from the range
junos_18_2.has_vulnerability = []

In [None]:
tell_me_if_vulnerable(myApp)

In [None]:
myApp.depends_on