# Agenda

- Enriching data with external knowledge


## A real-life example: Kubernetes infrastructure

Kubernetes is a resource orchestrator where you
describe resources in terms of
container images, RAM, CPU and network.

In [None]:
!cat guestbook-all-in-one.yaml

[d3fendtools](https://github.com/par-tec/d3fend-tools)
converts a Kubernetes YAML file to an RDF graph.
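
The d3fendtools conversion code is not shown in this notebook. To give a rough idea of what such a conversion involves, here is a hypothetical, stdlib-only sketch that maps one manifest, already parsed into a Python dict, to (subject, predicate, object) triples. The URI scheme and the choice of triples are assumptions loosely modelled on the URIs and predicates (`urn:k8s:hasChild`, `urn:k8s:hasImage`, `rdf:type`, `rdfs:label`) visible in this notebook's outputs; this is not the d3fendtools implementation.

```python
# Hypothetical sketch, NOT the d3fendtools implementation: it maps one
# parsed Kubernetes manifest (a plain dict) to (subject, predicate, object) triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
K8S = "urn:k8s:"


def manifest_to_triples(manifest: dict) -> list:
    kind = manifest["kind"]
    name = manifest["metadata"]["name"]
    namespace = manifest["metadata"].get("namespace", "default")
    # Assumed URI scheme, modelled on URIs such as
    # https://k8s.local/default_/Service/frontend seen in later outputs.
    subject = f"https://k8s.local/{namespace}_/{kind}/{name}"
    triples = [
        (subject, RDF_TYPE, K8S + kind),
        (subject, RDFS_LABEL, name),
    ]
    # Workloads such as Deployments list their containers under spec.template.spec.
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        container_uri = f"{subject}/{container['name']}"
        triples.append((subject, K8S + "hasChild", container_uri))
        triples.append((container_uri, RDF_TYPE, K8S + "Container"))
        triples.append((container_uri, K8S + "hasImage", container["image"]))
    return triples


# A trimmed-down, hypothetical Deployment, shaped like yaml.safe_load output.
deployment = {
    "kind": "Deployment",
    "metadata": {"name": "frontend"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "gcr.io/google-samples/gb-frontend"},
    ]}}},
}

for triple in manifest_to_triples(deployment):
    print(triple)
```

A real converter would also handle Services, ports, namespaces and the link to D3FEND classes; the point here is only that each manifest field becomes one or more triples.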

Let's load one such graph.

In [1]:
from rdflib import Dataset
d = Dataset(store='Oxigraph', default_union=True)
kube = d.graph("urn:my_app")
kube.parse("guestbook.ttl", format="turtle")

<Graph identifier=urn:my_app (<class 'rdflib.graph.Graph'>)>

Exercise: display the graph using `tools.plot_graph`

In [2]:
import tools
tools.plot_graph(d.graph("urn:my_app"), limit=50)

[Plot: the `urn:my_app` graph (limit=50); edges are labelled with predicates such as `urn:k8s:hasChild`, `urn:k8s:hasNamespace`, `urn:k8s:hasPort`, `urn:k8s:hasHost`, `urn:k8s:portForward`, `urn:k8s:hasImage`, `d3f:runs`, `rdf:type` and `rdfs:label`.]

## D3FEND knowledge graph

D3FEND is a cybersecurity knowledge graph
containing a taxonomy of:

- digital artifacts (e.g., Server, Database);
- defensive techniques (e.g., Multifactor Authentication, Network Isolation, File Analysis);
- offensive techniques (e.g., Phishing, Content Injection).

Let's load the D3FEND graph.

In [3]:
d.bind("d3f", "http://d3fend.mitre.org/ontologies/d3fend.owl#")
d3fend = d.graph("http://d3fend.mitre.org/ontologies/d3fend.owl")
d3fend.parse("d3fend.ttl", format="ox-turtle")

<Graph identifier=http://d3fend.mitre.org/ontologies/d3fend.owl (<class 'rdflib.graph.Graph'>)>

Exercise:

- list digital artifacts

In [4]:
q = """
SELECT DISTINCT ?artifact
WHERE {
  ?artifact rdfs:subClassOf d3f:DigitalArtifact .
}
LIMIT 10
"""
result = d.query(q)
list(result)

[(rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#DigitalInformationBearer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#DigitalInformation'),)]

- use the `rdfs:subClassOf` predicate to list the
direct subclasses of `d3f:Server`

In [5]:
q = """
SELECT DISTINCT ?artifact
WHERE {
  ?artifact rdfs:subClassOf d3f:Server .
}
"""
result = d.query(q)
list(result)

[(rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#MailServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#WebServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#ProxyServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#OrchestrationServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#NetworkTimeServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#VPNServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#TFTPServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#PrintServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#MediaServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#FileServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologies/d3fend.owl#DatabaseServer'),),
 (rdflib.term.URIRef('http://d3fend.mitre.org/ontologie

Now we use `CONSTRUCT` to build a graph of the whole subclass hierarchy below `d3f:Server`.

In [6]:
q = """
CONSTRUCT {
  ?artifact rdfs:subClassOf ?parent .
}
WHERE {
  ?artifact rdfs:subClassOf ?parent .
  ?parent rdfs:subClassOf* d3f:Server .
}
"""
servers = d.query(q).graph
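
The `*` in `rdfs:subClassOf*` is a SPARQL property path meaning "zero or more hops". That is why the earlier query on `d3f:DigitalArtifact`, which used plain `rdfs:subClassOf`, returned only two direct subclasses, and why `?parent rdfs:subClassOf* d3f:Server` also matches `d3f:Server` itself. The stdlib-only sketch below mimics both semantics and the `CONSTRUCT` above on a handful of toy edges; the data is illustrative (`ex:MyWebServer` is a made-up class), not an extract of D3FEND.

```python
from collections import defaultdict, deque

# Toy (child, parent) rdfs:subClassOf edges; ex:MyWebServer is hypothetical.
SUBCLASS_OF = [
    ("d3f:WebServer", "d3f:Server"),
    ("d3f:MailServer", "d3f:Server"),
    ("d3f:Server", "d3f:DigitalArtifact"),
    ("ex:MyWebServer", "d3f:WebServer"),
]


def direct_subclasses(root, edges):
    """?c rdfs:subClassOf root: exactly one hop."""
    return {child for child, parent in edges if parent == root}


def subclasses_star(root, edges):
    """?c rdfs:subClassOf* root: zero or more hops, so root itself is included."""
    children = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)
    seen, queue = {root}, deque([root])
    while queue:  # breadth-first walk down the hierarchy
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def construct_hierarchy(root, edges):
    """Mirror of the CONSTRUCT: keep edges whose parent is root or one of its subclasses."""
    below = subclasses_star(root, edges)
    return [(child, parent) for child, parent in edges if parent in below]


print(sorted(direct_subclasses("d3f:Server", SUBCLASS_OF)))
# ['d3f:MailServer', 'd3f:WebServer']
print(sorted(subclasses_star("d3f:Server", SUBCLASS_OF)))
# ['d3f:MailServer', 'd3f:Server', 'd3f:WebServer', 'ex:MyWebServer']
print(construct_hierarchy("d3f:Server", SUBCLASS_OF))
```

Note that the edge `d3f:Server rdfs:subClassOf d3f:DigitalArtifact` is not kept: its parent lies above the root, not below it.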

Exercise: display the graph of subclasses of `d3f:Server`

In [7]:
import tools
tools.plot_graph(servers)

[Plot: the `servers` graph, i.e. the `d3f:Server` subclass hierarchy; every edge is `rdfs:subClassOf`.]

- list defensive techniques that are subclasses of `d3f:FileAnalysis`
  (including `d3f:FileAnalysis` itself, since `*` also matches zero steps),
  together with their `d3f:definition`s

In [8]:
q = """
SELECT DISTINCT
    ?technique
    ?definition
WHERE {
  ?technique rdfs:subClassOf* d3f:FileAnalysis ;
    d3f:definition ?definition .
}
"""

for r in d.query(q):
    print(r.technique, r.definition)

http://d3fend.mitre.org/ontologies/d3fend.owl#FileContentRules Employing a pattern matching rule language to analyze the content of files.
http://d3fend.mitre.org/ontologies/d3fend.owl#FileHashing Employing file hash comparisons to detect known malware.
http://d3fend.mitre.org/ontologies/d3fend.owl#FileAnalysis File Analysis is an analytic process to determine a file's status. For example: virus, trojan, benign, malicious, trusted, unauthorized, sensitive, etc.
http://d3fend.mitre.org/ontologies/d3fend.owl#DynamicAnalysis Executing or opening a file in a synthetic "sandbox" environment to determine if the file is a malicious program or if the file exploits another program such as a document reader.
http://d3fend.mitre.org/ontologies/d3fend.owl#EmulatedFileAnalysis Emulating instructions in a file looking for specific patterns.
http://d3fend.mitre.org/ontologies/d3fend.owl#FileContentAnalysis Employing a pattern matching algorithm to statically analyze the content of files.


Now, let's list the defensive techniques associated with the `d3f:Server` artifact.

In [9]:
q = """
SELECT DISTINCT
    ?technique
    ?artifact
WHERE {
  ?technique rdfs:subClassOf* d3f:DefensiveTechnique .
  ?artifact rdfs:subClassOf* d3f:Server .
  
  ?technique ?protects ?artifact .
}
"""
result = d.query(q)
for r in result:
    print(f"{r.technique} protects {r.artifact}")

http://d3fend.mitre.org/ontologies/d3fend.owl#EmailRemoval protects http://d3fend.mitre.org/ontologies/d3fend.owl#MailServer
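
In the query above, `?protects` is an ordinary variable in predicate position, so it matches *any* property linking a defensive technique to a server artifact; the name does not restrict the relation. The toy triple-pattern matcher below (stdlib only, with made-up `ex:` techniques and predicates) shows how a single pattern with a variable predicate binds to whatever property is present.

```python
# Toy triples; the ex: names are made up and are not D3FEND classes or properties.
TRIPLES = [
    ("ex:TechniqueA", "ex:isolates", "d3f:MailServer"),
    ("ex:TechniqueB", "ex:analyzes", "d3f:WebServer"),
    ("ex:TechniqueB", "rdfs:label", "Technique B"),
]


def match(pattern, triples):
    """Return one binding dict per triple that matches; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value everywhere.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # no break: every position matched
            results.append(binding)
    return results


# ?protects is just a name: it binds to whichever predicate is present.
for b in match(("?technique", "?protects", "d3f:MailServer"), TRIPLES):
    print(b)
# {'?technique': 'ex:TechniqueA', '?protects': 'ex:isolates'}
```

To keep only one specific relation, the variable would be replaced by a concrete property IRI in the SPARQL pattern.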


Exercise:

- show the `rdfs:label` of the `?technique`

In [10]:
q = """
SELECT DISTINCT
    ?technique_label
    ?artifact
WHERE {
  ?artifact rdfs:subClassOf* d3f:Server .

  ?technique rdfs:subClassOf* d3f:DefensiveTechnique .
  ?technique ?protects ?artifact .
  ?technique rdfs:label ?technique_label .
}
"""
result = d.query(q)
for r in result:
    print(f"{r.technique_label} protects {r.artifact}")

- replace `d3f:DefensiveTechnique` with `d3f:OffensiveTechnique`
  and list offensive techniques that affect the `d3f:Server` artifact

In [12]:
q = """
SELECT DISTINCT
    ?techniqueLabel
    ?artifact
WHERE {
  ?artifact rdfs:subClassOf* d3f:Server .
  
  ?technique rdfs:subClassOf* d3f:OffensiveTechnique ;
    rdfs:label ?techniqueLabel ;
    ?affects ?artifact
  .
}
"""
result = d.query(q)
for r in result:
    print(f"{r.techniqueLabel} attacks {r.artifact}")

Web Shell attacks http://d3fend.mitre.org/ontologies/d3fend.owl#WebServer
Transport Agent attacks http://d3fend.mitre.org/ontologies/d3fend.owl#MailServer
Remote Email Collection attacks http://d3fend.mitre.org/ontologies/d3fend.owl#MailServer


## Putting it all together

Our dataset now contains two graphs:

- `urn:my_app` contains our application infrastructure;
- the D3FEND graph contains cybersecurity knowledge, including
  artifacts, offensive and defensive techniques.

```mermaid
graph LR

subgraph dataset
  subgraph d3fend 
  d3f:Credential -.-> d3f:DigitalArtifact
  d3f:WebServer -.-> d3f:Server -.-> d3f:DigitalArtifact
  d3f:CredentialRotation([d3f:CredentialRotation]) ==>|defends| d3f:Credential
  d3f:UnsecuredCredential>d3f:UnsecuredCredential] ==>|attacks| d3f:Credential
  end

  subgraph kube 
  db_password -.->|a| k8s:Secret
  app -.->|a| k8s:Deployment
  end
end
```
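
The `Dataset` at the top of this notebook was created with `default_union=True`; in rdflib this makes the default graph behave as the union of all named graphs, so a query without `GRAPH` clauses can follow a path that starts in `urn:my_app` and continues in the D3FEND graph. The stdlib-only sketch below models a dataset as a dict of named graphs holding toy triples, and shows that the path from `k8s:Secret` up to `d3f:DigitalArtifact` exists only in the union.

```python
# Toy dataset: graph name -> set of (subject, predicate, object) triples.
dataset = {
    "urn:my_app": {
        ("k8s:Secret", "rdfs:subClassOf", "d3f:Credential"),
    },
    "http://d3fend.mitre.org/ontologies/d3fend.owl": {
        ("d3f:Credential", "rdfs:subClassOf", "d3f:DigitalArtifact"),
    },
}


def superclasses(cls, triples):
    """Classes reachable from cls via rdfs:subClassOf+ (one or more hops)."""
    found, frontier = set(), {cls}
    while frontier:
        step = {o for s, p, o in triples if p == "rdfs:subClassOf" and s in frontier}
        frontier = step - found  # only expand classes not seen before
        found |= step
    return found


# Union of all named graphs, i.e. what a default_union default graph exposes.
union = set().union(*dataset.values())

print(sorted(superclasses("k8s:Secret", union)))
# ['d3f:Credential', 'd3f:DigitalArtifact']
print(sorted(superclasses("k8s:Secret", dataset["urn:my_app"])))
# ['d3f:Credential']
```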

The `kube` graph contains not only the Kubernetes resource types...

In [16]:
q = """
SELECT DISTINCT
  ?kube
WHERE {
  # Kubernetes resources.
  ?kube rdfs:subClassOf* k8s:Kind .
}
"""
[str(x[0]) for x in kube.query(q)]

['urn:k8s:Job',
 'urn:k8s:HorizontalPodAutoscaler',
 'urn:k8s:Workload',
 'urn:k8s:Pod',
 'urn:k8s:Host',
 'urn:k8s:Endpoints',
 'urn:k8s:CronJob',
 'urn:k8s:DeploymentConfig',
 'urn:k8s:Node',
 'urn:k8s:ReplicaSet',
 'urn:k8s:ImageStream',
 'urn:k8s:Port',
 'urn:k8s:Deployment',
 'urn:k8s:Secret',
 'urn:k8s:BuildConfig',
 'urn:k8s:PersistentVolumeClaim',
 'urn:k8s:DaemonSet',
 'urn:k8s:StatefulSet',
 'urn:k8s:Image',
 'urn:k8s:Service',
 'urn:k8s:Kind',
 'urn:k8s:Container',
 'urn:k8s:Cluster',
 'urn:k8s:ConfigMap',
 'urn:k8s:Route',
 'urn:k8s:Namespace',
 'urn:k8s:Application']

...but also links to the D3FEND graph.

In [23]:
q = """
PREFIX k8s: <urn:k8s:>
PREFIX d3f: <http://d3fend.mitre.org/ontologies/d3fend.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT
  ?kube
  ?d3fend
WHERE {
  # Kubernetes resources.
  GRAPH <urn:my_app> {
    ?kube rdfs:subClassOf* k8s:Kind ;
        rdfs:subClassOf ?d3fend 
    .
  }

  # D3FEND resources.
  GRAPH <http://d3fend.mitre.org/ontologies/d3fend.owl>{
    ?d3fend rdfs:subClassOf* d3f:DigitalArtifact .
  }
}
"""
for r in d.query(q):
    print(r.kube, "is a", d.namespace_manager.curie(r.d3fend))

urn:k8s:Service is a d3f:IntranetNetworkTraffic
urn:k8s:Secret is a d3f:Credential
urn:k8s:Secret is a d3f:ConfigurationResource
urn:k8s:Route is a d3f:InternetNetworkTraffic
urn:k8s:Port is a d3f:NetworkService
urn:k8s:PersistentVolumeClaim is a d3f:Volume
urn:k8s:Node is a d3f:Server
urn:k8s:ImageStream is a d3f:ContainerImage
urn:k8s:Image is a d3f:ContainerImage
urn:k8s:Host is a d3f:NetworkNode
urn:k8s:Endpoints is a d3f:NetworkService
urn:k8s:DeploymentConfig is a d3f:ApplicationConfiguration
urn:k8s:Deployment is a d3f:ApplicationConfiguration
urn:k8s:Container is a d3f:ContainerProcess
urn:k8s:ConfigMap is a d3f:ConfigurationResource
urn:k8s:Cluster is a d3f:ContainerOrchestrationSoftware
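
The two `GRAPH` blocks in the query above share the variable `?d3fend`, so their solutions are joined on it. The first block alone may also bind `?d3fend` to superclasses that are not D3FEND artifacts (for example `urn:k8s:Kind`, if a kind is declared as its direct subclass); the second block keeps only values that are subclasses of `d3f:DigitalArtifact`. A minimal hash-join sketch over toy bindings (a real SPARQL engine joins on every shared variable, here there is just one):

```python
from collections import defaultdict

# Toy solutions of each GRAPH block, as variable -> value bindings.
kube_rows = [  # ?kube rdfs:subClassOf ?d3fend, inside urn:my_app
    {"kube": "urn:k8s:Secret", "d3fend": "d3f:Credential"},
    {"kube": "urn:k8s:Secret", "d3fend": "urn:k8s:Kind"},
    {"kube": "urn:k8s:Node", "d3fend": "d3f:Server"},
]
d3fend_rows = [  # ?d3fend rdfs:subClassOf* d3f:DigitalArtifact, inside D3FEND
    {"d3fend": "d3f:Credential"},
    {"d3fend": "d3f:Server"},
]


def hash_join(left, right, var):
    """Join two lists of bindings on a shared variable."""
    index = defaultdict(list)
    for row in right:
        index[row[var]].append(row)
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[var], [])]


for row in hash_join(kube_rows, d3fend_rows, "d3fend"):
    print(row["kube"], "is a", row["d3fend"])
# urn:k8s:Secret is a d3f:Credential
# urn:k8s:Node is a d3f:Server
```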


So we get links between our application's resource types
and the D3FEND knowledge base.

```mermaid
graph LR

subgraph dataset
  subgraph d3fend 
  d3f:Credential -.-> d3f:DigitalArtifact
  d3f:ContainerImage -.-> d3f:Server -.-> d3f:DigitalArtifact
  d3f:CredentialRotation([d3f:CredentialRotation]) ==>|defends| d3f:Credential
  d3f:UnsecuredCredential>d3f:UnsecuredCredential] ==>|attacks| d3f:Credential
  d3f:Container
  end

  subgraph kube 
  db_password -.->|a| k8s:Secret
  app -->|a| k8s:Deployment
  end

  k8s:Secret -.-> d3f:Credential
  k8s:Deployment --> d3f:Container --> d3f:ContainerImage
end
```

For example, we can ask
which classes of attack target
our components.

In [None]:
attack_surface = d.query("""
SELECT DISTINCT
  ?attack_label
  ?affects
  ?artifact
  ?kube_resource
WHERE {
  # Get digital artifacts associated with Kubernetes resources.
  ?kind rdfs:subClassOf* k8s:Kind ;
          rdfs:subClassOf ?artifact .
                         
  # Kubernetes resource type is ?kind 
  ?kube_resource a ?kind .
                         
  # Get the associated attacks.
  ?attack ?affects ?artifact .
  
  # With their data.
  ?attack
    d3f:attack-id ?attack_id;
    rdfs:label ?attack_label
  .
}
""")

for attack in sorted(attack_surface):
    print(f"{attack.attack_label}, "
          f"{attack.affects.fragment} {attack.artifact.fragment} for {attack.kube_resource}")


Adversary-in-the-Middle, produces NetworkTraffic for https://k8s.local/default_/Service/frontend
Adversary-in-the-Middle, produces NetworkTraffic for https://k8s.local/default_/Service/redis-master
Adversary-in-the-Middle, produces NetworkTraffic for https://k8s.local/default_/Service/redis-replica
Application Access Token, may-produce NetworkTraffic for https://k8s.local/default_/Service/frontend
Application Access Token, may-produce NetworkTraffic for https://k8s.local/default_/Service/redis-master
Application Access Token, may-produce NetworkTraffic for https://k8s.local/default_/Service/redis-replica
CMSTP, may-produce NetworkTraffic for https://k8s.local/default_/Service/frontend
CMSTP, may-produce NetworkTraffic for https://k8s.local/default_/Service/redis-master
CMSTP, may-produce NetworkTraffic for https://k8s.local/default_/Service/redis-replica
Emond, modifies ConfigurationResource for https://k8s.local/default_/Deployment/frontend
Emond, modifies ConfigurationResource for ht
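
The attack-surface query chains three hops: from a resource to its kind (`a`), from the kind to a D3FEND artifact (`rdfs:subClassOf`), and from the artifact to any technique carrying a `d3f:attack-id` that points at it through some `?affects` property. The nested-loop sketch below walks that chain over toy dictionaries (made-up attack labels, resources and `ex:` properties) and shows why one resource can produce several rows: one per artifact and per relation.

```python
# Toy stand-ins for the three triple patterns the query joins.
resource_kind = {  # ?kube_resource a ?kind
    "svc/frontend": "urn:k8s:Service",
    "secret/db": "urn:k8s:Secret",
}
kind_artifacts = {  # ?kind rdfs:subClassOf ?artifact
    "urn:k8s:Service": ["d3f:NetworkTraffic"],
    "urn:k8s:Secret": ["d3f:Credential", "d3f:ConfigurationResource"],
}
artifact_attacks = {  # ?attack ?affects ?artifact, as (attack_label, affects)
    "d3f:NetworkTraffic": [("Attack A", "ex:produces")],
    "d3f:Credential": [("Attack B", "ex:accesses")],
    "d3f:ConfigurationResource": [("Attack B", "ex:modifies")],
}


def attack_surface_rows():
    rows = set()  # a set mimics SELECT DISTINCT
    for resource, kind in resource_kind.items():
        for artifact in kind_artifacts.get(kind, []):
            for label, affects in artifact_attacks.get(artifact, []):
                rows.add((label, affects, artifact, resource))
    return sorted(rows)


for row in attack_surface_rows():
    print(*row)
```

Here `secret/db` yields two rows for the same attack, because its kind maps to two artifacts.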

In [109]:
from collections import defaultdict
attack_by_artifact = defaultdict(list)
for attack in attack_surface:
    uri = str(attack.kube_resource)
    attack_by_artifact[uri].append(str(attack.attack_label))

print(*attack_by_artifact.items(), sep="\n")

('https://k8s.local/default_/Service/redis-replica', ['Remote Services', 'Exploitation of Remote Services', 'Trusted Relationship', 'Internal Proxy'])
('https://k8s.local/default_/Service/redis-master', ['Remote Services', 'Exploitation of Remote Services', 'Trusted Relationship', 'Internal Proxy'])
('https://k8s.local/default_/Service/frontend', ['Remote Services', 'Exploitation of Remote Services', 'Trusted Relationship', 'Internal Proxy'])
('https://gcr.io/google_samples/gb-redisslave', ['Implant Internal Image'])
('https://gcr.io/google-samples/gb-frontend', ['Implant Internal Image'])
('https://docker.io/registry.k8s.io/redis', ['Implant Internal Image'])
('https://k8s.local/default_/Deployment/redis-replica', ['Email Hiding Rules', 'Disable Windows Event Logging', 'Email Forwarding Rule'])
('https://k8s.local/default_/Deployment/redis-master', ['Email Hiding Rules', 'Disable Windows Event Logging', 'Email Forwarding Rule'])
('https://k8s.local/default_/Deployment/frontend', ['Ema

In [88]:
# Most common attacks
from collections import Counter
attack_counter = Counter([a for attacks in attack_by_artifact.values() for a in attacks])
attack_counter.most_common(3)

[('Supply Chain Compromise', 20),
 ('HTML Smuggling', 20),
 ('Netsh Helper DLL', 7)]
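
The counts above tally result rows rather than resources: `attack_by_artifact` appends one label per row, and a resource can match the same attack through several artifacts or properties (for instance `urn:k8s:Secret` maps to both `d3f:Credential` and `d3f:ConfigurationResource`). If the question is "how many resources does each attack reach", deduplicating (attack, resource) pairs first can give different numbers. A sketch on toy rows shaped like `attack_surface` (labels and properties are made up):

```python
from collections import Counter

# Toy rows shaped like attack_surface: (attack_label, affects, artifact, kube_resource).
rows = [
    ("Attack A", "ex:produces", "ex:NetworkTraffic", "svc/frontend"),
    ("Attack A", "ex:may-produce", "ex:NetworkTraffic", "svc/frontend"),
    ("Attack A", "ex:produces", "ex:NetworkTraffic", "svc/redis"),
    ("Attack B", "ex:modifies", "ex:Configuration", "deploy/frontend"),
]

# What the cells above count: one entry per result row.
row_counts = Counter(label for label, _affects, _artifact, _resource in rows)

# Alternative: count each (attack, resource) pair once.
pairs = {(label, resource) for label, _affects, _artifact, resource in rows}
resource_counts = Counter(label for label, _resource in pairs)

print(row_counts.most_common())       # [('Attack A', 3), ('Attack B', 1)]
print(resource_counts.most_common())  # [('Attack A', 2), ('Attack B', 1)]
```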

In [100]:
d.query("""
SELECT DISTINCT *
WHERE {
  ?r rdfs:subClassOf d3f:DigitalArtifact, d3f:Server .
}
""").bindings

[]