In [None]:
#| hide
import kglab
import pandas as pd
from sbom_analysis.core import *

# Frameworks-Getting-Started SBOM

SBOM Source: [nd-crane/frameworks-getting-started](https://github.com/nd-crane/frameworks-getting-started) generated using [microsoft/sbom-tool](https://github.com/microsoft/sbom-tool)

RDF Source: Generated using [pyspdxtools](https://github.com/spdx/tools-python)

**NOTICE:** For ease of viewing, some cell inputs are hidden. Please view the inputs [here](https://github.com/nd-crane/sbom-analysis/blob/main/nbs/spdx_example.ipynb) for further explanations.

This page analyzes how [microsoft/sbom-tool](https://github.com/microsoft/sbom-tool) performs on an extremely simple ML workflow, the [nd-crane/frameworks-getting-started](https://github.com/nd-crane/frameworks-getting-started) repo from which this SBOM was generated.

## Importing Graph

Here we use kglab to load the graph to be analyzed from its RDF/XML serialization into the variable `kg`. This is the main graph for the entire notebook and is always referred to as `kg`. We query it for data and derive subgraphs from it for analysis.

In [None]:
kg = kglab.KnowledgeGraph()
kg.load_rdf("sboms/rdf/frameworks-getting-started.rdf.xml", format="xml")

<kglab.kglab.KnowledgeGraph>

## Graph Overview

First, let's get a general overview of the graph we are working with by visualizing it as a whole and looking at some metadata.

Under the default settings, orange represents `spdx:` elements, red represents `ptr:` elements, and blue represents all others. These colors can be changed as desired.

Let's also take a look at basic graph metadata:

In [None]:
show_metadata(kg)

In [None]:
show_measures(kg)

Here's some more advanced metadata:

First, let's look at a count of each entity type to get a general idea of what our graph represents:

In [None]:
show_entity_types(kg)

We can also view the top 10 properties of all elements:

In [None]:
show_top_n_props(kg)

SPDX schemas generally represent three main items (in addition to project metadata):

1. Files in the project
2. Dependencies (or packages) used in the project
3. Relationships between everything

Let's start by examining how files are represented in this KG.

## Files

From the graph, let's look at all properties that are present for files:

In [None]:
file_schema(kg)

Already we see that this generated SBOM includes less file information than the SPDX example SBOM.

Here is a DataFrame of the data that is present for files:

In [None]:
files = get_files_data(kg)
files.head(5)

In [None]:
files.describe()

Looking at a basic description of the files dataframe there are a few important items:

1. All fileIDs and names are unique (this is good)
2. All checksums are unique (this is good)
    - The checksums point to another node in the KG
3. **There is no file license information or contributor information**
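
The checks in the list above can be sketched with pandas as follows. The toy frame and its column names (`fileID`, `name`, `checksum`) are assumptions for illustration, since the real columns returned by `get_files_data` are not shown here; `licenseConcluded` and `fileContributor` are the SPDX file properties one would expect for license and contributor information.

```python
import pandas as pd

# Hypothetical stand-in for the `files` DataFrame; values are made up.
files_demo = pd.DataFrame({
    "fileID": ["f1", "f2", "f3"],
    "name": ["./a.py", "./b.py", "./c.py"],
    "checksum": ["_:n1", "_:n2", "_:n3"],  # blank-node ids of checksum nodes
})


def check_files(df, unique_cols, expected_cols):
    """Report per-column uniqueness and which expected columns are absent."""
    uniqueness = {col: bool(df[col].is_unique) for col in unique_cols}
    missing = [col for col in expected_cols if col not in df.columns]
    return uniqueness, missing


uniqueness, missing = check_files(
    files_demo,
    unique_cols=["fileID", "name", "checksum"],
    expected_cols=["licenseConcluded", "fileContributor"],
)
print(uniqueness)
print(missing)
```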

Here's a representation of all files in a graph form:

In [None]:
subgraph = get_files_graph(kg)

In [None]:
show_measures(subgraph)

## Packages

In [None]:
package_schema(kg)

| | property |
|---|---|
| 0 | spdx:copyrightText |
| 1 | spdx:downloadLocation |
| 2 | spdx:externalRef |
| 3 | spdx:filesAnalyzed |
| 4 | spdx:licenseConcluded |
| 5 | spdx:licenseDeclared |
| 6 | spdx:licenseInfoFromFiles |
| 7 | spdx:name |
| 8 | spdx:packageVerificationCode |
| 9 | spdx:relationship |


In [None]:
packages = get_package_data(kg)
packages

| | package | annotations | attributionTexts | checksums | copyrightText | downloadLocation | externalRefs | hasFiles | licenseConcluded | licenseDeclared | licenseInfoFromFiles | name | packageVerificationCode | supplier | versionInfo | relationships |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | | | spdx:noassertion | spdx:noassertion | | test | _:Nc4d508e3f1a04ce9911a1aef09501a84 | Organization: NDCRC | 1.0.0 | N3404dd5aaabf4ac5a6d50cc94832070b, N4044ea1d8c... |
| 1 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | N00410ba69d9b4439874e49ef0264a205 | | spdx:noassertion | spdx:noassertion | | argon2-cffi | | NOASSERTION | 21.3.0 | |
| 2 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | N520dfb0cc29f460886e9783436e6345a | | spdx:noassertion | spdx:noassertion | | click | | NOASSERTION | 8.1.3 | |
| 3 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | Nf2493daea9574f74962186a3567c655f | | spdx:noassertion | spdx:noassertion | | jupyterlab-widgets | | NOASSERTION | 3.0.7 | |
| 4 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | Nd3e76933e62f4ae9be93501bfe5eb128 | | spdx:noassertion | spdx:noassertion | | mpmath | | NOASSERTION | 1.3.0 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 202 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | N44434646df3d4dce905af77ac8e64117 | | spdx:noassertion | spdx:noassertion | | dvc-objects | | NOASSERTION | 0.22.0 | |
| 203 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | Nd452724740c045739c99d337d4217499 | | spdx:noassertion | spdx:noassertion | | requests | | NOASSERTION | 2.30.0 | |
| 204 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | N8fa356e6cef94c348c9d99e5d37af0fd | | spdx:noassertion | spdx:noassertion | | entrypoints | | NOASSERTION | 0.4 | |
| 205 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | N2d4bd4c95ba04c3a944f6ed821d49ed7 | | spdx:noassertion | spdx:noassertion | | tornado | | NOASSERTION | 6.3.2 | |


In [None]:
packages[packages['name'].str.contains('python')]

| | package | annotations | attributionTexts | checksums | copyrightText | downloadLocation | externalRefs | hasFiles | licenseConcluded | licenseDeclared | licenseInfoFromFiles | name | packageVerificationCode | supplier | versionInfo | relationships |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | Ned499aef0ba84f2fb0eeda72b45df10a | | spdx:noassertion | spdx:noassertion | | ipython | | NOASSERTION | 8.13.2 | |
| 42 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | N84ebb7c12df34d48842b774188ab8557 | | spdx:noassertion | spdx:noassertion | | gitpython | | NOASSERTION | 3.1.31 | |
| 58 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | N621e9f13138d4b53b06f1a662ae3a6c4 | | spdx:noassertion | spdx:noassertion | | python-dateutil | | NOASSERTION | 2.8.2 | |
| 99 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | Nfc23b7d04ac54d60a69b52fa58bb4cb3 | | spdx:noassertion | spdx:noassertion | | ipython-genutils | | NOASSERTION | 0.2.0 | |
| 179 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | Ndca685a73c454a5d915d9371e6b3b0f6 | | spdx:noassertion | spdx:noassertion | | antlr4-python3-runtime | | NOASSERTION | 4.9.3 | |
| 206 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | Neae3036e94d447d7a21c80ad1ed3f586 | | spdx:noassertion | spdx:noassertion | | python-json-logger | | NOASSERTION | 2.0.7 | |
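
Note that `str.contains('python')` is a substring match, so it also returns names such as `ipython` and `gitpython`. If the intent were to keep only names where `python` is a separate hyphen-delimited token, one possible refinement is a regular expression. The pattern below is an assumption about what "token" should mean here, not something the notebook itself uses.

```python
import pandas as pd

# Package names taken from the filtered output above, plus one non-match.
names = pd.Series([
    "ipython", "gitpython", "python-dateutil", "ipython-genutils",
    "antlr4-python3-runtime", "python-json-logger", "requests",
], name="name")

# Substring match, as in the notebook cell: also hits "ipython" and "gitpython".
substring_hits = names[names.str.contains("python", regex=False)]

# Token match: "python" must start the name or follow a hyphen, and must be
# followed by a hyphen, a digit, or the end of the name.
TOKEN = r"(?:^|-)python(?:$|-|\d)"
token_hits = names[names.str.contains(TOKEN, regex=True)]

print(substring_hits.tolist())
print(token_hits.tolist())
```

Non-capturing groups (`(?:...)`) are used because pandas warns when a `str.contains` pattern includes capture groups.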


In [None]:
packages.describe()

| | package | annotations | attributionTexts | checksums | copyrightText | downloadLocation | externalRefs | hasFiles | licenseConcluded | licenseDeclared | licenseInfoFromFiles | name | packageVerificationCode | supplier | versionInfo | relationships |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 207 | 207.0 | 207.0 | 207.0 | 207 | 207 | 207.0 | 207.0 | 207 | 207 | 207.0 | 207 | 1 | 207 | 207 | 207.0 |
| unique | 207 | 1.0 | 1.0 | 1.0 | 1 | 1 | 207.0 | 1.0 | 1 | 1 | 1.0 | 207 | 1 | 2 | 186 | 2.0 |
| top | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | | | | NOASSERTION | spdx:noassertion | | | spdx:noassertion | spdx:noassertion | | test | _:Nc4d508e3f1a04ce9911a1aef09501a84 | NOASSERTION | 1.0.0 | |
| freq | 1 | 207.0 | 207.0 | 207.0 | 207 | 207 | 1.0 | 207.0 | 207 | 207 | 207.0 | 1 | 1 | 206 | 3 | 206.0 |


Here we see that even more information is missing for packages than for files.
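
One way to put a number on this is to measure, per column, the share of rows that are empty or hold a placeholder value. The sketch below uses three rows copied from the packages table, restricted to four columns. The `PLACEHOLDERS` list and the helper name `placeholder_share` are assumptions of this example rather than part of `sbom_analysis.core`.

```python
import pandas as pd

# Three rows copied from the packages table above, restricted to four columns.
packages_demo = pd.DataFrame({
    "name": ["test", "argon2-cffi", "click"],
    "copyrightText": ["NOASSERTION"] * 3,
    "licenseConcluded": ["spdx:noassertion"] * 3,
    "supplier": ["Organization: NDCRC", "NOASSERTION", "NOASSERTION"],
})

# Values treated as "no information"; this set is an assumption of the sketch.
PLACEHOLDERS = ["NOASSERTION", "spdx:noassertion", ""]


def placeholder_share(df):
    """Fraction of rows per column that are missing or hold a placeholder."""
    uninformative = df.isna() | df.isin(PLACEHOLDERS)
    return uninformative.mean()


share = placeholder_share(packages_demo)
print(share)
```

Applied to the full `packages` DataFrame, columns with a share close to 1.0 are the ones where the generated SBOM carries essentially no information.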

## Relationships

In [None]:
relationship_schema(kg)

| | property |
|---|---|
| 0 | rdf:type |
| 1 | spdx:relationshipType |
| 2 | spdx:relatedSpdxElement |


In [None]:
rels = get_relationship_data(kg)
rels

| | element | elementType | relationshipType | relatedElement | relatedElementType |
|---|---|---|---|---|---|
| 0 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:Package | spdx:relationshipType_contains | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:File |
| 1 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:Package | spdx:relationshipType_contains | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:File |
| 2 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:Package | spdx:relationshipType_contains | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:File |
| 3 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:Package | spdx:relationshipType_contains | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:File |
| 4 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:Package | spdx:relationshipType_contains | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:File |
| ... | ... | ... | ... | ... | ... |
| 2855 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:Package | spdx:relationshipType_contains | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:File |
| 2856 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:Package | spdx:relationshipType_contains | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:File |
| 2857 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:Package | spdx:relationshipType_contains | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:File |
| 2858 | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:Package | spdx:relationshipType_contains | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:File |


In [None]:
rels.describe()

| | element | elementType | relationshipType | relatedElement | relatedElementType |
|---|---|---|---|---|---|
| count | 2860 | 2860 | 2860 | 2860 | 2860 |
| unique | 2 | 2 | 3 | 2860 | 2 |
| top | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:Package | spdx:relationshipType_contains | `<https://spdx.org/spdxdocs/sbom-tool-1.1.1-098...` | spdx:File |
| freq | 2859 | 2859 | 2653 | 1 | 2653 |


Lastly, our relationships are limited to only 3 types, and most of them link a Package to a File: 2,859 of the 2,860 relationships originate from a Package, and 2,653 point to a File.
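
The `top` and `freq` rows of `describe()` only report the single most common value per column. A fuller breakdown can be obtained with `value_counts` and `pd.crosstab`, as in the sketch below. The rows are hypothetical: the column names follow the relationships table, but the two relationship types other than `contains` and the `spdx:SpdxDocument` element type are illustrative guesses, not values read from this SBOM.

```python
import pandas as pd

# Hypothetical rows; only the column names and the `contains` type come
# from the relationships output above.
rels_demo = pd.DataFrame({
    "elementType": ["spdx:Package"] * 4 + ["spdx:SpdxDocument"],
    "relationshipType": [
        "spdx:relationshipType_contains",
        "spdx:relationshipType_contains",
        "spdx:relationshipType_contains",
        "spdx:relationshipType_dependsOn",
        "spdx:relationshipType_describes",
    ],
    "relatedElementType": ["spdx:File"] * 3 + ["spdx:Package"] * 2,
})

# Number of relationships per type, most frequent first.
type_counts = rels_demo["relationshipType"].value_counts()

# Source element type (rows) against related element type (columns).
pair_counts = pd.crosstab(
    rels_demo["elementType"], rels_demo["relatedElementType"]
)

print(type_counts)
print(pair_counts)
```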

In [None]:
subgraph = get_relationship_graph(kg)

In [None]:
show_measures(subgraph)

edges 11441
nodes 5727
