# Task 1: Perspectives on Python-oriented SBOM Generation Tools

## Datasets and setup 
Our datasets (`\dataset1` and `\dataset2`) share the same structure:
- `\packages` contains the packages to analyse and to generate SBOMs from;
- `\sbom` contains the SBOMs generated by each tool for each package, which serve as the basis for our comparisons.
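To check that a dataset follows this layout, a small helper can list what each folder holds. This is a minimal sketch: the function name `describe_dataset` is hypothetical, it assumes one sub-directory per package under `packages`, and it makes no assumption about how the SBOM files are named, only listing them relative to `sbom`.

```python
import os
from typing import Dict, List


def describe_dataset(dataset_root: str) -> Dict[str, List[str]]:
    """Return the package directories and SBOM files found in a dataset.

    Hypothetical helper. Assumes <dataset_root>/packages holds one
    sub-directory per package and <dataset_root>/sbom holds the SBOM files
    produced by the tools (their naming scheme is not assumed).
    """
    packages_dir = os.path.join(dataset_root, "packages")
    sbom_dir = os.path.join(dataset_root, "sbom")

    # Only directories count as packages; stray files (e.g. a README) are ignored.
    packages = sorted(
        entry for entry in os.listdir(packages_dir)
        if os.path.isdir(os.path.join(packages_dir, entry))
    )
    # Walk sbom/ recursively, since tools may write into per-tool sub-folders.
    sboms = sorted(
        os.path.relpath(os.path.join(root, name), sbom_dir)
        for root, _dirs, files in os.walk(sbom_dir)
        for name in files
    )
    return {"packages": packages, "sboms": sboms}
```

For example, `describe_dataset(os.path.join(".", "dataset1"))` would list the packages of ds1 alongside every SBOM file stored for it.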

### Dataset n°1
Dataset n°1 (ds1) is copied from Cofano et al. Dependencies are read from both the `requirements.txt` and `pyproject.toml` files found in `\packages`. This dataset contains no ground truth, so only `\sbom` can be used to compare our new approach with the other tools.
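Reading the declared dependencies from both files can be sketched as follows. This is an illustration under stated assumptions, not the notebook's method: the names `read_declared_deps` and `normalize` are hypothetical, only the PEP 621 `[project].dependencies` table of `pyproject.toml` is read (Poetry or other tool-specific tables are ignored), `tomllib` requires Python 3.11+, and pip options (`-r`, `-e`) and bare URL requirements are skipped rather than resolved.

```python
import os
import re

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    tomllib = None  # pyproject.toml is then ignored

# Leading distribution name of a PEP 508 requirement, e.g. "requests" in "requests[socks]>=2".
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")
# Bare URL requirements such as "git+https://..." carry no usable name here.
_URL = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def normalize(name: str) -> str:
    """PEP 503 normalisation: runs of '-', '_' and '.' become '-', lower-cased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _names(requirements) -> set:
    """Normalised names from an iterable of requirement strings."""
    names = set()
    for line in requirements:
        line = line.split("#", 1)[0].strip()  # drop comments
        # Skip blank lines, pip options ("-r dev.txt", "-e .") and bare URLs.
        if not line or line.startswith("-") or _URL.match(line):
            continue
        match = _NAME.match(line)
        if match:
            names.add(normalize(match.group(1)))
    return names


def read_declared_deps(package_dir: str) -> set:
    """Union of the dependencies declared in requirements.txt and pyproject.toml."""
    deps = set()
    requirements = os.path.join(package_dir, "requirements.txt")
    if os.path.isfile(requirements):
        with open(requirements, encoding="utf-8") as fh:
            deps |= _names(fh)
    pyproject = os.path.join(package_dir, "pyproject.toml")
    if tomllib is not None and os.path.isfile(pyproject):
        with open(pyproject, "rb") as fh:  # tomllib requires a binary file
            data = tomllib.load(fh)
        # Only PEP 621 [project].dependencies; Poetry tables etc. are not read.
        deps |= _names(data.get("project", {}).get("dependencies", []))
    return deps
```

Normalising names at this stage means that `PyYAML` in one file and `pyyaml` in the other count as the same dependency.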

### Dataset n°2
Dataset n°2 (ds2) is copied from Jia et al.'s dataset, which lets us reuse their data for comparison directly instead of re-executing each tool on each package. `\deptree_gt` contains the ground truth as one JSON file per package. Unlike for the first dataset, `analysis_ds2` is set manually and lists all the packages from which SBOMs will be generated.
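Once dependency names have been extracted from a `\deptree_gt` file, a tool's output can be scored against them. The sketch below assumes both sides are already reduced to plain lists of distribution names (the JSON schema of the ground-truth files is not reproduced here, so no loader is shown) and computes precision, recall and F1 after PEP 503 name normalisation. The function name `score` is hypothetical.

```python
import re
from typing import Dict, Iterable


def normalize(name: str) -> str:
    # PEP 503 name normalisation so that "PyYAML" and "pyyaml" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def score(found: Iterable[str], ground_truth: Iterable[str]) -> Dict[str, float]:
    """Precision, recall and F1 of a set of dependency names against the ground truth."""
    found_set = {normalize(n) for n in found}
    truth_set = {normalize(n) for n in ground_truth}
    true_positives = len(found_set & truth_set)
    precision = true_positives / len(found_set) if found_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    # F1 is the harmonic mean of precision and recall (0 when both are 0).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For instance, `score(["requests", "PyYAML", "flask"], ["requests", "pyyaml", "click", "markdown"])` gives a precision of 2/3 and a recall of 0.5: two of the three reported names are correct, and two of the four true dependencies are found.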

### Merge
These two datasets have been combined.

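```python
import os
import ast

# Root folder holding the packages of dataset n°2.
ds2 = os.path.join(".", "dataset2", "packages")

# Packages of dataset n°2 to analyse, listed manually. "metadata" points to the
# file declaring the dependencies, "source" to a Python file or directory; the
# two "deps_from_*" lists are placeholders for the dependencies found in each.
analysis_ds2 = {
    "packages": [
        {
            "package": "apprise",
            "metadata": os.path.join(ds2, "apprise", "requirements.txt"),
            "source": os.path.join(ds2, "apprise", "apprise", "apprise.py"),
            "deps_from_metadata": [],
            "deps_from_ast": []
        },
        {
            "package": "django",
            "metadata": os.path.join(ds2, "django-rest-framework", "requirements.txt"),
            "source": os.path.join(ds2, "django-rest-framework", "rest_framework"),
            "deps_from_metadata": [],
            "deps_from_ast": []
        },
    ]
}
```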


## Basic AST parsing
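Here, we parse the source code of each package into abstract syntax trees (ASTs) using Python's built-in `ast` module.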

```python
import ast
import os


def parse_source(source_path: str):
    """Yield one AST per Python file found at source_path.

    source_path may be a single .py file or a directory; for a directory, only
    the .py files directly inside it are parsed.
    """
    def _parse(file_path: str) -> ast.Module:
        # Read-only binary mode: ast.parse accepts bytes and honours the
        # file's encoding declaration.
        with open(file_path, mode="rb") as source_file:
            return ast.parse(source_file.read(), filename=file_path)

    if os.path.isfile(source_path):
        yield _parse(source_path)
    elif os.path.isdir(source_path):
        # We don't bother with subdirectories and only analyse top-level .py files.
        source_files = sorted(
            f for f in os.listdir(source_path)
            if os.path.isfile(os.path.join(source_path, f)) and f.endswith(".py")
        )
        for source_file in source_files:
            yield _parse(os.path.join(source_path, source_file))


def analyse_package(metadata_path: str, source_path: str):
    # metadata_path is not used yet; for now we only print the parsed trees.
    for tree in parse_source(source_path):
        print(tree)


for package in analysis_ds2["packages"]:
    analyse_package(metadata_path=package["metadata"], source_path=package["source"])
```