maxleow/duck_jenkins

Duck Jenkins: loading Jenkins build info into DuckDB

What is it?

An ETL (Extract, Transform, Load) tool for Jenkins data.

Installation

pip install duck-jenkins

Main features

Jenkins build extractor

  • Extracts Jenkins build information, along with artifact metadata, and serializes it into files.
  • Uses a fixed file structure that supports multiple Jenkins servers.
  • Supports multi-branch pipelines (for example, pipeline2/master in the layout below).
└── data
    ├── jenkins1.example.io
    └── jenkins2.example.io
        ├── pipeline1
        │    └── 1_info.json
        └── pipeline2
            └── master
                ├── 1_info.json
                └── 1_artifact.csv
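
Because the layout is fixed, the location of a build's files can be derived from the server name, the project name, and the build number. The sketch below illustrates that mapping. Both functions are hypothetical helpers written for this README, not part of duck_jenkins, and they assume the `<build_number>_info.json` naming shown in the tree above.

```python
from pathlib import Path
from typing import List


def build_info_path(data_directory: str, domain_name: str,
                    project_name: str, build_number: int) -> Path:
    """Return where a build's info file would sit in the layout shown above.

    Hypothetical helper for illustration; it is not part of duck_jenkins.
    """
    # A multi-branch project such as 'pipeline2/master' becomes nested folders.
    return Path(data_directory, domain_name, *project_name.split("/"),
                f"{build_number}_info.json")


def list_build_files(data_directory: str, domain_name: str) -> List[Path]:
    """List every serialized build info file stored for one Jenkins server."""
    root = Path(data_directory, domain_name)
    return sorted(root.rglob("*_info.json"))
```

For example, `build_info_path('data', 'jenkins2.example.io', 'pipeline2/master', 1)` points at the `1_info.json` file under `pipeline2/master` in the tree above.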

DuckDB transformer

Transforms all of the serialized data above into a relational DuckDB database.

Database ER diagram

erDiagram
    Jenkins ||--o{ Job: has
    Job ||--o{ Build: has
    Build ||--o{ Artifact: has
    Build ||--o| Jenkins_User: has
    Build ||--o{ Cause: has
    Build ||--o{ Parameter: has
    Build ||--|| Result: has
    Parameter ||--|| ParameterDictionary: has
    Jenkins{
        int id PK
        str domain_name
    }
    Job{
        int id  PK
        str name
        int jenkins_id FK
    }
    Result{
        int id PK
        str name
    }
    Jenkins_User{
        int id PK
        str name
        str lan_id
    }
    Cause{
        int id PK
        str category
    }
    Build{
        int id               PK
        int job_id           FK
        int build_number
        int result_id        FK
        int user_id          FK 
        int trigger_type     FK "Cause table's PK"
        int duration
        datetime timestamp
        int upstream_job_id FK
        int upstream_build_number
        int upstream_type   FK "Cause table's PK"
        int previous_build_number
    }
    ParameterDictionary{
        int id PK
        str name
    }
    Parameter{
        int build_id FK
        int name_id  FK
        str value
    }
    Artifact{
        int id        PK
        int build_id  FK
        str file_name
        str dir
        int size
        datetime timestamp
    }
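
To show how the entities in the diagram relate, here is a small sketch that joins Build to Job and Result. It assumes that table and column names follow the diagram in lower case, which may not match what DuckLoader actually creates, and the rows are made-up sample data. Python's built-in sqlite3 stands in for DuckDB only so that the sketch runs without extra dependencies; the SQL is plain enough that it should carry over to a DuckDB connection.

```python
import sqlite3

# sqlite3 is a stand-in here; table names are assumed from the ER diagram.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE job    (id INTEGER PRIMARY KEY, name TEXT, jenkins_id INTEGER);
    CREATE TABLE result (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE build  (id INTEGER PRIMARY KEY, job_id INTEGER,
                         build_number INTEGER, result_id INTEGER,
                         duration INTEGER);
    -- Made-up sample rows, for illustration only.
    INSERT INTO job    VALUES (1, 'pipeline2/master', 1);
    INSERT INTO result VALUES (1, 'SUCCESS'), (2, 'FAILURE');
    INSERT INTO build  VALUES (1, 1, 1, 1, 1200), (2, 1, 2, 2, 900);
""")

# Resolve each build's foreign keys into a readable job name and result name.
rows = con.execute("""
    SELECT j.name, b.build_number, r.name, b.duration
    FROM build AS b
    JOIN job    AS j ON j.id = b.job_id
    JOIN result AS r ON r.id = b.result_id
    ORDER BY b.build_number
""").fetchall()

for row in rows:
    print(row)
```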

Example

Jenkins Build extractor

The following examples reproduce the file structure shown above.

1. Extract build

Extracting a multi-branch pipeline

from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull(
    project_name='pipeline2/master',
    build_number=1,
    artifact=True
)
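
To extract several specific builds, the `pull` call above can be repeated in a loop. The following is a minimal sketch: `pull_builds` is a hypothetical helper, not part of duck_jenkins, and it uses only the `pull` keyword arguments shown in the example. Which exceptions `pull` raises is not documented here, so the sketch catches `Exception` broadly and reports the build numbers that failed.

```python
from typing import Iterable, List


def pull_builds(jd, project_name: str, build_numbers: Iterable[int],
                artifact: bool = False) -> List[int]:
    """Call jd.pull() once per build number and return the numbers that failed.

    Hypothetical helper; `jd` is any object exposing the pull() call shown above.
    """
    failed = []
    for number in build_numbers:
        try:
            jd.pull(project_name=project_name, build_number=number,
                    artifact=artifact)
        except Exception:  # assumption: the error types of pull() are unknown
            failed.append(number)
    return failed
```

With the `jd` object from the example, `pull_builds(jd, 'pipeline2/master', range(1, 4))` would attempt builds 1 to 3 and return any that could not be extracted.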

2. Extract upstream build

Assume that the upstream build of pipeline2/master/1 is pipeline1/1.

from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_upstream(
    project_name='pipeline2/master',
    build_number=1,
    artifact=False
)

3. Extract previous build

from duck_jenkins import JenkinsData

jd = JenkinsData(
    domain_name='jenkins1.example.io',
    verify_ssl=False,
    user_id='C001',
    secret='elwerqoqiweucv',
    data_directory='data'
)
jd.pull_previous(
    project_name='pipeline2/master',
    build_number=2,  # build 2 itself is excluded from the extraction
    artifact=True,
    overwrite=True,
    size=1  # number of previous builds to extract (here, only one)
)
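
The three examples above construct the same `JenkinsData` client with the credentials written inline. One possible way to avoid repeating them is to read them from the environment, as sketched below. `jenkins_kwargs` is a hypothetical helper, and the variable names `JENKINS_USER_ID` and `JENKINS_SECRET` are assumptions chosen for this sketch; the keyword names themselves are the ones used in the examples.

```python
import os
from typing import Dict, Mapping, Optional


def jenkins_kwargs(domain_name: str, data_directory: str = "data",
                   env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Build the keyword arguments used by the JenkinsData examples above.

    Hypothetical helper; the environment variable names are assumptions.
    """
    env = os.environ if env is None else env
    return {
        "domain_name": domain_name,
        "verify_ssl": False,  # mirrors the examples above
        "user_id": env["JENKINS_USER_ID"],  # raises KeyError if unset
        "secret": env["JENKINS_SECRET"],
        "data_directory": data_directory,
    }
```

The client could then be created with `jd = JenkinsData(**jenkins_kwargs('jenkins1.example.io'))`.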

DuckDB transformation

The serialized files are of little use until they are loaded into a database. The following steps demonstrate how to import them into DuckDB.

from duck_jenkins import DuckLoader
import duckdb

db = duckdb.connect('1.ddb')
cursor = db.cursor()

dl = DuckLoader(cursor, 'data')
dl.import_into_db(
    jenkins_domain_name='jenkins1.example.io', 
    overwrite=False  # False skips inserting records that already exist
)

cursor.commit()
cursor.close()
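
Since the data directory holds one folder per Jenkins server, the import above can be repeated for every server found on disk. The sketch below assumes that each immediate subfolder of the data directory is a Jenkins domain name, as in the file structure shown earlier. `import_all_domains` is a hypothetical helper, not part of duck_jenkins, and it uses only the `import_into_db` arguments shown in the example.

```python
from pathlib import Path
from typing import List


def import_all_domains(loader, data_directory: str,
                       overwrite: bool = False) -> List[str]:
    """Run import_into_db() for every server folder and return their names.

    Hypothetical helper; `loader` is any object exposing import_into_db()
    with the keyword arguments shown in the example above.
    """
    # Assumption: each immediate subfolder is named after a Jenkins domain.
    domains = sorted(p.name for p in Path(data_directory).iterdir() if p.is_dir())
    for domain in domains:
        loader.import_into_db(jenkins_domain_name=domain, overwrite=overwrite)
    return domains
```

With the `dl` object from the example, this would be called as `import_all_domains(dl, 'data')` before committing.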

For more on using DuckDB, see the official documentation: https://duckdb.org/docs/