chore: updated README #32

Merged (1 commit) on Mar 30, 2020
8 changes: 4 additions & 4 deletions README.md
@@ -4,17 +4,17 @@

Snyk helps you find, fix and monitor for known vulnerabilities in your dependencies, both on an ad hoc basis and as part of your CI (Build) system.

## Snyk RPM Parser
## Snyk RPM Parser ##

A library that reads the list of packages inside an RPM database file.

### How it works
### How it works ###

The parser loads an RPM database file, which is stored in BerkeleyDB format. It reads the beginning of the file to check that it is a valid BerkeleyDB database of the expected type.

The database is split into equally sized pages. Because the database is of type Hash DB, the parser is only interested in pages of the hash type.
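As a rough illustration of the paging step, the TypeScript sketch below slices a database buffer into fixed-size pages and keeps the numbers of the pages whose type byte (byte 25 of the page header, described in the BerkeleyDB module README) is 0x07, the Hash page type. It is not taken from this library: `splitIntoPages` and `findHashPages` are hypothetical names, and the 4096-byte page size is hard-coded only for the example; a real parser would presumably read the page size from the metadata page.

```typescript
// Page type of a Hash page, stored in byte 25 of every page header.
const HASH_PAGE_TYPE = 0x07;
const PAGE_TYPE_OFFSET = 25;

// Assumption: 4096-byte pages, hard-coded for illustration only.
const DEFAULT_PAGE_SIZE = 4096;

/** Splits a database file into equally sized pages (views, not copies). */
function splitIntoPages(db: Uint8Array, pageSize: number = DEFAULT_PAGE_SIZE): Uint8Array[] {
  if (db.length === 0 || db.length % pageSize !== 0) {
    throw new Error(`file size ${db.length} is not a positive multiple of ${pageSize}`);
  }
  const pages: Uint8Array[] = [];
  for (let offset = 0; offset < db.length; offset += pageSize) {
    pages.push(db.subarray(offset, offset + pageSize));
  }
  return pages;
}

/** Returns the numbers of the pages whose header marks them as Hash pages. */
function findHashPages(pages: Uint8Array[]): number[] {
  const hashPages: number[] = [];
  pages.forEach((page, pageNumber) => {
    if (page[PAGE_TYPE_OFFSET] === HASH_PAGE_TYPE) {
      hashPages.push(pageNumber);
    }
  });
  return hashPages;
}
```

Page 0 is the metadata page, so in practice the Hash pages found this way start at page 1 or later.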

### BerkeleyDB internals
### BerkeleyDB internals ###

The database layout looks like this:

@@ -37,7 +37,7 @@ The first page is the metadata page, which contains the database type, magic num

Refer to the [BerkeleyDB module README](https://github.com/snyk/rpm-parser/blob/master/lib/berkeleydb/README.md) for a breakdown of the page layout for every page type.

### RPM package extraction
### RPM package extraction ###

Once all data entries are obtained, the parser processes them as RPM metadata blobs.

33 changes: 32 additions & 1 deletion lib/berkeleydb/README.md
@@ -38,7 +38,7 @@ Bytes 0-71 contain the generic BerkeleyDB header, whereas bytes 72-511 contain t
| iv | 476-495 | Crypto IV | Not used in RPM |
| chksum | 496-511 | Page checksum | Not used in RPM |

## Page header
## Page header ##

This header appears at the start of every page regardless of its type and is 26 bytes long, so a 4096-byte page leaves up to 4070 bytes for content.

@@ -52,3 +52,34 @@ This header appears at the start of every page regardless of its type and is 26-
| hf_offset | 22-23 | High free byte page offset | Useful to determine where the data ends in an Overflow page |
| level | 24 | B-Tree page level | Not used in Hash DB |
| type | 25 | Page type | 0x07 for a Hash page, 0x0D for an Overflow page |
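The following TypeScript sketch decodes a page header with a `DataView`. It assumes little-endian byte order and takes the offsets of `pgno` (8-11), `next_pgno` (16-19) and `entries` (20-21) from BerkeleyDB's standard page layout, since those rows are collapsed in this diff. `PageHeader` and `readPageHeader` are hypothetical names, not this library's API.

```typescript
/** Size of the generic page header, in bytes. */
const PAGE_HEADER_SIZE = 26;

/** Decoded page header (hypothetical type for illustration). */
interface PageHeader {
  pgno: number; // bytes 8-11 (assumed, standard BerkeleyDB layout)
  nextPgno: number; // bytes 16-19 (assumed): next page in an Overflow chain
  entries: number; // bytes 20-21 (assumed): number of hash index entries
  hfOffset: number; // bytes 22-23: high free byte page offset
  level: number; // byte 24: B-Tree level, not used in Hash DB
  type: number; // byte 25: 0x07 Hash page, 0x0D Overflow page
}

/** Reads the 26-byte header at the start of a page (little-endian assumed). */
function readPageHeader(page: Uint8Array): PageHeader {
  if (page.length < PAGE_HEADER_SIZE) {
    throw new Error("page is shorter than its header");
  }
  const view = new DataView(page.buffer, page.byteOffset, PAGE_HEADER_SIZE);
  const littleEndian = true; // assumption: database written on a little-endian host
  return {
    pgno: view.getUint32(8, littleEndian),
    nextPgno: view.getUint32(16, littleEndian),
    entries: view.getUint16(20, littleEndian),
    hfOffset: view.getUint16(22, littleEndian),
    level: view.getUint8(24),
    type: view.getUint8(25),
  };
}
```

With this layout, `4096 - PAGE_HEADER_SIZE` yields the 4070 bytes of content space mentioned above.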

## Hash page index ##

| Hash page section | Bytes | Comment |
|---|---|---|
| Page header | 0-25 | |
| Hash index | 26 to 25+2n | Every index entry is 2 bytes. The number of entries (n) is defined in the entries field of the page header, hence the hash index is 2n bytes long. |
| ... | | |
| Hash entries | m-4095 | Located at the end of the page (byte 4095 is the last byte of a 4096-byte page), in reverse order. The start of the entries (m) is defined in the hf_offset field of the page header. |
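A sketch of walking the hash index, again assuming little-endian values: it reads `n` 2-byte offsets starting at byte 26 and, for each one, the type byte that the offset points to. `readHashIndex` and `IndexSlot` are illustrative names, and the `entries` argument is expected to come from the page header.

```typescript
const PAGE_HEADER_SIZE = 26;
const INDEX_ENTRY_SIZE = 2;

/** One hash index slot: where the entry lives and the type byte found there. */
interface IndexSlot {
  offset: number;
  entryType: number;
}

/**
 * Reads `entries` 2-byte offsets that follow the page header and returns,
 * for each, the offset and the type byte stored at that offset.
 * Assumes little-endian offsets.
 */
function readHashIndex(page: Uint8Array, entries: number): IndexSlot[] {
  const indexEnd = PAGE_HEADER_SIZE + INDEX_ENTRY_SIZE * entries;
  if (indexEnd > page.length) {
    throw new Error("hash index runs past the end of the page");
  }
  const view = new DataView(page.buffer, page.byteOffset, page.length);
  const slots: IndexSlot[] = [];
  for (let i = 0; i < entries; i++) {
    const offset = view.getUint16(PAGE_HEADER_SIZE + i * INDEX_ENTRY_SIZE, true);
    if (offset >= page.length) {
      throw new Error(`index entry ${i} points outside the page`);
    }
    slots.push({ offset, entryType: page[offset] });
  }
  return slots;
}
```

Because entries are stored from the end of the page backwards, successive offsets typically decrease.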

The hash index contains 2-byte entries. Each entry is an offset into the current page; the byte at that offset gives the type of the hash entry, as defined below:

## Hash entries ##

The hash index can point to two types of entries: KEYDATA or OFFPAGE. Other types of entries are not used by RPM.

Hash entries are always stored as key/value pairs.

KEYDATA (Key/data entry) is 5 bytes long:

| KEYDATA entry layout | Bytes | Comment |
|---|---|---|
| type | 0 | Must have the value 0x01. |
| data | 1-4 | |
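A minimal sketch of reading a KEYDATA entry, following the 5-byte layout in the table above. It returns the four data bytes untouched, because this README does not say how RPM interprets them; `readKeyData` is a hypothetical helper.

```typescript
const H_KEYDATA = 0x01;
const KEYDATA_SIZE = 5; // 1 type byte + 4 data bytes, per the table above

/** Returns a copy of the 4 data bytes of the KEYDATA entry at `offset`. */
function readKeyData(page: Uint8Array, offset: number): Uint8Array {
  if (page[offset] !== H_KEYDATA) {
    throw new Error(`expected KEYDATA (0x01) at ${offset}, got ${page[offset]}`);
  }
  if (offset + KEYDATA_SIZE > page.length) {
    throw new Error("KEYDATA entry runs past the end of the page");
  }
  return page.slice(offset + 1, offset + KEYDATA_SIZE);
}
```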

OFFPAGE (Overflow entry) is 12 bytes long:

| OFFPAGE entry layout | Bytes | Comment |
|---|---|---|
| type | 0 | Must have the value 0x03. |
| unused | 1-3 | Padding |
| pgno | 4-7 | Page number of the first Overflow page that holds this entry's data. The header of that page should report the Overflow page type (0x0D). |
| tlen | 8-11 | Total length of the item, in bytes. The data may span multiple pages. |
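The sketch below decodes an OFFPAGE entry and then gathers `tlen` bytes by following the Overflow pages. Beyond the table above, it assumes little-endian integers, that each Overflow page stores its data right after the 26-byte header, that `hf_offset` holds the number of data bytes on that page, and that `next_pgno` (bytes 16-19, 0 at the end of the chain) links the pages. These follow BerkeleyDB's usual layout but are not stated in this README, and all names are hypothetical.

```typescript
const H_OFFPAGE = 0x03;
const OVERFLOW_PAGE_TYPE = 0x0d;
const PAGE_HEADER_SIZE = 26;

interface OffPageEntry {
  pgno: number; // first Overflow page holding the data
  tlen: number; // total length of the item, in bytes
}

/** Decodes the 12-byte OFFPAGE entry at `offset` (little-endian assumed). */
function readOffPageEntry(page: Uint8Array, offset: number): OffPageEntry {
  if (page[offset] !== H_OFFPAGE) {
    throw new Error(`expected OFFPAGE (0x03) at ${offset}`);
  }
  if (offset + 12 > page.length) {
    throw new Error("OFFPAGE entry runs past the end of the page");
  }
  const view = new DataView(page.buffer, page.byteOffset + offset, 12);
  return { pgno: view.getUint32(4, true), tlen: view.getUint32(8, true) };
}

/** Collects `entry.tlen` bytes by walking the Overflow page chain (assumed layout). */
function readOverflowData(db: Uint8Array, entry: OffPageEntry, pageSize = 4096): Uint8Array {
  const out = new Uint8Array(entry.tlen);
  let written = 0;
  let pgno = entry.pgno;
  while (written < entry.tlen) {
    if (pgno === 0) throw new Error("overflow chain ended early");
    const start = pgno * pageSize;
    if (start + pageSize > db.length) throw new Error(`page ${pgno} is out of range`);
    const page = db.subarray(start, start + pageSize);
    if (page[25] !== OVERFLOW_PAGE_TYPE) throw new Error(`page ${pgno} is not an Overflow page`);
    const view = new DataView(page.buffer, page.byteOffset, PAGE_HEADER_SIZE);
    // Assumption: hf_offset is the number of data bytes stored on this page.
    const length = Math.min(view.getUint16(22, true), pageSize - PAGE_HEADER_SIZE, entry.tlen - written);
    if (length === 0) throw new Error(`page ${pgno} holds no data`);
    out.set(page.subarray(PAGE_HEADER_SIZE, PAGE_HEADER_SIZE + length), written);
    written += length;
    pgno = view.getUint32(16, true); // assumed next_pgno
  }
  return out;
}
```

Checking the page type before copying guards against a `pgno` that does not point at an Overflow page, as the table above says it should.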
6 changes: 3 additions & 3 deletions test/README.md
@@ -1,14 +1,14 @@
# RPM Parser Tests
# RPM Parser Tests #

The RPM parser tests use fixtures and output obtained from RPM-based images in order to compare the parser's output to that of `rpm`.

## Running tests
## Running tests ##

```sh
npm test
```

## Creating or updating fixtures
## Creating or updating fixtures ##

Fixtures are created by running a container image, installing packages, and then obtaining the RPM database file as well as some formatted output from `rpm`.
