[RFC] Manage the occlum image with bom file: Design Details #565

Closed
StevenJiang1110 opened this issue Jul 28, 2021 · 2 comments


StevenJiang1110 commented Jul 28, 2021

  • Feature Name: occlum bom file
  • Start Date: 2021-07-28

Summary

This RFC describes the design details of RFC #542. A bom file is used to manage the occlum image. In this RFC, we explain why we want to introduce bom files into occlum and describe the design of bom files.

Motivation

Currently, which files are included in the occlum image is not transparent to users. This has several drawbacks:

  1. The current way of building an image is not flexible. If we only want to run the hello_world example compiled with musl, we may not want to copy glibc libraries into the occlum image. Since users do not know which files will be copied into the image when they run occlum new or occlum init, they cannot achieve this.
  2. It is difficult to handle large programs with complex dependencies. By default, the current image contains only musl and glibc libraries, but Java or Python programs may have many more dependencies, and occlum currently provides no suitable way to include them.
  3. Currently, finding the dependencies of ELF files and shared objects relies on manual work, which becomes tedious when the dependencies are complex. Bom files can be used to automate dependency management.

To address the problems above, we propose a bom file that describes the files to be included in the occlum image. From the bom file, users can see the source of each file in the image, and tools can use it to copy files into the image. The obvious benefits are:

  1. Users have a clear idea of what is included in the occlum image, so they can easily customize their image.
  2. It is convenient to build official images that include all dependencies.

Guide-level explanation

  1. The bom file format
    The bom file describes all file entries included in the occlum image. It should meet several requirements:
    • It should be easy for users to read and write.
    • It should support comments.
    • It should be easy to extend with other kinds of entries.
  2. The bom file content
    The bom file should have entries describing directories and files, and it can include other bom files. More details are shown in the bom file template below.
# Uncomment the following line to include other bom files. All entries in the included bom files will also be copied to the image.
# include = ["base.bom", "musl.bom"]

# An example of a file entry. There can be multiple file entries.
# src: the path of the source file.
# dest: where to put the file.
# hash: optional, not set by default. If hash is set, we check whether the file's hash has changed when we copy the file.
# target_executable: optional, not set by default. If target_executable is true, we find all dependencies of this file and copy them into the image.
[[files]]
src = "/etc/hosts"
dest = "image/etc/hosts"

[[files]]
src = "/etc/XXX"
dest = "XXXXX"

# An example of a directory entry. There can be multiple directory entries.
# src: optional. The path of the source directory. If src is set, the source directory is copied recursively to dest. Otherwise, a new empty directory is created.
# dest: where to put the directory.
[[directories]]
dest = "image/proc"
  3. Tools to deal with bom files
    We suggest two tools to deal with bom files:
    • generate_bom: generate bom files from command-line options.
    • copy_bom: copy the files described in bom files into the occlum image.
  4. How to use bom files in occlum?
    We use hello_c as an example. Currently, bom files are not integrated into occlum new, so after installing our tools, the commands are:
cd ~/demos/hello_c && make
occlum new occlum_workspace && cd occlum_workspace
rm -rf image
cp ~/occlum-bom/prepared_bomfiles/{base.bom,musl.bom,glibc.bom} .
generate_bom -e ../hello_world -o image.bom -i base.bom -i musl.bom -d image/bin  # musl version
copy_bom -f image.bom
occlum build

If we integrate bom files into occlum init and occlum build, the commands become:

cd ~/demos/hello_c && make
occlum new occlum_workspace && cd occlum_workspace  # occlum new copies the bom files
# Manually write a bom file or use a prepared one; this is a one-time job.
occlum build  # occlum build copies the files described in the bom files

Compared with the current commands, there is no big change, and we no longer need to copy files by hand.

Note: This RFC does not focus on how to change the behavior of occlum new or occlum build; it only discusses the design of bom files. We will open another RFC about that behavior if this RFC is approved.
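The `[[files]]` and `[[directories]]` entries from the template above can be sketched in Python to make their intended semantics concrete. This is a minimal illustration under stated assumptions, not the copy_bom implementation: `FileEntry`, `copy_file_entry` and `create_directory_entry` are hypothetical names, `hash` is assumed to be a SHA-256 hex digest of the source, and skipping the copy when the destination already has identical content is an assumption modeled on rsync skipping unchanged files.

```python
import hashlib
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class FileEntry:
    """A hypothetical in-memory form of one [[files]] entry (fields follow the template)."""
    src: Path
    dest: Path
    hash: Optional[str] = None  # expected SHA-256 hex digest, if the bom sets one


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_file_entry(entry: FileEntry) -> bool:
    """Copy entry.src to entry.dest. Returns False if dest already had identical content.

    Raises ValueError if entry.hash is set and the source content no longer matches it.
    """
    digest = sha256_of(entry.src)
    if entry.hash is not None and digest != entry.hash.lower():
        raise ValueError(f"hash mismatch for {entry.src}")
    if entry.dest.is_file() and sha256_of(entry.dest) == digest:
        return False  # unchanged: skip, similar to rsync skipping unchanged files
    entry.dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(entry.src, entry.dest)  # copies the target of a symlinked src
    return True


def create_directory_entry(dest: Path) -> None:
    """A [[directories]] entry without src: create an empty directory (mkdir -p)."""
    dest.mkdir(parents=True, exist_ok=True)
```

Comparing digests case-insensitively lets a bom file store the hash in upper case, as the YAML example later in this thread does.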

Reference-level explanation

  1. Files are copied with rsync. For symbolic links, the content they point to is copied. rsync can skip files that have not changed. However, rsync cannot copy files in the /sys directory. See https://unix.stackexchange.com/questions/315680/rsync-option-to-disable-verification.
  2. The hash of a file's content is calculated with SHA-256.
  3. Paths in bom files can be either absolute or relative. A relative path is interpreted relative to the bom file itself.
  4. For dependencies, musl libraries are copied to image/lib, and glibc libraries are copied to image/opt/occlum/glibc/lib. Other libraries are copied to the directory corresponding to their host directory.
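Rules 3 and 4 can be sketched as two small path functions. The function names are hypothetical, and the musl and glibc library directories are assumptions taken from the example musl.bom and glibc.bom later in this thread. `..` is normalized lexically, without resolving symlinks, which is a simplification. The sketch follows rule 4 as written here; the later discussion in this thread proposes skipping musl and glibc libraries instead.

```python
import posixpath
from pathlib import PurePosixPath

# Assumed host locations of musl and glibc libraries, taken from the example
# bom files in this thread; a real setup may differ.
MUSL_LIB_DIR = PurePosixPath("/opt/occlum/toolchains/gcc/x86_64-linux-musl/lib")
GLIBC_LIB_DIR = PurePosixPath("/opt/occlum/glibc/lib")


def resolve_bom_path(bom_file: str, path: str) -> PurePosixPath:
    """Rule 3: keep absolute paths; resolve relative ones against the bom file's directory."""
    if posixpath.isabs(path):
        return PurePosixPath(posixpath.normpath(path))
    base = posixpath.dirname(bom_file)
    return PurePosixPath(posixpath.normpath(posixpath.join(base, path)))


def dependency_dest(lib: str, image_root: str = "image") -> PurePosixPath:
    """Rule 4: choose where an (absolute) host library path goes inside the image."""
    lib_path = PurePosixPath(lib)
    root = PurePosixPath(image_root)
    if lib_path.is_relative_to(MUSL_LIB_DIR):
        return root / "lib" / lib_path.name
    if lib_path.is_relative_to(GLIBC_LIB_DIR):
        return root / "opt/occlum/glibc/lib" / lib_path.name
    # Any other library mirrors its host directory under the image root.
    return root / lib_path.relative_to("/")
```

For example, `../dataset` in `/home/u/ws/image.bom` resolves to `/home/u/dataset`. `PurePath.is_relative_to` requires Python 3.9 or later.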

Further:

  1. For occlum build to find the bom file, the bom file should have a fixed name, such as image.bom.

Drawbacks

  1. It's not clear whether users will be willing to write bom files.
  2. If we integrate bom files into occlum init and occlum build, it will be a breaking change for existing practice and docs.

Unresolved questions

  1. Where should we put dependencies? Should libraries go in the directories corresponding to their host ones? (This would change the path of musl libs.)
  2. Should we keep the src and dest directories synchronized, i.e., delete extraneous files from the dest directories?
  3. Should we add a symbolic link entry so that users can create symbolic links?
  4. Is there a better file format for bom files than toml? In my view, toml is a good choice that meets all the requirements above, and it has good serialization support in Rust. YAML may also be a good option, but I am not sure. And is there a better name?

If we integrate bom files into occlum build or occlum init:

  1. Should we split the bom files into two parts: one for init (copying dependencies) and one for build (copying files that may change)?

Future possibilities

In the future, the bom file may become part of occlum.toml (if we convert the occlum.json file to toml), as an entry in occlum.toml.


tatetian commented Aug 3, 2021

I totally agree with the motivation of the RFC, but I think the syntax needs some rethinking to make it more user-friendly. Here is an alternative, YAML-based syntax for BOM. I didn't have time to come up with a formal spec for the syntax, but here is some sample code to give you an idea.

# A sample code of an alternative, YAML-based syntax for BOM
#
# The benefits
# * More concise (see the numbers of line reduction)
# * More readable (organized in target directories)
# * No quotes for paths thanks to the YAML

# The new base.bom, whose line count drops from 64 to 21
- target: /
  dirs:
    - bin
    - dev
    - root
    - sys
    - proc
    - tmp
    - glibc/lib
- target: /lib64
  files:
    - /lib64/ld-linux-x86-64.so.2
- target: /etc
  src: /etc
  files:
    - localtime
    - hosts
- target: /opt/occlum
  dirs:
    - glibc
    - glibc/etc
    - glibc/libc

# The new glibc.bom, whose line count drops from 28 to 10
- include:
  - base.bom
- target: /opt/occlum/glibc/lib
  src: /opt/occlum/glibc/lib/
  files:
    - "*.so"
    - "*.so.*"
- target: /lib64
  files:
    - /lib64/ld-linux-x86-64.so.2

# The new musl.bom, whose line count drops from 24 to 7
- include:
  - base.bom
- target: /lib
  src: /opt/occlum/toolchains/gcc/x86_64-linux-musl/lib/
  files:
    - "*.so"
    - "*.so.*"

# A bom file for the glibc python demo
- include:
  - glibc.bom
- target: /bin
  files:
    - /opt/python-occlum/bin/python3
- target: /opt/occlum/glibc/lib/
  src: /opt/occlum/glibc/lib/
  files:
    - libdl.so.2
    - libutil.so.1
    - librt.so.1
- target: /dataset
  src: ../dataset
- target: /
  files:
    - ../demo.py
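Because both glibc.bom and musl.bom include base.bom, a tool that follows include lines should apply each bom file once and reject include cycles. A minimal depth-first sketch, assuming the include lists have already been parsed into a dict (the function name and this representation are hypothetical; Python's standard library has no YAML parser):

```python
from typing import Dict, List, Set


def resolve_includes(root: str, includes: Dict[str, List[str]]) -> List[str]:
    """Return bom files in application order: included files first, each exactly once.

    Raises ValueError if the include graph contains a cycle.
    """
    order: List[str] = []
    done: Set[str] = set()
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return  # e.g. base.bom reached again through a second include
        if name in visiting:
            raise ValueError(f"include cycle involving {name}")
        visiting.add(name)
        for child in includes.get(name, []):
            visit(child)
        visiting.remove(name)
        done.add(name)
        order.append(name)

    visit(root)
    return order


if __name__ == "__main__":
    includes = {
        "glibc.bom": ["base.bom"],
        "musl.bom": ["base.bom"],
        "image.bom": ["glibc.bom", "musl.bom"],  # hypothetical top-level bom
    }
    print(resolve_includes("image.bom", includes))
    # ['base.bom', 'glibc.bom', 'musl.bom', 'image.bom']
```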

The syntax is by no means complete or perfect. But I think it is a promising direction that we should explore.

As you know, YAML can be viewed as a natural superset of JSON. If we adopt YAML, we may also consider replacing Occlum.json with Occlum.yaml and integrating BOM support into Occlum.yaml.


StevenJiang1110 commented Aug 6, 2021

After discussion, we reached the following conclusions:

  • We should add a symlink entry so that users can create symbolic links in the occlum image.
  • When copying symbolic links, we follow the link and copy the real files.
  • The target_executable field will be renamed to auto_dependence. Finding the dependencies of an ELF file or shared object becomes the default behavior.
  • We should keep the src and dest directories or files synchronized.
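The symlink entry and the follow-the-link copy rule above can be sketched roughly as follows. `create_link` and `copy_following_links` are hypothetical helper names, and replacing an existing link so that it stays in sync with the bom file is an assumption based on the synchronization conclusion.

```python
import os
import shutil
from pathlib import Path


def create_link(target_dir: Path, src: str, linkname: str) -> Path:
    """Symlink entry, equivalent to `ln -s $src $target/linkname`.

    src is stored verbatim, so a relative src is resolved against target_dir
    when the link is followed.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / linkname
    if link.is_symlink() or link.is_file():
        link.unlink()  # replace a stale link or file so it matches the bom entry
    os.symlink(src, link)
    return link


def copy_following_links(src: Path, dest: Path) -> None:
    """Copy a file; if src is a symlink, copy the file it points to, not the link."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest, follow_symlinks=True)
```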

Some problems are still open; here are our suggestions:

  • Where to put dependencies? Suggestion: libraries other than glibc or musl libraries are copied to their corresponding host directories. Musl and glibc libraries are skipped; where they are copied is defined in musl.bom and glibc.bom.
  • How to synchronize? Suggestion: we try to keep each file defined in the bom files in sync, but we do not guarantee that the whole image directory stays in sync with the bom file contents. Users can achieve that by running occlum build -f, which deletes the image directory and copies all files again.
  • The file format and file content

Following the comment above, the bom file content should be as concise as possible to reduce the writing burden. This can be achieved by grouping files that share the same destination and source, so that we avoid writing the same directory prefixes repeatedly. YAML also supports directory names that contain spaces.

We give a formal yaml example by adding some corner cases (renaming files, hash, auto_dependence) to the yaml in the comment above.

# yaml example
# Lines containing ~ can be omitted when writing yaml files by hand. We keep ~ lines here to show the inner details.
# include other bom files
includes:
  - base.yaml
  - java-11-alibaba-dragonwell.yaml
# excludes only take effect when copying directories: files or dirs matching the following patterns are skipped.
excludes:
  - .git
  - .dockerignore
targets:
  # one target groups all operations with the same destination
  - target: /
    # make a directory in dest: mkdir -p $target/dirname
    mkdirs:
      - bin
      - proc
    # create a symlink: ln -s $src $target/linkname
    createlinks:
      - src: ../hello
        linkname: hello_softlink
    copy:
      # from is the prefix of the dirs and files to copy
      # If neither dirs nor files is given, copy the *ENTIRE from directory* to target: cp -r $from/ $target
      - from: .
        # copy a directory: cp -r $from/dirname $target
        dirs:
          - hello_c_demo
          - example_dirname
        # copy a file: cp $from/filename $target
        files:
          - Makefile
          - name: Cargo.toml
            hash: DA665E483C11922D07239B1A04BEE0F0C7C1AB6D60AF041DDA7CE56D07AF723E
            autodep: false
            rename: Cargo.toml.backup
  - target: /bin
    mkdirs:
      - python-occlum
      - python-occlum/bin
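To check that each key in the example maps onto exactly one shell operation, here is a sketch that turns one parsed `targets` entry into the commands written in its comments. It assumes the YAML has already been loaded into plain dicts and lists (Python's standard library has no YAML parser), `plan_target` is a hypothetical name, and hash checking and autodep are deliberately not modeled. The commands are returned as strings rather than executed.

```python
import posixpath
from typing import Any, Dict, List


def plan_target(t: Dict[str, Any]) -> List[str]:
    """Translate one parsed `targets` entry into the shell commands its comments describe."""
    target = t["target"]
    ops: List[str] = []
    for d in t.get("mkdirs", []):
        ops.append(f"mkdir -p {posixpath.join(target, d)}")
    for link in t.get("createlinks", []):
        ops.append(f"ln -s {link['src']} {posixpath.join(target, link['linkname'])}")
    for c in t.get("copy", []):
        src = c["from"]
        dirs, files = c.get("dirs", []), c.get("files", [])
        if not dirs and not files:
            # nothing listed: copy the entire `from` directory
            ops.append(f"cp -r {src}/ {target}")
            continue
        for d in dirs:
            ops.append(f"cp -r {posixpath.join(src, d)} {target}")
        for f in files:
            entry = {"name": f} if isinstance(f, str) else f  # plain name or detailed mapping
            dest = posixpath.join(target, entry["rename"]) if "rename" in entry else target
            ops.append(f"cp {posixpath.join(src, entry['name'])} {dest}")
    return ops


if __name__ == "__main__":
    for op in plan_target({"target": "/bin", "mkdirs": ["python-occlum", "python-occlum/bin"]}):
        print(op)
    # mkdir -p /bin/python-occlum
    # mkdir -p /bin/python-occlum/bin
```

Target paths here are relative to the image root; a real tool would prefix them with the image directory before running anything.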

For comparison, we also give a toml version. A toml document is essentially a hash table. The toml below was auto-generated with the Rust toml crate. Toml has two disadvantages: to define arrays of objects, we must write [[array_name]] repeatedly, and it cannot clearly picture the directory structure. Toml has two advantages: it is friendly to people who dislike indentation, and it makes deeply nested hash tables easy to express.

#base.bom
[[targets]]
dest = "/"
dirs = ["bin", "dev", "root", "sys", "proc", "tmp", "glibc/lib"]

[[targets]]
dest = "/lib64"

[[targets.files]]
name = "/lib64/ld-linux-x86-64.so.2"

[[targets]]
dest = "/etc"
src = "/etc"

[[targets.files]]
name = "localtime"

[[targets.files]]
name = "hosts"

[[targets]]
dest = "/opt/occlum"
dirs = ["glibc", "glibc/etc", "glibc/libc"]

We also tried simplifying the toml version of the bom file with braces and wildcards. The advantage is that it has the same semantics as shell commands. The disadvantages are: 1. renaming files is somewhat tricky with braces; 2. braces and wildcards can't be used together.

# shell: mkdir -p <dest>
[[directories]]
dest = "{bin,dev,etc,host,lib,lib64,proc,root,sys,tmp}"

# shell: cp -p <src> <dest>
[[files]]
dest = "etc/"
src = "/etc/{hosts,localtime}"

[[files]]
dest = "lib64/"
src = "/lib64/ld-linux-x86-64.so.2"

[[files]]
dest = "opt/occlum/glibc/etc/"
src = "/etc/localtime"
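The brace syntax relies on shell-style brace expansion. A simplified expander (a hypothetical helper: no nested groups, no escaped commas) illustrates the intended semantics. A shell does not brace-expand a group written with spaces, such as `{hosts, localtime}`, so this sketch leaves such groups as literal text too.

```python
import re
from typing import List

# Matches an innermost {...} group (no braces inside it).
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> List[str]:
    """Expand comma-separated {a,b,...} groups left to right, like simple shell brace expansion.

    Groups without a comma, or containing whitespace, are left as literal text.
    """
    for m in _BRACE.finditer(pattern):
        body = m.group(1)
        if "," not in body or any(ch.isspace() for ch in body):
            continue  # not expandable: keep it literally and look at the next group
        head, tail = pattern[: m.start()], pattern[m.end():]
        return [r for alt in body.split(",") for r in expand_braces(head + alt + tail)]
    return [pattern]


if __name__ == "__main__":
    print(expand_braces("/etc/{hosts,localtime}"))
    # ['/etc/hosts', '/etc/localtime']
```

Renaming has no natural place in this syntax, which matches the first drawback noted for the brace version.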

Our final suggestion is to use the first yaml version, because it is easy for humans to read and write, and it lets us eventually make the bom file an entry in occlum.yaml.
