The goal of this project is to evaluate the CernVM File System (CVMFS) as a platform to store, organize, and distribute model files. Specifically, we want to understand the latency and file access patterns for inference tasks using models loaded from a CVMFS repo.
- Set up a test client machine and a CVMFS Stratum 0 server (at `model-registry-test.cern.ch`) on two LXPlus virtual machines.
- Tested various container tools such as Skopeo, ORAS, and CVMFS DUCC to unpack and then publish container images into `model-registry-test.cern.ch`.
- Wrote a GPU-accelerated test inference script in Python using Microsoft's Phi-4-mini LLM in `.onnx` format, and recorded performance data into CSV files comparing local vs. CVMFS storage. Also wrote a helper bash script to clear the kernel and CVMFS caches between repeat trials.
- Found that there is significant loading overhead on CVMFS (over 30 seconds) compared to local storage (around 10 seconds) on the first load, but once the model is cached the load time is negligible (about 1 second). Inference performance (generating tokens) was the same for both.
- Attempted to test loading overhead with file chunking disabled on the server, and discovered a bug where CVMFS fails to read unchunked files.
- Began investigating inference-time file access patterns using CVMFS debug logs, and wrote scripts to count `cvmfs_read()` calls at inference time. Initial tests using `cat <file>` showed that the read only gets called once per file per CVMFS process, which was unexpected.
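The read-counting step can be prototyped with a short script. Note that the log line format below is an assumption for illustration (CVMFS debug-log output varies between versions); the real tracing pipeline would adapt the pattern to the actual `cvmfs_read()` entries in its logs.

```python
import re
from collections import Counter

# Hypothetical debug-log pattern: adjust to the actual CVMFS debug output.
# Here we assume each read entry contains "cvmfs_read" followed by
# "path: <file>" identifying the file being read.
READ_RE = re.compile(r"cvmfs_read.*?path:\s*(\S+)")

def count_reads(log_lines):
    """Count cvmfs_read() entries per file path in a debug log."""
    counts = Counter()
    for line in log_lines:
        match = READ_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage sketch: python count_reads.py <debug-log-file>
    with open(sys.argv[1]) as log:
        for path, n in count_reads(log).most_common():
            print(f"{n},{path}")
```

Aggregating into a `Counter` keyed by path makes it easy to spot whether a file is read once (as the `cat` test suggested) or many times during inference.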
- Fix CVMFS continuously crashing on the client machine (the crashes began after investigating file access patterns)
- Run the file tracing scripts during inference runs
- Merge the file tracing scripts with the `fuse-monitor-read-write` tool from this GitHub repo to display file access heat maps
- Run inference benchmarks on smaller, commonly used CERN models such as Particle Transformer, which are just a few MB compared to the several-GB Phi-4-mini
- Develop the server into a publicly usable one by creating Stratum 1 proxy servers and using DUCC to automatically add and unpack user-requested container images, in the same manner as `unpacked.cern.ch`
- Develop the server infrastructure into an OCI-compliant registry with authentication and HTTP request support
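For the benchmarks on smaller models, the first-load overhead can be approximated by timing a full sequential read of the model file, since cold-load latency on an uncached CVMFS mount is dominated by fetching chunks over HTTP into the local cache. A minimal sketch (the paths in the usage comment are placeholders, not real repository paths):

```python
import time

def time_cold_read(path, chunk_size=1 << 20):
    """Time a full sequential read of a file in 1 MiB chunks.

    On an uncached CVMFS mount this is dominated by HTTP chunk
    fetches from the server; on a warm cache or local disk it
    measures plain filesystem read speed.
    """
    start = time.perf_counter()
    total_bytes = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total_bytes += len(chunk)
    return time.perf_counter() - start, total_bytes

# Hypothetical usage: compare CVMFS vs. local copies of the same model
# (clear the kernel and CVMFS caches between trials for a true cold read).
# cvmfs_time, _ = time_cold_read("/cvmfs/model-registry-test.cern.ch/phi4/model.onnx")
# local_time, _ = time_cold_read("/data/local/phi4/model.onnx")
```

Because this isolates file I/O from model initialization, it can separate CVMFS transfer overhead from the framework's own load-time costs.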
- `fuse-monitor-data` contains the access pattern for the locally stored Phi-4-mini
- `benchmark-data` contains some of the inference benchmarking I referenced previously
- `trace-cvmfs-reads.sh`, `inodes.csv`, `raw-reads.txt`, and `read_data.csv` are part of the prototype CVMFS file tracing pipeline
- The biggest challenges I faced were system-administration problems: the packages I'd install would be incompatible with each other or with the machine for various reasons, and sometimes they would even crash the machine!
- I learned a lot about sysadmin tools: using ssh, configuring clients and servers, monitoring files and processes, and working with containers. One of the coolest moments was when I first got the inference script to work; seeing the client and server communicate seamlessly was amazing! Another was when I used tmux to create two panes and simultaneously view the debug output of a running CVMFS process and the size of the CVMFS cache — watching the cache fill up with data chunks sent over HTTP was so cool.
- In the beginning, it was challenging to understand all the container-specific terminology and implementations. Now I've gained an appreciation for how standardized container workflows have become (through OCI), and how convenient and secure they can be for sharing both coding environments and files (artifacts) through registries.
- Debugging crashes can be quite difficult; in the final week I never figured out why CVMFS kept crashing on startup, which halted my progress.