Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static assets endpoint #5063

Merged
merged 23 commits into from
Jun 28, 2024
Merged

Static assets endpoint #5063

merged 23 commits into from
Jun 28, 2024

Conversation

egor-ryashin
Copy link
Contributor

@egor-ryashin egor-ryashin commented Jun 11, 2024

Files can be accessed through http endpoint, the template request:

GET /v1/instances/{instance_id}/assets/{path...} 

For example, if repo has the following structure:

openrtb/
├── README.md
├── dashboards
│   ├── auction.yaml
│   └── bids.yaml
├── models
│   ├── auction_data_model.sql
│   ├── bids_data_model.sql
│   ├── model.sql
│   └── model_1.sql
├── public
│   ├── 2.txt
│   ├── custom.geo.json
│   └── geo.json
├── rill.yaml
├── sources
│   ├── auction_data_raw.yaml
│   └── bids_data_raw.yaml
└── tmp
    ├── default
    │   └── tmp
    ├── main.db
    ├── main.db.wal
    └── meta.db

then requesting public/geo.json can be done with:

GET /v1/instances/{instance_id}/assets/public/geo.json

@egor-ryashin egor-ryashin changed the title static assets endpoint Static assets endpoint Jun 12, 2024
@egor-ryashin egor-ryashin marked this pull request as ready for review June 12, 2024 16:28
Comment on lines 71 to 72
Features yaml.Node `yaml:"features"`
PublicPaths []string `yaml:"public_paths"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Features yaml.Node `yaml:"features"`
PublicPaths []string `yaml:"public_paths"`
Features yaml.Node `yaml:"features"`
// Paths to expose over HTTP (defaults to ./public)
PublicPaths []string `yaml:"public_paths"`

Comment on lines 86 to 88
if len(tmp.PublicPaths) == 0 {
tmp.PublicPaths = []string{"public"}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to around line 168 – it's easier to navigate if the properties are handled in the same order as they are declared

Comment on lines 290 to 323
cache := make(map[string][]byte)
for _, p := range h.cachedPaths {
p = filepath.Join(h.projPath, p)
_, err = os.Stat(p)
if err != nil {
if os.IsNotExist(err) {
continue
}
return nil, err
}

err := filepath.Walk(p, func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if !info.IsDir() {
b, err := os.ReadFile(path)
if err != nil {
return err
}

rel, err := filepath.Rel(h.projPath, path)
if err != nil {
return err
}
cache[strings.TrimLeft(rel, "/")] = b
}
return nil
})
if err != nil {
return nil, err
}
}
h.assetsCache = cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a useful cache. The place where caching is nice is to be able to serve assets immediately when the runtime starts, even if Github takes a few minutes to clone. This means the cached files would need to be stored on the persistent disk, and be readable before the first call to cloneOrPull.

Also, we should not eagerly cache the assets in memory – it can lead to excessive memory usage if there are larger files. Relying on the OS file system page cache is more than adequate for our performance requirements here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how a user can get into a state where there's cache on disk but no cloned repo? Because cache can be created from a clone repo.

And why would user like to delete a cloned repo leaving just the cache on disk?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repo is currently cloned into a temp directory which will not exist after a process restart:

h.repoPath, err = os.MkdirTemp(h.config.TempDir, "admin_driver_repo")
.

It might be fine to cache the full repo on the persistent disk – that just isn't what it's doing now.

Comment on lines 182 to 216
cache := make(map[string][]byte)
for _, p := range c.cachedPaths {
p = filepath.Join(c.root, p)
_, err := os.Stat(p)
if err != nil {
if os.IsNotExist(err) {
continue
}
return err
}
err = filepath.Walk(p, func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if !info.IsDir() {
b, err := os.ReadFile(path)
if err != nil {
return err
}

rel, err := filepath.Rel(c.root, path)
if err != nil {
return err
}
cache[strings.TrimLeft(rel, "/")] = b
}
return nil
})
if err != nil {
return err
}
}
c.cacheMutex.Lock()
defer c.cacheMutex.Unlock()
c.assetsCache = cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file driver does not have a cloning latency like Github has, so there is no need for caching assets in it.

Comment on lines 293 to 302
paths := parser.RillYAML.PublicPaths
if len(paths) == 0 {
paths = []string{"public"}
}
repo, release, err := r.C.Runtime.Repo(ctx, parser.InstanceID)
if err != nil {
return err
}
defer release()
repo.SetCachedPaths(paths)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be handled inside UpdateInstanceWithRillYAML

Comment on lines 34 to 36
paths := repo.GetCachedPaths()
allowed := false
for _, p := range paths {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's too implicit to rely on the repo's cached paths as an allow list. It only works as long as that feature is only used for PublicPaths.

Instead, I would suggest explicitly adding PublicPaths as a property on Instance, assigning it in UpdateInstanceWithRillYAML, and checking it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you suggest adding an instance reference to repo, right? because there's no instance in the params list
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it could just pass the paths to cache, no need to pass the full instance. But as discussed above, it might be fine just to cache the whole repo on a persistent disk.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still suggest to omit caching from this PR since it's not so important

paths := repo.GetCachedPaths()
allowed := false
for _, p := range paths {
if strings.HasPrefix(strings.TrimLeft(path, ","), strings.TrimLeft(p, ",")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This is not a safe way to check the path. For example, if /public is the public directory, it will match /publications/secret.txt.
  2. Why is it necessary to trim commas from the paths?

}
}
if !allowed {
return fmt.Errorf("path is not allowed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should return a httputil.Error with an appropriate status code

@begelundmuller
Copy link
Contributor

I would suggest completing this PR without caching support and optionally taking it up in a follow up PR.

Comment on lines +63 to +64
// Paths to expose over HTTP (defaults to ./public)
PublicPaths []string `db:"public_paths"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add migrations and handling for it in runtime/drivers/sqlite (can search for feature_flags to find locations)

Comment on lines 293 to 296
if len(parser.RillYAML.PublicPaths) == 0 {
parser.RillYAML.PublicPaths = []string{"public"}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed since the default is assigned in parse_rillyaml.go

@@ -58,6 +58,7 @@ type configProperties struct {
// a smaller subset of relevant parts of rill.yaml
type rillYAML struct {
IgnorePaths []string `yaml:"ignore_paths"`
CachedPaths []string `yaml:"public_paths"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CachedPaths []string `yaml:"public_paths"`
PublicPaths []string `yaml:"public_paths"`

@begelundmuller
Copy link
Contributor

@egor-ryashin After this PR merges, can you also update in #team-applications about the new endpoint? So that they can wire it up to the custom dashboard components.

@begelundmuller begelundmuller merged commit 349403e into main Jun 28, 2024
4 checks passed
@begelundmuller begelundmuller deleted the static-assets branch June 28, 2024 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants