Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proxy image gets #1686

Merged
merged 29 commits into from Jan 9, 2024
Merged

Proxy image gets #1686

merged 29 commits into from Jan 9, 2024

Conversation

tzjames
Copy link
Collaborator

@tzjames tzjames commented Sep 22, 2023

Features and Changes

Before this change for files stored on s3 and google-cloud we were creating a link that required the buckets to be open to the public.

  • This change introduces an env variable that if enabled proxies the image requests through our server so our clients can keep the buckets private.
  • Local images are now always proxied regardless if env var and can only be seen with the correct auth headers.
  • Fixes a pre-existing css issue with the carousel in the new design where it didn't have a background for the next/prev image button.
  • Fixed a pre-existing bug where we were only showing the loading spinner on put on the first image upload, but not subsequent ones.
  • Changes the markdown library to one that is compatible for replacing elements with React.
  • Uses uuidv4 to generate the filenames to make it less likely to be guessed (in case people are still using public buckets.

A few notes on the implementation:

  • For errors hitting the proxy endpoint it does show ! Icon FaExclamationTriangle, with error message following it.
    • I tried showing Tooltip but the drop shadow somehow took over the entire variations div. I don't think it is worth the time to figure it out, because hopefully few people will ever see the message.
    • This does mean that if the error is very long, that it would look weird/be unreadable ... but again hopefully it won't happen much.
  • I was able to get it to work with Markdown ... however I had to move away from using markdown-it and instead use markdown-react. I don't think it is possible to use markdown-it to replace rendering with a react component. I think you can only replace it with html.
  • Old images urls that access the buckets directly get rewritten to use the proxy if the env var is true.
  • I have added caching. I am not deleting things from the cache ever, and so I am a bit worried it might end up using too much memory.
    • I could delete it when they delete the screenshot from the variation (easy to implement, probably rare and not so helpful)
    • It might be trickier for detecting when a screenshot in Markdown editing that got viewed in preview but then later gets deleted from editing, but perhaps there is a way. (harder to implement, probably very rare and not so helpful)
    • I could also put in a hard limit into how much we cache (items/size) and then not cache more. (easy to implement, protects against too much memory, worse UX)
    • I could put in a hard limit into how much we cache (items/size) and then remove the oldest viewed key when we go over the limit (slightly longer to implement, protects against too much memory, better UX - but still risks loading things multiple times ).
  • I noticed and fixed a few existing css bugs
    • On the new experiments design page, the background for the next arrow when there are multiple screenshots for a variation is missing making the next arrow hard to see.
    • On the screenshot enlargement when you click a screenshot in a variation, for some screen sizes the delete waste-can icon sometimes overlaps the image.
  • I haven't done any unit testing yet.
    • It doesn't look like we do much unit testing of controllers anywhere
    • There was no existing unit tests for files.ts. As it is mostly 3rd party urls that get called, there is not all that much logic to test.

Closes: #857

Testing

For each of the three UPLOAD_METHOD's:

  • set USE_FILE_PROXY=false
  • Upload screen shots
  • See a loading spinner.
  • See the file appear.
  • Lock down buckets.
  • See error message.
  • set USE_FILE_PROXY=true
  • See the file appear again.
  • Upload another file.
  • See the file appear.
  • Delete the screen shots.

For markdown testing:

  • Drop an image into the comment box that is from an upload url.
  • Add the following into the comment box:
![img](https://test-growthbook-upload.s3.amazonaws.com/org_19exnt1a6lljfsluzz/2023-11/img_4posgyrglpb877hj.png)
 ![img](https://storage.googleapis.com/openimages/web/images/a8b164235b631065.png)
![img2](https://growthbook-files.s3.amazonaws.com/org_a919vk7kc59purn/2023-11/img_19g61olpbapa4z.png)
  • Go to Preview. See the images there accessing the files directly for the ones from different s3/gcs buckets but see the AuthorizedImage rewrite for the img from the S3_BUCKET of the server.
  • Copy markdown from here into comment box: https://markdown-it.github.io/
  • Go to Preview. See that it rendered most things including tables. Unlike the old renderer it doesn't render . It also doesn't render subscript and a few others, but the old one didn't either.

@github-actions
Copy link

github-actions bot commented Sep 22, 2023

Your preview environment pr-1686-bttf has been deployed.

Preview environment endpoints are available at:

@@ -50,6 +50,7 @@ const Carousel: FC<{
open={true}
header={"Screenshot"}
close={() => setModalOpen(false)}
bodyClassName="d-flex justify-content-center align-items-center"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This aligns the loading spinner in the middle of the variation area.

@@ -28,7 +28,7 @@ const ScreenshotUpload = ({
const [loading, setLoading] = useState(0);

const onDrop = async (files: File[]) => {
setLoading(loading + files.length);
setLoading((previous) => previous + files.length);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the ones blow fixed a pre-existing bug where we were only showing the loading spinner on put on the first image upload, but not subsequent ones.

@tzjames tzjames changed the title WIP: proxy image gets Proxy image gets Sep 27, 2023
@tzjames tzjames requested a review from jdorn October 1, 2023 10:51
packages/back-end/src/app.ts Outdated Show resolved Hide resolved
packages/front-end/components/AuthorizedImage.tsx Outdated Show resolved Hide resolved
import styles from "./Markdown.module.scss";

const md = markdown({ html: true, linkify: true }).use(sanitizer).use(mark);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the renderer is a pretty big change. How thoroughly did you test this with different markdown text? Does it turn URLs into links automatically? Does it sanitize html like the previous one did? Does it use the same custom CSS styles we configured to make the output match the rest of the app styles?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How thoroughly did you test this with different markdown text?
I tested on: https://markdown-it.github.io/. The only thing the new one doesn't render that the old one did is ==mark== as <mark>

Does it turn URLs into links automatically?
yes

Does it sanitize html like the previous one did?
Neither the old one nor this one renders html. The html is escaped.

Does it use the same custom CSS styles we configured to make the output match the rest of the app styles?
yes we surround the ReactMarkdown with <div {...props} className={clsx(className, styles.markdown)}> which is what the markdown was wrapped with before. So it should all look the same as before. I didn't notice any differences.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK if it's worth doing at this point (up to you) but it might be worth separating this distinct change (updating our markdown renderer) in a separate PR that can be deployed/tested and potentially rolled back easily if we need to.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we store the same thing in the database now, we should be able to revert the whole changeset without a problem, as long as no one locked down their bucket in the meantime.

packages/front-end/components/AuthorizedImage.tsx Outdated Show resolved Hide resolved
@tzjames tzjames requested a review from jdorn October 17, 2023 15:00
Copy link

github-actions bot commented Nov 23, 2023

Deploy preview for docs ready!

✅ Preview
https://docs-8u6t1tgpk-growthbook.vercel.app

Built with commit 3418d79.
This pull request is being automatically deployed with vercel-action

Copy link
Collaborator

@bttf bttf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We may want to start limiting the size of images we accept in putUpload
  • Have we considered leveraging browser-based caching (Cache-Control header)? Is security a blocker?
  • The github issue mentions using pre-signed URLs for fetching images. This was my first thought at an approach as well... Have we considered this? It does come at a cost to security but it can be minimized. And it would allow for a more lightweight implementation.

packages/back-end/src/routers/upload/upload.controller.ts Outdated Show resolved Hide resolved
@tzjames
Copy link
Collaborator Author

tzjames commented Nov 28, 2023

  • We may want to start limiting the size of images we accept in putUpload
    There is one at upload.router.ts:15 at 10mb.
  • Have we considered leveraging browser-based caching (Cache-Control header)? Is security a blocker?
    I would prefer to rely on the browser cache rather than our own. However createObjectURL only stores things in memory and is cleared when the browser is closed, while the browser cache could conceivably be read off the disk even after someone has logged out, and even after the cache has expired - so security wise not as strong. I'll run it by @jdorn again. Most machines are not shared these days, and I'm sure most big firms will enforce disk encryption, but perhaps it might be an issue from some of our most paranoid clients.
  • The github issue mentions using pre-signed URLs for fetching images. This was my first thought at an approach as well... Have we considered this? It does come at a cost to security but it can be minimized. And it would allow for a more lightweight implementation.
    We went back and forth on this one. Certainly the pre-signed URL approach would be more light weight. But we had been using this approach for uploads originally and a bunch of users had issues experiencing errors. It is possible that pre-signed URLs for reading would not be as error prone, so perhaps we could try that approach first and then if we get users complaining about errors return to this approach - but that might take another day to re-implement and test everything. So at this point perhaps its better to continue with this approach and if we run into issues the fall back to that approach.

@@ -71,3 +71,43 @@ export async function uploadFile(
}
return fileURL;
}

export async function getImageData(filePath: string) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be async

import styles from "./Markdown.module.scss";

const md = markdown({ html: true, linkify: true }).use(sanitizer).use(mark);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK if it's worth doing at this point (up to you) but it might be worth separating this distinct change (updating our markdown renderer) in a separate PR that can be deployed/tested and potentially rolled back easily if we need to.

} else if (
usingFileProxy() &&
getGcsDomain() &&
src.startsWith(getGcsDomain())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this (and for the similar s3 check below) I'm having trouble understanding when the src property will be set to an S3 url and not a GB host. When the user uploads a screenshot, would we not update the variation with the GB proxy URL? Is this meant for existing customers to accomplish a backfill?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had gone back and forth on this a bit, but if you look in files.ts now you will see that we store the GCS location where it is actually stored on the mongo model and not the Growthbook proxy. Only when usingFileProxy() is true will we then use the proxy and rewrite it to be a GB url. This will hence behave the same for both old and new urls and they can go ahead and lock their buckets down.

@@ -7,6 +7,9 @@ import { useAuth } from "@/services/auth";
import { trafficSplitPercentages } from "@/services/utils";
import Carousel from "../Carousel";
import ScreenshotUpload from "../EditExperiment/ScreenshotUpload";
import AuthorizedImage from "../AuthorizedImage";

const imageCache = {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be better suited to be defined by a useState variable, and reset when variation.id changes

const [imageCache, setImageCache] = useState({});

useEffect(() => {
  setImageCache({});
}, [variation.id]);

// you can still set imageCache as it's currently done
imageCache[src] = blob;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing that would make the cache disappear as they navigate to other pages and come back. Keeping it as is makes the cache last longer.

Copy link
Collaborator Author

@tzjames tzjames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing again @bttf. I have removed the async's and responded to your other comments. I noticed that react was sometimes call useEffect multiple times before the api call could finish running, so now I put it in a lock so that it only fetches it from the proxy once.

} else if (
usingFileProxy() &&
getGcsDomain() &&
src.startsWith(getGcsDomain())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had gone back and forth on this a bit, but if you look in files.ts now you will see that we store the GCS location where it is actually stored on the mongo model and not the Growthbook proxy. Only when usingFileProxy() is true will we then use the proxy and rewrite it to be a GB url. This will hence behave the same for both old and new urls and they can go ahead and lock their buckets down.

@@ -7,6 +7,9 @@ import { useAuth } from "@/services/auth";
import { trafficSplitPercentages } from "@/services/utils";
import Carousel from "../Carousel";
import ScreenshotUpload from "../EditExperiment/ScreenshotUpload";
import AuthorizedImage from "../AuthorizedImage";

const imageCache = {};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing that would make the cache disappear as they navigate to other pages and come back. Keeping it as is makes the cache last longer.

import styles from "./Markdown.module.scss";

const md = markdown({ html: true, linkify: true }).use(sanitizer).use(mark);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we store the same thing in the database now, we should be able to revert the whole changeset without a problem, as long as no one locked down their bucket in the meantime.

@tzjames tzjames requested a review from bttf December 22, 2023 15:43
@tzjames tzjames merged commit 1c5a821 into main Jan 9, 2024
6 checks passed
@tzjames tzjames deleted the ji/proxy_image_gets branch January 9, 2024 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Serve uploaded variation screenshots using private S3 bucket
3 participants