Better solution for hosting images (or the site as a whole)? #636

Open
17cupsofcoffee opened this issue May 14, 2021 · 12 comments

Comments

@17cupsofcoffee
Collaborator

I noticed while working on the most recent newsletter that the size of the repo is starting to get a bit unwieldy - a fresh clone is 800MB (half of which is the index, half of which is the actual files).

Not only does this make it more of a pain for people to contribute, there is a hard 1GB limit on published GitHub Pages sites which we're eventually going to hit. The index wouldn't factor into that though (I hope...), so I don't think we're in imminent danger.

In general, Git is just not a good solution for storing lots of binary content, and I think we'll need to deal with this eventually if the newsletter is going to stick around long term (which I hope it does!)

Some potential fixes could be:

| Fix | Pros | Cons |
| --- | --- | --- |
| Stop accepting GIFs (they take up the majority of the repo's space) | Only a policy change, not a technical one. | Would only slow down the issue, not fix it. |
| Move to a different hosting solution with a higher size cap (Netlify?) | Wouldn't need changes to the site or to the workflow. | Doesn't solve the repo size issue long term. |
| Host images externally (maybe on a CDN?) | Wouldn't need changes to the site, minimal change to the workflow. | Might make it more difficult for people to submit images. |
| Move away from Git-based hosting altogether | Might be easier to edit the site through a CMS instead of markdown. | Would require us to rethink the whole contribution workflow. |
| Something else I've not thought of? | ??? | ??? |

Don't think we need to deal with this immediately, but just wanted to start the conversation sooner rather than later 😄

@ozkriff
Member

ozkriff commented May 14, 2021

Another option is to stop accepting GIFs, start accepting video files instead, and limit image and video sizes drastically. The downsides are that we would need to somehow process all the published newsletters, and GitHub still doesn't support videos in Markdown preview.

@17cupsofcoffee
Collaborator Author

Some stats on what's taking up the space:

image

(the .pack is the index)
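
For anyone who wants to reproduce a rough breakdown like this locally, here is a minimal Python sketch that totals file sizes per extension under a directory. The names `size_by_extension` and `print_report` are made up for illustration, and this is not the tool that produced the screenshot above.

```python
import os
from collections import defaultdict


def size_by_extension(root):
    """Return a dict mapping lowercased file extension to total bytes under root."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            # Skip symlinks so their targets are not counted a second time.
            if os.path.islink(full_path):
                continue
            _, ext = os.path.splitext(filename)
            totals[ext.lower()] += os.path.getsize(full_path)
    return dict(totals)


def print_report(root):
    """Print extensions from largest to smallest total size."""
    totals = size_by_extension(root)
    for ext, size in sorted(totals.items(), key=lambda item: item[1], reverse=True):
        print("%16s  %10.2f MB" % (ext or "(no extension)", size / 1000000.0))
```

Calling `print_report(".")` from the root of a clone would include the `.git` directory, so the pack files show up next to the GIFs.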

@ozkriff
Member

ozkriff commented May 14, 2021

Another theoretical option is to use Git LFS (I don't know if it would help with the GitHub Pages limits), but GitHub's LFS pricing doesn't suit open source projects at all :(

@AngelOnFira
Member

re: Videos instead of gifs, we did this recently with Veloren's blog on Zola, and it has made it significantly easier to show off more content with smaller filesize. I think this would be a good move even outside of this particular issue.

re: GitHub LFS, I would not recommend this path. We've looked at it for Veloren, and when we asked if GitHub would sponsor us with more bandwidth, they recommended we just go back to checking files into the repo instead.

re: GitHub Pages limit, it also seems like they have some soft limits that we could potentially break through. 100 GB bandwidth isn't that much. I don't think we track traffic, but that might come up in the future.

I think Netlify is a pretty good path, and we can probably retroactively go back and do a batch conversion of GIFs -> MP4 on the repo. Also, would it be possible to see if the Rust Foundation has infrastructure that we could provision on? With Veloren, we have a dedicated server in Germany that is about 60 Euro a month, and it has full gigabit without data caps. I would imagine that the foundation has some servers that could just be used as a CDN pretty nicely.

@17cupsofcoffee
Collaborator Author

> re: Videos instead of gifs, we did this recently with Veloren's blog on Zola, and it has made it significantly easier to show off more content with smaller filesize. I think this would be a good move even outside of this particular issue.

Sounds like a win-win then :)

> I think Netlify is a pretty good path

I already use Netlify for all of my personal/Tetra stuff, and I much prefer it to GitHub Pages personally. They also have a 100GB a month bandwidth limit, but no deploy size limit as far as I've been able to tell. I think the only downside to this would be that we'd need a separate login rather than being able to manage it all via GitHub (you can't add team members on a free account, annoyingly).

I think there are a few other services that offer similar 'instant deploy from Git' functionality, I'll look into those.

> we can probably retroactively go back and do a batch conversion of GIFs -> MP4 on the repo.

👍 though we may have to do some playing around with the Git history if we want to free up the associated space from the index: https://docs.github.com/en/github/managing-large-files/removing-files-from-a-repositorys-history

I don't think this would be a huge problem, as I don't think people keep long-lived checkouts of the repo, but it would require a force push to source.

@17cupsofcoffee
Collaborator Author

Ah, one other thing we'd have to figure out to switch to videos would be how we embed them - there's not a standard syntax for embedding videos into Markdown, so we'd potentially have to use a Zola shortcode? Not sure how Veloren has solved this, will have to take a look.

I think if we do go that route we'd need to really clearly flag it up in the co-ordination issue.

@AngelOnFira
Member

AngelOnFira commented May 14, 2021

> though we may have to do some playing around with the Git history if we want to free up the associated space from the index

We've done this before when we moved Veloren's assets to LFS, and there are some pretty good tools out there to rewrite history like this 👍

> Ah, one other thing we'd have to figure out to switch to videos would be how we embed them - there's not a standard syntax for embedding videos into Markdown, so we'd potentially have to use a Zola shortcode?

Ya, we use Zola shortcodes for videos. It looks like this:

<figure class="inline-image">
	<video width="100%" controls autoplay playsinline muted loop>
		<source src="{{ src }}" type="{% if type %}{{ type }}{% else %}video/mp4{% endif %}">
		Your browser does not support the video tag.
	</video>
	{% if caption %}<figcaption><p>{{caption}}</p></figcaption>{% endif %}
</figure>

and can be used like:

# Videos

mp4 video, direct download (e.g. a discord url):

`{{ video(src="https://.../foo.mp4") }}`

With a caption:

`{{ video(src="https://.../foo.mp4", caption="This is a nice video") }}`

Different video type:

`{{ video(src="https://.../foo.ogv", type="video/ogg") }}`
`{{ video(src="https://.../foo.ogv", type="video/ogg", caption="An ogg vorbis video") }}`

Youtube video:

`{{ youtube(id="dQw4w9WgXcQ") }}`

With caption:

`{{ youtube(id="dQw4w9WgXcQ", caption="Best song ever.") }}`

I think the youtube one is a different shortcode.

@17cupsofcoffee
Collaborator Author

A bit of lunchtime hacking - running this in the root of the repo converts all GIFs to (hopefully) web-optimized MP4s:

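#!/usr/bin/python

import os, subprocess

def process_directory(path):
    # os.walk already descends into every subdirectory, so recursing here
    # as well would process nested directories more than once.
    for root, _dirs, files in os.walk(path):
        print("Processing: " + root)

        for filename in files:
            name, ext = os.path.splitext(filename)

            if ext == ".gif":
                convert_gif(os.path.join(root, filename))


def convert_gif(path):
    print("Converting to MP4: " + path)

    name, _ = os.path.splitext(path)
    new_name = name + ".mp4"

    # Pass the argument list without shell=True: on POSIX, shell=True with a
    # list runs only "ffmpeg" and hands the remaining items to the shell.
    retcode = subprocess.call([
        "ffmpeg",
        "-i", path,
        "-movflags", "+faststart",
        "-pix_fmt", "yuv420p",
        "-profile:v", "baseline",
        "-level", "3.0",
        # H.264 with yuv420p needs even dimensions, so crop to the nearest even size.
        "-vf", "crop=trunc(iw/2)*2:trunc(ih/2)*2",
        "-preset", "slow",
        new_name
    ])

    # Keep the GIF if ffmpeg failed, so nothing is lost.
    if retcode != 0:
        print("ffmpeg failed, keeping GIF: " + path)
        return

    print("Deleting GIF: " + path)

    os.remove(path)


process_directory("./content")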

Still need to figure out the best way of batch replacing image tags in Markdown with video tags or shortcodes - I feel like I'd need to pull in a Markdown parser to do that, since they're all formatted differently/some have alt text/some are inside links.

@17cupsofcoffee
Collaborator Author

Updated script with find-and-replace for the markdown files:

Script
#!/usr/bin/python

import os, subprocess, re, io

def convert_gif(path):
    print("Converting to MP4: " + path)

    name, _ = os.path.splitext(path)
    new_name = name + ".mp4"

    # Pass the argument list without shell=True: on POSIX, shell=True with a
    # list runs only "ffmpeg" and hands the remaining items to the shell.
    retcode = subprocess.call([
        "ffmpeg",
        "-i", path,
        "-movflags", "+faststart",
        "-pix_fmt", "yuv420p",
        "-profile:v", "baseline",
        "-level", "3.0",
        # H.264 with yuv420p needs even dimensions, so crop to the nearest even size.
        "-vf", "crop=trunc(iw/2)*2:trunc(ih/2)*2",
        "-preset", "slow",
        new_name
    ])

    # Keep the GIF if ffmpeg failed, so nothing is lost.
    if retcode != 0:
        print("ffmpeg failed, keeping GIF: " + path)
        return

    print("Deleting GIF: " + path)

    os.remove(path)

# TODO: We could use a Markdown parser for this, but might be overkill, and would also
# need to be format-preserving.
# Raw strings avoid invalid escape sequences; the non-greedy path match stops
# two images on the same line from being merged into a single match.
gif_regex = re.compile(r'!\[(?P<alt>[^]]*)]\((?P<path>[^)]+?)\.gif\)')
video_tag = r'<video controls type="video/mp4" src="\g<path>.mp4"></video>'

def update_markdown(path):
    print("Updating markdown file: " + path)

    with io.open(path, mode="r+", encoding="utf-8") as file:
        content = file.read()
        output = gif_regex.sub(video_tag, content)
        # Rewrite the file in place, dropping any leftover bytes from the old content.
        file.seek(0)
        file.write(output)
        file.truncate()

def process_directory(path):
    # os.walk already descends into every subdirectory, so recursing here
    # as well would process nested directories more than once.
    for root, _dirs, files in os.walk(path):
        print("Processing: " + root)

        for filename in files:
            name, ext = os.path.splitext(filename)

            if ext == ".md":
                update_markdown(os.path.join(root, filename))

            if ext == ".gif":
                convert_gif(os.path.join(root, filename))

process_directory("./content")

I've tested this out locally with a few CSS tweaks, and it seems to work well (and vastly reduces the page download size, as the videos don't get preloaded):

image

I think the remaining questions before we could continue with this would be:

  • Should we be using video tags directly or a Zola shortcode? The latter seems like it'd be easier for contributors (but perhaps less discoverable), but might make life harder for reviewers. Then again, embeds never seem to show for me in reviews as it is 😅
  • How would we handle GIFs going forward? I feel like we should stop allowing them if we go the video route, personally.
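
To make the first question more concrete, here is a rough sketch of what the find-and-replace could emit if the shortcode route were chosen. It assumes a `video` shortcode taking `src` and `caption` arguments, as in the Veloren example earlier in the thread; this site does not have that shortcode yet, and the helper names `to_shortcode` and `gifs_to_shortcodes` are made up for illustration. The pattern is a non-greedy variant of the one in the script, so two images on one line are matched separately.

```python
import re

# Matches Markdown images whose target ends in ".gif", e.g. ![alt](path/to/file.gif)
GIF_IMAGE = re.compile(r'!\[(?P<alt>[^]]*)]\((?P<path>[^)]+?)\.gif\)')


def to_shortcode(match):
    """Build a call to the (assumed) `video` shortcode for one matched GIF image."""
    src = match.group("path") + ".mp4"
    # Swap double quotes for single quotes so the caption cannot end the
    # shortcode's string argument early.
    alt = match.group("alt").strip().replace('"', "'")
    if alt:
        return '{{ video(src="%s", caption="%s") }}' % (src, alt)
    return '{{ video(src="%s") }}' % src


def gifs_to_shortcodes(markdown):
    """Replace every GIF image in the given Markdown text with a shortcode call."""
    return GIF_IMAGE.sub(to_shortcode, markdown)
```

Reusing the alt text as the caption is only one possible choice; it keeps the existing descriptions visible, but some of them may read oddly as captions.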

@ozkriff
Member

ozkriff commented Dec 7, 2021

> Then again, embeds never seem to show for me in reviews as it is

yeah, losing GitHub previews still sounds like a really big downside to me :(

@LPGhatguy
Contributor

Video embeds launched on GitHub in May 2021, though you need to use the web editor, and they show up in the markdown file as just... the video's URL.

I would love support for videos! I made a 3 second gameplay video for this month's post. As an MP4, it's a megabyte at 720p30 and 500 KB at 360p30, but it doesn't seem like I can turn it into an intelligible GIF without making it gigantic. 😢

@caspark
Contributor

caspark commented May 30, 2024

FWIW I have been happy with (free) Cloudflare Pages for hosting my Zola site - there's a 20 MB limit on individual files, but that's not a problem in this context. I tested that R2 is usable for storing & serving videos too (& has a free tier), but I haven't done the plumbing for my site to integrate it properly yet, so can't say I've used that portion in anger.
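
As a quick way to check whether a per-file cap like that would affect a given site, a small Python sketch along these lines could list any files above a threshold. The limit value is taken from the figure mentioned above and read as decimal megabytes, which is an assumption about the unit; `files_over_limit` is a hypothetical helper name.

```python
import os

# Per-file limit from the comment above, read as decimal megabytes (an assumption).
LIMIT_BYTES = 20 * 1000 * 1000


def files_over_limit(root, limit=LIMIT_BYTES):
    """Return (path, size) pairs for files under root larger than limit, biggest first."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            size = os.path.getsize(full_path)
            if size > limit:
                oversized.append((full_path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

Running `files_over_limit("./content")` against the newsletter sources, or against the built output directory, would show whether any existing asset would need shrinking first.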
