New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rendering courses from forks and content fragments caching #362

Merged
merged 1 commit into from Apr 13, 2018

Conversation

Projects
None yet
3 participants
@mikicz
Copy link
Member

mikicz commented Apr 2, 2018

So, I guess it's about time to submit this... I'm submitting now so you guys can get acquainted with the implementation before the sprint. (I'll come Friday afternoon.). Feel free to question anything and everything.

How courses from forks work:

Courses and runs in the base repo are defined by info.yml, the forked courses by link.yml. That file can contain two values, but only repo is required (https link to a git repo), the default for branch is master.

The forks are serviced by model CourseLink. The model has just a selection of attributes, which are required for a list of courses/runs, for a list of ongoing runs for base-courses and for cache determination.

Otherwise, contents of course pages are rendered in the fork. (So even the formats of the pages can be updated). By content I mean: things which are in the template folder templates/content or rendered lessons (Page.render_html). These templates are also used in rendering things from the base repository, via inclusion.

The content rendered in the fork is run through a HTML whitelist (naucse.utils.routes:AllowedElementsParser) and then put into templates from templates/link.

In the fork, I'm using the logger which gets urls from url_from, which are returned with the content and metadata and are put in a queue of absolute urls to be rendered, so the urls from the forks are frozen without parsing. Only urls leading to pages inside the course are added to the queue.

There's a new file in the root of the repository, config.yml, which contains info about the current repo and branch. This file should be modified in the fork with the info about the fork, so edit links lead to the right direction. Alternatively, if the fork is not on GitHub, the edit link can be redefined completely in naucse.utils.forks:get_edit_page_name, naucse.utils.forks:get_edit_icon and naucse.templates:edit_link.

How caching works:

So the first thing that needs to be said about caching is that Arca has its own caching. If cache is set, every result is stored under the commit hash of the repo at the time of execution and the task definition. So if the same task is executed and nothing changes in the repo, the result comes straight from cache. This is useful for forks, since the tasks and the repos won't change often. (Arca cache is persistent on Travis, see bellow.)

But that's not the only caching I implemented. I also did the content fragments caching as requested in #175. I'm reusing the arca dogpile.cache region for it.

Since absolute urls are generated in lessons for almost anything (subpages, solutions, static), the first change I did was generating relative urls instead. Since the url generators were already injected in the rendering, I just modified the functions to generate relative urls. The functions are generated in naucse.routes:relative_url_functions.

The next thing needed was the list of urls generated in a specific fragment, so it doesn't have to be parsed to get urls to freeze. I used the logger from Frozen-Flask again, this time in a custom context manager way (naucse.freezer:temporary_url_for_logger).

So that's the value - a content fragment with relative urls, so it can be reused and with it the list of the relative urls that have to be frozen when used.

The key is bit more complicated. I created names which can share individual fragments. The namespace is the commit hash of the commit which last modified anything inside of the folder naucse. The rest of the keys is hash of the following information:

  • the lesson/subpage/solution identificators
  • course variables (or None for canonical lessons)
  • the commit which last modified the folder the lesson is defined in

So when the value is retrieved from cache, absolute urls are generated from the current url and the relative urls and they're added to the queue of urls to be frozen.

Now only sharing the cache with forks is left to explain:

The base repository creates a key for the content fragment for the specific page from the fork on the same principles as described as above. The cache is checked if the value is present. If the value is present, the key is provided to the fork and the fork can decide if it wants to use it. If so, None is returned instead of content and the base repository uses the value from cache.

If the value is not present in cache, the key is not provided and the fork has to render the content on its own. The value the fork returns is then used to populate the cache for next time.

How are errors handled:

By default, all errors in forks are silenced. This can be overridden by setting the environment variable RAISE_FORK_ERRORS to true.

Of course, the forks can be malfunctioning in a number of ways.

Travis build:

The Docker backend is used in Travis. To speed things up, the built images (with installed requirements) are pushed to a registry, so next time they can be pulled instead of building. While implementing the solution I was using https://hub.docker.com/r/mikicz/naucse/tags/, but I think the best solution would be to create a new account for naucse.

Unfortunately, the build takes some extra time compared to the current version once forks are introduced, this is mainly because of the extra boot time (sudo is required for docker) and pulling/building images.

The first thing I've done to improve the situation is persistent arca cache. (Meaning if there are no new commits in a fork, the whole thing can be retrieved from cache.)

If a really bad combination comes about (e.g. clean cache and new requirements), the freeze itself can be killed by Travis. (It kills builds that don't print anything in 10 minutes (and freezing doesn't)).

So the second thing is a new cli command which lists all courses and runs with info about them (slug and title, dates for runs, repo/branch for forked). Even courses which can't be pulled or don't return basic info are printed (only slug, repo/branch and a warning), so it serves like a nice overview what's actually being rendered. But the practical effect is that all docker images are pulled or built during this command, which also prints stuff regularly so it won't be killed and the freezing itself doesn't take that much time.

Finally, travis prints out debug information about what failed (the exact url is always mentioned so debugging is easier...). The sleep 1 command is necessary so Travis can read the output before the machine is shut down.

Webhooks:

I implemented the webhook for triggering builds in a separate repository at mikicz/naucse-hooks. Currently, the hook has to be installed manually, I'll code the automatic creation later.

It's a simple Flask app which listens at /hooks/push/ for events from GitHub. It only reacts to ping (to verify the hook was installed correctly) and to push. Before a build is triggered, the app checks that the pushed repository and branch is actually used in Naucse (all the link.yml files are checked, Naucse is pulled using Arca).

For my own naucse I've deployed the hook at naucse_hooks.poul.me. I'm willing to keep hosting the hooks on my VPS (vpsfree.cz), however, I'd rather not use the poul.me domain. (Maybe a new domain, something like naucsepython.cz?) If you don't like me hosting the app I'll figure out something else (like PythonAnywhere or some reasonable serverless?).

Testing:

Naucse uses the Current Environment backend when launched locally (as mentioned in #175). Caching is enabled. So pages from forks are cached and so are content fragments in lessons from the base repository. The cache for content fragments in the base repository is disabled automatically if your local version is modified (so changes you're making are always visible)

In addition to testing stuff manually I've written a couple of tests. They make a local fork from the current state of the repo (even with changes), however, they only test the basic stuff, usually just the structure of the response and couple of values. Furthermore, they test how naucse behaves if there's an error in a fork - that the page returns 200 with a warning.

What should to be finalized:

  • Release Arca on PyPI so it's not installed from my GitHub
  • Where to push docker images (see above)
  • Deploy production hook somewhere (see above)

What's not in this PR:

I will submit these later in further pull requests.

  • Automatic webhook installation (to mikicz/naucse-hooks)
  • The meta course with a description of the process (to this repository)

@mikicz mikicz requested review from encukou and hroncok Apr 2, 2018

@mikicz mikicz force-pushed the mikicz:arca_implementation branch from 39e305f to 27cbb39 Apr 2, 2018

@mikicz

This comment has been minimized.

Copy link
Member Author

mikicz commented Apr 2, 2018

Oh, Travis doesn't trust me to provide an encrypted variable [1], so I had to comment out settings which push images to docker (which isn't done in this PR anyway, because there aren't any forked repositories...)

[1] https://docs.travis-ci.com/user/pull-requests/#Pull-Requests-and-Security-Restrictions

@hroncok hroncok added the sprint-idea label Apr 2, 2018

@hroncok

This comment has been minimized.

Copy link
Member

hroncok commented Apr 2, 2018

This is huge! I will try to go trough it before you arrive, but no promises.

@mikicz

This comment has been minimized.

Copy link
Member Author

mikicz commented Apr 2, 2018

Yeah, I know. I should've submitted this a while ago tbh to give you more time, but you know, procrastination... Plus I wanted to write up what I'm actually doing so you don't have to go by code alone...

A large part of the diff is a refactor of templates (which, I think, is more manageable if you only look at the result and not at the diff), but there's a lot of code as well too...

@encukou

This comment has been minimized.

Copy link
Member

encukou commented Apr 6, 2018

  • The whitelist only checks HTML tags, not attributes
@encukou

This comment has been minimized.

Copy link
Member

encukou commented Apr 6, 2018

  • For the hash key, you can use the Git tree hash instead of the commit hash.
@mikicz

This comment has been minimized.

Copy link
Member Author

mikicz commented Apr 6, 2018

  • Don't use ShutdownableFreezer, the default Frozen-Flask is propable good enough
  • Prefer logged calls in AllLinksLogger
@encukou

This comment has been minimized.

Copy link
Member

encukou commented Apr 6, 2018

  • the hook should react to tagging
@encukou

This comment has been minimized.

Copy link
Member

encukou commented Apr 6, 2018

  • I should point the hooks.nauc.se domain to your server, for now

yield logger

app.url_default_functions[None].pop(0)

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

Add comment what this does.

class NaucseFreezer(ShutdownableFreezer):

def __init__(self, app):
super(NaucseFreezer, self).__init__(app)

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

use super() (no arguments are necessary)

@@ -430,21 +465,21 @@ def __str__(self):

canonical = DataProperty(info, default=False)

COURSE_INFO = ["title", "description", "vars", "canonical"]

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

Add comment that explain what those two lists represent and what should be done if we need to extend them in the future, not to break old forks.

.travis.yml Outdated
script:
- python -m pytest test_naucse
- python -m naucse list_courses

This comment has been minimized.

@encukou

encukou Apr 6, 2018

Member

Please add a comment on why you're adding this – the pull request description isn't kept with the code

@encukou

This comment has been minimized.

Copy link
Member

encukou commented Apr 6, 2018

  • config.yml (which says the repo URL and branch) should be removed (and replaced by passing the info through the system)
# see the generator for details
freezer.register_generator(lesson_static_generator)

cli(app, base_url='http://naucse.poul.me', freezer=freezer)

This comment has been minimized.

@encukou

encukou Apr 6, 2018

Member
  • Don't forget to change this back
"""
from naucse.routes import model

def canonical(course, x=""):

This comment has been minimized.

@encukou

encukou Apr 6, 2018

Member

Use suffix, not x.

to_return = [None, None, None]

from naucse.routes import logger
logger.info(result.output)

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

debug maybe?

result = arca.run(self.repo, self.branch, task, reference=Path("."), depth=-1)

links = ["prev_link", "session_link", "next_link"]
to_return = [None, None, None]

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

I suggest to rework this a bit as we talked about it. It is very hard to grasp now.

class RunYear(Model):
"""A year of runs"""
def __str__(self):
return self.path.parts[-1]

runs = DirProperty(Course)
runs = MultipleModelDirProperty([("info.yml", Course), ("link.yml", CourseLink)])

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

If you set Course.data_filename = "info.yml" and CourseLink.data_filename = "info.yml" and check that in MultipleModelDirProperty, it no longer needs to take an ordereddict-like list, but a simple list of models is enough.

config = YamlProperty()

slug = DataProperty(config)
branch = DataProperty(config, default="master")

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

Would be nice if you read this from git, and not from a config file. (But that's not blocking this PR.)

@@ -49,12 +49,19 @@
<div class="footer container">
<hr>
<div class="lesson-attribution">
<p>
{% if edit_info is defined and edit_info %} {# the page is from fork #}

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

Instead of not setting edit_info on non forks, make it so the default values are the ones that would result in the following else in this template. Than, you cen get rid of this if-elsing entirely.

Jelikož ten obsah pouze upravoval obsah ze základního repozitáře, zobrazujeme zde původní verzi.
Níže jsou informace o repozitáři ze kterého měl být obsah vykreslen. Jestli chceš pomoci, můžeš například
napsat správci daného repozitáře nebo v něm vytvořit ticket.
Všechny podrobnosti proč se nepovedlo stránku vykreslit by měly být na

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

Všechny podrobnosti, proč se nepovedlo stránku vykreslit, by měly...


# if content isn't cached or the version was refused, let's render
# the content here (but just the content and not the whole page with headers, menus etc)
if content == -1:

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

Please don't use magic numbers to represent state. Create a constant that represents the state. It can be object() or even -1 internally, however in the code, one would only read if content == ...MISSING.


class CourseLink:

def __init__(self, course):

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

Apparently, this was (probably) added because jinja templates didn't work with dicts. However I think that is false. Let's try what breakse if you pass the dict instead of a CourseLink instance.


if self.css is not None:
try:
self.css = Page.limit_css_to_lesson_content(self.css)

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

If we indeed remove the ...Link classes, this needs to be put somewhere else.


from arca import Task, Arca

Path(".arca/cache").mkdir(parents=True, exist_ok=True)

This comment has been minimized.

@hroncok

hroncok Apr 6, 2018

Member

Make Arca create the directory on demand (when needed).

This comment has been minimized.

@mikicz

mikicz Apr 8, 2018

Author Member

I put Arca in a LocalProxy (discussed IRL) and also implemented the change into Arca...

mikicz/arca#22

@mikicz mikicz force-pushed the mikicz:arca_implementation branch 2 times, most recently from 61fd924 to 755751d Apr 7, 2018

@mikicz

This comment has been minimized.

Copy link
Member Author

mikicz commented Apr 7, 2018

  • Unless explicitly configured, ignore forks completely when running locally
@encukou

This comment has been minimized.

Copy link
Member

encukou commented Apr 7, 2018

  • Authors of branched courses should be able to build their branch locally, without Docker/Vagrant, and safely (no externaly linked courses)
.travis.yml Outdated
after_script:
- echo "Suplementary logs:"
- cat .arca/naucse.log
- sleep 1

This comment has been minimized.

@hroncok

hroncok Apr 9, 2018

Member

please comment why.

@hroncok

This comment has been minimized.

Copy link
Member

hroncok commented Apr 9, 2018

@mikicz let me know once ready for a next round of review.

I'm going to set up a new dockerhub account dedicated for this.

@mikicz

This comment has been minimized.

Copy link
Member Author

mikicz commented Apr 10, 2018

@hroncok I think it's ready for the next round of review.

.travis.yml Outdated
install:
- pip install -r test_requirements.txt
# https://docs.travis-ci.com/user/pull-requests/#Pull-Requests-and-Security-Restrictions
# before_script:
# - docker login -u "$DOCKER_HUB_USERNAME" -p "$DOCKER_HUB_PASSWORD"

This comment has been minimized.

@hroncok

hroncok Apr 13, 2018

Member

what happens on future pull-requests once this is not commented out?

This comment has been minimized.

@mikicz

mikicz Apr 13, 2018

Author Member

Yes, I thought about it. I'll introduce a new feature to Arca which will disable pushing to the registry but will keep pulling images. Then I'll conditionally either login to docker on master builds or turn on the "pull only" functionality of Arca in pull requests.

@@ -514,12 +560,161 @@ def default_end_time(self):
return self._default_time('end')


def optional_convert_date(x):

This comment has been minimized.

@hroncok

hroncok Apr 13, 2018

Member

what is x? can it ba called datestr?

return None


def optional_convert_time(x):

This comment has been minimized.

@hroncok

hroncok Apr 13, 2018

Member

what is x? can it ba called timestr?

""" Renders a page in the fork, checks the content and registers urls to freeze.
"""
if not naucse.utils.routes.forks_enabled():
raise ValueError("You must explicitly allow forks to be rendered.")

This comment has been minimized.

@hroncok

hroncok Apr 13, 2018

Member

Maybe this deserves it's own function? such as forks_raise_if_disabled()

_arca = Arca(settings={"ARCA_BACKEND": "arca.backend.CurrentEnvironmentBackend",
"ARCA_BACKEND_CURRENT_ENVIRONMENT_REQUIREMENTS": "requirements.txt",
"ARCA_BACKEND_VERBOSITY": 2,
"ARCA_BACKEND_APK_DEPENDENCIES": ["libffi-dev"],

This comment has been minimized.

@hroncok

hroncok Apr 13, 2018

Member

Are we still using apk? I am a bit behind here, so bare with me.

Also a comment about why libffi-dev is needed would be nice.

This comment has been minimized.

@mikicz

mikicz Apr 13, 2018

Author Member

Yes, I didn't have the time to move Arca from alpine yet, I'll do it in the next release of Arca

"ARCA_BACKEND_APK_DEPENDENCIES": ["libffi-dev"],
"ARCA_BACKEND_KEEP_CONTAINER_RUNNING": True,
# https://docs.travis-ci.com/user/pull-requests/#Pull-Requests-and-Security-Restrictions
# "ARCA_BACKEND_PUSH_TO_REGISTRY_NAME": "docker.io/mikicz/naucse",

This comment has been minimized.

@hroncok

hroncok Apr 13, 2018

Member

Once we enable this, what happens with pull-requests? Will it fail Travuis build?

Don't forget to update this to the naucse user.

@mikicz mikicz force-pushed the mikicz:arca_implementation branch from 8e74de5 to e6082b6 Apr 13, 2018

@mikicz mikicz changed the title WIP: Rendering courses from forks and content fragments caching Rendering courses from forks and content fragments caching Apr 13, 2018

@mikicz

This comment has been minimized.

Copy link
Member Author

mikicz commented Apr 13, 2018

I rebased the commits. Arca is being installed from PyPI, docker should be configured, http://naucse_hooks.poul.me/ is reconfigured to trigger builds of production naucse for now.

@hroncok
Copy link
Member

hroncok left a comment

I say let's deal with the rest afterwards. Pinging @encukou if he wants to stop me :)

@encukou

This comment has been minimized.

Copy link
Member

encukou commented Apr 13, 2018

I skimmed it and noticed no blockers. Go ahead!

@hroncok hroncok merged commit 825eb5a into pyvec:master Apr 13, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment