New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytecode.rpyb recompiled when there's no changes #1647

Open
Beuc opened this Issue Nov 19, 2018 · 2 comments

Comments

Projects
None yet
2 participants
@Beuc
Contributor

Beuc commented Nov 19, 2018

While working on RenPyWeb I noted that bytecode.rpyb was updated even when there was no change in any .rpy file.
This caused a gratuitous delay of dozen(s) of seconds on load for the web platform due to recompiling a lot of Python snippets embedded in the renpy/common/ .rpyc/.rpymc files.
Currently I fixed the issue by having developers ship their renpy/common/ directory but this was strange nonetheless.

I tracked the culprit in ast.py:

        # The time is necessary so we can disambiguate between Python
        # blocks on the same line in different script versions.
        self.location = loc + ( int(time.time()), )
...
    def get_hash(self):
...
        self.hash = chr(renpy.bytecode_version) + hashlib.md5(repr(self.location) + code.encode("utf-8")).digest()

Ren'Py checks if the Python code located in a .rpyc file was already compiled by checking whether its pre-computed hash is referenced in bytecode.rpyb. This pre-computed hash changes at each compilation, even if the resulting bytecode is the exact same, due to using time().

Since the .rpyc/.rpymc files are recompiled every time one makes a new Distribution (with Force recompile), this means a game's bytecode.rpyb never carries the same hashes even when compiling the same snippets twice in a row. Consequently the bytecode hashes referenced in the renpy/common/ shipped in renpyweb will never be found in a game's game/bytecode.rpyb even if these bytecodes are identical.

It seems to me that the location of the Python snippet does not matter as long as the resulting bytecode is bit-for-bit identical. I tested with:

        self.hash = chr(renpy.bytecode_version) + hashlib.md5(code.encode("utf-8")).digest()

and with this bytecode.rpyb is left untouched, unless there was an actual change in the Python snippet.
This is similar to the way Git identifies files with their SHA-1.

I understand that we ship a full renpy/ directory in a Distribution anyway (we'll do that even in renpyweb eventually), so this unnecessary recompilation does not happen often.
However (maybe that's my Reproducible Builds background kicking in) I do not see a valid reason to make the bytecode hashes change every time.

Did I miss something?

@renpytom

This comment has been minimized.

Member

renpytom commented Nov 22, 2018

I think you did miss something.

The main thing is that:

Doesn't run during a "normal" run of Ren'Py, one where no script file has changed. During a normal run, the .rpy file won't change, and hence Ren'Py will load in the .rpyc file instead (see

def load_appropriate_file(self, compiled, source, dir, fn, initcode): # @ReservedAssignment
for the details of that.)

Because of that, the hash won't change, the bytecode will be looked up, the bytecode cache won't get dirtied, and Ren'Py will start fast.

So I think there's a problem here, but my bet is that something is changing the md5 of the .rpy file between the time it is created and the time it is loaded into Ren'Py, and that's causing it to be recompiled unnecessarily.

(Ren'Py actively rejects Reproducible Builds. The .rpyc files contain annotations needed to make a game loadable - annotations that if included in the script, would make the scripts unbearable to write.)

@Beuc

This comment has been minimized.

Contributor

Beuc commented Nov 22, 2018

Hi,

Note: I'm talking about Python snippets compilation only, not about the .rpy->.rpyc AST-compilation.
I.e. all this applies even if you remove all the .rpy files.

I did see that in the common case the bytecode isn't dirtied.
What I saw is that the bytecode's hash changes everytime even when the resulting bytecode didn't change.
Since the user's Distribution's and renpyweb's .rpyc files were generated at a different time, they reference different hashes for the exact same Python snippets bytecode.
Consequently if the user gives renpyweb their game/ without their very own renpy/common/ replacement, all the Python snippets for renpyweb's renpy/common/*.rpyc are recompiled, because their bytecode was stored in the cache with different hashes than the ones referenced in renpyweb's renpy/common/*.rpyc.

Again, that's a confusing corner-case situation, that could be avoided by using a stable hash for the Python snippets bytecode. If we care to.

(I don't think .rpyc files are a reason to "reject" Reproducible Builds. From what I understand .rpyc are partial source files too. For clarity we could even have 3 files: the source .rpy, the annotations that complement the .rpy (e.g. a ".rpya"), and the purely-derived/compiled parts in a .rpyc that could be safely removed/recreated. Anyway, just an aside.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment