Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mtime based cache id problems #28

Closed
glensc opened this issue May 31, 2012 · 17 comments
Closed

mtime based cache id problems #28

glensc opened this issue May 31, 2012 · 17 comments

Comments

@glensc
Copy link
Collaborator

glensc commented May 31, 2012

currently if using groupsConfig, then cache id becames simply max(mtime) of all files.

so, the cache id will remain the same, if i add more files to the group having identical timestamp.

i guess there should be added * NUMBER_OF_FILES to avoid this problem

so i had in my groupsConfig.php:

  'l' => array(
      '/path/to/lazyload.js'
  ),

then url built with was:

  g=l&1338376276

later i added one more file:

  'l' => array(
      '/path/to/uaft.js',
      '/path/to/lazyload.js'
  ),

and the url remained the same:

  g=l&1338376276

as the files have identical timestamp:

$ ls -l --full uaft.js lazyload.js 
-rw-r--r-- 1 root root 1.3K 2012-05-30 14:11:16.000000000 +0300 lazyload.js
-rw-r--r-- 1 root root 1.7K 2012-05-30 14:11:16.000000000 +0300 uaft.js

expanding that timestamp:

$ LC_ALL=C php -r 'echo strftime("%c %z\n", 1338376276);'
Wed May 30 14:11:16 2012 +0300
@mrclay
Copy link
Owner

mrclay commented May 31, 2012

Agreed, Minify_HTML_Helper::getRawUri should take into account more than the max mtime. Since this is likely to run on every HTML page, it should probably not be more advanced than md5(serialize(arrays of filepaths and individual mtimes)), then cut that into 10 or so digits. Or better, allow non-digits in the trigger for far off expires.

@glensc
Copy link
Collaborator Author

glensc commented May 31, 2012

anyway, i added testsuite for this :
https://github.com/glensc/minify/tree/issue-28

i tried to link that branch to this issue (without creating pull request with another issue), but failed

glensc added a commit to glensc/minify that referenced this issue May 31, 2012
@glensc
Copy link
Collaborator Author

glensc commented May 31, 2012

how about just adding count() of files in group to existing mtime value? chances it will get the old value with just adding new file to group are really small (you need then to have all group members mtime be decreased by amount of files being added to group and that's just not that realistic than just having all files in group with same mtime)

unfortunately $this->_filePaths is empty in groups mode, need to dig more how to get access to files in groups

@mrclay
Copy link
Owner

mrclay commented May 31, 2012

To get the mtimes you need the filepaths so we may as well use it (it will involve some refactoring). If we don't someone will complain that they swapped out one file for another and it didn't change...

@glensc
Copy link
Collaborator Author

glensc commented May 31, 2012

i wonder, is the original mtime calculation still needed?

as i see two paths:

  • modify Minify_HTML_Helper::setGroup and fill $this->_filePathChecksum with new method similar to getLastModified, which does crc32(serialize($filepaths))
  • modify Minify_HTML_Helper::getRawUri and append $this->_filePathChecksum to the url, if it is not empty

rename Minify_HTML_Helper::getLastModified and do the same in just one iteration over group members

for both remains question, perhaps should instead mathematically sum the _filePathChecksum with _lastModified instead of string concat, so the urls won't grow too long? or that would cause weird side effects...

in fact i have patch ready for 1. already

@glensc
Copy link
Collaborator Author

glensc commented May 31, 2012

correction: if we rename ->lastmodified to something new (combination of lastmod+filepath) then the result won't be that long, just one number. additionally need to run it over crc32 and do sprintf %u over it (to avoid 32/64 bit differences)

@glensc
Copy link
Collaborator Author

glensc commented Jun 1, 2012

here's implementation for 1) - https://gist.github.com/2851198

@glensc
Copy link
Collaborator Author

glensc commented Jun 5, 2012

ping

@mrclay
Copy link
Owner

mrclay commented Jun 5, 2012

On 6/1/12 7:01 AM, Elan Ruusamäe wrote:

here's implementation for 1) - https://gist.github.com/2851198

Please fork and submit this as a pull request.

Steve

http://www.mrclay.org/

glensc added a commit to glensc/minify that referenced this issue Jun 5, 2012
@glensc
Copy link
Collaborator Author

glensc commented Jun 5, 2012

i have forked already, linked to this issue as well, just wanted to know opinion which way to make next commit. but fine. i'll commit the 1) and you can merge:

https://github.com/glensc/minify/tree/issue-28

do you really need pull request to merge from there? as that would create another issue in your repository

@mrclay
Copy link
Owner

mrclay commented Jun 6, 2012

Sorry, your branch is fine. I'll look at this ASAP and probably will merge as is.

@glensc
Copy link
Collaborator Author

glensc commented Jun 22, 2012

Okay, the checksum of file paths, is working okay, but there's still problem when i replace oldest file with even more older file, i think that here should additionally just sum mtimes together and take just last 4 bytes (to allow 32bit systems overflow and get same result as 64bit systems)

glensc added a commit to glensc/minify that referenced this issue Jun 22, 2012
glensc added a commit to glensc/minify that referenced this issue Jun 22, 2012
@glensc
Copy link
Collaborator Author

glensc commented Jun 22, 2012

rewrite it once again, pls review whole branch diff

glensc/minify@master...issue-28

glensc added a commit to glensc/minify that referenced this issue Jun 22, 2012
@glensc
Copy link
Collaborator Author

glensc commented Jul 3, 2012

did you have time to review?

@glensc
Copy link
Collaborator Author

glensc commented Aug 16, 2012

i think cacheId should be changed as well. otherwise client is told to fetch new content, with new url, however minify internally will still serve old content because for it nothing changed.

currently it tried to understand how internal cacheId works, but i found it doesn't even use files or for it at all. i.e Minify::_getCacheId() has:

    protected static function _getCacheId($prefix = 'minify')
    {       
        $name = preg_replace('/[^a-zA-Z0-9\\.=_,]/', '', self::$_controller->selectionId);
        $name = preg_replace('/\\.+/', '.', $name);
        $name = substr($name, 0, 200 - 34 - strlen($prefix));
        $md5 = md5(serialize(array(
            Minify_Source::getDigest(self::$_controller->sources)
            ,self::$_options['minifiers'] 
            ,self::$_options['minifierOptions']
            ,self::$_options['postprocessor']
            ,self::$_options['bubbleCssImports']
            ,self::VERSION
        )));
        return "{$prefix}_{$name}_{$md5}";
    }

and Minify_Source::getDigest has:

    public static function getDigest($sources)
    {  
        $info = array();
        foreach ($sources as $source) {
            $info[] = array(
                $source->_id, $source->minifier, $source->minifyOptions
            );
        }
        return md5(serialize($info));
    }

am i looking something wrong? or this Minify_Source::getDigest shouldn't look mtime/content/path at all currently?

@mrclay
Copy link
Owner

mrclay commented Nov 24, 2012

I did not want tons of old cache files piling up requiring a cleaning process, so cacheId regards everything EXCEPT timestamps. That way, if the user is just altering files, the Minify cache files are reused.

You're right that this means the cache is not properly invalidated if MAX(timestamps) doesn't increase. It requires the user to touch() at least one source file in that case.

I think the only real fix for this is to move to a system like this.

@mrclay
Copy link
Owner

mrclay commented Sep 29, 2015

Closing in favor of #41

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants