Is it necessary to store file content in db.json for large blog? #3271

Open
ahuigo opened this Issue Sep 26, 2018 · 3 comments

ahuigo commented Sep 26, 2018

I have about 800 Markdown files, and they have made db.json grow to about 20 MB.

I don't think it is necessary to store file content in db.json.

Contributor

tcrowe commented Sep 29, 2018

Yeah @ahuigo, what to do with large sites is an ongoing discussion. Any ideas? Just keep it in memory, or what? The db.json is a cache, so Hexo doesn't re-parse anything it has already parsed.
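
One possible way to skip re-parsing without caching post content is to store only a digest per file. The sketch below is an illustration under assumptions, not Hexo's actual mechanism: `file_digest`, `changed_files`, and the cache file layout are hypothetical names chosen for this example.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(source_dir: Path, cache_file: Path) -> list[str]:
    """Return the posts whose digest differs from the cached one.

    The cache maps relative path -> digest only; no post content is stored.
    The cache file is rewritten with the current digests on every call.
    """
    old = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    new = {
        p.relative_to(source_dir).as_posix(): file_digest(p)
        for p in sorted(source_dir.rglob("*.md"))
    }
    cache_file.write_text(json.dumps(new, indent=2))
    return [path for path, digest in new.items() if old.get(path) != digest]
```

Only the files returned by `changed_files` would need to be parsed again; the trade-off is that every file is still read once per build to compute its digest.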

ahuigo commented Sep 30, 2018

Some ideas for decreasing Hexo's build time.

  1. The db.json
    1. Store only the Markdown files' meta info (path, title, date, updated, category) and build info such as the last build time.
    2. Don't cache the whole file content in db.json. Read the content directly from the file system when we need it.
  2. We can find the modified files via git, find, or other tools. https://stackoverflow.com/questions/16085958/scripts-find-the-files-have-been-changed-in-last-24-hours
  3. Support incremental building. Each time we build the site, we can build only the modified files.
    Building should not touch unmodified files.

For example:

# hexo g
# Sketch only: parse, parseBlog and the hexo_* helpers are hypothetical placeholders.
# db.json layout: {build_meta: {'last_time': '2018-09-29...'}, files_meta: {...}}
from pathlib import Path
from subprocess import getoutput

dbinfo = parse('db.json')
cmd = 'git diff-index --cached --name-status --diff-filter=ACMRD HEAD -- ./_posts'
output = getoutput(cmd).strip()
if output:
    # find out modified files and deleted files
    modified_blogs = {}
    delete_blogs = []
    for line in output.split('\n'):
        # if rename detection is enabled, a rename line has three fields
        # (status, old path, new path); keep the last path
        status, *paths = line.split('\t')
        path = paths[-1]
        if status == 'D':
            delete_blogs.append(path)
            continue

        blog = parseBlog(path)
        modified_blogs[path] = blog['meta']

    # delete files: remove the generated HTML and the tag/category entries
    for path in delete_blogs:
        file_meta = dbinfo['files_meta'].get(path)
        if file_meta is None:
            continue
        Path(f'public/{path}.html').unlink(missing_ok=True)
        hexo_delete_tags(file_meta)
        hexo_delete_category(file_meta)

    # add & update files (incremental building)
    for path, file_meta in modified_blogs.items():
        hexo_generate_html(path)
        hexo_add_update_tags(file_meta)
        hexo_add_update_category(file_meta)

    # save db.json
    hexo_update_db('db.json', modified_blogs, delete_blogs)
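
The first idea in the list (a db.json that holds only meta info) could look roughly like the sketch below. It assumes posts start with a flat `key: value` front-matter block between `---` lines; `META_KEYS`, `read_front_matter`, and `build_index` are hypothetical names for illustration, and real front matter is YAML, which this simple parser does not fully handle.

```python
import json
from pathlib import Path

# Assumption: only these front-matter keys are worth indexing.
META_KEYS = ("title", "date", "updated", "category")


def read_front_matter(path: Path) -> dict:
    """Parse flat `key: value` pairs from a leading `---` ... `---` block."""
    meta = {}
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so values like times stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() in META_KEYS:
            meta[key.strip()] = value.strip()
    return meta


def build_index(posts_dir: Path) -> dict:
    """Build a metadata-only index: relative path -> front matter plus mtime."""
    files_meta = {}
    for post in sorted(posts_dir.rglob("*.md")):
        entry = read_front_matter(post)
        entry["mtime"] = post.stat().st_mtime
        files_meta[post.relative_to(posts_dir).as_posix()] = entry
    return {"files_meta": files_meta}
```

Serializing the result with `json.dumps(build_index(Path("source/_posts")))` would give a file whose size grows with the number of posts rather than with their length. The stored `mtime` could also serve as a git-free way to spot modified files.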

ahuigo commented Oct 6, 2018

I've written a script to generate a static blog: https://github.com/ahuigo/a/blob/master/tool/pre-commit It's only for my own use, not for Hexo.
