Chinese file names generated when migrating from WordPress #12

Open
Xuanwo opened this Issue Dec 14, 2015 · 17 comments

@Xuanwo
Contributor

Xuanwo commented Dec 14, 2015

ref hexojs/hexo#1650

content:

When migrating from WordPress, the generated file names look like these examples:
e8-af-ad-e8-a8-80-e7-89-b9-e6-80-a7-e8-bf-98-e6-98-af-e6-9c-89-e5-bf-85-e8-a6-81-e5-ad-a6-e4-b9-a0-e7-9a-84.md
e8-bd-af-e8-bf-9e-e6-8e-a5-e5-92-8c-e7-a1-ac-e8-bf-9e-e6-8e-a5.md
e8-bf-bd-e8-b8-aaquery-too-complex-not-enough-stack-e9-94-99-e8-af-af.md
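I think this is the result of converting the URL, because Chinese characters in a URL are UTF-8 encoded. The problem is that URLs generated from these file names differ from the ones WordPress used, so posts already indexed by search engines can no longer be reached because their URLs changed.
My current workaround is a Python script that derives each file name from the title line in the file's content. The code is as follows: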

import os
import sys


def getTitle(firstLine):
    # The first line looks like "title: 'Some Title'"; keep everything after the first colon.
    strs = ':'.join(firstLine.split(':')[1:])
    strs = strs.replace("'", '')
    strs = strs.strip()
    # Join the words with hyphens so the title can serve as a file name.
    title = '-'.join(strs.split(' '))
    return title


if __name__ == "__main__":
    dirName = sys.argv[1]
    for root, dirs, fileNames in os.walk(dirName):
        for fileName in fileNames:
            path = os.path.join(root, fileName)
            # Only the first line (the title) is needed.
            with open(path, encoding='utf-8') as f:
                firstLine = f.readline()
            title = getTitle(firstLine)
            newname = title + '.md'
            print(fileName, '->', newname)
            os.rename(path, os.path.join(root, newname))

Is there another way to solve this?
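
The explanation in this comment can be checked against the example file names: each two-digit group is one byte of the UTF-8 encoded title, with the `%` of the URL encoding replaced by `-`. Below is a minimal Python sketch of that check. The helper name `decode_hex_name` is hypothetical, and it assumes the whole name consists of hex pairs; mixed names such as the third example are ambiguous and are deliberately not handled.

```python
import re

# Matches names made only of hyphen-separated two-digit hex groups, e.g. "e8-bd-af".
HEX_NAME = re.compile(r"^[0-9a-f]{2}(-[0-9a-f]{2})*$")


def decode_hex_name(name):
    """Decode a file name made only of hyphen-separated hex byte pairs.

    Returns None when the name does not match that pattern
    or the bytes are not valid UTF-8.
    """
    stem = name[:-3] if name.endswith(".md") else name
    if not HEX_NAME.match(stem):
        return None
    try:
        return bytes.fromhex(stem.replace("-", "")).decode("utf-8")
    except UnicodeDecodeError:
        return None


title = decode_hex_name(
    "e8-bd-af-e8-bf-9e-e6-8e-a5-e5-92-8c-e7-a1-ac-e8-bf-9e-e6-8e-a5.md"
)
# title == "软连接和硬连接"
```

Names that mix hex pairs with ordinary words return `None` here, which is one reason the title-based script above is the safer way to recover file names after the fact.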

@SunnyBingoMe

Contributor

SunnyBingoMe commented May 4, 2016

I used:
ALTER TABLE wp_posts DROP COLUMN post_name;
Then I exported the XML again, and the file names were the normal titles. Want to give it a try?

@dengshilong

Contributor

dengshilong commented May 5, 2016

The script already solved it; I was just hoping for a better approach. I'll try this the next time I migrate a WordPress blog.

@SunnyBingoMe

Contributor

SunnyBingoMe commented May 6, 2016

I found another approach: change one line in this migrator plugin's index.js (line 56) to slug = item.title[0]. For details, see: SunnyBingoMe@9440ecb

~~ @dengshilong : Does your script reverse the HTML URL? Could you share it? ~~

@dengshilong

Contributor

dengshilong commented May 6, 2016

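  • Please don't change it to slug = item.title[0]. Some people have English slugs that they wrote themselves rather than taking them from the title.
  • The script is right there above.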
@SunnyBingoMe

Contributor

SunnyBingoMe commented May 6, 2016

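As for the slug, I feel the title is a better fit for local file names, and it happens that I don't want to use post-name for URLs: my old blog's URLs were already messy, so post-id is more reliable.
@dengshilong Oops, I didn't look at the script closely... thanks. That reminds me: could the URL be reversed in JS in a similar way?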

@dengshilong

Contributor

dengshilong commented May 6, 2016

Also, who removed the code I committed earlier, and why? It is throwing errors again.

@SunnyBingoMe

Contributor

SunnyBingoMe commented May 6, 2016

Uh... sorry, I didn't notice... Do you want me to change it back for you?

@dengshilong

Contributor

dengshilong commented May 6, 2016

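post-id is indeed more reliable, and the title is indeed a better fit for local file names. But many people write their own slugs on their WordPress blogs; if the title is forced, they will certainly get a pile of 404s when they migrate to Hexo.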

@dengshilong

Contributor

dengshilong commented May 6, 2016

Was it you who changed it? Then please change it back.

@SunnyBingoMe

Contributor

SunnyBingoMe commented May 6, 2016

No, wait... it wasn't me... I use a local index.js and only changed my own copy of the hexo plugin on GitHub.

@dengshilong

Contributor

dengshilong commented May 6, 2016

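  • Never mind then, I'll change it back myself.
  • I get it now: `ALTER TABLE wp_posts DROP COLUMN post_name;` deletes the slug. With no slug, the plugin falls back to the title, which is why the file names looked normal to you.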
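
The fallback described in the last bullet can be sketched as follows. This is only an illustration in Python, not the plugin's actual code (the plugin is JavaScript); the function name `choose_slug` and the dictionary keys `post_name` and `title` are assumptions made for the example.

```python
def choose_slug(item):
    """Return the exported slug if there is one, otherwise the title."""
    slug = item.get("post_name")
    if slug:
        return slug
    return item["title"]


# A post exported with its slug keeps that slug (here still percent-encoded).
with_slug = choose_slug({"title": "软连接和硬连接", "post_name": "%e8%bd%af"})

# Once the post_name column is dropped there is no slug, so the title is used.
without_slug = choose_slug({"title": "软连接和硬连接"})
```

A hand-written English slug takes the first branch and is kept as-is, which is the case the earlier objection to forcing the title was concerned about.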
@SunnyBingoMe

Contributor

SunnyBingoMe commented May 6, 2016

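Yes, exactly. I saw it somewhere before and then thought of reversing it in JS, but I didn't want to go looking for it again. Besides, my blog has used several kinds of permalink URLs and I had been handling them with 301 redirects in PHP, so now I just use post-id.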

@dengshilong

Contributor

dengshilong commented May 6, 2016

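Have you noticed that Hexo doesn't include categories when it creates a new post? That doesn't make sense. So I cleverly added categories to post.md in scaffolds. I think post.md is generated by scaffold.js. Do you think categories should be required? If so, I'll go test it.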

@SunnyBingoMe

Contributor

SunnyBingoMe commented May 6, 2016

Uh... // (not trying to stir up trouble)
I'm using farbox, and their WP migration tool has problems. I finally found Hexo's tool, and it works well...

@dengshilong

Contributor

dengshilong commented May 6, 2016

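Hexo is indeed good. It's just that it originally did not support multiple categories per post, which messed up the data migrated from WordPress; or perhaps having multiple categories on one post was a problem in the first place.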

@SunnyBingoMe

Contributor

SunnyBingoMe commented May 6, 2016

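Yes, I've read quite a few blogs over the past few days, and Hexo is indeed good. I think Ruan Yifeng makes a good point: http://eleveneat.com/2015/04/24/Hexo-a-blog/
Maybe I've just gotten lazy: I want to write casually without a pile of dependencies...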

@dengshilong

Contributor

dengshilong commented May 7, 2016

I found the solution: adding one line, if (slug) slug = decodeURI(slug);, is enough.
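
For comparison with the one-line JavaScript fix, here is a rough Python equivalent. It assumes the slug in the WordPress export is still percent-encoded (for example `%e8%bd%af...`). `urllib.parse.unquote` stands in for `decodeURI` here, although the two differ slightly in which reserved characters they decode. The function name `decode_slug` is hypothetical.

```python
from urllib.parse import unquote


def decode_slug(slug):
    """Percent-decode a slug; empty or missing slugs are returned unchanged."""
    if slug:
        return unquote(slug)  # unquote decodes the bytes as UTF-8 by default
    return slug


chinese = decode_slug("%e8%bd%af%e8%bf%9e%e6%8e%a5%e5%92%8c%e7%a1%ac%e8%bf%9e%e6%8e%a5")
# chinese == "软连接和硬连接"

mixed = decode_slug(
    "%e8%bf%bd%e8%b8%aaquery-too-complex-not-enough-stack-%e9%94%99%e8%af%af"
)
# mixed == "追踪query-too-complex-not-enough-stack-错误"

english = decode_slug("my-custom-slug")
# english == "my-custom-slug"
```

Slugs without percent escapes pass through untouched, so hand-written English slugs keep their original URLs while Chinese slugs become readable file names.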
