sqlite3.ProgrammingError with non-english attachments #243

Closed
item4 opened this Issue Jul 20, 2016 · 4 comments

@item4
Contributor

item4 commented Jul 20, 2016

Python version: 2.7.11
Lektor version: master branch

Steps to reproduce:

  1. Run $ lektor quickstart, then $ cd into the project directory. (In this example the project name is test; for the other prompts, press Enter to accept the defaults.)
  2. $ touch content/blog/first-post/미츠히코.txt (미츠히코 is "mitsuhiko" written in Korean.)
  3. $ lektor server

Traceback

(lektor) [item4@item4-mbp test]$ lektor server
 * Project path: /Users/item4/Projects/lektor/test/test.lektorproject
 * Output path: /Users/item4/Library/Caches/Lektor/builds/1d660b474e4a3c0330f4ce35b4827072
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Started source info update
Finished source info update in 0.23 sec
Started build
U index.html
U about/index.html
U projects/index.html
U blog/index.html
U static/style.css
U blog/first-post/index.html
U blog/first-post/미츠히코.txt
Finished build in 0.07 sec
Started prune
Finished prune in 0.00 sec
Traceback (most recent call last):
  File "/Users/item4/Projects/lektor/lektor/devserver.py", line 49, in build
    builder.prune()
  File "/Users/item4/Projects/lektor/lektor/builder.py", line 1062, in prune
    for aft in build_state.iter_unreferenced_artifacts(all=all):
  File "/Users/item4/Projects/lektor/lektor/builder.py", line 371, in iter_unreferenced_artifacts
    and is_primary_source''', [artifact_name])
ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
@singingwolfboy

Member

singingwolfboy commented Jul 31, 2016

This is a bytestring/unicode problem, and one that's probably going to be difficult to fix. Bytestrings can't reliably handle characters outside of the ASCII character space, while unicode strings can. In Python 2.7, the str type is a bytestring, while in Python 3, the str type is a unicode string. (This is one of the biggest changes between Python 2 and Python 3.)

If you look in the lektor/builder.py file, you'll see that the error is coming from the iter_unreferenced_artifacts() method, which is iterating over all the files in the destination path. Notice the following line:

for dirpath, dirnames, filenames in os.walk(dst):

That os.walk function is from Python's os module. The values returned from that function are of type str, which are bytestrings on Python 2 and unicode strings on Python 3. As a result, this function will just work on Python 3, but when you're running on Python 2 and you have non-ASCII filenames, you'll get a failure like this. It may be possible to decode the bytestrings into unicode strings on Python 2, but I imagine that doing so will be heavily dependent on what character encoding the filesystem is using, which will make this very brittle and error-prone.
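To make the bytes-versus-text distinction concrete, here is a minimal sketch that runs on Python 3, where a bytes path stands in for the Python 2 str. It assumes a filesystem that accepts non-ASCII names; the helper list_names is hypothetical and not part of Lektor. It shows that the names os.walk yields have the same type as the path passed in, and that decoding depends on the filesystem encoding.

```python
import os
import tempfile


def list_names(root):
    """Return every file name found under root, in the type os.walk yields."""
    names = []
    for dirpath, dirnames, filenames in os.walk(root):
        names.extend(filenames)
    return names


with tempfile.TemporaryDirectory() as tmp:
    # Create an empty file with a non-ASCII name, as in the report above.
    open(os.path.join(tmp, u"미츠히코.txt"), "w").close()

    # A text path yields text names; a bytes path yields bytes names.
    text_names = list_names(tmp)
    byte_names = list_names(os.fsencode(tmp))

# Decoding with the filesystem encoding recovers the text names.
decoded_names = [os.fsdecode(name) for name in byte_names]
```

On Python 2 the same rule applies as far as I know: passing a unicode path to os.walk returns unicode names where they can be decoded, so the result still depends on the filesystem encoding.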

I don't have a good solution to this problem on Python 2, but if you can switch to Python 3, I would highly suggest doing so. I just tested this example on Python 2 and Python 3, and while I was able to reproduce the failure on Python 2, switching to Python 3 made it go away. Give it a try!

@item4

Contributor

item4 commented Jul 31, 2016

@singingwolfboy I have no problem with using Python 3, I actually prefer using it. But I believe mitsuhiko aims to support Python 2 as well.

item4 added a commit to jinsu-kim/jinsu-kim.github.io that referenced this issue Aug 12, 2016

@haseenapa

haseenapa commented Oct 9, 2016

I have edited /Users/item4/Projects/lektor/lektor/builder.py and added a single line

con.text_factory = lambda x: unicode(x, 'utf-8', 'ignore')

after the following line

con = sqlite3.connect(self.buildstate_database_filename,
                      timeout=10, check_same_thread=False)

Ref Link : http://hakanu.net/sql/2015/08/25/sqlite-unicode-string-problem/
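For readers who want to try the text_factory hook in isolation, the following is a small sketch against an in-memory database. It is written for Python 3, where the factory receives bytes, so raw.decode("utf-8", "ignore") plays the role of unicode(x, 'utf-8', 'ignore') from the line above. The table name artifacts and its single column are assumptions for illustration, not Lektor's actual schema.

```python
import sqlite3

# In-memory database, so nothing touches a real build state file.
con = sqlite3.connect(":memory:")

# Decode every TEXT value as UTF-8, dropping bytes that cannot be decoded.
con.text_factory = lambda raw: raw.decode("utf-8", "ignore")

# Hypothetical table, used only for this demonstration.
con.execute("create table artifacts (artifact text)")
con.execute("insert into artifacts values (?)",
            (u"blog/first-post/미츠히코.txt",))

rows = con.execute("select artifact from artifacts").fetchall()
con.close()
```

Note that the 'ignore' error handler silently drops undecodable bytes, so a name stored in a different encoding would come back shortened rather than raise an error.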

@item4

Contributor

item4 commented Oct 13, 2016

@haseenapa Thanks for the solution. I will make a PR.

item4 added a commit to item4/lektor that referenced this issue Oct 13, 2016

Make text_factory for PY2
Read the lektor#243 for more info.

fix lektor#243
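
The commit itself is not shown in this thread. As a rough sketch of what a Python 2 only text_factory could look like, assuming the workaround quoted above is applied behind a version check, consider the following. The function name connect_buildstate and the PY2 flag are hypothetical and are not taken from the Lektor source.

```python
import sqlite3
import sys

# Hypothetical flag; Lektor may detect the interpreter version differently.
PY2 = sys.version_info[0] == 2


def connect_buildstate(filename):
    """Open a connection, installing the UTF-8 text_factory on Python 2 only."""
    con = sqlite3.connect(filename, timeout=10, check_same_thread=False)
    if PY2:
        # The name `unicode` exists only on Python 2; this branch never runs
        # on Python 3, where the default text_factory (str) is kept.
        con.text_factory = lambda x: unicode(x, 'utf-8', 'ignore')
    return con


con = connect_buildstate(":memory:")
```

Guarding the assignment keeps Python 3 on its default behaviour, which already returns text for TEXT columns.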