allow zero-length strings #22

Ichimonji10 · 2014-05-17T12:27:13Z

Methods like generate_alphanumeric and generate_alpha allow the user to choose how long the resultant string should be. Each of those string generation methods checks the length argument to ensure that the value is an integer and is not too short. There are two problems here:

A length of zero is not allowed. That doesn't make sense. A user should be able to generate a zero-length string if they so desire.
The validation logic in each method is identical. It should be refactored out into a single private method.

omaciel · 2014-05-17T13:46:02Z

My reason for not allowing zero-length strings is that, if you want one, you could just pass data="" or data=u"" to your test. It's pretty trivial to change the code to allow zero-length strings, but I wonder if it is useful...

As far as refactoring the code that performs the validation, I agree and will do that soon-ish (unless you beat me to the punch and send me a PR) :)

Ichimonji10 · 2014-05-17T13:53:07Z

If you refactor out that validation logic, your test suite can be trimmed down significantly.

omaciel · 2014-05-17T13:57:26Z

Cool, will do that today.

Ichimonji10 · 2014-05-17T14:25:09Z

If you look at pull request 20, you'll see I only have three unit tests for generate_utf8. You can probably get by with just three tests for all of your other string generation methods, too. Doing so will significantly lighten your codebase while still providing thorough test coverage. That's the route I'd go, after refactoring out the validation logic.

In my opinion. ;)

omaciel · 2014-05-17T16:02:10Z

@Ichimonji10 I created PR #23 and would love your feedback before merging it.

Btw, interestingly enough, when I run the test suite locally (Python 2.7) I keep getting the following errors (though Travis seems to be happy):

======================================================================
ERROR: @Test: Create a unicode string.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/omaciel/hacking/fauxfactory/tests/test_strings.py", line 351, in test_generate_utf8_1
    result = self.factory.generate_string('utf8', 5)
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 85, in generate_string
    return cls.generate_utf8(length)
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 636, in generate_utf8
    output = u''.join(unichr(codepoint) for codepoint in codepoints)
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 636, in <genexpr>
    output = u''.join(unichr(codepoint) for codepoint in codepoints)
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

======================================================================
ERROR: @Test: Create a unicode string and specify a length.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/omaciel/hacking/fauxfactory/tests/test_strings.py", line 366, in test_generate_utf8_2
    len(self.factory.generate_string('utf8', length)),
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 85, in generate_string
    return cls.generate_utf8(length)
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 636, in generate_utf8
    output = u''.join(unichr(codepoint) for codepoint in codepoints)
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 636, in <genexpr>
    output = u''.join(unichr(codepoint) for codepoint in codepoints)
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

----------------------------------------------------------------------
Ran 150 tests in 0.849s

FAILED (errors=2)

omaciel · 2014-05-17T16:02:35Z

Could be my build of Python 2.7 after all: http://wordaligned.org/articles/narrow-python

Running on recently installed python 3.4 shows no issues:

python3 -m unittest discover tests
......................................................................................................................................................
----------------------------------------------------------------------
Ran 150 tests in 0.310s

OK

Ichimonji10 · 2014-05-18T14:21:34Z

Good article.

You can discover whether your Python 2 executable is compiled with narrow or wide unicode support by executing the following:

>>> import sysconfig
>>> sysconfig.get_config_vars()['Py_UNICODE_SIZE']

On my dev machine, a value of 4 is returned, which would explain why I am able to execute the test suite successfully without encountering encoding issues.

When I execute the same instructions under Python 3.4.0, I get a KeyError, which indicates that all Python 3.4 builds support the full range of Unicode characters. Additionally, this Stack Overflow Q&A indicates that all Python builds from version 3.3 onward are guaranteed to support all Unicode characters.
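
A check that does not depend on the Py_UNICODE_SIZE key is sys.maxunicode, which is 0xFFFF on a narrow build and 0x10FFFF otherwise. The sketch below runs on both Python 2 and Python 3; the function name is_narrow_build is made up for this example:

```python
import sys


def is_narrow_build():
    """Return True when single characters are limited to U+0000..U+FFFF."""
    # Narrow builds report 0xFFFF; wide builds and Python 3.3+ report 0x10FFFF.
    return sys.maxunicode == 0xFFFF


print("narrow build" if is_narrow_build() else "wide build")
```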

All that said, the question arises: should the generate_utf8 method be changed in response to this discovery? I would answer "no". Here's my reasoning:

If an application's requirements doc states that it should support UTF-8 characters, then that application should support all UTF-8 characters. Not some, but all.
The generate_utf8 method allows users to discover an issue preventing full UTF-8 support.
Once this issue is discovered, users can choose how to handle the issue on a project-by-project basis. Perhaps they'll support only narrow UTF-8 characters, or perhaps they'll make sure their development/deployment environment uses a correctly compiled Python executable.

That said, we cannot anticipate all use cases for FauxFactory. Some users may be interested in only providing "narrow" unicode support. If this is the case, then a generate_narrow_utf8 method should be added, and a corresponding "narrow_utf8" argument should be added to the generate_string method. (Or call it generate_utf8_narrow and create a "utf8_narrow" argument. Ehh.)
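
For the sake of discussion, a rough sketch of what such a method might look like. Nothing here exists in FauxFactory: the name generate_narrow_utf8 comes from the suggestion above, and the code point range and the decision to skip surrogates are my assumptions:

```python
import random
import sys

# Python 2 spells this unichr(); Python 3 folded it into chr().
_unichr = unichr if sys.version_info[0] == 2 else chr


def generate_narrow_utf8(length=5):
    """Hypothetical helper: random string limited to U+0000..U+FFFF."""
    chars = []
    while len(chars) < length:
        codepoint = random.randint(0x0, 0xFFFF)
        # Assumption: skip surrogates (U+D800..U+DFFF), which are not
        # characters on their own and cannot be encoded as UTF-8.
        if 0xD800 <= codepoint <= 0xDFFF:
            continue
        chars.append(_unichr(codepoint))
    return u"".join(chars)
```

Because every code point is at most 0xFFFF, unichr() accepts all of them on a narrow Python 2 build, so this would avoid the ValueError shown in the tracebacks above.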

Ichimonji10 · 2014-05-18T14:43:24Z

To restate my argument from above, in fewer words:

Incomplete support for UTF-8 is something users should be aware of. generate_utf8 facilitates discovery of such. Therefore, generate_utf8 has value.

omaciel · 2014-09-30T21:27:02Z

@Ichimonji10 ping. Is this still relevant?

Ichimonji10 · 2014-09-30T21:28:29Z

I've not needed zero-length strings for the past several months, so no.

Ichimonji10 closed this as completed Sep 30, 2014
