Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow zero-length strings #22

Closed
Ichimonji10 opened this issue May 17, 2014 · 10 comments
Closed

allow zero-length strings #22

Ichimonji10 opened this issue May 17, 2014 · 10 comments

Comments

@Ichimonji10
Copy link
Contributor

Methods like generate_alphanumeric and generate_alpha allow the user to choose how long the resultant string should be. Each of those string generation methods checks length to ensure that the value is an integer and is not too short. There are two problems here:'

  1. A length of zero is not allowed. That doesn't make sense. A user should be able to generate a zero-length string if they so desire.
  2. The validation logic in each method is identical. It should be refactored out into a single private method.
@omaciel
Copy link
Owner

omaciel commented May 17, 2014

My logic for not allowing zero-length strings is because if that is the case then you could just pass data="" or data=u"" to your test. It's pretty trivial to change the code to allow zero-length but I wonder if it is useful...

As far as refactoring the code that performs the validation, I agree and will do that soon-ish (unless you beat me to the punch and send me a PR) :)

@Ichimonji10
Copy link
Contributor Author

If you refactor out that validation logic, your test suite can be trimmed down significantly.

@omaciel
Copy link
Owner

omaciel commented May 17, 2014

Cool, will do that today.

@Ichimonji10
Copy link
Contributor Author

If you look at pull request 20, you'll see I only have three unit tests for generate_utf8. You can probably get by with just three tests for all of your other string generation methods, too. Doing so will significantly lighten your codebase while still providing thorough test coverage. That's the route I'd go, after refactoring out the validation logic.

In my opinion. ;)

@omaciel
Copy link
Owner

omaciel commented May 17, 2014

@Ichimonji10 created PR #23 and would love your feedback before merging it.

Btw, interestingly enough, when I run the test suite locally (python 2.7) I keep getting the following errors (though Travis seems to be happy):

 ======================================================================
ERROR: @Test: Create a unicode string.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/omaciel/hacking/fauxfactory/tests/test_strings.py", line 351, in test_generate_utf8_1
    result = self.factory.generate_string('utf8', 5)
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 85, in generate_string
    return cls.generate_utf8(length)
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 636, in generate_utf8
    output = u''.join(unichr(codepoint) for codepoint in codepoints)
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 636, in <genexpr>
    output = u''.join(unichr(codepoint) for codepoint in codepoints)
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

======================================================================
ERROR: @Test: Create a unicode string and specify a length.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/omaciel/hacking/fauxfactory/tests/test_strings.py", line 366, in test_generate_utf8_2
    len(self.factory.generate_string('utf8', length)),
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 85, in generate_string
    return cls.generate_utf8(length)
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 636, in generate_utf8
    output = u''.join(unichr(codepoint) for codepoint in codepoints)
  File "/Users/omaciel/hacking/fauxfactory/fauxfactory/__init__.py", line 636, in <genexpr>
    output = u''.join(unichr(codepoint) for codepoint in codepoints)
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

----------------------------------------------------------------------
Ran 150 tests in 0.849s

FAILED (errors=2)

@omaciel
Copy link
Owner

omaciel commented May 17, 2014

Could be my version of python 2.7 after all: http://wordaligned.org/articles/narrow-python

Running on recently installed python 3.4 shows no issues:

python3 -m unittest discover tests
......................................................................................................................................................
----------------------------------------------------------------------
Ran 150 tests in 0.310s

OK

@Ichimonji10
Copy link
Contributor Author

Good article.

You can discover whether your Python 2 executable is compiled with narrow or wide unicode support by executing the following:

>>> import sysconfig
>>> sysconfig.get_config_vars()['Py_UNICODE_SIZE']

On my dev machine, a value of 4 is returned, which would explain why I am sucessfully able to execute the test suite without encountering encoding issues.

When I execute the same instructions under Python 3.4.0, I get a KeyError, which indicates that all Python 3.4 builds support the full range of unicode characters. Additionally, this stackoverflow Q&A indicates that all Python builds from version 3.3 and beyond are guaranteed to support all unicode characters.

All that said, the question arises: should the generate_utf8 method be changed in response to this discovery? I would answer "no". Here's my reasoning:

  1. If an application's requirements doc states that it should support UTF8 characters, then that application should support all UTF-8 characters. Not some, but all.
  2. The generate_utf8 method allows users to discover an issue preventing full UTF-8 support.
  3. Once this issue is discovered, users can choose how to handle the issue on a project-by-project basis. Perhaps they'll support only narrow utf8 chars, or perhaps they'll make sure their development/deployment environment uses a correctly compiled Python executable.

That said, we cannot anticipate all use cases for FauxFactory. Some users may be interested in only providing "narrow" unicode support. If this is the case, then a generate_narrow_utf8 method should be added, and a corresponding "narrow_utf8" argument should be added to the generate_string method. (Or call it generate_utf8_narrow and create a "utf8_narrow" argument. Ehh.)

@Ichimonji10
Copy link
Contributor Author

To restate my argument from above, in fewer words:

Incomplete support for UTF-8 is something users should be aware of. generate_utf8 facilitates discovery of such. Therefore, generate_utf8 has value.

@omaciel
Copy link
Owner

omaciel commented Sep 30, 2014

@Ichimonji10 ping. Is this still relevant?

@Ichimonji10
Copy link
Contributor Author

I've not needed zero-length strings for the past several months, so no.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants