Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed iter_text adding an empty string #2998

Merged
merged 7 commits into from Dec 11, 2023
Merged

Fixed iter_text adding an empty string #2998

merged 7 commits into from Dec 11, 2023

Conversation

jamesbraza
Copy link
Contributor

Summary

Fixes TextChunker().decode("") not handling empty string, as exposed in #2995

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@jamesbraza
Copy link
Contributor Author

Looks like coverage checks are failing because this bug was ensuring code got covered. I will adjust now

@jamesbraza
Copy link
Contributor Author

@tomchristie so since LineDecoder is only used after iter_text, I think it's no longer possible to get empty string to be the input to decode. To test a null input to LineDecoder.decode, I would have to directly import and test it. What do you think of that?

tests/test_decoders.py Outdated Show resolved Hide resolved
@tomchristie
Copy link
Member

What do you think of that?

I think it's better to add a nocover annotation to the unreachable line. Perhaps also with a comment that the line is "only reachable if empty text is fed in, which we don't actually ever do".

@jamesbraza
Copy link
Contributor Author

Alright, done! Ready for review again @tomchristie and @zanieb 👌

@tomchristie
Copy link
Member

Looks great. Nice tidy lil PR, this. One query on the usage of chain, could probs approve either ways tho.

@jamesbraza
Copy link
Contributor Author

jamesbraza commented Dec 11, 2023

Thanks! Yeah, I added the itertools.chain to uphold code coverage. After fixing this bug, it meant the chunker.decode(decoder.flush()) iterable was empty, so the first yield didn't get covered. By using itertools.chain, it basically combines the two lines so code coverage remains intact.

Alternately, iter_bytes solved this with a pragma: no cover comment (link). Imo this is isn't as good as itertools.chain. Would you like me to roll the itertools.chain out to all iter_ methods in the _models module?

@tomchristie
Copy link
Member

Not sure. Perhaps we could consider iter_chain() separately?

@jamesbraza
Copy link
Contributor Author

I think I was changing the internals of iter_text here with itertools.chain to keep coverage at 100%. I am not following what you mean by iter_chain.

To merge this PR, would you like me to switch to a pragma: no cover comment instead of itertools.chain?

@zanieb
Copy link
Contributor

zanieb commented Dec 11, 2023

@jamesbraza I think the suggestion is use a no cover pragma then follow with a separate pull request that explores using itertools.chain consistently throughout the code.

I'm okay with keeping this as is or that 🤷‍♀️

@jamesbraza
Copy link
Contributor Author

Sounds good, agree two PRs is better. Switched to pragma: no cover comments

@tomchristie tomchristie merged commit 1e11096 into encode:master Dec 11, 2023
5 checks passed
@tomchristie
Copy link
Member

Wonderful, thanks.

(Note to self: CHANGELOG.md updates as part of the PR before ya merge in the future, doofus.)

@jamesbraza jamesbraza deleted the fixing-streaming-iter-text branch December 11, 2023 23:27
@jamesbraza
Copy link
Contributor Author

I will get around to itertools.chain change this weekend, and will have a CHANGELOG.md entry for that 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants