class iterator doesn't work with stream_iterator #47

Closed
ritonglue opened this issue Oct 6, 2019 · 5 comments
Comments

@ritonglue

I tried to convert an istream to a sequence of code points. Unfortunately, utf8::iterator can't be used for this because it doesn't compile. The compilation problem can be bypassed by removing the two blocking lines, which are just checks.
The remaining problem is the same with both versions of the iterator (checked and unchecked): it converts only every second character. The cause is operator*(): it assumes that the wrapped iterator doesn't change while the next method runs.

I suggest that the class iterator stores the code point.

Here is another problem: with a stream iterator, you can't go back! So operator--() can't work.

I also suggest providing an end point for the iterator, just like std::istream_iterator, so that you can write a classic for loop:
for(utf8::iterator iter ; iter != end ; ++iter) {//do stuff}

I hope you can find a solution.
Regards

#include <iostream>
#include <sstream>
#include <iomanip>
#include <iterator>
#include <utf8.h>

void print(uint32_t cp) {
//	std::cout << std::hex << std::setfill('0') << std::setw(2) << cp << ' ';
	std::cout << (char)cp;
}

int main() try {
	using iterator = std::istream_iterator<char>;
	using utf8_iterator = utf8::unchecked::iterator<iterator>;
	//using utf8_iterator = utf8::iterator<iterator>;
	std::istringstream is("abc");
	iterator it(is);
	iterator eos{};

	utf8_iterator end_iter{};
	/*
	for(utf8_iterator iter(it, it, eos) ; iter != end_iter ; ++iter) {
		std::cout << std::hex << std::setfill('0') << std::setw(2) << *iter << ' ';
	}
	*/
	for(utf8_iterator iter(it) ; iter != end_iter ; ++iter) {
		print(*iter);
	}
	std::cout << std::endl;
	return 0;
} catch(const std::exception & e) {
	std::cerr << "exception: " << e.what() << '\n';
	return 1;
}
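
The suggestion above (store the decoded code point in the iterator and provide an end sentinel) could be sketched roughly as follows. This is only an illustration, not utfcpp code: `caching_cp_iterator` and `decode_stream` are hypothetical names, the decoder is hand-written, and it assumes well-formed UTF-8 with no validation at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical single-pass adapter (not part of utfcpp).
// Assumes the octet sequence is well-formed UTF-8.
template <typename OctetIt>
class caching_cp_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = std::uint32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const std::uint32_t*;
    using reference         = std::uint32_t;

    caching_cp_iterator() = default;  // end-of-sequence sentinel
    caching_cp_iterator(OctetIt first, OctetIt last)
        : it_(first), end_(last), done_(false) { read(); }

    // Returns the cached code point; the wrapped iterator is not touched.
    std::uint32_t operator*() const { assert(!done_); return cp_; }

    caching_cp_iterator& operator++() { read(); return *this; }
    void operator++(int) { read(); }

    // Only "end or not end" is compared, which is enough for single-pass use.
    friend bool operator==(const caching_cp_iterator& a, const caching_cp_iterator& b) {
        return a.done_ == b.done_;
    }
    friend bool operator!=(const caching_cp_iterator& a, const caching_cp_iterator& b) {
        return !(a == b);
    }

private:
    // Consumes exactly one encoded code point and caches it in cp_.
    void read() {
        if (it_ == end_) { done_ = true; return; }
        const auto lead = static_cast<unsigned char>(*it_);
        ++it_;
        const int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        cp_ = extra == 0 ? lead : (lead & (0x3F >> extra));
        for (int i = 0; i < extra && it_ != end_; ++i, ++it_) {
            cp_ = (cp_ << 6) | (static_cast<unsigned char>(*it_) & 0x3F);
        }
    }

    OctetIt it_{};
    OctetIt end_{};
    std::uint32_t cp_ = 0;
    bool done_ = true;
};

// Hypothetical helper: decodes a whole stream into code points.
std::vector<std::uint32_t> decode_stream(std::istream& in) {
    in >> std::noskipws;  // istream_iterator<char> would otherwise skip whitespace
    using octet_iterator = std::istream_iterator<char>;
    using cp_iterator    = caching_cp_iterator<octet_iterator>;

    octet_iterator first(in);
    octet_iterator last;
    const cp_iterator end{};
    std::vector<std::uint32_t> out;
    for (cp_iterator iter(first, last); iter != end; ++iter) {
        out.push_back(*iter);
    }
    return out;
}
```

Because the code point is read once per increment and only returned by operator*(), dereferencing several times in a row yields the same value. There is deliberately no operator--(), since the underlying stream cannot go back.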
@0x17de

0x17de commented Oct 13, 2019

I would like to have a look at it today. I have already spotted that each call to operator* moves the underlying iterator by one, so if you call print(*iter); multiple times per loop iteration and use "abcdef" as your string, you see the difference.
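
The effect can be reproduced without utfcpp at all. The sketch below is only an illustration: `peek` is a hypothetical stand-in for an adapter whose dereference copies the wrapped iterator and advances the copy, and `walk` is a hypothetical driver loop. With std::istream_iterator, advancing the copy still consumes a character from the shared stream.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for a dereference that works on a copy of the
// wrapped iterator. `it` is passed by value, so the caller's iterator
// object is untouched, but the stream behind it is not.
template <typename It>
char peek(It it) {
    assert(it != It{});  // caller must not pass the end iterator
    const char c = *it;
    ++it;                // for a stream iterator this reads from the shared stream
    return c;
}

// Walks the text the way a "for (...; ++iter) use(*iter);" loop would.
std::string walk(const std::string& text) {
    std::istringstream is(text);
    std::istream_iterator<char> it(is);
    const std::istream_iterator<char> eos;
    std::string seen;
    while (it != eos) {
        seen += peek(it);  // silently consumes the next character as well
        ++it;              // reads the character after the one peek() consumed
    }
    return seen;
}
```

For the input "abcdef", `walk` returns "ace": every second character is lost, which matches the behavior described in the original report.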

@nemtrif
Owner

nemtrif commented Oct 20, 2019

The compilation problem is simple: unchecked iterator constructor takes only one argument, not three. Instead of:

for(utf8_iterator iter(it, it, eos)

do

for(utf8_iterator iter(it)

Looking at the rest...

@ritonglue
Author

ritonglue commented Oct 20, 2019

unchecked::iterator compiles. checked::iterator doesn't.
The problem comes from operator<, which doesn't exist for std::istream_iterator.

@nemtrif
Owner

nemtrif commented Dec 9, 2019

Not a bug. The iterator class adapts bidirectional iterators, and istream_iterator is an input iterator.
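
The category mismatch can be checked at compile time with the standard iterator traits. The snippet below is a small sketch using only the standard library; `is_bidirectional_v` and `slurp` are hypothetical helper names, not utfcpp facilities. `slurp` shows one possible workaround: buffer the stream into a std::string first, whose iterators are bidirectional and so meet the adapter's stated requirement.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical trait: true if It is at least a bidirectional iterator.
template <typename It>
constexpr bool is_bidirectional_v = std::is_base_of<
    std::bidirectional_iterator_tag,
    typename std::iterator_traits<It>::iterator_category>::value;

// A stream iterator is single-pass only...
static_assert(!is_bidirectional_v<std::istream_iterator<char>>,
              "istream_iterator is an input iterator");
// ...while string iterators can move in both directions.
static_assert(is_bidirectional_v<std::string::const_iterator>,
              "string iterators are bidirectional");

// Hypothetical helper: reads the rest of the stream into a string,
// preserving whitespace (istreambuf_iterator does not skip it).
std::string slurp(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Buffering costs memory proportional to the input, so it is a trade-off rather than a general fix for streaming use.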

@nemtrif nemtrif closed this as completed Dec 9, 2019
@ritonglue
Author

The class utf8::unchecked::iterator doesn't say so. See my PR #59 for a bug example.
The utf8::iterator needs a bi-directional iterator because of the range-checking. Remove the range-checking and it will fail.
