Splitting text line by line #902
-
Hello. Using the code at the link attached at the end, I tried to write text with different line endings and then split it with a regex. For some reason, when I used a for loop to check the first result, println printed the entire text without splitting it.
Replies: 6 comments 1 reply
-
The output shows that the string is split at each line ending and prints test1, test2 and test3. This is what I would expect. What specific output are you looking for?
-
I thought that since I use split and have it set up to work line by
line, the loop should print test1 after the first iteration and stop
there. But it looks like it prints test1, test2 and test3 even though
I put a break after the first iteration.
-
I'm trying to create this as an example. If it works, I will later
need to load a file via regex::bytes, since the file probably won't be
UTF-8, and then split the text on line endings, which can be \r\n
or \n. Once the file is loaded and split, I pull out a random string
by an index generated with the rand crate. But when I tried what I
sent you, it looked like the strings were not split correctly on line
endings: if they were, it should print test1 and stop once I added
the break.
-
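`$` only matches at the end of a string. It sounds like you want it to match at line endings too. You need to enable multi-line mode for that, just like you have to do in most other regex engines.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3e4ac6e8bf10babeb8937d5c333134e3
Note also that the only recognized line ending is `\n` currently. There is an issue somewhere tracking the addition of `\r\n`.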
-
Hello, I can say that you understood my question very well and
suggested something that looks like a solution, but I still have three
questions. Do you think the issue with handling \r\n will be solved
in the near future?
When I load a file that is not guaranteed to be entirely UTF-8, do I
have to add something else to your example so that non-UTF-8
characters cause no problems?
If the first problem is not solved in the near future, do you think I
could use bstring and its lines handler to handle \r\n?
-
The point is that I would use this in an older game that only
supports ANSI. So I just need to split the file on line endings and
then return the string to the player unchanged, so that its ANSI
encoding is not broken while I process it. That means I would split on
line endings but would not inspect the contents of the string before
the player gets it back in the game.
2022-08-17 14:50 GMT+02:00, Andrew Gallant ***@***.***>:
Here's the issue tracking CRLF support for `$`:
#244
> do you think the issue with handling \r\n will ever be
> solved in the near future?
I don't give estimates for projects I work on in my free time.
> when I load a file in which there is no guarantee that the file is all
> utf8, do I have to add something else to your example so that there is
> no problem with non utf8 characters?
This is kind of a complicated and nuanced topic. If you expect your input to
be conventionally UTF-8 but might have the odd latin-1 byte somewhere, then
you can:
* Assume it's close enough to UTF-8 and operate on `&[u8]` directly. That's
  what `regex::bytes` is for.
* Lossily decode your data to UTF-8 such that invalid UTF-8 gets replaced
  with `U+FFFD` (the replacement codepoint).
* Return an error to the end user if it isn't valid UTF-8.
Any one of those might be reasonable. I can't tell you which is right for
your use case because I don't know what problem you're trying to solve.
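As a rough illustration of the second and third options above, here is a minimal sketch that uses only the standard library. The function names `decode_lossy` and `decode_strict` and the sample input are made up for this example; the first option (working on `&[u8]` directly) needs no decoding step at all.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Lossy decoding: every invalid sequence becomes U+FFFD.
/// (Hypothetical helper name, not part of any library.)
fn decode_lossy(input: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(input)
}

/// Strict decoding: report an error instead of guessing.
/// (Hypothetical helper name, not part of any library.)
fn decode_strict(input: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(input)
}

fn main() {
    // Assumed input: mostly ASCII with one stray latin-1 byte (0xE9).
    let input: &[u8] = b"caf\xE9\nok\n";

    // The lossy path always yields a string, but the original byte is gone.
    println!("lossy: {:?}", decode_lossy(input));

    // The strict path reports how many leading bytes were valid.
    match decode_strict(input) {
        Ok(text) => println!("valid UTF-8: {:?}", text),
        Err(err) => println!("invalid UTF-8 after {} bytes", err.valid_up_to()),
    }
}
```

Note that lossy decoding changes the data, so it is probably not what you want if the bytes must be handed back to the caller untouched.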
Now, if your input might be UTF-8, or UTF-16 or something else entirely,
then that's a different problem and you likely need something like the
`encoding_rs_io` crate to help you there.
> in case the first problem has not been solved in the near future, do
> you think I could use bstring and its lines handler to handle \r\n?
I don't see how I could answer this without knowing the problem you're
trying to solve. Like... are you just trying to iterate over lines? Then
yeah, umm, don't use a regex for that... You can either use the standard
library's line iterator (for `&str`) or `bstr`'s iterator (for `&[u8]`) or
just roll your own. If you're trying to do more complicated matching and do
specifically want a regex, then just use `$`. And if your match ends with a
`\r`, remove it.
But like I said, you haven't actually _explained the problem you're trying
to solve_. So I don't really know the answers to your questions.
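For the "just iterate over lines" case, a sketch with only the standard library could look like the following. `byte_lines` is a hypothetical "roll your own" helper written for this example (it is not `bstr`'s iterator), and the sample inputs are assumptions. It splits on `\n` and removes one trailing `\r` from each line, so the bytes are never decoded or altered otherwise.

```rust
/// Split raw bytes into lines on `\n`, dropping one trailing `\r` per line so
/// that both `\n` and `\r\n` endings work. The bytes are never decoded.
/// (Hypothetical helper for this example.)
fn byte_lines(input: &[u8]) -> Vec<&[u8]> {
    if input.is_empty() {
        return Vec::new();
    }
    // Drop a final terminator so it does not produce a trailing empty line.
    let input = input.strip_suffix(b"\n").unwrap_or(input);
    input
        .split(|&b| b == b'\n')
        .map(|line| line.strip_suffix(b"\r").unwrap_or(line))
        .collect()
}

fn main() {
    // For `&str`, the standard library iterator accepts both endings.
    let text = "test1\r\ntest2\ntest3";
    let lines: Vec<&str> = text.lines().collect();
    assert_eq!(lines, ["test1", "test2", "test3"]);

    // For bytes that may not be UTF-8 (0xFF is never valid in UTF-8).
    let raw: &[u8] = b"test1\xFF\r\ntest2\ntest3\r\n";
    let lines = byte_lines(raw);
    assert_eq!(lines.len(), 3);

    // Look at the first line only, comparable to a `break` after one iteration.
    if let Some(first) = lines.first() {
        println!("{:?}", first);
    }
}
```

Because the lines are kept as `&[u8]` slices of the original buffer, a line picked by index can be returned exactly as it was stored in the file.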
`$` only matches at the end of a string. It sounds like you want it to match at line endings too. You need to enable multi-line mode for that, just like you have to do in most other regex engines.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3e4ac6e8bf10babeb8937d5c333134e3
Note also that the only recognized line ending is `\n` currently. There is an issue somewhere tracking the addition of `\r\n`.