Add Flag to Avoid Treating NUL Separated Input as Binary #2974

Closed
LangLangBart opened this issue May 29, 2024 · 11 comments · Fixed by #2976
Labels
feature-request New feature or request

Comments

@LangLangBart

LangLangBart commented May 29, 2024

Discussed in #2971


Issue

Currently, running a command like the following will print a warning:

printf "First\0" | bat -p
[screenshot of the warning]

The warning is defined in src/printer.rs:

bat/src/printer.rs

Lines 435 to 444 in 8f8c953

if !self.config.style_components.header() {
    if Some(ContentType::BINARY) == self.content_type && !self.config.show_nonprintable {
        writeln!(
            handle,
            "{}: Binary content from {} will not be printed to the terminal \
             (but will be present if the output of 'bat' is piped). You can use 'bat -A' \
             to show the binary file contents.",
            Yellow.paint("[bat warning]"),
            input.description.summary(),
        )?;

The decision to label the input as BINARY seems to be made in src/input.rs:

bat/src/input.rs

Lines 260 to 271 in 8f8c953

let mut first_line = vec![];
reader.read_until(b'\n', &mut first_line).ok();

let content_type = if first_line.is_empty() {
    None
} else {
    Some(content_inspector::inspect(&first_line[..]))
};

if content_type == Some(ContentType::UTF_16LE) {
    reader.read_until(0x00, &mut first_line).ok();
}

A hacky workaround is to make the first line empty, use bat, and then remove the first line:

printf "\nFirst\0" | bat -p | sed '1d'
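To illustrate why the workaround has an effect, here is a minimal sketch of the first-line check. The `guess` function is a hypothetical, simplified stand-in for `content_inspector::inspect` (it only looks for a NUL byte and ignores BOM handling), and `classify_first_line` is an illustrative name, not a function in bat.

```rust
use std::io::{BufRead, Cursor};

#[derive(Debug, PartialEq)]
enum Guess {
    Text,
    Binary,
}

/// Hypothetical stand-in for `content_inspector::inspect`:
/// any NUL byte in the buffer is taken as a sign of binary content.
fn guess(buf: &[u8]) -> Guess {
    if buf.contains(&0) {
        Guess::Binary
    } else {
        Guess::Text
    }
}

/// Mirrors the shape of the snippet from src/input.rs: read up to and
/// including the first '\n' and classify only those bytes.
fn classify_first_line(input: &[u8]) -> Option<Guess> {
    let mut reader = Cursor::new(input);
    let mut first_line = Vec::new();
    reader.read_until(b'\n', &mut first_line).ok();
    if first_line.is_empty() {
        None
    } else {
        Some(guess(&first_line))
    }
}
```

With this stand-in, `First\0` is classified as binary because the whole input is the first line, while `\nFirst\0` is classified as text because the first line is just the newline.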

Proposed solution

A new flag that doesn't label content_type as BINARY when the first line ends with a NUL byte:

# naming the flag '--text' to align with 'grep/git diff'
printf "First\0" | bat -p --text
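A rough sketch of what such a flag could mean for the detection step, assuming a boolean `force_text` that models the proposed `--text` flag. The names `ContentKind`, `detect`, and `content_kind` are hypothetical and not taken from bat; `detect` is a simplified NUL-byte check, not the real inspection logic.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum ContentKind {
    Text,
    Binary,
}

/// Simplified detection (assumption): a NUL byte means binary.
fn detect(first_line: &[u8]) -> ContentKind {
    if first_line.contains(&0) {
        ContentKind::Binary
    } else {
        ContentKind::Text
    }
}

/// `force_text` models the proposed `--text` flag: when set,
/// detection is skipped and the input is always treated as text.
fn content_kind(first_line: &[u8], force_text: bool) -> ContentKind {
    if force_text {
        ContentKind::Text
    } else {
        detect(first_line)
    }
}
```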

The crate [1] used to determine whether content is binary states:

//! encoding). Note that **this analysis can fail**. For example, even if unlikely, UTF-8-encoded
//! text can legally contain NULL bytes. Conversely, some particular binary formats (like binary

Based on this, a --text flag would be very appropriate, similar to how grep and git diff have one as well.

printf "First\0" | grep 'First'
# grep: (standard input): binary file matches

printf "First\0" | grep --text 'First'
# First

Footnotes

  1. sharkdp/content_inspector: Fast inspection of binary buffers to guess/determine the type of content

@LangLangBart LangLangBart added the feature-request New feature or request label May 29, 2024
@domenicomastrangelo

Hi @LangLangBart, would your issue be fixed by adding the -A flag to show non-printable characters?

This would result in the following output:

[screenshot of the bat -A output]

@LangLangBart
Author

LangLangBart commented May 30, 2024

by adding the flag -A

Thanks for the suggestion. I failed to mention this in the issue report here and only described it in the linked discussion. For my use case, the -A/--show-all flag would not be adequate.

I am trying to colorize my zsh history and pipe it into fzf.

# zsh only; the '-N' flag separates the array elements by `NUL`
print -rNC1 -- "${(@uv)history}" | bat -pl zsh | fzf --read0
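For readers less familiar with NUL-separated streams, the following sketch shows how a consumer might split such input into records that may themselves contain newlines. `read_nul_records` is a hypothetical helper for illustration only; it is not how bat or fzf are implemented.

```rust
use std::io::BufRead;

/// Split a byte stream into records at NUL bytes. Each record may
/// contain newlines, which is why NUL is used as the separator.
/// Invalid UTF-8 is replaced lossily; read errors are skipped.
fn read_nul_records<R: BufRead>(reader: R) -> Vec<String> {
    reader
        .split(0)
        .filter_map(Result::ok)
        .map(|bytes| String::from_utf8_lossy(&bytes).into_owned())
        .collect()
}
```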

@domenicomastrangelo

My bad, I missed the discussion link.

So the issue you're having is that when printing something (in this case a line from the history file) that contains a null char, it gives an error.

It feels like a very niche problem to have, but I think it could be fixed, as you said, by adding a --read0 or --read-null-bytes flag.

I could work on this as I'm looking for my first contribution to the project, but it would be good to have an opinion from a more senior contributor too :)

@keith-hall
Collaborator

I'm personally in favor of the idea, but it would be great to wait for input from some of the other maintainers before spending time on it, in case we don't all agree 😉

@LangLangBart
Author

It feels like a very niche problem to have, but I think it could be fixed, as you said, adding a --read0 or --read-null-bytes flag.

I have updated the description, and I would propose a --text flag to align with grep and git diff.

wait for input from some of the other maintainers

Agreed, we should wait for input from some of the maintainers.

@sharkdp
Owner

sharkdp commented May 30, 2024

Sounds good to me. Let's think about making this an option, not a flag. Maybe there are other reasonable options that we want to add later (apart from a yes or no decision). Like whether or not we print that warning.

@LangLangBart
Author

LangLangBart commented May 31, 2024

Project goals and alternatives
...
Be a drop-in replacement for (POSIX) cat

Question: Why was the binary message added at all?

EDIT1: I found the reason in #248 and #336


Let's think about making this an option, not a flag.

How about this?
If the input was labeled binary, check whether first_line would still be labeled binary without its last char; if not, don't label the input as binary?
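A minimal sketch of that check, assuming (as a simplification) that "labeled binary" means "contains a NUL byte". The function names are hypothetical.

```rust
/// Simplified binary check (assumption): a NUL byte means binary.
fn has_nul(buf: &[u8]) -> bool {
    buf.contains(&0)
}

/// Returns true only if the first line still looks binary after
/// dropping a single trailing NUL byte.
fn is_binary_ignoring_trailing_nul(first_line: &[u8]) -> bool {
    if !has_nul(first_line) {
        return false;
    }
    match first_line.split_last() {
        // Last byte is NUL: re-check everything before it.
        Some((&0, rest)) => has_nul(rest),
        // Last byte is something else, so the NUL is in the middle.
        _ => true,
    }
}
```

One caveat under these assumptions: when the input has no newline, the first line is the whole input, so input with several NUL-separated records (such as `a\0b\0`) would still be labeled binary.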

How about --input={text,auto,…} ?
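As a sketch of how an option (rather than a flag) could be modeled, assuming only the two values named above, `text` and `auto`. The `InputMode` type and `treat_as_binary` function are hypothetical, and further values are deliberately left out.

```rust
use std::str::FromStr;

/// Hypothetical values for an `--input` option; only the two
/// values mentioned in the thread are modeled here.
#[derive(Debug, PartialEq, Clone, Copy)]
enum InputMode {
    Auto,
    Text,
}

impl FromStr for InputMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(InputMode::Auto),
            "text" => Ok(InputMode::Text),
            other => Err(format!("unknown input mode: {other}")),
        }
    }
}

/// `Auto` keeps the detected result; `Text` never treats input as binary.
fn treat_as_binary(mode: InputMode, detected_binary: bool) -> bool {
    match mode {
        InputMode::Auto => detected_binary,
        InputMode::Text => false,
    }
}
```

An enum leaves room for further values later, which matches the suggestion to prefer an option over a yes-or-no flag.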

@domenicomastrangelo

Great, so @LangLangBart, you propose --input to specify which language to use for printing, or did I get it wrong?

so basically (more or less)

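  // Sketch only: `input_mode` and `get_content_type_from_input` are hypothetical.
  let content_type = match input_mode {
      // No --input given: keep the current first-line detection.
      None => {
          let mut first_line = vec![];
          reader.read_until(b'\n', &mut first_line).ok();

          let content_type = if first_line.is_empty() {
              None
          } else {
              Some(content_inspector::inspect(&first_line[..]))
          };

          if content_type == Some(ContentType::UTF_16LE) {
              reader.read_until(0x00, &mut first_line).ok();
          }
          content_type
      }
      // --input given: derive the content type from the option instead.
      Some(input) => get_content_type_from_input(input),
  };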

@LangLangBart
Author

@domenicomastrangelo

@einfachIrgendwer0815 has already started a PR.

Besides the color, it works well. Image below comparing 0.7.1 vs their PR.

@einfachIrgendwer0815
Contributor

@LangLangBart, the color issue should be fixed now.

@domenicomastrangelo

nice :)
