Autodetect file delimiters by scanning the first ten lines #2409

Closed
jwnacnud opened this issue May 21, 2024 · 1 comment

Comments

@jwnacnud

I work with comma-, tab-, pipe-, and even tilde-separated text files, and I switch between them often. In the Python scripts I have written, I scan the first ten lines, count how often each common delimiter occurs, and use the most frequent one to interpret the file. It's correct 99% of the time. Could there be an option to autodetect the delimiter from the file contents rather than from the file extension?

Here's what I use to autodetect:

def detect_delimiter(filename, num_lines=10):
    """Guess the delimiter by counting candidates in the first num_lines lines."""
    delimiters = {'|': 0, ',': 0, '\t': 0}

    # Read the first few lines and count occurrences of each potential delimiter
    with open(filename, 'r') as file:
        for _ in range(num_lines):
            line = file.readline()
            if not line:
                break
            for delimiter in delimiters:
                delimiters[delimiter] += line.count(delimiter)

    # Determine the most common delimiter
    max_delimiter = max(delimiters, key=delimiters.get)
    if delimiters[max_delimiter] == 0:
        # None of the candidates appeared in the sampled lines
        return None
    if max_delimiter == '\t':
        # Tab is returned in its escaped, two-character form (backslash + t)
        return "\\t"
    return max_delimiter

I'm sure there's a more elegant way to do it.
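
One possibly tidier route, sketched here as an alternative and not as anything VisiData itself does, is the standard library's csv.Sniffer, which can be limited to a set of candidate delimiters. The function name sniff_delimiter and the candidate list (comma, tab, pipe, tilde) are assumptions made for illustration; the sniffer uses its own consistency heuristic instead of the raw counts above, so the two approaches may disagree on messy files.

```python
import csv
from itertools import islice


def sniff_delimiter(filename, num_lines=10, candidates=",\t|~"):
    """Return the likely delimiter of a text file, or None if none is found."""
    # Sample only the first num_lines lines, like the counting version above
    with open(filename, "r", newline="") as file:
        sample = "".join(islice(file, num_lines))
    try:
        # Restrict the sniffer to the candidate delimiters we care about
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when the sniffer cannot settle on a delimiter
        return None
```

Unlike the counting version, this returns a real tab character for tab-separated files, so it would need escaping before being passed on a command line.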

@saulpw
Owner

saulpw commented May 21, 2024

Hi @jwnacnud, you can do this since v3.0 with a guess_ function: https://www.visidata.org/docs/api/loaders#guessing-filetypes

You should be able to port the above snippet into a function in your visidatarc and it should be used automatically.
