I deal with comma-, tab-, pipe-, and even tilde-separated text files, and I swap back and forth between them. In the Python scripts I have written, I scan the first ten lines to count the frequency of common delimiters and use that to interpret the file. It's correct 99% of the time. Could there be an option to autodetect the delimiter, rather than relying on the file extension?
Here's what I use to autodetect:

    import sys
    import subprocess

    def detect_delimiter(filename, num_lines=10):
        # Candidate delimiters and their running counts
        delimiters = {'|': 0, ',': 0, '\t': 0}
        # Read the first few lines and count occurrences of each potential delimiter
        with open(filename, 'r') as file:
            for _ in range(num_lines):
                line = file.readline()
                if not line:
                    break
                for delimiter in delimiters:
                    delimiters[delimiter] += line.count(delimiter)
        # Determine the most common delimiter
        max_delimiter = max(delimiters, key=delimiters.get)
        # If no candidate appeared at all, there is nothing to report
        if delimiters[max_delimiter] == 0:
            return None
        if max_delimiter == '|':
            return "|"
        elif max_delimiter == ',':
            return ","
        elif max_delimiter == '\t':
            # Literal backslash-t (two characters), not an actual tab character
            return "\\t"
        else:
            return None

I'm sure there's a more elegant way to do it.
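One possibly more elegant option is `csv.Sniffer` from Python's standard library, which guesses the delimiter from a sample of text. The following is only a minimal sketch, not how this tool works or should work: the function name `sniff_delimiter` and the `candidates` parameter are hypothetical, and the tilde is added to the candidate set on the assumption that tilde-separated files should be detected too.

```python
import csv
from itertools import islice


def sniff_delimiter(filename, num_lines=10, candidates=",\t|~"):
    """Guess the delimiter from the first few lines of a file.

    Returns the detected delimiter character, or None if the file is
    empty or no candidate delimiter can be determined.
    `candidates` is a hypothetical parameter listing the characters
    the sniffer is allowed to pick from.
    """
    # Read at most num_lines lines as the sample to analyse
    with open(filename, "r", newline="") as f:
        sample = "".join(islice(f, num_lines))
    if not sample:
        return None
    try:
        # Restrict the guess to the candidate delimiters
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when the sniffer cannot settle on a delimiter
        return None
```

Calling `sniff_delimiter("data.txt")` (a placeholder filename) returns the delimiter as an actual character, so a tab comes back as `"\t"` rather than the two-character string `"\\t"` used above. Unlike a raw frequency count, the sniffer favours characters that occur the same number of times on each line, which may help when commas appear inside free-text fields of a pipe- or tab-separated file.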