--sniff option for sniffing delimiters #230
Running this could take any CSV (or TSV) file and automatically detect the delimiter. If no header row is detected it could add
(Using This could be called |
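As a rough illustration of the detection step described above, here is a minimal sketch using the standard library's `csv.Sniffer`, which can guess both the delimiter and whether the first row looks like a header. The helper name `detect_format`, the candidate delimiter list, and the sample data are assumptions made for this example; they are not taken from sqlite-utils.

```python
import csv


def detect_format(sample: str):
    """Guess (delimiter, has_header) from a text sample.

    Hypothetical helper for illustration only, not part of sqlite-utils.
    """
    sniffer = csv.Sniffer()
    # Restricting the candidates (an assumption here) keeps the guess
    # from landing on an ordinary letter or digit.
    dialect = sniffer.sniff(sample, delimiters=",\t;|")
    return dialect.delimiter, sniffer.has_header(sample)


# Made-up sample data: a small tab-separated file with a header row.
tsv_sample = "id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n3\tMango\t7\n"
delimiter, has_header = detect_format(tsv_sample)
```

Both results are heuristic guesses, so a caller would probably still want an explicit override for files the sniffer gets wrong.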
The challenge here is how to read the first 2048 bytes and then reset the incoming file. The Python docs example looks like this:

    with open('example.csv', newline='') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)

Here's the relevant code in sqlite-utils/sqlite_utils/cli.py Lines 671 to 679 in 726219c
The challenge is going to be having the sqlite-utils/sqlite_utils/utils.py Lines 106 to 113 in 726219c
If |
No, you can't
|
Types involved:
|
Maybe I shouldn't be using sqlite-utils/sqlite_utils/cli.py Lines 667 to 668 in 726219c
|
There are two code paths here that matter:
I'm a bit stuck on the second one. Ideally I could use something like |
I think I've got it. I can use:

    encoding = encoding or "utf-8"
    buffered = io.BufferedReader(json_file, buffer_size=4096)
    decoded = io.TextIOWrapper(buffered, encoding=encoding, line_buffering=True)
    if pk and len(pk) == 1:
        pk = pk[0]
    if csv or tsv:
        if sniff:
            # Read first 2048 bytes and use that to detect
            first_bytes = buffered.peek(2048)
            print('first_bytes', first_bytes)

|
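To show how the `peek()` idea in the snippet above might fit together end to end, here is a self-contained sketch. It peeks at the buffered bytes without consuming them, sniffs a dialect from that sample, then wraps the same buffer in a `TextIOWrapper` so the CSV reader still starts from the first byte. The function name `sniff_and_read`, the candidate delimiter list, and the use of `errors="ignore"` when decoding the sample are assumptions for this example, not the sqlite-utils implementation.

```python
import csv
import io


def sniff_and_read(binary_stream, encoding="utf-8", sample_size=2048):
    """Guess the delimiter from a peeked sample, then read every row.

    Hypothetical helper for illustration only, not part of sqlite-utils.
    """
    buffered = io.BufferedReader(binary_stream, buffer_size=4096)
    # peek() returns buffered bytes without advancing the read position.
    # It may return fewer or more bytes than requested, so slice defensively.
    first_bytes = buffered.peek(sample_size)[:sample_size]
    # The sample may end in the middle of a multi-byte character, so
    # undecodable bytes are ignored (an assumption made for this sketch).
    sample = first_bytes.decode(encoding, errors="ignore")
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t;|")
    # Wrapping the same BufferedReader means the peeked bytes are read again.
    decoded = io.TextIOWrapper(buffered, encoding=encoding, newline="")
    return dialect.delimiter, list(csv.reader(decoded, dialect))


# Made-up sample data: io.BytesIO stands in for the incoming file here.
raw = io.BytesIO(b"id;name\n1;Cleo\n2;Pancakes\n")
delimiter, rows = sniff_and_read(raw)
```

Because nothing is consumed by `peek()`, this approach does not need `seek(0)`. That should make it usable with streams that cannot be rewound, as long as the sample fits inside the buffer.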
Here's the implementation in Python: https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/csv.py#L204-L225 |
Originally posted by @simonw in #228 (comment)