ENH: implemented arbitrary data separator, for now supported Tab and Comma. #15

Merged
merged 3 commits into from
May 16, 2024

jankoslavic
Contributor

This PR adds handling of arbitrary data separators.

lvm_read.py Outdated
    if line.startswith('Separator'):
        separator = line.strip()[10:]
        break
file.seek(0)

Using file.seek will fail when this is called from read_str.
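
To illustrate the concern: if read_str hands the parser something other than a file object, for example a list of lines (an assumption here, since the thread does not show read_str), then seek is not available. A minimal sketch, using a hypothetical helper named rewind_if_possible that is not part of lvm_read:

```python
import io


def rewind_if_possible(source):
    """Rewind `source` to its start if it has a callable seek(); report success."""
    seek = getattr(source, "seek", None)
    if callable(seek):
        seek(0)
        return True
    return False


# A file-like object supports seek(); a plain list of lines does not.
file_like = io.StringIO("Separator\tTab\n1\t2\n")
file_like.readline()  # consume the first line, as a header scan would
lines = "Separator\tTab\n1\t2\n".splitlines(keepends=True)
```

Calling file.seek(0) directly on such a list would raise AttributeError, which is one way the failure described above could show up.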

lvm_read.py Outdated
        break
file.seek(0)
if separator in separators.keys():
    return separators[separator]

From the specification: "To find out what the separator character(s) is, read the entire header block and search for the keyword Separator. The character(s) that follows the keyword is the separator." Any separator can be supported by just using the character following 'Separator'.
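
A rough sketch of that suggestion: take the character immediately after the keyword instead of consulting a lookup table. The function name separator_from_line is hypothetical, and the sketch assumes a single-character separator even though the specification says "character(s)".

```python
KEYWORD = "Separator"


def separator_from_line(line):
    """Return the character right after the 'Separator' keyword, or None.

    Assumption: the separator is a single character.
    """
    if not line.startswith(KEYWORD):
        return None
    # Drop the line ending only, so a tab separator is not stripped away.
    rest = line[len(KEYWORD):].rstrip("\r\n")
    return rest[0] if rest else None
```

With this approach a header line such as "Separator;Semicolon" would yield ";" without any entry for it in a table.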

lvm_read.py Outdated
}
for line in file:
    if line.startswith('Separator'):
        separator = line.strip()[10:]

This could cause an indexing error if a writer decides to write a header such as:
Separator;
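
For reference, slicing past the end of a string in Python returns an empty string rather than raising, so with the snippet under review such a header would more likely produce an empty name that then misses the lookup. A small check, where the separators table is an assumed stand-in for the one in the diff:

```python
# Assumed shape of the lookup table from the diff (names mapped to characters).
separators = {"Tab": "\t", "Comma": ","}


def parse_separator_name(line):
    """Mirror the reviewed expression: everything after the first 10 characters."""
    return line.strip()[10:]


name_ok = parse_separator_name("Separator\tTab\n")
name_short = parse_separator_name("Separator;\n")
```

Here name_short is an empty string, which is not a key of separators, so the caller still needs a fallback for this case.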

@jankoslavic
Contributor Author

Thank you for the suggestions. I made a few changes to address the issues opened. Any further ideas?

@Professor-0

Changes look good. _get_separator is not called from read_str. Also, a check for the end-of-header symbol could be added to _get_separator to avoid reading the entire file if the Separator header is missing.
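
One possible shape for that check, as a sketch only: stop scanning at the end-of-header marker. The marker string and the function name find_separator_line are assumptions for illustration, not taken from the PR.

```python
# Assumed end-of-header marker for LVM files.
END_OF_HEADER = "***End_of_Header***"


def find_separator_line(lines):
    """Return the first 'Separator' line, or None once the header ends."""
    for line in lines:
        if line.startswith("Separator"):
            return line
        if line.startswith(END_OF_HEADER):
            break  # header is over; do not scan the data block
    return None
```

When the Separator line is missing, the scan ends at the marker and the data rows are never read.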

@jankoslavic
Contributor Author

jankoslavic commented May 15, 2024

_get_separator should be called only from _read_lvm_base, and only on the file handle.
If the separator is not found in the first 20 lines, it continues with the default tab separator.
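
The behaviour described here could look roughly like the following. This is a sketch under assumptions, not the merged code: the name get_separator_sketch, the constants, and the name table are all illustrative.

```python
import io
from itertools import islice

DEFAULT_SEPARATOR = "\t"  # tab is the fallback described in the comment
MAX_HEADER_LINES = 20     # only the first 20 lines are inspected


def get_separator_sketch(file, names=None):
    """Scan the first lines of a seekable text file for the separator."""
    names = names or {"Tab": "\t", "Comma": ","}  # assumed name table
    separator = DEFAULT_SEPARATOR
    for line in islice(file, MAX_HEADER_LINES):
        if line.startswith("Separator"):
            # Unknown or empty names fall back to the default as well.
            separator = names.get(line.strip()[10:], DEFAULT_SEPARATOR)
            break
    file.seek(0)  # rewind so the caller can parse from the start
    return separator
```

Because it ends with file.seek(0), this variant only suits file handles, which matches the restriction to _read_lvm_base stated above.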

@jankoslavic jankoslavic merged commit 5793c74 into master May 16, 2024
4 checks passed