Enhance parsing by headers #32

Closed
a1tavista opened this issue Jun 22, 2021 · 3 comments

Comments

@a1tavista

Hello there! Thank you for the gem, you did a really good job!

I'd like to suggest a feature that I could implement as a contributor, if we decide to go ahead with it. Here is the idea.

In a gem called roo (you surely know about it), there's a very useful feature that lets you pass a hash mapping keys to header patterns. My team uses roo to parse spreadsheets with headers in Russian, and then we use the sheet contents to create ActiveRecord (AR) entities, for example.

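In roo it looks like this:

```ruby
require "roo"

# Map each desired key to a pattern that matches the header cell.
SET_OF_HEADERS = {
  name: /Название организации|Название/i,
  inn: /ИНН/i,
  kpp: /КПП/i
}.freeze

xlsx = Roo::Excelx.new(filepath) # filepath: path to the .xlsx file
raw_data = xlsx.sheet(0).parse(SET_OF_HEADERS)

raw_data.first.keys # => [:name, :inn, :kpp]
```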

That lets you define the keys of your data items up front, so there is no need to transform the keys of every hash before passing the data on to the next method. The same feature also detects the offset of the header row automatically, skipping any blank space or extra rows above the first significant row of your data, because sometimes the documents we parse look like this:

[image: screenshot of the spreadsheet template]

As you can see, there are two rows that shouldn't be present in the parsed data: they are just instructions for whoever works with this template on how to fill in the rows.
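
To make the requested behavior concrete, here is a minimal pure-Ruby sketch of it, operating on plain arrays of cells rather than on a real workbook. The helper name `parse_by_headers`, the constant `HEADER_PATTERNS`, and the sample rows are all made up for illustration; this is neither roo's nor xsv's implementation, only an assumption about what "find the header row by pattern, then key the rows below it" could look like.

```ruby
# Hypothetical helper, not part of roo or xsv: locate the header row by
# pattern and turn the rows below it into hashes with the requested keys.
def parse_by_headers(rows, header_patterns)
  rows.each_with_index do |row, index|
    # For each key, find the column whose cell matches its pattern.
    columns = header_patterns.transform_values do |pattern|
      row.index { |cell| cell.to_s.match?(pattern) }
    end

    # This is not the header row unless every pattern found a column.
    next if columns.value?(nil)

    # Everything below the header row is treated as data.
    data_rows = rows.drop(index + 1)
    return data_rows.map do |data_row|
      columns.transform_values { |column| data_row[column] }
    end
  end

  raise ArgumentError, "no row matches all header patterns"
end

HEADER_PATTERNS = {
  name: /Название организации|Название/i,
  inn: /ИНН/i,
  kpp: /КПП/i
}.freeze

# Made-up sample sheet: two instruction rows, then headers, then data.
rows = [
  ["Template: organizations", nil, nil],
  ["Fill in one organization per row", nil, nil],
  ["Название", "ИНН", "КПП"],
  ["Acme", "7700000001", "770001001"],
  ["Globex", "7700000002", "770001002"]
]

organizations = parse_by_headers(rows, HEADER_PATTERNS)
organizations.first.keys # => [:name, :inn, :kpp]
```

The instruction rows are skipped because none of them matches all three patterns, so the caller never has to hard-code a row offset.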

So if this is interesting to you, I could contribute some time to implement that feature in xsv too.

Best regards.

@martijn
Owner

martijn commented Jun 22, 2021

Thanks for the feedback and suggestion.

I would gladly merge something like this. It seems to me that a header_translations parameter on the parse_headers! method would be the way to go. Please submit some tests with your code so we can ensure the feature does not break in the future.
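
As a rough illustration of what such a parameter might do, the sketch below translates an already-located header row into keys. Only the name `header_translations` comes from this thread; its shape (a hash of key to pattern), the helper `translate_headers`, and the sample values are assumptions, and the code does not call xsv at all.

```ruby
# Hypothetical sketch, not xsv's actual API: replace each header cell with
# the key of the first translation whose pattern matches it. Headers with
# no matching translation are kept unchanged.
def translate_headers(headers, header_translations)
  headers.map do |header|
    # `===` lets a translation be either a Regexp or an exact String.
    match = header_translations.find { |_key, pattern| pattern === header.to_s }
    match ? match.first : header
  end
end

header_translations = {
  name: /Название организации|Название/i,
  inn: /ИНН/i,
  kpp: /КПП/i
}

translated = translate_headers(
  ["Название организации", "ИНН", "КПП", "Comment"],
  header_translations
)

# Pair the translated headers with one made-up data row.
record = translated.zip(["Acme", "7700000001", "770001001", "n/a"]).to_h
record.keys # => [:name, :inn, :kpp, "Comment"]
```

Leaving unmatched headers as they are is one possible design choice; raising an error or dropping those columns would be equally reasonable, and the accompanying tests would need to pin down whichever behavior is chosen.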

@a1tavista
Author

Thank you for your answer. I'm going to implement this functionality in the near future 🙂

@martijn
Owner

martijn commented Jun 25, 2021

Looking forward to it!

@martijn closed this as completed Jan 4, 2022