New inverted grammar, starting with header cells #31

Open
nacnudus opened this issue Dec 10, 2019 · 2 comments

Owner

@nacnudus nacnudus commented Dec 10, 2019

The current unpivotr grammar starts from the point of view of data cells, and searches for associated headers. This imitated databaker, because it is useful in the most common case (in my experience), where:

  1. The header cells surround the data cells.
  2. There are more distinct headers than you care to hardcode into a script.

At long last, there is an example of a consistent schema that breaks (1) and doesn't suffer from (2).

Untidy data

image

Tidy version

image

Thoughts

  1. Locate each type of header by filtering, e.g. character == "Species:". Error if not unique (see step 4 for when whole tables repeat, as in the example).
  2. Describe the domain of the header over related data cells by its direction and limit, e.g. direction = "W" and limit = 1 or limit = Inf. Unlike the existing grammar, the direction is from the point of view of the header cell, rather than the data cells.
  3. Given a set of headers so described, unpivotr would resolve the data cells to the matching headers.
  4. If the whole table repeats, as in the example above, the same technique would apply as now -- identify a corner cell of each table, nest, and unpivot one at a time.
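
A minimal sketch of steps 1 to 3, to make the idea concrete. It is written in plain Python rather than R and is not unpivotr's API: `Cell`, `locate_header`, `in_domain` and `resolve` are hypothetical names, the tiny sheet is made up (it is not the sheet in the images above), and it assumes that `direction` is the compass direction in which the header's domain extends away from the header cell, with `limit` counting cells along that direction.

```python
import math
from typing import NamedTuple


class Cell(NamedTuple):
    row: int
    col: int
    value: object


# Unit (row, col) steps for each compass direction, seen from the header cell.
# Assumption: rows grow downwards (south) and columns grow to the right (east).
STEPS = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}


def locate_header(cells, predicate):
    """Step 1: return the single cell matching `predicate`; error if not unique."""
    matches = [c for c in cells if predicate(c)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one header, found {len(matches)}")
    return matches[0]


def in_domain(header, cell, direction, limit):
    """Step 2: is `cell` within `limit` steps of `header` along `direction`?"""
    d_row, d_col = STEPS[direction]
    off_row, off_col = cell.row - header.row, cell.col - header.col
    # Signed number of steps along the direction; the cell must lie exactly
    # on that line, i.e. have no perpendicular offset.
    steps = off_row * d_row + off_col * d_col
    on_line = (off_row, off_col) == (steps * d_row, steps * d_col)
    return on_line and 1 <= steps <= limit


def resolve(cells, header_specs):
    """Step 3: map each non-header cell position to the matching header labels."""
    headers = [
        (locate_header(cells, predicate), direction, limit)
        for predicate, direction, limit in header_specs
    ]
    header_cells = {header for header, _, _ in headers}
    resolved = {}
    for cell in cells:
        if cell in header_cells:
            continue
        labels = [
            header.value
            for header, direction, limit in headers
            if in_domain(header, cell, direction, limit)
        ]
        if labels:
            resolved[(cell.row, cell.col)] = labels
    return resolved


# A made-up sheet: one label with its value to the east, then two columns.
cells = [
    Cell(1, 1, "Species:"), Cell(1, 2, "setosa"),
    Cell(2, 1, "Length"), Cell(2, 2, "Width"),
    Cell(3, 1, 5.1), Cell(3, 2, 3.5),
    Cell(4, 1, 4.9), Cell(4, 2, 3.0),
]

# Each header: (filter, direction from the header, limit).
header_specs = [
    (lambda c: c.value == "Species:", "E", 1),
    (lambda c: c.value == "Length", "S", math.inf),
    (lambda c: c.value == "Width", "S", math.inf),
]

resolved = resolve(cells, header_specs)
```

Here `resolved` maps the position of `"setosa"` to `["Species:"]` and each number to its column header. Step 4 (repeated tables) is left out: under the same assumptions it would amount to partitioning the cells by corner cell and calling `resolve` once per partition.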

@jl5000 jl5000 commented Dec 10, 2019

Do we know if there are any other datasets with this structure or if it's an evil one-off? I've never seen a structure like this before.

Owner Author

@nacnudus nacnudus commented Dec 11, 2019

That's a reasonable point, although it isn't how nerd-sniping works 😄
