handling merge/split of a country #58

Open
semio opened this issue Apr 21, 2017 · 1 comment

semio commented Apr 21, 2017

Problem:

see open-numbers/ddf--gapminder--co2_emission#1

semio added a commit that referenced this issue Apr 24, 2017

semio commented Apr 25, 2017

current design:

merge

- procedure: merge_entity
  ingredients:
      - input_ingredient
  options:
      dictionary: merge.json
      merged: keep    # what to do with the entities to be merged
      target_column: entity_name
  result: output_ingredient

in merge.json:

{
    "new_entity_1": ["old_entity_1", "old_entity_2"],
    "new_entity_2": ["old_entity_3", "old_entity_4"]
}
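
A minimal sketch of what a merge driven by such a dictionary could look like, not the actual `merge_entity` implementation. It assumes datapoints are `(entity, year, value)` tuples and that merging means summing the old entities' values per year; the `"drop"` value for `merged` is hypothetical (only `keep` appears above).

```python
import json
from collections import defaultdict


def merge_entities(rows, dictionary, merged="keep"):
    """Sum the values of old entities into their new entity, per year.

    rows: list of (entity, year, value) tuples (assumed layout).
    dictionary: {new_entity: [old_entity, ...]}, as in merge.json.
    merged: "keep" leaves the old entities in the result;
            "drop" (hypothetical option value) removes them.
    """
    old_to_new = {old: new for new, olds in dictionary.items() for old in olds}
    totals = defaultdict(float)
    result = []
    for entity, year, value in rows:
        if entity in old_to_new:
            totals[(old_to_new[entity], year)] += value
            if merged == "keep":
                result.append((entity, year, value))
        else:
            result.append((entity, year, value))
    # Append one row per (new entity, year) with the summed value.
    result.extend((e, y, v) for (e, y), v in totals.items())
    return sorted(result)


dictionary = json.loads('{"new_entity_1": ["old_entity_1", "old_entity_2"]}')
rows = [
    ("old_entity_1", 1990, 1.0),
    ("old_entity_2", 1990, 2.0),
    ("other", 1990, 5.0),
]
merged_rows = merge_entities(rows, dictionary, merged="drop")
```

Summation suits additive indicators such as total emissions; ratio-type indicators would need a different aggregation.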

split

- procedure: split_entity
  ingredients:
      - input_ingredient
  options:
      dictionary: split.json
      splitted: keep   # what to do with the entities to be split
      target_column: entity_name
  result: input_ingredient

in split.json:

{
    "entity_to_split_1":  ["sub_entity_1", "sub_entity_2"],
    "entity_to_split_2":  ["sub_entity_3", "sub_entity_4"]
}

This assumes that sub_entity_1 through sub_entity_4 already exist in the dataset. The split ratio will be calculated from the first valid values of the sub-entities.
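
One possible reading of the ratio rule, sketched under assumptions rather than taken from the `split_entity` code: each sub-entity's share is its earliest non-missing value divided by the sum of those earliest values across the sub-entities. The `(entity, year, value)` tuple layout and the `"drop"` value for `splitted` are hypothetical.

```python
def first_valid(rows, entity):
    """Return the earliest non-missing value recorded for `entity`."""
    candidates = [(y, v) for e, y, v in rows if e == entity and v is not None]
    if not candidates:
        raise ValueError(f"no valid value for sub-entity {entity!r}")
    return min(candidates)[1]


def split_entity(rows, dictionary, splitted="keep"):
    """Distribute each parent value across its sub-entities.

    dictionary: {entity_to_split: [sub_entity, ...]}, as in split.json.
    splitted: "keep" leaves the parent rows in the result;
              "drop" (hypothetical option value) removes them.
    """
    # Share of each sub-entity, based on first valid values (assumed rule).
    ratios = {}
    for parent, subs in dictionary.items():
        firsts = {s: first_valid(rows, s) for s in subs}
        total = sum(firsts.values())
        ratios[parent] = {s: v / total for s, v in firsts.items()}

    result = []
    for entity, year, value in rows:
        if entity not in ratios:
            result.append((entity, year, value))
            continue
        if splitted == "keep":
            result.append((entity, year, value))
        if value is not None:
            for sub, ratio in ratios[entity].items():
                result.append((sub, year, value * ratio))
    return result


dictionary = {"entity_to_split_1": ["sub_entity_1", "sub_entity_2"]}
rows = [
    ("entity_to_split_1", 1980, 100.0),
    ("sub_entity_1", 1991, 30.0),
    ("sub_entity_2", 1991, 10.0),
    ("sub_entity_1", 1992, 50.0),
]
split_rows = split_entity(rows, dictionary, splitted="drop")
```

Here the first valid values are 30.0 and 10.0, so the 1980 parent value is divided 75/25. If the sub-entities' first valid values fall in different years, the resulting ratio mixes time points, which is worth keeping in mind when choosing this rule.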
