Infer metadata from the data *.csv file #173

Bachibouzouk · 2024-05-27T12:56:35Z

Description

The function infer_metadata_from_data can be run on a datapackage for which the user has defined only data/elements and data/sequences. It automatically generates the datapackage.json file containing the metadata.
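To illustrate what inferring metadata from the data files means, here is a minimal standalone sketch. It is not the oemof.tabular implementation: the helper name sketch_infer_metadata, the ";" delimiter, and the shape of the returned descriptor are assumptions made for illustration only.

```python
import csv
import os
import tempfile


def sketch_infer_metadata(path, package_name="default-name"):
    """Build a minimal datapackage-like descriptor from CSV headers.

    Hypothetical helper for illustration; not part of oemof.tabular.
    """
    resources = []
    for folder in ("elements", "sequences"):
        folder_path = os.path.join(path, "data", folder)
        if not os.path.isdir(folder_path):
            continue
        for filename in sorted(os.listdir(folder_path)):
            if not filename.endswith(".csv"):
                continue
            with open(os.path.join(folder_path, filename), newline="") as f:
                # Assumption: the CSV files use ";" as delimiter.
                header = next(csv.reader(f, delimiter=";"), [])
            resources.append(
                {
                    "name": os.path.splitext(filename)[0],
                    "path": f"data/{folder}/{filename}",
                    "schema": {"fields": [{"name": c} for c in header]},
                }
            )
    return {"name": package_name, "resources": resources}


# Demo on a throwaway datapackage containing a single elements file.
with tempfile.TemporaryDirectory() as tmp:
    elements_dir = os.path.join(tmp, "data", "elements")
    os.makedirs(elements_dir)
    with open(os.path.join(elements_dir, "bus.csv"), "w", newline="") as f:
        f.write("name;type;balanced\nbus-electricity;bus;true\n")
    descriptor = sketch_infer_metadata(tmp, package_name="demo")

print(descriptor["resources"][0]["name"])  # bus
```

The sketch only reads the header row of each file; the real function additionally infers field types and foreign keys, which this example does not attempt.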

Type of change

Please tick or delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

Please tick or delete options that are not relevant.

New and adjusted code is formatted using the pre-commit hooks
--> I couldn't get the pre-commit local tagged object to run: it told me isort is an unknown command, even after I pip-installed it.
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
existing unit tests pass locally with my changes
I have added new features/fixes to the CHANGELOG
I have added my name to AUTHORS

Each key maps to a list of descriptors, each of which contains a "fields" attribute. We map the values under this "fields" attribute to the keys of FOREIGN_KEY_DESCRIPTORS.
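The mapping described above amounts to inverting the dictionary. A minimal sketch, assuming a simplified stand-in for FOREIGN_KEY_DESCRIPTORS (the entries below and the helper name map_fields_to_descriptor_keys are hypothetical; the real constant may contain different keys and references):

```python
# Hypothetical stand-in for FOREIGN_KEY_DESCRIPTORS; the real contents may differ.
FOREIGN_KEY_DESCRIPTORS = {
    "bus": [
        {"fields": "bus", "reference": {"resource": "bus", "fields": "name"}},
    ],
    "profile": [
        {"fields": "profile", "reference": {"resource": "load_profile"}},
    ],
    "from_to_bus": [
        {"fields": "from_bus", "reference": {"resource": "bus", "fields": "name"}},
        {"fields": "to_bus", "reference": {"resource": "bus", "fields": "name"}},
    ],
}


def map_fields_to_descriptor_keys(descriptors):
    """Return a dict mapping each "fields" value to its descriptor key."""
    field_to_key = {}
    for key, descriptor_list in descriptors.items():
        for descriptor in descriptor_list:
            field_to_key[descriptor["fields"]] = key
    return field_to_key


FIELD_TO_KEY = map_fields_to_descriptor_keys(FOREIGN_KEY_DESCRIPTORS)
print(FIELD_TO_KEY["to_bus"])  # from_to_bus
```

With such a lookup, a column name found in an elements file can be translated directly into the foreign-key group it belongs to.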

Bachibouzouk · 2024-05-27T12:58:21Z

src/oemof/tabular/datapackage/building.py

+            p.add_resource(r.descriptor)
+
+
+def infer_metadata_from_data(


The idea was to make this feature optional, so I did not want to modify the infer_metadata function.

Bachibouzouk · 2024-05-27T13:00:52Z

src/oemof/tabular/datapackage/building.py

+        if "/elements/" in r.descriptor["path"]:
+            infer_resource_basic_foreign_keys(r)
+    # this function saves the metadata of the package in json format
+    infer_metadata(


Because infer_metadata already does part of the job when it is given a foreign_keys dict, I reused it and only wrote the function infer_resource_basic_foreign_keys to fill that dict.
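As a rough illustration of filling such a dict, here is a sketch under stated assumptions: the foreign_keys dict is assumed to map a foreign-key group to the list of resource names that use it, and sketch_fill_foreign_keys, resource_fields, and field_to_key are hypothetical names, not the code in this PR.

```python
def sketch_fill_foreign_keys(resource_fields, field_to_key):
    """Collect, per foreign-key group, the resources that contain a matching field.

    resource_fields: {resource name: [column names]}
    field_to_key:    {column name: foreign-key group}
    """
    foreign_keys = {}
    for resource_name, fields in resource_fields.items():
        for field in fields:
            key = field_to_key.get(field)
            if key is None:
                continue  # plain column, not a foreign key
            names = foreign_keys.setdefault(key, [])
            if resource_name not in names:
                names.append(resource_name)
    return foreign_keys


# Hypothetical inputs for illustration.
field_to_key = {
    "bus": "bus",
    "profile": "profile",
    "from_bus": "from_to_bus",
    "to_bus": "from_to_bus",
}
resource_fields = {
    "bus": ["name", "balanced"],
    "load": ["name", "bus", "profile"],
    "link": ["name", "from_bus", "to_bus"],
}

foreign_keys = sketch_fill_foreign_keys(resource_fields, field_to_key)
print(foreign_keys)
# {'bus': ['load'], 'profile': ['load'], 'from_to_bus': ['link']}
```

The resulting dict is the kind of argument that would then be handed to infer_metadata, which keeps the existing function unchanged.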

henhuy

Nice feature! Thanks for implementing.
I tested it with the existing investment datapackage from the examples: the files were almost identical (only the empty line at the end of the file was missing 😛).
It would be nice to have such a test in the tests folder too (i.e. build the metadata for an existing datapackage in examples and check that the files are identical).
I also found three smaller issues.

henhuy · 2024-06-03T14:42:19Z

src/oemof/tabular/datapackage/building.py

+
+def infer_metadata_from_data(
+    package_name="default-name",
+    path=None,


The default value None for path throws an error and makes no sense IMO.
I would make it mandatory and the first argument.

Fixed in 0c4279b

henhuy · 2024-06-03T14:42:33Z

src/oemof/tabular/datapackage/building.py

+    metadata_filename="datapackage.json",
+):
+    """
+


A small docstring would be good.

Fixed in 0c4279b

henhuy · 2024-06-03T14:44:00Z

src/oemof/tabular/datapackage/building.py

+        # write an error message here
+        pass


An error should be thrown here.

Fixed in b8b4e64
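For context, the commit title for this fix mentions non-unique sequence labels. A minimal sketch of that kind of check, assuming the sequence columns are available as a plain dict (sketch_map_sequence_labels and its input format are hypothetical and not the code from b8b4e64):

```python
def sketch_map_sequence_labels(sequence_columns):
    """Map each sequence column label to the resource that defines it.

    sequence_columns: {resource name: [column labels]}
    Raises ValueError if a label appears in more than one place.
    """
    label_to_resource = {}
    duplicated_labels = []
    for resource_name, labels in sequence_columns.items():
        for label in labels:
            if label in label_to_resource:
                duplicated_labels.append(label)
            else:
                label_to_resource[label] = resource_name
    if duplicated_labels:
        raise ValueError(
            "The following sequence labels are not unique: "
            + ", ".join(sorted(set(duplicated_labels)))
        )
    return label_to_resource


# Hypothetical sequence resources for illustration.
label_to_resource = sketch_map_sequence_labels(
    {"load_profile": ["electricity-load"], "volatile_profile": ["wind", "pv"]}
)
print(label_to_resource["wind"])  # volatile_profile
```

Raising instead of silently passing makes an ambiguous profile name visible to the user before any foreign key is written to the metadata.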

Bachibouzouk · 2024-06-03T18:33:14Z

Nice feature! Thanks for implementing. I tested it with the existing investment datapackage from the examples: the files were almost identical (only the empty line at the end of the file was missing 😛). It would be nice to have such a test in the tests folder too (i.e. build the metadata for an existing datapackage in examples and check that the files are identical). I also found three smaller issues.

Thanks for your review and your comments :)

I will implement your suggestions and run those on the example datapackages :)

Bachibouzouk · 2024-06-03T18:34:25Z

src/oemof/tabular/datapackage/building.py

+    sequence_labels = []
+    duplicated_labels = []
+    for r in p.resources:
+        if "/sequences/" in r.descriptor["path"]:


Here one has to use os.sep instead of "/"

Bachibouzouk · 2024-06-03T18:34:47Z

src/oemof/tabular/datapackage/building.py

+    sequences_profiles_to_resource = map_sequence_profiles_to_resource_name(p)
+
+    for r in p.resources:
+        if "/elements/" in r.descriptor["path"]:


Here one has to use os.sep instead of "/"

Fixed in 37c5ad5
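The separator issue can be sketched as follows. This is an illustration only, assuming the resource path was built with the platform separator; resource_is_in is a hypothetical helper, not the code from 37c5ad5.

```python
import os


def resource_is_in(path, folder):
    """Return True if `folder` appears as a directory component inside `path`.

    Uses os.sep so the check matches paths built with os.path.join on any platform.
    """
    return f"{os.sep}{folder}{os.sep}" in path


# Paths built the same way on every platform.
elements_path = os.path.join("data", "elements", "bus.csv")
sequences_path = os.path.join("data", "sequences", "load_profile.csv")

print(resource_is_in(elements_path, "elements"))   # True
print(resource_is_in(sequences_path, "elements"))  # False
```

A hard-coded "/elements/" would fail to match on Windows, where os.path.join produces backslashes.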

Bachibouzouk · 2024-06-03T22:01:47Z

It would be nice to have such a test in the tests folder too (i.e. build the metadata for an existing datapackage in examples and check that the files are identical).

I ran the new function on the example files; for some of them only the order of the resources within the datapackage changed. One difference helped me find a bug, which I fixed in c7ab4bf.

Those were generated by infer_metadata_from_data and had no differences except the package name

Those were generated by infer_metadata_from_data and had only a different order of resources
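A comparison that tolerates exactly these two differences (package name and resource order) could look like the sketch below. It is not a test from this PR; same_metadata and the sample descriptors are hypothetical.

```python
def normalized(descriptor, ignore_keys=("name",)):
    """Drop ignored top-level keys and sort resources by name."""
    result = {k: v for k, v in descriptor.items() if k not in ignore_keys}
    result["resources"] = sorted(
        descriptor.get("resources", []), key=lambda r: r["name"]
    )
    return result


def same_metadata(a, b):
    """Compare two descriptors, ignoring package name and resource order."""
    return normalized(a) == normalized(b)


# Hypothetical descriptors, e.g. loaded with json.load from two datapackage.json files.
existing = {
    "name": "investment",
    "resources": [{"name": "bus"}, {"name": "load"}],
}
generated = {
    "name": "default-name",
    "resources": [{"name": "load"}, {"name": "bus"}],
}

print(same_metadata(existing, generated))  # True
```

In a test, the two dicts would come from the committed example file and from a freshly generated one.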

henhuy

LGTM! Thx for the changes

Bachibouzouk added 3 commits May 24, 2024 17:00

Map fields to the keys of FOREIGN_KEY_DESCRIPTORS

2065240

Each key map to a list of descriptors which themselves contain a "fields" attribute. We map the values under this "fields" attribute to the keys of FOREIGN_KEY_DESCRIPTORS

Add function to infer datapackage foreign_keys

eb9a8b9

The function infer_metadata_from_data can be run on a datapackage where only the data/elements and data/sequences have been defined by a user and will generate automatically the datapackage.json file containing the metadata.

Update documentation

5a89536

Bachibouzouk commented May 27, 2024

Bachibouzouk added 4 commits May 28, 2024 15:11

Update oemof.tabular version

c6c5dc8

Fix flake8

e420923

Update AUTHORS.rst

6dfb247

Update CHANGELOG.rst

a7ee863

Bachibouzouk marked this pull request as ready for review May 28, 2024 20:21

Bachibouzouk requested review from FelixMau and henhuy May 28, 2024 20:21

henhuy requested changes Jun 3, 2024

Bachibouzouk commented Jun 3, 2024

Bachibouzouk added 4 commits June 3, 2024 23:45

Make path argument mandatory

0c4279b

Fix path separator bug

37c5ad5

Raise error message in case of non unique sequences labels

b8b4e64

Lint with black

1f81196

Bachibouzouk added 4 commits June 4, 2024 00:04

Fix flake error

ef6ed5f

Fix foreign key parsing bug

c7ab4bf

Update example datapackages

6343749

Those were generated by infer_metadata_from_data and had no differences except the package name

Update example datapackages

bb3fef5

Those were generated by infer_metadata_from_data and had only a different order of resources

Bachibouzouk requested a review from henhuy June 3, 2024 22:34

Bachibouzouk added 2 commits June 4, 2024 00:58

Sort resources alphabetically

bbe05fa

Fix failing test

ed91085

henhuy approved these changes Jun 4, 2024

Bachibouzouk merged commit 782118e into dev Jun 4, 2024
2 checks passed

Bachibouzouk deleted the feature/infer_metadata branch June 4, 2024 06:56
