
feat: allow fetching other datasets (obf, opff, opf) #223

Merged
merged 2 commits into from
Apr 2, 2024


@raphodn (Member) commented Apr 1, 2024

Description

The current ProductDataset implementation only allows fetching the OFF (Open Food Facts) dataset.

Solution

  • extend ProductDataset to allow fetching the other datasets: OBF (Open Beauty Facts), OPFF (Open Pet Food Facts) and OPF (Open Products Facts)
  • update documentation
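As a rough illustration of what the extension involves, the sketch below resolves a (flavor, dataset type) pair to an export file name. It is not the library's code: the Flavor and DatasetType enums are minimal stand-ins, dataset_file_name is a hypothetical helper, the OBF file names are copied from the diff in this PR, the OFF file names are assumed, and the OPFF/OPF entries are left out.

```python
from enum import Enum


# Minimal stand-ins for the library's Flavor and DatasetType enums
# (illustrative only, not imported from openfoodfacts.types).
class Flavor(str, Enum):
    off = "off"
    obf = "obf"
    opff = "opff"
    opf = "opf"


class DatasetType(str, Enum):
    jsonl = "jsonl"
    csv = "csv"


# Hypothetical flavor -> dataset type -> file name table.
# OBF entries come from the PR diff; OFF entries are assumed;
# OPFF and OPF are deliberately omitted in this sketch.
DATASET_FILE_NAMES = {
    Flavor.off: {
        DatasetType.jsonl: "openfoodfacts-products.jsonl.gz",
        DatasetType.csv: "en.openfoodfacts.org.products.csv.gz",
    },
    Flavor.obf: {
        DatasetType.jsonl: "openbeautyfacts-products.jsonl.gz",
        DatasetType.csv: "en.openbeautyfacts.org.products.csv",
    },
}


def dataset_file_name(flavor, dataset_type) -> str:
    """Return the export file name for a flavor/dataset type pair."""
    # Coerce plain strings ("obf", "csv") to the enums; unknown values raise ValueError.
    flavor = Flavor(flavor)
    dataset_type = DatasetType(dataset_type)
    try:
        return DATASET_FILE_NAMES[flavor][dataset_type]
    except KeyError:
        raise ValueError(f"no known export for {flavor!r} / {dataset_type!r}") from None
```

Keeping the names in a single nested table means adding a flavor is a data change rather than a code change.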

Related issue(s)

@raphodn requested a review from a team as a code owner on April 1, 2024 12:54
},
Flavor.obf: {
DatasetType.jsonl: "openbeautyfacts-products.jsonl.gz",
DatasetType.csv: "en.openbeautyfacts.org.products.csv",
@raphodn (Member Author), Apr 1, 2024

The OBF, OPFF and OPF datasets don't seem to have .csv.gz files (the URLs return HTTP 404), so I used the plain .csv files instead.
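Because some flavors ship a gzip-compressed CSV and others a plain one, a reader of these exports has to handle both. A minimal sketch, assuming the exports are tab-separated UTF-8 text; iter_csv_rows is a hypothetical helper, not part of the library. It picks the opener from the file extension:

```python
import csv
import gzip
from pathlib import Path
from typing import Iterator


def iter_csv_rows(path: str) -> Iterator[dict]:
    """Yield rows from a tab-separated export, gzip-compressed or not."""
    # ".gz" suffix -> decompress on the fly; anything else -> plain text file.
    opener = gzip.open if Path(path).suffix == ".gz" else open
    # Assumption: UTF-8, tab-separated, first line is the header row.
    with opener(path, "rt", encoding="utf-8", newline="") as f:
        yield from csv.DictReader(f, delimiter="\t")
```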

Reply (Contributor):

OBF, OPFF and OPF run an older version of Product Opener, so this is expected. Migrating them to a more recent version is in progress, but it requires quite a lot of work.

@@ -57,7 +73,12 @@ def get_dataset(


 class ProductDataset:
-    def __init__(self, dataset_type: DatasetType = DatasetType.jsonl, **kwargs):
+    def __init__(
@raphodn (Member Author), Apr 1, 2024

Slight breaking change: the documentation gives an explicit usage example, ProductDataset("csv"), which will no longer work.

from openfoodfacts import ProductDataset

# instead
ProductDataset(dataset_type="csv")

# or
from openfoodfacts.types import Flavor
ProductDataset(flavor=Flavor.off, dataset_type="csv")
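To see why the positional call breaks, consider a simplified stand-in that takes flavor as its first parameter, as the keyword examples in this comment suggest. ProductDatasetSketch and its enums are hypothetical; the sketch only coerces its arguments to enums, while the real ProductDataset does more work (such as fetching the dataset file) and may fail in a different way. Here the string "csv" gets bound to flavor and is rejected:

```python
from enum import Enum


# Hypothetical stand-ins, not the library's types.
class Flavor(str, Enum):
    off = "off"
    obf = "obf"
    opff = "opff"
    opf = "opf"


class DatasetType(str, Enum):
    jsonl = "jsonl"
    csv = "csv"


class ProductDatasetSketch:
    """Simplified stand-in mirroring the new argument order (flavor first)."""

    def __init__(
        self,
        flavor: Flavor = Flavor.off,
        dataset_type: DatasetType = DatasetType.jsonl,
        **kwargs,  # kept only to mirror the original signature; unused here
    ):
        # Coerce strings to enums; an unknown value raises ValueError.
        self.flavor = Flavor(flavor)
        self.dataset_type = DatasetType(dataset_type)


# The old positional call now binds "csv" to `flavor`.
error = None
try:
    ProductDatasetSketch("csv")
except ValueError as exc:
    error = str(exc)
print(error)  # 'csv' is not a valid Flavor
```

Passing dataset_type by keyword avoids the problem regardless of parameter order.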

sonarcloud bot commented Apr 1, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

@raphael0202 (Contributor) commented Apr 2, 2024

The thing is, to my knowledge, we don't have a JSONL export for anything other than OFF. For example, https://static.openproductsfacts.org/openproductsfacts-products.jsonl.gz returns HTTP 404.

My bad, it's now functional!

@raphael0202 (Contributor) left a comment

It looks good to me, thank you for the PR!

@raphael0202 merged commit 36a9625 into develop on Apr 2, 2024 (6 of 7 checks passed)
@raphael0202 deleted the product-dataset-flavor branch on April 2, 2024 12:20

Successfully merging this pull request may close these issues.

Extend ProductDataset to other flavors (obf, opf, opff)
2 participants