fix: S3_override_endpoint #3795

Closed
wants to merge 9 commits into from

vmallya-123
Contributor

What this PR does / why we need it:

Currently, using an S3 endpoint override URL for file data sources in MinIO causes an AWS error.

To reproduce, define features.py as:

import os
from datetime import timedelta

from feast import FeatureView, Field, FileSource
from feast.data_format import ParquetFormat
from feast.types import String

from entity import user

demo_features_parquet_file_source = FileSource(
    file_format=ParquetFormat(),
    path="s3://feature-data-sets/demographic_features.parquet",
    s3_endpoint_override=os.environ.get("FEAST_S3_ENDPOINT_URL"),
)
demo_features = FeatureView(
    name="demographic",
    entities=[user],
    schema=[
        Field(name="Native_country", dtype=String),
        Field(name="Sex", dtype=String),
        Field(name="Race", dtype=String),
    ],
    ttl=timedelta(days=365),
    source=demo_features_parquet_file_source,
    tags={
        "authors": "Benjamin Tan <benjamin.tan@abc.random.com>, Varun Mallya <varun.mallya@abc.random.com>",
        "description": "User Demographics",
        "used_by": "Income_Calculation_Team",
    },
)

Running feast apply then gives:

Traceback (most recent call last):
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/bin/feast", line 8, in <module>
    sys.exit(cli())
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/feast/cli.py", line 490, in apply_total_command
    apply_total(repo_config, repo, skip_source_validation)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/feast/usage.py", line 288, in wrapper
    return func(*args, **kwargs)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/feast/repo_operations.py", line 358, in apply_total
    apply_total_with_repo_instance(
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/feast/repo_operations.py", line 308, in apply_total_with_repo_instance
    registry_diff, infra_diff, new_infra = store.plan(repo)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/feast/usage.py", line 299, in wrapper
    raise exc.with_traceback(traceback)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/feast/usage.py", line 288, in wrapper
    return func(*args, **kwargs)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/feast/feature_store.py", line 722, in plan
    self._make_inferences(
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/feast/feature_store.py", line 586, in _make_inferences
    update_data_sources_with_inferred_event_timestamp_col(
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/feast/inference.py", line 72, in update_data_sources_with_inferred_event_timestamp_col
    ) in data_source.get_table_column_names_and_types(config):
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/feast/infra/offline_stores/file_source.py", line 169, in get_table_column_names_and_types
    schema = ParquetDataset(path, filesystem=filesystem).schema
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 1663, in __new__
    return _ParquetDatasetV2(
  File "/Users/varunmallya/Documents/personal_projects/chapter-4/myvenv/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 2333, in __init__
    if filesystem.get_file_info(path_or_paths).is_file:
  File "pyarrow/_fs.pyx", line 441, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: When getting information for key 'demographic_features.parquet' in bucket 'feature-data-sets': AWS Error [code 99]: curlCode: 1, Unsupported protocol

To fix this, we need to use the s3fs filesystem and use arrow_schema. After applying these changes, it seems to work.
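A minimal sketch of the idea described above, not the actual diff in this PR: open the object through s3fs with the endpoint override and read only the Arrow schema from the Parquet footer. The helper names (`strip_s3_scheme`, `s3fs_kwargs`, `read_parquet_schema`) are hypothetical and not part of Feast. The sketch assumes the path points at a single Parquet file rather than a directory dataset, and that credentials are picked up from the environment.

```python
from typing import Optional


def strip_s3_scheme(path: str) -> str:
    """Turn 's3://bucket/key' into 'bucket/key'; other paths are returned unchanged."""
    prefix = "s3://"
    return path[len(prefix):] if path.startswith(prefix) else path


def s3fs_kwargs(endpoint_url: Optional[str]) -> dict:
    """Keyword arguments for s3fs.S3FileSystem; empty when no override is set."""
    if not endpoint_url:
        return {}
    # client_kwargs is forwarded by s3fs to the underlying botocore client.
    return {"client_kwargs": {"endpoint_url": endpoint_url}}


def read_parquet_schema(path: str, endpoint_url: Optional[str] = None):
    """Read only the Arrow schema of one Parquet file on S3 or MinIO."""
    # Imported lazily so the pure helpers above work without these packages.
    import pyarrow.parquet as pq
    import s3fs

    fs = s3fs.S3FileSystem(**s3fs_kwargs(endpoint_url))
    with fs.open(strip_s3_scheme(path), "rb") as f:
        # read_schema accepts a file-like object and returns a pyarrow.Schema.
        return pq.read_schema(f)
```

With the reproduction above, this would be called as `read_parquet_schema("s3://feature-data-sets/demographic_features.parquet", os.environ.get("FEAST_S3_ENDPOINT_URL"))`, where the environment variable holds the full MinIO URL, scheme included.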

Which issue(s) this PR fixes:

Fixes #

@vmallya-123 vmallya-123 changed the title fix_s3_override fix: s3_override_endpoint Oct 14, 2023
Signed-off-by: “Varun <varun.mallya@tech.jago.com>
Signed-off-by: “Varun <varun.mallya@tech.jago.com>
Signed-off-by: “Varun <varun.mallya@tech.jago.com>
Collaborator

@etirelli etirelli left a comment

@vmallya-123 Thank you for your PR! There are a number of test failures. Can you please take a look?

@vmallya-123
Contributor Author

Sure thing, will take a look.

@vmallya-123 vmallya-123 changed the title fix: s3_override_endpoint fix: S3_override_endpoint Jan 30, 2024
@vmallya-123
Contributor Author

Hi @etirelli, I am trying to install the s3fs dependency. I have added it to setup.py and also ran lock-python-ci-dependencies; however, it's not being installed by the integration-test actions. What am I missing?

@tokoko
Collaborator

tokoko commented Mar 19, 2024

@vmallya-123 We just dropped 3.8, so some of the test failures might go away now. Can you try fixing the conflicts when you get the chance? By the way, the failures might have been caused by using Python 3.9 to lock dependencies for 3.8 and 3.10 as well. Normally I switch Python versions for that (e.g. use 3.9 for locking 3.9 and 3.10 for locking 3.10). At least from looking at it, that's the only idea I have.

@vmallya-123
Contributor Author

Hey @tokoko, sure, I can try that.

@jeremyary
Collaborator

Closing issue as stale, but please feel free to open a new PR or ping us here should you wish to continue the initiative & we can try and help troubleshoot conflicts/test fails. Thanks!

@jeremyary jeremyary closed this Apr 22, 2024
