Reading data from Azure Blob occasionally causes InternalExeption #1026

Closed
grizzlybearg opened this issue Nov 1, 2023 · 8 comments · Fixed by #1487
Labels
Bug fixing drive Q1 2024 bug Something isn't working

Comments

@grizzlybearg

Describe the bug

From time to time, reading a dataframe from Azure fails with an InternalException.

Below is the error:

`File /usr/local/python/3.11.6/lib/python3.11/site-packages/arcticdb/version_store/library.py:986, in Library.read(self, symbol, as_of, date_range, columns, query_builder)
921 def read(
922 self,
923 symbol: str,
(...)
927 query_builder: Optional[QueryBuilder] = None,
928 ) -> VersionedItem:
929 """
930 Read data for the named symbol. Returns a VersionedItem object with a data and metadata element (as passed into
931 write).
(...)
984 2 7
985 """
--> 986 return self._nvs.read(
987 symbol=symbol, as_of=as_of, date_range=date_range, columns=columns, query_builder=query_builder
988 )

File /usr/local/python/3.11.6/lib/python3.11/site-packages/arcticdb/version_store/_store.py:1619, in NativeVersionStore.read(self, symbol, as_of, date_range, row_range, columns, query_builder, **kwargs)
1615 query_builder = q.date_range(date_range).then(query_builder)
1616 version_query, read_options, read_query = self._get_queries(
1617 as_of, date_range, row_range, columns, query_builder, **kwargs
1618 )
-> 1619 read_result = self._read_dataframe(symbol, version_query, read_query, read_options)
1620 return self._post_process_dataframe(read_result, read_query, query_builder)

File /usr/local/python/3.11.6/lib/python3.11/site-packages/arcticdb/version_store/_store.py:1686, in NativeVersionStore._read_dataframe(self, symbol, version_query, read_query, read_options)
1685 def _read_dataframe(self, symbol, version_query, read_query, read_options):
-> 1686 return ReadResult(*self.version_store.read_dataframe_version(symbol, version_query, read_query, read_options))

InternalException: Azure::Core::OperationCancelledException(Request was cancelled by context.)`

Steps/Code to Reproduce

from arcticdb import Arctic

ac = Arctic("azure://.......")

ticker_lib = ac["symbol"]
# `symbol` is the name of a symbol already stored in the library
ticker_lib.read(symbol).data

Expected Results

No exception

OS, Python Version and ArcticDB Version

Python 3.11
ArcticDB 4.0.1

Backend storage used

Azure

Additional Context

No response

@grizzlybearg grizzlybearg added the bug Something isn't working label Nov 1, 2023
@phoebusm phoebusm self-assigned this Nov 2, 2023
@vasil-pashov
Collaborator

Hi @grizzlybearg thank you for reporting this. We haven't experienced such an exception until now. We are putting this in our backlog in order to prioritize it. We will keep you posted on any updates.

@grizzlybearg
Author

Thanks in advance

@poodlewars
Collaborator

Hi @grizzlybearg, which OS is this on? Are you using a PyPI wheel or a Conda build?

@grizzlybearg
Author

I was using the PyPI wheel. The exception was often raised when the data was very large.

@grizzlybearg
Author

grizzlybearg commented Mar 3, 2024

Hey @poodlewars, any update on this? The error still persists despite the new releases since I reported it. It seems to affect dataframes with a large number of columns: my dataframes with 10 columns load without problems, but reads fail on dataframes with more than 100 columns.

@poodlewars
Collaborator

Thanks @grizzlybearg that's useful info. Do you have any more information about the shape of the data? Number of rows, column types? Are you using the dynamic schema library option? https://docs.arcticdb.io/latest/api/arctic/#arcticdb.LibraryOptions

@alexowens90 alexowens90 modified the milestone: Bug fixing drive Q1 2024 Mar 14, 2024
@phoebusm
Collaborator

Hi @grizzlybearg, Azure::Core::OperationCancelledException(Request was cancelled by context.) is an exception thrown by the Azure C++ SDK, and it suggests that the connection with the server timed out. This is likely because a large segment is being uploaded or downloaded: if a task takes longer to complete than the timeout setting, the exception is thrown.
The timeout for Azure storage in ArcticDB was incorrectly set to 60000ms. In #1487, it will be set to 200000ms to align with the S3 setting.
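
To get a feel for what the two timeout values mean in practice, here is a rough back-of-the-envelope sketch. It assumes the timeout applies to the transfer of a single segment, and the segment shape (100,000 rows by 127 float64 columns, uncompressed) is a hypothetical example rather than a measurement of any real library.

```python
# Illustrative arithmetic only: the per-segment reading of the timeout and the
# example segment shape are assumptions, not measured ArcticDB behaviour.

OLD_TIMEOUT_MS = 60_000    # Azure timeout before #1487 (value quoted above)
NEW_TIMEOUT_MS = 200_000   # Azure timeout after #1487 (value quoted above)


def min_throughput_mb_per_s(segment_bytes: int, timeout_ms: int) -> float:
    """Lowest sustained transfer rate (MB/s) that finishes within the timeout."""
    return (segment_bytes / 1_000_000) / (timeout_ms / 1000)


# Hypothetical segment: 100,000 rows x 127 float64 columns, uncompressed.
segment_bytes = 100_000 * 127 * 8

old_rate = min_throughput_mb_per_s(segment_bytes, OLD_TIMEOUT_MS)
new_rate = min_throughput_mb_per_s(segment_bytes, NEW_TIMEOUT_MS)
print(f"60 s timeout needs at least {old_rate:.2f} MB/s")
print(f"200 s timeout needs at least {new_rate:.2f} MB/s")
```

Under these assumptions, a slow or contended link is much more likely to hit the shorter timeout on a wide, uncompressed segment, which would be consistent with the failures showing up only on larger dataframes.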

Also, please consider changing the library settings to:

  1. Static schema
  2. Smaller columns_per_segment and rows_per_segment
  3. For pickled data, make the pickled objects smaller

See https://docs.arcticdb.io/latest/api/arctic/#arcticdb.LibraryOptions
These changes help split the data into smaller segments, so each segment takes less time to upload or download.
The original 60000ms setting is already quite lenient.
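
As a minimal sketch of the settings above, the snippet below creates a new library with static schema and smaller segments through `LibraryOptions`. The helper names `segment_grid` and `create_small_segment_library`, and the values 10,000 rows and 50 columns per segment, are hypothetical choices for illustration, not recommendations. The grid calculation is a simplified model of how a static-schema dataframe is sliced and ignores details such as index columns.

```python
import math


def segment_grid(n_rows, n_cols, rows_per_segment, columns_per_segment):
    """Approximate (row_slices, column_slices) for a static-schema dataframe.

    Simplified model: the dataframe is cut into blocks of at most
    rows_per_segment x columns_per_segment cells.
    """
    return (
        math.ceil(n_rows / rows_per_segment),
        math.ceil(n_cols / columns_per_segment),
    )


def create_small_segment_library(ac, name, rows_per_segment=10_000, columns_per_segment=50):
    """Create a library whose segments are smaller than the defaults.

    `ac` is an existing arcticdb.Arctic instance. The segment sizes are
    hypothetical example values; tune them for your own data.
    """
    # Imported lazily so the arithmetic above runs without arcticdb installed.
    from arcticdb import LibraryOptions

    options = LibraryOptions(
        dynamic_schema=False,  # static schema
        rows_per_segment=rows_per_segment,
        columns_per_segment=columns_per_segment,
    )
    return ac.create_library(name, library_options=options)


# A 1,000,000 x 300 dataframe under the documented defaults (100,000 rows,
# 127 columns per segment) versus the smaller hypothetical settings.
default_grid = segment_grid(1_000_000, 300, 100_000, 127)
small_grid = segment_grid(1_000_000, 300, 10_000, 50)
print(default_grid, small_grid)
```

Smaller segments mean more requests per read, so there is a trade-off between the transfer time of each request and the total number of requests.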

@grizzlybearg
Author

@phoebusm Thanks for the update
