Segmentation fault error calling library.read() using LMDB backend #181
Hi @Naich, well that's weird! Do you know whether it is segfaulting on import, or is it definitely on read? If it is on import, would you be able to try the wheel located here: https://github.com/man-group/ArcticDB/suites/11870429345/artifacts/620829633 (you'll have to be signed into GitHub to access that file)? It is the same code, but built slightly differently, which might help with your environment.
Hi mehertz, I ran some additional tests and it appears that the issue I've been experiencing may be related to parallel writes to the same database. Here is the code I used for my tests:

Method 1: Sequential Write

```python
from arcticdb import Arctic
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm

ac_store = Arctic("lmdb://./testdb")
ac_store.create_library("test_db")
lib = ac_store.get_library('test_db')

date_idx = pd.Timestamp('2021-1-1') + pd.timedelta_range(start='1 days', end='720 days', periods=1_000_000)
df = pd.DataFrame(np.random.random(size=(len(date_idx))), index=date_idx, columns=['a'])

for _i in tqdm(range(1000)):
    lib.write(f"TEST_{_i}", df)
```

This method worked fine for read/write operations.

Method 2: Parallel Write on Different Symbols

```python
from arcticdb import Arctic
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
from joblib import Parallel, delayed

# this function ensures there are no concurrent writes to a single symbol
def write_ticker_db(db_id):
    _ac_store = Arctic("lmdb://./testdb")
    _lib = _ac_store['test_db']
    date_idx = pd.Timestamp('2021-1-1') + pd.timedelta_range(start='1 days', end='720 days', periods=1_000_000)
    df = pd.DataFrame(np.random.random(size=(len(date_idx))), index=date_idx, columns=['a'])
    _lib.write(f'TEST_{db_id}', df)
    return 0

rst = Parallel(n_jobs=40, backend='multiprocessing')(delayed(write_ticker_db)(db_id) for db_id in tqdm(range(1000)))
```

When I wrote data using Method 2, I found that when performing heavy reads, the program would occasionally throw a segmentation fault. So does ArcticDB support concurrent writing on different symbols? The documentation says that I need to set "staged=True" if I want to do concurrent writes on the same symbol, so it seems that concurrent writes on different symbols should be supported. However, maybe I am not understanding it correctly, and I would appreciate your suggestions.

PS: I have already tried the wheel that you suggested on a clean environment, but I still get the same results.
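For reference, this is my reading of the staged-write pattern the documentation describes for concurrent writes to the *same* symbol. It is only a minimal sketch of my understanding, not something I have verified here; in particular the `finalize_staged_data` call and the requirement that staged chunks be non-overlapping are my assumptions from the docs:

```python
from arcticdb import Arctic
import pandas as pd
import numpy as np

ac_store = Arctic("lmdb://./testdb")
lib = ac_store.get_library('test_db')

base_idx = pd.date_range('2021-01-01', periods=10, freq='D')

# Each writer stages its own non-overlapping, already-sorted chunk of the same
# symbol; in the real case each of these writes would come from a different process.
for i in range(3):
    chunk = pd.DataFrame(np.random.random(size=(len(base_idx), 1)),
                         index=base_idx + pd.Timedelta(days=10 * i),
                         columns=['a'])
    lib.write("TEST_STAGED", chunk, staged=True)

# A single finalize step merges the staged chunks into one readable version.
lib.finalize_staged_data("TEST_STAGED")
print(lib.read("TEST_STAGED").data.shape)
```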
UPDATE: When I tried using larger, actual stock data, even the sequential write did not avoid the problem. Head of traceback #1:

Head of traceback #2:
I have the same problem with sequential reading (LMDB backend). Limiting ArcticDB to a single thread works around it for me:

```python
from arcticdb_ext import set_config_int
set_config_int('VersionStore.NumCPUThreads', 1)
```

or

```python
from arcticdb_ext import set_config_int
set_config_int('VersionStore.NumIOThreads', 1)
```
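A small usage sketch of how this could be combined with a read loop. My assumption (not confirmed anywhere in this thread) is that the setting should be applied before the Arctic instance and library are first touched in the process; the symbol name just reuses the earlier write example:

```python
from arcticdb_ext import set_config_int
from arcticdb import Arctic

# Assumption: apply the limit before the library is first used in this process.
set_config_int('VersionStore.NumCPUThreads', 1)
set_config_int('VersionStore.NumIOThreads', 1)

ac_store = Arctic("lmdb://./testdb")
lib = ac_store.get_library('test_db')
df = lib.read("TEST_0").data  # symbol written in the earlier example
```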
Thanks! It worked, so it seems to be a bug related to multithreaded reads. However, the read throughput (in my case) dropped by about 40% when I apply this thread limit. My dirty workaround for now is to run each read in a separate process with a timeout, and to rebuild the process pool and retry whenever a read fails or hangs:

```python
from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor(max_workers=1)

def run_with_timeout(fn, timeout=0.1):
    global executor
    future = executor.submit(fn)
    try:
        result = future.result(timeout=timeout)
    except Exception as e:
        print('Error: {}; retrying'.format(e))
        future.cancel()
        executor.shutdown()
        executor = ProcessPoolExecutor(max_workers=1)
        result = run_with_timeout(fn, timeout)
    return result

for i in range(4000):
    result = run_with_timeout(random_lib_read, timeout=0.1)  # random_lib_read: my read function (definition not shown)
```
Could you give an example of the function you are passing to run_with_timeout please, and how you call it? I am trying to follow your dirty workaround, with a simple `return arcdb['eom_fut_cont'].read(item_name).data` in my function, but it just times out. If I don't include the timeout, it just hangs.

```python
def read_lib(item_name):
    return arcdb['eom_fut_cont'].read(item_name).data

def run_until_success(fn, item_name, timeout=0.1):
    global executor
    future = executor.submit(fn, item_name)
    try:
        result = future.result(timeout=timeout)
    except Exception as e:
        print('Error: {}; retrying'.format(e))
        future.cancel()
        executor.shutdown()
        executor = ProcessPoolExecutor(max_workers=1)
        result = run_until_success(fn, item_name, timeout)
    return result

test = run_until_success(read_lib, item_name)
```

I hope there can be a proper fix for this issue; it's happening very often when I try to read data, pretty much making ArcticDB unusable for me.
Hi @Naich @huroh, please could you retest with

You might also be interested in the "Threads and Processes" section of http://www.lmdb.tech/doc/starting.html. What you are doing, if I understand it correctly, should be safe, but it's worth noting that you should never open the same LMDB environment more than once from a given process. In our case, creating the Arctic instance
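To illustrate the "one LMDB environment per process" point against the parallel-write test above, here is a minimal sketch of the pattern I understand to be safer: create exactly one Arctic instance inside each worker process and reuse it across tasks, rather than constructing a new one per task. The caching helper is my own illustration, not an ArcticDB API:

```python
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from arcticdb import Arctic

_store = None  # one Arctic instance (one LMDB environment) per worker process

def get_store():
    # Lazily open the store the first time this process needs it, then reuse it.
    # The parent process deliberately never opens the store, so forked workers
    # do not inherit an already-open LMDB environment.
    global _store
    if _store is None:
        _store = Arctic("lmdb://./testdb")
    return _store

def write_ticker_db(db_id):
    lib = get_store()['test_db']
    idx = pd.date_range('2021-01-01', periods=1000, freq='D')
    df = pd.DataFrame(np.random.random(len(idx)), index=idx, columns=['a'])
    lib.write(f'TEST_{db_id}', df)
    return 0

# Each multiprocessing worker is a separate process with its own _store.
rst = Parallel(n_jobs=4, backend='multiprocessing')(delayed(write_ticker_db)(i) for i in range(100))
```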
Working fine for me now, thank you so much @poodlewars @mehertz and team for all your work on this great project.
I am trying to test the performance of the ArcticDB database using the LMDB protocol. However, while testing lib.read(), I occasionally encounter a "Segmentation fault (core dumped)" error, or a "free(): invalid pointer" / "double free or corruption (out): Aborted (core dumped)" error, without any other error messages, causing the program to exit abruptly.
The error seems to be random: I have recorded the symbol and date_range that triggered the error, but upon retrying, I was able to retrieve the data successfully.
Could you please provide guidance on possible reasons for the error and how to debug it? Thanks.
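On the "how to debug it" question, enabling Python's built-in faulthandler module is one low-effort way to get at least a Python-level stack trace when the process is killed by a segfault. This is only a minimal sketch using the standard library (nothing ArcticDB-specific); the symbol names are placeholders from the examples above:

```python
import faulthandler
faulthandler.enable()  # dump the Python stack of every thread on SIGSEGV/SIGABRT

from arcticdb import Arctic

ac_store = Arctic("lmdb://./testdb")
lib = ac_store.get_library('test_db')

for i in range(1000):
    # If a read crashes the process, faulthandler writes the tracebacks to stderr first.
    _ = lib.read(f"TEST_{i}").data
```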
I am using ArcticDB version 1.0.1, Python 3.8.16, and Ubuntu 20.04.1 LTS operating system.
Here is the code I used:
Here is the traceback info:
Attached is my environment info.
environment.txt