C:\arrow\cpp\src\arrow\filesystem\s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit #35771

Closed
Aricept094 opened this issue May 25, 2023 · 6 comments
Assignees
Labels
bug Something that is supposed to be working; but isn't P2 Important issue, but not time-critical

Comments

@Aricept094
Aricept094 commented May 25, 2023

What happened + What you expected to happen

My trials keep getting terminated and my models score zero, with this error: C:\arrow\cpp\src\arrow\filesystem\s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit.

I would appreciate some help. I use Ray 2.4.0 and conda on Windows.

Versions / Dependencies

# Name                    Version                   Build  Channel
aiobotocore               2.5.0                    pypi_0    pypi
aiohttp-cors              0.7.0                    pypi_0    pypi
aioitertools              0.11.0                   pypi_0    pypi
alembic                   1.11.1                   pypi_0    pypi
ansicon                   1.89.0                   pypi_0    pypi
asttokens                 2.2.1                    pypi_0    pypi
async-generator           1.10                     pypi_0    pypi
attrs                     23.1.0                   pypi_0    pypi
ax-platform               0.3.2                    pypi_0    pypi
backcall                  0.2.0                    pypi_0    pypi
blessed                   1.20.0                   pypi_0    pypi
bokeh                     3.1.1                    pypi_0    pypi
boto3                     1.26.140                 pypi_0    pypi
botocore                  1.29.76                  pypi_0    pypi
botorch                   0.8.5                    pypi_0    pypi
bzip2                     1.0.8                h8ffe710_4    conda-forge
ca-certificates           2023.5.7             h56e8100_0    conda-forge
certifi                   2023.5.7                 pypi_0    pypi
cffi                      1.15.1                   pypi_0    pypi
charset-normalizer        3.1.0                    pypi_0    pypi
click                     8.1.3                    pypi_0    pypi
cloudpickle               2.2.1                    pypi_0    pypi
cmaes                     0.9.1                    pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
colorful                  0.5.5                    pypi_0    pypi
colorlog                  6.7.0                    pypi_0    pypi
comm                      0.1.3                    pypi_0    pypi
contourpy                 1.0.7                    pypi_0    pypi
cuda-version              11.8                 h70ddcb2_2    conda-forge
cudatoolkit               11.8.0              h09e9e62_11    conda-forge
cudnn                     8.8.0.121            h9631440_0    conda-forge
cycler                    0.11.0                   pypi_0    pypi
dask                      2023.5.0                 pypi_0    pypi
debugpy                   1.6.7                    pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
distributed               2023.5.0                 pypi_0    pypi
eli5                      0.13.0                   pypi_0    pypi
et-xmlfile                1.1.0                    pypi_0    pypi
exceptiongroup            1.1.1                    pypi_0    pypi
executing                 1.2.0                    pypi_0    pypi
filelock                  3.12.0                   pypi_0    pypi
fonttools                 4.39.4                   pypi_0    pypi
fsspec                    2023.5.0                 pypi_0    pypi
gast                      0.4.0                    pypi_0    pypi
google-api-core           2.11.0                   pypi_0    pypi
google-auth               2.18.1                   pypi_0    pypi
google-auth-oauthlib      1.0.0                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
googleapis-common-protos  1.59.0                   pypi_0    pypi
gpustat                   1.1                      pypi_0    pypi
gpytorch                  1.10                     pypi_0    pypi
greenlet                  2.0.2                    pypi_0    pypi
grpcio                    1.51.3                   pypi_0    pypi
h11                       0.14.0                   pypi_0    pypi
h5py                      3.8.0                    pypi_0    pypi
idna                      3.4                      pypi_0    pypi
imageio                   2.28.1                   pypi_0    pypi
imbalanced-learn          0.10.1                   pypi_0    pypi
importlib-metadata        6.6.0                    pypi_0    pypi
ipykernel                 6.23.1                   pypi_0    pypi
ipython                   8.13.2                   pypi_0    pypi
ipywidgets                8.0.6                    pypi_0    pypi
jax                       0.4.10                   pypi_0    pypi
jedi                      0.18.2                   pypi_0    pypi
jinja2                    3.1.2                    pypi_0    pypi
jinxed                    1.2.0                    pypi_0    pypi
jmespath                  1.0.1                    pypi_0    pypi
joblib                    1.2.0                    pypi_0    pypi
jsonschema                4.17.3                   pypi_0    pypi
jupyter-client            8.2.0                    pypi_0    pypi
jupyter-core              5.3.0                    pypi_0    pypi
jupyterlab-widgets        3.0.7                    pypi_0    pypi
keras                     2.12.0                   pypi_0    pypi
kiwisolver                1.4.4                    pypi_0    pypi
lazy-loader               0.2                      pypi_0    pypi
libffi                    3.4.2                h8ffe710_5    conda-forge
libsqlite                 3.42.0               hcfcfb64_0    conda-forge
libzlib                   1.2.13               hcfcfb64_4    conda-forge
libzlib-wapi              1.2.13               hcfcfb64_4    conda-forge
linear-operator           0.4.0                    pypi_0    pypi
locket                    1.0.0                    pypi_0    pypi
lz4                       4.3.2                    pypi_0    pypi
markdown                  3.4.3                    pypi_0    pypi
markupsafe                2.1.2                    pypi_0    pypi
matplotlib                3.7.1                    pypi_0    pypi
matplotlib-inline         0.1.6                    pypi_0    pypi
ml-dtypes                 0.1.0                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
msgpack                   1.0.5                    pypi_0    pypi
multipledispatch          0.6.0                    pypi_0    pypi
nest-asyncio              1.5.6                    pypi_0    pypi
networkx                  3.1                      pypi_0    pypi
numpy                     1.23.5                   pypi_0    pypi
nvidia-ml-py              11.525.112               pypi_0    pypi
opencensus                0.11.2                   pypi_0    pypi
opencensus-context        0.1.3                    pypi_0    pypi
openpyxl                  3.1.2                    pypi_0    pypi
openssl                   3.1.0                hcfcfb64_3    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
optuna                    3.1.1                    pypi_0    pypi
outcome                   1.2.0                    pypi_0    pypi
pandas                    2.0.1                    pypi_0    pypi
pandas-ta                 0.3.14b0                 pypi_0    pypi
parso                     0.8.3                    pypi_0    pypi
partd                     1.4.0                    pypi_0    pypi
pickleshare               0.7.5                    pypi_0    pypi
pillow                    9.5.0                    pypi_0    pypi
pip                       23.1.2             pyhd8ed1ab_0    conda-forge
platformdirs              3.5.1                    pypi_0    pypi
plotly                    5.14.1                   pypi_0    pypi
prometheus-client         0.16.0                   pypi_0    pypi
prompt-toolkit            3.0.38                   pypi_0    pypi
protobuf                  3.20.3                   pypi_0    pypi
psutil                    5.9.5                    pypi_0    pypi
pure-eval                 0.2.2                    pypi_0    pypi
py-spy                    0.3.14                   pypi_0    pypi
pyarrow                   12.0.0                   pypi_0    pypi
pyasn1                    0.5.0                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pycparser                 2.21                     pypi_0    pypi
pygments                  2.15.1                   pypi_0    pypi
pyjwt                     2.7.0                    pypi_0    pypi
pyparsing                 3.0.9                    pypi_0    pypi
pyro-api                  0.1.2                    pypi_0    pypi
pyro-ppl                  1.8.4                    pypi_0    pypi
pyrsistent                0.19.3                   pypi_0    pypi
pysocks                   1.7.1                    pypi_0    pypi
python                    3.10.11         h4de0772_0_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
python-graphviz           0.20.1                   pypi_0    pypi
pytz                      2023.3                   pypi_0    pypi
pywavelets                1.4.1                    pypi_0    pypi
pywin32                   306                      pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
pyzmq                     25.0.2                   pypi_0    pypi
ray                       2.4.0                    pypi_0    pypi
requests                  2.30.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
s3fs                      2023.5.0                 pypi_0    pypi
s3transfer                0.6.1                    pypi_0    pypi
scikit-image              0.20.0                   pypi_0    pypi
scikit-learn              1.2.2                    pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
selenium                  4.9.1                    pypi_0    pypi
setuptools                67.7.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0                   pypi_0    pypi
smart-open                6.3.0                    pypi_0    pypi
sniffio                   1.3.0                    pypi_0    pypi
sortedcontainers          2.4.0                    pypi_0    pypi
sqlalchemy                2.0.15                   pypi_0    pypi
stack-data                0.6.2                    pypi_0    pypi
sympy                     1.12                     pypi_0    pypi
ta                        0.10.2                   pypi_0    pypi
ta-lib                    0.4.26                   pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tblib                     1.7.0                    pypi_0    pypi
tenacity                  8.2.2                    pypi_0    pypi
tensorboard               2.12.3                   pypi_0    pypi
tensorboard-data-server   0.7.0                    pypi_0    pypi
tensorboardx              2.6                      pypi_0    pypi
tensorflow                2.12.0                   pypi_0    pypi
tensorflow-addons         0.20.0                   pypi_0    pypi
tensorflow-estimator      2.12.0                   pypi_0    pypi
tensorflow-intel          2.12.0                   pypi_0    pypi
tensorflow-io-gcs-filesystem 0.31.0                   pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tifffile                  2023.4.12                pypi_0    pypi
tk                        8.6.12               h8ffe710_0    conda-forge
toolz                     0.12.0                   pypi_0    pypi
torch                     2.0.0+cu118              pypi_0    pypi
torch-summary             1.4.5                    pypi_0    pypi
tornado                   6.3.2                    pypi_0    pypi
tqdm                      4.65.0                   pypi_0    pypi
trio                      0.22.0                   pypi_0    pypi
trio-websocket            0.10.2                   pypi_0    pypi
typing-extensions         4.5.0                    pypi_0    pypi
tzdata                    2023.3                   pypi_0    pypi
ucrt                      10.0.22621.0         h57928b3_0    conda-forge
urllib3                   1.26.15                  pypi_0    pypi
vc                        14.3                hb25d44b_16    conda-forge
vc14_runtime              14.34.31931         h5081d32_16    conda-forge
vs2015_runtime            14.34.31931         hed1258a_16    conda-forge
wcwidth                   0.2.6                    pypi_0    pypi
werkzeug                  2.3.4                    pypi_0    pypi
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
widgetsnbextension        4.0.7                    pypi_0    pypi
wrapt                     1.14.1                   pypi_0    pypi
wsproto                   1.2.0                    pypi_0    pypi
xyzservices               2023.5.0                 pypi_0    pypi
xz                        5.2.6                h8d14728_0    conda-forge
zict                      3.0.0                    pypi_0    pypi
zipp                      3.15.0                   pypi_0    pypi

Reproduction script

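import numpy as np
import optuna
import ray
import torch
import torch.nn as nn
from ray import tune
from ray.tune.schedulers import MedianStoppingRule
from ray.tune.search.optuna import OptunaSearch
from sklearn.metrics import f1_score
from torch.utils.data import DataLoader, TensorDataset

# NOTE: X_train, y_train, X_val and y_val are torch tensors loaded elsewhere;
# their definitions are not part of this snippet.
ray.init(_metrics_export_port=9191)
input_size = X_train.shape[1]
num_cores = 16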

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, dropout):
        super(LSTMModel, self).__init__()

        self.lstm = nn.LSTM(input_size,  hidden_size, num_layers=num_layers, batch_first=True, dropout=dropout)
        self.linear = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        x = x.view(x.shape[0], -1, input_size)
        h_0, c_0 = self.init_hidden(x.shape[0], x.device)
        out, _ = self.lstm(x, (h_0, c_0))
        out = self.linear(out[:, -1])
        return out.squeeze()

    def init_hidden(self, batch_size, device):
        h_0 = torch.zeros(self.lstm.num_layers, batch_size, self.lstm.hidden_size).to(device)
        c_0 = torch.zeros(self.lstm.num_layers, batch_size, self.lstm.hidden_size).to(device)
        return h_0, c_0

def train_model(model, optimizer, criterion, data_loader, device, scaler, scheduler):
    model.train()
    total_loss = 0
    for x_batch, y_batch in data_loader:
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        optimizer.zero_grad()

        with torch.cuda.amp.autocast():
            output = model(x_batch)
            loss = criterion(output, y_batch)

        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)  # unscale so gradients can be inspected or clipped

        # step through the scaler so the update is skipped on inf/nan gradients
        scaler.step(optimizer)

        # update the scale factor for the next iteration
        scaler.update()

        # step the scheduler after the optimizer
        scheduler.step()

        total_loss += loss.item()

    return total_loss



def evaluate_model(model, data_loader, device):
    model.eval()
    predictions = []
    with torch.no_grad():
        for x_batch, y_batch in data_loader:
            x_batch = x_batch.to(device)
            output = model(x_batch)
            predictions.extend(torch.sigmoid(output).detach().cpu().numpy().flatten())
    return predictions

def objective(trial, device):
    
    hidden_size = trial.suggest_int('hidden_size', 500, 2000)
    num_layers = trial.suggest_int('num_layers', 1, 5)
    dropout = trial.suggest_float('dropout', 0.0, 0.5)
    lr = trial.suggest_float('lr', 1e-5, 1.0, log=True)
    optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'RMSprop', 'SGD', 'AdamW'])
    batch_size = trial.suggest_int('batch_size', 32, 256)
    scheduler_name = trial.suggest_categorical('lr_scheduler', ['StepLR', 'ExponentialLR'])
    gamma = trial.suggest_float('gamma', 0.05, 1.0)
    step_size = trial.suggest_int('step_size', 1, 100)

    model = LSTMModel(input_size, hidden_size, 1, num_layers, dropout)
    model.to(device)
    
    scaler = torch.cuda.amp.GradScaler()
    criterion = nn.MSELoss()

    optimizer_classes = {
        'Adam': torch.optim.Adam,
        'RMSprop': torch.optim.RMSprop,
        'SGD': torch.optim.SGD,
        'AdamW': torch.optim.AdamW
    }
    optimizer = optimizer_classes[optimizer_name](model.parameters(), lr=lr)

    scheduler_classes = {
        'StepLR': torch.optim.lr_scheduler.StepLR,
        'ExponentialLR': torch.optim.lr_scheduler.ExponentialLR,
    }

    if scheduler_name == 'StepLR':
        scheduler = scheduler_classes[scheduler_name](optimizer, step_size=step_size, gamma=gamma)
    elif scheduler_name == 'ExponentialLR':
        scheduler = scheduler_classes[scheduler_name](optimizer, gamma=gamma)

    train_data_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=batch_size, pin_memory=True)
    val_data_loader = DataLoader(TensorDataset(X_val, y_val), batch_size=batch_size, pin_memory=True)

    for epoch in range(40):
        train_loss = train_model(model, optimizer, criterion, train_data_loader, device, scaler, scheduler)
        intermediate_value = 1.0 / (train_loss + 1e-5)
        trial.report(intermediate_value, epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()

    predictions_val = evaluate_model(model, val_data_loader, device)
    binary_predictions_val = (np.array(predictions_val) > 0.5).astype(int)
    binary_labels_val = y_val.numpy().reshape(-1)
    f1_val = f1_score(binary_labels_val, binary_predictions_val)

    trial.set_user_attr("f1_val", f1_val)

    return f1_val

def trainable(config, checkpoint_dir=None):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    trial = optuna.trial.FixedTrial(config)
    result = objective(trial, device)
    tune.report(score=result)
                  
if __name__ == "__main__":
    resources_per_trial = {"gpu": 1, "cpu": num_cores} if torch.cuda.is_available() else {"cpu": num_cores}
    scheduler = MedianStoppingRule(metric="score", mode="max")
    search_alg = OptunaSearch(metric="score", mode="max")

    analysis = tune.run(
        trainable,
        config={
            "input_size": input_size,
            "hidden_size": tune.randint(500, 2000),
            "num_layers": tune.randint(1, 5),
            "dropout": tune.uniform(0.0, 0.5),
            "lr": tune.loguniform(1e-5, 1.0),
            "optimizer": tune.choice(['Adam', 'RMSprop', 'SGD', 'AdamW']),
            "batch_size": tune.randint(32, 256),
            "lr_scheduler": tune.choice(['StepLR', 'ExponentialLR']),
            "gamma": tune.uniform(0.05, 1.0),
            "step_size": tune.randint(1, 100),
        },
        resources_per_trial=resources_per_trial,
        num_samples=15,
        scheduler=scheduler,
        search_alg=search_alg,
    )

    best_parameters = analysis.get_best_config(metric="score", mode="max")
    best_trial = analysis.get_best_trial(metric="score", mode="max")
    
    print('Best Trial: score {},\nparams {}'.format(best_trial.last_result["score"], best_parameters))

    for trial in analysis.trials:
        print(f"Trial {trial.trial_id}, F1 score: {trial.last_result['score']}")
        
    ray.shutdown()

Issue Severity

None

@Aricept094 Aricept094 added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels May 25, 2023
@krfricke krfricke self-assigned this May 30, 2023
@krfricke krfricke added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels May 30, 2023
@krfricke
Contributor

@Aricept094 thanks for the issue.

Can you provide a full repro script? The current script includes neither the imports nor the data (e.g. X_train is undefined). We can only fix the issue when we have a script that reproduces the error and that we can "just run".

On a side note, it doesn't look like you're doing any data uploading or processing within Ray, so the error might be related to the way you load your data, for instance if you use pyarrow to read a CSV into X_train. It might be a good idea to either move the data loading into the trainable, or to use tune.with_parameters to avoid capturing the data (and dataloader) from the outer scope. See also https://docs.ray.io/en/latest/tune/api/doc/ray.tune.with_parameters.html
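To illustrate the first option (loading the data inside the trainable), here is a minimal sketch. The CSV layout (feature columns followed by a label column), the `load_data` helper, the `data_path` and `threshold` config keys, and the toy threshold "model" are all assumptions made for illustration; they are not taken from the original script.

```python
import csv


def load_data(path):
    """Read feature columns plus a trailing integer label column from a CSV file."""
    X, y = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            *features, label = row
            X.append([float(v) for v in features])
            y.append(int(label))
    return X, y


def trainable(config):
    # The data is loaded inside the function, so each trial builds its own
    # copy and nothing from the driver's module scope is captured.
    X, y = load_data(config["data_path"])
    # Toy stand-in for a model: predict 1 when the row sum exceeds a threshold.
    preds = [int(sum(row) > config["threshold"]) for row in X]
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    return {"score": accuracy}
```

When a function like this is handed to Ray Tune, an absolute `data_path` is probably the safer choice, since trials may run in a different working directory than the driver script.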

@Aricept094
Author

Thank you for getting back to me.
I made a repo here: https://github.com/Aricept094/pyarrowS3ERROR.git

I am relatively new to coding, and most of the code was done using GPT-4, so I apologize in advance for any obvious issues.

@krfricke
Contributor

krfricke commented Jun 1, 2023

There are two main things I'd suggest to improve in your code.

First, you are using Optuna both within the trainable and as a searcher. That will probably not work well. If you're using Ray Tune, you can use the Optuna-provided search engine (e.g. Tree-structured Parzen Estimators), but you won't use the Optuna interface. Here is a tutorial that will teach you how to use it: https://docs.ray.io/en/latest/tune/examples/optuna_example.html

As a TL;DR, if you're using Ray Tune, you should use OptunaSearch, but not the Optuna-specific APIs, e.g. trial.suggest_int, trial.report, etc. Instead, you use Ray AIR's session.report to report results during an epoch and configure the search space, stoppers, etc. in the Tuner constructor.
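The advice above could look roughly like the following sketch: the search space lives in `param_space`, results are reported through `session.report`, and no `optuna.trial` object appears in the training code. The `epoch_score` function is a hypothetical stand-in for a real training epoch, and the parameter names and ranges are illustrative only. The `ray.air.session` import matches the Ray 2.4-era API discussed in this thread; newer Ray releases may expose reporting under a different module.

```python
def epoch_score(config, epoch):
    """Hypothetical stand-in for one training epoch; returns a score to maximize."""
    # The score approaches 1.0 as epochs go by, and is best when lr == 0.1.
    return 1.0 - (config["lr"] - 0.1) ** 2 * (0.9 ** epoch)


def trainable(config):
    # Assumption: Ray 2.4-era reporting API; imported lazily on purpose.
    from ray.air import session

    for epoch in range(config["epochs"]):
        # Report through Ray instead of trial.report / trial.should_prune.
        session.report({"score": epoch_score(config, epoch)})


def build_tuner(num_samples=15):
    from ray import tune
    from ray.tune.schedulers import MedianStoppingRule
    from ray.tune.search.optuna import OptunaSearch

    return tune.Tuner(
        trainable,
        # The search space is declared here, not via trial.suggest_* calls.
        param_space={
            "lr": tune.loguniform(1e-4, 1.0),
            "num_layers": tune.randint(1, 5),
            "epochs": 10,
        },
        tune_config=tune.TuneConfig(
            metric="score",
            mode="max",
            search_alg=OptunaSearch(),
            scheduler=MedianStoppingRule(),
            num_samples=num_samples,
        ),
    )
```

Calling `build_tuner().fit()` from under an `if __name__ == "__main__":` guard would start the search; early stopping is then handled by the Tune scheduler rather than by Optuna's pruning.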

Second, the problem you're experiencing likely comes up because you are loading the data outside the training function, and it's implicitly captured in the scope. This can lead to problems with stateful dataloaders, tensors, etc.

I would suggest you move the data-loading code block into a separate function that you call either in the objective or in the trainable.

You should also consider using tune.with_parameters to pass X and y as arguments to trainable - this will avoid serializing these objects with the trainable. If you're training on a lot of data this can lead to problems.
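A minimal sketch of the `tune.with_parameters` approach, assuming in-memory data: `load_data` here is a hypothetical stand-in that generates toy data, and the threshold "model" is purely illustrative. The point is only that `X` and `y` reach the trainable as explicit keyword arguments rather than through the enclosing scope.

```python
import random


def load_data(n_samples=64, n_features=4, seed=0):
    """Hypothetical stand-in for the real data loading step."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    # Label is 1 when the row sum exceeds half the feature count.
    y = [int(sum(row) > n_features / 2) for row in X]
    return X, y


def trainable(config, X=None, y=None):
    # X and y arrive as explicit arguments instead of being captured
    # from the enclosing module scope.
    preds = [int(sum(row) > config["threshold"]) for row in X]
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    return {"score": accuracy}


def run_tuning(num_samples=4):
    # Imported lazily so the helpers above stay usable without Ray installed.
    from ray import tune

    X, y = load_data()
    tuner = tune.Tuner(
        # with_parameters hands X and y to every trial as keyword arguments.
        tune.with_parameters(trainable, X=X, y=y),
        param_space={"threshold": tune.uniform(1.0, 3.0)},
        tune_config=tune.TuneConfig(
            metric="score", mode="max", num_samples=num_samples
        ),
    )
    return tuner.fit().get_best_result().config
```

Because the trainable is an ordinary function of `config`, `X` and `y`, it can also be called directly with a fixed config to check it before launching a full Tune run.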

This may already resolve the problem you're experiencing. Maybe you can try updating your code and let us know? I'm also happy to take another look once you've made the update.

@Aricept094
Author

Wow, thanks a lot. I will do my best.

@krfricke krfricke added P2 Important issue, but not time-critical and removed P1 Issue that should be fixed within a few weeks labels Jun 21, 2023
@krfricke
Contributor

krfricke commented Jul 5, 2023

I'll close this for now, please feel free to re-open if the problem still comes up!

@krfricke krfricke closed this as completed Jul 5, 2023
@Tunneller

I have the identical problem with PyArrow, and I am not running Ray.

3 participants