C:\arrow\cpp\src\arrow\filesystem\ arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit #35771

Aricept094 opened this issue May 25, 2023
Labels: bug (Something that is supposed to be working; but isn't), P2 (Important issue, but not time-critical)


Aricept094 commented May 25, 2023

What happened + What you expected to happen

My trials keep getting terminated and my models score zero with this error: C:\arrow\cpp\src\arrow\filesystem\ arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit.

I would appreciate some help. I use Ray 2.4.0 and conda on Windows.

Versions / Dependencies

Reproduction script

input_size = X_train.shape[1]
num_cores = 16

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, dropout):
        super(LSTMModel, self).__init__()

        self.lstm = nn.LSTM(input_size,  hidden_size, num_layers=num_layers, batch_first=True, dropout=dropout)
        self.linear = nn.Linear(hidden_size, output_size)
    def forward(self, x):
        x = x.view(x.shape[0], -1, input_size)
        h_0, c_0 = self.init_hidden(x.shape[0], x.device)
        out, _ = self.lstm(x, (h_0, c_0))
        out = self.linear(out[:, -1])
        return out.squeeze()

    def init_hidden(self, batch_size, device):
        h_0 = torch.zeros(self.lstm.num_layers, batch_size, self.lstm.hidden_size).to(device)
        c_0 = torch.zeros(self.lstm.num_layers, batch_size, self.lstm.hidden_size).to(device)
        return h_0, c_0

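def train_model(model, optimizer, criterion, data_loader, device, scaler, scheduler):
    model.train()
    total_loss = 0
    for x_batch, y_batch in data_loader:
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        optimizer.zero_grad()

        with torch.cuda.amp.autocast():
            output = model(x_batch)
            loss = criterion(output, y_batch)

        # backpropagate the scaled loss so unscale_ has gradients to work on
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)  # to check for any possible inf/nan gradients

        # step the optimizer manually
        scaler.step(optimizer)

        # update the scaler
        scaler.update()

        # step the scheduler after the optimizer
        scheduler.step()

        total_loss += loss.item()

    return total_loss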

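def evaluate_model(model, data_loader, device):
    model.eval()
    predictions = []
    with torch.no_grad():
        for x_batch, y_batch in data_loader:
            x_batch = x_batch.to(device)
            output = model(x_batch)
            # collect the batch outputs as a flat list of floats
            predictions.extend(output.cpu().numpy().reshape(-1).tolist())
    return predictions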

def objective(trial, device):
    hidden_size = trial.suggest_int('hidden_size', 500, 2000)
    num_layers = trial.suggest_int('num_layers', 1, 5)
    dropout = trial.suggest_float('dropout', 0.0, 0.5)
    lr = trial.suggest_float('lr', 1e-5, 1.0, log=True)
    optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'RMSprop', 'SGD', 'AdamW'])
    batch_size = trial.suggest_int('batch_size', 32, 256)
    scheduler_name = trial.suggest_categorical('lr_scheduler', ['StepLR', 'ExponentialLR'])
    gamma = trial.suggest_float('gamma', 0.05, 1.0)
    step_size = trial.suggest_int('step_size', 1, 100)

    # the batches are moved to `device`, so the model has to live there too
    model = LSTMModel(input_size, hidden_size, 1, num_layers, dropout).to(device)
    scaler = torch.cuda.amp.GradScaler()
    criterion = nn.MSELoss()

    optimizer_classes = {
        'Adam': torch.optim.Adam,
        'RMSprop': torch.optim.RMSprop,
        'SGD': torch.optim.SGD,
        'AdamW': torch.optim.AdamW,
    }
    optimizer = optimizer_classes[optimizer_name](model.parameters(), lr=lr)

    scheduler_classes = {
        'StepLR': torch.optim.lr_scheduler.StepLR,
        'ExponentialLR': torch.optim.lr_scheduler.ExponentialLR,
    }

    if scheduler_name == 'StepLR':
        scheduler = scheduler_classes[scheduler_name](optimizer, step_size=step_size, gamma=gamma)
    elif scheduler_name == 'ExponentialLR':
        scheduler = scheduler_classes[scheduler_name](optimizer, gamma=gamma)

    train_data_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=batch_size, pin_memory=True)
    val_data_loader = DataLoader(TensorDataset(X_val, y_val), batch_size=batch_size, pin_memory=True)

    for epoch in range(40):
        train_loss = train_model(model, optimizer, criterion, train_data_loader, device, scaler, scheduler)
        intermediate_value = 1.0 / (train_loss + 1e-5)
        trial.report(intermediate_value, epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()

    predictions_val = evaluate_model(model, val_data_loader, device)
    binary_predictions_val = (np.array(predictions_val) > 0.5).astype(int)
    binary_labels_val = y_val.numpy().reshape(-1)
    f1_val = f1_score(binary_labels_val, binary_predictions_val)

    trial.set_user_attr("f1_val", f1_val)

    return f1_val

def trainable(config, checkpoint_dir=None):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    trial = optuna.trial.FixedTrial(config)
    result = objective(trial, device)
    # report under the "score" key that the scheduler and searcher optimize
    tune.report(score=result)

if __name__ == "__main__":
    resources_per_trial = {"gpu": 1, "cpu": num_cores} if torch.cuda.is_available() else {"cpu": num_cores}
    scheduler = MedianStoppingRule(metric="score", mode="max")
    search_alg = OptunaSearch(metric="score", mode="max")

    analysis = tune.run(
        trainable,
        config={
            "input_size": input_size,
            "hidden_size": tune.randint(500, 2000),
            "num_layers": tune.randint(1, 5),
            "dropout": tune.uniform(0.0, 0.5),
            "lr": tune.loguniform(1e-5, 1.0),
            "optimizer": tune.choice(['Adam', 'RMSprop', 'SGD', 'AdamW']),
            "batch_size": tune.randint(32, 256),
            "lr_scheduler": tune.choice(['StepLR', 'ExponentialLR']),
            "gamma": tune.uniform(0.05, 1.0),
            "step_size": tune.randint(1, 100),
        },
        resources_per_trial=resources_per_trial,
        scheduler=scheduler,
        search_alg=search_alg,
    )

    best_parameters = analysis.get_best_config(metric="score", mode="max")
    best_trial = analysis.get_best_trial(metric="score", mode="max")
    print('Best Trial: score {},\nparams {}'.format(best_trial.last_result["score"], best_parameters))

    for trial in analysis.trials:
        print(f"Trial {trial.trial_id}, F1 score: {trial.last_result['score']}")

@Aricept094 Aricept094 added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels May 25, 2023
@krfricke krfricke self-assigned this May 30, 2023
@krfricke krfricke added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels May 30, 2023
@Aricept094 thanks for the issue.

Can you provide a full repro script? The current script does not include the imports, and also not the data (e.g. X_train is undefined). We can only fix the issue when we have a script that reproduces the error and that we can "just run".

On a side note, it doesn't look like you're doing any data uploading or processing within Ray, so it might be related to the way that you do data loading, for instance if you use pyarrow to read a CSV into X_train. Here it might be a good idea to either move the data loading into the trainable, or to use tune.with_parameters to avoid capturing the data (and dataloader) from the outer scope.

Thank you for getting back to me.
Here I made a repo:

I am relatively new to coding, and most of the code was done using GPT-4, so I apologize in advance for any obvious issues.

krfricke commented Jun 1, 2023

There are two main things I'd suggest to improve in your code.

First, you are using optuna both within the trainable and as a searcher. That will probably not work well. If you're using Ray Tune, you can use the Optuna-provided search algorithms (e.g. Tree-structured Parzen Estimators), but you won't use the optuna interface. Here is a tutorial that will teach you how to use it:

As a TLDR, if you're using Ray Tune, you should use the OptunaSearch searcher, but not the optuna-specific APIs, e.g. trial.suggest_int, etc. Instead, you use Ray AIR's session.report to report results during an epoch and configure the search space, stoppers, etc. in the Tuner constructor.
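
A minimal sketch of that setup, assuming Ray 2.4's Tuner API with ray.air.session and an installed optuna package. The fake_epoch_score helper and its formula are hypothetical stand-ins for a real training epoch; only the "lr" and "dropout" names and their ranges come from the script above.

```python
import math


def fake_epoch_score(config, epoch):
    """Hypothetical stand-in for one training epoch; returns a score in (0, 1]."""
    # Best when lr is close to 1e-3; improves with every additional epoch.
    lr_penalty = abs(math.log10(config["lr"]) + 3.0)
    return (1.0 - 0.5 ** (epoch + 1)) / (1.0 + lr_penalty + config["dropout"])


def trainable(config):
    # Imported here because this function runs inside a Ray worker process.
    from ray.air import session

    for epoch in range(5):
        # Report through Ray instead of optuna's trial.report / should_prune.
        session.report({"score": fake_epoch_score(config, epoch)})


def run_search(num_samples=8):
    from ray import tune
    from ray.tune.schedulers import MedianStoppingRule
    from ray.tune.search.optuna import OptunaSearch

    tuner = tune.Tuner(
        trainable,
        # The search space is declared here, not via trial.suggest_* calls.
        param_space={
            "lr": tune.loguniform(1e-5, 1.0),
            "dropout": tune.uniform(0.0, 0.5),
        },
        tune_config=tune.TuneConfig(
            metric="score",
            mode="max",
            search_alg=OptunaSearch(),
            scheduler=MedianStoppingRule(),
            num_samples=num_samples,
        ),
    )
    return tuner.fit()
```

Calling run_search() starts the experiment; the returned result grid can then be queried with get_best_result(). The optuna package is only used behind OptunaSearch here, so the trainable itself never touches an optuna trial object.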

Second, the problem you're experiencing likely comes up because you are loading the data outside the training function, and it's implicitly captured in the scope. This can lead to problems with stateful dataloaders, tensors, etc.

I would suggest you move this code block into a separate function that you call either in the objective or in the trainable.

You should also consider using tune.with_parameters to pass X and y as arguments to trainable - this will avoid serializing these objects with the trainable. If you're training on a lot of data this can lead to problems.
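
As a rough sketch of the tune.with_parameters pattern, assuming Ray 2.4: the load_data helper, the toy linear data and the single "w" parameter are hypothetical and only stand in for the real X_train / y_train and model.

```python
import random


def load_data(n_samples=200, seed=0):
    """Hypothetical loader; returns (X, y) as plain Python lists."""
    rng = random.Random(seed)
    X = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    y = [2.0 * x + rng.gauss(0.0, 0.1) for x in X]
    return X, y


def trainable(config, X=None, y=None):
    # X and y arrive as arguments rather than as captured globals.
    w = config["w"]
    mse = sum((w * x - t) ** 2 for x, t in zip(X, y)) / len(X)
    # Returning a dict reports it as the trial's final result.
    return {"score": -mse}


def run_search(num_samples=8):
    from ray import tune

    X, y = load_data()
    tuner = tune.Tuner(
        # with_parameters stores X and y in the Ray object store once,
        # instead of pickling them together with the function.
        tune.with_parameters(trainable, X=X, y=y),
        param_space={"w": tune.uniform(0.0, 4.0)},
        tune_config=tune.TuneConfig(
            metric="score", mode="max", num_samples=num_samples
        ),
    )
    return tuner.fit()
```

Because the data is loaded inside run_search and handed over explicitly, nothing from the module's outer scope has to be serialized along with the trainable.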

This may already resolve the problem you're experiencing. Maybe you can try updating your code and let us know? I'm also happy to take another look once you've made the update.

Wow, thanks a lot. I will do my best.

@krfricke krfricke added P2 Important issue, but not time-critical and removed P1 Issue that should be fixed within a few weeks labels Jun 21, 2023
krfricke commented Jul 5, 2023

I'll close this for now, please feel free to re-open if the problem still comes up!

@krfricke krfricke closed this as completed Jul 5, 2023
I have the identical problem with PyArrow and am not running Ray-Project.

