# Identifying even or odd number of digits in any whole number without using number of digits in features
In this notebook, I wanted to explore an ideal Neural Network for identifying if a passed number has even or odd number of digits based on the number alone.
- This would entail that the model needs to understand that $digits = log(|n|) + 1$, where $n$ is the number of digits would generally give us the answer for if it is even or odd. But how can a Neural Network do this, and more importantly choice of the NN?

In [1]:
import torch

In [2]:
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
    print(f"MPS device found: {mps_device}")
else:
    print("MPS device not found or not built with MPS enabled.")
    print("Check macOS version (12.3+) and PyTorch installation.")


tensor([1.], device='mps:0')
MPS device found: mps


In [3]:
import torch.nn as nn
import numpy as np
import pandas as pd
from torch.utils.data import Dataset, DataLoader
import torch.optim as optim
import tqdm


# Data Generation
We need to generate uniform data across all lengths evenly. Now if we generate even number of number for every length, that wouldn't make sense because for example we only have 10 numbers between 0 and 9, but we have 9000 number between 1000 and 9999. But how would we sample the different numbers within each range? 

- We need the number of data points to indcrease with increased number of digits. This can be increased randomly or in multiples of a certain number.
- We can choose to have data points for all ranges or not.

*Solution?*
- Let's test on 3 different dataset configurations: One with each setting mentioned in above bullet points.

In [4]:
def generate_dataset(n_samples:int=10e6, exp=15, random_ranges=False, all_ranges=True):
    ## If random ranges is True, that means we can sample by the exponents and not numbers itself
    if random_ranges:
        exponents = np.random.randint(0, exp, size=n_samples)
        
        unique_exps, counts = np.unique(exponents, return_counts=True)
        # unique_exps = [0, 1, 2]
        # counts = [2, 3, 4]  <- exp 0 appears 2 times, exp 1 appears 3 times, 
        # exp 2 appears 4 times

        
        numbers = []
        for e, count in zip(unique_exps, counts):
            lower, upper = 10**e, 10**(e+1)
            range_size = upper - lower

            if count <= range_size:
                nums = np.random.choice(range_size, size=count, replace=False)
                # choice means duplicate numbers are not allowed
                numbers.extend(nums)
            else:
                nums = np.random.randint(lower, upper, size=count)
                numbers.extend(np.unique(nums))
                
    ## otherwise, we do it proportional to the log size of random range from 
    ## -exp to +exp to see how important each order of magnitude is
    else:
        # we will use weight technique to weigh how many samples we want
        # from each order of magnitude

        # all_ranges means we want to sample from all ranges equally
        if all_ranges:
            weights = [9 * (10 ** i) for i in range(exp)] # so like 9, 90, 900, 9000, ...
            total_weights = sum(weights)
            
            probabilities = [n_samples * (w / total_weights) for w in weights]
        else:
            weights = np.random.randint(1, exp, size=exp)
            total_weights = sum(weights)
            probabilities = [n_samples * (w / total_weights) for w in weights]
        numbers = []
        for i, p in enumerate(probabilities):
            count = int(p)

            if count == 0:
                continue

            lower, upper = 10**i, 10**(i+1)
            range_size = upper - lower
            if count <= range_size:
                nums = np.random.choice(np.arange(lower, upper), size=count, replace=False)
                numbers.extend(nums)
            else:
                nums = np.random.randint(lower, upper, size=count)
                numbers.extend(np.unique(nums))


    labels = [1 if len(str(n)) % 2 == 0 else 0 for n in numbers]
    return np.array(numbers), np.array(labels)

In [5]:
# Data Type 1: Balanced dataset from all ranges
X, Y = generate_dataset(n_samples=int(10e5), exp=6)
print(X.shape, Y.shape)
print(X[59990:60000], Y[59990:60000])

(999999,) (999999,)
[98746 18555 26132 64747 72689 74940 67966 61145 37937 83937] [0 0 0 0 0 0 0 0 0 0]


### Data splits

In [6]:
# data splits
split = {"train": 0.7, "val": 0.2, "test": 0.1}
n_samples = X.shape[0]

# split randomly
indices = np.random.permutation(n_samples)

X_train = X[indices[:int(split["train"] * n_samples)]]
X_val = X[indices[int(split["train"] * n_samples):int((split["train"] + split["val"]) * n_samples)]]
X_test = X[indices[int((split["train"] + split["val"]) * n_samples):]]

Y_train = Y[indices[:int(split["train"] * n_samples)]]
Y_val = Y[indices[int(split["train"] * n_samples):int((split["train"] + split["val"]) * n_samples)]]
Y_test = Y[indices[int((split["train"] + split["val"]) * n_samples):]]

# Neural Networks
As mentioned at the start, we need a NN that can learn log functions. For this purpose, simple MLPs might be more difficult to work with. Given below is a list of networks we would test and compare for our problem.

### 1. Gradient Boosted Trees
- Ensemble models can be a good choice for problems like this where models can learn from data splits. 
- In the case of GB trees, splits will be recurring to avoid data bias which may occur based on seen data. 
- If we take our own dataset, certain ranges have much lower number of samples (ex. 0-9) in comparison to others (ex. 1000 - 9999).
- We can hypothesize that GB trees can learn a boundary function on repetitive training on splits.

### 2. Random Forests
- While GB trees could learn sequentially for handling any data bias, random forests can deal with variance in the data.
- While we can hypothesize that GB Trees would perform better than RF, this could still be a good contender.
- RF has the potential for learning the power-of-10 boundaries where the label changes from 0 to 1 and back to 0.

### 3. MLP with RBF Kernel
- While we can hypothesize that this will not perform as well as the latter 2, RBF kernels can be good at drawing decision non-linear decision boundaries.

## Gradient Boosted Trees

In [5]:
# We will use XGBoost for this task
!pip3 install xgboost
import xgboost as xgb



In [8]:
# xgboost requires our data to be in 2D array
DX_train = X_train.reshape(-1, 1)
DX_val = X_val.reshape(-1, 1)
DX_test = X_test.reshape(-1, 1)

In [6]:
gbt_model = xgb.XGBClassifier(
    max_depth=12,
    n_estimators=100,
    learning_rate=0.1,
    objective='binary:logistic',
    eval_metric='logloss',
    random_state=42
)

In [None]:
print("Training Gradient Boosted Trees Model...")
gbt_model.fit(DX_train, Y_train)

In [10]:
y_pred = gbt_model.predict(DX_test)
accuracy = np.mean(y_pred == Y_test)
print(f"Test Accuracy: {accuracy * 100:.2f}%")

Test Accuracy: 99.51%


In [11]:
# lets see where it fails
[(len(str(DX_test[i])), Y_test[i], y_pred[i]) for i in range(len(Y_test)) if Y_test[i] != y_pred[i]][:10]

[(7, np.int64(0), np.int64(1)),
 (7, np.int64(0), np.int64(1)),
 (5, np.int64(0), np.int64(1)),
 (7, np.int64(0), np.int64(1)),
 (7, np.int64(0), np.int64(1)),
 (7, np.int64(0), np.int64(1)),
 (7, np.int64(0), np.int64(1)),
 (7, np.int64(0), np.int64(1)),
 (7, np.int64(0), np.int64(1)),
 (5, np.int64(0), np.int64(1))]

## Random Forest

In [7]:
from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=12,
    min_samples_split=5,
    min_samples_leaf=2,
    bootstrap=True,
    random_state=42
)



In [None]:
print("Training Random Forest Model...")
rf_model.fit(DX_train, Y_train)

In [13]:
y_pred = rf_model.predict(DX_val)
accuracy = np.mean(y_pred == Y_val)
print(f"Validation Accuracy: {accuracy * 100:.2f}%")

Validation Accuracy: 100.00%


In [14]:
y_pred = rf_model.predict(DX_test)

accuracy = np.mean(y_pred == Y_test)
print(f"Test Accuracy: {accuracy * 100:.2f}%")

Test Accuracy: 100.00%


## Simple MLP with RBF Kernel

In [8]:
'''
In RBF kernel, we will want the decision boundary to look like concentric circles
around the origin for each reference point - reference point here being the 
boundaries from one order of magnitude to another.
'''

def rbf_features(X, centers, gamma=1.0):
    '''
    X: numpy array of input numbers (n_samples, 1)
    centers: reference points (powers of 10)
    gamma: parameter for RBF kernel for width (higher gamma = narrower kernel)
    '''
    X = X.astype(np.float64)
    centers = centers.astype(np.float64)
    # || X - center ||^2
    distances_sq = (X - centers.reshape(1, -1)) ** 2

    # rbf kernel = exp(-gamma * distance^2)
    rbf = np.exp(-gamma * distances_sq)
    return rbf

In [13]:
positive_centers = np.array([10**i for i in range(1, 6)])  # 10, 100, ..., 10^15
negative_centers = np.array([-10**i for i in range(1, 6)])  # -10, -100, ..., -10^15
all_centers = np.concatenate([positive_centers, negative_centers])


# we choose a smaller gamma to have wider kernels since our span of data is large
gamma = 1e-10



In [14]:
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
# we want to standaridize the rbf features
scaler = StandardScaler()


In [None]:

PX_train = rbf_features(DX_train, all_centers, gamma=gamma)
PX_val = rbf_features(DX_val, all_centers, gamma=gamma)
PX_test = rbf_features(DX_test, all_centers, gamma=gamma)
PX_train = scaler.fit_transform(PX_train)
PX_val = scaler.transform(PX_val)

In [10]:

#MLP model
mlp_model = MLPClassifier(
    hidden_layer_sizes = (128, 64, 32), # 3 hidden layers
    activation='relu',
    solver='adam',
    max_iter=100,
    learning_rate_init=0.001,
    random_state=42,
    batch_size=256,
    early_stopping=True,
    verbose=True
)

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
# we want to standaridize the rbf features
scaler = StandardScaler()
PX_train = scaler.fit_transform(PX_train)
PX_val = scaler.transform(PX_val)

#MLP model
mlp_model = MLPClassifier(
    hidden_layer_sizes = (128, 64, 32), # 3 hidden layers
    activation='relu',
    solver='adam',
    max_iter=100,
    learning_rate_init=0.001,
    random_state=42,
    batch_size=256,
    early_stopping=True,
    verbose=True
)



Training MLP Model with RBF Features...
Iteration 1, loss = 0.02155356
Validation score: 0.998114
Iteration 2, loss = 0.00464805
Validation score: 0.998014
Iteration 3, loss = 0.00408982
Validation score: 0.999343
Iteration 4, loss = 0.00375168
Validation score: 0.996843
Iteration 5, loss = 0.00350684
Validation score: 0.998871
Iteration 6, loss = 0.00329555
Validation score: 0.999071
Iteration 7, loss = 0.00303089
Validation score: 0.999529
Iteration 8, loss = 0.00320611
Validation score: 0.998614
Iteration 9, loss = 0.00299132
Validation score: 0.999129
Iteration 10, loss = 0.00283701
Validation score: 0.999514
Iteration 11, loss = 0.00279841
Validation score: 0.999186
Iteration 12, loss = 0.00276658
Validation score: 0.998014
Iteration 13, loss = 0.00291125
Validation score: 0.999443
Iteration 14, loss = 0.00284425
Validation score: 0.999343
Iteration 15, loss = 0.00276574
Validation score: 0.998614
Iteration 16, loss = 0.00261296
Validation score: 0.998600
Iteration 17, loss = 0.00

0,1,2
,"hidden_layer_sizes  hidden_layer_sizes: array-like of shape(n_layers - 2,), default=(100,) The ith element represents the number of neurons in the ith hidden layer.","(128, ...)"
,"activation  activation: {'identity', 'logistic', 'tanh', 'relu'}, default='relu' Activation function for the hidden layer. - 'identity', no-op activation, useful to implement linear bottleneck,  returns f(x) = x - 'logistic', the logistic sigmoid function,  returns f(x) = 1 / (1 + exp(-x)). - 'tanh', the hyperbolic tan function,  returns f(x) = tanh(x). - 'relu', the rectified linear unit function,  returns f(x) = max(0, x)",'relu'
,"solver  solver: {'lbfgs', 'sgd', 'adam'}, default='adam' The solver for weight optimization. - 'lbfgs' is an optimizer in the family of quasi-Newton methods. - 'sgd' refers to stochastic gradient descent. - 'adam' refers to a stochastic gradient-based optimizer proposed  by Kingma, Diederik, and Jimmy Ba For a comparison between Adam optimizer and SGD, see :ref:`sphx_glr_auto_examples_neural_networks_plot_mlp_training_curves.py`. Note: The default solver 'adam' works pretty well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, 'lbfgs' can converge faster and perform better.",'adam'
,"alpha  alpha: float, default=0.0001 Strength of the L2 regularization term. The L2 regularization term is divided by the sample size when added to the loss. For an example usage and visualization of varying regularization, see :ref:`sphx_glr_auto_examples_neural_networks_plot_mlp_alpha.py`.",0.0001
,"batch_size  batch_size: int, default='auto' Size of minibatches for stochastic optimizers. If the solver is 'lbfgs', the classifier will not use minibatch. When set to ""auto"", `batch_size=min(200, n_samples)`.",256
,"learning_rate  learning_rate: {'constant', 'invscaling', 'adaptive'}, default='constant' Learning rate schedule for weight updates. - 'constant' is a constant learning rate given by  'learning_rate_init'. - 'invscaling' gradually decreases the learning rate at each  time step 't' using an inverse scaling exponent of 'power_t'.  effective_learning_rate = learning_rate_init / pow(t, power_t) - 'adaptive' keeps the learning rate constant to  'learning_rate_init' as long as training loss keeps decreasing.  Each time two consecutive epochs fail to decrease training loss by at  least tol, or fail to increase validation score by at least tol if  'early_stopping' is on, the current learning rate is divided by 5. Only used when ``solver='sgd'``.",'constant'
,"learning_rate_init  learning_rate_init: float, default=0.001 The initial learning rate used. It controls the step-size in updating the weights. Only used when solver='sgd' or 'adam'.",0.001
,"power_t  power_t: float, default=0.5 The exponent for inverse scaling learning rate. It is used in updating effective learning rate when the learning_rate is set to 'invscaling'. Only used when solver='sgd'.",0.5
,"max_iter  max_iter: int, default=200 Maximum number of iterations. The solver iterates until convergence (determined by 'tol') or this number of iterations. For stochastic solvers ('sgd', 'adam'), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.",100
,"shuffle  shuffle: bool, default=True Whether to shuffle samples in each iteration. Only used when solver='sgd' or 'adam'.",True


In [None]:
#Train
print("Training MLP Model with RBF Features...")
mlp_model.fit(PX_train, Y_train)

In [18]:
y_pred = mlp_model.predict(PX_val)
accuracy = np.mean(y_pred == Y_val)
print(f"Validation Accuracy: {accuracy * 100:.2f}%")

Validation Accuracy: 99.94%


In [19]:
y_pred = mlp_model.predict(PX_test)
accuracy = np.mean(y_pred == Y_test)
print(f"Test Accuracy: {accuracy * 100:.2f}%")

Test Accuracy: 95.96%


# Testing
Now its important to see how each model performs with data outside the training range. Training rangeonly covers the top 

In [22]:
# generating a dataset with exponents from 6 to 15

exps = np.random.choice(np.arange(6, 60), size=30)

X_new = [
    10**exp for exp in exps
]

X_new = np.array(X_new, dtype=object)
Y_new = [1 if len(str(n)) % 2 == 0 else 0 for n in X_new]


In [23]:
exps

array([10, 34, 51, 31, 26, 16, 24, 19, 23, 49, 50, 48, 48, 20,  9, 48, 39,
       33, 17, 13, 47, 27, 25, 22, 39, 16, 28, 39, 23, 25])

In [24]:
DX_new = X_new.reshape(-1, 1)

In [25]:
PX_new = rbf_features(DX_new, all_centers, gamma=1e-20)
PX_new = scaler.transform(PX_new)

In [39]:
# testing each model on new data
y_pred = gbt_model.predict(DX_new)
accuracy = np.mean(y_pred == Y_new)
print(f"GBT Model New Data Accuracy: {accuracy * 100:.2f}%")
print([(len(str(X_new[i])), Y_new[i], y_pred[i]) for i in range(len(X_new)) if Y_new[i] != y_pred[i]])

y_pred = rf_model.predict(DX_new)
accuracy = np.mean(y_pred == Y_new)
print(f"Random Forest Model New Data Accuracy: {accuracy * 100:.2f}%")
print([(len(str(X_new[i])), Y_new[i], y_pred[i]) for i in range(len(X_new)) if Y_new[i] != y_pred[i]])


y_pred = mlp_model.predict(PX_new)
accuracy = np.mean(y_pred == Y_new)
print(f"MLP Model New Data Accuracy: {accuracy * 100:.2f}%")
print([(len(str(X_new[i])), Y_new[i], y_pred[i]) for i in range(len(X_new)) if Y_new[i] != y_pred[i]])


GBT Model New Data Accuracy: 53.33%
[(19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (17, 0, np.int64(1)), (19, 0, np.int64(1)), (11, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (11, 0, np.int64(1)), (17, 0, np.int64(1))]
Random Forest Model New Data Accuracy: 16.67%
[(19, 0, np.int64(1)), (19, 0, np.int64(1)), (20, 1, np.int64(0)), (19, 0, np.int64(1)), (20, 1, np.int64(0)), (20, 1, np.int64(0)), (20, 1, np.int64(0)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (20, 1, np.int64(0)), (19, 0, np.int64(1)), (20, 1, np.int64(0)), (20, 1, np.int64(0)), (17, 0, np.int64(1)), (19, 0, np.int64(1)), (11, 0, np.int64(1)), (19, 0, np.int64(1)), (20, 1, np.int64(0)), (20, 1, np.int64(0)), (19, 0, np.int64(1)), (20, 1, np.int64(0)), (11, 0, np.int64(1)), (20, 1, np.int64(0)), (17, 0, np.int64(1))]
MLP Model New Data Accuracy: 60.00%
[(19, 0, np.int64(1)),

## Dataset Type 2: Random ranges - not uniform

In [16]:
X, Y = generate_dataset(n_samples=int(10e5), exp=8, random_ranges=True)

# data splits
split = {"train": 0.7, "val": 0.2, "test": 0.1}
n_samples = X.shape[0]

# split randomly
indices = np.random.permutation(n_samples)

X_train = X[indices[:int(split["train"] * n_samples)]]
X_val = X[indices[int(split["train"] * n_samples):int((split["train"] + split["val"]) * n_samples)]]
X_test = X[indices[int((split["train"] + split["val"]) * n_samples):]]

Y_train = Y[indices[:int(split["train"] * n_samples)]]
Y_val = Y[indices[int(split["train"] * n_samples):int((split["train"] + split["val"]) * n_samples)]]
Y_test = Y[indices[int((split["train"] + split["val"]) * n_samples):]]

In [17]:
DX_train = X_train.reshape(-1, 1)
DX_val = X_val.reshape(-1, 1)
DX_test = X_test.reshape(-1, 1)

PX_train = rbf_features(DX_train, all_centers, gamma=gamma)
PX_val = rbf_features(DX_val, all_centers, gamma=gamma)

In [19]:
# run each model again on this new dataset
PX_train = scaler.fit_transform(PX_train)
PX_val = scaler.transform(PX_val)
#MLP model
mlp_model.fit(PX_train, Y_train)
y_pred = mlp_model.predict(PX_val)
accuracy = np.mean(y_pred == Y_val)
print(f"Validation Accuracy on New Dataset: {accuracy * 100:.2f}%")



Iteration 1, loss = 0.50985911
Validation score: 0.723648
Iteration 2, loss = 0.49258463
Validation score: 0.726397
Iteration 3, loss = 0.48947846
Validation score: 0.725480
Iteration 4, loss = 0.48791877
Validation score: 0.723932
Iteration 5, loss = 0.48742955
Validation score: 0.724880
Iteration 6, loss = 0.48635602
Validation score: 0.725196
Iteration 7, loss = 0.48659391
Validation score: 0.723521
Iteration 8, loss = 0.48522881
Validation score: 0.722099
Iteration 9, loss = 0.48550538
Validation score: 0.725543
Iteration 10, loss = 0.48568131
Validation score: 0.726365
Iteration 11, loss = 0.48554051
Validation score: 0.725164
Iteration 12, loss = 0.48715385
Validation score: 0.722447
Iteration 13, loss = 0.48584210
Validation score: 0.726713
Iteration 14, loss = 0.48621224
Validation score: 0.726744
Iteration 15, loss = 0.48572417
Validation score: 0.724785
Iteration 16, loss = 0.48574683
Validation score: 0.725322
Iteration 17, loss = 0.48502813
Validation score: 0.725543
Iterat

In [20]:

#RF model
rf_model.fit(DX_train, Y_train)
y_pred = rf_model.predict(DX_val)
accuracy = np.mean(y_pred == Y_val)
print(f"Random Forest Validation Accuracy on New Dataset: {accuracy * 100:.2f}%")

Random Forest Validation Accuracy on New Dataset: 100.00%


In [21]:

#GBT model
gbt_model.fit(DX_train, Y_train)
y_pred = gbt_model.predict(DX_val)
accuracy = np.mean(y_pred == Y_val)
print(f"GBT Validation Accuracy on New Dataset: {accuracy * 100:.2f}%")

GBT Validation Accuracy on New Dataset: 99.34%


In [27]:
# testing on unseen data X_new

#MLP model
y_pred = mlp_model.predict(PX_new)
accuracy = np.mean(y_pred == Y_new)
print(f"MLP Model New Data Accuracy on New Dataset: {accuracy * 100:.2f}%")
print([(len(str(X_new[i])), Y_new[i], y_pred[i]) for i in range(len(X_new)) if Y_new[i] != y_pred[i]])

#RF model
y_pred = rf_model.predict(DX_new)
accuracy = np.mean(y_pred == Y_new)
print(f"Random Forest Model New Data Accuracy on New Dataset: {accuracy * 100:.2f}%")
print([(len(str(X_new[i])), Y_new[i], y_pred[i]) for i in range(len(X_new)) if Y_new[i] != y_pred[i]])

#GBT model
y_pred = gbt_model.predict(DX_new)
accuracy = np.mean(y_pred == Y_new)
print(f"GBT Model New Data Accuracy on New Dataset: {accuracy * 100:.2f}%")
print([(len(str(X_new[i])), Y_new[i], y_pred[i]) for i in range(len(X_new)) if Y_new[i] != y_pred[i]])

MLP Model New Data Accuracy on New Dataset: 33.33%
[(19, 0, np.int64(1)), (19, 0, np.int64(1)), (17, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (10, 1, np.int64(0)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (17, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1))]
Random Forest Model New Data Accuracy on New Dataset: 20.00%
[(11, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (20, 1, np.int64(0)), (20, 1, np.int64(0)), (17, 0, np.int64(1)), (19, 0, np.int64(1)), (20, 1, np.int64(0)), (20, 1, np.int64(0)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (20, 1, np.int64(0)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (19, 0, np.int64(1)), (17, 0