In [1]:
!pip install catboost



In [2]:
!pip install sktime[all_extras]



In [3]:
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sktime.transformations.panel.rocket import Rocket

The sktime PyPI package is deprecated.
The sktime project split into two projects.

To find out how to install the new packages, please go to:

* https://github.com/aeon-toolkit/aeon
* https://github.com/sktime/sktime

Here is what you can do when installing sktime via pip (e.g. using `pip install ...` or a requirement file like `requirements.txt`, `setup.py`, `setup.cfg`):

* replace sktime with one of the new projects,
* if the sktime package is used by one of your dependencies, it would be great if you take some time to track which package uses sktime and report to their issue tracker that sktime is deprecated.

More information is available at:
https://github.com/mloning/sktime-deprecation/discussions/2

If the previous advice does not support your use case, feel free to report it at:
https://github.com/mloning/sktime-deprecation/issues/new



First, we load the datasets and fit two baseline classifiers directly on the raw series values: CatBoost and a random forest.

In [4]:
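# Raw string for the regex separator avoids an invalid-escape warning.
train_df = pd.read_csv('Ham_TRAIN.txt', sep=r'\s+', skipinitialspace=True, header=None)
test_df = pd.read_csv('Ham_TEST.txt', sep=r'\s+', skipinitialspace=True, header=None)

# Column 0 holds the class label; the remaining columns hold the series values.
X_train = train_df.iloc[:, 1:433]
y_train = train_df.iloc[:, 0]

X_test = test_df.iloc[:, 1:433]
y_test = test_df.iloc[:, 0]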
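
As a small illustration of the split above, the sketch below parses a made-up miniature file with the same layout (one label followed by the readings on each row). The sample values and the 4-reading row length are hypothetical and only stand in for the real Ham files.

```python
import io
import pandas as pd

# Hypothetical miniature file: 3 rows, each a class label followed by 4 readings.
sample = io.StringIO(
    "1.0 0.1 0.2 0.3 0.4\n"
    "2.0 0.5 0.6 0.7 0.8\n"
    "1.0 0.9 1.0 1.1 1.2\n"
)
df = pd.read_csv(sample, sep=r"\s+", header=None)

y = df.iloc[:, 0]   # first column: class label
X = df.iloc[:, 1:]  # remaining columns: one column per time step

print(X.shape)      # (3, 4)
print(y.tolist())   # [1.0, 2.0, 1.0]
```

An open-ended slice such as `1:` selects the same columns as an explicit upper bound whenever that bound reaches the last column.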

In [5]:
# Baseline: CatBoost on the raw series values

model = CatBoostClassifier(iterations=100, random_state=42)

model.fit(X_train, y_train, verbose=False)

y_pred = model.predict(X_test)

f1 = f1_score(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

print("F1 Score:", f1)
print("Accuracy:", accuracy)


F1 Score: 0.673469387755102
Accuracy: 0.6952380952380952


In [6]:
# Baseline: random forest on the raw series values

model = RandomForestClassifier(n_estimators=100, random_state=42)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

f1 = f1_score(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

print("F1 Score:", f1)
print("Accuracy:", accuracy)

F1 Score: 0.7184466019417477
Accuracy: 0.7238095238095238
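
A note on how the F1 numbers above are produced: with its default settings (`average='binary'`, `pos_label=1`), scikit-learn's `f1_score` reports the F1 of the class labelled 1 only. The sketch below recomputes that quantity by hand on made-up predictions. It assumes the two classes are labelled 1 and 2, which this notebook does not state explicitly; the helper `f1_for_class` and the toy labels are hypothetical.

```python
def f1_for_class(y_true, y_pred, pos_label=1):
    """F1 of a single class, computed from true/false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels (assumed to be 1 and 2), not the notebook's real predictions.
y_true = [1, 1, 1, 2, 2, 2]
y_pred = [1, 1, 2, 2, 2, 2]

f1_class_1 = f1_for_class(y_true, y_pred, pos_label=1)
f1_class_2 = f1_for_class(y_true, y_pred, pos_label=2)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f1_class_1, f1_class_2, accuracy)
```

The two per-class scores differ on the same predictions, so the reported F1 depends on which label is treated as positive.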


Next, we try the ROCKET transform from sktime and train the same two classifiers on the features it produces.

In [7]:
def create_x_y_for_rocket(name):
    '''
    Read a whitespace-separated txt file and return X, y
    in the nested DataFrame format that sktime's Rocket expects.
    '''

    with open(name, "r") as file:
        content = file.read()

    # Flatten the whole file into one list of floats.
    data = [float(x) for x in content.split()]

    # Each record is 432 values long: 1 class label followed by 431 readings.
    class_labels = [data[i] for i in range(0, len(data), 432)]
    feature_values = [pd.Series(data[i+1:i+432]) for i in range(0, len(data), 432)]

    # One column ('dim_0') whose cells each hold a full series.
    y = pd.DataFrame({'Class': class_labels})
    X = pd.DataFrame({'dim_0': feature_values})
    return X, y
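
To make the nested format concrete, here is a minimal sketch that builds the same structure from a made-up flat list of values. The record length of 4 (1 label + 3 readings) and the numbers themselves are hypothetical; the real files use 432 values per record.

```python
import pandas as pd

# Hypothetical flattened file contents: two records of 1 label + 3 readings.
values = [1.0, 0.1, 0.2, 0.3,
          2.0, 0.4, 0.5, 0.6]
record_len = 4

starts = range(0, len(values), record_len)
labels = [values[i] for i in starts]
series = [pd.Series(values[i + 1:i + record_len]) for i in starts]

X = pd.DataFrame({"dim_0": series})   # one row per series, one cell per row
y = pd.DataFrame({"Class": labels})

print(X.shape)                # (2, 1)
print(X.iloc[1, 0].tolist())  # [0.4, 0.5, 0.6]
```

Each row of `X` is one time series, stored as a whole `pd.Series` inside a single cell, so the frame has one column regardless of the series length.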

In [8]:
X_train, y_train = create_x_y_for_rocket('Ham_TRAIN.txt')
X_test, y_test = create_x_y_for_rocket('Ham_TEST.txt')

In [9]:
# Fit the transform on the training series only, then apply it to both splits.
rocket = Rocket()
rocket.fit(X_train)
X_train_transform = rocket.transform(X_train)
X_test_transform = rocket.transform(X_test)
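
For intuition about what the transform computes, the following is a simplified sketch of a single ROCKET-style kernel in NumPy. It is not sktime's implementation: padding is omitted, the dilation is fixed by hand rather than sampled, and the helper `apply_kernel` and the sine-wave input are hypothetical. The idea it illustrates is that a random, mean-centred kernel is slid over the series and summarised by two features, the proportion of positive values (PPV) and the maximum.

```python
import numpy as np

def apply_kernel(series, weights, bias, dilation):
    """Dilated convolution without padding; returns (ppv, max) for one kernel."""
    span = (len(weights) - 1) * dilation
    out = np.array([
        bias + np.dot(weights, series[i:i + span + 1:dilation])
        for i in range(len(series) - span)
    ])
    return float((out > 0).mean()), float(out.max())

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 6 * np.pi, 100))  # stand-in for one time series

# Draw one random kernel: random length, centred Gaussian weights, uniform bias.
length = rng.choice([7, 9, 11])
weights = rng.normal(size=length)
weights -= weights.mean()
bias = rng.uniform(-1, 1)

ppv, peak = apply_kernel(series, weights, bias, dilation=2)
print(ppv, peak)
```

Repeating this for many random kernels yields two features per kernel for every series, and that feature table is what the classifiers below are trained on.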

In [10]:
# CatBoost on the ROCKET features

model = CatBoostClassifier(iterations=100, random_state=42)

model.fit(X_train_transform, y_train, verbose=False)

y_pred = model.predict(X_test_transform)

f1 = f1_score(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

print("F1 Score:", f1)
print("Accuracy:", accuracy)

F1 Score: 0.7474747474747475
Accuracy: 0.7619047619047619


In [11]:
# Random forest on the ROCKET features

model = RandomForestClassifier(n_estimators=100, random_state=42)

model.fit(X_train_transform, y_train.values.ravel())

y_pred = model.predict(X_test_transform)

f1 = f1_score(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

print("F1 Score:", f1)
print("Accuracy:", accuracy)

F1 Score: 0.7547169811320754
Accuracy: 0.7523809523809524
