The data and variable names are in different files; you will likely need both. The goal is to predict the age of abalone from the other variables in the dataset, because the traditional method of aging them (cutting the shell through the cone, staining it, and counting the rings under a microscope) is boring and tedious.

There are two challenges (in my opinion):

1. Build the best bagging-based model you can (random forests count) to predict age.

2. The UC Irvine Machine Learning Repository tags this dataset for classification (its metadata lists both "Classification" and "Regression" tasks), but age is stored as a numeric, albeit discrete-valued, variable: the ring count `Rings`. So it seems reasonable to treat this as a regression problem. It's up to you!
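For the second challenge, one way to keep both framings open is to fit a regressor and round its output to whole ring counts when a class label is needed. This is a minimal sketch, not the notebook's method: the helper names `rings_to_age` and `to_ring_class` are hypothetical, and the age conversion assumes the UCI description's rule that age in years equals rings + 1.5.

```python
import numpy as np


def rings_to_age(rings):
    """Convert ring counts to age in years (assumes UCI rule: age = rings + 1.5)."""
    return np.asarray(rings, dtype=float) + 1.5


def to_ring_class(pred, y_train):
    """Turn continuous ring predictions into integer classes seen in training.

    Predictions are rounded to the nearest integer (np.rint rounds exact
    halves to the nearest even integer) and clipped to the [min, max] range
    of the training targets, so they can be scored with classification metrics.
    """
    y_train = np.asarray(y_train)
    rounded = np.rint(np.asarray(pred, dtype=float))
    return np.clip(rounded, y_train.min(), y_train.max()).astype(int)


# Example with made-up numbers (not model output)
y_train = np.array([3, 7, 9, 15])
print(to_ring_class([2.4, 8.6, 16.2], y_train))  # -> 3, 9, 15
print(rings_to_age([9, 10]))                      # -> 10.5, 11.5
```

Rounded predictions can then be scored with a classification metric such as `sklearn.metrics.accuracy_score`, while the unrounded ones keep the regression MAE.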

In [7]:
%pip install ucimlrepo



In [20]:
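# import packages and fetch the dataset

from ucimlrepo import fetch_ucirepo
import pandas as pd
import numpy as np

# fetch dataset (UCI id 1 = Abalone)
abalone = fetch_ucirepo(id=1)

# data (as pandas dataframes)
X = abalone.data.features
# target: ring count as a 1-D Series (age in years = Rings + 1.5)
y = abalone.data.targets["Rings"]

# metadata
print(abalone.metadata)

# variable information
print(abalone.variables)

# one-hot encode the categorical Sex column (M, F, I)
X = pd.get_dummies(X)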


{'uci_id': 1, 'name': 'Abalone', 'repository_url': 'https://archive.ics.uci.edu/dataset/1/abalone', 'data_url': 'https://archive.ics.uci.edu/static/public/1/data.csv', 'abstract': 'Predict the age of abalone from physical measurements', 'area': 'Biology', 'tasks': ['Classification', 'Regression'], 'characteristics': ['Tabular'], 'num_instances': 4177, 'num_features': 8, 'feature_types': ['Categorical', 'Integer', 'Real'], 'demographics': [], 'target_col': ['Rings'], 'index_col': None, 'has_missing_values': 'no', 'missing_values_symbol': None, 'year_of_dataset_creation': 1994, 'last_updated': 'Mon Aug 28 2023', 'dataset_doi': '10.24432/C55C7W', 'creators': ['Warwick Nash', 'Tracy Sellers', 'Simon Talbot', 'Andrew Cawthorn', 'Wes Ford'], 'intro_paper': None, 'additional_info': {'summary': 'Predicting the age of abalone from physical measurements.  The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- 
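`pd.get_dummies(X)` at the end of the cell replaces the categorical `Sex` column (`M`, `F`, `I` for male, female, infant) with one indicator column per level, since scikit-learn's tree ensembles need numeric inputs. The toy frame below uses made-up values just to show the resulting columns.

```python
import pandas as pd

# Made-up rows mimicking two abalone columns (not real data)
toy = pd.DataFrame({
    "Sex": ["M", "F", "I", "M"],
    "Length": [0.455, 0.53, 0.33, 0.44],
})

encoded = pd.get_dummies(toy)
# Numeric columns are kept; Sex becomes one indicator column per level
print(encoded.columns.tolist())  # ['Length', 'Sex_F', 'Sex_I', 'Sex_M']
# Every row has exactly one indicator set
print(encoded[["Sex_F", "Sex_I", "Sex_M"]].sum(axis=1).tolist())  # [1, 1, 1, 1]
```

In pandas 2.x the indicator columns are boolean; scikit-learn treats them as 0/1.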

## Bagging model:

In [12]:
%pip install scikit-learn



In [13]:
from numpy import mean, std
from sklearn.model_selection import cross_val_score, RepeatedKFold
from sklearn.ensemble import BaggingRegressor

In [22]:

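# bagged decision trees (the default base estimator), 50 bootstrap replicates
model = BaggingRegressor(n_estimators=50)
# evaluate the model with 10-fold cross-validation repeated 3 times
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')
# report performance (in rings); scikit-learn returns negated MAE, so flip the sign
print('MAE: %.3f (%.3f)' % (-mean(n_scores), std(n_scores)))

MAE: 1.553 (0.072)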
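The cross-validated MAE above is for plain bagged trees. Since the first challenge also allows random forests, a natural next step is to score both under the same resampling scheme. This is a sketch under assumptions: the function name `compare_bagging_models` is hypothetical, and `max_features=1/3` for the forest is an assumed setting, not something the notebook tuned.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score


def compare_bagging_models(X, y, n_estimators=50, seed=1, n_jobs=-1):
    """Cross-validated MAE (mean, std) for plain bagging and a random forest."""
    candidates = {
        # Plain bagging: each tree is fit to a bootstrap sample using all features
        "bagging": BaggingRegressor(n_estimators=n_estimators, random_state=seed),
        # Random forest: bagging plus a random subset of features at each split.
        # max_features=1/3 is an assumed setting (R's randomForest regression default);
        # scikit-learn's regression default (1.0) would make this plain bagged trees.
        "random_forest": RandomForestRegressor(
            n_estimators=n_estimators, max_features=1 / 3, random_state=seed
        ),
    }
    # Same resampling scheme as the notebook: 10 folds repeated 3 times
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=seed)
    results = {}
    for name, model in candidates.items():
        scores = cross_val_score(
            model, X, y, scoring="neg_mean_absolute_error", cv=cv, n_jobs=n_jobs
        )
        # scikit-learn reports negated MAE, so flip the sign back
        results[name] = (float(-np.mean(scores)), float(np.std(scores)))
    return results
```

With the notebook's encoded features, `compare_bagging_models(X, y)` returns a dictionary of `(mean MAE, std)` pairs that can be compared directly with the 1.553 printed above.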


In [26]:
# fit on all the data and predict the same rows (in-sample, so not a test of accuracy)
model.fit(X, y)
predictions = model.predict(X)
avg_predictions = mean(predictions)  # mean predicted ring count
print(avg_predictions)

9.942312664591812


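The average predicted ring count is about 9.94. Since age in years is rings + 1.5 (per the UCI dataset description), the average predicted age is roughly 11.44 years. These are in-sample predictions, so they describe the training data rather than how well the model generalizes.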
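Because the prediction above comes from the same rows the model was trained on, it says little about accuracy on new abalone. Bagging offers a built-in alternative: each row is left out of roughly a third of the bootstrap samples (about e^-1 ≈ 37%), so the estimators that never saw it can predict it. A minimal sketch using scikit-learn's `oob_score=True` option; the function name `oob_summary` is hypothetical, and the age conversion assumes the UCI rule age = rings + 1.5.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_absolute_error


def oob_summary(X, y, n_estimators=50, seed=1):
    """Fit a bagging model and summarise its out-of-bag (OOB) predictions."""
    y = np.asarray(y, dtype=float).ravel()
    model = BaggingRegressor(n_estimators=n_estimators, oob_score=True,
                             random_state=seed)
    model.fit(X, y)
    # Each row is predicted only by the estimators whose bootstrap sample left it out
    oob_pred = model.oob_prediction_
    return {
        "oob_r2": model.oob_score_,  # R^2 of the OOB predictions
        "oob_mae_rings": mean_absolute_error(y, oob_pred),
        # assumes age in years = rings + 1.5
        "mean_oob_age_years": float(np.mean(oob_pred) + 1.5),
    }
```

Calling `oob_summary(X, y)` on the notebook's data gives an out-of-sample MAE from a single fit, which should land near the cross-validated figure rather than the optimistic in-sample one.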