# Abalone UCI
### Predicting the age of abalone from physical measurements.  

***

## Context

Abalone is the common name for a group of small to very large sea snails found along coasts around the world and eaten as a delicacy. The leftover shells are used in jewelry because of their iridescent luster.

Because of its demand and economic value, abalone is often farmed, which makes it useful to predict an abalone's age from physical measurements. The traditional method for determining age is to cut the shell through the cone, stain it, and count the rings under a microscope, a tedious and time-consuming task.


<img src="abalone_rings.jpg" width="500" />

Given *data about abalone*, let's try to predict **multiple attributes** of a given organism.  
  
We will use linear regression and logistic regression models to make our predictions.

## Data

Number of instances: 4177

Number of attributes: 8

Target: Rings

Note: The number of rings is the value to predict, either as a continuous value (regression) or, after conversion, as a classification problem.
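
One way to turn the ring count into a classification target is to bin it into age groups. The sketch below is only an illustration: the cut points and the labels `young`, `adult`, and `old` are assumptions made for this example and are not part of the dataset or of the original analysis.

```python
import pandas as pd

# A few ring counts for illustration (toy values, not the full dataset)
rings = pd.Series([15, 7, 9, 10, 7], name="rings")

# Hypothetical cut points, chosen only for this example
bins = [0, 8, 10, float("inf")]
labels = ["young", "adult", "old"]

# pd.cut uses right-inclusive intervals by default: (0, 8], (8, 10], (10, inf]
age_group = pd.cut(rings, bins=bins, labels=labels)
print(age_group.tolist())
```

With fewer, broader classes, a classifier has more examples per class than when every distinct ring count is treated as its own label.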

<img src="abalone_partes.jpg" width="500" />

| Attribute | Data type | Units | Description |
| :----: | :----: | :----: | :----: |
| Sex | nominal | - | M, F, and I (infant) | 
| Length | continuous | mm | Longest shell measurement | 
| Diameter | continuous | mm | perpendicular to length | 
| Height | continuous | mm | with meat in shell | 
| Whole weight | continuous | grams | whole abalone | 
| Shucked weight | continuous | grams | weight of meat |
| Viscera weight | continuous | grams | gut weight (after bleeding) |
| Shell weight | continuous | grams | after being dried |
| Rings | integer | - | +1.5 gives the age in years | 
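
The last row of the table implies a simple conversion from ring count to age. A minimal sketch, using three ring counts taken from the first rows of the dataset; the column name `age_years` is a hypothetical name introduced here:

```python
import pandas as pd

# Three rows for illustration; only the `rings` column matters here
sample = pd.DataFrame({"sex": ["M", "M", "F"], "rings": [15, 7, 9]})

# Per the attribute table, age in years is approximately rings + 1.5
sample["age_years"] = sample["rings"] + 1.5
print(sample)
```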



## Getting Started

In [3]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O

#--------------------------
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LinearRegression, LogisticRegression
#---------------------------

In [4]:
data = pd.read_csv('abalone_original.csv')

In [5]:
data

Unnamed: 0,sex,length,diameter,height,whole-weight,shucked-weight,viscera-weight,shell-weight,rings
0,M,91,73,19,102.8,44.9,20.2,30.0,15
1,M,70,53,18,45.1,19.9,9.7,14.0,7
2,F,106,84,27,135.4,51.3,28.3,42.0,9
3,M,88,73,25,103.2,43.1,22.8,31.0,10
4,I,66,51,16,41.0,17.9,7.9,11.0,7
...,...,...,...,...,...,...,...,...,...
4172,F,113,90,33,177.4,74.0,47.8,49.8,11
4173,M,118,88,27,193.2,87.8,42.9,52.1,10
4174,M,120,95,41,235.2,105.1,57.5,61.6,9
4175,F,125,97,30,218.9,106.2,52.2,59.2,10


In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4177 entries, 0 to 4176
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   sex             4177 non-null   object 
 1   length          4177 non-null   int64  
 2   diameter        4177 non-null   int64  
 3   height          4177 non-null   int64  
 4   whole-weight    4177 non-null   float64
 5   shucked-weight  4177 non-null   float64
 6   viscera-weight  4177 non-null   float64
 7   shell-weight    4177 non-null   float64
 8   rings           4177 non-null   int64  
dtypes: float64(4), int64(4), object(1)
memory usage: 293.8+ KB


# Preprocessing + Training Function

In [None]:
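def preprocess_and_train(df, target, task):
    """Fit a linear (regression) or logistic (classification) model that
    predicts `target` from the remaining columns; return its test-set score."""
    df = df.copy()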
    
    # If the sex column is not the target, one-hot encode it
    if target != 'sex':
        dummies = pd.get_dummies(df['sex'])
        df = pd.concat([df, dummies], axis=1)
        df = df.drop('sex', axis=1)
    
    # Split target from df
    y = df[target].copy()
    X = df.drop(target, axis=1).copy()
    
    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=1)
    
    # Scale X
    scaler = StandardScaler()
    scaler.fit(X_train)
    X_train = pd.DataFrame(scaler.transform(X_train), columns=X.columns)
    X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns)
    
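    # Define model
    if task == 'regression':
        model = LinearRegression()
    elif task == 'classification':
        model = LogisticRegression()
    else:
        # Fail early instead of hitting an undefined `model` below
        raise ValueError("task must be 'regression' or 'classification'")
    
    # Fit model to train set
    model.fit(X_train, y_train)
    
    # Return the test score: R^2 for regression, accuracy for classification
    return model.score(X_test, y_test)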
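
The function returns `model.score(...)`, whose meaning depends on the model type: for scikit-learn regressors it is the coefficient of determination (R^2), and for classifiers it is the mean accuracy. The sketch below checks this on synthetic data (randomly generated, not the abalone dataset); the variable names are chosen for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score

# Synthetic features and targets, used only to compare the two score definitions
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y_cont = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
y_class = (y_cont > 0).astype(int)

reg = LinearRegression().fit(X, y_cont)
clf = LogisticRegression().fit(X, y_class)

# .score() should agree with the explicit metric for each model type
reg_matches = np.isclose(reg.score(X, y_cont), r2_score(y_cont, reg.predict(X)))
clf_matches = np.isclose(clf.score(X, y_class), accuracy_score(y_class, clf.predict(X)))
print(reg_matches, clf_matches)
```

This is why the cells that follow print the regression results as R^2 and the classification results as a percentage.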

# Predicting Sex Column

In [None]:
data

In [None]:
results = preprocess_and_train(data, target='sex', task='classification')

print("Sex Classification Accuracy: {:.2f}%".format(results * 100))
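
An accuracy figure is easier to interpret next to a trivial baseline. The sketch below, which is an addition and not part of the original analysis, uses scikit-learn's `DummyClassifier` to always predict the most frequent class. The labels are a small toy sample made up for illustration, so the number it prints says nothing about the real dataset.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy sex labels for illustration only (not the real class distribution)
y = np.array(["M", "M", "F", "M", "I", "F", "M", "I"])
X = np.zeros((len(y), 1))  # features are ignored by the dummy model

# Always predict the most frequent label seen during fit
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
baseline_acc = baseline.score(X, y)
print("Baseline accuracy: {:.2f}%".format(baseline_acc * 100))
```

Fitting the same baseline on the training split of `data` and scoring it on the test split would give a reference point for the logistic regression accuracy above.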

# Predicting Length Column

In [None]:
data

In [None]:
results = preprocess_and_train(data, target='length', task='regression')

print("Length Regression R^2: {:.4f}".format(results))

# Predicting Diameter Column

In [None]:
data

In [None]:
results = preprocess_and_train(data, target='diameter', task='regression')

print("Diameter Regression R^2: {:.4f}".format(results))

# Predicting Height Column

In [None]:
data

In [None]:
results = preprocess_and_train(data, target='height', task='regression')

print("Height Regression R^2: {:.4f}".format(results))

# Predicting Whole-Weight Column

In [None]:
data

In [None]:
results = preprocess_and_train(data, target='whole-weight', task='regression')

print("Whole-Weight Regression R^2: {:.4f}".format(results))

# Predicting Shucked-Weight Column

In [None]:
data

In [None]:
results = preprocess_and_train(data, target='shucked-weight', task='regression')

print("Shucked-Weight Regression R^2: {:.4f}".format(results))

# Predicting Viscera-Weight Column

In [None]:
data

In [None]:
results = preprocess_and_train(data, target='viscera-weight', task='regression')

print("Viscera-Weight Regression R^2: {:.4f}".format(results))

# Predicting Shell-Weight Column

In [None]:
data

In [None]:
results = preprocess_and_train(data, target='shell-weight', task='regression')

print("Shell-Weight Regression R^2: {:.4f}".format(results))

# Predicting Rings Column

In [None]:
data

In [None]:
results = preprocess_and_train(data, target='rings', task='regression')

print("Rings Regression R^2: {:.4f}".format(results))

In [None]:
results = preprocess_and_train(data, target='rings', task='classification')

print("Rings Classification Accuracy: {:.2f}%".format(results * 100))

# References

***

The main Kaggle reference is: https://www.kaggle.com/hurshd0/abalone-uci

The dataset originally comes from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Abalone
