## ARTIFICIAL NEURAL NETWORKS

### Tasks :

#### 1. Data Exploration and Preprocessing

●	Begin by loading and exploring the "Alphabets_data.csv" dataset. Summarize its key features such as the number of samples, features, and classes.

●	Execute necessary data preprocessing steps including data normalization, managing missing values.

#### 2. Model Implementation

●	Construct a basic ANN model using your chosen high-level neural network library. Ensure your model includes at least one hidden layer.

●	Divide the dataset into training and test sets.

●	Train your model on the training set and then use it to make predictions on the test


#### 3. Hyperparameter Tuning

●	Modify various hyperparameters, such as the number of hidden layers, neurons per hidden layer, activation functions, and learning rate, to observe their impact on model performance.

●	Adopt a structured approach like grid search or random search for hyperparameter tuning, documenting your methodology thoroughly.

#### 4. Evaluation

●	Employ suitable metrics such as accuracy, precision, recall, and F1-score to evaluate your model's performance.

●	Discuss the performance differences between the model with default hyperparameters and the tuned model, emphasizing the effects of hyperparameter tuning.


In [1]:
import pandas as pd

# Load the dataset
data = pd.read_csv('Alphabets_data.csv')

# Display basic information about the dataset
print("Number of samples:", data.shape[0])
print("Number of features:", data.shape[1])
print("\nColumns:")
print(data.columns)
print("\nData types:")
print(data.dtypes)
print("\nFirst few rows of the dataset:")
print(data.head())


Number of samples: 20000
Number of features: 17

Columns:
Index(['letter', 'xbox', 'ybox', 'width', 'height', 'onpix', 'xbar', 'ybar',
       'x2bar', 'y2bar', 'xybar', 'x2ybar', 'xy2bar', 'xedge', 'xedgey',
       'yedge', 'yedgex'],
      dtype='object')

Data types:
letter    object
xbox       int64
ybox       int64
width      int64
height     int64
onpix      int64
xbar       int64
ybar       int64
x2bar      int64
y2bar      int64
xybar      int64
x2ybar     int64
xy2bar     int64
xedge      int64
xedgey     int64
yedge      int64
yedgex     int64
dtype: object

First few rows of the dataset:
  letter  xbox  ybox  width  height  onpix  xbar  ybar  x2bar  y2bar  xybar  \
0      T     2     8      3       5      1     8    13      0      6      6   
1      I     5    12      3       7      2    10     5      5      4     13   
2      D     4    11      6       8      6    10     6      2      6     10   
3      N     7    11      6       6      3     5     9      4      6      4   


In [2]:
# Display the first few rows of the dataset
print(data.head())

# Display summary information about the dataset
print(data.info())

# Display summary statistics of the dataset
print(data.describe())


  letter  xbox  ybox  width  height  onpix  xbar  ybar  x2bar  y2bar  xybar  \
0      T     2     8      3       5      1     8    13      0      6      6   
1      I     5    12      3       7      2    10     5      5      4     13   
2      D     4    11      6       8      6    10     6      2      6     10   
3      N     7    11      6       6      3     5     9      4      6      4   
4      G     2     1      3       1      1     8     6      6      6      6   

   x2ybar  xy2bar  xedge  xedgey  yedge  yedgex  
0      10       8      0       8      0       8  
1       3       9      2       8      4      10  
2       3       7      3       7      3       9  
3       4      10      6      10      2       8  
4       5       9      1       7      5      10  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 17 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   letter  20000 non-null  object
 1   xbo

In [3]:
# Check for missing values
print(data.isnull().sum())

# Option 1: Drop rows with missing values
data.dropna(inplace=True)

# Option 2: Impute missing values (e.g., with mean for numerical columns)
data.fillna(data.mean(), inplace=True)


letter    0
xbox      0
ybox      0
width     0
height    0
onpix     0
xbar      0
ybar      0
x2bar     0
y2bar     0
xybar     0
x2ybar    0
xy2bar    0
xedge     0
xedgey    0
yedge     0
yedgex    0
dtype: int64


  data.fillna(data.mean(), inplace=True)


In [6]:
from sklearn.preprocessing import MinMaxScaler

# Identify numerical columns
numerical_columns = data.select_dtypes(include=['float64', 'int64']).columns

# Initialize the scaler
scaler = MinMaxScaler()

# Normalize the numerical columns
data[numerical_columns] = scaler.fit_transform(data[numerical_columns])

# Display the first few rows of the normalized dataset
print(data.head())


  letter      xbox      ybox  width    height     onpix      xbar      ybar  \
0      T  0.133333  0.533333    0.2  0.333333  0.066667  0.533333  0.866667   
1      I  0.333333  0.800000    0.2  0.466667  0.133333  0.666667  0.333333   
2      D  0.266667  0.733333    0.4  0.533333  0.400000  0.666667  0.400000   
3      N  0.466667  0.733333    0.4  0.400000  0.200000  0.333333  0.600000   
4      G  0.133333  0.066667    0.2  0.066667  0.066667  0.533333  0.400000   

      x2bar     y2bar     xybar    x2ybar    xy2bar     xedge    xedgey  \
0  0.000000  0.400000  0.400000  0.666667  0.533333  0.000000  0.533333   
1  0.333333  0.266667  0.866667  0.200000  0.600000  0.133333  0.533333   
2  0.133333  0.400000  0.666667  0.200000  0.466667  0.200000  0.466667   
3  0.266667  0.400000  0.266667  0.266667  0.666667  0.400000  0.666667   
4  0.400000  0.400000  0.400000  0.333333  0.600000  0.066667  0.466667   

      yedge    yedgex  
0  0.000000  0.533333  
1  0.266667  0.666667  
2 

In [17]:
# Identify non-numeric columns
non_numeric_columns = data.select_dtypes(exclude=['float64', 'int64']).columns

# One-Hot Encode non-numeric columns
data = pd.get_dummies(data, columns=non_numeric_columns, drop_first=True)


In [19]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Load the dataset
data = pd.read_csv('Alphabets_data.csv')

# Identify non-numeric columns
non_numeric_columns = data.select_dtypes(exclude=['float64', 'int64']).columns

# One-Hot Encode non-numeric columns
data = pd.get_dummies(data, columns=non_numeric_columns, drop_first=True)

# Split the data into features (X) and target (y)
X = data.drop('xbox', axis=1)  # Replace 'target_column' with your actual target column name
y = data['xbox']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize the feature data
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize the ANN model
model = Sequential()

# Add the input layer and the first hidden layer
model.add(Dense(units=64, activation='relu', input_dim=X_train.shape[1]))

# Add the second hidden layer (optional)
model.add(Dense(units=32, activation='relu'))

# Add the output layer
model.add(Dense(units=1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)

# Evaluate the model on the test set
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Test Loss: {loss}')
print(f'Test Accuracy: {accuracy}')


Epoch 1/50


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3ms/step - accuracy: 0.0638 - loss: -630.9511 - val_accuracy: 0.0591 - val_loss: -10911.1152
Epoch 2/50
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 0.0641 - loss: -24959.1836 - val_accuracy: 0.0591 - val_loss: -93030.9766
Epoch 3/50
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 0.0633 - loss: -137149.5156 - val_accuracy: 0.0591 - val_loss: -300665.5312
Epoch 4/50
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 0.0689 - loss: -382298.1875 - val_accuracy: 0.0591 - val_loss: -670194.5000
Epoch 5/50
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 0.0659 - loss: -798359.9375 - val_accuracy: 0.0591 - val_loss: -1226792.5000
Epoch 6/50
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 0.0630 - loss: -1418292.7500 - val_accuracy: 0.0

Epoch 47/50
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - accuracy: 0.0608 - loss: -325665248.0000 - val_accuracy: 0.0591 - val_loss: -335972832.0000
Epoch 48/50
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - accuracy: 0.0619 - loss: -345169024.0000 - val_accuracy: 0.0591 - val_loss: -354591040.0000
Epoch 49/50
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - accuracy: 0.0688 - loss: -363007040.0000 - val_accuracy: 0.0591 - val_loss: -373874656.0000
Epoch 50/50
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 0.0667 - loss: -384204448.0000 - val_accuracy: 0.0591 - val_loss: -393827584.0000
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - accuracy: 0.0603 - loss: -389917312.0000
Test Loss: -397441760.0
Test Accuracy: 0.05999999865889549


In [21]:
!pip install tensorflow



In [27]:
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier


ModuleNotFoundError: No module named 'tensorflow.keras.wrappers'

In [23]:
# Define the create_model function with hyperparameters
def create_model(hidden_layers=1, neurons_per_layer=64, activation='relu', learning_rate=0.001):
    model = Sequential()
    model.add(Dense(neurons_per_layer, activation=activation, input_dim=X_train.shape[1]))
    for _ in range(hidden_layers - 1):
        model.add(Dense(neurons_per_layer, activation=activation))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer=Adam(learning_rate=learning_rate), loss='binary_crossentropy', metrics=['accuracy'])
    return model


In [24]:
# Create a KerasClassifier based on the create_model function
model = KerasClassifier(build_fn=create_model, verbose=0)

# Define the hyperparameters grid
param_grid = {
    'hidden_layers': [1, 2, 3],
    'neurons_per_layer': [32, 64, 128],
    'activation': ['relu', 'sigmoid'],
    'learning_rate': [0.001, 0.01, 0.1]
}


NameError: name 'KerasClassifier' is not defined

In [25]:
# Perform grid search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X_train, y_train)


NameError: name 'param_grid' is not defined

In [26]:

# Print the best hyperparameters found
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))


NameError: name 'grid_result' is not defined

In [33]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Load the dataset
data = pd.read_csv('Alphabets_data.csv')

# Preprocess the data
label_encoder = LabelEncoder()
data['xbox'] = label_encoder.fit_transform(data['xbox'])

X = data.drop('xbox', axis=1)
X = pd.get_dummies(X)  # One-hot encoding for categorical variables
y = data['xbox']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Random Forest classifier with default hyperparameters
rf = RandomForestClassifier(random_state=42)

# Train the model
rf.fit(X_train, y_train)

# Evaluate the model on the test set
y_pred = rf.predict(X_test)
accuracy_default = accuracy_score(y_test, y_pred)
precision_default = precision_score(y_test, y_pred, average='macro')
recall_default = recall_score(y_test, y_pred, average='macro')
f1_default = f1_score(y_test, y_pred, average='macro')

# Perform hyperparameter tuning using GridSearchCV
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

# Train the tuned model
best_rf = grid_search.best_estimator_
best_rf.fit(X_train, y_train)

# Evaluate the tuned model on the test set
y_pred_tuned = best_rf.predict(X_test)
accuracy_tuned = accuracy_score(y_test, y_pred_tuned)
precision_tuned = precision_score(y_test, y_pred_tuned, average='macro')
recall_tuned = recall_score(y_test, y_pred_tuned, average='macro')
f1_tuned = f1_score(y_test, y_pred_tuned, average='macro')

# Print the performance metrics
print("Default model:")
print(f"Accuracy: {accuracy_default}, Precision: {precision_default}, Recall: {recall_default}, F1-score: {f1_default}")

print("\nTuned model:")
print(f"Accuracy: {accuracy_tuned}, Precision: {precision_tuned}, Recall: {recall_tuned}, F1-score: {f1_tuned}")


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Default model:
Accuracy: 0.74825, Precision: 0.5933095529909457, Recall: 0.5436640155436755, F1-score: 0.5630152292607081

Tuned model:
Accuracy: 0.75425, Precision: 0.6021466618240614, Recall: 0.5451056918317714, F1-score: 0.5680857162466465


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
