The `drive.mount('/content/drive')` command connects the Colab notebook to the user's Google Drive. Files stored on Drive can then be read and written directly from the Colab environment under `/content/drive`.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


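This cell imports the libraries used for data handling and model building:

Imports `read_csv` from pandas, and pandas itself as `pd`, for reading CSV files.
Imports NumPy as `np` for numerical operations.
Imports `train_test_split` from scikit-learn for splitting the data into training and test sets.
Sets NumPy's global random seed to 42. When the cells are run in order, later operations that draw on NumPy's global generator, such as `train_test_split` called without `random_state`, then give the same result on every run.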

In [None]:
#importing packages
from pandas import read_csv
import numpy as np
from sklearn.model_selection import train_test_split
import pandas as pd
np.random.seed(42)
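
A minimal sketch of why the global seed matters here. It uses a small synthetic array; the names `X_demo` and `y_demo` are illustrative and are not from the notebook. When `train_test_split` is called without `random_state`, scikit-learn draws from NumPy's global random generator, so reseeding with `np.random.seed` reproduces the same split. Passing `random_state` explicitly gives the same guarantee without relying on global state.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative toy data (not the notebook's dataset)
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0, 1] * 10)

# Same global seed -> same split when random_state is omitted
np.random.seed(42)
split_a = train_test_split(X_demo, y_demo, test_size=0.1)
np.random.seed(42)
split_b = train_test_split(X_demo, y_demo, test_size=0.1)
same_global = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))

# An explicit random_state is reproducible without touching global state
split_c = train_test_split(X_demo, y_demo, test_size=0.1, random_state=42)
split_d = train_test_split(X_demo, y_demo, test_size=0.1, random_state=42)
same_explicit = all(np.array_equal(c, d) for c, d in zip(split_c, split_d))

print(same_global, same_explicit)
```

The explicit `random_state` is usually the safer choice. Re-running a single notebook cell out of order does not reset the global seed.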

The next cell reads the cancer dataset from the mounted Google Drive into a pandas DataFrame named `Data`.

In [None]:
import pandas as pd
Data=pd.read_csv('/content/drive/MyDrive/dataset/cancer.csv')
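
Before splitting, it can help to check the shape, missing values, and class balance of `Data`. The sketch below uses a hypothetical helper `summarize` and a tiny made-up DataFrame that stands in for the real one. The target column name `results` is taken from the split cell below.

```python
import pandas as pd

def summarize(df: pd.DataFrame, target: str = "results") -> dict:
    """Hypothetical helper: basic facts about a DataFrame before modelling."""
    return {
        "shape": df.shape,
        "missing_values": int(df.isna().sum().sum()),
        "class_counts": df[target].value_counts().sort_index().to_dict(),
    }

# Tiny made-up frame standing in for Data
demo = pd.DataFrame({
    "feature_a": [1.2, 3.4, None, 0.5],
    "feature_b": [10, 20, 30, 40],
    "results": [0, 1, 0, 0],
})
info = summarize(demo)
print(info)
```

With the notebook's data, the call would be `summarize(Data)`.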

This cell separates the features (every column except `results`) from the target variable (`results`). It then splits both into training and test sets, reserving 10% of the rows for testing and 90% for training. Finally, it prints the shapes of both sets: 512 training rows and 57 test rows, each with 31 feature columns.

In [None]:
#Train Test split

X = Data.drop("results",axis=1)
y=Data["results"]
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.1)
print('Train',X_train.shape,y_train.shape)
print('Test',X_test.shape,y_test.shape)

Train (512, 31) (512,)
Test (57, 31) (57,)
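
The split above is not stratified, so the class balance of the 57-row test set depends on chance. The sketch below shows a stratified alternative. It runs on scikit-learn's bundled breast cancer dataset as a stand-in; it is not claimed to be the same data as `cancer.csv`. Passing `stratify=y` keeps the class proportions of the train and test sets close to those of the full data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Stand-in data; the notebook uses its own cancer.csv instead
X_bc, y_bc = load_breast_cancer(return_X_y=True)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_bc, y_bc, test_size=0.1, stratify=y_bc, random_state=42
)

# Fraction of class 1 in the full data and in each split
full_ratio = y_bc.mean()
train_ratio = y_tr.mean()
test_ratio = y_te.mean()
print(X_tr.shape, X_te.shape)
print(round(full_ratio, 3), round(train_ratio, 3), round(test_ratio, 3))
```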


In [None]:
#pre-processing
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X_train=sc.fit_transform(X_train)
X_test=sc.transform(X_test)



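
The scaler is fitted on the training set only and then reused on the test set. Both sets are therefore standardized with the training statistics: z = (x - mean_train) / std_train, where the standard deviation is the population one (ddof=0). The sketch below checks this on made-up numbers; `train` and `test` are illustrative arrays.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative arrays, not the notebook's data
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
test = np.array([[2.5, 15.0], [5.0, 50.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)

# Manual version: statistics come from the training set only (population std, ddof=0)
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_test = (test - mean) / std

print(np.allclose(test_scaled, manual_test))
print(np.allclose(scaler.mean_, mean), np.allclose(scaler.scale_, std))
```

Fitting the scaler on the test set as well would let test statistics leak into preprocessing, which is why the cell above calls `transform`, not `fit_transform`, on `X_test`.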

In [None]:
#model building
from sklearn.naive_bayes import GaussianNB
NBclassif=GaussianNB()
NBclassif.fit(X_train,y_train)
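
`GaussianNB` models each feature, within each class, as an independent normal distribution. It stores per-class means and variances and adds a smoothing term (`var_smoothing` times the largest feature variance) to every variance. It then predicts the class with the highest log prior plus summed Gaussian log-likelihoods. The sketch below re-implements that rule in NumPy, using the bundled breast cancer data as a stand-in, and checks that it agrees with scikit-learn's predictions. `gaussian_nb_predict` is a hypothetical illustration of the model, not a replacement for the library.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB

X_bc, y_bc = load_breast_cancer(return_X_y=True)  # stand-in data

def gaussian_nb_predict(X_fit, y_fit, X_new, var_smoothing=1e-9):
    """Minimal Gaussian Naive Bayes: per-class means/variances plus log priors."""
    classes = np.unique(y_fit)
    # Smoothing term added to every variance, scaled by the largest feature variance
    epsilon = var_smoothing * X_fit.var(axis=0).max()
    scores = []
    for c in classes:
        Xc = X_fit[y_fit == c]
        mean = Xc.mean(axis=0)
        var = Xc.var(axis=0) + epsilon
        log_prior = np.log(len(Xc) / len(X_fit))
        # Sum of per-feature Gaussian log densities for every row of X_new
        log_lik = (-0.5 * np.sum(np.log(2 * np.pi * var))
                   - 0.5 * np.sum((X_new - mean) ** 2 / var, axis=1))
        scores.append(log_prior + log_lik)
    return classes[np.argmax(np.column_stack(scores), axis=1)]

manual = gaussian_nb_predict(X_bc, y_bc, X_bc)
library = GaussianNB().fit(X_bc, y_bc).predict(X_bc)
agreement = np.mean(manual == library)
print(agreement)
```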


In [None]:
#prediction
ypred=NBclassif.predict(X_test)


In [None]:
#performance evaluation
from sklearn.metrics import confusion_matrix,accuracy_score
cm=confusion_matrix(y_test,ypred)
acc=accuracy_score(y_test,ypred)

In [None]:
print(cm,acc)

[[39  1]
 [ 1 16]] 0.9649122807017544
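
In scikit-learn's `confusion_matrix`, rows are the true classes and columns are the predicted classes. The diagonal therefore counts correct predictions, and accuracy is the trace divided by the total. The sketch below applies this to the matrix printed above:

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
cm_before = np.array([[39, 1],
                      [1, 16]])

correct = np.trace(cm_before)   # 39 + 16 correct predictions
total = cm_before.sum()         # all test samples
accuracy = correct / total
print(correct, total, accuracy)
```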


In [None]:
from sklearn.metrics import classification_report
classy_rep=classification_report(y_test,ypred)
print(classy_rep)

              precision    recall  f1-score   support

           0       0.97      0.97      0.97        40
           1       0.94      0.94      0.94        17

    accuracy                           0.96        57
   macro avg       0.96      0.96      0.96        57
weighted avg       0.96      0.96      0.96        57
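
Each row of the report can be derived from the same confusion matrix:

- Precision divides the diagonal by the column totals (everything predicted as that class).
- Recall divides the diagonal by the row totals (everything that truly is that class).
- F1 is the harmonic mean of precision and recall.
- Support is the row total.

The macro average is the unweighted mean over classes, and the weighted average weights each class by its support. The small NumPy sketch below reproduces the figures above before rounding:

```python
import numpy as np

# Confusion matrix from the untuned model (rows = true, columns = predicted)
cm_before = np.array([[39, 1],
                      [1, 16]])

tp = np.diag(cm_before).astype(float)
precision = tp / cm_before.sum(axis=0)   # divide by predicted-class column totals
recall = tp / cm_before.sum(axis=1)      # divide by true-class row totals
f1 = 2 * precision * recall / (precision + recall)
support = cm_before.sum(axis=1)

macro_f1 = f1.mean()
weighted_f1 = np.average(f1, weights=support)
print(precision, recall, f1, support)
print(macro_f1, weighted_f1)
```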



In [None]:
#hyperparameter tuning
import numpy as np

In [None]:
np.logspace(0,-9,10)

array([1.e+00, 1.e-01, 1.e-02, 1.e-03, 1.e-04, 1.e-05, 1.e-06, 1.e-07,
       1.e-08, 1.e-09])
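
`np.logspace(start, stop, num)` returns `10 ** np.linspace(start, stop, num)`: `num` values whose base-10 exponents are evenly spaced. That is why the output above runs from 1 down to 1e-9 in steps of one power of ten. The grid search further down uses the same construction with 100 points, which gives a constant ratio of `10 ** (-9 / 99)` between neighbours:

```python
import numpy as np

coarse = np.logspace(0, -9, 10)
same_as_power = np.allclose(coarse, 10.0 ** np.linspace(0, -9, 10))

fine = np.logspace(0, -9, 100)
# Ratio between consecutive grid values is constant
ratios = fine[1:] / fine[:-1]
print(same_as_power, np.allclose(ratios, 10 ** (-9 / 99)))
```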

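This cell sets up the cross-validation strategy: Repeated Stratified K-Fold with 5 splits repeated 3 times, giving 15 train/validation splits in total. Setting `random_state=1` makes the splits reproducible. Stratification keeps the class proportions of each fold close to those of the full data.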

In [None]:
from sklearn.model_selection import RepeatedStratifiedKFold
cv=RepeatedStratifiedKFold(n_splits=5,n_repeats=3,random_state=1)
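
The sketch below shows what this object produces, using illustrative labels (60 of class 0, 40 of class 1) instead of the notebook's data. It yields 5 × 3 = 15 splits, and because the classes divide evenly into 5 folds, every validation fold keeps the 40% share of class 1.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Illustrative labels: 60 of class 0, 40 of class 1
y_demo = np.array([0] * 60 + [1] * 40)
X_demo = np.zeros((len(y_demo), 1))  # feature values do not affect the splitting

cv_demo = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)
n_splits = cv_demo.get_n_splits()

# Share of class 1 in each validation fold
fold_ratios = [y_demo[test_idx].mean() for _, test_idx in cv_demo.split(X_demo, y_demo)]
print(n_splits, len(fold_ratios), np.round(fold_ratios, 2))
```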

This cell imports `PowerTransformer` and `GridSearchCV` from scikit-learn. `PowerTransformer` applies a power transform (Yeo-Johnson by default) to make features more Gaussian-like. `GridSearchCV` tunes hyperparameters by cross-validation.

In [None]:
from sklearn.preprocessing import PowerTransformer
from sklearn.model_selection import GridSearchCV
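
The sketch below illustrates what `PowerTransformer` does on an illustrative right-skewed feature drawn from an exponential distribution, not on the notebook's data. With the defaults (`method="yeo-johnson"`, `standardize=True`), it fits one lambda per feature to reduce skewness and then standardizes the result to zero mean and unit variance.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
# Right-skewed illustrative feature (exponential distribution)
x = rng.exponential(scale=2.0, size=(1000, 1))

pt = PowerTransformer()  # method="yeo-johnson", standardize=True by default
x_t = pt.fit_transform(x)

skew_before = float(skew(x[:, 0]))
skew_after = float(skew(x_t[:, 0]))
print(round(skew_before, 2), round(skew_after, 2), pt.lambdas_)
```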




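This cell defines a grid search for tuning the `var_smoothing` hyperparameter of the Gaussian Naive Bayes classifier (`NBclassif`). The search tries 100 values spaced logarithmically between 1 and 1e-9 (`np.logspace(0, -9, 100)`) and evaluates each with the cross-validation scheme `cv`, using accuracy as the scoring metric. The unfitted search object is stored in `grid_NB`.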

In [None]:
grid_param={'var_smoothing':np.logspace(0,-9,100)}
grid_NB=GridSearchCV(estimator=NBclassif, param_grid=grid_param,cv=cv, verbose=1, scoring='accuracy')
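
Grid search fits the model once per candidate per cross-validation split. That is where the `1500 fits` in the fitting log comes from: 100 candidate values times 15 splits. The small check below rebuilds the same grid and CV objects and confirms the arithmetic with scikit-learn's `ParameterGrid`:

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, RepeatedStratifiedKFold

grid_param = {'var_smoothing': np.logspace(0, -9, 100)}
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

n_candidates = len(ParameterGrid(grid_param))   # 100 values of var_smoothing
n_folds = cv.get_n_splits()                      # 5 splits x 3 repeats
print(n_candidates, n_folds, n_candidates * n_folds)
```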


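This cell applies a `PowerTransformer` to the test features (`X_test`, which were already standardized earlier). It then runs the grid search `grid_NB` on the transformed test features and the test labels (`y_test`). As a result, the search tunes and fits the model on the 57-row test set rather than the training set, so the test-set scores reported below are not an unbiased estimate of performance on unseen data. Fitting the search on `X_train`/`y_train` would avoid this leakage.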

In [None]:
data_trans=PowerTransformer().fit_transform(X_test)
grid_NB.fit(data_trans,y_test)


Fitting 15 folds for each of 100 candidates, totalling 1500 fits


In [None]:
grid_NB.best_score_

0.9707070707070706

In [None]:
grid_NB.best_params_

{'var_smoothing': 0.657933224657568}
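
The selected value is one of the 100 grid points. Its base-10 exponent is 2 × (-9/99) = -2/11, so it is the third value of `np.logspace(0, -9, 100)`. In `GaussianNB`, `var_smoothing` is multiplied by the largest feature variance and added to every per-class variance. On power-transformed, standardized features (variance about 1), this value therefore adds roughly 0.66 to each variance, a strong amount of smoothing. The check:

```python
import numpy as np

grid = np.logspace(0, -9, 100)
best = 0.657933224657568  # value reported by grid_NB.best_params_ above

idx = int(np.argmin(np.abs(grid - best)))
print(idx, grid[idx], 10 ** (-2 / 11))
```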

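This cell uses the tuned model in `grid_NB` to predict labels for `X_test` and stores them in `ypred`. With the default `refit=True`, `grid_NB` predicts with the best estimator refitted on the data passed to `fit`. Here `X_test` is the standardized data without the PowerTransformer applied, whereas the model was fitted on power-transformed features.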

In [None]:
ypred=grid_NB.predict(X_test)
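
A leakage-free variant, sketched under assumptions, wraps the PowerTransformer and GaussianNB in a scikit-learn `Pipeline`. The transform is then learned inside each training fold and applied automatically at prediction time, `var_smoothing` is tuned on the training split only, and the test split is used once at the end. The sketch uses the bundled breast cancer dataset as a stand-in for `cancer.csv` and a 20-point grid instead of 100 to keep the run short. The step names `power` and `nb` are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

X_bc, y_bc = load_breast_cancer(return_X_y=True)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(
    X_bc, y_bc, test_size=0.1, stratify=y_bc, random_state=42
)

pipe = Pipeline([
    ("power", PowerTransformer()),  # refitted on the training part of each split
    ("nb", GaussianNB()),
])
param_grid = {"nb__var_smoothing": np.logspace(0, -9, 20)}
cv_demo = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

search = GridSearchCV(pipe, param_grid=param_grid, cv=cv_demo, scoring="accuracy")
search.fit(X_tr, y_tr)  # tuning sees training data only

# The refitted pipeline applies the same PowerTransformer before predicting
test_acc = accuracy_score(y_te, search.predict(X_te))
print(search.best_params_, round(search.best_score_, 3), round(test_acc, 3))
```

Because the transformer is a pipeline step, `search.predict` cannot be called on untransformed features by mistake.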


This code calculates the confusion matrix (`cm`) and accuracy score (`acc`) based on the actual labels (`y_test`) and the predicted labels (`ypred`) obtained from the Naive Bayes classifier.

In [None]:
cm=confusion_matrix(y_test,ypred)
acc=accuracy_score(y_test,ypred)

In [None]:
print(cm)
print(acc)

[[40  0]
 [ 2 15]]
0.9649122807017544
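
The tuned model reaches the same accuracy as the untuned one (55 of 57 correct), but its errors shift. After tuning there are no false positives for class 1, at the cost of one more missed class-1 case. The sketch below computes precision and recall for class 1 from the two confusion matrices printed in this notebook, which makes the trade-off explicit. `class1_metrics` is a hypothetical helper.

```python
import numpy as np

# Confusion matrices printed in this notebook (rows = true, columns = predicted)
cm_untuned = np.array([[39, 1], [1, 16]])
cm_tuned = np.array([[40, 0], [2, 15]])

def class1_metrics(matrix):
    """Return (accuracy, precision, recall) treating class 1 as the positive class."""
    tn, fp, fn, tp = matrix.ravel()  # binary layout: [[TN, FP], [FN, TP]]
    accuracy = (tp + tn) / matrix.sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

for name, matrix in [("untuned", cm_untuned), ("tuned", cm_tuned)]:
    a, p, r = class1_metrics(matrix)
    print(name, round(a, 4), round(p, 3), round(r, 3))
```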


This code generates a classification report (`cr`) using the actual labels (`y_test`) and predicted labels (`ypred`) from a classification model. The report includes precision, recall, F1-score, and support for each class. The result is then printed.

In [None]:
cr=classification_report(y_test,ypred)
print(cr)

              precision    recall  f1-score   support

           0       0.95      1.00      0.98        40
           1       1.00      0.88      0.94        17

    accuracy                           0.96        57
   macro avg       0.98      0.94      0.96        57
weighted avg       0.97      0.96      0.96        57

