# Assignment: Machine Learning for Genomic Data

Task: Apply machine learning algorithms, such as random forests or support vector machines, to classify genomic data based on specific features or markers.

Deliverable: A comprehensive analysis report presenting the classification results, model performance evaluation, and insights into the predictive features.

In [4]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

In [5]:
# Load the dataset (replace "heart.csv" with the actual dataset file name)
data = pd.read_csv("heart.csv")

In [6]:
# Split the data into features (X) and the target variable (y)
X = data.drop("target", axis=1)  # Features
y = data["target"]  # Target variable

In [7]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
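
With a test set of only 61 rows, the class balance of the held-out data can shift between random splits. The sketch below shows how a stratified split keeps the class ratio nearly identical in both partitions. It runs on synthetic stand-in data, so `X_toy`, `y_toy` and the 30% positive rate are assumptions for illustration, not properties of the notebook's dataset; `stratify` itself is a standard `train_test_split` parameter.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 300 rows, 3 numeric features,
# and an imbalanced binary target with roughly 30% positives.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 3))
y_toy = (rng.random(300) < 0.3).astype(int)

# stratify=y_toy asks for the same class proportions in train and test
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(f"Positive rate, full data: {y_toy.mean():.3f}")
print(f"Positive rate, train:     {ys_train.mean():.3f}")
print(f"Positive rate, test:      {ys_test.mean():.3f}")
```

In the notebook's own split, the equivalent change would be passing `stratify=y` to the existing call.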

In [8]:
# Initialize the Random Forest Classifier
clf = RandomForestClassifier(n_estimators=100, random_state=0)

In [9]:
# Train the model
clf.fit(X_train, y_train)

In [10]:
# Make predictions on the test set
y_pred = clf.predict(X_test)

In [11]:
# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

In [12]:
# Output the evaluation metrics
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", report)

Accuracy: 0.85
Classification Report:
               precision    recall  f1-score   support

           0       0.86      0.83      0.84        29
           1       0.85      0.88      0.86        32

    accuracy                           0.85        61
   macro avg       0.85      0.85      0.85        61
weighted avg       0.85      0.85      0.85        61
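
The deliverable asks for insights into the predictive features, which the metrics above do not provide. A fitted `RandomForestClassifier` exposes impurity-based importances through its `feature_importances_` attribute. The sketch below ranks them on synthetic stand-in data; `X_demo`, `y_demo`, `forest` and the `feature_i` names are hypothetical and separate from the notebook's `clf` and `X`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption). With shuffle=False the 3 informative
# features occupy the first columns; the remaining 5 columns are noise.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]  # hypothetical names

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_demo, y_demo)

# feature_importances_ holds one impurity-based score per column; scores sum to 1
order = np.argsort(forest.feature_importances_)[::-1]
ranked = [(feature_names[i], float(forest.feature_importances_[i])) for i in order]
for name, score in ranked:
    print(f"{name:<12} {score:.3f}")
```

For the notebook's model, the same ranking could be built from `clf.feature_importances_` and `X.columns`. Impurity-based scores can favor high-cardinality features, so they are best read as a first indication rather than a final answer.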



In [13]:
# Import the SVM classifier (accuracy_score and classification_report are already imported above)
from sklearn.svm import SVC

In [14]:
# Initialize the Support Vector Classifier (SVC)
svm_clf = SVC(kernel='linear', random_state=0)

In [15]:
# Train the SVM model
svm_clf.fit(X_train, y_train)
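
SVMs are sensitive to feature scale, and the model above is fit on unscaled columns. The sketch below shows one common way to standardize features inside a pipeline, so the scaler is fit on the training rows only. It uses synthetic stand-in data with one deliberately inflated column; all names (`X_demo`, `scaled_svm`, and so on) are hypothetical, and it makes no claim about how scaling would change the notebook's results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 6 features, 4 of them informative
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)
X_demo[:, 0] *= 1000.0  # put one column on a much larger scale than the rest

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# StandardScaler learns mean and variance from the training rows only,
# then the same transform is applied automatically at prediction time.
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scaled_svm.fit(Xd_train, yd_train)

demo_accuracy = scaled_svm.score(Xd_test, yd_test)
print(f"Scaled linear SVM accuracy on the synthetic data: {demo_accuracy:.2f}")
```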

In [16]:
# Make predictions on the test set using the SVM model
y_pred_svm = svm_clf.predict(X_test)

In [17]:
# Evaluate the SVM model's performance
accuracy_svm = accuracy_score(y_test, y_pred_svm)
report_svm = classification_report(y_test, y_pred_svm)

In [18]:
# Output the evaluation metrics for the SVM model
print(f"SVM Accuracy: {accuracy_svm:.2f}")
print("SVM Classification Report:\n", report_svm)

SVM Accuracy: 0.87
SVM Classification Report:
               precision    recall  f1-score   support

           0       0.86      0.86      0.86        29
           1       0.88      0.88      0.88        32

    accuracy                           0.87        61
   macro avg       0.87      0.87      0.87        61
weighted avg       0.87      0.87      0.87        61



In [19]:
# Re-print the Random Forest metrics for comparison with the SVM results above
print(f"Random Forest Accuracy: {accuracy:.2f}")
print("Random Forest Classification Report:\n", report)

Random Forest Accuracy: 0.85
Random Forest Classification Report:
               precision    recall  f1-score   support

           0       0.86      0.83      0.84        29
           1       0.85      0.88      0.86        32

    accuracy                           0.85        61
   macro avg       0.85      0.85      0.85        61
weighted avg       0.85      0.85      0.85        61



In [20]:
# Compare accuracies
print(f"Accuracy Comparison: Random Forest = {accuracy:.2f}, SVM = {accuracy_svm:.2f}")

Accuracy Comparison: Random Forest = 0.85, SVM = 0.87
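
The two accuracies differ by 0.02 on a 61-sample test set. The per-class recall and support in the two reports pin down the confusion-matrix counts (for example, recall 0.83 on 29 class-0 samples corresponds to 24 correct), so the gap can be expressed in samples. The sketch below redoes that arithmetic in plain Python. The counts are inferred from the rounded reports above, not printed by the notebook, and `metrics_from_counts` is a hypothetical helper.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Accuracy plus per-class precision and recall from binary confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "correct": tn + tp,
        "accuracy": (tn + tp) / total,
        "precision_0": tn / (tn + fn),
        "recall_0": tn / (tn + fp),
        "precision_1": tp / (tp + fp),
        "recall_1": tp / (tp + fn),
    }

# Counts inferred from the rounded classification reports (assumption)
rf = metrics_from_counts(tn=24, fp=5, fn=4, tp=28)
svm = metrics_from_counts(tn=25, fp=4, fn=4, tp=28)

for name, m in (("Random Forest", rf), ("SVM", svm)):
    print(f"{name}: {m['correct']}/61 correct, accuracy {m['accuracy']:.3f}")
print(f"Difference: {svm['correct'] - rf['correct']} test sample(s)")
```

Under these counts the Random Forest classifies 52 of 61 test samples correctly and the SVM 53, so the 0.85 versus 0.87 gap is a single sample. A difference this small could reverse under a different `random_state` in the split, so it does not by itself show that one model is better.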
