<h1>Machine Learning Algorithms in Dataset Analysis: A Comprehensive Report</h1>

1. **Abstract:**
   This project applies machine learning algorithms to a chosen dataset, then evaluates and compares their performance. The abstract summarizes the project's goal, methodology, and key findings.

2. **Introduction:**
   This section places the project in its machine learning context and elaborates on its aim and relevance. The chosen datasets, including standard ones like Iris and Wine, are introduced, and any particular challenges they pose, along with prior investigations of them, are discussed.

3. **Background:**
   This section explains the operating principles of the machine learning methods employed, such as kNN, decision trees, linear regression, gradient descent, polynomial regression, Bayesian classification, k-means, and PCA. A clearly demonstrated understanding of these methods will be reflected in the evaluation.

4. **Methodology:**
   This section describes how the dataset was explored and details any modifications made to the algorithms. The use of techniques such as cross-validation is justified, showing why specific approaches were chosen.

5. **Results:**
   Results are presented clearly and concisely, preferably in tabular form, and are cross-referenced with the experiments described in the Methodology section.

6. **Evaluation:**
   This section critically assesses the strengths and weaknesses of the project. The evaluation demonstrates an awareness of the project's limitations, emphasizing that success is not a prerequisite for a well-executed research project. Marks are allocated based on the project's execution and the depth of understanding demonstrated.

7. **Conclusions:**
   This section succinctly summarizes the findings and their implications and relates them back to the initial aim of the project.

8. **References:**
   A comprehensive list of academic references used in the main body of the report is provided, including books and research papers. Proper citation is essential to maintain academic integrity.
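
The cross-validation and tabular reporting described under Methodology and Results can be sketched as follows. This is a minimal illustration, not the project's actual experiment: the choice of the Wine dataset, the two models, their hyperparameters, and the 5-fold stratified split are all assumptions made for the example.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Wine is one of the standard datasets named in the Introduction
X, y = load_wine(return_X_y=True)

# Hypothetical model choices and hyperparameters, for illustration only
models = {
    # kNN is distance-based, so features are standardized inside the pipeline
    "kNN (k=5, scaled)": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

# Stratified folds keep the class proportions similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

rows = []
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    rows.append({"model": name, "mean accuracy": scores.mean(), "std": scores.std()})

# Collect the per-model scores in a table
cv_results = pd.DataFrame(rows).set_index("model")
print(cv_results.round(3))
```

Placing the scaler inside the pipeline means it is refitted on the training folds only, so no information from the held-out fold leaks into preprocessing.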

The report, along with the accompanying Jupyter notebook, constitutes the hand-in for the CM3015 Mid-term coursework.


In [8]:
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load breast cancer dataset
data = load_breast_cancer()

# Create a Pandas DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)

# Add the target variable to the DataFrame (0 = malignant, 1 = benign)
df['target'] = data.target

# Show every column in the output, then display the first five rows
pd.set_option('display.max_columns', None)
df.head()

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,0.5435,0.7339,3.398,74.08,0.005225,0.01308,0.0186,0.0134,0.01389,0.003532,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,0.7456,0.7869,4.585,94.03,0.00615,0.04006,0.03832,0.02058,0.0225,0.004571,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,0.4956,1.156,3.445,27.23,0.00911,0.07458,0.05661,0.01867,0.05963,0.009208,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,0.7572,0.7813,5.438,94.44,0.01149,0.02461,0.05688,0.01885,0.01756,0.005115,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0
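
All five rows shown above have target 0, so the preview alone says little about the class balance. A possible next step, sketched here under assumptions, is to attach the class names that ship with the dataset, count the samples per class, and set aside a stratified hold-out set. The `diagnosis` column name, the 80/20 split, and the random seed are illustrative choices, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Rebuild the DataFrame so this cell runs on its own
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

# Map the integer codes to the dataset's class names (0 -> malignant, 1 -> benign)
df["diagnosis"] = df["target"].map(dict(enumerate(data.target_names)))

# Count samples per class to see how balanced the data is
class_counts = df["diagnosis"].value_counts()
print(class_counts)

# Hypothetical 80/20 hold-out split, stratified to preserve class proportions
X_train, X_test, y_train, y_test = train_test_split(
    df[list(data.feature_names)],
    df["target"],
    test_size=0.2,
    stratify=df["target"],
    random_state=0,
)
print(X_train.shape, X_test.shape)
```

Stratifying on the target keeps the malignant-to-benign ratio roughly the same in the training and test sets, which matters when the classes are not equally represented.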
