Collecting workspace information```markdown
This project provides C++ implementations of a k-D Tree data structure and a k-Nearest Neighbors (kNN) classifier algorithm built upon the k-D Tree. It includes functionality for building, manipulating, and traversing k-D Trees, as well as training a kNN model and making predictions on datasets. The project also includes a Dataset class for basic data handling from CSV files like the included init/mnist.csv.
- k-D Tree Implementation (
kDTree):- Construction from point lists (
kDTree::buildTree). - Insertion of individual points (
kDTree::insert). - Removal of points (
kDTree::remove). - Searching for points (
kDTree::search). - Tree traversals: Inorder (
kDTree::inorderTraversal), Preorder (kDTree::preorderTraversal), Postorder (kDTree::postorderTraversal). - Calculation of tree height (
kDTree::height), node count (kDTree::nodeCount), and leaf count (kDTree::leafCount). - Nearest neighbor search (
kDTree::nearestNeighbour). - k-Nearest neighbor search (
kDTree::kNearestNeighbour).
- Construction from point lists (
- kNN Classifier Implementation (
kNN):- Training the classifier with training data (
kNN::fit). - Predicting labels for test data (
kNN::predict). - Evaluating prediction accuracy (
kNN::score).
- Training the classifier with training data (
- Dataset Handling (
Dataset):- Loading data from CSV files (
Dataset::loadFromCSV). - Basic data inspection (
printHead,printTail,getShape,columns). - Data extraction (
Dataset::extract). - Splitting data into training and testing sets (
train_test_split).
- Loading data from CSV files (
kDTreeNode: Represents a node in the k-D Tree, storing data point coordinates and an optional label.kDTree: The main class for the k-D Tree, managing nodes and providing tree operations.Dataset: A class to load and manage datasets, represented as a list of lists of integers.kNN: The k-Nearest Neighbors classifier class.
- init/kDTree.hpp: Header file containing declarations for
kDTreeNode,kDTree, andkNNclasses, along with helper function prototypes likecopyNode. - init/kDTree.cpp: Implementation file for the methods of
kDTreeandkNN, and helper functions likemergeSort. - init/Dataset.hpp: Header file for the
Datasetclass andtrain_test_splitfunction. init/Dataset.cpp(Not provided): Assumed implementation file for theDatasetclass methods.- init/main.hpp: Common header file including standard libraries.
- init/main.cpp: Contains example usage and test cases (
tc1throughtc9) for the implemented classes. - init/mnist.csv: Sample dataset (MNIST handwritten digits).
The init/main.cpp file contains several test cases demonstrating how to use the kDTree and kNN classes.
Example: Building and traversing a k-D Tree (tc3 in init/main.cpp)
// filepath: init/main.cpp
// ...existing code...
void tc3()
{
kDTree tree(2); // Create a 2-dimensional k-D Tree
vector<vector<int>> pointList = {{5, 6}, {2, 2}, {7, 3}, {2, 8}, {8, 7}, {8, 1}, {9, 4}, {3, 5}};
tree.buildTree(pointList); // Build the tree from points
tree.preorderTraversal(); // Print the tree using preorder traversal
}
// ...existing code...Example: Using the kNN classifier (tc9 in init/main.cpp)
// filepath: init/main.cpp
// ...existing code...
void tc9()
{
Dataset dataset;
dataset.loadFromCSV("mnist.csv"); // Load data
kNN knn; // Create a kNN classifier (default k=5)
Dataset X_train, X_test, y_train, y_test;
Dataset feature = dataset.extract(0, -1, 1, -1); // Extract features
Dataset label = dataset.extract(0, -1, 0, 0); // Extract labels
// Split data into training and testing sets
train_test_split(feature, label, 0.2, X_train, X_test, y_train, y_test);
knn.fit(X_train, y_train); // Train the model
Dataset y_pred = knn.predict(X_test); // Predict on the test set
double accuracy = knn.score(y_test, y_pred); // Calculate accuracy
std::cout << "Accuracy: " << accuracy << endl;
}
// ...existing code...You can compile the source files using a C++ compiler (like g++). Ensure all necessary .cpp files (init/kDTree.cpp, Dataset.cpp, init/main.cpp) are included in the compilation command.
Example compilation command (assuming Dataset.cpp exists):
g++ init/kDTree.cpp init/Dataset.cpp init/main.cpp -o main -std=c++11Then, run the executable:
./main(Note: The main function in main.cpp currently calls tc3(). Modify it to run other test cases.)