Skip to content

hakancangunerli/turkish_binary_classifier

Repository files navigation

Overview:

This document summarizes the results obtained from the Turkish Name Classifier, which was developed to predict whether a given name belongs to a Turkish individual. The code includes steps for data preprocessing, model training, and evaluation. Here, we focus on understanding the model's performance and insights drawn from the predictions. Data was obtained from philipperemy's name dataset made up of nearly 492 million people's names from a Facebook leak from 2021.

Model Evaluation Results:

  1. Accuracy: The logistic regression model achieved an accuracy of 95.27% on the test set.

  2. Classification Report:

    Class Precision Recall F1-Score Support
    Not Turkish 96% 98% 97% 1,459,107
    Turkish 93% 84% 88% 389,067
  3. Overall Metrics:

    • Accuracy: 95.27%
    • Macro Average:
      • Precision: 94%
      • Recall: 91%
      • F1-Score: 93%
    • Weighted Average:
      • Precision: 95%
      • Recall: 95%
      • F1-Score: 95%

Insights and Observations:

  • The model showcases a high precision for both classes, indicating a strong ability to correctly predict the class labels.
  • With a recall of 98% for the Not Turkish class, the model demonstrates a high capability to detect names not of Turkish origin.
  • The Turkish class, although performing well, has a slightly lower recall at 84%. This suggests that there are instances where the model might misclassify Turkish names as not Turkish.
  • The high F1-Scores for both classes highlight the model's balanced performance regarding precision and recall.

Limitations:

  • While the overall accuracy is high, the slightly lower recall for the Turkish class suggests there's room for improvement in detecting Turkish names.
  • Model performance may vary with different datasets. It's essential to perform periodic evaluations with diverse and updated data.

Conclusion:

The Turkish Name Classifier, with an accuracy of 95.27%, provides a reliable tool for identifying names of Turkish origin. Its high precision and recall rates make it a robust model, though continuous validation and refinement are recommended for optimal real-world performance. Future iterations might focus on further enhancing the recall rate for the Turkish class.

References/Idea:

https://github.com/heni/turkish-names-classifier

https://github.com/mkozturk/turkishnames

About

this is a logistic regression binary classification system for classifying turkish names trained on first and lastnames of over 105 nations.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors