GitHub - hakancangunerli/turkish_binary_classifier: this is a logistic regression binary classification system for classifying turkish names trained on first and lastnames of over 105 nations.

Overview:

This document summarizes the results obtained from the Turkish Name Classifier, which was developed to predict whether a given name belongs to a Turkish individual. The code includes steps for data preprocessing, model training, and evaluation. Here, we focus on understanding the model's performance and insights drawn from the predictions. Data was obtained from philipperemy's name dataset made up of nearly 492 million people's names from a Facebook leak from 2021.

Model Evaluation Results:

Accuracy: The logistic regression model achieved an accuracy of 95.27% on the test set.
Classification Report:

Class Precision Recall F1-Score Support

Not Turkish 96% 98% 97% 1,459,107

Turkish 93% 84% 88% 389,067
Overall Metrics:
- Accuracy: 95.27%
- Macro Average:
  - Precision: 94%
  - Recall: 91%
  - F1-Score: 93%
- Weighted Average:
  - Precision: 95%
  - Recall: 95%
  - F1-Score: 95%

Insights and Observations:

The model showcases a high precision for both classes, indicating a strong ability to correctly predict the class labels.
With a recall of 98% for the Not Turkish class, the model demonstrates a high capability to detect names not of Turkish origin.
The Turkish class, although performing well, has a slightly lower recall at 84%. This suggests that there are instances where the model might misclassify Turkish names as not Turkish.
The high F1-Scores for both classes highlight the model's balanced performance regarding precision and recall.

Limitations:

While the overall accuracy is high, the slightly lower recall for the Turkish class suggests there's room for improvement in detecting Turkish names.
Model performance may vary with different datasets. It's essential to perform periodic evaluations with diverse and updated data.

Conclusion:

The Turkish Name Classifier, with an accuracy of 95.27%, provides a reliable tool for identifying names of Turkish origin. Its high precision and recall rates make it a robust model, though continuous validation and refinement are recommended for optimal real-world performance. Future iterations might focus on further enhancing the recall rate for the Turkish class.

References/Idea:

https://github.com/heni/turkish-names-classifier

https://github.com/mkozturk/turkishnames

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
README.md		README.md
Untitled-1.ipynb		Untitled-1.ipynb
combine.py		combine.py
logistic_regression_model.pkl		logistic_regression_model.pkl
tfidf_vectorizer.pkl		tfidf_vectorizer.pkl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview:

Model Evaluation Results:

Insights and Observations:

Limitations:

Conclusion:

References/Idea:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Class	Precision	Recall	F1-Score	Support
Not Turkish	96%	98%	97%	1,459,107
Turkish	93%	84%	88%	389,067

Folders and files

Latest commit

History

Repository files navigation

Overview:

Model Evaluation Results:

Insights and Observations:

Limitations:

Conclusion:

References/Idea:

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages