<a href="https://colab.research.google.com/github/isiktopcu/bart/blob/main/berturk_turkish_NER.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Toponym Detection in Turkish Tweets Using BERTurk

This notebook uses the BERTurk-based NER model `savasy/bert-base-turkish-ner-cased` to extract location-related tokens (toponyms) from Turkish tweets.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "savasy/bert-base-turkish-ner-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

tweet = '"📌Hatay Büyükşehir Belediye Başkanı Lütfü Savaş: "Şu andaki demografik yapıya baktığınız zaman Suriyeliler, Hataylılardan bir adım önde. Hatay, Suriye şehri olma yolunda!"'
tokens = tokenizer.tokenize(tweet)

# Add the special tokens [CLS] and [SEP] around the token sequence
tokens = ["[CLS]"] + tokens + ["[SEP]"]
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_tensor = torch.tensor([input_ids])

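# Run inference in evaluation mode, without gradient tracking
model.eval()
with torch.no_grad():
    outputs = model(input_tensor)
    # Take the highest-scoring label per token, then drop [CLS] and [SEP]
    predicted_labels = torch.argmax(outputs.logits, dim=2).squeeze(0).tolist()[1:-1]

# Map label ids to tag names such as 'B-LOC' and 'I-LOC'
label_names = model.config.id2label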

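locations = []
current_location = ""
for token, label_id in zip(tokens[1:-1], predicted_labels):
    label = label_names[label_id]
    if label == 'B-LOC':  # 'B-LOC' represents the beginning of a location entity
        if current_location:
            locations.append(current_location)
        current_location = token
    elif label == 'I-LOC':  # 'I-LOC' represents the continuation of a location entity
        # Separate whole words with a space; "##" sub-word pieces attach directly
        if current_location and not token.startswith("##"):
            current_location += " "
        current_location += token
    elif current_location:  # any other label ends the current location
        locations.append(current_location)
        current_location = ""

# Store the last location if the tweet ends inside one
if current_location:
    locations.append(current_location)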

# Strip the WordPiece "##" markers so sub-word pieces merge into whole words, then join the locations
locations_cleaned = [loc.replace("##", "") for loc in locations]
locations_string = ', '.join(locations_cleaned)
print("Locations:", locations_string)


Locations: Hatay, Hatay, Suriye
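
The grouping step in the cell above can be isolated into a small function that is easy to test without loading the model. The following is a minimal sketch, not part of the original notebook: `group_locations` is a hypothetical helper name, and the example tokens and labels are written by hand for illustration rather than taken from model output. It assumes WordPiece-style tokens (sub-word pieces prefixed with `##`) and BIO tags named `B-LOC` and `I-LOC`, as used in the cell above.

```python
def group_locations(tokens, labels):
    """Merge WordPiece tokens tagged B-LOC/I-LOC into location strings."""
    locations = []
    current = ""
    for token, label in zip(tokens, labels):
        if label == "B-LOC":
            # A new entity starts; store the one in progress, if any.
            if current:
                locations.append(current)
            current = token
        elif label == "I-LOC":
            if token.startswith("##"):
                # Sub-word piece: attach it directly, without the "##" marker.
                current += token[2:]
            else:
                # New word inside the same entity: separate it with a space.
                current = f"{current} {token}".strip()
        else:
            # Any non-location label closes the current entity.
            if current:
                locations.append(current)
            current = ""
    # Store the last entity if the sequence ends inside one.
    if current:
        locations.append(current)
    return locations


# Hand-written tokens and labels for illustration only (not model output).
example_tokens = ["Yeni", "Zelanda", "ve", "Kahraman", "##maraş"]
example_labels = ["B-LOC", "I-LOC", "O", "B-LOC", "I-LOC"]
print(group_locations(example_tokens, example_labels))
```

Closing an entity on any non-location label keeps a stray `I-LOC` later in the tweet from being appended to an unrelated earlier location, and the space between whole words keeps multi-word place names readable.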
