**Navigation**
1. [Dependencies](#Dependencies)
2. [Model Loading and Local Inference](#Model-Loading-and-Local-Inference)

# **Dependencies**

In [1]:
!pip install datasets transformers torch

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

langs = ['java', 'python', 'pharo']
labels = {
    'java': ['summary', 'Ownership', 'Expand', 'usage', 'Pointer', 'deprecation', 'rational'],
    'python': ['Usage', 'Parameters', 'DevelopmentNotes', 'Expand', 'Summary'],
    'pharo': ['Keyimplementationpoints', 'Example', 'Responsibilities', 'Classreferences', 'Intent', 'Keymessages', 'Collaborators']
}

Collecting datasets
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.2.0-py3-none-any.whl (480 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading fsspec-2024.9.0-py3-none-any.whl (1

# **Model Loading and Local Inference**

In [2]:
samples = {
    "java": [
        "this is the main method that starts the java application | main.java",
        "a method to calculate the factorial of a number | factorial.java",
        "a class that represents a bank account with deposit and withdrawal methods | bankaccount.java",
        "a simple loop to print numbers from 1 to 10 | loopexample.java",
        "a method that checks whether a number is prime | primechecker.java"
    ],
    "python": [
        "a function that returns the fibonacci sequence up to a given number | fibonacci.py",
        "this code reads a text file and prints its contents | file_reader.py",
        "a class that implements a simple stack data structure | stack.py",
        "this script generates random numbers and calculates their average | random_average.py",
        "a function that takes two arguments and returns their sum | sum_two_numbers.py"
    ],
    "pharo": [
        "this class defines a simple bank account with deposit and withdrawal functionality | bankaccount.pharo",
        "a loop that prints numbers from 1 to 5 | loop.pharo",
        "a method that calculates the area of a circle given the radius | circle_area.pharo",
        "a script that prints 'hello, pharo!' to the transcript | hello.pharo",
        "this code defines a person class with name and age attributes | person.pharo"
    ]
}

def inference(lang, input_samples):
    model_name = f"harisathar04/graphic-nlbse-{lang}"
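    # Load the public checkpoint and its tokenizer from the Hugging Face Hub.
    # The deprecated `use_auth_token` argument is dropped; no token is needed here.
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)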

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    model.eval()

    inputs = tokenizer(input_samples, truncation=True, padding=True, return_tensors="pt")
    input_ids = inputs['input_ids'].to(device)
    attention_mask = inputs['attention_mask'].to(device)

    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
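        # Multi-label setup: apply an independent sigmoid to each label's logit,
        # then mark a label as present when its probability exceeds 0.5.
        probs = torch.sigmoid(outputs.logits)
        predictions = (probs > 0.5).int()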

    return predictions.cpu().numpy()

for lang, input_samples in samples.items():
    print(f"Performing inference for {lang}...")

    predictions = inference(lang, input_samples)

    for i, (sample, prediction) in enumerate(zip(input_samples, predictions)):
        print(f"Sample {i + 1} ({lang}):")
        print(f"Comment: {sample}\nPrediction: {prediction}\n")

Performing inference for java...

Sample 1 (java):
Comment: this is the main method that starts the java application | main.java
Prediction: [1 0 0 1 0 0 0]

Sample 2 (java):
Comment: a method to calculate the factorial of a number | factorial.java
Prediction: [1 0 0 0 0 0 0]

Sample 3 (java):
Comment: a class that represents a bank account with deposit and withdrawal methods | bankaccount.java
Prediction: [1 0 0 0 0 0 0]

Sample 4 (java):
Comment: a simple loop to print numbers from 1 to 10 | loopexample.java
Prediction: [1 0 0 0 0 0 0]

Sample 5 (java):
Comment: a method that checks whether a number is prime | primechecker.java
Prediction: [1 0 0 0 0 0 0]

Performing inference for python...

Sample 1 (python):
Comment: a function that returns the fibonacci sequence up to a given number | fibonacci.py
Prediction: [0 0 0 0 1]

Sample 2 (python):
Comment: this code reads a text file and prints its contents | file_reader.py
Prediction: [0 0 0 0 1]

Sample 3 (python):
Comment: a class that implements a simple stack data structure | stack.py
Prediction: [0 0 0 0 1]

Sample 4 (python):
Comment: this script generates random numbers and calculates their average | random_average.py
Prediction: [0 0 0 0 1]

Sample 5 (python):
Comment: a function that takes two arguments and returns their sum | sum_two_numbers.py
Prediction: [0 0 0 0 1]

Performing inference for pharo...

Sample 1 (pharo):
Comment: this class defines a simple bank account with deposit and withdrawal functionality | bankaccount.pharo
Prediction: [0 0 1 0 0 0 0]

Sample 2 (pharo):
Comment: a loop that prints numbers from 1 to 5 | loop.pharo
Prediction: [0 0 1 0 1 0 0]

Sample 3 (pharo):
Comment: a method that calculates the area of a circle given the radius | circle_area.pharo
Prediction: [0 0 1 0 0 0 0]

Sample 4 (pharo):
Comment: a script that prints 'hello, pharo!' to the transcript | hello.pharo
Prediction: [0 0 1 0 1 0 0]

Sample 5 (pharo):
Comment: this code defines a person class with name and age attributes | person.pharo
Prediction: [0 0 0 0 0 1 0]
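
The binary vectors printed above are easier to read as category names. The sketch below maps each prediction row to its active labels. It assumes that the order of the lists in `labels` (copied from the Dependencies cell) matches the order of the model's output columns, which this notebook does not confirm. `decode_predictions` is a hypothetical helper name, not part of the original code.

```python
# Label lists copied from the Dependencies cell.
# ASSUMPTION: their order matches the column order of the model outputs.
labels = {
    'java': ['summary', 'Ownership', 'Expand', 'usage', 'Pointer', 'deprecation', 'rational'],
    'python': ['Usage', 'Parameters', 'DevelopmentNotes', 'Expand', 'Summary'],
    'pharo': ['Keyimplementationpoints', 'Example', 'Responsibilities', 'Classreferences',
              'Intent', 'Keymessages', 'Collaborators'],
}


def decode_predictions(lang, predictions):
    """Map each binary prediction row to the names of its active labels.

    `predictions` may be a list of lists or the NumPy array returned by
    `inference`; rows are only iterated, so both work.
    """
    names = labels[lang]
    decoded = []
    for row in predictions:
        if len(row) != len(names):
            raise ValueError(
                f"expected {len(names)} values per row for {lang}, got {len(row)}"
            )
        # Keep the label names whose flag is set to 1.
        decoded.append([name for name, flag in zip(names, row) if int(flag) == 1])
    return decoded


# First two Java rows from the output above.
print(decode_predictions('java', [[1, 0, 0, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0]]))
# → [['summary', 'usage'], ['summary']]
```

In the inference loop, `decode_predictions(lang, predictions)` could be called on the array returned by `inference` to print names instead of raw vectors. A row with no active label decodes to an empty list.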

