# ChatGPT API

## Usage
It will be used for:
- Generating the context for the questions
- Filtering and classifying questions
- Generating the distractors

## Pricing
- gpt-4o:
  - USD 0.00250 / 1K input tokens
  - USD 0.00125 / 1K cached input tokens
  - USD 0.01000 / 1K output tokens
- gpt-4o-mini (best option):
  - USD 0.000150 / 1K input tokens
  - USD 0.000075 / 1K cached input tokens
  - USD 0.000600 / 1K output tokens

- text-embedding-3-small (may be useful for performance analysis)
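To make the price list easier to compare, the arithmetic can be sketched as below. The helper `estimate_cost` and the table `PRICES_PER_1K` are hypothetical names introduced for this note; the prices are the ones listed above, and the token counts in the example are made-up values, not measurements.

```python
# Prices in USD per 1K tokens, copied from the list above.
PRICES_PER_1K = {
    "gpt-4o": {"input": 0.00250, "cached_input": 0.00125, "output": 0.01000},
    "gpt-4o-mini": {"input": 0.000150, "cached_input": 0.000075, "output": 0.000600},
}


def estimate_cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the cost in USD of one call (hypothetical helper).

    In this sketch, input_tokens counts only the uncached input tokens.
    """
    prices = PRICES_PER_1K[model]
    return (
        input_tokens / 1000 * prices["input"]
        + cached_input_tokens / 1000 * prices["cached_input"]
        + output_tokens / 1000 * prices["output"]
    )


# Illustrative call: a 200-token prompt answered with 1200 tokens.
for model in PRICES_PER_1K:
    print(model, f"{estimate_cost(model, 200, 1200):.6f}")
```

With these illustrative counts, the output tokens dominate the cost on both models, which is why the output limit matters more than the prompt length here.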

## Token types
- Input tokens:

  - These are the tokens consumed by the text sent to the API (your prompt and any instructions you send). Everything placed in the messages field counts as input tokens. The system splits this text into tokens, which represent words or parts of words, so the longer the prompt, the more input tokens are billed.

- Output tokens:

  - These are the tokens generated in the API's response. The longer the response, the more output tokens are used. You can cap the output with the max_tokens parameter to control the length of the response.

## How to know how many tokens are being used

- Manual estimate: as a rough approximation, 1 token corresponds to about 4 characters of English text, or about 0.75 words.
- Token-counting tools: OpenAI provides a tokenizer tool that estimates the token count for a given text.
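The manual estimate can be sketched in a few lines. `estimate_tokens` is a hypothetical helper that applies only the 4-characters-per-token rule of thumb; it is not the real tokenizer, so the result is a ballpark figure for English text.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate based on the ~4 characters per token rule of thumb."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


prompt = "Write a comprehensive tutorial on the Java topic 'Exception Handling'."
print(len(prompt), "characters ->", estimate_tokens(prompt), "tokens (approx.)")
```

For exact counts, the tokenizer tool mentioned above is the better reference.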

## Tips for optimizing token usage
- Reduce prompt size: use more direct prompts to cut input tokens.
- Set max_tokens for outputs: adjust max_tokens to avoid excessively long responses and keep output tokens under control.

Both input and output tokens are billed as you use the API to generate contexts and distractors, filter questions and answers, and so on. Tracking both lets you see the exact consumption of each call and adjust the instructions as needed.
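Per-call consumption can be read from the `usage` field that the OpenAI Python SDK attaches to a chat completion response (`prompt_tokens`, `completion_tokens`, `total_tokens`). The sketch below uses a stand-in object with made-up counts so that it runs without calling the API; `summarize_usage` is a hypothetical helper name.

```python
from types import SimpleNamespace


def summarize_usage(response):
    """Return the token counts reported for one chat completion call."""
    usage = response.usage
    return {
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Stand-in for a real response (illustrative numbers, not a measurement).
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=180, completion_tokens=1200, total_tokens=1380)
)
print(summarize_usage(fake_response))
```

In the notebook, the same helper could be called on the `response` object inside each function before returning its content.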

# OpenAI API setup

In [None]:
import os
from openai import OpenAI

In [None]:
# Load the API key
from google.colab import drive

drive.mount('/content/drive')

with open('/content/drive/MyDrive/api secret key.txt') as f:
    os.environ['OPENAI_API_KEY'] = f.read().strip()

client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY'],
)

Mounted at /content/drive


# Context

In [None]:
# Define the context-generation function
def gerar_contexto(topic):
    prompt = (f"Write a comprehensive and structured tutorial on the Java programming topic '{topic}' without using any code examples. "
              f"Begin with an introduction explaining why '{topic}' is important in Java. "
              "Provide an in-depth explanation of key concepts, including important terminology, common patterns, and relevant methods or classes. "
              "Describe each concept clearly, focusing on how they work, how to apply them, and when they are commonly used. "
              "Finally, discuss common pitfalls and best practices associated with this topic in Java. Ensure that the tutorial is thorough enough for the reader to understand without needing code examples.")

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a Java programming expert who writes in-depth tutorials without code examples."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=1200,
        temperature=0.7
    )
    return response.choices[0].message.content

In [None]:
# Test context generation
topic = "Exception Handling"

texto_gerado = gerar_contexto(topic)

print("Text generated by the model:")
print(texto_gerado)


Text generated by the model:
## Comprehensive Tutorial on Exception Handling in Java

## Introduction

Exception handling is a critical aspect of robust Java programming. It provides a mechanism to manage runtime anomalies gracefully and ensures that the program can respond to unexpected events without crashing. In the real world, software applications are often subject to various errors, ranging from user input mistakes to hardware failures and network issues. Exception handling allows developers to create resilient applications that can recover from these situations and maintain a good user experience.

In Java, exceptions are categorized into distinct types, allowing developers to handle different error scenarios appropriately. By using exception handling, developers can ensure that their applications remain stable and provide meaningful feedback when errors occur.

## Key Concepts in Exception Handling

### 1. What is an Exception?

An exception is an event that disrupts the normal flow of a program's execution. It can arise from various sources, such as invalid user input, failed file operations, or network connectivity issues. In Java, exceptions are represented as objects that extend the `Throwable` class, which acts as the superclass for all errors and exceptions.

### 2. Types of Exceptions

Java exceptions are primarily divided into two categories:

- **Checked Exceptions**: These exceptions are checked at compile-time. The Java compiler mandates that these exceptions must either be caught using a try-catch block or declared in the method signature using the `throws` keyword. Examples include `IOException` and `SQLException`. Checked exceptions are typically used when a method can reasonably be expected to fail due to external factors.

- **Unchecked Exceptions**: These exceptions are not checked at compile-time and are subclasses of `RuntimeException`. They represent programming errors, such as logic mistakes or improper use of APIs. Examples include `NullPointerException` and `ArrayIndexOutOfBoundsException`. Unchecked exceptions indicate a flaw in the program's logic that could have been avoided through better coding practices.

### 3. The Exception Hierarchy

Understanding the hierarchy of exceptions in Java helps in grasping how exceptions are structured. At the top of the hierarchy is the `Throwable` class, which has two main subclasses: `Error` and `Exception`.

- **Error**: This subclass represents serious problems that a typical application should not try to catch. Examples include `StackOverflowError` and `OutOfMemoryError`. These errors indicate issues that are generally outside the control of the application.

- **Exception**: This subclass is further divided into checked and unchecked exceptions, as previously mentioned. Developers primarily work with this class when handling runtime anomalies.

### 4. Exception Handling Mechanism

Java employs a structured exception handling mechanism that relies on the following key components:

- **Try Block**: This block contains the code that may throw an exception. It serves as a protective wrapper for potentially problematic operations.

- **Catch Block**: This block is used to handle the exception thrown in the try block. It specifies the type of exception it can catch and contains the code that executes if such an exception occurs.

- **Finally Block**: This block is optional and contains code that always executes after the try and catch blocks, regardless of whether an exception was thrown or caught. It is typically used for resource cleanup, such as closing file streams or database connections.

- **Throw Statement**: This statement is used to explicitly throw an exception in a method. It can be used to signal an error condition, allowing developers to enforce error handling policies.

- **Throws Clause**: This clause is used in method signatures to declare that a method can throw one or more exceptions. This informs callers of the method that they need to handle or propagate the specified exceptions.

### 5. Common Patterns in Exception Handling

When working with exceptions, certain patterns emerge that can improve code readability and maintainability:

- **Single Responsibility Principle**: Each catch block should handle only one type of exception to enhance clarity. This approach makes it easier to understand the error-handling logic.

- **Chaining Exceptions**: When throwing exceptions, it's often useful to include the original exception as the cause of the new exception. This practice preserves the stack trace and provides more context for debugging.

- **Custom Exceptions**: Developers can create their custom exception classes by extending the `Exception` or `RuntimeException` class. This helps in representing application-specific error conditions and facilitates better error categorization.

## Common Pitfalls in Exception Handling

While exception handling in Java is powerful, there are several common pitfalls developers should avoid:

- **Overusing Exceptions**: Using exceptions for control flow can lead to performance issues and make the code harder to read. Exceptions should be used for exceptional circumstances, not regular control flow.

- **Ignoring Exceptions**: Failing to catch or handle exceptions can lead to application crashes or undefined behavior. It's essential to handle exceptions appropriately to maintain application stability.

- **Catching Throwable**: Catching the `Throwable` class is discouraged, as it can intercept serious errors that should not be handled by the application. Instead, focus on catching specific exceptions to avoid masking critical issues.

## Best Practices for Exception Handling

To ensure effective exception handling in Java, consider the following best practices:

- **Be Specific**: Always catch the most specific exception possible. This practice allows for more precise error handling and improves clarity.

- **Log Exceptions**: Implement logging for exceptions to track errors and facilitate debugging. Logging provides valuable insights into the application's behavior in production environments.

- **Use Finally for Cleanup**: Utilize the finally block for resource management to ensure that resources are released properly, regardless of whether an exception occurs.

- **Document Exceptions**: Clearly document the exceptions that a method can throw, either through Javadoc comments or method signatures. This documentation helps other developers understand how to interact with the code safely.

- **Avoid Swallowing Exceptions**: When catching exceptions, do not leave the catch block empty. Always handle the exception in a meaningful way, whether through logging, rethrowing, or providing user feedback

# Question generator

In [None]:
text = """Exception handling is a critical component of robust Java programming, providing a mechanism to manage runtime anomalies gracefully and ensuring that programs can respond to unexpected events without crashing. In the real world, software applications often encounter various errors, ranging from user input mistakes to hardware failures and network issues. Exception handling allows developers to create resilient applications that can recover from these scenarios, thus maintaining a good user experience.

In Java, exceptions are organized into distinct categories, allowing developers to handle different error scenarios appropriately. Through effective use of exception handling, developers can ensure that their applications remain stable and provide meaningful feedback to users when errors arise.

An exception is essentially an event that disrupts the normal flow of a program’s execution. Exceptions can stem from various sources, such as invalid user inputs, failed file operations, or network connectivity issues. In Java, exceptions are represented as objects extending from the `Throwable` class, which acts as the superclass for all errors and exceptions.

Java categorizes exceptions primarily into two types: checked and unchecked exceptions. Checked exceptions are identified at compile-time, requiring developers to either catch them using a try-catch block or declare them in the method signature using the `throws` keyword. Examples of checked exceptions include `IOException` and `SQLException`, which are typically used when a method might fail due to external factors. On the other hand, unchecked exceptions, which are not checked at compile-time, are subclasses of `RuntimeException` and often represent programming errors, such as logic mistakes or incorrect API usage. Examples include `NullPointerException` and `ArrayIndexOutOfBoundsException`, which indicate flaws in the program's logic that could be mitigated with better coding practices.

Understanding the hierarchy of exceptions in Java provides insight into how exceptions are structured. At the top of this hierarchy is the `Throwable` class, with two main subclasses: `Error` and `Exception`. The `Error` subclass represents serious problems that an application should not attempt to catch, such as `StackOverflowError` and `OutOfMemoryError`, which often indicate issues beyond the application's control. The `Exception` subclass, however, is further divided into checked and unchecked exceptions and is where developers focus when managing runtime anomalies.

Java's structured exception handling mechanism relies on key components such as the try block, which contains code that may throw an exception, acting as a protective wrapper for potentially risky operations. When an exception occurs, the catch block handles it, specifying the type of exception it can catch and containing code that executes when the exception arises. Additionally, there is the optional finally block, which always executes after the try and catch blocks, regardless of whether an exception was thrown. The finally block is commonly used for resource cleanup, such as closing file streams or database connections. Java also offers the throw statement, which explicitly throws an exception in a method, allowing developers to signal specific error conditions. Furthermore, the throws clause in method signatures declares that a method might throw certain exceptions, informing callers that they must manage or propagate these exceptions.

In practice, certain patterns in exception handling can improve code readability and maintainability. Following the Single Responsibility Principle, each catch block should handle only one type of exception, enhancing clarity and making the error-handling logic easier to understand. Another pattern is chaining exceptions, where including the original exception as the cause of a new exception preserves the stack trace, aiding in debugging. Developers can also create custom exception classes by extending the `Exception` or `RuntimeException` classes, enabling application-specific error categorization.

Despite the robustness of Java’s exception handling system, there are common pitfalls to avoid. Overusing exceptions for control flow, for example, can lead to performance issues and decrease code readability. Exceptions should be reserved for exceptional circumstances, not regular control flow. Ignoring exceptions can lead to application crashes or undefined behavior, making it essential to handle exceptions properly to maintain application stability. Catching the `Throwable` class is generally discouraged because it can intercept serious errors that should not be handled by the application, potentially masking critical issues.

To ensure effective exception handling in Java, certain best practices are recommended. Always catch the most specific exception possible, which allows for more precise error handling and improved code clarity. Implement logging for exceptions to track errors and facilitate debugging, as this provides valuable insights into the application’s behavior, especially in production environments. Utilize the finally block for resource management, ensuring resources are released properly regardless of whether an exception occurs. Document the exceptions that a method can throw clearly, either through Javadoc comments or method signatures, to help other developers understand how to interact with the code safely. Lastly, avoid "swallowing" exceptions, where a catch block is left empty. Always handle exceptions in a meaningful way, such as through logging, rethrowing, or providing user feedback, to ensure transparency and maintain application reliability.
"""

Question generator (multitask QA-QG)

In [None]:
# punkt - tokenizer that splits a passage into sentences and words
!python -m nltk.downloader punkt

!git clone https://github.com/joao326/question_generation/
#!git clone https://github.com/patil-suraj/question_generation.git

%cd question_generation
!git checkout Tentando-corrigir-o-erro-substring-not-found

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Cloning into 'question_generation'...
remote: Enumerating objects: 165, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 165 (delta 10), reused 16 (delta 7), pack-reused 146 (from 1)
Receiving objects: 100% (165/165), 274.49 KiB | 1.34 MiB/s, done.
Resolving deltas: 100% (82/82), done.
/content/question_generation
Branch 'Tentando-corrigir-o-erro-substring-not-found' set up to track remote branch 'Tentando-corrigir-o-erro-substring-not-found' from 'origin'.
Switched to a new branch 'Tentando-corrigir-o-erro-substring-not-found'


Selecting the pipeline to use, the question-generation model, and the answer-generation model, respectively:

In [None]:
#tópico = "Exception Handling"
#contexto = gerar_contexto(tópico)

from pipelines import pipeline

# pipeline(task, model, ans_model)
nlp = pipeline("multitask-qa-qg", model="valhalla/t5-base-qa-qg-hl", ans_model="valhalla/t5-base-qa-qg-hl")

"""
    Related excerpt from pipelines.py:
    "multitask-qa-qg": {
        "impl": MultiTaskQAQGPipeline, # task_class
        "default": {
            "model": "valhalla/t5-small-qa-qg-hl",
        }
    },
"""


# Run the pipeline once and show the generated question-answer pairs
perguntas_respostas = nlp(text)
perguntas_respostas

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Answer without <pad>: 'Exception handling'
Answer without <pad>: 'various errors'
Answer without <pad>: 'create resilient applications'
Answer without <pad>: 'distinct categories'
Answer without <pad>: 'ensure that their applications remain stable and provide meaningful feedback to users when errors arise'
Answer without <pad>: 'An exception'
Answer without <pad>: 'invalid user inputs, failed file operations, or network connectivity issues'
Answer without <pad>: 'Throwable'
Answer without <pad>: 'checked and unchecked exceptions'
Answer without <pad>: '<unk>throws<unk>'
Answer without <pad>: 'IOException'
Answer without <pad>: 'unchecked exceptions'
Answer without <pad>: 'NullPointerException'
Answer without <pad>: 'understanding'
Answer without <pad>: '<unk>Error<unk> and <unk>Exception<unk>'
Answer without <pad>: 'StackOverflowError'






[{'answer': 'Exception handling',
  'question': 'What is a critical component of robust Java programming?'},
 {'answer': 'various errors',
  'question': 'What do software applications often encounter in the real world?'},
 {'answer': 'create resilient applications',
  'question': 'What does Exception handling allow developers to do?'},
 {'answer': 'distinct categories',
  'question': 'What are exceptions organized into in Java?'},
 {'answer': 'ensure that their applications remain stable and provide meaningful feedback to users when errors arise',
  'question': 'What can developers do through effective use of exception handling?'},
 {'answer': 'An exception',
  'question': "What is essentially an event that disrupts the normal flow of a program's execution?"},
 {'answer': 'invalid user inputs, failed file operations, or network connectivity issues',
  'question': 'Exceptions can stem from what?'},
 {'answer': 'Throwable',
  'question': 'What class acts as the superclass for all errors 

To do:
- [ ] Remove the "Answer without \<pad>:" lines from the output
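Until the fork itself is changed, one possible workaround is to discard standard output around the pipeline call. This is only a sketch and it assumes the "Answer without" lines are written with `print` to stdout, which has not been checked against the fork's code. `call_quietly` and `noisy_pipeline` are hypothetical names, and `noisy_pipeline` merely stands in for `nlp`.

```python
import contextlib
import io


def call_quietly(func, *args, **kwargs):
    """Call func while discarding anything it prints to stdout."""
    with contextlib.redirect_stdout(io.StringIO()):
        return func(*args, **kwargs)


# Stand-in for the pipeline call, so the sketch runs on its own.
def noisy_pipeline(text):
    print("Answer without <pad>: 'Exception handling'")
    return [{"answer": "Exception handling", "question": "..."}]


result = call_quietly(noisy_pipeline, "some context")
print(result)
```

In the notebook this would read `perguntas_respostas = call_quietly(nlp, text)`. Note that it hides every message printed during the call, not only the debug lines.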

# Filtering the best questions and classifying them by difficulty level

In [None]:
import json
import re

def filtrar_e_classificar_perguntas(perguntas_respostas):
    prompt = (
        "Analyze the following list of question-answer pairs generated for a Java programming topic. "
        "Identify and remove any question-answer pairs that are incomplete, confusing, inconsistent, redundant, repetitive, or vague. "
        "For each high-quality question, classify its difficulty level as 'easy', 'medium', or 'hard'. "
        "Provide the filtered and classified list in JSON format with the structure: "
        "[{'question': '...', 'answer': '...', 'difficulty': '...'}]."
    )

    # Format the list of question-answer pairs for the prompt
    perguntas_formatadas = "\n".join([f"Q: {pr['question']} | A: {pr['answer']}" for pr in perguntas_respostas])

    # Send the prompt to the API
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are an assistant that filters and classifies Java programming question-answer pairs."},
            {"role": "user", "content": f"{prompt}\n\n{perguntas_formatadas}"}
        ],
        max_tokens=1500,
        temperature=0.5
    )

    # Extract the JSON part of the response
    content = response.choices[0].message.content
    print("API response:", content)  # print the full response for inspection

    # Regex that captures the fenced JSON block
    match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
    if match:
        json_text = match.group(1)
        try:
            perguntas_filtradas_classificadas = json.loads(json_text)
        except json.JSONDecodeError:
            print("Error decoding JSON.")
            perguntas_filtradas_classificadas = None
    else:
        print("JSON block not found in the response.")
        perguntas_filtradas_classificadas = None
    return perguntas_filtradas_classificadas

perguntas_classificadas = filtrar_e_classificar_perguntas(perguntas_respostas)
print("High-quality questions classified by difficulty:")
print(perguntas_classificadas)

In [None]:
# Output
API response: ```json
[
    {
        "question": "What is a critical component of robust Java programming?",
        "answer": "Exception handling",
        "difficulty": "easy"
    },
    {
        "question": "What does Exception handling allow developers to do?",
        "answer": "create resilient applications",
        "difficulty": "medium"
    },
    {
        "question": "What is essentially an event that disrupts the normal flow of a program's execution?",
        "answer": "An exception",
        "difficulty": "easy"
    },
    {
        "question": "What class acts as the superclass for all errors and exceptions in Java?",
        "answer": "Throwable",
        "difficulty": "medium"
    },
    {
        "question": "What are the two types of exceptions in Java?",
        "answer": "checked and unchecked exceptions",
        "difficulty": "easy"
    },
    {
        "question": "What is an example of a checked exception?",
        "answer": "IOException",
        "difficulty": "medium"
    },
    {
        "question": "What is an example of an unchecked exception?",
        "answer": "NullPointerException",
        "difficulty": "medium"
    },
    {
        "question": "What is an example of an error that an application should not attempt to catch?",
        "answer": "StackOverflowError",
        "difficulty": "medium"
    }
]
```
High-quality questions classified by difficulty:
[{'question': 'What is a critical component of robust Java programming?', 'answer': 'Exception handling', 'difficulty': 'easy'}, {'question': 'What does Exception handling allow developers to do?', 'answer': 'create resilient applications', 'difficulty': 'medium'}, {'question': "What is essentially an event that disrupts the normal flow of a program's execution?", 'answer': 'An exception', 'difficulty': 'easy'}, {'question': 'What class acts as the superclass for all errors and exceptions in Java?', 'answer': 'Throwable', 'difficulty': 'medium'}, {'question': 'What are the two types of exceptions in Java?', 'answer': 'checked and unchecked exceptions', 'difficulty': 'easy'}, {'question': 'What is an example of a checked exception?', 'answer': 'IOException', 'difficulty': 'medium'}, {'question': 'What is an example of an unchecked exception?', 'answer': 'NullPointerException', 'difficulty': 'medium'}, {'question': 'What is an example of an error that an application should not attempt to catch?', 'answer': 'StackOverflowError', 'difficulty': 'medium'}]
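The extraction step depends on the model wrapping its answer in a json fence, as it did in the output above. A slightly more tolerant variant could also accept a reply that is bare JSON. The sketch below is one way to do that; `extract_json` is a hypothetical helper, and the sample replies are hand-written, not real API output.

```python
import json
import re

FENCE = "`" * 3  # the three-backtick Markdown fence marker


def extract_json(content):
    """Parse JSON from a model reply, with or without a Markdown fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, content, re.DOTALL)
    candidate = match.group(1) if match else content.strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # caller decides how to handle an unparseable reply


fenced = FENCE + 'json\n[{"question": "Q1", "answer": "A1", "difficulty": "easy"}]\n' + FENCE
bare = '[{"question": "Q2", "answer": "A2", "difficulty": "hard"}]'

print(extract_json(fenced))
print(extract_json(bare))
print(extract_json("Sorry, I cannot help with that."))
```

Returning `None` on failure mirrors the behaviour of the function above, so callers still need to check the result before iterating over it.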

# Distractor generation

In [None]:
def gerar_distratores(perguntas_classificadas):
    distratores_final = []

    for pr in perguntas_classificadas:
        prompt = f"Generate 4 plausible distractors for the following Java question.\nQuestion: {pr['question']}\nAnswer: {pr['answer']}"

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are an assistant who generates multiple-choice question distractors."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=200,
            temperature=0.7
        )

        distratores = response.choices[0].message.content
        # Keep one distractor per non-empty line of the reply
        pr['distractors'] = [linha.strip() for linha in distratores.splitlines() if linha.strip()]
        distratores_final.append(pr)

    return distratores_final

# Generate distractors for the classified questions
perguntas_com_distratores = gerar_distratores(perguntas_classificadas)
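Because the prompt does not fix an output format, the reply may come back as a numbered or bulleted list. A possible post-processing step, sketched below, strips those markers so only the distractor text is stored. `clean_distractors` and `LIST_MARKER` are hypothetical names, and the sample reply is hand-written for illustration.

```python
import re

# Matches a leading list marker such as "1.", "2)", "a)", "-", "*" or "•".
LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[A-Da-d][.)]|[-*•])\s+")


def clean_distractors(raw_text):
    """Split a model reply into distractors, dropping blank lines and list markers."""
    cleaned = []
    for line in raw_text.splitlines():
        line = LIST_MARKER.sub("", line).strip()
        if line:
            cleaned.append(line)
    return cleaned


reply = "1. RuntimeException\n2) Error\n\n- Exception\n* Object"
print(clean_distractors(reply))
# -> ['RuntimeException', 'Error', 'Exception', 'Object']
```

The pattern only removes a marker at the start of a line, so punctuation inside a distractor is left untouched.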

# Viewing and exporting the questions

In [None]:
import json

# Save the finished questions to a JSON file
with open('perguntas_finalizadas.json', 'w') as f:
    json.dump(perguntas_com_distratores, f, indent=4)

print("Questions generated and classified successfully!")

Questions generated and classified successfully!
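For a quick look at the result, each exported item can be rendered as a multiple-choice question. The sketch below assumes items with the keys produced above (`question`, `answer`, `difficulty`, `distractors`); `format_question` is a hypothetical helper and the sample distractors are written by hand, not generated by the model.

```python
import random
import string


def format_question(item, seed=None):
    """Render one question as multiple-choice text with shuffled options."""
    options = [item["answer"]] + list(item["distractors"])
    random.Random(seed).shuffle(options)  # seed makes the order reproducible
    lines = [f"[{item['difficulty']}] {item['question']}"]
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"  {letter}) {option}")
    return "\n".join(lines)


sample = {
    "question": "What class acts as the superclass for all errors and exceptions in Java?",
    "answer": "Throwable",
    "difficulty": "medium",
    # Hypothetical distractors, written by hand for this sketch.
    "distractors": ["Object", "Exception", "RuntimeException", "Error"],
}
print(format_question(sample, seed=0))
```

Shuffling keeps the correct answer from always appearing in the same position.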


In [None]:
from google.colab import files
files.download('perguntas_finalizadas.json')


# Final question-generation pipeline (with all steps)

In [None]:
import os
import json
import re
from openai import OpenAI

In [None]:
# Load the API key
from google.colab import drive

drive.mount('/content/drive')

with open('/content/drive/MyDrive/api secret key.txt') as f:
    os.environ['OPENAI_API_KEY'] = f.read().strip()

client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY'],
)

Mounted at /content/drive


In [None]:
# punkt - tokenizer that splits a passage into sentences and words
!python -m nltk.downloader punkt

!git clone https://github.com/joao326/question_generation/
#!git clone https://github.com/patil-suraj/question_generation.git

%cd question_generation
!git checkout Tentando-corrigir-o-erro-substring-not-found

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Cloning into 'question_generation'...
remote: Enumerating objects: 165, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 165 (delta 10), reused 16 (delta 7), pack-reused 146 (from 1)
Receiving objects: 100% (165/165), 274.49 KiB | 16.15 MiB/s, done.
Resolving deltas: 100% (82/82), done.
/content/question_generation
Branch 'Tentando-corrigir-o-erro-substring-not-found' set up to track remote branch 'Tentando-corrigir-o-erro-substring-not-found' from 'origin'.
Switched to a new branch 'Tentando-corrigir-o-erro-substring-not-found'


In [None]:
# Function that generates the context for the topic
def gerar_contexto(topic):
    prompt = (f"Write a comprehensive and structured tutorial on the Java programming topic '{topic}' without using any code examples. "
              f"Begin with an introduction explaining why '{topic}' is important in Java. "
              "Provide an in-depth explanation of key concepts, including important terminology, common patterns, and relevant methods or classes. "
              "Describe each concept clearly, focusing on how they work, how to apply them, and when they are commonly used. "
              "Finally, discuss common pitfalls and best practices associated with this topic in Java. Ensure that the tutorial is thorough enough for the reader to understand without needing code examples.")

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a Java programming expert who writes in-depth tutorials without code examples."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=1200,
        temperature=0.7
    )
    return response.choices[0].message.content

# Generate question-answer pairs from the generated context
def gerar_perguntas_respostas(contexto):
    # `pipelines` is the module from the cloned question_generation repository
    from pipelines import pipeline
    nlp = pipeline("multitask-qa-qg", model="valhalla/t5-base-qa-qg-hl", ans_model="valhalla/t5-base-qa-qg-hl")
    perguntas_respostas = nlp(contexto)
    return perguntas_respostas

# Filter and classify the question-answer pairs
def filtrar_e_classificar_perguntas(perguntas_respostas):
    prompt = (
        "Analyze the following list of question-answer pairs generated for a Java programming topic. "
        "Identify and remove any question-answer pairs that are incomplete, confusing, inconsistent, redundant, repetitive, or vague. "
        "For each high-quality question, classify its difficulty level as 'easy', 'medium', or 'hard'. "
        # The example uses double quotes so the reply can be parsed by json.loads
        "Provide the filtered and classified list as valid JSON (double-quoted keys and strings) with the structure: "
        '[{"question": "...", "answer": "...", "difficulty": "..."}].'
    )

    perguntas_formatadas = "\n".join([f"Q: {pr['question']} | A: {pr['answer']}" for pr in perguntas_respostas])

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are an assistant that filters and classifies Java programming question-answer pairs."},
            {"role": "user", "content": f"{prompt}\n\n{perguntas_formatadas}"}
        ],
        max_tokens=1500,
        temperature=0.5
    )

    content = response.choices[0].message.content
    # Prefer a fenced json block; otherwise try to parse the whole reply as JSON
    match = re.search(r'```(?:json)?\s*(.*?)\s*```', content, re.DOTALL)
    json_text = match.group(1) if match else content.strip()
    try:
        perguntas_filtradas_classificadas = json.loads(json_text)
    except json.JSONDecodeError:
        # Malformed or truncated JSON: the caller skips this topic
        print("Erro ao decodificar JSON.")
        perguntas_filtradas_classificadas = None
    return perguntas_filtradas_classificadas

# Generate distractors for each question
def gerar_distratores(perguntas_classificadas):
    for pr in perguntas_classificadas:
        prompt = f"Generate 4 plausible distractors for the following Java question.\nQuestion: {pr['question']}\nAnswer: {pr['answer']}"
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are an assistant who generates multiple-choice question distractors."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=200,
            temperature=0.7
        )
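        distratores = response.choices[0].message.content
        # Keep one distractor per non-empty line of the reply
        pr['distractors'] = [linha.strip() for linha in distratores.split('\n') if linha.strip()]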
    return perguntas_classificadas

# Main function: generate questions for every topic
def gerar_questoes(tópicos):
    todas_perguntas = []

    for tópico in tópicos:
        print(f"Gerando contexto para o tópico: {tópico}")
        contexto = gerar_contexto(tópico)

        print(f"Gerando perguntas para o tópico: {tópico}")
        perguntas_respostas = gerar_perguntas_respostas(contexto)

        print(f"Filtrando e classificando perguntas para o tópico: {tópico}")
        perguntas_classificadas = filtrar_e_classificar_perguntas(perguntas_respostas)

        if perguntas_classificadas:
            print(f"Gerando distratores para o tópico: {tópico}")
            perguntas_com_distratores = gerar_distratores(perguntas_classificadas)

            # Tag each question with its topic and add it to the final result
            for pergunta in perguntas_com_distratores:
                pergunta['topic'] = tópico
                todas_perguntas.append(pergunta)

    # Save all questions to a JSON file
    with open('perguntas.json', 'w', encoding='utf-8') as f:
        json.dump(todas_perguntas, f, indent=4, ensure_ascii=False)

    # The message now names the file that is actually written above
    print("Todas as questões foram geradas e salvas com sucesso no arquivo 'perguntas.json'.")

# Run with the basic topics
tópicos_básicos = [
    "Basic Syntax", "DataTypes, Variables", "Conditionals", "Functions",
    "Loops", "Exception Handling", "DataStructures", "OOP, Interfaces, Classes",
    "Packages", "Working With Files and APIs"
]
gerar_questoes(tópicos_básicos)


Gerando contexto para o tópico: Basic Syntax
Gerando perguntas para o tópico: Basic Syntax


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565



Answer without <pad>: 'Basic Syntax'
Answer without <pad>: 'Syntax'
Answer without <pad>: 'writing, reading, and understanding Java code'
Answer without <pad>: 'more advanced programming concepts and error resolution'
Answer without <pad>: 'object-oriented'
Answer without <pad>: 'C and C++'
Answer without <pad>: 'Java's basic syntax'
Answer without <pad>: 'Key Concepts of Java Basic Syntax'
Answer without <pad>: 'Statements'
Answer without <pad>: 'expressions, declarations, and control flow statements'
Answer without <pad>: 'Understanding statements'
Answer without <pad>: 'Expressions'
Answer without <pad>: 'Declarations'
Answer without <pad>: 'Control Flow Statements'
Answer without <pad>: 'Data Types'
Answer without <pad>: 'Data Types'
Answer without <pad>: 'boolean'
Answer without <pad>: 'size and range'
Answer without <pad>: 'objects and arrays'
Answer without <pad>: 'A reference type'
Answer without <pad>: 'Variables'
Answer without <pad>: 'Variables'
Answer without <pad>: 'specif



Answer without <pad>: 'Understanding data types and variables'
Answer without <pad>: 'data types and variables'
Answer without <pad>: 'Data types'
Answer without <pad>: 'mastering these concepts'
Answer without <pad>: 'Key Concepts'
Answer without <pad>: 'reference data types'
Answer without <pad>: 'Primitive Data Types'
Answer without <pad>: 'eight'
Answer without <pad>: '16'
Answer without <pad>: '32'
Answer without <pad>: '64'
Answer without <pad>: 'single-precision 32-bit IEEE 754 floating point'
Answer without <pad>: 'greater precision'
Answer without <pad>: '16'
Answer without <pad>: 'true or false'
Answer without <pad>: 'Reference Data Types'
Answer without <pad>: 'addresses'
Answer without <pad>: 'Strings'
Answer without <pad>: 'arrays'
Answer without <pad>: 'User-defined data types'
Answer without <pad>: 'Variable'
Answer without <pad>: 'Variables'
Filtrando e classificando perguntas para o tópico: DataTypes, Variables
Gerando distratores para o tópico: DataTypes, Variables
Ge



Answer without <pad>: 'conditionals'
Answer without <pad>: 'conditionals'
Answer without <pad>: 'creating dynamic and interactive applications'
Answer without <pad>: 'understanding conditionals'
Answer without <pad>: 'Key Concepts'
Answer without <pad>: 'condition'
Answer without <pad>: 'comparison operators'
Answer without <pad>: '**Branching'
Answer without <pad>: 'branches in a tree'
Answer without <pad>: 'Common Conditional Statements'
Answer without <pad>: 'if Statement'
Answer without <pad>: 'if Statement'
Answer without <pad>: 'optional else clause'
Answer without <pad>: 'if statement'
Answer without <pad>: 'If the first condition is false'
Answer without <pad>: 'switch Statement'
Answer without <pad>: 'case labels'
Answer without <pad>: 'Logical Operators'
Answer without <pad>: 'Logical Operators'
Answer without <pad>: 'true if both conditions are true'
Answer without <pad>: 'OR'
Filtrando e classificando perguntas para o tópico: Conditionals
Gerando distratores para o tópico: 



Answer without <pad>: 'methods'
Answer without <pad>: 'reusable segments of code'
Answer without <pad>: 'code reusability, enhance readability, facilitate debugging, and support modular programming'
Answer without <pad>: 'better software design and architecture'
Answer without <pad>: 'Key Concepts of Functions in Java'
Answer without <pad>: 'Return Type'
Answer without <pad>: 'any Java data type'
Answer without <pad>: 'reference types'
Answer without <pad>: 'void'
Answer without <pad>: 'The name of the function'
Answer without <pad>: 'camelCase'
Answer without <pad>: 'Parameters'
Answer without <pad>: 'Parameters'
Answer without <pad>: 'zero or more parameters'
Answer without <pad>: 'Method Body'
Answer without <pad>: 'contains the statements that are executed when the method is called'
Answer without <pad>: 'Method Overloading'
Answer without <pad>: 'Method Overloading'
Answer without <pad>: 'Method Overloading'
Answer without <pad>: 'Overloading'
Answer without <pad>: 'Access Modifie



Answer without <pad>: 'Loops'
Answer without <pad>: 'repetitive tasks'
Answer without <pad>: 'Understanding loops'
Answer without <pad>: 'Importance of Loops in Java'
Answer without <pad>: 'Automating repetitive tasks'
Answer without <pad>: 'Automating repetitive tasks'
Answer without <pad>: 'Flexibility'
Answer without <pad>: 'Control Flow'
Answer without <pad>: 'Control Flow'
Answer without <pad>: 'Key Concepts of Loops'
Answer without <pad>: 'Terminology'
Answer without <pad>: 'condition'
Answer without <pad>: 'count'
Answer without <pad>: 'Types of Loops'
Answer without <pad>: 'For Loop'
Answer without <pad>: 'initialization, condition, and increment/decrement'
Answer without <pad>: 'iterating over arrays or collections'
Answer without <pad>: 'While Loop'
Answer without <pad>: 'as long as the specified condition remains true'
Answer without <pad>: 'user input or external factors'
Answer without <pad>: 'Do-While Loop'
Filtrando e classificando perguntas para o tópico: Loops
Gerando 



Answer without <pad>: 'Exception handling'
Answer without <pad>: 'an unexpected event that disrupts the normal flow of the program's instructions'
Answer without <pad>: 'Effective exception handling'
Answer without <pad>: 'program reliability, readability, and maintainability'
Answer without <pad>: 'Key Concepts of Exception Handling'
Answer without <pad>: 'an event that disrupts the normal execution of a program'
Answer without <pad>: 'objects'
Answer without <pad>: 'by catching them or declaring them in the method signature'
Answer without <pad>: 'IOException'
Answer without <pad>: 'Unchecked Exceptions'
Answer without <pad>: 'NullPointerException'
Answer without <pad>: 'RuntimeException'
Answer without <pad>: '### 2.'
Answer without <pad>: 'class hierarchy'
Answer without <pad>: '<unk>Throwable<unk>'
Answer without <pad>: 'serious problems'
Answer without <pad>: 'Java runtime environment'
Answer without <pad>: 'Represents conditions that a program might want to catch'
Answer without



Answer without <pad>: 'Data structures'
Answer without <pad>: 'data structures'
Answer without <pad>: 'speed and efficiency'
Answer without <pad>: 'Key Concepts of Data Structures'
Answer without <pad>: 'specialized format'
Answer without <pad>: 'how much memory is required'
Answer without <pad>: 'primitive data structures'
Answer without <pad>: 'Important Terminology'
Answer without <pad>: 'Important Terminology'
Answer without <pad>: 'collection'
Answer without <pad>: 'Node'
Answer without <pad>: 'Pointer/Reference'
Answer without <pad>: 'maximum number of elements'
Answer without <pad>: 'current number of elements in a data structure'
Answer without <pad>: 'Common Data Structures in Java'
Answer without <pad>: 'Arrays'
Answer without <pad>: 'dynamic resizing and flexibility'
Answer without <pad>: 'Lists'
Answer without <pad>: 'Lists'
Answer without <pad>: 'Lists'
Answer without <pad>: 'Sets'
Filtrando e classificando perguntas para o tópico: DataStructures
Gerando distratores para o



Answer without <pad>: 'Object-Oriented Programming'
Answer without <pad>: 'a clear modular structure'
Answer without <pad>: 'classes and interfaces'
Answer without <pad>: 'facilitate code reuse, enhance readability, and simplify complex systems'
Answer without <pad>: 'Key Concepts'
Answer without <pad>: '1. Encapsulation'
Answer without <pad>: '**Encapsulation'
Answer without <pad>: 'Encapsulation'
Answer without <pad>: 'inheritance'
Answer without <pad>: 'inheritance'
Answer without <pad>: 'superclass'
Answer without <pad>: 'simplify code management'
Answer without <pad>: 'Polymorphism'
Answer without <pad>: 'Polymorphism'
Answer without <pad>: 'data types'
Answer without <pad>: 'Polymorphism'
Answer without <pad>: 'Abstraction'
Answer without <pad>: 'Abstraction'
Answer without <pad>: 'abstract classes and interfaces'
Answer without <pad>: 'blueprint'
Answer without <pad>: 'attributes'
Answer without <pad>: 'attributes'
Answer without <pad>: 'any data type'
Filtrando e classificando 



Answer without <pad>: 'a namespace'
Answer without <pad>: 'managing large software systems'
Answer without <pad>: 'easier distribution and modularization of code'
Answer without <pad>: 'object-oriented programming paradigm'
Answer without <pad>: 'Key Concepts'
Answer without <pad>: 'easy management'
Answer without <pad>: 'folders'
Answer without <pad>: 'enhance clarity and maintainability'
Answer without <pad>: 'Naming Conflicts'
Answer without <pad>: 'Access Control'
Answer without <pad>: 'Important Terminology'
Answer without <pad>: 'Package Declaration'
Answer without <pad>: 'place the class in the correct location'
Answer without <pad>: 'Import Statement'
Answer without <pad>: 'code readability and convenience'
Answer without <pad>: 'default package'
Answer without <pad>: 'naming conflicts and maintenance issues'
Answer without <pad>: 'Types of Packages'
Answer without <pad>: 'Java Development Kit'
Answer without <pad>: 'java.lang'
Answer without <pad>: 'Understanding these package



Answer without <pad>: 'working with files and APIs'
Answer without <pad>: 'powerful libraries and frameworks'
Answer without <pad>: 'greatly enhance'
Answer without <pad>: 'importance, key terminology, patterns, and best practices'
Answer without <pad>: 'data-driven applications'
Answer without <pad>: 'Key Concepts'
Answer without <pad>: 'File Handling'
Answer without <pad>: 'java.io'
Answer without <pad>: 'a collection of data stored on a disk'
Answer without <pad>: 'various types'
Answer without <pad>: 'Streams'
Answer without <pad>: 'input streams'
Answer without <pad>: 'Buffered Streams'
Answer without <pad>: 'file channels'
Answer without <pad>: 'Reading Files'
Answer without <pad>: 'buffered readers or file channels'
Answer without <pad>: 'Writing Files'
Filtrando e classificando perguntas para o tópico: Working With Files and APIs
Gerando distratores para o tópico: Working With Files and APIs
Todas as questões foram geradas e salvas com sucesso no arquivo 'todas_perguntas.json'.
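
The log above shows repeated answers (for example 'Variables' and 'Lists') and answers that are leftover markdown from the generated context (for example '### 2.' and '**Branching'). A cheap local pre-filter before `filtrar_e_classificar_perguntas` could cut the input tokens sent to the API. The sketch below is only an illustration: `prefiltrar_perguntas` is a hypothetical helper, and its heuristics are assumptions rather than part of the original pipeline. Deduplicating by answer alone may also discard distinct questions that happen to share an answer.

```python
import re

def prefiltrar_perguntas(perguntas_respostas):
    """Drop QA pairs whose answer is a duplicate or markdown residue (hypothetical helper)."""
    vistos = set()
    filtradas = []
    for pr in perguntas_respostas:
        resposta = pr["answer"].strip()
        # Skip answers that start with markdown markers ('#', '*') or a list prefix like '1. '
        if re.match(r"^(#+|\*+|\d+\.\s)", resposta):
            continue
        chave = resposta.lower()
        # Skip empty answers and answers already seen (case-insensitive)
        if not chave or chave in vistos:
            continue
        vistos.add(chave)
        filtradas.append(pr)
    return filtradas

# Made-up sample pairs, shaped like the pipeline output
exemplo = [
    {"question": "What stores data?", "answer": "Variables"},
    {"question": "What holds values?", "answer": "Variables"},
    {"question": "What is section two?", "answer": "### 2."},
    {"question": "What is a decision point?", "answer": "**Branching"},
    {"question": "How many primitive types are there?", "answer": "eight"},
]
print(prefiltrar_perguntas(exemplo))
```

In `gerar_questoes`, this would sit between `gerar_perguntas_respostas(contexto)` and `filtrar_e_classificar_perguntas(...)`.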

In [None]:
# Other topics (are they really necessary?)

# Intermediate 1
# Intermediate 2
# Advanced
# Complete
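
One way to weigh whether the extra topic sets are worth generating is to estimate the API cost per topic. The sketch below uses the gpt-4o-mini prices recorded in the pricing notes earlier in this notebook and the `max_tokens` values from the functions above. `estimar_custo` is a hypothetical helper, and the question count and input-token count are placeholder assumptions, not measurements.

```python
# Prices in USD per 1K tokens for gpt-4o-mini, taken from the pricing notes earlier in this notebook
PRECO_ENTRADA_POR_1K = 0.000150
PRECO_SAIDA_POR_1K = 0.000600

def estimar_custo(prompt_tokens, completion_tokens):
    """Estimate the USD cost of one call from its token counts (hypothetical helper)."""
    custo_entrada = prompt_tokens / 1000 * PRECO_ENTRADA_POR_1K
    custo_saida = completion_tokens / 1000 * PRECO_SAIDA_POR_1K
    return custo_entrada + custo_saida

# Upper bound on output tokens per topic, from the max_tokens values used above:
# one context call (1200), one filtering call (1500), one distractor call per question (200 each)
perguntas_por_topico = 20      # assumption, not measured
tokens_entrada_topico = 5000   # assumption, not measured
tokens_saida_topico = 1200 + 1500 + perguntas_por_topico * 200

custo = estimar_custo(tokens_entrada_topico, tokens_saida_topico)
print(f"Estimated upper bound per topic: ${custo:.5f}")
```

For real numbers, each response from `client.chat.completions.create` carries the actual counts in `response.usage.prompt_tokens` and `response.usage.completion_tokens`, which can be passed to the helper in place of the placeholders.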

In [None]:
from google.colab import files
files.download('perguntas.json')
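
Each `distractors` entry in `perguntas.json` comes from splitting the raw model reply on newlines, so entries may still carry list numbering, bullet markers, or stray whitespace. A minimal cleanup sketch, assuming the model replies with one distractor per line (`limpar_distratores` is a hypothetical helper, not part of the original code):

```python
import re

def limpar_distratores(texto):
    """Split a raw model reply into clean distractor strings (hypothetical helper)."""
    distratores = []
    for linha in texto.split("\n"):
        # Remove a leading '1.', '2)', '-', '*' or bullet, but only when whitespace follows it,
        # so values such as '-1' or '3.14' are left untouched
        limpa = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s+", "", linha).strip()
        if limpa:
            distratores.append(limpa)
    return distratores

# Made-up reply mixing the list styles a model might use
resposta_bruta = "1. int\n2) float\n\n- char\n* double"
print(limpar_distratores(resposta_bruta))
```

Inside `gerar_distratores`, this would take the place of the plain `split('\n')` call.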


to do:
- [ ] Remove the "Answer without \<pad>:" debug lines from the output
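
One possible way to handle this item without editing the cloned repository is to capture standard output around the pipeline call. This is a sketch under the assumption that the "Answer without <pad>:" lines are written with `print()` to stdout, which has not been checked against the fork's source. `executar_em_silencio` is a hypothetical helper, and `pipeline_falso` is only a stand-in for the real pipeline.

```python
import contextlib
import io

def executar_em_silencio(func, *args, **kwargs):
    """Call func while capturing anything it prints to stdout (hypothetical helper)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        resultado = func(*args, **kwargs)
    return resultado, buffer.getvalue()

# Stand-in for the real pipeline call, used only to demonstrate the capture
def pipeline_falso(contexto):
    print("Answer without <pad>: 'Variables'")
    return [{"question": "What stores data?", "answer": "Variables"}]

perguntas, log = executar_em_silencio(pipeline_falso, "some context")
print(len(perguntas))
```

In `gerar_perguntas_respostas`, the call `nlp(contexto)` would become `executar_em_silencio(nlp, contexto)`, keeping the captured text available in case it is needed for debugging.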