This project builds an assistant that classifies code-related questions and generates a response tailored to the question type. It is intended to help students and developers get targeted help with programming problems.

#### Note: This notebook should be run on Kaggle

# Question Categorization
## Masked Language Modeling
### Load Dataset

Add this dataset as a notebook input: https://www.kaggle.com/datasets/imoore/60k-stack-overflow-questions-with-quality-rate

### Tokenization

In [1]:
import kagglehub
from kagglehub import KaggleDatasetAdapter
import pandas as pd

train_df = pd.read_csv("/kaggle/input/60k-stack-overflow-questions-with-quality-rate/train.csv")
valid_df = pd.read_csv("/kaggle/input/60k-stack-overflow-questions-with-quality-rate/valid.csv")

print("First 5 records:", train_df.head())

train_texts = train_df['Title'].fillna("") + " " + train_df['Body'].fillna("")
valid_texts = valid_df['Title'].fillna("") + " " + valid_df['Body'].fillna("")

First 5 records:          Id                                              Title  \
0  34552656             Java: Repeat Task Every Random Seconds   
1  34553034                  Why are Java Optionals immutable?   
2  34553174  Text Overlay Image with Darkened Opacity React...   
3  34553318         Why ternary operator in swift is so picky?   
4  34553755                 hide/show fab with scale animation   

                                                Body  \
0  <p>I'm already familiar with repeating tasks e...   
1  <p>I'd like to understand why Java 8 Optionals...   
2  <p>I am attempting to overlay a title over an ...   
3  <p>The question is very simple, but I just cou...   
4  <p>I'm using custom floatingactionmenu. I need...   

                                                Tags         CreationDate  \
0                                     <java><repeat>  2016-01-01 00:21:59   
1                                   <java><optional>  2016-01-01 02:03:20   
2  <javascript><im

Disable Weights & Biases (wandb) logging to speed up training

In [2]:
import os
os.environ["WANDB_DISABLED"] = "true"

In [7]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
from transformers import RobertaTokenizerFast
from datasets import Dataset

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
#tokenizer = RobertaTokenizerFast.from_pretrained("microsoft/codebert-base")

# Change to HuggingFace datasets format
train_dataset = Dataset.from_dict({"text": train_texts})
valid_dataset = Dataset.from_dict({"text": valid_texts})

# Encode with tokenizer
def tokenize_function(example):
    return tokenizer(example["text"], truncation=True, padding="max_length", max_length=128)


tokenized_train = train_dataset.map(tokenize_function, batched=True)
tokenized_valid = valid_dataset.map(tokenize_function, batched=True)


tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/481 [00:00<?, ?B/s]

Map:   0%|          | 0/45000 [00:00<?, ? examples/s]

Map:   0%|          | 0/15000 [00:00<?, ? examples/s]
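The tokenizer call above uses `truncation=True`, `padding="max_length"` and `max_length=128`, so every example ends up with exactly 128 positions. The following is a minimal pure-Python sketch of that idea for a single sequence, not the tokenizer's actual implementation. The helper name `pad_or_truncate` is hypothetical, and the special-token ids are assumed to be those of the `roberta-base` vocabulary (`<s>`=0, `<pad>`=1, `</s>`=2).

```python
# Assumed special-token ids from the roberta-base vocabulary.
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2

def pad_or_truncate(content_ids, max_length=128):
    """Sketch of truncation=True + padding="max_length" for one sequence."""
    # Reserve two positions for <s> and </s>, then cut the content to fit.
    kept = content_ids[: max_length - 2]
    input_ids = [BOS_ID] + kept + [EOS_ID]
    attention_mask = [1] * len(input_ids)
    # Right-pad to the fixed length; padded positions get attention 0.
    n_pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * n_pad,
        "attention_mask": attention_mask + [0] * n_pad,
    }

# A short sequence is padded, a long one is truncated (max_length=8 for readability).
short = pad_or_truncate([100, 200, 300], max_length=8)
long = pad_or_truncate(list(range(10, 30)), max_length=8)
print(short)
print(long)
```

With `max_length=128`, any question longer than 126 content tokens loses its tail, which matters for Stack Overflow bodies that contain long code listings.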

### Create Mask

In [8]:
from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15
)
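To make `mlm_probability=0.15` concrete, here is a rough pure-Python sketch of the masking rule the collator applies by default: about 15% of non-special tokens are selected, and of those roughly 80% become `<mask>`, 10% become a random token, and 10% stay unchanged. The function `mask_tokens` is a hypothetical illustration, not the library code. The ids `50264` (`<mask>`) and `50265` (vocabulary size) are assumed from `roberta-base`.

```python
import random

MASK_ID = 50264      # assumed id of <mask> in roberta-base
VOCAB_SIZE = 50265   # assumed roberta-base vocabulary size
IGNORE_INDEX = -100  # label value skipped by the loss

def mask_tokens(input_ids, special_mask, mlm_probability=0.15, rng=random):
    """Return (masked_inputs, labels) for one sequence of token ids."""
    inputs = list(input_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, (tok, is_special) in enumerate(zip(input_ids, special_mask)):
        # Never mask special tokens; select the rest with probability 0.15.
        if is_special or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID                    # 80%: replace with <mask>
        elif roll < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels

rng = random.Random(0)
ids = [0] + [rng.randrange(1000, 2000) for _ in range(10000)] + [2]
special = [True] + [False] * 10000 + [True]
masked, labels = mask_tokens(ids, special, rng=rng)
selected = sum(1 for label in labels if label != IGNORE_INDEX)
print(f"selected {selected} of {len(ids) - 2} tokens")
```

Because the selection is redrawn every time a batch is built, the same text is masked differently across epochs.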

### Initialize the model and trainer

In [9]:
from transformers import RobertaForMaskedLM, TrainingArguments, Trainer

model = RobertaForMaskedLM.from_pretrained("roberta-base")
#model = RobertaForMaskedLM.from_pretrained("microsoft/codebert-base")
#To load trained weights use the following line, replace the path with your model path
#model = RobertaForMaskedLM.from_pretrained("/kaggle/input/codeqa-datasets-and-models/mlm-roberta")


training_args = TrainingArguments(
    output_dir="./mlm-roberta-stackoverflow",
    overwrite_output_dir=True,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    save_total_limit=2,
    prediction_loss_only=True,
    fp16=True,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_valid,
    data_collator=data_collator,
    tokenizer=tokenizer
)

In [10]:
trainer.train()
trainer.save_model("./mlm-roberta-stackoverflow")
tokenizer.save_pretrained("./mlm-roberta-stackoverflow")

Step,Training Loss
500,1.784
1000,1.6082
1500,1.5433
2000,1.4838
2500,1.4582
3000,1.4165
3500,1.391
4000,1.3716


('./mlm-roberta-stackoverflow/tokenizer_config.json',
 './mlm-roberta-stackoverflow/special_tokens_map.json',
 './mlm-roberta-stackoverflow/vocab.json',
 './mlm-roberta-stackoverflow/merges.txt',
 './mlm-roberta-stackoverflow/added_tokens.json',
 './mlm-roberta-stackoverflow/tokenizer.json')

### Download the trained model

In [11]:
import shutil
import os
shutil.make_archive('/kaggle/working/mlm-roberta', 'zip', '/kaggle/working/mlm-roberta-stackoverflow')

from IPython.display import FileLink, display
FileLink("mlm-roberta.zip")

### Load and evaluate the model

In [12]:
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

model = RobertaForMaskedLM.from_pretrained("/kaggle/input/codeqa-datasets-and-models/mlm-roberta")
tokenizer = RobertaTokenizerFast.from_pretrained("/kaggle/input/codeqa-datasets-and-models/mlm-roberta")

In [13]:
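from transformers import RobertaTokenizer, RobertaForMaskedLM

# Reload the pretrained roberta-base checkpoint; this replaces the fine-tuned
# model and tokenizer loaded in the previous cell.
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")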

In [16]:
import pandas as pd
from datasets import Dataset

# load valid data
df_valid = pd.read_csv("/kaggle/input/60k-stack-overflow-questions-with-quality-rate/valid.csv")
df_valid = df_valid.dropna(subset=["Body"])  # avoid errors from null values

valid_dataset = Dataset.from_pandas(df_valid[["Body"]])

# Tokenize for MLM
def tokenize_mlm(example):
    return tokenizer(example["Body"], truncation=True, padding="max_length", max_length=128)

tokenized_valid = valid_dataset.map(tokenize_mlm, batched=True)
tokenized_valid.set_format("torch", columns=["input_ids", "attention_mask"])


Map:   0%|          | 0/15000 [00:00<?, ? examples/s]

In [17]:
from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

In [18]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=data_collator
)

eval_results = trainer.evaluate(eval_dataset=tokenized_valid)
print(eval_results)


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


{'eval_loss': 3.4965949058532715, 'eval_runtime': 131.8841, 'eval_samples_per_second': 113.736, 'eval_steps_per_second': 7.112}
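A masked-LM loss is easier to compare across models when it is converted to a perplexity-style number, `exp(loss)`. The snippet below is a small sketch that applies this to the `eval_loss` printed above. For a masked model this is only a pseudo-perplexity over the masked positions, and it varies slightly between runs because the mask is random.

```python
import math

eval_loss = 3.4965949058532715  # eval_loss reported in the output above

# Cross-entropy is in nats, so exponentiating gives the effective number of
# equally likely tokens the model is choosing between at each masked position.
perplexity = math.exp(eval_loss)
print(f"pseudo-perplexity: {perplexity:.2f}")
```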


## Classification
### Load Dataset

In [19]:
import json
import os

# The dataset is supposed to have 11 classes, but only 9 appear in the
# cleaned training set. After merging classes with similar meanings,
# 7 classes remain for this question categorization module.
label_mapping = {
    "code_understanding": "code_explain",
    "code_explain": "code_explain",
    "logical": "logical_reasoning",
    "reasoning": "logical_reasoning",
    "error": "error",
    "usage": "usage",
    "algorithm": "algorithm",
    "task": "task",
    #"comparison": "comparison",
    "variable": "variable",
    #"guiding": "guiding"
}

def jsonl_to_json(input_file, output_file):
    # Make sure the output directory exists (derived from output_file
    # rather than relying on a global variable)
    os.makedirs(os.path.dirname(output_file), exist_ok=True)
    
    # Extract data from the file
    questions = []
    labels = []
    
    with open(input_file, "r") as f:
        for line in f:
            data = json.loads(line)
            question = data["question"]  # Extract question
            original_label = data["questionType"]  # Extract label
            
            # Remap the questions to the labels
            if original_label in label_mapping:
                new_label = label_mapping[original_label]
            else:
                new_label = original_label  # Keep the label if the label is not in the dict
                print("New label found", new_label)
            
            questions.append(question)
            labels.append(new_label)
    
    # Change the format for Hugging Face models
    train_data = [{"text": q, "label": l} for q, l in zip(questions, labels)]
    
    # Write the data into the .json files
    with open(output_file, "w") as f:
        json.dump(train_data, f, ensure_ascii=False, indent=4)
    
    print(f"Converted data has been saved to {output_file}")

input_file = "/kaggle/input/codeqa-datasets-and-models/cs1qa/cs1qa/augmented_train_cleaned.jsonl"
output_dir = "/kaggle/working/cs1qa"
output_file = os.path.join(output_dir, "train.json")
jsonl_to_json(input_file, output_file)

input_file = "/kaggle/input/codeqa-datasets-and-models/cs1qa/cs1qa/test_cleaned.jsonl"
output_dir = "/kaggle/working/cs1qa"
output_file = os.path.join(output_dir, "test.json")
jsonl_to_json(input_file, output_file)


Converted data has been saved to /kaggle/working/cs1qa/train.json
Converted data has been saved to /kaggle/working/cs1qa/test.json
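The conversion above can be checked on a few records without touching the Kaggle files. The sketch below runs the same remapping logic on an in-memory JSONL sample. The three records are made up for illustration, and only a subset of `label_mapping` is repeated here.

```python
import io
import json

# Subset of the label_mapping defined above.
label_mapping = {
    "code_understanding": "code_explain",
    "logical": "logical_reasoning",
    "reasoning": "logical_reasoning",
    "error": "error",
}

# Hypothetical JSONL records with the two fields the converter reads.
sample_jsonl = io.StringIO(
    '{"question": "What does this loop print?", "questionType": "code_understanding"}\n'
    '{"question": "Why is the result negative?", "questionType": "reasoning"}\n'
    '{"question": "Which version is faster?", "questionType": "comparison"}\n'
)

records = []
for line in sample_jsonl:
    row = json.loads(line)
    original = row["questionType"]
    # Labels missing from the mapping are kept unchanged, as in jsonl_to_json.
    records.append({"text": row["question"], "label": label_mapping.get(original, original)})

print(json.dumps(records, indent=2))
```

The third record shows the fall-through case: an unmapped type such as `comparison` keeps its original name and would become an extra class if it appeared in the data.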


In [20]:
from datasets import load_dataset
import numpy as np
import torch

dataset = load_dataset("json", data_files={"train": "/kaggle/working/cs1qa/train.json", "test": "/kaggle/working/cs1qa/test.json"})

labels = list(set(example["label"] for example in dataset["train"]))
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

def encode_labels(example):
    example["label"] = label2id[example["label"]]
    return example

dataset = dataset.map(encode_labels)
print(id2label)

Generating train split: 0 examples [00:00, ? examples/s]

Generating test split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/5907 [00:00<?, ? examples/s]

Map:   0%|          | 0/1847 [00:00<?, ? examples/s]

{0: 'error', 1: 'usage', 2: 'algorithm', 3: 'code_explain', 4: 'variable', 5: 'logical_reasoning', 6: 'task'}
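The order of `list(set(...))` over strings is not guaranteed to be the same in a new interpreter session, so the id assigned to each label can change between runs. One way to make the mapping reproducible, sketched here on a small made-up label list, is to sort the labels before enumerating them.

```python
# Made-up sample of string labels; the real ones come from dataset["train"].
train_labels = [
    "task", "error", "usage", "error", "variable",
    "algorithm", "code_explain", "logical_reasoning",
]

# Sorting removes the dependence on set iteration order.
labels = sorted(set(train_labels))
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}
print(id2label)
```

With a stable mapping, any code that later decodes predictions can rely on the same ids, although reading the mapping from the saved model config remains the safer option.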


In [21]:
# Describe the dataset
train_labels = dataset["train"]["label"]
test_labels = dataset["test"]["label"] if "test" in dataset else []

train_unique, train_counts = np.unique(train_labels, return_counts=True)
print("Train set label distribution:")
for label_id, count in zip(train_unique, train_counts):
    print(f"{id2label[label_id]}: {count}")

if len(test_labels) > 0:
    test_unique, test_counts = np.unique(test_labels, return_counts=True)
    print("\nTest set label distribution:")
    for label_id, count in zip(test_unique, test_counts):
        print(f"{id2label[label_id]}: {count}")

Train set label distribution:
error: 575
usage: 486
algorithm: 717
code_explain: 1054
variable: 1165
logical_reasoning: 1116
task: 794

Test set label distribution:
error: 193
usage: 162
algorithm: 239
code_explain: 229
variable: 389
logical_reasoning: 371
task: 264
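The test distribution above also gives a simple reference point for the classifier: the accuracy obtained by always predicting the most frequent class. The sketch below computes it from the printed counts.

```python
# Test-set counts copied from the output above.
test_counts = {
    "error": 193, "usage": 162, "algorithm": 239, "code_explain": 229,
    "variable": 389, "logical_reasoning": 371, "task": 264,
}

total = sum(test_counts.values())
majority_label = max(test_counts, key=test_counts.get)
majority_baseline = test_counts[majority_label] / total

# Print each class with its share of the test set, largest first.
for label, count in sorted(test_counts.items(), key=lambda kv: -kv[1]):
    print(f"{label:<18} {count:>4}  {count / total:6.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```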


### Tokenization

In [22]:
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")


tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/150 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/498 [00:00<?, ?B/s]

In [23]:
from transformers import RobertaTokenizerFast

#tokenizer = RobertaTokenizerFast.from_pretrained("/kaggle/input/codeqa-datasets-and-models/mlm-roberta")

def tokenize(example):
    return tokenizer(example["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])


Map:   0%|          | 0/5907 [00:00<?, ? examples/s]

Map:   0%|          | 0/1847 [00:00<?, ? examples/s]

In [24]:
from transformers import RobertaForSequenceClassification

model = RobertaForSequenceClassification.from_pretrained("microsoft/codebert-base",
                                                         num_labels=len(label2id),
                                                         id2label=id2label,
                                                         label2id=label2id,
                                                         hidden_dropout_prob=0.3,  # increase dropout
                                                         attention_probs_dropout_prob=0.3)

pytorch_model.bin:   0%|          | 0.00/499M [00:00<?, ?B/s]

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at microsoft/codebert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [25]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="weighted")
    acc = accuracy_score(labels, preds)
    return {
        "accuracy": acc,
        "precision": precision,
        "recall": recall,
        "f1": f1
    }
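With `average="weighted"`, each per-class score is weighted by that class's share of the true labels. A consequence is that weighted recall always equals accuracy, which is why those two columns are identical in the training log. The following pure-Python sketch, with a hypothetical helper `weighted_scores` and made-up labels, reproduces the calculation without scikit-learn.

```python
from collections import Counter

def weighted_scores(labels, preds):
    """Support-weighted precision, recall and F1, plus plain accuracy."""
    n = len(labels)
    support = Counter(labels)
    precision = recall = f1 = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        predicted = sum(1 for p in preds if p == cls)
        p_cls = tp / predicted if predicted else 0.0
        r_cls = tp / n_cls
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if (p_cls + r_cls) else 0.0
        weight = n_cls / n  # share of true examples in this class
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls
    accuracy = sum(1 for y, p in zip(labels, preds) if y == p) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels and predictions for three classes.
scores = weighted_scores([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 2, 2])
print(scores)
```

Since recall carries no extra information here, a macro-averaged F1 could be a useful complement when the smaller classes matter.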

In [26]:
from transformers import TrainingArguments, Trainer
from transformers import EarlyStoppingCallback 

lr = 2e-5
train_epochs = 10

training_args = TrainingArguments(
    output_dir="./roberta_cs1qa_results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=lr,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    num_train_epochs=train_epochs,
    weight_decay=0.01,
    logging_dir="./logs",
    load_best_model_at_end=True,
    report_to="none",
    save_total_limit=2,
    greater_is_better=True,
    metric_for_best_model='f1',
    warmup_steps=200,
    lr_scheduler_type='cosine_with_restarts',
)
    
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)

trainer.train()

model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,No log,1.456014,0.411478,0.335022,0.411478,0.29557
2,1.720900,1.062162,0.596643,0.598852,0.596643,0.575119
3,1.040100,0.942639,0.681104,0.667902,0.681104,0.664441
4,1.040100,0.952123,0.672442,0.66544,0.672442,0.661127
5,0.799000,0.890245,0.726042,0.721601,0.726042,0.722303
6,0.661800,0.857468,0.732539,0.731833,0.732539,0.730459
7,0.584400,0.876068,0.720628,0.723326,0.720628,0.718123
8,0.584400,0.89864,0.725501,0.728169,0.725501,0.723967
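Training was configured for 10 epochs but the log ends at epoch 8. The sketch below replays the F1 column through a simplified early-stopping rule with `patience=2`: stop once the metric has failed to beat its best value for two consecutive evaluations. The function `simulate_early_stopping` is a hypothetical stand-in for the callback's logic, not its actual code.

```python
# F1 per epoch, copied from the training log above.
f1_by_epoch = [0.29557, 0.575119, 0.664441, 0.661127,
               0.722303, 0.730459, 0.718123, 0.723967]

def simulate_early_stopping(scores, patience=2):
    """Return (epoch at which training stops, epoch with the best score)."""
    best_score, best_epoch, bad_epochs = float("-inf"), None, 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch, bad_epochs = score, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(scores), best_epoch

stopped_at, best_epoch = simulate_early_stopping(f1_by_epoch)
print(f"stopped after epoch {stopped_at}, best epoch {best_epoch}")
```

Because `load_best_model_at_end=True`, the weights kept after training are those of the best epoch rather than the last one.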


TrainOutput(global_step=2960, training_loss=0.8948978888021933, metrics={'train_runtime': 1038.061, 'train_samples_per_second': 56.904, 'train_steps_per_second': 3.564, 'total_flos': 3108533553100800.0, 'train_loss': 0.8948978888021933, 'epoch': 8.0})
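The `global_step=2960` figure can be sanity-checked from the dataset size and batch size. The arithmetic below assumes two GPUs, which is an inference from the numbers and not something the notebook states: with a single device, 5907 examples at batch size 8 would give 739 steps per epoch instead.

```python
import math

def optimizer_steps(n_examples, per_device_batch, n_devices, epochs):
    """Optimizer steps when the last partial batch is kept and there is no gradient accumulation."""
    steps_per_epoch = math.ceil(n_examples / (per_device_batch * n_devices))
    return steps_per_epoch * epochs

# 5907 training examples, batch size 8 per device, 8 completed epochs.
steps = optimizer_steps(5907, 8, n_devices=2, epochs=8)  # n_devices=2 is an assumption
print(steps)
```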

In [27]:
# Evaluate the model
#metrics = trainer.evaluate()
#print(metrics)

# Save the model and tokenizer
trainer.save_model("./roberta_cs1qa_model_10epoch")
tokenizer.save_pretrained("./roberta_cs1qa_model_10epoch")

('./roberta_cs1qa_model_10epoch/tokenizer_config.json',
 './roberta_cs1qa_model_10epoch/special_tokens_map.json',
 './roberta_cs1qa_model_10epoch/vocab.json',
 './roberta_cs1qa_model_10epoch/merges.txt',
 './roberta_cs1qa_model_10epoch/added_tokens.json')

In [28]:
import shutil
shutil.make_archive('/kaggle/working/mlm-roberta-cs1qa', 'zip', '/kaggle/working/roberta_cs1qa_model_10epoch')

from IPython.display import FileLink
FileLink("mlm-roberta-cs1qa.zip")

In [29]:
eval_results = trainer.evaluate()
print(eval_results)

{'eval_loss': 0.857467532157898, 'eval_accuracy': 0.7325392528424473, 'eval_precision': 0.7318332635306875, 'eval_recall': 0.7325392528424473, 'eval_f1': 0.7304591718765676, 'eval_runtime': 9.0229, 'eval_samples_per_second': 204.702, 'eval_steps_per_second': 6.428, 'epoch': 8.0}


## Streamlit App

In [32]:
%%writefile app.py
import streamlit as st
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    AutoModelForCausalLM
)
import torch

st.title("CodeQA")

with st.sidebar:
    st.header("Settings")
    max_length = st.slider("Max Answer Length", 100, 500, 300)
    temperature = st.slider("Temperature", 0.1, 1.0, 0.7)

@st.cache_resource
def load_models():
    cls_model_path = "/kaggle/working/roberta_cs1qa_model_10epoch"
    cls_tokenizer = AutoTokenizer.from_pretrained(cls_model_path)
    cls_model = AutoModelForSequenceClassification.from_pretrained(cls_model_path)
    
    gen_model_path = "/kaggle/input/codeqa-datasets-and-models/codellama/codellama"
    gen_tokenizer = AutoTokenizer.from_pretrained(gen_model_path)
    gen_model = AutoModelForCausalLM.from_pretrained(
        gen_model_path,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    
    return cls_model, cls_tokenizer, gen_model, gen_tokenizer

cls_model, cls_tokenizer, gen_model, gen_tokenizer = load_models()

# Read the label map saved with the classifier instead of hardcoding it;
# the ids assigned at training time depend on set ordering and can differ per run.
id2label = cls_model.config.id2label

question = st.text_area("Enter your question", height=150, placeholder="Example: why does this loop only execute 5 times?")
code = st.text_area("Enter your code (if applicable)", height=150, placeholder="Optional: Paste your code here...")


if st.button("Generate"):
    if not question.strip():
        st.warning("Please enter your question.")
    else:
        with st.spinner("Analysing..."):
            cls_inputs = cls_tokenizer(question, return_tensors="pt", truncation=True, padding=True)
            with torch.no_grad():
                cls_outputs = cls_model(**cls_inputs)
            pred = cls_outputs.logits.argmax(dim=-1).item()
            category = id2label.get(pred, "unknown")

            code_explain_prompt = """
            [INST] <<SYS>>
            You are a code behavior specialist. Explain ONLY:
            1. What the code literally does (no assumptions)
            2. Key operations line-by-line
            3. Output format if applicable
            Avoid suggesting improvements or external context.
            <</SYS>>
            
            [Task: code_explain]
            Question: {question}
            Code:{code}
            [/INST]
            """
            
            logical_reasoing_prompt = """
            [INST] <<SYS>>
            You are a logic analyzer:
            1. Trace the control flow step-by-step
            2. Highlight conditionals/branches
            3. Show final conclusion with evidence
            <</SYS>>
            
            [Task: logical_reasoning]
            Question: {question}
            Code:{code}
            [/INST]
            """
            
            error_prompt = """
            [INST] <<SYS>>
            You are a debugger. Respond with:
            1. Exact error type + trigger line
            2. Immediate fix (code snippet)
            3. Root cause
            Format fixes as: ```python\n[code]\n```
            <</SYS>>
            
            [Task: error]
            Question: {question}
            Code:{code}
            [/INST]
            """
            
            usage_prompt = """
            [INST] <<SYS>>
            You are a documentation expert. Provide:
            1. Standard library/API usage syntax
            2. Required parameters + return type
            3. Minimal working example
            <</SYS>>
            
            [Task: usage]
            Question: {question}
            Code:{code}
            [/INST]
            """
            
            algorithm_prompt = """
            [INST] <<SYS>>
            You are an algorithms professor. Explain:
            1. Algorithm name/pattern
            2. Time/space complexity (Big-O)
            3. Optimization potential
            <</SYS>>
            
            [Task: algorithm]
            Question: {question}
            Code:{code}
            [/INST]
            """
            
            task_prompt = """
            [INST] <<SYS>>
            You are a pair programmer:
            1. Break down requirements into steps
            2. Provide complete implementation
            3. Explain key decisions
            Format code as: ```python\n[code]\n```
            <</SYS>>
            
            [Task: task]
            Question: {question}
            Code:{code}
            [/INST]
            """
            
            variable_prompt = """
            [INST] <<SYS>>
            You are a runtime inspector:
            1. Track variable value changes
            2. Show scope/lifetime
            3. Highlight type conversions
            <</SYS>>
            
            [Task: variable]
            Question: {question}
            Code:{code}
            [/INST]
            """
            
            concept_prompt = """
            [INST] <<SYS>>
            You are a CS lecturer. Explain:
            1. Core concept definition
            2. Real-world analogy
            3. Simple code demonstration
            <</SYS>>
            
            [Task: concept]
            Question: {question}
            Code:{code}
            [/INST]
            """
 
            PROMPT_TEMPLATES = {
                "code_explain" : code_explain_prompt,
                # The key must match the classifier label "logical_reasoning"
                "logical_reasoning" : logical_reasoing_prompt,
                "error" : error_prompt,
                "usage" : usage_prompt,
                "algorithm" : algorithm_prompt,
                "task" : task_prompt,
                "variable" : variable_prompt,
                "concept" : concept_prompt
            }
            
            def build_prompt(category, question, code=None):
                template = PROMPT_TEMPLATES[category]
                prompt = template.replace("{question}", question)
                # Always fill the placeholder so a literal "{code}" never reaches the model
                prompt = prompt.replace("{code}", code or "")
                return prompt

            prompt = build_prompt(category, question, code=code)

            st.write(prompt)
            
            gen_inputs = gen_tokenizer(prompt, return_tensors="pt").to(gen_model.device)
            outputs = gen_model.generate(
                **gen_inputs,
                max_new_tokens=max_length,
                temperature=temperature,
                do_sample=True
            )
            answer = gen_tokenizer.decode(outputs[0], skip_special_tokens=True)
            
            answer = answer.split("[/INST]")[-1].strip()

        st.write(answer)

st.divider()
st.subheader("Example Questions")
examples = [
    "How to reverse arrays in Python",
    "Why did I get 'IndexError: list index out of range'?"
]
]
for ex in examples:
    if st.button(ex, key=ex):
        question = ex

Writing app.py
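The prompt handling in `app.py` can be tried without loading any model. The sketch below uses one template in the same `[INST] ... [/INST]` layout, fills it with `str.replace`, and extracts the answer the same way the app does. `TEMPLATE`, `extract_answer` and the fake model output are illustrative assumptions, and `textwrap.dedent` is used here only to drop the leading indentation that the triple-quoted strings in the app carry into the prompt.

```python
import textwrap

# One template in the same layout as the app's prompts, without leading indentation.
TEMPLATE = textwrap.dedent("""\
    [INST] <<SYS>>
    You are a debugger. Respond with:
    1. Exact error type + trigger line
    2. Immediate fix (code snippet)
    3. Root cause
    <</SYS>>

    [Task: error]
    Question: {question}
    Code:{code}
    [/INST]
    """)

def build_prompt(template, question, code=None):
    # str.replace (unlike str.format) leaves other braces in the text untouched.
    return template.replace("{question}", question).replace("{code}", code or "")

def extract_answer(decoded):
    # The decoded output repeats the prompt, so keep only the text after [/INST].
    return decoded.split("[/INST]")[-1].strip()

prompt = build_prompt(TEMPLATE, "Why do I get IndexError?", code="xs = [1]\nprint(xs[3])")
fake_output = prompt + " The index 3 is out of range."  # stand-in for decoded model output
print(extract_answer(fake_output))
```

Passing an empty string when no code is supplied keeps the literal `{code}` placeholder out of the prompt.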


In [30]:
!pip install streamlit -q
!pip install pyngrok -q

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [31]:
import subprocess
subprocess.Popen(["streamlit", "run", "app.py", "--server.port", "8501"])

# Set up the ngrok tunnel
from pyngrok import ngrok
ngrok.set_auth_token("2vPblhhFOkqjvlAOgSDa8qisMXa_546jNEKPFTVZyrBfdGZMm")  # ← replace with your own token

public_url = ngrok.connect(addr="8501", proto="http") 
print("Access URL", public_url)

                                                                                                    

Usage: streamlit run [OPTIONS] TARGET [ARGS]...
Try 'streamlit run --help' for help.

Error: Invalid value: File does not exist: app.py


Access URL NgrokTunnel: "https://99a1-34-42-216-145.ngrok-free.app" -> "http://localhost:8501"
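The `File does not exist: app.py` error above appears when the launch cell runs before the `%%writefile app.py` cell (the execution counters show `In [31]` ran before `In [32]`). A small guard, sketched below with a hypothetical helper `streamlit_command`, makes that failure explicit before any process is started.

```python
from pathlib import Path

def streamlit_command(app_path, port=8501):
    """Build the streamlit launch command, failing early if the app file is missing."""
    path = Path(app_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run the %%writefile cell first")
    return ["streamlit", "run", str(path), "--server.port", str(port)]

try:
    cmd = streamlit_command("app.py")
    print("ready to launch:", " ".join(cmd))
except FileNotFoundError as err:
    print(err)
```

The returned list can be passed to `subprocess.Popen` in the same way as the launch cell does.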
