# Specialized Fine-Tuning of Transformer Models

## Objective
This notebook explores **transformer models** for **domain-specific tasks**, using the pre-trained `gpt2` checkpoint as a baseline before any fine-tuning. The examples cover:
- **Clinical Report Generation:** Generating medical reports from structured patient data.
- **Financial Forecasting:** Analyzing financial data and generating investment strategies.
- **Education & Tutoring Systems:** Creating an AI tutor to generate customized learning material and answer students' questions.

In [1]:
# Install necessary libraries
!pip install transformers pandas scikit-learn openai




## 1. Clinical Report Generation Example


In [2]:
from transformers import pipeline

print("Clinical Report Generation Example")
# Load the pre-trained GPT-2 checkpoint; no domain-specific fine-tuning is applied here
clinical_report_pipeline = pipeline('text-generation', model='gpt2')

# Free-text patient summary used as the generation prompt
patient_data = "Patient presents with mild fever and cough. Past medical history includes asthma."
# max_length counts the prompt tokens plus the generated tokens
report = clinical_report_pipeline(patient_data, max_length=100, num_return_sequences=1)
print("Generated Clinical Report:", report[0]['generated_text'])

Clinical Report Generation Example



Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Clinical Report: Patient presents with mild fever and cough. Past medical history includes asthma. Possible signs of flu include:

Hip fever

Rashes

Spitting and blisters

Chest pain

Red eyes or sore throats.

If there will be no immediate impact, an emergency plan should be offered to cover the costs of a visit to the hospital.

Exempting children from treatment

If the conditions caused by influenza or pneumonia are determined to be life-
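The objective mentions structured patient data, while the cell above passes a free-text sentence. One possible bridge, as a minimal sketch, is to flatten a record into a prompt string before calling the pipeline. The helper `build_clinical_prompt` and the field names `symptoms` and `history` are hypothetical and do not come from the notebook.

```python
def build_clinical_prompt(record):
    """Flatten a structured patient record into a short prompt.

    The keys used here ("symptoms", "history") are illustrative assumptions.
    """
    symptoms = " and ".join(record.get("symptoms", []))
    history = ", ".join(record.get("history", []))

    sentences = []
    if symptoms:
        sentences.append(f"Patient presents with {symptoms}.")
    if history:
        sentences.append(f"Past medical history includes {history}.")
    return " ".join(sentences)


# Example record that reproduces the prompt used in the cell above
patient_record = {"symptoms": ["mild fever", "cough"], "history": ["asthma"]}
patient_data = build_clinical_prompt(patient_record)
print(patient_data)
```

The resulting string can be passed to `clinical_report_pipeline` exactly as the hand-written prompt was.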


## 2. Financial Forecasting Example


In [3]:
print("Financial Forecasting Example")
financial_forecasting_pipeline = pipeline('text-generation', model='gpt2')

financial_data = "The stock price of XYZ company has shown a steady increase over the past quarter."
forecast = financial_forecasting_pipeline(financial_data, max_length=50, num_return_sequences=1)
print("Financial Insight:", forecast[0]['generated_text'])

Financial Forecasting Example


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Financial Insight: The stock price of XYZ company has shown a steady increase over the past quarter. The first quarter earnings were up 7% versus May 30.

Sales for some of the other largest U.S. stocks like Walmart are also expected to continue
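The prompt above describes the price trend only qualitatively ("a steady increase"). A rough sketch of deriving that sentence from numbers instead is shown here; the helper names and the price series are made up for illustration and are not part of the notebook.

```python
def quarter_change_pct(prices):
    """Percentage change from the first to the last price in the series."""
    first, last = prices[0], prices[-1]
    return (last - first) / first * 100.0


def build_financial_prompt(company, prices):
    """Turn a price series into a one-sentence prompt for the text generator."""
    change = quarter_change_pct(prices)
    if change == 0:
        return f"The stock price of {company} stayed flat over the past quarter."
    direction = "increased" if change > 0 else "decreased"
    return (f"The stock price of {company} {direction} by {abs(change):.1f}% "
            f"over the past quarter.")


# Made-up monthly closing prices, for illustration only
example_prices = [100.0, 104.0, 110.0]
financial_data = build_financial_prompt("XYZ company", example_prices)
print(financial_data)
```

Grounding the prompt in a computed figure does not make GPT-2's continuation a forecast; it only makes the input statement verifiable.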


## 3. Education & Tutoring Systems Example


In [4]:
print("Education & Tutoring Systems Example")
tutoring_pipeline = pipeline('text-generation', model='gpt2')

student_query = "Can you explain the Pythagorean theorem?"
tutor_response = tutoring_pipeline(student_query, max_length=100, num_return_sequences=1)
print("Tutor Response:", tutor_response[0]['generated_text'])

Education & Tutoring Systems Example


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Tutor Response: Can you explain the Pythagorean theorem? To begin, ask a question such as "How is the point of zero ever to be determined in one particular way?" You'll be asked multiple questions, such as, "Can one person be certain of all known objects?" and, "If this person had a finite number of known objects, why didn't others consider this number at all?"

In every case of the Pythagorean theorem, you will now be able to say that a number
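As the output shows, `generated_text` starts with the student's question followed by the model's continuation. For a tutoring interface it may be preferable to show only the continuation. The sketch below does this with a small hypothetical helper, `strip_prompt`; the text-generation pipeline also accepts `return_full_text=False`, which should give a similar result without post-processing.

```python
def strip_prompt(prompt, generated_text):
    """Return only the model's continuation if the output starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


student_query = "Can you explain the Pythagorean theorem?"
# Shortened stand-in for tutor_response[0]['generated_text']
full_text = student_query + " To begin, ask a question"
print(strip_prompt(student_query, full_text))
```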


## 4. NFL Prediction Example

In [6]:
print("NFL prediction Example")
NFL_prediction_pipeline = pipeline('text-generation', model='gpt2')

player_data = "There is a tough competition between Kansas City Chiefs and San Francisco 49ers. There are higher chances of Chiefs winning the game."
predict = NFL_prediction_pipeline(player_data, max_length=50, num_return_sequences=1)
print("NFL_prediction:", predict[0]['generated_text'])

NFL prediction Example


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


NFL_prediction: There is a tough competition between Kansas City Chiefs and San Francisco 49ers. There are higher chances of Chiefs winning the game. However, the 49ers have a strong offensive line that can beat San Francisco and the secondary can be one
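Each example above builds its own `gpt2` pipeline and repeats the same generation arguments. A possible consolidation, sketched under assumptions, is to create the pipeline once and share one set of keyword arguments. The names `GENERATION_KWARGS`, `get_generator`, and `generate` are hypothetical; `max_new_tokens` and `pad_token_id` are standard generation arguments, and setting them explicitly should avoid the `pad_token_id` notice, though that has not been verified against this notebook's library version.

```python
GENERATION_KWARGS = {
    "max_new_tokens": 60,        # cap on generated tokens, independent of prompt length
    "num_return_sequences": 1,
    "pad_token_id": 50256,       # GPT-2's eos_token_id, as reported in the logs above
}

_generator = None


def get_generator():
    """Create the gpt2 pipeline on first use and reuse it afterwards."""
    global _generator
    if _generator is None:
        # Imported lazily; requires the `transformers` package to be installed
        from transformers import pipeline
        _generator = pipeline("text-generation", model="gpt2")
    return _generator


def generate(prompt, generator=None):
    """Run one prompt through the shared pipeline (or an injected stand-in)."""
    generator = generator or get_generator()
    return generator(prompt, **GENERATION_KWARGS)[0]["generated_text"]
```

With this in place, calls such as `generate(patient_data)` or `generate(student_query)` would reuse the same loaded model. The optional `generator` argument makes it possible to exercise the function with a stub instead of downloading the model.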


In [7]:
!pip install streamlit

Collecting streamlit
  Downloading streamlit-1.42.2-py2.py3-none-any.whl.metadata (8.9 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl.metadata (44 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.42.2-py2.py3-none-any.whl (9.6 MB)
Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl (79 kB)

In [14]:
%%writefile app.py

import streamlit as st
from transformers import pipeline


@st.cache_resource
def load_pipeline():
    # Load the pre-trained GPT-2 model once; Streamlit reruns this script on every interaction
    return pipeline('text-generation', model='gpt2')


NFL_prediction_pipeline = load_pipeline()

# Define the function to predict the winner
def predict_winner(team1, team2, score1_q1, score1_q2, score1_q3, score2_q1, score2_q2, score2_q3):
    # Sum each team's scores for Q1-Q3 (the state of the game before Q4)
    total_score_team1 = score1_q1 + score1_q2 + score1_q3
    total_score_team2 = score2_q1 + score2_q2 + score2_q3

    # The winner is chosen by a plain score comparison; GPT-2 only elaborates on this prompt
    prediction_text = f"Before Q4, {team1} has {total_score_team1} points, while {team2} has {total_score_team2} points. "
    if total_score_team1 > total_score_team2:
        prediction_text += f"The winner will likely be {team1}."
    elif total_score_team2 > total_score_team1:
        prediction_text += f"The winner will likely be {team2}."
    else:
        prediction_text += "The game is tied going into Q4."

    # Generate prediction using the text-generation pipeline
    prediction_output = NFL_prediction_pipeline(prediction_text, max_length=100, num_return_sequences=1)

    return prediction_output[0]['generated_text']

# Streamlit app interface
st.title("NFL Prediction App")

# Input for teams and their scores for Q1, Q2, Q3
team1 = st.text_input("Enter Team 1 Name")
team2 = st.text_input("Enter Team 2 Name")

# Use placeholder labels while the names are empty, and give every widget an explicit key
# so that identical labels (empty or equal team names) do not collide
label1 = team1 or "Team 1"
label2 = team2 or "Team 2"

score1_q1 = st.number_input(f"Enter {label1} Q1 score", min_value=0, key="score1_q1")
score1_q2 = st.number_input(f"Enter {label1} Q2 score", min_value=0, key="score1_q2")
score1_q3 = st.number_input(f"Enter {label1} Q3 score", min_value=0, key="score1_q3")

score2_q1 = st.number_input(f"Enter {label2} Q1 score", min_value=0, key="score2_q1")
score2_q2 = st.number_input(f"Enter {label2} Q2 score", min_value=0, key="score2_q2")
score2_q3 = st.number_input(f"Enter {label2} Q3 score", min_value=0, key="score2_q3")

# When the user clicks the "Predict Winner" button
if st.button("Predict Winner"):
    if team1 and team2:
        prediction = predict_winner(team1, team2, score1_q1, score1_q2, score1_q3, score2_q1, score2_q2, score2_q3)
        st.write(prediction)
    else:
        st.error("Please enter valid team names.")


Overwriting app.py
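The winner decision in `app.py` is a plain comparison of the Q1 to Q3 totals; GPT-2 only extends the resulting sentence. That comparison can be checked on its own, without loading the model. The sketch below uses a hypothetical helper, `leader_before_q4`, and made-up scores.

```python
def leader_before_q4(team1, scores1, team2, scores2):
    """Return the team leading after Q1-Q3, or None when the totals are equal."""
    total1, total2 = sum(scores1), sum(scores2)
    if total1 > total2:
        return team1
    if total2 > total1:
        return team2
    return None


# Made-up quarter scores, for illustration only
print(leader_before_q4("Chiefs", [7, 3, 10], "49ers", [3, 7, 6]))   # Chiefs (20 vs 16)
print(leader_before_q4("Chiefs", [7, 7, 0], "49ers", [0, 7, 7]))    # None (14 vs 14)
```

Returning `None` for a tie keeps the tie case separate from the team names, so the caller can decide how to phrase it.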


In [15]:
!curl ipv4.icanhazip.com


34.30.95.69


In [16]:
!streamlit run app.py &>./logs.txt & npx localtunnel --port 8501

your url is: https://upset-clouds-joke.loca.lt
^C
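The launch cell starts Streamlit in the background and opens the tunnel right away. If the tunnel URL shows an error, one thing worth checking is whether anything is listening on port 8501 yet (the port passed to `localtunnel` above). The following is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, not part of Streamlit or localtunnel.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Streamlit reachable:", port_is_open(8501))
```

If this prints `False`, the contents of `logs.txt` (where the launch command redirects Streamlit's output) are the next place to look.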
