# Instruct part of the model

This part of the code is using the instruct model to write new requirements

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct", 
    device_map="cuda", 
    torch_dtype="auto", 
    trust_remote_code=True, 
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])


`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.


model.safetensors.index.json:   0%|          | 0.00/16.3k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/172 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.18k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/568 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You are not running the flash-attention implementation, expect numerical differences.


To solve the equation 2x + 3 = 7, you need to isolate the variable x. Here are the steps:

1. Subtract 3 from both sides of the equation to get: 2x = 7 - 3, which simplifies to 2x = 4.

2. Divide both sides of the equation by 2 to solve for x: x = 4 / 2.

So, the solution to the equation 2x + 3 = 7 is x = 2.


In [2]:
messages = [
    {"role": "user", "content": "Generate a requirements starting from 'The payload of the system shall' based on the following text 'The UPF latency time budget related to robotic aided diagnosis, shall be less than the end-to-end values listed in TS 22.104, Table A.6.3-1: Service performance requirements for motion control and haptic feedback.'"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])



1. The payload of the system shall ensure the latency time for robotic aided diagnosis is less than the values specified in TS 22.104, Table A.6.3-1.

2. The system's payload shall adhere to the service performance requirements for motion control as outlined in TS 22.104, Table A.6.3-1.

3. The payload of the system shall meet the haptic feedback performance requirements as stated in TS 22.104, Table A.6.3-1.

4. The system's payload shall guarantee the latency time for robotic aided diagnosis does not exceed the end-to-end values listed in TS 22.104, Table A.6.3-1.

5. The payload of the system shall ensure the system's overall performance aligns with the standards set in TS 22.104, Table A.6.3-1.


The UPF latency time budget related to robotic aided diagnosis, shall be less than the end-to-end values listed in TS 22.104, Table A.6.3-1: Service performance requirements for motion control and haptic feedback.

When the UE is in CM-CONNECTED with RRC_INACTIVE state with CN based mobile terminating (MT) communication handling, high latency communication as described in clause 5.31.8 of TS 23.501 [2] is applied.

In [3]:
messages = [
    {"role": "user", "content": "Generate a requirements starting from 'The payload of the system shall' based on the following text 'When the UE is in CM-CONNECTED with RRC_INACTIVE state with CN based mobile terminating (MT) communication handling, high latency communication as described in clause 5.31.8 of TS 23.501 [2] is applied.'"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

1. The system shall support the UE (User Equipment) in CM-CONNECTED state with RRC_INACTIVE state.
2. The system shall handle CN based mobile terminating (MT) communication.
3. The system shall be capable of managing high latency communication.
4. The system shall apply high latency communication as described in clause 5.31.8 of TS 23.501 [2] when the UE is in CM-CONNECTED state with RRC_INACTIVE state and CN based MT communication handling.


In [4]:
strContent = "enerate a requirements starting from: 'The payload of the system shall'"

strC = '''Detected when the UE transitions to CM-CONNECTED state or when the UE will become reachable for paging, e.g. Periodic Registration Update timer. It indicates when the UE becomes reachable for sending downlink data to the UE.
The AF may provide the following parameters:
1)	Maximum Latency;
2)	Maximum Response Time;
3)	Suggested number of downlink packets. (see NOTE 5 and NOTE 7).
This event requires the Reachability Filter set to UE reachable for DL traffic" (see clause 5.2.2.3.1-1). For the usage of this event, see clauses 4.2.5.2 and 4.2.5.3.
When requesting UE reachability monitoring, the AF may in addition request Idle Status Indication to be included in the UE reachability event reporting.
'''

strContent = strContent + strC

In [5]:
messages = [
    {"role": "user", "content": strContent},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])



The system shall initiate a payload transmission when the User Equipment (UE) enters the CM-CONNECTED state or when it becomes accessible for paging, such as during the Periodic Registration Update timer. This event serves as a signal that the UE is now capable of receiving downlink data from the system.

The Access Function (AF) is expected to provide certain parameters in this context:

1) Maximum Latency: This parameter refers to the longest allowable delay from the time the UE becomes reachable until the system starts sending downlink data.

2) Maximum Response Time: This parameter indicates the maximum time the system should take to respond after the UE becomes reachable.

3) Suggested number of downlink packets: The AF may suggest a certain number of downlink packets to be sent. This suggestion is based on NOTE 5 and NOTE 7.

This event necessitates the Reachability Filter to be set to 'UE is reachable for DL traffic'. This is detailed in clause 5.2.2.3.1-1.

For the utilization 