In [29]:
drdo="""Predictive Factor Analysis of Air-to-Air Engagement Outcomes Using Air Combat Manoeuvring Instrumentation Data
Air superiority is essential in modern warfare1-3. Air superiority refers to controlling the battlefield sky against an enemy. Once air superiority is achieved, friendly forces, including ground forces, can manoeuvre without prohibitive interference from enemy forces4,5. Air combat is a tactical method used to achieve air superiority, and various studies have been conducted to improve its efficiency6-9. In this study, we focus on the critical factors of air combat against an enemy’s aerial vehicle regarding Air Combat Manoeuvres (ACM). Regarding ACM, it is essential to develop effective combat tactics and train fighter pilots to improve the win rate in air-to-air combat. However, due to costs, the use of fighters and weapons for developing or evaluating tactics and training or testing pilot skills is limited10. Thus, air-to-air combat training is mostly conducted in virtual environments, and the development of precise ACM performance measurements is becoming increasingly important to ensure the reliability of air combat tactics and pilot skills in real-world scenarios. Existing research approaches to ACM performance measurements mainly focus on combining analytical and empirical methodologies to develop appropriate measurement structures and algorithms11. Candidate measurements such as positional advantage and weapon events have been developed based on the state information of both aircraft and weapons, and various studies have utilised these candidates12-17. Waag18 , et al. proposed a composite measure to predict engagement outcomes during ACM. Krusmark12, et al. assessed the effectiveness of the traditional Grade sheet used to measure air-combat performance. ARAR19, et al. proposed a flexible rule-based framework for a pilot performance analysis. However, while the utility and effectiveness of both simulation systems and ACM performance measurements have been demonstrated regarding training fighter pilots and developing air combat tactics, more debate still needs to be had on their reliability and validity in real-world environments20-21 . Balcerzak22, et al. insisted that there was a shortage of research demonstrating the validity of simulation systems, citing the case of civilian aircraft, and that it was more apparent whether the skills learned in simulations were appropriately applied to actual flights. This debate has significant implications for the military domain. Therefore, providing feedback based on actual manoeuvring track data analysis is essential for calibrating measurements developed in a virtual environment. However, a statistical approach to ACM based on actual data has rarely been studied in this domain because acquiring the actual manoeuvring data of an aircraft is limited because of cost and safety concerns.
Air Combat Manoeuvring Instrumentation (ACMI) systems may be an alternative to resolve these limitations. An ACMI system records in-flight data, such as positional information, aircraft state, and weapon events, using pod devices attached to the aircraft, and the recorded data are used for debriefing. The system consists of aircraft pods and a ground system. ACM data are transmitted from the pod to the ground system for recording, displaying, and debriefing23. In addition, these data have been consistently accumulated and managed for over a decade. Thus, given the various attributes and quantities of ACMI data, they can be used in data-driven research24-25 . Motivated by the need for more realistic and data-driven analyses of air combat engagements, this study presents a comprehensive study based on extensive real-world ACMI data from training engagements. Our objectives are threefold: First, to demonstrate a standard procedure for utilising ACMI system data, encompassing feature extraction, selection, and effective modelling of a hit-prediction problem. Second, an airto-air engagement hit prediction model was constructed using machine learning algorithms, which allowed us to determine the most dominant components of the ACM in deciding engagement outcomes. Third, interpretable machine-learning techniques were applied to rank the key factors for successful engagement. We analyze feature importance using correlation coefficients, feature importance scores, and SHAP (SHapley Additive exPlanations) values26. This approach also allowed us to validate conventional methods, differentiating our work from previous studies that relied primarily on simulated or limited flight test data. The ACMI data are provided by the Republic of Korea Air Force (ROKAF) for research purposes only and are not publicly accessible. The remainder of this paper is organized as follows. Section 2 describes the problem definition and data. Sections 3 and 4 demonstrate the results of feature engineering and the analysis details, respectively, followed by a discussion and conclusion in Section 5.
According to the ROKAF training protocol, air-to-air combat training can be divided into the five categories listed in Table 1. This study only focused on the BFM training procedure. Let BLUE be a fighter of friendly forces and RED
be an adversarial fighter in an air combat training scenario. BLUE and RED are the same type of fighter, F-16, who engage in Within-Visual Range (WVR) combat. BLUE fires AIM-9 IR (infrared) tracking-guided air-to-air missiles to shoot down RED27. During training, the ACMI pods collected the maneuvering data of both aircraft, except for the RED probability of kill (PK) value. The PK value, which represents the extent to which BLUE’s missile damages RED and ranges from 0 to 1, was calculated internally using the ACMI system. This calculation method has not yet been publicly disclosed. Thus, this study assumed that the PK value calculated by the system adequately reflects the damage to the actual air-to-air engagement. Based on maneuvering data and PK values, we formulate the hit-prediction model to predict a ‘Hit’ or ‘Miss’ from the maneuvering and weapon event data of BLUE and RED. The ‘0’ PK value indicates ‘Miss,’ which means no damage to RED, and the others are converted to ‘Hit,’ which means sufficient damage to RED. The distribution of PK values and the distribution of ‘Hit’ and ‘Miss’ are shown in Fig. 1. The data for training the hit prediction model were obtained from the ACMI system operated by the ROKAF, where the collection period was from 2009 to 2019. To prepare the data, we applied several pre-processing steps. First, we addressed data quality issues by removing outliers and missing data points, which often result from the highspeed data acquisition inherent to the ACMI system. Next, data consistency was ensured by standardizing the units of speed and angle across all attributes. However, we did not perform data normalization because the machine-learning algorithms employed were designed to appropriately handle varying scales of input features. After pre-processing, the dataset contains 2,258 instances corresponding to 2,258 missile launches (hits or misses). Of the total, 1,721 instances were labeled as ‘Hit’ and 537 as ‘Miss,’ yielding a hit ratio of 76.2 % and establishing the baseline performance. Table 2 lists the 18 attributes used in this study.
"""

In [30]:
import numpy as np

In [31]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [32]:
tokenizer=Tokenizer()

In [33]:
tokenizer.fit_on_texts([drdo])

In [34]:
len(tokenizer.word_index)

479

In [36]:
input_sequences=[]
for sentence in drdo.split('\n'):
    tokenized_sentence=tokenizer.texts_to_sequences([sentence])[0]

    for i in range(1,len(tokenized_sentence)):
        input_sequences.append(tokenized_sentence[:i+1])

In [38]:
max_len=max([len(x) for x in input_sequences])
max_len

411

In [39]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
padded_input_sequences=pad_sequences(input_sequences,maxlen=max_len,padding='pre')

In [40]:
padded_input_sequences

array([[  0,   0,   0, ...,   0, 157, 158],
       [  0,   0,   0, ..., 157, 158,  45],
       [  0,   0,   0, ..., 158,  45,   4],
       ...,
       [  0,   0,   0, ...,  80,  35,   7],
       [  0,   0,   0, ...,  35,   7,  12],
       [  0,   0,   0, ...,   7,  12,  26]])

In [41]:
X=padded_input_sequences[:,:-1]

In [42]:
y=padded_input_sequences[:,-1]

In [43]:
X.shape

(1128, 410)

In [44]:
y.shape

(1128,)

In [45]:
from tensorflow.keras.utils import to_categorical
y=to_categorical(y,num_classes=480)

In [46]:
y.shape

(1128, 480)

In [47]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Embedding,LSTM,Dense

In [48]:
model = Sequential()
model.add(Embedding(input_dim=480, output_dim=300, input_length=410))  # Added input_length
model.add(LSTM(256)) # No need for input_shape here
model.add(Dropout(0.3))
model.add(Dense(480, activation='softmax'))



In [49]:
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy'])

In [50]:
model.summary()

In [51]:
model.fit(X,y,epochs=35)

Epoch 1/35
[1m36/36[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m22s[0m 577ms/step - accuracy: 0.0331 - loss: 6.0663
Epoch 2/35
[1m36/36[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 803ms/step - accuracy: 0.0497 - loss: 5.6204
Epoch 3/35
[1m36/36[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 794ms/step - accuracy: 0.0634 - loss: 5.5354
Epoch 4/35
[1m36/36[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 799ms/step - accuracy: 0.0633 - loss: 5.4905
Epoch 5/35
[1m36/36[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 805ms/step - accuracy: 0.0706 - loss: 5.3349
Epoch 6/35
[1m36/36[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 809ms/step - accuracy: 0.1157 - loss: 5.1615
Epoch 7/35
[1m36/36[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 802ms/step - accuracy: 0.1568 - loss: 4.8394
Epoch 8/35
[1m36/36[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m27s[0m 749ms/step - accuracy: 0.1675 - loss: 4.4397
Epoch 9/35
[1m36/36[0m [32m━━

<keras.src.callbacks.history.History at 0x271b049c380>

In [53]:
def word_predict(text):
    for i in range(4):
      # tokenize
      token_text = tokenizer.texts_to_sequences([text])[0]
      # padding
      padded_token_text = pad_sequences([token_text], maxlen=56, padding='pre')
      # predict
      pos = np.argmax(model.predict(padded_token_text))
    
      for word,index in tokenizer.word_index.items():
        if index == pos:
          text = text + " " + word
          print(text)

In [54]:
word_predict("Air Combat")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 76ms/step
Air Combat manoeuvring
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 64ms/step
Air Combat manoeuvring instrumentation
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 73ms/step
Air Combat manoeuvring instrumentation acmi
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 65ms/step
Air Combat manoeuvring instrumentation acmi systems


In [57]:
word_predict("Rokaf")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 68ms/step
Rokaf training
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 82ms/step
Rokaf training protocol
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 70ms/step
Rokaf training protocol air
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 72ms/step
Rokaf training protocol air combat


In [60]:
word_predict("Predictive")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 68ms/step
Predictive factor
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 72ms/step
Predictive factor analysis
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 66ms/step
Predictive factor analysis of
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 64ms/step
Predictive factor analysis of air
