# Close Encounters Calculator
### Preamble: Session details
Start a Cloudera Machine Learning (CML) session with the following session settings:

![Cloudera Machine Learning Session Settings](close-encounters/media/CloseEncountersSessionCML.JPG "Cloudera Machine Learning Session Settings")

### 1. Install requirements 
You may need to install additional Python packages the first time you run this notebook. To do so, uncomment and run the cell below.

In [1]:
#!pip install close-encounters==0.1.0
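To avoid reinstalling on every run, you can first check whether the pinned version is already present. A minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper, not part of close-encounters:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for this notebook; not part of close-encounters.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Only prompt for installation when the pinned version is missing
if installed_version("close-encounters") != "0.1.0":
    print("close-encounters 0.1.0 not found: uncomment and run the pip cell.")
```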

### 2. Library imports

In [2]:
# Python
import pandas as pd
pd.DataFrame.iteritems = pd.DataFrame.items  # Hotfix: pandas 2.0 removed iteritems, which older PySpark versions still call
import numpy as np
from time import time
from close_encounters import CloseEncounters
import os
from pyspark.sql import SparkSession
from IPython.display import display, HTML
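The assignment above replaces `iteritems` unconditionally, even on pandas versions that still provide it. A guarded variant only adds the alias when the attribute is missing. This is a sketch; `ensure_alias` is a hypothetical helper:

```python
def ensure_alias(cls: type, old_name: str, new_name: str) -> bool:
    """Alias cls.<old_name> to cls.<new_name> if old_name does not exist.

    Hypothetical helper. Returns True if the alias was added, False if
    old_name was already present (and left untouched).
    """
    if hasattr(cls, old_name):
        return False
    setattr(cls, old_name, getattr(cls, new_name))
    return True


# Usage in this notebook (requires pandas):
# ensure_alias(pd.DataFrame, "iteritems", "items")
```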

### 3. Close encounter algorithm settings

In [3]:
# Minimum horizontal separation in nautical miles (NM)
h_dist_NM = 5

# Minimum vertical separation in flight levels (FL; 1 FL = 100 ft)
v_dist_FL = 10

# Minimum flight level (FL)
# Note: All trajectory segments below this flight level are removed before the close-encounter algorithm runs.
v_cutoff_FL = 245

# Resampling interval in seconds (s)
freq_s = 1

# Maximum interpolation gap in minutes (min)
# Note: If a trajectory is missing data for longer than this, the gap is not interpolated.
t_max = 10
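For reference, the separation settings can be expressed in SI units using the exact definitions 1 NM = 1852 m and 1 ft = 0.3048 m, with 1 FL = 100 ft. The metre values for flight levels are nominal, since flight levels are pressure altitudes. The helper names below are illustrative only:

```python
NM_TO_M = 1852.0   # 1 nautical mile in metres (exact by definition)
FT_TO_M = 0.3048   # 1 international foot in metres (exact by definition)
FL_TO_FT = 100.0   # 1 flight level = 100 ft


def nm_to_km(nm: float) -> float:
    """Nautical miles to kilometres."""
    return nm * NM_TO_M / 1000.0


def fl_to_m(fl: float) -> float:
    """Flight levels (hundreds of feet) to nominal metres."""
    return fl * FL_TO_FT * FT_TO_M


# With the values used in this notebook (5 NM, 10 FL, FL245):
print(f"{nm_to_km(5):.2f} km")    # 9.26 km horizontal minimum
print(f"{fl_to_m(10):.1f} m")     # 304.8 m vertical minimum
print(f"{fl_to_m(245):.0f} m")    # 7468 m (24,500 ft) altitude cutoff
```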

### 4. Spark session initialization

In [4]:
# Initialize the Spark Session
spark = SparkSession.builder \
    .appName("CloseEncounters") \
    .config("spark.executor.memory", "5g") \
    .config("spark.driver.memory", "2g") \
    .config("spark.executor.cores", "1") \
    .config("spark.executor.instances", "10") \
    .config("spark.sql.shuffle.partitions", "100") \
    .config("spark.default.parallelism", "100") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.rpc.message.maxSize", "512") \
    .getOrCreate()

# Display a link to the Spark UI to monitor running jobs
# CML provides the engine ID and domain as environment variables
engine_id = os.getenv('CDSW_ENGINE_ID')
domain = os.getenv('CDSW_DOMAIN')

# Format the URL
url = f"https://spark-{engine_id}.{domain}"

# Display the clickable URL
display(HTML(f'<a href="{url}">{url}</a>'))
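If `CDSW_ENGINE_ID` or `CDSW_DOMAIN` is not set (for example, when the notebook runs outside CML), the cell above renders a broken link such as `https://spark-None.None`. A small guard avoids that; `spark_ui_url` is a hypothetical helper, sketched under the assumption that the URL pattern above is the one you want:

```python
import os
from typing import Mapping, Optional


def spark_ui_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Build the CML Spark UI URL from environment variables, or return None.

    Hypothetical helper; uses the same URL pattern as the cell above.
    """
    engine_id = env.get("CDSW_ENGINE_ID")
    domain = env.get("CDSW_DOMAIN")
    if not engine_id or not domain:
        return None
    return f"https://spark-{engine_id}.{domain}"


url = spark_ui_url()
print(url if url else "Spark UI URL unavailable: CDSW_ENGINE_ID/CDSW_DOMAIN not set")
```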

Setting spark.hadoop.yarn.resourcemanager.principal to quinten.goens


### 5. Run on sample data

In [None]:
%%time
# Create a CloseEncounters instance that uses the Spark session
ce = CloseEncounters(spark = spark)

# Load the trajectories from Parquet, mapping the file's column names to the expected fields
ce = ce.load_parquet_trajectories(
    parquet_path = 'data/flight_profiles_cpf_20240701_filtered.parquet',
    flight_id_col = 'FLIGHT_ID', 
    icao24_col = 'ICAO24',
    longitude_col = 'LONGITUDE',
    latitude_col = 'LATITUDE',
    time_over_col = 'TIME_OVER',
    flight_level_col = 'FLIGHT_LEVEL'
)
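`load_parquet_trajectories` maps each field to a column name in the Parquet file, so a typo in a name only surfaces once Spark runs. A quick pre-check can compare the mapping against the file's columns. This is a sketch with a hypothetical `COLUMN_MAP` and `missing_columns` helper; the available names could come from, e.g., `spark.read.parquet(path).columns`:

```python
from typing import Dict, Iterable, List

# Hypothetical copy of the column mapping passed to load_parquet_trajectories above
COLUMN_MAP: Dict[str, str] = {
    "flight_id_col": "FLIGHT_ID",
    "icao24_col": "ICAO24",
    "longitude_col": "LONGITUDE",
    "latitude_col": "LATITUDE",
    "time_over_col": "TIME_OVER",
    "flight_level_col": "FLIGHT_LEVEL",
}


def missing_columns(available: Iterable[str], column_map: Dict[str, str]) -> List[str]:
    """Return the mapped column names absent from `available`, sorted."""
    have = set(available)
    return sorted(name for name in column_map.values() if name not in have)


# Example (in the notebook):
# print(missing_columns(spark.read.parquet(parquet_path).columns, COLUMN_MAP))
```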

In [None]:
%%time
# Find close encounters
ce_sdf = ce.find_close_encounters(
    h_dist_NM=h_dist_NM,
    v_dist_FL=v_dist_FL,
    v_cutoff_FL=v_cutoff_FL,
    freq_s=freq_s,
    t_max=t_max
)

# Convert the Spark DataFrame (sdf) to a pandas DataFrame (pdf); this collects all rows on the driver
ce_pdf = ce_sdf.toPandas() 
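Each result row pairs two flights at the same timestamp, with `h_dist_NM` and `v_dist_FL` describing their separation. The core test can be sketched in pure Python with the haversine great-circle distance. This is an assumed formulation for illustration only: the library's exact distance formula, Earth radius and boundary handling are not shown in this notebook, and it appears to narrow candidate pairs with H3 cells (the `h3_group` column) rather than comparing all pairs:

```python
import math
from typing import Tuple

# Assumption: mean Earth radius of 6371 km, expressed in nautical miles
EARTH_RADIUS_NM = 6371.0 / 1.852


def haversine_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points (decimal degrees) in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def is_close_encounter(
    p1: Tuple[float, float, float],
    p2: Tuple[float, float, float],
    h_dist_NM: float = 5,
    v_dist_FL: float = 10,
    v_cutoff_FL: float = 245,
) -> bool:
    """p1, p2: (latitude, longitude, flight_level) of two aircraft at the same time.

    Assumption: both aircraft must be at or above v_cutoff_FL and both
    separation limits are strict (<); the library may treat boundaries differently.
    """
    (lat1, lon1, fl1), (lat2, lon2, fl2) = p1, p2
    if fl1 < v_cutoff_FL or fl2 < v_cutoff_FL:
        return False
    return abs(fl1 - fl2) < v_dist_FL and haversine_nm(lat1, lon1, lat2, lon2) < h_dist_NM
```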

In [7]:
ce_pdf.to_parquet('data/ce_20240701_10FL_5NM_gtFL245_v6.parquet')
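The output file name encodes the date and settings by hand (`10FL_5NM_gtFL245`), which can drift out of sync when the settings cell changes. A hypothetical helper can derive the name from the variables instead, following the same naming pattern:

```python
def encounters_path(date: str, v_dist_FL: int, h_dist_NM: int, v_cutoff_FL: int, version: str) -> str:
    """Build an output path recording the settings used (hypothetical helper).

    Follows the naming pattern of the to_parquet call above.
    """
    return f"data/ce_{date}_{v_dist_FL}FL_{h_dist_NM}NM_gtFL{v_cutoff_FL}_{version}.parquet"


# Example (in the notebook):
# ce_pdf.to_parquet(encounters_path("20240701", v_dist_FL, h_dist_NM, v_cutoff_FL, "v6"))
```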

In [8]:
ce_pdf

| | ID2 | ID1 | time_over | h3_group | ID | lat1 | lon1 | time1 | flight_lvl1 | flight_id1 | icao241 | lat2 | lon2 | time2 | flight_lvl2 | flight_id2 | icao242 | time_diff_s | v_dist_FL | h_dist_NM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 68721659260 | 68721553066 | 2024-07-01 21:11:21 | 841e157ffffffff | 68721553066_68721659260 | 47.292760 | 15.321823 | 2024-07-01 21:11:21 | 34000.0 | ID_273727707 | 440C1B | 47.292926 | 15.319435 | 2024-07-01 21:11:21 | 34000.0 | ID_273728319 | 440C1B | 0 | 0.0 | 0.097848 |
| 1 | 68721659272 | 68721553078 | 2024-07-01 21:11:33 | 841e143ffffffff | 68721553078_68721659272 | 47.295885 | 15.287031 | 2024-07-01 21:11:33 | 34000.0 | ID_273727707 | 440C1B | 47.296037 | 15.284657 | 2024-07-01 21:11:33 | 34000.0 | ID_273728319 | 440C1B | 0 | 0.0 | 0.097198 |
| 2 | 68721659289 | 68721553095 | 2024-07-01 21:11:50 | 841e155ffffffff | 68721553095_68721659289 | 47.300191 | 15.237743 | 2024-07-01 21:11:50 | 34000.0 | ID_273727707 | 440C1B | 47.300389 | 15.235278 | 2024-07-01 21:11:50 | 34000.0 | ID_273728319 | 440C1B | 0 | 0.0 | 0.101190 |
| 3 | 68721659426 | 68721553232 | 2024-07-01 21:14:07 | 841e109ffffffff | 68721553232_68721659426 | 47.332921 | 14.839570 | 2024-07-01 21:14:07 | 34000.0 | ID_273727707 | 440C1B | 47.334639 | 14.837787 | 2024-07-01 21:14:07 | 34000.0 | ID_273728319 | 440C1B | 0 | 0.0 | 0.126232 |
| 4 | 68721659434 | 68721553240 | 2024-07-01 21:14:15 | 841e143ffffffff | 68721553240_68721659434 | 47.334965 | 14.816146 | 2024-07-01 21:14:15 | 34000.0 | ID_273727707 | 440C1B | 47.336630 | 14.814611 | 2024-07-01 21:14:15 | 34000.0 | ID_273728319 | 440C1B | 0 | 0.0 | 0.117966 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5872 | 652837027418 | 558347843547 | 2024-07-01 21:39:34 | 841f881ffffffff | 558347843547_652837027418 | 47.346197 | 10.722490 | 2024-07-01 21:39:34 | 34000.0 | ID_273727771 | 4D2245 | 47.347630 | 10.721370 | 2024-07-01 21:39:34 | 34000.0 | ID_273728035 | 4D2245 | 0 | 0.0 | 0.097429 |
| 5873 | 652837027582 | 558347843711 | 2024-07-01 21:42:18 | 841f889ffffffff | 558347843711_652837027582 | 47.654472 | 10.552167 | 2024-07-01 21:42:18 | 34000.0 | ID_273727771 | 4D2245 | 47.656278 | 10.550944 | 2024-07-01 21:42:18 | 34000.0 | ID_273728035 | 4D2245 | 0 | 0.0 | 0.119275 |
| 5874 | 652837027586 | 558347843715 | 2024-07-01 21:42:22 | 841f8ebffffffff | 558347843715_652837027586 | 47.661917 | 10.548019 | 2024-07-01 21:42:22 | 34000.0 | ID_273727771 | 4D2245 | 47.663722 | 10.546833 | 2024-07-01 21:42:22 | 34000.0 | ID_273728035 | 4D2245 | 0 | 0.0 | 0.118658 |
| 5875 | 652837027616 | 558347843745 | 2024-07-01 21:42:52 | 841f8c7ffffffff | 558347843745_652837027616 | 47.717296 | 10.517037 | 2024-07-01 21:42:52 | 34000.0 | ID_273727771 | 4D2245 | 47.717643 | 10.516729 | 2024-07-01 21:42:52 | 34000.0 | ID_273728035 | 4D2245 | 0 | 0.0 | 0.024289 |
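In the rows of this output, both flights in each pair carry the same ICAO 24-bit address (`icao241` equals `icao242`, e.g. `440C1B` and `4D2245`) and sit roughly 0.1 NM apart at the same level. Such pairs may be one airframe recorded under two flight IDs rather than two aircraft; treat that as an assumption to check, not a conclusion. A stdlib sketch that separates them, operating on rows as dicts (e.g. from `ce_pdf.to_dict('records')`); `split_same_airframe` is a hypothetical helper:

```python
from typing import Dict, List, Tuple


def split_same_airframe(rows: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split encounter rows into (different ICAO24 addresses, same ICAO24 address).

    Hypothetical helper; rows with equal addresses may be duplicate flights.
    """
    distinct, same = [], []
    for row in rows:
        (same if row["icao241"] == row["icao242"] else distinct).append(row)
    return distinct, same
```

With pandas, the equivalent filter is `ce_pdf[ce_pdf['icao241'] != ce_pdf['icao242']]`.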


In [7]:
ce_pdf

| | ID2 | ID1 | time_over | h3_group | ID | lat1 | lon1 | time1 | flight_lvl1 | flight_id1 | icao241 | lat2 | lon2 | time2 | flight_lvl2 | flight_id2 | icao242 | time_diff_s | v_dist_FL | h_dist_NM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 137439090770 | 128849216314 | 2024-07-01 09:44:30 | 841faabffffffff | 128849216314_137439090770 | 49.402738 | 9.354444 | 2024-07-01 09:44:30 | 369.857143 | ID_273707946 | 896536 | 49.463254 | 9.384881 | 2024-07-01 09:44:30 | 360.000000 | ID_273705373 | 3C54A4 | 0 | 9.857143 | 3.827026 |
| 1 | 188978914205 | 137439311525 | 2024-07-01 18:17:40 | 841f9b3ffffffff | 137439311525_188978914205 | 44.394944 | 9.361056 | 2024-07-01 18:17:40 | 370.000000 | ID_273723455 | 398E24 | 44.438444 | 9.384556 | 2024-07-01 18:17:40 | 379.000000 | ID_273722254 | 502D46 | 0 | -9.000000 | 2.802531 |
| 2 | 214748484137 | 154618980217 | 2024-07-01 10:42:35 | 841e337ffffffff | 154618980217_214748484137 | 48.192381 | 15.580595 | 2024-07-01 10:42:35 | 370.000000 | ID_273708896 | 4D24B6 | 48.249259 | 15.532315 | 2024-07-01 10:42:35 | 360.666667 | ID_273706355 | 4D24B5 | 0 | 9.333333 | 3.927615 |
| 3 | 274878108413 | 60129757424 | 2024-07-01 10:33:40 | 8439559ffffffff | 60129757424_274878108413 | 39.641713 | 0.827176 | 2024-07-01 10:33:40 | 350.166667 | ID_273709740 | 345686 | 39.683750 | 0.798194 | 2024-07-01 10:33:40 | 360.000000 | ID_273708514 | 4CAE52 | 0 | -9.833333 | 2.860502 |
| 4 | 274878322168 | 51539990214 | 2024-07-01 20:27:50 | 841ec53ffffffff | 51539990214_274878322168 | 42.682222 | 25.695417 | 2024-07-01 20:27:50 | 360.250000 | ID_273726269 | 4866DF | 42.701944 | 25.589841 | 2024-07-01 20:27:50 | 370.000000 | ID_273725397 | 4BAAD0 | 0 | -9.750000 | 4.812471 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12150 | 841813725632 | 824633796063 | 2024-07-01 09:41:45 | 841faebffffffff | 824633796063_841813725632 | 49.852778 | 8.789537 | 2024-07-01 09:41:45 | 369.833333 | ID_273698243 | 3C6710 | 49.916270 | 8.753929 | 2024-07-01 09:41:45 | 360.000000 | ID_273705950 | 405F13 | 0 | 9.833333 | 4.057808 |
| 12151 | 841813835888 | 420907030731 | 2024-07-01 13:24:05 | 841e8c1ffffffff | 420907030731_841813835888 | 42.077500 | 16.104352 | 2024-07-01 13:24:05 | 360.000000 | ID_273713050 | 4CADC2 | 42.016204 | 16.046111 | 2024-07-01 13:24:05 | 369.000000 | ID_273712871 | 471EFE | 0 | -9.000000 | 4.509084 |
| 12152 | 841813961962 | 309238062589 | 2024-07-01 19:13:20 | 841e80dffffffff | 309238062589_841813961962 | 42.037167 | 12.029333 | 2024-07-01 19:13:20 | 359.400000 | ID_273722644 | 4CA90A | 41.978278 | 11.955444 | 2024-07-01 19:13:20 | 350.000000 | ID_273725287 | 46B8A8 | 0 | 9.400000 | 4.839331 |
| 12153 | 850403680669 | 266288114374 | 2024-07-01 08:57:20 | 841e329ffffffff | 266288114374_850403680669 | 48.769769 | 13.692083 | 2024-07-01 08:57:20 | 330.000000 | ID_273704372 | 49D232 | 48.746667 | 13.686111 | 2024-07-01 08:57:20 | 339.500000 | ID_273706301 | 471F63 | 0 | -9.500000 | 1.408591 |
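The result has one row per flight pair per resampled timestamp, so a single encounter spans many rows. One way to summarize them, sketched with the standard library (the helper name and summary fields are illustrative, not part of close-encounters), is to group rows by flight pair and keep the first and last timestamp and the minimum horizontal distance:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_by_pair(rows: List[Dict]) -> Dict[Tuple[str, str], Dict]:
    """Group encounter rows by flight pair (hypothetical helper).

    Flight IDs are sorted so (A, B) and (B, A) share one group.
    """
    groups: Dict[Tuple[str, str], List[Dict]] = defaultdict(list)
    for row in rows:
        key = tuple(sorted((row["flight_id1"], row["flight_id2"])))
        groups[key].append(row)

    summary = {}
    for key, items in groups.items():
        times = [r["time_over"] for r in items]
        summary[key] = {
            "n_rows": len(items),
            "first": min(times),
            "last": max(times),
            "min_h_dist_NM": min(r["h_dist_NM"] for r in items),
        }
    return summary
```

Rows can be obtained with `ce_pdf.to_dict('records')`. Note that this collapses every encounter between the same two flights into one entry; separating repeated encounters would additionally require splitting on time gaps.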


In [8]:
pd.read_parquet('data/ce_20240701_10FL_5NM_gtFL245_v4.parquet')

| | ID2 | ID1 | time_over | h3_group | ID | lat1 | lon1 | time1 | flight_lvl1 | flight_id1 | icao241 | lat2 | lon2 | time2 | flight_lvl2 | flight_id2 | icao242 | time_diff_s | v_dist_FL | h_dist_NM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 137439090770 | 128849216314 | 2024-07-01 09:44:30 | 841faabffffffff | 128849216314_137439090770 | 49.402738 | 9.354444 | 2024-07-01 09:44:30 | 369.857143 | ID_273707946 | 896536 | 49.463254 | 9.384881 | 2024-07-01 09:44:30 | 360.000000 | ID_273705373 | 3C54A4 | 0 | 9.857143 | 3.827026 |
| 1 | 188978914205 | 137439311525 | 2024-07-01 18:17:40 | 841f9b3ffffffff | 137439311525_188978914205 | 44.394944 | 9.361056 | 2024-07-01 18:17:40 | 370.000000 | ID_273723455 | 398E24 | 44.438444 | 9.384556 | 2024-07-01 18:17:40 | 379.000000 | ID_273722254 | 502D46 | 0 | -9.000000 | 2.802531 |
| 2 | 214748484137 | 154618980217 | 2024-07-01 10:42:35 | 841e337ffffffff | 154618980217_214748484137 | 48.192381 | 15.580595 | 2024-07-01 10:42:35 | 370.000000 | ID_273708896 | 4D24B6 | 48.249259 | 15.532315 | 2024-07-01 10:42:35 | 360.666667 | ID_273706355 | 4D24B5 | 0 | 9.333333 | 3.927615 |
| 3 | 274878108413 | 60129757424 | 2024-07-01 10:33:40 | 8439559ffffffff | 60129757424_274878108413 | 39.641713 | 0.827176 | 2024-07-01 10:33:40 | 350.166667 | ID_273709740 | 345686 | 39.683750 | 0.798194 | 2024-07-01 10:33:40 | 360.000000 | ID_273708514 | 4CAE52 | 0 | -9.833333 | 2.860502 |
| 4 | 274878322168 | 51539990214 | 2024-07-01 20:27:50 | 841ec53ffffffff | 51539990214_274878322168 | 42.682222 | 25.695417 | 2024-07-01 20:27:50 | 360.250000 | ID_273726269 | 4866DF | 42.701944 | 25.589841 | 2024-07-01 20:27:50 | 370.000000 | ID_273725397 | 4BAAD0 | 0 | -9.750000 | 4.812471 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12150 | 841813725632 | 824633796063 | 2024-07-01 09:41:45 | 841faebffffffff | 824633796063_841813725632 | 49.852778 | 8.789537 | 2024-07-01 09:41:45 | 369.833333 | ID_273698243 | 3C6710 | 49.916270 | 8.753929 | 2024-07-01 09:41:45 | 360.000000 | ID_273705950 | 405F13 | 0 | 9.833333 | 4.057808 |
| 12151 | 841813835888 | 420907030731 | 2024-07-01 13:24:05 | 841e8c1ffffffff | 420907030731_841813835888 | 42.077500 | 16.104352 | 2024-07-01 13:24:05 | 360.000000 | ID_273713050 | 4CADC2 | 42.016204 | 16.046111 | 2024-07-01 13:24:05 | 369.000000 | ID_273712871 | 471EFE | 0 | -9.000000 | 4.509084 |
| 12152 | 841813961962 | 309238062589 | 2024-07-01 19:13:20 | 841e80dffffffff | 309238062589_841813961962 | 42.037167 | 12.029333 | 2024-07-01 19:13:20 | 359.400000 | ID_273722644 | 4CA90A | 41.978278 | 11.955444 | 2024-07-01 19:13:20 | 350.000000 | ID_273725287 | 46B8A8 | 0 | 9.400000 | 4.839331 |
| 12153 | 850403680669 | 266288114374 | 2024-07-01 08:57:20 | 841e329ffffffff | 266288114374_850403680669 | 48.769769 | 13.692083 | 2024-07-01 08:57:20 | 330.000000 | ID_273704372 | 49D232 | 48.746667 | 13.686111 | 2024-07-01 08:57:20 | 339.500000 | ID_273706301 | 471F63 | 0 | -9.500000 | 1.408591 |


In [9]:
import math
from pyspark.sql.window import Window
from pyspark.sql.functions import col, lag, udf
from pyspark.sql.types import DoubleType

def calculate_bearing(lat1, lon1, lat2, lon2):
    """
    Calculate the initial bearing (forward azimuth) between two points
    specified in decimal degrees using the great-circle formula.

    Parameters:
        lat1 (float): Latitude of the first point.
        lon1 (float): Longitude of the first point.
        lat2 (float): Latitude of the second point.
        lon2 (float): Longitude of the second point.

    Returns:
        float: Initial bearing in degrees, normalized to [0, 360).
    """
    if None in (lat1, lon1, lat2, lon2):
        return None

    lat1_rad = math.radians(lat1)
    lat2_rad = math.radians(lat2)
    delta_lon_rad = math.radians(lon2 - lon1)

    x = math.sin(delta_lon_rad) * math.cos(lat2_rad)
    y = (math.cos(lat1_rad) * math.sin(lat2_rad) -
         math.sin(lat1_rad) * math.cos(lat2_rad) * math.cos(delta_lon_rad))

    bearing_rad = math.atan2(x, y)
    bearing_deg = math.degrees(bearing_rad)

    return (bearing_deg + 360) % 360


# Register UDF
calculate_bearing_udf = udf(calculate_bearing, DoubleType())

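# `resampled_sdf` is not created in this notebook; it must already exist as a Spark DataFrame
# of resampled trajectory points with columns flight_id, time_over, latitude and longitude.
# Define a window per flight, ordered by timestamp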
window_spec = Window.partitionBy("flight_id").orderBy("time_over")

# Add previous point's latitude and longitude
resampled_sdf = resampled_sdf.withColumn(
    "prev_latitude", lag("latitude").over(window_spec)
)
resampled_sdf = resampled_sdf.withColumn(
    "prev_longitude", lag("longitude").over(window_spec)
)

# Compute heading using the UDF
resampled_sdf = resampled_sdf.withColumn(
    "heading",
    calculate_bearing_udf(
        col("prev_latitude"),
        col("prev_longitude"),
        col("latitude"),
        col("longitude")
    )
)
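With a heading per point, comparing the headings of two aircraft must account for wrap-around at 360°: 350° and 10° differ by 20°, not 340°. A minimal sketch of that comparison; `heading_difference` is a hypothetical helper, not part of close-encounters:

```python
from typing import Optional


def heading_difference(h1: Optional[float], h2: Optional[float]) -> Optional[float]:
    """Smallest absolute angle between two headings in degrees, in [0, 180].

    Hypothetical helper. Returns None if either heading is missing
    (e.g. the first point of each flight, where `heading` is null).
    """
    if h1 is None or h2 is None:
        return None
    diff = abs(h1 - h2) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


print(heading_difference(350, 10))   # 20.0
print(heading_difference(90, 270))   # 180.0
```

Because it always returns a float (or None), it could be registered as a Spark UDF with `DoubleType()` in the same way as `calculate_bearing_udf`.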


In [10]:
resampled_pdf = resampled_sdf.limit(20000).toPandas()

In [11]:
resampled_pdf

| | time_over | flight_level | latitude | longitude | flight_id | icao24 | is_ts_interpolated | segment_id | prev_latitude | prev_longitude | heading |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-07-01 12:00:15 | 370.0 | 65.029722 | -6.206111 | 273696561.0 | 3965AF | False | 1432 | | | |
| 1 | 2024-07-01 12:00:20 | 370.0 | 65.020000 | -6.200083 | 273696561.0 | 3965AF | True | 1433 | 65.029722 | -6.206111 | 165.327592 |
| 2 | 2024-07-01 12:00:25 | 370.0 | 65.010278 | -6.194056 | 273696561.0 | 3965AF | True | 1434 | 65.020000 | -6.200083 | 165.322479 |
| 3 | 2024-07-01 12:00:30 | 370.0 | 65.000556 | -6.188028 | 273696561.0 | 3965AF | True | 1435 | 65.010278 | -6.194056 | 165.317367 |
| 4 | 2024-07-01 12:00:35 | 370.0 | 64.990833 | -6.182000 | 273696561.0 | 3965AF | True | 1436 | 65.000556 | -6.188028 | 165.312255 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19995 | 2024-07-01 13:32:10 | 380.0 | 41.479722 | 41.306111 | 273697355.0 | 4406DE | False | 26171 | 41.478148 | 41.321343 | 277.858780 |
| 19996 | 2024-07-01 13:32:15 | 380.0 | 41.481065 | 41.292917 | 273697355.0 | 4406DE | True | 26172 | 41.479722 | 41.306111 | 277.738986 |
| 19997 | 2024-07-01 13:32:20 | 380.0 | 41.482407 | 41.279722 | 273697355.0 | 4406DE | True | 26173 | 41.481065 | 41.292917 | 277.739145 |
| 19998 | 2024-07-01 13:32:25 | 380.0 | 41.483750 | 41.266528 | 273697355.0 | 4406DE | True | 26174 | 41.482407 | 41.279722 | 277.739303 |
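In this resampled output, consecutive points are 5 seconds apart and `is_ts_interpolated` marks points created by interpolation. The idea can be illustrated with a pure-Python sketch, assuming linear interpolation every `freq_s` seconds and leaving gaps longer than `t_max` minutes unfilled (as the settings comments describe). This is not the library's Spark implementation: here the grid simply restarts at each original point, and longitude is interpolated naively (no antimeridian handling):

```python
from typing import List, Tuple

# (t_seconds, latitude, longitude, flight_level)
Point = Tuple[float, float, float, float]
# (t_seconds, latitude, longitude, flight_level, is_interpolated)
Sample = Tuple[float, float, float, float, bool]


def resample(points: List[Point], freq_s: float, t_max_min: float) -> List[Sample]:
    """Linearly interpolate a time-sorted trajectory every freq_s seconds.

    Assumptions: the grid restarts at each original point, gaps longer than
    t_max_min minutes are left empty, and longitude is interpolated naively.
    """
    out: List[Sample] = []
    for (t0, la0, lo0, fl0), (t1, la1, lo1, fl1) in zip(points, points[1:]):
        out.append((t0, la0, lo0, fl0, False))
        if t1 - t0 > t_max_min * 60:
            continue  # gap too long: do not interpolate
        t = t0 + freq_s
        while t < t1:
            w = (t - t0) / (t1 - t0)  # fraction of the way from p0 to p1
            out.append((t, la0 + w * (la1 - la0), lo0 + w * (lo1 - lo0),
                        fl0 + w * (fl1 - fl0), True))
            t += freq_s
    if points:
        t, la, lo, fl = points[-1]
        out.append((t, la, lo, fl, False))
    return out
```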


In [15]:
!pip install plotly


In [9]:
import plotly.express as px
# Plot positions: longitude on the x-axis, latitude on the y-axis
px.scatter(resampled_pdf, x='longitude', y='latitude')

ModuleNotFoundError: No module named 'plotly'

In [10]:
# Keep `ce` as the CloseEncounters object; the method returns a Spark DataFrame
ce_sdf = ce.find_close_encounters()

Skipping resample: Already done (w. freq_s = 5 and t_max = 10)


In [11]:
ce_sdf.show()



+-----------+----------+-------------------+---------------+--------------------+------------------+------------------+-------------------+------------------+------------+-------+------------------+------------------+-------------------+-----------------+------------+-------+-----------+------------------+------------------+
|        ID2|       ID1|          time_over|       h3_group|                  ID|              lat1|              lon1|              time1|       flight_lvl1|  flight_id1|icao241|              lat2|              lon2|              time2|      flight_lvl2|  flight_id2|icao242|time_diff_s|         v_dist_FL|         h_dist_NM|
+-----------+----------+-------------------+---------------+--------------------+------------------+------------------+-------------------+------------------+------------+-------+------------------+------------------+-------------------+-----------------+------------+-------+-----------+------------------+------------------+
|     869136|    57


In [12]:
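# CloseEncountersH3HalfDisk and load_sample_trajectories are not defined here; use the CloseEncounters API.
encounters_df = (
    CloseEncounters(spark=spark)
    .load_parquet_trajectories(
        parquet_path='data/flight_profiles_cpf_20240701_filtered.parquet',
        flight_id_col='FLIGHT_ID', icao24_col='ICAO24',
        longitude_col='LONGITUDE', latitude_col='LATITUDE',
        time_over_col='TIME_OVER', flight_level_col='FLIGHT_LEVEL'
    )
    .find_close_encounters(
        h_dist_NM=h_dist_NM, v_dist_FL=v_dist_FL, v_cutoff_FL=v_cutoff_FL,
        freq_s=freq_s, t_max=t_max
    )
)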

In [None]:
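# create_keplergl_html is not defined or imported in this notebook; define or import it before running this cell
create_keplergl_html(encounters_df)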

TypeError: CloseEncounters.__init__() missing 1 required positional argument: 'spark'