# IceCube❄️: Fitting a line to a 3D point cloud

The goal of this competition is to predict a neutrino particle’s direction. You will develop a model based on data from the "IceCube" detector, which observes the cosmos from deep within the South Pole ice.

The IceCube Neutrino Observatory is the first detector of its kind, encompassing a cubic kilometer of ice and designed to search for the nearly massless neutrinos. An international group of scientists is responsible for the scientific research that makes up the IceCube Collaboration.

By making the process faster and more precise, you'll help improve the reconstruction of neutrinos. As a result, we could gain a clearer image of our universe.

### See related notebooks:

- [IceCube🧊: Neutrino🎆EDA & 3D🔭interactive viewer](https://www.kaggle.com/code/jirkaborovec/icecube-neutrino-eda-3d-interactive-viewer)
- [IceCube❄️: baseline with XGBoost regression](https://www.kaggle.com/code/jirkaborovec/icecube-neutrino-baseline-xgboost-regression)

### Point-cloud 3D fitting

- https://scikit-spatial.readthedocs.io/en/stable/gallery/fitting/plot_line_3d.html
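
A least-squares line through a point cloud is commonly obtained from a singular value decomposition of the centered points. The sketch below is a plain NumPy illustration of that idea on synthetic data; `fit_line_svd` is a hypothetical helper and is not necessarily identical to what `Line.best_fit` does internally.

```python
import numpy as np

def fit_line_svd(points):
    """Return (centroid, unit direction) of a least-squares line through the points."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # the first right-singular vector of the centered cloud is the axis of largest variance
    _, _, vh = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vh[0]

# synthetic cloud (made-up values): noisy points scattered around a known line
rng = np.random.default_rng(0)
true_dir = np.array([1.0, 2.0, -2.0]) / 3.0
t = rng.uniform(-100, 100, size=200)
cloud = np.array([10.0, -5.0, 3.0]) + t[:, None] * true_dir
cloud += rng.normal(scale=0.5, size=cloud.shape)

centroid, direction = fit_line_svd(cloud)
# the sign of the fitted direction is arbitrary, so compare up to sign
alignment = abs(float(direction @ true_dir))
print(f"alignment with the true direction: {alignment:.4f}")
```

Note that the fit only recovers an axis: `direction` and `-direction` describe the same line, so the orientation has to be decided separately.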

In [None]:
%matplotlib inline

import os
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

PATH_DATASET = "/kaggle/input/icecube-neutrinos-in-deep-ice"

## Browse meta data

**[train/test]_meta.parquet**

- **batch_id** (int): the ID of the batch the event was placed into.
- **event_id** (int): the event ID.
- **[first/last]_pulse_index** (int): index of the first/last row in the features dataframe belonging to this event.
- **[azimuth/zenith]** (float32): the [azimuth/zenith] angle in radians of the neutrino. A value between 0 and 2*pi for the azimuth and 0 and pi for zenith. The target columns. Not provided for the test set. The direction vector represented by zenith and azimuth points to where the neutrino came from.
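
As a small illustration of how the two target angles describe a direction, the sketch below converts arrays of azimuth and zenith values into unit vectors with NumPy. The helper name `angles_to_unit_vectors` and the sample angles are assumptions made for this example only.

```python
import numpy as np

def angles_to_unit_vectors(azimuth, zenith):
    """Convert azimuth/zenith angles (radians) to an (N, 3) array of unit vectors."""
    azimuth = np.asarray(azimuth, dtype=float)
    zenith = np.asarray(zenith, dtype=float)
    x = np.sin(zenith) * np.cos(azimuth)
    y = np.sin(zenith) * np.sin(azimuth)
    z = np.cos(zenith)
    return np.stack([x, y, z], axis=-1)

# made-up angles chosen so that the results are the three coordinate axes
vectors = angles_to_unit_vectors(
    azimuth=[0.0, np.pi / 2, 1.0],
    zenith=[np.pi / 2, np.pi / 2, 0.0],
)
print(vectors.round(3))
```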

In [None]:
meta_train = pd.read_parquet(os.path.join(PATH_DATASET, "train_meta.parquet"))
display(meta_train.head())

In [None]:
display(meta_train.info())

### azimuth and zenith

In [None]:
plt.hist2d(meta_train["azimuth"], meta_train["zenith"], bins=(50, 50), cmap=plt.cm.jet)
plt.xlabel('azimuth'), plt.ylabel('zenith')
plt.colorbar()

## Browse geometry

The x, y, and z positions for each of the 5160 IceCube sensors. The row index corresponds to the sensor_idx feature of pulses. The x, y, and z coordinates are in units of meters, with the origin at the center of the IceCube detector. The coordinate system is right-handed, and the z-axis points upwards when standing at the South Pole.

In [None]:
geometry = pd.read_csv(os.path.join(PATH_DATASET, "sensor_geometry.csv"))
print(f"length: {len(geometry)}")
geometry.head()

In [None]:
import plotly.express as px

fig = px.scatter_3d(geometry, x='x', y='y', z='z', opacity=0.6, color="sensor_id")
fig.update_traces(marker_size=2)
fig.update_layout(height=600, width=600)
fig.show()

In [None]:
geometry.set_index("sensor_id", inplace=True)
geometry = geometry.apply(np.float32)
geometry.info()

## Browse training/test data

**[train/test]/batch_[n].parquet** Each batch contains tens of thousands of events. Each event may contain thousands of pulses, each of which is the digitized output from a photomultiplier tube and occupies one row.

- **event_id** (int): the event ID. Saved as the index column in parquet.
- **sensor_id** (int): the ID of which of the 5160 IceCube photomultiplier sensors recorded this pulse.
- **auxiliary** (bool): If True, the pulse was not fully digitized, is of lower quality, and was more likely to originate from noise. If False, the pulse contributed to the trigger decision and was fully digitized.
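
To make the layout concrete, here is a minimal sketch that selects one event, drops the auxiliary pulses, and attaches the sensor positions. The two small frames are made-up stand-ins for a batch file and the geometry table, not real data.

```python
import pandas as pd

# toy stand-in for one batch: event_id is the index, one row per pulse
pulses = pd.DataFrame(
    {
        "sensor_id": [0, 2, 1, 2],
        "time": [100, 140, 180, 220],
        "charge": [1.2, 0.4, 0.9, 2.1],
        "auxiliary": [False, True, False, False],
    },
    index=pd.Index([7, 7, 7, 8], name="event_id"),
)
# toy stand-in for the sensor geometry, indexed by sensor_id
sensors = pd.DataFrame(
    {"x": [0.0, 10.0, 20.0], "y": [0.0, 5.0, 10.0], "z": [-50.0, -60.0, -70.0]},
    index=pd.Index([0, 1, 2], name="sensor_id"),
)

event = pulses[pulses.index == 7]        # all pulses of event 7
event = event[~event["auxiliary"]]       # keep only fully digitized pulses
event = event.merge(sensors, left_on="sensor_id", right_index=True)
print(event[["sensor_id", "x", "y", "z"]])
```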

In [None]:
train = pd.read_parquet(os.path.join(PATH_DATASET, "train/batch_15.parquet"))
print(f"length: {len(train)}")
print(f"events: {len(train.index.unique())}")
train.head()

## Fitting line to points

In [None]:
! pip install -q scikit-spatial --no-index -f /kaggle/input/icecube-neutrino-eda-3d-interactive-viewer/frozen-packages/

See the angle visualisation in https://www.kaggle.com/code/jirkaborovec/icecube-visualize-azimuth-zenith

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Kugelkoord-lokale-Basis-s.svg/1024px-Kugelkoord-lokale-Basis-s.svg.png" width="400" height="400"/>

In [None]:
import math
import warnings
warnings.filterwarnings('ignore')

def cartesian_to_sphere(x, y, z):
    # https://en.wikipedia.org/wiki/Spherical_coordinate_system
    r = math.sqrt(x**2 + y**2 + z**2)
    # atan2 returns an angle in (-pi, pi] and stays well defined for y == 0
    # (where multiplying by np.sign(y) would zero the angle) and for x == y == 0
    azimuth = math.atan2(y, x)
    zenith = math.acos(z / r)
    return azimuth, zenith

def adjust_sphere(azimuth, zenith):
    if zenith < 0:
        zenith += math.pi
        azimuth += math.pi
    if azimuth < 0:
        azimuth += math.pi * 2
    azimuth = azimuth % (2 * math.pi)
    return azimuth, zenith

def sphere_to_cartesian(azimuth, zenith):
    # see: https://stackoverflow.com/a/10868220/4521646
    x = math.sin(zenith) * math.cos(azimuth)
    y = math.sin(zenith) * math.sin(azimuth)
    z = math.cos(zenith)
    return x, y, z
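
A fitted line has no preferred orientation, so it helps to see what flipping a direction vector does to the angles. The sketch below is a self-contained check with a hypothetical helper `to_angles` (an `atan2`-based conversion) and a made-up vector: negating the vector shifts the azimuth by pi and mirrors the zenith to pi minus its value.

```python
import math

def to_angles(x, y, z):
    """Return (azimuth in [0, 2*pi), zenith in [0, pi]) of a non-zero vector."""
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x) % (2 * math.pi)
    zenith = math.acos(z / r)
    return azimuth, zenith

d = (1.0, 1.0, 1.0)                 # made-up direction
flipped = tuple(-c for c in d)      # the same line, opposite orientation

az, zen = to_angles(*d)
az_f, zen_f = to_angles(*flipped)
print(f"original: azimuth={az:.4f}, zenith={zen:.4f}")
print(f"flipped:  azimuth={az_f:.4f}, zenith={zen_f:.4f}")
```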

In [None]:
import plotly.graph_objects as go

def draw_subplot(
    fig, i, evt, sensors, meta, pos_offset, pos_direction, gt_direction, line_len=250
):
    # sensors as background
    fig.add_trace(go.Scatter3d(
        x=sensors['x'], y=sensors['y'], z=sensors['z'], 
        mode='markers', marker=dict(size=1, color="black"), opacity=0.2
    ), row=(i+1), col=1)
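    # sensors reading; work on a copy so the caller's frame is not modified
    evt = evt.copy()
    evt_time_min, evt_time_max = evt['time'].min(), evt['time'].max()
    # guard against a zero time span (e.g. a single pulse)
    evt['time'] = (evt['time'] - evt_time_min) / max(evt_time_max - evt_time_min, 1)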
    fig.add_trace(go.Scatter3d(
        x=evt['x'], y=evt['y'], z=evt['z'], opacity=0.95,
        mode='markers', marker=dict(
            size=evt['charge'] * 15, color=evt['time'], colorscale='sunsetdark'
        )
    ), row=(i+1), col=1)
    # fitted direction (red) and direction from meta data (green)
    ox, oy, oz = pos_offset
    
    for c, (x, y, z) in [('red', pos_direction), ('green', gt_direction)]:
        fig.add_trace(go.Scatter3d(
            x=[ox - x * line_len, ox + x * line_len],
            y=[oy - y * line_len, oy + y * line_len],
            z=[oz - z * line_len, oz + z * line_len],
            opacity=0.8, mode='lines', line=dict(color=c, width=3)
        ), row=(i+1), col=1)
        fig.add_trace(go.Scatter3d(
            x=[ox - x * line_len],
            y=[oy - y * line_len],
            z=[oz - z * line_len],
            opacity=0.8, mode='markers', marker=dict(size=5, color=c)
        ), row=(i+1), col=1)

### Sample event

In [None]:
from copy import copy
from plotly.subplots import make_subplots
from skspatial.objects import Line, Point, Points


# def sample_directions(evt, line):
#     line_ = copy(line)
#     evt_charge = evt[evt['charge'] >= evt['charge'].mean()]
#     points = evt_charge.sort_values('time')[['x', 'y', 'z']]
#     projs = []
#     for i, (_, point) in enumerate(points[:len(points) // 2].iterrows()):
#         line_.point = Point(point.values)
#         points_ = Points(points[i:].sample(2).values)
#         projs += np.sign(line_.transform_points(points_)).tolist()


def show_event(event_id=46528394, data=train, sensors=geometry, metadata=meta_train):
    event = data[data.index == event_id]
    event = event.merge(sensors, left_on="sensor_id", right_index=True)
    meta = dict(metadata[metadata["event_id"]==event_id].iloc[0])
    print(meta)
    
    evt_ = event[~event['auxiliary']]
    points = Points(evt_[["x", "y", "z"]].values)
    line = Line.best_fit(points, full_matrices=False)
    print(f"Estimate: {line}")
    # sample_directions(evt_, line)
    
    print(f"provided/GT of azimuth={meta['azimuth']:0.5}; zenith={meta['zenith']:0.5}")
    azimuth_, zenith_ = cartesian_to_sphere(*line.direction)
    print(f"predictions of azimuth={azimuth_:0.5} & zenith={zenith_:0.5}")
    azimuth_, zenith_ = adjust_sphere(azimuth_, zenith_)
    print(f"correction of azimuth={azimuth_:0.5} & zenith={zenith_:0.5}")
    
    auxiliaries = [False, True]
    fig = make_subplots(
        rows=2, specs=[[{'type': 'scene'}], [{'type': 'scene'}]],
        subplot_titles=[f"auxiliary={aux}" for aux in auxiliaries],
        vertical_spacing=0.05,
    )
    event.loc[:, 'charge'] = event['charge'] / max(event['charge'])
    x_, y_, z_ = sphere_to_cartesian(meta['azimuth'], meta['zenith'])
    for i, aux in enumerate(auxiliaries):
        draw_subplot(
            fig, i, event[event['auxiliary'] == aux], sensors, meta,
            line.point, line.direction, gt_direction=(x_, y_, z_),
        )
    desc_gt = f"azimuth={meta['azimuth']:0.3}; zenith={meta['zenith']:0.3}"
    desc_gt += f"\n -> x={x_:0.2}; y={y_:0.2}; z={z_:0.2}"
    fig.update_layout(
        height=800, width=600, showlegend=False,
        title_text=f"Event #{event_id} / {desc_gt}",
    )
    return fig

show_event().show()
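
To put a number on how far a fitted direction is from the provided one, one option is the angle between the two directions on the unit sphere. The sketch below assumes both directions are given as (azimuth, zenith) pairs in radians; `angular_distance` is a hypothetical helper and the sample angles are made up.

```python
import numpy as np

def angular_distance(az_a, zen_a, az_b, zen_b):
    """Angle in radians between two directions given as (azimuth, zenith)."""
    # dot product of the two unit vectors, written directly in spherical coordinates
    cos_angle = (
        np.sin(zen_a) * np.sin(zen_b) * np.cos(az_a - az_b)
        + np.cos(zen_a) * np.cos(zen_b)
    )
    # clip to protect arccos from rounding slightly outside [-1, 1]
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

same = angular_distance(1.0, 0.5, 1.0, 0.5)
opposite = angular_distance(1.0, 0.5, 1.0 + np.pi, np.pi - 0.5)
print(f"identical directions: {same:.4f} rad")
print(f"opposite directions:  {opposite:.4f} rad")
```

A fit that finds the right axis but the wrong orientation lands near the "opposite" case, which is why the sign of the fitted direction matters.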

### Interactive - browse events

**This is an interactive chart where you can slide through all events in the training batch.**

To use it, you need to open the notebook in editor mode (that is, copy it as your own).

In [None]:
from ipywidgets import interact, IntSlider

def interactive_show(events):
    interact(
        lambda i: show_event(events[i]).show(),
        # the last valid index is len(events) - 1
        i=IntSlider(min=0, max=len(events) - 1, step=1, value=len(events) // 2),
    )

events = train.index.unique().tolist()
interactive_show(events)

In [None]:
import gc, time

del train, meta_train, events
gc.collect(), time.sleep(9)

## Prepare submission

An example submission with the correct columns and properly ordered event IDs. The sample submission is provided in the parquet format so it can be read quickly but your final submission must be a csv.
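
As a rough sketch of the expected CSV layout, assuming the columns are `event_id`, `azimuth`, and `zenith` as used in the cells of this notebook, the example below writes a tiny frame with made-up event IDs and angles to an in-memory buffer.

```python
import io
import pandas as pd

# made-up predictions for two hypothetical events
predictions = pd.DataFrame(
    {"azimuth": [1.0, 4.2], "zenith": [0.5, 2.1]},
    index=pd.Index([101, 202], name="event_id"),
)

buffer = io.StringIO()
predictions.to_csv(buffer, index=True)  # keep event_id as the first column
csv_text = buffer.getvalue()
print(csv_text)
```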

In [None]:
ssub = pd.read_parquet(os.path.join(PATH_DATASET, "sample_submission.parquet"))
ssub.set_index("event_id", inplace=True)
display(ssub.head())

In [None]:
ssub = ssub.apply(np.float32)
display(ssub.info())

### Fitting to all data

In [None]:
ls = glob.glob(os.path.join(PATH_DATASET, "test", "*.parquet"))

for batch_file in ls:
    print(f"processing: {batch_file}")
    df = pd.read_parquet(batch_file)
    # del df['time'], df['charge']
    # display(df.head())
    gc.collect()
    
    for eid, dfg in df.groupby("event_id"):
        dfg = dfg[~dfg['auxiliary']]
        dfg = dfg.merge(geometry, left_on="sensor_id", right_index=True)
        # TODO: for memory reasons, subsample the point cloud
        if len(dfg) > 800:
            dfg = dfg.sort_values('charge', ascending=False)[:800]
        # display(dfg)

        line = None  # stays None when the fit fails, so the print below is safe
        try:
            points = Points(dfg[["x", "y", "z"]].values)
            line = Line.best_fit(points, full_matrices=False)
            # TODO: adjust by time series
            direction_ = -1 * line.direction
            azimuth_, zenith_ = adjust_sphere(*cartesian_to_sphere(*direction_))
        except Exception as ex:
            # fall back to a zero prediction when the fit fails
            azimuth_, zenith_ = 0., 0.
            print(ex)
        ssub.at[eid, "azimuth"] = azimuth_
        ssub.at[eid, "zenith"] = zenith_
        if len(df) < 1e5:
            print(f"Estimation {line} with azimuth={azimuth_} & zenith={zenith_}")

    del df, dfg
    gc.collect(), time.sleep(9)
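
The loop above leaves the orientation of the fitted line as a TODO. One possible way to resolve it, sketched here under the simplifying assumption that pulse times grow along the particle's path, is to check whether later pulses lie further along the fitted direction. `orient_by_time` is a hypothetical helper, and the synthetic track below is made up; real events are noisier.

```python
import numpy as np

def orient_by_time(points, times, direction):
    """Flip `direction` so that it points to where the particle came from."""
    points = np.asarray(points, dtype=float)
    times = np.asarray(times, dtype=float)
    direction = np.asarray(direction, dtype=float)
    # signed position of every pulse along the fitted line
    positions = (points - points.mean(axis=0)) @ direction
    # positive covariance: later pulses lie further along +direction,
    # so the particle travels along +direction and came from -direction
    travel_sign = np.sign(np.sum((times - times.mean()) * positions))
    if travel_sign == 0:
        return direction  # no timing information, leave the direction unchanged
    return -travel_sign * direction

# synthetic track moving upwards along z, with pulse times increasing along the way
track = np.column_stack([np.zeros(5), np.zeros(5), np.arange(5.0)])
pulse_times = np.array([0.0, 10.0, 20.0, 30.0, 40.0])

origin_a = orient_by_time(track, pulse_times, [0.0, 0.0, 1.0])
origin_b = orient_by_time(track, pulse_times, [0.0, 0.0, -1.0])
print(origin_a, origin_b)
```

Both calls agree on the same origin direction regardless of which sign the line fit happened to return.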

### Finalize submission

In [None]:
ssub.fillna(0).to_csv('submission.csv', index=True)

!head submission.csv