# Telemetry Drilling Sessions – Minimal Exploratory Notebook (Domain-Aware)

This notebook analyzes raw telemetry captured every 30 seconds from retrofitted Pods on drilling machines (July 2025). It performs ingestion, schema enforcement, timestamp normalization, feature engineering (distance, speed, energy proxy), operational state classification, session segmentation, BLE tagging analysis, descriptive statistics, outlier detection, correlation matrix, and CSV export.

Allowed stack only: pandas, numpy, plotly.

Domain Semantics Incorporated:
- Sampling interval: 30 s (expected regular cadence, gaps imply missing telemetry)
- Current thresholds (approx.):
  * ~0 A       → OFF / unplugged
  * ~0.9 A     → STANDBY (plugged, switch off)
  * ~3.9 A     → SPIN (motor on, no load)
  * more than 5 A       → DRILLING (under load)
- A "session" is a contiguous run of samples for a device with no time gap > 1.5 * sampling interval.
- BLE tag presence (ble_id): a session counts as having used a tagged consumable if any non-empty ble_id value appears in it.
- Sequence gaps (seq) expose missing packets.

Advanced analytics (ML clustering, anomaly detection, profiling, folium maps, model artifacts, parquet) were intentionally removed to keep the environment lightweight. See final section for optional future extensions.

### Imports & Configuration 
Sets up deterministic environment and domain parameters:
- Minimal stack imports (pandas, numpy, plotly) for portability.
- Absolute project root and data directory to avoid relative path pitfalls inside `public/notebooks`.
- Sampling cadence (30 s) and session gap (45 s) define temporal segmentation logic.
- Current thresholds map raw amperage to machine operational states (OFF/STANDBY/SPIN/DRILL).
- Configuration dict centralizes parameters; file existence check guards early against path errors.
Outcome: Printed configuration plus `RAW_FILE exists: True` if the CSV is accessible.

In [75]:
# Section 1: Imports & Configuration (Minimal Domain-Aware Version)
from __future__ import annotations
import random
from pathlib import Path
from datetime import timedelta

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Random seed for reproducibility
RANDOM_SEED = 42
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Absolute project root (explicit to avoid nesting mistakes)
PROJECT_ROOT = Path(r"C:/Users/peppe/Documents/GitHub/Telemetry-Analytics-Dashboard-for-Smart-Drilling-Machines").resolve()
DATA_DIR = PROJECT_ROOT / 'public' / 'data'
PROCESSED_DIR = DATA_DIR
DATA_DIR.mkdir(parents=True, exist_ok=True)

# Raw telemetry file (absolute path)
RAW_FILE = DATA_DIR / 'raw_drilling_sessions.csv'

# Domain parameters
SAMPLING_SECONDS = 30  # expected interval
SESSION_MAX_GAP_FACTOR = 1.5  # gap > 45 s breaks a session
SESSION_GAP_SECONDS = int(SAMPLING_SECONDS * SESSION_MAX_GAP_FACTOR)
CURRENT_THRESHOLDS = {
    'off_max': 0.2,
    'standby_max': 1.8,
    'spin_max': 5.0
}

CONFIG = {
    'raw_file': str(RAW_FILE),
    'sampling_seconds': SAMPLING_SECONDS,
    'session_gap_seconds': SESSION_GAP_SECONDS,
    'current_thresholds': CURRENT_THRESHOLDS,
    'export_csv': 'drilling_sessions_enriched.csv'
}

print('Configuration:')
for k,v in CONFIG.items():
    print(f'  {k}: {v}')
print(f'RAW_FILE exists: {RAW_FILE.exists()}')

Configuration:
  raw_file: C:\Users\peppe\Documents\GitHub\Telemetry-Analytics-Dashboard-for-Smart-Drilling-Machines\public\data\raw_drilling_sessions.csv
  sampling_seconds: 30
  session_gap_seconds: 45
  current_thresholds: {'off_max': 0.2, 'standby_max': 1.8, 'spin_max': 5.0}
  export_csv: drilling_sessions_enriched.csv
RAW_FILE exists: True


In [76]:
# Section 2: Ingest Raw CSV (Streaming & Full Load)
# Full load
df_full = pd.read_csv(str(RAW_FILE))
print(f'Full load shape: {df_full.shape}')
mem_mb = df_full.memory_usage(deep=True).sum() / 1e6
print(f'Memory usage (raw dtypes): {mem_mb:,.2f} MB')

# Streaming / chunked approach example
chunk_rows = 0
chunks = []
for chunk in pd.read_csv(RAW_FILE, chunksize=50_000):
    chunk_rows += len(chunk)
    chunks.append(chunk.head(2))  # keep tiny sample for demonstration
print(f'Iterated through {chunk_rows} rows using chunk size=50,000.')
print('First rows from first chunk:')
print(chunks[0])


Full load shape: (3900, 8)
Memory usage (raw dtypes): 0.89 MB
Iterated through 3900 rows using chunk size=50,000.
First rows from first chunk:
              timestamp device_id  seq  current_amp    gps_lat    gps_lon  \
0  2025-07-01T09:20:15Z  b4e1d9c2  410         7.30  52.393297  13.265675   
1  2025-07-01T09:20:45Z  b4e1d9c2  411         5.79  52.393608  13.265867   

   battery_level             ble_id  
0             67  F4:12:FA:6C:9D:21  
1             67  F4:12:FA:6C:9D:21  


### Ingestion 
Two read strategies:
1. Full load for total shape & memory footprint (`df_full`).
2. Chunked iteration (50k) pattern for scalability; only small samples retained.
Outputs validate: rows, columns, approximate memory MB, and preview rows. Confirms schema expectations early and provides fallback if memory limits arise in bigger deployments.
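
As a minimal sketch of how the chunked pattern can do useful work without holding the full file in memory, the example below accumulates per-device row counts chunk by chunk. The in-memory CSV, the chunk size of 4, and the names `csv_text` and `rows_per_device` are illustrative assumptions, not part of the notebook.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical stand-in for the raw CSV (10 rows, two devices)
csv_text = "device_id,current_amp\n" + "\n".join(
    f"{'b' if i % 3 == 0 else 'a'},{i * 0.5}" for i in range(10)
)

rows_per_device = Counter()
total_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total_rows += len(chunk)
    # Only the small per-chunk aggregate is kept, never the chunk itself
    rows_per_device.update(chunk["device_id"].value_counts().to_dict())

print(total_rows, dict(rows_per_device))
```

The same loop shape applies to any aggregate that can be merged across chunks (counts, sums, min/max).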

In [77]:
# Section 3: Define Explicit Schema & Enforce Data Types
schema_dtypes = {
    'timestamp': 'string',  # parse later
    'device_id': 'category',
    'seq': 'int32',
    'current_amp': 'float32',
    'gps_lat': 'float64',
    'gps_lon': 'float64',
    'battery_level': 'Int8',  # allows NA
    'ble_id': 'category'
}

# Re-read with explicit schema (timestamp stays a string here and is parsed to datetime later)
df = pd.read_csv(RAW_FILE, dtype=schema_dtypes)
print(df.dtypes)
print('Rows:', len(df))

# Report numeric columns that still contain NA after loading with the explicit schema
issues = {}
for col, expected in schema_dtypes.items():
    if pd.api.types.is_numeric_dtype(df[col]) and df[col].isna().any():
        na_pct = df[col].isna().mean()*100
        issues[col] = f'Contains {na_pct:.2f}% NA after dtype coercion'
print('Coercion issues:', issues or 'None')


timestamp        string[python]
device_id              category
seq                       int32
current_amp             float32
gps_lat                 float64
gps_lon                 float64
battery_level              Int8
ble_id                 category
dtype: object
Rows: 3900
Coercion issues: None


### Missing Value Audit & Imputation 
Profiles NA distribution to prioritize cleaning steps, then forward-fills `battery_level` per device to smooth sparse gaps. Adds boolean `ble_id_missing` to later assess tag detection reliability. Expected output: table of NA counts/percentages and reduced (or eliminated) battery_level NAs after forward fill.
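
A minimal sketch of per-device forward fill on toy data (the frame `toy` and the column `battery_filled` are hypothetical). It illustrates one property worth keeping in mind: a missing value at the start of a device's series has nothing to fill from and stays NA.

```python
import pandas as pd

toy = pd.DataFrame({
    "device_id": ["a", "a", "a", "b", "b"],
    "battery_level": pd.array([80, None, 79, None, 55], dtype="Int8"),
})

# Forward fill within each device only; values never leak across devices
toy["battery_filled"] = toy.groupby("device_id")["battery_level"].ffill()
print(toy)
```

Here the gap in device `a` is filled with 80, while the leading gap in device `b` remains NA rather than inheriting 79 from device `a`.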

In [78]:
# Section 5: Handle Missing Values & Data Quality Audit
na_counts = df.isna().sum()
na_pct = (na_counts / len(df))*100
quality = pd.DataFrame({'na_count': na_counts, 'na_pct': na_pct}).sort_values('na_pct', ascending=False)
print('Missing values summary:')
print(quality)

# Impute battery_level forward per device
df['battery_level'] = df.groupby('device_id', observed=True)['battery_level'].ffill().astype('Int8')

# Flag missing BLE IDs
df['ble_id_missing'] = df['ble_id'].isna()
issues_summary = quality[quality.na_pct > 0]
print('Issues summary (non-zero NA columns):')
print(issues_summary)


Missing values summary:
               na_count     na_pct
ble_id             2573  65.974359
timestamp             0   0.000000
seq                   0   0.000000
device_id             0   0.000000
current_amp           0   0.000000
gps_lat               0   0.000000
gps_lon               0   0.000000
battery_level         0   0.000000
Issues summary (non-zero NA columns):
        na_count     na_pct
ble_id      2573  65.974359




### Remove Duplicates & Sequence Integrity – Explanation
Eliminates fully duplicated rows to prevent double counting, then inspects the `seq` field per device to locate missing sequence ranges (data loss). Consecutive gaps are summarized as (start, end) tuples. Expected output: number of duplicates removed (often 0) and printed gaps only if any sequence numbers are missing.
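
The gap summary can also be sketched with `np.diff` instead of an explicit loop. This is an illustrative alternative under the same (start, end) convention for missing ranges; the helper name `seq_gap_ranges` is hypothetical.

```python
import numpy as np

def seq_gap_ranges(seq):
    """Return (first_missing, last_missing) tuples for each hole in seq."""
    s = np.unique(np.asarray(seq))          # sorted, de-duplicated values
    jumps = np.flatnonzero(np.diff(s) > 1)  # positions followed by a hole
    return [(int(s[i]) + 1, int(s[i + 1]) - 1) for i in jumps]

print(seq_gap_ranges([410, 411, 414, 415, 418]))  # [(412, 413), (416, 417)]
```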

In [79]:
# Section 6: Remove Duplicates & Sequence Integrity Check
pre_dupe = len(df)
df = df.drop_duplicates()
print(f'Removed {pre_dupe - len(df)} duplicate rows.')

# Sequence gaps per device (seq values missing between min and max)
def find_seq_gaps(g):
    seq = g['seq'].to_numpy()
    gaps = []
    if len(seq) > 1:
        missing = np.setdiff1d(np.arange(seq.min(), seq.max()+1), seq)
        if missing.size:
            # group consecutive missing
            start = missing[0]
            prev = missing[0]
            for x in missing[1:]:
                if x == prev + 1:
                    prev = x
                else:
                    gaps.append((start, prev))
                    start = x
                    prev = x
            gaps.append((start, prev))
    return gaps

# Select the seq column explicitly so apply() does not operate on the grouping column
seq_gaps = df.groupby('device_id', observed=True)[['seq']].apply(find_seq_gaps)
print('Sequence gaps:')
print(seq_gaps[seq_gaps.apply(len) > 0])


Removed 0 duplicate rows.
Sequence gaps:
Series([], dtype: object)




### Geospatial Cleaning – Explanation
Validates latitude/longitude ranges, removes invalid rows, detects motionless bursts via rolling std (potential GPS freeze), and produces smoothed latitude/longitude variants with a centered rolling median. Output: count of invalid coordinate removals (if any) and number of constant-coordinate bursts.
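
An exact `== 0` comparison on a floating-point standard deviation can be sensitive to rounding. As a hedged alternative sketch (not the notebook's method), a frozen window can be detected by checking that the rolling maximum equals the rolling minimum, which involves no floating-point accumulation. The series `lat` below is toy data.

```python
import pandas as pd

lat = pd.Series([52.0, 52.001, 52.002, 52.002, 52.002, 52.002, 52.002, 52.003])

window = lat.rolling(5, min_periods=3)
# A window is "frozen" when its range is exactly zero
frozen = (window.max() - window.min()) == 0
print(frozen.tolist())
```

Only the sample at position 6 is flagged, because it is the only one whose trailing five values are all identical.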

In [80]:
# Section 7: Geospatial Data Cleaning (Latitude/Longitude Validation)
valid_mask = df['gps_lat'].between(-90, 90) & df['gps_lon'].between(-180, 180)
invalid_rows = (~valid_mask).sum()
if invalid_rows:
    print(f'Removing {invalid_rows} invalid coordinate rows.')
    df = df[valid_mask]

# Detect zero variance bursts (5-point rolling std == 0), computed per device
df['lat_rolling_std'] = df.groupby('device_id', observed=True)['gps_lat'].transform(lambda s: s.rolling(5, min_periods=3).std())
df['lon_rolling_std'] = df.groupby('device_id', observed=True)['gps_lon'].transform(lambda s: s.rolling(5, min_periods=3).std())
constant_coords = (df['lat_rolling_std'] == 0) & (df['lon_rolling_std'] == 0)
# Counts samples whose trailing window is constant in both coordinates
print('Constant coordinate burst count:', constant_coords.sum())

# Optional smoothing (rolling median) - non-destructive, originals are kept
for col in ['gps_lat','gps_lon']:
    df[f'{col}_smooth'] = df.groupby('device_id', observed=True)[col].transform(lambda s: s.rolling(3, center=True, min_periods=1).median())


Constant coordinate burst count: 0




### Feature Engineering (Distance, Speed, Bearing) – Explanation
Computes per-sample movement metrics per device by shifting previous coordinates. Uses haversine for geodesic distance (meters) and bearing for heading. Derives `distance_m`, `bearing_deg`, `speed_mps` (clipped to 0–20 to suppress spikes), and cumulative distance. Output preview shows first few engineered rows.
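
A quick sanity check for the two formulas, as a self-contained sketch: with a spherical Earth of radius 6,371,000 m, one degree of latitude along a meridian is $R \cdot \pi / 180 \approx 111{,}195$ m, a step due north has bearing 0°, and a step due east along the equator has bearing 90°. The function names `haversine_m` and `bearing_deg` are chosen here for illustration.

```python
import numpy as np

R_EARTH = 6371000.0  # meters, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return R_EARTH * 2 * np.arcsin(np.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlon = lon2 - lon1
    x = np.sin(dlon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    return (np.degrees(np.arctan2(x, y)) + 360) % 360

one_degree_north = haversine_m(0.0, 0.0, 1.0, 0.0)
print(round(one_degree_north, 1))          # about 111194.9
print(bearing_deg(0.0, 0.0, 1.0, 0.0))     # due north
print(bearing_deg(0.0, 0.0, 0.0, 1.0))     # due east
```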

In [81]:
# Section 8: Feature Engineering: Delta Time, Distance, Speed, Bearing
R_EARTH = 6371000.0  # meters

def haversine(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat/2)**2 + np.cos(lat1)*np.cos(lat2)*np.sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    return R_EARTH * c

def bearing(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlon = lon2 - lon1
    x = np.sin(dlon) * np.cos(lat2)
    y = np.cos(lat1)*np.sin(lat2) - np.sin(lat1)*np.cos(lat2)*np.cos(dlon)
    brng = np.degrees(np.arctan2(x, y))
    return (brng + 360) % 360

# Compute time difference in seconds per device (first sample of each device gets 0)
df['timestamp_dt'] = pd.to_datetime(df['timestamp'])
df['time_diff_s'] = df.groupby('device_id', observed=True)['timestamp_dt'].diff().dt.total_seconds().fillna(0).astype('float32')

# Previous coordinates per device
for col in ['gps_lat','gps_lon']:
    df[f'{col}_prev'] = df.groupby('device_id', observed=True)[col].shift(1)

mask_move = df['gps_lat_prev'].notna()
df.loc[mask_move, 'distance_m'] = haversine(
    df.loc[mask_move, 'gps_lat_prev'], df.loc[mask_move, 'gps_lon_prev'],
    df.loc[mask_move, 'gps_lat'], df.loc[mask_move, 'gps_lon']
)
df['distance_m'] = df['distance_m'].fillna(0).astype('float32')

df.loc[mask_move, 'bearing_deg'] = bearing(
    df.loc[mask_move, 'gps_lat_prev'], df.loc[mask_move, 'gps_lon_prev'],
    df.loc[mask_move, 'gps_lat'], df.loc[mask_move, 'gps_lon']
)
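# Forward-fill missing bearings. Note: this is not grouped by device, so the first
# sample of a device inherits the last bearing of the device that precedes it in row order.
df['bearing_deg'] = df['bearing_deg'].ffill()

# Zero time deltas become NaN to avoid division by zero; speed is clipped to 0-20 m/s
df['speed_mps'] = (df['distance_m'] / df['time_diff_s'].replace(0, np.nan)).clip(0, 20).fillna(0).astype('float32')
df['cumulative_distance_m'] = df.groupby('device_id', observed=True)['distance_m'].cumsum().astype('float32')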
print(df[['device_id','seq','distance_m','speed_mps','bearing_deg']].head())

  device_id  seq  distance_m  speed_mps  bearing_deg
0  b4e1d9c2  410    0.000000   0.000000          NaN
1  b4e1d9c2  411   36.954323   1.231811    20.643180
2  b4e1d9c2  412   81.003448   2.700115   139.265107
3  b4e1d9c2  413   39.776043   1.325868   271.441930
4  b4e1d9c2  414   89.053589   2.968453    24.631961




### Operational Power / State Classification – Explanation
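Translates raw current into discrete operational states using the configured thresholds (OFF up to 0.2 A, STANDBY up to 1.8 A, SPIN up to 5.0 A, DRILL above) and computes `power_index`, the current divided by 10 A and clipped at 0. Rolling means (window 5, per device) smooth short-term noise and are stored as `current_amp_roll5` and `power_index_roll5`. The output preview lists current, state, and `power_index` for the first rows.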
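
For larger datasets, the same threshold mapping can be sketched without a Python-level `apply`, using `pd.cut` with right-closed bins so that values equal to a threshold fall in the lower state. This is an illustrative alternative, not the notebook's implementation; the threshold values are copied from the configuration and the names `currents` and `states` are hypothetical.

```python
import numpy as np
import pandas as pd

currents = pd.Series([0.0, 0.9, 3.9, 7.3, 5.0, np.nan])

# Right-closed bins: (-inf, 0.2], (0.2, 1.8], (1.8, 5.0], (5.0, inf)
states = pd.cut(
    currents,
    bins=[-np.inf, 0.2, 1.8, 5.0, np.inf],
    labels=["OFF", "STANDBY", "SPIN", "DRILL"],
)
# pd.cut leaves missing currents as NaN; map them to an explicit label
states = states.cat.add_categories("UNKNOWN").fillna("UNKNOWN")
print(states.tolist())
```

Note that a reading of exactly 5.0 A is classified as SPIN, matching the `<=` comparisons in the threshold logic.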

In [82]:
# Section 9: Energy / Operational Power Proxies (Domain-Aware Minimal)
# We approximate relative power states without voltage; focus on current-driven proxies.
# Motor state distinguishes idle spin vs drilling load using current thresholds.

thr = CONFIG['current_thresholds']

def classify_state(i):
    if pd.isna(i):
        return 'UNKNOWN'
    if i <= thr['off_max']:
        return 'OFF'
    if i <= thr['standby_max']:
        return 'STANDBY'
    if i <= thr['spin_max']:
        return 'SPIN'
    return 'DRILL'

# Operational state classification
df['op_state'] = df['current_amp'].apply(classify_state)

# Simple relative power index scaled to current (normalizing by 10 A for readability)
df['power_index'] = (df['current_amp'] / 10.0).clip(0).astype('float32')

# Rolling smoothing for current & power (per device, trailing window of 5 samples)
for col in ['current_amp','power_index']:
    df[f'{col}_roll5'] = df.groupby('device_id', observed=True)[col].transform(lambda s: s.rolling(5, min_periods=1).mean()).astype('float32')

# Preview state assignments (session-level tag presence is added after session assignment)
print(df[['device_id','seq','current_amp','op_state','power_index']].head())

  device_id  seq  current_amp op_state  power_index
0  b4e1d9c2  410         7.30    DRILL        0.730
1  b4e1d9c2  411         5.79    DRILL        0.579
2  b4e1d9c2  412         5.51    DRILL        0.551
3  b4e1d9c2  413         2.03     SPIN        0.203
4  b4e1d9c2  414         6.98    DRILL        0.698




### Session Segmentation & BLE Tagging – Explanation
Segments telemetry into sessions per device: a new session starts whenever the time gap to the previous sample exceeds the configured threshold (45 s). Sequence jumps are not used for segmentation here. BLE IDs are aggregated per session to flag tagged sessions and record the first non-empty tag seen in the session. Produces a session summary (start, end, rows, distance, tag status, duration) and counts tagged vs untagged sessions.
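
The gap rule can be illustrated on toy data, as a minimal sketch: flag every sample whose gap to the previous sample of the same device exceeds 45 s, then take a per-device cumulative sum of those flags to obtain a local session counter. The frame `toy` and its timestamps are invented for the example.

```python
import pandas as pd

toy = pd.DataFrame({
    "device_id": ["a"] * 5 + ["b"] * 2,
    "timestamp": pd.to_datetime([
        "2025-07-01T09:00:00Z", "2025-07-01T09:00:30Z", "2025-07-01T09:01:00Z",
        "2025-07-01T09:05:00Z", "2025-07-01T09:05:30Z",   # 240 s gap before this pair
        "2025-07-01T09:00:10Z", "2025-07-01T09:00:40Z",
    ]),
})
toy = toy.sort_values(["device_id", "timestamp"])

# Seconds since the previous sample of the same device (NaN for the first sample)
gap_s = toy.groupby("device_id")["timestamp"].diff().dt.total_seconds()

# Each gap above 45 s opens a new session; cumsum turns flags into a counter
toy["session_local_id"] = (gap_s > 45).groupby(toy["device_id"]).cumsum()
toy["session_id"] = toy["device_id"] + "_" + toy["session_local_id"].astype(str)
print(toy["session_id"].tolist())
```

Device `a` splits into `a_0` and `a_1` at the 240 s gap, while device `b` stays in a single session `b_0`.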

In [88]:
# Section 10: Session Segmentation & BLE Tag Enrichment (no power_index usage)

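def assign_sessions(time_diff):
    # A new session starts whenever the gap to the previous sample exceeds the threshold
    gap = time_diff > CONFIG['session_gap_seconds']
    return gap.cumsum()

df = df.sort_values(['device_id','timestamp'])

# Drop columns from a previous run so the cell can be re-executed safely
for col in ['session_local_id', 'session_id', 'session_tagged', 'session_ble_id']:
    if col in df.columns:
        df = df.drop(columns=[col])

# transform keeps the original index, so no reset_index is needed
df['session_local_id'] = (
    df.groupby('device_id', observed=True)['time_diff_s']
      .transform(assign_sessions)
)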
df['session_id'] = df['device_id'].astype(str) + '_' + df['session_local_id'].astype(str)

def clean_ble_tags(series):
    valid = series.dropna()
    valid = valid[valid.astype(str).str.strip() != '']
    valid = valid[valid.astype(str) != 'nan']
    return valid.unique() if len(valid) > 0 else []

session_ble = df.groupby('session_id', observed=True)['ble_id'].apply(clean_ble_tags)
session_tagged = session_ble.apply(lambda arr: len(arr) > 0)
session_ble_id = session_ble.apply(lambda arr: arr[0] if len(arr) > 0 else None)

session_meta = pd.DataFrame({
    'session_id': session_tagged.index,
    'session_tagged': session_tagged.values,
    'session_ble_id': session_ble_id.values
})
df = df.merge(session_meta, on='session_id', how='left')

session_summary = (
    df.groupby('session_id', observed=True).agg(
        device_id=('device_id','first'),
        start=('timestamp','min'),
        end=('timestamp','max'),
        rows=('seq','count'),
        distance_m=('distance_m','sum'),
        tagged=('session_tagged','first'),
        ble_id=('session_ble_id','first')
    )
    .sort_values('start')
)

# Calculate duration as end time minus start time
session_summary['start_dt'] = pd.to_datetime(session_summary['start'])
session_summary['end_dt'] = pd.to_datetime(session_summary['end'])
session_summary['duration_s'] = (session_summary['end_dt'] - session_summary['start_dt']).dt.total_seconds().astype('float32')

# Drop temporary datetime columns
session_summary = session_summary.drop(columns=['start_dt', 'end_dt'])

# Add duration in minutes
session_summary['duration_min'] = (session_summary['duration_s'] / 60.0).astype('float32')

print('Session summary (head):')
print(session_summary.head())

print('\nTagged session counts:')
print(session_summary["tagged"].value_counts())

Session summary (head):
           device_id                 start                   end  rows  \
session_id                                                               
b4e1d9c2_0  b4e1d9c2  2025-07-01T09:20:15Z  2025-07-01T09:30:15Z    21   
9f8c2a7d_0  9f8c2a7d  2025-07-01T19:39:26Z  2025-07-01T19:59:26Z    41   
9f8c2a7d_1  9f8c2a7d  2025-07-01T23:33:52Z  2025-07-01T23:43:52Z    21   
9f8c2a7d_2  9f8c2a7d  2025-07-02T06:22:10Z  2025-07-02T06:32:10Z    21   
b4e1d9c2_1  b4e1d9c2  2025-07-02T22:44:10Z  2025-07-02T23:04:10Z    41   

              distance_m  tagged             ble_id  duration_s  duration_min  
session_id                                                                     
b4e1d9c2_0   1501.454712    True  F4:12:FA:6C:9D:21       600.0          10.0  
9f8c2a7d_0   2571.208008   False               None      1200.0          20.0  
9f8c2a7d_1  35371.179688   False               None       600.0          10.0  
9f8c2a7d_2  28007.292969   False               None      



### Export – Explanation
Writes a lean CSV containing only essential enriched columns (original telemetry + engineered movement metrics + state + session & BLE tagging). Column list is filtered to existing columns for robustness. Output: confirmation path and column count.

In [84]:
# Section 14: Export (Minimal)
# Lightweight CSV export of enriched dataframe with session & state features.

minimal_export_path = PROCESSED_DIR / CONFIG['export_csv']
export_cols = [
    'timestamp','device_id','seq','current_amp','gps_lat','gps_lon','battery_level','ble_id',
    'distance_m','speed_mps','bearing_deg','cumulative_distance_m','op_state','power_index',
    'session_id','session_tagged','session_ble_id'
]
# Only keep columns that exist (robustness if dataset schema changes)
export_cols = [c for c in export_cols if c in df.columns]
df[export_cols].to_csv(minimal_export_path, index=False)
print(f'Minimal CSV export written to: {minimal_export_path} (columns: {len(export_cols)})')

Minimal CSV export written to: C:\Users\peppe\Documents\GitHub\Telemetry-Analytics-Dashboard-for-Smart-Drilling-Machines\public\data\drilling_sessions_enriched.csv (columns: 17)


In [85]:
# Export session_summary with duration to CSV
session_export_path = PROCESSED_DIR / 'session_summary.csv'
session_summary.to_csv(session_export_path, index=True)
print(f'Session summary exported to: {session_export_path} (rows: {len(session_summary)})')

Session summary exported to: C:\Users\peppe\Documents\GitHub\Telemetry-Analytics-Dashboard-for-Smart-Drilling-Machines\public\data\session_summary.csv (rows: 98)


### Integrity Checks – Explanation
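Runs domain sanity validations. The sampling-gap check is soft: extreme gaps (more than three times the session gap threshold) or negative gaps only print a warning and add `gap_flag_extreme` / `gap_flag_negative` columns. Geographic bounds, non-negative distance/speed, and consistent tagging within each session are hard checks that raise an `AssertionError` to fail fast. Output: a confirmation message once the checks complete.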

In [89]:
# Section 15: Integrity Checks (Domain-Aware Minimal)
print('Running integrity checks...')

# 1. Sampling interval plausibility (soft checks instead of hard assert to avoid notebook stop)
extreme_gap_threshold = CONFIG['session_gap_seconds'] * 3
extreme_gaps = df['time_diff_s'] > extreme_gap_threshold
negative_gaps = df['time_diff_s'] < 0

if extreme_gaps.any() or negative_gaps.any():
    print(
        f'Warning: detected {extreme_gaps.sum()} extreme positive gaps (> {extreme_gap_threshold}s) '
        f'and {negative_gaps.sum()} negative gaps. Flags added as columns gap_flag_extreme / gap_flag_negative.'
    )
    # Add (or update) flag columns
    df['gap_flag_extreme'] = extreme_gaps
    df['gap_flag_negative'] = negative_gaps
else:
    print('Sampling interval within expected bounds.')

# 2. Coordinates sanity
assert df['gps_lat'].between(-90,90).all() and df['gps_lon'].between(-180,180).all(), 'Invalid coordinates present.'

# 3. Non-negative distances & speeds
assert (df['distance_m'] >= 0).all(), 'Negative distances found.'
assert (df['speed_mps'] >= 0).all(), 'Negative speeds found.'

# 4. Session tagging consistency (robust to cells executed out-of-order)
if {'session_id', 'session_tagged'}.issubset(df.columns):
    session_tag_consistency = df.groupby('session_id')['session_tagged'].nunique().le(1).all()
    if not session_tag_consistency:
        raise AssertionError('Inconsistent tagging flag within a session.')
    else:
        print('Session tagging consistency check passed.')
else:
    print('Skipping session tagging consistency check (session_id/session_tagged not present – run segmentation cell).')

print('Integrity checks completed.')

Running integrity checks...
Session tagging consistency check passed.
Integrity checks completed.


In [90]:
# Preview public/data/drilling_sessions_enriched.csv — first 10 rows (simple)
csv_path = PROCESSED_DIR / CONFIG['export_csv']
print(f'Reading: {csv_path} (exists: {csv_path.exists()})')
df_preview = pd.read_csv(csv_path)
df_preview.head(10)

Reading: C:\Users\peppe\Documents\GitHub\Telemetry-Analytics-Dashboard-for-Smart-Drilling-Machines\public\data\drilling_sessions_enriched.csv (exists: True)


Unnamed: 0,timestamp,device_id,seq,current_amp,gps_lat,gps_lon,battery_level,ble_id,distance_m,speed_mps,bearing_deg,cumulative_distance_m,op_state,power_index,session_id,session_tagged,session_ble_id
0,2025-07-05T04:40:35Z,7a3f55e1,453,3.38,52.544487,13.168079,77,A4:C1:38:1F:2B:7C,0.0,0.0,8.096659,0.0,SPIN,0.338,7a3f55e1_0,True,A4:C1:38:1F:2B:7C
1,2025-07-05T04:41:05Z,7a3f55e1,454,6.03,52.545018,13.16815,77,A4:C1:38:1F:2B:7C,59.239388,1.974646,4.648727,59.239388,DRILL,0.603,7a3f55e1_0,True,A4:C1:38:1F:2B:7C
2,2025-07-05T04:41:35Z,7a3f55e1,455,1.93,52.544613,13.167192,77,A4:C1:38:1F:2B:7C,78.897194,2.629906,235.194857,138.13658,SPIN,0.193,7a3f55e1_0,True,A4:C1:38:1F:2B:7C
3,2025-07-05T04:42:05Z,7a3f55e1,456,8.11,52.544925,13.167072,77,A4:C1:38:1F:2B:7C,35.629192,1.18764,346.83521,173.76578,DRILL,0.811,7a3f55e1_0,True,A4:C1:38:1F:2B:7C
4,2025-07-05T04:42:35Z,7a3f55e1,457,6.62,52.544484,13.167165,77,,49.438583,1.647953,172.691805,223.20436,DRILL,0.662,7a3f55e1_0,True,A4:C1:38:1F:2B:7C
5,2025-07-05T04:43:05Z,7a3f55e1,458,6.5,52.544853,13.168088,77,A4:C1:38:1F:2B:7C,74.69422,2.489807,56.679354,297.89856,DRILL,0.65,7a3f55e1_0,True,A4:C1:38:1F:2B:7C
6,2025-07-05T04:43:35Z,7a3f55e1,459,5.1,52.544829,13.167507,77,A4:C1:38:1F:2B:7C,39.378983,1.312633,266.114369,337.27756,DRILL,0.51,7a3f55e1_0,True,A4:C1:38:1F:2B:7C
7,2025-07-05T04:44:05Z,7a3f55e1,460,4.09,52.544715,13.167623,77,,14.906969,0.496899,148.250299,352.18454,SPIN,0.409,7a3f55e1_0,True,A4:C1:38:1F:2B:7C
8,2025-07-05T04:44:35Z,7a3f55e1,461,2.17,52.544052,13.167412,77,A4:C1:38:1F:2B:7C,75.09032,2.50301,190.953844,427.27484,SPIN,0.217,7a3f55e1_0,True,A4:C1:38:1F:2B:7C
9,2025-07-05T04:45:05Z,7a3f55e1,462,2.79,52.544368,13.167751,77,A4:C1:38:1F:2B:7C,41.95439,1.39848,33.120821,469.22925,SPIN,0.279,7a3f55e1_0,True,A4:C1:38:1F:2B:7C
