
* Run the reward_function in reward.py on the input from JimmyModelv4clone3_traininglog.tar.gz.
* The dictionary format of JimmyModelv4clone3_traininglog.tar.gz can be found in JimmyModelv4clone3_Analysis.ipynb.
* The input dictionary of the reward function is defined in param.py.
* These two dictionaries differ, so the log data must be converted before it is used.
* The waypoints come from https://github.com/aws-deepracer-community/deepracer-race-data/raw/refs/heads/main/raw_data/tracks/npy/reinvent_base.npy
* Ignore all object-related params.
* Ignore is_offtrack.
* Do not display images. Output the whole code as a Jupyter notebook.

# Reward Function Analysis on JimmyModelv4clone3 Training Log

This notebook runs the reward function from reward.py on the training data from JimmyModelv4clone3_traininglog.tar.gz.
It converts each training-log row into the parameter dictionary that the reward function expects.

In [2]:
import pandas as pd
import numpy as np
import glob
import tarfile
import os
import urllib.request

## 1. Load and Extract Training Data

In [3]:
# Extract training log data if not already extracted
if not os.path.exists('traininglog'):
    with tarfile.open('JimmyModelv4clone3_traininglog.tar.gz', 'r:gz') as tar:
        tar.extractall()
        print('Training log extracted successfully')
else:
    print('Training log already extracted')

Training log already extracted


In [4]:
# Load all CSV files
csv_files = glob.glob('traininglog/sim-trace/training/training-simtrace/*.csv')
print(f'Found {len(csv_files)} CSV files')

# Combine all CSV files into one dataframe
all_data = []
for file in csv_files:
    df = pd.read_csv(file)
    all_data.append(df)

training_data = pd.concat(all_data, ignore_index=True)
print(f'Combined data shape: {training_data.shape}')
print(f'Columns: {list(training_data.columns)}')

Found 11 CSV files
Combined data shape: (18175, 17)
Columns: ['episode', 'steps', 'X', 'Y', 'yaw', 'steer', 'throttle', 'action', 'reward', 'done', 'all_wheels_on_track', 'progress', 'closest_waypoint', 'track_len', 'tstamp', 'episode_status', 'pause_duration']
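Extracting the whole archive is not strictly necessary. The following is a minimal sketch, not part of the original analysis, that reads the sim-trace CSVs straight out of the .tar.gz with the standard-library `tarfile` module. The helper name `read_simtrace_from_tar` is hypothetical, and it assumes the member paths contain `training-simtrace/`, as in the glob pattern above.

```python
import tarfile

import pandas as pd


def read_simtrace_from_tar(tar_path, pattern="training-simtrace/"):
    """Concatenate every CSV in a .tar.gz whose member path contains `pattern`.

    Hypothetical helper; assumes the archive layout used above
    (.../sim-trace/training/training-simtrace/*.csv).
    """
    frames = []
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories, links and non-CSV files
            if member.isfile() and pattern in member.name and member.name.endswith(".csv"):
                with tar.extractfile(member) as fileobj:
                    frames.append(pd.read_csv(fileobj))
    if not frames:
        raise FileNotFoundError(f"no CSV members matching {pattern!r} in {tar_path}")
    return pd.concat(frames, ignore_index=True)
```

Usage would be `training_data = read_simtrace_from_tar('JimmyModelv4clone3_traininglog.tar.gz')`, which avoids leaving an extracted `traininglog` directory on disk.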


## 2. Load Waypoint Data

In [5]:
# Download waypoint data
waypoint_url = 'https://github.com/aws-deepracer-community/deepracer-race-data/raw/refs/heads/main/raw_data/tracks/npy/reinvent_base.npy'

if not os.path.exists('reinvent_base.npy'):
    try:
        urllib.request.urlretrieve(waypoint_url, 'reinvent_base.npy')
        print('Waypoint file downloaded successfully')
    except Exception as e:
        print(f'Error downloading waypoints: {e}')
        raise  # np.load below cannot succeed without the file
else:
    print('Waypoint file already exists')

# Load waypoints
waypoints = np.load('reinvent_base.npy')
print(f'Waypoints shape: {waypoints.shape}')

# Extract center waypoints (assuming first 2 columns are x, y coordinates)
waypoints_center = waypoints[:, :2]
print(f'Center waypoints shape: {waypoints_center.shape}')
print(f'First few waypoints: {waypoints_center[:5]}')

Waypoint file already exists
Waypoints shape: (119, 6)
Center waypoints shape: (119, 2)
First few waypoints: [[3.05973351 0.68265541]
 [3.2095089  0.68313448]
 [3.35927546 0.68336383]
 [3.50903499 0.68340179]
 [3.658795   0.68346104]]
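The loaded array has six columns, of which only the first two are used. As a sketch, assuming the deepracer-race-data layout is [center_x, center_y, inner_x, inner_y, outer_x, outer_y] (an assumption this notebook does not verify), the remaining columns could provide a per-waypoint track width instead of the hard-coded `track_width=1.067` used in the conversion step. `split_track_array` is a hypothetical helper.

```python
import numpy as np


def split_track_array(track):
    """Split an (N, 6) track array into centerline, inner and outer borders.

    ASSUMPTION: columns are [center_x, center_y, inner_x, inner_y, outer_x, outer_y].
    """
    track = np.asarray(track, dtype=float)
    if track.ndim != 2 or track.shape[1] != 6:
        raise ValueError(f"expected shape (N, 6), got {track.shape}")
    center, inner, outer = track[:, 0:2], track[:, 2:4], track[:, 4:6]
    # Per-waypoint width = distance between the inner and outer border points
    widths = np.linalg.norm(outer - inner, axis=1)
    return center, inner, outer, widths
```

If the assumed layout holds, `split_track_array(waypoints)[3].mean()` would give an average width to compare against 1.067.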


## 3. Define Reward Function

In [6]:
def reward_function(params):
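    '''
    Example code: track-zone strategy
    '''

    # Define the zones where driving left of center is rewarded and the zones where driving right of center is rewarded
    left = [*range(22,40),*range(76,92),*range(100,112)]
    right = [*range(48,56)]

    # Read the built-in parameters
    is_left_of_center = params['is_left_of_center']
    closest_waypoints = params['closest_waypoints']

    # Start from a very low baseline reward
    reward = 1e-3

    # Reward the car when it is in a left zone and left of center, or in a right zone and right of center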
    if closest_waypoints[0] in left and is_left_of_center:
        reward = 1.0
    if closest_waypoints[0] in right and not is_left_of_center:
        reward = 1.0

    return float(reward)
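Because the reward depends only on `closest_waypoints[0]` and `is_left_of_center`, it can also be evaluated on whole columns at once instead of row by row with `iterrows`. The sketch below is a hypothetical vectorized equivalent (`zone_reward_vectorized` is not from reward.py) that copies the zone lists from `reward_function`; it assumes both inputs have already been computed as arrays, one entry per step.

```python
import numpy as np

# Zone definitions copied from reward_function above
LEFT_ZONE = np.array([*range(22, 40), *range(76, 92), *range(100, 112)])
RIGHT_ZONE = np.array([*range(48, 56)])


def zone_reward_vectorized(closest_idx, is_left):
    """Apply reward_function's zone rules to whole arrays at once.

    closest_idx: closest_waypoints[0] for each step
    is_left:     is_left_of_center for each step
    """
    closest_idx = np.asarray(closest_idx)
    is_left = np.asarray(is_left, dtype=bool)
    in_left = np.isin(closest_idx, LEFT_ZONE)
    in_right = np.isin(closest_idx, RIGHT_ZONE)
    # 1.0 when on the rewarded side of a zone, otherwise the 1e-3 baseline
    hit = (in_left & is_left) | (in_right & ~is_left)
    return np.where(hit, 1.0, 1e-3)
```

For example, `zone_reward_vectorized([22, 50, 0], [True, False, True])` returns 1.0, 1.0 and 0.001.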

## 4. Parameter Conversion Functions

In [7]:
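def calculate_distance_from_center(x, y, waypoints):
    """Calculate the perpendicular distance from (x, y) to the nearest centerline segment"""
    # Segment i runs from waypoint i to waypoint i+1, wrapping around the loop
    starts = np.asarray(waypoints, dtype=float)
    seg = np.roll(starts, -1, axis=0) - starts
    seg_len_sq = np.maximum(np.einsum('ij,ij->i', seg, seg), 1e-12)  # guard against duplicate waypoints
    offset = np.array([x, y], dtype=float) - starts
    # Projection parameter clipped to [0, 1] gives the closest point on each segment
    t = np.clip(np.einsum('ij,ij->i', offset, seg) / seg_len_sq, 0.0, 1.0)
    closest_points = starts + t[:, None] * seg
    return float(np.min(np.hypot(closest_points[:, 0] - x, closest_points[:, 1] - y)))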

def is_left_of_center_calc(x, y, waypoints, closest_waypoint_idx):
    """Calculate if the car is left of center"""
    # Get current and next waypoint
    current_wp = waypoints[closest_waypoint_idx]
    next_wp_idx = (closest_waypoint_idx + 1) % len(waypoints)
    next_wp = waypoints[next_wp_idx]
    
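    # 2D cross product (z-component) of the track direction and the car offset;
    # a positive value means the car lies to the left of the direction of travel.
    # Computed explicitly because np.cross on 2-element vectors is deprecated in NumPy 2.0.
    track_direction = np.array([next_wp[0] - current_wp[0], next_wp[1] - current_wp[1]])
    car_position = np.array([x - current_wp[0], y - current_wp[1]])
    
    cross_product = track_direction[0] * car_position[1] - track_direction[1] * car_position[0]
    return cross_product > 0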

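def get_closest_waypoints(x, y, waypoints):
    """Get the indices of the two waypoints nearest to (x, y), nearest first.

    Note: DeepRacer's own closest_waypoints is ordered [behind, ahead] along the
    direction of travel, so zone boundaries here can differ from the simulator by one index.
    """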
    distances = np.sqrt((waypoints[:, 0] - x)**2 + (waypoints[:, 1] - y)**2)
    closest_indices = np.argsort(distances)[:2]
    return closest_indices.tolist()

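def convert_training_data_to_params(row, waypoints, track_width=1.067):
    """Convert a training-log row into the reward function's params dictionary.

    track_width is a fixed value in metres supplied by the caller, not read from the log.
    """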
    x, y = row['X'], row['Y']
    
    # Get closest waypoints
    closest_waypoints = get_closest_waypoints(x, y, waypoints)
    
    # Calculate distance from center
    distance_from_center = calculate_distance_from_center(x, y, waypoints)
    
    # Calculate if left of center
    is_left_of_center = is_left_of_center_calc(x, y, waypoints, closest_waypoints[0])
    
    params = {
        'all_wheels_on_track': bool(row['all_wheels_on_track']),
        'x': float(x),
        'y': float(y),
        'closest_waypoints': closest_waypoints,
        'distance_from_center': float(distance_from_center),
        'is_left_of_center': bool(is_left_of_center),
        'heading': float(row['yaw']),
        'progress': float(row['progress']),
        'speed': float(row['throttle']),  # Using throttle as proxy for speed
        'steering_angle': float(row['steer']),
        'steps': int(row['steps']),
        'track_length': float(row['track_len']),
        'track_width': float(track_width),
        'waypoints': waypoints.tolist()
    }
    
    return params
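`get_closest_waypoints` returns the two nearest waypoints ordered by distance. The DeepRacer documentation instead describes `closest_waypoints` as [closest waypoint behind the car, closest waypoint ahead of it]. A minimal sketch of that ordering follows, assuming the waypoints are listed in the driving direction and form a closed loop; `prev_next_waypoints` is a hypothetical helper, and swapping it in could change which index `reward_function` sees near zone boundaries.

```python
import numpy as np


def prev_next_waypoints(x, y, waypoints):
    """Return [behind, ahead] waypoint indices around (x, y).

    ASSUMPTION: waypoints are ordered in the driving direction and form a closed loop.
    """
    wps = np.asarray(waypoints, dtype=float)
    n = len(wps)
    car = np.array([x, y], dtype=float)
    nearest = int(np.argmin(np.linalg.norm(wps - car, axis=1)))
    nxt = (nearest + 1) % n
    # Project the car's offset onto the direction of travel at the nearest waypoint:
    # a non-negative value means the car has already passed that waypoint.
    passed_nearest = np.dot(car - wps[nearest], wps[nxt] - wps[nearest]) >= 0
    if passed_nearest:
        return [nearest, nxt]
    return [(nearest - 1) % n, nearest]
```

The design choice is to find the single nearest waypoint first and then decide, from the local track direction, whether it lies behind or ahead of the car.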

## 5. Apply Reward Function to Training Data

In [8]:
# Apply reward function to all training data
print('Applying reward function to training data...')

calculated_rewards = []
conversion_errors = 0

for idx, row in training_data.iterrows():
    try:
        # Convert row to parameters
        params = convert_training_data_to_params(row, waypoints_center)
        
        # Calculate reward
        reward = reward_function(params)
        calculated_rewards.append(reward)
        
    except Exception as e:
        calculated_rewards.append(0.0)
        conversion_errors += 1
        if conversion_errors <= 5:  # Only print first 5 errors
            print(f'Error processing row {idx}: {e}')

print(f'Processed {len(calculated_rewards)} rows with {conversion_errors} errors')

# Add calculated rewards to dataframe
training_data['calculated_reward'] = calculated_rewards

Applying reward function to training data...
Processed 18175 rows with 0 errors
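The sim-trace CSVs already contain a `closest_waypoint` column that this notebook does not use. Assuming that column indexes the same reinvent_base waypoint list, comparing it with the computed index is a cheap check on the conversion. The sketch below (`waypoint_agreement` is hypothetical) measures the index difference around the loop, so that, for example, 118 and 0 count as neighbours on a 119-point track.

```python
import numpy as np


def waypoint_agreement(computed, logged, n_waypoints, tolerance=1):
    """Fraction of steps whose two waypoint indices differ by at most `tolerance`,
    measuring the difference around a closed loop of n_waypoints points."""
    computed = np.asarray(computed, dtype=int)
    logged = np.asarray(logged, dtype=int)
    diff = np.abs(computed - logged) % n_waypoints
    # Going the other way around the loop may be shorter
    loop_diff = np.minimum(diff, n_waypoints - diff)
    return float(np.mean(loop_diff <= tolerance))
```

Here `computed` would be the `params['closest_waypoints'][0]` values collected in the loop above, `logged` would be `training_data['closest_waypoint']`, and `n_waypoints` would be `len(waypoints_center)`.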


## 6. Analysis Results

In [9]:
# Basic statistics
print('=== REWARD FUNCTION ANALYSIS RESULTS ===\n')

print('Original Rewards (from training log):')
print(f'  Mean: {training_data["reward"].mean():.4f}')
print(f'  Std:  {training_data["reward"].std():.4f}')
print(f'  Min:  {training_data["reward"].min():.4f}')
print(f'  Max:  {training_data["reward"].max():.4f}')

print('\nCalculated Rewards (from reward function):')
print(f'  Mean: {training_data["calculated_reward"].mean():.4f}')
print(f'  Std:  {training_data["calculated_reward"].std():.4f}')
print(f'  Min:  {training_data["calculated_reward"].min():.4f}')
print(f'  Max:  {training_data["calculated_reward"].max():.4f}')

print('\nReward Distribution:')
reward_counts = training_data['calculated_reward'].value_counts().sort_index()
for reward, count in reward_counts.items():
    print(f'  Reward {reward:.3f}: {count} occurrences ({count/len(training_data)*100:.1f}%)')

=== REWARD FUNCTION ANALYSIS RESULTS ===

Original Rewards (from training log):
  Mean: 0.9169
  Std:  0.2759
  Min:  0.0000
  Max:  1.0000

Calculated Rewards (from reward function):
  Mean: 0.2937
  Std:  0.4547
  Min:  0.0010
  Max:  1.0000

Reward Distribution:
  Reward 0.001: 12850 occurrences (70.7%)
  Reward 1.000: 5325 occurrences (29.3%)
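Since `reward_function` returns only 0.001 or 1.0, the summary statistics follow directly from the two counts. The check below rebuilds the column from the printed distribution and recomputes the mean and the sample standard deviation (pandas' `std` defaults to `ddof=1`).

```python
import numpy as np

# Rebuild the calculated-reward column from the distribution printed above:
# 12850 steps at 0.001 and 5325 steps at 1.0
rewards = np.repeat([0.001, 1.0], [12850, 5325])

mean = rewards.mean()       # (12850 * 0.001 + 5325 * 1.0) / 18175
std = rewards.std(ddof=1)   # sample std, matching pandas' default
print(f"Mean: {mean:.4f}  Std: {std:.4f}")  # prints: Mean: 0.2937  Std: 0.4547
```

The mean is therefore essentially the share of 1.0 steps (29.3%), with the 0.001 baseline contributing well under 0.001.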


In [10]:
# Analyze by episode
episode_analysis = training_data.groupby('episode').agg({
    'calculated_reward': ['mean', 'sum', 'count'],
    'reward': ['mean', 'sum'],
    'progress': 'max',
    'all_wheels_on_track': 'mean',
    'episode_status': 'last'
}).round(4)

episode_analysis.columns = ['calc_reward_mean', 'calc_reward_sum', 'steps', 
                           'orig_reward_mean', 'orig_reward_sum', 'max_progress', 
                           'on_track_ratio', 'final_status']

print('\n=== EPISODE ANALYSIS ===')
print(f'Total episodes: {len(episode_analysis)}')
print(f'Episodes with calculated rewards > 0.5: {len(episode_analysis[episode_analysis["calc_reward_mean"] > 0.5])}')
print(f'Episodes with max progress = 100%: {len(episode_analysis[episode_analysis["max_progress"] >= 99.9])}')

print('\nTop 10 episodes by calculated reward mean:')
print(episode_analysis.nlargest(10, 'calc_reward_mean')[['calc_reward_mean', 'calc_reward_sum', 'steps', 'max_progress', 'final_status']])


=== EPISODE ANALYSIS ===
Total episodes: 220
Episodes with calculated rewards > 0.5: 7
Episodes with max progress = 100%: 18

Top 10 episodes by calculated reward mean:
         calc_reward_mean  calc_reward_sum  steps  max_progress  final_status
episode                                                                      
44                 0.5950           22.015     37       17.2859     off_track
13                 0.5618           32.025     57       31.3266     off_track
93                 0.5243           33.030     63       37.6095     off_track
14                 0.5232           23.021     44       24.7981     off_track
185                0.5166           16.015     31       13.7787     off_track
53                 0.5005           30.030     60       32.7509     off_track
165                0.5005           14.014     28       11.8314     off_track
24                 0.4844           15.016     31       12.1968     off_track
33                 0.4763           49.054    103 
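The cell above renames the aggregated columns by position, which silently mislabels them if the `agg` dictionary is ever reordered. A sketch of the same summary using pandas named aggregation (keyword arguments of the form `name=(column, func)`) keeps each name next to its definition. `summarize_episodes` is a hypothetical helper, and unlike the original it does not apply `.round(4)`.

```python
import pandas as pd


def summarize_episodes(df):
    """Per-episode summary with output columns named at the point of definition."""
    return df.groupby("episode").agg(
        calc_reward_mean=("calculated_reward", "mean"),
        calc_reward_sum=("calculated_reward", "sum"),
        steps=("calculated_reward", "count"),
        orig_reward_mean=("reward", "mean"),
        orig_reward_sum=("reward", "sum"),
        max_progress=("progress", "max"),
        on_track_ratio=("all_wheels_on_track", "mean"),
        final_status=("episode_status", "last"),
    )
```

Calling `summarize_episodes(training_data)` should yield the same columns as `episode_analysis`, in the same order.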

In [11]:
# Analyze the reward function logic
print('\n=== REWARD FUNCTION LOGIC ANALYSIS ===')

# Get data where rewards were given (calculated_reward = 1.0)
high_reward_data = training_data[training_data['calculated_reward'] == 1.0]

if len(high_reward_data) > 0:
    print(f'\nSteps receiving high rewards (1.0): {len(high_reward_data)}')
    
    # Calculate closest waypoints for analysis
    closest_waypoints_high_reward = []
    left_of_center_high_reward = []
    
    for idx, row in high_reward_data.iterrows():
        try:
            params = convert_training_data_to_params(row, waypoints_center)
            closest_waypoints_high_reward.append(params['closest_waypoints'][0])
            left_of_center_high_reward.append(params['is_left_of_center'])
        except Exception:
            closest_waypoints_high_reward.append(-1)
            left_of_center_high_reward.append(False)
    
    # Analyze waypoint distribution for high rewards
    waypoint_counts = pd.Series(closest_waypoints_high_reward).value_counts().sort_index()
    print('\nWaypoint distribution for high rewards:')
    # Only the 10 lowest waypoint indices are printed
    for wp, count in waypoint_counts.head(10).items():
        if wp >= 0:
            print(f'  Waypoint {wp}: {count} times')
    
    # Check left/right distribution
    left_count = sum(left_of_center_high_reward)
    right_count = len(left_of_center_high_reward) - left_count
    print(f'\nHigh reward distribution by track side:')
    print(f'  Left of center: {left_count} ({left_count/len(left_of_center_high_reward)*100:.1f}%)')
    print(f'  Right of center: {right_count} ({right_count/len(left_of_center_high_reward)*100:.1f}%)')

else:
    print('No high rewards (1.0) found in the data')


=== REWARD FUNCTION LOGIC ANALYSIS ===

Steps receiving high rewards (1.0): 5325

Waypoint distribution for high rewards:
  Waypoint 22: 142 times
  Waypoint 23: 128 times
  Waypoint 24: 131 times
  Waypoint 25: 156 times
  Waypoint 26: 155 times
  Waypoint 27: 131 times
  Waypoint 28: 120 times
  Waypoint 29: 119 times
  Waypoint 30: 120 times
  Waypoint 31: 134 times

High reward distribution by track side:
  Left of center: 4854 (91.2%)
  Right of center: 471 (8.8%)
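The waypoint counts above mix two effects: how often the car was near a waypoint and how often it was on the rewarded side there (and `head(10)` shows only the ten lowest indices). A hypothetical helper that normalizes by visits is sketched below. It assumes a closest-waypoint index has been collected for every step, not just the high-reward ones.

```python
import numpy as np
import pandas as pd


def reward_rate_by_waypoint(closest_idx, calculated_reward, high=1.0):
    """Share of steps at each closest waypoint that earned the high reward."""
    df = pd.DataFrame({
        "wp": np.asarray(closest_idx),
        "hit": np.asarray(calculated_reward) == high,
    })
    # mean of a boolean column = fraction of visits that were rewarded
    return (df.groupby("wp")["hit"].agg(["mean", "count"])
              .rename(columns={"mean": "hit_rate", "count": "visits"}))
```

A waypoint inside a zone with a low `hit_rate` but many `visits` would point to the car regularly passing that waypoint on the unrewarded side.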


In [12]:
# Summary statistics table
summary_data = {
    'Metric': [
        'Total Steps',
        'Total Episodes', 
        'Avg Steps per Episode',
        'Successful Episodes (>99% progress)',
        'High Reward Steps (reward = 1.0)',
        'Low Reward Steps (reward = 0.001)',
        'Avg Calculated Reward',
        'Avg Original Reward',
        'On-Track Ratio'
    ],
    'Value': [
        len(training_data),
        training_data['episode'].nunique(),
        len(training_data) / training_data['episode'].nunique(),
        len(episode_analysis[episode_analysis['max_progress'] >= 99.9]),
        len(training_data[training_data['calculated_reward'] == 1.0]),
        len(training_data[training_data['calculated_reward'] == 0.001]),
        training_data['calculated_reward'].mean(),
        training_data['reward'].mean(),
        training_data['all_wheels_on_track'].mean()
    ]
}

summary_df = pd.DataFrame(summary_data)
print('\n=== SUMMARY STATISTICS ===')
print(summary_df.to_string(index=False, float_format='%.4f'))


=== SUMMARY STATISTICS ===
                             Metric      Value
                        Total Steps 18175.0000
                     Total Episodes   220.0000
              Avg Steps per Episode    82.6136
Successful Episodes (>99% progress)    18.0000
   High Reward Steps (reward = 1.0)  5325.0000
  Low Reward Steps (reward = 0.001) 12850.0000
              Avg Calculated Reward     0.2937
                Avg Original Reward     0.9169
                     On-Track Ratio     0.9330


## 7. Save Results

In [13]:
# Save detailed results
output_data = training_data[['episode', 'steps', 'X', 'Y', 'progress', 'reward', 'calculated_reward', 
                            'all_wheels_on_track', 'episode_status']].copy()

output_data.to_csv('reward_function_analysis_detailed.csv', index=False)
print('Detailed results saved to reward_function_analysis_detailed.csv')

# Save episode summary
episode_analysis.to_csv('reward_function_episode_analysis.csv')
print('Episode analysis saved to reward_function_episode_analysis.csv')

# Save summary statistics
summary_df.to_csv('reward_function_summary.csv', index=False)
print('Summary statistics saved to reward_function_summary.csv')

Detailed results saved to reward_function_analysis_detailed.csv
Episode analysis saved to reward_function_episode_analysis.csv
Summary statistics saved to reward_function_summary.csv


## 8. Conclusions

This analysis applied the track zone strategy reward function to the JimmyModelv4clone3 training data. The reward function encourages:

- **Left-side driving** in waypoint zones: 22-39, 76-91, 100-111
- **Right-side driving** in waypoint zones: 48-55
- **Minimal reward (0.001)** for all other positions
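Under these rules, a 0.001 reward has two distinct causes: the car is outside every zone, or it is inside a zone but on the wrong side of the centerline. The sketch below, with hypothetical names and the zone sets copied from `reward_function`, tallies the two cases separately.

```python
from collections import Counter

# Zone sets copied from reward_function
LEFT_ZONE = set(range(22, 40)) | set(range(76, 92)) | set(range(100, 112))
RIGHT_ZONE = set(range(48, 56))


def zone_outcome(closest_idx, is_left):
    """Classify one step as 'rewarded', 'wrong side' or 'no zone'."""
    if closest_idx in LEFT_ZONE:
        return "rewarded" if is_left else "wrong side"
    if closest_idx in RIGHT_ZONE:
        return "rewarded" if not is_left else "wrong side"
    return "no zone"


def zone_outcome_counts(closest_indices, is_left_flags):
    """Count each outcome over paired sequences of waypoint indices and side flags."""
    return Counter(zone_outcome(w, l) for w, l in zip(closest_indices, is_left_flags))
```

Feeding it the per-step `closest_waypoints[0]` and `is_left_of_center` values would show how the 12,850 minimal-reward steps split between the two causes.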

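Key findings:
1. The reward function produces binary outcomes (1.0 or 0.001).
2. 70.7% of steps (12,850 of 18,175) receive the minimal reward. This covers both steps outside the target zones and steps inside a zone but on the wrong side of the centerline, so the figure does not by itself show how long the car spent in the zones.
3. The mean calculated reward (0.2937) is far below the mean reward logged during training (0.9169). This suggests that the model was trained with a different reward function, or that the parameter conversion does not reproduce the simulator's values exactly. The zone strategy may need adjustment based on the actual track layout and car behavior.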

**Note**: Object-related parameters were ignored as requested, focusing only on track position and waypoint-based rewards.