syacef/Multi-Object-Tracking

Repository files navigation

Individual Report: Multi-Object Tracking

This report outlines the work undertaken for the Multi-Object Video Tracking project, as part of the MLVOT course. The project involves the progressive development of a multi-object tracker, starting from a simple Kalman filter and culminating in an appearance-aware tracking system.

1. Tasks Undertaken

This section describes the implementation of the four main components of the project.

TP1: Single Object Tracking with Kalman Filter

  • Objective: Implement a single-object tracker that uses a Kalman filter to smooth the trajectory of a detected object.
  • Implementation:
    • A KalmanFilter class was created in 2D_Kalman-Filter_TP1/KalmanFilter.py. This class initializes the state-space model matrices (A, B, H, Q, R, P) as per the instructions.
    • The main script, 2D_Kalman-Filter_TP1/objTracking.py, reads a video, uses a provided Detector to get noisy measurements of an object's position, and applies the Kalman filter's predict and update steps.
    • The visualization shows the measured (green), predicted (blue), and estimated (red) positions, along with the object's trajectory.
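The filter described above can be sketched as follows. This is a minimal constant-velocity version, not the actual class in 2D_Kalman-Filter_TP1/KalmanFilter.py; the dt, q, and r defaults are illustrative placeholders, and the control matrix B is omitted for brevity.

```python
import numpy as np

class KalmanFilter2D:
    """Minimal constant-velocity Kalman filter sketch (state: [x, y, vx, vy])."""

    def __init__(self, dt=0.1, q=1.0, r=1.0):
        self.A = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # state transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # only position is observed
        self.Q = q * np.eye(4)                           # process noise covariance
        self.R = r * np.eye(2)                           # measurement noise covariance
        self.P = np.eye(4)                               # state estimate covariance
        self.x = np.zeros((4, 1))                        # state estimate

    def predict(self):
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2]                                # predicted position

    def update(self, z):
        z = np.asarray(z, dtype=float).reshape(2, 1)
        S = self.H @ self.P @ self.H.T + self.R          # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                                # estimated position
```

Per frame, the tracking loop calls predict() first and then update() with the detector's noisy measurement, which is what produces the three overlays (measured, predicted, estimated).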

TP2: IOU-Based Multi-Object Tracking

  • Objective: Develop a multi-object tracker (MOT) that associates bounding boxes across frames using the Intersection over Union (IOU) metric.
  • Implementation:
    • The tracker, implemented in 2D_Kalman-Filter_TP2/iou_tracker.py, processes pre-generated detections from a text file.
    • For each frame, an IOU matrix is computed between existing tracks and new detections.
    • The Hungarian algorithm (scipy.optimize.linear_sum_assignment) is used to find the optimal assignment of detections to tracks.
    • Track management logic was implemented to create new tracks for unmatched detections, update matched tracks, and delete tracks that are lost for a specified number of frames.
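The association step above can be sketched as follows. Boxes are assumed to be in [x, y, w, h] format, and the 0.3 IOU threshold is an illustrative value, not necessarily the one used in 2D_Kalman-Filter_TP2/iou_tracker.py.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    """IOU of two boxes in [x, y, w, h] format."""
    xa2, ya2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    xb2, yb2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    inter_w = max(0.0, min(xa2, xb2) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(ya2, yb2) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, iou_threshold=0.3):
    """Match track boxes to detections with the Hungarian algorithm on -IOU."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.zeros((len(tracks), len(detections)))
    for t, trk in enumerate(tracks):
        for d, det in enumerate(detections):
            cost[t, d] = -iou(trk, det)      # maximizing IOU = minimizing -IOU
    rows, cols = linear_sum_assignment(cost)
    matches, matched_t, matched_d = [], set(), set()
    for t, d in zip(rows, cols):
        if -cost[t, d] >= iou_threshold:     # reject weak assignments
            matches.append((t, d))
            matched_t.add(t)
            matched_d.add(d)
    unmatched_t = [t for t in range(len(tracks)) if t not in matched_t]
    unmatched_d = [d for d in range(len(detections)) if d not in matched_d]
    return matches, unmatched_t, unmatched_d
```

Unmatched detections then seed new tracks, while unmatched tracks accumulate a miss counter and are deleted once it exceeds the allowed number of lost frames.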

TP3: Kalman-Guided IOU Tracking

  • Objective: Enhance the IOU tracker by integrating the Kalman filter to predict the future position of tracks, improving the association logic.
  • Implementation:
    • The code in 2D_Kalman-Filter_TP3/kalman_iou_tracker.py was adapted from TP2.
    • Each Track object was augmented with its own KalmanFilter instance.
    • Before the association step, the Kalman filter predicts the new bounding box for each active track.
    • The IOU similarity matrix is then calculated between these predicted bounding boxes and the new detections, making the matching more robust to short-term occlusions or detection failures.
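The per-track prediction step can be sketched as below. The real tracker in 2D_Kalman-Filter_TP3 wraps a full KalmanFilter instance per track; this simplified Track approximates the prediction with a constant-velocity shift, purely to illustrate the predict-before-associate flow.

```python
class Track:
    """Simplified track; the TP3 implementation holds a per-track KalmanFilter."""
    _next_id = 0

    def __init__(self, bbox):
        self.bbox = list(bbox)        # [x, y, w, h]
        self.velocity = [0.0, 0.0]    # stand-in for the Kalman velocity state
        self.misses = 0
        self.id = Track._next_id
        Track._next_id += 1

    def predict(self):
        # In TP3 this is the Kalman predict step; here a constant-velocity shift.
        self.bbox[0] += self.velocity[0]
        self.bbox[1] += self.velocity[1]
        return self.bbox

    def update(self, det):
        # In TP3 this is the Kalman update step; here velocity is re-estimated
        # directly from the displacement between the old box and the detection.
        self.velocity = [det[0] - self.bbox[0], det[1] - self.bbox[1]]
        self.bbox = list(det)
        self.misses = 0
```

Each frame, every active track calls predict() first, the IOU matrix is built against these predicted boxes instead of the stale ones, and matched tracks are corrected via update(). A track that goes unmatched still moves along its predicted path, which is what lets it survive a short occlusion.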

TP4: Appearance-Aware IOU-Kalman Object Tracker

  • Objective: Further improve the tracker by incorporating appearance information using a deep learning-based Re-Identification (Re-ID) model. This helps in handling longer occlusions and reducing ID switches.
  • Implementation:
    • The final tracker is in 2D_Kalman-Filter_TP4/appearance_tracker.py.
    • An ONNX Re-ID model (reid_osnet_x025_market1501.onnx) is used to extract an appearance feature vector for each detected object patch.
    • A combined similarity score is calculated as a weighted sum of IOU and appearance similarity (derived from the Euclidean distance between feature vectors).
    • This combined score is used in the cost matrix for the Hungarian algorithm, allowing the tracker to associate objects based on both motion (Kalman filter prediction + IOU) and appearance.
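The combined score can be sketched as below. The alpha/beta defaults and the 1 / (1 + d) mapping from Euclidean distance to a similarity are assumptions for illustration; the actual weights and mapping live in 2D_Kalman-Filter_TP4/appearance_tracker.py.

```python
import numpy as np

def appearance_similarity(feat_a, feat_b):
    """Map the Euclidean distance between two Re-ID feature vectors to (0, 1].
    The 1 / (1 + d) mapping is one common choice, assumed here."""
    d = np.linalg.norm(np.asarray(feat_a, dtype=float) - np.asarray(feat_b, dtype=float))
    return 1.0 / (1.0 + d)

def combined_score(iou_value, feat_track, feat_det, alpha=0.7, beta=0.3):
    """Weighted sum of motion similarity (IOU against the Kalman-predicted box)
    and appearance similarity; higher is a better match."""
    return alpha * iou_value + beta * appearance_similarity(feat_track, feat_det)
```

The Hungarian algorithm then runs on the negated combined scores, so a track can still be matched through appearance when the IOU term collapses after an occlusion.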

2. Challenges and Solutions

  • Challenge 1: Parameter Tuning.

    • Description: Finding the right values for the Kalman filter's noise covariances (Q and R), the IOU threshold, and the weights (alpha, beta) for the appearance tracker was challenging. Poor values led to unstable tracks or frequent ID switches.
    • Solution: I systematically tested a range of values. For the Kalman filter, I started with the suggested values and tweaked them to better match the object's dynamics in the video. For the appearance tracker, I found that giving a higher weight to appearance (beta) was useful for re-identification after occlusion, while a dominant IOU weight (alpha) was better for smooth, frame-to-frame tracking.
  • Challenge 2: Handling a Large Number of Detections.

    • Description: In crowded scenes, the number of detections can be large, making the cost matrix computation and assignment problem computationally expensive.
    • Solution: While not a major issue for the provided sequence, I noted that for real-time applications, optimizations would be needed. This could include gating (only considering detections within a certain distance of a track's predicted position) to reduce the size of the cost matrix.
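The gating idea mentioned above could look like the following sketch; the 100-pixel radius is an arbitrary placeholder that would need tuning per sequence.

```python
import math

def gate_detections(predicted_center, detections, gate_radius=100.0):
    """Return indices of detections whose center lies within gate_radius of a
    track's predicted center; boxes are [x, y, w, h]."""
    px, py = predicted_center
    kept = []
    for i, (x, y, w, h) in enumerate(detections):
        cx, cy = x + w / 2.0, y + h / 2.0    # detection center
        if math.hypot(cx - px, cy - py) <= gate_radius:
            kept.append(i)
    return kept
```

Only the gated subset would then enter the cost matrix, shrinking the assignment problem in crowded frames.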
  • Challenge 3: Bounding Box Coordinate Mismatches.

    • Description: A frequent source of subtle bugs was ensuring the bounding box format was consistent across the pipeline. The initial detections were in [x, y, width, height], but different parts of the code—like the IOU calculation or drawing rectangles with cv2—sometimes implicitly expect [x1, y1, x2, y2]. Mixing these up led to incorrect IOU calculations and misplaced bounding boxes in the output video.
    • Solution: I had to be meticulous and add small helper functions or inline conversions to ensure the correct format was used at each step. I standardized on using [x, y, w, h] internally and only converted it when a library function explicitly required a different format.
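The helpers in question amount to a pair of conversions like these:

```python
def xywh_to_xyxy(box):
    """[x, y, w, h] -> [x1, y1, x2, y2] (corner format, e.g. for cv2.rectangle)."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

def xyxy_to_xywh(box):
    """[x1, y1, x2, y2] -> [x, y, w, h] (the internal format used here)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]
```

Keeping one canonical internal format and converting only at library boundaries makes this class of bug much easier to rule out.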
  • Challenge 4: Correct Re-ID Model Preprocessing.

    • Description: The OSNet Re-ID model requires a very specific input: patches resized to 128x64, color channels converted from BGR to RGB, and normalization with specific mean and standard deviation values. Any mistake in this preprocessing pipeline resulted in poor quality feature vectors, which silently degraded tracking performance without causing a crash.
    • Solution: I created a dedicated FeatureExtractor class that encapsulated the entire preprocessing logic. This involved careful testing to ensure the resizing, color conversion, normalization, and tensor dimension ordering (HWC to CHW) exactly matched the model's requirements as described in the PDF.
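The preprocessing pipeline can be sketched as below. The mean/std values are the standard ImageNet statistics, assumed here rather than taken from the model card; the real FeatureExtractor uses cv2.resize, replaced in this dependency-free sketch by a nearest-neighbour resize.

```python
import numpy as np

# Assumed ImageNet normalization statistics (RGB order).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_patch(patch_bgr, out_h=128, out_w=64):
    """BGR uint8 patch (H, W, 3) -> normalized float32 tensor (1, 3, 128, 64)."""
    patch = np.asarray(patch_bgr)
    h, w = patch.shape[:2]
    # Nearest-neighbour resize to 128x64 (the real code uses cv2.resize).
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    resized = patch[rows][:, cols]
    rgb = resized[:, :, ::-1].astype(np.float32) / 255.0   # BGR -> RGB, scale to [0, 1]
    normalized = (rgb - MEAN) / STD                        # per-channel normalization
    chw = np.transpose(normalized, (2, 0, 1))              # HWC -> CHW
    return chw[np.newaxis, ...]                            # add batch dimension
```

The resulting tensor is what gets fed to the ONNX session; asserting the (1, 3, 128, 64) shape and float32 dtype before every inference call is a cheap guard against the silent-degradation failure mode described above.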

3. Critical Analysis

  • Kalman Filter (TP1): The Kalman filter provided a solid foundation for smoothing noisy measurements. However, tuning the process and measurement noise covariance matrices (Q and R) was crucial to achieving good performance. A balance had to be struck: too much process noise (Q) made the filter trust the measurements and produced jittery estimates, while too little made it sluggish and unresponsive to actual movements.

  • IOU Tracker (TP2): This tracker is simple and fast but highly dependent on detection quality and frame rate. It fails during occlusions, as an object that is hidden for a few frames will have its track terminated. When it reappears, it is assigned a new ID.

  • Kalman-Guided IOU Tracker (TP3): The addition of the Kalman filter provided a significant improvement. By predicting the object's next location, the tracker can handle very short occlusions and is less sensitive to minor detection inaccuracies. However, its reliance on a constant velocity model means it still struggles with abrupt motion changes and longer occlusions, leading to the same ID switch problem as the pure IOU tracker.

  • Appearance-Aware Tracker (TP4): This was the most robust tracker. The appearance features allowed the system to re-identify a person even after they were occluded for a longer period, as long as their appearance was distinct. This dramatically reduced the number of ID switches compared to the other methods. The main trade-off is the increased computational cost due to feature extraction for every detection. The performance is also heavily dependent on the quality of the Re-ID model.

4. Tracking Results (Videos)

  • Video for TP1: tp1.mp4
  • Video for TP2: tp2.mp4
  • Video for TP3: tp3.mp4
  • Video for TP4: tp4.mp4
