## **FEATURE DESCRIPTION**

| Feature                          | Description                                                                 | Importance for Model                                                |
|----------------------------------|-----------------------------------------------------------------------------|----------------------------------------------------------------------|
| **accident_severity** (target)   | Severity of the accident (categorical: 1=Fatal, 2=Serious, 3=Slight).       | Target variable we want to predict.                                  |
| **number_of_vehicles**           | Number of vehicles involved in the accident.                                | Higher counts may increase severity likelihood.                      |
| **number_of_casualties**         | Number of people injured/killed.                                            | Strong indicator of severity.                                        |
| **day_of_week**                  | Day accident occurred (1=Sunday, 2=Monday, ..., 7=Saturday).                | Captures weekly traffic patterns and behavior variations.            |
| **time**                         | Time of the accident (HH:MM format).                                        | Identifies rush-hour, night-time, and temporal patterns.             |
| **speed_limit**                  | Legal road speed limit at accident location (mph).                          | Higher speed limits correlate with more severe accidents.            |
| **road_type**                    | Type of road (e.g., single carriageway, dual, roundabout).                  | Road design influences accident dynamics and severity.               |
| **junction_detail**              | Type of junction involved (e.g., T-junction, crossroads, roundabout).       | Complex junctions may lead to more severe accidents.                 |
| **junction_control**             | Traffic control present (e.g., traffic lights, stop sign, uncontrolled).    | Better control systems may reduce accident severity.                 |
| **pedestrian_crossing_human_control** | Presence of human-controlled crossings (e.g., crossing guards).         | Affects pedestrian accident likelihood and severity outcomes.        |
| **pedestrian_crossing_physical_facilities** | Presence of physical crossings (zebra, pelican, footbridge, etc.).      | Infrastructure quality impacts pedestrian safety and severity.       |
| **light_conditions**             | Lighting conditions (daylight, darkness with lights, darkness no lights).   | Poor lighting is expected to raise the risk of severe outcomes.      |
| **weather_conditions**           | Weather conditions (fine, rain, fog, snow, high winds).                     | Adverse weather is expected to affect both occurrence and severity.  |
| **road_surface_conditions**      | Road surface condition (dry, wet/damp, snow, ice, flood).                   | Slippery surfaces are expected to raise severe crash risk.           |
| **special_conditions_at_site**   | Unusual site conditions (roadworks, oil spill, mud, etc.).                  | Important contextual factor for understanding severity causes.       |
| **carriageway_hazards**          | Hazards present (animals, obstructions, debris, pedestrians, etc.).         | Direct hazard presence influences accident outcomes significantly.   |
| **urban_or_rural_area**          | Whether accident occurred in urban or rural area.                           | Rural areas often have higher speeds → more severe accidents.        |
| **police_force**                 | Police force area where accident occurred.                                  | Captures regional differences in enforcement, reporting, and roads.  |
| **local_authority_district**     | Local authority district code.                                              | Geographic granularity for regional pattern analysis.                |
| **local_authority_ons_district** | ONS-coded local authority district.                                         | Standardized geographic classification for analysis.                 |
| **local_authority_highway**      | Highway authority responsible for the road.                                 | Different maintenance standards may affect road safety.              |
| **first_road_class**             | Classification of the first road involved (A-road, B-road, motorway, etc.). | Road class correlates with traffic volume, speed, and design standards. |
| **first_road_number**            | Number designation of the first road.                                       | Specific road identifier for pattern recognition.                    |
| **second_road_class**            | Classification of the second road involved (if junction accident).          | Important for understanding junction dynamics and severity.          |
| **second_road_number**           | Number designation of the second road (if junction accident).               | Specific road identifier for junction analysis.                      |
| **trunk_road_flag**              | Whether the road is a trunk road (major strategic route).                   | Trunk roads typically have higher speeds and different safety profiles. |
| **did_police_officer_attend_scene_of_accident** | Whether police attended the scene.                                  | May indicate severity (police more likely to attend serious crashes).|
| **month** (engineered)           | Month when accident occurred (1-12).                                        | Captures seasonal trends and weather patterns affecting accidents.   |
| **weekday** (engineered)         | Whether the day is a weekday (1 = Monday-Friday, 0 = weekend).              | Simplifies weekday/weekend traffic pattern analysis.                 |
| **is_weekend** (engineered)      | Whether the accident occurred on a weekend.                                 | Weekend driving patterns often differ from weekdays.                 |
| **region_cluster** (engineered)  | Geographic region cluster (grouped areas).                                  | Captures broader regional patterns in accident severity factors.     |


**Key Insights from Feature Set:**

- **Likely strong predictors**: `number_of_casualties`, `speed_limit`, `road_surface_conditions`, `weather_conditions`
- **Temporal patterns**: `time`, `day_of_week`, `month`, `is_weekend` provide rich temporal context
- **Geographic context**: Multiple geographic features (`police_force`, `local_authority_*`, `region_cluster`) enable spatial analysis
- **Road infrastructure**: Comprehensive road type and junction details for infrastructure impact analysis
- **Engineered features**: `month`, `weekday`, `is_weekend`, and `region_cluster` are derived features that summarize temporal and geographic patterns
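
The weekday and weekend flags can be derived directly from `day_of_week`, and an hour-of-day value from `time`. The sketch below shows one way to do this with pandas, assuming the column names from the table above. The `hour` and `is_rush_hour` columns and the rush-hour windows are hypothetical additions for illustration, not features of the dataset.

```python
import pandas as pd

# Toy rows using column names from the feature table; values are made up.
df = pd.DataFrame({
    "day_of_week": [1, 2, 6, 7],                    # 1 = Sunday ... 7 = Saturday
    "time": ["08:15", "17:40", "23:05", "02:30"],   # HH:MM strings
})

# Weekend = Sunday (1) or Saturday (7) under this dataset's day coding.
df["is_weekend"] = df["day_of_week"].isin([1, 7]).astype(int)
df["weekday"] = 1 - df["is_weekend"]

# Hour of day parsed from the HH:MM string (hypothetical helper column).
df["hour"] = pd.to_datetime(df["time"], format="%H:%M").dt.hour

# Hypothetical rush-hour flag: hours 07-09 and 16-18 on weekdays (assumed windows).
in_rush_window = df["hour"].isin([7, 8, 9, 16, 17, 18])
df["is_rush_hour"] = (in_rush_window & (df["weekday"] == 1)).astype(int)

print(df)
```

Deriving both flags from the same rule keeps them consistent with the `day_of_week` coding, where Sunday is 1 rather than 7.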

## **ENCODED VALUES**

The table below lists the encoded values for each feature, including the engineered `weekday` and `is_weekend` flags:

| Feature                                     | Encoded Values | Meaning (where known)                                                                 |
|---------------------------------------------|----------------|---------------------------------------------------------------------------------------|
| **accident_severity** (target)              | {1,2,3}        | 1 = Fatal, 2 = Serious, 3 = Slight                                                    |
| **day_of_week**                             | 1–7            | 1 = Sunday, 2 = Monday, 3 = Tuesday, 4 = Wednesday, 5 = Thursday, 6 = Friday, 7 = Saturday |
| **weekday** (engineered)                    | {0,1}          | 0 = Weekend (Saturday/Sunday), 1 = Weekday (Monday-Friday)                            |
| **is_weekend** (engineered)                 | {0,1}          | 0 = Weekday, 1 = Weekend                                                              |
| **road_type**                               | {1,2,3,6,7,9}  | 1 = Roundabout, 2 = One way street, 3 = Dual carriageway, 6 = Single carriageway, 7 = Slip road, 9 = Unknown/Other |
| **junction_detail**                         | {0,1,2,3,5,6,7,8,9,99,-1} | 0 = None, 1 = T-junction, 2 = Roundabout, 3 = Crossroads, 5 = Slip road, 6 = Multiple junction, 7 = Mini-roundabout, 8 = Private drive/entrance, 9 = Other, 99 = Unknown, -1 = Missing |
| **junction_control**                        | {1,2,3,4,9,-1} | 1 = Authorised person, 2 = Auto traffic signal, 3 = Stop sign, 4 = Give way, 9 = Other, -1 = Missing |
| **pedestrian_crossing_human_control**       | {0,1,2,9,-1}   | 0 = None, 1 = Control by school crossing patrol, 2 = Other control, 9 = Unknown, -1 = Missing |
| **pedestrian_crossing_physical_facilities** | {0,1,4,5,7,8,9,-1} | 0 = None, 1 = Zebra, 4 = Pelican, 5 = Puffin, 7 = Toucan, 8 = Pedestrian phase at signals, 9 = Other, -1 = Missing |
| **light_conditions**                        | {1,4,5,6,7}    | 1 = Daylight, 4 = Darkness: lights lit, 5 = Darkness: lights unlit, 6 = Darkness: no lights, 7 = Darkness: unknown lighting |
| **weather_conditions**                      | {1–9}          | 1 = Fine no high winds, 2 = Raining no high winds, 3 = Snowing no high winds, 4 = Fine + high winds, 5 = Rain + high winds, 6 = Snow + high winds, 7 = Fog/mist, 8 = Other, 9 = Unknown |
| **road_surface_conditions**                 | {1–5,9,-1}     | 1 = Dry, 2 = Wet/damp, 3 = Snow, 4 = Frost/ice, 5 = Flood, 9 = Unknown, -1 = Missing  |
| **special_conditions_at_site**              | {0–9,-1}       | 0 = None, 1 = Auto signal out, 2 = Permanent sign out, 3 = Roadworks, 4 = Oil/mud, 5 = Police-controlled, 6 = Permanent obstruction, 7 = Road surface defective, 8 = Inadequate street lighting, 9 = Other, -1 = Missing |
| **carriageway_hazards**                     | {0–9,-1}       | 0 = None, 1 = Vehicle load, 2 = Other object, 3 = Previous accident, 6 = Pedestrian, 7 = Animal, 9 = Other, -1 = Missing |
| **urban_or_rural_area**                     | {1,2,3,-1}     | 1 = Urban, 2 = Rural, 3 = Unallocated, -1 = Missing                                   |
| **did_police_officer_attend_scene_of_accident** | {1,2,3}    | 1 = Yes, 2 = No, 3 = Unknown                                                          |
| **trunk_road_flag**                         | {1,2,-1}       | 1 = Trunk road, 2 = Non-trunk road, -1 = Missing                                      |
| **number_of_vehicles**                      | Integer        | Count of vehicles involved in accident                                                |
| **number_of_casualties**                    | Integer        | Count of people injured/killed in accident                                            |
| **speed_limit**                             | Integer        | Legal speed limit in mph at accident location                                         |
| **time**                                    | HH:MM          | Time of accident occurrence                                                           |
| **month** (engineered)                      | 1–12           | 1 = January, 2 = February, ..., 12 = December                                         |
| **region_cluster** (engineered)             | Categorical    | Geographic region grouping based on local authority                                   |
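
For exploration and plotting, it can help to translate the integer codes back into labels. The sketch below is a minimal example using pandas, with label dictionaries copied from the table above for three of the features. The DataFrame rows are made up for illustration.

```python
import pandas as pd

# Label maps copied from the encoded-values table.
SEVERITY = {1: "Fatal", 2: "Serious", 3: "Slight"}
DAY_OF_WEEK = {1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
               5: "Thursday", 6: "Friday", 7: "Saturday"}
URBAN_RURAL = {1: "Urban", 2: "Rural", 3: "Unallocated", -1: "Missing"}

# Toy rows; the values are made up for illustration.
df = pd.DataFrame({
    "accident_severity": [3, 1, 2],
    "day_of_week": [1, 5, 7],
    "urban_or_rural_area": [1, 2, -1],
})

# Series.map returns NaN for any code absent from the dictionary,
# which makes unexpected codes easy to spot.
decoded = df.assign(
    accident_severity=df["accident_severity"].map(SEVERITY),
    day_of_week=df["day_of_week"].map(DAY_OF_WEEK),
    urban_or_rural_area=df["urban_or_rural_area"].map(URBAN_RURAL),
)

print(decoded)
```

Keeping the decoded copy separate from `df` leaves the integer codes available for modeling.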

**Note:**
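- **weekday vs is_weekend**: These engineered features are exact complements (`weekday` = 1 - `is_weekend`): `weekday` is 1 for Monday-Friday and 0 for Saturday/Sunday, while `is_weekend` is the reverse. They carry the same information, so keeping both adds nothing for most models and introduces perfect collinearity in linear ones
- **Missing values**: Many features use -1 to indicate missing data. Treat -1 as a missing-value marker rather than a numeric level (for example, convert it to NaN or to a separate "missing" category) before modeling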
- **Time-based features**: The combination of `day_of_week`, `weekday`, `is_weekend`, `month`, and `time` provides comprehensive temporal context for accident patterns
- **Severity encoding**: `accident_severity` uses 1 for the most severe outcome (Fatal) and 3 for the least severe (Slight), so larger codes mean less severe accidents
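
The missing-value and severity notes above can be sketched in pandas as follows. This is a minimal illustration on made-up rows. The `severity_ordinal` column is a hypothetical re-encoding (0 = Slight, 2 = Fatal) rather than part of the dataset, and the code 9 ("Unknown") is left unchanged because how to treat it is a separate modeling choice.

```python
import numpy as np
import pandas as pd

# Toy rows; the values are made up for illustration.
df = pd.DataFrame({
    "accident_severity": [3, 2, 1, 3],          # 1 = Fatal, 2 = Serious, 3 = Slight
    "junction_control": [2, -1, 4, -1],
    "road_surface_conditions": [1, 2, -1, 9],   # 9 = Unknown is left as-is here
})

MISSING_CODE = -1
feature_cols = ["junction_control", "road_surface_conditions"]

# Convert the -1 marker to NaN so it is not treated as a numeric level.
df[feature_cols] = df[feature_cols].replace(MISSING_CODE, np.nan)

# Share of missing values per feature, useful when choosing an imputation strategy.
missing_share = df[feature_cols].isna().mean()

# Hypothetical re-encoding so that larger values mean more severe accidents:
# 0 = Slight, 1 = Serious, 2 = Fatal.
df["severity_ordinal"] = 3 - df["accident_severity"]

print(missing_share)
print(df)
```

Flipping the severity scale is optional for classifiers that treat the target as unordered, but it makes ordinal models and correlation summaries easier to read.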