# Automobile recalls

## Attribution

The data for this project come from
[Michael Bryant's automobile recalls dataset](https://www.kaggle.com/datasets/michaelbryantds/automobile-recalls-dataset). We use a modified dataset:

* some columns have been removed,
* some column names have been changed,
* some rows have been removed due to non-numeric values in numeric columns, and
* some rows have been removed to consider only a subset of the data.

## Constraints

We impose the following constraints on our solution:

* we may not use external libraries, such as `numpy` or `pandas`; and
* we may not use any regular loops, such as `for...in` or `while`.
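
The cells in this notebook rely on comprehensions, generator expressions, and built-in reducers in place of loop statements, so we assume those count as permitted. A minimal sketch of the idea, using hypothetical sample values rather than the recall data:

```python
# Hypothetical sample values; the real counts come from the CSV file.
counts = ["12", "7", "30"]

# Instead of a loop statement that appends to a list, use a list comprehension.
numbers = [int(count) for count in counts]

# Instead of a loop statement that accumulates a total,
# pass a generator expression to sum().
total = sum(int(count) for count in counts)

# Instead of a loop statement that tracks the largest value, use max().
largest = max(numbers)

print(numbers, total, largest)  # [12, 7, 30] 49 30
```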

## References

* [csv — CSV File Reading and Writing](https://docs.python.org/3/library/csv.html)

## Investigation

__Which automobile manufacturer has the highest number of potentially affected vehicles? Which has the lowest?__

_Hypothesis:_ Mercedes-Benz has the highest, and Chrysler has the lowest.


### Parsing

To parse the file, we use `csv.reader` to read every row as a list of comma-separated values:

In [2]:
from csv import reader

with open('data/recalls-truncated.csv', newline='') as stream:
    csvReader = reader(stream)
    
    rows = [ row for row in csvReader ]

    print(rows)
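
Several manufacturer names in this dataset contain commas (for example `Rivian Automotive, LLC`), which is one reason to prefer `csv.reader` over splitting each line on `,`. The sketch below illustrates the difference on an in-memory sample. The header names, date, subject, and count are hypothetical and assume such names are quoted in the file; only the manufacturer name is taken from the data.

```python
from csv import reader
from io import StringIO

# Hypothetical two-line sample; header names and field values are illustrative.
sample = StringIO(
    'date,manufacturer,subject,recallType,count\n'
    '2022-01-01,"Rivian Automotive, LLC",Seat belts,Vehicle,12\n'
)

# csv.reader understands quoting, so the quoted comma stays inside one field.
rows = list(reader(sample))

# A naive split treats the comma inside the quotes as a separator.
naive = sample.getvalue().splitlines()[1].split(',')

print(len(rows[1]))  # 5
print(len(naive))    # 6
```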



After inspecting the data, we remove the header row and extract the set of unique manufacturers:

In [3]:
from csv import reader

with open('data/recalls-truncated.csv', newline='') as stream:
    csvReader = reader(stream)
    
    rows = [ row for row in csvReader ]
    rows.pop(0)
    manufacturers = { manufacturer for [ date, manufacturer, subject, recallType, count ] in rows }
    
    print(manufacturers)

{'Great Dane Trailers', 'Triple E Recreational Vehicles', 'Quality Trailers Ohio, Inc.', 'Braun Ambulances', 'Rivian Automotive, LLC', 'Meritor, Inc.', 'Starcraft RV', 'Airstream, Inc.', 'Mazda North American Operations', 'Shyft Group', 'Marion Body Works Inc.', 'Navistar, Inc.', 'Carrier Corporation', 'Shadow Trailer, LLC', 'Volvo Car USA, LLC', 'Prevost Car (US) Inc.', 'Thor Motor Coach', 'Nissan North America, Inc.', 'Lippert', 'McNeilus Truck & Manufacturing, Inc.', 'Gulf States Toyota, Inc.', 'KZRV, L.P.', 'Bombardier Recreational Products, Inc.', 'Osage Industries, Inc.', 'Kalmar Solutions, LLC', 'Hyundai Translead', 'Terex South Dakota, Inc.', 'ScentLok Technologies', 'Kia America, Inc.', 'Brenner Tank, LLC', 'KTM North America, Inc.', 'Williamsen-Godwin Truck Body Company LLC', 'Kawasaki Motors Corp., U.S.A.', 'Daimler Vans USA, LLC', 'Highland Ridge RV', 'Dexter Axle Company', 'Coach and Equipment Mfg. Corp.', 'Kibbi, LLC', 'Skinny Guy Campers LLC', 'Lion Electric Company', 'S
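
As an alternative to `rows.pop(0)`, the header can be consumed from the reader with `next()` before the remaining rows are collected. This is only a sketch of that variation on hypothetical sample rows; it also uses starred unpacking so that only the manufacturer column needs a name.

```python
from csv import reader
from io import StringIO

# Hypothetical sample; header names, dates, subjects, and counts are illustrative.
sample = StringIO(
    'date,manufacturer,subject,recallType,count\n'
    '2022-01-01,Lippert,Subject A,Vehicle,4\n'
    '2022-01-02,Lippert,Subject B,Vehicle,6\n'
    '2022-01-03,Starcraft RV,Subject C,Vehicle,9\n'
)

csvReader = reader(sample)
header = next(csvReader)  # consume the header row
rows = list(csvReader)    # only data rows remain

# Keep the second column and ignore the rest with starred unpacking.
manufacturers = {manufacturer for _, manufacturer, *_ in rows}

print(sorted(manufacturers))  # ['Lippert', 'Starcraft RV']
```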

Using the set of manufacturers, we can group the rows by manufacturer, keeping for each group only the sum of its counts of potentially affected vehicles, converted to integers:

In [9]:
from csv import reader

with open('data/recalls-truncated.csv', newline='') as stream:
    csvReader = reader(stream)
    
    rows = [ row for row in csvReader ]
    rows.pop(0)
    manufacturers = { manufacturer for [ date, manufacturer, subject, recallType, count ] in rows }
    groups = { key: sum(int(count) for [ date, manufacturer, subject, recallType, count ] in rows if manufacturer == key) for key in manufacturers }

    print(groups)

{'Great Dane Trailers': 2439, 'Triple E Recreational Vehicles': 341, 'Quality Trailers Ohio, Inc.': 534, 'Braun Ambulances': 2572, 'Rivian Automotive, LLC': 12419, 'Meritor, Inc.': 1601, 'Starcraft RV': 923, 'Airstream, Inc.': 2192, 'Mazda North American Operations': 226, 'Shyft Group': 4085, 'Marion Body Works Inc.': 6, 'Navistar, Inc.': 47280, 'Carrier Corporation': 7299, 'Shadow Trailer, LLC': 12, 'Volvo Car USA, LLC': 15748, 'Prevost Car (US) Inc.': 277, 'Thor Motor Coach': 69, 'Nissan North America, Inc.': 378410, 'Lippert': 404, 'McNeilus Truck & Manufacturing, Inc.': 1140, 'Gulf States Toyota, Inc.': 157, 'KZRV, L.P.': 42, 'Bombardier Recreational Products, Inc.': 46119, 'Osage Industries, Inc.': 14, 'Kalmar Solutions, LLC': 9, 'Hyundai Translead': 39, 'Terex South Dakota, Inc.': 665, 'ScentLok Technologies': 16667, 'Kia America, Inc.': 337089, 'Brenner Tank, LLC': 3, 'KTM North America, Inc.': 1040, 'Williamsen-Godwin Truck Body Company LLC': 12, 'Kawasaki Motors Corp., U.S.A.'
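
The comprehension above rescans every row once per manufacturer. If that becomes slow on a larger file, one possible loop-free alternative is to sort the rows by manufacturer and let `itertools.groupby` from the standard library form the groups. The sketch below uses hypothetical `(manufacturer, count)` pairs instead of full rows.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (manufacturer, count) pairs; counts are strings, as in the CSV.
pairs = [("Lippert", "4"), ("Starcraft RV", "9"), ("Lippert", "6")]

byManufacturer = itemgetter(0)

# groupby only merges adjacent items, so the pairs must be sorted first.
groups = {
    key: sum(int(count) for _, count in group)
    for key, group in groupby(sorted(pairs, key=byManufacturer), key=byManufacturer)
}

print(groups)  # {'Lippert': 10, 'Starcraft RV': 9}
```

Sorting costs more than a single pass would, but it avoids filtering the full list of rows again for every manufacturer.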

Finally, we can take the minimum and maximum of the groups, ranking each manufacturer by its summed count, to determine the manufacturers with the lowest and highest numbers of potentially affected vehicles:

In [15]:
from csv import reader

with open('data/recalls-truncated.csv', newline='') as stream:
    csvReader = reader(stream)
    
    rows = [ row for row in csvReader ]
    rows.pop(0)
    manufacturers = { manufacturer for [ date, manufacturer, subject, recallType, count ] in rows }
    groups = { key: sum(int(count) for [ date, manufacturer, subject, recallType, count ] in rows if manufacturer == key) for key in manufacturers }
    # Rank each manufacturer by its summed count.
    byKey = lambda manufacturer: groups[manufacturer]
    minKey = min(groups, key = byKey)
    maxKey = max(groups, key = byKey)

    print(f"Min: {minKey}: {groups[minKey]:,}")
    print(f"Max: {maxKey}: {groups[maxKey]:,}")

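Min: Farber Specialty Vehicles: 1
Max: Chrysler (FCA US, LLC): 1,818,455

In this subset of the data, the hypothesis does not hold: Chrysler (FCA US, LLC) has the highest number of potentially affected vehicles rather than the lowest, and Farber Specialty Vehicles has the lowest.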
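
Note that `min` and `max` return a single key even when several manufacturers share the same total; in that case the first one encountered wins. The sketch below, on a hypothetical dictionary of totals, shows the same lookup written with `groups.get` as the key function, plus one way to list every manufacturer tied for the minimum.

```python
# Hypothetical totals; the names and values are illustrative only.
groups = {"Maker A": 10, "Maker B": 9, "Maker C": 9}

# dict.get maps each manufacturer to its total, so no lambda is needed.
minKey = min(groups, key=groups.get)
maxKey = max(groups, key=groups.get)

# Collect every manufacturer that shares the lowest total.
lowest = min(groups.values())
tiedForLowest = [name for name, total in groups.items() if total == lowest]

print(minKey, maxKey)   # Maker B Maker A
print(tiedForLowest)    # ['Maker B', 'Maker C']
```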
