# Automobile recalls

## Attribution

The data for this project come from
[Michael Bryant's automobile recalls dataset](https://www.kaggle.com/datasets/michaelbryantds/automobile-recalls-dataset). We use a modified dataset:

* some columns have been removed,
* some column names have been changed,
* some rows have been removed due to non-numeric values in numeric columns, and
* some rows have been removed to consider only a subset of the data.

## Constraints

We impose the following constraints on our solution:

* it may not use external libraries, such as `numpy` or `pandas`;
* it may not use any regular loops, such as `for...in` or `while`; and
* it must use at least two lambda expressions:
  * [first lambda expression](#lambda-expressions)
  * [second lambda expression](#lambda-expressions)
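
As a rough illustration of what these constraints still allow, the sketch below processes a list without any `for` or `while` statement, using a comprehension, `filter` with a lambda expression, and `functools.reduce`. The counts are taken from the first four rows shown later in this document; the variable names are ours.

```python
from functools import reduce

# Counts copied from the first four rows shown later in this document.
counts = ['341', '74', '1042', '4653']

# A comprehension replaces an explicit accumulation loop.
as_ints = [int(count) for count in counts]

# filter accepts a lambda expression in place of a loop body with an if.
large = list(filter(lambda n: n > 1000, as_ints))

# reduce folds the sequence into a single value, here a running total.
total = reduce(lambda acc, n: acc + n, as_ints, 0)

print(as_ints, large, total)
```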

## References

* [csv — CSV File Reading and Writing](https://docs.python.org/3/library/csv.html)

## Investigation I

__Which automobile manufacturer has the highest number of potentially affected vehicles? Which has the lowest?__

### Parsing

To parse the file, we read every row of comma-separated values with `csv.reader`:

In [1]:
from csv import reader

with open('data/recalls-truncated.csv', newline='') as stream:
    csvReader = reader(stream)
    
    rows = [ row for row in csvReader ]

    print(rows)
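
The header row can also be separated while reading rather than afterwards. The sketch below is one way to do that; it uses an in-memory sample in place of `data/recalls-truncated.csv`, with two rows copied from the table in Investigation II. The integer formatting of the counts is an assumption about the file.

```python
from csv import reader
from io import StringIO

# In-memory stand-in for the CSV file. The rows are copied from the table in
# Investigation II; writing the counts as integers is an assumption.
sample = StringIO(
    'Date,Manufacturer,Subject,Recall Type,Potentially Affected\n'
    '01/06/2023,Triple E Recreational Vehicles,Battery Disconnect Switch May Short,Vehicle,341\n'
    '01/05/2023,"Volvo Car USA, LLC",Steering Wheel May Lock Up,Vehicle,74\n'
)

csv_reader = reader(sample)
header = next(csv_reader)  # consume the header row first
rows = list(csv_reader)    # the remaining rows are data

print(header)
print(rows)
```

A quoted field such as `"Volvo Car USA, LLC"` is returned as a single value even though it contains a comma, which is the main reason to prefer the `csv` module over splitting each line on `,`.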



After observing the data, we can extract the set of unique manufacturers, removing the header row:

In [2]:
from csv import reader

with open('data/recalls-truncated.csv', newline='') as stream:
    csvReader = reader(stream)
    
    rows = [ row for row in csvReader ]
    rows.pop(0)
    manufacturers = { manufacturer for [ date, manufacturer, subject, recallType, count ] in rows }
    
    print(manufacturers)

{'Carefree Of Colorado', 'Kawasaki Motors Corp., U.S.A.', 'BMW of North America, LLC', 'Gillig, LLC', 'Bulk Tank International S. de R.L. de C.', 'KZRV, L.P.', 'Wabash National Corporation', 'Yokohama Tire Corporation', 'Winnebago Towable', 'Hino Motors Sales U.S.A., Inc.', 'PACCAR Incorporated', 'Innovative Specialties LLC', 'Polestar Automotive USA, Inc.', 'Newmar Corporation', 'East Texas Trailer', 'Marion Body Works Inc.', 'ZF North America, Inc.', 'Entegra Coach', 'Fontaine Modification', 'Great Dane Trailers', 'Doosan Portable Power', 'Shyft Group', 'MW Company LLC', 'Hendrickson USA. L.L.C.', 'Yokohama Off-Highway Tires America, Inc', 'Skinny Guy Campers LLC', 'Nissan North America, Inc.', 'Carrier Corporation', 'PT. Elangperdana Tyre Industry', 'E-One Incorporated', 'Chinook Motor Coach, LLC', 'General Motors, LLC', 'New Flyer of America, Inc.', 'Volvo Trucks North America', 'Starcraft RV', 'Braun Ambulances', 'Altec Industries, Inc.', 'Bentley Motors, Inc.', 'Tiffin Motorhomes

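Using the set of manufacturers, we can group rows by manufacturer, keeping for each group only the sum of the potentially affected counts, converted to integers, over recalls of type `Vehicle`. A manufacturer with no vehicle recalls therefore receives a total of 0: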

In [3]:
from csv import reader

with open('data/recalls-truncated.csv', newline='') as stream:
    csvReader = reader(stream)
    
    rows = [ row for row in csvReader ]
    rows.pop(0)
    manufacturers = { manufacturer for [ date, manufacturer, subject, recallType, count ] in rows }
    groups = {
        key: sum(int(count) for [ date, manufacturer, subject, recallType, count ] in rows if manufacturer == key and recallType == 'Vehicle')
        for key in manufacturers
    }

    print(groups)

{'Carefree Of Colorado': 0, 'Kawasaki Motors Corp., U.S.A.': 3372, 'BMW of North America, LLC': 20385, 'Gillig, LLC': 388, 'Bulk Tank International S. de R.L. de C.': 2, 'KZRV, L.P.': 42, 'Wabash National Corporation': 740, 'Yokohama Tire Corporation': 0, 'Winnebago Towable': 6616, 'Hino Motors Sales U.S.A., Inc.': 977, 'PACCAR Incorporated': 93032, 'Innovative Specialties LLC': 630, 'Polestar Automotive USA, Inc.': 66, 'Newmar Corporation': 241, 'East Texas Trailer': 3, 'Marion Body Works Inc.': 6, 'ZF North America, Inc.': 0, 'Entegra Coach': 394, 'Fontaine Modification': 56, 'Great Dane Trailers': 2439, 'Doosan Portable Power': 702, 'Shyft Group': 4085, 'MW Company LLC': 0, 'Hendrickson USA. L.L.C.': 0, 'Yokohama Off-Highway Tires America, Inc': 0, 'Skinny Guy Campers LLC': 0, 'Nissan North America, Inc.': 378410, 'Carrier Corporation': 0, 'PT. Elangperdana Tyre Industry': 0, 'E-One Incorporated': 31, 'Chinook Motor Coach, LLC': 10, 'General Motors, LLC': 1432083, 'New Flyer of Amer
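
The comprehension above rescans every row once per manufacturer. For a larger dataset a single pass might be preferable. The sketch below folds the rows with `functools.reduce`, which still avoids `for` and `while` statements. The vehicle rows are copied from tables in Investigation II; the recall type and count of the tire row are hypothetical, and `add_row` is a helper name of our own.

```python
from functools import reduce

# (manufacturer, recall type, count) triples. The vehicle rows come from
# tables in Investigation II; the tire row's type and count are hypothetical.
rows = [
    ('General Motors, LLC', 'Vehicle', '740108'),
    ('Ford Motor Company', 'Vehicle', '521746'),
    ('General Motors, LLC', 'Vehicle', '120688'),
    ('PT. Elangperdana Tyre Industry', 'Tire', '100'),
]

def add_row(totals, row):
    manufacturer, recall_type, count = row
    # Non-vehicle recalls add 0, so their manufacturers still get an entry.
    increment = int(count) if recall_type == 'Vehicle' else 0
    totals[manufacturer] = totals.get(manufacturer, 0) + increment
    return totals

groups = reduce(add_row, rows, {})

print(groups)
```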

### Lambda expressions

Finally, we can take the extrema of the groups by their totals to determine the manufacturers with the highest and lowest numbers of potentially affected vehicles.

<a name="lambda-expressions"></a>

To compute `argmin` and `argmax`, rather than simple `min` and `max`, we pass lambda expressions as the `key` function, so that manufacturers are compared by their totals:

In [4]:
from csv import reader

with open('data/recalls-truncated.csv', newline='') as stream:
    csvReader = reader(stream)
    
    rows = [ row for row in csvReader ]
    rows.pop(0)
    manufacturers = { manufacturer for [ date, manufacturer, subject, recallType, count ] in rows }
    groups = {
        key: sum(int(count) for [ date, manufacturer, subject, recallType, count ] in rows if manufacturer == key and recallType == 'Vehicle')
        for key in manufacturers
    }
    minKey = min(groups, key = lambda manufacturer: groups[manufacturer])
    maxKey = max(groups, key = lambda manufacturer: groups[manufacturer])

    print(f"Min: {minKey}: {groups[minKey]:,}")
    print(f"Max: {maxKey}: {groups[maxKey]:,}")

Min: Carefree Of Colorado: 0
Max: Ford Motor Company: 1,809,981
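
Several manufacturers in the grouped output have a total of 0, so the minimum is not unique. `min` returns the first minimal key it encounters, and here that depends on the iteration order of the set of manufacturers. A sketch that lists every manufacturer tied at the minimum, using a few totals copied from the grouped output:

```python
# A few totals copied from the grouped output.
groups = {
    'Carefree Of Colorado': 0,
    'Kawasaki Motors Corp., U.S.A.': 3372,
    'Yokohama Tire Corporation': 0,
    'ZF North America, Inc.': 0,
    'PACCAR Incorporated': 93032,
}

lowest = min(groups.values())

# Keep every manufacturer whose total equals the minimum, in sorted order.
tied = sorted(filter(lambda manufacturer: groups[manufacturer] == lowest, groups))

print(tied)
```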


## Investigation II

__Which recalls have more than 500,000 potentially affected vehicles?__

_Hypothesis:_ Ford, which had the highest total in Investigation I, has at least one such recall.

### Parsing

To implement the parser, we can use our in-house `nelta` library.

In [5]:
from nelta import read_csv

t = read_csv('data/recalls-truncated.csv')

We can easily determine the dimensions of the table:

In [6]:
t.shape()

(350, 5)

And observe its columns:

In [7]:
t.columns

['Date', 'Manufacturer', 'Subject', 'Recall Type', 'Potentially Affected']

We can also examine the first few rows:

In [8]:
t.head(4)

         Date                      Manufacturer                                      Subject Recall Type Potentially Affected
0  01/06/2023    Triple E Recreational Vehicles          Battery Disconnect Switch May Short     Vehicle                341.0
1  01/05/2023                Volvo Car USA, LLC                   Steering Wheel May Lock Up     Vehicle                 74.0
2  12/29/2022 Volkswagen Group of America, Inc.      12-Volt Battery Cable May Short Circuit     Vehicle               1042.0
3  12/29/2022         Indian Motorcycle Company Kickstand May Not Retract Properly/FMVSS 123     Vehicle               4653.0

To isolate the subject column of the last four rows:

In [9]:
last_four = t.tail(4)
last_four['Subject']

346       Improperly Secured Front Seat Belt Anchor
347 Rearview Camera Image May Not Display/FMVSS 111
348                  Secondary Hood Latch Corrosion
349             Tire Sidewall Separation/ FMVSS 139

Iterating over its values:

In [10]:
for subject in last_four['Subject']:
    print(subject)

Improperly Secured Front Seat Belt Anchor
Rearview Camera Image May Not Display/FMVSS 111
Secondary Hood Latch Corrosion
Tire Sidewall Separation/ FMVSS 139


We can also query the subjects and manufacturers of the last four rows:

In [11]:
last_four[[ 'Subject', 'Manufacturer' ]]

                                             Subject                   Manufacturer
346        Improperly Secured Front Seat Belt Anchor         Rivian Automotive, LLC
347  Rearview Camera Image May Not Display/FMVSS 111         Chrysler (FCA US, LLC)
348                   Secondary Hood Latch Corrosion            General Motors, LLC
349              Tire Sidewall Separation/ FMVSS 139 PT. Elangperdana Tyre Industry

Extracting only the vehicle recalls:

In [25]:
vehicles = t[t['Recall Type'] == 'Vehicle']

print(vehicles.head(2))
print("...")
print(vehicles.tail(2))

         Date                   Manufacturer                             Subject Recall Type Potentially Affected
0  01/06/2023 Triple E Recreational Vehicles Battery Disconnect Switch May Short     Vehicle                341.0
1  01/05/2023             Volvo Car USA, LLC          Steering Wheel May Lock Up     Vehicle                 74.0

...
           Date           Manufacturer                                         Subject Recall Type Potentially Affected
347  08/25/2022 Chrysler (FCA US, LLC) Rearview Camera Image May Not Display/FMVSS 111     Vehicle               7895.0
348  08/25/2022    General Motors, LLC                  Secondary Hood Latch Corrosion     Vehicle             120688.0



Counting the number of vehicle recalls:

In [26]:
vehicles.shape()[0]

313

We can also build a filter that selects only recalls that potentially affect more than 500,000 vehicles:

In [39]:
my_filter = vehicles['Potentially Affected'] > 500000

Verifying that our filter is a `LabeledList` instance:

In [40]:
type(my_filter)

nelta.LabeledList

And that it holds the expected value for row 3, whose 4,653 potentially affected vehicles fall below the threshold:

In [41]:
my_filter[3]

False

We can apply the filter to the table:

In [42]:
vehicles[my_filter]

           Date           Manufacturer                                         Subject Recall Type Potentially Affected
 60  12/08/2022    General Motors, LLC     Running Lights May Not Deactivate/FMVSS 108     Vehicle             740108.0
 61  12/08/2022 Chrysler (FCA US, LLC)                 Tailgate May Open While Driving     Vehicle            1224078.0
110  11/18/2022     Ford Motor Company Cracked Fuel Injector May Leak and Cause a Fire     Vehicle             521746.0
284  09/19/2022            Tesla, Inc.               Power Windows May Pinch/FMVSS 118     Vehicle            1096762.0
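
For readers without access to `nelta`, the sketch below shows one way boolean-mask filtering of this kind could work. `MiniLabeledList` is a hypothetical stand-in written for this illustration; it is not the `nelta` implementation and supports only a comparison, label lookup, and mask selection. The values and labels are copied from the tables above.

```python
class MiniLabeledList:
    """Hypothetical stand-in for nelta.LabeledList; not the real implementation."""

    def __init__(self, values, index=None):
        self.values = list(values)
        self.index = list(index) if index is not None else list(range(len(self.values)))

    def __gt__(self, scalar):
        # Element-wise comparison yields a boolean mask with the same labels.
        return MiniLabeledList([value > scalar for value in self.values], self.index)

    def __getitem__(self, key):
        if isinstance(key, MiniLabeledList):
            # Boolean mask: keep the entries whose mask value is True.
            kept = [(label, value)
                    for label, value, keep in zip(self.index, self.values, key.values)
                    if keep]
            return MiniLabeledList([value for _, value in kept],
                                   [label for label, _ in kept])
        # Otherwise treat the key as a row label.
        return self.values[self.index.index(key)]


# Counts and row labels copied from the tables above.
affected = MiniLabeledList(
    [4653.0, 740108.0, 1224078.0, 521746.0, 1096762.0],
    index=[3, 60, 61, 110, 284],
)

mask = affected > 500000
selected = affected[mask]

print(mask[3])
print(selected.index, selected.values)
```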

We can apply a map to the manufacturers to transform them to uppercase:

In [47]:
rare_occurrence = vehicles[my_filter]
rare_occurrence['Manufacturer'].map(str.upper)

 60    GENERAL MOTORS, LLC
 61 CHRYSLER (FCA US, LLC)
110     FORD MOTOR COMPANY
284            TESLA, INC.
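
Passing `str.upper` works because it is an ordinary function that takes the string as its argument. A small standard-library sketch, using the manufacturer names above, showing that it is interchangeable with an equivalent lambda expression:

```python
# Manufacturer names copied from the filtered table above.
manufacturers = [
    'General Motors, LLC',
    'Chrysler (FCA US, LLC)',
    'Ford Motor Company',
    'Tesla, Inc.',
]

# str.upper and the lambda below are called with each name in turn.
with_method = list(map(str.upper, manufacturers))
with_lambda = list(map(lambda name: name.upper(), manufacturers))

print(with_method)
```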

Finally, we can also find a list of vehicle recalls made in 2023:

In [48]:
vehicles[[ date.split('/')[2] == '2023' for date in vehicles['Date'] ]]

         Date                   Manufacturer                             Subject Recall Type Potentially Affected
0  01/06/2023 Triple E Recreational Vehicles Battery Disconnect Switch May Short     Vehicle                341.0
1  01/05/2023             Volvo Car USA, LLC          Steering Wheel May Lock Up     Vehicle                 74.0
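
Splitting on `/` assumes every date is well formed. As an alternative sketch, `datetime.strptime` from the standard library parses each date against the `MM/DD/YYYY` format and raises `ValueError` on malformed input. The dates below are copied from rows shown in this document.

```python
from datetime import datetime

# Dates copied from rows shown in this document (MM/DD/YYYY).
dates = ['01/06/2023', '01/05/2023', '12/29/2022', '08/25/2022']

# Parsing validates each date and exposes the year as an integer.
is_2023 = [datetime.strptime(date, '%m/%d/%Y').year == 2023 for date in dates]

print(is_2023)
```

A list of booleans like `is_2023` has the same shape as the mask passed to `vehicles[...]` above.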