# NTSB Aviation Accident dataset
The business is interested in becoming involved in commercial aviation.

An initial overview of the risks in aviation is the purpose of this data analysis.

The dataset for the overview is the National Transportation Safety Board (NTSB) dataset that covers the years 1948 through the end of 2022.

Open the dataset, AviationData.csv.

In [None]:
import pandas as pd

df = pd.read_csv('Data/AviationData.csv', encoding='latin-1')

df.info()

# Cleaning the Data

Rename the columns to replace dots with underscores, as dots in column names may cause errors in Python.

In [None]:
# regex=False treats '.' as a literal dot, not the regex "any character" wildcard
df.columns = df.columns.str.replace('.', '_', regex=False)

df.head()
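
A note on the renaming step: `.` is a regular-expression wildcard, and the default value of the `regex` argument to `str.replace` has differed between pandas versions, so passing `regex=False` makes the intent explicit. Here is a minimal sketch on a made-up, empty frame (the column names are illustrative, chosen to resemble the dataset's dotted names):

```python
import pandas as pd

# Toy frame with dotted column names (illustrative, not the real data)
toy = pd.DataFrame(columns=['Event.Id', 'Accident.Number', 'Aircraft.Category'])

# regex=False treats '.' as a literal dot rather than "any character"
toy.columns = toy.columns.str.replace('.', '_', regex=False)

print(list(toy.columns))
```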

In [None]:
# Check for duplicate rows
duplicate_rows_events = df[df.duplicated(subset=['Event_Id'], keep=False)]

duplicate_rows_events.head(10)

I see here that although these duplicate rows represent separate aircraft in multi-aircraft incidents, the injury and fatality numbers are combined for the whole event. Counting every row would therefore double-count those numbers and introduce errors into the analysis.

So let's remove the duplicates from this subset.

In [None]:
# remove the duplicate rows using the Event_Id column
df = df.drop_duplicates(subset=['Event_Id'], keep='first')
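
To illustrate why the combined counts matter, here is a small sketch with invented values. The `Total_Fatal_Injuries` column name is an assumption based on the dataset's naming pattern, and the numbers are made up for the example.

```python
import pandas as pd

# Two aircraft in one event; each row repeats the event-wide fatality count
toy = pd.DataFrame({
    'Event_Id': ['E1', 'E1', 'E2'],
    'Accident_Number': ['A1', 'A2', 'A3'],
    'Total_Fatal_Injuries': [2, 2, 1],  # assumed column name, invented values
})

# Summing every row counts event E1 twice
naive_total = toy['Total_Fatal_Injuries'].sum()

# Keeping one row per event counts each event once
deduped = toy.drop_duplicates(subset=['Event_Id'], keep='first')
deduped_total = deduped['Total_Fatal_Injuries'].sum()

print(naive_total, deduped_total)
```

The trade-off is that the details of the second aircraft in each event (its make, for example) are dropped along with the duplicate row.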

In [None]:
# check for duplicates again in Event_Id column
duplicate_rows_events = df[df.duplicated(subset=['Event_Id'], keep=False)]

duplicate_rows_events.info()

In [None]:
# check for duplicate rows in the Accident_Number column to verify there are no more duplicates
duplicate_rows_accidents = df[df.duplicated(subset=['Accident_Number'], keep=False)]

if duplicate_rows_accidents.empty:
    print("No duplicate rows found.")
else:
    print("Duplicate rows found.")

In [None]:
df.info()

So now we have 87,951 accident records to work with.

# Columns that are not needed

Some columns are mostly empty and hold nothing useful for the intended analysis.

I propose removing Latitude, Longitude, Schedule, and Air_carrier for that reason.

In [None]:
df = df.drop(['Latitude', 'Longitude', 'Schedule', 'Air_carrier'], axis=1)

df.info()
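
One way to back up the "mostly empty" judgment is to compute the share of missing values per column before dropping anything. This is a minimal sketch on a toy frame; the values are invented and the 50% threshold is an arbitrary choice.

```python
import pandas as pd
import numpy as np

toy = pd.DataFrame({
    'Make': ['Cessna', 'Piper', 'Boeing', 'Bell'],
    'Latitude': [np.nan, np.nan, np.nan, 36.9],
    'Schedule': [np.nan, 'SCHD', np.nan, np.nan],
})

# Fraction of missing values per column, largest first
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Columns missing in more than half of the rows (threshold is arbitrary)
mostly_empty = missing_share[missing_share > 0.5].index.tolist()
print(mostly_empty)
```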

# Aircraft_Category

The column for Aircraft_Category is also mostly empty, but that data could be useful. The business is after all seeking data related to types of aircraft and airplanes specifically, so removing the column entirely would not work well. Simply removing all rows that do not have a category entry would greatly reduce the number of total rows available for analysis, and most of those removed would likely be airplanes.

I would like to explore the idea of filling in as many of the missing values as I can. This could be done to some extent by making use of the Make column.

In [None]:
# Aircraft_Category values for Cessna in the Make column
df[df['Make'] == 'Cessna']['Aircraft_Category'].unique()

So here I see that Cessna categories are either empty or 'Airplane'. Therefore, it's reasonable to fill in the empty category values for Cessnas as 'Airplane'.

In [None]:
# Show how many NaN Aircraft_Category values there are for Cessna
df[df['Make'] == 'Cessna']['Aircraft_Category'].isna().sum()

In [None]:
# Show how many 'Airplane' Aircraft_Category values there are for Cessna
df[(df['Make'] == 'Cessna') & (df['Aircraft_Category'] == 'Airplane')]['Aircraft_Category'].count()

So we can add another 18,344 airplane entries to our data by filling in the missing values for Cessna in the Aircraft_Category column.

In [None]:
# Aircraft_Category values for Sikorsky in the Make column
df[df['Make'] == 'Sikorsky']['Aircraft_Category'].unique()

In [None]:
# Show how many NaN Aircraft_Category values there are for Sikorsky
df[df['Make'] == 'Sikorsky']['Aircraft_Category'].isna().sum()

And here we would be able to add 128 additional helicopters for Sikorsky.

In [None]:
# Fill in Aircraft_Category as 'Airplane' for Cessna
df.loc[df['Make'] == 'Cessna', 'Aircraft_Category'] = 'Airplane'

In [None]:
# Show how many 'Airplane' Aircraft_Category values there are for Cessna now
df[(df['Make'] == 'Cessna') & (df['Aircraft_Category'] == 'Airplane')]['Aircraft_Category'].count()

So now, instead of only about 3,500 Cessna airplanes, we have almost 22,000 entries, greatly increasing the verified airplane subset.

So here I will continue finding Makes that are airplanes only, and filling in the missing values.
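
The per-make check used for Cessna could be generalized along these lines. This is a sketch rather than the notebook's own method: the helper name `fill_single_category_makes` is hypothetical, the toy values are invented, and the helper fills only missing values so that any existing category is left untouched.

```python
import pandas as pd
import numpy as np

def fill_single_category_makes(df):
    """Fill missing Aircraft_Category for makes that only ever show one category."""
    # Number of distinct non-null categories per make
    n_cats = df.groupby('Make')['Aircraft_Category'].nunique()
    single = n_cats[n_cats == 1].index

    # The one known category for each of those makes
    known = (df[df['Make'].isin(single)]
             .dropna(subset=['Aircraft_Category'])
             .groupby('Make')['Aircraft_Category']
             .first())

    out = df.copy()
    # Map each row's make to its known category, then fill only the gaps
    out['Aircraft_Category'] = out['Aircraft_Category'].fillna(out['Make'].map(known))
    return out

toy = pd.DataFrame({
    'Make': ['Cessna', 'Cessna', 'Sikorsky', 'Sikorsky', 'Bell', 'Bell', 'Bell'],
    'Aircraft_Category': ['Airplane', np.nan, np.nan, 'Helicopter',
                          'Helicopter', 'Airplane', np.nan],
})
filled = fill_single_category_makes(toy)
print(filled)
```

In the toy data, the make with two different categories keeps its missing value, since there is no single category to infer from the make alone.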

In [None]:
# list of the unique values in the Make column
df['Make'].value_counts()

I realize here that I need to do some further cleaning of the Make column so Cessna and CESSNA (and other similar issues) are not separate values.
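
A quick way to surface such variants is to group the distinct spellings under a lowercased, whitespace-stripped key. Here is a minimal sketch on invented values:

```python
import pandas as pd

makes = pd.Series(['Cessna', 'CESSNA', 'Piper', 'PIPER', 'Piper ', 'Boeing'])

# Lowercased, stripped key used only for grouping
key = makes.str.strip().str.lower()

# Distinct raw spellings behind each key
variants = makes.groupby(key).unique()

# Keep only keys that have more than one spelling
multi = variants[variants.map(len) > 1]
print(multi)
```

This catches case and whitespace differences only; misspellings such as 'Cesna' and suffixes such as 'Inc.' still need a manual mapping.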

# Clean up the Make column
After going through the list of makes in a plain text document, I put together a list of make values to replace the alternative spellings, all caps, etc.
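
A mapping of this kind can be applied with `Series.replace`, which swaps exact matches and leaves every other value alone. The sketch below uses a three-entry excerpt and a hypothetical name, `make_map_excerpt`. How the full dictionary is applied is not shown in this section, so treat this as one plausible approach.

```python
import pandas as pd

# Small excerpt of the mapping, for illustration only
make_map_excerpt = {'CESSNA': 'Cessna', 'Cesna': 'Cessna', 'New Piper': 'Piper'}

makes = pd.Series(['CESSNA', 'Cesna', 'Cessna', 'New Piper', 'Bell'])

# Exact-match replacement; values not in the mapping are unchanged
cleaned = makes.replace(make_map_excerpt)
print(cleaned.tolist())
```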

In [None]:
# Fill NaN values in Make first
df['Make'] = df['Make'].fillna('Unknown')

# Clean the Make column for misspellings, all caps issues, etc
make_column_name_replace = {'Ab Sportine Aviacija': 'Sportine Aviacija', 'AB SPORTINE AVIACIJA': 'Sportine Aviacija', 'SPORTINE AVIACIJA': 'Sportine Aviacija', 'Abrams/nuding': 'Abrams', 'ACRO': 'Acro Sport', 'Adams': 'Adams Balloon', 'ADAMS': 'Adams Balloon', 'ADAMS BALLOONS LLC': 'Adams Balloon', 'AERO COMMANDER': 'Aero Commander', 'AERO VODOCHODY': 'Aero Vodochody', 'AEROVODOCHODY': 'Aero Vodochody', 'Aero Vodochody Aero Works': 'Aero Vodochody', 'AEROFAB INC': 'Aerofab Inc.', 'AEROMOT': 'Aeromot', 'AERONCA': 'Aeronca', 'Aeronca Aircraft Corporation': 'Aeronca', 'AEROPRO CZ': 'Aeropro CZ', 'AEROS': 'Aeros', 'Aeros LTD': 'Aeros', 'AEROS LTD': 'Aeros', 'AEROSPATIALE': 'Aerospatiale', 'AEROSTAR': 'Aerostar', 'Aerostar International': 'Aerostar', 'AEROSTAR INTERNATIONAL': 'Aerostar', 'Aerostar International Inc': 'Aerostar', 'AEROSTAR INTERNATIONAL INC': 'Aerostar', 'Aerostar International Inc.': 'Aerostar', 'Aerostar International, Inc.': 'Aerostar', 'AEROTEK': 'Aerotek', 'Aerotek-pitts': 'Aerotek', 'AEROTEK INC': 'Aerotek', 'AGUSTA': 'Agusta', 'AGUSTA AEROSPACE CORP': 'Agusta', 'AGUSTA SPA': 'Agusta', 'Agusta Spa': 'Agusta', 'Agusta-bell': 'Agusta', 'Agusta/Westland': 'Agusta', 'AGUSTAWESTLAND': 'Agusta', 'AgustaWestland': 'Agusta', 'AgustadWestland': 'Agusta', 'AGUSTAWESTLAND PHILADELPHIA': 'Agusta', 'AGUSTAWESTLAND SPA': 'Agusta', 'AIR CREATION': 'Air Creation', 'Air Creations': 'Air Creation', 'AIR TRACTOR': 'Air Tractor', 'AIR TRACTOR INC': 'Air Tractor', 'Air Tractor Inc': 'Air Tractor', 'Air Tractor Inc.': 'Air Tractor', 'AIR TRACTOR INC.': 'Air Tractor', 'Air Tractor, Inc.': 'Air Tractor', 'Air Borne Windsports Pty. 
Ltd.': 'Airborne', 'AIRBORNE (AUSTRALIA)': 'Airborne', 'AIRBORNE AUSTRALIA': 'Airborne', 'AIRBORNE EXTREME LLC': 'Airborne', 'AirBorne WindSport': 'Airborne', 'Airborne Windsports': 'Airborne', 'AIRBORNE WINDSPORTS': 'Airborne', 'Airborne Windsports Ltd': 'Airborne', 'AIRBORNE WINDSPORTS PTY LTD': 'Airborne', 'Airborne Windsports PTY LTD': 'Airborne', 'AIRBORNE': 'Airborne', 'AIRBUS': 'Airbus', 'AIRBUS HELICOPTERS': 'Airbus Helicopters', 'AIRBUS HELICOPTER': 'Airbus Helicopters', 'Airbus Helicopters (Eurocopte': 'Airbus Helicopters', 'Airbus Helicopters Deutschland': 'Airbus Helicopters', 'AIRBUS HELICOPTERS INC': 'Airbus Helicopters', 'Airbus Industrie': 'Airbus', 'AIRBUS INDUSTRIE': 'Airbus', 'AIRCRAFT MFG & DEVELOPMENT CO': 'Aircraft Mfg & Dev. Co.', 'Aircraft Mfg & Dev. Co. (amd)': 'Aircraft Mfg & Dev. Co.', 'Aircraft Mfg & Dev. Co. (AMD)':'Aircraft Mfg & Dev. Co.', 'Aircraft Mfg & Development Co.': 'Aircraft Mfg & Dev. Co.', 'AIRCRAFT MFG & DVLPMT CO': 'Aircraft Mfg & Dev. Co.', 'ALON': 'Alon', 'AMERICAN': 'American Aviation', 'American': 'American Aviation', 'AMERICAN AVIATION': 'American Aviation', 'American Aviation Corp. 
(aac)': 'American Aviation', 'AMERICAN CHAMPION': 'American Champion', 'American Champion (acac)': 'American Champion', 'American Champion (ACAC)': 'American Champion', 'American Champion Aircraft': 'American Champion', 'AMERICAN CHAMPION AIRCRAFT': 'American Champion', 'American Champion Aircraft Cor': 'American Champion', 'AMERICAN EUROCOPTER CORP': 'American Eurocopter', 'AMERICAN GENERAL ACFT CORP': 'American General Aircraft', 'American Legand Aircraft': 'American Legend', 'AMERICAN LEGEND': 'American Legend', 'AMERICAN LEGEND AIRCRAFT CO': 'American Legend', 'American Legend Aircraft Co.': 'American Legend', 'Anderson': 'Anderson Aircraft Corp.', 'Atr': 'ATR', 'AVIAT': 'Aviat', 'AVIAT AIRCRAFT': 'Aviat', 'Aviat Aircraft Inc': 'Aviat', 'AVIAT AIRCRAFT INC': 'Aviat', 'Aviat Aircraft Inc.': 'Aviat', 'Aviat Aircraft, Inc.': 'Aviat', 'Aviat Inc': 'Aviat', 'AVIAT INC': 'Aviat', 'Avid': 'Avid Aircraft', 'AVID': 'Avid Aircraft', 'AYERS': 'Ayres', 'Ayers': 'Ayres', 'AYRES': 'Ayres', 'AYRES CORP': 'Ayres', 'Ayres Corporation': 'Ayres', 'AYRES CORPORATION': 'Ayres', 'Bede': 'Bede Aircraft', 'BEDE': 'Bede Aircraft', 'BEECH': 'Beech', 'BEECH AIRCRAFT': 'Beech', 'BEECH AIRCRAFT CO.': 'Beech', 'Beech Aircraft Corp': 'Beech', 'Beech Aircraft Corporation': 'Beech', 'BEECH AIRCRAFT CORPORATION': 'Beech', 'Beechcraft': 'Beech', 'BEECHCRAFT': 'Beech', 'Beechcraft Corporation': 'Beech', 'BOEING': 'Boeing', 'Boeing - Canada (de Havilland)': 'Boeing', 'Boeing (Stearman)': 'Boeing', 'BOEING 777-306ER': 'Boeing', 'Boeing Commercial Airplane Gro': 'Boeing', 'BOEING COMPANY': 'Boeing', 'Boeing Company': 'Boeing', 'BOEING OF CANADA/DEHAV DIV': 'Boeing', 'Boeing Stearman': 'Boeing', 'BOEING-STEARMAN': 'Boeing', 'Boeing-brown': 'Boeing', 'BOMBARDIER': 'Bombardier', 'Bombardier Aerospace, Inc.': 'Bombardier', 'Bombardier Canadair': 'Bombardier', 'BOMBARDIER INC': 'Bombardier', 'BOMBARDIER LEARJET CORP.': 'Bombardier', 'Bombardier, Inc.': 'Bombardier', 'BRITISH AEROSPACE': 'British 
Aerospace', 'British Aerospace Civil Aircr': 'British Aerospace', 'BRITTEN NORMAN': 'Britten Norman', 'Britten-norman': 'Britten Norman', 'BRITTEN-NORMAN': 'Britten Norman', 'CANADAIR': 'Canadair', 'CANADAIR LTD': 'Canadair', 'CASA': 'Casa', 'Cassult Racer': 'Cassuit', 'CASSUTT': 'Cassuit', 'Cesna': 'Cessna', 'CESSNA': 'Cessna', 'CESSNA AIRCRAFT': 'Cessna', 'CESSNA Aircraft': 'Cessna', 'CESSNA AIRCRAFT CO': 'Cessna', 'Cessna Aircraft Co.': 'Cessna', 'CESSNA AIRCRAFT COMPANY': 'Cessna', 'Cessna Aircraft Company': 'Cessna', 'Cessna Ector': 'Cessna', 'CESSNA ECTOR': 'Cessna', 'Cessna Reems': 'Cessna', 'CESSNA REIMS': 'Cessna', 'Cessna Robertson': 'Cessna', 'Cessna Skyhawk II': 'Cessna', 'Cessna Soloy': 'Cessna', 'Cessna Wren': 'Cessna', 'CESSNA/AIR REPAIR INC': 'Cessna', 'CESSNA/WEAVER': 'Cessna', 'CHALLENGER': 'Challenger', 'Challenger Ii': 'Challenger', 'CHAMBERLAIN GERALD': 'Chamberlain', 'CHAMBERLIN VICTOR WAYNE': 'Chamberlain', 'CHAMPION': 'Champion', 'CHANCE VOUGHT': 'Chance Vought', 'CHRISTEN INDUSTRIES INC': 'Christen Industries', 'Christen Industries, Inc.': 'Christen Industries', 'CIRRUS': 'Cirrus', 'Cirrus Design': 'Cirrus', 'CIRRUS DESIGN': 'Cirrus', 'Cirrus Design Corp': 'Cirrus', 'CIRRUS DESIGN CORP': 'Cirrus', 'Cirrus Design Corp.': 'Cirrus', 'CIRRUS DESIGN CORP.': 'Cirrus', 'Cirrus Design Corporation': 'Cirrus', 'CIRRUS DESIGN CORPORATION': 'Cirrus', 'CLASSIC AIRCRAFT CORP': 'Classic Aircraft Corp', 'Classic Aircraft Corp.': 'Classic Aircraft Corp', 'COLUMBIA': 'Columbia', 'Columbia Aircraft': 'Columbia', 'Columbia Aircraft Mfg': 'Columbia', 'COLUMBIA AIRCRAFT MFG': 'Columbia', 'Columbia Aircraft Mfg.': 'Columbia', 'COMMANDER': 'Commander', 'Commander Aircraft': 'Commander', 'COMMANDER AIRCRAFT CO': 'Commander', 'Commander Aircraft Company': 'Commander', 'CONSOLIDATED AERONAUTICS': 'Consolidated Aero', 'Consolidated Aeronautics Inc.': 'Consolidated Aero', 'CONSOLIDATED AERONAUTICS INC.': 'Consolidated Aero', 'Consolidated Aeronautics, Inc': 
'Consolidated Aero', 'Consolidated Aeronautics, Inc.': 'Consolidated Aero', 'CONSOLIDATED VULTEE': 'Consolidated Aero', 'Consolidated-vultee': 'Consolidated Aero', 'CONVAIR': 'Convair', 'Convair Div. Of Gen. Dynamics': 'Convair', 'COSTRUZIONI AERONAUTICHE TECNA': 'Costruzioni', 'Costruzioni Aeronautiche': 'Costruzioni', 'Costruzioni AeronauticheTecnam': 'Costruzioni', 'CUB CRAFTER': 'Cub Crafters', 'CUB CRAFTERS': 'Cub Crafters', 'CUB CRAFTERS INC': 'Cub Crafters', 'Cub Crafters Inc': 'Cub Crafters', 'Cub Crafters Inc.': 'Cub Crafters', 'Cub Crafters, Inc.': 'Cub Crafters', 'Cubcrafter': 'Cub Crafters', 'Cubcrafters': 'Cub Crafters', 'CUBCRAFTERS': 'Cub Crafters', 'CubCrafters': 'Cub Crafters', 'CUBCRAFTERS INC': 'Cub Crafters', 'CubCrafters Inc': 'Cub Crafters', 'Cubcrafters, Inc': 'Cub Crafters', 'CubCrafters, Inc': 'Cub Crafters', 'CULVER': 'Culver', 'CULVER GLENN': 'Culver', 'Culver-revolution': 'Culver', 'Culver, Aurther L.': 'Culver', 'CURTIS JOHN P': 'Curtiss Wright', 'Curtis-travel Air': 'Curtiss Wright', 'Curtis-wright': 'Curtiss Wright', 'CURTISS': 'Curtiss Wright', 'CURTISS WRIGHT': 'Curtiss Wright', 'Curtiss-wright': 'Curtiss Wright', 'Curtiss-Wright': 'Curtiss Wright', 'CZECH': 'Czech Aircraft Works', 'CZECH AIRCRAFT WORKS': 'Czech Aircraft Works', 'CZECH AIRCRAFT WORKS SPOL SRO': 'Czech Aircraft Works', 'Czech Aircraft Works SPOL SRO': 'Czech Aircraft Works', 'Czech Aircraft Works Spol Sro': 'Czech Aircraft Works', 'Czech Sport Aircraft': 'Czech Sport', 'CZECH SPORT AIRCRAFT A S': 'Czech Sport', 'Czech Sport Aircraft a.s.': 'Czech Sport', 'Czech Sport Aircraft AS': 'Czech Sport', 'CZECH SPORT AIRCRAFT AS': 'Czech Sport', 'CZECH SPORTPLANES SRO': 'Czech Sport', 'DASSAULT': 'Dassault', 'Dassault Aviation': 'Dassault', 'DASSAULT AVIATION': 'Dassault', 'Dassault Falcon': 'Dassault', 'Dassault-breguet': 'Dassault', 'DASSAULT-BREGUET': 'Dassault', 'Dassault-Breguet': 'Dassault', 'Dassault/sud': 'Dassault', 'DASSAULT/SUD': 'Dassault', 'DE HAVILLAND': 'de 
Havilland', 'De Havilland': 'de Havilland', 'Dehavilland': 'de Havilland', 'DEHAVILLAND': 'de Havilland', 'DeHavilland': 'de Havilland', 'DEHAVILLAND CANADA': 'de Havilland', 'DIAMOND': 'Diamond', 'Diamond Aicraft Industries Inc': 'Diamond', 'Diamond Aircraft': 'Diamond', 'DIAMOND AIRCRAFT': 'Diamond', 'DIAMOND AIRCRAFT IND GMBH': 'Diamond', 'Diamond Aircraft Industries': 'Diamond', 'DIAMOND AIRCRAFT INDUSTRIES': 'Diamond', 'DIAMOND AIRCRAFT INDUSTRIES IN': 'Diamond', 'Diamond Aircraft Industry Inc': 'Diamond', 'DORNIER': 'Dornier', 'DORNIER GMBH': 'Dornier', 'DOUGLAS': 'Douglas', 'DOWNER': 'Downer', 'Downer Aircraft Industries': 'Downer', 'EAGLE': 'Eagle', 'Eagle (ultralight)': 'Eagle', 'EAGLE AIRCRAFT CO': 'Eagle', 'Eagle Aircraft Co.': 'Eagle', 'ECLIPSE AVIATION': 'Eclipse Aviation', 'ECLIPSE AVIATION CORP': 'Eclipse Aviation', 'Eclipse Aviation Corporation': 'Eclipse Aviation', 'EDGE': 'Edge', 'EIPPER': 'Eipper', 'Eipper Formance': 'Eipper', 'EIPPER FORMANCE INC': 'Eipper', 'Eipper Mx Ii Quicksilver': 'Eipper', 'Eipper Quicksilver': 'Eipper', 'Eipper Quicksiver E': 'Eipper', 'Eipper-formance': 'Eipper', 'EMBRAER': 'Embraer', 'Embraer Aircraft': 'Embraer', 'EMBRAER EXECUTIVE AIRCRAFT INC': 'Embraer', 'EMBRAER S A': 'Embraer', 'EMBRAER-EMPRESA BRASILEIRA DE': 'Embraer', 'ENGINEERING & RESEARCH': 'Engineering & Research', 'Engineering and Research': 'Engineering & Research', 'ENSTROM': 'Enstrom', 'ENSTROM HELICOPTER CORP': 'Enstrom', 'ERCOUPE': 'Ercoupe', 'Ercoupe (eng & Research Corp.)': 'Ercoupe', 'EUROCOPTER': 'Eurocopter', 'Eurocopter Deutsch': 'Eurocopter', 'Eurocopter Deutschland': 'Eurocopter', 'Eurocopter Deutschland Gmbh': 'Eurocopter', 'EUROCOPTER DEUTSCHLAND GMBH': 'Eurocopter', 'Eurocopter France': 'Eurocopter', 'EUROCOPTER FRANCE': 'Eurocopter', 'EUROPA': 'Europa', 'Europa Aviation Inc': 'Europa', 'EVEKTOR AEROTECHNIK': 'Evektor Aerotechnik', 'Evektor Aerotechnik': 'Evektor Aerotechnik', 'Evektor Aerotechnik AS': 'Evektor Aerotechnik', 
'Evektor-aerotechnik': 'Evektor Aerotechnik', 'Evektor-aerotechnik A.s.': 'Evektor Aerotechnik', 'Evektor-aerotechnik As': 'Evektor Aerotechnik', 'EVEKTOR-AEROTECHNIK AS': 'Evektor Aerotechnik', 'Evektor-Aerotechnik AS': 'Evektor Aerotechnik', 'EXTRA': 'Extra', 'Extra Flugzeugbau': 'Extra', 'EXTRA FLUGZEUGBAU': 'Extra', 'Extra Flugzeugbau Gmbh': 'Extra', 'EXTRA FLUGZEUGBAU GMBH': 'Extra', 'EXTRA Flugzeugproduktions-GMBH': 'Extra', 'Extra Flugzeugproduktions-und': 'Extra', 'EXTRA FLUGZEUGPRODUKTIONS-UND': 'Extra', 'Extra Flugzeugrau Gmbh': 'Extra', 'FAIRCHILD': 'Fairchild', 'Fairchild Industries': 'Fairchild', 'Fairchild Swearingen': 'Fairchild', 'Fairchild/swearingen': 'Fairchild', 'FAIRCHILD(HOWARD)': 'Fairchild', 'Fairchild Heli-porter': 'Fairchild', 'FAIRCHILD HELI-PORTER': 'Fairchild', 'Fairchild-heliporter': 'Fairchild', 'FAIRCHILD HILLER': 'Fairchild Hiller', 'FANTASY AIR': 'Fantasy Air', 'FANTASY AIR SRO': 'Fantasy Air', 'Fantasy Sky Promotions': 'Fantasy Air', 'FIELDS': 'Fields', 'FISHER': 'Fisher', 'Fischer': 'Fisher', 'Fisher Aero': 'Fisher', 'Fisher Flying Products': 'Fisher', 'FISHER FLYING PRODUCTS': 'Fisher', 'FISHER HAROLD R': 'Fisher', 'FISHER MICHAEL E': 'Fisher', 'FISHER MICHAEL H': 'Fisher', 'FLEET': 'Fleet', 'FLIGHT DESIGN': 'Flight Design', 'FLIGHT DESIGN GENERAL AVN GMBH': 'Flight Design', 'Flight Design Gmbh': 'Flight Design', 'FLIGHT DESIGN GMBH': 'Flight Design', 'Flight Design GMBH': 'Flight Design', 'Flight Star': 'Flightstar', 'FLIGHTSTAR': 'Flightstar', 'FLIGHTstar': 'Flightstar', 'FLIGHTSTAR INC': 'Flightstar', 'FLIGHTSTAR SPORTPLANES': 'Flightstar', 'Flightstar Sportplanes': 'Flightstar', 'FOKKER': 'Fokker', 'FOUND ACFT CANADA INC': 'Found Aircraft Canada', 'Found Acft': 'Found Aircraft Canada', 'Found Aircraft Canada Inc': 'Found Aircraft Canada', 'GARLICK': 'Garlick', 'Garlick Helicipters Inc.': 'Garlick', 'GARLICK HELICOPTERS INC': 'Garlick', 'Garlick Helicopters Inc.': 'Garlick', 'Gates Lear Jet': 'Gates Learjet', 'GATES LEAR 
JET': 'Gates Learjet', 'GATES LEAR JET CORP.': 'Gates Learjet', 'GATES LEARJET': 'Gates Learjet', 'GATES LEARJET CORP': 'Gates Learjet', 'GATES LEARJET CORP.': 'Gates Learjet', 'Gates Learjet Corporation':'Gates Learjet', 'GENERAL ATOMICS': 'General Atomics', 'GLASAIR': 'Glasair', 'Glasair Aviation LLC': 'Glasair', 'GLASAIR AVIATION USA LLC': 'Glasair', 'Glasair Iii': 'Glasair', 'Glassair': 'Glasair', 'Glassair Iii': 'Glasair', 'GLOBE': 'Globe', 'Globe Swift': 'Globe', 'GREAT LAKES': 'Great Lakes', 'Great Lakes Adams': 'Great Lakes', 'Great Lakes Aircraft Company': 'Great Lakes', 'GRUMMAN': 'Grumman', 'Grumman Acft Eng': 'Grumman', 'GRUMMAN ACFT ENG COR': 'Grumman', 'GRUMMAN ACFT ENG COR-SCHWEIZER': 'Grumman', 'GRUMMAN AIRCRAFT': 'Grumman', 'GRUMMAN AIRCRAFT COR-SCHWEIZER': 'Grumman', 'GRUMMAN AIRCRAFT ENG CORP': 'Grumman', 'Grumman American': 'Grumman', 'GRUMMAN AMERICAN': 'Grumman', 'Grumman American Aviation': 'Grumman', 'GRUMMAN AMERICAN AVIATION CORP': 'Grumman', 'GRUMMAN AMERICAN AVN. CORP': 'Grumman', 'Grumman American Avn. Corp.': 'Grumman', 'GRUMMAN AMERICAN AVN. 
CORP.': 'Grumman', 'GRUMMAN American Corporation': 'Grumman', 'Grumman Schweizer': 'Grumman', 'GRUMMAN SCHWEIZER': 'Grumman', 'Grumman-schweizer': 'Grumman', 'Grumman-Schweizer': 'Grumman', 'GULFSTREAM': 'Gulfstream', 'Gulfstream Aerospace': 'Gulfstream', 'GULFSTREAM AEROSPACE': 'Gulfstream', 'Gulfstream Aerospace Corp': 'Gulfstream', 'Gulfstream Aerospace Corp.': 'Gulfstream', 'Gulfstream Aerospace LP': 'Gulfstream', 'GULFSTREAM AM CORP COMM DIV': 'Gulfstream', 'Gulfstream American': 'Gulfstream', 'Gulfstream American Corp': 'Gulfstream', 'GULFSTREAM AMERICAN CORP': 'Gulfstream', 'Gulfstream American Corp.': 'Gulfstream', 'GULFSTREAM AMERICAN CORP.': 'Gulfstream', 'GULFSTREAM SCHWEIZER A/C CORP': 'Gulfstream', 'Gulfstream-schweizer': 'Gulfstream', 'GULFSTREAM-SCHWEIZER': 'Gulfstream', 'HALL': 'Hall', 'Hall Christen Eagle': 'Hall', 'HALL DON H': 'Hall', 'HALL JEFFREY': 'Hall', 'HALL STEVEN C': 'Hall', 'HALL TERRENCE / HALL CATHIE': 'Hall', 'HALL THOMAS K': 'Hall', 'Hall-cavalier': 'Hall', 'HALL, WENDALL W': 'Hall', 'Hamilton Stoddard': 'Hamilton', 'HAWKER': 'Beech', 'HAWKER AIRCRAFT LTD': 'Beech', 'Hawker Aircraft Ltd': 'Beech', 'Hawker Aircraft Ltd.': 'Beech', 'Hawker Beech': 'Beech', 'HAWKER BEECH': 'Beech', 'Hawker Beechcraft': 'Beech', 'HAWKER BEECHCRAFT': 'Beech', 'HAWKER BEECHCRAFT CORP': 'Beech', 'Hawker Beechcraft Corp.': 'Beech', 'Hawker Beechcraft Corporation': 'Beech', 'HAWKER BEECHCRAFT CORPORATION': 'Beech', 'Hawker Siddeley': 'Hawker', 'HAWKER SIDDELEY': 'Hawker', 'Hawker Siddely': 'Hawker', 'Hawker-beechcraft': 'Beech', 'Hawker-Beechcraft': 'Beech', 'Hawker-Beechcraft Corporation': 'Beech', 'Helie': 'Helio', 'HELIO': 'Helio', 'Helio Aircraft Ltd': 'Helio', 'HILLER': 'Hiller', 'Hiller-osborn': 'Hiller', 'HILLER-ROGERSON HELICOPTER': 'Hiller', 'Hiller-soloy': 'Hiller', 'HILLER-TRI-PLEX IND.INC.': 'Hiller', 'HONDA AIRCRAFT': 'Honda Aircraft', 'HONDA AIRCRAFT CO LLC': 'Honda Aircraft', 'Honda Jet': 'Honda Aircraft', 'HOWARD AIRCRAFT': 'Howard Aircraft', 
'Howard Aircraft Corp.': 'Howard Aircraft', 'HUGHES': 'Hughes', 'HUGHES HELICOPTERS INC': 'Hughes', 'HUGHES/HELICOPTER ASSOCS INC': 'Hughes', 'HUNTER': 'Hunter', 'HUNTER GEORGE': 'Hunter', 'ICON': 'Icon', 'ICON AIRCRAFT INC': 'Icon', 'INTERPLANE': 'Interplane', 'Interplane Llc': 'Interplane', 'INTERPLANE S R O': 'Interplane', 'Interplane Sro': 'Interplane', 'INTERSTATE': 'Interstate', 'ISRAEL AEROSPACE INDUSTRIESLTD': 'Israel Aircraft Industries', 'ISRAEL AIRCRAFT INDUSTRIES': 'Israel Aircraft Industries', 'JABIRU': 'Jabiru Aircraft', 'JABIRU AIRCRAFT PTY LTD': 'Jabiru Aircraft', 'JABIRU USA SPORT AIRCRAFT': 'Jabiru Aircraft', 'JABIRU USA SPORT AIRCRAFT LLC': 'Jabiru Aircraft', 'JABIRU USA SPORT AIRCRAFT, LLC': 'Jabiru Aircraft', 'JACKSON': 'Jackson', 'JACKSON DENNIS': 'Jackson', 'JACKSON FRED M': 'Jackson', 'JACKSON JEREMIAH D': 'Jackson', 'JONES': 'Jones', 'JONES KENT C': 'Jones', 'JONES PETER M': 'Jones', 'JONES RALPH D': 'Jones', 'JONES RODNEY V': 'Jones', 'JONES RONALD C': 'Jones', 'JUST': 'Just', 'JUST AIRCRAFT': 'Just', 'Just Aircraft Llc': 'Just', 'JUST AIRCRAFT LLC': 'Just', 'Just Aircraft LLC': 'Just', 'KAMAN': 'Kaman', 'KAMAN AEROSPACE CORP': 'Kaman', 'KITFOX': 'Kitfox', 'Kitfox IV': 'Kitfox', 'Kitfox Ten, Inc': 'Kitfox', 'KOLB': 'Kolb', 'KOLB AIRCRAFT CO': 'Kolb', 'KOLB AIRCRAFT INC': 'Kolb', 'Kolb Company': 'Kolb', 'Kolb Twin Star': 'Kolb', 'LAKE': 'Lake', 'Lake John K': 'Lake', 'LANCAIR': 'Lancair', 'Lancair Company': 'Lancair', 'LANCAIR COMPANY': 'Lancair', 'Lanceair': 'Lancair', 'LARSEN': 'Larsen', 'Larsen Charles Bennett': 'Larsen', 'Larson': 'Larsen', 'LARSON': 'Larsen', 'LARSON KEN W': 'Larsen', 'LARSON ROGER H': 'Larsen', 'Larson Smith Miniplane': 'Larsen', 'Larson, C.h.': 'Larsen', 'LEARJET': 'Learjet', 'LEARJET INC': 'Learjet', 'Learjet Inc': 'Learjet', 'LET': 'Let', 'LIBERTY AEROSPACE INCORPORATED': 'Liberty', 'LIBERTY AEROSPACE': 'Liberty', 'Liberty Aerospace': 'Liberty', 'Liberty Aerospace Inc.': 'Liberty', 'Liberty Aerospace Incorporate': 
'Liberty', 'Liberty Aerospace, Inc.': 'Liberty', 'LINDSTRAND': 'Lindstrand', 'Lindstrand Balloons': 'Lindstrand', 'LINDSTRAND BALLOONS': 'Lindstrand', 'LINDSTRAND BALLOONS USA': 'Lindstrand', 'LINSTRAND': 'Lindstrand', 'LOCKHEED': 'Lockheed', 'Lockheed-martin': 'Lockheed', 'LOCKWOOD': 'Lockwood', 'LOCKWOOD AIRCRAFT CORP': 'Lockwood', 'LONG': 'Long', 'Long E-z-e': 'Long', 'Long Pitts': 'Long', 'Long-ez': 'Long', 'Longjev': 'Long', 'LUSCOMBE': 'Luscombe', 'Luscombe Silvaire Aircraft Co.': 'Luscombe', 'MARTIN CHARLES A': 'Martin', 'Martin Company': 'Martin', 'MARTIN CURTIS': 'Martin', 'MARTIN EDWARD H': 'Martin', 'Martin-pitts': 'Martin', 'Martin/harris': 'Martin', 'MAULE': 'Maule', 'Maule Air Inc.': 'Maule', 'MAULE AIRCRAFT CORP': 'Maule', 'MAXAIR': 'Maxair', 'Maxair Aircraft Corp.': 'Maxair', 'MAXAIR DRIFTER': 'Maxair', 'MCCLISH': 'McClish', 'Mcclish': 'McClish', 'Mcdonald': 'Mcdonnell Douglas', 'Mcdonald Douglas': 'Mcdonnell Douglas', 'MCDONNELL DOUGLAS': 'Mcdonnell Douglas', 'MCDONNELL DOUGLAS AIRCRAFT CO': 'Mcdonnell Douglas', 'MCDONNELL DOUGLAS CORPORATION': 'Mcdonnell Douglas', 'Mcdonnell-douglas': 'Mcdonnell Douglas', 'MCDONNELL DOUGLAS HELI CO': 'Mcdonnell Douglas Helicopter', 'MCDONNELL DOUGLAS HELICOPTER': 'Mcdonnell Douglas Helicopter', 'McDonnell Douglas Helicopter C': 'Mcdonnell Douglas Helicopter', 'Mcdonnell Douglas Helicopters': 'Mcdonnell Douglas Helicopter', 'McDonnell Douglas Helicopters': 'Mcdonnell Douglas Helicopter', 'MD HELICOPTER': 'Md Helicopter', 'MD Helicopter Inc': 'Md Helicopter', 'MD Helicopters Inc': 'Md Helicopter', 'MD HELICOPTER INC': 'Md Helicopter', 'Md Helicopter Inc.': 'Md Helicopter', 'Md Helicopters': 'Md Helicopter', 'MD HELICOPTERS': 'Md Helicopter', 'MD HELICOPTERS INC': 'Md Helicopter', 'Md Helicopters Inc.': 'Md Helicopter', 'Md Helicopters, Inc.': 'Md Helicopter', 'MD HELICOPTERS, INC.': 'Md Helicopter', 'Messerschmitt-boelkow-blohm': 'Messerschmitt', 'MESSERSCHMITT-BOELKOW-BLOHM': 'Messerschmitt', 
'Messerschmitt-bolkow-blohm': 'Messerschmitt', 'MESSERSCHMITT-BOLKOW-BLOHM': 'Messerschmitt', 'MEYER': 'Meyer', 'MEYERS': 'Meyers', 'Meyers Aircraft Co.': 'Meyers', 'MEYERS INDUSTRIES INC': 'Meyers', 'MILLER': 'Miller', 'Miller Air Sports': 'Miller', 'Mitchell Aircraft Corp': 'Mitchell', 'MITSUBISHI': 'Mitsubishi', 'MONOCOUPE': 'Monocoupe', 'Monocoupe Aircraft': 'Monocoupe', 'MOONEY': 'Mooney', 'Mooney Aircraft': 'Mooney', 'Mooney Aircraft Corp': 'Mooney', 'MOONEY AIRCRAFT CORP.': 'Mooney', 'Mooney Aircraft Corporation': 'Mooney', 'MOONEY AIRPLANE CO INC': 'Mooney', 'MOONEY AIRPLANE COMPANY, INC.': 'Mooney', 'MOONEY INTERNATIONAL CORP': 'Mooney', 'MOORE': 'Moore', 'MORAVAN': 'Moravan', 'MORRISEY': 'Morrisey', 'MORRISON': 'Morrison', 'Murphy Aircraft': 'Murphy', 'Murphy Aircraft Mfg, Ltd.': 'Murphy', 'Murphy-charles': 'Murphy', 'Murphey': 'Murphy', 'MYERS': 'Myers', 'NANCHANG': 'Nanchang', 'Nanchang China': 'Nanchang', 'NANCHANG CHINA': 'Nanchang', 'NAVAL AIRCRAFT FACTORY': 'Naval Aircraft Factory', 'Navy': 'Naval Aircraft Factory', 'NAVION': 'Navion', 'NELSON': 'Nelson', 'Nelson/nelson': 'Nelson', 'New Glasair': 'Glasair', 'NEW KOLB AIRCRAFT CO': 'Kolb', 'NEW KOLB AIRCRAFT CO LLC': 'Kolb', 'New Piper': 'Piper', 'NEW PIPER': 'Piper', 'NEW PIPER AIRCRAFT INC': 'Piper', 'New Piper Aircraft, Inc.': 'Piper', 'NEW STANDARD': 'New Standard', 'NORD': 'Nord', 'Nord (sncan)': 'Nord', 'Nord Aviation': 'Nord', 'NORTH AMERICAN': 'North American', 'North American Aviation Div.': 'North American', 'North American Rockwell': 'North American', 'North American Rockwell Corp.': 'North American', 'NORTH AMERICAN-MEDORE': 'North American', 'North American/aero Classics': 'North American', 'North American-aero Classics': 'North American', 'North American-barene': 'North American', 'North American-kenney': 'North American', 'North American-maslon': 'North American', 'NORTH AMERICAN/AERO CLASSICS': 'North American', 'NORTH AMERICAN/SCHWAMM': 'North American', 'NORTH AMERICAN/VICTORIA MNT 
LT': 'North American', 'NORTH WING': 'North Wing', 'NORTH WING DESIGN': 'North Wing', 'NORTH WING UUM INC': 'North Wing', 'North Wing UUM INC': 'North Wing', 'NORTH WING UUM INC.': 'North Wing', 'NORTHROP': 'Northrop', 'NORTHWING': 'Northwing', 'Northwing Design': 'Northwing', 'NORTHWING DESIGN': 'Northwing', 'PACIFIC AEROSPACE CORP LTD': 'Pacific Aerospace', 'PIAGGIO': 'Piaggio', 'Piaggio Aero Industries S.p.a.': 'Piaggio', 'PIAGGIO AERO INDUSTRIES SPA': 'Piaggio', 'Piaggio Industrie': 'Piaggio', 'PIETENPOL': 'Pietenpol', 'Pietenpol-grega': 'Pietenpol', 'PILATUS': 'Pilatus', 'Pilatus Aircraft': 'Pilatus', 'PILATUS AIRCRAFT LTD': 'Pilatus', 'Pilatus Britten-norman': 'Pilatus', 'PILATUS BRITTEN-NORMAN': 'Pilatus', 'PIPER': 'Piper', 'PIPER / LAUDEMAN': 'Piper', 'Piper Aerostar': 'Piper', 'Piper Aircraft': 'Piper', 'PIPER AIRCRAFT': 'Piper', 'PIPER AIRCRAFT CORPORATION': 'Piper', 'Piper Aircraft Corporation': 'Piper', 'PIPER AIRCRAFT INC': 'Piper', 'PIPER AIRCRAFT, INC.': 'Piper', 'Piper Aircraft, Inc.': 'Piper', 'Piper Cub Crafters': 'Piper', 'Piper Pawnee': 'Piper', 'Piper-aerostar': 'Piper', 'PIPER-HARRIS': 'Piper', 'Piper/cub Crafters': 'Piper', 'PIPER/CUB CRAFTERS': 'Piper', 'Piper/Cub Crafters': 'Piper', 'Piper/stevens': 'Piper', 'PIPER/WALLY\'S FLYERS INC': 'Piper', 'PIPISTREL': 'Pipistrel', 'PIPISTREL D O O': 'Pipistrel', 'PIPISTREL DOO AJDOVSCINA': 'Pipistrel', 'PIPISTREL ITALIA S R L': 'Pipistrel', 'Pipistrel Italia SRL': 'Pipistrel', 'PITCAIRN': 'Pitcairn', 'PITTS': 'Pitts', 'PITTS AEROBATICS': 'Pitts', 'Pitts Special': 'Pitts', 'PITTS SPECIAL': 'Pitts', 'Pitts Spl.': 'Pitts', 'PROGRESSIVE AERODYNE INC': 'Progressive Aerodyne', 'Progressive Aerodyne Inc.': 'Progressive Aerodyne', 'PZL BIELSKO': 'PZL', 'PZL MIELEC': 'PZL', 'PZL OKECIE': 'PZL', 'Pzl Okecie': 'PZL', 'Pzl Swidnik': 'PZL', 'PZL SWIDNIK': 'PZL', 'PZL Warszawa-Okecie': 'PZL', 'Pzl Warzawa-cnpsl': 'PZL', 'Pzl Warzawa-okecie': 'PZL', 'Pzl-bielsko': 'PZL', 'PZL-BIELSKO': 'PZL', 'Pzl-mielec': 'PZL', 
'Pzl-okecie': 'PZL', 'Pzl-swidnik': 'PZL', 'PZL-Swidnik': 'PZL', 'PZL-SWIDNIK': 'PZL', 'QUAD CITY': 'Quad City', 'Quad City Aircraft': 'Quad City', 'Quad City Aircraft Corp': 'Quad City', 'Quad City Ultralight': 'Quad City', 'QUAD CITY ULTRALIGHT ACFT CORP': 'Quad City', 'Quad City Ultralight Aircraft': 'Quad City', 'QUAD CITY ULTRALIGHT CORP': 'Quad City', 'Quad City Ultralight, Llc': 'Quad City', 'QUAD CITY ULTRALIGHTS': 'Quad City', 'Quadcity': 'Quad City', 'QUEST': 'Quest', 'QUEST Aircraft Company': 'Quest', 'QUEST AIRCRAFT COMPANY LLC': 'Quest', 'QUESTAIR INC': 'Questair', 'Questair, Inc.': 'Questair', 'Questaire': 'Questair', 'QUICKIE': 'Quickie', 'Quickie Aircraft': 'Quickie', 'Quickie-myers': 'Quickie', 'Quick Silver': 'Quicksilver', 'QUICKSILVER': 'Quicksilver', 'Quicksilver Aircraft': 'Quicksilver', 'QUICKSILVER AIRCRAFT': 'Quicksilver', 'QUICKSILVER AIRCRAFT CO': 'Quicksilver', 'Quicksilver Aircraft Northeast': 'Quicksilver', 'QUICKSILVER EIPPER ACFT INC': 'Quicksilver', 'QUICKSILVER ENTERPRISES INC': 'Quicksilver', 'Quicksilver II': 'Quicksilver', 'Quicksilver Manufacturing': 'Quicksilver', 'QUICKSILVER MANUFACTURING INC': 'Quicksilver', 'QUICKSILVER MFG': 'Quicksilver', 'Quiksilver': 'Quicksilver', 'RANS': 'Rans', 'RANS AIRCRAFT': 'Rans', 'Rans Company': 'Rans', 'RANS DESIGNS INC': 'Rans', 'Rans Employee Flying Club': 'Rans', 'RANS EMPLOYEE FLYING CLUB': 'Rans', 'RANS INC': 'Rans', 'Rans Inc.': 'Rans', 'RANS S-12': 'Rans', 'Rans S-12 Airaile': 'Rans', 'Rans, Inc.': 'Rans', 'Rans/hine': 'Rans', 'Raven Industries': 'Raven', 'RAVEN INDUSTRIES INC': 'Raven', 'RAYTHEON': 'Raytheon', 'Raytheon Aircraft Company': 'Raytheon', 'RAYTHEON AIRCRAFT COMPANY': 'Raytheon', 'Raytheon Co': 'Raytheon', 'RAYTHEON COMPANY': 'Raytheon', 'Raytheon Corporate Jets': 'Raytheon', 'RAYTHEON CORPORATE JETS INC': 'Raytheon', 'REIMS': 'Reims', 'REims': 'Reims', 'Reims Aviation': 'Reims', 'Reims Aviation Cessna': 'Reims', 'REIMS AVIATION S.A.': 'Reims', 'REIMS AVIATION SA': 'Reims', 
'Reims-Cessna': 'Reims', 'REIMS-CESSNA': 'Reims', 'REMOS ACFT GMBH FLUGZEUGBAU': 'Remos', 'Remos Aircraft GmbH': 'Remos', 'REMOS AIRCRAFT GMBH': 'Remos', 'Remos Aircraft GMBH': 'Remos', 'Remos Aircraft GMBH Flugzeugba': 'Remos', 'REMOS AIRCRAFT GMBH FLUGZEUGBA': 'Remos', 'REPUBLIC': 'Republic', 'ROBIN': 'Robin', 'ROBINSON': 'Robinson', 'Robinson Helicopter': 'Robinson', 'ROBINSON HELICOPTER': 'Robinson', 'ROBINSON HELICOPTER CO': 'Robinson', 'ROBINSON HELICOPTER CO INC': 'Robinson', 'Robinson Helicopter Co.': 'Robinson', 'Robinson Helicopter Company': 'Robinson', 'ROBINSON HELICOPTER COMPANY': 'Robinson', 'Robinson Helicopters': 'Robinson', 'ROCKWELL': 'Rockwell', 'Rockwell Comdr': 'Rockwell', 'Rockwell Commander': 'Rockwell', 'ROCKWELL COMMANDER': 'Rockwell', 'Rockwell Int\'t': 'Rockwell', 'Rockwell International': 'Rockwell', 'ROCKWELL INTERNATIONAL': 'Rockwell', 'Rockwell Intl': 'Rockwell', 'ROGERS': 'Rogers', 'ROLLADEN-SCHNEIDER': 'Rolladen Schneider', 'ROLLADEN SCHNEIDER OHG': 'Rolladen Schneider', 'Rolladen-schneider': 'Rolladen Schneider', 'Rolladen-schneider Gmbh': 'Rolladen Schneider', 'ROLLADEN-SCHNEIDER GMBH': 'Rolladen Schneider', 'ROLLADEN-SCHNEIDER OHG': 'Rolladen Schneider', 'Roof': 'Root', 'Root, Arthur T.': 'Root', 'ROSE': 'Rose', 'ROSE HERBERT D': 'Rose', 'Rose Rhinehart': 'Rose', 'Rose Wesley': 'Rose', 'Rose-Rhinehart': 'Rose', 'ROSS ALFRED K/ONEILL TERRENCE': 'Ross', 'ROSS H/HERRIOTT M': 'Ross', 'ROSS JONATHAN': 'Ross', 'Ross/stonecipher': 'Ross', 'ROTORWAY': 'Rotorway', 'Rotorway Aircraft, Inc.': 'Rotorway', 'Rotorway Executive': 'Rotorway', 'Rotoway International': 'Rotorway', 'RUTAN': 'Rutan', 'Rutan Aircraft Factory': 'Rutan', 'RYAN': 'Ryan', 'Ryan Aeronautical': 'Ryan', 'RYAN AERONAUTICAL': 'Ryan', 'Ryan Aeronautics': 'Ryan', 'RYAN JOHN STEFFEY': 'Ryan', 'RYAN W Gross': 'Ryan', 'Ryan-navion': 'Ryan', 'SAAB': 'Saab', 'Saab-fairchild': 'Saab', 'Saab-scania': 'Saab', 'SAAB-SCANIA': 'Saab', 'SAAB-SCANIA AB': 'Saab', 'Saab-scania Ab (saab)': 
'Saab', 'Saab-Scania AB (Saab)': 'Saab', 'SCHWEIZER': 'Schweizer', 'Schweizer 300CBi': 'Schweizer', 'Schweizer Aircraft Corp': 'Schweizer', 'SCHWEIZER AIRCRAFT CORP': 'Schweizer', 'Schweizer Aircraft Corp.': 'Schweizer', 'Schweizer Sgs': 'Schweizer', 'Schweizer, N36289': 'Schweizer', 'SCHWEIZER(HUGHES)': 'Schweizer', 'SCHWEIZER(HUGHES)AIRCRAFT CORP': 'Schweizer', 'Scottish': 'Scottish Aviation', 'SCOTTISH AVIATION': 'Scottish Aviation', 'SHORT': 'Short', 'SHORT BROS': 'Short', 'Short Bros.': 'Short', 'SHORT BROS. & HARLAND': 'Short', 'Short Brothers': 'Short', 'SHORT BROTHERS & HARLAND LTD.': 'Short', 'SHORT BROTHERS PLC': 'Short', 'SIAI MARCHETTI': 'Siai Marchetti', 'Siai-marchetti': 'Siai Marchetti', 'SIAI-MARCHETTI': 'Siai Marchetti', 'Siai-Marchetti': 'Siai Marchetti', 'SIKORSKY': 'Sikorsky', 'SIKORSKY AIRCRAFT CORP': 'Sikorsky', 'Sikorsky/orlando': 'Sikorsky', 'SILVAIRE': 'Silvaire', 'SKYKITS': 'Skykits', 'SKYKITS CORP': 'Skykits', 'Skykits Corporation': 'Skykits', 'SKYKITS USA CORP': 'Skykits', 'SMITH': 'Smith', 'Smith & R. Mathews': 'Smith', 'Smith Aerostar': 'Smith', 'SMITH ALBERT F': 'Smith', 'SMITH ALLEN': 'Smith', 'Smith Arthur Fox': 'Smith', 'SMITH BRET B': 'Smith', 'SMITH Capella': 'Smith', 'Smith Carter A': 'Smith', 'Smith Douglas J.': 'Smith', 'SMITH EDWARD I': 'Smith', 'Smith Mini': 'Smith', 'Smith Miniplane': 'Smith', 'SMITH MINIPLANE': 'Smith', 'SMITH VILAS': 'Smith', 'Smith Wylie Jay': 'Smith', 'Smith, Ted Aerostar': 'Smith', 'Smith/davis': 'Smith', 'SNOW': 'Snow', 'SOCATA': 'Socata', 'Socata-Groupe Aerospatiale': 'Socata', 'SONEX': 'Sonex',
'Sonex / John D. McCarter': 'Sonex', 'SONEX AIRCRAFT': 'Sonex', 'SONEX LIMITED': 'Sonex', 'SORENSEN': 'Sorensen', 'SORENSEN DANNY': 'Sorensen', 'SORENSEN DANNY S': 'Sorensen', 'SORENSON': 'Sorensen', 'Sorrel': 'Sorrell', 'Sorrell Aircraft': 'Sorrell', 'STANLEY': 'Stanley', 'STANLEY ARTHUR FREEMAN': 'Stanley', 'STANLEY B E': 'Stanley', 'STANLEY ERNIE SIGURD': 'Stanley', 'Stanley, Davey L': 'Stanley', 'STANTON': 'Stanton', 'Star Duster': 'Starduster', 'Star Duster Too': 'Starduster', 'Starduster Ii': 'Starduster', 'STARDUSTER II': 'Starduster', 'Starduster Too': 'Starduster', 'STAUDACHER AIRCRAFT INC': 'Staudacher', 'Staudacher Aircraft, Inc.': 'Staudacher', 'STAUDACHER HYDROPLANES': 'Staudacher', 'STAUDACHER JON': 'Staudacher', 'Staudaucher': 'Staudacher', 'STEARMAN': 'Stearman', 'STEARMAN AIRCRAFT': 'Stearman', 'STEELE': 'Steele', 'STEELE  JOHN J': 'Steele', 'STEELE RALPH BRUCE': 'Steele', 'STEELE SAMUEL D': 'Steele', 'STEEN': 'Steen', 'Steen Aero Lab': 'Steen', 'Steen Skybolt': 'Steen', 'STINSON': 'Stinson', 'Stits Aircraft': 'Stits', 'Stits Flut-r-bug': 'Stits', 'STITS FLUT-R-BUG': 'Stits', 'Stits Playboy': 'Stits', 'Stits-itrich': 'Stits', 'Stitts': 'Stits', 'STODDARD HAMILTON': 'Stoddard Hamilton', 'Stoddard-Hamilton': 'Stoddard Hamilton', 'STOL': 'Stol', 'Stol Aircraft': 'Stol', 'STOL Aircraft Corp': 'Stol', 'STOL LLC': 'Stol', 'STOLP STARDUSTER': 'Stolp Starduster', 'Stolp Starduster Corp.': 'Stolp Starduster', 'Stolp-adams': 'Stolp Starduster', 'Stolp-starduster Too': 'Stolp Starduster', 'SUKHOI': 'Sukhoi', 'SUTTON': 'Sutton', 'Sutton Tailwind': 'Sutton', 'SUTTON WILLIAM J': 'Sutton', 'SWANSON': 'Swanson', 'Swanson/bensen': 'Swanson', 'SWEARINGEN': 'Swearingen', 'Swearingen T R/masters W': 'Swearingen', 'TAYLOR': 'Taylor', 'Taylor Air Command': 'Taylor', 'Taylor Lonsdale': 'Taylor', 'Taylor Smith': 'Taylor', 'Taylor Titch': 'Taylor', 'TAYLORCRAFT': 'Taylorcraft', 'Taylorcraft Aviation': 'Taylorcraft', 'TAYLORCRAFT AVIATION CORP': 'Taylorcraft', 'TAYLORCRAFT CORP': 
'Taylorcraft', 'Taylorcraft Corporation': 'Taylorcraft', 'TECNAM': 'Tecnam', 'TEMCO': 'Temco', 'Temco Luscombe': 'Temco', 'TERATORN': 'Teratorn', 'Teratorn Acft Inc.': 'Teratorn', 'Teratorn Aircraft, Inc.': 'Teratorn', 'Teratron': 'Teratorn', 'TEXAS HELICOPTER CORP': 'Texas Helicopter', 'Texas Helicopter Corp.': 'Texas Helicopter', 'Texas Helicopter Corporation': 'Texas Helicopter', 'TEXTRON AVIATION INC': 'Textron Aviation', 'Textron Aviation Inc': 'Textron Aviation', 'THE BOEING COMPANY': 'Boeing', 'THOMPSON': 'Thompson', 'THORP': 'Thorp', 'Thorp Aero, Inc.': 'Thorp', 'Thorpe': 'Thorp', 'THRUSH': 'Thrush', 'THRUSH AIRCRAFT INC': 'Thrush', 'Thrush Aircraft Inc.': 'Thrush', 'THRUSH AIRCRAFT LLC': 'Thrush', 'Thrush Aircraft, Inc.': 'Thrush', 'TITAN': 'Titan', 'TITAN AEROSPACE HOLDINGS INC': 'Titan', 'Titan Aircraft': 'Titan', 'TRAVEL AIR': 'Travel Air', 'TUPOLEV': 'Tupolev', 'Univair Aircraft Corporation': 'Univair', 'UNIVAIR AIRCRAFT CORPORATION': 'Univair', 'Univar': 'Univair', 'UNIVERSAL': 'Universal', 'Universal Aircraft Industries': 'Universal', 'Universal Globe': 'Universal', 'Universal Moulded Pdts.': 'Universal', 'Universal Stinson': 'Universal', 'UNIVERSAL STINSON': 'Universal', 'VANS': 'Vans', 'Vans Aircraft': 'Vans', 'Vans Aircraft Inc': 'Vans', 'VANS AIRCRAFT INC': 'Vans', 'Vans Aircraft, Inc.': 'Vans', 'VARGA AIRCRAFT CORP.': 'Varga', 'VELOCITY INC': 'Velocity', 'VICKERS': 'Vickers', 'WACO': 'Waco', 'Waco Classic Aircraft': 'Waco', 'WACO CLASSIC AIRCRAFT': 'Waco', 'Waco Classic Aircraft Corp': 'Waco', 'WACO CLASSIC AIRCRAFT CORP': 'Waco', 'Waco Classic Aircraft Corp.': 'Waco', 'WEATHERLY': 'Weatherly', 'WEATHERLY AVIATION CO INC': 'Weatherly', 'Weatherly Aviation Company Inc': 'Weatherly', 'WEBER': 'Weber', 'WHEELER': 'Wheeler',
'Wheeler Acft. Co.': 'Wheeler', 'WHEELER C / WHEELER K': 'Wheeler', 'Wheeler Technology, Inc.': 'Wheeler', 'WHITTMAN': 'Whittman', 'Whittman Tailwind': 'Whittman', 'WILSON': 'Wilson', 'Wing Aircraft': 'Wing', 'Wing Aircraft Co.': 'Wing', 'WOOD': 'Wood', 'WSK PZL MIELEC': 'Wsk Pzl Mielec', 'Wsk Pzl Swidnik': 'Wsk Pzl Mielec', 'Wsk Pzl Warzawa-okecie': 'Wsk Pzl Mielec', 'Wsk Pzl-krosno': 'Wsk Pzl Mielec', 'WSK-MIELEC': 'Wsk Pzl Mielec', 'WSK-PZL MEILEC': 'Wsk Pzl Mielec', 'Wsk-pzl Mielec': 'Wsk Pzl Mielec', 'WSK-PZL WARZAWA-OKECIE': 'Wsk Pzl Mielec', 'Wsk-pzl Warzawaokecie': 'Wsk Pzl Mielec', 'WSL PZL': 'Wsk Pzl Mielec', 'XTREMEAIR GMBH': 'Xtremeair GMBH', 'YAKOVLEV': 'Yakovlev', 'YAKOVLEV/CHINNERY': 'Yakovlev', 'YAKOVLEV/DAY': 'Yakovlev', 'ZENAIR': 'Zenair', 'ZENAIR LTD': 'Zenair', 'Zenair Zodiac': 'Zenair', 'ZENITH': 'Zenith', 'ZENITH ACFT CO': 'Zenith', 'ZENITH AIRCRAFT CO': 'Zenith', 'ZIMMERMAN': 'Zimmerman', 'ZIVKO AERONAUTICS INC': 'Zivko', 'Zivko Aeronautics': 'Zivko', 'Zivko Aeronautics Inc.': 'Zivko', 'ZLIN': 'Zlin', 'Zlin Aviation': 'Zlin', 'Zlin Aviation S.r.o.': 'Zlin'}

df['Make'] = df['Make'].replace(make_column_name_replace)

Combine the MD Helicopter and McDonnell Douglas Helicopter variants into McDonnell Douglas Helicopters, and combine the BELL variants into Bell.

In [None]:
df[df['Make'].isin(['Md Helicopter', 'Mcdonnell Douglas Helicopter', 'McDonnell Douglas Helicopter', 'BELL', 
                    'BELL HELICOPTER TEXTRON CANADA', 'BELL HELICOPTER TEXTRON'])][
    'Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Clean up make column for McDonnell Douglas Helicopters and Bell
make_column_name_replace = {'Md Helicopter': 'McDonnell Douglas Helicopters',
                            'Mcdonnell Douglas Helicopter': 'McDonnell Douglas Helicopters', 
                            'McDonnell Douglas Helicopter': 'McDonnell Douglas Helicopters', 
                            'BELL': 'Bell',
                            'BELL HELICOPTER TEXTRON CANADA': 'Bell',
                            'BELL HELICOPTER TEXTRON': 'Bell'}

df['Make'] = df['Make'].replace(make_column_name_replace)
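
The replacement dictionaries above list each capitalisation variant as its own key. As a rough sketch (not the approach this notebook takes), the variants could be collapsed by normalising case and whitespace before the lookup. The `canonical_makes` mapping and the `normalize_make` helper are hypothetical names, and the sample values are toy data.

```python
import pandas as pd

# Hypothetical mapping keyed by lower-cased, whitespace-collapsed make names
canonical_makes = {
    'bell': 'Bell',
    'bell helicopter textron': 'Bell',
    'bell helicopter textron canada': 'Bell',
    'md helicopter': 'McDonnell Douglas Helicopters',
    'mcdonnell douglas helicopter': 'McDonnell Douglas Helicopters',
}

def normalize_make(series, mapping):
    """Map make names to canonical spellings, ignoring case and extra spaces."""
    key = series.str.strip().str.replace(r'\s+', ' ', regex=True).str.lower()
    # Keep the original value wherever the normalised key is not in the mapping
    return key.map(mapping).fillna(series)

sample = pd.Series(['BELL', 'Bell Helicopter Textron', 'Md Helicopter',
                    'McDonnell Douglas  Helicopter', 'Cessna', None])
cleaned = normalize_make(sample, canonical_makes)
print(cleaned.tolist())
```

With this pattern, one lower-case key covers 'BELL', 'Bell', and 'bell', at the cost of losing the ability to treat differently-cased spellings as different makes.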

In [None]:
# Show Make value_counts over 10
makes_value_10 = df['Make'].value_counts()[df['Make'].value_counts() > 10]
makes_value_10

Now we look at the category column to see how we can fill it in using the cleaned Make column.

In [None]:
# Show the Aircraft_Category value_counts for makes_value_10 including NaN
df[df['Make'].isin(makes_value_10.index)]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Show Aircraft_Category value_counts for Cessna, Piper, Beech, Boeing, and Mooney, including NaN
df[df['Make'].isin(['Cessna', 'Piper', 'Beech', 'Boeing', 'Mooney'])][
    'Aircraft_Category'].value_counts(dropna=False)

For these five makes, it seems reasonable to set every category entry to 'Airplane', including rows that currently hold a different value.

In [None]:
# make all category entries for particular makes 'Airplane'
df.loc[df['Make'].isin(['Cessna', 'Piper', 'Beech', 'Boeing', 'Mooney']), 'Aircraft_Category'] = 'Airplane'
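
One way to put a number on "reasonable" is to compute, for each make, how large a share of its known category values falls in the most common category. The sketch below uses toy data, and `dominant_category_share` is a hypothetical helper rather than part of this notebook.

```python
import pandas as pd

def dominant_category_share(frame, makes):
    """For each make, return its most common known category and that category's share."""
    known = frame[frame['Make'].isin(makes)].dropna(subset=['Aircraft_Category'])
    table = pd.crosstab(known['Make'], known['Aircraft_Category'])
    return pd.DataFrame({
        'top_category': table.idxmax(axis=1),
        'share': table.max(axis=1) / table.sum(axis=1),
    })

# Toy rows standing in for the accident records
toy = pd.DataFrame({
    'Make': ['Cessna', 'Cessna', 'Cessna', 'Cessna',
             'Boeing', 'Boeing', 'Boeing', 'Boeing', 'Bell'],
    'Aircraft_Category': ['Airplane', 'Airplane', 'Airplane', None,
                          'Airplane', 'Airplane', 'Airplane', 'Helicopter',
                          'Helicopter'],
})
summary = dominant_category_share(toy, ['Cessna', 'Boeing'])
print(summary)
```

A make whose share is well below 1.0 may deserve a closer look before all of its rows are assigned a single category.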

In [None]:
# Show Aircraft_Category value_counts for Cessna, Piper, Beech, Boeing, and Mooney, including NaN
df[df['Make'].isin(['Cessna', 'Piper', 'Beech', 'Boeing', 'Mooney'])][
    'Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Show the Aircraft_Category value_counts for makes_value_10 including NaN
df[df['Make'].isin(makes_value_10.index)]['Aircraft_Category'].value_counts(dropna=False)

At this point, we still have over 16,000 empty values in the category column.
Let's look at helicopters and fill in some of the missing values there.

In [None]:
# Helicopter value_counts
df[df['Aircraft_Category'] == 'Helicopter']['Make'].value_counts()

In [None]:
# Show Aircraft_Category value_counts for Robinson, Bell, Hughes, Eurocopter, Schweizer, including NaN
df[df['Make'].isin(['Robinson', 'Bell', 'Hughes', 'Eurocopter', 'Schweizer'])][
    'Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Again, the overwhelming majority are helicopters, so let's change these
# Set all category entries for these makes to 'Helicopter'
df.loc[df['Make'].isin(['Robinson', 'Bell', 'Hughes', 'Eurocopter', 'Schweizer']), 'Aircraft_Category'] = 'Helicopter'
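
The assignment above overwrites every category value for the listed makes, including rows that already carry a different category. If the intent were only to fill gaps, a mask that also requires `isna()` would leave existing values alone. This is a minimal sketch on toy data; whether to overwrite or only fill is a judgment call for the analysis.

```python
import pandas as pd

# Toy rows: one Bell already labelled 'Airplane', the rest missing
toy = pd.DataFrame({
    'Make': ['Bell', 'Bell', 'Robinson', 'Cessna'],
    'Aircraft_Category': ['Airplane', None, None, None],
})

helicopter_makes = ['Robinson', 'Bell']

# Only touch rows where the make matches AND the category is missing
fill_mask = toy['Make'].isin(helicopter_makes) & toy['Aircraft_Category'].isna()
toy.loc[fill_mask, 'Aircraft_Category'] = 'Helicopter'

print(toy)
```

Here the Bell row that was already labelled 'Airplane' is kept, and the Cessna row stays empty because Cessna is not in the list.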

In [None]:
# Show Helicopter value_counts
df[df['Aircraft_Category'] == 'Helicopter']['Make'].value_counts()

In [None]:
# Show the Aircraft_Category value_counts for makes_value_10 including NaN
df[df['Make'].isin(makes_value_10.index)]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Now we're down to 13,000 empty category values. Let's look at the makes.
# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Show category value_counts for Grumman, Bellanca, Air Tractor, and Aeronca, including NaN
df[df['Make'].isin(['Grumman', 'Bellanca', 'Air Tractor', 'Aeronca'])][
    'Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in 'Airplane' for Grumman, Bellanca, Air Tractor, and Aeronca
df.loc[df['Make'].isin(['Grumman', 'Bellanca', 'Air Tractor', 'Aeronca']), 'Aircraft_Category'] = 'Airplane'

In [None]:
# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Show category value_counts for Maule, Champion, de Havilland, Aero Commander, including NaN
df[df['Make'].isin(['Maule', 'Champion', 'de Havilland', 'Aero Commander'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in 'Airplane' for Maule, Champion, de Havilland, Aero Commander
df.loc[df['Make'].isin(['Maule', 'Champion', 'de Havilland', 'Aero Commander']), 'Aircraft_Category'] = 'Airplane'

In [None]:
# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Show category value_counts for Rockwell and Stinson, including NaN
df[df['Make'].isin(['Rockwell', 'Stinson'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in 'Airplane' for Rockwell and Stinson
df.loc[df['Make'].isin(['Rockwell', 'Stinson']), 'Aircraft_Category'] = 'Airplane'

In [None]:
# Deal with Hiller and Aerospatiale
df[df['Make'].isin(['Aerospatiale', 'Hiller'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Since Hiller and Aerospatiale are overwhelmingly Helicopter, let's fill those in
df.loc[df['Make'].isin(['Aerospatiale', 'Hiller']), 'Aircraft_Category'] = 'Helicopter'

In [None]:
# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# deal with Mcdonnell Douglas
df[df['Make'].isin(['Mcdonnell Douglas'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# If Make is Mcdonnell Douglas and category is Helicopter, change make to McDonnell Douglas Helicopters
df.loc[(df['Make'] == 'Mcdonnell Douglas') & (df['Aircraft_Category'] == 'Helicopter'), 'Make'] = 'McDonnell Douglas Helicopters'

df[df['Make'].isin(['Mcdonnell Douglas'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Mcdonnell Douglas']), 'Aircraft_Category'] = 'Airplane'

In [None]:
# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Taylorcraft', 'North American', 'Luscombe', 'Douglas'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Taylorcraft', 'North American', 'Luscombe', 'Douglas']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Enstrom'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Enstrom']), 'Aircraft_Category'] = 'Helicopter'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Ayres', 'Ercoupe', 'Gulfstream'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Ayres', 'Ercoupe', 'Gulfstream']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Airbus'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
#Deal with Airbus name for helicopters
df.loc[(df['Make'] == 'Airbus') & (df['Aircraft_Category'] == 'Helicopter'), 'Make'] = 'Airbus Helicopters'

df[df['Make'].isin(['Airbus'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in the rest for Airbus
df.loc[df['Make'].isin(['Airbus']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Sikorsky'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in the rest for Sikorsky
df.loc[df['Make'].isin(['Sikorsky']), 'Aircraft_Category'] = 'Helicopter'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Fairchild', 'Pitts', 'Swearingen', 'Lake'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in the rest for previous
df.loc[df['Make'].isin(['Fairchild', 'Pitts', 'Swearingen', 'Lake']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Mitsubishi', 'Waco'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in the rest for previous
df.loc[df['Make'].isin(['Mitsubishi', 'Waco']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Ryan', 'Lockheed'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in the rest for previous
df.loc[df['Make'].isin(['Ryan', 'Lockheed']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Learjet'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in 'Balloon' for Balloon Works and Aerostar
df.loc[df['Make'].isin(['Balloon Works', 'Aerostar']), 'Aircraft_Category'] = 'Balloon'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Fill in 'Glider' for Burkhart Grob and Let
df.loc[df['Make'].isin(['Burkhart Grob', 'Let']), 'Aircraft_Category'] = 'Glider'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Raven'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in 'Airplane' for Learjet, Helio, Smith, and Embraer
df.loc[df['Make'].isin(['Learjet', 'Helio', 'Smith', 'Embraer']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Aviat'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in 'Airplane' for Wsk Pzl Mielec, British Aerospace, American Aviation, and Aviat
df.loc[df['Make'].isin(['Wsk Pzl Mielec', 'British Aerospace', 'American Aviation', 'Aviat']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Cirrus'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in 'Airplane' for Globe, Weatherly, and Cirrus
df.loc[df['Make'].isin(['Globe', 'Weatherly', 'Cirrus']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Fill in 'Balloon' for Raven
df.loc[df['Make'].isin(['Raven']), 'Aircraft_Category'] = 'Balloon'

In [None]:
# Fill in 'Glider' for Schleicher
df.loc[df['Make'].isin(['Schleicher']), 'Aircraft_Category'] = 'Glider'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Navion'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill in categories for Mbb, Schempp-hirth, Gates Learjet, Saab, and Navion
df.loc[df['Make'].isin(['Mbb']), 'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Schempp-hirth']), 'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Gates Learjet', 'Saab', 'Navion']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Cameron'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Balloon Works', 'Aerostar', 'Raven', 'Cameron']), 'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Burkhart Grob', 'Let', 'Schleicher', 'Schempp-hirth']), 'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Robinson', 'Bell', 'Hughes', 'Eurocopter', 'Schweizer', 'Aerospatiale', 'Hiller', 'Enstrom', 'Sikorsky', 'Mbb']), 'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Cessna', 'Piper', 'Beech', 'Boeing', 'Mooney', 'Grumman', 'Bellanca', 'Air Tractor', 'Aeronca', 'Maule', 'Champion', 'de Havilland', 'Aero Commander', 'Rockwell', 'Stinson', 'Mcdonnell Douglas', 'Taylorcraft', 'North American', 'Luscombe', 'Douglas', 'Ayres', 'Ercoupe', 'Gulfstream', 'Airbus', 'Fairchild', 'Pitts', 'Swearingen', 'Lake', 'Mitsubishi', 'Waco', 'Ryan', 'Lockheed', 'Learjet', 'Helio', 'Smith', 'Embraer', 'Wsk Pzl Mielec', 'British Aerospace', 'American Aviation', 'Aviat', 'Globe', 'Weatherly', 'Cirrus', 'Gates Learjet', 'Saab', 'Navion', 'Canadair', 'Dassault', 'Socata']), 'Aircraft_Category'] = 'Airplane'
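
The consolidated lists above suggest a more compact pattern: keep one dictionary from category to makes, invert it, and apply it with `map`. The sketch below assumes every listed make belongs entirely to one category; `category_makes` is an abbreviated, hypothetical subset of the lists above, and the rows are toy data.

```python
import pandas as pd

# Abbreviated, hypothetical subset of the make lists used in this notebook
category_makes = {
    'Balloon': ['Balloon Works', 'Aerostar', 'Raven', 'Cameron'],
    'Glider': ['Burkhart Grob', 'Let', 'Schleicher', 'Schempp-hirth'],
    'Helicopter': ['Robinson', 'Bell', 'Hughes'],
    'Airplane': ['Cessna', 'Piper', 'Beech'],
}

# Invert to a make -> category lookup
make_to_category = {make: category
                    for category, makes in category_makes.items()
                    for make in makes}

toy = pd.DataFrame({
    'Make': ['Cessna', 'Raven', 'Let', 'Bell', 'Unlisted Make'],
    'Aircraft_Category': [None, None, 'Glider', 'Airplane', None],
})

mapped = toy['Make'].map(make_to_category)
# Overwrite wherever a mapping exists (mirroring the .loc assignments);
# rows whose make is not listed keep their current value
toy['Aircraft_Category'] = mapped.fillna(toy['Aircraft_Category'])
print(toy)
```

Adding a make then becomes a one-word change to the dictionary instead of a new cell.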

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Christen Industries'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Rotorway']), 'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Fokker', 'Bombardier', 'Christen Industries']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Convair'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Great Lakes', 'Eagle', 'Eipper', 'Convair']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Brantly Helicopter'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Brantly Helicopter']), 'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Consolidated Aero', 'Short']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Rolladen Schneider'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Rolladen Schneider']), 'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Kaman']), 'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Aerotek']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Raytheon'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Fairchild Hiller']), 'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Britten Norman', 'Raytheon']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Diamond'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Agusta']), 'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Pilatus', 'Diamond']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Continental Copters'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Continental Copters']), 'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Alon', 'Hawker']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Republic'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Texas Helicopter']), 'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['American Champion', 'Republic']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Dornier'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Homebuilt', 'Rans', 'Dornier']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Israel Aircraft Industries'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['I.c.a. Brasov']), 'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Siai Marchetti', 'Israel Aircraft Industries']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Quicksilver'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Snow', 'Yakovlev', 'Quicksilver']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Callair'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Adams Balloon']), 'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Lancair', 'Callair']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
df[df['Make'].isin(['Quickie'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Thunder And Colt']), 'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Varga', 'Quickie']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Show Make values whose Aircraft_Category value is NaN if there are over 15
nan_make_counts = df[df['Aircraft_Category'].isna()]['Make'].value_counts()
nan_make_counts[nan_make_counts > 15]

In [None]:
df[df['Make'].isin(['Glaser-dirks'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Barnes']), 'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Glasflugel', 'Glaser-dirks']), 'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Garlick']), 'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Extra', 'Curtiss Wright', 'Kolb', 'Glasair', 'ATR', 'Casa', 'Temco', 'Johnson', 'Classic Aircraft Corp', 'Davis', 'Miller', 'Forney']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Show Make values whose Aircraft_Category value is NaN if there are over 10
nan_make_counts = df[df['Aircraft_Category'].isna()]['Make'].value_counts()
nan_make_counts[nan_make_counts > 10]

In [None]:
df[df['Make'].isin(['Partenavia'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Bensen', 'Air & Space']), 'Aircraft_Category'] = 'Gyrocraft'
df.loc[df['Make'].isin(['Pterodactyl', 'Weedhopper']), 'Aircraft_Category'] = 'Ultralight'
df.loc[df['Make'].isin(['Eiriavion Oy']), 'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Interstate', 'Sukhoi', 'Artic Aircraft Corp.', 'Vans', 'Rotec', 'Thorp', 'Anderson Aircraft Corp.', 'American General Aircraft', 'Culver', 'Mitchell', 'Stearman', 'Aerofab Inc.', 'Hall', 'Taylor', 'Nord', 'Jones', 'Hispano Aviacion', 'Young', 'Rutan', 'Naval Aircraft Factory', 'Howard Aircraft', 'Steen', 'Teratorn', 'Meyers', 'Starduster', 'Partenavia']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Show Make values whose Aircraft_Category value is NaN if there are over 5 and under 11
nan_make_counts = df[df['Aircraft_Category'].isna()]['Make'].value_counts()
blanks_over_5 = nan_make_counts[nan_make_counts > 5]

# Show blanks_over_5 below 11
blanks_over_5[blanks_over_5 < 11]
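
The threshold cells above repeat the same filter with different cut-offs. A small helper could take the bounds as arguments instead. This is only a sketch: `missing_category_makes` is a hypothetical name, and the rows are toy data.

```python
import pandas as pd

def missing_category_makes(frame, min_count=1, max_count=None):
    """Count makes among rows with no category, keeping counts within the bounds."""
    counts = frame.loc[frame['Aircraft_Category'].isna(), 'Make'].value_counts()
    keep = counts >= min_count
    if max_count is not None:
        keep &= (counts <= max_count)
    return counts[keep]

# Toy rows: A, B, C have missing categories; D is already filled in
toy = pd.DataFrame({
    'Make': ['A', 'A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Aircraft_Category': [None, None, None, None, None, None,
                          'Airplane', 'Airplane'],
})

at_least_two = missing_category_makes(toy, min_count=2)
one_or_two = missing_category_makes(toy, min_count=1, max_count=2)
print(at_least_two)
print(one_or_two)
```

On the real data, `missing_category_makes(df, min_count=6, max_count=10)` would correspond to the "over 5 and under 11" filter.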

In [None]:
df[df['Make'].isin(['Hayes'])]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.loc[df['Make'].isin(['Benson']), 'Aircraft_Category'] = 'Gyrocraft'
df.loc[df['Make'].isin(['American Aerolights']), 'Aircraft_Category'] = 'Ultralight'
df.loc[df['Make'].isin(['Maxair', 'Bede Aircraft', 'Martin']), 'Aircraft_Category'] = 'Airplane'

# Show Make values whose Aircraft_Category value is NaN
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Show Make value_counts over 10
makes_value_10 = df['Make'].value_counts()[df['Make'].value_counts() > 10]
makes_value_10

In [None]:
# Show the Aircraft_Category value_counts for makes_value_10 including NaN
df[df['Make'].isin(makes_value_10.index)]['Aircraft_Category'].value_counts(dropna=False)

Next, fold the remaining NaN values into the existing 'Unknown' category.

In [None]:
# Make NaN category 'Unknown'
df.loc[df['Aircraft_Category'].isna(), 'Aircraft_Category'] = 'Unknown'

In [None]:
# Show the Aircraft_Category value_counts for makes_value_10 including NaN
df[df['Make'].isin(makes_value_10.index)]['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Make WSFT category 'Weight-Shift'
df.loc[df['Aircraft_Category'] == 'WSFT', 'Aircraft_Category'] = 'Weight-Shift'

At this point, the category column is filled in well enough for the analysis.

# Continue column cleaning

In [None]:
df.info()

In [None]:
# Show Weather_Condition values including NaN
df['Weather_Condition'].value_counts(dropna=False)

In [None]:
# Change UNK and Unk to Unknown
df.loc[df['Weather_Condition'] == 'Unk', 'Weather_Condition'] = 'Unknown'
df.loc[df['Weather_Condition'] == 'UNK', 'Weather_Condition'] = 'Unknown'

# Change NaN to Unknown
df.loc[df['Weather_Condition'].isna(), 'Weather_Condition'] = 'Unknown'

In [None]:
# Show Weather_Condition values including NaN
df['Weather_Condition'].value_counts(dropna=False)

In [None]:
df['Purpose_of_flight'].value_counts(dropna=False)

In [None]:
# Change NaN to Unknown
df.loc[df['Purpose_of_flight'].isna(), 'Purpose_of_flight'] = 'Unknown'

In [None]:
df['Broad_phase_of_flight'].value_counts(dropna=False)

In [None]:
# Change NaN to Unknown
df.loc[df['Broad_phase_of_flight'].isna(), 'Broad_phase_of_flight'] = 'Unknown'

# Change Other to Unknown
df.loc[df['Broad_phase_of_flight'] == 'Other', 'Broad_phase_of_flight'] = 'Unknown'

In [None]:
df['Broad_phase_of_flight'].value_counts(dropna=False)

In [None]:
df['Engine_Type'].value_counts(dropna=False)

In [None]:
# Change NaN to Unknown
df.loc[df['Engine_Type'].isna(), 'Engine_Type'] = 'Unknown'

# Change UNK to Unknown
df.loc[df['Engine_Type'] == 'UNK', 'Engine_Type'] = 'Unknown'
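
The last few cells apply the same idea to several columns: a handful of placeholder spellings and NaN all become 'Unknown'. As a sketch, that could be written once and driven by a per-column list. `unknown_aliases` and `fill_unknown` are hypothetical names; the alias lists mirror the values handled above, and the rows are toy data.

```python
import pandas as pd

# Placeholder spellings to treat as 'Unknown', per column (mirrors the cells above)
unknown_aliases = {
    'Weather_Condition': ['UNK', 'Unk'],
    'Purpose_of_flight': [],
    'Broad_phase_of_flight': ['Other'],
    'Engine_Type': ['UNK'],
}

def fill_unknown(frame, aliases, label='Unknown'):
    """Return a copy where listed aliases and NaN are replaced by the label."""
    out = frame.copy()
    for column, values in aliases.items():
        is_alias = out[column].isin(values)
        out[column] = out[column].mask(is_alias, label).fillna(label)
    return out

toy = pd.DataFrame({
    'Weather_Condition': ['VMC', 'UNK', None, 'Unk'],
    'Purpose_of_flight': ['Personal', None, 'Business', 'Personal'],
    'Broad_phase_of_flight': ['Cruise', 'Other', None, 'Landing'],
    'Engine_Type': ['Reciprocating', 'UNK', None, 'Turbo Fan'],
})
cleaned = fill_unknown(toy, unknown_aliases)
print(cleaned)
```

Because the helper works on a copy, the original frame can still be compared against the result to check how many values changed.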

In [None]:
df['Engine_Type'].value_counts(dropna=False)

In [None]:
df.info()

In [None]:
df['Total_Fatal_Injuries'].value_counts(dropna=False)

In [None]:
# Change NaN to 0 in Injury columns
df.loc[df['Total_Fatal_Injuries'].isna(), 'Total_Fatal_Injuries'] = 0
df.loc[df['Total_Serious_Injuries'].isna(), 'Total_Serious_Injuries'] = 0
df.loc[df['Total_Minor_Injuries'].isna(), 'Total_Minor_Injuries'] = 0
df.loc[df['Total_Uninjured'].isna(), 'Total_Uninjured'] = 0
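
Filling missing injury counts with 0 assumes that a blank means nobody fell into that group, which the source data may not guarantee. The sketch below fills all four columns in one call and, as an optional extra, first records which rows had no injury data at all. The `Injury_Data_Missing` column name is hypothetical, and the rows are toy data.

```python
import pandas as pd

injury_cols = ['Total_Fatal_Injuries', 'Total_Serious_Injuries',
               'Total_Minor_Injuries', 'Total_Uninjured']

toy = pd.DataFrame({
    'Total_Fatal_Injuries': [1.0, None, 0.0],
    'Total_Serious_Injuries': [None, None, 2.0],
    'Total_Minor_Injuries': [0.0, None, 1.0],
    'Total_Uninjured': [3.0, None, None],
})

# Flag rows where every injury column was blank before filling
toy['Injury_Data_Missing'] = toy[injury_cols].isna().all(axis=1)

# Fill the remaining gaps with 0 in a single assignment
toy[injury_cols] = toy[injury_cols].fillna(0)
print(toy)
```

The flag makes it possible to exclude fully blank rows later, for example when computing fatality rates.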

In [None]:
df['Location'].value_counts(dropna=False)

In [None]:
# Change NaN to Unknown
df.loc[df['Location'].isna(), 'Location'] = 'Unknown'

In [None]:
df.info()

Looking through the values for Make, I see that the column needs more attention, so I'll try another method: going through the makes alphabetically.

In [None]:
# Let's go back to the Make column and clean further
# Show most popular makes beginning with 'A'
df[df['Make'].str.lower().str.startswith('a', na=False)].value_counts('Make').head(20)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('aeron', na=False), 'Make'] = 'Aeronca'
df.loc[df['Make'].str.lower().str.startswith('air tra', na=False), 'Make'] = 'Air Tractor'
df.loc[df['Make'].str.lower().str.startswith('aero comm', na=False), 'Make'] = 'Aero Commander'
df.loc[df['Make'].str.lower().str.startswith('ayre', na=False), 'Make'] = 'Ayres'
df.loc[df['Make'].str.lower().str.startswith('aerosp', na=False), 'Make'] = 'Aerospatiale'
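# Exclude 'Airbus Helicopters', which was split out from 'Airbus' earlier
df.loc[df['Make'].str.lower().str.startswith('airb', na=False) & (df['Make'] != 'Airbus Helicopters'), 'Make'] = 'Airbus'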
df.loc[df['Make'].str.lower().str.startswith('avia', na=False), 'Make'] = 'Aviat'
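
Prefix matching is broad: a short prefix such as 'avia' matches any make that begins with those letters, not only spellings of Aviat. Before overwriting, it may help to preview which values a prefix would catch. The helpers below are a hypothetical sketch, and the make names are toy values chosen to show the effect.

```python
import pandas as pd

def preview_prefix(series, prefix):
    """Count the distinct values a case-insensitive prefix would overwrite."""
    mask = series.str.lower().str.startswith(prefix, na=False)
    return series[mask].value_counts()

def apply_prefixes(series, prefix_to_name):
    """Return a copy with each matching prefix replaced by its canonical name."""
    out = series.copy()
    lowered = series.str.lower()
    for prefix, name in prefix_to_name.items():
        out.loc[lowered.str.startswith(prefix, na=False)] = name
    return out

# Toy values; 'Aviation Specialties' stands in for an unrelated make
toy = pd.Series(['AERONCA', 'Aeronca Champion', 'Aviation Specialties',
                 'Aviat Inc', None])

caught = preview_prefix(toy, 'avia')
cleaned = apply_prefixes(toy, {'aeron': 'Aeronca'})
print(caught)
print(cleaned.tolist())
```

If the preview shows unrelated makes, a longer prefix or an explicit list of spellings is the safer choice.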

In [None]:
# Show most popular makes beginning with 'A'
df[df['Make'].str.lower().str.startswith('a', na=False)].value_counts('Make').head(20)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('agus', na=False), 'Make'] = 'Agusta'
df.loc[df['Make'].str.lower().str.startswith('american cha', na=False), 'Make'] = 'American Champion'
df.loc[df['Make'].str.lower().str.startswith('american av', na=False), 'Make'] = 'American'
df.loc[df['Make'].str.lower().str.startswith('american leg', na=False), 'Make'] = 'American Legend'

In [None]:
# Show me most popular makes beginning with 'American G'
df[df['Make'].str.lower().str.startswith('american g', na=False)].value_counts('Make').head(20)

In [None]:
# Show me most popular makes beginning with 'American'
df[df['Make'].str.lower().str.startswith('american', na=False)].value_counts('Make').head(20)

In [None]:
# Show me most popular makes beginning with 'American B'
df[df['Make'].str.lower().str.startswith('american b', na=False)].value_counts('Make').head(20)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('american b', na=False), 'Make'] = 'American Blimp'

In [None]:
# Show me most popular makes beginning with 'American'
df[df['Make'].str.lower().str.startswith('american', na=False)].value_counts('Make').head(20)

In [None]:
# Show makes beginning with 'American Eu'
df[df['Make'].str.lower().str.startswith('american eu', na=False)].value_counts('Make').head(20)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('american eu', na=False), 'Make'] = 'American Eurocopter'

In [None]:
# Show me most popular makes beginning with 'B'
df[df['Make'].str.lower().str.startswith('b', na=False)].value_counts('Make').head(20)

In [None]:
# Show makes beginning with 'Bell'
df[df['Make'].str.lower().str.startswith('bell', na=False)].value_counts('Make').head(20)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('bell h', na=False), 'Make'] = 'Bell'

In [None]:
df[df['Make'].str.lower().str.startswith('bell', na=False)].value_counts('Make').head(20)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('bella', na=False), 'Make'] = 'Bellanca'
df.loc[df['Make'].str.lower().str.startswith('bell-', na=False), 'Make'] = 'Bell'
df.loc[df['Make'].str.lower().str.startswith('bell/', na=False), 'Make'] = 'Bell'
df.loc[df['Make'].str.lower().str.startswith('bell t', na=False), 'Make'] = 'Bell'
df.loc[df['Make'].str.lower().str.startswith('bell b', na=False), 'Make'] = 'Bell'
df.loc[df['Make'].str.lower().str.startswith('bell 4', na=False), 'Make'] = 'Bell'
df.loc[df['Make'].str.lower().str.startswith('bell s', na=False), 'Make'] = 'Bell'

In [None]:
df[df['Make'].str.lower().str.startswith('bell', na=False)].value_counts('Make').head(20)

In [None]:
# Show makes beginning with 'Boe'
df[df['Make'].str.lower().str.startswith('boe', na=False)].value_counts('Make').head(20)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('boeing h', na=False), 'Make'] = 'Boeing Helicopters'
df.loc[df['Make'].str.lower().str.startswith('boeing c', na=False), 'Make'] = 'Boeing'
df.loc[df['Make'].str.lower().str.startswith('boeing v', na=False), 'Make'] = 'Boeing'

In [None]:
df[df['Make'].str.lower().str.startswith('b', na=False)].value_counts('Make').head(20)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('bens', na=False), 'Make'] = 'Benson'

In [None]:
df[df['Make'].str.lower().str.startswith('c', na=False)].value_counts('Make').head(40)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('cub', na=False), 'Make'] = 'Cubcrafters'
df.loc[df['Make'].str.lower().str.startswith('cirrus', na=False), 'Make'] = 'Cirrus Design'
df.loc[df['Make'].str.lower().str.startswith('champ', na=False), 'Make'] = 'Champion'
df.loc[df['Make'].str.lower().str.startswith('christ', na=False), 'Make'] = 'Christen Industries'
df.loc[df['Make'].str.lower().str.startswith('consol', na=False), 'Make'] = 'Consolidated Aeronautics'

In [None]:
# Show category value_counts for Cameron
df[df['Make'] == 'Cameron'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('camer', na=False), 'Make'] = 'Cameron Balloons'

In [None]:
df[df['Make'].str.lower().str.startswith('c', na=False)].value_counts('Make').head(40)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('continental c', na=False), 'Make'] = 'Continental Copters'
df.loc[df['Make'].str.lower().str.startswith('cosmos', na=False), 'Make'] = 'Cosmos'
df.loc[df['Make'].str.lower().str.startswith('curtis', na=False), 'Make'] = 'Curtiss-Wright'

In [None]:
df[df['Make'].str.lower().str.startswith('c', na=False)].value_counts('Make').head(40)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('cassu', na=False), 'Make'] = 'Cassutt'
df.loc[df['Make'].str.lower().str.startswith('cgs', na=False), 'Make'] = 'CGS Aviation'
df.loc[df['Make'].str.lower().str.startswith('continental', na=False), 'Make'] = 'Continental Copters'

In [None]:
df[df['Make'].str.lower().str.startswith('d', na=False)].value_counts('Make').head(20)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('doug', na=False), 'Make'] = 'Douglas'
df.loc[df['Make'].str.lower().str.startswith('dorn', na=False), 'Make'] = 'Dornier'

In [None]:
df[df['Make'].str.lower().str.startswith('e', na=False)].value_counts('Make').head(30)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('eagl', na=False), 'Make'] = 'Eagle Aircraft'
df.loc[df['Make'].str.lower().str.startswith('embr', na=False), 'Make'] = 'Embraer'
df.loc[df['Make'].str.lower().str.startswith('enstrom', na=False), 'Make'] = 'Enstrom'
df.loc[df['Make'].str.lower().str.startswith('ercou', na=False), 'Make'] = 'Ercoupe'
df.loc[df['Make'].str.lower().str.startswith('euroc', na=False), 'Make'] = 'Eurocopter'
df.loc[df['Make'].str.lower().str.startswith('evek', na=False), 'Make'] = 'Evektor Aerotechnik'
df.loc[df['Make'].str.lower().str.startswith('extra', na=False), 'Make'] = 'Extra'

In [None]:
df[df['Make'].str.lower().str.startswith('e', na=False)].value_counts('Make').head(30)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('eaa', na=False), 'Make'] = 'Eaa'
df.loc[df['Make'].str.lower().str.startswith('ecli', na=False), 'Make'] = 'Eclipse Aviation'
df.loc[df['Make'].str.lower().str.startswith('eip', na=False), 'Make'] = 'Eipper'
df.loc[df['Make'].str.lower().str.startswith('eiri', na=False), 'Make'] = 'Eiriavion Oy'
df.loc[df['Make'].str.lower().str.startswith('eng', na=False), 'Make'] = 'Engineering & Research'
df.loc[df['Make'].str.lower().str.startswith('evol', na=False), 'Make'] = 'Evolution'

In [None]:
df[df['Make'].str.lower().str.startswith('f', na=False)].value_counts('Make').head(30)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('fairch', na=False), 'Make'] = 'Fairchild'
df.loc[df['Make'].str.lower().str.startswith('firef', na=False), 'Make'] = 'Firefly'
df.loc[df['Make'].str.lower().str.startswith('fish', na=False), 'Make'] = 'Fisher'
df.loc[df['Make'].str.lower().str.startswith('fleet', na=False), 'Make'] = 'Fleet'
df.loc[df['Make'].str.lower().str.startswith('flight d', na=False), 'Make'] = 'Flight Design'
df.loc[df['Make'].str.lower().str.startswith('flights', na=False), 'Make'] = 'Flightstar'
df.loc[df['Make'].str.lower().str.startswith('fokk', na=False), 'Make'] = 'Fokker'

In [None]:
df[df['Make'].str.lower().str.startswith('f', na=False)].value_counts('Make').head(30)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('foug', na=False), 'Make'] = 'Fouga'
df.loc[df['Make'].str.lower().str.startswith('found', na=False), 'Make'] = 'Found Aircraft'
df.loc[df['Make'].str.lower().str.startswith('funk', na=False), 'Make'] = 'Funk'

In [None]:
df[df['Make'].str.lower().str.startswith('f', na=False)].value_counts('Make').head(30)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('fant', na=False), 'Make'] = 'Fantasy'

In [None]:
df[df['Make'].str.lower().str.startswith('g', na=False)].value_counts('Make').head(30)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('garl', na=False), 'Make'] = 'Garlick'
df.loc[df['Make'].str.lower().str.startswith('gates', na=False), 'Make'] = 'Gates Learjet'
df.loc[df['Make'].str.lower().str.startswith('general atom', na=False), 'Make'] = 'General Atomics'
df.loc[df['Make'].str.lower().str.startswith('glasa', na=False), 'Make'] = 'Glasair'
df.loc[df['Make'].str.lower().str.startswith('glassa', na=False), 'Make'] = 'Glasair'
df.loc[df['Make'].str.lower().str.startswith('glase', na=False), 'Make'] = 'Glaser Dirks'
df.loc[df['Make'].str.lower().str.startswith('glasf', na=False), 'Make'] = 'Glasflugel'
df.loc[df['Make'].str.lower().str.startswith('globe', na=False), 'Make'] = 'Globe'
df.loc[df['Make'].str.lower().str.startswith('great l', na=False), 'Make'] = 'Great Lakes'
df.loc[df['Make'].str.lower().str.startswith('grob', na=False), 'Make'] = 'Grob'
df.loc[df['Make'].str.lower().str.startswith('grum', na=False), 'Make'] = 'Grumman'
df.loc[df['Make'].str.lower().str.startswith('gulfstr', na=False), 'Make'] = 'Gulfstream'
df.loc[df['Make'].str.lower().str.startswith('golden c', na=False), 'Make'] = 'Golden Circle Air'
df.loc[df['Make'].str.lower().str.startswith('gren', na=False), 'Make'] = 'Grenier'

In [None]:
df[df['Make'].str.lower().str.startswith('h', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('hawk', na=False), 'Make'] = 'Hawker'
df.loc[df['Make'].str.lower().str.startswith('head', na=False), 'Make'] = 'Head Balloons'
df.loc[df['Make'].str.lower().str.startswith('helio', na=False), 'Make'] = 'Helio'
df.loc[df['Make'].str.lower().str.startswith('hiller', na=False), 'Make'] = 'Hiller'
df.loc[df['Make'].str.lower().str.startswith('howard', na=False), 'Make'] = 'Howard Aircraft'
df.loc[df['Make'].str.lower().str.startswith('hughes', na=False), 'Make'] = 'Hughes Helicopters'

In [None]:
df[df['Make'].str.lower().str.startswith('i', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('i.c.a', na=False), 'Make'] = 'I.c.a. Brasov'
df.loc[df['Make'].str.lower().str.startswith('ica', na=False), 'Make'] = 'I.c.a. Brasov'
df.loc[df['Make'].str.lower().str.startswith('icon', na=False), 'Make'] = 'Icon'
df.loc[df['Make'].str.lower().str.startswith('indu', na=False), 'Make'] = 'Indus'
df.loc[df['Make'].str.lower().str.startswith('infini', na=False), 'Make'] = 'Infinity'
df.loc[df['Make'].str.lower().str.startswith('iniz', na=False), 'Make'] = 'Iniziative Industriali Italian'
df.loc[df['Make'].str.lower().str.startswith('interp', na=False), 'Make'] = 'Interplane'
df.loc[df['Make'].str.lower().str.startswith('intersta', na=False), 'Make'] = 'Interstate'
df.loc[df['Make'].str.lower().str.startswith('israel', na=False), 'Make'] = 'Israel Aircraft Industries'

In [None]:
df[df['Make'].str.lower().str.startswith('j', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('jabi', na=False), 'Make'] = 'Jabiru'
df.loc[df['Make'].str.lower().str.startswith('jihl', na=False), 'Make'] = 'Jihlavan'
df.loc[df['Make'].str.lower().str.startswith('jode', na=False), 'Make'] = 'Jodel'
df.loc[df['Make'].str.lower().str.startswith('johns', na=False), 'Make'] = 'Johnson'
df.loc[df['Make'].str.lower().str.startswith('jones', na=False), 'Make'] = 'Jones'
df.loc[df['Make'].str.lower().str.startswith('just', na=False), 'Make'] = 'Just Aircraft'

In [None]:
df[df['Make'].str.lower().str.startswith('k', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('kama', na=False), 'Make'] = 'Kaman'
df.loc[df['Make'].str.lower().str.startswith('kawa', na=False), 'Make'] = 'Kawasaki'
df.loc[df['Make'].str.lower().str.startswith('kitf', na=False), 'Make'] = 'Kitfox'
df.loc[df['Make'].str.lower().str.startswith('kolb', na=False), 'Make'] = 'Kolb'
df.loc[df['Make'].str.lower().str.startswith('kubic', na=False), 'Make'] = 'Kubicek'

In [None]:
df[df['Make'].str.lower().str.startswith('l', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('lake', na=False), 'Make'] = 'Lake'
df.loc[df['Make'].str.lower().str.startswith('lanc', na=False), 'Make'] = 'Lancair'
df.loc[df['Make'].str.lower().str.startswith('lars', na=False), 'Make'] = 'Larson'
df.loc[df['Make'].str.lower().str.startswith('lear', na=False), 'Make'] = 'Learjet'
df.loc[df['Make'].str.lower().str.startswith('let', na=False), 'Make'] = 'Let'
df.loc[df['Make'].str.lower().str.startswith('liberty', na=False), 'Make'] = 'Liberty'
df.loc[df['Make'].str.lower().str.startswith('lindst', na=False), 'Make'] = 'Lindstrand Balloons'
df.loc[df['Make'].str.lower().str.startswith('lockh', na=False), 'Make'] = 'Lockheed'
df.loc[df['Make'].str.lower().str.startswith('long', na=False), 'Make'] = 'Long'
df.loc[df['Make'].str.lower().str.startswith('lusc', na=False), 'Make'] = 'Luscombe'

In [None]:
df[df['Make'].str.lower().str.startswith('m', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('martin', na=False), 'Make'] = 'Martin'
df.loc[df['Make'].str.lower().str.startswith('maul', na=False), 'Make'] = 'Maule'
# The column is lowercased before matching, so the prefixes must be lowercase too
df.loc[df['Make'].str.lower().str.startswith('mcdonnell douglas h', na=False), 'Make'] = 'Mcdonnell Douglas Helicopters'
df.loc[df['Make'].str.lower().str.startswith('mcdonnell douglas a', na=False), 'Make'] = 'Mcdonnell Douglas'
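
Because the column is lowercased before the comparison, a prefix only matches if it is written in lowercase as well. A small sketch with made-up values (the `makes` series is hypothetical) shows the difference:

```python
import pandas as pd

# Hypothetical spellings, not taken from the dataset.
makes = pd.Series(['MCDONNELL DOUGLAS HELICOPTER', 'McDonnell Douglas Aircraft'])
lowered = makes.str.lower()

# An uppercase prefix can never match an all-lowercase string.
upper_hits = lowered.str.startswith('MCDONNELL DOUGLAS H').sum()
# The same prefix in lowercase matches the helicopter row.
lower_hits = lowered.str.startswith('mcdonnell douglas h').sum()

print(upper_hits, lower_hits)
```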

In [None]:
df[df['Make'].str.lower().str.startswith('mcdonn', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('mcdonnell-douglas', na=False), 'Make'] = 'Mcdonnell Douglas'

In [None]:
df[df['Make'].str.lower().str.startswith('mcdonn', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[(df['Make'] == 'MCDONNELL DOUGLAS') | (df['Make'] == 'McDonnell Douglas'), 'Make'] = 'Mcdonnell Douglas'

In [None]:
df[df['Make'].str.lower().str.startswith('mcdonn', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('maxair', na=False), 'Make'] = 'Maxair'
df.loc[df['Make'].str.lower().str.startswith('mbb', na=False), 'Make'] = 'MBB'
df.loc[df['Make'].str.lower().str.startswith('md helicopter', na=False), 'Make'] = 'Md Helicopter'
df.loc[df['Make'].str.lower().str.startswith('meyer', na=False), 'Make'] = 'Meyers'
df.loc[df['Make'].str.lower().str.startswith('miller', na=False), 'Make'] = 'Miller'
df.loc[df['Make'].str.lower().str.startswith('mitsub', na=False), 'Make'] = 'Mitsubishi'
df.loc[df['Make'].str.lower().str.startswith('monoco', na=False), 'Make'] = 'Monocoupe'
df.loc[df['Make'].str.lower().str.startswith('moone', na=False), 'Make'] = 'Mooney'
df.loc[df['Make'].str.lower().str.startswith('morris', na=False), 'Make'] = 'Morrisey'
df.loc[df['Make'].str.lower().str.startswith('murph', na=False), 'Make'] = 'Murphy'
df.loc[df['Make'].str.lower().str.startswith('messersch', na=False), 'Make'] = 'Messerschmitt'
df.loc[df['Make'].str.lower().str.startswith('mikoya', na=False), 'Make'] = 'Mikoyan'
df.loc[df['Make'].str.lower().str.startswith('moor', na=False), 'Make'] = 'Moore'
df.loc[df['Make'].str.lower().str.startswith('mong', na=False), 'Make'] = 'Mong'

In [None]:
df[df['Make'].str.lower().str.startswith('n', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('nanch', na=False), 'Make'] = 'Nanchang'
df.loc[df['Make'].str.lower().str.startswith('navio', na=False), 'Make'] = 'Navion'
df.loc[df['Make'].str.lower().str.startswith('nelso', na=False), 'Make'] = 'Nelson'
df.loc[df['Make'].str.lower().str.startswith('new pip', na=False), 'Make'] = 'New Piper'
df.loc[df['Make'].str.lower().str.startswith('newel', na=False), 'Make'] = 'Newell'
df.loc[df['Make'].str.lower().str.startswith('nord', na=False), 'Make'] = 'Nord'
df.loc[df['Make'].str.lower().str.startswith('north ame', na=False), 'Make'] = 'North American'
df.loc[df['Make'].str.lower().str.startswith('north w', na=False), 'Make'] = 'North Wing'
df.loc[df['Make'].str.lower().str.startswith('northw', na=False), 'Make'] = 'North Wing'

In [None]:
df[df['Make'].str.lower().str.startswith('o', na=False)].value_counts('Make').head(50)

In [None]:
df[df['Make'].str.lower().str.startswith('p', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('p z', na=False), 'Make'] = 'PZL'
df.loc[df['Make'].str.lower().str.startswith('pz', na=False), 'Make'] = 'PZL'
df.loc[df['Make'].str.lower().str.startswith('parke', na=False), 'Make'] = 'Parker'
df.loc[df['Make'].str.lower().str.startswith('partenav', na=False), 'Make'] = 'Partenavia'
df.loc[df['Make'].str.lower().str.startswith('pdp', na=False), 'Make'] = 'PDPS'
df.loc[df['Make'].str.lower().str.startswith('perth', na=False), 'Make'] = 'Perth Amboy'
df.loc[df['Make'].str.lower().str.startswith('phanto', na=False), 'Make'] = 'Phantom'
df.loc[df['Make'].str.lower().str.startswith('philli', na=False), 'Make'] = 'Phillips'
df.loc[df['Make'].str.lower().str.startswith('piagg', na=False), 'Make'] = 'Piaggio'
df.loc[df['Make'].str.lower().str.startswith('piel', na=False), 'Make'] = 'Piel'
df.loc[df['Make'].str.lower().str.startswith('piet', na=False), 'Make'] = 'Pietenpol'
df.loc[df['Make'].str.lower().str.startswith('pilat', na=False), 'Make'] = 'Pilatus'
df.loc[df['Make'].str.lower().str.startswith('piper', na=False), 'Make'] = 'Piper'
df.loc[df['Make'].str.lower().str.startswith('pipest', na=False), 'Make'] = 'Pipistrel'
df.loc[df['Make'].str.lower().str.startswith('pitts', na=False), 'Make'] = 'Pitts'
df.loc[df['Make'].str.lower().str.startswith('powr', na=False), 'Make'] = 'Powrachute'
df.loc[df['Make'].str.lower().str.startswith('progress', na=False), 'Make'] = 'Progressive'

In [None]:
df[df['Make'].str.lower().str.startswith('q', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('quartz', na=False), 'Make'] = 'Quartz Mountain'
df.loc[df['Make'].str.lower().str.startswith('quad', na=False), 'Make'] = 'Quad City'
df.loc[df['Make'].str.lower().str.startswith('quest a', na=False), 'Make'] = 'Quest Aircraft'
df.loc[(df['Make'] == 'QUEST') | (df['Make'] == 'Quest'), 'Make'] = 'Quest Aircraft'
df.loc[df['Make'].str.lower().str.startswith('questa', na=False), 'Make'] = 'Questair'
df.loc[df['Make'].str.lower().str.startswith('quickie', na=False), 'Make'] = 'Quickie'
df.loc[df['Make'].str.lower().str.startswith('quick s', na=False), 'Make'] = 'Quicksilver'
df.loc[df['Make'].str.lower().str.startswith('quicksil', na=False), 'Make'] = 'Quicksilver'
df.loc[df['Make'].str.lower().str.startswith('quiks', na=False), 'Make'] = 'Quicksilver'
df.loc[df['Make'].str.lower().str.startswith('quinn', na=False), 'Make'] = 'Quinn'
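
Several of these prefixes are close to each other ('quest a', 'questa', and the bare 'Quest'), so it is worth checking that they do not overwrite one another. A minimal sketch, using a hypothetical `makes` series with made-up spellings:

```python
import pandas as pd

# Hypothetical raw spellings, not taken from the dataset.
makes = pd.Series(['QUEST', 'Quest Aircraft Co', 'QUESTAIR', 'Questair Inc'])
lowered = makes.str.lower()

cleaned = makes.copy()
# 'quest a' includes the space, so it does not match 'questair'.
cleaned.loc[lowered.str.startswith('quest a')] = 'Quest Aircraft'
# The bare name is handled by an exact comparison.
cleaned.loc[lowered == 'quest'] = 'Quest Aircraft'
# 'questa' has no space, so it does not match 'quest aircraft'.
cleaned.loc[lowered.str.startswith('questa')] = 'Questair'

print(cleaned.tolist())
```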

In [None]:
df[df['Make'].str.lower().str.startswith('r', na=False)].value_counts('Make').head(50)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('rans', na=False), 'Make'] = 'Rans'
df.loc[df['Make'].str.lower().str.startswith('raven', na=False), 'Make'] = 'Raven'
df.loc[df['Make'].str.lower().str.startswith('raythe', na=False), 'Make'] = 'Raytheon'
df.loc[df['Make'].str.lower().str.startswith('reims', na=False), 'Make'] = 'Reims Aviation'
df.loc[df['Make'].str.lower().str.startswith('remos', na=False), 'Make'] = 'Remos'
df.loc[df['Make'].str.lower().str.startswith('republ', na=False), 'Make'] = 'Republic'
df.loc[df['Make'].str.lower().str.startswith('revolut', na=False), 'Make'] = 'Revolution Helicopters'
df.loc[df['Make'].str.lower().str.startswith('riddel', na=False), 'Make'] = 'Riddell'
df.loc[df['Make'].str.lower().str.startswith('robinson', na=False), 'Make'] = 'Robinson Helicopter'
df.loc[(df['Make'] == 'ROBIN') | (df['Make'] == 'Robin'), 'Make'] = 'Robin'
df.loc[df['Make'].str.lower().str.startswith('rockwell', na=False), 'Make'] = 'Rockwell'
df.loc[df['Make'].str.lower().str.startswith('rolladen', na=False), 'Make'] = 'Rolladen-Schneider'
df.loc[df['Make'].str.lower().str.startswith('rose', na=False), 'Make'] = 'Rose'
df.loc[df['Make'].str.lower().str.startswith('rotec', na=False), 'Make'] = 'Rotec'
df.loc[df['Make'].str.lower().str.startswith('rotorw', na=False), 'Make'] = 'Rotorway'
df.loc[df['Make'].str.lower().str.startswith('rutan', na=False), 'Make'] = 'Rutan'
df.loc[df['Make'].str.lower().str.startswith('ryan', na=False), 'Make'] = 'Ryan'

In [None]:
df[df['Make'].str.lower().str.startswith('s', na=False)].value_counts('Make').loc[lambda x : x<3].head(60)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('saab', na=False), 'Make'] = 'Saab'
df.loc[df['Make'].str.lower().str.startswith('scheibe', na=False), 'Make'] = 'Scheibe'
df.loc[df['Make'].str.lower().str.startswith('schempp', na=False), 'Make'] = 'Schempp Hirth'
df.loc[df['Make'].str.lower().str.startswith('schleich', na=False), 'Make'] = 'Schleicher'
df.loc[df['Make'].str.lower().str.startswith('schwei', na=False), 'Make'] = 'Schweizer'
df.loc[df['Make'].str.lower().str.startswith('scottish', na=False), 'Make'] = 'Scottish Aviation'
df.loc[df['Make'].str.lower().str.startswith('short bro', na=False), 'Make'] = 'Short Brothers'
df.loc[df['Make'].str.lower().str.startswith('siai', na=False), 'Make'] = 'Siai Marchetti'
df.loc[df['Make'].str.lower().str.startswith('sikors', na=False), 'Make'] = 'Sikorsky'
df.loc[df['Make'].str.lower().str.startswith('silva', na=False), 'Make'] = 'Silvaire'
df.loc[df['Make'].str.lower().str.startswith('six ch', na=False), 'Make'] = 'Six Chuter'
df.loc[df['Make'].str.lower().str.startswith('skykit', na=False), 'Make'] = 'Skykits Corp'
df.loc[df['Make'].str.lower().str.startswith('slings', na=False), 'Make'] = 'Slingsby'

In [None]:
df[df['Make'].str.lower().str.startswith('t', na=False)].value_counts('Make').loc[lambda x : x<2].head(60)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('taylorcr', na=False), 'Make'] = 'Taylorcraft'
df.loc[df['Make'].str.lower().str.startswith('tecn', na=False), 'Make'] = 'Tecnam'
df.loc[df['Make'].str.lower().str.startswith('temco', na=False), 'Make'] = 'Temco'
df.loc[df['Make'].str.lower().str.startswith('terato', na=False), 'Make'] = 'Teratorn'
df.loc[df['Make'].str.lower().str.startswith('texas h', na=False), 'Make'] = 'Texas Helicopter'
df.loc[df['Make'].str.lower().str.startswith('textro', na=False), 'Make'] = 'Textron Aviation'
df.loc[df['Make'].str.lower().str.startswith('thorp', na=False), 'Make'] = 'Thorp'
df.loc[df['Make'].str.lower().str.startswith('thrush', na=False), 'Make'] = 'Thrush Aircraft'
df.loc[df['Make'].str.lower().str.startswith('thunder', na=False), 'Make'] = 'Thunder And Colt'
df.loc[df['Make'].str.lower().str.startswith('titan', na=False), 'Make'] = 'Titan'
df.loc[df['Make'].str.lower().str.startswith('tl u', na=False), 'Make'] = 'TL Ultralight'
df.loc[df['Make'].str.lower().str.startswith('travel', na=False), 'Make'] = 'Travel Air'
df.loc[df['Make'].str.lower().str.startswith('trick', na=False), 'Make'] = 'Trick Trikes'
df.loc[df['Make'].str.lower().str.startswith('tubb', na=False), 'Make'] = 'Tubbs'
df.loc[df['Make'].str.lower().str.startswith('tupole', na=False), 'Make'] = 'Tupolev'
df.loc[df['Make'].str.lower().str.startswith('the boei', na=False), 'Make'] = 'Boeing'

In [None]:
df[df['Make'].str.lower().str.startswith('u', na=False)].value_counts('Make').loc[lambda x : x>0].head(60)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('ultrali', na=False), 'Make'] = 'Ultralight Flight'
df.loc[df['Make'].str.lower().str.startswith('ultramag', na=False), 'Make'] = 'Ultramagic'
df.loc[df['Make'].str.lower().str.startswith('ultravia', na=False), 'Make'] = 'Ultravia Aero'
df.loc[df['Make'].str.lower().str.startswith('united cons', na=False), 'Make'] = 'United Consultant Corp.'
df.loc[df['Make'].str.lower().str.startswith('univa', na=False), 'Make'] = 'Univair'
df.loc[df['Make'].str.lower().str.startswith('universal s', na=False), 'Make'] = 'Universal'
df.loc[df['Make'].str.lower().str.startswith('unknow', na=False), 'Make'] = 'Unknown'
df.loc[df['Make'].str.lower().str.startswith('unregis', na=False), 'Make'] = 'Unknown'
df.loc[df['Make'].str.lower().str.startswith('urban a', na=False), 'Make'] = 'Urban Air'

In [None]:
df[df['Make'].str.lower().str.startswith('v', na=False)].value_counts('Make').loc[lambda x : x>0].head(60)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('valent', na=False), 'Make'] = 'Valentin'
df.loc[df['Make'].str.lower().str.startswith('vans', na=False), 'Make'] = 'Vans'
df.loc[df['Make'].str.lower().str.startswith("van's", na=False), 'Make'] = 'Vans'
df.loc[df['Make'].str.lower().str.startswith('varga', na=False), 'Make'] = 'Varga'
df.loc[df['Make'].str.lower().str.startswith('vari', na=False), 'Make'] = 'Varieze'

In [None]:
df[df['Make'].str.lower().str.startswith('vaugh', na=False)].value_counts('Make').loc[lambda x : x>0].head(60)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('vaugh', na=False), 'Make'] = 'Vaughn'

In [None]:
df[df['Make'].str.lower().str.startswith('w', na=False)].value_counts('Make').loc[lambda x : x<2].head(60)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('waco', na=False), 'Make'] = 'Waco'
df.loc[df['Make'].str.lower().str.startswith('weatherl', na=False), 'Make'] = 'Weatherly'
df.loc[df['Make'].str.lower().str.startswith('weber', na=False), 'Make'] = 'Weber'
df.loc[df['Make'].str.lower().str.startswith('westland', na=False), 'Make'] = 'Westland Helicopters'
df.loc[df['Make'].str.lower().str.startswith('wheele', na=False), 'Make'] = 'Wheeler'
df.loc[df['Make'].str.lower().str.startswith('white', na=False), 'Make'] = 'White'
df.loc[df['Make'].str.lower().str.startswith('whittman', na=False), 'Make'] = 'Whittman'
df.loc[df['Make'].str.lower().str.startswith('williams hel', na=False), 'Make'] = 'Williams Helicopter'
df.loc[df['Make'].str.lower().str.startswith('wsk', na=False), 'Make'] = 'WSK'

In [None]:
df[df['Make'].str.lower().str.startswith('x', na=False)].value_counts('Make').loc[lambda x : x>0].head(60)

In [None]:
df[df['Make'].str.lower().str.startswith('y', na=False)].value_counts('Make').loc[lambda x : x>0].head(60)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('yamok', na=False), 'Make'] = 'Yamokoski'

In [None]:
df[df['Make'].str.lower().str.startswith('z', na=False)].value_counts('Make').loc[lambda x : x>0].head(60)

In [None]:
df.loc[df['Make'].str.lower().str.startswith('zenai', na=False), 'Make'] = 'Zenair'
df.loc[df['Make'].str.lower().str.startswith('zeni', na=False), 'Make'] = 'Zenith'
df.loc[df['Make'].str.lower().str.startswith('zimmerm', na=False), 'Make'] = 'Zimmerman'
df.loc[df['Make'].str.lower().str.startswith('zivk', na=False), 'Make'] = 'Zivko Aeronautics'
df.loc[df['Make'].str.lower().str.startswith('zli', na=False), 'Make'] = 'Zlin'

In [None]:
df.loc[df['Make'].str.lower().str.startswith('aero vodo', na=False), 'Make'] = 'Aero Vodochody'
df.loc[df['Make'].str.lower().str.startswith('aeromot', na=False), 'Make'] = 'Aeromot'
df.loc[df['Make'].str.lower().str.startswith('aeropro', na=False), 'Make'] = 'Aeropro CZ'
df.loc[df['Make'].str.lower().str.startswith('aerostar', na=False), 'Make'] = 'Aerostar'
df.loc[df['Make'].str.lower().str.startswith('aerotek', na=False), 'Make'] = 'Aerotek'
df.loc[df['Make'].str.lower().str.startswith('air cre', na=False), 'Make'] = 'Air Creation'
df.loc[df['Make'].str.lower().str.startswith('aircraft mfg', na=False), 'Make'] = 'Aircraft Mfg and Dev'
df.loc[df['Make'].str.lower().str.startswith('alon', na=False), 'Make'] = 'Alon'
df.loc[df['Make'].str.lower().str.startswith('amateur b', na=False), 'Make'] = 'Amateur Built'
df.loc[df['Make'].str.lower().str.startswith('atr', na=False), 'Make'] = 'ATR'
df.loc[df['Make'].str.lower().str.startswith('autogyr', na=False), 'Make'] = 'AutoGyro'
df.loc[df['Make'].str.lower().str.startswith('avid', na=False), 'Make'] = 'Avid'
df.loc[df['Make'].str.lower().str.startswith('balloon w', na=False), 'Make'] = 'Balloon Works'
df.loc[df['Make'].str.lower().str.startswith('brantl', na=False), 'Make'] = 'Brantly'
df.loc[df['Make'].str.lower().str.startswith('british ae', na=False), 'Make'] = 'British Aerospace'
df.loc[df['Make'].str.lower().str.startswith('britten', na=False), 'Make'] = 'Britten Norman'
df.loc[df['Make'].str.lower().str.startswith('buckeye', na=False), 'Make'] = 'Buckeye'
df.loc[df['Make'].str.lower().str.startswith('burkhart', na=False), 'Make'] = 'Burkhart Grob'
df.loc[df['Make'].str.lower().str.startswith('canadair', na=False), 'Make'] = 'Canadair'
df.loc[df['Make'].str.lower().str.startswith('cassutt', na=False), 'Make'] = 'Cassutt'
df.loc[df['Make'].str.lower().str.startswith('cgs', na=False), 'Make'] = 'CGS Aviation'
df.loc[df['Make'].str.lower().str.startswith('classic airc', na=False), 'Make'] = 'Classic Aircraft Corp'
df.loc[df['Make'].str.lower().str.startswith('continental', na=False), 'Make'] = 'Continental Copters'
df.loc[df['Make'].str.lower().str.startswith('convair', na=False), 'Make'] = 'Convair'
df.loc[df['Make'].str.lower().str.startswith('cosmos', na=False), 'Make'] = 'Cosmos'
df.loc[df['Make'].str.lower().str.startswith('costruzioni', na=False), 'Make'] = 'Costruzioni Aeronautiche Tecna'
df.loc[df['Make'].str.lower().str.startswith('curtis', na=False), 'Make'] = 'Curtiss-Wright'
df.loc[df['Make'].str.lower().str.startswith('czech a', na=False), 'Make'] = 'Czech Aircraft Works'
df.loc[df['Make'].str.lower().str.startswith('czech s', na=False), 'Make'] = 'Czech Sport Aircraft'
df.loc[df['Make'].str.lower().str.startswith('downer', na=False), 'Make'] = 'Downer Aircraft Industries'
df.loc[df['Make'].str.lower().str.startswith('pipistrel', na=False), 'Make'] = 'Pipistrel'
df.loc[df['Make'].str.lower().str.startswith('socata', na=False), 'Make'] = 'Socata'
df.loc[df['Make'].str.lower().str.startswith('sonex', na=False), 'Make'] = 'Sonex'
df.loc[df['Make'].str.lower().str.startswith('stearm', na=False), 'Make'] = 'Stearman Aircraft'
df.loc[df['Make'].str.lower().str.startswith('steen', na=False), 'Make'] = 'Steen'
df.loc[df['Make'].str.lower().str.startswith('stemme', na=False), 'Make'] = 'Stemme'
df.loc[df['Make'].str.lower().str.startswith('stinson', na=False), 'Make'] = 'Stinson'
df.loc[df['Make'].str.lower().str.startswith('sukh', na=False), 'Make'] = 'Sukhoi'
df.loc[df['Make'].str.lower().str.startswith('swearing', na=False), 'Make'] = 'Swearingen'

In [None]:
df['Make'].value_counts().loc[lambda x : x>50].head(60)
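
One way to spot makes that are still split across spellings is to group the values by a normalized key and count the distinct originals per key. This is a sketch on a hypothetical `makes` series; on the notebook data the same idea would be applied to `df['Make']`.

```python
import pandas as pd

# Hypothetical values; some differ only by case or trailing whitespace.
makes = pd.Series(['Cessna', 'CESSNA', 'cessna ', 'Piper', 'Beech', 'BEECH'])

# Normalized key: trimmed and lowercased.
key = makes.str.strip().str.lower()

# Number of distinct raw spellings behind each normalized key.
variants_per_key = makes.groupby(key).nunique()

# Keys that still have more than one spelling need further cleanup.
still_split = variants_per_key[variants_per_key > 1]
print(still_split)
```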

In [None]:
df.info()

The airport columns and FAR_Description are mostly empty and not useful to the intended analysis, so they can be removed.

In [None]:
# Remove Columns labelled Airport_Code, Airport_Name, FAR_Description
df = df.drop(['Airport_Code', 'Airport_Name', 'FAR_Description'], axis=1)

df.info()

In [None]:
# Aircraft_damage value_counts
df['Aircraft_damage'].value_counts(dropna=False)

In [None]:
# Fill in NaN values in Aircraft_damage with Unknown
df['Aircraft_damage'] = df['Aircraft_damage'].fillna('Unknown')

df['Aircraft_damage'].value_counts(dropna=False)

In [None]:
df['Aircraft_Category'].value_counts(dropna=False)

In [None]:
# Fill UNK with Unknown in Category column
df['Aircraft_Category'] = df['Aircraft_Category'].replace('UNK', 'Unknown')

In [None]:
# Fill ULTR with Ultralight in Category column
df['Aircraft_Category'] = df['Aircraft_Category'].replace('ULTR', 'Ultralight')
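
Both abbreviations could also be expanded in a single call by passing a dictionary to `replace`. A minimal sketch on a hypothetical `category` series:

```python
import pandas as pd

# Hypothetical category values, including a missing entry.
category = pd.Series(['Airplane', 'UNK', 'ULTR', None, 'Helicopter'])

# Map each abbreviation to its full label; other values are left untouched.
category = category.replace({'UNK': 'Unknown', 'ULTR': 'Ultralight'})
print(category.tolist())
```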

In [None]:
df['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df['Injury_Severity'].value_counts(dropna=False)

In [None]:
# Show NaN count in Injury_Severity column
df['Injury_Severity'].isna().sum()

In [None]:
# Fill in NaN values in Injury_Severity with Unknown
df['Injury_Severity'] = df['Injury_Severity'].fillna('Unknown')

In [None]:
# replace values starting with 'Fata' with 'Fatal' since the number of fatalities is already recorded in another column
df.loc[df['Injury_Severity'].str.startswith('Fata'), 'Injury_Severity'] = 'Fatal'
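
To illustrate what this step does, here is a sketch on a hypothetical `severity` series. The labels such as 'Fatal(2)' are assumed examples of values that carry a count in the label.

```python
import pandas as pd

# Hypothetical labels; assumed to resemble the raw Injury_Severity values.
severity = pd.Series(['Fatal(2)', 'Non-Fatal', 'Fatal(1)', 'Incident', 'Fatal', 'Unknown'])

# 'Non-Fatal' starts with 'Non', so it is not caught by the 'Fata' prefix.
is_fatal = severity.str.startswith('Fata', na=False)

# Replace every matching label with the plain 'Fatal' category.
collapsed = severity.mask(is_fatal, 'Fatal')
print(collapsed.tolist())
```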

In [None]:
df['Injury_Severity'].value_counts(dropna=False)

In [None]:
# Fill Unavailable with Unknown in Injury_Severity column
df['Injury_Severity'] = df['Injury_Severity'].replace('Unavailable', 'Unknown')

In [None]:
df['Injury_Severity'].value_counts(dropna=False)

In [None]:
df['Amateur_Built'].value_counts(dropna=False)

In [None]:
# Fill in NaN values in Amateur_Built with Unknown
df['Amateur_Built'] = df['Amateur_Built'].fillna('Unknown')

In [None]:
# Show NaN count in Report_Status column
df['Report_Status'].isna().sum()

In [None]:
# Fill in NaN values in Report_Status with Unknown
df['Report_Status'] = df['Report_Status'].fillna('Unknown')

In [None]:
df.info()

In [None]:
# Show NaN count in Country column
df['Country'].isna().sum()

In [None]:
# Fill in NaN values in Country with Unknown
df['Country'] = df['Country'].fillna('Unknown')

In [None]:
# Fill in NaN values in Registration_Number with Unknown
df['Registration_Number'] = df['Registration_Number'].fillna('Unknown')

In [None]:
# Fill in NaN values in Model with Unknown
df['Model'] = df['Model'].fillna('Unknown')

In [None]:
# Fill in NaN values in Number_of_Engines with Unknown
df['Number_of_Engines'] = df['Number_of_Engines'].fillna('Unknown')

In [None]:
# Fill in NaN values in Publication_Date with Unknown
df['Publication_Date'] = df['Publication_Date'].fillna('Unknown')
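
The text columns above could also be filled in one call by passing a column-to-value dictionary to `fillna`. This is a sketch on a hypothetical `sample` frame. The numeric and date columns are left out of it, since filling those with a string can change the column's dtype.

```python
import pandas as pd

# Hypothetical rows with missing text values.
sample = pd.DataFrame({
    'Country': ['United States', None],
    'Model': [None, '172'],
    'Registration_Number': ['N12345', None],
})

# One fill value per column; columns not listed are left as they are.
fill_values = {
    'Country': 'Unknown',
    'Model': 'Unknown',
    'Registration_Number': 'Unknown',
}
filled = sample.fillna(fill_values)
print(filled)
```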

In [None]:
df.info()

At this point every column has been filled with either a valid value or "Unknown" where no value could be recovered.
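
One way to confirm this is to count the missing values left in each column. The sketch below is a minimal check, assuming a small hypothetical frame (`toy`) in place of `df`; the helper name `columns_with_missing` is made up for illustration and is not part of the notebook.

```python
import pandas as pd

def columns_with_missing(frame: pd.DataFrame) -> pd.Series:
    """Return the NaN count for every column that still has missing values."""
    counts = frame.isna().sum()
    return counts[counts > 0]

# Hypothetical stand-in for df; in the notebook you would pass df itself.
toy = pd.DataFrame({
    'Country': ['United States', None, 'Canada'],
    'Model': ['172', 'PA-28', None],
    'Report_Status': ['Unknown', 'Probable Cause', 'Unknown'],
})

before = columns_with_missing(toy)      # columns that still contain NaN
toy_filled = toy.fillna('Unknown')      # same fill strategy as the cells above
after = columns_with_missing(toy_filled)

print(before)
print("All columns filled:", after.empty)
```

An empty result from `columns_with_missing(df)` would support the statement above; any column it lists still needs attention.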

In [None]:
# Export df as a separate file for Tableau visualizations
df.to_csv('Data/cleaned_aviation_data_complete.csv', index=False)

# Aircraft Damage Levels
I'd like to compute counts and percentages, and draw charts, to explore how Aircraft Damage levels relate to Injury Levels.
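
As a rough sketch of that idea, a single `groupby` can total every injury column per damage level, instead of filtering the frame once per level. The rows in `toy` are hypothetical; only the column names are taken from the notebook.

```python
import pandas as pd

# Hypothetical rows standing in for df; column names follow the notebook.
toy = pd.DataFrame({
    'Aircraft_Category': ['Airplane', 'Airplane', 'Airplane', 'Helicopter'],
    'Aircraft_damage': ['Destroyed', 'Substantial', 'Substantial', 'Minor'],
    'Total_Fatal_Injuries': [2, 0, 1, 0],
    'Total_Serious_Injuries': [0, 1, 0, 0],
    'Total_Minor_Injuries': [0, 0, 2, 1],
    'Total_Uninjured': [0, 1, 0, 2],
})

injury_cols = ['Total_Fatal_Injuries', 'Total_Serious_Injuries',
               'Total_Minor_Injuries', 'Total_Uninjured']

airplanes = toy[toy['Aircraft_Category'] == 'Airplane']

# One row per damage level, one column per injury outcome
injuries_by_damage = airplanes.groupby('Aircraft_damage')[injury_cols].sum()

# Number of incidents at each damage level
incidents_by_damage = airplanes['Aircraft_damage'].value_counts()

print(injuries_by_damage)
print(incidents_by_damage)
```

Applied to `df`, the same two expressions would give the per-damage-level fatality sums and incident counts that the following cells compute one variable at a time.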

In [None]:
# For airplane incidents, how many were destroyed, had minor damage or substantial damage
incidents_airplane = df[df['Aircraft_Category'] == 'Airplane']
incidents_airplane['Aircraft_damage'].value_counts()

In [None]:
# Sums of different injury categories for airplanes
fatalities_airplane = incidents_airplane['Total_Fatal_Injuries'].sum()
serious_injury_airplane = incidents_airplane['Total_Serious_Injuries'].sum()
minor_injury_airplane = incidents_airplane['Total_Minor_Injuries'].sum()
no_injury_airplane = incidents_airplane['Total_Uninjured'].sum()
airplane_people_total = fatalities_airplane + serious_injury_airplane + minor_injury_airplane + no_injury_airplane

airplane_people_total

In [None]:
substantial_damage_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Substantial'].shape[0]
minor_damage_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Minor'].shape[0]
destroyed_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Destroyed'].shape[0]

In [None]:
#fatalities in the damage subsets
fatalities_substantial_damage_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Substantial']['Total_Fatal_Injuries'].sum()
fatalities_minor_damage_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Minor']['Total_Fatal_Injuries'].sum()
fatalities_destroyed_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Destroyed']['Total_Fatal_Injuries'].sum()

In [None]:
# what are the percentages of incidents_airplane injury levels
no_injury_airplane_percent = no_injury_airplane / airplane_people_total * 100
fatalities_airplane_percent = fatalities_airplane / airplane_people_total * 100
serious_injury_airplane_percent = serious_injury_airplane / airplane_people_total * 100
minor_injury_airplane_percent = minor_injury_airplane / airplane_people_total * 100

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Create pie chart with injury percentages
labels = ['No Injuries', 'Fatalities', 'Serious Injuries', 'Minor Injuries']
sizes = [no_injury_airplane_percent, fatalities_airplane_percent, serious_injury_airplane_percent, minor_injury_airplane_percent]
colors = ['#66b3ff', '#ff9999', '#99ff99', '#ffcc99']
sns.set_style("whitegrid")
plt.figure(figsize=(6,6))
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', shadow=False, startangle=45)
plt.axis('equal')
plt.title('Percentages of injuries, no injuries, and fatalities in Airplane Incidents')
plt.show()

Make the same graph for helicopters

In [None]:
incidents_helicopter = df[df['Aircraft_Category'] == 'Helicopter']
# Sums of different injury categories for helicopters
fatalities_helicopter = incidents_helicopter['Total_Fatal_Injuries'].sum()
serious_injury_helicopter = incidents_helicopter['Total_Serious_Injuries'].sum()
minor_injury_helicopter = incidents_helicopter['Total_Minor_Injuries'].sum()
no_injury_helicopter = incidents_helicopter['Total_Uninjured'].sum()
helicopter_people_total = fatalities_helicopter + serious_injury_helicopter + minor_injury_helicopter + no_injury_helicopter

substantial_damage_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Substantial'].shape[0]
minor_damage_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Minor'].shape[0]
destroyed_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Destroyed'].shape[0]

#fatalities in the damage subsets
fatalities_substantial_damage_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Substantial']['Total_Fatal_Injuries'].sum()
fatalities_minor_damage_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Minor']['Total_Fatal_Injuries'].sum()
fatalities_destroyed_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Destroyed']['Total_Fatal_Injuries'].sum()

# percentages of incidents_helicopter in the injury column
no_injury_helicopter_percent = no_injury_helicopter / helicopter_people_total * 100
fatalities_helicopter_percent = fatalities_helicopter / helicopter_people_total * 100
serious_injury_helicopter_percent = serious_injury_helicopter / helicopter_people_total * 100
minor_injury_helicopter_percent = minor_injury_helicopter / helicopter_people_total * 100

# Create pie chart with injury percentages
labels = ['No Injuries', 'Fatalities', 'Serious Injuries', 'Minor Injuries']
sizes = [no_injury_helicopter_percent, fatalities_helicopter_percent, serious_injury_helicopter_percent, minor_injury_helicopter_percent]
colors = ['#66b3ff', '#ff9999', '#99ff99', '#ffcc99']
sns.set_style("whitegrid")
plt.figure(figsize=(6,6))
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', shadow=False, startangle=45)
plt.axis('equal')
plt.title('Percentages of injuries, no injuries, and fatalities in Helicopter Incidents')
plt.show()
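
The airplane and helicopter cells repeat the same arithmetic, so it could be wrapped in one helper. This is only a sketch: the function name `injury_percentages`, the `INJURY_COLUMNS` mapping, and the `toy` rows are assumptions for illustration, while the column names come from the notebook.

```python
import pandas as pd

# Pie-chart label -> dataset column holding the head count for that outcome
INJURY_COLUMNS = {
    'No Injuries': 'Total_Uninjured',
    'Fatalities': 'Total_Fatal_Injuries',
    'Serious Injuries': 'Total_Serious_Injuries',
    'Minor Injuries': 'Total_Minor_Injuries',
}

def injury_percentages(frame: pd.DataFrame, category: str) -> pd.Series:
    """Percentage of people in each injury outcome for one aircraft category."""
    subset = frame[frame['Aircraft_Category'] == category]
    totals = pd.Series({label: subset[col].sum()
                        for label, col in INJURY_COLUMNS.items()})
    return totals / totals.sum() * 100

# Hypothetical rows standing in for df
toy = pd.DataFrame({
    'Aircraft_Category': ['Airplane', 'Airplane', 'Helicopter', 'Helicopter'],
    'Total_Uninjured': [2, 1, 4, 2],
    'Total_Fatal_Injuries': [0, 1, 2, 0],
    'Total_Serious_Injuries': [0, 0, 1, 0],
    'Total_Minor_Injuries': [0, 0, 0, 1],
})

airplane_pct = injury_percentages(toy, 'Airplane')
helicopter_pct = injury_percentages(toy, 'Helicopter')
print(airplane_pct)
print(helicopter_pct)
```

The returned Series could be passed to `plt.pie` as the sizes, with its index as the labels, so both charts share one code path.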

# Report Status
Even though most of the Report Status column is empty or uninformative, the data that does exist there may be interesting, so I would like to see whether it can be sorted, cleaned, and used.

In [None]:
df['Report_Status'].value_counts()

In [None]:
# create a subset called informative_report that drops Probable Cause, Unknown, Foreign, Factual, and other non-useful values
non_informative = ['Probable Cause', 'Unknown', 'Foreign', '<br /><br />', 'Factual',
                   'None.', '.', 'Preliminary', 'Undetermined.']
# .copy() so later edits to Report_Status do not trigger a SettingWithCopyWarning
informative_report = df[~df['Report_Status'].isin(non_informative)].copy()

informative_report['Report_Status'].info()

In [None]:
# In informative_report, replace "pilots" with "pilot's"
informative_report['Report_Status'] = informative_report['Report_Status'].str.replace('pilots', "pilot's")

informative_report['Report_Status'].value_counts()

In [None]:
# create subset of rows named pilot_error that contain the word "pilot's" in the Report_Status column
pilot_error = informative_report[informative_report['Report_Status'].str.contains("pilot's")]

pilot_error['Report_Status'].info()

In [None]:
# What percentage of all the records are pilot_error
pilot_error.shape[0] / df.shape[0] * 100

In [None]:
# non_pilot_report is informative_report without the pilot_error results
non_pilot_report = informative_report[~informative_report.index.isin(pilot_error.index)]

non_pilot_report['Report_Status'].info()

In [None]:
# What percentage of all the records are non_pilot_error
non_pilot_report.shape[0] / df.shape[0] * 100

In [None]:
# What percentage of the informative records are non_pilot_error and pilot_error
print(non_pilot_report.shape[0] / informative_report.shape[0] * 100)
print(pilot_error.shape[0] / informative_report.shape[0] * 100)

Report Status
The previous section shows that the vast majority of the Report Status column is not informative, holding values such as "Probable Cause", "Foreign", and "Unknown". About 14% of the records (12,414) mention the pilot in the stated cause, which this analysis treats as pilot error. Another 6.8% (5,966) give a variety of other causes, most of which point to mechanical or equipment issues.

So of these 18,380 informative values for Report Status, about 67.5% are attributed to pilot error and about 32.5% to other causes: mostly mechanical or equipment failures, some with undetermined causes and some caused by human error in equipment maintenance.
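
The percentages quoted here can be checked with a few lines of arithmetic. The two counts are the ones stated above; the total of 87,951 is the record count reported after de-duplication earlier in the notebook, and the sketch assumes `df` still holds that many rows at this point.

```python
# Counts quoted in the summary
pilot_error_count = 12_414
non_pilot_count = 5_966

# Assumption: df still has the 87,951 records reported after de-duplication
total_records = 87_951

informative_total = pilot_error_count + non_pilot_count

# Share of all records
pilot_of_all = pilot_error_count / total_records * 100
non_pilot_of_all = non_pilot_count / total_records * 100

# Share of the informative records only
pilot_share = pilot_error_count / informative_total * 100
non_pilot_share = non_pilot_count / informative_total * 100

print(f"Informative records: {informative_total}")
print(f"Pilot error: {pilot_of_all:.1f}% of all, {pilot_share:.1f}% of informative")
print(f"Other causes: {non_pilot_of_all:.1f}% of all, {non_pilot_share:.1f}% of informative")
```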

In [None]:
# create bar chart comparing the row counts of pilot_error and non_pilot_report
plt.figure(figsize=(10, 6))
plt.bar(['Pilot Error', 'Non-Pilot Error'], [pilot_error.shape[0], non_pilot_report.shape[0]])
plt.title('Pilot Error vs Non-Pilot Error')
plt.xlabel('Error Type')
plt.ylabel('Number of Incidents')
plt.show()

Create two charts showing the damage percentage of planes and helicopters.
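
If the percentages themselves are wanted as numbers, not only as pie labels, `value_counts(normalize=True)` returns them directly. This is a minimal sketch with hypothetical rows in `toy`; the helper name `damage_percentages` is an assumption, not part of the notebook.

```python
import pandas as pd

def damage_percentages(frame: pd.DataFrame, category: str) -> pd.Series:
    """Percentage of incidents at each damage level for one aircraft category."""
    subset = frame[frame['Aircraft_Category'] == category]
    # normalize=True returns fractions that sum to 1; scale to percent
    return subset['Aircraft_damage'].value_counts(normalize=True) * 100

# Hypothetical rows standing in for df
toy = pd.DataFrame({
    'Aircraft_Category': ['Airplane'] * 4 + ['Helicopter'] * 2,
    'Aircraft_damage': ['Substantial', 'Substantial', 'Substantial', 'Destroyed',
                        'Substantial', 'Minor'],
})

airplane_damage_pct = damage_percentages(toy, 'Airplane')
helicopter_damage_pct = damage_percentages(toy, 'Helicopter')
print(airplane_damage_pct)
print(helicopter_damage_pct)
```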

In [None]:
# airplane damage percentages
airplane_damage = incidents_airplane['Aircraft_damage'].value_counts()
labels = airplane_damage.index
sizes = airplane_damage
colors = ['#66b3ff', '#ff9999', '#99ff99', '#ffcc99']
sns.set_style("whitegrid")
plt.figure(figsize=(6,6))
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', shadow=False, startangle=45)
plt.axis('equal')
plt.title('Airplane Damage')
plt.show()

In [None]:
# helicopter damage percentages
helicopter_damage = incidents_helicopter['Aircraft_damage'].value_counts()
labels = helicopter_damage.index
sizes = helicopter_damage
colors = ['#66b3ff', '#ff9999', '#99ff99', '#ffcc99']
sns.set_style("whitegrid")
plt.figure(figsize=(6,6))
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', shadow=False, startangle=45)
plt.axis('equal')
plt.title('Helicopter Damage')
plt.show()