# Business Understanding

The business is interested in expanding its portfolio by becoming involved in the aviation industry, specifically as an owner/operator of aircraft. At this initial fact-finding stage, the company knows nothing about the potential risks involved in owning and operating aircraft as a commercial endeavor.

I have been tasked with helping to determine some of the risks and suggesting which aircraft would be best suited for the company at the beginning stages of their new aviation division.

The stakeholders involved here would include not only the owners of the company, but also the department heads and employees of the aviation division that oversee and operate the aircraft for the company.

The goals for this project include recommending what kind of aircraft would provide the least risk for a commercial enterprise and suggesting certain operating protocols to help mitigate those risks.

# Data Understanding

The dataset being made available for this project is the National Transportation Safety Board aviation accident database as hosted on Kaggle.com at <a href="https://www.kaggle.com/datasets/khsamaha/aviation-accident-database-synopses" target="_blank">this link</a>. This dataset contains information about civil aviation accidents mainly in the US and includes many types of aircraft, from hot air balloons and powered parachutes to helicopters and airplanes. The current dataset contains 87,951 unique "Event ID" numbers, each representing an aircraft incident. It currently covers the years mainly from 1982 through 2022, with just a handful of accidents recorded before 1982. The dataset has 31 columns for each accident investigation that includes information like date and location, type of aircraft, make and model, injury severity information and number of injured, aircraft damage level, phase of flight for the accident, weather conditions, and reasons for the accident after the investigation is complete.

As the project is centered around risks of aviation, this dataset should prove to be a valuable resource for determining what kinds of risks exist in operating aircraft and making recommendations as far as what type of aircraft would be less of an investment risk. The columns detailing injury levels (Fatal, Serious, Minor, and Uninjured) to passengers and crew illuminate the human risks in aviation. Information related to aircraft damage levels will be valuable in terms of the financial risks.

Of concern in working with the dataset will be the lack of values in certain columns, especially the aircraft category and the accident reason columns. The "Aircraft Category" column is currently 64% empty, and the "Report Status" column (which provides a reason for the accident) is over 70% lacking in useful information. These two columns especially will need some in-depth cleaning and preparation.

# Data Preparation

## Data Cleaning

The dataset is named AviationData.csv and is in the data folder

In [706]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('data/AviationData.csv', encoding='latin-1')

df.head()

  df = pd.read_csv('data/AviationData.csv', encoding='latin-1')


Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,...,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,...,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.922223,-81.878056,,,...,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,...,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980


rename columns to remove dots as they may cause errors in Python (replace dots with underscores)

In [707]:
df.columns = df.columns.str.replace('.', '_')

df.head()

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Latitude,Longitude,Airport_Code,Airport_Name,...,Purpose_of_flight,Air_carrier,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status,Publication_Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,...,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.922223,-81.878056,,,...,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,...,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980


In [708]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88889 entries, 0 to 88888
Data columns (total 31 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                88889 non-null  object 
 1   Investigation_Type      88889 non-null  object 
 2   Accident_Number         88889 non-null  object 
 3   Event_Date              88889 non-null  object 
 4   Location                88837 non-null  object 
 5   Country                 88663 non-null  object 
 6   Latitude                34382 non-null  object 
 7   Longitude               34373 non-null  object 
 8   Airport_Code            50132 non-null  object 
 9   Airport_Name            52704 non-null  object 
 10  Injury_Severity         87889 non-null  object 
 11  Aircraft_damage         85695 non-null  object 
 12  Aircraft_Category       32287 non-null  object 
 13  Registration_Number     87507 non-null  object 
 14  Make                    88826 non-null

### As Event ID provides a unique identifier for each incident, let's check for duplicate rows

In [709]:
df[df.duplicated(subset=['Event_Id'], keep=False)]

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Latitude,Longitude,Airport_Code,Airport_Name,...,Purpose_of_flight,Air_carrier,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status,Publication_Date
117,20020917X01908,Accident,DCA82AA012B,1982-01-19,"ROCKPORT, TX",United States,,,RKP,ARANSAS COUNTY AIRPORT,...,Personal,,3.0,0.0,0.0,0.0,IMC,Approach,Probable Cause,19-01-1983
118,20020917X01908,Accident,DCA82AA012A,1982-01-19,"ROCKPORT, TX",United States,,,RKP,ARANSAS COUNTY AIRPORT,...,Executive/corporate,,3.0,0.0,0.0,0.0,IMC,Approach,Probable Cause,19-01-1983
153,20020917X02259,Accident,LAX82FA049A,1982-01-23,"VICTORVILLE, CA",United States,,,,,...,Personal,,2.0,0.0,4.0,0.0,VMC,Unknown,Probable Cause,23-01-1983
158,20020917X02400,Accident,MIA82FA038B,1982-01-23,"NEWPORT RICHEY, FL",United States,,,,,...,Personal,,0.0,0.0,0.0,3.0,VMC,Cruise,Probable Cause,23-01-1983
159,20020917X02400,Accident,MIA82FA038A,1982-01-23,"NEWPORT RICHEY, FL",United States,,,,,...,Personal,,0.0,0.0,0.0,3.0,VMC,Approach,Probable Cause,23-01-1983
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88796,20221121106336,Accident,WPR23LA041,2022-11-18,"Las Vegas, NV",United States,361239N,1151140W,VGT,NORTH LAS VEGAS,...,Instructional,702 HELICOPTER INC,0.0,0.0,0.0,3.0,VMC,,,07-12-2022
88797,20221122106340,Incident,DCA23WA071,2022-11-18,"Marrakech,",Morocco,,,,,...,,British Airways,0.0,0.0,0.0,0.0,,,,
88798,20221122106340,Incident,DCA23WA071,2022-11-18,"Marrakech,",Morocco,,,,,...,,Valair Private Jets,0.0,0.0,0.0,0.0,,,,
88813,20221123106354,Accident,WPR23LA045,2022-11-22,"San Diego, CA",United States,323414N,1165825W,SDM,Brown Field Municipal Airport,...,Instructional,HeliStream Inc.,0.0,0.0,0.0,4.0,VMC,,,22-12-2022


I see here that though these duplicate rows do represent separate aircraft in multi-aircraft incidents, the injury and/or fatality numbers are combined. This would constitute duplicate numbers in certain columns that would render errors in the analysis when making use of the injury values.

So let's remove the duplicates from this subset.

In [710]:
df = df.drop_duplicates(subset=['Event_Id'], keep='first')

# Double check to make sure duplicates have been removed
df[df.duplicated(subset=['Event_Id'], keep=False)]

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Latitude,Longitude,Airport_Code,Airport_Name,...,Purpose_of_flight,Air_carrier,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status,Publication_Date


In [711]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87951 entries, 0 to 88888
Data columns (total 31 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                87951 non-null  object 
 1   Investigation_Type      87951 non-null  object 
 2   Accident_Number         87951 non-null  object 
 3   Event_Date              87951 non-null  object 
 4   Location                87899 non-null  object 
 5   Country                 87729 non-null  object 
 6   Latitude                34212 non-null  object 
 7   Longitude               34203 non-null  object 
 8   Airport_Code            49484 non-null  object 
 9   Airport_Name            52031 non-null  object 
 10  Injury_Severity         86961 non-null  object 
 11  Aircraft_damage         84848 non-null  object 
 12  Aircraft_Category       32181 non-null  object 
 13  Registration_Number     86601 non-null  object 
 14  Make                    87888 non-null  obj

## Columns that are not needed
Remove certain columns that are mostly empty (and can't be filled in) and/or would not contain data useful to the intended analysis.

I want to make heavy use of: date, injury, damage, category, phase of flight, and report status
Let's remove Latitude, Longitude, Airport_Code, Airport_Name, Registration_Number, FAR_Description, Schedule, Air_carrier, and Publication_Date as those columns are either mostly empty or would not contribute to the analysis.

In [712]:
df = df.drop(['Latitude', 'Longitude', 'Airport_Code', 'Airport_Name', 'Registration_Number', 'FAR_Description', 'Schedule', 'Air_carrier', 'Publication_Date'], axis=1)

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87951 entries, 0 to 88888
Data columns (total 22 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                87951 non-null  object 
 1   Investigation_Type      87951 non-null  object 
 2   Accident_Number         87951 non-null  object 
 3   Event_Date              87951 non-null  object 
 4   Location                87899 non-null  object 
 5   Country                 87729 non-null  object 
 6   Injury_Severity         86961 non-null  object 
 7   Aircraft_damage         84848 non-null  object 
 8   Aircraft_Category       32181 non-null  object 
 9   Make                    87888 non-null  object 
 10  Model                   87859 non-null  object 
 11  Amateur_Built           87851 non-null  object 
 12  Number_of_Engines       81924 non-null  float64
 13  Engine_Type             80908 non-null  object 
 14  Purpose_of_flight       81829 non-null  obj

In [713]:
df.head()

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Injury_Severity,Aircraft_damage,Aircraft_Category,Make,...,Number_of_Engines,Engine_Type,Purpose_of_flight,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,Fatal(2),Destroyed,,Stinson,...,1.0,Reciprocating,Personal,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,Fatal(4),Destroyed,,Piper,...,1.0,Reciprocating,Personal,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,Fatal(3),Destroyed,,Cessna,...,1.0,Reciprocating,Personal,3.0,,,,IMC,Cruise,Probable Cause
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,Fatal(2),Destroyed,,Rockwell,...,1.0,Reciprocating,Personal,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,Fatal(1),Destroyed,,Cessna,...,,,Personal,1.0,2.0,,0.0,VMC,Approach,Probable Cause


### Incomplete Columns
Now, we have 87,951 entries in the dataset. Most of the columns are incomplete though. For the columns that cannot be completed with reasonable values, we can fill some of them in with 'Unknown' instead of leaving them blank (NaN).

Empty Location, Country, Aircraft_damage, Make, Model, Amateur_Built, Number_of_Engines, Engine_Type, Purpose_of_flight, Weather_Condition, Broad_phase_of_flight, and Report_Status values can be filled in as 'Unknown'.

In [714]:
# Fill in NaN values in multiple columns with "Unknown"
columns_to_fill = ['Location', 'Country', 'Aircraft_damage', 'Make', 'Model', 'Amateur_Built', 'Number_of_Engines', 'Engine_Type', 'Purpose_of_flight', 
                   'Weather_Condition', 'Broad_phase_of_flight', 'Report_Status']
for column in columns_to_fill:
    df[column] = df[column].fillna('Unknown')

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87951 entries, 0 to 88888
Data columns (total 22 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                87951 non-null  object 
 1   Investigation_Type      87951 non-null  object 
 2   Accident_Number         87951 non-null  object 
 3   Event_Date              87951 non-null  object 
 4   Location                87951 non-null  object 
 5   Country                 87951 non-null  object 
 6   Injury_Severity         86961 non-null  object 
 7   Aircraft_damage         87951 non-null  object 
 8   Aircraft_Category       32181 non-null  object 
 9   Make                    87951 non-null  object 
 10  Model                   87951 non-null  object 
 11  Amateur_Built           87951 non-null  object 
 12  Number_of_Engines       87951 non-null  object 
 13  Engine_Type             87951 non-null  object 
 14  Purpose_of_flight       87951 non-null  obj

The 4 injury columns (15 - 18) are incomplete, but they are float64, or integer, values, so we can't fill those empty values with "Unknown". The empty values should be changed to 0 to complete those columns.

In [715]:
# Fill in NaN values in multiple columns with 0
injury_columns_to_fill = ['Total_Fatal_Injuries', 'Total_Serious_Injuries', 'Total_Minor_Injuries', 'Total_Uninjured']
for column in injury_columns_to_fill:
    df[column] = df[column].fillna(0)

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87951 entries, 0 to 88888
Data columns (total 22 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                87951 non-null  object 
 1   Investigation_Type      87951 non-null  object 
 2   Accident_Number         87951 non-null  object 
 3   Event_Date              87951 non-null  object 
 4   Location                87951 non-null  object 
 5   Country                 87951 non-null  object 
 6   Injury_Severity         86961 non-null  object 
 7   Aircraft_damage         87951 non-null  object 
 8   Aircraft_Category       32181 non-null  object 
 9   Make                    87951 non-null  object 
 10  Model                   87951 non-null  object 
 11  Amateur_Built           87951 non-null  object 
 12  Number_of_Engines       87951 non-null  object 
 13  Engine_Type             87951 non-null  object 
 14  Purpose_of_flight       87951 non-null  obj

In [716]:
df.head()

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Injury_Severity,Aircraft_damage,Aircraft_Category,Make,...,Number_of_Engines,Engine_Type,Purpose_of_flight,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,Fatal(2),Destroyed,,Stinson,...,1.0,Reciprocating,Personal,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,Fatal(4),Destroyed,,Piper,...,1.0,Reciprocating,Personal,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,Fatal(3),Destroyed,,Cessna,...,1.0,Reciprocating,Personal,3.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,Fatal(2),Destroyed,,Rockwell,...,1.0,Reciprocating,Personal,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,Fatal(1),Destroyed,,Cessna,...,Unknown,Unknown,Personal,1.0,2.0,0.0,0.0,VMC,Approach,Probable Cause


Take a look at the Injury_Severity column

In [717]:
df['Injury_Severity'].value_counts(dropna=False)

Injury_Severity
Non-Fatal     66822
Fatal(1)       6086
Fatal          5257
Fatal(2)       3632
Incident       2113
              ...  
Fatal(33)         1
Fatal(123)        1
Fatal(72)         1
Fatal(54)         1
Fatal(189)        1
Name: count, Length: 110, dtype: int64

We see here the various values in that column give a number of fatal injuries for each accident. Since this number is already represented in the column for Total_Fatal_Injuries, we don't need this column, so can delete it.

In [718]:
# Drop the Injury_Severity column
df = df.drop('Injury_Severity', axis=1)

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87951 entries, 0 to 88888
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                87951 non-null  object 
 1   Investigation_Type      87951 non-null  object 
 2   Accident_Number         87951 non-null  object 
 3   Event_Date              87951 non-null  object 
 4   Location                87951 non-null  object 
 5   Country                 87951 non-null  object 
 6   Aircraft_damage         87951 non-null  object 
 7   Aircraft_Category       32181 non-null  object 
 8   Make                    87951 non-null  object 
 9   Model                   87951 non-null  object 
 10  Amateur_Built           87951 non-null  object 
 11  Number_of_Engines       87951 non-null  object 
 12  Engine_Type             87951 non-null  object 
 13  Purpose_of_flight       87951 non-null  object 
 14  Total_Fatal_Injuries    87951 non-null  flo

### Aircraft_Category
The category of aircraft is important to the analysis, but the column is mostly empty.

Many of the empty values can be filled in using the Make column, though.

In [719]:
df['Make'].value_counts(dropna=False)

Make
Cessna           21925
Piper            11903
CESSNA            4914
Beech             4290
PIPER             2841
                 ...  
Geertz               1
Conrad Menzel        1
Blucher              1
Gideon               1
ROYSE RALPH L        1
Name: count, Length: 8202, dtype: int64

I see here that there may exist multiple versions of the same makes, like "Cessna" and "CESSNA". It would be nice to clean this column for multiple versions of make names.

We can start with Cessna since it has the most in value_counts and see what other versions of that name are in the dataset.

In [720]:
# Show Make value beginning with ces, ignoring case
df[df['Make'].str.lower().str.startswith('ces')].value_counts('Make')

Make
Cessna                     21925
CESSNA                      4914
CESSNA AIRCRAFT CO            24
CESSNA AIRCRAFT                9
CESSNA AIRCRAFT COMPANY        9
Cessna Ector                   3
CESSNA ECTOR                   3
Cessna Aircraft Company        3
Cessna Wren                    2
CESSNA/AIR REPAIR INC          2
CESSNA/WEAVER                  1
Cessna Aircraft Co.            1
CESSNA REIMS                   1
CESSNA Aircraft                1
Cessna Reems                   1
Cessna Robertson               1
Cessna Skyhawk II              1
Cessna Soloy                   1
Cesna                          1
Name: count, dtype: int64

So all these makes can be cleaned by changing the values to "Cessna"

In [721]:
# Convert all these cessna values to 'Cessna'
df.loc[df['Make'].str.lower().str.startswith('ces'), 'Make'] = 'Cessna'

df[df['Make'].str.lower().str.startswith('ces')].value_counts('Make')

Make
Cessna    26903
Name: count, dtype: int64

We now have almost 27000 Cessna makes instead. So we can now look at the category values for these makes.

In [722]:
# Aircraft_Category values for Cessna in the Make column, include NaN
df[df['Make'] == 'Cessna'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         18406
Airplane     8496
Unknown         1
Name: count, dtype: int64

It looks like it would be safe to replace the empty category values (and 1 unknown) for the Cessna make with "Airplane"

In [723]:
# Fill in Aircraft_Category as 'Airplane' for Cessna
df.loc[df['Make'] == 'Cessna', 'Aircraft_Category'] = 'Airplane'

In [724]:
# Let's look at Make value_counts again
df['Make'].value_counts(dropna=False)

Make
Cessna           26903
Piper            11903
Beech             4290
PIPER             2841
Bell              2118
                 ...  
Gideon               1
Brault               1
Baldwin              1
Kirchner             1
ROYSE RALPH L        1
Name: count, Length: 8184, dtype: int64

In [725]:
# Show Make value beginning with piper, ignoring case
df[df['Make'].str.lower().str.startswith('piper')].value_counts('Make')

Make
Piper                         11903
PIPER                          2841
PIPER AIRCRAFT INC               27
PIPER AIRCRAFT CORPORATION        8
PIPER AIRCRAFT                    4
Piper Aircraft                    3
Piper/cub Crafters                3
PIPER/CUB CRAFTERS                3
Piper Aircraft Corporation        3
Piper Aircraft, Inc.              2
Piper Aerostar                    2
PIPER / LAUDEMAN                  1
PIPER/WALLY'S FLYERS INC          1
PIPER-HARRIS                      1
Piper Cub Crafters                1
Piper Pawnee                      1
Piper-aerostar                    1
Piper/Cub Crafters                1
PIPER AIRCRAFT, INC.              1
Piper/stevens                     1
Name: count, dtype: int64

In [726]:
# Convert all these piper values to 'Piper' and then take a look at its category values
df.loc[df['Make'].str.lower().str.startswith('piper'), 'Make'] = 'Piper'

df[df['Make'] == 'Piper'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         10045
Airplane     4763
Name: count, dtype: int64

In [727]:
# Fill in the NaN values for the category column for Piper as "Airplane"
df.loc[df['Make'] == 'Piper', 'Aircraft_Category'] = 'Airplane'

In [728]:
# Show Make value beginning with beech, ignoring case
df[df['Make'].str.lower().str.startswith('beech')].value_counts('Make')

Make
Beech                         4290
BEECH                         1042
Beechcraft                      24
BEECHCRAFT                       5
BEECH AIRCRAFT                   3
BEECH AIRCRAFT CORPORATION       2
Beech Aircraft Corporation       2
BEECH AIRCRAFT CO.               1
Beech Aircraft Corp              1
Beechcraft Corporation           1
Beecher                          1
Name: count, dtype: int64

A quick Google search tells me that Beech and Beechcraft are the same make.

In [729]:
df.loc[df['Make'].str.lower().str.startswith('beech'), 'Make'] = 'Beech'

df[df['Make'] == 'Beech'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         3654
Airplane    1718
Name: count, dtype: int64

In [730]:
df.loc[df['Make'] == 'Beech', 'Aircraft_Category'] = 'Airplane'

In [731]:
df[df['Make'].str.lower().str.startswith('bell')].value_counts('Make')

Make
Bell                              2118
Bellanca                           874
BELL                               588
BELLANCA                           159
BELL HELICOPTER TEXTRON CANADA      23
BELL HELICOPTER TEXTRON             21
BELL HELICOPTER                      4
Bell-transworld                      3
Bell Helicopter                      3
Bell-k Copter                        2
Bell Helicopter Textron              2
Bell-carson                          2
BELL TEXTRON CANADA LTD              2
BELL HELICOPTER CO                   2
Bell-olympic Helicopters, Inc.       1
Bell-moore                           1
Bell-world                           1
Bell/soloy                           1
Bell/garlick                         1
Bell/mason                           1
Bell/textron                         1
Bell/tsirah                          1
Bellah                               1
Bellanca Aircraft Corporation        1
Bellanca Citabria                    1
Bell-kitz Kopters   

In [732]:
# Bellanca and Bell are not the same make, so will take a little more work to clean all the various bell combinations.
# change the various interations of bell to Bell
df.loc[df['Make'].str.lower().str.startswith(('bell-', 'bell/', 'bell h', 'bell t', 'bell s', 'bell b', 'bell 4'), na=False), 'Make'] = 'Bell'

# make Bell and BELL the same
df.loc[(df['Make'] == 'BELL'), 'Make'] = 'Bell'

# address the various versions of Bellanca
df.loc[df['Make'].str.lower().str.startswith(('bellan'), na=False), 'Make'] = 'Bellanca'

# check the list again
df[df['Make'].str.lower().str.startswith('bell')].value_counts('Make')

Make
Bell              2792
Bellanca          1036
BELLER               1
BELLET JAMES J       1
Bellah               1
Name: count, dtype: int64

In [733]:
# Now we can look at the categories for Bell and Bellanca
df[df['Make'] == 'Bell'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           1816
Helicopter     971
Airplane         3
Unknown          2
Name: count, dtype: int64

In [734]:
# Bell can safely be changed to Helicopter for its category
df.loc[df['Make'] == 'Bell', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'] == 'Bell'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    2792
Name: count, dtype: int64

In [735]:
df[df['Make'] == 'Bellanca'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         753
Airplane    283
Name: count, dtype: int64

In [736]:
df.loc[df['Make'] == 'Bellanca', 'Aircraft_Category'] = 'Airplane'

In [737]:
# Let's look at Make value_counts again
df['Make'].value_counts(dropna=False)

Make
Cessna           26903
Piper            14808
Beech             5372
Bell              2792
Boeing            1512
                 ...  
Gideon               1
Brault               1
Baldwin              1
Kirchner             1
ROYSE RALPH L        1
Name: count, Length: 8118, dtype: int64

In [738]:
# clean the boeing make
df[df['Make'].str.lower().str.startswith('boei')].value_counts('Make')

Make
Boeing                            1512
BOEING                            1140
Boeing Stearman                     48
BOEING COMPANY                       8
Boeing Vertol                        6
Boeing Helicopters Div.              3
Boeing - Canada (de Havilland)       2
BOEING 777-306ER                     1
BOEING COMPANY, LONG BEACH DIV       1
BOEING OF CANADA/DEHAV DIV           1
BOEING-STEARMAN                      1
BOEING-VERTOL                        1
Boeing (Stearman)                    1
Boeing Commercial Airplane Gro       1
Boeing Company                       1
Boeing-brown                         1
Name: count, dtype: int64

In [739]:
# change the various iterations of boeing to Boeing
df.loc[df['Make'].str.lower().str.startswith('boeing'), 'Make'] = 'Boeing'

df[df['Make'] == 'Boeing'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN             1397
Airplane        1325
Helicopter         5
Powered-Lift       1
Name: count, dtype: int64

In [740]:
# Boeing can safely be assigned to the Airplane category
df.loc[df['Make'] == 'Boeing', 'Aircraft_Category'] = 'Airplane'

In [741]:
# Let's look at the top 50 Make value counts and see if there are any that can be cleaned up
df['Make'].value_counts().head(60)

Make
Cessna                            26903
Piper                             14808
Beech                              5372
Bell                               2792
Boeing                             2728
Mooney                             1080
Grumman                            1080
Bellanca                           1036
Robinson                            940
Hughes                              794
Schweizer                           628
Air Tractor                         588
Mcdonnell Douglas                   499
Aeronca                             479
Maule                               443
Champion                            426
De Havilland                        370
Aero Commander                      356
Stinson                             342
Aerospatiale                        334
Rockwell                            328
Taylorcraft                         316
Luscombe                            316
Hiller                              311
North American                     

In [742]:
# change the various iterations of aeronca to Aeronca
df.loc[df['Make'].str.lower().str.startswith('aeronca'), 'Make'] = 'Aeronca'

df[df['Make'] == 'Aeronca'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         401
Airplane    232
Name: count, dtype: int64

In [743]:
# Aeronca is airplane
df.loc[df['Make'] == 'Aeronca', 'Aircraft_Category'] = 'Airplane'

In [744]:
# change the various iterations of Air Tractor and check its category values
df.loc[df['Make'].str.lower().str.startswith('air tractor'), 'Make'] = 'Air Tractor'

df[df['Make'] == 'Air Tractor'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         462
Airplane    448
Name: count, dtype: int64

In [745]:
# Air Tractor is Airplane
df.loc[df['Make'] == 'Air Tractor', 'Aircraft_Category'] = 'Airplane'

In [746]:
# look at airbus
df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

Make
AIRBUS                            251
Airbus Industrie                  135
Airbus                             37
AIRBUS INDUSTRIE                   22
AIRBUS HELICOPTERS                 10
AIRBUS HELICOPTERS INC              3
Airbus Helicopters                  2
AIRBUS HELICOPTER                   1
AIRBUS Helicopters                  1
AIRBUS/EUROCOPTER                   1
Airbus Helicopters (Eurocopte       1
Airbus Helicopters Deutschland      1
Airbus Industries                   1
Name: count, dtype: int64

In [747]:
df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane        284
NaN             141
Helicopter       40
Powered-Lift      1
Name: count, dtype: int64

I see here that the various versions of Airbus have both airplanes and helicopters in the make and category columns, so before replacing make names and then filling in empty category values, I need to check the categories for some of the make iterations that may not be clear.

In [748]:
# check out Airbus Industrie iterations
df[df['Make'].str.lower().str.startswith('airbus i')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         116
Airplane     42
Name: count, dtype: int64

So, Airbus Industrie, AIRBUS INDUSTRIE, and Airbus Industries can be combined and categorized as airplane

In [749]:
df.loc[df['Make'].isin(['AIRBUS INDUSTRIE', 'Airbus Industries']), 'Make'] = 'Airbus Industrie'

df.loc[df['Make'] == 'Airbus Industrie', 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

Make
AIRBUS                            251
Airbus Industrie                  158
Airbus                             37
AIRBUS HELICOPTERS                 10
AIRBUS HELICOPTERS INC              3
Airbus Helicopters                  2
AIRBUS HELICOPTER                   1
AIRBUS Helicopters                  1
AIRBUS/EUROCOPTER                   1
Airbus Helicopters (Eurocopte       1
Airbus Helicopters Deutschland      1
Name: count, dtype: int64

In [750]:
# clean up Airbus Helicopters
df.loc[df['Make'].isin(['AIRBUS HELICOPTERS', 'AIRBUS Helicopters', 'AIRBUS HELICOPTERS INC', 'AIRBUS HELICOPTER', 'AIRBUS/EUROCOPTER', 'Airbus Helicopters (Eurocopte', 'Airbus Helicopters Deutschland']), 'Make'] = 'Airbus Helicopters'

df.loc[df['Make'] == 'Airbus Helicopters', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

Make
AIRBUS                251
Airbus Industrie      158
Airbus                 37
Airbus Helicopters     20
Name: count, dtype: int64

In [751]:
# combine the Airbus iterations
df.loc[(df['Make'] == 'AIRBUS'), 'Make'] = 'Airbus'

df[df['Make'] == 'Airbus'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane        242
NaN              25
Helicopter       20
Powered-Lift      1
Name: count, dtype: int64

Since only 20 of the almost 300 records for Airbus are helicopters, we can safely make the NaN values Airplane

In [752]:
df.loc[(df['Make'] == 'Airbus') & (df['Aircraft_Category'].isna()), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

Make
Airbus                288
Airbus Industrie      158
Airbus Helicopters     20
Name: count, dtype: int64

In [753]:
df['Aircraft_Category'].value_counts(dropna=False)

Aircraft_Category
Airplane             62783
NaN                  18695
Helicopter            5250
Glider                 505
Balloon                231
Gyrocraft              173
Weight-Shift           161
Powered Parachute       91
Ultralight              30
Unknown                 11
WSFT                     9
Blimp                    4
Powered-Lift             4
UNK                      2
Rocket                   1
ULTR                     1
Name: count, dtype: int64

So from having over 56,000 empty values in the category column, we are down to 18,695 empty values. I'd like to bring this down even further by looking at the empty category values as compared with the Make column to see which makes have the most empty values for category.

In [754]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Grumman                         911
Mooney                          900
Hughes                          686
Robinson                        661
Schweizer                       537
                               ... 
York                              1
Warren-thomas                     1
Tennessee Engineering & Manf      1
Slade H. Holmes                   1
GRUMMAN AMERICAN AVN. CORP.       1
Name: count, Length: 4049, dtype: int64

Let's look at the category values for these makes that have the most empty category values

In [755]:
# Grumman
df[df['Make'] == 'Grumman'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         911
Airplane    169
Name: count, dtype: int64

So Grumman is Airplane

In [756]:
# check to see if there are any other versions of 'Grumman' in the make column
df[df['Make'].str.lower().str.startswith('grumm')].value_counts('Make')

Make
Grumman                           1080
Grumman American                   222
Grumman-schweizer                  121
GRUMMAN                             78
GRUMMAN ACFT ENG COR-SCHWEIZER      58
GRUMMAN AMERICAN AVN. CORP.         49
Grumman-Schweizer                    6
GRUMMAN AIRCRAFT ENG CORP            2
GRUMMAN AMERICAN                     2
Grumman Acft Eng                     2
Grumman American Aviation            2
GRUMMAN AIRCRAFT COR-SCHWEIZER       1
GRUMMAN AMERICAN AVIATION CORP       1
GRUMMAN AMERICAN AVN. CORP           1
GRUMMAN ACFT ENG COR                 1
GRUMMAN SCHWEIZER                    1
GRUMMAN AIRCRAFT                     1
Grumman American Avn. Corp.          1
Grumman Schweizer                    1
GRUMMAN American Corporation         1
Name: count, dtype: int64

In [757]:
# What are the category values for all these different versions of Grumman
df[df['Make'].str.lower().str.startswith('grumm')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         1207
Airplane     424
Name: count, dtype: int64

In [758]:
# So let's combine all these Grumman makes together and make their category Airplane
df.loc[df['Make'].str.lower().str.startswith('grumm'), 'Make'] = 'Grumman'

df.loc[df['Make'] == 'Grumman', 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('grumm')].value_counts('Make')

Make
Grumman    1631
Name: count, dtype: int64

In [759]:
# Mooney
df[df['Make'] == 'Mooney'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         900
Airplane    180
Name: count, dtype: int64

In [760]:
# check to see if there are any other versions of 'Mooney' in the make column
df[df['Make'].str.lower().str.startswith('moon')].value_counts('Make')

Make
Mooney                           1080
MOONEY                            242
MOONEY AIRCRAFT CORP.              34
MOONEY AIRPLANE CO INC             10
MOONEY AIRPLANE COMPANY, INC.       1
MOONEY INTERNATIONAL CORP           1
Moon                                1
Mooney Aircraft                     1
Mooney Aircraft Corp                1
Mooney Aircraft Corp.               1
Mooney Aircraft Corporation         1
Mooney, Dan                         1
Name: count, dtype: int64

In [761]:
# What are the category values for all these different versions of Mooney
df[df['Make'].str.lower().str.startswith('moon')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         908
Airplane    466
Name: count, dtype: int64

In [762]:
# Combine all these Mooney makes together and make their category Airplane
df.loc[df['Make'].str.lower().str.startswith('mooney'), 'Make'] = 'Mooney'

df.loc[df['Make'] == 'Mooney', 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('moon')].value_counts('Make')

Make
Mooney    1373
Moon         1
Name: count, dtype: int64

In [763]:
# Hughes
df[df['Make'] == 'Hughes'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           686
Helicopter    108
Name: count, dtype: int64

Hughes would be all helicopters

In [764]:
# check to see if there are any other versions of 'Hughes' in the make column
df[df['Make'].str.lower().str.startswith('hughes')].value_counts('Make')

Make
Hughes                          794
HUGHES                          137
HUGHES HELICOPTERS INC            3
HUGHES AERO CORP                  2
HUGHES CHARLES R                  1
HUGHES/HELICOPTER ASSOCS INC      1
Hughes Aero                       1
Hughes Cassutt                    1
Hughes J/Hughes J                 1
Name: count, dtype: int64

In [765]:
df[df['Make'].str.lower().str.startswith('hughes')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN                  688
Helicopter           248
Airplane               3
Powered Parachute      2
Name: count, dtype: int64

Now here we have a few airplanes and parachutes in addition to all the helicopters in our list of Hughes interations. This may be due to some people named Hughes in the list that are not associated with the helicopter company. We can narrow the list down to find just the helicopter Hughes.

In [766]:
df[df['Make'].isin(['Hughes', 'HUGHES', 'HUGHES HELICOPTERS INC', 'HUGHES/HELICOPTER ASSOCS INC'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           687
Helicopter    248
Name: count, dtype: int64

In [767]:
# So these 4 can be combined and made Helicopter in the category field
df.loc[df['Make'].isin(['Hughes', 'HUGHES', 'HUGHES HELICOPTERS INC', 'HUGHES/HELICOPTER ASSOCS INC']), 'Make'] = 'Hughes Helicopters'

df.loc[df['Make'] == 'Hughes Helicopters', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'] == 'Hughes Helicopters'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    935
Name: count, dtype: int64

In [768]:
# Robinson
df[df['Make'].str.lower().str.startswith('robins')].value_counts('Make')

Make
Robinson                       940
ROBINSON                       283
ROBINSON HELICOPTER            221
ROBINSON HELICOPTER COMPANY    179
ROBINSON HELICOPTER CO          22
Robinson Helicopter Company     15
Robinson Helicopter              9
ROBINSON MICHAEL E               2
ROBINSON HELICOPTER CO INC       1
ROBINSON STEWART J               1
Robinson Helicopter Co.          1
Robinson Helicopters             1
Name: count, dtype: int64

In [769]:
df[df['Make'].str.lower().str.startswith('robins')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    981
NaN           690
Airplane        3
Unknown         1
Name: count, dtype: int64

In [770]:
# combine all the Robinson Helicopter iterations and make them Helicopter
df.loc[df['Make'].isin(['ROBINSON', 'ROBINSON HELICOPTER', 'ROBINSON HELICOPTER COMPANY', 'ROBINSON HELICOPTER CO', 'Robinson Helicopter Company', 'Robinson Helicopter', 'ROBINSON HELICOPTER CO INC', 'Robinson Helicopter Co.', 'Robinson Helicopters']), 'Make'] = 'Robinson'

df.loc[df['Make'] == 'Robinson', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'].str.lower().str.startswith('robins')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    1672
Airplane         3
Name: count, dtype: int64

In [771]:
# Schweizer
df[df['Make'].str.lower().str.startswith('schweiz')].value_counts('Make')

Make
Schweizer                         628
SCHWEIZER                         144
SCHWEIZER AIRCRAFT CORP            18
SCHWEIZER(HUGHES)AIRCRAFT CORP      2
Schweizer Aircraft Corp             2
Schweizer Aircraft Corp.            2
SCHWEIZER(HUGHES)                   1
Schweizer 300CBi                    1
Schweizer Sgs                       1
Schweizer, N36289                   1
Name: count, dtype: int64

In [772]:
df[df['Make'].str.lower().str.startswith('schweiz')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           541
Helicopter    115
Glider        111
Airplane       32
Unknown         1
Name: count, dtype: int64

A more healthy mixture here requires some investigation

In [773]:
df[df['Make'].isin(['SCHWEIZER', 'Schweizer'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           539
Helicopter    111
Glider        110
Airplane       11
Unknown         1
Name: count, dtype: int64

A quick google search informs me that Schweizer Aircraft made helicopters, gliders, and airplanes, so filling in the category column for Schweizer cannot be accomplished just by using the make column. As the empty values only amount to almost 550, I'm going to leave Schweizer alone for now, except for combining the makes together so that I would be able to more easily dig into it using the model column as well.

In [774]:
df.loc[df['Make'].isin(['SCHWEIZER', 'SCHWEIZER AIRCRAFT CORP', 'Schweizer Aircraft Corp', 'Schweizer Aircraft Corp.', 'Schweizer 300CBi', 'Schweizer Sgs']), 'Make'] = 'Schweizer'

df[df['Make'].str.lower().str.startswith('schweiz')].value_counts('Make')

Make
Schweizer                         796
SCHWEIZER(HUGHES)AIRCRAFT CORP      2
SCHWEIZER(HUGHES)                   1
Schweizer, N36289                   1
Name: count, dtype: int64

Now how are the empty category counts looking?

In [775]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Schweizer            541
Mcdonnell Douglas    447
Maule                355
Champion             347
Aero Commander       317
                    ... 
Lamb/starduster        1
Reif                   1
Dale Conover           1
Curt Hoffstad          1
ROYSE RALPH L          1
Name: count, Length: 4029, dtype: int64

In [776]:
# Let's look at the Scheizer models
df[df['Make'].isin(['Schweizer'])].value_counts('Model', dropna=False)

Model
269C           154
G-164B         109
SGS 2-33A       69
269C-1          41
G-164A          25
              ... 
G167B            1
G164A "450"      1
G164-B           1
G-164A-450       1
TG3A             1
Name: count, Length: 150, dtype: int64

Wikepedia and Google informs me that the Schweizer 269C is a helicopter, G-164B is an airplane, SGS 2-33A is a glider, 269C-1 is a helicopter, and G-164A is an airplane. Let's see if that data could e used to fill some of the Schweizer category values.

In [777]:
df[df['Model'].isin(['269C'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    253
NaN            87
Unknown         1
Name: count, dtype: int64

In [778]:
# Since the 269C model is a helicopter, let's fix all the empty category values for it. This fix will also fill in some category 
# values for other makes as well since we can see that there are more 269C models than just the Schweizer make.
df.loc[df['Model'] == '269C', 'Aircraft_Category'] = 'Helicopter'

In [779]:
# The same goes for the rest of the models listed
df[df['Model'].isin(['G-164B', 'G-164A'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    738
NaN         135
Name: count, dtype: int64

In [780]:
df.loc[df['Model'].isin(['G-164B', 'G-164A']), 'Aircraft_Category'] = 'Airplane'

In [781]:
df[df['Model'].isin(['SGS 2-33A'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN       44
Glider    25
Name: count, dtype: int64

In [782]:
df.loc[df['Model'] == 'SGS 2-33A', 'Aircraft_Category'] = 'Glider'

In [783]:
# Let's look at how the category column is shaping up
df['Aircraft_Category'].value_counts(dropna=False)

Aircraft_Category
Airplane             65032
NaN                  14938
Helicopter            6716
Glider                 549
Balloon                231
Gyrocraft              173
Weight-Shift           161
Powered Parachute       91
Ultralight              30
Unknown                  9
WSFT                     9
Blimp                    4
Powered-Lift             4
UNK                      2
Rocket                   1
ULTR                     1
Name: count, dtype: int64

We still have about 15,000 empty category records. This can be brought down further using Makes and Models. The category values as we have them now show that airplanes are the overwhelmingly largest percentage of aircraft in the dataset of accidents. But after helicopters, the rest of the categories are tiny by comparison, and they constitute aircraft that would not ordinarily be under consideration for a business interested in getting into the aviation business. I'm not going to just drop those rows right now, but in the analysis phase, I don't anticipate using them.

In [784]:
df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
UH-12E          134
8A              133
S2R             126
S-2R            114
DHC-2           112
               ... 
PROTECH PT-2      1
L-1011-1          1
GLASAIR GARG      1
C3C               1
EMB145            1
Name: count, Length: 5545, dtype: int64

In [785]:
# Let's do the Mcdonnell Douglas make, and see about using the models in conjunction
df[df['Make'].str.lower().str.startswith('mcdonn')].value_counts('Make')

Make
Mcdonnell Douglas                 499
MCDONNELL DOUGLAS                  78
MCDONNELL DOUGLAS HELICOPTER       31
MCDONNELL DOUGLAS HELI CO          11
MCDONNELL DOUGLAS AIRCRAFT CO       6
McDonnell Douglas                   4
Mcdonnell-douglas                   2
MCDONNELL DOUGLAS CORPORATION       1
MCDONNELL-DOUGLAS                   1
McDonnell Douglas Helicopter        1
McDonnell Douglas Helicopter C      1
McDonnell Douglas Helicopters       1
Mcdonnell Douglas Helicopter        1
Mcdonnell Douglas Helicopters       1
Name: count, dtype: int64

In [786]:
df[df['Make'].isin(['MCDONNELL DOUGLAS HELICOPTER', 'MCDONNELL DOUGLAS HELI CO'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    42
Name: count, dtype: int64

In [787]:
# Combine the helicopter variations of the name
df.loc[df['Make'].isin(['MCDONNELL DOUGLAS HELICOPTER', 'MCDONNELL DOUGLAS HELI CO', 'McDonnell Douglas Helicopter', 'McDonnell Douglas Helicopter C', 'McDonnell Douglas Helicopters', 'Mcdonnell Douglas Helicopter', 'Mcdonnell Douglas Helicopters']), 'Make'] = 'Mcdonnell Douglas Helicopters'

df[df['Make'].isin(['Mcdonnell Douglas Helicopters'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    47
Name: count, dtype: int64

In [788]:
df[df['Make'].str.lower().str.startswith('mcdonn')].value_counts('Make')

Make
Mcdonnell Douglas                499
MCDONNELL DOUGLAS                 78
Mcdonnell Douglas Helicopters     47
MCDONNELL DOUGLAS AIRCRAFT CO      6
McDonnell Douglas                  4
Mcdonnell-douglas                  2
MCDONNELL DOUGLAS CORPORATION      1
MCDONNELL-DOUGLAS                  1
Name: count, dtype: int64

In [789]:
df.loc[df['Make'].isin(['MCDONNELL DOUGLAS', 'MCDONNELL DOUGLAS AIRCRAFT CO', 'McDonnell Douglas', 'Mcdonnell-douglas', 'MCDONNELL DOUGLAS CORPORATION', 'MCDONNELL-DOUGLAS']), 'Make'] = 'Mcdonnell Douglas'

df[df['Make'].isin(['Mcdonnell Douglas'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           450
Airplane      123
Helicopter     18
Name: count, dtype: int64

In [790]:
# make the 18 Helicopters the Mcdonnell Douglas Helicopters Make
df.loc[(df['Make'] == 'Mcdonnell Douglas') & (df['Aircraft_Category'] == 'Helicopter'), 'Make'] = 'Mcdonnell Douglas Helicopters'

df[df['Make'].isin(['Mcdonnell Douglas'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         450
Airplane    123
Name: count, dtype: int64

In [791]:
df.loc[(df['Make'] == 'Mcdonnell Douglas'), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('mcdonn')].value_counts('Make')

Make
Mcdonnell Douglas                573
Mcdonnell Douglas Helicopters     65
Name: count, dtype: int64

In [792]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Maule                     355
Champion                  347
Aero Commander            317
De Havilland              316
Schweizer                 299
                         ... 
Lighthizer, Richard E.      1
Steven Ulrich               1
Tate                        1
Arnold Forest               1
ROYSE RALPH L               1
Name: count, Length: 4024, dtype: int64

In [793]:
# The Maule make
df[df['Make'].str.lower().str.startswith('maul')].value_counts('Make')

Make
Maule                  443
MAULE                  144
MAULE AIRCRAFT CORP      1
Maule Air Inc.           1
Name: count, dtype: int64

In [794]:
df.loc[df['Make'].isin(['MAULE', 'MAULE AIRCRAFT CORP', 'Maule Air Inc.']), 'Make'] = 'Maule'

df[df['Make'].isin(['Maule'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         356
Airplane    233
Name: count, dtype: int64

In [795]:
df.loc[(df['Make'] == 'Maule'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Champion                  347
Aero Commander            317
De Havilland              316
Schweizer                 299
Rockwell                  293
                         ... 
Angel Elbert S Jr           1
Lighthizer, Richard E.      1
Steven Ulrich               1
Tate                        1
ROYSE RALPH L               1
Name: count, Length: 4022, dtype: int64

In [796]:
# The Champion make
df[df['Make'].str.lower().str.startswith('champ')].value_counts('Make')

Make
Champion    426
CHAMPION     91
Name: count, dtype: int64

In [797]:
df.loc[df['Make'].isin(['CHAMPION']), 'Make'] = 'Champion'

df[df['Make'].isin(['Champion'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         347
Airplane    170
Name: count, dtype: int64

In [798]:
df.loc[(df['Make'] == 'Champion'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Aero Commander            317
De Havilland              316
Schweizer                 299
Rockwell                  293
Stinson                   287
                         ... 
Angel Elbert S Jr           1
Lighthizer, Richard E.      1
Steven Ulrich               1
Tate                        1
ROYSE RALPH L               1
Name: count, Length: 4021, dtype: int64

In [799]:
# The Aero Commander make
df[df['Make'].str.lower().str.startswith('aero c')].value_counts('Make')

Make
Aero Commander    356
AERO COMMANDER     69
Aero Comp Inc       1
Name: count, dtype: int64

In [800]:
df.loc[df['Make'].isin(['AERO COMMANDER']), 'Make'] = 'Aero Commander'

df[df['Make'].isin(['Aero Commander'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         318
Airplane    107
Name: count, dtype: int64

In [801]:
df.loc[(df['Make'] == 'Aero Commander'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
De Havilland              316
Schweizer                 299
Rockwell                  293
Stinson                   287
Hiller                    282
                         ... 
Angel Elbert S Jr           1
Lighthizer, Richard E.      1
Steven Ulrich               1
Tate                        1
ROYSE RALPH L               1
Name: count, Length: 4019, dtype: int64

In [802]:
# The De Havilland make
de_havilland_variations = df[df['Make'].str.lower().str.contains(r'de\s?havil?land', regex=True)]
de_havilland_variations.value_counts('Make')

Make
De Havilland          370
DEHAVILLAND            91
DE HAVILLAND           31
de Havilland            9
Dehavilland             8
DeHavilland             2
DEHAVILLAND CANADA      1
Name: count, dtype: int64

In [803]:
# combine all these variations of De Havilland into one make
df.loc[df['Make'].str.lower().str.contains(r'de\s?havil?land', regex=True), 'Make'] = 'De Havilland'

df[df['Make'].isin(['De Havilland'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         328
Airplane    184
Name: count, dtype: int64

In [804]:
df.loc[(df['Make'] == 'De Havilland'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Schweizer            299
Rockwell             293
Stinson              287
Aerospatiale         282
Hiller               282
                    ... 
Cooprider              1
Angel Elbert S Jr      1
Arnold Forest          1
Steven Ulrich          1
ROYSE RALPH L          1
Name: count, Length: 4016, dtype: int64

In [805]:
# Let's look at the Models overall for NaN values in Category
df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
UH-12E         134
8A             133
S-2R           106
415-C           89
BC12-D          86
              ... 
160              1
CAYUSE           1
EAA SPECIAL      1
TINY TWO         1
EMB145           1
Name: count, Length: 5239, dtype: int64

Google tells me that a UH-12E is a helicopter, while 8A, S-2R, 415-C, BC12-D are airplanes. And running the function like "df[df['Model'].isin(['BC12-D'])].value_counts('Aircraft_Category', dropna=False)" verifies this. So let's correct those category values

In [806]:
#Edit one model's category value
df.loc[df['Model'] == 'UH-12E', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['8A', 'S-2R', '415-C', 'BC12-D']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
108-3              76
S2R                74
RV-4               72
108-2              72
KR-2               71
                   ..
VARIEZE,LONG EZ     1
ACRO-SPECIAL        1
AVID FLYER "C"      1
BD-5 B              1
EMB145              1
Name: count, Length: 5234, dtype: int64

In [807]:
# Running this function tells me the top 5 are airplanes
df[df['Model'].isin(['108-3', 'S2R', 'RV-4', '108-2', 'KR-2'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         365
Airplane    200
Name: count, dtype: int64

In [808]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['108-3', 'S2R', 'RV-4', '108-2', 'KR-2']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
108-1              69
LA-4-200           64
GC-1B              64
8E                 55
L-13               54
                   ..
VARIEZE,LONG EZ     1
ACRO-SPECIAL        1
AVID FLYER "C"      1
BD-5 B              1
EMB145              1
Name: count, Length: 5229, dtype: int64

In [809]:
# the top 4 are all airplanes
df[df['Model'].isin(['108-1', 'LA-4-200', 'GC-1B', '8E'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         252
Airplane     84
Name: count, dtype: int64

In [810]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['108-1', 'LA-4-200', 'GC-1B', '8E']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
Unknown              54
L-13                 54
VARIEZE              45
A-1                  44
F-28C                43
                     ..
CAYUSE                1
415-C AIRCOUPE        1
TINY TWO              1
FOX III SPEEDSTER     1
EMB145                1
Name: count, Length: 5225, dtype: int64

In [811]:
df[df['Model'].isin(['F-28C'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           43
Helicopter     8
Name: count, dtype: int64

glider - L-13; airplane - VARIEZE, A-1; helicopter - F-28C;

In [812]:
#Edit one model's category value
df.loc[df['Model'] == 'L-13', 'Aircraft_Category'] = 'Glider'
df.loc[df['Model'] == 'F-28C', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['VARIEZE', 'A-1']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
Unknown            54
FH-1100            43
108                42
AVID FLYER         40
RV-6               39
                   ..
VARIEZE,LONG EZ     1
ACRO-SPECIAL        1
AVID FLYER "C"      1
BD-5 B              1
EMB145              1
Name: count, Length: 5221, dtype: int64

In [813]:
df[df['Model'].isin(['RV-6'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           39
Airplane      34
Helicopter     1
Name: count, dtype: int64

helicopter - FH-1100; airplane - 108, AVID FLYER, RV-6;

In [814]:
#Edit one model's category value
df.loc[df['Model'] == 'FH-1100', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['108', 'AVID FLYER', 'RV-6']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
Unknown            54
BC-12D             39
280C               38
35A                38
S2R-T34            36
                   ..
VARIEZE,LONG EZ     1
ACRO-SPECIAL        1
AVID FLYER "C"      1
BD-5 B              1
EMB145              1
Name: count, Length: 5217, dtype: int64

In [815]:
df[df['Model'].isin(['S2R-T34'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         36
Airplane    23
Name: count, dtype: int64

airplane - BC-12D, 35A, S2R-T34; helicopter - 280C;

In [816]:
#Edit one model's category value
df.loc[df['Model'] == '280C', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['BC-12D', '35A', 'S2R-T34']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts().head(50)

Model
Unknown          54
FIREFLY 7        36
NAVION           36
RV-6A            36
F-28A            36
M-18A            35
UPF-7            35
CHALLENGER II    35
CL-600-2B19      32
DW-1             31
AA-1             31
SA226TC          30
2T-1A-2          30
415C             30
DC-3             28
H-295            28
VARI-EZE         27
AA-1A            27
8F               27
SA227-AC         26
SKYBOLT          26
QUICKIE          26
MU-2B-60         25
LONG-EZ          25
LA-4             24
SONERAI II       24
S-2B             23
KITFOX           23
THORP T-18       23
SR22             23
S-76A            22
RC-3             22
A                22
201B             22
BLANIK L-13      21
F-28F            21
MUSTANG II       21
B-2B             21
AA-5B            21
DC-3C            21
S-60A            20
Q2               20
114              20
S-1B2            20
SNJ-5            20
2150A            20
P-51D            20
AT-6D            20
T-6G             20
UH-12C        

Instead of just a few at a time, we can display the top 50 models with no category value and go from there.

In [817]:
df[df['Model'].isin(['UH-12C'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           20
Helicopter     1
Name: count, dtype: int64

In [818]:
#Edit one model's category value
df.loc[df['Model'] == 'BLANIK L-13', 'Aircraft_Category'] = 'Glider'

#Edit multiple models' category value
df.loc[df['Model'].isin(['114', '201B', '2150A', '2T-1A-2', '415C', '8F', 'A', 'AA-1', 'AA-1A', 'AA-5B', 'AT-6D', 'CHALLENGER II', 'CL-600-2B19', 'DC-3', 'DC-3C', 'DW-1', 'H-295', 'KITFOX', 'LA-4',
                        'LONG-EZ', 'M-18A', 'MU-2B-60', 'MUSTANG II', 'NAVION', 'P-51D', 'Q2', 'QUICKIE', 'RC-3', 'RV-6A', 'S-1B2', 'S-2B', 'SA226TC', 'SA227-AC', 'SKYBOLT', 'SNJ-5', 'SONERAI II',
                        'SR22', 'T-6G', 'THORP T-18', 'UPF-7', 'VARI-EZE']), 'Aircraft_Category'] = 'Airplane'

df.loc[df['Model'].isin(['B-2B', 'F-28A', 'F-28F', 'S-76A', 'UH-12C']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['FIREFLY 7', 'S-60A']), 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Model'].value_counts().head(50)

Model
Unknown              54
II                   20
MONI                 19
DRAGONFLY            19
LONG EZ              19
STEEN SKYBOLT        19
UH-12B               19
PZL-M-18             19
ST3KR                18
ASW-20               18
112A                 18
QUICKIE Q2           18
UH-12D               18
TIERRA II            18
AS-350D              17
S-1S                 17
S-1                  17
AS350D               17
CHRISTEN EAGLE II    17
B-8M                 17
BD-4                 17
SR-22                16
Q-2                  16
AEROSTAR 600         16
NAVION A             16
G103                 16
T-18                 16
RANS S-12            16
DC-9-32              16
SGS-2-33A            16
S2R-600              16
PITTS S-2B           16
TB-20                16
SA315B               16
340B                 16
SGS 1-26E            15
415-D                15
AS350BA              15
BD-5B                15
PITTS S-1S           15
112TC                15
620B      

In [819]:
df[df['Model'].isin(['Q-2'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         16
Airplane     1
Name: count, dtype: int64

In [820]:
df[df['Model'].isin(['Q-2'])].value_counts('Amateur_Built', dropna=False)

Amateur_Built
Yes    17
Name: count, dtype: int64

In [821]:
#Edit one model's category value
df.loc[df['Model'] == 'B-8M', 'Aircraft_Category'] = 'Gyrocraft'
df.loc[df['Model'] == 'TIERRA II', 'Aircraft_Category'] = 'Ultralight'

#Edit multiple models' category value
df.loc[df['Model'].isin(['DRAGONFLY', 'LONG EZ', 'STEEN SKYBOLT', 'PZL-M-18', 'ST3KR', '112A', 'QUICKIE Q2', 'S-1S', 'S-1', 'CHRISTEN EAGLE II', 'BD-4', 'SR-22']), 'Aircraft_Category'] = 'Airplane'

df.loc[df['Model'].isin(['UH-12B', 'UH-12D', 'AS-350D', 'AS350D']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['MONI', 'ASW-20']), 'Aircraft_Category'] = 'Glider'

df[df['Aircraft_Category'].isna()]['Model'].value_counts().head(50)

Model
Unknown           54
II                20
AEROSTAR 600      16
Q-2               16
RANS S-12         16
T-18              16
S2R-600           16
SA315B            16
DC-9-32           16
NAVION A          16
SGS-2-33A         16
TB-20             16
G103              16
340B              16
PITTS S-2B        16
AS350B            15
620B              15
PITTS S-1S        15
SGS 1-26E         15
415-D             15
SA316B            15
BD-5B             15
DC-9-31           15
AS350BA           15
SA-226T           15
112TC             15
269C-1            15
SGS 2-33          14
F28C              14
ATR-42-300        14
GLASAIR           14
B8M               14
A2                14
IS-28B2           14
QUICKSILVER MX    14
F-19              14
RV4               14
201C              14
25B               14
S-2A              13
BC12D             13
SA-26AT           13
SGS 1-34          13
L-1011-385-1      13
SR20              13
KITFOX II         13
35                13
A-1B   

In [822]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Schweizer         298
Aerospatiale      253
Douglas           173
North American    161
Taylorcraft       144
Rockwell          141
Sikorsky          106
Burkhart Grob     100
Fairchild          97
Lockheed           94
Name: count, dtype: int64

In [823]:
df[df['Make'].isin(['Schweizer'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           298
Helicopter    199
Glider        156
Airplane      143
Name: count, dtype: int64

In [824]:
# Look at the Schweizer models again that have empty category values
df[df['Make'].isin(['Schweizer']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Model
SGS-2-33A      16
SGS 1-26E      15
269C-1         15
SGS 2-33       14
SGS 1-34       13
300C           11
269D           10
SGS 1-26B       8
SGS 2-32        8
2-33A           8
G164B           7
SGS 1-36        6
SGS2-33A        6
2-33            6
SGS-1-35C       5
SGS 1-26A       5
G-164           4
2-32            4
SGS 1-26C       4
SGS 1-26D       4
1-26E           4
SGS-1-26B       4
SGS 1-35        4
SGS-126E        3
SGS-2-33        3
SGS-233A        3
G164A           3
SGS-1-35        3
SGS 1-26        3
G-164-A         3
G164            2
SGU 2-22E       2
SGU-2-22E       2
SGS-1-34        2
2-33-A          2
1-36            2
1-35C           2
SGU2-22E        2
HUGHES 269C     2
269B            2
G164D           2
SGS1-36         2
SGS1-34         2
G-164D          2
G-164C          2
G-164B-600      2
SGS-1-26E       2
T-26E           1
SGS-1-26        1
SSG 2-33A       1
SGS-1-26A       1
SGS 2-8         1
SGU-22          1
SGS 2-33AK      1
SGU-2-22K       1
SGS1

In [825]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['1-26E', '1-35C', '1-36', '2-32', '2-33', '2-33-A', '2-33A', 'SGS 1-26', 'SGS 1-26A', 'SGS 1-26B', 'SGS 1-26C', 'SGS 1-26D', 'SGS 1-26E', 'SGS 1-34', 'SGS 1-35', 'SGS 1-36',
                        'SGS 2-32', 'SGS 2-33', 'SGS 2-33AK', 'SGS 2-8', 'SGS-1-26', 'SGS-1-26A', 'SGS-1-26B', 'SGS-1-26E', 'SGS-1-30', 'SGS-1-34', 'SGS-1-35', 'SGS-1-35C', 'SGS-126D', 'SGS-126E',
                        'SGS-2-33', 'SGS-2-33A', 'SGS-233A', 'SGS1-26C', 'SGS1-26D', 'SGS1-34', 'SGS1-36', 'SGS2-33A', 'SGU 2-22CK', 'SGU 2-22E', 'SGU-2-22E', 'SGU-2-22K', 'SGU-22', 'SGU2-22E',
                        'SSG 2-33A', 'T-26E']), 'Aircraft_Category'] = 'Glider'

df.loc[df['Model'].isin(['269B', '269C-1', '269D', '300C', 'HUGHES 269C']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['G-164', 'G-164-A', 'G-164B-600', 'G-164C', 'G-164D', 'G164', 'G164A', 'G164B', 'G164D']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Schweizer']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Model
-269C             1
I-26E             1
S2-33A            1
SA 2-37A          1
SC2-33A           1
SGS 1-23          1
SGS 1-23G         1
SGS 1-23H-15      1
SGS 1-26F         1
SGS 1-34R         1
SGS 1-35C         1
SGS 126B          1
SGS 126E          1
SGS 135           1
SGS-1-36          1
SGS-2-32          1
SGS-2-32A         1
SGS-233           1
SGS1-26-D         1
SGS1-26A          1
SGS2-32           1
SGS2-33           1
SGS233A           1
S-2-33A           1
I-26D             1
1-23              1
H-300             1
1-24              1
1-26              1
1-26B             1
1-26D             1
126-D             1
134               1
2-22EK            1
233A              1
269               1
269-C             1
269-C1            1
333               1
AG CAT            1
FGS-233           1
G-164-B           1
G-164A-450        1
G164-B            1
G164A "450"       1
G167B             1
GRUMMAN G-164A    1
GRUMMAN G-164B    1
TG3A              1
Name: count, d

In [826]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['134', '1-23', '1-24', '1-26', '1-26B', '1-26D', '126-D', '2-22EK', '233A', 'FGS-233', 'I-26D', 'I-26E', 'S-2-33A', 'S2-33A', 'SC2-33A', 'SGS 1-23', 'SGS 1-23G', 'SGS 1-23H-15',
                        'SGS 1-26F', 'SGS 1-34R', 'SGS 1-35C', 'SGS 126B', 'SGS 126E', 'SGS 135', 'SGS-1-36', 'SGS-2-32', 'SGS-2-32A', 'SGS-233', 'SGS1-26-D', 'SGS1-26A', 'SGS2-32', 'SGS2-33',
                        'SGS233A', 'TG3A']), 'Aircraft_Category'] = 'Glider'

df.loc[df['Model'].isin(['269', '333', '-269C', '269-C', '269-C1', 'H-300']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['SA 2-37A', 'AG CAT', 'G-164-B', 'G-164A-450', 'G164-B', 'G164A \"450\"', 'G167B', 'GRUMMAN G-164A', 'GRUMMAN G-164B']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Schweizer'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Glider        372
Helicopter    245
Airplane      179
Name: count, dtype: int64

In [827]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Aerospatiale      253
Douglas           173
North American    161
Taylorcraft       144
Rockwell          141
Sikorsky          106
Burkhart Grob     100
Fairchild          97
Lockheed           94
Ayres              87
Name: count, dtype: int64

In [828]:
df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Model
SA315B             16
AS350B             14
SA316B             14
ATR-42-300         12
AS-350B            10
AS350BA             9
ATR-72-212          8
SA-315B             8
TB-20               6
AS-355F-1           6
AS-355-F1           5
SA341G              5
SA-341G             5
AS-350BA            4
316B                3
AS-350-B            3
AS-355F             3
AS-350              3
AS-355-F            3
ATR-42-320          3
SA 315B             3
SA-360C             3
SA319B              3
TB-21               3
ATR-42              2
ATR 42-300          2
AS35OD              2
AS355F              2
SA-315              2
SA-319B             2
SA315B LAMA         2
AS-355E             2
ATR-72              2
AS355F1             2
AS-350-B2           2
AS 355F1            2
AS-355              2
350D                2
AS-350-BA           2
AS 355F             2
ALOUETTE 3          1
SA315-D LAMA        1
SA-365-N2           1
AS 315B             1
AS 350 ASTAR        1
AS-3

In [829]:
df[df['Make'].isin(['Aerospatiale'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           253
Helicopter     79
Airplane        2
Name: count, dtype: int64

I see only 2 airplanes listed for Aerospatiale. So which models are those?

In [830]:
df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isin(['Airplane'])].value_counts('Model', dropna=False)

Model
ATR 42-320    1
ATR-42-300    1
Name: count, dtype: int64

So this tells me that models beginning with 'ATR' would be airplanes

In [831]:
df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isin(['Helicopter'])].value_counts('Model', dropna=False)

Model
AS-350D       17
AS350D        15
AS350          6
SA316B         4
SA315B         4
AS350BA        3
AS-355F        2
AS350-B2       2
SA-316B        2
SA-315B        2
SA 315B        2
AS-350-BA      1
SA 315 B       1
350B1          1
SA-360C        1
SA-341G        1
SA-319B        1
SA-318C        1
SA 316B        1
AS355F1        1
S350D          1
AS 365 N-2     1
AS355          1
350D           1
341G           1
AS350B2        1
AS350B         1
AS350-D        1
AS 355F2       1
AS-355-F2      1
315B           1
Name: count, dtype: int64

And helicopter models begin with 'AS-' and 'SA-'

In [832]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['355', '316B', '350 B1', '350D', 'A-300B4', 'ALOUETTE 3', 'AS 315B', 'AS 350 ASTAR', 'AS 350B', 'AS 350B-2', 'AS 350D', 'AS 355 F', 'AS 355F', 'AS 355F1', 'AS-350', 'AS-350-B',
                        'AS-350-B2', 'AS-350-BA', 'AS-350B', 'AS-350BA', 'AS-355', 'AS-355-F', 'AS-355-F1', 'AS-355-F2', 'AS-355E', 'AS-355F', 'AS-355F-1', 'AS350B', 'AS350BA', 'AS355F', 'AS355F1',
                        'AS35OD', 'SA 315B', 'SA 360C', 'SA-315', 'SA-315-B', 'SA-315B', 'SA-316 ALOUETTE', 'SA-316B', 'SA-319B', 'SA-330J', 'SA-341G', 'SA-360C', 'SA-365-N2', 'SA315-D LAMA', 'SA315B',
                        'SA315B LAMA', 'SA316B', 'SA318C', 'SA319B', 'SA341G']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['ATR 42-300', 'ATR-42', 'ATR-42-300', 'ATR-42-320', 'ATR-72', 'ATR-72-212', 'TB-20', 'TB-21', 'TB20']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Model
316B ALOUETTE III       1
AS365N                  1
ATR 72-212              1
ATR-42-500              1
ATR-72-12               1
ATR42-300               1
ATR72-212               1
CONCORDE VERSION 101    1
Concorde                1
ND-26                   1
SA 315                  1
SA 316B                 1
SA319B Alouette III     1
SA330J                  1
SA360C DAUPHIN          1
SA365-N1                1
SA365N                  1
SE 3180                 1
SE 318C                 1
SE316B                  1
SF3130                  1
SN-601                  1
TB-10                   1
ATR 42-320              1
AS355F2                 1
350-B                   1
AS355F-1                1
350B                    1
AS 355 F ECUREUIL       1
AS 355F-1               1
AS-332L                 1
AS-341G                 1
AS-350-B3               1
AS-350B1                1
AS-350B2                1
AS-350BII               1
AS-355F1                1
AS-365-N2               1
AS315B

In [833]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['316B ALOUETTE III', '350-B', '350B', 'AS 355 F ECUREUIL', 'AS 355F-1', 'AS-332L', 'AS-341G', 'AS-350-B3', 'AS-350B1', 'AS-350B2', 'AS-350BII', 'AS-355F1', 'AS-365-N2', 'AS315B',
                        'AS332', 'AS350', 'AS350 BA', 'AS350-B', 'AS350-B3', 'AS350-BH', 'AS350-D', 'AS350B3', 'AS350D ASTAR', 'AS355F-1', 'AS355F2', 'AS365N', 'SA 315', 'SA 316B', 'SA319B Alouette III',
                        'SA330J', 'SA360C DAUPHIN', 'SA365-N1', 'SA365N', 'SE 3180', 'SE 318C', 'SE316B', 'SF3130']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['ATR 42-320', 'ATR 72-212', 'ATR-42-500', 'ATR-72-12', 'ATR42-300', 'ATR72-212', 'SN-601', 'TB-10', 'TB21', 'CONCORDE VERSION 101', 'Concorde', 'ND-26']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Series([], Name: count, dtype: int64)

In [834]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Douglas           173
North American    161
Taylorcraft       144
Rockwell          141
Sikorsky          106
Burkhart Grob     100
Fairchild          97
Lockheed           94
Balloon Works      87
Ayres              87
Name: count, dtype: int64

In [835]:
df[df['Make'].str.lower().str.startswith('dougl')].value_counts('Make')

Make
Douglas                250
DOUGLAS                 26
DOUGLAS BRIAN G          1
DOUGLAS K THOMPSON       1
Douglas A. Pohl          1
Douglas C. Campbell      1
Douglas D. Turner        1
Douglas Maselink         1
Douglas Swanningson      1
Douglas/basler           1
Name: count, dtype: int64

In [836]:
df.loc[df['Make'].isin(['Douglas', 'DOUGLAS']), 'Make'] = 'Douglas'

In [837]:
df[df['Make'] == 'Douglas'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         174
Airplane    102
Name: count, dtype: int64

In [838]:
df.loc[df['Make'] == 'Douglas', 'Aircraft_Category'] = 'Airplane'

In [839]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
North American    161
Taylorcraft       144
Rockwell          141
Sikorsky          106
Burkhart Grob     100
Fairchild          97
Lockheed           94
Balloon Works      87
Ayres              87
Swearingen         86
Name: count, dtype: int64

In [840]:
df[df['Make'].str.lower().str.startswith('north a')].value_counts('Make')

Make
North American                    294
NORTH AMERICAN                     79
North American Rockwell Corp.       5
NORTH AMERICAN/AERO CLASSICS        3
North American Aviation Div.        2
NORTH AMERICAN-MEDORE               1
NORTH AMERICAN/SCHWAMM              1
NORTH AMERICAN/VICTORIA MNT LT      1
North American Rockwell             1
North American-aero Classics        1
North American-barene               1
North American-kenney               1
North American-maslon               1
North American/aero Classics        1
Name: count, dtype: int64

In [841]:
df.loc[df['Make'].isin(['NORTH AMERICAN']), 'Make'] = 'North American'

df[df['Make'] == 'North American'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    212
NaN         161
Name: count, dtype: int64

In [842]:
df.loc[df['Make'] == 'North American', 'Aircraft_Category'] = 'Airplane'

In [843]:
df[df['Make'].str.lower().str.startswith('taylorc')].value_counts('Make')

Make
Taylorcraft                   316
TAYLORCRAFT                    62
TAYLORCRAFT AVIATION CORP       5
TAYLORCRAFT AVIATION CORP.      3
TAYLORCRAFT CORP                1
Taylorcraft Aviation            1
Taylorcraft Corporation         1
Name: count, dtype: int64

In [844]:
df.loc[df['Make'].isin(['TAYLORCRAFT', 'TAYLORCRAFT AVIATION CORP', 'TAYLORCRAFT AVIATION CORP.', 'TAYLORCRAFT CORP', 'Taylorcraft Aviation', 'Taylorcraft Corporation']), 'Make'] = 'Taylorcraft'

df[df['Make'] == 'Taylorcraft'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    244
NaN         145
Name: count, dtype: int64

In [845]:
df.loc[df['Make'] == 'Taylorcraft', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Rockwell         141
Sikorsky         106
Burkhart Grob    100
Fairchild         97
Lockheed          94
Ayres             87
Balloon Works     87
Swearingen        86
Mitsubishi        84
Hiller            82
Name: count, dtype: int64

In [846]:
df[df['Make'].str.lower().str.startswith('rockw')].value_counts('Make')

Make
Rockwell                  328
ROCKWELL INTERNATIONAL     53
ROCKWELL                   24
Rockwell International     22
Rockwell Commander          3
Rockwell Intl               2
Rockwell Intl.              2
ROCKWELL COMMANDER          1
Rockwell Comdr              1
Rockwell Int't              1
Name: count, dtype: int64

In [847]:
df.loc[df['Make'].isin(['ROCKWELL', 'ROCKWELL INTERNATIONAL', 'Rockwell International', 'Rockwell Intl', 'Rockwell Intl.', 'Rockwell Int\'t']), 'Make'] = 'Rockwell'

df[df['Make'] == 'Rockwell'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    284
NaN         148
Name: count, dtype: int64

In [848]:
df.loc[df['Make'] == 'Rockwell', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Sikorsky             106
Burkhart Grob        100
Fairchild             97
Lockheed              94
Balloon Works         87
Ayres                 87
Swearingen            86
Mitsubishi            84
Hiller                82
British Aerospace     79
Name: count, dtype: int64

In [849]:
df[df['Make'].str.lower().str.startswith('sikor')].value_counts('Make')

Make
Sikorsky                         153
SIKORSKY                          76
SIKORSKY AIRCRAFT CORP             1
SIKORSKY AIRCRAFT CORPORATION      1
Sikorsky/orlando                   1
Name: count, dtype: int64

In [850]:
df.loc[df['Make'].isin(['SIKORSKY', 'SIKORSKY AIRCRAFT CORP', 'SIKORSKY AIRCRAFT CORPORATION']), 'Make'] = 'Sikorsky'

df[df['Make'] == 'Sikorsky'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    118
NaN           113
Name: count, dtype: int64

In [851]:
df.loc[df['Make'] == 'Sikorsky', 'Aircraft_Category'] = 'Helicopter'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Burkhart Grob        100
Fairchild             97
Lockheed              94
Balloon Works         87
Ayres                 87
Swearingen            86
Mitsubishi            84
Hiller                82
British Aerospace     79
Embraer               75
Name: count, dtype: int64

In [852]:
df[df['Make'] == 'Burkhart Grob'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN       100
Glider      9
Name: count, dtype: int64

So many NaN and only 9 gliders, so I'm going to check out the models just to make sure that I should fill in Burkhart Grob category as Glider

In [853]:
df[df['Make'] == 'Burkhart Grob'].value_counts('Model', dropna=False)

Model
G103                    15
G-103A                   7
103                      6
G102                     6
G-109B                   5
G-103                    5
G103 TWIN ASTIR          4
G 103 Twin II            4
G109B                    3
G-102                    3
103A                     3
G103 TWIN II             2
G103 Twin Astir          2
G-103-TWIN II            2
G102 ASTIR CS            2
G103A                    2
G-103A Twin II Acro      2
G102 Club Astir IIIB     2
102                      2
G 103 TWIN II            2
G-103 TWIN II            1
G102-111B                1
SPEED ASTIR II           1
G10Z ASTIR CS            1
103C                     1
G103C TWIN III ACRO      1
G103B                    1
109                      1
G103-TWINA               1
G103 Twin II             1
109A                     1
109B                     1
6103 TWIN ASTIR          1
G103 FLUGZEUGBAU         1
A103 TWIN II             1
G102 Std Astir III       1
G-103-II AERO         

Google tells me all these models are gliders.

In [854]:
df[df['Make'].str.lower().str.startswith('burkha')].value_counts('Make')

Make
Burkhart Grob                109
BURKHART GROB                 11
Burkhart Grob Flugzeugbau      9
BURKHART GROB FLUGZEUGBAU      6
Burkhart Grob Flugzeugbah      1
Burkhart-grob                  1
Name: count, dtype: int64

In [855]:
df.loc[df['Make'].isin(['BURKHART GROB', 'Burkhart Grob Flugzeugbau', 'BURKHART GROB FLUGZEUGBAU', 'Burkhart Grob Flugzeugbah', 'Burkhart-grob']), 'Make'] = 'Burkhart Grob'

df.loc[df['Make'] == 'Burkhart Grob', 'Aircraft_Category'] = 'Glider'

In [856]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Fairchild            97
Lockheed             94
Balloon Works        87
Ayres                87
Swearingen           86
Mitsubishi           84
Hiller               82
British Aerospace    79
Embraer              75
Enstrom              74
Name: count, dtype: int64

In [857]:
df[df['Make'].str.lower().str.startswith('fairchi')].value_counts('Make')

Make
Fairchild                131
Fairchild Hiller          35
FAIRCHILD                 27
Fairchild Swearingen      11
FAIRCHILD HILLER           4
Fairchild Dornier          3
FAIRCHILD HELI-PORTER      2
FAIRCHILD(HOWARD)          2
FAIRCHILD FUNK             1
Fairchild Heli-porter      1
Fairchild Industries       1
Fairchild Merlin           1
Fairchild-heliporter       1
Fairchild/swearingen       1
Name: count, dtype: int64

In [858]:
df.loc[df['Make'].isin(['Fairchild Hiller', 'FAIRCHILD', 'Fairchild Swearingen', 'FAIRCHILD HILLER', 'Fairchild Dornier', 'FAIRCHILD HELI-PORTER', 'FAIRCHILD(HOWARD)', 'FAIRCHILD FUNK',
                       'Fairchild Heli-porter', 'Fairchild Industries', 'Fairchild Merlin', 'Fairchild-heliporter', 'Fairchild/swearingen']), 'Make'] = 'Fairchild'

df[df['Make'] == 'Fairchild'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           113
Airplane       73
Helicopter     35
Name: count, dtype: int64

In [859]:
df[df['Make'].isin(['Fairchild']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Model
SA-227AC             8
SA-227-AC            7
M-62A-3              6
SA-226-TC            4
24G                  4
SA227AC              4
SA-227               4
M-62C                4
SA226T               3
F-27                 3
DO-328-300           3
24W-46               3
SA227                2
C-82A                2
M-62A                2
SA227-AT             2
SA 227               2
SA 227-AC            2
PT-19                2
24R-46A              2
24R-40               2
SA227BC              2
M-62                 2
SA 227-TT Merlin     1
SA-2226-TC           1
SA-226-T             1
24 C8C               1
SA-226T              1
SA-226TC             1
SA-227-TT            1
PT-26B               1
SA-266TC             1
SA226-T              1
SA227-DC             1
SA227-TT             1
Pilatus PC6/B2-H2    1
M62A (PT-19)         1
PT-23                1
PT-19A               1
24-J                 1
24R-46               1
24W-40               1
42                   1
C-119

The only models in this list that are helicopters are the FH1100 and FH-100. All the rest fall into the Airplane category.

In [860]:
df.loc[df['Model'].isin(['FH1100', 'FH-100']), 'Aircraft_Category'] = 'Helicopter'

In [861]:
# Make the rest of the NaN category values Airplane for Fairchild
df.loc[(df['Make'].isin(['Fairchild'])) & (df['Aircraft_Category'].isna()), 'Aircraft_Category'] = 'Airplane'

df[df['Make'] == 'Fairchild'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane      184
Helicopter     37
Name: count, dtype: int64

In [862]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Lockheed             94
Ayres                87
Balloon Works        87
Swearingen           86
Mitsubishi           84
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Name: count, dtype: int64

In [863]:
df[df['Make'].str.lower().str.startswith('lockh')].value_counts('Make')

Make
Lockheed    111
LOCKHEED     11
Name: count, dtype: int64

In [864]:
df.loc[df['Make'].isin(['LOCKHEED']), 'Make'] = 'Lockheed'

df[df['Make'] == 'Lockheed'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN             94
Airplane        27
Powered-Lift     1
Name: count, dtype: int64

In [865]:
df.loc[df['Make'] == 'Lockheed', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Balloon Works        87
Ayres                87
Swearingen           86
Mitsubishi           84
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Name: count, dtype: int64

In [866]:
df[df['Make'].str.lower().str.startswith('ayre')].value_counts('Make')

Make
Ayres                213
AYRES CORPORATION     38
AYRES                 23
Ayres Corporation      7
AYRES THRUSH           2
AYRES CORP             1
Name: count, dtype: int64

In [867]:
df.loc[df['Make'].isin(['AYRES CORPORATION', 'AYRES', 'Ayres Corporation', 'AYRES THRUSH', 'AYRES CORP']), 'Make'] = 'Ayres'

df[df['Make'] == 'Ayres'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    194
NaN          90
Name: count, dtype: int64

In [868]:
df.loc[df['Make'] == 'Ayres', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Balloon Works        87
Swearingen           86
Mitsubishi           84
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Unknown              65
Name: count, dtype: int64

In [869]:
df[df['Make'].str.lower().str.startswith('balloo')].value_counts('Make')

Make
Balloon Works              135
BALLOON WORKS                8
Balloon Works Inc            1
Balloonbau Woerner Gmbh      1
Name: count, dtype: int64

In [870]:
df.loc[df['Make'].isin(['BALLOON WORKS', 'Balloon Works Inc']), 'Make'] = 'Balloon Works'

df[df['Make'] == 'Balloon Works'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN        87
Balloon    57
Name: count, dtype: int64

In [871]:
df.loc[df['Make'] == 'Balloon Works', 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Swearingen           86
Mitsubishi           84
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Unknown              65
Raven                64
Name: count, dtype: int64

In [872]:
df[df['Make'].str.lower().str.startswith('swearin')].value_counts('Make')

Make
Swearingen                  141
SWEARINGEN                   29
Swearingen T R/masters W      1
Name: count, dtype: int64

In [873]:
df.loc[df['Make'].isin(['SWEARINGEN', 'Swearingen T R/masters W']), 'Make'] = 'Swearingen'

df[df['Make'] == 'Swearingen'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         88
Airplane    83
Name: count, dtype: int64

In [874]:
df.loc[df['Make'] == 'Swearingen', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Mitsubishi           84
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Unknown              65
Learjet              64
Raven                64
Name: count, dtype: int64

In [875]:
df[df['Make'].str.lower().str.startswith('mitsub')].value_counts('Make')

Make
Mitsubishi    126
MITSUBISHI     16
Name: count, dtype: int64

In [876]:
df.loc[df['Make'].isin(['MITSUBISHI']), 'Make'] = 'Mitsubishi'

df[df['Make'] == 'Mitsubishi'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         84
Airplane    57
Unknown      1
Name: count, dtype: int64

In [877]:
df.loc[df['Make'] == 'Mitsubishi', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Unknown              65
Learjet              64
Raven                64
Mbb                  62
Name: count, dtype: int64

In [878]:
df[df['Make'].str.lower().str.startswith('hille')].value_counts('Make')

Make
Hiller                        311
HILLER                         37
Hiller-soloy                    9
HILLER-ROGERSON HELICOPTER      1
HILLER-TRI-PLEX IND.INC.        1
Hiller-osborn                   1
Hillery W. Grice                1
Name: count, dtype: int64

In [881]:
df.loc[df['Make'].isin(['HILLER', 'Hiller-soloy', 'HILLER-ROGERSON HELICOPTER', 'HILLER-TRI-PLEX IND.INC.', 'Hiller-osborn']), 'Make'] = 'Hiller'

df[df['Make'] == 'Hiller'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter           274
NaN                   85
Powered Parachute      1
Name: count, dtype: int64

In [882]:
df.loc[df['Make'] == 'Hiller', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Unknown              65
Learjet              64
Raven                64
Mbb                  62
Waco                 60
Name: count, dtype: int64

# Exploratory Data Analysis

# Conclusions

## Limitations

## Recommendations

## Next Steps