# Business Understanding

The business is interested in expanding its portfolio by becoming involved in the aviation industry, specifically as an owner/operator of aircraft for short-range corporate transportation that could involve light planes and helicopters.

I have been tasked with helping to determine some of the risks and suggesting which aircraft would be best suited for the company at the beginning stages of their new aviation division.

The stakeholders involved here would include not only the owners of the company, but also the department heads and employees of the aviation division that oversee and operate the aircraft for the company.

The goals for this project include recommending what kind of aircraft would provide the least risk for a commercial enterprise and suggesting certain operating protocols to help mitigate those risks.

# Data Understanding

The dataset being made available for this project is the National Transportation Safety Board aviation accident database as hosted on Kaggle.com at <a href="https://www.kaggle.com/datasets/khsamaha/aviation-accident-database-synopses" target="_blank">this link</a>. This dataset contains information about civil aviation accidents mainly in the US and includes many types of aircraft, from hot air balloons and powered parachutes to helicopters and airplanes. The current dataset contains 87,951 unique "Event ID" numbers, each representing an aircraft incident. It currently covers the years mainly from 1982 through 2022, with just a handful of accidents recorded before 1982. The dataset has 31 columns for each accident investigation that includes information like date and location, type of aircraft, make and model, injury severity information and number of injured, aircraft damage level, phase of flight for the accident, weather conditions, and reasons for the accident after the investigation is complete.

As the project is centered around risks of aviation, this dataset should prove to be a valuable resource for determining what kinds of risks exist in operating aircraft and making recommendations as far as what type of aircraft would be less of an investment risk. The columns detailing injury levels (Fatal, Serious, Minor, and Uninjured) to passengers and crew illuminate the human risks in aviation. Information related to aircraft damage levels will be valuable in terms of the financial risks.

Of concern in working with the dataset will be the lack of values in certain columns, especially the aircraft category and the accident reason columns. The "Aircraft Category" column is currently 64% empty, and the "Report Status" column (which provides a reason for the accident) is over 70% lacking in useful information. These two columns especially will need some in-depth cleaning and preparation.

# Data Preparation

## Data Cleaning

The dataset is named AviationData.csv and is in the data folder

In [1075]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('data/AviationData.csv', encoding='latin-1')

df.head()

  df = pd.read_csv('data/AviationData.csv', encoding='latin-1')


Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,...,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,...,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.922223,-81.878056,,,...,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,...,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980


rename columns to remove dots as they may cause errors in Python (replace dots with underscores)

In [1076]:
df.columns = df.columns.str.replace('.', '_')

df.head()

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Latitude,Longitude,Airport_Code,Airport_Name,...,Purpose_of_flight,Air_carrier,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status,Publication_Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,...,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.922223,-81.878056,,,...,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,...,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,...,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980


In [1077]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88889 entries, 0 to 88888
Data columns (total 31 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                88889 non-null  object 
 1   Investigation_Type      88889 non-null  object 
 2   Accident_Number         88889 non-null  object 
 3   Event_Date              88889 non-null  object 
 4   Location                88837 non-null  object 
 5   Country                 88663 non-null  object 
 6   Latitude                34382 non-null  object 
 7   Longitude               34373 non-null  object 
 8   Airport_Code            50132 non-null  object 
 9   Airport_Name            52704 non-null  object 
 10  Injury_Severity         87889 non-null  object 
 11  Aircraft_damage         85695 non-null  object 
 12  Aircraft_Category       32287 non-null  object 
 13  Registration_Number     87507 non-null  object 
 14  Make                    88826 non-null

### As Event ID provides a unique identifier for each incident, let's check for duplicate rows

In [1078]:
df[df.duplicated(subset=['Event_Id'], keep=False)]

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Latitude,Longitude,Airport_Code,Airport_Name,...,Purpose_of_flight,Air_carrier,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status,Publication_Date
117,20020917X01908,Accident,DCA82AA012B,1982-01-19,"ROCKPORT, TX",United States,,,RKP,ARANSAS COUNTY AIRPORT,...,Personal,,3.0,0.0,0.0,0.0,IMC,Approach,Probable Cause,19-01-1983
118,20020917X01908,Accident,DCA82AA012A,1982-01-19,"ROCKPORT, TX",United States,,,RKP,ARANSAS COUNTY AIRPORT,...,Executive/corporate,,3.0,0.0,0.0,0.0,IMC,Approach,Probable Cause,19-01-1983
153,20020917X02259,Accident,LAX82FA049A,1982-01-23,"VICTORVILLE, CA",United States,,,,,...,Personal,,2.0,0.0,4.0,0.0,VMC,Unknown,Probable Cause,23-01-1983
158,20020917X02400,Accident,MIA82FA038B,1982-01-23,"NEWPORT RICHEY, FL",United States,,,,,...,Personal,,0.0,0.0,0.0,3.0,VMC,Cruise,Probable Cause,23-01-1983
159,20020917X02400,Accident,MIA82FA038A,1982-01-23,"NEWPORT RICHEY, FL",United States,,,,,...,Personal,,0.0,0.0,0.0,3.0,VMC,Approach,Probable Cause,23-01-1983
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88796,20221121106336,Accident,WPR23LA041,2022-11-18,"Las Vegas, NV",United States,361239N,1151140W,VGT,NORTH LAS VEGAS,...,Instructional,702 HELICOPTER INC,0.0,0.0,0.0,3.0,VMC,,,07-12-2022
88797,20221122106340,Incident,DCA23WA071,2022-11-18,"Marrakech,",Morocco,,,,,...,,British Airways,0.0,0.0,0.0,0.0,,,,
88798,20221122106340,Incident,DCA23WA071,2022-11-18,"Marrakech,",Morocco,,,,,...,,Valair Private Jets,0.0,0.0,0.0,0.0,,,,
88813,20221123106354,Accident,WPR23LA045,2022-11-22,"San Diego, CA",United States,323414N,1165825W,SDM,Brown Field Municipal Airport,...,Instructional,HeliStream Inc.,0.0,0.0,0.0,4.0,VMC,,,22-12-2022


I see here that though these duplicate rows do represent separate aircraft in multi-aircraft incidents, the injury and/or fatality numbers are combined. This would constitute duplicate numbers in certain columns that would render errors in the analysis when making use of the injury values.

So let's remove the duplicates from this subset.

In [1079]:
df = df.drop_duplicates(subset=['Event_Id'], keep='first')

# Double check to make sure duplicates have been removed
df[df.duplicated(subset=['Event_Id'], keep=False)]

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Latitude,Longitude,Airport_Code,Airport_Name,...,Purpose_of_flight,Air_carrier,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status,Publication_Date


In [1080]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87951 entries, 0 to 88888
Data columns (total 31 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                87951 non-null  object 
 1   Investigation_Type      87951 non-null  object 
 2   Accident_Number         87951 non-null  object 
 3   Event_Date              87951 non-null  object 
 4   Location                87899 non-null  object 
 5   Country                 87729 non-null  object 
 6   Latitude                34212 non-null  object 
 7   Longitude               34203 non-null  object 
 8   Airport_Code            49484 non-null  object 
 9   Airport_Name            52031 non-null  object 
 10  Injury_Severity         86961 non-null  object 
 11  Aircraft_damage         84848 non-null  object 
 12  Aircraft_Category       32181 non-null  object 
 13  Registration_Number     86601 non-null  object 
 14  Make                    87888 non-null  obj

## Columns that are not needed
Remove certain columns that are mostly empty (and can't be filled in) and/or would not contain data useful to the intended analysis.

I want to make heavy use of: date, injury, damage, category, phase of flight, and report status
Let's remove Latitude, Longitude, Airport_Code, Airport_Name, Registration_Number, FAR_Description, Schedule, Air_carrier, and Publication_Date as those columns are either mostly empty or would not contribute to the analysis.

In [1081]:
df = df.drop(['Latitude', 'Longitude', 'Airport_Code', 'Airport_Name', 'Registration_Number', 'FAR_Description', 'Schedule', 'Air_carrier', 'Publication_Date'], axis=1)

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87951 entries, 0 to 88888
Data columns (total 22 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                87951 non-null  object 
 1   Investigation_Type      87951 non-null  object 
 2   Accident_Number         87951 non-null  object 
 3   Event_Date              87951 non-null  object 
 4   Location                87899 non-null  object 
 5   Country                 87729 non-null  object 
 6   Injury_Severity         86961 non-null  object 
 7   Aircraft_damage         84848 non-null  object 
 8   Aircraft_Category       32181 non-null  object 
 9   Make                    87888 non-null  object 
 10  Model                   87859 non-null  object 
 11  Amateur_Built           87851 non-null  object 
 12  Number_of_Engines       81924 non-null  float64
 13  Engine_Type             80908 non-null  object 
 14  Purpose_of_flight       81829 non-null  obj

In [1082]:
df.head()

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Injury_Severity,Aircraft_damage,Aircraft_Category,Make,...,Number_of_Engines,Engine_Type,Purpose_of_flight,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,Fatal(2),Destroyed,,Stinson,...,1.0,Reciprocating,Personal,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,Fatal(4),Destroyed,,Piper,...,1.0,Reciprocating,Personal,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,Fatal(3),Destroyed,,Cessna,...,1.0,Reciprocating,Personal,3.0,,,,IMC,Cruise,Probable Cause
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,Fatal(2),Destroyed,,Rockwell,...,1.0,Reciprocating,Personal,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,Fatal(1),Destroyed,,Cessna,...,,,Personal,1.0,2.0,,0.0,VMC,Approach,Probable Cause


### Incomplete Columns
Now, we have 87,951 entries in the dataset. Most of the columns are incomplete though. For the columns that cannot be completed with reasonable values, we can fill some of them in with 'Unknown' instead of leaving them blank (NaN).

Empty Location, Country, Aircraft_damage, Make, Model, Amateur_Built, Number_of_Engines, Engine_Type, Purpose_of_flight, Weather_Condition, Broad_phase_of_flight, and Report_Status values can be filled in as 'Unknown'.

In [1083]:
# Fill in NaN values in multiple columns with "Unknown"
columns_to_fill = ['Location', 'Country', 'Aircraft_damage', 'Make', 'Model', 'Amateur_Built', 'Number_of_Engines', 'Engine_Type', 'Purpose_of_flight', 
                   'Weather_Condition', 'Broad_phase_of_flight', 'Report_Status']
for column in columns_to_fill:
    df[column] = df[column].fillna('Unknown')

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87951 entries, 0 to 88888
Data columns (total 22 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                87951 non-null  object 
 1   Investigation_Type      87951 non-null  object 
 2   Accident_Number         87951 non-null  object 
 3   Event_Date              87951 non-null  object 
 4   Location                87951 non-null  object 
 5   Country                 87951 non-null  object 
 6   Injury_Severity         86961 non-null  object 
 7   Aircraft_damage         87951 non-null  object 
 8   Aircraft_Category       32181 non-null  object 
 9   Make                    87951 non-null  object 
 10  Model                   87951 non-null  object 
 11  Amateur_Built           87951 non-null  object 
 12  Number_of_Engines       87951 non-null  object 
 13  Engine_Type             87951 non-null  object 
 14  Purpose_of_flight       87951 non-null  obj

The 4 injury columns (15 - 18) are incomplete, but they are float64, or integer, values, so we can't fill those empty values with "Unknown". The empty values should be changed to 0 to complete those columns.

In [1084]:
# Fill in NaN values in multiple columns with 0
injury_columns_to_fill = ['Total_Fatal_Injuries', 'Total_Serious_Injuries', 'Total_Minor_Injuries', 'Total_Uninjured']
for column in injury_columns_to_fill:
    df[column] = df[column].fillna(0)

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87951 entries, 0 to 88888
Data columns (total 22 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                87951 non-null  object 
 1   Investigation_Type      87951 non-null  object 
 2   Accident_Number         87951 non-null  object 
 3   Event_Date              87951 non-null  object 
 4   Location                87951 non-null  object 
 5   Country                 87951 non-null  object 
 6   Injury_Severity         86961 non-null  object 
 7   Aircraft_damage         87951 non-null  object 
 8   Aircraft_Category       32181 non-null  object 
 9   Make                    87951 non-null  object 
 10  Model                   87951 non-null  object 
 11  Amateur_Built           87951 non-null  object 
 12  Number_of_Engines       87951 non-null  object 
 13  Engine_Type             87951 non-null  object 
 14  Purpose_of_flight       87951 non-null  obj

In [1085]:
df.head()

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Injury_Severity,Aircraft_damage,Aircraft_Category,Make,...,Number_of_Engines,Engine_Type,Purpose_of_flight,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,Fatal(2),Destroyed,,Stinson,...,1.0,Reciprocating,Personal,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,Fatal(4),Destroyed,,Piper,...,1.0,Reciprocating,Personal,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,Fatal(3),Destroyed,,Cessna,...,1.0,Reciprocating,Personal,3.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,Fatal(2),Destroyed,,Rockwell,...,1.0,Reciprocating,Personal,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,Fatal(1),Destroyed,,Cessna,...,Unknown,Unknown,Personal,1.0,2.0,0.0,0.0,VMC,Approach,Probable Cause


Take a look at the Injury_Severity column

In [1086]:
df['Injury_Severity'].value_counts(dropna=False)

Injury_Severity
Non-Fatal     66822
Fatal(1)       6086
Fatal          5257
Fatal(2)       3632
Incident       2113
              ...  
Fatal(33)         1
Fatal(123)        1
Fatal(72)         1
Fatal(54)         1
Fatal(189)        1
Name: count, Length: 110, dtype: int64

We see here the various values in that column give a number of fatal injuries for each accident. Since this number is already represented in the column for Total_Fatal_Injuries, we don't need this column, so can delete it.

In [1087]:
# Drop the Injury_Severity column
df = df.drop('Injury_Severity', axis=1)

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87951 entries, 0 to 88888
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                87951 non-null  object 
 1   Investigation_Type      87951 non-null  object 
 2   Accident_Number         87951 non-null  object 
 3   Event_Date              87951 non-null  object 
 4   Location                87951 non-null  object 
 5   Country                 87951 non-null  object 
 6   Aircraft_damage         87951 non-null  object 
 7   Aircraft_Category       32181 non-null  object 
 8   Make                    87951 non-null  object 
 9   Model                   87951 non-null  object 
 10  Amateur_Built           87951 non-null  object 
 11  Number_of_Engines       87951 non-null  object 
 12  Engine_Type             87951 non-null  object 
 13  Purpose_of_flight       87951 non-null  object 
 14  Total_Fatal_Injuries    87951 non-null  flo

### Aircraft_Category
The category of aircraft is important to the analysis, but the column is mostly empty.

Many of the empty values can be filled in using the Make column, though.

In [1088]:
df['Aircraft_Category'].value_counts(dropna=False)

Aircraft_Category
NaN                  55770
Airplane             27520
Helicopter            3434
Glider                 505
Balloon                231
Gyrocraft              173
Weight-Shift           161
Powered Parachute       91
Ultralight              30
Unknown                 14
WSFT                     9
Powered-Lift             5
Blimp                    4
UNK                      2
Rocket                   1
ULTR                     1
Name: count, dtype: int64

In [1089]:
df['Make'].value_counts(dropna=False)

Make
Cessna           21925
Piper            11903
CESSNA            4914
Beech             4290
PIPER             2841
                 ...  
Geertz               1
Conrad Menzel        1
Blucher              1
Gideon               1
ROYSE RALPH L        1
Name: count, Length: 8202, dtype: int64

I see here that there may exist multiple versions of the same makes, like "Cessna" and "CESSNA". It would be nice to clean this column for multiple versions of make names.

We can start with Cessna since it has the most in value_counts and see what other versions of that name are in the dataset.

In [1090]:
# Show Make value beginning with ces, ignoring case
df[df['Make'].str.lower().str.startswith('ces')].value_counts('Make')

Make
Cessna                     21925
CESSNA                      4914
CESSNA AIRCRAFT CO            24
CESSNA AIRCRAFT                9
CESSNA AIRCRAFT COMPANY        9
Cessna Ector                   3
CESSNA ECTOR                   3
Cessna Aircraft Company        3
Cessna Wren                    2
CESSNA/AIR REPAIR INC          2
CESSNA/WEAVER                  1
Cessna Aircraft Co.            1
CESSNA REIMS                   1
CESSNA Aircraft                1
Cessna Reems                   1
Cessna Robertson               1
Cessna Skyhawk II              1
Cessna Soloy                   1
Cesna                          1
Name: count, dtype: int64

So all these makes can be cleaned by changing the values to "Cessna"

In [1091]:
# Convert all these cessna values to 'Cessna'
df.loc[df['Make'].str.lower().str.startswith('ces'), 'Make'] = 'Cessna'

df[df['Make'].str.lower().str.startswith('ces')].value_counts('Make')

Make
Cessna    26903
Name: count, dtype: int64

We now have almost 27000 Cessna makes instead. So we can now look at the category values for these makes.

In [1092]:
# Aircraft_Category values for Cessna in the Make column, include NaN
df[df['Make'] == 'Cessna'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         18406
Airplane     8496
Unknown         1
Name: count, dtype: int64

It looks like it would be safe to replace the empty category values (and 1 unknown) for the Cessna make with "Airplane"

In [1093]:
# Fill in Aircraft_Category as 'Airplane' for Cessna
df.loc[df['Make'] == 'Cessna', 'Aircraft_Category'] = 'Airplane'

In [1094]:
# Let's look at Make value_counts again
df['Make'].value_counts(dropna=False)

Make
Cessna           26903
Piper            11903
Beech             4290
PIPER             2841
Bell              2118
                 ...  
Gideon               1
Brault               1
Baldwin              1
Kirchner             1
ROYSE RALPH L        1
Name: count, Length: 8184, dtype: int64

In [1095]:
# Show Make value beginning with piper, ignoring case
df[df['Make'].str.lower().str.startswith('piper')].value_counts('Make')

Make
Piper                         11903
PIPER                          2841
PIPER AIRCRAFT INC               27
PIPER AIRCRAFT CORPORATION        8
PIPER AIRCRAFT                    4
Piper Aircraft                    3
Piper/cub Crafters                3
PIPER/CUB CRAFTERS                3
Piper Aircraft Corporation        3
Piper Aircraft, Inc.              2
Piper Aerostar                    2
PIPER / LAUDEMAN                  1
PIPER/WALLY'S FLYERS INC          1
PIPER-HARRIS                      1
Piper Cub Crafters                1
Piper Pawnee                      1
Piper-aerostar                    1
Piper/Cub Crafters                1
PIPER AIRCRAFT, INC.              1
Piper/stevens                     1
Name: count, dtype: int64

In [1096]:
# Convert all these piper values to 'Piper' and then take a look at its category values
df.loc[df['Make'].str.lower().str.startswith('piper'), 'Make'] = 'Piper'

df[df['Make'] == 'Piper'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         10045
Airplane     4763
Name: count, dtype: int64

In [1097]:
# Fill in the NaN values for the category column for Piper as "Airplane"
df.loc[df['Make'] == 'Piper', 'Aircraft_Category'] = 'Airplane'

In [1098]:
# Show Make value beginning with beech, ignoring case
df[df['Make'].str.lower().str.startswith('beech')].value_counts('Make')

Make
Beech                         4290
BEECH                         1042
Beechcraft                      24
BEECHCRAFT                       5
BEECH AIRCRAFT                   3
BEECH AIRCRAFT CORPORATION       2
Beech Aircraft Corporation       2
BEECH AIRCRAFT CO.               1
Beech Aircraft Corp              1
Beechcraft Corporation           1
Beecher                          1
Name: count, dtype: int64

A quick Google search tells me that Beech and Beechcraft are the same make.

In [1099]:
df.loc[df['Make'].str.lower().str.startswith('beech'), 'Make'] = 'Beech'

df[df['Make'] == 'Beech'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         3654
Airplane    1718
Name: count, dtype: int64

In [1100]:
df.loc[df['Make'] == 'Beech', 'Aircraft_Category'] = 'Airplane'

In [1101]:
df[df['Make'].str.lower().str.startswith('bell')].value_counts('Make')

Make
Bell                              2118
Bellanca                           874
BELL                               588
BELLANCA                           159
BELL HELICOPTER TEXTRON CANADA      23
BELL HELICOPTER TEXTRON             21
BELL HELICOPTER                      4
Bell-transworld                      3
Bell Helicopter                      3
Bell-k Copter                        2
Bell Helicopter Textron              2
Bell-carson                          2
BELL TEXTRON CANADA LTD              2
BELL HELICOPTER CO                   2
Bell-olympic Helicopters, Inc.       1
Bell-moore                           1
Bell-world                           1
Bell/soloy                           1
Bell/garlick                         1
Bell/mason                           1
Bell/textron                         1
Bell/tsirah                          1
Bellah                               1
Bellanca Aircraft Corporation        1
Bellanca Citabria                    1
Bell-kitz Kopters   

In [1102]:
# Bellanca and Bell are not the same make, so will take a little more work to clean all the various bell combinations.
# change the various interations of bell to Bell
df.loc[df['Make'].str.lower().str.startswith(('bell-', 'bell/', 'bell h', 'bell t', 'bell s', 'bell b', 'bell 4'), na=False), 'Make'] = 'Bell'

# make Bell and BELL the same
df.loc[(df['Make'] == 'BELL'), 'Make'] = 'Bell'

# address the various versions of Bellanca
df.loc[df['Make'].str.lower().str.startswith(('bellan'), na=False), 'Make'] = 'Bellanca'

# check the list again
df[df['Make'].str.lower().str.startswith('bell')].value_counts('Make')

Make
Bell              2792
Bellanca          1036
BELLER               1
BELLET JAMES J       1
Bellah               1
Name: count, dtype: int64

In [1103]:
# Now we can look at the categories for Bell and Bellanca
df[df['Make'] == 'Bell'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           1816
Helicopter     971
Airplane         3
Unknown          2
Name: count, dtype: int64

In [1104]:
# Bell can safely be changed to Helicopter for its category
df.loc[df['Make'] == 'Bell', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'] == 'Bell'].value_counts('Aircraft_Category', dropna=False)

Flushing oldest 200 entries.
  warn('Output cache limit (currently {sz} entries) hit.\n'


Aircraft_Category
Helicopter    2792
Name: count, dtype: int64

In [1105]:
df[df['Make'] == 'Bellanca'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         753
Airplane    283
Name: count, dtype: int64

In [1106]:
df.loc[df['Make'] == 'Bellanca', 'Aircraft_Category'] = 'Airplane'

In [1107]:
# Let's look at Make value_counts again
df['Make'].value_counts(dropna=False)

Make
Cessna           26903
Piper            14808
Beech             5372
Bell              2792
Boeing            1512
                 ...  
Gideon               1
Brault               1
Baldwin              1
Kirchner             1
ROYSE RALPH L        1
Name: count, Length: 8118, dtype: int64

In [1108]:
# clean the boeing make
df[df['Make'].str.lower().str.startswith('boei')].value_counts('Make')

Make
Boeing                            1512
BOEING                            1140
Boeing Stearman                     48
BOEING COMPANY                       8
Boeing Vertol                        6
Boeing Helicopters Div.              3
Boeing - Canada (de Havilland)       2
BOEING 777-306ER                     1
BOEING COMPANY, LONG BEACH DIV       1
BOEING OF CANADA/DEHAV DIV           1
BOEING-STEARMAN                      1
BOEING-VERTOL                        1
Boeing (Stearman)                    1
Boeing Commercial Airplane Gro       1
Boeing Company                       1
Boeing-brown                         1
Name: count, dtype: int64

In [1109]:
# change the various iterations of boeing to Boeing
df.loc[df['Make'].str.lower().str.startswith('boeing'), 'Make'] = 'Boeing'

df[df['Make'] == 'Boeing'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN             1397
Airplane        1325
Helicopter         5
Powered-Lift       1
Name: count, dtype: int64

In [1110]:
# Boeing can safely be assigned to the Airplane category
df.loc[df['Make'] == 'Boeing', 'Aircraft_Category'] = 'Airplane'

In [1111]:
# Let's look at the top 50 Make value counts and see if there are any that can be cleaned up
df['Make'].value_counts().head(60)

Make
Cessna                            26903
Piper                             14808
Beech                              5372
Bell                               2792
Boeing                             2728
Mooney                             1080
Grumman                            1080
Bellanca                           1036
Robinson                            940
Hughes                              794
Schweizer                           628
Air Tractor                         588
Mcdonnell Douglas                   499
Aeronca                             479
Maule                               443
Champion                            426
De Havilland                        370
Aero Commander                      356
Stinson                             342
Aerospatiale                        334
Rockwell                            328
Taylorcraft                         316
Luscombe                            316
Hiller                              311
North American                     

In [1112]:
# change the various iterations of aeronca to Aeronca
df.loc[df['Make'].str.lower().str.startswith('aeronca'), 'Make'] = 'Aeronca'

df[df['Make'] == 'Aeronca'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         401
Airplane    232
Name: count, dtype: int64

In [1113]:
# Aeronca is airplane
df.loc[df['Make'] == 'Aeronca', 'Aircraft_Category'] = 'Airplane'

In [1114]:
# change the various iterations of Air Tractor and check its category values
df.loc[df['Make'].str.lower().str.startswith('air tractor'), 'Make'] = 'Air Tractor'

df[df['Make'] == 'Air Tractor'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         462
Airplane    448
Name: count, dtype: int64

In [1115]:
# Air Tractor is Airplane
df.loc[df['Make'] == 'Air Tractor', 'Aircraft_Category'] = 'Airplane'

In [1116]:
# look at airbus
df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

Make
AIRBUS                            251
Airbus Industrie                  135
Airbus                             37
AIRBUS INDUSTRIE                   22
AIRBUS HELICOPTERS                 10
AIRBUS HELICOPTERS INC              3
Airbus Helicopters                  2
AIRBUS HELICOPTER                   1
AIRBUS Helicopters                  1
AIRBUS/EUROCOPTER                   1
Airbus Helicopters (Eurocopte       1
Airbus Helicopters Deutschland      1
Airbus Industries                   1
Name: count, dtype: int64

In [1117]:
df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane        284
NaN             141
Helicopter       40
Powered-Lift      1
Name: count, dtype: int64

I see here that the various versions of Airbus have both airplanes and helicopters in the make and category columns, so before replacing make names and then filling in empty category values, I need to check the categories for some of the make iterations that may not be clear.

In [1118]:
# check out Airbus Industrie iterations
df[df['Make'].str.lower().str.startswith('airbus i')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         116
Airplane     42
Name: count, dtype: int64

So, Airbus Industrie, AIRBUS INDUSTRIE, and Airbus Industries can be combined and categorized as airplane

In [1119]:
df.loc[df['Make'].isin(['AIRBUS INDUSTRIE', 'Airbus Industries']), 'Make'] = 'Airbus Industrie'

df.loc[df['Make'] == 'Airbus Industrie', 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

Make
AIRBUS                            251
Airbus Industrie                  158
Airbus                             37
AIRBUS HELICOPTERS                 10
AIRBUS HELICOPTERS INC              3
Airbus Helicopters                  2
AIRBUS HELICOPTER                   1
AIRBUS Helicopters                  1
AIRBUS/EUROCOPTER                   1
Airbus Helicopters (Eurocopte       1
Airbus Helicopters Deutschland      1
Name: count, dtype: int64

In [1120]:
# clean up Airbus Helicopters
df.loc[df['Make'].isin(['AIRBUS HELICOPTERS', 'AIRBUS Helicopters', 'AIRBUS HELICOPTERS INC', 'AIRBUS HELICOPTER', 'AIRBUS/EUROCOPTER', 'Airbus Helicopters (Eurocopte', 'Airbus Helicopters Deutschland']), 'Make'] = 'Airbus Helicopters'

df.loc[df['Make'] == 'Airbus Helicopters', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

Make
AIRBUS                251
Airbus Industrie      158
Airbus                 37
Airbus Helicopters     20
Name: count, dtype: int64

In [1121]:
# combine the Airbus iterations
df.loc[(df['Make'] == 'AIRBUS'), 'Make'] = 'Airbus'

df[df['Make'] == 'Airbus'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane        242
NaN              25
Helicopter       20
Powered-Lift      1
Name: count, dtype: int64

Since only 20 of the almost 300 records for Airbus are helicopters, we can safely make the NaN values Airplane

In [1122]:
df.loc[(df['Make'] == 'Airbus') & (df['Aircraft_Category'].isna()), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

Make
Airbus                288
Airbus Industrie      158
Airbus Helicopters     20
Name: count, dtype: int64

In [1123]:
df['Aircraft_Category'].value_counts(dropna=False)

Aircraft_Category
Airplane             62783
NaN                  18695
Helicopter            5250
Glider                 505
Balloon                231
Gyrocraft              173
Weight-Shift           161
Powered Parachute       91
Ultralight              30
Unknown                 11
WSFT                     9
Blimp                    4
Powered-Lift             4
UNK                      2
Rocket                   1
ULTR                     1
Name: count, dtype: int64

So from having over 56,000 empty values in the category column, we are down to 18,695 empty values. I'd like to bring this down even further by looking at the empty category values as compared with the Make column to see which makes have the most empty values for category.

In [1124]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Grumman                         911
Mooney                          900
Hughes                          686
Robinson                        661
Schweizer                       537
                               ... 
York                              1
Warren-thomas                     1
Tennessee Engineering & Manf      1
Slade H. Holmes                   1
GRUMMAN AMERICAN AVN. CORP.       1
Name: count, Length: 4049, dtype: int64

Let's look at the category values for these makes that have the most empty category values

In [1125]:
# Grumman
df[df['Make'] == 'Grumman'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         911
Airplane    169
Name: count, dtype: int64

So Grumman is Airplane

In [1126]:
# check to see if there are any other versions of 'Grumman' in the make column
df[df['Make'].str.lower().str.startswith('grumm')].value_counts('Make')

Make
Grumman                           1080
Grumman American                   222
Grumman-schweizer                  121
GRUMMAN                             78
GRUMMAN ACFT ENG COR-SCHWEIZER      58
GRUMMAN AMERICAN AVN. CORP.         49
Grumman-Schweizer                    6
GRUMMAN AIRCRAFT ENG CORP            2
GRUMMAN AMERICAN                     2
Grumman Acft Eng                     2
Grumman American Aviation            2
GRUMMAN AIRCRAFT COR-SCHWEIZER       1
GRUMMAN AMERICAN AVIATION CORP       1
GRUMMAN AMERICAN AVN. CORP           1
GRUMMAN ACFT ENG COR                 1
GRUMMAN SCHWEIZER                    1
GRUMMAN AIRCRAFT                     1
Grumman American Avn. Corp.          1
Grumman Schweizer                    1
GRUMMAN American Corporation         1
Name: count, dtype: int64

In [1127]:
# What are the category values for all these different versions of Grumman
df[df['Make'].str.lower().str.startswith('grumm')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         1207
Airplane     424
Name: count, dtype: int64

In [1128]:
# So let's combine all these Grumman makes together and make their category Airplane
df.loc[df['Make'].str.lower().str.startswith('grumm'), 'Make'] = 'Grumman'

df.loc[df['Make'] == 'Grumman', 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('grumm')].value_counts('Make')

Make
Grumman    1631
Name: count, dtype: int64

In [1129]:
# Mooney
df[df['Make'] == 'Mooney'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         900
Airplane    180
Name: count, dtype: int64

In [1130]:
# check to see if there are any other versions of 'Mooney' in the make column
df[df['Make'].str.lower().str.startswith('moon')].value_counts('Make')

Make
Mooney                           1080
MOONEY                            242
MOONEY AIRCRAFT CORP.              34
MOONEY AIRPLANE CO INC             10
MOONEY AIRPLANE COMPANY, INC.       1
MOONEY INTERNATIONAL CORP           1
Moon                                1
Mooney Aircraft                     1
Mooney Aircraft Corp                1
Mooney Aircraft Corp.               1
Mooney Aircraft Corporation         1
Mooney, Dan                         1
Name: count, dtype: int64

In [1131]:
# What are the category values for all these different versions of Mooney
df[df['Make'].str.lower().str.startswith('moon')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         908
Airplane    466
Name: count, dtype: int64

In [1132]:
# Combine all these Mooney makes together and make their category Airplane
df.loc[df['Make'].str.lower().str.startswith('mooney'), 'Make'] = 'Mooney'

df.loc[df['Make'] == 'Mooney', 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('moon')].value_counts('Make')

Make
Mooney    1373
Moon         1
Name: count, dtype: int64

In [1133]:
# Hughes
df[df['Make'] == 'Hughes'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           686
Helicopter    108
Name: count, dtype: int64

Hughes would be all helicopters

In [1134]:
# check to see if there are any other versions of 'Hughes' in the make column
df[df['Make'].str.lower().str.startswith('hughes')].value_counts('Make')

Make
Hughes                          794
HUGHES                          137
HUGHES HELICOPTERS INC            3
HUGHES AERO CORP                  2
HUGHES CHARLES R                  1
HUGHES/HELICOPTER ASSOCS INC      1
Hughes Aero                       1
Hughes Cassutt                    1
Hughes J/Hughes J                 1
Name: count, dtype: int64

In [1135]:
df[df['Make'].str.lower().str.startswith('hughes')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN                  688
Helicopter           248
Airplane               3
Powered Parachute      2
Name: count, dtype: int64

Now here we have a few airplanes and parachutes in addition to all the helicopters in our list of Hughes interations. This may be due to some people named Hughes in the list that are not associated with the helicopter company. We can narrow the list down to find just the helicopter Hughes.

In [1136]:
df[df['Make'].isin(['Hughes', 'HUGHES', 'HUGHES HELICOPTERS INC', 'HUGHES/HELICOPTER ASSOCS INC'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           687
Helicopter    248
Name: count, dtype: int64

In [1137]:
# So these 4 can be combined and made Helicopter in the category field
df.loc[df['Make'].isin(['Hughes', 'HUGHES', 'HUGHES HELICOPTERS INC', 'HUGHES/HELICOPTER ASSOCS INC']), 'Make'] = 'Hughes Helicopters'

df.loc[df['Make'] == 'Hughes Helicopters', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'] == 'Hughes Helicopters'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    935
Name: count, dtype: int64

In [1138]:
# Robinson
df[df['Make'].str.lower().str.startswith('robins')].value_counts('Make')

Make
Robinson                       940
ROBINSON                       283
ROBINSON HELICOPTER            221
ROBINSON HELICOPTER COMPANY    179
ROBINSON HELICOPTER CO          22
Robinson Helicopter Company     15
Robinson Helicopter              9
ROBINSON MICHAEL E               2
ROBINSON HELICOPTER CO INC       1
ROBINSON STEWART J               1
Robinson Helicopter Co.          1
Robinson Helicopters             1
Name: count, dtype: int64

In [1139]:
df[df['Make'].str.lower().str.startswith('robins')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    981
NaN           690
Airplane        3
Unknown         1
Name: count, dtype: int64

In [1140]:
# combine all the Robinson Helicopter iterations and make them Helicopter
df.loc[df['Make'].isin(['ROBINSON', 'ROBINSON HELICOPTER', 'ROBINSON HELICOPTER COMPANY', 'ROBINSON HELICOPTER CO', 'Robinson Helicopter Company', 'Robinson Helicopter', 'ROBINSON HELICOPTER CO INC', 'Robinson Helicopter Co.', 'Robinson Helicopters']), 'Make'] = 'Robinson'

df.loc[df['Make'] == 'Robinson', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'].str.lower().str.startswith('robins')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    1672
Airplane         3
Name: count, dtype: int64

In [1141]:
# Schweizer
df[df['Make'].str.lower().str.startswith('schweiz')].value_counts('Make')

Make
Schweizer                         628
SCHWEIZER                         144
SCHWEIZER AIRCRAFT CORP            18
SCHWEIZER(HUGHES)AIRCRAFT CORP      2
Schweizer Aircraft Corp             2
Schweizer Aircraft Corp.            2
SCHWEIZER(HUGHES)                   1
Schweizer 300CBi                    1
Schweizer Sgs                       1
Schweizer, N36289                   1
Name: count, dtype: int64

In [1142]:
df[df['Make'].str.lower().str.startswith('schweiz')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           541
Helicopter    115
Glider        111
Airplane       32
Unknown         1
Name: count, dtype: int64

A more healthy mixture here requires some investigation

In [1143]:
df[df['Make'].isin(['SCHWEIZER', 'Schweizer'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           539
Helicopter    111
Glider        110
Airplane       11
Unknown         1
Name: count, dtype: int64

A quick google search informs me that Schweizer Aircraft made helicopters, gliders, and airplanes, so filling in the category column for Schweizer cannot be accomplished just by using the make column. As the empty values only amount to almost 550, I'm going to leave Schweizer alone for now, except for combining the makes together so that I would be able to more easily dig into it using the model column as well.

In [1144]:
df.loc[df['Make'].isin(['SCHWEIZER', 'SCHWEIZER AIRCRAFT CORP', 'Schweizer Aircraft Corp', 'Schweizer Aircraft Corp.', 'Schweizer 300CBi', 'Schweizer Sgs']), 'Make'] = 'Schweizer'

df[df['Make'].str.lower().str.startswith('schweiz')].value_counts('Make')

Make
Schweizer                         796
SCHWEIZER(HUGHES)AIRCRAFT CORP      2
SCHWEIZER(HUGHES)                   1
Schweizer, N36289                   1
Name: count, dtype: int64

Now how are the empty category counts looking?

In [1145]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Schweizer            541
Mcdonnell Douglas    447
Maule                355
Champion             347
Aero Commander       317
                    ... 
Lamb/starduster        1
Reif                   1
Dale Conover           1
Curt Hoffstad          1
ROYSE RALPH L          1
Name: count, Length: 4029, dtype: int64

In [1146]:
# Let's look at the Scheizer models
df[df['Make'].isin(['Schweizer'])].value_counts('Model', dropna=False)

Model
269C           154
G-164B         109
SGS 2-33A       69
269C-1          41
G-164A          25
              ... 
G167B            1
G164A "450"      1
G164-B           1
G-164A-450       1
TG3A             1
Name: count, Length: 150, dtype: int64

Wikepedia and Google informs me that the Schweizer 269C is a helicopter, G-164B is an airplane, SGS 2-33A is a glider, 269C-1 is a helicopter, and G-164A is an airplane. Let's see if that data could e used to fill some of the Schweizer category values.

In [1147]:
df[df['Model'].isin(['269C'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    253
NaN            87
Unknown         1
Name: count, dtype: int64

In [1148]:
# Since the 269C model is a helicopter, let's fix all the empty category values for it. This fix will also fill in some category 
# values for other makes as well since we can see that there are more 269C models than just the Schweizer make.
df.loc[df['Model'] == '269C', 'Aircraft_Category'] = 'Helicopter'

In [1149]:
# The same goes for the rest of the models listed
df[df['Model'].isin(['G-164B', 'G-164A'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    738
NaN         135
Name: count, dtype: int64

In [1150]:
df.loc[df['Model'].isin(['G-164B', 'G-164A']), 'Aircraft_Category'] = 'Airplane'

In [1151]:
df[df['Model'].isin(['SGS 2-33A'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN       44
Glider    25
Name: count, dtype: int64

In [1152]:
df.loc[df['Model'] == 'SGS 2-33A', 'Aircraft_Category'] = 'Glider'

In [1153]:
# Let's look at how the category column is shaping up
df['Aircraft_Category'].value_counts(dropna=False)

Aircraft_Category
Airplane             65032
NaN                  14938
Helicopter            6716
Glider                 549
Balloon                231
Gyrocraft              173
Weight-Shift           161
Powered Parachute       91
Ultralight              30
Unknown                  9
WSFT                     9
Blimp                    4
Powered-Lift             4
UNK                      2
Rocket                   1
ULTR                     1
Name: count, dtype: int64

We still have about 15,000 empty category records. This can be brought down further using Makes and Models. The category values as we have them now show that airplanes are the overwhelmingly largest percentage of aircraft in the dataset of accidents. But after helicopters, the rest of the categories are tiny by comparison, and they constitute aircraft that would not ordinarily be under consideration for a business interested in getting into the aviation business. I'm not going to just drop those rows right now, but in the analysis phase, I don't anticipate using them.

In [1154]:
df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
UH-12E          134
8A              133
S2R             126
S-2R            114
DHC-2           112
               ... 
PROTECH PT-2      1
L-1011-1          1
GLASAIR GARG      1
C3C               1
EMB145            1
Name: count, Length: 5545, dtype: int64

In [1155]:
# Let's do the Mcdonnell Douglas make, and see about using the models in conjunction
df[df['Make'].str.lower().str.startswith('mcdonn')].value_counts('Make')

Make
Mcdonnell Douglas                 499
MCDONNELL DOUGLAS                  78
MCDONNELL DOUGLAS HELICOPTER       31
MCDONNELL DOUGLAS HELI CO          11
MCDONNELL DOUGLAS AIRCRAFT CO       6
McDonnell Douglas                   4
Mcdonnell-douglas                   2
MCDONNELL DOUGLAS CORPORATION       1
MCDONNELL-DOUGLAS                   1
McDonnell Douglas Helicopter        1
McDonnell Douglas Helicopter C      1
McDonnell Douglas Helicopters       1
Mcdonnell Douglas Helicopter        1
Mcdonnell Douglas Helicopters       1
Name: count, dtype: int64

In [1156]:
df[df['Make'].isin(['MCDONNELL DOUGLAS HELICOPTER', 'MCDONNELL DOUGLAS HELI CO'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    42
Name: count, dtype: int64

In [1157]:
# Combine the helicopter variations of the name
df.loc[df['Make'].isin(['MCDONNELL DOUGLAS HELICOPTER', 'MCDONNELL DOUGLAS HELI CO', 'McDonnell Douglas Helicopter', 'McDonnell Douglas Helicopter C', 'McDonnell Douglas Helicopters', 'Mcdonnell Douglas Helicopter', 'Mcdonnell Douglas Helicopters']), 'Make'] = 'Mcdonnell Douglas Helicopters'

df[df['Make'].isin(['Mcdonnell Douglas Helicopters'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    47
Name: count, dtype: int64

In [1158]:
df[df['Make'].str.lower().str.startswith('mcdonn')].value_counts('Make')

Make
Mcdonnell Douglas                499
MCDONNELL DOUGLAS                 78
Mcdonnell Douglas Helicopters     47
MCDONNELL DOUGLAS AIRCRAFT CO      6
McDonnell Douglas                  4
Mcdonnell-douglas                  2
MCDONNELL DOUGLAS CORPORATION      1
MCDONNELL-DOUGLAS                  1
Name: count, dtype: int64

In [1159]:
df.loc[df['Make'].isin(['MCDONNELL DOUGLAS', 'MCDONNELL DOUGLAS AIRCRAFT CO', 'McDonnell Douglas', 'Mcdonnell-douglas', 'MCDONNELL DOUGLAS CORPORATION', 'MCDONNELL-DOUGLAS']), 'Make'] = 'Mcdonnell Douglas'

df[df['Make'].isin(['Mcdonnell Douglas'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           450
Airplane      123
Helicopter     18
Name: count, dtype: int64

In [1160]:
# make the 18 Helicopters the Mcdonnell Douglas Helicopters Make
df.loc[(df['Make'] == 'Mcdonnell Douglas') & (df['Aircraft_Category'] == 'Helicopter'), 'Make'] = 'Mcdonnell Douglas Helicopters'

df[df['Make'].isin(['Mcdonnell Douglas'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         450
Airplane    123
Name: count, dtype: int64

In [1161]:
df.loc[(df['Make'] == 'Mcdonnell Douglas'), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('mcdonn')].value_counts('Make')

Make
Mcdonnell Douglas                573
Mcdonnell Douglas Helicopters     65
Name: count, dtype: int64

In [1162]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Maule                     355
Champion                  347
Aero Commander            317
De Havilland              316
Schweizer                 299
                         ... 
Lighthizer, Richard E.      1
Steven Ulrich               1
Tate                        1
Arnold Forest               1
ROYSE RALPH L               1
Name: count, Length: 4024, dtype: int64

In [1163]:
# The Maule make
df[df['Make'].str.lower().str.startswith('maul')].value_counts('Make')

Make
Maule                  443
MAULE                  144
MAULE AIRCRAFT CORP      1
Maule Air Inc.           1
Name: count, dtype: int64

In [1164]:
df.loc[df['Make'].isin(['MAULE', 'MAULE AIRCRAFT CORP', 'Maule Air Inc.']), 'Make'] = 'Maule'

df[df['Make'].isin(['Maule'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         356
Airplane    233
Name: count, dtype: int64

In [1165]:
df.loc[(df['Make'] == 'Maule'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Champion                  347
Aero Commander            317
De Havilland              316
Schweizer                 299
Rockwell                  293
                         ... 
Angel Elbert S Jr           1
Lighthizer, Richard E.      1
Steven Ulrich               1
Tate                        1
ROYSE RALPH L               1
Name: count, Length: 4022, dtype: int64

In [1166]:
# The Champion make
df[df['Make'].str.lower().str.startswith('champ')].value_counts('Make')

Make
Champion    426
CHAMPION     91
Name: count, dtype: int64

In [1167]:
df.loc[df['Make'].isin(['CHAMPION']), 'Make'] = 'Champion'

df[df['Make'].isin(['Champion'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         347
Airplane    170
Name: count, dtype: int64

In [1168]:
df.loc[(df['Make'] == 'Champion'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Aero Commander            317
De Havilland              316
Schweizer                 299
Rockwell                  293
Stinson                   287
                         ... 
Angel Elbert S Jr           1
Lighthizer, Richard E.      1
Steven Ulrich               1
Tate                        1
ROYSE RALPH L               1
Name: count, Length: 4021, dtype: int64

In [1169]:
# The Aero Commander make
df[df['Make'].str.lower().str.startswith('aero c')].value_counts('Make')

Make
Aero Commander    356
AERO COMMANDER     69
Aero Comp Inc       1
Name: count, dtype: int64

In [1170]:
df.loc[df['Make'].isin(['AERO COMMANDER']), 'Make'] = 'Aero Commander'

df[df['Make'].isin(['Aero Commander'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         318
Airplane    107
Name: count, dtype: int64

In [1171]:
df.loc[(df['Make'] == 'Aero Commander'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
De Havilland              316
Schweizer                 299
Rockwell                  293
Stinson                   287
Hiller                    282
                         ... 
Angel Elbert S Jr           1
Lighthizer, Richard E.      1
Steven Ulrich               1
Tate                        1
ROYSE RALPH L               1
Name: count, Length: 4019, dtype: int64

In [1172]:
# The De Havilland make
de_havilland_variations = df[df['Make'].str.lower().str.contains(r'de\s?havil?land', regex=True)]
de_havilland_variations.value_counts('Make')

Make
De Havilland          370
DEHAVILLAND            91
DE HAVILLAND           31
de Havilland            9
Dehavilland             8
DeHavilland             2
DEHAVILLAND CANADA      1
Name: count, dtype: int64

In [1173]:
# combine all these variations of De Havilland into one make
df.loc[df['Make'].str.lower().str.contains(r'de\s?havil?land', regex=True), 'Make'] = 'De Havilland'

df[df['Make'].isin(['De Havilland'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         328
Airplane    184
Name: count, dtype: int64

In [1174]:
df.loc[(df['Make'] == 'De Havilland'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Make
Schweizer            299
Rockwell             293
Stinson              287
Aerospatiale         282
Hiller               282
                    ... 
Cooprider              1
Angel Elbert S Jr      1
Arnold Forest          1
Steven Ulrich          1
ROYSE RALPH L          1
Name: count, Length: 4016, dtype: int64

In [1175]:
# Let's look at the Models overall for NaN values in Category
df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
UH-12E         134
8A             133
S-2R           106
415-C           89
BC12-D          86
              ... 
160              1
CAYUSE           1
EAA SPECIAL      1
TINY TWO         1
EMB145           1
Name: count, Length: 5239, dtype: int64

Google tells me that a UH-12E is a helicopter, while 8A, S-2R, 415-C, BC12-D are airplanes. And running the function like "df[df['Model'].isin(['BC12-D'])].value_counts('Aircraft_Category', dropna=False)" verifies this. So let's correct those category values

In [1176]:
#Edit one model's category value
df.loc[df['Model'] == 'UH-12E', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['8A', 'S-2R', '415-C', 'BC12-D']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
108-3              76
S2R                74
RV-4               72
108-2              72
KR-2               71
                   ..
VARIEZE,LONG EZ     1
ACRO-SPECIAL        1
AVID FLYER "C"      1
BD-5 B              1
EMB145              1
Name: count, Length: 5234, dtype: int64

In [1177]:
# Running this function tells me the top 5 are airplanes
df[df['Model'].isin(['108-3', 'S2R', 'RV-4', '108-2', 'KR-2'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         365
Airplane    200
Name: count, dtype: int64

In [1178]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['108-3', 'S2R', 'RV-4', '108-2', 'KR-2']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
108-1              69
LA-4-200           64
GC-1B              64
8E                 55
L-13               54
                   ..
VARIEZE,LONG EZ     1
ACRO-SPECIAL        1
AVID FLYER "C"      1
BD-5 B              1
EMB145              1
Name: count, Length: 5229, dtype: int64

In [1179]:
# the top 4 are all airplanes
df[df['Model'].isin(['108-1', 'LA-4-200', 'GC-1B', '8E'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         252
Airplane     84
Name: count, dtype: int64

In [1180]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['108-1', 'LA-4-200', 'GC-1B', '8E']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
Unknown              54
L-13                 54
VARIEZE              45
A-1                  44
F-28C                43
                     ..
CAYUSE                1
415-C AIRCOUPE        1
TINY TWO              1
FOX III SPEEDSTER     1
EMB145                1
Name: count, Length: 5225, dtype: int64

In [1181]:
df[df['Model'].isin(['F-28C'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           43
Helicopter     8
Name: count, dtype: int64

glider - L-13; airplane - VARIEZE, A-1; helicopter - F-28C;

In [1182]:
#Edit one model's category value
df.loc[df['Model'] == 'L-13', 'Aircraft_Category'] = 'Glider'
df.loc[df['Model'] == 'F-28C', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['VARIEZE', 'A-1']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
Unknown            54
FH-1100            43
108                42
AVID FLYER         40
RV-6               39
                   ..
VARIEZE,LONG EZ     1
ACRO-SPECIAL        1
AVID FLYER "C"      1
BD-5 B              1
EMB145              1
Name: count, Length: 5221, dtype: int64

In [1183]:
df[df['Model'].isin(['RV-6'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           39
Airplane      34
Helicopter     1
Name: count, dtype: int64

helicopter - FH-1100; airplane - 108, AVID FLYER, RV-6;

In [1184]:
#Edit one model's category value
df.loc[df['Model'] == 'FH-1100', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['108', 'AVID FLYER', 'RV-6']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Model
Unknown            54
BC-12D             39
280C               38
35A                38
S2R-T34            36
                   ..
VARIEZE,LONG EZ     1
ACRO-SPECIAL        1
AVID FLYER "C"      1
BD-5 B              1
EMB145              1
Name: count, Length: 5217, dtype: int64

In [1185]:
df[df['Model'].isin(['S2R-T34'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         36
Airplane    23
Name: count, dtype: int64

airplane - BC-12D, 35A, S2R-T34; helicopter - 280C;

In [1186]:
#Edit one model's category value
df.loc[df['Model'] == '280C', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['BC-12D', '35A', 'S2R-T34']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts().head(50)

Model
Unknown          54
FIREFLY 7        36
NAVION           36
RV-6A            36
F-28A            36
M-18A            35
UPF-7            35
CHALLENGER II    35
CL-600-2B19      32
DW-1             31
AA-1             31
SA226TC          30
2T-1A-2          30
415C             30
DC-3             28
H-295            28
VARI-EZE         27
AA-1A            27
8F               27
SA227-AC         26
SKYBOLT          26
QUICKIE          26
MU-2B-60         25
LONG-EZ          25
LA-4             24
SONERAI II       24
S-2B             23
KITFOX           23
THORP T-18       23
SR22             23
S-76A            22
RC-3             22
A                22
201B             22
BLANIK L-13      21
F-28F            21
MUSTANG II       21
B-2B             21
AA-5B            21
DC-3C            21
S-60A            20
Q2               20
114              20
S-1B2            20
SNJ-5            20
2150A            20
P-51D            20
AT-6D            20
T-6G             20
UH-12C        

Instead of just a few at a time, we can display the top 50 models with no category value and go from there.

In [1187]:
df[df['Model'].isin(['UH-12C'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           20
Helicopter     1
Name: count, dtype: int64

In [1188]:
#Edit one model's category value
df.loc[df['Model'] == 'BLANIK L-13', 'Aircraft_Category'] = 'Glider'

#Edit multiple models' category value
df.loc[df['Model'].isin(['114', '201B', '2150A', '2T-1A-2', '415C', '8F', 'A', 'AA-1', 'AA-1A', 'AA-5B', 'AT-6D', 'CHALLENGER II', 'CL-600-2B19', 'DC-3', 'DC-3C', 'DW-1', 'H-295', 'KITFOX', 'LA-4',
                        'LONG-EZ', 'M-18A', 'MU-2B-60', 'MUSTANG II', 'NAVION', 'P-51D', 'Q2', 'QUICKIE', 'RC-3', 'RV-6A', 'S-1B2', 'S-2B', 'SA226TC', 'SA227-AC', 'SKYBOLT', 'SNJ-5', 'SONERAI II',
                        'SR22', 'T-6G', 'THORP T-18', 'UPF-7', 'VARI-EZE']), 'Aircraft_Category'] = 'Airplane'

df.loc[df['Model'].isin(['B-2B', 'F-28A', 'F-28F', 'S-76A', 'UH-12C']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['FIREFLY 7', 'S-60A']), 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Model'].value_counts().head(50)

Model
Unknown              54
II                   20
MONI                 19
DRAGONFLY            19
LONG EZ              19
STEEN SKYBOLT        19
UH-12B               19
PZL-M-18             19
ST3KR                18
ASW-20               18
112A                 18
QUICKIE Q2           18
UH-12D               18
TIERRA II            18
AS-350D              17
S-1S                 17
S-1                  17
AS350D               17
CHRISTEN EAGLE II    17
B-8M                 17
BD-4                 17
SR-22                16
Q-2                  16
AEROSTAR 600         16
NAVION A             16
G103                 16
T-18                 16
RANS S-12            16
DC-9-32              16
SGS-2-33A            16
S2R-600              16
PITTS S-2B           16
TB-20                16
SA315B               16
340B                 16
SGS 1-26E            15
415-D                15
AS350BA              15
BD-5B                15
PITTS S-1S           15
112TC                15
620B      

In [1189]:
df[df['Model'].isin(['Q-2'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         16
Airplane     1
Name: count, dtype: int64

In [1190]:
df[df['Model'].isin(['Q-2'])].value_counts('Amateur_Built', dropna=False)

Amateur_Built
Yes    17
Name: count, dtype: int64

In [1191]:
#Edit one model's category value
df.loc[df['Model'] == 'B-8M', 'Aircraft_Category'] = 'Gyrocraft'
df.loc[df['Model'] == 'TIERRA II', 'Aircraft_Category'] = 'Ultralight'

#Edit multiple models' category value
df.loc[df['Model'].isin(['DRAGONFLY', 'LONG EZ', 'STEEN SKYBOLT', 'PZL-M-18', 'ST3KR', '112A', 'QUICKIE Q2', 'S-1S', 'S-1', 'CHRISTEN EAGLE II', 'BD-4', 'SR-22']), 'Aircraft_Category'] = 'Airplane'

df.loc[df['Model'].isin(['UH-12B', 'UH-12D', 'AS-350D', 'AS350D']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['MONI', 'ASW-20']), 'Aircraft_Category'] = 'Glider'

df[df['Aircraft_Category'].isna()]['Model'].value_counts().head(50)

Model
Unknown           54
II                20
AEROSTAR 600      16
Q-2               16
RANS S-12         16
T-18              16
S2R-600           16
SA315B            16
DC-9-32           16
NAVION A          16
SGS-2-33A         16
TB-20             16
G103              16
340B              16
PITTS S-2B        16
AS350B            15
620B              15
PITTS S-1S        15
SGS 1-26E         15
415-D             15
SA316B            15
BD-5B             15
DC-9-31           15
AS350BA           15
SA-226T           15
112TC             15
269C-1            15
SGS 2-33          14
F28C              14
ATR-42-300        14
GLASAIR           14
B8M               14
A2                14
IS-28B2           14
QUICKSILVER MX    14
F-19              14
RV4               14
201C              14
25B               14
S-2A              13
BC12D             13
SA-26AT           13
SGS 1-34          13
L-1011-385-1      13
SR20              13
KITFOX II         13
35                13
A-1B   

In [1192]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Schweizer         298
Aerospatiale      253
Douglas           173
North American    161
Taylorcraft       144
Rockwell          141
Sikorsky          106
Burkhart Grob     100
Fairchild          97
Lockheed           94
Name: count, dtype: int64

In [1193]:
df[df['Make'].isin(['Schweizer'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           298
Helicopter    199
Glider        156
Airplane      143
Name: count, dtype: int64

In [1194]:
# Look at the Schweizer models again that have empty category values
df[df['Make'].isin(['Schweizer']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Model
SGS-2-33A      16
SGS 1-26E      15
269C-1         15
SGS 2-33       14
SGS 1-34       13
300C           11
269D           10
SGS 1-26B       8
SGS 2-32        8
2-33A           8
G164B           7
SGS 1-36        6
SGS2-33A        6
2-33            6
SGS-1-35C       5
SGS 1-26A       5
G-164           4
2-32            4
SGS 1-26C       4
SGS 1-26D       4
1-26E           4
SGS-1-26B       4
SGS 1-35        4
SGS-126E        3
SGS-2-33        3
SGS-233A        3
G164A           3
SGS-1-35        3
SGS 1-26        3
G-164-A         3
G164            2
SGU 2-22E       2
SGU-2-22E       2
SGS-1-34        2
2-33-A          2
1-36            2
1-35C           2
SGU2-22E        2
HUGHES 269C     2
269B            2
G164D           2
SGS1-36         2
SGS1-34         2
G-164D          2
G-164C          2
G-164B-600      2
SGS-1-26E       2
T-26E           1
SGS-1-26        1
SSG 2-33A       1
SGS-1-26A       1
SGS 2-8         1
SGU-22          1
SGS 2-33AK      1
SGU-2-22K       1
SGS1

In [1195]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['1-26E', '1-35C', '1-36', '2-32', '2-33', '2-33-A', '2-33A', 'SGS 1-26', 'SGS 1-26A', 'SGS 1-26B', 'SGS 1-26C', 'SGS 1-26D', 'SGS 1-26E', 'SGS 1-34', 'SGS 1-35', 'SGS 1-36',
                        'SGS 2-32', 'SGS 2-33', 'SGS 2-33AK', 'SGS 2-8', 'SGS-1-26', 'SGS-1-26A', 'SGS-1-26B', 'SGS-1-26E', 'SGS-1-30', 'SGS-1-34', 'SGS-1-35', 'SGS-1-35C', 'SGS-126D', 'SGS-126E',
                        'SGS-2-33', 'SGS-2-33A', 'SGS-233A', 'SGS1-26C', 'SGS1-26D', 'SGS1-34', 'SGS1-36', 'SGS2-33A', 'SGU 2-22CK', 'SGU 2-22E', 'SGU-2-22E', 'SGU-2-22K', 'SGU-22', 'SGU2-22E',
                        'SSG 2-33A', 'T-26E']), 'Aircraft_Category'] = 'Glider'

df.loc[df['Model'].isin(['269B', '269C-1', '269D', '300C', 'HUGHES 269C']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['G-164', 'G-164-A', 'G-164B-600', 'G-164C', 'G-164D', 'G164', 'G164A', 'G164B', 'G164D']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Schweizer']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Model
-269C             1
I-26E             1
S2-33A            1
SA 2-37A          1
SC2-33A           1
SGS 1-23          1
SGS 1-23G         1
SGS 1-23H-15      1
SGS 1-26F         1
SGS 1-34R         1
SGS 1-35C         1
SGS 126B          1
SGS 126E          1
SGS 135           1
SGS-1-36          1
SGS-2-32          1
SGS-2-32A         1
SGS-233           1
SGS1-26-D         1
SGS1-26A          1
SGS2-32           1
SGS2-33           1
SGS233A           1
S-2-33A           1
I-26D             1
1-23              1
H-300             1
1-24              1
1-26              1
1-26B             1
1-26D             1
126-D             1
134               1
2-22EK            1
233A              1
269               1
269-C             1
269-C1            1
333               1
AG CAT            1
FGS-233           1
G-164-B           1
G-164A-450        1
G164-B            1
G164A "450"       1
G167B             1
GRUMMAN G-164A    1
GRUMMAN G-164B    1
TG3A              1
Name: count, d

In [1196]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['134', '1-23', '1-24', '1-26', '1-26B', '1-26D', '126-D', '2-22EK', '233A', 'FGS-233', 'I-26D', 'I-26E', 'S-2-33A', 'S2-33A', 'SC2-33A', 'SGS 1-23', 'SGS 1-23G', 'SGS 1-23H-15',
                        'SGS 1-26F', 'SGS 1-34R', 'SGS 1-35C', 'SGS 126B', 'SGS 126E', 'SGS 135', 'SGS-1-36', 'SGS-2-32', 'SGS-2-32A', 'SGS-233', 'SGS1-26-D', 'SGS1-26A', 'SGS2-32', 'SGS2-33',
                        'SGS233A', 'TG3A']), 'Aircraft_Category'] = 'Glider'

df.loc[df['Model'].isin(['269', '333', '-269C', '269-C', '269-C1', 'H-300']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['SA 2-37A', 'AG CAT', 'G-164-B', 'G-164A-450', 'G164-B', 'G164A \"450\"', 'G167B', 'GRUMMAN G-164A', 'GRUMMAN G-164B']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Schweizer'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Glider        372
Helicopter    245
Airplane      179
Name: count, dtype: int64

In [1197]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Aerospatiale      253
Douglas           173
North American    161
Taylorcraft       144
Rockwell          141
Sikorsky          106
Burkhart Grob     100
Fairchild          97
Lockheed           94
Ayres              87
Name: count, dtype: int64

In [1198]:
df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Model
SA315B             16
AS350B             14
SA316B             14
ATR-42-300         12
AS-350B            10
AS350BA             9
ATR-72-212          8
SA-315B             8
TB-20               6
AS-355F-1           6
AS-355-F1           5
SA341G              5
SA-341G             5
AS-350BA            4
316B                3
AS-350-B            3
AS-355F             3
AS-350              3
AS-355-F            3
ATR-42-320          3
SA 315B             3
SA-360C             3
SA319B              3
TB-21               3
ATR-42              2
ATR 42-300          2
AS35OD              2
AS355F              2
SA-315              2
SA-319B             2
SA315B LAMA         2
AS-355E             2
ATR-72              2
AS355F1             2
AS-350-B2           2
AS 355F1            2
AS-355              2
350D                2
AS-350-BA           2
AS 355F             2
ALOUETTE 3          1
SA315-D LAMA        1
SA-365-N2           1
AS 315B             1
AS 350 ASTAR        1
AS-3

In [1199]:
df[df['Make'].isin(['Aerospatiale'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           253
Helicopter     79
Airplane        2
Name: count, dtype: int64

I see only 2 airplanes listed for Aerospatiale. So which models are those?

In [1200]:
df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isin(['Airplane'])].value_counts('Model', dropna=False)

Model
ATR 42-320    1
ATR-42-300    1
Name: count, dtype: int64

So this tells me that models beginning with 'ATR' would be airplanes

In [1201]:
df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isin(['Helicopter'])].value_counts('Model', dropna=False)

Model
AS-350D       17
AS350D        15
AS350          6
SA316B         4
SA315B         4
AS350BA        3
AS-355F        2
AS350-B2       2
SA-316B        2
SA-315B        2
SA 315B        2
AS-350-BA      1
SA 315 B       1
350B1          1
SA-360C        1
SA-341G        1
SA-319B        1
SA-318C        1
SA 316B        1
AS355F1        1
S350D          1
AS 365 N-2     1
AS355          1
350D           1
341G           1
AS350B2        1
AS350B         1
AS350-D        1
AS 355F2       1
AS-355-F2      1
315B           1
Name: count, dtype: int64

And helicopter models begin with 'AS-' and 'SA-'

In [1202]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['355', '316B', '350 B1', '350D', 'A-300B4', 'ALOUETTE 3', 'AS 315B', 'AS 350 ASTAR', 'AS 350B', 'AS 350B-2', 'AS 350D', 'AS 355 F', 'AS 355F', 'AS 355F1', 'AS-350', 'AS-350-B',
                        'AS-350-B2', 'AS-350-BA', 'AS-350B', 'AS-350BA', 'AS-355', 'AS-355-F', 'AS-355-F1', 'AS-355-F2', 'AS-355E', 'AS-355F', 'AS-355F-1', 'AS350B', 'AS350BA', 'AS355F', 'AS355F1',
                        'AS35OD', 'SA 315B', 'SA 360C', 'SA-315', 'SA-315-B', 'SA-315B', 'SA-316 ALOUETTE', 'SA-316B', 'SA-319B', 'SA-330J', 'SA-341G', 'SA-360C', 'SA-365-N2', 'SA315-D LAMA', 'SA315B',
                        'SA315B LAMA', 'SA316B', 'SA318C', 'SA319B', 'SA341G']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['ATR 42-300', 'ATR-42', 'ATR-42-300', 'ATR-42-320', 'ATR-72', 'ATR-72-212', 'TB-20', 'TB-21', 'TB20']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Model
316B ALOUETTE III       1
AS365N                  1
ATR 72-212              1
ATR-42-500              1
ATR-72-12               1
ATR42-300               1
ATR72-212               1
CONCORDE VERSION 101    1
Concorde                1
ND-26                   1
SA 315                  1
SA 316B                 1
SA319B Alouette III     1
SA330J                  1
SA360C DAUPHIN          1
SA365-N1                1
SA365N                  1
SE 3180                 1
SE 318C                 1
SE316B                  1
SF3130                  1
SN-601                  1
TB-10                   1
ATR 42-320              1
AS355F2                 1
350-B                   1
AS355F-1                1
350B                    1
AS 355 F ECUREUIL       1
AS 355F-1               1
AS-332L                 1
AS-341G                 1
AS-350-B3               1
AS-350B1                1
AS-350B2                1
AS-350BII               1
AS-355F1                1
AS-365-N2               1
AS315B

In [1203]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['316B ALOUETTE III', '350-B', '350B', 'AS 355 F ECUREUIL', 'AS 355F-1', 'AS-332L', 'AS-341G', 'AS-350-B3', 'AS-350B1', 'AS-350B2', 'AS-350BII', 'AS-355F1', 'AS-365-N2', 'AS315B',
                        'AS332', 'AS350', 'AS350 BA', 'AS350-B', 'AS350-B3', 'AS350-BH', 'AS350-D', 'AS350B3', 'AS350D ASTAR', 'AS355F-1', 'AS355F2', 'AS365N', 'SA 315', 'SA 316B', 'SA319B Alouette III',
                        'SA330J', 'SA360C DAUPHIN', 'SA365-N1', 'SA365N', 'SE 3180', 'SE 318C', 'SE316B', 'SF3130']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['ATR 42-320', 'ATR 72-212', 'ATR-42-500', 'ATR-72-12', 'ATR42-300', 'ATR72-212', 'SN-601', 'TB-10', 'TB21', 'CONCORDE VERSION 101', 'Concorde', 'ND-26']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Series([], Name: count, dtype: int64)

In [1204]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Douglas           173
North American    161
Taylorcraft       144
Rockwell          141
Sikorsky          106
Burkhart Grob     100
Fairchild          97
Lockheed           94
Balloon Works      87
Ayres              87
Name: count, dtype: int64

In [1205]:
df[df['Make'].str.lower().str.startswith('dougl')].value_counts('Make')

Make
Douglas                250
DOUGLAS                 26
DOUGLAS BRIAN G          1
DOUGLAS K THOMPSON       1
Douglas A. Pohl          1
Douglas C. Campbell      1
Douglas D. Turner        1
Douglas Maselink         1
Douglas Swanningson      1
Douglas/basler           1
Name: count, dtype: int64

In [1206]:
df.loc[df['Make'].isin(['Douglas', 'DOUGLAS']), 'Make'] = 'Douglas'

In [1207]:
df[df['Make'] == 'Douglas'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         174
Airplane    102
Name: count, dtype: int64

In [1208]:
df.loc[df['Make'] == 'Douglas', 'Aircraft_Category'] = 'Airplane'

In [1209]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
North American    161
Taylorcraft       144
Rockwell          141
Sikorsky          106
Burkhart Grob     100
Fairchild          97
Lockheed           94
Balloon Works      87
Ayres              87
Swearingen         86
Name: count, dtype: int64

In [1210]:
df[df['Make'].str.lower().str.startswith('north a')].value_counts('Make')

Make
North American                    294
NORTH AMERICAN                     79
North American Rockwell Corp.       5
NORTH AMERICAN/AERO CLASSICS        3
North American Aviation Div.        2
NORTH AMERICAN-MEDORE               1
NORTH AMERICAN/SCHWAMM              1
NORTH AMERICAN/VICTORIA MNT LT      1
North American Rockwell             1
North American-aero Classics        1
North American-barene               1
North American-kenney               1
North American-maslon               1
North American/aero Classics        1
Name: count, dtype: int64

In [1211]:
df.loc[df['Make'].isin(['NORTH AMERICAN']), 'Make'] = 'North American'

df[df['Make'] == 'North American'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    212
NaN         161
Name: count, dtype: int64

In [1212]:
df.loc[df['Make'] == 'North American', 'Aircraft_Category'] = 'Airplane'

In [1213]:
df[df['Make'].str.lower().str.startswith('taylorc')].value_counts('Make')

Make
Taylorcraft                   316
TAYLORCRAFT                    62
TAYLORCRAFT AVIATION CORP       5
TAYLORCRAFT AVIATION CORP.      3
TAYLORCRAFT CORP                1
Taylorcraft Aviation            1
Taylorcraft Corporation         1
Name: count, dtype: int64

In [1214]:
df.loc[df['Make'].isin(['TAYLORCRAFT', 'TAYLORCRAFT AVIATION CORP', 'TAYLORCRAFT AVIATION CORP.', 'TAYLORCRAFT CORP', 'Taylorcraft Aviation', 'Taylorcraft Corporation']), 'Make'] = 'Taylorcraft'

df[df['Make'] == 'Taylorcraft'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    244
NaN         145
Name: count, dtype: int64

In [1215]:
df.loc[df['Make'] == 'Taylorcraft', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Rockwell         141
Sikorsky         106
Burkhart Grob    100
Fairchild         97
Lockheed          94
Ayres             87
Balloon Works     87
Swearingen        86
Mitsubishi        84
Hiller            82
Name: count, dtype: int64

In [1216]:
df[df['Make'].str.lower().str.startswith('rockw')].value_counts('Make')

Make
Rockwell                  328
ROCKWELL INTERNATIONAL     53
ROCKWELL                   24
Rockwell International     22
Rockwell Commander          3
Rockwell Intl               2
Rockwell Intl.              2
ROCKWELL COMMANDER          1
Rockwell Comdr              1
Rockwell Int't              1
Name: count, dtype: int64

In [1217]:
df.loc[df['Make'].isin(['ROCKWELL', 'ROCKWELL INTERNATIONAL', 'Rockwell International', 'Rockwell Intl', 'Rockwell Intl.', 'Rockwell Int\'t']), 'Make'] = 'Rockwell'

df[df['Make'] == 'Rockwell'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    284
NaN         148
Name: count, dtype: int64

In [1218]:
df.loc[df['Make'] == 'Rockwell', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Sikorsky             106
Burkhart Grob        100
Fairchild             97
Lockheed              94
Balloon Works         87
Ayres                 87
Swearingen            86
Mitsubishi            84
Hiller                82
British Aerospace     79
Name: count, dtype: int64

In [1219]:
df[df['Make'].str.lower().str.startswith('sikor')].value_counts('Make')

Make
Sikorsky                         153
SIKORSKY                          76
SIKORSKY AIRCRAFT CORP             1
SIKORSKY AIRCRAFT CORPORATION      1
Sikorsky/orlando                   1
Name: count, dtype: int64

In [1220]:
df.loc[df['Make'].isin(['SIKORSKY', 'SIKORSKY AIRCRAFT CORP', 'SIKORSKY AIRCRAFT CORPORATION']), 'Make'] = 'Sikorsky'

df[df['Make'] == 'Sikorsky'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    118
NaN           113
Name: count, dtype: int64

In [1221]:
df.loc[df['Make'] == 'Sikorsky', 'Aircraft_Category'] = 'Helicopter'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Burkhart Grob        100
Fairchild             97
Lockheed              94
Balloon Works         87
Ayres                 87
Swearingen            86
Mitsubishi            84
Hiller                82
British Aerospace     79
Embraer               75
Name: count, dtype: int64

In [1222]:
df[df['Make'] == 'Burkhart Grob'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN       100
Glider      9
Name: count, dtype: int64

So many NaN and only 9 gliders, so I'm going to check out the models just to make sure that I should fill in Burkhart Grob category as Glider

In [1223]:
df[df['Make'] == 'Burkhart Grob'].value_counts('Model', dropna=False)

Model
G103                    15
G-103A                   7
103                      6
G102                     6
G-109B                   5
G-103                    5
G103 TWIN ASTIR          4
G 103 Twin II            4
G109B                    3
G-102                    3
103A                     3
G103 TWIN II             2
G103 Twin Astir          2
G-103-TWIN II            2
G102 ASTIR CS            2
G103A                    2
G-103A Twin II Acro      2
G102 Club Astir IIIB     2
102                      2
G 103 TWIN II            2
G-103 TWIN II            1
G102-111B                1
SPEED ASTIR II           1
G10Z ASTIR CS            1
103C                     1
G103C TWIN III ACRO      1
G103B                    1
109                      1
G103-TWINA               1
G103 Twin II             1
109A                     1
109B                     1
6103 TWIN ASTIR          1
G103 FLUGZEUGBAU         1
A103 TWIN II             1
G102 Std Astir III       1
G-103-II AERO         

Google tells me all these models are gliders.

In [1224]:
df[df['Make'].str.lower().str.startswith('burkha')].value_counts('Make')

Make
Burkhart Grob                109
BURKHART GROB                 11
Burkhart Grob Flugzeugbau      9
BURKHART GROB FLUGZEUGBAU      6
Burkhart Grob Flugzeugbah      1
Burkhart-grob                  1
Name: count, dtype: int64

In [1225]:
df.loc[df['Make'].isin(['BURKHART GROB', 'Burkhart Grob Flugzeugbau', 'BURKHART GROB FLUGZEUGBAU', 'Burkhart Grob Flugzeugbah', 'Burkhart-grob']), 'Make'] = 'Burkhart Grob'

df.loc[df['Make'] == 'Burkhart Grob', 'Aircraft_Category'] = 'Glider'

In [1226]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Fairchild            97
Lockheed             94
Balloon Works        87
Ayres                87
Swearingen           86
Mitsubishi           84
Hiller               82
British Aerospace    79
Embraer              75
Enstrom              74
Name: count, dtype: int64

In [1227]:
df[df['Make'].str.lower().str.startswith('fairchi')].value_counts('Make')

Make
Fairchild                131
Fairchild Hiller          35
FAIRCHILD                 27
Fairchild Swearingen      11
FAIRCHILD HILLER           4
Fairchild Dornier          3
FAIRCHILD HELI-PORTER      2
FAIRCHILD(HOWARD)          2
FAIRCHILD FUNK             1
Fairchild Heli-porter      1
Fairchild Industries       1
Fairchild Merlin           1
Fairchild-heliporter       1
Fairchild/swearingen       1
Name: count, dtype: int64

In [1228]:
df.loc[df['Make'].isin(['Fairchild Hiller', 'FAIRCHILD', 'Fairchild Swearingen', 'FAIRCHILD HILLER', 'Fairchild Dornier', 'FAIRCHILD HELI-PORTER', 'FAIRCHILD(HOWARD)', 'FAIRCHILD FUNK',
                       'Fairchild Heli-porter', 'Fairchild Industries', 'Fairchild Merlin', 'Fairchild-heliporter', 'Fairchild/swearingen']), 'Make'] = 'Fairchild'

df[df['Make'] == 'Fairchild'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           113
Airplane       73
Helicopter     35
Name: count, dtype: int64

In [1229]:
df[df['Make'].isin(['Fairchild']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

Model
SA-227AC             8
SA-227-AC            7
M-62A-3              6
SA-226-TC            4
24G                  4
SA227AC              4
SA-227               4
M-62C                4
SA226T               3
F-27                 3
DO-328-300           3
24W-46               3
SA227                2
C-82A                2
M-62A                2
SA227-AT             2
SA 227               2
SA 227-AC            2
PT-19                2
24R-46A              2
24R-40               2
SA227BC              2
M-62                 2
SA 227-TT Merlin     1
SA-2226-TC           1
SA-226-T             1
24 C8C               1
SA-226T              1
SA-226TC             1
SA-227-TT            1
PT-26B               1
SA-266TC             1
SA226-T              1
SA227-DC             1
SA227-TT             1
Pilatus PC6/B2-H2    1
M62A (PT-19)         1
PT-23                1
PT-19A               1
24-J                 1
24R-46               1
24W-40               1
42                   1
C-119

The only models in this list that are helicopters are the FH1100 and FH-100. All the rest fall into the Airplane category.

In [1230]:
df.loc[df['Model'].isin(['FH1100', 'FH-100']), 'Aircraft_Category'] = 'Helicopter'

In [1231]:
# Make the rest of the NaN category values Airplane for Fairchild
df.loc[(df['Make'].isin(['Fairchild'])) & (df['Aircraft_Category'].isna()), 'Aircraft_Category'] = 'Airplane'

df[df['Make'] == 'Fairchild'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane      184
Helicopter     37
Name: count, dtype: int64

In [1232]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Lockheed             94
Ayres                87
Balloon Works        87
Swearingen           86
Mitsubishi           84
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Name: count, dtype: int64

In [1233]:
df[df['Make'].str.lower().str.startswith('lockh')].value_counts('Make')

Make
Lockheed    111
LOCKHEED     11
Name: count, dtype: int64

In [1234]:
df.loc[df['Make'].isin(['LOCKHEED']), 'Make'] = 'Lockheed'

df[df['Make'] == 'Lockheed'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN             94
Airplane        27
Powered-Lift     1
Name: count, dtype: int64

In [1235]:
df.loc[df['Make'] == 'Lockheed', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Balloon Works        87
Ayres                87
Swearingen           86
Mitsubishi           84
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Name: count, dtype: int64

In [1236]:
df[df['Make'].str.lower().str.startswith('ayre')].value_counts('Make')

Make
Ayres                213
AYRES CORPORATION     38
AYRES                 23
Ayres Corporation      7
AYRES THRUSH           2
AYRES CORP             1
Name: count, dtype: int64

In [1237]:
df.loc[df['Make'].isin(['AYRES CORPORATION', 'AYRES', 'Ayres Corporation', 'AYRES THRUSH', 'AYRES CORP']), 'Make'] = 'Ayres'

df[df['Make'] == 'Ayres'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    194
NaN          90
Name: count, dtype: int64

In [1238]:
df.loc[df['Make'] == 'Ayres', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Balloon Works        87
Swearingen           86
Mitsubishi           84
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Unknown              65
Name: count, dtype: int64

In [1239]:
df[df['Make'].str.lower().str.startswith('balloo')].value_counts('Make')

Make
Balloon Works              135
BALLOON WORKS                8
Balloon Works Inc            1
Balloonbau Woerner Gmbh      1
Name: count, dtype: int64

In [1240]:
df.loc[df['Make'].isin(['BALLOON WORKS', 'Balloon Works Inc']), 'Make'] = 'Balloon Works'

df[df['Make'] == 'Balloon Works'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN        87
Balloon    57
Name: count, dtype: int64

In [1241]:
df.loc[df['Make'] == 'Balloon Works', 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Swearingen           86
Mitsubishi           84
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Unknown              65
Raven                64
Name: count, dtype: int64

In [1242]:
df[df['Make'].str.lower().str.startswith('swearin')].value_counts('Make')

Make
Swearingen                  141
SWEARINGEN                   29
Swearingen T R/masters W      1
Name: count, dtype: int64

In [1243]:
df.loc[df['Make'].isin(['SWEARINGEN', 'Swearingen T R/masters W']), 'Make'] = 'Swearingen'

df[df['Make'] == 'Swearingen'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         88
Airplane    83
Name: count, dtype: int64

In [1244]:
df.loc[df['Make'] == 'Swearingen', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Mitsubishi           84
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Unknown              65
Learjet              64
Raven                64
Name: count, dtype: int64

In [1245]:
df[df['Make'].str.lower().str.startswith('mitsub')].value_counts('Make')

Make
Mitsubishi    126
MITSUBISHI     16
Name: count, dtype: int64

In [1246]:
df.loc[df['Make'].isin(['MITSUBISHI']), 'Make'] = 'Mitsubishi'

df[df['Make'] == 'Mitsubishi'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         84
Airplane    57
Unknown      1
Name: count, dtype: int64

In [1247]:
df.loc[df['Make'] == 'Mitsubishi', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Hiller               81
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Unknown              65
Learjet              64
Raven                64
Mbb                  62
Name: count, dtype: int64

In [1248]:
df[df['Make'].str.lower().str.startswith('hille')].value_counts('Make')

Make
Hiller                        311
HILLER                         37
Hiller-soloy                    9
HILLER-ROGERSON HELICOPTER      1
HILLER-TRI-PLEX IND.INC.        1
Hiller-osborn                   1
Hillery W. Grice                1
Name: count, dtype: int64

In [1249]:
df.loc[df['Make'].isin(['HILLER', 'Hiller-soloy', 'HILLER-ROGERSON HELICOPTER', 'HILLER-TRI-PLEX IND.INC.', 'Hiller-osborn']), 'Make'] = 'Hiller'

df[df['Make'] == 'Hiller'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter           274
NaN                   85
Powered Parachute      1
Name: count, dtype: int64

In [1250]:
df.loc[df['Make'] == 'Hiller', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
British Aerospace    79
Embraer              75
Enstrom              74
Pitts                73
Aerostar             72
Unknown              65
Learjet              64
Raven                64
Mbb                  62
Waco                 60
Name: count, dtype: int64

In [1251]:
df[df['Make'].str.lower().str.startswith('british')].value_counts('Make')

Make
British Aerospace                84
British Aircraft Corp. (bac)      6
BRITISH AEROSPACE                 5
BRITISH AIRCRAFT CORP             1
BRITISH AIRCRAFT CORP.            1
British Aerospace Civil Aircr     1
British Aircraft Corp. (BAC)      1
Name: count, dtype: int64

In [1252]:
df.loc[df['Make'].isin(['British Aircraft Corp. (bac)', 'BRITISH AEROSPACE', 'BRITISH AIRCRAFT CORP', 'BRITISH AIRCRAFT CORP.', 'British Aerospace Civil Aircr', 'British Aircraft Corp. (BAC)']),
'Make'] = 'British Aerospace'

df[df['Make'] == 'British Aerospace'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         84
Airplane    15
Name: count, dtype: int64

In [1253]:
df.loc[df['Make'] == 'British Aerospace', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Embraer          75
Enstrom          74
Pitts            73
Aerostar         72
Unknown          65
Raven            64
Learjet          64
Mbb              62
Waco             60
Schempp-hirth    57
Name: count, dtype: int64

In [1254]:
df[df['Make'].str.lower().str.startswith('embra')].value_counts('Make')

Make
EMBRAER                           130
Embraer                           106
EMBRAER S A                         9
EMBRAER-EMPRESA BRASILEIRA DE       6
EMBRAER S.A.                        2
EMBRAER EXECUTIVE AIRCRAFT INC      1
EMBRAER SA                          1
Embraer Aircraft                    1
Name: count, dtype: int64

In [1255]:
df.loc[df['Make'].isin(['EMBRAER', 'EMBRAER S A', 'EMBRAER-EMPRESA BRASILEIRA DE', 'EMBRAER S.A.', 'EMBRAER EXECUTIVE AIRCRAFT INC', 'EMBRAER SA', 'Embraer Aircraft']),'Make'] = 'Embraer'

df[df['Make'] == 'Embraer'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane      174
NaN            81
Helicopter      1
Name: count, dtype: int64

In [1256]:
df.loc[df['Make'] == 'Embraer', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Enstrom          74
Pitts            73
Aerostar         72
Unknown          65
Raven            64
Learjet          64
Mbb              62
Waco             60
Schempp-hirth    57
Helio            56
Name: count, dtype: int64

In [1257]:
df[df['Make'].str.lower().str.startswith('enstr')].value_counts('Make')

Make
Enstrom                    247
ENSTROM                     49
ENSTROM HELICOPTER CORP      7
Name: count, dtype: int64

In [1258]:
df.loc[df['Make'].isin(['ENSTROM', 'ENSTROM HELICOPTER CORP']),'Make'] = 'Enstrom'

df[df['Make'] == 'Enstrom'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    229
NaN            74
Name: count, dtype: int64

In [1259]:
df.loc[df['Make'] == 'Enstrom', 'Aircraft_Category'] = 'Helicopter'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Pitts            73
Aerostar         72
Unknown          65
Learjet          64
Raven            64
Mbb              62
Waco             60
Schempp-hirth    57
Helio            56
Schleicher       52
Name: count, dtype: int64

In [1260]:
df[df['Make'].str.lower().str.startswith('pitts')].value_counts('Make')

Make
Pitts               128
Pitts Special        15
PITTS                13
PITTS AEROBATICS      3
PITTS SPECIAL         1
Pitts Spl.            1
Name: count, dtype: int64

In [1261]:
df.loc[df['Make'].isin(['Pitts', 'PITTS', 'PITTS AEROBATICS', 'PITTS SPECIAL', 'Pitts Spl.']),'Make'] = 'Pitts Special'

df[df['Make'] == 'Pitts Special'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         82
Airplane    79
Name: count, dtype: int64

In [1262]:
df.loc[df['Make'] == 'Pitts Special', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Aerostar                          72
Unknown                           65
Learjet                           64
Raven                             64
Mbb                               62
Waco                              60
Schempp-hirth                     57
Helio                             56
Schleicher                        52
Ercoupe (eng & Research Corp.)    51
Name: count, dtype: int64

In [1263]:
df[df['Make'].str.lower().str.startswith('aerost')].value_counts('Make')

Make
Aerostar                         85
AEROSTAR INTERNATIONAL INC       12
AEROSTAR INTERNATIONAL            8
Aerostar International            7
Aerostar International Inc.       5
AEROSTAR ACFT CORP OF TEXAS       3
AEROSTAR S A                      3
Aerostar, S.a                     3
AEROSTAR                          2
Aerostar International Inc        2
AEROSTAR AIRCRAFT CORPORATION     1
Aerostar Aircraft Corporation     1
Aerostar International, Inc.      1
Aerostar-raven                    1
Name: count, dtype: int64

In [1264]:
df.loc[df['Make'].isin(['AEROSTAR ACFT CORP OF TEXAS', 'AEROSTAR AIRCRAFT CORPORATION']),'Make'] = 'Aerostar Aircraft Corporation'
df.loc[df['Make'].isin(['AEROSTAR S A', 'Aerostar, S.a']),'Make'] = 'Aerostar, SA'
df.loc[df['Make'].isin(['Aerostar', 'AEROSTAR', 'AEROSTAR INTERNATIONAL', 'AEROSTAR INTERNATIONAL INC', 'Aerostar International Inc', 'Aerostar International Inc.', 'Aerostar International, Inc.',
                       'Aerostar-raven']),'Make'] = 'Aerostar International'

df[df['Make'] == 'Aerostar, SA'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    4
NaN         2
Name: count, dtype: int64

In [1265]:
df[df['Make'] == 'Aerostar Aircraft Corporation'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    5
Name: count, dtype: int64

In [1266]:
df[df['Make'] == 'Aerostar International'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         82
Balloon     40
Airplane     1
Name: count, dtype: int64

In [1267]:
df.loc[df['Make'] == 'Aerostar, SA', 'Aircraft_Category'] = 'Airplane'
df.loc[df['Make'] == 'Aerostar International', 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                           65
Learjet                           64
Raven                             64
Mbb                               62
Waco                              60
Schempp-hirth                     57
Helio                             56
Schleicher                        52
Ercoupe (eng & Research Corp.)    51
Weatherly                         48
Name: count, dtype: int64

In [1268]:
df[df['Make'].str.lower().str.startswith('lear')].value_counts('Make')

Make
Learjet        109
LEARJET         25
LEARJET INC     10
Learjet Inc      1
Name: count, dtype: int64

In [1269]:
df.loc[df['Make'].isin(['LEARJET', 'LEARJET INC', 'Learjet Inc']),'Make'] = 'Learjet'

df[df['Make'] == 'Learjet'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    80
NaN         65
Name: count, dtype: int64

In [1270]:
df.loc[df['Make'] == 'Learjet', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                           65
Raven                             64
Mbb                               62
Waco                              60
Schempp-hirth                     57
Helio                             56
Schleicher                        52
Ercoupe (eng & Research Corp.)    51
Weatherly                         48
Ryan                              48
Name: count, dtype: int64

In [1271]:
df[df['Make'].str.lower().str.startswith('raven')].value_counts('Make')

Make
Raven                         84
Raven Industries               3
RAVEN AIRCRAFT CORPPRATION     1
RAVEN INDUSTRIES INC           1
Name: count, dtype: int64

In [1272]:
df.loc[df['Make'].isin(['Raven', 'RAVEN INDUSTRIES INC']),'Make'] = 'Raven Industries'

df[df['Make'] == 'Raven Industries'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN        65
Balloon    23
Name: count, dtype: int64

In [1273]:
df.loc[df['Make'] == 'Raven Industries', 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                           65
Mbb                               62
Waco                              60
Schempp-hirth                     57
Helio                             56
Schleicher                        52
Ercoupe (eng & Research Corp.)    51
Ryan                              48
Weatherly                         48
Fokker                            47
Name: count, dtype: int64

In [1274]:
df[df['Make'].str.lower().str.startswith('mbb')].value_counts('Make')

Make
Mbb           70
MBB            3
Mbb-bolkow     1
Name: count, dtype: int64

In [1275]:
df.loc[df['Make'].isin(['MBB', 'Mbb-bolkow']),'Make'] = 'Mbb'

df[df['Make'] == 'Mbb'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           62
Helicopter    12
Name: count, dtype: int64

In [1276]:
df.loc[df['Make'] == 'Mbb', 'Aircraft_Category'] = 'Helicopter'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                           65
Waco                              60
Schempp-hirth                     57
Helio                             56
Schleicher                        52
Ercoupe (eng & Research Corp.)    51
Weatherly                         48
Ryan                              48
Cameron                           47
Fokker                            47
Name: count, dtype: int64

In [1277]:
df[df['Make'].str.lower().str.startswith('wac')].value_counts('Make')

Make
Waco                           114
WACO                            24
WACO CLASSIC AIRCRAFT            7
WACO CLASSIC AIRCRAFT CORP       3
Waco Classic Aircraft Corp.      2
Waco Classic Aircraft            1
Waco Classic Aircraft Corp       1
Name: count, dtype: int64

In [1278]:
df.loc[df['Make'].isin(['WACO', 'WACO CLASSIC AIRCRAFT', 'WACO CLASSIC AIRCRAFT CORP', 'Waco Classic Aircraft Corp.', 'Waco Classic Aircraft', 'Waco Classic Aircraft Corp']),'Make'] = 'Waco'

df[df['Make'] == 'Waco'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    92
NaN         60
Name: count, dtype: int64

In [1279]:
df.loc[df['Make'] == 'Waco', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                           65
Schempp-hirth                     57
Helio                             56
Schleicher                        52
Ercoupe (eng & Research Corp.)    51
Weatherly                         48
Ryan                              48
Cameron                           47
Fokker                            47
Smith, Ted Aerostar               46
Name: count, dtype: int64

In [1280]:
df[df['Make'].str.lower().str.startswith('schem')].value_counts('Make')

Make
Schempp-hirth                     67
SCHEMPP-HIRTH                     17
Schempp Hirth                      6
SCHEMPP HIRTH                      4
Schempp-Hirth                      3
SCHEMPP-HIRTH FLUGZEUGBAU          2
SCHEMPP HIRTH FLUGZEUGBAU GMBH     1
SCHEMPP-HIRTH FLUGZEUGBAU GMBH     1
SCHEMPP-HIRTH K G                  1
Schempp-hirth K.g.                 1
Name: count, dtype: int64

In [1281]:
df.loc[df['Make'].isin(['Schempp-hirth', 'SCHEMPP-HIRTH', 'Schempp Hirth', 'SCHEMPP HIRTH', 'SCHEMPP-HIRTH FLUGZEUGBAU', 'SCHEMPP HIRTH FLUGZEUGBAU GMBH', 'SCHEMPP-HIRTH FLUGZEUGBAU GMBH',
                       'SCHEMPP-HIRTH K G', 'Schempp-hirth K.g.']),'Make'] = 'Schempp-Hirth'

df[df['Make'] == 'Schempp-Hirth'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN       58
Glider    45
Name: count, dtype: int64

In [1282]:
df.loc[df['Make'] == 'Schempp-Hirth', 'Aircraft_Category'] = 'Glider'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                           65
Helio                             56
Schleicher                        52
Ercoupe (eng & Research Corp.)    51
Ryan                              48
Weatherly                         48
Cameron                           47
Fokker                            47
Smith, Ted Aerostar               46
Aviat                             45
Name: count, dtype: int64

In [1283]:
df[df['Make'].str.lower().str.startswith('helio')].value_counts('Make')

Make
Helio                 94
HELIO                 19
Helio Aircraft Ltd     1
Heliotech              1
Name: count, dtype: int64

In [1284]:
df.loc[df['Make'].isin(['HELIO', 'Helio Aircraft Ltd']),'Make'] = 'Helio'

df[df['Make'] == 'Helio'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    57
NaN         57
Name: count, dtype: int64

In [1285]:
df.loc[df['Make'] == 'Helio', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                           65
Schleicher                        52
Ercoupe (eng & Research Corp.)    51
Weatherly                         48
Ryan                              48
Cameron                           47
Fokker                            47
Smith, Ted Aerostar               46
Aviat                             45
Rotorway                          45
Name: count, dtype: int64

In [1286]:
df[df['Make'].str.lower().str.startswith('schlei')].value_counts('Make')

Make
Schleicher                        95
SCHLEICHER                        45
SCHLEICHER ALEXANDER GMBH & CO     2
SCHLEICHER ALEXANDER               1
Schlei                             1
Schleicher Alexander Gmbh          1
Name: count, dtype: int64

In [1287]:
df.loc[df['Make'].isin(['ALEXANDER SCHLEICHER GMBH & CO', 'Alexander Schleicher', 'SCHLEICHER', 'SCHLEICHER ALEXANDER GMBH & CO', 'SCHLEICHER ALEXANDER', 'Schlei',
                        'Schleicher Alexander Gmbh']),'Make'] = 'Schleicher'

df[df['Make'] == 'Schleicher'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Glider    96
NaN       56
Name: count, dtype: int64

In [1288]:
df.loc[df['Make'] == 'Schleicher', 'Aircraft_Category'] = 'Glider'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                           65
Ercoupe (eng & Research Corp.)    51
Ryan                              48
Weatherly                         48
Cameron                           47
Fokker                            47
Smith, Ted Aerostar               46
Rotorway                          45
Aviat                             45
Gulfstream                        42
Name: count, dtype: int64

In [1289]:
df[df['Make'].str.lower().str.startswith('ercou')].value_counts('Make')

Make
Ercoupe (eng & Research Corp.)    155
Ercoupe                            54
ERCOUPE                            34
Ercoupe (Eng & Research Corp.)      4
Name: count, dtype: int64

In [1290]:
df.loc[df['Make'].isin(['Ercoupe (eng & Research Corp.)', 'ERCOUPE', 'Ercoupe (Eng & Research Corp.)']),'Make'] = 'Ercoupe'

df[df['Make'] == 'Ercoupe'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    188
NaN          59
Name: count, dtype: int64

In [1291]:
df.loc[df['Make'] == 'Ercoupe', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                65
Weatherly              48
Ryan                   48
Cameron                47
Fokker                 47
Smith, Ted Aerostar    46
Aviat                  45
Rotorway               45
Gulfstream             42
Gates Learjet          41
Name: count, dtype: int64

In [1292]:
df[df['Make'].str.lower().str.startswith('weatherl')].value_counts('Make')

Make
Weatherly                         87
WEATHERLY AVIATION CO INC          9
WEATHERLY                          7
Weatherly Aviation Company Inc     1
Name: count, dtype: int64

In [1293]:
df.loc[df['Make'].isin(['WEATHERLY AVIATION CO INC', 'WEATHERLY', 'Weatherly Aviation Company Inc']),'Make'] = 'Weatherly'

df[df['Make'] == 'Weatherly'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    56
NaN         48
Name: count, dtype: int64

In [1294]:
df.loc[df['Make'] == 'Weatherly', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                65
Ryan                   48
Cameron                47
Fokker                 47
Smith, Ted Aerostar    46
Aviat                  45
Rotorway               45
Gulfstream             42
Gates Learjet          41
Eipper                 39
Name: count, dtype: int64

In [1295]:
df[df['Make'].str.lower().str.startswith('ryan')].value_counts('Make')

Make
Ryan                 92
RYAN                 16
RYAN AERONAUTICAL     7
Ryan Aeronautical     3
Ryan-navion           3
RYAN JOHN STEFFEY     1
RYAN W Gross          1
Ryan Aeronautics      1
Ryan, Robert R.       1
Name: count, dtype: int64

In [1296]:
df.loc[df['Make'].isin(['RYAN', 'RYAN AERONAUTICAL', 'Ryan Aeronautical', 'Ryan Aeronautics']),'Make'] = 'Ryan'

df[df['Make'] == 'Ryan'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    70
NaN         49
Name: count, dtype: int64

In [1297]:
df.loc[df['Make'] == 'Ryan', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                65
Cameron                47
Fokker                 47
Smith, Ted Aerostar    46
Rotorway               45
Aviat                  45
Gulfstream             42
Gates Learjet          41
Eipper                 39
Canadair               38
Name: count, dtype: int64

In [1298]:
df[df['Make'].str.lower().str.startswith('camer')].value_counts('Make')

Make
Cameron                 54
Cameron Balloons        21
CAMERON BALLOONS US     12
CAMERON                  8
CAMERON BALLOONS         4
Cameron Balloon          2
CAMERON BALLOONS U S     1
Cameron Ballon           1
Cameron Balloons US      1
Cameron Balloons Us      1
Name: count, dtype: int64

In [1299]:
df.loc[df['Make'].isin(['Cameron Balloons', 'CAMERON BALLOONS US', 'CAMERON', 'CAMERON BALLOONS', 'Cameron Balloon', 'CAMERON BALLOONS U S', 'Cameron Ballon', 'Cameron Balloons US',
                       'Cameron Balloons Us']),'Make'] = 'Cameron'

df[df['Make'] == 'Cameron'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         59
Balloon     45
Airplane     1
Name: count, dtype: int64

In [1300]:
df.loc[df['Make'] == 'Cameron', 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                  65
Fokker                   47
Smith, Ted Aerostar      46
Rotorway                 45
Aviat                    45
Gulfstream               42
Gates Learjet            41
Eipper                   39
Canadair                 38
Saab-scania Ab (saab)    38
Name: count, dtype: int64

In [1301]:
df[df['Make'].str.lower().str.startswith('fokk')].value_counts('Make')

Make
Fokker    56
FOKKER     8
Name: count, dtype: int64

In [1302]:
df.loc[df['Make'].isin(['FOKKER']),'Make'] = 'Fokker'

df[df['Make'] == 'Fokker'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         47
Airplane    17
Name: count, dtype: int64

In [1303]:
df.loc[df['Make'] == 'Fokker', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                  65
Smith, Ted Aerostar      46
Aviat                    45
Rotorway                 45
Gulfstream               42
Gates Learjet            41
Eipper                   39
Canadair                 38
Saab-scania Ab (saab)    38
Wsk Pzl Mielec           37
Name: count, dtype: int64

In [1304]:
df[df['Make'].str.lower().str.startswith('smith, ted a')].value_counts('Make')

Make
Smith, Ted Aerostar    50
Name: count, dtype: int64

In [1305]:
df.loc[df['Make'].isin(['Smith, Ted Aerostar', 'Ted Smith']),'Make'] = 'Ted Smith Aerostar'

df[df['Make'] == 'Ted Smith Aerostar'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         46
Airplane     7
Name: count, dtype: int64

In [1306]:
df.loc[df['Make'] == 'Ted Smith Aerostar', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                  65
Rotorway                 45
Aviat                    45
Gulfstream               42
Gates Learjet            41
Eipper                   39
Canadair                 38
Saab-scania Ab (saab)    38
Wsk Pzl Mielec           37
Aerotek                  35
Name: count, dtype: int64

In [1307]:
df[df['Make'].str.lower().str.startswith('rotorw')].value_counts('Make')

Make
Rotorway                   56
ROTORWAY                   12
Rotorway Aircraft, Inc.     1
Rotorway Executive          1
Name: count, dtype: int64

In [1308]:
df.loc[df['Make'].isin(['ROTORWAY', 'Rotorway Aircraft, Inc.', 'Rotorway Executive']),'Make'] = 'Rotorway'

df[df['Make'] == 'Rotorway'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           45
Helicopter    25
Name: count, dtype: int64

In [1309]:
df.loc[df['Make'] == 'Rotorway', 'Aircraft_Category'] = 'Helicopter'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                  65
Aviat                    45
Gulfstream               42
Gates Learjet            41
Eipper                   39
Canadair                 38
Saab-scania Ab (saab)    38
Wsk Pzl Mielec           37
Aerotek                  35
Convair                  34
Name: count, dtype: int64

In [1310]:
df[df['Make'].str.lower().str.startswith('aviat')].value_counts('Make')

Make
Aviat                             112
AVIAT AIRCRAFT INC                 72
AVIAT                              28
AVIAT INC                           8
Aviat Aircraft Inc                  5
Aviat Aircraft Inc.                 2
Aviat Inc                           2
Aviation International Rotors       2
AVIAT AIRCRAFT                      1
AVIATE                              1
Aviat Aircraft, Inc.                1
Aviation Adv.                       1
Aviation Specialties Unlimited      1
Name: count, dtype: int64

In [1311]:
df.loc[df['Make'].isin(['AVIAT', 'AVIAT AIRCRAFT', 'AVIAT AIRCRAFT INC', 'Aviat Aircraft Inc', 'Aviat Aircraft Inc.', 'Aviat Aircraft, Inc.',
                       'AVIAT INC', 'Aviat Inc', 'AVIATE']),'Make'] = 'Aviat'

df[df['Make'] == 'Aviat'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane        185
NaN              46
Weight-Shift      1
Name: count, dtype: int64

In [1312]:
df.loc[df['Make'] == 'Aviat', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                  65
Gulfstream               42
Gates Learjet            41
Eipper                   39
Canadair                 38
Saab-scania Ab (saab)    38
Wsk Pzl Mielec           37
Aerotek                  35
Navion                   34
Convair                  34
Name: count, dtype: int64

In [1313]:
df[df['Make'].str.lower().str.startswith('gulfs')].value_counts('Make')

Make
Gulfstream                       62
Gulfstream American              55
GULFSTREAM                       24
Gulfstream-schweizer             22
Gulfstream Aerospace             17
GULFSTREAM AEROSPACE             12
GULFSTREAM-SCHWEIZER A/C CORP     9
GULFSTREAM AMERICAN CORP          9
Gulfstream-Schweizer              3
GULFSTREAM AMERICAN CORP.         2
GULFSTREAM AM CORP COMM DIV       2
Gulfstream American Corp          2
Gulfstream American Corp.         2
GULFSTREAM SCHWEIZER A/C CORP     1
GULFSTREAM-SCHWEIZER              1
Gulfstream Aerospace Corp         1
Gulfstream Aerospace Corp.        1
Gulfstream Aerospace LP           1
Name: count, dtype: int64

In [1314]:
df.loc[df['Make'].isin(['GULFSTREAM', 'Gulfstream Aerospace', 'GULFSTREAM AEROSPACE', 'Gulfstream Aerospace Corp', 'Gulfstream Aerospace Corp.',
                       'Gulfstream Aerospace LP', 'GULFSTREAM AM CORP COMM DIV', 'Gulfstream American', 'GULFSTREAM AMERICAN CORP',
                       'Gulfstream American Corp', 'GULFSTREAM AMERICAN CORP.', 'Gulfstream American Corp.', 'GULFSTREAM SCHWEIZER A/C CORP',
                       'Gulfstream-schweizer', 'Gulfstream-Schweizer', 'GULFSTREAM-SCHWEIZER', 'GULFSTREAM-SCHWEIZER A/C CORP']),'Make'] = 'Gulfstream'

df[df['Make'] == 'Gulfstream'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    136
NaN          90
Name: count, dtype: int64

In [1315]:
df.loc[df['Make'] == 'Gulfstream', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                  65
Gates Learjet            41
Eipper                   39
Canadair                 38
Saab-scania Ab (saab)    38
Wsk Pzl Mielec           37
Aerotek                  35
Navion                   34
Luscombe                 34
Eurocopter               34
Name: count, dtype: int64

In [1316]:
df[df['Make'].str.lower().str.startswith('gates')].value_counts('Make')

Flushing oldest 200 entries.
  warn('Output cache limit (currently {sz} entries) hit.\n'


Make
Gates Learjet                63
GATES LEARJET CORP.           8
GATES LEAR JET                5
GATES LEAR JET CORP.          2
GATES LEARJET                 1
GATES LEARJET CORP            1
Gates Lear Jet                1
Gates Learjet Corporation     1
Name: count, dtype: int64

In [1317]:
df.loc[df['Make'].isin(['GATES LEARJET CORP.', 'GATES LEAR JET', 'GATES LEAR JET CORP.', 'GATES LEARJET', 'GATES LEARJET CORP',
                        'Gates Lear Jet', 'Gates Learjet Corporation']),'Make'] = 'Gates Learjet'

df[df['Make'] == 'Gates Learjet'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    41
NaN         41
Name: count, dtype: int64

In [1318]:
df.loc[df['Make'] == 'Gates Learjet', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                  65
Eipper                   39
Canadair                 38
Saab-scania Ab (saab)    38
Wsk Pzl Mielec           37
Aerotek                  35
Convair                  34
Luscombe                 34
Eurocopter               34
Navion                   34
Name: count, dtype: int64

In [1319]:
df[df['Make'].str.lower().str.startswith('eipp')].value_counts('Make')

Make
Eipper                      43
EIPPER                       3
EIPPER FORMANCE INC          1
Eippen Aircraft              1
Eipper Formance              1
Eipper Mx Ii Quicksilver     1
Eipper Quicksilver           1
Eipper Quicksiver E          1
Eipper-formance              1
Name: count, dtype: int64

In [1320]:
df.loc[df['Make'].isin(['EIPPER', 'EIPPER FORMANCE INC', 'Eippen Aircraft', 'Eipper Formance', 'Eipper Mx Ii Quicksilver', 'Eipper Quicksilver',
                       'Eipper Quicksiver E', 'Eipper-formance']),'Make'] = 'Eipper'

df[df['Make'] == 'Eipper'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           43
Airplane       9
Ultralight     1
Name: count, dtype: int64

In [1321]:
df.loc[df['Make'] == 'Eipper', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                  65
Saab-scania Ab (saab)    38
Canadair                 38
Wsk Pzl Mielec           37
Aerotek                  35
Navion                   34
Luscombe                 34
Eurocopter               34
Convair                  34
Rolladen-schneider       33
Name: count, dtype: int64

In [1322]:
df[df['Make'].str.lower().str.startswith('saab')].value_counts('Make')

Make
Saab-scania Ab (saab)    43
SAAB                     16
Saab                      6
Saab-fairchild            6
Saab-scania               2
SAAB-SCANIA               1
SAAB-SCANIA AB            1
Saab-Scania AB (Saab)     1
Saabye                    1
Name: count, dtype: int64

In [1323]:
df.loc[df['Make'].isin(['Saab-scania Ab (saab)', 'SAAB', 'Saab-fairchild', 'Saab-scania', 'SAAB-SCANIA', 'SAAB-SCANIA AB',
                        'Saab-Scania AB (Saab)']),'Make'] = 'Saab'

df[df['Make'] == 'Saab'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         55
Airplane    21
Name: count, dtype: int64

In [1324]:
df.loc[df['Make'] == 'Saab', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown               65
Canadair              38
Wsk Pzl Mielec        37
Aerotek               35
Convair               34
Luscombe              34
Navion                34
Eurocopter            34
Stinson               33
Rolladen-schneider    33
Name: count, dtype: int64

In [1325]:
df[df['Make'].str.lower().str.startswith('canada')].value_counts('Make')

Make
Canadair        62
CANADAIR         6
CANADAIR LTD     4
Name: count, dtype: int64

In [1326]:
df.loc[df['Make'].isin(['CANADAIR', 'CANADAIR LTD']),'Make'] = 'Canadair'

df[df['Make'] == 'Canadair'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         38
Airplane    34
Name: count, dtype: int64

In [1327]:
df.loc[df['Make'] == 'Canadair', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown               65
Wsk Pzl Mielec        37
Aerotek               35
Eurocopter            34
Navion                34
Convair               34
Luscombe              34
Rolladen-schneider    33
Stinson               33
Britten-norman        32
Name: count, dtype: int64

In [1328]:
df[df['Make'].str.lower().str.startswith('wsk')].value_counts('Make')

Make
Wsk Pzl Mielec            77
WSK PZL MIELEC            11
Wsk Pzl                    4
Wsk Pzl Krosno             4
Wsk Pzl Warzawa-okecie     4
WSK-MIELEC                 2
WSK-PZL WARZAWA-OKECIE     2
Wsk                        2
WSK-PZL MEILEC             1
Wsk Pzl Swidnik            1
Wsk Pzl-krosno             1
Wsk-pzl Mielec             1
Wsk-pzl Mielic             1
Wsk-pzl Warzawaokecie      1
Name: count, dtype: int64

In [1329]:
df.loc[df['Make'].isin(['Wsk', 'Wsk Pzl', 'WSK PZL MIELEC', 'WSK-MIELEC', 'WSK-PZL MEILEC', 'Wsk-pzl Mielec',
                        'Wsk-pzl Mielic']),'Make'] = 'Wsk Pzl Mielec'

df[df['Make'] == 'Wsk Pzl Mielec'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    56
NaN         43
Name: count, dtype: int64

In [1330]:
df.loc[df['Make'] == 'Wsk Pzl Mielec', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown               65
Aerotek               35
Convair               34
Luscombe              34
Navion                34
Eurocopter            34
Rolladen-schneider    33
Stinson               33
Socata                32
Britten-norman        32
Name: count, dtype: int64

In [1331]:
df[df['Make'].str.lower().str.startswith('aerot')].value_counts('Make')

Make
Aerotek          38
AEROTEK           8
Aerotrike         4
Aerotechnik       3
AEROTRIKE         2
AEROTECHNIK       1
AEROTEK INC       1
Aerotek-pitts     1
Aerotrek          1
Name: count, dtype: int64

In [1332]:
df.loc[df['Make'].isin(['AEROTEK', 'AEROTEK INC', 'Aerotek-pitts', 'Aerotrek']),'Make'] = 'Aerotek'

df[df['Make'] == 'Aerotek'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         35
Airplane    14
Name: count, dtype: int64

In [1333]:
df.loc[df['Make'] == 'Aerotek', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown               65
Convair               34
Eurocopter            34
Navion                34
Luscombe              34
Stinson               33
Rolladen-schneider    33
Socata                32
Britten-norman        32
Kaman                 31
Name: count, dtype: int64

In [1334]:
df[df['Make'].str.lower().str.startswith('conv')].value_counts('Make')

Make
Convair                          39
Convair Div. Of Gen. Dynamics     6
CONVAIR                           4
Name: count, dtype: int64

In [1335]:
df.loc[df['Make'].isin(['Convair Div. Of Gen. Dynamics', 'CONVAIR']),'Make'] = 'Convair'

df[df['Make'] == 'Convair'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         40
Airplane     9
Name: count, dtype: int64

In [1336]:
df.loc[df['Make'] == 'Convair', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown               65
Navion                34
Luscombe              34
Eurocopter            34
Rolladen-schneider    33
Stinson               33
Britten-norman        32
Socata                32
Kaman                 31
Short Brothers        30
Name: count, dtype: int64

In [1337]:
df['Aircraft_Category'].value_counts(dropna=False)

Aircraft_Category
Airplane             72138
Helicopter            7484
NaN                   6137
Glider                1096
Balloon                582
Gyrocraft              188
Weight-Shift           160
Powered Parachute       90
Ultralight              48
WSFT                     9
Unknown                  8
Blimp                    4
Powered-Lift             3
UNK                      2
Rocket                   1
ULTR                     1
Name: count, dtype: int64

We are down to about 6000 empty category entries at this point. The column for Amateur_Built may enable us to remove a number of unneeded rows.

In [1338]:
# show the amateur_built value_counts for the empty category entries
df[df['Aircraft_Category'].isna()]['Amateur_Built'].value_counts(dropna=False)

Amateur_Built
Yes        3824
No         2303
Unknown      10
Name: count, dtype: int64

So we can drop the almost 4000 rows that are listed as Amateur_Built

In [1339]:
df = df.drop(df[(df['Amateur_Built'] == 'Yes') & (df['Aircraft_Category'].isna())].index)

df[df['Aircraft_Category'].isna()]['Amateur_Built'].value_counts(dropna=False)

Amateur_Built
No         2303
Unknown      10
Name: count, dtype: int64

In [1340]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown               45
Navion                34
Luscombe              34
Eurocopter            34
Stinson               33
Socata                32
Britten-norman        32
Kaman                 31
Rolladen-schneider    30
Short Brothers        30
Name: count, dtype: int64

In [1341]:
df[df['Make'].str.lower().str.startswith('navi')].value_counts('Make')

Make
Navion    63
NAVION    16
Name: count, dtype: int64

In [1342]:
df.loc[df['Make'].isin(['NAVION']),'Make'] = 'Navion'

df[df['Make'] == 'Navion'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    45
NaN         34
Name: count, dtype: int64

In [1343]:
df.loc[df['Make'] == 'Navion', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown               45
Eurocopter            34
Luscombe              34
Stinson               33
Socata                32
Britten-norman        32
Kaman                 31
Short Brothers        30
Rolladen-schneider    30
Lake                  29
Name: count, dtype: int64

In [1344]:
df[df['Make'].str.lower().str.startswith('euroc')].value_counts('Make')

Make
EUROCOPTER                     128
Eurocopter                      90
Eurocopter France               40
EUROCOPTER DEUTSCHLAND GMBH     20
Eurocopter Deutschland          10
EUROCOPTER FRANCE                5
Eurocopter Deutsch               1
Eurocopter Deutschland Gmbh      1
Name: count, dtype: int64

In [1345]:
df.loc[df['Make'].isin(['EUROCOPTER', 'Eurocopter France', 'EUROCOPTER DEUTSCHLAND GMBH', 'Eurocopter Deutschland', 'EUROCOPTER FRANCE',
                        'Eurocopter Deutsch', 'Eurocopter Deutschland Gmbh']),'Make'] = 'Eurocopter'

df[df['Make'] == 'Eurocopter'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    234
NaN            61
Name: count, dtype: int64

In [1346]:
df[df['Make'].str.lower().str.startswith('lusc')].value_counts('Make')

Make
Luscombe                          316
LUSCOMBE                           95
Luscombe Silvaire Aircraft Co.      1
Name: count, dtype: int64

In [1347]:
df.loc[df['Make'].isin(['LUSCOMBE', 'Luscombe Silvaire Aircraft Co.']),'Make'] = 'Luscombe'

df[df['Make'] == 'Luscombe'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    378
NaN          34
Name: count, dtype: int64

In [1348]:
df[df['Make'].str.lower().str.startswith('stins')].value_counts('Make')

Make
Stinson    342
STINSON     91
Name: count, dtype: int64

In [1349]:
df.loc[df['Make'].isin(['STINSON']),'Make'] = 'Stinson'

df[df['Make'] == 'Stinson'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    400
NaN          33
Name: count, dtype: int64

In [1350]:
df[df['Make'].str.lower().str.startswith('soca')].value_counts('Make')

Make
SOCATA                        63
Socata                        62
Socata-Groupe Aerospatiale     1
Name: count, dtype: int64

In [1351]:
df.loc[df['Make'].isin(['SOCATA', 'Socata-Groupe Aerospatiale']),'Make'] = 'Socata'

df[df['Make'] == 'Socata'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    94
NaN         32
Name: count, dtype: int64

In [1352]:
df[df['Make'].str.lower().str.startswith('britt')].value_counts('Make')

Make
Britten-norman    37
BRITTEN NORMAN    11
BRITTEN-NORMAN     5
Britten Norman     2
Name: count, dtype: int64

In [1353]:
df.loc[df['Make'].isin(['Britten-norman', 'BRITTEN NORMAN', 'BRITTEN-NORMAN', 'Britten Norman']),'Make'] = 'Britten-Norman'

df[df['Make'] == 'Britten-Norman'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         33
Airplane    22
Name: count, dtype: int64

In [1354]:
df[df['Make'].str.lower().str.startswith('kama')].value_counts('Make')

Make
Kaman                   35
KAMAN                    3
KAMAN AEROSPACE CORP     2
Name: count, dtype: int64

In [1355]:
df.loc[df['Make'].isin(['KAMAN', 'KAMAN AEROSPACE CORP']),'Make'] = 'Kaman'

df[df['Make'] == 'Kaman'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           33
Helicopter     6
Airplane       1
Name: count, dtype: int64

There's one plane listed for Kaman. What is it?

In [1356]:
# what are the value_counts for Model when Aircraft_Category is Airplane and Make is Kaman
df[(df['Aircraft_Category'] == ('Airplane')) & (df['Make'] == 'Kaman')]['Model'].value_counts(dropna=False)

Model
K1200 K-Max    1
Name: count, dtype: int64

The K1200 is a helicopter, so that entry is wrong.

In [1357]:
df[df['Make'].str.lower().str.startswith('short b')].value_counts('Make')

Make
Short Brothers                   32
SHORT BROS                        7
SHORT BROS. & HARLAND             5
Short Bros.                       3
SHORT BROTHERS & HARLAND LTD.     1
SHORT BROTHERS PLC                1
Name: count, dtype: int64

In [1358]:
df.loc[df['Make'].isin(['SHORT BROS', 'SHORT BROS. & HARLAND', 'Short Bros.', 'SHORT BROTHERS & HARLAND LTD.',
                        'SHORT BROTHERS PLC']),'Make'] = 'Short Brothers'

df[df['Make'] == 'Short Brothers'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         33
Airplane    16
Name: count, dtype: int64

In [1359]:
df[df['Make'].str.lower().str.startswith('rol')].value_counts('Make')

Make
Rolladen-schneider         33
ROLLADEN-SCHNEIDER          5
ROLLADEN-SCHNEIDER OHG      3
ROLLADEN SCHNEIDER OHG      1
ROLLADEN-SCHNEIDER GMBH     1
Rolladen Schneider          1
Rolladen-schneider Gmbh     1
Name: count, dtype: int64

In [1360]:
df.loc[df['Make'].isin(['Rolladen-schneider', 'ROLLADEN-SCHNEIDER', 'ROLLADEN-SCHNEIDER OHG', 'ROLLADEN SCHNEIDER OHG', 'ROLLADEN-SCHNEIDER GMBH',
                       'Rolladen Schneider', 'Rolladen-schneider Gmbh']),'Make'] = 'Rolladen-Schneider'

df[df['Make'] == 'Rolladen-Schneider'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         30
Glider      14
Airplane     1
Name: count, dtype: int64

In [1361]:
df[df['Make'].str.lower().str.startswith('lak')].value_counts('Make')

Make
Lake           136
LAKE            14
Lake John K      1
Name: count, dtype: int64

In [1362]:
df.loc[df['Make'].isin(['LAKE']),'Make'] = 'Lake'

df[df['Make'] == 'Lake'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    121
NaN          29
Name: count, dtype: int64

In [1363]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Rolladen-Schneider']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Kaman', 'Eurocopter']),'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Britten-Norman', 'Socata', 'Stinson', 'Luscombe', 'Short Brothers', 'Lake']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                        45
Pilatus                        29
Agusta                         28
Texas Helicopter               28
Continental Copters            27
Alon                           27
Hawker Siddeley                26
Let                            26
Diamond Aircraft Industries    26
Siai-marchetti                 25
Name: count, dtype: int64

In [1364]:
df[df['Make'].str.lower().str.startswith('pila')].value_counts('Make')

Make
Pilatus                   36
PILATUS                   22
PILATUS AIRCRAFT LTD       8
PILATUS BRITTEN-NORMAN     1
Pilatus Aircraft           1
Pilatus Britten-norman     1
Name: count, dtype: int64

In [1365]:
df.loc[df['Make'].isin(['PILATUS', 'PILATUS AIRCRAFT LTD', 'PILATUS BRITTEN-NORMAN', 'Pilatus Aircraft', 'Pilatus Britten-norman']),'Make'] = 'Pilatus'

df[df['Make'] == 'Pilatus'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    36
NaN         32
Glider       1
Name: count, dtype: int64

In [1366]:
df[df['Make'].str.lower().str.startswith('agu')].value_counts('Make')

Make
Agusta                            44
AGUSTA                            21
AGUSTA SPA                         7
AGUSTA BELL                        5
AGUSTAWESTLAND                     4
AGUSTA AEROSPACE CORP              2
AGUSTAWESTLAND SPA                 2
AGUSTAWESTLAND PHILADELPHIA        1
AGUSTAWESTLAND PHILADELPHIA CO     1
Agusta Spa                         1
Agusta-bell                        1
Agusta/Westland                    1
AgustaWestland                     1
AgustadWestland                    1
Name: count, dtype: int64

In [1367]:
df.loc[df['Make'].isin(['AGUSTA', 'AGUSTA SPA', 'AGUSTA BELL', 'AGUSTAWESTLAND', 'AGUSTA AEROSPACE CORP', 'AGUSTAWESTLAND SPA',
                       'AGUSTAWESTLAND PHILADELPHIA', 'AGUSTAWESTLAND PHILADELPHIA CO', 'Agusta Spa', 'Agusta-bell', 'Agusta/Westland',
                       'AgustaWestland', 'AgustadWestland']),'Make'] = 'Agusta'

df[df['Make'] == 'Agusta'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    59
NaN           32
Unknown        1
Name: count, dtype: int64

In [1368]:
df[df['Make'].str.lower().str.startswith('texas h')].value_counts('Make')

Make
Texas Helicopter                30
TEXAS HELICOPTER CORP           10
Texas Helicopter Corp.           1
Texas Helicopter Corporation     1
Name: count, dtype: int64

In [1369]:
df.loc[df['Make'].isin(['TEXAS HELICOPTER CORP', 'Texas Helicopter Corp.', 'Texas Helicopter Corporation']),'Make'] = 'Texas Helicopter'

df[df['Make'] == 'Texas Helicopter'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           28
Helicopter    14
Name: count, dtype: int64

In [1370]:
df[df['Make'].str.lower().str.startswith('contin')].value_counts('Make')

Make
Continental Copters         27
CONTINENTAL COPTERS INC.     6
Continental                  3
CONTINENTAL COPTERS INC      2
CONTINENTAL COPTERS          1
Continental Mk5a             1
Name: count, dtype: int64

In [1371]:
df.loc[df['Make'].isin(['CONTINENTAL COPTERS INC.', 'Continental', 'CONTINENTAL COPTERS INC', 'CONTINENTAL COPTERS',
                        'Continental Mk5a']),'Make'] = 'Continental Copters'

df[df['Make'] == 'Continental Copters'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           27
Helicopter    12
Airplane       1
Name: count, dtype: int64

In [1372]:
df[df['Make'].str.lower().str.startswith('alon')].value_counts('Make')

Make
Alon             29
ALON             11
ALONSO            1
Alon Aircoupe     1
Name: count, dtype: int64

In [1373]:
df.loc[df['Make'].isin(['ALON', 'Alon Aircoupe']),'Make'] = 'Alon'

df[df['Make'] == 'Alon'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         28
Airplane    13
Name: count, dtype: int64

In [1374]:
df[df['Make'].str.lower().str.startswith('hawk')].value_counts('Make')

Make
Hawker Siddeley                  28
HAWKER BEECHCRAFT CORP           13
Hawker Beechcraft                 8
HAWKER                            7
HAWKER BEECHCRAFT                 7
Hawker Beechcraft Corp.           7
HAWKER SIDDELEY                   4
Hawker Beechcraft Corporation     4
HAWKER BEECHCRAFT CORPORATION     3
Hawker                            3
Hawker-Beechcraft                 2
Hawkins & Powers                  2
HAWKER AIRCRAFT LTD               2
HAWKER BEECH                      2
Hawker Beech                      1
Hawker Aircraft Ltd               1
HAWKINS AUGUST E                  1
Hawker Siddely                    1
Hawker-Beechcraft Corporation     1
Hawker-beechcraft                 1
Hawker Aircraft Ltd.              1
Name: count, dtype: int64

In [1375]:
df.loc[df['Make'].isin(['HAWKER', 'HAWKER AIRCRAFT LTD', 'Hawker Aircraft Ltd', 'Hawker Aircraft Ltd.', 'HAWKER BEECH', 'Hawker Beech',
                       'Hawker Beechcraft', 'HAWKER BEECHCRAFT', 'HAWKER BEECHCRAFT CORP', 'Hawker Beechcraft Corp.', 'Hawker Beechcraft Corporation',
                       'HAWKER BEECHCRAFT CORPORATION', 'Hawker Siddeley', 'HAWKER SIDDELEY', 'Hawker Siddely', 'Hawker-Beechcraft',
                        'Hawker-beechcraft', 'Hawker-Beechcraft Corporation']),'Make'] = 'Hawker'

df[df['Make'] == 'Hawker'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    64
NaN         32
Name: count, dtype: int64

In [1376]:
df[df['Make'].str.lower().str.startswith('let')].value_counts('Make')

Make
Let                110
LET                 24
Let Np Kinovice      1
Letecky Zavody       1
Lett                 1
Name: count, dtype: int64

In [1377]:
df.loc[df['Make'].isin(['LET', 'Let Np Kinovice']),'Make'] = 'Let'

df[df['Make'] == 'Let'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Glider      106
NaN          27
Airplane      2
Name: count, dtype: int64

In [1378]:
df[df['Make'].str.lower().str.startswith('diam')].value_counts('Make')

Make
DIAMOND AIRCRAFT IND INC          74
Diamond Aircraft Industries       40
DIAMOND                           30
Diamond                           19
DIAMOND AIRCRAFT IND GMBH          8
Diamond Aircraft                   4
DIAMOND AIRCRAFT                   1
DIAMOND AIRCRAFT INDUSTRIES        1
DIAMOND AIRCRAFT INDUSTRIES IN     1
Diamond Aicraft Industries Inc     1
Diamond Aircraft Industry Inc      1
Name: count, dtype: int64

In [1379]:
df.loc[df['Make'].isin(['DIAMOND', 'Diamond', 'Diamond Aicraft Industries Inc', 'Diamond Aircraft', 'DIAMOND AIRCRAFT', 'DIAMOND AIRCRAFT IND GMBH',
                       'DIAMOND AIRCRAFT IND INC', 'DIAMOND AIRCRAFT INDUSTRIES', 'DIAMOND AIRCRAFT INDUSTRIES IN',
                        'Diamond Aircraft Industry Inc']),'Make'] = 'Diamond Aircraft Industries'

df[df['Make'] == 'Diamond Aircraft Industries'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane        150
NaN              28
Glider            1
Powered-Lift      1
Name: count, dtype: int64

In [1380]:
df[df['Make'].str.lower().str.startswith('sia')].value_counts('Make')

Make
Siai-marchetti    25
SIAI-MARCHETTI    10
SIAI MARCHETTI     3
Siai Marchetti     2
Siai-Marchetti     2
SIAI-Marchetti     1
Name: count, dtype: int64

In [1381]:
df.loc[df['Make'].isin(['Siai-marchetti', 'SIAI-MARCHETTI', 'SIAI MARCHETTI', 'Siai Marchetti', 'Siai-Marchetti']),'Make'] = 'SIAI-Marchetti'

df[df['Make'] == 'SIAI-Marchetti'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         25
Airplane    18
Name: count, dtype: int64

In [1382]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Let']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Continental Copters', 'Texas Helicopter', 'Agusta']),'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Pilatus', 'Alon', 'Hawker', 'Diamond Aircraft Industries', 'SIAI-Marchetti']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                       45
I.c.a. Brasov                 24
Israel Aircraft Industries    24
Dornier                       24
Snow                          23
Yakovlev                      22
Thunder And Colt              21
Callair                       20
Garlick                       20
Cirrus Design Corp.           19
Name: count, dtype: int64

In [1383]:
df[df['Make'].str.lower().str.startswith('i.c')].value_counts('Make')

Make
I.c.a. Brasov              26
I.C.A.-BRASOV (ROMANIA)     2
I.C.A.-Brasov               1
I.c.a. Brasov - Romania     1
I.c.a.-brasov               1
Name: count, dtype: int64

In [1384]:
df.loc[df['Make'].isin(['I.c.a. Brasov', 'I.C.A.-BRASOV (ROMANIA)', 'I.C.A.-Brasov', 'I.c.a. Brasov – Romania', 'I.c.a.-brasov',
                       'ICA BRASOV']),'Make'] = 'ICA Brasov'

df[df['Make'] == 'ICA Brasov'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN       24
Glider     7
Name: count, dtype: int64

In [1385]:
df[df['Make'].str.lower().str.startswith('isr')].value_counts('Make')

Make
Israel Aircraft Industries        28
ISRAEL AIRCRAFT INDUSTRIES         8
ISRAEL AEROSPACE INDUSTRIESLTD     1
Name: count, dtype: int64

In [1386]:
df.loc[df['Make'].isin(['ISRAEL AIRCRAFT INDUSTRIES', 'ISRAEL AEROSPACE INDUSTRIESLTD']),'Make'] = 'Israel Aircraft Industries'

df[df['Make'] == 'Israel Aircraft Industries'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         24
Airplane    13
Name: count, dtype: int64

In [1387]:
df[df['Make'].str.lower().str.startswith('dorni')].value_counts('Make')

Make
Dornier         24
DORNIER          7
DORNIER GMBH     2
Name: count, dtype: int64

In [1388]:
df.loc[df['Make'].isin(['DORNIER', 'DORNIER GMBH']),'Make'] = 'Dornier'

df[df['Make'] == 'Dornier'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         24
Airplane     9
Name: count, dtype: int64

In [1389]:
df[df['Make'].str.lower().str.startswith('sno')].value_counts('Make')

Make
Snow       30
SNOW        1
Snobird     1
Name: count, dtype: int64

In [1390]:
df.loc[df['Make'].isin(['SNOW']),'Make'] = 'Snow'

df[df['Make'] == 'Snow'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         23
Airplane     8
Name: count, dtype: int64

In [1391]:
df[df['Make'].str.lower().str.startswith('yako')].value_counts('Make')

Make
Yakovlev             30
YAKOVLEV             12
YAKOVLEV/CHINNERY     1
YAKOVLEV/DAY          1
Name: count, dtype: int64

In [1392]:
df.loc[df['Make'].isin(['YAKOVLEV', 'YAKOVLEV/CHINNERY', 'YAKOVLEV/DAY']),'Make'] = 'Yakovlev'

df[df['Make'] == 'Yakovlev'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    22
NaN         22
Name: count, dtype: int64

In [1393]:
df[df['Make'].str.lower().str.startswith('thund')].value_counts('Make')

Make
Thunder And Colt                23
Thunder Balloons, Ltd.           2
THUNDER & COLT                   1
THUNDER & COLT AIRBORNE AMER     1
THUNDERBIRD AVIATION             1
Thunder & Colt Ltd               1
Thunder Mustang                  1
Name: count, dtype: int64

In [1394]:
df.loc[df['Make'].isin(['Thunder And Colt', 'Thunder Balloons, Ltd.', 'THUNDER & COLT', 'THUNDER & COLT AIRBORNE AMER',
                        'Thunder & Colt Ltd', 'COLT BALLOONS', 'LINDSTRAND BALLOONS', 'Lindstrand Balloons', 'Lindstrand',
                       'LINDSTRAND', 'LINDSTRAND BALLOONS USA']),'Make'] = 'Thunder & Colt Balloons'

df[df['Make'] == 'Thunder & Colt Balloons'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Balloon    33
NaN        25
Name: count, dtype: int64

In [1395]:
df[df['Make'] == 'Callair'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         20
Airplane     6
Name: count, dtype: int64

In [1396]:
df[df['Make'].str.lower().str.startswith('garli')].value_counts('Make')

Make
Garlick                     22
GARLICK                     10
GARLICK HELICOPTERS INC      8
Garlick Helicipters Inc.     1
Garlick Helicopters Inc      1
Garlick Helicopters Inc.     1
Name: count, dtype: int64

In [1397]:
df.loc[df['Make'].isin(['Garlick', 'GARLICK', 'GARLICK HELICOPTERS INC', 'Garlick Helicipters Inc.', 'Garlick Helicopters Inc',
                        'Garlick Helicopters Inc.']),'Make'] = 'Garlick Helicopters'

df[df['Make'] == 'Garlick Helicopters'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           22
Helicopter    21
Name: count, dtype: int64

In [1398]:
df[df['Make'].str.lower().str.startswith('cirr')].value_counts('Make')

Make
CIRRUS DESIGN CORP           218
CIRRUS                        80
Cirrus Design Corp.           71
Cirrus                        69
Cirrus Design Corporation     10
Cirrus Design Corp             5
CIRRUS DESIGN CORP.            4
Cirrus Design                  4
CIRRUS DESIGN CORPORATION      3
CIRRUS DESIGN                  1
Name: count, dtype: int64

In [1399]:
df.loc[df['Make'].isin(['CIRRUS DESIGN CORP', 'CIRRUS', 'Cirrus Design', 'CIRRUS DESIGN', 'Cirrus Design Corp', 'Cirrus Design Corp.',
                        'CIRRUS DESIGN CORP.', 'Cirrus Design Corporation', 'CIRRUS DESIGN CORPORATION']),'Make'] = 'Cirrus'

df[df['Make'] == 'Cirrus'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    440
NaN          25
Name: count, dtype: int64

In [1400]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Thunder & Colt Balloons']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['ICA Brasov']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Garlick Helicopters']),'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Callair', 'Israel Aircraft Industries', 'Dornier', 'Snow', 'Yakovlev', 'Cirrus']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                          45
Christen Industries              19
American Champion (acac)         19
Casa                             18
Dassault                         18
Consolidated Aeronautics Inc.    18
Brantly Helicopter               17
Glasflugel                       17
Globe                            17
American                         16
Name: count, dtype: int64

In [1401]:
df[df['Make'].str.lower().str.startswith('christen')].value_counts('Make')

Make
Christen Industries          46
CHRISTEN INDUSTRIES INC      16
CHRISTENSEN STEVE             1
Christen                      1
Christen Industries Inc.      1
Christen Industries, Inc.     1
Christen/walter               1
Name: count, dtype: int64

In [1402]:
df.loc[df['Make'].isin(['Christen Industries Inc.', 'Christen', 'Christen Industries Inc.',
                        'Christen Industries, Inc.']),'Make'] = 'Christen Industries'

df[df['Make'] == 'Christen Industries'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    30
NaN         19
Name: count, dtype: int64

In [1403]:
df[df['Make'].str.lower().str.startswith('american c')].value_counts('Make')

Make
AMERICAN CHAMPION AIRCRAFT        46
American Champion (acac)          28
American Champion                 11
American Champion Aircraft         8
American Champion (ACAC)           4
AMERICAN CHAMPION                  3
AMERICAN Champion                  1
American Champion Aircraft Cor     1
Name: count, dtype: int64

In [1404]:
df.loc[df['Make'].isin(['AMERICAN CHAMPION AIRCRAFT', 'American Champion (acac)', 'American Champion Aircraft', 'American Champion (ACAC)',
                       'AMERICAN CHAMPION', 'AMERICAN Champion', 'American Champion Aircraft Cor']),'Make'] = 'American Champion'

df[df['Make'] == 'American Champion'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    75
NaN         27
Name: count, dtype: int64

In [1405]:
df[df['Make'].str.lower().str.startswith('casa')].value_counts('Make')

Make
Casa    20
CASA     2
Name: count, dtype: int64

In [1406]:
df.loc[df['Make'].isin(['CASA']),'Make'] = 'Casa'

df[df['Make'] == 'Casa'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         18
Airplane     4
Name: count, dtype: int64

In [1407]:
df[df['Make'].str.lower().str.startswith('dass')].value_counts('Make')

Make
Dassault             22
Dassault-breguet     16
Dassault Aviation    11
DASSAULT              7
Dassault/sud          5
DASSAULT AVIATION     3
DASSAULT-BREGUET      3
DASSAULT/SUD          3
Dassault Falcon       1
Dassault-Breguet      1
Name: count, dtype: int64

In [1408]:
df.loc[df['Make'].isin(['Dassault-breguet', 'Dassault Aviation', 'DASSAULT', 'Dassault/sud', 'DASSAULT AVIATION', 'DASSAULT-BREGUET',
                       'DASSAULT/SUD', 'Dassault Falcon', 'Dassault-Breguet']),'Make'] = 'Dassault'

df[df['Make'] == 'Dassault'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         50
Airplane    22
Name: count, dtype: int64

In [1409]:
df[df['Make'].str.lower().str.startswith('consolid')].value_counts('Make')

Make
Consolidated Aeronautics Inc.     25
Consolidated-vultee               15
CONSOLIDATED AERONAUTICS INC.     13
CONSOLIDATED VULTEE                4
CONSOLIDATED  AERONAUTICS INC.     1
CONSOLIDATED AERONAUTICS           1
CONSOLIDATED AERONAUTICS INC       1
Consolidated Aero                  1
Consolidated Aeronautics, Inc      1
Consolidated Aeronautics, Inc.     1
Name: count, dtype: int64

In [1410]:
df.loc[df['Make'].isin(['Consolidated-vultee', 'CONSOLIDATED AERONAUTICS INC.', 'CONSOLIDATED VULTEE', 'CONSOLIDATED AERONAUTICS INC.',
                       'CONSOLIDATED AERONAUTICS', 'CONSOLIDATED AERONAUTICS INC.', 'Consolidated Aero', 'Consolidated Aeronautics, Inc',
                       'Consolidated Aeronautics, Inc.', 'Consolidated Aeronautics Inc.', 'CONSOLIDATED  AERONAUTICS INC.',
                       'CONSOLIDATED AERONAUTICS INC']),'Make'] = 'Consolidated'

df[df['Make'] == 'Consolidated'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         33
Airplane    29
Glider       1
Name: count, dtype: int64

In [1411]:
df[df['Make'].str.lower().str.startswith('brant')].value_counts('Make')

Make
Brantly Helicopter    39
BRANTLY               13
Brantly                7
Brantly-hynes          2
Brantley               1
Name: count, dtype: int64

In [1412]:
df.loc[df['Make'].isin(['BRANTLY', 'Brantly', 'Brantly-hynes', 'Brantley']),'Make'] = 'Brantly Helicopter'

df[df['Make'] == 'Brantly Helicopter'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    44
NaN           18
Name: count, dtype: int64

In [1413]:
df[df['Make'].str.lower().str.startswith('glasf')].value_counts('Make')

Make
Glasflugel    24
GLASFLUGEL     4
Name: count, dtype: int64

In [1414]:
df.loc[df['Make'].isin(['GLASFLUGEL']),'Make'] = 'Glasflugel'

df[df['Make'] == 'Glasflugel'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         17
Glider      10
Airplane     1
Name: count, dtype: int64

In [1415]:
df[df['Make'].str.lower().str.startswith('glob')].value_counts('Make')

Make
Globe          78
GLOBE          17
Globe Swift     1
Name: count, dtype: int64

In [1416]:
df.loc[df['Make'].isin(['GLOBE', 'Globe Swift']),'Make'] = 'Globe'

df[df['Make'] == 'Globe'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    78
NaN         18
Name: count, dtype: int64

In [1417]:
df[df['Make'].str.lower().str.startswith('american')].value_counts('Make')

Make
American Champion                 102
American                           57
American Aviation Corp. (aac)      20
AMERICAN LEGEND AIRCRAFT CO        17
American General Aircraft          16
American Aviation                  15
AMERICAN                           14
AMERICAN AVIATION                  12
American Legend                     7
AMERICAN EUROCOPTER CORP            6
American Eurocopter                 6
American Blimp Corp.                5
AMERICAN EUROCOPTER LLC             4
AMERICAN GENERAL ACFT CORP          3
American Aerolights                 3
American Aviation Corp. (AAC)       2
American Legend Aircraft Co.        2
American Autogyro                   2
AMERICAN LEGEND                     2
AMERICAN EUROCOPTER                 2
AMERICAN AIR RACING LTD             1
American Aircraft                   1
American Blimp Corporation          1
American Air Racing                 1
American Eagle                      1
AMERICAN LONGEVITY CORP             1
America

In [1418]:
df.loc[df['Make'].isin(['AMERICAN', 'American Aircraft', 'American Aviation', 'AMERICAN AVIATION', 'American Aviation Corp. (aac)',
                       'American Aviation Corp. (AAC)']),'Make'] = 'American'
df.loc[df['Make'].isin(['AMERICAN AIR RACING LTD']),'Make'] = 'American Air Racing'
df.loc[df['Make'].isin(['AMERICAN BLIMP', 'American Blimp Corp.', 'American Blimp Corporation']),'Make'] = 'American Blimp'
df.loc[df['Make'].isin(['AMERICAN EUROCOPTER', 'AMERICAN EUROCOPTER CORP', 'AMERICAN EUROCOPTER LLC']),'Make'] = 'American Eurocopter'
df.loc[df['Make'].isin(['AMERICAN GENERAL ACFT CORP']),'Make'] = 'American General Aircraft'
df.loc[df['Make'].isin(['American Legand Aircraft', 'AMERICAN LEGEND', 'AMERICAN LEGEND AIRCRAFT CO', 'American Legend Aircraft Co.',
                       'American Legend Aircraft Compa']),'Make'] = 'American Legend'

df[df['Make'] == 'American'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    102
NaN          19
Name: count, dtype: int64

In [1419]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Glasflugel']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Brantly Helicopter']),'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Christen Industries', 'American Champion', 'Casa', 'Dassault', 'Consolidated', 'Globe',
                        'American']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                   45
Extra Flugzeugbau         16
Glaser-dirks              15
Sukhoi                    15
Eiriavion Oy              15
Forney                    15
Curtiss-wright            14
Classic Aircraft Corp.    14
Adams                     14
Stearman                  13
Name: count, dtype: int64

In [1420]:
df[df['Make'].str.lower().str.startswith('extra')].value_counts('Make')

Make
Extra Flugzeugbau                 19
EXTRA FLUGZEUGBAU GMBH            10
EXTRA                              6
Extra                              5
EXTRA FLUGZEUGPRODUKTIONS-UND      4
Extra Flugzeugbau Gmbh             2
EXTRA FLUGZEUGBAU                  1
EXTRA Flugzeugproduktions-GMBH     1
Extra Flugzeugproduktions-und      1
Extra Flugzeugrau Gmbh             1
Name: count, dtype: int64

In [1421]:
df.loc[df['Make'].isin(['EXTRA FLUGZEUGBAU GMBH', 'EXTRA FLUGZEUGPRODUKTIONS-UND', 'Extra Flugzeugbau Gmbh', 'EXTRA FLUGZEUGBAU',
                       'EXTRA Flugzeugproduktions-GMBH', 'Extra Flugzeugproduktions-und', 'Extra Flugzeugrau Gmbh']),'Make'] = 'Extra Flugzeugbau'
df.loc[df['Make'].isin(['EXTRA']),'Make'] = 'Extra'
df[df['Make'] == 'Extra Flugzeugbau'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    21
NaN         18
Name: count, dtype: int64

In [1422]:
df[df['Make'].str.lower().str.startswith('glase')].value_counts('Make')

Make
Glaser-dirks                16
Glaser Dirks                 3
GLASER DIRKS                 1
GLASER-DIRKS                 1
Glaser-Dirks Flugzeugbau     1
Glaser-dirks-flugzeubau      1
Name: count, dtype: int64

In [1423]:
df.loc[df['Make'].isin(['Glaser-dirks', 'Glaser Dirks', 'GLASER DIRKS', 'GLASER-DIRKS', 'Glaser-Dirks Flugzeugbau',
                        'Glaser-dirks-flugzeubau']),'Make'] = 'Glaser-Dirks'

df[df['Make'] == 'Glaser-Dirks'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN       16
Glider     7
Name: count, dtype: int64

In [1424]:
df[df['Make'].str.lower().str.startswith('sukh')].value_counts('Make')

Make
Sukhoi    15
SUKHOI     6
Name: count, dtype: int64

In [1425]:
df.loc[df['Make'].isin(['SUKHOI']),'Make'] = 'Sukhoi'

df[df['Make'] == 'Sukhoi'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         15
Airplane     6
Name: count, dtype: int64

In [1426]:
df[df['Make'].str.lower().str.startswith('eiri')].value_counts('Make')

Make
Eiriavion Oy    17
EIRIAVION OY     3
Name: count, dtype: int64

In [1427]:
df.loc[df['Make'].isin(['EIRIAVION OY']),'Make'] = 'Eiriavion Oy'

df[df['Make'] == 'Eiriavion Oy'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN       15
Glider     5
Name: count, dtype: int64

In [1428]:
df[df['Make'].str.lower().str.startswith('forne')].value_counts('Make')

Make
Forney    17
Name: count, dtype: int64

In [1429]:
df[df['Make'] == 'Forney'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         15
Airplane     2
Name: count, dtype: int64

In [1430]:
df[df['Make'].str.lower().str.startswith('curtiss')].value_counts('Make')

Make
Curtiss-wright    17
CURTISS WRIGHT    11
Curtiss            5
Curtiss-Wright     2
CURTISS            1
Curtiss Moses      1
Curtiss Wright     1
Name: count, dtype: int64

In [1431]:
df.loc[df['Make'].isin(['CURTISS WRIGHT', 'Curtiss-wright', 'Curtiss Wright']),'Make'] = 'Curtiss-Wright'

df[df['Make'] == 'Curtiss-Wright'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    17
NaN         14
Name: count, dtype: int64

In [1432]:
df[df['Make'].str.lower().str.startswith('classic')].value_counts('Make')

Make
Classic Aircraft Corp.    16
CLASSIC AIRCRAFT CORP      5
Classic Aircraft Corp      2
Name: count, dtype: int64

In [1433]:
df.loc[df['Make'].isin(['Classic Aircraft Corp.', 'CLASSIC AIRCRAFT CORP', 'Classic Aircraft Corp']),'Make'] = 'Classic Aircraft'

df[df['Make'] == 'Classic Aircraft'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         16
Airplane     7
Name: count, dtype: int64

In [1434]:
df[df['Make'].str.lower().str.startswith('adams')].value_counts('Make')

Make
Adams                 19
ADAMS                  1
ADAMS BALLOONS LLC     1
ADAMS DENNIS ALLEN     1
ADAMS DONALD L         1
ADAMS JOHN R JR        1
Adams Balloon          1
Name: count, dtype: int64

In [1435]:
df.loc[df['Make'].isin(['ADAMS BALLOONS LLC', 'Adams Balloon']),'Make'] = 'Adams Balloons'

In [1436]:
df[df['Make'].str.lower().str.startswith('adams')].value_counts('Make')

Make
Adams                 19
Adams Balloons         2
ADAMS                  1
ADAMS DENNIS ALLEN     1
ADAMS DONALD L         1
ADAMS JOHN R JR        1
Name: count, dtype: int64

In [1437]:
df[df['Make'] == 'Adams'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         14
Airplane     3
Balloon      2
Name: count, dtype: int64

In [1438]:
df[df['Make'] == 'Adams'].value_counts('Model', dropna=False)

Model
A55S                    6
A55                     4
AB                      2
A-60                    1
A60S                    1
AX-9                    1
Airborne Australia O    1
KITFOX                  1
RV-6A                   1
SONERAI II              1
Name: count, dtype: int64

Balloons: A55S, A55, AB, A-60, A60S, AX-9; Planes: Airborne Australia O, KITFOX, RV-6A, SONERAI II

In [1439]:
df[df['Model'].isin(['SONERAI II']) & df['Make'].isin(['Adams'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    1
Name: count, dtype: int64

In [1440]:
df.loc[(df['Model'].isin(['A55S', 'A55', 'AB', 'A-60', 'A60S', 'AX-9'])) & (df['Make'].isin(['Adams'])), 'Aircraft_Category'] = 'Balloon'
df.loc[(df['Model'].isin(['Airborne Australia O', 'KITFOX', 'RV-6A', 'SONERAI II'])) & (df['Make'].isin(['Adams'])), 'Aircraft_Category'] = 'Airplane'

In [1441]:
df[df['Make'] == 'Adams'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Balloon     15
Airplane     4
Name: count, dtype: int64

In [1442]:
df[df['Make'].str.lower().str.startswith('adams')].value_counts('Make')

Make
Adams                 19
Adams Balloons         2
ADAMS                  1
ADAMS DENNIS ALLEN     1
ADAMS DONALD L         1
ADAMS JOHN R JR        1
Name: count, dtype: int64

In [1443]:
df.loc[(df['Make'].isin(['Adams'])) & (df['Aircraft_Category'].isin(['Balloon'])), 'Make'] = 'Adams Balloons'

df[df['Make'].str.lower().str.startswith('adams')].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Balloon     16
Airplane     8
NaN          1
Name: count, dtype: int64

In [1444]:
df[df['Make'].str.lower().str.startswith('stearm')].value_counts('Make')

Make
Stearman             23
STEARMAN AIRCRAFT     5
STEARMAN              4
Name: count, dtype: int64

In [1445]:
df.loc[df['Make'].isin(['STEARMAN AIRCRAFT', 'STEARMAN']),'Make'] = 'Stearman'

df[df['Make'] == 'Stearman'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    19
NaN         13
Name: count, dtype: int64

In [1446]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Glaser-Dirks', 'Eiriavion Oy']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Adams Balloons']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Stearman', 'Extra Flugzeugbau', 'Sukhoi', 'Forney', 'Curtiss-Wright', 'Classic Aircraft']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
Aerofab Inc.                 12
American General Aircraft    12
Culver                       12
Air & Space                  12
Interstate                   12
Bombardier                   12
Raytheon                     12
Barnes                       11
Pzl-bielsko                  11
Name: count, dtype: int64

In [1447]:
df[df['Make'].str.lower().str.startswith('aerof')].value_counts('Make')

Make
Aerofab Inc.     12
AEROFAB INC       2
AEROFAB INC.      1
AeroFab           1
Aerofab, Inc.     1
Name: count, dtype: int64

In [1448]:
df.loc[df['Make'].isin(['Aerofab Inc.', 'AEROFAB INC', 'AEROFAB INC.', 'Aerofab, Inc.']),'Make'] = 'Aerofab'

df[df['Make'] == 'Aerofab'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         13
Airplane     3
Name: count, dtype: int64

In [1449]:
df[df['Make'].str.lower().str.startswith('american g')].value_counts('Make')

Make
American General Aircraft    19
Name: count, dtype: int64

In [1450]:
df[df['Make'] == 'American General Aircraft'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         12
Airplane     7
Name: count, dtype: int64

In [1451]:
df[df['Make'].str.lower().str.startswith('culv')].value_counts('Make')

Make
Culver          15
CULVER           2
CULVER GLENN     1
Name: count, dtype: int64

In [1452]:
df.loc[df['Make'].isin(['CULVER']),'Make'] = 'Culver'

df[df['Make'] == 'Culver'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         12
Airplane     5
Name: count, dtype: int64

In [1453]:
df[df['Make'].str.lower().str.startswith('air &')].value_counts('Make')

Make
Air & Space    13
Name: count, dtype: int64

In [1454]:
df[df['Make'] == 'Air & Space'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN          12
Gyrocraft     1
Name: count, dtype: int64

In [1455]:
df[df['Make'].str.lower().str.startswith('inters')].value_counts('Make')

Make
Interstate    17
INTERSTATE     3
Name: count, dtype: int64

In [1456]:
df.loc[df['Make'].isin(['INTERSTATE']),'Make'] = 'Interstate'

df[df['Make'] == 'Interstate'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         12
Airplane     8
Name: count, dtype: int64

In [1457]:
df[df['Make'].str.lower().str.startswith('bomb')].value_counts('Make')

Make
BOMBARDIER INC                68
BOMBARDIER                    46
Bombardier                    29
Bombardier, Inc.              22
BOMBARDIER LEARJET CORP.       1
Bombardier Aerospace, Inc.     1
Bombardier Canadair            1
Name: count, dtype: int64

In [1458]:
df.loc[df['Make'].isin(['BOMBARDIER INC', 'BOMBARDIER', 'Bombardier, Inc.', 'BOMBARDIER LEARJET CORP.', 'Bombardier Aerospace, Inc.',
                       'Bombardier Canadair']),'Make'] = 'Bombardier'

df[df['Make'] == 'Bombardier'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    141
NaN          27
Name: count, dtype: int64

In [1459]:
df[df['Make'].str.lower().str.startswith('rayth')].value_counts('Make')

Make
RAYTHEON AIRCRAFT COMPANY      53
Raytheon Aircraft Company      17
Raytheon                       16
Raytheon Corporate Jets        16
RAYTHEON                       15
RAYTHEON COMPANY                1
RAYTHEON CORPORATE JETS INC     1
Raytheon Co                     1
Name: count, dtype: int64

In [1460]:
df.loc[df['Make'].isin(['RAYTHEON AIRCRAFT COMPANY', 'Raytheon Aircraft Company', 'Raytheon Corporate Jets', 'RAYTHEON', 'RAYTHEON COMPANY',
                       'RAYTHEON CORPORATE JETS INC', 'Raytheon Co']),'Make'] = 'Raytheon'

df[df['Make'] == 'Raytheon'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    88
NaN         32
Name: count, dtype: int64

In [1461]:
df[df['Make'].str.lower().str.startswith('barnes')].value_counts('Make')

Make
Barnes                           19
BARNES RICHARD B/HOWE MICHAEL     1
BARNES STEVEN D                   1
BARNES THOMAS A                   1
Name: count, dtype: int64

In [1462]:
df[df['Make'] == 'Barnes'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN        11
Balloon     8
Name: count, dtype: int64

In [1463]:
df[df['Make'].str.lower().str.startswith('pzl')].value_counts('Make')

Make
Pzl-mielec             22
Pzl-bielsko            11
Pzl                     7
PZL-SWIDNIK             5
PZL MIELEC              4
Pzl Warzawa-okecie      3
Pzl Okecie              1
Pzl-okecie              1
Pzl Warzawa-cnpsl       1
Pzl Swidnik             1
PZL                     1
PZL BIELSKO             1
PZL-Swidnik             1
PZL-BIELSKO             1
PZL Warszawa-Okecie     1
PZL SWIDNIK             1
PZL OKECIE              1
Pzl-swidnik             1
Name: count, dtype: int64

In [1464]:
# The different PZL permutations are all connected
df.loc[df['Make'].isin(['Pzl-mielec', 'Pzl-bielsko', 'Pzl', 'PZL-SWIDNIK', 'PZL MIELEC', 'Pzl Warzawa-okecie', 'Pzl Okecie', 'Pzl-okecie',
                       'Pzl Warzawa-cnpsl', 'Pzl Swidnik', 'PZL BIELSKO', 'PZL-Swidnik', 'PZL-BIELSKO', 'PZL Warszawa-Okecie', 'PZL SWIDNIK',
                       'PZL OKECIE', 'Pzl-swidnik']),'Make'] = 'PZL'

df[df['Make'] == 'PZL'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN           29
Airplane      23
Glider        10
Helicopter     2
Name: count, dtype: int64

In [1465]:
# PZL made different categories of aircraft, so we could see if the models may inform what categories they are
df[df['Make'] == 'PZL'].value_counts('Model', dropna=False)

Model
M-18A                11
M18                   6
PW-5                  5
PZL-M-18              3
SZD-59                3
PZL-104 Wilga 35A     2
PW 5                  2
SZD-45A OGAR          2
PZL-104 WILGA 80      2
SW4                   1
PZL104                1
SW-4                  1
101                   1
SZD 50-3              1
SZD-42-2 JANTAR       1
SZD-48-3              1
SZD-50-3              1
SZD-55-1              1
SZD51                 1
SZD 55-1              1
PZL-104               1
PZL-104 WILGA 35A     1
PZL-104 35A           1
101A                  1
PW 6U                 1
MIG-17                1
M-18T                 1
M-18B                 1
M-18A DROMADER        1
M-18                  1
KOLIBER -150A         1
JANTAR 2A             1
80                    1
55-1                  1
104-80                1
104 Wilga 80          1
Wilga 104-80          1
Name: count, dtype: int64

In [1466]:
# PZL represents multiple cats of aircraft, so google provides the category for these models
df.loc[(df['Make'].isin(['PZL'])) & (df['Model'].isin(['PW-5', 'SZD-59', 'PW 5', 'SZD-45A OGAR', 'SZD 50-3', 'SZD-42-2 JANTAR', 'SZD-48-3',
                                                      'SZD-50-3', 'SZD-55-1', 'SZD51', 'SZD 55-1', 'PW 6U', 'JANTAR 2A',
                                                       '55-1'])), 'Aircraft_Category'] = 'Glider'

df.loc[(df['Make'].isin(['PZL'])) & (df['Model'].isin(['SW4', 'SW-4'])), 'Aircraft_Category'] = 'Helicopter'

df.loc[(df['Make'].isin(['PZL'])) & (df['Model'].isin(['M-18A', 'M18', 'PZL-M-18', 'PZL-104 Wilga 35A', 'PZL-104 WILGA 80', 'PZL104',
                                                      '101', 'PZL-104', 'PZL-104 WILGA 35A', 'PZL-104 35A', '101A', 'MIG-17', 'M-18T', 'M-18B',
                                                      'M-18A DROMADER', 'M-18', 'KOLIBER -150A', '80', '104-80', '104 Wilga 80',
                                                      'Wilga 104-80'])), 'Aircraft_Category'] = 'Airplane'

df[df['Make'] == 'PZL'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane      40
Glider        22
Helicopter     2
Name: count, dtype: int64

In [1467]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Air & Space']),'Aircraft_Category'] = 'Gyrocraft'
df.loc[df['Make'].isin(['Barnes']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Aerofab', 'American General Aircraft', 'Culver', 'Interstate', 'Bombardier', 'Raytheon']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                   45
Naval Aircraft Factory    11
Hispano Aviacion          11
Partenavia                11
Piccard                   10
Eagle Aircraft Co.        10
Atr                       10
Great Lakes               10
Nord (sncan)               9
Meyers Aircraft Co.        9
Name: count, dtype: int64

In [1468]:
df[df['Make'].str.lower().str.startswith('nava')].value_counts('Make')

Make
Naval Aircraft Factory    12
NAVAL AIRCRAFT FACTORY     1
Name: count, dtype: int64

In [1469]:
df.loc[df['Make'].isin(['NAVAL AIRCRAFT FACTORY']),'Make'] = 'Naval Aircraft Factory'

df[df['Make'] == 'Naval Aircraft Factory'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         11
Airplane     2
Name: count, dtype: int64

In [1470]:
df[df['Make'].str.lower().str.startswith('hispa')].value_counts('Make')

Make
Hispano Aviacion    11
Name: count, dtype: int64

In [1471]:
df[df['Make'] == 'Hispano Aviacion'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN    11
Name: count, dtype: int64

In [1472]:
df[df['Make'].str.lower().str.startswith('parte')].value_counts('Make')

Make
Partenavia           12
PARTENAVIA            4
PARTENAVIA S.P.A.     1
PARTENAVIA SPA        1
Name: count, dtype: int64

In [1473]:
df.loc[df['Make'].isin(['PARTENAVIA', 'PARTENAVIA S.P.A.', 'PARTENAVIA SPA']),'Make'] = 'Partenavia'

df[df['Make'] == 'Partenavia'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         11
Airplane     7
Name: count, dtype: int64

In [1474]:
df[df['Make'].str.lower().str.startswith('picc')].value_counts('Make')

Make
Piccard    10
Name: count, dtype: int64

In [1475]:
df[df['Make'] == 'Piccard'].value_counts('Model', dropna=False)

Model
AX-6    7
AX6     1
AX6W    1
P-80    1
Name: count, dtype: int64

Google confirms that the models listed under Piccard are balloons

In [1476]:
df[df['Make'].str.lower().str.startswith('eagle a')].value_counts('Make')

Make
Eagle Aircraft Co.    42
EAGLE AIRCRAFT CO      1
Name: count, dtype: int64

In [1477]:
df.loc[df['Make'].isin(['Eagle Aircraft Co.', 'EAGLE AIRCRAFT CO']),'Make'] = 'Eagle Aircraft'

df[df['Make'] == 'Eagle Aircraft'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    33
NaN         10
Name: count, dtype: int64

In [1478]:
df[df['Make'].str.lower().str.startswith('atr')].value_counts('Make')

Make
ATR    19
Atr    16
Name: count, dtype: int64

In [1479]:
df.loc[df['Make'].isin(['Atr']),'Make'] = 'ATR'

df[df['Make'] == 'ATR'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    22
NaN         13
Name: count, dtype: int64

In [1480]:
df[df['Make'].str.lower().str.startswith('great')].value_counts('Make')

Make
Great Lakes                     50
GREAT LAKES                     14
Great Lakes Aircraft Company     1
Name: count, dtype: int64

In [1481]:
df.loc[df['Make'].isin(['GREAT LAKES', 'Great Lakes Aircraft Company']),'Make'] = 'Great Lakes'

df[df['Make'] == 'Great Lakes'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    55
NaN         10
Name: count, dtype: int64

In [1482]:
df[df['Make'].str.lower().str.startswith('nord')].value_counts('Make')

Make
Nord (sncan)           9
NORD                   2
Nord                   2
NORDQUIST RICHARD A    1
Nord (SNCAN)           1
Nord Aviation          1
Name: count, dtype: int64

In [1483]:
df.loc[df['Make'].isin(['Nord (sncan)', 'NORD', 'Nord (SNCAN)', 'Nord Aviation']),'Make'] = 'Nord'

df[df['Make'] == 'Nord'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         11
Airplane     4
Name: count, dtype: int64

In [1484]:
df[df['Make'].str.lower().str.startswith('meyers')].value_counts('Make')

Make
Meyers Aircraft Co.      10
MEYERS                    7
MEYERS INDUSTRIES INC     3
Meyers                    1
Name: count, dtype: int64

In [1485]:
df.loc[df['Make'].isin(['MEYERS', 'MEYERS INDUSTRIES INC', 'Meyers']),'Make'] = 'Meyers Aircraft Co.'

df[df['Make'] == 'Meyers Aircraft Co.'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    11
NaN         10
Name: count, dtype: int64

In [1486]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Piccard']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Naval Aircraft Factory', 'Hispano Aviacion', 'Partenavia', 'Eagle Aircraft', 'ATR', 'Great Lakes', 'Nord',
                       'Meyers Aircraft Co.']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                       45
Fleet                          9
Mcculloch                      9
Nanchang                       9
Head Balloons, Inc.            8
Aero Vodochody Aero. Works     8
Centrair                       8
Commander                      8
Quicksilver                    8
Chance Vought                  7
Name: count, dtype: int64

In [1487]:
df[df['Make'].str.lower().str.startswith('fleet')].value_counts('Make')

Make
Fleet             10
FLEET              6
FLEETWOOD JACK     1
Name: count, dtype: int64

In [1488]:
df.loc[df['Make'].isin(['FLEET']),'Make'] = 'Fleet'

df[df['Make'] == 'Fleet'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         9
Airplane    7
Name: count, dtype: int64

In [1489]:
df[df['Make'].str.lower().str.startswith('mccu')].value_counts('Make')

Make
Mcculloch            10
MCCULLOCH JERRY       1
MCCURRY CHARLES P     1
McCurdy               1
Mcculley Ronald       1
Mccullough            1
Name: count, dtype: int64

In [1490]:
df[df['Make'] == 'Mcculloch'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN          9
Gyrocraft    1
Name: count, dtype: int64

In [1491]:
df[df['Make'].str.lower().str.startswith('nanc')].value_counts('Make')

Make
Nanchang          11
NANCHANG CHINA     6
NANCHANG           4
Nanchang China     1
Name: count, dtype: int64

In [1492]:
df.loc[df['Make'].isin(['NANCHANG CHINA', 'NANCHANG', 'Nanchang China']),'Make'] = 'Nanchang'

df[df['Make'] == 'Nanchang'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    12
NaN         10
Name: count, dtype: int64

In [1493]:
df[df['Make'].str.lower().str.startswith('head')].value_counts('Make')

Make
Head Balloons, Inc.    10
HEAD BALLOONS INC       5
HEAD                    3
Head                    2
HEAD BALLOONS INC.      1
Head Balloons Inc.      1
Name: count, dtype: int64

In [1494]:
df.loc[df['Make'].isin(['Head Balloons, Inc.', 'HEAD BALLOONS INC', 'HEAD', 'Head', 'HEAD BALLOONS INC.',
                        'Head Balloons Inc.']),'Make'] = 'Head Balloons'

df[df['Make'] == 'Head Balloons'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Balloon    14
NaN         8
Name: count, dtype: int64

In [1495]:
df[df['Make'].str.lower().str.startswith('aero v')].value_counts('Make')

Make
AERO VODOCHODY                10
Aero Vodochody Aero. Works    10
Aero Vodochody                 8
Aero Vodochody Aero Works      1
Name: count, dtype: int64

In [1496]:
df.loc[df['Make'].isin(['AERO VODOCHODY', 'Aero Vodochody Aero. Works', 'Aero Vodochody Aero Works']),'Make'] = 'Aero Vodochody'

df[df['Make'] == 'Aero Vodochody'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         15
Airplane    14
Name: count, dtype: int64

In [1497]:
df[df['Make'].str.lower().str.startswith('centrai')].value_counts('Make')

Make
Centrair    9
CENTRAIR    2
Name: count, dtype: int64

In [1498]:
df.loc[df['Make'].isin(['CENTRAIR']),'Make'] = 'Centrair'

df[df['Make'] == 'Centrair'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN       8
Glider    3
Name: count, dtype: int64

In [1499]:
df[df['Make'].str.lower().str.startswith('comma')].value_counts('Make')

Make
Commander                     12
COMMANDER AIRCRAFT CO          2
Commander Aircraft Company     2
COMMANDER                      1
Commander Aircraft             1
Name: count, dtype: int64

In [1500]:
df.loc[df['Make'].isin(['COMMANDER AIRCRAFT CO', 'Commander Aircraft Company', 'COMMANDER', 'Commander Aircraft']),'Make'] = 'Commander'

df[df['Make'] == 'Commander'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    9
NaN         9
Name: count, dtype: int64

In [1501]:
df[df['Make'].str.lower().str.startswith('quick')].value_counts('Make')

Make
QUICKSILVER                       36
Quickie                           22
Quicksilver                       18
QUICKSILVER ENTERPRISES INC        2
QUICKSILVER MANUFACTURING INC      2
QUICKIE                            1
QUICKSILVER AIRCRAFT               1
QUICKSILVER AIRCRAFT CO            1
QUICKSILVER EIPPER ACFT INC        1
QUICKSILVER MFG                    1
Quickie-myers                      1
Quicksilver Aircraft Northeast     1
Quicksilver II                     1
Quicksilver Manufacturing          1
Name: count, dtype: int64

In [1502]:
df.loc[df['Make'].isin(['QUICKSILVER', 'QUICKSILVER AIRCRAFT', 'QUICKSILVER AIRCRAFT CO', 'Quicksilver Aircraft Northeast',
                        'QUICKSILVER EIPPER ACFT INC', 'QUICKSILVER ENTERPRISES INC', 'Quicksilver II', 'Quicksilver Manufacturing',
                        'QUICKSILVER MANUFACTURING INC', 'QUICKSILVER MFG']),'Make'] = 'Quicksilver'

df.loc[df['Make'].isin(['QUICKIE', 'Quickie-myers']),'Make'] = 'Quickie'
df[df['Make'] == 'Quickie'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    24
Name: count, dtype: int64

In [1503]:
df[df['Make'] == 'Quicksilver'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane      53
NaN            8
Ultralight     3
Unknown        1
Name: count, dtype: int64

In [1504]:
df[df['Make'] == 'Quicksilver'].value_counts('Model', dropna=False)

Model
MXL II               9
MXL II Sport         4
Sport 2S             4
MX II Sprint         3
MXL II SPORT         3
MXII                 3
SPORT 2S             3
GT-500               3
GT500                2
MX                   2
GT400                2
MXII SPORT           2
GT 400               2
Sport                2
QUICKSILVER MX II    1
SPRINT II            1
Sport II             1
Sprint II            1
SPRINT 2             1
SPORT2S R582         1
SPORT IIS            1
SPORT 2S R582        1
2S                   1
MXLII                1
MXL2                 1
MXL-2                1
MXL Sport II         1
MXL SPORT            1
2S SPORT             1
MX2                  1
MX-2                 1
MX SPRINT II         1
GT-400 R503          1
GT-400               1
UNKNOWN              1
Name: count, dtype: int64

Google says these models are all ultralight aircraft, so the airplane designation is incorrect, and so all Quicksilver entries can be changed to ultralight.

In [1505]:
df[df['Make'].str.lower().str.startswith('chanc')].value_counts('Make')

Make
Chance Vought      7
CHANCE VOUGHT      2
CHANCEY GERRY M    1
Name: count, dtype: int64

In [1506]:
df.loc[df['Make'].isin(['CHANCE VOUGHT']),'Make'] = 'Chance Vought'
df[df['Make'] == 'Chance Vought'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         7
Airplane    2
Name: count, dtype: int64

In [1507]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Head Balloons']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Centrair']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Quicksilver']),'Aircraft_Category'] = 'Ultralight'
df.loc[df['Make'].isin(['Mcculloch']),'Aircraft_Category'] = 'Gyrocraft'
df.loc[df['Make'].isin(['Fleet', 'Nanchang', 'Aero Vodochody', 'Commander', 'Chance Vought']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                           45
Reims Aviation                     7
Temco                              7
Nihon                              7
Monocoupe Aircraft                 7
Mitchell                           7
Slingsby                           7
Smith                              7
Avian Balloon                      7
Government Aircraft Fact (gaf)     6
Name: count, dtype: int64

In [1508]:
df[df['Make'].str.lower().str.startswith('reims')].value_counts('Make')

Make
Reims Aviation           9
Reims                    5
REIMS                    4
REIMS AVIATION SA        2
REIMS AVIATION S.A.      1
REIMS-CESSNA             1
REims                    1
Reims Aviation Cessna    1
Reims-Cessna             1
Name: count, dtype: int64

In [1509]:
df.loc[df['Make'].isin(['Reims Aviation', 'REIMS', 'REIMS AVIATION SA', 'REIMS AVIATION S.A.', 'REIMS-CESSNA', 'REims', 'Reims Aviation Cessna',
                       'Reims-Cessna']),'Make'] = 'Reims'
df[df['Make'] == 'Reims'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    17
NaN          8
Name: count, dtype: int64

In [1510]:
df[df['Make'].str.lower().str.startswith('temc')].value_counts('Make')

Make
Temco             21
TEMCO              8
Temco Luscombe     1
Name: count, dtype: int64

In [1511]:
df.loc[df['Make'].isin(['TEMCO', 'Temco Luscombe']),'Make'] = 'Temco'
df[df['Make'] == 'Temco'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    22
NaN          8
Name: count, dtype: int64

In [1512]:
df[df['Make'].str.lower().str.startswith('nih')].value_counts('Make')

Make
Nihon    8
Name: count, dtype: int64

In [1513]:
df[df['Make'] == 'Nihon'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         7
Airplane    1
Name: count, dtype: int64

In [1514]:
df[df['Make'].str.lower().str.startswith('mono')].value_counts('Make')

Make
Monocoupe Aircraft    8
Monocoupe             5
MONOCOUPE             2
Name: count, dtype: int64

In [1515]:
df.loc[df['Make'].isin(['Monocoupe Aircraft', 'MONOCOUPE']),'Make'] = 'Monocoupe'
df[df['Make'] == 'Monocoupe'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         9
Airplane    6
Name: count, dtype: int64

In [1516]:
df[df['Make'].str.lower().str.startswith('mitch')].value_counts('Make')

Make
Mitchell              8
MITCHELL DAVID N      1
MITCHELL DERRYLE V    1
Mitchell Ronald A     1
Mitchell/bede         1
Name: count, dtype: int64

In [1517]:
df[df['Make'] == 'Mitchell'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         7
Airplane    1
Name: count, dtype: int64

In [1518]:
df[df['Make'].str.lower().str.startswith('sling')].value_counts('Make')

Flushing oldest 200 entries.
  warn('Output cache limit (currently {sz} entries) hit.\n'


Make
Slingsby                 7
SLINGSBY                 2
Slingsby Aviation Plc    1
Name: count, dtype: int64

In [1519]:
df.loc[df['Make'].isin(['SLINGSBY', 'Slingsby Aviation Plc']),'Make'] = 'Slingsby'
df[df['Make'] == 'Slingsby'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         8
Airplane    1
Glider      1
Name: count, dtype: int64

Wikepedia says Slingsby makes both gliders and planes, so I'll look at the models

In [1520]:
df[df['Make'] == 'Slingsby'].value_counts('Model', dropna=False)

Model
41-2                 1
CAPSTAN TYPE 49B     1
DART T-51            1
KESTREL 19           1
Swallow Type T.45    1
T-51                 1
T59D KESTREL 19      1
T65A                 1
T67M 260             1
TYPE 43 SERIES 3F    1
Name: count, dtype: int64

Only the T67M 260 is an airplane, the rest are gliders

In [1521]:
df.loc[(df['Model'].isin(['T67M 260'])) & (df['Make'].isin(['Slingsby'])), 'Aircraft_Category'] = 'Airplane'

df.loc[(df['Model'].isin(['41-2', 'CAPSTAN TYPE 49B', 'DART T-51', 'KESTREL 19', 'Swallow Type T.45', 'T-51', 'T59D KESTREL 19',
                         'T65A', 'TYPE 43 SERIES 3F'])) & (df['Make'].isin(['Slingsby'])), 'Aircraft_Category'] = 'Glider'

In [1522]:
df[df['Make'].str.lower().str.startswith('smith')].value_counts('Make')

Make
SMITH                 20
Smith                 18
Smith Miniplane        3
Smith Aerostar         3
SMITH MINIPLANE        3
SMITHWICK/TREIDEL      1
Smith Mini             1
Smith Douglas J.       1
Smith Arthur Fox       1
Smith & R. Mathews     1
SMITH VILAS            1
SMITH ALBERT F         1
SMITH RICHARD D JR     1
SMITH EDWARD I         1
SMITH DENNIS P         1
SMITH Capella          1
SMITH BRET B           1
SMITH ALLEN            1
Smith Wylie Jay        1
Name: count, dtype: int64

In [1523]:
df[df['Make'] == 'Smith'].value_counts('Model', dropna=False)

Model
Aerostar 601P         4
AEROSTAR 600          2
Aerostar 601          2
MINIPLANE             2
AEROSTAR 601          1
LONG-EZ               1
MINIPLANE DSA-1       1
RV-4                  1
S-51D                 1
Stewart S51D          1
WCS-222 (BELL 47G)    1
Zodiac 601XL          1
Name: count, dtype: int64

In [1524]:
df[df['Make'] == 'Smith'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane      10
NaN            7
Helicopter     1
Name: count, dtype: int64

The one helicopter under Smith is the WCS-222; the rest are planes. So I'll fill in Smith's categories here by model.

In [1525]:
df.loc[(df['Model'].isin(['WCS-222 (BELL 47G)'])) & (df['Make'].isin(['Smith'])), 'Aircraft_Category'] = 'Helicopter'

df.loc[(df['Model'].isin(['Aerostar 601P', 'AEROSTAR 600', 'Aerostar 601', 'MINIPLANE', 'AEROSTAR 601', 'LONG-EZ', 'MINIPLANE DSA-1',
                         'RV-4', 'S-51D', 'Stewart S51D', 'Zodiac 601XL'])) & (df['Make'].isin(['Smith'])), 'Aircraft_Category'] = 'Airplane'

df[df['Make'] == 'Smith'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane      17
Helicopter     1
Name: count, dtype: int64

In [1526]:
df[df['Make'].str.lower().str.startswith('avian')].value_counts('Make')

Make
Avian Balloon    8
Avian            6
AVIAN BALLOON    1
Name: count, dtype: int64

In [1527]:
df.loc[df['Make'].isin(['Avian', 'AVIAN BALLOON']),'Make'] = 'Avian Balloon'
df[df['Make'] == 'Avian Balloon'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN        12
Balloon     3
Name: count, dtype: int64

In [1528]:
df[df['Make'].str.lower().str.startswith('governm')].value_counts('Make')

Make
Government Aircraft Fact (gaf)    6
Name: count, dtype: int64

In [1529]:
df[df['Make'] == 'Government Aircraft Fact (gaf)'].value_counts('Model', dropna=False)

Model
N24A          2
N22B          1
N22S          1
NOMAD 24A     1
NOMAD N22B    1
Name: count, dtype: int64

These are all planes

In [1530]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Avian Balloon']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Reims', 'Temco', 'Nihon', 'Monocoupe', 'Mitchell', 'Government Aircraft Fact (gaf)']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
Howard Aircraft Corp.         6
Intermountain Mfg. (imco)     6
Republic                      6
Molino Oy                     6
Travel Air                    6
American Blimp                6
Curtiss                       5
Grob                          5
Varga                         5
Name: count, dtype: int64

Just look for Makes that have more than just a handful of entries.

In [1531]:
df[df['Make'].str.lower().str.startswith('repu')].value_counts('Make')

Make
Republic    28
REPUBLIC     8
Name: count, dtype: int64

In [1532]:
df.loc[df['Make'].isin(['REPUBLIC']),'Make'] = 'Republic'
df[df['Make'] == 'Republic'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    30
NaN          6
Name: count, dtype: int64

In [1533]:
df.loc[df['Make'].isin(['Republic']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
Travel Air                    6
Howard Aircraft Corp.         6
American Blimp                6
Molino Oy                     6
Intermountain Mfg. (imco)     6
Beagle Aircraft               5
Aircoupe                      5
Bucker Flugzeugbau            5
Varga                         5
Name: count, dtype: int64

In [1534]:
df[df['Make'].str.lower().str.startswith('beag')].value_counts('Make')

Make
Beagle Aircraft    6
BEAGLE             2
Name: count, dtype: int64

In [1535]:
df.loc[df['Make'].isin(['Beagle Aircraft', 'BEAGLE']),'Make'] = 'Beagle'
df[df['Make'] == 'Beagle'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         5
Airplane    3
Name: count, dtype: int64

In [1536]:
df.loc[df['Make'].isin(['Beagle']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
Howard Aircraft Corp.         6
Travel Air                    6
Intermountain Mfg. (imco)     6
Molino Oy                     6
American Blimp                6
Laister                       5
Curtiss                       5
Lancair                       5
Bucker Flugzeugbau            5
Name: count, dtype: int64

In [1537]:
df[df['Make'].str.lower().str.startswith('lanc')].value_counts('Make')

Make
LANCAIR            19
Lancair            19
LANCAIR COMPANY     7
Lancair Company     2
LANCE M HOOLEY      1
Name: count, dtype: int64

In [1538]:
df.loc[df['Make'].isin(['LANCAIR', 'LANCAIR COMPANY', 'Lancair Company']),'Make'] = 'Lancair'
df[df['Make'] == 'Lancair'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    40
NaN          7
Name: count, dtype: int64

In [1539]:
df.loc[df['Make'].isin(['Lancair']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
Molino Oy                     6
Howard Aircraft Corp.         6
Intermountain Mfg. (imco)     6
Travel Air                    6
American Blimp                6
Porterfield                   5
Curtiss                       5
Silvaire                      5
General Balloon               5
Name: count, dtype: int64

In [1540]:
df[df['Make'].str.lower().str.startswith('silv')].value_counts('Make')

Make
SILVAIRE                    14
Silvaire                     6
SILVERLIGHT AVIATION LLC     2
Silveira Jonathan A          1
SilverLight Aviation         1
SilverLight Aviation LLC     1
Name: count, dtype: int64

In [1541]:
df.loc[df['Make'].isin(['SILVAIRE']),'Make'] = 'Silvaire'
df[df['Make'] == 'Silvaire'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    15
NaN          5
Name: count, dtype: int64

In [1542]:
df.loc[df['Make'].isin(['Silvaire']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
Howard Aircraft Corp.         6
Intermountain Mfg. (imco)     6
Travel Air                    6
Molino Oy                     6
American Blimp                6
Grob                          5
Aircoupe                      5
Bucker Flugzeugbau            5
Curtiss                       5
Name: count, dtype: int64

In [1543]:
df[df['Make'].str.lower().str.startswith('travel')].value_counts('Make')

Make
Travel Air    7
TRAVEL AIR    2
Name: count, dtype: int64

In [1544]:
df.loc[df['Make'].isin(['TRAVEL AIR']),'Make'] = 'Travel Air'
df[df['Make'] == 'Travel Air'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         6
Airplane    3
Name: count, dtype: int64

In [1545]:
df.loc[df['Make'].isin(['Travel Air']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
Howard Aircraft Corp.         6
American Blimp                6
Molino Oy                     6
Intermountain Mfg. (imco)     6
Grob                          5
Laister                       5
Curtiss                       5
Varga                         5
Bucker Flugzeugbau            5
Name: count, dtype: int64

In [1546]:
df[df['Make'].str.lower().str.startswith('grob')].value_counts('Make')

Make
Grob                9
GROB                3
GROB-WERKE          2
GROB AIRCRAFT AG    1
Name: count, dtype: int64

In [1547]:
df.loc[df['Make'].isin(['GROB', 'GROB-WERKE', 'GROB AIRCRAFT AG']),'Make'] = 'Grob'
df[df['Make'] == 'Grob'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Glider      6
NaN         5
Airplane    4
Name: count, dtype: int64

In [1548]:
df[df['Make'] == 'Grob'].value_counts('Model', dropna=False)

Model
G103               4
G 120A             2
G102               2
120A-1             1
G 103 TWIN II      1
G 180              1
G103 TWIN ASTIR    1
G103 Twin Astir    1
G120A              1
G120TP-A           1
Name: count, dtype: int64

In [1549]:
df.loc[(df['Model'].isin(['G103', 'G102', 'G 103 TWIN II', 'G103 TWIN ASTIR',
                          'G103 Twin Astir'])) & (df['Make'].isin(['Grob'])), 'Aircraft_Category'] = 'Glider'

df.loc[(df['Model'].isin(['G 120A', '120A-1', 'G 180', 'G120A', 'G120TP-A'])) & (df['Make'].isin(['Grob'])), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
American Blimp                6
Howard Aircraft Corp.         6
Intermountain Mfg. (imco)     6
Molino Oy                     6
Porterfield                   5
Varga                         5
Bucker Flugzeugbau            5
Curtiss                       5
Aircoupe                      5
Name: count, dtype: int64

In [1550]:
df[df['Make'].str.lower().str.startswith('varg')].value_counts('Make')

Make
Varga                   23
VARGA AIRCRAFT CORP.     2
Name: count, dtype: int64

In [1551]:
df.loc[df['Make'].isin(['VARGA AIRCRAFT CORP.']),'Make'] = 'Varga'
df[df['Make'] == 'Varga'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    20
NaN          5
Name: count, dtype: int64

In [1552]:
df.loc[df['Make'].isin(['Varga']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
American Blimp                6
Howard Aircraft Corp.         6
Molino Oy                     6
Intermountain Mfg. (imco)     6
Curtiss                       5
Aircoupe                      5
General Balloon               5
Bucker Flugzeugbau            5
Laister                       5
Name: count, dtype: int64

In [1553]:
df[df['Make'].str.lower().str.startswith('airc')].value_counts('Make')

Make
Aircraft Mfg & Dev. Co. (amd)     10
Aircoupe                           6
AIRCRAFT MFG & DEVELOPMENT CO      4
AIRCRAFT MFG & DVLPMT CO           2
Aircraft Parts & Dev. Corp         2
AIRCRAFT INDUSTRIES A.S.           1
AIRCRAFT MFG & DESIGN LLC          1
Aircraft Mfg & Design LLC          1
Aircraft Mfg & Dev. Co.            1
Aircraft Mfg & Dev. Co. (AMD)      1
Aircraft Mfg & Development Co.     1
Name: count, dtype: int64

In [1554]:
df.loc[df['Make'].isin(['Aircraft Mfg & Dev. Co. (amd)', 'AIRCRAFT MFG & DEVELOPMENT CO', 'AIRCRAFT MFG & DVLPMT CO',
                       'AIRCRAFT MFG & DESIGN LLC', 'Aircraft Mfg & Design LLC', 'Aircraft Mfg & Dev. Co.', 'Aircraft Mfg & Dev. Co. (AMD)',
                       'Aircraft Mfg & Development Co.']),'Make'] = 'Aircraft Mfg. & Dev. Co.'

df[df['Make'] == 'Aircraft Mfg. & Dev. Co.'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Airplane    17
NaN          4
Name: count, dtype: int64

In [1555]:
df.loc[df['Make'].isin(['Aircraft Mfg. & Dev. Co.']),'Aircraft_Category'] = 'Airplane'

In [1556]:
df[df['Make'] == 'Aircoupe'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         5
Airplane    1
Name: count, dtype: int64

In [1557]:
df.loc[df['Make'].isin(['Aircoupe']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
American Blimp                6
Intermountain Mfg. (imco)     6
Howard Aircraft Corp.         6
Molino Oy                     6
Laister                       5
Bucker Flugzeugbau            5
General Balloon               5
Curtiss                       5
Porterfield                   5
Name: count, dtype: int64

In [1558]:
df[df['Make'].str.lower().str.startswith('bucker')].value_counts('Make')

Make
Bucker Flugzeugbau    5
Bucker Jungmann       2
BUCKER JUNGMANN       1
BUCKER JUNGMEISTER    1
Bucker                1
Bucker-jungmann       1
Name: count, dtype: int64

In [1559]:
df.loc[df['Make'].isin(['Bucker Jungmann', 'BUCKER JUNGMANN', 'BUCKER JUNGMEISTER', 'Bucker', 'Bucker-jungmann']),'Make'] = 'Bucker Flugzeugbau'

df[df['Make'] == 'Bucker Flugzeugbau'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         7
Airplane    4
Name: count, dtype: int64

In [1560]:
df.loc[df['Make'].isin(['Bucker Flugzeugbau']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
Molino Oy                     6
Intermountain Mfg. (imco)     6
Howard Aircraft Corp.         6
American Blimp                6
Scheibe Flugzeugbau           5
General Balloon               5
Laister                       5
Curtiss                       5
Porterfield                   5
Name: count, dtype: int64

In [1561]:
df['Aircraft_Category'].value_counts(dropna=False)

Aircraft_Category
Airplane             73236
Helicopter            7708
Glider                1257
NaN                    670
Balloon                662
Gyrocraft              209
Weight-Shift           160
Ultralight             110
Powered Parachute       90
WSFT                     9
Unknown                  6
Blimp                    4
Powered-Lift             2
UNK                      2
Rocket                   1
ULTR                     1
Name: count, dtype: int64

In [1562]:
df[df['Aircraft_Category'] == 'Unknown']['Make'].value_counts()

Make
Varieze                           1
SUI RY, LAURINEN JANNE, KÄÄRIÄ    1
AMATEUR CONSTRUCTION              1
Parachute Icarus                  1
RANS                              1
QUAD CITY                         1
Name: count, dtype: int64

In [1563]:
df[df['Make'] == 'Varieze']['Aircraft_Category'].value_counts()

Aircraft_Category
Airplane    4
Unknown     1
Name: count, dtype: int64

In [1564]:
df.loc[df['Make'].isin(['Varieze']),'Aircraft_Category'] = 'Airplane'

df[df['Make'] == 'Unknown']['Model'].value_counts(dropna=False)

Model
Unknown                44
206                     1
Safari 400              1
BE20                    1
Rotorway                1
Supercat                1
Free Bird Sportlite     1
RV-4                    1
Challenger II           1
Avid Flyer              1
KR-2                    1
WINDWAGON               1
A330                    1
AVID FLYER              1
QUICKIE                 1
GLIDER TRYKE            1
HOBBS B8M               1
STARDUSTER TOO          1
SKYBOLT                 1
MIDGET MUSTANG          1
Airbus                  1
Name: count, dtype: int64

In [1565]:
df[df['Make'].isin(['Unknown']) & df['Model'].isin(['Safari 400'])].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    1
Name: count, dtype: int64

In [1566]:
df[df['Make'].isin(['Unknown']) & df['Model'].isin(['Safari 400'])].head()

Unnamed: 0,Event_Id,Investigation_Type,Accident_Number,Event_Date,Location,Country,Aircraft_damage,Aircraft_Category,Make,Model,...,Number_of_Engines,Engine_Type,Purpose_of_flight,Total_Fatal_Injuries,Total_Serious_Injuries,Total_Minor_Injuries,Total_Uninjured,Weather_Condition,Broad_phase_of_flight,Report_Status
71377,20120628X03954,Accident,CEN12WA400,2012-02-18,"Lahr, Germany",Germany,Destroyed,Helicopter,Unknown,Safari 400,...,1.0,Unknown,Personal,1.0,0.0,0.0,0.0,VMC,Unknown,Unknown


In [1567]:
df[df['Make'].str.lower().str.startswith('saf')].value_counts('Make')

Make
SAFARI    1
Name: count, dtype: int64

In [1568]:
df.loc[(df['Model'].isin(['Safari 400'])) & (df['Make'].isin(['Unknown'])), 'Make'] = 'Safari Helicopter'

df[df['Make'].str.lower().str.startswith('saf')].value_counts('Make')

Make
SAFARI               1
Safari Helicopter    1
Name: count, dtype: int64

In [1569]:
df.loc[df['Make'].isin(['SAFARI']),'Make'] = 'Safari Helicopter'

df[df['Make'] == 'Safari Helicopter'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
Helicopter    2
Name: count, dtype: int64

In [1570]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
Molino Oy                     6
Intermountain Mfg. (imco)     6
Howard Aircraft Corp.         6
American Blimp                6
Scheibe Flugzeugbau           5
General Balloon               5
Laister                       5
Curtiss                       5
Porterfield                   5
Name: count, dtype: int64

In [1571]:
df[df['Make'] == 'Molino Oy'].value_counts('Model', dropna=False)

Model
PIK-20       2
PIK-20B      2
MU2-2B-25    1
PIK 20       1
PIK 20E      1
Name: count, dtype: int64

In [1572]:
df[df['Make'].isin(['Molino Oy']) & df['Model'].isin(['MU2-2B-25'])].value_counts('Aircraft_Category')

Aircraft_Category
Airplane    1
Name: count, dtype: int64

In [1573]:
df[df['Model'].str.lower().str.startswith('mu2')].value_counts('Make')

Make
Mitsubishi    11
Molino Oy      1
Name: count, dtype: int64

MU2-2B-25 is a model of a Mitsubishi airplane. Molino Oy seems to make gliders exclusively. So this entry was entered incorrectly and the make should be changed to Mitsubishi.

In [1574]:
df.loc[(df['Model'].isin(['MU2-2B-25'])) & (df['Make'].isin(['Molino Oy'])), 'Make'] = 'Mitsubishi'

df[df['Make'] == 'Molino Oy'].value_counts('Model', dropna=False)

Model
PIK-20     2
PIK-20B    2
PIK 20     1
PIK 20E    1
Name: count, dtype: int64

In [1575]:
df.loc[df['Make'].isin(['Molino Oy']),'Aircraft_Category'] = 'Glider'

In [1576]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
Howard Aircraft Corp.         6
American Blimp                6
Intermountain Mfg. (imco)     6
General Balloon               5
Curtiss                       5
Porterfield                   5
Scheibe Flugzeugbau           5
Laister                       5
Sabreliner Corp.              4
Name: count, dtype: int64

In [1577]:
df[df['Make'] == 'Howard Aircraft Corp.'].value_counts('Model', dropna=False)

Model
DGA-15P      9
500          1
GFA-15P      1
TIERRA II    1
Name: count, dtype: int64

The 500 model is an airplane made by Howard Aero Incorporated, not Howard Aircraft. The Tierra II airplane is made by Teratorn Aircraft.

In [1578]:
df.loc[(df['Model'].isin(['500'])) & (df['Make'].isin(['Howard Aircraft Corp.'])), 'Make'] = 'Howard Aero Incorporated'
df.loc[(df['Model'].isin(['TIERRA II'])) & (df['Make'].isin(['Howard Aircraft Corp.'])), 'Make'] = 'Teratorn Aircraft'

df[df['Make'].str.lower().str.startswith('howard aero')].value_counts('Make')

Make
Howard Aero Incorporated    1
Name: count, dtype: int64

In [1579]:
df[df['Make'].str.lower().str.startswith('howard')].value_counts('Make')

Make
Howard Aircraft Corp.       10
HOWARD AIRCRAFT              3
Howard                       2
Howard Aircraft              2
HOWARD                       1
HOWARD M. SHEPHERD           1
Howard Aero Incorporated     1
Howard Steven C              1
Howard William C             1
Name: count, dtype: int64

In [1580]:
df.loc[df['Make'].isin(['Howard Aircraft Corp.', 'HOWARD AIRCRAFT']),'Make'] = 'Howard Aircraft'

df[df['Make'].str.lower().str.startswith('howard')].value_counts('Make')

Make
Howard Aircraft             15
Howard                       2
HOWARD                       1
HOWARD M. SHEPHERD           1
Howard Aero Incorporated     1
Howard Steven C              1
Howard William C             1
Name: count, dtype: int64

In [1581]:
df[df['Make'] == 'Howard Aircraft'].value_counts('Aircraft_Category', dropna=False)

Aircraft_Category
NaN         8
Airplane    7
Name: count, dtype: int64

In [1582]:
df.loc[df['Make'].isin(['Howard Aircraft']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Make
Unknown                      45
American Blimp                6
Intermountain Mfg. (imco)     6
General Balloon               5
Porterfield                   5
Laister                       5
Curtiss                       5
Scheibe Flugzeugbau           5
Colonial                      4
Artic Aircraft Corp.          4
Name: count, dtype: int64

We can fix some of these empty categories through web search

In [1583]:
df.loc[df['Make'].isin(['American Blimp']),'Aircraft_Category'] = 'Blimp'
df.loc[df['Make'].isin(['General Balloon']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Laister', 'Scheibe Flugzeugbau']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Intermountain Mfg. (imco)', 'Porterfield', 'Colonial', 'Artic Aircraft Corp.', 'Curtiss']),'Aircraft_Category'] = 'Airplane'

df['Aircraft_Category'].value_counts(dropna=False)

Aircraft_Category
Airplane             73269
Helicopter            7708
Glider                1273
Balloon                667
NaN                    611
Gyrocraft              209
Weight-Shift           160
Ultralight             110
Powered Parachute       90
Blimp                   10
WSFT                     9
Unknown                  5
Powered-Lift             2
UNK                      2
Rocket                   1
ULTR                     1
Name: count, dtype: int64

At this point, the empty values in the category column have been reduced from about 56,000 to about 600. How does the current dataset look?

In [1584]:
df.loc[df['Aircraft_Category'].isin(['UNK']), 'Aircraft_Category'] = 'Unknown'
df.loc[df['Aircraft_Category'].isin(['ULTR']), 'Aircraft_Category'] = 'Ultralight'
df.loc[df['Aircraft_Category'].isin(['WSFT']), 'Aircraft_Category'] = 'Weight-Shift'

In [1585]:
df['Aircraft_Category'] = df['Aircraft_Category'].fillna('Unknown')

In [1586]:
df['Aircraft_Category'].value_counts(dropna=False)

Aircraft_Category
Airplane             73269
Helicopter            7708
Glider                1273
Balloon                667
Unknown                618
Gyrocraft              209
Weight-Shift           169
Ultralight             111
Powered Parachute       90
Blimp                   10
Powered-Lift             2
Rocket                   1
Name: count, dtype: int64

In [1587]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 84127 entries, 0 to 88888
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event_Id                84127 non-null  object 
 1   Investigation_Type      84127 non-null  object 
 2   Accident_Number         84127 non-null  object 
 3   Event_Date              84127 non-null  object 
 4   Location                84127 non-null  object 
 5   Country                 84127 non-null  object 
 6   Aircraft_damage         84127 non-null  object 
 7   Aircraft_Category       84127 non-null  object 
 8   Make                    84127 non-null  object 
 9   Model                   84127 non-null  object 
 10  Amateur_Built           84127 non-null  object 
 11  Number_of_Engines       84127 non-null  object 
 12  Engine_Type             84127 non-null  object 
 13  Purpose_of_flight       84127 non-null  object 
 14  Total_Fatal_Injuries    84127 non-null  flo

# Exploratory Data Analysis

In [1588]:
# Percentages for Aircraft_Category
df['Aircraft_Category'].value_counts(normalize=True)

Aircraft_Category
Airplane             0.870933
Helicopter           0.091623
Glider               0.015132
Balloon              0.007928
Unknown              0.007346
Gyrocraft            0.002484
Weight-Shift         0.002009
Ultralight           0.001319
Powered Parachute    0.001070
Blimp                0.000119
Powered-Lift         0.000024
Rocket               0.000012
Name: proportion, dtype: float64

So our dataset categories are about 87% airplane, almost 10% helicopter, and the rest (including 'Unknown') cover the remaining few percent.
Let's compare the airplanes and helicopters to the damage stats.

In [1589]:
incidents_airplane = df[df['Aircraft_Category'] == 'Airplane']
incidents_helicopter = df[df['Aircraft_Category'] == 'Helicopter']

substantial_damage_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Substantial'].shape[0]
minor_damage_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Minor'].shape[0]
destroyed_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Destroyed'].shape[0]

substantial_damage_percent_airplane = substantial_damage_airplane / incidents_airplane.shape[0]*100
minor_damage_percent_airplane = minor_damage_airplane / incidents_airplane.shape[0]*100
destroyed_percent_airplane = destroyed_airplane / incidents_airplane.shape[0]*100

substantial_damage_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Substantial'].shape[0]
minor_damage_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Minor'].shape[0]
destroyed_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Destroyed'].shape[0]

substantial_damage_percent_helicopter = substantial_damage_helicopter / incidents_helicopter.shape[0]*100
minor_damage_percent_helicopter = minor_damage_helicopter / incidents_helicopter.shape[0]*100
destroyed_percent_helicopter = destroyed_helicopter / incidents_helicopter.shape[0]*100

print(f'The percentage of substantial damage in all airplane incidents is {substantial_damage_percent_airplane:.1f}%')
print(f'The percentage of substantial damage in all helicopter incidents is {substantial_damage_percent_helicopter:.1f}%')
print()
print(f'The percentage of minor damage in all airplane incidents is {minor_damage_percent_airplane:.1f}%')
print(f'The percentage of minor damage in all helicopter incidents is {minor_damage_percent_helicopter:.1f}%')
print()
print(f'The percentage of destroyed in all airplane incidents is {destroyed_percent_airplane:.1f}%')
print(f'The percentage of destroyed in all helicopter incidents is {destroyed_percent_helicopter:.1f}%')

The percentage of substantial damage in all airplane incidents is 72.8%
The percentage of substantial damage in all helicopter incidents is 73.2%

The percentage of minor damage in all airplane incidents is 3.2%
The percentage of minor damage in all helicopter incidents is 1.6%

The percentage of destroyed in all airplane incidents is 20.3%
The percentage of destroyed in all helicopter incidents is 22.7%


The damage level percentages are very similar for planes and helicopters.

For the business problem, I need to identify the best makes and models for my company to consider investing in. I've determined that my company is interested in single-engine airplanes and perhaps even helicopters for use in a short-range corporate transporation scenario.

So first I need to identify the single-engine aircraft in the dataset and divide those between planes and helicopters.

In [1590]:
single_engine_craft = df[df['Number_of_Engines'] == 1.0]

single_engine_planes = single_engine_craft[single_engine_craft['Aircraft_Category'] == 'Airplane']
single_engine_helicopters = single_engine_craft[single_engine_craft['Aircraft_Category'] == 'Helicopter']

print(single_engine_craft['Aircraft_Category'].value_counts())
print()
print(single_engine_planes['Aircraft_Category'].value_counts())
print()
print(single_engine_helicopters['Aircraft_Category'].value_counts())

Aircraft_Category
Airplane             57806
Helicopter            6346
Unknown                374
Gyrocraft              204
Weight-Shift           161
Glider                 148
Ultralight             103
Powered Parachute       86
Balloon                 15
Blimp                    2
Powered-Lift             1
Rocket                   1
Name: count, dtype: int64

Aircraft_Category
Airplane    57806
Name: count, dtype: int64

Aircraft_Category
Helicopter    6346
Name: count, dtype: int64


So this tells me I have 57806 plane accidents (single_engine_planes subset) and 6346 helicopter accidents (single_engine_helicopters subset) to work with. But not all of these planes and helicopters would be suitable to use as corporate transportation since a lot of them are going to be small, personal aircraft, not business aircraft. So some further narrowing is called for to identify single-engine business aircraft.

In [1591]:
# Look at the makes of the single-engine planes subset
print(single_engine_planes['Make'].value_counts())

Make
Cessna           23631
Piper            12174
Beech             3115
Grumman           1503
Mooney            1339
                 ...  
NAGEL                1
Cox Clyde H          1
GIER TRAVIS H        1
LAIRD                1
ORLICAN S R O        1
Name: count, Length: 3737, dtype: int64


Cessna currently makes a business single-engine prop plane in the Caravan series, known as the 208 model

In [1592]:
cessna_planes = single_engine_craft[single_engine_craft['Make'] == 'Cessna']

# show the models of cessna_planes that begin with 208
print(cessna_planes[cessna_planes['Model'].str.startswith('208')]['Model'].value_counts())
print(incidents_airplane[incidents_airplane['Model'].str.startswith('208')]['Model'].value_counts())

Model
208B           136
208             78
208A             8
208 Caravan      1
Name: count, dtype: int64
Model
208B           187
208            103
208A             8
208 Caravan      1
Name: count, dtype: int64


I see here that the Cessna 208 Models are not all categorized correctly in either the engine number column or the make column.

In [1593]:
df[df['Model'].str.contains('208')].value_counts('Number_of_Engines', dropna=False)

Number_of_Engines
1.0        228
Unknown     75
2.0          3
Name: count, dtype: int64

Since we know that the 208 models are single-engine planes, I can correct this here, adding more useful planes to my Cessna subset

In [1594]:
df[df['Model'].str.contains('208')].value_counts('Model', dropna=False)

Model
208B             187
208              103
208A               8
C208B              2
CE-208             2
208 Caravan        1
C-208 Caravan      1
C208               1
C208B Caravan      1
Name: count, dtype: int64

In [1596]:
df.loc[df['Model'].str.contains('208'), 'Number_of_Engines'] = 1.0

df[df['Model'].str.contains('208')].value_counts('Number_of_Engines', dropna=False)

Number_of_Engines
1.0    306
Name: count, dtype: int64

In [1597]:
df[df['Model'].str.contains('208')].value_counts('Make', dropna=False)

Make
Cessna                  303
TEXTRON AVIATION INC      3
Name: count, dtype: int64

In [1599]:
df.loc[df['Make'].isin(['TEXTRON AVIATION INC']),'Make'] = 'Cessna'

#refresh the single_engine and cessna subsets
single_engine_craft = df[df['Number_of_Engines'] == 1.0]
cessna_planes = single_engine_craft[single_engine_craft['Make'] == 'Cessna']

# show the models of cessna_planes that contain 208
print(cessna_planes[cessna_planes['Model'].str.contains('208')]['Model'].value_counts())

Model
208B             187
208              103
208A               8
CE-208             2
C208B              2
C-208 Caravan      1
C208               1
208 Caravan        1
C208B Caravan      1
Name: count, dtype: int64


Piper currently makes a business single-engine plane in the M series (M350, M500, M700), also known as PA-46 in our dataset.

In [1601]:
piper_planes = single_engine_craft[single_engine_craft['Make'] == 'Piper']

print(piper_planes[piper_planes['Model'].str.contains('PA-46')]['Model'].value_counts())

Model
PA-46-310P       79
PA-46-350P       65
PA-46-500TP      18
PA-46            15
PA-46-310         2
PA-46-350         2
PA-46T            2
PA-46-600TP       2
PA-46-310-P       1
PA-46P-350        1
PA-46-301P        1
PA-46/Jetprop     1
PA-46T-350P       1
PA-46-31P         1
Name: count, dtype: int64


In [1603]:
# Let's make sure we have all the Piper PA-46 models in the piper subset
print(df[df['Model'].str.contains('PA-46')]['Model'].value_counts())

Model
PA-46-310P       84
PA-46-350P       75
PA-46-500TP      23
PA-46            19
PA-46-350         4
PA-46T            3
PA-46-310         2
PA-46-600TP       2
PA-46-310-P       1
PA-46P-350        1
PA-46-301P        1
PA-46/Jetprop     1
PA-46T-350P       1
PA-46-500T        1
PA-46-31P         1
Name: count, dtype: int64


In [1604]:
# Is it the engine number that was entered incorrectly?
df[df['Model'].str.startswith('PA-46')].value_counts('Number_of_Engines', dropna=False)

Number_of_Engines
1.0        196
Unknown     22
0.0          1
Name: count, dtype: int64

In [1605]:
# or perhaps the make for the PA-46 is wrong
df[df['Model'].str.startswith('PA-46')].value_counts('Make', dropna=False)

Make
Piper                       214
NEW PIPER AIRCRAFT INC        3
New Piper                     1
New Piper Aircraft, Inc.      1
Name: count, dtype: int64

In [1606]:
# Correct the engine numbers for PA-46 models and combine the piper makes into 'Piper'
df.loc[df['Model'].str.startswith('PA-46'), 'Number_of_Engines'] = 1.0
df.loc[df['Make'].isin(['NEW PIPER AIRCRAFT INC', 'New Piper', 'New Piper Aircraft, Inc.']),'Make'] = 'Piper'

#refresh the single_engine and piper subsets
single_engine_craft = df[df['Number_of_Engines'] == 1.0]
piper_planes = single_engine_craft[single_engine_craft['Make'] == 'Piper']

print(piper_planes[piper_planes['Model'].str.startswith('PA-46')]['Model'].value_counts())

Model
PA-46-310P       84
PA-46-350P       75
PA-46-500TP      23
PA-46            19
PA-46-350         4
PA-46T            3
PA-46-310         2
PA-46-600TP       2
PA-46-310-P       1
PA-46P-350        1
PA-46-301P        1
PA-46/Jetprop     1
PA-46T-350P       1
PA-46-500T        1
PA-46-31P         1
Name: count, dtype: int64


In [1064]:
beech_planes = single_engine_craft[single_engine_craft['Make'] == 'Beech']

print(beech_planes['Model'].value_counts())

Model
A36             399
C23             191
V35B            154
35              134
F33A            109
               ... 
E 33              1
BE19              1
P-35 BONANZA      1
BE-35M            1
Sierra            1
Name: count, Length: 256, dtype: int64


# Conclusions

## Limitations

## Recommendations

## Next Steps