# Business Understanding

The business is interested in expanding its portfolio by becoming involved in the aviation industry, specifically as an owner/operator of aircraft for short-range corporate transportation that could involve light planes and helicopters.

I have been tasked with helping to identify the risks involved and suggesting which aircraft would be best suited to the company in the early stages of its new aviation division.

The stakeholders include not only the owners of the company, but also the department heads and employees of the aviation division who oversee and operate the aircraft for the company.

The goals for this project include recommending what kind of aircraft would provide the least risk for a commercial enterprise and suggesting certain operating protocols to help mitigate those risks.

# Data Understanding

The dataset being made available for this project is the National Transportation Safety Board aviation accident database as hosted on Kaggle.com at <a href="https://www.kaggle.com/datasets/khsamaha/aviation-accident-database-synopses" target="_blank">this link</a>. This dataset contains information about civil aviation accidents, mainly in the US, and includes many types of aircraft, from hot air balloons and powered parachutes to helicopters and airplanes. The current dataset contains 87,951 unique "Event ID" numbers, each representing an aircraft incident. It mainly covers the years 1982 through 2022, with just a handful of accidents recorded before 1982. The dataset has 31 columns for each accident investigation, including date and location, type of aircraft, make and model, injury severity and number of injured, aircraft damage level, phase of flight at the time of the accident, weather conditions, and the reasons for the accident once the investigation is complete.

As the project is centered on the risks of aviation, this dataset should be a valuable resource for determining what kinds of risks exist in operating aircraft and for recommending which types of aircraft would pose less of an investment risk. The columns detailing injury levels (Fatal, Serious, Minor, and Uninjured) for passengers and crew illuminate the human risks in aviation, while the aircraft damage levels speak to the financial risks.

A concern in working with this dataset is the number of missing values in certain columns, especially the aircraft category and accident reason columns. The "Aircraft Category" column is currently 64% empty, and the "Report Status" column (which provides a reason for the accident) is over 70% lacking in useful information. These two columns in particular will need in-depth cleaning and preparation.

# Data Preparation

## Data Cleaning

The dataset is named AviationData.csv and is located in the data folder.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('data/AviationData.csv', encoding='latin-1')

df.head()

Rename the columns to replace dots with underscores, since names containing dots can't be used with pandas attribute access (`df.Event_Id` works, `df.Event.Id` doesn't).

In [None]:
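# regex=False makes '.' a literal dot instead of the regex wildcard for "any character"
df.columns = df.columns.str.replace('.', '_', regex=False)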

df.head()

In [None]:
df.info()

### As Event ID provides a unique identifier for each incident, let's check for duplicate rows

In [None]:
df[df.duplicated(subset=['Event_Id'], keep=False)]

I see that although these duplicate rows represent separate aircraft in multi-aircraft accidents, the injury and fatality numbers appear to be combined for the whole event and repeated on each aircraft's row. Keeping every row would double-count those injuries whenever the injury values are summed in the analysis.

So let's keep only the first row for each Event_Id.
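A minimal sketch of the double-counting problem on a hypothetical three-row frame (the event IDs, makes, and counts are made up for illustration): summing every row counts the shared event's fatalities twice, while keeping the first row per `Event_Id` counts each event once.

```python
import pandas as pd

# Hypothetical toy data: one two-aircraft event (EV1) whose fatality total
# is repeated on each aircraft's row, plus one single-aircraft event (EV2)
toy = pd.DataFrame({
    'Event_Id': ['EV1', 'EV1', 'EV2'],
    'Make': ['Cessna', 'Piper', 'Beech'],
    'Total_Fatal_Injuries': [2.0, 2.0, 1.0],
})

# Summing every row counts EV1's two fatalities twice
naive_total = toy['Total_Fatal_Injuries'].sum()

# Keeping only the first row per Event_Id counts each event once
deduped = toy.drop_duplicates(subset=['Event_Id'], keep='first')
dedup_total = deduped['Total_Fatal_Injuries'].sum()

print(naive_total, dedup_total)
```

Note that `keep='first'` also discards the second aircraft's Make and Model, so any per-aircraft question would need the full rows.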

In [None]:
df = df.drop_duplicates(subset=['Event_Id'], keep='first')

# Double check to make sure duplicates have been removed
df[df.duplicated(subset=['Event_Id'], keep=False)]

In [None]:
df.info()

## Columns that are not needed
Remove certain columns that are mostly empty (and can't be filled in) and/or would not contain data useful to the intended analysis.

I want to make heavy use of the date, injury, damage, category, phase of flight, and report status columns.
Let's remove Latitude, Longitude, Airport_Code, Airport_Name, Registration_Number, FAR_Description, Schedule, Air_carrier, and Publication_Date, as those columns are either mostly empty or would not contribute to the analysis.
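One way to back up the "mostly empty" judgment is to compute the share of missing values per column. The sketch below does this on a hypothetical toy frame; the 50% cut-off is an assumed threshold, not one taken from this analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for a few of the real columns
toy = pd.DataFrame({
    'Event_Id': ['EV1', 'EV2', 'EV3', 'EV4'],
    'Latitude': [np.nan, np.nan, np.nan, '36.92'],
    'Schedule': [np.nan, 'SCHD', 'NSCH', np.nan],
    'Make': ['Cessna', 'Piper', 'Beech', 'Bell'],
})

# Fraction of missing values per column, emptiest first
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Columns above the assumed threshold are candidates for dropping
threshold = 0.5
mostly_empty = missing_share[missing_share > threshold].index.tolist()
print(mostly_empty)
```

On the real data the same `isna().mean()` call can be run on `df` directly; relevance to the analysis, not just emptiness, still decides which columns go.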

In [None]:
df = df.drop(['Latitude', 'Longitude', 'Airport_Code', 'Airport_Name', 'Registration_Number', 'FAR_Description', 'Schedule', 'Air_carrier', 'Publication_Date'], axis=1)

df.info()

In [None]:
df.head()

### Incomplete Columns
We now have 87,951 entries in the dataset, but most of the columns are still incomplete. For columns that cannot be completed with reasonable values, we can fill in 'Unknown' instead of leaving them blank (NaN).

Empty Location, Country, Aircraft_damage, Make, Model, Amateur_Built, Number_of_Engines, Engine_Type, Purpose_of_flight, Weather_Condition, Broad_phase_of_flight, and Report_Status values can be filled in as 'Unknown'.

In [None]:
# Fill in NaN values in multiple columns with "Unknown"
columns_to_fill = ['Location', 'Country', 'Aircraft_damage', 'Make', 'Model', 'Amateur_Built', 'Number_of_Engines', 'Engine_Type', 'Purpose_of_flight', 
                   'Weather_Condition', 'Broad_phase_of_flight', 'Report_Status']
for column in columns_to_fill:
    df[column] = df[column].fillna('Unknown')

df.info()

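The 4 injury columns (15 to 18 in the `df.info()` output) are incomplete, but they hold numeric (float64) values, so filling them with the string "Unknown" would turn them into text columns. Instead, the empty values will be changed to 0 to complete those columns, on the assumption that a missing count means no injuries of that type were recorded.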
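A small sketch, on a hypothetical Series, of why the injury columns get 0 rather than "Unknown": a string fill turns the column into `object` dtype, which breaks numeric aggregation, while a numeric fill keeps it summable.

```python
import numpy as np
import pandas as pd

# Hypothetical injury counts with one missing value
injuries = pd.Series([2.0, np.nan, 0.0], name='Total_Fatal_Injuries')

# Filling with a string upcasts the column to object dtype
as_text = injuries.fillna('Unknown')
print(as_text.dtype)

# Filling with 0 keeps the column numeric, so sums and means still work
as_zero = injuries.fillna(0)
print(as_zero.dtype, as_zero.sum())
```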

In [None]:
# Fill in NaN values in multiple columns with 0
injury_columns_to_fill = ['Total_Fatal_Injuries', 'Total_Serious_Injuries', 'Total_Minor_Injuries', 'Total_Uninjured']
for column in injury_columns_to_fill:
    df[column] = df[column].fillna(0)

df.info()

In [None]:
df.head()

Let's take a look at the Injury_Severity column.

In [None]:
df['Injury_Severity'].value_counts(dropna=False)

The values in this column encode the number of fatal injuries for each accident. Since that number is already represented in the Total_Fatal_Injuries column, this column is redundant and can be deleted.
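Before dropping the column, the redundancy can be checked. This sketch assumes Injury_Severity stores values in the form `Fatal(n)` for fatal accidents and labels such as `Non-Fatal` otherwise; the rows are hypothetical. It extracts `n` and compares it with Total_Fatal_Injuries.

```python
import pandas as pd

# Hypothetical rows; the real column is assumed to use strings like 'Fatal(2)' and 'Non-Fatal'
toy = pd.DataFrame({
    'Injury_Severity': ['Fatal(2)', 'Non-Fatal', 'Fatal(1)', 'Incident'],
    'Total_Fatal_Injuries': [2.0, 0.0, 1.0, 0.0],
})

# Pull the number out of 'Fatal(n)'; values without a number are treated as 0 fatalities
parsed = (
    toy['Injury_Severity']
    .str.extract(r'Fatal\((\d+)\)', expand=False)
    .astype(float)
    .fillna(0)
)

# Any disagreement would suggest Injury_Severity carries information the count column lacks
mismatches = toy[parsed != toy['Total_Fatal_Injuries']]
print(len(mismatches))
```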

In [None]:
# Drop the Injury_Severity column
df = df.drop('Injury_Severity', axis=1)

df.info()

### Aircraft_Category
The category of aircraft is important to the analysis, but the column is mostly empty.

Many of the empty values can be filled in using the Make column, though.

In [None]:
df['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df['Make'].value_counts(dropna=False)

The Make column appears to contain multiple spellings of the same manufacturer, like "Cessna" and "CESSNA". It would be worth cleaning this column so that each make has a single name.

We can start with Cessna, since it has the highest count, and see which other spellings of that name are in the dataset.
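As a quick way to surface spelling variants, the values can be grouped by a normalized key (trimmed and case-folded). This is a sketch on hypothetical values, not the cleaning step used in this notebook.

```python
import pandas as pd

# Hypothetical sample of Make values with inconsistent spelling
makes = pd.Series(['Cessna', 'CESSNA', ' cessna ', 'Piper', 'PIPER', 'Bellanca', 'Bell'])

# Trim whitespace and case-fold to build a comparison key
key = makes.str.strip().str.casefold()

# List the raw spellings that share each key
variants = makes.groupby(key).unique()
print(variants)

# Counts per normalized key show how many records each make really has
print(key.value_counts())
```

This only merges case and whitespace differences; variants such as "CESSNA AIRCRAFT CO" still need the prefix matching used here. On the other hand, exact keys keep "Bell" and "Bellanca" apart, which a plain `startswith('bell')` would not.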

In [None]:
# Show Make value beginning with ces, ignoring case
df[df['Make'].str.lower().str.startswith('ces')].value_counts('Make')

So all of these makes can be standardized by changing their values to "Cessna".

In [None]:
# Convert all these cessna values to 'Cessna'
df.loc[df['Make'].str.lower().str.startswith('ces'), 'Make'] = 'Cessna'

df[df['Make'].str.lower().str.startswith('ces')].value_counts('Make')

We now have almost 27,000 records under the single make "Cessna", so we can look at the category values for these records.

In [None]:
# Aircraft_Category values for Cessna in the Make column, include NaN
df[df['Make'] == 'Cessna'].value_counts('Aircraft_Category', dropna=False)

It looks like it would be safe to replace the empty category values (and 1 unknown) for the Cessna make with "Airplane".

In [None]:
# Fill in Aircraft_Category as 'Airplane' for Cessna
df.loc[df['Make'] == 'Cessna', 'Aircraft_Category'] = 'Airplane'

In [None]:
# Let's look at Make value_counts again
df['Make'].value_counts(dropna=False)

In [None]:
# Show Make value beginning with piper, ignoring case
df[df['Make'].str.lower().str.startswith('piper')].value_counts('Make')

In [None]:
# Convert all these piper values to 'Piper' and then take a look at its category values
df.loc[df['Make'].str.lower().str.startswith('piper'), 'Make'] = 'Piper'

df[df['Make'] == 'Piper'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Fill in the NaN values for the category column for Piper as "Airplane"
df.loc[df['Make'] == 'Piper', 'Aircraft_Category'] = 'Airplane'

In [None]:
# Show Make value beginning with beech, ignoring case
df[df['Make'].str.lower().str.startswith('beech')].value_counts('Make')

A quick Google search tells me that Beech and Beechcraft are the same make.

In [None]:
df.loc[df['Make'].str.lower().str.startswith('beech'), 'Make'] = 'Beech'

df[df['Make'] == 'Beech'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Beech', 'Aircraft_Category'] = 'Airplane'

In [None]:
df[df['Make'].str.lower().str.startswith('bell')].value_counts('Make')

In [None]:
# Bellanca and Bell are not the same make, so it will take a little more work to clean all the various Bell combinations.
# change the various variations of bell to Bell
df.loc[df['Make'].str.lower().str.startswith(('bell-', 'bell/', 'bell h', 'bell t', 'bell s', 'bell b', 'bell 4'), na=False), 'Make'] = 'Bell'

# make Bell and BELL the same
df.loc[(df['Make'] == 'BELL'), 'Make'] = 'Bell'

# address the various versions of Bellanca
df.loc[df['Make'].str.lower().str.startswith(('bellan'), na=False), 'Make'] = 'Bellanca'

# check the list again
df[df['Make'].str.lower().str.startswith('bell')].value_counts('Make')

In [None]:
# Now we can look at the categories for Bell and Bellanca
df[df['Make'] == 'Bell'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Bell can safely be changed to Helicopter for its category
df.loc[df['Make'] == 'Bell', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'] == 'Bell'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'] == 'Bellanca'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Bellanca', 'Aircraft_Category'] = 'Airplane'

In [None]:
# Let's look at Make value_counts again
df['Make'].value_counts(dropna=False)

In [None]:
# clean the boeing make
df[df['Make'].str.lower().str.startswith('boei')].value_counts('Make')

In [None]:
# change the various iterations of boeing to Boeing
df.loc[df['Make'].str.lower().str.startswith('boeing'), 'Make'] = 'Boeing'

df[df['Make'] == 'Boeing'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Boeing can safely be assigned to the Airplane category
df.loc[df['Make'] == 'Boeing', 'Aircraft_Category'] = 'Airplane'

In [None]:
# Let's look at the top 60 Make value counts and see if there are any that can be cleaned up
df['Make'].value_counts().head(60)

In [None]:
# change the various iterations of aeronca to Aeronca
df.loc[df['Make'].str.lower().str.startswith('aeronca'), 'Make'] = 'Aeronca'

df[df['Make'] == 'Aeronca'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Aeronca is airplane
df.loc[df['Make'] == 'Aeronca', 'Aircraft_Category'] = 'Airplane'

In [None]:
# change the various iterations of Air Tractor and check its category values
df.loc[df['Make'].str.lower().str.startswith('air tractor'), 'Make'] = 'Air Tractor'

df[df['Make'] == 'Air Tractor'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Air Tractor is Airplane
df.loc[df['Make'] == 'Air Tractor', 'Aircraft_Category'] = 'Airplane'

In [None]:
# look at airbus
df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

In [None]:
df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Aircraft_Category', dropna=False)

The Airbus variants include both airplanes and helicopters, so before merging make names and filling in empty category values, I need to check the categories for the variants whose aircraft type isn't clear from the name.

In [None]:
# check out Airbus Industrie iterations
df[df['Make'].str.lower().str.startswith('airbus i')].value_counts('Aircraft_Category', dropna=False)

So Airbus Industrie, AIRBUS INDUSTRIE, and Airbus Industries can be combined and categorized as Airplane.

In [None]:
df.loc[df['Make'].isin(['AIRBUS INDUSTRIE', 'Airbus Industries']), 'Make'] = 'Airbus Industrie'

df.loc[df['Make'] == 'Airbus Industrie', 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

In [None]:
# clean up Airbus Helicopters
df.loc[df['Make'].isin(['AIRBUS HELICOPTERS', 'AIRBUS Helicopters', 'AIRBUS HELICOPTERS INC', 'AIRBUS HELICOPTER', 'AIRBUS/EUROCOPTER', 'Airbus Helicopters (Eurocopte', 'Airbus Helicopters Deutschland']), 'Make'] = 'Airbus Helicopters'

df.loc[df['Make'] == 'Airbus Helicopters', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

In [None]:
# combine the Airbus iterations
df.loc[(df['Make'] == 'AIRBUS'), 'Make'] = 'Airbus'

df[df['Make'] == 'Airbus'].value_counts('Aircraft_Category', dropna=False)

Since only 20 of the almost 300 records for Airbus are helicopters, we can safely fill the NaN values with Airplane.

In [None]:
df.loc[(df['Make'] == 'Airbus') & (df['Aircraft_Category'].isna()), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('airbus')].value_counts('Make')

In [None]:
df['Aircraft_Category'].value_counts(dropna=False)

So from over 56,000 empty values in the category column, we are down to 18,695. I'd like to bring this down even further by checking which makes account for the most remaining empty category values.
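The make-by-make pattern used so far (standardize the name, then set the category) could also be driven from a single lookup table. A minimal sketch, where `make_to_category` is a hypothetical mapping and the rows are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical records after make names have been standardized
toy = pd.DataFrame({
    'Make': ['Cessna', 'Cessna', 'Robinson', 'Schweizer', 'Unknown'],
    'Aircraft_Category': [np.nan, 'Airplane', np.nan, np.nan, np.nan],
})

# Hypothetical lookup of makes that build only one category of aircraft
make_to_category = {
    'Cessna': 'Airplane',
    'Robinson': 'Helicopter',
}

# Fill only the empty categories; recorded values and unmapped makes stay as they are
toy['Aircraft_Category'] = toy['Aircraft_Category'].fillna(toy['Make'].map(make_to_category))
print(toy)
```

Unlike the `.loc[...] = 'Airplane'` assignments above, `fillna` only touches empty categories, so any category already recorded for a make is preserved.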

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

Let's look at the category values for the makes that have the most empty category values.

In [None]:
# Grumman
df[df['Make'] == 'Grumman'].value_counts('Aircraft_Category', dropna=False)

So the Grumman records are airplanes.

In [None]:
# check to see if there are any other versions of 'Grumman' in the make column
df[df['Make'].str.lower().str.startswith('grumm')].value_counts('Make')

In [None]:
# What are the category values for all these different versions of Grumman
df[df['Make'].str.lower().str.startswith('grumm')].value_counts('Aircraft_Category', dropna=False)

In [None]:
# So let's combine all these Grumman makes together and make their category Airplane
df.loc[df['Make'].str.lower().str.startswith('grumm'), 'Make'] = 'Grumman'

df.loc[df['Make'] == 'Grumman', 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('grumm')].value_counts('Make')

In [None]:
# Mooney
df[df['Make'] == 'Mooney'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# check to see if there are any other versions of 'Mooney' in the make column
df[df['Make'].str.lower().str.startswith('moon')].value_counts('Make')

In [None]:
# What are the category values for all these different versions of Mooney
df[df['Make'].str.lower().str.startswith('moon')].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Combine all these Mooney makes together and make their category Airplane
df.loc[df['Make'].str.lower().str.startswith('mooney'), 'Make'] = 'Mooney'

df.loc[df['Make'] == 'Mooney', 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('moon')].value_counts('Make')

In [None]:
# Hughes
df[df['Make'] == 'Hughes'].value_counts('Aircraft_Category', dropna=False)

Hughes appears to be all helicopters.

In [None]:
# check to see if there are any other versions of 'Hughes' in the make column
df[df['Make'].str.lower().str.startswith('hughes')].value_counts('Make')

In [None]:
df[df['Make'].str.lower().str.startswith('hughes')].value_counts('Aircraft_Category', dropna=False)

Here we have a few airplanes and parachutes in addition to all the helicopters among the Hughes variations. This may be because some makes in the list belong to people named Hughes who are not associated with the helicopter company. We can narrow the list down to just the Hughes helicopter makes.

In [None]:
df[df['Make'].isin(['Hughes', 'HUGHES', 'HUGHES HELICOPTERS INC', 'HUGHES/HELICOPTER ASSOCS INC'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
# So these 4 can be combined and made Helicopter in the category field
df.loc[df['Make'].isin(['Hughes', 'HUGHES', 'HUGHES HELICOPTERS INC', 'HUGHES/HELICOPTER ASSOCS INC']), 'Make'] = 'Hughes Helicopters'

df.loc[df['Make'] == 'Hughes Helicopters', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'] == 'Hughes Helicopters'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Robinson
df[df['Make'].str.lower().str.startswith('robins')].value_counts('Make')

In [None]:
df[df['Make'].str.lower().str.startswith('robins')].value_counts('Aircraft_Category', dropna=False)

In [None]:
# combine all the Robinson Helicopter iterations and make them Helicopter
df.loc[df['Make'].isin(['ROBINSON', 'ROBINSON HELICOPTER', 'ROBINSON HELICOPTER COMPANY', 'ROBINSON HELICOPTER CO', 'Robinson Helicopter Company', 'Robinson Helicopter', 'ROBINSON HELICOPTER CO INC', 'Robinson Helicopter Co.', 'Robinson Helicopters']), 'Make'] = 'Robinson'

df.loc[df['Make'] == 'Robinson', 'Aircraft_Category'] = 'Helicopter'

df[df['Make'].str.lower().str.startswith('robins')].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Schweizer
df[df['Make'].str.lower().str.startswith('schweiz')].value_counts('Make')

In [None]:
df[df['Make'].str.lower().str.startswith('schweiz')].value_counts('Aircraft_Category', dropna=False)

This is a more mixed result, which needs some investigation.

In [None]:
df[df['Make'].isin(['SCHWEIZER', 'Schweizer'])].value_counts('Aircraft_Category', dropna=False)

A quick Google search informs me that Schweizer Aircraft made helicopters, gliders, and airplanes, so the category column for Schweizer can't be filled in from the make column alone. As the empty values amount to just under 550, I'm going to leave Schweizer alone for now, except for combining the make variants so that I can dig into it more easily using the Model column as well.
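To spot other makes like Schweizer before filling by make, one option is to count the distinct categories already recorded for each make. A sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical records: Schweizer built several aircraft types, Robinson only helicopters
toy = pd.DataFrame({
    'Make': ['Robinson', 'Robinson', 'Robinson', 'Schweizer', 'Schweizer', 'Schweizer'],
    'Aircraft_Category': ['Helicopter', np.nan, 'Helicopter', 'Helicopter', 'Glider', np.nan],
})

# Number of distinct non-empty categories recorded for each make (NaN is ignored)
n_categories = toy.groupby('Make')['Aircraft_Category'].nunique()

# One known category: a candidate for filling by make alone.
# Several known categories: needs model-level checks instead.
single_category_makes = n_categories[n_categories == 1].index.tolist()
mixed_makes = n_categories[n_categories > 1].index.tolist()
print(single_category_makes, mixed_makes)
```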

In [None]:
df.loc[df['Make'].isin(['SCHWEIZER', 'SCHWEIZER AIRCRAFT CORP', 'Schweizer Aircraft Corp', 'Schweizer Aircraft Corp.', 'Schweizer 300CBi', 'Schweizer Sgs']), 'Make'] = 'Schweizer'

df[df['Make'].str.lower().str.startswith('schweiz')].value_counts('Make')

Now how are the empty category counts looking?

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Let's look at the Schweizer models
df[df['Make'].isin(['Schweizer'])].value_counts('Model', dropna=False)

Wikipedia and Google inform me that the Schweizer 269C is a helicopter, G-164B is an airplane, SGS 2-33A is a glider, 269C-1 is a helicopter, and G-164A is an airplane. Let's see if that information can be used to fill some of the Schweizer category values.

In [None]:
df[df['Model'].isin(['269C'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Since the 269C model is a helicopter, let's fix all the empty category values for it. This fix will also fill in some category 
# values for other makes as well since we can see that there are more 269C models than just the Schweizer make.
df.loc[df['Model'] == '269C', 'Aircraft_Category'] = 'Helicopter'

In [None]:
# The same goes for the rest of the models listed
df[df['Model'].isin(['G-164B', 'G-164A'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Model'].isin(['G-164B', 'G-164A']), 'Aircraft_Category'] = 'Airplane'

In [None]:
df[df['Model'].isin(['SGS 2-33A'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Model'] == 'SGS 2-33A', 'Aircraft_Category'] = 'Glider'

In [None]:
# Let's look at how the category column is shaping up
df['Aircraft_Category'].value_counts(dropna=False)

We still have about 15,000 empty category records, which can be brought down further using makes and models. The category counts so far show that airplanes make up the overwhelming majority of aircraft in this accident dataset. After helicopters, the remaining categories are tiny by comparison, and they cover aircraft that a business entering aviation would not ordinarily consider. I'm not going to drop those rows right now, but I don't anticipate using them in the analysis phase.
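If the later analysis does narrow to airplanes and helicopters, the filter could look like this sketch (the category counts are hypothetical):

```python
import pandas as pd

# Hypothetical category column after cleaning
categories = pd.Series(['Airplane'] * 8 + ['Helicopter'] * 2 + ['Glider', 'Balloon'])

# Share of records per category
print(categories.value_counts(normalize=True).round(3))

# Keep only the two categories expected to matter for the analysis
toy = pd.DataFrame({'Aircraft_Category': categories})
analysis_df = toy[toy['Aircraft_Category'].isin(['Airplane', 'Helicopter'])]
print(len(analysis_df))
```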

In [None]:
df[df['Aircraft_Category'].isna()]['Model'].value_counts()

In [None]:
# Let's do the Mcdonnell Douglas make, and see about using the models in conjunction
df[df['Make'].str.lower().str.startswith('mcdonn')].value_counts('Make')

In [None]:
df[df['Make'].isin(['MCDONNELL DOUGLAS HELICOPTER', 'MCDONNELL DOUGLAS HELI CO'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Combine the helicopter variations of the name
df.loc[df['Make'].isin(['MCDONNELL DOUGLAS HELICOPTER', 'MCDONNELL DOUGLAS HELI CO', 'McDonnell Douglas Helicopter', 'McDonnell Douglas Helicopter C', 'McDonnell Douglas Helicopters', 'Mcdonnell Douglas Helicopter', 'Mcdonnell Douglas Helicopters']), 'Make'] = 'Mcdonnell Douglas Helicopters'

df[df['Make'].isin(['Mcdonnell Douglas Helicopters'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('mcdonn')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['MCDONNELL DOUGLAS', 'MCDONNELL DOUGLAS AIRCRAFT CO', 'McDonnell Douglas', 'Mcdonnell-douglas', 'MCDONNELL DOUGLAS CORPORATION', 'MCDONNELL-DOUGLAS']), 'Make'] = 'Mcdonnell Douglas'

df[df['Make'].isin(['Mcdonnell Douglas'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
# reassign the 18 helicopter records to the Mcdonnell Douglas Helicopters make
df.loc[(df['Make'] == 'Mcdonnell Douglas') & (df['Aircraft_Category'] == 'Helicopter'), 'Make'] = 'Mcdonnell Douglas Helicopters'

df[df['Make'].isin(['Mcdonnell Douglas'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[(df['Make'] == 'Mcdonnell Douglas'), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].str.lower().str.startswith('mcdonn')].value_counts('Make')

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# The Maule make
df[df['Make'].str.lower().str.startswith('maul')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['MAULE', 'MAULE AIRCRAFT CORP', 'Maule Air Inc.']), 'Make'] = 'Maule'

df[df['Make'].isin(['Maule'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[(df['Make'] == 'Maule'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# The Champion make
df[df['Make'].str.lower().str.startswith('champ')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['CHAMPION']), 'Make'] = 'Champion'

df[df['Make'].isin(['Champion'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[(df['Make'] == 'Champion'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# The Aero Commander make
df[df['Make'].str.lower().str.startswith('aero c')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['AERO COMMANDER']), 'Make'] = 'Aero Commander'

df[df['Make'].isin(['Aero Commander'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[(df['Make'] == 'Aero Commander'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# The De Havilland make
de_havilland_variations = df[df['Make'].str.lower().str.contains(r'de\s?havil?land', regex=True)]
de_havilland_variations.value_counts('Make')

In [None]:
# combine all these variations of De Havilland into one make
df.loc[df['Make'].str.lower().str.contains(r'de\s?havil?land', regex=True), 'Make'] = 'De Havilland'

df[df['Make'].isin(['De Havilland'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[(df['Make'] == 'De Havilland'), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts()

In [None]:
# Let's look at the Models overall for NaN values in Category
df[df['Aircraft_Category'].isna()]['Model'].value_counts()

Google tells me that the UH-12E is a helicopter, while the 8A, S-2R, 415-C, and BC12-D are airplanes. Running a check like `df[df['Model'].isin(['BC12-D'])].value_counts('Aircraft_Category', dropna=False)` for each model confirms this against the records that already have a category. So let's correct those category values.
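The per-model checks could be batched with a small helper. `known_categories` below is a hypothetical name; it looks only at records that already have a category and tallies them per model, so the evidence for several models shows up in one table. The rows are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical records; each model has some rows with a known category and some without
toy = pd.DataFrame({
    'Model': ['UH-12E', 'UH-12E', 'BC12-D', 'BC12-D', 'BC12-D'],
    'Aircraft_Category': ['Helicopter', np.nan, 'Airplane', np.nan, 'Airplane'],
})

def known_categories(df, models):
    """Hypothetical helper: tally the non-empty categories already recorded for each model."""
    subset = df[df['Model'].isin(models)].dropna(subset=['Aircraft_Category'])
    return subset.groupby('Model')['Aircraft_Category'].value_counts()

result = known_categories(toy, ['UH-12E', 'BC12-D'])
print(result)
```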

In [None]:
#Edit one model's category value
df.loc[df['Model'] == 'UH-12E', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['8A', 'S-2R', '415-C', 'BC12-D']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

In [None]:
# Running this function tells me the top 5 are airplanes
df[df['Model'].isin(['108-3', 'S2R', 'RV-4', '108-2', 'KR-2'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['108-3', 'S2R', 'RV-4', '108-2', 'KR-2']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

In [None]:
# the top 4 are all airplanes
df[df['Model'].isin(['108-1', 'LA-4-200', 'GC-1B', '8E'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['108-1', 'LA-4-200', 'GC-1B', '8E']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

In [None]:
df[df['Model'].isin(['F-28C'])].value_counts('Aircraft_Category', dropna=False)

L-13 is a glider; VARIEZE and A-1 are airplanes; F-28C is a helicopter.

In [None]:
#Edit one model's category value
df.loc[df['Model'] == 'L-13', 'Aircraft_Category'] = 'Glider'
df.loc[df['Model'] == 'F-28C', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['VARIEZE', 'A-1']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

In [None]:
df[df['Model'].isin(['RV-6'])].value_counts('Aircraft_Category', dropna=False)

FH-1100 is a helicopter; 108, AVID FLYER, and RV-6 are airplanes.

In [None]:
#Edit one model's category value
df.loc[df['Model'] == 'FH-1100', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['108', 'AVID FLYER', 'RV-6']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts()

In [None]:
df[df['Model'].isin(['S2R-T34'])].value_counts('Aircraft_Category', dropna=False)

BC-12D, 35A, and S2R-T34 are airplanes; 280C is a helicopter.

In [None]:
#Edit one model's category value
df.loc[df['Model'] == '280C', 'Aircraft_Category'] = 'Helicopter'

#Edit multiple models' category value
df.loc[df['Model'].isin(['BC-12D', '35A', 'S2R-T34']), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Model'].value_counts().head(50)

Instead of just a few at a time, we can display the top 50 models with no category value and go from there.

In [None]:
df[df['Model'].isin(['UH-12C'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
#Edit one model's category value
df.loc[df['Model'] == 'BLANIK L-13', 'Aircraft_Category'] = 'Glider'

#Edit multiple models' category value
df.loc[df['Model'].isin(['114', '201B', '2150A', '2T-1A-2', '415C', '8F', 'A', 'AA-1', 'AA-1A', 'AA-5B', 'AT-6D', 'CHALLENGER II', 'CL-600-2B19', 'DC-3', 'DC-3C', 'DW-1', 'H-295', 'KITFOX', 'LA-4',
                        'LONG-EZ', 'M-18A', 'MU-2B-60', 'MUSTANG II', 'NAVION', 'P-51D', 'Q2', 'QUICKIE', 'RC-3', 'RV-6A', 'S-1B2', 'S-2B', 'SA226TC', 'SA227-AC', 'SKYBOLT', 'SNJ-5', 'SONERAI II',
                        'SR22', 'T-6G', 'THORP T-18', 'UPF-7', 'VARI-EZE']), 'Aircraft_Category'] = 'Airplane'

df.loc[df['Model'].isin(['B-2B', 'F-28A', 'F-28F', 'S-76A', 'UH-12C']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['FIREFLY 7', 'S-60A']), 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Model'].value_counts().head(50)

In [None]:
df[df['Model'].isin(['Q-2'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Model'].isin(['Q-2'])].value_counts('Amateur_Built', dropna=False)

In [None]:
#Edit one model's category value
df.loc[df['Model'] == 'B-8M', 'Aircraft_Category'] = 'Gyrocraft'
df.loc[df['Model'] == 'TIERRA II', 'Aircraft_Category'] = 'Ultralight'

#Edit multiple models' category value
df.loc[df['Model'].isin(['DRAGONFLY', 'LONG EZ', 'STEEN SKYBOLT', 'PZL-M-18', 'ST3KR', '112A', 'QUICKIE Q2', 'S-1S', 'S-1', 'CHRISTEN EAGLE II', 'BD-4', 'SR-22']), 'Aircraft_Category'] = 'Airplane'

df.loc[df['Model'].isin(['UH-12B', 'UH-12D', 'AS-350D', 'AS350D']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['MONI', 'ASW-20']), 'Aircraft_Category'] = 'Glider'

df[df['Aircraft_Category'].isna()]['Model'].value_counts().head(50)

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].isin(['Schweizer'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Look at the Schweizer models again that have empty category values
df[df['Make'].isin(['Schweizer']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

In [None]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['1-26E', '1-35C', '1-36', '2-32', '2-33', '2-33-A', '2-33A', 'SGS 1-26', 'SGS 1-26A', 'SGS 1-26B', 'SGS 1-26C', 'SGS 1-26D', 'SGS 1-26E', 'SGS 1-34', 'SGS 1-35', 'SGS 1-36',
                        'SGS 2-32', 'SGS 2-33', 'SGS 2-33AK', 'SGS 2-8', 'SGS-1-26', 'SGS-1-26A', 'SGS-1-26B', 'SGS-1-26E', 'SGS-1-30', 'SGS-1-34', 'SGS-1-35', 'SGS-1-35C', 'SGS-126D', 'SGS-126E',
                        'SGS-2-33', 'SGS-2-33A', 'SGS-233A', 'SGS1-26C', 'SGS1-26D', 'SGS1-34', 'SGS1-36', 'SGS2-33A', 'SGU 2-22CK', 'SGU 2-22E', 'SGU-2-22E', 'SGU-2-22K', 'SGU-22', 'SGU2-22E',
                        'SSG 2-33A', 'T-26E']), 'Aircraft_Category'] = 'Glider'

df.loc[df['Model'].isin(['269B', '269C-1', '269D', '300C', 'HUGHES 269C']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['G-164', 'G-164-A', 'G-164B-600', 'G-164C', 'G-164D', 'G164', 'G164A', 'G164B', 'G164D']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Schweizer']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

In [None]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['134', '1-23', '1-24', '1-26', '1-26B', '1-26D', '126-D', '2-22EK', '233A', 'FGS-233', 'I-26D', 'I-26E', 'S-2-33A', 'S2-33A', 'SC2-33A', 'SGS 1-23', 'SGS 1-23G', 'SGS 1-23H-15',
                        'SGS 1-26F', 'SGS 1-34R', 'SGS 1-35C', 'SGS 126B', 'SGS 126E', 'SGS 135', 'SGS-1-36', 'SGS-2-32', 'SGS-2-32A', 'SGS-233', 'SGS1-26-D', 'SGS1-26A', 'SGS2-32', 'SGS2-33',
                        'SGS233A', 'TG3A']), 'Aircraft_Category'] = 'Glider'

df.loc[df['Model'].isin(['269', '333', '-269C', '269-C', '269-C1', 'H-300']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['SA 2-37A', 'AG CAT', 'G-164-B', 'G-164A-450', 'G164-B', 'G164A \"450\"', 'G167B', 'GRUMMAN G-164A', 'GRUMMAN G-164B']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Schweizer'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

In [None]:
df[df['Make'].isin(['Aerospatiale'])].value_counts('Aircraft_Category', dropna=False)

I see only 2 airplanes listed for Aerospatiale. So which models are those?

In [None]:
df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isin(['Airplane'])].value_counts('Model', dropna=False)

So this tells me that models beginning with 'ATR' are airplanes.

In [None]:
df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isin(['Helicopter'])].value_counts('Model', dropna=False)

Helicopter models, in turn, begin with 'AS-' or 'SA-', sometimes written without the hyphen (e.g. 'AS350B', 'SA 315B').
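
The prefix pattern could also be applied as a rule instead of listing every model string. Below is a minimal sketch, assuming the same `Make`/`Model`/`Aircraft_Category` columns; the toy rows are illustrative, and the regex is my assumption about how the 'AS'/'SA' spellings vary. The rule is scoped to Aerospatiale because other makes reuse these prefixes: the Swearingen Metro, for example, is the SA-227, an airplane.

```python
import pandas as pd

# Toy rows standing in for the NTSB data (illustrative values only)
df = pd.DataFrame({
    "Make": ["Aerospatiale"] * 5 + ["Swearingen"],
    "Model": ["ATR-72-212", "AS-350B2", "SA-315B", "AS 355F1", "TB-20", "SA-227-AC"],
    "Aircraft_Category": [None] * 6,
})

is_aero = df["Make"].eq("Aerospatiale")
model = df["Model"].str.upper()
empty = df["Aircraft_Category"].isna()

# 'ATR...' -> airplane; 'AS'/'SA' + optional space or hyphen + digit -> helicopter (assumed pattern)
is_airplane = model.str.startswith("ATR", na=False)
is_helicopter = model.str.match(r"(AS|SA)[\s-]?\d", na=False)

# Only fill rows that are still empty, and only for Aerospatiale
df.loc[is_aero & empty & is_airplane, "Aircraft_Category"] = "Airplane"
df.loc[is_aero & empty & is_helicopter, "Aircraft_Category"] = "Helicopter"

print(df)
```

Models the rule does not cover, such as the 'TB-20' (a Socata Trinidad airplane), still need an explicit entry, and the Swearingen row is left untouched.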

In [None]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['355', '316B', '350 B1', '350D', 'ALOUETTE 3', 'AS 315B', 'AS 350 ASTAR', 'AS 350B', 'AS 350B-2', 'AS 350D', 'AS 355 F', 'AS 355F', 'AS 355F1', 'AS-350', 'AS-350-B',
                        'AS-350-B2', 'AS-350-BA', 'AS-350B', 'AS-350BA', 'AS-355', 'AS-355-F', 'AS-355-F1', 'AS-355-F2', 'AS-355E', 'AS-355F', 'AS-355F-1', 'AS350B', 'AS350BA', 'AS355F', 'AS355F1',
                        'AS35OD', 'SA 315B', 'SA 360C', 'SA-315', 'SA-315-B', 'SA-315B', 'SA-316 ALOUETTE', 'SA-316B', 'SA-319B', 'SA-330J', 'SA-341G', 'SA-360C', 'SA-365-N2', 'SA315-D LAMA', 'SA315B',
                        'SA315B LAMA', 'SA316B', 'SA318C', 'SA319B', 'SA341G']), 'Aircraft_Category'] = 'Helicopter'

# 'A-300B4' is the Airbus A300 airliner, so it belongs with the airplanes rather than the helicopters
df.loc[df['Model'].isin(['A-300B4', 'ATR 42-300', 'ATR-42', 'ATR-42-300', 'ATR-42-320', 'ATR-72', 'ATR-72-212', 'TB-20', 'TB-21', 'TB20']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

In [None]:
#Edit multiple models' category value
df.loc[df['Model'].isin(['316B ALOUETTE III', '350-B', '350B', 'AS 355 F ECUREUIL', 'AS 355F-1', 'AS-332L', 'AS-341G', 'AS-350-B3', 'AS-350B1', 'AS-350B2', 'AS-350BII', 'AS-355F1', 'AS-365-N2', 'AS315B',
                        'AS332', 'AS350', 'AS350 BA', 'AS350-B', 'AS350-B3', 'AS350-BH', 'AS350-D', 'AS350B3', 'AS350D ASTAR', 'AS355F-1', 'AS355F2', 'AS365N', 'SA 315', 'SA 316B', 'SA319B Alouette III',
                        'SA330J', 'SA360C DAUPHIN', 'SA365-N1', 'SA365N', 'SE 3180', 'SE 318C', 'SE316B', 'SF3130']), 'Aircraft_Category'] = 'Helicopter'

df.loc[df['Model'].isin(['ATR 42-320', 'ATR 72-212', 'ATR-42-500', 'ATR-72-12', 'ATR42-300', 'ATR72-212', 'SN-601', 'TB-10', 'TB21', 'CONCORDE VERSION 101', 'Concorde', 'ND-26']), 'Aircraft_Category'] = 'Airplane'

df[df['Make'].isin(['Aerospatiale']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('dougl')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Douglas', 'DOUGLAS']), 'Make'] = 'Douglas'

In [None]:
df[df['Make'] == 'Douglas'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Douglas', 'Aircraft_Category'] = 'Airplane'

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('north a')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['NORTH AMERICAN']), 'Make'] = 'North American'

df[df['Make'] == 'North American'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'North American', 'Aircraft_Category'] = 'Airplane'

In [None]:
df[df['Make'].str.lower().str.startswith('taylorc')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['TAYLORCRAFT', 'TAYLORCRAFT AVIATION CORP', 'TAYLORCRAFT AVIATION CORP.', 'TAYLORCRAFT CORP', 'Taylorcraft Aviation', 'Taylorcraft Corporation']), 'Make'] = 'Taylorcraft'

df[df['Make'] == 'Taylorcraft'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Taylorcraft', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('rockw')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['ROCKWELL', 'ROCKWELL INTERNATIONAL', 'Rockwell International', 'Rockwell Intl', 'Rockwell Intl.', 'Rockwell Int\'t']), 'Make'] = 'Rockwell'

df[df['Make'] == 'Rockwell'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Rockwell', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('sikor')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['SIKORSKY', 'SIKORSKY AIRCRAFT CORP', 'SIKORSKY AIRCRAFT CORPORATION']), 'Make'] = 'Sikorsky'

df[df['Make'] == 'Sikorsky'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Sikorsky', 'Aircraft_Category'] = 'Helicopter'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'] == 'Burkhart Grob'].value_counts('Aircraft_Category', dropna=False)

There are many NaN values and only 9 gliders, so I'm going to check the models to make sure that Burkhart Grob's category should be filled in as Glider.

In [None]:
df[df['Make'] == 'Burkhart Grob'].value_counts('Model', dropna=False)

A quick search confirms that all of these models are gliders.
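
Before Burkhart Grob can be filled in, its spelling variants need to be merged under one name, which the following cells do with a prefix search and an `isin` list. Since this normalize-then-categorize step repeats for dozens of makes, it could also be driven from two lookup tables. Here is a sketch under that assumption: `MAKE_VARIANTS` and `MAKE_CATEGORY` are hypothetical dictionaries that would be filled from the variant lists in this notebook. Passing `na=False` to `str.startswith` keeps a missing `Make` from turning the filter mask into NaN, which pandas will not accept as a boolean index.

```python
import pandas as pd

# Toy rows (illustrative); None stands in for a missing Make
df = pd.DataFrame({
    "Make": ["BURKHART GROB", "Burkhart-grob", "Burkhart Grob", "ENSTROM", None],
    "Aircraft_Category": [None, "Glider", None, None, None],
})

# Hypothetical lookup tables: canonical make -> spelling variants, canonical make -> category
MAKE_VARIANTS = {
    "Burkhart Grob": ["BURKHART GROB", "Burkhart-grob", "Burkhart Grob Flugzeugbau"],
    "Enstrom": ["ENSTROM", "ENSTROM HELICOPTER CORP"],
}
MAKE_CATEGORY = {"Burkhart Grob": "Glider", "Enstrom": "Helicopter"}

# Invert to variant -> canonical and replace exact matches only
variant_to_canonical = {v: canon for canon, variants in MAKE_VARIANTS.items() for v in variants}
df["Make"] = df["Make"].replace(variant_to_canonical)

# Check for leftover variants; na=False keeps the missing Make out of the mask
leftovers = df.loc[df["Make"].str.lower().str.startswith("burkha", na=False), "Make"].unique()

# Overwrite the category for every row of a known make
known = df["Make"].isin(list(MAKE_CATEGORY))
df.loc[known, "Aircraft_Category"] = df.loc[known, "Make"].map(MAKE_CATEGORY)

print(leftovers)
print(df)
```

Exact-match replacement is deliberately stricter than a blanket prefix rule: near-identical names such as 'Aerostar International' and 'Aerostar, SA' belong to different manufacturers.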

In [None]:
df[df['Make'].str.lower().str.startswith('burkha')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['BURKHART GROB', 'Burkhart Grob Flugzeugbau', 'BURKHART GROB FLUGZEUGBAU', 'Burkhart Grob Flugzeugbah', 'Burkhart-grob']), 'Make'] = 'Burkhart Grob'

df.loc[df['Make'] == 'Burkhart Grob', 'Aircraft_Category'] = 'Glider'

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('fairchi')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Fairchild Hiller', 'FAIRCHILD', 'Fairchild Swearingen', 'FAIRCHILD HILLER', 'Fairchild Dornier', 'FAIRCHILD HELI-PORTER', 'FAIRCHILD(HOWARD)', 'FAIRCHILD FUNK',
                       'Fairchild Heli-porter', 'Fairchild Industries', 'Fairchild Merlin', 'Fairchild-heliporter', 'Fairchild/swearingen']), 'Make'] = 'Fairchild'

df[df['Make'] == 'Fairchild'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].isin(['Fairchild']) & df['Aircraft_Category'].isna()].value_counts('Model', dropna=False).head(60)

The only models in this list that are helicopters are the FH1100 and FH-100. All the rest fall into the Airplane category.
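
The next two cells follow an "exceptions first, default for the rest" pattern. A sketch of it as a reusable function follows; the name `fill_make_category` and the toy rows are hypothetical. One difference from the cells below: the overrides here are restricted to the given make, whereas `df['Model'].isin([...])` on its own would also relabel any other make that happens to use the same model string.

```python
import pandas as pd

def fill_make_category(df, make, default, overrides=None):
    """Hypothetical helper: set listed models of `make` first, then `default` for its remaining gaps.

    overrides: optional dict of category -> list of model strings.
    """
    is_make = df["Make"].eq(make)
    for category, models in (overrides or {}).items():
        df.loc[is_make & df["Model"].isin(models), "Aircraft_Category"] = category
    # Recompute emptiness after the overrides so the default does not overwrite them
    still_empty = is_make & df["Aircraft_Category"].isna()
    df.loc[still_empty, "Aircraft_Category"] = default
    return df

# Toy rows; 'OtherMake' is a placeholder for a different make reusing a model string
df = pd.DataFrame({
    "Make": ["Fairchild", "Fairchild", "Fairchild", "Fairchild", "OtherMake"],
    "Model": ["FH1100", "FH-100", "M-62A", "C-123B", "FH-100"],
    "Aircraft_Category": [None] * 5,
})

fill_make_category(df, "Fairchild", "Airplane", {"Helicopter": ["FH1100", "FH-100"]})
print(df)
```

The 'OtherMake' row keeps its empty category because the override only applies within Fairchild.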

In [None]:
df.loc[df['Model'].isin(['FH1100', 'FH-100']), 'Aircraft_Category'] = 'Helicopter'

In [None]:
# Make the rest of the NaN category values Airplane for Fairchild
df.loc[(df['Make'].isin(['Fairchild'])) & (df['Aircraft_Category'].isna()), 'Aircraft_Category'] = 'Airplane'

df[df['Make'] == 'Fairchild'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('lockh')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['LOCKHEED']), 'Make'] = 'Lockheed'

df[df['Make'] == 'Lockheed'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Lockheed', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('ayre')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['AYRES CORPORATION', 'AYRES', 'Ayres Corporation', 'AYRES THRUSH', 'AYRES CORP']), 'Make'] = 'Ayres'

df[df['Make'] == 'Ayres'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Ayres', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('balloo')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['BALLOON WORKS', 'Balloon Works Inc']), 'Make'] = 'Balloon Works'

df[df['Make'] == 'Balloon Works'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Balloon Works', 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('swearin')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['SWEARINGEN', 'Swearingen T R/masters W']), 'Make'] = 'Swearingen'

df[df['Make'] == 'Swearingen'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Swearingen', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('mitsub')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['MITSUBISHI']), 'Make'] = 'Mitsubishi'

df[df['Make'] == 'Mitsubishi'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Mitsubishi', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('hille')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['HILLER', 'Hiller-soloy', 'HILLER-ROGERSON HELICOPTER', 'HILLER-TRI-PLEX IND.INC.', 'Hiller-osborn']), 'Make'] = 'Hiller'

df[df['Make'] == 'Hiller'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# Hiller built helicopters (e.g. the UH-12), not airplanes
df.loc[df['Make'] == 'Hiller', 'Aircraft_Category'] = 'Helicopter'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('british')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['British Aircraft Corp. (bac)', 'BRITISH AEROSPACE', 'BRITISH AIRCRAFT CORP', 'BRITISH AIRCRAFT CORP.', 'British Aerospace Civil Aircr', 'British Aircraft Corp. (BAC)']),
'Make'] = 'British Aerospace'

df[df['Make'] == 'British Aerospace'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'British Aerospace', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('embra')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['EMBRAER', 'EMBRAER S A', 'EMBRAER-EMPRESA BRASILEIRA DE', 'EMBRAER S.A.', 'EMBRAER EXECUTIVE AIRCRAFT INC', 'EMBRAER SA', 'Embraer Aircraft']),'Make'] = 'Embraer'

df[df['Make'] == 'Embraer'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Embraer', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('enstr')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['ENSTROM', 'ENSTROM HELICOPTER CORP']),'Make'] = 'Enstrom'

df[df['Make'] == 'Enstrom'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Enstrom', 'Aircraft_Category'] = 'Helicopter'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('pitts')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Pitts', 'PITTS', 'PITTS AEROBATICS', 'PITTS SPECIAL', 'Pitts Spl.']),'Make'] = 'Pitts Special'

df[df['Make'] == 'Pitts Special'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Pitts Special', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('aerost')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['AEROSTAR ACFT CORP OF TEXAS', 'AEROSTAR AIRCRAFT CORPORATION']),'Make'] = 'Aerostar Aircraft Corporation'
df.loc[df['Make'].isin(['AEROSTAR S A', 'Aerostar, S.a']),'Make'] = 'Aerostar, SA'
df.loc[df['Make'].isin(['Aerostar', 'AEROSTAR', 'AEROSTAR INTERNATIONAL', 'AEROSTAR INTERNATIONAL INC', 'Aerostar International Inc', 'Aerostar International Inc.', 'Aerostar International, Inc.',
                       'Aerostar-raven']),'Make'] = 'Aerostar International'

df[df['Make'] == 'Aerostar, SA'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'] == 'Aerostar Aircraft Corporation'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'] == 'Aerostar International'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Aerostar, SA', 'Aircraft_Category'] = 'Airplane'
df.loc[df['Make'] == 'Aerostar International', 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('lear')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['LEARJET', 'LEARJET INC', 'Learjet Inc']),'Make'] = 'Learjet'

df[df['Make'] == 'Learjet'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Learjet', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('raven')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Raven', 'RAVEN INDUSTRIES INC']),'Make'] = 'Raven Industries'

df[df['Make'] == 'Raven Industries'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Raven Industries', 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('mbb')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['MBB', 'Mbb-bolkow']),'Make'] = 'Mbb'

df[df['Make'] == 'Mbb'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Mbb', 'Aircraft_Category'] = 'Helicopter'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('wac')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['WACO', 'WACO CLASSIC AIRCRAFT', 'WACO CLASSIC AIRCRAFT CORP', 'Waco Classic Aircraft Corp.', 'Waco Classic Aircraft', 'Waco Classic Aircraft Corp']),'Make'] = 'Waco'

df[df['Make'] == 'Waco'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Waco', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('schem')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Schempp-hirth', 'SCHEMPP-HIRTH', 'Schempp Hirth', 'SCHEMPP HIRTH', 'SCHEMPP-HIRTH FLUGZEUGBAU', 'SCHEMPP HIRTH FLUGZEUGBAU GMBH', 'SCHEMPP-HIRTH FLUGZEUGBAU GMBH',
                       'SCHEMPP-HIRTH K G', 'Schempp-hirth K.g.']),'Make'] = 'Schempp-Hirth'

df[df['Make'] == 'Schempp-Hirth'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Schempp-Hirth', 'Aircraft_Category'] = 'Glider'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('helio')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['HELIO', 'Helio Aircraft Ltd']),'Make'] = 'Helio'

df[df['Make'] == 'Helio'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Helio', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('schlei')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['ALEXANDER SCHLEICHER GMBH & CO', 'Alexander Schleicher', 'SCHLEICHER', 'SCHLEICHER ALEXANDER GMBH & CO', 'SCHLEICHER ALEXANDER', 'Schlei',
                        'Schleicher Alexander Gmbh']),'Make'] = 'Schleicher'

df[df['Make'] == 'Schleicher'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Schleicher', 'Aircraft_Category'] = 'Glider'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('ercou')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Ercoupe (eng & Research Corp.)', 'ERCOUPE', 'Ercoupe (Eng & Research Corp.)']),'Make'] = 'Ercoupe'

df[df['Make'] == 'Ercoupe'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Ercoupe', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('weatherl')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['WEATHERLY AVIATION CO INC', 'WEATHERLY', 'Weatherly Aviation Company Inc']),'Make'] = 'Weatherly'

df[df['Make'] == 'Weatherly'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Weatherly', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('ryan')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['RYAN', 'RYAN AERONAUTICAL', 'Ryan Aeronautical', 'Ryan Aeronautics']),'Make'] = 'Ryan'

df[df['Make'] == 'Ryan'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Ryan', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('camer')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Cameron Balloons', 'CAMERON BALLOONS US', 'CAMERON', 'CAMERON BALLOONS', 'Cameron Balloon', 'CAMERON BALLOONS U S', 'Cameron Ballon', 'Cameron Balloons US',
                       'Cameron Balloons Us']),'Make'] = 'Cameron'

df[df['Make'] == 'Cameron'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Cameron', 'Aircraft_Category'] = 'Balloon'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('fokk')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['FOKKER']),'Make'] = 'Fokker'

df[df['Make'] == 'Fokker'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Fokker', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('smith, ted a')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Smith, Ted Aerostar', 'Ted Smith']),'Make'] = 'Ted Smith Aerostar'

df[df['Make'] == 'Ted Smith Aerostar'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Ted Smith Aerostar', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('rotorw')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['ROTORWAY', 'Rotorway Aircraft, Inc.', 'Rotorway Executive']),'Make'] = 'Rotorway'

df[df['Make'] == 'Rotorway'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Rotorway', 'Aircraft_Category'] = 'Helicopter'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('aviat')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['AVIAT', 'AVIAT AIRCRAFT', 'AVIAT AIRCRAFT INC', 'Aviat Aircraft Inc', 'Aviat Aircraft Inc.', 'Aviat Aircraft, Inc.',
                       'AVIAT INC', 'Aviat Inc', 'AVIATE']),'Make'] = 'Aviat'

df[df['Make'] == 'Aviat'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Aviat', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('gulfs')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['GULFSTREAM', 'Gulfstream Aerospace', 'GULFSTREAM AEROSPACE', 'Gulfstream Aerospace Corp', 'Gulfstream Aerospace Corp.',
                       'Gulfstream Aerospace LP', 'GULFSTREAM AM CORP COMM DIV', 'Gulfstream American', 'GULFSTREAM AMERICAN CORP',
                       'Gulfstream American Corp', 'GULFSTREAM AMERICAN CORP.', 'Gulfstream American Corp.', 'GULFSTREAM SCHWEIZER A/C CORP',
                       'Gulfstream-schweizer', 'Gulfstream-Schweizer', 'GULFSTREAM-SCHWEIZER', 'GULFSTREAM-SCHWEIZER A/C CORP']),'Make'] = 'Gulfstream'

df[df['Make'] == 'Gulfstream'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Gulfstream', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('gates')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['GATES LEARJET CORP.', 'GATES LEAR JET', 'GATES LEAR JET CORP.', 'GATES LEARJET', 'GATES LEARJET CORP',
                        'Gates Lear Jet', 'Gates Learjet Corporation']),'Make'] = 'Gates Learjet'

df[df['Make'] == 'Gates Learjet'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Gates Learjet', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('eipp')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['EIPPER', 'EIPPER FORMANCE INC', 'Eippen Aircraft', 'Eipper Formance', 'Eipper Mx Ii Quicksilver', 'Eipper Quicksilver',
                       'Eipper Quicksiver E', 'Eipper-formance']),'Make'] = 'Eipper'

df[df['Make'] == 'Eipper'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Eipper', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('saab')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Saab-scania Ab (saab)', 'SAAB', 'Saab-fairchild', 'Saab-scania', 'SAAB-SCANIA', 'SAAB-SCANIA AB',
                        'Saab-Scania AB (Saab)']),'Make'] = 'Saab'

df[df['Make'] == 'Saab'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Saab', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('canada')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['CANADAIR', 'CANADAIR LTD']),'Make'] = 'Canadair'

df[df['Make'] == 'Canadair'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Canadair', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('wsk')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Wsk', 'Wsk Pzl', 'WSK PZL MIELEC', 'WSK-MIELEC', 'WSK-PZL MEILEC', 'Wsk-pzl Mielec',
                        'Wsk-pzl Mielic']),'Make'] = 'Wsk Pzl Mielec'

df[df['Make'] == 'Wsk Pzl Mielec'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Wsk Pzl Mielec', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('aerot')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['AEROTEK', 'AEROTEK INC', 'Aerotek-pitts', 'Aerotrek']),'Make'] = 'Aerotek'

df[df['Make'] == 'Aerotek'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Aerotek', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('conv')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Convair Div. Of Gen. Dynamics', 'CONVAIR']),'Make'] = 'Convair'

df[df['Make'] == 'Convair'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Convair', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df['Aircraft_Category'].value_counts(dropna=False)

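We are down to about 6,000 empty category entries at this point. The Amateur_Built column may let us remove many of these rows: amateur-built (homebuilt) aircraft are not candidates for a company buying aircraft for corporate transportation, so they are not needed for this analysis.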
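
The check `df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)` is re-run after almost every fix. A small helper could report the remaining count and its share of all rows alongside the top makes. This is a sketch; `uncategorized_summary` is a hypothetical name and the toy rows are illustrative.

```python
import pandas as pd

def uncategorized_summary(df, n=10):
    """Return (rows without a category, their share of all rows, top-n makes among them)."""
    empty = df["Aircraft_Category"].isna()
    top_makes = df.loc[empty, "Make"].value_counts().head(n)
    return int(empty.sum()), float(empty.mean()), top_makes

df = pd.DataFrame({
    "Make": ["Vans", "Vans", "Cessna", "Kitfox"],
    "Aircraft_Category": [None, None, "Airplane", None],
})

remaining, share, top_makes = uncategorized_summary(df, n=2)
print(f"{remaining} uncategorized rows ({share:.0%})")
print(top_makes)
```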

In [None]:
# show the amateur_built value_counts for the empty category entries
df[df['Aircraft_Category'].isna()]['Amateur_Built'].value_counts(dropna=False)

So we can drop the almost 4,000 uncategorized rows that are marked as Amateur_Built.
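
The cell below removes these rows with `df.drop(...index)`. An equivalent sketch keeps the complement of a boolean mask and reports how many rows were removed (toy rows, illustrative only). Filtering by mask also sidesteps a subtle issue with dropping by label: if the index ever contains duplicate labels (for example after a `pd.concat` without `ignore_index=True`), `drop` removes every row sharing a label, not just the matching ones.

```python
import pandas as pd

df = pd.DataFrame({
    "Make": ["Vans", "Cessna", "Kitfox", "Cessna"],
    "Amateur_Built": ["Yes", "No", "Yes", "No"],
    "Aircraft_Category": [None, "Airplane", "Airplane", None],
})

# Remove only rows that are homebuilt AND still uncategorized
to_drop = df["Amateur_Built"].eq("Yes") & df["Aircraft_Category"].isna()
before = len(df)
df = df.loc[~to_drop].copy()
dropped = before - len(df)
print(f"Dropped {dropped} rows; {len(df)} remain")
```

The categorized Kitfox row survives because the condition requires both flags, just as in the original cell.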

In [None]:
df = df.drop(df[(df['Amateur_Built'] == 'Yes') & (df['Aircraft_Category'].isna())].index)

df[df['Aircraft_Category'].isna()]['Amateur_Built'].value_counts(dropna=False)

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('navi')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['NAVION']),'Make'] = 'Navion'

df[df['Make'] == 'Navion'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'] == 'Navion', 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('euroc')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['EUROCOPTER', 'Eurocopter France', 'EUROCOPTER DEUTSCHLAND GMBH', 'Eurocopter Deutschland', 'EUROCOPTER FRANCE',
                        'Eurocopter Deutsch', 'Eurocopter Deutschland Gmbh']),'Make'] = 'Eurocopter'

df[df['Make'] == 'Eurocopter'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('lusc')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['LUSCOMBE', 'Luscombe Silvaire Aircraft Co.']),'Make'] = 'Luscombe'

df[df['Make'] == 'Luscombe'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('stins')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['STINSON']),'Make'] = 'Stinson'

df[df['Make'] == 'Stinson'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('soca')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['SOCATA', 'Socata-Groupe Aerospatiale']),'Make'] = 'Socata'

df[df['Make'] == 'Socata'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('britt')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Britten-norman', 'BRITTEN NORMAN', 'BRITTEN-NORMAN', 'Britten Norman']),'Make'] = 'Britten-Norman'

df[df['Make'] == 'Britten-Norman'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('kama')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['KAMAN', 'KAMAN AEROSPACE CORP']),'Make'] = 'Kaman'

df[df['Make'] == 'Kaman'].value_counts('Aircraft_Category', dropna=False)

There's one airplane listed for Kaman. Which model is it?

In [None]:
# what are the value_counts for Model when Aircraft_Category is Airplane and Make is Kaman
df[(df['Aircraft_Category'] == 'Airplane') & (df['Make'] == 'Kaman')]['Model'].value_counts(dropna=False)

The K1200 (the Kaman K-MAX) is a helicopter, so that entry is wrong. It gets corrected when all Kaman rows are set to Helicopter below.
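
A stray label like this could be hunted for systematically: count how many distinct categories each make has and review the makes with more than one. Below is a sketch with toy rows. A mixed make is not automatically an error (Fairchild legitimately has both airplanes and helicopters in this data), so the output is a review list rather than a fix.

```python
import pandas as pd

df = pd.DataFrame({
    "Make": ["Kaman", "Kaman", "Kaman", "Cessna", "Cessna", "Robinson"],
    "Model": ["K-1200", "K1200", "HH-43F", "172N", "208B", "R44"],
    "Aircraft_Category": ["Helicopter", "Airplane", None, "Airplane", "Airplane", "Helicopter"],
})

# Distinct non-null categories per make (nunique skips NaN by default)
n_categories = df.groupby("Make")["Aircraft_Category"].nunique()
mixed = n_categories[n_categories > 1].index.tolist()

# Models behind each category of the mixed makes, so the odd one out stands out
review = (
    df[df["Make"].isin(mixed)]
    .groupby(["Make", "Aircraft_Category"])["Model"]
    .unique()
)
print(mixed)
print(review)
```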

In [None]:
df[df['Make'].str.lower().str.startswith('short b')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['SHORT BROS', 'SHORT BROS. & HARLAND', 'Short Bros.', 'SHORT BROTHERS & HARLAND LTD.',
                        'SHORT BROTHERS PLC']),'Make'] = 'Short Brothers'

df[df['Make'] == 'Short Brothers'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('rol')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Rolladen-schneider', 'ROLLADEN-SCHNEIDER', 'ROLLADEN-SCHNEIDER OHG', 'ROLLADEN SCHNEIDER OHG', 'ROLLADEN-SCHNEIDER GMBH',
                       'Rolladen Schneider', 'Rolladen-schneider Gmbh']),'Make'] = 'Rolladen-Schneider'

df[df['Make'] == 'Rolladen-Schneider'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('lak')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['LAKE']),'Make'] = 'Lake'

df[df['Make'] == 'Lake'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Rolladen-Schneider']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Kaman', 'Eurocopter']),'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Britten-Norman', 'Socata', 'Stinson', 'Luscombe', 'Short Brothers', 'Lake']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('pila')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['PILATUS', 'PILATUS AIRCRAFT LTD', 'PILATUS BRITTEN-NORMAN', 'Pilatus Aircraft', 'Pilatus Britten-norman']),'Make'] = 'Pilatus'

df[df['Make'] == 'Pilatus'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('agu')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['AGUSTA', 'AGUSTA SPA', 'AGUSTA BELL', 'AGUSTAWESTLAND', 'AGUSTA AEROSPACE CORP', 'AGUSTAWESTLAND SPA',
                       'AGUSTAWESTLAND PHILADELPHIA', 'AGUSTAWESTLAND PHILADELPHIA CO', 'Agusta Spa', 'Agusta-bell', 'Agusta/Westland',
                       'AgustaWestland', 'AgustadWestland']),'Make'] = 'Agusta'

df[df['Make'] == 'Agusta'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('texas h')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['TEXAS HELICOPTER CORP', 'Texas Helicopter Corp.', 'Texas Helicopter Corporation']),'Make'] = 'Texas Helicopter'

df[df['Make'] == 'Texas Helicopter'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('contin')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['CONTINENTAL COPTERS INC.', 'Continental', 'CONTINENTAL COPTERS INC', 'CONTINENTAL COPTERS',
                        'Continental Mk5a']),'Make'] = 'Continental Copters'

df[df['Make'] == 'Continental Copters'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('alon')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['ALON', 'Alon Aircoupe']),'Make'] = 'Alon'

df[df['Make'] == 'Alon'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('hawk')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['HAWKER', 'HAWKER AIRCRAFT LTD', 'Hawker Aircraft Ltd', 'Hawker Aircraft Ltd.', 'HAWKER BEECH', 'Hawker Beech',
                       'Hawker Beechcraft', 'HAWKER BEECHCRAFT', 'HAWKER BEECHCRAFT CORP', 'Hawker Beechcraft Corp.', 'Hawker Beechcraft Corporation',
                       'HAWKER BEECHCRAFT CORPORATION', 'Hawker Siddeley', 'HAWKER SIDDELEY', 'Hawker Siddely', 'Hawker-Beechcraft',
                        'Hawker-beechcraft', 'Hawker-Beechcraft Corporation']),'Make'] = 'Hawker'

df[df['Make'] == 'Hawker'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('let')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['LET', 'Let Np Kinovice']),'Make'] = 'Let'

df[df['Make'] == 'Let'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('diam')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['DIAMOND', 'Diamond', 'Diamond Aicraft Industries Inc', 'Diamond Aircraft', 'DIAMOND AIRCRAFT', 'DIAMOND AIRCRAFT IND GMBH',
                       'DIAMOND AIRCRAFT IND INC', 'DIAMOND AIRCRAFT INDUSTRIES', 'DIAMOND AIRCRAFT INDUSTRIES IN',
                        'Diamond Aircraft Industry Inc']),'Make'] = 'Diamond Aircraft Industries'

df[df['Make'] == 'Diamond Aircraft Industries'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('sia')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Siai-marchetti', 'SIAI-MARCHETTI', 'SIAI MARCHETTI', 'Siai Marchetti', 'Siai-Marchetti']),'Make'] = 'SIAI-Marchetti'

df[df['Make'] == 'SIAI-Marchetti'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Let']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Continental Copters', 'Texas Helicopter', 'Agusta']),'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Pilatus', 'Alon', 'Hawker', 'Diamond Aircraft Industries', 'SIAI-Marchetti']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('i.c')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['I.c.a. Brasov', 'I.C.A.-BRASOV (ROMANIA)', 'I.C.A.-Brasov', 'I.c.a. Brasov – Romania', 'I.c.a.-brasov',
                       'ICA BRASOV']),'Make'] = 'ICA Brasov'

df[df['Make'] == 'ICA Brasov'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('isr')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['ISRAEL AIRCRAFT INDUSTRIES', 'ISRAEL AEROSPACE INDUSTRIESLTD']),'Make'] = 'Israel Aircraft Industries'

df[df['Make'] == 'Israel Aircraft Industries'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('dorni')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['DORNIER', 'DORNIER GMBH']),'Make'] = 'Dornier'

df[df['Make'] == 'Dornier'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('sno')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['SNOW']),'Make'] = 'Snow'

df[df['Make'] == 'Snow'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('yako')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['YAKOVLEV', 'YAKOVLEV/CHINNERY', 'YAKOVLEV/DAY']),'Make'] = 'Yakovlev'

df[df['Make'] == 'Yakovlev'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('thund')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Thunder And Colt', 'Thunder Balloons, Ltd.', 'THUNDER & COLT', 'THUNDER & COLT AIRBORNE AMER',
                        'Thunder & Colt Ltd', 'COLT BALLOONS', 'LINDSTRAND BALLOONS', 'Lindstrand Balloons', 'Lindstrand',
                       'LINDSTRAND', 'LINDSTRAND BALLOONS USA']),'Make'] = 'Thunder & Colt Balloons'

df[df['Make'] == 'Thunder & Colt Balloons'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'] == 'Callair'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('garli')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Garlick', 'GARLICK', 'GARLICK HELICOPTERS INC', 'Garlick Helicipters Inc.', 'Garlick Helicopters Inc',
                        'Garlick Helicopters Inc.']),'Make'] = 'Garlick Helicopters'

df[df['Make'] == 'Garlick Helicopters'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('cirr')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['CIRRUS DESIGN CORP', 'CIRRUS', 'Cirrus Design', 'CIRRUS DESIGN', 'Cirrus Design Corp', 'Cirrus Design Corp.',
                        'CIRRUS DESIGN CORP.', 'Cirrus Design Corporation', 'CIRRUS DESIGN CORPORATION']),'Make'] = 'Cirrus'

df[df['Make'] == 'Cirrus'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Thunder & Colt Balloons']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['ICA Brasov']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Garlick Helicopters']),'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Callair', 'Israel Aircraft Industries', 'Dornier', 'Snow', 'Yakovlev', 'Cirrus']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('christen')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Christen Industries Inc.', 'Christen', 'Christen Industries Inc.',
                        'Christen Industries, Inc.']),'Make'] = 'Christen Industries'

df[df['Make'] == 'Christen Industries'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('american c')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['AMERICAN CHAMPION AIRCRAFT', 'American Champion (acac)', 'American Champion Aircraft', 'American Champion (ACAC)',
                       'AMERICAN CHAMPION', 'AMERICAN Champion', 'American Champion Aircraft Cor']),'Make'] = 'American Champion'

df[df['Make'] == 'American Champion'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('casa')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['CASA']),'Make'] = 'Casa'

df[df['Make'] == 'Casa'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('dass')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Dassault-breguet', 'Dassault Aviation', 'DASSAULT', 'Dassault/sud', 'DASSAULT AVIATION', 'DASSAULT-BREGUET',
                       'DASSAULT/SUD', 'Dassault Falcon', 'Dassault-Breguet']),'Make'] = 'Dassault'

df[df['Make'] == 'Dassault'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('consolid')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Consolidated-vultee', 'CONSOLIDATED AERONAUTICS INC.', 'CONSOLIDATED VULTEE', 'CONSOLIDATED AERONAUTICS INC.',
                       'CONSOLIDATED AERONAUTICS', 'CONSOLIDATED AERONAUTICS INC.', 'Consolidated Aero', 'Consolidated Aeronautics, Inc',
                       'Consolidated Aeronautics, Inc.', 'Consolidated Aeronautics Inc.', 'CONSOLIDATED  AERONAUTICS INC.',
                       'CONSOLIDATED AERONAUTICS INC']),'Make'] = 'Consolidated'
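Several of the variant lists differ only in case, punctuation, or spacing (for example the double space in 'CONSOLIDATED  AERONAUTICS INC.'). One possible way to surface such near-duplicates before writing an `isin` list is to group makes on a normalized key. The `make_key` helper and its rules (casefold, turn punctuation into spaces, collapse whitespace) are assumptions, and the key is only meant for finding candidates, not as the final make name.

```python
import re
import pandas as pd

def make_key(name: str) -> str:
    """Hypothetical matching key: casefolded, punctuation removed, whitespace collapsed."""
    key = name.casefold()
    key = re.sub(r'[^\w\s]', ' ', key)       # turn punctuation into spaces
    return re.sub(r'\s+', ' ', key).strip()   # collapse runs of whitespace

makes = pd.Series(['CONSOLIDATED AERONAUTICS INC.', 'CONSOLIDATED  AERONAUTICS INC.',
                   'Consolidated Aeronautics Inc.', 'Consolidated Aero', 'Cessna'])
# Group the raw spellings by their key and keep the distinct spellings per key.
groups = makes.groupby(makes.map(make_key)).unique()
# Keys shared by more than one raw spelling are candidates for consolidation.
candidates = groups[groups.map(len) > 1]
print(candidates.index.tolist())
# ['consolidated aeronautics inc']
```

A key like this will not catch abbreviations such as 'Consolidated Aero', so the prefix searches used in this notebook remain useful alongside it.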

df[df['Make'] == 'Consolidated'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('brant')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['BRANTLY', 'Brantly', 'Brantly-hynes', 'Brantley']),'Make'] = 'Brantly Helicopter'

df[df['Make'] == 'Brantly Helicopter'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('glasf')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['GLASFLUGEL']),'Make'] = 'Glasflugel'

df[df['Make'] == 'Glasflugel'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('glob')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['GLOBE', 'Globe Swift']),'Make'] = 'Globe'

df[df['Make'] == 'Globe'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('american')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['AMERICAN', 'American Aircraft', 'American Aviation', 'AMERICAN AVIATION', 'American Aviation Corp. (aac)',
                       'American Aviation Corp. (AAC)']),'Make'] = 'American'
df.loc[df['Make'].isin(['AMERICAN AIR RACING LTD']),'Make'] = 'American Air Racing'
df.loc[df['Make'].isin(['AMERICAN BLIMP', 'American Blimp Corp.', 'American Blimp Corporation']),'Make'] = 'American Blimp'
df.loc[df['Make'].isin(['AMERICAN EUROCOPTER', 'AMERICAN EUROCOPTER CORP', 'AMERICAN EUROCOPTER LLC']),'Make'] = 'American Eurocopter'
df.loc[df['Make'].isin(['AMERICAN GENERAL ACFT CORP']),'Make'] = 'American General Aircraft'
df.loc[df['Make'].isin(['American Legand Aircraft', 'AMERICAN LEGEND', 'AMERICAN LEGEND AIRCRAFT CO', 'American Legend Aircraft Co.',
                       'American Legend Aircraft Compa']),'Make'] = 'American Legend'

df[df['Make'] == 'American'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Glasflugel']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Brantly Helicopter']),'Aircraft_Category'] = 'Helicopter'
df.loc[df['Make'].isin(['Christen Industries', 'American Champion', 'Casa', 'Dassault', 'Consolidated', 'Globe',
                        'American']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('extra')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['EXTRA FLUGZEUGBAU GMBH', 'EXTRA FLUGZEUGPRODUKTIONS-UND', 'Extra Flugzeugbau Gmbh', 'EXTRA FLUGZEUGBAU',
                       'EXTRA Flugzeugproduktions-GMBH', 'Extra Flugzeugproduktions-und', 'Extra Flugzeugrau Gmbh']),'Make'] = 'Extra Flugzeugbau'
df.loc[df['Make'].isin(['EXTRA']),'Make'] = 'Extra'
df[df['Make'] == 'Extra Flugzeugbau'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('glase')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Glaser-dirks', 'Glaser Dirks', 'GLASER DIRKS', 'GLASER-DIRKS', 'Glaser-Dirks Flugzeugbau',
                        'Glaser-dirks-flugzeubau']),'Make'] = 'Glaser-Dirks'

df[df['Make'] == 'Glaser-Dirks'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('sukh')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['SUKHOI']),'Make'] = 'Sukhoi'

df[df['Make'] == 'Sukhoi'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('eiri')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['EIRIAVION OY']),'Make'] = 'Eiriavion Oy'

df[df['Make'] == 'Eiriavion Oy'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('forne')].value_counts('Make')

In [None]:
df[df['Make'] == 'Forney'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('curtiss')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['CURTISS WRIGHT', 'Curtiss-wright', 'Curtiss Wright']),'Make'] = 'Curtiss-Wright'

df[df['Make'] == 'Curtiss-Wright'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('classic')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Classic Aircraft Corp.', 'CLASSIC AIRCRAFT CORP', 'Classic Aircraft Corp']),'Make'] = 'Classic Aircraft'

df[df['Make'] == 'Classic Aircraft'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('adams')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['ADAMS BALLOONS LLC', 'Adams Balloon']),'Make'] = 'Adams Balloons'

In [None]:
df[df['Make'].str.lower().str.startswith('adams')].value_counts('Make')

In [None]:
df[df['Make'] == 'Adams'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'] == 'Adams'].value_counts('Model', dropna=False)

Looking up the Adams models splits them into two groups. Balloons: A55S, A55, AB, A-60, A60S, AX-9. Planes: Airborne Australia O, KITFOX, RV-6A, SONERAI II.
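Splitting one make by model, as done for Adams here (and later for PZL, Slingsby, Smith, and Grob), can also be written as a model-to-category map applied only to that make's rows. A sketch, assuming the model lists above are complete; `ADAMS_MODELS` and `categorize_by_model` are hypothetical names and the sample rows are invented. Rows of the make whose model is not in the map are left unchanged.

```python
import pandas as pd

# Model -> category for one make, taken from the lookup noted above.
ADAMS_MODELS = {
    **dict.fromkeys(['A55S', 'A55', 'AB', 'A-60', 'A60S', 'AX-9'], 'Balloon'),
    **dict.fromkeys(['Airborne Australia O', 'KITFOX', 'RV-6A', 'SONERAI II'], 'Airplane'),
}

def categorize_by_model(df: pd.DataFrame, make: str, model_map: dict) -> pd.DataFrame:
    """Set Aircraft_Category from Model, only for rows of `make` whose model is in the map."""
    out = df.copy()
    mask = out['Make'].eq(make) & out['Model'].isin(list(model_map))
    out.loc[mask, 'Aircraft_Category'] = out.loc[mask, 'Model'].map(model_map)
    return out

sample = pd.DataFrame({'Make': ['Adams', 'Adams', 'Adams', 'Vans'],
                       'Model': ['A55S', 'KITFOX', 'UNLISTED', 'RV-6A'],
                       'Aircraft_Category': [None, None, None, 'Airplane']})
result = categorize_by_model(sample, 'Adams', ADAMS_MODELS)
print(result['Aircraft_Category'].tolist())
```

Scoping the mask to the make matters: the same model string (here RV-6A) can appear under other makes, and those rows should not be touched.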

In [None]:
df[df['Model'].isin(['SONERAI II']) & df['Make'].isin(['Adams'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[(df['Model'].isin(['A55S', 'A55', 'AB', 'A-60', 'A60S', 'AX-9'])) & (df['Make'].isin(['Adams'])), 'Aircraft_Category'] = 'Balloon'
df.loc[(df['Model'].isin(['Airborne Australia O', 'KITFOX', 'RV-6A', 'SONERAI II'])) & (df['Make'].isin(['Adams'])), 'Aircraft_Category'] = 'Airplane'

In [None]:
df[df['Make'] == 'Adams'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('adams')].value_counts('Make')

In [None]:
df.loc[(df['Make'].isin(['Adams'])) & (df['Aircraft_Category'].isin(['Balloon'])), 'Make'] = 'Adams Balloons'

df[df['Make'].str.lower().str.startswith('adams')].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('stearm')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['STEARMAN AIRCRAFT', 'STEARMAN']),'Make'] = 'Stearman'

df[df['Make'] == 'Stearman'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Glaser-Dirks', 'Eiriavion Oy']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Adams Balloons']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Stearman', 'Extra Flugzeugbau', 'Sukhoi', 'Forney', 'Curtiss-Wright', 'Classic Aircraft']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('aerof')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Aerofab Inc.', 'AEROFAB INC', 'AEROFAB INC.', 'Aerofab, Inc.']),'Make'] = 'Aerofab'

df[df['Make'] == 'Aerofab'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('american g')].value_counts('Make')

In [None]:
df[df['Make'] == 'American General Aircraft'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('culv')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['CULVER']),'Make'] = 'Culver'

df[df['Make'] == 'Culver'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('air &')].value_counts('Make')

In [None]:
df[df['Make'] == 'Air & Space'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('inters')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['INTERSTATE']),'Make'] = 'Interstate'

df[df['Make'] == 'Interstate'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('bomb')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['BOMBARDIER INC', 'BOMBARDIER', 'Bombardier, Inc.', 'BOMBARDIER LEARJET CORP.', 'Bombardier Aerospace, Inc.',
                       'Bombardier Canadair']),'Make'] = 'Bombardier'

df[df['Make'] == 'Bombardier'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('rayth')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['RAYTHEON AIRCRAFT COMPANY', 'Raytheon Aircraft Company', 'Raytheon Corporate Jets', 'RAYTHEON', 'RAYTHEON COMPANY',
                       'RAYTHEON CORPORATE JETS INC', 'Raytheon Co']),'Make'] = 'Raytheon'

df[df['Make'] == 'Raytheon'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('barnes')].value_counts('Make')

In [None]:
df[df['Make'] == 'Barnes'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('pzl')].value_counts('Make')

In [None]:
# All of these PZL spellings refer to the same PZL family of manufacturers
df.loc[df['Make'].isin(['Pzl-mielec', 'Pzl-bielsko', 'Pzl', 'PZL-SWIDNIK', 'PZL MIELEC', 'Pzl Warzawa-okecie', 'Pzl Okecie', 'Pzl-okecie',
                       'Pzl Warzawa-cnpsl', 'Pzl Swidnik', 'PZL BIELSKO', 'PZL-Swidnik', 'PZL-BIELSKO', 'PZL Warszawa-Okecie', 'PZL SWIDNIK',
                       'PZL OKECIE', 'Pzl-swidnik']),'Make'] = 'PZL'

df[df['Make'] == 'PZL'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# PZL built several categories of aircraft, so check whether the models indicate which category each entry belongs to
df[df['Make'] == 'PZL'].value_counts('Model', dropna=False)

In [None]:
# PZL covers multiple aircraft categories; a Google search supplied the category for each of these models
df.loc[(df['Make'].isin(['PZL'])) & (df['Model'].isin(['PW-5', 'SZD-59', 'PW 5', 'SZD-45A OGAR', 'SZD 50-3', 'SZD-42-2 JANTAR', 'SZD-48-3',
                                                      'SZD-50-3', 'SZD-55-1', 'SZD51', 'SZD 55-1', 'PW 6U', 'JANTAR 2A',
                                                       '55-1'])), 'Aircraft_Category'] = 'Glider'

df.loc[(df['Make'].isin(['PZL'])) & (df['Model'].isin(['SW4', 'SW-4'])), 'Aircraft_Category'] = 'Helicopter'

df.loc[(df['Make'].isin(['PZL'])) & (df['Model'].isin(['M-18A', 'M18', 'PZL-M-18', 'PZL-104 Wilga 35A', 'PZL-104 WILGA 80', 'PZL104',
                                                      '101', 'PZL-104', 'PZL-104 WILGA 35A', 'PZL-104 35A', '101A', 'MIG-17', 'M-18T', 'M-18B',
                                                      'M-18A DROMADER', 'M-18', 'KOLIBER -150A', '80', '104-80', '104 Wilga 80',
                                                      'Wilga 104-80'])), 'Aircraft_Category'] = 'Airplane'

df[df['Make'] == 'PZL'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Air & Space']),'Aircraft_Category'] = 'Gyrocraft'
df.loc[df['Make'].isin(['Barnes']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Aerofab', 'American General Aircraft', 'Culver', 'Interstate', 'Bombardier', 'Raytheon']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('nava')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['NAVAL AIRCRAFT FACTORY']),'Make'] = 'Naval Aircraft Factory'

df[df['Make'] == 'Naval Aircraft Factory'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('hispa')].value_counts('Make')

In [None]:
df[df['Make'] == 'Hispano Aviacion'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('parte')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['PARTENAVIA', 'PARTENAVIA S.P.A.', 'PARTENAVIA SPA']),'Make'] = 'Partenavia'

df[df['Make'] == 'Partenavia'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('picc')].value_counts('Make')

In [None]:
df[df['Make'] == 'Piccard'].value_counts('Model', dropna=False)

A Google search confirms that the models listed under Piccard are balloons.

In [None]:
df[df['Make'].str.lower().str.startswith('eagle a')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Eagle Aircraft Co.', 'EAGLE AIRCRAFT CO']),'Make'] = 'Eagle Aircraft'

df[df['Make'] == 'Eagle Aircraft'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('atr')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Atr']),'Make'] = 'ATR'

df[df['Make'] == 'ATR'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('great')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['GREAT LAKES', 'Great Lakes Aircraft Company']),'Make'] = 'Great Lakes'

df[df['Make'] == 'Great Lakes'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('nord')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Nord (sncan)', 'NORD', 'Nord (SNCAN)', 'Nord Aviation']),'Make'] = 'Nord'

df[df['Make'] == 'Nord'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('meyers')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['MEYERS', 'MEYERS INDUSTRIES INC', 'Meyers']),'Make'] = 'Meyers Aircraft Co.'

df[df['Make'] == 'Meyers Aircraft Co.'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Piccard']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Naval Aircraft Factory', 'Hispano Aviacion', 'Partenavia', 'Eagle Aircraft', 'ATR', 'Great Lakes', 'Nord',
                       'Meyers Aircraft Co.']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('fleet')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['FLEET']),'Make'] = 'Fleet'

df[df['Make'] == 'Fleet'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('mccu')].value_counts('Make')

In [None]:
df[df['Make'] == 'Mcculloch'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('nanc')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['NANCHANG CHINA', 'NANCHANG', 'Nanchang China']),'Make'] = 'Nanchang'

df[df['Make'] == 'Nanchang'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('head')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Head Balloons, Inc.', 'HEAD BALLOONS INC', 'HEAD', 'Head', 'HEAD BALLOONS INC.',
                        'Head Balloons Inc.']),'Make'] = 'Head Balloons'

df[df['Make'] == 'Head Balloons'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('aero v')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['AERO VODOCHODY', 'Aero Vodochody Aero. Works', 'Aero Vodochody Aero Works']),'Make'] = 'Aero Vodochody'

df[df['Make'] == 'Aero Vodochody'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('centrai')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['CENTRAIR']),'Make'] = 'Centrair'

df[df['Make'] == 'Centrair'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('comma')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['COMMANDER AIRCRAFT CO', 'Commander Aircraft Company', 'COMMANDER', 'Commander Aircraft']),'Make'] = 'Commander'

df[df['Make'] == 'Commander'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('quick')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['QUICKSILVER', 'QUICKSILVER AIRCRAFT', 'QUICKSILVER AIRCRAFT CO', 'Quicksilver Aircraft Northeast',
                        'QUICKSILVER EIPPER ACFT INC', 'QUICKSILVER ENTERPRISES INC', 'Quicksilver II', 'Quicksilver Manufacturing',
                        'QUICKSILVER MANUFACTURING INC', 'QUICKSILVER MFG']),'Make'] = 'Quicksilver'

df.loc[df['Make'].isin(['QUICKIE', 'Quickie-myers']),'Make'] = 'Quickie'
df[df['Make'] == 'Quickie'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'] == 'Quicksilver'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'] == 'Quicksilver'].value_counts('Model', dropna=False)

A Google search shows that these models are all ultralight aircraft, so the Airplane designation is incorrect and every Quicksilver entry can be changed to Ultralight.

In [None]:
df[df['Make'].str.lower().str.startswith('chanc')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['CHANCE VOUGHT']),'Make'] = 'Chance Vought'
df[df['Make'] == 'Chance Vought'].value_counts('Aircraft_Category', dropna=False)

In [None]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Head Balloons']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Centrair']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Quicksilver']),'Aircraft_Category'] = 'Ultralight'
df.loc[df['Make'].isin(['Mcculloch']),'Aircraft_Category'] = 'Gyrocraft'
df.loc[df['Make'].isin(['Fleet', 'Nanchang', 'Aero Vodochody', 'Commander', 'Chance Vought']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('reims')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Reims Aviation', 'REIMS', 'REIMS AVIATION SA', 'REIMS AVIATION S.A.', 'REIMS-CESSNA', 'REims', 'Reims Aviation Cessna',
                       'Reims-Cessna']),'Make'] = 'Reims'
df[df['Make'] == 'Reims'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('temc')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['TEMCO', 'Temco Luscombe']),'Make'] = 'Temco'
df[df['Make'] == 'Temco'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('nih')].value_counts('Make')

In [None]:
df[df['Make'] == 'Nihon'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('mono')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Monocoupe Aircraft', 'MONOCOUPE']),'Make'] = 'Monocoupe'
df[df['Make'] == 'Monocoupe'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('mitch')].value_counts('Make')

In [None]:
df[df['Make'] == 'Mitchell'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('sling')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['SLINGSBY', 'Slingsby Aviation Plc']),'Make'] = 'Slingsby'
df[df['Make'] == 'Slingsby'].value_counts('Aircraft_Category', dropna=False)

Wikipedia says Slingsby makes both gliders and planes, so I'll look at the models.

In [None]:
df[df['Make'] == 'Slingsby'].value_counts('Model', dropna=False)

Only the T67M 260 is an airplane; the rest are gliders.
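When a make has a single exception and every other model shares one category, the rule can be expressed as "exception, otherwise default" with `numpy.where` instead of listing each glider model. This is a sketch on invented sample rows, assuming (as the note above states) that every Slingsby model other than the T67M 260 is a glider. The trade-off is that it would also label any not-yet-researched Slingsby model as a glider, which the explicit model list in the next cell avoids.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'Make': ['Slingsby', 'Slingsby', 'Slingsby', 'Grob'],
                       'Model': ['T67M 260', 'KESTREL 19', 'T-51', 'G103'],
                       'Aircraft_Category': [None, None, None, 'Glider']})

is_slingsby = sample['Make'].eq('Slingsby')
# Exception first, default second: T67M 260 -> Airplane, any other Slingsby model -> Glider.
slingsby_category = np.where(sample.loc[is_slingsby, 'Model'].eq('T67M 260'), 'Airplane', 'Glider')
sample.loc[is_slingsby, 'Aircraft_Category'] = slingsby_category
print(sample['Aircraft_Category'].tolist())
# ['Airplane', 'Glider', 'Glider', 'Glider']
```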

In [None]:
df.loc[(df['Model'].isin(['T67M 260'])) & (df['Make'].isin(['Slingsby'])), 'Aircraft_Category'] = 'Airplane'

df.loc[(df['Model'].isin(['41-2', 'CAPSTAN TYPE 49B', 'DART T-51', 'KESTREL 19', 'Swallow Type T.45', 'T-51', 'T59D KESTREL 19',
                         'T65A', 'TYPE 43 SERIES 3F'])) & (df['Make'].isin(['Slingsby'])), 'Aircraft_Category'] = 'Glider'

In [None]:
df[df['Make'].str.lower().str.startswith('smith')].value_counts('Make')

In [None]:
df[df['Make'] == 'Smith'].value_counts('Model', dropna=False)

In [None]:
df[df['Make'] == 'Smith'].value_counts('Aircraft_Category', dropna=False)

The only helicopter under Smith is the WCS-222; the rest are planes, so I'll fill in Smith's categories by model.

In [None]:
df.loc[(df['Model'].isin(['WCS-222 (BELL 47G)'])) & (df['Make'].isin(['Smith'])), 'Aircraft_Category'] = 'Helicopter'

df.loc[(df['Model'].isin(['Aerostar 601P', 'AEROSTAR 600', 'Aerostar 601', 'MINIPLANE', 'AEROSTAR 601', 'LONG-EZ', 'MINIPLANE DSA-1',
                         'RV-4', 'S-51D', 'Stewart S51D', 'Zodiac 601XL'])) & (df['Make'].isin(['Smith'])), 'Aircraft_Category'] = 'Airplane'

df[df['Make'] == 'Smith'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('avian')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Avian', 'AVIAN BALLOON']),'Make'] = 'Avian Balloon'
df[df['Make'] == 'Avian Balloon'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].str.lower().str.startswith('governm')].value_counts('Make')

In [None]:
df[df['Make'] == 'Government Aircraft Fact (gaf)'].value_counts('Model', dropna=False)

All of these models are planes.

In [None]:
# fill in the previous makes for their respective categories
df.loc[df['Make'].isin(['Avian Balloon']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Reims', 'Temco', 'Nihon', 'Monocoupe', 'Mitchell', 'Government Aircraft Fact (gaf)']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

From here on, I'll only look at makes that have more than a handful of entries.
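The "more than a handful" cutoff can be made explicit by filtering the missing-category counts. A minimal sketch: the `uncategorized_makes` helper, the cutoff of 5, and the sample rows are all hypothetical choices, not values taken from the notebook.

```python
import pandas as pd

def uncategorized_makes(df: pd.DataFrame, min_count: int = 5) -> pd.Series:
    """Counts of makes with a missing Aircraft_Category, keeping makes with at least `min_count` such rows."""
    counts = df.loc[df['Aircraft_Category'].isna(), 'Make'].value_counts()
    return counts[counts >= min_count]

sample = pd.DataFrame({'Make': ['Republic'] * 6 + ['Beagle'] * 5 + ['Oneoff'] * 2 + ['Cessna'] * 9,
                       'Aircraft_Category': [None] * 13 + ['Airplane'] * 9})
print(uncategorized_makes(sample).to_dict())
# {'Republic': 6, 'Beagle': 5}
```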

In [None]:
df[df['Make'].str.lower().str.startswith('repu')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['REPUBLIC']),'Make'] = 'Republic'
df[df['Make'] == 'Republic'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Republic']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('beag')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Beagle Aircraft', 'BEAGLE']),'Make'] = 'Beagle'
df[df['Make'] == 'Beagle'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Beagle']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('lanc')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['LANCAIR', 'LANCAIR COMPANY', 'Lancair Company']),'Make'] = 'Lancair'
df[df['Make'] == 'Lancair'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Lancair']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('silv')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['SILVAIRE']),'Make'] = 'Silvaire'
df[df['Make'] == 'Silvaire'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Silvaire']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('travel')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['TRAVEL AIR']),'Make'] = 'Travel Air'
df[df['Make'] == 'Travel Air'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Travel Air']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('grob')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['GROB', 'GROB-WERKE', 'GROB AIRCRAFT AG']),'Make'] = 'Grob'
df[df['Make'] == 'Grob'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'] == 'Grob'].value_counts('Model', dropna=False)

In [None]:
df.loc[(df['Model'].isin(['G103', 'G102', 'G 103 TWIN II', 'G103 TWIN ASTIR',
                          'G103 Twin Astir'])) & (df['Make'].isin(['Grob'])), 'Aircraft_Category'] = 'Glider'

df.loc[(df['Model'].isin(['G 120A', '120A-1', 'G 180', 'G120A', 'G120TP-A'])) & (df['Make'].isin(['Grob'])), 'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('varg')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['VARGA AIRCRAFT CORP.']),'Make'] = 'Varga'
df[df['Make'] == 'Varga'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Varga']),'Aircraft_Category'] = 'Airplane'
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('airc')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Aircraft Mfg & Dev. Co. (amd)', 'AIRCRAFT MFG & DEVELOPMENT CO', 'AIRCRAFT MFG & DVLPMT CO',
                       'AIRCRAFT MFG & DESIGN LLC', 'Aircraft Mfg & Design LLC', 'Aircraft Mfg & Dev. Co.', 'Aircraft Mfg & Dev. Co. (AMD)',
                       'Aircraft Mfg & Development Co.']),'Make'] = 'Aircraft Mfg. & Dev. Co.'

df[df['Make'] == 'Aircraft Mfg. & Dev. Co.'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Aircraft Mfg. & Dev. Co.']),'Aircraft_Category'] = 'Airplane'

In [None]:
df[df['Make'] == 'Aircoupe'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Aircoupe']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'].str.lower().str.startswith('bucker')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Bucker Jungmann', 'BUCKER JUNGMANN', 'BUCKER JUNGMEISTER', 'Bucker', 'Bucker-jungmann']),'Make'] = 'Bucker Flugzeugbau'

df[df['Make'] == 'Bucker Flugzeugbau'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Bucker Flugzeugbau']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df[df['Aircraft_Category'] == 'Unknown']['Make'].value_counts()
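The notebook tracks two kinds of missing category: real NaN values and the literal string 'Unknown'. A small sketch that folds both into one boolean mask, so a single count covers every row still needing a category; the `missing_category_mask` helper and the sample rows are hypothetical.

```python
import pandas as pd

def missing_category_mask(df: pd.DataFrame) -> pd.Series:
    """True where Aircraft_Category is NaN/None or the placeholder string 'Unknown'."""
    cat = df['Aircraft_Category']
    return cat.isna() | cat.eq('Unknown')

sample = pd.DataFrame({'Make': ['Varieze', 'Unknown', 'Cessna', 'Varieze'],
                       'Aircraft_Category': [None, 'Unknown', 'Airplane', 'Unknown']})
mask = missing_category_mask(sample)
print(sample.loc[mask, 'Make'].value_counts().to_dict())
# {'Varieze': 2, 'Unknown': 1}
```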

In [None]:
df[df['Make'] == 'Varieze']['Aircraft_Category'].value_counts()

In [None]:
df.loc[df['Make'].isin(['Varieze']),'Aircraft_Category'] = 'Airplane'

df[df['Make'] == 'Unknown']['Model'].value_counts(dropna=False)

In [None]:
df[df['Make'].isin(['Unknown']) & df['Model'].isin(['Safari 400'])].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Make'].isin(['Unknown']) & df['Model'].isin(['Safari 400'])].head()

In [None]:
df[df['Make'].str.lower().str.startswith('saf')].value_counts('Make')

In [None]:
df.loc[(df['Model'].isin(['Safari 400'])) & (df['Make'].isin(['Unknown'])), 'Make'] = 'Safari Helicopter'

df[df['Make'].str.lower().str.startswith('saf')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['SAFARI']),'Make'] = 'Safari Helicopter'

df[df['Make'] == 'Safari Helicopter'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'] == 'Molino Oy'].value_counts('Model', dropna=False)

In [None]:
df[df['Make'].isin(['Molino Oy']) & df['Model'].isin(['MU2-2B-25'])].value_counts('Aircraft_Category')

In [None]:
df[df['Model'].str.lower().str.startswith('mu2')].value_counts('Make')

MU2-2B-25 is a Mitsubishi airplane model, while Molino Oy appears to make only gliders. This record was therefore most likely entered incorrectly, and its make should be changed to Mitsubishi.
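The Molino Oy check above looks up which makes a single model string is recorded under. Below is a minimal sketch that runs the same check across every model at once, using invented toy rows. It counts the distinct makes for each model and lists the make/model combinations for any model recorded under more than one make.

```python
import pandas as pd

# Toy rows invented for illustration (not real NTSB records).
df = pd.DataFrame({
    "Make":  ["Mitsubishi", "Mitsubishi", "Molino Oy", "Molino Oy", "Cessna"],
    "Model": ["MU2-2B-25", "MU2-2B-25", "MU2-2B-25", "GLIDER-A", "172"],
})

# Count how many distinct makes each model string appears under.
makes_per_model = df.groupby("Model")["Make"].nunique()

# Models recorded under two or more makes are candidates for a make/model mismatch.
suspect_models = makes_per_model[makes_per_model > 1].index

# Show the make/model combinations (with record counts) for those models.
suspects = (
    df[df["Model"].isin(suspect_models)]
    .value_counts(subset=["Model", "Make"])
    .sort_index()
)
print(suspects)
```

A model that is shared by several makes is not always an error. A generic model number such as "500" can legitimately belong to different manufacturers, so the output should be treated as a review list rather than a set of automatic fixes.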

In [None]:
df.loc[(df['Model'].isin(['MU2-2B-25'])) & (df['Make'].isin(['Molino Oy'])), 'Make'] = 'Mitsubishi'

df[df['Make'] == 'Molino Oy'].value_counts('Model', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Molino Oy']),'Aircraft_Category'] = 'Glider'

In [None]:
df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

In [None]:
df[df['Make'] == 'Howard Aircraft Corp.'].value_counts('Model', dropna=False)

The 500 model is an airplane made by Howard Aero Incorporated, not Howard Aircraft. The Tierra II airplane is made by Teratorn Aircraft.

In [None]:
df.loc[(df['Model'].isin(['500'])) & (df['Make'].isin(['Howard Aircraft Corp.'])), 'Make'] = 'Howard Aero Incorporated'
df.loc[(df['Model'].isin(['TIERRA II'])) & (df['Make'].isin(['Howard Aircraft Corp.'])), 'Make'] = 'Teratorn Aircraft'

df[df['Make'].str.lower().str.startswith('howard aero')].value_counts('Make')

In [None]:
df[df['Make'].str.lower().str.startswith('howard')].value_counts('Make')

In [None]:
df.loc[df['Make'].isin(['Howard Aircraft Corp.', 'HOWARD AIRCRAFT']),'Make'] = 'Howard Aircraft'

df[df['Make'].str.lower().str.startswith('howard')].value_counts('Make')

In [None]:
df[df['Make'] == 'Howard Aircraft'].value_counts('Aircraft_Category', dropna=False)

In [None]:
df.loc[df['Make'].isin(['Howard Aircraft']),'Aircraft_Category'] = 'Airplane'

df[df['Aircraft_Category'].isna()]['Make'].value_counts().head(10)

Some of these remaining empty categories can be filled in by looking up each make with a web search.

In [None]:
df.loc[df['Make'].isin(['American Blimp']),'Aircraft_Category'] = 'Blimp'
df.loc[df['Make'].isin(['General Balloon']),'Aircraft_Category'] = 'Balloon'
df.loc[df['Make'].isin(['Laister', 'Scheibe Flugzeugbau']),'Aircraft_Category'] = 'Glider'
df.loc[df['Make'].isin(['Intermountain Mfg. (imco)', 'Porterfield', 'Colonial', 'Artic Aircraft Corp.', 'Curtiss']),'Aircraft_Category'] = 'Airplane'

df['Aircraft_Category'].value_counts(dropna=False)
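The web-search fixes above use one `.loc` assignment per category. Below is a sketch of the same update driven by a single lookup dictionary. It assumes the make spellings match the ones in that cell exactly, and both the toy rows and the `make_to_category` name are illustrative. `Series.map` returns NaN for any make that is not in the dictionary, and `fillna` then falls back to the existing category. Like the `.loc` version, this overwrites the category of every listed make, even one that already had a value.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration.
df = pd.DataFrame({
    "Make": ["American Blimp", "Laister", "Curtiss", "Cessna", "Mystery Co"],
    "Aircraft_Category": [np.nan, np.nan, np.nan, "Airplane", np.nan],
})

# Hypothetical lookup table holding the same assignments as the cell above.
make_to_category = {
    "American Blimp": "Blimp",
    "General Balloon": "Balloon",
    "Laister": "Glider",
    "Scheibe Flugzeugbau": "Glider",
    "Intermountain Mfg. (imco)": "Airplane",
    "Porterfield": "Airplane",
    "Colonial": "Airplane",
    "Artic Aircraft Corp.": "Airplane",
    "Curtiss": "Airplane",
}

# map() gives NaN for unlisted makes; fillna() keeps their existing category.
df["Aircraft_Category"] = df["Make"].map(make_to_category).fillna(df["Aircraft_Category"])
print(df)
```

Keeping the lookup in one dictionary makes it easier to review and extend as more makes are researched.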

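At this point, the empty values in the category column have been reduced from about 56,000 to about 600. Before looking at the current dataset, the remaining abbreviated category codes are spelled out and the leftover empty values are labeled 'Unknown'.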

In [None]:
df.loc[df['Aircraft_Category'].isin(['UNK']), 'Aircraft_Category'] = 'Unknown'
df.loc[df['Aircraft_Category'].isin(['ULTR']), 'Aircraft_Category'] = 'Ultralight'
df.loc[df['Aircraft_Category'].isin(['WSFT']), 'Aircraft_Category'] = 'Weight-Shift'

In [None]:
df['Aircraft_Category'] = df['Aircraft_Category'].fillna('Unknown')
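The three code-expansion assignments and the `fillna` call above can also be written as a single chain using `Series.replace` with a dictionary. This sketch runs on invented toy rows. It assumes that 'UNK', 'ULTR', and 'WSFT' are the only abbreviated codes left in the column.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration.
df = pd.DataFrame({"Aircraft_Category": ["Airplane", "UNK", "ULTR", "WSFT", np.nan]})

# Abbreviated codes mapped to their full names (same mapping as the cell above).
code_names = {"UNK": "Unknown", "ULTR": "Ultralight", "WSFT": "Weight-Shift"}

# Expand the codes, then label anything still missing as 'Unknown'.
df["Aircraft_Category"] = df["Aircraft_Category"].replace(code_names).fillna("Unknown")
print(df["Aircraft_Category"].value_counts())
```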

In [None]:
df['Aircraft_Category'].value_counts(dropna=False)

In [None]:
df.info()

# Exploratory Data Analysis

In [None]:
# Percentages for Aircraft_Category
df['Aircraft_Category'].value_counts(normalize=True)

So the dataset is about 87% airplanes and almost 10% helicopters, with the remaining few percent (including 'Unknown') spread across the other categories.
Next, let's compare damage levels for airplanes and helicopters.

In [None]:
incidents_airplane = df[df['Aircraft_Category'] == 'Airplane']
incidents_helicopter = df[df['Aircraft_Category'] == 'Helicopter']

substantial_damage_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Substantial'].shape[0]
minor_damage_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Minor'].shape[0]
destroyed_airplane = incidents_airplane[incidents_airplane['Aircraft_damage'] == 'Destroyed'].shape[0]

substantial_damage_percent_airplane = substantial_damage_airplane / incidents_airplane.shape[0]*100
minor_damage_percent_airplane = minor_damage_airplane / incidents_airplane.shape[0]*100
destroyed_percent_airplane = destroyed_airplane / incidents_airplane.shape[0]*100

substantial_damage_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Substantial'].shape[0]
minor_damage_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Minor'].shape[0]
destroyed_helicopter = incidents_helicopter[incidents_helicopter['Aircraft_damage'] == 'Destroyed'].shape[0]

substantial_damage_percent_helicopter = substantial_damage_helicopter / incidents_helicopter.shape[0]*100
minor_damage_percent_helicopter = minor_damage_helicopter / incidents_helicopter.shape[0]*100
destroyed_percent_helicopter = destroyed_helicopter / incidents_helicopter.shape[0]*100

print(f'The percentage of substantial damage in all airplane incidents is {substantial_damage_percent_airplane:.1f}%')
print(f'The percentage of substantial damage in all helicopter incidents is {substantial_damage_percent_helicopter:.1f}%')
print()
print(f'The percentage of minor damage in all airplane incidents is {minor_damage_percent_airplane:.1f}%')
print(f'The percentage of minor damage in all helicopter incidents is {minor_damage_percent_helicopter:.1f}%')
print()
print(f'The percentage of airplane incidents in which the aircraft was destroyed is {destroyed_percent_airplane:.1f}%')
print(f'The percentage of helicopter incidents in which the aircraft was destroyed is {destroyed_percent_helicopter:.1f}%')

The damage level percentages are very similar for planes and helicopters.
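The comparison above computes each damage percentage separately. A more compact sketch builds the whole table with `pd.crosstab(..., normalize='index')`, shown here on invented toy rows. Missing damage values are filled with 'Unknown' first, which is an assumption about how to label them. Without this step `crosstab` would drop the NaN rows and change the denominator from the "all incidents" figure used above. The table also shows any other damage labels present in the column.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration.
df = pd.DataFrame({
    "Aircraft_Category": ["Airplane"] * 4 + ["Helicopter"] * 2,
    "Aircraft_damage": ["Substantial", "Substantial", "Destroyed", np.nan, "Minor", "Substantial"],
})

subset = df[df["Aircraft_Category"].isin(["Airplane", "Helicopter"])]

# Row-normalized counts: each row gives the share of that category's incidents per damage level.
damage_pct = (
    pd.crosstab(
        subset["Aircraft_Category"],
        subset["Aircraft_damage"].fillna("Unknown"),
        normalize="index",
    )
    .mul(100)
    .round(1)
)
print(damage_pct)
```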

For the business problem, I need to identify the best makes and models for my company to consider investing in. I've determined that my company is interested in single-engine airplanes, and possibly helicopters, for short-range corporate transportation.

So first I need to identify the single-engine aircraft in the dataset and split them into planes and helicopters.

In [None]:
single_engine_craft = df[df['Number_of_Engines'] == 1.0]

single_engine_planes = single_engine_craft[single_engine_craft['Aircraft_Category'] == 'Airplane']
single_engine_helicopters = single_engine_craft[single_engine_craft['Aircraft_Category'] == 'Helicopter']

print(single_engine_craft['Aircraft_Category'].value_counts())
print()
print(single_engine_planes['Aircraft_Category'].value_counts())
print()
print(single_engine_helicopters['Aircraft_Category'].value_counts())

So this tells me I have 57,806 plane accidents (single_engine_planes subset) and 6,346 helicopter accidents (single_engine_helicopters subset) to work with. Not all of these aircraft are suitable for corporate transportation, though, because many of them are small personal aircraft rather than business aircraft. Some further narrowing is needed to identify single-engine business aircraft.

In [None]:
# Look at the makes of the single-engine planes subset
print(single_engine_planes['Make'].value_counts().head(25))

Cessna currently makes a single-engine turboprop business plane in the Caravan series, which is listed in the dataset as model 208.

In [None]:
cessna_planes = single_engine_craft[single_engine_craft['Make'] == 'Cessna']

# show the models of cessna_planes that begin with 208
print(cessna_planes[cessna_planes['Model'].str.startswith('208')]['Model'].value_counts())
# compare with all airplane records (any make, any engine count) whose model begins with 208
print(incidents_airplane[incidents_airplane['Model'].str.startswith('208')]['Model'].value_counts())

The two counts differ, which means some Cessna 208 records have an incorrect value in the engine-number column, the make column, or both.

In [None]:
df[df['Model'].str.contains('208')].value_counts('Number_of_Engines', dropna=False)

Since the 208 models are known to be single-engine planes, I can correct the engine count here. This adds more usable records to the Cessna subset.

In [None]:
df[df['Model'].str.contains('208')].value_counts('Model', dropna=False)

In [None]:
df.loc[df['Model'].str.contains('208'), 'Number_of_Engines'] = 1.0

df[df['Model'].str.contains('208')].value_counts('Number_of_Engines', dropna=False)
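The `str.contains('208')` mask used above matches any model that contains '208' anywhere in the string, regardless of make. A non-Cessna twin whose model happened to include those digits would therefore also be set to one engine. Below is a minimal sketch of a stricter mask. It assumes the Cessna 208 records begin with '208' and are listed under 'Cessna' or 'TEXTRON AVIATION INC'. The toy rows, including the 'OtherMake' row, are invented.

```python
import pandas as pd

# Toy rows invented for illustration.
df = pd.DataFrame({
    "Make": ["Cessna", "Cessna", "TEXTRON AVIATION INC", "OtherMake", "Cessna"],
    "Model": ["208B", "208", "208B", "X-1208", None],
    "Number_of_Engines": [1.0, None, 2.0, 2.0, 1.0],
})

# Anchored test: the model must *start* with 208; na=False treats missing models as non-matches.
starts_208 = df["Model"].str.startswith("208", na=False)

# Restrict the fix to the makes the 208 records are known to be listed under.
cessna_family = df["Make"].isin(["Cessna", "TEXTRON AVIATION INC"])

mask = starts_208 & cessna_family
df.loc[mask, "Number_of_Engines"] = 1.0
print(df)
```

The 'X-1208' row would have matched `str.contains('208')` but is left alone here. Whether the anchored pattern misses any legitimate variants depends on how the 208 models are actually spelled in the data, and the model value counts above can confirm this.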

In [None]:
df[df['Model'].str.contains('208')].value_counts('Make', dropna=False)

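Since Textron Aviation is Cessna's parent company, the Textron 208 records can be merged into Cessna. Textron Aviation also owns Beechcraft, so only the rows with a 208 model are relabeled. This avoids turning any Beechcraft records into Cessnas.

In [None]:
# relabel only Textron rows with a 208 model; other Textron models may be Beechcraft aircraft
df.loc[df['Make'].isin(['TEXTRON AVIATION INC']) & df['Model'].str.contains('208'), 'Make'] = 'Cessna'

# refresh the single_engine and cessna subsets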
single_engine_craft = df[df['Number_of_Engines'] == 1.0]
cessna_planes = single_engine_craft[single_engine_craft['Make'] == 'Cessna']

# show the models of cessna_planes that contain 208
print(cessna_planes[cessna_planes['Model'].str.contains('208')]['Model'].value_counts())

In [None]:
cessna_208 = cessna_planes[cessna_planes['Model'].str.contains('208')]

cessna_208.value_counts('Make')

So the accident subset now contains 306 records for single-engine Cessna 208s.

Piper currently makes a single-engine business plane in its M series (M350, M500, M700), which is listed in our dataset as the PA-46.

In [None]:
piper_planes = single_engine_craft[single_engine_craft['Make'] == 'Piper']

print(piper_planes[piper_planes['Model'].str.contains('PA-46')]['Model'].value_counts())

In [None]:
# Let's make sure we have all the Piper PA-46 models in the piper subset
print(df[df['Model'].str.contains('PA-46')]['Model'].value_counts())

There are more PA-46 records in the full df than in the Piper subset.

In [None]:
# Is it the engine number that was entered incorrectly?
df[df['Model'].str.startswith('PA-46')].value_counts('Number_of_Engines', dropna=False)

In [None]:
# Or perhaps not every PA-46 record lists Piper as its make
df[df['Model'].str.startswith('PA-46')].value_counts('Make', dropna=False)
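The two diagnostic cells above check the engine count and the make separately. Below is a sketch that shows both at once as a make-by-engine-count table, using invented toy rows. `Number_of_Engines` is converted to strings first so that missing counts stay visible as a 'nan' column, because `crosstab` would otherwise leave NaN values out.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration.
df = pd.DataFrame({
    "Make": ["Piper", "Piper", "New Piper", "NEW PIPER AIRCRAFT INC", "Piper"],
    "Model": ["PA-46-310P", "PA-46-350P", "PA-46-350P", "PA-46-500TP", "PA-28-181"],
    "Number_of_Engines": [1.0, np.nan, 1.0, 1.0, 1.0],
})

# Keep only PA-46 rows; na=False guards against missing model strings.
pa46 = df[df["Model"].str.startswith("PA-46", na=False)]

# Make vs. engine count; astype(str) turns NaN into the label 'nan' so it gets its own column.
diag = pd.crosstab(pa46["Make"], pa46["Number_of_Engines"].astype(str))
print(diag)
```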

In [None]:
# Correct the engine numbers for PA-46 models and combine the piper makes into 'Piper'
df.loc[df['Model'].str.startswith('PA-46'), 'Number_of_Engines'] = 1.0
df.loc[df['Make'].isin(['NEW PIPER AIRCRAFT INC', 'New Piper', 'New Piper Aircraft, Inc.']),'Make'] = 'Piper'

# refresh the single_engine and piper subsets
single_engine_craft = df[df['Number_of_Engines'] == 1.0]
piper_planes = single_engine_craft[single_engine_craft['Make'] == 'Piper']

print(piper_planes[piper_planes['Model'].str.startswith('PA-46')]['Model'].value_counts())

In [None]:
piper_PA46 = piper_planes[piper_planes['Model'].str.startswith('PA-46')]

piper_PA46.value_counts('Make')

This leaves 219 PA-46 records in the Piper subset that fit the company's idea of a corporate plane for business clients.

A web search turns up a few more models that would fit our criteria: the Pilatus PC-12, the Socata (now Daher) TBM series, and the Kodiak series.

In [None]:
df[df['Make'].str.startswith('Pila')].value_counts('Make', dropna=False)
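Once several candidate aircraft are identified, one way to gather them into a single subset is a list of (make, model-prefix) pairs. This is a minimal sketch on invented toy rows. The Pilatus and Socata pairs use hypothetical spellings that would need to be checked against the actual `Make` and `Model` values first, for example with a `str.startswith` search like the one above. A Kodiak entry is left out because its make spelling in the data is not yet known.

```python
import pandas as pd

# Toy rows invented for illustration.
df = pd.DataFrame({
    "Make": ["Cessna", "Piper", "Pilatus", "Socata", "Cessna"],
    "Model": ["208B", "PA-46-350P", "PC-12/45", "TBM 700", "172N"],
    "Number_of_Engines": [1.0, 1.0, 1.0, 1.0, 1.0],
})

# Hypothetical (make, model-prefix) pairs; verify the real spellings in the data first.
candidates = [
    ("Cessna", "208"),
    ("Piper", "PA-46"),
    ("Pilatus", "PC-12"),
    ("Socata", "TBM"),
]

# OR together one make-and-prefix condition per candidate.
mask = pd.Series(False, index=df.index)
for make, prefix in candidates:
    mask |= (df["Make"] == make) & df["Model"].str.startswith(prefix, na=False)

business_singles = df[mask & (df["Number_of_Engines"] == 1.0)]
print(business_singles.value_counts("Make"))
```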

# Conclusions

## Limitations

## Recommendations

## Next Steps