Summary of observations
========

What we found in this analysis:

* The largest existing fleets are all US-based.
* The fastest-growing airlines are all low-cost carriers.
* The most popular planes are older, cheaper models (Boeing 737 and Airbus A320), and the large volume of new orders for them is driven by low-cost carriers.
* Among more recent planes, the A380 looks like a commercial failure, judging by its low number of new orders and the fact that Emirates is the only major airline investing in it. The 787 Dreamliner, on the other hand, is far more successful commercially, with orders spread across multiple airlines.
* That said, the overall trend is for the industry to lean more towards Airbus planes (new orders favor Airbus over Boeing overall).

Data Manipulation
==============

In [None]:
library(ggplot2) # Data visualization
library(readr) # CSV file I/O, e.g. the read_csv function
library(dplyr)
library(stringr)
system("ls ../input")
data <- read_csv("../input/Fleet Data.csv")

#making the data easier to deal with
colnames(data) <- c("parent_airline","airline","aircraft_type","current","future","historic","total","orders","unit_cost","total_cost","average_age")
#nrow(data)
data$total_cost <- as.numeric(gsub(",","",substring(data$total_cost,2)))
data$unit_cost <- as.numeric(gsub(",","",substring(data$unit_cost,2)))
data$plane_brand <- word(data$aircraft_type,1)
data <- data %>% mutate(newcost=ifelse(!is.na(orders),orders*unit_cost,0))

#per-manufacturer counts of planes currently in service (0 for rows of other brands, NA treated as 0)
data <- data %>% mutate(airbus=ifelse(plane_brand=="Airbus",ifelse(is.na(current),0,current),0))
data <- data %>% mutate(boeing=ifelse(plane_brand=="Boeing",ifelse(is.na(current),0,current),0))
data <- data %>% mutate(embraer=ifelse(plane_brand=="Embraer",ifelse(is.na(current),0,current),0))
data <- data %>% mutate(bombardier=ifelse(plane_brand=="Bombardier",ifelse(is.na(current),0,current),0))
data <- data %>% mutate(mitsubishi=ifelse(plane_brand=="Mitsubishi",ifelse(is.na(current),0,current),0))
data <- data %>% mutate(comac=ifelse(plane_brand %in% c("COMAC","comac"),ifelse(is.na(current),0,current),0))
data <- data %>% mutate(mcdonnell=ifelse(plane_brand=="McDonnell",ifelse(is.na(current),0,current),0))
data <- data %>% mutate(canadair=ifelse(plane_brand=="Canadair",ifelse(is.na(current),0,current),0))
data <- data %>% mutate(atr=ifelse(plane_brand=="ATR",ifelse(is.na(current),0,current),0))

#table(data$plane_brand)

#planes on order for Airbus and Boeing (0 for rows of other brands, NA treated as 0)
data <- data %>% mutate(airbus_future=ifelse(plane_brand=="Airbus",ifelse(is.na(orders),0,orders),0))
data <- data %>% mutate(boeing_future=ifelse(plane_brand=="Boeing",ifelse(is.na(orders),0,orders),0))

#The below list is not exhaustive.
lowcostcarriers<- c("AirAsia X","Gol Linhas Aereas","Azul","Tigerair Australia","Volaris","Interjet","Spirit Airlines","Frontier Airlines","Allegiant Air","WestJet","Air Canada Rouge","FlyDubai","Up","Monarch Airlines","Jet2","Flybe","Vueling","Iberia Express","Pobeda","Transavia","Germanwings","Eurowings","easyJet","Wizz Air","Lion Air","IndiGo", "Southwest Airlines", "RyanAir","AirAsia","LionAir","JetBlue Airways","Norwegian Air","WestJet","Pegasus Airlines","Flybe","SpiceJet","Air Arabia Egypt","Air Arabia Maroc","Air Cairo","Jambojet Limited","Mango","China United Airlines","Spring Airlines","Cebu Pacific Air","Jetstar Airways","AirAsia India","AirAsia Japan")
data$type <- NA
data <- data %>% mutate(type=ifelse(airline %in% lowcostcarriers,"Low Cost","Traditional"))

head(data %>% filter(type=="Low Cost"))
length(unique(data$airline))
length(unique(data$parent_airline))
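
The cleaning code above implies that the cost columns arrive as text such as `$1,234`: it drops the first character and then removes commas. A minimal sketch of the same step as a reusable helper, in base R only; `parse_usd` is a hypothetical name that does not appear in the original notebook, and the sample strings are made up for illustration:

```r
# Hypothetical helper: convert strings like "$1,234" to numbers.
# Assumes "$" as the currency symbol and "," as the thousands separator.
parse_usd <- function(x) {
  cleaned <- gsub("[$,]", "", x)  # remove dollar signs and thousands separators
  as.numeric(cleaned)             # NA input stays NA
}

sample_costs <- c("$98", "$1,234", NA)  # made-up values, not from the dataset
parsed <- parse_usd(sample_costs)
print(parsed)
```

Unlike `substring(x, 2)`, this version does not drop the first digit if a value happens to lack the leading `$`.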

Largest Airlines
============

Let's find out which airlines have the largest fleets.

In [None]:
#lets find out which airlines have the largest fleets
airline_fleet <- data %>% group_by(airline) %>% summarise(fleet=sum(current,na.rm=TRUE),historic=sum(historic,na.rm=TRUE),orders=sum(orders,na.rm=TRUE),ratio=100*(orders/fleet),current_fleet_cost=sum(total_cost,na.rm=TRUE),value_per_fleet_unit=current_fleet_cost/fleet)
airline_fleet <- airline_fleet %>% mutate(type=ifelse(airline %in% lowcostcarriers,"Low Cost","Traditional")) 
head(airline_fleet %>% arrange(desc(fleet)))

Unsurprisingly, US airlines lead the pack. Air travel is the main form of long-distance transportation in the US, and the US was also a pioneer in air transport. We can see, however, that Chinese airlines are not far behind in fleet size.

Growing Airlines
============

In [None]:
#let's find out which airlines have the most orders
head(airline_fleet %>% arrange(desc(orders)))

Lion Air, IndiGo and AirAsia have a very large number of orders. They are clearly expanding quickly (see the ratio of orders to current fleet), and they stand out as clear outliers given their current fleet size. If we split the data by carrier type, we see that, more often than not, low-cost airlines have more orders relative to their current fleet size, which suggests they are expanding their operations.
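
One way to check the low-cost versus traditional comparison numerically is to summarise the orders-to-fleet ratio per carrier type. The sketch below uses a toy stand-in for `airline_fleet` with made-up numbers (the airline names `A` to `D` and all values are hypothetical), and base R rather than dplyr so it runs on its own:

```r
# Toy stand-in for airline_fleet; all values are made up for illustration.
toy_fleet <- data.frame(
  airline = c("A", "B", "C", "D"),
  type    = c("Low Cost", "Low Cost", "Traditional", "Traditional"),
  fleet   = c(100, 50, 400, 200),
  orders  = c(200, 50, 100, 20)
)

# Orders as a percentage of the current fleet, as in the summarise() call above.
toy_fleet$ratio <- 100 * toy_fleet$orders / toy_fleet$fleet

# Median ratio per carrier type; the median is less sensitive to outliers than the mean.
ratio_by_type <- aggregate(ratio ~ type, data = toy_fleet, FUN = median)
print(ratio_by_type)
```

To apply the same idea to the real data, `toy_fleet` could be replaced with `airline_fleet`, ideally restricted to airlines with `fleet > 0` so the ratio is never a division by zero.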

In [None]:
#ratio of orders vs current fleet for fleets more than 10
#head(airline_fleet %>% filter(fleet>10) %>% arrange(desc(ratio)))

#largest fleet costs carriers
#head(airline_fleet %>% filter(fleet>10) %>% arrange(desc(current_fleet_cost)))

#plot current vs future
g <- ggplot(airline_fleet,aes(x=fleet,y=orders,label=airline,color=as.factor(type)))+geom_point(size=0.5) +geom_text(size=2,hjust=0,vjust=0,nudge_x=7,nudge_y=-2)+ theme_minimal()
#geom_text(aes(label=rownames(airline_fleet)))
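#single linear trend line across all airlines (group=1 overrides the per-type grouping)
g + geom_smooth(aes(group=1),color="grey50",method="lm",se=FALSE) + ggtitle("New Orders vs Existing Fleet Size")+labs(x="Current Fleet Size (Units)",y="New Orders (Units)")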

We can draw the same graph focusing on smaller airlines (200 planes or fewer in their current fleet).

In [None]:
g <- ggplot(airline_fleet %>% filter(fleet<=200),aes(x=fleet,y=orders,label=airline,color=as.factor(type)))+geom_point(size=0.5) +geom_text(size=2,hjust=0,vjust=0,nudge_x=3,nudge_y=-2)+ theme_minimal()
#geom_text(aes(label=rownames(airline_fleet)))
g  + ggtitle("New Orders vs Existing Fleet Size")+labs(x="Current Fleet Size (Units)",y="New Orders (Units)")

Most Popular Planes (based on orders)
=============================

Which planes are being ordered the most right now?

In [None]:
neworders <-data %>% filter(!is.na(orders)) %>% filter(orders>0) %>% group_by(aircraft_type) %>% summarise(orders=sum(orders),cost=mean(unit_cost)) 

newplanes<-c("Airbus A380","Airbus A350","Airbus A350 XWB","Airbus A350-900","Boeing 787","Boeing 787 Dreamliner")
neworders <- neworders %>% mutate(new=ifelse(aircraft_type%in%newplanes,"new","old"))

head(neworders %>% arrange(desc(orders)))

ggplot(neworders,aes(x=orders,y=cost,label=aircraft_type,color=(new)))+geom_point(size=0.5)+geom_text(size=2,hjust=0,vjust=0,nudge_x=20,nudge_y=-2)+ggtitle("New orders vs Cost")+labs(x="New Orders (units)",y="Unit Plane Cost (USD M)")+theme_minimal()

Two conclusions from this chart. First, all of the newer planes (released since the 2000s) are very expensive: they cost at least 2 to 3 times more than the more popular, cheaper aircraft such as the Boeing 737 or Airbus A320. Second, the newer planes have sold relatively poorly so far, and may be a mismatch for a market that seems to be moving towards shorter regional flights and low-cost carriers. Low-cost carriers care about cost efficiency, and so far the older planes seem to serve that purpose better. The table below shows which airlines are buying these smaller, cheaper planes. No surprise here: all of the major purchasers are low-cost carriers.

In [None]:
buyingcheap <- data %>% filter(!is.na(orders)) %>% filter(orders>0) %>% filter(aircraft_type %in% c("Airbus A320","Boeing 737")) %>% group_by(airline) %>% summarise(orders=sum(orders))
head(buyingcheap %>% arrange(desc(orders)))

As a side observation, the Airbus A380 has very few orders so far. Who is buying them?

In [None]:
buying380 <- data %>% filter(!is.na(orders)) %>% filter(orders>0) %>% filter(aircraft_type=="Airbus A380") %>% group_by(airline) %>% summarise(orders=sum(orders))
head(buying380 %>% arrange(desc(orders)))

Indeed, Emirates is one of the most ardent supporters of the A380. Some sources mention an even higher number of orders ([http://www.stuff.co.nz/travel/news/88068755/emirates-airline-receives-its-first-rollsroyce-powered-a380-superjumbo][1]), and the airline took delivery of its first Rolls-Royce-powered A380 at the end of 2016. The A380 does not seem to be economical in terms of fuel consumption, which probably matters to most airlines except those with direct access to cheap oil, like Emirates.

The A380's sales are already disappointing at this stage, as most airlines are moving to regional hub operations instead of large, long-distance international hub connections.

  [1]: http://www.stuff.co.nz/travel/news/88068755/emirates-airline-receives-its-first-rollsroyce-powered-a380-superjumbo

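The Boeing 787 Dreamliner, by contrast, gets far more orders from a diverse group of airlines and does not rely on a single airline's deal for its success. (Strictly speaking, the 787 competes with the Airbus A350 rather than the A380, so this compares two recent wide-body programmes rather than like-for-like aircraft.)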

In [None]:
buying787DL <- data %>% filter(!is.na(orders)) %>% filter(orders>0) %>% filter(aircraft_type=="Boeing 787 Dreamliner") %>% group_by(airline) %>% summarise(orders=sum(orders))
head(buying787DL %>% arrange(desc(orders)))
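
The idea that one aircraft depends on a single buyer while another does not can be expressed as a simple concentration measure: the share of all orders held by the largest buyer. A minimal sketch follows; `top_buyer_share` is a hypothetical helper, not part of the original notebook, and the two order books are made-up numbers rather than values from the dataset:

```r
# Share of an aircraft type's orders held by its single largest buyer (between 0 and 1).
top_buyer_share <- function(orders) {
  orders <- orders[!is.na(orders) & orders > 0]  # keep only real, positive orders
  if (length(orders) == 0) return(NA_real_)      # no orders: share is undefined
  max(orders) / sum(orders)
}

# Made-up order books for illustration only.
concentrated <- c(50, 5, 5)        # one dominant buyer
diversified  <- c(20, 20, 20, 20)  # orders spread evenly

print(top_buyer_share(concentrated))
print(top_buyer_share(diversified))
```

On the real data, this could be called as `top_buyer_share(buying380$orders)` and `top_buyer_share(buying787DL$orders)` to compare the two aircraft.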

Plane Manufacturers and Mix by Airline
============

What do fleets look like nowadays? What is flying the most these days: Boeing or Airbus? It seems to be largely Boeing, both in number and in value. Interestingly, Embraer is a third option, with quite a few planes currently in use.

In [None]:
current_aircrafts <- data %>% filter(!is.na(current)) %>% filter(current>0) %>% group_by(plane_brand) %>% summarise(current=sum(current),cost=sum(total_cost,na.rm=TRUE))
head(current_aircrafts %>% arrange(desc(current)))

How are Airbus and Boeing planes distributed across the major airlines? Is the split even, or are there clear preferences or partnerships depending on the company? Some airlines, like Ryanair and Southwest Airlines, use ONLY Boeing, while no airline currently uses only Airbus. And the airlines with the highest percentage of Airbus planes are not even European (Chinese airlines are at the top, followed by some American ones).

In [None]:
bycompany <- data %>% group_by(airline) %>% summarise(total=sum(current,na.rm=TRUE),airbus=sum(airbus),boeing=sum(boeing),airbusper=round(100*airbus/total),boeingper=round(100*boeing/total),airbusadd=sum(airbus_future),boeingadd=sum(boeing_future),newtotal=total+airbusadd+boeingadd,newairbusper=round(100*(airbus+airbusadd)/newtotal))

bycompany <- bycompany %>% mutate(type=ifelse(airline %in% lowcostcarriers,"Low Cost","Traditional"))
head(bycompany %>% filter(total> 300) %>% arrange(desc(boeingper)))
head(bycompany %>% filter(total> 300) %>% arrange(desc(airbusper)))

How are airlines changing their mix of Airbus and Boeing planes between their current fleet and their currently placed orders, assuming they keep all of their current planes in service? (This is a preposterous assumption, but let's make it for the sake of discussion!) We can see that many airlines are actually shifting towards more Airbus planes in their fleets, rather than the opposite. The shift seems more pronounced among airlines that currently have a lower percentage of Airbus planes.

In [None]:
ggplot(data=bycompany,aes(x=airbusper,y=newairbusper,label=airline,color=type))+geom_point(size=0.5)+geom_text(size=2,hjust=0,vjust=0,nudge_x=1,nudge_y=-1)+theme_minimal()+ggtitle("Evolution (Rough Estimation) of Airbus Mix in the Future")+labs(x="Current Fleet Percentage of Airbus Planes (%)")+labs(y="Estimated Future Fleet Percentage of Airbus Planes (%)")
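
To make the y-axis of this chart concrete, here is a worked example of the projected share, mirroring the `newairbusper` formula used in `bycompany`. The function name `projected_airbus_share` and the airline's numbers are made up for illustration:

```r
# Projected Airbus share (%), mirroring the newairbusper formula:
# (current Airbus + Airbus orders) / (current fleet + Airbus orders + Boeing orders).
projected_airbus_share <- function(total, airbus, airbus_orders, boeing_orders) {
  new_total <- total + airbus_orders + boeing_orders
  round(100 * (airbus + airbus_orders) / new_total)
}

# Made-up airline: 100 planes today, 20 of them Airbus,
# with 30 Airbus and 10 Boeing planes on order.
current_share   <- round(100 * 20 / 100)
projected_share <- projected_airbus_share(total = 100, airbus = 20,
                                          airbus_orders = 30, boeing_orders = 10)
print(c(current = current_share, projected = projected_share))
```

In this example the share moves from 20% to 36% (50 Airbus planes out of 140). Note that, as in the notebook's formula, orders from other manufacturers are not added to the denominator and no retirements are assumed, so the result is only a rough estimate.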

How do the major competitors fare in sales? Airbus actually seems to be better off in terms of future revenue, considering all planes on order. We can see that Airbus and Boeing completely dominate the aircraft manufacturing market. Embraer, third in the list, sells roughly a tenth of what Boeing sells. New orders are concentrated on Airbus and Boeing even more than the current distribution of planes in operation.

In [None]:
business <-data %>% filter(!is.na(orders)) %>% filter(orders>0) %>% group_by(plane_brand) %>% summarise(orders=sum(orders),cost=sum(newcost))

head(business %>% arrange(desc(cost)))
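
The claim that orders are more concentrated than the current fleet can be checked by converting both to percentage shares per manufacturer. The sketch below uses a hypothetical helper `share_pct` and a toy table with made-up manufacturers `X`, `Y`, `Z` and made-up counts:

```r
# Percentage share of each entry in a numeric vector, rounded to one decimal.
share_pct <- function(x) round(100 * x / sum(x, na.rm = TRUE), 1)

# Toy table; manufacturer names and counts are made up for illustration.
toy_brands <- data.frame(
  plane_brand = c("X", "Y", "Z"),
  current     = c(500, 400, 100),
  orders      = c(300, 280, 20)
)
toy_brands$current_pct <- share_pct(toy_brands$current)
toy_brands$orders_pct  <- share_pct(toy_brands$orders)

# Combined share of the two largest manufacturers, in the fleet and in the order book.
top2_current <- sum(sort(toy_brands$current_pct, decreasing = TRUE)[1:2])
top2_orders  <- sum(sort(toy_brands$orders_pct, decreasing = TRUE)[1:2])
print(toy_brands)
print(c(top2_current = top2_current, top2_orders = top2_orders))
```

With the real data, the same helper could be applied to `current_aircrafts$current` and `business$orders` to compare the two distributions.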

But framing this as an Airbus vs Boeing question is a simplification when trying to differentiate airlines. Looking at the distribution of current planes in each fleet, we can run a PCA to find the axes that actually separate airlines. And no, it's not that much about Airbus or Boeing.

In [None]:
fleets <- data %>% group_by(airline) %>% summarise(total=sum(current,na.rm=TRUE),airbus=sum(airbus),boeing=sum(boeing),embraer=sum(embraer),bombardier=sum(bombardier),atr=sum(atr),mcdonnell=sum(mcdonnell),canadair=sum(canadair))
#head(fleets)
#summary(fleets)
fleets<-as.data.frame(fleets)
row.names(fleets)<-fleets[,1]
#PCA on the per-manufacturer counts only (columns 3 onward, i.e. without airline and total)
fleets.pca <- prcomp(fleets[,3:ncol(fleets)],center=TRUE,scale.=TRUE)
print(fleets.pca)
biplot(fleets.pca,cex=0.4,xlim=c(-0.2,0.6))

The key difference is whether or not airlines have other kinds of planes in their fleets. United Airlines is mainly Airbus and Boeing, and that's it. At the other end of the spectrum, SkyWest and [American Eagle][1] are mainly Embraer and Canadair. There are probably economic reasons behind such differences: Canadair and Embraer sell cheaper planes, and airlines looking for cost-effective solutions may prefer them to the options from Airbus or Boeing.


  [1]: https://en.wikipedia.org/wiki/Envoy_Air
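
To read a PCA like the one above, it helps to look at how much variance each component explains and at the loadings of each manufacturer. A minimal sketch on a toy fleet mix follows; the five airlines `A` to `E` and their plane counts are made up for illustration and are not taken from the dataset:

```r
# Made-up fleet mix (planes per manufacturer) for five hypothetical airlines.
toy_mix <- data.frame(
  airbus  = c(100, 80, 5, 0, 60),
  boeing  = c(120, 90, 0, 10, 70),
  embraer = c(0, 5, 60, 80, 10),
  row.names = c("A", "B", "C", "D", "E")
)

# Centre and scale each column so that large manufacturers do not dominate by size alone.
toy_pca <- prcomp(toy_mix, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
print(round(var_explained, 3))

# Loadings: how each manufacturer contributes to each component.
print(round(toy_pca$rotation, 2))
```

The same two quantities can be read from `fleets.pca$sdev` and `fleets.pca$rotation`; a first component that contrasts Airbus and Boeing with the regional manufacturers would support the interpretation given above.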