The submitted proposal aims to aggregate and interpret publicly available data on flight statistics (by airline and date) and weather events (by date) to train a model that estimates delay times for a given flight. Ultimately, the estimates would be served to customers or corporations, providing better predictions and peace of mind for travelers' schedules. This is a convincing problem, with potentially billions of dollars per year on the line, though it is not clear whether the model would enable actual recovery of those losses so much as better anticipation of when they may occur.
The proposal captures the consumer's mental model of an airline well, and aims to mirror that intuition in a novel two-stage model: an initial estimate that simply aggregates the airline's statistics per destination and travel period, followed by an updated estimate that incorporates more detailed weather information. While there is no clear dependence between the two models except as a point of comparison, the proposal gains rigor from having distinct models predict the same quantity. Finally, the project includes a clear and valuable statement of intent, which will be invaluable as the work progresses.
The project is likely to encounter challenges distinguishing between the two models. It may turn out that the simpler model has sufficient data for a reasonable estimate (to within an hour, perhaps), while the more complex model overfits its more complex dataset. Furthermore, plain regression may prove unhelpful, because flight delays are technically unbounded, though bounded by cancellations in practice. Speaking of cancellations: will the model predict a cancelled flight differently from an indefinitely delayed one? Will it provide error bounds, or simply a point estimate? Will the first model feed into the second?
While training chained models is decidedly more complex than training two separate models, feeding the first model's aggregated output into the second may prove more useful than training an entirely new model from scratch, and may also reduce overfitting.
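To make the chaining suggestion concrete, here is a minimal sketch of one way the two stages could be wired together. All data, feature names, and the specific estimators here are illustrative assumptions, not details from the proposal: stage one is a simple per-(airline, month) mean delay, and stage two is a least-squares regression that takes the stage-one estimate plus a weather feature as inputs.

```python
import numpy as np

# Hypothetical toy data: airline id, month, weather severity, and delay
# in minutes. Values are synthetic and purely for illustration.
rng = np.random.default_rng(0)
n = 200
airline = rng.integers(0, 3, size=n)
month = rng.integers(1, 13, size=n)
weather = rng.uniform(0.0, 1.0, size=n)
delay = 10.0 * airline + 30.0 * weather + rng.normal(0.0, 5.0, size=n)

# Stage 1: aggregate mean delay per (airline, month) bucket.
keys = airline * 100 + month
bucket_mean = {k: delay[keys == k].mean() for k in np.unique(keys)}
stage1 = np.array([bucket_mean[k] for k in keys])

# Stage 2: regress delay on the stage-1 estimate plus weather severity,
# so the second model refines the first rather than replacing it.
X = np.column_stack([np.ones(n), stage1, weather])
coef, *_ = np.linalg.lstsq(X, delay, rcond=None)
stage2 = X @ coef

# Because stage 2 includes the stage-1 estimate as a feature, its
# in-sample fit can never be worse than stage 1 alone.
rmse1 = float(np.sqrt(np.mean((delay - stage1) ** 2)))
rmse2 = float(np.sqrt(np.mean((delay - stage2) ** 2)))
```

One design consequence worth noting: because the second stage receives the first stage's prediction as a single feature, the weather model only has to learn a correction term, which is the overfitting-reduction argument in miniature.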