Project Feedback #1

Open
Eric-w-H opened this issue Oct 8, 2021 · 0 comments
Eric-w-H commented Oct 8, 2021

The submitted proposal aims to combine publicly available flight statistics (by airline and date) with weather events (by date) to train a model that estimates the delay for a given flight. The estimates would ultimately be served to customers or corporations, giving travelers better predictions and peace of mind about their schedules. This is a convincing problem, with potentially billions of dollars per year on the line, though it is not clear whether the model would actually recover any of that loss or only help anticipate when it may occur.

The proposal shows a good grasp of how a consumer thinks about an airline, and aims to mirror that view in a novel two-stage model: an initial estimate that simply aggregates the airline's statistics per destination and travel period, and an updated estimate that adds more detailed weather information. While there is no clear dependence between the models except as a point of comparison, having two distinct models predict the same quantity adds rigor to the proposal. Finally, the project has a clear statement of intent, which will be invaluable as time moves on.
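To make the first stage concrete, here is a minimal sketch of what "aggregating the airline's statistics per destination and travel period" could look like. The record layout, the use of month as the travel period, and the function names `fit_baseline` and `predict_baseline` are my assumptions for illustration, not anything taken from the proposal.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (airline, destination, month, delay_minutes).
FLIGHTS = [
    ("AA", "JFK", 1, 30.0),
    ("AA", "JFK", 1, 50.0),
    ("AA", "ORD", 1, 10.0),
    ("DL", "JFK", 7, 0.0),
]

def fit_baseline(flights):
    """Return (mean delay per (airline, destination, month), overall mean)."""
    groups = defaultdict(list)
    for airline, dest, month, delay in flights:
        groups[(airline, dest, month)].append(delay)
    table = {key: mean(vals) for key, vals in groups.items()}
    overall = mean(record[3] for record in flights)
    return table, overall

def predict_baseline(table, overall, airline, dest, month):
    """Look up the group mean; fall back to the overall mean for unseen groups."""
    return table.get((airline, dest, month), overall)

table, overall = fit_baseline(FLIGHTS)
print(predict_baseline(table, overall, "AA", "JFK", 1))  # group seen in the data
print(predict_baseline(table, overall, "UA", "SFO", 3))  # unseen group, uses fallback
```

A lookup table like this is cheap to fit and easy to explain, which is what makes it a useful point of comparison for the weather-aware model.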

The project is likely to encounter challenges distinguishing between the two models. It may turn out that the simpler model has sufficient data for a reasonable estimate (to within an hour, perhaps), while the more complex model overfits the more complex dataset. Furthermore, regression is unlikely to be useful, because flight delays are "technically" unbounded, though bounded by cancellations in practice. Speaking of cancellations, is the model likely to predict a cancelled flight differently from an indefinitely delayed flight? Will it provide error bounds, or simply a point estimate? Will the first model feed into the second?
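As one possible answer to the questions about cancellations and error bounds, the sketch below reports cancellations as a separate rate and gives an empirical interval for the delay rather than a single number. Encoding a cancelled flight as `None`, the 10%/90% interval, and the helper names are all assumptions of mine, not part of the proposal.

```python
import math

def nearest_rank(sorted_vals, q):
    """Nearest-rank percentile of a sorted list, for 0 < q <= 1."""
    rank = max(1, math.ceil(q * len(sorted_vals)))
    return sorted_vals[rank - 1]

def summarize(outcomes, lo=0.1, hi=0.9):
    """outcomes: delay in minutes per flight, or None for a cancelled flight."""
    delays = sorted(d for d in outcomes if d is not None)
    cancel_rate = (len(outcomes) - len(delays)) / len(outcomes)
    return {
        "cancel_rate": cancel_rate,  # cancellations kept apart from delays
        "median": nearest_rank(delays, 0.5),
        "interval": (nearest_rank(delays, lo), nearest_rank(delays, hi)),
    }

# Hypothetical history for one route: nine flown flights and one cancellation.
history = [0, 5, 10, 15, 20, 30, 45, 60, 120, None]
summary = summarize(history)
print(summary)
```

Keeping the cancellation rate out of the delay distribution avoids having to assign an arbitrary "infinite" delay to a cancelled flight.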

While training chained models is decidedly more complex than training two separate models, feeding the first model's aggregated output into the second may prove more useful than training an entirely new model, and may reduce overfitting.
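One simple way to chain the models, sketched here under my own assumptions, is to have the second stage learn only a correction to the first stage's estimate: fit the residuals of the baseline against a weather feature. The single numeric "weather severity" feature, the no-intercept least-squares fit, and the function names are hypothetical choices for illustration.

```python
def fit_residual_slope(baseline_preds, weather, actual):
    """Least-squares slope (no intercept) of baseline residuals on a weather feature."""
    residuals = [a - b for a, b in zip(actual, baseline_preds)]
    num = sum(w * r for w, r in zip(weather, residuals))
    den = sum(w * w for w in weather)
    return num / den if den else 0.0

def predict_chained(baseline_pred, weather_value, slope):
    """Second-stage estimate: the baseline plus a weather-driven correction."""
    return baseline_pred + slope * weather_value

# Hypothetical training data: baseline estimates, weather severity, observed delays.
baseline_preds = [40.0, 40.0, 10.0, 10.0]
weather = [0.0, 2.0, 0.0, 1.0]
actual = [40.0, 60.0, 10.0, 20.0]

slope = fit_residual_slope(baseline_preds, weather, actual)
print(predict_chained(40.0, 3.0, slope))
```

Because the second stage only has to explain what the baseline missed, it has fewer degrees of freedom than a model trained from scratch on every feature, which is the intuition behind the overfitting point above.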

PPPSDavid reopened this Dec 9, 2021