New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardcoded NTP Pool names #47

Open
abh opened this Issue Dec 20, 2016 · 13 comments

Comments

Projects
None yet
4 participants
@abh

abh commented Dec 20, 2016

Please don't hard code NTP Pool (pool.ntp.org) hostnames. It appears one of your users had a bad implementation and caused considerable havoc on the system. Leave the list empty and reference http://www.pool.ntp.org/vendors.html

https://twitter.com/paulgear1/status/810967233375531008

@gavineadie

This comment has been minimized.

Show comment
Hide comment
@gavineadie

gavineadie Dec 20, 2016

Collaborator

OK .. I wasn't aware of that policy. I'll change as you suggest.

Collaborator

gavineadie commented Dec 20, 2016

OK .. I wasn't aware of that policy. I'll change as you suggest.

@gavineadie

This comment has been minimized.

Show comment
Hide comment
@gavineadie

gavineadie Dec 20, 2016

Collaborator

Changes made .. all references to hard coded NTP Pool hostnames have been removed.

Collaborator

gavineadie commented Dec 20, 2016

Changes made .. all references to hard coded NTP Pool hostnames have been removed.

@gavineadie gavineadie closed this Dec 20, 2016

@jbenet

This comment has been minimized.

Show comment
Hide comment
@jbenet

jbenet Dec 20, 2016

Owner

Thanks for prompt response @abh @gavineadie -- sorry that none of us saw this problem coming here. We probably could've. May be worth a warning to users in the readme to think through the repercussions of issuing (k req/s * d devices)

Hopefully devs at snapchat update/fix asap.

Owner

jbenet commented Dec 20, 2016

Thanks for prompt response @abh @gavineadie -- sorry that none of us saw this problem coming here. We probably could've. May be worth a warning to users in the readme to think through the repercussions of issuing (k req/s * d devices)

Hopefully devs at snapchat update/fix asap.

@jbenet

This comment has been minimized.

Show comment
Hide comment
@jbenet

jbenet Dec 20, 2016

Owner

@abh is there anything else problematic in this lib? May be a good time to review/audit it. If people are in contact with snapchat devs and the push other fixes-- important to get those back here upstream to ensure the next mobile/iot mega app doesn't cause other problems for ntp.

Owner

jbenet commented Dec 20, 2016

@abh is there anything else problematic in this lib? May be a good time to review/audit it. If people are in contact with snapchat devs and the push other fixes-- important to get those back here upstream to ensure the next mobile/iot mega app doesn't cause other problems for ntp.

@gavineadie

This comment has been minimized.

Show comment
Hide comment
@gavineadie

gavineadie Dec 20, 2016

Collaborator

Yes .. Let me be frank since not all the history is known. I wrote this library a long time ago for the iPhone 3GS and earlier. At that time, as best I could tell, the iPhone obtained its time from the phone company and, for me in South-East Michigan, it was not unusual for the iPhone time to be at up to two minutes variance from true time. Since my app was predicting the location of the International Space Station and a two minute error represented about 600 miles, I wrote this library so people wouldn't be staring at the wrong side of the sky!

Since then, the world has changed. First, iOS devices use the NTP protocol as a time source -- I stopped using my own library years ago because iPhones were delivering sub-second accurate times natively. Second, my first GitHub contribution was this library and I now consider that a mistake -- it was a fairly immature piece of work (@jbenet improved things significantly, much appreciated). Third, it became clear that this library was mostly being used in a 'one-shot' manner to check the iOS time hadn't been pushed back to defeat software registrations, and there are better ways to do that (in fact, this library had a one-shot capability added). Fourth, I never dreamed an "iOS mega app" would use it -- I would have discouraged that earnestly.

I've thought for a while that this library had passed its "sell-by" date and was of minimal value; this Snapchat incident (about which I've heard only a little, and that obliquely), makes me want to remove it. I've been involved in network software, on and off, for about forty years and regard myself as a good net citizen. I would very much appreciate hearing stakeholders' thoughts on the best (or just good) next steps.

Collaborator

gavineadie commented Dec 20, 2016

Yes .. Let me be frank since not all the history is known. I wrote this library a long time ago for the iPhone 3GS and earlier. At that time, as best I could tell, the iPhone obtained its time from the phone company and, for me in South-East Michigan, it was not unusual for the iPhone time to be at up to two minutes variance from true time. Since my app was predicting the location of the International Space Station and a two minute error represented about 600 miles, I wrote this library so people wouldn't be staring at the wrong side of the sky!

Since then, the world has changed. First, iOS devices use the NTP protocol as a time source -- I stopped using my own library years ago because iPhones were delivering sub-second accurate times natively. Second, my first GitHub contribution was this library and I now consider that a mistake -- it was a fairly immature piece of work (@jbenet improved things significantly, much appreciated). Third, it became clear that this library was mostly being used in a 'one-shot' manner to check the iOS time hadn't been pushed back to defeat software registrations, and there are better ways to do that (in fact, this library had a one-shot capability added). Fourth, I never dreamed an "iOS mega app" would use it -- I would have discouraged that earnestly.

I've thought for a while that this library had passed its "sell-by" date and was of minimal value; this Snapchat incident (about which I've heard only a little, and that obliquely), makes me want to remove it. I've been involved in network software, on and off, for about forty years and regard myself as a good net citizen. I would very much appreciate hearing stakeholders' thoughts on the best (or just good) next steps.

@gavineadie gavineadie reopened this Dec 20, 2016

@abh

This comment has been minimized.

Show comment
Hide comment
@abh

abh commented Dec 20, 2016

Thanks @gavineadie and @jbenet!

@abh

This comment has been minimized.

Show comment
Hide comment
@abh

abh Dec 20, 2016

@gavineadie I agree with much of what you wrote, but a silver lining is that the library being open source did give the community a way to find the code and help get it and documentation for future users corrected. :-)

abh commented Dec 20, 2016

@gavineadie I agree with much of what you wrote, but a silver lining is that the library being open source did give the community a way to find the code and help get it and documentation for future users corrected. :-)

@gavineadie

This comment has been minimized.

Show comment
Hide comment
@gavineadie

gavineadie Dec 20, 2016

Collaborator

True, but I still feel like an idiot !

Collaborator

gavineadie commented Dec 20, 2016

True, but I still feel like an idiot !

@gavineadie

This comment has been minimized.

Show comment
Hide comment
@gavineadie

gavineadie Dec 20, 2016

Collaborator

I should also say the library will query NO servers in its new default state .. now a ntp.hosts file MUST be provided.

Collaborator

gavineadie commented Dec 20, 2016

I should also say the library will query NO servers in its new default state .. now a ntp.hosts file MUST be provided.

@jbenet

This comment has been minimized.

Show comment
Hide comment
@jbenet

jbenet Dec 20, 2016

Owner

@gavineadie ❤️

i think we could've done more (we can always do more). We could've known about that recommendation and fixed our code, we could've documented this better, and so on. But i think the main issue in this incident is deployers not understanding what they're deploying. This should've been caught by snapchat's development & deployment team way before making it into production in millions of devices (if it hit that, i still have little knowledge on the numbers).

I personally think that this library being here helps everyone. Open source is tricky, but we are all better by this library being here, than it not being here at all. It allows better community review and fixes, and i count our involvement here (however small) as a net win (@abh and the NTP folks were able to track the problem down and point to the code, @gavineadie was able to fix it quickly, and snapchat was able to pull latest (or apply their own patch), thanks to the code being Open Source and readily available. I wouldn't want to see it go.

Owner

jbenet commented Dec 20, 2016

@gavineadie ❤️

i think we could've done more (we can always do more). We could've known about that recommendation and fixed our code, we could've documented this better, and so on. But i think the main issue in this incident is deployers not understanding what they're deploying. This should've been caught by snapchat's development & deployment team way before making it into production in millions of devices (if it hit that, i still have little knowledge on the numbers).

I personally think that this library being here helps everyone. Open source is tricky, but we are all better by this library being here, than it not being here at all. It allows better community review and fixes, and i count our involvement here (however small) as a net win (@abh and the NTP folks were able to track the problem down and point to the code, @gavineadie was able to fix it quickly, and snapchat was able to pull latest (or apply their own patch), thanks to the code being Open Source and readily available. I wouldn't want to see it go.

@jbenet

This comment has been minimized.

Show comment
Hide comment
@jbenet

jbenet Dec 21, 2016

Owner

(sorry, had to step out for a bit -- i wanted to add the following)

Second, my first GitHub contribution was this library and I now consider that a mistake -- it was a fairly immature piece of work

Hmm-- @gavineadie i am personally very thankful that you made this implementation in the first place, and that it worked well; it solved my problem and allowed me to make a game that brought joy to 100,000s of people. (where they couldn't cheat by changing their phone's clock!) And i'm sure so many other users along the years have also been very thankful for it-- it fit many of our needs. Your work is much appreciated, so please feel very welcome to continue your contributions, for the commons can only get better when we work openly. ❤️

Some more lessons to draw from this

  • It's worth mentioning that open source code should always be treated like tools in a workshop: in various states of operation and polish, and it is up to the final distributors to verify their dependencies are being used correctly. We can definitely improve the tooling, and improve documentation (and we should definitely have a good handle on that), but Deployers must also ship the code they mean to ship, and should have a good understanding of how the dependencies they ingest function, and their operation range. It's an "end-to-end principle" type of thing.
  • I think this needed more communication and rigor throughout. we needed to have looked at specs closer to know not to include the list, snapchat team should've had a much clearer understanding on what network connections their app was is making, and so on.
  • These things happen, and it takes us responding quickly to make sure things are fixed and improved. This was achieved here (everyone, from @abh and ntp folks detecting this, looking for the source, filing issues, and contacting us and snapchat, to twitter discussing, to @gavineadie pushing a fix, and snapchat pushing a fix to app store). So thanks to everyone, glad to see the whole thing resolved, without a major disaster :)
Owner

jbenet commented Dec 21, 2016

(sorry, had to step out for a bit -- i wanted to add the following)

Second, my first GitHub contribution was this library and I now consider that a mistake -- it was a fairly immature piece of work

Hmm-- @gavineadie i am personally very thankful that you made this implementation in the first place, and that it worked well; it solved my problem and allowed me to make a game that brought joy to 100,000s of people. (where they couldn't cheat by changing their phone's clock!) And i'm sure so many other users along the years have also been very thankful for it-- it fit many of our needs. Your work is much appreciated, so please feel very welcome to continue your contributions, for the commons can only get better when we work openly. ❤️

Some more lessons to draw from this

  • It's worth mentioning that open source code should always be treated like tools in a workshop: in various states of operation and polish, and it is up to the final distributors to verify their dependencies are being used correctly. We can definitely improve the tooling, and improve documentation (and we should definitely have a good handle on that), but Deployers must also ship the code they mean to ship, and should have a good understanding of how the dependencies they ingest function, and their operation range. It's an "end-to-end principle" type of thing.
  • I think this needed more communication and rigor throughout. we needed to have looked at specs closer to know not to include the list, snapchat team should've had a much clearer understanding on what network connections their app was is making, and so on.
  • These things happen, and it takes us responding quickly to make sure things are fixed and improved. This was achieved here (everyone, from @abh and ntp folks detecting this, looking for the source, filing issues, and contacting us and snapchat, to twitter discussing, to @gavineadie pushing a fix, and snapchat pushing a fix to app store). So thanks to everyone, glad to see the whole thing resolved, without a major disaster :)
@gavineadie

This comment has been minimized.

Show comment
Hide comment
@gavineadie

gavineadie Dec 21, 2016

Collaborator

Fair enough, all good points ..

I'll note, for what it's worth, that NTPv4 RFC makes no mention of not using the "pool" hosts, so due diligence was done there. Only today did I become aware of the draft "best practices" document (now included in the repo).

I'm reminded that years ago I read the SMTP standard(s) carefully when doing a mainframe implementation; I even used the BNF grammar in the document appendix to parse the protocol. When I got it working, I found the distressing extent of the violations of the protocol in the wild .. my first experience of "be generous in what you receive, and rigorous in what you transmit" !!

Collaborator

gavineadie commented Dec 21, 2016

Fair enough, all good points ..

I'll note, for what it's worth, that NTPv4 RFC makes no mention of not using the "pool" hosts, so due diligence was done there. Only today did I become aware of the draft "best practices" document (now included in the repo).

I'm reminded that years ago I read the SMTP standard(s) carefully when doing a mainframe implementation; I even used the BNF grammar in the document appendix to parse the protocol. When I got it working, I found the distressing extent of the violations of the protocol in the wild .. my first experience of "be generous in what you receive, and rigorous in what you transmit" !!

@victorstewart

This comment has been minimized.

Show comment
Hide comment
@victorstewart

victorstewart Aug 3, 2017

I could add too, sometimes it's very important to make sure all devices in your ecosystem as synced to a singular time source! Across hardware manufacturers... operating systems... different leap second handling implementations. Now I can be certain that every user of app whether iOS, Android (on a whole host of different hardware), plus my RedHat servers are all synced to the same time source.

victorstewart commented Aug 3, 2017

I could add too, sometimes it's very important to make sure all devices in your ecosystem as synced to a singular time source! Across hardware manufacturers... operating systems... different leap second handling implementations. Now I can be certain that every user of app whether iOS, Android (on a whole host of different hardware), plus my RedHat servers are all synced to the same time source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment