ST: What do you do at Mozilla?
BW: My title is staff security engineer in the cloud services group. I analyse and develop protocols for securely managing passwords and account data, I implement those protocols in different fashions, I review other code, I look at external projects that we could take advantage of to figure out whether it's appropriate to incorporate them, and I try to stay on top of security failures like 0-days and problems in the wild that might affect us, as well as tools and algorithms that we might be able to use.
ST: So you do quite a lot then.
BW: There's a variety of stuff, yeah.
ST: I didn’t quite realise you did all of that.
BW: Yeah, different amounts of it at different times. Sometimes I'm doing more developer stuff, sometimes I'm doing more research stuff. And I guess it kind of trades off, whether it's crunch time and we need to write code, or whether we have an opportunity to bang on designs.
ST: UX vs Security: Is it a false dichotomy? Why do some people have the impression that for security to be good, it must be difficult to use? How can that perception be altered? (You and Skinny were going to do a talk together about how the two can live hand in hand, I was always interested in hearing more about that.)
BW: There are times when I think that it's a three way tradeoff. Like instead of an x axis and a y axis with a diagonal tradeoff line that doesn't touch zero, running from (1, 0) to (0, 1), sometimes I think it's a three way thing where the other axis is how much work you want to put into it, or how clever you are, or how much user research and experimentation you actually do. Stuff that engineers are typically not focused on, but that UX people and psychologists, people who study human reactions, are focused on. So I believe, maybe it's more of a hope than a belief, that if you put enough effort into that, then you can find something that actually is secure and usable at the same time, but you have to do a lot more work. So in some ways the quickest, easiest, fastest to market, cheapest to develop kind of thing is going to have pretty severe tradeoffs.
ST: So is this sort of the same three axis thing of... what is it... whether it's good, whether it's fast, and whether it's cheap?
BW: Good, fast, cheap. And so yeah: secure, usable, cheap. Secure, usable, fast, maybe. I think that the trick is to figure out what people want to do, and find a way of making whatever security decisions they have to make a normative part of that workflow. So that when you lend your house key to a neighbour so they can water your plants when you are away on vacation, you've got a pretty good idea of what you are handing over. And there are some social constructs surrounding that, like "I don't think you're going to make a copy of that key, so when I get it back from you, you no longer have that power that I granted to you." There are patterns in normal life, with normal non-computer behaviours and objects, that we have developed some social practices and social standards around. And I think part of the trick is to use that: assume that people are going to expect something that works like that, and then find a way to make the computer stuff look like that. Part of the problem is that we end up asking people to do very unnatural things, because it is hard to imagine, or hard to build, something that's better. So, like passwords. Passwords are a lousy authentication technology for a lot of different reasons. One of them being that, in most cases, to exercise the power you have to give that power to whoever it is you are trying to prove yourself to. It's like: let me prove to you I know a secret; OK, tell me the secret. And that introduces all these issues, like knowing who you are talking to and correctly identifying who you are talking to. In addition to that, the best passwords are going to be randomly generated by a computer, and they are relatively long. And it's totally possible to memorise things like that, but it takes spaced repetition, it takes using it on a regular basis, it takes a certain amount of exercise, and you have to think more in terms of designing a lesson plan for using the password, and practising learning the password. And that is way more work than any one program deserves. You couldn't build such a thing and ask people to use it everywhere. But if you only have one such password, and the only thing you used it on was your phone, then your phone is now your intermediary that manages all this stuff for you, and then it probably would be fair. It's clear that your phone is sort of this extension of you that's better at remembering things, and the one password you need in this whole system is the bootstrapping thing that protects you if your phone gets stolen. So some stuff like that, some stuff like escalating effort in rare circumstances. There are a lot of cases where what you do on an everyday basis can be really easy and really lightweight, and it's only when you lose the phone that you have to go back to the more complicated thing. Just like you only carry so much cash in your wallet, and every once in a while you have to go to a bank and get more. So it's stuff like that. I think it's totally possible to do, but it's been really easy to fall into these bad patterns, like blaming the user, or pushing a lot of decisions onto the user when they don't really have enough information to make a good choice there. And a lot of the choices you are giving them aren't very meaningful.
ST: Do you think a lot of the time users either don't understand the decisions they are being asked to make, or the tradeoffs they are being asked to make? Say trading user data for a free product, and stuff like that?
BW: I think that’s very true, and I think more of the time it’s an inappropriate question to ask them. It’s kind of unfair. Walking up to somebody and putting them in this uncomfortable situation - do you like X or do you like Y is a little bit cruel.
ST: So that can be reframed, or should be reframed a lot of times?
BW: Yeah, I mean, there are situations like that. One thing that comes to mind is permission dialogs, especially on Windows boxes, where to do a bunch of really basic stuff that's useful and valid, it's not like you are trying to do something crazy, you have to answer these questions, you have to accept these demands that are basically saying: bad things will happen and it's all going to be your fault. In some ways it's intended to give the user an informed choice, but what it turns into is this kind of blame-the-user, blame-the-victim pattern, where it's like "something bad happened, but you clicked on the OK button, you've taken responsibility for that." Because the user didn't have enough information, and the system wasn't well enough designed that they could have done what they wanted to do without becoming vulnerable for the stuff they asked for. It's like, I want to run a spreadsheet program, and the dialog box says "by the way, if the spreadsheet program is badly written, then it could have a virus and it'll destroy your entire computer." Well, how about you write the spreadsheet program better, or how about the OS runs it in a sandbox where it can only do spreadsheets, where it can only flip bits and not overwrite everything. So that's a limitation of the computing model that is then imposing this really annoying choice. But there are other cases, like you said, with online data: "by using this service, do you understand that they are going to collect a lot of information about you and sell it to somebody that doesn't really care about you, but they are going to make some money off of that, and are you OK with that?" Yeah, it seems like there should be some better way of expressing that, or raising awareness of it, and I'd like it if people recognised that that's what the deal was.
ST: So would I, very much so.
BW: But I don't know how to get that started, and I don't know how to convince a business to scare all of their potential users away.
ST: Thank you for that. So, months before Sync 2.0, I don't know if that's what it's actually being called or not, ever saw the light of day…
BW: We tend to call it New Sync or Sync 1.5. The Sync people call it Sync 1.5, because they had bigger plans in mind for Sync 2.0.
ST: But you guys were hashing out the protocol in an extremely vocal and public forum. There were weigh-ins from everybody, it seemed, and that thread just went on forever. This is last May I'm thinking of in particular, April, May time. The discussions seemed to be the exact opposite of security through obscurity.
BW: Definitely, definitely. So there were a couple of different things that I was hoping for from that discussion, and I pushed for all that stuff to be described and discussed publicly because it's the right thing to do, it's the way we develop software, it's the open source way. And so I can't really imagine doing it any other way. But the specific hopes that I had for publishing that stuff and trying to solicit feedback were: getting people to look for basic design flaws, like "oh hey wait, if you fail to hash this thing into this place, then it means someone can do this kind of guess cheaper than intended." I wanted to get people comfortable with what the security properties were, especially because New Sync changes some of them. We are switching away from pairing to something based on passwords. I wanted people to have time to feel they understood what those changes were and why we were making them: get the design criteria and constraints out there so people could see that we kind of have to switch to a password to meet all of the other goals, and what's the best we can do given security based on passwords. And then the other part was that having that kind of public discussion, and getting as many experienced people involved as possible, is the only way that I know of to develop confidence that we're building something that's correct and not broken. One of the tricky things that I've found getting into the security space is not knowing when you know enough. It'd be nice if there were a merit badge, or some kind of test you take and then they give you a sticker, and then you're like "yup, I know crypto, see!" And there isn't such a thing. You end up reading other people's stuff, and sometimes you see problems in it, and maybe that tells you "well, maybe I know enough to see problems here," and then you build something, and you go and look back at it six months later and you see problems in the stuff you wrote yourself, and then you go "no, I don't know how to do this stuff." So it really is more eyeballs, people who have seen problems in the past; that seems to be the best way of identifying flaws. Because the big problem with security stuff is that sometimes you're like "what are the chances that problem X would happen?" If you design something and there is a 1 in 1000 chance that a particular set of inputs will cause this one particular problem to happen, and it really is random, then 1/1000 may be OK, 1/1M may be OK. But if it is a situation where an attacker gets to control the inputs, then it's no longer 1/1000, it's 1 in however many times the attacker chooses to make it 1. And so it's this game of who's cleverer and who is more thorough. And it's frustrating to have to do this case analysis to figure out every possible thing that could happen, every state it could get into, but if somebody else out there is determined to find a hole, that's the kind of analysis they are going to do. And if they are more thorough than you are, then they'll find a problem that you failed to cover.
ST: That leads to one of my other questions: is this what is meant by threat modelling?
BW: Yeah, I think different people use the term in different ways. I think of it as: when you are laying out the system, you are setting up the ground rules. You are saying there is going to be this game, and in this game, Alice is going to choose a password and Bob is trying to guess her password, or whatever. And you are defining what the ground rules are. So sometimes the rules say things like: the attacker doesn't get to run code on the defending system, their only access to it is through this one API call, and that's the API call that you provide for all of the good players as well, but you can't tell the difference between the good guy and the bad guy, so they're going to use that same API. And so then you figure out what the security properties are if the only thing the bad guy can do is make API calls. So maybe that means they are guessing passwords, or it means they are trying to overflow a buffer by giving you some input you didn't expect. And then you kind of step back and say "OK, what assumptions are you making here, are those really valid assumptions?" So, you know, you store passwords in the database with the assumption that the attacker won't ever be able to see the database, and then some other part of the system leaks, and whoops, now they can see the database. OK, roll back that assumption: now you assume that most attackers can't see the database, but sometimes they can, so how can you protect the stuff that's in the database as well as possible? So it's stuff like: what are all the different sorts of threats you are intending to defend against? You lay those out, and someone else can come along later and say "that's all well and good, but I think you should really consider the threat of somebody breaking into your data centre and stealing the computer, or planting a program on there. Or somebody is running a keyboard logger on the user's machine and is capturing everything they're typing." And sometimes those threats are things you can reasonably defend against, and sometimes they're not. Sometimes you draw a line in the sand and say "we are willing to try and defend against everything up to this level, and beyond that you're hosed." And sometimes it's a very practical distinction, like "we could try to defend against that but it would cost us 5x as much."
ST: So this becomes a game of tradeoffs at that point? Again, monetary, time, effort.
BW: So, for example, if part of your threat model, something you are trying to defend against, is that the user's computer has a keyboard sniffer on it, and what you are trying to do is prevent the attacker from being able to log in to the system on their own, then the way you deal with that is either to have the user click on buttons on the screen that move around, so whatever the attacker records from the mouse isn't enough to figure out what they were clicking on, or you send out little hardware tokens to everybody and have everybody use those instead of real keyboards. And both of those have a cost in users' time and money, and you have to decide how prevalent keyboard sniffers are, what percentage of your audience you expect is going to be suffering from that, and how reasonable it is to try and defend against that threat.
ST: It must be somewhat reasonably common, HSBC sent me out one of those little dongles.
BW: Well, the stakes are higher. That's another thing. Sometimes what people do is try to estimate the value to the attacker versus the cost to the user, and it's kind of like insurance modelling, with expected values. It costs the attacker X to do something, and they've got an expected gain of Y, based on the fact that there is some risk they might get caught, and there are all sorts of potential penalties. If they succeed in taking over your bank account, how much is that worth to them? And the black market for these has all kinds of layers: there is somebody who is going to transfer the account money, somebody who goes to the ATM to take it out, somebody who is going to take that and mail it to somebody overseas, and like any complicated enterprise, criminal or otherwise, there are lots of middlemen and lots of layers. So the amount of money they can actually get out of your account, minus the risk they are going to take by doing it, is only worth so much. And so then the bank is going to work out: what's our reputational risk if somebody does break in here, how much is that going to hurt our reputation or business? And they make the decision that sending out a little gadget like that is going to be worth the money it costs and the inconvenience to you to actually use it, because it will give you more confidence in using their services and you are more likely to recommend them to your friends.
ST: So it has a payoff for them in the end probably.
BW: Yeah, and we definitely want a way to provide incentives to these companies to say "we should consider this." We can't just call it the user's fault. Europe is doing better at this than the States. The European banking system has shown more interest in trying out new technologies to improve stuff. Chip and PIN has been there for a long, long time, and it's only starting to show up in the States.
ST: I can’t imagine not having chip and pin now, it’s just so easy.
BW: It depends a lot on the regulatory and the consumer liability framework of the different countries. In the States, the credit card companies just kind of cover everything. And so it's really, really consumer friendly, and they were pushed in that direction by legal requirements, but it means that there is a tremendous amount of overhead to pay for fraud management. It's multiple percentage points of pure overhead. But as a result people are willing to use their credit cards just about everywhere. Just give your credit card to the waiter, and they walk away with it, and they come back, and maybe you've signed something.
ST: Maybe you have, maybe you haven’t.
BW: You signed for something. How many times have you signed? I don't know. But there are other requirements. I remember being told that Japan in the 80s and 90s didn't have strong protections on credit cards or debit cards, and so people just refused to use them. You maybe had a debit card, an ATM card, but if anybody else learned that number, they could drain the account and you'd have no recourse. And so people would leave them at home locked up and never ever use them anywhere.
ST: So it’s cash transactions and checks?
BW: I think it was cash and checks at that time. And even today, small businesses get very little liability protection, so a lot of attackers target them first, because they know that the charges won't get rolled back. They're nowhere near as well protected as a consumer would be.
ST: I’ve jumped out of order on my questions already, so I’m going to go back a couple of steps. Before a protocol designer, like you did with Sync, ever sits down and writes a spec or line of code, what should they be thinking about?
BW: I'd say think about what your users need. So kind of boil down what they are trying to accomplish into something minimal and pretty basic, and figure out the smallest amount of code, the smallest amount of power, that you can provide that will meet those needs.
ST: So this is like the agile version of developing a, ah, protocol.
BW: Yeah. Minimalism is definitely useful. Once you have the basic API that enables you to do what needs to be done, then think about all of the bad things that could be done with that API, and try to work out how to prevent them, or make them too expensive to be worthwhile.
ST: So sometimes you can’t eliminate the threat, but you can make it prohibitively expensive.
BW: Exactly. I really think that computer security, that our industry, is pretty immature, that we are still really learning good techniques and good practices for this. So there is a big class of problems that, it's probably kind of unfair and callous, but I call them beginner's mistakes. In some sense they're the type of mistakes a beginner programmer would make, but also the type that a beginning industry, an immature industry like what we have, would make. The more mature industries, like maybe airline transportation or satellite design, or medicine, have had several hundred years to figure out the feedback loop, so every time there is a problem they investigate it, they learn from it, they update guidelines. Which is why air traffic is so safe, especially compared to how many people fly all the time. Because they are really, really good, they really care deeply about fixing stuff.
ST: There is actually an article in CACM this month, or last month, about how the airline industry has so few problems. It basically comes down to being rigorous in the development process and the review process.
BW: Right, and ours, the software industry, is not there yet, and in some ways it shouldn't be, we couldn't accomplish as much, but I think that we still have a lot to learn. And so, as I described, there is a set of beginner mistakes, which are really beginner-industry mistakes, like buffer overruns, languages that aren't memory safe, not validating inputs, and stuff like that. But I think that once you get past those, a lot of computer security is about linguistics and about game theory and psychology. What are the bad outcomes that could exist here? Somebody gains access that they shouldn't, somebody steals money from you, somebody posts things in your name and hurts your reputation, or takes advantage of other people's opinions. What do they hope to gain from that? What do they have to do, or break into, or guess, or get lucky on, or spend, to accomplish that? What sort of risks are they taking on by doing that? Can you rearrange the system so that their incentives encourage them to do the good thing instead of the bad thing? Bitcoin was very carefully thought through in this space: there are these clear points where somebody could try to do a double spend, try to do something that is counter to the system, that is not good for the system as a whole, but it is very clear to everybody, including the attacker, that their effort would be better spent doing the mainstream good thing. They will clearly make more money doing the good thing than the bad thing. So any rational attacker will not be an attacker anymore, they will be a good participant. And there are some…
ST: So incentivise being a good actor at that point?
BW: Yeah, and in other security situations that usually gets expressed as: the risk and the cost that they would have to take on in order to make a successful attack is not worth whatever the value of the thing they are attacking is. So somebody can guess passwords, and you can't distinguish between a legitimate user and a bad guy trying to guess a password until they've guessed so many of them that you can say "well, that's probably not the real user, because the real user probably would have figured out their own password by now." But you can try to make it so expensive for the bad guy to carry out the attack that their expected return is so low that they are not going to bother.
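A minimal sketch of what making guessing expensive can look like in practice, assuming a per-account attempt counter with a lockout window; the function names and thresholds here are invented for illustration, not anything BW describes specifically:

```python
import time

# Hypothetical per-account throttle: after MAX_ATTEMPTS failures within WINDOW
# seconds, further guesses are rejected outright, so an online guessing attack
# is capped at a handful of tries per window instead of millions.
MAX_ATTEMPTS = 5
WINDOW = 15 * 60  # seconds

_failures: dict[str, list[float]] = {}

def allow_login_attempt(account_id: str) -> bool:
    now = time.time()
    recent = [t for t in _failures.get(account_id, []) if now - t < WINDOW]
    _failures[account_id] = recent
    return len(recent) < MAX_ATTEMPTS

def record_failure(account_id: str) -> None:
    _failures.setdefault(account_id, []).append(time.time())
```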
ST: I’m learning a lot here by the way, thank you for this. Implementing a secure system is difficult, how can a system designer or implementer maximise their chances of developing a reasonably secure system? What are common problems you see out on the web?
BW: I'd say the biggest guideline is the principle of least authority; POLA is sometimes how that is expressed. The idea is that any component of your system should have as little power as necessary to do the specific job that it needs to do. And that has a bunch of implications. One of them is that your system should be built out of separate components, and those components should actually be isolated, so that if one of them goes crazy, or gets compromised, or just misbehaves, has a bug, then it doesn't have any more power than it needed. The example I like to use here is a decompression routine, something like gzip, where you've got bytes coming in over the wire and you are trying to expand them before you do other processing with them. As a software component, that should be this isolated little bundle with two wires: one wire coming in with compressed bytes, and the other with decompressed data coming out the other side. It's got to allocate memory and do all kinds of format processing and lookup tables and whatnot, but no matter how weird or malicious the input you give it is, there is nothing that box can do other than spit bytes out the other side. So it's a little bit like Unix process isolation, except that a Unix process can make syscalls that trash your entire disk, and do network traffic, and do all kinds of stuff, whereas this is just one pipe in and one pipe out, nothing else. And it's not always easy to write your code that way, but it's usually better. It's a really good engineering practice, because when you are trying to figure out what could possibly be influencing a bit of code, you only have to look at that one bit of code. It's the reason we discourage the use of global variables; it's the reason we like object oriented design, in which class instances can protect their internal state, or at least there is a strong convention that you don't go poking around in the internal state of other objects. The ability to have private state is like the ability to have private property: it means that you can plan what you are doing without potential interference from things you can't predict. And so the tractability of analysing your software goes way up if things are isolated. It also implies that you need a memory safe language, because if you don't have that, then you know…
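A rough sketch of that "two wires" idea in Python, assuming the decompressor runs in its own process so the only things it touches are compressed bytes in and decompressed bytes out. This is illustrative only; a real sandbox would also restrict OS-level access, which a plain child process does not:

```python
import zlib
from multiprocessing import Process, Pipe

MAX_OUTPUT = 10 * 1024 * 1024   # refuse to inflate beyond 10 MB (zip-bomb guard)

def _decompress_worker(compressed: bytes, conn) -> None:
    try:
        d = zlib.decompressobj()
        data = d.decompress(compressed, MAX_OUTPUT)
        if d.unconsumed_tail:                # output limit exceeded
            raise ValueError("output too large")
        conn.send_bytes(data)
    except Exception:
        conn.send_bytes(b"")                 # the out-wire carries bytes, never anything else
    finally:
        conn.close()

def decompress_isolated(compressed: bytes) -> bytes:
    receive_end, send_end = Pipe(duplex=False)
    p = Process(target=_decompress_worker, args=(compressed, send_end))
    p.start()
    result = receive_end.recv_bytes()
    p.join()
    return result

if __name__ == "__main__":
    blob = zlib.compress(b"hello " * 100)
    print(len(decompress_isolated(blob)))    # 600
```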
ST: You can start peeking in other people’s business.
BW: Yeah, all over the place. So big monolithic programs in a non-memory-safe language are really hard to develop confidence in. So that's why I really go for higher level languages that have memory safety to them, even if that means they are not as fast. Most of the time you don't really need that speed, and if you do, it's usually possible to isolate the thing that needs it into a single process.
ST: So what common problems do you see out on the web that sort of violate these principles?
BW: Well, the web is an interesting space. We do a better job there in some ways; we tend to use memory safe languages for the receiver.
ST: You mean like Python and Javascript.
BW: Yeah, and we tend to use more object oriented stuff, more isolation. The big problem that I tend to see on the web is what you'd call failure to validate or sanitise your inputs, or failing to escape things: injection attacks. And I always think of those as a type violation, or a type confusion kind of problem. So take HTML, creating HTML to send out. The standard problem that you run into is: you tell me what your name is, and I want to display a web page with your name in it. There's the DOM, there's the HTML that I'm generating, and what I'm expecting to do is something like <p>Name</p> to close it off. What's really going on at the end of the day is that the browser is making this DOM tree, and you can think of it as a tree structure where there's the HTML node and the BODY node and the P node, and the text is inside of that. And I'm expecting to take this string I got from you and stick it into this one little slot. It should only ever go inside that paragraph tag, and it should only be the text content of it. I'm intending to allow you to change and control the text content, but I'm certainly not intending to let you change the shape of that overall graph. But what's really going on is that I'm not giving a graph to the browser, I'm giving it a string of serialised HTML. So I'm starting with this intended graph, I'm turning that into a string, I'm letting you modify part of that string, and I'm telling the browser to re-scan that string and build it up into a new graph, and my hope is that the old graph and the new graph are going to be the same shape with just the name swapped out. So there are actually two different types here, like programming language types, int, boolean, string, that kind of thing, and they both look like strings. There is one which is the serialised representation of an HTML tree, and there is another which is the bytes of the name string you are providing me. And there is this third implicit type that we never actually construct, which is the tree shape. And I'm accidentally, silently converting between the name type and the serialised-HTML-tree type by interpolating the name that you gave me into this serialised HTML tree. The common problem is when the string that you gave me as a name isn't telling me Shane, it's actually telling me <script>do something evil</script>. If you give me that, the proper thing for me to do is to escape those brackets, so that when I present that paragraph the page literally shows the text you gave me: the open bracket turns into an &lt;, that kind of stuff. There is a string translation to get from type #1 to type #2 that I forgot to do, because the two types look identical to me, and so many of the values in type #1 actually map to the exact same series of bytes in type #2. Anything that doesn't have a bracket in it is a one-to-one mapping.
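A small Python illustration of that type confusion; the helper names are invented, and `html.escape` stands in for the bracket-to-&lt; translation done at the point where the name crosses into serialised HTML:

```python
import html

def render_greeting_unsafe(name: str) -> str:
    # BUG: interpolates attacker-controlled bytes straight into serialised HTML.
    return "<p>Hello, " + name + "!</p>"

def render_greeting(name: str) -> str:
    # Escape at the boundary: the "name" type is converted into the
    # serialised-HTML type, turning < into &lt; and so on.
    return "<p>Hello, " + html.escape(name) + "!</p>"

evil = "<script>do something evil</script>"
print(render_greeting_unsafe(evil))  # a browser would parse and run the script
print(render_greeting(evil))         # a browser would just display the text
```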
ST: It’s OK, but if it does have a bracket, different story.
BW: Yeah. So it would be great if we didn’t express HTML in string form. If we had to express it in some other way, then it’d be harder to make that mistake.
ST: So if we represented it in the raw tree form all the time.
BW: Right, in the program you build up this tree object, maybe you use DOM functions to generate it, and at the end of that process, on the server side, you serialise that tree structure out into HTML. Then end to end it goes from a tree, to bytes, back into a tree, and you never actually manipulate the bytes part yourself. That's just some necessary wire-level protocol that most developers, most HTML authors, would never have to touch. So there is a category of problems like that on the web, and it shows up pretty much any time you are touching strings. Any time you are touching strings, they are going to be interpreted by something else.
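A sketch of the build-a-tree-then-serialise approach, using the standard library's ElementTree as a stand-in for a proper HTML builder (so the output is XHTML-flavoured, which is enough for the illustration):

```python
import xml.etree.ElementTree as ET

def greeting_page(name: str) -> str:
    root = ET.Element("html")
    body = ET.SubElement(root, "body")
    p = ET.SubElement(body, "p")
    p.text = "Hello, " + name + "!"   # a text node: the name can never change the tree shape
    return ET.tostring(root, encoding="unicode")  # escaping happens only at serialisation time

print(greeting_page("<script>do something evil</script>"))
# <html><body><p>Hello, &lt;script&gt;do something evil&lt;/script&gt;!</p></body></html>
```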
ST: The next question, you have a lot of experience reviewing already written implementations, Persona is one example. What common problems do you see on each of the front and back ends?
BW: Yeah, the problems that I remember seeing tend to be escaping things, and a lot of it is making assumptions, about where data comes from and how much control an attacker gets over it, that turn out to be faulty.
ST: So, this is trying to figure out, um, data sources is what you are saying?
BW: Yeah.
ST: Or source of the data.
BW: Yeah, the kind of standard thing of coming up with abstraction boundaries, the standard engineering technique of breaking a problem down into smaller pieces. You have a module, and it takes 3 inputs, and you make sure that module does the right thing when it gets the right inputs, and you have to make some assumptions about whether or not it's possible for it to be given the wrong inputs. And so, you know, is the relying party name that's passed into this module coming from one of our own modules, or is it coming from a potential attacker? Has somebody else already done the job of deciding if this is a good string or not, or do I need to be responsible for that too? And if you guess wrong, or if something changes, or somebody drops the ball, like "I thought you had it", "oh, I thought you had it", then you can wind up with an opening where somebody crafts the right kind of value, it gets misinterpreted in a certain way, and you end up with one of those injection attacks.
ST: So, is that why you advocated validating all of the input that came from the RP basically as soon as it entered into the system? You know it was as we got it from the backend we had to validate it, and if it came from the RP we had to validate it.
BW: Yeah. I like thinking in terms of type distinctions, so there is this one type that is "thing that comes from the bad guy", or "thing that comes from outside your trusted perimeter", and you are intending to let them influence some messages, but not others. So you have to make it clear that this is a name, and it is intended to be presented: sometimes this will be put into a log file, sometimes this will be put into an HTML span, and you are supposed to see those same letters up on the screen. And what I would really love is if the language we were using had distinct types for those two different things, and it wouldn't implicitly convert from one to the other. So when you take that name and add it to a string that's actually serialised HTML, that's the point at which the brackets should get turned into &lt;. In a log file you don't need that, but that means that the string that gets added to the log file is of type "name" and it's different than in HTML, and so if you take that log line and dump it into some other page, like a debugging page that takes your log information and shows it on an HTML page, that's got to do the transformation too. So yeah, what I was advocating there is that the transformation takes place at the boundary between externally provided data and the internally sanitised stuff, and that the type is marked very clearly. I almost wanted to do the convention where you put _<type_name> on all variables and all arguments as you pass them around.
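A sketch of what those distinct types could look like, with classes invented for illustration; libraries such as MarkupSafe implement essentially the same idea:

```python
import html

class Name:
    """Untrusted text exactly as the user supplied it."""
    def __init__(self, raw: str):
        self.raw = raw

class Html:
    """A fragment of serialised HTML, already safe to send to a browser."""
    def __init__(self, markup: str):
        self.markup = markup

    def __add__(self, other):
        if isinstance(other, Html):
            return Html(self.markup + other.markup)
        if isinstance(other, Name):
            # crossing the boundary: escape exactly here, and nowhere else
            return Html(self.markup + html.escape(other.raw))
        raise TypeError("refusing to mix plain str into HTML implicitly")

page = Html("<p>Hello, ") + Name("<script>evil</script>") + Html("!</p>")
print(page.markup)           # <p>Hello, &lt;script&gt;evil&lt;/script&gt;!</p>

log_line = "login by " + Name("<script>evil</script>").raw   # log files keep the raw name
```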
ST: This is using Hungarian Notation.
BW: Kind of, yeah. Usually when I see that notation it's pretty ugly, but sometimes I appreciate the reminder that, you know, this is not escaped, this may have the wrong kind of characters in it.
ST: That’s super interesting, I never thought of that. Is this one of the reasons you also advocated making it easy to trace how the data flows through the system?
BW: Yeah, definitely. It'd be nice if you could kind of zoom out of the code and see a bunch of little connected components with little lines running between them, and say, "OK, how did this module come up with this name string? Oh, well, it came from this one. Where did it come from there?" Then trace it back to the point where, HERE, that name string actually comes from a user submitted parameter. This is coming from the browser, and the browser is generating it as the sending domain of the postMessage. OK, how much control does the attacker have over one of those? What could they do that would be surprising to us? And then work out at any given point what the type is, see where the transition is from one type to another, and notice if there are any points where you are failing to do that transformation, or you are getting the types confused. Definitely, simplicity and visibility and tractable analysis are the keys.
ST: So what can people do to make that simpler? Make that data flow trace simpler?
BW: I think, minimising interactions between different pieces of code is a really big thing. Isolate behaviour to specific small areas. Try and break up the overall functionality into pieces that make sense.
ST: So like god modules are just horrible ideas.
BW: Yeah, and likewise, super tiny ones might not be correct either. It's always kind of a judgement call. But, you know, there's division of responsibility in any human effort where you get a bunch of people together: OK, you are going to do this, you are going to do that. It's the same thing in programs, where whatever processing has to be done, all that complexity is isolated and doesn't have to affect anybody outside. And recognise that it's a big deal when you change the contract between two pieces; try to arrange the dotted lines so that you don't have to do that quite so often. So anything that helps analysability helps. That means threads are usually wrong.
ST: So, for backend code, threads make it difficult? And I imagine communicating via shared memory, that's doubly difficult.
BW: Right, and that's the main problem. When you are looking at a piece of code and you are trying to figure out what could happen, you have to be aware of anything else that could affect the memory space at any time. If you don't have threads, or if that entire piece of code is run inside of a big lock, then you know that the only code you need to look at is the code that is on the screen in front of you. If there is a global variable that it references, then that variable won't change out from under the code while it runs, but the value that it starts at, and who else might be affected when you modify that global variable, still matter. So somebody else might get hit by what you do inside of this function. If you've got threads, then at every single point, even in the middle of a line, somebody could jump in and change the state. So it's like trying to do your taxes on paper at your desk while somebody is cleaning your desk or rearranging your stuff.
ST: And I imagine troublesome on multicore processors where they can actually, you know, CPUs, cores, can go after the same memory at the same time.
BW: And it makes it hard to take advantage of multiple cores, and that's an important thing to do because CPUs aren't getting faster, they are just getting wider. You need to shape the problem so that those different cores can correctly work on different problems at the same time and not actually overlap. So yeah, threads are really bad because the analysis complexity just totally blows up. Any kind of global variables, any kind of side effects; mutability is kind of a drag for this. If you can look at a value and know that it is just a label for some static immutable value, that simplifies the analysis. There are languages where macros can be good for expressing some stuff, but they make it harder, you have to look at more code to figure out what a piece is doing. Sometimes a language can be too flexible, and if it has operator overloading and somebody is being too clever with it, you have to go figure out what that is doing.
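A toy example, invented for illustration, of shaping work so each core gets its own independent, immutable piece of the problem instead of threads poking at shared mutable state:

```python
from concurrent.futures import ProcessPoolExecutor

def sum_chunk(chunk: range) -> int:
    # No globals, no shared memory: the only inputs are the arguments,
    # so analysing this function means reading just this function.
    return sum(chunk)

if __name__ == "__main__":
    chunks = [range(i * 1_000_000, (i + 1) * 1_000_000) for i in range(4)]
    with ProcessPoolExecutor() as pool:
        total = sum(pool.map(sum_chunk, chunks))   # workers never overlap
    print(total)
```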
ST: Even a, ah, perhaps more mundane example: when you reviewed the Persona code, I remember you saying that it was really difficult to trace the data flow through, because I would shove all these bits of data onto an object, under named items in the object, and just pass the object around, so you didn't actually know where the data was coming from or how it got there. I've always remembered that since then.
BW: And it's interesting, because different languages end up with different patterns for this. I'm still not a very strong Javascript developer; at the time I was mostly doing Python, and there the convention would probably end up being keyword arguments, named keyword arguments, when you are trying to pass multiple things through. The Javascript convention seems to be passing in a single options argument that has a bunch of other stuff on it, so looking at the function signature you can't tell what is actually getting passed in. And that's just a … of the language and not having direct support for keyword arguments. And so, looking at that, seeing this object called "options" everywhere, that in fact had a bunch of mandatory properties, I was like "alright, nothing very optional about these options." And the fact that every single function had a variable of the same name, but all of them were different, was challenging. One practice that I've tried in some of the Python code that I do is to come up with a distinctive name for each thing, like this is the download handler, so it has a name, and anywhere you hold a variable of that type, use the same name for that purpose. So that when you are grepping around, you know, a lot of this is like grep oriented programming: what can you do to help somebody who has your source tree and grep, but doesn't know their way around, to find the parts that are relevant? A strongly typed language with some good IDE support could let you find all of the instances of a particular kind of value, and that's probably a good indicator.
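A small Python illustration of the difference being described, with invented function names: an opaque options object versus keyword-only arguments whose signature documents itself:

```python
def start_download_opaque(options):
    # What is in options? You have to grep the callers to find out.
    return (options["url"], options["retries"], options.get("timeout", 30))

def start_download(*, url: str, retries: int, timeout: float = 30.0):
    # Mandatory things are mandatory, optional things have defaults:
    # the signature is the documentation.
    return (url, retries, timeout)

start_download_opaque({"url": "https://example.com/f", "retries": 3})
start_download(url="https://example.com/f", retries=3)
```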
ST: Do you mean things like IntelliSense and Eclipse?
BW: Yeah. It would be like: my download handler is an instance of a class called DownloadHandler, tell me all of the variables or all of the arguments in the system that are of that same type. In a language where you have to tell the language about that everywhere, something explicitly typed like C++ or Java, that would let you, as the explorer, as a kind of archeologist looking through this stuff, say "oh, well, here's a place that can create one of those values, and here's a place you might pass it to, and here is a place that asks it to do something. Those are probably the relevant bits that I need to look at." In a duck typing language, you don't have that type information to work with, so the best you can do is variable names that are all consistent and use the same thing, or comments that say "this function takes an object of class X, or abstract base class, or interface X", and use that as a guide to what's going on. But it's tough, because it also has to be kind of a global convention. You have to say, "I'm not going to call anything a download handler unless it really is one of these things." That means the names can't be too short, otherwise it's like "well, here's a thing that has handles, and this is the handle that downloads, and DownloadHandler, well no, damn." So it's tricky, and it ends up imposing some linguistic constraints on your program as a whole.
ST: So what is defence in depth and how can developers use it in their system?
BW: So, it's a bit like the principle of least authority. Well, "belt and suspenders" is the classic phrase: if one thing goes wrong, the other thing will protect you.
ST: I’ve never actually heard that before.
BW: Really? You look silly if you are wearing both a belt and suspenders, because they are two independent tools that help you keep your pants on, but sometimes belts break, and sometimes suspenders break, and the chances that both of them will break before you get home are small, so it protects you from the embarrassment of having your pants fall off. And that's funny, because the guy I know who uses that phrase is from London, and I was assuming it was sort of a Britishism. So defence in depth usually means don't depend upon perimeter security.
ST: So that would be what you were talking about before, where you only scrub things at, say, the input, where data enters the system. Does this mean you should also be checking down the line? Even though you think the data was scrubbed back there, if it's critical at this moment that this be escaped, I should make sure that the stuff is escaped before I write it out to the DOM.
BW: Yeah, it can frequently be a good idea. There is always kind of a judgement call about the performance cost, or the complexity cost. If your code is filled with sanity checking, then that can distract the person who is reading your code from seeing what real functionality is taking place, and that limits their ability to understand your code, which they need in order to use it correctly and satisfy its needs, and stuff like that. So it's always this judgement call, this tension between being too verbose and not verbose enough, having too much checking or too little. But yeah, the notion of perimeter security: it's really easy to fall into this trap of drawing a dotted line around the outside of your program and saying "the bad guys are out there, and everyone inside is good", and then implementing whatever defences you are going to do at that boundary and nothing further inside. I was talking with some folks, and their opinion was that there are evolutionary biology and evolutionary sociology reasons for this. Humans developed, tens of thousands, up to hundreds of thousands of years ago, in these tribes where basically you are related to everyone else in the tribe, there are maybe 100 people, and you live far away from the next tribe over. And the rule was basically: if you are related to somebody then you trust them, and if you aren't related, you kill them on sight. That worked for a while, and the rules are really simple, but you can't build any social structure larger than 100 people that way. And I think that sort of translated into the way we still think when we come to computers. We think "bad guy", "good guy", I only have to defend against the bad guy. But we can't distinguish between the two of them on the internet, and the good guys make mistakes too. So the principle of least authority thing, and the idea of having separate software components that are very independent and have very limited access to each other, mean that if a component breaks, because somebody compromised it, or somebody tricked it into behaving differently than you expected, or it's just buggy, then the damage that it can do is limited, because the next component is not going to be willing to do that much for it.
ST: Is this one of the reasons why in Persona there is the notion of a database reader and a database writer, where these were two isolated processes?
BW: I think so. There may also be performance or scalability reasons for that. I know that having a database view that's read-only means that it's usable in circumstances like a partition, or replicas not being online, where you can keep reading from it but you are not allowed to write to it anymore, so maybe you get some functionality even if you don't have a quorum of writers. But I think it's also a good security tool, where no matter what you ask the database reader to do, it's not going to modify the database, and a lot of the operations don't require modification of the database. So, you can think of it like this: there's this security pattern, this discipline, called object capability security that I'm a really big fan of. The idea there is that each object in your object oriented programming language is isolated from the others. You can invoke methods on it, but you are basically sending messages to it: please do something, please do X. And the abilities that that object has, the things that it can influence, are exactly specified by the object references, the pointers that it has to other objects. So you can look at any object and know what it can and can't do based on what it has access to, what other objects it has references to. And you can make up new objects any time you want that encapsulate the subsets of functionality that you care about. And in that world, if you think about it from the attacker's point of view… So, in the traditional big monolithic application, as the author and defender in that game, there are all of these potential holes that you might have to go and plug. Every message coming into your system might be malformed, it might be longer than you expect, it might be formatted strangely, so you are worrying about buffer overflows; you are trying to accept messages that tell you to do X, but reject messages telling you to do Y. So it's sort of like a big castle with a strong exterior wall, but it's a really big wall, so there are lots of places where there might be a hole in it, and no defences inside whatsoever. Once the attackers breach any point in the wall, they are in the courtyard and they just take over everything. A big program that has lots of complicated string handling, written in C, is like that, because one buffer overflow means that now you are executing the attacker's code, and it's not your perimeter anymore, not under your total control. The other approach is more like: this particular API call is expecting strings, and it parses them, but it's in a memory safe language, so if something goes wrong, it's not going to take over the whole program. And anyway, all that API call does is send well formed messages, with a tree of arguments, to this next place, which then does the next part of the processing. So you have a bunch of little isolated buildings, little armoured cars or something, and each one has a smaller security perimeter. There are a lot more of them, but more importantly, if the prize for the attacker is to get to the database, the only thing that knows how to touch the database is the database reader, and the only way that you can get to the database reader is to attack the command processor, and the only way you can get to the command processor is to attack the thing in front of it, and so on.
So, from a defender's point of view, if anything along that chain survives the attack, then the attacker can't get through. And from the attacker's point of view, there's this chain of stuff, and I have to compromise every single one of those, and the only handle I have on each thing is whatever the previous thing can reach. Even if the API input string scanner was so totally broken that I was able to compromise it entirely, it can't do all that much. It's like the example I had before with the text decompressor, or, in a web browser, the jpeg decompressor. If it's properly isolated, then the worst thing that an attacker can do by sending you malformed jpegs is limited: that component should take in compressed data and spit out pixels, like (x, y, colour), and nothing else. The worst case is that they can put rude pictures in front of you. They should not be able to take over anything else; there is no reason for the jpeg decompressor to have access to your disk. So, in a nicely isolated system like this, and Mark Miller put this nicely, the attacker's job is exponential, whereas the defender's job is linear. In the monolithic application, the defender's job is exponential and the attacker's job is linear: the bigger the program, the more potential access points, and they only need one, whereas the defender has more and more code and more interactions to protect.
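A minimal sketch of that chain in object-capability style, with invented class names: the command processor is handed only a reader, and the reader holds only a read-only connection, so nothing downstream can write to the database no matter how it misbehaves:

```python
import sqlite3

class DatabaseReader:
    def __init__(self, path: str):
        # SQLite can open a database read-only via a URI, so even this object
        # physically cannot issue writes.
        self._conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)

    def get_user(self, email: str):
        cur = self._conn.execute(
            "SELECT id, email FROM users WHERE email = ?", (email,)
        )
        return cur.fetchone()

class CommandProcessor:
    def __init__(self, reader: DatabaseReader):
        # The processor is handed a reader, never the raw connection: its
        # authority is exactly "ask the reader questions", nothing more.
        self._reader = reader

    def handle_lookup(self, email: str):
        return self._reader.get_user(email)
```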
ST: I’ve taken over the hour that I requested with you.
BW: No worries, we can just keep going if you want.
ST: Do you think that common standards, like coding styles, really help out with reviewing, or help out with overall security?
BW: I think they can. I think that the ability to read code and correctly reason about what it does is really critical, and to the extent that coding standards help you do that, then yeah, they help security.
ST: Do you think that things like coding standards reduce the amount of cognitive load, because you don't have to keep swapping between different styles and stuff like that, you can just focus on the task?
BW: Yeah, there was a really interesting paper a couple of months ago where some folks were putting programming problems in front of people: here's some basic Python or some pseudocode, maybe C, what does it do? And they introduced errors into it, things like missing the braces on a C if-then-else statement, stuff like that. And what they found was that experienced programmers got it wrong more frequently than beginning programmers. Part of it is indentation changes: in C the whitespace doesn't matter, but there are certain conventions that the experienced programmers knew about that the beginners didn't. So if you see a line indented in a particular way, you think "well, any sane programmer would intend for that to do the following", and the experienced programmers assumed that the author of this code was a sane, experienced programmer who meant to do something, so they didn't bother looking at the details, because they weren't defending against an inexperienced programmer making mistakes when they were reading this stuff. So yeah, it reduces cognitive load: if the code is formatted correctly, and if the language gives you some extra support there, where it doesn't let you make some of the basic mistakes, like "oh, there's a missing brace and that changes the behaviour entirely", or there is a missing semicolon and now the Javascript parses entirely differently, then you don't have to think as much while you are reading through it. And it is a question of whether you are doing defensive code analysis, where the author of this code is deliberately trying to trick you, or whether it is friendly code, where it's like "well, maybe they just had a bad day and made a mistake", but most of the time they get it right, and then I'm reviewing for the higher level control flow as opposed to the low level spelling mistakes and stuff.
ST: Do you use any tools to help you ensure the code that you write is relatively secure? Or as secure as you think it can be?
BW: I don't know that I do, beyond the standard stuff. For Python there are unit tests: trying to run those all of the time, trying to structure your program in such a way that you can run tests on isolated pieces of it, so that you can run them very quickly. When I'm deep in a program, I'm trying to run the unit tests every couple of seconds, so I really want them to be small enough and fast enough, in a tight enough loop, to do that. I use, this is mostly on Python, code coverage tools to convince myself that the tests are actually exercising all of the different cases, so I have stuff like coverage on Python, and some emacs integration stuff that I wrote to colour the lines in the editor based on whether the code coverage tool reports that they were exercised or not. And I know there is stuff like that in the Javascript world.
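A sketch of driving coverage.py from Python as part of that kind of tight test loop, assuming the `coverage` package is installed and the tests live under a `tests/` directory:

```python
import coverage
import unittest

cov = coverage.Coverage()
cov.start()

# Run whatever fast, isolated tests live under tests/
suite = unittest.defaultTestLoader.discover("tests")
unittest.TextTestRunner(verbosity=1).run(suite)

cov.stop()
cov.save()
cov.report()        # per-file summary on stdout
# cov.xml_report() or cov.html_report() produce output that editor
# integrations can use to colour lines as covered or not.
```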
ST: We are using one on the content server, but it doesn’t integrate nicely with our editor.
BW: We can fix that. Some of that is figuring out what the coverage tool emits as a list of line numbers, and then writing stuff for the editor that knows how to read that and mark up the lines. And there is some static analysis: in Python there is a great tool called PyFlakes, which is really fast, so it's the kind of thing that you can arrange to run every time you save the file to disk. And that just tells you stuff like: there's a variable here that's used but not actually defined or referenced anywhere else, that's probably a typo; you forgot an import statement here; there's a syntax error over here.
ST: So linters.
BW: Yeah, so that's nice for getting through the basics. You know, I'm a bit embarrassed to reveal to the unit test suite that I forgot that return statement, some really basic stupid error, so PyFlakes is my little buddy helping me not look so dumb to the unit test code. And the unit test code is what helps me not look so dumb to the reviewers and stuff like that.
ST: So, I've got to ask about Javascript and the browser at the end. But where can other developers, this writeup is actually going to eventually go up on Hacks, where can developers go to learn what they should be doing?
BW: That's a good question. I guess for any given language there are the currently popular unit test frameworks, and the documentation there, or the people who wrote them, or the people who use them and advocate for them, will probably have blog posts and essays about the benefits of doing that and using those things. I'd say find a software project that is well run and look to see what their toolchain is. I know I learned a lot when I first got into Python: I started using Twisted and participating in Twisted coding. I think their practices are just first rate, and I learned a lot about how to test stuff and how to structure programs, just how to name methods and name classes. Kind of like the right categories of speech, whether it should be a noun or a verb, or whether message names are written from the point of view of the receiver or the sender, which is a really interesting anthropomorphisation that helps you read what's going on later and understand stuff. Kind of telling a story with your code. I learned a lot from that community. There are other projects in other languages that could do the same.
ST: Reading other people’s code. Reading good examples of code helps out?
BW: Yeah, and also watching how they work. Sometimes if you are chatting with people on IRC and they are walking you through working on your first bug in some other project, they’ll say “OK, the way we work here is we write a patch and we submit it here, and there is this review thing, and there is this test thing.” It can also be really valuable to meet these people in person, go to sprints, go to conferences, and sit down with them and look over their shoulder while they tackle these things. That’s when you start to really discover what tools they are using.
ST: That’s interesting, and now one last question and I’ve really gotta get out of here. A lot of people say that you shouldn’t do crypto in the browser, that the browser can’t be a trusted environment, blah, blah, blah. Well, why is it thought to be a bad idea, when we seem to do it all over the place, and we are pretty confident that we can do it safely?
BW: It’s an excellent question, and there are maybe 3 or 4 different dimensions to it, some of them better reasons than others. One dimension is that Javascript is a frustrating language to do low level math in, to do low level binary operations. Especially to do crypto fast, you really need to do a lot of bit twiddling, a lot of XORs, a lot of shifts, and Javascript has this 32 bit unsigned integer type, kind of, sort of, but you have to work really hard to get at it. So that’s getting better with things like asm.js.
ST: Do typed arrays help out?
BW: They do, they do. And so these days the way you do that kind of crypto in the browser is to write it in C and then compile it into asm.js, because that will probably get those low level things correct for you. Typed arrays address another problem, which is that Javascript doesn’t make it easy to handle binary data, just a byte string. There’s UTF, UTF-8, but what you really need is your unsigned char * from C so that you can point at stuff. So you have to do extra work to get at that, and that’s annoying; it’s slowly getting better. Another technique that helps is to write stuff in NodeJS style, using Buffers, and then run Browserify to turn that into something the browser can use, because that knows how to use typed arrays. So I really think the way to write that kind of crypto stuff these days is: write your low level bit shifting stuff in C and asm.js and stick that into a module, write your middle level passing-data-in, slicing-and-dicing stuff with Buffers and browserify it, and wind up with something in JS that your higher level stuff can use. That’s one axis. Another axis is who is supplying the program you think you are running, and who are you trying to protect yourself against? There’s this spectrum of static to very dynamic code delivery mechanisms. Take my alarm clock at home. If you were an attacker and you wanted to somehow get me, say there’s this one day with a critical meeting or a critical interview and you wanted to cause me to wake up an hour late and miss it, you’d either have to break into my house and swap out the alarm clock, or bribe my girlfriend to change the time on it, or go backwards in time 20 years to when I got that thing as a high school graduation gift and swap out the chip before I got it. Super static, there is nothing dynamic in there whatsoever. Then there’s a program that you installed from CD-ROM back in the 80s: you’d have to grab the CD-ROM on its way to my house and swap it out for a different one with a different program. Then there’s stuff that you install that has an automatic update process, like the operating system, OSX; if you can trick that into installing the wrong thing, then you can get control later. Then there’s code that’s getting updated all of the time, code that’s delivered to your browser every single time, just before you run it. So web pages are very dynamic, and in addition to the alarm clock being much more difficult to attack, attacking it is much riskier, because that thing is sitting inside my house, and if I think there is something wrong, I have the evidence of it in my hand. Whereas the web page is there, it does something, and it’s gone. So an attacker in control of the site that’s giving you the code, or maybe in control of the network, or who can somehow violate the SSL rules, can tamper with your experience and then erase the evidence. It’s fairly easy to attack and it’s fairly low risk. That mostly matters if your threat model includes the person who’s providing you with the code. And it may be that you are loading a web page from site A, and your threat model is site B. So you are getting a tool from site A that’s encrypting some data and then storing the ciphertext on site B, and you are totally fine with site A seeing your stuff, but it’s site B that you are trying to protect against. There is also a model in which the code is written in Javascript, but it’s actually coming from an add-on, which means the update rules are different: it only gets updated when the browser decides to go and fetch add-ons. That’s more like native application software with an automatic update process, a little bit less dynamic. Yeah, so that’s another axis. The availability of libraries to do crypto, to do it quickly, that have been well reviewed, is another axis.
ST: Is the Stanford one respectable?
BW: It’s not bad, the only problem is that it’s not really maintained. I’ve talked to a couple of the authors and they kind of don’t really want to have anything more to do with it. So it’s troublesome that there are not a lot of parties out there that write this stuff and want to keep writing it and be responsible for it. That’ll shift over time. And I think the final axis is the difficulty of being timing safe, which is hard in Javascript. A lot of crypto algorithms take different amounts of time depending upon what your key is.
ST: So, you can figure out the length of the key?
BW: If you can observe it, you can figure out the length, or you can figure out the bits of it. There are cases in which you can actually find out what the secret key is by watching, at a distance, how long your computer takes to do certain things, or how busy it is. There are environments in which, if I can run a program on your computer at the same time as you are doing something with your secret key, then based upon which cache lines get knocked out, I can figure out what your key is. And in a web environment that’s relatively easy, because your web browser is running programs on behalf of all sorts of different web servers, all at the same time. So it’s not a super isolated environment, it’s kind of leaky that way. And it’s easier to write timing insensitive code in a lower level language where you potentially have more control over each instruction; Javascript doesn’t give you that. And I think the final thing was just performance and people. The browser is really big. The amount of code there dwarfs a kernel, kind of every other program, because it’s becoming the operating system. And the bigger the program is, the more security holes there are. Somebody has to worry about all of it: it’s got an XML parser, it might have Flash, it’s got these video parsers, image parsers, and all of those things are different angles of attack. And so somebody who is serious about crypto is going to be anxious about doing it inside a browser, or be anxious about doing it on a computer with a browser installed, because that’s such a big attack surface. But if you actually want to get stuff delivered to lots of people, the web is the best way to do it these days.
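One concrete way to see the timing issue (shown here in Python purely to illustrate the idea; the function names are made up): a naive comparison of a secret against a guess returns as soon as it finds a mismatching byte, so how long it takes reveals how much of the guess was right, while a constant-time comparison does the same amount of work either way.

```python
# timing_compare.py -- sketch of why comparison timing can leak a secret.
import hmac

def naive_equal(secret: bytes, guess: bytes) -> bool:
    """Returns at the first mismatch, so the running time grows with the
    length of the matching prefix -- an attacker can measure that."""
    if len(secret) != len(guess):
        return False
    for a, b in zip(secret, guess):
        if a != b:
            return False
    return True

def constant_time_equal(secret: bytes, guess: bytes) -> bool:
    """Takes (nearly) the same time no matter where the bytes differ."""
    return hmac.compare_digest(secret, guess)

# An attacker who can time naive_equal() precisely can recover the secret one
# byte at a time; constant_time_equal() removes that timing signal.
```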
ST: So why are we somewhat confident that we can do it safely, at least somewhat safely?
BW: I think that on that first axis, who’s supplying the code and what the threat model is, we are in the right position. So in the case of BrowserID, either the browser, in a native implementation, or the site that gives you the shim is able to change the crypto code, but not the IdP, and not the RP.
ST: We control everything at that point.
BW: Right. It’s like, we are willing to rely upon the shim provider, in exchange for which the threat model becomes: the IdP is trying to do X, but we are trying to keep them from doing Y, and the RP is trying to do something else… Those are the pieces that are at arm’s length, so along that particular axis I think we are in the appropriate spot to do crypto inside the browser. The operations that we are doing are fast enough; people have managed to get SJCL to do the signature code that we need, so yeah, Javascript is not the friendliest language for doing that, but it’s not too bad, and somebody else has made it work. And we’re not doing these operations very often, so for the timing stuff we are kind of willing to believe that it’s good enough. It’s always this kind of “well, it’s a little bit risky, you know, driving a car is not really a very safe thing, but I need to get to work and it’s too far to take the train.” That’s always kind of the trade off there.
ST: Thank you Brian.