Self indulgent flow of consciousness on what I have learned about software culture and quality from a 2 decade career


rgoomar/culture-of-quality

 
 


Experiments in a Culture of Quality

Holy Cow! This is way too long! Take me to the summary!

Dedication

I have been a computer programmer for over 30 years. I have worked long term at 3 different Fortune 500 companies, I have seen 2 successful startups from inception to sale, my name is on several software patents and patent applications, and my resume reads like a list of things that were never done before, some considered impossible. However, I'm not a very good programmer. In fact, my only B in my freshman year of college was the introductory computer programming course - otherwise I would have had a 4.0 GPA. I was good enough, however, to make money for college to pursue my dream of being an audio engineer. There were not many jobs for audio engineers at the time I graduated, so I entered the world of professional software development. I struggled and I felt confused most of the time. My curiosity kept me going and, thanks to some smart and patient mentors, I learned some things along the way. About 2 years ago, the decades of coding with bad ergonomics started to catch up with me. Persistent pain and numbness, especially in my right hand and wrist, became a daily part of my life. For the first time in my career I took a role where I was able to hire a small team, take a more visionary role, and do less coding. It was in these two years that the knowledge of my career started to come together in my thinking and organize around a central theme of what I call a "Culture of Quality", which combines elements of good company culture with software best practices for quality, and eventually arrives at a place that realizes that these things are deeply related. I am gratefully and humbly indebted to all those with whom I have had the joy of collaborating. In this document, I use the words "We" and "Our" to refer to all of us collectively: team members, mentors, mentees, users, customers, testers, managers, product owners, and anyone else who conspired with me to achieve better quality and culture.
"We" came to include more and more people as time went on and now spans several different companies.

Executive Summary

The foundation of both good quality and good culture can be traced to a single word: accountability. Good quality requires significant and consistent investment at every step in the development process. This is dangerously easy to overlook in the daily grind of a software business. The organization, the individuals, the tools, and the practices must provide a system of accountability to ensure the right things are done to obtain sustainable quality. What follows is some of the experiments, lessons, and observations we have encountered along the way to a Culture of Quality.

Standards and Measurements

According to Google, Quality is defined as "The standard of something as measured against other things of a similar kind; the degree of excellence of something." Without diving into philosophical discussions of what constitutes quality or what excellence really means, we see two words in the very first part of the definition: "standard" and "measured". Quality must be measured and it must be measured against a standard. This is the first hurdle in the software development process. Even in research and development, where there is no set standard, we measure against what we did yesterday. If we wait until a product is fully developed to measure it, we find ourselves prohibitively limited to "black box testing". If the interfaces are not properly managed (more on this later), we will also waste a lot of time reverse engineering before we can even begin measuring this small subset of what we need to.

A critical mistake is often made right here at the beginning. Because there is no standard, no measurement capabilities are built into the software. We learned a lot when we had a project with a real SLA (Service Level Agreement) where a certain performance standard had to be met in order to be paid. With limited black box testing we could send a request into the system and ensure the response came back accurately in the specified time limit. But what if it was late? This system comprised a complex set of software products on a distributed architecture that could not even all run on one laptop. It might take days or even weeks to nail down the root cause. We would bleed money the whole time while our system was out of spec.

We developed something that could fire a request into the system with some flags turned on and we would get performance metrics on every single critical step through a vast architecture of services and nodes. Furthermore, we could do that automatically at regular intervals and alert people if there were problems. It could immediately identify, for example, that we had a network connectivity issue to a DNS server. Such an issue might have taken days to debug manually.

This type of quality measurement does not come for free. The ability to do this had to be built into the software itself. An interface had to be developed whereby if a request came into the system with certain flags turned on, each critical component could understand these flags, measure what it was supposed to, and append that information to the request as it moved through the system. Eventually, on the way out of the system, the response would contain all the performance information that was requested. It could then be reported to a central server for automated alerts, reporting, historical trending, and other analysis. Despite the upfront cost, we found that it not only saved us money, it actually allowed us to help our customer debug problems in their network.
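The flag-driven measurement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original system: the `Request` class, the `"timing"` flag, the `traced` decorator, and the two toy components are all hypothetical names invented for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Request:
    """A request carrying optional measurement flags and the results collected so far."""
    payload: str
    trace_flags: set = field(default_factory=set)  # hypothetical flag names, e.g. {"timing"}
    trace: list = field(default_factory=list)      # measurements appended along the way


def traced(component_name):
    """Wrap a component so it records its own elapsed time, but only when asked to."""
    def decorator(handler):
        def wrapper(request):
            if "timing" not in request.trace_flags:
                return handler(request)            # ordinary traffic skips the measurement
            start = time.perf_counter()
            result = handler(request)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            result.trace.append({"component": component_name, "elapsed_ms": elapsed_ms})
            return result
        return wrapper
    return decorator


@traced("dns_lookup")
def dns_lookup(request):
    # Stand-in for a real network step.
    return request


@traced("backend")
def backend(request):
    # Stand-in for real business logic.
    request.payload = request.payload.upper()
    return request


def handle(request):
    """Pass the request through every component in order and return it."""
    for component in (dns_lookup, backend):
        request = component(request)
    return request
```

Calling `handle(Request("ping", trace_flags={"timing"}))` returns a request whose `trace` list holds one timing record per component, while a request without the flag comes back with an empty `trace`. A monitoring job could fire such a flagged request at regular intervals and forward the records to a central server.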

The power and utility of this made us want to expand its reach. We imagined a more generalized "Tracer Bullet" API where all sorts of measurement requests could be easily sent into the system. What if we could read the CPU usage on every box that serviced that request in the architecture? What if we could request a single log record so that we did not have to access some hard-to-reach system? What if we could read thread counts, page faults, active users, disk usage, or anything else we wanted in a way that is extremely easy to automate? We never got to take this nearly as far as we wanted. Adding such measurement features across a wide variety of codebases and architecture components proved to be a tough sell. Surely, the time to develop and maintain such a facility would pay for itself several times over. It seemed that the main problem was a lack of priority. It is possible that the culture does not sufficiently reward this type of endeavor (more on this later). Maybe without years of experience in systems engineering, system testing, and working right alongside customers, one would not accurately value such a utility.

Debuggability

The cultural and organizational barriers to quality have taught us a great deal. If we can't build quality measurements into the system, we must certainly make it easy to debug when there are failures. We found many simple things that helped. One of the simplest, most effective techniques we used was to provide unique error numbers to every user-visible error message. These unique error numbers were traceable to a single line of source code. If a call came in from the customer that said "We are seeing an error that says 'General Connection Failure (011)'", we knew without even reproducing the problem what line of code had failed. This saved us countless hours in debugging system issues. Not only that, we improved how we handled error cases in our code. Scenarios that were grouped into a single error leg got split out so that exact error conditions were identified. Then, instead of issuing messages like "General Connection Failure (011)" we began writing messages like "Proxy is unreachable. Please check your proxy settings and try again (011)" and "Proxy is not specified. Please set the proxy and try again (012)". What we did quickly moved from the realm of debugging to the realm of user experience. We came to see this pattern again and again: that user experience becomes more important as quality and culture improve.
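Keeping each error number tied to exactly one line of source is easy to check automatically. The following Python sketch is one possible approach, not the tooling we actually used: it assumes error numbers always appear as three digits in parentheses, and the function name `find_duplicate_error_numbers` and the file names in the usage note are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: user-visible error numbers look like "(011)", three digits in parentheses.
ERROR_NUMBER = re.compile(r"\((\d{3})\)")


def find_duplicate_error_numbers(sources):
    """Map each error number used more than once to every place it appears.

    `sources` is a dict of file name -> file contents, so the check can run
    on text already in memory as well as on files read from disk.
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in ERROR_NUMBER.finditer(line):
                seen[match.group(1)].append((name, line_no))
    # Only numbers that show up in more than one place are a problem.
    return {number: places for number, places in seen.items() if len(places) > 1}
```

Given one file that reports `(011)` and `(012)` and a second file that reuses `(011)`, the function returns a single entry for `"011"` listing both locations. Run as part of a build, a non-empty result could fail the build before a duplicated number ever reaches a customer.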

There are many such simple things that can be done. Placing message/protocol version numbers inside of messages, for example, allows not only for backward compatibility but easy debuggability. Embedding build version numbers in content that is sent to a user or browser takes 5 minutes to code and is guaranteed to pay that time back 100x over the life of the product. Even simply paying just a bit more attention to the usefulness of log messages has tremendous impact. These very simple pieces of information are a tiny investment but have huge long-term time savings.

If one has more time, there are greater debug features that can provide even greater returns. One such feature we built was the ability to have debug information emailed to us from a remote device, alleviating the need to retrieve the device and connect a usb cable. The concept is terribly simple, but improves debuggability immensely.

Even if one has no access to the code whatsoever, there are techniques that can help. One thing we found useful is the ability to record and play back traffic into a software system. The impetus for this was a need to reproduce system crashes. We created a tool that takes a standard trace (in this case PCAP) from a given node, reconstructs the traffic in and out of the node, then replays it. We found that it not only was able to reproduce crashes, it became a viable platform for functional black box test cases and even allowed us to reverse engineer certain undocumented aspects of the system interface. Once we had the recording of what produced the crash, we could re-run that scenario to ensure that the crash never happened again.

This is all good in theory, but without accountability, it quickly decays and dies. The code with unique error numbers was eventually ported, copied and pasted with new features such that there were a dozen different places where one would encounter "Connection Error (006)". Version numbers of protocol messages were overlooked by a developer working in isolation and backwards compatibility broke. Automated functional test cases based on real traffic were not re-run when undocumented protocol updates were made to an unmanaged interface, and were later discarded as broken. We find these practices to be well worth the investment, but they are not free and they will not happen unless quality standards are in place. In most software organizations, debug features are not given the priority they deserve.

Be The Change You Want

One of our many slogans was "Be the Change You Want", meaning that if we wanted change, we had both the freedom and responsibility to create that change. If standards are lacking or incomplete, we have the freedom and responsibility to create or re-imagine them. As an example, let us consider technical debt management. There are many automated tools out there that measure technical debt by analyzing code complexity. We found such standards to be somewhat incomplete because they do not consider developers' knowledge about what is and is not technical debt. Furthermore, these automated methods do not allow one to prioritize technical debt and they may even be inconsistent or unavailable in other languages. Instead of abandoning standards altogether, we made our own.

We used comment lines right inside the code with standard formats that indicated the severity of the technical debt, the rough estimate of its effort, and the name of the individual that discovered it. Severity had 3 levels: TODO, FIXME, and DRAGON. TODO was something that would not affect the end user of the product but would be a nice thing to address for maintainability or performance. FIXME was something that had a chance of affecting the end user or at least compromise the running system in some way. DRAGON was a disaster waiting to happen. The effort estimates were SMALL, MEDIUM, and LARGE. SMALL being a day or less, MEDIUM being on the order of days to a week, and LARGE being more than a week. A sample comment line follows. As one can see, this would make technical debt very easy to measure with automated scripts:

//FIXME MEDIUM (werwath) - The items in this queue will be lost
//on system reset and result in unreported erroneous results.
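To illustrate how such comments lend themselves to automated measurement, here is a small Python sketch that scans source text for them. It is an assumption-laden example rather than our actual script: the regular expression is inferred from the single sample comment above, and the names `scan_debt` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the sample comment: //SEVERITY EFFORT (name) - description
DEBT = re.compile(
    r"//\s*(TODO|FIXME|DRAGON)\s+(SMALL|MEDIUM|LARGE)\s+\((\w+)\)\s*-\s*(.*)"
)


def scan_debt(text):
    """Return one record per technical debt comment found in `text`."""
    items = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        match = DEBT.search(line)
        if match:
            severity, effort, who, note = match.groups()
            items.append({
                "line": line_no,
                "severity": severity,
                "effort": effort,
                "who": who,
                "note": note.strip(),  # first line of the description only
            })
    return items


def summarize(items):
    """Count debt items by (severity, effort) so the totals can be charted or trended."""
    return Counter((item["severity"], item["effort"]) for item in items)
```

Run against the sample comment, `scan_debt` yields a single FIXME/MEDIUM record attributed to `werwath`; continuation lines that lack the severity keyword are ignored. The counts from `summarize` are the kind of figure that can be tracked over time or shown on a shared display.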

Once standards and measurements are in place it is imperative to make them obviously visible and regularly reported. If one is required to dig for information, it becomes a significant barrier. Technical debt, unit test coverage, documentation standards conformance, functional test results and performance metrics are ideally displayed in real time on a big monitor in our workspace where we all can see it. This makes quality a central part of what we are doing and keeps us accountable.

How We Work

Accountability to standards is so important for quality that we even had standards for how we worked. We wrote our own code manifesto[2]. These were the 4 practices we felt were most important to follow in order to achieve high quality. We knew we needed the accountability of having these explicit and public. Otherwise it would be too easy to overlook them.

One of the things this manifesto specified was minimum unit test coverage. We had at least 80% unit test coverage on our code before it was moved to our production branch. We created hooks that would actually prevent code from being checked in that failed unit tests or did not meet our coverage standard. We did this not necessarily to find bugs (though we certainly did). Good design and maintainable code (i.e. avoiding bugs in the first place) was our primary goal. Unit tests also gave us confidence to make big changes without feeling doomed that they have broken something. The energy of a team that is empowered and invested in making big improvements and features in a system differs immensely from one that is trapped by fragile code. Even if the unit tests never catch a single bug, the effects on design, maintainability and culture are not only well worth the investment, but essential to the long term health of the product AND the team.
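The decision such a hook makes is simple enough to sketch. The Python below is a hypothetical illustration, not our actual hook: it assumes the test runner and coverage tool have already produced a failure count and line counts, and the function names are invented for the example.

```python
MIN_COVERAGE_PERCENT = 80.0  # the threshold named in the text


def check_in_problems(tests_failed, lines_covered, lines_total):
    """Return the reasons to reject a check-in; an empty list means accept."""
    problems = []
    if tests_failed > 0:
        problems.append(f"{tests_failed} unit test(s) failed")
    # Assumption: a change with no measurable lines counts as fully covered.
    coverage = 100.0 * lines_covered / lines_total if lines_total else 100.0
    if coverage < MIN_COVERAGE_PERCENT:
        problems.append(
            f"coverage {coverage:.1f}% is below {MIN_COVERAGE_PERCENT:.0f}%"
        )
    return problems


def hook_exit_code(tests_failed, lines_covered, lines_total):
    """Exit status for a hook: a nonzero value blocks the check-in."""
    problems = check_in_problems(tests_failed, lines_covered, lines_total)
    for problem in problems:
        print("REJECTED:", problem)
    return 1 if problems else 0
```

Most version control systems treat a nonzero exit status from a pre-commit or pre-receive hook as a rejection, so a wrapper script would only need to pass the real numbers in and exit with the returned code.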

Unit testing led to an obsession with failure. We had a slogan "FAIL HARD, FAIL EARLY, FAIL OFTEN". The philosophy is that (ironically) seeing explicit failures is actually an indication of healthy software. We sought to never have a failure that we did not know about immediately. We planned on failure, and planned on failing big. We built and tested failover features. We even had a test case to pull the cord out of the wall on our test machine. When we failed, we provided as much useful information about the failure as possible for easy debugging. This required a certain level of humility. Our users saw more errors than maybe they were used to. Surprisingly, we found that instead of trusting us less, our users appeared to trust us more and stepped up their level of engagement. One of our users even started to provide us bug FIXES. Granted, this was an internal tool and some might not want to provide that level of information to the public. Even so, it's a good example of how a culture of quality comes about by embracing failure. That user ended up taking over and maintaining the codebase of the tool entirely while he was still an intern.

Our attention to failure extended beyond just code. Failures in the organization and process must also be reported. In the same way we made our manifesto public and explicit, we openly took note of barriers to maintainability, testability, and debuggability and considered them technical debt. Ideally, issues of culture and violations of company values must receive this level of accountability in order to be properly handled. It's bad to violate one of your own values. It's worse not to admit it. It's just like swallowing exceptions in code, but on a much higher level.

Investments in quality are easily preempted by more urgent issues. For this reason, we created what we called Maintenance Friday. Every other Friday our team would isolate ourselves completely from the rest of the organization and do nothing but tasks related to the usability, testability, and maintainability of our code: documentation, automation, unit tests, etc. No meetings, no fires, no bug fixes, no features. We came to consider this like an automatic payroll deduction into a 401(k) plan. We were investing in our future and it paid dividends. The 80% rule (that at least 80% of the cost of software is maintenance) means that this is a no-brainer investment. Even so, we needed the Maintenance Friday practice to hold us accountable, and to not fall prey to all the more urgent meetings and emergencies.

Documentation Tests and Dumb Users

With the practice of Maintenance Friday in place, we were able to make documentation a first class citizen, which to us meant as important as code. We had standard documentation templates for classes, methods, and packages. We wrote automation to ensure that everything was documented completely and to the standard we agreed upon. Whenever possible, documentation was checked in right alongside code in text formats. We wrote a cookbook on how to use our software and allowed others to contribute recipes. If we built a new feature, or addressed a new use case, we wrote a recipe so that the knowledge could be easily reused. We found that doing things in this manner allowed us to easily fold other people into the development of this system and eventually hand it over with little training.

For documentation to be a first class citizen, we had to test it. We developed a practice we called "Dumb User Testing" where someone who was completely unfamiliar with a procedure, tool, or documentation that we produced would graciously volunteer their time and we would throw this piece of product over the wall and watch them use it. The humbling experience that ensued showed us many things we thought were obvious were actually confusing, incomplete, or flat out wrong. If a user or a visitor of our documentation had a question about it, we immediately did an update so that the question would never be asked again - much like how one might fix a bug in software. There is a clear side benefit too. Someone who had never been exposed to what we were doing got exposure. They seemed to feel good about being able to give feedback and being held in such high regard. Energetically we all became more collaborative and more focused on the quality of user experience for those inside the organization. It was more important than any of us had previously thought. Where there was no previous standard for how to test a procedure or documentation, we now at least had something.

DUMB USER became kind of an honorific term. It moved us deeper and deeper into the realm of user experience. Eventually one click installers and two line procedures became the standard for how our internal users would use our tools. Few teams employ this level of accountability for documentation and procedures. It is not surprising that folks get so frustrated using their own internal information.

Dumb User Testing taught us what "Customer Obsession" really means. If a company keeps a very narrow view of "Customer" to only mean someone that buys the product, then those that sell the product, those that support the product, those that market the product, and those that develop the product are not considered. What do you think the quality of the product will be? The Japanese have a beautifully rich word for customer, which uses very honorific language and the word has extended meanings which also mean visitor and guest. This rich and collective view of customer is what I feel is missing in most software organizations. If you are a software developer, you have undoubtedly had the experience of visiting someone else's code and found yourself immediately confused and frustrated. Most software organizations reward developers for doing what they are told and doing it fast. Very few organizations reward developers that take the time and effort to make something that is easy to understand, maintain, measure, and use.

Agile++

More than any single methodology, the Principles of Agile Software development came closest to describing how we worked. We studied the principles and adopted a very minimal number of ceremonies, developed our own process, and changed it as we saw fit. The principles are listed here for reference:

  1. Customer satisfaction by rapid delivery of useful software
  2. Welcome changing requirements, even late in development
  3. Working software is delivered frequently (weeks rather than months)
  4. Close, daily cooperation between business people and developers
  5. Projects are built around motivated individuals, who should be trusted
  6. Face-to-face conversation is the best form of communication (co-location)
  7. Working software is the principal measure of progress
  8. Sustainable development, able to maintain a constant pace
  9. Continuous attention to technical excellence and good design
  10. Simplicity, the art of maximizing the amount of work not done, is essential
  11. Self-organizing teams
  12. Regular adaptation to changing circumstances

These principles are pretty much common sense. They do a decent job of describing how we thought software would be best developed. However, they seem to focus more on the development phase of a software product than on the maintenance phase. Because of this, we added two of our own:

Agile++ Principle 13: Automate.

Automation is a passion of ours. Without it, agile principles #2 and #12 are going to be nearly impossible. Without automation we do not have enough feedback to really hold ourselves accountable. The age of a QA professional that does manual testing for quality assurance is over. Build, unit test, functional test, deployment, system and UI test all must be automated as much as possible and QA professionals must be able to write code and must be automation enthusiasts. Modern job titles like QA developer and Automation Engineer reflect this. Exploratory testers are the only exception. Their job is to break things in ways no one else can even imagine. Everyone else in quality assurance must be well versed in automation. The management structure must be experienced as well, and at least know that developing test automation in and around software costs at least double the normal effort.

As with measurement, automation cannot be an afterthought. Software must be built so that it can be tested in an automated way. In the architecture, design, and coding phase of software, automation interfaces must be made available and automation must be used as the software is developed. Without this integrated, incremental approach, automation becomes prohibitively expensive, even impossible. The testing phase does not happen after development. The testing phase happens throughout development and is executed with automation.

One of the hardest things we had to automate is user/GUI interaction. The things we learned from these experiments eventually became the subject of a US patent application. In this system, everything a user sees and does had to be accessible via some API. Widgets like buttons, text boxes, actionable text, images, videos, transitions, even the visible screen itself, are decoupled from the events so that automation is possible. Things a user does like clicking, typing, speaking, swiping, etc. can be sent to these objects via the API in an automation context. This API exposes what objects are visible on the screen at any given time, locations of those objects, and when the screen is updated. The decoupled code structure required to do this is not obvious and does not come about naturally. Once in place, however, we were able to go a step further and developed a mechanism whereby we could do remote automation testing, without a USB cable or Bluetooth or even co-location. We could perform remote automated testing on a device halfway around the world. This would never have been possible had automation not been developed into the product every step of the way.
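To make the idea of decoupling widgets from events concrete, here is a deliberately tiny Python sketch. It is not the design from the patent application, only a toy under stated assumptions: the `Widget` and `Screen` classes and their methods are hypothetical, and only clicking is modeled.

```python
class Widget:
    """A screen object whose behavior is reachable without a real pointer or touch."""

    def __init__(self, name, x, y, on_click=None):
        self.name = name
        self.position = (x, y)
        self.visible = True
        self._on_click = on_click

    def click(self):
        # The same entry point serves a real input event and an automated one.
        if self._on_click is not None:
            self._on_click()


class Screen:
    """Registry that tells automation what is visible and lets it send events."""

    def __init__(self):
        self._widgets = {}

    def add(self, widget):
        self._widgets[widget.name] = widget

    def visible_widgets(self):
        """Return name -> position for every widget currently on screen."""
        return {w.name: w.position for w in self._widgets.values() if w.visible}

    def send_click(self, name):
        """Deliver a click to a widget by name, as a test script would."""
        widget = self._widgets[name]
        if not widget.visible:
            raise LookupError(f"widget {name!r} is not visible")
        widget.click()
```

A test script can ask `visible_widgets()` what is on screen, call `send_click("ok")`, and then assert on the resulting application state. Because nothing here depends on a physical input device, the same calls could in principle be carried over a network connection.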

A lot can be gained from just recognizing that automated GUI testing is hard. It is not necessarily a mistake to resign oneself to the fact that GUI testing is impractical to automate and will be done manually. In this case we still automate as much as possible. We had a motto "The best GUI is a CLI" (or, in the context of Web Applications, "The Best WebApp is RESTful"). Designing GUI tools in this way allows one to automate testing with the CLI so that the underlying functionality receives full automated testing, and all that needs to be manually tested is that the GUI properly wraps the CLI. It also allows GUI changes to be made with confidence that the underlying functionality will not be affected. By providing this decoupling we are able to automate much of the testing, even if we do no automated GUI testing at all.
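The layering behind "The best GUI is a CLI" might look like the following Python sketch. The example is hypothetical throughout: the `resize` function, its arguments, and the `on_resize_button_clicked` handler are invented to show the shape of the design, with `argparse` from the standard library providing the command line.

```python
import argparse


def resize(width, height, scale):
    """Core functionality: pure logic with no user interface attached."""
    return round(width * scale), round(height * scale)


def cli(argv):
    """Command line front end; this is the layer automated tests exercise."""
    parser = argparse.ArgumentParser(prog="resize")
    parser.add_argument("width", type=int)
    parser.add_argument("height", type=int)
    parser.add_argument("--scale", type=float, default=1.0)
    args = parser.parse_args(argv)
    new_width, new_height = resize(args.width, args.height, args.scale)
    return f"{new_width}x{new_height}"


def on_resize_button_clicked(form):
    """What a GUI handler would do: collect widget values, then delegate to the CLI."""
    return cli([str(form["width"]), str(form["height"]), "--scale", str(form["scale"])])
```

With this structure, `cli(["100", "50", "--scale", "2"])` can be tested automatically, and the only thing left to check by hand is that the GUI handler passes the right values through.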

Agile++ Principle 14: Fun

The opportunity to hire and lead a small team was a turning point for me personally, and probably the proudest, most joyful part of my career thus far. I was given a req for 2 entry level software developers. I first hired a friend of mine whom I had met years ago working on an open source project. Like me, he had no formal computer science background, but had a ton of passion for the craft and a long list of cautionary tales about diving into technology without full knowledge of what he was doing. This worked really well, so I gave instructions to our recruiter to find me someone else without a computer science degree who had some cool stuff up on GitHub in Python (a language we were just starting to learn). The two of us interviewed this third individual as if he was already on our team. We brought him a Python problem that we were stuck on and did some collaborative problem solving. The three of us had strengths and weaknesses that balanced each other perfectly. It felt much more like placing an ad in the paper for a band member than it did hiring a software developer. I don't claim that this is scalable or even right, but it did introduce me to a level of chemistry that I had not previously encountered in a software organization. Much later, I realized that this is not a unique phenomenon, even in software! The 4th core value of Atlassian is "Play, as a team", suggesting a level of fun and that the sum is greater than the parts. This was exactly the culture that was manifesting.

Feelings like fun, passion, and enjoyment, though subjective in one sense, seemed to be the best indicators that we were doing things right. If we weren't having fun, something was usually wrong. We were not frustrated by difficult problems; we were most frustrated by obstructions to how we worked. To us, the quality of the product and the satisfaction of the customer were far more important than any procedural or political convention. We felt most accountable to our users, collaborators, and each other. We tended to risk punishment rather than ask for permission when it came to doing what we thought was right. We tended to work collaboratively and had a high level of trust and respect for our collaborators. This fosters trust in others and tends to attract like-minded individuals who do what they feel is right despite the pressure to do otherwise.

This may sound a bit rogue, but we felt justified in our approach. We were fighting a culture that produced some awful code. We kept a list of Anti-Patterns scribbled on one of our whiteboards that we would continually and collaboratively update. It read something like this:

  1. Never write automated tests, if you have them simply remove the ones that fail.
  2. Develop Software Alone and Leave no trace that you wrote it.
  3. Write all documentation in Microsoft Word if at all.
  4. Use contractors extensively, ideally in a foreign country.
  5. Use only confusing or obvious comments, if at all
  6. Swallow all exceptions, Provide NO debug information.
  7. Force users to use your shitty GUI with no command line option.
  8. Maximize build and platform dependencies
  9. Avoid all modern source code control especially git.
  10. Use undocumented API calls whenever possible
  11. Use as many different style conventions as possible
  12. Use self modifying code, often
  13. Use exceptions (preferably our own classes) as flow control.
  14. Use Haskell and other languages that are hard to learn and maintain.

Though funny, these anti-patterns are an extremely effective way to identify problems. One is far more likely to recognize the anti-pattern than the best practice. We found it especially useful in training junior developers. It allowed us to laugh at ourselves, teach others, and hold ourselves accountable not to repeat the same mistakes.

Because we had fun, we liked being together, and Agile Principle #6 (Co-location) came naturally for us. The importance of co-location cannot be overstated. A developer spends a great deal of time with meetings and email. Inefficient communications take up a tremendous amount of time. Co-location is the best way we have found to reduce the number of emails and meetings. We used flexible seating so that collaborating developers could see each other's monitors. It left room for the product owner and customers, and on a given project the important communication paths were within earshot of each other. On our last project we relocated our whole team to where our customer worked. On the project before that we pushed the desks out of our room so some of us could sit at a long table family style. We changed our location to optimize our communication paths. Sometimes we would "jam", working collectively on one problem, often around one whiteboard or one computer. With people we could not co-locate with inside the office, we did so over lunch, beers, or Skype.

At first glance fun might have nothing to do with software quality. We began to see things differently during our retrospectives where team members gave feedback about what worked and what did not. Inevitably the things that worked well for the team and for the product were things that made working more fun. We came to accept fun as the most accurate and comprehensive word to assess if we were developing good quality and good culture.

Cultural Engineering

More and more companies are putting culture at the forefront of their image. Many have a list of company values publicly visible on their website. Whether values are handed down from some HR department or democratically identified by a team, they must translate to the daily work in order to be effective and really drive culture. Culture grows organically from the people that comprise it. It has a life of its own as the communication paths are exercised. The problem in many software organizations is that there is far too much isolation. Most interactions are governed by meetings and policies that are not manifested from the culture itself. Values then seem more like unattainable ideals than personal and pragmatic guides to working.

Most software maintenance issues originate with code that is written in isolation. Silos within the organization are clearly reflected in the code. Anyone who has been in the software industry for more than a couple of years has likely had a bad experience with contractor code. More isolation means less accountability. It also means less communication and less knowledge transfer. Practices like desk checks or peer reviews may be a small improvement but are still relatively isolated and don't have a whole lot of human interaction. We chose to do something different.

In the late 1990s, I encountered the Extreme Programming paradigm and the practice of Pair Programming. I personally found this practice very rewarding. I learned so much; not only about programming but very simple things like keyboard shortcuts, IDE tricks, website resources, and countless other useful tidbits I would have never found had I not been able to witness someone's personal process. Few will dispute that collaboration among developers is a worthy investment of time. Another one of our slogans, "1 + 1 = 3", meant that when good developers work together they can achieve more than the sum of what they could achieve alone. Pair programming, however, is not for everyone and organizations that mandate it quickly find themselves in trouble.

To deal with this reality, we created a practice called Collaborative Coding. Collaborative Coding has only one premise: that no line of code is ever written in isolation. Every line of code is written in a collaborative context. Desk checks and Peer Reviews don't count because these don't have the level of interaction or accountability that we knew we needed. Pair programming was used in some situations, but in many situations pair programming was overkill. We used other strategies. Often one person would write the code and another would write all the documentation. Other times one would write the code and another would write the unit tests. Sometimes one person would start it or do part and another would finish the job. Even in the cases where there is a super talented programmer that just goes off and spews out brilliant things, having another set of eyes to interpret the work and write documentation and/or unit tests made it far more understandable and maintainable to a wider audience. In all of these cases, we had more human interaction, deeper collaboration, and we had two sets of eyes on every line of code in the system. The quality of the code was far better than it would have been if it were done in isolation. There were opportunities to ask "Why did you do it this way?", "What does this mean?", etc. The collaborators learned more from the development process than they would have alone, and it was far more fun. It cost more effort up front, but paid huge dividends in the long term.

Quality of Information

The quality of information is something that is grossly overlooked in many software organizations. People are neither formally recognized nor held accountable for the quality of information they produce. In fact, employees are sometimes rewarded for being the only employee that knows such-and-such. There is no requirement, standard, or reward for such employees to make their knowledge easily accessible and usable by others. Such employees become a single point of failure within the organization. Beyond just individuals, entire groups can become silos where information is available to only a few inside the silo. I don't think this is malicious or even intended. Rather, I think it stems from a lack of standards and accountability for information quality. Like the way that the word customer is limited to mean only those that buy the product, the word quality only refers to the end product and does not address quality of information or anything else that brings the product into being. Rewards for quality mentorship, collaboration, and information are lacking if they exist at all.

Here too we have stumbled upon several measures that can help. As we already mentioned, Dumb User testing will benefit any document or procedure, even those outside of development. This is a great way to carry out agile principle #4 (Close, daily cooperation between business people and developers). Business people and developers can serve as wonderful dumb users for each other.

Code comments, notoriously lacking or vague, can benefit a lot from being treated as information products. We made it a point to document design decisions right in the code because this was the place most likely to have them read and updated as the product grows. We also used links (to Stack Overflow, for example) that detailed why things were done the way they were. In addition to our Code Comment Technical Debt Method, we put our questions right in the code like this:

//QUESTION (werwath) - Why did we not use standard java io libraries for this ?
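Tagged comments like this are easy to collect mechanically so that questions do not sit unanswered. The following is only a minimal sketch, not the tooling we used: the class name `QuestionScanner`, its method, and the exact tag pattern are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists QUESTION comments so they can be reviewed and answered. */
class QuestionScanner {
    // Assumed tag shape, based on the example above: //QUESTION (author) - text
    private static final Pattern QUESTION_TAG =
            Pattern.compile("//\\s*QUESTION\\s*\\((\\w+)\\)\\s*-\\s*(.*)");

    /** Returns one entry per tagged line, formatted as "line N [author]: text". */
    static List<String> findQuestions(List<String> sourceLines) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            Matcher m = QUESTION_TAG.matcher(sourceLines.get(i));
            if (m.find()) {
                // Line numbers are reported 1-based, the way editors show them.
                found.add(String.format("line %d [%s]: %s",
                        i + 1, m.group(1), m.group(2).trim()));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "int size = readAll(stream);",
                "//QUESTION (werwath) - Why did we not use standard java io libraries for this ?",
                "// an ordinary comment");
        for (String question : findQuestions(source)) {
            System.out.println(question);
        }
    }
}
```

A list like this could be reviewed in a retrospective, with each question routed to whoever owns the code it refers to.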

Information sources must be identified and owned. No ownership means no accountability. Most organizations have nothing in place to ensure that, if the author of information is asked a question or given a correction, the information is updated. Nothing is in place to ensure that the document is easy to find and search. In these situations, outdated and incomplete information is the norm if it can be located at all.

One of the most important information sources that require ownership is interfaces. Interfaces, by their very definition, are very important communication points to be understood. We have found that unmanaged interfaces quickly degrade as we accommodate new features. Ownership of important interfaces provides at least one individual who knows the interface completely. One individual is accountable for complete documentation and tests. The history of the interface is clearly documented with the nuances and design decisions for all to see. If the interface is accessed from more than one computer language, the interface exists in a more abstract representation where the code can be generated for all codebases that use it. If the interface is of a format that does not support documentation (e.g. JSON), a documentation standard is created. This is an ideal responsibility of a Product Owner if they are technical enough, or an architect if not. Most of the time, problems arise because interfaces are not created at all. Instead, direct method calls are made or protocol messages are added or changed that extend the functionality without proper documentation or review. There is a persistent risk of features developed using incomplete or incorrect information.

When we wrote code we used a practice we called "Design by Interface". The premise of the technique is to write interfaces first and refrain from just writing code and coupling it together. This gives another opportunity for thinking about what the system has to do. As the interface is written, this knowledge is recorded in the method comments. The interfaces themselves become design documentation. Various other classes will come to mind in this phase and they will get interfaces too. It forces cleaner code structure and lends itself to easier testing. As the design develops, code can easily be stubbed out. Unit Testing does not have to wait. It comes in right as code starts to happen (when it should!). As the product matures, this technique is equally good for refactoring.
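As an illustration of the idea, here is a minimal sketch of writing the interface first, documenting decisions in its method comments, and then stubbing it out so that testing can begin. The names `ClipStore`, `InMemoryClipStore`, and `ClipStoreDemo` are hypothetical and chosen only for this example; they do not come from any of our projects.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical interface, written before any implementation exists.
 * The method comments record design decisions and act as design documentation.
 */
interface ClipStore {
    /**
     * Saves a clip under a name. Saving an existing name replaces the old clip.
     *
     * @throws IllegalArgumentException if the name is null or empty
     */
    void save(String name, byte[] samples);

    /** Returns the number of distinct clip names currently stored. */
    int count();
}

/** Simple in-memory stub so callers and unit tests can be written right away. */
class InMemoryClipStore implements ClipStore {
    private final Map<String, byte[]> clips = new HashMap<>();

    @Override
    public void save(String name, byte[] samples) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        clips.put(name, samples);
    }

    @Override
    public int count() {
        return clips.size();
    }
}

/** Callers depend only on the interface, so a real implementation can replace the stub later. */
class ClipStoreDemo {
    public static void main(String[] args) {
        ClipStore store = new InMemoryClipStore();
        store.save("intro", new byte[] {1, 2, 3});
        store.save("intro", new byte[] {4, 5});
        System.out.println(store.count());
    }
}
```

Because the behavior is stated in the interface comments, a unit test can check the stub against those comments before any production implementation is written.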

As the design becomes usable software, we keep examples of how to use it in cookbooks and in code comments. Examples are one of the most important types of quality information. All developers know that a good example can save hours, if not days, of trial-and-error. Ideally these cookbooks are implemented in a way that provides users the means to correct, modify, and extend them. The documentation for PHP, for example, allowed the PHP community to comment on each class and method. Users added their own examples, questions, and corrections. Not only did this save people time, it created a positive culture around the language. Unless there is user-based feedback and collaboration, wikis and other online documentation quickly become disorganized and out of date. Only when an author is internally driven to maintain it does it remain useful. Wikis could benefit from anonymous feedback features that track the quality of the article. They could also benefit from the ability to subscribe to topic updates. If wikis worked more like social media they might better reflect the collaborative nature of software development and they might provide more accountability for the quality of the information. What if during our annual review, one could get a measure of the quantity, quality, and use of the information that we provided the organization? The barrier to this type of accountability is not a technological one. It's cultural - and it is pervasive! I was recently playing with Fiddler[4] and looking at their online cookbook. A method name in the cookbook was out of date with the latest version of the product. Had I been able to comment on this cookbook example, I could have saved the software community hours of time with 1 minute of effort.

Some of the highest quality information is that which does not need to be relearned. Whenever possible, we capture information in tools, automation and scripts that can be used by others. Information then does not need to be re-learned, it is transparently reused. Because organizations generally don't have good visibility let alone reward structures for saving other people's time, it is important that these information sources have built in usage metrics to justify their investment. By legitimizing the importance of saving other people's time with high quality information, we can avoid rediscovering, relearning, and correcting the same poorly managed information. This is not only more efficient, it is more fun as it frees up time for real innovation.
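One way to picture built-in usage metrics is a small counter that every shared tool calls when it runs. This is a sketch under stated assumptions, not a description of our tooling: the class `UsageMeter`, the tool name, and the minutes-saved figure are all hypothetical, and a real tool would persist or report the counts somewhere visible.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-process counter; a real tool would persist or report these numbers. */
class UsageMeter {
    private final Map<String, Integer> runsByTool = new TreeMap<>();

    /** Call once each time a shared tool or script is run. */
    void record(String toolName) {
        runsByTool.merge(toolName, 1, Integer::sum);
    }

    /** Number of recorded runs for a tool, or 0 if it was never run. */
    int runs(String toolName) {
        return runsByTool.getOrDefault(toolName, 0);
    }

    /** Rough estimate only: runs multiplied by an assumed saving per run. */
    int estimatedMinutesSaved(String toolName, int assumedMinutesPerRun) {
        return runs(toolName) * assumedMinutesPerRun;
    }

    public static void main(String[] args) {
        UsageMeter meter = new UsageMeter();
        meter.record("setup-dev-env");
        meter.record("setup-dev-env");
        // The 30 minutes per run is an assumed figure for illustration.
        System.out.println(meter.estimatedMinutesSaved("setup-dev-env", 30));
    }
}
```

Even a crude estimate like this gives the author of a tool something concrete to point to when the investment needs to be justified.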

Reimagining the Org Chart

The best in class culture and best in class quality may defy a traditional corporate hierarchy. Traditional hierarchies have limited communication paths that are mostly top down. Hierarchy makes good sense in many contexts where there is a clear chain of command to maintain and the layers above contain the best knowledge of how and what to do. It also makes sense if the underlings are largely interchangeable. The military, for example, works well as a traditional reporting hierarchy. Software organizations contain much more knowledge at lower levels of the hierarchy. This would suggest that more decision making power must be delegated to the lower levels and that the communication paths must be more than just top-down. A reporting hierarchy where only your manager does your review seems to undermine the collaboration and interdependence that agile suggests. Too often, one's boss does not have the best line of sight on how a given individual is performing. Peers are in a much better position to address the value of one's collaboration, one's mentorship, and what one uniquely brings to the team. Simply put, the accountability structure in a traditional corporate hierarchy is not the best fit for a collaborative software organization. If orders from a limited viewpoint are simply handed down a reporting hierarchy without measurable standards in place, we are almost guaranteed to sidestep quality.

Agile Principles 11 and 5 (that teams self-organize and that projects are built around motivated individuals who should be trusted) sit in direct opposition to the traditional corporate hierarchy where these decisions are made by management. It is not surprising that companies that practice agile software development often move toward a flatter organization. A system that self-organizes will more accurately place individuals who are both technically competent and have good relationship skills in places of power. Furthermore, it will place individuals in positions more directly aligned with their unique talents and naturally weed out those that don't fit. It so happens that Valve, one of the most successful software companies, has done away completely with hierarchy[3]. The organization of teams, even the product ideas and schedules, are everyone's responsibility. Even huge companies like Amazon organize themselves as multiple, interdependent startups with flat structures.

Interns are the bottom of the traditional hierarchy. There is a tendency to see them as interchangeable, replaceable, and less than normal employees. Yet interns are of great importance when it comes to culture. Interns give us a chance to teach what we know and ideally give us more accountability to walk our talk. Their fresh perspective can bring energy and insight to teams. Talented interns who work closely with developers and are leveraged to their full capacity pay huge dividends. This is good for the intern, good for the team, and good for company culture. Nathan Marz, the creator of Apache Storm, publicly credits his intern with one of the most critical pieces of the project's success (no surprise: this was an automation testing feature)[5]. Internships are mutually beneficial relationships that can have profound impact on quality and culture. Organizations that recognize this will have a much stronger culture.

It is hard to reward those that exemplify good quality and good culture. It goes largely unnoticed in organizations that focus only on feature development. Possibly we might address this with job titles and job descriptions that draw focus to quality and culture. With job descriptions we might introduce accountability for good knowledge management, good mentorship, good culture, and any other of these very important things that are not deliverable code. Organizations have sometimes even invented new positions to address this. We have mentioned "QA Developer" and "Automation Developer" already. Some organizations take this a level higher with titles such as Information Architect, Culture Guru, Quality Champion, even "Thinker in Residence". These job titles indicate the need to step back from the daily grind to assess the direction of quality and culture.

Conclusion

Caring about what we do, how we do it, and how it affects others is good for culture and is good for quality. We cannot possibly improve one without improving the other. If people feel empowered to do what is right, if they are empathetic and considerate of their users inside and outside the organization, and if they are passionate and collaborative in their approach to software, quality naturally becomes a measurable standard. They will feel accountable for delivering quality not only in the product itself but also in the information, communication, and culture around that product. Quality then reinforces the culture and both continually grow and improve together.

For this to happen, however, we must remember the following:

  1. Quality must be built into software at every single step. The ability to measure and automate software is a development feature that begins with architecture and grows along with a product its whole life.

  2. The definition of quality cannot be limited to the functionality of the product. It must be extended to include the quality of information, the quality of culture, and the maintainability of the code.

  3. The definition of customer must also be extended to include collaborators, users, and anyone else that is affected by the product both during and after development.

  4. Accountability is the root of both good culture and good quality. Accountability is enhanced by ownership and collaboration. Accountability is limited by isolation and possibly by traditional corporate reporting hierarchies.

We arrived at these things by experiment, by experience, and sometimes by accident, ever reminding us that the spirit of curiosity and the pursuit of the right questions rather than the right answers will lead us in any great endeavor.

References

[1] Beck, Kent; et al. (2001). "Principles behind the Agile Manifesto". Agile Alliance. Archived from the original on 14 June 2010. Retrieved 6 June 2010.

[2] A-Team Code Manifesto https://drive.google.com/file/d/0B49uMrhVfmL9Rm5LNl9JRUZvR3c/view

[3] Valve Handbook for New Employees http://www.valvesoftware.com/company/Valve_Handbook_LowRes.pdf

[4] Fiddler web debugging proxy http://www.telerik.com/fiddler

[5] History of Apache Storm and Lessons Learned http://nathanmarz.com/blog/history-of-apache-storm-and-lessons-learned.html

James Werwath October 2014
