Soft Skills 101
The term "soft skills" seems to imply that these skills are somehow less important than technical skills. In reality, soft skills are often specifically sought-after by hiring managers. These skills are also important for operations people seeking to advance to senior engineering positions.
As much as technical people would like to believe that operations is a purely technical profession, it is really about serving people. Operations, as the title indicates, is about making things work for people. Operations people design, build, and maintain services for people to use. It is all in a day's work for an operations professional to translate, educate, inform, reason, persuade, and generally act as a liaison between technology and the people who use it.
Soft skills at the 101 level encompass communication skills, time management, project management, and a basic understanding of DevOps from an operations perspective. Soft skills at the 201 level lead into general business skills including positioning, budgeting and the financial process, using metrics effectively, demonstrating impact, risk management, managing customer preference, and thinking strategically.
Audience analysis is the first step to effective communication. Perform a basic audience analysis by answering some simple questions:
- Is the audience technical or non-technical?
- How much do they know about the topic?
- How much do they care about the topic?
- What is the intended message for this audience?
Before sending one email or setting up a meeting, answer these questions. People are inundated with communication from email, voicemail, Twitter, social media, internal web/wikis, text, IM, and meeting requests.
Internal customers could be people who use general computing and call a helpdesk for support, the organization's software developers or engineering team, senior management, or researchers, students, faculty, or others. The type of customers depends upon the type of organization and the industry.
Working with internal customers could be as simple as being the "Ops" side of a DevOps team or it could mean supporting a wide range of technologies used by customers at varying levels of technical understanding.
When operations focuses on a specific project or works with a specific team, such as engineering or software development, communication is generally specific to that work. It can take on the form of meetings, video conferencing, chat sessions, and emails between team members. A communications culture tends to develop in these scenarios as team members figure out the best way to coordinate with one another.
When operations focuses on more general IT support, communication becomes more complicated for the operations people. Factors such as audience analysis play a larger role in successful communication with customers. Operations faces a potentially wide array of communications scenarios:
- Announcing outages to general staff in a large organization
- Announcing upcoming maintenance to a set of staff impacted by a service outage
- Broadcasting a technical idea to a non-technical audience
- Contacting internal customers impacted by a security issue or vulnerability (e.g. Run this update. Install this patch.)
- Asking middle management across the organization to weigh in on a potential service change
- Offering a seminar, workshop, or class to assist customers with a new or modified service for a general audience
- Offering a seminar, workshop, or class to assist customers with a new or modified service for a non-technical audience
- Presenting the service catalog in a question-and-answer session
- Meeting with senior management to address an operations problem, budget shortfall, request more resources, or propose an architectural change
- Meeting with customers to address service problems
- Meeting with specific groups of customers to collect requirements for a special project
- Requesting feedback from customers either individually or as a group
- Meeting with customers who are engaged in the subject matter
- Meeting with customers who are disengaged or in attendance because it is mandatory
This list spans a wide range of communication modes, communication types, customers, and outcomes.
- communication modes: email, meetings, larger presentations, surveys
- communication types: persuasive communication, instructional, informational
- diverse customer backgrounds: management, administrative staff, technical staff, IT-savvy, non-IT-savvy, interested, disinterested
- desired outcomes: management decision, increased understanding, increased abilities, increased awareness
Communicating with external customers can offer additional challenges. If the external customers are customers of the organization, there is the possibility that dealings with them could result in a complaint to upper management.
Reduce complaints by considering how to communicate with these external customers. When communicating about a service outage, consider timing of the outage, duration, and impact of the outage on these external customers. Are most external customers in the same time zone? If so, then the maintenance window could be outside of traditional working hours. If external customers include international people in varying timezones, the outage window may be the one that impacts core customers the least.
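The reasoning about time zones can be sketched in a few lines of Python. This is a minimal illustration, not a scheduling tool: the function names are hypothetical, customers are represented only by whole-hour UTC offsets, working hours are assumed to be 09:00 to 17:00 local time, and daylight saving time is ignored.

```python
def busy_customers(start_hour_utc, duration_hours, utc_offsets, workday=(9, 17)):
    """Count customers whose local working hours overlap the outage window.

    utc_offsets holds one whole-hour UTC offset per customer (an assumption
    made for this sketch; real time zones are messier).
    """
    count = 0
    for offset in utc_offsets:
        for h in range(duration_hours):
            local_hour = (start_hour_utc + h + offset) % 24
            if workday[0] <= local_hour < workday[1]:
                count += 1
                break  # this customer is already counted as impacted
    return count


def least_disruptive_start(duration_hours, utc_offsets):
    """Return the UTC start hour (0-23) that impacts the fewest customers.

    Ties are resolved in favor of the earliest hour.
    """
    return min(
        range(24),
        key=lambda start: busy_customers(start, duration_hours, utc_offsets),
    )


if __name__ == "__main__":
    # Hypothetical customer base: mostly UTC-5, a few at UTC+1 and UTC+8.
    offsets = [-5] * 10 + [1] * 3 + [8] * 2
    start = least_disruptive_start(2, offsets)
    print(f"Start at {start:02d}:00 UTC, impacting "
          f"{busy_customers(start, 2, offsets)} customers during work hours")
```

In practice the count would be weighted toward core customers, as described above, rather than treating every customer equally.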
Communicate the timing of service outages with management. It is best if management knows that external customers are about to be impacted by operations. Include a justification for the maintenance: why it is necessary, why this outage window, why this duration, what plan B is if the outage goes beyond the outage window, and how external customers will be notified. Not all of these pieces of information may be necessary if operations already supports external customers on a regular basis.
There is significant breadth and depth required to effectively communicate.
Let's start by covering the two most common modes of communication for operations: email and meetings.
Communicating via email
Before sending that email to the entire organization, ask who really needs to know the information. People already get a lot of email; for most it is information overload. How many customers already complain about too many emails? Don't get filtered. Make communication count.
Here are some best practices when using email to communicate:
- Shorter is better.
- Make the subject descriptive (e.g. "www.co.com outage, May 10 - 6-8 pm")
- Put the most important information at the top of the message (e.g. deadline, action, outage dates). People generally skim the first few lines to determine if the information pertains to them. Starting with a lengthy background risks alienating people before they read the important part of the message.
- State the audience at the top of the email (e.g. "All Macintosh Users") to let them know the message is directed at them.
- Consider including a link to an internal site with a lengthier writeup if needed.
- Limit the recipient list to only those people who need or want the information (management, administrative customers, developers, people impacted). Build a list if necessary to avoid spamming the entire organization.
Sometimes email is the best way to communicate and sometimes not. Decide when to use email and when to communicate another way.
Consider email appropriate in some situations:
- Attempting to reach a large audience
- The message or action is simple
- The message needs to reach them now
- Need to document the distribution of the information.
- Following up on a previous conversation, request, or action
Consider email less effective in other situations:
- Conversing back-and-forth with people to define or understand a complex issue
- Creating something new
- Drawing it on a whiteboard would provide better enlightenment
- There is a potential for much confusion or questions about the issue
- Asking for a management decision on a technical issue from non-technical management
- Trying to teach people
Sometimes email can be used in combination with other methods:
- After a meeting, send an email to the attendees to summarize action items or decisions. This can be an important tool to remind management of a decision made months earlier.
- Announce the time and location of a seminar or training class.
- Share status of an action taken as the result of a discussion or meeting.
Some common effective uses of email include the following:
- Notification of outages
- Warn of IT security threats (e.g. raise awareness of increased phishing attacks)
- Document a decision made by management in a meeting.
- Document the outcome of actions taken
- Provide status on previous assignments
- Announce training, seminars, and presentations by operations
- Provide customers with a link to access a new or modified service
The dreaded meeting
If customers think they get too much email, some of them also think they attend too many meetings. Some people, especially managers, have corporate calendars that resemble a Tetris game. Coordinating an effective and productive meeting follows a simple formula.
Have a purpose. Need a decision? Need a decision now? Need to inform? Need to persuade?
Be prepared! Consider audience and be prepared to answer questions relevant to their interest in the topic. Some of this is covered in more depth at the Soft Skills 201 level.
Communicate at the right level. Leave out technical jargon if meeting with a non-technical audience. Consider simplified explanations, diagrams, and framing the content to address concerns that the audience cares about. Operations is the translator of technical information when meeting with a non-technical audience. Take that role seriously.
Set a duration. Decide how much time is needed to present the topic and answer questions. Make it as short as possible. Some organizations default all meetings to one hour.
Consider what the audience gets out of the meeting. Should the audience increase their knowledge or understanding on the topic? Maybe they have no interest in the topic but are the final decision maker due to funding levels, type of money, policy, role within the organization, or other factors.
Stick to the agenda. Do not let the audience take the meeting off course. In a 1:1 meeting, the audience might ask for IT support for an unrelated problem. Agree to put someone on the problem after the meeting, then return to the scheduled topic. In a larger meeting, audiences can go off on tangents into related or even unrelated areas. Be prepared to steer the meeting back on topic.
Summarize. Summarize the outcome in the last few minutes of the meeting. It can be good to send an email to summarize decisions made in the meeting in order to document the outcome.
Sometimes attendees are mandated to attend meetings:
- Committees where members are selected by the organization to represent a subset of people. Committees are often too large and unproductive. The saying "languishing in committee" describes this cultural phenomenon.
- Management meetings where all members of a management team are required to meet at regular intervals to review topics that may or may not be relevant to everyone in the room.
- Training where all employees of an organization are required to complete a minimum set of hours on a particular topic.
The operations person tasked with leading one of these types of meetings may find a less than enthusiastic audience. Apply the best practices above and attempt to make these meetings productive. Even without being the chairperson, sometimes keeping a meeting on topic and looking for areas to be productive can reduce inefficiencies.
Alternative meeting styles
Meetings do not always require scheduling a conference room for an hour or more and everyone arriving with a laptop or a legal pad. Consider stand up meetings or even short 10-minute slots on a manager's calendar to provide a quick status update or respond to a question that is best answered in person.
Special cases for operations
There are some special communication challenges that operations engineers face.
Communicating planned and unplanned outages
Managing maintenance windows in the organization involves more than choosing a date and time that works for operations.
Consider working around important events within the organization. It takes extra planning and outreach to learn about these events, but it is one way operations demonstrates that it is savvy to the organization's needs. Wouldn't it be good to know if the organization is about to roll out the next version of a product, perform a year-end close-out, host a big conference, or stage a demo to an important external stakeholder? There are no extra points for doing this, but operations avoids losing respect within the organization for being unaware of the organization's core business.
For outages that may impact a large percentage of the customers or a critical service, it is a good practice to notify the organization more than a week in advance. This serves a dual purpose: it alerts people who might be out of the office the week before the actual outage and it provides lead time to reschedule in case someone responds with a critical activity that would conflict with the outage. Send a reminder the day before or the day of the outage for customers who missed the first message.
To send a followup email, simply forward the original email with a short note at the top reminding people of the time and services impacted.
Example: Planned outage notification
All file cluster users,
Save your work before 7:00 pm Friday, January 10th for a planned outage of the file cluster.
The file cluster will be taken off-line for scheduled maintenance at 7:00 pm Friday, January 10th. We expect the outage to last until 10:00 pm.
Notify operations immediately if this interferes with time-critical work. [provide a way to notify operations]
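The email practices described earlier (state the audience first, use a descriptive subject, put the key facts at the top) can be sketched as a small template function. This is only an illustration: the function name `outage_notice`, the year 2025, and the address ops@example.com are assumptions and do not come from the example above.

```python
from datetime import datetime


def outage_notice(audience, service, start, end, contact):
    """Build a (subject, body) pair for a planned outage announcement."""
    day = start.strftime("%A, %B %d")
    # %I is zero-padded ("07:00 PM"); strip the leading zero for readability.
    begins = start.strftime("%I:%M %p").lstrip("0")
    ends = end.strftime("%I:%M %p").lstrip("0")

    # Descriptive subject: what, when, and for how long.
    subject = f"{service} outage, {day}, {begins} - {ends}"

    # Audience first, then the required action, then the details.
    body = "\n".join([
        f"{audience},",
        f"Save your work before {begins} {day} for a planned outage of the {service}.",
        f"We expect the outage to last until {ends}.",
        "Notify operations immediately if this interferes with "
        f"time-critical work: {contact}",
    ])
    return subject, body


if __name__ == "__main__":
    subject, body = outage_notice(
        audience="All file cluster users",
        service="file cluster",
        start=datetime(2025, 1, 10, 19, 0),  # hypothetical year
        end=datetime(2025, 1, 10, 22, 0),
        contact="ops@example.com",  # hypothetical contact address
    )
    print(subject)
    print(body)
```

Generating the reminder from the same function keeps the follow-up message consistent with the original announcement.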
Fielding customer complaints
In the world of operations, customer complaints are a given. Operations can't please everyone all the time. Every operations person has dealt with unhappy customers so it is good to develop strong people skills.
It is important to face customer complaints, not avoid them. Occasionally we have a customer who is a chronic complainer and the operations staff dive under their desks when that person walks in the office. A complaint should be treated as an opportunity to hear a customer's perception of services. Complaints can be turned into opportunities for improvement and can be a path to creating a lasting relationship with customers.
People are often at their worst when reporting a complaint; emotions are high due to lost data, a service outage, or frustration trying to use technology. Now is not the time for operations to get emotional or defensive about the work. Instead of reacting, follow these steps to adeptly manage customer unhappiness and maybe increase customer respect for operations as a whole.
- Listen without judgment
- Rephrase the concern to confirm understanding
- Agree to investigate if it isn't something resolvable now
- Leave the customer with the assurance that someone will get back to him/her with a solution or feedback.
- Get back to the customer even if it is to say:
  - It was a one-off problem and here is why
  - We found a problem internally and it is now resolved
  - We are improving our processes to reduce the likelihood of it happening again
  - Or an explanation that simply provides feedback to the customer
- And don't forget to thank the customer for taking the time to provide feedback
The reason to close the feedback loop is to show the customer that operations did something as a result of the complaint. The customer will know that someone in operations was concerned enough to investigate and potentially resolve the root cause of the complaint. It could have been inconsistencies in operations' internal procedures or a skills gap. That's a bonus for operations and the customer should know that the communication had a positive impact.
Try these techniques with chronic complainers. Sometimes all they want is to be heard. Bring in IT operations management if someone is repeatedly impacting operations with complaints or becomes abusive. This advice also stands if operations feels that the above techniques are not working. Escalation to the next person in the management chain is a valid procedural step in any of these instances.
Time management is a critical skill for the operations professional. Customer service requests and trouble tickets compete with project work, infrastructure maintenance, and enhancements. How does one person prioritize it all and get it accomplished?
- Tom Limoncelli's book Time Management for System Administrators
- Tom Limoncelli's Time Management Wiki
Project management is a necessary skill for any mid-level operations person. Start with small projects and work the way up to larger ones.
Be aware that project customers, or stakeholders, will often not know what they truly want from a project or they ask for the moon. Review the project management triangle (good, cheap, fast: pick two).
Henry Ford is credited with saying about his customers "If I had asked customers what they wanted, they would have said faster horses." Whether or not he said it, it still captures the essence of requirements gathering for operations projects. The operations professional is the technology expert. The stakeholders know they want a certain output or service. They may not know what that looks like or how to achieve it. The challenge is to extract requirements from the stakeholders then realize that these may not be the real or complete requirements.
Enter project management. Project management should help to frame the scope, resources, goals, and outcomes for the project. Let's look at two different project management methodologies as they apply to operations.
Waterfall is a sequential form of project management that was adapted from other industries for the software development world. In waterfall, think of the phases of a project as a cascading waterfall. Each phase must be completed before moving on to the next phase. The entirety of the project is scoped from beginning to end, including milestones and final deliverables.
Technologies change and requirements change. Scoping a large project over a long period of time, with what are commonly incomplete requirements or faulty assumptions by stakeholders, leads operations down a path of delivering an incomplete or inaccurate solution at the end. Waterfall breaks down in practice because it requires a promise of delivery that may be several years out.
Also, by requiring each phase of a project to complete before moving on to the next phase, bugs and issues are often not discovered until late in the project. This causes delays and sometimes large amounts of refactoring or re-architecting to go back and resolve these issues.
Detractors of the waterfall method point to its rigidity and lack of testing during the development phase. One of the issues in operations and development work is that stakeholders may not have a solid grasp of requirements until they see a working prototype, or iterations of working prototypes, during the implementation of the product. It is common for stakeholders in a project not to know what technology can deliver until they see it. Many operations teams are moving to Agile methods for several reasons, one being that Agile development allows stakeholders to see working bits of the product before the end and to modify requirements before it's too late.
Agile is a project management methodology. Agile started in 2001 when a group of software developers created the Agile Manifesto. The manifesto states four values and is accompanied by 12 supporting principles. Agile is seen most often in the software development world but it has crept into operations because of the obvious benefits over waterfall. Common implementations of Agile include Scrum, Kanban, and the hybrid Scrumban, which was created to meet more operational needs. The idea behind Agile is continuous release or delivery of a product. Instead of creating one big outcome at the end of a project, Agile allows a team to release a partially completed project for stakeholder review and requirements tweaking. Another big benefit of Agile methodologies is the discovery of problems early in the product development cycle, when refactoring can be done immediately, before the end product is set in a particular architectural direction that would make it costly to change.
Some documented benefits of agile include the following:
- Reduced process overhead
- Improved team and stakeholder communication and collaboration
- Errors and bugs are fixed in development instead of waiting till the product is "complete" to address them.
- Stakeholders see the product as it is shaped and have the ability to adjust requirements during development
- Project teams are empowered
- Can easily be combined with DevOps methodology to improve effectiveness of development-into-operations
- If done well, can increase work output of teams (increased velocity)
- Everyone on the project can easily see where the project stands (e.g. Scrum board or Kanban wall)
One thing to remember when implementing an Agile solution: adapt it as needed. Each of the following has its own simple framework, but organizations can use some or all of the implementation and even combine Agile methods to achieve success.
Scrum is the more prescriptive of the included methods. Scrum is recognizable by Scrum boards, user stories, timeboxed sprints, cross-functional teams, Scrum Master and Product Owner roles, the burndown chart used for tracking project status, and the Scrum meetings: daily stand-ups and retrospectives.
Some of the limiting factors of Scrum for operational teams include timeboxing and tracking the burndown velocity of the team.
Scrum board - An electronic or physical board that is used to track project status, actions that are in progress, upcoming work, and completed work. A basic Scrum board will have three columns: Todo, In Progress, and Done. Items in "Todo" are the upcoming work, and items in "In Progress" are currently being worked during this sprint. "Done" is fairly self-explanatory. Assignments can be tracked by sticky note on a whiteboard or via an electronic Scrum board. The Scrum board also has rows. These are referred to as swimlanes. Rows can be labeled with project names and it is common to have the very first swimlane titled "unplanned work" for operations tasks that fall on the team.
Electronic Scrum board - Electronic Scrum board software can be great if the team is geographically distributed. All members of the team can see and update the board from remote locations. The downside of electronic versions is getting the team to keep the application open and updated. Burndown can also be computed automatically making it easier for management to see progress.
Physical Scrum board - Often a whiteboard with a grid made of electrical tape. The swimlanes and tasks are marked by sticky notes. The team names can be post-it flags or some other marker. The downsides to a physical board include manual tracking of burndown, stickies falling off the board onto the floor (hint: Buy the Post-It super sticky notes or use tape or magnets), and lastly distributed teams cannot see the board easily. The upside to a physical board is visibility. The board can be placed in a prominent location where the operations staff can see it every day. This makes for easy daily stand-ups. It also allows members of the team to walk up to the board and have conversations with other members of the team about the work in progress.
Sprint - A sprint is a duration of time defined by the team when the work will be done between Scrum meetings. Work is chunked into pieces small enough to fit within the sprint window. A sprint window might be a week, two weeks, four weeks, or whatever length of time seems to fit the team. During the sprint, operations staff focus on the work agreed upon at the beginning of the sprint. Organizations can define how unplanned work will be dealt with during a sprint. Sometimes it is helpful to be able to tell a customer that we can prioritize that project request in two weeks at our next sprint meeting instead of feeling like operations has to drop everything for a last minute request. Sprints are somewhat rigid and can break down with operations because the work doesn't neatly fit within a timeboxed window. The team will also provide time estimates for each task.
Daily Standup - This is a short daily meeting with the team at the Scrum board (virtual or physical). The person in the Scrum master role leads the daily stand-up by asking each team member a few questions:
- What are you working on?
- Are there any impediments?
- Do you need anything to be successful?
Each member of the operations team now knows what is expected of him/her for the day. Balance the expected work output with other team efforts such as trouble tickets and outside projects.
Burndown - The burndown compares estimates of time with the actual time spent working on a project's tasks. The resulting chart will show a project approaching 0 as the level of effort needed to complete the project winds down. Teams get better at estimating with experience. Burndown can also demonstrate if a project is taking longer than planned or is ahead of schedule. Building a burndown chart can involve a spreadsheet or graphing application. It is common to build formulas in Excel that automatically update a pivot chart showing the project tracking. Some burndown charts are very complex and others are simple. The organization has to decide how fancy to get with this tool.
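The arithmetic behind a simple burndown chart can be sketched without a spreadsheet. This is a minimal illustration under stated assumptions: effort is measured in estimated hours, the team records how many estimated hours of work it completed each day, and the function names `burndown` and `ideal_line` are hypothetical.

```python
def burndown(total_estimate_hours, hours_completed_per_day):
    """Return the remaining estimated effort after each day, starting at day 0."""
    remaining = [total_estimate_hours]
    for done in hours_completed_per_day:
        # Remaining effort never drops below zero.
        remaining.append(max(remaining[-1] - done, 0))
    return remaining


def ideal_line(total_estimate_hours, days):
    """Straight line from the full estimate down to zero over the sprint."""
    return [total_estimate_hours * (days - d) / days for d in range(days + 1)]


if __name__ == "__main__":
    actual = burndown(40, [8, 6, 10, 16])
    ideal = ideal_line(40, 4)
    for day, (a, i) in enumerate(zip(actual, ideal)):
        status = "behind" if a > i else "on or ahead of schedule"
        print(f"day {day}: {a} hours remaining (ideal {i:.0f}), {status}")
```

Plotting the two lists against each other gives the familiar chart: when the actual line sits above the ideal line the project is running behind, and when it sits below, the project is ahead of schedule.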
User stories - In Agile software development, user stories can be feature requests, bugs, or modules the team plans to code for a product release. In operations, user stories can be small or large projects. Larger projects are usually broken down into smaller, more easily digestible pieces; otherwise a project can park in a swimlane for an inordinately long time, bringing down team morale and potentially impacting productivity. Teams should see positive outcomes and accomplishments across the swimlanes.
Cross-functional teams - In a development environment, a cross-functional team could include developers, testers, management, and operations. The purpose is to introduce DevOps to software development by including roles that have a stake in the project at different levels. In operations, a cross-functional team could include people from systems administration, networking, security, and management.
Kanban is a much less prescriptive Agile implementation. Kanban can be recognized by a similar task board to Scrum but often there are more columns. Kanban's strength is the work in progress (WIP) limit. Kanban doesn't require roles, timeboxing, or burndown tracking like Scrum.
Because there are no timeboxed sprints, work continuously moves across the swimlanes on the Kanban board. Daily stand-ups are critical in Kanban because there isn't a touchpoint at the end of a sprint to review completed work effort. Kanban boards can have several additional columns to assist in the management of this continuous work flow. An example Kanban board may have "Coming soon," "Review," "Available," "In progress," "Acceptance," and "Completed." The purpose of these additional columns is to enable teams to pull work into the "In progress" column as they finish other work. The "In progress" column and other columns will have what is called a WIP limit. There are a few schools of thought regarding WIP limits. Each organization must experiment with the WIP limit until a sweet spot is found for operations.
In Kanban for operations, the columns can be varied across teams or organizations. These columns are only provided as an example. The organization needs to find the Kanban workflow that works best for the team. There are several good resources that explain various ways of configuring a Kanban board. Sticking with the current example, let's review the columns in an example Kanban board to understand their purpose.
- Coming soon - these are tasks, projects, or user requests. They are un-prioritized and may be big or small.
- Review - These are tasks that are prioritized by management or the team during the daily stand-up. They are put "in the hopper" as work items that should be reviewed and possibly broken into smaller pieces if they are too large. The downside of too large is similar to Scrum when the user stories are too broad. If an in-progress item sits in the active queue too long, it takes up a WIP slot and can make it difficult to understand if the team is making progress on that item.
- Available - This item has been reviewed, broken into a reasonably sized task and approved by management or the team to be pulled into the active column at the next opportunity.
- In progress - Similar to Scrum, these are the tasks being worked actively by the team.
- Acceptance - When someone on the team considers a task complete, s/he moves it to this column. Acceptance means it is discussed at the next daily stand-up and possibly accepted as done by the team. Acceptance can also mean stakeholder acceptance. This could also be a testing phase for something that is rolling toward production. If something idles too long in this column, it will hold up other work because of the WIP limit placed on this column.
- Completed - These are tasks that are accepted as completed and put into production.
- Impediments - Some boards might include a small section of a column to identify impediments. Impediments are tasks that cannot begin because of outside forces. Usually management intervention is required to resolve the impediment. By separating these tasks on the board, the team sends a message to management that this work requires outside intervention to move forward.
Work in Progress (WIP) limits - WIP limits define the maximum number of tasks that can appear in a given column on the Kanban board. The two schools of thought that seem to pervade are:
- 2n-1 - where n = the number of people on the operations team. The reason for this is to enable team members to work together on some tasks but to give enough tasks so team members stay busy.
- n-1 - where n = the number of people on the operations team. The reason for this is to encourage collaboration on the team and not to overwhelm them with too many tasks. If someone on the team completes all of their work, that person should be able to pull the next available task from the "Available" column.
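The two formulas above can be compared side by side for a given team size. This is a minimal sketch: the function name `wip_limit` is hypothetical, and flooring the n-1 rule at 1 (so that a one-person team can still work on something) is an assumption added here, not part of either school of thought.

```python
def wip_limit(team_size, school="2n-1"):
    """Return the WIP limit for a team under one of the two rules of thumb."""
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    if school == "2n-1":
        return 2 * team_size - 1
    if school == "n-1":
        # Assumption for this sketch: never return a limit below 1.
        return max(team_size - 1, 1)
    raise ValueError(f"unknown school of thought: {school!r}")


if __name__ == "__main__":
    for n in (3, 5, 8):
        print(f"team of {n}: 2n-1 gives {wip_limit(n, '2n-1')}, "
              f"n-1 gives {wip_limit(n, 'n-1')}")
```

For a five-person team the two rules give limits of 9 and 4, which shows how far apart the starting points are and why experimentation is needed.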
What is the risk of having a WIP limit too low or too high? A high WIP limit might mean the team is taking on too much at one time. Each member of the team may get overwhelmed with the amount of work. Consider these are reviewed daily in the stand-up meetings and team members can pull new work from the "Available" column when current work moves to "Acceptance." High WIP limits mean that team members are less likely to work together on projects or tasks because each person has his/her own work to complete. A WIP limit that is too low could create a bottleneck, disallowing a team member from pulling new work into the "In Progress" queue because other people on the team have hit the WIP limit with their own work. The WIP limit is a sweet spot that the organization needs to discover through experimentation.
Whenever there is a bottleneck in Kanban, the team can refocus its efforts on the item stuck in the flow in order to unblock progress across the board. WIP limits force this to occur: if the "Acceptance" column has a WIP limit of 3 and already holds 3 items waiting for acceptance, no further tasks can move into that column. It is a way to keep work moving across the board.
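The blocking behavior described above can be sketched as a small in-memory board. This is an illustration only: the class names, the task names, and the "In Progress" limit of 4 are assumptions; only the "Acceptance" limit of 3 comes from the text.

```python
class WipLimitExceeded(Exception):
    """Raised when a task would push a column past its WIP limit."""


class KanbanBoard:
    def __init__(self, limits):
        # `limits` maps column name -> maximum tasks (None means no limit).
        self.limits = dict(limits)
        self.columns = {name: [] for name in self.limits}

    def has_room(self, column):
        limit = self.limits[column]
        return limit is None or len(self.columns[column]) < limit

    def add(self, task, column):
        if not self.has_room(column):
            raise WipLimitExceeded(
                f"{column} is at its WIP limit of {self.limits[column]}"
            )
        self.columns[column].append(task)

    def move(self, task, source, target):
        if task not in self.columns[source]:
            raise ValueError(f"{task!r} is not in {source}")
        # Check the target first so a blocked task stays where it is.
        if not self.has_room(target):
            raise WipLimitExceeded(
                f"{target} is at its WIP limit of {self.limits[target]}"
            )
        self.columns[source].remove(task)
        self.columns[target].append(task)


# Hypothetical board; the "In Progress" limit is an assumed value.
board = KanbanBoard(
    {"Available": None, "In Progress": 4, "Acceptance": 3, "Completed": None}
)
for task in ("patch web servers", "rotate certs", "renew DNS"):
    board.add(task, "Acceptance")
board.add("upgrade database", "In Progress")

try:
    board.move("upgrade database", "In Progress", "Acceptance")
except WipLimitExceeded as exc:
    print(f"Blocked: {exc}")

# Accepting one waiting item frees a slot, and the blocked task can move.
board.move("rotate certs", "Acceptance", "Completed")
board.move("upgrade database", "In Progress", "Acceptance")
```

The blocked move is the signal for the team to swarm on the "Acceptance" column; once one item is accepted, work flows again.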
Scrumban is a hybrid of Scrum and Kanban. Operations teams seem to embrace Kanban or Scrumban because of the flexibility of daily re-prioritizing and the WIP limits that keep the team from getting overwhelmed.
A Scrumban implementation takes elements from both Scrum and Kanban. For example, operations might decide to define some roles, keep the review and retrospectives, and hold the daily stand-up from Scrum, while enforcing WIP limits and implementing continuous workflow from Kanban.
The Tao of DevOps
What is DevOps
DevOps seeks to include the IT operations team as an important stakeholder in the development process. Instead of developers solely coding to meet the stakeholder's requirements on time and on budget, they are also held responsible for how easily the software deploys, how few bugs turn up in production, and how well it runs. Developers also focus on providing software that operations can easily support once it is in production. Instead of bringing operations into the conversation after the product is complete, the DevOps methodology includes operations in the development stream.
Development:
- Roll a product out to meet customer specifications within a certain timeframe
- Continuous delivery means recurring change as bugs are fixed and features added
- Fast changing environments are needed to support dev
- Agility is key
Operations:
- Supporting the product for customers
- Keeping a handle on IT security
- Planning for deployment to production state
- Changes are slow/incremental
- Consistent environments are needed to support operations
- Stability is key
Why DevOps is important
In organizations where DevOps is not a priority, development is often viewed as customer-focused by trying to solve problems and deliver solutions while operations is viewed as a barrier to development's mission. By combining these two often competing mindsets, both sides can be satisfied. The result is a product that potentially has fewer bugs, higher availability, increased security, and a process for improved development over the life of the product that works for both the developers and the operations people.
It is also possible to implement a DevOps methodology in a pure operations team. In this scenario the operations team is also development, because its members stand up a webserver, provision virtual machines, or code configuration management systems. In this case, operations needs to wear both the development and operations hats by meeting customer needs while also addressing the security and supportability of the solution.
What isn't DevOps
A person cannot be a DevOp. You don't hire a DevOp.
The importance of Documentation
What to document
- Runbooks? SOP? (cparedes: might be worthwhile even though we want to automate SOPs away as much as possible - what should we check at 2 AM? What do folks typically do in this situation if automation fails?)
- Architecture and design (cparedes: also maybe talk about why we choose that design - what problems did we try to solve? Why is this a good solution?)
How to manage documentation
Documentation through Diagrams
Anecdote
At one job we had a single network engineer. He had a habit of walking up to a whiteboard to explain something to the systems folks. He would proceed to draw what we considered a hyper-complex-looking diagram showing the current or future state of some networking solution. We could never keep his configurations in our heads like he did, and he wasn't always around when we had a question. One of us figured out that we should take a picture of the whiteboard after he finished drawing. These pictures went into the operations wiki. They weren't beautiful, but they saved us time when we could easily refer back to the pictures we took.
Diagrams don't always have to be professional, Visio-quality work to count as documentation.