In [1]:
import nltk
import re
from nltk.corpus import stopwords

In [2]:
#programming fundamental 
sentence = """
Programming Fundamentals

Our modern digital creations are complex: a key job of programmers is to express that complexity as simply as possible.
"The best programs are written so that computing machines can perform them quickly and so that human beings can understand them clearly. A programmer is ideally an essayist who works with traditional aesthetic and literary forms as well as mathematical concepts, to communicate the way that an algorithm works and to convince a reader that the results will be correct." -- Donald Knuth

 Lesson 1: Some programming fundamentals
Humans have been programming computers for seven or so decades now. DevOps is a hot new topic in that practice, but the current popularity of the term does not mean we should ignore the many findings on how to write the best programs possible that came before DevOps should be ignored!

DRY 
This stands for Don't Repeat Yourself! It means that any part of your system that might ever need to change should have a single place where you can make the change. Don't copy blocks of code to wherever you need them in your program: write a function and call it from each of those places. Don't define your data tables in your database, and also in your code: find a way (like the Django models.py file) to define your data one place and use that definition to generate both the database and the code that uses the DB.
No magic constants. 
This is a special case of DRY. It is very tempting, when coding your NYU scheduling app, to write code assuming there are two (major) semesters per year. This will be fine... until NYU adopts a tri-mester system. Instead, define a constant NUM_SEMS = 2. You might get away with writing day_of_week = day mod 7, since that number probably will never change. But you really ought to write hour_of_day = hour mod CLOCK_PERIOD, since both 12 and 24 hour timekeeping methods exist.
Make functions do one job. 
Funcitons that perform a single job are simpler to understand, easier to change or eliminate, and render the overall system more comprehensible. For instance, if the county writes a tax program with a function called calc_taxes, it would be natural to eliminate that function if the job is later passed off to a microservice running on the cloud. But, if the coders also happened to include the code to clear tax liens (county claims against the property for unpaid taxes) in the same function... Oops! No one who ever had a tax lien can sell their property, because the lien never gets cleared.
Keep functions short. 
This is related to the previous principle, but focuses on the size of the one job that should be done. A function named handle_yearly_taxes() is doing one job, but probably way to big a job. It would make more sense to have create_tax_roll(), calculate_taxes(), send_bills(), record_payments(), and perhaps more.
Format and indent properly. 
Different languages have different conventions for how to name variables (camelCase, with_underscores, MixedCase, and so on), how to space operators, where to put braces, and so on. You should follow those conventions, unless there is a strong reason not to. Consistent indentation is especially important: it allows a reader of your code to easily line up blocks of control. Irregular indentation is a significant source of bugs, as people modifying the code will make mistakes, for example, about which else goes with which if.
Comment judiciously. 
Code should contain some comments, especially things like docstrings for classes that can be extracted to produce a guide to the system, and comments explaining what particularly tricky or unusual bits of code do. But commenting is no substitute for writing clear, readable code in the first place! The best explanation of what your code does is, if you write it correctly, your code itself. Remember that we could, and once did, write code just as a sequence of 1s and 0s. And all higher-level languages need to be translated into such code in the end. So why bother with C, Java, or Python? These languages exist for humans, not for computers: they make it easier for us to understand and reason about what a program will do. The upshot: you should look at your code as being every bit as much about communicating to humans as about directing a computer.
Go for the golden mean in naming. 
Sometimes, names of functions and variables can be way too cryptic: there are examples in the widely used CLRS Algorithm book where I have found as many as six single-letter variable names used at once. On the other hand, naming a function something like take_input_of_employee_w2_and_calculate_employee_tax_rate() is absurdly long: please remember, other programmers will have to type your function names in order to call your functions! Such immense names also make it extremely difficult to stay within guidelines like PEP 8's dictum of "no lines longer than 79 characters." A more reasonable middle ground might be something like calc_tax_rate(), where an employee's W2 might be a parameter for the function.
Test, test, test! 
Write an automated test to go with every program or new feature you write. Test as completely by hand as you can: don't just test that your code fetches the data from the DB correctly: test that it still works properly if there is no data in the DB, or, indeed, if there is no DB! ("Properly" here could mean "Display an informative error message instead of crashing.")
 Lesson 2: Python coding standards
For this lesson, please read the Python coding standard, PEP 8. It is a very good example of what a coding standard is like, and most of the guidelines can be applied in other languages. Our JavaScript team is choosing a standard at present, and soon we will link to that here as well.

 Other Readings
Programming Best Practices
Following Coding Standards using Flake8

"""



In [3]:
def count_words(texts):
    stop_words = set(stopwords.words('english')) 
    words = nltk.word_tokenize(texts)
    # Remove single-character tokens (mostly punctuation)
    words = [word for word in words if len(word) > 1]
    # Remove numbers
    words = [word for word in words if not word.isdigit()]
    # Lowercase all words (default_stopwords are lowercase too)
    words = [word.lower() for word in words]
    words = [word for word in words if word not in stop_words]
    fdist = nltk.FreqDist(words)
    # Output top 50 words
    result = []
    for word,_ in fdist.most_common(len(words)):
        result.append(word)
    return  result
    


In [4]:
sentence_lst = nltk.sent_tokenize(sentence)
for i in range(0,len(sentence_lst)):
    sentence_lst[i] = re.sub('\n', '', sentence_lst[i])
data = count_words(sentence)
print(sentence_lst)

['Programming FundamentalsOur modern digital creations are complex: a key job of programmers is to express that complexity as simply as possible.', '"The best programs are written so that computing machines can perform them quickly and so that human beings can understand them clearly.', 'A programmer is ideally an essayist who works with traditional aesthetic and literary forms as well as mathematical concepts, to communicate the way that an algorithm works and to convince a reader that the results will be correct."', '-- Donald Knuth Lesson 1: Some programming fundamentalsHumans have been programming computers for seven or so decades now.', 'DevOps is a hot new topic in that practice, but the current popularity of the term does not mean we should ignore the many findings on how to write the best programs possible that came before DevOps should be ignored!', "DRY This stands for Don't Repeat Yourself!", 'It means that any part of your system that might ever need to change should have a

In [5]:
def get_all_sentence_contain_word(word,sentence_lst,url,page,posturl):
    lst = []
    for i in sentence_lst:
        if word in i:
            temp_dic = {
                "title":word,
                "info":{
                    "page":page,
                    "url":url+"/"+posturl,
                    "sentence":i
                }
            }
            lst.append(temp_dic)
            
    return lst

In [6]:
url ="http://127.0.0.1:8000/devops"

In [7]:
#Programming Fundentmal
page="Programming Fundentmal"
posturl = "basic"
words = """
Programming Fundamentals

Our modern digital creations are complex: a key job of programmers is to express that complexity as simply as possible.
"The best programs are written so that computing machines can perform them quickly and so that human beings can understand them clearly. A programmer is ideally an essayist who works with traditional aesthetic and literary forms as well as mathematical concepts, to communicate the way that an algorithm works and to convince a reader that the results will be correct." -- Donald Knuth

 Lesson 1: Some programming fundamentals
Humans have been programming computers for seven or so decades now. DevOps is a hot new topic in that practice, but the current popularity of the term does not mean we should ignore the many findings on how to write the best programs possible that came before DevOps should be ignored!

DRY 
This stands for Don't Repeat Yourself! It means that any part of your system that might ever need to change should have a single place where you can make the change. Don't copy blocks of code to wherever you need them in your program: write a function and call it from each of those places. Don't define your data tables in your database, and also in your code: find a way (like the Django models.py file) to define your data one place and use that definition to generate both the database and the code that uses the DB.
No magic constants. 
This is a special case of DRY. It is very tempting, when coding your NYU scheduling app, to write code assuming there are two (major) semesters per year. This will be fine... until NYU adopts a tri-mester system. Instead, define a constant NUM_SEMS = 2. You might get away with writing day_of_week = day mod 7, since that number probably will never change. But you really ought to write hour_of_day = hour mod CLOCK_PERIOD, since both 12 and 24 hour timekeeping methods exist.
Make functions do one job. 
Funcitons that perform a single job are simpler to understand, easier to change or eliminate, and render the overall system more comprehensible. For instance, if the county writes a tax program with a function called calc_taxes, it would be natural to eliminate that function if the job is later passed off to a microservice running on the cloud. But, if the coders also happened to include the code to clear tax liens (county claims against the property for unpaid taxes) in the same function... Oops! No one who ever had a tax lien can sell their property, because the lien never gets cleared.
Keep functions short. 
This is related to the previous principle, but focuses on the size of the one job that should be done. A function named handle_yearly_taxes() is doing one job, but probably way to big a job. It would make more sense to have create_tax_roll(), calculate_taxes(), send_bills(), record_payments(), and perhaps more.
Format and indent properly. 
Different languages have different conventions for how to name variables (camelCase, with_underscores, MixedCase, and so on), how to space operators, where to put braces, and so on. You should follow those conventions, unless there is a strong reason not to. Consistent indentation is especially important: it allows a reader of your code to easily line up blocks of control. Irregular indentation is a significant source of bugs, as people modifying the code will make mistakes, for example, about which else goes with which if.
Comment judiciously. 
Code should contain some comments, especially things like docstrings for classes that can be extracted to produce a guide to the system, and comments explaining what particularly tricky or unusual bits of code do. But commenting is no substitute for writing clear, readable code in the first place! The best explanation of what your code does is, if you write it correctly, your code itself. Remember that we could, and once did, write code just as a sequence of 1s and 0s. And all higher-level languages need to be translated into such code in the end. So why bother with C, Java, or Python? These languages exist for humans, not for computers: they make it easier for us to understand and reason about what a program will do. The upshot: you should look at your code as being every bit as much about communicating to humans as about directing a computer.
Go for the golden mean in naming. 
Sometimes, names of functions and variables can be way too cryptic: there are examples in the widely used CLRS Algorithm book where I have found as many as six single-letter variable names used at once. On the other hand, naming a function something like take_input_of_employee_w2_and_calculate_employee_tax_rate() is absurdly long: please remember, other programmers will have to type your function names in order to call your functions! Such immense names also make it extremely difficult to stay within guidelines like PEP 8's dictum of "no lines longer than 79 characters." A more reasonable middle ground might be something like calc_tax_rate(), where an employee's W2 might be a parameter for the function.
Test, test, test! 
Write an automated test to go with every program or new feature you write. Test as completely by hand as you can: don't just test that your code fetches the data from the DB correctly: test that it still works properly if there is no data in the DB, or, indeed, if there is no DB! ("Properly" here could mean "Display an informative error message instead of crashing.")
 Lesson 2: Python coding standards
For this lesson, please read the Python coding standard, PEP 8. It is a very good example of what a coding standard is like, and most of the guidelines can be applied in other languages. Our JavaScript team is choosing a standard at present, and soon we will link to that here as well.

 Other Readings
Programming Best Practices
Following Coding Standards using Flake8

"""
lst = []
def generate_index_json(words,url,page,posturl):
    sentence_lst = nltk.sent_tokenize(words.decode("utf-8"))
    for i in range(0,len(sentence_lst)):
        sentence_lst[i] = re.sub('\n', '', sentence_lst[i])
        
    data = count_words(sentence)
    
    js = []
    for i in data:
        lst = get_all_sentence_contain_word(i,sentence_lst,url,page,posturl)
        for j in lst:
            js.append(j)
    return js
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

    

In [8]:
#The DevOps ways of work
page="the devOps works of work"
posturl = "work"
words = """
"Working software is the primary measure of progress." -- The Agile Manifesto

 Lesson 1: DevOps as Software Engineering

Lecture 1: Some DevOps History
As software engineers, our aim should be to deliver programs to our users that perform the tasks they need that software to perform. The whole field of software engineering has centered around how to achieve that goal.

One attempt to achieve it was the waterfall model of software development.


The waterfall model
But the waterfall model produced many failures: death marches, instances of the ninety-ninety rule, small matter of programming issues, planning fallacies, and many more such problems. And we can offer a sound diagnosis of why so many waterfall projects failed: the model assumed that all knowledge about a project could be captured by a small group of expert "analysts" right at the start of the project, and the job of the rest of the people involved was just to follow the instructions of those experts, and not to think for themselves. In short, the waterfall model was a species of Taylorism.

Of course, some software projects using the waterfall model succeeded. But it turns out that most successful software projects, over many decades, instead followed the "UNIX philosophy", where development projects sought to quickly achieve an MVP, and then proceeded to improve that initial product with incremental changes.

The recognition of this fact lead, over time, to the formulation of the Lean and Agile development methods. These methods focused on rapidly creating and releasing to users incremental improvements in a software. (Please take the time to read the Wikipedia pages linked to above on Lean and Agile: you are responsible for knowing that material.) The advantages of small improvements to a software product, released frequently, include:

Value is delivered to the users quickly, rather than waiting for "release 2.0" for the users to get a hold of the features they need.
Small batches can more easily be tested and verified as working properly.
If a small release contains an error, it is easier to roll it back than it is to roll back a major release.
The "feedback loop" between users and programmers is shortened, allowing programmers to learn about users' needs more rapidly, and respond more quickly to them.
Greater programmer satisfaction, as programmers can regularly see the value of their work to their users.
As development teams adopted Lean and Agile methods, they often became capable of producing production-ready software on a daily basis, or even more frequently. (For instance, Amazon releases software into production once every 11 seconds, on average.) But this created a problem: managing software in production environments was traditionally the job of operations, not of the developers. And operations viewed its job as slowing the pace of releases, because releases meant bugs, crashes, and other problems operations had to handle.

How could this gulf between development and operations be narrowed? A few forward-thinking operations people saw a way to reconcile the aims of development and operations: operations itself had to become Lean and Agile! In particular, rather than hand-provisioning operations infrastructure, operations team members had to themselves become coders, and apply the full toolkit of Lean and Agile methods to operations: incremental changes, automated testing, source code control, automated builds, and so on. One of those operations people, Patrick Debois, named a conference DevOpsDays, and from that seed, the term "DevOps" spread.

As a result, the term "DevOps" is an umbrella beneath which reside a large number of methods and tools. We can better understand why each of the areas we will studying falls under "DevOps" when we comprehend how they contribute to the DevOps goal of delivering useful and correct software to users as rapidly and as often as possible. Let's look at how the main areas of DevOps contribute to this goal:

Testers cannot test successfully unless they are part of the production process from day one: thus, continuous testing.
Operations cannot successfully deploy constantly evolving products unless deployment itself becomes a software product capable of swiftly evolving: thus, software as infrastructure.
The "business" stakeholders in the product can't ensure it is meeting business needs unless they are continually engaged: thus continual interaction between the engineers and the "business people."
Why is "business" in scare quotes above? 
"We are not developing software. We are doing something larger and software is just part of the solution." -- Tom Poppendieck
How new versions of a piece of software impact the end users cannot be determined without continual feedback from those users, thus:
Incremental development, which means developers work on small batches and can easily change course based on feedback;
Continuous deployment, allowing users to comment on the work done in those small batches; and
Continuous monitoring, so that problems using the product become known right away.
At this point in our course, you should read the Wikipedia page on DevOps.

 Lesson 2: My DevOps Story
Or how, being weaned on nutritious Bell Labs commonsense, and having dined on Oakeshott's critique of rationalism, I was ready to digest the DevOps feast.

I began my career as a software engineer working on MS-DOS computers in the mid-1980s. But before the decade was over, I had begun working on UNIX platforms. As I came to appreciate the elegance of the UNIX programming environment, I sought out the writings of the people who had been instrumental in its creation. This was the tremendous group of software engineers assembled in Bell Labs from the late 1960s through the early 1980s, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Alfred Aho, Peter Weinberger, Bjarne Stroustrup, Jon Bentley, and P.J. Plauger. From them I learned a pragmatic, incremental style of developing software. Rather than pursuing some grand, abstract vision, I learned to deliver minimal but working software to users as regularly as possible, and to learn from user feedback what further features needed to be added.


UNIX and its discontents
By the 1990s, I had learned enough that I felt ready to make my own contributions to this literature, writing on the benefits of symbolic interfaces, pioneering intranets as a way of delivering in-house software, touting infrastructure as code, describing generic client-server interfaces, stressing the basics of OOP, and more.

Subsequently, in a decade away from software engineering, I conducted extensive studies on rationalism. These studies prepared me to understand development methods like the waterfall model as examples of "dreaming of systems so perfect that no one will need to be good"* -- these practitioners were practicing "rationalism in software engineering." Or, to put it in my colleague Nassim Taleb's terms, these systems were designed to be robust, impervious to change. (Consider the desire to "lock down" the software's feature set in the requirements phase of the waterfall model.) What was needed instead were systems that were antifragile, and actually thrived on change.


Nassim Taleb talks antifragility
Thus, when I returned to software engineering and came across the DevOps movement, my experience was not one of meeting someone brand new, but of re-acquainting myself with an old friend who was sporting a new look, and had learned some fancy new tricks since we had last met. In particular, the DevOps approach does not try to ensure design is 100% complete before any code is written: an impossible, rationalist dream. The DevOps approach does not attempt to ensure all software leaving a developer's hands is bug-free: again, an impossible, rationalist dream. And it does not attempt to ensure that all released software is 100% crash-free and secure. Instead, the DevOps approach recognizes that humans are fallible and errors will occur, and so stresses fast recovery from errors and a low-blame culture that emphasizes learning from errors rather than punishing transgressors. So, for instance, rather than blaming the programmer who released a piece of buggy code into production, the DevOps approach asks, "What test haven't we written that would have caught that bug before release?"

* T.S. Eliot, "Choruses from the Rock"

 Lesson 3: DevOps and the Division of Labor

My talk at StackOverflow
Leonard E. Read began his famous essay “I, Pencil” (here) by noting:

I, Pencil, simple though I appear to be, merit your wonder and awe, a claim I shall attempt to prove… Simple? Yet, not a single person on the face of this earth knows how to make me. This sounds fantastic, doesn't it? Especially when it is realized that there are about one and one-half billion of my kind produced in the U.S.A. each year.
Read goes on to list just a few of the many, many people who contribute to the making of a “simple” pencil: loggers, miners, makers of chain saws, hemp growers, the manufacturers of railroads and railroad cars, millworkers, producers of precision assembly-line machines, the harvesters of canola seed, farmers growing castor beans, and more.

What Read is praising in his essay are the benefits of the division of labor, the economic process through which a human community, by dividing up tasks and “assigning” various members to specialize in each task, can greatly increase its output. (I put “assigning” in scare quotes because, in a market economy, for the most part people are not literally assigned to tasks, but instead choose their roles in the division of labor based upon their talents and the prevailing compensation for each possible role they could fill.) The benefits of the division of labor were, of course, recognized at least as far back as Plato and Xenophon. As Plato put it in The Republic, “Well then, how will our state supply these (physical) needs? It will need a farmer, a builder, and a weaver, and also, I think, a shoemaker and one or two others to provide for our bodily needs. So that the minimum state would consist of four or five men.” And Adam Smith famously expounded upon those benefits in The Wealth of Nations, writing “The greatest improvement in the productive powers of labour, and the greater part of the skill, dexterity, and judgment with which it is anywhere directed, or applied, seem to have been the effects of the division of labour” (The Wealth of Nations). Smith goes on to describe the production of pins, a task at which a single person, not specialized at the task, “could scarce, perhaps, with his utmost industry, make one pin in a day, and certainly could not make twenty” (same source as above). But, when ten workers took on specialized tasks, with the help of specialized machinery, though “they were very poor, and therefore but indifferently accommodated with the necessary machinery, they could, when they exerted themselves, make among them about twelve pounds of pins in a day” (same source), with the result that each worker produced several thousand times the number of pins per day as would have been possible without the division of labor.

In the early 20th century, this method of increasing productivity was pushed to its limits. Tasks were broken down to the extent that workers with minimal skills could be assigned simple, highly repetitive actions, and perform them with almost no knowledge of what anyone else on the assembly line was up to. Although this led to higher output of standardized products, the disadvantages of extending the division of labor to this extent were not overlooked. Karl Marx noted that the extensive division of labor alienated the worker from the product he was producing: someone who spends all day tightening a particular lug nut may be little able to associate what they do with “making a car.” But even Adam Smith, who, as we have seen, praised the effects of the division of labor, commented:

In the progress of the division of labour, the employment of the far greater part of those who live by labour, that is, of the great body of people, comes to be confined to a few very simple operations, frequently to one or two. But the understandings of the greater part of men are necessarily formed by their ordinary employments. The man whose whole life is spent in performing a few simple operations, of which the effects are perhaps always the same, or very nearly the same, has no occasion to exert his understanding or to exercise his invention in finding out expedients for removing difficulties which never occur. He naturally loses, therefore, the habit of such exertion, and generally becomes as stupid and ignorant as it is possible to become for a human creature to become. (http://www.econlib.org/library/Smith/smWN20.html#V.1.178)
Smith is pointing out a general problem with the extensive division of labor, but there is a much more particular problem, which only came to prominence in the recent days of increasing automation and increasing demand for innovative and customized products: the sort of mindless, production-line division of tasks common in mid-20th-century factories created a workforce downright discouraged from thinking about how their work fit into the production process as a whole, or how alterations in other parts they did not directly make might affect their own task. Such a holistic view was only supposed to be required of the engineers who designed new products or who designed the factory processes that would produce those new products. As in a planned, socialist economy, all knowledge about the product and the production process would be concentrated at the top of a pyramid of work, and those below the peak were to just mindlessly follow the orders of those knowledge commissars.

A major problem with this approach is that as products become more complicated and the pace of innovation increases, no single mind, or even a small group of minds, is capable of grasping all of the interconnections between the different parts of those complex products, and thus, cannot foresee how an innovation supposedly concerning only one part will actually have ripple effects on many other apparently separate production tasks. This fact was realized quite early at Toyota, and led to the invention of the Toyota Production System, the forerunner of Lean Software Development. As Mary and Tom Poppendieck note in Implementing Lean Software Development:

Toyota’s real innovation is its ability to harness the intellect of “ordinary” employees. Successful lean initiatives must be based first and foremost on a deep respect for every person in the company, especially the “ordinary” people who make the product or pound out the code. (pp. 124-125)
As important as these ideas were in factory production, their importance is even greater in the world of software development, where production is always production of a novel product: otherwise, one would simply buy or rent an existing software product, which is almost always a lower cost venture than “rolling your own.”

In such an environment, it is simply not possible to assign the “workers” (programmers) a simple, repetitive task, and expect them to achieve decent results without at least some understanding of the overall product design, as well as an understanding of how their particular “part” integrates with the other parts of the product as a whole. In such a situation, worker obedience no longer “works.” A manager cannot tell a software engineer working on a product of even moderate complexity to just follow the manager’s orders: the programmer can bring production to a halt simply by asking, “OK, what line of code should I write next?”

But further: no knowledge worker producing an even moderately complex product can do his work properly without his understanding of his part in the production process evolving in continuous interaction with the evolving understanding of all of the other knowledge workers producing the product: one such worker gaining a better understanding of the nature of her component simply must convey that understanding to all other workers upon whom the changes in her component have an impact, and that set of workers typically encompasses almost everyone working on the product. As the Disciplined Agile Framework has it:

Enterprise awareness is one of the key principles behind the Disciplined Agile (DA) framework. The observation is that DA teams work within your organization’s enterprise ecosystem, as do all other teams. There are often existing systems currently in production and minimally your solution shouldn’t impact them. Better yet your solution will hopefully leverage existing functionality and data available in production. You will often have other teams working in parallel to your team, and you may wish to take advantage of a portion of what they’re doing and vice versa. Your organization may be working towards business or technical visions which your team should contribute to. A governance strategy exists which hopefully enhances what your team is doing. (http://www.disciplinedagiledelivery.com/enterpriseawareness/)
The various aspects of Agile / Lean / DevOps production follow from recognizing these realities concerning knowledge workers cooperating to create innovative products. Programmers cannot do their jobs in isolation: thus, the practice of continuous integration, which quickly exposes mutual misunderstandings of how one person’s work impacts that of others. Testers cannot test successfully, without introducing large delays in deployment, unless they are part of the production process from day one: thus, continuous testing, guaranteeing that product flaws are exposed and fixed at the earliest moment possible. Operations cannot successfully deploy constantly evolving products unless deployment itself becomes a software product capable of evolving as fast as the products of the developers: thus, software as infrastructure. The “business” stakeholders in the product cannot ensure the product is really meeting business needs unless they are continually engaged in the development process: thus continual interaction between the engineers and the “business people.” How new versions of a piece of software impact the end users cannot be determined without continual feedback from those users: thus, incremental development, which means developer work on small batches and can easily change course; continuous deployment, allowing end users to comment on the work done in those small batches; and continuous monitoring, so that any problems using the product become known almost as soon as they occur.

Given the above realities, a rigid division of labor hinders businesses from responding agilely to changing market conditions while producing software. If workers are confined to narrow silos based on job title, the interaction between the many components of a complex piece of software must be defined from the top down, and this restriction will result in a very limited capacity to deviate from an initially defined pattern of interaction. In Disciplined Agile, it is noted:

IT departments are complex adaptive organizations.  What we mean by that is that the actions of one team will affect the actions of another team, and so on and so on.  For example, the way that your agile delivery team works will have an effect on, and be affected by, any other team that you interact with.  If you’re working with your operations teams, perhaps as part of your overall DevOps strategy, then each of those teams will need to adapt the way they work to collaborate effectively with one another.  Each team will hopefully learn from the other and improve the way that they work. (Disciplined Agile)
Let us consider a realistic change that might hit a project mid-stream, and just a few of the areas it might impact.

I was once developing an option-trading package for a team of traders. At first, we were only getting quotes for options from a single exchange. The traders realized that they wanted instead to see the best bid and ask from every exchange, which meant we needed to get quotes from four exchanges, not one. This might seem to be a specification change with a narrow scope: just add three more price feeds to the application. Who would this concern beyond the programmer who would be adding the feature?

Well, for one, it would concern the team supporting the price server: this was going to quadruple the load this application would place on it. It was also going to impact the order server: that server had to be prepared to send orders out to the proper exchanges. Oh, and the testing team had better be prepared to simulate quotes coming in from four sources, not one. Also, the monitoring team would have to detect if there was a lag on quotes arriving from four sources, not one.

Or consider the patterns and tales from Michael T. Nygard’s book, Release It!. Continually, in Nygard’s stories, solving a problem in a sophisticated web operation involves a wide range of both technical and business knowledge. For instance, in terms of designing “circuit breakers” that limit the impact of the failure of one component, Nygard notes that deciding what to do when a circuit breaker trips is not merely a technical decision, but involves a deep understanding of business processes: “Should a retail system accept an order if it can’t confirm availability of the customer’s items? What about if it can’t verify the customer’s credit card or shipping address?” (p. 97) Later in the book, a retail system went down entirely on Black Friday, costing his client about a million dollars an hour in sales. Fixing the problem involved understanding the functioning of the frontend of the online store, the order management system, and the scheduling system, and the interactions of the three.

A software engineer who thinks of his job narrowly, as just being responsible for writing the code to do the task he is told the code should do, is not going to be thinking of the multiple other areas this change would affect. And a higher-level designer is unlikely to know enough of the details of all of these areas to fully understand the impact of this change: the best bet for being able to successfully respond to this changed business requirement is for the people working in each specialization also to have a vision of the overall system, an understanding of how other specialized areas function, and to have robust communication channels open between the various specialties: in other words, to break down the silo walls produced by a rigid division of labor, and embrace agile development principles. Or, as said in Disciplined Agile:

However, to succeed delivery teams must often work with people outside of the team, such as enterprise architects, operations engineers, governance people, data management people, and many others.  For agile/lean delivery teams to be effective these people must also work in an agile/lean manner. ( Disciplined Agile)
 Lesson 4: Software Development as a Discovery Procedure
Nobel-Prize-winning economist F.A. Hayek was one of the most significant social theorists of the 20th century. He did important work on the theory of the business cycle, on monetary theory, on the theory of capital, on the informational role of market prices, on the nature of complex phenomena, and on the importance of group selection in evolution.

Hayek's work has important insights to offer those advancing Lean / Agile / DevOps ideas for IT. Here I will focus on his paper "Competition as a Discovery Procedure," and note how similar Hayek's vision for the role of competition in the market is to the Agile understanding of the importance of the "development" part of the phrase "software development."

That essay of Hayek's was written in response to the model of "perfect competition" that had come to dominate economics in the middle of the 20th century. In that model, "competition" meant a state of affairs in which each market participant already knew every relevant detail about the market in which they participated, and thus simply "accepted" a price that, somehow, mysteriously emerged from the "given data" of their market. In such a situation no actual competition, as it is commonly understood, really occurs: every "competitor" already knows what product to offer, what price to charge, and simply passively accepts their situation as it stands.

Similarly, the waterfall model of software development simply assumes that what has to be discovered, in the process of software development, is already fully known at the start of the process. Instead of correctly understanding development as a process through which the analysts, coders, testers, documenters, and users come to a mutual understanding of what the software should really be like, the waterfall model posits that certain experts can fully envision what the final product should be, right at the start of the process. "Software development" then consists of these experts drawing up a document analogous to one of the "five-year plans" of the Soviet Union, detailing how all of the other "participants" should work, according to the experts' plan. No further input is needed as far as what the software being "developed" should actually do. But in reality, as Eric Evans notes:

When we set out to write software, we never know enough. Knowledge on the project is fragmented, scattered among many people and documents, and it's mixed with other information so that we don't even know which bits of knowledge we really need. Domains that seem less technically daunting can be deceiving: we don't realize how much we don't know. This ignorance leads us to make false assumptions. (Evans, p. 15)
Hayek, describing the dependence of economists on the perfect competition model, admits:

It is difficult to defend economists against the charge that for some 40 or 50 years they have been discussing competition on assumptions that, if they were true of the real world, would make it wholly uninteresting and useless. If anyone really knew all about what economic theory calls the data, competition would indeed be a very wasteful method of securing adjustment to these facts. (Hayek, 179)
He goes on to write:

In sports or in examinations, no less than in the world of government contracts or prizes for poetry, it would clearly be pointless to arrange for competition, if we were certain beforehand who would do best... I propose to consider competition as a procedure for the discovery of such facts as, without resort to it, would not be known to anyone... (Hayek, 179)
This, I suggest, is quite analogous to software development: it would be pointless to engage in such a time-consuming, mentally challenging activity if we knew in advance what software "would do best." We engage in software development to discover "such facts as, without resort to it, would not be known to anyone." It is only when we put our interface in front of real users that we find out if it really is "intuitive." It is only when we confront our theoretical calculations with the real data that we know if we got them right. It is only when we put our database out to meet real loads that we can tell if its performance is adequate. We can only tell if our CDN design meets our goals when it actually has to deliver content. None of this means that we should not plan as much as possible, in advance, to make sure our software is up to snuff, just that how much is possible is quite limited.

Hayek highlights the true value of competition in the following passage:

[C]ompetition is valuable only because, and so far as, its results are unpredictable and on the whole different from those which anyone has, or could have, deliberately aimed at... We do not know the facts we hope to discover means of competition, we can never ascertain how effective it has been discovering those facts that might be discovered... The peculiarity of competition -- which it has in common with scientific method -- is that its performance cannot be tested in particular instances where it is significant... The advantages of accepted scientific procedures can never be proved scientifically, but only demonstrated by the common experience that, on the whole, they are better adapted to delivering the goods than alternative approaches. (Hayek, 180)
Bjarne Stroustrup, the creator of C++, has very similar things to say about programming:

When we start, we rarely know the problem well. We often think we do... but we don't. Only a combination of thinking about the problem (analysis) and experimentation (design and implementation) gives us the solid understanding that we need to write a good program... It is rare to find that we had anticipated everything when we analyzed the problem and made the initial design. We should take advantage of the feedback that writing code and testing give us (Stroustrup, 178).
Given that competition is a discovery procedure, and thus we can't ever predict, with certainty, the results of market competition, Hayek considers what sort of predictions economists can make, if any? After all, if economics is a science, we expect it to say at least something about the course of events. Hayek concludes that:

[The theory of the market's] capacity to predict is necessarily limited to predicting the kind of pattern, or the abstract character of the order that will form itself, but does not extend to the prediction of particular facts. (Hayek, 181)
Similarly, in software development, although we can't anticipate in advance exactly what lines of code will be needed... or development would be done!... we can anticipate that good software will exhibit certain patterns. And thus we see Hayek anticipating the "pattern language" approach to software development that was imported from the architectural works of Christopher Alexander into software development.

Let us turn aside from contemplating the market order, upon which Hayek focuses most of his attention, and consider the other order Hayek mentions: science. Although any scientific enterprise involves planning, we cannot possibly plan out in advance what discoveries we will make in the course of some scientific research: if we knew those, we would have already discovered them, and our research would be done: we would just be writing up the results. But that is precisely what the waterfall model supposes: we already know what the software in question must do: development is complete, and all that remains is to turn the requirements into an executable program: essentially, just "writing up the results." This approach actually blocks the process of discovery, as it leaves no room for the developers or the users to achieve new realizations in the process of turning the blueprint into working code, realizations that would expose the "specs," the master plan, as being based upon false hypotheses.

One aspect of recognizing an order as a discovery procedure is the implication that where in an organization the most relevant discoveries will be made is also not predictable in advance. Many scientific discoveries have been made because a lab assistant failed to follow some accepted procedure, or noticed something her "betters" had missed. And many successful market innovations arose at the level of the factory floor or the sales visit, and not in the executive suite.

The waterfall model assumes that every insight about the proper form of the final software product will come from the "analysts," and that it is the job of "the workers," such as programmers, to simply turn those insights into executable code. In this respect, the waterfall model has much in common with "Taylorism," the blueprint for mass production pioneered by Frederick Taylor around the turn of the last century. As Jerry Muller describes it:

Taylorism was based on trying to replace the implicit knowledge of the workmen with mass production methods developed, planned, monitored, and controlled by managers. 'Under scientific management,' [Taylor] wrote, 'the managers assume... the burden of gathering together all the traditional knowledge which in the past has been possessed by the workmen and then of classifying, tabulating, and reducing this knowledge to rules, laws, formulae... Thus all of the planning which under the old system was done by the workmen, must of necessity under the new system be done by management in accordance with the laws of science. (Muller, pp.32-33)
But Taylorism and similar top-down approaches proved inadequate in manufacturing, as demonstrated by the triumph of the Toyota Production System, just as top-down planning failed in the Soviet Union, and just as it does in science. Perhaps their most important piece of wisdom contained in the Lean / Agile / DevOps movement is that the waterfall model of software development fails for very similar reasons.

Once we recognize software development is a discovery procedure, it should prove useful to categorize some of the features of a program that are most likely to be discovered in the actual process of development, rather than having been perfectly anticipated in our initial analysis of our users' requirements. What I offer here is only intended as an initial cut at what surely is a much more extensive list that could be developed. With that caveat in mind, in the process of actually developing software, here are some likely areas where our initial analysis will fall short of the mark:

We will discover "corner solutions" we had not anticipated. Corner solutions are extreme cases that are not easy to detect in the analysis phase, such as a buyer who has purchased every single product the company sells (what do we market to her?), or a security the price of which has dropped to zero (were we dividing by that price at some point?).
Some aspect of the user interface that was "obvious" to the designers will appear completely obscure to the actual users: we won't know this until we put some working software in front of them.
A calculation or algorithm that the users thought was adequate to their purposes actually is not: it may have handled a few common cases correctly, but once exposed to real world data, its shortcomings may become obvious.
Some part of the system may incur a load that was not anticipated during the analysis phase: a particular feature may be much more popular than was predicted, and the capacity of the components assigned to handle that feature might be swamped.
There may be regulatory/legal requirements for the software that the users interviewed by the analysts simply took for granted, the violation of which will only become apparent when those users are faced with a working version of the software.
"Black swan" events will arise in the course of development: a market crash, a new, unforeseen law, a brand-new market emerging, a natural disaster, or a security threat. When we delay as many decisions to as late a time as possible, rather than trying to make all significant choices up front in an "analysis phase," we are far more flexible in responding to such events. As Nassim Taleb wrote, "once we produce a theory, we are not likely to change our minds -- so those who delay developing their theories are better off" (Taleb, 144).
Bibliography
Domain-Driver Design: Tackling Complexity in the Heart of Software, Eric Evans, Addison-Wesley, Upper Saddle River (New Jersey), 2004.

"Competition as a Discovery Procedure," in New Studies in Philosophy, Politics, Economics and the History of Ideas, F.A. Hayek, University of Chicago Press, Chicago, 1978.

Programming: Principles and Practice Using C++, Bjarne Stroustrup, Addison-Wesley, Upper Saddle River (New Jersey), 2014.

The Tyranny of Metrics, Jerry Z. Muller, Princeton University Press, Princeton, 2018.

The Black Swan, Nassim Nicholas Taleb, Random House, New York, 2010.

 Other Material

Michael Race discussing careers in DevOps.
Lean development principles
Agile development principles
Agile Technical Practices
Software Development as a Discovery Procedure
The Fallacies and Truths of DevOps
Although the Agile principles are great, some people have questioned how it is being applied, especially when it comes to the "Scrums" that so often characterize Agile practice. Here are some critics of Scrums:

Why "Agile" and especially Scrum are terrible
Why I'm not a big fan of Scrum
"""

lst = []
def generate_index_json(words,url,page,posturl):
    sentence_lst = nltk.sent_tokenize(words.decode("utf-8"))
    for i in range(0,len(sentence_lst)):
        sentence_lst[i] = re.sub('\n', '', sentence_lst[i])
        
    data = count_words(sentence)
    
    js = []
    for i in data:
        lst = get_all_sentence_contain_word(i,sentence_lst,url,page,posturl)
        for j in lst:
            js.append(j)
    return js
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [9]:
#cooperation and communication 
page="cooperation and communication "
posturl = "comm"
words = """
For lesson 1, please read: 
The flattening of the software release process

 Lesson 2: Breaking down silos
For lesson 2, please read: 
Breaking down silos

 Lesson 3: Using Slack
For lesson 3, please go through these Slack guides:

Slack features
Getting started with Slack
Using Slack
 Other Readings
Bjarne Stroustrup on CS education
"""

lst = []
def generate_index_json(words,url,page,posturl):
    sentence_lst = nltk.sent_tokenize(words.decode("utf-8"))
    for i in range(0,len(sentence_lst)):
        sentence_lst[i] = re.sub('\n', '', sentence_lst[i])
        
    data = count_words(sentence)
    
    js = []
    for i in data:
        lst = get_all_sentence_contain_word(i,sentence_lst,url,page,posturl)
        for j in lst:
            js.append(j)
    return js
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [10]:
#incremental development 
page="incremental development "
posturl = "incr"
words = """
Incremental Development

"Deliver software frequently..." -- The Agile Manifesto

 Lesson 1: Why incremental?

"He pushed another commit!"
Why do incremental development?

Small batches of work make testing easier.
Small batches make it easier to find a bug when you have one.
Small batches enable rapid feedback from the users on whether the software is delivering what they want.
Small batches allow us to deliver value to the user more rapidly.
Small batches are more satisfying to the developers, because they get to see their software being used soon after they write it.
Completing and delivering small batches of work at a time is such an important part of DevOps that we could almost say that every other part of DevOps exists to enable frequent, reliable delivery of small improvements to a piece of software.

In older models of software development, it was thought that rapid delivery of software to users meant buggy systems that crashed a lot.

To complete this section, please read Iterative and incremental development

 Lesson 2: Version control everything
Version control is the name for tools that allow the storing of different versions of project files, and the ability to revert to an earlier version (for instance, when a bug is found in the new version). There are a variety of version control systems in existence, such as RCS, SCCS, Suberversion, CVS, and git. You should study the page on version control here.

For the rest of lesson 2, please read: 
Version control everything

 Lesson 3: git and GitHub

A brief tutorial on using git.
First things first: don't confuse git with GitHub! git is a system for creating and updating a distributed source code control repository. It includes features for adding new files, updating files, deleting files or versions from the repository, branching, resolving version conflicts, and reverting to earlier versions.

GitHub, on the other hand, is a web site that allows for free storage of public repositories ("repos" for short). It is not necessary to use GitHub in order to use git: some companies have a "origin" repo they keep internally, while other people use other public sites, such as BitBucket, to hold the origin version of their repo. (We, in fact, will explore moving to BitBucket during our course.)

 Basic git
 Resolving conflicts
 Submodules
 Other Readings
Fail fast!
Git submodules
"""
lst = []
def generate_index_json(words,url,page,posturl):
    sentence_lst = nltk.sent_tokenize(words.decode("utf-8"))
    for i in range(0,len(sentence_lst)):
        sentence_lst[i] = re.sub('\n', '', sentence_lst[i])
        
    data = count_words(sentence)
    
    js = []
    for i in data:
        lst = get_all_sentence_contain_word(i,sentence_lst,url,page,posturl)
        for j in lst:
            js.append(j)
    return js
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [11]:
#automate builds
page="automate builds"
posturl = "build"
words =  """
Automating Builds

Automating our builds
 Lesson 1: Why Automate Our Builds?

Building software is often a complex process. A single software "product" may be composed of hundreds or thousands of parts, some created by one team, some created by another, some open-source software products, and some proprietary software provided by some vendor. All of these parts must be "assembled" correctly in order to create the final product.

So, as in every other process DevOps touches, as DevOps practitioners we seek to automate everything. Why? Because:

Repeatedly performing such a process by hand is boring.
Repeatedly performing such a process by hand is, especially since it is boring, error prone.
By automating such a process, we can ensure the inclusion of steps like automated testing and security checks, which are likely to be left out, at least on occasion, if the process is performed manually.
The scripts that automate such a process also serve to document it, and in a fashion such that the documentation cannot fall behind the actual process, since it is the actual process!
 Lesson 2: Build Tools Comparison
Build tools are programs that automate the creating of the various products involved in some project. Whenever some "product" is constructed by some program(s) from various components that are combined into that product, we should employ a build tool to automate that combining, rather than building that product by running a series of commands "by hand."

The thing built may be a complete program, a portion of a program, documentation, a book, a database configuration, a Docker container: in short, whenever some component of a system we are working on is composed of sub-components, we should seek to automate the building of that component with a build tool.

 Make
 Ant
 Maven
 PyBuild
 SonarQube
Sources:

PyBuild
Build Concept
Managing projects with make
Java Code Geeks
What's in a build tool
 Lesson 3: Make
For our class, we will use make as our build tool. Although there are more modern tools with additional features, make is sufficient for our purposes because:

make is widely available on all UNIX-based systems.
make allows us to explore the automation of builds in a way that is quite enough for an introduction to such tools.
make executes makefiles. The basic structure of a makefile is:

A target, that we seek to build.
A list of dependencies, upon which that target depends; and
A series of commands to be run, in order to build that target from its dependencies.
A key aspect of make's behavior is that it examines the time at which the target and its dependencies were last updated in order to "decide" whether to execute the commands that build the target from its dependencies. If the timestamp on the target file is newer than any of those on the dependency files, make will "judge" that the target is up-to-date, and does not need to be rebuilt.

A second crucial aspect of make's behavior is that, when it is building a target by executing one or more commands, should any of those commands fail (return a non-zero exit code), make will stop trying to build that target and report the error. It is this feature of make that allows us to insert automated tests into a build, and halt the build should any of those tests fail.

Having examined the logic of a makefile, let us look at an actual instance of one, from one of our projects:


OK, DevOps participants, the iframe documentation I have read suggests that the contents of this makefile should appear above: but they don't! Let's debug this!

 Other Readings
DevOps Build Management"""
lst = []
def generate_index_json(words,url,page,posturl):
    sentence_lst = nltk.sent_tokenize(words.decode("utf-8"))
    for i in range(0,len(sentence_lst)):
        sentence_lst[i] = re.sub('\n', '', sentence_lst[i])
        
    data = count_words(sentence)
    
    js = []
    for i in data:
        lst = get_all_sentence_contain_word(i,sentence_lst,url,page,posturl)
        for j in lst:
            js.append(j)
    return js
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [12]:
#workflow
page="workflow"
posturl = "flow"
words = """
Workflow

 Lesson 1: The Seven Wastes

The Seven Wastes
A key source of DevOps ideas is the Toyota Production System. In particular, the people at Toyota thinking deeply about production identified seven wastes that should be eliminated in any production process:

Overproduction: Don't produce more than the customers need at present! Such overproduction may never prove useful. The system you are building might possibly come to include, say, the ability to feed the data it produces to a neural network. But is any customer asking for this? No? Then wait until they are asking to produce it!
Inventory: Do not pile up partially done work. The aim of production is to deliver finished work to the customer. Partially done work is continually at risk of being rendered otiose by changes in circumstance. Furthermore, all partially done work is a constant drain on the scarce attention of the workers involved: a worker with one partially finished task can focus on that task. A worker with ten partially finished tasks will wind up "thrashing" between those tasks.
Waiting: Avoid having people (or products) lying idle waiting for someone else to finish some work. If each work station along the way in some process works in small batches, that minimizes the amount of waiting likely to occur at other work stations.
Motion: A production process should minimize the movement of producers and products: for example, don't make programmers fork a repo, work on it, and submit a pull request, if you can trust them to contribute directly to the repo.
Transportation: If a programmer can directly ask the end user if their work is OK, don't instead make them go through approval by the project manager, the CIO, the user's manager, and the user's nanny, to see if they can release it.
Rework: Get the work right the first time! A key to this is to work in small batches. Trying to make five major changes to a system all at once, in order to "save time", is only likely to result in amounts of subsequent rework that will more than wipe out the supposed "time savings" of the all-at-once approach.
Overprocessing: The waste of doing things the customer does not need to have done, for instance, building high-level security into an app that no is trying to hack into.
 Lesson 2: Work in Small Batches
An essential part of the Toyota Production System was to work in small batches. That allowed Toyota to respond rapidly to changing customer demand, because it did not have huge inventory backlogs it had to get rid of before changing course. Eric Ries, entrepreneur, tech executive, and coder, writes:

It turns out that there are tremendous benefits from working with a batch size radically smaller than traditional practice suggests. In my experience, a few hours of coding is enough to produce a viable batch and is worth checking in and deploying.
My experience completely agrees with Eric's: I often check in code, and deploy it into production, once an hour. Why are small batches so important? Let's look at the reasons Eric lists:

Small batches mean faster feedback.
Small batches mean problems are instantly localized.
Small batches reduce risk.
Small batches reduce overhead.
But please read Eric's full article! This is a very experienced developer, telling us our batch size should be radically smaller than most people believe.

 Lesson 3: Kanban
The key points to understanding Kanban:

Kanban serves to make work visible.
Kanban acts to limit work-in-progress.
Kanban is a pull system.
Kanban enables us to visualize workflow.
Kanban allows us to see work bottlenecks.
Simply having a card system is not Kanban: card systems may not support one or more of the properties of Kanban.
For more on Kanban, please read Altassian on Kanban .

 Lesson 4: Scrum

"I'd like to institute an organizational system called scrum."

Copyright: Scrum.org/medium.com
Scrum is a method of managing software development where a team divides its work into sprints of roughly a couple of weeks, during which progress is evaluated in a daily stand-up meeting called a daily scrum. Here is the Wikipedia article on on scrum.

The Scrum Framework below is implemented in the Online DevOps project for Fall 2018:

Scrum Roles
Product Owner
Prof. Eugene Callahan / Akshay Tambe

Build and manage the product backlog.
Ensure the team to everyone understands the work items in the product backlog.
Give the team clear guidance on which features to deliver next.
Decide when to deliver the product.

Scrum Master
Akshay Tambe

Works as a facilitator-in-chief.
Schedule the needed resources (both human and logistical) for sprint planning, stand-up, sprint review, and the sprint retrospective.
The Scrum Team
Denis Petelin / Shawn Widjaja / Jeff Cui / Mandy Kong / Felix Angel Baez

Drives the plan for each sprint (Implementation).
Forecast how much work they believe they can complete over the iteration using their historical velocity as a guide.
Updates and shows concerns to the Scrum-master frequently about the work progress.

Keeping the iteration length fixed gives the development team important feedback on their estimation and delivery process, which in turn makes their forecasts increasingly accurate over time.

Components of Scrum
Sprint planning
A team planning meeting that determines what to complete in the coming sprint.
In our process, the normal Sprint will run for 2 weeks.
An estimated 5-6 sprint iterations will follow after our 1st sprint.
The Scrum Team estimates Stories to commit in an Epic.
The Scrum Team also calculates Story points for each story.
Epic
Large body of work that can be broken down into a number of smaller stories.

Story
Small body of work (modules)/ Actionable item which is a part of epic.

Task
Decomposed parts of a story that address how the story will be completed.


Daily Stand-up

Also known as a daily scrum, a 15-minute mini-meeting for the software team to sync.
As this is an Online Course, students are required to give updates everyday on Slack.
Sample Format:

What you worked on yesterday?
What are you planning to work on today?
Any Blockers/Concerns?

Weekly Stand-up

A 20-minute Zoom Conference every week by Scrum-master to discuss about project.

Sprint Demo

A sharing meeting where the team shows what they've shipped in that sprint.
This meeting will occur mostly on before Sprint Ends. (10-minute working demo by Scrum Team)

Sprint Retrospective

A review of what did and didn't go well with actions to make the next sprint better.
Happens on last day of Sprint or during the planning of next Sprint.
Sample Format:

What went well?
What went wrong?
How can we do better in next Sprint?

Estimation of LOE in JIRA Tickets
Story Point Estimation

Story points are a unit of measure for expressing an estimate of the overall effort that will be required to fully implement a product backlog item or any other piece of work.
Fibonacci Series Format to estimate LOE: 
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144…

Mostly used Story Points: 
0 → Story is not counted in Velocity/Capacity 
1 → Can be done within a day 
2 → Can be done within 2 days 
3 → Can be done within 3 days 
5 → Can be done within a week 
8 → Requires more than a week 
13 → Requires complete iteration 
Sprint Iterations in DevOps Coursework
*** Estimated Cycles, may vary over the coursework


Sprint Cycle: 2 Weeks 
*** Exception: Sprint 1 
*** Might Cut-off 1 Sprint due to Mid-Season Holidays

Iteration #	Period
Sprint #1 (Short)	10th September - 18th September (1 day extension)
Sprint #2	19th September - 2nd October
Sprint #3	3rd October - 16th October
Sprint #4	17th October - 30th October
Sprint #5	31st October - 13th November
Sprint #6	14th November - 27th November
Sprint #7	28th November - 11th December
Sprint #8	12th November - 25th December

SCRUM Documentation: Sprint Planning for Fall 2018

 Other Readings
Wikipedia on Kanban
Ref: Atlassian's article on Epic vs. Story vs. Task
"""
lst = []
def generate_index_json(words,url,page,posturl):
    sentence_lst = nltk.sent_tokenize(words.decode("utf-8"))
    for i in range(0,len(sentence_lst)):
        sentence_lst[i] = re.sub('\n', '', sentence_lst[i])
        
    data = count_words(sentence)
    
    js = []
    for i in data:
        lst = get_all_sentence_contain_word(i,sentence_lst,url,page,posturl)
        for j in lst:
            js.append(j)
    return js
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [13]:
#automate test
page = "automate test"
posturl = "test"
words = """
Automated Testing

 Lesson 1: The Whys and Hows of Automated Testing
This section is co-authored by Denis Petelin and Prof. Callahan.

Why do we test? We test to see if the software does what we want it to do. 
And whenever it doesn't, what are the effects?
Classic way to do testing: test cases, bugs, regression, growing "regression debt". 
Please study regression testing .
Revolutionary idea: zero-bug mindset: bugs are not tasks in backlog, you have to fix them as you go -- by the end of the day zero bugs exist. 
This is part of the "incremental improvement" idea: if we do continuous integration and continuous delivery with small batches, each delivery into production should produce at most a "small batch" of bugs! And we can fix that small batch immediately.
Revolutionary idea: zero-length feedback, developers can test and fix bugs immediately. 
Part of this is establishing a culture of trust: If we need to feed approval for each change through multiple levels of bureaucracy, we can't fix bugs right away. And we can relate this back to the discussion in chapter one on "Taylorism" versus "Toyota Production System": in the former, the "workers" just carry out the plans of the "managers." In the latter, everyone is responsible for the entire production process.
Test pyramid:
Unit tests for individual classes and methods (models, controllers, views)
Integration tests to check feature top-down.
Acceptance test to check feature as user sees it.
Terminology:
TestCase: set of checks to be performed.
Fixture: prepared data to be loaded into the db.
Fake: a real object created for the test.
Stub: a crude imitation of real object returning hard-coded values.
Mock: an elegant imitation of the object (if real object is not yet ready or expensive).
Test suite: set of tests serving specific purpose. Always: smoke test, main success [AKA happy path], extended tests.
Django benefits:
No need to unit-test Models (except custom query sets & business logic methods).
No need to integration-test Autogenerated View (except live tests).
Anatomy of test case:
setUp()
Test_whatYouTest_whatYouDo_whatYouExpect.
Arrange -- Act -- Assert.
Assert kinds.
tearDown()
Preparing data -- AutoFixture
TaskModelTransactionTestCase(TransactionTestCase): regular fixture.
For lazy guys -- AutoFixture :)
fixture.create()
Typical mistakes:
Useless tests -- testing default Models methods, for example.
Testing implementation -- method save_changes() returns OK -- everything is OK! (Test should check if changes indeed persisted).
Large tests? Fat controllers! 
Refactoring:
Small methods -- less than a screen.
Small tests -- 8-10 lines.
Refactoring palette in the PyCharm.
Good beginners pattern:
Create Model. Add tests if there are custom methods.
Create Controller (View as Django calls it). Test if does what it should do. Test if it handles errors.
Write View (Template as Django calls it). Write LiveTestCase using requirements.
Why preparing requirements still matters (“Please show balance" in Danfoss).
Big idea: can we somehow make requirements document testable?
Turning use cases into tests -- gherkin
Feature file & steps
Passing info around -- context
Selenium -- driving real browser around
Behave test runner (behave-Jango)
JIRA: acceptance tests are now part of the 
Relying strictly on this type of testing is bad idea! (See execution time for one test vs whole suite!)
 Lesson 2: Testing Frameworks

Python testing with pytest! Part 1: Introductions and motivating testing.

 Lesson 3: Jenkins
Jenkins is a CI/CD tool, but it is used to force the running of automated tests as part of the build process, so we have included it in this section.


Introduction to Jenkins by Denis Petelin
 Other Material

What Is Jenkins?
A Crash Course in Continuous Testing
The Future of Continuous Testing: Fail Faster
Design for a Python-based spell checker 
(We want one of these for our web pages! We will try to build it.)
Setting up Jenkins as a webhook in GitHub
Slack notification plugin integration with Jenkins
Testing and Continuous Integration Part 1
Testing and Continuous Integration Part 2
Testing only recently changed git files in jenkins
"""

lst = []
def generate_index_json(words,url,page,posturl):
    sentence_lst = nltk.sent_tokenize(words.decode("utf-8"))
    for i in range(0,len(sentence_lst)):
        sentence_lst[i] = re.sub('\n', '', sentence_lst[i])
        
    data = count_words(sentence)
    
    js = []
    for i in data:
        lst = get_all_sentence_contain_word(i,sentence_lst,url,page,posturl)
        for j in lst:
            js.append(j)
    return js
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [14]:
#infra_as_code 
page = "infrassture as code"
posturl = "infra"
words = """
Infrastructure as Code


Infrastructure as code and Docker
 Lesson 1: What Is Infrastructure as Code?

Infrastructure as code: What is it? Why is it important? 
(This is a fantastic 3 minute explanation of the value of Infrastructure as Code.)
Infrastructure as code (IaC) as the idea that, rather than manually provisioning servers, or setting up hardware through a point-and-click GUI, the "server room" should itself be managed by code. That code can then be put under version control, tested, deployed with automated build tools, and so on. The code also serves as necessarily up-to-date documentation of what the infrastructure is.

The advantages of IaC can be divided into three main categories:

Cost savings: By automating hardware provisioning, the time of the people who would have been doing that by hand is freed up for other tasks.
Speed of deployment: It is much faster to configure infrastructure by running a script than by manually setting a bunch of parameters in a GUI interface.
Lower error rates: It is error-prone, because it is boring, too configure systems "by hand." A script can be debugged once, and then will run reliably again and again. Furthermore, as code, the infrastructure can be read and reasoned about. It is very hard to do that with a bunch of check-boxes!
(Source: Wikipedia on Infrastructure as Code )

A DevOps principle: Asking people to behave like automatons bores and dehumanizes them. Asking them to devise clever ways to automate things interests them, and treats them as the rational beings that they are! 
Aristotle: humans are rational animals.

 Lesson 2: Available Tools
 Puppet
 Chef
 Ansible
 SaltStack
 Lesson 3: Running Docker
by 
Prashantkumar Patel and Prof. Callahan

First thing: Make sure you have Docker installed! You won't get any further in following along on your laptop if you do not.

Secondly: Please clone (if you have not already) our online DevOps repo: 
git clone https://github.com/gcallah/OnlineDevops.git

Once you have cloned that repo, please open two shells: in one, we will look at your local environment, and in the other we will explore the container.

In one of the two shells, in your OnlineDevops directory, please run: 
./container.sh 
That should put you inside the OnlineDevops container: if that command worked, you should see your prompt change. 
If it did, let's explore the shells you are in a little to try to understand better what a container is. 
I am going to proceed by showing you the results of running the same command inside and outside the container on my machine: your results will be different, but similar, to mine.

First of all, let's look at the root file system, inside and outside the container:

Outside the container: 
ls / 
    Macintosh:OnlineDevops gcallah$ ls /
    Applications            bin                net
    Library                cores                opt
    Network                debug.txt            private
    Shockwave Log            debug.txt.1            sbin
    System                dev                tmp
    User Guides And Information    etc                usr
    User Information        home                var
    Users                installer.failurerequests
    Volumes                logFile.xsl
                    
Inside the container: 
ls / 
    root@a5dd222a9812:/home/DevOps# ls /
    bin   dev  home  lib64    mnt  proc           root  sbin  sys    usr
    boot  etc  lib     media    opt  requirements.txt  run   srv   tmp    var
                    
What's of note here: From outside and from inside the container, we see completely different file systems! Inside the container, we are in a chroot file system.

What about our view of what processes are running? 
Outside the container we see: 
ps -ef | wc -l 
     Macintosh:OnlineDevops gcallah$ ps -ef | wc -l
     667
                    
From outside the container, the OS lists 667 processes as running on my Mac.

Inside the container we see: 
ps -ef 
    root@a5dd222a9812:/home/DevOps# ps -ef
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 Oct25 pts/0    00:00:00 bash
    root       192     1  0 21:18 pts/0    00:00:00 ps -ef
                    
From inside the container, there are two processes running! The container has process isolation from the host. It has its own process namespace separate from its host's namespace and from the namespace of any other containers running on that host. The separate namespace also provides the container with its own hostname, its own user IDs, and its own inter-process communication names.

Some Docker commands
Now let's look at what some Docker commands are available, and what they do.

docker ps

This will list your currently running Docker images. When Prof. Callahan runs it while preparing this lecture, he sees: 
    Macintosh:OnlineDevops gcallah$ docker ps
    CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                     NAMES
    429ff1a37d21        devops              "bash"              11 seconds ago
    Up 6 seconds        0.0.0.0:32768->8000/tcp   stupefied_swanson
                    
(The 0.0.0.0 is the IP address to use for the container, and 32768 is the port it is using.)

docker ps -a 
This will list all images that have been run on the system, not just those that are active: 
    Macintosh:OnlineDevops gcallah$ docker ps -a
    CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                           PORTS                    NAMES
    c560c3729d7e        devops              "bash"              18 minutes ago
    Up 18 minutes                    0.0.0.0:8000->8000/tcp   brave_kirch
    429ff1a37d21        devops              "bash"              About an hour ago   Exited (130) 44 minutes ago                               stupefied_swanson
    32b272487675        devops              "bash"              5 days ago          Created                                                   wonderful_heyrovsky
    a5dd222a9812        devops              "bash"              10 days ago         Exited (130) About an hour ago                            naughty_beaver
    e82e4fb65823        devops              "bash"              11 days ago         Exited (130) 11 days ago                                  youthful_noyce
    0433f774c394        devops              "bash"              11 days ago         Created                                                   stupefied_hawking
    1ba39e023554        devops              "bash"              11 days ago         Created                                                   practical_ptolemy
    4a6bc8cf7ab0        devops              "bash"              11 days ago         Created                                                   vigilant_curie
    47cb14ca5b1b        devops              "bash"              2 weeks ago
    Exited (255) 11 days ago         0.0.0.0:8000->8000/tcp   determined_goldwasser
    7af5f14f88d9        f418f33054e8        "bash"              2 weeks ago         Exited (130) 2 weeks ago                                  modest_hypatia
    e3292cdc1449        indra               "bash"              2 weeks ago         Exited (130) 2 weeks ago                                  competent_hugle
    3ad630752ebf        indra               "bash"              2 weeks ago         Exited (130) 2 weeks ago                                  amazing_meitner
    fe439d8e15c2        indra               "bash"              2 weeks ago         Exited (130) 2 weeks a
                        
docker images

This command should give you the list of images that are available on the system. For example in Prashant's system it looks something like this: 
    ENG-EJC369-02:$ docker images
    REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
    gcallah/jenkins_py3   latest              21130e104ca1        4 weeks ago         742MB
    jenkins               latest              cd14cecfdb3a        6 weeks ago         696MB
    gcallah/indra         v7                  ad9670e8b27f        2 months ago        946MB
    python                latest              efb6baa1169f        5 months ago        691MB
    ubuntu                latest              f975c5035748        5 months ago        112MB
    gcallah/emu86         v4                  f6833ae8bf9e        6 months ago        776MB
    gcallah/django        latest              432de70e222d        6 months ago        769MB
    bash                  latest              59507b30b48a        6 months ago        12.2MB
    alpine                latest              3fd9065eaf02        7 months ago        4.15MB
                    
In the above listing, the nginx image is not installed. Running the following command will pull the nginx image from DockerHub, which is like GitHub, but for docker images.

docker pull nginx

Now that you have installed the nginx image, just run the docker images command again, in the list you should see ngnix image as below: 
    ENG-EJC369-02:$ docker images
    REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
    nginx                 latest              71c43202b8ac        7 hours ago         109MB
    gcallah/jenkins_py3   latest              21130e104ca1        4 weeks ago         742MB
    jenkins               latest              cd14cecfdb3a        6 weeks ago         696MB
    gcallah/indra         v7                  ad9670e8b27f        2 months ago        946MB
    python                latest              efb6baa1169f        5 months ago        691MB
    ubuntu                latest              f975c5035748        5 months ago        112MB
    gcallah/emu86         v4                  f6833ae8bf9e        6 months ago        776MB
    gcallah/django        latest              432de70e222d        6 months ago        769MB
    bash                  latest              59507b30b48a        6 months ago        12.2MB
    alpine                latest              3fd9065eaf02        7 months ago        4.15MB
                    
(You can put an image into DockerHub using docker push.)

Download website code
Ok now that you have download the nginx image, let's download the static website that you are going to host inside the docker container. We are going to use the algorithms website for another course. You can find the code for the website here.

Please remember the location where you have cloned the repository. Prashant has cloned it in /Users/prashant/school/algorithms. Your location will be different than this, please note the location.

Let's make a container
Ok, so we have downloaded the nginx image and the code of the website which we want to host All we need to do is just make a container out of the image. We will put the code of website inside the container so that the webserver which is nginx in our case can read the html files and host it in local server. The command for that is as shown below.

docker run --name algo_website -p 127.0.0.1:8080:80 -v /Users/prashant/school/algos/:/usr/share/nginx/html -d nginx

Please don't forget to change the location of algorithms directory in above command. After you run the command open the browser and type http://localhost:8080 and you should be able to see the webpage. Windows user should use the address http://0.0.0.0:8080 instead of localhost.

After we leave the container, we can get rid of it using docker rm algo_website. If we need to remove a container that is still running, we will have to stop it first with docker stop.

 Our Docker Implementation
We use Docker in our projects for two main reasons, with a third to come:

To set up a local version of a web server that will be configured "just like" our production server. ("Just like" is in quotes because that is always the ideal, but it may not be fully achieved.)
To provide our full suite of development tools, such as the correct Python version, make, flake8, various Python libraries, etc., in one simple to build package, so all developer's have a consistent environment.
Ultimately, we should be deploying the container where we locally test our web servers right into production, guaranteeing that development and production are identical environments. Unfortunately, at the moment, the places we are hosting do not support that. We are exploring other options.
So we need to know how to create the right container for each project. Each project we work on should have a Dockerfile consisting of instructions on how to build the image for that project, a requirements.txt listing what external modules need to be included in the image, and a line in the project's makefile automating the build of the image. This is infrastructure as code, since the infrastructure for the project is built from these files of code.

So, in the makefile we want something like:

container: $(DOCKER_DIR)/Dockerfile $(DOCKER_DIR)/requirements.txt 
        docker build -t indra docker

Here is a sample Dockerfile. 
The FROM python:3.6.0 command says what base image to build our image from. 
The COPY requirements.txt /requirements.txt line brings the requirements file inside the container. 
The line RUN pip install -r requirements.txt installs everything from the requirements file in the container. 
The line ENV user_type TERMINAL sets an environment variable (user_type) that will be available inside the container. 
WORKDIR /home/IndrasNet/ sets the starting directory inside the container.
Here is the requirements file it uses.
 Our More Advanced Docker Usage
In the Online DevOps course project, we use a more advanced Docker setup: we employ Docker Compose to define and run an application composed of more than one Docker container.

Docker Compose use a YAML file to specify the configuration of a multi-container Docker app.

In the Online DevOps course setup, we are running just two containers: one to run the MySQL database, and the other to run our Django web server. But other Docker Compose setups might include a web server, a load balancer, a database, an authentication server, and more! 
Here is the YAML file that specifies our two-container application.

 Other Readings
Containers are eating the world
Infrastructure automation
A Behavior Driven Developer's guide to Infrastructure as Code
"""

lst = []
def generate_index_json(words,url,page,posturl):
    sentence_lst = nltk.sent_tokenize(words.decode("utf-8"))
    for i in range(0,len(sentence_lst)):
        sentence_lst[i] = re.sub('\n', '', sentence_lst[i])
        
    data = count_words(sentence)
    
    js = []
    for i in data:
        lst = get_all_sentence_contain_word(i,sentence_lst,url,page,posturl)
        for j in lst:
            js.append(j)
    return js
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [15]:
page = "cloud deployment"
posturl = "cloud"
words = """
Kubernetes as universal cloud infrastructure
Public cloud comparison
Pro Tips on Cloud Management
AWS Guide on how to host a custom domain website on S3"""
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)


In [16]:
page = "micro service"
posturl = "micro"
words = """
Microservices and Serverless Computing

"The philosophy of the microservices architecture essentially equates to the Unix philosophy of 'Do one thing and do it well.'" -- Wikipedia

 Lesson 1
For organizations with large, monolithic applications, adopting a microservice approach should prove beneficial because:

They are simpler to understand than a monolithic application.
They are easier to scale: only the part of the overall application that is the bottleneck needs to be given more resources.
It is easier to do continuous delivery when it is microservices being updated, rather than an entire monolithic application.
There will be looser coupling between components of the system when it is built on microservices.
Bugs are isolated in a microservice and can't bring down all components of a system.
They enable more freedom of choice among technologies, as each microservice team can choose different languages, libraries, databases, and so on."""
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [17]:
#monitor
page = "monitor"
posturl = "monit"
monitor = """
Monitoring

KDE System Guard
 Lesson 1
 Lesson 2
 Lesson 3
 Our Monitoring Implementation
We use the product StatusCake to monitor the uptime on our various different websites. In our current configuration, StatusCake simply pings our URLs, and sends alerts when it encounters discouraging HTTP Status Codes.

We've configured StatusCake to send emails to the project owners, as well as Slack messages to appropriate, project-specific channels.

There's definitely room for improvement here, but StatusCake is a good starting place. Here are some things that we might consider when improving on this foundation:

StatusCake is a black box solution -- it doesn't have any visibility of the internals of our program, it provides us with data on how our website looks to users. It would be nice to have a monitoring solution that combines black-box reporting with logs and stack traces.
StatusCake, as it's configured right now, only checks for HTTP status codes. However, it's possible that our web server could be ACKing with empty pages. Ideally, we would want to test for content, in addition to headers, to mitigate this scenario.
We're not collecting our logs or runtime metrics in any meaningful way. That should be a crucial next step in aiding disaster response.
 Other Readings
Top 14 Monitoring tools that every DevOps needs
Monitoring and Observability
Monitoring in the DevOps Pipeline
DevOps monitoring tools
Google Analytics Adapts to GDPR, But Questions Remain
Nagios
What is Nagios?
Zabbix vs Nagios Comparison for Network and Bandwidth Monitoring
What is Sensu?
Sysdig
New relic: Change the way you monitor infrastructure
Google Stackdriver
Introducing New Relic Applied Intelligence
Comparison of 18 APM & Application Monitoring Tools
BigPanda"""
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [18]:
#security
page = "security"
posturl = "secur"
security = """
Security (Tool: in-toto)

 Lesson 1: Securing the DevOps Toolchain
Please watch the following video:


Securing the DevOps Toolchain 
(The slides for this presentation.)
Details on Santiago's project are here.

 Lesson 2: Some types of attacks
SQL Injection
"A SQL Injection attack consists of insertion or "injection" of a SQL query via the input data from the client to the application. A successful SQL injection exploit can read sensitive data from the database, modify database data (Insert /Update/Delete), execute administration operations on the database (such as shutdown the DBMS), recover the content of a given file present on the DBMS file system and in some cases issue commands to the operating system. SQL injection attacks are a type of injection attack, in which SQL commands are injected into data-plane input in order to effect the execution of predefined SQL commands." 
-www.owasp.org

Cross-Site Scripting (XSS)
"Cross-Site Scripting (XSS) attacks are a type of injection, in which malicious scripts are injected into otherwise benign and trusted web sites. XSS attacks occur when an attacker uses a web application to send malicious code, generally in the form of a browser side script, to a different end user. Flaws that allow these attacks to succeed are quite widespread and occur anywhere a web application uses input from a user within the output it generates without validating or encoding it. XSS can be used by an attacker to send a malicious script to an unsuspecting user. The end user’s browser has no way to know that the script should not be trusted, and will execute the script. Because it thinks the script came from a trusted source, the malicious script can access any cookies, session tokens, or other sensitive information retained by the browser and used with that site. These scripts can even rewrite the content of the HTML that site. These scripts can even rewrite the content of the HTML page." 
-www.owasp.org

Cross-Site Request Forgery (CSRF)
"Cross-Site Request Forgery (CSRF) is an attack that forces an end user to execute unwanted actions on a web application in which they're currently authenticated. CSRF attacks specifically target state-changing requests, not theft of data, since the attacker has no way to see the response to the forged request. With a little help of social engineering (such as sending a link via email or chat), an attacker may trick the users of a web application into executing actions of the attacker's choosing. If the victim is a normal user, a successful CSRF attack can force the user to perform state changing requests like transferring funds, changing their email address, and so forth. If the victim is an administrative account, CSRF can compromise the entire web application." 
-www.owasp.org

Command Injection
"Command injection is an attack in which the goal is execution of arbitrary commands on the host operating system via a vulnerable application. Command injection attacks are possible when an application passes unsafe user supplied data (forms, cookies, HTTP headers etc.) to a system shell. In this attack, the attacker-supplied operating system commands are usually executed with the privileges of the vulnerable application. Command injection attacks are possible largely due to insufficient input validation." 
-www.owasp.org

Web Shell
"A web shell is a script that can be uploaded to a web server to enable remote administration of the machine. Infected web servers can be either Internet-facing or internal to the network, where the web shell is used to pivot further to internal hosts." 
-www.us-cert.gov

Path Traversal
"A path traversal attack (also known as directory traversal) aims to access files and directories that are stored outside the web root folder. By manipulating variables that reference files with “dot-dot-slash (../)” sequences and its variations or by using absolute file paths, it may be possible to access arbitrary files and directories stored on file system including application source code or configuration and critical system files. It should be noted that access to files is limited by system operational access control (such as in the case of locked or in-use files on the Microsoft Windows operating system)." 
-www.owasp.org

XML External Entity
"An XML External Entity attack is a type of attack against an application that parses XML input. This attack occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser. This attack may lead to the disclosure of confidential data, denial of service, server side request forgery, port scanning from the perspective of the machine where the parser is located, and other system impacts." 
-www.owasp.org

Insecure Deserialization
"Insecure Deserialization is a vulnerability which occurs when untrusted data is used to abuse the logic of an application, inflict a denial of service (DoS) attack, or even execute arbitrary code upon it being deserialized. Web applications make use of serialization and deserialization on a regular basis and most programming languages even provide native features to serialize data (especially into common formats like JSON and XML). It’s frequently possible for an attacker to abuse these deserialization features when the application is deserializing untrusted data which the attacker controls. Successful insecure deserialization attacks could allow an attacker to carry out denial-of-service (DoS) attacks, authentication bypasses, and remote code execution attacks." 
-www.acunetix.com

Cloud Security
Delivering IT services via the Cloud be is a time saver, but it has a downside: the security of those services. Clouds provide these services by relying on virtualization technology. While virtualization reduces some security risks, others are increased because the attack surface in a cloud service is greater. The risks can be broadly categorised into hypervisor security, network security, data security (in transit and at rest), security of monitoring and incident response.

 Lesson 3
 Other Readings
An introduction to DevSecOps or rugged DevOps 
Rugged DevOps is the practice of shifting Security left. Security teams can introduce security much earlier in the development process. This is contrary to the standard approach where security practices of code analysis and vulnerability testing is placed just before the application is deployed into production.
DevSecOps: Including Security in Software Life cycle 
Introducing security practices earlier in the software engineering will enable developers to always think of security while developing their application. This will also enable them to come up with creative solutions for enabling security in their applications.
DevOps: A Holy Grail for Security? 
DevOps provides a method to have the concept of "Security by Design" integrated into the software engineering lifecycle from the start. It also helps keep security balanced with business objectives.
DevOps:Performing Penetration Testing on Web Based Application to find Vulnerabilities 
Performing a Automated/Manual penetration testing in SDLC can help to find vulnerability at early stage of development and reduce the threat to the system. Its good to work in tandem with penetration testing team to build a robust security posture.
DevOps: Performing a Network Penetration Testing 
Performing a Network or Infrastructure based penetration testing helps to identify the weak links inside the network components such as Servers, it also detects presence/absence of firewalls, NIDS/HIDS, vulnerable ports and services running on them. The early detection and mitigation of these vulnerabilities can help an organisation to build a robust and secure infrastructure. NMAP is an important tools to scan ports and find vulnerable services running on a server.
DevOps: Secure Coding Practice 
Enforcing a secure coding practice makes a programme/application resistant to malicious attackers or potentially malicious programmes. Implementing such practices in DevOps can help to build a robust security posture right from the beginning. This link enumerates top 10 best practice which any organisation following a devops practice should follow.
DevOps: 5 best practices for integrating security into your DevOps 
These five best practice can help to integrate security features in SDLC within DevOps practice. It entails fast and efficient way to cultivate security and has an edge over traditional way of implementing security.
DevOps: Cloud Security in DevOps 
Cloud security is paramount in DevOps culture since most of the organisation is preferring to host off-premises services. This book provides the best way to provide security at SaaS,IaaS,PaaS services.
DevOps: Misconfiguration and Security Threat in IaaS cloud 
Various problems affecting the cloud are insecure interface APIs, shared resources, data breaches, malicious insiders, and misconfiguration issues. The potential attack vectors could be storage enumeration attack, link swap attacks, leaked access tokens, Key management and legal concerns. Deploying mitigation techniques like second factor authentication, encrypted key management, Logging, Audits can help to reduce the risk.
Cloud Security Solution 
Few of the security solution infrastructure can be deployed in cloud environment to stop attacks like DDoS, web application attack can without reducing performance. Akamai Intelligent perform can be instrumental into threat intelligence to detect latest threats and act as a expertise to adapt to shifting and new tactics.
"""
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [19]:
#complete_dev_operation
page = "Complete DevOps Operations"
posturl = "suite"
complete_dev_operation = """
Complete DevOps Operations (Tool: Azure DevOps)

"It's clear DevOps has become increasingly critical to a team's success." -- Jamie Cool, Microsoft

 Lesson 1: Microsoft's DevOps Tools
Here we have Denis Petelin presenting, first, his understanding of what DevOps is, and secondly, how the Microsoft toolkit offers end-to-end support for good DevOps practices:


Azure DevOps, Part I

Azure DevOps, Part II
Below are the key DevOps ideas that Denis presented, with some commentary as to how they relate to other material presented in our course:

Small releases 
This is our incremental development idea stated a little differently. The result of adopting this practice is almost magical: whereas previously, you may have found yourself doing an eight-hour coding spree, and then testing and correcting bugs for another 20 hours, you will find that 12 one-hour sessions, each aimed to get a tiny part of your software working, will get the same amount of work done as the 8-hour session... but will be followed by 20 minutes of debugging.
Everything as code 
Infrastructure as code is a part of this principle. But also, your tests should be code (pytest), and your code reviews should be code (flake8), and your builds should be code (make).
Use components 
This is an extension of the UNIX tool philosophy, which over many years led to the idea of microservices. The basic idea is the code that has the fewest bugs and the minimum development time is the code you don't have to write, since somebody else already wrote it!
Automate everything you can 
Automating routine workflows allows us to:
Turn them into code that itself can be put under version control and tested.
Remove the drudgery of these repeated tasks from humans and assign it to machines.
Document these workflows through the very code that automates them.
Zero-bug mindset 
When we embrace the idea of (very) small releases, and adopt automated testing, there is no need for us to have any "bug backlog" at all: the automated tests should catch most problems, and anything they miss, since our release was a very small addition to the previous code, will be easy to track down.
Develop resilience (be anti-fragile!) 
Nassim Taleb teaches in Tandon's finance department. He has developed the idea of anti-fragility. An application of that idea is Netflix's Chaos Monkey tool, part of its Simian Army toolkit.
Use cloud services 
This is really an extension of the "Use components" idea: why manage your own server room when there are pros at managing server rooms who will sell you their expertise at a reasonable cost?
 Lesson 2: DevOps at DRAFT

Trevor John on DevOps at DRAFT
Please watch this video, and pay special attention to the discussion about whether DevOps should be a job, or a way of working.

 Other Readings
Azure DevOps
Introducing Azure DevOps
Microsoft Azure DevOps: What You Need to Know
Azure Lounge
"""
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [20]:
#glossary
page = "glossary"
posturl = "gloss"
glossary = """
DevOps Course Glossary
Here is the glossary of terms for our course:

Agile Development: A method of developing software featuring frequent releases, adaptability to change, and close collaboration with customers.
Anti-Pattern: A set of characteristics of some code structure or development practice that have been found to be common and harmful.
Automated Testing: Testing through scripts rather than "by hand."
Azure DevOps: Microsoft's end-to-end DevOps solution.
Containerization: The process of packaging an application and everything it depends upon in a "container," then running it in its own virtual environment.
Continuous Integration: (CI) The process of integrating different developers' work with high regularity.
Continuous Deployment: (CD) Regularly deploying new work into production, usually with every push to the master branch on the origin source-code control server.
Docker: The most popular tool for containerizing applications.
DockerHub: A site hosting publically available Docker images.
git: Today's most popular version control system.
GitHub: A site hosting git repositories.
image: The static version of a Docker container: we run the image to get a container.
Infrastructure-as-Code: (IAC) Using code to set up and provision servers, network connections, and so on.
Jenkins: A CI/CD tool that automates building and testing a product.
Kanban: A tool for controlling work-in-progress and making visible who is working on what tasks.
Kubernetes: An open-source tool for orchestrating containers.
Lean Development: Similar to Agile Development; strong inheritance from Toyota Production Systems
linting: Cleaning up source code by running a 'lint' tool that catches common coding errors.
make: A venerable tool for automating builds.
Microservices: Tiny applications that do a single job, like authentication or scheduling deliveries.
Monitoring: Software that "watches" an application and sends out alerts at signs of trouble.
Pattern: A set of characteristics of some code structure or development practice that have been found to be common and beneficial.
repository: The 'database' of a version control system.
Slack: A popular tool for communication in development teams; also, the "empty" periods that are necessary in a schedule for anyone to do creative work.
StatusCake: The monitoring software we are using in our course.
Test-Driven Development: A method of developing software that writes tests for a new feature first, then writes the feature.
Toyota Production Systems: (TPS) A lean manufacturing system put in place at Toyota in the 1960s, featuring flexible production and employee empowerment.
Version Control: A tool that allows users to keep multiple versions of a file, revert to earlier versions, compare versions, and so on.
"""
js = generate_index_json(words,url,page,posturl)
import json
with open(posturl+'.json', 'w') as outfile:
    json.dump(js, outfile)

In [21]:
import os
files = [i for i in  os.listdir(".") if ".json" in i]
lst = []
for i in files:
    json_data=open(i).read()
    data = json.loads(json_data)
    for i in data:
        lst.append(i)
with open('data.json', 'w') as outfile:
    json.dump(lst, outfile)


In [23]:
json_data=open("data.json").read()
data = json.loads(json_data)


In [24]:
print(data)

[{u'info': {u'url': u'http://127.0.0.1:8000/devops/infra', u'page': u'infrassture as code', u'sentence': u'Infrastructure as CodeInfrastructure as code and Docker Lesson 1: What Is Infrastructure as Code?'}, u'title': u'code'}, {u'info': {u'url': u'http://127.0.0.1:8000/devops/infra', u'page': u'infrassture as code', u'sentence': u'Infrastructure as code: What is it?'}, u'title': u'code'}, {u'info': {u'url': u'http://127.0.0.1:8000/devops/infra', u'page': u'infrassture as code', u'sentence': u'Infrastructure as code (IaC) as the idea that, rather than manually provisioning servers, or setting up hardware through a point-and-click GUI, the "server room" should itself be managed by code.'}, u'title': u'code'}, {u'info': {u'url': u'http://127.0.0.1:8000/devops/infra', u'page': u'infrassture as code', u'sentence': u'That code can then be put under version control, tested, deployed with automated build tools, and so on.'}, u'title': u'code'}, {u'info': {u'url': u'http://127.0.0.1:8000/devop

In [32]:
lst = []
for i in data:
    lst.append((i["title"]))

In [33]:
mylist = list(set(lst))

In [34]:
for i in mylist:
    print(i)

longer
code
particularly
programming
indeed
results
judiciously
mistakes
wherever
devops
human
go
follow
still
fine
find
ground
record_payments
tables
render
instance
writes
hour_of_day
writing
languages
source
program
handle_yearly_taxes
send_bills
employee
include
might
easier
ought
happened
...
good
goes
stay
format
python
big
practice
db
possible
soon
creations
every
difficult
probably
using
bit
modifying
day
special
mod
easily
term
magic
name
taxes
called
--
unusual
assuming
absurdly
standards
ignore
applied
reasonable
contain
translated
found
aesthetic
works
take_input_of_employee_w2_and_calculate_employee_tax_rate
mean
mathematical
ignored
methods
coding
people
job
consistent
guidelines
humans
unless
conventions
focuses
examples
choosing
microservice
tri-mester
blocks
programmer
message
best
really
stands
space
away
since
please
per
current
create_tax_roll
written
convince
eliminate
correctly
reader
new
previous
crashing
ever
correct
concepts
coders
literary
never
models.py
team