In [44]:
class Story:
    """
    Story contains all known data about a given user story.
    """
    def __init__(self, number, title, role, action, goal):
        self.number = number
        self.title = title
        self.role = role
        self.action = action
        self.goal = goal
        
    def __repr__(self):
        return self.__str__()
        
    def __str__(self):
        return ("Story " + str(self.number) + ": " + self.title +
               " As a " + self.role + ", I would like to " + self.action +
               " so that: " + self.goal)

Now I set up the regex to parse the file (in the above formats) with.

In [45]:
# This set of regex matches the format above, 
# extracting (each section corresponds to a captured element)
# 1. The number (X above)
# 2. The story's title
# 3. User role, action, or goal
re_num = "(?:US)(\d+)"
re_title = "(?:\W*)(.+)"
re_detail = "(?:[\w\s]+:\s*)(.+)"
# Each index of the array will be what we search for.
searches = [re_num + re_title, re_detail, re_detail, re_detail]

Finally, we set up the file for parsing and get an array of stories.

In [46]:
import re
import sys
stories = []

curr_story = []
index = 0


with open("stories.txt") as s:
    for line in s:
        # can't capture anything
        if len(line) is 0 or line.isspace():
            pass
        # try and parse the line
        else:
            r = re.search(searches[index], line)
            if r == None:
                sys.exit("Failed to parse line.")
            curr_story.append(r.groups(0)[0])
            if index == 0:
                curr_story.append(r.groups(1)[1])
            index += 1
            if index >= 4:
                stories.append(Story(int(curr_story[0]), *curr_story[1:]))
                index = 0
                curr_story = []

Now we have an array of stories. Here's an example of the result:

In [18]:
print(stories[0])

Story 1: User abilities - super-duper-user As a system user, I would like to be a super-duper-user so that: I have access to all system abilities, including the ability to view restricted content, and to assign roles


That's a lot of text, but it doesn't tell us much.

---

## Natural Language Processing
For this task (and lots of others), I'll be employing a library called [spaCy](https://spacy.io). spaCy could be better referred to as an NLP toolkit, written in Cython. This allows it to take advantage of the simplicity and expressiveness of Python while still utilizing the blazing speed of compiled C.

Here's a small piece of text we'll analyze.

In [1]:
text = """On Unix and Unix-like computer operating systems, a zombie process or defunct process is a process that has completed execution (via the exit system call) but still has an entry in the process table: it is a process in the "Terminated state". This occurs for child processes, where the entry is still needed to allow the parent process to read its child's exit status: once the exit status is read via the wait system call, the zombie's entry is removed from the process table and it is said to be "reaped". A child process always first becomes a zombie before being removed from the resource table. In most cases, under normal system operation zombies are immediately waited on by their parent and then reaped by the system – processes that stay zombies for a long time are generally an error and cause a resource leak.
The term zombie process derives from the common definition of zombie — an undead person. In the term's metaphor, the child process has "died" but has not yet been "reaped". Also, unlike normal processes, the kill command has no effect on a zombie process.
Zombie processes should not be confused with orphan processes: an orphan process is a process that is still executing, but whose parent has died. These do not remain as zombie processes; instead, (like all orphaned processes) they are adopted by init (process ID 1), which waits on its children. The result is that a process that is both a zombie and an orphan will be reaped automatically."""

In [15]:
import spacy

nlp = spacy.load('en')

parsed = [nlp(s.action) for s in stories]

for i in range(len(stories)):
    print(parsed[0], "compared with", parsed[i])
    print(parsed[0].similarity(parsed[i]))

be a super-duper-user compared with be a super-duper-user
0.999999987867
be a super-duper-user compared with be able to designate roles to user accounts
0.360403269612
be a super-duper-user compared with be a super-user
0.903838203828
be a super-duper-user compared with add users to a workgroup
0.428586439198
be a super-duper-user compared with hide objects
0.361153475569
be a super-duper-user compared with set different levels of editing rights based on individual user(s)
0.372441500027
be a super-duper-user compared with embargo content
0.311382344599
be a super-duper-user compared with have access to the master files
0.425561429019
be a super-duper-user compared with designate public access levels for workgroup content once it is published
0.442165713863
be a super-duper-user compared with control which users can send submissions to me
0.216929744028
be a super-duper-user compared with create a workgroup
0.48377817286
be a super-duper-user compared with create a workgroup of files o