Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Proposal: Connect a requirement to multiple refs #365

Closed
stanislaw opened this issue Jul 12, 2019 · 25 comments
Closed

Feature Proposal: Connect a requirement to multiple refs #365

stanislaw opened this issue Jul 12, 2019 · 25 comments

Comments

@stanislaw
Copy link
Member

stanislaw commented Jul 12, 2019

Hello,

First of all thank you for creating this tool! The idea of keeping the requirements close to the source code is absolutely amazing! Having discovered this tool, I am now looking into how I could convince my team to give it a try at least for some of our subprojects. The tool seems to have some rough edges when I am trying to model with it our current requirements tree but so far I have found nothing that would become a blocker.

Among other things, I would like to request the following three features. They are all related to each other and if the maintainers think they are worth to be implemented, I could volunteer to implement them once there will be a joint agreement on the concept.

Thank you.


Feature 1: A requirement can reference multiple files via ref, not just one file

As far as I see it, currently the ref feature connects a requirement to only one file.

Feature 2: CRC calculation for the referenced files

Additionally, the ref option could also create the CRC values for the files in a way how it is working for the links option. Here is the example of this could look like:

ref:
  - file: tests/test1.cpp
    checksum: abcdefgh1
  - file: tests/test2.cpp
    checksum: abcdefgh2

Feature 3: Possibility to reference folders, not just single files

This can be built on top of the features 1 and 2.

@stanislaw stanislaw changed the title Feature Proposal: Connect requirement to multiple refs Feature Proposal: Connect a requirement to multiple refs Jul 12, 2019
@jacebrowning
Copy link
Member

jacebrowning commented Jul 12, 2019

The current implementation of ref links to a line in a file (typically a comment added specifically for this purpose), not a whole file. When documents are published, Doorstop includes the location of all external references.

For example, the LLT document (https://github.com/jacebrowning/doorstop/tree/develop/doorstop/core/tests/docs) becomes: http://jacebrowning.github.io/doorstop/LLT.html

Does that change this feature request?

@stanislaw
Copy link
Member Author

stanislaw commented Jul 12, 2019

I think it does not change it. Using keywords that point to specific comments in a file would be too weak.

I am basing my experiments with ref on this part of the doc: https://doorstop.readthedocs.io/en/latest/reference/items/#ref (Example: Reference file).

We really want to be able to link a requirement to the multiple files and also have every link tightened by the checksum calculation.

@jacebrowning
Copy link
Member

jacebrowning commented Jul 12, 2019

Thanks! I had actually forgotten about the ability to link to a whole file. 😄

Can you describe the expected workflow for making a code change to a linked directory? It seems to me that if a requirement contains the checksum of a directory of code then that requirement will need to be reviewed every time a line of code is changed.

@sebhub
Copy link
Member

sebhub commented Jul 12, 2019

I think a more flexible ref attribute with a fingerprint would be great. This may help to monitor changes in the source code with respect to the requirement coverage. A reference could point to a file or a set of lines in a file with a fingerprint. When the content in this scope changes it invalidates the reference. A doorstop clear could make the references valid again.

For example you could have a test case specification as a Doorstop item, then someone writes the test case code and you add a reference to this code block to the item with a fingerprint. Two years later someone else modifies the test code. A doorstop validation would then notice that the test case specification may no longer correspond to the test case code.

@stanislaw
Copy link
Member Author

Can you describe the expected workflow for making a code change to a linked directory? It seems to me that if a requirement contains the checksum of a directory of code then that requirement will need to be reviewed every time a line of code is changed.

That's exactly how it is supposed to work. We want to make us developers to think about the requirements whenever something in the folder changes.

It is not always that you can point to a single line or even a file so that they are enough to satisfy a requirement.

The point of this is to link to the small folders (think small submodule).

@jacebrowning
Copy link
Member

Ok, I see the value in this!

I think that explicitly calling out which files fulfill the requirement might be cleaner than specifying a whole directory. What if someone adds a new file to the directory? What if someone updates just the changelog? What if the directory has subdirectories?

@stefanoco
Copy link

I'm very positive about this proposal, now that you describe it it's a missing one that could make a difference in the usage workflow. Just thinking if it may even make sense to have the ability to link and fingerprint to a set of lines of code. Expecially in the context of a controlled process (I'm thinking of Functional Safety) it would be hard to keep alway true that modifications of a file outside of the few lines under control do not interfere... think of a variable initialization as an example.

This is just to say that from my personal point of view, refs to a set of files would be the clean way.

Regarding fingerprinting: does it make sense to think of a git hash for a commit? Or is this too specific?

@sebhub
Copy link
Member

sebhub commented Jul 13, 2019

New features for the ref attribute should be backward compatible. So, in case the value of the ref attribute is a scalar, then the current behaviour should be ensured. The new features could be enabled by a list of dictionaries. For example:

ref:
- type: file # mandatory; type is an enumerator and allows future extensions
  path: path/to/file # mandatory
  stamp: fingerprint # optional
  lines: # optional; if not present, the entire file is used
  - begin: line # begin of a line range (inclusive)
    end: line # end of a line range (exclusive, optional)
    tolerance: line count # move count lines up and down to get a matching fingerprint (optional)
- type: directory # mandatory
  path: path/to/directory # mandatory
  stamp: fingerprint # optional

Calculating the stamp for a directory should use the os.walk() function and include the file path and the file content. Maybe include and exclude patterns could be added as well.

With respect to the stamp hash function, I think MD5 is not good enough. You can trust your YAML files, however, you may not trust the targets of a reference. It could be a download from the internet. I think for the reference stamps we should use SHA512.

@stanislaw
Copy link
Member Author

I think that explicitly calling out which files fulfill the requirement might be cleaner than specifying a whole directory. What if someone adds a new file to the directory? What if someone updates just the changelog? What if the directory has subdirectories?

The feature 3 (fingerprinting the whole directories) comes from the experience of seeing rather general requirements which are hard to link to a single file. Without this feature 3, you might end up including a rather significant number of files that all belong to the same folder in order to guarantee the consistency - I find it more flexible if the tool would support both "many files" and "folder".

With that regard, the features 1,2 and 3 are all very much satisfied by the example given by @sebhub in the above comment including being explicit about the of ref: file or directory (mandatory type field).

Can the backward compatibility be simply ensured by checking on the type of a ref:

  • if it is a string, the tool works as it works now
  • if it is an array the new behavior is enabled.

@jacebrowning
Copy link
Member

jacebrowning commented Jul 13, 2019

That all sounds good to me. In terms of implementation, I think each of these could be a separate PR:

  1. Allow ref to be a list and support {type: file, path: <str>} using the existing code
  2. Enable (optional?) fingerprinting of that file
  3. Support multiple files
  4. Support directories
  5. Support line ranges
  6. Support tolerances on line ranges

@stanislaw
Copy link
Member Author

stanislaw commented Jul 13, 2019

Fully agree that the feature can and should be split into a number of PRs. I will give it a try then, starting from the first one as you described it.

I have rarely done any Python programming but I think this will not block me and it will be possible to polish the code during the code review to make it idiomatic.

@stanislaw
Copy link
Member Author

stanislaw commented Jul 15, 2019

New features for the ref attribute should be backward compatible. So, in case the value of the ref attribute is a scalar, then the current behaviour should be ensured. The new features could be enabled by a list of dictionaries. For example:

@jacebrowning @sebhub

I am in the middle of the work (the latest changes are here). One bit of input that I need from you before finishing it and opening a PR is the following:

type: can be a file and later in the subsequent PRs a folder but what about the current behavior? Do we want to have what is current behavior in the new refs attribute as for example:

refs:
- type: keyword?
  path: explicit file path where the keyword is expected to be found
  keyword?: REQ123

Without explicit path I personally find this type of connection via keyword to be a very weak link because it is not supported by the explicit file path and crc and I want to hear your opinion.

I hope it is clear what I am asking.

@jacebrowning
Copy link
Member

The keyword-style "ref" is meant to mimic what I have seen done with other tools -- link test cases to test implementations using the name of the test. It's a weak association, but Doorstop's functionality is still better than most in that it will actually confirm the name is present in the codebase.

Having said that, I don't think we need to support "keyword" as a type at this time.

@sebhub
Copy link
Member

sebhub commented Jul 16, 2019

The keyword could be added to file type references in addition to the lines and stamp support. If keyword is present, then the keyword must be contained in the file content.

With respect to the type key. An alternative is to use file: path and directory: path keys that implicitly define the type. This would make the references a bit more compact.

@stanislaw
Copy link
Member Author

stanislaw commented Jul 18, 2019

I am stuck a bit with understanding how the CSV serialization for the ref field in case of many items should look like.

The closest bit of the current behavior I see relevant is the one related to links field:

def _tabulate(obj, sep=LIST_SEP, auto=False):
    ...
            elif key == 'links':
                # separate identifiers with a delimiter
                value = sep.join(uid.string for uid in item.links) # => 'SYS001\nSYS002:abc123'

And this is how it gets written to a csv file:

uid,level,text,ref,links,active,derived,normative,reviewed
REQ006,1.5,"Hello, world!","","SYS001
SYS002:abc123",True,False,True,c442316131ca0225595ae257f3b4583d

and this is how it gets deserialized back:

def _itemize(header, data, document, mapping=None):
    ...
            elif key == 'links':
                # split links into a list
                attrs[key] = _split_list(value)

The problem is that the structure with refs is a bit different:

refs:
- type: file
  path: path1
- type: file
  path: path2

which I don't see how to serialize. Even if we skip using the explicit type field like @sebhub suggests in the above comment, the other fields like stamp and other will be added.

Any advice how do people serialize nested dictionaries to the csv?


I could introduce a custom delimiter to keep separate between refs, something like:

type:file(custom delimiter)path:path1\ntype:file(custom delimiter)path:path2

but is this the right way to do? 😕

Also, the solution shall accommodate the support of the current behavior when ref is simply a string.

@jacebrowning
Copy link
Member

Perhaps we could use multiple newlines?

type:file\npath:path1\n\ntype:file\npath:path2

@sebhub
Copy link
Member

sebhub commented Jul 19, 2019

Serialization to/from CSV sounds like a lot of fun. I guess the multiple newline separator should work. First split by double newline to get the dictionaries, then split by single newline to get the key-value pairs.

@stanislaw
Copy link
Member Author

stanislaw commented Jul 19, 2019

I will do it with the double newlines. The only counter-argument I have in mind is that the CSV reads with lots of spaces then, and it is quite noticeable when you look at the test fixtures. But this is not a problem I think.

@sebhub
Copy link
Member

sebhub commented Jul 19, 2019

Are these real newlines or escaped newlines? Don't we need real newlines to separate the rows in the CSV table?

I have to throw something in to make it even more complicated. We should think about regular expressions as values. For example the keyword could be a regular expression.

For the directories I have another use case. For example you could have a requirement like this for requirements: R1: "Each requirement shall have a UUID." One option to implement this, could be to add a document setting, see #344. To verify this requirement you have to inspect all .doorstop.yml files. You can add a verification item as a child of R1. In this item you write: "By manual inspection of each .doorstop.yml files we ensured that each document has the right UUID setting.". In addition, you add a directory reference with a file pattern and a stamp:

ref:
- directory: .
  include: .doorstop.yml # see grep command
  stamp: abc

As soon as someone touches the settings or adds a new document, the verification item is invalidated.

@stanislaw
Copy link
Member Author

Are these real newlines or escaped newlines? Don't we need real newlines to separate the rows in the CSV table?

This is a good point and I will see how it works escaped. However, note that this is not how it currently works for the links parameter. Check this example:

REQ003,1.4,Unicode: -40° ±1%,REF123,"REQ001:foo1
REQ002:foo2",True,False,True,

in case when links has two items.

I.e. I do see the strings split by the \n character, it is just surrounded by the string quotes: "...".

@stanislaw
Copy link
Member Author

@jacebrowning , @sebhub

Now I am working on the XLS import/export tests.

Please check the screenshot: https://gofile.io/?c=iqaXPZ. Is this how you would expect the single/double newlines to work?

@stanislaw
Copy link
Member Author

stanislaw commented Jul 20, 2019

It looks like I am done with all of the changes except adapting the Desktop and Web interfaces. I will look into them now, starting with the Desktop one.

@jacebrowning
Copy link
Member

@stanislaw Can we insert actual newlines like we do for links?

@stanislaw
Copy link
Member Author

Just a heads up here: I need some time to get myself into how tkinter works and how to incorporate the "multiple refs" feature into existing UI. I cannot devote all of my spare time to this work but I am on it and will definitely keep you updated.

@stanislaw
Copy link
Member Author

Everything discussed here with the exception of the GUI changes has been implemented here: #423. I am closing this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants