Different Protocol Names In Study Sequence Cause An Error #501

Open
ptth222 opened this issue Aug 28, 2023 · 0 comments

ptth222 commented Aug 28, 2023

I initially modified a JSON example directly and found this issue, but I think showing it from the Tab side is clearer.

I modified the BII-I-1 Tab example so that the first culture has a different protocol than the rest. This validates and converts to JSON without issues. Converting that JSON back to Tab, however, fails with a KeyError caused by the different protocol.

Modified study and investigation files:
s_BII-S-1.txt
i_investigation.txt

Code:

import json

from isatools.convert import isatab2json, json2isatab

# Tab -> JSON: this step succeeds.
isa_json = isatab2json.convert('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/tab/BII-I-1_conversion_testing', use_new_parser=True)

with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing.json', 'w') as out_fp:
    json.dump(isa_json, out_fp, indent=2)

# JSON -> Tab: this step raises the KeyError shown below.
with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing.json') as file_pointer:
    json2isatab.convert(file_pointer, 'C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing/', validate_first=False)

Traceback:

Traceback (most recent call last):

  File "C:\Users\Sparda\AppData\Local\Temp\ipykernel_5600\1208495759.py", line 5, in <cell line: 4>
    json2isatab.convert(file_pointer, 'C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing/', validate_first=False)

  File "C:\Python310\lib\site-packages\isatools\convert\json2isatab.py", line 49, in convert
    isatab.dump(isa_obj=isa_obj, output_path=path, i_file_name=i_file_name,

  File "C:\Python310\lib\site-packages\isatools\isatab\dump\core.py", line 170, in dump
    write_study_table_files(investigation, output_path)

  File "C:\Python310\lib\site-packages\isatools\isatab\dump\write.py", line 134, in write_study_table_files
    df_dict[olabel][-1] = node.executes_protocol.name

KeyError: 'Protocol REF.growth protocol 2'

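I investigated the error. It seems to come from labeling process nodes by the name of the protocol they execute rather than by their position in the path, which is how sample nodes are labeled in the same section of code. As far as I can tell, the columns are built from the longest path only, so a path that executes a protocol not on that path has no matching column. I think I was able to fix it by changing the process node code to follow the sample node code.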
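To illustrate the failure mode, here is a minimal sketch that uses plain lists and dicts instead of isatools objects. Everything in it (`build_table`, `by_name`, `by_position`, the path contents, and the choice of longest path) is a hypothetical stand-in, not the library's code. It only shows why name-keyed labels can miss a column while position-keyed labels cannot.

```python
# Hypothetical stand-ins: each path is the list of protocol names it executes.
paths = [
    ["growth protocol 2"],  # the modified first culture
    ["growth protocol"],
    ["growth protocol"],
]
# Assumption: the path used to derive the columns is one of the unmodified ones.
longest_path = paths[1]


def build_table(paths, longest_path, label_for):
    """Derive columns from the longest path, then fill one row per path."""
    columns = [label_for(i, name) for i, name in enumerate(longest_path)]
    table = {col: [] for col in columns}
    for path in paths:
        for col in table:
            table[col].append("")  # add an empty row for this path
        for i, name in enumerate(path):
            # Raises KeyError when the label was never added as a column.
            table[label_for(i, name)][-1] = name
    return table


def by_name(position, name):
    """Label a process column by the protocol it executes."""
    return "Protocol REF.{}".format(name)


def by_position(position, name):
    """Label a process column by its position in the path."""
    return "Protocol REF.{}".format(position)


try:
    build_table(paths, longest_path, by_name)
    name_keyed_error = None
except KeyError as exc:
    name_keyed_error = exc.args[0]

positional = build_table(paths, longest_path, by_position)
print(name_keyed_error)
print(positional)
```

With name-keyed labels the first path looks up a column that was never created. With positional labels every path writes into the same `Protocol REF.0` column regardless of which protocol it executes.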

New Code:

        sample_in_path_count = 0
        protocol_in_path_count = 0
        longest_path = _longest_path_and_attrs(paths, s_graph.indexes)
        
        for node_index in longest_path:
            node = s_graph.indexes[node_index]
            if isinstance(node, Source):
                olabel = "Source Name"
                columns.append(olabel)
                columns += flatten(
                    map(lambda x: get_characteristic_columns(olabel, x),
                        node.characteristics))
                columns += flatten(
                    map(lambda x: get_comment_column(
                        olabel, x), node.comments))
            elif isinstance(node, Process):
                olabel = "Protocol REF.{}".format(protocol_in_path_count)
                columns.append(olabel)
                protocol_in_path_count += 1
                if node.executes_protocol.name not in protnames.keys():
                    protnames[node.executes_protocol.name] = protrefcount
                    protrefcount += 1
                columns += flatten(map(lambda x: get_pv_columns(olabel, x),
                                       node.parameter_values))
                if node.date is not None:
                    columns.append(olabel + ".Date")
                if node.performer is not None:
                    columns.append(olabel + ".Performer")
                columns += flatten(
                    map(lambda x: get_comment_column(
                        olabel, x), node.comments))

            elif isinstance(node, Sample):
                olabel = "Sample Name.{}".format(sample_in_path_count)
                columns.append(olabel)
                sample_in_path_count += 1
                columns += flatten(
                    map(lambda x: get_characteristic_columns(olabel, x),
                        node.characteristics))
                columns += flatten(
                    map(lambda x: get_comment_column(
                        olabel, x), node.comments))
                columns += flatten(map(lambda x: get_fv_columns(olabel, x),
                                       node.factor_values))


        omap = get_object_column_map(columns, columns)
        # load into dictionary
        df_dict = dict(map(lambda k: (k, []), flatten(omap)))

        for path_ in paths:
            for k in df_dict.keys():  # add a row per path
                df_dict[k].extend([""])

            sample_in_path_count = 0
            protocol_in_path_count = 0
            for node_index in path_:
                node = s_graph.indexes[node_index]
                if isinstance(node, Source):
                    olabel = "Source Name"
                    df_dict[olabel][-1] = node.name
                    for c in node.characteristics:
                        category_label = c.category.term if isinstance(c.category.term, str) \
                            else c.category.term["annotationValue"]
                        clabel = "{0}.Characteristics[{1}]".format(
                            olabel, category_label)
                        write_value_columns(df_dict, clabel, c)
                    for co in node.comments:
                        colabel = "{0}.Comment[{1}]".format(olabel, co.name)
                        df_dict[colabel][-1] = co.value

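                elif isinstance(node, Process):
                    olabel = "Protocol REF.{}".format(
                        protocol_in_path_count)
                    # Advance the counter so a second process on the same
                    # path writes to the next column, as in the column pass.
                    protocol_in_path_count += 1
                    df_dict[olabel][-1] = node.executes_protocol.name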
                    for pv in node.parameter_values:
                        pvlabel = "{0}.Parameter Value[{1}]".format(
                            olabel, pv.category.parameter_name.term)
                        write_value_columns(df_dict, pvlabel, pv)
                    if node.date is not None:
                        df_dict[olabel + ".Date"][-1] = node.date
                    if node.performer is not None:
                        df_dict[olabel + ".Performer"][-1] = node.performer
                    for co in node.comments:
                        colabel = "{0}.Comment[{1}]".format(olabel, co.name)
                        df_dict[colabel][-1] = co.value

                elif isinstance(node, Sample):
                    olabel = "Sample Name.{}".format(sample_in_path_count)
                    sample_in_path_count += 1
                    df_dict[olabel][-1] = node.name
                    for c in node.characteristics:
                        category_label = c.category.term if isinstance(c.category.term, str) \
                            else c.category.term["annotationValue"]
                        clabel = "{0}.Characteristics[{1}]".format(
                            olabel, category_label)
                        write_value_columns(df_dict, clabel, c)
                    for co in node.comments:
                        colabel = "{0}.Comment[{1}]".format(olabel, co.name)
                        df_dict[colabel][-1] = co.value
                    for fv in node.factor_values:
                        fvlabel = "{0}.Factor Value[{1}]".format(
                            olabel, fv.factor_name.name)
                        write_value_columns(df_dict, fvlabel, fv)

This is approximately lines 64-167 in isatools\isatab\dump\write.py in the write_study_table_files function. The changed code no longer errors and the converted study Tab from the JSON looks correct to me.
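One way to make "looks correct" a bit more concrete is to compare the Protocol REF values of the original and round-tripped study tables. The sketch below assumes the tables are tab-delimited text with a header row containing one or more `Protocol REF` columns. The helper `protocol_ref_rows` and the sample table contents are hypothetical and only stand in for the real files.

```python
import csv
import io


def protocol_ref_rows(tab_text):
    """Return the sorted Protocol REF values of each row in a study table."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    header = next(reader)
    # A table may hold several Protocol REF columns, so keep every match.
    indexes = [i for i, name in enumerate(header) if name == "Protocol REF"]
    # Sort so that a different row order after conversion does not matter.
    return sorted([row[i] for i in indexes] for row in reader)


# Hypothetical table contents standing in for the original and converted files.
original = (
    'Source Name\tProtocol REF\tSample Name\n'
    '"culture1"\t"growth protocol 2"\t"sample1"\n'
    '"culture2"\t"growth protocol"\t"sample2"\n'
)
round_tripped = (
    'Source Name\tProtocol REF\tSample Name\n'
    '"culture2"\t"growth protocol"\t"sample2"\n'
    '"culture1"\t"growth protocol 2"\t"sample1"\n'
)

same_protocols = protocol_ref_rows(original) == protocol_ref_rows(round_tripped)
print(same_protocols)
```

This only checks the protocol columns. It does not compare characteristics, factor values, or comments.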

ptth222 added a commit to ptth222/isa-api that referenced this issue Sep 6, 2023
Changed write_study_table_files and write_assay_table_files to count the protocol nodes instead of naming them by the protocol executed. Addresses issue ISA-tools#501.

proccaserra mentioned this issue Oct 23, 2023