Stages in different columns #1

Open
ghost opened this issue Dec 15, 2020 · 3 comments

Comments

ghost commented Dec 15, 2020

Hello and thank you for pySPA!
Is there a way to import the export_to_csv() results back into Python so that the stages end up in different columns? I've used pd.read_csv('file.csv'), and all results are given in a single column.

ghost commented Dec 16, 2020

If multiregional=True is set in export_to_csv(), the region is appended to the sector in brackets. Hence, the string in that column can be split at the position of the brackets :) ... just a workaround though. Alternatively, a specific separator string can be inserted in def print_pathway() in the line with if multiregional.
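
The bracket-splitting workaround could look roughly like the sketch below. It assumes the label has the form "sector (region)" with round brackets at the very end; the helper name split_sector_region and the sample labels are made up for illustration and are not part of pySPA.

```python
import re

# Assumed label format: "<sector> (<region>)", with the region in the
# last pair of round brackets. Adjust the pattern if the export differs.
_LABEL = re.compile(r"^(?P<sector>.*)\s\((?P<region>[^()]*)\)$")


def split_sector_region(label):
    """Return (sector, region); region is None if no trailing bracket is found."""
    text = label.strip()
    match = _LABEL.match(text)
    if match is None:
        return text, None
    return match.group("sector").strip(), match.group("region")


print(split_sector_region("Cattle farming (AT)"))
print(split_sector_region("Cattle farming"))
```

Anchoring on the last bracket pair keeps sector names that themselves contain brackets intact.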

rrodorr commented Apr 26, 2022

The issue seems to be that the CSV output wraps each entire row in quotation marks, rather than just the individual row elements. The raw output looks like this:

"Target sector: 	Cattle farming"
"Sector ID: 	824"
"Number of Regions in input data: 	49"
"Number of sectors in A matrix: 	7987"
"Total number of pathways extracted: 	457"
"Stages analysed: 	8"

while it should look like this:

"Target sector:" 	"Cattle farming"
"Sector ID:" 	"824"
"Number of Regions in input data:" 	"49"
"Number of sectors in A matrix:" 	"7987"
"Total number of pathways extracted:" 	"457"
"Stages analysed:" 	"8"

or e.g. like this (delimiter ;):

Target sector:;Cattle farming
Sector ID:;824
Number of Regions in input data:;49
Number of sectors in A matrix:;7987
Total number of pathways extracted:;457
Stages analysed:;8

If you take a look at the source code, it seems the row elements are deliberately joined for CSV export (cf. if to_csv). I don't quite understand why. Maybe to fix some issue that had occurred downstream? Granted, the CSV output has a varying number of columns, but Excel displays this correctly nonetheless. To me, this presents a considerable hurdle for working with the output data. I am now thinking about parsing the CSV text, replacing the \t with ; delimiters, and dropping the row-wrapping (instead of element-wrapping) quotation marks in the original CSV file.

def _generate_spa_title_block(self, flow, to_csv=True or False):
        """
        Creates a title block for the SPA results produced for each flows.
        :param flow: flow being analysed
        :param to_csv: boolean flag referencing whether the data is to be saved to csv file or not
        :return: title block of the SPA results for a particular flow
        """
        try:
            flow_unit = self.flows_dict[flow].unit
        except AttributeError:
            raise AttributeError('The flow %s is not defined' % flow)

        spa_title_block_list = [
            ['flow analysed:', flow],
            ['unit:',
             "%s/%s" % (flow_unit, self.root_node.get_node_attribute(self.sector_definition_dict, 'Unit'))],
            ['thresholds:', str(self.thresholds_dict[flow])],
            ['direct:', str(self.root_node.direct_intensities[flow]), "%s/%s" %
             (flow_unit, self.root_node.get_node_attribute(self.sector_definition_dict, 'Unit'))],
            ['total:', str(self.root_node.total_intensities[flow]), "%s/%s" %
             (flow_unit, self.root_node.get_node_attribute(self.sector_definition_dict, 'Unit'))],
            ['% of total covered by SPA:', "{:.2%}".format(self.get_coverage_of(flow)),
             'Note: Value may differ from sum of percentages in the table due to rounding']
        ]
        if to_csv:
            new_spa_title_block_list = []
            for row in spa_title_block_list:
                new_row = '\t'.join(row)
                new_spa_title_block_list.append(new_row)
            return new_spa_title_block_list
        else:
            return spa_title_block_list
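
The clean-up described above (unwrapping the row-level quotes and switching from \t to ; delimiters) might be sketched as follows. This is only a sketch: it assumes every row of the file is a single quoted field whose cells are joined by tabs, as in the raw output shown earlier. The function names unwrap_rows and to_delimited are hypothetical, not pySPA functions.

```python
import csv
import io


def unwrap_rows(raw_text):
    """Turn rows of the form "a\tb\tc" (one quoted field) into lists of cells."""
    rows = []
    for record in csv.reader(io.StringIO(raw_text)):
        if not record:
            rows.append([])  # keep blank separator lines as empty rows
            continue
        # Assumption: the whole row sits in the first (and only) field.
        rows.append([cell.strip() for cell in record[0].split("\t")])
    return rows


def to_delimited(rows, delimiter=";"):
    """Write the rows back out with element-level delimiters."""
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


raw = '"Target sector: \tCattle farming"\n"Sector ID: \t824"\n'
print(to_delimited(unwrap_rows(raw)))
```

Stripping each cell removes the trailing space before the tab, so "Target sector: " becomes "Target sector:". The csv writer only adds quotes around cells that contain the delimiter or a quote character.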

Geoffrey-Guest commented

thanks for the package guys! I just applied it to ecoinvent.

Here's what I did to create a column-wise df:

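import pandas as pd

# Each CSV row is a single quoted, tab-joined string, so read_csv returns one column.
spaResults = pd.read_csv("spa_results.csv")
spaSummary = spaResults[0:16]  # summary rows that precede the pathway table in this file
spaTable_header = spaResults.loc[17].tolist()[0].split("\t")  # header row of the pathway table
spaPaths = spaResults[18:].reset_index(drop=True)
spaPaths = spaPaths[spaPaths.columns[0]].tolist()

# Split each row on tabs and pad shorter rows with '' up to 18 fields so all rows have equal width.
spaTable = pd.DataFrame([(x.split('\t')+['']*(18-len(x.split('\t')))) for x in spaPaths], columns=spaTable_header)

spaTable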
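
If the number of stages differs between exports, the hard-coded width of 18 could be replaced by the length of the longest row. A minimal sketch, with a hypothetical helper pad_rows and made-up sample lines:

```python
def pad_rows(rows, fill=""):
    """Pad every row with `fill` so that all rows match the longest one."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


# Made-up tab-joined lines standing in for the pathway rows of an export.
lines = ["Stage 1\tStage 2\tStage 3", "A\tB", "C"]
table = pad_rows([line.split("\t") for line in lines])
print(table)
```

The padded rows can then be passed to pandas, e.g. pd.DataFrame(table[1:], columns=table[0]), provided the first row holds the header.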
