superstructure write to BW: ValueError: This sheet is too large! #70

Closed
simb-sdu opened this issue Apr 5, 2022 · 9 comments

@simb-sdu
Collaborator

simb-sdu commented Apr 5, 2022

When I run ndb.write_superstructure_db_to_brightway(), I get the following error.
This is a freshly installed environment with premise 1.0.8.

Prepare database 1.
Prepare database 2.
Prepare database 3.
Looping through scenarios to detect changes...
Export a scenario difference file.
Dropped 852398 duplicates.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 ndb.write_superstructure_db_to_brightway()

File ~\Anaconda3\envs\premise\lib\site-packages\premise\ecoinvent_modification.py:952, in NewDatabase.write_superstructure_db_to_brightway(self, name, filepath)
    949     print(f"Prepare database {scen + 1}.")
    950     scenario["database"] = self.prepare_db_for_export(scenario)
--> 952 self.database = build_superstructure_db(
    953     self.database, self.scenarios, db_name=name, fp=filepath
    954 )
    956 print("Done!")
    958 self.database = check_for_duplicates(self.database)

File ~\Anaconda3\envs\premise\lib\site-packages\premise\utils.py:440, in build_superstructure_db(origin_db, scenarios, db_name, fp)
    437 after = len(df)
    438 print(f"Dropped {before - after} duplicates.")
--> 440 df.to_excel(filepath, index=False)
    442 print(f"Scenario difference file exported to {filepath}!")
    444 list_modified_acts = list(
    445     set([e[0] for e, v in modified.items() if v["original"] == 0])
    446 )

File ~\Anaconda3\envs\premise\lib\site-packages\pandas\core\generic.py:2357, in NDFrame.to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep, verbose, freeze_panes, storage_options)
   2344 from pandas.io.formats.excel import ExcelFormatter
   2346 formatter = ExcelFormatter(
   2347     df,
   2348     na_rep=na_rep,
   (...)
   2355     inf_rep=inf_rep,
   2356 )
-> 2357 formatter.write(
   2358     excel_writer,
   2359     sheet_name=sheet_name,
   2360     startrow=startrow,
   2361     startcol=startcol,
   2362     freeze_panes=freeze_panes,
   2363     engine=engine,
   2364     storage_options=storage_options,
   2365 )

File ~\Anaconda3\envs\premise\lib\site-packages\pandas\io\formats\excel.py:875, in ExcelFormatter.write(self, writer, sheet_name, startrow, startcol, freeze_panes, engine, storage_options)
    873 num_rows, num_cols = self.df.shape
    874 if num_rows > self.max_rows or num_cols > self.max_cols:
--> 875     raise ValueError(
    876         f"This sheet is too large! Your sheet size is: {num_rows}, {num_cols} "
    877         f"Max sheet size is: {self.max_rows}, {self.max_cols}"
    878     )
    880 formatted_cells = self.get_formatted_cells()
    881 if isinstance(writer, ExcelWriter):

ValueError: This sheet is too large! Your sheet size is: 2959412, 16 Max sheet size is: 1048576, 16384
@romainsacchi
Collaborator

Hi @simb-sdu, can you tell me the exact scenarios you tried to produce?

@simb-sdu
Collaborator Author

simb-sdu commented Apr 6, 2022

Hi Romain, it is EI 3.8 SSP2-base for the years 2020, 2050, and 2100

I had this issue before, but it seemingly disappeared on its own. Now, running the same code on a new PC raises it again.

Yesterday, I changed the value of max_cols in ~\Anaconda3\envs\premise\lib\site-packages\pandas\io\formats\excel.py and got it working. But I am surprised that this is necessary.
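
As an alternative to editing the pandas source, the sheet size can be checked against the worksheet limits before anything is written. The following is only a minimal sketch: the helper names fits_in_xlsx and choose_suffix are hypothetical and not part of premise or pandas, the limits are the ones quoted in the error message above, and the comparison mirrors the check shown in the pandas traceback.

```python
# Worksheet limits, as reported in the ValueError above.
XLSX_MAX_ROWS = 1_048_576
XLSX_MAX_COLS = 16_384


def fits_in_xlsx(shape, max_rows=XLSX_MAX_ROWS, max_cols=XLSX_MAX_COLS):
    """Return True if a table of the given (rows, cols) shape fits in one sheet.

    Hypothetical helper; it mirrors the comparison in the traceback:
    num_rows > max_rows or num_cols > max_cols raises the error.
    """
    num_rows, num_cols = shape
    return num_rows <= max_rows and num_cols <= max_cols


def choose_suffix(shape):
    """Pick a file suffix: .xlsx if the table fits, otherwise .csv (assumed fallback)."""
    return ".xlsx" if fits_in_xlsx(shape) else ".csv"


# The sheet size from the error message: 2959412 rows, 16 columns.
print(fits_in_xlsx((2959412, 16)))   # False
print(choose_suffix((2959412, 16)))  # .csv
```

With a pandas DataFrame df, this could be called as fits_in_xlsx(df.shape) before df.to_excel(...), falling back to df.to_csv(...) otherwise. Whether a CSV file is accepted further down the workflow is not something this thread confirms.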

@romainsacchi
Collaborator

Yes, I think premise produces Excel files that are unnecessarily long.
Some rows in the Excel file should not be there.
I will look into it and come back with a fix.

@romainsacchi
Collaborator

Hi @simb-sdu,
I think I fixed the issue, but I am not entirely sure. Hence, I'd be grateful if you could test it for me before I make the release public.
If you are willing, then you can:

  • uninstall premise (pip uninstall premise if you used pip)
  • install the current development version of premise by doing pip install git+https://github.com/polca/premise.git
  • check that you got the proper version: it should say 1.0.9
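
One way to confirm which version ended up in the environment is to query the installed package metadata from Python. This is only a sketch using the standard library; the helper name installed_version is hypothetical, and it assumes the distribution is registered under the name "premise".

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper built on the standard-library importlib.metadata.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumes the distribution name is "premise"; prints None if it is not installed.
print(installed_version("premise"))
```

After installing the development version as described above, the printed value would be expected to read 1.0.9.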

@romainsacchi
Collaborator

Otherwise, I will also try it on my end in the meantime.

@romainsacchi
Collaborator

Hi @simb-sdu,
it seems to work fine now. I reproduced the three scenarios you mentioned, and the Excel file is now "only" 353'000 rows long and 17 MB in size. So I will push the 1.0.9 release to PyPI and consider the issue closed unless you experience something different on your end.

@simb-sdu
Collaborator Author

simb-sdu commented Apr 8, 2022


Hi Romain

Thanks for looking into this. Sounds like a great improvement in efficiency, from about 3M rows to 0.35M rows!
And sorry for not being able to test before 1.0.9 was pushed; I have a lot going on at the moment.

I just ran pip install --upgrade premise, but now I get an error when running import premise or from premise import *:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 from premise import *
      2 import brightway2 as bw

File ~\Anaconda3\envs\PREMISE107\lib\site-packages\premise\__init__.py:9, in <module>
      6 DATA_DIR = Path(__file__).resolve().parent / "data"
      7 INVENTORY_DIR = Path(__file__).resolve().parent / "data" / "additional_inventories"
----> 9 from .ecoinvent_modification import NewDatabase, clear_cache

File ~\Anaconda3\envs\PREMISE107\lib\site-packages\premise\ecoinvent_modification.py:65, in <module>
     63 from .cement import Cement
     64 from .clean_datasets import DatabaseCleaner
---> 65 from .custom import (
     66     Custom,
     67     check_custom_scenario,
     68     check_inventories,
     69     detect_ei_activities_to_adjust,
     70 )
     71 from .data_collection import IAMDataCollection
     72 from .electricity import Electricity

File ~\Anaconda3\envs\PREMISE107\lib\site-packages\premise\custom.py:7, in <module>
      5 import wurst
      6 import yaml
----> 7 from schema import And, Optional, Or, Schema, Use
      9 from .ecoinvent_modification import (
     10     LIST_IMAGE_REGIONS,
     11     LIST_REMIND_REGIONS,
     12     SUPPORTED_EI_VERSIONS,
     13 )
     14 from .transformation import *

ModuleNotFoundError: No module named 'schema'

@romainsacchi
Collaborator

@simb-sdu sorry about that. In the meantime, you can fix it by installing schema: pip install schema.
I will push a fix.
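
To see ahead of time which third-party modules are missing from an environment, the import system can be queried without actually importing anything. The sketch below is an illustration only: the helper name missing_modules is hypothetical, and the module names are the ones that custom.py imports in the traceback above.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment.

    Hypothetical helper; find_spec returns None for a top-level module
    that is not installed.
    """
    return [name for name in names if find_spec(name) is None]


# Modules imported at the top of premise/custom.py, per the traceback above.
print(missing_modules(["wurst", "yaml", "schema"]))
```

Any name that is printed would still need to be installed, for example with pip install schema as suggested above.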

@simb-sdu
Collaborator Author

simb-sdu commented Apr 8, 2022

Thank you!
