Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move emissions definitions to single csv #135

Merged
merged 20 commits into from Oct 16, 2018
Merged

Conversation

znicholls
Copy link
Collaborator

@znicholls znicholls commented Sep 21, 2018

Move emissions definitions

This pull request moves all the definitions of MAGICC emissions species, their units and which files they belong in to a single csv.

The first column is the emissions species. These follow MAGICC7's internal conventions (and are mapped to MAGICC6 variables as necessary in other functions).

The second column is the units. They are written in a way which is pint compatible (although no tests have been added to show this yet, that is for the next pull request). This means that we have to put spaces between the mass, the time and the thing which is being emitted which looks a bit yuck but is the only way pint can parse the emissions.

The next three columns define where these species go when being written to MAGICC6 input files (note that in MAGICC7 you can simply put everything in a SCEN7 file so this extra layer of definition isn't required). For example, all the variables with True in the 'prn emms' column belong in '.prn' files. For SCEN files, MAGICC6 has an internal convention which means we have to do even more work. At the top of every MAGICC6 compatible .SCEN file there is a two digit number (in the current pymagicc/__init__.py its called region_code but it's actually more than that). The first digit tells MAGICC how many regions data is being provided for and hence isn't relevant to this PR. The second digit tells MAGICC which gases are in the SCEN file. This is referred to as the 'SCEN emms code'. Hence the column which has the header 'SCEN emms code 1' defines the gases which are expected when that 'SCEN emms code' is 1. Similarly, the column which has the header 'SCEN emms code 0' defines the gases which are expected when that 'SCEN emms code' is 0. Having these definitions allows pymagicc to check that the right set of emissions has been provided before writing SCEN files.

Please confirm that this pull request has done the following:

  • Tests added (N/A as simply refactoring)
  • Documentation added (need help here)
  • Example added (either to an existing notebook or as a new notebook) (N/A as only refactoring)
  • Description in CHANGELOG.md added (Need help here too, depends a bit on what goes in docs)

Adding to CHANGELOG.md

Please add a single line in the changelog notes similar to the following:

- (#XX)[http://link-to-pr.com] Added feature which does something

@codecov-io
Copy link

codecov-io commented Sep 21, 2018

Codecov Report

Merging #135 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #135      +/-   ##
==========================================
+ Coverage   96.85%   96.87%   +0.02%     
==========================================
  Files           4        4              
  Lines         889      895       +6     
  Branches      123      122       -1     
==========================================
+ Hits          861      867       +6     
  Misses         16       16              
  Partials       12       12
Impacted Files Coverage Δ
pymagicc/definitions/__init__.py 100% <100%> (ø) ⬆️
pymagicc/io.py 97.29% <100%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb8bd9c...a5bbe62. Read the comment docs.

@rgieseke
Copy link
Member

Sorry, but i have trouble reviewing because i don't know what does this PR do/change?

I still don't understand what the emms codes do.
Apart from that, cosmetic questions ... Why upper/lower case for booleans? Is the unit format specific to anything, Pint? Personally, I'd prefer writing it without spaces e.g. "GtC/yr" which seems more common to me.

@znicholls znicholls changed the title Make emissions units more sensible Move emissions definitions to single csv Sep 21, 2018
@znicholls
Copy link
Collaborator Author

Sorry, but i have trouble reviewing because i don't know what does this PR do/change?

Sorry was being super lazy, try again now with a proper description

Why upper/lower case for booleans?

Not sure why that's that way, happy to change to whatever you'd like.

@rgieseke
Copy link
Member

Awesome, that makes sense now! What do you want me to do, create a file where this go into the documentation?

@znicholls
Copy link
Collaborator Author

What do you want me to do, create a file where this go into the documentation?

haha ideally decide what the best system is

Other options we could also do:

  1. save the emissions units definitions file with data package so we have a useful json file which describes it
  2. make the _get_scen_special_code function public and put the docs in there

@rgieseke
Copy link
Member

I like both proposals! Should I push right to this branch and you tweak as you see fit?

@znicholls
Copy link
Collaborator Author

Should I push right to this branch and you tweak as you see fit?

Yep definitely

@rgieseke
Copy link
Member

Pushed an example of how this could look like using the Data Package format.

One question though, are the definitions used elsewhere, potentially later?

Why don't we just define

pymagicc.definitions.scen_emms_code_1 = [
['CO2I',
 'CO2B',
 'CH4',
 'N2O',
 'SOX',
 'NOX',
 'CO',
 'BC',
 'OC',
 'NMVOC',
 'NH3',
 'CF4',
 'C2F6',
 'C6F14',
 'HFC23',
 'HFC32',
 'HFC4310',
 'HFC125',
 'HFC134A',
 'HFC143A',
 'HFC227EA',
 'HFC245FA',
 'SF6']

@rgieseke
Copy link
Member

If we use the Data Package format we can do validation of the data quite easily, e.g. a mis-typed "TRU" for the boolean values would give

data validate .
> Error! Validation has failed for "magicc_emisssions_units"
> Error! There are 1 type and format mismatch errors on line 72
> Error! The value "TRU" in column "prn emms" is not type "boolean" and format "default"

This could become part of the CI. Same could be done to define allowed numbers or ranges etc.

@znicholls
Copy link
Collaborator Author

Why don't we just define

I avoid this because I want to put the units right next to the emissions. I also want to have the emissions defined in only one place (rather than once next to the emissions, once for scen code 0, once for scen code 1 etc.). This makes it much easier to keep track of things (as they're only defined in one place) and also update should we ever decide to use a more sensible naming in pymagicc (and then simply map to MAGICC's internal variables in the writers like is currently done for SCEN files).

If we use the Data Package format we can do validation of the data quite easily, e.g. a mis-typed "TRU" for the boolean values would give

Super nice

@rgieseke
Copy link
Member

Good points, I understand now. I'll convert the rest of CSV reading to use the Data Package logic.

@znicholls
Copy link
Collaborator Author

Have just gone through this and updated.

I didn't think about the fact that _ScenWriter isn't public so making get_special_scen_code public doesn't actually help... I'd be tempted to leave it as a private method in this case...

@znicholls
Copy link
Collaborator Author

Alright @rgieseke the thing blocking this is how we get the information in pymagicc/definitions/datapackage.json into the docs somehow. Do you have any know-how/ideas? Worst case we write a custom writer which generates a docs page from the json file?

@znicholls
Copy link
Collaborator Author

While we're here, do you have a link which explains what the data package standard is/where people should look to find out more?

@rgieseke
Copy link
Member

@rgieseke
Copy link
Member

Either a custom script to write out a .rst file or something with Sphinx templates I guess: http://www.sphinx-doc.org/en/master/templating.html

What do you have in mind? A Table?

@znicholls
Copy link
Collaborator Author

What do you have in mind? A Table?

Ye or even just a series of dot points in the pymagicc.definitions.rst page so people who see these csvs can get a quick overview of what they are.

@znicholls
Copy link
Collaborator Author

What do you have in mind? A Table?

I'll write something now.

@znicholls
Copy link
Collaborator Author

@rgieseke that was harder than I thought it would be but I think it's ready to go

@rgieseke
Copy link
Member

That looks super nice, what is the magicc that makes the variables appear, just documenting wth a doc string?

Only thing i would review is the naming of columns, it's a mix of lower, upper, spaces and underscores now.
For column names that are easy to read in i like to have names that are distinguishable wirh Pandas DataFrames, so starting with UpperCase. But i'm not sure if some of the names have meaning from MAGICC already ...

@znicholls
Copy link
Collaborator Author

That looks super nice, what is the magicc that makes the variables appear, just documenting wth a doc string?

Ye as shown here https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html

starting with UpperCase

Good point, to clarify: would we do "MAGICC Variable" or "MAGICC variable" or something else?

@rgieseke
Copy link
Member

After discussion with @znicholls let's go for

all lower case, underscores
magicc_variable
emissions_unit

scenfile_region_code
scenfile_emissions_code

part_of_scenfile_with_emissions_code_0
part_of_scenfile_with_emissions_code_1

part_of_prnfile

@znicholls
Copy link
Collaborator Author

@rgieseke all yours

@rgieseke
Copy link
Member

I've changed the bools to look Pythonic otherwise thanks a lot!

@rgieseke rgieseke merged commit a326d41 into master Oct 16, 2018
@rgieseke rgieseke deleted the add-emissions-units branch October 16, 2018 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants