CESM2-SSP data has been updated #42

Merged
merged 6 commits into from
Jul 1, 2020


mnlevy1981
Contributor

Regenerate time series data in xpersist_cache/ and then regenerate flux table
and timeseries plots notebooks using updated data. Also updated
forcing_iron_flux.ipynb, which plots some SSP5-8.5 data.

Note that other notebooks might be affected by this change as well.

@mnlevy1981
Contributor Author

It looks like this update will change several figures and tables in the Overleaf document. I haven't done anything in the LaTeX.

Want to add river flux for nitrogen imbalance, not subtract it. Also, need
nitrogen fixation in Global Flux Table cell 11.
@mnlevy1981
Contributor Author

eae8717 addresses #44 and #45

@mnlevy1981 mnlevy1981 linked an issue May 26, 2020 that may be closed by this pull request
The CESM2 columns should be SSP, not RCP, and we don't want to include oxygen
results in the paper
@mnlevy1981
Contributor Author

9dc441c addresses #46 and #48; that just leaves #47

Also include it in nitrogen imbalance. A couple of notes:

1. The intake catalog has some duplicate data for CESM2's piControl; fields
   like NOx_FLUX and NHx_SURFACE_EMIS have a few centuries of repeated data. I
   stripped the repeat out for NHx_SURFACE_EMIS to generate the xpersist cache
   file, but left the bad data in NOx_FLUX so Anderson and I can figure out how
   to handle it.
2. I don't think NHx_SURFACE_EMIS was produced by CESM1; I tried to get it from
   HPSS and got "file not found" errors. If it _is_ available in the old model, I
   should update the table to include it.
@mnlevy1981
Contributor Author

d85ab78 addresses #47 but raises one more issue with the catalog itself. From that commit log:

1. The intake catalog has some duplicate data for CESM2's piControl; fields
   like NOx_FLUX and NHx_SURFACE_EMIS have a few centuries of repeated data. I
   stripped the repeat out for NHx_SURFACE_EMIS to generate the xpersist cache
   file, but left the bad data in NOx_FLUX so Anderson and I can figure out how
   to handle it.

Basically, there is too much data in /glade/campaign/collections/cmip/CMIP6/timeseries-cmip6/b.e21.B1850.f09_g17.CMIP6-piControl.001/ocn/proc/tseries/month_1:

$ ls -1 *NHx_SURFACE_EMIS*
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.000101-009912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.010001-019912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.020001-029912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.030001-039912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.040001-049912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.050001-059912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.060001-069912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.070001-079912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.080001-089912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.090001-099912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.100001-109912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.110001-119912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.110001-120012.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.120001-120012.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.120001-129912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.120101-129912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.130001-139912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.140001-149912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.140001-150012.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.150001-159912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.160001-169912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.170001-179912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.180001-189912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.190001-200012.nc

I removed

b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.110001-120012.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.120001-120012.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.120101-129912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.140001-150012.nc

from the catalog by hand, but it would be great if one of the following happened:

  1. the duplicate data was removed from disk
  2. there was a way for intake-esm to ignore duplicate data
  3. there was a way for the tool that generates the catalog to skip these files.

Keith pointed out that I was missing NO3_RIV_FLUX and ponToSed in my imbalance
term. NO3_RIV_FLUX should be added to the other two river fluxes, and ponToSed
is a whole new row in the table.

Also reporting the imbalance to the nearest tenth, rather than as a whole
number.
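
A minimal sketch of the sign convention described above: river fluxes are
added, ponToSed is subtracted, and the result is rounded to the nearest tenth.
The function name nitrogen_imbalance and the "other_river_fluxes" key are
hypothetical, the numbers are arbitrary placeholders rather than values from
the flux table, and the remaining budget terms are omitted.

```python
def nitrogen_imbalance(sources, sinks, ndigits=1):
    """Return sum(sources) - sum(sinks), rounded to ndigits decimal places."""
    return round(sum(sources.values()) - sum(sinks.values()), ndigits)


# Placeholder magnitudes chosen only to make the arithmetic easy to follow;
# they are NOT values from the notebooks or the paper.
sources = {"other_river_fluxes": 2.0, "NO3_RIV_FLUX": 1.3}
sinks = {"ponToSed": 0.54}

# 2.0 + 1.3 - 0.54 = 2.76, reported to the nearest tenth.
imbalance = nitrogen_imbalance(sources, sinks)
```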
@mnlevy1981
Contributor Author

48946f8 adds NO3_RIV_FLUX and subtracts ponToSed in nitrogen imbalance. This is now very close to the back-of-the-envelope estimate @klindsay28 came up with from looking at the rate at which nitrate fell in piControl, so I think we've included all the terms we need. We just need to figure out #42 (comment)

@andersy005

@mnlevy1981, how are you populating the rest of the catalog columns?

from the catalog by hand, but it would be great if one of the following happened:

there was a way for intake-esm to ignore duplicate data
there was a way for the tool that generates the catalog to skip these files.

If you were using Pandas to generate the CSV, it'd be very easy to drop the duplicates from the dataframe before writing it out to a CSV file.

@mnlevy1981
Contributor Author

mnlevy1981 commented May 28, 2020

how are you populating the rest of the catalog columns?

If you were using Pandas to generate the CSV, it'd be very easy to drop the duplicates from the dataframe before writing it out to a CSV file.

I'm using the legacy intake-esm functionality to generate this. The problem is that the duplicates aren't exact copies, but

b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.110001-119912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.110001-120012.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.120001-120012.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.120001-129912.nc
b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS.120101-129912.nc

combine to give two copies of everything between 1100 and 1299 (inclusive), and maybe a third copy of the 12 months of year 1200, for a total of 401 years of data.
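
A minimal sketch of how this kind of overlap could be spotted from the file names alone, assuming the date range is always the trailing YYYYMM-YYYYMM before the .nc extension. The helpers months_covered and copies_per_year are hypothetical and are not part of intake-esm.

```python
import re
from collections import Counter

# Matches the trailing "YYYYMM-YYYYMM.nc" date range used in the file names.
DATE_RANGE = re.compile(r"\.(\d{4})(\d{2})-(\d{4})(\d{2})\.nc$")


def months_covered(filename):
    """Return the (year, month) pairs covered by one time series file."""
    match = DATE_RANGE.search(filename)
    if match is None:
        raise ValueError(f"no date range in {filename!r}")
    y0, m0, y1, m1 = (int(group) for group in match.groups())
    start = y0 * 12 + (m0 - 1)
    stop = y1 * 12 + (m1 - 1)
    return [(idx // 12, idx % 12 + 1) for idx in range(start, stop + 1)]


def copies_per_year(filenames):
    """Map each year to the largest number of files covering any month in it."""
    per_month = Counter()
    for name in filenames:
        per_month.update(months_covered(name))
    per_year = {}
    for (year, _month), count in per_month.items():
        per_year[year] = max(per_year.get(year, 0), count)
    return per_year


prefix = "b.e21.B1850.f09_g17.CMIP6-piControl.001.pop.h.NHx_SURFACE_EMIS."
files = [prefix + suffix for suffix in (
    "110001-119912.nc",
    "110001-120012.nc",
    "120001-120012.nc",
    "120001-129912.nc",
    "120101-129912.nc",
)]

copies = copies_per_year(files)
duplicated_years = sorted(year for year, n in copies.items() if n > 1)
total_years = sum(len(months_covered(name)) for name in files) / 12
```

For the five files listed above, this flags every year from 1100 through 1299 as covered more than once, with year 1200 covered three times, and total_years comes out to 401.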

@mnlevy1981
Contributor Author

I asked Gary about cleaning up the directory:

I'm almost done cleaning up these data. Since we stopped and restarted the piControl a few times, there are overlaps at those times.

I should be done in a day or two.

So I'll regenerate the catalog next week and then mark this PR as ready for review.

@mnlevy1981
Contributor Author

Follow up from Gary:

All the monthly timeseries data from b.e21.B1850.f09_g17.CMIP6-piControl.001 are now complete and correct with no temporal overlaps.

I regenerated the catalog, but found that the overlapping files were moved to hidden directories in the directory structure:

/glade/campaign/collections/cmip/CMIP6/timeseries-cmip6/b.e21.B1850.f09_g17.CMIP6-piControl.001/.120001-120012/
/glade/campaign/collections/cmip/CMIP6/timeseries-cmip6/b.e21.B1850.f09_g17.CMIP6-piControl.001/.GOOD/

So I'm rebuilding the catalog again while [hopefully] ignoring those directories. I'm also regenerating all the xpersist cached data for piControl to check that the new catalog works (also, it has been extended an additional 800 years)
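
One way to skip hidden directories when collecting files is to prune them during the directory walk. This is only a sketch, not how the legacy intake-esm catalog builder works: find_netcdf_files is a hypothetical helper, and the directory layout in the demo is made up to mirror the hidden directories above.

```python
import os
import tempfile


def find_netcdf_files(root):
    """Yield .nc paths under root, skipping directories whose names start with '.'."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into
        # hidden directories such as ".GOOD" or ".120001-120012".
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.endswith(".nc"):
                yield os.path.join(dirpath, name)


# Demo on a throwaway directory tree (file names are placeholders).
with tempfile.TemporaryDirectory() as root:
    for rel in ("ocn/proc/keep.nc", ".GOOD/skip.nc", ".120001-120012/skip.nc"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = [os.path.relpath(p, root) for p in find_netcdf_files(root)]
```

Pruning inside the walk avoids reading the hidden directories at all, which matters more on a large campaign storage tree than filtering the resulting paths afterwards.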

some years were duplicated on disk due to the stopping / starting of the
preindustrial control run. The xpersist cache (in /glade/p/cgd/oce) has been
updated as a means of testing the new catalog and it seems like everything is
working as it should
@mnlevy1981
Contributor Author

With the addition of 0067586, this is ready for review.

@mnlevy1981 mnlevy1981 marked this pull request as ready for review May 29, 2020 15:43
@matt-long matt-long merged commit 0b9c6a1 into marbl-ecosys:master Jul 1, 2020
@mnlevy1981 mnlevy1981 deleted the update_SSP_data branch September 13, 2021 16:35