Iss25 #43

Merged
merged 35 commits into from
Mar 6, 2024
Conversation

kasra-keshavarz
Owner

This Pull Request resolves the following issues: #40, #37 (partially), #36 (partially), #34, #27, #25.

The commits all have comprehensive messages describing the change and, more importantly, the reason.

I do NOT like that GitHub does not provide character limitations for each line here.

This script introduces new features to the tool, notably the
capability to process climate datasets, including those consisting
of multiple models, submodels (those with specific configuration sets),
ensemble members, and multiple scenarios (SSPs). The parent calling
script is in charge of the parallelization scheme, if needed.

With this script, a few issues related to the current deficiencies of
datatool could be resolved simultaneously.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
Multiple parallelization schemes are added, so the package not only submits
array jobs based on the given date range and the chunk schemes, but also
considers submitting jobs based on various models, ensemble members, and
scenarios. These new parallelization schemes mostly apply to climate
datasets, but not necessarily.

This commit aims to save time for the user and shorten the processing
time for datasets.
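
As a rough illustration of the model/scenario/ensemble scheme described above, the sketch below expands comma-separated lists into one task per combination and picks the combination that a zero-based array index would select. The function names `list_tasks` and `task_for_index`, the comma-separated input format, and the sample values are assumptions made for this example, not datatool's actual implementation; chunking by date range is left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: one task per (model, scenario, ensemble) combination.
# Names and input format are illustrative assumptions, not datatool's code.

# Print one "model scenario ensemble" line per combination.
function list_tasks() {
  local -a models scenarios ensembles
  IFS=',' read -r -a models <<< "$1"
  IFS=',' read -r -a scenarios <<< "$2"
  IFS=',' read -r -a ensembles <<< "$3"

  local m s e
  for m in "${models[@]}"; do
    for s in "${scenarios[@]}"; do
      for e in "${ensembles[@]}"; do
        printf '%s %s %s\n' "$m" "$s" "$e"
      done
    done
  done
}

# Print the combination that belongs to a zero-based task index.
function task_for_index() {
  local index="$1"
  shift
  list_tasks "$@" | sed -n "$((index + 1))p"
}
```

Under a Slurm array job, the index could come from `SLURM_ARRAY_TASK_ID`, for example `task_for_index "$SLURM_ARRAY_TASK_ID" "modelA,modelB" "ssp245,ssp585" "r1i1p1f1"`, so that each array element processes exactly one combination.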

This commit resolves issue #25 in the remote GitHub repository.
Furthermore, it adds the ESPO dataset to the list of datasets as well.

Moreover, a new option is implemented to show the list of currently
available datasets to the users.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
This is meant to clearly organize the information provided inside the
package. The new file lists all the available datasets and the keywords
that users can provide to the `--dataset` option. Previously, this
information was part of the main usage message (`--help`) of the main
script.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
1. the `function` keyword is added to make the style compatible with
   Google's recommendations,
2. required arguments and options are revised alongside the relevant
   comments,
3. typos are fixed

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
The script deals with the Climate Dataset produced by the Alberta
Government. The dataset is not public yet, and is planned to be
available soon.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
Since some hydrological models can use near-surface level or 40m level
data, the necessary list of variables for both levels is added.

Furthermore, a link to the official website for the dataset is added for
further clarity.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
Since multiple HPCs are now used for the workflows, it is important to
have consistent datasets synchronized regularly. Therefore, this commit
attempts to reflect these efforts by creating consistent paths for
various HPCs/allocations.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
In this commit, the following are addressed:
 * Correcting paths for the local scripts,
 * Renaming scripts to reflect the owner of the script for further
   clarification,
 * Adding parallelization schemes based on model, ensemble, and scenario,
 * Adding gcc/9.3.0 as the reference clib for the modules loaded to
   prevent mismatch between various environments defined on the HPCs,
 * Assuring EPSG:4326 is assumed for the input shape file if there is
   no CRS defined,
 * Getting rid of \t characters in the help messages,
 * Correcting short help message to be more informative,
 * Adding function declarations to follow Google’s shell scripting
   guidelines,
 * Assuring --account=STR is described in the help message.
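
A minimal sketch of two of the style points above: a function declared with the `function` keyword, and a help message written as a here-document indented with spaces so that no `\t` characters are printed. The function name `short_usage` and the option descriptions are hypothetical; only the option names `--dataset` and `--account=STR` come from this pull request.

```bash
#!/usr/bin/env bash

# Print a short usage message. The here-document body is indented with
# spaces only, so no tab characters reach the terminal.
# NOTE: the wording of the descriptions is illustrative, not datatool's.
function short_usage() {
  cat <<EOF
Usage: ${0##*/} [options]
  --dataset=STR    keyword of the dataset to process
  --account=STR    scheduler account used to submit jobs
EOF
}
```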

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
Various files within this directory are categorized to be more
informative for the users/devs.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
The README file for this dataset is added, offering necessary
information for the users.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
This commit assures all dataset scripts follow the convention of
<institute>-<dataset-name> under the `scripts` path.

Furthermore, necessary adjustments to the style of the scripts have been
implemented, including:
  * adding `--model`, `--scenario`, and `--ensemble` options, if missing,
    for compatibility with the main caller script, as these options are
    given to the script by `extract-dataset.sh` script,
  * assuring scripting style follows that of Google's shell scripting
    guidelines,
  * the paths to the externally called scripts are properly adjusted,
    after modifications to the structure of datatool's `assets`
    directory, and
  * minor changes to the source code to assure compatibility with the
    v0.5.0 of datatool.
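
To illustrate the first point in the list above, here is a sketch of how a dataset script might accept the `--model`, `--scenario`, and `--ensemble` options forwarded by the caller. The function name `parse_args`, the variable names, and the `--option=value` form are assumptions for this example; the real scripts may parse their arguments differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: accept the options forwarded by the caller script.
# Variable names and the "--opt=value" form are illustrative assumptions.

function parse_args() {
  model="" scenario="" ensemble=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --model=*)    model="${1#*=}" ;;
      --scenario=*) scenario="${1#*=}" ;;
      --ensemble=*) ensemble="${1#*=}" ;;
      --)           shift; break ;;
      *)            echo "unknown option: $1" >&2; return 1 ;;
    esac
    shift
  done
}
```

A script for a dataset without models or scenarios can accept these options in the same way and simply leave the resulting variables unused, which keeps the caller's interface uniform across datasets.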

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
This commit addresses issue #27 by describing NASA's NEX-GDDP-CMIP6
dataset and the relevant scripts for it. Furthermore, it provides the necessary
information for users to enable them to use `datatool` for extracting
subsets of the dataset for any temporal and spatial extents.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
This commit addresses issue #27 and provides scripts to extract subsets
from NASA's NEX-GDDP-CMIP6 dataset. This script is capable of working with
various models, scenarios, ensemble members, and variables offered by
this dataset.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
This commit addresses issue #34 and processes this dataset that contains
multiple GCM model outputs, including various sub-models, scenarios,
ensemble members, and variables.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
Necessary information to use `datatool` for this script is provided to
the user via the README.md file.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
With the growing number of scripts, this commit tries to restructure
this directory to provide more clarity and organization for the users.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
The help message has been revised to provide more information to the
users. This includes noting that the values provided to `--lon-lims` must be
within the [-180, +180] limits. This had not been mentioned to the users
before and could have caused confusion, as there are multiple
conventions for describing longitudes.
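
For users whose longitudes follow the [0, 360] convention, the sketch below shows one way to check a value against the [-180, +180] limits and to convert it before passing it to `--lon-lims`. The helper names `lon_in_range` and `to_signed_lon` are hypothetical and not part of datatool; `awk` is used because bash arithmetic does not handle decimals.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of datatool.

# Succeed only if the argument is a number within [-180, 180].
function lon_in_range() {
  local re='^[-+]?[0-9]+([.][0-9]+)?$'
  [[ "$1" =~ $re ]] || return 1
  awk -v lon="$1" 'BEGIN { exit !((lon + 0) >= -180 && (lon + 0) <= 180) }'
}

# Convert a longitude given in the [0, 360] convention to [-180, 180].
function to_signed_lon() {
  awk -v lon="$1" 'BEGIN { x = lon + 0; if (x > 180) x -= 360; print x }'
}
```

For example, a western-hemisphere longitude written as 250 in the [0, 360] convention would be converted with `to_signed_lon 250` before being used as a limit.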

Furthermore, the list of datasets on the main page of the repository has
been updated to reflect the most up-to-date list.

Signed-off-by: Kasra Keshavarz <kasra.keshavarz1@ucalgary.ca>
kasra-keshavarz added the bug, documentation, enhancement, added dataset, and new release labels on Mar 5, 2024
kasra-keshavarz self-assigned this on Mar 5, 2024