
[FEATURE] L0A and L0B processing TODO list #135

Closed
13 tasks done
ghiggi opened this issue Dec 5, 2022 · 0 comments
Labels: enhancement (New feature or request)

ghiggi commented Dec 5, 2022

Is your feature request related to a problem? Please describe.
This issue describes the improvements to be made to L0A and L0B processing.

Describe the solution you'd like

  • Check that a delimiter is provided in reader_kwargs and warn otherwise.

  • Check that df_sanitizer_fun accepts only the lazy and df arguments.

  • Remove the read_raw_data_zipped function and associated code (currently used for GPM campaigns).

  • Enable saving integer columns to Parquet files. This requires:

    1. Definition of a FillValue flag for integer columns (using _FillValue of L0B_encodings.yml, except for raw_drop*)
    2. Coercion of NaN to the fill value before casting to an integer type in L0A processing
    3. Replacement of fill values with np.nan in L0B processing.
  • In L0B processing, replace nan_flags from L0_data_format.yml with np.nan

  • Add a feature to drop dates based on the issue/station_id.yml file ...

  • In L0B processing, add a variable_type attribute (coordinate, count, category, flag, quantity, flux).

  • Enable reader development for stations where data are split across two files. Example from Grenoble: raw.txt and matrix.txt.

  • Enforce check_metadata_compliance strictly!

  • In L0B processing, check the ThiesLPM and OTT_Parsivel raw_drop_number shape: (diameter, velocity) vs. (velocity, diameter).

  • Decide whether to support dask.dataframe or to use dask.delayed and save separate Parquet files (more efficient).

  • If supporting dask.dataframe, possibly optimize the row partitioning.

  • Decide whether to modify L0B to save each netCDF file separately and only at the end (optionally) reopen all files, concatenate them, and write the full file.
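The df_sanitizer_fun argument check above could be sketched with the standard library's inspect module; the checker name here is hypothetical, not an existing function:

```python
import inspect

def check_df_sanitizer_fun(df_sanitizer_fun):
    """Raise ValueError if df_sanitizer_fun has arguments other than 'lazy' and 'df'."""
    params = sorted(inspect.signature(df_sanitizer_fun).parameters)
    if params != ["df", "lazy"]:
        raise ValueError(
            f"df_sanitizer_fun must have only 'lazy' and 'df' arguments, got {params}."
        )

# OK: exactly the expected arguments
check_df_sanitizer_fun(lambda df, lazy: df)
```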
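The three fill-value steps above could look like the following pandas sketch; the fill value and column name are illustrative, not values taken from L0B_encodings.yml:

```python
import numpy as np
import pandas as pd

FILL_VALUE = -9999  # illustrative; the real value would come from L0B_encodings.yml

# L0A: coerce NaN to the fill value, then cast to an integer dtype
df = pd.DataFrame({"n_particles": [10.0, np.nan, 42.0]})
df["n_particles"] = df["n_particles"].fillna(FILL_VALUE).astype("int64")
# df can now be written to Parquet with a true integer column

# L0B: replace the fill value with np.nan again (column becomes float)
df["n_particles"] = df["n_particles"].replace(FILL_VALUE, np.nan)
```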
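The per-station date-dropping could be sketched as below; the issue_dict content and the drop_dates key are assumptions about what a parsed issue/station_id.yml might contain:

```python
import pandas as pd

# Illustrative content of an issue/<station_id>.yml file after YAML parsing
issue_dict = {"drop_dates": ["2022-12-05", "2022-12-06"]}

df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["2022-12-04", "2022-12-05", "2022-12-07"]),
)

# Drop every row whose date is listed in the issue file
drop_dates = pd.to_datetime(issue_dict["drop_dates"]).normalize()
df = df[~df.index.normalize().isin(drop_dates)]
```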
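The raw_drop_number shape check could be a simple transpose guard; the bin counts below are illustrative, not the actual sensor specifications:

```python
import numpy as np

# Illustrative bin counts; each sensor defines fixed diameter and velocity bins
n_diameter, n_velocity = 22, 20

# Array arriving in (velocity, diameter) order
arr = np.zeros((n_velocity, n_diameter))

# Normalize to (diameter, velocity) order, transposing if needed
if arr.shape == (n_velocity, n_diameter) and n_velocity != n_diameter:
    arr = arr.T
```

Note that for a square matrix the orientation cannot be inferred from the shape alone and must be guaranteed by the reader.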

@ghiggi ghiggi added the enhancement New feature or request label Dec 5, 2022
@ghiggi ghiggi changed the title [FEATURE] L0A and L0B processing features [FEATURE] L0A and L0B processing TODO list Dec 5, 2022
@ghiggi ghiggi closed this as completed Feb 13, 2023