
[0.0.13] - 2026-03-18


ifpen-gp released this on 18 Mar 17:31

NB: version 0.0.12 was not successfully published because of an incorrect test configuration. This version replaces it.

Added

  • Support for reading CSV files in chunks using the chunksize argument of pandas.read_csv (DataStores only).
    This change is compatible with multi-threading and provides a more accurate progress indicator.
    It is enabled by default. To disable it, use the allow_chunks=False argument or enter --no-chunks in the Options column of the resources worksheet.
  • File formats:
    • Added the --read-kwargs/--write-kwargs CLI arguments, entered in the Options column, to customize the arguments passed to the read/write functions.
      For example, for the CSV file format, the arguments of the pandas.read_csv function can be changed with
      --read-kwargs compression=gzip,header=10.
    • Added support for the following file formats: Excel (xls, xlsx, xlsm, xlsb, odf, ods, odt), JSON.
    • Support for user-defined functions to read and write file formats, configured per resource with the new Excel columns Read function and Write function.
      See the documentation for the function prototypes.
  • Excel builder workbook:
    • New Data cleaner column, per DataStore, to enable a function that corrects values according to the destination type specified in the field metadata (uploads only).
    • CKAN request parameters (ckan sheet):
      • Field Limit to change the number of rows sent per request.
      • Field Time between requests to change the delay between each request (upload/download), in seconds.
      • Field Thread count to change the default number of threads used for large datasets (upload/download).
  • The upload data cleaner can be used to replace empty string values with None for columns indicated as numeric in the field metadata.
  • The admin report can output the storage space used by a dataset in custom fields defined in the data format policy JSON.
  • Additional default location for the CKAN API key file: ~/.config/__CKAN_API_KEY__.txt.
  • Failed API calls are automatically retried after a delay for certain HTTP error codes. This makes the scripts more robust when the CKAN server is overloaded.
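As an illustration of how chunked CSV reading and the --read-kwargs option might fit together, here is a minimal sketch built only on pandas. The helpers parse_read_kwargs and read_csv_in_chunks are hypothetical and are not part of the package; the key=value,key=value parsing rule (integer-looking values become int, everything else stays a string) is an assumption based on the example above.

```python
import io
from typing import Callable, Optional

import pandas as pd


def parse_read_kwargs(spec: str) -> dict:
    """Hypothetical parser for a 'key=value,key=value' option string.

    Assumption: integer-looking values become int, all other values stay str.
    """
    kwargs = {}
    for item in filter(None, spec.split(",")):
        key, _, value = item.partition("=")
        value = value.strip()
        kwargs[key.strip()] = int(value) if value.lstrip("-").isdigit() else value
    return kwargs


def read_csv_in_chunks(
    source,
    chunksize: int = 10_000,
    read_kwargs: str = "",
    progress: Optional[Callable[[int], None]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: read a CSV chunk by chunk and report rows read so far."""
    kwargs = parse_read_kwargs(read_kwargs)
    frames = []
    rows_read = 0
    # With chunksize set, pandas.read_csv returns an iterator of DataFrames.
    with pd.read_csv(source, chunksize=chunksize, **kwargs) as reader:
        for chunk in reader:
            frames.append(chunk)
            rows_read += len(chunk)
            if progress is not None:
                progress(rows_read)  # one progress update per chunk
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    data = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
    df = read_csv_in_chunks(data, chunksize=2, progress=lambda n: print(f"{n} rows read"))
    print(df.shape)
    print(parse_read_kwargs("compression=gzip,header=10"))
```

Reporting progress per chunk rather than per file is what makes the indicator finer-grained: with chunksize=2 on a 3-row file, the callback fires after 2 rows and again after 3 rows.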
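The automatic retry of failed API calls can be pictured with the following sketch. The release notes do not state which HTTP codes trigger a retry, how many attempts are made or how long the delay is, so RETRYABLE_STATUS, max_attempts and delay are assumptions, and HTTPStatusError is a hypothetical stand-in for whatever error the HTTP client actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: the actual retried codes are not listed in the release notes;
# these are typical "server busy / temporarily unavailable" responses.
RETRYABLE_STATUS = {429, 502, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed API call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying after a fixed delay on retryable HTTP status codes."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except HTTPStatusError as err:
            # Non-retryable codes, or a failure on the last attempt, propagate.
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            sleep(delay)  # wait before the next attempt
    raise RuntimeError("unreachable")
```

Passing sleep as a parameter keeps the function testable without real waiting; in normal use it defaults to time.sleep.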

Changed

  • Datasets are uploaded in the Draft state by default. When the upload is finished, the state specified in the Excel workbook is applied.
    Datasets in the Draft state can be found by clicking your profile name at the top of the CKAN web interface (only if you created them).
    • Upon dataset creation, if the dataset is found in the Deleted state, its resources are deleted.
  • Primary key when uploading to a DataStore:
    • When no primary key is specified, the program adds a new column named py_upload_index as a primary key.
    • The default behavior when a multi-column primary key is specified has changed. It now systematically upserts all records
      (updating existing records and adding new primary key combinations).
      The --one-frame-per-primary-key mode checks the last record corresponding to the columns designated by the --group-by argument.
      It expects each input file to contain all the data for a given group-by key combination. The group-by key must be a subset of the primary key.
      This was the default mode in previous versions of the package.
  • The output fields generated by the admin report and the data format policy are no longer None.
    The JSON parameters for this feature were renamed.
  • The progress_callback function keyword arguments have been renamed.
  • Implementation:
    • Auxiliary functions in resource builder classes have been refactored.
    • BuilderDataStoreFile now inherits from BuilderDataStoreFolder to support multi-threaded reading of a file in chunks.
      The function BuilderDataStoreFolder.from_file_datastore has been moved to BuilderDataStoreFile.to_builder_datastore_folder.
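The new default upsert behavior for a multi-column primary key can be illustrated on in-memory pandas DataFrames. This sketch only shows the semantics (rows whose key combination already exists are updated, new combinations are appended); the real upload goes through the CKAN DataStore, and the upsert helper and the site/year/value columns are hypothetical.

```python
from typing import Sequence

import pandas as pd


def upsert(existing: pd.DataFrame, incoming: pd.DataFrame,
           primary_key: Sequence[str]) -> pd.DataFrame:
    """Hypothetical helper: upsert `incoming` into `existing` on a (multi-column) key."""
    missing = set(primary_key) - set(incoming.columns)
    if missing:
        raise KeyError(f"primary key columns missing from incoming data: {sorted(missing)}")
    combined = pd.concat([existing, incoming], ignore_index=True)
    # keep="last": for a key present in both frames, the incoming row wins (update);
    # keys only present in `incoming` survive as new rows (insert).
    result = combined.drop_duplicates(subset=list(primary_key), keep="last")
    return result.sort_values(list(primary_key)).reset_index(drop=True)


if __name__ == "__main__":
    existing = pd.DataFrame({"site": ["A", "A", "B"], "year": [2020, 2021, 2020],
                             "value": [1.0, 2.0, 3.0]})
    incoming = pd.DataFrame({"site": ["A", "C"], "year": [2021, 2020],
                             "value": [20.0, 5.0]})
    print(upsert(existing, incoming, ["site", "year"]))
```

Here (A, 2021) is updated to 20.0, (C, 2020) is added, and the other rows are untouched. The --one-frame-per-primary-key mode is not modelled.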

Deprecated

  • Excel workbook:
    • The attribute for the package name used in its URL was renamed from Name to Name in URL.
      The Name attribute still works but is deprecated.