[0.0.13] - 2026-03-18
NB: version 0.0.12 was not successfully published because of a wrong test configuration. This version replaces it.
Added
- Support for reading CSV files by chunks using the pandas.read_csv chunksize argument (only for DataStores).
This modification is compatible with multi-threading and returns a more accurate progress indicator.
It is enabled by default. To disable it, use the allow_chunks=False argument or enter --no-chunks in the Options column of the resources worksheet.
- File formats:
- Added CLI arguments --read-kwargs / --write-kwargs in the Options column to customize the arguments for the read/write functions.
e.g. for the CSV file format, the arguments of the pandas.read_csv function can be changed with the following CLI argument: --read-kwargs compression=gzip,header=10.
- Added support for the following file formats: Excel (xls, xlsx, xlsm, xlsb, odf, ods, odt), JSON.
- Support for user-defined file format loading functions with new Excel columns Read function and Write function per resource.
See documentation for function prototypes.
- Excel builder workbook:
- Extra column Data cleaner per DataStore to activate a function which corrects values according to the destination type specified in the fields metadata (only for uploads).
- CKAN upload parameterization fields (ckan sheet):
- Field Limit to change the number of rows sent per request.
- Field Time between requests to change the delay between each request (upload/download), in seconds.
- Field Thread count to change the default number of threads used for large datasets (upload/download).
- The upload data cleaner can be used to replace empty string values with None for columns indicated as numeric in the field metadata.
- The admin report can output the storage space used by a dataset in custom fields defined in the data format policy JSON.
- Additional default location for the CKAN API key file: ~/.config/__CKAN_API_KEY__.txt.
- Automatically re-attempt failed API calls with a delay for certain HTTP error codes. This makes the scripts more robust when the CKAN server is overloaded.
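
The automatic re-attempt of failed API calls can be pictured with the following minimal sketch. It is not the package's implementation: the HttpError class, the call_with_retry function, the set of retryable status codes, the attempt count and the delay are all assumptions made for illustration, since this changelog does not list them.

```python
import time


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


# Assumed set of retryable codes; the changelog does not specify which ones apply.
RETRYABLE_STATUS = frozenset({429, 502, 503, 504})


def call_with_retry(request, max_attempts=3, delay_s=1.0,
                    retryable=RETRYABLE_STATUS, sleep=time.sleep):
    """Call request() and retry after a fixed delay on retryable HTTP errors.

    Non-retryable errors, and the error of the last attempt, are re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except HttpError as err:
            if err.status not in retryable or attempt == max_attempts:
                raise
            sleep(delay_s)  # wait before the next attempt


def make_flaky_request(failures, status=503):
    """Build a request that fails `failures` times, then returns "ok"."""
    state = {"calls": 0}

    def request():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise HttpError(status)
        return "ok"

    return request, state
```

Passing the sleep function as a parameter keeps the sketch testable without real waiting. A production version would typically also cap the total waiting time.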
Changed
- Datasets are uploaded in Draft state by default. When the upload is finished, the state specified in the Excel workbook is applied.
Datasets with Draft state are visible by clicking on your profile name at the top of the CKAN web interface (if you originally created them).
- Upon dataset creation, if the dataset was found in Deleted state, its resources are deleted.
- Primary key when uploading to a DataStore:
- When no primary key is specified, the program adds a new column named py_upload_index as a primary key.
- The default behavior when a multi-column primary key is specified has changed. It now systematically upserts all records
(update existing, add new primary key combinations).
The mode --one-frame-per-primary-key checks the last record corresponding to the columns designated by the --group-by argument.
It expects that an input file represents all the data for a group-by key combination. The group-by key must be a subset of the primary key.
This was the default mode used for previous versions of the package.
- The output fields generated by the admin report and data format policy are not None anymore.
The JSON parameters for this feature were renamed.
- The progress_callback function keyword arguments have been renamed.
- Implementation:
- Auxiliary functions in resource builder classes have been refactored.
BuilderDataStoreFile now inherits from BuilderDataStoreFolder to support multi-threaded reading of a file by chunks.
Function BuilderDataStoreFolder.from_file_datastore has been moved to BuilderDataStoreFile.to_builder_datastore_folder.
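
The upsert semantics described in the primary key item above (update existing records, add new primary key combinations) can be illustrated in plain Python. This is only a sketch of the rule on in-memory records: the upsert function and the sample column names are hypothetical and do not reflect how the package or the CKAN server performs the operation.

```python
def upsert(existing, incoming, primary_key):
    """Merge incoming records into existing ones using a multi-column primary key.

    Records whose key combination already exists are updated field by field;
    records with a new key combination are appended.
    """
    # Index the existing records by their primary key tuple (insertion order is kept).
    table = {tuple(rec[col] for col in primary_key): dict(rec) for rec in existing}
    for rec in incoming:
        key = tuple(rec[col] for col in primary_key)
        merged = table.get(key, {})
        merged.update(rec)  # incoming values overwrite existing ones
        table[key] = merged
    return list(table.values())


# Hypothetical data with the primary key (site, day).
stored = [
    {"site": "A", "day": 1, "value": 10},
    {"site": "A", "day": 2, "value": 20},
]
uploaded = [
    {"site": "A", "day": 2, "value": 25},  # existing combination: updated
    {"site": "B", "day": 1, "value": 30},  # new combination: added
]
result = upsert(stored, uploaded, primary_key=("site", "day"))
```

Here result holds three records: (A, 1) is unchanged, (A, 2) now carries the value 25, and (B, 1) is added.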
Deprecated
- Excel workbook:
- The attribute for the package name used in its URL was renamed from Name to Name in URL.
Attribute Name still functions but is marked as deprecated.