
# Response to Editor and Reviewer Comments (CoSiBD) — v13 (2026-01-27)

**Manuscript:** *A synthetic dataset for Time Series Super-Resolution with Deep Learning*  
**Target journal:** *Scientific Data (Nature Portfolio)*  
**Revision package:** `main_english_v13_final` + dataset/code updates

This notebook follows the structure requested by the supervisor:

- **(A) Response to reviewer/editor** (point-by-point, verbatim mapping).
- **(B) Manuscript changes** indicating *where* the change appears (section/figure/table).

> **Note:** Page/line pointers below refer to the PDF version of v13 for quick navigation during internal review.


## Editor requirements


### [E1] Share data in standardized formats beyond NumPy arrays (CSV/JSON)

**(A) Response to editor**  
We agree and have expanded the release to include portable, standardized formats. In addition to NumPy archives (`.npz`), the dataset is now distributed in **plain-text (`.txt`)** and **JSON (`.json`)** consolidated files for each resolution level. This enables direct use in non-Python environments and web-based workflows, and supports reproducible LR–HR pairing by row index across formats.

**(B) Manuscript changes**  
- **Data Records** now states that each signal set is provided in **three formats** (`.npz`, `.txt`, `.json`) and explains the structure and alignment by row index.  
  See *Data Records* (p. 7), including the bullet list describing formats and consolidated file names.
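
For internal verification, here is a minimal Python sketch of what "alignment by row index" means across the three formats. The array shapes, the `hr`/`lr` keys, and the one-signal-per-row text layout are illustrative assumptions, not the exact layout of the released consolidated files. The arrays are written to in-memory buffers so the example is self-contained.

```python
import io
import json

import numpy as np

# Hypothetical toy data: 4 HR signals of 16 samples and their LR pairs
# obtained by keeping every 4th sample (illustrative sizes only).
rng = np.random.default_rng(0)
hr = rng.standard_normal((4, 16))
lr = hr[:, ::4]

# Write the same LR array in three formats (in memory, for illustration).
npz_buf = io.BytesIO()
np.savez(npz_buf, hr=hr, lr=lr)
txt_buf = io.StringIO()
np.savetxt(txt_buf, lr)                 # assumed layout: one signal per row
json_str = json.dumps({"lr": lr.tolist()})

# Read each format back.
npz_buf.seek(0)
lr_npz = np.load(npz_buf)["lr"]
txt_buf.seek(0)
lr_txt = np.loadtxt(txt_buf)
lr_json = np.asarray(json.loads(json_str)["lr"])

# Row i must refer to the same signal in every format.
for i in range(lr.shape[0]):
    assert np.array_equal(lr_npz[i], lr_txt[i])
    assert np.array_equal(lr_npz[i], lr_json[i])
```

If the released files follow this convention, the same check can be run against the actual consolidated files by replacing the buffers with file paths.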



### [E2] Add data citations for repository datasets in the reference list and cite them where mentioned

**(A) Response to editor**  
We agree and have ensured that externally deposited data are formally cited following *Scientific Data* data-citation guidance. The Zenodo deposition is now cited in the **References** and referenced at first mention in **Data Records**.

**(B) Manuscript changes**  
- **Zenodo dataset citation** is included as a dedicated reference (Ref. **25**) with DOI.  
- The **Data Records** section explicitly identifies Zenodo as the repository and cites Ref. 25 at first mention (p. 10, “The full dataset is publicly available on Zenodo^25 …”).


## Reviewer #1


### [R1-1] “No evidence that synthetic model resembles real-world time series”; “dataset poorly justified”

**(A) Response to reviewer**  
We thank the reviewer for highlighting the need for clearer motivation and intended scope. CoSiBD is a **synthetic benchmark** designed to support *controlled, reproducible* evaluation of time-series super-resolution methods under systematically varied difficulty factors (piecewise non-stationarity, intermittent high-frequency activity, amplitude envelopes, baseline drift, and noise). The goal is not to claim that any specific domain is faithfully reproduced, but to provide a **transparent, parameterized generator** and paired LR–HR signals with rich metadata so that researchers can benchmark SR models in a reproducible setting and tailor generation settings to domain-specific needs.

To strengthen motivation, we (i) explicitly list the signal design principles and the real-signal properties they are intended to emulate at a qualitative level (physiological and speech-inspired behaviors), and (ii) add a dedicated comparison to existing synthetic time-series resources, clarifying the gap CoSiBD addresses for SR benchmarking.

**(B) Manuscript changes**  
- **Signal design principles** explicitly stated (p. 3; directly below Fig. 1).  
- **Related synthetic time-series resources** section added/expanded to position CoSiBD relative to RadioML, ECGSYN/SEREEGA, LoadGAN, and seismic trace datasets, and to articulate the SR-specific gap CoSiBD targets (p. 2).



### [R1-2] “Noise model undocumented”; concerns about deterministic sinusoid vs Gaussian/broadband noise

**(A) Response to reviewer**  
We agree that the noise model must be transparent. The dataset now explicitly implements and documents **two distinct noise types**:  
1) **Additive Gaussian noise** (broadband) with configurable amplitude; and  
2) **Structured narrow-band sinusoidal interference**, intended to emulate common acquisition artifacts such as mains hum (50/60 Hz-like interference under an illustrative time scaling).

Both noise types are applied probabilistically on a per-signal basis, and **all noise parameters are recorded in per-signal metadata**, enabling deterministic regeneration and stratified evaluation by noise condition.

**(B) Manuscript changes**  
- A dedicated subsection **“Rationale for structured 50/60 Hz interference and noise”** clarifies the motivation and interpretation (p. 5; Fig. 3).  
- The **generation pipeline step 7 (Noise injection)** now enumerates both noise types and notes that parameters are stored in metadata (p. 5).  
- **Metadata table and example** now include `noise_profile` fields (p. 8; Table 1 + JSON snippet).
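
A minimal sketch of per-signal probabilistic noise injection with both noise types and a metadata record of what was applied. The function name `add_noise`, the probabilities, amplitudes, and the nested layout of `noise_profile` are hypothetical; only the two noise types, their probabilistic application, and the recording of parameters come from the text above. Time is expressed in seconds under the illustrative convention \(T = 4\pi\) s, with enough samples that a 50 Hz component lies below the Nyquist limit.

```python
import numpy as np

def add_noise(x, t, rng, p_gauss=0.5, p_sin=0.5,
              gauss_sigma=0.05, sin_amp=0.1, sin_freq=50.0):
    """Hypothetical sketch: apply each noise type with its own probability and
    return the noisy signal plus a record of the parameters actually used."""
    y = x.copy()
    profile = {"gaussian": None, "sinusoidal": None}
    if rng.random() < p_gauss:
        # Broadband additive Gaussian noise.
        y += rng.normal(0.0, gauss_sigma, size=x.shape)
        profile["gaussian"] = {"sigma": gauss_sigma}
    if rng.random() < p_sin:
        # Narrow-band interference at a single frequency with random phase.
        phase = rng.uniform(0.0, 2 * np.pi)
        y += sin_amp * np.sin(2 * np.pi * sin_freq * t + phase)
        profile["sinusoidal"] = {"amplitude": sin_amp,
                                 "frequency_hz": sin_freq, "phase": phase}
    return y, profile

# Usage: 4000 samples over T = 4*pi s gives f_s of about 318 Hz (> 2 * 50 Hz).
t = np.linspace(0.0, 4 * np.pi, 4000, endpoint=False)
x = np.sin(2 * np.pi * 1.0 * t)
y, noise_profile = add_noise(x, t, np.random.default_rng(10000))
```

Because all random draws come from the seeded generator, storing the seed and `noise_profile` is enough to regenerate the noisy signal and to stratify evaluation by noise condition.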



### [R1-3] “Sampling frequency/time units missing”; “omits units on time axes”; “frequency in Hz unclear”

**(A) Response to reviewer**  
We agree that unit conventions must be explicit. CoSiBD signals are released as **discrete sequences** \(x[n]\) defined over a **reference domain** \(\tau \in [0,4\pi]\). Interpreting \(\tau\) as physical time (seconds) requires selecting a duration \(T\) for the reference interval, which then implies a sampling rate \(f_s = N/T\). To avoid ambiguity, the manuscript now (i) explains the reference-domain design, (ii) provides an **illustrative convention** \(T = 4\pi \text{ s}\) for reporting example Hz values, and (iii) makes clear that changing \(T\) rescales Hz values while the discrete sequences remain unchanged.

This resolves the unit concern while preserving the intended generality of the dataset across application contexts.

**(B) Manuscript changes**  
- New subsection **“Sampling units and frequency interpretation”** explains the reference-domain and mapping to physical units; includes explicit \(f_s\) expression and scaling rule (p. 6).  
- **Fig. 4** illustrates invariance of the discrete sequence under different assumed time scalings and shows how Hz axes shift with assumed \(f_s\) (p. 6).
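
The scaling rule can be checked numerically. In this sketch (signal length and tone are hypothetical), the dominant component occupies a fixed DFT bin, i.e. a fixed normalized frequency \(\nu\) in cycles per sample; only its Hz value \(f = \nu f_s = \nu N / T\) changes with the assumed duration \(T\). The sketch assumes the \(N\) samples tile the reference interval uniformly with the endpoint excluded, consistent with \(f_s = N/T\).

```python
import numpy as np

def sampling_rate(n_samples, duration_s):
    """f_s = N / T for N samples spanning an assumed duration T (seconds)."""
    return n_samples / duration_s

def to_hz(nu, n_samples, duration_s):
    """Map a normalized frequency nu (cycles per sample) to Hz: f = nu * f_s."""
    return nu * sampling_rate(n_samples, duration_s)

N = 1000                                  # hypothetical signal length
n = np.arange(N)
x = np.sin(2 * np.pi * 20 * n / N)        # 20 cycles over the reference interval

k_peak = int(np.argmax(np.abs(np.fft.rfft(x))))  # DFT bin of the dominant tone
nu = k_peak / N                                  # cycles per sample

# Same discrete sequence, two assumed durations: only the Hz labels change.
for T in (4 * np.pi, 1.0):
    print(f"T = {T:.4f} s -> f_s = {sampling_rate(N, T):.2f} Hz, "
          f"peak = {to_hz(nu, N, T):.4f} Hz")
```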



### [R1-4] “No anti-aliasing filter before subsampling”; “aliasing artifacts visible”; “conceptually flawed validation”

**(A) Response to reviewer**  
We thank the reviewer for this important signal-processing point. CoSiBD’s paired LR–HR observations are produced via **uniform decimation without pre-filtering** by design. This choice is now explicitly stated and justified as part of the benchmark’s intent: SR models in practice often face LR observations that may contain measurement artifacts and are not always produced by ideal anti-aliasing pipelines. Providing LR signals via decimation creates a reproducible, transparent degradation process that includes challenging conditions SR methods must handle.

To address conceptual clarity, the manuscript now (i) states the decimation procedure explicitly, (ii) avoids implying that LR spectra should match HR spectra in an “anti-aliased” sense, and (iii) frames validation plots as descriptive documentation of behavior under the reported generation settings, not as proofs of optimal sampling practice.

**(B) Manuscript changes**  
- **Resolution variation step** explicitly states that LR signals are obtained by **uniform decimation (uniform subsampling) of HR without pre-filtering** (p. 5).  
- **Validation Context** clarifies that analyses are descriptive and intended to document behaviors under the generator settings (p. 10).
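
To make the degradation concrete, here is a minimal sketch of uniform decimation without pre-filtering and the aliasing it can produce. The lengths, the factor \(R = 4\), and the test tones are hypothetical. A tone at 180 cycles over the HR interval lies above the LR Nyquist limit of \(N_{HR}/(2R) = 125\) cycles and folds to \(250 - 180 = 70\) cycles in the LR spectrum. An anti-aliased pipeline would instead low-pass filter before subsampling (for example with `scipy.signal.decimate`), which CoSiBD deliberately does not do.

```python
import numpy as np

def decimate_no_filter(x_hr, factor):
    """Uniform decimation without an anti-aliasing pre-filter: keep every
    `factor`-th sample, starting at index 0."""
    return x_hr[::factor]

N_HR, R = 1000, 4                          # hypothetical HR length and factor
n = np.arange(N_HR)
# 20-cycle component (below LR Nyquist) plus a 180-cycle tone (above it).
x_hr = np.sin(2 * np.pi * 20 * n / N_HR) + 0.5 * np.sin(2 * np.pi * 180 * n / N_HR)
x_lr = decimate_no_filter(x_hr, R)

# The two strongest LR bins: the 180-cycle tone appears aliased at bin 70.
spec = np.abs(np.fft.rfft(x_lr))
peaks = sorted(np.argsort(spec)[-2:].tolist())
print(len(x_lr), peaks)
```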



### [R1-5] “Metadata desirable to have annotations of the signals”; “predefined validation sets impose arbitrary assumptions”

**(A) Response to reviewer**  
We agree that flexibility is important. CoSiBD does **not** impose predefined train/validation/test splits. Instead, we provide **per-signal metadata** that supports principled partitioning (e.g., by noise type, segment labels, change-points, or difficulty factors). The metadata includes change-point structures and segment labels aligned to the piecewise frequency profile, enabling users to create scenario-based splits and analyze regimes explicitly.

**(B) Manuscript changes**  
- **Usage Notes** now states that the dataset is distributed **without a predefined split** and describes how users can create partitions using metadata (p. 13).  
- **Table 1 + example JSON entry** document change-points (`base_points`, `high_freq_points`) and segment labels (`variation_type`) (p. 8).
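
A minimal sketch of a scenario-based split built from metadata alone. The field names `seed` and `noise_profile` follow Table 1, but the boolean layout inside `noise_profile`, the toy records, and the seed-parity rule are assumptions for illustration; users would substitute their own partitioning criteria (e.g., segment labels or change-point counts).

```python
from collections import Counter

# Hypothetical per-signal metadata records (toy values, assumed layout).
metadata = [
    {"seed": 10000, "noise_profile": {"gaussian": True,  "sinusoidal": False}},
    {"seed": 10001, "noise_profile": {"gaussian": False, "sinusoidal": True}},
    {"seed": 10002, "noise_profile": {"gaussian": True,  "sinusoidal": True}},
    {"seed": 10003, "noise_profile": {"gaussian": False, "sinusoidal": False}},
    {"seed": 10004, "noise_profile": {"gaussian": True,  "sinusoidal": False}},
    {"seed": 10005, "noise_profile": {"gaussian": False, "sinusoidal": False}},
]

def noise_condition(rec):
    """Collapse the noise profile into one label for stratified reporting."""
    p = rec["noise_profile"]
    if p["gaussian"] and p["sinusoidal"]:
        return "both"
    if p["sinusoidal"]:
        return "sinusoidal"
    return "gaussian" if p["gaussian"] else "clean"

# Scenario split: every signal with sinusoidal interference is held out,
# so the test condition is unseen during training.
test = [r["seed"] for r in metadata if r["noise_profile"]["sinusoidal"]]
rest = [r for r in metadata if not r["noise_profile"]["sinusoidal"]]
# Deterministic train/validation split of the remainder by seed parity.
train = [r["seed"] for r in rest if r["seed"] % 2 == 0]
val = [r["seed"] for r in rest if r["seed"] % 2 == 1]

counts = Counter(noise_condition(r) for r in metadata)
print(train, val, test, dict(counts))
```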


## Reviewer #2


### [R2-1] “Limited realism”; need stronger motivation and comparison to existing benchmarks

**(A) Response to reviewer**  
We agree and have strengthened (i) the motivation for why piecewise non-stationary sinusoidal constructions are useful as controlled benchmarks for SR, and (ii) the comparison to existing synthetic resources. The manuscript now clarifies that CoSiBD is intended as a **benchmarking resource** with tunable difficulty factors and deterministic regeneration, rather than a domain-specific simulator. We additionally provide an explicit comparison to representative synthetic time-series resources and identify the SR-specific gap CoSiBD addresses.

**(B) Manuscript changes**  
- Expanded **Related synthetic time-series resources** and gap statement (p. 2).  
- Added explicit **Signal design principles** that connect real-signal qualitative properties to design mechanisms (p. 3).



### [R2-2] “Metrics listed but no numerical SR baselines”; request for experimental results

**(A) Response to reviewer**  
We respectfully note that *Scientific Data* Data Descriptors focus on describing data and supporting reuse. Our revision therefore prioritizes (i) rigorous documentation of generation procedures, (ii) dataset structure, metadata, and pairing, and (iii) technical validation that documents dataset behavior under the reported settings. We do not present model-training baselines, as these would shift the manuscript toward a research-article format.

**(B) Manuscript changes**  
- **Technical Validation** is framed explicitly as descriptive documentation for transparency and reuse, not as model benchmarking (p. 10).



### [R2-3] Figures: axis labels/units; simplify presentation

**(A) Response to reviewer**  
We agree and have improved figure interpretability by adding explicit unit conventions and clarifying how axes should be interpreted under the reference-domain design. In addition, schematic figures are accompanied by explanatory text and captions, reducing reliance on dense in-figure annotations.

**(B) Manuscript changes**  
- **Fig. 3** and accompanying text justify the structured interference and clarify the illustrative time scaling (p. 5).  
- **Fig. 4** explicitly addresses the unit convention and mapping from normalized frequency to Hz (p. 6).  
- **Fig. 1** is supported by a more explicit step-by-step description and “Signal design principles” text (p. 3).



### [R2-4] Reproducibility: random seeds not fixed

**(A) Response to reviewer**  
We agree that deterministic regeneration is critical. CoSiBD assigns a **unique per-signal seed** to each high-resolution signal and records it in metadata, enabling exact regeneration of individual signals and the full dataset.

**(B) Manuscript changes**  
- **Data Records** now states that each HR signal uses a unique seed in the range **10000–12499** and that all generation parameters are recorded in metadata (p. 7).  
- **Metadata schema (Table 1)** includes `seed` (p. 8).
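
A minimal sketch of seed-based regeneration. `generate_hr` is a hypothetical stand-in for the CoSiBD generator (the real generator's random-number library and parameter draws may differ). The point illustrated is that if every random draw is taken from a generator seeded only by the recorded per-signal seed, re-running with that seed reproduces the signal exactly, and the stated range 10000–12499 yields 2500 distinct seeds.

```python
import numpy as np

SEED_MIN, SEED_MAX = 10000, 12499      # seed range stated in Data Records

def generate_hr(seed, n=1024):
    """Hypothetical generator: all randomness comes from a Generator seeded
    only by the per-signal seed, so the output is a pure function of it."""
    rng = np.random.default_rng(seed)
    tau = np.linspace(0.0, 4 * np.pi, n, endpoint=False)   # reference domain
    freq = rng.uniform(1.0, 5.0)       # illustrative parameter draws
    amp = rng.uniform(0.5, 1.5)
    x = amp * np.sin(freq * tau) + rng.normal(0.0, 0.05, size=n)
    return x, {"seed": seed, "freq": freq, "amp": amp}

seeds = range(SEED_MIN, SEED_MAX + 1)   # one seed per HR signal
x_a, meta_a = generate_hr(10042)
x_b, meta_b = generate_hr(meta_a["seed"])  # regenerate from the recorded seed
assert np.array_equal(x_a, x_b) and meta_a == meta_b
```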


## Reviewer #3


### [R3-1] Demonstrative impact: train DL on synthetic and validate on real-world signals

**(A) Response to reviewer**  
We appreciate the suggestion. However, in a *Scientific Data* Data Descriptor the primary objective is to **describe the dataset**, its generation, structure, metadata, validation, and reuse potential. Training specific CNN/RNN/LSTM baselines and validating on real-world datasets would shift the manuscript toward a methodological research study. We therefore focus on providing the resources needed for the community to perform such studies: paired LR–HR signals at multiple resolutions, deterministic regeneration, and rich metadata to design controlled experiments.

**(B) Manuscript changes**  
- **Usage Notes** explicitly encourages users to create task-appropriate splits and evaluations using the provided metadata (p. 13).  
- **Code availability** and the ability to regenerate signals from the recorded metadata are described to support downstream benchmarking studies (pp. 14–15).
