# Jan 18th

Research & API exploration:
- Researched the ClinicalTrials.gov v2 API structure 
- Identified supported query parameters and advanced search syntax
- Analyzed API response schema and core output modules (7 study modules)
- Identified important fields
- Explored pagination strategy for large-scale data pulls


# Jan 22nd

- Implemented `trials_fetch()` to query the **ClinicalTrials.gov v2 API** with automatic pagination.
- Added filtering support via advanced query syntax:
  - `phase`
  - `country`
  - date range (`from_date`, `to_date`) with a selectable `date_filter` field.
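
A minimal sketch of how the date-range filter could be validated and assembled, assuming dates arrive as single `YYYY-MM-DD` strings. The helper name `build_date_range()` and its error messages are hypothetical, and raising an error when `from_date` is later than `to_date` is only one possible way to handle that case; the package's own handling may differ.

```r
# Hypothetical helper (not the package's actual code): validates two dates
# and returns the RANGE expression for the chosen date field.
build_date_range <- function(from_date, to_date, date_filter = "StartDate") {
  parse_strict <- function(x, name) {
    # Accept only a single YYYY-MM-DD string.
    if (!is.character(x) || length(x) != 1L ||
        !grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) {
      stop(sprintf("`%s` must be a single string in YYYY-MM-DD format", name),
           call. = FALSE)
    }
    # as.Date() with an explicit format returns NA for impossible dates
    # such as 2023-02-30.
    d <- as.Date(x, format = "%Y-%m-%d")
    if (is.na(d)) {
      stop(sprintf("`%s` is not a valid calendar date: %s", name, x),
           call. = FALSE)
    }
    d
  }

  from <- parse_strict(from_date, "from_date")
  to   <- parse_strict(to_date, "to_date")

  # Assumption: an inverted range is treated as a user error.
  if (from > to) {
    stop("`from_date` must not be later than `to_date`", call. = FALSE)
  }

  paste0("AREA[", date_filter, "]RANGE[",
         format(from, "%Y-%m-%d"), ",", format(to, "%Y-%m-%d"), "]")
}

build_date_range("2022-01-01", "2023-01-01", "StartDate")
# "AREA[StartDate]RANGE[2022-01-01,2023-01-01]"
```

Checking the pattern before calling `as.Date()` keeps inputs such as `01/02/2023` from being silently reinterpreted.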

**Notable implementation details:**

- **Date parsing & validation:** Enforced a strict input format (`YYYY-MM-DD`) to avoid ambiguous or invalid dates; validated date bounds and handled edge cases (e.g., `from_date` later than `to_date`). Converted validated dates into the API's advanced query syntax using
  ```r
  paste0("AREA[", date_filter, "]RANGE[", from_date, ",", to_date, "]")
  ```
- **Selectable date field:** `date_filter` chooses which date the range applies to (e.g., study start date vs. completion date).
- **Maximum record enforcement:** Implemented a strict upper bound on the total number of records fetched by maintaining a running counter across paginated requests and cropping the final page of results to avoid exceeding the limit:
```r
data <- .fetch_page(url, params)
studies <- data$studies

if (is.null(studies) || length(studies) == 0) break

remaining <- max_records - fetched
to_take   <- min(length(studies), remaining)
studies   <- studies[seq_len(to_take)]
```
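
The cropping logic above can be exercised offline with a small sketch. Here `fetch_capped()` and `fake_fetch()` are hypothetical names: `fake_fetch()` stands in for the internal `.fetch_page()` and serves three pages of four dummy studies each. The `nextPageToken` field mirrors what the v2 API returns, but how the real function threads the token through `params` is an assumption.

```r
# Hypothetical sketch: collect at most `max_records` studies across pages.
# `fetch_page` takes a page token (NULL for the first page) and returns
# list(studies = <list>, nextPageToken = <string or NULL>).
fetch_capped <- function(fetch_page, max_records) {
  out     <- list()
  fetched <- 0L
  token   <- NULL

  while (fetched < max_records) {
    data    <- fetch_page(token)
    studies <- data$studies
    if (is.null(studies) || length(studies) == 0) break

    # Crop the page so the running total never exceeds max_records.
    to_take <- min(length(studies), max_records - fetched)
    out     <- c(out, studies[seq_len(to_take)])
    fetched <- fetched + to_take

    token <- data$nextPageToken
    if (is.null(token)) break  # no further pages
  }
  out
}

# Fake three-page source (four dummy studies per page), no network needed.
fake_fetch <- function(token) {
  page <- if (is.null(token)) 1L else as.integer(token)
  ids  <- ((page - 1L) * 4L + 1L):(page * 4L)
  list(
    studies       = lapply(ids, function(i) list(id = i)),
    nextPageToken = if (page < 3L) as.character(page + 1L) else NULL
  )
}

length(fetch_capped(fake_fetch, max_records = 10))  # 10: the third page is cropped to 2
length(fetch_capped(fake_fetch, max_records = 20))  # 12: the source runs out first
```

Passing the fetcher in as an argument keeps the cap logic testable without calling the API.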

In [None]:
suppressWarnings(devtools::load_all())
spec <- list(
  query = "cancer",
  max_records = 10,
  phase = "Phase 2",
  country = "Canada",
  from_date = "2022-01-01",
  to_date = "2023-01-01",
  date_filter = "StartDate"
)
result <- trials_run_spec(spec)


In [None]:
# Show that 10 records were retrieved
result[2]

# Feb 1st

JSON Streaming Support

Implemented the ability to save results directly to a JSON file.

- This can be enabled via: `trials_fetch(save_json = TRUE, json_file = "filename.json")`

- This feature is intended for users who want to collect large volumes of data.
    - Without JSON streaming, all results would be stored in memory, which can lead to memory exhaustion for large queries.
    - To avoid this, downloading to JSON is implemented using data streaming, where small chunks of data are written incrementally to disk instead of being kept in memory.

```r
while (fetched < max_records) {
  data <- .fetch_page(url, params)
  studies <- data$studies

  # Stop when the API returns no more studies
  if (is.null(studies) || length(studies) == 0) break

  if (save_json) {
    for (s in studies) {
      if (!first) writeLines(",", con)  # avoid writing a comma before the first entry

      # Serialize one study at a time and write it straight to the file
      writeLines(
        jsonlite::toJSON(s, auto_unbox = TRUE, null = "null"),
        con
      )

      first <- FALSE
      fetched <- fetched + 1L
    }
  }
}
```

Core idea
   - Fetch studies in batches (up to 1000 records per API request).
   - Write each study directly to a JSON file as it is retrieved.
   - Flush data to disk incrementally instead of accumulating it in memory.
   - This allows users to download thousands of records safely without risking memory overload.
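
As a self-contained illustration of the idea, the sketch below writes batches of studies to a JSON array one element at a time and reads the file back. `stream_studies_to_json()` is a hypothetical helper and the three studies are dummy data; the real implementation opens and closes its connection inside `trials_fetch()`, which may differ from what is shown here.

```r
library(jsonlite)

# Hypothetical helper: write a list of batches (each a list of studies)
# to `path` as a single JSON array, one study at a time.
stream_studies_to_json <- function(batches, path) {
  con <- file(path, open = "w", encoding = "UTF-8")
  on.exit(close(con), add = TRUE)  # always release the file handle

  writeLines("[", con)
  first   <- TRUE
  written <- 0L

  for (batch in batches) {
    for (s in batch) {
      if (!first) writeLines(",", con)  # separator between array elements
      writeLines(toJSON(s, auto_unbox = TRUE, null = "null"), con)
      first   <- FALSE
      written <- written + 1L
    }
    flush(con)  # push this batch to disk before handling the next one
  }

  writeLines("]", con)
  invisible(written)
}

# Dummy data: two batches holding three placeholder studies in total.
batches <- list(
  list(list(id = "study-1"), list(id = "study-2")),
  list(list(id = "study-3"))
)
out_file <- tempfile(fileext = ".json")
stream_studies_to_json(batches, out_file)

length(fromJSON(out_file, simplifyVector = FALSE))  # 3
```

Only one batch is held in memory at any time, so peak memory depends on the page size and not on the total number of records.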



In [3]:
json_spec <- list(
    query = "cancer",
    max_records = 2000,
    phase = "Phase 2",
    country = "Canada",
    from_date = "2016-01-01",
    to_date = "2023-01-01",
    date_filter = "StartDate",
    save_json = TRUE,
    json_file = "cancer_trials.json"  
    )
json_result <- trials_run_spec(json_spec)

$query
[1] "cancer"

$max_records
[1] 2000

$fetched
[1] 920

$phase
[1] "Phase 2"

$country
[1] "Canada"

$from_date
[1] "2016-01-01"

$to_date
[1] "2023-01-01"

$date_filter
[1] "StartDate"

[1] "Saving JSON output to cancer_trials.json"


In [None]:
library(jsonlite)
data <- fromJSON("cancer_trials.json", simplifyVector = FALSE)


In [13]:
str(data[[1]], max.level = 3)

List of 5
 $ protocolSection:List of 13
  ..$ identificationModule      :List of 6
  .. ..$ nctId           : chr "NCT04649359"
  .. ..$ orgStudyIdInfo  :List of 1
  .. ..$ secondaryIdInfos:List of 2
  .. ..$ organization    :List of 2
  .. ..$ briefTitle      : chr "MagnetisMM-3: Study Of Elranatamab (PF-06863135) Monotherapy in Participants With Multiple Myeloma Who Are Refr"| __truncated__
  .. ..$ officialTitle   : chr "MAGNETISMM-3 AN OPEN-LABEL, MULTICENTER, NON-RANDOMIZED PHASE 2 STUDY OF ELRANATAMAB (PF-06863135) MONOTHERAPY "| __truncated__
  ..$ statusModule              :List of 13
  .. ..$ statusVerifiedDate         : chr "2025-11"
  .. ..$ overallStatus              : chr "ACTIVE_NOT_RECRUITING"
  .. ..$ startDateStruct            :List of 2
  .. ..$ primaryCompletionDateStruct:List of 2
  .. ..$ completionDateStruct       :List of 2
  .. ..$ studyFirstSubmitDate       : chr "2020-11-06"
  .. ..$ studyFirstSubmitQcDate     : chr "2020-11-24"
  .. ..$ studyFirstPostDateStru