Variable fields in EUCTR are not always retrieved #26

Closed
machado-t opened this issue Oct 4, 2023 · 14 comments

Comments

@machado-t

Some fields in EUCTR change their name depending on certain characteristics of the trial record. For example, E.8.4 can have two names: "The trial involves multiple sites in the Member State concerned" or "Will this trial be conducted at multiple sites globally?", depending on whether the trial record pertains to countries outside the European Economic Area (hence, a "/3RD" protocol).
Apparently, the package is not retrieving the field E.8.4 if it falls under the '/3RD' protocol. Is there a potential fix for this issue?
Thank you for your valuable work!

@rfhb rfhb self-assigned this Oct 4, 2023
@rfhb
Owner

rfhb commented Oct 4, 2023

Projected solution available by 2023-10-08

@rfhb
Owner

rfhb commented Oct 5, 2023

@machado-t

  • I cannot confirm that "the package is not retrieving the field E.8.4 if it falls under the '/3RD' protocol". The field E.8.4 is retrieved and converted, but has a different name for 3rd country trials than for trials with a site in the EU. Please mention the ID of a trial for which it is not working, thanks.

  • Would the following changes to fields be helpful? Changed fields are shown with their previous name ("was before"). They were renamed either to have different names that reflect different meanings (e84_, e840_) or to have the same name because they mean the same (e83_, e863_):

2014-002605-38-DE:
  "e83_single_site_trial": "No" (was before: "e83_the_trial_involves_single_site_in_the_member_state_concerned": "No")
  "e84_multiple_sites_in_member_state": "Yes" (was before: "e84_the_trial_involves_multiple_sites_in_the_member_state_concerned": "Yes")
  "e841_number_of_sites_anticipated_in_member_state_concerned": "3"
  "e85_the_trial_involves_multiple_member_states": "Yes"
  "e851_number_of_sites_anticipated_in_the_eea": "26"
  "e863_trial_sites_planned_in": "Switzerland Australia Austria Canada Denmark Finland France Germany Italy Japan Netherlands Norway Spain United Kingdom United States" (was before: "e863_specify_the_regions_in_which_trial_sites_are_planned": ...)

2015-004898-32-3RD:
  "e83_single_site_trial": "No" (was before: "e83_will_this_trial_be_conducted_at_a_single_site_globally": "No")
  "e840_multiple_sites_globally": "Yes" (was before: "e84_will_this_trial_be_conducted_at_multiple_sites_globally": "Yes")
  "e863_trial_sites_planned_in": "Argentina Colombia Costa Rica Ecuador Mexico Peru Turkey Venezuela, Bolivarian Republic of" (was before: "e863_specify_the_countries_outside_of_the_eea_in_which_trial_sites_are_planned": ...)

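For readers working with the original (unharmonised) field names, the two E.8.4 variants can be picked up together by matching on the shared "e84_" prefix. The following is only a sketch in base R: the data frame `trials` is a hypothetical stand-in for what dbGetFieldsIntoDf() might return, and the column `e84_any_variant` is a name introduced here for illustration. The two variants carry different meanings, so merging them is only sensible when the analysis does not need that distinction.

```r
# Hypothetical data frame with one column per E.8.4 variant
# (values taken from the two example trials above)
trials <- data.frame(
  `_id` = c("2014-002605-38-DE", "2015-004898-32-3RD"),
  e84_the_trial_involves_multiple_sites_in_the_member_state_concerned = c("Yes", NA),
  e84_will_this_trial_be_conducted_at_multiple_sites_globally = c(NA, "Yes"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# All columns that hold some variant of field E.8.4
e84_cols <- grep("^e84_", names(trials), value = TRUE)

# Coalesce the variants: keep the first non-missing value per row
trials$e84_any_variant <- Reduce(
  function(a, b) ifelse(is.na(a), b, a),
  trials[e84_cols]
)

trials[, c("_id", "e84_any_variant")]
```
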
@rfhb rfhb added the question label Oct 5, 2023
@machado-t
Author

machado-t commented Oct 6, 2023

> The field E.8.4 is retrieved and converted, but has a different name for 3rd country trials than for trials with a site in the EU.

I now realize what caused my mistake: for some reason, the field list returned by dbFindFields() seems to be incomplete.

I would expect dbFindFields() to output both versions of E.8.4 (and of other variable fields, for that matter), but it outputs only one of them. This led me to believe that the same E.8.4 field would hold whichever version applies to the trial type (EEA vs 3RD).

I have now noticed that the last time I ran dbFindFields(), it output only the fields for 3rd country trials, whereas a previous run output only the fields for EEA trials. I have tested this again, with the same results.
Here's the code I've just used to test this:

library(ctrdata)

db <- nodbi::src_sqlite(
  dbname = "test.sqlite",
  collection = "test"
)

ctrLoadQueryIntoDb(
  queryterm = "https://www.clinicaltrialsregister.eu/ctr-search/search?query=&age=under-18",
  con = db
)

fieldlist <- dbFindFields(namepart = ".*", con = db, verbose = TRUE)

And here's the output list of fields:

"","x"
"1","a2_eudract_number"
"2","a3_full_title_of_the_trial"
"3","a31_title_of_the_trial_for_lay_people_in_easily_understood_ie_nontechnical_language"
"4","a41_sponsors_protocol_code_number"
"5","a52_us_nct_clinicaltrialsgov_registry_number"
"6","a7_trial_is_part_of_a_paediatric_investigation_plan"
"7","a8_ema_decision_number_of_paediatric_investigation_plan"
"8","b1_sponsor"
"9","b1_sponsor._b1_sponsor"
"10","b1_sponsor.b11_name_of_sponsor"
"11","b1_sponsor.b134_country"
"12","b1_sponsor.b31_and_b32_status_of_the_sponsor"
"13","b1_sponsor.b4_sources_of_monetary_or_material_support"
"14","b1_sponsor.b4_sources_of_monetary_or_material_support.b41_name_of_organisation_providing_support"
"15","b1_sponsor.b4_sources_of_monetary_or_material_support.b42_country"
"16","b1_sponsor.b51_name_of_organisation"
"17","b1_sponsor.b52_functional_name_of_contact_point"
"18","b1_sponsor.b531_street_address"
"19","b1_sponsor.b532_town_city"
"20","b1_sponsor.b533_post_code"
"21","b1_sponsor.b534_country"
"22","b1_sponsor.b56_email"
"23","ctrname"
"24","dimp"
"25","dimp._dimp"
"26","dimp.d12_and_d13_imp_role"
"27","dimp.d21_imp_to_be_used_in_the_trial_has_a_marketing_authorisation"
"28","dimp.d25_the_imp_has_been_designated_in_this_indication_as_an_orphan_drug_in_the_community"
"29","dimp.d31_product_name"
"30","dimp.d3111_active_substance_of_chemical_origin"
"31","dimp.d31110_medicinal_product_containing_genetically_modified_organisms"
"32","dimp.d31111_herbal_medicinal_product"
"33","dimp.d31112_homeopathic_medicinal_product"
"34","dimp.d31113_another_type_of_medicinal_product"
"35","dimp.d3112_active_substance_of_biological_biotechnological_origin_other_than_advanced_therapy_imp_atimp"
"36","dimp.d3113_advanced_therapy_imp_atimp"
"37","dimp.d31131_somatic_cell_therapy_medicinal_product"
"38","dimp.d31132_gene_therapy_medical_product"
"39","dimp.d31133_tissue_engineered_product"
"40","dimp.d31134_combination_atimp_ie_one_involving_a_medical_device"
"41","dimp.d31135_committee_on_advanced_therapies_cat_has_issued_a_classification_for_this_product"
"42","dimp.d3114_combination_product_that_includes_a_device_but_does_not_involve_an_advanced_therapy"
"43","dimp.d3115_radiopharmaceutical_medicinal_product"
"44","dimp.d3116_immunological_medicinal_product_such_as_vaccine_allergen_immune_serum"
"45","dimp.d3117_plasma_derived_medicinal_product"
"46","dimp.d3118_extractive_medicinal_product"
"47","dimp.d3119_recombinant_medicinal_product"
"48","dimp.d32_product_code"
"49","dimp.d34_pharmaceutical_form"
"50","dimp.d341_specific_paediatric_formulation"
"51","dimp.d37_routes_of_administration_for_this_imp"
"52","dimp.d38_imp_identification_details"
"53","dimp.d38_imp_identification_details.d3101_concentration_unit"
"54","dimp.d38_imp_identification_details.d3102_concentration_type"
"55","dimp.d38_imp_identification_details.d3103_concentration_number"
"56","dimp.d38_imp_identification_details.d38_inn__proposed_inn"
"57","dimp.d38_imp_identification_details.d392_current_sponsor_code"
"58","dimp.d38_imp_identification_details.d393_other_descriptive_name"
"59","dimp.d38_imp_identification_details.d394_ev_substance_code"
"60","e11_medical_conditions_being_investigated"
"61","e111_medical_condition_in_easily_understood_language"
"62","e112_therapeutic_area"
"63","e13_condition_being_studied_is_a_rare_disease"
"64","e21_main_objective_of_the_trial"
"65","e22_secondary_objectives_of_the_trial"
"66","e23_trial_contains_a_substudy"
"67","e3_principal_inclusion_criteria"
"68","e4_principal_exclusion_criteria"
"69","e51_primary_end_points"
"70","e511_timepoints_of_evaluation_of_this_end_point"
"71","e52_secondary_end_points"
"72","e521_timepoints_of_evaluation_of_this_end_point"
"73","e61_diagnosis"
"74","e610_pharmacogenetic"
"75","e611_pharmacogenomic"
"76","e612_pharmacoeconomic"
"77","e613_others"
"78","e62_prophylaxis"
"79","e63_therapy"
"80","e64_safety"
"81","e65_efficacy"
"82","e66_pharmacokinetic"
"83","e67_pharmacodynamic"
"84","e68_bioequivalence"
"85","e69_dose_response"
"86","e71_human_pharmacology_phase_i"
"87","e711_first_administration_to_humans"
"88","e712_bioequivalence_study"
"89","e713_other"
"90","e7131_other_trial_type_description"
"91","e72_therapeutic_exploratory_phase_ii"
"92","e73_therapeutic_confirmatory_phase_iii"
"93","e74_therapeutic_use_phase_iv"
"94","e81_controlled"
"95","e811_randomised"
"96","e812_open"
"97","e813_single_blind"
"98","e814_double_blind"
"99","e815_parallel_group"
"100","e816_cross_over"
"101","e817_other"
"102","e821_other_medicinal_products"
"103","e822_placebo"
"104","e823_other"
"105","e824_number_of_treatment_arms_in_the_trial"
"106","e83_will_this_trial_be_conducted_at_a_single_site_globally"
"107","e84_will_this_trial_be_conducted_at_multiple_sites_globally"
"108","e862_trial_being_conducted_completely_outside_of_the_eea"
"109","e863_specify_the_countries_outside_of_the_eea_in_which_trial_sites_are_planned"
"110","e87_trial_has_a_data_monitoring_committee"
"111","e88_definition_of_the_end_of_the_trial_and_justification_where_it_is_not_the_last_visit_of_the_last_subject_undergoing_the_trial"
"112","e892_in_all_countries_concerned_by_the_trial_months"
"113","f11_number_of_subjects_for_this_age_range"
"114","f11_trial_has_subjects_under_18"
"115","f111_in_utero"
"116","f112_preterm_newborn_infants_up_to_gestational_age__37_weeks"
"117","f113_newborns_027_days"
"118","f114_infants_and_toddlers_28_days23_months"
"119","f115_children_211years"
"120","f1151_number_of_subjects_for_this_age_range"
"121","f116_adolescents_1217_years"
"122","f12_adults_1864_years"
"123","f13_elderly_65_years"
"124","f21_female"
"125","f22_male"
"126","f31_healthy_volunteers"
"127","f32_patients"
"128","f33_specific_vulnerable_populations"
"129","f331_women_of_childbearing_potential_not_using_contraception_"
"130","f332_women_of_childbearing_potential_using_contraception"
"131","f333_pregnant_women"
"132","f334_nursing_women"
"133","f335_emergency_situation"
"134","f336_subjects_incapable_of_giving_consent_personally"
"135","f3361_details_of_subjects_incapable_of_giving_consent"
"136","f337_others"
"137","f3371_details_of_other_specific_vulnerable_populations"
"138","f422_in_the_whole_clinical_trial"
"139","f5_plans_for_treatment_or_care_after_the_subject_has_ended_the_participation_in_the_trial_if_it_is_different_from_the_expected_normal_treatment_of_that_condition"
"140","h41_third_country_in_which_the_trial_was_first_authorised"
"141","record_last_import"
"142","x4_clinical_trial_type"
"143","x6_date_on_which_this_record_was_first_entered_in_the_eudract_database"

> Would the following changes to fields be helpful?

To answer your question, I don't think the fields need to be changed.
Maybe I'm using dbFindFields() or something else incorrectly.
Thanks again for your work and help on this issue.

@rfhb
Owner

rfhb commented Oct 6, 2023

Very helpful, thanks @machado-t, will provide a solution

@machado-t
Author

I have just noticed that other fields are also missing from that list:

  • D.2.5.1 Orphan drug designation number
  • D.8 Placebo (and all sub-fields)
  • E.1.2 Medical condition or disease under investigation (and all MedDRA sub-fields)
  • F.1.X.1 Number of subjects for this age range (appearing only for some age groups and missing for others)
  • G. Investigator Networks to be involved in the Trial (entire section)

It seems that dbFindFields() may only be retrieving the fields of one trial and not those of the whole collection.
I hope this is helpful. Please let me know if there's any other way I can help.

@rfhb
Owner

rfhb commented Oct 6, 2023

  • Improved dbFindFields() to return EUCTR fields from both EU and 3rd country trials
  • Please try 4bb519d with devtools::install_github("rfhb/ctrdata", build_vignettes = TRUE)
  • Harmonisation of EUCTR fields as mentioned here is maintained
  • Have clarified in the documentation that dbFindFields() may not return all fields, for two reasons: only samples of records are used and only fields that have values exist in the collection; see ??dbFindFields.
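
The two reasons given in the last point can be illustrated without any database. This is a toy sketch in base R, not the package's implementation: `records` is a made-up list standing in for trial documents (the IDs T1 to T3 are invented), and `fields_in()` is a hypothetical helper. A field name can only be found if some inspected record has a value for it, so inspecting a sample can miss names that occur only in other records.

```r
# Made-up records standing in for trial documents in a collection;
# each record only contains the fields for which it has a value
records <- list(
  list(a2_eudract_number = "T1",
       e84_the_trial_involves_multiple_sites_in_the_member_state_concerned = "Yes"),
  list(a2_eudract_number = "T2",
       e84_will_this_trial_be_conducted_at_multiple_sites_globally = "Yes"),
  list(a2_eudract_number = "T3",
       e84_the_trial_involves_multiple_sites_in_the_member_state_concerned = "No")
)

# Hypothetical helper: union of the field names that occur in the records
fields_in <- function(recs) sort(unique(unlist(lapply(recs, names))))

fields_sample <- fields_in(records[1])  # only a sample (here: first record)
fields_all <- fields_in(records)        # every record

# Field names that the sample did not reveal
setdiff(fields_all, fields_sample)
```

With this toy data, the only name missed by the sample is the 3rd country variant of E.8.4, which mirrors the behaviour reported above.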

@machado-t
Author

> Please try 4bb519d with devtools::install_github("rfhb/ctrdata", build_vignettes = TRUE)

After running the same code again, I'm now able to find 193 fields, which I believe account for most of the available fields. However, I notice some are still absent, particularly all fields from the G section. I understand this limitation may be hard to overcome if the function must rely on a sample of the records. As a workaround, is there any way to make the dbGetFieldsIntoDf function output all available fields, instead of having to list every desired field in the fields argument?

@rfhb
Owner

rfhb commented Oct 6, 2023

dbFindFields() is a helper function, and package ctrdata is designed to work in the same way across four different database backends. Of course it is possible to directly access the databases, as follows (note this does not provide the name of the register as function dbFindFields() does):

# 1) use demo database provided in package ctrdata
dbc <- nodbi::src_sqlite(
   dbname = system.file("extdata", "demo.sqlite", package = "ctrdata"),
   collection = "my_trials")
#
# the following syntax works for nodbi database
# connection objects to PostgreSQL, DuckDB, SQLite
nodbi::docdb_query(src = dbc, key = dbc$collection, query = '{}', listfields = TRUE)

# 2) for MongoDB database connection objects such as
dbm <- nodbi::src_mongo(collection = "dbperm", db = "dbperm", url = "mongodb://localhost")
ctrLoadQueryIntoDb(queryterm = "query=&age=newborn&phase=phase-three&dateFrom=2015-03-01&dateTo=2015-03-31", register = "EUCTR", con = dbm)
#
# define a mongolite connection
con <- mongolite::mongo(collection = dbm$collection, db = dbm$db, url = dbm$url)
#
# then apply a MongoDB function (caveat: user rights have to be set to permit this)
sort(con$mapreduce(
  map = "function() {
      obj = this;
      return searchInObj(obj, '');
      function searchInObj(obj, pth) {
         for(var key in obj) {
            if(typeof obj[key] == 'object' && obj[key] !== null) {
               if(pth != '') {pth = pth + '.'}
                  searchInObj(obj[key], pth + key);
            }else{
               key = pth + '.' + key;
               key = key.replace(/[.][0-9]+[.]/g, '.');
               key = key.replace(/[.][0-9]+$/, '');
               key = key.replace(/[.][.]+/g, '.');
               key = key.replace(/^[.]/, '');
               emit(key, 1);
      }}}}",
  reduce = "function(id, counts) {return Array.sum(counts)}"
)[["_id"]])
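
For readers not familiar with MongoDB, the idea behind the map function above (walk each nested record, build dotted key paths, drop array positions) can be sketched in base R. This is an illustrative re-statement under assumptions, not code from ctrdata: `field_paths()` is a hypothetical helper, and `record` is a made-up nested list (the sponsor names are invented) in the shape a trial document might take after JSON parsing.

```r
# Hypothetical helper: dotted paths of all leaf fields in a nested list
field_paths <- function(x, prefix = "") {
  # A non-list value is a leaf: its path is complete
  if (!is.list(x)) return(prefix)
  nms <- names(x)
  paths <- lapply(seq_along(x), function(i) {
    # Unnamed elements are array items and add no path component
    has_name <- !is.null(nms) && nzchar(nms[i])
    key <- if (has_name) nms[i] else ""
    child <- if (!nzchar(prefix)) {
      key
    } else if (!nzchar(key)) {
      prefix
    } else {
      paste(prefix, key, sep = ".")
    }
    field_paths(x[[i]], child)
  })
  unique(unlist(paths))
}

# Made-up record with a repeated (array-like) sponsor section
record <- list(
  a2_eudract_number = "2015-004898-32",
  b1_sponsor = list(
    list(b11_name_of_sponsor = "Sponsor A", b134_country = "Peru"),
    list(b11_name_of_sponsor = "Sponsor B")
  )
)

field_paths(record)
```

Applying such a function to every record and taking the union of the results gives the complete set of leaf field names, which is what the map and reduce steps do on the database side.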

@machado-t
Author

Unfortunately, I'm an amateur and not familiar with MongoDB or any of those other databases, so I couldn't test or use that.
Considering this limitation of dbFindFields(), I believe that a simple list of all the fields available in each register could be very useful for new users.
Thank you for your help

@rfhb
Owner

rfhb commented Oct 20, 2023

Thanks, I appreciate the suggestion to provide a list of all data fields from (all) registers. These are the reasons why I cannot provide it and why this is not a limitation of dbFindFields(): fields change in registers often enough to make manual maintenance unreliable (see for example https://prsinfo.clinicaltrials.gov/ProtocolRecordSchema.xsd); fields are listed in different formats, not always readily readable; dbFindFields() was intended to list those fields for which at least one record in the collection has data.

Definitions of fields are linked from help("ctrdata-registers") (see https://rfhb.github.io/ctrdata/reference/ctrdata-registers.html). Perhaps this is a good starting point, not sure what exactly is sought. I can think further with more background on the question or motivation.

@machado-t
Author

Thanks for explaining. I understand now.
For my particular case, this issue is solved now. I was not able to find all the fields I needed using dbFindFields(), but I happened to find them in the code provided in your blog.

@rfhb rfhb closed this as completed Oct 21, 2023
@rfhb
Owner

rfhb commented Oct 21, 2023

Thanks. I am closing the issue after code changes (e.g. in 0d965d3) that improve field detection and documentation, and am nevertheless adding it to the roadmap for ctrdata: https://github.com/users/rfhb/projects/1/views/1?pane=issue&itemId=42248978.

@machado-t
Author

That's great. Thanks for your work

@rfhb
Owner

rfhb commented Jan 23, 2024

With ctrdata v1.17.0, it is now possible to obtain names of fields that are either derived from a sample of trials, or from all trial records - the latter will take (much) more time but is comprehensive, see parameter sample in https://rfhb.github.io/ctrdata/reference/dbFindFields.html
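
A possible way to compare the two modes is sketched below, using the demo database from earlier in this thread. It assumes ctrdata v1.17.0 or later is installed and that the demo database still ships with the package. That `sample = FALSE` is the setting which switches to using all trial records is an assumption based on the comment above, so please check the linked reference page.

```r
library(ctrdata)

# Demo database provided in package ctrdata, as used earlier in this thread
dbc <- nodbi::src_sqlite(
  dbname = system.file("extdata", "demo.sqlite", package = "ctrdata"),
  collection = "my_trials"
)

# Default: field names derived from a sample of trial records (faster)
fields_sampled <- dbFindFields(namepart = "e8[34]", con = dbc)

# Assumption: sample = FALSE derives names from all trial records (slower)
fields_complete <- dbFindFields(namepart = "e8[34]", con = dbc, sample = FALSE)

# Field names that only the comprehensive search reveals, if any
setdiff(fields_complete, fields_sampled)
```
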
