unlock "other study ID numbers" field in clinicaltrials.gov #510

jessflem · 2016-10-19T14:34:52Z

It is a bit of a grim field, but if we could start by breaking it down into individual IDs and matching the obvious ones, e.g. the EUCTR ones (always the same structure), then I think we will significantly improve our deduplication rate.
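
A minimal sketch of that first step, assuming the field arrives as one free-text string with IDs separated by commas, semicolons or line breaks (the real delimiters would need checking against the data). The function names are hypothetical, and the only structure assumed is the EudraCT number shape of 4, 6 and 2 digits, optionally prefixed with EUCTR.

```python
import re

# EudraCT numbers look like YYYY-NNNNNN-CC; the "EUCTR" prefix is optional here.
# Assumption: no country suffix or other trailing text follows the number.
EUDRACT_RE = re.compile(r"^(?:EUCTR)?(\d{4}-\d{6}-\d{2})$", re.IGNORECASE)


def split_ids(raw_field):
    """Split the free-text field into individual, whitespace-trimmed IDs."""
    parts = re.split(r"[,;\n]+", raw_field)
    return [part.strip() for part in parts if part.strip()]


def find_eudract_numbers(raw_field):
    """Return the EudraCT numbers found among the individual IDs."""
    found = []
    for candidate in split_ids(raw_field):
        match = EUDRACT_RE.match(candidate)
        if match:
            found.append(match.group(1))
    return found
```

For example, `find_eudract_numbers("D5130L00067; 2015-002933-23, 193284798")` returns `["2015-002933-23"]`, and the other two IDs are left for later matching.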

It will also be really helpful to get an idea of the other ID sources in this field, so we can start thinking how to match them.
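
One rough way to get that overview, sketched under the assumption that the IDs have already been split into a list: reduce each ID to its "shape" and count how often each shape occurs. The helper names are hypothetical, and a shape is only a hint about the source, not proof of it.

```python
import re
from collections import Counter


def id_shape(identifier):
    """Reduce an ID to its shape: letters become 'A', digits become '9'."""
    shape = re.sub(r"[A-Za-z]", "A", identifier)
    return re.sub(r"\d", "9", shape)


def tally_shapes(identifiers):
    """Count how often each shape occurs, most common first."""
    return Counter(id_shape(i) for i in identifiers).most_common()
```

Here `id_shape("2015-002933-23")` gives `"9999-999999-99"`, so frequent shapes stand out as candidates for a dedicated matching rule.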

Thanks!

vitorbaptista · 2016-10-20T06:23:44Z

Just merged some code from @georgiana-b to extract the secondary IDs from clinicaltrials.gov, so any "well-behaved" identifier will be extracted the next time we update our data. There aren't many of them, though. To improve this, we need to understand the formats of the identifiers so that our process can detect them. Do you know if the registries document their identifier formats somewhere?

jessflem · 2016-10-23T05:28:57Z

I don't know of any formal documents that define the formats for each of the registers - we'd probably have to manually go through as we add registers.

kerfors · 2016-10-24T18:31:17Z

A "well-behaved" identifier is (IMHO) an http-based URI with a look-up URL (redirect/resolver service) and a promise of persistence. So, instead of a text string such as "D5130L00067" as a secondary/sponsor identifier, I am pushing for http://clinicaltrials.astrazeneca.com/study/D5130L00067. A first step is an internal process to assign azct.IDs to our 20,000 listed studies, plus an internal look-up service. I would like to move to "things not strings" in the information we send to e.g. CT.gov; see https://lists.w3.org/Archives/Public/public-semweb-lifesci/2013Feb/0067.html

vitorbaptista · 2016-10-25T08:16:46Z

@kerfors Could you open a new issue explaining this, so we can discuss?

pwalsh · 2016-11-30T10:39:04Z

@vitorbaptista @georgiana-b is this now merged into production? If so, can you please provide me with an example link or two I can share with @jessflem?

vitorbaptista · 2016-11-30T17:30:31Z

@pwalsh Yes, this is in production. However, finding a link is quite difficult, because the secondary IDs from CT.gov don't tell us which registry they come from; they're simply strings like 193284798. Our identifier detection has some regexps to validate them (for example, an NCT identifier must match NCT\d+), so any well-behaved identifiers in the secondary identifiers field will be picked up. The others, which are most of them, won't.
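
To illustrate the kind of check described here, a minimal sketch rather than the actual detection code: a table of per-source patterns, where an ID is attributed to a source only if the whole string matches. The `NCT\d+` rule is the one quoted above; the table layout and function name are hypothetical.

```python
import re

# Source name -> pattern. NCT\d+ is the rule quoted in this comment;
# a real table would hold one entry per registry.
ID_PATTERNS = {
    "nct": re.compile(r"NCT\d+"),
}


def detect_source(identifier):
    """Return the source whose pattern matches the whole string, else None."""
    for source, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(identifier.strip()):
            return source
    return None
```

With this table, `detect_source("NCT02688400")` returns `"nct"`, while a bare string like `"193284798"` returns `None`, which is the case that currently goes unmatched.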

This is partly related to the way we handle identifiers, which I want to change (see #502). In our current design, an identifier needs a source. To handle CT.gov, we'll probably need to add a bogus source (e.g. unknown) and handle multiple identifiers from the same source.
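
A rough sketch of what that could look like, assuming identifiers are stored as a mapping from source to a list of IDs. This is not the design from #502; the `unknown` key, the function name and the structure are all hypothetical.

```python
import re
from collections import defaultdict

NCT_RE = re.compile(r"NCT\d+")  # the only pattern taken from this thread
UNKNOWN_SOURCE = "unknown"      # hypothetical name for the bogus source


def group_by_source(secondary_ids):
    """Map each source to the list of identifiers attributed to it."""
    grouped = defaultdict(list)
    for identifier in secondary_ids:
        source = "nct" if NCT_RE.fullmatch(identifier) else UNKNOWN_SOURCE
        if identifier not in grouped[source]:  # skip exact repeats
            grouped[source].append(identifier)
    return dict(grouped)
```

Keeping a list per source means several unattributed IDs from one record can sit side by side under `unknown` instead of overwriting each other.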

pwalsh · 2016-12-01T06:29:26Z

@vitorbaptista ok. can you prep this issue so we can get the team to do it in the next couple of weeks.

vitorbaptista · 2016-12-07T19:28:15Z

Blocked by #502

vitorbaptista · 2017-01-10T19:38:53Z

Adding another example found by @Bengoldacre: the trials https://explorer.opentrials.net/trials/2a745a57-3e64-4ec1-88a4-21f6a2cefd1a and https://explorer.opentrials.net/trials/a021b738-9294-4126-8eea-393e9c0743d2 are the same. The identifier 2015-002933-23 appears on the clinicaltrials.gov entry at https://clinicaltrials.gov/ct2/show/record/NCT02688400, but without the EUCTR prefix.
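
A sketch of how this case could be caught, assuming the EUCTR-side record stores the ID as `EUCTR2015-002933-23` (an assumption, not checked against the database): normalise both the bare and the prefixed form to one canonical string before comparing. The function names are hypothetical.

```python
import re

# Bare or EUCTR-prefixed EudraCT number (4, 6 and 2 digits).
EUDRACT_RE = re.compile(r"(?:EUCTR)?(\d{4}-\d{6}-\d{2})", re.IGNORECASE)


def canonical_euctr(identifier):
    """Return 'EUCTR<number>' for a prefixed or bare EudraCT number, else None."""
    match = EUDRACT_RE.fullmatch(identifier.strip())
    return "EUCTR" + match.group(1) if match else None


def shared_euctr_ids(ids_a, ids_b):
    """Return the canonical EUCTR identifiers present in both ID lists."""
    canon_a = {canonical_euctr(i) for i in ids_a} - {None}
    canon_b = {canonical_euctr(i) for i in ids_b} - {None}
    return canon_a & canon_b
```

For the pair above, `shared_euctr_ids(["NCT02688400", "2015-002933-23"], ["EUCTR2015-002933-23"])` returns `{"EUCTR2015-002933-23"}`, which would flag the two records as candidates for merging.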

kerfors mentioned this issue Nov 5, 2016

Study URIs #552

Closed

jessflem mentioned this issue Nov 30, 2016

rematch by scientific title between registries #511

Closed

pwalsh assigned pwalsh and vitorbaptista and unassigned pwalsh Nov 30, 2016

This was referenced Dec 6, 2016

Extract secondary identifiers from nct #360

Closed

Secondary (internal industry) IDs not searchable or listed on trial page #329

Closed

vitorbaptista added the 0. Blocked label Dec 7, 2016

vitorbaptista added the Data label Jan 31, 2017

pwalsh added Database enhancement and removed Data labels Feb 22, 2017
