## Trying pmc_id_converter

> "pmc_id_converter for ID converter between PMID, PMCID and DOI"

This demonstration of the package runs in temporary sessions served via the MyBinder service, so if you make anything useful, save it back to your local machine ASAP.

The package is already installed in sessions that come up if you launched from [here](https://github.com/fomightez/pmc_id_converter_demo-binder). Otherwise, you'd want to run `%pip install pmc-id-converter` in your Jupyter Notebook file.

--------

# Use `pmc_id_converter` via Command Line interface

NOTE: 
In these demonstrations the exclamation point is used to send commands to the command line. You would not need this in the terminal/console. **In other words, in most cases you'll want to delete the exclamation point shown at the start of these commands to use them on the REAL command line.**

Running `pmc_idconv --help` shows you several command examples and options for usage:

In [1]:
!pmc_idconv --help

Usage: pmc_idconv [OPTIONS] [IDS]...

  ID converter between PMID, PMCID and DOI

Options:
  --version           Show the version and exit.
  -o, --outfile TEXT  the output filename [stdout]
  -?, -h, --help      Show this message and exit.

  Examples:
      pmc_idconv --help
      pmc_idconv 30003000                     [PMID]
      pmc_idconv PMC6039336                   [PMCID]
      pmc_idconv 10.1007/s13205-018-1330-z    [DOI]
      pmc_idconv 30003000 30003001 30003002   [BATCH]
      pmc_idconv 30003000 30003001 -o out.jl  [FILE]

  Contact: suqingdong <suqingdong1114@gmail.com>


Let's step through running those examples handling identifier conversion here as a way to cover the use of this on the command line.

In [2]:
!pmc_idconv 30003000

{"doi": "10.1007/s13205-018-1330-z", "pmcid": "PMC6039336", "pmid": 30003000, "requested-id": "30003000"}


In [3]:
!pmc_idconv PMC6039336

{"doi": "10.1007/s13205-018-1330-z", "pmcid": "PMC6039336", "pmid": 30003000, "requested-id": "PMC6039336"}


In [4]:
!pmc_idconv 10.1007/s13205-018-1330-z 

{"doi": "10.1007/s13205-018-1330-z", "pmcid": "PMC6039336", "pmid": 30003000, "requested-id": "10.1007/s13205-018-1330-z"}
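The three single-identifier queries above differ only in the kind of identifier passed in. As a rough sketch, the kind can be guessed from the shape of the string before you query. The function name `guess_id_type` and the patterns below are my own assumptions for illustration; they are not something `pmc_id_converter` provides, and they may not match how the package or the underlying service decides.

```python
import re

# Heuristic patterns (assumptions, not taken from pmc_id_converter itself)
PATTERNS = {
    "pmcid": re.compile(r"^PMC\d+$"),
    "pmid": re.compile(r"^\d+$"),
    "doi": re.compile(r"^10\.\d{4,9}/\S+$"),
}


def guess_id_type(identifier):
    """Return 'pmid', 'pmcid', 'doi', or 'unknown' based on the string's shape."""
    text = str(identifier).strip()
    for id_type, pattern in PATTERNS.items():
        if pattern.match(text):
            return id_type
    return "unknown"


for example in ["30003000", "PMC6039336", "10.1007/s13205-018-1330-z"]:
    print(example, "->", guess_id_type(example))
```

A check like this could be used to catch obvious typos in a long list of identifiers before sending a batch.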


The fourth example involving identifiers demonstrates BATCH conversion of multiple identifiers.

In [5]:
!pmc_idconv 30003000 30003001 30003002

{"doi": "10.1007/s13205-018-1330-z", "pmcid": "PMC6039336", "pmid": 30003000, "requested-id": "30003000"}
{"doi": "10.1002/open.201800095", "pmcid": "PMC6031859", "pmid": 30003001, "requested-id": "30003001"}
{"doi": "10.1002/open.201800044", "pmcid": "PMC6031856", "pmid": 30003002, "requested-id": "30003002"}


The fifth example from the usage involving identifiers is meant to demonstrate sending the results to a file. The related example on the repo [there](https://github.com/suqingdong/pmc_id_converter#command-line) looks like a more fully realized version of this, so I suggest using that instead:

```shell
# Output to a file
pmc_idconv 30003000 30003001 30003002 -o out.json
```

In [6]:
!pmc_idconv 30003000 30003001 30003002 -o out.json

[2025-10-24 16:40:15 Main cli DEBUG MainThread:45] save file to: out.json


At this time I cannot explain why it shows `DEBUG MainThread:45` because everything seems to work so I suggest ignoring that for now.  
Run the next command to demonstrate the last example worked:

In [7]:
!head out.json

{"doi": "10.1007/s13205-018-1330-z", "pmcid": "PMC6039336", "pmid": 30003000, "requested-id": "30003000"}
{"doi": "10.1002/open.201800095", "pmcid": "PMC6031859", "pmid": 30003001, "requested-id": "30003001"}
{"doi": "10.1002/open.201800044", "pmcid": "PMC6031856", "pmid": 30003002, "requested-id": "30003002"}


That shows the content of the generated file `out.json` matches the results expected from the output of `pmc_idconv 30003000 30003001 30003002` earlier.
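Note that the file holds one JSON object per line rather than a single JSON document, so it needs to be parsed line by line. Here is a minimal sketch of doing that with only the Python standard library. To keep it self-contained, it reads the three lines shown above from an in-memory string (`sample_text` is my stand-in for `out.json`); with the real file you would loop over `open("out.json")` instead.

```python
import io
import json

# Stand-in for the contents of out.json, copied from the output above
sample_text = """\
{"doi": "10.1007/s13205-018-1330-z", "pmcid": "PMC6039336", "pmid": 30003000, "requested-id": "30003000"}
{"doi": "10.1002/open.201800095", "pmcid": "PMC6031859", "pmid": 30003001, "requested-id": "30003001"}
{"doi": "10.1002/open.201800044", "pmcid": "PMC6031856", "pmid": 30003002, "requested-id": "30003002"}
"""

records = []
with io.StringIO(sample_text) as handle:
    for line in handle:
        line = line.strip()
        if line:  # skip any blank lines
            records.append(json.loads(line))

for rec in records:
    print(rec["requested-id"], "->", rec["pmcid"])
```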

Let's finish the command line section with a perhaps more advanced Python example. If you aren't currently used to handling Pandas dataframes, you may wish to skip to the end of this part. There I show a way to get the data into a form you can use in spreadsheet software, such as Excel or Google Sheets.

With the `json` formatted data in hand, we should be able to use Pandas to read the data, like so:

In [8]:
import pandas as pd
df_cl = pd.read_json("out.json", lines=True)
df_cl

Unnamed: 0,doi,pmcid,pmid,requested-id
0,10.1007/s13205-018-1330-z,PMC6039336,30003000,30003000
1,10.1002/open.201800095,PMC6031859,30003001,30003001
2,10.1002/open.201800044,PMC6031856,30003002,30003002


Currently, the `pmid` would be read in as an integer because Pandas thinks it looks like one.

In [9]:
type(df_cl.pmid.to_list()[0])

int

This is a type that Python wants to use in math, and while it may work, it is less suitable going forward for comparisons as an identifier. Ideally, we'd address this early.  
See a more thorough example below for making dataframes in a way that keeps the identifier as a string.  
That approach adapted to reading this `json`-formatted file would be:

In [10]:
import pandas as pd
import numpy as np
df_cl = pd.read_json("out.json", lines=True)
df_cl['pmid'] = df_cl['pmid'].apply(lambda x: str(int(x)) if pd.notna(x) else np.nan)
df_cl

Unnamed: 0,doi,pmcid,pmid,requested-id
0,10.1007/s13205-018-1330-z,PMC6039336,30003000,30003000
1,10.1002/open.201800095,PMC6031859,30003001,30003001
2,10.1002/open.201800044,PMC6031856,30003002,30003002


Checking shows the values in the `pmid` column are now strings:

In [11]:
type(df_cl.pmid.to_list()[0])

str
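To illustrate why this matters for comparisons, here is a tiny sketch using values from the output above. An integer never equals its string form in Python, so a lookup can silently miss. The name `known_pmids` is just something I made up for the illustration.

```python
pmid_from_json = 30003000    # how the value arrives when parsed as a number
requested_id = "30003000"    # how the identifier looks as text

# Comparing an int to a str is always False, even if they "look" the same
print(pmid_from_json == requested_id)
print(str(pmid_from_json) == requested_id)

# The same problem shows up in membership tests
known_pmids = {"30003000", "30003001", "30003002"}
print(pmid_from_json in known_pmids)
print(str(pmid_from_json) in known_pmids)
```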

What if we wanted the data to go into a spreadsheet?

In [12]:
import pandas as pd
import numpy as np
df_cl = pd.read_json("out.json", lines=True)
df_cl['pmid'] = df_cl['pmid'].apply(lambda x: str(int(x)) if pd.notna(x) else np.nan)
df_cl.to_csv("data_ready_to_be_pasted_into_spreadsheet.csv",index = False)

Let's look at the produced file.

In [13]:
!head data_ready_to_be_pasted_into_spreadsheet.csv

doi,pmcid,pmid,requested-id
10.1007/s13205-018-1330-z,PMC6039336,30003000,30003000
10.1002/open.201800095,PMC6031859,30003001,30003001
10.1002/open.201800044,PMC6031856,30003002,30003002


That file can be opened in any text editor and pasted into spreadsheet software, like Excel or Google Sheets. (You may even be able to open it directly if you download the file to your system.)
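If Pandas isn't available, much the same CSV can be produced with the standard library. This is only a sketch under assumptions: `records` hard-codes two of the rows shown above, and the output goes to an in-memory buffer rather than a file so the example is self-contained.

```python
import csv
import io

# Two of the result rows from above, as plain dictionaries
records = [
    {"doi": "10.1007/s13205-018-1330-z", "pmcid": "PMC6039336",
     "pmid": 30003000, "requested-id": "30003000"},
    {"doi": "10.1002/open.201800095", "pmcid": "PMC6031859",
     "pmid": 30003001, "requested-id": "30003001"},
]

fieldnames = ["doi", "pmcid", "pmid", "requested-id"]
buffer = io.StringIO()  # swap for open("some_file.csv", "w", newline="") to write a real file
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
for rec in records:
    # Store the PMID as text so it is treated as an identifier
    writer.writerow({**rec, "pmid": str(rec["pmid"])})

csv_text = buffer.getvalue()
print(csv_text)
```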

----------------

# Use `pmc_id_converter` via Python

## Basics

(Just jump to the end of this section if you want to find good examples to adapt and don't care about relating them to the Python offerings from suqingdong's GitHub repo for pmc_id_converter.)

In addition to the command line, the Usage notes featured in [suqingdong's GitHub repo for pmc_id_converter](https://github.com/suqingdong/pmc_id_converter#id-converter-between-pmid-pmcid-and-doi) show you can use it via Python as well.

I think this offers much more functionality than the command line if you are going to be using this with Python or Jupyter with a Python-based kernel. However, the offered demonstrations are lacking for novices not looking to dig through the code. I will expand greatly on these and give examples of how you can use this utility in your Python ecosystem. In particular, I'll include use with Pandas, which is a common package in data science work within the Python ecosystem.

These are the Python usage demonstrations offered by [suqingdong's GitHub repo for pmc_id_converter](https://github.com/suqingdong/pmc_id_converter#python):

```python
from pmc_id_converter import API

API.idconv('PMC3531190')
API.idconv('PMC3531190', 'PMC3531191123', 'PMC3531191')
API.idconv('23193287')
API.idconv('10.1093/nar/gks1195')
```

Let's start with the import statement and the first example. Try running this next cell to do that:

In [14]:
from pmc_id_converter import API
API.idconv('PMC3531190')

[PMC3531190]

When running that here, you'll simply get the following as output:

```text
[PMC3531190]
```

That is hardly informative. We got back the same thing we put in, namely the PubMed Central identifier.  
If you know Python, you may recognize that the brackets imply that this is a list.

If we run the third offered example, we see we get something more along the lines of a conversion at least:

In [15]:
API.idconv('23193287')

[PMC3531190]

We didn't need to repeat the import again because it has already been imported into the current namespace.

This time we used the PMID and got the PubMed Central identifier, it seems. However, this doesn't seem as informative as the command line use, and that is because, while this is a list, the item in it is not simply the identifier. Keen Python folks may have realized that though it looks like a list, the item shown isn't a Python string, since it has no quotes around it.  
So what is it?  
Run the following cell to check the type of each of the code examples we ran so far:

In [16]:
print(type(API.idconv('PMC3531190')))
print(type(API.idconv('23193287')))

<class 'list'>
<class 'list'>


Indeed, each example from this section so far gave a Python list.

Each list has only one item so let's check what that is by specifying the first item, i.e., the one with index zero.

In [17]:
print(type(API.idconv('PMC3531190')[0]))
print(type(API.idconv('23193287')[0]))

<class 'pmc_id_converter.core.Record'>
<class 'pmc_id_converter.core.Record'>


Now we see each result was an item of the class `pmc_id_converter.core.Record`. 

That's interesting. So what we first saw was a list with a specially defined record class as the only item.

To learn about this record item, let's try the `print()` function to see how Python has been told to display items of the record type.

In [18]:
records_of_query_results = API.idconv('PMC3531190')
for record in records_of_query_results:
    print(record)

{'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287, 'requested-id': 'PMC3531190'}


Oh, that looks like what we got from the command line approach.  
Let's see what the other example yields when explored that way:

In [19]:
records_of_query_results = API.idconv('23193287')
for record in records_of_query_results:
    print(record)

{'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287, 'requested-id': '23193287'}


A-ha, the same thing because that is the PMID.
Now we seem to be getting information that is much more useful.  
And since it is the same thing, we'll focus on just one example for now.

It turns out, if you dig in the code, that the developer of the package added a `data` attribute to this special 'record' class, and so we could have gotten much the same more easily by taking the first item in the list of results and getting its `data` attribute.

In [20]:
API.idconv('PMC3531190')[0].data

{'doi': '10.1093/nar/gks1195',
 'pmcid': 'PMC3531190',
 'pmid': 23193287,
 'requested-id': 'PMC3531190'}

What we get from the `data` attribute of the record looks like a Python dictionary. Is it?

In [21]:
type(API.idconv('PMC3531190')[0].data)

dict

Indeed, it is.  
Since the object we get from the `data` attribute of the record is a dictionary, we can now use the standard Python ways of handling dictionaries to access the data held in it. Like so:

In [22]:
API.idconv('PMC3531190')[0].data.get('doi')

'10.1093/nar/gks1195'

Or the simple way to access the same thing in a dictionary:

In [23]:
API.idconv('PMC3531190')[0].data['doi']

'10.1093/nar/gks1195'
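The two access styles differ in how they treat a key that isn't there, which is standard Python dictionary behavior. Here is a short sketch with the dictionary hard-coded from the output above, so it runs without querying anything. I use `'errmsg'` as the missing key only for illustration.

```python
# Dictionary copied from the `.data` output above
data = {'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190',
        'pmid': 23193287, 'requested-id': 'PMC3531190'}

print(data.get('doi'))                   # present key: same result either way
print(data.get('errmsg'))                # missing key: .get() returns None
print(data.get('errmsg', 'no error'))    # or a default you supply

try:
    data['errmsg']                       # missing key: square brackets raise
except KeyError as err:
    print("KeyError for missing key:", err)
```

So `.get()` tends to be the more forgiving choice when you aren't sure every result will have every key.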

If you have been paying close attention, you'll also realize the last example under the Python examples, this:

```python
API.idconv('10.1093/nar/gks1195')
```

turns out to be much the same thing as the first (`API.idconv('PMC3531190')`) and third (`API.idconv('23193287')`) examples.  
Let's do that query and see what I mean:

In [24]:
records_of_query_results = API.idconv('10.1093/nar/gks1195')
for record in records_of_query_results:
    print(record)
# Indeed, what we used earlier gives much the same dictionary of results:
print(API.idconv('PMC3531190')[0].data)
print(API.idconv('23193287')[0].data)

{'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287, 'requested-id': '10.1093/nar/gks1195'}
{'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287, 'requested-id': 'PMC3531190'}
{'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287, 'requested-id': '23193287'}


Indeed, only the `'requested-id'` entry that comes from the query input is any different in each resulting dictionary.

So I'd suggest a better Python example to have offered at the repo, for all but the second example, may have been:

### Better Python Example Code for Novices

```python
from pmc_id_converter import API

query_id = 'PMC3531190'
records_of_query_results = API.idconv(query_id)
for record in records_of_query_results:
    print(record)
query_id = '23193287'
records_of_query_results = API.idconv(query_id)
for record in records_of_query_results:
    print(record)
query_id = '10.1093/nar/gks1195'
records_of_query_results = API.idconv(query_id)
for record in records_of_query_results:
    print(record)
print(record.data['pmcid'])
print(record.data['pmid'])
print(record.data['doi'])
```

Let's run that next:

In [25]:
from pmc_id_converter import API
query_id = 'PMC3531190'
records_of_query_results = API.idconv(query_id)
for record in records_of_query_results:
    print(record)
query_id = '23193287'
records_of_query_results = API.idconv(query_id)
for record in records_of_query_results:
    print(record)
query_id = '10.1093/nar/gks1195'
records_of_query_results = API.idconv(query_id)
for record in records_of_query_results:
    print(record)
print(record.data['pmcid'])
print(record.data['pmid'])
print(record.data['doi'])

{'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287, 'requested-id': 'PMC3531190'}
{'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287, 'requested-id': '23193287'}
{'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287, 'requested-id': '10.1093/nar/gks1195'}
PMC3531190
23193287
10.1093/nar/gks1195


I think that more easily & fully illustrates how to do queries and handle the results.
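The three repeated blocks could also be folded into one loop over the query identifiers. The sketch below is self-contained so that it runs without network access: `FakeRecord` and `fake_idconv` are hypothetical stand-ins I made up that mimic only the `.data` attribute seen above, with values copied from the output. With the real package you would call `API.idconv(query_id)` in their place.

```python
class FakeRecord:
    """Hypothetical stand-in mimicking only the `.data` attribute of a record."""

    def __init__(self, data):
        self.data = data


def fake_idconv(query_id):
    """Hypothetical stand-in for API.idconv(); returns canned values from above."""
    data = {'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190',
            'pmid': 23193287, 'requested-id': query_id}
    return [FakeRecord(data)]


results = {}
for query_id in ['PMC3531190', '23193287', '10.1093/nar/gks1195']:
    for record in fake_idconv(query_id):  # real use: API.idconv(query_id)
        results[query_id] = record.data
        print(query_id, '->', record.data['pmcid'],
              record.data['pmid'], record.data['doi'])
```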

The second example offered appears more complex because it is a 'batch' query of several identifiers, similar to the 'batch' example from the command line section. However, we can handle the results much the same way with some adjustments.

In [26]:
# version of `API.idconv('PMC3531190', 'PMC3531191123', 'PMC3531191')`
records_of_query_results = API.idconv('PMC3531190', 'PMC3531191123', 'PMC3531191')
for record in records_of_query_results:
    print(record)

[2025-10-24 16:40:26 ID_CONV_API idconv ERROR MainThread:58] RecordError: Identifier not found in PMC for "PMC3531191123"


{'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287, 'requested-id': 'PMC3531190'}
{'errmsg': 'Identifier not found in PMC', '_id': 'PMC3531191123'}
{'doi': '10.1093/nar/gks1163', 'pmcid': 'PMC3531191', 'pmid': 23193288, 'requested-id': 'PMC3531191'}


When working out how to parallel the offered example `API.idconv('PMC3531190', 'PMC3531191123', 'PMC3531191')`, I also found that you can do what I demonstrate in this next cell because the software has been designed to split things at the comma:

In [27]:
query_ids = 'PMC3531190, PMC3531191123, PMC3531191'
records_of_query_results = API.idconv(query_ids)
for record in records_of_query_results:
    print(record)

[2025-10-24 16:40:26 ID_CONV_API idconv ERROR MainThread:58] RecordError: Identifier not found in PMC for "PMC3531191123"


{'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287, 'requested-id': 'PMC3531190'}
{'errmsg': 'Identifier not found in PMC', '_id': 'PMC3531191123'}
{'doi': '10.1093/nar/gks1163', 'pmcid': 'PMC3531191', 'pmid': 23193288, 'requested-id': 'PMC3531191'}
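If your identifiers start out in a Python list, a comma-separated string like the one above can be built with `str.join()`. The name `my_ids` is just illustrative. Based on the multi-argument form shown earlier, I'd expect unpacking the list with `API.idconv(*my_ids)` to work as well, but I only show the string building here so the snippet runs without network access.

```python
my_ids = ['PMC3531190', 'PMC3531191123', 'PMC3531191']

# Build the single comma-separated string form used in the cell above
query_ids = ', '.join(my_ids)
print(query_ids)
```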


A more practical example of how you may access the results of a batch query to work with them in Python:

In [28]:
my_pmids = []
query_ids = 'PMC3531190, PMC3531191123, PMC3531191'
records_of_query_results = API.idconv(query_ids)
for record in records_of_query_results:
    my_pmids.append(record.data.get('pmid'))
# Otherwise they'll be integers, which isn't what we want: these are identifiers, not numbers for math
my_pmids = [str(x) if isinstance(x, int) else x for x in my_pmids]
my_pmids

[2025-10-24 16:40:27 ID_CONV_API idconv ERROR MainThread:58] RecordError: Identifier not found in PMC for "PMC3531191123"


['23193287', None, '23193288']

Note the PMIDs seem to default to integers and the conversion to string is a little clumsy to keep the `None`. 
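Another way to avoid juggling the `None` is to split the results into found and not-found groups first. This sketch assumes, based only on the output above, that a failed lookup carries an `'errmsg'` key and an `'_id'` key. The list `records_data` is hard-coded from that output so the snippet runs on its own; in real use it would be `[record.data for record in records_of_query_results]`.

```python
# Result dictionaries copied from the batch output above
records_data = [
    {'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190',
     'pmid': 23193287, 'requested-id': 'PMC3531190'},
    {'errmsg': 'Identifier not found in PMC', '_id': 'PMC3531191123'},
    {'doi': '10.1093/nar/gks1163', 'pmcid': 'PMC3531191',
     'pmid': 23193288, 'requested-id': 'PMC3531191'},
]

found = [d for d in records_data if 'errmsg' not in d]
failed = [d for d in records_data if 'errmsg' in d]

found_pmids = [str(d['pmid']) for d in found]   # identifiers as strings
failed_ids = [d['_id'] for d in failed]         # what could not be converted

print(found_pmids)
print(failed_ids)
```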

## Advanced Python Use

Pandas is common to use when doing data science in Python and it offers easier handling of large collections of data.  
In this section I provide examples integrating use of `pmc_id_converter` and Pandas.

For making a dataframe of the results, `pandas.DataFrame.from_records()` seems applicable because it can take a list of dicts, and the results can be easily converted to that form.

In [29]:
import pandas as pd
import numpy as np
query_ids = 'PMC3531190, PMC3531191123, PMC3531191'
records_of_query_results = API.idconv(query_ids)
records_of_query_results_data = [x.data for x in records_of_query_results] # make a list of the results dicts
df = pd.DataFrame.from_records(records_of_query_results_data)
df['pmid'] = df['pmid'].apply(lambda x: str(int(x)) if pd.notna(x) else np.nan) # Don't want the pmid column values staying floats/integers; convert them to strings while leaving the missing values as NaN
df

[2025-10-24 16:40:27 ID_CONV_API idconv ERROR MainThread:58] RecordError: Identifier not found in PMC for "PMC3531191123"


Unnamed: 0,doi,pmcid,pmid,requested-id,errmsg,_id
0,10.1093/nar/gks1195,PMC3531190,23193287.0,PMC3531190,,
1,,,,,Identifier not found in PMC,PMC3531191123
2,10.1093/nar/gks1163,PMC3531191,23193288.0,PMC3531191,,
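As a possible alternative to the `apply()` call above, Pandas' nullable `Int64` dtype followed by its `string` dtype can do a similar conversion. This is a sketch under assumptions, not what the cell above does: the rows are trimmed-down copies of the results, `demo` and `pmid_as_text` are names I made up, and missing values come out as `<NA>` rather than `NaN`.

```python
import pandas as pd

# Trimmed-down copies of the result dictionaries from above
rows = [
    {'pmcid': 'PMC3531190', 'pmid': 23193287},
    {'errmsg': 'Identifier not found in PMC', '_id': 'PMC3531191123'},
    {'pmcid': 'PMC3531191', 'pmid': 23193288},
]
demo = pd.DataFrame.from_records(rows)

# The missing value forces the whole column to floating point
print(demo['pmid'].dtype)

# Nullable integer first (drops the '.0'), then text; missing stays missing
pmid_as_text = demo['pmid'].astype('Int64').astype('string')
print(pmid_as_text.to_list())
```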


You may not wish to have the 'errmsg' column. That could be removed like this:

In [30]:
df = df.drop(columns='errmsg')

And I think technically the `_id` corresponds to the `requested-id`, and so I would propose this step to fix that and jettison the `_id` column:

In [31]:
df['requested-id'] = df['requested-id'].fillna(df['_id'])
df = df.drop(columns='_id')
df

Unnamed: 0,doi,pmcid,pmid,requested-id
0,10.1093/nar/gks1195,PMC3531190,23193287.0,PMC3531190
1,,,,PMC3531191123
2,10.1093/nar/gks1163,PMC3531191,23193288.0,PMC3531191


And if you wanted all the PMIDs, but didn't want to include the '`NaN`'?  
That's easy with Pandas:

In [32]:
pmids = df['pmid'].dropna().to_list()
pmids

['23193287', '23193288']

Compare that to the result of `my_pmids` at the end of the previous section.

We don't have the `None/NaN`. And here from the dataframe, they are strings because we handled that earlier.  
We had to handle that a little more clumsily in the section above for `my_pmids`.

A nice thing about working with a Pandas dataframe as the data object is that you can easily get dictionaries that allow easy lookup of one identifier to get the corresponding one.

In [33]:
# Dictionary: PMID -> PMCID (excluding NaN)
pmid_to_pmcid = df.dropna(subset=['pmid', 'pmcid']).set_index('pmid')['pmcid'].to_dict()
# Dictionary: PMCID -> PMID (excluding NaN)
pmcid_to_pmid = df.dropna(subset=['pmid', 'pmcid']).set_index('pmcid')['pmid'].to_dict()

In [34]:
pmid_to_pmcid

{'23193287': 'PMC3531190', '23193288': 'PMC3531191'}

In [35]:
pmcid_to_pmid

{'PMC3531190': '23193287', 'PMC3531191': '23193288'}
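Taking the idea one step further, a single lookup keyed by any of the three identifier kinds can be built with plain Python. This is just a sketch: `records_data` hard-codes the two successful results from above, and `lookup` is a name I chose.

```python
# The two successful results from above, as plain dictionaries
records_data = [
    {'doi': '10.1093/nar/gks1195', 'pmcid': 'PMC3531190', 'pmid': 23193287},
    {'doi': '10.1093/nar/gks1163', 'pmcid': 'PMC3531191', 'pmid': 23193288},
]

lookup = {}
for data in records_data:
    entry = {'pmid': str(data['pmid']), 'pmcid': data['pmcid'], 'doi': data['doi']}
    # Register the same entry under its PMID, PMCID, and DOI
    for identifier in entry.values():
        lookup[identifier] = entry

print(lookup['PMC3531190']['doi'])
print(lookup['10.1093/nar/gks1163']['pmid'])
```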

Another nice thing you get with Pandas is a way to get the data into spreadsheet software, like Excel or Google Sheets.

In [36]:
import pandas as pd
import numpy as np
query_ids = 'PMC3531190, PMC3531191123, PMC3531191'
records_of_query_results = API.idconv(query_ids)
records_of_query_results_data = [x.data for x in records_of_query_results] # make a list of the results dicts
df = pd.DataFrame.from_records(records_of_query_results_data)
df['pmid'] = df['pmid'].apply(lambda x: str(int(x)) if pd.notna(x) else np.nan) # Don't want the pmid column values staying floats/integers; convert them to strings while leaving the missing values as NaN
df = df.drop(columns='errmsg')
df['requested-id'] = df['requested-id'].fillna(df['_id'])
df = df.drop(columns='_id')
df.to_csv("data_to_be_pasted_into_spreadsheet.csv",index = False)

[2025-10-24 16:40:28 ID_CONV_API idconv ERROR MainThread:58] RecordError: Identifier not found in PMC for "PMC3531191123"


Let's look at the produced file.

In [37]:
!head data_to_be_pasted_into_spreadsheet.csv

doi,pmcid,pmid,requested-id
10.1093/nar/gks1195,PMC3531190,23193287,PMC3531190
,,,PMC3531191123
10.1093/nar/gks1163,PMC3531191,23193288,PMC3531191


That file can be opened in any text editor and pasted into spreadsheet software, like Excel or Google Sheets. (You may even be able to open it directly if you download the file to your system.)

------

Enjoy!