This is a small script just to play around with requesting DOIs and short DOIs.
Basically, I submitted a paper with all my references given as short DOIs. In LaTeX, I used cite keys of the form AuthorYearSDOI. Now the journal has changed them all to full DOIs and asked me to confirm them. That is a lot of work to do manually, so I figured I might as well learn a bit of Python, requests, regular expressions, Jupyter, and so forth.

Enjoy!

In [9]:
#all we need?
import requests
import re

# the URLs we will use
url_DOI = 'https://doi.org/'
url_sDOI = 'https://shortdoi.org/' # short DOIs are sweet!

Let us test requests:

In [10]:
test_DOI = '10.1016/J.CPC.2021.107987'

response = requests.get(url_DOI + test_DOI)
print(response.history)
for resp in response.history:
    print(resp.status_code, resp.url)

[<Response [302]>]
302 https://doi.org/10.1016/J.CPC.2021.107987


In [11]:
test_sDOI = '10/f9bw'

response = requests.get(url_sDOI + test_sDOI)
print(response.history)
for resp in response.history:
    print(resp.status_code, resp.url)

[<Response [302]>, <Response [301]>, <Response [302]>]
302 https://shortdoi.org/10/f9bw
301 https://doi.org/10/f9bw
302 https://doi.org/10.1016/J.CPC.2021.107987


But here is the thing: a DOI redirects you to the actual URL, and a short DOI may go through even more redirects. Let's investigate!
...but how?

In [12]:
def print_redirects(url):
    response = requests.get(url)
    if not response.history:
        print(f'No redirects for {url}')
    else:
        print(f'Redirects for {url}')
        for resp in response.history:
            print(resp.status_code, resp.url)


In [13]:
print_redirects(url_DOI)
print_redirects(url_DOI + test_DOI)
print_redirects(url_sDOI)
print_redirects(url_sDOI + test_sDOI)

Redirects for https://doi.org/
301 https://doi.org/
Redirects for https://doi.org/10.1016/J.CPC.2021.107987
302 https://doi.org/10.1016/J.CPC.2021.107987
No redirects for https://shortdoi.org/
Redirects for https://shortdoi.org/10/f9bw
302 https://shortdoi.org/10/f9bw
301 https://doi.org/10/f9bw
302 https://doi.org/10.1016/J.CPC.2021.107987
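Note that `response.history` only holds the intermediate responses; the final landing page is in `response.url`. To look at one hop at a time instead, `requests` can be told not to follow redirects (`allow_redirects=False`) and the `Location` header followed by hand. A minimal sketch, assuming every hop answers with a standard 3xx status plus a `Location` header. The names `redirect_chain`, `requests_fetch` and `fake_fetch` are hypothetical, and the fake fetcher only replays the hops printed above (with a made-up publisher URL at the end) so the walker can be tried offline:

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# A fetcher takes a URL and returns (status code, Location header or None).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def requests_fetch(url: str) -> Tuple[int, Optional[str]]:
    """Fetch a single hop with requests, without following redirects."""
    import requests  # imported here so the walker itself runs without requests

    resp = requests.get(url, allow_redirects=False, timeout=10)
    return resp.status_code, resp.headers.get('Location')


def redirect_chain(url: str, fetch: Fetcher = requests_fetch,
                   max_hops: int = 10) -> List[Tuple[int, str]]:
    """Return (status, url) for every hop, including the final one."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((status, url))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return chain


# Offline stand-in replaying the hops printed above.
# The final publisher URL is a made-up placeholder.
HOPS = {
    'https://shortdoi.org/10/f9bw': (302, 'https://doi.org/10/f9bw'),
    'https://doi.org/10/f9bw': (301, 'https://doi.org/10.1016/J.CPC.2021.107987'),
    'https://doi.org/10.1016/J.CPC.2021.107987': (302, 'https://publisher.example/article'),
}


def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    return HOPS.get(url, (200, None))


for status, hop in redirect_chain('https://shortdoi.org/10/f9bw', fetch=fake_fetch):
    print(status, hop)
```

Against the live services you would call `redirect_chain(url_sDOI + test_sDOI)` with the default fetcher; the `max_hops` cap just guards against redirect loops.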


Okay, but what does the response actually contain? Let's inspect!

In [18]:
content = requests.get(url_sDOI + test_DOI).text
content

'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html>\n<head>\n<title>shortDOI Service</title>\n\n<meta http-equiv="content-type" content="text/html; charset=UTF-8" />\n\n<link href="/style/new-style2.css" rel="stylesheet" type="text/css" />\n<style type="text/css">\n    .code {font-family: courier, courier-new;\n                font-size: 12px;}\n\n    div.head1-shortdoi {\n    font-size: 120%;\n    color: #385FAB;\n    padding: 0px 27px 10px 0px;\n}\n</style>\n\n</head>\n\n<body>\n\n<!-- TABLE FOR BANNER -->\n\n<table width="100%" border="0" cellpadding="0" cellspacing="0" bgcolor="#acacac" align="center">\n  <tr>\n    <td bgcolor="#acacac" >\n\t    <table width="100%"><tr><td><img src="/img/shortDOI.gif" alt="Logo" width="620" height="100" border="0" /></td><td align="right"><img src="/img/shortDOI-2.gif" alt="Logo" width="150" height="100" border="0" /></td></tr></table>\n\n </td>\n  </tr>\n</tabl

Okay, fine, loads of stuff. We could parse this HTML with BeautifulSoup (the bs4 package, I think it was), but then I would have to relearn that. Aaaaaand right now, I want to relearn regular expressions instead. Here we go!

In [19]:
pattern = re.compile('.*handle.*?\n\n.*?>(.*?)<.*')
pattern.search(content).group(1)

'10/f9bw'

That was a lot more hassle to figure out than it should have been! But now we are here, and things are good :)
Of course, we could have solved this with plain Python string methods instead, but would that be as robust?
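The pattern above leans on `.*` and on exactly two newlines, so it breaks if the page's whitespace changes. A slightly tighter sketch anchors on the markup around the handle instead. It assumes, judging only from the matches in this notebook, that the page contains something like `handle:</div>\n\n<div class="para">10/f9bw</div>`; the sample string and the name `parse_sdoi` are hypothetical:

```python
import re
from typing import Optional

# Assumed markup around the short DOI (inferred from the matches above,
# not copied from the live page):
#   handle:</div>\n\n<div class="para">10/f9bw</div>
SDOI_RE = re.compile(r'handle:</div>\s*<div class="para">\s*([^<\s]+)\s*</div>')


def parse_sdoi(html: str) -> Optional[str]:
    """Return the short DOI from a shortDOI result page, or None if absent."""
    match = SDOI_RE.search(html)
    return match.group(1) if match else None


sample = 'handle:</div>\n\n<div class="para">10/f9bw</div>'
print(parse_sdoi(sample))           # 10/f9bw
print(parse_sdoi('<html></html>'))  # None
```

On the notebook's page this would be `parse_sdoi(content)`, but whether it matches depends on the real page having that markup. `\s*` tolerates any amount of whitespace between the tags, and returning `None` avoids the `AttributeError` that `.group(1)` raises on a failed search.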

In [23]:
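# Note: find('handle') returns where 'handle' starts, so this offset lands
# len('handle') = 6 characters too early, hence the 'para">' prefix below.
start = content.find('handle') + len(':</div>\n\n<div class="para">')
# find('div') stops at the 'div' of '</div>', so the slice keeps a trailing '</'.
end = content[start:].find('div')
print(start, end, type(start), type(end))
content[start:start+end]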

1386 15 <class 'int'> <class 'int'>


'para">10/f9bw</'

Seems like more hassle...
Let's make some functions using regex instead!

In [26]:
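def get_sdoi_py(doi):
    # Same offset arithmetic as above, so it inherits the same off-by-six
    # start and trailing '</' (see the test output below).
    content = requests.get(url_sDOI + doi).text
    start = content.find('handle') + len(':</div>\n\n<div class="para">')
    end = content[start:].find('div')
    return content[start:start+end]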

In [27]:
def get_sdoi_re(doi):
    content = requests.get(url_sDOI + doi).text
    pattern = re.compile(r'.*handle.*?\n\n.*?>(.*?)<.*')
    match = pattern.search(content)
    return match.group(1) if match else None  # None if the page has no handle

Now, for a short test!

In [28]:
print(get_sdoi_py(test_DOI))
print(get_sdoi_re(test_DOI))

para">10/f9bw</
10/f9bw
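For the record, a pure-Python version can sidestep the offset arithmetic that tripped up `get_sdoi_py` by using `str.partition`, which splits around a marker and tells you whether it was found. A sketch, assuming the markup implied by the marker string used above; `parse_sdoi_py` and the sample string are hypothetical:

```python
from typing import Optional

# Assumed markup, taken from the marker string used in get_sdoi_py.
MARKER = 'handle:</div>\n\n<div class="para">'


def parse_sdoi_py(html: str) -> Optional[str]:
    """Take the text between MARKER and the next '<', or None if MARKER is absent."""
    _, found, rest = html.partition(MARKER)
    if not found:
        return None  # marker not on the page
    return rest.split('<', 1)[0].strip()


sample = 'handle:</div>\n\n<div class="para">10/f9bw</div>'
print(parse_sdoi_py(sample))  # 10/f9bw
```

It works, but it still depends on the exact whitespace in `MARKER`, which is why the regex route is the more forgiving one here.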


In [29]:
get_sdoi = get_sdoi_re

**So far, so good!**
That was the intro; now for a short demonstration!

In [30]:
# This is copy-pasted from a reference list
ref_list = """
[1] P. Wedin, IEEE Electr. Insul. Mag. 30 (2014) 20–25, https://doi .org /10 .1109 /MEI .
2014 .6943430.
[2] O. Lesaint, J. Phys. D, Appl. Phys. 49 (2016) 144001, https://doi .org /10 .1088 /
0022 -3727 /49 /14 /144001.
[3] A. Sun, C. Huo, J. Zhuang, High Volt. 1 (2016) 74–80, https://doi .org /10 .1049 /
hve .2016 .0016.
[4] B. Farazmand, Br. J. Appl. Phys. 12 (1961) 251–254, https://doi .org /10 .1088 /
0508 -3443 /12 /5 /310.
[5] L. Niemeyer, L. Pietronero, H.J. Wiesmann, Phys. Rev. Lett. 52 (1984)
1033–1036, https://doi .org /10 .1103 /PhysRevLett .52 .1033.
[6] A.L. Kupershtok, Sov. Tech. Phys. Lett. 18 (1992) 647–649.
[7] P. Biller, in: Proc. 1993 IEEE 11th Int. Conf. Conduct. Break. Dielectr. Liq.
(ICDL’93), 1993, pp. 199–203.
[8] D.I. Karpov, A.L. Kupershtokh, in: Conf. Rec. 1998 IEEE Int. Symp. Electr. Insul.
(Cat No98CH36239), vol. 2, 1998, pp. 607–610.
"""
# Then a bit of manipulation
ref_compact = ref_list.replace(" ", "").replace("\n", "")
pattern = re.compile(r'.*?\](.*?)\[.*?')
pattern.findall(ref_compact)


['P.Wedin,IEEEElectr.Insul.Mag.30(2014)20–25,https://doi.org/10.1109/MEI.2014.6943430.',
 'O.Lesaint,J.Phys.D,Appl.Phys.49(2016)144001,https://doi.org/10.1088/0022-3727/49/14/144001.',
 'A.Sun,C.Huo,J.Zhuang,HighVolt.1(2016)74–80,https://doi.org/10.1049/hve.2016.0016.',
 'B.Farazmand,Br.J.Appl.Phys.12(1961)251–254,https://doi.org/10.1088/0508-3443/12/5/310.',
 'L.Niemeyer,L.Pietronero,H.J.Wiesmann,Phys.Rev.Lett.52(1984)1033–1036,https://doi.org/10.1103/PhysRevLett.52.1033.',
 'A.L.Kupershtok,Sov.Tech.Phys.Lett.18(1992)647–649.',
 'P.Biller,in:Proc.1993IEEE11thInt.Conf.Conduct.Break.Dielectr.Liq.(ICDL’93),1993,pp.199–203.']

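Seems to find what I wanted, except for reference [8]: nothing follows it, so there is no `[` for the pattern to stop at, and it is dropped. Okay, next!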
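Rather than relying on a `[` after each entry, one can split on the numeric `[n]` labels with `re.split`, which also keeps the last reference without any sentinel. A sketch; `split_refs` is a hypothetical name, and it assumes no `[digits]` appears inside a reference's own text:

```python
import re
from typing import List

LABEL_RE = re.compile(r'\[\d+\]')  # matches [1], [2], ..., [123]


def split_refs(ref_text: str) -> List[str]:
    """Compact the text and split it into entries at each [n] label."""
    compact = ref_text.replace(' ', '').replace('\n', '')
    # re.split yields '' before the first label; drop empty pieces.
    return [entry for entry in LABEL_RE.split(compact) if entry]


sample = """
[1] P. Wedin, IEEE Electr. Insul. Mag. 30 (2014) 20–25, https://doi .org /10 .1109 /MEI .
2014 .6943430.
[2] A.L. Kupershtok, Sov. Tech. Phys. Lett. 18 (1992) 647–649.
"""
for entry in split_refs(sample):
    print(entry)
```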

In [31]:
# Let's see what needs to be done to get the sDOIs out of this list!
ref_list = """
[1] P. Wedin, IEEE Electr. Insul. Mag. 30 (2014) 20–25, https://doi .org /10 .1109 /MEI .
2014 .6943430.
[2] O. Lesaint, J. Phys. D, Appl. Phys. 49 (2016) 144001, https://doi .org /10 .1088 /
0022 -3727 /49 /14 /144001.
[3] A. Sun, C. Huo, J. Zhuang, High Volt. 1 (2016) 74–80, https://doi .org /10 .1049 /
hve .2016 .0016.
[4] B. Farazmand, Br. J. Appl. Phys. 12 (1961) 251–254, https://doi .org /10 .1088 /
0508 -3443 /12 /5 /310.
[END]
"""
# Note: I added [END] as a hack so that the last reference is followed by a '[' and gets matched too; I did not want to spend more time on it.

ref_compact = ref_list.replace(" ", "").replace("\n", "")
print(ref_compact)
pattern = re.compile(r'.*?\](.*?)\[.*?')
print(pattern.findall(ref_compact))
pattern2 = re.compile('.org/(.*).')
print([pattern2.findall(s)[0] for s in pattern.findall(ref_compact)])
print([get_sdoi(pattern2.findall(s)[0]) for s in pattern.findall(ref_compact)])

[1]P.Wedin,IEEEElectr.Insul.Mag.30(2014)20–25,https://doi.org/10.1109/MEI.2014.6943430.[2]O.Lesaint,J.Phys.D,Appl.Phys.49(2016)144001,https://doi.org/10.1088/0022-3727/49/14/144001.[3]A.Sun,C.Huo,J.Zhuang,HighVolt.1(2016)74–80,https://doi.org/10.1049/hve.2016.0016.[4]B.Farazmand,Br.J.Appl.Phys.12(1961)251–254,https://doi.org/10.1088/0508-3443/12/5/310.[END]
['P.Wedin,IEEEElectr.Insul.Mag.30(2014)20–25,https://doi.org/10.1109/MEI.2014.6943430.', 'O.Lesaint,J.Phys.D,Appl.Phys.49(2016)144001,https://doi.org/10.1088/0022-3727/49/14/144001.', 'A.Sun,C.Huo,J.Zhuang,HighVolt.1(2016)74–80,https://doi.org/10.1049/hve.2016.0016.', 'B.Farazmand,Br.J.Appl.Phys.12(1961)251–254,https://doi.org/10.1088/0508-3443/12/5/310.']
['10.1109/MEI.2014.6943430', '10.1088/0022-3727/49/14/144001', '10.1049/hve.2016.0016', '10.1088/0508-3443/12/5/310']
['10/cxmk', '10/cxmf', '10/dbt2', '10/bhcrhp']
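The `'.org/(.*).'` pattern works here because every DOI ends its reference and is followed by a period: the greedy `(.*)` runs to the end, and the final `.` (which matches any character) eats that period. If a reference ended without the period, the last character of the DOI would be cut off instead. A sketch of a stricter pattern that anchors on `doi.org/10.` (DOIs always start with `10.`) and only drops an optional trailing period; `extract_doi` is a hypothetical name:

```python
import re
from typing import Optional

# 'doi.org/' followed by a DOI starting with '10.', stopping before an
# optional trailing period at the very end of the entry.
DOI_RE = re.compile(r'doi\.org/(10\..+?)\.?$')


def extract_doi(entry: str) -> Optional[str]:
    match = DOI_RE.search(entry)
    return match.group(1) if match else None


print(extract_doi('HighVolt.1(2016)74–80,https://doi.org/10.1049/hve.2016.0016.'))
print(extract_doi('HighVolt.1(2016)74–80,https://doi.org/10.1049/hve.2016.0016'))
print(extract_doi('Sov.Tech.Phys.Lett.18(1992)647–649.'))
```

The lazy `.+?` keeps growing until `\.?$` can match, so dots inside the DOI are kept and only a final period is dropped.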


Seems like I do get what I wanted in the end there!
Time to build a function for it!

In [32]:
def refs2sdoi(refs):
    refs = refs.replace(" ", "").replace("\n", "")  # compact stuff
    refs = refs + '[END]FictiveReference!'  # sentinel so the last reference is matched too

    pattern = re.compile(r'.*?\](.*?)\[.*?')
    # Use the local refs (not the global ref_compact) and drop the empty
    # match produced when the input already ends in [END].
    ref_list = [ref for ref in pattern.findall(refs) if ref]
    pattern2 = re.compile('.org/(.*).')
    doi_list = [pattern2.findall(s) for s in ref_list]
    doi_list = [doi[0] if doi else None for doi in doi_list]
    sdoi_list = [get_sdoi(doi) if doi else None for doi in doi_list]
    return ref_list, doi_list, sdoi_list
    
r, d, s = refs2sdoi(ref_list)
s

['10/cxmk', '10/cxmf', '10/dbt2', '10/bhcrhp']

Works!

**Secret**
The point of this was actually to compare against my own list. Time to manipulate that list and make the comparison!

In [37]:
my_refs = """Wedin2014cxmk
Lesaint2016cxmf
Sun2016dbt2
Farazmand1961bhcrhp
Niemeyer1984d35qr4
Kupershtok92stpl
Biller1993dctxz7
"""

In [38]:
pattern = re.compile(r'.*?\d\d\d\d(.*)')
my_sdoi = [pattern.findall(l) for l in my_refs.splitlines()]
my_sdoi = ['10/'+doi[0] if doi else None for doi in my_sdoi]
my_sdoi

['10/cxmk', '10/cxmf', '10/dbt2', '10/bhcrhp', '10/d35qr4', None, '10/dctxz7']

In [39]:
for t in zip(my_sdoi, s, d, my_refs.splitlines()):
    print(t[0]==t[1], t)

True ('10/cxmk', '10/cxmk', '10.1109/MEI.2014.6943430', 'Wedin2014cxmk')
True ('10/cxmf', '10/cxmf', '10.1088/0022-3727/49/14/144001', 'Lesaint2016cxmf')
True ('10/dbt2', '10/dbt2', '10.1049/hve.2016.0016', 'Sun2016dbt2')
True ('10/bhcrhp', '10/bhcrhp', '10.1088/0508-3443/12/5/310', 'Farazmand1961bhcrhp')
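`zip` stops at the shortest input, and `s` holds only the four references fed to `refs2sdoi`, so Niemeyer, Kupershtok and Biller were never compared. A sketch using `itertools.zip_longest`, which pads the shorter list with `None` so unmatched entries show up. The lists are copied from the outputs above; `compare` and its status labels are hypothetical:

```python
from itertools import zip_longest
from typing import List, Optional, Tuple


def compare(mine: List[Optional[str]],
            fetched: List[Optional[str]]) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Pair the lists position by position; the shorter one is padded with None."""
    rows = []
    for a, b in zip_longest(mine, fetched):
        if a is None or b is None:
            status = 'missing'  # no short DOI on one side, or that list ran out
        elif a == b:
            status = 'match'
        else:
            status = 'mismatch'
        rows.append((status, a, b))
    return rows


# Lists copied from the outputs above.
my_sdoi = ['10/cxmk', '10/cxmf', '10/dbt2', '10/bhcrhp', '10/d35qr4', None, '10/dctxz7']
s = ['10/cxmk', '10/cxmf', '10/dbt2', '10/bhcrhp']

for row in compare(my_sdoi, s):
    print(*row)
```

Treating a `None` on either side as `'missing'` keeps two absent values from being reported as a match.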


**Done!** Worth it? I don't know... But it was a nice exercise!