## Get the PIDs of all articles in a given issue

The code below depends on the libraries [articlemetaapi >= 1.26](https://pypi.org/project/articlemetaapi/), [rows == 0.3.1](https://pypi.org/project/rows/) and [requests](https://pypi.org/project/requests/).

In [1]:
import itertools
from articlemeta.client import ThriftClient

In [2]:
client = ThriftClient()

In [3]:
for i in client.documents(collection="scl", only_identifiers=True, extra_filter='{"code_issue": "1516-359820180001"}'):
    print(i.code)

S1516-35982018000100501
S1516-35982018000100401
S1516-35982018000100500
S1516-35982018000100400
S1516-35982018000100402
S1516-35982018000100600
S1516-35982018000100502
S1516-35982018000100403
S1516-35982018000100601
S1516-35982018000100300
S1516-35982018000100503
S1516-35982018000100700
S1516-35982018000100504
S1516-35982018000100505
S1516-35982018000100602
S1516-35982018000100506
S1516-35982018000100200
S1516-35982018000100509
S1516-35982018000100507
S1516-35982018000100301
S1516-35982018000100508
S1516-35982018000100404
S1516-35982018000100510
S1516-35982018000100512
S1516-35982018000100511
S1516-35982018000100514
S1516-35982018000100513
S1516-35982018000100701
S1516-35982018000100603
S1516-35982018000100100
S1516-35982018000100201
S1516-35982018000100406
S1516-35982018000100604
S1516-35982018000100517
S1516-35982018000100405
S1516-35982018000100516
S1516-35982018000100702
S1516-35982018000100515
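
The `extra_filter` argument used above is a JSON string written by hand. As a minimal sketch, the same string can be built with `json.dumps`, which avoids quoting mistakes when the issue code comes from a variable. The helper name `issue_filter` is hypothetical and not part of articlemetaapi.

```python
import json


def issue_filter(code_issue):
    """Return the JSON filter string that selects the documents of one issue."""
    return json.dumps({"code_issue": code_issue})


print(issue_filter("1516-359820180001"))
```

The result can be passed as `extra_filter=issue_filter(...)` in the `client.documents` call shown above.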


## Get the ID of an issue from its label and the journal's ISSN

In [4]:
client.get_issue_code_from_label("v95n5s1", "0066-782X", "scl")

'0066-782X20100024'
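
Judging from the values in this notebook, an issue code is the ISSN followed by the year and a four-digit order, and an article PID is `S` plus the issue code plus a five-digit suffix. Under that assumption, the sketch below recovers the issue code from a PID by slicing. `issue_code_from_pid` is a hypothetical helper name.

```python
def issue_code_from_pid(pid):
    """Return the 17-character issue code embedded in an article PID.

    Assumes the layout 'S' + ISSN (9 chars) + year (4) + order (4) + suffix (5),
    inferred from the PIDs printed in this notebook.
    """
    if len(pid) != 23 or not pid.startswith("S"):
        raise ValueError("unexpected PID format: %r" % pid)
    return pid[1:18]


print(issue_code_from_pid("S1516-35982018000100501"))
```

The length check makes the helper reject truncated values instead of returning a partial code.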

## Get the list of articles of the issues processed on a given date

The implementation below gets the list of article PIDs from the processing date of the issues. Note that the processing date is actually the date on which the files were processed by the *converter*.

In [5]:
# the `until_date` arg is not working
issues = [i.code for i in client.issues(
    collection="scl", from_date="2018-05-29", until_date="2018-05-29", only_identifiers=True) 
          if i.processing_date == "2018-05-29"]

def getdocs(issue):
    return client.documents(collection="scl", only_identifiers=True, extra_filter='{"code_issue": "%s"}' % issue)
    
docs = [getdocs(issue) for issue in issues]

for i in itertools.chain(*docs):
    print(i.code)

S0100-72032018000300147
S0100-72032018000300106
S0100-72032018000300115
S0100-72032018000300127
S0100-72032018000300103
S0100-72032018000300137
S0100-72032018000300156
S0100-72032018000300121
S0100-72032018000300163
S0102-64452018000100233
S0102-64452018000100261
S0102-64452018000100135
S0102-64452018000100069
S0102-64452018000100103
S0102-64452018000100167
S0102-64452018000100203
S0102-64452018000100039
S0102-64452018000100011
S2318-03312018000100201
S2318-03312018000100202
S2318-03312018000100203
S2318-03312018000100205
S2318-03312018000100204
S2318-03312018000100206
S2318-03312018000100401
S2318-03312018000100208
S2318-03312018000100209
S2318-03312018000100210
S2318-03312018000100207
S2318-03312018000100211
S2318-03312018000100212
S2318-03312018000100215
S2318-03312018000100213
S2318-03312018000100214
S2318-03312018000100217
S2318-03312018000100216
S2318-03312018000100218
S2318-03312018000100220
S2318-03312018000100219
S2318-03312018000100222
S2318-03312018000100221
S1413-8670201800
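
Because `until_date` does not seem to restrict the results, the cell above filters on `processing_date` in Python. The sketch below wraps that workaround in two functions that take the client as a parameter, and uses `itertools.chain.from_iterable` so the document queries are consumed lazily, one issue at a time. The function names are hypothetical; the `client.issues` and `client.documents` calls mirror the ones used in this notebook.

```python
import itertools
import json


def issues_processed_on(client, collection, date):
    """Yield codes of issues whose processing_date equals `date` (YYYY-MM-DD)."""
    for issue in client.issues(
            collection=collection, from_date=date, only_identifiers=True):
        # Filter on the client side, since `until_date` is reported as not working.
        if issue.processing_date == date:
            yield issue.code


def pids_for_issues(client, collection, issue_codes):
    """Lazily yield the PID of every document that belongs to the given issues."""
    queries = (
        client.documents(
            collection=collection,
            only_identifiers=True,
            extra_filter=json.dumps({"code_issue": code}),
        )
        for code in issue_codes
    )
    for doc in itertools.chain.from_iterable(queries):
        yield doc.code
```

With the `client` created earlier, `pids_for_issues(client, "scl", issues_processed_on(client, "scl", "2018-05-29"))` should yield the same PIDs as the cell above, assuming the service behaves as it did there.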

## Get the list of articles processed since a given date

In [6]:
articles = [i.code for i in client.documents(
    collection="scl", from_date="2018-05-28", only_identifiers=True)]

for i in articles:
    print(i)

S0034-76122018000200321
S1516-44462018000200192
S0001-37652018000301175
S0101-28002018005013103
S1516-44462018000200227
S0001-37652018000301035
S0034-76122018000200221
S1516-44462018000200138
S0001-37652018000300993
S0034-76122018000200264
S1516-44462018000200220
S0001-37652018000301215
S0102-33062018005006102
S1516-44462018000200229
S0001-37652018000301043
S0034-76122018000200244
S1516-44462018000200212
S0001-37652018000301073
S0102-33062018005006101
S1516-44462018000200115
S0001-37652018000300991
S0102-33062018005006103
S1516-44462018000200226
S0001-37652018000301187
S0101-28002018005013101
S1516-44462018000200154
S0001-37652018000301131
S0034-76122018000200285
S1516-44462018000200145
S0001-37652018000301251
S0101-28002018005013104
S1516-44462018000200174
S0001-37652018000301101
S0101-28002018005013102
S1516-44462018000200181
S0001-37652018000301233
S0101-28002018005013105
S1516-44462018000200163
S0001-37652018000301059
S0102-33062018005006104
S1516-44462018000200128
S0001-3765201800

S0034-72802018000300146
S1808-86942018000300332
S0103-50532018000701579
S1808-86942018000300290
S0102-36162018000300337
S2316-40182018000200101
S0103-50532018000701455
S1984-29612018000200177
S0103-50532018000701367
S1984-29612018000200254
S0102-69922018000100007
S1516-35982018000100406
S0102-36162018000300342
S1516-35982018000100604
S0102-36162018000300378
S1808-86942018000300305
S0103-50532018000701406
S1984-29612018000200141
S0034-72802018000300156
S1984-29612018000200146
S0102-36162018000300300
S1808-86942018000300393
S0103-50532018000701440
S2316-40182018000200011
S0102-36162018000300293
S1808-86942018000300324
S0034-72802018000300153
S2316-40182018000200455
S0103-50532018000701464
S2316-40182018000200195
S0103-50532018000701538
S1516-14392018000400234
S0103-50532018000701388
S1984-29612018000200154
S0102-36162018000300350
S2316-40182018000200409
S0102-36162018000300314
S1808-86942018000300351
S0103-50532018000701400
S1984-29612018000200237
S0102-69922018000100085
S1808-8694201800

In [7]:
print('Total articles:', len(articles))

Total articles: 2933


In [8]:
'S0100-54052018000200110' in articles

True
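
`articles` is a list, so each `in` test scans it from the start. When many PIDs need to be checked, a sketch like the following converts the list to a set once and also reveals whether any PID is repeated. The sample PIDs are taken from the output above, with one repeated on purpose for illustration.

```python
# A few PIDs standing in for the full `articles` list.
articles = [
    "S0034-76122018000200321",
    "S1516-44462018000200192",
    "S0034-76122018000200321",  # repeated on purpose
]

# Build the set once; membership tests on it do not scan the whole list.
article_set = set(articles)

print("S1516-44462018000200192" in article_set)
print("Duplicates:", len(articles) - len(article_set))
```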

## Get the list of articles from the contents of a *scilista*

In [9]:
scilista = """
aob v26n2
cpa n53
brag v77n2
rbgg v21n2
rbem v42n2
bpsr v12n2
isz v108
rbf v40n3
asoc v21
seq n78
alea v20n2
hcsm v25n2
alea v20n1
mioc v113n8
rn v31n2
rbf v40n3
rbccv v33n2
cr v48n6
rbgo v40n4
asas v40
ecos v27n1
pvb v38n3
jaos v26
cr v48n5
sp v44n2
bar v15n2
ccrh v31n82
mana v24n1
floram v25n2
isz v108
rbepop v35n1
bjmbr v51n8
edreal v43n3
rbfar v28n2
bdj v29n2
rbepop v35n3
qn v41n5
fm v31
rca v49n3
anaismp v26
eins v16n2
ambiagua v13n3
bor v32
ni v16n2
codas v30n3
mr v21n4
jped v94n3
aabc v90n2
ref v26n2
si v23n1
rdp v9n2
asoc v21
bn v18n3
rbef v40n4
reeusp v52
rh n177
dpjo v23n2
ean v22n3
rbedu v23
ress v27n2
reeusp v52
rounesp 2018nahead
rbpv 2018nahead
jvb 2018nahead
cta 2018nahead
rbp 2018nahead
abc 2018nahead
ijcs 2018nahead
rpp 2018nahead
rbla 2018nahead
spmj 2018nahead
gmb 2018nahead
rbgn v15n46
alm n18
hcsm v24s1
jpe v29
rbcsoc v33n97
rbem v42n1
fm v31
tinf v30n2
jbpml v54n2
aob v26n1
aob v26n1
sausoc v27n1
bjmbr v51n6
csp v34n5 
rgenf v39
brag 2018nahead
hcsm 2018nahead
spmj 2018nahead
edreal 2018nahead
aabc 2017nahead
aabc 2018nahead
pee v22n1
"""

In [10]:
linhas_scilista = (l for l in scilista.split('\n') if l)

In [11]:
tuplas_scilista = [entrada.split() for entrada in linhas_scilista]  # [['aob', 'v26n2'], ['cpa', 'n53'],...]
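
The scilista above contains repeated entries (for example `rbf v40n3` and `isz v108`), which would make the later cells query the same issue more than once. As a sketch, the parser below strips whitespace, skips blank lines and drops duplicates while keeping the original order. `parse_scilista` is a hypothetical name.

```python
def parse_scilista(text):
    """Return unique (acronym, label) tuples in order of first appearance."""
    entries = (tuple(line.split()) for line in text.splitlines() if line.strip())
    # dict.fromkeys keeps insertion order, so it works as an ordered set.
    return list(dict.fromkeys(entries))


sample = """
aob v26n2
rbf v40n3
isz v108
rbf v40n3
csp v34n5
"""
print(parse_scilista(sample))
```

The entries come back as tuples rather than lists, which still unpack as `acronym, label` in the cells that follow.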

In [12]:
#issues = client.issues(collection='scl', extra_filter='{"issue.v930": {"$elemMatch": {"_": "bbr"}}}', only_identifiers=True)
#print(list(issues))

In [13]:
acron_issn_map = {j.acronym:j.scielo_issn for j in client.journals(collection='scl')}

In [14]:
tuplas_issn_label = [(acron_issn_map[a], l) for a, l in tuplas_scilista if acron_issn_map.get(a)]
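
The comprehension above silently skips acronyms that are missing from `acron_issn_map`. A sketch that keeps track of the skipped entries, so they can be reviewed, might look like this. The function name is hypothetical and the sample mapping is illustrative only.

```python
def resolve_acronyms(entries, acron_issn_map):
    """Split (acronym, label) entries into resolved and unknown ones."""
    resolved, unknown = [], []
    for acronym, label in entries:
        issn = acron_issn_map.get(acronym)
        if issn:
            resolved.append((issn, label))
        else:
            unknown.append((acronym, label))
    return resolved, unknown


# Illustrative data; in the notebook the map comes from client.journals().
sample_map = {"aob": "1413-7852"}
resolved, unknown = resolve_acronyms(
    [("aob", "v26n2"), ("xyz", "v1n1")], sample_map)
print(resolved)
print(unknown)
```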

In [15]:
tuplas_issn_issuecode = [(issn, client.get_issue_code_from_label(label, issn, "scl"))
                        for issn, label in tuplas_issn_label]

In [16]:
def listarticles(issuecode):
    for i in client.documents(collection="scl", only_identifiers=True, extra_filter='{"code_issue": "%s"}' % issuecode):
        yield i.code
        
genarticles = [listarticles(i) for _, i in tuplas_issn_issuecode]

In [17]:
for a in itertools.chain(*genarticles):
    print(a)

S1413-78522018000200103
S1413-78522018000200140
S1413-78522018000200112
S1413-78522018000200098
S1413-78522018000200127
S1413-78522018000200145
S1413-78522018000200117
S1413-78522018000200086
S1413-78522018000200131
S1413-78522018000200094
S1413-78522018000200123
S1413-78522018000200108
S1413-78522018000200091
S1413-78522018000200082
S1413-78522018000200135
S0104-83332018000200402
S0104-83332018000200405
S0104-83332018000200401
S0104-83332018000200404
S0104-83332018000200407
S0104-83332018000200400
S0104-83332018000200406
S0104-83332018000200403
S0006-87052018000200394
S0006-87052018000200265
S0006-87052018000200326
S0006-87052018000200385
S0006-87052018000200333
S0006-87052018000200348
S0006-87052018000200283
S0006-87052018000200243
S0006-87052018000200221
S0006-87052018000200253
S0006-87052018000200404
S0006-87052018000200365
S0006-87052018000200212
S0006-87052018000200273
S0006-87052018000200372
S0006-87052018000200292
S0006-87052018000200299
S0006-87052018000200230
S0006-8705201800

S1678-77572018000100501
S1678-77572018000100401
S1678-77572018000100402
S1678-77572018000100404
S1678-77572018000100407
S1678-77572018000100403
S1678-77572018000100406
S1678-77572018000100405
S1678-77572018000100408
S1678-77572018000100409
S1678-77572018000100413
S1678-77572018000100411
S1678-77572018000100410
S1678-77572018000100412
S1678-77572018000100418
S1678-77572018000100420
S1678-77572018000100419
S1678-77572018000100417
S1678-77572018000100415
S1678-77572018000100416
S1678-77572018000100414
S1678-77572018000100422
S1678-77572018000100423
S1678-77572018000100424
S1678-77572018000100421
S1678-77572018000100425
S1678-77572018000100428
S1678-77572018000100426
S1678-77572018000100429
S1678-77572018000100427
S1678-77572018000100430
S1678-77572018000100433
S1678-77572018000100431
S1678-77572018000100432
S1678-77572018000100434
S1678-77572018000100435
S1678-77572018000100442
S1678-77572018000100446
S1678-77572018000100441
S1678-77572018000100436
S1678-77572018000100444
S1678-7757201800

S1679-45082018000200500
S1679-45082018000200501
S1679-45082018000200200
S1679-45082018000200202
S1679-45082018000200502
S1679-45082018000200201
S1679-45082018000200400
S1679-45082018000200205
S1679-45082018000200206
S1679-45082018000200100
S1679-45082018000200204
S1679-45082018000200203
S1679-45082018000200208
S1679-45082018000200700
S1679-45082018000200207
S1679-45082018000200209
S1679-45082018000200503
S1679-45082018000200211
S1679-45082018000200210
S1679-45082018000200900
S1679-45082018000200213
S1679-45082018000200212
S1679-45082018000200300
S1679-45082018000200800
S1980-993X2018000300300
S1980-993X2018000300301
S1980-993X2018000300303
S1980-993X2018000300302
S1980-993X2018000300304
S1980-993X2018000300305
S1980-993X2018000300306
S1980-993X2018000300308
S1980-993X2018000300307
S1980-993X2018000300309
S1980-993X2018000300311
S1980-993X2018000300310
S1980-993X2018000300314
S1980-993X2018000300312
S1980-993X2018000300313
S1806-83242018000100200
S1806-83242018000100201
S1806-8324201800

S0034-83092018000100300
S0034-83092018000100305
S0034-83092018000100311
S0034-83092018000100303
S0034-83092018000100304
S0034-83092018000100306
S0034-83092018000100302
S0034-83092018000100310
S0034-83092018000100308
S0034-83092018000100301
S0034-83092018000100307
S0034-83092018000100309
S0034-83092018000100313
S0034-83092018000100312
S0034-83092018000100314
S2176-94512018000200068
S2176-94512018000200037
S2176-94512018000200054
S2176-94512018000200022
S2176-94512018000200062
S2176-94512018000200030
S2176-94512018000200087
S2176-94512018000200046
S2176-94512018000200007
S2176-94512018000200075
S1414-81452018000300201
S1414-81452018000300701
S1414-81452018000300601
S1414-81452018000300202
S1414-81452018000300203
S1414-81452018000300204
S1414-81452018000300702
S1414-81452018000300205
S1414-81452018000300206
S1414-81452018000300602
S1414-81452018000300207
S1414-81452018000300208
S1414-81452018000300209
S1414-81452018000300703
S1414-81452018000300210
S1413-24782018000100200
S1413-2478201800

S0102-69092018000200502
S0102-69092018000200504
S0102-69092018000200503
S0102-69092018000200507
S0102-69092018000200703
S0102-69092018000200506
S0102-69092018000200701
S0102-69092018000200702
S0102-69092018000200505
S0102-69092018000200509
S0102-69092018000200508
S0102-69092018000200704
S0102-69092018000200501
S0102-69092018000200510
S0100-55022018000100161
S0100-55022018000100027
S0100-55022018000100075
S0100-55022018000100040
S0100-55022018000100181
S0100-55022018000100105
S0100-55022018000100190
S0100-55022018000100006
S0100-55022018000100199
S0100-55022018000100152
S0100-55022018000100142
S0100-55022018000100067
S0100-55022018000100057
S0100-55022018000100129
S0100-55022018000100226
S0100-55022018000100031
S0100-55022018000100084
S0100-55022018000100094
S0100-55022018000100047
S0100-55022018000100015
S0100-55022018000100216
S0100-55022018000100171
S0100-55022018000100115
S0100-55022018000100207
S0100-55022018000100121
S0103-51502018000100300
S0103-51502018000100203
S0103-5150201800

## Get the list of articles in a given issue whose DOIs were not deposited in a given period

This recipe compares the articles of a given journal deposited within a time interval, as reported by **doi.scielo.org**, with the articles contained in an issue.
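
At its core the comparison is a set difference: the PIDs of the issue minus the PIDs that appear as deposited. A minimal sketch with made-up PIDs, which only imitate the format and do not refer to real articles:

```python
# Hypothetical PIDs, used only to illustrate the set operation.
all_pids = {
    "S0000-00002018000300001",
    "S0000-00002018000300002",
    "S0000-00002018000300003",
}
deposited_pids = {"S0000-00002018000300001", "S0000-00002018000300003"}

# Articles in the issue that have no deposit in the queried period.
pending = sorted(all_pids - deposited_pids)
print(pending)
```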

In [18]:
code_issue = '2448-167X20180003'

In [19]:
all_articles = set()
for i in client.documents(collection="scl", only_identifiers=True, extra_filter='{"code_issue": "%s"}' % code_issue):
    all_articles.add(i.code)

In [20]:
import rows
import requests
from io import BytesIO

In [21]:
url = 'http://doi.scielo.br/?filter_issn=&filter_journal_acronym=remi&filter_prefix=&filter_has_valid_references=&filter_submission_status=&filter_feedback_status=&filter_start_range=05%2F30%2F2018+-+06%2F29%2F2018'
html = requests.get(url).content
table = rows.import_from_html(BytesIO(html))
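
The query string above is easier to read and change when it is built from a dictionary. The sketch below uses `urllib.parse.urlencode` from the standard library and reproduces the URL of the previous cell. The parameter names and values are copied from that URL, while `build_deposit_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_deposit_url(acronym, date_range, base="http://doi.scielo.br/"):
    """Build the deposit-report URL for a journal acronym and a date range."""
    params = {
        "filter_issn": "",
        "filter_journal_acronym": acronym,
        "filter_prefix": "",
        "filter_has_valid_references": "",
        "filter_submission_status": "",
        "filter_feedback_status": "",
        # Dates follow the MM/DD/YYYY format seen in the original URL.
        "filter_start_range": date_range,
    }
    return base + "?" + urlencode(params)


url = build_deposit_url("remi", "05/30/2018 - 06/29/2018")
print(url)
```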

In [22]:
for f in table.fields.items():
    print(f)

('field_0', <class 'rows.fields.IntegerField'>)
('inicio_de_processo', <class 'rows.fields.DatetimeField'>)
('periodico', <class 'rows.fields.TextField'>)
('deposito', <class 'rows.fields.TextField'>)
('prefixo', <class 'rows.fields.FloatField'>)
('referencias_validas', <class 'rows.fields.BoolField'>)
('situacao_de_submissao', <class 'rows.fields.TextField'>)
('situacao_de_deposito', <class 'rows.fields.TextField'>)
('funcoes', <class 'rows.fields.TextField'>)


In [23]:
deposited_articles = set(r for r in [r.deposito[4:] for r in table] if r.startswith('S'+code_issue))

In [24]:
if not deposited_articles:
    print('No article is listed as deposited in the queried period')
else:
    print('Total articles pending deposit:', len((all_articles - deposited_articles)))
    for i in (all_articles - deposited_articles):
        print(i)

Total articles pending deposit: 0


## Get a list of articles from their DOIs

In [25]:
def get_codes(q, c):
    for i in client.documents(collection=c, only_identifiers=True, extra_filter=q):
        yield i.code

In [26]:
def get_code(doi, regex=False, collection='scl'):
    q = r'{"doi": {"$regex": "^%s$", "$options": "i"}}' if regex else r'{"doi": "%s"}'
    q = q % doi
    try:
        return next(get_codes(q, collection))
    except StopIteration:
        return None
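
In the regex branch of `get_code`, characters such as `.` in a DOI are interpreted as regular-expression metacharacters. As a sketch, the query can be built with `re.escape` and `json.dumps` so that the DOI is matched literally and the JSON stays valid. `doi_query` is a hypothetical helper, and whether the service accepts every escaped form is an assumption that has not been verified here.

```python
import json
import re


def doi_query(doi, regex=False):
    """Return the JSON filter string for an exact or case-insensitive DOI match."""
    if regex:
        # Escape metacharacters so the DOI is matched literally.
        pattern = "^%s$" % re.escape(doi)
        return json.dumps({"doi": {"$regex": pattern, "$options": "i"}})
    return json.dumps({"doi": doi})


print(doi_query("10.1590/0104-07072018002018editorial2"))
print(doi_query("10.1590/0104-07072018002018editorial2", regex=True))
```

The returned string has the same shape as the `q` variable in `get_code` and could be passed to `get_codes` in its place.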

In [27]:
%%time
assert get_code('10.1590/0104-07072018002018editorial2'.upper()) == 'S0104-07072018000200100'

CPU times: user 12.6 ms, sys: 8.95 ms, total: 21.5 ms
Wall time: 561 ms


In [28]:
#assert get_code('10.1590/S0074-02761940000400003', regex=True) == 'S0074-02761940000400003'

In [29]:
dois = """
http://dx.doi.org/10.1590/1984-0462/;2018;36;2;00004 
http://dx.doi.org/10.1590/1984-0462/;2018;36;2;00018
http://dx.doi.org/10.1590/1983-211720182001016
http://dx.doi.org/10.1590/1983-21172018200111 
http://dx.doi.org/10.1590/1983-21172018200112
http://dx.doi.org/10.1590/0100-6991e-20181710 
http://dx.doi.org/10.1590/0100-6991e-20181719
http://dx.doi.org/10.1590/s1678-4634201844003001 
http://dx.doi.org/10.1590/s1678-4634201844171567 
http://dx.doi.org/10.1590/s1678-4634201844172094
http://dx.doi.org/10.1590/1808-057x201805560
http://dx.doi.org/10.1590/2446-4740.06117 
http://dx.doi.org/10.1590/2446-4740.02618
http://dx.doi.org/10.1590/0102-311xer165716 
http://dx.doi.org/10.1590/0102-311x00097018
http://dx.doi.org/10.1590/0102-311x00030318 
http://dx.doi.org/10.1590/0102-311x00116317 
http://dx.doi.org/10.1590/0102-311x00074817 
http://dx.doi.org/10.1590/0102-311x00113717
http://dx.doi.org/10.1590/0102-311x00088117
http://dx.doi.org/10.1590/0102-311x00029817 
http://dx.doi.org/10.1590/0102-311x00094417 
http://dx.doi.org/10.1590/0102-311x00213816
http://dx.doi.org/10.1590/0102-311x00156416
http://dx.doi.org/10.1590/0102-311x00093417 
http://dx.doi.org/10.1590/0104-07072018002018editorial2 
http://dx.doi.org/10.1590/0104-070720180005750016 
http://dx.doi.org/10.1590/0104-07072018005710016
http://dx.doi.org/10.1590/0104-07072018005180016
http://dx.doi.org/10.1590/0104-07072018004880016 
http://dx.doi.org/10.1590/0104-07072018003820016 
http://dx.doi.org/10.1590/0104-07072018003770017 
http://dx.doi.org/10.1590/0104-070720180002820016 
http://dx.doi.org/10.1590/0104-070720180001460017
http://dx.doi.org/10.1590/0104-070720180000560017 
http://dx.doi.org/10.1590/0104-07072018000170016
http://dx.doi.org/10.1590/10.1590/2316-4018541 
http://dx.doi.org/10.1590/10.1590/2316-4018542
http://dx.doi.org/10.1590/10.1590/2316-4018543
http://dx.doi.org/10.1590/10.1590/2316-4018544
http://dx.doi.org/10.1590/10.1590/2316-4018545
http://dx.doi.org/10.1590/10.1590/2316-4018546
http://dx.doi.org/10.1590/10.1590/2316-4018547 
http://dx.doi.org/10.1590/10.1590/2316-4018548
http://dx.doi.org/10.1590/10.1590/2316-4018549
http://dx.doi.org/10.1590/10.1590/2316-40185410 
http://dx.doi.org/10.1590/10.1590/2316-40185411
http://dx.doi.org/10.1590/10.1590/2316-40185412
http://dx.doi.org/10.1590/10.1590/2316-40185413
http://dx.doi.org/10.1590/10.1590/2316-40185414 
http://dx.doi.org/10.1590/10.1590/2316-40185415 
http://dx.doi.org/10.1590/10.1590/2316-40185416
http://dx.doi.org/10.1590/10.1590/2316-40185417
http://dx.doi.org/10.1590/10.1590/2316-40185418 
http://dx.doi.org/10.1590/10.1590/2316-40185419
http://dx.doi.org/10.1590/10.1590/2316-40185420
http://dx.doi.org/10.1590/10.1590/2316-40185421
http://dx.doi.org/10.1590/10.1590/2316-40185422
http://dx.doi.org/10.1590/10.1590/2316-40185423
http://dx.doi.org/10.1590/10.1590/2316-40185424 
http://dx.doi.org/10.1590/10.1590/2316-40185425
http://dx.doi.org/10.1590/1806-9584-2018v26n249845 
http://dx.doi.org/10.1590/1806-9584-2018v26n249763
http://dx.doi.org/10.1590/1806-9584-2018v26n244481
http://dx.doi.org/10.1590/1806-9584-2018v26n245859 
http://dx.doi.org/10.1590/1806-9584-2018v26n238901
http://dx.doi.org/10.1590/1806-9584-2018v26n234529 
http://dx.doi.org/10.1590/s1980-220x2016050903315 
http://dx.doi.org/10.1590/s1980-220x2017009103336 
http://dx.doi.org/10.1590/s1980-220x2017033903313
http://dx.doi.org/10.1590/s1980-220x2017025303321
http://dx.doi.org/10.1590/s1980-220x2017017903302 
http://dx.doi.org/10.1590/s1980-220x2017013903309
""".strip().splitlines()

In [30]:
cleaned_dois = [d.replace('http://dx.doi.org/', '').strip() for d in dois]
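
The replacement above handles only the `http://dx.doi.org/` prefix. A slightly more tolerant sketch, assuming the input may also use `https` or the `doi.org` host, is shown below. `normalize_doi` is a hypothetical helper, and it does not try to repair DOIs that are malformed in other ways.

```python
import re

# Resolver prefixes accepted by this sketch:
# http(s)://doi.org/ and http(s)://dx.doi.org/
_RESOLVER_PREFIX = re.compile(r"^https?://(?:dx\.)?doi\.org/", re.IGNORECASE)


def normalize_doi(value):
    """Strip surrounding whitespace and a leading DOI resolver URL, if present."""
    return _RESOLVER_PREFIX.sub("", value.strip())


print(normalize_doi("http://dx.doi.org/10.1590/1983-21172018200111 "))
print(normalize_doi("https://doi.org/10.1590/2446-4740.06117"))
```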

In [31]:
# searching with the DOIs in all-uppercase characters (ArticleMeta apparently does this =/)
article_codes = [(d, get_code(d.upper())) for d in cleaned_dois]

In [32]:
# now a less optimistic approach: we assume there may be multiple records with the same DOI
missing_codes = [(d, list(get_codes('{"doi": {"$regex": "^%s$", "$options": "i"}}' % d, 'scl')))
                 for d, code in article_codes if code is None]

The DOIs listed below have no associated article:

In [33]:
missing_codes

[]

In [34]:
for c in [code for _, code in article_codes if code]:
    print(c)

S0103-05822018000200164
S0103-05822018000200122
S1983-21172018000100900
S1983-21172018000100213
S1983-21172018000100212
S0100-69912018000300400
S0100-69912018000300150
S1517-97022018000100750
S1517-97022018000100459
S1517-97022018000100460
S1519-70772018005004101
S2446-47402018000200166
S2446-47402018000200157
S0102-311X2018000609001
S0102-311X2018000600201
S0102-311X2018000600301
S0102-311X2018000600501
S0102-311X2018000600502
S0102-311X2018000603001
S0102-311X2018000604001
S0102-311X2018000605001
S0102-311X2018000605002
S0102-311X2018000605003
S0102-311X2018000605004
S0102-311X2018000605005
S0104-07072018000200100
S0104-07072018000200331
S0104-07072018000200330
S0104-07072018000200329
S0104-07072018000200328
S0104-07072018000200327
S0104-07072018000200326
S0104-07072018000200325
S0104-07072018000200324
S0104-07072018000200323
S0104-07072018000200322
S2316-40182018000200011
S2316-40182018000200021
S2316-40182018000200041
S2316-40182018000200061
S2316-40182018000200085
S2316-4018201800