Filter non-active journals from KBART export#80

Merged
robertatakenaka merged 3 commits into master from copilot/remove-inactive-periodicals
Feb 3, 2026
Copilot AI commented Feb 3, 2026

The KBART export included all journals regardless of publication status, but library discovery systems expect only actively publishing journals.

Changes

  • Added a current_status filter in Dumper.items() that skips any journal whose status is not 'current' (for example, suspended or deceased)
  • Only journals with current_status == 'current' are now exported
  • Added debug logging for skipped journals
for data in self._articlemeta.journals(collection=self.collection, issn=issn):
    if data.current_status != 'current':
        logger.debug('Skipping non-active journal: %s (status: %s)' % (data.scielo_issn, data.current_status))
        continue
    logger.debug('Reading document: %s' % data.scielo_issn)
    yield self.fmt_csv(data)
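
The filter's behavior can be checked in isolation. The following is a minimal sketch, not the project's test suite: active_only is a hypothetical helper that mirrors the condition in Dumper.items(), and the journal objects are stand-ins built with SimpleNamespace rather than real ArticleMeta records.

```python
from types import SimpleNamespace


def active_only(journals):
    """Yield only journals whose current_status is 'current'.

    Hypothetical helper mirroring the check added to Dumper.items().
    """
    for data in journals:
        if data.current_status != 'current':
            continue  # skip suspended, deceased, or any other status
        yield data


# Stand-in journal records; the ISSNs are made up for illustration.
journals = [
    SimpleNamespace(scielo_issn='1111-1111', current_status='current'),
    SimpleNamespace(scielo_issn='2222-2222', current_status='suspended'),
    SimpleNamespace(scielo_issn='3333-3333', current_status='deceased'),
]

exported = [j.scielo_issn for j in active_only(journals)]
```

Because the comparison is against 'current' rather than a list of excluded statuses, any status value not yet known to the exporter would also be skipped.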
Original prompt

This section details the original issue you should resolve

<issue_title>Exclude journals that are not active</issue_title>
<issue_description>

for data in self._articlemeta.journals(collection=self.collection, issn=issn):
</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits February 3, 2026 20:33
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Remove inactive periodicals from export" to "Filter non-active journals from KBART export" Feb 3, 2026
Copilot AI requested a review from robertatakenaka February 3, 2026 20:36
robertatakenaka marked this pull request as ready for review February 3, 2026 20:49
Copilot AI review requested due to automatic review settings February 3, 2026 20:49
robertatakenaka merged commit 6d2c37b into master Feb 3, 2026
2 checks passed
Copilot AI left a comment

Pull request overview

This PR updates the KBART export to exclude journals that are not actively publishing, aligning the exported dataset with expectations of library discovery systems.

Changes:

  • Filtered KBART export to only include journals where current_status == 'current'.
  • Added debug logging to record when non-active journals are skipped.
  • Updated .gitignore to ignore a src/ directory (pip source directory).

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

File              Description
export/kbart.py   Filters out non-current journals during iteration and adds debug logs for skipped titles.
.gitignore        Ignores src/ directory.

Comment on lines +133 to 135
        logger.debug('Skipping non-active journal: %s (status: %s)' % (data.scielo_issn, data.current_status))
        continue
    logger.debug('Reading document: %s' % data.scielo_issn)
Copilot AI Feb 3, 2026

These logger.debug(...) calls build the message via % interpolation before the log level check, which adds overhead when running with INFO/WARNING. Prefer passing the format string and arguments to the logger (e.g., logger.debug('Skipping ... %s', value)), which defers formatting until needed.

Suggested change
-        logger.debug('Skipping non-active journal: %s (status: %s)' % (data.scielo_issn, data.current_status))
-        continue
-    logger.debug('Reading document: %s' % data.scielo_issn)
+        logger.debug('Skipping non-active journal: %s (status: %s)', data.scielo_issn, data.current_status)
+        continue
+    logger.debug('Reading document: %s', data.scielo_issn)
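
The difference between the two call styles can be observed directly. This is a small illustrative sketch using only the standard logging module; the Counted class and the logger name kbart_demo are made up for the demonstration and are not part of the project.

```python
import logging


class Counted:
    """Hypothetical value that records how many times it is formatted."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return '0000-0000'


logger = logging.getLogger('kbart_demo')
logger.setLevel(logging.INFO)             # DEBUG records are discarded
logger.propagate = False                  # keep the demo silent
logger.addHandler(logging.NullHandler())

eager = Counted()
# The % operator runs first, so the string is built even though
# the DEBUG record is then thrown away.
logger.debug('Reading document: %s' % eager)

lazy = Counted()
# The logger checks the level first and never formats the argument.
logger.debug('Reading document: %s', lazy)
```

After running this, eager has been formatted once and lazy not at all, which is the overhead the comment refers to.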

Comment on lines +132 to 136
    if data.current_status != 'current':
        logger.debug('Skipping non-active journal: %s (status: %s)' % (data.scielo_issn, data.current_status))
        continue
    logger.debug('Reading document: %s' % data.scielo_issn)
    yield self.fmt_csv(data)
Copilot AI Feb 3, 2026

With the new current_status filter, items() only ever calls fmt_csv() for current journals. In fmt_csv(), the non-current branch is therefore unreachable, and the last_document lookup that serves it is wasted work: last_included_document_by_journal() is now an unnecessary API call for every exported row. Consider removing the non-current branch entirely, or at least moving the last_document lookup inside that branch if you intend to keep it for future use.
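
A rough sketch of the second option, moving the lookup inside the non-current branch, might look like the following. The body of fmt_csv() is not shown in this PR, so everything here is an assumption: StubArticleMeta, last_issue_date, the publication_date attribute, and the single-argument signature of last_included_document_by_journal() are hypothetical stand-ins used only to show the control flow.

```python
from types import SimpleNamespace


class StubArticleMeta:
    """Hypothetical stand-in for the ArticleMeta client; counts lookups."""

    def __init__(self):
        self.lookups = 0

    def last_included_document_by_journal(self, issn):
        # Assumed signature; the real client method may take other arguments.
        self.lookups += 1
        return SimpleNamespace(publication_date='2019-12')


def last_issue_date(client, journal):
    """Return the last publication date, looking it up only when needed."""
    if journal.current_status == 'current':
        return ''  # still publishing: no end date, so no API call
    last_document = client.last_included_document_by_journal(journal.scielo_issn)
    return last_document.publication_date


client = StubArticleMeta()
current = SimpleNamespace(scielo_issn='1111-1111', current_status='current')
deceased = SimpleNamespace(scielo_issn='2222-2222', current_status='deceased')

current_date = last_issue_date(client, current)
lookups_after_current = client.lookups
deceased_date = last_issue_date(client, deceased)
```

With this shape, rows for current journals cost no extra request, while the branch for non-current journals keeps working if the filter in items() is ever relaxed.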


Development

Successfully merging this pull request may close these issues.

Exclude journals that are not active

2 participants