
Refactor library with support for book accesses along the lines of COUNTER R5.1 #57

Merged
pitangainnovare merged 22 commits into scieloorg:main from pitangainnovare:v2.0.0
May 1, 2026


@pitangainnovare
Contributor

What does this PR do?

This PR prepares the v2.0.0 release of scielo-usage-counter with support for the BunnyCDN format, a URL translator for SciELO Books with R5.1 counting of books and chapters, a metadata module for indexing documents and sources in OpenSearch, a compatibility fix for device-detector 0.10, translation of strings from Portuguese to English in all CLI scripts, code cleanup, a GeoIp fix, improved fixtures, and dependency updates.

The README was rewritten to cover env vars, the updated CLI, library usage, and a features section.
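
As a rough illustration of the Books translator and the per-session unique counting mentioned above, the sketch below parses a book or chapter URL and counts each item at most once per session. The `/id/<book>/<chapter>` shape follows the pattern named in this PR's commit notes; the regex, the function names (`parse_book_url`, `count_unique_requests`) and the `(session_id, url)` input shape are assumptions made for illustration, not the library's actual API.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical pattern: /id/<book> or /id/<book>/<chapter>, optional trailing slash.
BOOK_PATH = re.compile(r"^/id/(?P<book>[^/]+)(?:/(?P<chapter>[^/]+))?/?$")


def parse_book_url(url):
    """Return (book_id, chapter_id or None), or None if the URL does not match."""
    match = BOOK_PATH.match(urlparse(url).path)
    if match is None:
        return None
    return match.group("book"), match.group("chapter")


def count_unique_requests(hits):
    """Count each book and each chapter at most once per session.

    `hits` is an iterable of (session_id, url) pairs (an assumed input shape).
    """
    seen_books, seen_chapters = set(), set()
    for session_id, url in hits:
        parsed = parse_book_url(url)
        if parsed is None:
            continue  # not a book URL under the assumed pattern
        book, chapter = parsed
        seen_books.add((session_id, book))
        if chapter is not None:
            seen_chapters.add((session_id, book, chapter))
    books = Counter(book for _, book in seen_books)
    chapters = Counter((book, chapter) for _, book, chapter in seen_chapters)
    return books, chapters
```

Using sets keyed by session keeps repeated hits on the same chapter within one session from inflating the unique totals, while a chapter hit still counts toward its parent book.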

Where could the review start?

  1. scielo_usage_counter/log_handler.py:397-409: changes to format_client_name and format_client_version (device-detector compatibility)
  2. scielo_usage_counter/translator/books.py: new URL translator for SciELO Books
  3. scielo_usage_counter/values.py: PATTERN_BUNNYCDN_LOG_FORMAT + removal of duplicates
  4. scielo_usage_counter/utils/metadata/: new metadata module

How could this be tested manually?

# 1. Install dependencies
pip install -r requirements.txt
pip install -e .

# 2. Run the tests
pytest tests/test_log_handler.py tests/test_counter.py -v

# 3. Test parsing an Apache log
parse-log -m tests/fixtures/map.mmdb -r tests/fixtures/counter-robots.txt \
  -f tests/fixtures/usage.log --validate

# 4. Test parsing a BunnyCDN log
parse-log -m tests/fixtures/map.mmdb -r tests/fixtures/counter-robots.txt \
  -f tests/fixtures/usage.scl.bunnynet.log --validate

# 5. Check the stats
cat tests/fixtures/usage.log.processed.summary
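
The `--validate` switch used in the commands above is a boolean flag. A minimal sketch of how such a flag can be declared with `argparse` and forwarded as keyword arguments through `vars(args)`: only the flag spellings (`-m`, `-r`, `-f`, `--validate`) come from the commands above, while the `dest` names, help texts and the `run` function are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="parse-log")
    # dest names and help texts below are illustrative guesses.
    parser.add_argument("-m", dest="mmdb", required=True, help="geolocation map file")
    parser.add_argument("-r", dest="robots", required=True, help="robots list file")
    parser.add_argument("-f", dest="logfile", required=True, help="log file to parse")
    # store_true: False by default, True when the flag is present,
    # so the help text should describe enabling, not disabling.
    parser.add_argument("--validate", action="store_true", help="Enable validation")
    return parser


def run(mmdb, robots, logfile, validate):
    """Hypothetical entry point; it only echoes what it received."""
    return {"mmdb": mmdb, "robots": robots, "logfile": logfile, "validate": validate}


args = build_parser().parse_args(
    ["-m", "map.mmdb", "-r", "robots.txt", "-f", "usage.log", "--validate"]
)
# vars(args) is the documented way to view a Namespace as a dict.
result = run(**vars(args))
```

`vars(args)` returns the same mapping as `args.__dict__`, but it is the form the `argparse` documentation recommends.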

Any context you would like to provide?

This PR is part of the effort to modernize the SciELO libraries, together with:

  • scielo_log_validator v2.0.0: drop Python 2, BunnyCDN, env vars
  • usage v2.0.0: new Document/Source/Reports apps, OpenSearch pipeline

The compatibility changes for device-detector 0.10 are breaking changes to the output format: client names are now written in full (Chrome instead of CH, Firefox instead of FF). The processed fixtures were regenerated.
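
Since previously processed output may still carry the short codes, a consumer comparing old and new files might normalize them first. A minimal sketch, assuming only the four code pairs listed in this PR's commit notes (SF, CH, FF, CM); the helper name `normalize_client_name` is hypothetical, and any other value passes through unchanged.

```python
# Mapping taken from the commit notes of this PR; it is not exhaustive.
LEGACY_CLIENT_NAMES = {
    "SF": "Safari",
    "CH": "Chrome",
    "FF": "Firefox",
    "CM": "Chrome Mobile",
}


def normalize_client_name(name):
    """Return the full client name for a known legacy short code."""
    return LEGACY_CLIENT_NAMES.get(name, name)
```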

Screenshots

N/A

What are the relevant tickets?

#53
#55

References

N/A

Copilot AI and others added 22 commits February 6, 2026 00:36
Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
…arity

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
- Updated requirements.txt to use scielo_log_validator@0.5.1
- Added PATTERN_BUNNYCDN_LOG_FORMAT to values.py for pipe-delimited logs
- Created opac_bunnynet.py translator (delegates to OPAC translator)
- Updated log_handler.py to detect and parse bunnynet format
- Added Unix timestamp date handling in format_date()
- Enhanced match_with_best_pattern() to detect pipe-delimited format
- Modified parse_line() to handle bunnynet-specific fields
- Created test fixture usage.bunnynet.log with sample logs
- Added comprehensive tests in test_opac_bunnynet.py and test_log_handler.py
- All new tests passing (6/6)
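
The two detection steps named in the notes above (spotting a pipe-delimited line and handling a Unix timestamp) can be sketched as follows. This is not the code in `log_handler.py`: the function names, the minimum field count and the seconds-versus-milliseconds threshold are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def looks_pipe_delimited(line, min_fields=10):
    """Heuristic: a line with enough '|' separators is treated as pipe-delimited.

    min_fields is an assumed lower bound, not the real field count.
    """
    return line.count("|") >= min_fields - 1


def parse_unix_timestamp(value):
    """Convert a Unix timestamp string to an aware UTC datetime.

    Values above 10**11 are assumed to be milliseconds rather than seconds.
    """
    ts = int(value)
    if ts > 10**11:
        ts = ts / 1000
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```

Returning a timezone-aware datetime avoids depending on the local timezone of the machine that processes the logs.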

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
…rom logs

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
…examples

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
- Replace usage.bunnynet.log with usage.scl.bunnynet.log
- Replace test_books_access_examples.py with test_counter_books.py
- Add usage.books.log fixture with real Apache and BunnyCDN book log lines
- Add scielo_usage_counter/utils/metadata module for document/source indexing
- Remove import of tests.fixtures.books_real_logs in test_log_handler.py
  and test_books.py
- Define BOOKS_LOG_EXPECTED inline with per-test expected values
- Read raw log lines from tests/fixtures/usage.books.log
- Remove the now-unused books_real_logs.py Python fixture
…or 0.10

- format_client_name: use client_name() instead of removed client_short_name()
- format_client_version: replace device.UNKNOWN with 'UNK' fallback
- Update processed fixture files with regenerated client names
  (SF→Safari, CH→Chrome, FF→Firefox, CM→Chrome Mobile)
- Update dataverse fixture stats (ignored_lines_static_resources: 1172→1168
  due to xml removal from EXTENSIONS_STATIC)
- parse_log.py: translate log messages, argparse help, fix shebang,
  fix contradictory --validate help text (was 'Disable' but action stores True),
  fix argparse splat to use vars(args) instead of args.__dict__
- download_geomap.py: translate exception messages and argparse help
- values.py: remove duplicate 'tbz' entry in EXTENSIONS_DOWNLOAD,
  remove 'xml' from EXTENSIONS_STATIC (was also in EXTENSIONS_DOWNLOAD)
- file_utils.py: remove unused translate_path() referencing non-existent
  LOG_PATH_TRANSLATOR, remove unused import
- opac_alpha.py: fix typo _extract_artifitial_pid → _extract_artificial_pid
- Add __init__ to initialize self.__map = None
- Replace getattr(self, '_GeoIp__map', None) with self.map in ip_to_geolocation
- Update scielo_log_validator requirement from 1.0.0 to 2.0.0
- Add scielo_scholarly_data to setup.py install_requires
- Add .idea/ to .gitignore
- Add Books URL translator for scielo.org/id/<book>/<chapter> patterns
- Support HTML, PDF, EPUB and XHTML media formats for books
- Add book-level and chapter-level metrics with unique counting
- Update counter with Books R5.1 access counting logic
- Update URLTranslationManager to dispatch to books translator
- Add publisher_name, subject_area_capes, subject_area_wos, year_of_publication
  to classic, opac, opac_alpha, dataverse and preprints translators
- Update translator tests with new expected fields
- Add BunnyCDN format test cases for opac translator
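
The GeoIp cleanup listed in the notes above relies on Python's name mangling: inside a class body, `self.__map` is stored as `_GeoIp__map`, which is why the old code reached for `getattr(self, '_GeoIp__map', None)`. A minimal sketch of the pattern, assuming the class exposes a `map` property (the notes only show `self.map` being used); the property bodies and the dict lookup in `ip_to_geolocation` are placeholders, not the library's code.

```python
class GeoIp:
    def __init__(self):
        # Initializing here guarantees the attribute always exists,
        # so no getattr(..., None) fallback is needed later.
        self.__map = None  # stored as self._GeoIp__map via name mangling

    @property
    def map(self):
        return self.__map

    @map.setter
    def map(self, value):
        self.__map = value

    def ip_to_geolocation(self, ip):
        # Placeholder logic: a plain dict stands in for the real map object.
        if self.map is None:
            return None
        return self.map.get(ip)
```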
@pitangainnovare pitangainnovare self-assigned this May 1, 2026
@pitangainnovare pitangainnovare added the enhancement New feature or request label May 1, 2026
@pitangainnovare pitangainnovare merged commit 82bc150 into scieloorg:main May 1, 2026
1 check passed


Development

Successfully merging this pull request may close these issues.

  • Add support for logs originating from SciELO Livros
  • Add support for logs in bunnynet format for adoption in SciELO Usage

2 participants