Adjust the assignment of country and state data for institutions related to the journal #1321
Pull request overview
This PR changes how the raw_country_* and raw_state_* fields are assigned when creating institution history items (publisher/owner/sponsor/copyright holder) associated with a journal, trying to infer whether a single received value represents a "name" or a "code/acronym".
Changes:
- Adds conditional logic in Journal._add_institution_history to fill raw_country_name/raw_country_code when only one of the two is provided.
- Adds similar logic for raw_state_name/raw_state_acron when only one of the two is provided.
- Keeps the direct assignment when both (name and code/acronym) are provided.
        raw_country = raw_country_name or raw_country_code
        if raw_country.upper() == raw_country and len(raw_country) == 2:
            institution_history.raw_country_code = raw_country
        else:
            institution_history.raw_country_name = raw_country

    if raw_state_acron and raw_state_name:
        institution_history.raw_state_name = raw_state_name
        institution_history.raw_state_acron = raw_state_acron
    elif raw_state_name or raw_state_acron:
        raw_state = raw_state_name or raw_state_acron
        if raw_state.upper() == raw_state and len(raw_state) == 2:
            institution_history.raw_state_acron = raw_state
        else:
            institution_history.raw_state_name = raw_state
raw_country = raw_country_name or raw_country_code can be a non-string (e.g., extract_value(...) returns a list when there are multiple values). Calling raw_country.upper() will then raise AttributeError and break imports/tasks that call add_publisher/add_owner with only raw_country_name. Normalize raw_country to a string (e.g., pick first element or join) and strip() before applying case/length heuristics.
Suggested change:
        raw_country_value = raw_country_name or raw_country_code
        # Normalize raw_country to a string before applying heuristics
        if isinstance(raw_country_value, (list, tuple)):
            raw_country_normalized = ", ".join(str(item) for item in raw_country_value if item is not None)
        else:
            raw_country_normalized = str(raw_country_value)
        raw_country_normalized = raw_country_normalized.strip()
        if raw_country_normalized:
            if (
                raw_country_normalized.upper() == raw_country_normalized
                and len(raw_country_normalized) == 2
            ):
                institution_history.raw_country_code = raw_country_normalized
            else:
                institution_history.raw_country_name = raw_country_normalized
    if raw_state_acron and raw_state_name:
        institution_history.raw_state_name = raw_state_name
        institution_history.raw_state_acron = raw_state_acron
    elif raw_state_name or raw_state_acron:
        raw_state_value = raw_state_name or raw_state_acron
        # Normalize raw_state to a string before applying heuristics
        if isinstance(raw_state_value, (list, tuple)):
            raw_state_normalized = ", ".join(str(item) for item in raw_state_value if item is not None)
        else:
            raw_state_normalized = str(raw_state_value)
        raw_state_normalized = raw_state_normalized.strip()
        if raw_state_normalized:
            if (
                raw_state_normalized.upper() == raw_state_normalized
                and len(raw_state_normalized) == 2
            ):
                institution_history.raw_state_acron = raw_state_normalized
            else:
                institution_history.raw_state_name = raw_state_normalized
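The comment above mentions two ways to turn a list into a string: pick the first element or join the items. A minimal sketch of that normalization as a standalone function follows. The name normalize_raw_value and its join parameter are hypothetical and not part of the PR, and the sketch assumes list items are plain strings or None.

```python
from typing import Any, Optional


def normalize_raw_value(value: Any, join: bool = False) -> Optional[str]:
    """Return a stripped string, or None when nothing usable was given.

    Hypothetical helper: `join` selects between the two strategies the
    review comment mentions (join all items vs. keep only the first one).
    """
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        # Drop None and blank items before choosing a strategy.
        items = [str(item).strip() for item in value if item is not None]
        items = [item for item in items if item]
        if not items:
            return None
        text = ", ".join(items) if join else items[0]
    else:
        text = str(value).strip()
    # An empty string is treated the same as "no value".
    return text or None


print(normalize_raw_value(["Brasil", "Brazil"]))
print(normalize_raw_value(["Brasil", "Brazil"], join=True))
print(normalize_raw_value("  SP "))
print(normalize_raw_value([]))
```

Joining keeps every value but produces a string longer than two characters, so a joined list is always filed as a name; taking the first element lets a list such as ["BR"] still be recognized as a code.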
        raw_country = raw_country_name or raw_country_code
        if raw_country.upper() == raw_country and len(raw_country) == 2:
            institution_history.raw_country_code = raw_country
The country-code heuristic only treats 2-letter uppercase strings as a code (len(...) == 2), but this project stores ISO alpha-3 codes too (raw_country_code has max_length=3 and code elsewhere uses acron3 like BRA/ARG). This will misfile values like BRA (or lowercase br/bra) into raw_country_name. Consider detecting 2–3 letter alphabetic codes case-insensitively and storing them uppercased.
Suggested change:
        raw_country = (raw_country_name or raw_country_code).strip()
        if raw_country and raw_country.isalpha() and 2 <= len(raw_country) <= 3:
            institution_history.raw_country_code = raw_country.upper()
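The case-insensitive 2–3 letter check described in this comment can be tried out in isolation with a sketch like the one below. The function classify_country is hypothetical. The isascii() guard is an addition that does not appear in the suggestion; it is there because str.isalpha() accepts accented letters, so a short name such as "Omã" would otherwise be stored as a code.

```python
from typing import Tuple


def classify_country(value: str) -> Tuple[str, str]:
    """Return ("code", VALUE) or ("name", value) for a single country string."""
    text = value.strip()
    # Assumption: a code is 2 or 3 ASCII letters (the shape of ISO 3166-1
    # alpha-2 / alpha-3 codes). The isascii() check is not in the review
    # suggestion; it keeps short accented names out of the code field.
    if text.isascii() and text.isalpha() and 2 <= len(text) <= 3:
        return ("code", text.upper())
    return ("name", text)


for sample in ["BR", "bra", " br ", "Brasil", "Omã"]:
    print(repr(sample), classify_country(sample))
```

This only checks the shape of the value. It does not verify that the letters form an assigned country code, so a value like "XYZ" would still be stored as a code.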
        if raw_country.upper() == raw_country and len(raw_country) == 2:
            institution_history.raw_country_code = raw_country
        else:
            institution_history.raw_country_name = raw_country

    if raw_state_acron and raw_state_name:
        institution_history.raw_state_name = raw_state_name
        institution_history.raw_state_acron = raw_state_acron
    elif raw_state_name or raw_state_acron:
        raw_state = raw_state_name or raw_state_acron
        if raw_state.upper() == raw_state and len(raw_state) == 2:
            institution_history.raw_state_acron = raw_state
        else:
            institution_history.raw_state_name = raw_state
The same heuristic issue exists for state: if only one value is provided and it's a lowercase acronym (e.g. sp) or a non-string (list from extract_value), the current raw_state.upper() == raw_state and len(raw_state) == 2 check will either misclassify it as a name or raise at runtime. Normalize to string/strip first and consider treating 2-letter alphabetic values case-insensitively as acronyms (storing uppercased).
Suggested change:
        raw_country_str = str(raw_country).strip()
        if len(raw_country_str) == 2 and raw_country_str.isalpha():
            institution_history.raw_country_code = raw_country_str.upper()
        else:
            institution_history.raw_country_name = raw_country_str
    if raw_state_acron and raw_state_name:
        institution_history.raw_state_name = raw_state_name
        institution_history.raw_state_acron = raw_state_acron
    elif raw_state_name or raw_state_acron:
        raw_state = raw_state_name or raw_state_acron
        raw_state_str = str(raw_state).strip()
        if len(raw_state_str) == 2 and raw_state_str.isalpha():
            institution_history.raw_state_acron = raw_state_str.upper()
        else:
            institution_history.raw_state_name = raw_state_str
        if raw_country.upper() == raw_country and len(raw_country) == 2:
            institution_history.raw_country_code = raw_country
        else:
            institution_history.raw_country_name = raw_country

    if raw_state_acron and raw_state_name:
        institution_history.raw_state_name = raw_state_name
        institution_history.raw_state_acron = raw_state_acron
    elif raw_state_name or raw_state_acron:
        raw_state = raw_state_name or raw_state_acron
        if raw_state.upper() == raw_state and len(raw_state) == 2:
            institution_history.raw_state_acron = raw_state
        else:
            institution_history.raw_state_name = raw_state
This change adds new branching behavior to interpret a single raw_country_* / raw_state_* input (code vs name), but tests only cover the case where both name+code are provided. Add regression tests for code-only inputs (2- and 3-letter, mixed/lowercase) and for list inputs returned by extract_value so imports/tasks don’t crash and values land in the intended fields.
Suggested change:
        # Handle list/tuple inputs (e.g., from extract_value) by taking the first element.
        if isinstance(raw_country, (list, tuple)):
            raw_country = raw_country[0] if raw_country else None
        if raw_country is not None:
            # Normalize to string for classification.
            raw_country_str = str(raw_country).strip()
            if raw_country_str and raw_country_str.isalpha() and len(raw_country_str) in (2, 3):
                # Treat 2- or 3-letter alphabetic values as country codes, normalizing to upper case.
                institution_history.raw_country_code = raw_country_str.upper()
            else:
                institution_history.raw_country_name = raw_country_str
    if raw_state_acron and raw_state_name:
        institution_history.raw_state_name = raw_state_name
        institution_history.raw_state_acron = raw_state_acron
    elif raw_state_name or raw_state_acron:
        raw_state = raw_state_name or raw_state_acron
        # Handle list/tuple inputs (e.g., from extract_value) by taking the first element.
        if isinstance(raw_state, (list, tuple)):
            raw_state = raw_state[0] if raw_state else None
        if raw_state is not None:
            # Normalize to string for classification.
            raw_state_str = str(raw_state).strip()
            if raw_state_str and raw_state_str.isalpha() and len(raw_state_str) == 2:
                # Treat 2-letter alphabetic values as state acronyms, normalizing to upper case.
                institution_history.raw_state_acron = raw_state_str.upper()
            else:
                institution_history.raw_state_name = raw_state_str
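The regression cases requested in the comment above could start from a table-driven sketch like this one. It does not exercise the real model: assign_raw_state is a hypothetical stand-in that mirrors the state branch of the suggestion, and SimpleNamespace replaces the institution history object. A real test would call add_publisher/add_owner instead, and the country branch could be covered the same way with 2- and 3-letter codes.

```python
from types import SimpleNamespace


def assign_raw_state(history, raw_state_name=None, raw_state_acron=None):
    """Hypothetical stand-in for the state branch of Journal._add_institution_history."""
    if raw_state_name and raw_state_acron:
        history.raw_state_name = raw_state_name
        history.raw_state_acron = raw_state_acron
        return
    raw_state = raw_state_name or raw_state_acron
    # List/tuple inputs: keep only the first element, as in the suggestion.
    if isinstance(raw_state, (list, tuple)):
        raw_state = raw_state[0] if raw_state else None
    if raw_state is None:
        return
    text = str(raw_state).strip()
    if text.isalpha() and len(text) == 2:
        history.raw_state_acron = text.upper()
    else:
        history.raw_state_name = text


# (keyword arguments, expected raw_state_name, expected raw_state_acron)
CASES = [
    ({"raw_state_name": "São Paulo", "raw_state_acron": "SP"}, "São Paulo", "SP"),
    ({"raw_state_name": "sp"}, None, "SP"),
    ({"raw_state_acron": "São Paulo"}, "São Paulo", None),
    ({"raw_state_name": ["RJ", "Rio de Janeiro"]}, None, "RJ"),
    ({"raw_state_name": []}, None, None),
    ({}, None, None),
]


def run_cases():
    for kwargs, expected_name, expected_acron in CASES:
        history = SimpleNamespace(raw_state_name=None, raw_state_acron=None)
        assign_raw_state(history, **kwargs)
        assert history.raw_state_name == expected_name, kwargs
        assert history.raw_state_acron == expected_acron, kwargs


run_cases()
```

Each row names the input shape the review flagged: both values, a lowercase acronym in the name argument, a full name in the acronym argument, a list, and empty input.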
What does this PR do?
Describe the purpose of the pull request, for example: which problems it solves or which features it adds.
Where should the review start?
Give the path and the file where the reviewer should start reading the code.
How could this be tested manually?
List the steps the reviewer needs to follow to test the functionality manually.
Any context you want to provide?
Describe a scenario where the changes are needed, or give information that helps the reviewer understand the functionality.
Screenshots
When applicable and possible, add screenshots that illustrate the visual aspect of the problem the pull request solves.
What are the relevant tickets?
Indicate an issue related to the pull request.
References
List the references used in preparing the pull request.