Commit

ENH define LanguageNotFoundError, support python 3.11, drop 3.7
jacksonllee committed Nov 27, 2022
1 parent 69664a7 commit 1afa5a8
Showing 8 changed files with 94 additions and 88 deletions.
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -11,7 +11,7 @@ jobs:
type: string
docker:
# Pick the highest Python 3.x version that this package is known to support
-      - image: cimg/python:3.10
+      - image: cimg/python:3.11
#auth:
# username: $DOCKERHUB_USERNAME
# password: $DOCKERHUB_PASSWORD
@@ -111,7 +111,7 @@ workflows:
- bandit
matrix:
parameters:
-              python-version: ["3.7", "3.8", "3.9", "3.10"]
+              python-version: ["3.8", "3.9", "3.10", "3.11"]
- build-python-win:
requires:
- flake8
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,20 @@ major/minor/micro version numbers like `05` (it'd have to be just `5`).
### Fixed
### Security

+## [2022.11.27]
+
+### Added
+* Defined the `LanguageNotFoundError` exception.
+* Added support for Python 3.11.
+
+### Changed
+* If the `Language` class methods `match`, `from_part3`, etc. receive an invalid
+  input language code or name, a `LanguageNotFoundError` is now raised.
+  (Previously, `None` was returned with no exception raised.)
+
+### Removed
+* Dropped support for Python 3.7.
+
## [2022.9.17]

### Added
33 changes: 15 additions & 18 deletions README.md
@@ -52,13 +52,18 @@ Language(part3='fra', part2b='fre', part2t='fra', part1='fr', scope='I', type='L
>>> lang5 = iso639.Language.from_name('French') # ISO 639-3 reference language name
```

-#### You Get `None` for Invalid Inputs
-
-The user input is case-sensitive!
+#### A `LanguageNotFoundError` is Raised for Invalid Inputs

```python
->>> None == iso639.Language.from_part3('Fra') == iso639.Language.from_name("unknown language")
-True
+>>> iso639.Language.from_part3('Fra') # The user input is case-sensitive!
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+LanguageNotFoundError: 'Fra' isn't an ISO language code or name
+>>>
+>>> iso639.Language.from_name("unknown language")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+LanguageNotFoundError: 'unknown language' isn't an ISO language code or name
```

### Accessing Attributes
@@ -108,21 +113,13 @@ accessing a specific attribute from unknown inputs, e.g., the ISO 639-3 code.
True
```

-If there's no match, `None` is returned.
-You may need to catch a potential `AttributeError`:
+If there's no match, a `LanguageNotFoundError` is raised,
+which you may want to catch:

```python
->>> lang = iso639.Language.match('not gonna find a match')
->>> lang is None
-True
->>> lang.part3
-Traceback (most recent call last):
-  File "<stdin>", line 1, in <module>
-AttributeError: 'NoneType' object has no attribute 'part3'
>>> try:
-...     code = lang.part3
-... except AttributeError:
-...     code = None
+...     lang = iso639.Language.match('not gonna find a match')
+... except iso639.LanguageNotFoundError:
... print("no match found!")
...
no match found!
@@ -217,7 +214,7 @@ Beyond that, the precise order in matching is as follows:

Only exact matching is done (there's no fuzzy string matching of any sort).
As soon as a match is found, `Language.match` returns a `Language` instance.
-If there isn't a match, `None` is returned.
+If there isn't a match, a `LanguageNotFoundError` is raised.

### `Language` is a dataclass

8 changes: 4 additions & 4 deletions dev-requirements.txt
@@ -1,5 +1,5 @@
-black==22.6.0
-build==0.8.0
-flake8==5.0.4
-pytest==7.1.2
+black==22.10.0
+build==0.9.0
+flake8==6.0.0
+pytest==7.2.0
twine==4.0.1
8 changes: 4 additions & 4 deletions pyproject.toml
@@ -4,14 +4,13 @@ build-backend = "setuptools.build_meta"

[project]
name = "python-iso639"
-version = "2022.9.17"
+version = "2022.11.27"
description = "Look-up utilities for ISO 639 language codes and names"
readme = "README.md"
-requires-python = ">= 3.7"
+requires-python = ">= 3.8"
license = { text = "Apache 2.0" }
authors = [ { name = "Jackson L. Lee", email = "jacksonlunlee@gmail.com" } ]
keywords = ["ISO 639", "language codes", "languages", "linguistics"]
-dependencies = [ 'importlib-metadata; python_version < "3.8"' ]
classifiers = [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
@@ -22,10 +21,10 @@ classifiers = [
     "Operating System :: OS Independent",
     "Programming Language :: Python :: 3",
     "Programming Language :: Python :: 3 :: Only",
-    "Programming Language :: Python :: 3.7",
     "Programming Language :: Python :: 3.8",
     "Programming Language :: Python :: 3.9",
     "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
"Topic :: Text Processing",
"Topic :: Text Processing :: General",
"Topic :: Text Processing :: Indexing",
@@ -45,4 +44,5 @@ package-dir = { "" = "src" }
package-data = { "iso639" = ["languages.db"] }

[tool.pytest.ini_options]
addopts = "-vv --durations=0 --strict-markers"
testpaths = [ "tests" ]
17 changes: 9 additions & 8 deletions src/iso639/__init__.py
@@ -1,16 +1,17 @@
-try:
-    from importlib.metadata import version
-except ModuleNotFoundError:
-    # For Python 3.7
-    from importlib_metadata import version
-
import datetime
+from importlib.metadata import version

-from .language import Language, _get_all_languages
+from .language import Language, LanguageNotFoundError, _get_all_languages


__version__ = version("python-iso639")
-__all__ = ["__version__", "Language", "ALL_LANGUAGES", "DATA_LAST_UPDATED"]
+__all__ = [
+    "__version__",
+    "ALL_LANGUAGES",
+    "DATA_LAST_UPDATED",
+    "Language",
+    "LanguageNotFoundError",
+]

DATA_LAST_UPDATED = datetime.date(2022, 3, 11)
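
With Python 3.7 dropped, `importlib.metadata` is always available from the standard library, so the `importlib_metadata` fallback is no longer needed. As a side note, `version()` raises `PackageNotFoundError` when the distribution is not installed (for example, when running from a source checkout). A minimal sketch of a defensive lookup follows; the helper name `installed_version` and its `default` parameter are hypothetical and not part of this package:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str, default: str = "unknown") -> str:
    """Return the installed version of ``dist_name``, or ``default`` if absent.

    Hypothetical helper for illustration only; the commit itself calls
    ``version("python-iso639")`` directly.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution's metadata is not available in this environment.
        return default
```

Whether such a fallback is desirable depends on how the package is expected to be run; the commit keeps the direct call.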

87 changes: 39 additions & 48 deletions src/iso639/language.py
@@ -6,14 +6,18 @@
import sqlite3
from dataclasses import dataclass

-from typing import Iterable, List, Tuple, Union
+from typing import Iterable, List, Optional, Tuple


_DB = sqlite3.connect(
os.path.join(os.path.dirname(os.path.realpath(__file__)), "languages.db"),
)


+class LanguageNotFoundError(Exception):
+    pass
+
+
@dataclass(frozen=True)
class Name:
"""Represents an alternative name of a language."""
@@ -113,37 +117,27 @@ def match(cls, user_input) -> "Language":
("name_index", "Print_Name"),
("name_index", "Inverted_Name"),
)
-        part3 = _guess_part3(user_input, query_order)
-        language = _get_language_from_part3(part3)
-        return language
+        return _get_language(user_input, query_order)

@classmethod
def from_part3(cls, user_input) -> "Language":
"""Return a ``Language`` instance from an ISO 639-3 code."""
-        part3 = _guess_part3(user_input, (("codes", "Id"),))
-        language = _get_language_from_part3(part3)
-        return language
+        return _get_language(user_input, (("codes", "Id"),))

@classmethod
def from_part2b(cls, user_input) -> "Language":
"""Return a ``Language`` instance from an ISO 639-2 (bibliographic) code."""
-        part3 = _guess_part3(user_input, (("codes", "Part2B"),))
-        language = _get_language_from_part3(part3)
-        return language
+        return _get_language(user_input, (("codes", "Part2B"),))

@classmethod
def from_part2t(cls, user_input) -> "Language":
"""Return a ``Language`` instance from an ISO 639-2 (terminological) code."""
-        part3 = _guess_part3(user_input, (("codes", "Part2T"),))
-        language = _get_language_from_part3(part3)
-        return language
+        return _get_language(user_input, (("codes", "Part2T"),))

@classmethod
def from_part1(cls, user_input) -> "Language":
"""Return a ``Language`` instance from an ISO 639-1 code."""
-        part3 = _guess_part3(user_input, (("codes", "Part1"),))
-        language = _get_language_from_part3(part3)
-        return language
+        return _get_language(user_input, (("codes", "Part1"),))

@classmethod
def from_name(cls, user_input) -> "Language":
@@ -153,53 +147,50 @@ def from_name(cls, user_input) -> "Language":
("name_index", "Print_Name"),
("name_index", "Inverted_Name"),
)
-        part3 = _guess_part3(user_input, query_order)
-        language = _get_language_from_part3(part3)
-        return language
+        return _get_language(user_input, query_order)


def _query_db(table: str, field: str, x: str) -> sqlite3.Cursor:
return _DB.execute(f"SELECT * FROM {table} where {field} = ?", (x,)) # nosec


@functools.lru_cache()
-def _guess_part3(user_input: str, query_order: Iterable[Tuple[str, str]]) -> str:
-    """Guess the ISO 639-3 code.
+def _get_language(
+    user_input: str, query_order: Optional[Iterable[Tuple[str, str]]] = None
+) -> Language:
+    """Create a ``Language`` instance.
Parameters
----------
user_input : str
-    query_order : Iterable[Tuple[str, str]
-        An iterable of (table, field) pairs to specify query order
+        The user-provided language code or name.
+    query_order : Iterable[Tuple[str, str], optional
+        An iterable of (table, field) pairs to specify query order.
+        If not provided, no queries are made and `part3` is assumed to be
+        an actual ISO 639-3 code.
Returns
-------
-    str
-        An ISO 639-3 code.
-        None if the user input isn't a language code or name.
-    """
-    for table, field in query_order:
-        result = _query_db(table, field, user_input).fetchone()
-        if result:
-            language_id = result[0]
-            return language_id
-
-
-@functools.lru_cache()
-def _get_language_from_part3(part3: str) -> Union[Language, None]:
-    """Create a ``Language`` instance.
+    Language
-    Parameters
-    ----------
-    part3 : str
-        An ISO 639-3 code.
-    Returns
-    -------
-    Language or None
+    Raises
+    ------
+    LanguageNotFoundError
+        If `part3` isn't a language name or code
"""
-    if not part3:
-        return None

+    if query_order is not None:
+        for table, field in query_order:
+            result = _query_db(table, field, user_input).fetchone()
+            if result:
+                part3 = result[0]
+                break
+        else:
+            raise LanguageNotFoundError(
+                f"{user_input!r} isn't an ISO language code or name"
+            )
+    else:
+        part3 = user_input

def query_for_id(table: str) -> sqlite3.Cursor:
id_field = "I_Id" if table == "macrolanguages" else "Id"
@@ -272,6 +263,6 @@ def query_for_id(table: str) -> sqlite3.Cursor:
@functools.lru_cache()
def _get_all_languages() -> List[Language]:
return [
-        _get_language_from_part3(part3)
+        _get_language(part3)
for part3 in [row[0] for row in _DB.execute("SELECT Id FROM codes").fetchall()]
]
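
The merged lookup in `_get_language` relies on Python's `for`/`else`: the `else` branch runs only when the loop finishes without hitting `break`, which is where the new exception is raised. A minimal, self-contained sketch of that control flow follows. It assumes a hypothetical in-memory mapping in place of the package's SQLite database; `_FAKE_TABLES`, `lookup_part3`, and `NotFoundError` are illustrative names, not part of `iso639`:

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical stand-in for the SQLite tables: (table, field) -> {input: part3}.
_FAKE_TABLES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("codes", "Id"): {"fra": "fra"},
    ("codes", "Part1"): {"fr": "fra"},
    ("name_index", "Print_Name"): {"French": "fra"},
}


class NotFoundError(Exception):
    """Illustrative counterpart of ``LanguageNotFoundError``."""


def lookup_part3(
    user_input: str, query_order: Optional[Iterable[Tuple[str, str]]] = None
) -> str:
    if query_order is None:
        # No queries requested: trust the input as an ISO 639-3 code.
        return user_input
    for table, field in query_order:
        result = _FAKE_TABLES.get((table, field), {}).get(user_input)
        if result:
            part3 = result
            break
    else:
        # Reached only if the loop never hit ``break``, i.e. nothing matched.
        raise NotFoundError(f"{user_input!r} isn't an ISO language code or name")
    return part3
```

Because the lookup is an exact dictionary match here, `"Fra"` fails in the same way the case-sensitive database query does in the commit.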
11 changes: 7 additions & 4 deletions tests/test_language.py
@@ -1,6 +1,6 @@
import datetime

-from iso639 import Language, ALL_LANGUAGES, DATA_LAST_UPDATED
+from iso639 import Language, ALL_LANGUAGES, DATA_LAST_UPDATED, LanguageNotFoundError
from iso639.language import Name

import pytest
@@ -59,9 +59,12 @@ def test_retired_codes():


def test_invalid_inputs():
-    assert Language.match("invalid input") is None
-    assert Language.from_part3("Fra") is None # case-sensitive!
-    assert Language.from_part3("unknown code") is None
+    with pytest.raises(LanguageNotFoundError):
+        Language.match("invalid input")
+    with pytest.raises(LanguageNotFoundError):
+        Language.from_part3("Fra") # case-sensitive!
+    with pytest.raises(LanguageNotFoundError):
+        Language.from_part3("unknown code")


def test_data_last_updated():
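
For callers that relied on the previous `None` return value, one possible migration is a small wrapper that converts the exception back into `None`. The sketch below is generic and deliberately does not import `iso639`; `or_none`, `fake_match`, and `FakeNotFoundError` are hypothetical names, with the latter two standing in for `Language.match` and `LanguageNotFoundError`:

```python
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")


def or_none(
    func: Callable[[str], T], user_input: str, exc: Type[Exception]
) -> Optional[T]:
    """Call ``func(user_input)``; return None if it raises ``exc``."""
    try:
        return func(user_input)
    except exc:
        return None


class FakeNotFoundError(Exception):
    """Hypothetical stand-in for ``iso639.LanguageNotFoundError``."""


def fake_match(user_input: str) -> str:
    """Hypothetical stand-in for ``iso639.Language.match``."""
    if user_input != "fra":
        raise FakeNotFoundError(f"{user_input!r} isn't an ISO language code or name")
    return "fra"
```

With the real package, the equivalent call would presumably be `or_none(iso639.Language.match, text, iso639.LanguageNotFoundError)`, since both names are exported in `__all__` by this commit.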