## 2047. Number of Valid Words in a Sentence

https://leetcode.com/problems/number-of-valid-words-in-a-sentence/

A sentence consists of lowercase letters `('a' to 'z')`, digits `('0' to '9')`, hyphens `('-')`, punctuation marks `('!', '.', and ',')`, and spaces `(' ')` only. Each sentence can be broken down into one or more tokens separated by one or more spaces ' '.
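
As a small illustration of the tokenization described above: Python's `str.split()` called with no arguments splits on runs of whitespace and drops empty strings, so consecutive spaces do not produce empty tokens. The helper name `tokenize` is hypothetical and only used for this sketch.

```python
from typing import List


def tokenize(sentence: str) -> List[str]:
    # split() with no separator treats any run of whitespace as a single
    # delimiter and ignores leading/trailing whitespace.
    return sentence.split()


print(tokenize("cat and  dog"))     # ['cat', 'and', 'dog']
print(tokenize("!this  1-s b8d!"))  # ['!this', '1-s', 'b8d!']
```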

A token is a valid word if all three of the following are true:

- It only contains lowercase letters, hyphens, and/or punctuation (no digits).
- There is at most one hyphen `'-'`. If present, it must have a lowercase letter on both sides ("a-b" is valid, but "-ab" and "ab-" are not valid).
- There is at most one punctuation mark. If present, it must be at the end of the token ("ab,", "cd!", and "." are valid, but "a!b" and "c.," are not valid).

Examples of valid words include "a-b.", "afad", "ba-c", "a!", and "!".
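
The three rules above can be checked one at a time. The following is a minimal sketch, not a reference implementation; the names `is_valid_word` and `PUNCTUATION` are hypothetical. Rejecting the empty string is an assumption, since tokens are never empty by definition.

```python
from string import ascii_lowercase

PUNCTUATION = "!.,"


def is_valid_word(token: str) -> bool:
    # Assumption: an empty token is treated as invalid.
    if not token:
        return False

    # Rule 1: only lowercase letters, hyphens, and punctuation are allowed.
    if not all(ch in ascii_lowercase or ch == "-" or ch in PUNCTUATION
               for ch in token):
        return False

    # Rule 2: at most one hyphen, with a lowercase letter on both sides.
    if token.count("-") > 1:
        return False
    if "-" in token:
        i = token.index("-")
        if i == 0 or i == len(token) - 1:
            return False
        if (token[i - 1] not in ascii_lowercase
                or token[i + 1] not in ascii_lowercase):
            return False

    # Rule 3: at most one punctuation mark, and only as the last character.
    marks = sum(ch in PUNCTUATION for ch in token)
    if marks > 1:
        return False
    if marks == 1 and token[-1] not in PUNCTUATION:
        return False

    return True


for word in ["a-b.", "afad", "ba-c", "a!", "!", "-ab", "ab-", "a!b", "c.,"]:
    print(word, is_valid_word(word))
```

The first five words printed are the valid examples from the statement; the last four are the invalid ones quoted in the rules.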

Given a string `sentence`, return the number of valid words in `sentence`.

**Example 1:**

```
Input: sentence = "cat and  dog"
Output: 3
Explanation: The valid words in the sentence are "cat", "and", and "dog".
```

**Example 2:**

```
Input: sentence = "!this  1-s b8d!"
Output: 0
Explanation: There are no valid words in the sentence.
"!this" is invalid because it starts with a punctuation mark.
"1-s" and "b8d" are invalid because they contain digits.
```

**Example 3:**

```
Input: sentence = "alice and  bob are playing stone-game10"
Output: 5
Explanation: The valid words in the sentence are "alice", "and", "bob", "are", and "playing".
"stone-game10" is invalid because it contains digits.
```
 
**Constraints:**

- `1 <= sentence.length <= 1000`
- `sentence` only contains lowercase English letters, digits, `' '`, `'-'`, `'!'`, `'.'`, and `','`.
- There will be at least `1` token.

In [39]:
from enum import Enum
from typing import Dict, List
from string import ascii_lowercase


class Char(Enum):
    LETTER = 1
    HYPHEN = 2
    PUNCTUATION = 3
    UNKNOWN = 4


class Solution:
    def countValidWords(self, sentence: str) -> int:
        transitions: List[Dict[Char, int]] = [
            # 0: Beginning of the word.
            {Char.LETTER: 1, Char.PUNCTUATION: 4},

            # 1: Middle of the word.
            {Char.LETTER: 1, Char.HYPHEN: 2, Char.PUNCTUATION: 4},

            # 2: Previous character was a hyphen.
            {Char.LETTER: 3},

            # 3: Middle of the word after a hyphen.
            {Char.LETTER: 3, Char.PUNCTUATION: 4},

            # 4: Previous character was a punctuation mark.
            {}]
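        initial = 0
        # State 0 is not accepting, but str.split() never yields an empty
        # token, so every token leaves state 0 on its first character.
        # State 2 is not accepting because a word cannot end with a hyphen.
        accepting = {1, 3, 4}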

        def valid(token: str) -> bool:
            state = initial

            for char in token:
                if char in ascii_lowercase:
                    kind = Char.LETTER
                elif char == '-':
                    kind = Char.HYPHEN
                elif char in ('.', ',', '!'):
                    kind = Char.PUNCTUATION
                else:
                    kind = Char.UNKNOWN

                if kind not in transitions[state]:
                    return False

                state = transitions[state][kind]

            return state in accepting

        return sum(valid(token) for token in sentence.split())
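
The same language accepted by the state machine can also be written as a regular expression. This is an alternative sketch, not the approach used in the cell above; the names `VALID_WORD` and `count_valid_words` are hypothetical. The pattern also matches the empty string, which is harmless here only because `split()` never produces an empty token.

```python
import re

# Optional word part: letters, optionally followed by one hyphen and more
# letters (states 1 to 3 of the automaton), then an optional trailing
# punctuation mark (state 4).
VALID_WORD = re.compile(r"(?:[a-z]+(?:-[a-z]+)?)?[!.,]?")


def count_valid_words(sentence: str) -> int:
    # fullmatch() requires the whole token to match the pattern.
    return sum(VALID_WORD.fullmatch(token) is not None
               for token in sentence.split())


print(count_valid_words("cat and  dog"))
print(count_valid_words("!this  1-s b8d!"))
print(count_valid_words("alice and  bob are playing stone-game10"))
```

The explicit transition table makes each rule easier to trace, while the regular expression is shorter; both are linear in the length of the sentence.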

In [45]:
import unittest
    
tests = [('cat  and dog', 3),
         ('!this  1-s b8d!', 0),
         ('alice and  bob are playing stone-game10', 5),
         ('what hath god wrought!', 4),
         ('bug-begone', 1),
         ('valid invalid!y ', 1),
         ('.', 1),
         ('', 0),  # Outside the stated constraints (length >= 1); kept as an edge case.
         (' ', 0)]


class Tests(unittest.TestCase):
    pass


def generator(inp, expected):
    def test(self):
        self.assertEqual(Solution().countValidWords(inp), expected)

    return test


for i, (inp, expected) in enumerate(tests):
    setattr(Tests, f'test_{i}', generator(inp, expected))
        
unittest.main(argv=[''], exit=False)

.........
----------------------------------------------------------------------
Ran 9 tests in 0.006s

OK

