Conversation
Important: Review skipped due to path filters. ⛔ Files ignored due to path filters (1)

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters.

Walkthrough: The changes introduce new model layers (…)

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI (CBOW/MNIST)
    participant Model
    participant DataLoader
    participant Trainer
    User->>CLI: Run CLI command (train/infer/sample)
    CLI->>DataLoader: Load and preprocess data
    CLI->>Model: Initialize or load model
    CLI->>Trainer: (If training) Start training loop
    Trainer->>Model: Forward/Backward propagation
    Trainer->>Model: Update parameters
    CLI->>User: Output results (metrics, visualization, REPL)
```
Actionable comments posted: 9
🔭 Outside diff range comments (1)
mo_net/train/batcher.py (1)
47-48: Critical: Inconsistent batch calculation in StopIteration handler.

The same integer division fix needs to be applied here. Currently, this code still uses the problematic float division that was fixed in the constructor.
```diff
-        self._internal_iterator = zip(
-            iter(np.array_split(self.X, self.train_set_size / self.batch_size)),
-            iter(np.array_split(self.Y, self.train_set_size / self.batch_size)),
-        )
+        num_batches = (self.train_set_size + self.batch_size - 1) // self.batch_size
+        self._internal_iterator = zip(
+            iter(np.array_split(self.X, num_batches)),
+            iter(np.array_split(self.Y, num_batches)),
+        )
```
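As a standalone sanity check (illustrative values, not code from this PR), the ceiling-division expression produces the integer section count `np.array_split` expects, and no samples are dropped:

```python
import numpy as np

train_set_size, batch_size = 10, 3

# Float division would hand np.array_split a non-integer section count;
# ceiling division gives the number of batches needed to cover every sample.
num_batches = (train_set_size + batch_size - 1) // batch_size  # 4

X = np.arange(train_set_size)
batches = np.array_split(X, num_batches)

# np.array_split tolerates uneven splits: sizes are [3, 3, 2, 2] here
assert sum(len(b) for b in batches) == train_set_size
```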
🧹 Nitpick comments (8)
mo_net/model/layer/batch_norm/batch_norm.py (1)
270-277: Minor style issue: Unnecessary parentheses around dictionary argument.

The extra parentheses around the dictionary argument are unnecessary. The dictionary can be passed directly to `update()`:

```diff
-        self._cache.update(
-            {
-                "input_activations": input_activations,
-                "mean": batch_mean,
-                "var": batch_variance,
-                "batch_size": input_activations.shape[0],
-            }
-        )
+        self._cache.update({
+            "input_activations": input_activations,
+            "mean": batch_mean,
+            "var": batch_variance,
+            "batch_size": input_activations.shape[0],
+        })
```

mo_net/model/layer/layer_norm.py (1)
235-249: Minor style issue: Unnecessary parentheses around dictionary argument.

The extra parentheses around the dictionary argument are unnecessary. The dictionary can be passed directly to `update()`:

```diff
-        self._cache.update(
-            {
-                "input_activations": input_activations,
-                "mean": np.mean(
-                    input_activations,
-                    axis=tuple(range(1, input_activations.ndim)),
-                    keepdims=True,
-                ),
-                "var": np.var(
-                    input_activations,
-                    axis=tuple(range(1, input_activations.ndim)),
-                    keepdims=True,
-                ),
-            }
-        )
+        self._cache.update({
+            "input_activations": input_activations,
+            "mean": np.mean(
+                input_activations,
+                axis=tuple(range(1, input_activations.ndim)),
+                keepdims=True,
+            ),
+            "var": np.var(
+                input_activations,
+                axis=tuple(range(1, input_activations.ndim)),
+                keepdims=True,
+            ),
+        })
```

mo_net/model/layer/batch_norm/batch_norm_2d.py (1)
242-249: Minor style issue: Unnecessary parentheses around dictionary argument.

The extra parentheses around the dictionary argument are unnecessary. The dictionary can be passed directly to `update()`:

```diff
-        self._cache.update(
-            {
-                "input_activations": input_activations,
-                "mean": batch_mean,
-                "var": batch_variance,
-                "batch_size": batch_size,
-            }
-        )
+        self._cache.update({
+            "input_activations": input_activations,
+            "mean": batch_mean,
+            "var": batch_variance,
+            "batch_size": batch_size,
+        })
```

mo_net/tests/model/layer/test_average.py (1)
83-83: Add newline at end of file.

```diff
-    Average(input_dimensions=(2, 3), axis=())
+    Average(input_dimensions=(2, 3), axis=())
+
```

mo_net/samples/mnist.py (1)
148-152: Remove duplicate matplotlib calls.

```diff
     plt.tight_layout()
     plt.show()
-    plt.tight_layout()
-    plt.show()
```

mo_net/model/layer/embedding.py (2)
105-105: Consider Python version compatibility for type alias syntax.

The `type` statement is Python 3.12+ syntax. For broader compatibility, consider using the older syntax:

```diff
-type ParametersType = Parameters
+ParametersType = Parameters
```

Or if you want to be more explicit:

```diff
-type ParametersType = Parameters
+from typing import TypeAlias
+ParametersType: TypeAlias = Parameters
```
113-113: Apply same type alias compatibility fix here.

```diff
-type CacheType = Cache
+CacheType = Cache
```

mo_net/samples/cbow.py (1)
34-38: Consider improving token cleaning robustness.

The regex pattern removes punctuation and non-printable characters, but the approach could be more explicit about what constitutes valid tokens.

```diff
-def clean_token(token: str) -> str:
-    """
-    Remove non-printable characters and punctuation
-    """
-    return re.sub(r"[^\w\s]|[^\x20-\x7E]", "", token).lower().strip()
+def clean_token(token: str) -> str:
+    """
+    Remove non-printable characters and punctuation, keeping only alphanumeric and spaces
+    """
+    # Remove non-ASCII and punctuation, keep only word characters and whitespace
+    cleaned = re.sub(r"[^\w\s]", "", token)
+    # Remove non-printable characters
+    cleaned = re.sub(r"[^\x20-\x7E]", "", cleaned)
+    return cleaned.lower().strip()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- mo_net/cli.py (4 hunks)
- mo_net/config.py (0 hunks)
- mo_net/model/layer/__init__.py (2 hunks)
- mo_net/model/layer/average.py (1 hunks)
- mo_net/model/layer/batch_norm/batch_norm.py (1 hunks)
- mo_net/model/layer/batch_norm/batch_norm_2d.py (1 hunks)
- mo_net/model/layer/embedding.py (1 hunks)
- mo_net/model/layer/layer_norm.py (1 hunks)
- mo_net/protos.py (1 hunks)
- mo_net/samples/cbow.py (1 hunks)
- mo_net/samples/mnist.py (1 hunks)
- mo_net/tests/model/layer/test_average.py (1 hunks)
- mo_net/train/batcher.py (1 hunks)
- mo_net/train/trainer/trainer.py (4 hunks)
- mypy_plugins/__init__.py (1 hunks)
- mypy_plugins/d.py (2 hunks)
💤 Files with no reviewable changes (1)
- mo_net/config.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
mo_net/model/layer/__init__.py (1)
mo_net/model/layer/average.py (1)
Average(12-90)
mo_net/tests/model/layer/test_average.py (1)
mo_net/model/layer/average.py (4)
Average (12-90), axis (82-83), serialize (85-90), deserialize (19-28)
mo_net/model/layer/average.py (2)
mo_net/protos.py (2)
D (41-49), d (173-174)

mo_net/model/layer/base.py (1)
layer_id(87-89)
🪛 Ruff (0.11.9)
mo_net/protos.py
13-13: typing.Optional imported but unused
Remove unused import
(F401)
14-14: typing.Type imported but unused
Remove unused import
(F401)
15-15: typing.Union imported but unused
Remove unused import
(F401)
mo_net/samples/mnist.py
97-97: Use a context manager for opening files
(SIM115)
mo_net/samples/cbow.py
309-309: Use a context manager for opening files
(SIM115)
395-395: Use a context manager for opening files
(SIM115)
🪛 Pylint (3.3.7)
mo_net/model/layer/average.py
[error] 8-8: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 9-9: Cannot import 'mo_net.protos' due to 'invalid syntax (mo_net.protos, line 67)'
(E0001)
[error] 9-9: No name 'protos' in module 'mo_net'
(E0611)
mo_net/model/layer/embedding.py
[error] 105-105: Parsing failed: 'invalid syntax (mo_net.model.layer.embedding, line 105)'
(E0001)
mo_net/samples/mnist.py
[refactor] 70-70: Too many local variables (29/15)
(R0914)
[refactor] 97-97: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 70-70: Too many statements (52/50)
(R0915)
mo_net/samples/cbow.py
[error] 16-16: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 17-17: Cannot import 'mo_net.model.layer.embedding' due to 'invalid syntax (mo_net.model.layer.embedding, line 105)'
(E0001)
[error] 17-17: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 18-18: Cannot import 'mo_net.model.layer.linear' due to 'invalid syntax (mo_net.model.layer.linear, line 156)'
(E0001)
[error] 18-18: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 19-19: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 20-20: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 21-21: Cannot import 'mo_net.model.model' due to 'invalid syntax (mo_net.model.model, line 384)'
(E0001)
[error] 21-21: No name 'model' in module 'mo_net.model'
(E0611)
[error] 23-23: Cannot import 'mo_net.protos' due to 'invalid syntax (mo_net.protos, line 67)'
(E0001)
[error] 23-23: No name 'protos' in module 'mo_net'
(E0611)
[error] 28-28: Cannot import 'mo_net.train.trainer.trainer' due to 'invalid syntax (mo_net.train.trainer.trainer, line 42)'
(E0001)
[error] 28-28: No name 'trainer' in module 'mo_net.train.trainer'
(E0611)
[refactor] 196-196: Too many arguments (9/5)
(R0913)
[refactor] 196-196: Too many positional arguments (9/5)
(R0917)
[refactor] 196-196: Too many local variables (29/15)
(R0914)
[refactor] 214-214: Unnecessary use of a comprehension, use list(sentence.split()) instead.
(R1721)
[refactor] 296-296: Too many local variables (16/15)
(R0914)
[refactor] 309-309: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 382-382: Too many local variables (19/15)
(R0914)
[refactor] 395-395: Consider using 'with' for resource-allocating operations
(R1732)
🪛 Flake8 (7.2.0)
mo_net/model/layer/embedding.py
[error] 105-105: SyntaxError: invalid syntax
(E999)
mo_net/samples/cbow.py
[error] 60-60: whitespace before ':'
(E203)
[error] 60-60: whitespace before ':'
(E203)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: typecheck
🔇 Additional comments (18)
mypy_plugins/d.py (2)
29-31: LGTM: Improved method signature formatting.

The multi-line method signature improves readability without changing functionality.

61-61: LGTM: Added newline at end of file.

Following Python convention for newline at end of file.

mypy_plugins/__init__.py (1)

4-4: LGTM: Minor formatting improvement.

Adding a trailing newline is a good practice and often required by linters.

mo_net/model/layer/__init__.py (2)

2-2: LGTM: Proper integration of new Average layer.

The import follows the existing pattern and correctly integrates the new `Average` layer into the package.

13-13: LGTM: Correctly exposed in public API.

The `Average` layer is properly added to the `__all__` list, making it available when importing from this package.

mo_net/train/batcher.py (1)

23-27: Excellent fix for integer division issue.

The change correctly addresses the problem of passing float values to `np.array_split()` by using integer division with proper ceiling behavior: `(self.train_set_size + self.batch_size - 1) // self.batch_size`. This ensures all samples are included in batches.

mo_net/train/trainer/trainer.py (4)

82-82: LGTM: Clear type alias for transformation functions.

The `TransformFn` type alias improves code readability and type safety.

94-94: LGTM: Clean parameter design.

The optional `transform` parameter provides better flexibility than the previous boolean flag approach.

115-115: LGTM: Simplified transform handling.

Directly passing the transform to the Batcher is cleaner than the previous conditional logic with partial application.

280-280: LGTM: Improved error logging.

Adding error logging when a check fails will help with debugging training issues.
mo_net/tests/model/layer/test_average.py (1)
1-83: LGTM! Comprehensive test coverage for the Average layer.

The test suite thoroughly covers forward propagation, backward propagation, serialization/deserialization, and error handling for the `Average` layer. The tests correctly verify the mathematical operations and gradient flow.

mo_net/cli.py (2)
254-290: Good refactoring to make model creation more flexible.

The changes correctly:
- Add a `Y_train` parameter to dynamically determine output dimensions
- Remove hardcoded output class constants
- Maintain proper error handling for missing dims when creating new models

343-344: Clear and concise error message.
mo_net/model/layer/average.py (1)

1-91: Well-implemented Average layer with proper gradient handling.

The implementation correctly:
- Validates axis parameters in the constructor
- Adjusts axes by +1 to account for the batch dimension in forward/backward propagation
- Properly broadcasts and scales gradients in backward propagation
- Implements serialization/deserialization

The static analysis import errors appear to be false positives.
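To make the +1 axis adjustment concrete, here is a standalone NumPy sketch (illustrative shapes, not the layer's actual code): a per-sample `axis=(0,)` becomes axis 1 of the batched array, and the backward pass broadcasts the upstream gradient over the averaged axis and scales by its size:

```python
import numpy as np

batch = np.arange(24, dtype=float).reshape(4, 2, 3)  # (batch, d0, d1)
axis = (0,)                                          # per-sample axis to average over
adjusted = tuple(a + 1 for a in axis)                # +1 to skip the batch dimension

out = np.mean(batch, axis=adjusted)                  # shape (4, 3)

# Backward pass: broadcast dY across the averaged axis, scale by 1/size
dY = np.ones_like(out)
size = int(np.prod([batch.shape[a] for a in adjusted]))
dX = np.broadcast_to(np.expand_dims(dY, adjusted), batch.shape) / size

# The gradient summed back over the averaged axis recovers dY
assert np.allclose(dX.sum(axis=adjusted), dY)
```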
mo_net/model/layer/embedding.py (1)
116-273: Excellent implementation of the Embedding layer.

The implementation includes:
- Comprehensive arithmetic operations for parameters
- Proper gradient accumulation using `np.add.at` for sparse updates
- Optional gradient clipping for training stability
- Well-structured serialization/deserialization
- Proper validation of embedding matrix dimensions
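A brief standalone illustration of why `np.add.at` is the right tool for sparse embedding gradients (hypothetical shapes, not the layer's code): plain fancy-index addition is buffered and counts a repeated token only once, while `np.add.at` accumulates every occurrence:

```python
import numpy as np

grad = np.zeros((4, 2))          # (vocab_size, embedding_dim), illustrative sizes
token_ids = np.array([1, 1, 3])  # token 1 occurs twice in the context window
upstream = np.ones((3, 2))       # one gradient row per context position

buggy = grad.copy()
buggy[token_ids] += upstream     # buffered: token 1 receives only ONE row

np.add.at(grad, token_ids, upstream)  # unbuffered: token 1 accumulates BOTH rows

# buggy[1] == [1, 1] but grad[1] == [2, 2]
```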
mo_net/samples/cbow.py (3)
67-123: Model architecture looks sound, but verify layer dimensions.

The CBOW model architecture follows standard practices with embedding → averaging → linear layers. The dimension calculations appear correct.
The model correctly:
- Uses context_size * 2 for input dimensions (words before + after target)
- Applies embedding layer followed by averaging over context words
- Uses Xavier initialization for both embedding and linear layers
- Includes proper tracing support
52-64: Fix potential index error in context window creation.

The context window creation logic has an off-by-one error that could cause index out of bounds issues.

```diff
 def get_training_set(
     sentences: Collection[Sequence[int]], context_size: int, vocab_size: int
 ) -> tuple[np.ndarray, np.ndarray]:
     contexts = []
     targets = []
     for sentence in sentences:
         for i in range(context_size, len(sentence) - context_size):
             context = (
-                sentence[i - context_size : i] + sentence[i + 1 : i + context_size + 1]
+                sentence[i - context_size : i] + sentence[i + 1 : i + 1 + context_size]
             )
             targets.append(sentence[i])
             contexts.append(context)
     return np.array(contexts), np.eye(vocab_size)[targets]
```

The original slicing `sentence[i + 1 : i + context_size + 1]` is correct, but the logic could be clearer by ensuring the context has exactly `2 * context_size` elements.

Likely an incorrect or invalid review comment.
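A standalone check (illustrative sentence, not the sample's code) confirms the two right-context spellings are the same slice and every window has exactly `2 * context_size` elements:

```python
sentence = list(range(10))  # token ids for a hypothetical 10-token sentence
context_size = 2

for i in range(context_size, len(sentence) - context_size):
    left = sentence[i - context_size : i]
    right_a = sentence[i + 1 : i + context_size + 1]  # original spelling
    right_b = sentence[i + 1 : i + 1 + context_size]  # suggested spelling
    assert right_a == right_b                         # identical slices
    assert len(left + right_a) == 2 * context_size    # full window every time
```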
1-28: Imports verified – all referenced `mo_net.model.layer` modules exist and compile.

All modules imported by `mo_net/samples/cbow.py` are present and pass syntax checks:
- mo_net/model/layer/average.py
- mo_net/model/layer/embedding.py
- mo_net/model/layer/linear.py
- mo_net/model/layer/output.py
- mo_net/model/layer/reshape.py
No import errors remain; no changes required.
mo_net/protos.py

```python
from typing import (
    Generic,
    NewType,
    Protocol,
    Self,
    TypedDict,
    TypeVar,
    cast,
    Optional,
    Type as TypingType,
    Union,
)
```
🛠️ Refactor suggestion
Clean up unused imports.
The multi-line import formatting improves readability. However, static analysis indicates that Optional, Type, and Union are imported but unused and should be removed.
```diff
 from typing import (
     Generic,
     NewType,
     Protocol,
     Self,
     TypedDict,
     TypeVar,
     cast,
-    Optional,
-    Type as TypingType,
-    Union,
 )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from typing import (
    Generic,
    NewType,
    Protocol,
    Self,
    TypedDict,
    TypeVar,
    cast,
)
```
🧰 Tools
🪛 Ruff (0.11.9)
13-13: typing.Optional imported but unused
Remove unused import
(F401)
14-14: typing.Type imported but unused
Remove unused import
(F401)
15-15: typing.Union imported but unused
Remove unused import
(F401)
🤖 Prompt for AI Agents
In mo_net/protos.py between lines 5 and 16, the imports Optional, Type, and
Union are not used anywhere in the file. Remove these three imports from the
multi-line import statement to clean up unused imports and improve code clarity.
mo_net/samples/mnist.py

```python
    precision = np.sum(Y_test_pred == Y_test_true) / len(Y_test_pred)
    recall = np.sum(Y_test_true == Y_test_pred) / len(Y_test_true)
    f1_score = 2 * precision * recall / (precision + recall)
    logger.info(f"F1 Score: {f1_score}")
```
Fix incorrect F1 score calculation.
The current implementation calculates accuracy instead of precision and recall. For multi-class classification, you need per-class or weighted F1 scores.
```diff
-    precision = np.sum(Y_test_pred == Y_test_true) / len(Y_test_pred)
-    recall = np.sum(Y_test_true == Y_test_pred) / len(Y_test_true)
-    f1_score = 2 * precision * recall / (precision + recall)
-    logger.info(f"F1 Score: {f1_score}")
+    from sklearn.metrics import f1_score
+    f1 = f1_score(Y_test_true, Y_test_pred, average='weighted')
+    logger.info(f"F1 Score (weighted): {f1}")
```

Alternatively, if you want to avoid the sklearn dependency:
```python
# Calculate per-class F1 scores manually
f1_scores = []
for label in range(N_DIGITS):
    true_positives = np.sum((Y_test_pred == label) & (Y_test_true == label))
    false_positives = np.sum((Y_test_pred == label) & (Y_test_true != label))
    false_negatives = np.sum((Y_test_pred != label) & (Y_test_true == label))
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
    f1_scores.append(f1)

# Weighted average F1 score
class_counts = [np.sum(Y_test_true == label) for label in range(N_DIGITS)]
weighted_f1 = np.average(f1_scores, weights=class_counts)
logger.info(f"F1 Score (weighted): {weighted_f1}")
```

🤖 Prompt for AI Agents
In mo_net/samples/mnist.py around lines 112 to 115, the current code incorrectly
calculates precision and recall as overall accuracy, leading to an incorrect F1
score. To fix this, implement per-class precision and recall by counting true
positives, false positives, and false negatives for each class label, then
compute the F1 score for each class. Finally, calculate a weighted average F1
score using the class counts as weights and log this weighted F1 score instead
of the current incorrect value.
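To see the failure concretely, here is a standalone sketch with made-up labels: the two original expressions compare the same arrays and both reduce to plain accuracy, whereas a per-class weighted F1 yields a different value:

```python
import numpy as np

# Hypothetical 3-class predictions, for illustration only
Y_true = np.array([0, 0, 1, 1, 2, 2])
Y_pred = np.array([0, 1, 1, 1, 2, 0])

# Original formulas: identical expressions, so "precision" == "recall" == accuracy
precision = np.sum(Y_pred == Y_true) / len(Y_pred)
recall = np.sum(Y_true == Y_pred) / len(Y_true)
assert precision == recall  # both are 4/6

# Per-class F1, weighted by class support
f1s, support = [], []
for label in np.unique(Y_true):
    tp = np.sum((Y_pred == label) & (Y_true == label))
    fp = np.sum((Y_pred == label) & (Y_true != label))
    fn = np.sum((Y_pred != label) & (Y_true == label))
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1s.append(2 * p * r / (p + r) if p + r > 0 else 0.0)
    support.append(np.sum(Y_true == label))

weighted_f1 = np.average(f1s, weights=support)
# accuracy and weighted F1 are genuinely different metrics here
```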
mo_net/samples/mnist.py

```python
        logger.error(f"File not found: {model_path}")
        sys.exit(1)

    model = Model.load(open(model_path, "rb"))
```
Use a context manager for file operations.
```diff
-    model = Model.load(open(model_path, "rb"))
+    with open(model_path, "rb") as f:
+        model = Model.load(f)
```

📝 Committable suggestion
```python
with open(model_path, "rb") as f:
    model = Model.load(f)
```
🧰 Tools
🪛 Ruff (0.11.9)
97-97: Use a context manager for opening files
(SIM115)
🪛 Pylint (3.3.7)
[refactor] 97-97: Consider using 'with' for resource-allocating operations
(R1732)
🤖 Prompt for AI Agents
In mo_net/samples/mnist.py at line 97, the file is opened without using a
context manager, which can lead to resource leaks. Replace the open call with a
with statement to ensure the file is properly closed after loading the model.
Use "with open(model_path, 'rb') as f:" and then load the model from the file
object f.
mo_net/samples/cbow.py
Outdated
```python
    vocab = vocab_data["vocab"]
    token_to_id = vocab_data["token_to_id"]

    model = Model.load(open(model_path, "rb"), training=False)
```
Use context manager for file operations.
Same issue as above - unsafe file handling.
```diff
-    model = Model.load(open(model_path, "rb"), training=False)
+    with open(model_path, "rb") as f:
+        model = Model.load(f, training=False)
```

📝 Committable suggestion
```python
with open(model_path, "rb") as f:
    model = Model.load(f, training=False)
```
🧰 Tools
🪛 Ruff (0.11.9)
395-395: Use a context manager for opening files
(SIM115)
🪛 Pylint (3.3.7)
[refactor] 395-395: Consider using 'with' for resource-allocating operations
(R1732)
🤖 Prompt for AI Agents
In mo_net/samples/cbow.py at line 395, the file is opened without using a
context manager, which can lead to unsafe file handling and resource leaks.
Replace the open call with a with statement to ensure the file is properly
closed after loading the model. This means wrapping the open call inside a with
block and passing the file object to Model.load within that block.
mo_net/samples/cbow.py
Outdated
```python
    vocab = vocab_data["vocab"]
    token_to_id = vocab_data["token_to_id"]

    model = Model.load(open(model_path, "rb"), training=False)
```
Use context manager for file operations.
Static analysis correctly identifies unsafe file handling that could lead to resource leaks.
```diff
-    model = Model.load(open(model_path, "rb"), training=False)
+    with open(model_path, "rb") as f:
+        model = Model.load(f, training=False)
```

📝 Committable suggestion
```python
with open(model_path, "rb") as f:
    model = Model.load(f, training=False)
```
🧰 Tools
🪛 Ruff (0.11.9)
309-309: Use a context manager for opening files
(SIM115)
🪛 Pylint (3.3.7)
[refactor] 309-309: Consider using 'with' for resource-allocating operations
(R1732)
🤖 Prompt for AI Agents
In mo_net/samples/cbow.py at line 309, the file is opened without using a
context manager, which risks resource leaks. Replace the open call with a with
statement to ensure the file is properly closed after loading the model. This
involves wrapping the open call inside a with block and passing the file handle
to Model.load.
mo_net/samples/cbow.py
Outdated
```python
    similarity = np.dot(word1_embedding, word2_embedding) / (
        np.linalg.norm(word1_embedding) * np.linalg.norm(word2_embedding)
    )
```
🛠️ Refactor suggestion
Add numerical stability to cosine similarity calculation.
The cosine similarity calculation could fail with zero-magnitude vectors or cause numerical instability.
```diff
-    similarity = np.dot(word1_embedding, word2_embedding) / (
-        np.linalg.norm(word1_embedding) * np.linalg.norm(word2_embedding)
-    )
+    norm1 = np.linalg.norm(word1_embedding)
+    norm2 = np.linalg.norm(word2_embedding)
+
+    if norm1 == 0 or norm2 == 0:
+        similarity = 0.0
+    else:
+        similarity = np.dot(word1_embedding, word2_embedding) / (norm1 * norm2)
+        # Clamp to [-1, 1] for numerical stability
+        similarity = np.clip(similarity, -1.0, 1.0)
```

📝 Committable suggestion
```python
norm1 = np.linalg.norm(word1_embedding)
norm2 = np.linalg.norm(word2_embedding)

if norm1 == 0 or norm2 == 0:
    similarity = 0.0
else:
    similarity = np.dot(word1_embedding, word2_embedding) / (norm1 * norm2)
    # Clamp to [-1, 1] for numerical stability
    similarity = np.clip(similarity, -1.0, 1.0)
```
🤖 Prompt for AI Agents
In mo_net/samples/cbow.py around lines 349 to 351, the cosine similarity
calculation can be unstable or fail if either word embedding vector has zero
magnitude. To fix this, add a small epsilon value to the denominator when
computing the norms to prevent division by zero and improve numerical stability.
This ensures the calculation safely handles zero or near-zero magnitude vectors
without errors.
mo_net/samples/cbow.py

```python
def train(
    embedding_dim: int,
    context_size: int,
    batch_size: int,
    num_epochs: int,
    learning_rate: float,
    warmup_epochs: int,
    max_sentences: int,
    model_output_path: Path | None,
    log_level: str,
):
    """Train a CBOW model on Shakespeare text"""
    logger.remove()
    logger.add(lambda msg: click.echo(msg, err=True), level=log_level)

    shakespeare = get_resource("s3://mo-net-resources/shakespeare.txt").read_text()
    sentences = shakespeare.split("\n")[:max_sentences]
    token_ids = [
        [token for token in sentence.split()] for sentence in sentences if sentence
    ]
    vocab = sorted(set(token for sentence in sentences for token in sentence.split()))
    token_to_id = {token: i for i, token in enumerate(vocab)}
    token_ids = [[token_to_id[token] for token in sentence] for sentence in token_ids]
    vocab_size = len(vocab)
    X_train, Y_train = get_training_set(token_ids, context_size, vocab_size)

    logger.info(f"Vocabulary size: {vocab_size}")
    logger.info(f"Embedding dimension: {embedding_dim}")
    logger.info(f"Context size: {context_size}")
    logger.info(f"Training samples: {len(X_train)}")

    model = CBOWModel.create(
        vocab_size=vocab_size,
        embedding_dim=embedding_dim,
        context_size=context_size,
        tracing_enabled=False,
    )

    training_parameters = TrainingParameters(
        batch_size=batch_size,
        dropout_keep_probs=(),
        history_max_len=100,
        learning_rate_limits=(learning_rate, learning_rate),
        log_level=log_level,
        max_restarts=0,
        monotonic=False,
        no_monitoring=True,
        normalisation_type=NormalisationType.NONE,
        num_epochs=num_epochs,
        quiet=False,
        regulariser_lambda=0.0,
        trace_logging=False,
        train_set_size=len(X_train),
        warmup_epochs=warmup_epochs,
        workers=0,
    )

    train_size = int(0.8 * len(X_train))
    X_train_split = X_train[:train_size]
    Y_train_split = Y_train[:train_size]
    X_val = X_train[train_size:]
    Y_val = Y_train[train_size:]

    run = TrainingRun(seed=42, backend=CsvBackend(path=DATA_DIR / "cbow.csv"))
    optimizer = get_optimizer("adam", model, training_parameters)

    trainer = BasicTrainer(
        X_train=X_train_split,
        Y_train=Y_train_split,
        X_val=X_val,
        Y_val=Y_val,
        model=model,
        optimizer=optimizer,
        run=run,
        training_parameters=training_parameters,
    )

    logger.info(f"Starting CBOW training with {len(X_train_split)} training samples")
    result = trainer.train()

    if hasattr(result, "model_checkpoint_path"):
        if model_output_path is None:
            model_output_path = DATA_DIR / "output" / "cbow_model.pkl"
        result.model_checkpoint_path.rename(model_output_path)
        logger.info(f"Training completed. Model saved to: {model_output_path}")

        vocab_path = model_output_path.with_suffix(".vocab")
        np.save(vocab_path, {"vocab": vocab, "token_to_id": token_to_id})
        logger.info(f"Vocabulary saved to: {vocab_path}")
    else:
        logger.error(f"Training failed: {result}")
```
🛠️ Refactor suggestion
Reduce function complexity and improve resource management.
The train function has high complexity with many parameters and local variables. Consider extracting helper functions.
Key issues to address:
- Vocabulary processing: Extract to separate function
- Model setup: Extract configuration logic
- Data splitting: Use more robust validation split
```diff
 def train(
     embedding_dim: int,
     context_size: int,
     batch_size: int,
     num_epochs: int,
     learning_rate: float,
     warmup_epochs: int,
     max_sentences: int,
     model_output_path: Path | None,
     log_level: str,
 ):
     """Train a CBOW model on Shakespeare text"""
     logger.remove()
     logger.add(lambda msg: click.echo(msg, err=True), level=log_level)

+    # Extract vocabulary processing
+    vocab, token_to_id, token_ids = _prepare_vocabulary(max_sentences)
+    vocab_size = len(vocab)
-    shakespeare = get_resource("s3://mo-net-resources/shakespeare.txt").read_text()
-    sentences = shakespeare.split("\n")[:max_sentences]
-    token_ids = [
-        [token for token in sentence.split()] for sentence in sentences if sentence
-    ]
-    vocab = sorted(set(token for sentence in sentences for token in sentence.split()))
-    token_to_id = {token: i for i, token in enumerate(vocab)}
-    token_ids = [[token_to_id[token] for token in sentence] for sentence in token_ids]
-    vocab_size = len(vocab)
     X_train, Y_train = get_training_set(token_ids, context_size, vocab_size)
     # ... rest of function
```

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 196-196: Too many arguments (9/5)
(R0913)
[refactor] 196-196: Too many positional arguments (9/5)
(R0917)
[refactor] 196-196: Too many local variables (29/15)
(R0914)
[refactor] 214-214: Unnecessary use of a comprehension, use list(sentence.split()) instead.
(R1721)
🤖 Prompt for AI Agents
In mo_net/samples/cbow.py from lines 196 to 287, the train function is overly
complex with many parameters and local variables. To fix this, extract the
vocabulary processing logic (building vocab, token_to_id, and token_ids) into a
separate helper function. Similarly, move the model creation and training
parameter setup into another helper function to isolate configuration logic. For
data splitting, replace the manual slicing with a more robust method such as
using sklearn's train_test_split to ensure better validation splitting and
reproducibility.
mo_net/samples/cbow.py
Outdated
```python
    if hasattr(result, "model_checkpoint_path"):
        if model_output_path is None:
            model_output_path = DATA_DIR / "output" / "cbow_model.pkl"
        result.model_checkpoint_path.rename(model_output_path)
        logger.info(f"Training completed. Model saved to: {model_output_path}")

        vocab_path = model_output_path.with_suffix(".vocab")
        np.save(vocab_path, {"vocab": vocab, "token_to_id": token_to_id})
        logger.info(f"Vocabulary saved to: {vocab_path}")
    else:
        logger.error(f"Training failed: {result}")
```
🛠️ Refactor suggestion
Improve error handling and model checkpoint validation.
The model saving logic assumes the training result has a checkpoint path without proper validation.
```diff
-    if hasattr(result, "model_checkpoint_path"):
+    if hasattr(result, "model_checkpoint_path") and result.model_checkpoint_path is not None:
         if model_output_path is None:
             model_output_path = DATA_DIR / "output" / "cbow_model.pkl"
+
+        # Ensure output directory exists
+        model_output_path.parent.mkdir(parents=True, exist_ok=True)
+
         result.model_checkpoint_path.rename(model_output_path)
         logger.info(f"Training completed. Model saved to: {model_output_path}")

         vocab_path = model_output_path.with_suffix(".vocab")
         np.save(vocab_path, {"vocab": vocab, "token_to_id": token_to_id})
         logger.info(f"Vocabulary saved to: {vocab_path}")
     else:
-        logger.error(f"Training failed: {result}")
+        logger.error(f"Training failed or no model checkpoint available: {result}")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if hasattr(result, "model_checkpoint_path") and result.model_checkpoint_path is not None:
    if model_output_path is None:
        model_output_path = DATA_DIR / "output" / "cbow_model.pkl"

    # Ensure output directory exists
    model_output_path.parent.mkdir(parents=True, exist_ok=True)

    result.model_checkpoint_path.rename(model_output_path)
    logger.info(f"Training completed. Model saved to: {model_output_path}")
    vocab_path = model_output_path.with_suffix(".vocab")
    np.save(vocab_path, {"vocab": vocab, "token_to_id": token_to_id})
    logger.info(f"Vocabulary saved to: {vocab_path}")
else:
    logger.error(f"Training failed or no model checkpoint available: {result}")
```
🤖 Prompt for AI Agents
In mo_net/samples/cbow.py around lines 276 to 287, the code assumes result has a
valid model_checkpoint_path without verifying it. Improve error handling by
explicitly checking if result.model_checkpoint_path exists and is a valid path
before attempting to rename it. If the checkpoint path is missing or invalid,
log an appropriate error message and avoid proceeding with saving the model and
vocabulary files.
mo_net/samples/cbow.py
Outdated
```python
similarity = np.dot(word_embedding, other_embedding) / (
    np.linalg.norm(word_embedding) * np.linalg.norm(other_embedding)
)
```
🛠️ Refactor suggestion
Extract similarity calculation to avoid code duplication.
The same cosine similarity calculation is repeated in multiple places. Extract to a helper function.
Add this helper function at the top of the file:
```diff
+def cosine_similarity(vec1: np.ndarray, vec2: np.ndarray) -> float:
+    """Calculate cosine similarity between two vectors with numerical stability."""
+    norm1 = np.linalg.norm(vec1)
+    norm2 = np.linalg.norm(vec2)
+
+    if norm1 == 0 or norm2 == 0:
+        return 0.0
+
+    similarity = np.dot(vec1, vec2) / (norm1 * norm2)
+    return np.clip(similarity, -1.0, 1.0)
```

Then replace the duplicated calculations:

```diff
-    similarity = np.dot(word_embedding, other_embedding) / (
-        np.linalg.norm(word_embedding) * np.linalg.norm(other_embedding)
-    )
+    similarity = cosine_similarity(word_embedding, other_embedding)
```

🤖 Prompt for AI Agents
In mo_net/samples/cbow.py around lines 415 to 417, the cosine similarity
calculation is duplicated. Extract this calculation into a helper function
placed near the top of the file that takes two embeddings as input and returns
their cosine similarity. Replace all instances of the duplicated code with calls
to this new helper function to improve code reuse and maintainability.
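As a quick sanity check, the helper proposed above (reproduced here with an explicit `float` cast added) behaves sensibly on the degenerate inputs the review calls out:

```python
import numpy as np


def cosine_similarity(vec1: np.ndarray, vec2: np.ndarray) -> float:
    """Cosine similarity with guards against zero vectors and rounding drift."""
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    if norm1 == 0 or norm2 == 0:
        return 0.0  # define similarity with the zero vector as 0 rather than NaN
    return float(np.clip(np.dot(vec1, vec2) / (norm1 * norm2), -1.0, 1.0))
```

Parallel vectors give 1.0, anti-parallel give -1.0, and a zero vector no longer produces a division-by-zero warning.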
Force-pushed from 8905d1a to 1d0b7e7 (Compare)
Currently, the library is overfocused on mnist. The affine_transform is a footgun when training on other data. Removing it to force specification of a transformation separately. - Move towards manually specifying a transformation. - Remove no_transform.
- Move cbow into samples. - Remove mnist code from cli. - Add mnist module sample.
- Streamline code - Make a Vocab class - Simplify/streamline code
Fix a bug in attach_weight_decay_regulariser
Force-pushed from 1d0b7e7 to ccdd9c4 (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (7)
mo_net/protos.py (1)
5-16: Address the unused imports as previously identified.

The multi-line import formatting improves readability. However, the unused imports (`Optional`, `Type`, `Union`) identified in the previous review remain unaddressed.

mo_net/samples/mnist.py (2)
97-97: Use a context manager for file operations.
112-115: Fix incorrect F1 score calculation.

mo_net/samples/cbow.py (4)
375-375: Use context manager for file operations.
461-461: Use context manager for file operations.
415-417: Add numerical stability to cosine similarity calculation.
480-482: Extract similarity calculation to avoid code duplication.
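For reference on the F1 item above (112-115): a macro-averaged F1 over integer class labels can be computed in plain NumPy as below. The function name and shapes are illustrative, not the sample's actual code:

```python
import numpy as np


def macro_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Macro-averaged F1: mean over classes of 2*TP / (2*TP + FP + FN)."""
    scores = []
    for cls in np.unique(y_true):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        denom = 2 * tp + fp + fn
        # A class with no predictions and no true positives contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))
```

Averaging per-class F1 (rather than, say, averaging precision and recall first) is the usual definition of macro F1 for multi-class problems like MNIST.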
🧹 Nitpick comments (1)
mo_net/samples/mnist.py (1)
150-151: Remove duplicate plotting calls.

The `plt.tight_layout()` and `plt.show()` calls are duplicated, which is unnecessary and could cause display issues.

Apply this diff:

```diff
     plt.tight_layout()
     plt.show()
-    plt.tight_layout()
-    plt.show()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (18)
- mo_net/cli.py (4 hunks)
- mo_net/config.py (0 hunks)
- mo_net/model/layer/__init__.py (2 hunks)
- mo_net/model/layer/average.py (1 hunks)
- mo_net/model/layer/batch_norm/batch_norm.py (1 hunks)
- mo_net/model/layer/batch_norm/batch_norm_2d.py (1 hunks)
- mo_net/model/layer/embedding.py (1 hunks)
- mo_net/model/layer/layer_norm.py (1 hunks)
- mo_net/protos.py (1 hunks)
- mo_net/regulariser/weight_decay.py (1 hunks)
- mo_net/samples/cbow.py (1 hunks)
- mo_net/samples/mnist.py (1 hunks)
- mo_net/tests/model/layer/test_average.py (1 hunks)
- mo_net/train/batcher.py (1 hunks)
- mo_net/train/trainer/trainer.py (4 hunks)
- mypy_plugins/__init__.py (1 hunks)
- mypy_plugins/d.py (2 hunks)
- pyproject.toml (1 hunks)
💤 Files with no reviewable changes (1)
- mo_net/config.py
✅ Files skipped from review due to trivial changes (2)
- pyproject.toml
- mo_net/regulariser/weight_decay.py
🚧 Files skipped from review as they are similar to previous changes (9)
- mypy_plugins/__init__.py
- mo_net/model/layer/__init__.py
- mo_net/model/layer/batch_norm/batch_norm.py
- mo_net/model/layer/layer_norm.py
- mo_net/model/layer/batch_norm/batch_norm_2d.py
- mypy_plugins/d.py
- mo_net/tests/model/layer/test_average.py
- mo_net/train/trainer/trainer.py
- mo_net/train/batcher.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
mo_net/model/layer/average.py (2)
- mo_net/protos.py (2): `D` (41-49), `d` (173-174)
- mo_net/model/layer/base.py (1): `layer_id` (87-89)

mo_net/model/layer/embedding.py (3)
- mo_net/model/layer/base.py (5): `BadLayerId` (48-48), `ParametrisedHidden` (111-144), `layer_id` (87-89), `_write_header` (142-144), `get_layer_id` (134-140)
- mo_net/protos.py (13): `D` (41-49), `GradCache` (127-128), `GradLayer` (135-144), `SupportsGradientOperations` (90-105), `d` (173-174), `input_dimensions` (190-190), `output_dimensions` (193-193), `parameters` (137-137), `empty_gradient` (144-144), `serialize` (165-165), `gradient_operation` (142-142), `cache` (140-140), `parameter_count` (116-116)
- mo_net/samples/cbow.py (2): `embeddings` (182-183), `serialize` (50-53)
🪛 Ruff (0.11.9)
mo_net/samples/cbow.py
101-115: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
375-375: Use a context manager for opening files
(SIM115)
461-461: Use a context manager for opening files
(SIM115)
mo_net/protos.py
13-13: typing.Optional imported but unused
Remove unused import
(F401)
14-14: typing.Type imported but unused
Remove unused import
(F401)
15-15: typing.Union imported but unused
Remove unused import
(F401)
mo_net/samples/mnist.py
97-97: Use a context manager for opening files
(SIM115)
🪛 Flake8 (7.2.0)
mo_net/samples/cbow.py
[error] 106-106: whitespace before ':'
(E203)
[error] 107-107: whitespace before ':'
(E203)
mo_net/model/layer/embedding.py
[error] 105-105: SyntaxError: invalid syntax
(E999)
🪛 Pylint (3.3.7)
mo_net/samples/cbow.py
[error] 20-20: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 21-21: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 22-22: Cannot import 'mo_net.model.layer.embedding' due to 'invalid syntax (mo_net.model.layer.embedding, line 105)'
(E0001)
[error] 22-22: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 23-23: Cannot import 'mo_net.model.layer.linear' due to 'invalid syntax (mo_net.model.layer.linear, line 156)'
(E0001)
[error] 23-23: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 24-24: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 25-25: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 26-26: Cannot import 'mo_net.model.model' due to 'invalid syntax (mo_net.model.model, line 384)'
(E0001)
[error] 26-26: No name 'model' in module 'mo_net.model'
(E0611)
[error] 28-28: Cannot import 'mo_net.protos' due to 'invalid syntax (mo_net.protos, line 67)'
(E0001)
[error] 28-28: No name 'protos' in module 'mo_net'
(E0611)
[error] 34-34: Cannot import 'mo_net.train.trainer.trainer' due to 'invalid syntax (mo_net.train.trainer.trainer, line 42)'
(E0001)
[error] 34-39: No name 'trainer' in module 'mo_net.train.trainer'
(E0611)
[refactor] 257-257: Too many arguments (9/5)
(R0913)
[refactor] 257-257: Too many positional arguments (9/5)
(R0917)
[refactor] 257-257: Too many local variables (28/15)
(R0914)
[refactor] 375-375: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 448-448: Too many local variables (18/15)
(R0914)
[refactor] 461-461: Consider using 'with' for resource-allocating operations
(R1732)
mo_net/model/layer/average.py
[error] 8-8: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 9-9: Cannot import 'mo_net.protos' due to 'invalid syntax (mo_net.protos, line 67)'
(E0001)
[error] 9-9: No name 'protos' in module 'mo_net'
(E0611)
mo_net/model/layer/embedding.py
[error] 105-105: Parsing failed: 'invalid syntax (mo_net.model.layer.embedding, line 105)'
(E0001)
mo_net/samples/mnist.py
[refactor] 70-70: Too many local variables (29/15)
(R0914)
[refactor] 97-97: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 70-70: Too many statements (52/50)
(R0915)
🔇 Additional comments (7)
mo_net/model/layer/average.py (1)
12-91: LGTM! Well-implemented averaging layer.

The `Average` layer implementation is solid with proper:
- Input validation for axes ranges
- Batch dimension handling in forward/backward propagation
- Gradient reconstruction through broadcasting
- Serialization support
The logic correctly accounts for the batch dimension by adding +1 to axes during computation and properly distributes gradients during backpropagation.
mo_net/model/layer/embedding.py (3)
105-105: Type alias syntax is correct for Python 3.12+.

The syntax error flagged by static analysis is a false positive. The `type ParametersType = Parameters` syntax is valid Python 3.12+ type alias syntax.
24-103: LGTM! Well-designed parameter class with gradient operations.

The `Parameters` dataclass properly implements `SupportsGradientOperations` with:
- Complete arithmetic operations for gradient computation
- Proper handling of division by zero with EPSILON
- Multiple initialization methods (random, xavier, he)
- Appropriate parameter validation
The implementation follows the gradient computation protocol correctly.
116-273: LGTM! Comprehensive embedding layer implementation.

The embedding layer is well-implemented with:
- Proper parameter shape validation during initialization
- Correct forward propagation mapping indices to embeddings
- Gradient accumulation with optional clipping in backward propagation
- Complete serialization/deserialization support
- Appropriate cache management
The gradient clipping logic and parameter update mechanisms are correctly implemented.
mo_net/cli.py (2)
254-296: LGTM! Clean refactoring removes quickstart complexity.

The changes successfully simplify the CLI by:
- Deriving output dimensions directly from `Y_train.shape[1]`
- Removing quickstart branching logic in model creation
- Requiring explicit dataset URLs instead of inference
This makes the interface more explicit and predictable.
343-344: LGTM! Explicit dataset URL requirement improves clarity.

Requiring an explicit dataset URL instead of inferring from quickstart templates makes the CLI more predictable and removes implicit behavior.
mo_net/samples/cbow.py (1)
453-456: Fix inconsistent vocabulary file extension.

The `sample` command expects the `.vocab.npy` extension but the `train` command saves with the `.vocab` extension. This will cause the sample command to fail.

Apply this diff to make the extensions consistent:

```diff
-    vocab_path = model_path.with_suffix(".vocab.npy")
+    vocab_path = model_path.with_suffix(".vocab")
     if not vocab_path.exists():
         raise click.ClickException(f"Vocabulary file not found: {vocab_path}")
-    vocab_data = np.load(vocab_path, allow_pickle=True).item()
-    vocab = vocab_data["vocab"]
-    token_to_id = vocab_data["token_to_id"]
+    vocab = Vocab.deserialize(vocab_path)
+    token_to_id = vocab.token_to_id
```

Likely an incorrect or invalid review comment.
```python
sample_indices = sample(np.where(Y_test_true != Y_test_pred)[0], 25)
for idx, i in enumerate(sample_indices):
    plt.subplot(8, 5, idx + 1)
    plt.imshow(X_test[i].reshape(MNIST_IMAGE_SIZE, MNIST_IMAGE_SIZE), cmap="gray")
    plt.title(f"Pred: {Y_test_pred[i]}, True: {Y_test_true[i]}")
    plt.axis("off")
```
💡 Verification agent
🧩 Analysis chain
Verify subplot indexing for 25 images in grid.
The subplot indexing uses (8, 5, idx + 1) but displays only 25 images. For a 5x5 grid layout, consider using (5, 5, idx + 1) instead to ensure proper arrangement.
🏁 Script executed:
```bash
#!/bin/bash
# Verify if the subplot configuration properly displays 25 images
# The current code uses (8, 5) which creates 40 subplot positions for only 25 images
echo "Current subplot configuration: (8, 5) = 40 positions for 25 images"
echo "Suggested configuration: (5, 5) = 25 positions for 25 images"
echo "Verify the intended layout matches the image count"
```

Length of output: 384
Use a 5×5 grid for 25 sample images
The current call allocates an 8×5 grid (40 slots) but only fills 25. To arrange the misclassified digits in a square layout, switch to 5 rows × 5 columns:
• File: mo_net/samples/mnist.py
• Lines: ≈120–125
Suggested change:
```diff
-        plt.subplot(8, 5, idx + 1)
+        plt.subplot(5, 5, idx + 1)
```

This ensures exactly 25 subplot slots and a clean square display.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
sample_indices = sample(np.where(Y_test_true != Y_test_pred)[0], 25)
for idx, i in enumerate(sample_indices):
    plt.subplot(5, 5, idx + 1)
    plt.imshow(X_test[i].reshape(MNIST_IMAGE_SIZE, MNIST_IMAGE_SIZE), cmap="gray")
    plt.title(f"Pred: {Y_test_pred[i]}, True: {Y_test_true[i]}")
    plt.axis("off")
```
🤖 Prompt for AI Agents
In mo_net/samples/mnist.py around lines 120 to 125, the subplot grid is
currently set to 8 rows by 5 columns, which creates 40 slots but only 25 images
are displayed. Change the subplot layout to 5 rows by 5 columns to create a
square grid that exactly fits the 25 sample images, ensuring a cleaner and more
balanced visualization.
- Fix bug in weight decay regulariser - Apply weight decay to embeddings not linear layer in CBOWModel
Force-pushed from 1d2710a to b14d06c (Compare)
It's not clear to me that dropping out projection is a good idea after all.
English sentences rather than shakespeare
Actionable comments posted: 1
♻️ Duplicate comments (3)
mo_net/samples/cbow.py (3)
458-458: Use context manager for file operations.

Static analysis correctly identifies unsafe file handling that could lead to resource leaks.

```diff
-    model = CBOWModel.load(open(model_path, "rb"), training=False)
+    with open(model_path, "rb") as f:
+        model = CBOWModel.load(f, training=False)
```
537-537: Use context manager for file operations.

Same issue as above - unsafe file handling.

```diff
-    model = CBOWModel.load(open(model_path, "rb"), training=False)
+    with open(model_path, "rb") as f:
+        model = CBOWModel.load(f, training=False)
```
555-558: Extract similarity calculation to avoid code duplication.

The same cosine similarity calculation is repeated in multiple places. Extract to a helper function.

Add this helper function at the top of the file:

```diff
+def cosine_similarity(vec1: np.ndarray, vec2: np.ndarray) -> float:
+    """Calculate cosine similarity between two vectors with numerical stability."""
+    norm1 = np.linalg.norm(vec1)
+    norm2 = np.linalg.norm(vec2)
+
+    if norm1 == 0 or norm2 == 0:
+        return 0.0
+
+    similarity = np.dot(vec1, vec2) / (norm1 * norm2)
+    return np.clip(similarity, -1.0, 1.0)
```

Then replace the duplicated calculations:

```diff
-    similarity = np.dot(word_embedding, other_embedding) / (
-        np.linalg.norm(word_embedding) * np.linalg.norm(other_embedding)
-    )
+    similarity = cosine_similarity(word_embedding, other_embedding)
```
🧹 Nitpick comments (1)
mo_net/samples/cbow.py (1)
491-497: Add numerical stability to cosine similarity calculation.

The cosine similarity calculation properly handles zero-magnitude vectors, which is good. However, the result could still benefit from clamping for numerical stability.

```diff
+    # Clamp to [-1, 1] for numerical stability
+    similarity = np.clip(similarity, -1.0, 1.0)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- mo_net/cli.py (4 hunks)
- mo_net/model/layer/embedding.py (1 hunks)
- mo_net/regulariser/weight_decay.py (2 hunks)
- mo_net/samples/cbow.py (1 hunks)
- mo_net/train/trainer/parallel.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- mo_net/train/trainer/parallel.py
🚧 Files skipped from review as they are similar to previous changes (1)
- mo_net/regulariser/weight_decay.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
mo_net/cli.py (2)
mo_net/protos.py (3)
ActivationFn(29-35)NormalisationType(202-205)d(173-174)mo_net/model/model.py (5)
Model(48-381)mlp_of(77-88)mlp_of(92-101)mlp_of(104-197)module_dimensions(347-348)
🪛 Pylint (3.3.7)
mo_net/model/layer/embedding.py
[error] 105-105: Parsing failed: 'invalid syntax (mo_net.model.layer.embedding, line 105)'
(E0001)
mo_net/samples/cbow.py
[error] 21-21: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 22-22: Cannot import 'mo_net.model.layer.embedding' due to 'invalid syntax (mo_net.model.layer.embedding, line 105)'
(E0001)
[error] 22-22: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 23-23: Cannot import 'mo_net.model.layer.linear' due to 'invalid syntax (mo_net.model.layer.linear, line 156)'
(E0001)
[error] 23-23: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 24-24: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 25-25: Cannot import 'mo_net.model.model' due to 'invalid syntax (mo_net.model.model, line 384)'
(E0001)
[error] 25-25: No name 'model' in module 'mo_net.model'
(E0611)
[error] 28-28: Cannot import 'mo_net.protos' due to 'invalid syntax (mo_net.protos, line 67)'
(E0001)
[error] 28-28: No name 'protos' in module 'mo_net'
(E0611)
[error] 33-33: Cannot import 'mo_net.train.trainer.trainer' due to 'invalid syntax (mo_net.train.trainer.trainer, line 42)'
(E0001)
[error] 33-38: No name 'trainer' in module 'mo_net.train.trainer'
(E0611)
[refactor] 336-336: Too many arguments (10/5)
(R0913)
[refactor] 336-336: Too many positional arguments (10/5)
(R0917)
[refactor] 336-336: Too many local variables (28/15)
(R0914)
[refactor] 448-448: Too many local variables (16/15)
(R0914)
[refactor] 458-458: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 527-527: Too many local variables (16/15)
(R0914)
[refactor] 537-537: Consider using 'with' for resource-allocating operations
(R1732)
🪛 Flake8 (7.2.0)
mo_net/model/layer/embedding.py
[error] 105-105: SyntaxError: invalid syntax
(E999)
mo_net/samples/cbow.py
[error] 182-182: whitespace before ':'
(E203)
[error] 183-183: whitespace before ':'
(E203)
🪛 Ruff (0.11.9)
mo_net/samples/cbow.py
458-458: Use a context manager for opening files
(SIM115)
537-537: Use a context manager for opening files
(SIM115)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: typecheck
🔇 Additional comments (7)
mo_net/model/layer/embedding.py (2)
200-214: LGTM! Solid backward propagation implementation.

The gradient accumulation using `np.add.at()` correctly handles multiple indices pointing to the same embedding, and the gradient clipping implementation provides good numerical stability.
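The distinction matters because buffered fancy indexing silently drops repeated-index contributions, which is exactly the case when a token occurs more than once in a batch. A minimal illustration (not the layer's actual code):

```python
import numpy as np

grad = np.zeros(4)
indices = np.array([0, 1, 1, 1])  # token 1 appears three times in the batch
updates = np.ones(4)

# Buffered fancy indexing evaluates grad[indices] once, so the three writes
# to index 1 all assign the same value and only one contribution survives.
buffered = grad.copy()
buffered[indices] += updates

# np.add.at performs unbuffered in-place addition, accumulating every
# contribution, which is what embedding backprop requires.
accumulated = grad.copy()
np.add.at(accumulated, indices, updates)
```

After running this, `buffered` is `[1, 1, 0, 0]` while `accumulated` is `[1, 3, 0, 0]`.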
171-178: Good parameter validation.

The shape validation ensures embedding matrix dimensions match the specified vocabulary size and embedding dimension, preventing runtime errors.
mo_net/cli.py (2)
253-288: Excellent refactor to remove quickstart magic.

The dynamic output dimension calculation from `Y_train.shape[1]` is much cleaner than hardcoded constants. Requiring an explicit `dims` parameter when training new models improves clarity and prevents configuration mistakes.
342-343: Good explicit requirement for dataset URL.

Removing dataset URL inference and requiring explicit URLs improves predictability and prevents silent failures from incorrect assumptions.
mo_net/samples/cbow.py (3)
44-86: Well-designed custom regularizer implementation.

The `EmbeddingWeightDecayRegulariser` properly integrates with the training framework and follows the established pattern of attaching via static methods. The gradient accumulation and loss computation are correctly implemented.
196-257: Excellent CBOW model architecture design.

The model composition using embedding layer, averaging layer, and linear output layer correctly implements the CBOW architecture. The class methods for creation and property access to embeddings provide a clean interface.
427-438: Good pattern matching for training result handling.

The use of pattern matching with `TrainingSuccessful`/`TrainingFailed` and `assert_never` provides clear, exhaustive handling of training outcomes.
```python
        return self.__class__(embeddings=embeddings)


type ParametersType = Parameters
```
Fix Python version compatibility issue with type aliases.
The type statement syntax is only available in Python 3.12+. This will cause syntax errors in older Python versions and explains the static analysis failures.
Replace the new type alias syntax with traditional type aliases for broader compatibility:
```diff
-type ParametersType = Parameters
+ParametersType = Parameters
```

```diff
-type CacheType = Cache
+CacheType = Cache
```

Alternatively, if Python 3.12+ is required, ensure this is documented in your project requirements.
Also applies to: 113-113
🧰 Tools
🪛 Pylint (3.3.7)
[error] 105-105: Parsing failed: 'invalid syntax (mo_net.model.layer.embedding, line 105)'
(E0001)
🪛 Flake8 (7.2.0)
[error] 105-105: SyntaxError: invalid syntax
(E999)
🤖 Prompt for AI Agents
In mo_net/model/layer/embedding.py at lines 105 and 113, the use of the 'type'
keyword for type aliasing is not compatible with Python versions below 3.12.
Replace these 'type' alias declarations with the traditional assignment syntax
for type aliases to ensure compatibility with older Python versions. For
example, change 'type ParametersType = Parameters' to 'ParametersType =
Parameters'. Make sure to update both lines accordingly.
Actionable comments posted: 0
♻️ Duplicate comments (5)
mo_net/samples/cbow.py (5)
462-462: Use context manager for file operations.

This matches the existing review comment about unsafe file handling.

```diff
-    model = CBOWModel.load(open(model_path, "rb"), training=False)
+    with open(model_path, "rb") as f:
+        model = CBOWModel.load(f, training=False)
```
541-541: Use context manager for file operations.

Same file handling issue as in the `infer` command.

```diff
-    model = CBOWModel.load(open(model_path, "rb"), training=False)
+    with open(model_path, "rb") as f:
+        model = CBOWModel.load(f, training=False)
```
559-562: Use the extracted cosine similarity helper function.

This is the same cosine similarity calculation that should use the helper function mentioned in the previous comment.

```diff
-    similarity = np.dot(word_embedding, other_embedding) / (
-        np.linalg.norm(word_embedding) * np.linalg.norm(other_embedding)
-    )
+    similarity = cosine_similarity(word_embedding, other_embedding)
```
495-501: Add numerical stability to cosine similarity calculation.

The cosine similarity calculation should handle edge cases and potential numerical instability.

Extract to a helper function for reuse:

```diff
+def cosine_similarity(vec1: np.ndarray, vec2: np.ndarray) -> float:
+    """Calculate cosine similarity with numerical stability."""
+    norm1 = np.linalg.norm(vec1)
+    norm2 = np.linalg.norm(vec2)
+
+    if norm1 == 0 or norm2 == 0:
+        return 0.0
+
+    similarity = np.dot(vec1, vec2) / (norm1 * norm2)
+    return np.clip(similarity, -1.0, 1.0)

-    if (n1 := np.linalg.norm(word1_embedding)) == 0 or (
-        n2 := np.linalg.norm(word2_embedding)
-    ) == 0:
-        similarity = 0.0
-    else:
-        similarity = np.dot(word1_embedding, word2_embedding) / (n1 * n2)
+    similarity = cosine_similarity(word1_embedding, word2_embedding)
```
339-443: Reduce function complexity by extracting helper functions.

The `train` function is overly complex with 10 parameters and 29 local variables, making it hard to maintain and test.

Extract vocabulary and data preparation logic:

```diff
+def _prepare_vocabulary_and_data(
+    vocab_size: int, context_size: int
+) -> tuple[Vocab, np.ndarray, np.ndarray]:
+    """Prepare vocabulary and training data from sentences."""
+    sentences = (
+        get_resource("s3://mo-net-resources/english-sentences.txt")
+        .read_text()
+        .split("\n")
+    )[:100000]
+    vocab = Vocab.from_sentences(sentences, max_size=vocab_size)
+
+    tokenized_sentences = [
+        [vocab[token] for token in sentence.split()]
+        for sentence in sentences
+        if sentence
+    ]
+
+    return vocab, *get_training_set(tokenized_sentences, context_size, len(vocab))


 def train(
     embedding_dim: int,
     context_size: int,
     # ... other params
 ):
     setup_logging(log_level)
-    sentences = (
-        get_resource("s3://mo-net-resources/english-sentences.txt")
-        .read_text()
-        .split("\n")
-    )[:100000]
-    vocab = Vocab.from_sentences(sentences, max_size=vocab_size)
-
-    tokenized_sentences = [
-        [vocab[token] for token in sentence.split()]
-        for sentence in sentences
-        if sentence
-    ]
-
-    X_train, Y_train = get_training_set(tokenized_sentences, context_size, len(vocab))
+    vocab, X_train, Y_train = _prepare_vocabulary_and_data(vocab_size, context_size)
```
🧹 Nitpick comments (1)
mo_net/samples/cbow.py (1)
89-154: Consider adding validation for vocabulary size limits.

The `Vocab` class is well-designed with proper serialization support. However, consider adding validation to prevent extremely large vocabularies that could cause memory issues.

Add this validation to `from_sentences`:

```diff
 @classmethod
 def from_sentences(cls, sentences: Collection[str], max_size: int) -> Vocab:
+    if max_size <= 0:
+        raise ValueError("max_size must be positive")
+    if max_size > 1_000_000:  # reasonable upper bound
+        raise ValueError("max_size too large, consider reducing vocabulary size")
+
     most_common_tokens = [
```
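A minimal standalone sketch of what `from_sentences` with the suggested guard might look like, using a plain dict in place of the sample's `Vocab` class (names and the whitespace tokenisation are illustrative):

```python
from collections import Counter


def build_vocab(sentences: list[str], max_size: int) -> dict[str, int]:
    """Map the max_size most frequent whitespace tokens to contiguous ids."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    counts = Counter(token for sentence in sentences for token in sentence.split())
    # most_common(n) returns tokens in descending frequency order, so the
    # resulting ids are stable for a given corpus.
    return {token: idx for idx, (token, _) in enumerate(counts.most_common(max_size))}
```

Truncating by frequency keeps the embedding matrix bounded at `max_size` rows regardless of corpus size.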
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- mo_net/samples/cbow.py (1 hunks)
- mo_net/server/app.py (3 hunks)
- mo_net/server/templates/dashboard.html (2 hunks)
- mo_net/train/backends/log.py (4 hunks)
- mo_net/train/backends/models.py (1 hunks)
- mo_net/train/run.py (1 hunks)
- pyproject.toml (2 hunks)
✅ Files skipped from review due to trivial changes (2)
- mo_net/server/app.py
- mo_net/server/templates/dashboard.html
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
🧰 Additional context used
🧬 Code Graph Analysis (2)
mo_net/train/backends/log.py (2)
- mo_net/train/run.py (2): `start_run` (54-61), `seed` (30-31)
- mo_net/train/backends/models.py (2): `DbRun` (39-102), `create` (76-102)

mo_net/samples/cbow.py (14)
- mo_net/model/layer/embedding.py (6): `Embedding` (116-271), `cache` (243-244), `Parameters` (25-102), `parameters` (247-248), `serialize` (223-230), `deserialize` (130-138)
- mo_net/log.py (2): `LogLevel` (12-18), `setup_logging` (21-23)
- mo_net/model/layer/linear.py (1): `Linear` (167-364)
- mo_net/model/layer/output.py (1): `SoftmaxOutputLayer` (55-92)
- mo_net/model/module/base.py (1): `Output` (121-188)
- mo_net/protos.py (3): `NormalisationType` (202-205), `TrainingStepHandler` (70-81), `d` (173-174)
- mo_net/resources.py (1): `get_resource` (16-47)
- mo_net/cli.py (2): `train` (313-463), `cli` (298-298)
- mo_net/config.py (1): `TrainingParameters` (8-50)
- mo_net/train/backends/log.py (5): `SqliteBackend` (118-219), `create` (17-18), `create` (67-68), `create` (130-131), `create` (230-231)
- mo_net/train/run.py (2): `TrainingRun` (6-66), `seed` (30-31)
- mo_net/train/trainer/trainer.py (1): `BasicTrainer` (85-371)
- mo_net/regulariser/weight_decay.py (2): `after_compute_update` (19-31), `attach` (45-61)
- mo_net/optimizer/base.py (1): `register_after_compute_update_handler` (99-104)
🪛 Flake8 (7.2.0)
mo_net/samples/cbow.py
[error] 185-185: whitespace before ':'
(E203)
[error] 186-186: whitespace before ':'
(E203)
🪛 Ruff (0.11.9)
mo_net/samples/cbow.py
462-462: Use a context manager for opening files
(SIM115)
541-541: Use a context manager for opening files
(SIM115)
🪛 Pylint (3.3.7)
mo_net/samples/cbow.py
[error] 22-22: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 23-23: Cannot import 'mo_net.model.layer.embedding' due to 'invalid syntax (mo_net.model.layer.embedding, line 105)'
(E0001)
[error] 23-23: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 24-24: Cannot import 'mo_net.model.layer.linear' due to 'invalid syntax (mo_net.model.layer.linear, line 156)'
(E0001)
[error] 24-24: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 25-25: No name 'layer' in module 'mo_net.model'
(E0611)
[error] 26-26: Cannot import 'mo_net.model.model' due to 'invalid syntax (mo_net.model.model, line 384)'
(E0001)
[error] 26-26: No name 'model' in module 'mo_net.model'
(E0611)
[error] 29-29: Cannot import 'mo_net.protos' due to 'invalid syntax (mo_net.protos, line 67)'
(E0001)
[error] 29-29: No name 'protos' in module 'mo_net'
(E0611)
[error] 34-34: Cannot import 'mo_net.train.trainer.trainer' due to 'invalid syntax (mo_net.train.trainer.trainer, line 42)'
(E0001)
[error] 34-39: No name 'trainer' in module 'mo_net.train.trainer'
(E0611)
[refactor] 339-339: Too many arguments (10/5)
(R0913)
[refactor] 339-339: Too many positional arguments (10/5)
(R0917)
[refactor] 339-339: Too many local variables (29/15)
(R0914)
[refactor] 452-452: Too many local variables (16/15)
(R0914)
[refactor] 462-462: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 531-531: Too many local variables (16/15)
(R0914)
[refactor] 541-541: Consider using 'with' for resource-allocating operations
(R1732)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: typecheck
🔇 Additional comments (8)
mo_net/samples/cbow.py (2)
45-87: LGTM! Well-designed regularizer following established patterns.

The `EmbeddingWeightDecayRegulariser` correctly implements L2 weight decay for embeddings, following the same pattern as `WeightDecayRegulariser` for linear layers. The implementation properly modifies gradients and computes regularization loss.
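In essence, L2 weight decay adds a lambda * W term to the parameter gradient, corresponding to a (lambda / 2) * ||W||^2 contribution to the loss. A minimal standalone sketch of that arithmetic (not the library's actual regulariser API):

```python
import numpy as np


def apply_weight_decay(
    grad: np.ndarray, weights: np.ndarray, lambda_: float
) -> tuple[np.ndarray, float]:
    """L2 weight decay: d/dW of (lambda/2) * ||W||^2 is lambda * W."""
    decayed_grad = grad + lambda_ * weights
    loss_contribution = 0.5 * lambda_ * float(np.sum(weights**2))
    return decayed_grad, loss_contribution
```

Applying this to the embedding matrix rather than the output linear layer, as the commit message describes, shrinks the learned word vectors toward the origin.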
177-197: Good implementation with proper strict checking.

The `get_training_set` function correctly uses `strict=True` for the zip operation, addressing the static analysis concern from past reviews.

mo_net/train/run.py (1)
57-57: LGTM! Correctly implements the updated backend interface.

The addition of the `name=self._name` parameter properly aligns with the updated backend `start_run` method signatures and ensures run names are persisted in the logging backend.

mo_net/train/backends/models.py (1)
79-86: LGTM! Improved API design with explicit naming.

The change from implicitly deriving the name from the seed to requiring an explicit `name` parameter improves clarity and allows for more meaningful run identification. This aligns well with the broader changes to support explicit run naming throughout the system.

mo_net/train/backends/log.py (4)
20-26: LGTM! Consistent protocol update for explicit run naming.

The `LoggingBackend` protocol correctly adds the `name` parameter to the `start_run` method signature, establishing the contract for all backend implementations.
70-77: LGTM! Proper parameter handling in CsvBackend.

The `CsvBackend` correctly accepts the new `name` parameter and appropriately marks it as unused since CSV logging doesn't utilize run names.
133-149: LGTM! Correct integration with database model.

The `SqliteBackend` properly passes the `name` parameter to `DbRun.create()`, ensuring run names are persisted in the database for tracking and identification.
233-241: LGTM! Appropriate no-op implementation.

The `NullBackend` correctly accepts the new parameter and marks it as unused, maintaining interface consistency while providing no-op behavior as expected.
modiase
left a comment
Reviewed 13 of 20 files at r1, 4 of 5 files at r2, 7 of 7 files at r5, 2 of 2 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on @modiase)
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests
Chores