# Code Smell Detector CLI Tool: Usage Guide

This notebook explains how to use the command-line tool `CodeSmellDetector` to analyze source code files and detect common code smells with a fine-tuned [CodeT5](https://arxiv.org/abs/2109.00859) transformer model.

---

## Project Structure Requirements

Ensure the following directory layout exists in your project:

```
goit-cp-code-smell-transformers/
├── src/
│   ├── code_smell_detector/
│   │   ├── code_smell_detector.py
│   │   └── __init__.py
│   ├── data_processing/
│   │   ├── cleaner.py
│   │   └── __init__.py
├── models/
│   └── transformers/
│       └── codet5/
│           └── codet5-base_multilabel_finetuned/
```

The model path can be customized via the `--model_path` parameter.

---

## Preprocessing

The tool internally applies code cleaning before prediction:
- Removes single-line and multi-line comments
- Normalizes whitespace
- Strips empty lines

This preprocessing improves the consistency of inference results.
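
The three cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `cleaner.py`: the function name `clean_code` is hypothetical, and the regex approach is an assumption that does not account for comment markers inside string literals (for example a URL containing `//`).

```python
import re


def clean_code(source: str) -> str:
    """Hypothetical cleaner: strip comments, normalize whitespace, drop empty lines."""
    # Remove /* ... */ block comments (non-greedy, may span several lines).
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    # Remove // comments up to the end of the line.
    source = re.sub(r"//[^\n]*", "", source)

    cleaned_lines = []
    for line in source.splitlines():
        # Collapse runs of spaces and tabs, then trim both ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:  # skip lines that are empty after cleaning
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


sample = """
/* Utility class */
public class Foo {
    // counter
    int   x = 0;

}
"""
print(clean_code(sample))
```

A production cleaner would more likely tokenize the source so that string literals are left untouched; the regex version is only meant to show the order of the steps.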

---

## Usage

### Run from the command line (macOS/Linux)

The CLI tool supports two modes of operation, configured via the `--mode` argument:

- `file`: analyze a **single file** (default mode)
- `directory`: recursively analyze **all `.java` files** in the given directory and its subdirectories

Basic invocation (assuming the project structure from above):

```bash
PYTHONPATH=src python -m src.code_smell_detector \
  --mode file \
  --code /absolute/path/to/your/SourceFile.java
```

If `code_smell_detector.py` patches `sys.path`, you can omit `PYTHONPATH`:

```bash
python -m src.code_smell_detector \
  --mode file \
  --code /absolute/path/to/your/SourceFile.java
```

Since `file` is the default mode, the `--mode` argument can be omitted:

```bash
python -m src.code_smell_detector \
  --code /absolute/path/to/your/SourceFile.java
```

#### Directory analysis mode

To analyze all `.java` files in a directory (recursively), use the `directory` mode:

```bash
python -m src.code_smell_detector \
  --mode directory \
  --code /absolute/path/to/your/java/project
```

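Optionally, append `--model_path` to any of the commands above to override the default path to the fine-tuned model:

```bash
python -m src.code_smell_detector --code /absolute/path/to/your/SourceFile.java \
  --model_path models/transformers/codet5/codet5-base_multilabel_finetuned
```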

The `--model_path` argument works the same for both `file` and `directory` modes.
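
For readers who want to see how the three arguments fit together, here is a rough `argparse` sketch of a parser with the same interface. It is not the tool's actual source: the function name `build_parser`, the constant `DEFAULT_MODEL_PATH`, the choice to make `--code` required, and the default model path value are all assumptions based on the commands shown in this guide.

```python
import argparse

# Assumed default, taken from the directory layout in this guide.
DEFAULT_MODEL_PATH = "models/transformers/codet5/codet5-base_multilabel_finetuned"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented --mode, --code and --model_path flags."""
    parser = argparse.ArgumentParser(
        prog="code_smell_detector",
        description="Detect code smells in Java source files.",
    )
    parser.add_argument(
        "--mode",
        choices=["file", "directory"],
        default="file",  # documented default mode
        help="Analyze a single file or every .java file under a directory.",
    )
    parser.add_argument(
        "--code",
        required=True,  # assumption: a target path is always needed
        help="Path to a source file (file mode) or a directory (directory mode).",
    )
    parser.add_argument(
        "--model_path",
        default=DEFAULT_MODEL_PATH,
        help="Path to the fine-tuned model directory.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--code", "/absolute/path/to/your/SourceFile.java"])
print(args.mode, args.code, args.model_path)
```

Restricting `--mode` with `choices` means an unsupported value fails early with a usage message rather than at prediction time.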

---

## Output

### `file` mode

When analyzing a single `.java` file, the tool prints a header and a single line with the prediction, enriched with an emoji for quick visual assessment:

```text
Predicted code smells:
🟢 /absolute/path/to/your/SourceFile.java: Clean
```

If the model predicts one or more code smells, the output may look like this:

```text
Predicted code smells:
🟡 /absolute/path/to/your/SourceFile.java: Long Method, Feature Envy
```

- `🟢`: the file is classified as **Clean** (no code smells detected).
- `🟡`: **at least one code smell** has been detected.
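
The output line format described above could be produced by a helper along these lines. This is only an illustrative sketch: the function name `format_prediction` and the convention of passing labels as a list of strings are assumptions, not the tool's confirmed internals.

```python
def format_prediction(path: str, labels: list[str]) -> str:
    """Hypothetical formatter: emoji, file path, then comma-separated labels."""
    # Treat an empty list or a lone "Clean" label as "no smells detected".
    smells = [label for label in labels if label != "Clean"]
    if smells:
        return f"🟡 {path}: {', '.join(smells)}"
    return f"🟢 {path}: Clean"


print("Predicted code smells:")
print(format_prediction("/absolute/path/to/your/SourceFile.java", ["Long Method", "Feature Envy"]))
print(format_prediction("/absolute/path/to/your/SourceFile.java", []))
```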

### `directory` mode

In `directory` mode, the tool first prints the number of discovered `.java` files:

```text
Found 3 .java file(s) in '/absolute/path/to/your/java/project'. Running predictions...
```

Then it prints one line per file:

```text
🟡 /absolute/path/to/your/java/project/Foo.java: Long Method
🟡 /absolute/path/to/your/java/project/Bar.java: God/Large Class
🟢 /absolute/path/to/your/java/project/baz/Baz.java: Clean
```

If no `.java` files are found in the directory (including subdirectories), the following message is shown:

```text
No .java files found in directory (including subdirectories): /absolute/path/to/your/java/project
```
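
The recursive discovery step and the two status messages shown above can be approximated with `pathlib`. The helper names `find_java_files` and `describe_discovery` are hypothetical, and the real tool may walk the directory differently (for example with `os.walk`).

```python
import tempfile
from pathlib import Path


def find_java_files(root: str) -> list[Path]:
    """Recursively collect .java files under root (hypothetical helper)."""
    return sorted(p for p in Path(root).rglob("*.java") if p.is_file())


def describe_discovery(root: str, files: list[Path]) -> str:
    """Build the status message printed before predictions start."""
    if not files:
        return f"No .java files found in directory (including subdirectories): {root}"
    return f"Found {len(files)} .java file(s) in '{root}'. Running predictions..."


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "baz").mkdir()
    (Path(tmp) / "Foo.java").write_text("class Foo {}")
    (Path(tmp) / "baz" / "Baz.java").write_text("class Baz {}")
    (Path(tmp) / "notes.txt").write_text("ignored")  # not a .java file
    files = find_java_files(tmp)
    demo_names = [p.name for p in files]
    demo_message = describe_discovery(tmp, files)
    print(demo_message)
```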

In both modes, each result line consists of the emoji, the full file path, and a **comma-separated list of the predicted code smell categories**, which may include:

- `Long Method`
- `God/Large Class`
- `Feature Envy`
- `Data Class`
- or `Clean` (if no smells are detected)

---

## Notes

- The tool is designed for **Java-like syntax**, but can be extended to support other languages.
- The model was fine-tuned using a **multi-label classification formulation** with natural language prompts:
  > `"detect code smell: {code snippet}"`
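
To make the prompt format and the multi-label formulation concrete, here is a small sketch of how an input could be built and how per-label scores could be turned into the categories listed in this guide. Only the prompt prefix comes from this guide. The label order, the sigmoid decoding, the `0.5` threshold, and the function names `build_prompt` and `decode_multilabel` are assumptions for illustration, and the model itself is not loaded here.

```python
import math

# Assumed label order; the real model's label mapping may differ.
LABELS = ["Long Method", "God/Large Class", "Feature Envy", "Data Class"]


def build_prompt(code: str) -> str:
    """Prefix the (cleaned) code with the documented natural language prompt."""
    return f"detect code smell: {code}"


def decode_multilabel(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Typical multi-label decoding: independent sigmoid per label, then a threshold."""
    probabilities = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    picked = [label for label, p in zip(LABELS, probabilities) if p >= threshold]
    # No label above the threshold is reported as "Clean".
    return picked or ["Clean"]


print(build_prompt("class Foo {}"))
print(decode_multilabel([2.0, -3.0, 0.4, -1.0]))
print(decode_multilabel([-5.0, -5.0, -5.0, -5.0]))
```

Because each label is scored independently, a file can receive several smells at once, which matches the comma-separated output shown earlier.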

---

Feel free to run inference on different files and explore predictions!