# Token Counter for Markdown Files
This notebook demonstrates how to create a Python script that counts the number of tokens (words) in a Markdown file, while ignoring Markdown syntax. We'll also learn how to handle errors and log important events, errors, and warnings.

## Import Libraries
We will use the following libraries in our script:

- `sys`: For command line arguments
- `re`: For regular expressions to remove Markdown syntax
- `logging`: For logging errors and events

In [16]:
import sys
import re
import logging

## Count Tokens Function
The `count_tokens` function takes a file path as input, reads the file, removes Markdown syntax using regular expressions, and then counts the number of tokens (words) in the file.

In [17]:
def count_tokens(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
            content = re.sub(r'\!\[.*?\]\(.*?\)|\[(.*?)\]\(.*?\)|\#\#*\s|\`|\*|[\*_]{2,}|[-]{3,}|\d\.\s|\n', '', content)
            tokens = content.split()
            logging.info(f'Token count for {file_path}: {token_count}')
            return len(tokens)
    except FileNotFoundError:
        logging.error(f'File not found: {file_path}')
        print(f'Error: File not found - {file_path}')
    except Exception as e:
        logging.error(f'Error processing file: {file_path} - {str(e)}')
        print(f'Error processing file: {file_path} - {str(e)}')

## Using the Count Tokens Function
To use the `count_tokens` function, provide a file path to a Markdown file as input. The function will return the number of tokens (words) in the file, ignoring Markdown syntax.

Here's an example of how to use the function with a sample Markdown file named `NLP_intro.md`.

In [18]:
file_path = r'C:\Users\mjpa\Documents\Obsidian\20-29_Projekte\21_jPAw\21.96_MultiAgentSystem\OBJECTIVE-MAS-GITHUB-basedOnAgentLLM.md'
token_count = count_tokens(file_path)

if token_count is not None:
    print(f'The number of tokens in the file is: {token_count}')

The number of tokens in the file is: 227


## Creating a Standalone Python Script
To create a standalone Python script, follow these steps:

1. Save the code in a file named `token_counter.py`
2. Open a command prompt or terminal
3. Navigate to the directory where the `token_counter.py` file is located
4. Run the script with the following command: `python token_counter.py path/to/your/markdown_file.md`

Replace `path/to/your/markdown_file.md` with the path to the actual Markdown file you'd like to process. The script will return the number of tokens (words) in the file, ignoring Markdown syntax.

## Setting Up Logging

We have added logging to the `count_tokens` function to log errors and events. To set up logging, we use the `logging.basicConfig` function, which allows us to configure the logging settings, such as the log file name, log level, and log message format. In our example, we set the log file name to `token_counter.log`, the log level to `logging.INFO`, and the log message format to include the timestamp, log level, and message.


In [19]:
logging.basicConfig(filename='token_counter.log', level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s')

# The End
This is a Jupyter Notebook that demonstrates how to create a Python script to count tokens in a Markdown file, handle errors, and log important events.