# Glob Module
The glob module in Python is a powerful tool for file and directory pattern matching. It allows you to search for files and directories using wildcard characters, making it easy to find and manipulate multiple files at once.

## Key Features of glob

1. **Pattern Matching**: glob uses Unix shell-style wildcards for pattern matching[1][2].
   - `*`: Matches any number of characters
   - `?`: Matches any single character
   - `[...]`: Matches any character within the brackets

2. **Recursive Searching**: With Python 3.5+, you can use `**` to search recursively through directories[3].

3. **Built-in Module**: glob is part of Python's standard library, so no additional installation is required[1][4].

## Searching with glob

To use glob for searching, you typically follow these steps:

1. Import the module: `import glob`
2. Use the `glob.glob()` function with a pattern to find matching files or directories.

For example, to find all Python files in the current directory:

```python
import glob
python_files = glob.glob('*.py')
```

## Searching in the Working Directory

Yes, glob can search files in the working directory. By default, if you don't specify a path, glob searches in the current working directory[5].

## Searching Jupyter Notebooks

Jupyter notebooks have the `.ipynb` extension, so you can search for them using glob:

```python
jupyter_notebooks = glob.glob('*.ipynb')
```

## Searching File Contents

While glob is primarily used for filename matching, it can be combined with file reading operations to search within files, including Jupyter notebooks[6]. Here's an example:

```python
import glob
import json

search_term = 'your_search_term'
notebooks = glob.glob('*.ipynb')

for notebook in notebooks:
    with open(notebook, 'r') as f:
        nb_content = json.load(f)
        for cell in nb_content['cells']:
            if cell['cell_type'] == 'code':
                if search_term in ''.join(cell['source']):
                    print(f"Found '{search_term}' in {notebook}")
                    break
```

This script searches for a specific term within the code cells of Jupyter notebooks in the current directory.

Remember that while glob is powerful for file matching, it's not designed for complex text searching within files. For more advanced text searching, you might want to consider using regular expressions or specialized text search libraries[7].

Citations:
[1] https://www.scaler.com/topics/glob-in-python/
[2] https://www.geeksforgeeks.org/how-to-use-glob-function-to-find-files-recursively-in-python/
[3] https://pynative.com/python-glob/
[4] https://www.boardinfinity.com/blog/glob-in-python-2/
[5] https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-glob/tutorial.html
[6] https://builtin.com/software-engineering-perspectives/glob-in-python
[7] https://favtutor.com/blogs/glob-python
[8] https://www.youtube.com/watch?v=WHk96pueSK0
[9] https://pythonology.eu/glob-module-file-searching-in-python/

Yes, you can definitely create a search function to accomplish what you're looking for. The combination of glob and regular expressions is indeed a great approach for this task. Based on the search results, I can provide you with a solution that searches through all Jupyter notebooks in your sabbatical folder, including subfolders, and allows you to search for phrases in headers.

Here's a Python function that does this:

```python
import os
import glob
import json
import re

def search_notebooks_for_headers(base_dir, search_phrase):
    """
    Search all Jupyter notebooks in the given base directory and its subdirectories
    for a specific phrase in markdown headers.

    Args:
        base_dir (str): The base directory to search.
        search_phrase (str): The phrase to search for in markdown headers.

    Returns:
        dict: A dictionary where keys are notebook file paths and values are lists of matching headers.
    """
    matching_headers = {}

    # Use glob to find all Jupyter notebooks recursively
    notebook_files = glob.glob(os.path.join(base_dir, '**', '*.ipynb'), recursive=True)

    for notebook in notebook_files:
        with open(notebook, 'r', encoding='utf-8') as f:
            try:
                notebook_content = json.load(f)
                for cell in notebook_content.get('cells', []):
                    if cell.get('cell_type') == 'markdown':
                        # Combine the source lines into a single string
                        markdown_content = ''.join(cell.get('source', []))
                        # Use regex to find headers (lines starting with #)
                        headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)
                        # Check if the search phrase is in any of the headers
                        matching = [header for header in headers if search_phrase in header]
                        if matching:
                            if notebook not in matching_headers:
                                matching_headers[notebook] = []
                            matching_headers[notebook].extend(matching)
            except json.JSONDecodeError:
                print(f"Error reading {notebook}. Skipping.")

    return matching_headers

# Example usage
base_directory = "sabbat"  # Replace with the path to your base directory
search_term = "example"  # Replace with the phrase you want to search for
results = search_notebooks_for_headers(base_directory, search_term)

# Print results
for notebook, headers in results.items():
    print(f"Notebook: {notebook}")
    for header in headers:
        print(f"  Header: {header}")
```

This function does the following:

1. It uses glob to recursively find all `.ipynb` files in the specified base directory and its subdirectories.
2. For each notebook, it reads the content and parses it as JSON.
3. It then searches through markdown cells for headers (lines starting with #).
4. If the search phrase is found in a header, it adds the notebook and the matching header to the results.

To use this function:

1. Place this code in a Jupyter notebook in one of your folders.
2. Replace `"sabbat"` with the actual path to your main sabbatical folder.
3. Replace `"example"` with the phrase you want to search for in headers.
4. Run the cell, and it will print out all notebooks and headers that contain your search phrase.

This approach allows you to search for phrases in headers across all your notebooks efficiently. If you want to search in all text, not just headers, you can modify the regex pattern and remove the header-specific filtering.

Citations:
[1] https://www.youtube.com/watch?v=p_75ljGkYnM
[2] https://builtin.com/software-engineering-perspectives/glob-in-python
[3] https://stackoverflow.com/questions/50255553/search-jupyter-notebook-markdown-cells-from-command-line
[4] https://discourse.jupyter.org/t/search-for-text-in-the-output-of-a-jupyter-cell/1658
[5] https://stackoverflow.com/questions/13031989/regular-expression-usage-in-glob-glob
[6] https://www.datacamp.com/tutorial/markdown-in-jupyter-notebook
[7] https://www.youtube.com/watch?v=JtTEVfC1vwM

# Working Doc Header Search

In [4]:
import os
import glob
import json
import re

def search_notebooks_for_headers(base_dir, search_phrase, case_sensitive=True):
    """
    Search all Jupyter notebooks in the given base directory and its subdirectories
    for a specific phrase in markdown headers.

    Args:
        base_dir (str): The base directory to search.
        search_phrase (str): The phrase to search for in markdown headers.
        case_sensitive (bool): Whether the search should be case-sensitive.

    Returns:
        dict: A dictionary where keys are notebook file paths and values are lists of matching headers.
    """
    matching_headers = {}

    # Use glob to find all Jupyter notebooks recursively
    notebook_files = glob.glob(os.path.join(base_dir, '**', '*.ipynb'), recursive=True)

    for notebook in notebook_files:
        with open(notebook, 'r', encoding='utf-8') as f:
            try:
                notebook_content = json.load(f)
                for cell in notebook_content.get('cells', []):
                    if cell.get('cell_type') == 'markdown':
                        # Combine the source lines into a single string
                        markdown_content = ''.join(cell.get('source', []))
                        # Use regex to find headers (lines starting with #)
                        headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)
                        # Check if the search phrase is in any of the headers
                        if case_sensitive:
                            matching = [header for header in headers if search_phrase in header]
                        else:
                            matching = [header for header in headers if search_phrase.lower() in header.lower()]
                        if matching:
                            if notebook not in matching_headers:
                                matching_headers[notebook] = []
                            matching_headers[notebook].extend(matching)
            except json.JSONDecodeError:
                print(f"Error reading {notebook}. Skipping.")

    return matching_headers


###
def search_with_toggle(base_dir):
    search_phrase = input("Enter the header you would like to search for: ")
    case_sensitive = input("Should the search be case sensitive? (yes/no): ").strip().lower() == 'yes'

    results = search_notebooks_for_headers(base_dir, search_phrase, case_sensitive)

    # Print results
    for notebook, headers in results.items():
        print(f"Notebook: {notebook}")
        for header in headers:
            print(f"  Header: {header}")
###
# Example usage
#base_directory = "sabbat"  # Replace with the path to your base directory
base_directory = os.path.expanduser("~/sabbat")
#search_term = "lists"  # Replace with the phrase you want to search for
#search_term = input("Enter the header you would like to search for")

search_with_toggle(base_directory)

#results = search_notebooks_for_headers(base_directory, search_term)
'''results = search_notebooks_for_headers(base_directory, search_term)

# Print results
for notebook, headers in results.items():
    print(f"Notebook: {notebook}")
    for header in headers:
        print(f"  Header: {header}")
'''

Enter the header you would like to search for:  print
Should the search be case sensitive? (yes/no):  no


Notebook: /home/rebelford/sabbat/2019OLCC/lecture06-molecular-similarity.ipynb
  Header: ## 1. Fingerprint Generation
  Header: ### 1-(2) Circular Fingerprints
  Header: ### 1-(3) Path-Based Fingeprints
  Header: ### 1-(4) PubChem Fingerprint
Notebook: /home/rebelford/sabbat/2019OLCC/lecture08-machine-learning.ipynb
  Header: ## 6. Merge activity data and fingerprint information
Notebook: /home/rebelford/sabbat/Weiss/SciCompChemNotebooks/chapter_08/chap_08_notebook.ipynb
  Header: ### 8.1.2 Pretty Printing
Notebook: /home/rebelford/sabbat/Lin_Engin-Sci/bob/pp18-19_EngPy.ipynb
  Header: # Print the names of the objects without their contents
Notebook: /home/rebelford/sabbat/Lin_Engin-Sci/bob/ppEngPyB3_9-10.ipynb
  Header: # Print the list of files
  Header: # Print the list of .ipynb files
  Header: # Print the list of .ipynb files
Notebook: /home/rebelford/sabbat/Lin_Engin-Sci/bob/ppEngPyB4_12.ipynb
  Header: # Print the generated arrays
  Header: # Print the generated arrays
Notebook:

'results = search_notebooks_for_headers(base_directory, search_term)\n\n# Print results\nfor notebook, headers in results.items():\n    print(f"Notebook: {notebook}")\n    for header in headers:\n        print(f"  Header: {header}")\n'

In [None]:
import os
import glob
import json
import re
import os

def search_notebooks_for_headers(base_dir, search_phrase, case_sensitive=True):
    """
    Search all Jupyter notebooks in the given base directory and its subdirectories
    for a specific phrase in markdown headers.

    Args:
        base_dir (str): The base directory to search.
        search_phrase (str): The phrase to search for in markdown headers.
        case_sensitive (bool): Whether the search should be case-sensitive.

    Returns:
        dict: A dictionary where keys are notebook file paths and values are lists of matching headers.
    """
    matching_headers = {}

    # Use glob to find all Jupyter notebooks recursively
    notebook_files = glob.glob(os.path.join(base_dir, '**', '*.ipynb'), recursive=True)

    for notebook in notebook_files:
        with open(notebook, 'r', encoding='utf-8') as f:
            try:
                notebook_content = json.load(f)
                for cell in notebook_content.get('cells', []):
                    if cell.get('cell_type') == 'markdown':
                        # Combine the source lines into a single string
                        markdown_content = ''.join(cell.get('source', []))
                        # Use regex to find headers (lines starting with #)
                        headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)
                        # Check if the search phrase is in any of the headers
                        if case_sensitive:
                            matching = [header for header in headers if search_phrase in header]
                        else:
                            matching = [header for header in headers if search_phrase.lower() in header.lower()]
                        if matching:
                            if notebook not in matching_headers:
                                matching_headers[notebook] = []
                            matching_headers[notebook].extend(matching)
            except json.JSONDecodeError:
                print(f"Error reading {notebook}. Skipping.")

    return matching_headers


In [None]:
import sys
import os
sabbat_path = os.path.join(os.path.expanduser('~'), 'sabbat')
sys.path.append(sabbat_path)
import my_fun

my_fun.search_headers()
#my_fun.rgb_to_color_name(.2,.4,.2)


I'm thrilled to hear that it works! You're very welcome. It's great that we were able to resolve the issue and get your `search_headers` function working correctly within your `my_fun.py` module.

This solution allows you to:
1. Keep your code organized in a reusable module
2. Easily import and use the `search_headers` function in any of your Jupyter notebooks
3. Maintain a clean structure with imports at the top of the file

Remember, you can now use this function in any notebook by importing it like this:

```python
import sys
import os
sabbat_path = os.path.join(os.path.expanduser('~'), 'sabbat')
sys.path.append(sabbat_path)
import my_fun

my_fun.search_headers()
```

If you need to search in a different directory, you can also pass a custom path:

```python
my_fun.search_headers("/path/to/custom/directory")
```

This modular approach will make your work more efficient and your code more maintainable as you continue with your sabbatical project. If you have any more questions or need further assistance, don't hesitate to ask!

# my_fun.search.headers()
Explain the code
### code
def search_notebooks_for_headers(base_dir, search_phrase, case_sensitive=True):
    matching_headers = {}
    notebook_files = glob.glob(os.path.join(base_dir, '**', '*.ipynb'), recursive=True)

    for notebook in notebook_files:
        with open(notebook, 'r', encoding='utf-8') as f:
            try:
                notebook_content = json.load(f)
                for cell in notebook_content.get('cells', []):
                    if cell.get('cell_type') == 'markdown':
                        markdown_content = ''.join(cell.get('source', []))
                        headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)
                        if case_sensitive:
                            matching = [header for header in headers if search_phrase in header]
                        else:
                            matching = [header for header in headers if search_phrase.lower() in header.lower()]
                        if matching:
                            if notebook not in matching_headers:
                                matching_headers[notebook] = []
                            matching_headers[notebook].extend(matching)
            except json.JSONDecodeError:
                print(f"Error reading {notebook}. Skipping.")

    return matching_headers

Certainly! I'd be happy to provide an overview of what happens when you call `my_fun.search_headers()`. Here's a high-level explanation of the process:

1. The `search_headers()` function is called.

2. It sets up the base directory (default is ~/sabbat) and prompts the user for:
   - A search phrase
   - Whether the search should be case-sensitive

3. It then calls the `search_notebooks_for_headers()` function with these parameters.

4. `search_notebooks_for_headers()` does the following:
   - Uses `glob` to find all `.ipynb` files in the base directory and its subdirectories.
   - For each notebook file:
     - Opens the file and reads its content.
     - Uses `json` to parse the notebook content (Jupyter notebooks are stored as JSON files).
     - Looks for markdown cells within the notebook.
     - Within markdown cells, it searches for headers (lines starting with #).
     - If it finds headers matching the search phrase, it stores them.

5. The function returns a dictionary of matching headers for each notebook.

6. `search_headers()` then prints the results, showing each notebook path and the matching headers found.

The role of JSON in this process is crucial because Jupyter notebooks are stored as JSON files. JSON (JavaScript Object Notation) is a lightweight data interchange format that's easy for humans to read and write and easy for machines to parse and generate. In a Jupyter notebook:

- Each cell (code, markdown, etc.) is represented as a JSON object.
- The entire notebook structure, including metadata, is stored in JSON format.

By using `json.load()`, we're able to read and parse the notebook file, allowing us to access its structure and content programmatically. This enables us to search through the markdown cells for headers without having to manually parse the raw file content.

This approach allows for efficient searching across multiple notebooks, treating each notebook as a structured document rather than just a text file.

I apologize for the confusion. You're right to question this, as the dictionary isn't explicitly shown in the output. Let's clarify where the dictionary is created and how you can access it.

In the `search_notebooks_for_headers` function, a dictionary called `matching_headers` is created:

```python
def search_notebooks_for_headers(base_dir, search_phrase, case_sensitive=True):
    matching_headers = {}  # This is where the dictionary is created
    # ... rest of the function
```

This dictionary is populated as the function searches through notebooks. The keys are the notebook file paths, and the values are lists of matching headers found in each notebook.

However, in the current implementation, this dictionary is not directly accessible outside the function. The `search_headers` function calls `search_notebooks_for_headers`, gets the results, and immediately prints them without saving the dictionary for further use.

To make this dictionary accessible, we can modify the `search_headers` function to return the results instead of just printing them. Here's how you can change it:

```python
def search_headers(base_directory=None):
    if base_directory is None:
        base_directory = os.path.expanduser("~/sabbat")
    search_phrase = input("Enter the header you would like to search for: ")
    case_sensitive = input("Should the search be case sensitive? (yes/no): ").strip().lower() == 'yes'

    results = search_notebooks_for_headers(base_directory, search_phrase, case_sensitive)

    # Print results (you can keep this if you want)
    for notebook, headers in results.items():
        print(f"Notebook: {notebook}")
        for header in headers:
            print(f"  Header: {header}")

    return results  # Return the dictionary

# Usage:
results_dict = my_fun.search_headers()
```

Now, `results_dict` will contain the dictionary with notebooks as keys and lists of matching headers as values. You can explore it like this:

```python
# Print all notebook paths
print(results_dict.keys())

# Print headers for a specific notebook
print(results_dict['/path/to/specific/notebook.ipynb'])

# Count total matches
total_matches = sum(len(headers) for headers in results_dict.values())
print(f"Total matches: {total_matches}")

# Print all results in a different format
for notebook, headers in results_dict.items():
    print(f"Notebook: {notebook}")
    print(f"Matching headers: {headers}\n")
```

This modification allows you to work with the results programmatically after the search is complete, giving you more flexibility in how you use and analyze the search results.

Certainly! I'll go through the code line by line and explain each step:

```python
def search_notebooks_for_headers(base_dir, search_phrase, case_sensitive=True):
```
This line defines a function that takes three parameters: the base directory to search, the phrase to search for, and whether the search should be case-sensitive (default is True).

```python
    matching_headers = {}
```
Initialize an empty dictionary to store the matching headers for each notebook.

```python
    notebook_files = glob.glob(os.path.join(base_dir, '**', '*.ipynb'), recursive=True)
```
Use glob to find all .ipynb files in the base directory and its subdirectories. The '**' pattern allows for recursive search.

```python
    for notebook in notebook_files:
```
Start a loop that iterates over each notebook file found.

```python
        with open(notebook, 'r', encoding='utf-8') as f:
```
Open the current notebook file in read mode with UTF-8 encoding.

```python
            try:
```
Start a try-except block to handle potential errors when reading the file.

```python
                notebook_content = json.load(f)
```
Parse the JSON content of the notebook file.

```python
                for cell in notebook_content.get('cells', []):
```
Iterate over each cell in the notebook. If 'cells' doesn't exist, use an empty list.

```python
                    if cell.get('cell_type') == 'markdown':
```
Check if the current cell is a markdown cell.

```python
                        markdown_content = ''.join(cell.get('source', []))
```
Join all the source lines of the markdown cell into a single string.

```python
                        headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)
```
Use regex to find all lines that start with one or more '#' followed by a space, which indicates a header in markdown.

```python
                        if case_sensitive:
                            matching = [header for header in headers if search_phrase in header]
                        else:
                            matching = [header for header in headers if search_phrase.lower() in header.lower()]
```
Create a list of headers that contain the search phrase, respecting the case sensitivity setting.

```python
                        if matching:
                            if notebook not in matching_headers:
                                matching_headers[notebook] = []
                            matching_headers[notebook].extend(matching)
```
If there are matching headers, add them to the matching_headers dictionary under the current notebook's key.

```python
            except json.JSONDecodeError:
                print(f"Error reading {notebook}. Skipping.")
```
If there's an error parsing the JSON of a notebook, print an error message and skip that notebook.

```python
    return matching_headers
```
After processing all notebooks, return the dictionary of matching headers.

```python
def search_headers(base_directory=None):
```
Define the main search function with an optional base directory parameter.

```python
    if base_directory is None:
        base_directory = os.path.expanduser("~/sabbat")
```
If no base directory is provided, use "~/sabbat" as the default.

```python
    search_phrase = input("Enter the header you would like to search for: ")
    case_sensitive = input("Should the search be case sensitive? (yes/no): ").strip().lower() == 'yes'
```
Prompt the user for the search phrase and whether the search should be case-sensitive.

```python
    results = search_notebooks_for_headers(base_directory, search_phrase, case_sensitive)
```
Call the search_notebooks_for_headers function with the provided parameters.

```python
    for notebook, headers in results.items():
        print(f"Notebook: {notebook}")
        for header in headers:
            print(f"  Header: {header}")
```
Print the results, showing each notebook path and its matching headers.

This code provides a comprehensive search functionality for Jupyter notebooks, allowing users to find specific headers across multiple notebooks efficiently.

Can we look at this line here:
                        markdown_content = ''.join(cell.get('source', []))
What type of variable is markdown_content? I assume it is a python list, but am confused as to how it keeps both the notebook identity and the header content.  It seems like we are bringing  everything into one list object and than operating on that.  I also do not understand ;source.




Excellent question! Let's break down this line and clarify some misconceptions:

```python
markdown_content = ''.join(cell.get('source', []))
```

1. `markdown_content` is actually a string, not a list. 

2. `cell.get('source', [])` is accessing the 'source' key of the `cell` dictionary. In Jupyter notebook JSON structure, 'source' typically contains the content of the cell as a list of strings, where each string represents a line in the cell.

3. If 'source' doesn't exist in the cell dictionary, `get()` returns an empty list `[]` as a default value.

4. `''.join(...)` is a string method that concatenates all the elements of an iterable (in this case, the list returned by `get()`) into a single string. The empty string `''` is used as the separator between elements.

So, this line is not keeping the notebook identity or header content separately. It's simply combining all the lines of a single markdown cell into one string.

The notebook identity is maintained because this operation is happening inside a loop that's processing one notebook at a time. The `notebook` variable in the outer loop keeps track of which file we're currently processing.

As for headers, they haven't been identified yet at this point. The next line uses a regular expression to find headers within this `markdown_content` string:

```python
headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)
```

This line is what actually extracts the headers from the markdown content.

So, to summarize:
- We're not bringing everything into one list object.
- We're processing one notebook at a time, and within each notebook, one cell at a time.
- For each markdown cell, we're combining its content into a single string for easier processing.
- We then search this string for headers.

The structure and separation of notebooks and their contents are maintained by the nested loops and the dictionary structure of `matching_headers`.

In [None]:
import sys
import os

sabbat_path = os.path.join(os.path.expanduser('~'), 'sabbat')
sys.path.append(sabbat_path)
import my_fun
my_fun.search_headers()

# Notebook colab search
###
def search_with_toggle(base_dir):
    search_phrase = input("Enter the header you would like to search for: ")
    case_sensitive = input("Should the search be case sensitive? (yes/no): ").strip().lower() == 'yes'

    results = search_notebooks_for_headers(base_dir, search_phrase, case_sensitive)

    # Print results
    for notebook, headers in results.items():
        print(f"Notebook: {notebook}")
        for header in headers:
            print(f"  Header: {header}")
###
# Example usage
#base_directory = "sabbat"  # Replace with the path to your base directory
base_directory = os.path.expanduser("~/sabbat")
#search_term = "lists"  # Replace with the phrase you want to search for
#search_term = input("Enter the header you would like to search for")

search_with_toggle(base_directory)

#results = search_notebooks_for_headers(base_directory, search_term)
'''results = search_notebooks_for_headers(base_directory, search_term)

# Print results
for notebook, headers in results.items():
    print(f"Notebook: {notebook}")
    for header in headers:
        print(f"  Header: {header}")
'''

In [5]:
import os
import glob
import json
import re

def search_notebooks_for_headers(base_dir, search_phrase, case_sensitive=True):
    matching_headers = {}
    notebook_files = glob.glob(os.path.join(base_dir, '**', '*.ipynb'), recursive=True)

    for notebook in notebook_files:
        with open(notebook, 'r', encoding='utf-8') as f:
            try:
                notebook_content = json.load(f)
                for cell in notebook_content.get('cells', []):
                    if cell.get('cell_type') == 'markdown':
                        markdown_content = ''.join(cell.get('source', []))
                        #debugging output
                        print(f"Markdown content in {notebook}: {markdown_content}")

                        headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)
                        
                        # Debugging output: print all found headers
                        print(f"Found headers in {notebook}: {headers}")

                        if case_sensitive:
                            matching = [header for header in headers if search_phrase in header]
                        else:
                            matching = [header for header in headers if search_phrase.lower() in header.lower()]
                        
                        if matching:
                            if notebook not in matching_headers:
                                matching_headers[notebook] = []
                            matching_headers[notebook].extend(matching)
            except json.JSONDecodeError:
                print(f"Error reading {notebook}. Skipping.")

    return matching_headers


def search_with_toggle(base_directory=None):
    if base_directory is None:
        base_directory = input("Directory to be searched: ")
    
    search_phrase = input("Enter the header you would like to search for: ")
    case_sensitive_input = input("Should the search be case sensitive? (y/n): ").strip().lower()
    
    case_sensitive = case_sensitive_input == 'y'

    results = search_notebooks_for_headers(base_directory, search_phrase, case_sensitive)

    if results:
        for notebook, headers in results.items():
            print(f"Notebook: {notebook}")
            for header in headers:
                print(f"  Header: {header}")
    else:
        print("No matching headers found.")


# Example usage
search_with_toggle()



Directory to be searched:  sabbat
Enter the header you would like to search for:  heap
Should the search be case sensitive? (y/n):  n


No matching headers found.


In [6]:
def search_notebooks_for_headers(base_dir, search_phrase, case_sensitive=True):
    matching_headers = {}
    notebook_files = glob.glob(os.path.join(base_dir, '**', '*.ipynb'), recursive=True)

    for notebook in notebook_files:
        with open(notebook, 'r', encoding='utf-8') as f:
            try:
                notebook_content = json.load(f)
                
                # Debugging output: print the entire notebook content
                print(f"Processing notebook: {notebook}")
                
                # Check cell types
                for cell in notebook_content.get('cells', []):
                    print(f"Cell type: {cell.get('cell_type')}")
                    
                    if cell.get('cell_type') == 'markdown':
                        markdown_content = ''.join(cell.get('source', []))
                        
                        # Debugging output: print the markdown content being processed
                        print(f"Markdown content in {notebook}: {markdown_content}")

                        # Use regex to find headers (lines starting with #)
                        headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)

                        # Debugging output: print all found headers
                        print(f"Found headers in {notebook}: {headers}")

                        if case_sensitive:
                            matching = [header for header in headers if search_phrase in header]
                        else:
                            matching = [header for header in headers if search_phrase.lower() in header.lower()]
                        
                        if matching:
                            if notebook not in matching_headers:
                                matching_headers[notebook] = []
                            matching_headers[notebook].extend(matching)
            except json.JSONDecodeError:
                print(f"Error reading {notebook}. Skipping.")

    return matching_headers


In [9]:
import os
import glob
import json
import re

def search_notebooks_for_headers(base_dir, search_phrase, case_sensitive=True):
    matching_headers = {}
    notebook_files = glob.glob(os.path.join(base_dir, '**', '*.ipynb'), recursive=True)

    for notebook in notebook_files:
        with open(notebook, 'r', encoding='utf-8') as f:
            try:
                notebook_content = json.load(f)
                for cell in notebook_content.get('cells', []):
                    if cell.get('cell_type') == 'markdown':
                        markdown_content = ''.join(cell.get('source', []))
                        headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)
                        
                        if case_sensitive:
                            matching = [header for header in headers if search_phrase in header]
                        else:
                            matching = [header for header in headers if search_phrase.lower() in header.lower()]
                        
                        if matching:
                            if notebook not in matching_headers:
                                matching_headers[notebook] = []
                            matching_headers[notebook].extend(matching)
            except json.JSONDecodeError:
                print(f"Error reading {notebook}. Skipping.")

    return matching_headers


def search_with_toggle():
    # Get user input for the directory to be searched
    directory_searched = input("Enter the directory to be searched (default is ~/sabbat): ") or "~/sabbat"
    
    # Expand the user directory
    base_directory = os.path.expanduser(directory_searched)

    # Get user input for search phrase and case sensitivity
    search_phrase = input("Enter the header you would like to search for: ")
    case_sensitive_input = input("Should the search be case sensitive? (y/n): ").strip().lower()
    
    case_sensitive = case_sensitive_input == 'y'

    # Call the search function with user inputs
    results = search_notebooks_for_headers(base_directory, search_phrase, case_sensitive)

    # Print results
    if results:
        for notebook, headers in results.items():
            print(f"Notebook: {notebook}")
            for header in headers:
                print(f"  Header: {header}")
    else:
        print("No matching headers found.")


# Example usage
search_with_toggle()


Enter the directory to be searched (default is ~/sabbat):  ~/sabbat
Enter the header you would like to search for:  heap
Should the search be case sensitive? (y/n):  n


Notebook: /home/rebelford/sabbat/ClassNoteBooks/pp01_gPCHardSoftware.ipynb
  Header: # memory heap
  Header: # HEAP memory block
  Header: ### **1. Variable Assignment and Heap Memory**
  Header: # Chips HEAP integers


In [10]:
import os
import glob
import json
import re

def search_notebooks_for_headers(base_dir, search_phrase, case_sensitive=True):
    matching_headers = {}
    notebook_files = glob.glob(os.path.join(base_dir, '**', '*.ipynb'), recursive=True)

    for notebook in notebook_files:
        with open(notebook, 'r', encoding='utf-8') as f:
            try:
                notebook_content = json.load(f)
                for cell in notebook_content.get('cells', []):
                    if cell.get('cell_type') == 'markdown':
                        markdown_content = ''.join(cell.get('source', []))
                        headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)
                        
                        if case_sensitive:
                            matching = [header for header in headers if search_phrase in header]
                        else:
                            matching = [header for header in headers if search_phrase.lower() in header.lower()]
                        
                        if matching:
                            if notebook not in matching_headers:
                                matching_headers[notebook] = []
                            matching_headers[notebook].extend(matching)
            except json.JSONDecodeError:
                print(f"Error reading {notebook}. Skipping.")

    return matching_headers


def search_with_toggle():
    # Get user input for the directory to be searched
    directory_searched = input("Enter the directory to be searched (default is ~/sabbat): ") or "~/sabbat"
    
    # Expand the user directory
    base_directory = os.path.expanduser(directory_searched)

    # If the input was just 'sabbat', convert it to an absolute path
    if os.path.basename(directory_searched) == 'sabbat' and not os.path.isabs(directory_searched):
        base_directory = os.path.join(os.path.expanduser("~"), 'sabbat')

    # Get user input for search phrase and case sensitivity
    search_phrase = input("Enter the header you would like to search for: ")
    case_sensitive_input = input("Should the search be case sensitive? (y/n): ").strip().lower()
    
    case_sensitive = case_sensitive_input == 'y'

    # Call the search function with user inputs
    results = search_notebooks_for_headers(base_directory, search_phrase, case_sensitive)

    # Print results
    if results:
        for notebook, headers in results.items():
            print(f"Notebook: {notebook}")
            for header in headers:
                print(f"  Header: {header}")
    else:
        print("No matching headers found.")


# Example usage
search_with_toggle()



Enter the directory to be searched (default is ~/sabbat):  sabbat
Enter the header you would like to search for:  heap
Should the search be case sensitive? (y/n):  n


Notebook: /home/rebelford/sabbat/ClassNoteBooks/pp01_gPCHardSoftware.ipynb
  Header: # memory heap
  Header: # HEAP memory block
  Header: ### **1. Variable Assignment and Heap Memory**
  Header: # Chips HEAP integers


In [11]:
import os
import glob
import json
import re

def search_notebooks_for_headers(base_dir, search_phrase, case_sensitive=True):
    """
    Search all Jupyter notebooks in the given base directory and its subdirectories
    for a specific phrase in markdown headers.

    Args:
        base_dir (str): The base directory to search.
        search_phrase (str): The phrase to search for in markdown headers.
        case_sensitive (bool): Whether the search should be case-sensitive.

    Returns:
        dict: A dictionary where keys are notebook file paths and values are lists of matching headers.
    """
    matching_headers = {}

    # Use glob to find all Jupyter notebooks recursively
    notebook_files = glob.glob(os.path.join(base_dir, '**', '*.ipynb'), recursive=True)

    for notebook in notebook_files:
        with open(notebook, 'r', encoding='utf-8') as f:
            try:
                notebook_content = json.load(f)
                for cell in notebook_content.get('cells', []):
                    if cell.get('cell_type') == 'markdown':
                        # Combine the source lines into a single string
                        markdown_content = ''.join(cell.get('source', []))
                        # Use regex to find headers (lines starting with #)
                        headers = re.findall(r'^(#+\s.*)', markdown_content, re.MULTILINE)
                        # Check if the search phrase is in any of the headers
                        if case_sensitive:
                            matching = [header for header in headers if search_phrase in header]
                        else:
                            matching = [header for header in headers if search_phrase.lower() in header.lower()]
                        if matching:
                            if notebook not in matching_headers:
                                matching_headers[notebook] = []
                            matching_headers[notebook].extend(matching)
            except json.JSONDecodeError:
                print(f"Error reading {notebook}. Skipping.")

    return matching_headers

def search_with_toggle(base_dir):
    while True:
        directory_searched= input("Directory to be searched: ")
        search_phrase = input("Enter the header you would like to search for: ")
        case_sensitive = input("Should the search be case sensitive? (y/n): ").strip().lower() == 'y'

        results = search_notebooks_for_headers(base_dir, search_phrase, case_sensitive)

        # Print results
        for notebook, headers in results.items():
            print(f"Notebook: {notebook}")
            for header in headers:
                print(f"  Header: {header}")

        another_search = input("Do you want to perform another search? (yes/no): ").strip().lower()
        if another_search != "yes":
            print("Exiting the search tool.")
            break
        

# Example usage
#base_directory = os.path.expanduser("~/sabbat")
base_directory = os.path.expanduser("~/{directory searched}")
search_with_toggle(base_directory)


Directory to be searched:  sabbat
Enter the header you would like to search for:  heap
Should the search be case sensitive? (y/n):  n
Do you want to perform another search? (yes/no):  n


Exiting the search tool.
