![tracker](https://us-central1-vertex-ai-mlops-369716.cloudfunctions.net/pixel-tracking?path=statmike%2Fvertex-ai-mlops%2Farchitectures&file=move_notebooks.ipynb)
<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/architectures/move_notebooks.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2Farchitectures%2Fmove_notebooks.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/architectures/move_notebooks.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/architectures/move_notebooks.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Moving Notebook Files

Moves files or folders to a different folder within the repo:
- Preserves each file's commit history by using `git mv`.
- Fixes relative links within the markdown cells of notebooks and within markdown files.
- Adds a banner at the top of each file to indicate the file's move and status.

Details:
- Move the file/folder with `git mv`
    - commit the staged move
- Create a list of moved files that are either `.md` or `.ipynb` files
- Find and fix any relative links inside these files
- Detect which files had changes and stage+commit them

IN PROGRESS:
- Add/Edit a banner for the file that has notes on the file's location change: date, old location, new/current location
    - stage+commit these changes
- Check all other files in the repository for links to the old file location and update to the new file location
    - stage+commit these changes

---
## Setup

Installs:

In [21]:
#!pip install GitPython

Imports:

In [46]:
import os, json, urllib.parse, IPython, pathlib, nbformat, re, git, datetime

nbformat.NO_CONVERT

nbformat.NO_CONVERT

Parameters:

In [2]:
os.getcwd()

'/home/jupyter/vertex-ai-mlops/Applied GenAI/resources'

In [62]:
# `from_path` and `to_path` can be a folder or a specific file.

repo_path = pathlib.Path('/home/jupyter/vertex-ai-mlops')
from_path = repo_path.joinpath('Applied GenAI/Vertex AI Search')
to_path = repo_path.joinpath('Applied GenAI/legacy/Vertex AI Search')
repo_path, from_path, to_path

(PosixPath('/home/jupyter/vertex-ai-mlops'),
 PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/Vertex AI Search'),
 PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/legacy/Vertex AI Search'))

Clients:

In [63]:
repo = git.Repo(repo_path)

---
## Prepare For File Changes

As the code in this workflow makes changes it will also stage and commit the changes.  For this reason it is important that no files be currently staged.  Take a moment to review staged files and finish committing them or remove them from staging.  This section will print staged files.  There is also code that will unstage any staged files.

In [64]:
staged_files = [item.a_path for item in repo.index.diff("HEAD")]

if staged_files:
    print("Staged files:")
    for file in staged_files:
        print(file)
else:
    print("No files are staged for commit.")

No files are staged for commit.


**UNSTAGE ALL FILES**

In [65]:
if staged_files:
    print('Unstaging files listed above:')
    repo.git.restore('--staged', '.')
else:
    print('No files to unstage.')

No files to unstage.


---
## Move Folder/File

This is a git repository so it is important to move the files with the commit history preserved using `git mv old_file new_file`.

In [137]:
repo = git.Repo(repo_path)

In [140]:
from_path.exists()

False

In [138]:
to_path.exists()

True

In [139]:
if from_path.exists():
    repo.git.mv(from_path, to_path)
    print(f'Files moved from: \n\t{from_path}\nto:\n\t{to_path}')
    repo.index.commit('Moved file')
    print('Moved files committed.')
elif to_path.exists():
    print(f'It appears the file(s) have already moved to:\n\t{to_path}')
else:
    print('Make sure the file(s) exists.  Currently not found in the from or to location')

It appears the file(s) have already moved to:
	/home/jupyter/vertex-ai-mlops/Applied GenAI/legacy/Vertex AI Search


---
## Files List

Create a list of files (.md and .ipynb) including their new full path. If `to_path` is a file then the files list will have just the one file in it.  If `to_path` is a folder then all `.md` and `.ipynb` files at the top level of the folder will be included in the list.

In [141]:
def file_list(from_path, to_path):
    # returns a list of tuples for files that contain (from_file_path, to_file_path)
    files = []
    if to_path.is_dir():
        for nb in to_path.glob("*.ipynb"):
            files.append(
                (
                    from_path.joinpath(nb.name),
                    nb
                )
            )
        for md in to_path.glob("*.md"):
            files.append(
                (
                    from_path.joinpath(md.name),
                    md
                )
            )
    elif to_path.is_file() and to_path.suffix in ['.md', '.ipynb']:
        # a single file was moved, so from_path is already the old file path
        files.append(
            (
                from_path,
                to_path
            )
        )
    else:
        print(f'Check for existence of file/folder: {to_path}')

    return files

In [142]:
files = file_list(from_path, to_path)
files

[(PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/Vertex AI Search/Vertex AI Search Python Client Overview.ipynb'),
  PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/legacy/Vertex AI Search/Vertex AI Search Python Client Overview.ipynb')),
 (PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/Vertex AI Search/vertex_search_setup.md'),
  PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/legacy/Vertex AI Search/vertex_search_setup.md')),
 (PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/Vertex AI Search/readme.md'),
  PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/legacy/Vertex AI Search/readme.md'))]
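The `file_list` function above uses `glob`, which matches only the top level of the moved folder. If a moved folder contains subfolders, a recursive listing may be needed. The following is a minimal sketch under that assumption, using only the standard library; `list_moved_files` and the folder names in the demonstration are hypothetical and not part of this notebook's workflow.

```python
import tempfile
from pathlib import Path

def list_moved_files(from_path, to_path, suffixes=(".md", ".ipynb")):
    """Return (old_path, new_path) pairs for matching files at or under to_path."""
    from_path, to_path = Path(from_path), Path(to_path)
    if to_path.is_file():
        return [(from_path, to_path)] if to_path.suffix in suffixes else []
    pairs = []
    for new_file in sorted(to_path.rglob("*")):
        if (new_file.is_file() and new_file.suffix in suffixes
                and ".ipynb_checkpoints" not in new_file.parts):
            # Rebuild the old location by keeping the path below to_path.
            pairs.append((from_path / new_file.relative_to(to_path), new_file))
    return pairs

# Demonstration on a throwaway folder tree (all names are made up).
with tempfile.TemporaryDirectory() as tmp:
    new_root = Path(tmp) / "legacy" / "topic"
    (new_root / "sub").mkdir(parents=True)
    for name in ("readme.md", "sub/deep.ipynb", "sub/image.png"):
        (new_root / name).write_text("")
    old_root = Path(tmp) / "topic"
    found = list_moved_files(old_root, new_root)
    relative_pairs = [
        (old.relative_to(tmp).as_posix(), new.relative_to(tmp).as_posix())
        for old, new in found
    ]
print(relative_pairs)
```

The old path of each file is rebuilt from its position below `to_path`, so files in subfolders keep their subfolder in the reconstructed old location.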

---
## Find and Fix Links

Go through the contents of each `.ipynb` and `.md` file and detect any relative links used in markdown or HTML:
- resolve the link to an absolute path using the `from_path`
- check to see if the link exists
- prepare a new version that is relative given the `to_path`
- check to see if the new link exists
- update the relative link in the file
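As a small illustration of the detection step, the sketch below applies the same regular expression used in this notebook to a made-up snippet and keeps only the relative targets. The helper name `extract_relative_links` and the sample text are assumptions for illustration.

```python
import re

# Same pattern as used in this notebook: group 1 is a markdown link target,
# group 3 is a quoted href/src value (group 2 is the quote character).
LINK_REGEX = r"(?:\[.*?\]\((.*?)\)|<\w+\s+[^>]*?(?:href|src)=(['\"])(.*?)\2)"

def extract_relative_links(text):
    """Return link targets that are neither URLs nor root-anchored paths."""
    targets = [md or html for md, _quote, html in re.findall(LINK_REGEX, text)]
    return [t for t in targets if not t.startswith("http") and not t.startswith("/")]

sample = (
    "See the [setup notes](./vertex_search_setup.md) and "
    "[the docs](https://cloud.google.com/).\n"
    '<img src="../images/step_0.png" width="50%">\n'
    "<a href='/absolute/path.md'>absolute</a>"
)
print(extract_relative_links(sample))
```

Note that in-page anchors such as `#setup` and `mailto:` links also pass this filter, so they would be treated as relative paths unless they are excluded separately.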

In [143]:
def link_fixer(from_file_path, to_file_path, link, files):
    # return new_link, a version of link (relative to from_file_path) that is updated to work from the new file at to_file_path
    decoded_link = urllib.parse.unquote(link)
    abs_link_path_from = (from_file_path.parent / decoded_link).resolve()
    also_moved = any(ffp[0] == abs_link_path_from for ffp in files)
    try:
        # if the linked file exists or was also moved then continue
        assert abs_link_path_from.exists() or also_moved, f"This link is broken before the move:\n\t{abs_link_path_from}"
        if also_moved:
            new_link = decoded_link
        else:
            common_path_to = pathlib.Path(os.path.commonpath([to_file_path, abs_link_path_from]))
            new_link = (len(to_file_path.parent.parts) - len(common_path_to.parts))*'../' + str(abs_link_path_from.relative_to(common_path_to))
        abs_link_path_to = (to_file_path.parent / new_link).resolve()

        try:
            assert abs_link_path_from == abs_link_path_to or abs_link_path_to.exists(), f"This function failed to fix the link:\n\t{abs_link_path_from}"
            new_link = urllib.parse.quote(new_link)
            return new_link
        except AssertionError as e:
            print(f"Error fixing link: {e}")
    except AssertionError as e:
        print(f"Error fixing link: {e}")
        
def find_relative_links(from_file_path, to_file_path, files):
    # returns a list of tuples: (from_file_path, to_file_path, link, new_link)
    relative_links = []
    regex = r"(?:\[.*?\]\((.*?)\)|<\w+\s+[^>]*?(?:href|src)=(['\"])(.*?)\2)" # capture markdown links and quoted links in href and src
    if to_file_path.suffix == '.ipynb':
        nb = nbformat.read(to_file_path, nbformat.NO_CONVERT)
        for cell in nb.cells:
            if cell.cell_type == 'markdown':
                links = re.findall(regex, cell.source)
                for link in links:
                    link = link[0] or link[2]
                    if not link.startswith("http") and not link.startswith('/'):
                        new_link = link_fixer(from_file_path, to_file_path, link, files)
                        relative_links.append((from_file_path, to_file_path, link, new_link))
                        if new_link and new_link != link:  # Check if the link actually changed 
                            cell.source = cell.source.replace(link, new_link)
        # Save the modified notebook
        nbformat.write(nb, to_file_path)
    elif to_file_path.suffix == '.md':
        with open(to_file_path, "r") as f:
            content = f.read()
        links = re.findall(regex, content)
        for link in links:
            link = link[0] or link[2]
            if not link.startswith('http') and not link.startswith('/'):
                new_link = link_fixer(from_file_path, to_file_path, link, files)
                relative_links.append((from_file_path, to_file_path, link, new_link))
                if new_link and new_link != link:
                    content = content.replace(link, new_link)
        # Save the modified markdown file
        with open(to_file_path, "w") as f:
            f.write(content)
                    
    return relative_links
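For a link whose target did not move, the path arithmetic in `link_fixer` could alternatively be expressed with `relpath`. The sketch below is not the notebook's implementation. It assumes POSIX-style absolute paths, does not check that the target exists, and uses a hypothetical helper name (`rebase_link`) and made-up paths.

```python
import posixpath
import urllib.parse

def rebase_link(from_file, to_file, link):
    """Recompute `link` (relative to from_file) so it resolves from to_file."""
    decoded = urllib.parse.unquote(link)
    # Absolute target the link pointed to before the move.
    target = posixpath.normpath(posixpath.join(posixpath.dirname(from_file), decoded))
    # Same target, expressed relative to the moved file's new folder.
    new_link = posixpath.relpath(target, start=posixpath.dirname(to_file))
    return urllib.parse.quote(new_link)

old_file = "/repo/Applied GenAI/Vertex AI Search/readme.md"
new_file = "/repo/Applied GenAI/legacy/Vertex AI Search/readme.md"
print(rebase_link(old_file, new_file, "../../architectures/img/step_0.png"))
print(rebase_link(old_file, new_file, "../Vertex%20AI%20Notes.md"))
```

Because the file moved one folder deeper, each rebased link gains one extra `../`. Links to files that moved along with the referencing file need the separate handling that `link_fixer` provides through its `also_moved` check.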

In [151]:
if 'relative_links' in locals() or 'relative_links' in globals():
    print(f'Relative Links already reviewed and fixed: {len(relative_links)}')
else:
    relative_links = []
    for file in files:
        relative_links.extend(
            find_relative_links(*file, files)
        )

Relative Links already reviewed and fixed: 14


In [155]:
#relative_links

---
## Changed Files
Detect which files had changes and stage+commit them:

In [157]:
changed_files = list(set([file[1] for file in relative_links if file[3] and file[3] != file[2]]))
changed_files

[]

In [158]:
for changed_file in changed_files:
    # check for unstaged changes in the file
    diff_list = repo.index.diff(None, paths=[str(changed_file)])
    # if unstaged changes, stage them
    if diff_list:
        repo.git.add(str(changed_file))

In [159]:
# if staged changes then commit them
if repo.index.diff("HEAD"):
    repo.index.commit("Fixed relative links after moving this file")

---
## Add/Edit Banner With Location Change History
Make a section at the top of .md and .ipynb files indicating location changes with dates.  Files with changes will be staged and committed.

---
Not production code, but this is a working version.  These are the tests I run to make sure it is working properly:
- run and check each file for markdown code and preview the display
- rerun without changes to verify that it does not further update the files
- rerun with `change_note` changed to the version with tomorrow's date and ensure that it adds the change note to the files
- rerun with the same tomorrow's-date version of `change_note` and verify it does not further update the files
- discard all changes
- ensure `change_note` is set to today's date version.
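The rerun checks above amount to testing that the banner update is idempotent. A minimal sketch of that property on a plain string is shown here. It reuses the banner markers from this notebook, but `add_move_notice`, the dates, and the paths are hypothetical, and header detection is left out.

```python
CHANGE_LEAD = "---\n\n**File Move Notices**\n\nThis file moved locations:"
CHANGE_WRAP = "\n---\n<!---end of move notices--->\n\n"

def add_move_notice(body, note):
    """Add `note` to the banner at the top of `body`, creating the banner if needed."""
    if body.startswith(CHANGE_LEAD) and CHANGE_WRAP in body:
        end = body.index(CHANGE_WRAP)
        banner, rest = body[:end], body[end + len(CHANGE_WRAP):]
        if note in banner:
            return body  # already recorded: leave the text untouched
        return banner + note + CHANGE_WRAP + rest
    return CHANGE_LEAD + note + CHANGE_WRAP + body

# Made-up notes for illustration only.
note_1 = "\n- On 01/02/2025 (mm/dd/yyyy)\n\t- From: `old/readme.md`\n\t- To: `new/readme.md`"
note_2 = "\n- On 01/03/2025 (mm/dd/yyyy)\n\t- From: `new/readme.md`\n\t- To: `newer/readme.md`"

once = add_move_notice("# Title\n\nText", note_1)
twice = add_move_notice(once, note_1)   # same note again: no change expected
later = add_move_notice(once, note_2)   # a later move: appended to the banner
print(twice == once, later.count("- On "))
```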

---


In [160]:
update = 0
for from_file_path, to_file_path in files:
    # change note construction
    change_lead = f"---\n\n**File Move Notices**\n\nThis file moved locations:"
    change_note = f"""\n- On {datetime.date.today().strftime("%m/%d/%Y")} (mm/dd/yyyy)\n\t- From: `{from_file_path.relative_to(repo_path)}`\n\t- To: `{to_file_path.relative_to(repo_path)}`"""
    #change_note = f"""\n- On {(datetime.date.today()+datetime.timedelta(days=1)).strftime("%m/%d/%Y")} (mm/dd/yyyy)\n\t- From: `{from_file_path.relative_to(repo_path)}`\n\t- To: `{to_file_path.relative_to(repo_path)}`"""
    change_wrap = f"\n---\n<!---end of move notices--->\n\n"
    
    # notebook files
    if to_file_path.suffix == ".ipynb":
        nb = nbformat.read(to_file_path, nbformat.NO_CONVERT)
    
        # detect existing header in file (default to the first cell)
        start_cell = 0 # first cell
        if nb['cells'][0]['cell_type'] == 'markdown':
            content = nb['cells'][0]['source']
            if content.startswith('<!--- header table --->') or content.startswith('![tracker](https://'):
                start_cell = 1 # second cell
        
        # detect existing file change info - after header
        if nb['cells'][start_cell]['cell_type'] == 'markdown':
            content = nb['cells'][start_cell]['source']
            if content.startswith(change_lead):
                changes = content[0:(content.index(change_wrap)+len(change_wrap))]
                content = content[(content.index(change_wrap)+len(change_wrap)):]
            else:
                changes = ''
        else:
            # no markdown cell to hold the banner: skip this notebook
            print(f'No starting markdown cell in: {to_file_path}')
            continue
            
        # edit/update change info
        if change_lead not in changes and len(changes) < 1:
            changes = change_lead+change_note+change_wrap
            content = changes+content
            update += 1
            nb['cells'][start_cell]['source'] = content
            nbformat.write(nb, to_file_path)
            # stage file here
            repo.git.add(str(to_file_path))
        elif change_note not in changes:
            changes = changes[0:-1*len(change_wrap)]+change_note+change_wrap
            content = changes+content
            update += 1
            nb['cells'][start_cell]['source'] = content
            nbformat.write(nb, to_file_path)
            # stage file here
            repo.git.add(str(to_file_path))
        elif change_note in changes:
            update += 0
            
    # markdown files
    elif to_file_path.suffix == ".md":
        with open(to_file_path, "r") as f:
            content = f.read()
            
        # detect existing header in file
        if content.startswith('<!--- header table --->') or content.startswith('![tracker](https://'):
            end_str = '</table><br/><br/><br/><br/>\n\n'
            end_index = content.index(end_str)+len(end_str)
            header = content[0:end_index]
            content = content[end_index:]
        else:
            print(f'Header is missing from: {to_file_path}')
            header = ''
            content = content
        
        # detect existing file change info - after header
        if change_lead not in content:
            changes = ''
            content = content[content.index('#'):]
        else:
            changes = content[0:(content.index(change_wrap)+len(change_wrap))]
            content = content[content.index('#'):]
        
        # edit/update change info
        if change_lead not in changes and len(changes) < 1:
            changes = change_lead+change_note+change_wrap
            content = header+changes+content
            update += 1
            with open(to_file_path, 'w') as f:
                f.write(content)
            # stage file here
            repo.git.add(str(to_file_path))
        elif change_note not in changes:
            changes = changes[0:-1*len(change_wrap)]+change_note+change_wrap
            content = header+changes+content
            update += 1
            with open(to_file_path, 'w') as f:
                f.write(content)
            # stage file here
            repo.git.add(str(to_file_path))
        elif change_note in changes:
            update += 0        
            
    else:
        print(f'No changes made to: {to_file_path}')

# commit here if update > 0 (staged above)

In [161]:
len(files), update

(3, 3)

In [162]:
# if staged changes then commit them
if update > 0 and repo.index.diff("HEAD"):
    repo.index.commit("Added banner to files to indicate moved location")

---
## Check and Update References to the Moved File

Now that the file is moved, any references to it will need to be updated.  This code will review all files in the repository for references to the `from_file_path` and update them to the `to_file_path`.  This includes relative links.  All the changed files will then be staged and committed.

- list files
- loop through files:
    - read contents
    - find relative links
    - create absolute link
    - see if absolute link in list of from files
    - create new link to to_files
    - replace link in file
    - save file
    - stage file
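The link-rewriting part of the steps above could be sketched as follows. This is an illustration under assumptions and not the notebook's finished implementation. It works on POSIX-style absolute path strings, and `retarget_link`, the `moved` mapping, and the example paths are hypothetical.

```python
import posixpath
import urllib.parse

def retarget_link(referencing_file, link, moved):
    """Return an updated link if `link` points at a moved file, else None.

    `moved` maps old absolute paths to new absolute paths (POSIX-style strings).
    """
    decoded = urllib.parse.unquote(link)
    folder = posixpath.dirname(referencing_file)
    old_target = posixpath.normpath(posixpath.join(folder, decoded))
    new_target = moved.get(old_target)
    if new_target is None:
        return None  # the link does not point at a moved file
    return urllib.parse.quote(posixpath.relpath(new_target, start=folder))

moved = {
    "/repo/Applied GenAI/Vertex AI Search/readme.md":
        "/repo/Applied GenAI/legacy/Vertex AI Search/readme.md",
}
updated = retarget_link("/repo/Applied GenAI/readme.md", "./Vertex%20AI%20Search/readme.md", moved)
untouched = retarget_link("/repo/Applied GenAI/readme.md", "../readme.md", moved)
print(updated, untouched)
```

A mapping like `moved` could be built from the `files` list of `(from_file_path, to_file_path)` tuples, for example with `{str(old): str(new) for old, new in files}`.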
    

In [169]:
files # a list of (from_file_path, to_file_path) tuples

[(PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/Vertex AI Search/Vertex AI Search Python Client Overview.ipynb'),
  PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/legacy/Vertex AI Search/Vertex AI Search Python Client Overview.ipynb')),
 (PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/Vertex AI Search/vertex_search_setup.md'),
  PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/legacy/Vertex AI Search/vertex_search_setup.md')),
 (PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/Vertex AI Search/readme.md'),
  PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/legacy/Vertex AI Search/readme.md'))]

In [181]:
# get a list of .md and .ipynb files from the full repository:
matches = list(repo_path.rglob("*.md")) + list(repo_path.rglob("*.ipynb"))
# filter out any that are in .ipynb_checkpoints directory or a known to_file_path (already edited above)
matches = [
    match_file_path
    for match_file_path in matches
    if match_file_path not in [to_file_path for _, to_file_path in files]
    and '.ipynb_checkpoints' not in match_file_path.parts
]
#matches

---
## Some Checks Using `relative_links`

(from_file_path, to_file_path, link, new_link)
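Given that tuple layout, a quick summary of outcomes can be useful before committing. The sketch below is a hypothetical helper (`summarize_links`) run on made-up sample tuples. It assumes that a `new_link` of `None` means `link_fixer` could not produce a working link.

```python
from collections import Counter

def summarize_links(relative_links):
    """Count links by outcome: changed, unchanged, or failed (new_link is None)."""
    counts = Counter()
    for _from_file, _to_file, link, new_link in relative_links:
        if new_link is None:
            counts["failed"] += 1
        elif new_link != link:
            counts["changed"] += 1
        else:
            counts["unchanged"] += 1
    return dict(counts)

# Made-up tuples in the (from_file_path, to_file_path, link, new_link) layout.
sample_links = [
    ("old/a.md", "new/a.md", "./b.md", "./b.md"),
    ("old/a.md", "new/a.md", "../img/x.png", "../../img/x.png"),
    ("old/a.md", "new/a.md", "./missing.md", None),
]
print(summarize_links(sample_links))
```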

In [36]:
relative_links[0]

(PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/Vertex AI Search/Vertex AI Search Python Client Overview.ipynb'),
 PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/legacy/Vertex AI Search/Vertex AI Search Python Client Overview.ipynb'),
 './vertex_search_setup.md',
 './vertex_search_setup.md')

In [37]:
relative_links[-1]

(PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/Vertex AI Search/readme.md'),
 PosixPath('/home/jupyter/vertex-ai-mlops/Applied GenAI/legacy/Vertex AI Search/readme.md'),
 './Vertex%20AI%20Search%20Python%20Client%20Overview.ipynb',
 './Vertex%20AI%20Search%20Python%20Client%20Overview.ipynb')

In [38]:
len(relative_links)

14

In [40]:
for rl in relative_links:
    print(rl[3], rl[3]!=rl[2])

./vertex_search_setup.md False
./vertex_search_setup.md False
../../../architectures/notebooks/applied/genai/vertex_ai_search/vertex_search_step_0.png True
../../../architectures/notebooks/applied/genai/vertex_ai_search/vertex_search_step_1.png True
../../../architectures/notebooks/applied/genai/vertex_ai_search/vertex_search_step_2.png True
../../../architectures/notebooks/applied/genai/vertex_ai_search/vertex_search_step_3.png True
../../../architectures/notebooks/applied/genai/vertex_ai_search/vertex_search_step_4.png True
../../../architectures/notebooks/applied/genai/vertex_ai_search/vertex_search_step_5.png True
../../../architectures/notebooks/applied/genai/vertex_ai_search/vertex_search_step_6.png True
../../../architectures/notebooks/applied/genai/vertex_ai_search/vertex_search_step_7.png True
../../../architectures/notebooks/applied/genai/vertex_ai_search/vertex_search_step_8.png True
../../../architectures/notebooks/applied/genai/vertex_ai_search/vertex_search_step_9.png Tru