---
title: "Using a GitHub Personal Access Token (PAT) to Push/Pull from a SageMaker Notebook"
teaching: 25
exercises: 10
---

:::::::::::::::::::::::::::::::::::::: questions 

- How can I securely push/pull code to and from GitHub within a SageMaker notebook?
- What steps are necessary to set up a GitHub PAT for authentication in SageMaker?
- How can I convert notebooks to `.py` files and ignore `.ipynb` files in version control?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Configure Git in a SageMaker notebook to use a GitHub Personal Access Token (PAT) for HTTPS-based authentication.
- Securely handle credentials in a notebook environment using `getpass`.
- Convert `.ipynb` files to `.py` files for better version control practices in collaborative projects.

::::::::::::::::::::::::::::::::::::::::::::::::

# Using a GitHub Personal Access Token (PAT) to Push/Pull from a SageMaker Notebook

When working in SageMaker notebooks, you may often need to push code updates to GitHub repositories. However, SageMaker notebooks are typically launched with temporary instances that don’t persist configurations, including SSH keys, across sessions. This makes HTTPS-based authentication, secured with a GitHub Personal Access Token (PAT), a practical solution. PATs provide flexibility for authentication and enable seamless interaction with both public and private repositories directly from your notebook. 

> **Important Note**: Personal access tokens are powerful credentials that grant specific permissions to your GitHub account. To ensure security, only select the minimum necessary permissions and handle the token carefully.


## Step 1: Generate a Personal Access Token (PAT) on GitHub

1. Go to **Settings > Developer settings > Personal access tokens** on GitHub.
2. Click **Generate new token**, select **Classic**.
3. Give your token a descriptive name (e.g., "SageMaker Access Token") and set an expiration date if desired for added security.
4. **Select the minimum permissions needed**:
   - **For public repositories**: Choose only **`public_repo`**.
   - **For private repositories**: Choose **`repo`** (full control of private repositories).
   - Optional permissions, if needed:
     - **`repo:status`**: Access commit status (if checking status checks).
     - **`workflow`**: Update GitHub Actions workflows (only if working with GitHub Actions).
5. Generate the token and **copy it** (you won’t be able to see it again).

> **Caution**: Treat your PAT like a password. Avoid sharing it or exposing it in your code. Store it securely (e.g., via a password manager like LastPass) and consider rotating it regularly.


## Step 2: Configure Git `user.name` and `user.email`
In your SageMaker or Jupyter notebook environment, run the following commands to set up your Git user information.


#### Directory setup
Let's make sure we're all starting in the same directory. Change to the root directory of this instance (`/home/ec2-user/SageMaker`) before going further.

```python
%cd /home/ec2-user/SageMaker/
!pwd
```

```output
/home/ec2-user/SageMaker
/home/ec2-user/SageMaker
```


```python
# Replace with your own name and the email address linked to your GitHub account
!git config --global user.name "Chris Endemann"
!git config --global user.email "endeman@wisc.edu"
```


### Explanation

- **`user.name`**: The name recorded as the author of your commits. It does not have to match your GitHub username; many people use their full name.
- **`user.email`**: This should match the email associated with your GitHub account so that commits are properly linked to your profile.

Setting these with `--global` writes them to `~/.gitconfig`, so they apply to every repository on the instance. On SageMaker notebook instances, only files under `/home/ec2-user/SageMaker` persist when the instance is stopped and restarted, so you may need to re-run this configuration after a restart.

## Step 3: Use `getpass` to Prompt for Username and PAT

The `getpass` module lets you type your PAT without it being echoed on screen or saved in the notebook. Your username isn't secret, so it is read with the built-in `input()`. Either way, no credentials are hardcoded in the notebook.


```python
import getpass

# Repository address WITHOUT the leading https:// (find it on GitHub under Code -> Clone -> HTTPS)
github_url = 'github.com/UW-Madison-DataScience/test_AWS.git'

# Prompt for GitHub username and PAT; getpass hides the token as you type
username = input("GitHub Username: ")
token = getpass.getpass("GitHub Personal Access Token (PAT): ")
```

**Note**: After running this cell, you may want to comment it out so that you don't have to re-enter your credentials every time you run the whole notebook. Keep in mind that `username` and `token` will then be undefined after a kernel restart until you run the cell again.


### Explanation

- **`input("GitHub Username: ")`**: Prompts you to enter your GitHub username.
- **`getpass.getpass("GitHub Personal Access Token (PAT): ")`**: Prompts you to securely enter the PAT, keeping it hidden on the screen.



## Step 4: Add, Commit, and Push Changes with Manual Authentication
### 1. Navigate to the Repository Directory (adjust the path if needed)

```python
!pwd
%cd test_AWS
```

```output
/home/ec2-user/SageMaker
/home/ec2-user/SageMaker/test_AWS
```


### 2. Preview Changes

If you are tracking `.ipynb` files directly, the diff can be long and hard to read, even for small edits. In this environment, `git diff` uses nbdime's `nbdiff` for notebooks; in the output below, the changes shown are to a cell's execution count and stored output rather than to its code.

```python
!git diff
```

```output
nbdiff /tmp/git-blob-PLwmtf/04_Interacting-with-code-repo.ipynb 04_Interacting-with-code-repo.ipynb
--- /tmp/git-blob-PLwmtf/04_Interacting-with-code-repo.ipynb  2024-11-01 21:19:40.081619
+++ 04_Interacting-with-code-repo.ipynb  2024-11-01 21:19:30.253573
## replaced /cells/20/execution_count:
-  55
+  79

## inserted before /cells/20/outputs/0:
+  output:
+    output_type: stream
+    name: stdout
+    text:
+      [main bc28ce1] Updates from Jupyter notebooks
+       1 file changed, 875 insertions(+), 56 deletions(-)

## deleted /cells/20/outputs/0:
-  output:
-    output_type: stream
-    name: stdout
-    text:
-      [main 0363cc2] Added updates from Jupyter notebook
-       7 files changed, 416 insertions(+), 91 deletions(-)
-       delete mode 100644 00_Data-storage-and-access-via-buckets.ipynb
-       create mode 100644 01_Setting-up-S3-bucket.md
```

### 3. Convert JSON `.ipynb` Files to `.py`

Notebook (`.ipynb`) files are stored as JSON and include cell outputs, execution counts, and metadata alongside your code. Converting notebooks to plain-text `.py` files before committing makes your code edits much easier to follow across commits. Otherwise, simply re-running a cell changes its stored execution count and output, so even a small edit can produce a large diff.
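To see where that noise comes from, it helps to look inside an `.ipynb` file. The sketch below uses only the standard `json` module to list each code cell's source next to its execution count and number of stored outputs, the fields that change when you merely re-run a cell. It assumes the nbformat 4 layout (a top-level `cells` list whose code cells have `source`, `execution_count`, and `outputs` keys); the function name `summarize_code_cells` is hypothetical.

```python
import json

def summarize_code_cells(path):
    """Return (source, execution_count, n_outputs) for each code cell in an nbformat-4 notebook."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    summary = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip Markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines or a single string
            source = "".join(source)
        summary.append((source, cell.get("execution_count"), len(cell.get("outputs", []))))
    return summary
```

Only the first element of each tuple is code you actually wrote. A `.py` export keeps the code (and Markdown, as comments) but drops execution counts and outputs, which is why its diffs stay small.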

#### Benefits of Converting to `.py` Before Committing

- **Cleaner Version Control**: `.py` files have cleaner diffs and are easier to review and merge in Git.
- **Script Compatibility**: Python files are more compatible with other environments and can run easily from the command line.
- **Reduced Repository Size**: `.py` files are generally lighter than `.ipynb` files since they don’t store outputs or metadata.

Converting notebooks to `.py` files helps streamline the workflow for both collaborative projects and deployments. This approach also maintains code readability and minimizes potential issues with notebook-specific metadata in Git history. Here’s how to convert `.ipynb` files to `.py` in SageMaker without needing to export or download files:

#### Method 1: Using Jupytext

1. **Install Jupytext** (if you haven't already):

```python
!pip install jupytext
```

```output
Collecting jupytext
  Downloading jupytext-1.16.4-py3-none-any.whl.metadata (13 kB)
Collecting mdit-py-plugins (from jupytext)
  Downloading mdit_py_plugins-0.4.2-py3-none-any.whl.metadata (2.8 kB)
Downloading jupytext-1.16.4-py3-none-any.whl (153 kB)
Downloading mdit_py_plugins-0.4.2-py3-none-any.whl (55 kB)
Installing collected packages: mdit-py-plugins, jupytext
Successfully installed jupytext-1.16.4 mdit-py-plugins-0.4.2
```


2. **Run the following command** in a notebook cell to convert a notebook to a `.py` file. This creates a `.py` file with the same base name in the same directory as the notebook.

```python
# Replace the filename below with your own notebook's filename
!jupytext --to py 03_Data-storage-and-access-via-buckets.ipynb
```

```output
[jupytext] Reading 03_Data-storage-and-access-via-buckets.ipynb in format ipynb
[jupytext] Updating the timestamp of 03_Data-storage-and-access-via-buckets.py
```


#### Method 2: Automated Script for Converting All Notebooks in a Directory

If you have multiple notebooks to convert, you can automate the conversion process by running this script, which converts all `.ipynb` files in the current directory to `.py` files:

```python
import os
import subprocess

# List all .ipynb files in the current directory
notebooks = [f for f in os.listdir() if f.endswith('.ipynb')]

# Convert each notebook to a .py file with the same base name using jupytext
for notebook in notebooks:
    output_file = os.path.splitext(notebook)[0] + '.py'  # swap only the final extension
    subprocess.run(["jupytext", "--to", "py", notebook, "--output", output_file])
    print(f"Converted {notebook} to {output_file}")
```

```output
[jupytext] Reading 05_Intro-train-models.ipynb in format ipynb
[jupytext] Updating the timestamp of 05_Intro-train-models.py
Converted 05_Intro-train-models.ipynb to 05_Intro-train-models.py
[jupytext] Reading 03_Data-storage-and-access-via-buckets.ipynb in format ipynb
[jupytext] Updating the timestamp of 03_Data-storage-and-access-via-buckets.py
Converted 03_Data-storage-and-access-via-buckets.ipynb to 03_Data-storage-and-access-via-buckets.py
[jupytext] Reading 03_Data-storage-and-access-via-buckets-test.ipynb in format ipynb
[jupytext] Updating the timestamp of 03_Data-storage-and-access-via-buckets-test.py
Converted 03_Data-storage-and-access-via-buckets-test.ipynb to 03_Data-storage-and-access-via-buckets-test.py
[jupytext] Reading 06_Hyperparameter-tuning.ipynb in format ipynb
[jupytext] Updating the timestamp of 06_Hyperparameter-tuning.py
Converted 06_Hyperparameter-tuning.ipynb to 06_Hyperparameter-tuning.py
[jupytext] Reading create_large_data.ipynb in format ipynb
[jupytext
```
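The loop above only looks at the current directory and keeps going if a conversion fails. If your notebooks are spread across subfolders, a variant along these lines could walk the whole tree with `pathlib`, skipping the `.ipynb_checkpoints` folders Jupyter creates for autosaves. This is a sketch: `notebooks_to_convert` and `convert_all` are hypothetical names, and it assumes the `jupytext` command from Method 1 is installed.

```python
import subprocess
from pathlib import Path

def notebooks_to_convert(root="."):
    """Return (notebook, script) pairs for every .ipynb under root, including subfolders.

    Skips Jupyter's autosave copies stored in .ipynb_checkpoints folders.
    """
    pairs = []
    for nb in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in nb.parts:
            continue
        pairs.append((nb, nb.with_suffix(".py")))  # swap only the final extension
    return pairs

def convert_all(root="."):
    """Run jupytext on each notebook found by notebooks_to_convert."""
    for nb, script in notebooks_to_convert(root):
        # check=True raises CalledProcessError if jupytext exits with an error
        subprocess.run(["jupytext", "--to", "py", str(nb), "--output", str(script)], check=True)
        print(f"Converted {nb} to {script}")
```

Calling `notebooks_to_convert()` on its own lets you preview which files would be converted before running `convert_all()`.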

### 4. Add `.ipynb` Files to `.gitignore`

Adding `.ipynb` files to `.gitignore` is a good practice if you plan to only commit `.py` scripts. This will prevent accidental commits of Jupyter Notebook files across all subfolders in the repository.

Here’s how to add `.ipynb` files to `.gitignore` to ignore them project-wide:

1. **Open or Create the `.gitignore` File**:

   ```python
   !ls -a # check for an existing .gitignore file
   ```

   - If you don't already have a `.gitignore` file in the repository root, you can create one by running:

     ```python
     !touch .gitignore
     ```


2. **Add `.ipynb` Files to `.gitignore`**:

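   - Append the following lines to your `.gitignore` file to ignore all `.ipynb` files in all folders. Git only treats `#` as a comment at the start of a line, so the comment must go on its own line; a trailing `# ...` after `*.ipynb` would become part of the pattern and the notebooks would not be ignored:

     ```plaintext
     # Ignore all Jupyter Notebook files
     *.ipynb
     ```

   - You can add these lines from within your notebook: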
   
     ```python
     with open(".gitignore", "a") as gitignore:
         gitignore.write("\n# Ignore all Jupyter Notebook files\n*.ipynb\n")
     ```



3. **Verify and Commit the `.gitignore` File**:

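   - Add and commit the updated `.gitignore` file so that the rule is shared with everyone who clones the repository.

     ```python
     !git add .gitignore
     !git commit -m "Add .ipynb files to .gitignore to ignore notebooks"
     # Push with the credentials entered via getpass in Step 3 (avoids an interactive prompt)
     !git push https://{username}:{token}@{github_url} main
     ```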

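This setup will:
- Keep untracked `.ipynb` files from being added to Git.
- Keep your repository cleaner, containing only `.py` scripts for easier version control and a smaller repository.

Note that `.gitignore` only applies to files Git isn't already tracking. If notebooks were committed earlier, Git keeps tracking their changes until you remove them from the index with `!git rm --cached '*.ipynb'` and commit that change (the files themselves stay on disk). After that, new and existing notebooks won't show up in `git status`, so your commits stay focused on the converted `.py` files.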
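The `open(".gitignore", "a")` snippet in step 2 appends the rule every time the cell runs, so re-running the notebook leaves duplicate lines behind. A sketch of an idempotent version, which appends only when the exact pattern isn't already present, is below. The function name `ensure_gitignore_pattern` is hypothetical, and it compares whole lines, so it won't recognize equivalent patterns such as `**/*.ipynb`.

```python
from pathlib import Path

def ensure_gitignore_pattern(pattern, path=".gitignore", comment=None):
    """Append pattern to .gitignore unless an identical line is already there.

    Returns True if the file was changed, False if the pattern was already present.
    """
    gitignore = Path(path)
    text = gitignore.read_text() if gitignore.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False  # already ignored; avoid adding a duplicate rule
    block = ""
    if text and not text.endswith("\n"):
        block += "\n"  # start the new rule on its own line
    if comment:
        block += f"# {comment}\n"  # comments must sit on their own line in .gitignore
    block += f"{pattern}\n"
    with gitignore.open("a") as f:  # "a" also creates the file if it doesn't exist
        f.write(block)
    return True
```

For example, `ensure_gitignore_pattern("*.ipynb", comment="Ignore all Jupyter Notebook files")` adds the rule on the first run and does nothing afterwards.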


### 5. Add and Commit Changes

```python
!git add . # you may also add files one at a time to control exactly what goes into each commit
!git commit -m "Updates from Jupyter notebooks" # in general, your commit message should be more specific!
```

```output
[main f4b268e] Updates from Jupyter notebooks
 10 files changed, 3163 insertions(+), 256 deletions(-)
 delete mode 100644 01_Setting-up-S3-bucket.md
 delete mode 100644 02_Setting-up-notebook-environment.md
 rename 03_Data-storage-and-access-via-buckets.ipynb => Accessing-S3-via-SageMaker-notebooks.ipynb (72%)
 create mode 100644 Accessing-S3-via-SageMaker-notebooks.md
 rename 04_Interacting-with-code-repo.ipynb => Interacting-with-code-repo.ipynb (93%)
```


### 6. Pull the Latest Changes from the Main Branch

Pull the latest changes from the remote `main` branch to make sure your local branch is up to date. We recommend first setting a pull strategy for this repository; this lesson uses merge. The options are:

- **Merge** (`pull.rebase false`): Combines the remote changes into your local branch with a merge commit.
- **Rebase** (`pull.rebase true`): Replays your local commits on top of the updated `main` branch, resulting in a linear history.
- **Fast-forward only** (`pull.ff only`): Only pulls if your local branch can be fast-forwarded to the remote, i.e., you have no new local commits; otherwise Git refuses to pull.

```python
!git config pull.rebase false # combine the remote changes into your local branch as a merge commit
!git pull origin main
```

```output
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 6 (delta 2), reused 6 (delta 2), pack-reused 0 (from 0)
Unpacking objects: 100% (6/6), 152.14 KiB | 2.67 MiB/s, done.
From https://github.com/UW-Madison-DataScience/test_AWS
 * branch            main       -> FETCH_HEAD
   1602325..b2a59c3  main       -> origin/main
hint: Waiting for your editor to close the file...
Merge branch 'main' of https://github.com/UW-Madison-DataScience/test_AWS
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with
```

As the `hint:` line shows, creating the merge commit opened a text editor for the merge message. A notebook cell can't interact with that editor, so the cell may hang. Running `!git pull --no-edit origin main` instead accepts Git's default merge message without opening an editor.

If you get merge conflicts, resolve them before moving on: choose which version of each conflicting file to keep (for example, with `git checkout --ours` or `git checkout --theirs`), then `git add` the files and `git commit` the result. During a merge, `--ours` is your local version and `--theirs` is the version being pulled in. You can skip the code below if you don't have any conflicts.

```python
# Keep your local changes in one conflicting file
# !git checkout --ours train_nn.py

# Keep the remote version of the other conflicting file
# !git checkout --theirs train_xgboost.py

# Stage the files to mark the conflicts as resolved
# !git add train_nn.py
# !git add train_xgboost.py

# Commit the merge result
# !git commit -m "Resolved merge conflicts in train_nn.py and train_xgboost.py"
```
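Before choosing `--ours` or `--theirs`, you need to know which files are in conflict. One way is to parse `git status --porcelain`, whose two-letter status codes `DD`, `AU`, `UD`, `UA`, `DU`, `AA`, and `UU` mark unmerged paths. The sketch below uses hypothetical function names and assumes plain file names; Git may quote paths that contain unusual characters, and those are returned as-is.

```python
import subprocess

# Two-letter XY codes that `git status --porcelain` uses for unmerged (conflicted) paths
UNMERGED_CODES = {"DD", "AU", "UD", "UA", "DU", "AA", "UU"}

def parse_conflicts(porcelain_output):
    """Return the paths marked as unmerged in `git status --porcelain` output."""
    conflicts = []
    for line in porcelain_output.splitlines():
        # Each line is "XY PATH": two status letters, a space, then the path
        if line[:2] in UNMERGED_CODES:
            conflicts.append(line[3:])
    return conflicts

def list_conflicts():
    """Run git status in the current repository and return its conflicted files."""
    result = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    )
    return parse_conflicts(result.stdout)
```

After a pull that reports conflicts, `print(list_conflicts())` in a notebook cell lists the files that still need resolving.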

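### 7. Push Changes with Your Credentials

```python
# Push with the credentials entered via getpass in Step 3 (avoids an interactive prompt)
!git push https://{username}:{token}@{github_url} main
```

```output
fatal: unable to access 'https://{github_url}/': URL rejected: Bad hostname
```

In the recorded run above, the `{...}` placeholders were passed to Git literally instead of being filled in. IPython substitutes Python variables into `!` commands only when it can evaluate all of them, so this error most likely means `username`, `token`, or `github_url` was not defined in the current session (for example, after a kernel restart with the Step 3 cell commented out). Re-run the Step 3 cell and push again.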
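Building the URL in Python instead of relying on IPython's `{...}` substitution makes a missing variable fail loudly with a `NameError` rather than producing a malformed URL, and lets you mask the token in anything you print. A minimal sketch, assuming `github_url` has no `https://` prefix (as in Step 3); the helper names are hypothetical. Git generally strips credentials from URLs in its own error messages, but masking the token before printing is a cheap precaution.

```python
import subprocess
from urllib.parse import quote

def build_push_url(username, token, github_url):
    """Return an HTTPS remote URL with the credentials embedded."""
    if github_url.startswith(("http://", "https://")):
        raise ValueError("github_url should not include the http(s):// prefix")
    # Percent-encode the credentials so special characters can't break the URL
    return f"https://{quote(username, safe='')}:{quote(token, safe='')}@{github_url}"

def redact(text, token):
    """Replace every occurrence of the token with asterisks before printing."""
    return text.replace(token, "****") if token else text

def push_with_token(username, token, github_url, branch="main"):
    """Push branch to the given URL; the token is not written to .git/config."""
    url = build_push_url(username, token, github_url)
    result = subprocess.run(["git", "push", url, branch], capture_output=True, text=True)
    print(redact(result.stdout + result.stderr, token))
    return result.returncode
```

With this in place, `push_with_token(username, token, github_url)` could replace the `!git push` line above.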


## Step 5: Pulling .py files and converting back to notebook format

Let's assume you've taken a short break from your work and would like to start again by pulling in your code repo. If you'd like to work with notebook files again, you can use jupytext to convert your `.py` files back to `.ipynb`.

The command below converts the Python script to Jupyter Notebook format and writes it to `03_Data-storage-and-access-via-buckets-test.ipynb` in the current directory. Jupytext reads the plain-text `.py` file directly, so it doesn't need to be in JSON format. The recorded output shows the `FileNotFoundError` jupytext raises when the input file isn't in the current directory; if you see it, check the filename with `!ls`.

```python
# Replace the filenames below with your own script and output notebook names
!jupytext --to notebook 03_Data-storage-and-access-via-buckets.py --output 03_Data-storage-and-access-via-buckets-test.ipynb
```

```output
[jupytext] Reading 03_Data-storage-and-access-via-buckets.py in format py
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/bin/jupytext", line 8, in <module>
    sys.exit(jupytext())
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/jupytext/cli.py", line 497, in jupytext
    exit_code += jupytext_single_file(nb_file, args, log)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/jupytext/cli.py", line 561, in jupytext_single_file
    notebook = read(nb_file, fmt=fmt, config=config)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/jupytext/jupytext.py", line 431, in read
    with open(fp, encoding="utf-8") as stream:
FileNotFoundError: [Errno 2] No such file or directory: '03_Data-storage-and-access-via-buckets.py'
```


### Applying to all .py files
To convert all of your .py files to notebooks, you can use the following code:

```python
import os
import subprocess

# List all .py files in the current directory
scripts = [f for f in os.listdir() if f.endswith('.py')]

# Convert each .py file to an .ipynb with the same base name using jupytext.
# os.path.splitext swaps only the final extension; str.replace('.py', ...) would
# also alter a '.py' elsewhere in the name (e.g. 'my.python_utils.py').
for script in scripts:
    output_file = os.path.splitext(script)[0] + '.ipynb'
    subprocess.run(["jupytext", "--to", "notebook", script, "--output", output_file])
    print(f"Converted {script} to {output_file}")
```

```output
[jupytext] Reading train_xgboost.py in format py
[jupytext] Writing train_xgboost.ipynb
Converted train_xgboost.py to train_xgboost.ipynb
[jupytext] Reading train_nn.py in format py
[jupytext] Writing train_nn.ipynb
Converted train_nn.py to train_nn.ipynb
```
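The loop above converts every `.py` file in the directory, including ordinary modules that were never notebooks, and jupytext will overwrite an existing `.ipynb` of the same name. A more cautious sketch relies on a heuristic: by default, jupytext writes a commented YAML header containing a `# jupyter:` line at the top of the `.py` files it creates, so files without that line are skipped, as are scripts whose notebook already exists unless you pass `overwrite=True`. The function names are hypothetical, and a header removed by hand (or disabled through jupytext's metadata settings) would make a real notebook script look like a plain module.

```python
from pathlib import Path

def looks_like_jupytext_script(path, max_lines=20):
    """Heuristic: jupytext's default header includes a '# jupyter:' line near the top."""
    with open(path, encoding="utf-8") as f:
        for _, line in zip(range(max_lines), f):  # read at most max_lines lines
            if line.strip() == "# jupyter:":
                return True
    return False

def scripts_to_convert(root=".", overwrite=False):
    """Return (script, notebook) pairs for .py files in root that look like jupytext exports."""
    pairs = []
    for script in sorted(Path(root).glob("*.py")):
        target = script.with_suffix(".ipynb")
        if not looks_like_jupytext_script(script):
            continue  # plain Python module, not a converted notebook
        if target.exists() and not overwrite:
            continue  # don't clobber an existing notebook
        pairs.append((script, target))
    return pairs
```

You could then loop over `scripts_to_convert()` and run jupytext on each pair, as in the loop above.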


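To export a notebook to Markdown (for example, to share a rendered copy of your work), you can use `jupyter nbconvert`:

```python
!pwd
!jupyter nbconvert --to markdown Interacting-with-code-repo.ipynb
```

```output
/home/ec2-user/SageMaker/test_AWS
[NbConvertApp] Converting notebook Interacting-with-code-repo.ipynb to markdown
[NbConvertApp] Writing 25648 bytes to Interacting-with-code-repo.md
```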


:::::::::::::::::::::::::::::::::::::: keypoints 

- Use a GitHub PAT for HTTPS-based authentication in temporary SageMaker notebook instances.
- Securely enter sensitive information in notebooks using `getpass`.
- Converting `.ipynb` files to `.py` files helps with cleaner version control and easier review of changes.
- Adding `.ipynb` files to `.gitignore` keeps your repository organized and reduces storage.

::::::::::::::::::::::::::::::::::::::::::::::::