# Software Introduction

####

### 1. Anaconda
- An all-inclusive resort for running software: Jupyter Notebook, Python, and others.
- Each piece of software runs inside its own local environment, preventing conflicts with the environment of your physical machine.
- Default local environment is "base", but one can create new environments per project/pipeline.
    
> *An environment is a virtual computer with (in our case) its own Python installation, python libraries/packages, and specific library/package versions, which can be completely different from what is installed on the physical machine or other environments.*
>
>> **When linked to a python pipeline**
>> - Helps ensure successful runs across time.
>>   - Runs under the same conditions in which it was ***successfully*** built.
>>   - Prevents breaks caused by future updates to Python packages.
>> - Helps ensure successful runs across separate physical machines.
>>   - Gives the user the software recipe to build the correct environment, instead of using their machine's native environment.

####

### 2. Jupyter Notebook
- Included in the Anaconda distribution.
- Useful when you want to run python interactively: explore the data, generate plots, run and test python scripts as one builds them. 
- To run a python script non-interactively, one exports the notebook as an **Executable Script**. This creates a python/**.py** file used to either:
> - store python functions that collectively build a package — packages allow you to easily call functions inside other python scripts.
> - execute the pipeline: read data, process data into results, output results.
>> ***Note:*** *jupyter notebooks are ***not*** used to run the pipeline or build the pipeline's associated python package.*
- Can run under any Python environment (or "kernel") created within Anaconda, i.e. it can emulate the true pipeline environment.
    
####

### 3. Visual Studio Code
- A text editor with Python language support (syntax highlighting, autocompletion, error checking) provided by an extension called *Pylance*.
- It can run python code but I use it to easily edit python scripts (.py), rather than run code. 
> **Check that Pylance is installed**
> - Open up VSC
> - Go to `File` > `New File` > dropdown menu
> - Check in the dropdown menu for the "Python File" option. If you don't see it, you need to install Pylance.
>
>>
>> **Install Pylance**
>> - Open up VSC
>> - Select the Extensions icon (the 5th icon down in the left pane)
>> - Search bar: "Pylance"
>> - Click on Pylance and install it. 

####

### 4. GitHub
- **GitHub** is a cloud service that stores repositories of code, allowing for simultaneous collaboration.
> ***It's Box but for code, not data!***
- Cloud repositories are downloaded, or ***cloned***, onto your physical machine. 
> - make changes locally
> - **push** those changes to the cloud repository

####

### 5. Git

- **Git** is version control software through which your local repository and the cloud repository communicate.
- It comes with a suite of commands that give users control over that relationship.
- The emphasis is on **CONTROL**. 
> - User control can become very complex when there are several collaborators making changes daily and simultaneously.     
>> **For our purposes, we will avoid that complexity and only need to learn a small number of commands, simple version-control concepts, and basic best practices.**

**Main Git commands run locally in Git Bash**
> - `git init` — turn a local folder into a Git repository
> - `git clone <repository url>` — copy a repo in the cloud to your local computer
> - `git checkout -b <name>` — create a "branch" on which to make changes to the repo.
> - `git status` — show which local files are new, changed, or staged since the last commit, and whether your branch is ahead of or behind the cloud copy it tracks
> - `git add <file>` or `git add .` — stage a new or changed file, or all of them ("."), for the next commit
> - `git commit -m "<description>"` — record the staged changes under a single purpose, preparing them to be pushed to the cloud
> - `git rm <file>` — delete a file and stop tracking it in the repo (use `git rm --cached <file>` to stop tracking it but keep the local file)
> - `git push -u origin main` or `git push -u origin <branch>` — push the local changes to the repo or to a branch off the repo.
> - `git config --global user.name "<your name>"` — set the name recorded on your commits
> - `git config --global user.email "<your email>"` — set the email recorded on your commits; use the email on your GitHub account so commits link to it

####
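
The commands listed above can be tried without touching any real project. The following is a minimal sketch that builds a throwaway repository in a temporary folder; the file name `run.py`, the branch name `team-retreat/demo`, and the user name and email are illustrative assumptions, and the identity is set for this demo repo only (no `--global`).

```shell
# Work in a throwaway folder so nothing real is touched.
demo="$(mktemp -d)"
cd "$demo"

git init -q .                                # turn the folder into a Git repository
git config user.name "Demo User"             # identity for this demo repo only
git config user.email "demo@example.com"

echo "print('hello')" > run.py               # new file: Git sees it as untracked
git add run.py                               # stage it for the next commit
git commit -q -m "Add run.py"                # record the staged change

git checkout -q -b team-retreat/demo         # create and switch to a new branch
echo "# a change" >> run.py                  # modify the tracked file

changed="$(git status --porcelain)"          # short listing of uncommitted changes
branch="$(git rev-parse --abbrev-ref HEAD)"  # name of the current branch

echo "branch:  $branch"
echo "changes: $changed"
```

The last two lines report the branch you are on and the file that has been modified but not yet committed, which is the same information `git status` gives in its longer form.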

### 6. Git Bash

- **Git Bash** is a command-line terminal for Windows that lets you run Git commands and, as a bonus, many Linux-style system commands inside a Bash shell.
- Much Python tooling and documentation assumes a Unix/Linux-style shell, which Git Bash provides on Windows.
- Most servers run Linux with Bash, so software and data engineers tend to be far more fluent in Bash than in the Windows command line.

**Common Linux-Bash system commands**

> - `cd <path/to/folder>` — navigate to a folder
> - `pwd` — print the directory you are in
> - `mkdir <name>` — create a folder  
> - `ls` — list files in the current directory  
> - `cp <file> <path>` — copy a file to a new location  
> - `rm <file>` — delete a file  
> - `touch <file.ext>` — create an empty file with the given name and extension
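
####

The system commands above can be practiced in a temporary folder before using them on real files. This is a small sketch; the folder and file names are made up for the demo.

```shell
start="$(mktemp -d)"         # throwaway folder for the demo
cd "$start"                  # navigate into it
mkdir -p ir/ir-pipelines     # -p also creates the parent folder "ir"
cd ir/ir-pipelines
touch notes.txt              # create an empty file
cp notes.txt notes-copy.txt  # copy it under a new name
rm notes.txt                 # delete the original (it does not go to the Recycle Bin)
listing="$(ls)"              # capture what is left in the folder
echo "$listing"              # prints: notes-copy.txt
pwd                          # print the folder you are in
```

Note that `rm` deletes permanently, so it is worth running `ls` and `pwd` first to confirm where you are.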


# _________________________________________________________________________________
# _________________________________________________________________________________
####


# Initial Setup — Only Run Once
####

## Sign up for a GitHub account with your UD email

- Go to `https://github.com/signup`

####

## Connect to the *ir-team-retreat* Repository

####

>#### STEP 1 — Open Git Bash
>
>####
>
>#### STEP 2 — Create local directories
>
>> - `mkdir -p C:/Users/<username>/ir/ir-pipelines`
>>   
>> - `cd C:/Users/<username>/ir/ir-pipelines`
>>
>> - `pwd`
>
>####
>
>#### STEP 3 — Clone the Repository
>
>> - `git clone https://github.com/sruddy1/ir-team-retreat.git`
>
>####
>
>#### STEP 4 — Create Your Personal Branch
>
>> - `cd ir-team-retreat`
>>
>> - `pwd`
>> 
>> - `git checkout -b team-retreat/<your-first-name>`
>>
>>   - Creates a ***branch*** in your local repo, which is essentially your own working copy of the repo. It appears on GitHub once you push it.
>>   - Separates your changes from the default branch, referred to as "main" or "master".
>>   - Allows changes to be made and pushed to GitHub without updating "main".
>>   - Prevents conflicts between users who are using separate branches.

####
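
To see why a personal branch protects "main", the sketch below commits a change on a branch and then switches back. It runs in a temporary folder; the file contents, branch name, and identity values are illustrative assumptions, not taken from the real repository.

```shell
demo="$(mktemp -d)"
cd "$demo"
git init -q .
git config user.name "Demo User"          # identity for this demo repo only
git config user.email "demo@example.com"

echo "root: C:/Users/original" > config.yaml
git add config.yaml
git commit -q -m "Initial config"
git branch -M main                        # make sure the default branch is named "main"

git checkout -q -b team-retreat/demo      # personal branch
echo "root: C:/Users/demo" > config.yaml  # edit the file on the branch
git commit -q -am "Point config at my machine"
on_branch="$(cat config.yaml)"

git checkout -q main                      # switch back to main
on_main="$(cat config.yaml)"

echo "on branch: $on_branch"
echo "on main:   $on_main"
```

The file on "main" still holds its original contents, while the edit lives only on the personal branch.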

## Set up Git Bash with Anaconda Commands and Python

####

>#### STEP 1 — Open Visual Studio Code
>
>####
>
>#### STEP 2 — Turn on View Hidden Files
>
>> - Open `File Explorer` > `View` > `Show` > Select `Hidden Items`
>
>####
>
>#### STEP 3 — Confirm ~/.bashrc Exists
>
>> - Open `File Explorer` > Navigate to `C:\Users\<username>` > Search for `.bashrc`
>
>> - If it does not exist:
>>   - Open up `Git Bash`
>>   - Run `touch ~/.bashrc`
>
>####
>
>#### STEP 4 — Update ~/.bashrc
>
>> - Go to `File` > `Open File` > Open `C:\Users\<username>\.bashrc`
>>
>> - Confirm/Find path to conda.sh based on installation type:
>>
>>   - All Users Installation: `/c/ProgramData/anaconda3/etc/profile.d/conda.sh`
>>   - Single User Installation: `/c/Users/<username>/anaconda3/etc/profile.d/conda.sh`
> 
>> - Confirm/Find path to anaconda3 folder based on installation type:
>>
>>   - All Users Installation: `/c/ProgramData/anaconda3`
>>   - Single User Installation: `/c/Users/<username>/anaconda3`
>
>> - Add the following lines
>>
>>   ```
>>   # Load conda's shell setup only if the script exists at this path
>>   if [ -f "<path-to-conda.sh>" ]; then
>>       . "<path-to-conda.sh>"
>>   fi
>>   ```
> 
>> - Add the following command
>>
>>   `export PATH="<path-to-anaconda3>:$PATH"`
>>
>
>> - Save & Close File
>
>> - Open Git Bash (if already open, close and reopen)
>>
>>   - `source ~/.bashrc`
>>
>>   - `conda init bash`
>>  
>>   - Close & Reopen Git Bash
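
####

After reopening Git Bash, a quick check can confirm that the shell now finds both tools. This is a sketch using the standard `command -v` lookup; the wording of the messages is my own.

```shell
checked=0
for tool in conda python; do
  if command -v "$tool" >/dev/null 2>&1; then
    # command -v prints a path for programs, or just the name for shell functions
    echo "$tool is available as: $(command -v "$tool")"
  else
    echo "$tool was NOT found; re-check the paths added to ~/.bashrc"
  fi
  checked=$((checked + 1))
done
```

If either tool is reported as not found, the most likely cause is a path in `~/.bashrc` that does not match where Anaconda was installed.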




# _________________________________________________________________________________
# _________________________________________________________________________________
####

# Run an Existing Pipeline Stored on GitHub


####

## Update Configuration File

####

>#### STEP 1 — Open Visual Studio Code
>
>####
>
>#### STEP 2 — Open the Pipeline Configuration File in VSC
>
>> - Click `File` > `Open File`
>>   
>> - Navigate to `C:\Users\<username>\ir\ir-pipelines\ir-team-retreat\configs\config.yaml`
>
>####
>
>#### STEP 3 — Change file paths to match your local machine
>
>> - **root**: C:/Users/`<username>`/Box
>>   
>> - **pell_dir**: C:/Users/`<username>`/Box/Office of Decision Support Analyst Projects/ATI Kessler/Fall 2025 Pell Reporting
>>
>> - **retention_dir**: C:/Users/`<username>`/Box/Inst Res Collab/Tableau Server Data Sources/Retention
>>
>> - **enrollment_dir**: C:/Users/`<username>`/Box/Inst Res Collab/Tableau Server Data Sources/Census Date Enrollment
>>
>> - **results_dir**: C:/Users/`<username>`/Box/Inst Res Collab/Team Retreat Pipeline Results/`<First Name>` — ***first letter capitalized***
>
>####
>
>#### STEP 4 — Save & Close

####
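
A wrong path in the config file is an easy mistake to make, so it can help to check that every folder exists before running the pipeline. The sketch below assumes the config is a flat list of `key: path` lines, which is a guess based on the keys listed above; the real `config.yaml` may be structured differently. It builds its own small demo config in a temporary folder rather than reading the real one.

```shell
demo="$(mktemp -d)"
mkdir -p "$demo/Box/Results"

# Demo config: two folders that exist and one that does not.
cat > "$demo/config.yaml" <<EOF
root: $demo/Box
results_dir: $demo/Box/Results
pell_dir: $demo/Box/Missing Folder
EOF

missing=0
while IFS= read -r line; do
  key="${line%%:*}"    # text before the first colon
  path="${line#*: }"   # text after the first ": " (keeps the colon in C:/...)
  if [ -d "$path" ]; then
    echo "ok       $key"
  else
    echo "MISSING  $key -> $path"
    missing=$((missing + 1))
  fi
done < "$demo/config.yaml"

echo "$missing folder(s) missing"
```

To try it on a real file, the `done < ...` line would point at your own config instead, provided that file follows the same one-path-per-line layout.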

## Build & Activate the Python Virtual Environment

####

>#### STEP 1 — Open Git Bash
>
>####
>
>#### STEP 2 — Deactivate Existing Environment
>
>> - `deactivate`
>>   - A `deactivate: command not found` message is good: no virtual environment was active.
>>   - If no error appears, that's also good: an environment was active and is now deactivated.
>
>####
>
>#### STEP 3 — Build the Virtual Environment
>
>> - `cd C:/Users/<username>/ir/ir-pipelines/ir-team-retreat`
>>   
>> - `python -m venv .venv`
>
>#### 
>
>#### STEP 4 — Activate the Virtual Environment
>
>> - `source .venv/Scripts/activate`
>>
>>   - This points `python` and `pip` at the virtual environment, so the packages you install next stay inside it.
>

####
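
If you are unsure whether activation worked, the activate script sets the `VIRTUAL_ENV` variable, which can be inspected directly. A small sketch, with message wording of my own:

```shell
# VIRTUAL_ENV is set by the venv activate script and cleared by deactivate.
if [ -n "${VIRTUAL_ENV:-}" ]; then
  venv_state="active ($VIRTUAL_ENV)"
else
  venv_state="not active"
fi
echo "Virtual environment: $venv_state"
```

When the environment is active, the path shown should end in `.venv` inside the repository folder.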

## Install Packages into the Virtual Environment

####

>#### STEP 1 — Install External Python Packages into the Virtual Environment
>
>> - `pip install -r requirements.txt`
>>
>>   - `requirements.txt` contains a list of packages along with their version numbers used to build the pipeline.
>
>####
>
>#### STEP 2 — Install the Pipeline Python Package into the Virtual Environment
>
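>> - `pip install -e .`
>>
>>   - Installs the pipeline's own package in editable mode, so edits to its source files take effect without reinstalling.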
>

####
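
The value of `requirements.txt` comes from exact version numbers: a package listed without one will install whatever is newest on the day you run `pip`. The sketch below writes a small example file and flags unpinned lines. The package names and versions are illustrative only and are not the pipeline's actual requirements.

```shell
# Example recipe: two pinned packages and one left unpinned.
recipe="$(mktemp)"
cat > "$recipe" <<'EOF'
pandas==2.2.2
pyyaml==6.0.1
openpyxl
EOF

# Keep only the lines that do not pin an exact version with "==".
unpinned="$(grep -v '==' "$recipe" || true)"
echo "Unpinned packages: ${unpinned:-none}"
```

Here the check reports `openpyxl`, the one line without a version number.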

## Run the Pipeline

####

>
> - `python run.py`
>
> - `deactivate`
>


####

## Check Results Folder

####

> - Use File Explorer to navigate to `C:\Users\<username>\Box\Inst Res Collab\Team Retreat Pipeline Results\<First Name>` and confirm the pipeline successfully output the results file.

####

## Push Branch to GitHub

####

> - `git status` : see what files have been changed locally
>   
> - `git add .` : add all changed files to the commit
>> - Ignore `LF will be replaced by CRLF...` warning.
>
> - `git commit -m "Successfully ran pipeline"` : collect changes into a commit
>   
> - `git push -u origin team-retreat/<name>` : push the commit to your branch of the repo on GitHub (this does not change the main branch)

####
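
The claim that pushing a branch leaves "main" untouched can be checked locally. In this sketch a bare repository in a temporary folder stands in for GitHub; all names, file contents, and identity values are assumptions made for the demo.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/cloud.git"       # stand-in for the repo on GitHub
git clone -q "$work/cloud.git" "$work/local"
cd "$work/local"
git config user.name "Demo User"           # identity for this demo repo only
git config user.email "demo@example.com"

echo "v1" > results.txt
git add .
git commit -q -m "First version"
git branch -M main
git push -q -u origin main                 # publish main

git checkout -q -b team-retreat/demo       # personal branch
echo "v2" > results.txt
git add .
git commit -q -m "Successfully ran pipeline"
git push -q -u origin team-retreat/demo    # publish only the branch

# Read the file as stored on each remote branch.
main_file="$(git show origin/main:results.txt)"
branch_file="$(git show origin/team-retreat/demo:results.txt)"
echo "main:   $main_file"
echo "branch: $branch_file"
```

The copy on `origin/main` still holds the first version, while the pushed branch holds the new one. Cloning the empty stand-in repository prints a harmless warning that can be ignored.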

## Run Branch in the Future

####

> - Open Git Bash
> - `cd C:/Users/<username>/ir/ir-pipelines/ir-team-retreat` : Navigate to the repo on your local machine.
> - `git checkout team-retreat/<name>` : switch to your existing branch if the analysis hasn't been updated; otherwise, create a new branch from main as done previously
> - Then,
>> - Update Config File
>> - Build environment
>> - Activate environment
>> - Install External Packages
>> - Install Pipeline Package
>> - Run Pipeline
>> - Deactivate Environment
>> - git add > commit > push changes (if you want to keep a record)

####

## All Commands

####

> - Open Git Bash
>   
> - If the repo is not yet on your local machine:
>>   - `git clone https://github.com/<github-account-name>/<name-of-repo>.git`
>     
> - `cd <path-to-repo>/<name-of-repo>`
>
> - `git checkout main` : switch back to main first, so the new branch starts from main rather than from whichever branch you last used.
>
> - `git checkout -b <branch-folder>/<branch-name>`
>
> - Update config file using VSC located here: `<name-of-repo>/configs/config.yaml`
>
>> - Make sure all the directories exist.
>
> - `python -m venv .venv`
>
> - `source .venv/Scripts/activate`
>
> - `pip install -r requirements.txt`
>
> - `pip install -e .`
>
> - `python run.py`
>
> - `deactivate`
>
> - `git add .`
>
> - `git commit -m "Type informative message"`
>
> - `git push -u origin <branch-folder>/<branch-name>`
>
####


# Create a Python Pipeline

####

### Folder Structure


# _________________________________________________________________________________
# _________________________________________________________________________________
####

# Convert Python Pipeline to a Git Repository

####
