# GitHub Refresher Lab #

As we enter the homestretch and your focus turns to the capstones, now is a good time to refresh the GitHub basics. GitHub is a great place to showcase your work (uploading your capstone is encouraged, but not required).



## Getting Started ##
### Create GitHub Repository ###
- Begin by creating a new, empty repository.  Do not select any of the initialization options, as we will be adding our own files later.

![Screen%20Shot%202021-01-26%20at%2011.21.08%20AM.png](attachment:Screen%20Shot%202021-01-26%20at%2011.21.08%20AM.png)

### Create a local repository on the command line and link to GitHub ###
- From here, follow the instructions provided to create the repository on the command line.
- If a `README.md` file already exists in your project folder, skip the first step, `echo "# demo" >> README.md`.


![Screen%20Shot%202021-01-26%20at%2011.27.46%20AM.png](attachment:Screen%20Shot%202021-01-26%20at%2011.27.46%20AM.png)

### README.md ###

The `README.md` file is critically important to your repository.  Think of it as the landing page for your project.  It should give a brief but informative introduction to the project, the problem statement, and the process.  It should also help the reader understand the entire repository: include a guide to any package installations, plus a table of contents and order of operations if the project is split among multiple notebooks.

### .gitignore ###

Another important file to explore is `.gitignore`.  It lists the directories, files, and/or extensions that Git should not track, which keeps them out of the GitHub repository.

Common items to ignore:
- Operating system files
- Jupyter notebook checkpoints
- `.zip` files
- `.csv` files if data is hosted publicly
- `env/` directories
- Anything else you want ignored

It is a plain text file with one item (or pattern) to ignore per line.  Lines beginning with `#` are comments, and adding them is a good way to keep the file clear.
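
As a rough illustration of the file layout described above, the sketch below writes a starter `.gitignore` from Python. The function name `write_starter_gitignore` and the specific patterns are assumptions chosen for illustration, and they cover only part of the list above; adjust them to your own project.

```python
from pathlib import Path

# Hypothetical starter patterns; adjust them to your own project.
STARTER_GITIGNORE = """\
# Operating system files
.DS_Store
Thumbs.db

# Jupyter notebook checkpoints
.ipynb_checkpoints/

# Archives
*.zip

# Raw data (only if the data is hosted publicly elsewhere)
*.csv
"""


def write_starter_gitignore(repo_dir):
    """Write the starter patterns to <repo_dir>/.gitignore unless one already exists."""
    target = Path(repo_dir) / ".gitignore"
    if not target.exists():
        # Never overwrite a .gitignore that someone has already customized.
        target.write_text(STARTER_GITIGNORE)
    return target
```

Calling `write_starter_gitignore(".")` from the project root would create the file there. Writing it by hand in a text editor works just as well.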

### License ###

The third initialization option is to add a license.  A number of open source licenses are available.  A popular one is the MIT License, a permissive license suited to publishing a project for anyone to use however they see fit.  A license is not required, but you may add an open one if you like.  If you happened to invent a proprietary algorithm or novel approach, you may want to look into licensing it for your own benefit instead.

## Exporting Environment / Preparing List of Dependencies ##

One of the most important things to remember when publishing your project is that the conda environment you have meticulously crafted to run your magnum opus currently exists only on your machine!  You need to export it to an `environment.yml` file and share that file as well.

Alternatively, for a simpler project with a less extravagant environment, you can provide a list of dependencies in a `requirements.txt` file for easy pip installation.

In [1]:
%%bash
# navigate up to the parent directory of the project
cd ..

# create the folder for the environment file and any other environment-related files
# (-p avoids an error if the folder already exists)
mkdir -p env

# navigate into the environment folder
cd env

# export current active environment to .yml file
conda env export > environment.yml


In [2]:
%%bash
# Navigate to the parent directory of the project
cd ..

# create requirements.txt file 
touch requirements.txt

# Add list of required packages to requirements.txt with nano or other text editor

Great!  Now anyone who clones your repository only has to run `pip install -r requirements.txt` to install the required dependencies into their preferred environment.

Or, if you exported the `environment.yml` file, share it as well.  To create an environment from a `.yml` file, run `conda env create --file environment.yml`.
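
Instead of typing the package list by hand, one option is to look up the installed versions and build pinned `name==version` lines for `requirements.txt`. The sketch below assumes Python 3.8+ (for `importlib.metadata`). The function name `pinned_requirements`, its `get_version` parameter, and the example package names are hypothetical.

```python
from importlib import metadata


def pinned_requirements(package_names, get_version=metadata.version):
    """Return 'name==version' lines for the packages that are installed."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            # Skip packages that are not installed in the active environment.
            continue
    return lines


if __name__ == "__main__":
    # Example package names; replace them with the ones your project imports.
    print("\n".join(pinned_requirements(["pandas", "scikit-learn"])))
```

Pinning exact versions makes the install reproducible, at the cost of having to update the file when you upgrade a package.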

### Bonus! Importing Custom Functions ###

As a bonus, I thought this was pretty cool.  You can write your own functions and save them to a Python script to be imported throughout future projects!  As an example, I ran into a project where I needed to perform a multilabel `train_test_split` and none of the built-in functions were working for me.  I found this great `multilabel.py` script and added it to the `data_preprocessing` folder inside the `src` folder of our project.

As an additional bonus, we can use the `os` module to build the path to the `src` directory without having to hard code it!

- To showcase the example, I also created a fun/dumb function in `hello.py` and stored it in the `hello_world` folder within `src`.

In [3]:
import os
import sys

# add the 'src' directory as one where we can import modules
src_dir = os.path.join(os.getcwd(), os.pardir, 'src')
sys.path.append(src_dir)

# Similar to hard coding this but dynamic (so you can copy and paste, or clone this project repository)
# path = '/Users/bpolzin/Documents/driven_data/predicting_vaccines/src'
# sys.path.append(path)

from data_preprocessing.multilabel import multilabel_sample_dataframe, multilabel_train_test_split
from hello_world.hello import hello_name_n_times
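
The path setup in the cell above can also be written with `pathlib`. This is a minimal sketch that assumes the same layout (a `src` folder one level above the notebook's working directory). The helper name `add_src_to_path` is hypothetical.

```python
import sys
from pathlib import Path


def add_src_to_path(notebook_dir=None):
    """Append <notebook_dir>/../src to sys.path once and return it as a string."""
    base = Path(notebook_dir) if notebook_dir is not None else Path.cwd()
    # resolve() collapses the '..' so the entry in sys.path is an absolute path.
    src_dir = str((base / ".." / "src").resolve())
    if src_dir not in sys.path:
        # Guard against duplicate entries when the cell is re-run.
        sys.path.append(src_dir)
    return src_dir
```

Resolving to an absolute path and checking for duplicates keeps `sys.path` tidy when a notebook cell is executed more than once.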

In [4]:
hello_name_n_times('Ben', 5)

Hello Ben! Welcome to the world!
Hello Ben! Welcome to the world!
Hello Ben! Welcome to the world!
Hello Ben! Welcome to the world!
Hello Ben! Welcome to the world!
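
The contents of `hello.py` are not shown in this lab, so the following is only a guess at what such a module might look like, written to reproduce the output above. The helper `build_greetings` is hypothetical and exists only to keep the printing logic easy to test.

```python
def build_greetings(name, n):
    """Return a list with n copies of the greeting for name."""
    return [f"Hello {name}! Welcome to the world!"] * n


def hello_name_n_times(name, n):
    """Print the greeting for name n times, one per line."""
    for line in build_greetings(name, n):
        print(line)
```

Saving something like this as `src/hello_world/hello.py` is what would make the `from hello_world.hello import hello_name_n_times` import work once `src` is on `sys.path`.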
