# Setting Things Up

```yaml
Course:   DS 5001 Exploratory Text Analytics
Module:   01 Getting Started
Topic:    Setting up your environment
Author:   R.C. Alvarado
Date:     18 January 2024
```

**Purpose**

This notebook describes how to set up the working environment required to perform the work in this course.

It includes how to access the course materials and set up your local workspace.

## Your Local Workspace

1. Create a directory for your coursework on your local machine (or wherever you are doing your work). \
You should call it `DS5001`. This directory is your workspace for this course.

2. Within this directory, create the following subdirectories: `data` and `output`.
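
The two steps above can also be done from Python. This is a minimal sketch, not part of the course materials: `make_workspace` and its `parent` argument are illustrative names, and where you put the workspace is up to you.

```python
from pathlib import Path


def make_workspace(parent: Path, name: str = "DS5001") -> Path:
    """Create the course workspace with its `data` and `output` subdirectories."""
    workspace = parent / name
    for sub in ("data", "output"):
        # parents=True also creates the workspace itself;
        # exist_ok=True makes the call safe to repeat.
        (workspace / sub).mkdir(parents=True, exist_ok=True)
    return workspace


# Example call (assumption: you want the workspace in your home directory):
# make_workspace(Path.home())
```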

## Forking the Course Repo

1. Fork the course repo to your personal GitHub account.

2. Clone the forked repo into your local workspace, i.e. the `DS5001` directory you just created. \
You may want to clone it into a directory named `repo`, as below, for a cleaner directory structure (use the URL of your own fork in place of the one shown):

```bash
git clone git@github.com:ontoligent/DS5001-2024-01-R.git repo
```

## Adding Your Environment File

1. Look inside the `/lessons` folder of the locally cloned repo and copy the file `env-sample.ini` to some place outside of your repo, such as directly within the course workspace. \
Name the copied file `env.ini`. \
Your directory should look something like this now:

```bash
/DS5001
    /data
    /output
    /repo
    env.ini
```

2. Look inside the file and update the values to reflect your environment. \
Here's what mine looks like:

```ini
[DEFAULT]
course_id = DS5001
course_term = Spring 2024
course_delivery = Residential
user_name = Rafael C. Alvarado
user_email = rca2t@virginia.edu
base_path = /Users/rca2t1/Dropbox/Courses/DS/DS5001/DS5001_2024_01_R
local_lib = %(base_path)s/repo/lessons/lib
data_home = %(base_path)s/data
output_dir = %(base_path)s/output
```

3. This file will be read from within your notebooks using Python's <tt>configparser</tt> module. \
Here is a sample cell block that will appear in most of your notebooks:

```python
import configparser
config = configparser.ConfigParser()
config.read("../../../env.ini")
data_home = config['DEFAULT']['data_home']
output_dir = config['DEFAULT']['output_dir']
```

4. You may add more values to your `env.ini` file, such as credential information or the location of other resources on your system.
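
To see how the `%(base_path)s` references in the file are resolved, here is a minimal, self-contained sketch. It writes a throwaway `env.ini` to a temporary directory and reads it back; the `/tmp/DS5001` base path is a made-up value, not a path used in the course. `configparser.ConfigParser` applies basic interpolation by default, so `data_home` comes back with `base_path` already substituted.

```python
import configparser
import tempfile
from pathlib import Path

# Made-up values for illustration; your own env.ini will differ.
SAMPLE = """\
[DEFAULT]
base_path = /tmp/DS5001
data_home = %(base_path)s/data
output_dir = %(base_path)s/output
"""

with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / "env.ini"
    env_file.write_text(SAMPLE)

    config = configparser.ConfigParser()
    # read() returns the list of files it managed to parse.
    # An empty list means the path was wrong; read() itself does not raise.
    found = config.read(env_file)
    if not found:
        raise FileNotFoundError(f"Could not read {env_file}")

    data_home = config["DEFAULT"]["data_home"]
    output_dir = config["DEFAULT"]["output_dir"]

print(data_home)   # /tmp/DS5001/data
print(output_dir)  # /tmp/DS5001/output
```

Checking the return value of `read()` may be worth doing in your own notebooks: with a wrong relative path the config is simply empty, and the problem only surfaces later as a `KeyError` when you look up `data_home`.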

## The Workflow

1. Each time you want to update your forked repo on GitHub, such as when you start a new module, press the "Sync fork" button. This updates your fork with any new materials added by the professor. Make sure you have the <tt>main</tt> branch selected when you do this.

2. In your local repo, run `git pull origin main` from the <tt>main</tt> branch. To find out which branch you are on, run `git branch`; this lists all of your branches with an asterisk next to the current one. To switch to the <tt>main</tt> branch, run `git checkout main`.

3. To work on the notebooks in the repo without altering the state of the <tt>main</tt> branch, create a new branch for your work. For example, run `git checkout -b m01` if you are working on Module 1. You can follow this pattern for all of your work.

4. You can commit changes from within your branch and push them to your forked repo.

5. Note you don't need to merge the main branch with your working branch.
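
The local part of this workflow can be summarized as a command sequence. The sketch below only builds and prints the commands; it does not run them. `module_branch` and `workflow_commands` are hypothetical helper names, the `git add` and `git commit` lines and the commit message are placeholders not spelled out above, and the remote is assumed to be named `origin`.

```python
def module_branch(module: int) -> str:
    """Return a branch name in the pattern used above, e.g. 1 -> 'm01'."""
    return f"m{module:02d}"


def workflow_commands(module: int) -> list:
    """List the git commands for starting work on a module.

    Assumes you have already synced your fork on GitHub.
    """
    branch = module_branch(module)
    return [
        "git checkout main",                  # switch to main
        "git pull origin main",               # bring in the synced materials
        f"git checkout -b {branch}",          # create and switch to a working branch
        "git add .",                          # placeholder: stage your changes
        f'git commit -m "Work on {branch}"',  # placeholder commit message
        f"git push origin {branch}",          # push the branch to your fork
    ]


for cmd in workflow_commands(1):
    print(cmd)
```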

**Note that there are other ways to do this:**

* Just copy the files locally from the course repo.
* Create one branch for your work and do a merge each time the content is updated.

## Getting the Data

1. The notebooks you work on will often use data that have been curated for this course. The data are in this Dropbox folder:

> https://www.dropbox.com/scl/fo/0k07nufmrurva2nv8m4vn/h?rlkey=s1ethf3moqagc33d9upomwoa7&dl=0

2. Copy folders and files as needed to your local workspace.
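
Once a folder has been downloaded from Dropbox, copying it into your workspace can be scripted. This is a minimal sketch assuming Python 3.8 or later; `copy_into_data_home` is a hypothetical helper, and the paths in the commented example are made up.

```python
import shutil
from pathlib import Path


def copy_into_data_home(source: Path, data_home: Path) -> Path:
    """Copy a downloaded folder into data_home, keeping its name."""
    target = data_home / source.name
    # dirs_exist_ok=True (Python 3.8+) allows re-running the copy
    # when the target folder already exists.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


# Example call (made-up paths):
# copy_into_data_home(Path.home() / "Downloads" / "some-folder", Path(data_home))
```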