# STAT 440 Statistical Data Management - Fall 2021
## Week 01 Notes
### Created by Christopher Kinson


***


### Table of Contents

- [Intro to course](#intro-to-course) 
- [Rental Inspection Grades Listing Data](#rental-inspection-grades-listing-data) 
- [Intro to software](#intro-to-software)
  - [Markdown](#markdown)
  - [Python and Jupyter Lab](#python-jupyter)
  - [Git and GitHub Enterprise](#git-and-github-enterprise)
  - [Knitting and Rendering to Html](#knitting-and-rendering-to-html)
- [Three Ways to Interact With This Course](#three-ways-to-interact-with-this-course)
  - [Using GHE](#using-github)
  - [Using Git From The Command Line](#using-git-from-the-command-line)
  - [Using Git With RStudio](#using-git-with-rstudio)


***


## <a name="intro-to-course"></a>Intro to course

Welcome to STAT 440/CSE 440! These are the course notes, which can be found in the **stat440-fa21-course-content** repository or in GitHub Enterprise. **GitHub Enterprise may be called GHE for short.** **The images/screenshots in these notes may refer to Spring 2021, but ignore that. The information is accurate for Fall 2021.** The intent of this course is to discuss and demonstrate typical data wrangling and data management tasks for students who are comfortable using R or Python. These notes are written ideally for the student who is unfamiliar with such tasks or who has not seen data wrangling tasks on real datasets. If you are a practicing data engineer, then these notes may not be useful. If you came to these notes looking for information about Python and Jupyter Lab, please view the file with the .ipynb extension in the **stat440-fa21-course-content** repository or [here](https://github-dev.cs.illinois.edu/stat440-fa21/stat440-fa21-course-content/blob/master/notes/stat440-fa21-notes-week01-intro-to-course-and-software-python.ipynb).

I live in Urbana. 

![](http://2012.igem.org/wiki/images/e/e2/Urbana.jpg)

***This Photo by Unknown Author is licensed under CC BY***

I like Urbana, and because I'm a transplant, I like to learn more about these twin cities in my spare time. One way I like to learn about things is by looking at data. We live in a technologically advanced world, and there just so happen to be data portals about our locality.

- [City of Urbana Open Data](https://data.urbanaillinois.us/) https://data.urbanaillinois.us/

- [Champaign County Regional Data Portal](https://data.ccrpc.org/sv/) https://data.ccrpc.org/sv/


***


## <a name="rental-inspection-grades-listing-data"></a>Rental Inspection Grades Listing Data

The City of Urbana Open Data portal has one particularly interesting dataset, the [Rental Inspection Grades Listing Data](https://data.urbanaillinois.us/Buildings/Rental-Inspection-Grades-Listing/tn7v-fm2b/data), that I will use as the main dataset to apply the data management concepts throughout this course and these notes.

With this dataset, you will learn how to

- access and import data
- scrape a website
- format and coerce data
- assign data as objects
- arrange and reshape data
- filter and select data
- mutate data
- implement loops and iterative execution for by-group processing
- use regular expressions
- manipulate character strings
- validate data
- clean data
- summarize data
- combine data
- perform queries with SQL

But before we go any further, you should review the course syllabus (found in the **syllabus** directory of the **stat440-fa21-course-content** repository or [here](https://github-dev.cs.illinois.edu/stat440-fa21/stat440-fa21-course-content/blob/master/syllabus/stat440-fa21-syllabus.md)) and familiarize yourself with the software section. Below I say a bit more about the software used in this course and how it's going to help us. **I assume you have downloaded the necessary software at this point.**


***


## <a name="intro-to-software"></a>Intro to software

This course has come a long way since its inception and the early, well-defined notes of Maria Muyot. The course no longer offers SAS and instead caters to R and Python. A recent addition to the course is the use of Git and GHE. Together they will serve as the course learning management system (LMS) and as a way for students to appreciate the advantages of version control and collaboration.

### <a name="markdown"></a>Markdown

Markdown is a markup language with special syntax used to craft and design simple yet flexible text documents. Markdown permits users to author HTML, PDF, and MS Word documents. LaTeX syntax can be included in Markdown syntax as well. Markdown was created by John Gruber in 2004 and has since become popular. Markdown syntax is the main way to style text within RMarkdown, Jupyter Lab, and Jupyter Notebooks.

- Two nice references on Markdown syntax are https://www.markdownguide.org/basic-syntax/ and https://www.markdownguide.org/extended-syntax/. 

Here are some frequent Markdown syntax examples:  

  - **bold** text ( text enclosed with \*\* on both sides )
  
  - *italic* text ( text enclosed with \* on both sides )
  
  - lists (this is a list already) ( mark with \- )
  
  - tables ( pipes between columns and a new line with \-\-\-\|\-\-\- )
  
  Variable | Description
  ---|---
  Var1 | student ID
  Var2 | height of students
  
  - hyperlinks for URLs ( paste the URL )
  
  https://www.markdownguide.org/
  
  - hyperlinks for URL images ( mark with \!\[\](pastedURL) )
  
  ![](http://2012.igem.org/wiki/images/e/e2/Urbana.jpg)
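
Because Markdown is plain text, the table syntax above can also be generated by code. The sketch below is only an illustration: `markdown_table` is a hypothetical helper written for these notes, not part of any library, and it rebuilds the small example table shown above.

```python
def markdown_table(header, rows):
    """Return a pipe-delimited Markdown table as a single string."""
    lines = [" | ".join(header)]
    # The separator row needs one run of dashes per column.
    lines.append("|".join("---" for _ in header))
    for row in rows:
        lines.append(" | ".join(str(cell) for cell in row))
    return "\n".join(lines)


table = markdown_table(
    ["Variable", "Description"],
    [["Var1", "student ID"], ["Var2", "height of students"]],
)
print(table)
```

Pasting the printed text into a Markdown cell should render the same two-column table as the hand-written version.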


### <a name="python-jupyter"></a>Python and Jupyter Lab

**Python** is an open-source programming language gaining popularity for its adaptability for software development and other pipeline workflows. It is free to use and steadily improved by its users in statistics and data science. We usually write Python code in a **script** that is saved as a .py file. An **interpreter** runs the code, and the output is visible in a **console** (usually inside an integrated development environment).

- Click here to download [Python](https://www.python.org/downloads/)

- This is what Python looks like  ![](https://uofi.box.com/shared/static/uzisigjygcbucxp5qsr1drh4g8ku1anb.png)

#### I use Python 3! Proper functioning and accurate results often depend on the version you have!
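
If you are unsure which version you have, Python can report it. This is a minimal sketch using only the standard `sys` module; the helper name `is_python3` is made up for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3 or later."""
    return version_info[0] >= 3


# Report the version of the interpreter that is running this code.
print("Python version:", sys.version.split()[0])
print("Running Python 3 or later:", is_python3())
```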

**Jupyter Lab** is a notebook environment that can run Python code and promotes reproducibility. This platform allows for live coding demonstrations, changes, and visualizations. It runs thanks to a Python kernel. The combination of software allows for improved programming experiences and integrated data science education. The Python coding will be done using **code cells** and text will be written in **Markdown cells** with Markdown syntax. A single Notebook file (.ipynb) contains both code and markdown syntax.

- Click here to download [Jupyter Lab](https://jupyter.org/install.html)

- This is what Jupyter Lab looks like   ![](https://uofi.box.com/shared/static/kksn1mnmj709kbvsnt0en98tn8oka9co.png)

This document that you are reading is a notebook file. 

- For more details on using Jupyter Lab see https://jupyterlab.readthedocs.io/en/stable/getting_started/starting.html.

I can explicitly embed a code cell like this:


`In [1]: 1+3`

`Out[1]: 4`

We simply create a new cell and verify that it is marked as a code cell in the dropdown menu (see below).

![](https://uofi.box.com/shared/static/t0p7oiajyjkqzcwhjb8ppl3yvo9ikik8.png)

***We always want the code to execute!***

### <a name="knitting-and-rendering-to-html"></a>Knitting and Rendering to Html

For the assignments in this course, students will submit two files to their individual student repository: 1) either an .Rmd or .ipynb file and 2) an .html file. The .Rmd (or .ipynb) is the reproducible document file that contains your solutions to given assignments. These reproducible documents have a function that allows them to be rendered to a nicer web-browser-friendly format: html. Students need to make sure they always submit both files by the deadline to receive a proper grade. Rendering to html can happen as follows.

Using Jupyter Lab, click on the green "Play" icon.

![](https://uofi.box.com/shared/static/r6isun83wp0mjcrl6det8ur9lv4p2w22.png)
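
Outside of the Jupyter Lab window, a notebook can also be rendered to html with the `jupyter nbconvert` command-line tool. The sketch below is one possible way to call it from Python, assuming Jupyter is installed; the function names and the file name `homework00-netID.ipynb` are placeholders chosen for illustration.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook):
    """Build the command that renders a notebook to html."""
    return ["jupyter", "nbconvert", "--to", "html", str(notebook)]


def render_to_html(notebook):
    """Run nbconvert and return the path where the .html file is expected."""
    subprocess.run(nbconvert_command(notebook), check=True)
    return Path(notebook).with_suffix(".html")


# Only the command is printed here; call render_to_html(...) to actually run it.
print(" ".join(nbconvert_command("homework00-netID.ipynb")))
```

By default nbconvert exports the outputs already saved in the notebook, so run all cells and save before rendering (or add the `--execute` flag so that the notebook is re-run first).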

***


## <a name="git-and-github-enterprise"></a>Git and GitHub Enterprise

**Git** is a version control system and **GitHub Enterprise** (or GHE) is one platform for uniting git commands with a collaborative workflow. Git and GHE smooth collaboration by allowing multiple users to work on the same document on their own devices all at the same time. Then those users can submit their updates, describe their updates, and no one needs to rename the file. That file is updated in Git with the same file name it began with. Even if you are working alone on a code file, using version control can help alleviate confusion about what you were doing last time on that file. 

## <a name="git"></a>Git

Git is the software that allows us to connect to repositories (or repos for short) and collaborate on projects that may exist in GHE. Git does not have a graphical interface of its own, as it is the version control software itself. Usually, we use git through an interface. That interface can be your local machine's terminal or command prompt: 'Terminal' (macOS/Linux) or 'Git Bash' (Windows). Another interface can be a git client such as RStudio or GitHub Desktop. I'll say more about these and how to set up your connection to your repo in the sections below.

Here are some Git terms (in alphabetical order) we need to become familiar with.

- **branches**: pathways of the repo. We can work on branches without affecting the master and this may be useful for experimenting with something without affecting the main project.

- **cloning**: copying an existing repo from GHE to your local machine so that you can work on it locally

- **commit**: a snapshot of the last saved version of a file and the corresponding message explaining what changed in this version of the file

- **diff**: the set of differences between commits; observing diffs helps keep track of what has changed across two commits

- **directory**: a folder or sub-folder within a repository

- **fetching**: downloading new commits and files from a remote without yet merging them into your working directory

- **history**: the tracking of all changes to a file

- **master**: the main track of your repo. The master is also considered a branch. **GHE may eventually replace the name "master" with a new name due to social justice advocates who oppose the oppressive history of the term "master"**

- **merge conflict**: when two separate branches have made edits to the same line in a file, or when a file has been deleted in one branch but edited in the other. Merge conflicts can be fixed by manually editing the problem file and then merging and re-committing/pushing.

- **merging**: combining the changes from one branch (or from a remote) into another branch, often the master. Merging may be used to resolve conflicts when collaborators commit changes on the same file.

- **pulling**: a single command that does both fetching and merging

- **pushing**: uploading the commits made on your local machine to the repo in GHE, which makes your updated files visible there

- **remote**: the copy of the repo hosted on a server (GHE in this course) that your local clone is connected to

- **repo (or repository)**: the main folder or space in which a project exists

- **staging (or the staging area)**: a file that stores information about what's being committed. We want to stage a file after we've updated it in some way so that it can be committed
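
To make the term **diff** more concrete, the sketch below compares two versions of a small file. It uses Python's standard `difflib` module rather than Git itself, and the file contents are invented for illustration, but the `-` and `+` lines it prints follow the same unified format that `git diff` shows.

```python
import difflib

# Two invented versions of the same file, one line per list element.
old_version = ["# Homework 00", "Problem #1", ""]
new_version = ["# Homework 00", "Problem #1", "My solution: 1 + 3 = 4"]

diff_lines = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="previous commit",
        tofile="current commit",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

Lines starting with `-` were removed and lines starting with `+` were added between the two versions.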

## <a name="github-enterprise"></a>GitHub Enterprise

In this course we are using **GitHub Enterprise**, which we call "GHE" for simplicity, as an LMS and as a tool for self-collaboration. **LMS means that GHE will function as the main course space: students will retrieve assignments, submit their assignments in their individual student repos, see their grades, get course updates and announcements, communicate with each other and with the course staff, access necessary course content, etc. Here self-collaboration means that you will work on a file, such as a homework assignment, and make several commits for that file as you complete the problems. Each time you make a commit, it is best to push the file to your individual student repo to submit it. Thus, the only collaboration on your file versions is between you and yourself.** The course exists in GHE at the website https://github-dev.cs.illinois.edu/stat440-fa21/stat440-fa21-course-content. The course website landing page is the README.md file, which will serve as the Course Announcements. You should check the course website frequently for updates and course announcements. Also, as the syllabus mentions, one of the main ways to communicate is through the Issues board, which functions as a discussion board much like Piazza. To access the Issues board, go to the course landing page and click on the Issues tab.


***


## <a name="three-ways-to-interact-with-this-course"></a>Three Ways to Interact with this Course

There are three ways to interact with this course, which means there are three ways to interact with git.

1. Using GHE
2. Using Git from the command line
3. Using Git with RStudio

All three give you access to course content and allow assignment retrieval and submission. But only the first way (Using GHE) will allow you to use the Issues board for posting questions and seeing responses. The second and third ways still require some aspect of the first way because of GHE. Below I discuss the procedures for setting up and connecting your individual student repo to your local machine as well as how to submit a **homework00** file in each of the three ways.

In order to complete these procedures successfully, you must first do these two steps:

Step i) Log into GHE at https://github-dev.cs.illinois.edu/login with your netID and Illinois password. **If you've never used GHE through the University before, logging in will establish your account.**

![](https://uofi.box.com/shared/static/a92v8jwa20ck5vr41ladftx1o2zu0ud3.png)

Step ii) Create your individual student repo (named as your netID) by clicking on this link https://edu.cs.illinois.edu/create-ghe-repo/stat440-fa21/. Afterwards, you should see something like this.

![](https://uofi.box.com/shared/static/yb0um2vim0mpi1vsbkh3m5ufwvf849j8.png)

### <a name="using-github"></a>Using GHE

Actually, completing Steps i) and ii) is all that is needed to set up and connect your individual student repo. This isn't really a local machine kind of setup since you are choosing to interact with the course via GHE. The course staff will give you access to the main repo and course landing page **stat440-fa21-course-content** by adding you to the "students" team. If you do not have access to the **stat440-fa21-course-content** repo 24 hours after completing Steps i) and ii), please contact me at kinson2@illinois.edu.

Inside the **stat440-fa21-course-content** repo, there should be a minimum of the **syllabus** directory and the README.md file. See image below. *I say "minimum" because these notes are made much earlier than newer files and directories that appear in the* **stat440-fa21-course-content** *repo.* 

![](https://uofi.box.com/shared/static/qrms5v0frf7e5hmbgtkwadpe6hjwkae0.png)

You will simply refresh the page of the **stat440-fa21-course-content** repo to see any updates and course announcements. Refreshing the page equates to **pulling** the repo since you are using GHE and not working with git locally. 

To retrieve assignments, go to the particular assignment directory, **homework** for example, and click on the assignment you need to complete. I have made a fake homework assignment called **stat440-fa21-homework00.md** and its rendered html file called **stat440-fa21-homework00.html**. It should be in the **stat440-fa21-course-content/homework** portion of GHE. See image below.

![](https://uofi.box.com/shared/static/cmtfq21ge8onai2jrbsmiltqpiw5hqbx.png)

To begin the assignment, I advise students to click on the "Raw" button for a particular assignment (for right now, we are using the stat440-fa21-homework00.md).

![](https://uofi.box.com/shared/static/e0h25c5mb9m8igr4fyvhmknupiujnx34.png)

Then, copy all of the text on that page and paste that into a blank .Rmd (or .ipynb) file depending on your preferred software. 

![](https://uofi.box.com/shared/static/c4f36az80xadb3mu68yzscsxgqd4b7lf.png)

Now, save the file as homework00-netID.Rmd (or .ipynb) inside of your preferred software. 

![](https://uofi.box.com/shared/static/dq80b7bnvm6umf8wbbfdwludhh27aspg.png)

Now, complete the first problem by writing your solution beneath the Problem **#1** wording. Then render your reproducible document file to .html. Rendering to html is also called "knitting". See image below.

![](https://uofi.box.com/shared/static/v6930gkz99crm7ditex1i2i7tkadaw1w.png)

To submit your assignments, go to your individual student repo (named as your netID) in GHE and upload the files (both the .Rmd or .ipynb file and the .html file). You can upload by clicking "upload an existing file", then "choose your files", or by simply dragging and dropping the two files into your repo page.

![](https://uofi.box.com/shared/static/jtwa3sylclk4e73php2nbn8unno8unng.png)

You can do this multiple times for your assignment submissions which is why I say you have unlimited submissions. Just be sure that your reproducible document file (either .Rmd or .ipynb) and rendered file (.html) are up to date with each other. It is not a good idea to complete the assignment in the .Rmd file, but forget to render it to .html.
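
One way to catch a forgotten render is to compare when the two files were last saved. The sketch below is only an illustration of that idea: `render_is_current` is a hypothetical helper, and the demonstration uses throwaway files with made-up timestamps instead of a real assignment.

```python
import os
import tempfile
from pathlib import Path


def render_is_current(source, rendered):
    """Return True if the rendered file exists and is at least as new as the source."""
    source, rendered = Path(source), Path(rendered)
    if not rendered.exists():
        return False
    return rendered.stat().st_mtime >= source.stat().st_mtime


# Demonstration with throwaway files standing in for a real assignment.
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "homework00-netID.ipynb"
    rendered = Path(folder) / "homework00-netID.html"
    source.write_text("{}")
    rendered.write_text("<html></html>")
    os.utime(source, (1000, 1000))    # pretend the source was saved first
    os.utime(rendered, (2000, 2000))  # and rendered afterwards
    fresh = render_is_current(source, rendered)
    os.utime(source, (3000, 3000))    # pretend the source was edited again
    stale = render_is_current(source, rendered)

print("Up to date after rendering:", fresh)
print("Up to date after a later edit:", stale)
```

A result of `False` for your own files suggests the reproducible document was changed after the last render, so it should be rendered again before uploading.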

### <a name="using-git-from-the-command-line"></a>Using Git from the Command Line

**Be sure to complete Steps i) and ii) above.** If you do not have access to the **stat440-fa21-course-content** repo 24 hours after completing Steps i) and ii), please contact me at kinson2@illinois.edu.

**The majority of these steps are discussed in a different way in the reference text Happy Git and GitHub for the useR by Bryan et al. https://happygitwithr.com/.**

Now, we are going to **clone** the **stat440-fa21-course-content** repo. Cloning this repo will be the first step to accessing the most up to date course content, updates, and announcements. To clone the repo, go to the **stat440-fa21-course-content** repo on GHE or [here](https://github-dev.cs.illinois.edu/stat440-fa21/stat440-fa21-course-content). Next, click on the green "Code" button. Then, click the clipboard in order to copy the repo's URL. See image below.

![](https://uofi.box.com/shared/static/qrms5v0frf7e5hmbgtkwadpe6hjwkae0.png)

Now, open the terminal or command prompt on your local machine: 'Terminal' (macOS/Linux) or 'Git Bash' (Windows). The code below changes the current directory to the **Desktop** folder locally and clones the repo. At the cursor in your terminal, type the following two lines:

```
cd ~/Desktop
git clone https://github-dev.cs.illinois.edu/stat440-fa21/stat440-fa21-course-content.git stat440-fa21
```

Press Enter after each line to run it.

![](https://uofi.box.com/shared/static/kqnivgdqqj93ef52k6ky75uwg8xai84t.png)
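
The last word of the `git clone` line above (`stat440-fa21`) is an optional folder name; without it, Git names the local folder after the repo. The sketch below shows how the same command could be assembled and run from Python, assuming Git is installed. The helper names are made up for illustration, and `default_folder` only approximates Git's naming rule.

```python
import subprocess

COURSE_URL = (
    "https://github-dev.cs.illinois.edu/stat440-fa21/"
    "stat440-fa21-course-content.git"
)


def default_folder(url):
    """Approximate the folder name Git picks when none is given."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_command(url, folder=None):
    """Build the git clone command, optionally with a custom folder name."""
    command = ["git", "clone", url]
    if folder is not None:
        command.append(folder)
    return command


def clone(url, folder=None, parent="."):
    """Run the clone inside the parent directory (needs Git and network access)."""
    subprocess.run(clone_command(url, folder), cwd=parent, check=True)


print("Folder name without the extra argument:", default_folder(COURSE_URL))
print("Command used in these notes:", " ".join(clone_command(COURSE_URL, "stat440-fa21")))
```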

Next, we can verify whether that clone was successful by listing out all files in this new folder called **stat440-fa21** on our local machine. The code below changes the current directory to the **stat440-fa21** folder locally, then lists out all files with the `ls` command. Type the following two lines in your terminal, pressing Enter after each line:

```
cd stat440-fa21
ls
```

The resulting listing of files should contain a minimum of the **syllabus** directory and the README.md file. See image below. **I say "minimum" because these notes are made much earlier than newer files and directories that appear in the stat440-fa21-course-content repo.**

![](https://uofi.box.com/shared/static/1k6xyo6f4hfnndfvzr1j10a22xjwu270.png)
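
The same check can be sketched in Python. The helper `looks_like_clone` is hypothetical: it assumes a successful clone contains a hidden `.git` folder plus the **syllabus** directory and README.md file mentioned above, and the demonstration runs on a throwaway folder that imitates that layout.

```python
import tempfile
from pathlib import Path


def looks_like_clone(folder, expected=("syllabus", "README.md")):
    """Return True if folder has a .git directory and every expected entry."""
    folder = Path(folder)
    if not (folder / ".git").is_dir():
        return False
    return all((folder / name).exists() for name in expected)


# Demonstration on a throwaway folder that imitates the cloned repo's layout.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "stat440-fa21"
    (repo / ".git").mkdir(parents=True)
    (repo / "syllabus").mkdir()
    (repo / "README.md").write_text("Course Announcements")
    names = sorted(path.name for path in repo.iterdir())
    ok = looks_like_clone(repo)
    missing = looks_like_clone(Path(tmp) / "not-a-repo")

print(names)
print(ok, missing)
```

For the real folder, the call would be something like `looks_like_clone(Path.home() / "Desktop" / "stat440-fa21")`, assuming the repo was cloned to the Desktop as in these notes.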

Great! You have successfully connected the **stat440-fa21-course-content** repo to your local machine. Cloning the repo should happen only once per local machine. Meaning, you should almost never have to re-establish the connection to GHE for this particular repo. **One reason you may need to re-clone the repo is if you have deleted the stat440-fa21 folder from your local machine.**

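To keep your local machine up to date with the latest course content (including course announcements and assignments), you will perform a pull from the remote repo into the local folder that you called stat440-fa21. To pull the stat440-fa21 repo via the command line, type the following in your terminal, pressing Enter after each line:

```
# move into the cloned folder (cloned to the Desktop above)
cd ~/Desktop/stat440-fa21
# confirm the current directory
pwd
# download and merge the latest course content
git pull
```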

After pulling successfully, you should see a message such as this.

![](https://uofi.box.com/shared/static/s2ez6jublzg9qenog0arz6d8bx9tcgwr.png)

To retrieve assignments, go to the particular assignment directory locally on your machine, **homework** for example, and click on the assignment you need to complete. I have made a fake homework assignment called **stat440-fa21-homework00.md** and its rendered html file called **stat440-fa21-homework00.html**. It should be in the **homework** sub-folder of the **stat440-fa21** folder on your local machine. 

![](https://uofi.box.com/shared/static/nifvp8e13xl1ynq4ttg8vejyw2pbst98.png)

To begin the assignment, I advise students to open the original .md file, e.g. **stat440-fa21-homework00.md**, using RStudio or Jupyter Lab depending on your preferred software. 

![](https://uofi.box.com/shared/static/b7wcgqdn72s1jj3wq3qg7cnw2qnh5uyw.png)

Now, save the file as homework00-netID.Rmd (or .ipynb). **It is good practice to save this file somewhere outside of the stat440-fa21 folder, e.g. your Desktop. Doing so ensures that you know which file you are working on and it reduces confusion about if the file is the original assignment or not.** 

![](https://uofi.box.com/shared/static/2v3qqkf5rpihhu3iuqt0ega5k84xuc8d.png)

Now, complete the second problem by writing your solution beneath the Problem **#2** wording. Then, render your reproducible document file to .html. Rendering to .html is also called "knitting". See image below.

![](https://uofi.box.com/shared/static/iewovr95xdn7po4inw0pdwnbqyswyi8w.png)

Now that your solution to Problem **#2** is saved locally, you want to practice submitting the assignment from your local machine to your individual student repo in GHE (not to the **stat440-fa21-course-content** repo). This is a form of self-collaboration. Recall that submitting an assignment in Git translates to **committing** and **pushing** the changes.

To submit your assignments, we must first connect your individual student repo to your local machine, which means we need to clone it.

Now, we are going to **clone** your individual student repo which is named as your **netID**. **If you have files in your repo already, then to clone the repo, go to your netID repo on GHE. Next, click on the green "Code" button.** Now, click the clipboard in order to copy the repo's URL. See image below.

![](https://uofi.box.com/shared/static/0o3acybq5prjojonj24f7unrstcp3b71.png)

Now, open the terminal or command prompt on your local machine: 'Terminal' (macOS/Linux) or 'Git Bash' (Windows). At the cursor in your terminal, type the following, pressing Enter after each line:

```
cd ~/Desktop
git clone https://github-dev.cs.illinois.edu/stat440-fa21/netID.git netID
```

**As an alternative, using `cd ..` moves our current directory up from stat440-fa21 to whatever folder is above it. Your directories on your local machine may be set up differently than mine. Changing the current directory up from stat440-fa21 is a simple way to ensure we aren't cloning our individual student repos into the stat440-fa21 folder locally.**

Next, we can verify whether that clone was successful by listing out all files in this new folder called **netID** on our local machine. **The name netID should be your net ID**. The code below changes the current directory to your **netID** folder locally, then lists out all files with the `ls` command. Type the following two lines in your terminal, pressing Enter after each line:

```
cd netID
ls
```

![](https://uofi.box.com/shared/static/cewo5jrbwxqlrpv4z9kg4776ysqyaav8.png)

*There shouldn't be any files in your repo, because we assume this is your first time accessing your individual student repo. Thus you may receive the message Warning: You appear to have cloned an empty repository.* 

Now, you need to copy the homework00-netID.Rmd and homework00-netID.html files from their current location into your netID folder. Go to the terminal and type the following, depending on your OS, pressing Enter after each line:

Windows (PowerShell, if not using Git Bash)
```
cd ~/Desktop
copy homework00-netID.Rmd netID
copy homework00-netID.html netID
```

or

Mac/Linux (and Windows Git Bash)
```
cd ~/Desktop
cp homework00-netID.Rmd netID
cp homework00-netID.html netID
```

*Copying these files in this way will only work if the files and the netID folder are in the current directory. Otherwise, you will need to add information about the file location such as `copy Desktop/homework00-netID.Rmd Desktop/netID`. In this example, the homework file is on the Desktop and the netID folder is in the Desktop.* Now that those assignment files are in our local copy of the repo, we need to actually submit them to our individual student repo in GHE. Go to the terminal and type the following, pressing Enter after each line:

```
cd netID
git add homework00-netID.Rmd homework00-netID.html
git commit -m "Added two homework files from local machine"
git push origin master
```

![](https://uofi.box.com/shared/static/u90m3gn9d64l97b26eluedy0ruo0dhr1.png)

Great! We have successfully submitted our first (fake) homework assignment; see the resulting image below. **You can verify whether any commit and push has been successful by going to GHE and checking if the file is there in the location you intended and that the commit message is also present and correct.**

![](https://uofi.box.com/shared/static/mmjb06fxjftsgqwtlrjuaxmva34myspe.png)
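
The copy and submit steps above can also be sketched in Python, which avoids the `copy` versus `cp` difference between operating systems. This is only an illustration under stated assumptions: the function names are made up, the demonstration copies placeholder files between throwaway folders, and the Git commands are printed rather than run (`submit` would need Git and a real clone of your netID repo).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def copy_into_repo(files, repo_folder):
    """Copy each file into repo_folder and return the new paths."""
    repo_folder = Path(repo_folder)
    copied = []
    for file in files:
        target = repo_folder / Path(file).name
        shutil.copy(file, target)
        copied.append(target)
    return copied


def submission_commands(file_names, message, branch="master"):
    """Build the add, commit, and push commands used in these notes."""
    return [
        ["git", "add", *file_names],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", branch],
    ]


def submit(repo_folder, file_names, message):
    """Run add, commit, and push inside repo_folder (needs Git and a real clone)."""
    for command in submission_commands(file_names, message):
        subprocess.run(command, cwd=repo_folder, check=True)


# Demonstration with throwaway folders standing in for the Desktop and netID folder.
with tempfile.TemporaryDirectory() as tmp:
    desktop = Path(tmp)
    repo = desktop / "netID"
    repo.mkdir()
    sources = [desktop / "homework00-netID.Rmd", desktop / "homework00-netID.html"]
    for source in sources:
        source.write_text("placeholder")
    copied_names = [path.name for path in copy_into_repo(sources, repo)]

commands = submission_commands(
    copied_names, "Added two homework files from local machine"
)
for command in commands:
    print(" ".join(command))
```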

You can submit an assignment as many times as you like, which is why I say you have unlimited submissions. Just be sure that your reproducible document file (either .Rmd or .ipynb) and rendered file (.html) are up to date with each other. It is not a good idea to complete the assignment in the .Rmd file, but forget to render it to .html.


### <a name="using-git-with-rstudio"></a>Using Git With RStudio

**Be sure to complete Steps i) and ii) above.** If you do not have access to the **stat440-fa21-course-content** repo 24 hours after completing Steps i) and ii), please contact me at kinson2@illinois.edu.

**The majority of these steps are discussed in a different way in the reference text Happy Git and GitHub for the useR by Bryan et al. https://happygitwithr.com/.**

**The notes below are for RStudio which can be a git client. There are other git clients such as GitHub Desktop. If you prefer to use a git client that is not RStudio, please follow the directions in those clients.**

***We assume you have not previously cloned this or any repo mentioned below. We are going to point to the Desktop in these steps. Cloning will not work if you already have a folder on your local machine's Desktop directory called "stat440-fa21." If you do have a folder called "stat440-fa21" in your Desktop, then delete it or use a different name such as "stat440." Deleting local folders does not affect the repo in GHE, which is why version control and GHE are so powerful!***

Now, we are going to **clone** the **stat440-fa21-course-content** repo with RStudio. Cloning this repo will be the first step to accessing the most up to date course content, updates, and announcements. To clone the repo, go to the **stat440-fa21-course-content** repo on GHE or [here](https://github-dev.cs.illinois.edu/stat440-fa21/stat440-fa21-course-content). Next, click on the green "Code" button. Then, click the clipboard in order to copy the repo's URL. See image below.

![](https://uofi.box.com/shared/static/qrms5v0frf7e5hmbgtkwadpe6hjwkae0.png)

Now, open RStudio, click on "File", then "New Project...", then "Version Control", then "Git". See images below.

![](https://uofi.box.com/shared/static/rgdxbfghe10aijsmweo96hsre05o17xs.png)

![](https://uofi.box.com/shared/static/8jyxf62qdpzmy4bzf5t6chc5lrpyln01.png)

![](https://uofi.box.com/shared/static/m3xaqv2s30mckm5s3nxifd2qpir7o7ui.png)

Now, paste the repo's URL (copied from above) in the "Repository URL" field. Type "stat440-fa21" in the "Project Directory Name" field because that will be the name of the folder that is connected to the **stat440-fa21-course-content** repo. In the "Create project as sub-directory of" field, select the Desktop; in the future, it could be anywhere of your choosing.

![](https://uofi.box.com/shared/static/emd4lpa8nwit09qgb5riynt7rc9fi6tc.png)

Next, we can verify whether that clone was successful by checking our computer's Desktop for a folder named "stat440-fa21." **Remember that we assume this folder didn't exist before. If there was already a folder with that same name, then the clone would not be successful. One quick resolution will be to delete the "stat440-fa21" folder from the Desktop and re-do this cloning procedure.** The resulting set of files should contain a minimum of the **syllabus** directory, the README.md file, ***and a new file called stat440-fa21.Rproj***. See image below. **I say "minimum" because these notes are made much earlier than newer files and directories that appear in the stat440-fa21-course-content repo.**

![](https://uofi.box.com/shared/static/sm1qj8vdsuw3a9obt6y6k43iof0uop3t.png)

Great! 

Another way to verify that you have successfully cloned with RStudio is by noticing RStudio has opened your stat440-fa21 directory via its stat440-fa21.Rproj file. This .Rproj file is a file that RStudio creates to keep up with projects that have been created. This file is something you keep locally; do not commit and push this to GHE. Also, notice that inside of RStudio, you now have a new Git tab in the top-right pane. This Git tab becomes our central interface for interacting with Git.

![](https://uofi.box.com/shared/static/887za5ehaplkg0izbpx2wvvk4lqgwpyj.png)

Even greater! You have successfully connected the **stat440-fa21-course-content** repo to your local machine with RStudio. Cloning the repo should happen only once per local machine. Meaning, you should almost never have to re-establish the connection to GHE for this particular repo. **One reason you may need to re-clone the repo is if you have deleted the stat440-fa21 folder from your local machine.**

To keep your local machine up to date with the latest course content (including course announcements and assignments), you will perform a pull from the remote repo into the local folder that you called stat440-fa21. To pull the **stat440-fa21-course-content** repo via RStudio, click on the Git tab, then click on the blue down arrow, which means to **pull** down the repo.

![](https://uofi.box.com/shared/static/5u9upi020lk9k746t4b02wj7bp35138v.png)

After pulling successfully, you should see a message such as this.

![](https://uofi.box.com/shared/static/2idvmf1uv9uu3bif8pmue36dz1svofqb.png)

*"Already up to date" is not necessarily the message you will see. You just want to see that the pull was successful.*

To retrieve assignments, go to the particular assignment directory locally on your machine, **homework** for example, and click on the assignment you need to complete. I have made a fake homework assignment called **stat440-fa21-homework00.md** and its rendered html file called **stat440-fa21-homework00.html**. It should be in the **homework** sub-folder of the **stat440-fa21** folder on your local machine. 

![](https://uofi.box.com/shared/static/rk9qlr3rto5ejhie9j6mfyfiwm04ptrh.png)

To begin the assignment, I advise students to open the original .md file, e.g. **stat440-fa21-homework00.md**, using RStudio. 

Now, save the file as homework00-netID.Rmd inside of your preferred software. **It is good practice to save this file somewhere outside of the stat440-fa21 folder, e.g. your Desktop. Doing so ensures that you know which file you are working on and it reduces confusion about if the file is the original assignment or not. Again, we assume that this file does not already exist on your Desktop.** 

![](https://uofi.box.com/shared/static/zwt6p6dd0tztt7gr3aiehkya5simqutn.png)

Now, complete the third problem by writing your solution beneath the Problem **#3** wording. Then, render your reproducible document file to .html. Rendering to .html is also called "knitting". See image below showing the knitted version.

![](https://uofi.box.com/shared/static/4rpwbytx5ulx5180g605aiqlp1vv8kor.png)

Now that your solution to Problem **#3** is saved locally, you want to practice submitting the assignment from your local machine to your individual student repo in GHE (not to the **stat440-fa21-course-content** repo). This is a form of self-collaboration. Recall that submitting an assignment in Git translates to **committing** and **pushing** the changes.

To submit your assignments, we must first connect your individual student repo to your local machine, which means we need to clone it.

Now, we are going to **clone** your individual student repo which is named as your **netID**. To clone the repo, go to your netID repo on GHE. **If you have files in your repo already, first click on the green "Code" button.** Then, click the clipboard in order to copy the repo's URL. The image below assumes you do not have any files in your repo.

![](https://uofi.box.com/shared/static/0o3acybq5prjojonj24f7unrstcp3b71.png)

If you have files in your repo, then you may see this image below.

![](https://uofi.box.com/shared/static/xp2455hyo544612sfxm1gwol3qwr8f48.png)

Now, open RStudio, click on "File", then "New Project...", then "Version Control", then "Git".

Now, paste your repo's URL (copied from above) in the "Repository URL" field. Type "netID" (your net ID) in the "Project Directory Name" field because that will be the name of the folder that is connected to your **netID** repo. In the "Create project as sub-directory of" field, select the Desktop; in the future, it could be anywhere of your choosing.

![](https://uofi.box.com/shared/static/mypaehvxrcm339iwc1v4c83xwx6dg6u6.png)

Next, we can verify whether that clone was successful by checking our computer's Desktop for a folder named "netID" (your net ID), checking that RStudio has opened a new Rproj file named "netID" (your net ID), and confirming that there is an accessible Git tab. **Remember that we assume this netID folder didn't exist before. If there was already a folder with that same name, then the clone would not have worked, and you would see this image below.** 

![](https://uofi.box.com/shared/static/0wqzcoizwhux3x6hpafwsqwpfu4srgnd.png)

There should be nothing in the folder since we also assume that you did not attempt the first and second ways to interact with the course. 

![](https://uofi.box.com/shared/static/qwibve09c6l52y0pqyxxmy2z17str8vf.png)

Great! You've successfully cloned the repo.
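The clone step above has a command-line counterpart, `git clone <URL> <folder>`. The sketch below tries it in a throwaway sandbox and also shows why the folder must not already exist. It assumes `git` is installed. The local bare repository stands in for your netID repo on GHE, and the sandbox `Desktop` folder is a hypothetical stand-in for your real Desktop.

```shell
# Throwaway sandbox with a pretend Desktop and a pretend GHE repo.
sandbox="$(mktemp -d)"
mkdir "$sandbox/Desktop"
git init --quiet --bare "$sandbox/netID.git"

# First clone: creates the netID folder and connects it to the repo.
# (Git warns that the cloned repository is empty; that is expected here.)
git clone --quiet "$sandbox/netID.git" "$sandbox/Desktop/netID"

# Second clone into the same folder: git refuses because the folder
# already exists and is not empty.
if git clone --quiet "$sandbox/netID.git" "$sandbox/Desktop/netID" 2>/dev/null; then
  second_clone="succeeded"
else
  second_clone="refused"
fi
echo "Second clone into the same folder: $second_clone"
```

The folder looks empty in a file browser after the first clone, but it contains a hidden `.git` directory, which is what makes it a repo.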


Now, you need to copy the homework00-netID.Rmd and homework00-netID.html files to your netID folder. This can be done using RStudio's "Files" tab in the bottom-right pane. First, check the box next to homework00-netID.Rmd. Then, click on the blue cog/wheel labeled "More", then click "Copy To", then select the netID folder (which should be on your Desktop).

![](https://uofi.box.com/shared/static/3bmf5usphyz9trs2rfkuq1oqqhlc8km1.png)

Now, repeat this for the homework00-netID.html file. **Alternatively, since these two files are on your local machine, you can go directly to them outside of RStudio and copy/paste them to your netID folder.** 
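If you prefer a terminal for the copy/paste alternative, a plain `cp` does the same job. This is a minimal sketch in a throwaway folder: the `desktop` variable is a hypothetical stand-in for your real Desktop, and the two files are created empty just so there is something to copy.

```shell
# Pretend Desktop containing the two homework files and the netID folder.
desktop="$(mktemp -d)"
mkdir "$desktop/netID"
touch "$desktop/homework00-netID.Rmd" "$desktop/homework00-netID.html"

# Copy both files into the netID folder in one command.
cp "$desktop/homework00-netID.Rmd" "$desktop/homework00-netID.html" "$desktop/netID/"

# Confirm that the copies arrived.
ls "$desktop/netID"
```

The originals stay where they were, so you will have two copies of each file afterwards.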

Now that those assignment files are in our local repo, we need to actually submit them to our individual student repo in GHE. To submit them means to **commit** and **push** them. To do that, go to the Git tab (assuming you currently have your netID.Rproj project open) and click on the "Commit" button, which opens a pop-up interface. In this pop-up, 1) check the Staged box for the two files homework00-netID.Rmd and homework00-netID.html (it may take a moment for the check to appear), 2) write a commit message, "Added two homework files using RStudio", and click "Commit" (a new pop-up will perform the commit action), and 3) click on the green up arrow "Push" (a new pop-up will perform the push action).

![](https://uofi.box.com/shared/static/b8arw4u8o42ox9df82mwmr5fkjkfptro.png)
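The three steps in the commit pop-up map roughly onto `git add`, `git commit`, and `git push`. The sketch below runs them in a throwaway sandbox, assuming `git` is installed. The local bare repository is a hypothetical stand-in for your netID repo on GHE, the file contents are placeholders, and the demo name and email exist only so the commit works on a machine with no Git identity configured.

```shell
# Throwaway sandbox: pretend GHE repo plus a fresh clone of it.
# (Git warns that the cloned repository is empty; that is expected here.)
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/netID.git"
git clone --quiet "$sandbox/netID.git" "$sandbox/netID"
cd "$sandbox/netID"

# Placeholder homework files.
echo "Problem 3 solution" > homework00-netID.Rmd
echo "<html>Problem 3 solution</html>" > homework00-netID.html

# 1) Stage the two files (the "Staged" checkboxes).
git add homework00-netID.Rmd homework00-netID.html

# 2) Commit with a message (the "Commit" button).
git -c user.name="Demo Student" -c user.email="demo@example.com" \
  commit --quiet -m "Added two homework files using RStudio"

# 3) Push the commit to the remote (the green up arrow).
git push --quiet origin HEAD

# Verify from the remote's side that the commit arrived.
git -C "$sandbox/netID.git" log --all --oneline
```

The final `log` line plays the same role as checking GHE in a browser: it reads the history stored in the remote, not in your local folder.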

Great! We have successfully submitted our first (fake) homework assignment resulting in the image below. **You can verify whether any commit and push has been successful by going to GHE and checking if the file is there in the location you intended and that the commit message is also present and correct.**

![](https://uofi.box.com/shared/static/usmdcuvq81brtw4894wraggc3h47811m.png)

You can commit and push an assignment as many times as you like, which is why I say you have unlimited submissions. Just be sure that your reproducible document file (either .Rmd or .ipynb) and rendered file (.html) are up to date with each other. It is not a good idea to complete the assignment in the .Rmd file and then forget to render it to .html.
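One quick, imperfect way to catch a forgotten render is to compare modification times: if the .Rmd is newer than the .html, the .html is probably stale. The sketch below demonstrates the check in a throwaway folder, with timestamps set by hand (both are made-up values) so the outcome is predictable. It is only a heuristic, since timestamps say nothing about the contents of the files.

```shell
# Throwaway folder with two placeholder files and hand-set timestamps.
work="$(mktemp -d)"
touch -t 202109010900 "$work/homework00-netID.html"  # rendered at 09:00
touch -t 202109011000 "$work/homework00-netID.Rmd"   # edited at 10:00

# -nt means "newer than": true when the .Rmd changed after the last render.
if [ "$work/homework00-netID.Rmd" -nt "$work/homework00-netID.html" ]; then
  status="stale: knit again before committing"
else
  status="up to date"
fi
echo "$status"
```

To use the same check on real files, replace the two paths with the files in your netID folder and drop the `touch` lines.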

Suppose you close RStudio and you need to do more work on an assignment. It is quite simple to return to a repo/Rproj by clicking on "File", then "Open Project...", then selecting the folder and .Rproj file that you need to return to. For example, select the netID.Rproj (your net ID) file, which is located within the netID (your net ID) folder on the Desktop.

![](https://uofi.box.com/shared/static/qv2taxj6otlsxozzycc4ybtjkbomku2a.png)

#### END OF NOTES