# Day 3 Exercises: The Command-line & Git

## Background

A hypothesis has been proposed that, within the genus *Iris*, the sepal and petal dimensions of the flowers can be used to distinguish between species. A dataset was collected in which the length and width of the sepals and petals of three iris species, *Iris virginica*, *Iris setosa*, and *Iris versicolor*, were measured. Samples were collected from 50 individual flowers of each species.

<div>
    <span style="float: left; padding-right: 10px"><img src="./media/D03-Iris_versicolor.jpg" style="width:200px" /><br><a href="https://en.wikipedia.org/wiki/Iris_versicolor#/media/File:Blue_Flag,_Ottawa.jpg"><i>Iris versicolor</i></a></span>
    <span style="float: left;  padding-right: 10px"><img src="./media/D03-Iris_virginica.jpg" style="width:200px" /><br><a href="https://en.wikipedia.org/wiki/Iris_virginica#/media/File:Iris_virginica_2.jpg"><i>Iris virginica</i></a></span>
    <span><img src="./media/D03-Iris_setosa.jpg" style="width:200px" /><br><a href="https://en.wikipedia.org/wiki/Iris_setosa#/media/File:Irissetosa1.jpg"><i>Iris setosa</i></a></span>
  <br><i>Image Source for all three flowers: Wikipedia</i>
</div>

The data is in a comma-separated format. For example, the first row of `iris.data` is `5.1,3.5,1.4,0.2,Iris-setosa`.

Each line (**row**) of the data represents a single flower (**observation**). Commas separate the values into fields (**columns**). Each column of data has the following meaning:


| Column # | Type | Description |
| -------- | ---- | ----------- |
| 1 | continuous | sepal length (cm) |
| 2 | continuous | sepal width (cm) |
| 3 | continuous | petal length (cm) |
| 4 | continuous | petal width (cm) |
| 5 | categorical | species (or "class") |
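As a rough illustration of this layout, here is a minimal Python sketch that uses only the standard-library `csv` module to turn rows in this format into records with named fields. The `IrisSample` type, its field names, and the `parse_iris_rows` helper are hypothetical, chosen to mirror the table above. The sketch also skips blank lines, in case the file ends with one.

```python
import csv
import io
from typing import NamedTuple


class IrisSample(NamedTuple):
    """One observation (row) from iris.data; field names are illustrative."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


def parse_iris_rows(lines):
    """Convert comma-separated lines into IrisSample records, skipping blank lines."""
    samples = []
    for row in csv.reader(lines):
        if not row:  # csv.reader yields [] for an empty line
            continue
        *measurements, species = row  # first four columns are numbers, last is the class
        samples.append(IrisSample(*(float(v) for v in measurements), species))
    return samples


# Usage: a short excerpt in the same format (use open("data/iris.data") for the real file)
excerpt = io.StringIO("5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n\n")
for sample in parse_iris_rows(excerpt):
    print(sample.species, sample.petal_length)
```

Using a named tuple means later code can refer to `sample.petal_length` instead of remembering that petal length is column 3.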

The following image shows how the dimensions of the sepal and petal are measured.

<span style="float: left; padding-right: 10px"><img src="./media/D03-Iris_measures.png" style="width:200px" /><br><a href="https://machinemantra.in/supervised-learning-example-iris-dataset/"><i>Image Source: Machine Mantra</i></a></span>

### Data Source
This is a classic dataset used frequently in data-analytics lessons and tutorials. It comes from a paper published by Ronald Fisher in 1936 ([more info](https://en.wikipedia.org/wiki/Iris_flower_data_set)). You can download a copy of the data [here](https://archive.ics.uci.edu/ml/datasets/iris).

## Your Task
You have been tasked with developing a mathematical model to explore whether there is evidence to support the hypothesis stated above. You have decided to write a Python program to do this, and you want to share the source code you develop with your colleagues. This will allow them to run the models on their own, suggest improvements, or fix bugs they encounter in the code.

**Note**: This scenario is just for practice to help reinforce learning.  

## Exercise #1
Perform the following:
1. Create a new project directory in your home directory for this project. 
2. Initialize this folder as a new git repository.
3. Within the project directory, create a new file named `README.md`. For now it should be empty. It is good practice to always have a README.md file in any new software project or analysis repository to help explain to others what the project is for.
4. Within the project directory, create a new Python file named `iris_analysis.py`. For now it should also be empty; this is the file where we will write the Python code.
5. Within the project directory, create a subfolder named `data` where you will put the iris dataset.  Download the following files from the data source and place them in the new `data` folder.
   1. `iris.data`
   2. `iris.names`

Enter all command-line commands you used in the terminal to complete the tasks above. ***Hint***: You can use the `history` command on the command-line to look at all of your recent command-line instructions.

Answer the following questions.

**Q1.1** After you initialize the repository for Git, you will see a new `.git` directory. What is this for?

**Q1.2** What would happen if you deleted the `.git` directory?

**Q1.3** Type `git status` within the project repository. What does that tell you? Why is this command so important?

**Q1.4** Go to your home directory and run `git status`. You will get a message. What does this message mean, and why did you get it?

**Q1.5** How can you make sure you are always in the correct location to use Git commands?

## Exercise #2
Create a new project on GitHub. This will serve as the remote repository for the project. Next, link this new GitHub repository to your local git repository.

***Note***: this is just a test project.  You can delete it after class if you do not want it to clutter your GitHub repository listing.

In the cell below, enter the command you used to link the local repository to the remote GitHub repository.

Answer the following questions

**Q2.1** Why do we want both a local Git repository and a remote copy on GitHub?

**Q2.2** Go into the `.git` directory and look around. You will see a `config` file. Look at the contents of that file. What do you think the purpose of this file is?

**Q2.3** Aside from one being local and the other remote, what is the difference between the Git repository on your local machine and the one that GitHub maintains?

## Exercise #3

You now have a local Git repository and a remote Git repository (on GitHub) for tracking changes to your project, but so far you have only initialized them; Git is not yet tracking any files. Make Git track all of the files you currently have, and make sure you can see those files both locally and remotely. Enter the command-line commands you used to make this happen in the cell below.


## Exercise #4

Normally, a `README.md` file contains information about the project. You will notice that GitHub automatically takes the text from the `README.md` file and displays it on the front page of your repository. For example, take a look at the [Tripal software package on GitHub](https://github.com/tripal/tripal). Look at the home page that GitHub shows for the repository, then click on the `README.md` file and compare.

Notice that the `README.md` file has a `.md` extension, which means the file contains Markdown. We can use the same Markdown you used in Jupyter Notebooks in this file, and GitHub will render it nicely for us! Go back to the Tripal repository, click the `README.md` file, then click the <kbd>Raw</kbd> button and take a look at it.

Using your command-line knowledge, add a new first-level heading to `README.md`. It should give a title to your project. Enter the command-line commands you used in the box below.

**Note:** it is not ideal to add text to a file via the command line. It's best to use a text editor, but we do so here only for practice.

## Exercise #5

We have not yet learned any Python programming, so we do not yet know what we can put in the `iris_analysis.py` file, but you did learn some very basic math in [Assignment 1: Tour of JupyterLab](A01-JupyterLab.ipynb).  Edit the `iris_analysis.py` file using any text editor and put in some simple math, similar to what you did in that first assignment.
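As one hypothetical example, `iris_analysis.py` could hold plain arithmetic like the following. The numbers are the measurements (in cm) of one *Iris setosa* flower from the dataset. The "area" values treat each sepal or petal as a rectangle, which is only a crude stand-in and not a real botanical measure.

```python
# iris_analysis.py: placeholder arithmetic until we learn more Python.
# Measurements (in cm) of one Iris setosa flower from iris.data.
sepal_length = 5.1
sepal_width = 3.5
petal_length = 1.4
petal_width = 0.2

# Treat each part as a rectangle for a rough size estimate.
sepal_area = sepal_length * sepal_width
petal_area = petal_length * petal_width

# round() keeps floating-point noise out of the printed values.
print("Approximate sepal area:", round(sepal_area, 2))
print("Approximate petal area:", round(petal_area, 2))
print("Sepal-to-petal length ratio:", round(sepal_length / petal_length, 2))
```

Running `python iris_analysis.py` from the project directory should print the three values.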

Now, commit changes to the `iris_analysis.py` file, but do not commit the `README.md` file that you created in Exercise #4. Enter the command-line commands you used in the box below. 

Now answer the following questions

**Q5.1** Type `git status`. What does it tell you about the state of your repository?

**Q5.2** Type `git diff`. What does it tell you about the state of your repository?

**Q5.3** How would you view the diff of just the `README.md` file or just the `iris_analysis.py` file, but not both?

**Q5.4** If you go to GitHub and look at the remote repository for this project, the changes to the `iris_analysis.py` file are not there. Why not?

**Q5.5** How can you make changes to the `iris_analysis.py` file show up on the remote GitHub repository? Enter below how you can make this happen, then do it.
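To read the output from Q5.2, it helps to understand the unified diff format that `git diff` uses. The following sketch uses Python's standard `difflib` module, not Git itself, to produce output in the same style for two made-up versions of a README:

- Lines starting with `-` were removed.
- Lines starting with `+` were added.
- Lines starting with a space are unchanged context.
- `@@` lines mark where each changed block (hunk) sits.

Git's real output also includes extra header lines, such as `diff --git` and `index`, which this sketch does not reproduce.

```python
import difflib

# Two hypothetical versions of README.md, as lists of lines.
old = ["# Iris Project\n", "\n", "Work in progress.\n"]
new = ["# Iris Analysis\n", "\n", "Work in progress.\n", "Data: UCI iris dataset.\n"]

# unified_diff yields the ---/+++ file headers, @@ hunk headers, and -/+/space lines.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="a/README.md", tofile="b/README.md")
)
print("".join(diff_lines), end="")
```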

## Exercise #6
Create a new directory in your home directory, name it `delete_me` or something that you will remember to delete later.  Clone the remote repository in this new directory. Enter the commands you used to do this:


Answer the following questions

**Q6.1** What did this clone do? Would you ever do this in real life?

**Q6.2** Why does the `README.md` in the cloned copy not contain the changes you made to it?

**Q6.3** What is the difference between doing a clone like we did, and if we created a copy of the repository using your computer's file browser?

**Q6.4** How do you update the `README.md` to the most recent version in the cloned copy? Make it happen, and enter all the commands you used in the cell below.

**Q6.5** Why did we clone the repository to make the copy? Why didn't we fork it instead?

**Q6.6** If we delete the cloned repository will that create any problems for us? If so how?

## Exercise #7
Let's explore what happens if we create a merge conflict. Perform the following:
1. Using your favorite text editor, edit the `README.md` file
2. Copy and paste the text from the first paragraph of this notebook into the file, under the heading you added earlier.
3. Save the file. 
4. Go to your repository on GitHub. Click the `README.md` file and click the pencil <i class="fa fa-pencil"></i> icon to edit the file.
5. Add any text you want to the file
6. Click the <kbd>Commit changes</kbd> button
7. Pull the changes from the remote server

Enter any command-line commands you ran in the box below:

Answer the following questions

**Q7.1** What happened?

**Q7.2** Why did we get this conflict?

**Q7.3** How can you avoid getting these types of conflicts when you do assignments and the unit project?

## Exercise #8
We will work together to fix the merge conflict we created in the previous exercise.
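When Git cannot merge two changes automatically, it writes both versions into the file between conflict markers:

- A line starting with `<<<<<<<` opens your version, usually labelled `HEAD`.
- A `=======` line separates the two versions.
- A line starting with `>>>>>>>` closes the incoming version.

As a hedged sketch, assuming Git's default conflict style, the hypothetical helper `find_conflict_markers` below reports any leftover markers. You could use it to check that a resolved `README.md` is clean before committing it.

```python
import re

# Lines Git inserts around conflicting hunks
# (a ||||||| line also appears if the "diff3" conflict style is enabled).
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7}|\|{7})(?:\s|$)")


def find_conflict_markers(text):
    """Return (line_number, line) pairs for every line that looks like a conflict marker."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if CONFLICT_MARKER.match(line):
            hits.append((number, line))
    return hits


# Usage: a made-up README.md after a pull that produced a conflict
conflicted = """# Iris Analysis
<<<<<<< HEAD
Text added locally.
=======
Text added on GitHub.
>>>>>>> origin/main
"""
for number, line in find_conflict_markers(conflicted):
    print(f"line {number}: {line}")
```

An empty result means no markers remain. You still need to read the file to confirm the merged text is what you want.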