# <u> Tutorial 1: Python Set Up and Using Git </u>


<br> This series of tutorials is intended to teach the basics of Python for scientific programming. These tutorials were written by Sanjana Kulkarni, an intern in the High Throughput Analytics group during summer 2021. 

## Why Python?

Python is a versatile high-level programming language. It is free, unlike MATLAB, and packages are open-source, which means that anybody can use and change Python tools. Python is ideal for manipulating data in dataframes (a type of data table), performing statistical analyses, and creating plots and graphs. It is also very useful in machine learning and image analysis. 

Python is not typically used for low-level hardware programming; it is a language best suited to handling data, and it is widely used in scientific computing. It is also a good language for programming beginners. 

## What is a Jupyter Notebook?

The file you are currently viewing is called a Jupyter Notebook. It is an interactive notebook that can support text, code, and graphics. It is ideal for teaching purposes because code and text descriptions can be interwoven. The file extension is <b>.ipynb</b>.

The cell that you are reading now is called a <b>Markdown</b> cell. You can write plain text, HTML code, and mathematical equations in <a href="https://www.latex-project.org/" target="_blank">LaTeX</a> in markdown cells.

<b>Code</b> cells run code, which is Python by default. You can check the language at the top right corner of the notebook. In the toolbar at the top, use the dropdown menu to toggle between Markdown, Code, and other cell types. 

Python can also be written in script files, which have the <b>.py</b> extension. These scripts execute a series of commands and usually return an object, like an image, file, or dashboard. They are good for code that does things under the hood. 
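As a minimal sketch of what a script file might contain, the code below summarizes a few numbers and prints the result. The file name `summarize.py`, the function name, and the sample values are all hypothetical and chosen only for illustration.

```python
# summarize.py (hypothetical file name): a minimal script sketch
from statistics import mean


def summarize(values):
    """Return a one-line text summary of a list of numbers."""
    return f"n = {len(values)}, mean = {mean(values):.2f}"


if __name__ == "__main__":
    # This part runs only when the file is executed as a script,
    # not when it is imported by another file
    print(summarize([1.0, 2.0, 4.5]))
```

If this were saved as `summarize.py`, it could be run from a command prompt with `python summarize.py`.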

## Using Python on Merck Laptops

The software needed to run Python is available in Merck's Software Center. You will need to download the following:

1. <b>Python 64bit </b>
2. <b>Anaconda or Visual Studio Pro</b>

Visual Studio is an integrated development environment that supports multiple programming languages. It can also run Jupyter notebooks, but this requires some extensions to be installed. 

I prefer Anaconda because I am most familiar with it, and it comes with many Python packages pre-installed, but it can be helpful to have multiple development environments when you are writing and testing many scripts.
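If you are unsure whether a package is already available in your environment, a quick check like the sketch below can help. The package names in the list are only examples that I assume you might want; swap in whichever ones you need.

```python
from importlib.util import find_spec

# Package names to check; this particular list is an illustrative assumption
PACKAGES = ["numpy", "pandas", "matplotlib"]


def is_installed(package_name):
    """Return True if Python can locate the package in this environment."""
    return find_spec(package_name) is not None


for name in PACKAGES:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```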

## Using the Command Prompt

Although the Windows file manager is very easy to use, when you work with code a lot, it is helpful to use the Command Prompt to navigate between directories (a.k.a. folders) and create files. When using Git (described in a later section), knowledge of the Command Prompt is essential. 

The Windows command prompt automatically installed on the Merck laptops is not very useful because nearly everything is blocked. IT may have ways to get around that, so if you need to use the command prompt, I would contact them to get greater access. 

I use the <b>Anaconda</b> prompt to install packages and run scripts. A new line of the command prompt looks like this:

`C:\Users\your_username>`

You type after the `>` symbol and then press Enter. By default, you are in your home directory. 

The most common commands you will likely use are:

1.  `cd`: <b><u>c</u></b>hange <b><u>d</u></b>irectory
2. `mkdir`: <b><u>m</u></b>a<b><u>k</u></b>e a new <b><u>dir</u></b>ectory
3. `dir`: print the contents of the current <b><u>dir</u></b>ectory
4. `del`: <b><u>del</u></b>ete a file (use `rmdir` to remove an empty directory)

To use `cd`, you must type the name of the folder you want to move into. For example, to move to my desktop folder, I would type

`cd Desktop`

If I have a sequence of folders, where folder_A is located on my Desktop, folder_B is located in folder_A, and so on and so forth, I would type

`cd Desktop/folder_A/folder_B/folder_C`

And now I'm in folder_C. If the folder you are trying to `cd` into isn't located where you think it is, you will get an error like `The system cannot find the path specified.` telling you that the file path you entered isn't there. 

The opposite of moving down a file tree is moving up. If you are in folder_C and want to navigate back up to the Desktop, you type `cd ../`, but the number of `../` sequences should match the number of levels you need to move up. In our case, folder_C is 3 steps below Desktop, so we would type

`cd ../../../`
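The same idea of moving up a file tree can be sketched in Python with the standard `pathlib` module. The path below is the hypothetical one from the example above; nothing is read from your actual file system.

```python
from pathlib import PureWindowsPath

# Hypothetical starting location, matching the example above
start = PureWindowsPath(r"C:\Users\your_username\Desktop\folder_A\folder_B\folder_C")

# .parent moves up one level, like a single `cd ../`
one_up = start.parent

# .parents[2] moves up three levels, like `cd ../../../`
three_up = start.parents[2]

print(one_up.name)    # folder_B
print(three_up.name)  # Desktop
```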

The other commands are straightforward. You don't need to type anything other than `dir` to get a list of all the files available in the current directory. 

To make a new folder, simply type `mkdir new_folder`, where new_folder is the name of the folder you want to make. Similarly, type `del file_name` to delete a file; the file is deleted immediately, without a confirmation prompt. If you give `del` a folder name instead, it asks you if you are sure (type `Y` to indicate "yes") and then deletes the files inside the folder, but not the folder itself. To remove an empty folder, type `rmdir folder_name`. 
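For comparison, here is a rough sketch of the same operations done from Python with `pathlib`. It works inside a temporary directory so that nothing on your real file system is touched; the folder and file names are hypothetical. (`rmdir` is the Command Prompt command for removing an empty directory.)

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory so no real files are affected
with tempfile.TemporaryDirectory() as scratch:
    base = Path(scratch)

    # Like `mkdir new_folder`
    new_folder = base / "new_folder"
    new_folder.mkdir()

    # Create an empty file so there is something to list and delete
    example_file = new_folder / "example.txt"
    example_file.touch()

    # Like `dir`: list the contents of a directory
    contents = sorted(item.name for item in new_folder.iterdir())
    print(contents)

    # Like `del example.txt`: delete a single file
    example_file.unlink()

    # Like `rmdir new_folder`: remove the now-empty directory
    new_folder.rmdir()
    folder_still_exists = new_folder.exists()
```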

Some of these commands are different on macOS and Linux, so if you use another operating system, you can look up the equivalents. There are many more commands for the command line, and you can easily look them up. 

## Git and Version Control

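Git is version control software for tracking changes in a repository. It runs on your own computer, and repositories are usually also hosted on a server so that they can be shared. It is most commonly used by programmers, but any file types can be stored in a repository. Merck uses Bitbucket, which is a Git hosting service for professional organizations. 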

Merck's Bitbucket is called Stash, which can be found <a href="https://stash.meck.com" target="_blank">here</a>. To use it, click on the link, which might prompt you to sign in for the first time or automatically sign you in. And that's it! Then you can create your own repositories or be added to existing ones. 


## Cloning the Repository

To use Git on your Merck laptop, you must have Git for Windows installed, which can be found <a href="https://git-scm.com/download/win" target="_blank">here</a>. This installs Git Bash, a command line that you can use to interface with Git-based repositories. 

<b><u>All remaining code in this tutorial must be run in this Git Bash command line, NOT the command line used for running scripts</u></b>

To edit files in an online repository, you must first clone the repository to your local machine. Once you navigate to the source code of a repository, click the button called <b>Clone</b> on the left-hand side and copy the HTTP link.

In the <b>Git</b> command prompt, navigate to the folder where you want to store Git repositories. I created a folder called "git" on my Desktop, where I store all my repositories. Once you're in this folder (or any other desired folder), run the following code on the <b>Git</b> command prompt.

<code>git clone <i>copied_HTTP_link</i></code>

Quotation marks are not necessary around the link. A folder with the same name as the repository will be created, and all of the files will be copied. 
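To see where the folder name comes from, the sketch below guesses it from the last part of the clone link, dropping a trailing `.git` as Git does by default. The function name and the example link are hypothetical, and this is only an approximation of Git's naming rule.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def clone_folder_name(clone_url):
    """Guess the folder that `git clone` creates from the last part of the URL."""
    last_part = PurePosixPath(urlparse(clone_url).path).name
    # Git drops a trailing ".git" from the folder name
    if last_part.endswith(".git"):
        last_part = last_part[: -len(".git")]
    return last_part


# Hypothetical clone link, for illustration only
print(clone_folder_name("https://example.com/scm/proj/python-tutorials.git"))
```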


## Committing Your Changes

When you're editing files in a repository, YOU MUST COMMIT your changes frequently. I commit whenever I make a large modification in case something happens to the local file or my computer. To commit changes, run these 3 commands in sequence:

<code>git add <i>file_that_was_changed</i></code>

`git commit -m "Quick message to describe the changes"`

`git push origin master`

Quotes ARE necessary around the commit message. If you change multiple files, you can commit the changes all at once. You can stage all files in the current directory for the next commit with

`git add *`

It will only update files that were changed, so this is probably the easiest strategy, especially if you have many changed files. If you instead want to manually include the file names, use <br>

<code> git add <i>file_1 file_2 "file number 3"</i> </code>

Quotes are only necessary if there are spaces in the file names. Otherwise, individual file names can just be separated by spaces.
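The quoting rule can be illustrated with Python's standard `shlex` module. The sketch below only builds and prints the three commands; it does not run Git. The function name `commit_commands` and the file names are hypothetical, and the `master` branch name is taken from the commands above.

```python
import shlex


def commit_commands(files, message, branch="master"):
    """Build the add, commit, and push commands as lists of arguments."""
    return [
        ["git", "add", *files],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", branch],
    ]


commands = commit_commands(
    ["file_1", "file_2", "file number 3"],  # hypothetical file names
    "Quick message to describe the changes",
)

for command in commands:
    # shlex.join adds quotes only around arguments that need them
    print(shlex.join(command))
```

`shlex.join` uses single quotes, which Git Bash accepts in the same way as double quotes here. Only the arguments that contain spaces end up quoted.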


## Pulling Others' Changes To Your Work

If there are multiple users working on a repository (as is usually the case), you must run the command

`git pull`

frequently to pull other users' changes into the versions of the files stored on your computer. New changes that are committed by a user with an older version of a file may lead to <b>merge conflicts</b>. These are typically resolved by editing the conflicting files so that they contain the intended changes and then committing the result; another option is to revert the files and have one user commit all the changes at once. 

Whenever I start working in a shared repository, I start with `git pull` and then begin my work. If everything is up to date, you will get a message saying so. At the end (and frequently throughout), commit all your changes. Remember the phrase, "Git commit, git push, and git out the door!"


## Deleting Files in Git

To delete files from the cloud repository, run

`git rm file_name`

"rm" stands for "remove." The line above will give you an error if you try to remove a folder that is not empty. In that case, add the `-r` flag (flags are denoted by the `-` character) to delete recursively. Git will then delete every file and subfolder in that folder. The code is below. 

`git rm -r folder_name`

After deleting files or folders, run `git commit -m "your message here"` and `git push origin master` just as if your commit was adding files, rather than deleting them.
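As a small sketch of the rule above, the helper below picks between `git rm` and `git rm -r` depending on whether a path is a folder. It only builds the command and does not run Git; the helper name and the file and folder names are hypothetical.

```python
import tempfile
from pathlib import Path


def git_rm_command(path):
    """Return the `git rm` command for a path, adding -r for directories."""
    path = Path(path)
    if path.is_dir():
        return ["git", "rm", "-r", str(path)]
    return ["git", "rm", str(path)]


# Demonstrate on a throwaway folder and file (names are hypothetical)
with tempfile.TemporaryDirectory() as scratch:
    folder = Path(scratch) / "folder_name"
    folder.mkdir()
    file_path = folder / "file_name.txt"
    file_path.touch()

    folder_command = git_rm_command(folder)
    file_command = git_rm_command(file_path)

print(folder_command[:3])  # ['git', 'rm', '-r']
print(file_command[:2])    # ['git', 'rm']
```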