# Collaborate on projects with Git & Github

As you grow as a developer, you will need to collaborate on projects. Writing code together on the same project can be a nightmare if you don't know how to do it.

## What you will learn in this class 🧐🧐

In this lecture, you'll learn how to collaborate efficiently on software development projects and manage code versions. By the end, you will know:

*   The basics of Git, a Version Control System (VCS)
*   How to use Git from the terminal and share your code on Github (a collaborative development platform)
*   How to collaborate on a development project using the agile method

## Basic terminal commands to manage files

In M01-D01, you learnt how to navigate your filesystem from your terminal, using the commands `pwd`, `cd` and `ls`. Let's introduce some additional bash commands that will allow you to manage files and directories:

* `mkdir dirname`: **Create a new directory**
* `touch filename`: **Create a new empty file** 
* `open filename`: **Open `filename` with the default application for its extension.** This command is specific to macOS; the usual equivalents are `xdg-open filename` on Linux and `start filename` on Windows.
* `cp existing_file new_file`: **Copy `existing_file` into `new_file`.** Both `existing_file` and `new_file` can include paths (relative to your working directory). For example, `cp ../file1.ipynb folder1/file2.ipynb` takes `file1.ipynb` from the parent directory of your working directory and saves a copy named `file2.ipynb` in the `folder1/` directory located in your working directory.
* `mv existing_file new_location`: **Move `existing_file` to `new_location`.** You can use paths in the same way as with the `cp` command. For example, `mv file1.py folder1/file1.py` will move `file1.py` into `folder1`. You can also use this command to rename files: for example, `mv file1.md README.md` will rename `file1.md` to `README.md`.

NB: all the commands listed above are executed relative to your *working directory*. You can check which directory you are in at any time with the `pwd` command, and use `cd` to change your working directory.
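As a quick illustration, here is a minimal sketch that chains these commands in a throwaway directory. The `sandbox` variable and the file names are placeholders chosen for this example, and `mktemp -d` is only used so that nothing on your machine is modified.

```shell
# Work in a throwaway directory so that no real file is touched
sandbox="$(mktemp -d)"
cd "$sandbox"

mkdir folder1                      # create a new directory
touch file1.md                     # create a new empty file
cp file1.md folder1/file2.md       # copy file1.md into folder1/ under the name file2.md
mv file1.md README.md              # rename file1.md to README.md
mv README.md folder1/README.md     # move README.md into folder1/

ls folder1                         # list what ended up in folder1/
pwd                                # print the working directory (the sandbox)
```

After running it, `folder1/` should contain both `file2.md` and `README.md`, and the sandbox itself should contain nothing but `folder1/`.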

## Git : a version control system

### What is git ? 

When you code for a project, you may have to make several versions of it. For example, you have a first file:


```
myproject.py
```


Then you decide to improve the project, so you create a copy of this file and call it:


```
myproject_V2.py
```


Later, you may need a third version and make yet another copy of your file, and so on. You can keep going this way for a little while, but as your project becomes more complex, you will find it hard to keep everything in order.

This is what a VCS, or Version Control System, is for. With such a system, you can record successive versions of your project without losing or breaking the initial code.

One of these VCSs is Git, the most popular and widely used of all. So we're going to learn how to use it to work on complex projects.

### Configuring git

#### Download and setup

Go to [this webpage](https://git-scm.com/downloads), download the setup file corresponding to your operating system and then follow the setup instructions. 

#### Configure your username and email

Before you can use Git, you'll have to configure your username and contact email. This has to be done only once.

Open your terminal and type:

```
git config --global user.name
```

This prints your current Git username (nothing is printed if it has not been set yet). To set or change your username, type:

```
git config --global user.name "your_username"
```

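Configuring your contact email works the same way: just replace `user.name` with `user.email`. The first command below prints the current email, and the second one sets it (replace the placeholder address with your own):


```
git config --global user.email
git config --global user.email "your_email@example.com"
```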


### Basic git commands 

Git works with particular directories called "repositories". A repository is a directory containing a project's code, already configured to work with Git. Listed below are the commands you'll need to download an existing repository from Github (see the next section for more details about Github) and add or change files in it while keeping a history of the different versions of the files:

* `git clone url.git`: **Download an existing repository from the remote url specified.**
* `git status`: **Show the current state of the repository.** This allows you to see whether files have been created, modified or deleted since the last commit (i.e. the last time you saved changes: see `git commit`). Use this command often: being aware of the status of your repository helps you avoid mistakes!
* `git add file.ipynb`: **Add the current changes in `file.ipynb` to the list of changes that will go into the next commit** (this is called *staging*). Only the changes that are "added" will be taken into account when making a `git commit` (see next command), so run `git add` again after each new modification. If you want to add all the changes in your working directory and its subdirectories, you can type `git add .`
* `git commit -m "text describing the changes you made"`: **Create a new commit that records the changes made in the files that were "added" before.** A commit is like a "flag" that will allow you to go back to a specific version of a file at any time.
* `git log`: **Show the history of the commits made so far.** This will allow you to get the unique identifier of a specific commit, in order to restore a specific version of your project.
* `git reset commit_identifier`: **Go back to the specified commit.** By default, this rewinds the commit history and unstages changes but leaves the content of your files as it is; use `git reset --hard commit_identifier` to also restore the files to that version, knowing that this discards all uncommitted changes.
* `git push`: **Send your new commits to the remote Github url from which you downloaded the repository.** This gives you a remote copy in the cloud of all the changes you've made, so you won't lose your code even if your computer crashes :-)

All the commands listed above are executed in your *working directory*. Make sure your working directory is inside a Git repository, otherwise they won't work ;-) (except for `git clone`, which works anywhere on your system, as it downloads a repository).

NB: in this lecture, we don't cover how to create a Git repository and initialize it from scratch. If you want to know more about Git, please read [Git's official documentation](https://git-scm.com/doc).
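To make the status / add / commit / log cycle more concrete, here is a minimal sketch you can run safely. It relies on a few assumptions that are not part of the lecture: `git init` is used only to get a throwaway repository to experiment in (instead of cloning one), the `repo` variable, the file content and the commit messages are placeholders, and the identity is set for this sandbox only (no `--global`), so your real settings stay untouched.

```shell
# Throwaway repository: `git init` is used here only to create a sandbox
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Identity for this sandbox only (no --global)
git config user.name "Sandbox User"
git config user.email "sandbox@example.com"

echo "# My project" > README.md
git status                             # README.md is listed as untracked

git add README.md                      # stage the file for the next commit
git commit --quiet -m "Add README"     # first version

echo "More details" >> README.md
git status                             # README.md is listed as modified
git add README.md
git commit --quiet -m "Describe the project"

git log --oneline                      # one line per commit, most recent first
```

Each line printed by `git log --oneline` starts with a short form of the commit identifier, which is what a command such as `git reset` expects.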


## Github : a platform for collaborative software development

### What is github ?

Using a VCS on your own machine is already very useful, but you can go one step further. Imagine that collaborators no longer have to exchange files from machine to machine: everything is stored in the cloud, so that all authorized people have access to the resources at any time. That's the principle behind Github.


Thanks to this tool, you will be able to create what we call "repositories" and put your projects in them, interact with all the people who collaborate on one of them and deploy them.


In Github, you have a personal profile, which can be part of an organization (e.g. Jedha). Let's explain a little bit about both.


#### Your profile

As in all social networks, to access Github, you need to have a profile. It's as simple as that. Just go to [github.com](https://github.com/) and enter :

1. A github username
2. An email
3. A password

Once you have all that, you can start using the tool.

On your profile, you'll see:

* The repositories you have
* The repositories you follow via the _stars_
* Your number of followers
* The number of people you follow

If you need to set up anything on your account, just go to the right side of the navigation bar, click on your avatar and a dropdown menu should appear where you can see the account settings.


#### Being part of an organization

If you work in a tech company, it is highly likely that this company has a "page" on Github. This page is actually called an organization. It is used to create teams of people working on common projects.

An organization can own repositories, control who has access to them, and manage team projects. Creating or joining an organization is very useful if you have to work with a lot of people on a project. As we use Git & Github, you'll get a better understanding of how it all works. So let's get to it.


### Configuring a remote repository

Once you've signed up to Github, you will be able to create remote repositories. To do so, you can go to https://github.com/new (from the homepage, you can also click on the "new" button in the repositories section). Then follow these steps to configure your repository :

* Choose a name (pick something that sums up what will be inside, avoid names like "project1" or "test")
* Choose the access level : public or private. If your project is still ongoing, we advise you to make the repository private. You'll be able to make it public at any time, for example once you're satisfied with your code and if you'd like to share it as part of your portfolio.
* Select "initialize this repository with a README". This will create a Markdown file *README.md* at the root of your repository. This file will be displayed by default on Github's homepage of the repository. Later, you will be able to update this file with pieces of information such as: a description of your project, instructions about how to install and execute your code, a link to the dataset you used, and external resources (links to the external libraries you used, scientific articles, tutorials, blogposts, etc...). We'll take some time later in this lecture to teach you how to edit Markdown files.
* In the "add .gitignore" drop-down menu, select "Python". This will create a *.gitignore* hidden text file that will help Git automatically ignore files and directories that don't contain useful content (for example, the backup files that are created by Jupyter and that are stored in a *.ipynb_checkpoints* directory)

Once you've completed all these steps, you can click on "create repository". You will then be redirected to the homepage of the repository. There, click on the green "clone or download" button in the top right corner (labelled "Code" in more recent versions of Github's interface) and copy the url of your repository to your clipboard (the url that ends with ".git") so you can use it with the `git clone` command to download the repository to your computer.
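To get a feel for what the *.gitignore* file does once the repository is on your computer, here is a minimal sketch run in a throwaway repository. It makes several assumptions: the one-line *.gitignore* written below is a simplified stand-in for the much longer file Github generates, `git init` only serves to create the sandbox, and the `repo` variable and notebook names are placeholders.

```shell
# Throwaway repository used as a sandbox
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# One-line stand-in for the .gitignore generated by Github
echo ".ipynb_checkpoints/" > .gitignore

# Simulate a notebook and the backup copy Jupyter would store next to it
touch analysis.ipynb
mkdir .ipynb_checkpoints
touch .ipynb_checkpoints/analysis-checkpoint.ipynb

git status --short     # lists .gitignore and analysis.ipynb, but not the backup
git check-ignore -v .ipynb_checkpoints/analysis-checkpoint.ipynb   # shows which rule ignores the file
```

Because the backup directory matches a rule in *.gitignore*, `git add .` would skip it, which keeps the repository free of files that are of no use to your collaborators.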

### Contributing to an open-source project

You may be surprised by the fact that a lot of repositories are public on Github. Actually, many of the famous Python libraries are *open-source*: data scientists all over the world can use them for their own needs for free, and these libraries have been developed thanks to contributions from all around the world. The famous libraries that you will learn to use in the coming weeks are all open-source libraries and you can find their source code on Github: [pandas](https://github.com/pandas-dev/pandas), [seaborn](https://github.com/mwaskom/seaborn), [scikit-learn](https://github.com/scikit-learn/scikit-learn), [tensorflow](https://github.com/tensorflow/tensorflow), etc.

Once you are an expert in Python and Data Science, you may want to add features to the source code of such libraries. To do so, you will follow these steps:

* **Fork a repository** : this will create a copy of an existing repository (for example scikit-learn's repository) *hosted on your Github account*
* Use the git commands you already know to download the repository to your computer, make some changes in the source code, and publish the changes on your Github repository
* **Send a pull request** : this will notify the managing team of the original repository (for example, scikit-learn developers) that you made changes that you would like to merge into the original code. Usually, it opens a discussion with the source code's owners and in the end, they can accept or refuse your contribution.
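The middle step (download, change, publish) can be rehearsed without a network connection. In the minimal sketch below, a local "bare" repository stands in for your fork on Github; this is an assumption made for the example only, as are the `base` variable, the directory names and the commit messages. With a real fork, you would simply pass its url to `git clone`, and the fork and pull request steps would happen on the Github website.

```shell
base="$(mktemp -d)"

# Stand-in for your fork on Github (normally created with the "Fork" button)
git init --quiet "$base/seed"
git -C "$base/seed" config user.name "Sandbox User"
git -C "$base/seed" config user.email "sandbox@example.com"
echo "# Some library" > "$base/seed/README.md"
git -C "$base/seed" add README.md
git -C "$base/seed" commit --quiet -m "Initial commit"
git clone --quiet --bare "$base/seed" "$base/fork.git"

# What you would do on your computer
git clone --quiet "$base/fork.git" "$base/work"   # with Github: git clone <url of your fork>
cd "$base/work"
git config user.name "Sandbox User"               # sandbox-only identity (no --global)
git config user.email "sandbox@example.com"

echo "A new feature" >> README.md                 # make a change
git add README.md
git commit --quiet -m "Add a new feature"
git push --quiet                                  # publish the commit to the fork

# The fork now contains the initial commit and yours
git --git-dir="$base/fork.git" log --oneline
```

Once the commit is visible on your fork, opening the pull request is the only remaining step.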



### A very simple Git/Github workflow 

Nowadays, most companies host their source code on Github (or equivalent platforms such as Gitlab) and manage version control with Git. You can therefore be pretty sure that you will have to use these tools once you're working as a data analyst/engineer/scientist! However, there are many ways of working in teams with Git/Github, and they vary from one company to another. These are called "Git/Github workflows". Don't worry, you'll be taught your company's Git/Github workflow once you're hired. Here, we only introduce a simplified workflow so that you understand the philosophy of how to use Git and Github.

Let's imagine that Roger is a very skilled data scientist who wants to add some features to scikit-learn's source code. Here is a diagram of the different steps he would go through:

![](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/01-Git_and_github_basics.png)



## Resources 📚📚

* Git cheatsheet - [https://services.github.com/on-demand/downloads/fr/github-git-cheat-sheet/](https://services.github.com/on-demand/downloads/fr/github-git-cheat-sheet/)

* Git best practice - [https://guides.github.com/introduction/flow/](https://guides.github.com/introduction/flow/)

* Setting up your username and email on git - [https://help.github.com/articles/setting-your-username-in-git/](https://help.github.com/articles/setting-your-username-in-git/)

* Create a repository on Github - [https://help.github.com/articles/create-a-repo/](https://help.github.com/articles/create-a-repo/)

* Writing on Github for beginners - [https://help.github.com/articles/basic-writing-and-formatting-syntax/](https://help.github.com/articles/basic-writing-and-formatting-syntax/)