# `git2net` - Extracting and analysing co-editing relationships from *git* repositories

In this tutorial you will learn the basic steps required to obtain co-editing relationships from a git repository using `git2net`.

## Prerequisites

This tutorial assumes you have `git2net` installed. In addition, we recommend creating a dedicated folder for this tutorial, as additional files will be downloaded to your current working directory unless you specify otherwise.

## Repository Mining

To start, you will need to select and clone a git repository that you are interested in analysing. For the purpose of this tutorial, we will analyse the repository behind `git2net`&mdash;aiming to finally find a solution to the well-known chicken and egg problem.

The following lines will clone the `git2net` repository to your current working directory. To change this location, you can edit the path stored in `local_directory`. The repository is cloned into a folder whose name we store in `repo_dir`. Note that any existing folder with this name is deleted first.

In [1]:
import git
import os
import shutil

repo_url = 'https://github.com/gotec/git2net.git'
local_directory = '.'
repo_dir = 'git2net4analysis'

if os.path.exists(repo_dir):
    shutil.rmtree(repo_dir)

git.Git(local_directory).clone(repo_url, repo_dir)

''

Now that we have obtained a local copy of the repository, we can use `git2net` to obtain a database containing information on all commits and edits made to obtain the current state of the repository.

To do so, we use the `mine_git_repo` function. This function takes two required inputs as well as a number of optional inputs, some of which we will explore later in this tutorial. Let's start with the required inputs. First, we need to supply the path to the git repository that will be analysed. Below, this is done with the variable `repo_dir`. Second, `git2net` requires a path to the *sqlite* database that will be filled during the mining process. This path is provided as `sqlite_db_file`.

Note that if no database exists at the supplied path, `git2net` will create a new one. If a database exists, `git2net` will check whether it was mined with the same settings and on the same repository, and then resume the mining process from where it left off.

Let's try this out. Below we import `git2net` and point it to the path to which we cloned the repository. In addition, we specify the location of the database file in which the results of the mining process will be stored and make sure that this database does not exist yet. We then run the `mine_git_repo` function with the optional arguments `max_modifications = 1` and `timeout = 1`. With `max_modifications = 1`, only commits in which at most one file was modified are mined.

In [None]:
import git2net

sqlite_db_file = 'git2net.db'

# Remove database if exists
if os.path.exists(sqlite_db_file):
    os.remove(sqlite_db_file)

max_modifications = 1
    
git2net.mine_git_repo(repo_dir, sqlite_db_file, max_modifications=max_modifications, timeout=1)

Found no database on provided path. Starting from scratch.


Parallel (8 processes):   1%|          | 1/128 [00:00<00:21,  5.93it/s]

Commit exceeding max_modifications:  bd0ad7b12500239321a8b7c6ba547f6111c781bb


Parallel (8 processes):   3%|▎         | 4/128 [00:00<00:31,  3.97it/s]

Timeout processing commit:  db0e20b413363cb93447ed4567bb8e959fc7f306
Timeout processing commit:  6a844a712042c3ce688a2060d8ae691b9ab86a32
Timeout processing commit:  ace737585423d53a9641069c486272ff0a16cd3d
Timeout processing commit:  c71a73528640d81b325e19aa38a3f68e4fe34366


Parallel (8 processes):   5%|▍         | 6/128 [00:01<00:24,  4.98it/s]

Timeout processing commit:  d7def7ec7a8305990640d86470d97f1a16727c5a
Timeout processing commit:  69d9c6578fa1aafcbe2ed548159ba16cf39ffc46
Timeout processing commit:  5e799a0f7574d60ccc9e7d9b21192c4f157f0302
Timeout processing commit:  15e0d8e497ee8d91e67b081ef9444bb7d3a6e9d9
Commit exceeding max_modifications:  eed200119f675f2abc69a5f72c3505e903b82fd2
Timeout processing commit:  d7c56fb572def9a359df2db0e2c8299494b83ca6


Parallel (8 processes):  11%|█         | 14/128 [00:01<00:16,  6.87it/s]

Timeout processing commit:  4f206f78ab3f4a5e1d75b863366cf387825dcf04


Parallel (8 processes):  14%|█▍        | 18/128 [00:01<00:12,  9.05it/s]

Timeout processing commit:  a78055e4e88e1878bda1d18b7505c20f947ad2b9
Timeout processing commit:  6d0ffd02a7e9af28f942bd53d547755e7d885ed0


Parallel (8 processes):  16%|█▋        | 21/128 [00:01<00:14,  7.32it/s]

Timeout processing commit:  03162fd93b2cfca4ec877d0229c7b0bd5a0d775e
Timeout processing commit:  20be877a25d43a896ad6dbb93f365d87b075d364
Timeout processing commit:  b82391892fcb10e40e4a4eb80ebc0aaa6d131db1
Timeout processing commit:  f56a051ffb4dae05ced1b6a2ccf8a9326af9639e


Parallel (8 processes):  19%|█▉        | 24/128 [00:02<00:11,  8.77it/s]

Timeout processing commit:  f2e3cce39f8738d934d458825b31cb38b61a1da6
Timeout processing commit:  d3a26c0062b116bc2280a7728025665932f3c959
Timeout processing commit:  a2d7c4f4f7ae756fc3f16e76d5e258525a8cae95
Timeout processing commit:  78e4776fd52ba4c57696dde110d2e2056dd29319
Commit exceeding max_modifications:  40cc53f783aeb835fbec20f4d5e165af4e24fd32
Timeout processing commit:  0c29df6064551b13c7f40a61f873dfb0166d45db
Commit exceeding max_modifications:  9a042c9c7c6a99733d1b94bf0d440f5d22389a79
Timeout processing commit:  c13245365f07efbd1bdf1989b9bbd09d7478a551


Parallel (8 processes):  27%|██▋       | 34/128 [00:02<00:07, 12.08it/s]

Timeout processing commit:  adc80c82c9b4bfec53a6385a8498e15e75595e35
Timeout processing commit:  c657e752b411caf531e3fff8fc0ea8e0b756ed43
Commit exceeding max_modifications:  a2d25731c924765db4f21fa3afa7d263b7c9e79d
Timeout processing commit:  6dec07bf1b1adf1615d5cad3ceed551ae41bff8b
Commit exceeding max_modifications:  87c4d8f3206b400785602de03bdf87f109a65008
Commit exceeding max_modifications:  95eb238eeb60d6f4d1eee5acdba1d195d6e0cf70


Parallel (8 processes):  31%|███▏      | 40/128 [00:02<00:05, 15.59it/s]

Commit exceeding max_modifications:  16b2226a47e2747a3de9ff07f5fec0ad1abd8e0c
Commit exceeding max_modifications:  f8e0c813a4a3049725b4c65a69bf0b487a685276


Process ForkPoolWorker-6:


Commit exceeding max_modifications:  eb40bbb2e7c68c7ab73bef6d91b41d2376581907


Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)


Commit exceeding max_modifications:  64701617d0d468bba66760046d5519c54c7f3371


  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/cgote/polybox/03_PhD_ETH/04_Code/git2net/git2net/extraction.py", line 1004, in _process_commit
    commit = git_repo.get_commit(args['commit_hash'])
  File "/home/cgote/.local/lib/python3.7/site-packages/pydriller/git_repository.py", line 97, in get_commit
    return Commit(self.repo.commit(commit_id), self.path, self.main_branch)
  File "/home/cgote/.local/lib/python3.7/site-packages/git/repo/base.py", line 466, in commit
    return self.rev_parse(text_type(rev) + "^0")


Commit exceeding max_modifications:  c81b190fe260050fcd7ff86a7e947b47cf8f8085


  File "/home/cgote/.local/lib/python3.7/site-packages/git/repo/fun.py", line 213, in rev_parse
    obj = name_to_object(repo, rev[:start])
  File "/home/cgote/.local/lib/python3.7/site-packages/git/repo/fun.py", line 150, in name_to_object
    return Object.new_from_sha(repo, hex_to_bin(hexsha))
  File "/home/cgote/.local/lib/python3.7/site-packages/git/objects/base.py", line 64, in new_from_sha
    oinfo = repo.odb.info(sha1)
Parallel (8 processes):  35%|███▌      | 45/128 [00:02<00:04, 18.84it/s]  File "/home/cgote/.local/lib/python3.7/site-packages/git/db.py", line 37, in info
    hexsha, typename, size = self._git.get_object_header(bin_to_hex(sha))
  File "/home/cgote/.local/lib/python3.7/site-packages/git/cmd.py", line 1076, in get_object_header
    cmd = self._get_persistent_cmd("cat_file_header", "cat_file", batch_check=True)
  File "/home/cgote/.local/lib/python3.7/site-packages/git/cmd.py", line 1059, in _get_persistent_cmd
    cmd = self._call_process(cmd_name, *args, **opti

Commit exceeding max_modifications:  e75736eaf9bd01e6f410c4dc51d9e58dcf20eacb


  File "/home/cgote/.local/lib/python3.7/site-packages/git/cmd.py", line 1014, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/home/cgote/.local/lib/python3.7/site-packages/git/cmd.py", line 735, in execute
    **subprocess_kwargs


Commit exceeding max_modifications:  91d5d98881c6289f42f30508c4b26d3fa7baf6ca


Parallel (8 processes):  39%|███▉      | 50/128 [00:02<00:04, 18.72it/s]

Timeout processing commit:  a78d4fc45b3004fc6e05f21eb4bdb53f8af523e6
Commit exceeding max_modifications:  240a13c3b87558cb85963d3cda415a63b54a8cbf
Timeout processing commit:  b27590f748980558479a8e0fe38eac19ffd5ed58


Parallel (8 processes):  42%|████▏     | 54/128 [00:02<00:03, 18.62it/s]

Timeout processing commit:  de280d299821d237393e9a0802f8258bd47eb36e
Timeout processing commit:  cd47fa9230fd7edb2b4734d05b72ea950d4d8c13
Timeout processing commit:  dc91d406614549ddea4d7dfbec64ff772dbace00


Parallel (8 processes):  45%|████▌     | 58/128 [00:03<00:03, 19.77it/s]

Timeout processing commit:  5fa0ab29b13b66022dab9ef13eb2d2a99da672ef
Timeout processing commit:  362e42d869640b598a31f2a217babbfe48b8f0ea
Timeout processing commit:  e3c6420dc70bd515ee1b9185c59548d90a710b04
Timeout processing commit:  a3213cd995e850c8966355755c4ac2ff61f65503
Timeout processing commit:  a0610f6375d0d26657a73f67a84e10cc8d88f578
Timeout processing commit:  8d6c1528ff61707116a030e9bf68129f6fa28c09


Parallel (8 processes):  51%|█████     | 65/128 [00:03<00:02, 24.63it/s]

Timeout processing commit:  8c0863612be75e11c1437c8a06cac77013115576
Timeout processing commit:  405d2b5823b859fa86c2dd0e05bc408e34ad61f1
Timeout processing commit:  7053b4a63f2f84e34b685178a05456006b6c2969
Commit exceeding max_modifications:  9ef69d206d7cedb82b12d68a39445b2e936cd15f


Parallel (8 processes):  54%|█████▍    | 69/128 [00:03<00:02, 23.43it/s]

Timeout processing commit:  8d9369e9e5a37f6c0675882322a15820264d67d0
Timeout processing commit:  f04ec5edd7757c3744d7f353abc844498c100816
Timeout processing commit:  71b1cd496f6dc800acd7e59260d86b647cc58291
Timeout processing commit:  ac163cf1117e97fb22784c722269cf7502cd1139
Commit exceeding max_modifications:  806fc44d2250c316c75692601362aecabc63d137
Commit exceeding max_modifications:  9e72df61bf300b42c3fbc16d94153e8edbbe6dd6


Parallel (8 processes):  58%|█████▊    | 74/128 [00:03<00:02, 26.93it/s]

Timeout processing commit:  83dcbd676d27de45d815af6deacfbfc6cd2e8e9c
Timeout processing commit:  d6ccd2125c6f360fe37b8b89eb409d8462fb9ab3
Timeout processing commit:  1c170af0018d97ce7c1a4ca42d82c913685e2eba
Commit exceeding max_modifications:  090c00c342283134a23900f85c1d232499617365
Commit exceeding max_modifications:  509e1394637f74a357ef2bf0c567dc6520a80eb6


Parallel (8 processes):  63%|██████▎   | 81/128 [00:03<00:01, 33.00it/s]

Commit exceeding max_modifications:  cf51fa8ddf40c85645cf9e6e7fb5c64b322a20ef
Commit exceeding max_modifications:  73e2b77a786cf19ec4a04e0a95ae4a0f93c45c54
Commit exceeding max_modifications:  b3b8e33bd6ae43ba9ff50f4b84cc2c6c897fe92b


Process ForkPoolWorker-9:


Commit exceeding max_modifications:  1504d68a4daf1e7529c6ac1a192794da765da9d2


Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()


Commit exceeding max_modifications:  2294efe5bf28560eb11437f54e18c4ff710e2bd1


Parallel (8 processes):  70%|███████   | 90/128 [00:03<00:00, 40.54it/s]  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/cgote/polybox/03_PhD_ETH/04_Code/git2net/git2net/extraction.py", line 1010, in _process_commit
    alarm.start()
  File "/usr/lib/python3.7/threading.py", line 852, in start
    self._started.wait()


Timeout processing commit:  808e9f944fb3b120bef7866cbc38d34752ee6851


  File "/usr/lib/python3.7/threading.py", line 552, in wait
    signaled = self._cond.wait(timeout)
  File "/usr/lib/python3.7/threading.py", line 296, in wait
    waiter.acquire()
KeyboardInterrupt


Commit exceeding max_modifications:  7e7a8bd30a12628028234308ae6c7e2f5b5ec2b2
Commit exceeding max_modifications:  0a8bc07dfd7c481b8936fddd99e7a8a8aac74dfe
Timeout processing commit:  e72fc32211fa12c2b0b923ac58f1e4afca34c5fc
Timeout processing commit:  9186ca066593faf4b82a4cbb3efcdf4ef7e146d0
Timeout processing commit:  b55475b76c80e76a9ca50c6f195caf95a06615bd


Parallel (8 processes):  75%|███████▌  | 96/128 [00:04<00:01, 24.93it/s]

Timeout processing commit:  7159ffa4c0fc6a1d4f7f89cd7170485d5d55010e
Timeout processing commit:  6fda83e6987efc76d5257c3cc9211d15961fdf64
Timeout processing commit:  b165e9923807b2e7141d1b405228f7b283ca1ec2
Timeout processing commit:  4354c797085796cb2211a939231f40c4e9462785
Timeout processing commit:  44d132cb03b228294afcc39b7b2161e1898d6aab


Parallel (8 processes):  79%|███████▉  | 101/128 [00:04<00:01, 23.41it/s]

Timeout processing commit:  a685cd013932af10fe914367194f6f1c10cb893a
Timeout processing commit:  fa37b8165def3bfeea683e265b2a4bbe20d1e1cb
Timeout processing commit:  aeced6e9483005b380cb950dbdc9600ef8d1255e
Timeout processing commit:  0e064348e183d81f6a0329443c3ce57bc1322e26
Timeout processing commit:  6471245ec05e98a862301783d4d255d77fe00939
Timeout processing commit:  6a34bb3979f1ef490f9471dd7c80466a542907d0


Parallel (8 processes):  84%|████████▎ | 107/128 [00:04<00:00, 28.27it/s]

Timeout processing commit:  8f511679685996f0548f5f31009a6d574368bd7c
Timeout processing commit:  6b837c8f5383d16434afa0eaaab347873113230b
Timeout processing commit:  e512575035dd55386f54c25aba82e0278cf0f808
Timeout processing commit:  d32df107ed2d75c741726e88e39ba8b3f46b146f
Timeout processing commit:  063e74d1c06b1d0f3c62d34c5668c434f39a5d0f
Timeout processing commit:  682fc1eb9533303b8dd04586ab00cd96d76db4ab


Parallel (8 processes):  89%|████████▉ | 114/128 [00:04<00:00, 34.28it/s]

Timeout processing commit:  5e259c75123f17c47e270eace26145fe1f20167b
Commit exceeding max_modifications:  03d8af2aee9f4b4f320ec83deecccf9245b8ce03
Timeout processing commit:  4515874def08a49af37a4d47615a2e05dc508d89


Process ForkPoolWorker-3:


Commit exceeding max_modifications:  bbc55b0ec194710e39bfe098acaf930598eb045c


Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 127, in worker
    put((job, i, result))
  File "/usr/lib/python3.7/multiprocessing/queues.py", line 364, in put
    self._writer.send_bytes(obj)


Timeout processing commit:  5afb93fe796bb84955296278c0c78ee6b36e49a1


  File "/usr/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
Parallel (8 processes):  94%|█████████▍| 120/128 [00:04<00:00, 36.24it/s]  File "/usr/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
KeyboardInterrupt


Commit exceeding max_modifications:  3f6435c4acbf8d1b6b02e9be02b514c2c2d53446
Commit exceeding max_modifications:  4de4c314a1261e784f1a54969a32ab4dfa439d76
Timeout processing commit:  56d841a65a39508b3325ed6b770ed20b405a61ec
Timeout processing commit:  51c8b566692857ed39e899316da79e6ce83b0c4d
Timeout processing commit:  f650aa008e91423a455c9303cc11dc5680944ce8


Parallel (8 processes):  98%|█████████▊| 125/128 [00:04<00:00, 36.72it/s]

Timeout processing commit:  3ca9e249a39dab08750eed286b2f16b1fb533fa1


Parallel (8 processes):  98%|█████████▊| 126/128 [00:19<00:00, 36.72it/s]

While mining, `git2net` provides information about the current progress. The first line shows that no database was found at the provided path and that mining starts from scratch. This is expected, as we deliberately deleted any existing database before the run.

Subsequently, progress updates on the mining process are printed. The first item is the number of processes that `git2net` spawns. `git2net` is highly parallelised: by default it detects the number of threads your CPU provides and uses all of them. To reduce this load, set the number of processes explicitly with the `no_of_processes` option of the `mine_git_repo` function.

The rest of the progress line shows the number of commits processed so far and the total number of commits to be mined in this run, as well as the elapsed time and an estimate of the time remaining.

If a commit is skipped, the reason and the commit hash are printed. Currently, there are three cases in which a commit can be skipped. Firstly, as seen above, a commit can exceed the maximum number of modifications set by `max_modifications`. Secondly, processing the commit can take longer than the maximum time defined by the `timeout` option. Thirdly, a commit can be skipped because an error occurred while processing it. If that happens, please report the repository and commit hash in a new issue on github.com/gotec/git2net.
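
To keep track of which commits were skipped and why, the printed messages can be tallied by reason. The following is a minimal sketch, assuming the output was captured as plain text; the helper `skipped_commits` is hypothetical and not part of `git2net`. It only matches the two message formats visible in the output above, since the format of the error message is not shown in this tutorial.

```python
import re
from collections import defaultdict

# Matches the two skip messages shown in the mining output above,
# followed by a full 40-character commit hash.
SKIP_PATTERN = re.compile(
    r'(Commit exceeding max_modifications|Timeout processing commit):'
    r'\s+([0-9a-f]{40})'
)


def skipped_commits(log_text):
    """Group skipped commit hashes by the reason printed in the log."""
    skipped = defaultdict(list)
    for reason, commit_hash in SKIP_PATTERN.findall(log_text):
        skipped[reason].append(commit_hash)
    return dict(skipped)


# Three lines copied from the output above.
sample = """\
Commit exceeding max_modifications:  bd0ad7b12500239321a8b7c6ba547f6111c781bb
Timeout processing commit:  db0e20b413363cb93447ed4567bb8e959fc7f306
Timeout processing commit:  6a844a712042c3ce688a2060d8ae691b9ab86a32
"""

result = skipped_commits(sample)
for reason, hashes in result.items():
    print(reason, len(hashes))
```

The resulting lists of hashes could then be passed back to the mining function selectively, for example after raising the relevant limit.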

Let's resume the mining process while increasing the maximum number of modifications to 5!

In [None]:
max_modifications = 5

git2net.mine_git_repo(repo_dir, sqlite_db_file, max_modifications=max_modifications)

As you can see from the output above, the process was resumed from the old database, skipping the already processed commits in the repository.

Great, we made some progress and a large share of the commits in the repository are already mined and in the database! But what about the other ones? We can get more information on the commits missing from the database with the `mining_state_summary` function. Like `mine_git_repo`, it requires the paths to the repository and to the database.

In [None]:
git2net.mining_state_summary(repo_dir, sqlite_db_file)

The function again provides a summary of the mining state, as well as details on all missing commits. Let's assume we are very interested in commit *090c00c342283134a23900f85c1d232499617365* but want to avoid crawling the other missing commits. While this is unnecessary for small repositories such as `git2net`, it can become highly relevant for larger projects such as `linux`, where individual commits can change thousands of files, which in turn requires significant computational resources to analyse. This is particularly important for merge commits, as all files included in the diffs to both parent commits need to be considered. Therefore, for larger projects I generally recommend running `git2net` with `max_modifications = 1000` and subsequently increasing this number if required.
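
The staged approach recommended above can be sketched as a small loop. This is only an illustration: the helper `mine_in_stages` is hypothetical, and the second threshold of 5000 is an arbitrary assumption rather than a recommended value. The mining function is passed in as an argument so that the sketch can be tried without a repository.

```python
def mine_in_stages(mine, repo_dir, sqlite_db_file, thresholds=(1000, 5000)):
    """Call `mine` once per threshold, from the smallest to the largest."""
    for max_modifications in sorted(thresholds):
        mine(repo_dir, sqlite_db_file, max_modifications=max_modifications)


# Dry run with a stand-in that only records the thresholds it receives.
calls = []


def record(repo_dir, sqlite_db_file, max_modifications):
    calls.append(max_modifications)


mine_in_stages(record, 'some_repo', 'some.db')
print(calls)
```

With `git2net` imported, passing `git2net.mine_git_repo` as `mine` would run the actual mining, relying on the resume behaviour described earlier to skip commits that are already in the database.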

But now back to mining specifically commit *090c00c342283134a23900f85c1d232499617365*, which can be done with the `commits` option in `mine_git_repo`. We also set the number of processes to 1, enabling serial mode, which can be very helpful for debugging as significantly more information is printed.

In [None]:
# mine_git_repo takes list of commits
commits = ['090c00c342283134a23900f85c1d232499617365']

git2net.mine_git_repo(repo_dir, sqlite_db_file, commits=commits, no_of_processes=1)

Congratulations, you have now mined your first git repository using `git2net`! Note, though, that not all commits have been mined at this point. We will take care of the remaining ones at a later stage of this tutorial.

## Visualisation and Analysis

You can now use the database to query various information on different commits or edits. In addition, `git2net` also provides the functionality to generate various network projections of the data.
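
Because the results are stored in an ordinary *sqlite* file, Python's built-in `sqlite3` module is enough for a first look at what a mining run has written. The sketch below lists every table in a database file together with its row count. The helper `table_row_counts` is hypothetical, and the demonstration uses a throwaway database so that no assumption is made about the table names `git2net` creates.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    with closing(sqlite3.connect(db_path)) as con:
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {name: con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
                for name in names}


# Demonstration on a throwaway database with a single three-row table.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.db')
with closing(sqlite3.connect(demo_path)) as con:
    con.execute('CREATE TABLE demo (x INTEGER)')
    con.executemany('INSERT INTO demo VALUES (?)', [(1,), (2,), (3,)])
    con.commit()

print(table_row_counts(demo_path))
```

Calling `table_row_counts(sqlite_db_file)` on the database mined above should show which tables exist and how many rows each one holds.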

To start, let's obtain a co-editing network for our project. This is as simple as calling the `get_coediting_network` function and providing the database we just mined.

In [None]:
t, node_info, edge_info = git2net.get_coediting_network(sqlite_db_file)
t

The function returns a `pathpy` temporal network object as well as two dictionaries that can be used to look up properties of nodes and edges. At the time of writing, not all of them are used, but they are set as placeholders for future versions of `git2net`.

A `pathpy` temporal network object can be visualised by itself as shown above. In addition, we can also aggregate the network, by dropping the order of events, yielding a standard network object. Let's do this next.

In [None]:
import pathpy as pp
pp.Network.from_temporal_network(t)

In both the temporal and the aggregated network, a node represents an author, and edges point from the person changing a line of code to the person who originally wrote it.

Next, we could ask which files the authors collaborated on. To answer this, we can plot a bipartite network containing both files and authors as nodes.
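
To make the aggregation step concrete, the sketch below drops the time stamps from a handful of co-editing events and counts how often each directed edge occurs. It is a plain-Python illustration with hypothetical author names, not the implementation used by `pathpy`.

```python
from collections import Counter

# Hypothetical time-stamped co-editing events:
# (author making the change, original author of the line, time stamp)
temporal_edges = [
    ('alice', 'bob', 1),
    ('alice', 'bob', 5),
    ('bob', 'carol', 7),
    ('carol', 'alice', 9),
]


def aggregate(temporal_edges):
    """Drop the time stamps and count how often each directed edge occurs."""
    return Counter((source, target) for source, target, _ in temporal_edges)


static_edges = aggregate(temporal_edges)
for (source, target), weight in static_edges.items():
    print(source, '->', target, 'weight', weight)
```

The ordering of events is lost in this step, which is why the aggregated network can no longer tell us who edited whose code first.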

In [None]:
t, node_info, edge_info = git2net.get_bipartite_network(sqlite_db_file)
n = pp.Network.from_temporal_network(t)
n

For this network, `node_info` contains the class of each node in the network, i.e. whether it is an author or a file. These classes can, for example, be used to colour nodes as shown below.

In [None]:
colour_map = {'author': '#73D2DE', 'file': '#2E5EAA'}
node_color = {node: colour_map[node_info['class'][node]] for node in n.nodes}
pp.visualisation.plot(n, node_color=node_color)

The projection of this network that links authors editing the same file is the co-authorship network.

In [None]:
n, node_info, edge_info = git2net.get_coauthorship_network(sqlite_db_file)
n

Note that it looks similar to the co-editing network. However, all information on the direction of interactions is lost.

If we are interested in, e.g., more recently edited files, we can filter the database by providing the `time_from` and `time_to` options. Let's check the files edited since May 2019.
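
The idea behind this projection can be illustrated in a few lines of plain Python. The sketch below links two authors whenever they edited the same file. The author and file names are hypothetical, and the helper `project_to_authors` only illustrates the concept; it is not how `git2net` computes the network.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bipartite edges: (author, file)
author_file_edges = [
    ('alice', 'a.py'),
    ('bob', 'a.py'),
    ('bob', 'b.py'),
    ('carol', 'b.py'),
    ('carol', 'c.py'),
]


def project_to_authors(edges):
    """Return undirected author pairs that share at least one file."""
    authors_by_file = defaultdict(set)
    for author, file_name in edges:
        authors_by_file[file_name].add(author)
    links = set()
    for authors in authors_by_file.values():
        # Sorting gives every undirected pair a single canonical form.
        links.update(combinations(sorted(authors), 2))
    return links


coauthors = project_to_authors(author_file_edges)
print(sorted(coauthors))
```

Since each pair is stored in sorted order, the result carries no direction, mirroring the loss of directional information described above.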

In [None]:
from datetime import datetime
time_from = datetime(2019, 5, 1)
t, node_info, edge_info = git2net.get_bipartite_network(sqlite_db_file, time_from=time_from)
n = pp.Network.from_temporal_network(t)
colour_map = {'author': '#73D2DE', 'file': '#2E5EAA'}
node_color = {node: colour_map[node_info['class'][node]] for node in n.nodes}
pp.visualisation.plot(n, node_color=node_color)

`git2net` allows the extraction of editing paths on the level of individual lines. This means we can track consecutive changes made to a single line over time, even if the line moves up or down in a file or across files. This is very powerful, as it allows us to determine editing sequences as well as to find lines that require more editing than others. These could be lines that are very difficult to implement or lines that contain very important information, such as the version number in an `__init__.py` file.

To extract these paths, we can use the `get_line_editing_paths` function. As these networks tend to be very large, we limit the analysis to a very small file for this tutorial. To only look at a specific set of file paths, we can use the `file_paths` option.

In [None]:
paths, dag, node_info, edge_info = git2net.get_line_editing_paths(sqlite_db_file,
                                                                  file_paths=['git2net/__init__.py'])
pp.visualisation.plot(dag, node_color=node_info['colors'])

As you can see in the output above, the function first looks for aliases. These are other names of the files in the repository that can occur through renaming or moving the file. To follow the edits made to specific lines, we need to be aware of these renamings to track lines across these files.
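
The notion of aliases can be sketched as follows: given a list of renames, collect every path that is connected to the file of interest through a chain of renames. The paths and the helper `collect_aliases` are hypothetical, and the sketch ignores the time at which a rename happened, so it is a simplification rather than a description of how `git2net` resolves aliases.

```python
def collect_aliases(path, renames):
    """Collect all names linked to `path` through (old_path, new_path) renames."""
    aliases = {path}
    changed = True
    while changed:
        changed = False
        for old, new in renames:
            # Extend the set when exactly one side of the rename is known.
            if (old in aliases) != (new in aliases):
                aliases.update((old, new))
                changed = True
    return aliases


# Hypothetical rename history of a repository.
renames = [
    ('core.py', 'lib/core.py'),
    ('lib/core.py', 'src/core.py'),
    ('notes.txt', 'docs/notes.txt'),
]

print(sorted(collect_aliases('src/core.py', renames)))
```

Because timing is ignored, a path that is later reused for an unrelated file would be merged into the same alias set here.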

Notice also that, despite only looking at a single file, the network shown above is not connected. This is because our database is not yet complete. Let's fix this now and try again.

In [None]:
git2net.mine_git_repo(repo_dir, sqlite_db_file)

In [None]:
paths, dag, node_info, edge_info = git2net.get_line_editing_paths(sqlite_db_file,
                                                                  file_paths=['git2net/__init__.py'])
pp.visualisation.plot(dag, node_color=node_info['colors'])

As mentioned before, these networks get very large very quickly. Therefore, it is often more useful to work with the `pathpy` path object that is also returned by the function. It contains all paths and subpaths contained in the network shown above. More information regarding this object can be found in the documentation on [pathpy.net](http://www.pathpy.net/).

This concludes this tutorial, which I hope you found useful. Enjoy using `git2net` and best of luck with your research! If you find any bugs in the code, please let me know on [github.com](https://github.com/gotec/git2net).

`git2net` has been developed as an open source project. This means your ideas and input are highly welcome. Feel free to share the project and contribute yourself. You can immediately get started on the repository you just downloaded!