## *WARNING*
<ins>Before running this script, make sure that you have followed the steps described [here](https://github.com/pwr-pbr23/M6#preparation-for-reproduction).</ins>
## Accessing files for reproduction
To access the files, we need to mount Google Drive and change the working directory. Mounting the drive opens a pop-up window; follow the steps it shows.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
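
As a quick sanity check after mounting, the sketch below verifies that the project folder used throughout this notebook is visible. The helper name `check_project_dir` is hypothetical and not part of the repository; the path is the one used in the cells that follow.

```python
from pathlib import Path

# Project folder used by the rest of this notebook.
PROJECT_DIR = Path('/content/drive/MyDrive/M6/DeepLineDP')

def check_project_dir(project_dir=PROJECT_DIR):
    """Return True if the project folder and its script/ subfolder exist."""
    project_dir = Path(project_dir)
    return project_dir.is_dir() and (project_dir / 'script').is_dir()

if not check_project_dir():
    # Most likely causes: Drive is not mounted or the preparation steps were skipped.
    print(f'{PROJECT_DIR} not found: check the Drive mount and the preparation steps.')
```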

## Installing necessary libraries - 2 min
Because the result files are saved permanently on the mounted Google Drive, we do not need to repeat every step each time. However, installing the libraries <ins>needs to be run before each new session</ins>.

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()
!conda env create -f /content/drive/MyDrive/M6/DeepLineDP/requirements.yml

In [None]:
%%bash
source activate DeepLineDP_env
pip install torch torchvision torchaudio

## Change working directory

In [None]:
%cd /content/drive/MyDrive/M6/DeepLineDP/script

In [None]:
!ls

The previous cell should have returned:

```
condacolab_install.log			my_util.py
DeepLineDP_model.py			new_preprocessing_methods.py
export_data_for_line_level_baseline.py	preprocess_data.py
file-level-baseline			__pycache__
generate_prediction_cross_projects.py	Rplots.pdf
generate_prediction.py			run_py_files.ipynb
get_evaluation_result.R			train_model.py
line-level-baseline			train_word2vec.py
```
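
If you prefer not to compare the listing by eye, the following sketch reports which of the scripts run later in this notebook are missing from the current directory. The helper `missing_scripts` is a hypothetical name introduced here; the file names come from the listing above.

```python
from pathlib import Path

# Scripts that later cells of this notebook execute, taken from the listing above.
EXPECTED_SCRIPTS = [
    'preprocess_data.py',
    'train_word2vec.py',
    'train_model.py',
    'generate_prediction.py',
    'generate_prediction_cross_projects.py',
    'get_evaluation_result.R',
]

def missing_scripts(directory='.', expected=EXPECTED_SCRIPTS):
    """Return the expected file names that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]

# An empty list means every expected script was found.
print(missing_scripts())
```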

## preprocess_data.py - 9 min
Data prepared for training models is saved in `/content/drive/MyDrive/M6/DeepLineDP/datasets/preprocessed_data`.

In [None]:
%%bash
source activate DeepLineDP_env
python preprocess_data.py

## train_word2vec.py - 3 min
This script creates a Word2Vec model, which is saved in `/content/drive/MyDrive/M6/DeepLineDP/output/Word2Vec_model`.

In [None]:
%%bash
source activate DeepLineDP_env

python train_word2vec.py activemq
# python train_word2vec.py camel
# python train_word2vec.py derby
# python train_word2vec.py groovy
# python train_word2vec.py hbase
# python train_word2vec.py hive
# python train_word2vec.py jruby
# python train_word2vec.py lucene
# python train_word2vec.py wicket

## train_model.py - 51 min
Trains the model and saves it along with the loss (.csv).

| Output | saved location                                                 |
|--------|----------------------------------------------------------------|
| model  | `/content/drive/MyDrive/M6/DeepLineDP/output/model/DeepLineDP` |
| loss   | `/content/drive/MyDrive/M6/DeepLineDP/output/loss/DeepLineDP`  |


In [None]:
%%bash
source activate DeepLineDP_env

python train_model.py -dataset activemq
# python train_model.py -dataset camel
# python train_model.py -dataset derby
# python train_model.py -dataset groovy
# python train_model.py -dataset hbase
# python train_model.py -dataset hive
# python train_model.py -dataset jruby
# python train_model.py -dataset lucene
# python train_model.py -dataset wicket

## generate_prediction.py - 14 min

| generated files | saved location                                             |
|-----------------|------------------------------------------------------------|
| output          | `/content/drive/MyDrive/M6/DeepLineDP/output/intermediate_output/DeepLineDP/within-release/` |
| prediction      | `/content/drive/MyDrive/M6/DeepLineDP/output/prediction/DeepLineDP/within-release/`          |


In [None]:
%%bash
source activate DeepLineDP_env

python generate_prediction.py -dataset activemq
# python generate_prediction.py -dataset camel
# python generate_prediction.py -dataset derby
# python generate_prediction.py -dataset groovy
# python generate_prediction.py -dataset hbase
# python generate_prediction.py -dataset hive
# python generate_prediction.py -dataset jruby
# python generate_prediction.py -dataset lucene
# python generate_prediction.py -dataset wicket

## generate_prediction_cross_projects.py - 1h

| generated files | saved location                                             |
|-----------------|------------------------------------------------------------|
| output          | `/content/drive/MyDrive/M6/DeepLineDP/output/intermediate_output/DeepLineDP/cross-project/` |
| prediction      | `/content/drive/MyDrive/M6/DeepLineDP/output/prediction/DeepLineDP/cross-project/`          |


In [None]:
%%bash
source activate DeepLineDP_env

python generate_prediction_cross_projects.py -dataset activemq
# python generate_prediction_cross_projects.py -dataset camel
# python generate_prediction_cross_projects.py -dataset derby
# python generate_prediction_cross_projects.py -dataset groovy
# python generate_prediction_cross_projects.py -dataset hbase
# python generate_prediction_cross_projects.py -dataset hive
# python generate_prediction_cross_projects.py -dataset jruby
# python generate_prediction_cross_projects.py -dataset lucene
# python generate_prediction_cross_projects.py -dataset wicket
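
The cells above run each script for `activemq` only and leave the other datasets commented out. To process every dataset without editing the cells, a loop along the following lines could be used. This is only a sketch: `build_command` and `run_for_all` are hypothetical helpers, and it assumes that `conda run -n DeepLineDP_env` works in the condacolab setup as an alternative to `source activate`.

```python
import subprocess

# Dataset names taken from the commented-out lines in the cells above.
DATASETS = ['activemq', 'camel', 'derby', 'groovy', 'hbase',
            'hive', 'jruby', 'lucene', 'wicket']

def build_command(script, dataset, env='DeepLineDP_env'):
    """Build the command line for one script and one dataset."""
    # train_word2vec.py takes the dataset as a positional argument;
    # the other scripts in this notebook receive it via -dataset.
    if script == 'train_word2vec.py':
        args = [dataset]
    else:
        args = ['-dataset', dataset]
    # Assumption: `conda run` is used instead of `source activate`.
    return ['conda', 'run', '-n', env, 'python', script] + args

def run_for_all(script, datasets=DATASETS):
    """Run `script` once per dataset and stop at the first failure."""
    for dataset in datasets:
        subprocess.run(build_command(script, dataset), check=True)
```

For example, `run_for_all('train_model.py')` would train one model per dataset in sequence. Expect this to take considerably longer than a single-dataset run.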

## Setup for running R script
rpy2 is a library that makes it possible to run R scripts and code from a Jupyter notebook. To better understand the following setup and installation, see
[this site](https://rpy2.github.io/doc/latest/html/interactive.html).

In [None]:
!pip install rpy2==3.5.1
%load_ext rpy2.ipython

## Installing R packages - 35 min

A reload might be required before running this step.

In [None]:
%R install.packages("tidyverse", dependencies=TRUE)
%R install.packages("gridExtra", dependencies=TRUE)
%R install.packages("ModelMetrics", dependencies=TRUE)
%R install.packages("caret", dependencies=TRUE)
%R install.packages("reshape2", dependencies=TRUE)
%R install.packages("pROC", dependencies=TRUE)
%R install.packages("effsize", dependencies=TRUE)
%R install.packages("ScottKnottESD", dependencies=TRUE)

## get_evaluation_result.R - 15 min
The results of running this script are saved in `/content/drive/MyDrive/M6/DeepLineDP/output/figure/` (graphs as .pdf files).

In [None]:
!Rscript get_evaluation_result.R

In [None]:
!sudo apt install -y imagemagick
!pip install wand
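
The evaluation script saves its graphs as .pdf files, while the display cell below reads .png files. Assuming the packages installed above are meant for that conversion, a minimal sketch with Wand could look like this. The helper names and the `resolution` value are assumptions, and the sketch assumes single-page PDFs.

```python
from pathlib import Path

def png_path_for(pdf_path):
    """Return the path of the .png file that sits next to `pdf_path`."""
    return str(Path(pdf_path).with_suffix('.png'))

def convert_pdf_to_png(pdf_path, resolution=150):
    """Render a single-page PDF to a PNG in the same folder and return its path."""
    # Imported here so the path helper above works even without Wand installed.
    from wand.image import Image as WandImage

    png_path = png_path_for(pdf_path)
    with WandImage(filename=str(pdf_path), resolution=resolution) as img:
        img.format = 'png'
        img.save(filename=png_path)
    return png_path
```

Calling `convert_pdf_to_png` for each .pdf file in the figure folder would then produce the .png files that the next cell displays.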

In [None]:
from IPython.display import Image, display

figures = '/content/drive/MyDrive/M6/DeepLineDP/output/figure'
listOfImageNames = [figures + '/file-Effort@Top20Recall.png',
                    figures + '/file-Recall@Top20LOC.png',
                    figures + '/file-IFA.png']

for imageName in listOfImageNames:
  print(imageName)
  display(Image(filename=imageName))