### Part 1 a: Web scrapping to get 124 github links
Instruction to use selenium: https://towardsdatascience.com/how-to-use-selenium-to-web-scrape-with-example-80f9b23a843a

In [83]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import pandas as pd
import os

# Download and Access WebDriver
# In my case, I download at the link: https://googlechromelabs.github.io/chrome-for-testing/#stable

# Specify the path to the downloaded ChromeDriver
driver_path = os.path.join(os.getcwd(), "chromedriver-win64", "chromedriver.exe")

browser = webdriver.Chrome()
cService = webdriver.ChromeService(executable_path=driver_path)
driver = webdriver.Chrome(service=cService)

# Access Website Via Python
driver.get("http://aserg-ufmg.github.io/why-we-refactor/#/projects")

# driver.set_page_load_timeout(30)  # wait for the page to load

In [91]:
df = pd.DataFrame(columns=["Project", "Creation Date", "Commits", "Java Files", "Contributors"])  # creates master dataframe

# Extract project rows from the table
data = driver.find_elements(By.CLASS_NAME, "ng-binding")
list_data = enumerate(data)

print(len(list(list_data)))

# Group data into chunks of 5 for each project
rows = []
for i in range(1, len(data), 5):
    project_link = data[i].text
    creation_date = data[i + 1].text
    commits = data[i + 2].text
    java_files = data[i + 3].text
    contributors = data[i + 4].text

    # Create a dictionary for each project
    row = {
        "Project": project_link,
        "Creation Date": creation_date,
        "Commits": commits,
        "Java Files": java_files,
        "Contributors": contributors,
    }

    # Append each row to the list
    rows.append(row)

# Use pd.concat to combine all rows into the DataFrame
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

# Output the DataFrame
print(df)

621
                                               Project Creation Date Commits  \
0    https://github.com/JetBrains/intellij-communit...       9/30/11  162625   
1                 https://github.com/JetBrains/MPS.git       8/15/11   66445   
2    https://github.com/CyanogenMod/android_framewo...       5/13/13   62208   
3       https://github.com/liferay/liferay-plugins.git       9/25/09   33929   
4                   https://github.com/neo4j/neo4j.git      11/12/12   29187   
..                                                 ...           ...     ...   
119               https://github.com/zeromq/jeromq.git        8/1/12     316   
120          https://github.com/bitfireAT/davdroid.git       8/25/13     291   
121           https://github.com/bennidi/mbassador.git      10/23/12     236   
122        https://github.com/novoda/android-demos.git       7/26/09     142   
123               https://github.com/jfinal/jfinal.git       4/25/12     102   

    Java Files Contributors  
0    

In [108]:
# Close the browser once the data is collected
driver.quit()

### Part 1 b: Divide 124 projects into 5 groups (Keep the last group with 24 projects), randomly select 2 projects from each group (so we have 10 projects in total to work with)

In [13]:
import random
from numpy import sort
# List of 124 project names (example list, you can replace it with actual project names)
projects = [f"{i+1}" for i in range(124)]

# Divide projects into 5 groups, with first 4 groups contain 25 projects each
group_size = 25
groups = [projects[i : i + group_size] for i in range(0, 100, group_size)]
# Add the remaining 24 projects as the last group
groups.append(projects[100:])

# Randomly select 2 projects from each group
selected_projects = []
for group in groups:
    selected_projects.extend(random.sample(group, 2))

selected_projects = [int(project) for project in selected_projects] # Convert to int for sorting
selected_projects = sort(selected_projects)

print(selected_projects)
# Output the selected 10 projects
print("Selected projects to work with:")
for project in selected_projects:
    print("Project_"+str(project))

[ 10  25  45  50  58  70  78  91 106 114]
Selected projects to work with:
Project_10
Project_25
Project_45
Project_50
Project_58
Project_70
Project_78
Project_91
Project_106
Project_114


Save lists of project number just in case you restart kernal

In [100]:
selected_projects = [ 10,  25,  45,  50,  58,  70,  78,  91,  106,  114]

Get 10 random github links that we generate above

In [96]:
project_links = []

for project in selected_projects:
    project_link = df.loc[project - 1, "Project"]
    project_links.append(project_link)

# Print the project links
for link in project_links:
    print(link)

https://github.com/belaban/JGroups.git
https://github.com/eclipse/jetty.project.git
https://github.com/Activiti/Activiti.git
https://github.com/spring-projects/spring-roo.git
https://github.com/open-keychain/open-keychain.git
https://github.com/SecUpwN/Android-IMSI-Catcher-Detector.git
https://github.com/github/android.git
https://github.com/spring-projects/spring-data-neo4j.git
https://github.com/RoboBinding/RoboBinding.git
https://github.com/hierynomus/sshj.git


Clone 10 github repos to local machine.

Note that you might face error when cloning that returned non-zero exit status 128. If so, fix it by run cmd as admin, run "git config --system core.longpaths true"

In [107]:
import os
import subprocess

pwd = os.getcwd()

for link in project_links:
    # Get the clean repo name by splitting the URL and removing ".git"
    repo_suffix = link.split("/")[-1]
    repo_name = repo_suffix.rstrip(".git")
    # Path to the target directory where the repository will be cloned
    repo_path = os.path.join(pwd, "RepoFolder", repo_name)

    # Cloning the project into the current working directory
    # Check if the directory already exists

    print(f"Cloning project {repo_name}")
    # Cloning the project with error handling
    try:
        subprocess.run(["git", "clone", link, repo_path], check=True)
        print(f"Project {repo_name} cloned")
    except subprocess.CalledProcessError as e:
        print(f"Error cloning {repo_name}: {e}")
        continue  # Skip to the next repository if cloning fails

Cloning project JGroups
Project JGroups cloned
Cloning project jetty.projec
Project jetty.projec cloned
Cloning project Activ
Project Activ cloned
Cloning project spring-roo
Project spring-roo cloned
Cloning project open-keychain
Project open-keychain cloned
Cloning project Android-IMSI-Catcher-Detector
Project Android-IMSI-Catcher-Detector cloned
Cloning project android
Project android cloned
Cloning project spring-data-neo4j
Project spring-data-neo4j cloned
Cloning project RoboBindin
Project RoboBindin cloned
Cloning project sshj
Project sshj cloned


Don't run it for now, keep it last 

Delete projects to free memory (you can do it manually at last :D)

In [106]:
import stat
import shutil
from pathlib import Path

def readonly_to_writable(foo, file, err):
    if Path(file).suffix in [".idx", ".pack"] and "PermissionError" == err[0].__name__:
        os.chmod(file, stat.S_IWRITE)
        foo(file)

repo_path = os.path.join(pwd, "RepoFolder")
print("Repo path: ", repo_path)

 # Clean up: Remove the cloned repository
if os.path.exists(repo_path):
    shutil.rmtree(repo_path, onerror=readonly_to_writable)
    print(f"Project {repo_name} erased\n")

Repo path:  d:\Oulun\Period 5\Software development, Maintenance and Operation\Projects\RepoFolder
Project sshj erased



### Part 1 c: Mine the refactoring activity applied in the history of the cloned projects using RefactoringMiner library
You can read more here: https://github.com/tsantalis/RefactoringMiner

#### Step 1: First thing first, download RefactoringMiner library
- Link and instruction to download: https://github.com/tsantalis/RefactoringMiner/releases
- I use version 3.0.8: https://github.com/tsantalis/RefactoringMiner/releases/download/3.0.8/RefactoringMiner-3.0.8.zip


#### Step 2: Add path of bin to system environment variables
Search google if you don't know how to add

#### Step 3: Make sure it run normally
Make sure you have Java installed on your computer. If you don't, download it here: https://www.oracle.com/java/technologies/javase/jdk17-archive-downloads.html

Then try to run this command line:
> git clone https://github.com/danilofes/refactoring-toy-example.git refactoring-toy-example

> RefactoringMiner -c refactoring-toy-example 36287f7c3b09eff78395267a3ac0d7da067863fd

Since release 3.0.0, RefactoringMiner requires Java 17 or newer and Gradle 7.4 or newer

or run 
> RefactoringMiner -gc https://github.com/danilofes/refactoring-toy-example.git 36287f7c3b09eff78395267a3ac0d7da067863fd 10 -json result.json