TDMINING (Technical Debt Mining Process) is a process designed to generate data and knowledge about code smells, with a focus on the developer. It is independent of any specific static code analysis or statistical analysis tool.
Experiments with TDMINING use tools such as SonarQube and Python scripts built on NumPy, Pandas, and other machine learning and data science packages. Data visualization is performed in Jupyter Notebooks, but can be adapted to Grafana or other tools.
This repository contains all artifacts required to run the TDMINING experiment. Below is a detailed description of the main directories and files:
- `tdmining_bizagi_ENUS.bpm`: BPMN diagram of the TDMINING process (English). Can be viewed with Bizagi Modeler.
- `DATA_DICTIONARY.md`: Data dictionary for the experiment's database schema.
- `DM_ACTION.csv`, `DM_COMMITS.csv`, `DM_DEVELOPER.csv`, `DM_PROJECT.csv`, `DM_RULE.csv`, `DM_SEVERITY.csv`, `DM_TDITEM.csv`, `DM_TIME.csv`, `DM_VERSION.csv`: Dimension tables in CSV format for the star schema used in Experiment 1.
- `FT_TECHNICALDEBT.csv`, `FT_TECHNICALDEBT_IGNORE_CONDITION.csv`: Fact tables with technical debt data and ignore conditions.
- `ORGANIZATION.csv`, `ORGANIZATION_PROJECT.csv`, `PROJECT_VERSION.csv`: Additional supporting tables in CSV format.
- `DOC1 - INFRASTRUCTURE FORM_EN.docx`: Infrastructure form (English).
- `DOC2 - ENVIRONMENT VARIABLES DOCUMENT_EN.docx`: Documentation of environment variables (English).
- `DOC3 - RESULT EXECUTION SCRIPTS JUPYTER_EN.docx`: Results documentation for Jupyter scripts (English).
- `DOC4 - RESTRICTIONS CHECKLIST _EN.docx`: Restrictions checklist (English).
Python scripts for data processing, ETL, and experiment automation:
- `condition-ignore.py`: Updates the technical debt fact table to mark records as ignored, based on conditions from the ignore-condition table.
- `configuration.py`: Loads configuration values from `config.ini` for database and project settings.
- `executer.py`: Automates running SonarQube analysis for each commit, updating the database, and managing project properties and branches.
- `extract.py`: Extracts commit and file modification data from a Git repository using PyDriller and saves it to CSV files.
- `import_csvs_to_sqlite.py`: Imports CSV files into a SQLite database, normalizing column names and creating indexes for ID columns.
- `insert.py`: Inserts commit metadata from a Git repository into the database using PyDriller.
- `Query.py`: Loads SQL queries from `query.ini` for use in scripts and notebooks.
- `reset-database.py`: Resets the database to a previous state using a query from `query.ini`.
- `config.ini`, `query.ini`: Configuration files for database connections and SQL queries.
- `requirements.txt`: Python dependencies for the project.
- `get-pip.py`: Script to install pip if needed.
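The CSV-to-SQLite import step can be sketched as follows. This is a minimal, standard-library illustration of the behavior described for `import_csvs_to_sqlite.py` (normalize headers, load rows, index ID columns); the function name and the exact normalization rule are assumptions, not the script's actual code.

```python
import csv
import re
import sqlite3
from pathlib import Path

def normalize(name: str) -> str:
    """Normalize a CSV header to a safe SQL column name (illustrative rule)."""
    return re.sub(r"\W+", "_", name.strip()).upper()

def import_csv(db_path: str, csv_path: str) -> None:
    """Load one CSV file into a table named after the file, indexing *_ID columns."""
    table = Path(csv_path).stem.upper()
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = [normalize(c) for c in next(reader)]
        rows = list(reader)
    con = sqlite3.connect(db_path)
    with con:
        cols = ", ".join(f"{c} TEXT" for c in header)
        con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        placeholders = ", ".join("?" for _ in header)
        con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        # Index columns that look like identifiers, as the script description suggests.
        for col in header:
            if col.endswith("ID"):
                con.execute(
                    f'CREATE INDEX IF NOT EXISTS IX_{table}_{col} ON "{table}" ({col})'
                )
    con.close()
```

Storing every column as `TEXT` keeps the sketch simple; the real script may infer types.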
Jupyter Notebooks for data analysis and visualization:
- `question-1-metric-1.ipynb`: Analyzes the number of developers and the amount of open code smells using SQL queries and data visualization.
- `question-1-metric-2.ipynb`: Analyzes the number of developers and the amount of closed code smells using SQL queries and data visualization.
- `question-2-metric-1.ipynb`: Performs a multi-step analysis of code smells per developer, including several SQL scripts and result displays.
- `question-3-metric-1.ipynb`: Applies association rule mining (Apriori, FP-Growth) to code smell data to find frequent patterns and rules.
- `question-4-metric-1.ipynb`: Calculates and visualizes moving averages for code smell creation and effort indices, both for all developers and for a selected developer.
- `question-5-metric-1.ipynb`: Performs time series forecasting (ARIMA) on code smell creation, including stationarity tests and error metrics.
- `test-hypothesis.ipynb`: Tests the correlation between the number of code smells and the number of commits using Spearman and Pearson coefficients.
- `test-normality.ipynb`: Tests the normality of code smell creation data using Shapiro-Wilk and Kolmogorov-Smirnov tests, with histograms and descriptive statistics.
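The correlation test in `test-hypothesis.ipynb` boils down to computing both coefficients on two aligned count series. A minimal sketch with SciPy (the helper name, input layout, and significance threshold are illustrative, not the notebook's exact code):

```python
from scipy.stats import pearsonr, spearmanr

def correlate(smells_per_period, commits_per_period, alpha=0.05):
    """Correlate code-smell counts with commit counts over the same periods.

    Returns (coefficient, p-value, significant-at-alpha) per test.
    """
    r_p, p_p = pearsonr(smells_per_period, commits_per_period)   # linear correlation
    r_s, p_s = spearmanr(smells_per_period, commits_per_period)  # rank (monotonic) correlation
    return {
        "pearson": (r_p, p_p, p_p < alpha),
        "spearman": (r_s, p_s, p_s < alpha),
    }
```

Spearman is the safer default here since issue counts are rarely normally distributed, which is also why the repository includes `test-normality.ipynb`.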
SQL scripts for database creation, data loading, analysis, and utility functions:
- `ANALISYS_TEMP.sql`: Performs ETL operations to identify and aggregate technical debt issues that were opened and closed, joining data from multiple sources. It creates temporary tables for open/closed issues, aggregates commit and issue data, and updates a summary table with commit hashes and timestamps for open/closed states.
- `CREATE_DDL.sql`: Contains Data Definition Language (DDL) statements to create the main database schema for the experiment. It defines tables for organizations, projects, developers, actions, time, technical debt, rules, severities, and their relationships, including primary and foreign keys. Some commented-out sections suggest additional profile and remediation tables.
- `LOAD_DATA.sql`: Populates the main dimension tables with data extracted from source repositories. Inserts developers, time periods, commits, actions, technical debt item types, rules, and severities. Also creates a temporary fact table for technical debt and loads it into the main fact table.
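The open/closed pairing that `ANALISYS_TEMP.sql` performs can be illustrated in miniature in Python. The record layout below (item key, action, commit hash, timestamp) is a hypothetical simplification of the real schema, kept only to show the pairing logic:

```python
def pair_open_close(events):
    """Summarize, per technical-debt item, the commit/timestamp at which it
    was first opened and last closed (None if never closed)."""
    summary = {}
    for key, action, commit, ts in sorted(events, key=lambda e: e[3]):
        row = summary.setdefault(key, {"open": None, "closed": None})
        if action == "OPEN" and row["open"] is None:
            row["open"] = (commit, ts)       # keep the earliest OPEN
        elif action == "CLOSE":
            row["closed"] = (commit, ts)     # keep the latest CLOSE
    return summary
```

The SQL script does the same with temporary tables and joins, which scales better than in-memory dictionaries for large histories.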
- `COMMITS.sql`: Creates the COMMITS table to store commit metadata, including author, hash, date, project, and processing status. Adds a column to track whether each commit has been processed.
- `INDEX.sql`: Creates indexes on the ISSUES and ISSUES_SONAR tables to optimize queries by commit hash. Also adds a primary key column to ISSUES_SONAR.
- `FN_FORMAT_TEXT.sql`: Defines a SQL function to format and normalize text by removing special characters, accents, punctuation, numbers, and spaces, and converting to lowercase. Useful for text standardization in data processing.
- `FN_SLINTSTRING .SQL`: Defines a table-valued function to split a string into elements based on a separator character. Useful for parsing delimited strings into rows.
- `FN_UNIXDATE_TODATE.sql`: Defines a function to convert a Unix timestamp (seconds since 1970-01-01) to a SQL DATETIME value.
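The normalization that `FN_FORMAT_TEXT.sql` applies (strip accents, punctuation, digits, and spaces; lowercase the rest) can be mirrored in Python when the same standardization is needed outside the database, e.g. for matching developer names across sources. This is an illustrative analogue, not a port of the SQL function:

```python
import re
import unicodedata

def format_text(text: str) -> str:
    """Strip accents, punctuation, digits, and spaces; return lowercase letters only."""
    # Decompose accented characters (e.g. 'e' + combining acute), then drop
    # everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep letters only, then lowercase.
    return re.sub(r"[^A-Za-z]", "", ascii_only).lower()
```

Applying the same rule on both the Python and SQL sides keeps keys comparable across the ETL pipeline.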
- `VERSION_WORDPRESS.sql`: Populates tables with WordPress project and version information, including version names and release dates. Calculates version periods and links commits to the corresponding version based on commit date.
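Linking a commit to a version by date amounts to an interval lookup on the sorted release dates. A minimal sketch; the release data below is hypothetical (the real values come from `VERSION_WORDPRESS.sql`):

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical (version, release date) pairs, sorted by date.
RELEASES = [
    ("5.0", datetime(2018, 12, 6)),
    ("5.1", datetime(2019, 2, 21)),
    ("5.2", datetime(2019, 5, 7)),
]

def version_for_commit(commit_date: datetime):
    """Return the latest version released on or before the commit date,
    or None if the commit predates the first release."""
    dates = [d for _, d in RELEASES]
    i = bisect_right(dates, commit_date)
    return RELEASES[i - 1][0] if i > 0 else None
```

The SQL script expresses the same idea as a join on date ranges; the binary search here is just the in-memory equivalent.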
For more information, refer to the documentation in the DOCS folder (English files only).
Copy the `process_sonar` folder to `C:\`.