# Software Evolution Analysis

![](images/heraclitus.png)

> Everything changes, and nothing stands still, 
> 
> and you cannot step twice into the same... system.
> 
> -- Heraclitus


# Metaphor Limitations: Software Architecture

- The term makes it sound like something fixed...
- Yet even real-world architecture changes over time [Brand]
- [Brand] *How Buildings Learn*. Stewart Brand
  - The Long Now Foundation - Podcast








## Further Metaphors

My favorite metaphors for software development emphasize change...


### 1. Performance Art
- art: because it's creative
- performance: you can't put it in a frame
- => *advice:* if you ever create cool, innovative software, **make a screencast** of it


### 2. A Garden
- It needs someone to tend to it constantly

> I still remember the jolt I felt in 1958 when I first heard a friend talk about building a program, as opposed to writing one. In a flash he broadened my whole view of the software process.
> 
> -- Fred Brooks, *No Silver Bullet*

Brooks, however, argues that the building metaphor no longer fits the systems we develop today. Instead of *building* software, which requires adequate plans and foresight, we should *grow* programs organically and incrementally. (Once even a very simple program is up and running, developers are much more enthusiastic about their progress.)



### 3. Software Aging 

David Parnas's **Software Aging** [1]

> Programs, like people, get old. 

- We can’t prevent aging, but 
  - we can understand its causes, 
  - take steps to limit its effects, 
  - temporarily reverse some of the damage it has caused, 
  - and prepare for the day when the software is no longer viable

[1] D. L. Parnas. *Software Aging*. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=296790


# Laws of Software Evolution

Lehman [1] proposed laws about E-type systems:
  - an E-type system is *embedded* in the real world
  - and the real world always changes... 
      - even if it didn't, the software ecosystem around the system eventually changes [2] 
      - e.g. JavaScript packages, etc.

[1] M. M. Lehman, L. A. Belady. *Program Evolution: Processes of Software Change*. Academic Press, London, 1985.

[2] We'll talk more about ecosystems in the ASE course

## 1st Law of Software Evolution: E-Type Systems Must Change


> A program that is used in a real-world environment must change, or become progressively less useful in that environment. (Lehman's Law of Continuing Change)



        



## 2nd Law of Software Evolution: Entropy Happens!

Manny Lehman's **Law of Increasing Entropy** (also known as the Law of Increasing Complexity): 

> As a program evolves, it becomes more complex, and extra resources are needed to preserve and simplify its structure.



# What if We Use System Evolution for Good? 
## e.g. for understanding

 
By mining the version repository, we can find:

  - places in the code that are high-risk (because they were risky in the past)
    - especially when linked with issue-tracker information

  - parts of the system that need refactoring (study of Hitesh Sajnani)
  
  - navigation suggestions (e.g. Mylar for Eclipse)
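A minimal sketch of the first idea, under loud assumptions: without a real issue tracker at hand, it treats a commit as a bug fix when its message contains a fix keyword or an issue reference such as `#12`, then counts bug-fixing commits per file. The keyword regex and the commit history below are hypothetical illustrations, not the method of any particular study. Keyword matching also yields false positives (e.g. "fix typo"), which is exactly why linking to real issue-tracker data helps.

```python
import re
from collections import Counter

# Hypothetical heuristic: a fix keyword or an issue reference like "#12"
# marks a commit as bug-fixing. \b keeps "debug" or "prefix" from matching.
FIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bug|defect)\b|#\d+", re.IGNORECASE)

def is_bug_fix(message):
    """Return True if the commit message looks like a bug fix."""
    return FIX_PATTERN.search(message) is not None

def bug_fix_counts(commits):
    """Count, per file, how many bug-fixing commits touched it.

    commits: iterable of (message, [paths]) pairs.
    """
    counts = Counter()
    for message, paths in commits:
        if is_bug_fix(message):
            counts.update(set(paths))  # a file counts once per commit
    return counts

# Hypothetical history, for illustration only
history = [
    ("Add bookmark model", ["model/bookmark.py"]),
    ("Fix #12: wrong bookmark priority", ["model/bookmark.py", "model/exercise.py"]),
    ("Refactor feeds", ["model/feed.py"]),
    ("fixed crash when feed is empty", ["model/feed.py", "model/bookmark.py"]),
]
print(bug_fix_counts(history).most_common(1))  # [('model/bookmark.py', 2)]
```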


Today: 
  1. entities in the codebase where most effort was invested
  1. invisible dependencies between files (e.g. logical coupling)
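Logical coupling can be approximated by counting how often two files change in the same commit. The sketch below assumes commits are already available as sets of file paths (the history is hypothetical). For each pair it reports the number of shared commits and a *confidence*: the fraction of the first file's commits that also touched the second. Very large commits (mass renames, reformatting) would couple everything with everything, so they are skipped; the `max_files=20` cut-off is an arbitrary illustrative choice.

```python
from collections import Counter
from itertools import combinations

def logical_coupling(commits, max_files=20):
    """Return (pair_counts, file_counts) for files changed together.

    commits: iterable of collections of file paths changed in one commit.
    Commits touching more than `max_files` files are ignored.
    """
    pair_counts = Counter()
    file_counts = Counter()
    for files in commits:
        files = sorted(set(files))  # sorted: each pair gets one canonical order
        if len(files) > max_files:
            continue
        file_counts.update(files)
        pair_counts.update(combinations(files, 2))
    return pair_counts, file_counts

def confidence(a, b, pair_counts, file_counts):
    """Fraction of the commits touching `a` that also touched `b`."""
    pair = tuple(sorted((a, b)))
    return pair_counts[pair] / file_counts[a] if file_counts[a] else 0.0

# Hypothetical commits, for illustration only
commits = [
    {"bookmark.py", "test_bookmark.py"},
    {"bookmark.py", "test_bookmark.py", "exercise.py"},
    {"bookmark.py"},
    {"feed.py"},
]
pairs, files = logical_coupling(commits)
for (a, b), n in pairs.most_common(3):
    print(a, b, n, round(confidence(a, b, pairs, files), 2))
```

Confidence is asymmetric: in this data every change to `test_bookmark.py` came with a change to `bookmark.py`, but not the other way round.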
  
  
  
 






## VCSs Capture Software Evolution

VCS = version control system 


Over the last two decades **we have seen an increase in**...
  - **the popularity of version control systems**
    - see https://trends.google.com/trends/explore?date=all&q=git,svn,software%20architecture,mercurial
    - today it seems almost funny that people used to email files around to collaborate
    - one of the many practices that we, software engineers, have been teaching the rest of the world

  - **knowledge of how to manage versions**
    - branching strategies
    - integration with CI
    - semantic versioning



*How can we integrate this information into AR?*


## Architectural Viewpoint: Evolutionary Hotspots 

Evolutionary Hotspots =(*def*) **code entities where most effort was invested** [1]


Assumption: effort is proportional to architectural relevance


Why? 
- Philosophically:
 > *"The value of anything is proportional to time invested in it."* (M. Lungu)
 
 
- Practically:
  - high *churn* (change density) predicts bugs better than size [...]
  - studies observe a correlation between churn and complexity metrics [...]
  - entities that required much effort in the past are likely to require more in the future (cf. *Yesterday's Weather* [Girba et al.])
    
    
- Pragmatically:
  - can be detected with **language-independent analysis** (which suits polyglot systems)


[1] A. Tornhill. *Your Code as a Crime Scene*. Pragmatic Bookshelf, 2015.
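To make *churn* concrete: a common definition counts lines added plus lines deleted per entity, and *relative* churn divides that by the entity's current size, which is one way to read "change density" above. The sketch assumes per-commit, per-file rows shaped like `git log --numstat` output (added, deleted, path; git prints `-` for binary files). The rows and file sizes are hypothetical.

```python
from collections import defaultdict

def churn(numstat_rows):
    """Sum lines added + deleted per path.

    numstat_rows: iterable of (added, deleted, path) string tuples, as in
    `git log --numstat`; binary files (reported as '-') are skipped.
    """
    totals = defaultdict(int)
    for added, deleted, path in numstat_rows:
        if added == "-" or deleted == "-":
            continue  # no line counts for binary changes
        totals[path] += int(added) + int(deleted)
    return dict(totals)

def relative_churn(churn_by_path, loc_by_path):
    """Churn divided by current size in lines; empty or unknown files are skipped."""
    return {
        path: c / loc_by_path[path]
        for path, c in churn_by_path.items()
        if loc_by_path.get(path)
    }

# Hypothetical numstat rows and current file sizes
rows = [
    ("10", "2", "bookmark.py"),
    ("5", "5", "bookmark.py"),
    ("3", "0", "feed.py"),
    ("-", "-", "logo.png"),
]
c = churn(rows)  # {'bookmark.py': 22, 'feed.py': 3}
print(relative_churn(c, {"bookmark.py": 220, "feed.py": 300}))
```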

  
  



### Evolutionary Hotspots In Practice

Challenges / implementation details: 
- How do we measure invested effort? 
- What are the entities (files? aggregates?)
- Over what period is the study performed?
  - results will likely differ between periods
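Two of these choices, the period and the entities, can be made explicit parameters. The sketch below (hypothetical history and paths) keeps only commits inside a `[since, until)` window and lifts files to modules by taking the first `depth` directories of each path; a commit counts once per module, however many of its files it touched. Re-running it with different windows or depths is a cheap way to check how sensitive the results are.

```python
from collections import Counter
from datetime import datetime, timezone
from pathlib import PurePosixPath

def module_of(path, depth=2):
    """Map a file path to its module: its first `depth` directories."""
    dirs = PurePosixPath(path).parts[:-1]  # drop the file name
    return "/".join(dirs[:depth]) or "."   # root-level files map to "."

def commits_per_module(commits, since=None, until=None, depth=2):
    """Count commits per module, keeping only dates in [since, until).

    commits: iterable of (datetime, [paths]) pairs.
    """
    counts = Counter()
    for date, paths in commits:
        if since is not None and date < since:
            continue
        if until is not None and date >= until:
            continue
        # a set, so a commit counts once per module
        counts.update({module_of(p, depth) for p in paths})
    return counts

# Hypothetical history, for illustration only
utc = timezone.utc
history = [
    (datetime(2018, 3, 1, tzinfo=utc), ["app/model/bookmark.py", "app/model/user.py"]),
    (datetime(2019, 6, 1, tzinfo=utc), ["app/model/bookmark.py"]),
    (datetime(2020, 1, 1, tzinfo=utc), ["app/recommender/mixed.py", "setup.py"]),
]
print(commits_per_module(history))                                          # whole history
print(commits_per_module(history, since=datetime(2019, 1, 1, tzinfo=utc)))  # recent only
```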






### Example Analysis

VCS: Git

Period of study: whole history

Entities: files (+aggregation to modules)

Invested effort: number of commits

Case Study: Zeeguu-Core

Toolbox: Python + PyDriller
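Since invested effort is measured here as the number of commits, per-file counts can also be obtained without PyDriller, straight from `git log`, which keeps the analysis language independent. A minimal sketch, assuming `git` is on the `PATH`: `--name-only` lists the files each commit changed, and a marker line printed via `--pretty=format:` separates the commits. The marker string and the function names are illustrative choices, not part of any library. By default `git log` typically shows no file list for merge commits, so they contribute nothing here.

```python
import subprocess
from collections import Counter

MARKER = "__commit__"  # arbitrary line printed before each commit's file list

def parse_name_only_log(log_text, marker=MARKER):
    """Count commits per path in the output of
    `git log --pretty=format:<marker> --name-only`."""
    counts = Counter()
    current = set()  # paths of the commit being read
    for line in log_text.splitlines():
        if line == marker:
            counts.update(current)  # close the previous commit
            current = set()
        elif line.strip():
            current.add(line)
    counts.update(current)  # close the last commit
    return counts

def commits_per_file(repo_dir):
    """Run git in `repo_dir` and count the commits touching each file."""
    log = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--pretty=format:{MARKER}", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_name_only_log(log)

# Offline check on a hand-written sample of the expected output format
sample = "__commit__\nsetup.py\nbookmark.py\n\n__commit__\nbookmark.py\n"
print(parse_name_only_log(sample).most_common())  # [('bookmark.py', 2), ('setup.py', 1)]
```

On a real repository, `commits_per_file(REPO_DIR).most_common(10)` would list the ten most frequently changed files, with `REPO_DIR` pointing at a local clone.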


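```python
import sys

# In Jupyter, `!` runs a shell command; {sys.executable} makes pip
# install into the same interpreter the notebook kernel uses.
# PyDriller is pinned to 1.x because the cells below use the 1.x API
# (RepositoryMining, commit.modifications), which PyDriller 2.0 renamed.
!{sys.executable} -m pip install pydriller==1.15.5
!{sys.executable} -m pip install gitpython
```

```text
Collecting pydriller
  Using cached PyDriller-1.15.5-py3-none-any.whl (63 kB)
Collecting lizard
  Using cached lizard-1.17.7-py2.py3-none-any.whl (62 kB)
Collecting pytz
  Using cached pytz-2021.1-py2.py3-none-any.whl (510 kB)
Installing collected packages: pytz, lizard, pydriller
Successfully installed lizard-1.17.7 pydriller-1.15.5 pytz-2021.1
```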


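```python
from pydriller import RepositoryMining  # PyDriller 1.x; renamed Repository in 2.x

# path to a local clone of the Zeeguu-Core repository
REPO_DIR = '/Users/mircea/Zeeguu-Core/'
```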


#### Every commit is modelled as a list of modifications, each involving one file

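```python
# Print every commit, then one line per file it modified.
# m.filename is only the base name, so distinct files such as the various
# __init__.py modules look identical here (m.new_path holds the full path).
# m.complexity is the file's cyclomatic complexity as computed by lizard,
# or None when it cannot be computed (e.g. text files, deleted files).
for commit in RepositoryMining(REPO_DIR).traverse_commits():
    print("commit" + str(commit))  # default object repr; commit.hash gives the SHA
    for m in commit.modifications:
        print(
            "- Author {}".format(commit.author.name),
            " modified {}".format(m.filename),
            " with a change type of {}".format(m.change_type.name),
            " and the complexity is {}".format(m.complexity)
        )
```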


```text
commit<pydriller.domain.commit.Commit object at 0x112e7e280>
- Author Mircea Lungu  modified LICENSE  with a change type of ADD  and the complexity is None
- Author Mircea Lungu  modified README.md  with a change type of ADD  and the complexity is None
- Author Mircea Lungu  modified de-test.txt  with a change type of ADD  and the complexity is None
- Author Mircea Lungu  modified de.txt  with a change type of ADD  and the complexity is None
- Author Mircea Lungu  modified fr.txt  with a change type of ADD  and the complexity is None
- Author Mircea Lungu  modified it.txt  with a change type of ADD  and the complexity is None
- Author Mircea Lungu  modified nl.txt  with a change type of ADD  and the complexity is None
- Author Mircea Lungu  modified sources.txt  with a change type of ADD  and the complexity is None
- Author Mircea Lungu  modified setup.py  with a change type of ADD  and the complexity is 0
...
commit<pydriller.domain.commit.Commit object at 0x1130d0dc0>
- Author Mircea Lungu  modified .travis.yml  with a change type of MODIFY  and the complexity is None
- Author Mircea Lungu  modified setup.py  with a change type of MODIFY  and the complexity is 0
- Author Mircea Lungu  modified model_test_mixin.py  with a change type of MODIFY  and the complexity is 4
commit<pydriller.domain.commit.Commit object at 0x1130d04c0>
- Author Mircea Lungu  modified README.md  with a change type of MODIFY  and the complexity is None
...
commit<pydriller.domain.commit.Commit object at 0x1123055b0>
- Author Timon Back  modified arts.py  with a change type of ADD  and the complexity is 1
- Author Timon Back  modified bookmark.py  with a change type of MODIFY  and the complexity is 54
- Author Timon Back  modified bookmark_priority_arts.py  with a change type of MODIFY  and the complexity is 1
- Author Timon Back  modified exercise.py  with a change type of MODIFY  and the complexity is 1
- Author Timon Back  modified exercise_outcome.py  with a change type of MODIFY  and the complexity is 3
...
commit<pydriller.domain.commit.Commit object at 0x112e75be0>
- Author Mircea Lungu  modified words_to_study.py  with a change type of MODIFY  and the complexity is 3
- Author Mircea Lungu  modified bookmark.py  with a change type of MODIFY  and the complexity is 55
commit<pydriller.domain.commit.Commit object at 0x112e75820>
- Author Mircea Lungu  modified bookmark.py  with a change type of MODIFY  and the complexity is 55
commit<pydriller.domain.commit.Commit object at 0x1123053a0>
- Author Mircea Lungu  modified bookmark.py  with a change type of MODIFY  and the complexity is 55
commit<pydriller.domain.commit.Commit object at 0x1123055b0>
- Author Mircea Lungu  modified bookmark.py  with a change type of MODIFY  and the complexity is 56
...
commit<pydriller.domain.commit.Commit object at 0x112e7e370>
- Author Mircea Lungu  modified fill_article_ids.py  with a change type of MODIFY  and the complexity is 4
commit<pydriller.domain.commit.Commit object at 0x112e7ec40>
- Author Mircea Lungu  modified constants.py  with a change type of MODIFY  and the complexity is 0
- Author Mircea Lungu  modified user_reading_session.py  with a change type of MODIFY  and the complexity is 33
commit<pydriller.domain.commit.Commit object at 0x112e75850>
- Author Mircea Lungu  modified user_activitiy_data.py  with a change type of MODIFY  and the complexity is 27
commit<pydriller.domain.commit.Commit object at 0x112e75c40>
```
- Author 

- Author Mircea Lungu  modified search_subscription.py  with a change type of MODIFY  and the complexity is 8
- Author Mircea Lungu  modified topic_filter.py  with a change type of MODIFY  and the complexity is 9
- Author Mircea Lungu  modified topic_subscription.py  with a change type of MODIFY  and the complexity is 9
commit<pydriller.domain.commit.Commit object at 0x1130d0640>
commit<pydriller.domain.commit.Commit object at 0x1130d0fd0>
- Author cpaz  modified user_exercise_session.py  with a change type of MODIFY  and the complexity is 18
- Author cpaz  modified user_reading_session.py  with a change type of MODIFY  and the complexity is 30
commit<pydriller.domain.commit.Commit object at 0x112e3eca0>
- Author Mircea Lungu  modified cohort.py  with a change type of MODIFY  and the complexity is 11
commit<pydriller.domain.commit.Commit object at 0x1130cdc40>
- Author Mircea Lungu  modified bookmark.py  with a change type of MODIFY  and the complexity is 80
commit<pydriller.domain.com

- Author Alin Balutoiu  modified __init__.py  with a change type of MODIFY  and the complexity is 2
commit<pydriller.domain.commit.Commit object at 0x1123053a0>
commit<pydriller.domain.commit.Commit object at 0x1130cd2b0>
- Author Mircea Lungu  modified __init__.py  with a change type of MODIFY  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x112e3eca0>
- Author Mircea Lungu  modified quality_filter.py  with a change type of MODIFY  and the complexity is 9
commit<pydriller.domain.commit.Commit object at 0x112305c70>
- Author Mircea Lungu  modified user.py  with a change type of MODIFY  and the complexity is 80
commit<pydriller.domain.commit.Commit object at 0x112e3eca0>
- Author Mircea Lungu  modified __init__.py  with a change type of ADD  and the complexity is None
- Author Mircea Lungu  modified difficulties_for_user.py  with a change type of ADD  and the complexity is 8
- Author Mircea Lungu  modified recent_activity.py  with a change type of RENAME  and t

- Author Mircea Lungu  modified difficulties_for_user.py  with a change type of MODIFY  and the complexity is 8
- Author Mircea Lungu  modified recent_activity.py  with a change type of MODIFY  and the complexity is 0
- Author Mircea Lungu  modified errors.translators.log  with a change type of DELETE  and the complexity is None
- Author Mircea Lungu  modified add_article_id_to_text.py  with a change type of MODIFY  and the complexity is 0
- Author Mircea Lungu  modified add_standard_topics.py  with a change type of MODIFY  and the complexity is 2
- Author Mircea Lungu  modified anonimize_user.py  with a change type of MODIFY  and the complexity is 0
- Author Mircea Lungu  modified anonymize.py  with a change type of MODIFY  and the complexity is 0
- Author Mircea Lungu  modified cleanup_non_content_bits.py  with a change type of MODIFY  and the complexity is 0
- Author Mircea Lungu  modified feed_retrieval.py  with a change type of MODIFY  and the complexity is 0
- Author Mircea Lungu

- Author Mircea Lungu  modified text.py  with a change type of RENAME  and the complexity is 18
- Author Mircea Lungu  modified topic.py  with a change type of RENAME  and the complexity is 14
- Author Mircea Lungu  modified topic_filter.py  with a change type of RENAME  and the complexity is 9
- Author Mircea Lungu  modified topic_subscription.py  with a change type of RENAME  and the complexity is 9
- Author Mircea Lungu  modified unique_code.py  with a change type of RENAME  and the complexity is 4
- Author Mircea Lungu  modified url.py  with a change type of RENAME  and the complexity is 24
- Author Mircea Lungu  modified user.py  with a change type of RENAME  and the complexity is 80
- Author Mircea Lungu  modified user_activitiy_data.py  with a change type of RENAME  and the complexity is 27
- Author Mircea Lungu  modified user_article.py  with a change type of RENAME  and the complexity is 25
- Author Mircea Lungu  modified user_exercise_session.py  with a change type of RENAME 

- Author Mircea Lungu  modified user.py  with a change type of MODIFY  and the complexity is 75
- Author Mircea Lungu  modified model_test_mixin.py  with a change type of MODIFY  and the complexity is 3
- Author Mircea Lungu  modified article_rule.py  with a change type of MODIFY  and the complexity is 5
- Author Mircea Lungu  modified rss_feed_rule.py  with a change type of MODIFY  and the complexity is 4
- Author Mircea Lungu  modified test_article.py  with a change type of MODIFY  and the complexity is 6
- Author Mircea Lungu  modified test_feed.py  with a change type of MODIFY  and the complexity is 4
- Author Mircea Lungu  modified test_retrieve_and_compute.py  with a change type of MODIFY  and the complexity is 12
- Author Mircea Lungu  modified test_text.py  with a change type of MODIFY  and the complexity is 2
- Author Mircea Lungu  modified urls_for_test.py  with a change type of ADD  and the complexity is 3
commit<pydriller.domain.commit.Commit object at 0x112305af0>
- Author

commit<pydriller.domain.commit.Commit object at 0x11301ec40>
- Author Mircea Lungu  modified .travis.yml  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x112e667f0>
- Author Mircea Lungu  modified rss_feed_rule.py  with a change type of MODIFY  and the complexity is 4
- Author Mircea Lungu  modified mocking_the_web.py  with a change type of MODIFY  and the complexity is 3
commit<pydriller.domain.commit.Commit object at 0x11301ec40>
- Author Mircea Lungu  modified article_downloader.py  with a change type of MODIFY  and the complexity is 29
commit<pydriller.domain.commit.Commit object at 0x11301e640>
- Author Mircea Lungu  modified article_downloader.py  with a change type of MODIFY  and the complexity is 29
commit<pydriller.domain.commit.Commit object at 0x11301ec40>
- Author Mircea Lungu  modified article_word.py  with a change type of MODIFY  and the complexity is 13
commit<pydriller.domain.commit.Commit object at 0x11301e640

- Author Mircea Lungu  modified user.py  with a change type of MODIFY  and the complexity is 86
commit<pydriller.domain.commit.Commit object at 0x112e3eca0>
- Author Mircea Lungu  modified remove_unreferenced_articles.py  with a change type of MODIFY  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x112e75910>
- Author Mircea Lungu  modified test_language.py  with a change type of MODIFY  and the complexity is 8
commit<pydriller.domain.commit.Commit object at 0x112e3eca0>
- Author Mircea Lungu  modified localized_topic.py  with a change type of MODIFY  and the complexity is 9
commit<pydriller.domain.commit.Commit object at 0x112e75910>
- Author Mircea Lungu  modified mixed_recommender.py  with a change type of MODIFY  and the complexity is 44
commit<pydriller.domain.commit.Commit object at 0x112e3eca0>
- Author Mircea Lungu  modified mixed_recommender.py  with a change type of MODIFY  and the complexity is 44
- Author Mircea Lungu  modified user.py  with a chan

- Author Mircea Lungu  modified language.py  with a change type of MODIFY  and the complexity is 26
commit<pydriller.domain.commit.Commit object at 0x1130eb100>
- Author Mircea Lungu  modified language.py  with a change type of MODIFY  and the complexity is 26
commit<pydriller.domain.commit.Commit object at 0x112e75940>
- Author Mircea Lungu  modified language.py  with a change type of MODIFY  and the complexity is 27
commit<pydriller.domain.commit.Commit object at 0x1130eb100>
- Author Marcus Grosen  modified recompute_recommender_cache.py  with a change type of MODIFY  and the complexity is 10
- Author Marcus Grosen  modified article_downloader.py  with a change type of MODIFY  and the complexity is 35
commit<pydriller.domain.commit.Commit object at 0x112e75940>
- Author Mircea Lungu  modified recompute_fk_difficulties_for_polish.py  with a change type of ADD  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x1130eb100>
- Author Mircea Lungu  modified recomput

- Author Marcus  modified elasticsearch_query_comparison.py  with a change type of MODIFY  and the complexity is 20
- Author Marcus  modified mysqlSettings.py  with a change type of ADD  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x112e4a640>
- Author Marcus  modified README.md  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x112e7ebb0>
- Author Marcus  modified README.md  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x1130d0730>
- Author Marcus  modified mysqlFullText.py  with a change type of MODIFY  and the complexity is 6
commit<pydriller.domain.commit.Commit object at 0x1130d9760>
commit<pydriller.domain.commit.Commit object at 0x1130eb340>
- Author Marcus  modified __init__.py  with a change type of DELETE  and the complexity is None
- Author Marcus  modified concurrent_test.bat  with a change type of DELETE  and the complexity is Non

- Author Marcus  modified __init__.py  with a change type of ADD  and the complexity is None
- Author Marcus  modified elastic_recommender.py  with a change type of MODIFY  and the complexity is 23
commit<pydriller.domain.commit.Commit object at 0x112e4a280>
- Author Marcus  modified __init__.py  with a change type of DELETE  and the complexity is None
- Author Marcus  modified topic_classification.py  with a change type of DELETE  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x1130d0730>
- Author Marcus  modified mysql_to_elastic.py  with a change type of MODIFY  and the complexity is 4
- Author Marcus  modified elastic_recommender.py  with a change type of MODIFY  and the complexity is 23
commit<pydriller.domain.commit.Commit object at 0x112e4a280>
- Author Marcus  modified mysql_to_elastic.py  with a change type of MODIFY  and the complexity is 4
- Author Marcus  modified elastic_recommender.py  with a change type of MODIFY  and the complexity is 23
- A

- Author Mircea Lungu  modified article_downloader.py  with a change type of MODIFY  and the complexity is 34
- Author Mircea Lungu  modified feed.py  with a change type of MODIFY  and the complexity is 33
commit<pydriller.domain.commit.Commit object at 0x1130cd3d0>
- Author Mircea Lungu  modified localized_topic.py  with a change type of MODIFY  and the complexity is 9
- Author Mircea Lungu  modified test_localized_topic.py  with a change type of ADD  and the complexity is 3
commit<pydriller.domain.commit.Commit object at 0x112e3eca0>
- Author Mircea Lungu  modified article_downloader.py  with a change type of MODIFY  and the complexity is 34
- Author Mircea Lungu  modified article.py  with a change type of MODIFY  and the complexity is 37
commit<pydriller.domain.commit.Commit object at 0x112e7e370>
- Author Mircea Lungu  modified test_localized_topic.py  with a change type of MODIFY  and the complexity is 4
commit<pydriller.domain.commit.Commit object at 0x112e3eca0>
- Author Mircea 

#### Let's Count the Modifications for Each File

```python
from collections import defaultdict

commit_counts = defaultdict(int)

# count the commits that touched each file, keyed by the file's path after the change
for commit in RepositoryMining(REPO_DIR).traverse_commits():
    for modification in commit.modifications:
        commit_counts[modification.new_path] += 1

# the 42 most frequently modified files
sorted(commit_counts.items(), key=lambda x: x[1], reverse=True)[:42]
```


#### Problem: many `__init__.py` files in our system but only one in the counts!

- what's the full file name? 

- looking at the PyDriller documentation [1], we see that a modification has two paths:
  - old_path
  - new_path

- why? 
- which one should we be using? 

[1] https://pydriller.readthedocs.io/en/latest/commit.html
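To see why a single path is not enough, here is a small hypothetical example. The file history and the paths are made up, and we assume PyDriller's convention that `old_path` is `None` for an added file and `new_path` is `None` for a deleted one. Counting by `new_path` alone splits the file's history at the rename and books the deletion under `None`:

```python
from collections import Counter

# Hypothetical history of ONE file as (change_type, old_path, new_path) triples.
# Assumed convention: old_path is None for ADD, new_path is None for DELETE.
history = [
    ("ADD", None, "app/model/user.py"),
    ("MODIFY", "app/model/user.py", "app/model/user.py"),
    ("RENAME", "app/model/user.py", "app/core/model/user.py"),
    ("MODIFY", "app/core/model/user.py", "app/core/model/user.py"),
    ("DELETE", "app/core/model/user.py", None),
]

# Naive counting by new_path: the rename splits the history into two keys,
# and the deletion is counted under the key None.
naive_counts = Counter(new_path for _, _, new_path in history)
print(naive_counts)
```

Keeping the whole history under one key requires moving the count from `old_path` to `new_path` whenever a file is renamed.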


#### Lesson: to track full paths, we also need to track *individual file evolution*

```python
from pydriller import ModificationType

commit_counts = {}

for commit in RepositoryMining(REPO_DIR).traverse_commits():
    for modification in commit.modifications:
        new_path = modification.new_path
        old_path = modification.old_path

        if modification.change_type == ModificationType.RENAME:
            # carry the count of the old path over to the new one
            commit_counts[new_path] = commit_counts.pop(old_path, 0) + 1

        elif modification.change_type == ModificationType.DELETE:
            # the file no longer exists, so forget its count
            commit_counts.pop(old_path, None)

        elif modification.change_type == ModificationType.ADD:
            commit_counts[new_path] = 1

        else:  # modification to an existing file (old_path == new_path)
            # .get() also covers files whose ADD predates the mined history
            commit_counts[new_path] = commit_counts.get(new_path, 0) + 1

sorted(commit_counts.items(), key=lambda x: x[1], reverse=True)
```


#### Aggregating to module level



```python
from code.basic_abstraction import (
    module_from_path,
    top_level_module
)

module_activity = defaultdict(int)

# sum the file-level counts into their level-2 modules (Python files only)
for path, count in commit_counts.items():
    if str(path).endswith(".py"):
        l2_module = top_level_module(module_from_path(path), 2)
        module_activity[l2_module] += count

sorted(module_activity.items(), key=lambda x: x[1], reverse=True)
```
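The helpers imported from `code.basic_abstraction` are not shown in this section. As a rough idea of what such a path-to-module mapping might do, here is a hypothetical sketch. The names `path_to_module` and `truncate_module` and the assumption of `/`-separated relative paths are ours, not necessarily how the course module works:

```python
# Hypothetical analogues of module_from_path / top_level_module;
# the real helpers in code/basic_abstraction.py may behave differently.

def path_to_module(path):
    """'app/core/model/user.py' -> 'app.core.model.user' (assumes '/' separators)."""
    if path.endswith(".py"):
        path = path[: -len(".py")]
    return path.replace("/", ".")


def truncate_module(module, depth):
    """Keep only the first `depth` components of a dotted module name."""
    return ".".join(module.split(".")[:depth])


print(truncate_module(path_to_module("app/core/model/user.py"), 2))  # prints: app.core
```

With depth 2, every file below `app/core/` (including its `__init__.py`) is attributed to `app.core`.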



```python
most_active_modules = sorted(module_activity.items(), key=lambda x: x[1], reverse=True)

# names of the five most active modules
top_most_active_modules = [module for module, _ in most_active_modules][:5]
top_most_active_modules
```


#### Architectural View: Relationships Between Evolutionary Hotspots


```python
# packages required for drawing (IPython shell escapes; run inside Jupyter)
import sys
!{sys.executable} -m pip install networkx --upgrade
!{sys.executable} -m pip install matplotlib
```

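```python
def system_module(m):
    # only the most active modules are shown in the view
    return m in top_most_active_modules

def module_size(m):
    # node size proportional to the module's change activity
    return 30 * module_activity[m]
```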

```python
from code.basic_abstraction import (
    dependencies_graph,
    draw_graph_with_weights,
    top_level_module,
    abstracted_to_top_level)

directed = dependencies_graph(REPO_DIR)
directedAbstracted = abstracted_to_top_level(directed, system_module)

draw_graph_with_weights(directedAbstracted, module_size, (18, 8))
```
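`abstracted_to_top_level` is not shown here either. The core idea of lifting file-level dependencies to a module-level view can be sketched without a graph library. The function `abstract_dependencies`, the example paths, and the choice of using the number of file-level edges as the weight are all assumptions for illustration:

```python
from collections import Counter


def abstract_dependencies(edges, to_module, keep):
    """Lift file-level dependencies to weighted module-level dependencies.

    edges     -- iterable of (source_file, target_file) pairs
    to_module -- maps a file path to the module it belongs to
    keep      -- predicate selecting the modules to show (cf. system_module)
    Returns {(source_module, target_module): number_of_file_level_edges}.
    """
    weights = Counter()
    for src, dst in edges:
        m_src, m_dst = to_module(src), to_module(dst)
        # drop dependencies inside one module and modules outside the view
        if m_src != m_dst and keep(m_src) and keep(m_dst):
            weights[(m_src, m_dst)] += 1
    return dict(weights)


def first_two_folders(path):
    return ".".join(path.split("/")[:2])


def shown(module):
    return module in {"app.api", "app.core"}


# Hypothetical file-level dependencies
edges = [
    ("app/api/user_endpoint.py", "app/core/user.py"),
    ("app/api/bookmark_endpoint.py", "app/core/bookmark.py"),
    ("app/core/user.py", "app/core/language.py"),
    ("app/core/user.py", "tools/anonymize.py"),
]
print(abstract_dependencies(edges, first_two_folders, shown))
# prints: {('app.api', 'app.core'): 2}
```

The dependency inside `app.core` is dropped as a self-loop, and the one to `tools` is dropped because that module is outside the view.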

### Stepping Back

We used Git, but the same approach works for any version control system (VCS).

Alternative tools for VCS analysis:

- `git log` + Unix command-line tools (see the tutorials by Spinellis, Helge in ASE, or Tornhill)
  
- your IDE (e.g. integrated git blame, visual diff, etc.)

- Any others...?

The definition of *most active* can be tuned to your needs:
- weighted towards recency (e.g. logarithmically), so that older changes count less
- computed over non-overlapping time windows, to replay the history of the system
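Both variations can be sketched over a list of `(path, timestamp)` pairs, which could be collected from `modification.new_path` and `commit.committer_date` while traversing the commits. The exponential decay with a half-life is just one possible recency weighting (an assumption; a logarithmic scheme would be another option), and the function names are hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def recency_weighted_activity(changes, now, half_life_days=90.0):
    """Per-path activity where each change's weight halves every `half_life_days`
    (exponential decay; one possible recency weighting, not the only one)."""
    activity = defaultdict(float)
    for path, when in changes:
        age_days = (now - when).total_seconds() / 86400
        activity[path] += 0.5 ** (age_days / half_life_days)
    return dict(activity)


def activity_per_window(changes, start, window=timedelta(days=30)):
    """Count changes per path in consecutive, non-overlapping windows from `start`.
    Returns {window_index: {path: count}}, ordered by window index."""
    windows = defaultdict(lambda: defaultdict(int))
    for path, when in changes:
        windows[(when - start) // window][path] += 1
    return {index: dict(counts) for index, counts in sorted(windows.items())}


# Hypothetical changes; with PyDriller, `now` must be timezone-aware like committer_date.
now = datetime(2021, 1, 1)
changes = [
    ("user.py", now),
    ("user.py", now - timedelta(days=90)),
    ("article.py", now - timedelta(days=180)),
]
print(recency_weighted_activity(changes, now))
# prints: {'user.py': 1.5, 'article.py': 0.25}
print(activity_per_window(changes, start=now - timedelta(days=180)))
# prints: {0: {'article.py': 1}, 3: {'user.py': 1}, 6: {'user.py': 1}}
```

A shorter half-life makes the ranking react faster to recent work; comparing the per-window counts side by side gives a crude replay of where activity moved over time.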


### Limitations

- ignores developer styles
  - e.g. one developer makes many micro-commits, while another commits large chunks of code infrequently

- might report files such as `README.md` or `LICENSE.md` as the ones that change the most
  - can be combined with static complexity metrics [1]
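One common heuristic for the second limitation is to rank files by change frequency multiplied by complexity, so that frequently edited but logic-free files drop out. A minimal sketch, assuming per-file complexity values such as those PyDriller reports in a modification's `complexity` attribute (taken here to be `None` for files it cannot analyze); the function name and the numbers in the example are made up:

```python
def hotspots(change_counts, complexity, top=10):
    """Rank files by change frequency x complexity (one common heuristic).

    change_counts -- {path: number of commits that touched the file}
    complexity    -- {path: complexity value, or None if the file could not be analyzed}
    Files without a complexity value (e.g. README.md, LICENSE.md) are left out.
    """
    scores = {
        path: count * complexity[path]
        for path, count in change_counts.items()
        if complexity.get(path) is not None
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical numbers, for illustration only
counts = {"README.md": 40, "constants.py": 30, "user.py": 12, "bookmark.py": 5}
complexity = {"README.md": None, "constants.py": 0, "user.py": 80, "bookmark.py": 80}
print(hotspots(counts, complexity))
# prints: [('user.py', 960), ('bookmark.py', 400), ('constants.py', 0)]
```

`README.md` changes most often but disappears from the ranking, and `constants.py` sinks to the bottom because it has no branching logic.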






## 2. Dependency Extraction: Logical Coupling

**Logical coupling** detects when **two sub-systems** change together **frequently**
- The more they change together, the more likely they are dependent
- Can capture dependencies that are not detectable by static/dynamic analysis
  - e.g. ? 


Introduced in the context of an industrial case study [1]

[1] Detection of Logical Coupling Based on Product Release History, Gall et al., ’98

### Logical Coupling: The Details...


- What are the sub-systems? (files? folders? packages?)
- What does it mean to change together? (the same commit? a sliding time window?)
- What is the threshold for "frequently"? (e.g. *75% of the commits, minimum 10*)
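One possible set of answers can be sketched in a few lines: sub-systems are files, *changing together* means appearing in the same commit, and *frequently* means at least `min_support` co-changes that cover at least `min_confidence` of each file's commits. These choices and the function name `logical_coupling` are assumptions, not the definitions from [1]. Each commit's file set could be built in PyDriller with `{m.new_path or m.old_path for m in commit.modifications}`:

```python
from collections import Counter
from itertools import combinations


def logical_coupling(commits, min_support=10, min_confidence=0.75):
    """Find pairs of files that frequently change in the same commit.

    commits        -- iterable of sets of file paths, one set per commit
    min_support    -- minimum number of commits in which both files changed
    min_confidence -- minimum fraction of EACH file's commits that also touch the other
    Returns (file_a, file_b, support, confidence) tuples, strongest first.
    """
    file_changes = Counter()
    pair_changes = Counter()
    for files in commits:
        file_changes.update(files)
        # sorting gives each unordered pair a single canonical key
        pair_changes.update(combinations(sorted(files), 2))

    coupled = []
    for (a, b), support in pair_changes.items():
        # dividing by the larger count means the threshold must hold for both files
        confidence = support / max(file_changes[a], file_changes[b])
        if support >= min_support and confidence >= min_confidence:
            coupled.append((a, b, support, confidence))
    return sorted(coupled, key=lambda t: (t[2], t[3]), reverse=True)


# Hypothetical history: 12 commits touch both files, then two unrelated commits
history = [{"user.py", "user_language.py"}] * 12 + [{"user.py"}, {"README.md"}]
for a, b, support, confidence in logical_coupling(history):
    print(a, b, support, round(confidence, 2))
# prints: user.py user_language.py 12 0.92
```

Commits that touch many files at once (e.g. mass renames or reformatting) produce many pairs, so in practice you may want to skip them before counting.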



### Advantages of Logical Coupling

Language Independent

Complements some disadvantages of structural / dynamic analysis, which:
- cannot capture all situations (e.g. one component writing to a file that another one reads from)
- do not work with documents that are not source code (e.g. XML files)


## Evolution Analysis Beyond Architecture Recovery

- improved developer tools
  - recording and replaying software evolution (e.g. "Replay" for Eclipse)
    - fine-grained (method-level) evolution monitoring (Robbes et al.)


- software quality evaluation


- *program comprehension* when first encountering a new system



- Mining software ecosystems

  - kinds of changes that are most likely to introduce bugs 
  - developer strategies when facing API deprecation
