# Software Evolution Analysis

![](images/heraclitus.png)

> Everything changes, and nothing stands still, 
> 
> and you can not step twice in the same... system.
> 
> -- Heraclitus


# Metaphor Limitations: Software Architecture

- Makes it sound like it's something fixed...
- Even real world architecture, in time changes [Brand]
- [Brand] - *How Buildings Learn*. Steward Brand
  - The Long Now Foundation - Podcast








## Further Metaphors

My Favorite Metaphors of Software Development Emphasize Change...


### 1. Performance Art
- art: because it's creative
- performance: you can't put it in a frame 
- => *advice:* if you ever create a cool innovative software then **make a screencast** about it


### 2. A Garden 
- It needs somebody to always tend to it

> I still remember the jolt I felt in 1958 when I first heard a friend talk about building a program, as opposed to writing one. In a flash he broadened my whole view of the software process.

Brooks however thinks the building metaphor is not well equipped to handle the current projects we’re developing. Instead of building, which requires adequate plans and foresight, we should focus on growing a program organically. (Once even a very simple program is up and running, developers are much more enthusiastic about the progress.)



### 3. Software Aging 

David Parnas's **Software Aging** [1]

> Programs, like people, get old. 

- We can’t prevent aging, but 
  - we can understand its causes, 
  - take steps to limits its effects, 
  - temporarily reverse some of the damage it has caused, 
  - and prepare for the day when the software is no longer viable

[1] Software Aging. David Lorge Parnas, https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=296790


# Laws of Software Evolution

Lehman[1] proposed the laws about **e-type systems**:
  - an e-type system is *embedded* in the real world
  - and since the real world always changes... 
      - even if it weren't, the software ecosystem eventually changes [2] 
      - e.g. javascript packages, etc.

[1] Lehman, Belady. Program Evolution: Processes of Software Change, London Academic Press, London, 1985

[2] We'll talk more about ecosystems in the ASE course

## 1st Law of Software Evolution: E-Type Systems Must Change


> A program that is used in a real-world environment must change, or become progressively less useful in that environment. (Lehman's Law of Continuing Change)



        



## 2nd Law of Software Evolution: Ent*0py Happens!

Manny Lehmann's **Law of Increasing Entropy**: 

> As a program evolves, it becomes more complex, and extra resources are needed to preserve and simplify its structure.



# What if We Use System Evolution for Good? 
## e.g. for understanding

 
 By data mining the version repository we can find: 

  - places in the code which are high-risk (because they were risky in the past)
    - + linking with issue tracker info

  - parts of the system that need refactoring (study of Hitesh Sajnani)
  
  - navigation suggestions (e.g. Mylar for Eclipse)
  
  - infer programmer knowledge


Today: 
  1. entities in the codebase where most effort was invested
  1. invisible dependencies between files (e.g. logical coupling)
  
  
  
 






## VCS Capture The Software Evolution

VCS = version control system 


Over the last two decades **we have seen increases in**...
  - **popularity of version control systems**
https://trends.google.com/trends/explore?date=all&q=git,svn,software%20architecture,mercurial
    - it's even funny for us to think that people used to email files around to collaborate
    - one of the many practices that we, software engineers, have been teaching the rest of the world



- **knowledge of how to manage versions**
  - branching strategies
  - integration with CI
  - semantic versioning 



*How to integrate this information in AR?...*


## Architectural Viewpoint: Evolutionary Hotspots 

### Evolutionary Hotspots =(*def*) **code entities where most effort was invested ** [1]


Assumption: effort is proportional to architectural relevance


Why? 
- Philosophycally
 > *"The value of anything is proportional to time invested in it."* (M. Lungu)
 
 
- Practically:
  - high *churn* (change density) predicts bugs better than size [...]
  - studies observe correlation between churn and complexity metrics [...]
  - it's likely that they'll require more effort in the future (e.g. yesterday's weather [Girba et al.])
    
    
- Pragmatically:
  - can be detected with **language independent analysis** (which is good for polyglot systems)



### Evolutionary Hotspots In Practice

Challenges / Implementation Details: 
- how to measure effort invested? 
- what are the entities (files, aggregates?)
- on what period is the study performed 
  - results will likely differ for periods






### Example Analysis

VCS: Git

Period of study: whole history

Entities: files (+aggregation to modules)

Invested effort: number of commits

Case Study: Zeeguu-Core

Toolbox: Python + PyDriller + gitpython

Online: https://colab.research.google.com/drive/19f2-lmL07rSBoyKpx5ZQo0LY17_Un4YD?usp=sharing


In [None]:
import sys

!{sys.executable} -m pip install pydriller
!{sys.executable} -m pip install gitpython

In [4]:
from pydriller import RepositoryMining
REPO_DIR = '/Users/mircea/Zeeguu-Core/'


#### Every commit is modelled as "multiple modifications" each one involving a filename

In [5]:
for commit in RepositoryMining(REPO_DIR).traverse_commits():
    print("commit" + str(commit))
    print ("- Author {}".format(commit.author.name))
    
    for m in commit.modifications:
        print(
            " modified {}".format(m.filename),
            " with a change type of {}".format(m.change_type.name),
            " and the complexity is {}".format(m.complexity)
        )


commit<pydriller.domain.commit.Commit object at 0x105906be0>
- Author Mircea Lungu
 modified LICENSE  with a change type of ADD  and the complexity is None
 modified README.md  with a change type of ADD  and the complexity is None
 modified de-test.txt  with a change type of ADD  and the complexity is None
 modified de.txt  with a change type of ADD  and the complexity is None
 modified fr.txt  with a change type of ADD  and the complexity is None
 modified it.txt  with a change type of ADD  and the complexity is None
 modified nl.txt  with a change type of ADD  and the complexity is None
 modified sources.txt  with a change type of ADD  and the complexity is None
 modified setup.py  with a change type of ADD  and the complexity is 0
 modified test.py  with a change type of ADD  and the complexity is 0
 modified bookmarkwords.txt  with a change type of ADD  and the complexity is None
 modified create_test_db.sh  with a change type of ADD  and the complexity is None
 modified generate-m

 modified test_bookmark.py  with a change type of RENAME  and the complexity is 7
 modified test_domain.py  with a change type of RENAME  and the complexity is 6
 modified test_feed.py  with a change type of RENAME  and the complexity is 1
 modified test_language.py  with a change type of RENAME  and the complexity is 6
 modified test_user_accounts.py  with a change type of RENAME  and the complexity is 2
 modified test_user_preferences.py  with a change type of RENAME  and the complexity is 6
 modified test_watch_event.py  with a change type of RENAME  and the complexity is 5
 modified testing.cfg  with a change type of ADD  and the complexity is None
 modified testing_default.cfg  with a change type of RENAME  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x1059066a0>
- Author Mircea Lungu
 modified __init__.py  with a change type of ADD  and the complexity is None
 modified goose_extractor.py  with a change type of ADD  and the complexity is 5
 modified 

 modified de-test.txt  with a change type of DELETE  and the complexity is None
 modified de.txt  with a change type of DELETE  and the complexity is None
 modified fr.txt  with a change type of DELETE  and the complexity is None
 modified it.txt  with a change type of DELETE  and the complexity is None
 modified nl.txt  with a change type of DELETE  and the complexity is None
 modified sources.txt  with a change type of DELETE  and the complexity is None
 modified __init__.py  with a change type of MODIFY  and the complexity is 0
 modified bookmark.py  with a change type of MODIFY  and the complexity is 50
 modified knowledge_estimator.py  with a change type of MODIFY  and the complexity is 19
 modified encounter_stats.py  with a change type of RENAME  and the complexity is None
 modified encounter_stats_update.py  with a change type of ADD  and the complexity is 2
 modified model_test_mixin.py  with a change type of MODIFY  and the complexity is 4
 modified run_all.sh  with a change 

 modified setup.py  with a change type of MODIFY  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x105afafd0>
- Author Mircea Filip Lungu
 modified README.md  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x105ae5d00>
- Author Mircea Filip Lungu
 modified README.md  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x105ae55b0>
- Author Mircea Filip Lungu
 modified README.md  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x105ae5d00>
- Author Mircea Lungu
 modified .gitignore  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x105ae3ee0>
- Author Mircea Lungu
commit<pydriller.domain.commit.Commit object at 0x105aedaf0>
- Author Mircea Lungu
 modified .gitignore  with a change type of MODIFY  and the complexity is None
commit<pydriller.d

 modified word_exercise_stats.py  with a change type of MODIFY  and the complexity is 33
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Timon Back
 modified __init__.py  with a change type of MODIFY  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x105910700>
- Author Timon Back
 modified words_to_study.py  with a change type of MODIFY  and the complexity is 11
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Mircea Lungu
 modified test_bookmark.py  with a change type of MODIFY  and the complexity is 8
 modified test_user_accounts.py  with a change type of MODIFY  and the complexity is 5
 modified words_to_study.py  with a change type of MODIFY  and the complexity is 13
 modified bookmark.py  with a change type of MODIFY  and the complexity is 54
 modified user.py  with a change type of MODIFY  and the complexity is 45
 modified populate.py  with a change type of MODIFY  and the complexity is 27
commit<pydriller.domai

 modified populate.py  with a change type of MODIFY  and the complexity is 28
commit<pydriller.domain.commit.Commit object at 0x105ae3940>
- Author Peter Ullrich
 modified model_test_mixin.py  with a change type of MODIFY  and the complexity is 4
 modified test_words_to_study.py  with a change type of MODIFY  and the complexity is 4
 modified algo_service.py  with a change type of MODIFY  and the complexity is 9
 modified arts_rt.py  with a change type of MODIFY  and the complexity is 1
 modified populate.py  with a change type of MODIFY  and the complexity is 28
commit<pydriller.domain.commit.Commit object at 0x10595b430>
- Author Mircea Lungu
 modified test_bookmark.py  with a change type of MODIFY  and the complexity is 8
 modified populate.py  with a change type of MODIFY  and the complexity is 28
commit<pydriller.domain.commit.Commit object at 0x104d708e0>
- Author Mircea Lungu
 modified __init__.py  with a change type of ADD  and the complexity is None
 modified default_words.py 

 modified exercise_stats.py  with a change type of MODIFY  and the complexity is 1
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Timon Back
 modified bookmark_priority_arts.py  with a change type of MODIFY  and the complexity is 1
commit<pydriller.domain.commit.Commit object at 0x105aeddf0>
- Author Timon Back
 modified bookmark_priority_arts.py  with a change type of MODIFY  and the complexity is 1
 modified exercise_stats.py  with a change type of MODIFY  and the complexity is 1
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Timon Back
 modified bookmark_priority_arts.py  with a change type of MODIFY  and the complexity is 1
 modified exercise_stats.py  with a change type of MODIFY  and the complexity is 1
commit<pydriller.domain.commit.Commit object at 0x105ae3ee0>
- Author Timon Back
commit<pydriller.domain.commit.Commit object at 0x10595b8e0>
- Author Timon Back
 modified bookmark_priority_arts.py  with a change type of MODIFY  and th

 modified model_test_mixin.py  with a change type of MODIFY  and the complexity is 2
 modified base_rule.py  with a change type of MODIFY  and the complexity is 4
 modified language_rule.py  with a change type of MODIFY  and the complexity is 15
 modified test_bookmark.py  with a change type of MODIFY  and the complexity is 4
 modified test_content_retrieval.py  with a change type of MODIFY  and the complexity is 3
 modified test_domain.py  with a change type of MODIFY  and the complexity is 9
 modified test_feed.py  with a change type of MODIFY  and the complexity is 1
 modified domain_name.py  with a change type of MODIFY  and the complexity is 5
 modified url.py  with a change type of MODIFY  and the complexity is 11
 modified user.py  with a change type of MODIFY  and the complexity is 49
commit<pydriller.domain.commit.Commit object at 0x105910700>
- Author Timon Back
commit<pydriller.domain.commit.Commit object at 0x105b8ad30>
- Author Timon Back
commit<pydriller.domain.commit.Com

 modified feed.py  with a change type of MODIFY  and the complexity is 16
 modified user.py  with a change type of MODIFY  and the complexity is 50
commit<pydriller.domain.commit.Commit object at 0x105910700>
- Author Mircea Lungu
 modified retrieve_and_compute.py  with a change type of MODIFY  and the complexity is 3
commit<pydriller.domain.commit.Commit object at 0x105ae37f0>
- Author Mircea Lungu
 modified retrieve_and_compute.py  with a change type of MODIFY  and the complexity is 4
commit<pydriller.domain.commit.Commit object at 0x105910700>
- Author Peter Ullrich
 modified test_domain.py  with a change type of MODIFY  and the complexity is 9
 modified test_logging.py  with a change type of MODIFY  and the complexity is 1
 modified test_text_difficulty.py  with a change type of MODIFY  and the complexity is 2
commit<pydriller.domain.commit.Commit object at 0x105ae37f0>
- Author Peter Ullrich
commit<pydriller.domain.commit.Commit object at 0x105b8a430>
- Author Peter Ullrich
 modif

 modified language.py  with a change type of MODIFY  and the complexity is 10
commit<pydriller.domain.commit.Commit object at 0x104d708e0>
- Author Mircea Lungu
 modified url.py  with a change type of MODIFY  and the complexity is 17
commit<pydriller.domain.commit.Commit object at 0x105b7be80>
- Author Mircea Lungu
 modified .gitignore  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x104d708e0>
- Author Mircea Lungu
 modified bookmark.py  with a change type of MODIFY  and the complexity is 44
commit<pydriller.domain.commit.Commit object at 0x105b7bc40>
- Author Mircea Lungu
 modified user_word.py  with a change type of MODIFY  and the complexity is 16
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Mircea Lungu
 modified user_word.py  with a change type of MODIFY  and the complexity is 16
commit<pydriller.domain.commit.Commit object at 0x105b7bc40>
- Author Mircea Lungu
 modified text.py  with a change typ

 modified user.py  with a change type of MODIFY  and the complexity is 56
commit<pydriller.domain.commit.Commit object at 0x10595ba90>
- Author Mircea Lungu
 modified algo_service.py  with a change type of MODIFY  and the complexity is 15
commit<pydriller.domain.commit.Commit object at 0x105906be0>
- Author Mircea Lungu
 modified algo_service.py  with a change type of MODIFY  and the complexity is 15
commit<pydriller.domain.commit.Commit object at 0x10595ba90>
- Author Mircea Lungu
 modified words_to_study.py  with a change type of MODIFY  and the complexity is 3
 modified bookmark.py  with a change type of MODIFY  and the complexity is 55
commit<pydriller.domain.commit.Commit object at 0x105906be0>
- Author Mircea Lungu
 modified bookmark.py  with a change type of MODIFY  and the complexity is 55
commit<pydriller.domain.commit.Commit object at 0x105ae3ee0>
- Author Mircea Lungu
 modified bookmark.py  with a change type of MODIFY  and the complexity is 55
commit<pydriller.domain.commit

 modified learner_stats.py  with a change type of MODIFY  and the complexity is 11
 modified word_exercise_stats.py  with a change type of MODIFY  and the complexity is 26
 modified url.py  with a change type of MODIFY  and the complexity is 18
 modified user.py  with a change type of MODIFY  and the complexity is 59
 modified user_word.py  with a change type of MODIFY  and the complexity is 17
 modified populate.py  with a change type of DELETE  and the complexity is None
 modified default_words.py  with a change type of MODIFY  and the complexity is 3
 modified configuration.py  with a change type of MODIFY  and the complexity is 8
 modified __init__.py  with a change type of RENAME  and the complexity is None
 modified ab_testing.py  with a change type of ADD  and the complexity is 6
 modified algorithm_loader.py  with a change type of ADD  and the complexity is 8
 modified algorithm_service.py  with a change type of RENAME  and the complexity is 17
 modified algorithm_wrapper.py  w

 modified bookmark.py  with a change type of MODIFY  and the complexity is 63
commit<pydriller.domain.commit.Commit object at 0x105aedb50>
- Author Mircea Lungu
 modified bookmark_priority_updater.py  with a change type of MODIFY  and the complexity is 17
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Mircea Lungu
 modified bookmark.py  with a change type of MODIFY  and the complexity is 66
commit<pydriller.domain.commit.Commit object at 0x105aedb50>
- Author Mircea Lungu
 modified bookmark.py  with a change type of MODIFY  and the complexity is 66
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Mircea Lungu
 modified bookmark.py  with a change type of MODIFY  and the complexity is 66
commit<pydriller.domain.commit.Commit object at 0x105ae3940>
- Author Mircea Lungu
 modified .travis.yml  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Mircea Filip Lungu
 modified

 modified difficulty_estimator_factory.py  with a change type of MODIFY  and the complexity is 3
 modified difficulty_estimator_strategy.py  with a change type of MODIFY  and the complexity is 2
commit<pydriller.domain.commit.Commit object at 0x105ae3ee0>
- Author Lars Holdijk
 modified default_difficulty_estimator.py  with a change type of MODIFY  and the complexity is 1
 modified flesch_kincaid_reading_ease_difficulty_estimator.py  with a change type of MODIFY  and the complexity is 11
 modified frequency_difficulty_estimator.py  with a change type of MODIFY  and the complexity is 11
commit<pydriller.domain.commit.Commit object at 0x10593c6a0>
- Author Lars Holdijk
 modified test_flesch_kincaid_reading_ease_difficulty_estimator.py  with a change type of MODIFY  and the complexity is 15
 modified frequency_difficulty_estimator.py  with a change type of MODIFY  and the complexity is 11
commit<pydriller.domain.commit.Commit object at 0x105ae3ee0>
- Author Lars Holdijk
 modified test_dif

 modified test_flesch_kincaid_difficulty_estimator.py  with a change type of MODIFY  and the complexity is 21
 modified flesch_kincaid_difficulty_estimator.py  with a change type of MODIFY  and the complexity is 14
 modified frequency_difficulty_estimator.py  with a change type of MODIFY  and the complexity is 11
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Lars Holdijk
 modified test_flesch_kincaid_difficulty_estimator.py  with a change type of MODIFY  and the complexity is 23
 modified flesch_kincaid_difficulty_estimator.py  with a change type of MODIFY  and the complexity is 14
commit<pydriller.domain.commit.Commit object at 0x10593c220>
- Author Mircea Filip Lungu
commit<pydriller.domain.commit.Commit object at 0x10593c490>
- Author Mircea Lungu
 modified feed.py  with a change type of MODIFY  and the complexity is 23
 modified feed_registrations.py  with a change type of MODIFY  and the complexity is 8
commit<pydriller.domain.commit.Commit object at 0x104f

commit<pydriller.domain.commit.Commit object at 0x105aed400>
- Author Mircea Lungu
 modified test_feed.py  with a change type of MODIFY  and the complexity is 4
 modified test_retrieve_and_compute.py  with a change type of MODIFY  and the complexity is 2
 modified feed.py  with a change type of MODIFY  and the complexity is 28
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Mircea Lungu
 modified test_retrieve_and_compute.py  with a change type of MODIFY  and the complexity is 2
 modified article_downloader.py  with a change type of MODIFY  and the complexity is 10
 modified feed.py  with a change type of MODIFY  and the complexity is 28
commit<pydriller.domain.commit.Commit object at 0x10593cfa0>
- Author Mircea Lungu
 modified test_retrieve_and_compute.py  with a change type of MODIFY  and the complexity is 2
 modified article_downloader.py  with a change type of MODIFY  and the complexity is 10
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Auth

 modified feed.py  with a change type of MODIFY  and the complexity is 28
 modified url.py  with a change type of MODIFY  and the complexity is 18
 modified user_activitiy_data.py  with a change type of MODIFY  and the complexity is 8
commit<pydriller.domain.commit.Commit object at 0x105ae3ee0>
- Author Mircea Lungu
 modified test_retrieve_and_compute.py  with a change type of MODIFY  and the complexity is 3
 modified article.py  with a change type of MODIFY  and the complexity is 17
commit<pydriller.domain.commit.Commit object at 0x105ae3700>
- Author Mircea Lungu
 modified user_article.py  with a change type of MODIFY  and the complexity is 20
commit<pydriller.domain.commit.Commit object at 0x105ae3ee0>
- Author Mircea Lungu
 modified user_article.py  with a change type of MODIFY  and the complexity is 21
commit<pydriller.domain.commit.Commit object at 0x105ae3700>
- Author Mircea Lungu
 modified user_article.py  with a change type of MODIFY  and the complexity is 21
commit<pydriller

 modified .travis.yml  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x104e88190>
- Author Mircea Filip Lungu
commit<pydriller.domain.commit.Commit object at 0x105b8a100>
- Author Mircea Lungu
 modified ubuntu_install.sh  with a change type of ADD  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x104e88190>
- Author Mircea Lungu
commit<pydriller.domain.commit.Commit object at 0x105b8af40>
- Author Mircea Lungu
 modified ubuntu_install.sh  with a change type of MODIFY  and the complexity is None
 modified language.py  with a change type of MODIFY  and the complexity is 14
commit<pydriller.domain.commit.Commit object at 0x104e88190>
- Author Mircea Lungu
 modified ubuntu_install.sh  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x105b8a100>
- Author Mircea Lungu
 modified ubuntu_install.sh  with a change type of MODIFY  and the complexity is No

 modified test_retrieve_and_compute.py  with a change type of MODIFY  and the complexity is 6
 modified article_downloader.py  with a change type of MODIFY  and the complexity is 16
 modified article_quality_filter.py  with a change type of MODIFY  and the complexity is 8
commit<pydriller.domain.commit.Commit object at 0x10595b580>
- Author Mircea Lungu
 modified article_quality_filter.py  with a change type of MODIFY  and the complexity is 9
commit<pydriller.domain.commit.Commit object at 0x105b94490>
- Author Mircea Lungu
 modified article_quality_filter.py  with a change type of MODIFY  and the complexity is 9
commit<pydriller.domain.commit.Commit object at 0x105b94cd0>
- Author Mircea Lungu
 modified article_quality_filter.py  with a change type of MODIFY  and the complexity is 9
commit<pydriller.domain.commit.Commit object at 0x105b948b0>
- Author Mircea Lungu
 modified recent_activity.py  with a change type of ADD  and the complexity is 0
commit<pydriller.domain.commit.Commit obj

 modified test_exercise_session.py  with a change type of MODIFY  and the complexity is 14
 modified test_reading_session.py  with a change type of MODIFY  and the complexity is 20
 modified user_exercise_session.py  with a change type of MODIFY  and the complexity is 18
 modified user_reading_session.py  with a change type of MODIFY  and the complexity is 26
commit<pydriller.domain.commit.Commit object at 0x10593ce50>
- Author cpaz
 modified test_exercise_session.py  with a change type of MODIFY  and the complexity is 13
 modified test_reading_session.py  with a change type of MODIFY  and the complexity is 19
 modified user_exercise_session.py  with a change type of MODIFY  and the complexity is 16
 modified user_reading_session.py  with a change type of MODIFY  and the complexity is 26
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Mircea Lungu
 modified test_reading_session.py  with a change type of MODIFY  and the complexity is 19
 modified user_reading_sessi

 modified article.py  with a change type of ADD  and the complexity is 28
 modified article_word.py  with a change type of ADD  and the complexity is 9
 modified bookmark.py  with a change type of ADD  and the complexity is 69
 modified bookmark_priority_arts.py  with a change type of ADD  and the complexity is 4
 modified cohort.py  with a change type of ADD  and the complexity is 10
 modified domain_name.py  with a change type of ADD  and the complexity is 13
 modified exercise.py  with a change type of ADD  and the complexity is 3
 modified exercise_outcome.py  with a change type of ADD  and the complexity is 13
 modified exercise_source.py  with a change type of ADD  and the complexity is 7
 modified feed.py  with a change type of ADD  and the complexity is 30
 modified feed_registrations.py  with a change type of ADD  and the complexity is 8
 modified knowledge_estimator.py  with a change type of ADD  and the complexity is 21
 modified language.py  with a change type of ADD  and t

 modified mixed_recommender.py  with a change type of MODIFY  and the complexity is 29
 modified topic.py  with a change type of MODIFY  and the complexity is 9
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Mircea Filip Lungu
commit<pydriller.domain.commit.Commit object at 0x105afa280>
- Author Mircea Lungu
 modified remove_unreferenced_articles.py  with a change type of ADD  and the complexity is 0
 modified article.py  with a change type of MODIFY  and the complexity is 24
 modified user_activitiy_data.py  with a change type of MODIFY  and the complexity is 20
 modified user_article.py  with a change type of MODIFY  and the complexity is 25
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author cpaz
 modified test_exercise_session.py  with a change type of MODIFY  and the complexity is 14
commit<pydriller.domain.commit.Commit object at 0x105b94df0>
- Author joel
 modified __init__.py  with a change type of MODIFY  and the complexity is 0
 modifi

 modified map_article_words.py  with a change type of MODIFY  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x105afafd0>
- Author cpaz
 modified constants.py  with a change type of MODIFY  and the complexity is 0
 modified saturate_word_interaction_history.py  with a change type of MODIFY  and the complexity is 2
 modified word_interaction_history.py  with a change type of MODIFY  and the complexity is 32
commit<pydriller.domain.commit.Commit object at 0x105afabb0>
- Author feikoritsema
 modified mixed_recommender.py  with a change type of MODIFY  and the complexity is 36
commit<pydriller.domain.commit.Commit object at 0x105afafd0>
- Author feikoritsema
commit<pydriller.domain.commit.Commit object at 0x105aed640>
- Author Mircea Filip Lungu
commit<pydriller.domain.commit.Commit object at 0x105aed3a0>
- Author Mircea Lungu
 modified user.py  with a change type of MODIFY  and the complexity is 65
commit<pydriller.domain.commit.Commit object at 0x105afabb0>
- Aut

 modified __init__.py  with a change type of MODIFY  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x104e88190>
- Author Mircea Lungu
 modified translators.log  with a change type of DELETE  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x105aed400>
- Author Mircea Lungu
 modified saturate_word_interaction_history.py  with a change type of RENAME  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x104e88190>
- Author joel
 modified saturate_word_interaction_history.py  with a change type of MODIFY  and the complexity is 10
commit<pydriller.domain.commit.Commit object at 0x105aed400>
- Author joel
 modified word_interaction_history.py  with a change type of MODIFY  and the complexity is 42
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author joel
commit<pydriller.domain.commit.Commit object at 0x104e62f10>
- Author joel
 modified saturate_word_interaction_history.py  with a change type of M

 modified user_reading_session.py  with a change type of MODIFY  and the complexity is 29
commit<pydriller.domain.commit.Commit object at 0x104feb8e0>
- Author Mircea Filip Lungu
commit<pydriller.domain.commit.Commit object at 0x104e62820>
- Author feikoritsema
 modified mixed_recommender.py  with a change type of MODIFY  and the complexity is 59
commit<pydriller.domain.commit.Commit object at 0x104e62520>
- Author Mircea Filip Lungu
commit<pydriller.domain.commit.Commit object at 0x105b8ad60>
- Author Mircea Lungu
 modified articles_cache.py  with a change type of MODIFY  and the complexity is 16
commit<pydriller.domain.commit.Commit object at 0x104e62520>
- Author Mircea Lungu
 modified recompute_recommender_cache.py  with a change type of ADD  and the complexity is 7
commit<pydriller.domain.commit.Commit object at 0x104e88190>
- Author Mircea Lungu
 modified recompute_recommender_cache.py  with a change type of MODIFY  and the complexity is 7
commit<pydriller.domain.commit.Commit ob

 modified article.py  with a change type of MODIFY  and the complexity is 29
 modified user_activitiy_data.py  with a change type of MODIFY  and the complexity is 27
commit<pydriller.domain.commit.Commit object at 0x104e88100>
- Author Mircea Lungu
 modified fill_article_ids.py  with a change type of MODIFY  and the complexity is 4
commit<pydriller.domain.commit.Commit object at 0x104e88190>
- Author Mircea Lungu
 modified constants.py  with a change type of MODIFY  and the complexity is 0
 modified user_reading_session.py  with a change type of MODIFY  and the complexity is 33
commit<pydriller.domain.commit.Commit object at 0x104e88100>
- Author Mircea Lungu
 modified user_activitiy_data.py  with a change type of MODIFY  and the complexity is 27
commit<pydriller.domain.commit.Commit object at 0x105b8abe0>
- Author Mircea Lungu
 modified saturate_word_interaction_history.py  with a change type of MODIFY  and the complexity is 7
commit<pydriller.domain.commit.Commit object at 0x104e8810

 modified bookmark.py  with a change type of MODIFY  and the complexity is 80
commit<pydriller.domain.commit.Commit object at 0x105b8abb0>
- Author Mircea Filip Lungu
commit<pydriller.domain.commit.Commit object at 0x105aed850>
- Author Mircea Lungu
 modified user.py  with a change type of MODIFY  and the complexity is 78
commit<pydriller.domain.commit.Commit object at 0x104e88100>
- Author Mircea Lungu
 modified test_cohort.py  with a change type of MODIFY  and the complexity is 8
commit<pydriller.domain.commit.Commit object at 0x105afafd0>
- Author Mircea Lungu
 modified requirements.txt  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x105aed700>
- Author Mircea Lungu
 modified user.py  with a change type of MODIFY  and the complexity is 79
commit<pydriller.domain.commit.Commit object at 0x105b94b50>
- Author Mircea Lungu
 modified saturate_word_interaction_history.py  with a change type of MODIFY  and the complexity is 9
com

 modified __init__.py  with a change type of RENAME  and the complexity is None
 modified model_test_mixin.py  with a change type of RENAME  and the complexity is None
 modified __init__.py  with a change type of RENAME  and the complexity is None
 modified article_rule.py  with a change type of RENAME  and the complexity is 5
 modified base_rule.py  with a change type of RENAME  and the complexity is None
 modified bookmark_rule.py  with a change type of RENAME  and the complexity is 12
 modified cohort_rule.py  with a change type of RENAME  and the complexity is 3
 modified exercise_rule.py  with a change type of RENAME  and the complexity is 4
 modified language_rule.py  with a change type of RENAME  and the complexity is 15
 modified outcome_rule.py  with a change type of RENAME  and the complexity is 10
 modified rss_feed_rule.py  with a change type of RENAME  and the complexity is 4
 modified source_rule.py  with a change type of RENAME  and the complexity is 8
 modified text_rul

 modified encounter_stats.py  with a change type of RENAME  and the complexity is 8
 modified encounter_stats_update.py  with a change type of RENAME  and the complexity is 2
 modified exercise_stats.py  with a change type of RENAME  and the complexity is 5
 modified learner_stats.py  with a change type of RENAME  and the complexity is None
 modified localized_topic.py  with a change type of RENAME  and the complexity is 11
 modified ranked_word.py  with a change type of RENAME  and the complexity is 3
 modified search.py  with a change type of RENAME  and the complexity is 9
 modified search_filter.py  with a change type of RENAME  and the complexity is 8
 modified search_subscription.py  with a change type of RENAME  and the complexity is 8
 modified session.py  with a change type of RENAME  and the complexity is 15
 modified __init__.py  with a change type of RENAME  and the complexity is None
 modified watch_event_type.py  with a change type of RENAME  and the complexity is 3
 modi

 modified __init__.py  with a change type of MODIFY  and the complexity is 0
 modified user.py  with a change type of MODIFY  and the complexity is 80
 modified bookmark_rule.py  with a change type of MODIFY  and the complexity is 12
 modified exercise_rule.py  with a change type of MODIFY  and the complexity is 4
 modified language_rule.py  with a change type of MODIFY  and the complexity is 15
 modified text_rule.py  with a change type of MODIFY  and the complexity is 4
 modified url_rule.py  with a change type of MODIFY  and the complexity is 4
 modified user_rule.py  with a change type of MODIFY  and the complexity is 8
 modified user_word_rule.py  with a change type of MODIFY  and the complexity is 7
 modified watch_event_type_rule.py  with a change type of MODIFY  and the complexity is 4
 modified watch_interaction_event_rule.py  with a change type of MODIFY  and the complexity is 6
commit<pydriller.domain.commit.Commit object at 0x1058b9130>
- Author Mircea Lungu
 modified setup

 modified article_crawler.py  with a change type of ADD  and the complexity is 0
 modified feed_retrieval.py  with a change type of MODIFY  and the complexity is 2
 modified recompute_recommender_cache.py  with a change type of MODIFY  and the complexity is 7
commit<pydriller.domain.commit.Commit object at 0x104c154c0>
- Author Mircea Lungu
 modified article_crawler.py  with a change type of MODIFY  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x105b94520>
- Author Mircea Lungu
 modified __init__.py  with a change type of ADD  and the complexity is None
 modified article_crawler.py  with a change type of MODIFY  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x104c154c0>
- Author Mircea Lungu
 modified __init__.py  with a change type of DELETE  and the complexity is None
 modified article_crawler.py  with a change type of MODIFY  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x105b94520>
- Author Mircea Lungu

commit<pydriller.domain.commit.Commit object at 0x105b7b8b0>
- Author mads
 modified cohort_article_map.py  with a change type of MODIFY  and the complexity is 3
commit<pydriller.domain.commit.Commit object at 0x104dfd5e0>
- Author mads
commit<pydriller.domain.commit.Commit object at 0x10593c940>
- Author Mircea Filip Lungu
commit<pydriller.domain.commit.Commit object at 0x10593cc10>
- Author Mircea Lungu
 modified exercise.py  with a change type of MODIFY  and the complexity is 6
commit<pydriller.domain.commit.Commit object at 0x104dfd5e0>
- Author Mircea Lungu
 modified user.py  with a change type of MODIFY  and the complexity is 85
commit<pydriller.domain.commit.Commit object at 0x10593c730>
- Author Mircea Lungu
 modified user.py  with a change type of MODIFY  and the complexity is 85
commit<pydriller.domain.commit.Commit object at 0x104dfd5e0>
- Author Mircea Lungu
 modified user.py  with a change type of MODIFY  and the complexity is 86
commit<pydriller.domain.commit.Commit objec

 modified top_bookmarks_for_user.py  with a change type of ADD  and the complexity is 6
 modified __init__.py  with a change type of ADD  and the complexity is 0
 modified is_learned.py  with a change type of ADD  and the complexity is 3
 modified SortedExerciseLog.py  with a change type of ADD  and the complexity is 17
 modified bookmark.py  with a change type of MODIFY  and the complexity is 38
 modified exercise_outcome.py  with a change type of MODIFY  and the complexity is 13
 modified user.py  with a change type of MODIFY  and the complexity is 88
 modified bookmark_rule.py  with a change type of MODIFY  and the complexity is 12
 modified test_bookmark.py  with a change type of MODIFY  and the complexity is 31
commit<pydriller.domain.commit.Commit object at 0x104c154c0>
- Author Mircea Lungu
 modified negative_qualities.py  with a change type of MODIFY  and the complexity is 17
 modified bookmark.py  with a change type of MODIFY  and the complexity is 36
commit<pydriller.domain.c

 modified mysql_to_elastic.py  with a change type of MODIFY  and the complexity is 7
 modified mixed_recommender.py  with a change type of MODIFY  and the complexity is 57
 modified mysqlFullText.py  with a change type of MODIFY  and the complexity is 6
 modified article_downloader.py  with a change type of MODIFY  and the complexity is 39
commit<pydriller.domain.commit.Commit object at 0x105b7b190>
- Author sigc
 modified concurrent_test.bat  with a change type of ADD  and the complexity is None
 modified elasticsearch_query_comparison.py  with a change type of ADD  and the complexity is 20
 modified elasticsearch_statistical_analyser  with a change type of MODIFY  and the complexity is None
 modified mysqlFullText.py  with a change type of MODIFY  and the complexity is 6
commit<pydriller.domain.commit.Commit object at 0x104c154c0>
- Author Marcus
 modified mysql_to_elastic.py  with a change type of MODIFY  and the complexity is 7
 modified elastic_recommender.py  with a change type o

 modified cohort_article_map.py  with a change type of MODIFY  and the complexity is 3
commit<pydriller.domain.commit.Commit object at 0x105910700>
- Author sigc
 modified .gitignore  with a change type of MODIFY  and the complexity is None
 modified __init__.py  with a change type of RENAME  and the complexity is None
 modified cake_30_users_english.csv1  with a change type of ADD  and the complexity is None
 modified df_to_csv.py  with a change type of MODIFY  and the complexity is 0
 modified elasticsearch_query_comparison.py  with a change type of ADD  and the complexity is 20
 modified mysqlFullText.py  with a change type of ADD  and the complexity is 6
 modified relevance_test.py  with a change type of ADD  and the complexity is 20
 modified statistical_analyzer.py  with a change type of MODIFY  and the complexity is 0
 modified topics_only_30_users.csv1  with a change type of ADD  and the complexity is None
 modified user_30_test_sentence_english.csv1  with a change type of ADD 

 modified tag_topics_in_danish.py  with a change type of ADD  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x10593c6d0>
- Author Mircea Lungu
 modified tag_topics_in_danish.py  with a change type of MODIFY  and the complexity is 0
commit<pydriller.domain.commit.Commit object at 0x10593c520>
- Author Mircea Lungu
 modified elastic_first_recommender.py  with a change type of MODIFY  and the complexity is 5
commit<pydriller.domain.commit.Commit object at 0x105910700>
- Author Mircea Lungu
 modified __init__.py  with a change type of ADD  and the complexity is None
 modified user_account_creation.py  with a change type of RENAME  and the complexity is 14
commit<pydriller.domain.commit.Commit object at 0x105b94700>
- Author Mircea Lungu
 modified elastic_recommender.py  with a change type of MODIFY  and the complexity is 15
 modified mixed_recommender.py  with a change type of MODIFY  and the complexity is 39
 modified articles_cache.py  with a change type of MODI

 modified README.md  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x104dfd5e0>
- Author Mircea Lungu
 modified README.md  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x104c154c0>
- Author Mircea Lungu
commit<pydriller.domain.commit.Commit object at 0x10593c700>
- Author Mircea Lungu
 modified remove_unreferenced_articles.py  with a change type of MODIFY  and the complexity is 13
commit<pydriller.domain.commit.Commit object at 0x105910700>
- Author Mircea Lungu
 modified remove_unreferenced_articles.py  with a change type of MODIFY  and the complexity is 18
commit<pydriller.domain.commit.Commit object at 0x104dfd5e0>
- Author Mircea Lungu
 modified requirements.txt  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x105910700>
- Author Mircea Lungu
 modified anonimize_user.py  with a change type of MODIFY  and th

 modified user.py  with a change type of MODIFY  and the complexity is 90
commit<pydriller.domain.commit.Commit object at 0x104e62f70>
- Author Mircea Lungu
 modified user.py  with a change type of MODIFY  and the complexity is 90
commit<pydriller.domain.commit.Commit object at 0x104e625e0>
- Author Mircea Lungu
 modified user.py  with a change type of MODIFY  and the complexity is 90
commit<pydriller.domain.commit.Commit object at 0x104e62f70>
- Author Mircea Lungu
 modified user_account_creation.py  with a change type of MODIFY  and the complexity is 5
commit<pydriller.domain.commit.Commit object at 0x1059e6be0>
- Author Mircea Lungu
 modified test.yml  with a change type of ADD  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x104e62f70>
- Author Mircea Lungu
 modified test.yml  with a change type of MODIFY  and the complexity is None
commit<pydriller.domain.commit.Commit object at 0x104dfd5e0>
- Author Mircea Lungu
 modified test.yml  with a change type 

#### Let's Count the Modifications for Each File

In [6]:
from collections import defaultdict

commit_counts = defaultdict(int)

for commit in RepositoryMining(REPO_DIR).traverse_commits():
    for modification in commit.modifications:
        try:
            commit_counts [modification.new_path] += 1
        except: 
            pass

sorted(commit_counts.items(), key=lambda x: x[1], reverse=True)[:42]


[(None, 197),
 ('zeeguu/model/bookmark.py', 83),
 ('zeeguu/model/user.py', 69),
 ('zeeguu/model/article.py', 49),
 ('README.md', 48),
 ('zeeguu/content_recommender/mixed_recommender.py', 47),
 ('zeeguu/model/__init__.py', 40),
 ('zeeguu/model/feed.py', 36),
 ('zeeguu_core/content_retriever/article_downloader.py', 36),
 ('zeeguu/populate.py', 35),
 ('.travis.yml', 35),
 ('zeeguu_core/model/user.py', 33),
 ('setup.py', 31),
 ('zeeguu/content_retriever/article_downloader.py', 29),
 ('zeeguu/model/language.py', 26),
 ('zeeguu_core/content_recommender/mixed_recommender.py', 26),
 ('zeeguu/model/url.py', 25),
 ('zeeguu/model/user_activitiy_data.py', 25),
 ('tests_core_zeeguu/test_words_to_study.py', 24),
 ('requirements.txt', 23),
 ('tests_core_zeeguu/test_bookmark.py', 22),
 ('tests_core_zeeguu/test_retrieve_and_compute.py', 22),
 ('tools/map_article_words.py', 22),
 ('zeeguu/model/user_reading_session.py', 20),
 ('tests_core_zeeguu/model_test_mixin.py', 19),
 ('zeeguu/temporary/default_wor

#### Problem: many `__init__.py` files in our system but only one in the counts!

- what's the full file name? 

- looking at the documentation of PyDriller [1] we see that there's two:
  - old_path
  - new_path

- why? 
- which one should we be using? 

[1] https://pydriller.readthedocs.io/en/latest/commit.html


#### Lesson: to track full paths  we need to also track *individual file evolution*

In [None]:
from pydriller import ModificationType

commit_counts = {}

for commit in RepositoryMining(REPO_DIR).traverse_commits():
    for modification in commit.modifications:
        
        new_path = modification.new_path
        old_path = modification.old_path
        
        try:

            if modification.change_type == ModificationType.RENAME:
                commit_counts[new_path]=commit_counts.get(old_path,0)+1
                commit_counts.pop(old_path)

            elif modification.change_type == ModificationType.DELETE:
                commit_counts.pop(old_path, '')

            elif modification.change_type == ModificationType.ADD:
                commit_counts[new_path] = 1

            else: # modification to existing file
                    commit_counts [old_path] += 1
        except Exception as e: 
            print("something went wrong with: " + str(modification))
            pass
        
sorted(commit_counts.items(), key=lambda x:x[1], reverse=True)


#### Aggregating to module level



In [None]:
from code.basic_abstraction import (
    module_from_path, 
    top_level_module
)

module_activity = defaultdict(int)

for path, count in commit_counts.items():
    if ".py" in str(path):
        l2_module = top_level_module(module_from_path(path), 2)
        module_activity[l2_module] += count

sorted(module_activity.items(), key=lambda x: x[1], reverse=True)



In [None]:
most_active_modules = sorted(module_activity.items(), key=lambda x: x[1], reverse=True)

top_most_active_modules= [each[0] for each in most_active_modules][:5]
top_most_active_modules


#### Architectural View: Relationships Between Evolutionary Hotspots


In [None]:
# packages required for drawing
import sys
!{sys.executable} -m pip install networkx --upgrade
!{sys.executable} -m pip install matplotlib

In [None]:
def system_module(m):
    return m in top_most_active_modules

def module_size(m):
    return 30*module_activity[m]

In [None]:
from code.basic_abstraction import (
    dependencies_graph, 
    draw_graph_with_weights,
    top_level_module,
    abstracted_to_top_level)

directed = dependencies_graph(REPO_DIR)
directedAbstracted = abstracted_to_top_level(directed, system_module)

draw_graph_with_weights(directedAbstracted, module_size, (18,8))

### Stepping Back

We used Git but similar for any VCS 

Alternative tools for VCS Analysis: 

- git log + Unix Command Line tools (See tutorials by Spinellis, Helge in ASE, or Tornhill)
  
- your IDE (e.g. integrated git blame, visual diff, etc.)

- Any others...?

Definition of most active can be tuned based on needs
- could be log-weighted towards recency (discard past changes more)
- could be used to replay the history of the system by looking at non-overlapping time windows


### Limitations

- ignores developer styles
  - the guy with micro-commits vs. the girl who like to commit infrequently but large chunks of code
  
- might detect files that `README.md`, or `LICENSE.md` changes the most
  - can be combined with static complexity metrics [1]






## 2. Dependency Extraction: Logical Coupling

** Logical coupling** detects when **two sub-systems** change together **frequently**
- The more they change together, the more likely they are dependent
- Can capture dependencies that are not detectable by static/dynamic analysis
  - e.g. ? 


Introduced in the context of an industrial case study [1]

[1] Detection of Logical Coupling Based on Product Release History, Gall et al., ’98

### Logical Coupling: The Details...


- What are sub-systems (files? folders? packages?)
- What does it mean change together (same commit? sliding time window?)
- The threshold for "frequently" (e.g. *75% of the commits min 10*, etc.)



### Advantages of Logical Coupling

Language Independent

Complements some Static Analysis disadvantages: 
- can not capture all the situations (i.e. writing to a file, reading from a file)
- does not work with documents that are not source code (e.g. XML files)


## Evolution Analysis Beyond Architecture Recovery

- Improved developer tools
  - e.g. recording and replaying software evolution (e.g. "Replay" for Eclipse)
    - fine-grained (method-level) evolution monitoring (Robbes et al.)


- *Program comprehension* when first encountering a new system



# References

- Laws of Software Evolution Revisited. M. Lehman.
- Detection Logical Coupling Based on Product Release History. Gall et al.
- Software Aging. David Lorge Parnas

# Further Reading
- *Source Code as a Crime Scene*. A. Tornhill




