








# Multi-Dimensional Analysis of Software Power Consumptions in Multi-Core Architectures

Maxime Colmant — PhD Defense — 24<sup>th</sup> November, 2016

#### Jury:


| Title | First name  | Last name  | Affiliation             | Role       |
|-------|-------------|------------|-------------------------|------------|
| Mr.   | Rüdiger     | Kapitza    | University Braunschweig | Reporter   |
| Mr.   | Giuseppe    | Lipari     | University Lille 1      | Examiner   |
| Mrs.  | Anne-Cécile | Orgerie    | CNRS                    | Examiner   |
| Mr.   | Romain      | Rouvoy     | University Lille 1      | Supervisor |
| Mr.   | Lionel      | Seinturier | University Lille 1      | Supervisor |
| Mr.   | Alain       | Anglade    | ADEME                   | Guest      |


#### TABLE OF CONTENTS

- 1. Introduction
- 2. Contributions
- 3. Conclusion & Perspectives

# Introduction

### THE GLOBAL ICT<sup>1</sup> FOOTPRINT<sup>2</sup>




<sup>1</sup>Information and Communications Technology

<sup>2</sup>The Climate Group. SMART 2020: Enabling the low carbon economy in the information age. 2008.

#### MULTI-CORE CPU ARCHITECTURES ARE EVERYWHERE!




#### **CASE STUDY**




#### **RESEARCH QUESTIONS**

**RQ1:** Can we model the software power consumption regardless of the underlying architecture?






**RQ2:** Can we propose a uniform view of the service power consumption?




**RQ3:** Can we analyze the power consumption of the artifacts which compose a software?




# CONTRIBUTIONS

**RQ1:** Can we model the software power consumption regardless of the underlying architecture?










Learning CPU Power Models


- Math. function (metrics) ⇒ Power
  - Mostly linear
    - Univariate: $P = ax + b$
    - Multivariate: $P = ax + by + c$
  - Or polynomial
    - $P = ax^2 + bx + c$
- CPU metrics
  - From HW sensors (motherboard, power meters)
  - From Hardware Performance Counters (HPCs)
- [Nou14]<sup>3</sup>: $P_{cpu}^{app} = 0.7 \cdot TDP \cdot CPU_{stats}$

<sup>3</sup>A. Noureddine. "Towards a Better Understanding of the Energy Consumption of Software Systems". PhD thesis. Université des Sciences et Technologie de Lille - Lille I, 2014.
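As a minimal sketch of the model families listed above, the following Python functions evaluate each form. The function names are hypothetical, the coefficients are placeholders that a regression would have to learn, and reading `CPU_stats` as a CPU usage share between 0 and 1 is an assumption that the slide does not spell out.

```python
from typing import Sequence


def linear_power(x: float, a: float, b: float) -> float:
    """Univariate linear model: P = a*x + b."""
    return a * x + b


def multivariate_power(metrics: Sequence[float],
                       coefficients: Sequence[float],
                       intercept: float) -> float:
    """Multivariate linear model: P = a*x + b*y + ... + c."""
    return sum(c * m for c, m in zip(coefficients, metrics)) + intercept


def polynomial_power(x: float, a: float, b: float, c: float) -> float:
    """Second-degree polynomial model: P = a*x^2 + b*x + c."""
    return a * x * x + b * x + c


def tdp_based_power(tdp_watts: float, cpu_share: float) -> float:
    """[Nou14]-style estimate: P = 0.7 * TDP * CPU_stats.

    cpu_share is ASSUMED to be the application's CPU usage in [0, 1].
    """
    return 0.7 * tdp_watts * cpu_share
```

The linear and polynomial forms need one coefficient set per machine, which is why the rest of the talk focuses on learning them automatically.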

| Ref.     | Processor(s)                         | Feature(s)                            | Regression(s)                   | Benchmarks                                                     |
|----------|--------------------------------------|---------------------------------------|---------------------------------|----------------------------------------------------------------|
| [Ber+10] | Core 2 Duo                           | 14 HPCs regrouped<br>by component     | multiple linear<br>by component | sampl.: μ-benchs<br>eval.: SPEC CPU 06                         |
| [Col+15] | Xeon W3520 & i3 2120                 | non-halted cycles<br>reference cycles | polynomial                      | sampl.: stress<br>eval.: PARSEC, SPECjbb                       |
| [CM05]   | XScale PXA255                        | 5 HPCs                                | multiple linear                 | eval.: SPEC CPU 00,<br>Java CDC/CLDC                           |
| [Dol+15] | Xeon E3-1275                         | 3 HPCs<br>HW sensors                  | linear                          | sampl.: linpack, stream, iperf, IOR<br>eval.: Quantum Espresso |
| [ERK06]  | Turion, Itanium 2                    | HW sensors                            | multiple linear                 | sampl.: Gamut<br>eval.: SPECs, Matrix, Stream                  |
| [IM03]   | Pentium 4                            | 15 HPCs                               | multiple linear                 | eval.: μ-benchs, AbiWord,<br>Mozilla, Gnumeric                 |
| [RRK08]  | Core 2 Duo & Xeon, Itanium 2, Turion | HW sensors<br>HPCs                    | multiple linear                 | sampl.: calibration suite<br>eval.: SPECs, stream, Nsort       |
| [Yan+14] | Xeon E5620 & E7530                   | 7 components<br>91 preselected        | support vector                  | sampl.: NPB, IOzone, CacheBench<br>eval.: SPEC CPU 06, IOzone  |
| [Zha+14] | Sandy Bridge                         | non-halted cycles                     | linear                          | eval.: Google, SPEC CPU 06                                     |
| ???      | ARM                                  | ???                                   | ???                             | ???                                                            |

- Only for Intel or AMD architectures
- HW sensors: coarse-grained CPU metrics
- HPCs: fine-grained CPU metrics
- Power models are mostly linear
- Non-free or private workloads


- 1. Portability
- 2. Accuracy
- 3. Reproducibility

Towards an automatic approach for learning CPU power models

#### OUR APPROACH: OPEN-TESTBED TO AUTOMATICALLY LEARN POWER MODELS



- Input workload injection
  - Configurable
  - PARSEC (open-source, multi-threaded)<sup>4</sup>
  - Run several applications (x264, vips, etc.)

<sup>4</sup>C. Bienia et al. "PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors". In: Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation. 2009.




- Acquisition of raw input metrics
  - Automatically explore the large number of available HPCs (Xeon W3520: 514 HPCs)
  - Take care of HPC multiplexing<sup>5</sup>

<sup>5</sup>Intel. Intel 64 and IA-32 Architectures Software Developer's Manual. 2015.
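The slide only states that multiplexing has to be handled. Two common ways of doing so are sketched below as an illustration, without any claim that the testbed uses them: sampling events in batches that fit the available hardware counters, and extrapolating a multiplexed count from the time the event was enabled versus the time it was actually counted (the scaling usually applied to Linux `perf_event` readings). Both function names and the counter budget are hypothetical.

```python
from typing import List, Sequence


def batch_events(events: Sequence[str], counters_available: int) -> List[List[str]]:
    """Split the event list into batches that fit the hardware counters,
    so that each batch can be sampled without multiplexing."""
    if counters_available <= 0:
        raise ValueError("counters_available must be positive")
    return [list(events[i:i + counters_available])
            for i in range(0, len(events), counters_available)]


def scale_multiplexed_count(raw_count: int,
                            time_enabled_ns: int,
                            time_running_ns: int) -> float:
    """Extrapolate a count for an event that was only scheduled on a
    counter for part of the time it was enabled."""
    if time_running_ns == 0:
        return 0.0  # the event never ran, nothing can be extrapolated
    return raw_count * time_enabled_ns / time_running_ns
```

Batching avoids the estimation error that extrapolation introduces, at the cost of replaying the workload once per batch.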




- Selection of relevant HPCs
  - Pearson coefficient (HPC ⇔ Power)
  - 1<sup>st</sup> phase: quickly filter out uncorrelated HPCs (< 0.5) (Xeon W3520: 253 left out)
  - 2<sup>nd</sup> phase: full sampling of the remaining HPCs
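The first filtering phase can be sketched as follows, assuming each HPC has been sampled alongside the measured power. The 0.5 threshold comes from the slide; the function names, the data layout, and the choice to compare the signed coefficient (rather than its absolute value) are assumptions.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def select_correlated_hpcs(hpc_samples: Dict[str, Sequence[float]],
                           power_samples: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Keep only the HPCs whose coefficient with power reaches the threshold."""
    selected = {}
    for name, samples in hpc_samples.items():
        r = pearson(samples, power_samples)
        if r >= threshold:
            selected[name] = r
    return selected
```

Only the counters that survive this cheap pass need the longer, full sampling of the second phase.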


Pearson coefficients of the Top-30 correlated events for the PARSEC benchmarks on a Xeon W3520.








- Power model inference
  - Minimize the number of HPCs
  - Robust ridge regression (SotA?)
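To illustrate the inference step, here is a dependency-free sketch of plain ridge regression solved through the normal equations $(X^T X + \alpha I)\,w = X^T y$. It is not the robust variant named on the slide, it fits no intercept (idle power is assumed to be subtracted beforehand), and the function name and `alpha` default are hypothetical.

```python
from typing import List, Sequence


def ridge_fit(rows: Sequence[Sequence[float]],
              targets: Sequence[float],
              alpha: float = 1.0) -> List[float]:
    """Return the weights w minimizing ||Xw - y||^2 + alpha * ||w||^2."""
    k = len(rows[0])
    # Build the regularized normal equations A w = b.
    a = [[sum(r[i] * r[j] for r in rows) + (alpha if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for j in range(col, k):
                a[row][j] -= factor * a[col][j]
            b[row] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        partial = sum(a[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (b[i] - partial) / a[i][i]
    return w
```

Each row holds the selected HPC values for one sample and each target the matching active power. In practice raw counter values are very large, so rescaling them before fitting is advisable.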

Average error per combination of HPCs for freqmine, fluidanimate, facesim on a Xeon W3520.

$$P_{idle} = 92 \text{ W}; \ P_{CPU} = \frac{1.40 \cdot \text{HPC (l1i:reads)}}{10^8} + \frac{7.29 \cdot \text{HPC (lsd:inactive)}}{10^9}$$
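The learned model above can be evaluated directly. The coefficients and the idle power are taken from the formula; the function names, the idea that the counter values are accumulated over one sampling period, and the summation of idle and CPU power into a host-level figure are assumptions.

```python
P_IDLE_W = 92.0  # idle power reported for the Xeon W3520


def cpu_power_w(l1i_reads: float, lsd_inactive: float) -> float:
    """Active CPU power from the two selected HPCs (counts per sample)."""
    return 1.40 * l1i_reads / 1e8 + 7.29 * lsd_inactive / 1e9


def host_power_w(l1i_reads: float, lsd_inactive: float) -> float:
    """ASSUMPTION: host power is idle power plus the active CPU share."""
    return P_IDLE_W + cpu_power_w(l1i_reads, lsd_inactive)
```

With this reading, a sample with 10^8 `l1i:reads` events and 10^9 `lsd:inactive` events contributes 1.40 W and 7.29 W respectively on top of the idle power.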






Relative errors for the PARSEC suite on the Cortex A15.
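The slides do not define how the relative error is computed. A common definition, assumed here, compares each estimate with the power meter reading and averages the absolute relative deviations; the function names are hypothetical.

```python
from typing import Sequence


def relative_error(estimated_w: float, measured_w: float) -> float:
    """Absolute relative deviation of an estimate from the measured power."""
    return abs(estimated_w - measured_w) / measured_w


def average_relative_error(estimated: Sequence[float],
                           measured: Sequence[float]) -> float:
    """Mean of the per-sample relative errors."""
    errors = [relative_error(e, m) for e, m in zip(estimated, measured)]
    return sum(errors) / len(errors)
```

Multiplying the result by 100 gives a percentage comparable to the figures quoted in the talk.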



- Portability
  - Beyond SotA: 4 CPUs (2× Intel, 1 AMD, 1 ARM)
- Accuracy
  - Avg. error on the 4 CPUs: 1.5%
- Reproducibility
  - Built on open-source workloads
- Extensibility
  - Can we extend our learning approach to SSD power models?

#### MOTIVATION

Comparison of power consumptions between CPU and SSD by varying the throughput with the **fio** tool.

(a) SSD read operations.



(b) SSD write operations.









Power consumption of the host for 5 workloads on a Xeon E5-2630.



**RQ2:** Can we propose a uniform view of the service power consumption?






## Challenges

- 1. Native
- 2. Virtualized
- 3. Distributed


- Code freely available on GitHub: http://powerapi.org
  - Scala / Akka
  - LoC: 8.7k
  - Docker
  - AGPLv3
- 2<sup>nd</sup> major iteration<sup>6</sup>
  - Full support of multi-core CPU architectures (HT, DVFS, TB)
  - Learning techniques
  - Better support of Akka

<sup>6</sup>A. Noureddine. "Towards a Better Understanding of the Energy Consumption of Software Systems". PhD thesis. Université des Sciences et Technologie de Lille - Lille I, 2014.









#### Software-Defined (SD) Power Meter for Monitoring Concurrent Apps



- On the Intel Xeon W3520
- Monitoring freq.: 4 Hz
- Avg. error: 2%
- Low overhead: 2 W


#### **BITWATTS ARCHITECTURE**



#### **EVALUATION**

Scaling PARSEC on multiple VMs on a Xeon W3520.



- Errors: from 1% (fluidanimate) up to 10% (swaptions)
- Beyond SotA [Ber+12]<sup>7</sup>: VM as a White-Box (+ multi-tenant)

<sup>7</sup>R. Bertran et al. "Energy Accounting for Shared Virtualized Environments Under DVFS Using PMC-based Power Models". In: Future Generation Computer Systems (2012).


### A SERVICE-LEVEL POWER MONITORING






**RQ3:** Can we analyze the power consumption of the artifacts which compose a software?




#### OVERVIEW OF THE CODENERGY APPROACH






### CODVIZU: SUNBURST (1)



### CODVIZU: STREAMGRAPH (2)





Sample values from the figure: (a) readQueryFromClient: 12.40 W (CPU), 4.66 W (DISK); (b) je\_huge\_ralloc: 12.34 W (CPU), 4.56 W (DISK). Time axis: 16:12:18 to 16:12:28.

### redis 2 (1) vs redis 3 (2).










# CONCLUSION & PERSPECTIVES

Multi-dimensional analysis of software power consumptions on multi-core architectures

- RQ1: Can we model the software power consumption regardless of the underlying architecture?
  - Open-testbed approach for learning multi-core power models
- RQ2: Can we propose a uniform view of the service power consumption?
  - In-width energy monitoring with PowerAPI, BitWatts & WattsKit
- RQ3: Can we analyze the power consumption of the artifacts which compose a software?
  - In-depth energy monitoring with codEnergy

### **SHORT-TERM PERSPECTIVES**

- Define a new scheduler for saving energy in cloud data centers
- Continuous optimization of the power models in a cluster
- Turning off nodes of a cluster during inactivity periods
- Leveraging source-code energy monitoring
- Extend codEnergy to other programming languages

### **LONG-TERM PERSPECTIVES**

- The rise of GPU cards
- Proposing a wider energy cartography of a system
- Using genetic programming to improve the energy-efficiency at source-code level
- Defining solutions to automatically optimize the software energy-efficiency

### **PUBLICATIONS**

## Thanks for your attention.

### Conferences

- [Col+15] M. Colmant et al. "Process-level Power Estimation in VM-based Systems". In: Proceedings of the 10th European Conference on Computer Systems (EuroSys). 2015.
- [CRS14] M. Colmant, R. Rouvoy, and L. Seinturier. "Improving the Energy Efficiency of Software Systems for Multi-Core Architectures". In: Middleware 2014 Doctoral Symposium. 2014.
- [CRS15] M. Colmant, R. Rouvoy, and L. Seinturier. "Estimation de la consommation des systèmes logiciels sur des architectures multi-coeurs". In: Conférence d'informatique en Parallélisme, Architecture et Système (Compas). 2015.
- [Hav+ar] A. Havet et al. "GENPACK: A Generational Scheduler for Cloud Data Centers". In: IEEE International Conference on Cloud Engineering (IC2E). 2017. (To appear).

#### **Under Evaluation**

[Col+16] M. Colmant et al. "The Next 700 CPU Power Models". In: ACM Trans. Model. Perform. Eval. Comput. Syst. (ACM TOMPECS) (2016).

### REFERENCES

- [Ber+10] R. Bertran et al. "Decomposable and Responsive Power Models for Multicore Processors Using Performance Counters". In: Proceedings of the 24th ACM International Conference on Supercomputing. 2010.
- [Ber+12] R. Bertran et al. "Energy Accounting for Shared Virtualized Environments Under DVFS Using PMC-based Power Models". In: Future Generation Computer Systems (2012).
- [BL09] C. Bienia and K. Li. "PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors". In: Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation. 2009.
- [CM05] G. Contreras and M. Martonosi. "Power Prediction for Intel XScale® Processors Using Performance Monitoring Unit Events". In: Proceedings of the International Symposium on Low Power Electronics and Design. 2005.
- [Col+15] M. Colmant et al. "Process-level Power Estimation in VM-based Systems". In: Proceedings of the 10th European Conference on Computer Systems (EuroSys). 2015.
- [Col+16] M. Colmant et al. "The Next 700 CPU Power Models". In: ACM Trans. Model. Perform. Eval. Comput. Syst. (ACM TOMPECS) (2016).
- [Col+17] M. Colmant et al. "WattsKit: Software-Defined Power Monitoring of Distributed Systems". In: To be chosen. 2017.
- [CRS14] M. Colmant, R. Rouvoy, and L. Seinturier. "Improving the Energy Efficiency of Software Systems for Multi-Core Architectures". In: Middleware 2014 Doctoral Symposium. 2014.
- [CRS15] M. Colmant, R. Rouvoy, and L. Seinturier. "Estimation de la consommation des systèmes logiciels sur des architectures multi-coeurs". In: Conférence d'informatique en Parallélisme, Architecture et Système (Compas). 2015.
- [CRS17] M. Colmant, R. Rouvoy, and L. Seinturier. "codEnergy: an Approach For Leveraging Source-Code Level Energy Analysis". In: To be chosen. 2017.
- [Dol+15] M. F. Dolz et al. "An analytical methodology to derive power models based on hardware and software metrics". In: Computer Science - Research and Development (2015).
- [ERK06] D. Economou, S. Rivoire, and C. Kozyrakis. "Full-system Power Analysis and Modeling for Server Environments". In: Workshop on Modeling Benchmarking and Simulation. 2006.
- [Hav+ar] A. Havet et al. "GENPACK: A Generational Scheduler for Cloud Data Centers". In: IEEE International Conference on Cloud Engineering (IC2E). 2017. (To appear).
- [IM03] C. Isci and M. Martonosi. "Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data". In: Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture. 2003.
- [Int15] Intel. Intel 64 and IA-32 Architectures Software Developer's Manual. 2015. URL: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf (visited on 08/01/2016).
- [Kur+14] M. Kurpicz et al. How energy-efficient is your cloud app? Conférence d'informatique en Parallélisme, Architecture et Système (Compas) Poster session. 2014.
- [Nou14] A. Noureddine. "Towards a Better Understanding of the Energy Consumption of Software Systems". PhD thesis. Université des Sciences et Technologie de Lille - Lille I, 2014.
- [RRK08] S. Rivoire, P. Ranganathan, and C. Kozyrakis. "A Comparison of High-level Full-system Power Models". In: Proceedings of the Conference on Power Aware Computing and Systems. 2008.
- [The08] The Climate Group. SMART 2020: Enabling the low carbon economy in the information age. 2008. URL: http://gesi.org/article/43 (visited on 09/23/2016).
- [Yan+14] H. Yang et al. "iMeter: An integrated VM power model based on performance profiling". In: Future Generation Computer Systems (2014).
- [Zha+14] Y. Zhai et al. "HaPPy: Hyperthread-aware Power Profiling Dynamically". In: Proceedings of the USENIX Annual Technical Conference. 2014.