generated from Green-Software-Foundation/if-plugin-template
-
Notifications
You must be signed in to change notification settings - Fork 1
/
common-crawl.yml
76 lines (75 loc) · 2.58 KB
/
common-crawl.yml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
name: Common Crawl s3 estimation
description: |
An attempt to replicate the carbon results for the Common Crawl dataset on s3.
As described in article here: https://www.linkedin.com/pulse/environmental-impact-cloud-common-crawl-case-study-julien-nioche-at8xf/?trackingId=XSEL7VNgQlywEWznDFxHZw%3D%3D
The dataset is 7.9 PB in size, or 7,900,000 GB for Impact Framework input unit.
The data was stored from April 2021 to December 2023, 33 months or 86,832,000 seconds for Impact Framework input unit.
The carbon impact of this public dataset alone is yet to be reported.
tags: null
initialize:
plugins:
cloud-storage-metadata:
method: CloudStorageMetadata
path: 'if-carbon-hack-plugin'
storage-energy:
method: StorageEnergy
path: 'if-carbon-hack-plugin'
carbon-intensity:
method: Multiply
path: '@grnsft/if-plugins'
global-config:
input-parameters:
- storage/energy
- grid/carbon-intensity
output-parameter: operational-carbon
drive-size-to-resources:
method: Coefficient
path: '@grnsft/if-plugins'
global-config:
input-parameter: storage/drive-size
coefficient: 1
output-parameter: resources-total
data-to-resources:
method: Multiply
path: '@grnsft/if-plugins'
global-config:
input-parameters:
- storage/data-stored
- storage/replication-factor
output-parameter: resources-reserved
embodied-carbon:
method: SciM
path: '@grnsft/if-plugins'
ramboll-resilio:
method: RambollResilio
path: 'if-carbon-hack-plugin'
tree:
children:
ramboll-method:
pipeline:
- ramboll-resilio
- carbon-intensity
inputs:
- timestamp: '2021-04-01 00:00:00'
duration: 86832000
storage/data-stored: 7900000
grid/carbon-intensity: 396
component-method:
pipeline:
- cloud-storage-metadata
- storage-energy
- carbon-intensity
- drive-size-to-resources
- data-to-resources
- embodied-carbon
inputs:
- timestamp: '2021-04-01 00:00:00'
duration: 86832000
cloud/vendor: aws
cloud/service: s3
storage/drive-size: 10000
storage/drive-power: 6.5
storage/data-stored: 7900000
grid/carbon-intensity: 396
device/emissions-embodied: 200000 # gCO2e for a HDD - estimated from https://www.networkworld.com/article/957264/are-hdds-greener-than-ssds.html
device/expected-lifespan: 157680000 # 5 years in seconds