Adding Scaleway VM types #7
Hi, thank you for the feedback! Don't worry about the wall of text, I'm happy to see people interested in the project. I'll try to answer your questions here, but for some of them it might be faster for both of us to connect somehow and discuss it live.

Some introductory remarks

The ambition of Cloud Assess is to propose an executable version of an official/standard methodology, namely ADEME's PCR. It is based on a DSL, namely "LCA as Code". You can find the repo here; there are tutorials to learn about the language and a CLI if you want to interact more easily with the models. Right now, Cloud Assess covers only one functional unit (VM) among eleven, and the current model is the result of joint work with a local cloud partner. We started with a very simple configuration: a single zone, all the physical servers dedicated to running VMs, no reconditioning, only electricity impact for usage, etc. My answers below refer to the current state (Jan. 2024) of the Cloud Assess models, but be aware that some design decisions may be revised soon. Indeed, we are currently involved in a project to instantiate the whole PCR with more actors of the cloud industry, and our models will certainly evolve (with breaking changes). We should have a better idea of where it will land by the end of March.

Adding data

You're right about filling the CSV files with your data. We don't have the license to distribute emission factors. For physical equipment, Resilio has launched its new service, Resilio DB. For electricity, one usually uses the data from Ecoinvent, but you need the appropriate license for that.
Virtual machines are not directly mapped to physical servers in the PCR. Instead, the servers are aggregated into a pool, which acts as a "unique big abstract server" that provides RAM and storage. Regarding the vCPUs: from our discussions with our partner, there seems to be no general consensus on what "1 vCPU" means quantitatively, so we omitted it for now. However, this is an ongoing topic of discussion, which we hope to settle soon.
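Schematically, the pooling looks like this (just an illustration of the idea, not the actual model code; the numbers are made up):

```python
# Aggregate identical physical servers into one "big abstract server" (pool).
servers = [{"ram_gib": 256, "storage_gib": 512}] * 16  # 16 physical servers

pool = {
    "ram_gib": sum(s["ram_gib"] for s in servers),          # 4096 GiB
    "storage_gib": sum(s["storage_gib"] for s in servers),  # 8192 GiB
}
```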
Those specs might be necessary to compute the embodied impacts of the physical machines, but they are not strictly needed by Cloud Assess. From Cloud Assess's point of view, it doesn't really matter how the embodied impacts are computed. We include the RAM and storage capacity because they are used to allocate the impacts to each client.
On this one I'm not sure I understand the question, or what you want to do. Let's discuss that point live.

Querying Cloud Assess
As explained above, the approach in the PCR is to aggregate physical servers into pools, and each VM is then mapped to a pool. That being said, because we have only considered a single pool so far, there is currently no way to specify a mapping of VM to "pool of servers" in the REST API. Clearly, we will need to add that in order to deal with multiple pools.
Noted, thanks. We do plan to include more relevant units in the API; I'm noting your request.

General questions
From a development perspective, it is not really that difficult to take the load (in terms of CPU) into account. For now, the impact of a VM is allocated with respect to its RAM and storage usage. We could easily add the vCPU. However, as mentioned above, the issue with the vCPU is more about a consensus on a common definition. We will see what comes out of our discussions with the various players.
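To make the current mechanism concrete, here is a rough sketch (not the actual Cloud Assess code, and the equal weighting of the two shares is only illustrative):

```python
# A VM's share of the pool impact, proportional to its RAM and storage usage.
pool = {"ram_gib": 4096, "storage_gib": 8192}  # the aggregated pool from above
vm = {"ram_gib": 16, "storage_gib": 32}

shares = [vm[k] / pool[k] for k in vm]  # [0.00390625, 0.00390625]
vm_share = sum(shares) / len(shares)    # 0.00390625
# Adding vCPU would just mean one more entry in both dicts.
```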
Indeed, the API for now only handles a granularity of 1 hour. It is relatively easy to include something like "MB_second" in the MemoryTimeUnitsDto (cf. openapi.yaml).
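For instance, the change could look roughly like this (the existing enum value is assumed for illustration; only the new entry is the point):

```yaml
# openapi.yaml (sketch): adding a finer-grained unit alongside the existing one.
components:
  schemas:
    MemoryTimeUnitsDto:
      type: string
      enum:
        - GB_hour    # existing 1-hour granularity (assumed name)
        - MB_second  # proposed finer granularity
```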
Hi @pevab, thanks so much for the detail, much appreciated 🙏. Lots of very interesting topics!
Ok this is what I thought from looking at the code and PCR spec, but unfortunately the idea of all hardware in a data centre being in a single aggregated pool won't work for us. However, if we introduce the concept of multiple pools, to which we can map each type of VM (and functional unit in general), that would work. This is something that needs to be discussed at the PCR working group level.
That's a good point. It depends on the VM type; sometimes it's one vCPU per CPU thread, sometimes more than one. See the next point for why I think it's still useful.
This is where I think both CPU and GPU information would be important. We may have VM types that have the same RAM and storage capacity, but different allocations of CPU and/or GPU. We would then need to allocate the impacts for each client proportional to their usage of all the resources, and not just memory and storage. As you say, this is a consensus issue rather than a technical blocker, as the arithmetic will always be relatively simple. There needs to be a shared definition of how to allocate the impact of a server based on all resource types (CPU, GPU, RAM, SSDs, HDDs etc.), and whether that resource is just reserved, or actually used (and then under what load, as mentioned previously).
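To make that concrete, the kind of allocation I have in mind is something like the following (the resource names and the equal weighting are placeholders, exactly the things that need the shared definition):

```python
# Sketch: allocate a server's impact over all resource types a VM consumes.
def vm_share(vm: dict, server: dict) -> float:
    """Average of the VM's share of each resource it uses."""
    shares = [vm[r] / server[r] for r in vm]
    return sum(shares) / len(shares)

server = {"vcpu": 64, "gpu": 8, "ram_gib": 256, "storage_gib": 512}
vm = {"vcpu": 4, "gpu": 1, "ram_gib": 16, "storage_gib": 32}
print(vm_share(vm, server))  # 0.078125
```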
I think I understand this now given the context. The issue I see here is duplication: all the columns describing the hardware itself will be repeated for every region. For example, if I want to add the same server type in 10 regions, I end up with 10 rows in which only the `geo` and `n_items` values differ; the values for the embodied impacts and lifespan would be identical in every row. Perhaps splitting the data into an inventory of server types and a separate per-region table of counts would avoid this.
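With made-up column names and values (not the actual `inventory.csv` schema), the duplication looks like this:

```csv
name,geo,n_items,lifespan_years,embodied_kgCO2e
alpha-base,france-1,1000,5,1200
alpha-base,netherlands-1,500,5,1200
```

Everything except `geo` and `n_items` is repeated on every line.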
This was more a question about how (and whether) we could include open-source data from many providers in this repo. For example, if the VM types and impact factors for 10 different cloud providers were all put into the same CSV files, they could become hard to manage.

Thanks again for the quick, detailed response. As I say, there are some issues here that need to be worked out at the PCR/ADEME level, so I'll contact you through those channels instead 😄
Hi @pevab, thanks for all your work on the project and for open-sourcing it, it looks great.
I'm doing a PoC of adding all the Scaleway VM types to the VM impact calculation, but running into a few snags.
First I'll describe how I'm framing the problem, and then I'll ask some specific questions on the implementation.
Framing the problem
All VMs are part of a family, and each family contains a number of different VMs, all with different resources. All the VMs in a given family run on a fixed type of base server.
For example, if we have an `alpha` family, we might have several VM types:

- `alpha-small` (8GiB RAM, 2vCPU, 16GiB storage)
- `alpha-medium` (16GiB RAM, 4vCPU, 32GiB storage)
- `alpha-large` (32GiB RAM, 8vCPU, 64GiB storage)

All of these will run on a single type of base server, let's call it `alpha-base`. An `alpha-base` has 256GiB RAM, a 32-core processor (64 threads), and 512GiB of storage.

The resources available on the base server determine the maximum number of each type of VM that server can run. Assuming we allocate 1 vCPU per CPU thread and don't over-allocate memory, an `alpha-base` can run 16 `alpha-medium` VMs (256GiB of memory on the `alpha-base` divided by 16GiB of memory for the `alpha-medium`, and the same ratio for CPU and storage); there's a quick sketch of this calculation below.

In each region, we run the same types of base servers and offer the same VM families. For example, we might have 1000 `alpha-base` servers running in `france-1` and 500 `alpha-base` servers running in `netherlands-1`. An `alpha-medium` instance in `france-1` is exactly the same as an `alpha-medium` instance running in `netherlands-1`, just with a different energy mix, in a different data center.
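As a quick illustration of the capacity arithmetic (using the example numbers above, nothing Cloud Assess specific):

```python
# Max VMs per base server: the binding constraint across RAM, CPU, and storage.
server = {"ram_gib": 256, "threads": 64, "storage_gib": 512}  # alpha-base
vm = {"ram_gib": 16, "vcpu": 4, "storage_gib": 32}            # alpha-medium

max_vms = min(
    server["ram_gib"] // vm["ram_gib"],          # 256 / 16 = 16
    server["threads"] // vm["vcpu"],             # 64 / 4   = 16
    server["storage_gib"] // vm["storage_gib"],  # 512 / 32 = 16
)
print(max_vms)  # 16
```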
Adding background data on electricity and base servers

To add the Scaleway VM types to `cloud-assess`, I would expect to do the following:

- Add rows to `trusted_library/background/electricity.csv` with a `geo` label specific to each data center/region, i.e. a row for `france-1` and a row for `netherlands-1` in our example.
- Add our base servers to `trusted_library/background/inventory.csv`, including their embodied impacts and lifespan, i.e. a row for `alpha-base` in our example.

This seems to be the case, I just have a few questions:
1. What is the need for the `n_items` value in `trusted_library/background/inventory.csv`? Is this the total number of that server type we run? Why is this necessary? Whether we run 10 servers or 1000 servers, the impact of using any one of those servers for a given workload is the same, isn't it?
2. Why do we need to specify `geo` in `trusted_library/background/inventory.csv`? As described above, the hardware is the same regardless of the region. If we run the same server type in 10 regions, will I have to duplicate the information over 10 lines in this file, with a different `geo` value each time?
3. Why do you not currently include CPU and GPU specs for the servers listed in `inventory.csv`?
4. If we were to add data for many cloud providers, how can we split things up? Could we have a different subdirectory per provider, e.g. `trusted_library/scaleway`, or perhaps add a `provider` column to each CSV?

Querying `cloud-assess`
With all the data for the electricity mix and base servers in place, I assume we would then submit a query for each VM type. For example, for the `alpha-medium` VM with 4vCPU, 16GiB RAM, and 32GiB storage, we would submit a query to get the impact of 1 hour's usage.
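What I have in mind is something like this (a sketch only: I'm guessing at the field names rather than following `openapi.yaml` exactly, and `meta.server` is my own invention; see question 1 below):

```json
{
  "virtual_machines": [
    {
      "id": "my-alpha-medium-instance",
      "ram": { "amount": 16, "unit": "GB_hour" },
      "storage": { "amount": 32, "unit": "GB_hour" },
      "vcpu": 4,
      "meta": { "server": "alpha-base" }
    }
  ]
}
```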
This is almost what I see in the sample, but not quite:
1. How can I specify the base server type for each VM, i.e. how do I specify that an `alpha-medium` runs on an `alpha-base` server? I have put this in `meta.server` in the example, but I don't see it in the samples. Alternatively, this mapping could be expressed in the LCA-as-code, but I can't work out how to do that either.
2. Currently, memory and storage units are in GiB. Would you consider reworking them to be MiB instead (and CPU to be mvCPU, if added in future)? This would avoid using floating-point numbers for smaller quantities, e.g. for VMs or serverless functions with memory/storage less than 1GiB.
General questions
Finally, I have a couple of general questions:
How can we take load into account? A VM running at 80% CPU usage will have a different impact to one running at 10% CPU usage, but I don't see this accounted for anywhere in the workload specification. Obviously, introducing this would require expressing/estimating the consumption curve of each component depending on load, so I understand that it's a big feature and not for now, just interested for the future.
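For the future, the kind of thing I have in mind is a first-order linear power model (purely illustrative on my side, not something from the PCR):

```python
# Power draw scales linearly with CPU load between idle and max draw.
# The wattages are made-up values for a hypothetical server.
def power_watts(load: float, p_idle: float = 100.0, p_max: float = 300.0) -> float:
    """load is CPU utilisation in [0, 1]."""
    return p_idle + (p_max - p_idle) * load

print(power_watts(0.10))  # 120.0
print(power_watts(0.80))  # 260.0
```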
Could the time granularity be made finer than 1 hour, e.g. a minute or a second? I know an hour is the smallest unit quoted in the PCR spec, but our usage data will go down to the second, especially for things like function-as-a-service.
Thanks again for all your work on this, and sorry for what I now realize is a wall of text 🙈. I would be very happy to write up the output of this conversation into a "Getting started for cloud providers" doc.
Thanks,
Simon