
Out of cluster costs - Azure #414

Closed
ghost opened this issue Apr 23, 2020 · 47 comments
Labels: good first issue (Good for newcomers)

@ghost

ghost commented Apr 23, 2020

We are using Kubecost with an Azure Kubernetes cluster and we would like to track the costs of Azure-specific out-of-cluster resources (e.g. databases) with Kubecost as well.
I checked the documentation, but currently there is no mention of out-of-cluster cost allocation/tracking for Azure.

Is this not yet implemented/supported, or is the documentation just missing?

@dwbrown2
Contributor

Hi @Zenmodai! This is currently on our roadmap. We're actively discussing prioritization on this feature now. Reach out to us at team@kubecost.com if you want to discuss timeline/priority.

@ghost
Author

ghost commented Jun 19, 2020

Hi @dwbrown2, maybe you can give an update here as well. When is this feature currently planned on the roadmap? I am sure others who use Azure would be interested in this as well.

@dwbrown2
Contributor

@Zenmodai this effort has been scoped and is now under development. We will share an update when it's completed, but please feel free to reach out to our team if this is a high priority for you.

@AjayTripathy
Contributor

I've filed opencost/opencost#450 to track this in the other repo.

@dwbrown2 added the good first issue (Good for newcomers) label on Jun 19, 2020
@pierluigilenoci

pierluigilenoci commented Jan 22, 2021

Any news about this? @dwbrown2 @AjayTripathy

@dwbrown2
Contributor

Current expectation is that this work will be started in our next sprint!

@sylus

sylus commented Feb 1, 2021

Happy to be an evaluator for this if you need anyone to take a look :D

@dwbrown2
Contributor

dwbrown2 commented Feb 1, 2021

Thanks, William! We currently have @Sean-Holcomb looking at this. We are working to find the right integration points with the Azure team. Let us know if anyone has suggested contacts!

@dwbrown2
Contributor

This feature just launched in Beta in our v1.74 release! Initial documentation is here. Reach out to us at team@kubecost.com if you want to learn more or if we can help in any way.

@pierluigilenoci

@dwbrown2 I configured the integration and Kubecost is able to gather costs from the cluster's subscription.

During parsing, however, I noticed a very high number (many thousands) of error log messages like these:

E0211 15:48:20.848889       1 log.go:17] [Error] Could not parse item tags invalid character ':' after top-level value
E0211 15:48:20.848907       1 log.go:17] [Error] Could not parse item tags invalid character ':' after top-level value
E0211 15:48:20.848925       1 log.go:17] [Error] Could not parse item tags invalid character ':' after top-level value
[... the same line repeated many thousands of times ...]

@ghost
Author

ghost commented Feb 11, 2021

@pierluigilenoci What did you do for the integration? I set up the export of the cost report, created the secret, and added the secret name to the deployment, but Kubecost is not showing the out-of-cluster Azure costs in the dashboard. There are also no log messages regarding the reading of the reports, or any errors indicating that something is wrong. Did you do something else?

@pierluigilenoci

I created the secret, added the following option in the helm chart, and created the export following this guide and this guide.

azureStorageSecretName: "azure-storage-config"

@ghost
Author

ghost commented Feb 11, 2021

Hmm... strange, I followed the same guides, but nothing seems to happen regarding this new feature.

@evertonmc

Hmm... strange, I followed the same guides, but nothing seems to happen regarding this new feature.

same here

@Sean-Holcomb
Contributor

@evertonmc @Zenmodai Were you able to verify the creation of the first export file in your Azure Storage account? Additionally, do you still see the banner at the top of your Assets page about OOC not being set up?

@Sean-Holcomb
Contributor

@pierluigilenoci from what I can tell, you have some nested JSON in your tags column. Can you confirm, and if possible give me an example?

@Sean-Holcomb
Contributor

@pierluigilenoci it doesn't actually seem to be nested JSON, but malformed JSON. I was only able to replicate the exact error you are getting with a trailing ':' following the closing bracket on one of my rows' tags. If this is not the case for you, an example of a failing row would be instructive.

@andreb89

andreb89 commented Feb 12, 2021

@Sean-Holcomb
We are one step further after recreating the Azure cost export, but we are now getting parse errors in the cost-model container log as well:

E0212 07:21:11.907060 1 log.go:17] [Error] failed to parse usage date: <InvoiceSectionName or DepartmentName>
E0212 08:53:57.798240 1 azureprovider.go:969] failed to parse usage date: <InvoiceSectionName or DepartmentName>

InvoiceSectionName/DepartmentName seems to be our company name for the invoice. I am not sure why it is trying to read the InvoiceSectionName/DepartmentName instead of the UsageDateTime column here.

So it seems that the first column is always used. Even after manually moving the UsageDateTime column to the first position, another parse error is thrown:

I0212 09:57:43.906438 1 azureprovider.go:979] failed to parse cost: '2021-02-11'

Is there something wrong in how the columns are parsed, or is there still something wrong with our reports? Maybe you can provide an example az costmanagement export create command with the parameters needed to create the necessary report.
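(For reference, a sketch of what such a command could look like, using flags from the az costmanagement CLI extension; the placeholders and exact flag syntax are assumptions and should be verified against az costmanagement export create --help:)

az costmanagement export create \
  --name kubecost-export \
  --type Usage \
  --scope "subscriptions/<subscription-id>" \
  --storage-account-id "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>" \
  --storage-container kubecost-exports \
  --storage-directory reports \
  --timeframe MonthToDate \
  --recurrence Daily \
  --recurrence-period from="2021-02-01T00:00:00Z" to="2021-12-31T00:00:00Z" \
  --schedule-status Active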

@pierluigilenoci

@Sean-Holcomb I assume you're talking about the export CSV file.
Our tags are like this, no nested JSON. I checked all rows.

"""akelius_environment"": ""devops"",""akelius_project"": ""k8s"",""akelius_team"": ""ops"",""aksEngineVersion"": ""v0.47.0-aks-gomod-at-aks"",""creationSource"": ""aks-aks-system201116-28105498-vmss"",""orchestrator"": ""Kubernetes:1.18.8"",""poolName"": ""system201116"",""resourceNameSuffix"": ""28105498"""

I am not able to distinguish which lines are generating the problem because the logs give no indication about it.

The only JSON inside the report is like this:

"{  ""ResourceType"": ""Bandwidth"",  ""UsageResourceKind"": ""DataTrOut:Linux_IaaS_Canonical|Standard_D2s_v3|b71f870a-6d7f-4fe9-b2aa-92bce2ad8a9e|4fd886c4-ae05-4d22-8b3a-3bd7085ab73e:Linux:Running"",  ""DataCenter"": ""AMZ07"",  ""NetworkBucket"": ""External"",  ""PipelineType"": ""v1""}"

What else can I do to help you debug the problem?

@Sean-Holcomb
Contributor

@pierluigilenoci Given your error message we can rule out the nested JSON issue; I am specifically talking about the JSON in the "Tags" column of your CSV. The JSON you provided is well formed, so it shouldn't be causing an issue. Specifically, I am looking for rows that do not have a "MeterCategory" of "Virtual Machines" or "Storage". To replicate the error that you showed, I had to add JSON to the "Tags" column which looked like this:
"{ ""ResourceType"": ""Bandwidth"", ""UsageResourceKind"": ""DataTrOut:Linux_IaaS_Canonical|Standard_D2s_v3|b71f870a-6d7f-4fe9-b2aa-92bce2ad8a9e|4fd886c4-ae05-4d22-8b3a-3bd7085ab73e:Linux:Running"", ""DataCenter"": ""AMZ07"", ""NetworkBucket"": ""External"", ""PipelineType"": ""v1""}:"
Note the trailing colon. This is not necessarily the only way to trigger that exact error message, but it is a possibility. Either way, I am hoping to figure out why Azure would be putting malformed JSON into the CSV or why our JSON parser is failing.

A final note: this error is non-fatal to the row being looked at; it just prevents the cost from receiving labels generated from the tags you have set. That being said, the costs of these rows still show up in your report.
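For anyone following along, here is a minimal Go sketch (not Kubecost's actual parser) showing that encoding/json produces exactly this error for trailing data after a valid value:

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	var tags map[string]string

	// Well-formed tags JSON parses cleanly.
	good := `{"akelius_team": "ops"}`
	fmt.Println(json.Unmarshal([]byte(good), &tags)) // <nil>

	// A stray ':' after the closing brace is trailing data, which
	// json.Unmarshal rejects with the error seen in the logs.
	bad := `{"akelius_team": "ops"}:`
	fmt.Println(json.Unmarshal([]byte(bad), &tags)) // invalid character ':' after top-level value
}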

@Sean-Holcomb
Copy link
Contributor

@andreb89 The CSV parser should be agnostic to column ordering. It creates a map of the headers to column numbers, so when it looks for "UsageDateTime" it is looking for the column with that string in the header. That being said, does your CSV have any additional rows at the top of the file? Additionally, the algorithm grabs the most recent CSV in each month folder. It sounds like you tried uploading a modified file to the folder, is that correct?
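Roughly, that header mapping works like this (a simplified sketch of the approach, not the exact cost-model code):

package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

func main() {
	r := csv.NewReader(strings.NewReader(
		"SubscriptionGuid,UsageDateTime,PreTaxCost\nabc-123,2021-02-11,0.42\n"))

	// Map each header name to its column index so row ordering doesn't matter.
	headers, _ := r.Read()
	headerMap := make(map[string]int, len(headers))
	for i, h := range headers {
		headerMap[h] = i
	}

	// Fields are then looked up by name; this breaks when the header is
	// localized, e.g. "UsageDateTime (UsageDateTime)" instead of "UsageDateTime".
	record, _ := r.Read()
	fmt.Println(record[headerMap["UsageDateTime"]]) // 2021-02-11
}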

@andreb89

@Sean-Holcomb The top of our exported CSV looks as follows:
Name der Abteilung (DepartmentName),Name des Kontos (AccountName),ID des Kontobesitzers (AccountOwnerId),Abonnement-GUID (SubscriptionGuid),Name des Abonnements (SubscriptionName),Ressourcengruppe (ResourceGroup),Ressourcenstandort (ResourceLocation),UsageDateTime (UsageDateTime),ProductName (ProductName),Kategorie der Verbrauchseinheit (MeterCategory),MeterSubcategory (MeterSubcategory),ID der Verbrauchseinheit (MeterId),Name der Verbrauchseinheit (MeterName),Region der Verbrauchseinheit (MeterRegion),Maßeinheit (UnitOfMeasure),UsageQuantity (UsageQuantity),Ressourcensatz (ResourceRate),PreTaxCost (PreTaxCost),Kostenstelle (CostCenter),Genutzter Dienst (ConsumedService),ResourceType (ResourceType),InstanceId (InstanceId),Tags (Tags),OfferId (OfferId),Zusätzliche Informationen (AdditionalInfo),Dienstinformation 1 (ServiceInfo1),Dienstinformation 2 (ServiceInfo2)

So we got the parse error with the initial export; next I tried to manually adjust the CSV, but it didn't really help, since I then got another parse error. Something in general seems not to work here. The unedited CSV export does not work and results in a parse error:

E0212 07:21:11.907060 1 log.go:17] [Error] failed to parse usage date: <DepartmentName>

@andreb89

andreb89 commented Feb 15, 2021

@Sean-Holcomb In addition to the bug @pierluigilenoci already mentioned before, I found another problem.

@Sean-Holcomb there is a trailing comma left here: https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/templates/azure-storage-config-secret.yaml#L16 (so the generated JSON is not well-formed).

When you specify a secret for the Azure storage name (azureStorageSecretName) but also use serviceKeySecretName or createServiceKeySecret, then in cost-analyzer-deployment-template.yaml (lines 259 and 263) the /var/secrets path is mounted twice, which results in the following deployment error:

##[error]Error: Deployment.apps "kubecost-cost-analyzer" is invalid: spec.template.spec.containers[0].volumeMounts[2].mountPath: Invalid value: "/var/secrets": must be unique

@Sean-Holcomb
Contributor

Sean-Holcomb commented Feb 15, 2021

@andreb89 Okay, that makes a lot of sense. Unfortunately, it looks like neither the naming nor the column ordering is going to be consistent between locales and Azure account types, so that does pose a challenge. As for the error message, it seems the parser is simply unable to find the UsageDateTime column, which is strange because you listed that column as having the same name. The error is being generated by this:

usageDateTime, err := time.Parse(AzureLayout, record[headerMap["UsageDateTime"]])
if err != nil {
	log.Errorf("failed to parse usage date: '%s'", record[headerMap["UsageDateTime"]])
	continue
}

Where AzureLayout is "2006-01-02", record is a []string for the CSV row, and headerMap is a map[string]int which matches column name to column number. The specific column headers that I am using right now are "MeterCategory", "UsageDateTime", "InstanceId", "AdditionalInfo", "Tags", "PreTaxCost", "SubscriptionGuid", "ConsumedService" and "ResourceGroup". Please see if using these exact headers helps, and I will think about a long-term solution for this issue.

"So initially we already got the parse error, so next I tried to manually adjust the CSV, but it didn't really help, since then I got another parse error."
What was the other parse error?

As for your second comment: that is meant as an alternate method of configuration and is still a WIP; thank you for pointing it out, though. For now, just stick with the configuration outlined here.

Please let me know if you are seeing any other error messages or have any other insights you think I should know based off of what I have told you here.

@Sean-Holcomb
Contributor

@andreb89 "Name der Abteilung (DepartmentName)", did you add the text in parentheses or is that how it shows up in the header?

@andreb89

andreb89 commented Feb 15, 2021

@Sean-Holcomb This is how it shows up. The export job was created with the Azure CLI. If I create the export via the portal, then I do not have the German column headers, but then I cannot create exports which include UsageDateTime; the different types of reports I can create from the portal do not have this column.
So I can only use the exports from the job created by the Azure CLI, and those have the combined German and English header columns.

@Sean-Holcomb
Contributor

@andreb89 Ok, that is great news. I will start working on a fix for you.

@pierluigilenoci

@Sean-Holcomb I will share the complete CSV via Slack.

@andreb89

@Sean-Holcomb I tried what you suggested: I manually removed all header columns which you currently aren't using and edited the ones you are using to fit your header titles, e.g. "Abonnement-GUID (SubscriptionGuid)" => "SubscriptionGuid".

Afterwards there were no further parse errors for the header titles themselves; rather, I am now getting the same error as mentioned before:
E0216 07:43:53.838875 1 log.go:17] [Error] Could not parse item tags invalid character ':' after top-level value

An example of a tag value is: "created-by": "azure"
Maybe the quotation marks are the problem in the parsing of the tags?

But now we know that the locale of the headers and the column ordering are currently the problem for our default generated reports. After manually changing the report headers, at least I can now see the data in the Kubecost Assets dashboard.

@Sean-Holcomb
Contributor

@andreb89 I am currently looking at the CSV that @pierluigilenoci sent on Slack, and I see that none of the strings in his tags column have {} around the outside, which seems to be the thing causing the JSON parser issues. Is that how your tags column looks also?

@andreb89

@Sean-Holcomb Yes, our tags column looks the same. All entries look like this: "environment": "staging","project": "pau", without any {} around the outside.
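For context: since these tag strings are a valid JSON object body minus the enclosing braces, a minimal fix sketch (my assumption about the shape of the fix, not necessarily the change that was merged) is to wrap the value before unmarshalling:

package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseTags tolerates Azure's brace-less tag format,
// e.g. `"environment": "staging","project": "pau"`.
func parseTags(raw string) (map[string]string, error) {
	tags := map[string]string{}
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return tags, nil
	}
	// Add the braces Azure leaves off so encoding/json sees an object.
	if !strings.HasPrefix(raw, "{") {
		raw = "{" + raw + "}"
	}
	err := json.Unmarshal([]byte(raw), &tags)
	return tags, err
}

func main() {
	tags, err := parseTags(`"environment": "staging","project": "pau"`)
	fmt.Println(tags, err) // map[environment:staging project:pau] <nil>
}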

@Sean-Holcomb
Contributor

Fixes have been merged and will be in the next release. Please let me know if you have any additional feedback.

@pierluigilenoci

@Sean-Holcomb I found a new bug. #783
Could you please fix this before the new release?

@pierluigilenoci

@Sean-Holcomb now the CSV appears to be digested correctly. I'll send you a copy of the logs via Slack.

@andreb89

andreb89 commented Feb 25, 2021

@Sean-Holcomb I found a new bug. Currently, if you create the azure-storage-config secret with the Helm chart, the secret is only added to the volumes (see lines 107-111 in cost-analyzer-deployment-template.yaml), but it is not mounted into the actual container. The {{- else if .Values.kubecostProductConfigs.azureStorageCreateSecret }} branch is missing from the volumeMounts section (line 260).
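Roughly, the missing branch would mirror the existing one in the volumeMounts section, something like this (a sketch only; the mount path here is a placeholder, not necessarily the chart's actual value):

{{- if .Values.kubecostProductConfigs.azureStorageSecretName }}
- name: azure-storage-config
  mountPath: /var/azure-storage-config
{{- else if .Values.kubecostProductConfigs.azureStorageCreateSecret }}
- name: azure-storage-config
  mountPath: /var/azure-storage-config
{{- end }}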

On another note, maybe you could also add some more log messages for the Azure OOC feature that show up in the cost-model container, e.g. when the storage is searched for CSVs, when CSV reports are parsed (ideally with the filename), etc.
Also, at which point is the storage searched for new reports, and how often are the reports parsed: every hour, or some other interval?

Another point that I noticed in the Assets dashboard: in our exported report we have costs for every day of the month/week. Only on the weekends is the cluster destroyed, and it is recreated on Monday morning. But as you can see in the graph below, something doesn't add up. For today, the 25th, nothing at all is shown. For yesterday, the 24th, just the Kubernetes part is shown, but nothing else. Also, the weekend was the 20th and 21st, but in the graph there is no data for the 19th (Friday) and 20th (Saturday). Even if the cluster is destroyed for the weekend, there are still other out-of-cluster costs, e.g. databases. Can they still be shown for those times, even if there was no cluster, as long as those costs are still in the CSV report? And is there a mix-up of the days? The data for yesterday, the 24th, is not completely shown, and for today, the 25th, nothing at all is shown, even though I made a manual export a couple of hours ago to already include part of today's costs.
Screenshot from 2021-02-25 16-18-47

@Sean-Holcomb
Contributor

@andreb89 I have merged your suggested fix on the storage config so the next version should support that method of configuration.

I will look into adding more logging for the CSV processing, as you have suggested. The ETL runs every 3 hours; if you want to trigger a rebuild, you can use the endpoint /model/etl/assets/rebuild?window=all&commit=true.
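For example, assuming a local port-forward to the cost-analyzer (the deployment name, namespace, and port 9090 are assumptions; adjust to your install):

kubectl port-forward deployment/kubecost-cost-analyzer 9090 -n kubecost
curl "http://localhost:9090/model/etl/assets/rebuild?window=all&commit=true"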

In terms of showing OOC data on days which do not have in-cluster data: that is currently unavailable, but it is a known limitation.

@andreb89

andreb89 commented Feb 26, 2021

@Sean-Holcomb Thanks for the endpoint, it is a great help during testing.
But the issue still remains that the data/graph shown in the Assets dashboard is unclear. As I mentioned in the post above, you can see it here:
Screenshot from 2021-02-26 15-02-05
This is from this morning, after I manually exported a newly created cost report, so the data for yesterday, the 25th, should be complete. But the dashboard shows only part of the costs for yesterday, the 25th: specifically only Kubernetes, no out-of-cluster costs. That's why I asked whether there might be something wrong here regarding the dates. I checked the reports: there are no cost entries for the 26th (today) yet, which makes sense, I guess. So maybe what is shown in the graph as costs for the 25th should be the 26th (today), and the costs shown for the 24th should be for the 25th, etc.? Since for today only Kubernetes costs are available, but no other out-of-cluster costs yet.

I hope you can clear this up. Maybe there is actually still a bug here, since something doesn't add up.

@AjayTripathy
Contributor

Hi @andreb89 -- I believe we set the date ranges on our end to skip data less than 2 days old because it may be incomplete. Regarding which day is shown when: all data should be in UTC.

@andreb89

andreb89 commented Mar 2, 2021

@AjayTripathy But that is strange behaviour. Looking at the costs, I wouldn't expect the last 2 days to be missing. I am not sure why the data from one day ago should not be complete, but even if it isn't, it should still be shown. Otherwise some of the time range options don't really make sense.

@dwbrown2
Contributor

dwbrown2 commented Mar 4, 2021

@andreb89 I believe Sean and Ajay did this because Azure can provide partial data during this 48-hour time window.

@Sean-Holcomb I think it could be ok to display partial assets data during this window. We however would not want to do reconciliation on partial data because this is likely to skew metrics heavily.

Thoughts?

@Sean-Holcomb
Contributor

@dwbrown2 That is exactly right, and that is how it is currently functioning: OOC shows everything available in the most recent report, and reconciliation excludes the partial data because of the skew it causes.

@andreb89 I think that you are experiencing 2 issues. One is a time shift on the in-cluster costs due to the data being displayed in UTC; you can see this in your weekend being offset in your earlier post. The other is, as you suggested, that the Azure data might not be syncing properly with the k8s-generated data. Some more information would be helpful for solving this. For starters, what timezone are you in? If you are willing to provide one of the export CSVs and the time it was generated, along with the matching Assets page, that would be helpful too; feel free to reach out to me on Slack.

@andreb89

@Sean-Holcomb My timezone is CET (Central European Time). I also currently use the parameter kubecostModel.utcOffset to set "+01:00" for the timezone. I hope this helps. Regarding the cost export, I don't think I can share it at the moment.

@andreb89

@dwbrown2 Could you please elaborate on why it takes 48 hours to display the complete Azure out-of-cluster costs? I haven't found any information about this.

@Sean-Holcomb I just updated to the new version 1.76.0 and, at least for me, the problem regarding the time shift in the displayed cluster costs remains. I am not sure what the problem is. You mentioned that the data is displayed in UTC; why should this be a problem and result in a time shift of the costs? I am using the parameter to offset UTC, and the reports only specify days. Is there something else that I have to configure regarding timezones?

For further information, I added a more specific example. The following image should display the asset costs from 8th March to 12th March (today), but it shows the 7th to the 11th. At the far right there is an empty column, which I assume should be for the 12th, but it is completely empty. So is something missing in the configuration, or is this behavior a bug?
Screenshot from 2021-03-12 10-23-38

@Sean-Holcomb
Contributor

@andreb89 In response to your first question: the 48-hour window only applies to the adjustment column on in-cluster costs. Cost data takes a day to be exported, and there is an additional day during which the costs for the most recent day are incomplete, so we wait a full 48 hours before trying to use them to adjust Kubecost's in-cluster estimates. The reasoning is that if the cost data for the day is incomplete, it will over-adjust prices for that day downward. For OOC costs there is no such window; the most recent costs in the exported cost CSV are displayed whenever Kubecost pulls in that data.

Given your timezone, the offset you are seeing is definitely a bug. I have created an issue for you here: #816. If you feel I have missed something, please add it in.

@dwbrown2
Contributor

Now that we've confirmed the core functionality works, @Sean-Holcomb shall we close this in favor of #816?

@Sean-Holcomb
Contributor

Sean-Holcomb commented Mar 13, 2021

@dwbrown2 sounds good.

@ all please tag me in any additional Azure issues you create, and hopefully I can be helpful.
