# Embed simple calculations with SQL

## Scenario

The goal of this activity is to use simple SQL calculations to find the total number of bags of avocados sold each day at each location.

## Dataset

I will be using the publicly available `Avocado Prices` dataset from [Kaggle](https://www.kaggle.com/datasets/neuromusic/avocado-prices) for this analysis. The dataset can also be viewed in [Google Sheets](https://drive.google.com/file/d/1x3M5kozxauTHfafBra3imlGJUTcOGVk7/view?usp=drive_link) or the [.csv file](/activities/sql/c05m04-simple-calculations/c05m04-avocado-data.csv).

## Preparation

Several headings in the .csv file contain spaces, and the first column has no heading at all. This is not in line with SQL best practices and causes problems when writing queries against the dataset. I update the headings in the .csv file as follows:

| Old Heading | Updated Heading |
| --- | --- |
| *(none)* | ID |
| Date | Date |
| AveragePrice | AveragePrice |
| Total Volume | Total_Volume |
| 4046 | 4046 |
| 4225 | 4225 |
| 4770 | 4770 |
| Total Bags | Total_Bags |
| Small Bags | Small_Bags |
| Large Bags | Large_Bags |
| XLarge Bags | XLarge_Bags |
| type | type |
| year | year |
| region | region |

## Importing the data in BigQuery

I follow these steps to import the avocado data into BigQuery:

- **Create dataset** with **Dataset ID** `avocado_data`
- In the **Dataset info** window, select the **CREATE TABLE** button
- In the **Source** section, select the **Upload** option in **Create table from**
- Browse to the `c05m04-avocado-data.csv` file and open it
- Set **File format** to **CSV**
- In the **Destination** section, name the table as `avocado_prices`
- In the **Schema** section, select **Auto detect**
- Finally, select **Create table**

A new table, `avocado_prices`, is created and appears in the Explorer pane under the `avocado_data` dataset. A preview of the BigQuery table is shown below:

![BigQuery table for avocado prices](c05m04-avocado-data-bigquery.png 'BigQuery table for avocado prices')
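
Because the schema was auto-detected, it can be worth confirming the column names and data types that BigQuery assigned. The query below is a minimal sketch that is not part of the original steps; it assumes the same project ID that appears in the queries in this activity and reads the dataset's `INFORMATION_SCHEMA.COLUMNS` view:

```sql
-- List each column of avocado_prices with the data type assigned on import
SELECT
	column_name,
	data_type
FROM
	`plucky-aegis-427011-v5.avocado_data.INFORMATION_SCHEMA.COLUMNS`
WHERE
	table_name = 'avocado_prices'
ORDER BY
	ordinal_position;
```

If any name or type differs from what the later queries expect, it is easier to correct it at this point than after the calculations are written.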

## Query: Verify the total number of bags

I execute the following query to verify that the sum of the small, large and extra-large bags equals the `Total_Bags` field in the dataset:

```sql
SELECT
	Date,
	Region,
	Small_Bags,
	Large_Bags,
	XLarge_Bags,
	Total_Bags,
	-- Calculate total bags to compare to Total_Bags field
	ROUND(Small_Bags + Large_Bags + XLarge_Bags, 2) AS Total_Bags_Calc
FROM
	`plucky-aegis-427011-v5.avocado_data.avocado_prices`
LIMIT 20;
```

The query results include a calculated field, `Total_Bags_Calc`, that can be compared against the `Total_Bags` field, as shown below:

![Query results to verify total bags](c05m04-query-verify-total-bags.png 'Query results to verify total bags')
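
The preview above covers only 20 rows. A possible follow-up, sketched here as an assumption rather than as part of the original activity, is to count the rows across the whole table where the calculated sum differs from `Total_Bags`. The 0.01 tolerance is an arbitrary choice to allow for rounding in the source data:

```sql
SELECT
	COUNT(*) AS Total_Rows,
	-- Count rows where the summed bag sizes differ from Total_Bags by more than the tolerance
	COUNTIF(ABS(Small_Bags + Large_Bags + XLarge_Bags - Total_Bags) > 0.01) AS Mismatched_Rows
FROM
	`plucky-aegis-427011-v5.avocado_data.avocado_prices`;
```

A `Mismatched_Rows` value of 0 would suggest that the two figures agree within that tolerance for every row, not only for the 20 rows previewed.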

## Query: Calculate the percentage of small bags

For decision-making purposes, it may be useful to show stakeholders what percentage of the total bags were of a given size, for example small bags. To obtain this information, I execute the following query:

```sql
SELECT
	Date,
	Region,
	Total_Bags,
	Small_Bags,
	-- Calculate percentage of total bags that are small bags
	ROUND((Small_Bags / Total_Bags) * 100, 2) AS Small_Bags_Percent
FROM
	`plucky-aegis-427011-v5.avocado_data.avocado_prices`
WHERE
	-- Exclude rows with no bags to avoid a division-by-zero error
	Total_Bags <> 0
LIMIT 20;
```

The query returns a table with the calculated field `Small_Bags_Percent`, as shown below:

![Query results of percentage small bags](c05m04-query-small-bags-perc.png 'Query results of percentage small bags')
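
An alternative to filtering with `WHERE Total_Bags <> 0` is BigQuery's `SAFE_DIVIDE` function, which returns `NULL` instead of raising an error when the divisor is zero. The sketch below is a variation on the query above, not the approach used in this activity. It keeps the zero-total rows in the result with a `NULL` percentage:

```sql
SELECT
	Date,
	Region,
	Total_Bags,
	Small_Bags,
	-- SAFE_DIVIDE returns NULL when Total_Bags is 0, so no WHERE filter is needed
	ROUND(SAFE_DIVIDE(Small_Bags, Total_Bags) * 100, 2) AS Small_Bags_Percent
FROM
	`plucky-aegis-427011-v5.avocado_data.avocado_prices`
LIMIT 20;
```

Whether to filter those rows out or keep them as `NULL` depends on whether stakeholders need to see the days with no bags sold.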