In [5]:
import pandas as pd
import seaborn as sns

In [6]:

file_name = "Coffee_Chain_Sales .csv"
df = pd.read_csv(file_name)

In [12]:
# Going to do some basic data exploration
# 1.) Look at the columns to see what is available in the data
display(df.columns)

Index(['Area Code', 'Cogs', 'DifferenceBetweenActualandTargetProfit', 'Date',
       'Inventory Margin', 'Margin', 'Market_size', 'Market', 'Marketing',
       'Product_line', 'Product_type', 'Product', 'Profit', 'Sales', 'State',
       'Target_cogs', 'Target_margin', 'Target_profit', 'Target_sales ',
       'Total_expenses', 'Type'],
      dtype='object')
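
One thing that stands out in the column list is that `'Target_sales '` ends with a space, which makes `df['Target_sales']` fail with a `KeyError`. A minimal sketch of one way to clean this up, using a small hypothetical frame (`demo`) that only borrows three column names from the output above:

```python
import pandas as pd

# Hypothetical miniature frame; it reuses three column names from the
# output above, including the trailing space in 'Target_sales '.
demo = pd.DataFrame({"Area Code": [303], "Target_sales ": [90], "Sales": [122]})

# Strip leading/trailing whitespace from every column label.
demo.columns = demo.columns.str.strip()

print(list(demo.columns))
```

The same `.columns.str.strip()` line could be applied to `df` right after loading, assuming no two labels collide once the whitespace is removed.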


**COLUMN INFORMATION**

1. **Area Code**: An identifier for the geographical areas or regions where the coffee chain operates.

2. **COGS** (Cost of Goods Sold): The total cost incurred by the coffee chain in producing or purchasing the products it sells.

3. **Difference between Actual and Target Profit**: How the company's actual profit compares with its target profit. It reflects financial performance against predefined goals.

4. **Date**: The date of sales transactions, which allows for time-based analysis of sales trends and patterns.

5. **Inventory Margin**: The difference between the cost of maintaining inventory and the revenue generated from selling those inventory items.

6. **Margin**: The profit margin earned from sales. In this dataset the values are absolute amounts rather than percentages (they range from -294 to 526). It's a critical financial metric.

7. **Market Size**: Information about the size of the market in each area, helping to understand the potential customer base and market dynamics.

8. **Profit**: The financial gain achieved by the company after deducting the cost of goods sold (COGS) and other expenses from the revenue generated through sales.

9. **Sales**: The revenue generated from the coffee chain's products, reflecting its financial performance and customer demand.



In [8]:
# The column list above shows what type of information the dataset gives us
# 2.) Now I will find out more basic information about the data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1062 entries, 0 to 1061
Data columns (total 21 columns):
 #   Column                                  Non-Null Count  Dtype 
---  ------                                  --------------  ----- 
 0   Area Code                               1062 non-null   int64 
 1   Cogs                                    1062 non-null   int64 
 2   DifferenceBetweenActualandTargetProfit  1062 non-null   int64 
 3   Date                                    1062 non-null   object
 4   Inventory Margin                        1062 non-null   int64 
 5   Margin                                  1062 non-null   int64 
 6   Market_size                             1062 non-null   object
 7   Market                                  1062 non-null   object
 8   Marketing                               1062 non-null   int64 
 9   Product_line                            1062 non-null   object
 10  Product_type                            1062 non-null   object
 11  Prod
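
The `info()` output lists `Date` as `object`, meaning the dates are stored as plain strings. Any time-based analysis needs them parsed first. A minimal sketch, assuming the strings are in month/day/year order (the sample values alone cannot confirm this) and using a few date strings that appear in this dataset:

```python
import pandas as pd

# Date strings as they appear in the dataset's first rows.
dates = pd.Series(["10/1/2012", "10/1/2012", "10/2/2012", "10/3/2012", "10/4/2012"])

# ASSUMPTION: month/day/year order. An explicit format avoids silent
# day/month guessing by the parser.
parsed = pd.to_datetime(dates, format="%m/%d/%Y")

print(parsed.dtype)
print(parsed.dt.day.tolist())
```

If the real order turns out to be day/month/year, the format string would be `"%d/%m/%Y"` instead.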

In [9]:
# This brings out the data types, and we can see that there are no null values either
df.describe()

Unnamed: 0,Area Code,Cogs,DifferenceBetweenActualandTargetProfit,Inventory Margin,Margin,Marketing,Profit,Sales,Target_cogs,Target_margin,Target_profit,Target_sales,Total_expenses
count,1062.0,1062.0,1062.0,1062.0,1062.0,1062.0,1062.0,1062.0,1062.0,1062.0,1062.0,1062.0,1062.0
mean,587.030132,82.399247,0.387006,815.175141,102.423729,30.433145,60.556497,191.049906,71.676083,96.817326,60.169492,168.493409,53.836158
std,225.299162,64.824295,44.33118,916.156386,91.286704,25.963448,100.516593,148.270317,65.701583,89.467176,77.824869,145.955171,31.703526
min,203.0,0.0,-369.0,-3534.0,-294.0,0.0,-605.0,21.0,0.0,-210.0,-320.0,0.0,11.0
25%,425.0,41.0,-15.0,447.0,51.0,13.0,16.25,98.0,30.0,50.0,20.0,80.0,33.0
50%,573.0,57.0,-3.0,659.0,73.0,22.0,39.5,133.0,50.0,70.0,40.0,120.0,46.0
75%,774.0,101.0,13.0,968.0,130.0,40.75,87.0,227.0,90.0,120.0,80.0,210.0,66.0
max,985.0,294.0,249.0,8252.0,526.0,122.0,646.0,815.0,380.0,580.0,470.0,960.0,156.0
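
Note that `describe()` also summarizes `Area Code`, but a mean or standard deviation of an identifier carries no meaning. One possible way around this, sketched on a small frame (`sample`) built from a few values in this dataset, is to cast the identifier to a string so that `describe()` skips it:

```python
import pandas as pd

# A few Area Code / Sales pairs taken from the dataset's first rows.
sample = pd.DataFrame({
    "Area Code": [303, 970, 409, 850, 562],
    "Sales": [122, 123, 107, 94, 182],
})

# Treat the identifier as a label, not a quantity.
sample["Area Code"] = sample["Area Code"].astype(str)

# By default describe() only summarizes numeric columns.
summary = sample.describe()
print(summary.columns.tolist())
```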


In [13]:
# This shows some key metrics and distribution information
# Finally, I'll use .head() to give us a sample of some of the data points
# so that we can get a better understanding of what we are working with

df.head()

Unnamed: 0,Area Code,Cogs,DifferenceBetweenActualandTargetProfit,Date,Inventory Margin,Margin,Market_size,Market,Marketing,Product_line,...,Product,Profit,Sales,State,Target_cogs,Target_margin,Target_profit,Target_sales,Total_expenses,Type
0,303,51,-35,10/1/2012,503,71,Major Market,Central,46,Leaves,...,Lemon,-5,122,Colorado,30,60,30,90,76,Decaf
1,970,52,-24,10/1/2012,405,71,Major Market,Central,17,Leaves,...,Mint,26,123,Colorado,30,60,50,90,45,Decaf
2,409,43,-22,10/2/2012,419,64,Major Market,South,13,Leaves,...,Lemon,28,107,Texas,30,60,50,90,36,Decaf
3,850,38,-15,10/3/2012,871,56,Major Market,East,10,Leaves,...,Darjeeling,35,94,Florida,40,60,50,100,21,Regular
4,562,72,6,10/4/2012,650,110,Major Market,West,23,Leaves,...,Green Tea,56,182,California,20,60,50,80,54,Regular
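
The sample rows hint at how some columns relate to each other. The sketch below copies the five rows shown above into a small frame (`sample`) and checks three candidate relationships. These are guesses tested on five rows only, not documented definitions of the columns:

```python
import pandas as pd

# Values copied from the five df.head() rows shown above.
sample = pd.DataFrame({
    "Cogs": [51, 52, 43, 38, 72],
    "DifferenceBetweenActualandTargetProfit": [-35, -24, -22, -15, 6],
    "Margin": [71, 71, 64, 56, 110],
    "Profit": [-5, 26, 28, 35, 56],
    "Sales": [122, 123, 107, 94, 182],
    "Target_profit": [30, 50, 50, 50, 50],
    "Total_expenses": [76, 45, 36, 21, 54],
})

# Candidate 1: Margin = Sales - Cogs
margin_ok = (sample["Sales"] - sample["Cogs"]).eq(sample["Margin"]).all()
# Candidate 2: Profit = Margin - Total_expenses
profit_ok = (sample["Margin"] - sample["Total_expenses"]).eq(sample["Profit"]).all()
# Candidate 3: Difference = Profit - Target_profit
diff_ok = (
    (sample["Profit"] - sample["Target_profit"])
    .eq(sample["DifferenceBetweenActualandTargetProfit"])
    .all()
)

print(margin_ok, profit_ok, diff_ok)
```

Running the same comparisons on the full `df` would show whether the relationships hold for all 1062 rows.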


In [19]:
# I am curious as to how many different area codes there are

print("Total rows of data:", len(df))
print("Unique area codes:", df['Area Code'].nunique())

Total rows of data: 1062
Unique area codes: 149
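
With 1062 rows and 149 area codes, each code appears about 7 times on average, but the counts may be uneven. A minimal sketch of how to look at rows per code with `value_counts()`, using hypothetical repeat counts for illustration (the real counts are not shown in this notebook):

```python
import pandas as pd

# HYPOTHETICAL data: the repeat counts below are made up for illustration.
codes = pd.Series([303, 303, 970, 409, 409, 409], name="Area Code")

# Number of rows per area code, most frequent first.
rows_per_code = codes.value_counts()

print(codes.nunique())
print(rows_per_code)
```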


**Now I am going to ask some questions about the data**

- Which areas have the largest market size for coffee?
- During what 5 day period was the market for coffee the largest?
- In any area or day, what was the most 
