# **Mini Business Case Study - Sales Analytics for EcoTrek Solutions**

**Case Background**

You are a new hire at EcoTrek Solutions, a U.S.-based company that specializes in eco-friendly travel accessories. As a business data analyst, your role involves analyzing sales trends and providing insights to help shape the company's marketing and production strategies.

**Correlation Analysis Assignment (Due: TUESDAY, 9/2, 11:59 pm.)**

After your first successful report, EcoTrek Solutions asks you to analyze last year's sales of its two other products: GreenTote and SolarTrek Water Bottle.  The company suspects there might be underlying factors influencing sales, but they are unsure what these might be. The company hopes to come up with some strategy to help with the sales of the products for the new year.  

Your task is to investigate the data to uncover any interesting correlations or trends. To assist in your analysis, the company provides the following monthly data from January to December 2024:

| Month | Temperature (°F) | GreenTote | SolarTrek Water Bottle |
|---|---|---|---|
| January | 40 | 87,500 | 80,900 |
| February | 45 | 100,625 | 85,200 |
| March | 55 | 115,725 | 92,500 |
| April | 65 | 132,075 | 100,980 |
| May | 75 | 148,875 | 110,500 |
| June | 85 | 164,500 | 118,300 |
| July | 90 | 172,725 | 124,800 |
| August | 88 | 185,800 | 130,500 |
| September | 86 | 180,900 | 128,800 |
| October | 80 | 170,000 | 120,000 |
| November | 70 | 165,000 | 114,000 |
| December | 60 | 160,000 | 110,000 |

**Tasks for you:**

Write Python code to accomplish the following:
1. Dictionary creation: Create three dictionaries, Temperature, GreenTote, and SolarTrek Water Bottle, with the month (e.g., 'January') as the key and the respective values as the data (temperature or sales revenue).

2. Using these dictionaries, print the months with the highest and lowest sales for each of the two products, and print the months with the highest and lowest temperatures.

3. Calculate the Pearson correlation coefficient: use the scipy.stats.pearsonr(array A, array B) function to calculate the Pearson correlation coefficient between:
- GreenTote sales and SolarTrek Water Bottle sales.
- Temperature and GreenTote sales.
- Temperature and SolarTrek Water Bottle sales.

4. Based on the result, you will choose one of the following three options to make your recommendation to the company:

- Option 1: Develop a joint marketing campaign for the two products.
- Option 2: Develop separate marketing campaigns for each product.
- Option 3: Explore external factors (e.g., weather, customer preferences) for future analysis.
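
As a starting point for tasks 1 to 3, here is a minimal sketch that uses only the first three months of the table above. The variable names (`temperature`, `green_tote`) and the helper `pearson_r` are illustrative choices, not names required by the assignment. The helper computes the coefficient by hand to show what the number means; in the assignment itself, scipy.stats.pearsonr returns this same coefficient together with a p-value.

```python
import math

# First three months of the table only (January to March 2024)
temperature = {'January': 40, 'February': 45, 'March': 55}
green_tote = {'January': 87500, 'February': 100625, 'March': 115725}

# max()/min() with key=dict.get compare the values but return the month (the key)
hottest_month = max(temperature, key=temperature.get)
coldest_month = min(temperature, key=temperature.get)
best_sales_month = max(green_tote, key=green_tote.get)
worst_sales_month = min(green_tote, key=green_tote.get)
print('Hottest month:', hottest_month)
print('Coldest month:', coldest_month)
print('Highest GreenTote sales:', best_sales_month)
print('Lowest GreenTote sales:', worst_sales_month)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Build both lists in the same month order so the values line up pairwise
months = list(temperature)
r = pearson_r([temperature[m] for m in months],
              [green_tote[m] for m in months])
print('Temperature vs. GreenTote (3 months):', round(r, 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none. Three months are far too few to draw a conclusion, so the full twelve-month dictionaries are needed for the recommendation in task 4.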

**What to Submit**
1. Your Jupyter Notebook file named **YourLastName_FirstInitial_2.ipynb**.
2. Please provide comments in the Code cell to explain what your code does in that cell and run your code before submitting it.

Note: Code that cannot run will receive a score of 0.

# **Week 3 Pandas**



**Week 3 Preview**


In [1]:
# obtain pandas package by import

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)


      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [2]:
# create another data frame that is a subset of df: it keeps only the rows where Age is over 28
filtered_df = df[df['Age'] > 28]
print(filtered_df)


      Name  Age         City
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [3]:
# create another data frame that is a subset of df: it keeps data that is not in Chicago

# Your code starts here

df_not_chi = filtered_df = df[df['City'] != 'Chicago']
print(df_not_chi)

# Your code ends here

    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles


In [4]:
df['Age'] = [30, 28, 32]
print(df)

      Name  Age         City
0    Alice   30     New York
1      Bob   28  Los Angeles
2  Charlie   32      Chicago


In [5]:
# Change the ages of two people in the subset dataframe filtered_df to [66, 44] and print it

# Your code starts here

# This produces a warning because of how we init filtered_df
# The professor said that the warning is fine here (today).
#filtered_df['Age'] = [66, 44]

# This is the preferred way (doesn't produce warning)
# The professor said that implementing this way is allowable.
filtered_df.loc[0, 'Age'] = 66
filtered_df.loc[1, 'Age'] = 44

print(filtered_df)

# Your code ends here





    Name  Age         City
0  Alice   66     New York
1    Bob   44  Los Angeles
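
The warning mentioned in the comments above is most likely pandas' SettingWithCopyWarning, which can appear when a column is assigned on a data frame that was derived from another one by filtering. One common way to avoid it, sketched below, is to take an explicit copy with `.copy()` when the subset is created. The names `people` and `subset` are illustrative and do not appear in the notebook.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# .copy() makes the subset an independent data frame rather than a derived slice
subset = people[people['City'] != 'Chicago'].copy()

# Assigning a whole column is now unambiguous: only the copy changes
subset['Age'] = [66, 44]

print(subset)
print(people)  # the ages in the original data frame are unchanged
```

With an explicit copy, whole-column assignment and `.loc` assignment both modify only `subset`, so the choice between them comes down to readability.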


In [6]:
#obtain summary of df
summary = df.describe()
print(summary)


        Age
count   3.0
mean   30.0
std     2.0
min    28.0
25%    29.0
50%    30.0
75%    31.0
max    32.0


In [7]:
#obtain summary of filtered_df and print it

# Your code starts here
filtered_summary = filtered_df.describe()
print(filtered_summary)

# Your code ends here


             Age
count   2.000000
mean   55.000000
std    15.556349
min    44.000000
25%    49.500000
50%    55.000000
75%    60.500000
max    66.000000


In [8]:
#read data from csv file into a data frame
df = pd.read_csv('daily_sales.csv')
print(df)


    Description: this dataset shows number of WaterCure units sold per day and the corresponding daily temperature from January 2024 to August 2024  \
0                                                  NaN                                                                                                
1                                                 Date                                                                                                
2                                             1/1/2024                                                                                                
3                                             1/2/2024                                                                                                
4                                             1/3/2024                                                                                                
..                                                 ...                                        

In [9]:
#read data from csv file into a data frame AND skip the heading
df = pd.read_csv('daily_sales.csv', skiprows = 2)
print(df)

          Date  Daily Units Sold  Daily Unit Price  Daily Temperature (C)
0     1/1/2024                91              24.0                     25
1     1/2/2024                90              24.0                     24
2     1/3/2024                70              24.0                     19
3     1/4/2024                89              24.0                     23
4     1/5/2024               100              24.0                     36
..         ...               ...               ...                    ...
239  8/27/2024                29              22.2                     16
240  8/28/2024                59              22.2                     21
241  8/29/2024                33              22.2                     17
242  8/30/2024                33              22.2                     17
243  8/31/2024                37              22.2                     18

[244 rows x 4 columns]
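
To see why `skiprows = 2` fixes the header, the sketch below reads a small in-memory stand-in for the file with and without it. The text layout (a description line followed by a line of empty fields, then the real header) is an assumption inferred from the output of In [8]; the actual contents of daily_sales.csv may differ. The three data rows are copied from the output above.

```python
import io

import pandas as pd

# Assumed layout: a description line and an empty-valued line precede the real header
raw_text = (
    "Description: units sold per day and daily temperature,,,\n"
    ",,,\n"
    "Date,Daily Units Sold,Daily Unit Price,Daily Temperature (C)\n"
    "1/1/2024,91,24.0,25\n"
    "1/2/2024,90,24.0,24\n"
    "1/3/2024,70,24.0,19\n"
)

# Without skiprows, the description line is treated as the header row
messy = pd.read_csv(io.StringIO(raw_text))

# skiprows=2 drops the first two lines, so the third line becomes the header
clean = pd.read_csv(io.StringIO(raw_text), skiprows=2)

print(messy.columns.tolist())
print(clean.columns.tolist())
print(clean.dtypes)
```

Because the real header is now recognized, the numeric columns are parsed as numbers instead of text, which is what makes `describe()` and correlation calculations possible later.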


In [10]:
#write data frame to a csv file
df.to_csv('output.csv', index=False)


# **Mini Business Case Study - Sales Analytics for EcoTrek Solutions**

**Case Background**

You are a new hire at EcoTrek Solutions, a U.S.-based company that specializes in eco-friendly travel accessories. As a business data analyst, your role involves analyzing sales trends and providing insights to help shape the company's marketing and production strategies.

**Pandas, Reading/Writing Data Assignment (Due: TUESDAY, 9/9, 11:59 pm.)**



You, the business data analyst, wonder whether the correlation found in the previous assignment has more to do with the climate than with the products themselves.

You collected temperature data for the last 20 days as well as the daily sales of GreenTote. With this data, you will perform the following tasks:

1. create a data frame to store the data read from the file.
2. get a summary of each data field (i.e., the count, mean, standard deviation, minimum, maximum, etc.) and interpret the summary in a Markdown or Text cell (e.g., what does the maximum mean for the daily temperature field?)

3. use the Pearson Correlation function to examine the correlation between daily units sold and daily temperature

4. sort the data frame according to the number of units sold per day

5. write the sorted data frame to a csv file named "**GreenTote_analysis_YourLastName_FirstName.csv**"

6. based on your analysis, choose one of the following options to recommend to your company: (in the text cell)
- Option 1: Develop a marketing campaign that considers weather information to strategize the sales of GreenTote
- Option 2: Use a random campaign to help increase the sales for the new year
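
The sketch below walks through tasks 2 to 5 on a tiny stand-in data set. The five rows are copied from the daily_sales.csv output shown in the Week 3 preview, so they are not the GreenTote data, and the output file name is a placeholder. It uses pandas' `Series.corr`, whose default method is Pearson, as one possible choice; scipy.stats.pearsonr from the previous assignment would serve equally well.

```python
import os
import tempfile

import pandas as pd

# Sample rows copied from the Week 3 preview output (not the GreenTote data)
sales = pd.DataFrame({
    'Daily Units Sold': [91, 90, 70, 89, 100],
    'Daily Temperature (C)': [25, 24, 19, 23, 36],
})

# Task 2: count, mean, std, min, quartiles, and max for each numeric column
summary = sales.describe()
print(summary)

# Task 3: Series.corr uses the Pearson method by default
r = sales['Daily Units Sold'].corr(sales['Daily Temperature (C)'])
print('Pearson r:', round(r, 3))

# Task 4: sort by units sold, lowest first (sort_values returns a new data frame)
sorted_sales = sales.sort_values(by='Daily Units Sold')
print(sorted_sales)

# Task 5: write without the index column; this file name is only a placeholder
out_path = os.path.join(tempfile.gettempdir(), 'sorted_sales_example.csv')
sorted_sales.to_csv(out_path, index=False)
```

In the submitted notebook, the data frame would come from `pd.read_csv` on the provided data file, and the output would be written to the file name required in task 5.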

**What to Submit:**

You can choose to submit a Jupyter Notebook file.

1). Please name your file **YourLastName_FirstInitial_3.ipynb**. In the code, please use only the data file's name (i.e., the data file is in the same folder as the notebook).

2). Please submit your data file "**GreenTote_analysis_YourLastName_FirstName.csv**"

3). Please remember to provide your explanation/interpretation in the Text cells

4). Please provide comments in the Code cell to explain what your code does in that cell and run your code before submitting it.