# **Milestone** | Cleaning & Analyzing Revenue Data for ASOS

<div style="text-align: center;">
<img src="https://upload.wikimedia.org/wikipedia/commons/a/a8/Asos.svg" alt="ASOS Logo" width="200"/>
</div>

## Introduction
In this Milestone, you'll take on the role of a Junior Data Analyst at ASOS, an online fast-fashion and cosmetics retailer. Your task is to help make sense of some weekly revenue data that... isn't exactly clean.

In the list named `revenue_by_week`, you'll find a snippet of ASOS's estimated weekly revenue, captured in millions of pounds.

If you try to sum this list, Python throws a TypeError. Why?

In [None]:
revenue_by_week = [65, 77, '66', '74',
                   64, 82, '86', 72, '80',
                   96, 101, '35', '72', '68',
                  ]

sum(revenue_by_week)



In [None]:
REASON: The list above contains a mix of integers and strings. `sum()` adds the elements together with `+`, and Python cannot add an `int` to a `str`, so it raises a `TypeError`.

<div style="border: 3px solid #30EE99; background-color: #f0fff4; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
    <strong>Try This AI Prompt:</strong> I have the following list of revenue data:
revenue_by_week = [65, 77, '66', '74', 64, 82, '86', 72, '80', 96, 101, '35', '72', '68']
When I try to sum this list, I get a TypeError. Why is that happening, and how can I fix it?
  </span>
</div>


## Task 1: Cleaning The Data

A `TypeError` here means that Python doesn't know how to add (`+`) a number (`int`) and a string (`str`) together.
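As a minimal sketch of what that error looks like, the snippet below adds an integer and a string inside a `try`/`except` block, so the message is printed instead of stopping the program. The values `65` and `'66'` come from the list; the variable name `message` is only illustrative.

```python
# Adding an int and a str raises a TypeError.
try:
    65 + '66'
except TypeError as error:
    # Keep the error text so it can be inspected or printed.
    message = str(error)
    print(message)
```

`sum()` hits the same problem as soon as it reaches the first string in the list.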

Take a closer look at the `revenue_by_week` list. You'll notice that some numbers are stored as strings (with quotes around them), while others are integers.

While you could go into the list and manually remove all of the `'` marks, that sounds like a pain, and imagine if you had far more data than these three months of ASOS figures. You can use your programming skills so that Python does the manual work for you! There's a built-in function, `int()`, that converts the argument given to it into an integer.

**Run the cell below** to see a demonstration on a few different data types.

<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Note: </strong>The last line in the cell will give an error since a <span style="font-family: monospace; color: #222;">list</span> is not a number (the individual elements can be, though!).
</span>
</div>

In [None]:
print(int(105))   # integer
print(int(52.80)) # decimal (float)
print(int('97'))  # integer string
print(int([105, '97'])) # list

Notice that the `int()` function can accept values that are already numbers (chopping off any decimal part if present) or strings that depict integers (it will fail on decimal strings, however). But you should also notice that Python threw an error when we tried to give it a list.
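The sketch below illustrates those cases with made-up values that are not part of the ASOS data. It assumes that going through `float()` first is an acceptable way to handle a decimal string; note that this truncates rather than rounds.

```python
# int() truncates floats toward zero rather than rounding.
truncated = int(52.80)

# int() tolerates surrounding whitespace in integer strings.
padded = int(' 97 ')

# A decimal string is rejected with a ValueError...
try:
    int('52.80')
    decimal_failed = False
except ValueError:
    decimal_failed = True

# ...but converting via float() first works (still truncating).
via_float = int(float('52.80'))

print(truncated, padded, decimal_failed, via_float)
```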

To clean the data, we need to convert it item by item.


To complete this task, create a new list `cleaned_revenue` that has all of the data values in a numeric data type:
- Set up `cleaned_revenue` as an empty list.
- Use a `for` loop to loop over the elements of `revenue_by_week`.
  - For each element, convert it to an integer data type with the `int()` function.
  - Append the converted value to the `cleaned_revenue` list.

Outside of your loop, `print` the completed `cleaned_revenue` and `print` the total sum of values.

<div style="border: 3px solid #30EE99; background-color: #f0fff4; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
    <strong>Try This AI Prompt:</strong> I’ve got a mix of strings and integers in my list, and I’m converting everything to int(). Are there any risks or edge cases where this approach could go wrong? When might it fail in a real dataset?
  </span>
</div>

In [7]:
# original data, repeated here so the cell runs on its own
revenue_by_week = [65, 77, '66', '74',
                   64, 82, '86', 72, '80',
                   96, 101, '35', '72', '68',
                  ]

# set up storage for cleaned data
cleaned_revenue = []

# loop through data and convert to integers
for revenue in revenue_by_week:
    cleaned_revenue.append(int(revenue))

# assess the cleaned data by printing it, along with its total
print(cleaned_revenue)
print(sum(cleaned_revenue))


[65, 77, 66, 74, 64, 82, 86, 72, 80, 96, 101, 35, 72, 68]
1038


<div style="border: 3px solid #f8c43e; background-color: #fff3c1; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
      If done correctly, the value for the <span style="font-family: monospace; color: #222;">sum</span> of <span style="font-family: monospace; color: #222;">cleaned_revenue</span> should be <strong>1038</strong>.
  </span>
</div>
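Once the loop version works, the same conversion can be written more compactly. The following is one possible alternative using a list comprehension, not a replacement for the loop the task asks for; it repeats the list so the snippet runs on its own.

```python
revenue_by_week = [65, 77, '66', '74',
                   64, 82, '86', 72, '80',
                   96, 101, '35', '72', '68']

# Convert every element to int in a single expression.
cleaned_revenue = [int(revenue) for revenue in revenue_by_week]

print(cleaned_revenue)
print(sum(cleaned_revenue))  # 1038
```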

## Task 2: Monthly Analysis

Great. With the cleaned list `cleaned_revenue`, you're ready to calculate:
- **The total amount made in each month.**
- **The average weekly revenue for each month, and which month's is highest.**

You're told the months break down like this:
- June: first 4 weeks
- July: next 5 weeks
- August: final 5 weeks


Use slicing to get the relevant parts of the cleaned revenue list, then use the `sum()` and `len()` functions to help you calculate the total and average for each month. Remember: the average will be the total revenue divided by the number of weeks.

<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Hint: </strong>Be careful about how indexing works in Python! You might want to try printing the slices you pull out first to check that they're capturing the correct values, before trying to summarize them.
</span>
</div>
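As a quick reminder of how slice boundaries behave, here is a small sketch on a made-up list (`letters` is illustrative and not part of the ASOS data). The start index is included, the stop index is excluded, and omitting the stop runs to the end of the list.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

first_two = letters[0:2]   # indexes 0 and 1
next_three = letters[2:5]  # indexes 2, 3 and 4
the_rest = letters[5:]     # index 5 through the end

print(first_two, next_three, the_rest)

# Back-to-back slices cover the whole list with no overlap or gap.
print(len(first_two) + len(next_three) + len(the_rest) == len(letters))
```

Because each slice starts where the previous one stopped, checking that the lengths add up is a cheap way to catch an off-by-one boundary.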

In [17]:
# revenue by month
june_revenue = cleaned_revenue[0:4]
july_revenue = cleaned_revenue[4:9]
august_revenue = cleaned_revenue[9:]

# calculate sum
june_total = sum(june_revenue)
july_total = sum(july_revenue)
august_total = sum(august_revenue)
total = [june_total, july_total, august_total]

# calculate avg (total revenue divided by number of weeks)
june_avg = june_total / len(june_revenue)
july_avg = july_total / len(july_revenue)
august_avg = august_total / len(august_revenue)
avg = [june_avg, july_avg, august_avg]

# print the total amount and average revenue for each month
months = ['June', 'July', 'August']
for month, month_total, month_avg in zip(months, total, avg):
    print(f'For {month}, the total revenue is {month_total} and the average revenue is {month_avg}.')


For June, the total revenue is 282 and the average revenue is 70.5.
For July, the total revenue is 384 and the average revenue is 76.8.
For August, the total revenue is 372 and the average revenue is 74.4.


<div style="border: 3px solid #f8c43e; background-color: #fff3c1; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
      If done correctly, the average values you should get are:
  <ul>
    <li>June: 70.5</li>
    <li>July: 76.8</li>
    <li>August: 74.4</li>
  </ul>  </span>

</div>

Which month had the highest average revenue?

July had the highest average weekly revenue (76.8, in millions of pounds), ahead of August (74.4) and June (70.5).
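With only three months, comparing by eye is easy, but the comparison can also be done in code. A minimal sketch, assuming the averages are stored in a dictionary named `average_by_month` (a hypothetical name; the values are the ones reported above):

```python
average_by_month = {'June': 70.5, 'July': 76.8, 'August': 74.4}

# max() with key= compares the dictionary's values and returns the matching key.
best_month = max(average_by_month, key=average_by_month.get)

print(best_month)  # July
```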

What do you think could explain the difference in revenue between months? Are there any patterns or external factors that could be influencing these fluctuations?

<div style="border: 3px solid #30EE99; background-color: #f0fff4; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
    <strong>Try This AI Prompt:</strong> [MONTH] had the highest average revenue, but only by a small margin. What kinds of external factors might explain fluctuations in weekly sales for a fashion retailer like ASOS?
  </span>
</div>

July had the highest average weekly revenue, but only by a small margin. Weekly sales for a fashion retailer like ASOS can be influenced by several external factors, including seasonal demand for summer clothing, marketing campaigns, weather changes, and the timing of major events. For example, U.S. Independence Day on July 4th may drive holiday promotions and increased shopping activity among American customers. Other factors such as limited-time discounts, product drops, or supply chain disruptions can also affect weekly revenue patterns.

## LevelUp

If you were preparing a short internal summary for ASOS leadership, what is one clear takeaway about monthly revenue performance and what would you recommend they investigate further?

July achieved the highest average weekly revenue, narrowly outperforming June and August. This suggests consistent demand across the summer period, with a possible boost in July from holiday-related promotions or U.S. Independence Day traffic. My recommended action is to investigate what specifically drove July’s peak, such as targeted marketing campaigns, product drops, or geographic trends (for example, U.S. versus UK sales). Identifying which efforts had the greatest impact can help optimize performance in future seasonal campaigns.