An e-commerce company, Store 1, recently began collecting data about its customers. Store 1’s goal is to better understand customer behavior and make data-driven decisions to improve their online experience.

As part of the analytics team, your first task is to assess the quality of a sample of collected data and prepare it for future analysis.

# Quiz<br>
Store 1 aims to ensure consistency in data collection. As part of this effort, the quality of the data collected about users needs to be assessed. You have been asked to review the data collected and propose changes. Below, you will see data about a specific user. Review the data and identify potential issues.

In [277]:
user_id = '32415'
user_name = ' mike_reed '
user_age = 32.0
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']

Options:

- The data type of user_id should be changed from string to integer.

- The user_name variable contains a string with unnecessary spacing and an underscore between the first and last names.

- The data type of user_age is incorrect.

- The fav_categories list contains strings in uppercase letters. Instead, we should convert the values in the list to lowercase letters.

In the Markdown cell below, write the numbers of the options you identified as problems. If you identified multiple problems, separate the numbers with commas. For example, if you think options 1 and 3 describe problems, write 1, 3, and explain why.

1. The user_id variable is a string and should be converted to a numeric type, since user_id is a number.
2. The user_name variable should have its spaces removed, and the underscore between the first and last name should be replaced with a space.
3. The user_age variable is a float and should be changed to an integer type, since all ages are integers.
4. The fav_categories variable stores strings in uppercase letters. For consistency, we should store them in lowercase letters so that the same category is always written the same way.
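
One way to confirm the issues listed above is to inspect each variable directly. The snippet below is a minimal sketch that uses only built-ins. The loop and the names `has_extra_spaces` and `all_uppercase` are illustrative and not part of the original exercise.

```python
user_id = '32415'
user_name = ' mike_reed '
user_age = 32.0
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']

# Print the type of each variable to spot unexpected data types
for label, value in [('user_id', user_id), ('user_name', user_name),
                     ('user_age', user_age), ('fav_categories', fav_categories)]:
    print(label, type(value).__name__)

# Check for stray whitespace and for uppercase category names
has_extra_spaces = user_name != user_name.strip()
all_uppercase = all(category.isupper() for category in fav_categories)
print(has_extra_spaces, all_uppercase)
```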

# Task 1<br>

Let's implement the changes we've identified. First, we want to fix the issues with the `user_name` variable. As we've seen, it has unnecessary spaces and an underscore as a separator between the first and last names. Your goal is to remove the spaces and then replace the underscore with a space.

In [278]:
user_name = ' mike_reed '
user_name = user_name.strip()  # remove leading and trailing spaces
user_name = user_name.replace('_', ' ')  # replace the underscore between the first and last name with a space

print(user_name)

mike reed
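
Because `strip()` and `replace()` both return a new string, the two steps can also be chained into one expression. This is an optional sketch. The name `clean_name` is hypothetical and used only for illustration.

```python
user_name = ' mike_reed '

# strip() runs first, then replace() works on the stripped result
clean_name = user_name.strip().replace('_', ' ')

print(clean_name)
```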


# Task 2<br>
Next, we need to split the updated user_name into two substrings to get a list that contains two values: the string for the first name and the string for the last name.

In [279]:
user_name = 'mike reed'
name_split = user_name.split()  # split the string on whitespace into first and last name

print(name_split)

['mike', 'reed']
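
If the first and last names are needed as separate variables, the result of `split()` can be unpacked directly. This is a small sketch that assumes the name contains exactly two words; any other number of words raises a `ValueError`. The names `first_name` and `last_name` are illustrative.

```python
user_name = 'mike reed'

# Unpack the two-element list returned by split() into two variables
first_name, last_name = user_name.split()

print(first_name)
print(last_name)
```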


# Task 3<br>
Great! Now we want to work with the `user_age` variable. As we mentioned before, it has an incorrect data type. Let's fix this issue by transforming the data type and printing the final result.

In [280]:
user_age = 32.0
user_age = int(user_age)  # changing user_age type to int

print(user_age)

32


# Task 4<br>
As we know, data is not always perfect. We have to consider scenarios where the value of `user_age` cannot be converted to an integer. To prevent our system from failing, we need to take measures in advance.

Write some code that tries to convert the `user_age` variable to an integer and assign the converted value to `user_age_int`. If the attempt fails, we will display a message asking the user to provide their age as a numeric value with the message: `Please provide your age as a numeric value.`

In [281]:
user_age = 'thirty two' # user_age variable as string

try:
    user_age_int = int(user_age)
except ValueError:
    print('Please provide your age as a numeric value.')

Please provide your age as a numeric value.
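
The same pattern can be wrapped in a reusable function so that every age value goes through one conversion path. This is a minimal sketch, not part of the original task: the function name `parse_age` and the choice to return `None` on failure are assumptions made for illustration.

```python
def parse_age(raw_age):
    """Return raw_age as an int, or None if it cannot be converted."""
    try:
        return int(raw_age)
    except (ValueError, TypeError):
        # ValueError: non-numeric text; TypeError: unsupported types such as None
        print('Please provide your age as a numeric value.')
        return None


print(parse_age(32.0))
print(parse_age('thirty two'))
```

Catching specific exception types instead of using a bare `except:` keeps unrelated errors visible.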


# Task 5<br>
Finally, note that all favorite categories are stored in uppercase letters. To populate a new list called `fav_categories_low` with the same categories but in lowercase letters, iterate over the values in the `fav_categories` list, modify them, and append the new values to the `fav_categories_low` list. As always, print the final result.

In [282]:
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']
fav_categories_low = []

# iterate over the categories and append each lowercase value to the new list
for category in fav_categories:
    fav_categories_low.append(category.lower())

print(fav_categories_low)

['electronics', 'sport', 'books']
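
For comparison, the same new list can be built with a list comprehension. This is an alternative sketch rather than the form the task asks for. It also shows that the original `fav_categories` list is left unchanged.

```python
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']

# Build a new list; the original list is not modified
fav_categories_low = [category.lower() for category in fav_categories]

print(fav_categories_low)
print(fav_categories)
```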


# Task 6<br>

We have gathered additional information about our users' spending habits, including the amount spent in each of their favorite categories. The administration is interested in the following metrics:

- Total amount spent by the user
- Minimum amount spent
- Maximum amount spent

Let's calculate and print these values:

In [283]:
fav_categories_low = ['electronics', 'sport', 'books']
spendings_per_category = [894, 213, 173]

total_amount = sum(spendings_per_category)
max_amount = max(spendings_per_category)
min_amount = min(spendings_per_category)


print(total_amount)
print(max_amount)
print(min_amount)

1280
894
173
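
The maximum alone does not say which category it belongs to. As a small sketch beyond what the task asks, `zip()` can pair each amount with its category so that `max()` returns both. The names `top_amount` and `top_category` are illustrative.

```python
fav_categories_low = ['electronics', 'sport', 'books']
spendings_per_category = [894, 213, 173]

# Pair each amount with its category; max() compares the tuples by amount first
top_amount, top_category = max(zip(spendings_per_category, fav_categories_low))

print(top_category, top_amount)
```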


# Task 7<br>

The company wants to offer discounts to its loyal customers. Customers who make purchases totaling more than $1,500 are considered loyal and will receive a discount.

Our goal is to create a while loop that keeps adding purchases to the total amount spent and stops once the target amount is reached. To simulate new purchases, the `new_purchase` variable is assigned a random number between 30 and 80 on each iteration. This represents the amount of money spent on a new purchase, and is what you need to add to the total.

Once the target amount is reached and the while loop is finished, the final amount will be printed.

In [284]:
from random import randint

total_amount_spent = 0
target_amount = 1500

while total_amount_spent < target_amount:
    new_purchase = randint(30, 80)  # random purchase amount between 30 and 80
    total_amount_spent += new_purchase

print(total_amount_spent)

1508
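
Because the loop stops on the first total that is not below 1,500 and each purchase adds at most 80, the final amount always falls between 1,500 and 1,579. The sketch below also counts the purchases needed. The `purchase_count` variable and the fixed seed are assumptions added for illustration; the seed only makes repeated runs produce the same sequence.

```python
from random import randint, seed

seed(42)  # fixed seed for repeatable runs (illustrative choice)

total_amount_spent = 0
target_amount = 1500
purchase_count = 0  # hypothetical counter, not part of the original task

while total_amount_spent < target_amount:
    total_amount_spent += randint(30, 80)  # simulate one new purchase
    purchase_count += 1

print(total_amount_spent, purchase_count)
```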


# Task 8<br>

Now we have all the information about a customer in the form we want. The company's management has asked us to find a way to summarize all the information about a user. Their goal is to create a formatted string that uses information from the variables `user_id`, `user_name`, and `user_age`.

Here is the final string we want to create: `User 32415 is named mike and is 32 years old.`

In [285]:
user_id = '32415'
user_name = ['mike', 'reed']
user_age = 32

user_info = 'User {} is named {} and is {} years old.'.format(user_id, user_name[0], user_age)

print(user_info)

User 32415 is named mike and is 32 years old.
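
The same string can also be built with an f-string, which places the variables directly inside the text. This is an equivalent sketch offered for comparison, assuming Python 3.6 or later.

```python
user_id = '32415'
user_name = ['mike', 'reed']
user_age = 32

# Expressions inside the braces are evaluated and inserted into the string
user_info = f'User {user_id} is named {user_name[0]} and is {user_age} years old.'

print(user_info)
```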


As you may already know, companies collect and store data in a specific way. Store 1 wants to store all the information about its customers in a table.<br>

| user_id | user_name | user_age | purchase_category | spending_per_category |
| --- | --- | --- | --- | --- |
| '32415' | 'mike', 'reed' | 32 | 'electronics', 'sport', 'books' | 894, 213, 173 |
| '31980' | 'kate', 'morgan' | 24 | 'clothes', 'shoes' | 439, 390 |

In Python terms, such a table can be represented as a nested list that has one sublist for each user.

Store 1 created this table for its users. It is stored in the `users` variable. Each sublist contains the user's ID, first and last name, age, favorite categories, and the amount spent in each category.
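
To make the layout concrete, the sketch below builds the two-row table shown above and reads individual values by position. The name `first_user` is illustrative.

```python
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'shoes'], [439, 390]],
]

first_user = users[0]    # the whole row for the first user
print(first_user[0])     # user ID
print(first_user[1][0])  # first name
print(users[1][3])       # second user's categories
print(users[1][-1])      # second user's spendings (last column)
```

The first index selects the row (user) and the second selects the column (field), so `users[1][-1]` reads the last field of the second user.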

# Task 9<br>

To calculate the company's revenue, follow these steps:

1. Use a `for` loop to iterate over the `users` list.

2. Get the list of expenses for each user and add the amounts together.

3. Update the revenue value with the total for each user.

This will give you the company's total revenue, which you'll print out at the end.

In [286]:
users = [
    # first list
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
        [894, 213, 173]
    ], # end first list

    # second list
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'shoes'],
        [439, 390]
    ] # end second list
]

revenue = 0

for user in users:
    spendings_list = user[-1]  # extract the list of expenses for each user
    total_spendings = sum(spendings_list)  # add up the user's expenses
    revenue += total_spendings  # update revenue

print(revenue)

2109
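
Once the loop version is clear, the same calculation can be condensed with a generator expression. This is an alternative sketch, not the form the task asks for: the inner `sum()` totals one user's spendings and the outer `sum()` adds those totals together.

```python
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'shoes'], [439, 390]],
]

# Inner sum: one user's total; outer sum: total across all users
revenue = sum(sum(user[-1]) for user in users)

print(revenue)
```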


# Task 10<br>

Use a for loop to iterate through the list of users we provided and print out the names of customers under the age of 30.

In [287]:
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439,
     390]],
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'],
     [459, 120, 99]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics',
     'beauty'], [299, 679, 85]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234,
     329, 243]],
    ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213,
     659, 79]],
    ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'],
     [499, 189, 63]],
    ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'
     ], [259, 549, 109]],
    ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'],
     [329, 189, 329]],
    ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'
     ], [189, 299, 579]],
    ]


for user in users:
    if user[2]<30:
        print(' '.join(user[1]))

kate morgan
samantha smith
emily brown
jose martinez
james lee


# Task 11<br>

Let's combine tasks 9 and 10 and print out the names of users under 30 with total spending over $1,000.

In [288]:
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439,
     390]],
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'],
     [459, 120, 99]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics',
     'beauty'], [299, 679, 85]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234,
     329, 243]],
    ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213,
     659, 79]],
    ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'],
     [499, 189, 63]],
    ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'
     ], [259, 549, 109]],
    ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'],
     [329, 189, 329]],
    ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'
     ], [189, 299, 579]],
    ]


for user in users:
    spendings_list = user[-1]
    total_spendings = sum(spendings_list)
    if user[2] < 30 and total_spendings > 1000:
        print(' '.join(user[1]))

samantha smith
james lee
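
Positional indexes such as `user[2]` become hard to read as conditions grow. The sketch below unpacks each row into named variables instead. It uses a three-user subset of the table to stay short, and the names `matching_names`, `name`, `age`, and `spendings` are illustrative.

```python
# Three-row subset of the users table, for illustration only
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]],
]

matching_names = []
for user_id, name, age, categories, spendings in users:
    # Named fields make the filter read like the task description
    if age < 30 and sum(spendings) > 1000:
        matching_names.append(' '.join(name))

print(matching_names)
```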


# Task 12<br>

Now let's print the name and age of all users who purchased clothes. Print the name and age in the same print statement.

In [289]:
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439,
     390]],
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'],
     [459, 120, 99]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics',
     'beauty'], [299, 679, 85]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234,
     329, 243]],
    ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213,
     659, 79]],
    ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'],
     [499, 189, 63]],
    ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'
     ], [259, 549, 109]],
    ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'],
     [329, 189, 329]],
    ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'
     ], [189, 299, 579]],
    ]



for user in users:
    if 'clothes' in user[3]:
        print(' '.join(user[1]), user[2])

kate morgan 24
samantha smith 29
maria garcia 33
lisa wilson 35
james lee 28
