# Instructions

Your submission will be tested with the code tester. It is important to follow these instructions to ensure your work tests properly.

- Do not change the content of the cells under __SETUP__ and __TESTS__
- Work only in the __YOUR WORK__ area
- Rename the notebook with your group at the end (substitute XX with your group number).
- Assign the results of each numbered question to the appropriate test variable. For example, the answer of `1.` should be assigned to `test_1`
- Rounding: use the supplied function `hround` to round decimal numbers when instructed. It's important to use this function because there are [multiple ways to round numbers in Python](https://www.knowledgehut.com/blog/programming/python-rounding-numbers) and they may not result in the same value that the tester is testing against.
- Ensure you run the cells under __SETUP__ before you run your work.
- Before you submit your work, clean up your notebook. It has to run without errors in order to be tested. The easiest way to ensure this is to run `Kernel->Restart & Run All`.
- Answers are provided along with this notebook in eLC (look for a picture named `solution_key`) for your convenience.
- You will need to write a program to calculate the answers. Setting the answers to their correct values without solving them is considered *hardcoding* and will result in a grade of zero for the assignment as well as a potential academic honesty violation.
- You can also test your submission using [the online code tester](https://notebook-tester.safadi-puzzler.com/)
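
The rounding note above can be illustrated with a minimal sketch. It uses only the standard library and a value chosen for illustration (`0.125`, which is exactly representable as a float); it is not the tester's own logic, only a demonstration that two common rounding approaches can disagree on a tie.

```python
from decimal import Decimal, ROUND_HALF_UP

value = 0.125  # illustrative value, exactly representable in binary floating point

# Built-in round() resolves ties toward the even digit ("banker's rounding").
builtin_result = round(value, 2)

# Decimal with ROUND_HALF_UP resolves ties away from zero.
half_up_result = Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(builtin_result)  # 0.12
print(half_up_result)  # 0.13
```

Because `hround` wraps the built-in `round`, using it keeps your results on the same tie-breaking rule as the supplied function.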


# SETUP

In [1]:
import pandas as pd
import numpy as np

In [2]:
# DO NOT EDIT OR CHANGE THE CONTENT OF THIS CELL
scenario = 0

In [3]:
def hround(number):
    return round(number, 2 - scenario)

In [4]:
test_1=test_2=test_3=test_4=test_5=test_6=test_7=test_8=test_9=test_10=0.0
test_11=test_12=test_13=test_14=test_15=test_16=test_17=test_18=test_19=test_20=0.0

In this homework, we have data from an accounting system with the following columns:

- `tech_approval_required`: a binary variable. Orders requiring technical approval are marked with 1
- `requester_id`: the ID of the requester.
- `role`: the requester's role. Requesters from the IT department are labeled `tech`.
- `product`: the type of product.
- `quantity`: quantity ordered
- `price`: price each
- `total`: total price


In [5]:
part1 = pd.read_csv('orders.csv')
part1.head()

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
0,0,E2300,tech,Desk,1,664,664
1,0,E2300,tech,Keyboard,9,649,5841
2,0,E2374,non-tech,Keyboard,1,821,821
3,1,E2374,non-tech,Desktop Computer,24,655,15720
4,0,E2327,non-tech,Desk,1,758,758
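
As a sketch for experimenting without `orders.csv`, the five rows displayed above can be typed into a small data frame by hand. The name `sample` is hypothetical and the frame holds only these five rows, so any answers computed from it will differ from those for the full data set.

```python
import pandas as pd

# The five rows shown by part1.head(), entered manually (hypothetical name: sample).
sample = pd.DataFrame(
    {
        "tech_approval_required": [0, 0, 0, 1, 0],
        "requester_id": ["E2300", "E2300", "E2374", "E2374", "E2327"],
        "role": ["tech", "tech", "non-tech", "non-tech", "non-tech"],
        "product": ["Desk", "Keyboard", "Keyboard", "Desktop Computer", "Desk"],
        "quantity": [1, 9, 1, 24, 1],
        "price": [664, 649, 821, 655, 758],
        "total": [664, 5841, 821, 15720, 758],
    }
)

print(sample.shape)  # (5, 7)

# In these rows, total equals quantity times price.
print((sample["quantity"] * sample["price"] == sample["total"]).all())  # True
```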


# Part 1

Focusing on the data frame `part1`

1. Report the number of rows.
2. Report the number of columns.
3. What is the total number of values in the data frame?
4. Select the first five rows (return a data frame).
5. Select the last five rows (return a data frame).
6. Select every 100th row, starting from the first row (return a data frame).
7. We want to double-check that the total values are accurate. Calculate a new column `total2` by multiplying `price` by `quantity`, then show the first five rows of `total` and `total2` (return a data frame).
8. Check whether every value of `total` equals the corresponding value of `total2` (return a `bool`).
9. How many orders required technical approval?
10. What is the most expensive order (highest total)? Return a series representing the order row from the data frame.
11. What is the average price of `Desk`? Round the number with `hround`.
12. Report the requester ids that end with `0`. Return a sorted list.
13. Who requested the order with the largest quantity? Return the requester id.
14. What are the distinct roles? Return a sorted list.
15. Who ordered the most keyboards (in terms of total quantity)? Report the requester id.

# Part 2

Focusing on the data frame `part2`. 

The file `orders_corrupt.csv` was obtained by running OCR on a scanned image.
The OCR had issues in some entries, mistaking the digit `1` for the letter `l`.
As a result, some numerical entries in the file are corrupt.

16. What are the `dtypes` of `part2`?
17. Replace the corrupt numbers with `na`. Show the first six rows.
18. What are the `dtypes` now?
19. How many rows are corrupt?
20. Drop the corrupt rows. Show the first five rows in the resulting data frame.
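
One way to approach question 17 is `pd.to_numeric` with `errors="coerce"`, which converts any value it cannot parse into `NaN`. The sketch below runs on a hypothetical three-value series (`raw_total`), not on `orders_corrupt.csv`, and is only one possible approach.

```python
import pandas as pd

# Hypothetical column in which OCR turned one digit 1 into the letter l.
raw_total = pd.Series(["664", "584l", "821"])

# errors="coerce" replaces every unparseable entry with NaN instead of raising.
clean_total = pd.to_numeric(raw_total, errors="coerce")

print(clean_total.dtype)         # float64
print(clean_total.isna().sum())  # 1
```

The column becomes `float64` rather than an integer type because `NaN` is a floating-point value.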

# YOUR WORK

## Part 1

In [6]:
# work on the remaining items below

In [7]:
# you can add as many cells as needed

In [8]:
#1

num_rows = len(part1)
num_rows

1000

In [9]:
#2

num_columns = len(part1.columns)
num_columns

7

In [10]:
#3

total_values = num_rows * num_columns
total_values

7000
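
The three counts above can also be read from the `shape` and `size` attributes of a data frame. A minimal sketch on a hypothetical toy frame (`toy`):

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})  # hypothetical data

n_rows, n_cols = toy.shape  # (rows, columns)
n_values = toy.size         # rows * columns

print(n_rows, n_cols, n_values)  # 3 2 6
```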

In [11]:
#4

first_five = part1.head()
first_five

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
0,0,E2300,tech,Desk,1,664,664
1,0,E2300,tech,Keyboard,9,649,5841
2,0,E2374,non-tech,Keyboard,1,821,821
3,1,E2374,non-tech,Desktop Computer,24,655,15720
4,0,E2327,non-tech,Desk,1,758,758


In [12]:
#5

last_five = part1.tail()
last_five

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
995,1,E2364,non-tech,Laptop Computer,1,116,116
996,1,E2357,non-tech,Laptop Computer,1,1132,1132
997,0,E2330,non-tech,Keyboard,2,804,1608
998,0,E2384,non-tech,Desk,3,270,810
999,0,E2343,non-tech,Mouse,1,236,236


In [13]:
#6

every_other = part1[::100]
every_other

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
0,0,E2300,tech,Desk,1,664,664
100,0,E2354,non-tech,Desk,3,134,402
200,0,E2321,non-tech,Cleaning,5,48,240
300,0,E2392,non-tech,Chair,2,746,1492
400,0,E2343,non-tech,Desk,1,364,364
500,0,E2396,non-tech,Cleaning,1,1036,1036
600,0,E2334,non-tech,Desk,3,222,666
700,0,E2374,non-tech,Keyboard,2,138,276
800,0,E2344,non-tech,Desk,1,49,49
900,0,E2355,non-tech,Desk,7,348,2436
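
The slice `[::100]` starts at the first row and then takes every 100th row after it, which is why the index runs 0, 100, 200, and so on. A smaller sketch with a step of 3 on a hypothetical ten-row frame (`toy`) shows the same pattern:

```python
import pandas as pd

toy = pd.DataFrame({"x": range(10)})  # hypothetical data, index 0..9

stepped = toy[::3]            # slice with a step selects rows by position
stepped_iloc = toy.iloc[::3]  # equivalent, explicitly positional

print(list(stepped.index))  # [0, 3, 6, 9]
```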


In [14]:
#7

part1['total2'] = part1['quantity'] * part1['price']

first_five_totals = part1[['total', 'total2']].head()
first_five_totals

Unnamed: 0,total,total2
0,664,664
1,5841,5841
2,821,821
3,15720,15720
4,758,758


In [15]:
#8

t1_equals_t2 = (part1['total'] == part1['total2']).all()   
t1_equals_t2

True

In [16]:
#9

approval_required = part1['tech_approval_required'].sum()
approval_required

193

In [17]:
#10

max_total = part1.loc[part1['total'].idxmax()].drop('total2')
max_total

tech_approval_required                   1
requester_id                         E2358
role                              non-tech
product                   Desktop Computer
quantity                                21
price                                 1082
total                                22722
Name: 11, dtype: object
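
When looking up the row that holds a maximum, `Series.idxmax` returns the index label of the first occurrence, which can then be passed to `.loc`. A minimal sketch on hypothetical data (`toy`) that contains a tie:

```python
import pandas as pd

# Hypothetical orders; rows 1 and 2 share the highest total.
toy = pd.DataFrame({"requester_id": ["A", "B", "C"], "total": [10, 30, 30]})

top_label = toy["total"].idxmax()  # label of the first maximum
top_row = toy.loc[top_label]       # the whole row as a Series

print(top_label)                # 1
print(top_row["requester_id"])  # B
```

If ties matter for a question, a boolean filter such as `toy[toy["total"] == toy["total"].max()]` keeps all tied rows instead of only the first.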

In [18]:
#11

avg_desk_price = hround(part1.loc[part1['product'] == 'Desk','price'].mean())
avg_desk_price

622.85

In [19]:
# 12
sorted_list = sorted(part1.loc[part1['requester_id'].str.endswith('0'), 'requester_id'].unique())
sorted_list

['E2300',
 'E2310',
 'E2330',
 'E2340',
 'E2350',
 'E2360',
 'E2370',
 'E2380',
 'E2390',
 'E2400']

In [20]:
#13

requestor_id_max = part1.loc[part1['quantity'].idxmax(), 'requester_id']
requestor_id_max

'E2329'

In [21]:
#14

distinct_role = sorted(part1['role'].unique())
distinct_role

['non-tech', 'tech']

In [22]:
#15

keyboard_data = part1.loc[part1['product'] == 'Keyboard']
grouped_quantity = keyboard_data.groupby('requester_id')['quantity'].sum()
most_keyboards = grouped_quantity.idxmax()
most_keyboards

'E2341'
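
The order of operations in question 15 matters: the product filter has to be applied before the quantities are summed per requester. A sketch on hypothetical data (`toy`), where requester `A` orders the most items overall but `B` orders the most keyboards:

```python
import pandas as pd

# Hypothetical orders, not taken from orders.csv.
toy = pd.DataFrame(
    {
        "requester_id": ["A", "B", "A", "B"],
        "product": ["Keyboard", "Keyboard", "Desk", "Keyboard"],
        "quantity": [5, 2, 9, 4],
    }
)

keyboards = toy[toy["product"] == "Keyboard"]                     # filter first
per_requester = keyboards.groupby("requester_id")["quantity"].sum()  # then sum

print(per_requester.idxmax())  # B
```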

## Part 2

In [23]:
part2 = pd.read_csv('orders_corrupt.csv')
part2.head()

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
0,0,E2300,tech,Desk,1,664,664
1,0,E2300,tech,Keyboard,9,649,584l
2,0,E2374,non-tech,Keyboard,1,821,821
3,1,E2374,non-tech,Desktop Computer,24,655,15720
4,0,E2327,non-tech,Desk,1,758,758


In [24]:
#16

part2_dtypes = part2.dtypes
part2_dtypes

tech_approval_required    object
requester_id              object
role                      object
product                   object
quantity                  object
price                     object
total                     object
dtype: object

In [25]:
#17

numeric_cols = ['price', 'total', 'quantity', 'tech_approval_required']
part2[numeric_cols] = part2[numeric_cols].apply(pd.to_numeric, errors='coerce')
fixed = part2.head(6)
fixed

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
0,0.0,E2300,tech,Desk,1.0,664.0,664.0
1,0.0,E2300,tech,Keyboard,9.0,649.0,
2,0.0,E2374,non-tech,Keyboard,1.0,821.0,821.0
3,1.0,E2374,non-tech,Desktop Computer,24.0,655.0,15720.0
4,0.0,E2327,non-tech,Desk,1.0,758.0,758.0
5,0.0,E2354,non-tech,Desk,1.0,576.0,576.0


In [26]:
#18

dtypes_again = part2.dtypes
dtypes_again

tech_approval_required    float64
requester_id               object
role                       object
product                    object
quantity                  float64
price                     float64
total                     float64
dtype: object

In [27]:
#19

# count rows that contain at least one NaN, not individual NaN cells
total_corrupt = part2.isna().any(axis=1).sum()
total_corrupt

7
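
Question 19 asks about rows, and counting missing cells gives the same number as counting rows only when no row has more than one missing value. A sketch on hypothetical data (`toy`) where the two counts differ:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has two missing values, row 2 has one.
toy = pd.DataFrame(
    {"quantity": [1.0, np.nan, 3.0], "total": [664.0, np.nan, np.nan]}
)

missing_cells = toy.isna().sum().sum()            # every NaN cell
rows_with_missing = toy.isna().any(axis=1).sum()  # rows with at least one NaN

print(missing_cells)      # 3
print(rows_with_missing)  # 2
```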

In [28]:
#20

final = part2.dropna().head(5)
final

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
0,0.0,E2300,tech,Desk,1.0,664.0,664.0
2,0.0,E2374,non-tech,Keyboard,1.0,821.0,821.0
3,1.0,E2374,non-tech,Desktop Computer,24.0,655.0,15720.0
4,0.0,E2327,non-tech,Desk,1.0,758.0,758.0
5,0.0,E2354,non-tech,Desk,1.0,576.0,576.0


# TESTS

In [29]:
### TEST 1
test_1 = num_rows
test_1

1000

In [30]:
## TEST 2
test_2 = num_columns
test_2

7

In [31]:
## TEST 3
test_3 = total_values
test_3

7000

In [32]:
## TEST 4
test_4 = first_five
test_4

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
0,0,E2300,tech,Desk,1,664,664
1,0,E2300,tech,Keyboard,9,649,5841
2,0,E2374,non-tech,Keyboard,1,821,821
3,1,E2374,non-tech,Desktop Computer,24,655,15720
4,0,E2327,non-tech,Desk,1,758,758


In [33]:
## TEST 5
test_5 = last_five
test_5

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
995,1,E2364,non-tech,Laptop Computer,1,116,116
996,1,E2357,non-tech,Laptop Computer,1,1132,1132
997,0,E2330,non-tech,Keyboard,2,804,1608
998,0,E2384,non-tech,Desk,3,270,810
999,0,E2343,non-tech,Mouse,1,236,236


In [34]:
## TEST 6
test_6 = every_other
test_6

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
0,0,E2300,tech,Desk,1,664,664
100,0,E2354,non-tech,Desk,3,134,402
200,0,E2321,non-tech,Cleaning,5,48,240
300,0,E2392,non-tech,Chair,2,746,1492
400,0,E2343,non-tech,Desk,1,364,364
500,0,E2396,non-tech,Cleaning,1,1036,1036
600,0,E2334,non-tech,Desk,3,222,666
700,0,E2374,non-tech,Keyboard,2,138,276
800,0,E2344,non-tech,Desk,1,49,49
900,0,E2355,non-tech,Desk,7,348,2436


In [35]:
## TEST 7
test_7 = first_five_totals
test_7

Unnamed: 0,total,total2
0,664,664
1,5841,5841
2,821,821
3,15720,15720
4,758,758


In [36]:
## TEST 8
test_8 = t1_equals_t2
test_8

True

In [37]:
## TEST 9
test_9 = approval_required
test_9

193

In [38]:
## TEST 10
test_10 = max_total
test_10

tech_approval_required                   1
requester_id                         E2358
role                              non-tech
product                   Desktop Computer
quantity                                21
price                                 1082
total                                22722
Name: 11, dtype: object

In [39]:
## TEST 11
test_11 = avg_desk_price
test_11

622.85

In [40]:
## TEST 12
test_12 = sorted_list
test_12

['E2300',
 'E2310',
 'E2330',
 'E2340',
 'E2350',
 'E2360',
 'E2370',
 'E2380',
 'E2390',
 'E2400']

In [41]:
## TEST 13
test_13 = requestor_id_max
test_13

'E2329'

In [42]:
## TEST 14
test_14 = distinct_role
test_14

['non-tech', 'tech']

In [43]:
## TEST 15
test_15 = most_keyboards
test_15

'E2341'

In [44]:
## TEST 16
test_16 = part2_dtypes
test_16

tech_approval_required    object
requester_id              object
role                      object
product                   object
quantity                  object
price                     object
total                     object
dtype: object

In [45]:
## TEST 17
test_17 = fixed
test_17

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
0,0.0,E2300,tech,Desk,1.0,664.0,664.0
1,0.0,E2300,tech,Keyboard,9.0,649.0,
2,0.0,E2374,non-tech,Keyboard,1.0,821.0,821.0
3,1.0,E2374,non-tech,Desktop Computer,24.0,655.0,15720.0
4,0.0,E2327,non-tech,Desk,1.0,758.0,758.0
5,0.0,E2354,non-tech,Desk,1.0,576.0,576.0


In [46]:
## TEST 18
test_18 = dtypes_again
test_18

tech_approval_required    float64
requester_id               object
role                       object
product                    object
quantity                  float64
price                     float64
total                     float64
dtype: object

In [47]:
## TEST 19
test_19 = total_corrupt
test_19

7

In [48]:
## TEST 20
test_20 = final
test_20

Unnamed: 0,tech_approval_required,requester_id,role,product,quantity,price,total
0,0.0,E2300,tech,Desk,1.0,664.0,664.0
2,0.0,E2374,non-tech,Keyboard,1.0,821.0,821.0
3,1.0,E2374,non-tech,Desktop Computer,24.0,655.0,15720.0
4,0.0,E2327,non-tech,Desk,1.0,758.0,758.0
5,0.0,E2354,non-tech,Desk,1.0,576.0,576.0
