# Task 1: Sorting Groceries

## Prompt:
Mimicking the behavior of the Apple Reminders app when creating a "Groceries" reminder list, create a programme that, when passed a grocery item name (e.g. "oranges", "eggs"), returns the category it belongs to (e.g. "Fruits and Vegetables", "Eggs and Dairy", respectively).

You should start by collecting and processing the data needed for this task. The focus of this task is on these data collection and processing steps: what kind of data do you need, how will you process it, how much of it will you need, etc.

### ML Classifier vs. Neural Network?

My data:
- only need product name & category columns
- item names are short (1-4 words)
- small number of categories
- text input → text classification

So, a traditional ML classifier should be enough (no need for a neural network) → it learns patterns from the labelled data to predict the correct answer (category)

Steps:

1. Build a training dataset with item name and category
2. Tokenize the item names into words
3. Vectorize the tokens into numbers, since the model cannot work with raw words
4. Train the classifier to learn which words belong to which category
5. Predict the category of new items
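Steps 2 to 5 above can be sketched with a scikit-learn pipeline. This is a minimal sketch, not the final model: it assumes scikit-learn is installed, uses a tiny made-up training set in place of the real data, and picks `TfidfVectorizer` (tokenizing + vectorizing) and `MultinomialNB` (the classifier) as one reasonable pairing. Logistic regression or a linear SVM would slot into the same pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Step 1: a tiny, made-up training set of (item name, category) pairs
names = [
    "oranges", "fresh apples", "spinach", "red onions",
    "eggs", "cheddar cheese", "whole milk", "greek yogurt",
    "chicken breast", "salmon fillet", "beef mince", "prawns",
]
labels = (
    ["Fruits & Vegetables"] * 4
    + ["Dairy & Eggs"] * 4
    + ["Meat & Seafood"] * 4
)

# Steps 2-3: TfidfVectorizer lowercases and tokenizes each name,
# then turns it into a numeric TF-IDF vector
# Step 4: MultinomialNB learns which words are associated with which category
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(names, labels)

# Step 5: predict categories for new, unseen item names
print(model.predict(["apples", "cheese", "salmon"]))
```

`make_pipeline` keeps the vectorizer and classifier together, so new item names go through exactly the same tokenization at prediction time as during training.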

Categories:

1. Fruits & Vegetables
2. Dairy & Eggs
3. Meat & Seafood
4. Grains & Staples
5. Bakery
6. Snacks
7. Beverages
8. Electronics
9. Household
10. Clothing & Lifestyle
11. Personal Care & Health
12. Stationery & Books
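Because each source file uses its own label names, it helps to check after remapping that every label is one of the 12 categories above. A minimal sketch, assuming the category names are spelled as in the remapping code below (e.g. "Dairy & Eggs"); `TARGET_CATEGORIES` and `unmapped_categories` are hypothetical names, and the sample labels are made up:

```python
import pandas as pd

# The 12 target categories listed above
TARGET_CATEGORIES = {
    "Fruits & Vegetables", "Dairy & Eggs", "Meat & Seafood", "Grains & Staples",
    "Bakery", "Snacks", "Beverages", "Electronics", "Household",
    "Clothing & Lifestyle", "Personal Care & Health", "Stationery & Books",
}


def unmapped_categories(labels: pd.Series) -> list[str]:
    """Hypothetical helper: return the distinct labels that are not target categories."""
    return sorted(set(labels.dropna()) - TARGET_CATEGORIES)


# Made-up example: two labels still need a mapping
labels = pd.Series(["Bakery", "Dairy", "Oils & Fats", "Snacks"])
print(unmapped_categories(labels))
```

Running this on a dataset's category column after `replace()` shows any source label that was missed or renamed to something outside the list.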

In [10]:
```python
import pandas as pd

# load the raw order data and remove rows with missing (NULL) values
dataset1 = pd.read_csv("data1.csv")
dataset1.dropna(inplace=True)

# print() shows a truncated preview; to_string() would print the ENTIRE table
# print(dataset1.to_string())
print(dataset1)
```

```
      order_id  user_id  order_date      time  order_hour_of_day  \
0         1000      341  2024-09-04  15:43:00                 15   
1         1000      341  2024-09-04  15:43:00                 15   
2         1000      341  2024-09-04  15:43:00                 15   
3         1000      341  2024-09-04  15:43:00                 15   
4         1001      324  2024-03-07   8:45:00                  8   
...        ...      ...         ...       ...                ...   
9995      2989      271  2024-11-29  14:22:00                 14   
9996      2989      271  2024-11-29  14:22:00                 14   
9997      2989      271  2024-11-29  14:22:00                 14   
9998      2990      400  2024-02-09  20:08:00                 20   
9999      2990      400  2024-02-09  20:08:00                 20   

            product_name  quantity   price             category  product_id  
0            Wheat Flour         1  333.44     Grains & Staples         457  
1     Dishwashing Liquid   
```
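`dropna()` on the whole table drops a row if *any* column is missing, including columns the classifier never uses (order date, price, etc.). Since only the product name and category are needed, one option is to select those two columns first and only then drop missing values, which can keep more usable rows. A sketch with a small made-up frame standing in for `data1.csv`:

```python
import pandas as pd

# Made-up rows standing in for data1.csv
raw = pd.DataFrame({
    "order_id": [1000, 1000, 1001],
    "product_name": ["Wheat Flour", "Toothpaste", None],
    "price": [333.44, None, 120.0],
    "category": ["Grains & Staples", "Personal Care", "Bakery"],
})

# Dropping on all columns also loses the row whose (unused) price is missing
print(len(raw.dropna()))

# Keep only the columns the classifier needs, then drop missing values
subset = raw[["product_name", "category"]].dropna()
print(subset)
```

`raw.dropna(subset=["product_name", "category"])` gives the same rows while keeping every column, if the other columns are still wanted.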

In [11]:
```python
# load the second source (inventory data) and remove rows with missing values
dataset2 = pd.read_csv("data2.csv")
dataset2.dropna(inplace=True)
print(dataset2)
```

```
      Product_ID     Product_Name             Catagory  Supplier_ID  \
0    29-205-1132       Sushi Rice      Grains & Pulses  38-037-1699   
1    40-681-9981   Arabica Coffee            Beverages  54-470-2479   
2    06-955-3428       Black Rice      Grains & Pulses  54-031-2945   
3    71-594-6552  Long Grain Rice      Grains & Pulses  63-492-7603   
4    57-437-1828             Plum  Fruits & Vegetables  54-226-4308   
..           ...              ...                  ...          ...   
985  82-977-7752          Spinach  Fruits & Vegetables  57-473-8672   
986  62-393-9939   Cheddar Cheese                Dairy  93-877-9384   
987  31-745-6850          Cabbage  Fruits & Vegetables  96-215-2767   
988  86-692-2312      Avocado Oil          Oils & Fats  77-783-4107   
989  28-044-4102           Papaya  Fruits & Vegetables  93-358-1118   

    Supplier_Name  Stock_Quantity  Reorder_Level  Reorder_Quantity Unit_Price  \
0       Jaxnation              22             72                70
```
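To judge how much data is needed, a quick check is how many examples each category has: a category with very few names will be hard for the classifier to learn. A minimal sketch with made-up labels; `MIN_EXAMPLES` is a hypothetical threshold, not a recommended value:

```python
import pandas as pd

# Made-up labels standing in for a category column
labels = pd.Series([
    "Beverages", "Beverages", "Bakery",
    "Fruits & Vegetables", "Fruits & Vegetables", "Fruits & Vegetables",
])

MIN_EXAMPLES = 2  # hypothetical threshold; choose one that suits the data

# Number of rows per category, largest first
counts = labels.value_counts()
print(counts)

# Categories that probably need more training examples
too_few = counts[counts < MIN_EXAMPLES].index.tolist()
print(too_few)
```

On the real data this would be run on `dataset1["category"]` or `dataset2["Catagory"]` after remapping.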

In [None]:
```python
# earlier loop-based version, kept for reference:
# for x in dataset1.index:
#   if dataset1.loc[x, "category"] == "Stationery" or dataset1.loc[x, "category"] == "Books":
#     dataset1.loc[x, "category"] = "Stationery & Books"
#   elif dataset1.loc[x, "category"] == "Clothing" or dataset1.loc[x, "category"] == "Footwear" or dataset1.loc[x, "category"] == "Clothing Accessories":
#     dataset1.loc[x, "category"] = "Clothing & Lifestyle"
#   elif dataset1.loc[x, "category"] == "Personal Care" or dataset1.loc[x, "category"] == "Health & Wellness":
#     dataset1.loc[x, "category"] = "Personal Care & Health"

# simplified version: map the source labels onto the target categories
dataset1["category"] = dataset1["category"].replace({
    "Stationery": "Stationery & Books",
    "Books": "Stationery & Books",
    "Clothing": "Clothing & Lifestyle",
    "Footwear": "Clothing & Lifestyle",
    "Clothing Accessories": "Clothing & Lifestyle",
    "Personal Care": "Personal Care & Health",
    "Health & Wellness": "Personal Care & Health",
    "Snacks": "Snacks & Desserts",
})

print(dataset1)

# save the remapped data as a new CSV file
dataset1.to_csv("dataset1.csv", index=False)
```

```
      order_id  user_id  order_date      time  order_hour_of_day  \
0         1000      341  2024-09-04  15:43:00                 15   
1         1000      341  2024-09-04  15:43:00                 15   
2         1000      341  2024-09-04  15:43:00                 15   
3         1000      341  2024-09-04  15:43:00                 15   
4         1001      324  2024-03-07   8:45:00                  8   
...        ...      ...         ...       ...                ...   
9995      2989      271  2024-11-29  14:22:00                 14   
9996      2989      271  2024-11-29  14:22:00                 14   
9997      2989      271  2024-11-29  14:22:00                 14   
9998      2990      400  2024-02-09  20:08:00                 20   
9999      2990      400  2024-02-09  20:08:00                 20   

            product_name  quantity   price              category  product_id  
0            Wheat Flour         1  333.44      Grains & Staples         457  
1     Dishwashing Liquid 
```
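Since data1.csv is order data, the same product can appear in many different orders, so many rows repeat the same (name, category) pair. For a name-to-category training set each pair only needs to appear once, and a name that ends up with two different categories needs a manual decision. A sketch with made-up rows, assuming names are normalised by trimming whitespace and lowercasing (an illustrative choice):

```python
import pandas as pd

# Made-up order lines: the same product appears in several orders
orders = pd.DataFrame({
    "product_name": ["Wheat Flour", "Wheat Flour", " wheat flour", "Milk", "Milk"],
    "category": ["Grains & Staples", "Grains & Staples", "Grains & Staples",
                 "Dairy & Eggs", "Beverages"],
})

# Normalise names so trivial variants (case, stray spaces) collapse together
orders["product_name"] = orders["product_name"].str.strip().str.lower()

# Keep one row per (name, category) pair
unique_items = orders.drop_duplicates(subset=["product_name", "category"])

# Names that still map to more than one category
conflicts = unique_items.groupby("product_name")["category"].nunique()
print(conflicts[conflicts > 1].index.tolist())
```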

In [20]:
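```python
# map dataset2's labels onto the target categories
# ("Catagory" is the column name as spelled in data2.csv)
dataset2["Catagory"] = dataset2["Catagory"].replace({
    "Oils & Fats": "Grains & Staples",
    "Grains & Pulses": "Grains & Staples",
    "Dairy": "Dairy & Eggs",
    "Seafood": "Meat & Seafood",
})

print(dataset2)

# save the remapped data as a new CSV file
dataset2.to_csv("dataset2.csv", index=False)
```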

```
      Product_ID     Product_Name             Catagory  Supplier_ID  \
0    29-205-1132       Sushi Rice     Grains & Staples  38-037-1699   
1    40-681-9981   Arabica Coffee            Beverages  54-470-2479   
2    06-955-3428       Black Rice     Grains & Staples  54-031-2945   
3    71-594-6552  Long Grain Rice     Grains & Staples  63-492-7603   
4    57-437-1828             Plum  Fruits & Vegetables  54-226-4308   
..           ...              ...                  ...          ...   
985  82-977-7752          Spinach  Fruits & Vegetables  57-473-8672   
986  62-393-9939   Cheddar Cheese         Dairy & Eggs  93-877-9384   
987  31-745-6850          Cabbage  Fruits & Vegetables  96-215-2767   
988  86-692-2312      Avocado Oil     Grains & Staples  77-783-4107   
989  28-044-4102           Papaya  Fruits & Vegetables  93-358-1118   

    Supplier_Name  Stock_Quantity  Reorder_Level  Reorder_Quantity Unit_Price  \
0       Jaxnation              22             72                70
```
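The two sources name their columns differently (`product_name`/`category` vs. `Product_Name`/`Catagory`), so they need matching column names before they can be stacked into one training table. A minimal sketch with small stand-in frames; the "Toothpaste" row is made up, the other rows are copied from the outputs above:

```python
import pandas as pd

# Small stand-ins for the cleaned dataset1.csv and dataset2.csv
ds1 = pd.DataFrame({
    "product_name": ["Wheat Flour", "Toothpaste"],  # "Toothpaste" is made up
    "category": ["Grains & Staples", "Personal Care & Health"],
})
ds2 = pd.DataFrame({
    "Product_ID": ["29-205-1132", "57-437-1828"],
    "Product_Name": ["Sushi Rice", "Plum"],
    "Catagory": ["Grains & Staples", "Fruits & Vegetables"],
})

# Give dataset2 the same column names as dataset1
ds2 = ds2.rename(columns={"Product_Name": "product_name", "Catagory": "category"})

# Stack both sources into one name -> category table with a fresh index
combined = pd.concat(
    [ds1[["product_name", "category"]], ds2[["product_name", "category"]]],
    ignore_index=True,
)
print(combined)
```

Selecting the two columns before `concat` also drops dataset2's supplier and stock columns, which the classifier does not use.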