### I. About the dataset:
This dataset of 1,000 rows captures sales transactions from a local restaurant in India between April 2022 and March 2023. Each row records the order ID, the date of the transaction, the item name (a food or beverage item), the item type (Fastfood or Beverages), the item price, the quantity ordered, the transaction amount, the transaction type (cash, online, or others), the gender of the staff member who received the order, and the time of the sale (Morning, Afternoon, Evening, Night, Midnight). The dataset offers a snapshot of the restaurant's daily operations and customer behavior.

* **Columns**:
1. *order_id*: a unique identifier for each order.
2. *date*: date of the transaction.
3. *item_name*: name of the food or beverage item.
4. *item_type*: category of the item (Fastfood or Beverages).
5. *item_price*: price of one unit of the item.
6. *quantity*: number of units the customer ordered.
7. *transaction_amount*: the total amount paid by the customer for the order.
8. *transaction_type*: payment method (Cash, Online).
9. *received_by*: gender of the person handling the transaction, recorded as a title (for example `Mr.`).
10. *time_of_sale*: time of day of the sale (Morning, Afternoon, Evening, Night, Midnight).
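
Given the column definitions above, one plausible sanity check is that `transaction_amount` equals `item_price * quantity`. The sketch below tests that on the five rows shown in the `df.head()` preview of this notebook; the helper name `find_amount_mismatches` is hypothetical, and whether the relationship holds for all 1,000 rows is an assumption that would need to be checked on the full file.

```python
import pandas as pd

# Five rows copied from the df.head() preview in this notebook.
sample = pd.DataFrame({
    "item_price": [20, 20, 20, 25, 25],
    "quantity": [13, 15, 1, 6, 8],
    "transaction_amount": [260, 300, 20, 150, 200],
})


def find_amount_mismatches(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the rows where transaction_amount != item_price * quantity."""
    expected = frame["item_price"] * frame["quantity"]
    return frame[frame["transaction_amount"] != expected]


mismatches = find_amount_mismatches(sample)
print(f"Rows with inconsistent amounts: {len(mismatches)}")
```

On the real data the same helper could be called as `find_amount_mismatches(df)`; any rows it returns would deserve a closer look before analysis.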

### II. Foreseeable challenges:

The data covers only about one year (April 2022 to March 2023), so any prediction rests on that single year. Events that recur every two or four years (such as the World Cup or other championships) may not be represented at all.

In [None]:
import pandas as pd

# Raw string (r"...") so the backslashes in the Windows path are not read as escape sequences.
df = pd.read_csv(r"D:\Local Restaurant Sales\Local-Restautant-Sales\Balaji Fast Food Sales.csv")
df.head()


|   | order_id | date | item_name | item_type | item_price | quantity | transaction_amount | transaction_type | received_by | time_of_sale |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 07/03/2022 | Aalopuri | Fastfood | 20 | 13 | 260 | NaN | Mr. | Night |
| 1 | 2 | 8/23/2022 | Vadapav | Fastfood | 20 | 15 | 300 | Cash | Mr. | Afternoon |
| 2 | 3 | 11/20/2022 | Vadapav | Fastfood | 20 | 1 | 20 | Cash | Mr. | Afternoon |
| 3 | 4 | 02/03/2023 | Sugarcane juice | Beverages | 25 | 6 | 150 | Online | Mr. | Night |
| 4 | 5 | 10/02/2022 | Sugarcane juice | Beverages | 25 | 8 | 200 | Online | Mr. | Evening |


In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   order_id            1000 non-null   int64 
 1   date                1000 non-null   object
 2   item_name           1000 non-null   object
 3   item_type           1000 non-null   object
 4   item_price          1000 non-null   int64 
 5   quantity            1000 non-null   int64 
 6   transaction_amount  1000 non-null   int64 
 7   transaction_type    893 non-null    object
 8   received_by         1000 non-null   object
 9   time_of_sale        1000 non-null   object
dtypes: int64(4), object(6)
memory usage: 78.2+ KB
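
The non-null counts in the `df.info()` output can also be turned into explicit missing-value counts. The following is a minimal sketch on two columns of the five preview rows (where the first `transaction_type` is empty); the variable names are illustrative, and on the full data the same calls would be applied to `df`.

```python
import pandas as pd

# Two columns of the five preview rows; the first transaction_type is missing.
sample = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "transaction_type": [None, "Cash", "Cash", "Online", "Online"],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Share of rows with a missing payment method.
missing_share = sample["transaction_type"].isna().mean()

print(missing_per_column)
print(f"Missing transaction_type: {missing_share:.1%}")
```

For the full dataset, the counts reported by `df.info()` (893 non-null out of 1000) imply 107 missing `transaction_type` values.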


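`transaction_type` has 107 missing values (893 non-null out of 1000); all other columns are complete.

Because `transaction_type` only takes the values Cash and Online, I will fill each missing value with the value from the preceding row (forward fill), and then fill anything still missing with the value from the following row (backward fill).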

In [2]:
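# Forward-fill, then back-fill to cover a missing value in the first row.
# DataFrame.ffill()/bfill() replace the deprecated fillna(method=...).
df = df.ffill().bfill()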
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   order_id            1000 non-null   int64 
 1   date                1000 non-null   object
 2   item_name           1000 non-null   object
 3   item_type           1000 non-null   object
 4   item_price          1000 non-null   int64 
 5   quantity            1000 non-null   int64 
 6   transaction_amount  1000 non-null   int64 
 7   transaction_type    1000 non-null   object
 8   received_by         1000 non-null   object
 9   time_of_sale        1000 non-null   object
dtypes: int64(4), object(6)
memory usage: 78.2+ KB
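
Since only `transaction_type` has gaps, the fill could also be restricted to that one column, which makes the intent explicit. The sketch below shows this on the five preview rows, next to an alternative that keeps the gaps visible under a placeholder label; the label `"Unknown"` is a hypothetical choice, not a value from the dataset.

```python
import pandas as pd

# Two columns of the five preview rows; the first transaction_type is missing.
sample = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "transaction_type": [None, "Cash", "Cash", "Online", "Online"],
})

# Option A: forward-fill then back-fill, applied to one column only.
# The first row has no predecessor, so the back-fill supplies its value.
filled = sample.copy()
filled["transaction_type"] = filled["transaction_type"].ffill().bfill()

# Option B: keep missing payments distinguishable with a placeholder label.
# "Unknown" is a hypothetical label chosen for illustration.
labelled = sample.copy()
labelled["transaction_type"] = labelled["transaction_type"].fillna("Unknown")

print(filled["transaction_type"].value_counts())
print(labelled["transaction_type"].value_counts())
```

Option A depends on row order, so its result changes if the rows are sorted differently; Option B does not invent payment methods but adds a third category to handle in later charts.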


**Convert `date` to a proper datetime datatype**

In [3]:
df['date'] = pd.to_datetime(df['date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   order_id            1000 non-null   int64         
 1   date                1000 non-null   datetime64[ns]
 2   item_name           1000 non-null   object        
 3   item_type           1000 non-null   object        
 4   item_price          1000 non-null   int64         
 5   quantity            1000 non-null   int64         
 6   transaction_amount  1000 non-null   int64         
 7   transaction_type    1000 non-null   object        
 8   received_by         1000 non-null   object        
 9   time_of_sale        1000 non-null   object        
dtypes: datetime64[ns](1), int64(4), object(5)
memory usage: 78.2+ KB
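
Dates such as `07/03/2022` are ambiguous between month-first and day-first order. The preview rows `8/23/2022` and `11/20/2022` only make sense as month-first, so the sketch below assumes the whole column is month/day/year and passes that format explicitly; this is an assumption about the file, and the full column may contain other formats. It then checks the parsed dates against the April 2022 to March 2023 range stated in the dataset description.

```python
import pandas as pd

# Date strings copied from the df.head() preview.
raw_dates = pd.Series(
    ["07/03/2022", "8/23/2022", "11/20/2022", "02/03/2023", "10/02/2022"]
)

# Assumption: every value is month/day/year. An explicit format makes
# pandas raise an error on values that do not match, instead of guessing.
parsed = pd.to_datetime(raw_dates, format="%m/%d/%Y")

# Sanity check against the period stated in the dataset description.
in_range = parsed.between(pd.Timestamp("2022-04-01"), pd.Timestamp("2023-03-31"))

print(parsed)
print(f"All dates inside the stated period: {in_range.all()}")
```

If a date falls outside the stated period after parsing, that is a hint the day and month may have been swapped for that row.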


**Export the cleaned data to csv file**

In [None]:
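# Raw string for the Windows path; index=False avoids writing the row index as an extra unnamed column.
df.to_csv(r"D:\Local Restaurant Sales\Local-Restautant-Sales\Project.csv", index=False)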

The cleaned data will then be used in Tableau.
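
Before loading the file elsewhere, it can be worth confirming that the export round-trips cleanly. The sketch below writes a small frame to an in-memory buffer instead of the real path, using `index=False` so that no extra unnamed index column appears, and reads it back with the dates parsed. The two rows are taken from the preview after cleaning, under the assumption that dates are month-first; the names `cleaned` and `reloaded` are illustrative.

```python
import io

import pandas as pd

# Two preview rows after cleaning (dates assumed to be month/day/year).
cleaned = pd.DataFrame({
    "order_id": [1, 2],
    "date": pd.to_datetime(["2022-07-03", "2022-08-23"]),
    "transaction_type": ["Cash", "Cash"],
})

# Write to an in-memory buffer rather than to disk, without the row index.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)

# Read it back; CSV stores dates as text, so ask pandas to parse them again.
reloaded = pd.read_csv(buffer, parse_dates=["date"])

print(reloaded.columns.tolist())
print(reloaded.dtypes)
```

Because CSV carries no type information, the `datetime64` dtype set earlier is lost on export; any tool reading the file has to parse the `date` column again.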