# AI-Powered Recipe Recommendation System

### Team : Group 3
| Student No  | First Name                  | Last Name     |
|-------------|-----------------------------|---------------|
| 9041129     | Nidhi                       | Ahir          |
| 9016986     | Keerthi                     | Gonuguntla    |
| 9027375     | Khushbu                     | Lad           |


## Configuration and procedure to run code

### Project and environment setup

1. Move to the project directory "PROG8431" where you cloned the project.
2. Create a virtual environment named **"venvPROG8431"**.
    - Make sure ```python --version``` reports **12.3.6** on your system
    - ```python -m venv venvPROG8431```
3. Activate the environment (Windows PowerShell).
    - ```.\venvPROG8431\Scripts\Activate.ps1```
    - If you are using Visual Studio Code, choose this environment from the menu as the active environment
4. Install the packages listed in **"requirements.txt"**.
    - ```pip install -r requirements.txt```
5. Select the **"venvPROG8431"** environment in your IDE.
6. Create a folder named **"Dataset"** in your project directory.
7. Move all files downloaded from the Kaggle dataset into the "Dataset" directory.
8. Open "Workshop1.ipynb" and run the first code cell.
9. It should display the top 5 rows of the file "RAW_recipes.csv".
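
As a quick sanity check for steps 6 to 9, the sketch below confirms that the dataset file is where the notebook expects it and previews the first rows. The helper name `preview_dataset` is hypothetical and not part of the project; the default path is the one used by the notebook's `RawRecipe` class.

```python
from pathlib import Path

import pandas as pd


def preview_dataset(csv_path="./Dataset/RAW_recipes.csv", n_rows=5):
    """Return the first n_rows of the CSV, or raise if the file is missing.

    Hypothetical helper for illustration; not part of the project code.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Create the 'Dataset' folder and copy the Kaggle files into it."
        )
    # nrows limits parsing to the rows we actually want to preview.
    return pd.read_csv(path, nrows=n_rows)
```

Calling `preview_dataset()` from the project directory should return the same five rows that the first notebook cell displays, provided the Kaggle files are in place.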


### Update requirements.txt after installing new packages

```pip freeze > requirements.txt```

### Import Packages

```python
import pandas as pd
from scipy.stats import norm
import math
```

### Read data from source: RAW_recipes.csv

```python
class RawRecipe:
    def __init__(self):
        self.file_path = './Dataset/RAW_recipes.csv'
        self.data = None

    # Loads the data from a CSV file.
    def load_data(self):
        self.data = pd.read_csv(self.file_path)
        print("---> STEP 1 : Loads the data from a CSV file. \r\n")
        print("RAW_recipes.csv : Data loaded successfully.")
        print(f"Total Records : {self.data.shape[0]} \r\n")
        return self.data

    # Data quality : Null Check (count of null values per column)
    def check_null_values(self):
        print("---> STEP 2 : Null Check for data \r\n")
        if self.data is not None:
            nulls = self.data.isnull().sum()
            print(nulls)
            return nulls
        else:
            print("Data not loaded.")

    # Data quality : Duplicate Check (rows whose "name" repeats an earlier row)
    def check_duplicate_values(self):
        print("---> STEP 3 : Duplicate data Check for name \r\n")
        if self.data is not None:
            dupl = self.data[self.data.duplicated(subset="name")]
            print(dupl)
            return dupl
        else:
            print("Data not loaded.")


if __name__ == "__main__":

    # Create an instance of the RawRecipe class
    recipe_data = RawRecipe()

    # Load data
    recipe_data.load_data()

    # Check for missing values
    recipe_data.check_null_values()

    # Check for duplicate recipe names
    recipe_data.check_duplicate_values()
```

```text
---> STEP 1 : Loads the data from a CSV file. 

RAW_recipes.csv : Data loaded successfully.
Total Records : 231637 

---> STEP 2 : Null Check for data 

name                 1
id                   0
minutes              0
contributor_id       0
submitted            0
tags                 0
nutrition            0
n_steps              0
steps                0
description       4979
ingredients          0
n_ingredients        0
dtype: int64
---> STEP 3 : Duplicate data Check 

                                      name      id  minutes  contributor_id  \
600                cream  of mushroom soup   51922       10           59064   
846                           10 bean soup  470575      180         1020526   
1314                    3 bean baked beans  313237       40          407338   
1315                          3 bean salad  258846       15          607801   
1335    3 ingredient peanut butter cookies  323810       13          138435   
...                                    ...     
```
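
The duplicate check in step 3 uses pandas `DataFrame.duplicated(subset="name")`, which by default flags only the second and later occurrences of a name, so the first copy of each repeated recipe does not appear in the listing. The sketch below illustrates the difference between the default and `keep=False` on toy data; the rows are made up for illustration and are not taken from RAW_recipes.csv.

```python
import pandas as pd

# Toy data for illustration only; not taken from RAW_recipes.csv.
recipes = pd.DataFrame(
    {
        "name": ["bean soup", "bean salad", "bean soup", "bean soup"],
        "id": [1, 2, 3, 4],
    }
)

# Default keep="first": only the second and later occurrences are flagged.
later_copies = recipes[recipes.duplicated(subset="name")]

# keep=False flags every row whose name appears more than once.
all_copies = recipes[recipes.duplicated(subset="name", keep=False)]

print(later_copies["id"].tolist())  # [3, 4]
print(all_copies["id"].tolist())    # [1, 3, 4]
```

Using `keep=False` may be convenient when you want to compare all copies of a repeated recipe side by side before deciding which one to keep.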

### Data Cleanup

1. Apart from one missing "name", null values occur only in the "description" column (4979 rows). This column is not mandatory data, so these nulls do not affect our analysis.
2. There is 1 null value in the "name" column; we will eliminate that row while performing other operations.
3. The duplicate check is not applied to the minutes, number of steps, number of ingredients, and ingredients columns, because repeated values in those columns do not give any valuable insights.
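
A minimal sketch of the cleanup described in points 1 and 2, assuming the data is held in a pandas DataFrame with the same column names as RAW_recipes.csv. The function name `drop_unnamed_recipes` and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd


def drop_unnamed_recipes(data):
    """Drop rows with a null name; leave null descriptions untouched."""
    return data.dropna(subset=["name"]).reset_index(drop=True)


# Toy rows for illustration only.
sample = pd.DataFrame(
    {
        "name": ["bean soup", None, "bean salad"],
        "description": [None, "no name given", "quick side dish"],
    }
)

cleaned = drop_unnamed_recipes(sample)
print(len(sample), "->", len(cleaned))  # 3 -> 2
```

Restricting `dropna` to the "name" column keeps the rows whose description is missing, which matches the decision to treat "description" as optional.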
