# Development Cycle for Diet Recommendation System

 1. Problem Definition
 2. Data Collection
 3. Exploratory Data Analysis (EDA)
 4. Data Preprocessing
 5. Feature Engineering
 6. Model Selection
 7. Model Training
 8. Model Evaluation
 9. Integration
10. Deployment



# 1. Problem Definition

## Goal:
To provide personalized diet recommendations based on user profiles.

## Key Objectives:
- **Analyze User Profiles**: Understand user demographics and health-related metrics such as age, weight, height, and activity level.
- **Map Profiles to Food and Exercise Data**: Link each user's preferences and health information to appropriate food items and exercises.
- **Build a Recommendation System**: Use machine learning algorithms to recommend diet plans tailored to each user's profile, incorporating their activity level, health status, and goals.

---

# 2. Data Collection

## Datasets Used:

### 1. **User Profiles Dataset**:
Contains detailed user profiles, including:
- **age**: The user's age.
- **gender**: The user's gender (Male, Female, Non-Binary).
- **height_cm**: The user's height in centimeters.
- **weight_kg**: The user's weight in kilograms.
- **activity_level**: The user's activity level (Sedentary, Lightly Active, Moderately Active, Very Active).
- **diet_recommendation**: Recommended diet (e.g., Low Carb, High Protein, Balanced, Keto).
- **blood_pressure**: Blood pressure measurement (Systolic/Diastolic).
- **cholesterol_level**: The user's cholesterol level (Normal, Borderline, High).
- **calorie_intake**: Daily recommended calorie intake.
- **allergy_status**: Any known food allergies (None, Gluten, Dairy, Nuts, Eggs).

This dataset helps in understanding user demographics and health status, which forms the basis of personalized recommendations.
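
Because `blood_pressure` is stored as a single "Systolic/Diastolic" string, it usually has to be split before it can be used numerically. The following is a minimal sketch of one way to do that; the helper name `split_blood_pressure` and the output columns `systolic` and `diastolic` are hypothetical, and the sample rows are a small hand-built frame rather than the real file.

```python
import pandas as pd

# Hand-built sample in the "Systolic/Diastolic" format; one value is left
# missing on purpose to show how gaps propagate.
sample = pd.DataFrame({
    "blood_pressure": ["176/60", "100/116", None, "153/101"],
})

def split_blood_pressure(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric `systolic` and `diastolic` columns (hypothetical names)."""
    # Split once on "/" into two columns; missing inputs stay missing.
    parts = df["blood_pressure"].str.split("/", n=1, expand=True)
    out = df.copy()
    # errors="coerce" turns malformed pieces into NaN instead of raising.
    out["systolic"] = pd.to_numeric(parts[0], errors="coerce")
    out["diastolic"] = pd.to_numeric(parts[1], errors="coerce")
    return out

bp = split_blood_pressure(sample)
print(bp)
```

Coercing malformed values to NaN keeps the split from failing on a single bad row, at the cost of hiding parsing problems unless the missing counts are checked afterwards.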

### 2. **Food Nutritional Dataset**:
Contains nutritional information for various food items, including:
- **food_id**: A unique identifier for each food item.
- **calories**: Total calorie count per serving.
- **protein**: Protein content per serving.
- **carbs**: Carbohydrate content per serving.
- **fat**: Fat content per serving.
- **allergens**: Allergen associated with the food item (e.g., Gluten, Dairy, Eggs); empty when none is recorded.

This dataset is essential for providing calorie and nutrient breakdowns for food recommendations.
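
One possible form of such a nutrient breakdown is the share of energy contributed by each macronutrient. The sketch below assumes the macro columns are in grams and applies the general Atwater factors (4 kcal/g for protein and carbohydrates, 9 kcal/g for fat). The function name `macro_calorie_shares` and the output column names are hypothetical. The macro-derived total is not guaranteed to match the stored `calories` column, so it is reported separately.

```python
import pandas as pd

# A few rows with the same columns as the food dataset.
food_sample = pd.DataFrame({
    "food_id": [1, 2, 3],
    "calories": [595.6, 258.3, 684.4],
    "protein": [48.5, 0.4, 1.4],
    "carbs": [87.9, 80.2, 64.2],
    "fat": [49.7, 13.9, 22.9],
})

# General Atwater factors (kcal per gram); assumes the columns hold grams.
KCAL_PER_GRAM = {"protein": 4.0, "carbs": 4.0, "fat": 9.0}

def macro_calorie_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-food macro energy (kcal) and each macro's share of it."""
    macro_kcal = pd.DataFrame({m: df[m] * k for m, k in KCAL_PER_GRAM.items()})
    total = macro_kcal.sum(axis=1)
    # Divide row-wise so the three shares of each food sum to 1.
    shares = macro_kcal.div(total, axis=0).add_suffix("_share")
    shares.insert(0, "macro_kcal", total)
    return pd.concat([df[["food_id"]], shares], axis=1)

breakdown = macro_calorie_shares(food_sample)
print(breakdown.round(3))
```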

### 3. **Exercise Dataset**:
Contains information about various exercises and the calories they burn, including:
- **exercise_name**: Name of the exercise.
- **calories_burned**: Calories burned by the exercise.
- **duration_min**: Duration of the exercise in minutes.
- **intensity**: Intensity of the exercise (Low, Medium, High).

This dataset helps in suggesting suitable exercises based on a user's activity level and caloric requirements.
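
As a rough illustration of matching exercises to a caloric requirement, the sketch below ranks exercises by how long each would take to burn a target number of calories. It uses the column names found in the raw file (`calories_burned`, `duration_min`) and assumes that `calories_burned` is the total for the listed duration, which the source does not state. The function name `minutes_for_target` and the derived columns are hypothetical.

```python
import pandas as pd

# A few rows with the same columns as the raw exercise file.
exercise_sample = pd.DataFrame({
    "exercise_name": ["state", "road", "majority"],
    "calories_burned": [559.6, 343.3, 350.4],
    "duration_min": [53.2, 91.9, 73.5],
    "intensity": ["High", "High", "Medium"],
})

def minutes_for_target(df: pd.DataFrame, target_kcal: float) -> pd.DataFrame:
    """Rank exercises by the minutes needed to burn `target_kcal`."""
    out = df.copy()
    # Assumption: calories_burned is the total over duration_min.
    out["kcal_per_min"] = out["calories_burned"] / out["duration_min"]
    out["minutes_needed"] = target_kcal / out["kcal_per_min"]
    return out.sort_values("minutes_needed").reset_index(drop=True)

plan = minutes_for_target(exercise_sample, target_kcal=300)
print(plan[["exercise_name", "kcal_per_min", "minutes_needed"]].round(2))
```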


## 2.1 Importing Data and Required Packages

### Steps:
1. Import essential Python libraries for analysis and visualization.
2. Load the datasets into Pandas DataFrames.
3. Preview the datasets to ensure successful loading.

---

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Importing datasets and converting them to pandas DataFrames

In [10]:
user_profiles = pd.read_csv(r'data/user_profiles_raw.csv')
food_data = pd.read_csv(r'data/food_nutritional_data_raw.csv')
exercise_data = pd.read_csv(r'data/exercise_data_raw.csv')

#### Show Top 5 Records

In [11]:
user_profiles.head()

| | user_id | age | gender | height_cm | weight_kg | activity_level | diet_recommendation | blood_pressure | cholesterol_level | calorie_intake | allergy_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | fad5fb99-161b-4edc-9d0f-2f3a48f07fe1 | 39 | Non-Binary | 178.2 | 46.8 | Sedentary | Keto | 176/60 | Borderline | 1504.6 | Nuts |
| 1 | 238894af-d7aa-460e-8f7d-9d8c95f68797 | 22 | Male | 140.6 | 115.1 | Sedentary | High Protein | 100/116 | High | 2521.4 | Dairy |
| 2 | 4ddaa68f-c015-4766-8420-6ead9c1085c9 | 30 | Female | 179.4 | 55.4 | Sedentary | Keto | 148/104 | Borderline | 3015.5 | |
| 3 | 187b6276-2799-44c9-ace5-e36c67d1a170 | 43 | Female | 180.1 | 84.0 | Moderately Active | Low Carb | 153/101 | Borderline | 3110.3 | Eggs |
| 4 | 999d5e2d-4aca-48d5-9c78-eb901afa76e5 | 56 | Female | 153.0 | 40.0 | Lightly Active | Balanced | 98/120 | Normal | 3432.7 | Dairy |


In [12]:
food_data.head()

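| | food_id | calories | protein | carbs | fat | allergens |
|---|---|---|---|---|---|---|
| 0 | 1 | 595.6 | 48.5 | 87.9 | 49.7 | Gluten |
| 1 | 2 | 258.3 | 0.4 | 80.2 | 13.9 | Dairy |
| 2 | 3 | 684.4 | 1.4 | 64.2 | 22.9 | |
| 3 | 4 | 440.8 | 38.0 | 91.7 | 13.2 | Eggs |
| 4 | 5 | 345.6 | 34.9 | 93.9 | 19.3 | Dairy |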


In [13]:
exercise_data.head()

| | exercise_name | calories_burned | duration_min | intensity |
|---|---|---|---|---|
| 0 | state | 559.6 | 53.2 | High |
| 1 | road | 343.3 | 91.9 | High |
| 2 | majority | 350.4 | 73.5 | Medium |
| 3 | bar | 629.9 | 31.1 | Low |
| 4 | professor | 428.2 | 50.4 | Low |


#### Shape of the datasets

In [14]:
print(f"Shape of User Data : {user_profiles.shape}")
print(f"Shape of food data : {food_data.shape}")
print(f"Shape of exercise data : {exercise_data.shape}")

Shape of User Data : (1000000, 11)
Shape of food data : (10000, 6)
Shape of exercise data : (5000, 4)


## 2.2 Dataset Information

### 2.2.1 User Profiles Dataset
This dataset contains user information, which includes physical attributes, lifestyle, and dietary needs. Each row represents an individual user with the following columns:

- **user_id**: Unique identifier for each user.
- **age**: Age of the user.
- **gender**: Gender of the user.
- **height_cm**: Height of the user in centimeters.
- **weight_kg**: Weight of the user in kilograms.
- **activity_level**: The activity level of the user.
- **diet_recommendation**: The recommended diet for the user.
- **blood_pressure**: Blood pressure of the user.
- **cholesterol_level**: Cholesterol level of the user.
- **calorie_intake**: Daily calorie intake.
- **allergy_status**: Allergies the user has.

This dataset has a total of 1,000,000 rows, with each row representing a unique user profile. The data is generated with random values for these columns, with some missing data introduced to simulate real-world scenarios. It can be used for exploratory data analysis, outlier detection, and training machine learning models for personalized diet recommendations.

---

### 2.2.2 Food Nutritional Dataset
The Food Nutritional dataset contains nutritional information for a variety of food items. Each row represents a specific food item and contains the following columns:

- **food_id**: Unique identifier for each food item.
- **calories**: The amount of calories per serving (kcal).
- **protein**: The amount of protein per serving (g).
- **carbs**: The amount of carbohydrates per serving (g).
- **fat**: The amount of fat per serving (g).
- **allergens**: Allergen associated with the food item (e.g., Gluten, Dairy, Eggs); missing for some items.

This dataset consists of various food items that are typically part of a balanced diet. The nutritional information supports personalized dietary recommendations based on the user's preferences and requirements. The `allergens` column makes it possible to exclude foods that conflict with a user's `allergy_status`.
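
One user requirement the raw food file can support directly is allergy safety, since it carries an `allergens` column that can be compared with a user's `allergy_status`. The sketch below shows one way to exclude conflicting foods. The function name `foods_safe_for` is hypothetical, and matching the two columns by exact string equality is an assumption about how their values correspond.

```python
import pandas as pd

# A few rows with the food_id, calories and allergens columns of the raw food file.
food_sample = pd.DataFrame({
    "food_id": [1, 2, 3, 4, 5],
    "calories": [595.6, 258.3, 684.4, 440.8, 345.6],
    "allergens": ["Gluten", "Dairy", None, "Eggs", "Dairy"],
})

def foods_safe_for(food_df: pd.DataFrame, allergy) -> pd.DataFrame:
    """Drop foods whose allergen equals the user's allergy (exact match assumed)."""
    if allergy is None or pd.isna(allergy):
        # No recorded allergy: nothing to exclude.
        return food_df
    # Foods with no recorded allergen are kept explicitly.
    keep = food_df["allergens"].isna() | (food_df["allergens"] != allergy)
    return food_df[keep]

safe = foods_safe_for(food_sample, "Dairy")
print(safe)
```

Keeping foods with a missing allergen is a design choice, not a safety guarantee: a missing value may mean "no allergen" or simply "not recorded".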

---

### 2.2.3 Exercise Data
This dataset includes information about various exercise types and the calories they burn. Each row represents a specific exercise activity and contains the following columns:

- **exercise_name**: Name of the exercise activity.
- **calories_burned**: Calories burned by the exercise.
- **duration_min**: Duration of the exercise (minutes).
- **intensity**: Intensity level of the exercise (Low, Medium, High).

This dataset is useful for understanding the relationship between exercise types and their impact on calorie burning, which can be an essential part of a personalized diet and fitness recommendation system.

---

### 2.2.4 Summary of Dataset Sizes
- **User Profiles Dataset**: 1,000,000 rows, 11 columns.
- **Food Nutritional Dataset**: 10,000 rows, 6 columns.
- **Exercise Data**: 5,000 rows, 4 columns.

The datasets are large and contain both numerical and categorical features. These datasets provide a comprehensive foundation for building a recommendation system that personalizes diet and exercise recommendations based on user profiles.
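
With a user table of this size, storing low-cardinality text columns as pandas `category` values can reduce memory use noticeably. The sketch below is one way to do that, not a step taken in the notebook. The column list is an assumption drawn from the categorical fields described for the user profiles, and the helper name `to_category` is hypothetical.

```python
import pandas as pd

# Small repeated sample standing in for the user profiles table.
users_sample = pd.DataFrame({
    "gender": ["Non-Binary", "Male", "Female", "Female", "Female"] * 200,
    "activity_level": ["Sedentary", "Sedentary", "Sedentary",
                       "Moderately Active", "Lightly Active"] * 200,
    "age": [39, 22, 30, 43, 56] * 200,
})

# Assumed low-cardinality columns of the user profiles dataset.
CATEGORICAL_COLUMNS = ["gender", "activity_level", "diet_recommendation",
                       "cholesterol_level", "allergy_status"]

def to_category(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the listed columns (when present) stored as category."""
    out = df.copy()
    for col in CATEGORICAL_COLUMNS:
        if col in out.columns:
            out[col] = out[col].astype("category")
    return out

compact = to_category(users_sample)
before = users_sample.memory_usage(deep=True).sum()
after = compact.memory_usage(deep=True).sum()
print(f"memory before: {before} bytes, after: {after} bytes")
```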

---

### 2.2.5 Data Quality and Preprocessing

- **Missing Values**: The datasets contain missing values that were intentionally introduced. Missing data is handled by filling or removing the values during the preprocessing phase.
- **Outliers**: Some of the columns (e.g., weight, calorie intake) may contain outliers, which will be identified and handled during the preprocessing stage.
- **Feature Engineering**: New features like BMI (Body Mass Index) may be derived from existing features to provide more meaningful insights.
- **Data Normalization/Standardization**: Numerical features may be normalized or standardized to improve model performance during training.

By performing the necessary preprocessing steps, the data can be prepared for use in machine learning models to generate personalized diet and exercise recommendations.
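
The preprocessing ideas listed above can be sketched as follows. This is an illustrative outline rather than the pipeline used later. BMI uses the standard formula (weight in kg divided by height in metres squared), outliers are flagged with the common 1.5 × IQR rule, and standardization is a plain z-score. The helper names, the derived column names, and the choice to fill missing `allergy_status` with the placeholder `"Unknown"` are all assumptions.

```python
import pandas as pd

# A few rows with the relevant user profile columns.
users_sample = pd.DataFrame({
    "height_cm": [178.2, 140.6, 179.4, 180.1, 153.0],
    "weight_kg": [46.8, 115.1, 55.4, 84.0, 40.0],
    "calorie_intake": [1504.6, 2521.4, 3015.5, 3110.3, 3432.7],
    "allergy_status": ["Nuts", "Dairy", None, "Eggs", "Dairy"],
})

def add_bmi(df: pd.DataFrame) -> pd.DataFrame:
    """Add a `bmi` column: weight (kg) / height (m) squared."""
    out = df.copy()
    height_m = out["height_cm"] / 100.0
    out["bmi"] = out["weight_kg"] / height_m ** 2
    return out

def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

def standardize(s: pd.Series) -> pd.Series:
    """Z-score using the population standard deviation."""
    return (s - s.mean()) / s.std(ddof=0)

prepared = add_bmi(users_sample)
# Placeholder fill; whether a missing value means "no allergy" is not known.
prepared["allergy_status"] = prepared["allergy_status"].fillna("Unknown")
prepared["calorie_outlier"] = iqr_outlier_mask(prepared["calorie_intake"])
prepared["bmi_z"] = standardize(prepared["bmi"])
print(prepared.round(2))
```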

---


In [17]:
user_profiles.select_dtypes("object").describe()

| | user_id | gender | activity_level | diet_recommendation | blood_pressure | cholesterol_level | allergy_status |
|---|---|---|---|---|---|---|---|
| count | 1000000 | 1000000 | 1000000 | 995036 | 999719 | 1000000 | 800361 |
| unique | 1000000 | 3 | 4 | 4 | 5551 | 3 | 4 |
| top | 61992c79-2de6-4236-b993-7664844fe5d4 | Non-Binary | Very Active | Keto | 171/92 | High | Gluten |
| freq | 1 | 333658 | 250378 | 249339 | 240 | 333545 | 200628 |
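
The `count` row above implies missing values wherever it falls short of 1,000,000: 4,964 in `diet_recommendation`, 281 in `blood_pressure`, and 199,639 in `allergy_status`. A direct way to get the same picture is to count nulls per column. The sketch below does this on a toy frame with missing values placed by hand. The helper name `missing_summary` is hypothetical.

```python
import pandas as pd

# Toy frame; the positions of the missing values are illustrative only.
users_sample = pd.DataFrame({
    "diet_recommendation": ["Keto", "High Protein", None, "Low Carb", "Balanced"],
    "blood_pressure": ["176/60", "100/116", "148/104", None, "98/120"],
    "allergy_status": ["Nuts", "Dairy", None, "Eggs", None],
    "cholesterol_level": ["Borderline", "High", "Borderline", "Borderline", "Normal"],
})

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages for columns that have gaps."""
    missing = df.isna().sum()
    summary = pd.DataFrame({
        "missing": missing,
        "missing_pct": (missing / len(df) * 100).round(2),
    })
    # Keep only columns with at least one missing value, largest first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

summary = missing_summary(users_sample)
print(summary)
```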
