To create a machine learning model using your `Food_and_Nutrition.csv` dataset, here’s a step-by-step plan:

### 1. Load and Explore the Data
Read the CSV, inspect columns, and check for missing values.
### 2. Preprocess the Data
Encode categorical variables, handle missing data, and scale/normalize features if needed.
### 3. Define the Problem
Decide what you want to predict (the target variable). For example, you might want to predict "Disease" based on the other features.
### 4. Split the Data
Divide the data into training and testing sets.
### 5. Train a Model
Use a suitable algorithm (e.g., Random Forest or Logistic Regression).
### 6. Evaluate the Model
Check accuracy, precision, recall, and similar metrics.
### 7. Save the Model
Optionally, save the trained model for future use.

## 1. Import Libraries

In [36]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler, MultiLabelBinarizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

## 2. Load and Inspect Data

In [37]:
df = pd.read_csv('Food_and_Nutrition.csv')
print(df.head())
print(df.info())

   Ages  Gender  Height  Weight     Activity Level Dietary Preference  \
0    25    Male     180      80  Moderately Active           Omnivore   
1    32  Female     165      65     Lightly Active         Vegetarian   
2    48    Male     175      95          Sedentary              Vegan   
3    55  Female     160      70        Very Active           Omnivore   
4    62    Male     170      85          Sedentary         Vegetarian   

   Daily Calorie Target  Protein  Sugar  Sodium  Calories  Carbohydrates  \
0                  2000      120  125.0    24.0      2020            250   
1                  1600       80  100.0    16.0      1480            200   
2                  2200      100  150.0    20.0      2185            300   
3                  2500      140  175.0    28.0      2680            350   
4                  2000       80  125.0    16.0      1815            250   

   Fiber  Fat                               Breakfast Suggestion  \
0   30.0   60                      O
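
The plan above calls for checking missing values, which `df.info()` only shows indirectly through its non-null counts. A minimal sketch of a more direct check follows. It uses a small made-up frame (`toy`) rather than the real CSV, so the values are purely illustrative; on the real data the same calls would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real CSV; all values are invented for illustration.
toy = pd.DataFrame({
    "Ages": [25, 32, 48, 55],
    "Gender": ["Male", "Female", "Male", None],
    "Sugar": [125.0, np.nan, 150.0, 175.0],
    "Disease": ["Weight Gain", "Diabetes, Acne", None, "Hypertension"],
})

# Count missing values per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# List the distinct values of each text column to spot typos or stray spaces.
for col in ["Gender", "Disease"]:
    print(col, toy[col].dropna().unique())
```

If any column reports missing values, they need to be dropped or filled before the encoding step, since the encoders used later are not designed to handle missing entries.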

## 3. Preprocess Data

- Encode the categorical columns (Gender, Activity Level, Dietary Preference).
- Convert the multi-label "Disease" column to binary columns.
- Drop the suggestion columns (they are free text and not useful for prediction).

In [38]:
# Drop suggestion columns
suggestion_cols = ['Breakfast Suggestion', 'Lunch Suggestion', 'Dinner Suggestion', 'Snack Suggestion']
df = df.drop(columns=suggestion_cols)

# Create and store encoders for each categorical column
label_encoders = {}
for col in ['Gender', 'Activity Level', 'Dietary Preference']:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le
    
# Handle multi-label Disease column
df['Disease'] = df['Disease'].str.replace(' ', '')  # Remove spaces
mlb = MultiLabelBinarizer()
disease_encoded = mlb.fit_transform(df['Disease'].str.split(','))
disease_df = pd.DataFrame(disease_encoded, columns=mlb.classes_)
df = pd.concat([df.drop(columns=['Disease']), disease_df], axis=1)

print(df.head())

   Ages  Gender  Height  Weight  Activity Level  Dietary Preference  \
0    25       1     180      80               2                   0   
1    32       0     165      65               1                   3   
2    48       1     175      95               3                   2   
3    55       0     160      70               4                   0   
4    62       1     170      85               3                   3   

   Daily Calorie Target  Protein  Sugar  Sodium  ...  Carbohydrates  Fiber  \
0                  2000      120  125.0    24.0  ...            250   30.0   
1                  1600       80  100.0    16.0  ...            200   24.0   
2                  2200      100  150.0    20.0  ...            300   36.0   
3                  2500      140  175.0    28.0  ...            350   42.0   
4                  2000       80  125.0    16.0  ...            250   30.0   

   Fat  Acne  Diabetes  HeartDisease  Hypertension  KidneyDisease  WeightGain  \
0   60     0         0 
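
To make the multi-label step easier to follow, here is a small sketch of what `MultiLabelBinarizer` does to a comma-separated column and how `inverse_transform` reverses it. The three example strings and the names `disease`, `mlb_demo`, `encoded`, and `decoded` are made up for illustration and are not rows from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Invented example values, not rows from the real dataset.
disease = pd.Series(["Weight Gain", "Diabetes, Acne", "Hypertension, Heart Disease"])

# Same cleaning as in the cell above: strip spaces, then split on commas.
label_lists = disease.str.replace(" ", "", regex=False).str.split(",")

mlb_demo = MultiLabelBinarizer()
encoded = mlb_demo.fit_transform(label_lists)

# classes_ holds the column order of the binary matrix (sorted alphabetically).
print(list(mlb_demo.classes_))
print(encoded)

# inverse_transform maps each 0/1 row back to a tuple of label names.
decoded = mlb_demo.inverse_transform(encoded)
print(decoded)
```

Each row of `encoded` has one column per disease, with a 1 wherever that disease appeared in the original string. This is the same representation the model is trained on in the following steps.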

## 4. Split Data

In [39]:
X = df.drop(columns=mlb.classes_)  # Features
y = df[mlb.classes_]               # Multi-label targets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## 5. Train Model

In [40]:
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
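
Because `y_train` has one binary column per disease, the forest is fitted as a multi-output classifier. The sketch below illustrates what that means for `predict` and `predict_proba`, using a small synthetic grid instead of the real data; `X_demo`, `Y_demo`, and `clf_demo` are hypothetical names introduced only for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic features: a 7x7 grid of points, purely for illustration.
X_demo = np.array([[i, j] for i in range(-3, 4) for j in range(-3, 4)], dtype=float)

# Two synthetic binary labels, one per feature sign.
Y_demo = np.column_stack([
    (X_demo[:, 0] > 0).astype(int),
    (X_demo[:, 1] > 0).astype(int),
])

clf_demo = RandomForestClassifier(n_estimators=20, random_state=0)
clf_demo.fit(X_demo, Y_demo)

# predict returns one 0/1 column per label.
pred = clf_demo.predict(X_demo[:5])

# predict_proba returns a list with one (n_samples, n_classes) array per label.
proba = clf_demo.predict_proba(X_demo[:5])

# Stack the probability of the positive class for each label into one matrix.
positive_proba = np.column_stack([p[:, 1] for p in proba])
print(pred.shape, len(proba), positive_proba.shape)
```

The per-label probabilities can be useful when a fixed 0/1 prediction is too coarse, for example to rank the diseases a sample is most likely associated with.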

## 6. Evaluate Model

In [41]:
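# Predict on the held-out test set
y_pred = clf.predict(X_test)

# For multi-label targets, accuracy_score is the exact-match (subset) accuracy:
# a row counts as correct only if every disease label matches
print("Subset accuracy:", accuracy_score(y_test, y_pred))

# Per-disease precision, recall, and F1
print(classification_report(y_test, y_pred, target_names=mlb.classes_, zero_division=0))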


## 7. Predict on New Data

In [42]:
# Example: predict for one sample (the first test row stands in for new data)
sample = X_test.iloc[0:1]
pred = clf.predict(sample)
print("Predicted diseases:", mlb.inverse_transform(pred))

Predicted diseases: [('WeightGain',)]
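
The cell above reuses a row that is already encoded. A genuinely new sample arrives with raw text values, so it has to pass through the same fitted encoders first. The following is a minimal sketch of that step under stated assumptions: `encode_sample` is a hypothetical helper, and the tiny `train` frame and `encoders_demo` dictionary stand in for the real data and the `label_encoders` dictionary built earlier.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy training frame (invented values) used only to fit stand-in encoders.
train = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Activity Level": ["Sedentary", "Very Active", "Lightly Active"],
})
encoders_demo = {}
for col in train.columns:
    enc = LabelEncoder()
    train[col] = enc.fit_transform(train[col])
    encoders_demo[col] = enc


def encode_sample(raw, encoders, feature_order):
    """Hypothetical helper: encode one raw sample with already-fitted encoders."""
    row = dict(raw)
    for col, enc in encoders.items():
        # LabelEncoder cannot transform categories it did not see during fit.
        if row[col] not in enc.classes_:
            raise ValueError(f"Unseen category {row[col]!r} in column {col!r}")
        row[col] = int(enc.transform([row[col]])[0])
    # Keep the column order identical to the training features.
    return pd.DataFrame([row], columns=feature_order)


new_raw = {"Ages": 30, "Gender": "Female", "Activity Level": "Sedentary"}
encoded_sample = encode_sample(new_raw, encoders_demo, ["Ages", "Gender", "Activity Level"])
print(encoded_sample)
```

With the real objects, the equivalent call would pass `label_encoders` and `list(X.columns)`, and the resulting one-row frame could then go to `clf.predict` and `mlb.inverse_transform` as in the cell above.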


## 8. Save the Model and Encoders

In [43]:
import joblib

# Save the trained model
joblib.dump(clf, 'model.pkl')

# Save the encoders
encoders = {
    'label_encoders': label_encoders,
    'mlb': mlb
}
joblib.dump(encoders, 'encoders.pkl')

['encoders.pkl']
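
For completeness, here is a short sketch of the reverse step, loading a saved model back with `joblib.load` and confirming that it predicts the same as the original. It trains a throwaway model on synthetic data and writes to a temporary directory, so it does not touch `model.pkl` or `encoders.pkl`; the names `model`, `restored`, and `X_demo` are illustrative only.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only to have something to save and reload.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 5, dtype=float)
y_demo = X_demo[:, 0].astype(int)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    joblib.dump(model, path)       # save
    restored = joblib.load(path)   # load back into a new object

# The restored model should reproduce the original predictions exactly.
same = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(same)
```

The saved encoders can be restored the same way, with `joblib.load('encoders.pkl')` returning the dictionary that was dumped. Since these files are pickles, they should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.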